HWPFOldDocument Examples



Introduction:

Apache POI is a powerful Java library for reading, creating, and modifying Microsoft Office files. While it is best known for its Excel and Word support, this blog post focuses on the Word processing module (HWPF) and, more specifically, on the HWPFOldDocument class. We will work through some practical examples that show what Apache POI can do with older Word documents.

Understanding Apache POI HWPFOldDocument:

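The HWPFOldDocument class in Apache POI is designed for Word documents saved in the oldest binary formats, Word 6.0 and Word 95. These files share the .doc extension with Word 97-2003 documents, but their internal layout differs, so POI handles them separately: HWPFDocument reads and writes Word 97-2003 .doc files, while HWPFOldDocument gives read-only access to the content of Word 6.0 and Word 95 files. Neither class handles the newer Office Open XML format (.docx), which POI supports through its XWPF module.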
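Because both binary formats use the .doc extension, the file name alone does not tell you which class applies. One approach, sketched here under the assumption that the HWPF classes (shipped in the poi-scratchpad artifact) are on the classpath, is to try HWPFDocument first and fall back to HWPFOldDocument when POI throws OldWordFileFormatException, which HWPFDocument raises for Word 95 and older files. The class name AnyDocText, the method extractText, and the file name example.doc are illustrative only.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.HWPFOldDocument;
import org.apache.poi.hwpf.OldWordFileFormatException;
import org.apache.poi.hwpf.extractor.Word6Extractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class: picks the right HWPF class for any binary .doc file
public class AnyDocText {

    static String extractText(File file) throws IOException {
        // Open the OLE2 container once; it is closed when the method returns
        try (POIFSFileSystem fs = new POIFSFileSystem(file)) {
            try {
                // Word 97-2003: the regular HWPF classes can read it
                HWPFDocument doc = new HWPFDocument(fs);
                return new WordExtractor(doc).getText();
            } catch (OldWordFileFormatException tooOld) {
                // Word 6.0 / Word 95: fall back to the read-only old-format classes
                HWPFOldDocument oldDoc = new HWPFOldDocument(fs);
                return new Word6Extractor(oldDoc).getText();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "example.doc" is a placeholder path
        System.out.println(extractText(new File("example.doc")));
    }
}
```

Reusing the same POIFSFileSystem for the fallback should be safe here, because constructing either document class only reads from the container.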

Example 1: Reading Text from a .doc File

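Let's start with a simple example that reads the text of an existing Word 6.0 or Word 95 .doc file:

import java.io.File;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFOldDocument;
import org.apache.poi.hwpf.extractor.Word6Extractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadDocFile {
    public static void main(String[] args) {
        // Open the OLE2 container; try-with-resources closes the file when done
        try (POIFSFileSystem fs = new POIFSFileSystem(new File("example.doc"))) {
            HWPFOldDocument document = new HWPFOldDocument(fs);

            // WordExtractor only accepts Word 97-2003 documents;
            // Word6Extractor is the extractor that works with HWPFOldDocument
            Word6Extractor extractor = new Word6Extractor(document);

            // Extract text from the document
            String text = extractor.getText();

            // Display the extracted text
            System.out.println("Document Text:\n" + text);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This example opens an existing .doc file, extracts its text content with Word6Extractor, and prints it. HWPFOldDocument is constructed from a POIFSFileSystem (or a DirectoryNode) rather than directly from a stream, and closing the POIFSFileSystem releases the underlying file.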
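Besides pulling out all the text at once, HWPFOldDocument exposes a Range through getRange(), which can be walked paragraph by paragraph. The sketch below assumes the same example.doc file, and the names OldDocParagraphs, cleanParagraph, and readParagraphs are hypothetical. Paragraph text in HWPF keeps its trailing paragraph mark (\r), and table cells end with character 7, so the helper trims those and calls Range.stripFields to drop embedded field codes. POI's support for old files is limited, so on unusual or damaged documents the paragraph structure may not be recoverable; Word6Extractor.getText() is a reasonable fallback in that case.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFOldDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class for paragraph-level reading of Word 6.0 / 95 files
public class OldDocParagraphs {

    // Removes field codes, then trailing paragraph marks and cell marks (char 7)
    static String cleanParagraph(String raw) {
        String s = Range.stripFields(raw);
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == '\r' || s.charAt(end - 1) == '\u0007')) {
            end--;
        }
        return s.substring(0, end);
    }

    // Returns the non-empty paragraphs of the document's main text
    static List<String> readParagraphs(File file) throws IOException {
        List<String> result = new ArrayList<>();
        try (POIFSFileSystem fs = new POIFSFileSystem(file)) {
            HWPFOldDocument document = new HWPFOldDocument(fs);
            Range range = document.getRange();
            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph paragraph = range.getParagraph(i);
                String text = cleanParagraph(paragraph.text());
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "example.doc" is a placeholder path
        List<String> paragraphs = readParagraphs(new File("example.doc"));
        for (int i = 0; i < paragraphs.size(); i++) {
            System.out.println((i + 1) + ": " + paragraphs.get(i));
        }
    }
}
```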



Example 2: Modifying a .doc File

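Now, let's look at modifying a .doc file. HWPFOldDocument is read-only: its write methods throw an IllegalStateException, because POI does not support writing the Word 6.0/95 format. Edits therefore go through HWPFDocument, which handles Word 97-2003 .doc files:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

public class ModifyDocFile {
    public static void main(String[] args) {
        // Open a Word 97-2003 .doc file; try-with-resources closes it when done
        try (InputStream in = new FileInputStream("example.doc");
             HWPFDocument document = new HWPFDocument(in)) {
            Range range = document.getRange();

            // Replace every occurrence of "oldText" in the main text with "newText"
            range.replaceText("oldText", "newText");

            // Save the modified document to a new file
            try (OutputStream out = new FileOutputStream("modified_example.doc")) {
                document.write(out);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In this example, we open a .doc file, obtain the range covering the document's main text, replace every occurrence of a given string with new text, and save the result to a new file, leaving the original untouched.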
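When a template contains several placeholders, the same replaceText call can be applied once per placeholder. As far as we can tell from its behavior of replacing every occurrence, the two-argument Range.replaceText re-scans the range until the placeholder no longer appears, so a replacement value that contains its own placeholder (or an empty placeholder) would never finish; this sketch rejects such pairs up front. It uses HWPFDocument, since HWPFOldDocument cannot save changes, and the class name ReplacePlaceholders, the file names, and the {{NAME}}-style markers are all hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper for filling several placeholders in a Word 97-2003 template
public class ReplacePlaceholders {

    // Rejects pairs that would keep the replace-all loop from terminating
    static void checkReplacements(Map<String, String> replacements) {
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String placeholder = e.getKey();
            if (placeholder.isEmpty() || e.getValue().contains(placeholder)) {
                throw new IllegalArgumentException("Unsafe replacement for placeholder: \"" + placeholder + "\"");
            }
        }
    }

    static void replaceAll(String inPath, String outPath, Map<String, String> replacements) throws IOException {
        checkReplacements(replacements);
        try (InputStream in = new FileInputStream(inPath);
             HWPFDocument document = new HWPFDocument(in)) {
            Range range = document.getRange();
            for (Map.Entry<String, String> e : replacements.entrySet()) {
                // Replaces every occurrence of this placeholder in the main text
                range.replaceText(e.getKey(), e.getValue());
            }
            try (OutputStream out = new FileOutputStream(outPath)) {
                document.write(out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical markers; use whatever placeholders your template contains
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("{{NAME}}", "Jane Doe");
        replacements.put("{{CITY}}", "Springfield");
        replaceAll("template.doc", "filled.doc", replacements);
    }
}
```

A LinkedHashMap keeps the replacements in insertion order, which matters only if one replacement value happens to contain another placeholder.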

Conclusion:
Apache POI's HWPFOldDocument class gives Java code read access to Word 6.0 and Word 95 documents, which the regular HWPFDocument class rejects as too old. These examples cover the basic operations of reading such files and modifying .doc files; note that HWPFOldDocument itself is read-only, so changes are saved through HWPFDocument, which handles the Word 97-2003 format. As you explore further, the HWPF module offers more, such as paragraph and character-run access through Range, which makes Apache POI a practical choice for Java developers who have to deal with legacy Word formats.

