Apache POI's HWPFDocument Example



Introduction:

Apache POI (originally short for "Poor Obfuscation Implementation") is a Java library for reading and writing Microsoft Office file formats. One of its components is the HWPF (Horrible Word Processor Format) module, which handles Microsoft Word documents in the older binary format (.doc); the newer .docx format is handled by the separate XWPF module. In this blog post, we'll look at HWPFDocument, the entry point of HWPF, and walk through examples of reading, modifying, and creating documents.

Setting Up Apache POI:

Before diving into the examples, add Apache POI to your Java project with a build tool such as Maven or Gradle. Note that HWPF is not part of the core `poi` artifact: it ships in `poi-scratchpad`, which pulls in `poi` as a transitive dependency. For Maven, add the following dependency to your `pom.xml` file:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.0.0</version>
</dependency>

For Gradle, include the following in your `build.gradle` file:

implementation 'org.apache.poi:poi-scratchpad:5.0.0'

Example 1: Reading a Word Document:

To read the content of a Word document using Apache POI's HWPFDocument, you can use the following code snippet:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.IOException;

public class ReadWordDocument {
    public static void main(String[] args) {
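        // The input stream is closed automatically when the try block exits
        try (FileInputStream fis = new FileInputStream("path/to/your/document.doc")) {
            // Parse the binary .doc file into memory
            HWPFDocument document = new HWPFDocument(fis);
            // WordExtractor returns the document's text as a single string
            WordExtractor extractor = new WordExtractor(document);
            String text = extractor.getText();
            System.out.println("Document Content:\n" + text);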
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
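
HWPFDocument only understands the binary .doc format, so it can help to check what kind of file you have before handing it to HWPF. The sketch below is a dependency-free illustration that inspects the first bytes of a file: binary Office files live in an OLE2 container, while .docx files are ZIP archives. The class and method names (`DocFormatSniffer`, `sniff`, `startsWith`) are made up for this post and are not part of POI, and an "OLE2" result only means the file could be a .doc, since .xls and .ppt use the same container. POI also ships its own `FileMagic` helper in `org.apache.poi.poifs.filesystem`, which is probably the better choice in a real project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for this post, not a POI class
final class DocFormatSniffer {
    // First eight bytes of an OLE2 compound file (container of binary .doc)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // First four bytes of a ZIP archive (container of .docx)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // True if header begins with all bytes of magic
    static boolean startsWith(byte[] header, byte[] magic) {
        return header.length >= magic.length
                && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }

    // Classifies a header as "OLE2", "ZIP", or "UNKNOWN"
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return "OLE2";
        if (startsWith(header, ZIP_MAGIC)) return "ZIP";
        return "UNKNOWN";
    }

    // Reads up to eight bytes from the file and classifies them
    static String sniff(Path file) throws IOException {
        byte[] buf = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) break; // file is shorter than eight bytes
                total += n;
            }
        }
        return sniff(Arrays.copyOf(buf, total));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(sniff(Paths.get(args[0])));
        } else {
            System.out.println("usage: java DocFormatSniffer <file>");
        }
    }
}
```

If the result is "ZIP", the file is most likely a .docx and belongs to XWPF rather than HWPF.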




Example 2: Modifying a Word Document:

You can also modify an existing Word document using HWPFDocument. In this example, we'll replace a specific piece of text throughout the document and save the result to a new file:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ModifyWordDocument {
    public static void main(String[] args) {
        try (FileInputStream fis = new FileInputStream("path/to/your/document.doc");
             HWPFDocument document = new HWPFDocument(fis)) {
            // The overall range covers the whole document
            Range range = document.getRange();

            // Replace "oldText" with "newText" in the entire document
            range.replaceText("oldText", "newText");

            // Open the output only after the input has been fully read
            try (FileOutputStream fos = new FileOutputStream("path/to/your/modified_document.doc")) {
                document.write(fos);
            }
            System.out.println("Document modified successfully.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
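
One thing to watch when saving: if the write fails halfway, a partially written .doc is left on disk. A common way around this, sketched below with only the JDK, is to write to a temporary file first and move it into place once the write has succeeded. `SafeSave` and `StreamWriter` are hypothetical names invented for this post; with POI you would pass `document::write` as the callback, assuming `document` is an HWPFDocument.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for this post, not a POI class
final class SafeSave {
    // Callback that writes content to a stream, for example document::write
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Writes to a temp file next to the target, then moves it over the target
    static void save(Path target, StreamWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "doc-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(temp)) {
                writer.writeTo(out);
            }
            // Only reached if the write succeeded
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Cleans up the temp file if the write failed; a no-op after a successful move
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with plain bytes instead of a real document
        Path target = Files.createTempDirectory("safesave").resolve("out.doc");
        save(target, out -> out.write("demo".getBytes(StandardCharsets.UTF_8)));
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

In the modification example, the inner try block around `document.write(fos)` could then become a single call such as `SafeSave.save(Paths.get("path/to/your/modified_document.doc"), document::write)`.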

Example 3: Creating a New Word Document:

HWPFDocument has no public no-argument constructor, so it cannot build a document out of nothing. The usual workaround is to save a blank .doc from Word once, load it as a template, and add your content to it. HWPF's write support is limited, so it is worth opening the result in Word to check it:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CreateWordDocument {
    public static void main(String[] args) {
        // Start from a blank .doc saved from Word (the template path is a placeholder)
        try (FileInputStream fis = new FileInputStream("path/to/your/empty_template.doc");
             HWPFDocument document = new HWPFDocument(fis)) {
            Range range = document.getRange();

            // Insert the text at the start of the document.
            // '\r' is the paragraph mark character in the .doc text stream.
            range.insertBefore("This is the first paragraph.\rThis is the second paragraph.");

            try (FileOutputStream fos = new FileOutputStream("path/to/your/new_document.doc")) {
                document.write(fos);
            }
            System.out.println("New document created successfully.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Conclusion:

Apache POI's HWPFDocument lets you work with Microsoft Word documents in the older binary format from plain Java. Whether you need to read, modify, or create Word documents, the examples in this blog post should give you a starting point for your own document manipulation tasks. HWPF's write support is less complete than its read support, so test generated files in Word, and refer to the official Apache POI documentation for a more in-depth look at the library's capabilities and options.

