Introduction to the WordDataProvider Module
The WordDataProvider module implements the DataProvider API for reading and writing Microsoft Word documents. It supports the modern .docx format (Word 2007 and later) through Apache POI's XWPF library.
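The examples in this document use the following classes from this module:
WordReadDataProvider: reads paragraphs or table rows from a Word document
WordWriteDataProvider: writes paragraphs or table rows to a Word document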
Reading Word Documents
Reading Paragraphs
%requires WordDataProvider
WordReadDataProvider dp("document.docx", {"content_type": "paragraphs"});
list<hash<auto>> paragraphs = map $1, dp.searchRecords();
Reading Tables
%requires WordDataProvider
WordReadDataProvider dp("document.docx", {
"content_type": "table",
"table_index": 0,
"header_row": True,
});
list<hash<auto>> rows = map $1, dp.searchRecords();
# Alternatively, supply the header names explicitly with the "headers" option
WordReadDataProvider dp2("document.docx", {
"content_type": "table",
"headers": ("Name", "Department", "Salary"),
});
Read Options
path: Path to the Word document
stream: Input stream for Word data
data: Binary Word document data
content_type: "paragraphs" (default) or "table"
table_index: Index of table to read (0-based, for table mode)
header_row: If True, the first row of the table is treated as the header row
headers: Explicit list of header names to use for the table columns
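The options above can be combined to read a table from a document that is already in memory. The following is a minimal sketch, not a definitive usage pattern. It assumes that the reader accepts an options-only hash (mirroring the WordWriteDataProvider example in "Writing to Binary"), that a file named document.docx exists, and that it contains at least two tables. ReadOnlyFile::readBinaryFile() is used here only as a convenient source of binary data.

```qore
%new-style
%requires WordDataProvider

# Load the document into memory; any source of binary .docx data would do
binary doc = ReadOnlyFile::readBinaryFile("document.docx");

# Assumption: the reader can be constructed from an options hash alone,
# with the document supplied through the "data" option
WordReadDataProvider dp({
    "data": doc,
    "content_type": "table",
    "table_index": 1,       # second table in the document (indexes are 0-based)
    "header_row": True,     # take the column names from the first row
});

# Print each row; with "header_row" set, rows are keyed by the header names
foreach hash<auto> row in (dp.searchRecords()) {
    printf("%y\n", row);
}
```

If the document comes from a file on disk, passing the path as the first constructor argument, as in the earlier examples, is simpler than loading the data manually.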
Writing Word Documents
Writing Paragraphs
%requires WordDataProvider
WordWriteDataProvider dp("output.docx", {
"content_type": "paragraphs",
"title": "My Document",
});
dp.createRecord({"text": "First paragraph content", "style": "Normal"});
dp.createRecord({"text": "Section Heading", "style": "Heading1"});
dp.createRecord({"text": "More content here.", "style": "Normal"});
dp.commit();
Writing Tables
%requires WordDataProvider
WordWriteDataProvider dp("output.docx", {
"content_type": "table",
"headers": ("Name", "Department", "Salary"),
"title": "Employee Directory",
});
dp.createRecord({"Name": "Alice Smith", "Department": "Engineering", "Salary": "75000"});
dp.createRecord({"Name": "Bob Johnson", "Department": "Marketing", "Salary": "65000"});
dp.commit();
Writing to Binary
%requires WordDataProvider
# No path or stream is given, so the document is built in memory
WordWriteDataProvider dp({
"content_type": "paragraphs",
"title": "In-Memory Document",
});
dp.createRecord({"text": "Some content"});
# Retrieve the generated .docx document as binary data
binary data = dp.getData();
Write Options
path: Output file path
stream: Output stream
content_type: "paragraphs" (default) or "table"
headers: List of column headers (for table mode)
title: Optional document title (added as Heading1)
Paragraph Styles
When writing paragraphs, the following style names are supported:
"Normal": Regular paragraph text
"Heading1": Main heading (bold, 16pt)
"Heading2": Secondary heading (bold, 14pt)
"Heading3": Tertiary heading (bold, 12pt)
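As a quick way to see how the styles listed above render, the following sketch writes one paragraph per style name. It uses only the calls shown in "Writing Paragraphs"; the output file name styles.docx and the sample text are illustrative assumptions.

```qore
%new-style
%requires WordDataProvider

WordWriteDataProvider dp("styles.docx", {"content_type": "paragraphs"});

# Write one sample paragraph for each documented style name
foreach string style in ("Heading1", "Heading2", "Heading3", "Normal") {
    dp.createRecord({"text": "Sample text in style " + style, "style": style});
}

# Flush the document to the output file
dp.commit();
```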
Error Handling
Common exceptions:
WORD-READ-OPTION-ERROR: Invalid read options or option conflicts
WORD-WRITE-OPTION-ERROR: Invalid write options or option conflicts
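The exception codes above can be handled with a standard Qore try/catch block. The following is a sketch under assumptions: it supposes that an undocumented content_type value such as "tables" is rejected with WORD-READ-OPTION-ERROR, and the exact point at which the module validates options (at construction or on first read) is not stated in this document.

```qore
%new-style
%requires WordDataProvider

try {
    # "tables" is not a documented content_type value ("paragraphs" or "table");
    # assumption: the module reports this as an invalid read option
    WordReadDataProvider dp("document.docx", {"content_type": "tables"});
    list<hash<auto>> records = map $1, dp.searchRecords();
    printf("read %d records\n", records.size());
} catch (hash<ExceptionInfo> ex) {
    # Handle only the option error here; pass everything else up the stack
    if (ex.err != "WORD-READ-OPTION-ERROR") {
        rethrow;
    }
    stderr.printf("invalid read options: %s\n", ex.desc);
}
```

Checking ex.err and rethrowing anything unexpected keeps unrelated failures, such as I/O errors, from being silently swallowed.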
Release Notes
WordDataProvider v1.0
- initial release of the module
- support for reading Word documents (.docx format)
- support for writing Word documents (.docx format)
- paragraph and table read/write modes
- header row detection for tables
- binary data input/output support