Professional XML (Programmer to Programmer)

As I write this, Office 2007 has just gone to Beta 2 and should be commercially available by the time the book is on the shelves. Apart from the ribbon and other highly visible changes to the Office user interface, the biggest change is one that matters to XML developers: the native file format for most documents is now XML, or rather, a number of XML files bundled together in a ZIP archive. Figure 25-14 shows the contents of a simple DOCX file.

Figure 25-14
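Because a DOCX file is an ordinary ZIP archive, any ZIP library can list the parts shown in the figure. The following is a minimal sketch in Python using only the standard `zipfile` module; the function name `list_parts` is hypothetical, and the in-memory sample holds placeholder content rather than a real Word document, which would contain many more parts.

```python
import io
import zipfile


def list_parts(docx):
    """Return the names of all parts stored in a DOCX package.

    `docx` may be a file path or a binary file-like object. A DOCX file
    is a standard ZIP archive, so zipfile can open it directly.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


if __name__ == "__main__":
    # Build a tiny in-memory ZIP using real DOCX part names but placeholder
    # XML, so the example runs without an actual Word file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    for name in list_parts(buffer):
        print(name)
```

Passing the path of a real .docx file to `list_parts` instead of the in-memory buffer lists its actual parts.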

The files stored within the document contain the actual text, as well as the formatting and other elements. The most commonly used files are:

The preceding files are the only parts required for a Word 2007 document. In addition, a number of optional files may appear:

Listing 25-13: Document.xml.rels

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml" />
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml" />
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering" Target="numbering.xml" />
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml" />
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml" />
  <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml" />
</Relationships>
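Each relationship pairs a Type URI with a Target part, so a relationships file can be read with any namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `parse_relationships` is hypothetical. Word writes an `Id` attribute on each Relationship, which the listing omits, so the `rId1` and `rId2` values in the trimmed sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Relationships> root in Listing 25-13.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# A trimmed copy of the listing, with illustrative Id values added.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml" />
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml" />
</Relationships>"""


def parse_relationships(xml_bytes):
    """Return a list of (Id, short type, Target) tuples from a .rels part.

    The short type is the last path segment of the Type URI, for example
    "styles" or "theme". Id is None if the attribute is missing.
    """
    root = ET.fromstring(xml_bytes)
    result = []
    # ElementTree addresses namespaced elements as {namespace}localName.
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        short_type = rel.get("Type").rsplit("/", 1)[-1]
        result.append((rel.get("Id"), short_type, rel.get("Target")))
    return result


if __name__ == "__main__":
    for rel_id, short_type, target in parse_relationships(SAMPLE):
        print(rel_id, short_type, target)
```

Run as a script, this prints one line per relationship, such as `rId1 styles styles.xml`.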

The basic flow for processing a document in the OpenXML format is as follows:

  1. Read the _rels\.rels file to determine which file contains the document. Typically, this is the relationship identified as rId1, but this is not guaranteed. Look for the relationship whose Type is http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument:

    <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml" />

  2. Open the document file and process its contents.

  3. If you need additional information, refer to the document.xml.rels file to locate the other files you need. The Type values of these relationships currently all fall under the URI http://schemas.openxmlformats.org/officeDocument/2006/relationships.
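The three steps above can be sketched in Python with the standard `zipfile`, `posixpath`, and `xml.etree.ElementTree` modules. This is a minimal illustration, not a complete OpenXML reader: the function names `read_rels`, `open_main_document`, and `build_sample` are hypothetical, and the sketch assumes the packaging conventions that a part's relationships are stored in `<folder>/_rels/<file>.rels` (so word/document.xml pairs with word/_rels/document.xml.rels) and that relative targets resolve against the source part's folder.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/"
                   "2006/relationships/officeDocument")


def read_rels(package, rels_name, base_dir):
    """Parse one .rels part into a list of (Type, part name) pairs.

    Relative targets are resolved against base_dir (the folder of the
    source part); targets with TargetMode="External" are skipped because
    they point outside the ZIP archive.
    """
    root = ET.fromstring(package.read(rels_name))
    pairs = []
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target")
        if target.startswith("/"):
            name = target.lstrip("/")  # absolute: relative to package root
        else:
            name = posixpath.normpath(posixpath.join(base_dir, target))
        pairs.append((rel.get("Type"), name))
    return pairs


def open_main_document(docx):
    """Return (main part name, main part bytes, main part relationships)."""
    with zipfile.ZipFile(docx) as package:
        # Step 1: package-level relationships are stored in _rels/.rels.
        package_rels = read_rels(package, "_rels/.rels", "")
        main = next((name for rel_type, name in package_rels
                     if rel_type == OFFICE_DOCUMENT), None)
        if main is None:
            raise ValueError("no officeDocument relationship found")
        # Step 2: read the main document part.
        main_xml = package.read(main)
        # Step 3: the part's own relationships live in <folder>/_rels/<file>.rels.
        folder, filename = posixpath.split(main)
        rels_name = posixpath.join(folder, "_rels", filename + ".rels")
        part_rels = []
        if rels_name in package.namelist():
            part_rels = read_rels(package, rels_name, folder)
        return main, main_xml, part_rels


def build_sample():
    """Build a tiny DOCX-shaped package in memory (placeholder XML only)."""
    styles = ("http://schemas.openxmlformats.org/officeDocument/"
              "2006/relationships/styles")
    package_rels = (f'<Relationships xmlns="{REL_NS}">'
                    f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" '
                    'Target="word/document.xml"/></Relationships>')
    document_rels = (f'<Relationships xmlns="{REL_NS}">'
                     f'<Relationship Id="rId1" Type="{styles}" '
                     'Target="styles.xml"/></Relationships>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("_rels/.rels", package_rels)
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", document_rels)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    main_part, main_xml, rels = open_main_document(build_sample())
    print(main_part)
    for rel_type, name in rels:
        print(rel_type.rsplit("/", 1)[-1], "->", name)
```

Locating the main part through its relationship type, rather than assuming rId1 or the path word/document.xml, keeps the code working when a producer chooses different Ids or folder names.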

The OpenXML specification defines not only Word documents but also Excel and PowerPoint documents. It is also an extensible and flexible document format. See the References section that follows for the current specification.
