2007 Microsoft® Office System Inside Out (Bpg-Inside Out)

This section will show you how to access the ZIP package for an Office Open XML Format document and how to begin to make sense of what you find there. For the best results, I suggest that you take each subsection that follows step by step and be sure you understand and feel comfortable with the content before continuing onto the next.

Breaking into Your Document

Because each of your 2007 Office release Word, Excel, and PowerPoint documents is actually a ZIP package in disguise, you can just change the file extension to .zip to access all of the files in the package. There are a few ways to go about this.

Caution 

If you have software installed that extracts files from a ZIP package, you might be able to look at the files in the ZIP package by using that extraction software, without first changing the file extension. However, you’re unlikely to see the folder structure of the package when you do this, which is an essential part of the package integrity. Changing the extension takes just a second and enables you to view and manage your files in Windows Explorer, for familiar file access options.
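If you prefer to inspect a package from a script rather than from Windows Explorer, the following is a minimal sketch that lists every part in a package together with its folder path. It assumes Python and its standard zipfile module, neither of which is part of the book's toolset, and the function name list_package_parts is purely illustrative.

```python
import zipfile


def list_package_parts(package):
    """Return the names of all parts in an Office Open XML package, sorted.

    `package` may be a file path or a file-like object. The file extension
    does not matter (.docx, .xlsx, .pptx, or .zip), because zipfile looks
    at the file content rather than the file name.
    """
    with zipfile.ZipFile(package) as zf:
        # Each name includes its folder path, for example "word/document.xml".
        return sorted(zf.namelist())
```

Calling list_package_parts with the path to one of your own documents returns names such as _rels/.rels and word/document.xml, so the folder structure stays visible without changing the file extension.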

Inside Out-That ZIP Package Is Still a Document 

When you’re editing the files in the ZIP package, you might not want to spend the time switching back and forth between the Office Open XML file extension (such as .docx) and the .zip extension. Well, you don’t have to!

From the Open dialog box in Word, Excel, or PowerPoint, you can open documents that belong to the applicable program even when they’re using the .zip file extension. To see your ZIP package file, just select All Files from the Files Of Type drop-down list beside the File Name box and then select and open the file as you would when using its original extension. There’s nothing else to it. Word, Excel, and PowerPoint know that the Office Open XML Formats are ZIP packages and read the XML within those packages whether the file is saved using .zip or a file extension that belongs to the program.

Note that you can also open the ZIP package in the appropriate program through the Open With options available when you right-click the ZIP package on the Windows desktop or in Windows Explorer. If you do this, just be careful not to accidentally set the applicable program as the default for opening this file type, or you’ll add an extra step for yourself every time you want to access the document parts in the ZIP package.

However, for ease of use as well as for sharing documents with Microsoft Office users of all experience levels, it’s a good idea to make sure the file extension is changed back to its original state once you’ve finished editing the files in the ZIP package.

The Office Open XML File Structure

Once you change the file name to have the .zip extension, open the file in Windows Explorer. The example that follows walks you through the ZIP package of a simple Word document, originally saved with the .docx extension.

When you first view the ZIP package for a Word document in Windows Explorer, it will look something like the following.

Note that, at the top level of the ZIP package that you see in the preceding example, Excel and PowerPoint files look very similar except that the folder named word in the example is named xl or ppt, respectively, for the applicable program.

Exploring a bit further, when you open the folder named word, you see something similar to the following image.

If you return to the top level of the ZIP package and then open the docProps folder, the following is what you’ll see.

By default, this folder contains the files app.xml (for application properties such as word count and program version) and core.xml (for document properties such as the Document Properties summary information like author and subject). Additionally, if you use the options to save a preview picture or a thumbnail for your document, you see a thumbnail image file in the docProps folder. For Word and Excel, this will be a .wmf file and for PowerPoint it will be a jpeg file.

Note 

If you’re running the 2007 Office release on Windows Vista, you’ll find an option in the Save As dialog box in Word or Excel to save a thumbnail image of your document. In PowerPoint, or in all three programs when running Windows XP, you’ll see the option Save Preview Picture in the Document Properties dialog box.

Taking a Closer Look at Key Document Parts

Let’s take a look at the XML contained in a few of the essential document parts, to help accustom you to reading this file content.

The image you see below is the [Content_Types].xml file for the sample ZIP package shown under the preceding heading, as seen in Windows Explorer.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="wmf" ContentType="image/x-wmf"/>
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/webSettings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
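To see how the Default and Override definitions work together, here is a small sketch that looks up the content type of a given part. It assumes Python's standard xml.etree.ElementTree module. The function name content_type_for is hypothetical, and the rule that an Override for the exact part name wins over a Default for the file extension is my reading of how the two definition types relate.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the Types element of [Content_Types].xml.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def content_type_for(content_types_xml, part_name):
    """Return the content type for part_name, or None if nothing matches.

    content_types_xml is the content of [Content_Types].xml.
    part_name starts with a slash, for example "/word/document.xml".
    """
    root = ET.fromstring(content_types_xml)

    # An Override names one specific part, so check those first.
    for override in root.findall(CT_NS + "Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")

    # Otherwise fall back to the Default for the part's file extension.
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")

    return None
```

With the sample file above, /word/document.xml resolves through its Override, while /_rels/.rels resolves through the Default for the rels extension.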

The following image shows you the content of the .rels file in the top-level _rels folder shown earlier for the sample ZIP package.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail" Target="docProps/thumbnail.wmf"/>
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
</Relationships>

Note 

The .rels file should open without issue in Internet Explorer. But, if this doesn’t work for you, append the .xml file extension to a copy of the .rels file, just for viewing purposes. Also note that, when in the ZIP package, files will only open in their default assigned program. To be able to open a document part in both Internet Explorer and Notepad, as needed, copy the file out of the ZIP package. Then, right-click the file and point to Open With to select the program you need.

Depending on the content in your files, you might run across defined relationships in your .rels files that aren’t used to specify files in the ZIP package and therefore might take on a slightly different structure for the relationship target. For example, notice the following relationship from a document.xml.rels file for a document that contains a hyperlink to the Microsoft home page.

<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://www.microsoft.com" TargetMode="External"/>

Though the relationship ID and Type have the same structure as a relationship to a document part, notice that the target in this case is to an external hyperlink instead of a file in the package.

When you open a file in its originating program (Word, Excel, or PowerPoint), keep in mind that the .rels files are the first place the program looks to know how to put the pieces together for the purpose of opening that file.
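The ID, Type, and Target structure described above, including the difference between package parts and external targets, can be sketched in a few lines. This example assumes Python's standard xml.etree.ElementTree module, and the function name read_relationships and the dictionary keys are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the Relationships element of a .rels file.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_relationships(rels_xml):
    """Return one dictionary per Relationship defined in a .rels file."""
    root = ET.fromstring(rels_xml)
    relationships = []
    for rel in root.findall(REL_NS + "Relationship"):
        relationships.append({
            "id": rel.get("Id"),
            "type": rel.get("Type"),
            "target": rel.get("Target"),
            # TargetMode is absent for parts stored inside the package.
            "external": rel.get("TargetMode") == "External",
        })
    return relationships
```

Run against a document.xml.rels file that contains the hyperlink relationship shown earlier, the entry for rId4 has external set to True, while relationships that point to files in the package have it set to False.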

Building a Basic Word Document from Scratch

The document shown in the preceding portions of this section is a simple Word document with all the defaults you get when you use Word to create a new document in the DOCX file format. Now, it’s time to build a .docx file yourself, without using Word.

If you’re thinking about skipping over the rest of this section because it sounds either too complicated or unnecessary for your needs, please wait. This exercise is important for three reasons.

  1. You might be amazed at how easy this is to do. And, discovering the simplicity for yourself can help you master the tasks in this chapter that you want to learn.

  2. You can find all the XML code you need for this section in a provided sample file (explained in a note that precedes the first part of the following exercise), if you prefer not to type out the XML for yourself.

  3. This exercise is included early in the primer because doing a similar exercise when I was first learning about the Office Open XML Formats was the most helpful thing I did toward understanding the basics of how the parts in an Office Open XML ZIP package work and fit together.

That said, the exercise that follows walks you through creating a simple, essentials-only Word document. Though it’s good practice for anyone creating Office Open XML Format documents through code to include all of the defaults that the source program (Word, Excel, or PowerPoint) includes when it creates a new document, only a few of those files are actually required for the source program (Word in our example) to be able to recognize and open the file. If you create a file that only contains the required bare basics, Word will recognize the missing pieces and add the document parts and relationships needed as you begin to use Word features in your document.

Every Office Open XML document requires [Content_Types].xml as well as a top-level _rels folder containing the .rels file. Each file also requires its program-specific folder with the main program-specific content file that goes in that folder (document.xml in a folder named word, in the case of a Word document). For a Word document, such as we’re about to build, these are the only three files you must have in your ZIP package to create a .docx file that Word will recognize and open without an error. In Excel and PowerPoint, a few other files are required.

To create your first Word document from scratch, you’ll need to create the files [Content_Types].xml, .rels, and document.xml, and place them in the correct folder structure. The steps that follow will walk you through getting this done.

Note 

In the sample files that are included on this book’s CD, find the Copy XML.txt file, which contains all of the code in this and subsequent sections of this chapter. You can copy that code into the files you create in Notepad if you prefer not to type the XML yourself.

Create the Folder Structure

On your Windows desktop, or in any convenient location, create a folder named First Document (or any name you like; this name is for identification purposes in this exercise only). This folder will store the structure for your new .docx file. In that folder, create two subfolders, one named _rels and the other named word. It is essential that these two folders are correctly named. When you’re done with this step, your folder structure should look like the following image.

Create the Main Document File

The main document file, document.xml, needs to reside in the word folder you just created. To create this file, do the following.

Caution 

To accommodate the page layout for the book, code in the unstructured XML samples throughout this chapter may break to a new line in the middle of a term or use a hyphen to start a new line. When you view code in Notepad, it might appear to break in the middle of a word as well, but it won’t use hyphens. Remember that all of the code between a single paired code (such as the <document> code shown here) is considered a single line and should not get manual line breaks when you type the code.

If you are typing this code yourself, double-check your syntax against the structured version of the same code that appears along with each unstructured sample. If copying the code instead of typing it, do so from the sample file named Copy XML.txt referenced earlier.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"><w:body><w:p><w:r><w:t>This is the first Word document I've created from scratch.</w:t></w:r></w:p><w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document>

Once you’re satisfied that your code is accurate, you can save and close this file. Notice that this code contains items you saw in the example from the preceding chapter section.

Let’s look at that document in one more format to help clarify the content. The image below shows the same document.xml file opened on the Tree View tab of the XML Notepad editor.

Create the Content_Types File

In Notepad, save a file named exactly [Content_Types].xml to the root of your First Document folder. As with the document.xml file, following are two versions of the code that you need to add to this file, first shown in Internet Explorer so that you can clearly see the tree structure, and next shown as run-of-text, similar to the way code appears in Notepad.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/></Types>

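As you see, this is a very simple file, containing the XML version statement at the top as well as the open code named Types, in which all other codes in the file are nested and where the namespace for the content types is defined. After that, you see two Default definitions, which assign content types to the .rels and .xml file extensions, and one Override definition, which assigns the Word main document content type to the part named /word/document.xml.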

Create the .rels File

The relationship file for this new document is the simplest of the three you need to create. In Notepad, create a new file and save it as .rels, inside the _rels subfolder you created within the First Document folder. Then, add the following content to that file (shown in both structured format and in Notepad run-of-text format).

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/></Relationships>

In the preceding code, the XML version is defined (as it is in every .xml or .rels format file within the package); followed by the open code for the relationships content along with its namespace definition; followed by the single required relationship in this case, which is to the part named document.xml. See “The Office Open XML File Structure” on page 1150 for details on the three-part structure (ID, Type, and Target) of a relationship definition.

Compile and Open Your New Document

Once you save and close the .rels file, you can exit Notepad. You’re now ready to put your ZIP package together and open the file in Word, using the following steps.

  1. Open the First Document folder in Windows Explorer.

  2. Select the file [Content_Types].xml as well as the two subfolders (_rels and word).

  3. Right-click, point to Send To, and then click Compressed (Zipped) Folder.

  4. When the .zip folder is created, change the name (including the file extension) to First document.docx. Then, press Enter to set the new name and click Yes to confirm when you see the warning about changing file extensions.

Double-click to open your new Word document. It should open in Word without error. If it does not, see the Troubleshooting and Inside Out tips at the end of this section for help finding the problem.
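If you would rather assemble the package with a script than with Notepad and Windows Explorer, the whole exercise can be sketched as follows. This is only an illustration that assumes Python and its standard zipfile module, and the function name build_minimal_docx is hypothetical. To keep the listing short, the document part declares only the w namespace, which is the only prefix the content uses. Leaving out the other namespace declarations is my simplification, so use the full listing shown earlier if you want to match the book's file exactly.

```python
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

CONTENT_TYPES = XML_DECL + (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = XML_DECL + (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

DOCUMENT = XML_DECL + (
    '<w:document xmlns:w="' + W_NS + '"><w:body><w:p><w:r>'
    "<w:t>This is the first Word document I've created from scratch.</w:t>"
    '</w:r></w:p></w:body></w:document>'
)


def build_minimal_docx(target):
    """Write the three required parts into a new ZIP package.

    `target` may be a file path ending in .docx or a writable file-like object.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The part names reproduce the folder structure created by hand above.
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", DOCUMENT)
```

Calling build_minimal_docx with a path such as First document.docx writes the three parts at the paths used in this exercise, so the result can be opened in Word in the same way as the hand-built file.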

Add More Content Types, Document Parts, and Relationships

Even though you didn’t add all of the default content types and relationships that Word adds to a new document, all Word functionality is available to your new file. Make any edit (you can even type just a space if you like) and then save the file while it’s open in Word. Then, close it, change the file extension to .zip, and take a look at what Word did to your files.

What you’ll find is that Word added the default files it provides when it creates a new .docx file, and it added the necessary content type and relationship definitions to go along with them. Review the changes that Word made to your file. Once you’re comfortable with the ZIP package content, you’re ready to start working directly with the XML behind your Office Open XML documents.

Troubleshooting

How can I find the error when my ZIP package won’t open in Word, Excel, or PowerPoint?

When an Office Open XML Format document won’t open in Word, Excel, or PowerPoint, the problem can be as simple as a missing space, angle bracket, or another single character. But, when you have ZIP packages with multiple long files, how do you even begin to find the problem? Actually, in most cases you don’t have to. Word, Excel, or PowerPoint will do it for you.

When you try to open the file and an error message appears, click the Details button on the error message. In most cases, the precise location of the error will be listed, and the error type might be included as well. Take a look at the following example.

In this example, I left the quotation mark off following one of the namespace definitions in the document.xml part. Notice that the detail here shows you the document part, the line within that part, and the location in that line where the error occurs. See the Inside Out tip that follows for more on interpreting the location references.

Note that, if you’re using Internet Explorer to view and Notepad to edit your XML document parts, if there’s an error in one of the parts, Internet Explorer will most likely be unable to open it in the tree structure. Because of this, if you use the error detail to lead you to the error location and try to correct it in Notepad, you can confirm that the error is corrected before returning the file to the ZIP package and changing the file extension back to its original state, just by trying to open it in Internet Explorer.
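The same kind of check (does the part parse at all, and if not, where does it fail) can also be scripted. The sketch below assumes Python's standard xml.etree.ElementTree module rather than Internet Explorer, and the function name find_xml_error is hypothetical. Python reports a line and a column instead of a position. To my understanding the line is counted from 1 and the column from 0, so treat the numbers as a pointer to the neighborhood of the problem instead of expecting them to match the Word error detail exactly.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_content):
    """Return None if the content is well-formed XML.

    Otherwise return a (line, column, message) tuple describing the first
    error that the parser reports.
    """
    try:
        ET.fromstring(xml_content)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

A part that returns None here is at least well-formed. That does not guarantee that Word will accept it, because this check knows nothing about which codes and attributes the Office Open XML Formats allow.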

See the Inside Out tip that follows for some help on how to locate the error in your code without any wasted time or effort.

Inside Out-Using XML Notepad and Word to Help Find Syntax Errors 

Perhaps you tried to open a file in Word, as discussed in the preceding Troubleshooting tip, and got an error. Or, maybe you just created one of the XML parts for a new document, such as a document.xml file, and then tried to open it in Internet Explorer only to get a syntax error at that point.

The error message you see may indicate a line and position number, or it may indicate a line and column number. Note that column and position are not the same thing. Position is the easier of the two to identify, as it corresponds to characters.

One easy way to find the line and position number of the error is to try to open the file in XML Notepad, the free utility program mentioned earlier in this chapter (this is not the same as the Windows Notepad utility). So, if the Word error message tells you that the error occurred in the document.xml part, for example, or if the error occurred in Internet Explorer when you tried to open an individual XML part, you can try opening that document part in XML Notepad to instantly see the line and position number where the problem exists.

Keep in mind that everything within a paired code is considered a line of code. So, for example, in document.xml, line 2 refers to everything inside the paired <w:document…> code. Line 1 is the code that indicates the XML version. If then, for example, the XML Notepad error tells you that the error is located at line 2 and position 645, you’re looking for character 645 in the second line of code. Copy that line of code (you can open it in Windows Notepad to do this) and paste it into a blank Word document. Then, open the Visual Basic Editor (Alt+F11), press Ctrl+G to open the Immediate window, and type the following code in that window. (You may want to turn off Word Wrap from the Format menu in Notepad before copying text to Word, to avoid copying unwanted formatting marks.)

ActiveDocument.Characters(645).Select

Replace 645, of course, with the position of the error in your code. Press Enter from that line of code and then switch back to the document (Alt+F11), and you’ll see the character causing the error selected on screen. No fuss, no muss, and no tearing your hair out because you can’t find the error when you look at the amorphous blob of code that appears in Windows Notepad.
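If you don't want to paste the code into Word, the same lookup is simple to sketch in a script. The example below assumes Python, and the function name show_position and its context parameter are hypothetical. It also assumes that the position is counted from 1, the way the Characters collection in Word counts, which may not hold for every tool that reports a position.

```python
def show_position(line_text, position, context=20):
    """Return the character at a 1-based position plus the text around it.

    line_text is the single line of code named in the error message.
    position is the 1-based character position from the error message.
    context is the number of characters to show on each side.
    """
    index = position - 1  # Python strings are indexed from 0
    if not 0 <= index < len(line_text):
        raise ValueError("position is outside the line")
    start = max(0, index - context)
    return line_text[index], line_text[start:index + context + 1]
```

For the example in this tip, you would pass the second line of document.xml and 645, and then look for the returned snippet in Notepad to find the spot that needs fixing.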
