Excel 2007[c] The Missing Manual
24.3. Excel and XML
XML is a great way to exchange data between different computer programs. But what does that have to do with Excel, which already has its own perfectly good file format? Here's the deal: More and more companies today use XML to pass data back and forth. For example, when companies exchange business orders, news organizations post stories, or real estate firms list properties for sale, chances are they're using an XML-based format. If you want to crack open these documents and analyze this data using all of Excel's features, including formulas and charts , then you'll need to use Excel's XML tools. There's another side to this story. Instead of trying to get XML information into Excel, you may need a way to get your worksheet data out of Excel. You might want to take an expense worksheet, export it to XML, and then feed that XML into an automated expense-processing program. That program could then track your expenses, submit them to your supervisor for authorization, and notify the payroll department when a payment is required. In a small company, it could be just as easy to print out the expense report and deliver it by hand (or email it). But in a large company, an automated application can help the whole process flow seamlessly, without forcing anyone to sort through stacks of paper or dozens of email messages. In these situations, XML really shines. Note: Experts estimate that Excel spreadsheets contain more data than all the world's relational databases combined. Excel's XML features can help you extract information that's trapped in your spreadsheet files and use it in other automated applications. Because XML is so flexible, there's no single-step solution to importing and exporting XML. You can't just perform an Open XML command because Excel doesn't know which XML format you're using. Instead, you first need to give Excel some information about the specific format you're using, and tell Excel how to extract the data you need. Excel makes this possible through a set of features called XML mapping . XML mapping lets you link a specific XML format to a specific spreadsheet. Once you set up this link, you can use it in two ways: to export data from your worksheet into an XML document, or to import the contents of an XML document into your worksheet. Some of the options for Excel's XML features are tucked away on a special tab that doesn't ordinarily appear. Before you do anything with XML and Excel, you need to display this tab. To do so, choose Office button 24.3.1. Mapping a Simple Document
The simplest way to map an XML document is to link each element in the XML document to a corresponding cell in a worksheet. Then, when you import the document, the data flows out of the elements and into the linked cells . Note: You can find all the XML files used in this chapter on the "Missing CD" page at www.missingmanuals.com. You can use these files to map your own worksheets. To try this out, you can use the simple Student.xml document shown below, which stores the test and assignment scores for a single student. <?xml version="1.0" ?> <Student> <Name>Lisa Chen</Name> <StudentID>45349920</StudentID> <Test1_Score>75</Test1_Score> <Test2_Score>63.23</Test2_Score> <Assignment1_Score>94</Assignment1_Score> <Assignment2_Score>90</Assignment2_Score> </Student>
Keep in mind that in real life, you wouldn't create this document by hand. Instead, it might be extracted from a database or, even more likely, generated by some sort of automated student grading program. Before you can map an XML document to an Excel workbook, you need to prepare the workbook. Simply follow these steps:
To map the Student.xml elements to your spreadsheet, follow these steps:
Once you've finished mapping the document and adding your headings, it's time to import the XML. Choose Date
24.3.2. Importing and Exporting XML
Now that you've mapped the Student.xml file, you've not only extracted some information from the XML document, you've also given yourself a range of options for managing the link between your spreadsheet data and the world of XML. The real magic of XML lies here. Once you've defined a link between your worksheet and a specific XML format, you can perform three tasks :
This ability to import means that in the Student.xml example, you could import a student's information, modify it, and then save it back to a different XML document. In fact, you could use your spreadsheet as a way to create dozens of different student documents in separate XML files, just by exporting different data to different files. A crafty developer could create a custom application that scans student XML files and then automatically generates and mails report cards. All you'd need to do is export the XML. In fact, the whole process could get even easier if someone develops a little piece of Excel macro code to handle the XML export. In that case, you'd just click a button on the worksheet, and Excel would export the student data to XML and submit it to the custom application. Developing this type of workflow takes a fair bit of work, and the first step's learning Excel's VBA macro language, which is introduced in Chapter 28. Note: Remember, if you save a mapped workbook, you're saving only the Excel spreadsheet file, with whatever data it currently contains. If you want to save the content to an XML file, you need to Export the XML data. Choose Developer | ||||||||||||
POWER USERS' CLINIC XML Mapping with a Schema | ||||||||||||
Serious XML gurus don't map a worksheet using an XML document. There are too many possible problems, including optional elements that Excel might ignore, data type rules that it doesn't enforce, and certain types of structures that Excel might misunderstand. A better way to map a worksheet to XML is to use an XML schema. A schema defines the structure of the XML document you want to import. If you're creating a worksheet to analyze lists of student grades, you could use a file called StudentList.xsd that defines the elements you'll use and the document structure. XML mapping works more or less the same with a schema as it does with an actual document. The key difference is that the schema removes the possibility for error. To map a schema, follow these steps:
When using an XML schema, Excel can perform validation with the schema data types to prevent invalid input. If you want to use this feature, you have to turn it on for the document. Select Data |
24.3.3. Mapping Lists
Excel's XML features really get interesting when you need to map XML documents that contain lists of information, like product catalogs, order tables, andas in the following examplea class report. As you've no doubt noticed, most Excel documents use lists of some kind. Whether you're tracking student grades, monthly expenses, or employee contact information, you rarely have just one piece of information.
XML documents are often designed to hold repeating elements. The Student.xml file, in contrast, only held the information for a single student. But you can readily create a document that holds a list of students, each one in separate <Student> element containers.
Here's an example (available in the StudentList.xml file, which you can find on the "Missing CD" page at www.missingmanuals.com). Only two students are shown here, but the actual StudentList.xml file contains many more students:
<?xml version="1.0" ?> <Students> <Student> <Name>Lisa Chen</Name> <StudentID>45349920</StudentID> <Test1_Score>75</Test1_Score> <Test2_Score>63.23</Test2_Score> <Assignment1_Score>94</Assignment1_Score> <Assignment2_Score>90</Assignment2_Score> </Student> <Student> <Name>Edwin Albott</Name> <StudentID>45349921</StudentID> <Test1_Score>85</Test1_Score> <Test2_Score>73.23</Test2_Score> <Assignment1_Score>94</Assignment1_Score> <Assignment2_Score>95.6</Assignment2_Score> </Student> </Students>
When you try to map this document, Excel quickly notices that the <Student> element repeats. Instead of mapping the <Student> element to a single cell (as it did in the previous section), Excel creates a mapped table that you can use to manage the list of students with filtering, sorting, and searching.
Note: This example, in fact, creates the same type of table you studied in Chapter 14. So why bother? It gives you all the fancy table tools, like sorting, filtering, alternating row formatting, and so on.
To map the StudentList.xml file, follow these steps:
-
Choose Office button
Open . The Open File dialog box appears.
-
Browse to the StudentList.xml file, and then open it .
If you haven't already gotten this file, download it from the "Missing CD" page at www.missingmanuals.com. Excel shows an Open XML dialog box with three options for the file.
-
Select "Use the XML Source task pane", and then click OK .
Excel warns you that you're mapping a document without a schema.
-
Click OK to continue .
The XML Source pane appears, with the structure of the StudentList.xml file. It looks similar to the previous example, but there's a difference now. When you drag an element onto the worksheet, Excel creates a table column complete with a header that has a drop-down list. Why? Because Excel recognizes that the StudentList.xml file contains multiple students. And Excel can't store all these students in your worksheet in a single cell.
-
Click the Student element, and then drag it to cell A1 .
When multiple columns belong to the same XML list, it's often easiest to drag these elements onto your spreadsheet in one operation. You can select multiple elements by holding down the Ctrl key while you select items in the XML Source pane or by selecting the parent element (the element that contains all the elements you want to insert).
In the student list example, if you select the Student entry in the XML Source pane, you also select all the elements that contain student information. You can then drag them all at once. It's entirely up to you whether you create your table as a series of contiguous columns (the easiest approach) or as separate columns spread out over your spreadsheet. Either way, the data's equivalent.
-
Choose Data
Connections Refresh All . Now the student information flows into the table, filling it up automatically, as shown in Figure 24-17.
|
Tip: In this example, all the XML data is contained in a repeating list of students. However, XML documents often use hybrid structures where they include some repeatable information (like the list of students), and some information that occurs only once (like the name of the class and the instructor who is teaching it). In this case, you would probably link individual cells in the top portion of your worksheet, and then add the table a little lower down.
24.3.4. Gaining the Benefits of XML Mapping
Regardless of whether you want to import or export XML, you should always save a copy of your mapped spreadsheet file. Excel stores the mapping information in that file. Essentially, you should think of this spreadsheet as a window that lets you analyze any XML file, as long as it has the same structure as the XML document you mapped.
When a semester of classes finishes up, an automated student grading application might generate a new XML document with the most up-to-date information. Fortunately, you don't need to map this documentas long as it matches the structure of the grading document you used initially, you can import the new XML document with a couple of mouse clicks. The new information then flows seamlessly into your existing workbook.
In a very real sense, you can reuse a mapped workbook to examine different XML files in the same way you can use a database query to get and analyze the most up-to-date information from a table in a database (as shown earlier).
For example, consider the StudentList.xml workbook that you created earlier. To make it a better tool for analyzing student grades, you would probably add a few extra ingredients . Here are some possibilities:
-
A calculated column in the table that determines each student's overall grade
-
A calculated field outside the table that determines the average or median grade
-
A chart that shows the distribution of grades in the class
The beauty of XML mapping is that once you've added these extra touches, you can reuse them with the data in other XML documents, provided these documents have the same structure as the document you used to map your worksheet. For example, you might receive a new file, called StudentList_Geography2007.xml, with a whole series of grades for another class. This document uses the same elements, so you don't need to go through the whole mapping process again.
Instead, you can just import this new information into your existing worksheet. All you need to do then is move to a mapped cell, select Developer
|