Effective XML: 50 Specific Ways to Improve Your XML
Many XML applications are intended solely for machine processing. For instance, SOAP messages are almost never seen by a person. However, most DocBook documents are edited by hand and are intended to be formatted and presented to people. In machine-oriented documents, mixed content is uncommon and order tends not to matter much. In narrative documents meant for human eyes, mixed content is extremely common and order matters a great deal.

However, there's also a common middle ground of documents that are mostly intended for machine processing but may contain some portion of text meant for people. For example, consider a bank or credit card statement. Mostly it's just a list of transactions. However, statements often also contain a significant amount of narrative for a person to read, as shown in Figure 23-1. There is nothing in this part of the statement that could not be written in standard XHTML.

Figure 23-1. The Narrative Fine Print from a Typical Bank Statement
For another example, imagine an invoice document. It probably contains a list of the products ordered, their prices, the delivery address, and so forth. This can all be represented in a straightforward, record-oriented fashion.

    <?xml version="1.0"?>
    <Invoice>
      <Customer>Jane's Electronics</Customer>
      <Product>
        <Name>Widget</Name>
        <SKU>324</SKU>
        <Quantity>10</Quantity>
        <Price currency="USD">2.95</Price>
      </Product>
      <Product>
        <Name>Gizmo</Name>
        <SKU>325</SKU>
        <Quantity>1</Quantity>
        <Price currency="USD">2344.95</Price>
      </Product>
      <ShipTo>
        <Street>135 Fremont Ave.</Street>
        <City>Santa Clara</City>
        <State>CA</State>
        <Zip>95054</Zip>
      </ShipTo>
      <Terms>Net-30</Terms>
    </Invoice>

However, an invoice may also contain a paragraph of text thanking the customer for ordering the products, instructions for returning the product if necessary, and even ads for other products. All of these are traditional narrative text and need a more human-centered markup.

Most developers focus on the more record-like aspects of a document when designing an XML application. Developers are more comfortable with this sort of data, and its structure tends to be more closely tied to the business rules. The narrative content is often an afterthought, if it's included at all, and it's rarely very well thought out.

Fortunately, even as an afterthought, it doesn't have to be hard to add sophisticated narrative structure to your documents. The trick is, instead of trying to invent a markup language that describes paragraphs, sections, title, emphasis, and so on from scratch, borrow an existing markup language. In particular, I recommend that you borrow XHTML. XHTML has a number of advantages, not least among them:
There are two basic ways to integrate XHTML into other, more record-like documents.
Both approaches have their advantages and disadvantages. The first often seems to flow more naturally with the document as a whole, while the second makes it much easier to extract and process the HTML using a separate process from the one that manipulates the records in the document. Perhaps the best approach is to combine them, that is, to insert a placeholder element that contains an html element. For example, here's a simplified bank statement that includes HTML account information.

    <?xml version="1.0"?>
    <!DOCTYPE Statement PUBLIC "-//MegaBank//DTD Statement//EN"
                               "statement.dtd">
    <Statement xmlns="http://namespaces.megabank.com/">
      <Bank>MegaBank</Bank>
      <Account>
        <Number>00003145298</Number>
        <Type>Savings</Type>
        <Owner>John Doe</Owner>
      </Account>
      <Date>2003-30-02</Date>
      <OpeningBalance>5266.34</OpeningBalance>
      <Deposit>
        <Date>2003-02-07</Date>
        <Amount>300.00</Amount>
      </Deposit>
      <ClosingBalance>5566.34</ClosingBalance>
      <AccountInfo>
        <html xmlns="http://www.w3.org/1999/xhtml">
          <body>
            <h1>
              IMPORTANT INFORMATION ABOUT THIS ACCOUNT STATEMENT
              AND YOUR RIGHTS
            </h1>
            <ol>
              <li><strong>Review At Once:</strong>
                Notify the Bank in writing, within 14 days after we mail
                or make this statement available to you, of any
                irregularities, or you may lose valuable rights. See the
                brochure <cite> Information About Our Accounts and
                Services </cite> for details about this and other time
                limitations regarding notice or irregularities. (This
                paragraph does not apply to electronic funds or wire
                transfers.)
              </li>
              <li><strong>Electronic Funds Transfers Under
                Regulation E:</strong> In case of...</li>
            </ol>
            ...
          </body>
        </html>
      </AccountInfo>
    </Statement>

If you want to validate documents like this (and you don't always need to do that; sometimes just the markup is enough), you'll want to reference the XHTML DTD. This is not hard. You can load it with a parameter entity reference as discussed in Item 8 and demonstrated below.
    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd">
    %xhtml;

You then simply include the html element in the content model of the AccountInfo element.

    <!ELEMENT AccountInfo (html)>

The only tricky part is ensuring that no elements in your application share names, such as p, div, body, or table, with standard HTML elements. This is probably a good idea anyway because the HTML elements are so familiar to so many people that using the same names for other things is likely to cause confusion. (Other schema languages do not have this problem because they're namespace aware, but the W3C XML Schema Language schema for XHTML has not been finished as of June 2003.)

In fact, you actually can choose from several variants of XHTML, depending on your needs. These include:
The W3C has even published profiles of XHTML that integrate MathML, SVG, and/or VoiceXML support. If none of these suit you, you can use the modularization techniques built into XHTML 1.1 to customize your own. You can add and remove elements and attributes, select only some of the modules, and build almost exactly the language you need. For example, suppose you want to use XHTML Basic but remove forms support. You would simply redefine the xhtml-form.module entity to IGNORE before importing the XHTML Basic driver.

    <!ENTITY % xhtml-form.module "IGNORE" >
    <!ENTITY % xhtml-basic PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
               "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd" >
    %xhtml-basic;

It is also possible to go the other way, that is, to mix your own vocabularies into XHTML. The difference is that in this case, the root element is html, and the main driver is HTML, not your own application. This is primarily useful for browser display, either with stylesheets or particular plug-ins. For instance, this is how SVG and MathML are added to web pages. However, this technique tends not to be as useful in custom, local applications.

You may or may not need to customize XHTML like this before mixing it with your own applications. Either way, it's a lot easier to borrow one of the XHTML DTDs and embed XHTML in your documents rather than invent an equivalent language from scratch. Reusing XHTML saves developer time, saves author time, and produces more robust and maintainable documents.
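To make the placeholder approach concrete for the invoice shown earlier, here is a minimal sketch of how a thank-you message and return instructions might be embedded. The Note element name, the page title, and the wording of the paragraphs are assumptions made up for illustration; only the record-oriented elements come from the original invoice, and it is shortened to a single product here.

```xml
<?xml version="1.0"?>
<Invoice>
  <Customer>Jane's Electronics</Customer>
  <Product>
    <Name>Widget</Name>
    <SKU>324</SKU>
    <Quantity>10</Quantity>
    <Price currency="USD">2.95</Price>
  </Product>
  <Terms>Net-30</Terms>
  <!-- Note is a hypothetical placeholder element; it is not part of
       the original invoice -->
  <Note>
    <!-- Everything inside html is in the XHTML namespace, so a separate
         process can pick it out without touching the records above -->
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>A note about your order</title>
      </head>
      <body>
        <p>Thank you for your order.</p>
        <p>If you need to return a product, please include
           <em>a copy of this invoice</em> in the package.</p>
      </body>
    </html>
  </Note>
</Invoice>
```

If this invoice were validated against a DTD, Note would presumably be declared with a content model of (html), in the same way AccountInfo is declared for the bank statement.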