Structuring Data

In this section and throughout this chapter, we create our own XML markup. XML allows you to describe data precisely in a well-structured format.

XML Markup for an Article

In Fig. 19.2, we present an XML document that marks up a simple article. The line numbers shown are for reference only and are not part of the XML document.

Figure 19.2. XML used to mark up an article.

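[Listing for Fig. 19.2: the XML tags did not survive extraction. The text content that remains, by listing line, is: 6, Simple XML; 8, May 5, 2005; 11, John; 12, Doe; 18, In this chapter, we present a wide variety of examples that use XML. The text "XML is pretty easy." also belongs to the listing, but its line number is not recoverable.]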

This document begins with an XML declaration (line 1), which identifies the document as an XML document. The version attribute specifies the XML version to which the document conforms. The current XML standard is version 1.0. Though the W3C released a version 1.1 specification in February 2004, this newer version is not yet widely supported. The W3C may continue to release new versions as XML evolves to meet the requirements of different fields.

Portability Tip 19.1

Documents should include the XML declaration to identify the version of XML used. A document that lacks an XML declaration might be assumed to conform to the latest version of XML; when it does not, errors could result.

Common Programming Error 19.1

Placing whitespace characters before the XML declaration is an error.
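The rule above can be checked with any XML parser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module; the one-element documents and the helper name is_well_formed are our own assumptions, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents used only for this check.
GOOD = '<?xml version = "1.0"?>\n<article/>'
BAD = '  <?xml version = "1.0"?>\n<article/>'  # two spaces precede the declaration


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text`, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

With this parser, the first document is accepted and the second is rejected, because the declaration no longer sits at the very start of the document.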

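XML comments (lines 2–3), which begin with <!-- and end with -->, can be placed almost anywhere in an XML document. XML comments can span multiple lines; an end marker on each line is not needed. The end marker can appear on a subsequent line, as long as there is exactly one end marker (-->) for each begin marker (<!--).

Figure 19.4. XML document that marks up a letter. (Lines 1–4 of the listing did not survive extraction; the tags in lines 5–44 are restored from the element and attribute names given in the discussion that follows.)

 5  <!DOCTYPE letter SYSTEM "letter.dtd">
 6
 7  <letter>
 8     <contact type = "sender">
 9        <name>Jane Doe</name>
10        <address1>Box 12345</address1>
11        <address2>15 Any Ave.</address2>
12        <city>Othertown</city>
13        <state>Otherstate</state>
14        <zip>67890</zip>
15        <phone>555-4321</phone>
16        <flag gender = "F" />
17     </contact>
18
19     <contact type = "receiver">
20        <name>John Doe</name>
21        <address1>123 Main St.</address1>
22        <address2></address2>
23        <city>Anytown</city>
24        <state>Anystate</state>
25        <zip>12345</zip>
26        <phone>555-1234</phone>
27        <flag gender = "M" />
28     </contact>
29
30     <salutation>Dear Sir:</salutation>
31
32     <paragraph>It is our privilege to inform you about our new database
33        managed with XML. This new system allows you to reduce the
34        load on your inventory list server by having the client machine
35        perform the work of sorting and filtering the data.
36     </paragraph>
37
38     <paragraph>Please visit our Web site for availability
39        and pricing.
40     </paragraph>
41
42     <closing>Sincerely,</closing>
43     <signature>Ms. Jane Doe</signature>
44  </letter>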
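The comment rules described above (a comment may span several lines, with exactly one end marker per begin marker) can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree module. The document text, the title element name and the helper name parses are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first comment spans three lines and has a
# single end marker on its last line.
DOC = """<?xml version = "1.0"?>
<!-- This comment starts here,
     continues on a second line,
     and ends on a third -->
<article>
   <!-- a comment inside the root element -->
   <title>Simple XML</title>
</article>"""

# Same idea, but the comment's end marker is missing.
UNTERMINATED = "<article><!-- no end marker <title>Simple XML</title></article>"


def parses(text: str) -> bool:
    """Return True if the parser accepts `text`, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(DOC)

# With its default settings ElementTree discards comments, so the only
# child of the root is the title element.
child_tags = [child.tag for child in root]

print(child_tags)
print(parses(UNTERMINATED))
```

The multi-line comment is accepted, while the document whose comment has no end marker is rejected.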

Line 5 specifies that this XML document references a DTD. Recall from Section 19.2 that DTDs define the structure of the data for an XML document. For example, a DTD specifies the elements and parent-child relationships between elements permitted in an XML document.

Error Prevention Tip 19.1

An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure.

Portability Tip 19.2

Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD.

The DTD reference (line 5) contains three items: the name of the root element that the DTD specifies (letter); the keyword SYSTEM (which denotes an external DTD, that is, a DTD declared in a separate file, as opposed to a DTD declared locally in the same file); and the DTD's name and location (i.e., letter.dtd in the current directory). DTD document filenames typically end with the .dtd extension. We discuss DTDs and letter.dtd in detail in Section 19.5.
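To see the parts of the DTD reference programmatically, the following sketch reads them back with Python's standard xml.dom.minidom module. The empty letter root element is a stand-in for the full document, and the variable names are our own. minidom is a non-validating parser, so it records the reference without checking the document against letter.dtd.

```python
from xml.dom import minidom

# Declaration and DTD reference in the form described above; the empty
# <letter/> root is a stand-in for the full document.
DOC = (
    '<?xml version = "1.0"?>\n'
    '<!DOCTYPE letter SYSTEM "letter.dtd">\n'
    "<letter/>"
)

# minidom is non-validating: it records the DTD reference but, with its
# default settings, does not read letter.dtd.
doctype = minidom.parseString(DOC).doctype

print(doctype.name)      # root element name given in the DTD reference
print(doctype.systemId)  # the DTD's name and location
print(doctype.publicId)  # empty, because SYSTEM (not PUBLIC) was used
```

The keyword SYSTEM itself is not stored as a value. It shows up only indirectly, as a system identifier with no accompanying public identifier.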

Several tools (many of which are free) validate documents against DTDs and schemas (discussed in Section 19.5 and Section 19.6, respectively). Microsoft's XML Validator is available free of charge from the Download Sample link at

msdn.microsoft.com/archive/en-us/samples/internet/xml/xml_validator/default.asp

This validator can validate XML documents against both DTDs and schemas. To install it, run the downloaded executable file xml_validator.exe and follow the steps to complete the installation. Once the installation is successful, open the validate_js.htm file located in your XML Validator installation directory in IE to validate your XML documents. We installed the XML Validator at C:\XMLValidator (Fig. 19.5). The output (Fig. 19.6) shows the results of validating the document using Microsoft's XML Validator. Visit www.w3.org/XML/Schema for a list of additional validation tools.

Figure 19.5. Validating an XML document with Microsoft's XML Validator.

Figure 19.6. Validation result using Microsoft's XML Validator.


Root element letter (lines 7–44 of Fig. 19.4) contains the child elements contact, contact, salutation, paragraph, paragraph, closing and signature. In addition to being placed between tags, data also can be placed in attributes: name-value pairs that appear within the angle brackets of start tags. Elements can have any number of attributes (separated by spaces) in their start tags. The first contact element (lines 8–17) has an attribute named type with attribute value "sender", which indicates that this contact element identifies the letter's sender. The second contact element (lines 19–28) has attribute type with value "receiver", which indicates that this contact element identifies the letter's recipient. Like element names, attribute names are case sensitive, can be any length, may contain letters, digits, underscores, hyphens and periods, and must begin with either a letter or an underscore character.

A contact element stores various items of information about a contact, such as the contact's name (represented by element name), address (represented by elements address1, address2, city, state and zip), phone number (represented by element phone) and gender (represented by attribute gender of element flag). Element salutation (line 30) marks up the letter's salutation. Lines 32–40 mark up the letter's body using two paragraph elements. Elements closing (line 42) and signature (line 43) mark up the closing sentence and the author's "signature," respectively.
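The structure just described can be explored with a parser. The sketch below uses Python's standard xml.etree.ElementTree module on an abbreviated version of the letter. The abbreviation (only the name and flag children are kept in each contact, and the first paragraph is shortened) and the variable names are our own choices.

```python
import xml.etree.ElementTree as ET

# Abbreviated letter, built from the element and attribute names in the text.
LETTER = """<letter>
   <contact type = "sender">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "receiver">
      <name>John Doe</name>
      <flag gender = "M" />
   </contact>
   <salutation>Dear Sir:</salutation>
   <paragraph>It is our privilege to inform you about our new database
      managed with XML.</paragraph>
   <paragraph>Please visit our Web site for availability
      and pricing.</paragraph>
   <closing>Sincerely,</closing>
   <signature>Ms. Jane Doe</signature>
</letter>"""

root = ET.fromstring(LETTER)

# Child elements of the root, in document order.
children = [child.tag for child in root]

# Map each contact's type attribute to the contact's name and gender flag.
contacts = {
    contact.get("type"): (
        contact.findtext("name"),
        contact.find("flag").get("gender"),
    )
    for contact in root.findall("contact")
}

print(children)
print(contacts)
```

Note that the element content (name) and the attribute values (type, gender) are retrieved through different calls, which mirrors the distinction between data placed between tags and data placed in attributes.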

Common Programming Error 19.6

Failure to enclose attribute values in double ("") or single ('') quotes is a syntax error.
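A short check of the quoting rule, again sketched with Python's standard xml.etree.ElementTree module. The helper name parses is an assumption of this example, and the contact start tags imitate those in the letter.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts `text`, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


double_quoted = parses('<contact type = "sender"></contact>')
single_quoted = parses("<contact type = 'sender'></contact>")
unquoted = parses("<contact type = sender></contact>")

print(double_quoted, single_quoted, unquoted)
```

Both quoting styles are accepted, and the unquoted value is rejected as a syntax error.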

Line 16 introduces the empty element flag. An empty element is one that does not contain any content. Instead, an empty element sometimes contains data in attributes. Empty element flag contains an attribute that indicates the gender of the contact (represented by the parent contact element). Document authors can close an empty element either by placing a slash immediately preceding the right angle bracket, as shown in line 16, or by explicitly writing an end tag, as in line 22 (<address2></address2>).
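The two ways of closing an empty element are equivalent to a parser. The following sketch, which uses Python's standard xml.etree.ElementTree module, compares them; the helper name is_empty and the variable names are ours.

```python
import xml.etree.ElementTree as ET

slash_form = ET.fromstring("<address2 />")             # slash before the right angle bracket
end_tag_form = ET.fromstring("<address2></address2>")  # explicit end tag
flag = ET.fromstring('<flag gender = "F" />')          # empty, but carries an attribute


def is_empty(element: ET.Element) -> bool:
    """An element is empty if it has neither text nor child elements."""
    return element.text is None and len(element) == 0


# Both spellings produce the same serialized form.
same_serialization = ET.tostring(slash_form) == ET.tostring(end_tag_form)

print(is_empty(slash_form), is_empty(end_tag_form), is_empty(flag))
print(flag.get("gender"))
print(same_serialization)
```

An empty element can still carry data: the flag element has no content, yet its gender attribute is available.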

Note that the address2 element in line 22 is empty because there is no second part to this contact's address. However, we must include this element to conform to the structural rules specified in the XML document's DTD, letter.dtd (which we present in Section 19.5). This DTD specifies that each contact element must have an address2 child element (even if it is empty). In Section 19.5, you will learn how DTDs indicate that certain elements are required while others are optional.
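To make the idea of a required child element concrete, here is a hand-written check in Python. It is not DTD validation, and it does not read letter.dtd. The list of required children is an assumption taken from the elements that each contact contains in the letter (the text confirms only that address2 is required), and the function name missing_children is ours.

```python
import xml.etree.ElementTree as ET

# Assumption: every child that appears in the letter's contact elements is
# treated as required. The real rules are in letter.dtd, which is not shown here.
REQUIRED_CONTACT_CHILDREN = [
    "name", "address1", "address2", "city", "state", "zip", "phone", "flag",
]

WITH_ADDRESS2 = """<contact type = "receiver">
   <name>John Doe</name>
   <address1>123 Main St.</address1>
   <address2></address2>
   <city>Anytown</city>
   <state>Anystate</state>
   <zip>12345</zip>
   <phone>555-1234</phone>
   <flag gender = "M" />
</contact>"""

# The same contact with the empty address2 element left out.
WITHOUT_ADDRESS2 = WITH_ADDRESS2.replace("   <address2></address2>\n", "")


def missing_children(contact: ET.Element) -> list:
    """Return the required child names that `contact` does not contain."""
    present = {child.tag for child in contact}
    return [tag for tag in REQUIRED_CONTACT_CHILDREN if tag not in present]


print(missing_children(ET.fromstring(WITH_ADDRESS2)))
print(missing_children(ET.fromstring(WITHOUT_ADDRESS2)))
```

Under these assumptions, the contact with an empty address2 element passes the check, while the contact that omits the element is reported as missing address2. A validating parser performs this kind of check from the DTD itself.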
