Structuring Data
In this section and throughout this chapter, we create our own XML markup. XML allows you to describe data precisely in a well-structured format.
XML Markup for an Article
In Fig. 19.2, we present an XML document that marks up a simple article. The line numbers shown are for reference only and are not part of the XML document.
Figure 19.2. XML used to mark up an article.
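[Listing for Fig. 19.2: the element tags were lost in extraction and cannot be recovered. Surviving text content: "Simple XML" (line 6), "May 5, 2005" (line 8), "John" (line 11), "Doe" (line 12), "XML is pretty easy." and "In this chapter, we present a wide variety of examples that use XML." (line 18).]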
This document begins with an XML declaration (line 1), which identifies the document as an XML document. The version attribute specifies the XML version to which the document conforms. The current XML standard is version 1.0. Though the W3C released a version 1.1 specification in February 2004, this newer version is not yet widely supported. The W3C may continue to release new versions as XML evolves to meet the requirements of different fields.
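As a small illustration of the XML declaration described above, the following sketch uses Python's standard expat bindings to report the version given in a document's declaration. The function name read_xml_declaration and the sample document are assumptions made for this example; they are not part of the chapter's code.

```python
import xml.parsers.expat


def read_xml_declaration(document):
    """Return (version, encoding, standalone) from the XML declaration,
    or None when the document has no declaration."""
    found = []

    def on_declaration(version, encoding, standalone):
        # expat calls this handler once when it meets <?xml ... ?>
        found.append((version, encoding, standalone))

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)  # True marks the end of the input
    return found[0] if found else None


# Hypothetical sample document, not the chapter's article listing
SAMPLE = '<?xml version = "1.0"?>\n<note>hello</note>'

if __name__ == "__main__":
    declaration = read_xml_declaration(SAMPLE)
    print("version:", declaration[0])
```

A document without a declaration still parses; the function simply returns None in that case.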
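XML comments (lines 2-3), which begin with <!-- and end with -->, can be placed almost anywhere in an XML document. XML comments can span multiple lines. An end marker on each line is not needed; the end marker can appear on a subsequent line as long as there is exactly one end marker (-->) for each begin marker (<!--).

[Text between this point and the listing for Fig. 19.4 was lost in extraction.]

Figure 19.4. XML document that marks up a letter. (Lines 1-3 of the listing were lost in extraction; the tags on the remaining lines are restored from the element and attribute names given in the text.)

 1-3  (lost in extraction)
 4
 5  <!DOCTYPE letter SYSTEM "letter.dtd">
 6
 7  <letter>
 8     <contact type = "sender">
 9        <name>Jane Doe</name>
10        <address1>Box 12345</address1>
11        <address2>15 Any Ave.</address2>
12        <city>Othertown</city>
13        <state>Otherstate</state>
14        <zip>67890</zip>
15        <phone>555-4321</phone>
16        <flag gender = "F" />
17     </contact>
18
19     <contact type = "receiver">
20        <name>John Doe</name>
21        <address1>123 Main St.</address1>
22        <address2></address2>
23        <city>Anytown</city>
24        <state>Anystate</state>
25        <zip>12345</zip>
26        <phone>555-1234</phone>
27        <flag gender = "M" />
28     </contact>
29
30     <salutation>Dear Sir:</salutation>
31
32     <paragraph>It is our privilege to inform you about our new database
33        managed with XML. This new system allows you to reduce the
34        load on your inventory list server by having the client machine
35        perform the work of sorting and filtering the data.
36     </paragraph>
37
38     <paragraph>Please visit our Web site for availability
39        and pricing.
40     </paragraph>
41
42     <closing>Sincerely,</closing>
43     <signature>Ms. Jane Doe</signature>
44  </letter>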
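The comment rules stated earlier (a comment may span several lines, and each begin marker needs exactly one end marker) can be checked with a short sketch. It uses Python's standard expat bindings; the function name collect_comments and both sample documents are assumptions made for illustration only.

```python
import xml.parsers.expat


def collect_comments(document):
    """Return the text of every comment in the document, in order."""
    comments = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls the handler once per comment, even if it spans lines
    parser.CommentHandler = comments.append
    parser.Parse(document, True)
    return comments


# Hypothetical sample: one two-line comment and one comment inside an element
GOOD = """<?xml version = "1.0"?>
<!-- a comment that spans
     two lines -->
<note>
   <!-- a comment inside an element -->
   hello
</note>"""

# Hypothetical sample: the begin marker has no matching end marker
BAD = "<note><!-- never closed </note>"

if __name__ == "__main__":
    print(len(collect_comments(GOOD)), "comments found")
    try:
        collect_comments(BAD)
    except xml.parsers.expat.ExpatError as error:
        print("rejected:", error)
```

The second document is rejected because the parser reaches the end of the input while still inside the comment.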
Line 5 specifies that this XML document references a DTD. Recall from Section 19.2 that DTDs define the structure of the data for an XML document. For example, a DTD specifies the elements and parent-child relationships between elements permitted in an XML document.
The DTD reference (line 5) contains three items: the name of the root element that the DTD specifies (letter); the keyword SYSTEM (which denotes an external DTD, that is, a DTD declared in a separate file, as opposed to a DTD declared locally in the same file); and the DTD's name and location (i.e., letter.dtd in the current directory). DTD document filenames typically end with the .dtd extension. We discuss DTDs and letter.dtd in detail in Section 19.5.
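To make the parts of the DTD reference concrete, here is a minimal sketch that reads the root element name and the DTD location from a document type declaration with Python's standard expat bindings. The function name read_doctype and the shortened sample document are assumptions for this example. By default expat does not load or apply the external DTD, so this only inspects the declaration and does not validate anything.

```python
import xml.parsers.expat


def read_doctype(document):
    """Return the root name and identifiers from the DOCTYPE declaration."""
    found = {}

    def on_doctype(root_name, system_id, public_id, has_internal_subset):
        found["root"] = root_name          # e.g. letter
        found["system_id"] = system_id     # e.g. letter.dtd (SYSTEM keyword)
        found["public_id"] = public_id     # None when only SYSTEM is used

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found


# Hypothetical, shortened stand-in for the letter document
SAMPLE = (
    '<?xml version = "1.0"?>\n'
    '<!DOCTYPE letter SYSTEM "letter.dtd">\n'
    "<letter></letter>"
)

if __name__ == "__main__":
    info = read_doctype(SAMPLE)
    print(info["root"], info["system_id"])
```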
Several tools (many of which are free) validate documents against DTDs and schemas (discussed in Section 19.5 and Section 19.6, respectively). Microsoft's XML Validator is available free of charge from the Download Sample link at
msdn.microsoft.com/archive/en-us/samples/internet/xml/xml_validator/default.asp
This validator can validate XML documents against both DTDs and schemas. To install it, run the downloaded executable file xml_validator.exe and follow the steps to complete the installation. Once the installation is successful, open the validate_js.htm file located in your XML Validator installation directory in Internet Explorer to validate your XML documents. We installed the XML Validator at C:\XMLValidator (Fig. 19.5). The output (Fig. 19.6) shows the results of validating the document using Microsoft's XML Validator. Visit www.w3.org/XML/Schema for a list of additional validation tools.
Figure 19.5. Validating an XML document with Microsoft's XML Validator.
Figure 19.6. Validation result using Microsoft's XML Validator.
Root element letter (lines 7-44 of Fig. 19.4) contains the child elements contact, contact, salutation, paragraph, paragraph, closing and signature. In addition to being placed between tags, data also can be placed in attributes, which are name-value pairs that appear within the angle brackets of start tags. Elements can have any number of attributes (separated by spaces) in their start tags. The first contact element (lines 8-17) has an attribute named type with attribute value "sender", which indicates that this contact element identifies the letter's sender. The second contact element (lines 19-28) has attribute type with value "receiver", which indicates that this contact element identifies the letter's recipient. Like element names, attribute names are case sensitive, can be any length, may contain letters, digits, underscores, hyphens and periods, and must begin with either a letter or an underscore character.

A contact element stores various items of information about a contact, such as the contact's name (represented by element name), address (represented by elements address1, address2, city, state and zip), phone number (represented by element phone) and gender (represented by attribute gender of element flag). Element salutation (line 30) marks up the letter's salutation. Lines 32-40 mark up the letter's body using two paragraph elements. Elements closing (line 42) and signature (line 43) mark up the closing sentence and the author's "signature," respectively.
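The way a program reads such attributes can be sketched briefly. The example below uses Python's standard ElementTree module on a shortened letter that keeps only the contact, name and flag elements; the function name contacts_by_type and the abbreviated document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the letter: two contacts only
LETTER = """<letter>
   <contact type = "sender">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "receiver">
      <name>John Doe</name>
      <flag gender = "M" />
   </contact>
</letter>"""


def contacts_by_type(xml_text):
    """Map each contact's type attribute to its name and gender."""
    root = ET.fromstring(xml_text)
    result = {}
    for contact in root.findall("contact"):
        result[contact.get("type")] = {
            # element content is read with findtext
            "name": contact.findtext("name"),
            # attribute values are read with get
            "gender": contact.find("flag").get("gender"),
        }
    return result


if __name__ == "__main__":
    for kind, details in contacts_by_type(LETTER).items():
        print(kind, details["name"], details["gender"])
```

Attribute names are case sensitive here as well: asking for "Type" instead of "type" returns None.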
Line 16 introduces the empty element flag. An empty element is one that does not contain any content. Instead, an empty element sometimes contains data in attributes. Empty element flag contains an attribute that indicates the gender of the contact (represented by the parent contact element). Document authors can close an empty element either by placing a slash immediately preceding the right angle bracket, as shown in line 16, or by explicitly writing an end tag, as in line 22 (<address2></address2>).
Note that the address2 element in line 22 is empty because there is no second part to this contact's address. However, we must include this element to conform to the structural rules specified in the XML document's DTD, letter.dtd (which we present in Section 19.5). This DTD specifies that each contact element must have an address2 child element (even if it is empty). In Section 19.5, you will learn how DTDs indicate that certain elements are required while others are optional.
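The two ways of writing an empty element look the same to a parser. The following sketch, again using Python's standard ElementTree module, checks this on a small fragment; the helper name is_empty and the fragment itself are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment showing both empty-element forms and one non-empty element
FRAGMENT = (
    "<contact>"
    '<flag gender = "F" />'
    "<address2></address2>"
    "<city>Anytown</city>"
    "</contact>"
)


def is_empty(element):
    """True when the element has no child elements and no text content."""
    return len(element) == 0 and not element.text


contact = ET.fromstring(FRAGMENT)

if __name__ == "__main__":
    for child in contact:
        print(child.tag, "empty" if is_empty(child) else "not empty")
```

An empty element can still carry data: the flag element has no content, yet its gender attribute remains available.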
19.4 XML Namespaces