XML: A Manager's Guide (2nd Edition) (Addison-Wesley Information Technology Series)

The fundamental unit of XML content is the element, which is an author-specified chunk of information. An element consists of an element name and element content. XML is case sensitive, so you must pay attention to case when assigning element names and creating element content. Start and end tags denote the boundaries of the element and contain the element name. The element content may consist of character data or other elements. An element may also be empty of content. Consider Example 2-3, an annotated version of our business card document, to see examples of these content types.

Example 2-3

    <BusinessCard>                              document element
      <Name>                                    element content
        <GivenName>Kevin</GivenName>            data content
        <MiddleName>Stewart</MiddleName>        data content
        <FamilyName>Dick</FamilyName>           data content
      </Name>
      <Title>
        Software Technology Analyst             data content
      </Title>
      <Author/>                                 empty content
      <ContactMethods>                          element content
        <Phone>650-555-5000</Phone>             data content
        <Phone>650-555-5001</Phone>             data content
      </ContactMethods>
    </BusinessCard>

In Example 2-3, "BusinessCard" is the top-level element. In XML, there can be only one element at the top level. This element is called the document element or, sometimes, the root element. Think of this element as the trunk of the tree from which all other elements branch. Figure 2-2 shows the corresponding tree for Example 2-3, with each node representing an element and identified by the element name. Conceptually, the element content resides within the node.

Figure 2-2. Business Card Element Tree
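The tree view can be reproduced with a few lines of code. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name tree_lines is made up for this illustration and is not part of any library. It parses the business card from Example 2-3 and lists each element name, indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

# The business card from Example 2-3, without the annotations.
CARD = """<BusinessCard>
  <Name>
    <GivenName>Kevin</GivenName>
    <MiddleName>Stewart</MiddleName>
    <FamilyName>Dick</FamilyName>
  </Name>
  <Title>Software Technology Analyst</Title>
  <Author/>
  <ContactMethods>
    <Phone>650-555-5000</Phone>
    <Phone>650-555-5001</Phone>
  </ContactMethods>
</BusinessCard>"""


def tree_lines(element, depth=0):
    """Hypothetical helper: one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(CARD)  # root is the document element
outline = tree_lines(root)
print("\n".join(outline))
```

The first line printed is the document element, and every other element appears beneath it, which mirrors the trunk-and-branches picture described above.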

The annotations in Example 2-3 indicate the content model for each element. There are four allowable content models for elements.

  1. Data content. These elements contain only data. Note that in the 1.0 version of the XML specification, there is no way to enforce typing restrictions on data content such as integer, floating point, and date. Chapter 3 discusses a related standard, XML Schema, which addresses this potentially serious shortcoming.

  2. Element content. These elements contain only other elements.

  3. Empty. These elements contain neither elements nor data.

  4. Mixed content. These elements contain both data and other elements. None of the elements in Example 2-3 uses this content model because many XML experts feel that using elements with mixed content is poor design practice for data-oriented formats.
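To make the four content models concrete, here is a rough sketch that classifies a parsed element, again assuming Python's xml.etree.ElementTree. The function name content_model and the Note element are invented for this illustration. The sketch also assumes that whitespace-only text between tags is just formatting and does not count as data, which is a simplification rather than a rule of XML.

```python
import xml.etree.ElementTree as ET


def content_model(element):
    """Hypothetical helper: name the content model of one element."""
    has_children = len(element) > 0
    # Character data may sit before the first child (element.text) or
    # after any child (child.tail). Whitespace-only runs are ignored
    # here by assumption.
    chunks = [element.text] + [child.tail for child in element]
    has_data = any(chunk and chunk.strip() for chunk in chunks)
    if has_children and has_data:
        return "mixed"
    if has_children:
        return "element"
    if has_data:
        return "data"
    return "empty"


# The first three samples come from Example 2-3; <Note> is made up
# to show mixed content.
samples = [
    "<Phone>650-555-5000</Phone>",
    "<Name><GivenName>Kevin</GivenName></Name>",
    "<Author/>",
    "<Note>Call <Phone>650-555-5000</Phone> after noon</Note>",
]
results = [content_model(ET.fromstring(text)) for text in samples]
for text, model in zip(samples, results):
    print(model, "<-", text)
```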

Notice that, except for empty elements, all elements in Example 2-3 have a start tag and an end tag. The start tag is bounded by angle brackets, for example, <ElementName>. The end tag is bounded by angle brackets and has a leading slash, as in </ElementName>. All content, whether data or element, must occur between the start and end tags. An empty element may use this syntax by simply providing no content between the start and end tags, for example, <ElementName></ElementName>. An empty element may also use an abbreviated syntax, with a single tag bounded by angle brackets with a trailing slash, for example, <ElementName/>.
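A quick way to see that the two empty-element syntaxes mean the same thing is to parse both and compare the results. This is a small sketch assuming Python's xml.etree.ElementTree; other parsers may expose the result differently.

```python
import xml.etree.ElementTree as ET

# The same empty element written with both syntaxes.
long_form = ET.fromstring("<ElementName></ElementName>")
short_form = ET.fromstring("<ElementName/>")

for element in (long_form, short_form):
    # Show the name, the number of child elements, and any character data.
    print(element.tag, len(element), repr(element.text))
```

Both forms produce an element with the same name, no child elements, and no character data, so an application reading the document cannot tell which syntax the author used.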

A document that obeys all the XML syntax rules is well formed. There are several technical criteria for well-formedness, but the primary ones are the following.

  • There is one document element.

  • All nonempty elements have start tags and end tags that match exactly.

  • All empty elements have the correct empty tag syntax.

  • Elements are strictly nested; there are no overlapping elements.
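These criteria are easy to try out, because a conforming parser must reject a document that breaks any of them. The sketch below assumes Python's xml.etree.ElementTree, which raises ParseError on input that is not well formed. The helper name is_well_formed and the sample strings are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One passing document, then one violation per criterion in the list.
cases = {
    "single document element": "<Card><Name>Kevin</Name></Card>",
    "two top-level elements": "<Card/><Card/>",
    "end tag differs in case": "<Name>Kevin</name>",
    "empty element never closed": "<Card><Author></Card>",
    "overlapping elements": "<Name><GivenName>Kevin</Name></GivenName>",
}
verdicts = {label: is_well_formed(text) for label, text in cases.items()}
for label, ok in verdicts.items():
    print("well formed" if ok else "rejected   ", "-", label)
```

Only the first case is accepted. The third case fails solely because XML is case sensitive, so </name> does not match <Name>.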

An XML processor can process a well-formed XML document unambiguously, building a tree data structure in which each node is an element. A node may contain data content, references to its subelements, both, or neither. You could use such documents to represent many different kinds of content. Example 2-4 shows a document that represents the schema for a simple contact database. Figure 2-3 shows the corresponding tree.

Example 2-4

    <Database>
      <Table>
        <Column>Name</Column>
        <Column>Phone Number</Column>
      </Table>
      <Table>
        <Column>Date</Column>
        <Column>Person</Column>
      </Table>
    </Database>

Figure 2-3. Contact Database Element Tree

Although the document in Example 2-4 captures the basic structure of the contact database, there is not enough information for a software application to process the document, establish a connection to the database in question, and perform queries. The element names "Database," "Table," and "Column" are insufficiently descriptive. Clearly, you need a richer syntax for describing the metadata associated with an element.
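To see how little an application can recover from Example 2-4, consider the following sketch, which assumes Python's xml.etree.ElementTree and uses variable names chosen only for this illustration. It walks the document and collects the column labels for each table.

```python
import xml.etree.ElementTree as ET

# The contact database schema from Example 2-4.
SCHEMA = """<Database>
  <Table>
    <Column>Name</Column>
    <Column>Phone Number</Column>
  </Table>
  <Table>
    <Column>Date</Column>
    <Column>Person</Column>
  </Table>
</Database>"""

root = ET.fromstring(SCHEMA)

# One list of column labels per <Table>, in document order.
tables = [
    [column.text for column in table.findall("Column")]
    for table in root.findall("Table")
]
print(tables)
```

The result is two lists of column labels and nothing more. Nothing in the document names the tables or the database, so the code can tell the tables apart only by their position.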
