XML and SQL Server 2000

To understand the operation of XSLT stylesheets on XML data, we are going to have to learn to think a little bit differently than we have up to this point.

XSLT defines itself as a series of operations on a tree, which is a representation of an XML document. This tree is an intangible entity that has no defined application programming interface (API); it only describes the objects in the tree, their relationships, and their associated properties. Let's look at a more detailed portion of our XML process diagram and see what this new concept does for us.

Figure 2.2. Modifying the tree via XSLT.

So the tree representation of the data is combined in the transform process with the tree of the stylesheet to produce a new tree structure. The new tree can then be output in any of three formats: HTML, text, or XML. This isn't to say that other output formats cannot be generated, but these are the three outlined in the specification.

Now that we have the process diagrammed via the tree concept, let's look at an XML document itself in a tree-style format. Conceptually, there's nothing magical about thinking of an XML document as a tree. Listing 2.3 is a portion of our sample XML document.

Listing 2.3 A Sample XML Document

<PERSON PERSONID="p2">
  <NAME>
    <LAST>Tenney</LAST>
    <FIRST>Corey</FIRST>
  </NAME>
  <ADDRESS>
    <STREET>211 Home Improvement Circle</STREET>
    <CITY>Roy, UT</CITY>
    <COUNTRY>USA</COUNTRY>
    <ZIP>64067</ZIP>
  </ADDRESS>
  <TEL/>
  <EMAIL>tenney@yardwork.com</EMAIL>
</PERSON>

Now let's look at this document as a tree representation.
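To make the tree view concrete, here is a minimal sketch that parses Listing 2.3 and prints one indented line per node. It uses Python's standard xml.dom.minidom module, which is my choice for illustration and not something the chapter depends on. The helper name outline and the "ROOT" label for the document node are assumptions made for this example, and the listing is written without whitespace between tags so that no whitespace-only text nodes show up.

```python
from xml.dom import minidom

# Listing 2.3, written without whitespace between tags so that no
# whitespace-only text nodes clutter the outline.
XML = (
    '<PERSON PERSONID="p2">'
    '<NAME><LAST>Tenney</LAST><FIRST>Corey</FIRST></NAME>'
    '<ADDRESS>'
    '<STREET>211 Home Improvement Circle</STREET>'
    '<CITY>Roy, UT</CITY>'
    '<COUNTRY>USA</COUNTRY>'
    '<ZIP>64067</ZIP>'
    '</ADDRESS>'
    '<TEL/>'
    '<EMAIL>tenney@yardwork.com</EMAIL>'
    '</PERSON>'
)


def outline(node, depth=0):
    """Return one indented line per node, in document order (hypothetical helper)."""
    if node.nodeType == node.DOCUMENT_NODE:
        label = "ROOT"  # label chosen for this sketch
    elif node.nodeType == node.ELEMENT_NODE:
        attrs = "".join(
            f' @{name}="{value}"' for name, value in node.attributes.items()
        )
        label = node.tagName + attrs
    elif node.nodeType == node.TEXT_NODE:
        label = f'"{node.data}"'
    else:
        label = node.nodeName
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(minidom.parseString(XML))
print("\n".join(tree_lines))
```

Each level of indentation is one step down the tree, so PERSON sits directly under ROOT, and the text of each element appears as a child of that element.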

The ROOT node can be thought of as a representation of the document as a whole. It is the starting point for document processors.

The concept of a root node comes from the W3C XPath specification, which became a Recommendation on November 16, 1999. The specification is available at http://www.w3.org/TR/1999/REC-xpath-19991116. The primary purpose of XPath is to address parts of an XML document. To help accomplish this, it also provides basic methods for manipulation of strings, numbers, and Boolean values. It is important to know that XSLT relies very heavily on XPath.

Nodes

The entire concept of a document tree comes from XPath's model of an XML document as a tree of nodes. What is a node? In its simplest form, a node is defined as an object in a document tree. Another new concept is the context node. This is defined as the node at which you are currently located in the tree during processing. Every time a process is carried out and you move to another node to process it, the context node changes to follow you. Processes are carried out, and relationships between nodes are stated, in relation to the context node. If this is a little difficult to understand, I'll be showing an example in the next section, "Location Paths."
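As a quick illustration of the context node idea, the following sketch evaluates the same relative path from three different starting nodes. It assumes Python's standard xml.etree.ElementTree module, which supports only a small subset of XPath, and the helper name child_names is made up for this example. The data is the PERSON fragment that appears as Listing 2.3.

```python
import xml.etree.ElementTree as ET

# The PERSON fragment from Listing 2.3.
XML = (
    '<PERSON PERSONID="p2">'
    '<NAME><LAST>Tenney</LAST><FIRST>Corey</FIRST></NAME>'
    '<ADDRESS>'
    '<STREET>211 Home Improvement Circle</STREET>'
    '<CITY>Roy, UT</CITY>'
    '<COUNTRY>USA</COUNTRY>'
    '<ZIP>64067</ZIP>'
    '</ADDRESS>'
    '<TEL/>'
    '<EMAIL>tenney@yardwork.com</EMAIL>'
    '</PERSON>'
)


def child_names(context):
    """Evaluate the relative path '*' (all child elements) from a context node."""
    return [child.tag for child in context.findall("*")]


person = ET.fromstring(XML)
address = person.find("ADDRESS")
name = person.find("NAME")

# The path never changes; only the context node does.
print(child_names(person))
print(child_names(address))
print(child_names(name))
```

The path "*" is identical in all three calls. What it selects depends entirely on which node is the context node at the time.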

Table 2.2 shows the seven types of nodes contained in a document tree.

Table 2.2. XML Document Node Types

| Node Type | Description |
| --- | --- |
| Root node | There is one and only one root node for every XML document. An important point here is that the root node is not the same thing as the document element, which is the single element that encloses all other elements in a well-formed XML document. An illustration of a document element is the <RESUMES> element in the well-formed sample XML document of Chapter 1, "Database XML." |
| Element node | There is an element node for every element in the document. It is delineated by a starting and ending tag or, in the case of an empty element, just an empty element tag such as <TEL/>. An element node may have a unique identifier (ID). This is the value of the attribute that is declared in the accompanying DTD as type ID. No two elements in a document can have the same unique ID. If an XML processor finds two elements in the same document with the same unique ID (meaning the document is invalid), the second element is treated as not having a unique ID. |
| Attribute node | Each element node can have an associated set of attribute nodes, which are declared within the opening tag of a tag set or within an empty tag. Namespace declarations, although they appear to be attributes of the <STYLESHEET> tag by virtue of the xmlns designation, are not considered to be attribute nodes. They have their own node type, which is discussed later. |
| Text node | Character data is grouped into text nodes. Each text node holds as much contiguous character data as possible, so a text node never has an immediately following or preceding sibling that is also a text node. Characters inside a CDATA section of the document are treated as ordinary character data. A text node always has at least one character of data. |
| Comment node | There is a comment node for every comment in a document. These comments are delimited, just as you would expect, by <!-- and -->. This definition does not apply to comments that occur within the document type declaration. |
| Processing instruction node | There is a processing instruction node for every processing instruction delimited by the usual <?...?>. The definition does not apply to any processing instruction that occurs within the document type declaration. Also, although the XML declaration <?xml version="1.0"?> looks like a processing instruction, it is not considered to be one and never has a node in a document tree. |
| Namespace node | Each element has an associated set of namespace nodes, one for each distinct namespace prefix that is in scope for the element (including the xml prefix, which is implicitly declared by the XML Namespaces Recommendation) and one for the default namespace if one is in scope for the element. |
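To see several of these node types in a real parse, here is a small sketch that walks a document and records the type of every node it meets. It assumes Python's standard xml.dom.minidom module. The DOM model that minidom implements is close to, but not identical to, the XPath model described in Table 2.2: in particular, it has no namespace nodes. The stylesheet name resume.xsl, the comment text, and the helper name node_types are invented for this example.

```python
from xml.dom import Node, minidom

# A cut-down PERSON document with a processing instruction and a comment.
# "resume.xsl" is a hypothetical stylesheet name used only for illustration.
XML = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="resume.xsl"?>'
    '<!-- one person from the sample document -->'
    '<PERSON PERSONID="p2"><TEL/><EMAIL>tenney@yardwork.com</EMAIL></PERSON>'
)

# Map DOM node type constants to the names used in Table 2.2.
TYPE_NAMES = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def node_types(node, found=None):
    """Collect (type, name) pairs for a node, its attributes, and its descendants."""
    found = [] if found is None else found
    found.append((TYPE_NAMES[node.nodeType], node.nodeName))
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off their element; they are not in childNodes.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            found.append((TYPE_NAMES[attr.nodeType], attr.name))
    for child in node.childNodes:
        node_types(child, found)
    return found


found = node_types(minidom.parseString(XML))
for type_name, name in found:
    print(f"{type_name:24}{name}")
```

Notice that the XML declaration on the first line produces no entry at all, while the xml-stylesheet processing instruction does, which matches the distinction drawn in the table.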

Armed with these definitions, now I think we can revisit Figure 2.3 and make a lot more sense out of it.

Figure 2.3. An XML document tree.

Location Paths

There is one more set of definitions to learn before we dive into the actual structure of stylesheets. XPath introduces, along with other concepts that will be covered at various points in this chapter, the model of self, children, descendants, siblings, and ancestors. XPath uses these terms to describe location paths, which are, put succinctly, a means of navigating around an XML tree. Table 2.3 defines the location path entities.

Table 2.3. XPath Location Path Entities

| Entity Name | Direction | Description |
| --- | --- | --- |
| self | n/a | Identifies the context node. |
| child | Forward | Identifies the children of the context node. |
| parent | Reverse | Identifies the context node's parent if one exists. |
| descendant | Forward | Identifies the descendants of the context node, which are a child, a child of a child, and so on. Attribute nodes and namespace nodes are never considered descendants. |
| ancestor | Reverse | Identifies the ancestors of the context node, which are a parent, a parent of a parent, and so on. The root node is always an ancestor unless it is the context node itself. |
| sibling | n/a | Identifies all nodes that share the context node's parent, in document order. |
| preceding-sibling | Reverse | Identifies all siblings in the same document as the context node that precede the context node in document order. This does not include ancestors, attribute nodes, or namespace nodes. |
| following-sibling | Forward | Identifies all siblings in the same document as the context node that follow the context node in document order. This does not include descendants, attribute nodes, or namespace nodes. |

Remember the context node concept that I brought up in the earlier section "Nodes"? Well, these new terms like "children" and so on are defined in terms of a context node. Examples and syntax for these location paths will be discussed in the section "Patterns (Abbreviated Syntax)" later in this chapter. For now, though, I want you to see the interrelationships of these entities, and we'll do this utilizing our XML document tree in Figure 2.3 in concert with our sample document in Listing 2.3. Let's look at Table 2.4 and the element relationships expressed there, using keywords from XPath.

Table 2.4. XPath Location Path Entity Relationships

| Entity | Nodes |
| --- | --- |
| self | Address (This is the context node.) |
| child | Street, city, country, zip |
| parent | Person |
| descendant | Street, city, country, zip, 211 Home Improvement Circle, Roy, UT, USA, 64067 |
| ancestor | Person, root node |
| sibling | Name, tel, email |
| preceding | Name, last, first, Tenney, Corey |
| preceding-sibling | Name |
| following | Tel, email, tenney@yardwork.com |
| following-sibling | Tel, email |
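The relationships in Table 2.4 can be checked mechanically. The sketch below is one way to do that, assuming Python's standard xml.dom.minidom module and the PERSON fragment from Listing 2.3 with ADDRESS as the context node. The helper names label, descendants, and ancestors are made up for this example. Here "preceding" and "following" are taken in XPath's sense: every node before (or after) the context node in document order, leaving out its ancestors (or descendants).

```python
from xml.dom import minidom

# Listing 2.3 without whitespace between tags, so only real text nodes appear.
XML = (
    '<PERSON PERSONID="p2">'
    '<NAME><LAST>Tenney</LAST><FIRST>Corey</FIRST></NAME>'
    '<ADDRESS>'
    '<STREET>211 Home Improvement Circle</STREET>'
    '<CITY>Roy, UT</CITY>'
    '<COUNTRY>USA</COUNTRY>'
    '<ZIP>64067</ZIP>'
    '</ADDRESS>'
    '<TEL/>'
    '<EMAIL>tenney@yardwork.com</EMAIL>'
    '</PERSON>'
)


def label(node):
    """Short display name: ROOT, an element name, or a text node's characters."""
    if node.nodeType == node.DOCUMENT_NODE:
        return "ROOT"
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return node.nodeName


def descendants(node):
    """All nodes below `node`, in document order (attributes are not included)."""
    found = []
    for child in node.childNodes:
        found.append(child)
        found.extend(descendants(child))
    return found


def ancestors(node):
    """Parent, parent of parent, and so on up to the root."""
    found = []
    while node.parentNode is not None:
        node = node.parentNode
        found.append(node)
    return found


doc = minidom.parseString(XML)
context = doc.getElementsByTagName("ADDRESS")[0]

everything = [doc] + descendants(doc)  # every node, in document order
position = everything.index(context)
above = ancestors(context)
below = descendants(context)
siblings = list(context.parentNode.childNodes)
here = siblings.index(context)

axes = {
    "self": [context],
    "child": list(context.childNodes),
    "parent": [context.parentNode],
    "descendant": below,
    "ancestor": above,
    "preceding": [n for n in everything[:position] if n not in above],
    "preceding-sibling": siblings[:here],
    "following": [n for n in everything[position + 1:] if n not in below],
    "following-sibling": siblings[here + 1:],
}
results = {axis: [label(n) for n in nodes] for axis, nodes in axes.items()}
for axis, names in results.items():
    print(f"{axis:18}{names}")
```

Two details are worth noticing in the output. The lists come back in strict document order, so each element is followed immediately by its own text. The ancestor list also ends with the root node, in keeping with the definition in Table 2.3 that the root node is always an ancestor unless it is the context node itself.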

Did the entities "preceding" and "following," along with their sibling counterparts, throw you?

Remember the definitions in Table 2.3. They are defined in terms of document order. If the tree diagram is not drawn exactly according to the document order, it can lead you to incorrect results. Be sure the diagram is correct before you use it to navigate a document.
