Professional XML (Programmer to Programmer)

What follows is a short introduction to some of the most important concepts and features of XPath. Here you will learn about node types, how to navigate the tree structure of an XML document with path expressions, and how to use predicates, axes, and sequences.

Nodes

XPath looks at an XML document as a tree of nodes. Let's see what those nodes are through the following example:

<catalog>
  <product id="mug">
    <price>5.95</price>
    <description>Custom printed stainless steel coffee mug</description>
  </product>
  <product id="table">
    <price>119.95</price>
    <description>Natural maple bedside table</description>
  </product>
</catalog>

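From the perspective of XPath, everything in this document is a node. There are seven types of nodes in XPath. The following four are used most frequently: element nodes, attribute nodes, text nodes, and the document node.

The other three types of nodes that you might encounter occasionally in XPath are as follows: comment nodes, processing instruction nodes, and namespace nodes.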

Note 

Do not confuse elements with tags. Tags refer to the lexical structure of XML, where <product> and </product> are opening and closing tags. An element is everything from the opening tag to the closing tag. For a product, that includes its id attribute and its price and description child elements.

In XPath, you always talk about elements, not tags. If you write an expression that points to the first product element, it returns the whole element, including its attributes and anything else between the opening and closing tags in the textual representation of XML.
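To make the node kinds concrete, here is a minimal Python sketch that walks the example catalog with the standard library's xml.dom.minidom and counts the nodes of each kind. Several things here are assumptions: minidom implements the DOM rather than the XPath data model (the two agree closely for this small document), the id values mug and table are taken from the attribute values mentioned later in the text, the XML is written without whitespace between tags so that no whitespace-only text nodes appear, and count_nodes is a hypothetical helper name.

```python
from xml.dom import minidom, Node

# The example catalog, written without whitespace between tags so that the
# only text nodes are the price and description values.
CATALOG = (
    '<catalog>'
    '<product id="mug"><price>5.95</price>'
    '<description>Custom printed stainless steel coffee mug</description></product>'
    '<product id="table"><price>119.95</price>'
    '<description>Natural maple bedside table</description></product>'
    '</catalog>'
)

# Map DOM node type constants to the node kind names used in the text.
KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
}


def count_nodes(node, counts):
    """Recursively count nodes by kind (hypothetical helper)."""
    kind = KIND.get(node.nodeType, "other")
    counts[kind] = counts.get(kind, 0) + 1
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off the element; they are not in childNodes.
        counts["attribute"] = counts.get("attribute", 0) + len(node.attributes)
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


counts = count_nodes(minidom.parseString(CATALOG), {})
print(counts)
```

For this document the sketch reports one document node, seven element nodes, two attribute nodes, and four text nodes.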

Tree Structure

Nodes are organized in a tree structure as follows:

Path Expressions

The tree structure of an XML document is not unlike the structure of a file system. Instead of the elements and attributes used in XML, the file system has directories and files. On UNIX or Windows, you use a particular syntax, called a path, to point to a directory or file. The path to a file looks like C:\windows\system32\drivers\etc\hosts on Windows or /etc/hosts on UNIX. In both cases, you specify directory and file names starting from the root and separate them with a forward slash or a backslash. For example, A/B or A\B refers to the child B of A.

The same is true in XPath. So the /catalog specifications in the previous document example signify the following:

Just as with path expressions on UNIX or Windows, you can use .. (two dots) to refer to the parent node. For example:

Note 

In the first expression, /catalog/product returns two product elements. So you might wonder whether /catalog/product/.. returns the parent of each of these two elements and, because that parent is the same, whether the expression returns the catalog element twice. This doesn't happen, because a path expression never returns duplicate nodes. So /catalog/product/.. returns just one node: the catalog element.

If you prefix a name with @ (the "at" symbol), it points to an attribute with that name. For instance, the following expression returns the two id attributes “mug” and “table”:

/catalog/product/@id
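The path expressions discussed in this section can be tried out with a short Python sketch. It relies on xml.etree.ElementTree from the standard library, which supports only a limited subset of XPath: paths are evaluated relative to an element rather than from the document root, and a path cannot end in an attribute step. The sketch therefore starts from the catalog element and reads the id values with get(). The id values mug and table are assumed from the text above.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <product id="mug">
    <price>5.95</price>
    <description>Custom printed stainless steel coffee mug</description>
  </product>
  <product id="table">
    <price>119.95</price>
    <description>Natural maple bedside table</description>
  </product>
</catalog>
"""

# fromstring() returns the root element, i.e. the node that /catalog selects.
catalog = ET.fromstring(CATALOG)

# Counterpart of /catalog/product: the product children of catalog.
products = catalog.findall("product")

# Counterpart of /catalog/product/..: the parent of each product,
# returned once because duplicates are removed.
parents = catalog.findall("product/..")

# Counterpart of /catalog/product/@id: ElementTree cannot select attributes
# in a path, so the values are read from each element instead.
ids = [product.get("id") for product in products]

print(len(products), [element.tag for element in parents], ids)
```

A full XPath engine would accept the absolute expressions exactly as they are written in the text; the relative forms here are only a workaround for the ElementTree subset.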

Predicates

What if you don't want to get all the products from the catalog, but only those with a price lower than 10 dollars? You can filter the nodes returned by a path expression by adding a condition between square brackets. So to return only the products with a price lower than 10 dollars, you would write this:

/catalog/product[price < 10]
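As a rough illustration of what such a predicate does, the following Python sketch keeps only the products whose price is lower than 10. ElementTree's XPath subset has no numeric comparisons, so the condition is written as an ordinary Python filter. This is an emulation of the predicate under that assumption, not the way an XPath engine evaluates it. The id values are again assumed from the text.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <product id="mug">
    <price>5.95</price>
    <description>Custom printed stainless steel coffee mug</description>
  </product>
  <product id="table">
    <price>119.95</price>
    <description>Natural maple bedside table</description>
  </product>
</catalog>
"""

catalog = ET.fromstring(CATALOG)

# Select every product, then apply the condition "price lower than 10"
# to each one, as the predicate in square brackets would.
cheap = [
    product
    for product in catalog.findall("product")
    if float(product.findtext("price")) < 10
]

print([product.get("id") for product in cheap])
```

Only the coffee mug passes the filter, because 5.95 is lower than 10 and 119.95 is not.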

There are two types of predicates:

Boolean expressions in predicates

Many developers don't know exactly how the boolean() function works, so the best solution is to always write expressions that either return a numeric value or a Boolean value. For example:

Axes

XPath expressions navigate through a tree. An axis is the direction in which this navigation happens. Let's see what this means for the expression /catalog/product that you have seen before:

The / operator is used here to select child elements. But you can also use it to navigate other axes, as they are called in XPath. For example, this expression selects the product elements that follow the first product:

/catalog/product[1]/following-sibling::product

In the case of the document you saw earlier, this returns the second product. You select the following-sibling axis by prefixing the last occurrence of product with following-sibling::. When no axis is specified in front of an element name, the child axis is implied. So you could rewrite the /catalog/product expression as follows:

/child::catalog/child::product
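To see what the following-sibling axis selects, the navigation can be emulated in Python. ElementTree does not support axis names such as following-sibling:: or child::, so this sketch finds the first product and then looks at the children that come after it under the same parent. Treat it as an illustration of the direction of travel under those assumptions, not as an XPath implementation.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <product id="mug">
    <price>5.95</price>
    <description>Custom printed stainless steel coffee mug</description>
  </product>
  <product id="table">
    <price>119.95</price>
    <description>Natural maple bedside table</description>
  </product>
</catalog>
"""

catalog = ET.fromstring(CATALOG)

# Counterpart of /catalog/product[1]: the first product child of catalog.
first = catalog.find("product[1]")

# Emulate following-sibling::product: the product elements that share the
# same parent and appear after the first product in document order.
siblings = list(catalog)
position = siblings.index(first)
following = [node for node in siblings[position + 1:] if node.tag == "product"]

print([node.get("id") for node in following])
```

With the two-product catalog, the only following sibling of the first product is the bedside table.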

There are 13 axes available in XPath. The eight axes that are most frequently used are these:

The remaining five axes are these:

In the previous example, the child axis was written out explicitly: /child::catalog is just a long version of /catalog. Similarly, /catalog/product/attribute::id is a long version of /catalog/product/@id. It is interesting to note that child and attribute are defined as two distinct axes. One consequence is that an attribute is not a child of the element on which it is defined: the @id attribute is not a child of the product element, even though the product element is the parent of the @id attribute.
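The separation between children and attributes is also visible in ElementTree's object model, which makes for a convenient illustration: iterating over an element yields its child elements only, while attributes live in a separate mapping. The sketch below assumes the id value mug from the text. Note that ElementTree attributes are plain dictionary entries and carry no link back to their element, so the parent relationship described above is not modeled here.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <product id="mug">
    <price>5.95</price>
    <description>Custom printed stainless steel coffee mug</description>
  </product>
</catalog>
"""

catalog = ET.fromstring(CATALOG)
product = catalog.find("product")

# Iterating over an element visits its child elements: the id attribute
# does not show up among them.
child_tags = [child.tag for child in product]

# Attributes are reached separately, much like switching to the
# attribute axis in XPath.
attributes = dict(product.attrib)

print(child_tags, attributes)
```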

Sequences

You have seen expressions that return more than one element, like /catalog/product. They are said to return a sequence. Sequences in XPath are similar to lists in other languages: they can contain items of different types, they can contain duplicates, and items in the sequence are ordered. However, a sequence cannot contain other sequences; sequences cannot be nested.

Path expressions can return sequences, but you can also build your own sequences using the comma (,) operator. For example, the following expression returns a sequence with the two numbers 42 and 43:

(42, 43)
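The properties of sequences listed above can be modeled with a few lines of Python. This is only a toy model: seq is a hypothetical helper, not part of any XPath library, and it uses a Python list to stand in for a sequence. It mimics the comma operator by splicing any sequence argument into the result, which is why a nested construction such as ((1, 2), 3) ends up as the flat sequence (1, 2, 3).

```python
def seq(*items):
    """Toy model of the XPath comma operator (hypothetical helper).

    Items are kept in order and duplicates are preserved. Any argument that
    is itself a sequence (a list here) is spliced in, so the result is
    always flat.
    """
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)  # splice in the inner sequence's items
        else:
            flat.append(item)
    return flat


numbers = seq(42, 43)              # models (42, 43)
mixed = seq("mug", 5.95, 42, 42)   # mixed item types, duplicates kept
nested = seq(seq(1, 2), 3, seq())  # models ((1, 2), 3, ())

print(numbers, mixed, nested)
```

Because seq always returns a flat list, splicing one level is enough to keep every result flat, including the empty sequence, which simply contributes no items.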
