Core C# and .NET
A significant benefit of representing XML in a tree model as opposed to a data stream is the capability to query and locate the tree's content using XML Path Language (XPath). This technique is similar to using a SQL command on relational data. An XPath expression (query) is created and passed to an engine that evaluates it. The expression is parsed and executed against a data store. The returned value(s) may be a set of nodes or a scalar value. XPath is a formal query language defined by the XML Path Language 1.0 specification (www.w3.org/TR/xpath). Syntactically, its most commonly used expressions resemble a file system path and may represent either the absolute or relative position of nodes in the tree.

In the .NET Framework, XPath evaluation is exposed through the XPathNavigator abstract class. The navigator is an XPath processor that works on top of any XML data source that exposes the IXPathNavigable interface. The most important member of this interface is the CreateNavigator method, which returns an XPathNavigator object. Figure 10-4 shows three classes that implement this interface. Of these, XmlDocument and XmlDataDocument are members of the System.Xml namespace; XPathDocument (as well as the XPathNavigator class) resides in the System.Xml.XPath namespace.
Figure 10-4. XML classes that support XPath navigation
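The following is a minimal sketch of the two kinds of results described above: a node set returned by Select and a scalar returned by Evaluate. The class name NavigatorSketch, its helper methods, and the inline sample XML are illustrative assumptions rather than code from the text; the sample simply mirrors the directors/movies data used later in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSketch
{
    // Hypothetical inline sample; element names follow the chapter's data.
    public const string SampleXml =
        "<films><directors><director_id>54</director_id>" +
        "<first_name>Martin</first_name><last_name>Scorsese</last_name>" +
        "<movies><movie_Title>Taxi Driver</movie_Title>" +
        "<movie_Year>1976</movie_Year></movies>" +
        "<movies><movie_Title>Raging Bull</movie_Title>" +
        "<movie_Year>1980</movie_Year></movies>" +
        "</directors></films>";

    public static XmlDocument LoadSample()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    // Works with any IXPathNavigable source; returns the value of each selected node.
    public static List<string> SelectValues(IXPathNavigable source, string xpath)
    {
        XPathNavigator nav = source.CreateNavigator();
        var values = new List<string>();
        XPathNodeIterator iterator = nav.Select(xpath);
        while (iterator.MoveNext())
            values.Add(iterator.Current.Value);
        return values;
    }

    // Evaluate returns a scalar; an XPath number comes back boxed as a double.
    public static double CountNodes(IXPathNavigable source, string xpath)
    {
        XPathNavigator nav = source.CreateNavigator();
        return (double)nav.Evaluate("count(" + xpath + ")");
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();
        foreach (string title in SelectValues(doc, "/films/directors/movies/movie_Title"))
            Console.WriteLine(title);               // Taxi Driver, then Raging Bull
        Console.WriteLine(CountNodes(doc, "//movies")); // 2
    }
}
```

Because both helpers accept IXPathNavigable, the same calls should work unchanged if an XPathDocument is passed in place of the XmlDocument.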
Constructing XPath Queries
Queries can be executed against each of these classes using an XPathNavigator object. XmlDocument and XmlDataDocument also offer the SelectNodes method, which they inherit from XmlNode. Generic code looks like this:

    // XPATHEXPRESSION is the XPath query applied to the data
    // (1) Return a list of nodes
    XmlDocument doc = new XmlDocument();
    doc.Load("movies.xml");
    XmlNodeList selection = doc.SelectNodes(XPATHEXPRESSION);

    // (2) Create a navigator and execute the query
    XPathNavigator nav = doc.CreateNavigator();
    XPathNodeIterator iterator = nav.Select(XPATHEXPRESSION);
The XPathNodeIterator class encapsulates a list of nodes and provides a way to iterate over the list. As with regular expressions (refer to Chapter 5, "C# Text Manipulation and File I/O"), an XPath query has its own syntax and operators that must be mastered in order to query an XML document effectively. To demonstrate some of the fundamental XPath operators, we'll create queries against the data in Listing 10-10.

Listing 10-10. XML Representation of Directors/Movies Relationship
<films>
  <directors>
    <director_id>54</director_id>
    <first_name>Martin</first_name>
    <last_name>Scorsese</last_name>
    <movies>
      <movie_ID>30</movie_ID>
      <movie_Title>Taxi Driver</movie_Title>
      <movie_DirectorID>54</movie_DirectorID>
      <movie_Year>1976</movie_Year>
    </movies>
    <movies>
      <movie_ID>28</movie_ID>
      <movie_Title>Raging Bull</movie_Title>
      <movie_DirectorID>54</movie_DirectorID>
      <movie_Year>1980</movie_Year>
    </movies>
  </directors>
</films>
Table 10-3 summarizes commonly used XPath operators and provides an example of using each.
Note that the filter operator permits nodes to be selected by their content. There are a number of functions and operators that can be used to specify the matching criteria. Table 10-4 lists some of these.
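As a rough illustration of filtering by content, the sketch below runs a handful of XPath 1.0 expressions against an inline copy of the directors/movies data. It is not a reproduction of the tables; the class name FilterSketch, the First helper, and the choice of expressions are assumptions made for this example.

```csharp
using System;
using System.Xml;

public static class FilterSketch
{
    // Hypothetical inline copy of the chapter's sample data.
    const string SampleXml =
        "<films><directors><director_id>54</director_id>" +
        "<first_name>Martin</first_name><last_name>Scorsese</last_name>" +
        "<movies><movie_ID>30</movie_ID><movie_Title>Taxi Driver</movie_Title>" +
        "<movie_Year>1976</movie_Year></movies>" +
        "<movies><movie_ID>28</movie_ID><movie_Title>Raging Bull</movie_Title>" +
        "<movie_Year>1980</movie_Year></movies>" +
        "</directors></films>";

    // Returns the text of the first node matching the expression,
    // or an empty string when nothing matches.
    public static string First(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerText;
    }

    public static void Main()
    {
        string[] queries =
        {
            "//directors[last_name='Scorsese']/first_name",        // equality filter
            "//movies[movie_Year > 1978]/movie_Title",             // numeric comparison
            "//movies[contains(movie_Title, 'Taxi')]/movie_Title", // string function
            "//directors/movies[1]/movie_Title",                   // position (1-based)
            "//directors/movies[last()]/movie_Title"               // last() function
        };
        foreach (string query in queries)
            Console.WriteLine(query + " -> " + First(query));
    }
}
```

With this data, the five queries yield Martin, Raging Bull, Taxi Driver, Taxi Driver, and Raging Bull, in that order. Note that positions in XPath predicates start at 1, not 0.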
Refer to the XPath standard (http://www.w3.org/TR/xpath) for a comprehensive list of operators and functions. Let's now look at examples of using XPath queries to search, delete, and add data to an XML tree. Our source XML file is shown in Listing 10-10. For demonstration purposes, examples are included that represent the XML data as an XmlDocument, XPathDocument, and XmlDataDocument.

XmlDocument and XPath
The expression in this example extracts the set of last_name nodes. It then prints the associated text. Note that underneath, SelectNodes uses a navigator to evaluate the expression.

    string exp = "/films/directors/last_name";
    XmlDocument doc = new XmlDocument();
    doc.Load("directormovies.xml");   // Build DOM tree
    XmlNodeList directors = doc.SelectNodes(exp);
    foreach (XmlNode n in directors)
        Console.WriteLine(n.InnerText);   // Last name of director
The XmlNode.InnerText property concatenates the values of a node's child nodes and returns them as a single text string. This is a convenient way to display tree contents during application testing.

XPathDocument and XPath
For applications that only need to query an XML document, the XPathDocument is the recommended class. It is free of the overhead required for updating a tree and runs 20 to 30 percent faster than XmlDocument. In addition, it can be created using an XmlReader to load all or part of a document into it. This is done by creating the reader, positioning it to a desired subtree, and then passing it to the XPathDocument constructor. In this example, the XmlReader is positioned at the root node, so the entire tree is read in:

    string exp = "/films/directors/last_name";
    // Create method was added with .NET 2.0
    XmlReader rdr = XmlReader.Create("c:\\directormovies.xml");
    // Pass XmlReader to the constructor
    XPathDocument xDoc = new XPathDocument(rdr);
    XPathNavigator nav = xDoc.CreateNavigator();
    XPathNodeIterator iterator;
    iterator = nav.Select(exp);
    // List last name of each director
    while (iterator.MoveNext())
        Console.WriteLine(iterator.Current.Value);

    // Now, list only movies for Martin Scorsese
    string exp2 = "//directors[last_name='Scorsese']/movies/movie_Title";
    iterator = nav.Select(exp2);
    while (iterator.MoveNext())
        Console.WriteLine(iterator.Current.Value);
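Loading only part of a document, as described above, might be sketched as follows. This version uses XmlReader.ReadToFollowing to reach the first movies element and ReadSubtree to limit what the XPathDocument sees; that pairing is one possible approach and not necessarily the one the text has in mind. The class name SubtreeSketch, its helpers, and the inline XML are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class SubtreeSketch
{
    // Hypothetical inline copy of the chapter's sample data.
    public const string SampleXml =
        "<films><directors><last_name>Scorsese</last_name>" +
        "<movies><movie_Title>Taxi Driver</movie_Title></movies>" +
        "<movies><movie_Title>Raging Bull</movie_Title></movies>" +
        "</directors></films>";

    // Builds an XPathDocument from the first <movies> subtree only.
    public static XPathNavigator LoadFirstMovie(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Advance the reader to the first <movies> element.
            if (!reader.ReadToFollowing("movies"))
                throw new InvalidOperationException("No movies element found.");

            // ReadSubtree exposes just the current element and its descendants.
            using (XmlReader subtree = reader.ReadSubtree())
            {
                // The constructor reads the subtree fully, so the readers
                // can be disposed once it returns.
                XPathDocument doc = new XPathDocument(subtree);
                return doc.CreateNavigator();
            }
        }
    }

    public static string FirstMovieTitle(string xml)
    {
        XPathNavigator title =
            LoadFirstMovie(xml).SelectSingleNode("/movies/movie_Title");
        return title == null ? "" : title.Value;
    }

    public static double MovieCount(string xml)
    {
        return (double)LoadFirstMovie(xml).Evaluate("count(//movies)");
    }

    public static void Main()
    {
        Console.WriteLine(FirstMovieTitle(SampleXml)); // Taxi Driver
        Console.WriteLine(MovieCount(SampleXml));      // 1
    }
}
```

In the partial document, movies is the root element, so queries start at /movies rather than /films.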
XmlDataDocument and XPath
The XmlDataDocument class allows you to take a DataSet (an object containing rows of data) and create a replica of it as a tree structure. The tree not only represents the DataSet, but is synchronized with it. This means that changes made to the DOM or DataSet are automatically reflected in the other. Because XmlDataDocument is derived from XmlDocument, it supports the basic methods and properties used to manipulate XML data. To these, it adds methods specifically related to working with a DataSet. The most interesting of these is the GetRowFromElement method, which takes an XmlElement and returns the corresponding DataRow.

A short example illustrates how XPath is used to retrieve the set of nodes representing the movies associated with a selected director. Each node is then converted to a DataRow, which is used to print data from a column in the row.

    // Create document by passing in associated DataSet
    XmlDataDocument xmlDoc = new XmlDataDocument(ds);
    string exp = "//directors[last_name='Scorsese']/movies";
    XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes(exp);
    DataRow myRow;
    foreach (XmlNode myNode in nodeList)
    {
        myRow = xmlDoc.GetRowFromElement((XmlElement)myNode);
        if (myRow != null)
        {
            // Print Movie Title from a DataRow
            Console.WriteLine(myRow["movie_Title"].ToString());
        }
    }
This class should be used only when its hybrid features add value to an application. Otherwise, use XmlDocument if updates are required or XPathDocument if the data is read-only.

Adding and Removing Nodes on a Tree
Besides locating and reading data, many applications need to add, edit, and delete information in an XML document tree. This is done using methods that edit the content of a node and add or delete nodes. After the changes have been made to the tree, the updated DOM is saved to a file. To demonstrate how to add and remove nodes, we'll operate on the subtree presented as text in Listing 10-10 and as a graphical tree in Figure 10-5.

Figure 10-5. Subtree used to add and remove nodes
This example uses the XmlDocument class to represent the tree, from which we will remove one movies element and add another one. XPath is used to locate the movies node for Raging Bull along the path containing Scorsese as the director:

    "//directors[last_name='Scorsese']/movies[movie_Title='Raging Bull']"

This node is deleted by locating its parent node, which is on the level directly above it, and executing its RemoveChild method.

Listing 10-11. Using XmlDocument and XPath to Add and Remove Nodes
public void UseXPath()
{
    XmlDocument doc = new XmlDocument();
    doc.Load("c:\\directormovies.xml");

    // (1) Locate movie to remove
    string exp = "//directors[last_name='Scorsese']/" +
                 "movies[movie_Title='Raging Bull']";
    XmlNode movieNode = doc.SelectSingleNode(exp);
    if (movieNode == null)
        return;   // Nothing to do if the movie is not in the document

    // (2) Delete node and child nodes for movie
    XmlNode directorNode = movieNode.ParentNode;
    directorNode.RemoveChild(movieNode);

    // (3) Add new movie for this director
    // First, get and save director's ID
    string directorID =
        directorNode.SelectSingleNode("director_id").InnerText;
    // XmlElement is derived from XmlNode and adds members
    XmlElement movieEl = doc.CreateElement("movies");
    directorNode.AppendChild(movieEl);

    // (4) Add Movie Description
    AppendChildElement(movieEl, "movie_ID", "94");
    AppendChildElement(movieEl, "movie_Title", "Goodfellas");
    AppendChildElement(movieEl, "movie_Year", "1990");
    AppendChildElement(movieEl, "movie_DirectorID", directorID);

    // (5) Save updated XML Document
    doc.Save("c:\\directormovies2.xml");
}

// Create node and append to parent
public void AppendChildElement(XmlNode parent, string elName,
                               string elValue)
{
    XmlElement newEl = parent.OwnerDocument.CreateElement(elName);
    newEl.InnerText = elValue;
    parent.AppendChild(newEl);
}
Adding a node requires first locating the node to which the new node will be attached. Then, one of the document's CreateXxx methods (CreateElement in this example) is used to generate an XmlNode or XmlNode-derived object that will be added to the tree. The new node is attached using the located node's AppendChild, InsertAfter, or InsertBefore method, which determines its position in the tree. In this example, we add a movies element that contains information for the movie Goodfellas.
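The listing uses only AppendChild, which places the new element last among its siblings. A minimal sketch of the other two methods follows, assuming an inline document and two illustrative titles; the class name InsertSketch and its helpers are hypothetical and not part of the listing.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class InsertSketch
{
    // Builds a <movies> element holding a single movie_Title child.
    static XmlElement NewMovie(XmlDocument doc, string title)
    {
        XmlElement movie = doc.CreateElement("movies");
        XmlElement titleEl = doc.CreateElement("movie_Title");
        titleEl.InnerText = title;
        movie.AppendChild(titleEl);
        return movie;
    }

    // Inserts one movie before and one after the Raging Bull node,
    // then returns all titles in document order.
    public static List<string> InsertAroundRagingBull()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<directors><last_name>Scorsese</last_name>" +
            "<movies><movie_Title>Taxi Driver</movie_Title></movies>" +
            "<movies><movie_Title>Raging Bull</movie_Title></movies>" +
            "</directors>");

        // The reference node fixes where the new siblings are placed.
        XmlNode reference = doc.SelectSingleNode(
            "/directors/movies[movie_Title='Raging Bull']");
        XmlNode director = reference.ParentNode;

        director.InsertBefore(NewMovie(doc, "New York, New York"), reference);
        director.InsertAfter(NewMovie(doc, "Goodfellas"), reference);

        var titles = new List<string>();
        foreach (XmlNode n in doc.SelectNodes("/directors/movies/movie_Title"))
            titles.Add(n.InnerText);
        return titles;
    }

    public static void Main()
    {
        // Taxi Driver | New York, New York | Raging Bull | Goodfellas
        Console.WriteLine(string.Join(" | ", InsertAroundRagingBull().ToArray()));
    }
}
```

Both methods are called on the parent node and take the new child first and the reference child second.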