(Optional) Document Object Model (DOM)
Although an XML document is a text file, retrieving data from the document using traditional sequential file processing techniques is neither practical nor efficient, especially for adding and removing elements dynamically.
Upon successfully parsing a document, some XML parsers store document data as tree structures in memory. Figure 19.21 illustrates the tree structure for the root element of the document article.xml discussed in Fig. 19.2. This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser. Each element name (e.g., article, date, firstName) is represented by a node. A node that contains other nodes (called child nodes or children) is called a parent node (e.g., author). A parent node can have many children, but a child node can have only one parent node. Nodes that are peers (e.g., firstName and lastName) are called sibling nodes. A node's descendant nodes include its children, its children's children and so on. A node's ancestor nodes include its parent, its parent's parent and so on.
Figure 19.21. Tree structure for the document article.xml of Fig. 19.2.
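The parent, child, sibling and ancestor relationships described above can be explored with class XmlDocument, the DOM implementation in namespace System.Xml. The following is a minimal console sketch, not one of the chapter's figures. The XML string is a hypothetical stand-in for article.xml: only the element names are taken from the text, while the nesting and the values are assumptions made for illustration, as are the names DomRelationshipsSketch and AncestorNames.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DomRelationshipsSketch
{
    // Hypothetical stand-in for article.xml: the element names come from the
    // text, but the nesting and the values are placeholders.
    public const string SampleXml =
        "<article>" +
        "<date>placeholder date</date>" +
        "<author><firstName>first</firstName><lastName>last</lastName></author>" +
        "</article>";

    // Collects the names of a node's element ancestors, nearest ancestor first.
    public static List<string> AncestorNames(XmlNode node)
    {
        List<string> names = new List<string>();
        XmlNode current = node.ParentNode;

        // stop at the document node, which is not an element
        while (current != null && current.NodeType == XmlNodeType.Element)
        {
            names.Add(current.Name);
            current = current.ParentNode;
        }

        return names;
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(SampleXml);

        XmlNode author = document.DocumentElement.SelectSingleNode("author");
        XmlNode firstName = author.FirstChild; // first child of the parent node

        Console.WriteLine("children of author: " + author.ChildNodes.Count);
        Console.WriteLine("parent of firstName: " + firstName.ParentNode.Name);
        Console.WriteLine("sibling of firstName: " + firstName.NextSibling.Name);
        Console.WriteLine("ancestors of firstName: " +
            string.Join(", ", AncestorNames(firstName).ToArray()));
    }
}
```

Note that each XmlNode exposes a single ParentNode but a whole ChildNodes collection, which mirrors the rule that a child has exactly one parent while a parent can have many children.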
The DOM tree has a single root node, which contains all the other nodes in the document. For example, the root node of the DOM tree that represents article.xml (Fig. 19.2) contains a node for the XML declaration (line 1), two nodes for the comments (lines 2–3) and a node for the XML document's root element article (line 5).
Classes for creating, reading and manipulating XML documents are located in the .NET namespace System.Xml. This namespace also contains additional namespaces that provide other XML-related operations.
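To see what the root node of a DOM tree contains, we can load a small document into an XmlDocument and list the types of the root's immediate children. This is only a sketch: the XML string below is a hypothetical document shaped like the description above (a declaration, two comments and a root element), not the actual content of article.xml, and the names DocumentRootSketch and TopLevelNodeTypes are ours.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DocumentRootSketch
{
    // Hypothetical document: declaration, two comments, then the root element.
    public const string SampleXml =
        "<?xml version=\"1.0\"?>" +
        "<!-- first comment -->" +
        "<!-- second comment -->" +
        "<article><date>placeholder date</date></article>";

    // Returns the node types of the children of the document (root) node.
    public static List<XmlNodeType> TopLevelNodeTypes(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);

        List<XmlNodeType> types = new List<XmlNodeType>();
        foreach (XmlNode child in document.ChildNodes)
            types.Add(child.NodeType);

        return types;
    }

    public static void Main()
    {
        foreach (XmlNodeType type in TopLevelNodeTypes(SampleXml))
            Console.WriteLine(type);
    }
}
```

For this input the program lists one XmlDeclaration node, two Comment nodes and one Element node, in document order.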
Reading an XML Document with an XmlReader
In this section, we present several examples that use DOM trees. Our first example, the program in Fig. 19.22, loads the XML document presented in Fig. 19.2 and displays its data in a text box. This example uses class XmlReader to iterate through each node in the XML document.
Figure 19.22. XmlReader iterating through an XML document.
    // Fig. 19.22: XmlReaderTest.cs
    // Reading an XML document.
    using System;
    using System.Windows.Forms;
    using System.Xml;

    namespace XmlReaderTest
    {
       public partial class XmlReaderTestForm : Form
       {
          public XmlReaderTestForm()
          {
             InitializeComponent();
          } // end constructor

          // read XML document and display its content
          private void XmlReaderTestForm_Load( object sender, EventArgs e )
          {
             // create the XmlReader object
             XmlReaderSettings settings = new XmlReaderSettings();
             XmlReader reader = XmlReader.Create( "article.xml", settings );

             int depth = -1; // tree depth is -1, no indentation

             while ( reader.Read() ) // display each node's content
             {
                switch ( reader.NodeType )
                {
                   case XmlNodeType.Element: // XML Element, display its name
                      depth++; // increase tab depth
                      TabOutput( depth ); // insert tabs
                      OutputTextBox.Text += "<" + reader.Name + ">\r\n";

                      // if empty element, decrease depth
                      if ( reader.IsEmptyElement )
                         depth--;
                      break;
                   case XmlNodeType.Comment: // XML Comment, display it
                      TabOutput( depth ); // insert tabs
                      OutputTextBox.Text += "<!--" + reader.Value + "-->\r\n";
                      break;
                   case XmlNodeType.Text: // XML Text, display it
                      TabOutput( depth ); // insert tabs
                      OutputTextBox.Text += "\t" + reader.Value + "\r\n";
                      break;

                   // XML XMLDeclaration, display it
                   case XmlNodeType.XmlDeclaration:
                      TabOutput( depth ); // insert tabs
                      OutputTextBox.Text += "<?" + reader.Name + " " +
                         reader.Value + "?>\r\n";
                      break;

                   // NOTE: this copy of the listing breaks off inside the
                   // statement above. Everything from here to the end of the
                   // class is an assumed reconstruction based on the names
                   // already used (depth, TabOutput), not the original text.
                   case XmlNodeType.EndElement: // XML EndElement, display it
                      TabOutput( depth ); // insert tabs
                      OutputTextBox.Text += "</" + reader.Name + ">\r\n";
                      depth--; // decrease tab depth
                      break;
                } // end switch
             } // end while
          } // end method XmlReaderTestForm_Load

          // insert tabs (assumed reconstruction, see note above)
          private void TabOutput( int number )
          {
             for ( int i = 0; i < number; i++ )
                OutputTextBox.Text += "\t";
          } // end method TabOutput
       } // end class XmlReaderTestForm
    } // end namespace XmlReaderTest

[The sample output of Fig. 19.22 and the XmlDom listing that the following paragraphs walk through (class XmlDomForm and method BuildTree) are missing from this copy. Of the document letter.xml, only the text content and attribute values survive; the element markup was stripped.]

    [...] = "sender"
       Jane Doe
       Box 12345
       15 Any Ave.
       Othertown
       Otherstate
       67890
       555-4321
       [...] = "F"

    [...] = "receiver"
       John Doe
       123 Main St.
       Anytown
       Anystate
       12345
       555-1234
       [...] = "M"

    Dear Sir:

    It is our privilege to inform you about our new database
    managed with XML. This new system allows you to reduce the
    load on your inventory list server by having the client machine
    perform the work of sorting and filtering the data.

    Please visit our Web site for availability
    and pricing.

    Sincerely,
    Ms. Doe
In XmlDomForm's Load event handler (lines 19–34), lines 23–24 create an XmlReaderSettings object and set its IgnoreWhitespace property to true so that insignificant whitespace in the XML document is ignored. Line 27 then invokes static XmlReader method Create to parse and load letter.xml.
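The effect of the IgnoreWhitespace property is easy to observe by counting the nodes an XmlReader reports with the setting on and off. The sketch below is a console program rather than part of XmlDomForm. It reads from a string instead of letter.xml, and the element names in that string, as well as the names IgnoreWhitespaceSketch and CountNodes, are placeholders chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class IgnoreWhitespaceSketch
{
    // Hypothetical fragment; the line breaks and indentation between the
    // tags are the "insignificant whitespace" that the setting controls.
    public const string SampleXml =
        "<letter>\n   <paragraph>Dear Sir:</paragraph>\n</letter>";

    // Counts every node the reader reports for the given setting.
    public static int CountNodes(string xml, bool ignoreWhitespace)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
                count++;
        }

        return count;
    }

    public static void Main()
    {
        Console.WriteLine("IgnoreWhitespace = true:  " + CountNodes(SampleXml, true));
        Console.WriteLine("IgnoreWhitespace = false: " + CountNodes(SampleXml, false));
    }
}
```

With the setting enabled, the reader reports only the two start tags, the text node and the two end tags. With it disabled, the two runs of whitespace between the tags show up as additional Whitespace nodes, which a tree-building loop would otherwise have to filter out.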
Line 29 creates the TreeNode tree (declared in line 16). This TreeNode is used as a graphical representation of a DOM tree node in the TreeView control. Line 31 assigns the XML document's name (i.e., letter.xml) to tree's Text property. Line 32 calls method Add to add the new TreeNode to the TreeView's Nodes collection. Line 33 calls method BuildTree to update the TreeView so that it displays source's complete DOM tree.
Method BuildTree (lines 37–89) receives an XmlReader for reading the XML document and a TreeNode referencing the current location in the tree (i.e., the TreeNode most recently added to the TreeView control). Line 40 declares TreeNode reference newNode, which will be used for adding new nodes to the TreeView. Lines 42–84 iterate through each node in the XML document's DOM tree.
The switch statement in lines 45–81 adds a node to the TreeView, based on the XmlReader's current node. When a text node is encountered, the Text property of the new TreeNode, newNode, is assigned the current node's value (line 49). Line 50 adds this TreeNode to treeNode's node list (i.e., adds the node to the TreeView control).
Line 52 matches an EndElement node type. This case moves up the tree to the current node's parent because the end of an element has been encountered. Line 53 accesses treeNode's Parent property to retrieve the node's current parent.
Line 57 matches Element node types. Each non-empty Element NodeType (line 60) increases the depth of the tree; thus, we assign the current reader Name to the newNode's Text property and add the newNode to treeNode's node list (lines 63–64). Line 67 assigns the newNode's reference to treeNode to ensure that treeNode refers to the last child TreeNode in the node list. If the current Element node is an empty element (line 69), we assign to the newNode's Text property the string representation of the NodeType (line 73). Next, the newNode is added to the treeNode node list (line 74). The default case (lines 77–80) assigns the string representation of the node type to the newNode Text property, then adds the newNode to the treeNode node list.
After the entire DOM tree is processed, the TreeNode node list is displayed in the TreeView control (lines 87–88). TreeView method ExpandAll causes all the nodes of the tree to be displayed. TreeView method Refresh updates the display to show the newly added TreeNodes. Note that while the application is running, clicking nodes (i.e., the + or - boxes) in the TreeView either expands or collapses them.
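The logic of BuildTree can be tried without a Windows Forms project. The sketch below follows the cases described above (text node, end element, non-empty element, empty element, default), but it is not the XmlDom listing: class OutlineNode is a hypothetical stand-in for TreeNode, the output is an indented console outline rather than a TreeView, and the element names in the sample string are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical, minimal replacement for TreeNode: a label, a parent, children.
public class OutlineNode
{
    public string Text;
    public OutlineNode Parent;
    public List<OutlineNode> Children = new List<OutlineNode>();

    // Creates a child with the given label, appends it and returns it.
    public OutlineNode Add(string text)
    {
        OutlineNode child = new OutlineNode();
        child.Text = text;
        child.Parent = this;
        Children.Add(child);
        return child;
    }
}

public static class BuildTreeSketch
{
    // Builds an outline from the reader, one case per node type.
    public static OutlineNode Build(XmlReader reader, string rootText)
    {
        OutlineNode root = new OutlineNode();
        root.Text = rootText;
        OutlineNode current = root; // current location in the tree

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Text: // label the new node with the text value
                    current.Add(reader.Value);
                    break;
                case XmlNodeType.EndElement: // element finished, move up
                    current = current.Parent;
                    break;
                case XmlNodeType.Element:
                    if (!reader.IsEmptyElement)
                        current = current.Add(reader.Name); // descend
                    else
                        current.Add(reader.NodeType.ToString()); // stay at this level
                    break;
                default: // every other node type is labeled with its type name
                    current.Add(reader.NodeType.ToString());
                    break;
            }
        }

        return root;
    }

    // Prints the outline with two spaces of indentation per level.
    public static void Print(OutlineNode node, int depth)
    {
        Console.WriteLine(new string(' ', depth * 2) + node.Text);
        foreach (OutlineNode child in node.Children)
            Print(child, depth + 1);
    }

    public static void Main()
    {
        // placeholder markup; only the value "Jane Doe" appears in letter.xml
        string xml = "<letter><contact><name>Jane Doe</name><flag /></contact></letter>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            Print(Build(reader, "letter.xml"), 0);
    }
}
```

The key design point is that only a non-empty element changes the current position. An empty element has no matching EndElement node, so descending into it would leave the position one level too deep for the rest of the document.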
Locating Data in XML Documents with XPath
Although XmlReader can visit every node in a document, it provides only forward-only, read-only access, so it is not an efficient means of locating specific data. The Framework Class Library provides class XPathNavigator in the System.Xml.XPath namespace for iterating through node lists that match search criteria, which are written as XPath expressions. Recall that XPath (XML Path Language) provides a syntax for locating specific nodes in XML documents effectively and efficiently. XPath is a string-based language of expressions used by XML and many of its related technologies (such as XSLT, discussed in Section 19.7).
Figure 19.25 uses an XPathNavigator to navigate an XML document and uses a TreeView control and TreeNode objects to display the XML document's structure. In this example, the TreeNode node list is updated each time the XPathNavigator is positioned to a new node, rather than displaying the entire DOM tree at once. Nodes are added to and deleted from the TreeView to reflect the XPathNavigator's location in the DOM tree. Figure 19.26 shows the XML document sports.xml that we use in this example. [Note: The versions of sports.xml presented in Fig. 19.26 and Fig. 19.16 are nearly identical. In the current example, we do not want to apply an XSLT, so we omit the processing instruction found in line 2 of Fig. 19.16.]
Figure 19.25. XPathNavigator navigating selected nodes.
      1  // Fig. 19.25: PathNavigator.cs
      2  // Demonstrates class XPathNavigator.
      3  using System;
      4  using System.Windows.Forms;
      5  using System.Xml.XPath; // contains XPathNavigator
      6
      7  namespace PathNavigator
      8  {
      9     public partial class PathNavigatorForm : Form
     10     {
     11        public PathNavigatorForm()
     12        {
     13           InitializeComponent();
     14        } // end constructor
     15
     16        private XPathNavigator xPath; // navigator to traverse document
     17
     18        // references document for use by XPathNavigator
     19        private XPathDocument document;
     20
     21        // references TreeNode list used by TreeView control
     22        private TreeNode tree;
     23
     24        // initialize variables and TreeView control
     25        private void PathNavigatorForm_Load( object sender, EventArgs e )
     26        {
     27           // load XML document
     28           document = new XPathDocument( "sports.xml" );
     29           xPath = document.CreateNavigator(); // create navigator
     30           tree = new TreeNode(); // create root node for TreeNodes
     31
     32           tree.Text = xPath.NodeType.ToString(); // #root
     33           pathTreeView.Nodes.Add( tree ); // add tree
     34
     35           // update TreeView control
     36           pathTreeView.ExpandAll(); // expand tree node in TreeView
     37           pathTreeView.Refresh(); // force TreeView update
     38           pathTreeView.SelectedNode = tree; // highlight root
     39        } // end method PathNavigatorForm_Load
     40
     41        // process selectButton_Click event
     42        private void selectButton_Click( object sender, EventArgs e )
     43        {
     44           XPathNodeIterator iterator; // enables node iteration
     45
     46           try // get specified node from ComboBox
     47           {
     48              // select specified node
     49              iterator = xPath.Select( selectComboBox.Text );
     50              DisplayIterator( iterator ); // print selection
     51           } // end try
     52           // catch invalid expressions
     53           catch ( System.Xml.XPath.XPathException argumentException )
     54           {
     55              MessageBox.Show( argumentException.Message, "Error",
     56                 MessageBoxButtons.OK, MessageBoxIcon.Error );
     57           } // end catch
     58        } // end method selectButton_Click
     59
     60        // traverse to first child on firstChildButton_Click event
     61        private void firstChildButton_Click( object sender, EventArgs e )
     62        {
     63           TreeNode newTreeNode;
     64
     65           // move to first child
     66           if ( xPath.MoveToFirstChild() )
     67           {
     68              newTreeNode = new TreeNode(); // create new node
     69
     70              // set node's Text property to
     71              // either navigator's name or value
     72              DetermineType( newTreeNode, xPath );
     73
     74              // add nodes to TreeNode node list
     75              tree.Nodes.Add( newTreeNode );
     76              tree = newTreeNode; // assign tree newTreeNode
     77
     78              // update TreeView control
     79              pathTreeView.ExpandAll(); // expand node in TreeView
     80              pathTreeView.Refresh(); // force TreeView to update
     81              pathTreeView.SelectedNode = tree; // highlight current node
     82           } // end if
     83           else // node has no children
     84              MessageBox.Show( "Current Node has no children.",
     85                 "", MessageBoxButtons.OK, MessageBoxIcon.Information );
     86        } // end method firstChildButton_Click
     87
     88        // traverse to node's parent on parentButton_Click event
     89        private void parentButton_Click( object sender, EventArgs e )
     90        {
     91           // move to parent
     92           if ( xPath.MoveToParent() )
     93           {
     94              tree = tree.Parent;
     95
     96              // get number of child nodes, not including sub trees
     97              int count = tree.GetNodeCount( false );
     98
     99              // remove all children
    100              for ( int i = 0; i < count; i++ )
    101                 tree.Nodes.Remove( tree.FirstNode ); // remove child node
    102
    103              // update TreeView control
    104              pathTreeView.ExpandAll(); // expand node in TreeView
    105              pathTreeView.Refresh(); // force TreeView to update
    106              pathTreeView.SelectedNode = tree; // highlight current node
    107           } // end if
    108           else // if node has no parent (root node)
    109              MessageBox.Show( "Current node has no parent.", "",
    110                 MessageBoxButtons.OK, MessageBoxIcon.Information );
    111        } // end method parentButton_Click
    112
    113        // find next sibling on nextButton_Click event
    114        private void nextButton_Click( object sender, EventArgs e )
    115        {
    116           // declare and initialize two TreeNodes
    117           TreeNode newTreeNode = null;
    118           TreeNode newNode = null;
    119
    120           // move to next sibling
    121           if ( xPath.MoveToNext() )
    122           {
    123              newTreeNode = tree.Parent; // get parent node
    124
    125              newNode = new TreeNode(); // create new node
    126
    127              // decide whether to display current node
    128              DetermineType( newNode, xPath );
    129              newTreeNode.Nodes.Add( newNode ); // add to parent node
    130
    131              tree = newNode; // set current position for display
    132
    133              // update TreeView control
    134              pathTreeView.ExpandAll(); // expand node in TreeView
    135              pathTreeView.Refresh(); // force TreeView to update
    136              pathTreeView.SelectedNode = tree; // highlight current node
    137           } // end if
    138           else // node has no additional siblings
    139              MessageBox.Show( "Current node is last sibling.", "",
    140                 MessageBoxButtons.OK, MessageBoxIcon.Information );
    141        } // end method nextButton_Click
    142
    143        // get previous sibling on previousButton_Click
    144        private void previousButton_Click( object sender, EventArgs e )
    145        {
    146           TreeNode parentTreeNode = null;
    147
    148           // move to previous sibling
    149           if ( xPath.MoveToPrevious() )
    150           {
    151              parentTreeNode = tree.Parent; // get parent node
    152              parentTreeNode.Nodes.Remove( tree ); // delete current node
    153              tree = parentTreeNode.LastNode; // move to previous node
    154
    155              // update TreeView control
    156              pathTreeView.ExpandAll(); // expand tree node in TreeView
    157              pathTreeView.Refresh(); // force TreeView to update
    158              pathTreeView.SelectedNode = tree; // highlight current node
    159           } // end if
    160           else // if current node has no previous siblings
    161              MessageBox.Show( "Current node is first sibling.", "",
    162                 MessageBoxButtons.OK, MessageBoxIcon.Information );
    163        } // end method previousButton_Click
    164
    165        // print values for XPathNodeIterator
    166        private void DisplayIterator( XPathNodeIterator iterator )
    167        {
    168           selectTextBox.Clear();
    169
    170           // prints selected node's values
    171           while ( iterator.MoveNext() )
    172              selectTextBox.Text += iterator.Current.Value.Trim() + " ";
    173        } // end method DisplayIterator
    174
    175        // determine if TreeNode should display current node name or value
    176        private void DetermineType( TreeNode node, XPathNavigator xPath )
    177        {
    178           switch ( xPath.NodeType ) // determine NodeType
    179           {
    180              case XPathNodeType.Element: // if Element, get its name
    181                 // get current node name, and remove whitespaces
    182                 node.Text = xPath.Name.Trim();
    183                 break;
    184              default: // obtain node values
    185                 // get current node value and remove whitespaces
    186                 node.Text = xPath.Value.Trim();
    187                 break;
    188           } // end switch
    189        } // end method DetermineType
    190     } // end class PathNavigatorForm
    191  } // end namespace PathNavigator

[Screen captures (a)–(d) of the running application are not reproduced here.]
Figure 19.26. XML document that describes various sports.
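    <?xml version = "1.0"?>

    <sports>
       <game id = "783">
          <name>Cricket</name>

          <paragraph>
             More popular among commonwealth nations.
          </paragraph>
       </game>

       <game id = "239">
          <name>Baseball</name>

          <paragraph>
             More popular in America.
          </paragraph>
       </game>

       <game id = "418">
          <name>Soccer (Futbol)</name>

          <paragraph>
             Most popular sport in the world.
          </paragraph>
       </game>
    </sports>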
The program of Fig. 19.25 loads XML document sports.xml (Fig. 19.26) into an XPathDocument object by passing the document's file name to the XPathDocument constructor (line 28). Method CreateNavigator (line 29) creates and returns an XPathNavigator reference to the XPathDocument's tree structure.
The navigation methods of XPathNavigator are MoveToFirstChild (line 66), MoveToParent (line 92), MoveToNext (line 121) and MoveToPrevious (line 149). Each method performs the action that its name implies. Method MoveToFirstChild moves to the first child of the node referenced by the XPathNavigator, MoveToParent moves to the parent node of the node referenced by the XPathNavigator, MoveToNext moves to the next sibling of the node referenced by the XPathNavigator and MoveToPrevious moves to the previous sibling of the node referenced by the XPathNavigator. Each method returns a bool indicating whether the move was successful. In this example, we display a warning in a MessageBox whenever a move operation fails. Furthermore, each method is called in the event handler of the button that matches its name (e.g., clicking the First Child button in Fig. 19.25(a) triggers firstChildButton_Click, which calls MoveToFirstChild).
Whenever we move forward using XPathNavigator, as with MoveToFirstChild and MoveToNext, nodes are added to the TreeNode node list. The private method DetermineType (lines 176–189) determines whether to assign the node's Name property or Value property to the TreeNode (lines 182 and 186). Whenever MoveToParent is called, all the children of the parent node are removed from the display. Similarly, a call to MoveToPrevious removes the current sibling node. Note that the nodes are removed only from the TreeView, not from the tree representation of the document.
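The four navigation methods and their bool return values can also be exercised from a console program. The following is a sketch under stated assumptions rather than a replacement for Fig. 19.25: it loads a cut-down, attribute-free copy of the sports data from a string instead of the file sports.xml, and the names NavigatorMovesSketch, CreateNavigator, Label and Report are ours. Method Label follows the same idea as DetermineType, using the name for elements and the trimmed value for everything else.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class NavigatorMovesSketch
{
    // Cut-down copy of the sports data (two games, no attributes).
    public const string SportsXml =
        "<sports>" +
        "<game><name>Cricket</name>" +
        "<paragraph>More popular among commonwealth nations.</paragraph></game>" +
        "<game><name>Baseball</name>" +
        "<paragraph>More popular in America.</paragraph></game>" +
        "</sports>";

    // Creates a navigator positioned at the root of the document.
    public static XPathNavigator CreateNavigator()
    {
        XPathDocument document = new XPathDocument(new StringReader(SportsXml));
        return document.CreateNavigator();
    }

    // Element nodes are labeled with their name, all others with their value.
    public static string Label(XPathNavigator navigator)
    {
        if (navigator.NodeType == XPathNodeType.Element)
            return navigator.Name.Trim();

        return navigator.Value.Trim();
    }

    // Reports whether a move succeeded and, if so, where the navigator is now.
    private static void Report(string action, bool moved, XPathNavigator navigator)
    {
        if (moved)
            Console.WriteLine(action + ": now at " + navigator.NodeType +
                " \"" + Label(navigator) + "\"");
        else
            Console.WriteLine(action + ": move failed, position unchanged");
    }

    public static void Main()
    {
        XPathNavigator navigator = CreateNavigator();

        Report("MoveToParent", navigator.MoveToParent(), navigator);         // root has no parent
        Report("MoveToFirstChild", navigator.MoveToFirstChild(), navigator); // sports
        Report("MoveToFirstChild", navigator.MoveToFirstChild(), navigator); // first game
        Report("MoveToFirstChild", navigator.MoveToFirstChild(), navigator); // name
        Report("MoveToNext", navigator.MoveToNext(), navigator);             // paragraph
        Report("MoveToNext", navigator.MoveToNext(), navigator);             // no further sibling
        Report("MoveToPrevious", navigator.MoveToPrevious(), navigator);     // back to name
        Report("MoveToParent", navigator.MoveToParent(), navigator);         // game
    }
}
```

A failed move leaves the navigator where it was, which is why the form in Fig. 19.25 can simply show a message and carry on without resetting any state.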
The selectButton_Click event handler (lines 42–58) corresponds to the Select button. XPathNavigator method Select (line 49) takes search criteria in the form of either an XPathExpression or a string that represents an XPath expression, and returns as an XPathNodeIterator object any node that matches the search criteria. Figure 19.27 summarizes the XPath expressions provided by this program's combo box. We show the result of some of these expressions in Figs. 19.25(b)–(d).
| XPath Expression | Description |
|---|---|
| /sports | Matches all sports nodes that are child nodes of the document root node. |
| /sports/game | Matches all game nodes that are child nodes of sports, which is a child of the document root. |
| /sports/game/name | Matches all name nodes that are child nodes of game. The game is a child of sports, which is a child of the document root. |
| /sports/game/paragraph | Matches all paragraph nodes that are child nodes of game. The game is a child of sports, which is a child of the document root. |
| /sports/game[name='Cricket'] | Matches all game nodes that contain a child element name whose value is Cricket. The game is a child of sports, which is a child of the document root. |
Method DisplayIterator (defined in lines 166–173) appends the node values from the given XPathNodeIterator to the selectTextBox. Note that we call string method Trim to remove unnecessary whitespace. Method MoveNext (line 171) advances to the next node, which property Current (line 172) can access.
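The combination of Select, MoveNext and Current can be sketched as a small console program. This is an illustration under assumptions, not the code of Fig. 19.25: the data is an attribute-free copy of the sports document held in a string, the names SelectSketch and SelectValues are hypothetical, and the values are joined with newline characters so that they are easy to compare.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class SelectSketch
{
    // Attribute-free copy of the sports data, without whitespace between tags.
    public const string SportsXml =
        "<sports>" +
        "<game><name>Cricket</name>" +
        "<paragraph>More popular among commonwealth nations.</paragraph></game>" +
        "<game><name>Baseball</name>" +
        "<paragraph>More popular in America.</paragraph></game>" +
        "<game><name>Soccer (Futbol)</name>" +
        "<paragraph>Most popular sport in the world.</paragraph></game>" +
        "</sports>";

    // Evaluates the expression and returns each match's trimmed value,
    // one per line. Throws XPathException if the expression is invalid.
    public static string SelectValues(string expression)
    {
        XPathDocument document = new XPathDocument(new StringReader(SportsXml));
        XPathNavigator navigator = document.CreateNavigator();
        XPathNodeIterator iterator = navigator.Select(expression);

        StringBuilder result = new StringBuilder();
        while (iterator.MoveNext()) // advance to the next matching node
            result.Append(iterator.Current.Value.Trim()).Append('\n');

        return result.ToString();
    }

    public static void Main()
    {
        string[] expressions =
        {
            "/sports/game/name",
            "/sports/game/paragraph",
            "/sports/game[name='Cricket']",
            "/sports/game[" // deliberately malformed
        };

        foreach (string expression in expressions)
        {
            try
            {
                Console.WriteLine(expression);
                Console.Write(SelectValues(expression));
            }
            catch (XPathException exception)
            {
                Console.WriteLine("Invalid expression: " + exception.Message);
            }
        }
    }
}
```

Because the Value of an element node is the concatenation of the text of all its descendants, selecting a whole game node yields its name and paragraph text run together.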
(Optional) Schema Validation with Class XmlReader