Core C# and .NET
XML can be represented in two basic ways: as the familiar external document containing embedded data, or as an in-memory tree structure known as a Document Object Model (DOM). In the former case, XML can be read in a forward-only manner as a stream of tokens representing the file's content. The object that performs the reading stays connected to the data source during the read operations. The XmlReader and XmlTextReader shown in Figure 10-3 operate in this manner.
Figure 10-3. Classes to read XML data
More options are available for processing the DOM because it is stored in memory and can be traversed randomly. For simply reading a tree, the XmlNodeReader class offers an efficient way to traverse a tree in a forward, read-only manner. Other more sophisticated approaches that also permit tree modification are covered later in this section.
XmlReader Class
XmlReader is an abstract class possessing methods and properties that enable an application to pull data from an XML file one node at a time in a forward-only, read-only manner. A depth-first search is performed, beginning with the root node in the document. Nodes are inspected using the Name, NodeType, and Value properties. XmlReader serves as a base class for the concrete classes XmlTextReader and XmlNodeReader. As an abstract class, XmlReader cannot be directly instantiated; however, it has a static Create method that returns an instance of a concrete XmlReader implementation. This feature became available with the release of .NET Framework 2.0 and is recommended over the XmlTextReader class for reading XML streams.
Listing 10-6 illustrates how to create an XmlReader object and use it to read the contents of a short XML document file. The code is also useful for illustrating how .NET converts the content of the file into a stream of node objects. It's important to understand the concept of nodes because an XML or HTML document is defined (by the official W3C Document Object Model (DOM) specification[2]) as a hierarchy of node objects.
[2] W3C Document Object Model (DOM) Level 3 Core Specification, April 2004, http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html
Listing 10-6. Using XmlReader to Read an XML Document
    // Include these namespaces:
    // using System.Xml;
    // using System.Xml.XPath;
    public void ShowNodes()
    {
        // (1) Settings object enables/disables features on XmlReader
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.IgnoreWhitespace = true;
        try
        {
            // (2) Create XmlReader object
            XmlReader rdr = XmlReader.Create("c:\\oscarsshort.xml", settings);
            while (rdr.Read())
            {
                Format(rdr);
            }
            rdr.Close();
        }
        catch (Exception e)
        {
            Console.WriteLine("Exception: {0}", e.ToString());
        }
    }

    // Takes the abstract XmlReader type so it accepts the reader returned by Create
    private static void Format(XmlReader reader)
    {
        // (3) Print current node properties
        Console.Write(reader.NodeType + "<" + reader.Name + ">" + reader.Value);
        Console.WriteLine();
    }
Before creating the XmlReader, the code first creates an XmlReaderSettings object. This object sets features that define how the XmlReader object processes the input stream. For example, the ConformanceLevel property specifies how the input is checked. The statement
    settings.ConformanceLevel = ConformanceLevel.Fragment;
specifies that the input must conform to the standards that define an XML 1.0 document fragment, that is, XML content that does not necessarily have a single root node. This object and the name of the XML document file are then passed to the Create method, which returns an XmlReader instance:
    XmlReader rdr = XmlReader.Create("c:\\oscarsshort.xml", settings);
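The effect of the conformance level can be seen with a small in-memory test. The following is a minimal sketch, not code from the listing: the class and method names (FragmentDemo, CountTopLevel) are hypothetical, and a StringReader stands in for the file so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FragmentDemo
{
    // Returns the number of top-level elements in the input,
    // or -1 if the reader rejects the input as malformed.
    public static int CountTopLevel(string xml, ConformanceLevel level)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = level;

        int count = 0;
        try
        {
            using (XmlReader rdr = XmlReader.Create(new StringReader(xml), settings))
            {
                while (rdr.Read())
                {
                    // Depth 0 identifies nodes that have no parent element.
                    if (rdr.NodeType == XmlNodeType.Element && rdr.Depth == 0)
                        count++;
                }
            }
        }
        catch (XmlException)
        {
            return -1;   // e.g. a second root under ConformanceLevel.Document
        }
        return count;
    }
}
```

Passing the two-element input `<movie_ID>5</movie_ID><movie_Year>1941</movie_Year>` with ConformanceLevel.Fragment should be accepted, whereas the same input with ConformanceLevel.Document is rejected because a document may have only one root element.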
The file's content is read one node at a time by the XmlReader.Read method; after each read, the Format method prints the NodeType, Name, and Value of the current node. Listing 10-7 shows the input file and a portion of the generated output. Line numbers have been added so that an input line and its corresponding node information can be compared.
Listing 10-7. XML Input and Corresponding Nodes
Input File: oscarsshort.xml
    (1)  <?xml version="1.0" standalone="yes"?>
    (2)  <films>
    (3)    <movies>
    (4)      <!-- Selected by AFI as best movie -->
    (5)      <movie_ID>5</movie_ID>
    (6)      <![CDATA[<a href="http://www.imdb.com/tt0467/">Kane</a>]]>
    (7)      <movie_Title>Citizen Kane </movie_Title>
    (8)      <movie_Year>1941</movie_Year>
    (9)      <movie_Director>Orson Welles</movie_Director>
    (10)     <bestPicture>Y</bestPicture>
    (11)   </movies>
    (12) </films>
Program Output (NodeType, <Name>, Value):
    (1)  XmlDeclaration<xml>version="1.0" standalone="yes"
    (2)  Element<films>
    (3)  Element<movies>
    (4)  Comment<> Selected by AFI as best movie
    (5)  Element<movie_ID>
         Text<>5
         EndElement<movie_ID>
    (6)  CDATA<><a href="http://www.imdb.com/tt0467/">Kane</a>
    (7)  Element<movie_Title>
         Text<>Citizen Kane
         EndElement<movie_Title>
    ...
    (12) EndElement<films>
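The mapping from markup to nodes can be tried without a file on disk. The sketch below is an illustration under stated assumptions rather than part of the listing: NodeDump and Describe are hypothetical names, the input is supplied as a string through a StringReader, and each node is formatted the same way as in the Format method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeDump
{
    // Returns one "NodeType<Name>Value" line per node in the input.
    public static List<string> Describe(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;   // skip whitespace-only nodes

        List<string> lines = new List<string>();
        // StringReader stands in for the file path used in the listing.
        using (XmlReader rdr = XmlReader.Create(new StringReader(xml), settings))
        {
            while (rdr.Read())
            {
                lines.Add(rdr.NodeType + "<" + rdr.Name + ">" + rdr.Value);
            }
        }
        return lines;
    }
}
```

For the input `<films><movie_ID>5</movie_ID></films>`, Describe returns five entries: an Element for films, an Element, Text, and EndElement for movie_ID, and a closing EndElement for films. A single element with text content therefore always accounts for three nodes in the stream.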
Programs that use XmlReader typically implement a logic pattern that consists of an outer loop that reads nodes and an inner switch statement that identifies the node using the XmlNodeType enumeration. The logic to process the node information is handled in the case blocks:
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                // Attributes are contained in elements
                while (reader.MoveToNextAttribute())
                {
                    Console.WriteLine(reader.Name + reader.Value);
                }
                break;
            case XmlNodeType.Text:
                // Process ..
                break;
            case XmlNodeType.EndElement:
                // Process ..
                break;
        }
    }
The Element, Text, and Attribute nodes mark most of the data content in an XML document. Note that the Attribute node is regarded as metadata attached to an element and is the only one not exposed directly by the XmlReader.Read method. As shown in the preceding code segment, the attributes in an Element can be accessed using the MoveToNextAttribute method. Table 10-1 summarizes the node types. It is worth noting that these types are not an arbitrary .NET implementation. With the exception of Whitespace and XmlDeclaration, they conform to the DOM Structure Model recommendation.
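Because attributes are reached from their element rather than through Read, a small helper makes the navigation explicit. This is a minimal sketch: AttributeDump and ListAttributes are hypothetical names, and the attribute-bearing XML shown in the usage note is made up for illustration and is not taken from the chapter's sample files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeDump
{
    // Collects "element/@attribute=value" for every attribute in the input.
    public static List<string> ListAttributes(string xml)
    {
        List<string> found = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Only element nodes are inspected for attributes here.
                if (reader.NodeType != XmlNodeType.Element || !reader.HasAttributes)
                    continue;

                string element = reader.Name;
                while (reader.MoveToNextAttribute())
                {
                    found.Add(element + "/@" + reader.Name + "=" + reader.Value);
                }
                // Step back from the last attribute to its owning element.
                reader.MoveToElement();
            }
        }
        return found;
    }
}
```

Given `<movie id="5" year="1941"><title lang="en">Citizen Kane</title></movie>`, the helper returns the three entries movie/@id=5, movie/@year=1941, and title/@lang=en. The call to MoveToElement is a defensive choice: it repositions the reader on the element so that later code inspecting Name or IsEmptyElement sees the element rather than its last attribute.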
XmlNodeReader Class
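The XmlNodeReader is another forward-only reader that processes XML as a stream of nodes. It differs from the file-based readers described above in two significant ways: it reads nodes from an in-memory DOM tree rather than from a file or stream, and it can begin reading at any node in the tree, not only at the root.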
In Listing 10-8, an XmlNodeReader object is used to list the movie title and year from the XML-formatted movies database. The code contains an interesting twist: The XmlNodeReader object is not used directly, but instead is passed as a parameter to the static XmlReader.Create method. The XmlReader that Create returns serves as a wrapper around the node reader, which performs the actual reading. This approach has the advantage of allowing XmlReaderSettings values to be assigned to the reader.
Listing 10-8. Using XmlNodeReader to Read an XML Document
    private void ListMovies()
    {
        // (1) Specify XML file to be loaded as a DOM
        XmlDocument doc = new XmlDocument();
        doc.Load("c:\\oscarwinners.xml");
        // (2) Settings for use with XmlNodeReader object
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;
        // (3) Create a nodereader object
        XmlNodeReader noderdr = new XmlNodeReader(doc);
        // (4) Create an XmlReader as a wrapper around node reader
        XmlReader reader = XmlReader.Create(noderdr, settings);
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name == "movie_Title")
                {
                    reader.Read();                    // next node is text for title
                    Console.Write(reader.Value);      // movie title
                }
                if (reader.Name == "movie_Year")
                {
                    reader.Read();                    // next node is text for year
                    Console.WriteLine(reader.Value);  // year
                }
            }
        }
    }
The parameter passed to the XmlNodeReader constructor determines the first node in the tree to be read. When the entire document is passed, as in this example, reading begins with the top node in the tree. To select a specific node, use the XmlDocument.SelectSingleNode method as illustrated in this segment:
    XmlDocument doc = new XmlDocument();
    doc.Load("c:\\oscarwinners.xml");   // Build tree in memory
    XmlNodeReader noderdr = new
        XmlNodeReader(doc.SelectSingleNode("films/movies[2]"));
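Subtree reading can be sketched end to end with an in-memory document. In the example below, SubtreeText and TextUnder are hypothetical names, LoadXml is used in place of Load so that no file is needed, and the two-movie XML in the usage note is stand-in data rather than the contents of oscarwinners.xml.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SubtreeText
{
    // Returns the text of every element under the node selected by xpath.
    public static List<string> TextUnder(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);                     // build the DOM tree in memory

        List<string> values = new List<string>();
        XmlNode start = doc.SelectSingleNode(xpath);
        if (start == null)
            return values;                    // nothing matched the XPath

        // The reader visits only the selected node and its descendants.
        using (XmlNodeReader reader = new XmlNodeReader(start))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                    values.Add(reader.Value);
            }
        }
        return values;
    }
}
```

With a films document containing two movies elements, the first holding Citizen Kane and 1941 and the second holding Casablanca and 1942, the call TextUnder(xml, "films/movies[2]") returns only the two values from the second group. The null check matters because SelectSingleNode returns null, not an empty node, when the XPath expression matches nothing.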
Refer to Listing 10-1 and you can see that this selects the second movies element group, which contains information on Casablanca. If your application requires read-only access to XML data and the capability to read selected subtrees, the XmlNodeReader is an efficient solution. When updating, writing, and searching become requirements, a more sophisticated approach is required; we'll look at those techniques later in this section.
The XmlReaderSettings Class
A significant advantage of using an XmlReader object directly or as a wrapper is the presence of the XmlReaderSettings class as a way to define the behavior of the XmlReader object. Its most useful properties specify which node types in the input stream are ignored and whether XML validation is performed. Table 10-2 lists the XmlReaderSettings properties.
Using an XML Schema to Validate XML Data
The final two properties listed in Table 10-2, Schemas and XsdValidate, are used to validate XML data against a schema. (XsdValidate is the prerelease name; the released .NET Framework 2.0 expresses the same setting as ValidationType = ValidationType.Schema, which the code below uses.) Recall that a schema is a template that describes the permissible content in an XML file or stream. Validation should be used to ensure that data being read conforms to the rules of the schema. To request validation, you must add the validating schema to the XmlSchemaSet collection of the Schemas property; next, set ValidationType to ValidationType.Schema; and finally, define an event handler to be called if a validation error occurs. The following code fragment shows the code used with the schema and XML data in Listings 10-1 and 10-3:
    XmlReaderSettings settings = new XmlReaderSettings();
    // (1) Specify schema to be used for validation
    settings.Schemas.Add(null, "c:\\oscarwinners.xsd");
    // (2) Must request schema validation
    settings.ValidationType = ValidationType.Schema;
    // (3) Delegate to handle validation error event
    settings.ValidationEventHandler +=
        new System.Xml.Schema.ValidationEventHandler(SchemaValidation);
    // (4) Create reader and pass settings to it
    XmlReader rdr = XmlReader.Create("c:\\oscarwinners.xml", settings);
    // process XML data ...
    ...
    // Method to handle errors detected during schema validation
    private void SchemaValidation(object sender,
        System.Xml.Schema.ValidationEventArgs e)
    {
        MessageBox.Show(e.Message);
    }
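The validation steps can be exercised without files or a message box. The following is a sketch under stated assumptions: SchemaCheck and Validate are hypothetical names, the schema and data are passed as strings, the setting used to request validation is ValidationType.Schema as found in the released System.Xml API, and the handler collects messages in a list instead of displaying them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Validates xml against xsd and returns every validation message found.
    public static List<string> Validate(string xml, string xsd)
    {
        List<string> errors = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        // The schema is read from a string here rather than from a file.
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        settings.ValidationType = ValidationType.Schema;
        // The handler records the problem; the reader then keeps going.
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { errors.Add(e.Message); };

        using (XmlReader rdr = XmlReader.Create(new StringReader(xml), settings))
        {
            while (rdr.Read()) { }   // reading the data drives validation
        }
        return errors;
    }
}
```

Validation happens as a side effect of Read, so the empty loop is enough to check the whole input. A schema that types an element as xs:int, applied to data in which several of those elements hold non-numeric text, should produce one message per offending element rather than stopping at the first.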
Note that a detected error does not stop processing. This means that all the XML data can be checked in one pass without restarting the program.
Options for Reading XML Data
All the preceding examples that read XML data share two characteristics: data is read a node at a time, and a node's value is extracted as a string using the XmlReader.Value property. This keeps things simple, but ignores the underlying data types. For example, XML often contains numeric data or data that is the product of serializing a class. Both cases can be handled more efficiently using other XmlReader methods. XmlReader has a suite of ReadContentAsXxx methods that read the content at the current position as a typed value. These include ReadContentAsBoolean, ReadContentAsDateTime, ReadContentAsDecimal, ReadContentAsDouble, ReadContentAsInt, ReadContentAsLong, and ReadContentAsFloat. Each has a ReadElementContentAsXxx counterpart for use when the reader is positioned on an element. Here's an example:
    int age = 0;
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "Age")
        age = reader.ReadElementContentAsInt();
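Typed reads interact with the usual read loop in a way that is easy to get wrong, so a complete example may help. This sketch is illustrative only: TypedRead and SumElements are hypothetical names, and it uses ReadElementContentAsInt, the typed-read method available in the released System.Xml API.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedRead
{
    // Sums the integer content of every element named elementName.
    public static int SumElements(string xml, string elementName)
    {
        int total = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // Reads the value and moves past the end tag, so the
                    // reader is already on the next node: no Read() here.
                    total += reader.ReadElementContentAsInt();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return total;
    }
}
```

The loop tests EOF instead of calling Read unconditionally. ReadElementContentAsInt leaves the reader positioned on the node that follows the end tag; in a plain while (reader.Read()) loop, the extra Read would step over that node, and adjacent elements such as two consecutive Age elements would be counted only every other time.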
XML that corresponds to the public properties or fields of a class can be read directly into an instance of the class by passing the reader to the XmlSerializer.Deserialize method. This fragment reads the XML data shown in Listing 10-1 into an instance of the movies class. Note that the name of the field or property must match an element name in the XML data.
    // Deserialize XML into a movies object
    // (requires: using System.Xml.Serialization;)
    if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "movies")
    {
        XmlSerializer serializer = new XmlSerializer(typeof(movies));
        movies m = (movies)serializer.Deserialize(rdr);
        // Do something with object
    }

    // XML data is read directly into this class
    public class movies
    {
        public int movie_ID;
        public string movie_Title;
        public string movie_Year;
        private string director;
        public string bestPicture;
        public string movie_Director
        {
            set { director = value; }
            get { return (director); }
        }
    }