Validating an XML Document with a DTD
Problem
You want to verify that an XML document is valid according to a DTD.
Solution
Use the Xerces library with either the SAX2 (Simple API for XML) or the DOM parser.
To validate an XML document using SAX2, obtain a SAX2XMLReader, as in Example 14-8. Next, enable DTD validation by calling the parser's setFeature( ) method with the arguments xercesc::XMLUni::fgSAX2CoreValidation and true. Finally, register an ErrorHandler to receive notifications of DTD violations and call the parser's parse( ) method with your XML document's name as its argument.
To validate an XML document using DOM, first construct an instance of XercesDOMParser. Next, enable DTD validation by calling the parser's setValidationScheme( ) method with the argument xercesc::XercesDOMParser::Val_Always. Finally, register an ErrorHandler to receive notifications of DTD violations and call the parser's parse( ) method with your XML document's name as its argument.
For example, suppose you modify the XML document animals.xml from Example 14-1 to contain a reference to an external DTD, as illustrated in Examples 14-11 and 14-12. The code to validate this document using the SAX2 API is presented in Example 14-13; the code to validate it using the DOM parser is presented in Example 14-14.
Example 14-11. DTD animals.dtd for the file animals.xml
Example 14-12. The file animals.xml, modified to contain a DTD
Example 14-13. Validating the document animals.xml against a DTD using the SAX2 API
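// Headers used by this example (see also Example 14-8)
#include <cstdlib>    // EXIT_FAILURE
#include <exception>
#include <iostream>   // cout
#include <memory>     // unique_ptr
#include <stdexcept>  // runtime_error
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXException.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLException.hpp>
#include <xercesc/util/XMLUni.hpp>
#include "xerces_strings.hpp" // Example 14-4

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8
 * and CircusErrorHandler as in Example 14-7
 */

int main( )
{
    try {
        // Initialize Xerces and obtain a SAX2 parser
        XercesInitializer init;
        unique_ptr<SAX2XMLReader>
            parser(XMLReaderFactory::createXMLReader( ));

        // Enable validation
        parser->setFeature(XMLUni::fgSAX2CoreValidation, true);

        // Register error handler to receive notifications
        // of DTD violations
        CircusErrorHandler error;
        parser->setErrorHandler(&error);
        parser->parse("animals.xml");
    } catch (const SAXException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const XMLException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const exception& e) {
        cout << e.what( ) << "\n";
        return EXIT_FAILURE;
    }
}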
Example 14-14. Validating the document animals.xml against the DTD animals.dtd using XercesDOMParser
#include <cstdlib>    // EXIT_FAILURE
#include <exception>
#include <iostream>   // cout
#include <stdexcept>  // runtime_error
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXException.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLException.hpp>
#include "xerces_strings.hpp" // Example 14-4

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8
 * and CircusErrorHandler as in Example 14-7
 */

int main( )
{
    try {
        // Initialize Xerces and construct a DOM parser.
        XercesInitializer init;
        XercesDOMParser   parser;

        // Enable DTD validation
        parser.setValidationScheme(XercesDOMParser::Val_Always);

        // Register an error handler to receive notifications
        // of DTD violations
        CircusErrorHandler handler;
        parser.setErrorHandler(&handler);

        // Parse and validate.
        parser.parse("animals.xml");
    } catch (const SAXException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const XMLException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const exception& e) {
        cout << e.what( ) << "\n";
        return EXIT_FAILURE;
    }
}
Discussion
DTDs provide a simple way to constrain an XML document. For example, using a DTD, you can specify what elements may appear in a document; what attributes an element may have; and whether a particular element can contain child elements, text, or both. It's also possible to impose constraints on the type, order, and number of an element's children and on the values an attribute may take.
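To make the phrase "child elements, text, or both" concrete, here is a toy C++ model of the broad content categories a DTD element declaration can express. It is only an illustration under simplifying assumptions, not how Xerces validates. The names Node, ContentKind, isXmlWhitespace, and fitsContentKind are hypothetical. The model also ignores which child elements appear and in what order, as well as comments and processing instructions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model for illustration only; this is not Xerces code.
// Each node in an element's content is either character data
// or a child element.
struct Node {
    bool isText;        // true for character data, false for a child element
    std::string value;  // the text itself, or the child element's name
};

// Broad content categories an <!ELEMENT> declaration can express.
enum class ContentKind {
    Empty,         // <!ELEMENT x EMPTY>
    TextOnly,      // <!ELEMENT x (#PCDATA)>
    ElementsOnly,  // <!ELEMENT x (a, b)>: whitespace between children is allowed
    Mixed          // <!ELEMENT x (#PCDATA | a | b)*>
};

// XML whitespace is space, tab, carriage return, and line feed.
bool isXmlWhitespace(const std::string& s) {
    for (char c : s) {
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
    }
    return true;
}

// Returns true if the content fits the category. Only the category is
// checked; the names, order, and number of children are ignored.
bool fitsContentKind(const std::vector<Node>& content, ContentKind kind) {
    bool hasText = false;     // non-whitespace character data seen
    bool hasElement = false;  // at least one child element seen
    for (const Node& n : content) {
        if (!n.isText) {
            hasElement = true;
        } else if (!isXmlWhitespace(n.value)) {
            hasText = true;
        }
    }
    switch (kind) {
    case ContentKind::Empty:        return content.empty();  // not even whitespace
    case ContentKind::TextOnly:     return !hasElement;
    case ContentKind::ElementsOnly: return !hasText;
    case ContentKind::Mixed:        return true;
    }
    return false;
}
```

For instance, content consisting of indentation whitespace followed by a name element fits ElementsOnly, but the same element preceded by non-blank text fits only Mixed. EMPTY is strict: even a single space makes the content non-empty.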
The purpose of DTDs is to identify the subset of well-formed XML documents that are meaningful in a particular application domain. In Example 14-1, for instance, it's important that each animal element has child elements name, species, dateOfBirth, veterinarian, and trainer; that the name, species, and dateOfBirth elements contain only text; and that the veterinarian and trainer elements have both a name and a phone attribute. Furthermore, an animal element should have no phone attribute, and a veterinarian element should have no species children.
These are the types of restrictions enforced by the DTD in Example 14-11. For example, the following element declaration states that an animal element must have child elements name, species, dateOfBirth, veterinarian, and trainer, in that order:
<!ELEMENT animal (name, species, dateOfBirth, veterinarian, trainer)>
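A plain sequence content model such as the one described above is strict. Each listed child must appear exactly once, in the declared order, and no other child elements may appear. The following sketch mimics that rule for a list of child-element names. It is an illustration only, not how Xerces works internally. The function checkSequence is a hypothetical name, and the sketch handles only plain sequences, not the ?, *, +, or | operators a real DTD may use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustration only; this is not how Xerces validates internally.
// Checks child-element names against a plain DTD sequence such as
//   <!ELEMENT animal (name, species, dateOfBirth, veterinarian, trainer)>
// Each declared child must appear exactly once, in order, and no other
// child elements may appear. The ?, *, +, and | operators are not handled.
// Returns an empty string on success, otherwise the first problem found.
std::string checkSequence(const std::string& element,
                          const std::vector<std::string>& children,
                          const std::vector<std::string>& model) {
    std::size_t i = 0;
    for (; i < model.size(); ++i) {
        if (i >= children.size()) {
            return element + ": missing child <" + model[i] + ">";
        }
        if (children[i] != model[i]) {
            return element + ": expected <" + model[i] +
                   "> but found <" + children[i] + ">";
        }
    }
    if (i < children.size()) {
        return element + ": unexpected child <" + children[i] + ">";
    }
    return "";
}
```

With the model {"name", "species", "dateOfBirth", "veterinarian", "trainer"}, an animal whose last child is absent yields "animal: missing child <trainer>". An animal whose first two children are swapped yields "animal: expected <name> but found <species>".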
Similarly, the following attribute declaration indicates that a trainer element must have name and phone attributes; because no other attribute declarations for trainer appear in the DTD, these are the only two attributes a trainer element may have:
<!ATTLIST trainer
    name  CDATA #REQUIRED
    phone CDATA #REQUIRED>
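The attribute rule described above has two parts: a trainer must carry both name and phone, and it may carry nothing else. The following sketch models both parts. AttributeDecl and checkAttributes are hypothetical helpers for illustration, not part of Xerces. Only "required or not" is modeled; attribute types such as CDATA, #FIXED values, and defaults are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Illustration only; AttributeDecl and checkAttributes are hypothetical
// helpers, not part of Xerces. Only "required or not" is modeled:
// attribute types such as CDATA, #FIXED values, and defaults are ignored.
struct AttributeDecl {
    std::string name;
    bool required;  // true for #REQUIRED, false for #IMPLIED
};

// Checks an element's attributes (name -> value) against the attributes
// declared for it. Every #REQUIRED attribute must be present, and any
// attribute without a declaration is an error. Returns an empty string on
// success, otherwise the first problem found.
std::string checkAttributes(const std::string& element,
                            const std::map<std::string, std::string>& attrs,
                            const std::vector<AttributeDecl>& decls) {
    std::set<std::string> declared;
    for (const AttributeDecl& d : decls) {
        declared.insert(d.name);
        if (d.required && attrs.count(d.name) == 0) {
            return element + ": missing required attribute " + d.name;
        }
    }
    for (const auto& attr : attrs) {  // std::map iterates in name order
        if (declared.count(attr.first) == 0) {
            return element + ": undeclared attribute " + attr.first;
        }
    }
    return "";
}
```

Checking a trainer that has only a name attribute reports "trainer: missing required attribute phone". Adding a species attribute to an otherwise complete trainer reports "trainer: undeclared attribute species".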
An XML document that contains a DTD and conforms to its constraints is said to be valid. An XML parser that checks for validity in addition to checking for syntax errors is called a validating parser. Although SAX2XMLReader and XercesDOMParser are not validating parsers by default, they both provide a validation feature that can be enabled as shown in Examples 14-13 and 14-14. Similarly, a DOMBuilder, described in Recipe 14.4, can be made to validate by calling its setFeature( ) method with the arguments XMLUni::fgDOMValidation and true.
See Also
Recipe 14.6