Validating an XML Document with a Schema

Problem

You want to verify that an XML document is valid according to a schema, as specified in the XML Schema 1.0 recommendation.

Solution

Use the Xerces library with either the SAX2 or the DOM parser.

Validating an XML document against a schema using the SAX2 API is exactly the same as validating a document that contains a DTD, assuming the schema is contained in or referenced from the target document. If you want to validate an XML document against an external schema, you must call the parser's setProperty( ) method to enable external schema validation. The first argument to setProperty( ) should be XMLUni::fgXercesSchemaExternalSchemaLocation or XMLUni::fgXercesSchemaExternalNoNameSpaceSchemaLocation, depending on whether the schema has a target namespace. The second argument should be the location of the schema, expressed as a const XMLCh*. For a schema with a target namespace, this string consists of the namespace URI followed by the schema's location, separated by whitespace, just like the value of an xsi:schemaLocation attribute. Make sure to cast the second argument to void*, as explained in Recipe 14.5.

Validating an XML document against a schema using the XercesDOMParser is similar to validating a document against a DTD, assuming the schema is contained in or referenced from the target document. The only difference is that schema and namespace support must be explicitly enabled, as shown in Example 14-15.

Example 14-15. Enabling schema validation with a XercesDOMParser

XercesDOMParser parser;
parser.setValidationScheme(XercesDOMParser::Val_Always);
parser.setDoSchema(true);
parser.setDoNamespaces(true);

If you want to validate an XML document against an external schema with a target namespace, call the parser's setExternalSchemaLocation( ) method, passing a string that contains the schema's target namespace followed by the schema's location, separated by whitespace (the same syntax as an xsi:schemaLocation attribute). If you want to validate an XML document against an external schema that has no target namespace, call the parser's setExternalNoNamespaceSchemaLocation( ) method with the schema's location instead.

Similarly, to validate an XML document against a schema using a DOMBuilder, enable its namespace, validation, and schema features as follows:

DOMBuilder* parser = ...;
parser->setFeature(XMLUni::fgDOMNamespaces, true);
parser->setFeature(XMLUni::fgDOMValidation, true);
parser->setFeature(XMLUni::fgXercesSchema, true);

To validate against an external schema using a DOMBuilder, set the property XMLUni::fgXercesSchemaExternalSchemaLocation or XMLUni::fgXercesSchemaExternalNoNameSpaceSchemaLocation to the location of the schema.

For example, suppose you want to validate the document animals.xml from Example 14-1 using the schema in Example 14-16. One way to do this is to add a reference to the schema to animals.xml, as shown in Example 14-17. You can then validate the document with the SAX2 API, as shown in Example 14-13, or using DOM, as shown in Example 14-14, with the modification indicated in Example 14-15.

Example 14-16. Schema animals.xsd for the file animals.xml

 

Example 14-17. The file animals.xml, modified to contain a reference to a schema

Another way is to omit the reference to the schema and enable external schema validation. Example 14-18 shows how to do this with the DOM parser.

Example 14-18. Validating an XML document against an external schema, using DOM

/*
 * Same includes as in Example 14-14
 */

using namespace std;
using namespace xercesc;

/*
 * Define XercesInitializer as in Example 14-8
 * and CircusErrorHandler as in Example 14-7
 */

int main( )
{
    try {
        // Initialize Xerces and construct a DOM parser.
        XercesInitializer init;
        XercesDOMParser   parser;

        // Enable validation
        parser.setValidationScheme(XercesDOMParser::Val_Always);
        parser.setDoSchema(true);
        parser.setDoNamespaces(true);
        parser.setExternalNoNamespaceSchemaLocation(
            fromNative("animals.xsd").c_str( )
        );

        // Register an error handler to receive notifications
        // of schema violations
        CircusErrorHandler handler;
        parser.setErrorHandler(&handler);

        // Parse and validate.
        parser.parse("animals.xml");
    } catch (const SAXException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const XMLException& e) {
        cout << "xml error: " << toNative(e.getMessage( )) << "\n";
        return EXIT_FAILURE;
    } catch (const exception& e) {
        cout << e.what( ) << "\n";
        return EXIT_FAILURE;
    }
}

 

Discussion

Like DTDs, discussed in the previous recipe, schemas constrain XML documents. The purpose of a schema is to identify the subset of well-formed XML documents that are interesting in a certain application domain. Schemas differ from DTDs in three respects, however. First, the DTD concept and the associated notion of validity are defined in the XML specification itself, while schemas are described in a separate specification, the XML Schema recommendation. Second, while DTDs use the specialized syntax illustrated in Example 14-11, schemas are expressed as well-formed XML documents. Third, schemas are far more expressive than DTDs. Because of these last two points, schemas are widely regarded as superior to DTDs.

For example, the DTD in Example 14-11 was only able to require that veterinarian elements have exactly two attributes, name and phone, with values consisting of characters. By contrast, the schema in Example 14-16 also requires that the value of the phone attribute match the regular expression \(\d{3}\)\d{3}-\d{4}, i.e., that it have the form (ddd)ddd-dddd, where d represents an arbitrary digit. Similarly, while the DTD was only able to require that the dateOfBirth element have textual content, the schema requires that the text be of the form yyyy-mm-dd, where yyyy ranges from 0001 to 9999, mm ranges from 01 to 12, and dd ranges from 01 to 31. The ability to impose these additional requirements is a great benefit, since it shifts work from the programmer to the parser.

See Also

Recipe 14.5
