XML Hacks: 100 Industrial-Strength Tips and Tools

   

There are few issues regarding XML validation that cause as many headaches as validation of business rules (constraints on relations between element and attribute content in an XML document). This hack helps relieve that headache.

Even after the release of the new, grammar-based schema languages XML Schema and RELAX NG, it remains difficult to express restrictions on relations between the contents of various elements and attributes. This hack introduces a method that makes it possible to validate these kinds of rules by combining two XML schema languages, RELAX NG (http://www.relaxng.org/) and Schematron (http://www.ascc.net/xml/resource/schematron/).

W3C XML Schema (http://www.w3.org/XML/Schema) offers little support for co-occurrence constraints, and RELAX NG supports them only to the extent that the presence or absence of a particular element or attribute value changes the validation rules. Schematron, on the other hand, provides good support for these types of constraints. Schematron is a rule-based language that uses path expressions instead of grammars to define what is allowed in an XML document. Instead of creating a grammar for an XML document, a Schematron schema makes assertions that are applied to a specific context within the document. If an assertion fails, a diagnostic message supplied by the author of the schema is displayed.

One drawback of Schematron is that, although detailed rules are easy to define, defining structure can be cumbersome. RELAX NG is a better language for defining structure, so combining the two creates a very powerful validation mechanism.

As an example, here is a simple mathematical calculation modeled in XML (add.xml):

<addition result="3">
  <number>1</number>
  <number>2</number>
</addition>

This example shows a simple addition between two numbers, each modeled with a number element, and the result of the addition specified in the result attribute of the surrounding addition element.

A RELAX NG schema (in XML syntax) to validate this little document is very easy to write and can, for example, look like add.rng in Example 5-15.

Example 5-15. add.rng

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="addition">
      <ref name="number"/>
      <ref name="number"/>
      <attribute name="result">
        <data type="decimal"/>
      </attribute>
    </element>
  </start>
  <define name="number">
    <element name="number">
      <data type="decimal"/>
    </element>
  </define>
</grammar>

The schema defines the structure for the document as well as specifying the correct datatype for the number element and the result attribute. The problem is that the previous schema will also validate the following instance, which is structurally correct but mathematically incorrect (badadd.xml):

<addition result="5">
  <number>1</number>
  <number>2</number>
</addition>
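To make the gap concrete, here is a minimal Python sketch of roughly what add.rng checks: the element structure, and that each value parses as a decimal. The function names is_decimal and grammar_accepts are hypothetical, and the decimal test only approximates the XML Schema decimal datatype, so this is an illustration rather than a RELAX NG validator. Under those assumptions, both add.xml and badadd.xml are accepted, because nothing compares the values with each other.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def is_decimal(text):
    """Rough stand-in for the decimal datatype check (an approximation only)."""
    try:
        return Decimal((text or "").strip()).is_finite()
    except InvalidOperation:
        return False


def grammar_accepts(xml_text):
    """Hand-written approximation of add.rng: structure and datatypes only."""
    root = ET.fromstring(xml_text)
    children = list(root)
    return (
        root.tag == "addition"
        and set(root.attrib) == {"result"}
        and is_decimal(root.get("result"))
        and len(children) == 2
        and all(
            c.tag == "number" and len(c) == 0 and is_decimal(c.text)
            for c in children
        )
    )


ADD = '<addition result="3"><number>1</number><number>2</number></addition>'
BADADD = '<addition result="5"><number>1</number><number>2</number></addition>'

print(grammar_accepts(ADD))     # True
print(grammar_accepts(BADADD))  # True: structurally fine, arithmetic unchecked
```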

In RELAX NG, there is no way to specify that the value of the result attribute should equal the sum of the values in the two number elements except by faking it using value elements. By "faking it" I mean that during RELAX NG validation the actual addition does not take place, just the checking of values against a schema. Schematron, on the other hand, is very good at specifying these kinds of relationships. Before explaining how to embed Schematron rules in the RELAX NG schema, let's backtrack and briefly look at how Schematron works.

As mentioned earlier, Schematron uses path expressions to make assertions applied to a specific context within the instance document. Each assertion specifies a test condition that evaluates to either true or false. If the condition evaluates to false, a message specified by the schema author is reported as a validation message. Schematron's path expressions are XPath expressions, with various extensions provided by XSLT. This is convenient for validation because the only tool needed to validate with Schematron is an XSLT processor.

In order to define the context and the assertions, a basic Schematron schema consists of three layers: patterns, rules, and assertions. In its simple form, the pattern works as a grouping mechanism for the rules and provides a pattern name that is displayed together with the assertion message if the assertion fails. The rule specifies the context for the assertions, and the assertion itself specifies the test condition that should be evaluated. In XML terms, the pattern is defined using a pattern element, rules are defined using rule elements as children of the pattern element, and assertions are defined using assert elements as children of the rule element.

A Schematron rule for validation of the addition constraint above could look something like this (add.sch):

<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Validate calculation result">
    <sch:rule context="addition">
      <sch:assert test="@result = number[1] + number[2]">The addition result is not correct.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>

The rule has a context attribute that specifies the addition element to be the context for the assertion. The assertion has a test attribute that specifies the condition that should be evaluated. In this case, the condition is to validate that the value of the result attribute has the same value as the sum of the values in the two number elements. If this Schematron rule were applied to the erroneous XML instance badadd.xml, a validation message similar to this would be displayed:

From pattern "Validate calculation result":
   Assertion fails: "The addition result is not correct." at
      /addition[1]
      <addition result="5">...</>
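The context-plus-test idea behind this rule can be mimicked in a few lines of Python. The sketch below is not how a Schematron processor works internally (Schematron implementations of this era compile the schema to XSLT); it only hand-codes the one rule from add.sch. The function name check_addition is hypothetical, the comparison uses Python's Decimal rather than XPath's floating-point numbers, and the input is assumed to have already passed the RELAX NG structure check.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

MESSAGE = "The addition result is not correct."


def check_addition(xml_text):
    """Return one message per failed assertion; an empty list means valid."""
    root = ET.fromstring(xml_text)
    failures = []
    # rule context="addition": visit every addition element in the document
    for ctx in root.iter("addition"):
        numbers = ctx.findall("number")
        # assert test="@result = number[1] + number[2]"
        expected = Decimal(numbers[0].text.strip()) + Decimal(numbers[1].text.strip())
        if Decimal(ctx.get("result").strip()) != expected:
            failures.append(MESSAGE)
    return failures


ADD = '<addition result="3"><number>1</number><number>2</number></addition>'
BADADD = '<addition result="5"><number>1</number><number>2</number></addition>'

print(check_addition(ADD))     # []
print(check_addition(BADADD))  # ['The addition result is not correct.']
```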

So, now we have one RELAX NG schema to validate the structure and one Schematron rule to validate the calculation constraint, and the only thing left is to combine them by embedding the Schematron rule in the RELAX NG schema (dropping the sch:schema document element). This is made possible because a RELAX NG processor will ignore all elements that are not declared in the RELAX NG namespace. The combined schema will then look like this (addsch.rng):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         xmlns:sch="http://www.ascc.net/xml/schematron">
  <start>
    <element name="addition">
      <sch:pattern name="Validate calculation result">
        <sch:rule context="addition">
          <sch:assert test="@result = number[1] + number[2]">The addition result is not correct.</sch:assert>
        </sch:rule>
      </sch:pattern>
      <ref name="number"/>
      <ref name="number"/>
      <attribute name="result">
        <data type="decimal"/>
      </attribute>
    </element>
  </start>
  <define name="number">
    <element name="number">
      <data type="decimal"/>
    </element>
  </define>
</grammar>

The exact location of the embedded Schematron rule does not matter; it can be placed anywhere in the RELAX NG schema. A good location for the embedded rule is within the definition of the element that is the context for the Schematron rule, as in the combined schema, where the sch:pattern element sits directly inside the definition of the addition element. The finished RELAX NG schema with embedded Schematron rules is ready for validation, and the only thing left is an explanation of the validation process.

You can use the Topologi Schematron Validator (http://www.topologi.com/products/validator/download.php) to validate add.xml against addsch.rng (Figure 5-9). This validator not only validates against Schematron schemas, but also XML Schema, DTDs, and RELAX NG with embedded Schematron schemas. After downloading and installing the application, open it and then select the working directory for both the XML document and schema. Select the XML document add.xml and the schema addsch.rng, and then click Run. Results are displayed in a dialog box. Try it with badadd.xml to see the difference in results.

Figure 5-9. Topologi Schematron Validator

Without a validator like Topologi to validate the embedded Schematron rules, you can extract them from the RELAX NG schema and validate them separately using normal Schematron validation.

5.11.1 Pulling Schematron Out of RELAX NG

Luckily, this is very easy to do with an XSLT stylesheet called RNG2Schtrn.xsl (http://www.topologi.com/public/Schtrn_XSD/RNG2Schtrn.zip), which will merge all embedded Schematron rules and create a separate Schematron schema. This stylesheet is already in the working directory where you unzipped the file archive.

Apply this stylesheet to the RELAX NG schema that contains the embedded Schematron rules, using Xalan C++ [Hack #32]:

xalan -o newadd.sch addsch.rng RNG2Schtrn.xsl

When successful, this transformation will produce this result (newadd.sch):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron"
            xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <sch:pattern name="Validate calculation result"
               xmlns="http://relaxng.org/ns/structure/1.0">
    <sch:rule context="addition">
      <sch:assert test="@result = number[1] + number[2]">The addition result is not correct.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:diagnostics/>
</sch:schema>
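As a rough illustration of the extraction step, the Python sketch below collects every embedded sch:pattern element from a RELAX NG schema and places it under a new sch:schema root. This is not RNG2Schtrn.xsl and makes no claim to match its behavior beyond this simple case: it copies only patterns and does not produce the sch:diagnostics element seen in the output above. The function name extract_schematron is hypothetical.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://www.ascc.net/xml/schematron"
ET.register_namespace("sch", SCH_NS)  # serialize with the familiar sch: prefix


def extract_schematron(rng_text):
    """Gather embedded sch:pattern elements under a new sch:schema element."""
    rng_root = ET.fromstring(rng_text)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    # Patterns may sit anywhere in the grammar, so search the whole tree.
    for pattern in rng_root.iter(f"{{{SCH_NS}}}pattern"):
        schema.append(pattern)
    return schema


# Abbreviated copy of addsch.rng, without the XML declaration.
ADDSCH_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:sch="http://www.ascc.net/xml/schematron">
  <start>
    <element name="addition">
      <sch:pattern name="Validate calculation result">
        <sch:rule context="addition">
          <sch:assert test="@result = number[1] + number[2]">The addition result is not correct.</sch:assert>
        </sch:rule>
      </sch:pattern>
      <ref name="number"/>
      <ref name="number"/>
      <attribute name="result"><data type="decimal"/></attribute>
    </element>
  </start>
  <define name="number">
    <element name="number"><data type="decimal"/></element>
  </define>
</grammar>"""

schema = extract_schematron(ADDSCH_RNG)
print(ET.tostring(schema, encoding="unicode"))
```

Because the search walks the whole tree, the sketch finds the pattern regardless of where it was embedded in the grammar.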

Then you can use Topologi to validate add.xml against newadd.sch, or you can use Jing to do it (version 20030619 of the JAR, not jing.exe, offers provisional support for Schematron):

java -jar jing.jar newadd.sch add.xml

Figure 5-10 describes the process of extracting a Schematron schema that is embedded in a RELAX NG schema (XML syntax), and then processing the RELAX NG and Schematron schemas separately.

Figure 5-10. Processing a RELAX NG schema with embedded Schematron rules

5.11.2 See Also

  • "An Introduction to Schematron," by Eddie Robertsson, on XML.com: http://www.xml.com/pub/a/2003/11/12/schematron.html

  • "Combining RELAX NG and Schematron," by Eddie Robertsson, on XML.com: http://www.xml.com/pub/a/2004/02/11/relaxtron.html

Eddie Robertsson
