Validating an XML Document
Credit: Mauro Cicio
Problem
You want to check whether an XML document conforms to a certain schema or DTD.
Solution
Unfortunately, as of this writing there are no stable, pure Ruby libraries that do XML validation. You'll need to install a Ruby binding to a C library. The easiest one to use is the Ruby binding to the GNOME libxml2 toolkit. (There are actually two Ruby bindings to libxml2, so don't get confused: we're referring to the one you get when you install the libxml-ruby gem.)
To validate a document against a DTD, create a DTD object and pass it into Document#validate. To validate against an XML Schema, pass in a Schema object instead.
Consider the following DTD, for a cookbook like this one:
require 'rubygems'
require 'libxml'

dtd = XML::Dtd.new(%{ })
Here's an XML document that looks like it conforms to the DTD:
open('cookbook.xml', 'w') do |f|
f.write %{
A difficult/common problem
But does it really? We can tell for sure with Document#validate:
document = XML::Document.file('cookbook.xml')
document.validate(dtd) # => true
Here's a Schema definition for the same document. We can validate the document against the schema by making it into a Schema object and passing that into Document#validate:
schema = XML::Schema.from_string %{
Discussion
Programs that use XML validation are more robust and less complicated than nonvalidating versions. Before starting work on a document, you can check whether or not it's in the format you expect. Most services that accept XML as input don't have forgiving parsers, so you must validate your document before submitting it or it might fail without you even noticing.
One of the most popular and complete XML libraries around is the GNOME Libxml2 library. Despite its name, it works fine outside the GNOME platform, and has been ported to many different OSes. The Ruby project libxml (http://libxml.rubyforge.org) is a Ruby wrapper around the GNOME Libxml2 library. The project is not yet in a mature state, but it's very active and the validation features are definitely usable. Not only does libxml support validation and a complete range of XML manipulation techniques, it can also improve your program's speed by an order of magnitude, since it's written in C, whereas REXML is pure Ruby.
Don't confuse the libxml project with the libxml library. The latter is part of the XML::Tools project. It binds against the GNOME Libxml2 library, but it doesn't expose that library's validation features. If you try the example code above but can't find the XML::Dtd or the XML::Schema classes, then you've got the wrong binding. If you installed the libxml-ruby package on Debian GNU/Linux, you've got the wrong one. You need the one you get by installing the libxml-ruby gem. Of course, you'll need to have the actual GNOME Libxml2 library installed as well.
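One quick way to tell which binding you ended up with is to check for the validation classes at runtime. The following is only a sketch: validation_classes_present? is a hypothetical helper written for this illustration, not part of either binding, and the LibXML::XML namespace is an assumption about newer libxml-ruby releases (the code in this recipe uses the older top-level XML module).

```ruby
# Hypothetical helper (not part of either binding): reports whether a
# namespace exposes the Dtd and Schema classes needed for validation.
def validation_classes_present?(namespace)
  return false unless namespace.is_a?(Module)
  [:Dtd, :Schema].all? { |name| namespace.const_defined?(name, false) }
end

# Try to load a binding; if none is installed, the constants stay undefined.
begin
  require 'libxml'
rescue LoadError
  # No library named 'libxml' could be loaded at all.
end

# Assumption: newer libxml-ruby releases nest their classes under LibXML::XML,
# while the releases this recipe was written against define a top-level XML.
namespace =
  if defined?(LibXML::XML) then LibXML::XML
  elsif defined?(XML) then XML
  end

if validation_classes_present?(namespace)
  puts 'Validating binding found.'
else
  puts 'No validating binding loaded; install the libxml-ruby gem.'
end
```

Because the helper only inspects constants, it never parses a document, so it is safe to run before any of the validation code in the Solution.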
See Also
- The Ruby libxml project page (http://www.rubyforge.org/projects/libxml)
- The other Ruby libxml binding (the one that doesn't do validation) is part of the XML::Tools project (http://rubyforge.org/projects/xml-tools/); don't confuse the two!
- The GNOME libxml project homepage (http://xmlsoft.org/)
- Refer to http://www.w3.org/XML for the difference between a DTD and a Schema