XML Processing Tools
Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtended Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, it has been adopted in roles such as a standard database and Internet content representation by many companies. As an object-oriented scripting language, Python mixes remarkably well with XML's core notion of structured document interchange, and promises to be a major player in the XML arena.
XML is based upon a tag syntax familiar to web page writers. Python's xmllib library module includes tools for parsing XML. In short, this XML parser is used by defining a subclass of an XMLParser Python class, with methods that serve as callbacks to be invoked as various XML structures are detected. Text analysis is largely automated by the library module. This module's source code, file xmllib.py in the Python library, includes self-test code near the bottom that gives additional usage details. Python also ships with a standard HTML parser, htmllib, that works on similar principles and is based upon the sgmllib SGML parser module.
Unfortunately, Python's XML support is still evolving, and describing it is well beyond the scope of this book. Rather than going into further details here, I will instead point you to sources for more information:
Standard library
First off, be sure to consult the Python library manual for more on the standard library's XML support tools. At the moment, this includes only the xmllib parser module, but may expand over time.
PyXML SIG package
At this writing, the best place to find Python XML tools and documentation is at the XML SIG (Special Interest Group) web page at http://www.python.org (click on the "SIGs" link near the top). This SIG is dedicated to wedding XML technologies with Python, and publishes a free XML tools package distribution called PyXML. That package contains tools not yet part of the standard Python distribution, including XML parsers implemented in both C and Python, a Python implementation of SAX and DOM (the XML Document Object Model), a Python interface to the Expat parser, sample code, documentation, and a test suite.
Third-party tools
You can also find free, third-party Python support tools for XML on the Web by following links at the XML SIGs web page. These include a DOM implementation for CORBA environments (4DOM) that currently supports two ORBs (ILU and Fnorb) and much more.
Documentation
As I wrote these words, a book dedicated to XML processing with Python was on the eve of its publication; check the books list at http://www.python.org or your favorite book outlet for details.
Given the rapid evolution of XML technology, I wouldn't wager on any of these resources being up to date a few years after this edition's release, so be sure to check Python's web site for more recent developments on this front.
|