Event-Driven Parsing
Working with SAX-style XML parsers means doing event-driven programming: the flow of execution depends entirely on the data being read from a file. Our code is called by code inside the Qt library, rather than the other way around. This inversion of control makes the thread of execution more difficult to trace.
Invoking the parser involves creating a reader and a handler, hooking them up, and calling parse(), as shown in Example 14.4.
Example 14.4. src/xml/sax1/tagreader.cpp
    #include "structureparser.h"
    #include <QFile>
    #include <QXmlInputSource>
    #include <QXmlSimpleReader>
    #include <QDebug>

    int main( int argc, char **argv ) {
        if ( argc < 2 ) {
            qDebug() << QString("Usage: %1 ").arg(argv[0]);
            return 1;
        }
        for ( int i=1; i < argc; ++i ) {
            QFile xmlFile( argv[i] );
            QXmlInputSource source( &xmlFile );
            MyHandler handler;                      // (1)
            QXmlSimpleReader reader;                // (2)
            reader.setContentHandler( &handler );   // (3)
            reader.parse( source );                 // (4)
        }
        return 0;
    }

    (1) An instance of a custom class derived from QXmlContentHandler
    (2) The generic parser
    (3) Hook the objects up together.
    (4) Start parsing.
The interface for parsing XML is described in the abstract base class QXmlContentHandler. We call this a passive interface because its methods are called by the parser, not by our own code. Qt provides QXmlSimpleReader, which reads an XML file and generates parse events, calling the corresponding methods on a content handler as each event occurs. Figure 14.1 shows the main classes involved.
Figure 14.1. Abstract and concrete SAX classes
For the reader to provide any useful information, it needs an object to receive parse events. This object, a parse event handler, must implement a published interface, so it can "plug" into the parser, as shown in Figure 14.2.
Figure 14.2. Plug-in component architecture
The handler derives (directly or indirectly) from QXmlContentHandler. The virtual methods get called by the parser when it encounters various elements of the XML file during parsing. This is event-driven programming: You do not call these functions directly.
Example 14.5. src/xml/sax1/structureparser.h
    #ifndef STRUCTUREPARSER_H
    #define STRUCTUREPARSER_H

    #include <QXmlDefaultHandler>
    class QString;

    // Receives parse events from QXmlSimpleReader.
    class MyHandler : public QXmlDefaultHandler {
      public:
        bool startDocument();
        bool startElement( const QString & namespaceURI,
                           const QString & localName,
                           const QString & qName,
                           const QXmlAttributes & atts);
        bool characters(const QString& text);
        bool endElement( const QString & namespaceURI,
                         const QString & localName,
                         const QString & qName );
      private:
        QString indent;
    };

    #endif
These passively called functions are often referred to as callbacks. They respond to events generated by the parser. The client code of MyHandler is the QXmlSimpleReader class, inside the Qt XML Module.
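The direction of these calls can be sketched without Qt. The following is a minimal illustration in standard C++, not the Qt API: `EventHandler`, `ToyReader`, `TraceHandler`, and `traceEvents()` are hypothetical names, and the toy reader assumes well-formed input made only of tags without attributes and plain text. What it is meant to show is that the reader owns the loop and the handler merely reacts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a content handler interface (not a Qt class).
class EventHandler {
public:
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Hypothetical toy reader: understands only <name>, </name>, and plain text.
class ToyReader {
public:
    void setContentHandler(EventHandler* handler) { m_handler = handler; }

    // The reader drives the loop and calls back into the handler.
    bool parse(const std::string& input) {
        if (m_handler == nullptr) return false;
        std::size_t pos = 0;
        while (pos < input.size()) {
            if (input[pos] == '<') {
                std::size_t close = input.find('>', pos);
                if (close == std::string::npos) return false;  // unterminated tag
                std::string tag = input.substr(pos + 1, close - pos - 1);
                if (!tag.empty() && tag[0] == '/')
                    m_handler->endElement(tag.substr(1));
                else
                    m_handler->startElement(tag);
                pos = close + 1;
            } else {
                std::size_t next = input.find('<', pos);
                if (next == std::string::npos) next = input.size();
                m_handler->characters(input.substr(pos, next - pos));
                pos = next;
            }
        }
        return true;
    }

private:
    EventHandler* m_handler = nullptr;
};

// A handler that records each callback it receives, in order.
class TraceHandler : public EventHandler {
public:
    void startElement(const std::string& name) override { events.push_back("start:" + name); }
    void characters(const std::string& text) override { events.push_back("text:" + text); }
    void endElement(const std::string& name) override { events.push_back("end:" + name); }
    std::vector<std::string> events;
};

// Creates a reader and a handler, hooks them up, and parses,
// mirroring the steps in Example 14.4.
std::vector<std::string> traceEvents(const std::string& xml) {
    TraceHandler handler;
    ToyReader reader;
    reader.setContentHandler(&handler);
    reader.parse(xml);
    return handler.events;
}
```

Calling `traceEvents("<p>hi</p>")` returns the three recorded events in the order the reader issued them. Nothing in `TraceHandler` decides that order; it is determined entirely by the input.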
ContentHandler or DefaultHandler?
QXmlContentHandler is an abstract class with many pure virtual methods, all of which must be overridden by any concrete derived class. Qt has provided a concrete class named QXmlDefaultHandler that implements the base class pure virtual methods as empty do-nothing bodies. You can think of this class as a concrete base class. Handlers derived from this class are not required to override all of the methods but must override some in order to accomplish anything.
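The relationship between the two classes can be sketched in plain C++. This is an illustration under assumed names (`ContentEvents`, `DefaultEvents`, `ElementCounter`, and `countElementsInSample()` are hypothetical and not part of Qt): the interface declares pure virtual callbacks, the default class supplies empty bodies, and the concrete handler overrides only the one it needs. The `override` keyword used here is a C++11 feature that makes the compiler reject a function whose signature does not match the base class.

```cpp
#include <string>

// Hypothetical abstract interface: every callback is pure virtual.
class ContentEvents {
public:
    virtual ~ContentEvents() = default;
    virtual bool startDocument() = 0;
    virtual bool startElement(const std::string& name) = 0;
    virtual bool characters(const std::string& text) = 0;
    virtual bool endElement(const std::string& name) = 0;
};

// Hypothetical "default handler": concrete, with do-nothing bodies.
class DefaultEvents : public ContentEvents {
public:
    bool startDocument() override { return true; }
    bool startElement(const std::string&) override { return true; }
    bool characters(const std::string&) override { return true; }
    bool endElement(const std::string&) override { return true; }
};

// Overrides a single callback; the rest fall through to DefaultEvents.
class ElementCounter : public DefaultEvents {
public:
    bool startElement(const std::string&) override {
        ++count;
        return true;
    }
    int count = 0;
};

// Simulates the calls a reader would make for <a><b>x</b></a>.
int countElementsInSample() {
    ElementCounter counter;
    ContentEvents& handler = counter;  // the reader sees only the interface
    handler.startDocument();
    handler.startElement("a");
    handler.startElement("b");
    handler.characters("x");
    handler.endElement("b");
    handler.endElement("a");
    return counter.count;
}
```

Deriving `ElementCounter` directly from `ContentEvents` would force it to define all four callbacks before it could be instantiated, which is the inconvenience a default handler removes.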
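If we do not properly override each handler method that our application relies on, the corresponding QXmlDefaultHandler method, which does nothing, is called instead. An override must match the base class signature exactly; otherwise it declares a new, unrelated function that the parser never calls. In the body of a handler function, you can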
- Store the parse results in a data structure
- Create objects according to certain rules
- Print or transform the data in a different format
- Do other useful things
See Example 14.6.
Example 14.6. src/xml/sax1/myhandler.cpp
    [ . . . . ]

    QTextStream cout(stdout, QIODevice::WriteOnly);

    bool MyHandler::startDocument() {
        indent = "";
        return true;
    }

    // Print character data with embedded newlines removed.
    bool MyHandler::characters(const QString& text) {
        QString t = text;
        cout << t.remove('\n');
        return true;
    }

    // Start each element on its own line, in the form \name(attr=value){
    bool MyHandler::startElement( const QString&, const QString&,
                                  const QString& qName,
                                  const QXmlAttributes& atts) {
        QString str = QString("\n%1\\%2").arg(indent).arg(qName);
        cout << str;
        if (atts.length()>0) {
            // Only the first attribute is printed.
            QString fieldName = atts.qName(0);
            QString fieldValue = atts.value(0);
            cout << QString("(%1=%2)").arg(fieldName).arg(fieldValue);
        }
        cout << "{";
        indent += "    ";
        return true;
    }

    // Undo one level of indentation and close the brace.
    bool MyHandler::endElement( const QString&, const QString&,
                                const QString& ) {
        indent.remove( 0, 4 );
        cout << "}";
        return true;
    }

    [ . . . . ]
The QXmlAttributes object passed into the startElement() function holds the name="value" attribute pairs that appeared in the element's start tag. Its entries can be read by index, as Example 14.6 does with qName(0) and value(0), or looked up by name.
As it processes the file, the parse() function calls characters(), startElement(), and endElement() as these "events" are encountered in the file. In particular, each time a run of ordinary characters (between a start tag and an end tag) is encountered, it is passed as a QString to the characters() function.
We ran the previous program on Example 14.3 and it transformed that document into Example 14.7, something that looks a little like LaTeX, another document format.
Example 14.7. src/xml/sax1/tagreader-output.txt
    \section(id=xmlintro){
        \title{ Intro to XML }
        \para{ This is a paragraph }
        \ul{
            \li{ This is an unordered list item. }
            \li(c=textbook){ This only shows up in the textbook } }
        \p{ Look at this example code below: }
        \include(src=xmlsamplecode.cpp){}}
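The handler in Example 14.6 prints as it goes. Storing the parse results in a data structure instead might look like the sketch below, written in standard C++ rather than against the Qt classes. `PathCollector` and `collectSample()` are hypothetical names, and the sketch assumes that a reader may deliver the text of one element in more than one characters() call, which is why it appends rather than assigns.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical handler that records text by element path instead of printing.
class PathCollector {
public:
    // Entering an element pushes its name onto the stack of open elements.
    void startElement(const std::string& name) { m_stack.push_back(name); }

    void characters(const std::string& text) {
        // Append rather than assign: text may arrive in more than one chunk.
        m_text[currentPath()] += text;
    }

    // Leaving an element pops the most recently opened name.
    void endElement(const std::string&) {
        if (!m_stack.empty()) m_stack.pop_back();
    }

    const std::map<std::string, std::string>& text() const { return m_text; }

private:
    // Joins the open element names into a path such as /section/title.
    std::string currentPath() const {
        std::string path;
        for (const std::string& part : m_stack) path += "/" + part;
        return path;
    }

    std::vector<std::string> m_stack;
    std::map<std::string, std::string> m_text;
};

// Replays the callbacks a reader could issue for
// <section><title>Intro to XML</title><para>This is a paragraph</para></section>
std::map<std::string, std::string> collectSample() {
    PathCollector h;
    h.startElement("section");
    h.startElement("title");
    h.characters("Intro ");   // first chunk of the title text
    h.characters("to XML");   // second chunk of the same text
    h.endElement("title");
    h.startElement("para");
    h.characters("This is a paragraph");
    h.endElement("para");
    h.endElement("section");
    return h.text();
}
```

The stack of open element names plays the same role as the indent string in Example 14.6: it is the only state the handler needs to know where in the document the current event occurred.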