Parsing XML

This chapter introduces two ways of parsing XML data, available from Qt's XML module. We demonstrate event-driven parsing with SAX, the Simple API for XML, and tree-style parsing with DOM, the Document Object Model.

14.1 The Qt XML Module

14.2 Event-Driven Parsing

14.3 XML, Tree Structures, and DOM

XML is an acronym for eXtensible Markup Language. It is a markup language similar to HTML (HyperText Markup Language), but with stricter syntax and no semantics (i.e., no meanings associated with the tags).

XML's stricter syntax stands in strong contrast to HTML's. Example 14.1, for instance, is an HTML document that does not conform to XML rules.

Example 14.1. src/xml/html/testhtml.html

This is a title

This is a paragraph. What do you think of that?

Html makes use of unterminated line-breaks:

And those do not make XML parsers happy.

If we combined XML syntax with HTML element semantics, we would get a language called XHTML. Example 14.2 shows Example 14.1 rewritten as XHTML.

Example 14.2. src/xml/html/testxhtml.html

This is a title

This is a paragraph. What do you think of that?

Html self-terminating linebreaks are ok:

They don't confuse the XML parser.

 

 

 

XML is a whole class of file formats that can be understood and edited by humans as well as by programs. XML has become a popular format for storing and exchanging data from Web applications. It is also a natural choice for representing hierarchical (tree-like) information, which includes most documentation.

Many applications (e.g., Qt Designer, Umbrello, Dia) use an XML file format for storing data. Qt Designer's .ui files use XML to describe the layout of Qt widgets in a GUI. The book you are reading now is written in a flavor of XML called Slacker's DocBook.[1] It's like DocBook,[2] an XML language for writing books, but it adds some shorthand tags from XHTML and custom tags for describing courseware.

[1] http://slackerdoc.tigris.org/

[2] http://www.docbook.org

An XML document is composed of nodes. Elements are nodes and look like this: <tag> text or elements </tag>. An opening tag can contain attributes. An attribute has the form name="value". Elements nested inside one another form a parent-child tree structure.

Example 14.3. src/xml/sax1/samplefile.xml

Intro to XML This is a paragraph

 

  • This is an unordered list item.
  • This only shows up in the textbook

Look at this example code below:

In Example 14.3,