Parsing XML
This chapter introduces two ways of parsing XML data, available from Qt's XML module. We demonstrate event-driven parsing with SAX, the Simple API for XML, and tree-style parsing with DOM, the Document Object Model.
14.1 The Qt XML Module
14.2 Event-Driven Parsing
14.3 XML, Tree Structures, and DOM
XML is an acronym for eXtensible Markup Language. It is a markup language similar to HTML (HyperText Markup Language), but with stricter syntax and no semantics (i.e., no meanings associated with the tags).
XML's stricter syntax is in strong contrast to HTML. For example:
- Each XML <tag> must have a closing </tag>, or be self-closing, like this: <br/>.
- XML tags are case sensitive: <tag> is not the same as <Tag>.
- Characters such as > and < that are not actually part of a tag must be replaced by entity references such as &gt; and &lt; in an XML document to avoid confusing the parser.
Example 14.1 is an HTML document that does not conform to XML rules.
Example 14.1. src/xml/html/testhtml.html
<html>
<head><title>This is a title</title></head>
<body>
<p>This is a paragraph. What do you think of that?
Html makes use of unterminated line-breaks: <br>
And those do not make XML parsers happy. <br>
<ul>
<li>HTML is not very strict.
<li>An unclosed tag doesn't bother HTML parsers one bit.
</ul>
</body>
</html>
If we combined XML syntax with HTML element semantics, we would get a language called XHTML. Example 14.2 shows Example 14.1 rewritten as XHTML.
Example 14.2. src/xml/html/testxhtml.html
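<html>
<head><title>This is a title</title></head>
<body>
<p>This is a paragraph. What do you think of that?</p>
<p>
Html self-terminating linebreaks are ok: <br/>
They don't confuse the XML parser. <br/>
</p>
<ul>
<li>This is a proper list item</li>
<li>This is another list item</li>
</ul>
</body>
</html>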
XML is a whole class of file formats that is understandable and editable by humans as well as by programs. XML has become a popular format for storing and exchanging data from Web applications. It is also a natural fit for representing hierarchical (tree-like) information, which includes most documentation.
Many applications (e.g., Qt Designer, Umbrello, Dia) use an XML file format for storing data. Qt Designer's .ui files use XML to describe the layout of Qt widgets in a GUI. The book you are reading now is written in a flavor of XML called Slacker's DocBook.[1] It's like DocBook,[2] an XML language for writing books, but it adds some shorthand tags from XHTML and custom tags for describing courseware.
[1] http://slackerdoc.tigris.org/
[2] http://www.docbook.org
An XML document is composed of nodes. Elements are nodes and look like this: <tag> text or elements </tag>. An opening tag can contain attributes. An attribute has the form: name="value". Elements nested inside one another form a parent-child tree structure.
Example 14.3. src/xml/sax1/samplefile.xml
Intro to XML
This is a paragraph
Look at this example code below:
In Example 14.3, <ul> has two <li> children, and its parent is a <section>. Elements with no children can be self-terminated with a />, i.e., <include/>. Some elements such as <li> and <include> have attributes. Indenting nested elements helps readability, but extra whitespace is ignored by most parsers.
How many direct children are there of the <section>?