What Is XML?

2017-11-03 09:05:02

XML as we know it today can trace its roots all the way back to the 1980s. Its origins lie in Standard Generalized Markup Language (SGML), which was developed in the early 1980s and was widely used for large documentation projects. SGML led to Hypertext Markup Language (HTML) in the early 1990s, which, as you know, became the foundation of the World Wide Web.

HTML was designed to describe the appearance of data, but not to provide a facility to describe the structure of data. It's this fundamental limitation that led to the development of XML. In 1996, work began on Extensible Markup Language, an attempt to meld the best parts of HTML and SGML into a powerful and flexible language.

Unlike HTML, XML is a metalanguage that describes the structure of data, rather than its presentation, making it ideal for Web-based applications. XML became a W3C Recommendation in early 1998 and rapidly became the premier language for creating structured documents. The following is a list of some of the many things that XML is well suited for:

Sharing information between organizations

Sharing information between disparate applications

Importing and exporting information

Web services

Mobile computing platforms such as PDAs

XML Overview

As mentioned earlier, XML is a set of standards that enables you to create structured documents through the use of tags (also known as elements) that you define. The following example illustrates an XML document that contains a list of some of my favorite books:

Atlas Shrugged Ayn Rand 1957 Fountainhead Ayn Rand 1943 Anthem Ayn Rand 1938 1984 George Orwell 1949 Animal Farm George Orwell 1946 Brave New World Aldous Huxley 1932

As you can see, I've defined my own tags that give meaning to the data but don't describe the presentation of the data as HTML would. When we're ready to present the data to a user, we'll use another technology to control the format.
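To make the idea of self-defined tags concrete, here is a minimal sketch of such a document being read in Python. The tag names `books`, `book`, `title`, `author`, and `year` are assumptions chosen for illustration, not necessarily the names used in the listing above, and `load_books` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names below are assumptions for illustration.
BOOKS_XML = """\
<books>
  <book><title>Atlas Shrugged</title><author>Ayn Rand</author><year>1957</year></book>
  <book><title>1984</title><author>George Orwell</author><year>1949</year></book>
  <book><title>Brave New World</title><author>Aldous Huxley</author><year>1932</year></book>
</books>
"""


def load_books(xml_text):
    """Return a list of (title, author, year) tuples from the document."""
    root = ET.fromstring(xml_text)
    return [
        (book.findtext("title"), book.findtext("author"), int(book.findtext("year")))
        for book in root.findall("book")
    ]


if __name__ == "__main__":
    for title, author, year in load_books(BOOKS_XML):
        print(f"{title} by {author} ({year})")
```

Notice that nothing in the markup says how a title should look on screen; the tags only say what each piece of data is.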

While you define your own tags, the rules for doing so are much more stringent than HTML. Table 21.1 illustrates some of the major differences between the two languages.

Table 21.1. Some Key Differences Between XML and HTML

XML	HTML
Tags (elements) are case sensitive.	Tags are not case sensitive.
An opening tag must have a closing tag.	Closing tags aren't always required; for example, `<br>`.
White space is not ignored.	White space is ignored.
Attribute values must be in quotes; for example, `ssn="111-22-3333"`.	Quotes aren't required for attributes; for example, `border=1`.
Tags (elements) can't overlap; for example, `<b><i>Galt</b></i>` is not allowed.	Tags can overlap; for example, `<b><i>Galt</b></i>`.
An empty element must be specifically denoted; for example, `<br />`.	A single tag is considered an empty element; for example, `<br>`.
There can be only one root element in an XML document.

An XML document has three basic parts:

Prolog

Body

Epilog

XML Prolog

The prolog of an XML document is optional and comprises three parts:

XML declaration

Comments

Document type declaration (which references a DTD)

XML Declaration

The XML declaration is used to specify global information about the current XML document. Although not strictly required, it's considered good form to include an XML declaration as the first line in an XML document. An XML declaration specifies that the document is an XML document much the same way that the `<html>` tag specifies that a document is an HTML document. The following is an example of an XML declaration:

`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`

As you can see in the preceding example, the declaration specifies additional information about the XML document. The first attribute, version, specifies the version of the XML specification that the document conforms to. Version 1.0 is by far the most widely used; XML 1.1 exists but is rarely needed.

The second attribute used in the XML declaration, encoding, specifies the type of character encoding used in the document. As mentioned earlier, XML is fully Unicode compliant, and the UTF-8 designation means that the document is stored in the UTF-8 encoding of Unicode.

The final attribute, standalone, indicates whether the document is self-contained (`yes`) or depends on declarations in an external document such as an external DTD (`no`).
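If you want to see these three attributes the way a parser sees them, the following sketch uses the `XmlDeclHandler` callback of Python's built-in expat bindings. The sample document and the helper name `read_declaration` are assumptions made up for this illustration.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the version, encoding, and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk of input
    return found


# Hypothetical sample document used only for this sketch.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>hello</note>'

if __name__ == "__main__":
    print(read_declaration(SAMPLE))
```

If the document has no declaration at all, the handler is never called and the function returns an empty dictionary, which is consistent with the declaration being optional.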

Comments

After the XML declaration has been added to the prolog, you can add comments and processing instructions to the XML document, if necessary.

Comments enable you to add descriptive information to the XML that can help you and other developers understand and document the file. XML comments use the same format as HTML comments, as shown here:

`<!-- This is a comment -->`

Processing instructions enable you to supply special instructions to the XML parser. The most common usage of processing instructions is to link your XML file to a style sheet so that it can control the presentation of the XML. Extensible Stylesheet Language (XSL) is covered later in this chapter in the section titled "XSL."

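Processing instructions use the form `<?target instruction?>`, where the target names the application that the instruction is intended for.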

Document Type Declaration

A document type declaration associates an XML document with a document type definition (DTD), a set of rules that the structure of the XML document must abide by. For example, a DTD can be used to define whether a particular element is required, which child elements go inside which parent elements, and the type of data an element may contain.

There are two types of DTD: internal and external. An internal DTD is actually included in the XML file itself, whereas an external DTD is a separate file, usually named with the .DTD extension.

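The following code snippet illustrates how you can include a reference to an external DTD in the prolog of your XML document, where `root-element` stands for the name of the document's root element and `filename.dtd` for the location of the DTD file:

`<!DOCTYPE root-element SYSTEM "filename.dtd">`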

Although a DTD isn't required, it's normally a good idea to use one so that the rules your XML document must follow are published and can be shared with other developers who need to interface with your XML data. It can be especially helpful to use a DTD when exchanging information with another organization so that they can test the validity of their XML against the DTD. Validity is covered later in this chapter in the section titled "Validity."

XML Content

The body of an XML document is where the actual data is stored, and you'll find that it looks much like an HTML document. The body of an XML document is composed of elements and attributes.

An element, also known as a tag, describes a unit of data. There are a few things you need to know about elements. First, each element consists of three parts: an opening tag, some data, and a closing tag. For example:

`<element>John Galt</element>`

If you're familiar with HTML, this should look familiar. `<element>` is the opening tag, John Galt is the data, and `</element>` is the closing tag.

You also need to note that to create a well-formed XML document, you must have a root element that contains all the other elements. For example:

`<root><element>John Galt</element><element>Hank Rearden</element></root>`

In the preceding example, `<root>` is the root element.

NOTE

You can denote an empty element (that is, one that has no data) by using a shorthand version of the element in the format `<element />`.

Much as they do in HTML, attributes enable you to specify additional information about an element. Using attributes in your XML is fairly simple. You add them as name/value pairs in the opening tag of an element in the format `name="value"`. For example:

`<element ssn="111-22-3333">John Galt</element>`

In the preceding example, ssn is the name of the attribute, whereas "111-22-3333" is the value of the attribute. It's important to note that you must follow the format shown here. Unlike in HTML, the value must always be enclosed in either single or double quotes. The following snippet illustrates incorrect attribute syntax:

`<element ssn=111-22-3333>John Galt</element>`
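The quoting rule is easy to confirm with a parser. The following is a small sketch, assuming a hypothetical element named `person`; only the `ssn` attribute and its value come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name "person"; only the ssn attribute comes from the text.
QUOTED = '<person ssn="111-22-3333">John Galt</person>'
UNQUOTED = '<person ssn=111-22-3333>John Galt</person>'


def read_ssn(xml_text):
    """Return the ssn attribute, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("ssn")
    except ET.ParseError:
        # The parser refuses the whole document, not just the bad attribute.
        return None


if __name__ == "__main__":
    print(read_ssn(QUOTED))
    print(read_ssn(UNQUOTED))
```

The unquoted version doesn't produce a partial result: the parser rejects the entire document.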

XML Epilog

The epilog of an XML document is used for any additional comments or processing instructions that must be included after the closing root element.

Well-Formedness

As mentioned earlier, XML documents must follow very strict rules governing their syntax. Documents that follow these rules are said to be well-formed, meaning that an XML parser can read them.

To be well-formed, an XML document must meet the following requirements:

An XML declaration is optional, but if one is present, it must be the very first thing in the document.

Every opening element must have a corresponding closing element.

Every document must have a single root element that serves as the container for all other elements in the document.

XML is case sensitive, and the opening and closing tags of an element must match exactly. For example, `<Element>Atlas Shrugged</element>` is an example of invalid syntax.

All attributes must have the value portion of the attribute enclosed in single (') or double quotes ("). An element may have an unlimited number of attributes.

Empty elements are allowed and are indicated as `<element />`.

The hierarchy of elements in the document must be nested properly, and overlapping is strictly prohibited. The following is an example of incorrect nesting:

`<outer>KISS <inner>Alive!</outer></inner>`

Comments or remark statements are allowed and use the same format as HTML comment statements, as shown here:

`<!-- This is a comment -->`

Certain reserved characters must be replaced with entities. Although you can often get by in HTML without using entities, they're a must in XML if you want to be able to parse the data correctly. Table 21.2 shows the most common characters and the entities you need to use to replace them.

Table 21.2. Common XML Entities

Character	Entity
`>`	`&gt;`
`<`	`&lt;`
`"`	`&quot;`
`'`	`&apos;`
`&`	`&amp;`
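In practice you rarely replace these characters by hand. As one possible approach, the sketch below uses `escape` and `unescape` from Python's `xml.sax.saxutils` module; the helper names `to_xml_text` and `from_xml_text` are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, <, and > by default; quotes are added through the
# optional entities mapping, which matters inside attribute values.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw):
    """Replace the five reserved characters with their predefined entities."""
    return escape(raw, QUOTE_ENTITIES)


def from_xml_text(escaped):
    """Reverse to_xml_text()."""
    return unescape(escaped, QUOTE_CHARACTERS)


if __name__ == "__main__":
    print(to_xml_text('a < b & "c"'))
```

The ampersand is replaced first when escaping and last when unescaping, so text that already contains something like `&lt;` survives a round trip unchanged.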

An XML document that meets all these rules is said to be well-formed, meaning that it should be easily readable by an XML parser. Documents that aren't well-formed create fatal errors that must be corrected before an XML parser can properly read the document.
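A quick way to get a feel for these rules is to hand a few snippets to a parser and see which ones it accepts. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the sample snippets and their tag names are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser can read the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, one per rule discussed above.
SAMPLES = {
    "matched tags": "<title>Atlas Shrugged</title>",
    "case mismatch": "<Title>Atlas Shrugged</title>",
    "overlapping tags": "<artist>KISS <album>Alive!</artist></album>",
    "two root elements": "<a/><b/>",
    "unescaped ampersand": "<title>Crime & Punishment</title>",
    "empty element": "<br />",
}

if __name__ == "__main__":
    for label, snippet in SAMPLES.items():
        print(f"{label}: {is_well_formed(snippet)}")
```

Only the first and last snippets are accepted. Each of the others breaks exactly one rule, and each failure is fatal to the whole parse.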

Validity

Every XML document must be well-formed to be useful, but XML documents that use a DTD can be tested for validity, which is to say that the XML complies with the rules defined in the DTD.

It's important to remember that a well-formed XML document isn't necessarily valid, but a valid XML document is necessarily well-formed.

XSL

As you've learned, XML specifies the structure of data, but provides no control of the presentation of it, which is where XSL (Extensible Stylesheet Language) comes into play.

XSL enables you to define formatting rules that control the presentation of your XML documents. By using an XSLT (XSL Transformations) engine, you can convert your XML to HTML, making it possible to display your XML documents in a user-friendly, aesthetically pleasing format in virtually any medium.

XSL goes way beyond the scope of this chapter. For more information about this important topic, please visit http://www.w3.org/TR/xsl/.

XML Parsers

An XML parser is a tool that enables you to programmatically access and process the contents of an XML document. There are two types of parsers: non-validating and validating parsers. As you might expect from its name, a non-validating parser reads the XML, but doesn't check whether the XML is valid based on a DTD.

On the other hand, a validating parser uses the DTD associated with a document to confirm that the XML follows the rules defined in the DTD. Microsoft Internet Explorer 6.x contains a validating parser. If you attempt to view an invalid XML file that specifies a DTD, you'll receive an error message specifying the error.

The XML parser is very important to you as a developer because it not only checks your XML document for well-formedness and validity, it also provides a way for you to programmatically manipulate XML documents using application programming interfaces (APIs). There are currently two popular APIs: Document Object Model (DOM) and Simple API for XML (SAX). Each API has its benefits.

DOM

The Document Object Model API is designed as a platform-neutral interface that enables you to manipulate the content, structure, and style of XML documents using a tree-style schema. The root element of the document becomes the root node of the tree, and each XML element in the document becomes a node in the tree based on its hierarchy in the XML document.

When using DOM, the entire XML document is loaded and the tree structure is built in memory, making it quick and easy to navigate through the hierarchy of an XML document.
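As a rough illustration of the tree model, the following sketch uses `xml.dom.minidom`, the small DOM implementation in Python's standard library (not the Domino classes covered later). The sample document and the helper names `outline` and `names_in` are assumptions.

```python
from xml.dom import minidom

# Hypothetical sample document; the tag names are assumptions.
DOC = "<employees><name>John Galt</name><name>Hank Rearden</name></employees>"


def outline(node, depth=0):
    """Return one indented line per element, following the tree structure."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


def names_in(xml_text):
    """Load the whole document into memory, then pick out the name elements."""
    dom = minidom.parseString(xml_text)
    root = dom.documentElement  # the root element is the root node of the tree
    return [node.firstChild.data for node in root.getElementsByTagName("name")]


if __name__ == "__main__":
    print("\n".join(outline(minidom.parseString(DOC).documentElement)))
    print(names_in(DOC))
```

Because the whole tree is in memory, you can move from any node to its parent, children, or siblings in any order. The price is memory use that grows with the size of the document.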

DOM, as it applies specifically to Domino, is covered in somewhat more depth later in this chapter. For more specific information about DOM, please see http://www.w3.org/DOM/.

SAX

The Simple API for XML (SAX) was developed to provide a simple, lightweight API for handling XML documents. Although DOM is currently the W3C recommendation, SAX is rapidly becoming the de facto standard for server-to-server XML processing.

The fundamental difference between DOM and SAX is that SAX uses an event-driven model rather than a tree-based model. As an XML document is processed, events are generated for each element and passed to an event handler for processing.
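The event-driven model can be sketched with Python's `xml.sax` module. The handler class `NameCollector`, the helper `collect_names`, and the sample tag names are hypothetical and serve only to show the flow of events.

```python
import xml.sax

# Hypothetical sample document; the tag names are assumptions.
DOC = "<employees><name>John Galt</name><name>Hank Rearden</name></employees>"


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every name element as the parser reports events."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # holds text chunks while inside a name element

    def startElement(self, name, attrs):
        if name == "name":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name" and self._buffer is not None:
            self.names.append("".join(self._buffer))
            self._buffer = None


def collect_names(xml_text):
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(collect_names(DOC))
```

No tree is ever built: the handler sees each start tag, run of text, and end tag once, in document order, and keeps only what it needs. That makes SAX a good fit for large documents.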

SAX, as it applies specifically to Domino, is covered in more depth later in this chapter. For more specific information about SAX, please see http://www.saxproject.org/.
