Secrets of RSS

XML stands for Extensible Markup Language, and the key word here is extensible. The idea is that you can write and extend your own language from scratch, as long as you follow the XML rules. XML is like HTML in that it's based on enclosing text inside elements. But with XML, you are the one who creates the elements (unlike with HTML, which has predefined elements).

Tip

XML was created and is maintained by the W3C. You can find the formal specification for the most recent version, XML 1.1, at www.w3.org/TR/xml11/, and the specification for XML 1.0, the version used to create RSS and Atom documents, at www.w3.org/TR/REC-xml.

Starting with the XML Declaration

Creating XML documents is best understood by example. Say you have a group of employees and you want to keep track of the projects they're working on. You could do this in an XML document, and, as with any other XML document (including RSS and Atom documents), you have to start with an XML declaration.

<?xml version = "1.0" encoding = "UTF-8"?> . . .

That's the way all RSS and Atom documents must start, with the declaration that says it is an XML document and gives the version. This declaration has a special form, starting with <?xml and ending with ?>. Like HTML elements, this declaration can support attributes, that is, items such as version and encoding that you see in this case.

Note

The version attribute is required in the XML declaration; the encoding attribute is optional.

The version attribute sets the XML version used to create the document, and is always 1.0 for RSS and Atom. The encoding attribute sets the character encoding, that is, the way the document's characters are stored as bytes. UTF-8 (8-bit Unicode Transformation Format) is a good choice for an encoding attribute, and in fact is the default encoding for XML documents. UTF-8 is an encoding of Unicode whose first 128 characters match the ASCII character set, so plain English text saved by Microsoft WordPad in Windows or by most other text editors in English-speaking countries is already valid UTF-8. UTF-8 can also represent the characters of other languages, such as Japanese; if your editor saves your RSS feed in a different encoding, such as Shift_JIS, name that encoding in the encoding attribute instead.
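To see the encoding attribute at work, here's a minimal sketch in Python, using its standard xml.etree.ElementTree module (the choice of Python is mine, purely for illustration; nothing in RSS requires it). It assumes a hypothetical <title> element saved in ISO-8859-1 rather than UTF-8. The parser reads the declaration to decide how to turn the bytes back into characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, stored as ISO-8859-1 bytes.
# The declaration tells the parser which encoding was used.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<title>Café News</title>"
).encode("iso-8859-1")

title = ET.fromstring(latin1_bytes)
print(title.text)  # Café News

# Plain ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("RSS feed".encode("ascii") == "RSS feed".encode("utf-8"))  # True
```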

Note especially that each attribute is assigned a quoted text string; in our example this is version = "1.0". In XML, and so in RSS and Atom, you always assign quoted text strings to attributes. In HTML you don't need the quotes, and some attributes don't need to be assigned values; in XML, if you use an attribute, it must have an assigned value.
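If you'd like to watch a parser enforce this rule, the following Python sketch feeds three variations of a hypothetical <employee> element to the standard xml.etree.ElementTree parser. The parses helper is a name I made up for the example; it simply reports whether the parser accepted the text.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses('<employee status="active"></employee>'))  # True: quoted value
print(parses("<employee status=active></employee>"))    # False: missing quotes
print(parses("<employee active></employee>"))           # False: no value assigned
```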

Creating the Document Element

Now that you've started with the XML declaration, you can add the elements that make up the body of an XML document. Just as in HTML, you store text data in XML elements. Unlike in HTML, you make up the elements you want to use in XML. For example, if you want to create an XML element to store an address, you might come up with a new XML element, the <address> element. Just as in HTML, XML elements have an opening tag and a closing tag, so here's how you might store someone's address:

<address> 14 Picklewood Avenue </address>

There are some rules for XML element names: They can't contain any spaces, they can't start with a number, and they can't start with a punctuation mark (other than an underscore). Here are some invalid XML elements:

<phone number>890-5555</phone number>
<5feettall>John</5feettall>
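The naming rules can be approximated in a few lines of Python. This is a deliberately simplified, ASCII-only sketch: the real XML specification also allows many non-English letters (and colons, which are reserved for namespaces), and the is_valid_name helper and its pattern are my own invention for illustration.

```python
import re

# Simplified, ASCII-only version of the XML name rules (an assumption for
# illustration): a letter or underscore first, then letters, digits,
# hyphens, underscores, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name):
    """Return True if name fits the simplified element-name pattern."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["address", "phone number", "5feettall", "phone_number"]:
    print(candidate, is_valid_name(candidate))
```

Only address and phone_number pass; the other two fail for the reasons given above (a space, and a leading digit).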

The text inside the element is referred to as the element's content, and if an element has content, it needs an opening and a closing tag in XML. Unlike HTML, where some elements don't need a closing tag (such as the <img> or <input> elements), XML requires a closing tag for elements that have content.

Elements that don't have content are referred to as empty elements in XML. An empty element doesn't have any text content between the opening and closing tags, but it can have attributes. Empty elements in XML do not need a separate closing tag; you can use the XML markup /> to end an empty element in a single tag if you want.

<data language = "English" />

Note

The data for empty elements is stored in attributes, not as content between an opening and closing tag.
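Here's a short Python sketch, again using the standard xml.etree.ElementTree module, that reads the <data> element above in both its short form and its opening-and-closing-tag form. To the parser the two are equivalent: there's no text content, and the data lives in the attribute.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<data language = "English" />')
long_form = ET.fromstring('<data language = "English"></data>')

print(short_form.get("language"))  # English
print(short_form.text)             # None: an empty element has no content

# Both spellings produce the same attributes and the same (absent) content.
print(short_form.attrib == long_form.attrib)  # True
print(long_form.text)                         # None
```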

The first element in every XML document is the document element. This element contains all the other elements in the document (just as in HTML, one element can contain others), for example:

<name>
    <lastname>Connery</lastname>
    <firstname>Sean</firstname>
</name>

I'll call the document element <document> in our example, just to make it clear it's the document element, but you can use any valid XML element name (in an RSS document, for example, you use <rss> as the document element):

<?xml version = "1.0" encoding = "UTF-8"?>
<document>
    . . .
</document>

Now it's time to start adding to our document the elements that will contain the document's data.

Creating XML Elements

Let's say you want to store some data about your two employees. You could create two <employee> elements and add the elements to your new document:

<?xml version = "1.0" encoding = "UTF-8"?>
<document>
    <employee>
        . . .
    </employee>
    <employee>
        . . .
    </employee>
</document>

You now have two new XML elements in your document. So far, so good.

Creating XML Attributes

As with HTML, in XML you can add attributes to an element, starting with the opening tag. For example, you might add a status attribute to the <employee> element that indicates the employee's status (active or retired):

<?xml version = "1.0" encoding = "UTF-8"?>
<document>
    <employee status="retired">
        . . .
    </employee>
    <employee status="active">
        . . .
    </employee>
</document>
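Once the attributes are in place, a program can read them back. The following Python sketch parses a copy of the document (with the elided content left out) and collects each employee's status; the variable names are mine, chosen only for this example.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<document>
    <employee status="retired"></employee>
    <employee status="active"></employee>
</document>"""

root = ET.fromstring(doc)
print(root.tag)  # document

# get() returns the quoted text string assigned to the attribute.
statuses = [employee.get("status") for employee in root.findall("employee")]
print(statuses)  # ['retired', 'active']
```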

Nesting XML Elements

Terrific, now you're set. In XML, elements can contain either text data or other elements, so you can add all the data you want about your employees, such as a <name> element that contains nested <lastname> and <firstname> elements:

<?xml version = "1.0" encoding = "UTF-8"?>
<document>
    <employee status="retired">
        <name>
            <lastname>Connery</lastname>
            <firstname>Sean</firstname>
        </name>
        . . .
    </employee>
    <employee status="active">
        <name>
            <lastname>Hepburn</lastname>
            <firstname>Audrey</firstname>
        </name>
        . . .
    </employee>
</document>

You can add more information about each employee, such as the date they were hired and the projects they're working on:

<?xml version = "1.0" encoding = "UTF-8"?>
<document>
    <employee status="retired">
        <name>
            <lastname>Connery</lastname>
            <firstname>Sean</firstname>
        </name>
        <hiredate>October 15, 2006</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    <employee status="active">
        <name>
            <lastname>Hepburn</lastname>
            <firstname>Audrey</firstname>
        </name>
        <hiredate>October 20, 2006</hiredate>
        <projects>
            <project>
                <product>Desktop</product>
                <id>333</id>
                <price>$2995.00</price>
            </project>
            <project>
                <product>Scanner</product>
                <id>444</id>
                <price>$200.00</price>
            </project>
        </projects>
    </employee>
</document>
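To show how nested elements are reached from code, here's a Python sketch that walks the finished document and totals each employee's project prices. The summarize function is a hypothetical helper written for this example, and it assumes every <price> is a dollar amount with a leading $ sign, as in the sample data.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<document>
    <employee status="retired">
        <name><lastname>Connery</lastname><firstname>Sean</firstname></name>
        <hiredate>October 15, 2006</hiredate>
        <projects>
            <project><product>Printer</product><id>111</id><price>$111.00</price></project>
            <project><product>Laptop</product><id>222</id><price>$989.00</price></project>
        </projects>
    </employee>
    <employee status="active">
        <name><lastname>Hepburn</lastname><firstname>Audrey</firstname></name>
        <hiredate>October 20, 2006</hiredate>
        <projects>
            <project><product>Desktop</product><id>333</id><price>$2995.00</price></project>
            <project><product>Scanner</product><id>444</id><price>$200.00</price></project>
        </projects>
    </employee>
</document>"""

def summarize(xml_text):
    """Return (full name, status, total price) for each <employee>."""
    root = ET.fromstring(xml_text)
    rows = []
    for employee in root.findall("employee"):
        # A path such as name/lastname follows the nesting step by step.
        last = employee.findtext("name/lastname")
        first = employee.findtext("name/firstname")
        prices = [
            Decimal(price.text.lstrip("$"))
            for price in employee.findall("projects/project/price")
        ]
        rows.append((f"{first} {last}", employee.get("status"), sum(prices, Decimal("0"))))
    return rows

for name, status, total in summarize(DOCUMENT):
    print(f"{name} ({status}): ${total}")
# Sean Connery (retired): $1100.00
# Audrey Hepburn (active): $3195.00
```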

That's a complete XML document. To create your file, you can use any plain-text editor you have, such as WordPad in Windows (Figure 4.1). Make sure you save your document as a plain-text file, however. When you save it in WordPad, choose Text Document as the document's type, not Rich Text Format (RTF), which includes all kinds of formatting codes and will make your XML document invalid.

Figure 4.1. This XML document was created in Microsoft WordPad.

Well-formed and Valid XML Documents

There are two more criteria: XML documents must be well formed, and they should also be valid. There are various rules you need to follow to make an XML document well formed, and you can find them in the XML specifications. The most important rule says that each XML document must have only one document element, and that element must contain all the other elements in the document. You must also avoid any nesting errors. Take a look at the following XML, in which everything is nested properly:

<projects>
    <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
    </project>
    <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
    </project>
</projects>

The following XML is not well formed because the first <project> element is never closed before the next one starts, so the closing tags no longer match the opening tags. In other words, the two <project> elements are mixed up:

<projects>
    <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
    <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
    </project>
</projects>
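You can let a parser do the well-formedness checking for you. The following Python sketch tries three shortened samples: one nested properly, a variant in which the first <project> is never closed, and one with two document elements. The well_formedness_error helper is a name I made up; it returns None for a well-formed document and the parser's own error message otherwise (the exact wording of that message depends on the parser).

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if xml_text is well formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as error:
        return str(error)

nested_properly = (
    "<projects>"
    "<project><id>333</id></project>"
    "<project><id>444</id></project>"
    "</projects>"
)
first_never_closed = (
    "<projects>"
    "<project><id>333</id>"
    "<project><id>444</id></project>"
    "</projects>"
)
two_document_elements = "<projects></projects><projects></projects>"

for sample in (nested_properly, first_never_closed, two_document_elements):
    print(well_formedness_error(sample))
```

Note that this checks only whether a document is well formed; the standard ElementTree parser does not check validity against a DTD or schema.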

What makes an XML document valid? When you create an XML document, you can specify its grammar or syntax. For example, what attributes can the <employee> element have? What elements must the <project> element contain? And so on. There are two ways of specifying the grammar for XML documents these days: You can use a Document Type Definition (DTD) or an XML schema. You can use both DTDs and XML schemas to check whether an RSS or Atom document adheres to the correct RSS or Atom syntax. All you have to do is use an XML validator (such as www.stg.brown.edu/service/xmlvalid/) and check your document against the standard RSS or Atom DTD or XML schema. Since you can find RSS validators online (such as www.feedvalidator.org), there's no need to go into great detail about DTDs or XML schemas here, but I highly recommend you always validate your RSS or Atom feeds.

This introduction to XML gives you the foundation you'll need to write RSS and Atom documents. I haven't covered all the XML rules, of course; there are entire books on the topic if you want more information. Or, take a look at the XML specifications online if you have the stomach for it (they're very slow reading).

XML itself isn't really a language, despite the name Extensible Markup Language. It's really a set of rules you use to write your own language. That's how RSS and Atom came to be: Their authors created a set of elements, such as the <rss> element, that adhered to the XML rules. In other words, they used the XML rules to create new languages, RSS and Atom, with their own built-in elements.

Now it's time to take a look at RSS, starting with version 0.91.