Secrets of RSS
In Chapter 3, "Creating RSS Feeds," you got a good start with RSS 0.91 documents, but only simple ones. Here's a more complete RSS 0.91 document, which will be referred to throughout this section:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    <image>
      <title>Steve's News</title>
      <url>http://www.rssmaniac.com/steve/Image.jpg</url>
      <link>http://www.rssmaniac.com/steve</link>
      <description>Steve's News</description>
      <width>144</width>
      <height>36</height>
    </image>
    <managingEditor>steve@rssmaniac.com (Steve)</managingEditor>
    <webMaster>steve@rssmaniac.com (Steve)</webMaster>
    <skipHours>
      <hour>8</hour>
      <hour>9</hour>
      <hour>10</hour>
    </skipHours>
    <skipDays>
      <day>Sunday</day>
    </skipDays>
    <item>
      <title>Steve shovels the snow</title>
      <description>It snowed once again. Time to shovel!</description>
      <link>http://www.rssmaniac.com/steve</link>
    </item>
    <textinput>
      <title>Search for other items</title>
      <description>What do you want to find?</description>
      <name>search</name>
      <link>http://www.rssmaniac.com/find.php</link>
    </textinput>
  </channel>
</rss>
Let's take this document apart, piece by piece.
The XML Declaration and DTD
As all RSS documents should, this one starts with an XML declaration. This XML declaration includes only the required version attribute, which must be set to "1.0" in RSS 0.91. Note also the <!DOCTYPE> declaration:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
. . .
The <!DOCTYPE> declaration points to the RSS 0.91 DTD (which can be found at http://my.netscape.com/publish/formats/rss-0.91.dtd), so that an RSS reader can check your document's syntax against it if it wants to. Although the <!DOCTYPE> declaration as written here is officially required in RSS 0.91, many people omit it, and many RSS validators don't insist on it. Here's a technical detail: the XML declaration has an optional attribute, encoding, which lets you specify the character set your feed uses. For example, if you want to use Japanese, you specify an encoding that supports Japanese characters. You can find the legal character encodings for RSS 0.91 in Table 4.1. The default is UTF-8, a variable-length encoding of Unicode in which all the standard ASCII characters that most text editors use keep their familiar single-byte values.
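To see what the encoding attribute actually does, here's a small sketch in Python (our choice for illustration; nothing in RSS requires it) using the standard library's xml.etree.ElementTree module. The feed fragment is hypothetical: its title contains "é" stored as the single ISO-8859-1 byte 0xE9, which isn't valid UTF-8 on its own. The same bytes parse correctly only when the XML declaration names the right encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed body whose title contains "é", stored as the single
# ISO-8859-1 byte 0xE9 (not a valid UTF-8 sequence on its own).
BODY = b'<rss version="0.91"><channel><title>Caf\xe9 News</title></channel></rss>'

# With the encoding declared, the parser decodes 0xE9 as "é".
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY
title = ET.fromstring(declared).find("channel/title").text
print(title == "Caf\u00e9 News")  # True

# Without an encoding attribute, the parser assumes UTF-8 and rejects the byte.
try:
    ET.fromstring(b'<?xml version="1.0"?>' + BODY)
    utf8_ok = True
except ET.ParseError:
    utf8_ok = False
print(utf8_ok)  # False
```

Because UTF-8 is the default, the second parse fails: the declaration is the only thing telling the parser how to read the bytes.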
The <rss> Element
The <rss> element is the document element, and it starts the data-storage part of an RSS 0.91 document:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  . . .
</rss>
Remember, because this is the document element, it contains all the other elements in the document (the XML declaration and the <!DOCTYPE> declaration are not part of the document element). The version attribute is required and must be set to "0.91" for an RSS 0.91 document. The <rss> element has one child element, the <channel> element, and it is required.
The <channel> Element
The <channel> element contains all the information needed to set up a particular channel. Each <rss> element must contain exactly one <channel> element. The <channel> element has no attributes.
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    . . .
  </channel>
</rss>
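A reader that wants to enforce these rules before going any further might begin with something like the following Python sketch. It uses only the standard library; the function name get_channel is our own invention, and real RSS readers are often more forgiving than this.

```python
import xml.etree.ElementTree as ET


def get_channel(doc: str) -> ET.Element:
    """Return the single <channel> of an RSS 0.91 document.

    Raises ValueError if the document element isn't <rss version="0.91">
    or if it doesn't contain exactly one <channel> child element.
    """
    root = ET.fromstring(doc)
    if root.tag != "rss":
        raise ValueError(f"document element is <{root.tag}>, expected <rss>")
    if root.get("version") != "0.91":
        raise ValueError('the version attribute must be "0.91"')
    channels = root.findall("channel")
    if len(channels) != 1:
        raise ValueError(f"expected exactly one <channel>, found {len(channels)}")
    return channels[0]


feed = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Steve's News!</title>
  </channel>
</rss>"""
print(get_channel(feed).findtext("title"))  # Steve's News!
```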
The <channel> element has a number of child elements, some required, some optional. The required child elements are the following:
Here are the optional child elements inside a <channel> element:
The <copyright> Element
The <copyright> element is an optional child element of the <channel> element that contains copyright information for the feed:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    . . .
This element has no attributes and allows no child elements.
The <pubDate> Element
The <pubDate> element, an optional child element of the <channel> element, contains the date on which the channel was published:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    . . .
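An RSS reader has to turn a date string like this one back into a date before it can sort or compare anything. Assuming the feed uses the RFC 822 style shown in the example, Python's standard email.utils.parsedate_to_datetime function can read it; parse_rss_date is a hypothetical helper name. A free-form date string would make the parse fail, which is one practical reason to stick with this style.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_rss_date(text: str) -> datetime:
    """Parse an RFC 822-style date such as "Wed, 14 Dec 2005 07:00:00 GMT".

    Returns a timezone-aware datetime when the string names a zone (GMT here).
    Raises an exception if the string isn't in that style.
    """
    return parsedate_to_datetime(text.strip())


published = parse_rss_date("Wed, 14 Dec 2005 07:00:00 GMT")  # the example's <pubDate>
built = parse_rss_date("Mon, 12 Dec 2005 07:00:00 GMT")      # its <lastBuildDate>
print(published.isoformat())  # 2005-12-14T07:00:00+00:00
print(published - built)      # 2 days, 0:00:00
```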
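The RSS 0.91 DTD doesn't enforce any particular format for the date and time, but it's wise to follow the RFC 822 style used in the example (Wed, 14 Dec 2005 07:00:00 GMT), because that's the format later RSS versions require and the one readers are built to parse. The <pubDate> element has no child elements and no attributes.
The <lastBuildDate> Element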
As you can tell from its name, the <lastBuildDate> element contains the last time this document was built, that is, written and published:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    . . .
The <lastBuildDate> element is an optional child element of the <channel> element. It has no child elements and no attributes, so it's nice and simple. As with <pubDate>, the DTD doesn't enforce a date format, but the RFC 822 style shown here is the safe choice. The last build date can be useful: among other things, it tells your readers how often you update your feed (so if you don't update your feed often, you might want to omit this element!).
The <docs> Element
The <docs> element contains the URL of a page with more information about the channel:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    . . .
This element, another optional child element of the <channel> element, has no child elements and no attributes. All it contains is the URL of documentation (hence the name, <docs>) for this channel. This is where you can give your readers more information about your channel, such as help files on a Web site, so make the most of it. Note that URLs must begin with either http:// or ftp://, not www.
The <description> Element
The <description> element holds a text description of a channel, an item, an image, or a text-input control. It's a required child element of the <channel> element; each <channel> element must have one <description> child element. The <description> element is also a required child element of the <textinput> element, and an optional child element of the <item> and <image> elements. In our example, it describes our new channel:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    . . .
The <description> element doesn't have any child elements or attributes. It's an important element, so make sure you add appropriate text to it. This is the text that will be displayed in the RSS reader when the user asks for the properties of your feed.
The <link> Element
The <link> element is a required child element of the <channel>, <image>, <item>, and <textinput> elements, and holds a URL the user can click. As a child element of the <channel> element, it usually holds the URL of the home page of the channel's creator or organization. As a child element of an <item> element, it's a link to the full item on a Web site. Here it defines the link for the new channel in our RSS 0.91 document:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    . . .
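Because every <link> element, like the <docs> element before it, must hold a complete URL beginning with http:// or ftp://, a feed generator can check its URLs before publishing. Here's a minimal Python sketch using the standard urllib.parse module; the helper name is hypothetical, and it accepts only the two schemes named in this section.

```python
from urllib.parse import urlparse

# Schemes this section allows in <link>, <url>, and <docs> elements.
ALLOWED_SCHEMES = {"http", "ftp"}


def is_valid_rss_url(url: str) -> bool:
    """Return True if url starts with http:// or ftp:// and names a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


print(is_valid_rss_url("http://www.rssmaniac.com/steve"))  # True
print(is_valid_rss_url("www.rssmaniac.com/steve"))         # False: no scheme
```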
Note that URLs must begin with either http:// or ftp://, not www.
The <title> Element
The <title> element is a required child element of the <channel>, <image>, <item>, and <textinput> elements, and, as you know, holds a title for that channel, image, item, or text-input control. Here, the <title> element gives the new channel its title:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    . . .
The title for a channel appears in an RSS reader's feeds window; the title for an item appears in the titles window; and the title for a text-input control is used to label that control.
The <language> Element
The <language> element, a required child element of the <channel> element, specifies the language of the channel. Our example specifies U.S. English:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    . . .
The <language> element has no attributes and no child elements. The content of this element, such as en-us, is called a language code. RSS 0.91 allows a large number of language codes (Table 4.2).
Note: RSS readers use language codes to determine the language your feed uses, but a language code doesn't select a character set. To do that, set the XML declaration's encoding attribute.
The <image> Element
As you know, the <image> element connects an image to the channel, and the RSS reader displays that image:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    <image>
      <title>Steve's News</title>
      <url>http://www.rssmaniac.com/steve/Image.jpg</url>
      <link>http://www.rssmaniac.com/steve</link>
      <description>Steve's News</description>
      <width>144</width>
      <height>36</height>
    </image>
    . . .
The <image> element has no attributes, but it can have a number of child elements. The required child elements are <title>, <url>, and <link>; the optional child elements are <description>, <width>, and <height>.
Let's take a look at each of these elements next. Some of them, such as <width> and <height>, have particular restrictions you should know about.
The <image> Element's <title> Element
The <title> element gives a title to the image:
<image>
  <title>Steve's News</title>
  . . .
</image>
RSS readers often show this title if for some reason the image can't be displayed. The <title> element, which is required in the <image> element, has no child elements or attributes.
The <image> Element's <url> Element
The <url> element's function is no surprise: it holds the URL of the image, and it's a required child element of the <image> element.
<image>
  <title>Steve's News</title>
  <url>http://www.rssmaniac.com/steve/Image.jpg</url>
  . . .
</image>
This element doesn't have any child elements or attributes. Note that URLs must begin with either http:// or ftp://, not www.
The <image> Element's <link> Element
The <link> element is a required child element of the <image> element, and contains the URL the RSS reader brings up if the user clicks the image:
<image>
  <title>Steve's News</title>
  <url>http://www.rssmaniac.com/steve/Image.jpg</url>
  <link>http://www.rssmaniac.com/steve</link>
  . . .
</image>
A URL in a <link> element must begin with http:// or ftp://. Usually, this URL points to your main page, or to a page that explains more about your feed.
The <image> Element's <description> Element
The <description> element is an optional child element of the <image> element, and, as you can guess, holds a description of your feed, usually the same text you put into the image's (required) <title> element:
<image>
  <title>Steve's News</title>
  <url>http://www.rssmaniac.com/steve/Image.jpg</url>
  <link>http://www.rssmaniac.com/steve</link>
  <description>Steve's News</description>
  . . .
</image>
The <image> Element's <width> Element
You can let the RSS reader know the width of an image, but, as with Web browsers, you don't have to, because the RSS reader can figure it out after the image is fully loaded.
<image>
  <title>Steve's News</title>
  <url>http://www.rssmaniac.com/steve/Image.jpg</url>
  <link>http://www.rssmaniac.com/steve</link>
  <description>Steve's News</description>
  <width>144</width>
  . . .
</image>
The text contained in this element is a positive integer that gives the width of the feed's image in pixels; its value must be between 1 and 144, inclusive (yep, 144 is the maximum). So you can't use an image 800 pixels wide in your feed. If you don't specify the width of your image, some readers use a default value, which, for some reason, is 88 pixels. This element has no child elements and no attributes.
The <image> Element's <height> Element
The image's <height> element is an optional child element of the <image> element, and specifies the image's height in pixels:
<image>
  <title>Steve's News</title>
  <url>http://www.rssmaniac.com/steve/Image.jpg</url>
  <link>http://www.rssmaniac.com/steve</link>
  <description>Steve's News</description>
  <width>144</width>
  <height>36</height>
</image>
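Taken together, the width and height rules amount to a small check a reader or feed generator could run on an <image> element. This Python sketch applies the limits given in this section (width from 1 to 144 pixels with a default of 88, height from 1 to 400 pixels with a default of 31); the function names are hypothetical, and an actual reader may treat a missing or out-of-range size differently.

```python
import xml.etree.ElementTree as ET

# Limits and defaults for RSS 0.91 images, as described in this section.
MAX_WIDTH, DEFAULT_WIDTH = 144, 88
MAX_HEIGHT, DEFAULT_HEIGHT = 400, 31


def _dimension(image: ET.Element, tag: str, default: int, maximum: int) -> int:
    """Read one size child (<width> or <height>), applying its default."""
    text = image.findtext(tag)
    if text is None:  # element omitted: fall back to the default
        return default
    value = int(text.strip())
    if not 1 <= value <= maximum:
        raise ValueError(f"<{tag}> must be between 1 and {maximum}, got {value}")
    return value


def image_size(image: ET.Element) -> tuple[int, int]:
    """Return (width, height) in pixels for an RSS 0.91 <image> element."""
    return (_dimension(image, "width", DEFAULT_WIDTH, MAX_WIDTH),
            _dimension(image, "height", DEFAULT_HEIGHT, MAX_HEIGHT))


image = ET.fromstring("<image><width>144</width><height>36</height></image>")
print(image_size(image))                      # (144, 36)
print(image_size(ET.fromstring("<image/>")))  # (88, 31)
```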
The text contained in this element should be a positive integer between 1 and 400, inclusive, meaning images can be quite a bit taller than they are wide. The default value for RSS 0.91 documents, for some reason, is 31 pixels. This element has no child elements and no attributes.
The <managingEditor> Element
The <managingEditor> element gives your feed's readers someone to contact. An optional element, it's a child element of the <channel> element.
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    <image>
      . . .
    </image>
    <managingEditor>steve@rssmaniac.com (Steve)</managingEditor>
    . . .
Formally speaking, this element should contain the email address of the managing editor of your feed, not just a name; as in the example, a name can follow the address in parentheses. If anyone wants to get in touch, they should be able to use this email address. This element doesn't have any child elements or attributes.
The <webMaster> Element
The <webMaster> element, another optional child element of the <channel> element, holds the email address of the person responsible for handling any technical problems with your feed. Don't confuse it with the <managingEditor> element, which holds the email address of the person responsible for the content of your feed.
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    <image>
      . . .
    </image>
    <managingEditor>steve@rssmaniac.com (Steve)</managingEditor>
    <webMaster>steve@rssmaniac.com (Steve)</webMaster>
    . . .
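Both <managingEditor> and <webMaster> in the example use the form address (Name). Python's standard email.utils.parseaddr function understands that parenthesized-comment style, so a reader could split the two parts apart, for instance to build a mailto: link. This is a sketch; parse_contact is a hypothetical name.

```python
from email.utils import parseaddr


def parse_contact(text: str) -> tuple[str, str]:
    """Split an "address (Name)" contact string into (name, address)."""
    name, address = parseaddr(text.strip())
    if "@" not in address:
        # A bare name is not enough: these elements should hold an address.
        raise ValueError(f"no email address found in {text!r}")
    return name, address


print(parse_contact("steve@rssmaniac.com (Steve)"))  # ('Steve', 'steve@rssmaniac.com')
```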
This element has no child elements or attributes.
The <rating> Element
In practice, the <rating> element is rarely used, so it's not included in our example RSS 0.91 document. The <rating> element, an optional child element of the <channel> element, holds a Platform for Internet Content Selection (PICS) rating of your RSS feed, assigned by a third-party rating service. Among other things, the rating system is designed to keep minors from accessing adult content. (You can find a list of PICS rating organizations at www.w3.org/PICS/raters.htm, but most of the links are defunct.) If you're going to use this element, the text you place in it usually starts with "PICS-1.1", then includes the URL of the rating agency and its rating, resulting in something like the following:
<rating>(PICS-1.1 "http://www.picswatcher.org/" (A+))</rating>
The <skipHours> Element
The <skipHours> element lets you list the hours during which your feed will not be updated, if you choose to specify them. This element isn't widely used these days, but when RSS was first developed, people assumed that feeds would be updated hourly. Accordingly, this element was designed to give you some time off by specifying the hours when you won't be updating your feed, so readers know they don't need to check it then:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    . . .
    <skipHours>
      . . .
    </skipHours>
    . . .
The <skipHours> element has no attributes, but you must include at least one <hour> child element.
The <skipHours> Element's <hour> Element
The <hour> element, a child element of the <skipHours> element, contains an hour of the day, in Greenwich Mean Time (GMT), during which your feed will not be updated:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    . . .
    <skipHours>
      <hour>8</hour>
      <hour>9</hour>
      <hour>10</hour>
    </skipHours>
    . . .
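Here's how a reader might honor <skipHours>, sketched in Python with the standard library. The function names are hypothetical, and the code assumes the hours are written on the same 0 to 23 clock that Python's datetime uses; the text above doesn't say which numbering the hours follow, so treat that as an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def skipped_hours(channel: ET.Element) -> set[int]:
    """Collect the GMT hours listed in the channel's <skipHours> element."""
    return {int(hour.text) for hour in channel.findall("skipHours/hour")}


def should_skip(channel: ET.Element, now: datetime) -> bool:
    """Return True if `now` (a timezone-aware datetime) falls in a skipped hour.

    Assumption: <hour> values use the same 0-23 clock as datetime.hour.
    """
    return now.astimezone(timezone.utc).hour in skipped_hours(channel)


channel = ET.fromstring(
    "<channel><skipHours><hour>8</hour><hour>9</hour><hour>10</hour>"
    "</skipHours></channel>"
)
print(should_skip(channel, datetime(2005, 12, 14, 9, 30, tzinfo=timezone.utc)))  # True
print(should_skip(channel, datetime(2005, 12, 14, 11, 0, tzinfo=timezone.utc)))  # False
```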
The <skipDays> Element
Like the <skipHours> element, the <skipDays> element lets you indicate the days of the week on which your feed won't be updated:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    . . .
    <skipDays>
      . . .
    </skipDays>
    . . .
The <skipDays> element has no attributes, but if you use it, you must include at least one child <day> element.
The <skipDays> Element's <day> Element
You indicate a day of the week in the <day> element, in English (the specification says you should use English for day names). For example, if your feed isn't updated on Sundays, you can include a <skipDays> element with a <day> child element like this:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <copyright>Copyright 2005.</copyright>
    <pubDate>Wed, 14 Dec 2005 07:00:00 GMT</pubDate>
    <lastBuildDate>Mon, 12 Dec 2005 07:00:00 GMT</lastBuildDate>
    <docs>http://www.rssmaniac.com/steve/info.html</docs>
    <description>This feed contains news from Steve!</description>
    <link>http://www.rssmaniac.com/steve</link>
    <title>Steve's News!</title>
    <language>en-us</language>
    . . .
    <skipDays>
      <day>Sunday</day>
    </skipDays>
    <item>
    . . .
You can include up to seven <day> elements inside a <skipDays> element, each corresponding to a different day of the week. The <day> element has no child elements and no attributes.
The <item> Element
Here's the big one, the <item> element, which is meant to represent a Web page or a section of a Web page. As you know, you use this element to create items in your feed. In RSS 0.91, a <channel> element can contain at most 15 <item> child elements.
<item>
  <title>Steve shovels the snow</title>
  <description>It snowed once again. Time to shovel!</description>
  <link>http://www.rssmaniac.com/steve</link>
</item>
. . .
The <item> element has two required child elements, <title> and <link>, and one optional child element, <description>.
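A reader builds its titles window from the <item> elements. The following Python sketch collects each item's title and link and enforces the 15-item limit; the function name is hypothetical, and a real reader might simply ignore the extra items rather than rejecting the whole feed.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 15  # RSS 0.91 limit on <item> elements per channel


def list_items(channel: ET.Element) -> list[tuple[str, str]]:
    """Return (title, link) for each <item>, as a reader's titles window might."""
    items = channel.findall("item")
    if len(items) > MAX_ITEMS:
        raise ValueError(f"{len(items)} items exceeds the RSS 0.91 limit of {MAX_ITEMS}")
    # <description> is optional, so only the two required children are read.
    return [(item.findtext("title", ""), item.findtext("link", "")) for item in items]


channel = ET.fromstring(
    "<channel><item><title>Steve shovels the snow</title>"
    "<description>It snowed once again. Time to shovel!</description>"
    "<link>http://www.rssmaniac.com/steve</link></item></channel>"
)
print(list_items(channel))
# [('Steve shovels the snow', 'http://www.rssmaniac.com/steve')]
```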
The <item> Element's <title> Element
As you know from Chapter 3, an item's <title> element contains the title for the item, and its text will appear in the titles window in an RSS reader:
<item>
  <title>Steve shovels the snow</title>
  . . .
</item>
This element has no attributes or child elements. Note that it's required within an <item> element.
The <item> Element's <description> Element
The <description> element contains a description of the item and usually includes part of the full text of the item. Even though this is an optional child element of the <item> element, you'll see a description in nearly all RSS items in your RSS reader:
<item>
  <title>Steve shovels the snow</title>
  <description>It snowed once again. Time to shovel!</description>
  <link>http://www.rssmaniac.com/steve</link>
</item>
The <item> Element's <link> Element
The <link> element is also required inside an <item> element, and as you know, it contains the URL of the full item:
<item>
  <title>Steve shovels the snow</title>
  . . .
  <link>http://www.rssmaniac.com/steve</link>
</item>
The item's link appears below the item's description in the RSS reader, and the user can click the link for more information on the item itself. As with other URLs, the URL you use here must start with http:// or ftp://.
The <textinput> Element
RSS 0.91 provides for an optional <textinput> element that readers can use to ask you questions, search your site, provide feedback, and so on. The <textinput> element appears as a text field; readers can enter text and send it, with a click, to a URL you provide. This element is an optional child element of the <channel> element, and it isn't in general use today: you won't find many RSS readers that support it, because it has become standard to handle this kind of interaction with your readers on your Web site.
<textinput>
  <title>Search for other items</title>
  <description>What do you want to find?</description>
  <name>search</name>
  <link>http://www.rssmaniac.com/find.php</link>
</textinput>
. . .
The <textinput> element doesn't have any attributes, but it does have four required child elements: <title>, <description>, <name>, and <link>.
The <textinput> Element's <title> Element
The <title> element lets you display a title for the <textinput> control. The title of the <textinput> control in our example is "Search for other items":
<textinput>
  <title>Search for other items</title>
  . . .
</textinput>
This element doesn't have any child elements or attributes.
The <textinput> Element's <description> Element
The <textinput> control's <description> element lets you display a description, such as a prompt, for the <textinput> control:
<textinput>
  <title>Search for other items</title>
  <description>What do you want to find?</description>
  . . .
</textinput>
This element doesn't have any child elements or attributes.
The <textinput> Element's <name> Element
When a user sends data to an online program, the data in an HTML control is associated with the name of the control. Online scripts and programs can then recover the data using that name. In our example, the name of the text-input control is search:
<textinput>
  <title>Search for other items</title>
  <description>What do you want to find?</description>
  <name>search</name>
  . . .
</textinput>
Thus, for example, if you were sending your data to a PHP script on the server, you could recover the data the user entered into the text-input control with the PHP expression $_REQUEST["search"]. This element also has no child elements or attributes.
The <textinput> Element's <link> Element
The <textinput> control's <link> element holds the URL that the text the user enters into the control is sent to. This is the URL of the online script or program that handles that text, such as the program that runs a search for the user. In our example RSS 0.91 document, the text the user enters will be sent to a PHP script named find.php:
<textinput>
  <title>Search for other items</title>
  <description>What do you want to find?</description>
  <name>search</name>
  <link>http://www.rssmaniac.com/find.php</link>
</textinput>
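The <name> and <link> elements work together: the reader sends the user's text to the <link> URL under the name given in <name>. This Python sketch builds such a request as a GET query string with the standard urllib.parse.urlencode function. Whether a given reader uses GET or POST is an assumption here, as is the helper name, and the sketch assumes the <link> URL has no query string of its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def textinput_request_url(textinput: ET.Element, user_text: str) -> str:
    """Build the GET URL a reader could request when the user submits text.

    Assumption: the text is sent as a query parameter named by <name>, and
    the <link> URL does not already contain a "?" query part.
    """
    name = textinput.findtext("name")
    link = textinput.findtext("link")
    return f"{link}?{urlencode({name: user_text})}"


ti = ET.fromstring(
    "<textinput><title>Search for other items</title>"
    "<description>What do you want to find?</description>"
    "<name>search</name><link>http://www.rssmaniac.com/find.php</link></textinput>"
)
print(textinput_request_url(ti, "snow shovel"))
# http://www.rssmaniac.com/find.php?search=snow+shovel
```

On the server, find.php would read "snow shovel" from $_REQUEST["search"], because PHP decodes the + back into a space.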
As usual, the URL must begin with http:// or ftp:// (and because you're passing data to an online script, you'll probably want to use http://). This element has no child elements or attributes. That completes the RSS 0.91 syntax. Not bad: you're on your way to being an RSS pro.