What It's All About: XML Exposed
What is XML? The Extensible Markup Language (XML) is a standard for creating your own markup language that describes the structure and meaning of the data in an application. Like HTML, XML uses its own set of rules to serve up content and provides a common language for transferring the content across the Web. The end result is a technology that developers can use to make content available regardless of what system is on the other end.
10 Things That XML Is
- XML is for configuring or structuring data.
Structured or configurable data sets include things such as spreadsheets, address books, configuration parameters, transactions, and technical drawings or schemas. This could also be any bit of data that can be grouped and used together. XML is defined as a set of guidelines or conventions that can be used to place text in a structured format that lets developers manipulate the data. XML is not actually a programming language. In fact, you don't have to be a programmer to use it or learn it. XML makes it easy for a computer to generate data, read data, and ensure that the data structure is instantly recognizable by the application that reads it. XML also avoids common pitfalls of language design: it is extensible and platform independent, and it supports internationalization and localization. XML is also fully compliant with Unicode (see www.unicode.org).
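As a rough illustration of structured data that a program can read back, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` module. The address book, its tag names (`addressbook`, `contact`, `name`, `email`), and the helper `read_contacts` are all invented for this example; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; every tag and attribute name here is an assumption.
ADDRESS_BOOK = """<addressbook>
  <contact id="1">
    <name>Ana Lopez</name>
    <email>ana@example.com</email>
  </contact>
  <contact id="2">
    <name>Bo Chen</name>
    <email>bo@example.com</email>
  </contact>
</addressbook>"""


def read_contacts(xml_text):
    """Parse the XML text and return one dictionary per <contact> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": contact.get("id"),              # attribute value
            "name": contact.findtext("name"),     # text of the <name> child
            "email": contact.findtext("email"),   # text of the <email> child
        }
        for contact in root.findall("contact")
    ]


contacts = read_contacts(ADDRESS_BOOK)
for contact in contacts:
    print(contact["id"], contact["name"], contact["email"])
```

The XML itself carries no behavior. The reading application decides what a `contact` means and what to do with it.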
- XML is, by look and definition, much like HTML.
Like HTML, XML makes use of tags (words bracketed by < and >) and attributes (of the form name="value"). HTML provides its own specific set of tags and attributes and tells you how you must use them, which limits what you can express. XML tags are defined by the developer. XML doesn't tell the developer how to use them, and there are no limits as to how chunks of data can be grouped or separated. How the data is interpreted is left completely up to the application that reads it. For example, if you see <b> in an XML file, do not assume that it is the bold font tag. Depending on the context of its use, it may be a business name, a birthday, a book, or just a b. However, unlike HTML, the rules for XML files are strict. A forgotten tag or an attribute without quotes makes an XML file unusable. HTML is much more forgiving in this way, and browsers tolerate such errors more often than not. The official XML specification forbids applications from trying to second-guess the creator of a broken XML file; if the file is broken, an application must halt immediately and report an error.
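The strictness described above can be observed with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample documents are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first error instead of guessing the intent.
        return False, str(err)
    return True, ""


# Here <b> is a business name, not bold text; the meaning is up to the application.
GOOD = '<b kind="business">Acme Books</b>'
MISSING_CLOSE = '<order><b>Acme Books</order>'       # <b> is never closed
UNQUOTED_ATTR = '<b kind=business>Acme Books</b>'    # attribute value lacks quotes

for sample in (GOOD, MISSING_CLOSE, UNQUOTED_ATTR):
    ok, message = is_well_formed(sample)
    print("well-formed" if ok else "rejected: " + message)
```

An HTML browser would typically render both broken samples anyway. An XML parser rejects them and reports where the problem is.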
- XML is text but isn't meant to be read.
Having a text-based format makes it easy for the developer to read, organize, and put together XML. Developers can easily read and write XML with their favorite text editor, such as Notepad or WordPad. Text formats also allow developers to debug applications by inspecting the data directly, without having to load a proprietary client.
- XML is verbose by design.
The designers of XML made a conscious decision to make XML a text-based language. Because XML is text-based and uses tags to delimit its data, an XML file can become rather large compared to binary or compressed formats. The main disadvantage of a text-based format is that it can strain the amount of available disk space on the developer's computer. Of course, in today's computing environment, disk space is usually not a show stopper. It comes in large quantities, can be purchased at the corner computer store, and is not cost-prohibitive for most. Compression programs such as WinZip can also help to compress files when it's time for storage or archiving.
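To get a feel for how well repetitive tag text compresses, here is a small Python sketch. It uses the standard library's `gzip` module rather than WinZip, and the sensor-reading document is invented for the example, so the sizes it prints are only indicative.

```python
import gzip

# Hypothetical record format; the tag names are assumptions for this sketch.
RECORD = '<reading><sensor>T-{0}</sensor><value unit="C">21.5</value></reading>'

# Build a document with 1,000 nearly identical records.
document = "<readings>" + "".join(RECORD.format(i) for i in range(1000)) + "</readings>"

raw = document.encode("utf-8")
packed = gzip.compress(raw)

print("uncompressed bytes:", len(raw))
print("compressed bytes:  ", len(packed))

# Compression is lossless: the original XML comes back byte for byte.
restored = gzip.decompress(packed)
```

Because the same tag names repeat in every record, the compressed form is much smaller than the raw text, which is why verbosity costs little in storage or transfer.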
- XML is a family of technologies.
XML 1.0, as defined by the W3C, is the base specification that defines what tags and attributes are. Other modules in the XML family are complementary to XML and can help with high-priority, routine tasks:
- XLink: A standard way to add hyperlinks to an XML file.
- XPointer and XFragments: Syntaxes for pointing to parts of an XML document. An XPointer is much like a URL, but instead it points to a piece of data inside an XML file.
- CSS 1 & 2: Style sheet language. It is as applicable to XML as it is to HTML.
- XSL: The advanced language used for creating style sheets. It is based on XSLT.
- XFrames: An XML application for composing documents together, replacing HTML frames.
- DOM: A standard set of function calls for manipulating XML (and HTML) files from a programming language.
- XML Schemas 1 & 2: Help developers to define the structure of their own XML tags and attributes.
Several more modules and tools are available or under development by the W3C. Visit the W3C's technical reports page online for the latest information on XML modules.
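As one example from the list above, the DOM's function calls look roughly like this in Python, whose standard library ships a small DOM implementation in `xml.dom.minidom`. The `library` document and its `isbn` and `status` attributes are invented for this sketch.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element and attribute names are assumptions.
doc = parseString('<library><book isbn="000-0">First Title</book></library>')

# Read: find an element and look at its text and attributes.
book = doc.getElementsByTagName("book")[0]
first_title = book.firstChild.data

# Modify: add an attribute to the existing element.
book.setAttribute("status", "checked-out")

# Create: build a new element and attach it to the document root.
new_book = doc.createElement("book")
new_book.setAttribute("isbn", "000-1")
new_book.appendChild(doc.createTextNode("Second Title"))
doc.documentElement.appendChild(new_book)

titles = [b.firstChild.data for b in doc.getElementsByTagName("book")]
print(titles)
print(doc.documentElement.toxml())
```

The same method names (`getElementsByTagName`, `createElement`, `appendChild`, `setAttribute`) appear in DOM implementations for other languages, which is the point of a standard set of calls.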
- XML is new, sort of.
XML began development in 1996 and has been a W3C Recommendation since February 1998. However, XML really isn't that new. Before XML there was SGML, which was developed in the early 1980s and has been an ISO standard since 1986. It was and still is widely used for large documentation projects. The development of HTML, as you know it today on the Web, began in 1990. What the designers of XML did was simply combine the best parts of SGML with the best parts of HTML to create a simple-to-use language called XML. XML evolved out of several earlier languages into one; its purpose is to revolutionize text-formatted languages. However new or old XML is, it's simple to use and easy to learn, and it doesn't require a proprietary client to work with it. Figure 20.20 shows the XML timeline.
Figure 20.20. The XML timeline.
- XML is leading HTML to XHTML.
HTML is the precursor of XHTML. XHTML uses many of the same elements as HTML, but its syntax has been changed slightly to conform to XML's rules. An XHTML document is text-based and XML-based, and it inherits XML's syntax. Like HTML, it also restricts the tags and attributes you can use and fixes their meaning: XHTML defines that <p> is for paragraph, and not for price, person, or anything else. XML and HTML combined to give birth to XHTML.
- XML is modular.
XML allows you to define a new set of tags by combining and reusing other tag sets. Because two tag sets devised independently may have elements or attributes with the same name, developers should be careful when combining data sets. For example, does <p> stand for paragraph in one set and person in another? Fortunately, XML provides a namespaces mechanism that eliminates this confusion by using qualified names. XML namespaces are a simple method for qualifying element and attribute names by associating them with namespaces identified by uniform resource identifier (URI) references. The goal is to support modularity at the design and document level so that, when defining XML data sets and tag elements, it is easy to combine two schemas to produce a third.
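A minimal sketch of qualified names in practice, using Python's standard `xml.etree.ElementTree` module. The two namespace URIs and the `doc` and `hr` prefixes are placeholders invented for this example; a URI here is only an identifier and does not need to resolve to anything.

```python
import xml.etree.ElementTree as ET

# Two independently designed tag sets that both define <p>.
# The URIs and prefixes below are hypothetical.
DOC = """<record xmlns:doc="http://example.com/ns/document"
        xmlns:hr="http://example.com/ns/people">
  <doc:p>Opening paragraph of the report.</doc:p>
  <hr:p>Jordan Smith</hr:p>
</record>"""

# Map the prefixes used in our queries to the same namespace URIs.
NS = {
    "doc": "http://example.com/ns/document",
    "hr": "http://example.com/ns/people",
}

root = ET.fromstring(DOC)
paragraph = root.findtext("doc:p", namespaces=NS)   # the "paragraph" meaning of p
person = root.findtext("hr:p", namespaces=NS)       # the "person" meaning of p

print("paragraph:", paragraph)
print("person:   ", person)

# Internally the parser stores each name together with its namespace URI.
qualified_tags = [child.tag for child in root]
print(qualified_tags)
```

The prefix is only shorthand. What distinguishes the two elements is the URI each prefix is bound to, so two documents can use different prefixes for the same namespace and still agree.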
- XML is the basis for RDF and one common Web-speak.
The Resource Description Framework (RDF), much as RTF is to Domino, is an XML text format that supports resource descriptions and list-based uses, such as music playlists, photo collections, book libraries, and bibliographies. For example, with RDF you can identify people in a Web photo album using information from a personal contact list; then your mail client can automatically create a message to those people letting them know that their photos are on the Web and the URL where they can be viewed. Just as HTML integrated menu systems and forms into applications on the Web, RDF is integrating applications and agents into one common language that all the Web can understand (Web-speak). This also lets applications agree on the meanings of XML elements and attributes so that they can process the data effectively.
- XML is license-free, platform-independent, and well supported.
XML comes with a large and continually growing set of community tools, devised by W3C engineers experienced in text-based languages and Web technology. Choosing to use XML is much like choosing ODBC or SQL to get to data stored inside a database. You still have to build your own database, forms, fields, agents, and systems to manipulate the data. What's more, many tools are available and many people are willing to help for free. And speaking of free, like HTML, XML can be used license-free. This allows developers to create their own systems and applications built on XML without having to pay anyone anything. Its large and growing support also means that you are not tied to a single vendor.