W3C XML Schema Documents

In this section, we introduce schemas for specifying XML document structure and validating XML documents. Many developers in the XML community believe that DTDs are not flexible enough to meet today's programming needs. For example, DTDs lack a way of indicating what specific type of data (e.g., numeric, text) an element can contain and DTDs are not themselves XML documents. These and other limitations have led to the development of schemas.

Unlike DTDs, schemas do not use EBNF grammar. Instead, schemas use XML syntax and are actually XML documents that programs can manipulate. Like DTDs, schemas are used by validating parsers to validate documents.

In this section, we focus on the W3C's XML Schema vocabulary (note the capital "S" in "Schema"). We use the term XML Schema in the rest of the chapter whenever we refer to W3C's XML Schema vocabulary. For the latest information on XML Schema, visit www.w3.org/XML/Schema. For tutorials on XML Schema concepts beyond what we present here, visit www.w3schools.com/schema/default.asp.

A DTD describes an XML document's structure, not the content of its elements. For example,

<quantity>5</quantity>

contains character data. If the document that contains element quantity references a DTD, an XML parser can validate the document to confirm that this element indeed does contain PCDATA content. However, the parser cannot validate that the content is numeric; DTDs do not provide this capability. So, unfortunately, the parser also considers

<quantity>hello</quantity>

to be valid. An application that uses the XML document containing this markup should test that the data in element quantity is numeric and take appropriate action if it is not.
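The application-side test described above might look like the following minimal sketch. It uses Python's standard xml.etree.ElementTree module; the helper name quantity_is_numeric is hypothetical, and "numeric" is assumed here to mean a whole number.

```python
import xml.etree.ElementTree as ET


def quantity_is_numeric(xml_text: str) -> bool:
    """Return True if the root element's character data parses as an integer."""
    root = ET.fromstring(xml_text)
    text = (root.text or "").strip()
    try:
        int(text)
    except ValueError:
        return False
    return True


print(quantity_is_numeric("<quantity>5</quantity>"))      # True
print(quantity_is_numeric("<quantity>hello</quantity>"))  # False
```

A check like this has to be repeated in every application that reads the document, which is the burden that typed schemas remove.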

XML Schema enables schema authors to specify that element quantity's data must be numeric or, even more specifically, an integer. A parser validating the XML document against this schema can determine that 5 conforms and hello does not. An XML document that conforms to a schema document is schema valid, and one that does not conform is schema invalid. Schemas are XML documents and therefore must themselves be valid.

Validating Against an XML Schema Document

Figure 19.11 shows a schema-valid XML document named book.xml, and Fig. 19.12 shows the pertinent XML Schema document (book.xsd) that defines the structure for book.xml. By convention, schemas use the .xsd extension. We used an online XSD schema validator provided by Microsoft at

apps.gotdotnet.com/xmltools/xsdvalidator

to ensure that the XML document in Fig. 19.11 conforms to the schema in Fig. 19.12. To validate the schema document itself (i.e., book.xsd) and produce the output shown in Fig. 19.12, we used an online XSV (XML Schema Validator) provided by the W3C at

www.w3.org/2001/03/webdata/xsv

These tools are free and enforce the W3C's specifications regarding XML Schemas and schema validation. Section 19.12 lists several online XML Schema validators.

Figure 19.11. Schema-valid XML document describing a list of books.

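 1
 2
 3
 4
 5  <deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
 6     <book>
 7        <title>Visual Basic 2005 How to Program, 3/e</title>
 8     </book>
 9
10     <book>
11        <title>Visual C# 2005 How to Program</title>
12     </book>
13
14     <book>
15        <title>Java How to Program, 6/e</title>
16     </book>
17
18     <book>
19        <title>C++ How to Program, 5/e</title>
20     </book>
21
22     <book>
23        <title>Internet and World Wide Web How to Program, 3/e</title>
24     </book>
25  </deitel:books>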

Figure 19.12. XML Schema document for book.xml.

 1
 2
 3
 4
 5  <schema xmlns = "http://www.w3.org/2001/XMLSchema"
 6     xmlns:deitel = "http://www.deitel.com/booklist"
 7     targetNamespace = "http://www.deitel.com/booklist">
 8
 9     <element name = "books" type = "deitel:BooksType"/>
10
11     <complexType name = "BooksType">
12        <sequence>
13           <element name = "book" type = "deitel:SingleBookType"
14              minOccurs = "1" maxOccurs = "unbounded"/>
15        </sequence>
16     </complexType>
17
18     <complexType name = "SingleBookType">
19        <sequence>
20           <element name = "title" type = "string"/>
21        </sequence>
22     </complexType>
23  </schema>

Figure 19.11 contains markup describing several Deitel books. The books element (line 5) has the namespace prefix deitel, indicating that the books element is a part of the http://www.deitel.com/booklist namespace. Note that we declare the namespace prefix deitel in line 5.
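To see how a parser resolves the deitel prefix, consider this short sketch using Python's standard xml.etree.ElementTree module. The BOOK_XML string is a trimmed fragment in the style of book.xml; its exact tag layout is an assumption inferred from the schema, not a copy of the figure.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened fragment in the style of book.xml.
BOOK_XML = """<deitel:books xmlns:deitel="http://www.deitel.com/booklist">
   <book>
      <title>Visual C# 2005 How to Program</title>
   </book>
</deitel:books>"""

root = ET.fromstring(BOOK_XML)

# ElementTree expands a prefixed name to {namespace-URI}local-name.
print(root.tag)     # {http://www.deitel.com/booklist}books

# The unprefixed child is in no namespace, because the document
# declares a prefix but no default namespace.
print(root[0].tag)  # book
```

The prefix itself is only shorthand; what the parser compares is the namespace URI that the prefix is bound to.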

Creating an XML Schema Document

Figure 19.12 presents the XML Schema document that specifies the structure of book.xml (Fig. 19.11). This document defines an XML-based language (i.e., a vocabulary) for writing XML documents about collections of books. The schema defines the elements, attributes and parent-child relationships that such a document can (or must) include. The schema also specifies the type of data that these elements and attributes may contain.

Root element schema (Fig. 19.12, lines 5-23) contains elements that define the structure of an XML document such as book.xml. Line 5 specifies as the default namespace the standard W3C XML Schema namespace URI, http://www.w3.org/2001/XMLSchema. This namespace contains predefined elements (e.g., root element schema) that comprise the XML Schema vocabulary, the language used to write an XML Schema document.

Portability Tip 19.3

W3C XML Schema authors specify URI http://www.w3.org/2001/XMLSchema when referring to the XML Schema namespace. This namespace contains predefined elements that comprise the XML Schema vocabulary. Specifying this URI ensures that validation tools correctly identify XML Schema elements and do not confuse them with those defined by document authors.

Line 6 binds the URI http://www.deitel.com/booklist to namespace prefix deitel. As we discuss momentarily, the schema uses this namespace to differentiate names created by us from names that are part of the XML Schema namespace. Line 7 also specifies http://www.deitel.com/booklist as the targetNamespace of the schema. This attribute identifies the namespace of the XML vocabulary that this schema defines. Note that the targetNamespace of book.xsd is the same as the namespace referenced in line 5 of book.xml (Fig. 19.11). This is what "connects" the XML document with the schema that defines its structure. When an XML schema validator examines book.xml and book.xsd, it will recognize that book.xml uses elements and attributes from the http://www.deitel.com/booklist namespace. The validator also will recognize that this namespace is the namespace defined in book.xsd (i.e., the schema's targetNamespace). Thus the validator knows where to look for the structural rules for the elements and attributes used in book.xml.
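The matching step described above can be sketched in Python as follows. This is only an illustration of the comparison, not how any particular validator is implemented; the helper root_namespace is hypothetical, and the SCHEMA string is an abbreviated fragment that is parsed as plain XML rather than as a complete schema.

```python
import xml.etree.ElementTree as ET

# Abbreviated schema fragment (type definitions omitted); parsed only as XML.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
   xmlns:deitel="http://www.deitel.com/booklist"
   targetNamespace="http://www.deitel.com/booklist">
   <element name="books" type="deitel:BooksType"/>
</schema>"""

INSTANCE = '<deitel:books xmlns:deitel="http://www.deitel.com/booklist"/>'


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


target = ET.fromstring(SCHEMA).get("targetNamespace")
print(target)                              # http://www.deitel.com/booklist
print(target == root_namespace(INSTANCE))  # True
```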

Defining an Element in XML Schema

In XML Schema, the element tag (line 9) defines an element to be included in an XML document that conforms to the schema. In other words, element specifies the actual elements that can be used to mark up data. Line 9 defines the books element, which we use as the root element in book.xml (Fig. 19.11). Attributes name and type specify the element's name and data type, respectively. An element's data type indicates the data that the element may contain. Possible data types include XML Schema-defined types (e.g., string, double) and user-defined types (e.g., BooksType, which is defined in lines 11-16). Figure 19.13 lists several of XML Schema's many built-in types. For a complete list of built-in types, see Section 3 of the specification found at www.w3.org/TR/xmlschema-2.

Figure 19.13. Some XML Schema data types.


| XML Schema Data Type(s) | Description | Ranges or Structures | Examples |
|---|---|---|---|
| string | A character string. | | "hello" |
| boolean | True or false. | true, false | true |
| decimal | A decimal numeral. | i * (10^n), where i is an integer and n is an integer that is less than or equal to zero. | 5, -12, -45.78 |
| float | A floating-point number. | m * (2^e), where m is an integer whose absolute value is less than 2^24 and e is an integer in the range -149 to 104. Plus three additional numbers: positive infinity, negative infinity and not-a-number (NaN). | 0, 12, -109.375, NaN |
| double | A floating-point number. | m * (2^e), where m is an integer whose absolute value is less than 2^53 and e is an integer in the range -1075 to 970. Plus three additional numbers: positive infinity, negative infinity and not-a-number (NaN). | 0, 12, -109.375, NaN |
| long | A whole number. | -9223372036854775808 to 9223372036854775807, inclusive | 1234567890, -1234567890 |
| int | A whole number. | -2147483648 to 2147483647, inclusive | 1234567890, -1234567890 |
| short | A whole number. | -32768 to 32767, inclusive | 12, -345 |
| date | A date consisting of a year, month and day. | yyyy-mm-dd with an optional time zone, where yyyy is four digits long and mm and dd are two digits long. | 2005-05-10 |
| time | A time consisting of hours, minutes and seconds. | hh:mm:ss with an optional time zone, where hh, mm and ss are two digits long. | 16:30:25-05:00 |
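The integer ranges in Fig. 19.13 are the familiar 16-, 32- and 64-bit two's-complement bounds. The following Python sketch makes that explicit; the names INTEGER_RANGES and fits are hypothetical and exist only for this illustration.

```python
# Inclusive bounds for three of the whole-number types in Fig. 19.13.
INTEGER_RANGES = {
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value lies within the inclusive range of type_name."""
    low, high = INTEGER_RANGES[type_name]
    return low <= value <= high


print(fits("int", 1234567890))    # True
print(fits("short", 1234567890))  # False
print(INTEGER_RANGES["long"][1])  # 9223372036854775807
```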

In this example, books is defined as an element of data type deitel:BooksType (line 9). BooksType is a user-defined type (lines 11-16) in the http://www.deitel.com/booklist namespace and therefore must have the namespace prefix deitel. It is not an existing XML Schema data type.

Two categories of data type exist in XML Schema: simple types and complex types. Simple and complex types differ only in that simple types cannot contain attributes or child elements and complex types can.

A user-defined type that contains attributes or child elements must be defined as a complex type. Lines 11-16 use element complexType to define BooksType as a complex type that has a child element named book. The sequence element (lines 12-15) allows you to specify the sequential order in which child elements must appear. The element (lines 13-14) nested within the complexType element indicates that a BooksType element (e.g., books) can contain child elements named book of type deitel:SingleBookType (defined in lines 18-22). Attribute minOccurs (line 14), with value 1, specifies that elements of type BooksType must contain a minimum of one book element. Attribute maxOccurs (line 14), with value unbounded, specifies that elements of type BooksType may have any number of book child elements.
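The occurrence rule can be illustrated with a small Python sketch. It is a hand-written check, not a schema validator; the function occurrence_ok is hypothetical, and the sample documents omit namespaces for brevity. The defaults reflect the fact that minOccurs and maxOccurs both default to 1 in XML Schema.

```python
import xml.etree.ElementTree as ET


def occurrence_ok(count: int, min_occurs: str = "1", max_occurs: str = "1") -> bool:
    """Check a child-element count against minOccurs/maxOccurs values."""
    if count < int(min_occurs):
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)


# Simplified, namespace-free sample documents (assumed for illustration).
books = ET.fromstring("<books><book/><book/><book/></books>")
empty = ET.fromstring("<books/>")

print(occurrence_ok(len(books.findall("book")), "1", "unbounded"))  # True
print(occurrence_ok(len(empty.findall("book")), "1", "unbounded"))  # False
```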

Lines 18-22 define the complex type SingleBookType. An element of this type contains a child element named title. Line 20 defines element title to be of simple type string. Recall that elements of a simple type cannot contain attributes or child elements. The schema end tag (</schema>, line 23) declares the end of the XML Schema document.

A Closer Look at Types in XML Schema

Every element in XML Schema has a type. Types include the built-in types provided by XML Schema (Fig. 19.13) or user-defined types (e.g., SingleBookType in Fig. 19.12).

Every simple type defines a restriction on an XML Schema-defined type or a restriction on a user-defined type. Restrictions limit the possible values that an element can hold.

Complex types are divided into two groups: those with simple content and those with complex content. Both can contain attributes, but only complex content can contain child elements. Complex types with simple content must extend or restrict some other existing type. Complex types with complex content do not have this limitation. We demonstrate complex types with each kind of content in the next example.

The schema document in Fig. 19.14 creates both simple types and complex types. The XML document in Fig. 19.15 (laptop.xml) follows the structure defined in Fig. 19.14 to describe parts of a laptop computer. A document such as laptop.xml that conforms to a schema is known as an XML instance document: the document is an instance (i.e., example) of the schema.

Figure 19.14. XML Schema document defining simple and complex types.

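 1
 2
 3
 4
 5  <schema xmlns = "http://www.w3.org/2001/XMLSchema"
 6     xmlns:computer = "http://www.deitel.com/computer"
 7     targetNamespace = "http://www.deitel.com/computer">
 8
 9     <simpleType name = "gigahertz">
10        <restriction base = "decimal">
11           <minInclusive value = "2.1"/>
12        </restriction>
13     </simpleType>
14
15     <complexType name = "CPU">
16        <simpleContent>
17           <extension base = "string">
18              <attribute name = "model" type = "string"/>
19           </extension>
20        </simpleContent>
21     </complexType>
22
23     <complexType name = "portable">
24        <all>
25           <element name = "processor" type = "computer:CPU"/>
26           <element name = "monitor" type = "int"/>
27           <element name = "CPUSpeed" type = "computer:gigahertz"/>
28           <element name = "RAM" type = "int"/>
29        </all>
30        <attribute name = "manufacturer" type = "string"/>
31     </complexType>
32
33     <element name = "laptop" type = "computer:portable"/>
34  </schema>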

Figure 19.15. XML document using the laptop element defined in computer.xsd.


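 1
 2
 3
 4
 5  <computer:laptop xmlns:computer = "http://www.deitel.com/computer"
 6     manufacturer = "IBM">
 7
 8     <processor model = "Centrino">Intel</processor>
 9     <monitor>17</monitor>
10     <CPUSpeed>2.4</CPUSpeed>
11     <RAM>256</RAM>
12  </computer:laptop>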

Line 5 declares the default namespace to be the standard XML Schema namespace; any elements without a prefix are assumed to be in the XML Schema namespace. Line 6 binds the namespace prefix computer to the namespace http://www.deitel.com/computer. Line 7 identifies this namespace as the targetNamespace, the namespace being defined by the current XML Schema document.

To design the XML elements for describing laptop computers, we first create a simple type in lines 9-13 using the simpleType element. We name this simpleType gigahertz because it will be used to describe the clock speed of the processor in gigahertz. Simple types are restrictions of a type typically called a base type. For this simpleType, line 10 declares the base type as decimal, and we restrict the value to be at least 2.1 by using the minInclusive element in line 11.
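In effect, the gigahertz restriction amounts to a two-step test: the text must be a decimal number, and that number must be at least 2.1. A minimal Python sketch of that test follows. The function is_valid_gigahertz is hypothetical, and Python's Decimal parser is only an approximation of the decimal lexical rules (it also accepts exponent notation, for example).

```python
from decimal import Decimal, InvalidOperation

MIN_INCLUSIVE = Decimal("2.1")  # the lower bound given by minInclusive


def is_valid_gigahertz(text: str) -> bool:
    """Approximate the gigahertz simple type: a decimal value >= 2.1."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False
    # Reject NaN and infinities, which Decimal accepts but the base type does not.
    return value.is_finite() and value >= MIN_INCLUSIVE


print(is_valid_gigahertz("2.4"))   # True
print(is_valid_gigahertz("2.1"))   # True (the bound is inclusive)
print(is_valid_gigahertz("1.8"))   # False
print(is_valid_gigahertz("fast"))  # False
```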

Next, we declare a complexType named CPU that has simpleContent (lines 16-20). Remember that a complex type with simple content can have attributes but not child elements. Also recall that complex types with simple content must extend or restrict some XML Schema type or user-defined type. The extension element with attribute base (line 17) sets the base type to string. In this complexType, we extend the base type string with an attribute. The attribute element (line 18) gives the complexType an attribute of type string named model. Thus an element of type CPU must contain string text (because the base type is string) and may contain a model attribute that is also of type string.
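An element of type CPU therefore has three observable properties: text content, an optional model attribute, and no child elements. The Python sketch below inspects such an element with the standard xml.etree.ElementTree module, using the processor element and values that appear in Fig. 19.15.

```python
import xml.etree.ElementTree as ET

processor = ET.fromstring('<processor model="Centrino">Intel</processor>')

print(processor.text)          # Intel    (the string content)
print(processor.get("model"))  # Centrino (the attribute added by the extension)
print(len(processor))          # 0        (simple content allows no child elements)
```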

Lastly, we define type portable, which is a complexType with complex content (lines 23-31). Such types are allowed to have child elements and attributes. The element all (lines 24-29) encloses elements that must each be included once in the corresponding XML instance document. These elements can be included in any order. This complex type holds four elements: processor, monitor, CPUSpeed and RAM. They are given types CPU, int, gigahertz and int, respectively. When using types CPU and gigahertz, we must include the namespace prefix computer, because these user-defined types are part of the computer namespace (http://www.deitel.com/computer), the namespace defined in the current document (line 7). Also, portable contains an attribute defined in line 30. The attribute element indicates that elements of type portable contain an attribute of type string named manufacturer.
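The rule that all imposes (each listed child exactly once, in any order) can be expressed as a short Python check. This is a hand-written sketch, not a validator; the function satisfies_all is hypothetical, and the sample laptop elements omit the namespace prefix for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED = {"processor", "monitor", "CPUSpeed", "RAM"}


def satisfies_all(parent: ET.Element) -> bool:
    """True if each required child appears exactly once, in any order."""
    counts = Counter(child.tag for child in parent)
    return set(counts) == REQUIRED and all(n == 1 for n in counts.values())


reordered = ET.fromstring(
    "<laptop><RAM>256</RAM><CPUSpeed>2.4</CPUSpeed>"
    "<monitor>17</monitor><processor>Intel</processor></laptop>"
)
missing = ET.fromstring("<laptop><monitor>17</monitor></laptop>")

print(satisfies_all(reordered))  # True: order does not matter
print(satisfies_all(missing))    # False: three required children are absent
```

Had the schema used sequence instead of all, the reordered document would be rejected, because sequence fixes the order of the children.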

Line 33 declares the actual element that uses the three types defined in the schema. The element is called laptop and is of type portable. We must use the namespace prefix computer in front of portable.

We have now created an element named laptop that contains child elements processor, monitor, CPUSpeed and RAM, and an attribute manufacturer. Figure 19.15 uses the laptop element defined in the computer.xsd schema. Once again, we used an online XSD schema validator (apps.gotdotnet.com/xmltools/xsdvalidator) to ensure that this XML instance document adheres to the schema's structural rules.

Line 5 declares namespace prefix computer. The laptop element requires this prefix because it is part of the http://www.deitel.com/computer namespace. Line 6 sets the laptop's manufacturer attribute, and lines 8-11 use the elements defined in the schema to describe the laptop's characteristics.

In this section, we introduced W3C XML Schema documents for defining the structure of XML documents, and we validated XML instance documents against schemas using an online XSD schema validator. Section 19.9 demonstrates programmatically validating XML documents against schemas using .NET Framework classes. This allows you to ensure that a C# program manipulates only valid documents; manipulating an invalid document that is missing required pieces of data could cause errors in the program.

