XML in a Nutshell, Third Edition

     

Earlier, the xs:simpleContent element was used to declare an element that could only contain simple content:

<xs:element name="fullName">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="language" type="xs:language"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

The base type for the extension in this case was the built-in xs:string data type. But simple types are not limited to the predefined types. The xs:simpleType element can define new simple data types, which can be referenced by element and attribute declarations within the schema.

17.6.1 Defining New Simple Types

To show how new simple types can be defined, let's extend the phone element from the example application to support a new attribute called location. This attribute will differentiate between work and home phone numbers, and it will have a new simple type called locationType, which will be referenced from the contactsType definition:

<xs:complexType name="contactsType">
  <xs:sequence>
    <xs:element name="phone" minOccurs="0">
      <xs:complexType>
        <xs:attribute name="number" type="xs:string"/>
        <xs:attribute name="location" type="addr:locationType"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
<xs:simpleType name="locationType">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

Of course, a location type that just maps to the built-in xs:string type isn't particularly useful. Fortunately, schemas can strictly control the possible values of simple types through a mechanism called facets.

17.6.2 Facets

In schema-speak, a facet is an aspect of a possible value for a simple data type. Depending on the base type, some facets make more sense than others. For example, a numeric data type can be restricted by the minimum and maximum possible values it could contain. But these types of restrictions wouldn't make sense for a boolean value. The following list covers the different facet types that are supported by a schema processor:

  • length (or minLength and maxLength)

  • pattern

  • enumeration

  • whiteSpace

  • maxInclusive and maxExclusive

  • minInclusive and minExclusive

  • totalDigits

  • fractionDigits

Facets are applied to simple types using the xs:restriction element. Each facet is expressed as a distinct element within the restriction block, and multiple facets can be combined to further restrict potential values of the simple type.
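As a minimal sketch of combining facets (the accountId type, the account element, and the format rule are hypothetical and not part of the book's address schema), the following no-namespace schema stacks a pattern with a length range. A value is valid only if it satisfies every facet. The one exception worth knowing is that several xs:pattern elements in the same restriction are treated as alternatives, and several xs:enumeration elements together define the allowed set.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: starts with an uppercase letter, uses only
       ASCII letters and digits, and is 3 to 8 characters long -->
  <xs:simpleType name="accountId">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z][A-Za-z0-9]*"/>
      <xs:minLength value="3"/>
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="account" type="accountId"/>
  <!-- Valid:   <account>Ab12</account>
       Invalid: <account>ab12</account>        (fails the pattern)
       Invalid: <account>Abcdefghij</account>  (10 characters, exceeds maxLength) -->
</xs:schema>
```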

17.6.2.1 Handling whitespace

The whiteSpace facet controls how the schema processor deals with whitespace in the target data. Whitespace normalization takes place before any of the other facets are processed. There are three possible values for the whiteSpace facet:

preserve

Keep all whitespace exactly as it was in the source document (basic XML 1.0 whitespace handling for content within elements).

replace

Replace occurrences of #x9 (tab), #xA (line feed), and #xD (carriage return) characters with #x20 (space) characters.

collapse

Perform the replace step first, then collapse each run of space characters into a single space and remove any leading and trailing spaces.
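A minimal sketch, assuming a hypothetical nick element: the type below derives from xs:string, switches whitespace handling to collapse, and then caps the length. Because normalization happens first, the padded value in the comment is measured only after it collapses. The built-in xs:normalizedString and xs:token types already apply replace and collapse respectively, and a derived type may tighten whiteSpace (preserve, then replace, then collapse) but never loosen it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: collapse whitespace first, then limit the length -->
  <xs:simpleType name="collapsedName">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="nick" type="collapsedName"/>
  <!-- Valid:   <nick>  Jo   Smith  </nick>  (14 characters as written, but
                normalized to "Jo Smith", 8 characters, before maxLength applies)
       Invalid: <nick>Jonathan Smith</nick>  (still 14 characters after normalization) -->
</xs:schema>
```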

17.6.2.2 Restricting length

The length-restriction facets are fairly easy to understand. The length facet forces a value to be exactly the length given. The minLength and maxLength facets set lower and upper bounds on the length of a value. For example, take the nameComponent type from the schema. What if a name component could not exceed 50 characters (because of a database limitation, for instance)? This rule can be enforced by using the maxLength facet. Incorporating this facet requires a new simple type to reference from within the nameComponent complex type definition:

<xs:complexType name="nameComponent">
  <xs:simpleContent>
    <xs:extension base="addr:nameString"/>
  </xs:simpleContent>
</xs:complexType>
<xs:simpleType name="nameString">
  <xs:restriction base="xs:string">
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>

The new nameString simple type is derived from the built-in xs:string type, but it can contain no more than 50 characters (the default is unlimited). The same approach can be used with the length and minLength facets.
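For comparison, here is a sketch of the other two length facets in a self-contained, no-namespace schema; the stateCode and postalCode types, their elements, and their limits are hypothetical. For string types, lengths are counted in characters, not bytes. XML Schema 1.0 treats combining length with minLength or maxLength in the same restriction as an error, so the two rules live in separate types here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical: exactly two characters, such as a postal abbreviation -->
  <xs:simpleType name="stateCode">
    <xs:restriction base="xs:string">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical: between 5 and 10 characters -->
  <xs:simpleType name="postalCode">
    <xs:restriction base="xs:string">
      <xs:minLength value="5"/>
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="state" type="stateCode"/>
  <xs:element name="zip" type="postalCode"/>
  <!-- Valid:   <state>NY</state>  <zip>10001</zip>  <zip>10001-1234</zip>
       Invalid: <state>N.Y.</state>  (4 characters)
       Invalid: <zip>1001</zip>      (4 characters) -->
</xs:schema>
```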

17.6.2.3 Enumerations

One of the more useful types of restriction is the simple enumeration. In many cases, it is sufficient to restrict possible values for an element or attribute to a member of a predefined list. For example, values of the new locationType simple type defined earlier could be restricted to a list of valid options, like so:

<xs:simpleType name="locationType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="work"/>
    <xs:enumeration value="home"/>
    <xs:enumeration value="mobile"/>
  </xs:restriction>
</xs:simpleType>

Then, if the location attribute in any instance document contained a value not found in the list of enumeration values, the schema processor would generate a validity error.
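To make the effect concrete, here is a self-contained sketch of the enumerated locationType attached to a phone element. It drops the addr: prefix by using a schema with no target namespace, and the phone numbers in the comment are made up. Enumeration values are compared exactly, so capitalization matters for a type derived from xs:string.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="locationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="work"/>
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="phone">
    <xs:complexType>
      <xs:attribute name="number" type="xs:string"/>
      <xs:attribute name="location" type="locationType"/>
    </xs:complexType>
  </xs:element>
  <!-- Valid:   <phone number="555-0100" location="home"/>
       Invalid: <phone number="555-0100" location="Home"/>  (case differs)
       Invalid: <phone number="555-0100" location="cell"/>  (not in the list) -->
</xs:schema>
```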

17.6.2.4 Numeric facets

Almost half of the built-in data types defined by the schema specification represent numeric data of one type or another. The following two sections cover all of the numeric facets available, but see Chapter 22 for a comprehensive list of which of these facets are applicable to which data types.

17.6.2.4.1 Minimum and maximum values

Four facets control the minimum and maximum values of items:

  • minInclusive

  • minExclusive

  • maxInclusive

  • maxExclusive

The primary difference between the inclusive and exclusive flavors of the min and max facets is whether the value given is considered part of the set of allowable values. For example, the following two facet declarations are equivalent when restricting xs:integer:

<xs:maxInclusive value="0"/>
<xs:maxExclusive value="1"/>

The difference between inclusive and exclusive becomes more significant when dealing with decimal or floating-point values. For example, if maxExclusive were set to 5.0, the equivalent maxInclusive value would require an infinite number of nines to the right of the decimal point (4.99999...), which cannot be written as a facet value. These facets can also be applied to date and time values.
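The sketch below (the type and element names are hypothetical) pairs an inclusive lower bound with an exclusive upper bound on xs:decimal, where no maxInclusive value could express "anything below 100", and applies inclusive bounds to xs:date.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical percentage: 0 is allowed, 100 is not -->
  <xs:simpleType name="percentBelow100">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical date range: any day in 2024 -->
  <xs:simpleType name="date2024">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="2024-01-01"/>
      <xs:maxInclusive value="2024-12-31"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="discount" type="percentBelow100"/>
  <xs:element name="signupDate" type="date2024"/>
  <!-- Valid:   <discount>0</discount>  <discount>99.999</discount>
                <signupDate>2024-02-29</signupDate>
       Invalid: <discount>100</discount>  <signupDate>2025-01-01</signupDate> -->
</xs:schema>
```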

17.6.2.4.2 Length and precision

There are two facets that control the length and precision of decimal numeric values: totalDigits and fractionDigits. The totalDigits facet sets the maximum number of digits (only digits are counted, not signs or decimal points) allowed in a complete number. The fractionDigits facet sets the maximum number of those digits that may appear to the right of the decimal point.
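As a hedged sketch, assuming a hypothetical price element, the following type allows at most seven digits in total with at most two after the decimal point, which makes 99999.99 the largest allowed value. Neither facet requires digits to be present, so whole numbers such as -5 are also accepted.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical price: at most 7 digits in total, at most 2 of them
       after the decimal point -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="7"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="price" type="priceType"/>
  <!-- Valid:   <price>19.99</price>  <price>-5</price>
                <price>12345.6</price>  <price>99999.99</price>
       Invalid: <price>19.999</price>     (3 fraction digits)
       Invalid: <price>123456.78</price>  (8 digits in total) -->
</xs:schema>
```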

17.6.2.5 Enforcing format

The xs:pattern facet can place very sophisticated restrictions on the format of string values. The pattern facet compares the value in question against a regular expression, and if the value doesn't conform to the expression, it generates a validation error. For example, this xs:simpleType element declares a Social Security number simple type using the pattern facet:

<xs:simpleType name="ssn">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d\d\d-\d\d-\d\d\d\d"/>
  </xs:restriction>
</xs:simpleType>

This new simple type enforces the rule that a Social Security number consists of three digits, a dash, two digits, another dash, and finally four more digits. The regular expression language is very similar to that of the Perl programming language, with one notable difference: a schema pattern must match the entire value, as though it were anchored at both ends. See Chapter 22 for more information on the full pattern-matching language.
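For comparison, the same rule can be written more compactly with quantifiers. This sketch (the ssnQuantified type and ssn element names are hypothetical) also swaps \d for [0-9], because in XML Schema \d matches any Unicode decimal digit, not just ASCII 0 through 9. Schema patterns always have to match the whole value, so a value with an extra trailing digit is rejected rather than partially matched.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Same shape as the ssn type, written with quantifiers and
       limited to ASCII digits -->
  <xs:simpleType name="ssnQuantified">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="ssn" type="ssnQuantified"/>
  <!-- Valid:   <ssn>123-45-6789</ssn>
       Invalid: <ssn>123456789</ssn>     (dashes missing)
       Invalid: <ssn>123-45-67890</ssn>  (extra digit; the whole value must match) -->
</xs:schema>
```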

17.6.2.6 Lists

XML 1.0 provided a few very simple list types that could be declared as possible attribute values: IDREFS, ENTITIES, and NMTOKENS. Schemas have generalized the concept of lists and provide the ability to declare lists of arbitrary types.

These list types are themselves simple types and may be used in the same places other simple types are used. For example, if the fullName element were expanded to accommodate multiple middle names, one approach would be to declare the middle element to contain a list of nameString values:

<xs:element name="middle" type="addr:nameList" minOccurs="0"/>
. . .
<xs:complexType name="nameList">
  <xs:simpleContent>
    <xs:extension base="addr:nameListType"/>
  </xs:simpleContent>
</xs:complexType>
<xs:simpleType name="nameListType">
  <xs:list itemType="addr:nameString"/>
</xs:simpleType>

After this change has been made, the middle element of an instance document can contain an unlimited number of whitespace-separated names, each of which can contain up to 50 characters. Because whitespace separates the items, an individual name cannot itself contain a space. The use of xs:complexType here will greatly simplify adding attributes later.
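The following self-contained sketch (the shortNameList type and its three-item limit are hypothetical) shows the list from an instance document's point of view and then restricts the list itself. When a length facet is applied to a list type, it counts items rather than characters.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="nameString">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="nameListType">
    <xs:list itemType="nameString"/>
  </xs:simpleType>
  <!-- Hypothetical: on a list type, maxLength counts items, not characters -->
  <xs:simpleType name="shortNameList">
    <xs:restriction base="nameListType">
      <xs:maxLength value="3"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="middle" type="shortNameList"/>
  <!-- Valid:   <middle>Anne Marie Louise</middle>         (three items)
       Invalid: <middle>Anne Marie Louise Claire</middle>  (four items) -->
</xs:schema>
```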

17.6.2.7 Unions

In some cases, it is useful to allow potential values for elements and attributes to have any of several types. The xs:union element allows a type to be declared that can draw from multiple type spaces. For example, it might be useful to allow users to enter their own one-word descriptions into the location attribute of the phone element, as well as to choose from a list. The location attribute declaration could be modified to include a union that incorporates the locationType type and the built-in xs:NMTOKEN type:

<xs:attribute name="location">
  <xs:simpleType>
    <xs:union memberTypes="addr:locationType xs:NMTOKEN"/>
  </xs:simpleType>
</xs:attribute>

Now the location attribute can contain either addr:locationType or xs:NMTOKEN content.
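Here is a self-contained sketch of the union in use, with the addr: prefix dropped by using a no-namespace schema (the sample values in the comment are made up). When validating a union, the processor tries the member types in the order listed and uses the first one that accepts the value. Because work, home, and mobile are themselves legal NMTOKENs, this particular union in practice accepts any single token; the enumeration serves mostly to document the expected values.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="locationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="work"/>
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="phone">
    <xs:complexType>
      <xs:attribute name="number" type="xs:string"/>
      <xs:attribute name="location">
        <xs:simpleType>
          <xs:union memberTypes="locationType xs:NMTOKEN"/>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
  <!-- Valid:   location="work"         (matches locationType)
       Valid:   location="vacation"     (not in the list, but a legal NMTOKEN)
       Invalid: location="summer home"  (contains a space, so it matches neither) -->
</xs:schema>
```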
