XML in a Nutshell, Third Edition

   

Although it is possible to access the data from the original XML document using only the Node interface, the DOM Core provides a number of specific node-type interfaces that simplify common programming tasks. These specific node types fall into two broad categories: structural nodes and content nodes.

19.4.1 Structural Nodes

Within an XML document, a number of syntax structures exist that are not formally part of the content. The following interfaces provide access to the portions of the document that are not related to element data.

19.4.1.1 DocumentType

The DocumentType interface provides access to the XML document type definition's notations, entities, internal subset, public ID, and system ID. Since a document can have only one DOCTYPE declaration, only one DocumentType node can exist for a given document. It is accessed via the doctype attribute of the Document interface. The definition of the DocumentType interface is shown in Table 19-6.

Table 19-6. The DocumentType interface, derived from Node

Name              Type

Attributes
entities          NamedNodeMap
internalSubset    DOMString
name              DOMString
notations         NamedNodeMap
publicId          DOMString
systemId          DOMString

Using additional fields available since DOM Level 2, it is now possible to fully reconstruct a parsed document using only the information provided within the DOM framework. No programmatic way to modify DocumentType node contents currently exists.
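As a rough illustration of the attributes in Table 19-6, the following sketch reads a DocumentType node with Python's standard xml.dom.minidom module. The choice of minidom, the sample document, and its public and system IDs are assumptions made for this example; other DOM implementations may expose the same attributes differently.

```python
from xml.dom import minidom

# Hypothetical sample document; the public and system IDs are placeholders.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//Example//DTD Catalog//EN" "catalog.dtd" [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
]>
<catalog/>"""

doc = minidom.parseString(SAMPLE)
doctype = doc.doctype  # the single DocumentType node of this document

print("name:     ", doctype.name)
print("publicId: ", doctype.publicId)
print("systemId: ", doctype.systemId)
print("internal subset as kept by the parser:")
print(doctype.internalSubset)

# entities and notations are read-only NamedNodeMap collections.
entity_names = [doctype.entities.item(i).nodeName
                for i in range(doctype.entities.length)]
notation_names = [doctype.notations.item(i).nodeName
                  for i in range(doctype.notations.length)]
print("entities: ", entity_names)
print("notations:", notation_names)
```

The external DTD named in the declaration is not fetched in this sketch; only the internal subset contributes entities and notations.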

19.4.1.2 ProcessingInstruction

The ProcessingInstruction node type provides direct access to a processing instruction's contents. Processing instructions usually appear within the document's content, but they may also appear before or after the root element, as well as in DTDs. Table 19-7 describes the ProcessingInstruction node's attributes.

Table 19-7. The ProcessingInstruction interface, derived from Node

Name      Type

Attributes
data      DOMString
target    DOMString

Remember that the only syntactically defined part of a processing instruction is its target, which must be an XML name. The remaining data (up to the terminating ?>) is free-form. See Chapter 18 for more information about uses (and potential misuses) of XML processing instructions.
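The split between target and free-form data can be seen in a short sketch. It assumes Python's xml.dom.minidom as the DOM implementation; the sample document, the app-hint target, and the collect_pis helper are hypothetical names introduced only for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical sample: one processing instruction in the prolog, one in the body.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/css" href="style.css"?>\n'
    '<doc><?app-hint cache=off?></doc>'
)

def collect_pis(node):
    """Return (target, data) pairs for every processing instruction under node."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            found.extend(collect_pis(child))
    return found

doc = minidom.parseString(SAMPLE)
pis = collect_pis(doc)
for target, data in pis:
    print(target, "->", data)
```

Note that the data is returned as one uninterpreted string; any pseudo-attribute syntax inside it is the application's responsibility to parse.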

19.4.1.3 Notation

XML notations formally declare the format for external unparsed entities and processing instruction targets. The list of all available notations is stored in a NamedNodeMap within the document's DOCTYPE node, which is accessed from the Document interface. The definition of the Notation interface is shown in Table 19-8.

Table 19-8. The Notation interface, derived from Node

Name        Type

Attributes
publicId    DOMString
systemId    DOMString

19.4.1.4 Entity

The name of the Entity interface is somewhat ambiguous, but its meaning becomes clear when it is connected with the EntityReference interface, which is also part of the DOM Core. The Entity interface provides access to the entity declaration's notation name, public ID, and system ID. Parsed entity nodes have childNodes, while unparsed entities have a notationName. The definition of this interface is shown in Table 19-9.

Table 19-9. The Entity interface, derived from Node

Name             Type

Attributes
inputEncoding    DOMString
notationName     DOMString
publicId         DOMString
systemId         DOMString
xmlEncoding      DOMString
xmlVersion       DOMString

DOM Level 3 introduces three new attributes that apply to external parsed entities: inputEncoding, xmlEncoding, and xmlVersion. This additional information makes it possible to properly enforce XML well-formedness constraints for external parsed entities based on the value of the xmlVersion attribute. The two encoding-related attributes make it possible to precisely reconstruct external parsed entity files from their DOM tree representation.

All members of this interface are read-only and cannot be modified at runtime.
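The difference between parsed and unparsed entities, and the link from an unparsed entity to its notation, might be explored as follows. This is a sketch that assumes Python's xml.dom.minidom; the entity and notation names are made up for the example, and minidom does not provide the Level 3 encoding attributes, so only the Level 1 members are shown.

```python
from xml.dom import minidom

# Hypothetical declarations: one unparsed (NDATA) entity and one internal
# parsed entity, plus the notation that the unparsed entity refers to.
SAMPLE = """<!DOCTYPE catalog [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
  <!ENTITY pub "Example Press">
]>
<catalog/>"""

doc = minidom.parseString(SAMPLE)
entities = doc.doctype.entities
notations = doc.doctype.notations

unparsed = entities.getNamedItem("bookcase_pic")
parsed = entities.getNamedItem("pub")

# An unparsed entity carries a notation name and a system ID, but no children.
print(unparsed.nodeName, unparsed.systemId, unparsed.notationName)

# A parsed entity has no notation; its replacement text is held in child nodes.
print(parsed.nodeName, parsed.notationName, parsed.childNodes[0].data)

# The notation name can be looked up in the notations map of the same doctype.
notation = notations.getNamedItem(unparsed.notationName)
print(notation.nodeName, notation.publicId, notation.systemId)
```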

19.4.2 Content Nodes

The actual data conveyed by an XML document is contained completely within the document element. The following node types map directly to the XML document's nonstructural parts, such as character data, elements, and attribute values.

19.4.2.1 Document

Each parsed document causes the creation of a single Document node in memory. (Empty Document nodes can be created through the DOMImplementation interface.) This interface provides access to the document type information and the single, top-level Element node that contains the entire body of the parsed document (the documentElement). It also provides access to the class factory methods that allow an application to create new content nodes that were not created by parsing a document. Table 19-10 shows all attributes and methods of the Document interface.

Table 19-10. The Document interface, derived from Node

Name                           Type

Attributes
doctype                        DocumentType
documentElement                Element
documentURI                    DOMString
domConfig                      DOMConfiguration
implementation                 DOMImplementation
inputEncoding                  DOMString
strictErrorChecking            boolean
xmlEncoding                    DOMString
xmlStandalone                  boolean
xmlVersion                     DOMString

Methods
adoptNode                      Node
createAttribute                Attr
createAttributeNS              Attr
createCDATASection             CDATASection
createComment                  Comment
createDocumentFragment         DocumentFragment
createElement                  Element
createElementNS                Element
createEntityReference          EntityReference
createProcessingInstruction    ProcessingInstruction
createTextNode                 Text
getElementById                 Element
getElementsByTagName           NodeList
getElementsByTagNameNS         NodeList
importNode                     Node
normalizeDocument              void
renameNode                     Node

The various create...( ) methods are important for applications that wish to modify the structure of a document that was previously parsed. Note that nodes created using one Document instance may only be inserted into the document tree belonging to the Document that created them. DOM Level 2 provided a new importNode( ) method that allows a node, and possibly its children, to be essentially copied from one document to another. DOM Level 3 introduced the adoptNode( ) method that actually moves an entire node subtree from one document to another.
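The factory methods and importNode( ) can be sketched as follows, assuming Python's xml.dom.minidom. The element names and the isbn value are placeholders. Only the Level 2 importNode( ) copy is shown, because minidom is not known to provide the Level 3 adoptNode( ) method.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()

# Two independent, initially empty documents (hypothetical element names).
source = impl.createDocument(None, "library", None)
target = impl.createDocument(None, "catalog", None)

# Build a small subtree with the factory methods of the source document.
book = source.createElement("book")
book.setAttribute("isbn", "placeholder")
book.appendChild(source.createTextNode("XML in a Nutshell"))
source.documentElement.appendChild(source.createComment(" built in memory "))
source.documentElement.appendChild(book)

# importNode() makes a deep copy that is owned by the target document;
# the original node stays where it was in the source document.
copy = target.importNode(book, True)
target.documentElement.appendChild(copy)

print(source.documentElement.toxml())
print(target.documentElement.toxml())
```

The second argument to importNode( ) selects a deep copy; passing False would copy the element and its attributes but not its children.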

Besides the various node-creation methods, some methods can locate specific XML elements or lists of elements. The methods getElementsByTagName( ) and getElementsByTagNameNS( ) return a list of all XML elements with the specified name and, for the namespace-aware variant, the specified namespace. The getElementById( ) method returns the single element whose ID-type attribute has the given value.
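A minimal lookup sketch, again assuming Python's xml.dom.minidom and a made-up sample document. Because the sample has no DTD declaring an ID-type attribute, the sketch marks the id attribute explicitly with the Level 3 setIdAttribute( ) method before calling getElementById( ); that step is an assumption of this example, not something every document needs.

```python
from xml.dom import minidom

# Hypothetical sample document without a DTD.
SAMPLE = (
    '<catalog>'
    '<book id="b1"><title>First</title></book>'
    '<book id="b2"><title>Second</title></book>'
    '</catalog>'
)

doc = minidom.parseString(SAMPLE)

# getElementsByTagName() returns matching elements in document order.
books = doc.getElementsByTagName("book")
ids = [book.getAttribute("id") for book in books]
print(ids)

# Without DTD information no attribute is known to be of type ID,
# so each "id" attribute is flagged as an ID attribute by hand.
for book in books:
    book.setIdAttribute("id")

second = doc.getElementById("b2")
title = second.getElementsByTagName("title")[0].firstChild.data
print(title)
```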

DOM Level 3 also introduced several attributes that are useful when an application wishes to reconstruct an XML document to its original, pre-parsing format. The inputEncoding, xmlEncoding, and xmlStandalone attributes preserve information about the values of the XML declaration from the original document as well as the character encoding of the document before it was parsed (and converted to Unicode).

One of the major additions to DOM in Level 3 was the inclusion of document validation support within the DOM tree itself. The normalizeDocument( ) method provides the developer with a mechanism for essentially "re-parsing" the XML document from the DOM tree in memory. Various parameters available through the domConfig attribute control how this normalization will occur. It is also possible to change the target version of XML by modifying the xmlVersion attribute before normalization. This will cause the DOM to enforce the XML name construction rules associated with the selected XML version. See Chapter 21 for more information about the differences between XML Versions 1.0 and 1.1.

19.4.2.2 DocumentFragment

Applications that allow real-time editing of XML documents sometimes need to temporarily park document nodes outside the hierarchy of the parsed document. A visual editor that wants to provide clipboard functionality is one example. When the time comes to implement the cut function, it is possible to move the cut nodes temporarily to a DocumentFragment node without deleting them, rather than having to leave them in place within the live document. Then, when they need to be pasted back into the document, they can be reinserted using a method such as Node.appendChild( ). The DocumentFragment interface, derived from Node, has no interface-specific attributes or methods.
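The cut-and-paste scenario could look roughly like this, assuming Python's xml.dom.minidom. The list and item element names and the clipboard variable are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString("<list><item>a</item><item>b</item><item>c</item></list>")
root = doc.documentElement

# "Cut": appending a node to the fragment removes it from the live tree.
clipboard = doc.createDocumentFragment()
clipboard.appendChild(root.firstChild)
after_cut = [item.firstChild.data for item in root.childNodes]
print(after_cut)

# "Paste": appending the fragment inserts its children, not the fragment
# itself, and leaves the fragment empty afterward.
root.appendChild(clipboard)
after_paste = [item.firstChild.data for item in root.childNodes]
print(after_paste)
print(len(clipboard.childNodes))
```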

19.4.2.3 Element

Element nodes are the most frequently encountered node type in a typical XML document. These nodes are parents for the Text, Comment, EntityReference, ProcessingInstruction, CDATASection, and child Element nodes that make up the document's body. They also allow access to the Attr objects that contain the element's attributes. Table 19-11 shows all attributes and methods supported by the Element interface.

Table 19-11. The Element interface, derived from Node

Name                      Type

Attributes
schemaTypeInfo            TypeInfo
tagName                   DOMString

Methods
getAttribute              DOMString
getAttributeNode          Attr
getAttributeNodeNS        Attr
getAttributeNS            DOMString
getElementsByTagName      NodeList
getElementsByTagNameNS    NodeList
hasAttribute              boolean
hasAttributeNS            boolean
removeAttribute           void
removeAttributeNode       Attr
removeAttributeNS         void
setAttribute              void
setAttributeNode          Attr
setAttributeNodeNS        Attr
setAttributeNS            void
setIdAttribute            void
setIdAttributeNode        void
setIdAttributeNS          void
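A few of the string-based attribute methods from the Element interface, sketched with Python's xml.dom.minidom (an assumed implementation). The picture element is borrowed from the Attr example later in this section.

```python
from xml.dom import minidom

doc = minidom.parseString('<picture src="bookcase_pic"/>')
picture = doc.documentElement
print(picture.tagName)

# setAttribute() creates the attribute when it does not exist yet.
picture.setAttribute("alt", "3/4 view of bookcase")
print(picture.hasAttribute("alt"), picture.getAttribute("alt"))

# removeAttribute() deletes by name; reading a missing attribute
# afterward yields an empty string rather than an error.
picture.removeAttribute("src")
print(picture.hasAttribute("src"), repr(picture.getAttribute("src")))

print(picture.toxml())
```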

 

19.4.2.4 Attr

Since XML attributes may contain either text values or entity references, the DOM stores element attribute values as Node subtrees. The following XML fragment shows an element with two attributes:

    <!ENTITY bookcase_pic SYSTEM "bookcase.gif" NDATA gif>
    <!ELEMENT picture EMPTY>
    <!ATTLIST picture
        src ENTITY #REQUIRED
        alt CDATA  #IMPLIED>
    . . .
    <picture src="bookcase_pic" alt="3/4 view of bookcase"/>

The first attribute contains a reference to an unparsed entity; the second contains a simple string. Since the DOM framework stores element attributes as instances of the Attr interface, a few parsers make the contents of attributes available as actual subtrees of Node objects. In this example, the src attribute would contain an EntityReference object instance. Note that the nodeValue of the Attr node gives the flattened text value from the Attr node's children. Table 19-12 shows the attributes and methods supported by the Attr interface.

Table 19-12. The Attr interface, derived from Node

Name              Type

Attributes
specified         boolean
isId              boolean
name              DOMString
value             DOMString
ownerElement      Element
schemaTypeInfo    TypeInfo

Besides the attribute name and value, the Attr interface exposes the specified flag, which indicates whether this particular attribute instance was included explicitly in the XML document or was supplied as a default value by the <!ATTLIST> declaration in the DTD. There is also a back pointer to the Element node that owns this attribute object.
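The name, value, and back pointer of an Attr node can be inspected as in the following sketch, which assumes Python's xml.dom.minidom and omits the DTD from the earlier fragment for brevity. The replacement alt text is made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString('<picture src="bookcase_pic" alt="3/4 view of bookcase"/>')
picture = doc.documentElement

# getAttributeNode() returns the Attr object rather than its string value.
alt = picture.getAttributeNode("alt")
print(alt.name, "=", alt.value)

# nodeValue is the flattened text of the attribute's child nodes.
print(alt.nodeValue == alt.firstChild.data)

# ownerElement points back to the element that carries the attribute.
print(alt.ownerElement is picture)

# Assigning to value changes what the owning element reports.
alt.value = "Bookcase, 3/4 view"
print(picture.getAttribute("alt"))
```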

19.4.2.5 CharacterData

Several types of data within a DOM node tree represent blocks of character data that do not include markup. CharacterData is an abstract interface that supports common text-manipulation methods, which are used by the concrete interfaces Comment, Text, and CDATASection. Table 19-13 shows the attributes and methods supported by the CharacterData interface.

Table 19-13. The CharacterData interface, derived from Node

Name           Type

Attributes
data           DOMString
length         unsigned long

Methods
appendData     void
deleteData     void
insertData     void
replaceData    void
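The editing methods listed above operate on character offsets within the data attribute. A small sketch, assuming Python's xml.dom.minidom and an arbitrary sample string:

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "greeting", None)
text = doc.createTextNode("Hello world")
doc.documentElement.appendChild(text)

text.insertData(5, ",")         # insert at offset 5    -> "Hello, world"
text.appendData("!")            # add to the end        -> "Hello, world!"
text.replaceData(7, 5, "DOM")   # replace 5 characters  -> "Hello, DOM!"
text.deleteData(5, 1)           # remove the comma      -> "Hello DOM!"

print(text.data, text.length)
```

Because Comment and CDATASection nodes share the same interface, the same calls apply to them unchanged.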

   

19.4.2.6 Comment

DOM parsers are not required to make the contents of XML comments available after parsing, and relying on comment data in your application is poor programming practice at best. If your application requires access to metadata that should not be part of the basic XML document, consider using processing instructions instead. The Comment interface, derived from CharacterData, has no interface-specific attributes or methods, only those it inherits from its superinterfaces.

19.4.2.7 EntityReference

If an XML document contains references to general entities within the body of its elements, the DOM-compliant parser may pass these references along as EntityReference nodes. This behavior is not guaranteed because the parser is free to replace any entity or character reference with the actual Unicode character sequence it represents. The EntityReference interface, derived from Node, has no interface-specific attributes or methods.
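Code that walks a tree should therefore be prepared for either outcome. The sketch below assumes Python's xml.dom.minidom, which is one parser that expands internal entities into ordinary text; the pub entity and the sample sentence are invented for the example.

```python
from xml.dom import Node, minidom

# Hypothetical document with one internal general entity.
SAMPLE = """<!DOCTYPE note [
  <!ENTITY pub "Example Press">
]>
<note>Published by &pub; in print.</note>"""

doc = minidom.parseString(SAMPLE)
children = doc.documentElement.childNodes

# Check whether the parser kept the reference as its own node type.
has_entity_refs = any(child.nodeType == Node.ENTITY_REFERENCE_NODE
                      for child in children)

# Collect the character data regardless of how many Text nodes hold it.
text = "".join(child.data for child in children
               if child.nodeType == Node.TEXT_NODE)

print(has_entity_refs)
print(text)
```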

19.4.2.8 Text

The character data of an XML document is stored within Text nodes. Text nodes are children of either Element or Attr nodes. After parsing, every contiguous block of character data from the original XML document is translated directly into a single Text node. Once the document has been parsed, however, the client application may insert, delete, and split Text nodes, so two or more Text nodes may end up side by side within the document tree. Table 19-14 describes the Text interface.

Table 19-14. The Text interface, derived from CharacterData

Name                          Type

Attributes
isElementContentWhitespace    boolean
wholeText                     DOMString

Methods
replaceWholeText              Text
splitText                     Text

The splitText( ) method provides a way to split a single Text node into two nodes at a given offset. This split is useful if an editing application wishes to insert additional markup nodes into an existing island of character data. After the split, it is possible to insert additional nodes into the resulting gap.

Another useful addition (introduced in Level 3) is the wholeText attribute. This attribute returns all of the text contained in the selected Text node, as well as any adjacent Text nodes, in document order. Prior to Level 3, it was necessary to enumerate all children of a given node and concatenate them manually to get the entire text contained within a node.
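Splitting a Text node and reading wholeText can be sketched as follows, assuming Python's xml.dom.minidom. The p and br element names are placeholders.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>Hello world</p>")
para = doc.documentElement
head = para.firstChild

# splitText() keeps the first part in the original node and returns a new
# sibling Text node holding the rest.
tail = head.splitText(5)
pieces = [node.data for node in para.childNodes]
print(pieces)

# wholeText joins the node with its adjacent Text siblings.
whole_before = head.wholeText
print(whole_before)

# Insert markup into the gap created by the split.
para.insertBefore(doc.createElement("br"), tail)
result = para.toxml()
print(result)

# The two Text nodes are no longer adjacent, so wholeText stops at the element.
whole_after = head.wholeText
print(whole_after)
```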

19.4.2.9 CDATASection

CDATA sections provide a simplified way to include characters that would normally be considered markup in an XML document. These sections are stored within a DOM document tree as CDATASection nodes. The CDATASection interface, derived from Text, has no interface-specific attributes or methods.
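The practical difference between a CDATASection node and a plain Text node shows up when the tree is serialized. The following sketch assumes Python's xml.dom.minidom; the script element name and its content are made up.

```python
from xml.dom import Node, minidom

impl = minidom.getDOMImplementation()
CODE = "if (a < b && b > c) run();"

# Same character data, once as a CDATA section and once as plain text.
with_cdata = impl.createDocument(None, "script", None)
cdata = with_cdata.createCDATASection(CODE)
with_cdata.documentElement.appendChild(cdata)

with_text = impl.createDocument(None, "script", None)
with_text.documentElement.appendChild(with_text.createTextNode(CODE))

# A CDATASection is a specialized Text node with its own node type.
print(cdata.nodeType == Node.CDATA_SECTION_NODE, isinstance(cdata, minidom.Text))

# The CDATA section is written verbatim; the Text node is escaped.
raw = with_cdata.documentElement.toxml()
escaped = with_text.documentElement.toxml()
print(raw)
print(escaped)
```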
