XML in a Nutshell, Third Edition

2017-07-07 02:10:07

The Node interface is the root of the DOM Core class hierarchy. Though never instantiated directly, it is the base interface of every more specific interface, so you can use it to extract information from any object within a DOM document tree without knowing its actual type. It is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface. As shown in Table 19-1, this interface contains information about the type, location, name, and value of the corresponding underlying document data.

Table 19-1. The Node interface

Name	Type	Read-only	2.0	3.0
Attributes
attributes	NamedNodeMap
baseURI	DOMString
childNodes	NodeList
firstChild	Node
lastChild	Node
localName	DOMString
namespaceURI	DOMString
nextSibling	Node
nodeName	DOMString
nodeType	unsigned short
nodeValue	DOMString
ownerDocument	Document
parentNode	Node
prefix	DOMString
previousSibling	Node
textContent	DOMString
Methods
appendChild	Node
cloneNode	Node
compareDocumentPosition	unsigned short
getFeature	DOMObject
getUserData	DOMUserData
hasAttributes	boolean
hasChildNodes	boolean
insertBefore	Node
isDefaultNamespace	boolean
isEqualNode	boolean
isSameNode	boolean
isSupported	boolean
lookupNamespaceURI	DOMString
lookupPrefix	DOMString
normalize	void
removeChild	Node
replaceChild	Node
setUserData	DOMUserData

Since the Node interface is never instantiated directly, the nodeType attribute contains a value that indicates the given instance's specific object type. Based on the nodeType, it is possible to cast a generic Node reference safely to a specific interface for further processing. Table 19-2 shows the node type values and their corresponding DOM interfaces, and Table 19-3 shows the values each node type provides for the nodeName, nodeValue, and attributes attributes.

Table 19-2. The DOM node types and interfaces

Node type	DOM interface
`ATTRIBUTE_NODE`	`Attr`
`CDATA_SECTION_NODE`	`CDATASection`
`COMMENT_NODE`	`Comment`
`DOCUMENT_FRAGMENT_NODE`	`DocumentFragment`
`DOCUMENT_NODE`	`Document`
`DOCUMENT_TYPE_NODE`	`DocumentType`
`ELEMENT_NODE`	`Element`
`ENTITY_NODE`	`Entity`
`ENTITY_REFERENCE_NODE`	`EntityReference`
`NOTATION_NODE`	`Notation`
`PROCESSING_INSTRUCTION_NODE`	`ProcessingInstruction`
`TEXT_NODE`	`Text`

Table 19-3. The DOM node types and method results

Node type	nodeName	nodeValue	Attributes
`ATTRIBUTE_NODE`	att name	att value	null
`CDATA_SECTION_NODE`	`#cdata-section`	content	null
`COMMENT_NODE`	`#comment`	content	null
`DOCUMENT_FRAGMENT_NODE`	`#document-fragment`	null	null
`DOCUMENT_NODE`	`#document`	null	null
`DOCUMENT_TYPE_NODE`	document type name	null	null
`ELEMENT_NODE`	tag name	null	NamedNodeMap
`ENTITY_NODE`	entity name	null	null
`ENTITY_REFERENCE_NODE`	name of entity referenced	null	null
`NOTATION_NODE`	notation name	null	null
`PROCESSING_INSTRUCTION_NODE`	target	content excluding the target	null
`TEXT_NODE`	`#text`	content	null
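
The mapping in Tables 19-2 and 19-3 can be illustrated with a short Java sketch that checks nodeType before casting a generic Node reference. This is only an illustration, assuming the JAXP parser bundled with the JDK; the class name NodeTypeDemo, the helpers parse and describe, and the sample XML are hypothetical and not part of the DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class NodeTypeDemo {

    // Hypothetical helper: parses an XML string with the JDK's default JAXP parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Hypothetical helper: inspects nodeType first, then casts to the matching interface.
    public static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                Element el = (Element) node;
                return "element " + el.getTagName() + " with "
                        + el.getAttributes().getLength() + " attribute(s)";
            case Node.TEXT_NODE:
                Text text = (Text) node;
                return "text \"" + text.getData() + "\"";
            case Node.COMMENT_NODE:
                Comment comment = (Comment) node;
                return "comment \"" + comment.getData() + "\"";
            case Node.PROCESSING_INSTRUCTION_NODE:
                ProcessingInstruction pi = (ProcessingInstruction) node;
                return "processing instruction " + pi.getTarget();
            default:
                // Every node type still answers the generic Node methods.
                return "other node " + node.getNodeName();
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a id=\"1\"><!--note-->hi</a>");
        Element root = doc.getDocumentElement();
        System.out.println(describe(doc));                  // other node #document
        System.out.println(describe(root));                 // element a with 1 attribute(s)
        System.out.println(describe(root.getFirstChild())); // comment "note"
        System.out.println(describe(root.getLastChild()));  // text "hi"
    }
}
```

Checking nodeType before the cast avoids a ClassCastException when a child turns out to be, for example, a comment rather than an element.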

Note that the nodeValue attribute returns the contents of text and comment nodes but returns null for elements. Prior to DOM Level 3, retrieving the text content of an element required locating any child Text nodes it might contain, but DOM Level 3 introduced the textContent attribute, exposed in the Java binding as the getTextContent() and setTextContent() convenience methods.
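
The difference between the two approaches can be sketched in Java as follows. This is a minimal illustration, assuming the JAXP parser bundled with the JDK; the class name TextContentDemo, the helpers parse and directText, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextContentDemo {

    // Hypothetical helper: parses an XML string with the JDK's default JAXP parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Pre-Level 3 style: concatenate the nodeValue of the direct Text and CDATA children only.
    public static String directText(Element el) {
        StringBuilder sb = new StringBuilder();
        for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element p = parse("<p>Hello <b>DOM</b>!</p>").getDocumentElement();
        System.out.println(p.getNodeValue());   // null: elements have no nodeValue
        System.out.println(directText(p));      // Hello !
        System.out.println(p.getTextContent()); // Hello DOM!
    }
}
```

Note that directText looks only at direct children, so the text inside the nested b element is skipped, whereas getTextContent() gathers the text of all descendants.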

19.3.1 The NodeList Interface

The NodeList interface provides access to the ordered content of a node. Most frequently, it is used to retrieve text nodes and child elements of element nodes. See Table 19-4 for a summary of the NodeList interface.

Table 19-4. The NodeList interface

Name	Type	Read-only	2.0	3.0
Attribute
length	unsigned long
Method
item	Node

The NodeList interface is extremely basic and is generally combined with a loop to iterate through the children of a node, as in the following example:

    // nd is any Node whose children should be examined
    NodeList nl = nd.getChildNodes();
    for (int i = 0; i < nl.getLength(); i++) {
        Node ndChild = nl.item(i);
        if (ndChild.getNodeType() == Node.COMMENT_NODE) {
            System.out.println("found comment: " + ndChild.getNodeValue());
        }
    }

19.3.2 The NamedNodeMap Interface

The NamedNodeMap interface is used for unordered collections whose contents are identified by name. In practice, this interface is used to access attributes of elements. See Table 19-5 for a summary of the NamedNodeMap interface.

Table 19-5. The NamedNodeMap interface

Name	Type	Read-only	2.0	3.0
Attribute
length	unsigned long
Methods
getNamedItem	Node
getNamedItemNS	Node
removeNamedItem	Node
removeNamedItemNS	Node
setNamedItem	Node
setNamedItemNS	Node
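
As an illustration of attribute access through a NamedNodeMap, the following Java sketch looks up, lists, and removes attributes of an element. It assumes the JAXP parser bundled with the JDK; the class name AttributeDemo, the helpers parse and listAttributes, and the sample element are hypothetical. It also uses item(), which the W3C NamedNodeMap interface defines for positional access.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeDemo {

    // Hypothetical helper: parses an XML string with the JDK's default JAXP parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Hypothetical helper: lists name=value pairs, sorted because the map is unordered.
    public static String listAttributes(Element el) {
        NamedNodeMap attrs = el.getAttributes();
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            parts.add(attr.getNodeName() + "=" + attr.getNodeValue());
        }
        Collections.sort(parts);
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        Element book = parse("<book isbn=\"123\" lang=\"en\"/>").getDocumentElement();
        NamedNodeMap attrs = book.getAttributes();

        System.out.println(listAttributes(book));                       // isbn=123, lang=en
        System.out.println(attrs.getNamedItem("lang").getNodeValue());  // en
        System.out.println(attrs.getNamedItem("missing"));              // null

        // The map is live: removing an item also removes the attribute from the element.
        attrs.removeNamedItem("lang");
        System.out.println(book.hasAttribute("lang"));                  // false
    }
}
```

Because the collection is unordered, code should look attributes up by name rather than rely on the index returned for any particular attribute.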

19.3.3 Relating Document Structure to Nodes

Although the DOM doesn't specify an interface to cause a document to be parsed, it does specify how the document's syntax structures are encoded as DOM objects. A document is stored as a hierarchical tree structure, with each item in the tree linked to its parent, children, and siblings:

Figure 19-1 shows how the preceding short sample document would be stored by a DOM parser.

Figure 19-1. Document storage and linkages

Each Node-derived object in a parsed DOM document contains references to its parent, child, and sibling nodes. These references make it possible for applications to enumerate document data using any number of standard tree-traversal algorithms. "Walking the tree" is a common approach to finding information stored in a DOM and is demonstrated in Example 19-1 at the end of this chapter.
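
As a rough illustration of how the child and sibling references support traversal, the following Java sketch performs a recursive depth-first walk using only getFirstChild() and getNextSibling(). It is separate from Example 19-1 and assumes the JAXP parser bundled with the JDK; the class name TreeWalkDemo, the helpers parse and walk, and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeWalkDemo {

    // Hypothetical helper: parses an XML string with the JDK's default JAXP parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Depth-first walk: append this node's name, then recurse into each child in document order.
    public static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  "); // two spaces of indentation per tree level
        }
        out.append(node.getNodeName()).append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>x</b><c/></a>");
        StringBuilder out = new StringBuilder();
        walk(doc, 0, out);
        System.out.print(out);
        // Prints:
        // #document
        //   a
        //     b
        //       #text
        //     c
    }
}
```

The walk never needs to know a node's specific interface; the generic Node references are enough to reach every element and text node in the tree.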