Navigating a Document with XPath
Problem
You want to find or address sections of an XML document in a standard, programming-language-independent way.
Solution
The XPath language defines a way of referring to almost any element or set of elements in an XML document, and the REXML library comes with a complete XPath implementation. REXML::XPath provides three class methods for locating Element objects within parsed documents: first, each, and match.
Take as an example the following XML description of an aquarium. The aquarium contains some fish and a gaudy castle decoration full of algae. Due to an aquarium stocking mishap, some of the smaller fish have been eaten by larger fish, just like in those cartoon food chain diagrams. (Figure 11-1 shows the aquarium.)
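xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
require 'rexml/document'
doc = REXML::Document.new xml
Figure 11-1. The aquarium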
We can use REXML::XPath.first to get the Element object corresponding to the first <fish> tag in the document:
REXML::XPath.first(doc, '//fish')
# => <fish color='blue' size='small'/>
We can use match to get an array containing all the elements that are green:
REXML::XPath.match(doc, '//*[@color="green"]')
# => [<fish color='green' size='small'> … </>, <algae color='green'/>]
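The three methods take the same arguments (a context node and an XPath string) and differ only in how many results they hand back. The sketch below rebuilds a copy of the aquarium markup (reconstructed from the fish and decoration described in this recipe) so it runs on its own, then checks that first returns the same object as the head of the match array, that each yields the matches in the same order, and that first returns nil when nothing matches. The variable names are illustrative, not part of REXML.

```ruby
require 'rexml/document'

# A copy of the aquarium document, so this snippet is self-contained.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

all_fish   = REXML::XPath.match(doc, '//fish')   # Array of every <fish>, in document order
first_fish = REXML::XPath.first(doc, '//fish')   # only the first match
seen = []
REXML::XPath.each(doc, '//fish') { |f| seen << f } # yields each match to the block in turn

puts all_fish.map { |f| f.attributes['color'] }.join(', ')
# blue, orange, green, red
puts first_fish.equal?(all_fish.first)   # true: the very same Element object
p REXML::XPath.first(doc, '//shark')     # nil: no element matches
```

In practice, first suits "I expect one answer", match suits "give me them all as an Array", and each avoids building the Array when you only need to visit the nodes.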
We can use each with a code block to iterate over all the fish that are inside other fish:
def describe(fish)
  "#{fish.attribute('size')} #{fish.attribute('color')} fish"
end
REXML::XPath.each(doc, '//fish/fish') do |fish|
  puts "The #{describe(fish.parent)} has eaten the #{describe(fish)}."
end
# The large orange fish has eaten the small green fish.
# The small green fish has eaten the tiny red fish.
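The first argument to first, each, and match does not have to be the document: any element can act as the context node, and a relative path is then evaluated from that element. A minimal sketch, again on a reconstructed copy of the aquarium; the helper colors_of is hypothetical and exists only to make the results readable. Note that a path beginning with // is absolute, so it still searches the whole document even when the context is an element.

```ruby
require 'rexml/document'

# A copy of the aquarium document, so this snippet is self-contained.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

# Hypothetical helper: list the color attribute of each element.
def colors_of(elements)
  elements.map { |e| e.attributes['color'] }
end

orange = REXML::XPath.first(doc, '//fish[@color="orange"]')

direct     = REXML::XPath.match(orange, 'fish')    # relative: <fish> children of orange only
inside     = REXML::XPath.match(orange, './/fish') # relative: any <fish> nested inside orange
everywhere = REXML::XPath.match(orange, '//fish')  # absolute: starts from the document root

p colors_of(direct)      # ["green"]
p colors_of(inside)      # ["green", "red"]
p colors_of(everywhere)  # ["blue", "orange", "green", "red"]
```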
Every element in a Document has an xpath method that returns the canonical XPath to that element. This path can be considered the element's "address" within the document. In this example, a complex bit of Ruby code is replaced by a simple XPath expression:
red_fish = doc.children[0].children[3].children[1].children[1]
# => <fish color='red' size='tiny'/>
red_fish.xpath
# => "/aquarium/fish[2]/fish/fish"
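Because the canonical path is itself an XPath expression, feeding it back to REXML::XPath.first should lead straight back to the same element. The sketch below, on a reconstructed copy of the aquarium, checks that round trip for every element in the document; it prints each path rather than asserting particular strings.

```ruby
require 'rexml/document'

# A copy of the aquarium document, so this snippet is self-contained.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

# '//*' selects every element in the document.
elements = REXML::XPath.match(doc, '//*')

elements.each do |el|
  path = el.xpath                         # canonical address of this element
  back = REXML::XPath.first(doc, path)    # resolve the address again
  puts "#{path} -> #{back.equal?(el)}"    # true when the round trip lands on el itself
end
```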
Even a brief overview of XPath is beyond the scope of this recipe, but here are some more examples to give you ideas:
# Find the second green element.
REXML::XPath.match(doc, '//*[@color="green"]')[1]
# => <algae color='green'/>
# Find the color attributes of all small fish.
REXML::XPath.match(doc, '//fish[@size="small"]/@color')
# => [color='blue', color='green']
# Count how many fish are inside the first large fish.
REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
# => 2
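Two details in these examples are easy to miss: an attribute step such as @color returns REXML::Attribute objects rather than strings, and an XPath function such as count() returns a plain value rather than a node. A minimal sketch on a reconstructed copy of the aquarium, using Attribute#value to get the text out:

```ruby
require 'rexml/document'

# A copy of the aquarium document, so this snippet is self-contained.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

# The @color step yields Attribute nodes; value extracts the string.
attrs = REXML::XPath.match(doc, '//fish[@size="small"]/@color')
p attrs.map(&:class).uniq   # [REXML::Attribute]
p attrs.map(&:value)        # ["blue", "green"]

# count() evaluates to a number, which first hands back directly.
n = REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
puts n                      # number of fish nested inside the first large fish
```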
The Elements class acts kind of like an array that supports XPath addressing. You can make your code more concise by passing an XPath expression to Elements#each, or using it as an array index.
doc.elements.each('//fish') { |f| puts f.attribute('color') }
# blue
# orange
# green
# red
doc.elements['//fish']
# => <fish color='blue' size='small'/>
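Elements#to_a also accepts an optional XPath expression and returns an ordinary Array of the matching elements, so the usual Array methods apply to the result. The sketch below, on a reconstructed copy of the aquarium, combines it with the Elements#each and Elements#[] calls from this recipe.

```ruby
require 'rexml/document'

# A copy of the aquarium document, so this snippet is self-contained.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

# Elements#each and Elements#[] with XPath strings, as in the recipe.
colors = []
doc.elements.each('//fish') { |f| colors << f.attributes['color'] }
first = doc.elements['//fish']

# Elements#to_a with an XPath returns a plain Array of matching elements.
small = doc.elements.to_a('//fish[@size="small"]')

p colors                                   # ["blue", "orange", "green", "red"]
p first.attributes['color']                # "blue"
p small.map { |f| f.attributes['color'] }  # ["blue", "green"]
```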
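Within an XPath expression, the first element in a list has an index of 1, not 0. The XPath expression //fish[@size='large'][1] matches the first large fish, not the second large fish, which is what large_fish[1] would return in Ruby code. (Strictly, [1] here selects the first large fish within each parent; to take the first match in the whole document, parenthesize the path: (//fish[@size='large'])[1].) Pass a number as an array index to an Elements object, and you get the same 1-based behavior as XPath: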
doc.elements[1]
# => <aquarium> … </>
doc.children[0]
# => <aquarium> … </>
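The two indexing styles count different things, which also explains the odd-looking children[3] in the red_fish example: an Integer index into Elements is 1-based and counts only element children, while children is an ordinary 0-based Array that also holds the whitespace text nodes between tags. A minimal sketch on a reconstructed copy of the aquarium (the exact children offsets assume the whitespace layout shown in it):

```ruby
require 'rexml/document'

# A copy of the aquarium document; the newlines and spaces between tags
# become Text nodes in each element's children Array.
xml = %{
<aquarium>
 <fish color="blue" size="small" />
 <fish color="orange" size="large">
  <fish color="green" size="small">
   <fish color="red" size="tiny" />
  </fish>
 </fish>
 <decoration type="castle" style="gaudy">
  <algae color="green" />
 </decoration>
</aquarium>}
doc = REXML::Document.new(xml)

aquarium = doc.elements[1]                # 1-based: first element child of the document
puts aquarium.name                        # aquarium

orange_by_elements = aquarium.elements[2] # second *element* child of <aquarium>
orange_by_children = aquarium.children[3] # index 3 because text nodes sit at 0 and 2

puts orange_by_elements.attributes['color']         # orange
puts orange_by_elements.equal?(orange_by_children)  # true
```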
Discussion
See Also