Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

SAX includes an adapter class that you can subclass to build these sorts of two-way filters: org.xml.sax.helpers.XMLFilterImpl. Its general design is similar to what I just detailed, but it implements all of the relevant interfaces in one class:

public class XMLFilterImpl implements XMLFilter, EntityResolver, DTDHandler, ContentHandler, ErrorHandler

When the various setter methods such as setContentHandler() and setErrorHandler() in this class are invoked, the handler is stored in a private field. For example, here is the setContentHandler() method:

public void setContentHandler (ContentHandler handler) { contentHandler = handler; }

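When the parse() method is called, it first invokes the private setupParse() method, which installs the XMLFilterImpl object itself as the parent parser's entity resolver, DTD handler, content handler, and error handler: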

private void setupParse () {
  if (parent == null) {
    throw new NullPointerException("No parent for filter");
  }
  parent.setEntityResolver(this);
  parent.setDTDHandler(this);
  parent.setContentHandler(this);
  parent.setErrorHandler(this);
}
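The effect of this swap can be observed from the outside. The following is a minimal sketch, assuming the parser that ships with the JDK behind SAXParserFactory; the class name HandlerSwapDemo and its method names are made up for illustration. It checks that after parse() the parent reports to the filter while the filter still holds the client's handler, and that parsing without a parent fails with the NullPointerException shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of SAX or of the book's examples.
class HandlerSwapDemo {

  // Returns true if, after parse(), the parent parser reports to the filter
  // while the filter still reports to the client's own handler.
  static boolean parentReportsToFilter() throws Exception {
    XMLReader parent = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ContentHandler client = new DefaultHandler();

    XMLFilterImpl filter = new XMLFilterImpl();
    filter.setParent(parent);
    filter.setContentHandler(client); // stored in the filter, not in the parent
    filter.parse(new InputSource(new StringReader("<doc/>")));

    return parent.getContentHandler() == filter
        && filter.getContentHandler() == client;
  }

  // Returns true if parsing through a filter that has no parent
  // fails with a NullPointerException.
  static boolean failsWithoutParent() throws Exception {
    try {
      new XMLFilterImpl().parse(new InputSource(new StringReader("<doc/>")));
      return false;
    } catch (NullPointerException expected) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("parent reports to filter: " + parentReportsToFilter());
    System.out.println("fails without parent: " + failsWithoutParent());
  }
}
```

The client never touches the parent's handlers directly. It talks only to the filter, which is what lets the filter sit between the two.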

When the parent parser calls back to the ContentHandler methods, the XMLFilterImpl passes the call back to the original ContentHandler object stored in the contentHandler field. For example, the startElement() method is as follows:

public void startElement (String uri, String localName,
 String qName, Attributes atts) throws SAXException {
  if (contentHandler != null) {
    contentHandler.startElement(uri, localName, qName, atts);
  }
}

The other callback methods are similar. Thus by default, XMLFilterImpl doesn't filter anything, much like the earlier TransparentFilter example. However, you can subclass this class and override those methods where you want to change the data passed back. You pass your changed data by invoking the usual callback methods in this class. Because you may have overridden the relevant methods in a subclass, you may need to use super to access the methods in XMLFilterImpl directly.
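As a small illustration of both points, here is a hedged sketch that runs the same document through a plain XMLFilterImpl and through a subclass that overrides only characters(). The names UpperCaseDemo, UpperCaseFilter, and textSeenThrough are hypothetical, and the parser is assumed to be whatever SAXParserFactory supplies.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of SAX or of the book's examples.
class UpperCaseDemo {

  // Upper-cases character data and leaves every other event alone.
  static class UpperCaseFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
      char[] upper = new String(ch, start, length)
          .toUpperCase(Locale.ROOT).toCharArray();
      // super.characters() forwards to the ContentHandler the client installed
      super.characters(upper, 0, upper.length);
    }
  }

  // Runs xml through the given filter and returns the text that the
  // client's content handler receives.
  static String textSeenThrough(XMLFilterImpl filter, String xml) throws Exception {
    StringBuilder seen = new StringBuilder();
    filter.setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
    filter.setContentHandler(new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        seen.append(ch, start, length);
      }
    });
    filter.parse(new InputSource(new StringReader(xml)));
    return seen.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<greeting>Hello, filter</greeting>";
    System.out.println(textSeenThrough(new XMLFilterImpl(), xml));   // Hello, filter
    System.out.println(textSeenThrough(new UpperCaseFilter(), xml)); // HELLO, FILTER
  }
}
```

The subclass builds a new array rather than modifying ch in place, since the array passed to characters() belongs to the parser.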

For example, the startElement() method in Example 8.12 adds an id attribute to every element that doesn't already have one, and then passes that modified element on to the underlying content handler to do whatever it needs to do.

Example 8.12 A Subclass of XMLFilterImpl

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;

public class IDFilter extends XMLFilterImpl {

  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) throws SAXException {

    boolean hasID = false;
    for (int i = 0; i < atts.getLength(); i++) {
      // an attribute counts as an ID if it is named id or is declared type ID
      if (atts.getQName(i).equalsIgnoreCase("id")
       || atts.getType(i).equals("ID")) {
        hasID = true;
        ids.add(atts.getValue(i));
        break;
      }
    }
    if (!hasID) {
      // copy the attributes, because the Attributes interface is read-only
      AttributesImpl newAttributes = new AttributesImpl(atts);
      String idValue = makeID();
      newAttributes.addAttribute("", "id", "id", "ID", idValue);
      atts = newAttributes;
    }
    super.startElement(namespaceURI, localName, qualifiedName, atts);

  }

  // need to track which IDs we've already used, including IDs
  // that were included in the document
  int id = 1;
  private Set ids; // requires Java 1.2

  public void startDocument() {
    // reinitialize id list for each document; note that this override
    // does not forward the startDocument event to the content handler
    ids = new HashSet();
    id = 1;
  }

  // Generate an ID that hasn't been used yet
  private String makeID() {
    while (ids.contains("_" + id)) id++;
    ids.add("_" + id);
    return "_" + id;
  }

}

You'll notice that this code is much shorter and simpler than the programs that implemented XMLFilter directly. You can reuse much of the code inside XMLFilterImpl without a lot of thought. When subclassing XMLFilterImpl, you only need to override the methods that implement the filter. The remaining methods can be left to the superclass. In fact, it is so much easier to use XMLFilterImpl rather than XMLFilter that almost all real-world filters are based on XMLFilterImpl. A few books even ignore the existence of the XMLFilter interface completely. I covered it here mostly because I spent a lot of time being confused by XMLFilter, not realizing how much more XMLFilterImpl does. It is not just an implementation of the XMLFilter interface.

Because XMLFilterImpl is still an XMLReader, the client application uses it as it would use any other XMLReader by setting handlers, features, and properties and then parsing documents. The only difference is that the client application needs to pass an actual parser object to the setParent() method before doing anything else.
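The reason setParent() has to come first is that XMLFilterImpl hands feature and property requests straight to its parent, and with no parent set it rejects them. The sketch below illustrates this under the assumption that the parser comes from SAXParserFactory; ParentFirstDemo and its methods are invented names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; not part of SAX or of the book's examples.
class ParentFirstDemo {

  static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

  // Returns true if a feature query on a filter with no parent is rejected.
  static boolean rejectedWithoutParent() throws Exception {
    try {
      new XMLFilterImpl().getFeature(NAMESPACES);
      return false;
    } catch (SAXNotRecognizedException expected) {
      return true;
    }
  }

  // Once a parent is set, the same query is answered by the parent parser.
  // Returns the value of the namespaces feature as seen through the filter.
  static boolean namespacesThroughFilter() throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();

    XMLFilterImpl filter = new XMLFilterImpl();
    filter.setParent(parser); // before setting features or parsing

    // From here on the filter is used like any other XMLReader.
    DefaultHandler handler = new DefaultHandler();
    filter.setContentHandler(handler);
    filter.setErrorHandler(handler);
    boolean namespaces = filter.getFeature(NAMESPACES);
    filter.parse(new InputSource(new StringReader("<doc/>")));
    return namespaces;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("rejected without parent: " + rejectedWithoutParent());
    System.out.println("namespaces via filter: " + namespacesThroughFilter());
  }
}
```

Handlers, by contrast, can be set on the filter at any time before parse(), because the filter keeps them in its own fields.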

Following is the beginning of the output from my use of IDFilter and FilterTester on the RDDL specification, after the usual adjustments for line length. The initial doc processing instruction is an artifact of the XMLWriter class.

% java -Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader FilterTester http://www.rddl.org/ IDFilter
<?doc type="doctype" role="title" {Resource Directory Description Language 1.0 } ?>
<html xml:lang="en" xml:base="http://www.rddl.org/"
 version="-//XML-DEV//DTD XHTML RDDL 1.0//EN" id="_1"
 xmlns="http://www.w3.org/1999/xhtml">
<head profile="" id="_2">
<title id="_3"> XML Resource Directory Description Language (RDDL)</title>
<link href="xrd.css" type="text/css" rel="stylesheet" id="_4"></link>
</head>
<body id="_5">
<h1 id="_6">Resource Directory Description Language (RDDL)</h1>
<div class="head" id="_7">
<p id="_8">This Version:
<a href="http://www.openhealth.org/RDDL/20010305" id="_9"> March 5, 2001</a></p>
...
