Java Enterprise in a Nutshell (In a Nutshell (OReilly))

7.2. Java API for XML Processing

The JAXP API is bundled into the JDK as of 1.4 and is an optional package for earlier versions. It is also a standard component of the J2EE 1.3 and 1.4 platforms. XML Schema support is available only in JAXP 1.2 and later; JAXP 1.2 is part of J2EE 1.4 and JDK 1.4. J2EE 1.3 includes the 1.1 release of JAXP, which is otherwise functionally identical. If you're working in a Java 5.0 environment, you're using JAXP 1.3, which adds XPath and XInclude support to the API, among other things. The full specification and a reference implementation are available from http://java.sun.com/xml.

The SAX and DOM APIs that are actually used for processing XML files don't include a standard method for creating a parser object; this is one of the voids JAXP fills. The API provides a set of Factory objects that will create parsers or XSLT processors. Additionally, JAXP defines a programmatic interface to XSLT processors.

The actual parser and processor implementations used by JAXP are pluggable. You can use the Crimson parser, the Apache Xerces parser (available from http://xml.apache.org), or any other JAXP-compatible parser. Version 1.1 of the reference implementation shipped with Sun's Crimson XML parser and the Xalan XSL engine from the Apache XML project (again, see http://xml.apache.org). In JAXP 1.2, the Crimson parser was replaced with the Xerces parser. Parser implementations still vary in the levels of functionality they support. The examples in this chapter have been tested with the Xerces parser that shipped with JAXP 1.2.

7.2.1. Getting a Parser or Processor

To retrieve a parser or processor from inside a Java program, call the newInstance( ) method of the appropriate factory class: SAXParserFactory, DocumentBuilderFactory, or TransformerFactory. The actual factory implementation is provided by the parser vendor. For example, to retrieve the platform default SAX parser:

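    import java.io.IOException;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;

    SAXParserFactory spf = SAXParserFactory.newInstance( );
    spf.setValidating(true); // request a validating parser
    try {
        SAXParser saxParser = spf.newSAXParser( );
        // Process XML here, for example with saxParser.parse( ),
        // which is the call that can throw IOException
    } catch (SAXException e) {
        e.printStackTrace( );
    } catch (ParserConfigurationException pce) {
        pce.printStackTrace( );
    } catch (IOException ioe) {
        ioe.printStackTrace( );
    }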

The next three sections will deal with what you can do once you've actually retrieved a parser. For the time being, let's treat it as an end in itself.

SAXParserFactory includes a static method called newInstance( ). When this method is called, the JAXP implementation searches for an implementation of javax.xml.parsers.SAXParserFactory, instantiates it, and returns it. The implementation of SAXParserFactory is provided by the parser vendor; it's org.apache.xerces.jaxp.SAXParserFactoryImpl for the Xerces parser.
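To see which implementation the lookup actually selects in a given environment, you can ask each factory for its concrete class. The following is a minimal sketch; the class name ShowFactories and the method factoryImplementations( ) are made up for illustration, and the class names it reports depend on the JDK and on any parser jars in the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class ShowFactories {

    /**
     * Maps each abstract JAXP factory class name to the name of the
     * concrete class that newInstance() returned for it.
     */
    static Map<String, String> factoryImplementations() {
        Map<String, String> impls = new LinkedHashMap<>();
        impls.put(SAXParserFactory.class.getName(),
                  SAXParserFactory.newInstance().getClass().getName());
        impls.put(DocumentBuilderFactory.class.getName(),
                  DocumentBuilderFactory.newInstance().getClass().getName());
        impls.put(TransformerFactory.class.getName(),
                  TransformerFactory.newInstance().getClass().getName());
        return impls;
    }

    public static void main(String[] args) {
        // One line per factory: abstract class -> vendor implementation
        for (Map.Entry<String, String> e : factoryImplementations().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running this before and after changing the classpath or a system property is a quick way to confirm which parser an application will really use.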

The system looks for the name of the class to instantiate in the following four locations, in order:

  1. In one of these system properties:

    • javax.xml.parsers.SAXParserFactory

    • javax.xml.parsers.DocumentBuilderFactory

    • javax.xml.transform.TransformerFactory

  2. In the lib/jaxp.properties file in the JRE directory. The configuration file is in key=value format, and the key is the name of the corresponding system property. Therefore, to set Crimson as the default parser, jaxp.properties would contain the following line:

    javax.xml.parsers.SAXParserFactory=org.apache.crimson.jaxp.SAXParserFactoryImpl

  3. In the application jar file, via the Services API. The API looks for the class name in a file called META-INF/services/parserproperty, in which the filename (parserproperty) is the property name corresponding to the desired factory. The runtime environment checks every available jar file, so if you have multiple parsers available to your application, specify the desired factory using one of the previous methods to prevent nondeterministic behavior.

  4. In a platform default factory instance.
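The first lookup step can be exercised from code. The sketch below is illustrative only: the class name FactoryLookupSketch and the method selectBySystemProperty( ) are hypothetical, and to stay independent of any particular vendor it simply pins the factory to whatever class the platform would have chosen anyway. In a real application you would supply the vendor's class name instead, often on the command line with -Djavax.xml.parsers.SAXParserFactory=...

```java
import javax.xml.parsers.SAXParserFactory;

public class FactoryLookupSketch {

    // System property consulted first during the lookup
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    /**
     * Selects a factory implementation through the system property and
     * returns the class name that newInstance() then resolves to.
     */
    static String selectBySystemProperty(String implementationClassName) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, implementationClassName);
        try {
            return SAXParserFactory.newInstance().getClass().getName();
        } finally {
            // Restore the earlier setting so other code is unaffected
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Assumption for the demo: reuse the platform default class name
        String platformDefault =
            SAXParserFactory.newInstance().getClass().getName();
        System.out.println(selectBySystemProperty(platformDefault));
    }
}
```

If the named class can't be loaded, newInstance( ) throws a FactoryConfigurationError rather than silently falling through to the next step.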

Once you have a factory, various parser options can be set using the factory-specific set methods. SAXParserFactory and DocumentBuilderFactory, for instance, include setNamespaceAware( ) and setValidating( ) methods. The first tells the factory whether to produce a parser that is aware of XML namespaces (and will fail if the document being parsed doesn't properly conform to the namespace specification); the second tells it whether to validate against any DTD and/or schema specified by the XML document itself. To enable schema validation in parsers, you'll need to set the following attribute on the DocumentBuilderFactory, or set the equivalent property on the SAXParser using setProperty():

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setAttribute(
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");

This attribute is a standard property defined in the JAXP specification and it tells the parser which schema language to use for validation. In this example, we're using the 2001 version of the XML Schema language.
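Putting the pieces together, here is a hedged sketch of schema validation with a DocumentBuilderFactory. Several things in it are assumptions rather than part of the text above: the class and method names are invented, the tiny schema is made up for the demo, and the schema is supplied through the companion JAXP 1.2 attribute http://java.sun.com/xml/jaxp/properties/schemaSource so that the example needs no external files. It also assumes a parser that supports the JAXP 1.2 properties, such as Xerces.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class SchemaValidationSketch {

    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Made-up schema for the demo: one <count> element holding an integer
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Parses the document and returns how many validation errors were reported. */
    static int countValidationErrors(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);   // XML Schema validation needs namespaces
        dbf.setValidating(true);       // the schema attributes only apply when validating
        dbf.setAttribute(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        // A fresh stream per factory, because a stream can be read only once
        dbf.setAttribute(SCHEMA_SOURCE,
            new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));

        final int[] errors = {0};
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors[0]++; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countValidationErrors("<count>42</count>"));
        System.out.println(countValidationErrors("<count>many</count>"));
    }
}
```

Validation problems are reported to the ErrorHandler rather than thrown, so a document that violates the schema still parses unless the handler chooses to rethrow. Without a handler, it's easy to miss that validation failed.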

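A configured factory can be shared by multiple threads: the JAXP specification expects the methods that create parsers and processors (newSAXParser( ), newDocumentBuilder( ), and newTransformer( )) to be threadsafe. This allows parser factories to be instantiated and configured in a Java Servlet init( ) method or other centralized location. The factory's set methods, however, aren't guaranteed to be threadsafe, so finish configuring a factory before sharing it. Parsers and processors themselves aren't guaranteed to be threadsafe either, so each thread should create its own.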
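One way to apply this in practice is to configure a single factory up front and have every thread request its own parser from it. The sketch below uses a small thread pool to stand in for a servlet container's request threads; the class name SharedFactorySketch, its methods, and the pool size are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class SharedFactorySketch {

    // Created and configured once, before any worker thread uses it
    private static final DocumentBuilderFactory FACTORY = createFactory();

    private static DocumentBuilderFactory createFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf;
    }

    /** Parses one document with a parser private to the calling thread. */
    static String rootName(String xml) throws Exception {
        // The parser is never shared: each call gets a new DocumentBuilder
        DocumentBuilder builder = FACTORY.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement().getTagName();
    }

    /** Parses all documents on a thread pool; results keep the input order. */
    static List<String> parseConcurrently(List<String> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> rootName(doc)));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> future : futures) {
                names.add(future.get());
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a parser per request costs a little more than reusing one, but it avoids any need for synchronization around the parser itself.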
