Integrating PHP and XML 2004

The SAX parser is an event-based, non-validating parser that reads data from an XML document. The current version of SAX is SAX2, which processes documents sequentially: it reads a part of the XML document, generates an event for each XML tag it finds, and then reads the next part. You can use the SAX parser as the basis for querying, modifying, and writing XML documents.
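As a minimal sketch of this sequential, event-based behavior, the following code records each event in the order the parser reports it. The sample XML string and the $events array are assumptions made for illustration and do not come from the listings in this chapter.

```php
<?php
// Record every SAX event in the order the parser reports it.
$events = [];

$parser = xml_parser_create();

// Element names arrive in uppercase because case folding is on by default.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) { $events[] = "start:$name"; },
    function ($p, $name) use (&$events) { $events[] = "end:$name"; }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $text) use (&$events) { $events[] = "text:$text"; }
);

// true marks this string as the final (and only) piece of the document.
xml_parse($parser, '<note><to>Ann</to></note>', true);

print_r($events);
```

The events follow document order: the opening tag of the outer element comes first and its closing tag comes last.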

Architecture of the SAX Parser

The SAX parser checks that an XML document is well formed; because it is a non-validating parser, it does not check the document against a DTD or schema. The parser relies on handlers that are invoked for each XML tag. The handlers are user-defined functions, which are also called callback functions.

Figure 2-1 shows the architecture of the SAX parser:

Figure 2-1: Architecture of the SAX Parser

Working with the SAX Parser in PHP

The SAX parser invokes handlers for each opening and closing tag in an XML document. It also invokes handlers for character data and processing instructions of the XML document. To use the SAX parser:

  1. Initialize the SAX parser using the PHP function, xml_parser_create(). The code to initialize the SAX parser is:

    $xparser=xml_parser_create();

    The above code initializes the SAX parser and stores a reference to it in the $xparser variable.

  2. Identify the events, and set the callback functions to be invoked for the events. The code to identify the events and set the callback functions is:

    xml_set_element_handler($xparser, "startingHandler", "endingHandler");
    xml_set_character_data_handler($xparser, "cdataHandler");

    In the above code:

    • The xml_set_element_handler() function is a built-in PHP function that registers the callback functions to be invoked for the opening and closing tags of an XML document.

    • The startingHandler function is the callback function that is invoked when the SAX parser finds an opening tag.

    • The endingHandler function is the callback function that is invoked when the SAX parser finds a closing tag.

    • The xml_set_character_data_handler() function is a built-in PHP function that registers the callback function to be invoked for character data within the tags of the XML document.

    • The cdataHandler function is the callback function that is invoked when the SAX parser finds character data within the XML document.

  3. Provide the code for the callback functions, startingHandler(), endingHandler(), and cdataHandler(), in the PHP script.

  4. Open the XML document using the fopen() function, as shown in the following code:

    if (!($fp = fopen("student.xml", "r"))) {
        die("File does not exist");
    }

    The above code creates the $fp variable, which refers to the student.xml file. In the above code:

    • The fopen() function opens the student.xml file in read mode and returns a file pointer, or false if the file cannot be opened. If the call fails, the code displays the error message, File does not exist.

    • The die() function is a built-in PHP function that terminates the execution of the script and displays the message specified as an argument.

  5. Parse the XML document using the xml_parse() function, as shown in Listing 2-3:

    Listing 2-3: Parsing the XML Document

    while ($data = fread($fp, 4096)) {
        if (!xml_parse($xparser, $data, feof($fp))) {
            // Function calls are not expanded inside a quoted string, so concatenate.
            die("XML parse error: " . xml_error_string(xml_get_error_code($xparser)));
        }
    }

     

    In the above code:

    • The fread() function reads the content of the XML document in chunks of 4 KB.

    • The xml_parse() function parses each chunk, and the loop continues until the end of the XML document is reached.

    • The feof() function returns the Boolean value, true, when the end of the document is reached. Passing this value as the third argument of xml_parse() tells the parser that it has received the final chunk.

    • The die() function terminates the execution when an error occurs in parsing the XML document.

    • The xml_get_error_code() function returns the error code and the xml_error_string() function returns the error description corresponding to the error code.

  6. Release the resources of the XML parser using the xml_parser_free() function of PHP, as shown in the following code:

    xml_parser_free($xparser);

    The above code releases the XML parser after parsing is complete.
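The parsing and error-reporting steps above can be sketched on an in-memory string instead of a file. The parseInChunks() helper, its chunk size, and the sample strings are hypothetical and serve only as an illustration. The sketch also reports the line number through xml_get_current_line_number(), which the steps above do not use.

```php
<?php
/**
 * Feed an XML string to the parser in small chunks, the way the fread()
 * loop feeds a file. Returns an error message, or null on success.
 * parseInChunks() is a hypothetical helper, not a PHP built-in.
 */
function parseInChunks(string $xml, int $chunkSize = 16): ?string
{
    $parser = xml_parser_create();
    $chunks = str_split($xml, $chunkSize);
    $last   = count($chunks) - 1;

    foreach ($chunks as $i => $chunk) {
        // The third argument is true only for the final chunk.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            return sprintf(
                'XML parse error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
        }
    }
    return null;
}

// A well-formed document, followed by one with a missing closing tag.
var_dump(parseInChunks('<studentdata><student><name>George</name></student></studentdata>'));
var_dump(parseInChunks('<studentdata><student></studentdata>'));
```

Returning the message instead of calling die() lets the calling code decide how to react to a malformed document.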

Note  

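To initialize the parser with a specific encoding, pass the encoding name to the xml_parser_create() function. PHP supports the ISO-8859-1, UTF-8, and US-ASCII encodings, so UTF-16 is not an accepted value:

$xparser=xml_parser_create("UTF-8");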

Implementing the SAX Parser

The SAX parser works through various functions, known as handlers. Each handler is invoked when the SAX parser encounters a certain event, such as an opening tag, a closing tag, character data, a processing instruction, or a comment.
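As a rough sketch of the less common events, processing instructions can be captured with xml_set_processing_instruction_handler(), and data that has no dedicated handler, which typically includes comments, is passed to the handler registered with xml_set_default_handler(). The sample document, the render target, and the $seen array are assumptions for illustration.

```php
<?php
// Collect processing instructions and "everything else" separately.
$seen = ['pi' => [], 'other' => []];

$parser = xml_parser_create();

// Called for each processing instruction with its target and its data.
xml_set_processing_instruction_handler(
    $parser,
    function ($p, $target, $data) use (&$seen) {
        $seen['pi'][] = [$target, $data];
    }
);

// Called for data that no other registered handler claims.
xml_set_default_handler(
    $parser,
    function ($p, $data) use (&$seen) {
        $seen['other'][] = $data;
    }
);

xml_parse($parser, '<doc><?render mode="fast"?><!-- a comment --></doc>', true);

print_r($seen);
```

Because no element handler is registered in this sketch, the tags themselves may also reach the default handler.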

For parsing an XML file, you need to provide the XML data file to the SAX parser, as shown:

$xfile="student.xml";

The above code creates the xfile variable that contains the name of the XML document to be parsed by the SAX parser. You can refer to the student.xml file using the $xfile variable within the PHP script.

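To implement the SAX parser, you need to create callback functions for handling all events. PHP passes three parameters to the startingHandler() callback function: the reference to the SAX parser, the name of the element, and an array that contains the attributes of the element.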

Listing 2-4 shows the startingHandler() callback function:

Listing 2-4: Handling the Opening Tag Event

function startingHandler($xparser, $element_name, $attributes) {
    echo "Opening Tag:<b>$element_name</b><br>";
    // each() was removed in PHP 8; foreach walks the attribute array instead.
    foreach ($attributes as $key => $value) {
        echo "Attribute:<b><i>$key=$value</i></b><br>";
    }
}

 

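In the above listing, the startingHandler() function displays the name of each opening tag in bold, followed by the name and value of each attribute of that tag.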

Unlike the start tag handler, the endingHandler() callback function receives only two parameters, because a closing tag does not contain attributes. The endingHandler() callback function is the end tag handler, which is invoked when the parser finds an end tag. The parameters passed to the end tag handler are the reference to the SAX parser and the element name.

The code to define the endingHandler() callback function is:

function endingHandler($xparser, $element_name) {
    echo "Closing Tag:<b>$element_name</b><br>";
}

The above code shows that the parser invokes the endingHandler() function when it finds the closing tag. It displays the names of the closing tags of the XML document in bold.
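One detail that affects what the element handlers receive: PHP enables case folding by default, so element names are reported in uppercase unless the XML_OPTION_CASE_FOLDING option is turned off. The sketch below compares both settings; the collectNames() helper and the sample string are hypothetical.

```php
<?php
/**
 * Return the element names reported to the start tag handler.
 * collectNames() is a hypothetical helper used only for this comparison.
 */
function collectNames(string $xml, int $caseFolding): array
{
    $names  = [];
    $parser = xml_parser_create();

    // 1 folds names to uppercase (the default); 0 keeps them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($p, $name) { /* closing tags are ignored here */ }
    );
    xml_parse($parser, $xml, true);

    return $names;
}

print_r(collectNames('<student><name id="s001">George</name></student>', 1));
print_r(collectNames('<student><name id="s001">George</name></student>', 0));
```

Turning case folding off is usually the safer choice when element names must be compared with the names used in the document.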

PHP passes two parameters to the character data handler. The parameters passed to the character data handler include the reference to the SAX parser and the character data.

The code to define the character data callback function, cdataHandler, is:

function cdataHandler($xparser, $cdata) {
    echo "CDATA: <i><u>$cdata</u></i><br>";
}

The above code shows that the cdataHandler() function is invoked when the parser finds text between the opening and closing tags. The cdataHandler() function displays the text between the opening and closing tags in underlined and italicized format.
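The parser may call the character data handler more than once for a single run of text, for example around an entity reference. A minimal sketch of one way to cope with this is to append the pieces to a buffer and use the buffer when the closing tag arrives. The elementTexts() helper and the sample document are assumptions for illustration.

```php
<?php
/**
 * Return the complete text of each element, keyed by element name.
 * elementTexts() is a hypothetical helper, not part of the chapter's listings.
 */
function elementTexts(string $xml): array
{
    $buffer = '';
    $texts  = [];
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // A new opening tag starts a fresh buffer.
        function ($p, $name, $attrs) use (&$buffer) { $buffer = ''; },
        // The closing tag marks the point where the text is complete.
        function ($p, $name) use (&$buffer, &$texts) {
            if ($buffer !== '') {
                $texts[$name] = $buffer;
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        // Append each piece; the handler may fire several times per element.
        function ($p, $text) use (&$buffer) { $buffer .= $text; }
    );
    xml_parse($parser, $xml, true);

    return $texts;
}

print_r(elementTexts('<student><name>George &amp; Co</name><age>15</age></student>'));
```

The keys are in uppercase because case folding is enabled by default.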

You can implement the SAX parser in a PHP script, as shown in Listing 2-5:

Listing 2-5: Implementing the SAX Parser

<html>
<head>
<basefont face="Times New Roman">
</head>
<body>
<?php
function startingHandler($xparser, $element_name, $attributes) {
    echo "Opening Tag:<b>$element_name</b><br>";
    // each() was removed in PHP 8; foreach walks the attribute array instead.
    foreach ($attributes as $key => $value) {
        echo "Attribute:<b><i>$key=$value</i></b><br>";
    }
}

function endingHandler($xparser, $element_name) {
    echo "Closing Tag:<b>$element_name</b><br>";
}

function cdataHandler($xparser, $cdata) {
    echo "CDATA: <i><u>$cdata</u></i><br>";
}

$xfile = "student.xml";
$xparser = xml_parser_create();
xml_set_element_handler($xparser, "startingHandler", "endingHandler");
xml_set_character_data_handler($xparser, "cdataHandler");

if (!($fp = fopen($xfile, "r"))) {
    die("File Input/Output error: $xfile");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xparser, $data, feof($fp))) {
        // Function calls are not expanded inside a quoted string, so concatenate.
        die("XML parser error: " . xml_error_string(xml_get_error_code($xparser)));
    }
}

xml_parser_free($xparser);
?>
</body>
</html>

 

The above listing shows how the SAX parser of PHP parses the XML document, student.xml, and displays each opening tag, attribute, closing tag, and block of character data that it finds.

The content of the student.xml file is as follows:

<?xml version="1.0"?> <studentdata><student><name id="s001">George</name><age>15</age><address>New York</address><standard>10</standard></student></studentdata>

Note  

The parser also passes any white space that appears between the tags of the XML file to the cdataHandler() function as character data.
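A minimal sketch of one way to ignore such white space is to test each piece of character data with trim() before using it. The indented sample document and the $texts array are assumptions for illustration.

```php
<?php
// An indented document: the line breaks and spaces between tags are character data too.
$xml = "<student>\n  <name>George</name>\n  <age>15</age>\n</student>";

$texts  = [];
$parser = xml_parser_create();

xml_set_character_data_handler(
    $parser,
    function ($p, $text) use (&$texts) {
        // Keep only the pieces that contain something other than white space.
        if (trim($text) !== '') {
            $texts[] = $text;
        }
    }
);

xml_parse($parser, $xml, true);

print_r($texts);
```

Only the text inside the name and age elements survives the check; the indentation between the tags is discarded.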

Figure 2-2 shows the output of Listing 2-5:

Figure 2-2: Output of Listing 2-5

Using the Expat Parser

The Expat parser is a SAX parser that supports the event-driven approach of parsing a document. The Expat parser is the default parser for the PHP scripting language. Wrapper classes and filters built on top of this parser let you perform advanced processing on XML documents, such as transforming, updating, and querying XML documents.

The PHPXML class is an API that contains wrapper classes, which implement filters in the SAX parser. The class_sax_filters.php file of the SAX parser is an example of a wrapper class.

Note  

You can download the PHPXML class from http://phpxmlclasses.sourceforge.net/

The AbstractSAXParser class implements the SAX parser. This class checks the XML document and invokes events using the objects of the AbstractFilter class, called listener objects. The methods used by the AbstractSAXParser class are:

The ExpatParser class implements the AbstractSAXParser class to parse an XML document. The constructor of the ExpatParser class accepts the XML document as an argument. The code to implement the ExpatParser class is:

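$xml_parser = new ExpatParser("file.xml");
$f1 = new FilterAddStudent();
$xml_parser->SetListener($f1);
$xml_parser->parse();

The above code creates an ExpatParser object for the file.xml document, registers an object of the FilterAddStudent filter class as its listener, and then parses the document.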

Filters are user-defined classes that accept SAX events from a parser, process them, and provide the result to other filters or Web browsers. A filter class implements the methods that are defined in the AbstractFilter class. You need to extend the AbstractFilter class to create a user-defined filter. The handlers implemented by a filter class are:

Filters that do not invoke other events, but display the output on the Web browser, are called finalizers. The FilterOutput class is a finalizer, which displays the output of the XML document on the Web browser.

Listing 2-6 shows how to extend the AbstractFilter abstract class:

Listing 2-6: Implementing the AbstractFilter Class

<?php
include_once("/class_sax_filters.php");

// PHP classes take no visibility modifier, so "public class" is a syntax error.
class FilterAddStudent extends AbstractFilter {
    function StartElementHandler($element_name, $attributes) {
        // Implementation of the start tag handler
    }

    function EndElementHandler($element_name) {
        // Implementation of the end tag handler
    }

    function CharacterDataHandler($cdata) {
        // Implementation of the character data handler
    }
}

$xml_parser = new ExpatParser("file.xml");
$f1 = new FilterAddStudent();
$f2 = new FilterOutput();
$f1->SetListener($f2);
$xml_parser->SetListener($f1);
$xml_parser->parse();
?>

 

The above listing shows that the FilterAddStudent filter class extends the AbstractFilter class. The $f2 variable refers to an object of the FilterOutput class, which displays the output of the XML document on the Web browser.
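The idea of chaining a filter to a finalizer can be sketched without the PHPXML download. The classes below (SketchFilter, UppercaseTextFilter, and CollectingFinalizer) and the runChain() function are hypothetical stand-ins written for this illustration. They are not the PHPXML classes and do not reproduce their method names or behavior.

```php
<?php
// Hypothetical stand-ins for illustration only; these are NOT the PHPXML classes.
abstract class SketchFilter
{
    protected $listener = null;

    public function setListener(SketchFilter $listener): void
    {
        $this->listener = $listener;
    }

    // Default behavior: pass every event on to the next filter unchanged.
    public function startElement(string $name, array $attrs): void
    {
        if ($this->listener) { $this->listener->startElement($name, $attrs); }
    }

    public function endElement(string $name): void
    {
        if ($this->listener) { $this->listener->endElement($name); }
    }

    public function characterData(string $text): void
    {
        if ($this->listener) { $this->listener->characterData($text); }
    }
}

// A filter that changes character data before forwarding it.
class UppercaseTextFilter extends SketchFilter
{
    public function characterData(string $text): void
    {
        parent::characterData(strtoupper($text));
    }
}

// A finalizer: it ends the chain and collects output instead of forwarding events.
class CollectingFinalizer extends SketchFilter
{
    public $output = '';

    public function startElement(string $name, array $attrs): void { $this->output .= "<$name>"; }
    public function endElement(string $name): void { $this->output .= "</$name>"; }
    public function characterData(string $text): void { $this->output .= $text; }
}

// Connect PHP's built-in parser to the first filter in the chain.
function runChain(string $xml, SketchFilter $first): void
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use ($first) { $first->startElement($name, $attrs); },
        function ($p, $name) use ($first) { $first->endElement($name); }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $text) use ($first) { $first->characterData($text); }
    );
    xml_parse($parser, $xml, true);
}

$upper = new UppercaseTextFilter();
$sink  = new CollectingFinalizer();
$upper->setListener($sink);
runChain('<name>George</name>', $upper);

echo $sink->output;
```

Each filter overrides only the events it cares about and inherits pass-through behavior for the rest, which is what allows filters to be combined in any order.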
