Beginning XML Databases (Wrox Beginning Guides)

XML data types can contain very large XML documents and store them inside an SQL Server database. The result is a large chunk of data stored into a binary object data type. Any searches through that XML document involve scanning the entire XML document data space. In many cases, queries can perform better by reading smaller portions of a data space in order to facilitate more efficient scans through large chunks of data. One method of improving performance of data reads is by using indexes.

SQL Server 2005 allows creation of indexes on XML data types. This book really needs to discuss only these indexes, what they are, and how they are created. Demonstrating their use is more of a performance tuning topic and is beyond the scope of this book. SQL Server also uses XML indexes in such a way that queries executed against XML data types use the same SQL query engine process that SQL queries do, including all query planning, query optimization, and query execution processing. So there is additionally no overhead to using a query processing engine specific to XML, and thus not as far advanced as the SQL Server processing query engine.

A query against an XML data type, regardless of the existence of XML indexes or not, uses something called XQuery. XQuery comprises special methods executable directly against XML data types as self-contained object methods. These methods are discussed specifically for SQL Server in the first section of this chapter. XQuery is a standard established as a basis for queries against XML documents. XQuery is discussed later on in this book from a more generic perspective.

The most basic reasons for the existence of an index against a set of data are as follows :

SQL Server allows four different types of indexes, which can be created against XML data types:

The following commands create a primary XML index on the XML field, in the XML table, for the demographics database used for this book. First create a table containing two fields where one is an integer primary key and the other an XML data type field:

CREATE TABLE XML( ID INT PRIMARY KEY, XML XML) GO

Next create the primary key on the XML data type in the XML table:

CREATE PRIMARY XML INDEX XMLIDX ON XML(XML) GO

Now lets add some data to the table, copying a single region from the DEMXML table into the new XML table created previously:

DECLARE @xmldoc xml SET @xmldoc = (SELECT xml.query('/demographics/region[name="North America"]') FROM demXML) INSERT INTO XML(ID,XML) VALUES(10,@xmldoc) SELECT ID, XML FROM XML GO

The disadvantage of creating a primary index on an XML data type is that the primary key index flattens out (or decomposes) the XML hierarchy. This tends to introduce duplication such that the primary index can actually become larger in physical size than the XML document (or fragment in this case). However, the index is constructed using a special algorithm, resulting in a BTree index. Reading a BTree index allows for scans through a tree structure, which is nearly always much faster than reading the entire physical space of an XML document. Exceptions to this rule are when deliberately reading the entire XML document, or reading a large percentage of the XML structure, or when the dataset is very small.

Secondary indexes can assist in XQuery processing by effectively creating a BTree index on all the path routes through an XML document:

CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR PATH GO

Or all the attribute values through an XML document:

CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR PROPERTY GO

Or all textual values for nodes through an XML document:

CREATE XML INDEX XMLIDXPATH ON XML(XML) USING XML INDEX XMLIDX FOR VALUE GO

This chapter has attempted to introduce the use of XML directly from within SQL Server database. As in the previous chapter, which covered Oracle and XML, XML is vastly more complex and detailed than presented in this chapter.

Категории