Beginning XML Databases (Wrox Beginning Guides)

The FOR XML clause extends a SELECT statement by allowing return of data from relational tables into an XML structured document. The basic syntax of the FOR XML clause is as follows :

FOR XML { RAW [('<element>')] [<directives>] [ELEMENTS] AUTO [<directives>] [ELEMENTS] EXPLICIT [<directives>] PATH [('<element>')] [<directives>] [ELEMENTS] } <directives> = [ , TYPE ] [, ROOT ('<root>') ]

The FOR XML clause syntax can be described as follows:

Lets begin with RAW mode.

FOR XML RAW Mode

This is a simple query executed against a single table, displaying regions :

USE [newdem] GO SELECT * FROM region GO

Figure 6-4 shows the result of the preceding query.

Figure 6-4: A single table query

The following query uses RAW mode to return regions. Each record is returned as a single XML element where each field becomes an attribute of each record-element:

SELECT * FROM region FOR XML RAW GO

RAW mode produces this:

<row region_id="1" region="Africa" population="789548670" area="26780325" /> <row region_id="2" region="Asia" population="47382633" area="657741" /> <row region_id="3" region="Australasia" population="24340222" area="7886602" /> <row region_id="4" region="Caribbean" population="40417697" area="268857" /> <row region_id="5" region="Central America" population="142653392" area="2360325" /> <row region_id="6" region="Europe" population="488674441" area="4583335" /> <row region_id="7" region="Far East" population="2100636517" area="15357441" /> <row region_id="8" region="Middle East" population="294625718" area="6798768" /> <row region_id="9" region="Near East" population="1499157105" area="4721322" /> <row region_id="10" region="North America" population="331599508" area="18729272" /> <row region_id="11" region="Oceania" population="9133256" area="536238" /> <row region_id="12" region="Russian Federation" population="258037209" area="21237500" /> <row region_id="13" region="South America" population="375489788" area="17545171" />

The element name can be renamed by doing this:

SELECT * FROM region FOR XML RAW('region') GO

This is the result showing the name of the element changed to region :

<region region_id="1" region="Africa" population="789548670" area="26780325" /> <region region_id="2" region="Asia" population="47382633" area="657741" /> <region region_id="3" region="Australasia" population="24340222" area="7886602" /> <region region_id="4" region="Caribbean" population="40417697" area="268857" /> <region region_id="5" region="Central America" population="142653392" area="2360325" /> <region region_id="6" region="Europe" population="488674441" area="4583335" /> <region region_id="7" region="Far East" population="2100636517" area="15357441" /> <region region_id="8" region="Middle East" population="294625718" area="6798768" /> <region region_id="9" region="Near East" population="1499157105" area="4721322" /> <region region_id="10" region="North America" population="331599508" area="18729272" /> <region region_id="11" region="Oceania" population="9133256" area="536238" /> <region region_id="12" region="Russian Federation" population="258037209" area="21237500" /> <region region_id="13" region="South America" population="375489788" area="17545171" />

Use the ELEMENTS clause to convert all attributes produced in RAW mode into child elements:

SELECT * FROM region FOR XML RAW('region'), ELEMENTS GO

This is a partial result:

<region> <region_id>1</region_id> <region>Africa</region> <population>789548670</population> <area>26780325</area> </region> <region> <region_id>2</region_id> <region>Asia</region> <population>47382633</population> <area>657741</area> </region> ... <region> <region_id>13</region_id> <region>South America</region> <population>375489788</population> <area>17545171</area> </region>

Now lets do a join without the ELEMENTS clause:

SELECT r.region, co.country FROM region AS r JOIN country AS co ON co.region_id = r.region_id FOR XML RAW GO

This is a partial result:

... <row region="Africa" country="Zambia" /> <row region="Africa" country="Zimbabwe" /> <row region="Asia" country="Burma" /> <row region="Australasia" country="Australia" /> <row region="Australasia" country="New Zealand" /> <row region="Caribbean" country="Bahamas" /> <row region="Caribbean" country="Barbados" /> ...

And now lets add the ELEMENTS clause and this time join three tables:

SELECT r.region, co.country, s.state FROM region AS r JOIN country AS co ON co.region_id = r.region_id JOIN state AS s ON s.country_id = co.country_id FOR XML RAW GO

Here is a partial result showing the three tables joined, returning states, within countries , within regions:

<row region="North America" country="Canada" state="British Columbia" /> <row region="North America" country="Canada" state="Nova Scotia" /> <row region="North America" country="Canada" state="Ontario" /> <row region="North America" country="Canada" state="Quebec" /> <row region="North America" country="United States" state="Alaska" /> <row region="North America" country="United States" state="Alabama" /> <row region="North America" country="United States" state="Arkansas" /> <row region="North America" country="United States" state="Arizona" /> ...

Now the ELEMENTS clause is added back in:

SELECT r.region, co.country, s.state FROM region AS r JOIN country AS co ON co.region_id = r.region_id JOIN state AS s ON s.country_id = co.country_id FOR XML RAW('region'), ELEMENTS GO

This is the result. The problem is that there is only a single element layer:

<region> <region>North America</region> <country>Canada</country> <state>British Columbia</state> </region> <region> <region>North America</region> <country>Canada</country> <state>Nova Scotia</state> </region> <region> <region>North America</region> <country>Canada</country> <state>Ontario</state> </region> <region> <region>North America</region> <country>Canada</country> <state>Quebec</state> </region> <region> <region>North America</region> <country>United States</country> <state>Alaska</state> </region> ... <region> <region>North America</region> <country>United States</country> <state>Vermont</state> </region>

At this stage the limits of what RAW mode can do have more or less been reached, and it is better to proceed to using AUTO mode.

FOR XML AUTO Mode

As already stated, AUTO mode allows the creation of a nested XML structure. The previous example in the previous section on RAW mode showed that only a single nested layer could be created using RAW mode. There are two ways in which AUTO mode improves on using RAW mode:

The following query is the last example used in the previous section covering RAW mode, joining three tables. The query is now changed to return XML in AUTO as opposed to RAW mode:

SELECT r.region, r.population, r.area, co.country, co.currency, s.state, s.nickname FROM region AS r JOIN country AS co ON co.region_id = r.region_id JOIN state AS s ON s.country_id = co.country_id FOR XML AUTO GO

Again here is a complete result:

<r region="North America" population="331599508" area="18729272"> <co country="Canada" currency="Dollars"> <s state="British Columbia" nickname="" /> <s state="Nova Scotia" nickname="" /> <s state="Ontario" nickname="" /> <s state="Quebec" nickname="" /> </co> <co country="United States" currency=""> <s state="Alaska" nickname="" /> <s state="Alabama" nickname="" /> <s state="Arkansas" nickname="The Natural State" /> <s state="Arizona" nickname="" /> <s state="California" nickname="Golden State" /> <s state="Colorado" nickname="Centennial State" /> <s state="Connecticut" nickname="Nutmeg State" /> <s state="Dictrict of Columbia" nickname="" /> <s state="Delaware" nickname="Diamond State" /> <s state="Florida" nickname="Sunshine State" /> <s state="Georgia" nickname="Peach State" /> <s state="Hawaii" nickname="Aloha State" /> <s state="Iowa" nickname="Hawkeye State" /> <s state="Idaho" nickname="Gem State" /> <s state="Illinois" nickname="Prairie State" /> <s state="Indiana" nickname="Hoosier State" /> <s state="Kansas" nickname="Sunflower State" /> <s state="Kentucky" nickname="Bluegrass State" /> <s state="Louisiana" nickname="Pelican State" /> <s state="Massachusetts" nickname="Bay State" /> <s state="Maryland" nickname="Free State" /> <s state="Maine" nickname="Pine Tree State" /> <s state="Michigan" nickname="Wolverine State" /> <s state="Minnesota" nickname="North Star State" /> <s state="Missouri" nickname="Show-me State" /> <s state="Mississippi" nickname="Magnolia State" /> <s state="Montana" nickname="Treasure State" /> <s state="North Carolina" nickname="Tar Heel State" /> <s state="North Dakota" nickname="Sioux State" /> <s state="New Hampshire" nickname="Granite State" /> <s state="New Mexico" nickname="Land of Enchantment" /> <s state="Nevada" nickname="Sagebrush State" /> <s state="New York" nickname="Empire State" /> <s state="Ohio" nickname="Buckeye State" /> <s state="Oklahoma" nickname="Sooner State" /> <s state="Oregon" nickname="Beaver State" /> <s state="Pennsylvania" nickname="Keystone State" /> <s state="Rhode Island" nickname="The Ocean State" /> <s state="South Carolina" nickname="Palmetto State" /> <s state="South Dakota" nickname="Mount Rushmore State" /> <s state="Tennessee" nickname="Volunteer State" /> <s state="Texas" nickname="Lone Star State" /> <s state="Utah" nickname="Beehive State" /> <s state="Virginia" nickname="The Old Dominion" /> <s state="Washington" nickname="Evergreen State" /> <s state="Wisconsin" nickname="Badger State" /> <s state="West Virginia" nickname="Mountain State" /> <s state="Wyoming" nickname="Equality State" /> </co> <co country="Canada" currency="Dollars"> <s state="Alberta" nickname="" /> </co> <co country="United States" currency=""> <s state="Nebraska" nickname="Cornhusker State" /> <s state="New Jersey" nickname="Garden State" /> <s state="Vermont" nickname="Green Mountain State" /> </co> </r>

A couple of things are odd about the preceding query result:

These issues can be resolved by including an ORDER BY clause in the query and by removing aliases from the query. I have also changed the attribute names for regions, countries, and states to a generic value in the form of the name attributes:

SELECT region.region AS name, region.population, region.area, country.country AS name, country.currency, state.state AS name, state.nickname FROM region JOIN country ON country.region_id = region.region_id JOIN state ON state.country_id = country.country_id ORDER BY region.region, country.country, state.state FOR XML AUTO GO

The result is now properly sorted with more sensibly named elements:

<region name="North America" population="331599508" area="18729272"> <country name="Canada" currency="Dollars"> <state name="Alberta" nickname="" /> <state name="British Columbia" nickname="" /> <state name="Nova Scotia" nickname="" /> <state name="Ontario" nickname="" /> <state name="Quebec" nickname="" /> </country> <country name="United States" currency=""> <state name="Alabama" nickname="" /> <state name="Alaska" nickname="" /> <state name="Arizona" nickname="" /> <state name="Arkansas" nickname="The Natural State" /> <state name="California" nickname="Golden State" /> <state name="Colorado" nickname="Centennial State" /> <state name="Connecticut" nickname="Nutmeg State" /> <state name="Delaware" nickname="Diamond State" /> <state name="Dictrict of Columbia" nickname="" /> <state name="Florida" nickname="Sunshine State" /> <state name="Georgia" nickname="Peach State" /> <state name="Hawaii" nickname="Aloha State" /> <state name="Idaho" nickname="Gem State" /> <state name="Illinois" nickname="Prairie State" /> <state name="Indiana" nickname="Hoosier State" /> <state name="Iowa" nickname="Hawkeye State" /> <state name="Kansas" nickname="Sunflower State" /> <state name="Kentucky" nickname="Bluegrass State" /> <state name="Louisiana" nickname="Pelican State" /> <state name="Maine" nickname="Pine Tree State" /> <state name="Maryland" nickname="Free State" /> <state name="Massachusetts" nickname="Bay State" /> <state name="Michigan" nickname="Wolverine State" /> <state name="Minnesota" nickname="North Star State" /> <state name="Mississippi" nickname="Magnolia State" /> <state name="Missouri" nickname="Show-me State" /> <state name="Montana" nickname="Treasure State" /> <state name="Nebraska" nickname="Cornhusker State" /> <state name="Nevada" nickname="Sagebrush State" /> <state name="New Hampshire" nickname="Granite State" /> <state name="New Jersey" nickname="Garden State" /> <state name="New Mexico" nickname="Land of Enchantment" /> <state name="New York" nickname="Empire State" /> <state name="North Carolina" nickname="Tar Heel State" /> <state name="North Dakota" nickname="Sioux State" /> <state name="Ohio" nickname="Buckeye State" /> <state name="Oklahoma" nickname="Sooner State" /> <state name="Oregon" nickname="Beaver State" /> <state name="Pennsylvania" nickname="Keystone State" /> <state name="Rhode Island" nickname="The Ocean State" /> <state name="South Carolina" nickname="Palmetto State" /> <state name="South Dakota" nickname="Mount Rushmore State" /> <state name="Tennessee" nickname="Volunteer State" /> <state name="Texas" nickname="Lone Star State" /> <state name="Utah" nickname="Beehive State" /> <state name="Vermont" nickname="Green Mountain State" /> <state name="Virginia" nickname="The Old Dominion" /> <state name="Washington" nickname="Evergreen State" /> <state name="West Virginia" nickname="Mountain State" /> <state name="Wisconsin" nickname="Badger State" /> <state name="Wyoming" nickname="Equality State" /> </country> </region>

And now for something really odd I can change the hierarchical structure of the output by rearranging the sequence of the fields in the selected list of fields:

SELECT country.country AS name, country.currency, state.state AS name, state.nickname, region.region AS name, region.population, region.area FROM region JOIN country ON country.region_id = region.region_id JOIN state ON state.country_id = country.country_id ORDER BY region.region, country.country, state.state FOR XML AUTO GO

The partial result is nonsensical but it shows what can be done with AUTO mode:

<country name="Canada" currency="Dollars"> <state name="Alberta" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> <state name="British Columbia" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> <state name="Nova Scotia" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> <state name="Ontario" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> <state name="Quebec" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> </country> <country name="United States" currency=""> <state name="Alabama" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> <state name="Alaska" nickname=""> <region name="North America" population="331599508" area="18729272" /> </state> ... <state name="Wyoming" nickname="Equality State"> <region name="North America" population="331599508" area="18729272" /> </state> </country>

The AUTO clause works better when using subqueries. In the following query the TYPE clause is effectively used to typecast a subquery into the class of the calling (parent) query. Additionally, the ELEMENTS clause is used in this query to push all attributes into child elements, and the ROOT clause is used to add a root element as a wrapper for the entire XML document:

SELECT region.region AS name, region.population, region.area, (SELECT country.country AS name, country.currency, (SELECT state.state AS name, state.nickname FROM state WHERE state.country_id = country.country_id FOR XML AUTO, TYPE) FROM country WHERE country.region_id = region.region_id FOR XML AUTO, TYPE) FROM region FOR XML AUTO, ROOT, ELEMENTS GO

Here is another partial result:

<root> <region> <name>Africa</name> <population>789548670</population> <area>26780325</area> <country name="Algeria" currency="Algeria Dinars" /> <country name="Angola" currency="Kwanza" /> ... <region> <name>Near East</name> <population>1499157105</population> <area>4721322</area> <country name="Bangladesh" currency="Taka" /> <country name="India" currency="Rupees" /> <country name="Pakistan" currency="Rupees" /> <country name="Sri Lanka" currency="Rupees" /> <country name="Turkey" currency="New Lira" /> </region> <region> <name>North America</name> <population>331599508</population> <area>18729272</area> <country name="Canada" currency="Dollars"> <state name="British Columbia" nickname="" /> <state name="Nova Scotia" nickname="" /> <state name="Ontario" nickname="" /> <state name="Quebec" nickname="" /> <state name="Alberta" nickname="" /> </country> <country name="Greenland" currency="" /> <country name="United States" currency=""> <state name="Alaska" nickname="" /> <state name="Alabama" nickname="" /> <state name="Arkansas" nickname="The Natural State" /> ... <state name="Vermont" nickname="Green Mountain State" /> </country> </region> <region> <name>Oceania</name> <population>9133256</population> <area>536238</area> ... <country name="Guyana" currency="Dollars" /> <country name="Paraguay" currency="Guarani" /> <country name="Peru" currency="Nuevos Soles" /> <country name="Surinam" currency="" /> <country name="Uruguay" currency="Pesos" /> <country name="Venezuela" currency="Bolivares" /> </region> </root>

Thats enough about AUTO mode. EXPLICIT mode allows for more precise XML document output.

FOR XML EXPLICIT Mode

As stated previously, EXPLICIT mode provides the most flexibility in creation of XML documents but with more complexity when building queries. The EXPLICIT clause allows for complete control of XML document structure and content. The EXPLICIT clause has very specific syntax in that the first field in the selected list is called TAG and the second field is called PARENT . This applies to the primary calling query, and all subqueries that use the EXPLICIT clause.

The TAG and PARENT fields are used to determine the level of each query in the hierarchy. The TAG field is a positive number determining the level of a query in the hierarchy. The PARENT field is also a positive number but is the tag number of the field under which the current element will be placed into the hierarchy. The PARENT of the first element can also be NULL or because the topmost element does not effectively have a parent in the resulting query. Data fields appear after the TAG and PARENT fields.

Every data field retrieved must be aliased using the following syntax notation:

[<element>]!<tag#>[!<attribute>][!<directive>]

This is a simple example query finding records in a single table:

SELECT 1 AS TAG, 0 AS PARENT, region AS [region!1!name] FROM region FOR XML EXPLICIT GO

This is the result:

<region name="Africa" /> <region name="Asia" /> <region name="Australasia" /> <region name="Caribbean" /> <region name="Central America" /> <region name="Europe" /> <region name="Far East" /> <region name="Middle East" /> <region name="Near East" /> <region name="North America" /> <region name="Oceania" /> <region name="Russian Federation" /> <region name="South America" />

Now lets build a hierarchy. Each layer in the hierarchy is constructed as a separate query. The individual queries for each level are merged together using a UNION clause:

SELECT 1 AS TAG, 0 AS PARENT, region AS [region!1!name], NULL AS [region!2!population], NULL AS [region!2!area] FROM region UNION ALL SELECT 2 AS TAG, 1 AS PARENT, region AS [region!1!name], population AS [region!2!population], area AS [region!2!area] FROM region ORDER BY [region!1!name] FOR XML EXPLICIT GO

This is the result:

<region name="Africa"> <region population="789548670" area="26780325" /> <region population="47382633" area="657741" /> </region> <region name="Asia" /> <region name="Australasia"> <region population="24340222" area="7886602" /> <region population="40417697" area="268857" /> </region> ... <region name="Russian Federation" /> <region name="South America"> <region population="375489788" area="17545171" /> </region>

Doing the same thing with a multiple table join would look something like this:

SELECT 1 AS TAG, 0 AS PARENT, region AS [region!1!name], population AS [region!1!population], area AS [region!1!area], NULL AS [country!2!name], NULL AS [country!2!currency], NULL AS [state!3!name], NULL AS [state!3!nickname] FROM region UNION ALL SELECT 2 AS TAG, 1 AS PARENT, NULL AS [region!1!name], NULL AS [region!1!population], NULL AS [region!1!area], country AS [country!2!name], currency AS [country!2!currency], NULL AS [state!3!name], NULL AS [state!3!nickname] FROM region JOIN country ON country.region_id = region.region_id UNION ALL SELECT 3 AS TAG, 2 AS PARENT, NULL AS [region!1!name], NULL AS [region!1!population], NULL AS [region!1!area], NULL AS [country!2!name], NULL AS [country!2!currency], state AS [state!3!name], nickname AS [state!3!nickname] FROM country JOIN state ON state.country_id = country.country_id ORDER BY [region!1!name], [country!2!name], [state!3!name] FOR XML EXPLICIT GO

The preceding query is ugly and complicated. The better way is by using the PATH clause.

FOR XML PATH Mode

The PATH clause is a better option than the EXPLICIT clause because queries are just as powerful, but much easier to code, and much more efficient.

This example reads a single table, allowing specification of both attributes and elements:

SELECT region_id AS "@region_id", region AS "name", population AS "population", area AS "area" FROM region FOR XML PATH('region'), ROOT GO

This is a partial result:

<root> <region region_id="1"> <name>Africa</name> <population>789548670</population> <area>26780325</area> </region> <region region_id="2"> <name>Asia</name> <population>47382633</population> <area>657741</area> </region> ... <region region_id="13"> <name>South America</name> <population>375489788</population> <area>17545171</area> </region> </root>

The next example uses nesting subqueries within the selected fields list in order to create a hierarchical structure. As you can see, using the PATH clause makes the query look a lot more like simple SQL, as opposed to the horrendous complexity of using the EXPLICIT clause:

SELECT region_id AS "@region_id", region AS "name", population AS "population", area AS "area", ( SELECT country_id AS "@country_id", country.country AS name, country.currency, ( SELECT state_id AS "@state_id", state.state AS name, state.nickname FROM state WHERE state.country_id = country.country_id FOR XML PATH('state'), TYPE ) FROM country WHERE country.region_id = region.region_id FOR XML PATH('country'), TYPE ) FROM region FOR XML PATH('region'), ROOT GO

This is a partial result:

<root> <region region_id="1"> <name>Africa</name> <population>789548670</population> <area>26780325</area> <country country_id="1"> <name>Algeria</name> <currency>Algeria Dinars</currency> </country> ... </region> ... <region region_id="10"> <name>North America</name> <population>331599508</population> <area>18729272</area> <country country_id="125"> <name>Canada</name> <currency>Dollars</currency> <state state_id="1"> <name>British Columbia</name> <nickname /> </state> <state state_id="2"> <name>Nova Scotia</name> <nickname /> </state> <state state_id="3"> <name>Ontario</name> <nickname /> </state> ... </country> </region> </root>

Overall, the FOR XML clause is quite an effective and simplistic method of producing XML documents data from relational database tables.

Категории