Beginning XML Databases (Wrox Beginning Guides)
| ||
| ||
|
A discussion of the relational and object data models, and specifically the object data model, should tell you the following about XML:
-
An XML document can be flat-structured as a single-layer hierarchy report from a relational database.
-
Allowing an XML document to become a multiple-layered, hierarchical structure, which is somewhat akin to the object data model, allows much more flexibility.
Lets say for example that you want to find a single country from the XML document shown in Figure 8-4.
So what if you want to find New Zealand? Your algorithm can search through the structure of the XML document tree, stopping the search when it has found the country you are looking for. In Figure 8-4, a tree scan or depth-first search would parse the XML document something like this:
-
Start at the root node, which is <planet name ="earth"> .
-
Dismiss the first region at the region level because you know that New Zealand is not in Africa and so the element <region name="Africa"> is examined but all its subtree nodes ( countries ) will be ignored.
-
The second region is <region="Australasia"> , which is the region you are looking for.
-
Then you ignore the first country because it is not New Zealand, but it is <country="Australia"> .
-
The second country in Australasia is the country you are looking for and thus New Zealand will be retrieved.
Important | The most important point is that everything else, all countries, are ignored! Its just like searching an index. |
The way you can most efficiently perform the preceding steps is to search, or parse, the XML document using an expression that directs the search for the correct country, within the correct region. The @ sign is used to denote use of an attribute value:
/planet/region/region[@name=Australasia]/country[@name=New Zealand]
The preceding expression is pseudocode. This means it wont work as it is in any particular environment.
So now lets use similar information, as shown in the XML document of Figure 8-4, but this time using a single hierarchical layer, flattened relational structure, as shown in Figure 8-5. The following query reduces the number of fields returned for the XML document shown in Figure 8-4, because there are just as many fields in Figure 8-4:
SELECT ' <record region="'R.REGION '" r_population="'R.POPULATION'" r_area="'R.AREA '" country="'C.COUNTRY '" c_population="'C.POPULATION'" c_area="'C.AREA'"></record>' FROM REGION R JOIN COUNTRY C ON(C.REGION_ID = R.REGION_ID) WHERE C.CURRENCY IS NOT NULL ORDER BY C.POPULATION;
I have also re-sorted the results in the preceding query, essentially mixing up countries and regions into a random order, and Figure 8-4 has the <xml> tag processing instruction included.
In actuality, there are two layers in the XML document shown in Figure 8-5. The first layer is the root node. Everything else is in the second layer.
Now imagine a program searching through the XML document shown in Figure 8-5. Lets say you wanted to find information on the two countries of China and Gambia. The XML document is not sorted according to the names of countries. In the extreme cases of China and Gambia, the document must be parsed from start to finish looking for both countries. Not using a tree-like hierarchical structure is extremely inefficient. There is no high-performance tree-searching algorithm for this example because there is no tree. A full scan is required for two reasons:
-
Records are not necessarily sorted in the order required.
-
Countries are not necessarily contained within regions. If you want to find a single region you would have to read the entire document, unless of course you knew beforehand exactly how many countries were in a particular region.
The benefit of using a hierarchical structure within XML documents should now be clear. And because an XML document hierarchical structure is somewhat equivalent to an object data model, it is an advantage to have even just a brief understanding of the object data model. Of course, XML does not require a specific structure because it is built to be as flexible as possible. However, because of the efficient nature of building tree structures into XML documents, the benefits should be plain to see.
So how can an XML document be compared with that of an object structure? Consider the following:
-
Every node in an XML document can contain a collection of other nodes.
-
Every node can have attributes further describing itself.
-
There are no methods attached to XML elements although something like eXtensible Style Sheet (XSL) templating could possibly be loosely considered as such.
-
There is no capacity for inheritance in basic XML documents because XML documents are data not metadata. Classes are very definitely a basic metadata structure, and object modeling inheritance occurs at the class level.
Try It OutUsing Flat and Hierarchical Data
|
The following query executes against a relational database, producing a flattened structure of records from five tables. Alter the XML data to a hierarchical, properly object-structured XML document, and execute the result as an XML document in a browser:
SELECT r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR, SUM(pl.FEMALE) AS Females, SUM(pl.MALE) AS Males FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID) JOIN POPULATION p ON (p.COUNTRY_ID = c.COUNTRY_ID) JOIN POPULATIONBYLANGUAGE pl ON (pl.POPULATION_ID = p.POPULATION_ID) JOIN LANGUAGE l ON (l.LANGUAGE_ID = pl.LANGUAGE_ID) WHERE pl.FEMALE > pl.MALE AND p.YEAR BETWEEN 1981 AND 1985 GROUP BY r.REGION, c.COUNTRY, p.YEAR, l.LANGUAGE ORDER BY r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR;
This is the result, which has flattened out regions, countries, population, and languages from the demographics relational database model:
REGION COUNTRY LANGUAGE YEAR FEMALES MALES ---------------- ---------------- ---------------- ---------- ---------- ---------- Africa Mauritius Bhojpuri 1983 99467 97609 Africa Mauritius French 1983 19330 16888 Europe Finland Czech 1985 42 36 Europe Finland Estonian 1985 330 102 Europe Finland Finnish 1981 2323676 2177310 Europe Finland Finnish 1985 2371522 2225386 Europe Finland Icelandic 1985 21 16 Europe Finland Korean 1985 17 13 Europe Finland Norwegian 1985 123 115 Europe Finland Polish 1985 274 236 Europe Finland Russian 1985 1463 800 Europe Finland Siamese 1985 48 7 Europe Finland Swedish 1981 156237 143913 Europe Finland Swedish 1985 154975 144123 Far East Nepal Limbu 1981 65318 63916 Far East Nepal Magar 1981 107247 105434 Far East Nepal Sunwar 1981 5509 5141 Middle East Israel Arabic 1983 334885 332925 Middle East Israel Bucharian 1983 2740 2675 Middle East Israel Bulgarian 1983 9775 8215 Middle East Israel Czech 1983 1475 1175 Middle East Israel Dutch 1983 2080 1480 Middle East Israel French 1983 60835 46500 Middle East Israel Georgian 1983 10545 10065 Middle East Israel German 1983 37215 24195 Middle East Israel Greek 1983 1550 1425 Middle East Israel Hungarian 1983 23710 18055 Middle East Israel Indian 1983 7485 6825 Middle East Israel Italian 1983 4225 3650 Middle East Israel Kurdish 1983 3695 3470 Middle East Israel Persian 1983 22420 20575 Middle East Israel Polish 1983 23705 15370 Middle East Israel Portugese 1983 2340 1930 Middle East Israel Rumanian 1983 63730 52960 Middle East Israel Russian 1983 56565 44500 Middle East Israel Serbian 1983 1640 1335 Middle East Israel Spanish 1983 49805 42560 Middle East Israel Turkish 1983 5175 5050 Middle East Israel Yiddish 1983 101445 87775 North America Canada English 1981 7521960 7396495 North America Canada French 1981 3178190 3070905 North America Canada German 1981 266770 256085 North America Canada Ukrainian 1981 148570 143695 South America Paraguay Castellano 1982 91431 75010
Use a simple SELECT statement to produce a flat-structured XML document with two levels, including the root node. After that, edit the document manually such that all appropriate layers are included, making the XML document a hierarchical structure of more than two layers. This is how:
-
Edit the preceding SELECT statement and put all fields into a single record, for each row:
SELECT ' <record region="'r.REGION'" country="'c.COUNTRY '" language="'l.LANGUAGE'" year="'p.YEAR '" females="'SUM(pl.FEMALE)'" males="'SUM(pl.MALE)'"></record>' FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID) JOIN POPULATION p ON (p.COUNTRY_ID = c.COUNTRY_ID) JOIN POPULATIONBYLANGUAGE pl ON (pl.POPULATION_ID = p.POPULATION_ID) JOIN LANGUAGE l ON (l.LANGUAGE_ID = pl.LANGUAGE_ID) WHERE pl.FEMALE > pl.MALE AND p.YEAR BETWEEN 1981 AND 1985 GROUP BY r.REGION, c.COUNTRY, p.YEAR, l.LANGUAGE ORDER BY r.REGION, c.COUNTRY, l.LANGUAGE, p.YEAR;
-
Store the output in a file and call it whatever you like. I will call it fig0806.xml.
-
Edit the XML file you just created. Add these two lines to the beginning of the file:
<?xml version="1.0"?> <planet name="earth">
-
Add this line to the end of the XML file:
</planet>
-
In a browser, execute the XML file you just created and edited. The result should look something like that shown in Figure 8-6.
Figure 8-6: Another poorly structured, single-layered XML document -
The next stage is to restructure the XML file shown in Figure 8-6 into a multiple level hierarchy XML document. The result is shown in Figure 8-7.
Figure 8-7: Another properly structured, multiple-layered XML document
How It Works
You created a complex SELECT statement retrieving a number of fields from five different related tables. You created a flat-structured XML document, and from that manually altered that XML document to a hierarchical structure, similar to that shown in Figure 8-7.
|
This chapter has attempted to briefly explain the object data model, and then apply some facets of that model to the hierarchical structure of XML documents. I hope this approach has helped to explain both topics and, in particular, the benefits of utilizing a hierarchical structure in XML documents rather than a flat structure.
| ||
| ||
|