Oracle9iR2 Data Warehousing

4.5 Dimensions

With Oracle 8i, in addition to defining primary-foreign-key relationships among tables, you can also describe the relationships between the columns within one or more tables by creating a new schema object, called a dimension. Telling Oracle about such relationships in the data makes it possible for the advisor to recommend the best set of summaries to create, for the optimizer to rewrite more queries to use the summaries, and for the OLAP tools to perform rollup and drill-down operations. We will see the power of dimensions in the section 4.6, where we talk about query rewrite.

Typically, relationships between fact and dimension tables are specified using referential integrity constraints. Dimensions are used to define the following types of relationships:

4.5.1 Creating a dimension

Once the tables that contain the dimension data is created, you can create the DIMENSION object, using the CREATE DIMENSION statement. It helps to draw a bubble diagram showing the hierarchical relationships among the columns prior to defining a dimension.

Figure 4.6 shows a bubble diagram for the geography dimension. Postal codes roll up into towns, which roll up into counties, which, in turn, roll up into countries.

Figure 4.6: Bubble diagram for the geography dimension.

To convert the bubble diagram in Figure 4.6 into a dimension definition, each bubble in the diagram becomes a LEVEL in the hierarchy, as shown in the following CREATE DIMENSION statement.

CREATE DIMENSION geography_dim LEVEL postal_code IS customer.postal_code LEVEL town IS customer.town LEVEL county IS customer.county LEVEL country IS customer.country HIERARCHY loc_rollup ( postal_code CHILD OF town CHILD OF county CHILD OF country );

The relationships between the levels are described with the HIERARCHY clause. The bubbles with arrows coming out of them in the diagram are described with the CHILD OF clause. In our example, postal code (zip code) rolls up into town; therefore, postal_code is a CHILD OF town. Town rolls up into county, therefore, town is a CHILD OF county. County rolls up into country, so county is a CHILD OF country.

For any value of a child column in a hierarchy, there must be one, and only one, value of its parent column. In the customer table used in the example, postal code 02134 is in Boston, which is in Massachusetts. Since a postal code is unique to any given town, it satisfies this rule. Postal code 02134 refers to addresses in Boston only. It cannot also be used to refer to addresses in San Francisco.

4.5.2 Defining a dimension with multiple hierarchies

Sometimes the same data can be rolled up in different ways. In our sample schema, EASYDW, as in many businesses, we use both a regular calendar and a fiscal calendar. In the regular calendar, days roll up into months, which roll up into years. In the fiscal calendar, days roll up into weeks, which roll up into fiscal quarters. This can be represented by defining multiple hierarchies within the dimension.

Figure 4.7 shows the bubble diagram for the time dimension. It contains two hierarchies, describing the different ways the time_key can be rolled up. Each arrow coming out of a bubble is described as a CHILD OF clause. Since there are two arrows coming out of the time_key bubble, there are two CHILD OF statements—one in each hierarchy. From a given date, one hierarchy tells us which week and fiscal quarter this date is in and the other tells us the month and year. At the top of every bubble diagram is the special level "ALL," representing the grand total level for that dimension. All levels of a hierarchy can be rolled up to ALL. We don't explicitly define the ALL level in the CREATE DIMENSION statement.

Figure 4.7: Bubble diagram for the time dimension.

The following SQL segment is an example of a dimension with multiple hierarchies.

CREATE DIMENSION time_dim LEVEL time_key IS time.time_key LEVEL month IS time.month LEVEL quarter IS time.quarter LEVEL year IS time.year LEVEL week_number IS time.week_number HIERARCHY calendar_rollup ( time_key CHILD OF month CHILD OF Year ) HIERARCHY fiscal_rollup ( time_key CHILD OF week_number CHILD OF quarter );

Again, for each value of a child column in a hierarchy, there is only one parent value. Any given date, say March 21, 2002 falls into one, and only one fiscal week_number and in a specific quarter.

4.5.3 Defining a dimension with attributes

Dimensions can be used to define relationships between columns within the same table that are not part of the hierarchy. Often users do not know the unique identifier for a record but know the name. For example, a customer may not know the product_id, but does know the product_name of what he or she is interested in purchasing. In relational database theory these relationships are called functional dependencies, or one-to-one relationships. A functional dependency from column A to B means that for every value of A, there is only one value of B. So if you know the value of A, then you can determine the value of B.

In a dimension definition, the ATTRIBUTE clause is used to define any functional dependencies. The ATTRIBUTE clause should be used only when there is a one-to-one relationship between the two columns.

Here we see the definition of a dimension with the attribute clause. The attribute clause tells us that given a product_id, there is only one product_name and one manufacturer. However, it does not mean that given the manufacturer we can determine the product_id.

CREATE DIMENSION product_dim LEVEL product_id IS product.product_id LEVEL category IS product.category HIERARCHY merchandise_rollup ( product_id CHILD OF category ) ATTRIBUTE product_id DETERMINES (product_name, manufacturer);

4.5.4 Defining a dimension with normalized tables

If you have a snowflake or other normalized schema, then your hierarchy may refer to columns in multiple tables. The dimension statement in Oracle allows you to declare such relationships as well.

Figure 4.8 shows a normalized time dimension. The bubbles and bold arrows define the rollup relationships as before. The dotted rectangles represent the tables where the levels come from. The dotted arrows show how the tables join to each other. In this example, there are separate tables for time, week, month, quarter, and year. The time table joins to the week table using the predicate time.week = week.week_number and to the month table using the predicate time.month = month.month.

Figure 4.8: A normalized dimension.

To convert this diagram to a CREATE DIMENSION statement, you would first define a level for each of the bubbles, specifying both the table name and column name for that level. For example, the level week is week.week_number. Next you would define your hierarchy using the level names as before. Finally, you would specify any joins that must be performed when traversing up the hierarchy using the JOIN KEY clause. Thus you would get the following CREATE DIMENSION statement:

CREATE DIMENSION time_dim LEVEL time_key IS time.time_key LEVEL month IS month.month LEVEL quarter IS quarter.quarter LEVEL year IS year.year LEVEL week IS week.week_number HIERARCHY calendar_rollup ( time_key CHILD OF month CHILD OF year JOIN KEY time.month REFERENCES month JOIN KEY month.year REFERENCES year ) HIERARCHY fiscal_rollup ( time_key CHILD OF week CHILD OF quarter JOIN KEY time.week REFERENCES week JOIN KEY week.quarter REFERENCES quarter );

In the fiscal_rollup hierarchy in this example, the JOIN KEY time.week column is used to join the time table to the week level.

As discussed in Chapter 2, normalized dimension tables incur the overhead of extra joins during query processing and hence must be used with care.

4.5.5 Validating a dimension

We have said that the relationships declared by the dimension object are not automatically checked by Oracle. In order to get accurate results with query rewrite, you must ensure that these relationships hold in your data. To help with this, Oracle provides a procedure known as DBMS_OLAP.VALIDATE_DIMENSION. You can run this procedure every time new data is loaded into your dimension tables to ensure data integrity.

For example, suppose we had the following customer dimension:

CREATE DIMENSION customer_dim LEVEL customer IS customer.customer_id LEVEL town IS customer.town LEVEL region IS customer.county HIERARCHY customer_zone ( customer CHILD OF town CHILD OF region ) ATTRIBUTE town DETERMINES postal_code ATTRIBUTE customer DETERMINES (sex, occupation);

According to this dimension, town determines the postal_code, and every town should fall in one, and only one, county. Let us check that this is indeed the case.

SELECT distinct town, county, postal_code FROM customer; TOWN COUNTY POSTAL_COD ---------- ---------- ---------- Chandlers Hants SO53 1TZ Eastleigh Hants SO53 1LD Liverpool Mersey LV1 1QT London London W1 1QC Soton Hants SO6 1gh Soton2 Hants SO1 1TF Soton3 Hants SO1 1TF Soton4 Hants SO1 1TF Soton5 Hants SO1 1TF Soton6 Hants SO1 1TF Soton7 Hants SO1 1TF Soton8 Hants SO1 1TF Soton9 Hants SO1 1TF

Now suppose some bad data got inserted into the customer table. The postal code for London was mistyped as W1 2QC and the town of Liver-pool was misclassified in the Hants county.

INSERT INTO customer VALUES ('AB130000', 'London', 'London', 'W1 2QC', '7-JUL-74', 'UK', 'Doctor', 'F'); INSERT INTO customer VALUES ('AB130001', 'Liverpool', 'Hants', 'LV1 1QT','18-JUN-73', 'UK', 'Doctor', 'M'); COMMIT;

If we run VALIDATE_DIMENSION, it will detect that there is a discrepancy in the HIERARCHY and ATTRIBUTE relationships declared by the customer_dim dimension.

variable run_id number; execute dbms_olap.create_id(:run_id); execute dbms_olap.validate_dimension('CUSTOMER_DIM', 'EASYDW', FALSE, TRUE, :run_id);

The VALIDATE_DIMENSION procedure will verify the integrity of the data and place any exceptions found in the MVIEW_EXCEPTIONS table.

SELECT distinct owner, table_name, dimension_name, relationship FROM system.MVIEW_EXCEPTIONS WHERE runid = :run_id; OWNER TABLE_NAME DIMENSION_NAME RELATIONSHIP ------ ----------- -------------- ------------ EASYDW CUSTOMER CUSTOMER_DIM ATTRIBUTE EASYDW CUSTOMER CUSTOMER_DIM CHILD OF

The bad_rowid column of MVIEW_EXCEPTIONS gives the table rowids with the discrepancy (i.e., all rows corresponding to the violated relationship). In our example, it would return all rows corresponding to the towns of London and Liverpool. We can now look at the actual data values by looking up the customer table, as follows. We can see that customer_ids AB130000 and AB130001 have mismatched values for town, postal_code, and county with respect to the remaining data.

SELECT customer_id, town, county, postal_code FROM customer WHERE rowid IN (select bad_rowid FROM system.mview_exceptions); CUSTOMER_I TOWN COUNTY POSTAL_COD ---------- ---------- ---------- ---------- AB123477 London London W1 1QC AB123478 London London W1 1QC AB123480 London London W1 1QC ... AB130000 London London W1 2QC <- bad data AB123496 Liverpool Mersey LV1 1QT AB123497 Liverpool Mersey LV1 1QT AB123498 Liverpool Mersey LV1 1QT ... AB130001 Liverpool Hants LV1 1QT <- bad data

In the next section, we will look at how query rewrite can be used to transparently rewrite queries with materialized views. We will also see how dimension objects are used.

Категории