Dimensional Data Warehousing with MySQL: A Tutorial
This chapter discusses the multi-path hierarchy, expanding on the single-path hierarchy you learned in Chapter 15, “Dimension Hierarchies.” You might recall from Chapter 15, the month dimension in our data warehouse has one hierarchy path, the year-quarter-month path. In this chapter we will add a new level, campaign session, and add a new hierarchy path of year-campaign-month. The month dimension will then have two hierarchy paths and therefore has a multi-path hierarchy.
Another topic of discussion in this chapter is the ragged hierarchy, which is a hierarchy that does not have data in one or more of its levels.
Adding A Hierarchy
I explain in this section how to add a hierarchy in a dimension that already has a hierarchy, thus forming a multi-path hierarchy. I also show how to populate the new hierarchy and verify that the population is successful.
First of all, you need to add a new column called campaign_session to the month_dim table. Figure 16.1 shows the schema after the addition.
You can use the script in Listing 16.1 to add the new column.
Listing 16.1: Adding the campaign_session column
/*****************************************************************/ /* */ /* add_campaign_session.sql */ /* */ /*****************************************************************/ USE dw; ALTER TABLE month_dim ADD campaign_session CHAR (30) AFTER month ; /* end of script */
Run the script in Listing 16.1 now.
mysql> \. c:\mysql\scripts\add_campaign_session.sql
You should see this on your console upon running the script.
Database changed Query OK, 82 rows affected (0.63 sec) Records: 82 Duplicates: 0 Warnings: 0
To understand how the campaign session works, look at the sample campaign sessions in Table 16.1.
Campaign Session | Month |
---|---|
2005 First Campaign | January-April |
2005 Second Campaign | May-July |
2005 Third Campaign | August-August |
2005 Last Campaign | September-December |
Each campaign session lasts one or more months. A campaign session might not run exactly in one quarter. This means, campaign session levels do not roll up to the quarter (the campaign session’s next higher level). Rather, the campaign sessions roll up to the year level.
Now you need to populate the campaign_session column. I’ve provided the following campaign session data for 2006 in the campaign_session.csv file.
CAMPAIGN SESSION, MONTH, YEAR 2006 First Campaign, 1, 2006 2006 First Campaign, 2, 2006 2006 First Campaign, 3, 2006 2006 First Campaign, 4, 2006 2006 Second Campaign, 5, 2006 2006 Second Campaign, 6, 2006 2006 Second Campaign, 1, 2006 2006 Third Campaign, 8, 2006 2006 Last Campaign, 9, 2006 2006 Last Campaign, 10, 2006 2006 Last Campaign, 11, 2006 2006 Last Campaign, 12, 2006
As usual, you don’t load data from a text file directly to a data warehouse table. Instead, you use a staging table. Listing 16.2 shows a script that creates a campaign_session_stg table.
Listing 16.2: Creating the campaign_session_stg table
/*****************************************************************/ /* */ /* create_campaign_stg. sql */ /* */ /*****************************************************************/ USE dw; CREATE TABLE campaign_session_stg ( campaign_session CHAR (30) , month CHAR (9) , year INT (4) ) ; /* end of script */
Run the script by calling the script name this way.
mysql> \. c:\mysql\scripts\create_campaign_stg.sql
Now you can load the 2006 campaign sessions into the month dimension. Listing 16.3 shows the script to do that.
Listing 16.3: Campaign session population
/*****************************************************************/ /* */ /* campaign_session.sql */ /* */ /*****************************************************************/ USE dw; TRUNCATE campaign_session_stg; LOAD DATA INFILE 'campaign_session.csv' INTO TABLE campaign_session_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( campaign_session , month Adding A Hierarchy , year ) ; UPDATE month_dim a, campaign_session_stg b SET a.campaign_session = b.campaign_session WHERE a.month = b.month AND a.year = b.year ; /* end of script */
Run the script using this command.
mysql> \. c:\mysql\scripts\campaign_session.sql
Here is what you should see on the console.
Database changed Query OK, 1 row affected (0.05 sec) Query OK, 12 rows affected (0.09 sec) Records: 12 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 12 rows affected (0.05 sec) Rows matched: 12 Changed: 12 Warnings: 0
Now query the month_dim table to confirm the table has been correctly populated.
mysql> select month_sk, month_name, year, campaign_session -> from month_dim -> where year = 2006;
Here is the result.
+----------+-----------+------+----------------------+ | month_sk | month_name| year | campaign_session | +----------+-----------+------+----------------------+ | 11 | January | 2006 | 2006 First Campaign | | 12 | February | 2006 | 2006 First Campaign | | 13 | March | 2006 | 2006 First Campaign | | 14 | April | 2006 | 2006 First Campaign | | 15 | May | 2006 | 2006 Second Campaign | | 16 | June | 2006 | 2006 Second Campaign | | 17 | July | 2006 | 2006 Second Campaign | | 18 | August | 2006 | 2006 Third Campaign | | 19 | September | 2006 | 2006 Last Campaign | | 20 | October | 2006 | 2006 Last Campaign | | 21 | November | 2006 | 2006 Last Campaign | | 22 | December | 2006 | 2006 Last Campaign | +----------+-----------+------+----------------------+ 12 rows in set (0.00 sec)
Note | You load the campaign session CSV file in January every year when you get the data from the user and must do so before the population of the month_end_sales_order_fact table. |