Dimensional Data Warehousing with MySQL: A Tutorial
In this section, I explain and present an example of yet another date population technique. Using this technique you populate the date dimension by loading dates from the source.
When you populate the date_dim table by loading dates from the source, your date_dim table will store only the dates that are used, saving you disk space. Unfortunately, this method is more complex because you must load all dates to the date dimension from your data sources that have dates.
The script in Listing 6.3 loads the sales order dates from the sales_order table in the source database into the date_dim table. You use the DISTINCT keyword in the script to make sure no duplicates are loaded.
Listing 6.3: Loading dates from the source
/*****************************************************************/ /* */ /* source_date.sql */ /* */ /*****************************************************************/ USE dw; INSERT INTO date_dim SELECT DISTINCT NULL , order_date , MONTHNAME (order_date) , MONTH (order_date) , QUARTER (order_date) , YEAR (order_date) , '0000-00-00' , '9999-12-31' FROM source.sales_order WHERE order_date NOT IN (SELECT date FROM date_dim) ; /* end of script */
Before you run the script in Listing 6.3, truncate the date_dim table. Then, run the source_date.sql script using this command.
mysql> \. c:\mysql\scripts\source_date.sql
The response should be similar to this.
Database changed Query OK, 1 row affected (0.23 sec) Records: 1 Duplicates: 0 Warnings: 0
Query the sales_order source table and the date_dim table to confirm correct population. All source dates must get into the date_dim table. You query the source table using this command.
mysql> select * from source.sales_prder \G
The result is as follows.
*************************** 1. row *************************** order_number: 17 customer_number: 1 product_code: 1 order_date: 2007-02-06 entry_date: 2007-02-06 order_amount: 1000.00 *************************** 2. row *************************** order_number: 18 customer_number: 2 product_code: 1 order_date: 2007-02-06 entry_date: 2007-02-06 order_amount: 1000.00 *************************** 3. row *************************** order_number: 19 customer_number: 3 product_code: 1 order_date: 2007-02-06 entry_date: 2007-02-06 order_amount: 4000.00 *************************** 4. row *************************** order_number: 20 customer_number: 4 product_code: 1 order_date: 2007-02-06 entry_date: 2007-02-06 order_amount: 4000.00 4 rows in set (0.05 sec)
Now, query the date_dim table using this command.
mysql> select * from date_dim \G
You’ll see
*************************** 1. row *************************** date_sk: 1 date: 2007-02-06 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 1 row in set (0.00 sec)
You happen to have only one date in this example source data, February 6, 2007, therefore only this date is copied to the date_dim table.
Now, add some sales orders by running the script in Listing 6.4, then run the population script (source_date.sql) again.
Listing 6.4: Adding more dates from additional sales orders
/*****************************************************************/ /* */ /* more_sales_order.sql */ /* */ /*****************************************************************/ USE source; INSERT INTO sales_order VALUES (21, 1, 3, '2007-02-07', '2007-02-07', 1000) , (22, 2, 3, '2007-02-08', '2007-02-08', 1000) , (23, 3, 3, '2007-02-09', '2007-02-09', 4000) , (24, 4, 3, '2007-02-10', '2007-02-10', 4000) ; /* end of script */
You run the script in Listing 6.4 using this command.
mysql> \. c:\mysql\scripts\more_sales_order.sql
Then, run the source_date.sql script again using this command.
mysql> \. c:\mysql\scripts\source_date.sql
Finally, confirm that the four dates from the source are correctly loaded into the date_dim table.
mysql> select * from date_dim \G
Here is how your result should look like.
*************************** 1. row *************************** date_sk: 1 date: 2007-02-06 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 *************************** 2. row *************************** date_sk: 2 date: 2007-02-07 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 *************************** 3. row *************************** date_sk: 3 date: 2007-02-08 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 *************************** 4. row *************************** date_sk: 4 date: 2007-02-09 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 *************************** 5. row *************************** date_sk: 5 date: 2007-02-10 month_name: February month: 2 quarter: 1 year: 2007 effective_date: 0000-00-00 expiry_date: 9999-12-31 5 row in set (0 .00 sec)