Oracle9iR2 Data Warehousing
4.1 Summary tables
A common technique used in data warehouses is to precompute and store results of frequent queries. This is especially beneficial when the queries involve aggregation, because the result is usually much smaller than the detailed data used to produce the result. An example of such a query is a monthly sales report for the business. As multiple users are interested in the total sales of each product for each month, the data would be selected, joined, sorted, and aggregated over and over again for each user. Rather than wasting resources reexecuting the same query repeatedly, the result could be precomputed and saved in a table. Such precomputed results are often called summaries or summary tables.
Table 4.1 shows a summary containing the total number of items sold for each month of the year. Summary tables are usually much smaller than the tables containing the detail data. In this example, 17 rows of detailed sales transaction data is summarized into 10 rows. Depending on the data, the reduction in size and therefore the improvement in query performance can be quite significant.
Table 4.1: Summary Table
Detail Sales Transactions | ||
---|---|---|
Date | Customer | Product |
1/5/02 | Smith | Book |
1/6/02 | Jones | Tent |
2/5/02 | Smith | Book |
2/5/02 | Mills | Book |
3/5/02 | Smith | Film |
3/7/02 | Mills | Book |
3/5/02 | Smith | Film |
4/1/02 | Smith | Book |
4/2/02 | Jones | Book |
4/3/02 | Smith | Tent |
4/3/02 | Jones | Tent |
5/5/02 | Peters | Book |
5/6/02 | Smith | Book |
6/1/02 | Jones | Book |
6/5/02 | Smith | Tent |
6/6/02 | Jones | Tent |
6/5/02 | David | Tent |
Total Sales by Product, by Month | ||
---|---|---|
Month | Product | QTY Sold |
1/02 | Book | 1 |
1/02 | Tent | 1 |
2/02 | Book | 2 |
3/02 | Book | 1 |
3/02 | Film | 2 |
4/02 | Book | 2 |
4/02 | Tent | 2 |
5/02 | Book | 2 |
6/02 | Book | 1 |
6/02 | Tent | 3 |
Oracle 8i, with its Summary Management feature, made a huge advance in the way people used summary tables by letting the database manage summaries transparently.
4.1.1 Why do you need Summary Management?
To understand why Summary Management is needed, let us first look at some of the tasks involved in managing summary tables.
A summary is essentially the precomputed result of a SQL query. Once a summary has been created, the result of the query can be obtained from the summary. A summary can also be used to answer other related queries. For instance, if we were interested in the total sales of each product for each year, the result can be obtained by adding the months for that year together, since months roll up into years. To use a summary, the query has to be modified somehow to reference that summary. In some situations this would mean that application SQL must be modified. Alternatively, users must be informed of the existence of summarized data and trained on which summary tables to use for each particular query. Thus, while summary tables improve query performance, managing these summaries can be quite a task.
As new detail data is loaded into the warehouse, the data in the summary is no longer synchronized with the detail tables. When this happens, the summary is said to be stale. Table 4.2 shows a stale summary. In order to bring it up-to-date with the detail data, the stale summary must be refreshed. A summary can be rebuilt when new data is loaded into the warehouse. This is known as complete refresh. In some cases, it is possible to incrementally refresh the summary with only the new or changed data. Before Summary Management with Oracle 8i , refreshing summaries involved complex custom-built procedures.
Table 4.2: Stale Summary
Detail Sales Transactions | |||
---|---|---|---|
Date | Customer | Product | |
1/5/02 | Smith | Book | |
1/6/02 | Jones | Tent | |
2/5/02 | Smith | Book | |
2/5/02 | Mills | Book | |
3/5/02 | Smith | Film | |
3/7/02 | Mills | Book | |
3/5/02 | Smith | Film | |
4/1/02 | Smith | Book | |
4/2/02 | Jones | Book | |
4/3/02 | Smith | Tent | |
4/3/02 | Jones | Tent | |
5/5/02 | Peters | Book | |
5/6/02 | Smith | Book | |
6/1/02 | Jones | Book | |
6/5/02 | Smith | Tent | |
6/6/02 | Jones | Tent | |
6/5/02 | David | Tent | |
7/1/02 | Smith | Book | |
7/2/02 | Jones | Book | |
7/1/02 | David | Tent |
Total Sales by Product, by Month | ||
---|---|---|
Month | Product | QTY Sold |
1/02 | Book | 1 |
1/02 | Tent | 1 |
2/02 | Book | 2 |
3/02 | Book | 1 |
3/02 | Film | 2 |
4/02 | Book | 2 |
4/02 | Tent | 2 |
5/02 | Book | 2 |
6/02 | Book | 1 |
6/02 | Tent | 3 |
Choosing which summaries to create requires an understanding of the workload-what types of questions users are asking, and how often the same information is being requested. The number of possible summary tables that could be created is very large. Since summaries consume disk space and take time to refresh, it is important to select few summaries that produce the most performance benefits.
In conclusion, before using summaries, the following questions must be addressed:
-
What is the best set of summaries to create?
-
Do users have to be aware of summaries? If so, how will the users know what summaries exist and when to use them? If summaries are later determined not to be that useful and are dropped, users also need to know about this.
-
As the detail data change, how will the summaries be kept up-to-date?
Oracle's Summary Management has the answer to all these questions. With this feature, summaries can be created directly in the Oracle database. Mechanisms are provided to keep your summaries up-to-date with changes in underlying data. Further, Oracle will transparently rewrite your queries to use these summaries, so users do not have to be aware of the summaries.
4.1.2 Summary Management with Oracle
The Summary Management feature in Oracle includes the following components:
-
A database entity known as a Materialized View, which is essentially the summary table
-
Query Rewrite, which transparently rewrites SQL queries to use materialized views
-
A mechanism to refresh the summaries using either complete or incremental refresh
-
A Summary Advisor, which recommends what summaries to create
-
Dimensions, which provide an ability to declare hierarchical relationships such as rollups in the data, to assist query rewrite
With the Summary Advisor you can easily determine the materialized views to create for a given set of queries to fit a specified amount of space. Once the summaries have been created and enabled for query rewrite, queries will automatically use these summaries. A significant benefit of this is that the end users and database applications no longer need to be aware of the existence of the summaries. Many query tools, such as Microstrategy's DSS Agent and Information Advantage's Decision Suite, also provide some query rewrite capabilities (also known as aggregate navigation). However, unlike these tools, query rewrite in Oracle is very general and is not limited to star schemas or queries with aggregation. Any client tool can take advantage of this feature in the database server to provide aggregate navigation capabilities. For instance, Oracle Discoverer uses the Summary Management features in the Oracle database to improve query response time.
Summary Management also provides complete and fast refresh procedures, so that the summaries can be updated when new detail data is loaded into the warehouse. This eliminates the need to write complex incremental refresh programs.
We will now look at each aspect of Summary Management in detail.