Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

Previous chapters illustrated that the basic process of building a data warehouse and OLAP system is started by performing the following steps:

After these steps have been performed, the final step is designing and building the OLAP cubes. The level of success in designing and building the OLAP cubes is determined by how well these preceding steps have been carried out.

After this lesson, you will be able to:

Estimated lesson time: 40 minutes

Determining Requirements

The best hardware and software will not bring success if the requirements of the data warehouse have not been as thoroughly determined as possible. Two ironies make this aspect of building a data warehouse particularly challenging:

Business and User Requirements

The first step in building a data warehouse and OLAP system is careful analysis of the business and user needs. A warehouse may have a wealth of aggregations, but if the dimensions do not reflect the business structure, the data will be difficult to use.

Technical Requirements

These business requirements must be evaluated in conjunction with the technical requirements. For example, if the storage space required for a given level of granularity is in the multi-terabyte range, the cost of storage may be prohibitive. You would then need to adjust your design accordingly by selecting a higher level of granularity, adjusting the lifetime of warehoused data, or perhaps both.

Designing and Building the OLAP Data Warehouse Database

There are several steps to follow in order to create an OLAP database. The OLAP database is the basis for the cubes that you build with OLAP Services. Determining the fact and dimension tables lays the groundwork for building dimensions and data cubes.

Design the Database

When designing a database,

Once the database has been designed, the database, tables, and indexes are created.

Create the Tables

One of your most important tasks is to identify the fact and dimension tables.

Create the Indexes

Indexes speed data access for regular queries and are used by OLAP Services when it builds cubes based on the underlying fact tables. Because the warehouse provides users with read-only access, you can create a number of indexes to speed up retrievals without concern for impact on inserts and updates. It is important to identify the primary queries that the users will perform and index along the dimensions of those queries. The SQL Server Index Tuning wizard can assist in this process.

Extracting, Cleaning, and Loading Data

Once the database has been created, the data must be loaded from the OLTP systems into the data warehouse. SQL Server 7.0 provides an application called Data Transformation Services (DTS) that allows data to be moved from various data sources into the data warehouse. During this period you can perform several steps:

After these processes are performed, the data is moved into the warehouse. You can now build OLAP cubes based on the underlying data warehousing tables.

Figure 9.1 shows the Northwind_Mart database OLAP star schema. The Sales_Fact table has measures in it. A measure is any quantifiable datum that the data warehouse should measure: dollars in an order, quantity shipped, or orders taken by an employee, for example. Any metric that provides valuable information about business operations is a valid measure, provided that information can be captured from operational systems. These measures are the values that end users want to see in dimensional analysis.

Figure 9.1 Northwind_Mart OLAP Star Schema

The other tables are the dimension tables that represent how users will view the measures in other words, how the data is to be filtered and subdivided. For example, you may want to see orders taken by a particular employee (filter), for each quarter over the past two years (subdivision, also known as dimension levels).

Multidimensional Structure (Cube)

The following is a very simple example of a cube.

You are the owner of a coffee stand. You have two employees, Claire and William. In order to pay each appropriately for their hard work, you track which employee sells how much of each item sold daily.

 Col ACol BCol C
Row 1FridayClaireWilliam
Row 2Regular2015
Row 3Decaf1811
 Col ACol BCol C
Row 1SaturdayClaireWilliam
Row 2Regular3022
Row 3Decaf2219
 Col ACol BCol C
Row 1SundayClaireWilliam
Row 2Regular1812
Row 3Decaf1811

If you were to stack these reports on a corner of your desk, you would create a multidimensional cube. If you wanted to analyze Claire s sales, you could "slice" your stack of reports down Column B. If you wanted to analyze how well decaf sells, you would "slice" along Row 3.

This is admittedly an extremely simple example. However, even the most complex multidimensional analysis is merely a computer-based extension of this simple process. Thankfully, computers find it much easier than most humans to think in multidimensional terms.

Data cubes are multidimensional structures that store the data for your OLAP system. Multidimensional means that cubes allow you to look at your data in various ways. In Figure 9.2, you want to know product sales by region and by time. The three dimensions are

Figure 9.2 Graphical representation of a cube

Each cell holds one value, exactly as a spreadsheet does. The address or location of each cell is the intersection of each dimension. The data or value in the cell is a summarized number from the online transaction processing (OLTP) system.

The cube holds the sales for all your products in the various locations, by period. To get an annual total, choose a product and location, and sum up the four period cells to get annual sales by product and location.

Alternatively, you could create a second cube that has all products by all locations, where the time slice is just an annual total. The cube will then be smaller, but you will no longer be able to retrieve totals by period you will be able to retrieve only annual totals. This is where careful analysis will reveal the needs of the system and determine the cost of adding cubes as compared with the required response times.

For each dimension in a cube, you define a hierarchy by the data to access. For example, location might actually be a hierarchy in which

With this structure, you can start at a location and drill down to a specific store.

Lesson Summary

This lesson emphasized how important and interdependent are data warehouse design decisions. For example, dimension and fact tables can be designed only after sufficient analysis and user interviews. The cubes that these end users will consume are wholly dependent on the denormalized data warehouse database.

Категории