Oracle9iR2 Data Warehousing
No book on data warehousing would be complete without a discussion of its two important applications—OLAP and data mining. In Oracle 9i, OLAP and Data Mining functionality are now integrated into the database server and are available as options to the Enterprise Edition of the database. We will discuss each of these options in this chapter.
9.1 Oracle OLAP
Online Analytical Processing (OLAP) is used to answer queries that provide information about your business. OLAP queries typically perform complex calculations, such as period-over-period comparisons, finding top-N products, time series analyses, and sales forecasting. The data is analyzed along multiple dimensions, such as product, geography, and time. Hence, this type of data is often called multidimensional data.
ROLAP versus MOLAP
Ever since the OLAP industry started, there has been an ongoing debate about the best way to store multidimensional data. One school of thought advocated storing data in tables in relational databases, which are known for their ability to scale to large amounts of data. This is known as Relational OLAP (ROLAP). In this case, analysis of data is done using SQL queries—all of the SQL features discussed in this book would be useful for relational OLAP. The other school of thought felt that multidimensional data processing should be done using a specialized storage format (called a multidimensional database or MDDB) designed to quickly answer OLAP queries. This is known as Multidimensional OLAP (MOLAP). Many vendors provide MOLAP capabilities, including Oracle's own Express Server product. The benefit of these products is that users can access the data very easily without complex queries. However, a major problem with this approach is that data has to be moved from the relational database, where it is typically stored, to the multidimensional database for analysis. This means that the data is replicated and can get out of sync. Further, MOLAP products, being primarily focused on analysis do not provide the same level of security and reliability that a relational database does. Having said that, both mechanisms have their merits, and whether the right choice is ROLAP or MOLAP depends on the application in question. In fact, prior to Oracle 9i, most businesses would have had to invest in both technologies—a relational database for simple analysis and reporting needs and a specialized analysis product for advanced analysis and business planning applications. Oracle 9i eliminates the need to have two separate analysis products by allowing you to do both within the Oracle database server.
Oracle 9i OLAP
Over the past few years, starting with Oracle 8i, Oracle has been incorporating OLAP functionality into the database to support relational OLAP. Analytical functions and the aggregation operators, CUBE, ROLLUP, and GROUPING SETS, discussed in Chapter 3, allow users to do complex OLAP calculations through SQL. Materialized views and query rewrite allow you to preaggregate data so that queries can be answered quickly.
With Oracle 9i Release 2, Oracle now also supports multidimensional OLAP directly in the database. This is available via the OLAP Option of Oracle 9i Database, Enterprise Edition. With the OLAP Option, the database can store data in a multidimensional format in an entity known as an analytical workspace. Further, there is a full-featured multidimensional calculation engine built into the database. This means that you have the full analytical capabilities provided by any traditional MOLAP product. However, with Oracle 9i OLAP Option, you also get the added benefits of security, manageability, and reliability provided by a database management system. Thus, Oracle 9i Release 2 can be called a relational-multidimensional OLAP database!
With the introduction of the Oracle 9i OLAP Option, the calculation capabilities of Oracle's MOLAP product, Oracle Express Server, are now available in the database. Existing Oracle Express databases can be migrated to analytical workspaces in the database. You can build Java applications for OLAP using OLAP APIs and reusable components known as BI Beans, integrated into JDeveloper. This is analogous to the Oracle Objects application development environment supported by Oracle Express, except that it is a lot more powerful. In the future, products such as Oracle Financial Analyzer and Oracle Sales Analyzer, which use Express, may be replaced by applications that use the Oracle 9i OLAP functionality.
Oracle 9i OLAP Option and Discoverer
You may wonder why you need the OLAP Option when you already have tools such as Discoverer. Discoverer and the OLAP Option serve complementary functions. Discoverer is a packaged application that is suited for ad hoc querying and reporting needs. It is designed for end users who may not know SQL. However, some applications and users require more complex analytical functionality, such as financial forecasting, budget allocation, and modeling. The OLAP Option is targeted toward these high-end analytical needs. Further, the OLAP Option also provides an OLAP application development framework. With BI Beans and OLAP API, you can write custom applications specific to your business requirements.
We have discussed Oracle's SQL query processing features at length throughout this book. In this chapter, we will focus on the multidimensional analysis model provided by the OLAP Option. For those of you who have used Oracle Express before, we will point out the equivalent concepts in the two products wherever possible.
Let us start by looking at some of the applications where the OLAP Option is most useful.
9.1.1 Analytical applications
Once a data warehouse has been built, a business may deploy a host of business intelligence applications. Some of these may be ad hoc querying and reporting applications such as Discoverer and Reports, discussed in Chapter 6. However, a business may need several advanced analysis applications, such as demand planning, sales forecasting, corporate budgeting, and financial modeling. These applications not only analyze data but also generate new data for the business. The Oracle OLAP Option is very well suited to these advanced analytical applications, since it provides a multidimensional storage and analysis model. Let us briefly look at some of the operations commonly performed by these applications.
Analytical calculations and aggregation
OLAP applications often require aggregation and analyses involving ranking (such as top-10 products), time-series calculations (such as moving average), and inter-row calculations (such as period-over-period comparisons).
Note that these calculations can now also be done in SQL, as discussed in Chapter 3, but have traditionally been done by multidimensional analysis tools.
Forecasting
Forecasting, as the name suggests, involves predicting a quantity based on available historical figures—for instance, forecasting sales for the next quarter based on results of the past year. These applications use advanced statistical algorithms, such as linear and nonlinear regressions, single and double exponential smoothing, and Holt-Winters method.
Allocation
Allocation, also known as reverse aggregation, is used to divide a quantity such as a budget or a quota into several parts. Allocation is an important part of business planning applications. These applications require the ability to apportion a budget within a hierarchical structure or among specific members in the organization. The allocation may be weighted using user-defined parameters or analytical functions.
Financial calculations
These are calculations that can be conveniently done in a spreadsheet environment, for example, interest calculations and payment schedules.
Modeling
Modeling involves describing a quantity using a set of equations. The model can then be used to compute other quantities by plugging in data into these equations. The equations may have an implied dependency order among them and can compute new values of dimensions and facts. For example, a simple modeling program may involve equations such as the following:
Profit = Revenue - Cost Revenue = SalesChannel('Internet') + SalesChannel('Retail') RetailSales('Clothes') = RetailSales('Mens wear') + RetailSales('Womens wear') RetailSales(quarter) = lag(RetailSales,quarter),1) * 1.1 ...
What-if analysis
What-if analysis or scenario management is a very important aspect of advanced analytical applications. It involves analyzing data under hypothetical scenarios to determine their impact on the business. For instance, how much will it cost the company if we were to close down some of our retail stores and start an online outlet store? What will be the impact on revenues if we made a change to our sales organization? What-if analysis requires a transactional model different from that provided by traditional SQL. Users must be able to change the structure and content of the data in a localized manner without making the data visible to the entire database. Further, the changes may be temporary and the user should be able to restore the data back the way it was. You can think of this as a long-running session where you do a sequence of DDL and DML operations and commit and roll back the changes; they are visible only within your session.
Oracle 9i OLAP supports all the above operations with the 9i OLAP Option.
9.1.2 Oracle OLAP architecture
We will now describe the various components that form Oracle 9i OLAP and then discuss how to use each of them.
Hint: | There has been a significant change in the architecture of Oracle OLAP in 9i Release 2. All examples and functionality described in this chapter refer to 9i Release 2. |
Figure 9.1 shows the architecture of Oracle 9i OLAP. It consists of the following components:
-
Oracle 9i database and SQL engine
-
Analytic workspaces
-
OLAP calculation engine
-
OLAP DML command language
-
SQL table functions
-
OLAP catalog
-
OLAP API
-
BI Beans
Oracle 9i database
The Oracle 9i database serves as the data store for all data in the OLAP Option. The data may be stored in relational tables (typically in star or snowflake schemas) or in multidimensional format in analytical workspaces within the Oracle database. Consolidating all types of data into one data store provides many advantages:
-
All data can be managed by a single application, Oracle Enterprise Manager.
-
The database provides a uniform security mechanism for all types of data.
-
The data does not have to be moved into a separate database and hence the time lag involved in making data available for analysis is reduced.
-
The reliability and availability of the data are improved due to features such as Real Application Clusters and Data Guard.
Analytical Workspaces
An analytical workspace is an entity that stores data in a multidimensional format within the Oracle database. As with any other database object, an analytical workspace has a specific owner. However, unlike other objects, an analytical workspace can be permanent or temporary for the duration of analysis. You would use an analytical workspace for data when you need to use the advanced analytical capabilities of the OLAP calculation engine.
OLAP calculation engine
The OLAP calculation engine is a full-featured multidimensional calculation engine running inside the Oracle 9i database server. It operates on data stored in multidimensional form in analytical workspaces. It complements the analytical features provided by SQL with various calculation capabilities. Some of the common types of analyses done using it include forecasting, allocation, and modeling.
OLAP DML
OLAP DML is a simple but powerful programming language that is used to load, manipulate, and query data in analytical workspaces. It provides several operations, such as aggregation, forecasting, regression analysis, numerical calculations, and time-series manipulation. OLAP DML can be issued either by using the OLAP API, the DBMS_AW PL/SQL package, or the OLAP Worksheet tool (similar to SQL*Plus) in Oracle Enterprise Manager. If you are familiar with Oracle Express, you may recognize that OLAP DML is very similar to the SPL language. In fact, existing programs written for Oracle Express should work with only minor changes in OLAP DML.
Table functions
In Chapter 5, we saw how table functions can be used to perform various ETL functions for your data warehouse. A table function can perform any kind of computation underneath but finally produces its output as a set of rows. Hence, it can be used in SQL queries as if it were a table in a database. Because of this capability, table functions provide a mechanism to access multidimensional data using SQL. If you have a table function that provides access to the analytical workspace using OLAP DML, applications can then use it like a table in a regular SQL query. Oracle provides a table function, called OLAP_TABLE, that provides SQL access to an analytical workspace. This makes Oracle OLAP accessible to users who are unfamiliar with OLAP DML.
OLAP Catalog
The OLAP Catalog describes the logical metadata model for multidimensional data, using dimensions, levels, hierarchies, attributes, measures, and cubes. You should be familiar with the terms dimension, levels, attributes, and hierarchies from the SQL dimension object, discussed in Chapter 4. A measure is the same as a fact in your fact table, such as purchase_price, and is typically used in aggregations. A cube defines how measures are computed along various dimensions. Later in this chapter, we will describe how you can create these entities using Oracle Enterprise Manager. Note that the OLAP catalog merely defines the metadata and does not actually populate the data into the cube. You must create metadata in the OLAP catalog if you would like to use OLAP APIs, BI Beans, or create analytical workspaces using Oracle-supplied packages.
OLAP API
The OLAP API is a set of Java programming interfaces for Oracle OLAP. The OLAP API allows application developers to write programs to perform calculations and multidimensional selection and navigation through the data. Since it is Java based, the OLAP API provides a portable, object-oriented application development framework for OLAP applications. The objects being manipulated by OLAP API must first be defined in the OLAP catalog.
BI Beans
BI Beans are reusable components specially designed for rapid development of OLAP applications. You can create BI Beans using simple wizards in JDeveloper and store them persistently in the database. BI Beans can perform various operations such as connecting to a database, forming analytical calculations, and displaying them in various graphical and tabular formats. These can then be used in Java or JSP applications that need analytical capabilities. BI Beans use the OLAP APIs to access data.
We will now look at each of these components in detail.
9.1.3 Analytical Workspaces
Analytical workspaces allow you to store data in a multidimensional form. As with relational tables, an analytical workspace is owned by a specific schema and uses an Oracle tablespace for storage. You can use analytical workspaces to store data that is used in calculations such as forecasting and allocations.
Hint: | Existing Oracle Express databases can also be migrated into analytical workspaces in the Oracle database. |
Analytical workspaces can be persistent or temporary, depending on your needs. If you need to perform a calculation but do not need to store the results, you can discard changes done within the analytical workspace at the end of the session. A temporary analytical workspace is often used for what-if analysis, where you want to try different hypothetical scenarios but not make all the changes persistent. Unlike relational tables, where changes done by DDL, such as adding a column, are automatically made visible throughout the database, all changes done within the analytical workspace are local to your session unless explicitly committed.
You can also store aggregate data within an analytical workspace, much like a materialized view; however, you will be responsible for refreshing the data. Incremental refresh mechanisms are not available for data in analytical workspaces.
Now, let us look at the physical storage model used by analytical workspaces and how it differs from the relational model.
Concepts
Throughout this book we have described how a data warehouse can be created and managed using relational tables. In this relational world, data is typically stored in a star or snowflake schema. The fact table stores various quantities that you want to analyze with respect to each dimension. Dimension objects are used to define the hierarchical relationships between various columns in the dimension tables. These dimension tables are sometimes referred to as level-based dimensions. To perform a calculation, queries must join the fact and dimension tables using appropriate predicates.
In a multidimensional format, such as an analytical workspace, there are no tables or columns. Instead, there are three main types of entities:
-
Dimension: Unlike the SQL dimension object, which expresses relationships between columns, a dimension in this world corresponds to a list of values. These are called value dimensions. For instance, a city dimension may consist of the values Boston, London, and San Francisco. Your geography dimension may consist of the values World, United States, Massachusetts, New Hampshire, UK, and London. Thus, there is no hierarchy that is implied by a dimension in itself. To specify any relationships between various values in the dimension, you must create a relation.
-
Relation: A relation stores the correspondence between a value in one dimension to another value in the same or another dimension. Note that relations can also declare a relationship between two values in the same dimension—these are called self-relations. Just as a hierarchy describes relationships in a level-based dimension, a relation describes relationships in a value-based dimension.
You can think of the relationship defined by a relation as a parent-child dimension table in a relational schema. Figure 9.2 shows the difference between a level-based dimension table and a parent-child dimension table in the relational world.
Figure 9.2: Level-based versus parent-child dimension tables. -
In a level-based dimension, each level is stored in a separate column. For instance, you have columns corresponding to the city, state, and region levels. On the other hand, in a parent-child dimension, all values are stored in the child column and parent column has the corresponding parent value. For example, suppose the child column contains values such as Boston, San Francisco, and MA. For each value in the child column, there will be a corresponding value in the parent column. For instance, for the child value, Boston, the parent value is MA. Unlike a level-based dimension table, where higher-level values are repeated for every lowest-level value, in a parent-child dimension every relationship is stored exactly once and hence the validity of the dimension data is automatically ensured.
-
Variable: A variable is used to store data and is equivalent to a fact table. It is defined with respect to a specific set of dimensions. Figure 9.3 shows a conceptual picture of how data is stored in a variable. In this example, the sales variable is dimensioned by geography and time. You can query the value of a variable for any values of the dimensions it is defined against. For instance, the sales value for United States for the year 2002 is $3,102. Notice that this is similar to a spreadsheet, where you can retrieve the value of any cell by simply specifying the row and column.
Figure 9.3: Conceptual view of a variable.
There are several advantages to this multidimensional storage format:
-
It enforces referential integrity. For instance, if a variable is defined along customer and time dimensions, every cell of the data will have some unique value of a customer and time. Also, relationships between dimension values are stored exactly once and hence you will not end up with inconsistent data, such as Boston, MA, and Boston, CA.
-
There is an implicit ordering among rows in the dimension that is defined at creation. This is unlike SQL, where you must explicitly add ORDER BY clauses to get data out in a certain order.
-
Users don't need to specify how to join the fact and dimension tables to get their answers. They can simply ask to report the variable for any dimension values, similar to a spreadsheet. However, unlike a spreadsheet, you are not restricted to two dimensions.
-
The data can be presented to the application as "fully solved." Once the DBA sets up the analytical workspace with various calculations, the application users do not have to describe how to perform a calculation as part of the query. They just have to indicate which of the available calculations they would like, and the calculation engine will take care of the details of computing it. The calculation may be a complex analytical function, a formula, or an aggregate. The data may be precomputed for performance or calculated on the fly; however, the application does not have to know about how it is done.
At this point, you may be wondering what is involved in creating and querying these analytical workspaces. To manipulate data stored in analytical workspaces you use the OLAP DML language. You can use OLAP DML to create dimensions and variables and to load data into the analytical workspace. However, in most situations, you probably already have your data warehouse organized in a star or snowflake schema. In this case, you can simply use the package CWM2_OLAP_AW_CREATE or the Analytic Workspace Manager Application, described later in this chapter. If you would like to use any of Oracle's tools, such as OLAP API or BI Beans, the analytical workspace must conform to a certain standard format. The package will automatically create this standard analytical workspace for you; hence, it is strongly recommended that you use these packages rather than manually create the analytical workspace elements.
9.1.4 OLAP DML
OLAP DML is a very simple but powerful language that allows you to express a variety of calculations and do spreadsheet-like reporting on data stored in an analytical workspace. It provides functions for forecasting, allocation, aggregation, statistical analysis, and financial calculations.
You can execute OLAP DML using either the OLAP Worksheet in Oracle Enterprise Manager, using the DBMS_AW PL/SQL package, or using the SPLExecutor class of the OLAP API.
To access the OLAP Worksheet from Oracle Enterprise Manager, expand the Database node in the Navigator panel, followed by the Warehouse and OLAP nodes, and then use the Object Menu. It is a simple application (similar to SQL*Plus) that allows you to execute OLAP DML commands as well as edit, compile, and run programs written in the OLAP DML language. Figure 9.4 shows the OLAP Worksheet. You issue the commands in the lower portion of the window and the results appear in the upper portion. For instance, in Figure 9.4, we are querying a dimension named OLAP_TIME_KEY.
Hint: | OLAP DML is very different and completely separate from SQL. However, there is a SQL mode in the OLAP worksheet where you can issue regular SQL statements as in SQL*Plus. Conversely, you can issue OLAP DML commands in SQL*Plus using the DBMS_AW package. |
Before you can issue any OLAP DML commands, you must first create or attach to an analytical workspace. The following AW CREATE command creates an analytical workspace named EASYDWOLAP. We are associating a specific tablespace, OLAPTBSP with it.
AW CREATE EASYDWOLAP TABLESPACE olaptbsp
In the next few sections, we will show some examples of using OLAP DML to illustrate the types of calculations that can be done with analytical workspaces. However, this is not a tutorial on OLAP DML; for that you must refer to the Oracle documentation. The OLAP worksheet also has an excellent help system, which describes all OLAP DML commands with examples.
Dimensions and Variables
Dimensions and variables are the basic elements of the multidimensional storage format. We will briefly show how they are defined, so you can understand the examples in the subsequent sections.
The first example defines an equivalent of a customer dimension with the hierarchy customer_id, town, and county.
DEFINE cust_id DIMENSION TEXT DEFINE cust_town DIMENSION TEXT DEFINE cust_county DIMENSION TEXT DEFINE customers DIMENSION CONCAT (cust_county cust_town cust_id) DEFINE customers.parents RELATION customers <customers>
Each level of the hierarchy cust_id, cust_town, and cust_county is specified as a dimension. The CONCAT command lets you create a single dimension—customers containing values from all three dimensions: cust_id, cust_town, cust_country. The customers.parents RELATION is a self-relation indicating the parent child relationship between various values within the customers dimension. Note that the actual relationship will be determined when we load data into these dimensions, which we will not show here.
The following group of commands defines a time dimension—very similar to the customers dimension. There are three separate dimensions—time_id, time_month, and time_year, corresponding to various levels—and a relation, timedim.parents, that establishes the parent-child relationships between the various dimension values.
DEFINE time_id DIMENSION TEXT DEFINE time_month DIMENSION NUMBER(2) DEFINE time_year DIMENSION NUMBER(4) DEFINE timedim DIMENSION CONCAT (time_year time_month time_id) DEFINE timedim.parents RELATION timedim <timedim>
Next we will define a COMPOSITE dimension, which specifies what dimensions will be used together. This is useful when the data do not have all combinations of customers and timedim values; in this case a composite will only store the relevant <customers, timedim> pairs.
DEFINE dims COMPOSITE <customers timedim>
The composite dimension will be used to dimension our variables, defined in the following example. As mentioned earlier, a VARIABLE is similar to a fact table and must be dimensioned by one or more available dimensions. We will define two variables—purchase_price and ship_charge—dimensioned using the composite dimension, DIMS, defined earlier.
DEFINE purchase_price VARIABLE NUMBER (6,2) <dims <customers timedim>> DEFINE ship_charge VARIABLE NUMBER (6,2) <dims <customers timedim>>
We will use these dimensions and variables in the examples in the following sections.
Reporting data with OLAP DML
We can report the data in the variable along its dimensions using the REPORT command. The REPORT command returns the answer, similar to a spreadsheet, with one dimension along the rows and another along the columns. You can restrict the data you are querying to a specific value or list of values using the LIMIT command. This is similar to a selection specified by a WHERE clause in SQL. However, unlike SQL, where the selection is specified on a query basis, the LIMIT commands in an OLAP DML persist as long as you are attached to the analytical workspace.
In the following example, we are querying the purchase_price variable for the cust_county dimension for months January through March. The keyword ACROSS indicates that the time_month dimension values will be reported as columns. Notice how the engine automatically figured out the dimensions involved, using the definition of the variable.
LIMIT customers to cust_county LIMIT time_month to 1 to 3 REPORT across time_month : purchase_price ---------PURCHASE_PRICE--------- -----------TIME_MONTH----------- CUSTOMERS 1 2 3 -------------------------------- ---------- ---------- ---------- <CUST_COUNTY: Hants> 1,976.03 1,924.90 3,286.77 <CUST_COUNTY: London> 2,335.90 2,555.28 3,122.70 <CUST_COUNTY: Mersey> 2,335.90 2,555.28 3,122.70
You can also compute other variables using the variables defined earlier. For instance, the following example defines a variable named totalsales that computes the sum of purchase_price and shipping charge.
DEFINE TOTALSALES VARIABLE NUMBER (6,2) <DIMS <CUSTOMERS TIMEDIM>>
We then compute this variable for the lowest levels of the customers and timedim dimensions.
LIMIT timedim TO time_id LIMIT customers TO cust_id ACROSS CUSTOMERS TIMEDIM DO 'TOTALSALES = PURCHASE_PRICE + SHIP_CHARGE'
We can now report on this new variable. In the following example, we are asking for the totalsales for the first three months for each customer. Notice that we did not have to specify how to compute totalsales as part of the query. This is where the power of OLAP DML becomes apparent.
LIMIT CUSTOMERS TO CUST_ID LIMIT TIME_MONTH TO 1 TO 3 REPORT across time_month : totalsales -----------TOTALSALES----------- -----------TIME_MONTH----------- CUSTOMERS 1 2 3 ------------- ---------- ---------- ---------- <CUST_ID: AB123456> 32.51 134.24 143.46 <CUST_ID: AB123457> 65.02 188.42 65.02 ...
Aggregating data
You can use OLAP DML to perform aggregation. Let us assume that we have loaded the data corresponding to the cust_id and time_id values into the variable purchase_price. We would now like to compute the values for the higher levels, such as cust_county and time_month, using aggregation. To aggregate the data, we must define an aggregation map using the AGGMAP command. The aggregation map gives the calculation engine instructions on how the aggregation is to be performed—which aggregate function to apply, which levels to aggregate over, and what aggregates to precompute. Once the DBA has defined the aggregation map, it can be used without knowing the details of how the aggregation is performed underneath.
The following AGGMAP indicates that you want to compute aggregates over the time_id dimension and over the cust_id dimension. The AGGMAP specifies the relations that determine how to aggregate from one level to the next.
DEFINE easyagg AGGMAP AGGMAP RELATION timedim.parents PRECOMPUTE (ALL) RELATION customers.parents PRECOMPUTE (ALL) END
You can now report the data using the previous AGGMAP, as follows:
>report across time_month : aggregate(ship_charge using easyagg) ----AGGREGATE(SHIP_CHARGE------- --------USING EASYAGG1)--------- -----------TIME_MONTH----------- CUSTOMERS 1 2 3 -------------------------------- ---------- ---------- ---------- <CUST_TOWN: Eastleigh> 2.95 7.45 11.95 <CUST_TOWN: Soton> 10.40 14.90 23.90 <CUST_TOWN: Chandlers> 2.95 5.90 11.95 <CUST_TOWN: Soton2> 2.95 5.90 11.95 <CUST_TOWN: Soton3> 2.95 5.90 2.95 ...
In the above example, we asked to precompute all aggregates but we could get quite sophisticated and ask to precompute only some levels or only some values.
Forecasting
One of the common operations performed using OLAP DML is forecasting. This is one of the many advanced analytical operations that can be performed with analytical workspaces and OLAP DML but not in SQL. To forecast a quantity such as sales we must perform the following steps:
-
Define variables to store the forecast results.
-
Specify the parameters of the forecast.
-
Execute the forecast.
We will show a very simple example of forecasting sales for the next 6 months based on the past 6 months.
The first step is to define a variable called SALES_FORECAST, which stores the result of the forecast. Note again that we have dimensioned this variable using the composite dimension DIMS.
DEFINE SALES_FORECAST VARIABLE NUMBER(6,2) <DIMS <TIMEDIM CUSTDIM>>
Next, we constrain timedim to the time_month level and customers to the cust_id level. This means that the forecast will be computed using the months in the time dimension, for each cust_id value in the customers dimension.
LIMIT TIMEDIM TO TIME_MONTH LIMIT CUSTOMERS TO CUST_ID
To specify parameters and run the forecast, we must create a handle, which will be used by subsequent commands. The handle, called sf_handle, is obtained by calling the FCOPEN command.
DEFINE sf_handle VARIABLE INTEGER; sf_handle = FCOPEN('EasyDWSalesForecast')
Next, we will set the forecast parameters using the FCSET command. We are using the automatic method for forecasting. We will consider 6 time periods (months) as historical data and forecast using a periodicity parameter of 2.
FCSET sf_handle method 'automatic' histperiods 6 periodicity 2
Finally, we execute the forecast using the FCEXEC command. We must specify the name of the time dimension and also the variable containing the data to be used for the forecast—in our case, purchase_price. The results are placed into the sales_forecast variable defined earlier.
FCEXEC sf_handle TIME timedim INTO sales_forecast purchase_price
Finally, we close the handle, as follows:
FCCLOSE sf_handle
You can now use an AGGMAP, discussed earlier, to aggregate the sales forecast to higher dimension levels for reporting purposes. Assuming we have done so, we can now report the sales_forecast variable as before to see the forecasted results for the months 9 through 11.
LIMIT time_month to 9 to 11 LIMIT customers to cust_id REPORT across time_month: sales_forecast ---------SALES_FORECAST--------- -----------TIME_MONTH----------- CUSTOMERS 9 10 11 ------------- ---------- ---------- ---------- <CUST_ID: AB123456> 232.51 334.24 343.46 <CUST_ID: AB123457> 165.02 188.42 265.02 ...
Creating custom measures
With OLAP DML you can also define formulae to perform calculations using simple arithmetic or analytical functions. Once defined, users can then reference these calculations in reports like any other variables defined earlier. Again, we should emphasize that the end user does not need to know how the calculation was done or what analytical function it used.
The following example defines a simple FORMULA for totalsales as a total of purchase_price and ship_charge.
DEFINE TOTALSALES FORMULA DECIMAL <TIMEDIM CUSTOMERS> EQ purchase_price + ship_charge
The following example reports the totalsales values for months 4, 5, and 6.
LIMIT time_month TO 4 to 6 LIMIT timedim TO time_month REPORT across cust_dim : totalsales -------------------TOTALSALES------------------- ------------------CUST_COUNTY------------------ TIMEDIM Hants LondonC Mersey --------------- --------------- --------------- --------------- <TIME_MONTH: 4> 6,376.96 7,687.26 7,687.26 <TIME_MONTH: 5> 4,493.81 5,363.68 5,363.68 <TIME_MONTH: 6> 13,749.80 16,663.60 16,663.60
The next example defines a more complex formula to compute change in sales from one time period to the prior time period using the LAGDIF function. The NOSTATUS keyword indicates that we will compute the LAGDIF using all dimension values for the timedim dimension, regardless of those selected by the current LIMIT specification
DEFINE SALES_PREV_MONTH FORMULA DECIMAL <TIMEDIM CUSTOMERS> EQ LAGDIF(PURCHASE_PRICE,1,TIMEDIM,NOSTATUS)
We will report it with the same limit settings as the previous example.
REPORT across cust_dim: sales_prev_month ----------------SALES_PREV_MONTH--------------- ------------------CUST_COUNTY------------------ TIMEDIM Hants LondonC Mersey --------------- --------------- --------------- --------------- <TIME_MONTH: 4> 3,885.18 4,481.96 4,481.96 <TIME_MONTH: 5> -1,883.15 -2,323.58 -2,323.58 <TIME_MONTH: 6> 9,254.98 11,298.91 11,298.91
In this section, we have given you a quick but broad overview of OLAP DML and analytical workspaces. We saw how data is stored, aggregated, calculated, and reported in the multidimensional format. We have only scratched the surface of what can be done with OLAP DML, but hopefully you have some idea of the simplicity and power of this language.
Besides interactively issuing OLAP DML using the OLAP Worksheet, you can also use it within an application, as discussed in the next section.
9.1.5 Accessing analytical workspaces in an application
Oracle OLAP provides several ways by which an application can access multidimensional data in an analytical workspace. These are summarized in Figure 9.5. A PL/SQL application using OLAP DML can access an analytical workspace using the DBMS_AW PL/SQL package. Java programs can access it using the OLAP API, provided that appropriate OLAP catalog metadata has been defined. SQL applications can also access the data by defining table functions, which convert the data into rows. If necessary, relational views may be defined on top of the table functions, so access to the analytical workspace is further encapsulated.
We will now briefly look at each of these mechanisms.
DBMS_AW package
The DBMS_AW package provides functions that allow you to execute OLAP DML commands and programs using PL/SQL programs or SQL*Plus.
The EXECUTE procedure can be used to execute one or more OLAP DML commands and print the output to the screen using the DBMS_OUTPUT package. The following example attaches to the EASYD-WOLAP analytical workspace and reports the cust_town dimension.
SET SERVEROUTPUT ON; BEGIN DBMS_AW.EXECUTE('aw attach easydwolap report cust_town'); END; / CUST_TOWN -------------- Eastleigh Soton Chandlers Soton2 Soton3 Soton4 Soton5 ...
SQL access to multidimensional data
Applications can access multidimensional data stored in analytical workspaces with SQL by using SQL table functions. Oracle provides a table function called OLAP_TABLE to do this but you can also write your own custom table functions. For instance, suppose we want to retrieve the data from the analytical workspace EASYDWOLAP defined earlier. We would like to get the purchase_price along the customer and time dimensions. To use the OLAP_TABLE function, as with any other table function described in Chapter 5, you must first define a TYPE that describes the rows being returned and a TYPE that describes a TABLE whose rows are objects of this type.
For our example, we define a type purchase_price_type as follows:
CREATE TYPE purchase_price_type AS OBJECT (cust_id VARCHAR2(80), time_id VARCHAR2(80), purchase_price NUMBER); /
Next, we define a purchase_price_type, which describes a table whose rows are of the purchase_price_type.
CREATE TYPE purchase_price_tbl AS TABLE OF purchase_price_type; /
We can then get the purchase_price along the customer and time dimensions using the OLAP_TABLE function, as follows:
SELECT * FROM TABLE(OLAP_TABLE( 'easydwolap duration session', 'purchase_price_tbl', '', 'measure purchase_price from purchase_price dimension cust_id from cust_id dimension time_id from time_id'));
The OLAP_TABLE function takes as parameters the name of an analytical workspace and its duration (easydwolap duration session), the type of the result (purchase_price_tbl) and OLAP DML commands to retrieve the various columns for the result. It also takes an optional limit parameter, which allows you to retrieve only partial data.
You can now create a relational view on this table function, as follows, so that applications can access this data without really needing to know about table functions and OLAP DML.
CREATE VIEW purchase_price_vw as SELECT * FROM TABLE(OLAP_TABLE( 'easydwolap duration session', 'purchase_price_tbl', '', 'measure purchase_price from purchase_price dimension cust_id from cust_id dimension time_id from time_id'));
This view is now queried like any other relational table. For instance, in the following example, we are aggregating the purchase_price using SUM.
SELECT time_id, SUM(purchase_price) as purchases FROM purchase_price_vw GROUP BY time_id; TIME_ID SUM(PURCHASE_PRICE) -------------------------------- 01JAN2002 19859.46 01FEB2002 19188.96 01APR2002 44660.79 ...
This was a very simple example, and you can get quite sophisticated with your use of table functions. With this feature, the application can now be completely unaware of whether the data being accessed is stored in a relational or a multidimensional format.
The OLAP API
The OLAP API is a set of Java classes that can be used to develop OLAP applications. It is very well suited to developing thin-client applications, which can be accessed with a Web browser. The OLAP API uses a multidimensional model for querying data; however, it internally translates these queries into SQL. To access objects using this API, you need to have defined the objects in the OLAP catalog, described in the next section. The underlying data may be stored either in relational tables or in analytical workspaces encapsulated within relational views.
A detailed discussion of the OLAP API is beyond the scope of this book. Instead, we will illustrate BI Beans, which are reusable components that use the OLAP API to access data. These can then be incorporated into custom OLAP applications.
In the next section, we will discuss the OLAP catalog, which defines metadata required by tools such as OLAP API, BI Beans, and the Analytic Workspace Manager Application.
9.1.6 The OLAP catalog
The OLAP catalog stores the metadata to specify the logical model of your data. If your application wants to have access to data using OLAP API, then it must be defined in the OLAP catalog.
The OLAP metadata consists of the following types of entities:
-
Dimensions: Dimensions are used to express relationships such as hierarchies in your data. Dimensions consist of levels, hierarchies, and level attributes. In addition, you can also specify some dimension attributes, such as its description.
-
Measures: A measure is a quantity that will be used in calculations, such as purchase price or cost.
-
Measure folders: A measure folder, also known as a catalog, is a convenient place to keep related measures together.
-
Cube: A cube defines how measures will be aggregated across one or more dimensions. In relational terms, it defines how to join your fact and dimension tables. A cube also specifies which hierarchies in the dimensions will be used to compute aggregations.
The OLAP catalog requires you to map these elements to relational tables and columns. Hence, if you have data in analytical workspaces, you must encapsulate access to the data within relational views, as described previously.
OLAP metadata can be defined in two ways:
-
Using the CWM and CWM2 packages
-
Using Oracle Enterprise Manager
The CWM packages correspond to the first version of the Common Warehouse Metadata, also known as CWMLite. It supports traditional dimensions, as defined in a star or a snowflake schema. A dimension defined using CWM API in the OLAP catalog has a SQL dimension object (described in Chapter 4) underlying it, in addition to OLAP metadata. Data warehouse design tools such as Oracle Warehouse Builder also allow you to automatically generate metadata according to the CWM specification.
CWM2 is the second version of CWM and provides advanced features, such as parent-child dimensions, ragged hierarchies (where not all data has the same lowest level in the hierarchy), same value mapping to different levels in different hierarchies, and NULL values in level columns.
At the time of writing, Oracle Enterprise Manager only provides a graphical interface to create the CWM version of metadata. If your data is stored in relational tables and follows a star or a snowflake schema, this should be sufficient for you.
We will now illustrate the use of Oracle Enterprise Manager to generate OLAP metadata. The CWM APIs are very easy to use and follow the same flow as the wizards in Enterprise Manager and hence we will not discuss them in this book.
Creating OLAP metadata in Oracle Enterprise Manager
To access the OLAP functionality in Oracle Enterprise Manager, expand the Database node, followed by the Warehouse node, and then the OLAP node. Oracle Enterprise Manager provides wizards to create a dimension and a cube. It also allows you to create measure folders.
If you have any SQL dimension objects defined for use with query rewrite (described in Chapter 4), you will automatically see them listed under Dimensions. If you click on one of them, you may get a window saying that some of the metadata needed for OLAP has not been created yet. At this point, you can ask it to continue. It will generate several CWM API calls, as shown in Figure 9.6. Be assured that your existing dimension object will not be harmed in any way by doing this. Once you press Apply, the CWM metadata will be created. You can now use this dimension to define your cube objects as discussed next.
Hint: | Under the OLAP options tab in Figure 9.6, you can set up descriptive names for dimensions, levels, and hierarchies. It is recommended that you provide all this information if you plan to use the BI Beans Wizards. |
Oracle Enterprise Manager allows you to define cube metadata that relates your dimensions and measures. We will walk through some of the steps performed to create a cube, using the Cube Wizard. You can access the wizard in the Object menu in Figure 9.6.
First, you will be asked to name the cube and also select the schema where it should be placed (not shown here). Next, you will be asked to choose the fact table that contains the measures to be stored in the cube, as shown in Figure 9.7. In this example, we have chosen EASYDW.PURCHASES table.
After this, you will be asked to pick the dimensions you would like to include into your cube, as shown in Figure 9.8. In our example, we have two dimensions, CUSTOMER_DIM and OLAP_TIME_DIM, which we assume were created earlier.
In the next screen, you will be asked to define how your dimensions join to your fact table. You must define this relationship for each dimension in the cube before you will be allowed to move on. You must also define a default hierarchy along which the cube will compute the measures. In Figure 9.9, we have specified that the CUSTOMER level in the CUSTOMER_DIM dimension joins to the fact table using the customer_id column.
The next step, shown in Figure 9.10, is to define the measures to be stored in the cube. You must name each measure and indicate what fact table column it corresponds to. By default any column in the fact table with a data type of NUMBER will be included in the measure list. Note that the default aggregation method used for each measure is SUM.
Finally, you will be shown a summary of the cube, as shown in Figure 9.11. You can press the Show SQL button to see the CWM APIs that will be used to generate the cube.
When you press Finish in Figure 9.11, the cube will be created. If you chose to invoke the OLAP Summary Advisor by checking the box at the bottom of Figure 9.11, it will then be invoked. The OLAP Summary Advisor uses the Summary Advisor described in Chapter 4 in a special way to generate materialized views to optimize the cube. However, these materialized views have some additional OLAP metadata associated with them, so these must not be confused with materialized views used for other applications. You must use the appropriate Summary Advisor for your application.
Hint: | The OLAP Summary Advisor is available when you right-click on a cube object in Oracle Enterprise Manager. The Summary Advisor, described in Chapter 4, is available by expanding the Summary Management node. |
Once the cube has been created, you can edit any of the cube's properties by right-clicking on that cube under the OLAP Management section. You can also browse the data in the cube using the Cube Viewer application in Oracle Enterprise Manager. If you select the cube you want to view and right-click on it—you will find the Cube Viewer in the pop-up menu. The Cube Viewer will display the results of any query in a cross-tabulation format, as shown in Figure 9.12. In this example, we are displaying the purchase_price measure for all products in Hants county. You can drill down into the data further by clicking on the arrows in the title cells.
The query whose results are to be displayed can be constructed or modified using the Query Builder from the File menu. The Query Builder is very similar to the one used by BI Beans, discussed later; hence, we will not delve into its details here. However, unlike BI Beans, where you can customize this report in a variety of ways, the Cube Viewer is just a simple application to peruse your data and is a convenient way to verify that the OLAP metadata has been defined correctly.
9.1.7 Analytical workspace creation tools
A situation likely to be encountered by many users is that they already have a relational warehouse and would like to build an analytical workspace for some of its special features, such as forecasting and what-if analysis. Oracle provides two simple tools to easily create an analytical workspace from a star or snowflake schema. These tools will ensure that the analytical workspace is created in such a way that it can be used by all of Oracle's analysis tools, such as OLAP API and BI Beans.
In this section, we will briefly describe the two tools provided for this purpose:
-
CWM2_OLAP_AW_CREATE, a PL/SQL package
-
Analytic Workspace Manager Application
To create an analytical workspace using either of these tools, your data must be in a star or a snowflake schema. Also, you must have defined CUBE and DIMENSION metadata in the OLAP catalog.
The CWM2_OLAP_AW_CREATE package allows you to:
-
Define dimensions in the analytical workspace, corresponding to a dimension in the OLAP Metadata, using the AW_DIMENSION_CREATE procedure.
-
Define variables, corresponding to a cube in the OLAP metadata, using the AW_CUBE_CREATE procedure.
-
Load data into the dimension and variables in the analytical workspace from the fact and dimension tables, using various load procedures.
-
Create relational views on top of the analytical workspace using the AW_DIMENSION_CREATE_ACCESS and AW_CUBE_CREATE_ACCESS procedures. This allows the analytical workspace to be accessed via SQL.
Oracle also provides a tool known as the Analytic Workspace Manager, which allows you to do the same using a graphical interface. We will demonstrate some of its features next.
Hint: | The Analytic Workspace Manager Application is available free of charge at Oracle Technology Network (http://otn.oracle.com). It is not part of the Oracle Database installation CD. |
The Analytic Workspace Manager Application allows you to create and edit dimensions and variables in an analytical workspace. For instance, in Figure 9.13, we are editing a variable named COST.
The Analytic Workspace Manager also allows you to browse the cubes and dimensions in the OLAP catalog. Also included is a wizard to create an analytical workspace from a cube defined in the OLAP catalog. Figure 9.14 shows the first screen of the wizard, where you select the cube you would like. In Figure 9.14, we have chosen the PURCHASES cube assumed to have been defined earlier.
Next, you will be asked to name the analytical workspace and choose a tablespace to store it, as shown in Figure 9.15.
Finally, if you would like to enable the use of this analytical workspace with OLAP API and BI Beans, you can check the option in Figure 9.16. This will generate various scripts to create relational views on top the analytical workspace elements. You can specify the directory where the scripts should be stored and run them at your convenience.
It is highly recommended that you make use of these tools to create analytical workspaces rather than creating them manually, which can be quite cumbersome.
9.1.8 BI Beans
Business Intelligence Beans (BI Beans) is a set of reusable components that allows you to rapidly develop OLAP applications. They are integrated into Oracle's JDeveloper product, where they can be created and customized using simple wizards. BI Beans can perform a wide range of tasks, such as connecting to a database, building queries to perform analytical calculations, and displaying the results in extremely powerful reports, tables, or graphs. These components can then be easily deployed as part of a Java or JSP application.
Note | To use BI Beans with 9i Release 2 OLAP Option, you must use JDeveloper 9.0.3 and BI Beans 2.7 or later. |
We will illustrate the capabilities of BI Beans in the following text. In these examples, we will assume that we have defined a cube called PURCHASE_DATA, which has two dimensions, PRODUCT and CUSTOMERS, and two measures purchase_price and shipping_charge, which are grouped into a measure folder called Purchase Measures. The descriptive attributes supplied when defining dimensions and cubes will be used to display various measures and dimensions in these wizards.
A novel aspect of BI Beans is that you have full access to the data as the component is being designed; so you can immediately see how the resulting graph or presentation will look. Before you start to build business intelligence components, you must set up a database connection using the Designer bean. Then you can use the Query Builder to define queries, Presentation Wizard to create a report or graph, and Calculation Builder to define calculations.
Creating reports
The Presentation Wizard, shown in Figure 9.17, allows you to create reports in a Crosstab, Graph, or Table format. We will first create a report of total sales in Hants county for various product categories. As we do different customizations with this report, you should begin to appreciate the power of BI Beans. In Figure 9.17, we have chosen to create a crosstab named EasyDWSalesAnalysis.
The next step, shown in Figure 9.18 is to choose which measures we want in the report. The Automatically add/remove dimensions box is checked, and, hence, the dimensions associated with the measure will be automatically included in the report. Note that some of the screens used by the Presentation Wizard are also common to the Query Builder Wizard, since ultimately a presentation is just a visual representation of a query.
The next step is to select how these dimensions will be laid out in the report. In a crosstab format, you can represent data along two dimensions—a row dimension and a column dimension and, optionally, a third page dimension. The wizard will choose a default layout for the dimensions. You can change the layout by a simple drag-and-drop operation to move the dimensions into the row, column, or page as desired. In Figure 9.19, since we have only two dimensions, we will keep customers as the columns and products along the rows and will not use the page dimension.
We then choose what levels or values of the dimension to include in the report, as shown in Figure 9.20. In our example, we have chosen only Hants county. Note that even though this is a design environment, the wizard shows you the actual values in your customer dimension.
On the right-hand side in Figure 9.20, is the Selected panel with two tabs—Steps and Members. The Members tab shows the values chosen for a particular dimension. The Steps tab shows how you determine the list of members. In our example, we only have one member, Hants county. Later, we will discuss how you can add members according to specific conditions.
We could continue on and customize the product dimension, but for now let's just press the Finish button and customize it later. The resulting report will look like that shown in Figure 9.21. You may wonder what is so special about this report. Well, you will notice that there are small arrows in the title cells for ALL_PRODUCTS, ELEC, HDRW, and MUSC. If we clicked on ELEC to expand it, the bean would automatically query the data to drill down to the individual product level. In fact, we have expanded the ALL_PRODUCTS cell to drill down to the individual categories. If you include this into a JSP application, the users now have an interactive report where they can roll up and drill down the hierarchy. Along the customer dimension, we have chosen Hants county as the top level, which means we can drill down to towns and individual customers in this county.
Let's now see how we can edit and customize this report. In Figure 9.21, there is a toolbar just above the window where the report is displayed. If we click on the first button in this toolbar (with the tooltip Show Query Builder), we get the Query Wizard, shown in Figure 9.22, where we can edit all aspects of the report.
Earlier we mentioned Steps and Members; let us now discuss the Conditions tab. Each dimension in the report can be customized to include members according to a certain condition. The wizard will automatically show you some useful conditions pertinent to the data. In Figure 9.22, we have chosen "Top 10 based on Purchase Price." The steps indicate that we will start with ALL_PRODUCTS and then add the top ten products. So the report will now consist of data for ALL_PRODUCTS (from which we can drill down) and also the top ten products, as shown in Figure 9.23.
Data-driven formatting
As with all reporting tools, you can customize this report to add titles, change fonts, and set colors. The Customizer, shown in Figure 9.24, is available by clicking the button next to the Query Wizard in the toolbar above the report. We have added a title and a subtitle to our report.
Another interesting aspect of BI Beans is that you can actually format the report using the actual data. For instance, we would like the report to highlight products that are not performing well. For this, we will go to the Format tab and click the Data button to get the window shown in Figure 9.25. We will create a format named "Losers," which sets the background color Gray for all cells in the report where the total purchase_price is less
than 8,000. To do this we go to the Rules tab, shown in Figure 9.26, and edit the data to add the required conditions. For instance, we are setting the condition purchase_price < 8,000, as shown in Figure 9.27.
The resulting report is shown in Figure 9.28. Hopefully, you now have an appreciation of how easy it is to create very nice and powerful reports. Even though we are showing these reports in operation within JDeveloper, you are seeing these reports exactly as they would appear if included in your application.
Graphs
The Presentation Wizard can also be used to create graphs. The steps used to pick dimensions and measures are identical to those for the crosstab. Figure 9.29 shows the various types of graphs available.
As with crosstabs, a graph can display data along multiple dimensions, as shown in Figure 9.30. Instead of rows and columns, in a bar graph you have bars representing one dimension and groups representing a second dimension. You can also have a page dimension as a third dimension.
Figure 9.31 shows a graph created using the presentation bean, where we see sales by customer county. In this example, we have chosen customer counties as groups and products as bars.
Note that this graph also has drill down capability—if you click on the bars, it will drill down to the next level, which is the product category, as shown in Figure 9.32. Now you can see the most popular product category in each county. You can also customize the graph using the data—for instance, we have chosen specific colors for each category value.
Calculations
The examples so far have produced reports from the base data; however, most OLAP applications will need to perform calculations. These calculations may involve analytical functions, such as ranking, reporting aggregates, cumulative sums, and moving averages. The Calculation Builder allows you to specify these calculations using measures in your cube. In Figure 9.33, we are creating a share calculation, which computes the ratio of a quantity with respect to the total (recall the RATIO_TO_REPORT function discussed in Chapter 3).
On pressing the Next button, you get the screen shown in Figure 9.34, where you specify the parameters for the calculation. In this example, we have specified the measure purchase price and indicated that the ratio would be computed for each level in the products hierarchy with respect to the total. So we will be able to find out the market share of sales for each product category.
These calculations will then be available to be included in presentations such as graphs or reports. For instance, in Figure 9.35, we have plotted a pie graph of sales for each category for each county.
9.1.9 Summary
In this section, we have given you an overview of the components and capabilities of Oracle 9i OLAP. Regardless of whether you store your data in relational or multidimensional format, the Oracle database provides all the tools and techniques required for advanced business analysis applications.