Multidimensional Databases: Problems and Solutions

We now present a simple multidimensional data model "MD" that provides a number of constructs to describe, in an abstract but natural way, the basic notions involved in multidimensional analysis. As is customary in database models, we make a clear distinction between the scheme (which specifies the structure of a concept) and the instance (that is, the actual values associated with a concept).

Formal Definition of MD

We assume the existence of a finite set of base types such as text, integer, decimal, and date. Each base type t is associated with a domain of base values of that type. We also assume the existence of a countable set of names and a countable set of identifiers (ids). Such ids are values, distinct from base values, that are used to uniquely identify real-life objects.

A dimension has three main components: a set of levels, a set of level descriptions, and a hierarchy over the levels.

Definition 1 [Dimension scheme] An MD dimension scheme D consists of:

There is a natural graphical representation of an MD dimension. Some examples are reported in Figure 2. In this representation, levels are depicted by means of round-cornered boxes and there is a direct arc between the two levels l1 and l2 if l1 ≤ l2. Small diamonds depict the descriptions of a level.

Figure 2: Dimension Scheme in the MD Model

Example 3: Figure 2 reports the dimensions for the Toys4All company, as described in Example 1: Time, Product, Store, Promotion, and Warehouse.

As an example, let us consider in more detail the Time dimension. Its levels are day, month, quarter, year, and season. The roll-up relation of Time is the reflexive and transitive closure of the sets of pairs (day, month), (month, quarter), (quarter, year), and (day, season). Thus, for instance, the level day rolls-up to the level month, but also to the level year. Descriptions of the level day are date, day-of-week (mapping each day to the name of the corresponding day), day-number-in-month (mapping each day to the number of the day within its month), and day-number-overall (coding days in consecutive day numbers).

Let us now state precisely what is an instance of a dimension scheme.

Definition 2 [Dimension instance]: An instance of a dimension D=(L, Δ, ≤) consists of:

The roll-up functions of a dimension instance must satisfy the following consistency conditions.

Condition 1 [Consistency of roll-up]: The family of roll-up functions of a dimension are consistent if:

  1. for each level l, the function is the identity on the members of l; and

  2. if a level l1 rolls-up to l2 in different ways (e.g., rolling-up through either l' or l“), then the members of l1 roll-up to elements of l2 in a consistent way, that is:

    for each member m of l1.

Note that, as is customary in conceptual models, a member of a dimension level is not a value but is the object itself (e.g., a member of the store level is the actual building, not its name and address). In fact, although this object has an id and a number of values (the descriptions) associated with it, its existence and identity are clearly independent of them.

We are now ready to introduce the general notion of multidimensional database scheme. This has two main components: a collection of dimensions and a number of data cube schemes, which are defined over levels of the dimensions.

Definition 3 [Multidimensional Scheme]: A multidimensional scheme consists of:

Note that in MD there is a uniform treatment of measures and dimensions, as a measure can be not only a simple value but also a level of a dimension. This allows the analyst to transform measures into attributes and vice versa (Cabibbo & Torlone, 1998b), an important functionality that any OLAP system should have (Pedersen, 2000).

Data cube schemes can also be naturally represented by means of diagrams. An example that refers to the dimensions in Figure 2 is given in Figure 3: facts are represented by boxes and measures by circles.

Figure 3: Two Data Cube Schemes Over the Dimensions in Figure 2

Example 4: A multidimensional scheme for the business processes of the Toys4All Company described in Example 1: and Example 2: can be defined using the dimension schemes of Example 3: Specifically, two data cubes, Sales and Inventory, can be used to model the sale process and the warehouse process respectively. The schemes of these data cubes are represented graphically in Figure 3.

The data cube Sales describes daily sales, detailed by item, store, and promotion. Its attributes are time (at the day level of the time dimension, describing the day in which the sale occurred), item (the product sold), store (the store having sold the product), and promotion (the promotion applied to the sale). Its measures are unit-sales (the number of items sold), euro-sales (the income of the sale, in Euros), and euro-cost (the cost price of the items sold).

The data cube Inventory is instead used to represent the inventory levels of the various products, detailed by warehouse and month. Specifically, inventory levels are measured at the end of each month. The measures of this data cube are quantity-on-hand (the quantity in stock of a product at the end of the month), quantity-shipped (the quantity shipped from the warehouse during the month), and value-at-cost (the value of the quantity in stock, at cost price).

Before introducing the notion of instance of a data cube scheme, two preliminary notions are needed.

Let D = (D,F) be a multidimensional scheme, f[A1 : l1,…, An : ln] → [M1 : m, …, Mk : mk] be a data cube scheme in F, and d be an instance of D.

Definition 4 [Conceptual coordinate]: A(conceptual) coordinate for f over d is a tuple over the attributes of f, that is, a function mapping each attribute Ai to a member of the level li occurring in d.

Definition 5 [Fact]: A fact for f over d is a tuple over the measures of f, that is, a function mapping each measure name Mj to either a value (if mj is a base type) or a member in d (if mj is a level).

We are now ready to introduce the notion of instance of a multidimensional scheme.

Definition 6 [Instance of multidimensional scheme]: An instance of a multidimensional database scheme (D,F) is composed of:

An entry of a data cube c is a coordinate over which the instance of c is defined.

Example 3: A possible instance for the multidimensional scheme of Example 4: is shown in Figure 4. In this example, level members are represented by their ids.

Figure 4: A Sample Instance Over the Multidimensional Scheme of Example 4:

A coordinate over the data cube scheme Sales is, for example,

where d423 is, for instance, the id associated with the physical item at hand.

The actual instance associates with this entry the value 2 for the measure unit-sales, the value 19.98 for the measure euro-sales, and the value 14.98 for the measure euro-cost.

In Figure 4, data cubes are graphically represented as a table. This representation suggests how data cubes can be implemented using the relational model: a data cube over a scheme f can be represented by a relation over the attributes of f, with additional columns for the measures. The attributes of f form the key of the relation. In practice, a data cube having n attributes and m measures can also be represented by means of an n-dimensional array in which each (non-null) entry corresponds to an entry of f and is associated with an m-tuple of measures. This representation recalls the way in which multidimensional systems usually store data, thus confirming that the MD is a conceptual model which describes multidimensional data independently of any specific (logical) implementation.

It is apparent that the notation we have used for coordinates resembles subscripting into a multi-dimensional array (although in a non-positional way). However, there is an important difference between data cubes and multi-dimensional arrays. Specifically, in arrays, "physical" coordinates vary over intervals within (linearly ordered) domains of values, whereas domains over which coordinates range in the MD model are conceptual entities. In this sense, our notion of coordinate is "conceptual."

Roll-up functions are a distinctive feature of the model proposed: they describe intentionally how members of different levels are related. This description is independent of any effective implementation: roll-up functions can be implemented by means of materialized relations, built-in functions, or external procedures. Moreover, roll-up functions provide a powerful tool for querying multidimensional data, as they can be used to specify how data can be grouped, and how data cubes involving data at different levels of granularity can be joined (Cabibbo & Torlone, 1997, 1998b).

Basic Features of a Multidimensional Model

The MD data model presented in the previous section exhibits those fundamental features that any multidimensional model should include in some form in order to be suitable for OLAP applications. According to Pederson (2000) and Blaschka et al. (1998), these "mandatory" features can be summarized as follows.

Advanced Features of a Multidimensional Model

There are a number of further advisable features that a conceptual multidimensional model should support. We have classified these features as "advanced" because they model concepts that either: i) are difficult to represent in a simple way (such as the notion of "summarizability"), or ii) serve to capture specific application cases. Adopting once more a terminology inherited from Pedersen (2000) and Blaschka et al. (1998), these basic features can be summarized as follows.

Категории