Multidimensional Databases: Problems and Solutions

Less common is incompleteness in the metadata. The metadata in a multidimensional database are the dimensions, categories, and units. Often, the metadata can be stored as a set of mapping tables, which encodes the relationship between a finer and a coarser category. The set of mapping tables for a dimension specifies the hierarchy for that dimension. A mapping table has two columns, labeled Fine and Coarse as shown in Figure 7. The mapping in the table is from individual kinds of Items sold in a store to Food Groups. In the mapping table, each unit in the Fine category is associated with some unit in the Coarse category. As already discussed, in order to be summarizable, all such relationships must be totaled on both sides, and many-to-one from the finer to the coarser category (Lenz & Shoshani, 1997). The example table shown in Figure 7 has several problems (Pedersen et al., 1999, 2000).

Figure 7: A Mapping Table from Items to Food Groups

The problem of incomplete information in metadata is that it makes the hierarchy non-summarizable. Proper management of the incomplete information can restore the hierarchy. Each case is considered below in detail.

Non-covering hierarchies: Figure 8(a) shows an example of a non-covering hierarchy. The problem is subtle. There is no Food Group unit to which "Charcoal" belongs; but, skipping a level, "Charcoal" does belong to the "Picnic Supplies" unit in the Store Aisles category. The non-covering hierarchy is a problem in a semi-eager multidimensional model that materializes Food Groups. Even though Food Groups is a finer category than Store Aisles, the materialized values for Food Groups cannot be used to compute Store Aisles since "Picnic Supplies" will be undercounted. The solution is to complete the non-covering mapping by adding a hidden unit as shown in Figure 8(b). The unit "Hidden PS" is added to the Food Groups category. The unit is only seen by the system, that is, it is hidden from a user who queries for Food Group measures. At most one hidden unit is added for each non-covering violation.

Figure 8: Repairing a Non-Covering Mapping

Non-onto hierarchies: Techniques for repairing non-onto hierarchies (where the coarse category does not totally participate) are similar to those for completing non-covering hierarchies. Here, the hierarchy is padded with finer, hidden values below the existing values, e.g., a hidden "Grass" value is inserted into the Item level and mapped to "Grass" in the Food Groups level to make the hierarchy onto.

Non-strict hierarchies: A non-strict hierarchy is a many-to-many mapping between two categories. In general, non-strict mappings can be handled by introducing new units in the hierarchy. The example presented above had two separate cases of non-strict mappings. The first case is that tomatoes are considered to be both a fruit and a vegetable. This could lead to overcounting, as shown in Figure 9(a). "Tomato" contributes to both "Fruit" and "Vegetable," and so is counted twice at "Produce." The second case is that a "Monstero" could be considered a "Fruit" or a "Vegetable." This could lead to overcounting (both mappings are possible) or undercounting (neither mapping is definite). The technique to repair the non-strictness in both cases is the same.

Figure 9: Repairing a Non-Strict Hierarchy

To repair a non-strict hierarchy, two goals must be simultaneously attained. First, the counting problems must be fixed. In Figure 9(a) the counting problem manifests itself as an overcounting. The count at the "Produce" unit should be two rather than three since the produce is one "Tomato" and one "Apple." The second goal to achieve is the correct display of values at a coarse category. When data at a coarse category is viewed, it must include the multiple mapped unit(s) at the finer category. In the example, if the Food Group units are viewed then the count for "Vegetable" should be one, and for "Fruit," it should be two since a "Tomato" is both a fruit and a vegetable.

To solve the counting problem, a new (hidden) unit is added for each non-strict mapping. In Figure 9(b) the overcounting has been corrected by adding a "Vegetable and Fruit" unit. "Tomato" maps to the new unit. The solution to the second problem is to split each unit into a visible unit and a hidden unit. The hidden units are used to aggregate higher up the hierarchy. In Figure 9(b) the hidden units are shaded in gray. Note that "Tomato" does not map to the hidden "Vegetable" or "Fruit" units. The visible units are used for display purposes only. They are the original non-strict mappings. The visible units are shaded white in the Food Groups category of Figure 9(b). The relationship between the visible units in Food Groups and the units in Items is the same in both Figure 9(a) and Figure 9(b).

Категории