Multidimensional Databases: Problems and Solutions

In this section we set up a formal framework for source integration in data warehousing. Our main goal is to define the notion of a source integration system, which represents the component of a data warehouse system that integrates the information sources feeding the warehouse. We characterize a source integration system by three elements: the global schema, the sources, and the mapping between the two. We then provide the semantics of both the system and query answering.

The formal definition of a source integration system is given below.

Definition 1: A source integration system I is a triple ⟨G, S, M⟩, where G is the global schema, S is the source schema, and M is the mapping between G and S.
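As a purely illustrative reading of Definition 1, the triple can be held in a small record. The class name, the encoding of a schema as a map from relation names to arities, the relation names, and the opaque mapping field are all assumptions made for this sketch; the definition itself does not prescribe any representation.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Assumption: a schema is encoded as relation name -> arity.
Schema = Dict[str, int]


@dataclass
class SourceIntegrationSystem:
    """Hypothetical container for the triple <G, S, M> of Definition 1."""
    global_schema: Schema  # G
    source_schema: Schema  # S
    mapping: Any           # M, left opaque: its form depends on the mapping type


# Hypothetical example: one global relation fed by two source relations.
system = SourceIntegrationSystem(
    global_schema={"customer": 2},
    source_schema={"crm_client": 2, "billing_account": 2},
    mapping={"customer": ["crm_client", "billing_account"]},
)
```

Keeping the mapping field untyped is deliberate here, since the shape of M differs between mapping types.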

The following comments on the above formal definition are in order.

Let us turn our attention to the semantics of a source integration system I = ⟨G, S, M⟩. We assume that the databases involved in our framework (both global databases and source databases) are defined over a fixed (infinite) alphabet Γ of symbols. In order to assign semantics to I, we start by considering a source database for I, i.e., a database D for the source schema S. Based on D, we can specify the information content of the global schema G at the extensional level. We call global database for I any database for G.

Definition 2: Let I = ⟨G, S, M⟩ be a source integration system, and D a source database for I. A global database B for I is said to be legal for I with respect to D, if:

The notion of a global database B satisfying the mapping M with respect to a source database D depends on the type of the mapping considered, GAV (global-as-view) or LAV (local-as-view).
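The text does not spell out here what satisfying the mapping means. As an illustration only, the sketch below assumes a GAV mapping read as sound views: each global relation g is associated with a view over the sources, and a global database B satisfies the mapping with respect to a source database D if the result of that view over D is contained in the extension of g in B. The function name satisfies_gav_sound, the relation names, and the dictionary encoding of databases are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Assumption: a database maps each relation name to a set of tuples.
Database = Dict[str, Set[Tuple[str, ...]]]
# Assumption: a view is a function from a source database to a set of tuples.
View = Callable[[Database], Set[Tuple[str, ...]]]


def satisfies_gav_sound(mapping: Dict[str, View],
                        source_db: Database,
                        global_db: Database) -> bool:
    """Check view(D) ⊆ g^B for every global relation g in the mapping."""
    return all(
        view(source_db) <= global_db.get(g, set())
        for g, view in mapping.items()
    )


# Hypothetical mapping: customer is the union of two source relations.
mapping = {"customer": lambda d: d["crm_client"] | d["billing_account"]}

D = {
    "crm_client": {("ann", "Rome")},
    "billing_account": {("bob", "Milan")},
}

# B1 contains everything the sources provide, plus an extra tuple.
B1 = {"customer": {("ann", "Rome"), ("bob", "Milan"), ("eve", "Turin")}}
# B2 misses a tuple that the sources provide.
B2 = {"customer": {("ann", "Rome")}}

ok_b1 = satisfies_gav_sound(mapping, D, B1)  # True
ok_b2 = satisfies_gav_sound(mapping, D, B2)  # False
```

Under this reading B1 is admitted even though it holds a tuple that no source provides, which is one reason several global databases can be compatible with the same source database.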

Queries posed to a source integration system I are expressed in terms of a query language over the alphabet of G, i.e., over the global schema. In the following, if DB is a database, and q is a query, then q^DB denotes the result of evaluating q over DB.
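To make the notation q^DB concrete, the following minimal sketch models a query as a function from a database to a set of tuples. The relation name customer, the sample tuples, and the helper evaluate are hypothetical and serve only to illustrate evaluation over a single database.

```python
from typing import Callable, Dict, Set, Tuple

# Assumption: a database maps each relation name to a set of tuples.
Database = Dict[str, Set[Tuple[str, ...]]]
Query = Callable[[Database], Set[Tuple[str, ...]]]


def evaluate(q: Query, db: Database) -> Set[Tuple[str, ...]]:
    """Return q^DB, the result of evaluating q over the database db."""
    return q(db)


def customers_in_rome(db: Database) -> Set[Tuple[str, ...]]:
    """Unary query over the global schema: names of customers located in Rome."""
    return {(name,) for (name, city) in db.get("customer", set()) if city == "Rome"}


db = {"customer": {("ann", "Rome"), ("bob", "Milan")}}
result = evaluate(customers_in_rome, db)  # {("ann",)}
```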

Definition 3: Let I = ⟨G, S, M⟩ be a source integration system, D a source database for I, and q a query of arity n to I. The answer q^{I,D} to q with respect to D is the set of tuples (c1, ..., cn) ∈ Γ^n such that (c1, ..., cn) ∈ q^B for each global database B legal for I with respect to D.

Since, in general, several global databases exist that are legal for I with respect to D, in the terminology of source integration, the answer q^{I,D} is often called the set of certain answers of q with respect to D.
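Read operationally, the certain answers are the tuples common to the results of q over every legal global database. The sketch below assumes that the legal global databases are given as a finite, explicitly enumerated collection, which is a simplification: in general there may be infinitely many of them, and practical query answering does not enumerate them. The names certain_answers and customers_in_rome and the sample data are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Assumption: a database maps each relation name to a set of tuples.
Database = Dict[str, Set[Tuple[str, ...]]]
Query = Callable[[Database], Set[Tuple[str, ...]]]


def certain_answers(q: Query,
                    legal_global_dbs: Iterable[Database]) -> Set[Tuple[str, ...]]:
    """Intersect q^B over the given legal global databases B."""
    results = [q(b) for b in legal_global_dbs]
    if not results:
        # With no legal database, every tuple is vacuously certain;
        # that set cannot be materialized, so the sketch refuses the input.
        raise ValueError("at least one legal global database is required")
    return set.intersection(*results)


def customers_in_rome(db: Database) -> Set[Tuple[str, ...]]:
    """Unary query: names of customers located in Rome."""
    return {(name,) for (name, city) in db.get("customer", set()) if city == "Rome"}


# Two hypothetical legal global databases that disagree about bob.
B1 = {"customer": {("ann", "Rome"), ("bob", "Rome")}}
B2 = {"customer": {("ann", "Rome"), ("bob", "Milan")}}

answers = certain_answers(customers_in_rome, [B1, B2])  # {("ann",)}
```

Only ann is returned, because bob is located in Rome in B1 but not in B2, so that answer is not certain.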

As we said in the introduction, the main activities that are carried out in the design of a source integration system are: schema integration, data integration, and data cleaning. To relate these activities to the formalization presented in this section, we observe that:
