(ed.) Intelligent Agents for Data Mining and Information Retrieval

We have identified the selection cases and classified the DTD types in the sections above. The relationships between selection cases and DTD types can now be summarized as follows:

Theorem 2

For a given user query $q$, the database selection in a homogeneous DTD may be either a non-conflict selection case or a disjoint selection case.

 

Proof: In a homogeneous DTD, $\forall S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), $I_i = I_j$ and $W_i = W_j$. Two cases arise:

  1. Suppose $C_i \cap C_j \ne \emptyset$ and, for $c_k \in C_i \cap C_j$ ($1 \le k \le p$), $D_{ik} = D_{jk}$. Because the databases use the same indexing method and the same term weight scheme to evaluate their usefulness, $\mathrm{Simi}_{L_i}(D_{ik}, q) = \mathrm{Simi}_{L_j}(D_{jk}, q)$ holds. So the database selection in this homogeneous DTD is a non-conflict selection case (recall Definition 11).

  2. Suppose $C_i \cap C_j = \emptyset$. Then the database selection in this homogeneous DTD is a disjoint selection case (recall Definition 8).
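The two cases of this proof can be illustrated with a minimal Python sketch. It is not the book's method: the `similarity` function is a toy stand-in for Simi (a plain sum of query-term frequencies), and the names `homogeneous_case`, `db_a`, `db_b`, and `db_c` are hypothetical. Each database is modeled as a mapping from topic name to the terms of its documents on that topic, and both databases are scored with the same function, which is what the homogeneous setting assumes.

```python
from collections import Counter


def similarity(doc_terms, query_terms):
    """Toy stand-in for Simi (an assumption): summed frequency of the query terms."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def homogeneous_case(db_i, db_j, query_terms):
    """Classify a pair of databases that share one indexing/weighting scheme.

    db_i and db_j map a topic name to the term list of its documents.
    """
    shared = db_i.keys() & db_j.keys()  # C_i intersect C_j
    if not shared:
        return "disjoint"
    for topic in shared:
        # Premise of case 1: D_ik = D_jk on every shared topic.
        if db_i[topic] != db_j[topic]:
            raise ValueError("premise D_ik = D_jk does not hold")
    # One deterministic scorer applied to equal documents always agrees,
    # so the "conflict" branch below is never reached in this setting.
    scores_match = all(
        similarity(db_i[t], query_terms) == similarity(db_j[t], query_terms)
        for t in shared
    )
    return "non-conflict" if scores_match else "conflict"


ai_docs = ["agent", "mining", "agent", "retrieval"]
db_a = {"ai": ai_docs, "db": ["index", "query"]}
db_b = {"ai": list(ai_docs), "ir": ["ranking", "query"]}
db_c = {"bio": ["gene", "protein"]}

case_ab = homogeneous_case(db_a, db_b, ["agent", "retrieval"])  # shared topic "ai"
case_ac = homogeneous_case(db_a, db_c, ["agent", "retrieval"])  # no shared topic
```

Under these assumptions `case_ab` is a non-conflict case and `case_ac` is a disjoint case, matching the two branches of the proof.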

Theorem 3

Given a user query $q$, any potential selection case may exist in a partially homogeneous DTD, a partially heterogeneous DTD, or a heterogeneous DTD.

 

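Proof: In a partially homogeneous DTD, a partially heterogeneous DTD, or a heterogeneous DTD, among the databases $S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), either $\exists\, i, j$ ($1 \le i, j \le n$, $i \ne j$) such that $I_i \ne I_j$, or $\exists\, i, j$ ($1 \le i, j \le n$, $i \ne j$) such that $W_i \ne W_j$. Two cases arise: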

  1. Suppose $C_i \cap C_j \ne \emptyset$ and, for $c_k \in C_i \cap C_j$ ($1 \le k \le p$), $D_{ik} = D_{jk}$. Because the databases employ different indexing methods or different term weight schemes, $\mathrm{Simi}_{L_i}(D_{ik}, q) = \mathrm{Simi}_{L_j}(D_{jk}, q)$ does not always hold. So the selection case in these three DTDs is either a conflict selection case or a non-conflict selection case.

  2. Suppose $C_i \cap C_j = \emptyset$. Then the database selection in these three DTDs is a disjoint selection case.

By combining the above two cases, we conclude that any potential selection case may exist in all the DTD types except the homogeneous DTD.
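As a rough illustration of why a conflict case becomes possible once weight schemes differ, the following Python sketch scores identical documents with two different toy schemes. Everything here is an assumption made for illustration: `tf_weight` and `binary_weight` are simplified stand-ins for term weight schemes, `selection_case` is a hypothetical helper, and only the three selection cases named in this section are distinguished.

```python
from collections import Counter


def tf_weight(doc_terms, query_terms):
    """Toy scheme 1 (assumed): summed raw frequency of the query terms."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def binary_weight(doc_terms, query_terms):
    """Toy scheme 2 (assumed): number of query terms that occur at all."""
    present = set(doc_terms)
    return sum(1 for t in query_terms if t in present)


def selection_case(db_i, db_j, score_i, score_j, query_terms):
    """Classify a database pair; each database may use its own scorer."""
    shared = db_i.keys() & db_j.keys()  # C_i intersect C_j
    if not shared:
        return "disjoint"
    for topic in shared:
        # Different schemes may rate the same documents differently.
        if score_i(db_i[topic], query_terms) != score_j(db_j[topic], query_terms):
            return "conflict"
    return "non-conflict"


docs = ["agent", "agent", "mining"]  # D_ik = D_jk on the shared topic
db_a = {"ai": docs}
db_b = {"ai": list(docs)}
db_c = {"bio": ["gene", "protein"]}

# "agent" occurs twice: tf gives 2, binary gives 1, so the scores disagree.
case_conflict = selection_case(db_a, db_b, tf_weight, binary_weight, ["agent"])
# "mining" occurs once: both schemes give 1, so the scores happen to agree.
case_agree = selection_case(db_a, db_b, tf_weight, binary_weight, ["mining"])
# No shared topic at all.
case_disjoint = selection_case(db_a, db_c, tf_weight, binary_weight, ["agent"])
```

The same pair of databases yields a conflict case for one query and a non-conflict case for another, which is the sense in which the similarity equality "does not always hold" when the schemes differ.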
