Designing Relational Database Systems (Dv-Mps Designing)

The next four operators of relational algebra are based on traditional set theory. They have, however, been modified slightly to account for the fact that we're dealing with relations, not undifferentiated sets.

Union

Conceptually, a relational union is the concatenation of two recordsets. It's more or less the relational version of addition. The result of the union of recordset A with recordset B is the same as actually adding all the records in A to B.

As an example, say you need a list of all the names and addresses known to the database for a mass mailing. The Northwind database Customers and Employees recordsets both have addresses and so can easily be combined in a union operation. In this case, we'd use the UNION statement, as follows:

SELECT CompanyName AS Name, Address, City, PostalCode FROM Customers UNION SELECT [FirstName] & " " & [LastName] AS Name, Address, City, PostalCode FROM Employees ORDER BY name;

Note that the CompanyName field is renamed "Name" and the FirstName and LastName fields from the Employees table are concatenated. The resulting field is also "Name." The union query doesn't require that the fields in the <fieldList> of each SELECT statement have the same name, but there must be the same number of them and they must have the same (or compatible) types. The results of this statement in Access are shown in Figure 5-8.

Figure 5-8. The UNION statement combines the records from each table.

Intersection

The intersection operator returns the records that two recordsets have in common. It is, in essence, a "find the duplicates" operation, and that's the way it's most often used. An intersection is implemented using outer joins.

As an example, suppose that you have inherited client lists from several legacy systems, as shown in Figure 5-9.

Figure 5-9. Legacy tables often have duplicate data.

The following SELECT statement will return the duplicate records:

SELECT DuplicateCustomers1.* FROM DuplicateCustomers1 LEFT JOIN DuplicateCustomers2 ON (DuplicateCustomers1.CustomerID = DuplicateCustomers2.CustomerID) AND (DuplicateCustomers1.CompanyName = DuplicateCustomers2.CompanyName) WHERE (((DuplicateCustomers2.CustomerID) IS NOT NULL));

The results of this statement are shown in Figure 5-10.

Figure 5-10. An outer join combined with the IS NOT NULL operator performs an intersection.

Difference

While the intersection of two recordsets is used to "find the duplicates," the difference operator will "find the orphans." The relational difference of two recordsets is the records that belong to one recordset but not the other.

As an example, given the same two recordsets shown in Figure 5-9, the SELECT statement below will return the unmatched records:

SELECT DuplicateCustomers1.* FROM DuplicateCustomers1 LEFT JOIN DuplicateCustomers2 ON (DuplicateCustomers1.CustomerID = DuplicateCustomers2.CustomerID) AND (DuplicateCustomers1.CompanyName = DuplicateCustomers2.CompanyName) WHERE (DuplicateCustomers2.CustomerID IS NULL);

The outer join operation in this statement returns all the records from the two lists. As you will recall, an outer join supplies Null for the fields that do not have a match in the other table. The WHERE clause uses the IS NULL operator to restrict the records returned to only those (unmatched) records.

If this all seems as clear as mud, try performing the operation in two discrete steps: first create the outer join as a view, and then restrict the view with the WHERE statement. This process is shown in Figure 5-11.

Figure 5-11. The difference operation can be performed in two steps.

Cartesian Product

The final set operator is the Cartesian product. Like its counterpart in traditional set theory, the Cartesian product of two recordsets combines every record in one set with every record in the other.

The Cartesian product (or just "product") of two recordsets is returned by a SELECT statement with no JOIN clause. The statement below will return every customer combined with every customer service representative:

SELECT CustomerName, CSRName FROM Customer, CSRs;

Cartesian products are occasionally useful either for analysis purposes or as interim results for further manipulation. Most often, though, they're produced by accident. Forget to drag the join line in the Access query designer and bingo, you've got a Cartesian product. It's amazingly easy to do, so don't be embarrassed the first (dozen) times it happens to you.

Категории