Designing Relational Database Systems (Dv-Mps Designing)
Various extensions to relational algebra have been proposed since the relational model was first formulated. We'll look at three that have been generally accepted: summarize, extend, and rename. We'll also look at three extensions provided by Microsoft: transform, rollup, and cube.
Summarize
The summarize operator does precisely what one would expect it to do: it produces records containing summary data grouped according to the specified fields. It's an extremely useful operation in any number of situations in which you want to examine data at a higher level of abstraction than is stored in the database.
The summarize operation is implemented using the GROUP BY clause of the SELECT statement. There will be one record returned for each distinct value in the specified field or fields. If more than one field is listed, groups will be nested. For example, consider the following statement:
SELECT Categories.CategoryName, Products.ProductName, SUM([Order Details].Quantity) AS SumOfQuantity FROM (Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) INNER JOIN [Order Details] ON Products.ProductID = [Order Details].ProductID GROUP BY Categories.CategoryName, Products.ProductName; |
This statement will return one record for each product in the Northwind database, grouped by category and containing three fields: CategoryName, ProductName, and SumOfQuantity—the total number of each product sold—as shown in Figure 5-12.
Figure 5-12. The GROUP BY clause returns summary data.
The fields listed in <fieldList> in the SELECT statement must be either part of the <groupFieldList> or an argument to a SQL aggregate function. SQL aggregate functions calculate summary values for each record. The most common aggregate functions are AVERAGE, COUNT, SUM, MAXIMUM, and MINIMUM.
Aggregates are another place where nulls can bite you. Null values are included in the summarize operation—they form a group. They are, however, ignored by aggregate functions. This is usually only a problem if you're using one of the fields in the <groupFieldList> as the parameter to an aggregate function.
Extend
The extend operator allows you to define virtual fields that are calculated based on constants and values stored in the database but that are not physically stored themselves. You create virtual fields simply by defining the virtual field in the <fieldList> of the SELECT statement, as follows:
SELECT [UnitPrice]*[Qty] AS ExtendedPrice FROM [Order Details]; |
The calculations defining the virtual fields can be of arbitrary complexity. This process is so simple and fast, there is rarely any justification for storing a calculated field in a table.
Rename
The final common operator is rename. The rename operation can be performed on either a recordset in <recordsetList> or on individual fields in <fieldList>. In the Jet database engine, a recordset renaming uses the following syntax:
SELECT <fieldName> AS <fieldAlias> FROM <tableName> AS <tableAlias> |
In SQL Server, the "AS" keyword is not necessary, as shown below:
SELECT <fieldName> <fieldAlias> FROM <recordsetName> <recordsetAlias> |
Renaming is particularly useful when you're defining a view with a self-join, as shown in the following code:
SELECT Manager.Name, Employee.Name FROM Employees AS Employee INNER JOIN Employees AS Manager ON Employee.EmployeeID = Manager.EmployeeID; |
This syntax allows you to keep each usage logically distinct.
Transform
The TRANSFORM statement is the first of the Microsoft extensions to the relational algebra that we'll examine. TRANSFORM takes the results of a summarize (GROUP BY) operation and rotates them 90 degrees. More often referred to as a crosstab query, this incredibly useful operation is only supported by the Jet database engine; it has not (yet) been implemented in SQL Server.
The TRANSFORM statement has the following basic syntax:
TRANSFORM <aggregateFunction> SELECT <fieldList> FROM <recordsetList> GROUP BY <groupByList> PIVOT <columnHeading> [IN (<valueList>)] |
The TRANSFORM <aggregateFunction> clause defines the summary data that will populate the recordset. The SELECT statement must include a GROUP BY clause and cannot include a HAVING clause. As with any GROUP BY clause, the <groupByList> can contain multiple fields. (In a TRANSFORM statement, the <fieldList> and <groupByList> expressions are almost always identical.)
The PIVOT clause identifies the field whose values will be used as column headings. By default, the Jet database engine will include the columns in the recordset alphabetically from left to right. The optional IN statement, however, allows you to specify column names, which will be listed in the order in which they're included in <valueList>.
The TRANSFORM statement below provides essentially the same information as the summarize example given previously, the results of which are shown in Figure 5-12, above.
TRANSFORM Count(Products.ProductID) AS CountOfProductID SELECT Suppliers.CompanyName FROM Suppliers INNER JOIN (Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) ON Suppliers.SupplierID = Products.SupplierID GROUP BY Suppliers.CompanyName PIVOT Categories.CategoryName; |
The results of this TRANSFORM operation are shown in Figure 5-13.
Figure 5-13. The TRANSFORM statement rotates results by 90 degrees.
Rollup
The summarize operator implemented using the GROUP BY clause generates records containing summary data. The ROLLUP clause provides a logical extension to this operation by providing total values.
The ROLLUP clause is only available in SQL Server. It is implemented as an extension to the GROUP BY clause:
SELECT Categories.CategoryName, Products.ProductName, SUM([Order Details].Quantity) AS SumOfQuantity FROM (Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) INNER JOIN [Order Details] ON Products.ProductID = [Order Details].ProductID GROUP BY Categories.CategoryName, Products.ProductName WITH ROLLUP; |
This results in the recordset are shown in Figure 5-14.
This is again the same recordset shown in Figure 5-12, with additional rows: the rows containing Null (one is shown in the figure) contain the total values for the group or subgroup. Thus, 8,137 beverages were sold in total.
Figure 5-14. The ROLLUP operator adds totals.
Cube
The CUBE operator is also available only in SQL Server and is implemented as an extension to the GROUP BY clause. Essentially, the CUBE clause summarizes every column in the <groupByList> by every other column. It is conceptually similar to the ROLLUP operator, but whereas ROLLUP produces totals for each column specified in the <groupByList>, CUBE creates summary data for additional groups.
For example, if you have three fields in the <groupByList>—A, B, and C—the CUBE operator will return the following seven aggregates:
- The total number of Cs.
- The total number of Cs, grouped by A.
- The total number of Cs, grouped by C within A.
- The total number of Cs, grouped by B within A.
- The total number of Cs, grouped by B.
- The total number of Cs, grouped by A within B.
- The total number of Cs, grouped by C within B.