Database in Depth: Relational Theory for Practitioners
Exercise 5-1.
From a relational perspective, what's wrong (if anything) with the following SQL expressions? a. SELECT * FROM S, SP b. SELECT S.SNO, S.CITY FROM S c. SELECT SP.SNO, SP.PNO, 2 * SP.QTY FROM SP d. SELECT P.PNO FROM P e. SELECT S.SNO FROM S, SP f. SELECT S.SNO FROM S ORDER BY S.CITY DESC
Exercise 5-2.
Closure is important in the relational model for the same kind of reason that numeric closure is important in ordinary arithmetic. In arithmetic, however, there's one situation where the closure property breaks down, in a sense namely, division by zero. Is there any analogous situation in the relational algebra? Exercise 5-3.
Given the usual suppliers-and-parts database, what's the value of the expression JOIN{S,SP,P}? What's the corresponding predicate? Warning: There's a trap here; in fact, some might even argue that the trap is such that it makes join (as I've defined it in the body of this chapter) a rather dangerous operation. What do you think of that argument? Exercise 5-4.
Why do you think the project operator is so called? Exercise 5-5.
Given our usual sample values for the suppliers-and-parts database, what values do the following expressions denote? In each case, give an informal interpretation of the expression in natural language. a. S SEMIJOIN ( SP WHERE PNO = PNO('P2') ) b. P NOT MATCHING ( SP WHERE SNO = SNO('S2') ) c. S { CITY } MINUS P { CITY } d. ( S { SNO, CITY } JOIN P { PNO, CITY } ) { ALL BUT CITY } e. JOIN { ( S RENAME ( CITY AS SC ) ) { SC } , ( P RENAME ( CITY AS PC ) ) { PC } } Exercise 5-6.
Union, intersection, product, and join are all both commutative and associative. Verify these claims. Also verify that semijoin is associative but not commutative. Exercise 5-7.
In what circumstances (if any) are r SEMIJOIN s and s SEMIJOIN r equivalent? Exercise 5-8.
Which of the algebraic operators (if any) have a definition that doesn't rely on tuple equality? Exercise 5-9.
The SQL FROM clause FROM t1, t2, . . . , tn (where each ti is an expression denoting a table) returns the cartesian product of its arguments. But what if n = 1? What's the cartesian product of just one table? And by the way, what's the product of t1 and t2 if t1 and t2 both contain duplicate rows? Exercise 5-10.
Show that rename isn't primitive. Exercise 5-11.
Distinguish between a "summary" and an aggregate operator. What do the SQL analogs of these constructs look like? Exercise 5-12.
Give an expression involving EXTEND instead of SUMMARIZE that's logically equivalent to the following: SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( ) AS NP ) Exercise 5-13.
Given our usual sample values for the suppliers-and-parts database, what values do the following expressions denote? In each case, give an informal interpretation of the expression in natural language. a. EXTEND S ADD ( 'Supplier' AS TAG ) b. EXTEND ( P JOIN SP ) ADD ( WEIGHT * QTY AS SHIPWT ) c. EXTEND P ADD ( WEIGHT * 454 AS GMWT, WEIGHT * 16 AS OZWT ) d. EXTEND S ADD ( COUNT ( ( SP RENAME ( SNO AS X ) ) WHERE X = SNO ) AS NP ) e. SUMMARIZE S BY { CITY } ADD ( AVG ( STATUS ) AS AVG_STATUS )
Exercise 5-14.
Which of the following expressions are equivalent? a. SUMMARIZE r PER ( r { } ) ADD ( COUNT ( ) AS CT ) b. SUMMARIZE r ADD ( COUNT ( ) AS CT ) c. SUMMARIZE r BY { } ADD ( COUNT ( ) AS CT ) d. EXTEND TABLE_DEE ADD ( COUNT ( r ) AS CT )
Exercise 5-15.
Simplifying just slightly, an aggregate operator invocation in Tutorial D takes the form: <agg op name> ( <relation exp> [, <attribute name> ] )
If the <agg op name> is COUNT, the <attribute name> is irrelevant and must be omitted; otherwise, it can be omitted if and only if the <relation exp> denotes a relation of degree one, in which case the sole attribute of the result of that <relation exp> is assumed by default. Here are a couple of examples: SUM ( SP WHERE SNO = SNO('S1'), QTY ) SUM ( ( SP WHERE SNO = SNO('S1') ) { QTY } )
Note the difference between these two expressions: the first gives the total of all shipment quantities for supplier S1, and the second gives the total of all distinct shipment quantities for supplier S1. How does the scheme just outlined differ from its SQL counterpart? Exercise 5-16.
In Tutorial D, if the argument to an aggregate operator invocation happens to be an empty set, COUNT returns zero and so does SUM; MAX and MIN return, respectively, the lowest and the highest value of the applicable type; AND and OR return TRUE and FALSE, respectively; and AVG raises an exception (I ignore Tutorial D's other aggregate operators here deliberately). What does SQL do in these circumstances and why? Exercise 5-17.
Let relation R4 from Figure 5-1 denote the current value of some relvar R. If R4 has the meaning described in Chapter 2, give the predicate for that relvar R. Exercise 5-18.
Let r be the relation denoted by the following expression: SP GROUP ( { } AS X ) What does r look like, given our usual sample value for SP? Also, what does the following expression yield? r UNGROUP ( X ) Exercise 5-19.
Without using the IS_EMPTY shorthand, write a Tutorial D expression that returns TRUE if the current value of the parts relvar P is empty and FALSE otherwise. Also show an SQL analog of that expression. Exercise 5-20.
Write Tutorial D and/or SQL expressions for the following queries on the suppliers-and-parts database:
Exercise 5-21.
Prove the following statements (making them more precise, where necessary):
Exercise 5-22.
Union is said to be idempotent, because r UNION r is identically equal to r for all r. (Is this true in SQL?) As you might expect, idempotence can be useful in expression transformation. Which other relational operators (if any) are idempotent? Exercise 5-23.
Let r be a relation. What does the expression r{ } mean? What does it return? Exercise 5-24.
The following boolean expression: x > y AND y > 3 (which might be part of a query) is clearly equivalent to and can therefore be transformed into the following: x > y AND y > 3 AND x > 3
The equivalence is based on the fact that the comparison operator ">" is transitive. The transformation is worth making if x and y are from different relations, because it enables the system to perform an additional restriction (using x > 3) before doing the greater-than join implied by x > y. As we saw in the body of the chapter, doing restrictions early is generally a good idea; having the system infer additional "early" restrictions, as here, is also a good idea. Do you know of any commercial products that actually perform this kind of optimization? Exercise 5-25.
Consider the following expressions: a. WITH ( P WHERE COLOR = 'Purple' ) AS PP , ( SP RENAME ( SNO AS X ) ) AS T : S WHERE ( T WHERE X = SNO ) { PNO }
These are both attempts at formulating the query "Get suppliers who supply every purple part." Given our usual sample data values, show the result returned in each case. Which result (and which formulation), if either, do you regard as correct? Justify your answer. Exercise 5-26.
Consider the restriction r WHERE bx. In the body of the chapter, I said that every attribute mentioned in the restriction condition bx must be an attribute of r, but I also mentioned that bx was subject to certain further limitations. One of those limitations[*] is that, strictly speaking, bx is supposed to consist of just a single term of the form x Op y,where each of x and y is either an attribute of r or a literal and Op is a comparison operator that makes sense for x and y. But a real language will allow WHERE clauses to contain boolean expressions of arbitrary complexity, just as long as the only "free variables" (see Appendix A) in that expression are attributes of r. Show that it's legitimate to extend the definition of restrict in such ways that is, show that such extensions are really just shorthand for something we already know is legitimate. [*] Another is that bx mustn't involve any quantifiers, nor must it be equivalent to an expression that does (see Appendix A). Exercise 5-27.
Here are two simple expressions involving relation R4 from Figure 5-1 in the body of the chapter. What queries do they represent? ( R4 WHERE TUPLE { PNO PNO('P2') } Exercise 5-28.
Given our usual sample values for the suppliers-and-parts database, what does the following expression denote? EXTEND S ADD ( ( ( SP RENAME ( SNO AS X ) ) WHERE X = SNO ) { PNO } AS PNO_REL )
Exercise 5-29.
Let the relation returned by the expression in the previous exercise be kept as a relvar called SSP. What do the following updates do? INSERT SSP RELATION { TUPLE { SNO SNO('S6'), PNO_REL RELATION { TUPLE { PNO PNO('P5') } } } } ; UPDATE SSP WHERE SNO = SNO('S2') ( INSERT PNO_REL RELATION { TUPLE { PNO PNO('P5') } } ) ; Exercise 5-30.
Using relvar SSP from the previous exercise, write expressions for the following queries:
Exercise 5-31.
A quota query is a query that specifies a desired limit, or quota, on the cardinality of the result: for example, the query "Get the three heaviest parts," for which the specified quota is three. Give Tutorial D and SQL formulations of this query. |