Database in Depth: Relational Theory for Practitioners

Now let's get back to an examination of basic relational concepts. In this section I want to focus on some specific properties of relations themselves. First of all, every relation has a heading and a body: the heading is a set of attributes (where an attribute is an attribute-name:type-name pair), and the body is a set of tuples that conform to that heading. In the case of the suppliers relation of Figure 1-3, for example, there are four attributes in the heading and five tuples in the body. Note, therefore, that a relation doesn't really contain tuples it contains a body, and that body in turn contains the tuples but we do usually talk as if relations contained tuples directly, for the sake of simplicity.

By the way, although it's strictly correct to say that the heading consists of attribute-name:type-name pairs, it's usual to omit the type names in pictures like Figure 1-3 and thereby pretend that the heading is a set of attribute names only. For example, the STATUS attribute does have a type (INTEGER, let's say), but I didn't show it in Figure 1-3. But you should never forget it's there!

Next, the number of attributes in the heading is the degree (sometimes the arity), and the number of tuples in the body is the cardinality. For example, relations S, P, and SP in Figure 1-3 have degree 4, 5, and 3, respectively, and cardinality 5, 6, and 12, respectively.

Next, relations never contain duplicate tuples. This property follows because a body is a set of tuples, and sets in mathematics do not contain duplicate elements. By the way, SQL fails here: SQL tables are allowed to contain duplicate rows, as I'm sure you know, and SQL tables are thus not relations, in general. Please understand, therefore, that in this book I always use the term "relation" to mean a relation without duplicate tuples, by definition and not an SQL table. Please understand also that relational operations always produce a result without duplicate tuples, again by definition. For example, projecting the suppliers relation of Figure 1-3 on CITY produces the result shown on the left and not the one on the right:

(The result on the left can be obtained via the SQL query SELECT DISTINCT S.CITY FROM S. Omitting DISTINCT leads to the nonrelational result on the right. Note in particular that the table on the right has no double underlining; that's because it has no key, and hence a fortiori no primary key.)

Next, the tuples of a relation are unordered, top to bottom. This property follows because, again, a body is a set, and sets in mathematics have no ordering to their elements. (Thus, for example, {a,b,c} and {c,a,b} are the same set in mathematics, and the same kind of thing is naturally true in the relational model.) Of course, when we draw a relation as a table on paper, we do have to show the rows in some top-to-bottom order, but that ordering doesn't correspond to anything relational. In the case of the suppliers relation of Figure 1-3, for example, I could have shown the rows in any order say, supplier S3, then S1, then S5, then S4, then S2 and the picture would still represent the same relation.

NOTE

The fact that the tuples of a relation are unordered doesn't mean queries can't include an ORDER BY specification, but it does mean that such queries produce a result that's not a relation. ORDER BY is useful for the purpose of displaying results, but it isn't a relational operator as such.

In similar fashion, the attributes of a relation are also unordered, left to right, because a heading too is a mathematical set. Again, when we draw a relation as a table on paper, we have to show the columns in some left-to-right order, but that ordering doesn't correspond to anything relational. In the case of the suppliers relation of Figure 1-3, for example, I could have shown the columns in any left-to-right order say STATUS, SNAME, CITY, SNO and the picture would still represent the same relation in the relational model. Incidentally, SQL fails here too: SQL tables do have a left-to-right ordering to their columns (another reason why SQL tables aren't relations, in general). For example, the pictures below represent the same relation, but two different SQL tables:

(The corresponding SQL queries are, respectively, SELECT S.SNO, S.CITY FROM S and SELECT S.CITY, S.SNO FROM S. By the way, you might be thinking that the difference between these two tables is hardly very significant; in fact, however, it has some serious consequences, some of which I'll be touching on in later chapters.)

Next, relations are always normalized (equivalently, they're in first normal form, 1NF). Informally, this means that, in terms of the tabular picture of a relation, at every row-and-column intersection we always see just a single value. More formally, it means that every tuple in every relation contains just a single value, of the appropriate type, in every attribute position. I'll have quite a lot more to say on this particular issue in the next chapter.

Next, we draw a distinction between base and derived relations. As I explained earlier, the operators of the relational algebra allow us to start with some given relations perhaps those of Figure 1-3 and obtain further relations from those given ones. The given relations are the base ones; the others are derived. Now, a relational system obviously has to provide a means for defining the base relations in the first place. In SQL, this task is performed by the CREATE TABLE statement (the SQL counterpart to a base relation being, of course, a base table). And base relations obviously have to be named for example:

CREATE TABLE SP ... ;

But certain derived relations including in particular what are called views are also named. A view (also known as a virtual relation) is a named relation whose value at all times is the result of evaluating a certain relational expression at the time in question. Here's an SQL example:

CREATE VIEW SST_PARIS AS SELECT S.SNO, S.STATUS FROM S WHERE S.CITY = 'Paris' ;

You can operate on views as if they were base relations, but they aren't base relations; instead, you can think of views as being materialized in effect, you can think of a base relation as being dynamically built at the time they're referenced. (Though I should emphasize that thinking of views being materialized when they're referenced is only a way of thinking; it's not what's really supposed to happen. How views really work is explained in Chapter 4.)

There's an important point I need to make here. You'll often see people describe the difference between base relations and views like this:

  • Base relations really exist that is, they're physically stored in the database.

  • Views, by contrast, don't "really exist" they merely provide different ways of looking at the base relations.

But the relational model has nothing to say about what's physically stored! In particular, it does not say that base relations are physically stored. The only requirement is that there must be some mapping between whatever is physically stored and those base relations, so that those base relations can somehow be constructed when they're needed (conceptually speaking, at any rate). If the base relations can be constructed in this way, then so can everything else. For example, we might physically store the join of suppliers and shipments, instead of storing them separately; base relations S and SP then could be constructed, conceptually, by taking appropriate projections of that join. In other words, base relations are no more (and no less!) "physical" than views are, so far as the relational model is concerned.

The fact that the relational model says nothing about physical storage is deliberate, of course. The idea was to give implementers the freedom to implement the model in whatever way they chose in particular, in whatever way seemed likely to yield good performance without compromising on data independence. The sad fact is, however, that SQL vendors seem mostly not to have understood this point; instead, they map base tables fairly directly to physical storage,[*] and (as noted in the previous section) their products therefore provide far less data independence than relational systems are theoretically capable of. Indeed, this state of affairs is reflected in the SQL standard itself as well as in most other SQL documentation which typically (fairly ubiquitously, in fact) uses expressions such as "tables and views." Clearly, anyone who uses such an expression is under the impression that tables and views are different things, and probably under the impression too that "tables" are physical and views aren't. But the whole point about a view is that it is a table (or, as I would prefer to say, a relation); that is, we can perform the same kinds of operations on views as we can on regular relations (at least in the relational model), because views are "regular relations"! Throughout this book, therefore, I'll reserve the term relation to mean a relation (possibly a base relation, possibly a view, possibly a query result, and so on); if I mean (for example) a base relation specifically, then I'll say "base relation." And I suggest very strongly that you adopt the same discipline for yourself. Don't fall into the common trap of thinking that the term relation means a base relation specifically.

[*] I say this knowing full well that today's SQL products provide a variety of options for hashing, partitioning, indexing, clustering, and otherwise organizing the data as stored on the disk. I still consider the mapping to physical storage in those products to be "fairly direct."

Категории