Hack 54. Denormalize Your Tables

Despite warnings to the contrary that you will hear throughout your career, sometimes you need to denormalize data.

If you needed to create a permanent log or audit trail of something in a table, you might need to copy data from one table to another, or to a new table. Although this leads to data redundancy, it makes the log immune to changes which may occur in other tables at a later date. This might be essential in a situation where you want an audit trail. In a logging situation, you only add records and you never update them. You can use a simple INSERT/SELECT combination to append records to an existing table. Each vendor has a neat variation on the SELECT or CREATE statement that allows you to create a table on the fly as well.

Imagine you are charged with recording all shipments from a storehouse. As each shipment goes out you must record product details, such as price, at the time of shipment. Similarly, you must record the address of the recipients as it was at the moment of shipment.

You should never really need to duplicate data within a self-contained database, because copying creates data redundancy, which can lead to inconsistent states if the data is updated in only one place. However, when you do need redundancy, you need the INSERT/SELECT combination. You use the INSERT statement, and a SELECT in place of the more commonly used VALUES construct:

INSERT INTO warehouse SELECT p.id, c.name, c.addr, p.price, d.quant FROM shipment s JOIN shipment_detail d ON (d.shipment=s.id) JOIN product p ON (d.product =p.id) JOIN customer c ON (s.customer=c.id)

The line shown is good for adding rows to a table that already exists. That's fine for audit logs, but sometimes for auditing you want a table snapshot, and this needs a table to hold the snapshot before the INSERT statements can be executed. Even then you can save yourself the effort of creating the table separately (which is convenient because you don't need to find out the type of each column for the CREATE TABLE command).

MySQL allows you to CREATE a TABLE from a SELECT query:

-- MySQL CREATE TABLE warehouse SELECT p.id, c.name, c.addr, p.price, d.quant FROM shipment s JOIN shipment_detail d ON (d.shipment=s.id) JOIN product p ON (d.product =p.id) JOIN customer c ON (s.customer=c.id);

Oracle does the same thing, but it needs the keyword AS before the SELECT:

-- Oracle CREATE TABLE warehouse AS SELECT p.id, c.name, c.addr, p.price, d.quant FROM shipment s JOIN shipment_detail d ON (d.shipment=s.id) JOIN product p ON (d.product =p.id) JOIN customer c ON (s.customer=c.id);

SQL Server and Access allow you to write a SELECT query that creates a table. The phrase INTO is required to make it create a table:

SELECT p.id, c.name, c.addr, p.price, d.quant INTO warehouse FROM shipment s JOIN shipment_detail d ON (d.shipment=s.id) JOIN product p ON (d.product =p.id) JOIN customer c ON (s.customer=c.id);

Stick to SQL92

This inconsistency in syntax is due to the history of SQL. SQL92 was a pretty thorough standard and has been widely implemented. However, a handful of popular facilities were missing from the SQL92 standard: functionality for copying tables was one issue, and auto-number was another [Hack #57].

The team that defined SQL92 didn't leave this kind of thing out by accident, and they certainly didn't leave it out because it hadn't been invented. They left it out because they considered it bad practice to be avoided. Think twice before you take advantage of features that were left out of the 92 standard. They are missing for a reason.

Both table copying and auto-numbering were addressed in later versions of the standard, but by this time each vendor had committed itself to its own particular syntax. Ironically, MySQL is emerging as one of the most compliant engines. As a relative latecomer, MySQL's first releases were so far behind the standard that it was hard to take those releases seriously, but with every new release MySQL gets closer to the standard. The MySQL team is releasing substantial revisions at an impressive rate.

Категории