Hack 49. Keep Track of Infrequently Changing Values
It may be enough for your database to keep track of current real-world information. But sometimes you need to record historical data as well.
If you're calculating values for reporting on the current state of things, the most up-to-date information is sufficient. But if your reports span a significant time period, such as a year-over-year comparison, you will need to take changes into account. This hack looks at two options for storing current and historical data in the same table.
Suppose you're keeping track of products and their prices. The most direct and data-oriented way to ensure that you can always quickly find the prices of items is to have a table of prices with a composite primary key of the product name and the date. Then, every day, you need to "confirm" the price of each item by adding a new row to the database, even when the price doesn't change. This approach is safe and reliable, leaving an audit trail of prices, and you can create queries using the current date to find the current prices. Your table would look like Table 7-1.
product | whn | price |
---|---|---|
aglet | 2006-05-20 | $10.00 |
aglet | 2006-05-21 | $10.00 |
aglet | 2006-05-22 | $10.50 |
aglet | 2006-05-23 | $10.50 |
... | ||
aglet | 2006-06-14 | $10.50 |
aglet | 2006-06-15 | $9.00 |
... | ||
gimlet | 2006-05-20 | $12.00 |
gimlet | 2006-05-21 | $12.00 |
gimlet | 2006-05-22 | $12.00 |
... |
The query to find the price of product 'aglet' on date '2006-05-20' is easy:
SELECT price FROM priceDaily WHERE product='aglet' AND whn=DATE '2006-05-20'
The primary key for this table will be the pair (product, whn). This index will make this query fast no matter how big the price table gets.
Because the prices don't change very often, you will use a lot of space to record very little activity. Disk space is cheap, and with today's hard-drive capacities, it will be several centuries before there's an impact on your disk space. Still, vastly inflated tables make the system cumbersome. It will be difficult to back up and slow to move onto another machine. It also requires you to create new rows every day to confirm the prices.
7.1.1. Record Price Changes
You could just store price changes in the database if and when they happen. This uses less disk space, requires low maintenance, and has zero redundancy. Table 7-2 shows this approach. You can create and populate this table in MySQL with these commands:
CREATE TABLE priceChanges (product CHAR(16), whn DATE, price DECIMAL (19,4)); INSERT INTO priceChanges VALUES ('aglet', '2001-01-01', 10.00); INSERT INTO priceChanges VALUES ('aglet', '2006-05-21', 10.50); INSERT INTO priceChanges VALUES ('aglet', '2005-06-15', 9.00); INSERT INTO priceChanges VALUES ('gimlet', '2001-01-01', 12.00);
product | whn | price |
---|---|---|
aglet | 2001-01-01 | $10.00 |
aglet | 2006-05-21 | $10.50 |
aglet | 2005-06-15 | $9.00 |
gimlet | 2001-01-01 | $12.00 |
The disadvantage is that you will need a slightly more complicated SQL query to recover the current price, and an even more complex query to find the price on a particular day. Complicated SQL queries are not a problem; but you should consider how the system will optimize them. The pair (product, whn) would be a suitable primary key for this table and the index used for the primary key should be enough to ensure that the queries shown can be well optimized. But the key must be in that order to be used effectively.
7.1.1.1. Find the current price
Assuming that future price changes are not included in the database, you can find the date of the latest price for a given product with the phrase (SELECT MAX(whn) FROM priceChanges WHERE product='aglet'). So the query to find the latest price for product 'aglet' can be:
SELECT price FROM priceChanges WHERE product = 'aglet' AND whn = (SELECT MAX(whn) FROM priceChanges WHERE product='aglet')
|
7.1.1.2. Find the price at a specified date
To find the price at a specified date you have to track down the relevant price change record. That will be the latest price change on or before the specified date. The phrase (SELECT MAX(whn) FROM priceChanges WHERE product='aglet' AND whn<='2006-05-20') gives the date of the relevant price change record.
Putting this all together you get the price of product 'aglet' on date '2006-05-20' as:
SELECT price FROM priceChanges WHERE product = 'aglet' AND whn = (SELECT MAX(whn) FROM priceChanges WHERE product='aglet' AND whn <= '2006-05-20')
7.1.1.3. List all prices at a specific date
To find the entire price list at a particular date you can simply drop the outer SELECT condition, product='aglet'. However, there is still an inner restriction on product. You must make sure that the product in the inner SELECT matches the product in the outer SELECT. To do this you simply alias the outer version of the table so that it can be referenced in the inner clause:
SELECT product, price FROM priceChanges o WHERE whn = (SELECT MAX(whn) FROM priceChanges WHERE product=o.product AND whn<='2006-05-20')
You may gain an improvement in efficiency if you turn the subquery into a JOIN [Hack #11].