Using a Join to Create a Lookup Table from Descriptive Labels
12.20.1 Problem
A table stores long descriptive labels in an identifier column. You want to convert this column to short ID values and use the labels to create a lookup table that maps IDs to labels.
12.20.2 Solution
Use one of the related-table update techniques described in Recipe 12.19.
12.20.3 Discussion
It's a common strategy to store ID numbers or codes rather than descriptive strings in a table to save space. It also improves performance, because it's quicker to index and retrieve numbers than strings. (For queries in which you need to produce the names, join the ID values with an ID-to-name lookup table.) When you're creating a new table, you can keep this strategy in mind and design the table from the outset to be used with a lookup table. But you may also have an existing table that stores descriptive strings and that could benefit from a conversion to use ID values. This section discusses how to create the lookup table that maps each label to its ID, and how to convert the labels to IDs in the original table. The technique combines ALTER TABLE with a related-table update.
Suppose you collect coins, and you've begun to keep track of them in your database using the following table:
CREATE TABLE coin ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, date CHAR(5) NOT NULL, # 4 digits + mint letter denom CHAR(20) NOT NULL, # denomination (e.g., Lincoln cent) PRIMARY KEY (id) );
Each coin is assigned an ID automatically as an AUTO_INCREMENT value, and you also record each coin's date of issue and denomination. The records that you've entered into the table thus far are as follows:
mysql> SELECT * FROM coin; +----+-------+---------------------+ | id | date | denom | +----+-------+---------------------+ | 1 | 1944s | Lincoln cent | | 2 | 1977 | Roosevelt dime | | 3 | 1955d | Lincoln cent | | 4 | 1938 | Jefferson nickel | | 5 | 1964 | Kennedy half dollar | | 6 | 1959 | Lincoln cent | | 7 | 1945 | Jefferson nickel | | 8 | 1905 | Buffalo nickel | | 9 | 1924 | Mercury head dime | | 10 | 2001 | Roosevelt dime | | 11 | 1937 | Mercury head dime | | 12 | 1977 | Kennedy half dollar | +----+-------+---------------------+
The table holds the information in which you're interested, but you notice that it's a waste of space to write out the denomination names in every record, and that the problem will become worse as you enter additional records into the table. It would be more space-efficient to store coded denomination IDs in the coin table rather than the names, then look up the names when necessary from a denom table that lists each denomination name and its ID code. (The benefit of this may not be evident with such a small table, but when your collection grows to include 10,000 coins, the space savings from storing numbers rather than strings will become more significant.)
The procedure for setting up the lookup table and converting the coin table to use it is as follows:
- Create the denom lookup table to hold the ID-to-name mapping.
- Populate the denom table using the denomination names currently in the coin table.
- Replace the denomination names in the coin table with the corresponding ID values.
The denom table needs to record each denomination name and its associated ID, so it can be created using the following structure:
CREATE TABLE denom ( denom_id INT UNSIGNED NOT NULL AUTO_INCREMENT, name CHAR(20) NOT NULL, PRIMARY KEY (denom_id) );
To populate the table, insert into it the set of denomination names that are present in the coin table. Use SELECT DISTINCT for this, because each name should be inserted just once:
INSERT INTO denom (name) SELECT DISTINCT denom FROM coin;
The INSERT statement adds only the denomination name to the denom table; denom_id is an AUTO_INCREMENT column, so MySQL will assign sequence numbers to it automatically. The resulting table looks like this:
+----------+---------------------+ | denom_id | name | +----------+---------------------+ | 1 | Lincoln cent | | 2 | Roosevelt dime | | 3 | Jefferson nickel | | 4 | Kennedy half dollar | | 5 | Buffalo nickel | | 6 | Mercury head dime | +----------+---------------------+
With MySQL 3.23 and up, you can create and populate the denom table using a single CREATE TABLE ... SELECT statement:
CREATE TABLE denom ( denom_id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (denom_id) ) SELECT DISTINCT denom AS name FROM coin;
After setting up the denom table, the next step is to convert the denomination names in the coin table to their associated IDs:
- Create a tmp table that is like coin but has a denom_id column rather than a denom column.
- Populate tmp from the result of a join between the coin and denom tables.
- Use the tmp table to replace the original coin table.
To create the tmp table, use a CREATE TABLE statement that is like the one used originally to create coin, but substitute a denom_id column for the denom column:
CREATE TABLE tmp ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, date CHAR(5) NOT NULL, # 4 digits + mint letter denom_id INT UNSIGNED NOT NULL, # denomination ID PRIMARY KEY (id) );
Then populate tmp using a join between coin and denom:
INSERT INTO tmp (id, date, denom_id) SELECT coin.id, coin.date, denom.denom_id FROM coin, denom WHERE coin.denom = denom.name;
Finally, replace the original coin table with the tmp table:
DROP TABLE coin; ALTER TABLE tmp RENAME TO coin;
With MySQL 3.23 and up, you can create and populate the tmp table using a single statement:
CREATE TABLE tmp ( PRIMARY KEY (id) ) SELECT coin.id, coin.date, denom.denom_id FROM coin, denom WHERE coin.denom = denom.name;
Then replace coin with tmp, as before.
Another method for converting the coin table after creating the denom table is to modify coin in place without using a tmp table:
- Add a denom_id column to the coin table with ALTER TABLE.
- Fill in the denom_id value in each row with the ID corresponding to its denom name.
- Drop the denom column.
To carry out this procedure, add a column to coin to hold the denomination ID values:
ALTER TABLE coin ADD denom_id INT UNSIGNED NOT NULL;
Then fill in the denom_id column with the proper values using the denomination name-to-ID mapping in the denom table. One way to do that is to write a script to update the ID values in the coin table one denomination at a time. Here is a short script that does so:
#! /usr/bin/perl -w # update_denom.pl - For each denomination in the denom table, # update the coin table records having that denomination with the # proper denomination ID. use strict; use lib qw(/usr/local/apache/lib/perl); use Cookbook; my $dbh = Cookbook::connect ( ); my $sth = $dbh->prepare ("SELECT denom_id, name FROM denom"); $sth->execute ( ); while (my ($denom_id, $name) = $sth->fetchrow_array ( )) { # For coin table records with the given denomination name, # add the corresponding denom_id value from denom table $dbh->do ("UPDATE coin SET denom_id = ? WHERE denom = ?", undef, $denom_id, $name); } $dbh->disconnect ( ); exit (0);
The script retrieves each denomination ID/name pair from the denom table and constructs an appropriate UPDATE statement to modify all coin table rows containing the denomination name by setting their denom_id values to the corresponding ID. When the script finishes, all rows in the coin table will have the denom_id column updated properly. At that point, the denom column is no longer necessary and you can jettison it:
ALTER TABLE coin DROP denom;
Whichever method you use to convert the coin table, the resulting contents look like this:
mysql> SELECT * FROM coin; +----+-------+----------+ | id | date | denom_id | +----+-------+----------+ | 1 | 1944s | 1 | | 2 | 1977 | 2 | | 3 | 1955d | 1 | | 4 | 1938 | 3 | | 5 | 1964 | 4 | | 6 | 1959 | 1 | | 7 | 1945 | 3 | | 8 | 1905 | 5 | | 9 | 1924 | 6 | | 10 | 2001 | 2 | | 11 | 1937 | 6 | | 12 | 1977 | 4 | +----+-------+----------+
When you need to display coin records with denomination names rather than IDs in a query result, perform a join using denom as a lookup table:
mysql> SELECT coin.id, coin.date, denom.name -> FROM coin, denom -> WHERE coin.denom_id = denom.denom_id; +----+-------+---------------------+ | id | date | name | +----+-------+---------------------+ | 1 | 1944s | Lincoln cent | | 2 | 1977 | Roosevelt dime | | 3 | 1955d | Lincoln cent | | 4 | 1938 | Jefferson nickel | | 5 | 1964 | Kennedy half dollar | | 6 | 1959 | Lincoln cent | | 7 | 1945 | Jefferson nickel | | 8 | 1905 | Buffalo nickel | | 9 | 1924 | Mercury head dime | | 10 | 2001 | Roosevelt dime | | 11 | 1937 | Mercury head dime | | 12 | 1977 | Kennedy half dollar | +----+-------+---------------------+
That result looks like the contents of the original coin table, even though the table no longer stores a long descriptive string in each row.
What about entering new items into the coin table? Using the original coin table, you'd enter the denomination name into each record. But with the denominations converted to ID values, that won't work. Instead, use an INSERT INTO ... SELECT statement to look up the denomination ID based on the name. For example, to enter a 1962 Roosevelt dime, use this statement:
INSERT INTO coin (date, denom_id) SELECT 1962, denom_id FROM denom WHERE name = 'Roosevelt dime';
This technique is described further in Recipe 12.18.