Preventing Duplicates from Occurring in a Table

14.2.1 Problem

You want to prevent a table from ever containing duplicates, so that you won't have to worry about eliminating them later.

14.2.2 Solution

Use a PRIMARY KEY or a UNIQUE index.

14.2.3 Discussion

To make sure that records in a table are unique, some column or combination of columns must be required to contain unique values in each row. When this requirement is satisfied, you can refer to any record in the table unambiguously using its unique identifier. To make sure a table has this characteristic, include a PRIMARY KEY or UNIQUE index in the table structure when you create the table. The following table contains no such index, so it would allow duplicate records:

CREATE TABLE person ( last_name CHAR(20), first_name CHAR(20), address CHAR(40) );

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it's also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values:

CREATE TABLE person ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, address CHAR(40), PRIMARY KEY (last_name, first_name) );

The presence of a unique index in a table normally causes an error to occur if you insert a record into the table that duplicates an existing record in the column or columns that define the index. Recipe 14.3 discusses how to handle such errors or modify MySQL's duplicate-handling behavior.

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table. The two types of indexes are identical, with the exception that a UNIQUE index can be created on columns that allow NULL values. For the person table, it's likely that you'd require both the first and last names to be filled in. If so, you'd still declare the columns as NOT NULL, and the following table declaration would be effectively equivalent to the preceding one:

CREATE TABLE person ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, address CHAR(40), UNIQUE (last_name, first_name) );

If a UNIQUE index does happen to allow NULL values, NULL is special because it is the one value that can occur multiple times. The rationale for this is that it is not possible to know whether one unknown value is the same as another, so multiple unknown values are allowed.

It may of course be that you'd want the person table to reflect the real world, in which people do sometimes have the same name. In this case, you cannot set up a unique index based on the name columns, because duplicate names must be allowed. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one record from another. In MySQL, a common technique for this is the AUTO_INCREMENT column:

CREATE TABLE person ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, last_name CHAR(20), first_name CHAR(20), address CHAR(40), PRIMARY KEY (id) );

In this case, when you create a record with an id value of NULL, MySQL assigns that column a unique ID automatically. Another possibility is to assign identifiers externally and use those IDs as unique keys. For example, citizens in a given country might have unique taxpayer ID numbers. If so, those numbers can serve as the basis for a unique index:

CREATE TABLE person ( tax_id INT UNSIGNED NOT NULL, last_name CHAR(20), first_name CHAR(20), address CHAR(40), PRIMARY KEY (tax_id) );

14.2.4 See Also

AUTO_INCREMENT columns are discussed further in Chapter 11.

Категории