Hack 57. Generate Unique Sequential Numbers
Each SQL engine has its own mechanism for creating auto-numbered columns. However, it is possible to assign numbers to new rows of a table using only simple SQL statements.
Sometimes there is no usable primary key for a table and you need to generate a sequence of numbers. One example is that of the invoice: each invoice must have a unique number and you need your application to generate these numbers.
Each SQL implementation takes a different approach to this problem. You can use AUTO_INCREMENT in MySQL; in SQL Server you use an IDENTITY column; in Oracle and PostgreSQL you can create a SEQUENCE. You can find examples of how to use these later in this hack.
A problem common to all of the auto-numbering solutions is that gaps can occur in the sequence, when an invoice is generated by mistake. For instance, suppose that the last invoice generated was number 100. If invoice number 101 is created by mistake, users can delete it but they can never reuse that number. The next invoice to be generated will have the number 102 and the gap can never be filled.
You can generate your own sequences using standard SQL. This approach gives you better control over the numbers and prevents gaps from appearing in the sequences. You can use MAX to find the highest invoice number used so far. All you need to do is add one to it to get the next invoice number.
You can create a new record with this line:
INSERT INTO invoice(id, customer) SELECT COALESCE(MAX(id),0)+1, 'Shoe World' FROM invoice
When the invoice table is first created it will be empty. The expression MAX(id) will return NULL, the COALESCE will return 0, and the id value inserted will be 1. Once the table has some values in it the COALESCE function will return the highest id value and the INSERT statement will add one to that number. If invoice 101 is created and then deleted, the next invoice to be raised will be 101 again. You get no gaps if you delete the last invoice.
7.9.1. Mind the Gap
There is still a problem if you delete an invoice that is not the most recent. Suppose that you generate invoices 101, 102, and 103. Invoice 102 is deleted and you want to reuse that number. The INSERT command just shown will give you invoice 104 and the gap in the numbers remains. If you want to fill the gap you need a little more logic.
You can join the table to itself but with the id values offset by one, shown here and illustrated in Figure 7-1:
invoice x LEFT JOIN invoice y ON (x.id+1=y.id)
Figure 7-1. The LEFT JOIN of invoice a and invoice b
Normal x values will have a matching y value, but where there is a gap, and at the end, the x value will not have a partner. You can find such unpartnered values by looking for the NULL in y.id:
mysql> SELECT x.id AS x_id, a.customer, y.id AS y_id -> FROM invoice x LEFT JOIN invoice y ON (x.id+1=y.id); +------+------------+------+ | x_id | customer | y_id | +------+------------+------+ | 101 | Boot Shop | NULL | | 103 | Footware4U | 104 | | 104 | Shoe World | NULL | +------+------------+------+
From this list you need to select the smallest x_id value that has a NULL value in y_id. In this case it is 101. Adding one gives 102the first "missing" number:
mysql> SELECT COALESCE(MIN(x.id),0)+1, 'Shoe World' -> FROM invoice x LEFT JOIN invoice y ON (x.id+1=y.id) -> WHERE y.id IS NULL; +-------------------------+------------+ | COALESCE(Min(x.id),0)+1 | Shoe World | +-------------------------+------------+ | 102 | Shoe World | +-------------------------+------------+
You can then tuck this SELECT statement into the INSERT statement:
INSERT INTO invoice(id, customer) SELECT COALESCE(Min(x.id),0)+1, 'Shoe World' FROM invoice x LEFT JOIN invoice y ON (x.id+1=y.id) WHERE y.id IS NULL
7.9.1.1. Raw speed
If you are more concerned about performance than reusing deleted invoice numbers, note that the original approach will perform very well:
INSERT INTO invoice(id, customer) SELECT MAX(id)+1, 'Shoe World' FROM invoice
Assuming that id is the primary key, it will be indexed and the system can use this index to identify the MAX(id).
In contrast, the following statement will perform poorly. You should not expect the system to be able to use the id index to find the MAX of (id+1). The optimizer is not likely to be that smart:
INSERT INTO invoice SELECT MAX(id+1), 'Shoe World' FROM invoice
|
7.9.2. Multiuser Considerations
If you want to use the number that you generated using this hack, you must take care. It is possible that another user will be running the same routine at the same time and you will get their number rather than the correct value.
The only safe way around this is to lock the whole table. If your statement is part of a longer transaction, that will only make your problems worse [Hack #67]. In MySQL, you can do the following to get the invoice number you just created:
LOCK TABLES invoice WRITE, invoice AS x WRITE, invoice AS y WRITE; INSERT INTO invoice(id, customer) SELECT COALESCE(Min(x.id),0)+1, 'Shoe World' FROM invoice x LEFT JOIN invoice y ON (x.id+1=y.id) WHERE y.id IS NULL; UNLOCK TABLES;
If you really are dealing with invoices and you are generating a few per day, this will work. But if you are generating a few per second the LOCK will start to impact other users who are trying to read older records. In such high-volume environments, your best bet is to use the vendor-supplied extensions, which is the conventional approach.
7.9.3. Use System-Generated Numbers
Each SQL implementation has its own version of auto-numbering. They give you a special "session" variable which provides the id value most recently generated. What's more, they guarantee that they'll give you the right value, even if other transactions are running which have requested more numbers after your id was generated. The following sections provide brief examples of how to use auto-numbering for MySQL, SQL Server, Oracle, and PostgreSQL.
7.9.3.1. MySQL: AUTO_INCREMENT column
In MySQL:
CREATE TABLE invoice2( id INTEGER AUTO_INCREMENT PRIMARY KEY, customer VARCHAR(10) ); INSERT INTO invoice2(customer) VALUES ('Shoe World'); --Show the value generated SELECT LAST_INSERT_ID( );
7.9.3.2. SQL Server: IDENTITY column
In SQL Server:
CREATE TABLE invoice2( id INTEGER IDENTITY PRIMARY KEY, customer VARCHAR(10) ); INSERT INTO invoice2(customer) VALUES ('Shoe World'); --Show the value generated SELECT @@IDENTITY;
7.9.3.3. Oracle: SEQUENCE
In Oracle:
CREATE SEQUENCE sqInvoice2; CREATE TABLE invoice2( id INTEGER PRIMARY KEY, customer VARCHAR(10) ); INSERT INTO invoice2(id,customer) VALUES (sqInvoice2.NEXTVAL,'Shoe World'); --Show the value generated SELECT sqInvoice2.CURRVAL FROM dual;
7.9.3.4. PostgreSQL: SEQUENCE
In PostgreSQL:
CREATE SEQUENCE sqInvoice2; CREATE TABLE invoice2( id INTEGER PRIMARY KEY, customer VARCHAR(10) ); INSERT INTO invoice2(name) VALUES (NEXTVAL('sqInvoice2'),'Shoe World'); --Show the value generated SELECT CURRVAL('sqInvoice2');
An artificially generated key such as this is called a surrogate key. It is created for the purpose of being unique. You should try to avoid these keys whenever possible. If you can't avoid them, hide them from your users. If you can't hide them, at least let your users have some control over them.
7.9.4. Choosing a Primary Key
Generating an artificial sequence should be your last resort; if there is a combination of columns that you know will be unique you should make up a composite key on those. In the case of an invoice, the number is probably useful to the business process, but even then you should think carefully.
If your organization has more than one location you should not attempt to use a single sequence across the enterprise. There is no safe way for the Metropolis office to share the same sequence as the Smallville office. One day you will find that the Smallville folks have lost electrical power and that will prevent the Metropolis office from raising an invoice. Instead, you should allow each office its own sequence and use the office code plus the sequence number as the primary key.
You might want to choose a primary key that someone else has created. A car registration plate is fine for cars, but you can't use a National Insurance number or Social Security number. People are reluctant to divulge these kinds of details unless there is a good reason to do so. If you choose government-issued identification as your primary key you are likely to be excluding foreign nationals.
7.9.5. Hacking the Hack
Consider starting the sequence at 1,001 (or 10,001, 100,001, or even more) rather than at 1. Someone might copy a list of invoices into a spreadsheet and sort it. If the numbers get sorted as strings they will come out in the order 1, 11, 12, 2. The numbers 1,001, 1,002...1,011, 1,012 will be ordered correctly as strings or as integers.