Randomizing a Set of Rows
13.8.1 Problem
You want to randomize a set of rows or values.
13.8.2 Solution
Use ORDER BY RAND( ).
13.8.3 Discussion
MySQL's RAND( ) function can be used to randomize the order in which a query returns its rows. Somewhat paradoxically, this randomization is achieved by adding an ORDER BY clause to the query. The technique is roughly equivalent to a spreadsheet randomization method. Suppose you have a set of values in a spreadsheet that looks like this:
```
Patrick
Penelope
Pertinax
Polly
```
To place these in random order, first add another column that contains randomly chosen numbers:
```
Patrick   .73
Penelope  .37
Pertinax  .16
Polly     .48
```
Then sort the rows according to the values of the random numbers:
```
Pertinax  .16
Penelope  .37
Polly     .48
Patrick   .73
```
At this point, the original values have been placed in random order, because the effect of sorting the random numbers is to randomize the values associated with them. To re-randomize the values, choose another set of random numbers and sort the rows again.
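The spreadsheet method can be sketched in PHP without a database. This is a minimal illustration that assumes an in-memory array stands in for the table; the function name randomize_by_sort() is hypothetical. It attaches a random number to each value, sorts on that number, and then discards it:

```php
<?php
// Sketch of the spreadsheet method: pair each value with a random number,
// sort the pairs by that number, then keep only the values.
function randomize_by_sort(array $values): array
{
    $pairs = array();
    foreach ($values as $v) {
        // mt_rand() / mt_getrandmax() yields a float in [0, 1], like the .73, .37 column
        $pairs[] = array(mt_rand() / mt_getrandmax(), $v);
    }
    // Sort on the random column only
    usort($pairs, function ($a, $b) {
        return $a[0] <=> $b[0];
    });
    // Drop the random numbers, keeping the values in their new order
    return array_map(function ($p) { return $p[1]; }, $pairs);
}

$names = array("Patrick", "Penelope", "Pertinax", "Polly");
print_r(randomize_by_sort($names));
```

Calling randomize_by_sort() again draws a fresh set of random numbers, which corresponds to re-randomizing the spreadsheet column and sorting again.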
In MySQL, a similar effect is achieved by associating a set of random numbers with a query result and sorting the result by those numbers. For MySQL 3.23.2 and up, this is done with an ORDER BY RAND( ) clause:
```sql
mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Pertinax |
| Penelope |
| Patrick  |
| Polly    |
+----------+
mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Patrick  |
| Pertinax |
| Penelope |
| Polly    |
+----------+
```
For versions of MySQL older than 3.23.2, ORDER BY clauses cannot refer to expressions, so you cannot use RAND( ) there (see Recipe 6.4). As a workaround, add a column of random numbers to the column output list, alias it, and refer to the alias for sorting:
```sql
mysql> SELECT name, name*0+RAND( ) AS rand_num FROM t ORDER BY rand_num;
+----------+-------------------+
| name     | rand_num          |
+----------+-------------------+
| Penelope | 0.372227413926485 |
| Patrick  | 0.431537678867148 |
| Pertinax | 0.566524063764628 |
| Polly    | 0.715938107777329 |
+----------+-------------------+
```
Note that the expression for the random number column is name*0+RAND( ), not just RAND( ). If you use the latter, the pre-3.23 MySQL optimizer notices that the column contains only a function, assumes that the function returns a constant value for each row, and optimizes the corresponding ORDER BY clause out of existence. As a result, no sorting is done. The workaround is to fool the optimizer by adding extra terms to the expression that don't change its value but make the column look non-constant. The query just shown illustrates one easy way to do this: take any column name, multiply it by zero, and add the result to RAND( ). Granted, it may seem a little strange to use name in a mathematical expression, because that column's values aren't numeric. That doesn't matter; MySQL sees the * multiplication operator and converts the name values from strings to numbers before performing the multiplication (a string that doesn't begin with a digit converts to 0). The important thing is that the result of the multiplication is zero, which means that name*0+RAND( ) has the same value as RAND( ).
Applications for randomizing a set of rows include any scenario that uses selection without replacement (choosing items from a set one at a time until none are left). Some examples:
- Determining the starting order for participants in an event. List the participants in a table and select them in random order.
- Assigning starting lanes or gates to participants in a race. List the lanes in a table and select a random lane order.
- Choosing the order in which to present a set of quiz questions.
- Shuffling a deck of cards. Represent each card by a row in a table and shuffle the deck by selecting the rows in random order. Deal them one by one until the deck is exhausted.
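The lane-assignment example can be sketched in memory to show the without-replacement property. This is a minimal sketch under stated assumptions: assign_lanes() is a hypothetical helper, and PHP's built-in shuffle() is swapped in for the SELECT ... ORDER BY RAND( ) query you would issue against a lanes table:

```php
<?php
// Hypothetical helper: give each runner a distinct, randomly chosen lane.
// shuffle() stands in here for SELECT lane FROM lanes ORDER BY RAND().
function assign_lanes(array $runners, array $lanes): array
{
    if (count($lanes) < count($runners)) {
        throw new InvalidArgumentException("Not enough lanes for all runners");
    }
    shuffle($lanes);              // random lane order; also reindexes keys to 0..n-1
    $assignment = array();
    $i = 0;
    foreach ($runners as $runner) {
        $assignment[$runner] = $lanes[$i++];   // each lane is handed out at most once
    }
    return $assignment;
}

print_r(assign_lanes(array("Ann", "Bo", "Cy"), array(1, 2, 3, 4)));
```

Because the lanes are randomized once and then handed out in order, no lane can be assigned twice.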
To use the last example as an illustration, let's implement a card deck shuffling algorithm. Shuffling and dealing cards is randomization plus selection without replacement: each card is dealt once before any is dealt twice, and when the deck is used up, it is reshuffled to produce a new dealing order. Within a program, you can perform this task with MySQL using a table named deck that has 52 rows, one for each combination of the 13 face values and 4 suits:
- Select the entire table and store it into an array.
- Each time a card is needed, take the next element from the array.
- When the array is exhausted, all the cards have been dealt. "Reshuffle" the table to generate a new card order.
Setting up the deck table is tedious if you write out all 52 INSERT statements by hand. It's easier to generate the deck contents combinatorially within a program by producing each pairing of face value with suit. Here's some PHP code that creates a deck table with face and suit columns, then populates the table using nested loops to generate the pairings for the INSERT statements:
```php
mysql_query ("
            CREATE TABLE deck
            (
                face    ENUM('A', 'K', 'Q', 'J', '10', '9', '8',
                             '7', '6', '5', '4', '3', '2') NOT NULL,
                suit    ENUM('hearts', 'diamonds', 'clubs', 'spades') NOT NULL
            )", $conn_id)
    or die ("Cannot issue CREATE TABLE statement\n");

$face_array = array ("A", "K", "Q", "J", "10", "9", "8",
                     "7", "6", "5", "4", "3", "2");
$suit_array = array ("hearts", "diamonds", "clubs", "spades");

# insert a "card" into the deck for each combination of suit and face
foreach ($face_array as $face)
{
    foreach ($suit_array as $suit)
    {
        mysql_query ("INSERT INTO deck (face,suit) VALUES('$face','$suit')", $conn_id)
            or die ("Cannot insert card into deck\n");
    }
}
```
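As a variation on the nested loops, the same 52 pairings can be collected into one multi-row INSERT statement, so the table is populated with a single query instead of 52. This is a sketch only; build_deck_insert() is a hypothetical name, and it assumes your server accepts the multi-row VALUES syntax, as current MySQL versions do:

```php
<?php
// Hypothetical helper: build one INSERT that adds every face/suit pairing.
function build_deck_insert(): string
{
    $face_array = array("A", "K", "Q", "J", "10", "9", "8",
                        "7", "6", "5", "4", "3", "2");
    $suit_array = array("hearts", "diamonds", "clubs", "spades");
    $rows = array();
    foreach ($face_array as $face) {
        foreach ($suit_array as $suit) {
            // values are fixed literals defined above, so no escaping is needed
            $rows[] = "('$face','$suit')";
        }
    }
    return "INSERT INTO deck (face,suit) VALUES " . implode(",", $rows);
}

echo build_deck_insert(), "\n";
```

You would then pass the resulting string to mysql_query() with $conn_id, just as the loop does for each single-row statement.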
Shuffling the cards is a matter of issuing this statement:
```sql
SELECT face, suit FROM deck ORDER BY RAND( );
```
To do that and store the results in an array within a script, write a shuffle_deck( ) function that issues the query and returns the resulting values in an array (again shown in PHP):
```php
function shuffle_deck ($conn_id)
{
    $query = "SELECT face, suit FROM deck ORDER BY RAND()";
    $result_id = mysql_query ($query, $conn_id)
        or die ("Cannot retrieve cards from deck\n");
    $card = array ();
    while ($obj = mysql_fetch_object ($result_id))
        $card[] = $obj;     # add card record to end of $card array
    mysql_free_result ($result_id);
    return ($card);
}
```
Deal the cards by keeping a counter that ranges from 0 to 51 to indicate which card to select. When the counter reaches 52, the deck is exhausted and should be shuffled again.
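The counter-based dealing scheme can be sketched as follows. To keep the example self-contained, make_shuffled_deck() is a hypothetical stand-in for shuffle_deck($conn_id): it builds the cards in memory and shuffles them with PHP's shuffle() instead of querying the deck table. With a live connection, you would call shuffle_deck() in its place. Cards are objects with face and suit properties, mirroring what mysql_fetch_object() returns:

```php
<?php
// Hypothetical stand-in for shuffle_deck($conn_id): builds 52 card objects
// in memory and shuffles them with shuffle() rather than ORDER BY RAND().
function make_shuffled_deck(): array
{
    $cards = array();
    foreach (array("A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2") as $face) {
        foreach (array("hearts", "diamonds", "clubs", "spades") as $suit) {
            $cards[] = (object) array("face" => $face, "suit" => $suit);
        }
    }
    shuffle($cards);
    return $cards;
}

// Deal the next card; reshuffle when the counter reaches the end of the deck.
// $card and $counter are passed by reference so the caller's state advances.
function deal_card(array &$card, int &$counter)
{
    if ($counter >= count($card)) {     // counter reached 52: deck exhausted
        $card = make_shuffled_deck();   // with a database: shuffle_deck($conn_id)
        $counter = 0;
    }
    return $card[$counter++];
}

$card = make_shuffled_deck();
$counter = 0;
for ($i = 0; $i < 5; $i++) {
    $c = deal_card($card, $counter);
    echo "$c->face of $c->suit\n";
}
```

Passing $card and $counter by reference lets deal_card() reset both when the deck runs out, so the caller never has to check for an exhausted deck.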