Randomizing a Set of Rows

13.8.1 Problem

You want to randomize a set of rows or values.

13.8.2 Solution

Use ORDER BY RAND( ).

13.8.3 Discussion

MySQL's RAND( ) function can be used to randomize the order in which a query returns its rows. Somewhat paradoxically, this randomization is achieved by adding an ORDER BY clause to the query. The technique is roughly equivalent to a spreadsheet randomization method. Suppose you have a set of values in a spreadsheet that looks like this:

Patrick Penelope Pertinax Polly

To place these in random order, first add another column that contains randomly chosen numbers:

Patrick .73 Penelope .37 Pertinax .16 Polly .48

Then sort the rows according to the values of the random numbers:

Pertinax .16 Penelope .37 Polly .48 Patrick .73

At this point, the original values have been placed in random order, because the effect of sorting the random numbers is to randomize the values associated with them. To re-randomize the values, choose another set of random numbers and sort the rows again.

In MySQL, a similar effect is achieved by associating a set of random numbers with a query result and sorting the result by those numbers. For MySQL 3.23.2 and up, this is done with an ORDER BY RAND( ) clause:

mysql> SELECT name FROM t ORDER BY RAND( ); +----------+ | name | +----------+ | Pertinax | | Penelope | | Patrick | | Polly | +----------+ mysql> SELECT name FROM t ORDER BY RAND( ); +----------+ | name | +----------+ | Patrick | | Pertinax | | Penelope | | Polly | +----------+

How Random Is RAND()?

Does the RAND( ) function generate evenly distributed numbers? Check it out for yourself with the following Python script, rand_test.py, from the stats directory of the recipes distribution. It uses RAND( ) to generate random numbers and constructs a frequency distribution from them, using .1-sized categories. This provides a means of assessing how evenly distributed the values are.

#! /usr/bin/python # rand_test.pl - create a frequency distribution of RAND( ) values. # This provides a test of the randomness of RAND( ).

# Method is to draw random numbers in the range from 0 to 1.0, # and count how many of them occur in .1-sized intervals (0 up # to .1, .1 up to .2, ..., .9 up *through* 1.0).

import sys sys.path.insert (0, "/usr/local/apache/lib/python") import MySQLdb import Cookbook

npicks = 1000 # number of times to pick a number

bucket = [0] * 10

conn = Cookbook.connect ( ) cursor = conn.cursor ( )

for i in range (0, npicks): cursor.execute ("SELECT RAND( )") (val,) = cursor.fetchone ( ) slot = int (val * 10) if slot > 9: slot = 9 # put 1.0 in last slot bucket[slot] = bucket[slot] + 1

cursor.close ( ) conn.close ( )

# Print the resulting frequency distribution

for slot in range (0, 9): print "%2d %d" % (slot+1, bucket[slot])

sys.exit (0)

The stats directory also contains equivalent scripts in other languages.

For versions of MySQL older than 3.23.2, ORDER BY clauses cannot refer to expressions, so you cannot use RAND( ) there (see Recipe 6.4). As a workaround, add a column of random numbers to the column output list, alias it, and refer to the alias for sorting:

mysql> SELECT name, name*0+RAND( ) AS rand_num FROM t ORDER BY rand_num; +----------+-------------------+ | name | rand_num | +----------+-------------------+ | Penelope | 0.372227413926485 | | Patrick | 0.431537678867148 | | Pertinax | 0.566524063764628 | | Polly | 0.715938107777329 | +----------+-------------------+

Note that the expression for the random number column is name*0+RAND( ), not just RAND( ). If you try using the latter, the pre-3.23 MySQL optimizer notices that the column contains only a function, assumes that the function returns a constant value for each row, and optimizes the corresponding ORDER BY clause out of existence. As a result, no sorting is done. The workaround is to fool the optimizer by adding extra factors to the expression that don't change its value, but make the column look like a non-constant. The query just shown illustrates one easy way to do this: Take any column name, multiply it by zero, and add the result to RAND( ). Granted, it may seem a little strange to use name in a mathematical expression, because that column's values aren't numeric. That doesn't matter; MySQL sees the * multiplication operator and performs a string-to-number conversion of the name values before the multiply operation. The important thing is that the result of the multiplication is zero, which means that name*0+RAND( ) has the same value as RAND( ).

Applications for randomizing a set of rows include any scenario that uses selection without replacement (choosing each item from a set of items, until there are no more items left). Some examples of this are:

To use the last example as an illustration, let's implement a card deck shuffling algorithm. Shuffling and dealing cards is randomization plus selection without replacement: each card is dealt once before any is dealt twice; when the deck is used up, it is reshuffled to re-randomize it for a new dealing order. Within a program, this task can be performed with MySQL using a table deck that has 52 rows, assuming a set of cards with each combination of 13 face values and 4 suits:

Setting up the deck table is a tedious task if you insert the 52 card records by writing out all the INSERT statements manually. The deck contents can be generated more easily in combinatorial fashion within a program by generating each pairing of face value with suit. Here's some PHP code that creates a deck table with face and suit columns, then populates the table using nested loops to generate the pairings for the INSERT statements:

mysql_query (" CREATE TABLE deck ( face ENUM('A', 'K', 'Q', 'J', '10', '9', '8', '7', '6', '5', '4', '3', '2') NOT NULL, suit ENUM('hearts', 'diamonds', 'clubs', 'spades') NOT NULL )", $conn_id) or die ("Cannot issue CREATE TABLE statement "); $face_array = array ("A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"); $suit_array = array ("hearts", "diamonds", "clubs", "spades"); # insert a "card" into the deck for each combination of suit and face reset ($face_array); while (list ($index, $face) = each ($face_array)) { reset ($suit_array); while (list ($index2, $suit) = each ($suit_array)) { mysql_query ("INSERT INTO deck (face,suit) VALUES('$face','$suit')", $conn_id) or die ("Cannot insert card into deck "); } }

Shuffling the cards is a matter of issuing this statement:

SELECT face, suit FROM deck ORDER BY RAND( );

To do that and store the results in an array within a script, write a shuffle_deck( ) function that issues the query and returns the resulting values in an array (again shown in PHP):

function shuffle_deck ($conn_id) { $query = "SELECT face, suit FROM deck ORDER BY RAND( )"; $result_id = mysql_query ($query, $conn_id) or die ("Cannot retrieve cards from deck "); $card = array ( ); while ($obj = mysql_fetch_object ($result_id)) $card[ ] = $obj; # add card record to end of $card array mysql_free_result ($result_id); return ($card); }

Deal the cards by keeping a counter that ranges from 0 to 51 to indicate which card to select. When the counter reaches 52, the deck is exhausted and should be shuffled again.

Категории