Finding Rows Containing Per-Group Minimum or Maximum Values

12.7.1 Problem

You want to find which record within each group of rows in a table contains the maximum or minimum value for a given column. For example, you want to determine the most expensive painting in your collection for each artist.

12.7.2 Solution

Create a temporary table to hold the per-group maximum or minimum, then join the temporary table with the original one to pull out the matching record for each group.
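The two steps of this solution can be wrapped in a small reusable function. Below is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL so that it runs without a server. The function name rows_at_group_extreme and its parameters are hypothetical. It is demonstrated with MIN on the driver_log rows listed in the Discussion below, finding each driver's earliest trip.

```python
import sqlite3

# Hypothetical helper; SQLite stands in for MySQL so the sketch runs anywhere.
def rows_at_group_extreme(conn, table, group_col, value_col, agg="MAX"):
    """Return every row of `table` whose `value_col` equals the per-group
    MAX or MIN of that column. Identifiers are pasted into the SQL text,
    so pass only trusted table and column names."""
    if agg not in ("MAX", "MIN"):
        raise ValueError("agg must be 'MAX' or 'MIN'")
    conn.execute("DROP TABLE IF EXISTS tmp")
    # Step 1: store one (group, extreme value) row per group in a temporary table.
    conn.execute(
        f"CREATE TEMP TABLE tmp AS "
        f"SELECT {group_col} AS grp, {agg}({value_col}) AS extreme "
        f"FROM {table} GROUP BY {group_col}"
    )
    # Step 2: join back to the original table to recover the full matching rows.
    return conn.execute(
        f"SELECT t.* FROM {table} AS t, tmp "
        f"WHERE t.{group_col} = tmp.grp AND t.{value_col} = tmp.extreme "
        f"ORDER BY t.{group_col}"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany(
    "INSERT INTO driver_log VALUES (?, ?, ?)",
    [
        ("Ben", "2001-11-29", 131), ("Ben", "2001-11-30", 152),
        ("Ben", "2001-12-02", 79), ("Henry", "2001-11-26", 115),
        ("Henry", "2001-11-27", 96), ("Henry", "2001-11-29", 300),
        ("Henry", "2001-11-30", 203), ("Henry", "2001-12-01", 197),
        ("Suzi", "2001-11-29", 391), ("Suzi", "2001-12-02", 502),
    ],
)
# ISO-format date strings sort chronologically, so MIN yields the earliest trip.
earliest = rows_at_group_extreme(conn, "driver_log", "name", "trav_date", "MIN")
print(earliest)
```

Passing "MAX" and "miles" instead returns each driver's longest trip. The choice of aggregate is the only thing that changes between the minimum and maximum variants of the problem.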

12.7.3 Discussion

Many questions involve finding the largest or smallest value in a particular table column, but it's also common to want to know the other values in the row that contains it. For example, you can use MAX(pop) to find the largest state population recorded in the states table, but you might also want to know which state has that population. As shown in Recipe 7.6, one way to solve this problem is to use a SQL variable. The technique works like this:

mysql> SELECT @max := MAX(pop) FROM states;
mysql> SELECT * FROM states WHERE pop = @max;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+
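SQLite has no user-defined variables like @max. The sketch below therefore assumes Python's sqlite3 module in place of the mysql client, keeps the maximum in a host-language variable, and passes it back as a query parameter. Only the California row comes from the output above. The other two populations are hypothetical placeholders, and the statehood column is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, abbrev TEXT, pop INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("California", "CA", 29760021),  # value from the output above
        ("New York", "NY", 18000000),    # hypothetical placeholder
        ("Texas", "TX", 17000000),       # hypothetical placeholder
    ],
)

# Step 1: fetch the maximum into a client-side variable (the role @max plays).
(max_pop,) = conn.execute("SELECT MAX(pop) FROM states").fetchone()

# Step 2: use that value to find the row(s) holding it.
rows = conn.execute("SELECT * FROM states WHERE pop = ?", (max_pop,)).fetchall()
print(rows)  # [('California', 'CA', 29760021)]
```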

Another way to answer the question is to use a join. First, select the maximum population value into a temporary table:

mysql> CREATE TEMPORARY TABLE tmp SELECT MAX(pop) AS maxpop FROM states;

Then join the temporary table to the original one to find the record matching the selected population:

mysql> SELECT states.* FROM states, tmp WHERE states.pop = tmp.maxpop;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+
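The temporary-table version can be sketched the same way. Python's sqlite3 module stands in for MySQL here, which is an assumption made for portability. One syntax difference follows from that choice: SQLite requires AS in CREATE TABLE ... AS SELECT, whereas MySQL accepts the form without it. As before, only California's population comes from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, abbrev TEXT, pop INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("California", "CA", 29760021),  # value from the output above
        ("New York", "NY", 18000000),    # hypothetical placeholder
        ("Texas", "TX", 17000000),       # hypothetical placeholder
    ],
)

# Step 1: store the single maximum in a temporary table.
conn.execute("CREATE TEMP TABLE tmp AS SELECT MAX(pop) AS maxpop FROM states")

# Step 2: join the temporary table back to states to retrieve the whole row.
rows = conn.execute(
    "SELECT states.* FROM states, tmp WHERE states.pop = tmp.maxpop"
).fetchall()
print(rows)  # [('California', 'CA', 29760021)]
```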

By applying these techniques to the artist and painting tables, you can answer questions like "What is the most expensive painting in the collection, and who painted it?" To use a SQL variable, store the highest price in it, then use the variable to identify the record containing the price so you can retrieve other columns from it:

mysql> SELECT @max_price := MAX(price) FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting
    -> WHERE painting.price = @max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+

The same thing can be done by creating a temporary table to hold the maximum price, and then joining it with the other tables:

mysql> DROP TABLE IF EXISTS tmp;
mysql> CREATE TEMPORARY TABLE tmp SELECT MAX(price) AS max_price FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+

On the face of it, using a temporary table and a join is just a more complicated way of answering the question. Does this technique have any practical value? Yes, because it generalizes to more difficult questions. The previous queries show information only for the single most expensive painting in the entire painting table. What if your question is, "What is the most expensive painting per artist?"

You can't use a SQL variable to answer that question, because the answer requires one price per artist, and a variable can hold only a single value at a time. A temporary table works well, though, because it can hold multiple values and a join can find matches for all of them at once.

To answer the question, select each artist ID and the corresponding maximum painting price into a temporary table. The table then contains not just one maximum painting price but the maximum within each group, where "group" means "paintings by a given artist." Then use the artist IDs and prices stored in the tmp table to match records in the painting table, and join the result with artist to get the artist names:

mysql> DROP TABLE IF EXISTS tmp;
mysql> CREATE TEMPORARY TABLE tmp
    -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.a_id = tmp.a_id
    -> AND painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+
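The per-artist query can be tried outside MySQL with the sketch below, which assumes Python's sqlite3 module as a stand-in. Only the three top-priced rows come from the output above. The a_id values and the prices of the other paintings are hypothetical filler. The hypothetical function most_expensive_per_artist rebuilds tmp on each call. The last lines add a hypothetical painting that ties Renoir's top price, showing that the join returns every row matching a group's maximum rather than exactly one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE painting (a_id INTEGER, title TEXT, price INTEGER);
INSERT INTO artist VALUES (1, 'Da Vinci'), (2, 'Van Gogh'), (3, 'Renoir');
-- Top-priced rows match the output above; the other prices are hypothetical.
INSERT INTO painting VALUES
  (1, 'The Last Supper', 34),
  (1, 'The Mona Lisa', 87),
  (2, 'Starry Night', 48),
  (2, 'The Potato Eaters', 67),
  (3, 'Les Deux Soeurs', 64);
""")

def most_expensive_per_artist(conn):
    conn.execute("DROP TABLE IF EXISTS tmp")
    # One row per artist: the artist ID and that artist's highest price.
    conn.execute(
        "CREATE TEMP TABLE tmp AS "
        "SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id"
    )
    # Match each (a_id, max_price) pair back to painting, then join artist.
    return conn.execute(
        "SELECT artist.name, painting.title, painting.price "
        "FROM artist, painting, tmp "
        "WHERE painting.a_id = tmp.a_id "
        "AND painting.price = tmp.max_price "
        "AND painting.a_id = artist.a_id "
        "ORDER BY painting.price DESC, painting.title"
    ).fetchall()

top = most_expensive_per_artist(conn)
print(top)

# Hypothetical tie: a second Renoir at the same top price yields two Renoir rows.
conn.execute("INSERT INTO painting VALUES (3, 'Luncheon of the Boating Party', 64)")
with_tie = most_expensive_per_artist(conn)
print(with_tie)
```

If exactly one row per group is required, ties have to be broken by some additional rule. The join by itself does not choose among equal values.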

The same technique works for other kinds of values, such as temporal values. Consider the driver_log table that lists drivers and trips that they've taken:

mysql> SELECT name, trav_date, miles
    -> FROM driver_log
    -> ORDER BY name, trav_date;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-11-29 |   131 |
| Ben   | 2001-11-30 |   152 |
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-11-26 |   115 |
| Henry | 2001-11-27 |    96 |
| Henry | 2001-11-29 |   300 |
| Henry | 2001-11-30 |   203 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-11-29 |   391 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+

One type of maximum-per-group problem for this table is, "show the most recent trip for each driver." It can be solved like this:

mysql> DROP TABLE IF EXISTS tmp;
mysql> CREATE TEMPORARY TABLE tmp
    -> SELECT name, MAX(trav_date) AS trav_date
    -> FROM driver_log GROUP BY name;
mysql> SELECT driver_log.name, driver_log.trav_date, driver_log.miles
    -> FROM driver_log, tmp
    -> WHERE driver_log.name = tmp.name
    -> AND driver_log.trav_date = tmp.trav_date
    -> ORDER BY driver_log.name;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+
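To check this query against the result shown above, the sketch below replays it in SQLite from Python, loading the ten driver_log rows listed earlier. SQLite stands in for MySQL, which is an assumption made for portability. The statement therefore uses CREATE TEMP TABLE ... AS, and trav_date is stored as ISO-format text. Such text sorts chronologically, so MAX picks the latest date, just as it does on a MySQL DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany(
    "INSERT INTO driver_log VALUES (?, ?, ?)",
    [
        ("Ben", "2001-11-29", 131), ("Ben", "2001-11-30", 152),
        ("Ben", "2001-12-02", 79), ("Henry", "2001-11-26", 115),
        ("Henry", "2001-11-27", 96), ("Henry", "2001-11-29", 300),
        ("Henry", "2001-11-30", 203), ("Henry", "2001-12-01", 197),
        ("Suzi", "2001-11-29", 391), ("Suzi", "2001-12-02", 502),
    ],
)

# Step 1: each driver's most recent travel date.
conn.execute(
    "CREATE TEMP TABLE tmp AS "
    "SELECT name, MAX(trav_date) AS trav_date FROM driver_log GROUP BY name"
)
# Step 2: join on both name and date to pull out the matching trip rows.
latest = conn.execute(
    "SELECT driver_log.name, driver_log.trav_date, driver_log.miles "
    "FROM driver_log, tmp "
    "WHERE driver_log.name = tmp.name "
    "AND driver_log.trav_date = tmp.trav_date "
    "ORDER BY driver_log.name"
).fetchall()
for row in latest:
    print(*row)
```

Joining on the name as well as the date matters. Matching on the date alone would also return other drivers' trips that happen to fall on someone else's latest date.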

12.7.4 See Also

The technique illustrated in this section shows how to answer maximum-per-group questions by selecting summary information into a temporary table and joining that table to the original one. This technique has many applications. One such application is calculating team standings, where the standings for each group of teams are determined by comparing each team in the group to the team with the best record. Recipe 12.8 discusses how to do this.
