Date-Based Summaries
7.16.1 Problem
You want to produce a summary based on date or time values.
7.16.2 Solution
Use GROUP BY to categorize temporal values into bins of the appropriate duration. Often this will involve using expressions to extract the significant parts of dates or times.
7.16.3 Discussion
To put records in time order, you use an ORDER BY clause to sort a column that has a temporal type. If instead you want to summarize records based on groupings into time intervals, you need to determine how to categorize each record into the proper interval and use GROUP BY to group them accordingly.
Sometimes you can use temporal values directly if they group naturally into the desired categories. This is quite likely if a table represents date or time parts using separate columns. For example, the baseball1.com master ballplayer table represents birth dates using separate year, month, and day columns. To see how many ballplayers were born on each day of the year, perform a calendar date summary that uses the month and day values but ignores the year:
mysql> SELECT birthmonth, birthday, COUNT(*)
    -> FROM master
    -> WHERE birthmonth IS NOT NULL AND birthday IS NOT NULL
    -> GROUP BY birthmonth, birthday;
+------------+----------+----------+
| birthmonth | birthday | COUNT(*) |
+------------+----------+----------+
|          1 |        1 |       47 |
|          1 |        2 |       40 |
|          1 |        3 |       50 |
|          1 |        4 |       38 |
...
|         12 |       28 |       33 |
|         12 |       29 |       32 |
|         12 |       30 |       32 |
|         12 |       31 |       27 |
+------------+----------+----------+
A less fine-grained summary can be obtained by using only the month values:
mysql> SELECT birthmonth, COUNT(*)
    -> FROM master
    -> WHERE birthmonth IS NOT NULL
    -> GROUP BY birthmonth;
+------------+----------+
| birthmonth | COUNT(*) |
+------------+----------+
|          1 |     1311 |
|          2 |     1144 |
|          3 |     1243 |
|          4 |     1179 |
|          5 |     1118 |
|          6 |     1105 |
|          7 |     1244 |
|          8 |     1438 |
|          9 |     1314 |
|         10 |     1438 |
|         11 |     1314 |
|         12 |     1269 |
+------------+----------+
Sometimes temporal values can be used directly, even when not represented as separate columns. To determine how many drivers were on the road and how many miles were driven each day, group the records in the driver_log table by date:
mysql> SELECT trav_date,
    -> COUNT(*) AS 'number of drivers', SUM(miles) AS 'miles logged'
    -> FROM driver_log GROUP BY trav_date;
+------------+-------------------+--------------+
| trav_date  | number of drivers | miles logged |
+------------+-------------------+--------------+
| 2001-11-26 |                 1 |          115 |
| 2001-11-27 |                 1 |           96 |
| 2001-11-29 |                 3 |          822 |
| 2001-11-30 |                 2 |          355 |
| 2001-12-01 |                 1 |          197 |
| 2001-12-02 |                 2 |          581 |
+------------+-------------------+--------------+
However, this summary grows longer as you add records to the table. At some point, the number of distinct dates will likely become so large that the summary is no longer useful, and you will probably want to change the category size from daily to weekly or monthly.
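As a rough sketch of what coarser categories might look like for the driver_log table, the following queries group the same records by month and by week. The column aliases (trav_year, trav_month, trav_week, and 'log entries') are illustrative rather than part of the original example, and YEARWEEK( ) uses the server's default week mode unless you pass a second argument, so week boundaries may not match your own definition of a week.

```sql
-- Monthly summary: the year is included so that the same month
-- from different years is not merged into a single category.
SELECT YEAR(trav_date) AS trav_year,
       MONTH(trav_date) AS trav_month,
       COUNT(*) AS 'log entries',
       SUM(miles) AS 'miles logged'
FROM driver_log
GROUP BY YEAR(trav_date), MONTH(trav_date)
ORDER BY YEAR(trav_date), MONTH(trav_date);

-- Weekly summary: YEARWEEK() combines year and week number
-- into a single value such as 200148.
SELECT YEARWEEK(trav_date) AS trav_week,
       COUNT(*) AS 'log entries',
       SUM(miles) AS 'miles logged'
FROM driver_log
GROUP BY YEARWEEK(trav_date)
ORDER BY YEARWEEK(trav_date);
```

COUNT(*) is labeled as log entries here because, over a week or a month, the same driver may appear in several records, so the count no longer corresponds to the number of drivers.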
When a temporal column contains so many distinct values that it fails to categorize well, it's typical for a summary to group records using expressions that map the relevant parts of the date or time values onto a smaller set of categories. For example, to produce a time-of-day summary for records in the mail table, do this:[1]
[1] Note that the result includes an entry only for hours of the day actually represented in the data. To generate a summary with an entry for every hour, use a join to fill in the "missing" values. See Recipe 12.10.
mysql> SELECT HOUR(t) AS hour,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY hour;
+------+--------------------+----------------------+
| hour | number of messages | number of bytes sent |
+------+--------------------+----------------------+
|    7 |                  1 |                 3824 |
|    8 |                  1 |                  978 |
|    9 |                  2 |                 2904 |
|   10 |                  2 |              1056806 |
|   11 |                  1 |                 5781 |
|   12 |                  2 |               195798 |
|   13 |                  1 |                  271 |
|   14 |                  1 |                98151 |
|   15 |                  1 |                 1048 |
|   17 |                  2 |              2398338 |
|   22 |                  1 |                23992 |
|   23 |                  1 |                10294 |
+------+--------------------+----------------------+
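One way to fill in the hours that have no messages is to join against a reference table that lists every hour of the day. The sketch below assumes a hypothetical hours table with a single column h; neither name comes from the original text, and this is not necessarily the approach taken in Recipe 12.10.

```sql
-- Hypothetical reference table: one row for each hour of the day.
CREATE TABLE hours (h TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO hours (h) VALUES
  (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
  (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);

-- LEFT JOIN keeps every row of hours, even when no mail record matches.
-- COUNT(mail.t) ignores the NULLs produced for unmatched hours, so those
-- hours show 0 messages; COALESCE turns the NULL sum into 0 as well.
SELECT hours.h AS hour,
       COUNT(mail.t) AS 'number of messages',
       COALESCE(SUM(mail.size), 0) AS 'number of bytes sent'
FROM hours LEFT JOIN mail ON HOUR(mail.t) = hours.h
GROUP BY hours.h
ORDER BY hours.h;
```

Counting mail.t rather than using COUNT(*) matters here: COUNT(*) would report 1 for each empty hour, because the outer join still produces one row for it.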
To produce a day-of-week summary instead, use the DAYOFWEEK( ) function:
mysql> SELECT DAYOFWEEK(t) AS weekday,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY weekday;
+---------+--------------------+----------------------+
| weekday | number of messages | number of bytes sent |
+---------+--------------------+----------------------+
|       1 |                  1 |                  271 |
|       2 |                  4 |              2500705 |
|       3 |                  4 |              1007190 |
|       4 |                  2 |                10907 |
|       5 |                  1 |                  873 |
|       6 |                  1 |                58274 |
|       7 |                  3 |               219965 |
+---------+--------------------+----------------------+
To make the output more meaningful, you might want to use DAYNAME( ) to display weekday names instead. However, because day names sort lexically (for example, "Tuesday" sorts after "Friday"), use DAYNAME( ) only for display purposes. Continue to group on the numeric day values so that output rows sort that way:
mysql> SELECT DAYNAME(t) AS weekday,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY DAYOFWEEK(t);
+-----------+--------------------+----------------------+
| weekday   | number of messages | number of bytes sent |
+-----------+--------------------+----------------------+
| Sunday    |                  1 |                  271 |
| Monday    |                  4 |              2500705 |
| Tuesday   |                  4 |              1007190 |
| Wednesday |                  2 |                10907 |
| Thursday  |                  1 |                  873 |
| Friday    |                  1 |                58274 |
| Saturday  |                  3 |               219965 |
+-----------+--------------------+----------------------+
A similar technique can be used for summarizing month-of-year categories that are sorted by numeric value but displayed by month name.
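A month-of-year version of the same idea might look like the following sketch for the mail table. The month_name alias is illustrative. The query also departs slightly from the pattern above: it lists both MONTH(t) and MONTHNAME(t) in the GROUP BY clause and adds an explicit ORDER BY, on the assumption that you may be running a server that enforces ONLY_FULL_GROUP_BY or that does not sort GROUP BY results implicitly (as is the case from MySQL 8.0 on).

```sql
-- Display month names, but sort by the numeric month value so that
-- rows appear in calendar order rather than alphabetical order.
SELECT MONTHNAME(t) AS month_name,
       COUNT(*) AS 'number of messages',
       SUM(size) AS 'number of bytes sent'
FROM mail
GROUP BY MONTH(t), MONTHNAME(t)
ORDER BY MONTH(t);
```

Because each numeric month maps to exactly one month name, adding MONTHNAME(t) to the GROUP BY clause does not change which rows fall into which category.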
Uses for temporal categorizations are plentiful:
- DATETIME or TIMESTAMP columns have the potential to contain many unique values. To produce daily summaries, strip off the time-of-day part so that all values occurring within a given day collapse to the same value. Any of the following GROUP BY clauses will do this, though the last one is likely to be slowest:
  GROUP BY FROM_DAYS(TO_DAYS(col_name))
  GROUP BY YEAR(col_name), MONTH(col_name), DAYOFMONTH(col_name)
  GROUP BY DATE_FORMAT(col_name,'%Y-%m-%e')
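- To produce monthly or quarterly sales reports, group by MONTH(col_name) or QUARTER(col_name) to place dates into the correct part of the year. If the data spans more than one year, group by YEAR(col_name) as well so that the same month or quarter from different years is not combined.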
- To summarize web server activity, put your server's logs into MySQL and run queries that collapse the records into different time categories. Chapter 18 discusses how to do this for Apache.
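To make the first two items concrete, here is a minimal sketch. The daily summary uses the mail table from earlier in this section with an illustrative msg_date alias. The quarterly report assumes a hypothetical sales table with sale_date and amount columns; those names are invented for the example and do not appear elsewhere in the text.

```sql
-- Daily summary: FROM_DAYS(TO_DAYS(t)) discards the time-of-day part,
-- so all messages sent on the same day fall into one category.
SELECT FROM_DAYS(TO_DAYS(t)) AS msg_date,
       COUNT(*) AS 'number of messages',
       SUM(size) AS 'number of bytes sent'
FROM mail
GROUP BY FROM_DAYS(TO_DAYS(t))
ORDER BY FROM_DAYS(TO_DAYS(t));

-- Quarterly report on a hypothetical sales table. Grouping on the
-- year as well keeps quarters from different years separate.
SELECT YEAR(sale_date) AS sale_year,
       QUARTER(sale_date) AS sale_quarter,
       COUNT(*) AS 'number of sales',
       SUM(amount) AS 'total amount'
FROM sales
GROUP BY YEAR(sale_date), QUARTER(sale_date)
ORDER BY YEAR(sale_date), QUARTER(sale_date);
```

Replacing QUARTER(sale_date) with MONTH(sale_date) in both the select list and the GROUP BY and ORDER BY clauses would produce the monthly version of the report.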