Hack 25. Keep a Running Total
You can keep track of a bunch of values and even produce a running total, but your database might break a sweat.
Many times you might need a running total: to determine an account balance, to analyze sales reports, and so on. Check out Merle's bank statement in Table 5-2.
date | description | in | out | balance |
---|---|---|---|---|
1 Nov 2006 | Wages | 50 | 50 | |
2 Nov 2006 | Company Store | 10 | 40 | |
3 Nov 2006 | Company Store | 10 | 30 | |
4 Nov 2006 | Company Store | 10 | 20 | |
5 Nov 2006 | Company Store | 10 | 10 | |
6 Nov 2006 | Company Store | 10 | 0 | |
7 Nov 2006 | Company Store | 10 | 10 |
The balance column shows a running total. The balance column should not be stored in the database (it would introduce redundancy), and it should be calculated only when it is needed. The source data for this statement is the sequence of financial transactions shown in Table 5-3, consisting of one deposit of $50 and six withdrawals of $10 each.
whn | description | amount |
---|---|---|
1 Nov 2006 | Wages | 50 |
2 Nov 2006 | Company Store | 10 |
3 Nov 2006 | Company Store | 10 |
4 Nov 2006 | Company Store | 10 |
5 Nov 2006 | Company Store | 10 |
6 Nov 2006 | Company Store | 10 |
You can calculate the running total by joining this table to itself. Let's call the two versions of the transact table x and y. The x version of TRansact produces the lines of output shown in the bank statement. The y version is used to accumulate all transactions that occurred on or before the x date:
mysql> SELECT x.whn, x.description, x.amount, SUM(y.amount) AS balance -> FROM transact x JOIN transact y ON (x.whn>= y.whn) -> GROUP BY x.whn, x.description, x.amount; +------------+---------------+--------+---------+ | whn | description | amount | balance | +------------+---------------+--------+---------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 40 | | 2006-11-03 | Company Store | -10 | 30 | | 2006-11-04 | Company Store | -10 | 20 | | 2006-11-05 | Company Store | -10 | 10 | | 2006-11-06 | Company Store | -10 | 0 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+---------+ 7 rows in set (0.00 sec)
It can be easier to figure out what is going here if you try that query using ORDER BY rather than GROUP BY:
mysql> SELECT x.whn, x.description, x.amount, y.amount AS y -> FROM transact x JOIN transact y ON (x.whn >= y.whn) -> ORDER BY x.whn, y.whn; +------------+---------------+--------+------+ | whn | description | amount | y | +------------+---------------+--------+------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 50 | | 2006-11-02 | Company Store | -10 | -10 | | 2006-11-03 | Company Store | -10 | 50 | | 2006-11-03 | Company Store | -10 | -10 | | 2006-11-03 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | 50 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | 50 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | 50 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | 50 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+------+ 28 rows in set (0.00 sec)
You can see that the first day has one row; the second day has two rows, and so forth. That is because one row from the y table passes the join condition (x.whn >= y.whn) for the first x day; two y rows match the second x day, and so on. Thus, the sum of the y column restricted to a particular date gives the running total for that date.
You can bundle the repeated x rows using the GROUP BY clause. This means that each x row shows up only once; the y values get aggregated.
In banking, money received (IN) often appears in one column and money paid (OUT) in another. If you want to separate the IN and OUT columns for the positive and negative values it is easiest to wrap up this query into another. You can use a CASE expression to output either a number or a blank string:
mysql> SELECT w AS dte, d AS description, -> CASE WHEN (a>=0) THEN a ELSE NULL END AS moneyIn, -> CASE WHEN (a<0) THEN a ELSE NULL END AS moneyOut, -> balance FROM -> (SELECT x.whn AS w, x.description AS d, -> x.amount AS a, SUM(y.amount) AS balance -> FROM transact x JOIN transact y ON (x.whn>=y.whn) -> GROUP BY x.whn, x.description, x.amount) t; +------------+---------------+---------+----------+---------+ | dte | description | moneyIn | moneyOut | balance | +------------+---------------+---------+----------+---------+ | 2006-11-01 | Wages | 50 | NULL | 50 | | 2006-11-02 | Company Store | NULL | -10 | 40 | | 2006-11-03 | Company Store | NULL | -10 | 30 | | 2006-11-04 | Company Store | NULL | -10 | 20 | | 2006-11-05 | Company Store | NULL | -10 | 10 | | 2006-11-06 | Company Store | NULL | -10 | 0 | | 2006-11-07 | Company Store | NULL | -10 | -10 | +------------+---------------+---------+----------+---------+
|
5.2.1. Hacking the Hack
To tackle the performance issue you can use variables and a cursor to calculate a running total in SQL Server and MySQL. You can use a similar technique with PL/SQL in Oracle. The syntax is different for each platform.
5.2.1.1. SQL Server
This example assumes that there is a column in the TRansact table to hold the running total. This column is called runtot and it gets updated as the cursor loops over each row:
DECLARE @accumulator INTEGER DECLARE @amount INTEGER DECLARE @cur CURSOR SET @cur=CURSOR FOR SELECT amount FROM transact FOR UPDATE OF runtot OPEN @cur FETCH NEXT FROM @cur INTO @amount SET @accumulator=0 WHILE @@FETCH_STATUS = 0 BEGIN SET @accumulator = @accumulator+@amount UPDATE transact SET runtot = @accumulator WHERE CURRENT of @cur FETCH NEXT FROM @cur INTO @amount END
5.2.1.2. MySQL
In MySQL, you can update and reference a variable from within a SELECT statement:
mysql> SELECT whn, -> description, -> amount, -> @accumulator:=@accumulator+amount RunningTotal -> FROM transact; +------------+---------------+--------+--------------+ | whn | description | amount | RunningTotal | +------------+---------------+--------+--------------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 40 | | 2006-11-03 | Company Store | -10 | 30 | | 2006-11-04 | Company Store | -10 | 20 | | 2006-11-05 | Company Store | -10 | 10 | | 2006-11-06 | Company Store | -10 | 0 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+--------------+
5.2.1.3. Oracle
Oracle has a neat extension to the windowing functions. You can use the OVER clause in conjunction with an aggregate function such as SUM. This makes efficient calculation of running totals a breeze:
SQL> SELECT whn,amount,SUM(amount) OVER (ORDER BY whn) 2 FROM transact; WHN AMOUNT SUM(AMOUNT)OVER(ORDERBYWHN) --------------- ---------- --------------------------- 01-NOV-06 50 50 02-NOV-06 -10 40 03-NOV-06 -10 30 04-NOV-06 -10 20 05-NOV-06 -10 10 06-NOV-06 -10 0 07-NOV-06 -10 -10 7 rows selected.
5.2.2. See Also
- "Calculate Rank" [Hack #40]