Hack 25. Keep a Running Total

You can keep track of a bunch of values and even produce a running total, but your database might break a sweat.

Many times you might need a running total: to determine an account balance, to analyze sales reports, and so on. Check out Merle's bank statement in Table 5-2.

Table 5-2. Bank statement with running total

date description in out balance
1 Nov 2006 Wages 50   50
2 Nov 2006 Company Store   10 40
3 Nov 2006 Company Store   10 30
4 Nov 2006 Company Store   10 20
5 Nov 2006 Company Store   10 10
6 Nov 2006 Company Store   10 0
7 Nov 2006 Company Store   10 10

The balance column shows a running total. The balance column should not be stored in the database (it would introduce redundancy), and it should be calculated only when it is needed. The source data for this statement is the sequence of financial transactions shown in Table 5-3, consisting of one deposit of $50 and six withdrawals of $10 each.

Table 5-3. The transact table

whn description amount
1 Nov 2006 Wages 50
2 Nov 2006 Company Store 10
3 Nov 2006 Company Store 10
4 Nov 2006 Company Store 10
5 Nov 2006 Company Store 10
6 Nov 2006 Company Store 10

You can calculate the running total by joining this table to itself. Let's call the two versions of the transact table x and y. The x version of TRansact produces the lines of output shown in the bank statement. The y version is used to accumulate all transactions that occurred on or before the x date:

mysql> SELECT x.whn, x.description, x.amount, SUM(y.amount) AS balance -> FROM transact x JOIN transact y ON (x.whn>= y.whn) -> GROUP BY x.whn, x.description, x.amount; +------------+---------------+--------+---------+ | whn | description | amount | balance | +------------+---------------+--------+---------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 40 | | 2006-11-03 | Company Store | -10 | 30 | | 2006-11-04 | Company Store | -10 | 20 | | 2006-11-05 | Company Store | -10 | 10 | | 2006-11-06 | Company Store | -10 | 0 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+---------+ 7 rows in set (0.00 sec)

It can be easier to figure out what is going here if you try that query using ORDER BY rather than GROUP BY:

mysql> SELECT x.whn, x.description, x.amount, y.amount AS y -> FROM transact x JOIN transact y ON (x.whn >= y.whn) -> ORDER BY x.whn, y.whn; +------------+---------------+--------+------+ | whn | description | amount | y | +------------+---------------+--------+------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 50 | | 2006-11-02 | Company Store | -10 | -10 | | 2006-11-03 | Company Store | -10 | 50 | | 2006-11-03 | Company Store | -10 | -10 | | 2006-11-03 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | 50 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-04 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | 50 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-05 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | 50 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-06 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | 50 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+------+ 28 rows in set (0.00 sec)

You can see that the first day has one row; the second day has two rows, and so forth. That is because one row from the y table passes the join condition (x.whn >= y.whn) for the first x day; two y rows match the second x day, and so on. Thus, the sum of the y column restricted to a particular date gives the running total for that date.

You can bundle the repeated x rows using the GROUP BY clause. This means that each x row shows up only once; the y values get aggregated.

In banking, money received (IN) often appears in one column and money paid (OUT) in another. If you want to separate the IN and OUT columns for the positive and negative values it is easiest to wrap up this query into another. You can use a CASE expression to output either a number or a blank string:

mysql> SELECT w AS dte, d AS description, -> CASE WHEN (a>=0) THEN a ELSE NULL END AS moneyIn, -> CASE WHEN (a<0) THEN a ELSE NULL END AS moneyOut, -> balance FROM -> (SELECT x.whn AS w, x.description AS d, -> x.amount AS a, SUM(y.amount) AS balance -> FROM transact x JOIN transact y ON (x.whn>=y.whn) -> GROUP BY x.whn, x.description, x.amount) t; +------------+---------------+---------+----------+---------+ | dte | description | moneyIn | moneyOut | balance | +------------+---------------+---------+----------+---------+ | 2006-11-01 | Wages | 50 | NULL | 50 | | 2006-11-02 | Company Store | NULL | -10 | 40 | | 2006-11-03 | Company Store | NULL | -10 | 30 | | 2006-11-04 | Company Store | NULL | -10 | 20 | | 2006-11-05 | Company Store | NULL | -10 | 10 | | 2006-11-06 | Company Store | NULL | -10 | 0 | | 2006-11-07 | Company Store | NULL | -10 | -10 | +------------+---------------+---------+----------+---------+

The SQL statements here are not very efficient. It is possible to perform these calculations in linear time, but SQL performs them in quadratic time.

For example, to work out the running total on seven days you need to add only seven numbers. The self-joins used here need zero additions for the first day, one for the second day, two for the third day, and so onthat works out to 21 addition operations in total, and if you had 10 times as many rows you would need 100 times as many sums.

If you have a few hundred rows to deal with you will probably not even notice the cost; but if you have a thousand rows it will start to hurt.

 

5.2.1. Hacking the Hack

To tackle the performance issue you can use variables and a cursor to calculate a running total in SQL Server and MySQL. You can use a similar technique with PL/SQL in Oracle. The syntax is different for each platform.

5.2.1.1. SQL Server

This example assumes that there is a column in the TRansact table to hold the running total. This column is called runtot and it gets updated as the cursor loops over each row:

DECLARE @accumulator INTEGER DECLARE @amount INTEGER DECLARE @cur CURSOR SET @cur=CURSOR FOR SELECT amount FROM transact FOR UPDATE OF runtot OPEN @cur FETCH NEXT FROM @cur INTO @amount SET @accumulator=0 WHILE @@FETCH_STATUS = 0 BEGIN SET @accumulator = @accumulator+@amount UPDATE transact SET runtot = @accumulator WHERE CURRENT of @cur FETCH NEXT FROM @cur INTO @amount END

 

5.2.1.2. MySQL

In MySQL, you can update and reference a variable from within a SELECT statement:

mysql> SELECT whn, -> description, -> amount, -> @accumulator:=@accumulator+amount RunningTotal -> FROM transact; +------------+---------------+--------+--------------+ | whn | description | amount | RunningTotal | +------------+---------------+--------+--------------+ | 2006-11-01 | Wages | 50 | 50 | | 2006-11-02 | Company Store | -10 | 40 | | 2006-11-03 | Company Store | -10 | 30 | | 2006-11-04 | Company Store | -10 | 20 | | 2006-11-05 | Company Store | -10 | 10 | | 2006-11-06 | Company Store | -10 | 0 | | 2006-11-07 | Company Store | -10 | -10 | +------------+---------------+--------+--------------+

 

5.2.1.3. Oracle

Oracle has a neat extension to the windowing functions. You can use the OVER clause in conjunction with an aggregate function such as SUM. This makes efficient calculation of running totals a breeze:

SQL> SELECT whn,amount,SUM(amount) OVER (ORDER BY whn) 2 FROM transact; WHN AMOUNT SUM(AMOUNT)OVER(ORDERBYWHN) --------------- ---------- --------------------------- 01-NOV-06 50 50 02-NOV-06 -10 40 03-NOV-06 -10 30 04-NOV-06 -10 20 05-NOV-06 -10 10 06-NOV-06 -10 0 07-NOV-06 -10 -10 7 rows selected.

 

5.2.2. See Also

Категории