Hack 69. Execute Functions in the Database

Should you execute functions in your program or in the database system?

SQL is really quite expressive. It has functions for manipulating text, numbers, and dates into a variety of formats, combining columns, and splitting up column data. But when you're developing client-server applications [Hack #2], you have a choice: should you do this in the database system or in your programming language? The intuitive answer is to do it in the programming language. Your program will likely be able to execute the functions you require faster than the database can. However, processing in the database system can bring its own advantages, particularly when it saves a round trip across the network.

Always remember that the database system runs constantly, and so it can hold things in memory if it considers there to be a performance advantage in doing so. Repeatedly applying the same functions to the same data may therefore lead to the result set being cached. If the result is cached, the functions are not executed every time you call them; the database system simply returns the cached copy.

Suppose you were querying the database to return 10 rows of data, just so that you can add them together in your code. The marshalling of columns for transport, sending them to your code, and then splitting them up again all require CPU time and effort. You could perform the addition in the database system and save all that code. If lots of code needs those rows added, you could put that processing into a view. This might be more reliable. It would certainly promote code reuse, and would offer a single point of control so that when the database schema is altered, you've just got to modify the view.
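As a minimal sketch of the view idea, the following uses Python's standard `sqlite3` module with an in-memory database standing in for the real server. The view name `widetable_total` is hypothetical, and the table is the `widetable` used in the examples in this hack; on MySQL the `CREATE VIEW` statement would look much the same.

```python
import sqlite3


def build_view_demo() -> sqlite3.Connection:
    """Create the example table plus a view that does the addition."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database server
    conn.execute("CREATE TABLE widetable (a INT, b INT, c INT, d INT, e INT)")
    conn.execute("INSERT INTO widetable VALUES (5, 10, 19, 11, 3)")
    # Hypothetical view name: the addition is defined once, in the database.
    conn.execute(
        "CREATE VIEW widetable_total AS "
        "SELECT a + b + c + d + e AS v FROM widetable"
    )
    return conn


def fetch_view_totals(conn: sqlite3.Connection) -> list:
    """Client code only names the view; it never sees columns a..e."""
    return [row[0] for row in conn.execute("SELECT v FROM widetable_total")]


print(fetch_view_totals(build_view_demo()))  # [48]
```

If the schema changes, only the view definition needs to change; every caller of `fetch_view_totals` keeps working as long as the view still exposes `v`.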

Put another way, the choice comes down to this:

    mysql> CREATE TABLE widetable (a INT,b INT, c INT, d INT, e INT);
    Query OK, 0 rows affected (0.00 sec)

    mysql> INSERT INTO widetable VALUES (5,10,19,11,3);
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT a,b,c,d,e FROM widetable;
    +------+------+------+------+------+
    | a    | b    | c    | d    | e    |
    +------+------+------+------+------+
    |    5 |   10 |   19 |   11 |    3 |
    +------+------+------+------+------+
    1 row in set (0.00 sec)

Then, in your programming language, do (5+10+19+11+3), throw away a,b,c,d,e, and just use the value 48.

or this:

    mysql> SELECT a+b+c+d+e AS v FROM widetable;
    +------+
    | v    |
    +------+
    |   48 |
    +------+
    1 row in set (0.00 sec)

In addition to arithmetic, other calculations may best be done in the database. As the database system processes each row, it can execute the calculation and return just the result. When the calculation is in your code, it has to store all the elements of the calculation separately and pass them all to your code, at which point you calculate the result and discard all the elements.
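The difference in how much data crosses the boundary can be sketched as follows. This is an illustration only: it uses Python's standard `sqlite3` module with an in-memory database rather than a networked MySQL server, the two extra rows are made up for the example, and the function names are hypothetical.

```python
import sqlite3


def make_widetable() -> sqlite3.Connection:
    """Build the example table; rows two and three are invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE widetable (a INT, b INT, c INT, d INT, e INT)")
    conn.executemany(
        "INSERT INTO widetable VALUES (?, ?, ?, ?, ?)",
        [(5, 10, 19, 11, 3), (1, 2, 3, 4, 5), (0, 0, 0, 0, 7)],
    )
    return conn


def totals_in_client(conn):
    """Fetch every column, add them up in the program, discard the parts."""
    rows = conn.execute(
        "SELECT a, b, c, d, e FROM widetable ORDER BY rowid"
    ).fetchall()
    values_received = sum(len(row) for row in rows)
    return [sum(row) for row in rows], values_received


def totals_in_database(conn):
    """Let the database add the columns and return only the result."""
    rows = conn.execute(
        "SELECT a + b + c + d + e AS v FROM widetable ORDER BY rowid"
    ).fetchall()
    values_received = sum(len(row) for row in rows)
    return [row[0] for row in rows], values_received


conn = make_widetable()
print(totals_in_client(conn))    # ([48, 15, 7], 15)
print(totals_in_database(conn))  # ([48, 15, 7], 3)
```

Both approaches give the same totals, but the client-side version receives five values per row where the database-side version receives one.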

Database functions can be especially useful when dealing with dates. The clock in your database is likely to be different from the clock where your application is running. If the database is on a different machine, its clock might be ahead of or behind the machine running your programs. This inconsistency can cause you real problems. In general, you should rely on the clock in the database, because data may be coming in from applications running on several different machines, each of which has its own idea of the current time.

In many operating systems, the clock is really "seconds since epoch," which is an integer. Programming languages often use this too, as it is much easier to do date arithmetic on an integer than on a date string. It is tempting to save the integer in the database when you want to record a time or date, because your code can handle this number natively. For example, here is the code in Linux:

    # date
    Sun Sep 3 09:49:03 BST 2006
    # date +%s
    1157273339

So, if you were storing log information in a database log table, and you decided to use epoch-style information, you could end up with this:

INSERT INTO log (event,whn) VALUES ('event detected',1157273339);

But epoch numbers mean nothing to the database system. To convert epoch numbers back to a date you may need an external programming language. Ad hoc queries for particular dates are greatly simplified if you use real dates in the database, but awkward if you use an integer representation. Your database system also has what it believes is the current time, and to avoid confusion caused by differences between "program" time and "database" time, you should always try to use database time. In Oracle, MySQL, and PostgreSQL, you access this using CURRENT_TIMESTAMP:

INSERT INTO log (event,whn) VALUES ('event detected',CURRENT_TIMESTAMP);
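From application code, the same idea means passing only the event text as a parameter and letting the database supply the time. The sketch below assumes Python's standard `sqlite3` module as a stand-in for the real server (SQLite also accepts `CURRENT_TIMESTAMP`, which it stores as UTC text); the helper name `log_event` is hypothetical.

```python
import sqlite3
from datetime import datetime


def make_log_table() -> sqlite3.Connection:
    """Create a log table shaped like the one in the INSERT above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (event TEXT, whn TEXT)")
    return conn


def log_event(conn: sqlite3.Connection, event: str) -> None:
    """Record an event; the timestamp comes from the database, not the client."""
    conn.execute(
        "INSERT INTO log (event, whn) VALUES (?, CURRENT_TIMESTAMP)",
        (event,),
    )


conn = make_log_table()
log_event(conn, "event detected")
event, whn = conn.execute("SELECT event, whn FROM log").fetchone()
# SQLite returns the timestamp as 'YYYY-MM-DD HH:MM:SS' text.
print(event, datetime.strptime(whn, "%Y-%m-%d %H:%M:%S"))
```

Note that the program never calls its own clock, so clients on machines with drifting clocks still produce consistently ordered log rows.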

On SQL Server, you can use GETDATE().

If you want to see what was logged in the last five minutes and you are using epoch time, you have to determine the current epoch number (such as 4561145), work out what five minutes is in seconds (5 * 60 = 300), and then run the following:

SELECT eventname FROM log WHERE eventdate BETWEEN 4561145-300 AND 4561145;

With database dates, you can use a longer but much clearer query (shown here with MySQL's interval syntax):

SELECT eventname FROM log WHERE eventdate BETWEEN CURRENT_TIMESTAMP - INTERVAL 5 MINUTE AND CURRENT_TIMESTAMP;
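Interval syntax varies between database systems, so treat the following as a sketch rather than a portable recipe. It runs the "last five minutes" query against SQLite through Python's standard `sqlite3` module, where the equivalent of the interval arithmetic is `datetime('now', '-5 minutes')`. The two sample rows and the helper names are made up for the example.

```python
import sqlite3


def make_event_log() -> sqlite3.Connection:
    """Create a log with one stale row and one recent row (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (eventname TEXT, eventdate TEXT)")
    # Both timestamps are computed by the database relative to its own clock.
    conn.execute(
        "INSERT INTO log VALUES ('old event', datetime('now', '-10 minutes'))"
    )
    conn.execute(
        "INSERT INTO log VALUES ('recent event', datetime('now', '-1 minutes'))"
    )
    return conn


def recent_events(conn: sqlite3.Connection) -> list:
    """Return events from the last five minutes, by the database's clock."""
    query = (
        "SELECT eventname FROM log "
        "WHERE eventdate BETWEEN datetime('now', '-5 minutes') "
        "AND datetime('now')"
    )
    return [row[0] for row in conn.execute(query)]


print(recent_events(make_event_log()))  # ['recent event']
```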

So, always use CURRENT_DATE, CURRENT_TIMESTAMP, or the equivalent, and learn to translate epoch numbers between the database representation and the programming language. The quality and readability of your data and queries will improve.
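Translating between the two representations on the program side can be sketched like this. The sketch assumes the database hands back timestamps as `YYYY-MM-DD HH:MM:SS` text in UTC; check what your own driver and server actually return. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed text layout for timestamps coming from the database.
DB_FORMAT = "%Y-%m-%d %H:%M:%S"


def epoch_to_db_datetime(epoch_seconds: int) -> str:
    """Convert seconds since the epoch to a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(DB_FORMAT)


def db_datetime_to_epoch(text: str) -> int:
    """Convert a UTC timestamp string back to seconds since the epoch."""
    parsed = datetime.strptime(text, DB_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(epoch_to_db_datetime(1157273339))             # 2006-09-03 08:48:59
print(db_datetime_to_epoch("2006-09-03 08:48:59"))  # 1157273339
```

The result is in UTC, one hour behind the BST time printed by the `date` command earlier.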
