Stored Programs as an Alternative to Expensive SQL
Sometimes we can use a stored program to perform query or DML operations that perform badly in standard SQL. This usually happens when the "pure" SQL statement becomes overly complex because of limitations in the SQL syntax or when the MySQL optimizer isn't able to come up with a sensible plan for your SQL query. In this section we offer two scenarios in which a stored program can be expected to outperform a SQL statement that executes the same logical steps.
22.4.1. Avoid Self-Joins with Procedural Logic
One situation in which a stored program might offer a better solution is where you are forced to construct a query that joins a table to itself in order to filter for the required rows. For instance, in Example 22-5, we issue a SQL statement that retrieves the most valuable order for each customer over the past few months.
Example 22-5. Finding the maximum sale for each customer
SELECT s.customer_id,s.product_id,s.quantity, s.sale_value FROM sales s, (SELECT customer_id,max(sale_value) max_sale_value FROM sales GROUP BY customer_id) t WHERE t.customer_id=s.customer_id AND t.max_sale_value=s.sale_value AND s.sale_date>date_sub(currdate( ),interval 6 month); |
This is an expensive SQL statement, partially because we first need to create a temporary table to hold the customer ID and maximum sale value and then join that back to the sales table to find the full details for each of those rows.
MySQL doesn't provide SQL syntax that would allow us to return this data without an expensive self-join. However, we can use a stored program to retrieve the data in a single pass through the sales table. Example 22-6 shows a stored program that stores maximum sales for each customer into a temporary table (max_sales_by_customer) from which we can later select the results.
Example 22-6. Stored program to return maximum sales for each customer over the last 6 months
1 CREATE PROCEDURE sp_max_sale_by_cust( ) 2 MODIFIES SQL DATA 3 BEGIN 4 DECLARE last_sale INT DEFAULT 0; 5 DECLARE l_last_customer_id INT DEFAULT -1; 6 DECLARE l_customer_id INT; 7 DECLARE l_product_id INT; 8 DECLARE l_quantity INT; 9 DECLARE l_sale_value DECIMAL(8,2); 10 DECLARE counter INT DEFAULT 0; 11 12 DECLARE sales_csr CURSOR FOR 13 SELECT customer_id,product_id,quantity, sale_value 14 FROM sales 15 WHERE sale_date>date_sub(currdate( ),interval 6 month) 16 ORDER BY customer_id,sale_value DESC; 17 18 DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_sale=1; 19 20 OPEN sales_csr; 21 sales_loop: LOOP 22 FETCH sales_csr INTO l_customer_id,l_product_id,l_quantity,l_sale_value; 23 IF (last_sale=1) THEN 24 LEAVE sales_loop; 25 END IF; 26 27 IF l_customer_id <> l_last_customer_id THEN 28 /* This is a new customer so first row will be max sale*/ 29 INSERT INTO max_sales_by_customer 30 (customer_id,product_id,quantity,sale_value) 31 VALUES(l_customer_id,l_product_id,l_quantity,l_sale_value); 32 END IF; 33 34 SET l_last_customer_id=l_customer_id; 35 36 END LOOP; 37 38 END; |
Let's look at the most significant lines in this program:
Line(s) |
Explanation |
---|---|
12 |
Declare a cursor that will return sales for the past 6 months ordered by customer_id and then by descending sale_value. |
27-32 |
Check to see whether we have encountered a new customer_id. The first row for any given customer will be the maximum sale for that customer, so we insert that row into a temporary table (line 30). |
The stored program is significantly faster than the standard SQL solution. Figure 22-3 shows the elapsed time for the two solutions.
Figure 22-3. Using a stored program to optimize a complex self-join
22.4.2. Optimize Correlated Updates
A correlated update is an UPDATE statement that contains a correlated subquery in the SET clause and/or WHERE clause. Correlated updates are often good candidates for optimization through procedural execution. In Example 22-7 we have an UPDATE statement that updates all customers who are also employees, and assigns the employee's manager as their sales representative.
Example 22-7. Correlated UPDATE statement
UPDATE customers c SET sales_rep_id = (SELECT manager_id FROM employees WHERE surname = c.contact_surname AND firstname = c.contact_firstname AND date_of_birth = c.date_of_birth) WHERE (contact_surname, contact_firstname, date_of_birth) IN (SELECT surname, firstname, date_of_birth FROM employees and ); |
Note that the UPDATE statement needs to access the employees table twice: once to identify customers who are employees and again to find the manager's identifier for those employees.
Example 22-8 offers a stored program that provides an alternative to the correlated update. The stored program identifies those customers who are also employees using a cursor. For each of the customers retrieved by the cursor, an UPDATE is issued.
Example 22-8. Stored program alternative to the correlated update
CREATE PROCEDURE sp_correlated_update( ) MODIFIES SQL DATA BEGIN DECLARE last_customer INT DEFAULT 0; DECLARE l_customer_id INT ; DECLARE l_manager_id INT; DECLARE cust_csr CURSOR FOR select c.customer_id,e.manager_id from customers c, employees e where e.surname=c.contact_surname and e.firstname=c.contact_firstname and e.date_of_birth=c.date_of_birth; DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_customer=1; OPEN cust_csr; cust_loop: LOOP FETCH cust_csr INTO l_customer_id,l_manager_id; IF (last_customer=1) THEN LEAVE cust_loop; END IF; UPDATE customers SET sales_rep_id=l_manager_id WHERE customer_id=l_customer_id; END LOOP; END; |
Because the stored program does not have to do two separate accesses of the customers table, it is significantly faster than the standard SQL. Figure 22-4 compares the performance of the two approaches.
Figure 22-4. Performance of a correlated update and stored program alternative