Trigger Overhead
Every database trigger is associated with a specific DML operation (INSERT, UPDATE, or DELETE) on a specific table; the trigger code will execute whenever that DML operation occurs on that table. Furthermore, all MySQL 5.0 triggers are of the FOR EACH ROW type, which means that the trigger code will execute once for each row affected by the DML operation. Given that a single DML operation might potentially affect thousands of rows, should we be concerned that our triggers might have a negative effect on DML performance? Absolutely!
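The row-level firing is easy to observe with a session variable used as a crude counter. The following sketch uses a throwaway table and trigger whose names are purely illustrative, not part of the examples in this chapter:

```sql
-- Illustrative only: count how many times a FOR EACH ROW trigger fires.
CREATE TABLE t (n INT);

CREATE TRIGGER t_bi_trg BEFORE INSERT ON t
    FOR EACH ROW
    SET @fired = @fired + 1;

SET @fired = 0;
INSERT INTO t VALUES (1), (2), (3);  -- one DML statement, three rows
SELECT @fired;                       -- returns 3: the trigger ran once per row
```

A single multi-row INSERT therefore pays the trigger cost once per row, which is why trigger overhead scales with the number of rows affected rather than the number of statements issued.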
For all of the reasons outlined previously, triggers can significantly increase the amount of time taken to execute DML operations and can have a detrimental effect on overall application performance if trigger overhead is not carefully managed.
The overhead of a trigger itself is significant, though not unmanageable. For instance, consider the trigger shown in Example 22-21; this trivial trigger serves no purpose, but it allows us to measure the overhead of a trigger that does virtually nothing.
Example 22-21. "Trivial" trigger
CREATE TRIGGER sales_bi_trg BEFORE INSERT ON sales
    FOR EACH ROW
    SET @x=NEW.sale_value;
When we implemented this trivial trigger, the time taken to insert 100,000 sales rows increased from 8.84 seconds to 12.9 seconds, an increase of about 45%. So even the simplest of triggers adds a significant, though bearable, overhead.
But what about a complex trigger? In Chapter 11, we created a set of triggers to maintain a sales summary table. One of the triggers we created is the BEFORE INSERT trigger, shown in Example 22-22.
Example 22-22. A more complex trigger
CREATE TRIGGER sales_bi_trg BEFORE INSERT ON sales
    FOR EACH ROW
BEGIN
    DECLARE row_count INTEGER;

    SELECT COUNT(*)
      INTO row_count
      FROM customer_sales_totals
     WHERE customer_id=NEW.customer_id;

    IF row_count > 0 THEN
        UPDATE customer_sales_totals
           SET sale_value=sale_value+NEW.sale_value
         WHERE customer_id=NEW.customer_id;
    ELSE
        INSERT INTO customer_sales_totals (customer_id,sale_value)
            VALUES(NEW.customer_id,NEW.sale_value);
    END IF;
END
This trigger checks to see if there is an existing row for the customer in the summary table and, if there is, updates that row; otherwise, it adds a new row. Since we are performing a single additional update or insert for every row inserted, we do expect an increase in our INSERT overhead. However, we might not expect that the time taken to insert 10,000 rows increases from 0.722 second to 64.36 seconds, almost 100 times more!
The problem with our trigger is obvious on reflection. The SQL that checks for a matching row is not supported by an index, so for every row inserted into sales, we are performing a full scan of customer_sales_totals. This is not a huge table, but these scans are performed for every row inserted, so the overhead adds up rapidly. Furthermore, the UPDATE statement is also not supported by an index, so a second scan of the customer_sales_totals table is performed to support the UPDATE.
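One way to confirm the missing index is to run EXPLAIN against the lookup that the trigger performs, substituting a literal customer ID for NEW.customer_id. The statement below is a sketch; the exact row counts in the output will depend on your data:

```sql
EXPLAIN
SELECT COUNT(*)
  FROM customer_sales_totals
 WHERE customer_id = 2;   -- stand-in for NEW.customer_id

-- Without a usable index, the "type" column of the EXPLAIN output
-- shows ALL, indicating a full table scan for every inserted row.
```

The same check applied to the trigger's UPDATE statement would reveal the second full scan described above.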
The solution is to create an index on customer_sales_totals.customer_id, as shown in Example 22-23.
Example 22-23. Index to support our trigger
CREATE UNIQUE INDEX customer_sales_totals_cust_id
    ON customer_sales_totals(customer_id)
Once the index is created, the performance improves: time to insert 10,000 rows is reduced to about 4.26 seconds, which, although much slower than the performance we achieved without a trigger, is certainly more acceptable than 64 seconds. Performance variations are shown in Figure 22-11.
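Incidentally, once the unique index exists, MySQL also allows the trigger body's SELECT-then-UPDATE-or-INSERT logic to be collapsed into a single statement with INSERT ... ON DUPLICATE KEY UPDATE. The variant below is a sketch of that approach, not the version benchmarked above:

```sql
-- Sketch: relies on the unique index on customer_sales_totals.customer_id
-- (Example 22-23) to detect the duplicate key.
CREATE TRIGGER sales_bi_trg BEFORE INSERT ON sales
    FOR EACH ROW
    INSERT INTO customer_sales_totals (customer_id, sale_value)
        VALUES (NEW.customer_id, NEW.sale_value)
        ON DUPLICATE KEY UPDATE sale_value = sale_value + NEW.sale_value;
```

This removes the per-row SELECT entirely, leaving a single indexed insert-or-update per inserted sales row.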
Figure 22-11. Trigger performance variations
The lesson here is this: since the trigger code will execute once for every row affected by a DML statement, the trigger can easily become the most significant factor in DML performance. Code inside the trigger body needs to be as lightweight as possible and, in particular, any SQL statements in the trigger should be supported by indexes whenever possible.