Tuning DML (INSERT, UPDATE, DELETE)
The first principle for optimizing UPDATE, DELETE, and INSERT statements is to optimize any WHERE clause conditions used to find the rows to be manipulated or inserted. The DELETE and UPDATE statements may contain WHERE clauses, and the INSERT statement may contain SQL that defines the data to be inserted. Ensure that these WHERE clauses are efficient, perhaps by creating appropriate concatenated indexes.
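For example, in the following sketch (which assumes a hypothetical transactions table with customer_id, txn_date, and status columns, not one from this chapter), a concatenated index on the two WHERE-clause columns allows the UPDATE to locate its target rows without scanning the entire table:

/* Hypothetical table and columns, shown only to illustrate the principle:
   the index covers both columns referenced in the WHERE clause. */
CREATE INDEX i_transactions_cust_date
    ON transactions (customer_id, txn_date);

UPDATE transactions
   SET status = 'ARCHIVED'
 WHERE customer_id = 1001
   AND txn_date < '2005-01-01';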
The second principle for optimizing DML performance is to avoid creating too many indexes. Whenever a row is inserted or deleted, updates must occur to every index that exists against the table. These indexes exist to improve query performance, but bear in mind that each index also results in overhead when the row is created or deleted. For updates, only the indexes that reference the specific columns being modified need to be updated.
21.5.1. Batching Inserts
The MySQL language allows more than one row to be inserted in a single INSERT operation. For instance, the statement in Example 21-21 inserts five rows into the clickstream_log table in a single call.
Example 21-21. Batch INSERT statement
INSERT INTO clickstream_log (url, timestamp, source_ip)
VALUES ('http://dev.mysql.com/downloads/mysql/5.0.html', '2005-02-10 11:46:23', '192.168.34.87'),
       ('http://dev.mysql.com/downloads/mysql/4.1.html', '2005-02-10 11:46:24', '192.168.35.78'),
       ('http://dev.mysql.com', '2005-02-10 11:46:24', '192.168.35.90'),
       ('http://www.mysql.com/bugs', '2005-02-10 11:46:25', '192.168.36.07'),
       ('http://dev.mysql.com/downloads/mysql/5.1.html', '2005-02-10 11:46:25', '192.168.36.12');
Batching INSERT operations in this way can radically improve performance. Figure 21-10 shows how the time taken to insert 10,000 rows into the table decreases as we increase the number of rows included within each INSERT statement. Inserting one row at a time, it took about 384 seconds to insert the rows. When inserting 100 rows at a time, we were able to add the same number of rows in only 7 seconds.
Figure 21-10. Performance improvement from multirow inserts
21.5.2. Optimizing DML by Reducing Commit Frequency
If we are using a transactional storage engine (for instance, if our tables use the InnoDB engine), we should make sure that we are committing changes to the database only when necessary. Excessive commits will degrade performance.
By default, MySQL will issue an implicit commit after every SQL statement. When a commit occurs, a storage engine like InnoDB will write a record to its transaction log on disk to ensure that the transaction is persistent (i.e., to ensure that the transaction will not be lost if MySQL or our program crashes). These transaction log writes involve a physical I/O to the disk and therefore always add to our response time.
We can prevent this automatic commit behavior by issuing the SET AUTOCOMMIT=0 statement and/or by issuing a START TRANSACTION statement before issuing our statements. We can then issue a COMMIT statement at regular intervals, reducing the number of writes to the transaction log that will be required. (Note, though, that MySQL will occasionally write to the transaction log anyway when memory buffers require flushing.)
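Here is a minimal sketch of the pattern, reusing the clickstream_log table from Example 21-21 (the specific statements and the commit point are ours, chosen only for illustration):

/* Disable the implicit commit after every statement, then commit
   explicitly at intervals of our own choosing. */
SET AUTOCOMMIT=0;

START TRANSACTION;
INSERT INTO clickstream_log (url, timestamp, source_ip)
  VALUES ('http://dev.mysql.com', '2005-02-10 11:46:24', '192.168.35.90');
/* ... further INSERT, UPDATE, or DELETE statements ... */
COMMIT;  /* one explicit COMMIT makes all of the changes above permanent */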
Usually, the frequency with which we commit is driven by our application logic rather than by performance. For instance, if a user clicks a Save button in our application, he is going to expect that the information will be permanently saved to the database, and so we will be required to issue a COMMIT as a result. However, in batch applications, we can often choose to commit at relatively infrequent intervals. Reducing the commit frequency can have a huge effect on DML performance.
In Figure 21-11, we see how reducing the commit frequency affected the time taken to insert 10,000 rows into the database. At the default settings, it took about 850 seconds (about 14 minutes) to insert the 10,000 rows. If we commit only after every 100 rows have been inserted, the time taken is reduced to only 8 seconds.
In these tests, the InnoDB transaction log was on the same disk as the InnoDB tablespace files, which magnified the degradation caused by transaction log writes. Moving the transaction log to a dedicated disk can reduce, although not eliminate, the transaction log overhead.
Figure 21-11. How commit frequency affects DML performance
We looked at how you can manipulate commit frequency in stored programs in Chapter 8.
21.5.3. Triggers and DML Performance
Because trigger code will be invoked for every row affected by the relevant DML operation, poorly performing triggers can have a very significant effect on DML performance. If our DML performance is a concern and there are triggers on the tables involved, we may want to determine the overhead of our triggers by measuring performance with and without the triggers.
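One rough way to do this (assuming a hypothetical BEFORE INSERT trigger named clickstream_log_bi on the clickstream_log table) is to time a representative batch of DML with the trigger in place, then drop the trigger and repeat the same batch; the difference in elapsed time is the trigger overhead:

/* Run and time a representative batch with the trigger in place ... */
INSERT INTO clickstream_log (url, timestamp, source_ip)
  VALUES ('http://dev.mysql.com', '2005-02-10 11:46:24', '192.168.35.90');

/* ... then remove the (hypothetical) trigger and time the same batch again. */
DROP TRIGGER clickstream_log_bi;

Remember to re-create the trigger afterward if it is required by the application.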
We provide some more advice on trigger tuning in Chapter 22.