How MySQL Processes SQL

The following sections provide a brief overview of the parsing and caching steps that MySQL undertakes as it processes a SQL statement.

19.2.1. Parsing SQL

A SQL statement sent to the MySQL server must first be parsed. Parsing involves the following actions:

The execution plan represents MySQL's strategy for retrieving or modifying the data specified by the SQL statement. The optimizer is that part of the MySQL code that is responsible for making these decisions. Here are some of the questions that the optimizer needs to ask before it can come up with its plan:

Compared to some of the major relational databases (Oracle, SQL Server, DB2), MySQL's optimizer might seem, at first glance, to be relatively simplistic. MySQL's optimizer is, however, extremely effective. You will only rarely need to rewrite a SQL statement to make it perform more efficientlythe optimizer will usually make the right decision. Since the optimizer cannot create "missing" indexes that might make your statement run faster, the most important thing you can do to assist the optimizer is to create a good set of supporting indexes on your tables .

Understanding how the optimizer makes its decisions will help you to make sound database design and SQL programming decisions. In the next two chapters, we will look at specific SQL tuning scenarios, explain how the optimizer deals with each of these scenarios, and discuss techniques for optimizing the SQL involved.

19.2.2. Caching

MySQL supports some in-memory structures (also known generally as caches ), which can improve the performance of SQL statements.

19.2.2.1. Buffer pool and key cache

Almost every SQL statement needs to work with data from the databaseeither to return it to the calling program or to modify it as instructed by an INSERT, UPDATE, or DELETE statement. In many cases, however, MySQL can obtain this data without the overhead of disk I/O by retrieving the required data from one of a number of caches.

For MyISAM tables, MySQL relies on the operating system to cache the data contained in the individual files that make up the tables. All operating systems include read caches, and if you read from a MyISAM file more than once, there is a chance that the data will still be in the operating system cache when you try to read it a second time. You will usually have very little control over the size of the OS read cache, since it is normally managed by the operating system itself.

MyISAM does, however, have its own cache for index blocks. This is controlled by the startup parameter KEY_BUFFER_SIZE.

The InnoDB storage engine maintains a single cache for both index and table blocks. This is controlled by the parameter INNODB_BUFFER_POOL_SIZE.

Correctly sizing these two buffers can help reduce the amount of disk I/O required to satisfy the data requirements of your SQL statements. In general, you should allocate as much memory as possible to these caches. However, beware of allocating too much memory for the MyISAM key bufferyou might inadvertently starve the OS read buffer and reduce the amount of memory available for caching table data.

19.2.2.2. Table cache

The table cache maintains metadata about tables in memory and also contains the link to the storage handler's physical representation of the table. In MyISAM, these links are file descriptors pointing to the .frm files and the .MYD files. Each session that needs to access a table will require its own table cache entry. The default value of TABLE_CACHE (typically 256) is often too small for systems with large numbers of tables and/or high numbers of concurrent users.

19.2.2.3. Query cache

Before MySQL goes to the trouble of parsing a SQL statement, it will look in the query cache to see if it already has in memory a copy of the SQL statement and its result set. If it finds a match, it can return the result set directly from the query cache. This "shortcut" can greatly improve query performance. So what are the criteria for determining a match?

In order for MySQL to take advantage of a cached result set, the new SQL statement must match exactly the statement associated with the result set, including whitespace and comments . If the same logical statement is written more than once within an application, there is a very good chance that the statements will not be physically identical, thus negating a key performance enhancement.

In addition, if any table referred to in the statement is modified, then that statement and its result set will be flushed from the query cache. This behavior makes the query cache most useful for applications or tables that are read-intensive. If a table is being modified many times a secondas might be the case in an OLTP applicationthen it is unlikely that queries against that table are going to remain in cache long enough to be useful. Remember: any modification to the table will cause queries using that table to be flushedeven if the modification does not impact the rows returned by the query.

Some SQL statements cannot be cached at allparticularly if they contain a function that is not guaranteed to return the same result every time it is called. For instance, the CURDATE function will return a different value (the current date-time) every time it is called. So if you include a call to CURDATE in your query, it will not be cached.

The query cache will be most effective when at least some of the following are true:

You can control the size of the cache with the SET GLOBAL query_cache_size=size statement.

You can view statistics about query cache usage with the SHOW STATUS LIKE 'qcache%'; statement.

Stored programs can benefit from the query cache. A stored program that returns a result set will be cached, and any subsequent execution of that program can be satisfied using the query cache. However, SQL statements within stored programs cannot currently be satisfied from the cache (we might imagine that when they execute within the database, they are executing "behind the cache").

19.2.2.4. Table statistics

Like most query optimizers, MySQL maintains statistics about table and index data so that it can use this additional information to formulate the most efficient execution plan.

You can view the statistics that MySQL keeps for a table with the SHOW TABLE STATUS statement. Example 19-1 shows an example of using this statement.

Example 19-1. Viewing table statistics

mysql> SHOW TABLE STATUS LIKE 'sales' G *************************** 1. row *************************** Name: sales Engine: InnoDB Version: 9 Row_format: Fixed Rows: 2500137 Avg_row_length: 114 Data_length: 285016064 Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: 2500001 Create_time: 2004-12-28 10:47:35 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: InnoDB free: 1766400 kB 1 row in set (0.60 sec)

You can view the statistics that MySQL keeps for the indexes on a table with the SHOW INDEXES statement, as shown in Example 19-2.

Example 19-2. Viewing index statistics

mysql> SHOW INDEXES FROM sales G *************************** 1. row *************************** Table: sales Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: SALES_ID Collation: A Cardinality: 2500137 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: 1 row in set (0.18 sec)

The two most important columns in the output from these commands are Rows and Avg_row_length from SHOW TABLE STATUS and Cardinality from SHOW INDEXES. Cardinality reports the number of distinct rows in the indexthis helps MySQL to determine how efficient the index will be in retrieving rows. Indexes that have a high cardinality -to-rows ratio are often called selective indexes.

These statistics are created by MySQL (or the storage engine) during certain operations such as bulk loads/deletes, index creation, and ALTER TABLE operations. You can request that MySQL update the statistics with the ANALYZE TABLE statement. If your database is subject to large fluctuations in data volumes, you may want to run ANALYZE TABLE periodically, but be aware that this statement places a read lock on the table, preventing concurrent update, and therefore should not be run during times of heavy concurrent updates activity.

The optimizer also obtains additional statistics at runtime by probing a table's indexes to determine the relative cardinality of an index against the query values requested. Through this analysis, the optimizer may determine that although an index has low overall cardinality, it is highly selective for the values provided in the query.

Suppose, for instance, that we have an index on gender ('male', 'female', 'unsure'). MySQL will ignore this index for a query that requests all males or all females, but will choose to use the index for a query of all those unsure of their gender. Since this group comprises only a small proportion of the rows, the index will, in this case, help MySQL locate the total result set quickly. We'll look in detail in the next chapter at how MySQL chooses indexes.

Категории