Query Execution

Query execution in MySQL Cluster can use a couple different methods. Which method is used makes a large difference in the response time of a query.

When MySQL receives a query, many different steps occur before you receive the results. The path is roughly laid out like this:

  1. Receive query over network
  2. Check query cache
  3. Parse query
  4. Check permissions
  5. Optimize query:

    1. Query transformations
    2. Decide the order in which to read the tables
    3. Decide on index use
    4. Decide which algorithms to use for retrieval

  6. Query execution:

    1. Retrieval of data, based on preceding plan
    2. Apply expressions to retrieved data
    3. Sort data as necessary

  7. Return results back to client

MySQL Cluster comes into play in step 5(a). MySQL Cluster is the storage engine that does the actual physical retrieval of data. All the other steps are done on the MySQL server side and work the same as with other storage engines. Before we talk further about this, we need to discuss the one other step that is affected as well: the query cache.

The Query Cache in MySQL Cluster

The query cache is a special cache that exists in MySQL to cache a query and a result set. If someone sends exactly the same query again, MySQL can return the result set directly from the cache instead of having to execute it again. When the data in a table changes, the query cache invalidates all the queries that involve that table, in order to prevent serving stale data. This is generally how it works with MySQL, but it gets a bit more involved with MySQL Cluster.

First, the query cache doesn't work at all with MySQL Cluster version 4.1. This shortcoming has been fixed in MySQL 5.0, but it does work slightly differently. The reason it works differently is that data can change in many different MySQL servers without the other MySQL servers being aware of the changes. If the server isn't aware of changes, then it can't invalidate the cache, which leads to incorrect results.

When you have the query cache on and NDB enabled, MySQL creates an extra thread internally to periodically check what tables have been changed. Doing this check causes a bit of extra overhead for the MySQL server. Due to this extra overhead, you need to keep a few things in mind.

First, you should ensure that you are getting good use of the query cache. You can monitor the query cache status variables in order to see what percentage of them are using the cache:

mysql> SHOW STATUS LIKE 'Qcache%'; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 10476984 | | Qcache_hits | 32342 | | Qcache_inserts | 2323 | | Qcache_lowmem_prunes | 2 | | Qcache_not_cached | 20 | | Qcache_queries_in_cache | 32 | | Qcache_total_blocks | 36 | +-------------------------+----------+ 8 rows in set (0.00 sec)

The important variables to monitor are Qcache_hits, Qcache_inserts, and Qcache_not_cached. You need to ensure that Qcache_hits is generally larger than the other two values added together (which is the query cache miss rate).

The second thing you can do is change a value called ndb_check_cache_time. This variable tells how many seconds MySQL Cluster should wait between checking for data changes that were made on another MySQL server. The default setting is 0, which means MySQL Cluster will constantly be checking. Increasing this variable greatly reduces the overhead, but, in theory, it means there is a chance that the query cache will return slightly incorrect data. For example, it might return incorrect data if someone changes data on one server and any queries until the next server checks for that change return incorrect data. Whether this is okay depends entirely on the application. Normally any setting above 1 wouldn't be needed because the server checks only once per second for invalidations.

Data Retrieval in MySQL Cluster

Категории