Tuning ORDER and GROUP BY

GROUP BY, ORDER BY, and certain group functions (MAX, MIN, etc.) may require that data be sorted before being returned to the user. You can detect that a sort is required from the Using filesort tag in the ExTRa column of the EXPLAIN statement output, as shown in Example 21-19.

Example 21-19. Simple SQL that performs a sort

SELECT * FROM customers ORDER BY contact_surname, contact_firstname Explain plan ------------ ID=1 Table=customers Select type=SIMPLE Access type=ALL Rows=101999 Key= (Possible= ) Ref= Extra=Using filesort

If there is sufficient memory, the sort can be performed without having to write intermediate results to disk. However, without sufficient memory, the overhead of the disk-based sort will often dominate the overall performance of the query.

There are two ways to avoid a disk-based sort:

These approaches are described in the following sections.

21.4.1. Creating an Index to Avoid a Sort

If an index exists on the columns to be sorted, MySQL can use the index to avoid a sort. For instance, suppose that the following index exists:

CREATE INDEX i_customer_name ON customers(contact_surname, contact_firstname)

MYSQL can use that index to avoid the sort operation shown in Example 21-19. Example 21-20 shows the output when the index exists; note the absence of the Using filesort tag and that the i_customer_name index is used, even though there are no WHERE clause conditions that would suggest that the index was necessary.

Example 21-20. Using an index to avoid a sort

SELECT * from customers ORDER BY contact_surname, contact_firstname Explain plan ------------ ID=1 Table=customers Select type=SIMPLE Access type=index Rows=101489 Key=i_customer_name (Possible= ) Ref= Extra=

21.4.2. Reducing Sort Overhead by Increasing Sort Memory

When MySQL performs a sort, it first sorts rows within an area of memory defined by the parameter SORT_BUFFER_SIZE. If the memory is exhausted, it writes the contents of the buffer to disk and reads more data into the buffer. This process is continued until all the rows are processed; then, the contents of the disk files are merged and the sorted results are returned to the query. The larger the size of the sort buffer, the fewer the disk files that need to be created and then merged. If the sort buffer is large enough, then the sort can complete entirely in memory.

You can allocate more memory to the sort by issuing a SET SORT_BUFFER_SIZE statement. For instance, the following allocates 10,485,760 bytes (10M) to the sort:

SET SORT_BUFFER_SIZE=10485760;

You can determine the current value of SORT_BUFFER_SIZE by issuing the following statement:

SHOW VARIABLES LIKE 'sort_buffer_size';

As you allocate more memory to the sort, performance will initially improve up to the point at which the sort can complete within a single "merge run." After that point, adding more memory appears to have no effect, until the point at which the sort can complete entirely in memory. After this point, adding more memory will not further improve sort performance. Figure 21-9 shows where these two plateaus of improvement occurred for the example above. It also shows the effect of creating an index to avoid the sort altogether.

To find out how many sort merge runs were required to process our SQL, we can examine the value for the status variable SORT_MERGE_PASSES from the SHOW STATUS statement before and after our SQL executes.

Figure 21-9. Optimizing ORDER BY through increasing sort buffer size or creating an index

To optimize SQL that must perform a sort (ORDER BY, GROUP BY), consider increasing the value of SORT_BUFFER_SIZE or create an index on the columns being sorted.

Категории