Advanced DBA Certification Guide and Reference for DB2 Universal Database v8 for Linux, UNIX, and Windows
The following section will describe tips and techniques for understanding the performance characteristics of workloads using MDC tables that are different from regular table design choices. These performance aspects consist of the following:
Query Processing Overview
In general, the query workload should take advantage of the multi-dimensional clustering by using block indexes effectively. Block indexes can be used for index scans , index ANDing, index ORing, nested loop joins, star joins, hash joins, and group by clauses. Block Index Scans
If the table PARTS is created with the dimensions SHIPPEDDATE and GENPART, consider queries with predicates in the WHERE clause of the form:
All of these predicates are perfect candidates for using the block index defined on the SHIPPEDDATE column. The block index scans should provide performance benefits because the units of I/O and processing are in blocks, or extents, and the size of these indexes is usually much more compact than corresponding RID indexes. In addition, only the first page of the block will need to be checked for valid data, not the entire block. For the statement: select SHIPPEDDATE from parts where SHIPPEDDATE < '2002-11-05' the access plan looks like the following: Access Plan: ----------- Total Cost: 438.915 Query Degree: 1 Rows RETURN ( 1) Cost I/O 14 FETCH ( 2) 438.915 112 /----+----\ 3.5 16 IXSCAN TABLE: USER1 ( 3) PARTS 0.0061235 16 INDEX: SYSIBM SQL0211171657071 In this example, the block index is scanned to determine the block(s) that contain the qualifying rows, then the blocks of the table are scanned to read the qualifying rows of data.
NOTE Notice that the index name above is SQL0211171657071 because it was system generated when the table was created. For ease of examining EXPLAIN information, etc., it is normally a good idea to rename these indexes to make the name more easily understood / remembered .
Block Index ANDing
For an MDC table that contains two or more dimensions, the query workload should contain query predicates that are selective on two or more attributes. For example, the PARTS table has dimensions on SHIPPEDDATE, GENPART, and DISCOUNT. If a query contains predicates of the form:
or
then access plans can combine the block indexes, using index ANDing.
NOTE It is possible for an index scan on one dimension to be selected as well.
Block Index ORing
For the PARTS table defined with three dimensions on SHIPPEDDATE, GENPART, and DISCOUNT, queries containing predicates of the form shown below are candidates for Block Index ORing:
or
Combining Dimension Block Indexes and Record Indexes
MDC tables also allow all of the flexibilities available to regular tables. Regular indexes (RID indexes) can also be created on MDC tables to complement the dimension block indexes. These indexes can be combined with block indexes to perform index ANDing and ORing. For example, if the PARTS table has dimensions on SHIPPEDDATE and DISCOUNT, and a RID index on PARTNUM, then predicates such as the following can use index ANDing and ORing techniques combining block and RID indexes:
The following are some guidelines for defining RID indexes to complement block indexes:
Index-Only Access Restrictions on Block Indexes
Because dimension block indexes can occasionally point to empty blocks, it is not possible to just look at the index key entries for clauses such as EXISTS, MIN, MAX, or DISTINCT. The query plan for these types of clauses on an MDC table case will read the data in the block to verify the presence of a row before proceeding. This can result in degradation in the query performance. If such queries are likely to occur in the query workload, then RID indexes should be created on the referenced columns to take advantage of index-only access. Monotonicity
The derivation of block-level range predicates from column-level range predicates can only be done for columns that have been generated using a monotonic expression. The MONTH function is not monotonic because the range of dates between Sept 2002 and Mar 2003, although increasing, correspond to the month values between 09 and 03, which is decreasing . However, the expression integer(DATE)/100 generates values 200209 and 200303, which increase as the date increases ; therefore, this expression is monotonic. If the optimizer determines that the expression involved in a generated column is not monotonic, it will not be able to generate corresponding predicates on the generated column for range predicates on the base column, but it will still be able to generate predicates on the generated column for equality and IN predicates.
NOTE Monotonic means that an increasing range of values on the base column corresponds to a range of values on the generated column that is never decreasing.
NOTE In some cases, the optimizer will not be able to determine definitively whether an expression is monotonic, and in these cases, it will have to assume that the expression is not monotonic. |