Understanding DB2: Learning Visually with Examples (2nd Edition)

7.9. Indexes

Indexes are database objects that are built based on one or more columns of a table. They are used for two main reasons:

  • To improve query performance because they can access the data faster with direct access to rows based on their key values

  • To guarantee uniqueness when defined as unique

7.9.1. Working with Indexes

To create an index, use the CREATE INDEX statement. This statement requires at a minimum:

  • The name of the index

  • The name of the associated table

  • The columns that make up the index (also known as index keys)

In addition, you can specify the following:

  • Whether the index is unique (enforce uniqueness for the key values) or non-unique (allow duplicates)

  • Which order DB2 should use to build and maintain the key values: ascending (ASC, the default) or descending (DESC) order

  • Whether bi-directional scans of the index are allowed (ALLOW REVERSE SCANS)

  • Whether to create INCLUDE columns that are not part of the index key but are columns often retrieved by your queries

For example, let's consider the following statement.

CREATE UNIQUE INDEX company_ix ON company (company_ID ASC, name DESC) INCLUDE (no_employees) ALLOW REVERSE SCANS

This statement creates a unique index company_ix. This index is associated to the table company based on the column company_ID in ascending order and name in descending order. Bi-directional scans are allowed because we specified ALLOW REVERSE SCANS. If this is not specified and the index needs to be traversed in opposite direction, the index cannot be used; a sort will be required. In addition, an INCLUDE column no_employees was added to the index definition. This column does not belong to the index key, that is, the index will not be built and maintained taking this column into consideration. Instead, an INCLUDE column is useful for performance reasons. Assuming the users of table company often retrieve the no_employees column, without the INCLUDE column, DB2 would first have to access the index page and then the data page. Rather than performing two access operations, why not add the desired column in the index?

NOTE

INCLUDE columns in an index can improve performance at the cost of having a larger index. The effect of adding an INCLUDE column versus including the column as part of the index key is the same; however, the maintenance cost of updating INCLUDE columns is less than that of updating key columns.

Now let's consider another example that shows how an index looks. The following statement was used to create a table containing the sales records of a company.

CREATE TABLE sales ( sales_person VARCHAR(30) NOT NULL, region CHAR(5) NOT NULL, number_of_sales INT NOT NULL, year INT )

Figure 7.15 illustrates the contents of this table.

Figure 7.15. The sales table

Let's define an index on the sales_person column of the sales table using the following CREATE INDEX statement.

CREATE INDEX index1 ON sales (sales_person)

When DB2 builds the index index1, it creates pointers to the data pages of each record in the table. Each record is identified by a record ID (RID). An index on the sales_person column is shown in Figure 7.16.

Figure 7.16. An index defined on the sales_person column

Rows are stored on physical disks, and disks are divided into extents. An extent contains a fixed number of pages. We will talk about pages and extents in detail in Chapter 8, The DB2 Storage Model. For now, we will treat extents as portions of the disk.

Let's say you issue the following query:

SELECT * FROM sales WHERE sales_person = 'Sam'

For this query, DB2 would use index index1, and as you can see from Figure 7.16, it would follow the pointers to extents 1, 2, 4, and 5; all of these extents have a data page with a record where the salesperson is Sam. An index gives you a direct access to the records you are looking for. Without an index, DB2 may scan all the data pages in all the extents for the table. This operation is known as a table scan, and for very large tables, it can be a very expensive operation.

Once an index has been created, it cannot be modified. To add or remove a key column from the index, you must drop and recreate the index. To drop an index, use the DROP INDEX statement. For example:

DROP INDEX index1

NOTE

An index automatically created by the database manager when a primary key or unique constraint was defined cannot be dropped with the DROP INDEX statement. To drop these indexes, use the ALTER TABLE statement with the DROP PRIMARY KEY or DROP UNIQUE constraint_name clauses, respectively.

Indexes can improve query performance considerably; however, the more indexes you define on a table, the more the cost incurred when updating the table because the indexes will also need to be updated. The larger you define the size of an index (based on the number of key columns and their columns sizes), the more the cost to update the index. Choose your indexes wisely.

The Index Advisor, part of the Design Advisor tool, can recommend indexes for you based on a specific query or a set of queries. You can launch the Design Advisor from the Control Center by choosing a given database, right-clicking on it, and choosing Design Advisor. You can also invoke the Index Advisor directly from the command line by using the db2advis command.

7.9.2. Clustering Indexes

In the example in section 7.9.1, Working with Indexes, you saw that index index1 (based on the sales_person column) improved query performance over table scans. However, because the data pages for the corresponding records were spread across different extents, several I/O requests to the disk were required. Would it not be more efficient to keep all of the desired data pages clustered together on the same extent?

You can achieve this by using a clustering index. A clustering index is created so that the index pages physically map to the data pages. That is, all the records that have the same index key are physically close together. Figure 7.17 illustrates how index1 works when created as a clustering index using the CLUSTER clause as follows.

CREATE INDEX index1 ON sales (sales_person) CLUSTER

Figure 7.17. A clustering index on the sales_person column

In the figure, when you issue this query:

SELECT * FROM sales WHERE sales_person = 'Sam'

DB2 would still use index index1 but it requires less I/O access to the disk because the desired data pages are clustered together on extents 4 and 5.

NOTE

There can be only one clustering index per table.

Категории