DB2 UDB V8 and WebSphere V5. Performance Tuning and Operations Guide2004

 < Day Day Up > 


3.2 DB2 architecture overview

Figure 3-2 shows a general overview of the architecture and processes for DB2 UDB V8.

Figure 3-2: DB2 architecture overview

On top of this figure you see the clients connecting to the database using different network protocols. WebSphere also acts as a client. From the view of DB2 UDB, it makes no difference with other DB2 UDB clients when WebSphere requests a connection. With WebSphere, the communication with DB2 UDB is by Java Database Connectivity (JDBC) via a network protocol.

If a client connects to the database he will communicate with a coordinator agent. The coordinator agent will be assigned to this application and handles all of the SQL requests. If intra-partition parallelism is enabled, the coordinator agent works with subagents. The coordinator agent then has to decide whether to handle the request by itself or to send the request to a number of subagents, depending on the configuration and the number of CPUs.

The bufferpool is a piece of memory to where database pages of user table data, index data, and catalog data are temporarily moved from disk storage. DB2 agents read and modify the bufferpool when processing the data requests. If the data is not available in the buffer pool, it needs to be retrieved from the disks into the buffer pool.

Prefetchers retrieve data from disk and move it into the buffer pool before applications need the data. Page cleaners move data from the buffer pool back out to disk. The page cleaners are working in the background to look for no longer needed dirty pages and write them to disk.

All changes to regular data (any data type except BLOBs, Long VARCHAR, and data in Global Temporary tables) and index pages are written to the log buffer. The data in the log buffer is written to disk by the logger. The mechanism of logging the database changes allows you to recover the data to the point of failure when a system crash is encountered.

The deadlock detector checks if a deadlock situation occurred. A deadlock is created when one or more applications are waiting for another application to release a lock on data. Each of the waiting applications is locking data needed by another application. Mutual waiting for the other application to release a lock on held data leads to a deadlock. The deadlock detector will select one of the applications in the deadlock and releases the locks currently held by that "volunteered" application. A negative SQLCODE will be returned to the application containing the "selected" transaction.

3.2.1 Process model

DB2 UDB starts various processes to deal with the database tasks. In Figure 3-3 you can find an illustration of the process model DB2 UDB uses.

Figure 3-3: DB2 process model

A single coordinator agent (db2agent) is assigned to a client application when it connects to a database. A coordinator agent works on behalf of an application, and communicates to other agents, using inter-process communication (IPC) or remote communication protocols. Subagents (db2agntp) support the coordinator agents if they are a partitioned database environment or intra-partition parallelism is activated.

DB2 UDB uses a firewall to prevent client applications and implementations of user defined functions (UDF) and stored procedures (SP) from accessing the address space of DB2 UDB. A firewall maintains the integrity of the data in the databases, because it disables application programming errors from overwriting internal buffers or files of the database manager. The firewall also improves reliability, because application errors cannot crash the database manager.

Note 

The firewall inside of DB2 acts only for internal purposes and has nothing to do with a firewall that prevents your company from external attacks.

The fenced mode process (db2fmp) is responsible for executing fenced store procedures and UDFs. Fenced code runs in a separate address space behind the firewall.

A listener process is between the client and the server processes. This process is responsible for the communication between the client and the server. There are different listener processes to handle different communication protocols.

There can be up to four listeners available in an AIX environment:

On the instance level there are several processes and threads like the system controller (db2sysc) that must exist in order for the database server to function. Also, the following threads and processes may be started to carry out various tasks:

The following list includes some of the important threads/processes used by each database:

3.2.2 Memory model

DB2 UDB has four different types of memory, as shown in Figure 3-4.

Figure 3-4: Types of memory

The Database global manager shared memory is allocated when the database manager is started (db2start). The memory remains allocated untill the database manager is stopped (db2stop). This area contains information that the database manager uses to manage activity across all database connections.

Database Shared Memory (also known as Database Global Memory) is allocated when a database is activated or connected to for the first time. This memory is used across all applications that might connect to the database. Database Shared Memory can be controlled by the DATABASE_MEMORY configuration parameter. By default, this parameter is set to automatic so that DB2 calculates the amount of memory allocated for the database. With the NUMDB parameter you can control the maximum count of concurrent active databases. Each database has its own global memory that consumes the biggest part of memory.

Many different memory areas are contained in database shared memory, as shown in Figure 3-5, including:

Figure 3-5: How the database manager uses the memory

The memory for an agent becomes allocated when the agent is created. Agent private memory is allocated for the agent and contains memory allocations that are used only by this specific agent, such as the sort heap and the application heap. When a database is already in use by one application, only agent private memory and application global shared memory is allocated for subsequent connecting applications. With the MAXAPPLS parameter you specify the maximum number of applications that can simultaneously connect to a single database. This parameter is part of the database configuration and can be set to different values for different databases. The more applications are allowed to connect concurrently to a database the more memory is needed.

The database manager parameters MAXAGENTS and MAX_COORDAGENTS are not shown in the figure. They limit the number of database manager agents that can exist simultaneously across all active databases in an instance. Together with MAXAPPLS, these parameters limit the amount of memory allocated for agent private memory and application global memory.

Figure 3-5 shows what the database manager shared memory and the other portions of memory are used for.

3.2.3 Storage model

Figure 3-6 illustrates how the data is physically stored on disks.

Figure 3-6: Storage hierarchy

A tablespace is a logical layer between the database and data stored in it. By default each database has three table spaces.

There are two different kinds of table spaces called System Managed Space (SMS) and Database Managed Space (DMS).

We recommend using SMS if you cannot estimate the size and the growth of data. Temporary tables with data volume varies from small to large is a good candidate of using SMS table space.

A container is a physical storage device. It can be identified by a directory name (SMS), a file name (DMS), or a raw device (DMS). A table space consists of several containers, as you can see from Figure 3-6.

An extent is a unit of space in a container and contains a number of pages; the default is 32.

A page has a size of 4 KB, 8 KB, 16 KB or 32 KB. You have to define the page size when you create a table space. There is no possibility to change the page size of a table space, once created. The maximum amount of rows in page is 254. Therefore it is necessary to determine the maximum row size of the tables you want to store in the table space to determine what page size fits it best.

3.2.4 Key performance-related areas

In this section we provide a deeper view of the performance-related aspects of the DB2 UDB architecture. This will prepare you for Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237, where we show you how to tune the database server.

Buffer pool

When an application accesses a row of a table for the first time, the database manager places the page containing that row in the buffer pool. The next time any application requests data, the database manager first looks for the buffer pool. If the requested data is in the buffer pool, it can be retrieved without disk access, resulting in faster performance.

The buffer pool is a very important area for data accessing performance. It is a good practice to have as much data as possible that the application frequently needs in the buffer pool. However, allocating excessive amounts of memory for the buffer pool for an application could impact the system performance. In Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237, we show you how to monitor and tune the buffer pool to achieve your performance goal.

A buffer pool is assigned to one or many table spaces. You may define more than one buffer pool in a database. If you have a table with data that is constantly used, it may be useful to place this table in a table space with its own buffer pool to make the data memory resident.

With DMS table spaces, the table data, indexes, and long data can be placed in separate table spaces. This makes it possible to have a separate buffer pool for an index to keep the index memory resident and improve the performance.

By default the prefetcher reads contiguous pages from disk and may write them into non-contiguous pages in the buffer pool. With DB2 V8, you can configure a block-based buffer pool. When this feature is activated (during creation of a buffer pool or with the ALTER buffer pool command) the contiguous pages from disk are written into contiguous pages of the buffer pool, when available.

It is recommended to use a separate buffer pool for temporary table spaces. This increases performance for queries that require temporary storage, especially sort-intensive queries.

Note 

The default buffer pool size is very small and therefore there is a need to tune. In UNIX systems the default size is 1000 4-K pages; in Windows systems it is 250 4-K pages.

Asynchronous read/write

I/O is a complex part and also very important for performance. As described earlier, it is good to have prefetchers running. These agents are responsible for accessing data before an application needs it. Processes known as I/O servers handle the prefetching. The number of processes can be configured using the configuration parameter NUM_IOSERVERS. It is better to configure more I/O servers than needed than not have enough of them. The impact of having excessive I/O servers is almost none because the memory used for the I/O servers is paged out. However, the performance can be dramatically decreased f I/O servers are not enough.

Figure 3-7 shows the steps of how DB2 UDB uses the I/O server and prefetching process. It illustrates the process starting from a client request for data pages untill DB2 brings the data from the database server back to the client.

Figure 3-7: Prefetching data using I/O servers

Below we explain the figure:

Tablespaces and container

A tablespace can have several containers. The data will be distributed in a round-robin technique across the available containers. Figure 3-8 shows how the extents are allocated. Extent 0 is allocated in the first container, extent 1 in the second, and so forth.

Figure 3-8: Distribution of extents in container

DB2 V8 introduces some new container management features. It is now allowed to alter the size of container. Another new feature is that you can turn off rebalancing while adding a container. In the previous version the container rebalance process is automatic. This process can impact overall system performance if the table space is big. For detailed information about how to add or extended the containers, please reference Chapter 5 of the IBM DB2 UDB Administration Guide: Planning, SC09-4822.

It is important to distribute the container over several disks to allow the database manager to do parallel I/O. If you are using striped devices like RAID devices, it is recommended to use only one container per tablespace. The size of an extent should be n * the RAID stripe size with n>=1. The prefetch size should be n* extentsize and n * RAID stripe size * count of devices, with n>=1.

Note 

DB2 cannot determine if a RAID device is used. It is important to set the DB2_PARALLEL_IO registry variable to enable parallel I/O. DB2_PARALLEL_IO=* enables parallel I/O for all tablespaces.

Database agents

Agents are processes and they need system resources. For a good performance it is necessary to handle database agents economically with system resources. The number of available agents depends on the database manager configuration parameters MAXAGENTS and NUM_POOLAGENTS.

Connection concentrator

For Internet applications with many relatively transient connections, or similar kinds of applications, the connection concentrator improves performance by allowing many more client connections to be processed efficiently. It also reduces memory use for each connection and decreases the number of context switches.

The connection concentrator is new in DB2 UDB V8 and works like a transaction monitor. Therefore there is no more need to buy a commercial product to reduce the number of connections to the database. It is now possible to handle thousands of concurrent connections with hundreds of agents.

Note 

The connection concentrator is enabled when the value of the database manager parameter MAX_CONNECTIONS is greater than the value of the DBM parameter MAX_COORDAGENTS.

Figure 3-9 shows the concept of the connection concentrator. The left side of the picture shows that without the connection concentrator, the number of coordinator agents is as many as the number of client connections. This consumes a lot of system resources if there are many connections active on a database server. When the connection concentrator is activated, the number of coordinator agents decreases because the coordinator agents are within a pool and accessible for different connections. When the number of coordinator agents shrinks, the number of subagents also reduces, as shown in right side of the figure.

Figure 3-9: Connection concentrator concept

For more details see Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237, where the parameter will be discussed in more detail.

Concurrency

Concurrency can be a reason for an application with a poor performance. DB2 UDB puts locks on data used by an application according to the isolation level. How restrictive these locks are depends on the isolation level. The isolation level can be set for an application and depends on the kind and the needs of the application. These levels are from restrictive to non-restrictive:

Table 3-1 is a summary of the isolation levels.

Table 3-1: Summary of isolation levels

Isolation level

Access to uncommitted data

Nonrepeatable read

Phantom read phenomenon

RR

Not possible

Not possible

Not possible

RS

Not possible

Not possible

possible

CS

Not possible

possible

possible

UR

possible

possible

possible

The impacts of the isolation level on the performance that may be seen in an application are deadlocks and lock timeouts/lock waits. How to set the Isolation level is discussed in Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237 when we talk about application tuning. We will also discuss how to prevent from lock escalation.

In Table 3-2 we introduce the different locks DB2 UDB uses.

Table 3-2: Lock modes shown in order of increasing control over resources

Lock mode

Applicable object type

Description

IN (Intent None)

Table spaces, tables

The lock owner can read any data in the table, including uncommitted data, but cannot update any of it. No row locks are acquired by the lock owner. Other concurrent applications can read or update the table.

IS (Intent Share)

Table spaces, tables

The lock owner can read data in the locked table, but not update this data. When an application holds the IS table lock, the application acquires an S or NS lock on each row read. In either case, other applications can read or update the table.

NS (Next Key Share)

Rows

The lock owner and all concurrent applications can read, but not update, the locked row. This lock is acquired on rows of a table, instead of an S lock, where the isolation level is either RS or CS on data that is read.

S (Share)

Rows, tables

The lock owner and all concurrent applications can read, but not update, the locked data. Individual rows of a table can be S locked. If a table is S locked, no row locks are necessary.

IX (Intent Exclusive)

Table spaces, tables

The lock owner and concurrent applications can read and update data in the table. When the lock owner reads data, an S, NS, X, or U lock is acquired on each row read. An X lock is also acquired on each row that the lock owner updates. Other concurrent applications can both read and update the table.

SIX (Share with Intent Exclusive)

Tables

The lock owner can read and update data in the table. The lock owner acquires X locks on the rows it updates, but acquires no locks on rows that it reads. Other concurrent applications can read the table.

U (Update)

Rows, Tables

The lock owner can update data in the locked row or table. The lock owner acquires X locks on the rows before it updates the rows. Other units of work can read the data in the locked row or table, but cannot attempt to update it.

NX (Next Key Exclusive)

Rows

The lock owner can read but not update the locked row. This mode is similar to an X lock except that it is compatible with the NS lock.

NW (Next Key Weak Exclusive)

Rows

This lock is acquired on the next row when a row is inserted into the index of a non-catalog table. The lock owner can read but not update the locked row. This mode is similar to X and NX locks except that it is compatible with the W and NS locks.

X (Exclusive)

Rows, tables

The lock owner can both read and update data in the locked row or table. Tables can be Exclusive locked, meaning that no row locks are acquired on rows in those tables. Only uncommitted read applications can access the locked table.

W (Weak Exclusive)

Rows

This lock is acquired on the row when a row is inserted into a non-catalog table. The lock owner can change the locked row. This lock is similar to an X lock except that it is compatible with the NW lock. Only uncommitted read applications can access the locked row.

Z (Superxclusive)

Table space, tables

This lock is acquired on a table in certain conditions, such as when the table is altered or dropped, an index on the table is created or dropped, or a table is reorganized. No other concurrent application can read or update the table.

SQL

In the following we list some concepts with high performance issues and some flycatchers introduced in version 8.

Data types

Understanding your data and defining the proper data type to store the data is another element affects the performance. Improper data type design in an application could result in bad performance. For example, it is not a good practice to use VARCHAR whenever a string needs to be stored. Numbers should be stored in number fields and not in character fields. In Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237, we give some advice on how to choose the correct data types.

String data

This section describes the three string data types CHAR, VARCHAR and LONG VARCHAR. Each character string is further defined as one of:

When defining the length of a string data type, it is important to keep in mind that the length represents the number of bytes and not the number of characters. If you have a multi-byte code set like Unicode there is a need to reserve more space.

Numeric data

There are six data types that can be used to store numeric values. The data types are used to store different numeric types and precision. The data is stored using a fixed amount of storage for all numeric data types. The amount of storage required increases as the precision of the number goes up.

Dates, times, and timestamps

There are three DB2 data types to represent dates and times:

Index

Indexes are used to speed up the retrieval of data. An index usually takes less space than a table and has a tree structure to reduce the look-up time. But indexes also need space from your storage. What indexes need to be created depends on the SQL the application uses. We introduce a tool later in this chapter that helps you to determine which indexes may be useful.

Type-2 indexes

DB2 Version 8 adds support for type-2 indexes. Here are the main advantages of type-2 indexes:

Attention: 

All new indexes are created as type-2 indexes, except when you add an index on a table that already has type-1 indexes. In this case the new index will also be a type-1 index, because you cannot mix type-1 and type-2 indexes on the same table.

All indexes created before DB2 Version 8 are type-1 indexes. Use the REORG INDEXES command to convert type-1 indexes to type-2 indexes. Use the INSPECT command to ascertain the type of index defined on a table. After this conversion, runstats should be performed.

Stored procedures (SPs)

Store procedures (SPs) can help reducing the network traffic. A SP can contain several SQL commands and also some business logic. SPs are stored in the database server. Calling a SP takes only one call command and the result is sent back to the application after executing the SQLs in SP at the host. If you do all the SQL commands within your application, the communication between your application and the database is much higher.

Multidimensional clustering (MDC)

Multidimensional clustering (MDC) is a new feature of DB2 UDB V8. Multidimensional clustering enables a table to be physically clustered on more than one key or dimension simultaneously. Prior to Version 8, DB2 UDB only supports single-dimensional clustering of data via clustering indexes. Using a clustering index, DB2 UDB attempts to maintain the physical order of data on pages in the key order of the index as records are inserted and updated in the table. Clustering indexes greatly improve the performance of range queries that have predicates containing the key (or keys) of the clustering index, as, with good clustering, only a portion of the table needs to be accessed, and, when the pages are sequential, more efficient prefetching can be performed.

With MDC, these benefits are extended to more than one dimension or clustering key. In the case of query performance, range queries involving any, or any combination of, specified dimensions of the table will benefit from clustering. Not only will these queries access only those pages having records with the correct dimension values, these qualifying pages will be grouped by extents. Furthermore, although a table with a clustering index can become un-clustered over time as space fills up in the table, an MDC table is able to maintain its clustering over all dimensions automatically and continuously, thus eliminating the need to reorganize the table in order to restore the physical order of the data.

Declared temporary tables

A declared temporary table is defined by the DECLARED GLOBAL TEMPORARY TABLE statement. Declared temporary tables are used to store data that does not need to be persistent. A declared temporary table is a temporary table that is only accessible to SQL statements that are issued by the application that created the temporary table. A declared temporary table does not persist beyond the duration of the connection of the application to the database. The creation of a temporary table is not stored in the catalog. In comparison to regular tables, DB2 does not lock declared temporary tables or their rows, and, if you specify the NOT LOGGED parameter when you create it, does not log declared temporary tables or their contents.

In DB2 UDB V8, a new feature is added to the declared temporary tables that allows the creation of index on temporary tables. Prior to version 8, the temporary table needed to be scanned completely, which is very time consuming if the table is big. With indexes, retrieving data from a temporary table is much faster. For a better optimization it is now also possible to update the statistic of temporary tables with runstats.

Maintenance

Performance tuning is a constant task. There are maintenance tasks an administrator can do regularly to keep good performance.

Runstats

DB2 UDB has an optimizer that looks for the best way to retrieve data. The optimizer uses statistical data to calculate the lowest price for retrieval. With the runstats command the statistical data will be collected and stored in the catalog. So the optimizer knows, for example, how many rows a table has, and can determine if an index scan is useful or just use a table scan.

The statistic data must be updated from time to time. If you have tables with many changes, you should run runstats at regular intervals.

Reorg

In tables where data are frequently inserted and deleted, the table space becomes more and more fragmented. This has an impact on the performance. To defragment the data, DB2 UDB offers a reorgchk command to determine if a table requires re-organization, and reorg command to organize the data.

Rebind

When using static SQL, the best access plan is determined during the bind time and saved as a package. If your data changes and you run a runstats, the static code that was bound to the server will not pick up the new statistic. Therefore it is necessary to do a rebind to force a new calculation of the access plan.

Application design

Keep in mind that the tuning of the server is important, but the best-tuned server is slow if the database design or the application is badly designed.

The physical and logical designs of the database and the application are also critical for performance. The database needs to become normalized to eliminate redundant data. But for performance it is sometime useful to not have normalized data. We discus this in Chapter 7, "Monitoring and tuning of DB2 UDB V8" on page 237.


 < Day Day Up > 

Категории