SQL Performance Tuning

   

The SQL Standard provides three data types for columns that contain large object (LOB) data: BINARY LARGE OBJECT (or BLOB) for binary data and, for character data, CHARACTER LARGE OBJECT (or CLOB) and NATIONAL CHARACTER LARGE OBJECT (or NCLOB). Table 7-12 shows the SQL Standard requirements and the level of support (data type and maximum segment size ) the Big Eight have for these data types.

Notes on Table 7-12:

  • BLOB column

    Shows the maximum size of a BLOB if the DBMS supports this data type.

    • Microsoft and Sybase support IMAGE for binary data up to 2GB long; really a synonym for BLOB.

    • MySQL supports BLOB for case-sensitive character values up to 65,535 characters long; really a synonym for CLOB (or a large VARCHAR) and not a true BLOB.

      Table 7-12. ANSI/DBMS Large Object Support

        BLOB CLOB NCLOB Other Maximum Length In or Out
      ANSI SQL Yes Yes Yes No N/S
      IBM 2GB 2GB No No 64KB
      Informix 4TB 4TB No 2GB 2GB
      Ingres 2GB 2GB No No 2KB
      InterBase 4GB No No No 64KB
      Microsoft No No No 2GB 8KB
      MySQL 64KB No No 2KB 24MB
      Oracle 4GB 4GB 4GB No 64KB
      Sybase No No No 2GB 32KB
  • CLOB column

    Shows the maximum size of a CLOB if the DBMS supports this data type.

    • Microsoft and Sybase support TEXT for character values up to 2GB long; really a synonym for CLOB.

    • MySQL supports BLOB for case-sensitive character values up to 65,535 characters long; really a synonym for CLOB (or a large VARCHAR) and not a true BLOB.

  • NCLOB column

    Shows the maximum size of an NCLOB if the DBMS supports this data type.

    • Microsoft supports NTEXT for Unicode character values up to 2GB long; really a synonym for NCLOB.

    • MySQL supports TEXT for case-insensitive character values up to 65,535 characters long; really a synonym for NCLOB (or a large NCHAR VARYING).

  • Other column

    Shows the maximum size of a LOB if the DBMS supports other data types for large object data.

    • Informix supports TEXT for character data up to 2GB long and BYTE for binary data up to 2GB long. The DBMS can access these "simple" LOB data types in one piece.

    • Microsoft supports IMAGE as a synonym for BLOB, TEXT as a synonym for CLOB, and NTEXT as a synonym for NCLOB.

    • MySQL supports BLOB as a synonym for CLOB and TEXT as a synonym for NCLOB. MySQL also supports LONGBLOB as a synonym for a CLOB up to 2GB long and LONGTEXT as a synonym for an NCLOB up to 2GB long.

    • Sybase supports IMAGE as a synonym for BLOB and TEXT as a synonym for CLOB.

  • Maximum Length In or Out column

    Shows the maximum segment size allowed for a single INSERT (in) or SELECT (out) operation on a LOB column.

    • For Oracle, the maximum segment size is dependent on the operating system but cannot exceed 4GB.

The normal situation in data storage is that rows fit inside pages. Because pages have fixed sizes, no column can be longer than the page size. For example, Microsoft's page size is 8KB, and it turns out that the maximum length of a CHAR or VARCHAR column is 8KB. LOBs can be larger because the LOB value does not go on the page with the rest of the row. Instead, the page contains a pointer to a data page, or a page series. The pages to which the pointer points are usually stored far awayprobably in a different file. For example, suppose you have this table:

CREATE TABLE Table1 ( column1 CHAR(5), column2 BLOB, column3 CHAR(5) )

If you were to INSERT a row into Table1 so that column1 contains XXXXX , column2 contains the text of this book in the BLOB, and column3 contains YYYYY , you'd end up with the situation shown in Figure 7-2. (There are exceptionsfor example, if the BLOB is small, then InterBase will try to fit the value in the main file.)

Figure 7-2. How a LOB is stored

LOB pages are a linked list ( veterans of dBASE will see a similarity to the infamous "memo" column here). The pages can stretch out till the crack of doom, but typically the addressing space on a 32-bit machine is 2GBwhich is why so many DBMSs list 2GB as the maximum LOB size. This method can waste a great deal of space. For example, with most DBMSs, if you store a NULL in a LOB, you still take up a whole page. One exception to this is InterBase, which uses RLE compression so the space used is only about 500 bytes. If you use two LOBs, you need two LOB pages. It's impractical to declare a single memory variable with a length of 2GB or to send a single 2GB message over a network. So the DBMS must provide special scalar functions for handling LOB-to-file retrievals, or for reading and writing only a chunk of a LOB at a time.

LOBs, then, have several disadvantages:

  • They waste space.

  • They require extra page reads.

  • They don't allow all the string-manipulation functions that CHAR/VARCHAR allow.

  • Some tools have trouble understanding LOBs.

So when would you want to use a LOB? The obvious answer is when the value for a column is longer than the maximum CHAR/VARCHAR length and, of course, when the data being stored doesn't represent character data (such as a picture). But many people also use LOBs to take advantage of the fact that LOBs are stored outside the main file. Thus, for example, you could also use a LOB for a column that is rarely used in SELECT and has all of these characteristics:

  • Its length is variable.

  • Its average length is at least a few hundred bytes.

  • It is never used in WHERE, GROUP BY, or ORDER BY clauses.

The effect of making such a column a BLOB or a CLOB will be that (a) more rows fit in the main table (which speeds up access in the common case), and (b) changes to the LOB column won't cause shifting because the LOB is stored in a page of its own.

One other thing we should mentionLOBs can be different from other data types not only in the way they are stored, but also in the way they are buffered. For example, IBM will usually transfer large objects directly between the disk and the server's output message area without using the buffer pools that are used for transferring other data columns. Because of this, LOBs actually have very little effect on IBM's memory consumption despite their (usually) very large size. The downside, of course, is that the number of things you can do with a LOB is severely limited.

The Bottom Line: LOBs

The normal situation in data storage is that rows fit inside pages. Because pages have fixed sizes, no regular column can be larger than the page size.

LOBs can be larger than a page because the LOB value does not go on the page with the rest of the row. Instead, there is a pointer to a data page or a page series. LOB pages are a linked list.

LOBs have several disadvantages. They waste space, they require extra page reads, they don't allow all the string-manipulation functions, and some tools have trouble understanding them.

LOBs have the advantage that they are stored outside the main file, so (a) more rows fit in the main table (which speeds up access in the common case) and (b) changes to the LOB column won't cause shifting because the LOB is stored in a page of its own.

Recommendation: Prefer LOBs for images and character columns whose data is longer than the maximum CHAR/VARCHAR length. Prefer LOBs for variable-length columns, with an average length of at least a few hundred bytes, that are rarely used in SELECT and are never used in WHERE, GROUP BY, or ORDER BY clauses.

   

Категории