SQL Performance Tuning
The SQL Standard provides three data types for columns that contain large object (LOB) data: BINARY LARGE OBJECT (or BLOB) for binary data and, for character data, CHARACTER LARGE OBJECT (or CLOB) and NATIONAL CHARACTER LARGE OBJECT (or NCLOB). Table 7-12 shows the SQL Standard requirements and the level of support (data type and maximum segment size ) the Big Eight have for these data types. Notes on Table 7-12:
The normal situation in data storage is that rows fit inside pages. Because pages have fixed sizes, no column can be longer than the page size. For example, Microsoft's page size is 8KB, and it turns out that the maximum length of a CHAR or VARCHAR column is 8KB. LOBs can be larger because the LOB value does not go on the page with the rest of the row. Instead, the page contains a pointer to a data page, or a page series. The pages to which the pointer points are usually stored far awayprobably in a different file. For example, suppose you have this table: CREATE TABLE Table1 ( column1 CHAR(5), column2 BLOB, column3 CHAR(5) ) If you were to INSERT a row into Table1 so that column1 contains XXXXX , column2 contains the text of this book in the BLOB, and column3 contains YYYYY , you'd end up with the situation shown in Figure 7-2. (There are exceptionsfor example, if the BLOB is small, then InterBase will try to fit the value in the main file.) Figure 7-2. How a LOB is stored
LOB pages are a linked list ( veterans of dBASE will see a similarity to the infamous "memo" column here). The pages can stretch out till the crack of doom, but typically the addressing space on a 32-bit machine is 2GBwhich is why so many DBMSs list 2GB as the maximum LOB size. This method can waste a great deal of space. For example, with most DBMSs, if you store a NULL in a LOB, you still take up a whole page. One exception to this is InterBase, which uses RLE compression so the space used is only about 500 bytes. If you use two LOBs, you need two LOB pages. It's impractical to declare a single memory variable with a length of 2GB or to send a single 2GB message over a network. So the DBMS must provide special scalar functions for handling LOB-to-file retrievals, or for reading and writing only a chunk of a LOB at a time. LOBs, then, have several disadvantages:
So when would you want to use a LOB? The obvious answer is when the value for a column is longer than the maximum CHAR/VARCHAR length and, of course, when the data being stored doesn't represent character data (such as a picture). But many people also use LOBs to take advantage of the fact that LOBs are stored outside the main file. Thus, for example, you could also use a LOB for a column that is rarely used in SELECT and has all of these characteristics:
The effect of making such a column a BLOB or a CLOB will be that (a) more rows fit in the main table (which speeds up access in the common case), and (b) changes to the LOB column won't cause shifting because the LOB is stored in a page of its own. One other thing we should mentionLOBs can be different from other data types not only in the way they are stored, but also in the way they are buffered. For example, IBM will usually transfer large objects directly between the disk and the server's output message area without using the buffer pools that are used for transferring other data columns. Because of this, LOBs actually have very little effect on IBM's memory consumption despite their (usually) very large size. The downside, of course, is that the number of things you can do with a LOB is severely limited. The Bottom Line: LOBs
The normal situation in data storage is that rows fit inside pages. Because pages have fixed sizes, no regular column can be larger than the page size. LOBs can be larger than a page because the LOB value does not go on the page with the rest of the row. Instead, there is a pointer to a data page or a page series. LOB pages are a linked list. LOBs have several disadvantages. They waste space, they require extra page reads, they don't allow all the string-manipulation functions, and some tools have trouble understanding them. LOBs have the advantage that they are stored outside the main file, so (a) more rows fit in the main table (which speeds up access in the common case) and (b) changes to the LOB column won't cause shifting because the LOB is stored in a page of its own. Recommendation: Prefer LOBs for images and character columns whose data is longer than the maximum CHAR/VARCHAR length. Prefer LOBs for variable-length columns, with an average length of at least a few hundred bytes, that are rarely used in SELECT and are never used in WHERE, GROUP BY, or ORDER BY clauses. |