Microsoft SQL Server 7.0 System Administration Training Kit
One of the main functions of a system administrator is to allocate, manage, and monitor the space and storage requirements for a database. Estimating the space that a database requires can help you plan your storage layout and determine hardware requirements.
After this lesson, you will be able to
- Estimate the space requirements for a database
Estimated lesson time: 15 minutes
Estimating the Minimum Size of a Database
Figure 5.4 lists a number of factors that you should consider when attempting to determine the space requirements of your database.
Figure 5.4 Factors to consider when estimating the size of a database
Consider the following when you estimate the amount of space that your database will occupy:
- The size of the model database and the system tables (include projected growth). This is typically not a large percentage of the database size.
- The amount of data in your tables (include projected growth).
- The number of indexes and the size of each, especially the size of the key value, the number of rows, and the fill factor setting.
- The size of the transaction log. Your estimate should take into account factors such as the amount and frequency of modification activity, the size of each transaction, and how often you back up (dump) the log.
NOTE
As a starting point, you should allocate 25 percent of the database size to the transaction log for online transaction processing (OLTP) environments. You can allocate a smaller percentage for databases that are used primarily for queries.
Estimating the Amount of Data in Tables
After you consider the amount of space that is allocated to the model database, you should estimate the minimum amount of data in each table, including projected growth. This can be calculated by determining the total number of rows, the row size, the number of rows that fit on a page, and the total number of pages required for each table in the database.
Estimating the Number of Pages in a Table
To estimate the minimum number of pages required for a table, find out the number of characters for each row and the approximate number of rows that the table will have. Calculate the number of pages, using the following method:
- Calculate the number of bytes in a row by totaling the number of bytes that each column contains. If one or more columns are defined as variable length—such as a column for names—you can add the column average to the total. Add 9 bytes to the total, as each row has an overhead of 9 bytes.
- Determine the maximum number of rows contained in each data page. To do this, divide 8094 by the number of bytes in a row. Round the result down to the nearest whole number because SQL Server does not allow a row to cross pages. In practice, it is unlikely that this many rows will actually be stored on a page. SQL Server keeps some free space on each page to allow for a small amount of individual row growth per page when variable-length data is updated to a larger size.
- Divide the approximate number of rows in the table by the number of rows that are contained in each data page. The result equals the minimum number of pages needed to store your table.
Example: Calculating Table Size
This example calculates the size of a table that has four int columns, three char(30) columns, and one datetime column. The table will have 250,000 rows.
- Total row size: 4 * 4 bytes (the int columns) + 3 * 30 bytes (the char columns) + 1 * 8 bytes (the datetime column) + 9 bytes (row overhead)= 123 bytes
- Number of rows per page: 8094 / 123 = 65 rows per page
- Number of pages in the table: 250,000 / 65 = 3847 pages
- Size of the table: 3847 * 8 KB = 30,776 KB or about 31 MB
In this example, SQL Server stored 63 rows per page for this table. The total number of pages is 3969, and the table size is 31,752 KB.
Estimating the Amount of Data in Indexes
Estimating index space becomes more difficult when indexes have two types of pages. The first type, called leaf pages, holds the indexed key values. The second type forms the binary search tree that speeds indexed search and retrieval.
Index pages can intentionally be left unfilled by specifying a fill factor of less than 100 (percent) when building the index. This increases the number of pages in the index but makes table row inserts and updates faster because fewer new index pages need to be allocated during these operations.
Clustered vs. Nonclustered Indexes
A clustered index stores the data rows of a table in the indexed order of the key values of the index. A table can only have one clustered index. A nonclustered index is a separate structure that stores a copy of the key values from the table in indexed order and a pointer to each corresponding data row in the table.
Extra space for a clustered index is taken up only by the b-tree index pages because the data rows in the table are the index leaf pages. Also, if a fill factor is specified when building a clustered index, the data pages of the table are filled only to the level specified by the fill factor, making the table larger.
A nonclustered index consists of the b-tree index pages as well as the leaf pages of the index. The leaf pages hold a pointer to and an indexed copy of the key value of each row in the table. If the table has a clustered index, the pointer in a nonclustered index is the clustered index key value. For this reason, you should use small key values for the clustered index of a table that will have nonclustered indexes, or else the nonclustered indexes can become very large.
Index Size Example
The following table shows some examples of the sizes of various indexes added to the table for which the size was calculated in the previous example.
The table first gives the size of an index based on a char(30) key column when there is no clustered index on the table. The size of the same index is then given when there is a clustered index on the table. Finally, the size of the index is given when the index has a fill factor of 50 percent and there is a clustered index on the table. Note that this index is larger than the table! The clustered index used a char(30) key column—this is not recommended and illustrates the dramatic increase in size of nonclustered indexes if you do use such a large key column for a clustered index.
The same figures are then repeated for an index based in an int key column instead of a char(30) key column.
Reminder: the table has 250,000 rows, and is 3969 pages or 31,752 KB in size.
Key | Clustered on table | Fill factor | Pages | Size |
---|---|---|---|---|
One char(30) column | No | 100 | 1360 | 10,880 KB |
One char(30) column | Yes | 100 | 2296 | 18,368 KB |
One char(30) column | Yes | 50 | 4548 | 36,384 KB |
One int column | No | 100 | 560 | 4480 KB |
One int column | Yes | 100 | 1492 | 11,936 KB |
One int column | Yes | 50 | 2944 | 23,552 KB |
Lesson Summary
There are many factors involved in the estimation of space requirements for a database, including the size of the rows in your tables, the estimated growth of the tables, the number and size of the indexes, and the fill factor for the indexes are some important factors. This lesson gave you the tools to estimate the size requirements of a database, allowing you to manage your resources more effectively.