The ARC Buffer Manager

It's important to keep a few points in mind when you're trying to tune a PostgreSQL database. First, the shared buffer cache is shared. All of the examples in this chapter were built using a single-session databaseif you try to reproduce these experiments, be sure you're the only one using the database or your results will vary widely. Second, the important part of a buffer management scheme isn't the part that determines what goes into the cache, it's the part that determines what gets thrown out of the cache. When PostgreSQL reads data from a table, it first checks to see if the required page is in the cache. If PostgreSQL finds it in the cache, it stops looking. If the required page isn't in the cache, PostgreSQL must read it in from disk. That means that every page in every table is read into the cache as soon as a query (or other command) refers to the page. If your buffer cache is large enough, PostgreSQL will never evict a page from the cache (although it will write modified pages to disk).

When PostgreSQL adds a page to the shared cache and finds that the cache is already full, it must evict some other page. Prior to release 8.0, PostgreSQL would always evict the least-recently-used page. Pretend that you have a very small buffer cache (say three pages). When you execute a command that causes a table scan (a scan of every page from beginning to end), the server starts by reading the first page into the cache. Next, the server processes every tuple on that page (ignoring dead and uncommitted tuples as it goes).

Once it has finished processing the first page, it unpins that page in the cache (meaning that that page is no longer in use and can be evicted if necessary). The server then moves on to the second page. Because there are still two free pages in the cache, PostgreSQL just reads the second page from disk, stores that page in the cache, processes each tuple in that page, and then unpins that page. The server repeats this sequence for the third page. When PostgreSQL comes to the fourth page, it finds that the cache is full and evicts the least-recently-used page (page one) from the cache, replacing it with page four. That leaves you with pages two, three, and four in the cache. When the server reads in page five, it evicts page two (the least-recently-used page) from the cache, leaving you with pages three, four, and five in the cache. That sequence continues until the server has finished reading the entire tablewhen you're finished, the cache contains the last three pages from the table. If you execute the same command again, the sequence is the same except that PostgreSQL will have to evict one of the last three pages before it can add the first page to the cache. If instead you execute a command that can be satisfied by looking at the last three pages of the table, PostgreSQL will find those pages in the cache and won't read them from disk.

Of course, if another user is running a command at the same time, he's using the same buffer cache and the eviction sequence will be completely different.

As you've seen earlier in this chapter, a table scan can evict all of the pages from an LRU cache.

Starting with version 8.0, PostgreSQL uses a new caching mechanism that constantly adapts itself to a changing workload. The ARC (adaptive replacement cache) scheme effectively uses two caches: One is a traditional LRU cache and the other is a LFU cache. LFU stands for "least-frequently-used" as opposed to "least-recently-used." PostgreSQL divides the shared memory segment into one cache that buffers recently used pages and a second cache that buffers frequently used pages. That means that if your shared buffer cache contains 1,024 pages, some pages will contain recently used pages and some will contain frequently used pages. How many pages does PostgreSQL devote to each cache? It depends on your workload: PostgreSQL adjusts the relative size of each cache as it runs. If the server sees a period of high "locality of reference" (meaning that the current workload is frequently accessing a small set of pages), it devotes more space to the LFU cache (taking pages away from the LRU cache). If the server sees a request for a page that was recently evicted from the LRU cache, it devotes more space to the LRU cache (taking pages away from the LFU cache). To see how ARC affects the shared buffer cache, we'll show you two simple querieswe'll run the queries first in PostgreSQL version 7.4.2, then again in version 8.0.

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 4 | 45 | 1 | 49 | 3 | 0 | +-------------+------+-------+----------+------+------+-------+---------+------+

To satisfy this query, PostgreSQL uses the index that covers the record_id column. Because we just restarted the 7.4 server (and we have enough room for 512 pages in the shared buffer cache), PostgreSQL had to read all three of the index blocks that we hit from disk.

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 0 | 49 | 1 | 49 | 0 | 3 | +-------------+--------------------------------+-------------------------------+

When we execute the same query again, PostgreSQL reads the same three index blocks, but this time, it finds them in the cache. Now we'll execute a query that causes a table scan:

$ timer "SELECT * FROM recalls" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 1 | 39241 | 4400 | 0 | 0 | 0 | 0 | 0 | +-------------+--------------------------------+-------------------------------+

This query made a complete pass through the table, shuffling all 4,400 heap blocks through a cache that can only hold 512 blocks. When the query completes, the last 512 or so heap blocks that we read are still in the cache. Now go back and execute the first query (the one that causes a partial index scan):

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 4 | 45 | 1 | 49 | 3 | 0 | +-------------+--------------------------------+-------------------------------+

Notice that PostgreSQL had to read the same three index blocks again, but the intervening table scan has evicted them from the cache and they must be read from disk.

Now here is the same sequence running in a version 8.0 server. Again, we'll execute the same query that caused a partial index scan:

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 5 | 0 | 1 | 49 | 4 | 0 | +-------------+--------------------------------+-------------------------------+

And we'll execute it again just to make sure that the index blocks did in fact stay in the cache:

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 0 | 5 | 1 | 49 | 0 | 4 | +-------------+--------------------------------+-------------------------------+

Now we'll execute a query that, in version 7.4.2, threw the index blocks out of the cache:

$ timer "SELECT * FROM recalls" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 1 | 39241 | 4400 | 5 | 0 | 0 | 0 | 0 | +-------------+--------------------------------+-------------------------------+

And finally, we'll repeat the partial index scan query:

$ timer > "SELECT * FROM recalls WHERE record_id > 8000 AND record_id < 8050" recalls +-------------+--------------------------------+-------------------------------+ | | SEQUENTIAL I/O | INDEXED I/O | | |scans |tuples |heap_blks |cached|scans |tuples |idx_blks |cached| |-------------+------+-------+----------+------+------+-------+---------+------+ |recalls | 0 | 0 | 0 | 5 | 1 | 49 | 0 | 4 | +-------------+--------------------------------+-------------------------------+

This time, the 8.0 server has retained the frequently used index blocks in the LFU cache.

Категории