Inside Microsoft Windows 2000, Third Edition (Microsoft Programming Series)
The first time a file's data is accessed for a read or write operation, the file system driver is responsible for determining whether some part of the file is mapped in the system cache. If it's not, the file system driver must call the CcInitializeCacheMap function to set up the per-file data structures described in the preceding section.
Once a file is set up for cached access, the file system driver calls one of several functions to access the data in the file. There are three primary methods for accessing cached data, each intended for a specific situation:
- The copy read method copies user data between cache buffers in system space and a process buffer in user space.
- The mapping and pinning method uses virtual addresses to read and write data directly to cache buffers.
- The physical memory access method uses physical addresses to read and write data directly to cache buffers.
File system drivers must provide two versions of the file read operation—cached and noncached—to prevent an infinite loop when the memory manager processes a page fault. When the memory manager resolves a page fault by calling the file system to retrieve data from the file (via the device driver, of course), it must specify this noncached read operation by setting the "no cache" flag in the IRP.
The next three sections explain these cache access mechanisms, their purpose, and how they're used.
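Before looking at each mechanism, the cached-versus-noncached dispatch described above can be sketched in C. This is a hypothetical user-mode model, not actual NTFS or FAT dispatch code; `IRP_NOCACHE` is the real flag name, but the structure and value here are illustrative:

```c
#include <assert.h>

/* Illustrative only: models how a file system read handler branches on
 * the IRP_NOCACHE flag so that reads issued by the memory manager while
 * resolving a page fault bypass the cache, preventing an infinite loop
 * back into the cache manager. */

#define IRP_NOCACHE 0x1   /* real flag name; value here is illustrative */

struct irp {
    unsigned flags;
};

enum read_path { READ_CACHED, READ_NONCACHED };

/* The memory manager sets IRP_NOCACHE when it calls back into the file
 * system to resolve a page fault; everyone else gets the cached path. */
enum read_path choose_read_path(const struct irp *irp)
{
    if (irp->flags & IRP_NOCACHE)
        return READ_NONCACHED;   /* go straight to the storage driver */
    return READ_CACHED;          /* service from the system cache */
}
```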
Copying to and from the Cache
Because the system cache is in system space, it is mapped into the address space of every process. As with all system space pages, however, cache pages aren't accessible from user mode, because that would be a potential security hole. (For example, a process might not have the rights to read a file whose data is currently contained in some part of the system cache.) Thus, user application file reads and writes to cached files must be serviced by kernel-mode routines that copy data between the cache's buffers in system space and the application's buffers residing in the process address space. The functions that file system drivers can use to perform this operation are listed in Table 11-8.
Table 11-8 Kernel-Mode Functions for Copying to and from the Cache
Function | Description |
---|---|
CcCopyRead | Copies a specified byte range from the system cache to a user buffer |
CcFastCopyRead | Faster variation of CcCopyRead but limited to 32-bit file offsets and synchronous reads (used by NTFS, not FAT) |
CcCopyWrite | Copies a specified byte range from a user buffer to the system cache |
CcFastCopyWrite | Faster variation of CcCopyWrite but limited to 32-bit file offsets and synchronous, non-write-through writes (used by NTFS, not FAT) |
You can examine read activity from the cache via the performance counters or system variables listed in Table 11-9.
Table 11-9 System Variables for Examining Read Activity from the Cache
Performance Counter (frequency) | System Variable (count) | Description |
---|---|---|
Cache: Copy Read Hits % | (CcCopyReadWait + CcCopyReadNoWait) / (CcCopyReadWait + CcCopyReadNoWait + CcCopyReadWaitMiss + CcCopyReadNoWaitMiss) | Percentage of copy reads to parts of files that were in the cache (A copy read can still generate paging I/O—the Memory: Cache Faults/Sec counter reports page fault activity for the system working set but includes both hard and soft page faults, so that counter still doesn't indicate actual paging I/O caused by cache faults.) |
Cache: Copy Reads/Sec | CcCopyReadWait + CcCopyReadNoWait | Total copy reads from the cache |
Cache: Sync Copy Reads/Sec | CcCopyReadWait | Synchronous copy reads from the cache |
Cache: Async Copy Reads/Sec | CcCopyReadNoWait | Asynchronous copy reads from the cache |
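The hit-ratio formula in Table 11-9 is straightforward arithmetic over the four counters. A minimal sketch in C, with made-up counter values (the real variables live in the kernel and change constantly):

```c
#include <assert.h>

/* Computes the Copy Read Hits % formula from Table 11-9.
 * 'wait'/'nowait' are hits (data found in the cache);
 * the *_miss counters are reads that missed. */
double copy_read_hit_percent(unsigned long wait, unsigned long nowait,
                             unsigned long wait_miss, unsigned long nowait_miss)
{
    unsigned long total = wait + nowait + wait_miss + nowait_miss;
    if (total == 0)
        return 0.0;                 /* no copy reads yet */
    return 100.0 * (double)(wait + nowait) / (double)total;
}
```

The same shape applies to the Data Map Hits %, Pin Read Hits %, and MDL Read Hits % counters in the later tables; only the underlying variables differ.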
Caching with the Mapping and Pinning Interfaces
Just as user applications read and write data in files on a disk, file system drivers need to read and write the data that describes the files themselves (the metadata, or volume structure data). Because the file system drivers run in kernel mode, however, they could, if the cache manager were properly informed, modify data directly in the system cache. To permit this optimization, the cache manager provides the functions shown in Table 11-10. These functions permit the file system drivers to find where in virtual memory the file system metadata resides, thus allowing direct modification without the use of intermediary buffers.
Table 11-10 Functions for Finding Metadata Locations
Function | Description |
---|---|
CcMapData | Maps the byte range for read access |
CcPinRead | Maps the byte range for read/write access and pins it |
CcPreparePinWrite | Maps and pins the byte range for write access |
CcPinMappedData | Pins a previously mapped buffer |
CcSetDirtyPinnedData | Notifies the cache manager that the data has been modified |
CcUnpinData | Releases the pages so that they can be removed from memory |
If a file system driver needs to read file system metadata in the cache, it calls the cache manager's mapping interface to obtain the virtual address of the desired data. The cache manager touches all the requested pages to bring them into memory and then returns control to the file system driver. The file system driver can then access the data directly.
If the file system driver needs to modify cache pages, it calls the cache manager's pinning services, which keep the pages being modified in memory. The pages aren't actually locked into memory (such as when a device driver locks pages for direct memory access transfers). Instead, the memory manager's mapped page writer (explained in Chapter 7) sees that these pages are pinned and doesn't write the pages to disk until the file system driver unpins (releases) them. When the pages are released, the cache manager flushes any changes to disk and releases the cache view that the metadata occupied.
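The pin/unpin contract described above can be modeled in a few lines of C. This is a simplified user-mode model, not kernel code: it captures only the rule that the mapped page writer skips a page while its pin count is nonzero, so a dirty page becomes eligible for writing to disk only after every pin is released (as `CcUnpinData` does in the real interface):

```c
#include <assert.h>

/* Simplified model of a pinned cache page. Real pinning state lives in
 * cache manager and memory manager structures; this only models the
 * observable rule: pinned pages are not written to disk. */
struct cached_page {
    int pin_count;
    int dirty;
};

void pin_page(struct cached_page *p)   { p->pin_count++; }
void unpin_page(struct cached_page *p) { if (p->pin_count > 0) p->pin_count--; }

/* Mapped-page-writer eligibility: the page must be dirty and unpinned. */
int can_write_to_disk(const struct cached_page *p)
{
    return p->dirty && p->pin_count == 0;
}
```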
The mapping and pinning interfaces solve one thorny problem of implementing a file system: buffer management. Without directly manipulating cached metadata, a file system must predict the maximum number of buffers it will need when updating a volume's structure. By allowing the file system to access and update its metadata directly in the cache, the cache manager eliminates the need for buffers, simply updating the volume structure in the virtual memory the memory manager provides. The only limitation the file system encounters is the amount of available memory.
You can examine pinning and mapping activity in the cache via the performance counters or system variables listed in Table 11-11.
Table 11-11 System Variables for Examining Pinning and Mapping Activity
Performance Counter (frequency) | System Variable (count) | Description |
---|---|---|
Cache: Data Map Hits % | (CcMapDataWait + CcMapDataNoWait) / ((CcMapDataWait + CcMapDataNoWait) + (CcMapDataWaitMiss + CcMapDataNoWaitMiss)) | Percentage of data maps to parts of files that were in the cache (A data map can still generate paging I/O.) |
Cache: Data Maps/Sec | CcMapDataWait + CcMapDataNoWait | Total data maps from the cache |
Cache: Sync Data Maps/Sec | CcMapDataWait | Synchronous data maps from the cache |
Cache: Async Data Maps/Sec | CcMapDataNoWait | Asynchronous data maps from the cache |
Cache: Data Map Pins/Sec | CcPinMappedDataCount | Number of requests to pin mapped data |
Cache: Pin Read Hits % | (CcPinReadWait + CcPinReadNoWait) / ((CcPinReadWait + CcPinReadNoWait) + (CcPinReadWaitMiss + CcPinReadNoWaitMiss)) | Percentage of pinned reads to parts of files that were in the cache (A pinned read can still generate paging I/O.) |
Cache: Pin Reads/Sec | CcPinReadWait + CcPinReadNoWait | Total pinned reads from the cache |
Cache: Sync Pin Reads/Sec | CcPinReadWait | Synchronous pinned reads from the cache |
Cache: Async Pin Reads/Sec | CcPinReadNoWait | Asynchronous pinned reads from the cache |
Caching with the Direct Memory Access Interfaces
In addition to the mapping and pinning interfaces used to access metadata directly in the cache, the cache manager provides a third interface to cached data: direct memory access (DMA). The DMA functions are used to read from or write to cache pages without intervening buffers, such as when a network file system is doing a transfer over the network.
The DMA interface returns to the file system the physical addresses of cached user data (rather than the virtual addresses, which the mapping and pinning interfaces return), which can then be used to transfer data directly from physical memory to a network device. Although small amounts of data (1 KB to 2 KB) can use the usual buffer-based copying interfaces, for larger transfers, the DMA interface can result in significant performance improvements for a network server processing file requests from remote systems.
To describe these references to physical memory, a memory descriptor list (MDL) is used. (MDLs were introduced in Chapter 7.) The four separate functions described in Table 11-12 create the cache manager's DMA interface.
Table 11-12 Functions That Create the DMA Interface
Function | Description |
---|---|
CcMdlRead | Returns an MDL describing the specified byte range |
CcMdlReadComplete | Frees the MDL |
CcMdlWrite | Returns an MDL describing a specified byte range (possibly containing zeros) |
CcMdlWriteComplete | Frees the MDL and marks the range for writing |
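An MDL is essentially a list of the physical pages backing a byte range. The sketch below is a toy model, not a real MDL (real MDLs are opaque kernel structures built by the cache manager); it only reproduces the page arithmetic, assuming the x86's 4-KB page size:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of MDL sizing: how many physical page entries an MDL would
 * need to describe a byte range that starts at byte 'offset' and runs
 * for 'length' bytes. Assumes 4-KB pages. */
#define MODEL_PAGE_SIZE 4096UL

size_t mdl_page_count(unsigned long offset, unsigned long length)
{
    if (length == 0)
        return 0;
    unsigned long first = offset / MODEL_PAGE_SIZE;
    unsigned long last  = (offset + length - 1) / MODEL_PAGE_SIZE;
    return (size_t)(last - first + 1);   /* pages the range touches */
}
```

Note that a range only two bytes long can still require two page entries if it straddles a page boundary, which is why the calculation works from the first and last page touched rather than dividing the length by the page size.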
You can examine MDL activity from the cache via the performance counters or system variables listed in Table 11-13.
Table 11-13 System Variables for Examining MDL Activity from the Cache
Performance Counter (frequency) | System Variable (count) | Description |
---|---|---|
Cache: MDL Read Hits % | (CcMdlReadWait + CcMdlReadNoWait) / ((CcMdlReadWait + CcMdlReadNoWait) + (CcMdlReadWaitMiss + CcMdlReadNoWaitMiss)) | Percentage of MDL reads to parts of files that were in the cache (References to pages satisfied by an MDL read can still generate paging I/O.) |
Cache: MDL Reads/Sec | CcMdlReadWait + CcMdlReadNoWait | Total MDL reads from the cache |
Cache: Sync MDL Reads/Sec | CcMdlReadWait | Synchronous MDL reads from the cache |
Cache: Async MDL Reads/Sec | CcMdlReadNoWait | Asynchronous MDL reads from the cache |
Write Throttling
Windows 2000 must determine whether scheduled writes will affect system performance and then schedule any delayed writes. First it asks whether a certain number of bytes can be written right now without hurting performance and, if necessary, blocks that write. Then it sets up a callback to write the bytes automatically once writes are again permitted. Once it's notified of an impending write operation, the cache manager determines how many dirty pages are in the cache and how much physical memory is available. If few physical pages are free, the cache manager momentarily blocks the file system thread that's requesting to write data to the cache. The cache manager's lazy writer flushes some of the dirty pages to disk and then allows the blocked file system thread to continue. This write throttling prevents system performance from degrading because of a lack of memory when a file system or network server issues a large write operation.
Write throttling is also useful for network redirectors transmitting data over slow communication lines. For example, suppose a local process writes a large amount of data to a remote file system over a 9600-baud line. The data isn't written to the remote disk until the cache manager's lazy writer flushes the cache. If the redirector has accumulated lots of dirty pages that are flushed to disk at once, the recipient could receive a network timeout before the data transfer completes. By using the CcSetDirtyPageThreshold function, the cache manager allows network redirectors to set a limit on the number of dirty cache pages they can tolerate, thus preventing this scenario. By limiting the number of dirty pages, the redirector ensures that a cache flush operation won't cause a network timeout.
EXPERIMENT
Viewing the Write-Throttle Parameters
The !defwrites kernel debugger command dumps the values of the kernel variables the cache manager uses, including the number of dirty pages in the file cache (CcTotalDirtyPages) when determining whether it should throttle write operations:
```
kd> !defwrites
*** Cache Write Throttle Analysis ***

CcTotalDirtyPages:              758 (   3032 Kb)
CcDirtyPageThreshold:           770 (   3080 Kb)
MmAvailablePages:             42255 ( 169020 Kb)
MmThrottleTop:                  250 (   1000 Kb)
MmThrottleBottom:                30 (    120 Kb)
MmModifiedPageListHead.Total:   689 (   2756 Kb)

CcTotalDirtyPages within 64 (max charge) pages of the threshold, writes may be throttled

Check these thread(s): CcWriteBehind(LazyWriter)
Check critical workqueue for the lazy writer, !exqueue 16
```
This output shows that the number of dirty pages is close to the number that triggers write throttling (CcDirtyPageThreshold), so if a process tried to write more than 12 pages (48 KB) at the time of the experiment, it would be delayed until the lazy writer lowered the number of dirty pages.
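The 12-page margin follows directly from the counters: 770 − 758 = 12. A minimal C sketch of the decision, using the values from the sample output above (the comparison is a simplification of the real throttle logic, which also weighs available memory and other factors):

```c
#include <assert.h>

/* Simplified write-throttle check: a write of 'pages' dirty pages is
 * delayed if it would push the dirty-page total past the threshold.
 * The real cache manager's decision also considers available physical
 * memory; this models only the threshold comparison. */
int write_would_be_throttled(unsigned long total_dirty_pages,
                             unsigned long dirty_page_threshold,
                             unsigned long pages_to_write)
{
    return total_dirty_pages + pages_to_write > dirty_page_threshold;
}
```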