In Figure 7-7, we see the relationship between the various kernel-resident data structures required to manage the swap space.

Figure 7-7. Swap Kernel Structures
As we see in the diagram, the hambone is connected to the knee bone. Swap management starts with entries in the swap device table, swdevt, and the file system device table, fswdevt. Pointers from the two priority arrays are used to order the searching of the swap devices. A swap device is broken down into individual swap chunks. The kernel tunable maxswapchunks sets the systemwide maximum number of swap chunks and so determines the size of the swap table. Each swap chunk is sized by the kernel tunable swchunk, which defaults to 2 MB, enough room for 512 page-outs. Individual pages have entries in the chunk's swap map. This map also contains a linked free list for its pages. The swap maps are pointed to by entries in the system swap table, swaptab. The swap table is filled in the order the swap devices were enabled but is searched according to the order implied by the priority table pointers. If two devices share a priority, their chunks are searched in round-robin fashion. Let's examine these structures in Listings 7.1 through 7.6.

Listing 7.1.
q4> fields struct devpri

These structures are maintained in the array swdev_pri[NSWPRI] (default value is 11)

Pointer to the first swap device at this priority
   0 0    4 0  *          first
Pointer to next device at this priority to allocate from
   4 0    4 0  *          curr

Listing 7.2.
q4> fields struct fspri

These structures are maintained in the array swfs_pri[NSWPRI] (default value is 11)

Pointer to first file system swap at this priority
   0 0    4 0  *          first
Pointer to next swap area at this priority to allocate from
   4 0    4 0  *          curr

Listing 7.3.
q4> fields swdev_t

The swap device number
   0 0    4 0  int        sw_dev
The swap device flags (e.g., SW_ENABLE)
   4 0    4 0  int        sw_flags
The Kbyte (DEV_BSIZE) offset to the beginning of the swap area on the disk device
   8 0    4 0  long       sw_start
Number of blocks on the device
  12 0    4 0  long       sw_nblksavail
Number of blocks enabled for swap
  16 0    4 0  long       sw_nblksenabled
Number of free pages
  20 0    4 0  int        sw_nfpgs
Swap priority for this device
  24 0    4 0  int        sw_priority
First swap table entry for this device
  28 0    4 0  int        sw_head
Last swap table entry for this device
  32 0    4 0  int        sw_tail
Pointer to next swap device sharing the same priority
  36 0    4 0  *          sw_next

Listing 7.4.
q4> fields fswdev_t

Pointer to next file system swap area with the same priority
   0 0    4 0  *          fsw_next
The status flags
   4 0    4 0  int        fsw_flags
Number of free swap pages
   8 0    4 0  int        fsw_nfpgs
Number of blocks allocated
  12 0    4 0  long       fsw_allocated
Minimum number of preallocated blocks
  16 0    4 0  u_long     fsw_min
The block allocation limit
  20 0    4 0  u_long     fsw_limit
The block reservation limit (file system swap equivalent to minimum free space)
  24 0    4 0  u_long     fsw_reserve
Priority for this file system swap space
  28 0    4 0  int        fsw_priority
Pointer to the vnode for the file system's mount point
  32 0    4 0  *          fsw_vnode
The underlying file system's block size
  36 0    4 0  u_int      fsw_bsize
This swap space's first swap table entry
  40 0    2 0  short      fsw_head
This swap space's last swap table entry
  42 0    2 0  short      fsw_tail
The directory path name for the underlying file system's mount point
  44 0  256 0  char[256]  fsw_mntpoint

Listing 7.5.
q4> fields swpt_t

Index to the first free swapmap array entry
   0 0    2 0  short      st_free
Index of next chunk for same device or file system swap area
   2 0    2 0  short      st_next
Status flags (ST_INDEL|ST_FREE|ST_INUSE)
   4 0    4 0  int        st_flags
Pointer to swap device
   8 0    4 0  *          st_dev
Pointer to swap file system
  12 0    4 0  *          st_fsp
Device of file system chunk vnode
  16 0    4 0  *          st_vnode
Number of free pages on the device
  20 0    4 0  int        st_nfpgs
Pointer to a swap map's starting address
  24 0    4 0  *          st_swpmp

Listing 7.6.
q4> fields swpm_t

Number of kthreads using this page
   0 0    2 0  u_short    sm_ucnt
Index to first free entry in this swap map
   2 0    2 0  short      sm_next

As a final bit of discussion, do you see the sleight-of-hand trick played by the disk block descriptor data? How can the 28-bit field in the dbd point to a specific device and an offset on the device? Device numbers are 32 bits long by themselves, and the block address on a modern disk may be quite large. The smoke and mirrors employed here are several levels of indirection. The upper half of the dbd data, dbd_swptb, points to the appropriate swap table entry. Here we pick up st_dev, which leads us to the device number, and st_swpmp, the pointer to this chunk's swap map. Next, the lower half of the dbd data, dbd_swpmp, is the page offset into the swap chunk. Each half of the 28-bit field is 14 bits wide, which means that currently no more than 2^14 swap chunks may be configured on a system and that each chunk may hold only 2^14 pages at most. If we do the math, this limits the maximum device swap space to:

2^14 * 2^14 * 4096 bytes = 2^40 bytes, or 1 TB
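The two-level lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the kernel's code: it assumes the 28-bit dbd data splits into two equal 14-bit halves with dbd_swptb in the upper half (as the text describes), and the names split_dbd, max_device_swap_bytes, HALF_BITS, and PAGE_SIZE are hypothetical helpers introduced here.

```python
# Assumption: the 28-bit dbd data field is two 14-bit halves,
# dbd_swptb (upper) and dbd_swpmp (lower), as described in the text.
HALF_BITS = 14
HALF_MASK = (1 << HALF_BITS) - 1
PAGE_SIZE = 4096  # bytes per page, the figure used in the text's arithmetic


def split_dbd(dbd_data):
    """Split a 28-bit dbd data value into (dbd_swptb, dbd_swpmp).

    dbd_swptb indexes the swap table (swaptab); dbd_swpmp is the
    page offset within that chunk's swap map.
    """
    if not 0 <= dbd_data < (1 << (2 * HALF_BITS)):
        raise ValueError("dbd data must fit in 28 bits")
    dbd_swptb = dbd_data >> HALF_BITS  # upper half: swap table index
    dbd_swpmp = dbd_data & HALF_MASK   # lower half: page within the chunk
    return dbd_swptb, dbd_swpmp


def max_device_swap_bytes():
    """Upper bound implied by 2^14 chunks of at most 2^14 pages each."""
    max_chunks = 1 << HALF_BITS
    max_pages_per_chunk = 1 << HALF_BITS
    return max_chunks * max_pages_per_chunk * PAGE_SIZE
```

For instance, a dbd data value built as (3 << 14) | 5 splits into swap table index 3 and page offset 5, and max_device_swap_bytes() evaluates to 2**40, the 1 TB ceiling computed above.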