
4.2. Memory Zones

Not all pages are created equal. Some computer architectures place constraints on what certain physical address ranges of memory can be used for. For example, on x86, devices on the ISA bus can address only the first 16MB of RAM for DMA. Although PPC does not have this constraint, the memory zone concept is carried over to keep the architecture-independent portion of the code simple; in the architecture-dependent portion of the PPC code, the zones are set to overlap. Another such constraint arises in a system that has more RAM than it can address with its linear address space.

A memory zone is composed of page frames (physical pages), and a page frame is allocated from a particular memory zone. Three memory zones exist in Linux: ZONE_DMA (page frames usable for DMA), ZONE_NORMAL (non-DMA page frames that are permanently mapped into the kernel's virtual address space), and ZONE_HIGHMEM (page frames whose addresses are not permanently mapped into the kernel's virtual address space).
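For reference, each zone is identified by its index within a node's node_zones array. The following is a minimal sketch of those indices, paraphrasing the 2.6-era include/linux/mmzone.h definitions rather than quoting them verbatim:

-----------------------------------------------------------------------------
/* Sketch of the zone indices (values as commonly defined in 2.6-era
 * include/linux/mmzone.h; shown here for illustration). */
#define ZONE_DMA       0   /* page frames usable for legacy DMA            */
#define ZONE_NORMAL    1   /* page frames permanently mapped by the kernel */
#define ZONE_HIGHMEM   2   /* page frames not permanently mapped           */

#define MAX_NR_ZONES   3
-----------------------------------------------------------------------------

These indices are what the is_highmem() and is_normal() helpers, shown later in this section, compare against.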

4.2.1. Memory Zone Descriptor

As with all objects that the kernel manages, a memory zone is described by a structure, called zone, which stores all of its information. The zone struct is defined in include/linux/mmzone.h. We now take a closer look at some of its most commonly used fields:

-----------------------------------------------------------------------------
include/linux/mmzone.h
 66  struct zone {
...
 70    spinlock_t         lock;
 71    unsigned long      free_pages;
 72    unsigned long      pages_min, pages_low, pages_high;
 73
 74    ZONE_PADDING(_pad1_)
 75
 76    spinlock_t         lru_lock;
 77    struct list_head   active_list;
 78    struct list_head   inactive_list;
 79    atomic_t           refill_counter;
 80    unsigned long      nr_active;
 81    unsigned long      nr_inactive;
 82    int                all_unreclaimable; /* All pages pinned */
 83    unsigned long      pages_scanned;     /* since last reclaim */
 84
 85    ZONE_PADDING(_pad2_)
...
103    int temp_priority;
104    int prev_priority;
...
109    struct free_area   free_area[MAX_ORDER];
...
135    wait_queue_head_t  * wait_table;
136    unsigned long      wait_table_size;
137    unsigned long      wait_table_bits;
138
139    ZONE_PADDING(_pad3_)
140
...
157  } ____cacheline_maxaligned_in_smp;
-----------------------------------------------------------------------------

4.2.1.1. lock

The zone descriptor must be locked while it is being manipulated to prevent concurrent readers and writers from corrupting it. The lock field holds the spinlock that protects the descriptor.

This is a lock for the descriptor itself and not for the memory range with which it is associated.
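A minimal sketch of how such a lock is typically used follows; the helper name zone_return_page() is hypothetical and only illustrates that updates to descriptor fields such as free_pages happen under zone->lock (the real free path goes through the buddy allocator):

-----------------------------------------------------------------------------
#include <linux/mmzone.h>
#include <linux/spinlock.h>

/* Hypothetical helper: credit one freed page frame back to a zone.
 * The spinlock protects the descriptor fields, not the physical
 * memory that the zone describes. */
static void zone_return_page(struct zone *zone)
{
  unsigned long flags;

  spin_lock_irqsave(&zone->lock, flags);
  zone->free_pages++;   /* descriptor update done under the lock */
  spin_unlock_irqrestore(&zone->lock, flags);
}
-----------------------------------------------------------------------------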

4.2.1.2. free_pages

The free_pages field holds the number of free pages left in the zone. This unsigned long is decremented every time a page is allocated from the zone and incremented every time a page is returned to it. The total amount of free RAM returned by a call to nr_free_pages() is calculated by summing this value across all three zones.

4.2.1.3. pages_min, pages_low, and pages_high

The pages_min, pages_low, and pages_high fields hold the zone's watermark values. As the number of available pages falls below each of these watermarks in turn, the kernel responds to the memory shortage with increasingly aggressive measures.
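A minimal sketch of the idea follows; the helper zone_pressure() is hypothetical, and the actual checks live in the page allocator and in kswapd:

-----------------------------------------------------------------------------
#include <linux/mmzone.h>

/* Hypothetical illustration of how the three watermarks grade
 * memory pressure in a zone. */
static int zone_pressure(struct zone *zone)
{
  if (zone->free_pages >= zone->pages_high)
    return 0;  /* plenty of free pages; background reclaim can rest */
  if (zone->free_pages >= zone->pages_low)
    return 1;  /* below pages_high; kswapd keeps reclaiming         */
  if (zone->free_pages >= zone->pages_min)
    return 2;  /* below pages_low; kswapd is woken up               */
  return 3;    /* below pages_min; allocations must reclaim pages   */
}
-----------------------------------------------------------------------------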

4.2.1.4. lru_lock

The lru_lock field holds the spinlock that protects the zone's page lists used for reclamation, active_list and inactive_list, described next.

4.2.1.5. active_list and inactive_list

active_list and inactive_list are involved in the page reclamation functionality. The first is the list of the zone's active pages and the second is the list of inactive pages that are candidates for reclamation; moving a page from one list to the other, as sketched below, is how the kernel demotes it toward reclamation.
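A minimal sketch, assuming the 2.6 layout in which struct page carries an lru list head; the helper deactivate_one_page() is hypothetical, and the real logic in mm/vmscan.c also examines reference bits and page flags:

-----------------------------------------------------------------------------
#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/list.h>

/* Hypothetical helper: demote one page from the active list to the
 * inactive list, making it a candidate for reclamation. */
static void deactivate_one_page(struct zone *zone, struct page *page)
{
  spin_lock_irq(&zone->lru_lock);
  list_del(&page->lru);
  list_add(&page->lru, &zone->inactive_list);
  zone->nr_active--;
  zone->nr_inactive++;
  spin_unlock_irq(&zone->lru_lock);
}
-----------------------------------------------------------------------------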

4.2.1.6. all_unreclaimable

The all_unreclaimable field is set to 1 if all pages in the zone are pinned. Such pages will be reclaimed only by kswapd, the pageout daemon.

4.2.1.7. pages_scanned, temp_priority, and prev_priority

The pages_scanned, temp_priority, and prev_priority fields are all involved with page reclamation functionality, which is outside the scope of this book.

4.2.1.8. free_area

The buddy system uses the free_area field, an array with one entry per allocation order; each entry tracks the zone's free blocks of that order through its free list and bitmap.
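A minimal sketch of how the allocator might consult this array; find_free_order() is a hypothetical helper, and the real search is part of the buddy allocator in mm/page_alloc.c:

-----------------------------------------------------------------------------
#include <linux/mmzone.h>
#include <linux/list.h>

/* Hypothetical helper: find the smallest order, at or above the
 * requested one, that still has a free block in this zone. */
static int find_free_order(struct zone *zone, unsigned int order)
{
  unsigned int current_order;

  for (current_order = order; current_order < MAX_ORDER; current_order++) {
    struct free_area *area = zone->free_area + current_order;

    if (!list_empty(&area->free_list))
      return current_order;  /* a free buddy block of this size exists */
  }
  return -1;  /* no sufficiently large block is free in this zone */
}
-----------------------------------------------------------------------------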

4.2.1.9. wait_table, wait_table_size, and wait_table_bits

The wait_table, wait_table_size, and wait_table_bits fields implement a hash table of wait queues for processes waiting on the zone's pages.
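A minimal sketch of the lookup; the helper page_wait_queue() is hypothetical but is close in spirit to the page_waitqueue() function in mm/filemap.c:

-----------------------------------------------------------------------------
#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/hash.h>
#include <linux/wait.h>

/* Hypothetical helper: hash a page into the zone's wait table to find
 * the wait queue head that sleepers on that page should use. */
static wait_queue_head_t *page_wait_queue(struct zone *zone, struct page *page)
{
  return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
}
-----------------------------------------------------------------------------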

Cache Aligning and Zone Padding

Cache aligning is done to improve performance on descriptor field accesses. Cache aligning improves performance by minimizing the number of instructions needed to access a chunk of data. Take the case of a 32-bit value that is not aligned on a word boundary: the processor would need two "load word" instructions to get the data into registers instead of just one. ZONE_PADDING shows how cache aligning is performed on a memory zone:


---------------------------------------------------------------------------
include/linux/mmzone.h
#if defined(CONFIG_SMP)
struct zone_padding {
  int x;
} ____cacheline_maxaligned_in_smp;
#define ZONE_PADDING(name)  struct zone_padding name;
#else
#define ZONE_PADDING(name)
#endif
---------------------------------------------------------------------------

If you want to know more about how cache aligning works in Linux, refer to include/linux/cache.h.
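The practical effect can be sketched with a hypothetical structure: the padding pushes the second lock (and the fields it protects) onto a separate cache line, so CPUs contending on one lock do not keep evicting the cache line holding the other. The structure below is purely illustrative:

-----------------------------------------------------------------------------
#include <linux/mmzone.h>
#include <linux/spinlock.h>

/* Hypothetical structure: without ZONE_PADDING, both locks could
 * share one cache line and would bounce between CPUs together. */
struct two_hot_locks {
  spinlock_t alloc_lock;
  unsigned long alloc_count;

  ZONE_PADDING(_pad_)   /* start the next fields on a new cache line */

  spinlock_t reclaim_lock;
  unsigned long reclaim_count;
};
-----------------------------------------------------------------------------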

4.2.2. Memory Zone Helper Functions

When actions are commonly applied to an object, or information is frequently requested from an object, helper functions make coding easier. Here, we present a few helper functions that facilitate memory zone manipulation.

4.2.2.1. for_each_zone()

The for_each_zone() macro iterates over all zones:

-----------------------------------------------------------------------------
include/linux/mmzone.h
268  #define for_each_zone(zone) \
269    for (zone = pgdat_list->node_zones; zone; zone = next_zone(zone))
-----------------------------------------------------------------------------
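For example, the total free RAM mentioned earlier can be computed by walking every zone with this macro. The following is a simplified sketch, close in spirit to nr_free_pages() in mm/page_alloc.c; the name count_free_pages() is hypothetical:

-----------------------------------------------------------------------------
#include <linux/mmzone.h>

/* Hypothetical helper: total up the free page frames across every
 * zone in the system. */
static unsigned long count_free_pages(void)
{
  struct zone *zone;
  unsigned long sum = 0;

  for_each_zone(zone)
    sum += zone->free_pages;

  return sum;
}
-----------------------------------------------------------------------------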

4.2.2.2. is_highmem() and is_normal()

The is_highmem() and is_normal() functions check whether a given zone struct is its node's high memory or normal zone, respectively:

-----------------------------------------------------------------------------
include/linux/mmzone.h
315  static inline int is_highmem(struct zone *zone)
316  {
317    return (zone - zone->zone_pgdat->node_zones == ZONE_HIGHMEM);
318  }
319
320  static inline int is_normal(struct zone *zone)
321  {
322    return (zone - zone->zone_pgdat->node_zones == ZONE_NORMAL);
323  }
-----------------------------------------------------------------------------
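As a small usage sketch, a walk over the zones can classify each one with these helpers. The function print_zone_types() below is hypothetical, and it assumes the zone descriptor's name field, which in 2.6 points to strings such as "DMA", "Normal", and "HighMem":

-----------------------------------------------------------------------------
#include <linux/kernel.h>
#include <linux/mmzone.h>

/* Hypothetical helper: report each zone's type while walking the
 * zone list with for_each_zone(). */
static void print_zone_types(void)
{
  struct zone *zone;

  for_each_zone(zone) {
    if (is_highmem(zone))
      printk("zone %s: high memory\n", zone->name);
    else if (is_normal(zone))
      printk("zone %s: normal memory\n", zone->name);
    else
      printk("zone %s: DMA memory\n", zone->name);
  }
}
-----------------------------------------------------------------------------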
