Mac OS X Internals: A Systems Approach

5.3. High-Level Processor Initialization

Figure 5-8 shows an overview of the control flow of the ppc_init() function, including other notable functions it calls. Note that ppc_init() also marks the transition from assembly-language code to C code.

Figure 5-8. High-level processor initialization

ppc_init() first sets up various fields in the per-processor data area of the boot processor. One of the fields is pp_cbfr [osfmk/console/ppc/serial_console.c], a pointer to a per-processor console buffer used by the kernel to handle multiprocessor console output. Let us look at the key operations performed by each function in the sequence depicted in Figure 5-8.

5.3.1. Before Virtual Memory

thread_bootstrap() [osfmk/kern/thread.c] populates a static thread structure (thread_template) used as a template for fast initialization of newly created threads. It then uses this template to initialize init_thread, another static thread structure. thread_bootstrap() finishes by setting init_thread as the current thread, which in turn loads the SPRG1 register[6] with init_thread. Upon return from thread_bootstrap(), ppc_init() initializes certain aspects of the current thread's machine-dependent state.

[6] SPRG1 holds the active thread.

cpu_bootstrap() [osfmk/ppc/cpu.c] initializes certain locking data structures.

cpu_init() [osfmk/ppc/cpu.c] restores the Timebase Register from values saved in the per_proc_info structure. It also sets the values of some informational fields in the per_proc_info structure.

// osfmk/ppc/cpu.c

void
cpu_init(void)
{
    // Restore the Timebase
    ...
    proc_info->cpu_type = CPU_TYPE_POWERPC;
    proc_info->cpu_subtype = (cpu_subtype_t)proc_info->pf.rptdProc;
    proc_info->cpu_threadtype = CPU_THREADTYPE_NONE;
    proc_info->running = TRUE;
}

processor_bootstrap() [osfmk/kern/processor.c] is a Mach function that sets the value of the global variable master_processor from the value of the global variable master_cpu, which is set to 0 before this function is called. It calls the cpu_to_processor() [osfmk/ppc/cpu.c] function to convert a cpu (an integer) to a processor (a processor_t).

// osfmk/ppc/cpu.c

processor_t
cpu_to_processor(int cpu)
{
    return ((processor_t)PerProcTable[cpu].ppe_vaddr->processor);
}

As we saw in Figure 5-3, the ppe_vaddr field points to a per_proc_info structure. Its processor field, shown as a character array in Figure 5-3, houses a processor_t data type, which is Mach's abstraction for a processor.[7] Its contents include several data structures related to scheduling. processor_bootstrap() calls processor_init() [osfmk/kern/processor.c], which initializes a processor_t's scheduling-related fields, and sets up a timer for quantum expiration.

[7] We will look at details of Mach's processor abstraction in Chapter 7.

ppc_init() then sets the static_memory_end global variable to the highest address used in the kernel's data area, rounded off to the nearest page. Recall from Chapter 4 that the topOfKernelData field of the boot_args structure contains this value. ppc_init() calls PE_init_platform() [pexpert/ppc/pe_init.c] to initialize some aspects of the Platform Expert. The call is made with the first argument (vm_initialized) set to FALSE, indicating that the virtual memory (VM) subsystem is not yet initialized. PE_init_platform() copies the boot arguments pointer, the pointer to the device tree, and the display properties to a global structure variable called PE_state, which is of type PE_state_t.

// pexpert/pexpert/pexpert.h

typedef struct PE_state {
    boolean_t  initialized;
    PE_Video   video;
    void      *deviceTreeHead;
    void      *bootArgs;
#if __i386__
    void      *fakePPCBootArgs;
#endif
} PE_state_t;

extern PE_state_t PE_state;

// pexpert/ppc/pe_init.c

PE_state_t PE_state;

PE_init_platform() then calls DTInit() [pexpert/gen/device_tree.c] to initialize the Open Firmware device tree routines. DTInit() simply initializes a pointer to the device tree's root node. Finally, PE_init_platform() calls pe_identify_machine() [pexpert/ppc/pe_identify_machine.c], which populates a clock_frequency_info_t variable (gPEClockFrequencyInfo) with various frequencies such as that of the Timebase, the processor, and the bus.

// pexpert/pexpert/pexpert.h

struct clock_frequency_info_t {
    unsigned long bus_clock_rate_hz;
    unsigned long cpu_clock_rate_hz;
    unsigned long dec_clock_rate_hz;
    ...
    unsigned long long cpu_frequency_hz;
    unsigned long long cpu_frequency_min_hz;
    unsigned long long cpu_frequency_max_hz;
};

typedef struct clock_frequency_info_t clock_frequency_info_t;

extern clock_frequency_info_t gPEClockFrequencyInfo;

ppc_init() parses several boot arguments at this point, such as novmx, fn, pmsx, lcks, diag, ctrc, tb, maxmem, wcte, mcklog, and ht_shift. We came across all these in Chapter 4. However, not all arguments are processed immediately; in the case of some arguments, ppc_init() only sets the values of certain kernel variables for later reference.

5.3.2. Low-Level Virtual Memory Initialization

ppc_init() calls ppc_vm_init() [osfmk/ppc/ppc_vm_init.c] to initialize hardware-dependent aspects of the virtual memory subsystem. The key actions performed by ppc_vm_init() are shown in Figure 5-8.

5.3.2.1. Sizing Memory

ppc_vm_init() first invalidates the in-memory shadow BATs by loading them with zeros. It then retrieves information about physical memory banks from the boot arguments. This information is used to calculate the total amount of memory on the machine. For each available bank that is usable, ppc_vm_init() initializes a memory region structure (mem_region_t).

// osfmk/ppc/mappings.h

typedef struct mem_region {
    phys_entry *mrPhysTab; // Base of region table
    ppnum_t     mrStart;   // Start of region
    ppnum_t     mrEnd;     // Last page in region
    ppnum_t     mrAStart;  // Next page in region to allocate
    ppnum_t     mrAEnd;    // Last page in region to allocate
} mem_region_t;
...
#define PMAP_MEM_REGION_MAX 11

extern mem_region_t pmap_mem_regions[PMAP_MEM_REGION_MAX + 1];
extern int          pmap_mem_regions_count;
...

Note that it is possible for physical memory to be noncontiguous. The kernel maps the potentially noncontiguous physical space into contiguous physical-to-virtual mapping tables. ppc_vm_init() creates an entry in the pmap_mem_regions array for each DRAM bank it uses, while incrementing pmap_mem_regions_count. The kernel calculates several maximum values for memory size. For example, on machines with more than 2GB of physical memory, one of the maximum memory values is pinned at 2GB for compatibility. Certain data structures must also reside within the first 2GB of physical memory. The following are specific examples of memory limits established by ppc_vm_init().

  • mem_size is the 32-bit physical memory size, minus any performance buffer. It is pinned at 2GB on machines with more than 2GB of physical memory. It can be limited by the maxmem boot-time argument.

  • max_mem is the 64-bit memory size. It can also be limited by maxmem.

  • mem_actual is the 64-bit physical memory size that equals the highest physical address plus 1. It cannot be limited by maxmem.

  • sane_size is the same as max_mem, unless max_mem exceeds VM_MAX_KERNEL_ADDRESS, in which case sane_size is pinned at VM_MAX_KERNEL_ADDRESS, which is defined to be 0xDFFFFFFF (3.5GB) in osfmk/mach/ppc/vm_param.h.

ppc_vm_init() sets the first_avail variable, which represents the first available virtual address, to static_memory_end (note that virtual memory is not operational yet). Next, it computes kmapsize (the size of kernel text and data) by retrieving segment addresses from the kernel's Mach-O headers. It then calls pmap_bootstrap() [osfmk/ppc/pmap.c] with three arguments: max_mem, first_avail, and kmapsize. Next, pmap_bootstrap() prepares the system for running with virtual memory.

5.3.2.2. Pmap Initialization

The physical map (pmap) layer[8] is the machine-dependent portion of Mach's virtual memory subsystem. pmap_bootstrap() first initializes the kernel's physical map (kernel_pmap). It then finds space for the page table entry group (PTEG) hash table and the PTEG Control Area (PCA). The in-memory hash table has the following characteristics.

[8] We will discuss the pmap layer in Chapter 8.

  • The kernel allocates one PTEG per four physical pages.[9] As we saw in Chapter 4, the ht_shift boot argument allows the hash table's size to be altered.

    [9] The IBM-recommended hash table size is one PTEG per two physical pages.

  • The table is allocated in physical memory in the highest available range of physically contiguous memory.

  • The PCA resides immediately before the hash table. Its size is calculated from the hash table size.

The PCA's structure is declared in osfmk/ppc/mappings.h.

// osfmk/ppc/mappings.h

typedef struct PCA {
    union flgs {
        unsigned int PCAallo;       // Allocation controls
        struct PCAalflgs {
            unsigned char PCAfree;  // Indicates the slot is free
            unsigned char PCAsteal; // Steal scan start position
            unsigned char PCAauto;  // Indicates that the PTE was autogenned
            unsigned char PCAmisc;  // Miscellaneous flags
#define PCAlock  1                  // This locks up the associated PTEG
#define PCAlockb 31
        } PCAalflgs;
    } flgs;
} PCA_t;

The program in Figure 5-9 performs the same calculations as the kernel to calculate the page hash table size on a machine. You can use it to determine the amount of memory used by the page table given the amount of physical memory on the machine and the size of a PTEG. Note the use of the cntlzw PowerPC instruction to count the number of leading zeros.

Figure 5-9. Calculating the PowerPC PTEG hash table size used by the kernel

$ cat hash_table_size.c
// hash_table_size.c

#define PROGNAME "hash_table_size"

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <mach/vm_region.h>

typedef unsigned int uint_t;

#define PTEG_SIZE_G4 64
#define PTEG_SIZE_G5 128

extern unsigned int cntlzw(unsigned int num);

vm_size_t
calculate_hash_table_size(uint64_t msize, int pfPTEG, int hash_table_shift)
{
    unsigned int nbits;
    uint64_t     tmemsize;
    vm_size_t    hash_table_size;

    // Get first bit in upper half
    nbits = cntlzw(((msize << 1) - 1) >> 32);

    // If upper half is empty, find bit in lower half
    if (nbits == 32)
        nbits = nbits + cntlzw((uint_t)((msize << 1) - 1));

    // Get memory size rounded up to a power of 2
    tmemsize = 0x8000000000000000ULL >> nbits;

    // Ensure 32-bit arithmetic doesn't overflow
    if (tmemsize > 0x0000002000000000ULL)
        tmemsize = 0x0000002000000000ULL;

    // IBM-recommended hash table size (1 PTEG per 2 physical pages)
    hash_table_size = (uint_t)(tmemsize >> (12 + 1)) * pfPTEG;

    // Mac OS X uses half of the IBM-recommended size
    hash_table_size >>= 1;

    // Apply ht_shift, if necessary
    if (hash_table_shift >= 0) // Make size bigger
        hash_table_size <<= hash_table_shift;
    else // Make size smaller
        hash_table_size >>= (-hash_table_shift);

    // Ensure minimum size
    if (hash_table_size < (256 * 1024))
        hash_table_size = (256 * 1024);

    return hash_table_size;
}

int
main(int argc, char **argv)
{
    vm_size_t htsize;
    uint64_t  msize;

    if (argc != 2) {
        fprintf(stderr, "%s <memory in MB>\n", PROGNAME);
        exit(1);
    }

    msize = ((uint64_t)(atoi(argv[1])) << 20);
    htsize = calculate_hash_table_size(msize, PTEG_SIZE_G5, 0);

    printf("%d bytes (%dMB)\n", htsize, htsize >> 20);

    exit(0);
}

$ cat cntlzw.s
; cntlzw.s
; count leading zeros in a 32-bit word
;
        .text
        .align 4
        .globl _cntlzw
_cntlzw:
        cntlzw r3,r3
        blr

$ gcc -Wall -o hash_table_size hash_table_size.c cntlzw.s
$ ./hash_table_size 4096
33554432 bytes (32MB)
$ ./hash_table_size 2048
16777216 bytes (16MB)

pmap_bootstrap() calls hw_hash_init() [osfmk/ppc/hw_vm.s] to initialize the hash table and the PCA. It then calls hw_setup_trans() [osfmk/ppc/hw_vm.s], which we came across earlier in this chapter. Recall that hw_setup_trans() only configures the hardware registers required for address translation; it does not actually start address translation.

pmap_bootstrap() calculates the amount of memory that needs to be designated as "allocated" (i.e., it cannot be marked free). This includes memory for the initial context save areas, trace tables, physical entries (phys_entry_t), the kernel text, the logical pages (struct vm_page) needed to map physical memory, and the address-mapping structures (struct vm_map_entry). It then allocates the initial context save areas by calling savearea_init() [osfmk/ppc/savearea.c]. This allows the processor to take an interrupt.

Save Areas

Save areas are used to store process control blocks (PCBs). Depending on its type, a save area can contain a general processor context, a floating-point context, a vector context, and so on. Various save area structures are declared in osfmk/ppc/savearea.h. A save area never spans a page boundary. Moreover, besides referring to a save area by its virtual address, the kernel may also reference it by its physical address, such as from within an interrupt vector, where exceptions must not occur. The kernel maintains two global save area free lists: the save area free pool and the save area free list. There is one local list for each processor.

pmap_bootstrap() initializes the mapping tables by calling mapping_init() [osfmk/ppc/mappings.c]. It then calls pmap_map() [osfmk/ppc/pmap.c] to map memory for page tables in the kernel's map. The page tables are mapped V=R, that is, with the virtual address being equal to the real address. On 64-bit machines, pmap_bootstrap() calls pmap_map_physical() [osfmk/ppc/pmap.c] to block-map physical memory regions, in units of up to 256MB, into the kernel's address map. Physical memory is mapped at virtual addresses starting from PHYS_MEM_WINDOW_VADDR, which is defined to be 0x100000000ULL (4GB) in osfmk/ppc/pmap.h. Moreover, in this physical memory window, an I/O hole of size IO_MEM_WINDOW_SIZE (defined to be 2GB in osfmk/ppc/pmap.h) is mapped at an offset IO_MEM_WINDOW_VADDR (defined to be 2GB in osfmk/ppc/pmap.h). The pmap_map_iohole() [osfmk/ppc/pmap.c] function is called on a 64-bit machine to map the I/O hole.

Finally, pmap_bootstrap() sets the next available page pointer (first_avail) and the first free virtual address pointer (first_free_virt). The rest of the memory is marked free and is added to the free regions, from where it can be allocated by pmap_steal_memory() [osfmk/vm/vm_resident.c].

ppc_vm_init() now calls pmap_map() to map (again, V=R) exception vectors in the kernel's address map, starting from the address exception_entry through the address exception_end; both addresses are defined in osfmk/ppc/lowmem_vectors.s. Other pmap_map() calls that are made include those for the kernel's text (__TEXT) and data (__DATA) segments. The __KLD and __LINKEDIT segments are mapped (wired) through pmap_enter() [osfmk/ppc/pmap.c], page by page. These segments are unloaded by the I/O Kit in their entirety, to reclaim that memory, after booting completes.

ppc_vm_init() next calls MapUserMemoryWindowInit() [osfmk/ppc/pmap.c] to initialize a mechanism the kernel uses for mapping portions of user-space memory into the kernel. The copyin() and copyout() functions, both of which are implemented in osfmk/ppc/movc.s, primarily use this facility by calling MapUserMemoryWindow() [osfmk/ppc/pmap.c], which maps a user address range into a predefined kernel range. The range is 512MB in size and starts at USER_MEM_WINDOW_VADDR, which is defined to be 0xE0000000ULL (3.5GB) in osfmk/ppc/pmap.h.

5.3.2.3. Starting Address-Translation

Now that the memory management hardware has been configured and virtual memory subsystem data structures have been allocated and initialized, ppc_vm_init() calls hw_start_trans() [osfmk/ppc/hw_vm.s] to start address translation. Note that this is the first time in the boot process that address translation is enabled.

5.3.3. After Virtual Memory

ppc_vm_init() makes a call to PE_init_platform(), but with the vm_initialized Boolean argument set to TRUE (unlike the earlier call made by ppc_init()). As a result, PE_init_platform() calls pe_init_debug() [pexpert/gen/pe_gen.c], which copies the debug flags, if any, from the boot arguments to the kernel variable DEBUGFlag.

printf_init() [osfmk/kern/printf.c] initializes locks used by the printf() and sprintf() kernel functions. It also calls bsd_log_init() [bsd/kern/subr_log.c] to initialize a message buffer for kernel logging. The buffer structure is declared in bsd/sys/msgbuf.h.

// bsd/sys/msgbuf.h

#define MSG_BSIZE (4096 - 3 * sizeof(long))

struct msgbuf {
#define MSG_MAGIC 0x063061
    long msg_magic;
    long msg_bufx;            // write pointer
    long msg_bufr;            // read pointer
    char msg_bufc[MSG_BSIZE]; // buffer
};

#ifdef KERNEL
extern struct msgbuf *msgbufp;
...

Since logs may be written at interrupt level, it is possible for a log manipulation to affect another processor at interrupt level. Therefore, printf_init() also initializes a log spinlock to serialize access to log buffers.

panic_init() [osfmk/kern/debug.c] initializes a lock used to serialize modifications by multiple processors to the global panic string. printf() and panic() are required if a debugger needs to run.

5.3.3.1. Console Initialization

PE_init_kprintf() [pexpert/ppc/pe_kprintf.c] determines which console character output method to use. It checks the /options node in the device tree for the presence of input-device and output-device properties. If either property's value is a string of the format scca:x, where x is a number with six or fewer digits, PE_init_kprintf() attempts to use a serial port, with x being the baud rate. However, if the serialbaud boot argument is present, its value is used as the baud rate instead. PE_init_kprintf() then attempts to find an onboard serial port.

Figure 5-10 shows an excerpt from kprintf() initialization.

Figure 5-10. Initialization of the kprintf() function

// pexpert/ppc/pe_kprintf.c

void serial_putc(char c);
void (* PE_kputc)(char c) = 0;
...
vm_offset_t scc = 0;

void
PE_init_kprintf(boolean_t vm_initialized)
{
    ...
    // See if "/options" has "input-device" or "output-device"
    ...
    if ((scc = PE_find_scc())) {                       // Can we find a serial port?
        scc = io_map_spec(scc, 0x1000);                // Map the serial port
        initialize_serial((void *)scc, gPESerialBaud); // Start serial driver
        PE_kputc = serial_putc;
        simple_lock_init(&kprintf_lock, 0);
    } else
        PE_kputc = cnputc;
    ...
}

PE_find_scc() [pexpert/ppc/pe_identify_machine.c] looks for a serial port[10] in the device tree. If one is found, PE_find_scc() returns the physical I/O address of the port, which is then passed to io_map_spec() [osfmk/ppc/io_map.c] to be mapped into the kernel's virtual address space. Since virtual memory is enabled at this point, io_map_spec() calls io_map() [osfmk/ppc/io_map.c] to allocate pageable kernel memory in which the desired mapping is created. initialize_serial() [osfmk/ppc/serial.c] configures the serial hardware by performing I/O to the appropriate registers. Finally, PE_init_kprintf() sets the PE_kputc function pointer to serial_putc() [osfmk/ppc/ke_printf.c], which in turn calls scc_putc() [osfmk/ppc/serial_io.c] to output a character to a serial line.

[10] A legacy serial port is named escc-legacy, whereas a new-style serial port is named escc in the device tree.

If no serial ports could be found, PE_init_kprintf() sets PE_kputc to cnputc() [osfmk/console/ppc/serial_console.c], which calls the putc member of the appropriate entry[11] of the cons_ops structure to perform console output.

[11] Depending on whether the serial console or the graphics console is the default, the appropriate entry is set to SCC_CONS_OPS or VC_CONS_OPS, respectively, at compile time.

// osfmk/console/ppc/serial_console.c

#define OPS(putc, getc, nosplputc, nosplgetc) putc, getc

const struct console_ops {
    int (* putc)(int, int, int);
    int (* getc)(int, int, boolean_t, boolean_t);
} cons_ops[] = {
#define SCC_CONS_OPS 0
    { OPS(scc_putc, scc_getc, no_spl_scputc, no_spl_scgetc) },
#define VC_CONS_OPS 1
    { OPS(vcputc, vcgetc, no_spl_vcputc, no_spl_vcgetc) },
};
#define NCONSOPS (sizeof cons_ops / sizeof cons_ops[0])

osfmk/console/ppc/serial_console.c contains a console operations table with entries for both a serial console and a video console.

vcputc() [osfmk/console/video_console.c] outputs to the graphical console by drawing characters directly to the framebuffer.

ppc_vm_init() now checks whether a serial console was requested at boot time, and if so, it calls switch_to_serial_console() [osfmk/console/ppc/serial_console.c] to set the SCC_CONS_OPS entry of cons_ops as the default for console output.

ppc_vm_init() calls PE_create_console() [pexpert/ppc/pe_init.c] to create either the graphical or the textual console, depending on the type of video set in the PE_state.video.v_display field, which was initialized earlier by PE_init_platform().

// pexpert/ppc/pe_init.c

void
PE_init_platform(boolean_t vm_initialized, void *_args)
{
    ...
    boot_args *args = (boot_args *)_args;

    if (PE_state.initialized == FALSE) {
        PE_state.initialized = TRUE;
        ...
        PE_state.video.v_display = args->Video.v_display;
        ...
    }
    ...
}
...
void
PE_create_console(void)
{
    if (PE_state.video.v_display)
        PE_initialize_console(&PE_state.video, kPEGraphicsMode);
    else
        PE_initialize_console(&PE_state.video, kPETextMode);
}

PE_initialize_console() [pexpert/ppc/pe_init.c] supports disabling the screen (switching to the serial console), enabling the screen (switching to the "last" console), or simply initializing the screen. All three operations involve calling initialize_screen() [osfmk/console/video_console.c], which is responsible for retrieving the graphical framebuffer address. osfmk/console/video_console.c also implements functions used while displaying boot progress during a graphical boot.

ppc_vm_init() finally calls PE_init_printf() [pexpert/gen/pe_gen.c].

After ppc_vm_init() returns, ppc_init() processes the wcte and mcksoft boot arguments (see Table 4-12) on 64-bit hardware.

5.3.3.2. Preparing for the Bootstrapping of Kernel Subsystems

Finally, ppc_init() calls machine_startup() [osfmk/ppc/model_dep.c], which never returns.

machine_startup() processes several boot arguments. In particular, it checks whether the kernel must halt in the debugger. It initializes locks used by the debugger (debugger_lock) and the backtrace print mechanism (pbtlock). debugger_lock is used to ensure that there is only one processor in the debugger at a time. pbtlock is used by print_backtrace() [osfmk/ppc/model_dep.c] to ensure that only one backtrace can occur at a time. If the built-in kernel debugger, KDB, has been compiled into the kernel, machine_startup() calls ddb_init() [osfmk/ddb/db_sym.c] to initialize KDB. Moreover, if the kernel has been instructed to halt in KDB, machine_startup() calls Debugger() [osfmk/ppc/model_dep.c] to enter the debugger.

// osfmk/ppc/model_dep.c

#define TRAP_DEBUGGER __asm__ volatile("tw 4,r3,r3");
...
void
machine_startup(boot_args *args)
{
    ...
#if MACH_KDB
    ...
    ddb_init();

    if (boot_arg & DDB_KDB)
        current_debugger = KDB_CUR_DB;

    if (halt_in_debugger && (current_debugger == KDB_CUR_DB)) {
        Debugger("inline call to debugger(machine_startup)");
        ...
    }
    ...
}
...
void
Debugger(const char *message)
{
    ...
    if ((current_debugger != NO_CUR_DB)) { // debugger configured
        printf("Debugger(%s)\n", message);
        TRAP_DEBUGGER;                     // enter the debugger
        splx(spl);
        return;
    }
    ...
}

machine_startup() calls machine_conf() [osfmk/ppc/model_dep.c], which manipulates Mach's machine_info structure [osfmk/mach/machine.h]. The host_info() Mach call[12] retrieves information from this structure. Note that the memory_size field is pinned to 2GB on machines with more than 2GB of physical memory.

[12] We will see an example of using this call in Chapter 6.

// osfmk/mach/machine.h

struct machine_info {
    integer_t major_version;    // kernel major version ID
    integer_t minor_version;    // kernel minor version ID
    integer_t max_cpus;         // maximum number of CPUs possible
    integer_t avail_cpus;       // number of CPUs now available
    uint32_t  memory_size;      // memory size in bytes, capped at 2GB
    uint64_t  max_mem;          // actual physical memory size
    integer_t physical_cpu;     // number of physical CPUs now available
    integer_t physical_cpu_max; // maximum number of physical CPUs possible
    integer_t logical_cpu;      // number of logical CPUs now available
    integer_t logical_cpu_max;  // maximum number of logical CPUs possible
};

typedef struct machine_info *machine_info_t;
typedef struct machine_info machine_info_data_t;

extern struct machine_info machine_info;
...

On older kernels, machine_startup() also initializes thermal monitoring for the processor by calling ml_thrm_init() [osfmk/ppc/machine_routines_asm.s]. Newer kernels handle thermal initialization entirely in the I/O Kit; ml_thrm_init() performs no work on these kernels.

Finally, machine_startup() calls kernel_bootstrap() [osfmk/kern/startup.c], which never returns.
