Mac OS X Internals: A Systems Approach
3.4. Software Conventions
An application binary interface (ABI) defines a system interface for compiled programs, allowing compilers, linkers, debuggers, executables, libraries, other object files, and the operating system to work with each other. In a simplistic sense, an ABI is a low-level, "binary" API. A program conforming to an API should be compilable from source on different systems supporting that API, whereas a binary executable conforming to an ABI should operate on different systems supporting that ABI.[51]
[51] ABIs vary in whether they strictly enforce cross-operating-system compatibility or not.
An ABI usually includes a set of rules specifying how hardware and software resources are to be used for a given architecture. Besides interoperability, the conventions laid down by an ABI may have performance-related goals too, such as minimizing average subroutine-call overhead, branch latencies, and memory accesses. The scope of an ABI could be extensive, covering a wide variety of areas such as the following:
The PowerPC version of Mac OS X uses the Darwin PowerPC ABI in its 32-bit and 64-bit versions, whereas the 32-bit x86 version uses the System V IA-32 ABI. The Darwin PowerPC ABI is similar to, but not the same as, the popular IBM AIX ABI for the PowerPC. In this section, we look at some aspects of the Darwin PowerPC ABI without analyzing its differences from the AIX ABI.
3.4.1. Byte Ordering
The PowerPC architecture natively supports 8-bit (byte), 16-bit (half word), 32-bit (word), and 64-bit (double word) data types. It uses a flat-address-space model with byte-addressable storage. Although the PowerPC architecture provides an optional little-endian facility, the 970FX does not implement it; it implements only the big-endian addressing mode. Big-endian refers to storing the "big" end of a multibyte value at the lowest memory address. In the PowerPC architecture, the leftmost bit (bit 0) is defined to be the most significant bit, whereas the rightmost bit is the least significant bit. For example, if a 64-bit register is being used as a 32-bit register in 32-bit computation mode, then bits 32 through 63 of the 64-bit register represent the 32-bit register; bits 0 through 31 are to be ignored. By corollary, the leftmost byte (byte 0) is the most significant byte, and so on.
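The byte-order and bit-numbering conventions described above can be sketched in portable C. This is a minimal illustration, not code from the book; the names store_be32, ppc_bit64, low_word, and host_is_big_endian are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian order: the most significant ("big")
 * byte goes to the lowest address, whatever the host's native order is. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Return bit `index` of a 64-bit value using PowerPC numbering, in which
 * bit 0 is the most significant bit and bit 63 is the least significant. */
unsigned ppc_bit64(uint64_t value, unsigned index)
{
    assert(index < 64);
    return (unsigned)((value >> (63u - index)) & 1u);
}

/* Bits 32 through 63 of a 64-bit register: the part that represents the
 * 32-bit register in 32-bit computation mode. */
uint32_t low_word(uint64_t reg)
{
    return (uint32_t)(reg & 0xFFFFFFFFu);
}

/* Report whether the machine running this code stores the most
 * significant byte of a word at the lowest address. */
int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    return *(const uint8_t *)&probe == 0x01;
}
```

For instance, store_be32 applied to 0x0A0B0C0D places 0x0A in out[0], and ppc_bit64(1, 63) is 1 because the least significant bit is numbered 63 under this convention.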
In PowerPC implementations that support both the big-endian and little-endian[52] addressing modes, the LE bit of the Machine State Register can be set to 1 to specify little-endian mode. Another bit, the ILE bit, is used to specify the mode for exception handlers. The default value of both bits is 0 (big-endian) on such processors.
[52] The use of little-endian mode on such processors is subject to several caveats as compared to big-endian mode. For example, certain instructions, such as load/store multiple and load/store string, are not supported in little-endian mode.
3.4.2. Register Usage
The Darwin ABI defines a register to be dedicated, volatile, or nonvolatile. A dedicated register has a predefined or standard purpose; it should not be arbitrarily modified by the compiler. A volatile register is available for use at all times, but its contents may change if the context changes, for example, because of calling a subroutine. Since the caller must save volatile registers in such cases, such registers are also called caller-save registers. A nonvolatile register is available for use in a local context, but the user of such registers must save their original contents before use and must restore the contents before returning to the calling context. Therefore, it is the callee, and not the caller, who must save nonvolatile registers. Correspondingly, such registers are also called callee-save registers.
In some cases, a register may be available for general use in one runtime environment but may have a special purpose in some other runtime environment. For example, GPR12 has a predefined purpose on Mac OS X when used for indirect function calls.
Table 3-12 lists common PowerPC registers along with their usage conventions as defined by the 32-bit Darwin ABI.
3.4.2.1. Indirect Calls
We noted in Table 3-12 that a function that branches indirectly to another function stores the target of the call in GPR12. Indirect calls are, in fact, the default scenario for dynamically compiled Mac OS X user-level code. Since the target address would need to be stored in a register in any case, using a standardized register allows for potential optimizations. Consider the code fragment shown in Figure 3-18.
Figure 3-18. A simple C function that calls another function
By default, the assembly code generated by GCC on Mac OS X for the function shown in Figure 3-18 will be similar to that shown in Figure 3-19, which has been annotated and trimmed down to relevant parts. In particular, note the use of GPR12, which is referred to as r12 in the GNU assembler syntax.
Figure 3-19. Assembly code depicting an indirect function call
3.4.2.2. Direct Calls
If GCC is instructed to statically compile the code in Figure 3-18, we can verify in the resultant assembly that there is a direct call to f2 from f1, with no use of GPR12. This case is shown in Figure 3-20.
Figure 3-20. Assembly code depicting a direct function call
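The listings for the figures in this section are not reproduced here. As a stand-in, the following minimal C sketch shows the kind of code under discussion; it is not the book's listing. The names f1 and f2 follow the text, whereas call_count, call_through, and calls_made are hypothetical additions. Whether the plain call in f1 becomes a direct branch or an indirect one is decided by how the code is compiled, as described above; the call through a function pointer is indirect in either case.

```c
#include <assert.h>

static int call_count;  /* hypothetical counter so that each call has a visible effect */

void f2(void)
{
    call_count++;
}

/* A plain call in the source. The compiler and linker decide whether this
 * turns into a direct branch to f2 or an indirect branch through a register. */
void f1(void)
{
    f2();
}

/* An explicitly indirect call: the target address is only known at run time,
 * so it must be loaded into a register before branching. */
void call_through(void (*target)(void))
{
    assert(target != 0);
    target();
}

int calls_made(void)
{
    return call_count;
}
```

Compiling such a file with gcc -S and reading the generated assembly is one way to compare the two cases; the exact options needed to request static compilation depend on the compiler version and are not shown here.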
3.4.3. Stack Usage
On most processor architectures, a stack is used to hold automatic variables, temporary variables, and return information for each invocation of a subroutine. The PowerPC architecture does not explicitly define a stack for local storage: There is neither a dedicated stack pointer nor any push or pop instructions. However, it is conventional for operating systems running on the PowerPC, including Mac OS X, to designate (per the ABI) an area of memory as the stack and grow it toward lower addresses: from a high memory address to a low memory address. GPR1, which is used as the stack pointer, points to the top of the stack. Both the stack and the registers play important roles in the working of subroutines. As listed in Table 3-12, registers are used to hold subroutine arguments, up to a certain number.
If a function f1 calls another function f2, which calls yet another function f3, and so on in a program, the program's stack grows per the ABI's conventions. Each function in the call chain owns part of the stack. A representative runtime stack for the 32-bit Darwin ABI is shown in Figure 3-21.
Figure 3-21. Darwin 32-bit ABI runtime stack
In Figure 3-21, f1 calls f2, which calls f3. f1's stack frame contains a parameter area and a linkage area.
The parameter area must be large enough to hold the largest parameter list of all functions that f1 calls. f1 typically will pass arguments in registers as long as there are registers available. Once registers are exhausted, f1 will place arguments in its parameter area, from where f2 will pick them up. However, f1 must reserve space for all arguments of f2 in any case, even if it is able to pass all arguments in registers. f2 is free to use f1's parameter area for storing arguments if it wants to free up the corresponding registers for other use. Thus, in a subroutine call, the caller sets up a parameter area in its own stack portion, and the callee can access the caller's parameter area for loading or storing arguments.
The linkage area begins after the parameter area and is at the top of the stack, adjacent to the stack pointer. The adjacency to the stack pointer is important: The linkage area has a fixed size, and therefore the callee can find the caller's parameter area deterministically. The callee can save the CR and the LR in the caller's linkage area if it needs to. The stack pointer is always saved by the caller as a back chain to its caller.
In Figure 3-21, f2's portion of the stack shows space for saving nonvolatile registers that f2 changes. These must be restored by f2 before it returns to its caller. Space for each function's local variables is reserved by growing the stack appropriately. This space lies below the parameter area and above the saved registers.
The fact that a called function is responsible for allocating its own stack frame does not mean the programmer has to write code to do so. When you compile a function, the compiler inserts code fragments called the prologue and the epilogue before and after the function body, respectively. The prologue sets up the stack frame for the function. The epilogue undoes the prologue's work, restoring any saved registers (including CR and LR), incrementing the stack pointer to its previous value (that the prologue saved in its linkage area), and finally returning to the caller.
A 32-bit Darwin ABI stack frame is 16-byte aligned.
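The effect of the 16-byte alignment rule can be sketched as a rounding step. The code below is an illustration under stated assumptions, not the compiler's actual frame-layout logic: align_up and param_area_size are hypothetical helpers, the 16-byte alignment comes from the sentence above, and the eight-word (32-byte) minimum parameter area is the figure quoted in Section 3.4.3.1. A compiler may add further padding or reserved space, so real frame sizes can be larger than a simple sum of the parts.

```c
#include <assert.h>
#include <stddef.h>

static const size_t FRAME_ALIGNMENT = 16;  /* from the text: frames are 16-byte aligned */
static const size_t MIN_PARAM_AREA  = 32;  /* from the text: at least eight 4-byte words */

/* Round size up to the next multiple of alignment, which must be a
 * nonzero power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Size of the parameter area a caller reserves. A function that calls
 * nothing needs none; otherwise the area holds the largest outgoing
 * argument list, subject to the eight-word minimum. */
size_t param_area_size(size_t largest_arg_list_bytes, int calls_other_functions)
{
    if (!calls_other_functions)
        return 0;
    return largest_arg_list_bytes < MIN_PARAM_AREA ? MIN_PARAM_AREA
                                                   : largest_arg_list_bytes;
}
```

Under these assumptions, a raw frame of 36 bytes would be rounded to align_up(36, FRAME_ALIGNMENT), that is, 48 bytes, and a function that calls another function taking no arguments still reserves param_area_size(0, 1), that is, 32 bytes.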
Consider the trivial function shown in Figure 3-22, along with the corresponding annotated assembly code.
Figure 3-22. Assembly listing for a C function with no arguments and an empty body
3.4.3.1. Stack Usage Examples
Figures 3-23 and 3-24 show examples of how the compiler sets up a function's stack depending on the number of local variables a function has, the number of parameters it has, the number of arguments it passes to a function it calls, and so on.
Figure 3-23. Examples of stack usage in functions
Figure 3-24. Examples of stack usage in functions (continued from Figure 3-23)
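The five functions discussed in this section, f1 through f5, can be sketched in C from their prose descriptions. This is a reconstruction for illustration only and is not the listing from the figures; helper, calls_to_helper, and exercise_all are hypothetical names added so that the sketch can be compiled and exercised.

```c
#include <assert.h>
#include <stdint.h>

static int calls_to_helper;  /* hypothetical counter, not part of the original examples */

/* Hypothetical callee that takes no arguments. */
void helper(void)
{
    calls_to_helper++;
}

/* f1: the "null" function, with no arguments, no locals, and no calls. */
void f1(void)
{
}

/* f2: a single 32-bit local variable. */
void f2(void)
{
    int32_t local = 0;
    (void)local;
}

/* f3: calls a function that takes no arguments. */
void f3(void)
{
    helper();
}

/* f4: eight arguments, no local variables, no calls. */
void f4(int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8)
{
    (void)a1; (void)a2; (void)a3; (void)a4;
    (void)a5; (void)a6; (void)a7; (void)a8;
}

/* f5: no arguments, eight word-size local variables, no calls. */
void f5(void)
{
    int32_t a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8;
    (void)a; (void)b; (void)c; (void)d;
    (void)e; (void)f; (void)g; (void)h;
}

/* Call each function once; only f3 should reach helper. */
void exercise_all(void)
{
    int before = calls_to_helper;
    f1();
    f2();
    f3();
    f4(1, 2, 3, 4, 5, 6, 7, 8);
    f5();
    assert(calls_to_helper == before + 1);
}
```

Compiling a file like this without optimization and inspecting the assembly is one way to see how the frame size changes from function to function; an optimizing compiler may remove the unused locals entirely.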
f1 is identical to the "null" function that we encountered in Figure 3-22, where we saw that the compiler reserves 48 bytes for the function's stack. The portions shown as shaded in the stacks are present either for alignment padding or for some current or future purpose not necessarily exposed through the ABI. Note that GPR30 and GPR31 are always saved, GPR30 being the designated frame pointer.
f2 uses a single 32-bit local variable. Its stack is 64 bytes.
f3 calls a function that takes no arguments. Nevertheless, this introduces a parameter area on f3's stack. A parameter area is at least eight words (32 bytes) in size. f3's stack is 80 bytes.
f4 takes eight arguments, has no local variables, and calls no functions. Its stack area is the same size as that of the null function because space for its arguments is reserved in the parameter area of its caller.
f5 takes no arguments, has eight word-size local variables, and calls no functions. Its stack is 64 bytes.
3.4.3.2. Printing Stack Frames
GCC provides built-in functions that may be used by a function to retrieve information about its callers. The current function's return address can be retrieved by calling the __builtin_return_address() function, which takes a single argument: the level, an integer specifying the number of stack frames to walk. A level of 0 results in the return address of the current function. Similarly, the __builtin_frame_address() function may be used to retrieve the frame address of a function in the call stack. Both functions return a NULL pointer when the top of the stack has been reached.[53] Figure 3-25 shows a program that uses these functions to display a stack trace. The program also uses the dladdr() function in the dyld API to find the various function addresses corresponding to return addresses in the call stack.
[53] For __builtin_frame_address() to return a NULL pointer upon reaching the top of the stack, the first frame pointer must have been set up correctly.
Figure 3-25. Printing a function call stack trace[54]
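As a small illustration of the two GCC built-ins, the sketch below has each function in a call chain record its own frame address and return address using level 0 only. It is not the program from the figure and does not use dladdr(); the names frame_record, trace, trace_depth, RECORD_FRAME, and print_trace are hypothetical. Restricting the sketch to level 0 is a deliberate choice, since walking to nonzero levels depends on frame pointers having been set up in every caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_FRAMES 8

struct frame_record {
    const char *function;  /* name of the function that made the record */
    void *frame_address;   /* result of __builtin_frame_address(0) */
    void *return_address;  /* result of __builtin_return_address(0) */
};

static struct frame_record trace[MAX_FRAMES];
static size_t trace_depth;

/* This must be a macro: the built-ins describe the function in which they
 * textually appear, so a helper function would report its own frame. */
#define RECORD_FRAME()                                                   \
    do {                                                                 \
        assert(trace_depth < MAX_FRAMES);                                \
        trace[trace_depth].function = __func__;                          \
        trace[trace_depth].frame_address = __builtin_frame_address(0);   \
        trace[trace_depth].return_address = __builtin_return_address(0); \
        trace_depth++;                                                   \
    } while (0)

/* noinline keeps each function in its own frame even when optimizing. */
__attribute__((noinline)) int f3(void) { RECORD_FRAME(); return 3; }
__attribute__((noinline)) int f2(void) { RECORD_FRAME(); return f3() + 2; }
__attribute__((noinline)) int f1(void) { RECORD_FRAME(); return f2() + 1; }

/* Print the recorded frames, innermost call first. */
void print_trace(void)
{
    for (size_t i = trace_depth; i-- > 0; )
        printf("%-4s frame=%p return=%p\n", trace[i].function,
               trace[i].frame_address, trace[i].return_address);
}
```

Calling f1() and then print_trace() lists f3, f2, and f1 in that order. The addresses printed vary by platform and run, so no sample output is given.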
[54] Note in the program's output that the function name in frames #5 and #6 is tart. The dladdr() function strips leading underscores from the symbols it returns, even if there is no leading underscore (in which case it removes the first character). In this case, the symbol's name is start.
3.4.4. Function Parameters and Return Values
We saw earlier that when a function calls another with arguments, the parameter area in the caller's stack frame is large enough to hold all parameters passed to the called function, regardless of the number of parameters actually passed in registers. Doing so has benefits such as the following.
3.4.4.1. Passing Parameters
Parameter-passing rules may depend on the type of programming language used, for example, procedural or object-oriented. Let us look at parameter-passing rules for C and C-like languages. Even for such languages, the rules further depend on whether a function has a fixed-length or a variable-length parameter list. The rules for fixed-length parameter lists are as follows.
Let us look at the case of functions with variable-length parameter lists. Note that a function may have some number of required parameters preceding a variable number of parameters.
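At the source level, a function with a variable-length parameter list looks like the following minimal sketch, in which one required parameter precedes the variable part. The function sum_ints is a hypothetical example written with the standard stdarg.h facilities; how the ABI places such arguments in registers and in the parameter area is not shown by this code.

```c
#include <assert.h>
#include <stdarg.h>

/* One required parameter (count) followed by a variable number of int
 * arguments. The callee cannot discover the number of variable arguments
 * on its own, so the caller passes it explicitly. */
long sum_ints(int count, ...)
{
    va_list args;
    long total = 0;

    assert(count >= 0);
    va_start(args, count);          /* start after the last required parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next variable argument */
    va_end(args);
    return total;
}
```

For example, sum_ints(3, 10, 20, 30) evaluates to 60, and sum_ints(0) evaluates to 0.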
3.4.4.2. Returning Values
Functions return values according to the following rules.