Mac OS X Internals: A Systems Approach
3.2. The G5: Lineage and Roadmap
As we saw earlier, the G5 is a derivative of IBM's POWER4 processor. In this section, we will briefly look at how the G5 is similar to and different from the POWER4 and some of the POWER4's successors. This will help us understand the position of the G5 in the POWER/PowerPC roadmap. Table 3-2 provides a high-level summary of some key features of the POWER4 and POWER5 lines.
[a] A chip includes two processor cores and L2 cache. A multichip module (MCM) contains multiple chips and usually L3 cache. A four-chip POWER5 MCM with four L3 cache modules is 95 mm².
[b] LPAR stands for (processor-level) Logical Partitioning.
[c] SMT stands for simultaneous multithreading.
3.2.1. Fundamental Aspects of the G5
All POWER processors listed in Table 3-2, as well as the G5 derivatives, share some fundamental architectural features. They are all 64-bit and superscalar, and they perform speculative, out-of-order execution. Let us briefly discuss each of these terms.
3.2.1.1. 64-bit Processor
Although there is no formal definition of what constitutes a 64-bit processor, the following attributes are shared by all 64-bit processors:
The PowerPC architecture was designed to support both 32-bit and 64-bit computation modes; an implementation is free to implement only the 32-bit subset. The G5 supports both computation modes. In fact, the POWER4 supports multiple processor architectures: the 32-bit and 64-bit POWER; the 32-bit and 64-bit PowerPC; and the 64-bit Amazon architecture. We will use the term PowerPC to refer to both the processor and the processor architecture. We will discuss the 64-bit capabilities of the 970FX in Section 3.3.12.1.
3.2.1.2. Superscalar
If we define scalar to be a processor design in which one instruction is issued per clock cycle, then a superscalar processor is one that issues a variable number of instructions per clock cycle, allowing a cycles-per-instruction (CPI) ratio of less than 1. It is important to note that even though a superscalar processor can issue multiple instructions in a clock cycle, it can do so only subject to several constraints, such as whether the instructions depend on each other and which specific functional units they use. Superscalar processors typically have multiple functional units, including multiple units of the same type.
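The two issue constraints named above (dependencies between instructions and the availability of functional units) can be illustrated with a toy model. The following is a minimal sketch, not a description of how the G5 or any POWER processor actually issues instructions. It assumes strictly in-order issue, single-cycle latency for every instruction, and only read-after-write dependencies. The names `Instr`, `count_issue_cycles`, and `cpi` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register this instruction writes
    srcs: tuple   # registers this instruction reads
    unit: str     # functional unit type it needs, e.g. "int" or "fp"


def count_issue_cycles(program, units, width):
    """Count cycles needed to issue `program` in order.

    Toy assumptions (not real hardware behavior): every instruction
    completes in one cycle, and issue within a cycle stops at the first
    instruction that either reads a register written earlier in the same
    cycle or needs a functional unit type that is already fully used.
    """
    if width < 1:
        raise ValueError("issue width must be at least 1")
    for ins in program:
        if units.get(ins.unit, 0) < 1:
            raise ValueError(f"no functional unit of type {ins.unit!r}")

    cycles = 0
    i = 0
    while i < len(program):
        free = dict(units)   # units available in this cycle
        written = set()      # registers written in this cycle
        issued = 0
        while i < len(program) and issued < width:
            ins = program[i]
            if free[ins.unit] == 0:
                break        # structural constraint: no unit left
            if any(src in written for src in ins.srcs):
                break        # data dependency on a same-cycle result
            free[ins.unit] -= 1
            written.add(ins.dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


def cpi(program, units, width):
    """Cycles per instruction for the toy model above."""
    return count_issue_cycles(program, units, width) / len(program)


if __name__ == "__main__":
    program = [
        Instr("r1", ("r2", "r3"), "int"),
        Instr("r4", ("r5", "r6"), "int"),
        Instr("f1", ("f2", "f3"), "fp"),
        Instr("r7", ("r1", "r4"), "int"),  # needs r1 and r4, and a third int slot
    ]
    units = {"int": 2, "fp": 1}
    print("scalar CPI:     ", cpi(program, units, width=1))
    print("superscalar CPI:", cpi(program, units, width=4))
```

In this model the first three instructions issue together, while the fourth waits for the next cycle, so four instructions take two cycles. A chain in which each instruction reads the previous result gains nothing from the wider issue width, which is the sense in which the CPI of a superscalar design depends on the instruction mix.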
3.2.1.3. Speculative Execution
A speculative processor can execute instructions before it is determined whether those instructions will need to be executed (instructions may not need to be executed because of a branch that bypasses them, for example). Therefore, instruction execution does not wait for control dependencies to resolve; it waits only for the instruction's operands (data) to become available. Such speculation can be done by the compiler, the processor, or both. The processors in Table 3-2 employ in-hardware dynamic branch prediction (with multiple branches "in flight"), speculation, and dynamic scheduling of instruction groups to achieve substantial instruction-level parallelism.
3.2.1.4. Out-of-Order Execution
A processor that performs out-of-order execution includes additional hardware that can bypass instructions whose operands are not available (say, due to a cache miss that occurred during register loading). Thus, rather than always executing instructions in the order they appear in the programs being run, the processor may execute instructions whose operands are ready, deferring the bypassed instructions for execution at a more appropriate time.
3.2.2. New POWER Generations
The POWER4 contains two processor cores in a single chip. Moreover, the POWER4 architecture has features that help in virtualization. Examples include a special hypervisor mode in the processor, the ability to include an address offset when using nonvirtual memory addressing, and support for multiple global interrupt queues in the interrupt controller. IBM's Logical Partitioning (LPAR) allows multiple independent operating system images (such as AIX and Linux) to be run on a single POWER4-based system simultaneously. Dynamic LPAR (DLPAR), introduced in AIX 5L Version 5.2, allows dynamic addition and removal of resources from active partitions. The POWER4+ improves upon the POWER4 by reducing its size, consuming less power, providing a larger L2 cache, and allowing more DLPAR partitions. The POWER5 introduces simultaneous multithreading (SMT), wherein a single processor supports multiple instruction streams (in this case, two) simultaneously.
The POWER5 supports other important features such as the following:
Besides using 90-nm technology, the POWER5+ adds several features to the POWER5's feature set, for example: 16GB page sizes, 1TB segments, multiple page sizes per segment, a larger (2048-entry) translation lookaside buffer (TLB), and a larger number of memory controller read queues. The POWER6 is expected to add evolutionary improvements and to extend the Fast Path concept even further, allowing functions of higher-level software (for example, databases and application servers) to be performed in silicon.[21] It is likely to be based on a 65-nm process and is expected to have multiple ultra-high-frequency cores and multiple L2 caches.
[21] The "reduced" in RISC becomes not quite reduced!
3.2.3. The PowerPC 970, 970FX, and 970MP
The PowerPC 970 was introduced in October 2002 as a 64-bit high-performance processor for desktops, entry-level servers, and embedded systems. The 970 can be thought of as a stripped-down POWER4+. Apple used the 970 (followed by the 970FX and the 970MP) in its G5-based systems. Table 3-3 contains a brief comparison of the specifications of these processors. Figure 3-3 shows a pictorial comparison. Note that unlike the POWER4+, whose L2 cache is shared between cores, each core in the 970MP has its own L2 cache, which is twice as large as the L2 cache in the 970 or the 970FX.
[a] The 970FX and 970MP use 90 nm lithography, in which copper wiring, strained silicon, and silicon-on-insulator (SOI) are fused into the same manufacturing process. This technique accelerates electron flow through transistors and provides an insulating layer in silicon. The result is increased performance, transistor isolation, and lower power consumption. Controlling power dissipation is particularly critical for chips with low process geometries, where subthreshold leakage current can cause problems.
[b] The L2 cache is shared between the two processor cores.
[c] Although jointly developed by Motorola, Apple, and IBM, AltiVec is a trademark of Motorola, or more precisely, Freescale. In early 2004, Motorola spun out its semiconductor products sector as Freescale Semiconductor, Inc.
[d] PowerTune is a clock-frequency and voltage-scaling technology.
Another noteworthy point about the 970MP is that both its cores share the same input and output busses. In particular, the output bus is shared "fairly" between cores using a simple round-robin algorithm.
Figure 3-3. The PowerPC 9xx family and the POWER4+
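The round-robin sharing of the 970MP's output bus mentioned above can be illustrated in software. The following is a minimal sketch of generic round-robin arbitration, assuming one request flag per core per arbitration round. It does not describe the actual bus logic of the 970MP, and the class name `RoundRobinArbiter` and its `grant` method are hypothetical.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter (illustrative, not the 970MP's logic)."""

    def __init__(self, n_requesters=2):
        if n_requesters < 1:
            raise ValueError("need at least one requester")
        self.n = n_requesters
        # Pretend the last requester was just served, so requester 0
        # has the highest priority in the first round.
        self.last = n_requesters - 1

    def grant(self, requests):
        """Return the index of the requester granted the bus, or None.

        `requests` holds one boolean per requester. The search starts
        just after the most recently served requester, so a requester
        that keeps asking cannot starve the others.
        """
        if len(requests) != self.n:
            raise ValueError("one request flag per requester is required")
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # nobody asked for the bus this round


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(2)
    # Both cores request the bus in every round: grants alternate.
    print([arbiter.grant([True, True]) for _ in range(4)])
```

When both cores request the bus continuously, the grants alternate between them. When only one core requests, it is granted the bus every round, so fairness does not cost bandwidth while the other core is idle.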
3.2.4. The Intel Core Duo
In contrast, the Intel Core Duo processor line used in the first x86-based Macintosh computers (the iMac and the MacBook Pro) has the following key characteristics: