GNU/Linux Application Programming (Programming Series)
The job of the optimizer is essentially to do one of three potentially orthogonal tasks. It can optimize the code to make it faster and smaller, it can optimize the code to make it faster but potentially larger, or it can simply reduce the size of the code but potentially make it slower. Luckily, we have control over the optimizer to instruct it on what we really want.
Note | While the GCC optimizer does a good job of code optimization, it can sometimes result in larger or slower images (the opposite of what you may be after). It's important to test your image to ensure that you're getting what you expect. When you don't get what you expect, changing the options you provide to the optimizer can usually remedy the situation. |
In this section, we'll look at the various mechanisms to optimize code using GCC.
Optimization Level | Description |
---|---|
-O0 | No optimization (the default level). |
-O, -O1 | Tries to reduce code size and execution time without greatly increasing compilation time. |
-O2 | More optimizations than -O1, but only those that don't trade size for speed (or vice versa). |
-Os | Optimize for resulting image size (all of -O2, except for those optimizations that increase size). |
-O3 | Even more optimizations (-O2, plus a couple more). |
In its simplest form, GCC provides a number of levels of optimization that can be enabled. The -O (oh) option permits the specification of five different optimization levels, listed in Table 4.4.
Enabling the optimizer simply entails specifying the given optimization level on the GCC command line. For example, in the following command line, we instruct the optimizer to focus on reducing the size of the resulting image:
$ gcc -Os test.c -o test
Note that it is possible to specify different optimization levels for each file that is to make up an image. There are certain optimizations (not contained within the optimization levels) that require all files to be compiled with the option if one is compiled with it. We'll not address any of those here.
Let's now dig into the optimization levels and see what each does and also identify the individual optimizations that are provided.
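Because each file can be built at its own level, it can be handy to confirm which mode a given file actually saw. The following is a minimal sketch, assuming GCC's predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__; the function names opt_mode and print_opt_mode are hypothetical and are used here only for illustration.

```c
#include <stdio.h>

/* Report how this translation unit was compiled, using macros that GCC
 * predefines: __OPTIMIZE__ whenever optimization is enabled, and
 * __OPTIMIZE_SIZE__ when optimizing for size (-Os). */
const char *opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";   /* optimizing for size */
#elif defined(__OPTIMIZE__)
    return "speed";  /* some other optimization level is enabled */
#else
    return "none";   /* -O0, or no -O option at all */
#endif
}

/* Hypothetical helper: print the mode this file was built with. */
void print_opt_mode(void)
{
    printf("optimization mode: %s\n", opt_mode());
}
```

Built with -Os, a file containing this code should report size; with -O2, speed; and with no -O option, none.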
-O0 Optimization
With -O0 optimization (or with no optimization level specified at all), the compiler simply generates code that provides the expected results and is easily debuggable within a source code debugger (such as the GNU Debugger, gdb). Compilation is also much faster at this level, because the optimization passes are not run.
-O1 Optimization (-O)
In the first level of optimization, the optimizer's goal is to reduce the resulting code size and execution time while still compiling as quickly as possible. Compilation may take more time with -O1 (over -O0), but depending upon the source being compiled, this is usually not noticeable.
Optimization | Description |
---|---|
defer-pop | Defer popping function args from stack until necessary. |
thread-jumps | Perform jump threading optimizations (to avoid jumps to jumps). |
branch-probabilities | Use branch profiling to optimize branches. |
cprop-registers | Perform a register copy-propagation optimization pass. |
guess-branch-probability | Enable guessing of branch probabilities. |
omit-frame-pointer | Do not generate stack frames (if possible). |
The individual optimizations in -O1 are shown in Table 4.5.
The -O1 optimization is usually a safe level if you still want to debug the resulting image.
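To make the thread-jumps entry in Table 4.5 more concrete, here is a hand-written sketch of the idea. It is not GCC's actual output, and the function names classify and classify_threaded are hypothetical. In the first function, the second test repeats a decision the first test already made; the second function shows what the control flow looks like once each path goes directly to its final destination.

```c
/* As written: the second test re-checks something both incoming
 * paths already know, so control branches and then branches again. */
int classify(int x)
{
    int positive = 0;

    if (x > 0)
        positive = 1;

    if (positive)    /* outcome is already decided on each path */
        return 10;
    return 20;
}

/* Hand-threaded equivalent: each path jumps straight to its target,
 * and the intermediate flag is no longer needed. */
int classify_threaded(int x)
{
    if (x > 0)
        return 10;
    return 20;
}
```

Both functions return the same value for every input; only the number of branches taken differs.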
Note | When specifying optimizations explicitly, the -f option is used to identify them. For example, to enable the defer-pop optimization, we would simply define this as -fdefer-pop. If the option is enabled via an optimization level, and you want it turned off, simply use the negative form -fno-defer-pop. |
-O2 Optimization
The second optimization level provides even more optimizations (while including those in -O1) but does not include any optimizations that will trade speed for space (or vice versa). The optimizations that are present in -O2 are listed in Table 4.6.
Note that Table 4.6 lists only those optimizations that are unique to -O2, but it doesn't list the -O1 optimizations. It should be assumed that -O2 is the collection of optimizations shown in Tables 4.5 and 4.6.
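One -O2 entry in Table 4.6 that can change program behavior is strict-aliasing: the compiler assumes that an object is not accessed through a pointer to an incompatible type. The sketch below is an illustration under stated assumptions (a 32-bit IEEE 754 float; the name float_bits is hypothetical). It shows a way to inspect the bits of a float that remains well defined when strict aliasing is assumed, with the risky pointer cast left in a comment for contrast.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 value on this target. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Risky when strict aliasing is assumed (shown for contrast only):
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * It reads a float object through a uint32_t pointer, which the
 * optimizer is allowed to assume never happens. */

/* Well-defined alternative: copy the bytes into the integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

If existing code depends on the pointer-cast form, the negative option -fno-strict-aliasing turns the assumption off for that file.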
-Os Optimization
The -Os optimization level simply disables some -O2 optimizations that would otherwise increase the size of the resulting image. Those optimizations that are disabled for -Os (that do appear in -O2) are -falign-loops, -falign-jumps, -falign-labels, and -falign-functions. Each of these has the potential to increase the size of the resulting image, and therefore they are disabled to help build a smaller executable.
Optimization | Description |
---|---|
align-loops | Align the start of loops. |
align-jumps | Align the labels that are only reachable by jumps. |
align-labels | Align all labels. |
align-functions | Align the beginning of functions. |
optimize-sibling-calls | Optimize sibling and tail recursive calls. |
cse-follow-jumps | When performing CSE, follow jumps to their targets. |
cse-skip-blocks | When performing CSE, follow conditional jumps that skip over blocks. |
gcse | Perform global common subexpression elimination. |
expensive-optimizations | Perform a set of expensive optimizations. |
strength-reduce | Perform strength reduction optimizations. |
rerun-cse-after-loop | Rerun CSE after loop optimizations. |
rerun-loop-opt | Run the loop optimizer twice. |
caller-saves | Enable register saving around function calls. |
force-mem | Copy memory operands into registers before using. |
peephole2 | Enable an RTL peephole pass before sched2. |
regmove | Enable register move optimizations. |
strict-aliasing | Assume that strict aliasing rules apply. |
delete-null-pointer-checks | Delete useless null pointer checks. |
reorder-blocks | Reorder basic blocks to improve code placement. |
schedule-insns | Reschedule instructions before register allocation. |
schedule-insns2 | Reschedule instructions after register allocation. |
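As an illustration of the optimize-sibling-calls entry in Table 4.6, consider a tail-recursive function. This is a hand-written sketch rather than compiler output, and the names sum_to and sum_to_loop are hypothetical. Because the recursive call is the last action the function performs, its stack frame can be reused, which makes the recursion behave like the loop shown beside it.

```c
/* Tail-recursive sum of 1..n. The recursive call is the final action,
 * so nothing in the current frame is needed after it returns. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call */
}

/* What the recursion amounts to once the frame is reused: a loop. */
unsigned long sum_to_loop(unsigned long n)
{
    unsigned long acc = 0;

    while (n != 0) {
        acc += n;
        n--;
    }
    return acc;
}
```

Calling sum_to(10, 0) and sum_to_loop(10) both yield 55. Without the optimization, the recursive form consumes one stack frame per step.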
-O3 Optimization
The -O3 optimization level is the highest level of optimization provided by GCC. In addition to those optimizations provided in -O2, this level also includes those shown in Table 4.7.
Optimization | Description |
---|---|
-finline-functions | Inline simple functions into the calling function. |
-frename-registers | Optimize register allocation for architectures with large numbers of registers (makes debugging difficult). |
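The effect of -finline-functions in Table 4.7 can be pictured with a small hand-written example. This sketch is not GCC's output, and the names square, sum_of_squares, and sum_of_squares_inlined are hypothetical. The second function shows what the first looks like after the call to the simple helper is replaced by the helper's body.

```c
/* A simple function that is a natural candidate for inlining. */
static int square(int x)
{
    return x * x;
}

/* As written: one function call per array element. */
int sum_of_squares(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}

/* Hand-inlined equivalent: the call is replaced by the body of
 * square(), removing the call overhead from the loop. */
int sum_of_squares_inlined(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}
```

The trade-off is the one described at the start of this section: the inlined form may run faster, but copying a function body into each call site can make the image larger.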
Target CPU | -mcpu= |
---|---|
i386 DX/SX/CX/EX/SL | i386 |
i486 DX/SX/DX2/SL/SX2/DX4 | i486 |
487 | i486 |
Pentium | pentium |
Pentium MMX | pentium-mmx |
Pentium Pro | pentiumpro |
Pentium II | pentium2 |
Celeron | pentium2 |
Pentium III | pentium3 |
Pentium IV | pentium4 |
Via C3 | c3 |
Winchip 2 | winchip2 |
Winchip C6-2 | winchip-c6 |
AMD K5 | i586 |
AMD K6 | k6 |
AMD K6 II | k6-2 |
AMD K6 III | k6-3 |
AMD Athlon | athlon |
AMD Athlon 4 | athlon |
AMD Athlon XP/MP | athlon |
AMD Duron | athlon |
AMD Tbird | athlon-tbird |
Architectural Optimizations
While standard optimization levels can provide meaningful improvements in software performance and code size, specifying the target architecture can also be very useful. The -mcpu option tells the compiler to generate instructions for the CPU type as specified. For the standard x86 target, Table 4.8 lists some of the options.
So if we were compiling specifically for the Intel Celeron architecture, we'd use the following command line:
$ gcc -mcpu=pentium2 test.c -o test
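Of course, combining the -mcpu option with an optimization level can lead to additional performance benefits. One very important point to note is that once we compile an image for a given CPU, that image may not run on another. Therefore, if we're more interested in an image running on a variety of CPUs, allowing the compiler to pick the default (i386) will support any of the x86 architectures. (In newer GCC releases, -mcpu on x86 is a deprecated synonym for -mtune, which only tunes the generated code; the -march option is the one that selects the instruction set.)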