How Linux Works: What Every Superuser Should Know
8.6 Assembly Code and How a Compiler Works
If you want to use a compiler's advanced features, you should have an idea of how the compiler operates. Here is a brief summary:
1. The compiler reads a source code file and builds an internal representation of the code inside the file. If there's a problem with the source code, the compiler states the error and exits.
2. The compiler analyzes the internal representation and generates assembly code for the target processor.
3. An assembler converts the assembly code into an object file.
4. The linker gathers object files and libraries into an executable.
You may be specifically interested in steps 2 and 3 of this process. Assembly code is one step away from the raw binary machine code that the processor runs; it is a textual representation of the processor instructions. Here is an excerpt of a program in x86 assembly code:
.L5:
        movl -8(%ebp),%eax
        imull -16(%ebp),%eax
        movl -4(%ebp),%edx
        addl %eax,%edx
        movl %edx,-12(%ebp)
        incl -16(%ebp)
        jmp .L3
        .p2align 4,,7
Each line of assembly code usually represents a single instruction. To manually generate assembly code from a C source file, use the compiler's -S option:
cc -S -o prog.S prog.c
Here, prog.c is the C source file and prog.S is the assembly code output. You can turn an assembly code file into an object file with the assembler, as:
as -o prog.o prog.S
For more information about x86 assembly code, see The Art of Assembly Language [Hyde]. RISC assembly code is a little more comprehensible; see MIPS RISC Architecture [Kane]. If you are interested in how to design and implement a compiler, two good books are Compilers: Principles, Techniques, and Tools [Aho 1986] and Modern Compiler Implementation in ML [Appel].