The Linux Kernel Primer. A Top-Down Approach for x86 and PowerPC Architectures
2.3. Assembly Language Example
We can now create a simple program to see how the different architectures produce assembly language for the same C code. For this experiment, we use the gcc compiler that came with Red Hat 9 and the gcc cross compiler for PowerPC. We present the C program and then, for comparison, the x86 code and the PPC code.

It might startle you to see how much assembly code is generated with just a few lines of C. Because we are just compiling from C to assembler, we are not linking in any environment code, such as the C runtime libraries or local stack creation/destruction, so the size is much smaller than an actual ELF executable.

Note that with assembler, you are closest to seeing exactly what the processor is fetching from cycle to cycle. Another way to look at it is that you have complete control of your code and the system. It is important to mention that even though instructions are fetched from memory in order, they might not always be executed in exactly the order in which they were read. Some architectures order load and store operations separately.

Here is the example C code:

-----------------------------------------------------------------------
count.c
1 int main()
2 {
3     int i,j=0;
4
5     for(i=0;i<8;i++)
6         j=j+i;
7
8     return 0;
9 }
-----------------------------------------------------------------------

Line 1
This is the function definition main.

Line 3
This line declares the local variables i and j and initializes j to 0. The variable i is set to 0 at the start of the for loop on line 5.

Line 5
The for loop: While i takes values from 0 to 7, set j equal to j plus i.

Line 8
The return marks the jump back to the calling program.

2.3.1. x86 Assembly Example
Here is the code generated for x86 by entering gcc -S count.c on the command line. On entry to the code, the base of the stack is pointed to by ss:ebp. The code is produced in "AT&T" format, in which registers are prefixed with a % and constants are prefixed with a $.

The assembly instruction samples previously provided in this section should have prepared you for this simple program, but one variant of indirect addressing should be discussed before we go further. When referencing a location in memory (for example, the stack), the assembler uses a specific syntax for indexed addressing. By putting a base register in parentheses and an index (or offset) just outside the parentheses, the effective address is found by adding the index to the value in the register. For example, if %ebp was assigned the value 20, the effective address of -8(%ebp) would be (-8) + (20) = 12:

-----------------------------------------------------------------------
count.s
1       .file "count.c"
2       .version "01.01"
3 gcc2_compiled.:
4       .text
5       .align 4
6       .globl main
7       .type main,@function
8 main:
        #create a local memory area of 8 bytes for i and j.
9       pushl %ebp
10      movl %esp, %ebp
11      subl $8, %esp
        #initialize i (ebp-4) and j (ebp-8) to zero.
12      movl $0, -8(%ebp)
13      movl $0, -4(%ebp)
14      .p2align 2
15 .L3:
        #This is the for-loop test
16      cmpl $7, -4(%ebp)
17      jle .L6
18      jmp .L4
19      .p2align 2
20 .L6:
        #This is the body of the for-loop
21      movl -4(%ebp), %eax
22      leal -8(%ebp), %edx
23      addl %eax, (%edx)
24      leal -4(%ebp), %eax
25      incl (%eax)
26      jmp .L3
27      .p2align 2
28 .L4:
        #Setup to exit the function
29      movl $0, %eax
30      leave
31      ret
-----------------------------------------------------------------------
Line 9
Push the stack base pointer onto the stack.

Line 10
Move the stack pointer into the base pointer.

Line 11
Reserve 8 bytes of stack memory just below ebp for the local variables.

Line 12
Move 0 into address ebp-8 (j).

Line 13
Move 0 into address ebp-4 (i).

Line 14
This is an assembler directive that indicates the next instruction should be aligned on a 4-byte (2^2) boundary.

Line 15
This is an assembler-created label called .L3.

Line 16
This instruction compares the value of i to 7.

Line 17
Jump to label .L6 if -4(%ebp) is less than or equal to 7.

Line 18
Otherwise, jump to label .L4.

Line 19
Align.

Line 20
Label .L6.

Line 21
Move i into eax.

Line 22
Load the address of j into edx.

Line 23
Add i to the value at the address pointed to by edx (j).

Line 24
Load the address of i into eax.

Line 25
Increment i through the address in eax.

Line 26
Jump back to the for loop test.

Line 27
Align as described in Line 14 code commentary.

Line 28
Label .L4.

Line 29
Set the return code in eax.

Line 30
Release the local memory area and restore the saved base pointer.

Line 31
Pop the return address off the stack and jump back to the caller.

2.3.2. PowerPC Assembly Example
The following is the resulting PPC assembly code for the C program. If you are familiar with assembly language (and acronyms), the function of many PPC instructions is clear. There are, however, several derivative forms of the basic instructions that we must discuss here:
The following code was generated by entering gcc -S count.c on the command line:

-----------------------------------------------------------------------
countppc.s
1       .file "count.c"
2       .section ".text"
3       .align 2
4       .globl main
5       .type main,@function
6 main:
        #Create 32 byte memory area from stack space and initialize i and j.
7       stwu 1,-32(1)   #Store stack ptr (r1) 32 bytes into the stack
8       stw 31,28(1)    #Store word r31 into lower end of memory area
9       mr 31,1         #Move contents of r1 into r31
10      li 0,0          #Load 0 into r0
11      stw 0,12(31)    #Store word r0 into effective address 12(r31), var j
12      li 0,0          #Load 0 into r0
13      stw 0,8(31)     #Store word r0 into effective address 8(r31), var i
14 .L2:
        #For-loop test
15      lwz 0,8(31)     #Load i into r0
16      cmpwi 0,0,7     #Compare word immediate r0 with integer value 7
17      ble 0,.L5       #Branch if less than or equal to label .L5
18      b .L3           #Branch unconditional to label .L3
19 .L5:
        #The body of the for-loop
20      lwz 9,12(31)    #Load j into r9
21      lwz 0,8(31)     #Load i into r0
22      add 0,9,0       #Add r0 to r9 and put result in r0
23      stw 0,12(31)    #Store r0 into j
24      lwz 9,8(31)     #Load i into r9
25      addi 0,9,1      #Add 1 to r9 and store in r0
26      stw 0,8(31)     #Store r0 into i
27      b .L2
28 .L3:
29      li 0,0          #Load 0 into r0
30      mr 3,0          #Move r0 to r3
31      lwz 11,0(1)     #Load the saved stack ptr at 0(r1) into r11
32      lwz 31,-4(11)   #Restore r31
33      mr 1,11         #Restore r1
34      blr             #Branch to Link Register contents
-----------------------------------------------------------------------
Line 7
Store the stack pointer (r1) 32 bytes below its current address and update r1 to point to that location.

Line 8
Store word r31 into the lower end of the memory area.

Line 9
Move the contents of r1 into r31.

Line 10
Load 0 into r0.

Line 11
Store word r0 into effective address 12(r31), var j.

Line 12
Load 0 into r0.

Line 13
Store word r0 into effective address 8(r31), var i.

Line 14
Label .L2:.

Line 15
Load i into r0.

Line 16
Compare word immediate r0 with integer value 7.

Line 17
Branch to label .L5 if r0 is less than or equal to 7.

Line 18
Branch unconditional to label .L3.

Line 19
Label .L5:.

Line 20
Load j into r9.

Line 21
Load i into r0.

Line 22
Add r0 to r9 and put the result in r0.

Line 23
Store r0 into j.

Line 24
Load i into r9.

Line 25
Add 1 to r9 and store in r0.

Line 26
Store r0 into i.

Line 27
This is an unconditional branch to label .L2.

Line 28
Label .L3:.

Line 29
Load 0 into r0.

Line 30
Move r0 to r3.

Line 31
Load the saved stack pointer, stored at 0(r1), into r11.

Line 32
Restore r31.

Line 33
Restore r1.

Line 34
This is an unconditional branch to the location indicated by Link Register contents.

The two assembler files have nearly the same number of lines. Upon further inspection, you can see that the RISC (PPC) processor characteristically uses many load and store instructions, while the CISC (x86) processor tends to use the mov instruction more often.