Reverse engineering a simple C program part 2 : GDB
In part 1 of this simple reverse engineering exercise I examined a basic C program with 'Strings', 'Hexdump', and 'Ltrace'. These tools are useful for scratching the surface of a program, but for in-depth analysis it is necessary to disassemble the program and read its instructions.
After loading the license crack program in GDB, setting a breakpoint on main, running the program with no command-line arguments, and using the 'disassemble main' command, we are given the following output:
The 'main' function
This output is the disassembly of the program's first and only function, 'main'. A C program's own code starts executing at main. Some programs do all of their work there, while others call additional functions defined in the same code. The end goal here is to examine the program's functions until we can draw a complete map of how the program executes. The focus will largely be on the conditional instructions that influence flow control.
Flow control is simply the logical order in which individual instructions of a program are executed. Some instructions may be executed before others, or some not at all, based on conditions that exist or do not exist within the environment in which the program is running.
Lines 3, 4 and 5 are the function prologue, which sets up the stack frame for main. Lines 6 and 7 save the values of registers %edi and %rsi by copying them to the stack at offsets -0x4 and -0x10 from the base pointer (%rbp).
The instruction at line 8 is a 'compare' (cmp) instruction. It is not conditional in itself: it subtracts one operand from the other, discards the result, and records the outcome in the processor's status flags. A conditional jump then reads those flags to decide which block of code executes next, and this pairing is how program flow control is implemented. In this case the hexadecimal value 0x2 is compared with the value stored at -0x4(%rbp), which we know currently holds a copy of the %edi register. So what was in the %edi register? Running 'info registers' will reveal all:
Register contents
The %edi register (shown in this screenshot as rdi, the full 64-bit register of which %edi is the lower 32 bits) currently contains the hexadecimal value 0x1. The compare instruction from line 8 in the previous screenshot is therefore comparing 0x2 with 0x1 and setting the status flags accordingly. Because the two values differ, the zero flag (ZF) is cleared, which is the 'not equal' result.
The instruction located at line 9 is another flow control instruction. Jne means 'jump if not equal': it reads the zero flag left by the compare instruction and jumps when that flag is clear. In this instance the values were not equal, so the jump is taken to the memory address 0x40061c <main+102>.
Line 31 - 0x40061c - the memory address executed after the conditional jump
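The compare-then-jump pairing can be modelled in a few lines of C. This is only a sketch: the helper names `zf_after_cmp` and `jne_taken` are made up for illustration, and only the zero flag is modelled, while a real cmp updates several other flags as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models only the zero flag (ZF) that cmp updates.
   cmp computes (destination - source) and throws the result away;
   ZF is set when that result is zero, i.e. when the operands are equal. */
static bool zf_after_cmp(uint32_t source, uint32_t destination)
{
    return (uint32_t)(destination - source) == 0;
}

/* Hypothetical helper: jne ("jump if not equal") is taken when ZF is clear. */
static bool jne_taken(bool zf)
{
    return !zf;
}
```

With the values seen in GDB, `jne_taken(zf_after_cmp(0x2, 0x1))` is true, so the jump is taken. Had the stored value been 0x2, ZF would be set and execution would fall through to the next instruction instead.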
The instruction at 0x40061c is followed by a call to the 'puts' function at memory address 0x400621. Puts is a C standard library function that writes a string, followed by a newline, to stdout ('standard out'). Following this is a mov instruction that zeros out the %eax register, which sets the return value of main to 0, then a leave instruction and a return instruction. The return instruction signals the end of this function, and because the function is main, returning from it also ends the program.
So how do we make sense of this information? What can we learn?
The flow control instructions make it easy to draw a simple, logical graph of the program's execution. Graphing all of the above information helps put things into perspective:
It's starting to make more sense now...
There is still the matter of the puts call at memory address 0x400621. Determining what this call prints to stdout is straightforward in this instance: run the program normally with no arguments (remember that it was also run in GDB with no arguments). The other method is to use Ltrace, which shows the call to puts along with the string passed to it. Here 'puts' prints the statement 'usage: <key>' to the user's screen when no key is provided.
It's also worth briefly mentioning the %edi register. When the program is run without any arguments, %edi is set to a value of 0x1. When the program is run again, this time with one argument, the register's value is 0x2. This indicates that %edi holds the number of command-line arguments passed to the program, which is the argc parameter of main. The count starts at 0x1 because the first entry is, by convention, the name of the program itself.
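Putting the pieces so far together, the part of main examined above is consistent with C source along the following lines. This is a guess at the shape of the original, not the author's code: `sketch_main` and `check_key` are hypothetical names, and the branch taken when a key is supplied has not been examined yet, so it is left as a stub. The assembly in the comments paraphrases the instructions described in the text.

```c
#include <stdio.h>

/* Hypothetical stand-in for whatever runs when a key is supplied.
   That path has not been examined yet, so it is left as a stub. */
static int check_key(const char *key)
{
    (void)key;
    return 0;
}

/* Named sketch_main so it can live alongside a real main.
   On x86-64 Linux, argc arrives in %edi and argv in %rsi. */
int sketch_main(int argc, char **argv)
{
    if (argc == 2) {                /* cmp 0x2 with -0x4(%rbp)            */
        return check_key(argv[1]);  /* reached when jne is NOT taken      */
    }
    puts("usage: <key>");           /* jne target, then the call to puts  */
    return 0;                       /* zero %eax, leave, ret              */
}
```

Calling `sketch_main` with an argc of 1 follows the same path as the GDB session: the comparison with 2 fails, the usage string is printed, and 0 is returned.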
Using everything learned so far makes it possible to start mapping the rest of the program. It's also quite useful to step through the program in GDB one instruction at a time, viewing the disassembled code, following the various conditional jumps, and learning what happens at each point in the program's execution. After mapping each instruction of this simple C program, I'm left with the following graph:
A complete overview of the program's flow control
There are several debugging programs that will perform this process automatically for the user, which is especially helpful when debugging complex programs. For the sake of learning, however, it is much better to debug and graph a program by hand, and to spend time looking up each instruction if more clarification is needed.