Reverse engineering a simple C program part 1: Strings, ltrace, hexdump

Reverse engineering is something I find quite interesting, and a topic I am endeavoring to spend more time studying. There's nothing better than hands-on practice, so let's take a look at a very simple 'license check' program written in C.


Here's the code:






This program checks for user input and then compares that input to a stored string. If the user input matches the string "ABCD-Z34K-42-OK" then the program prints "License confirmed" to the user. If the string does not match then the program prints "License incorrect". If no input is supplied at all then the program simply prints "Usage: <key>" and exits.

Once this program is compiled, the source code is no longer part of the result. Compiling translates (or transforms) the source code into the machine code that the processor executes. Our challenge is to compile the source code and then extract the license key from the resulting file.

The program can be compiled in Linux with the command "gcc" - the GNU Compiler Collection. Opening the file after it has been compiled shows that the pretty source code above is indeed gone:






The compiled source code is referred to as a binary, and it is these binaries that are the subject of reverse engineering. It is exceptionally rare to have access to the original source code when reverse engineering, so I'll demonstrate a few ways to 'crack' this license check program even without access to the original code.

The "^@" symbols stand for NUL bytes, which have no printable representation, while the string "ELF" in the top left corner of the file tells us this file is an 'Executable and Linkable Format' file. There's another string, '/lib64/ld-linux-x86-64.so.2', that I won't go into right now. There are several other readable strings inside the file, and some of them may contain useful information for reverse engineering this binary. The 'strings' command is especially helpful here:






Closer examination of the strings contained within the file reveals some interesting output. We first see strings like 'puts', 'printf', and 'strcmp' which are all functions used by the program. The strings 'License confirmed', 'License incorrect' and 'Usage' all correlate to the source code of the program. In plain sight is the key 'ABCD-Z34K-42-OK'.
Strings are exceptionally useful when reverse engineering and are always worth investigating. Along with the names of standard C functions there may also be the names of user-defined functions within the strings, hinting at the nature of the program.

A useful way to see what's happening during program execution is with the 'ltrace' command. I was initially unfamiliar with ltrace and needed to spend some time consulting its manual page to understand how it worked. Reading the manual and understanding the tools being used is one of the easiest ways to learn!






So now we know that the ltrace program intercepts and records the dynamic library calls made by the executed program. "Library calls" are calls by a program to functions like 'puts', 'strcmp', and 'printf'. They are so called because each of those functions is provided by the standard C library. We know that the program we're about to execute uses functions like these because of the output of the strings command earlier, so it makes perfect sense to use ltrace in this instance.
 
Running the ltrace program against the license key check program results in the following output:






We can see that ltrace intercepted function calls to 'printf', 'strcmp', and 'puts'.
The 'strcmp' function is called with two arguments, one of which is the key we supplied when executing the program and the other is the license key we originally saw in the source code.
The 'puts' and 'printf' functions appear to be printing information back to the user.

Using ltrace is advantageous for a couple of reasons. Firstly, we get a better understanding of the functions used by the program (remember: the source code is rarely ever available so this information isn't known beforehand). Secondly, there's the possibility it will reveal information that wasn't already available after analysing the strings, or it will at least clarify the purpose behind some of the strings we have seen.

The 'strcmp' function looks interesting, so let's quickly consult the manual for that function to find out exactly what it's doing:






The 'strcmp' function takes two strings as its arguments and compares them. The result is an integer that is less than, equal to, or greater than zero if the first argument (in this case the user input) is found to be less than, to match, or to be greater than the second argument (the correct key). A return value of zero therefore means the two strings are identical. With this knowledge we can safely assume that the second argument to 'strcmp' in the ltrace output above is the correct key expected by the program.

Another helpful tool is 'hexdump'. Hexdump has additional uses outside of the way I'll use it in this example, but for now it serves as another tool that can view the strings and reveal interesting information about the license checking program:







Overall this is just scratching the surface of the license checking program. The tools used so far can provide a valuable amount of information, but there is still plenty more work to do.
Completely understanding how this program executes requires a disassembler. By disassembling the program it is possible to step through each instruction and analyse every step taken by the program during its execution. In this way it is possible to completely piece together a program from start to finish even without the source code. This seems like the perfect opportunity to spend more time with GDB...

...Part 2 coming soon
