Beginner Crackme

As part of an Intro to Security course I’m taking, my professor gave us a crackme style exercise to practice reading x86 assembly and basic reverse engineering.

The program is pretty simple. It accepts a password as an argument and we’re told that if the password is correct, "ok" is printed.

$ ./crackme
usage: ./crackme <secret>
$ ./crackme test
$

As usual, I start by running file on the binary, which shows that it’s a standard x64 ELF binary. file also says that the binary is "not stripped", which means that it includes symbols. All I really know about symbols are that they can include debugging information about a binary like function and variable names and some symbols aren’t really necessary; they can be stripped out to reduce the binary’s size and make reverse engineering more challenging. Maybe I’ll do a more in depth post on this in the future.

$ file crackme
crackme: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x3fcf895b7865cb6be6b934640d1519a1e6bd6d39, not stripped

Next, I run strings, hoping to get lucky and find the password amongst the strings in the binary. Strings looks for series of printable characters followed by a NULL, but unfortunately nothing here works as the password.

$ strings crackme
/lib64/ld-linux-x86-64.so.2
exd4
libc.so.6
puts
printf
memcmp
__libc_start_main
__gmon_start__
GLIBC_2.2.5
fffff.
AWAVA
AUATL
[]A\A]A^A_
usage: %s <secret>
;*3$"

Since that didn’t work, we’re forced to disassemble the binary and actually try to reverse engineer it. We’ll start with main.

$ gdb -batch -ex 'file crackme' -ex 'disas main'
Dump of assembler code for function main:
   0x00000000004004a0 <+0>:     sub    rsp,0x8
   0x00000000004004a4 <+4>:     cmp    edi,0x1
   0x00000000004004a7 <+7>:     jle    0x4004c7 <main+39>
   0x00000000004004a9 <+9>:     mov    rdi,QWORD PTR [rsi+0x8]
   0x00000000004004ad <+13>:    call   0x4005e0 <verify_secret>
   0x00000000004004b2 <+18>:    test   eax,eax
   0x00000000004004b4 <+20>:    je     0x4004c2 <main+34>
   0x00000000004004b6 <+22>:    mov    edi,0x4006e8
   0x00000000004004bb <+27>:    call   0x400450 <puts@plt>
   0x00000000004004c0 <+32>:    xor    eax,eax
   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret
   0x00000000004004c7 <+39>:    mov    rsi,QWORD PTR [rsi]
   0x00000000004004ca <+42>:    mov    edi,0x4006d4
   0x00000000004004cf <+47>:    xor    eax,eax
   0x00000000004004d1 <+49>:    call   0x400460 <printf@plt>
   0x00000000004004d6 <+54>:    mov    eax,0x1
   0x00000000004004db <+59>:    jmp    0x4004c2 <main+34>
End of assembler dump.

Let’s break this down a little.

    0x00000000004004a0 <+0>:     sub    rsp,0x8
    0x00000000004004a4 <+4>:     cmp    edi,0x1
    0x00000000004004a7 <+7>:     jle    0x4004c7 <main+39>

Starting at the beginning, we see the stack pointer decremented as part of the function prologue. The prologue is a set of setup steps involving saving the old frame’s base pointer on the stack, reassigning the base pointer to the current stack pointer, then subtracting the stack pointer a certain amount to make room on the stack for local variables, etc. We don’t see the former two steps because this is the main function so it doesn’t really have a function calling it, so saving/setting the base pointer isn’t necessary.

Then the edi register is compared to 1 and if it is less than or equal, we jump to offset 39.

   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret
   0x00000000004004c7 <+39>:    mov    rsi,QWORD PTR [rsi]
   0x00000000004004ca <+42>:    mov    edi,0x4006d4
   0x00000000004004cf <+47>:    xor    eax,eax
   0x00000000004004d1 <+49>:    call   0x400460 <printf@plt>
   0x00000000004004d6 <+54>:    mov    eax,0x1
   0x00000000004004db <+59>:    jmp    0x4004c2 <main+34>

Here at offset 39, we print something then jump to offset 34 where we repair the stack (undo the sub instruction from the prologue) and return (ending execution).

This is likely how the program checks the arguments and prints the usage message if no arguments are supplied (which would cause argc/edi to be 1).

However if we supply an argument, edi is 0x2 and we move past the jle instruction.

   0x00000000004004a9 <+9>:     mov    rdi,QWORD PTR [rsi+0x8]
   0x00000000004004ad <+13>:    call   0x4005e0 <verify_secret>

Here we can see the verify_secret function being called with a parameter in rdi. This is most likely the argument we passed into the program. We can confirm this with gdb (I’m using it with peda here).

gdb-peda$ tele $rsi
0000| 0x7fffffffeb48 --> 0x7fffffffed6e ("/home/vagrant/crackme/crackme")
0008| 0x7fffffffeb50 --> 0x7fffffffed8c --> 0x4548530074736574 ('test')
0016| 0x7fffffffeb58 --> 0x0

Indeed rsi points to the first element of argv, so incrementing that by 8 bytes (because 64 bit) points to argv[1], which is our input.

If we look after the verify_secret call we can see the program checks if eax is 0 and if it is, jumps to offset 34, ending the program. However, if eax is not zero, we’ll hit a puts call before exiting, which will presumably print out the "ok" message we want.

   0x00000000004004b2 <+18>:    test   eax,eax
   0x00000000004004b4 <+20>:    je     0x4004c2 <main+34>
   0x00000000004004b6 <+22>:    mov    edi,0x4006e8
   0x00000000004004bb <+27>:    call   0x400450 <puts@plt>
   0x00000000004004c0 <+32>:    xor    eax,eax
   0x00000000004004c2 <+34>:    add    rsp,0x8
   0x00000000004004c6 <+38>:    ret

Now lets disassemble verify_secret to see how the input validation is performed, and to see how we can make it return non-zero.

Dump of assembler code for function verify_secret:
   0x00000000004005e0 <+0>:     sub    rsp,0x408
   0x00000000004005e7 <+7>:     movzx  eax,BYTE PTR [rdi]
   0x00000000004005ea <+10>:    mov    rcx,rsp
   0x00000000004005ed <+13>:    test   al,al
   0x00000000004005ef <+15>:    je     0x400622 <verify_secret+66>
   0x00000000004005f1 <+17>:    mov    rdx,rsp
   0x00000000004005f4 <+20>:    jmp    0x400604 <verify_secret+36>
   0x00000000004005f6 <+22>:    nop    WORD PTR cs:[rax+rax*1+0x0]
   0x0000000000400600 <+32>:    test   al,al
   0x0000000000400602 <+34>:    je     0x400622 <verify_secret+66>
   0x0000000000400604 <+36>:    xor    eax,0xfffffff7
   0x0000000000400607 <+39>:    lea    rsi,[rsp+0x400]
   0x000000000040060f <+47>:    add    rdx,0x1
   0x0000000000400613 <+51>:    mov    BYTE PTR [rdx-0x1],al
   0x0000000000400616 <+54>:    add    rdi,0x1
   0x000000000040061a <+58>:    movzx  eax,BYTE PTR [rdi]
   0x000000000040061d <+61>:    cmp    rdx,rsi
   0x0000000000400620 <+64>:    jb     0x400600 <verify_secret+32>
   0x0000000000400622 <+66>:    mov    edx,0x18
   0x0000000000400627 <+71>:    mov    esi,0x600a80
   0x000000000040062c <+76>:    mov    rdi,rcx
   0x000000000040062f <+79>:    call   0x400480 <memcmp@plt>
   0x0000000000400634 <+84>:    test   eax,eax
   0x0000000000400636 <+86>:    sete   al
   0x0000000000400639 <+89>:    add    rsp,0x408
   0x0000000000400640 <+96>:    movzx  eax,al
   0x0000000000400643 <+99>:    ret
End of assembler dump.

I won’t walk through this one in detail because understanding each line isn’t necessary to crack this. Let’s skip to the memcmp call. If memcmp returns 0, eax is set to 1 and the function returns. This is exactly what we want. From the man page, memcmp takes three parameters, two buffers to compare and their lengths, and returns 0 if the buffers are identical.

   0x0000000000400622 <+66>:    mov    edx,0x18
   0x0000000000400627 <+71>:    mov    esi,0x600a80
   0x000000000040062c <+76>:    mov    rdi,rcx
   0x000000000040062f <+79>:    call   0x400480 <memcmp@plt>

Here’s the setup to the memcmp call. We can see the third parameter for length is the immediate 0x18 meaning the buffers will be 24 bytes in length. If we examine address 0x600a80, we find this 24 byte string:

gdb-peda$ hexd 0x600a80 /2
0x00600a80 : 91 bf a4 85 85 c3 ba b9 9f a6 b6 b1 93 b9 83 8f   ................
0x00600a90 : ae b1 ae c1 bc 80 ca ca 00 00 00 00 00 00 00 00   ................

Since this is a direct address to some memory, we can be fairly certain that we’ve found some sort of secret value! Based on the movzx eax,BYTE PTR [rdi] instruction (offset 7) which moves a byte from the input string into eax, the xor eax, 0xfffffff7 instruction (offset 36), and the add rdi, 0x1 instruction (offset 54) which increments the char* pointer to our input string, we can reasonably guess that this function is xor’ing each character of our input with 0xf7 and writing the result into a buffer which begins at rsp (also pointed to by rcx). Since we now know the secret (\x91\xbf\xa4\x85...) and the xor key (0xf7) it’s pretty easy to extract the password we need by xor’ing each byte of the secret with the xor key.

Here’s a way to do this with python.

{% highlight python %} str = ‘\x91\xbf\xa4\x85\x85\xc3\xba\xb9\x9f\xa6\xb6\xb1\x93\xb9\x83\x8f\xae\xb1\xae\xc1\xbc\x80\xca\xca’ ba = bytearray(str) for i, byte in enumerate(ba): ba[i] ^= 0xf7 print ba {% endhighlight %}

Which results in this:

$ python crack.py
fHSrr4MNhQAFdNtxYFY6Kw==
$ ./crackme fHSrr4MNhQAFdNtxYFY6Kw==
ok

Any thoughts?