Overview
You Can’t See Me is a fun CTF on Hack The Box that requires you to reverse engineer a simple C application. It’s generally rated as an “Easy” challenge, and is a good introduction to reversing software and performing malware analysis. As with the other CTF guides, answers will be blurred out. Also, for brevity, I won’t include the full output of every command.
You can find You Can’t See Me here.
All you need to do is download the ZIP and extract it. There should only be one file inside, named auth, with this MD5:
Static Analysis
file
Let’s first examine what type of file we’re dealing with, using the file command:
auth: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, stripped
So we know that this is an ELF executable built for GNU/Linux. It’s a 64-bit x86-64 binary, so we should expect to work with 64-bit registers (rax, rdi, rip, etc.) when we disassemble the code, and “LSB” means its data is stored little-endian (least significant byte first). Finally, it’s stripped, meaning that its symbol table and debugging information have been removed[1]. This makes disassembling and reversing the program much harder, as you will see later.
strings
We can use the strings command to pull out some static data from the file:
/lib64/ld-linux-x86-64.so.2
libc.so.6
stdin
printf
fgets
malloc
strcmp
__libc_start_main
GLIBC_2.2.5
...
Welcome!
I said, you can't c me!
HTB{%s}
this_is_the_password
...
GCC: (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
...
From this output, we can gather that the program was compiled with GCC (on Ubuntu), and we can see the C library functions it uses, such as malloc, fgets, strcmp, __libc_start_main, and so on. We can also see that the flag isn’t stored as plain text: it is built at runtime and printed through the format string HTB{%s}. Among the text the program prints, one string stands out: this_is_the_password.
objdump
Finally, we can dump all the disassembled code with objdump to get an idea of what we’re working with. I’m used to reading Intel’s assembly syntax, which is why I specified -M intel.
auth: file format elf64-x86-64

Disassembly of section .init:
0000000000401000 <.init>:

Disassembly of section .plt:
0000000000401020 <printf@plt-0x10>:
0000000000401030 <printf@plt>:
0000000000401040 <fgets@plt>:
0000000000401050 <strcmp@plt>:
0000000000401060 <malloc@plt>:

Disassembly of section .text:
0000000000401070 <.text>:

Disassembly of section .fini:
0000000000401308 <.fini>:
We do see some of the functions that we noticed earlier in the strings output, but some notable ones are missing. You might expect __libc_start_main under the .plt section as well, but it isn’t there, because the startup code calls it through a pointer in the global offset table rather than through a PLT stub. We also apparently don’t have a main or _start function: both live in .text, but objdump can only show a single <.text> label because their symbol names were stripped. main is where the application code is, so we’ll need to find it to debug effectively with all this assembly code.
Since there’s no visible entry point function in this application, you won’t be able to set breakpoints on main or _start in your debugger. If you try it with GDB now, you’ll get an error message instead of a breakpoint. This confirms our finding from the file analysis that the executable was stripped.
This means that we’ll have to work entirely with addresses to figure out where the main function really is.
File Headers
ELF executables begin with a file header, and that header contains the entry point address (the e_entry field)! There are a couple of ways to read it[2], one of which is built into objdump: the -f flag.
./auth: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000401070
Alright, so we have the entry point at 0x401070. Now we can start dynamic analysis!
Dynamic Analysis: GDB
Now, the application doesn’t actually start from main right away. First, the dynamic loader maps the program and the shared libraries it needs into memory, and then control passes to a _start procedure[3], which sets up the arguments for __libc_start_main; that function runs the program’s initialization and finally calls main. The entry point drops us off at _start, so we need to look for the address of main from there.
Note: I was running GDB with Intel’s assembly syntax, which is easier on the eyes in my opinion. You can enable it within GDB by running set disassembly-flavor intel.
Locating the main function
Load the file with gdb and set a breakpoint at the entry point using break *0x401070. Then run the program, and it will stop at its very first instruction. Look at the next few instructions to see where main will be called from.
Dump of assembler code from 0x401070 to 0x4010a2:
=> 0x0000000000401070: endbr64
   0x0000000000401074: xor ebp,ebp
   0x0000000000401076: mov r9,rdx
   0x0000000000401079: pop rsi
   ...
   0x000000000040108a: mov rcx,0x401290
   0x0000000000401091: mov rdi,0x401160
   0x0000000000401098: call QWORD PTR [rip+0x2f52] # 0x403ff0
   0x000000000040109e: hlt
   0x000000000040109f: nop
   0x00000000004010a0: endbr64
End of assembler dump.
That hlt instruction halts the CPU until the next interrupt[4]. It is a privileged instruction, though, so in a user-space program it would just crash the process; it’s only there as a safety net, because __libc_start_main calls exit() once main returns and never comes back. Either way, nothing after hlt matters to us, so main must be reached through the call instruction just before it.
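Note: The address that gets moved into the rdi register is in fact the address of the main function[5]. I could only manage to find this out from multiple StackOverflow answers to similar questions, for example here, here, here, and here. None of them explained precisely why this is the case, but the reason is the x86-64 System V calling convention: the first argument of a function call is passed in rdi, and the first parameter of __libc_start_main is a pointer to main. The call QWORD PTR [rip+0x2f52] instruction calls __libc_start_main through the function pointer stored at 0x403ff0, so _start loads main’s address into rdi right before that call. By the same convention, rcx carries the fourth argument, the init callback at 0x401290.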
The rdi register is set to 0x401160, so we can now set a breakpoint on main with break *0x401160.
Examining main
Now, let’s look at the instructions for main. This function ends when its ret instruction executes, so everything between here and there is the application code! I didn’t include all of the output below, only the highlights.
=> 0x0000000000401160: push rbp
   ...
   0x00000000004011f1: cmp DWORD PTR [rbp-0x8],0x14
   0x00000000004011f5: jge 0x40121f
   0x00000000004011fb: movsxd rax,DWORD PTR [rbp-0x8]
   0x00000000004011ff: movsx ecx,BYTE PTR [rbp+rax*1-0x40]
   0x0000000000401204: add ecx,0xa
   0x0000000000401207: mov dl,cl
   0x0000000000401209: movsxd rax,DWORD PTR [rbp-0x8]
   0x000000000040120d: mov BYTE PTR [rbp+rax*1-0x20],dl
   0x0000000000401211: mov eax,DWORD PTR [rbp-0x8]
   0x0000000000401214: add eax,0x1
   0x0000000000401217: mov DWORD PTR [rbp-0x8],eax
   0x000000000040121a: jmp 0x4011f1
   0x000000000040121f: mov esi,0x15
   0x0000000000401224: mov rdi,QWORD PTR [rbp-0x28]
   0x0000000000401228: mov rdx,QWORD PTR ds:0x404050
   0x0000000000401230: call 0x401040 <fgets@plt>
   0x0000000000401235: lea rdi,[rbp-0x20]
   0x0000000000401239: mov rsi,QWORD PTR [rbp-0x28]
   0x000000000040123d: mov QWORD PTR [rbp-0x50],rax
   0x0000000000401241: call 0x401050 <strcmp@plt>
   0x0000000000401246: cmp eax,0x0
   0x0000000000401249: je 0x401268
   ...
   0x000000000040125b: call 0x401030 <printf@plt>
   ...
   0x0000000000401263: jmp 0x401280
   ...
   0x0000000000401278: call 0x401030 <printf@plt>
   ...
   0x0000000000401288: ret
This program seems to build a string, then prompt the user for a response and compare the two with strcmp. Depending on the answer given, a different message is printed. If you play around with the app, you’ll see that a wrong answer always prints I said, you can't c me!, as seen earlier in our strings analysis. Also, the text this_is_the_password is just a decoy, and doesn’t work as the answer.
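Looking at the code closely, you may notice a loop between 0x4011f1 and 0x40121a. The only way out is the jge at 0x4011f5, which is taken when the DWORD at [rbp-0x8] is greater than or equal to 0x14[6]. That DWORD is the loop counter, incremented by one at the end of every iteration (0x401211 to 0x401217), and 0x14 is 20 in decimal, so the loop runs exactly 20 times and writes 20 bytes.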
Getting the Flag
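The loop counter (call it i) lives at [rbp-0x8] and is copied into rax with movsxd to serve as an index. On every iteration, the byte at rbp-0x40+i is loaded and sign-extended into ecx, 0xa (10) is added to it, and the low byte of the result (dl) is stored at rbp-0x20+i.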
So at the end of the 20 iterations, the complete 20-byte result starts at rbp-0x20. We can read it by setting a breakpoint just after the loop ends, at 0x40121f.
At this point, we can simply print the bytes stored at rbp-0x20 as a string (the output is hidden here).
From here, we can continue to the prompt and enter that string as our answer, and it is in fact the correct one!
Final Thoughts
This was a cool reverse engineering challenge where I learned a lot about assembly, debugging, and some of the process of how programs are run on Linux. It did require a lot of googling when I got stuck, and I’ve included below some of the resources that helped me along the way. Happy hacking!
[1] die.net: strip(1) - Linux man page. https://linux.die.net/man/1/strip
[2] Reverse Engineering Stack Exchange: Reversing ELF 64-Bit LSB Executable x86-64 gdb. https://reverseengineering.stackexchange.com/a/3816
[3] Embedded Artistry: A General Overview of What Happens Before main(). https://embeddedartistry.com/blog/2019/04/08/a-general-overview-of-what-happens-before-main/#genview
[4] Wikipedia: HLT (x86 instruction). https://en.wikipedia.org/wiki/HLT_(x86_instruction)
[5] Reverse Engineering Stack Exchange: How to handle stripped binaries with GDB? https://reverseengineering.stackexchange.com/a/1936
[6] Wikibooks: x86 Assembly/Control Flow. https://en.wikibooks.org/wiki/X86_Assembly/Control_Flow#Jump_if_Greater_or_Equal