Category Archives: C

WIP: Integers, safe arithmetic, and handling overflow (lab notes)

Some rough lab notes on how to do integer math with computers. Surprisingly deep topic! Usual caveat applies: this is just what I’ve learned so far, and there may be inaccuracies. This is mainly from a C/C++ perspective, though there is a small discussion of Rust included.

It all starts with a simple prompt:

return_type computeBufferSize(integer_type headerLen, integer_type numElements, integer_type elementSize)
{
  // We conceptually want to do: headerLen + (numElements * elementSize)
}

How do we safely do something as simple as this arithmetic expression in C/C++? Furthermore, what type do we choose for integer_type and return_type?

Overflow

  • You can’t just do headerLen + (numElements * elementSize) literally, because that can overflow for valid input arguments
  • Why is overflow bad?
  • Reason 1: Logic errors. The result of the expression will wrap around, producing incorrect values. At best these will make the program behave incorrectly, in a middling case they will crash it, and at worst they will be security vulnerabilities.
  • Reason 2: Undefined behavior. This only applies to overflow on signed types. Signed overflow is UB, which means the fundamental soundness of the program is compromised, and it can behave unpredictably at runtime. Apparently this allows the compiler to perform useful optimizations, but can also cause critical miscompiles. (e.g. if guards later being compiled out of the program, causing security issues)
  • (Note: Apparently there are compiler flags to force signed overflow to have defined behavior. -fno-strict-overflow and -fwrapv)

How to handle overflow

  • Ok, overflow is bad. What does one do about it?
  • Even for the simplest operations of multiplication and addition, you need to be careful of how large the input operands can be, and whether overflow is possible.
  • For application code, it might often be the case that numbers are so small that this is not a concern, but if you’re dealing with untrusted input, buffer sizes, or memory offsets, then you need to be more careful, as these numbers can be larger.
  • In situations where you need to defend against overflow, you cannot just inline the whole arithmetic expression the natural way.
  • You need to evaluate the expression operation by operation, at each step checking if overflow would occur, and failing the function if so.
  • For example, here you’d first evaluate the multiplication in a checked fashion, and then, if that succeeds, do the addition (also checked). See the sketch after this list.
  • For addition a + b, the typical check for size_t looks like: bool willOverflow = (a > SIZE_MAX - b);
  • For multiplication a * b, it looks like bool willOverflow = (b != 0 && a > SIZE_MAX / b);
  • For signed ints, the checks are more involved. This is a reason to prefer unsigned types in situations where overflow checking is needed.
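Putting those checks together, here is a minimal sketch of the buffer-size computation from the prompt, evaluated one operation at a time. (It assumes size_t operands and a bool-plus-out-parameter signature; both of those choices are discussed later.)

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool computeBufferSize(size_t headerLen, size_t numElements,
                       size_t elementSize, size_t *outSize)
{
    // Step 1: checked multiplication: numElements * elementSize
    if (elementSize != 0 && numElements > SIZE_MAX / elementSize)
        return false; // would overflow
    size_t payload = numElements * elementSize;

    // Step 2: checked addition: headerLen + payload
    if (headerLen > SIZE_MAX - payload)
        return false; // would overflow

    *outSize = headerLen + payload;
    return true;
}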

Helper libraries

Manually adding inline overflow checking code everywhere is tedious, error prone, and harms readability. It would be nice if one could just write the arithmetic and spend less time worrying about the subtleties of overflow.

GCC and Clang offer builtins for doing checked arithmetic (__builtin_add_overflow, __builtin_mul_overflow, etc.). They also have the benefit of checking signed integer overflow without UB.
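For example, a thin wrapper around the multiplication builtin (the wrapper name is mine):

#include <stdbool.h>
#include <stddef.h>

bool multiplyChecked(size_t numElements, size_t elementSize, size_t *product)
{
    // The builtin returns true if the multiplication overflowed, and
    // stores the (possibly wrapped) result through the out pointer
    // either way.
    return !__builtin_mul_overflow(numElements, elementSize, product);
}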

In practice, it seems like some industrial C/C++ projects use helper utilities to make overflow-safe programming more ergonomic.

When to think about overflow

So when coding, when do you actually need to care about this?

You do not literally have to use checked arithmetic for every single addition or multiplication in the program. Checked arithmetic clutters the code and can impact runtime performance, so you should only use it when it’s relevant.

It’s relevant when coding on the “boundary”: arithmetic on untrusted or weakly trusted input data (e.g. from the network, filesystem, or hardware), or where you’re implementing an API to be called by external users.

But for many normal, everyday math operations in your code, you probably don’t need it. (Unless you are dealing with numbers large enough that overflow is a possibility, so a certain amount of vigilance is still needed.)

Rust

  • Rust has significant differences from C/C++ for its overflow behavior, which make it much less error prone.
  • Difference 1 – All overflow is defined, unlike signed overflow being UB in C/C++. This prevents critical miscompiles if signed overflow happens.
  • What about logic bugs where numbers wrap around and produce dangerously large values that are mishandled down the line?
  • Rust sometimes protects against these.
  • Rust protects against these in debug builds by injecting overflow checks, which panic if this happens. This impacts performance, but debug builds are slow anyway, and this helps find bugs faster in development.
  • In release builds, wrap-around logic bugs are still possible. Rust prevents undefined behavior and memory unsafety, but incorrect arithmetic can still cause panics, allocation failures, or incorrect program behavior.
  • Difference 2 – Rust has first-class support for checked arithmetic operations in the form of “methods” one can call on an integer (e.g. checked_add and checked_mul, which return an Option). These are a clean and ergonomic way to safely handle overflow when that is relevant for the code.

Choosing an integer type

There are a lot of int types. Which do you choose?

  • int
  • unsigned int
  • size_t
  • ssize_t
  • uint32_t, uint64_t
  • off_t
  • ptrdiff_t
  • uintptr_t

Here’s my current understanding.

First, split off the specific-purpose types:

  • off_t: (Signed) For file offsets
  • ptrdiff_t: (Signed) for pointer differences
  • uintptr_t: (Unsigned) For representing a pointer as an integer, in order to do math on it

Next, split off the explicitly sized types: uint32_t, uint64_t, etc

These are mainly used when interfacing with external data: the network, filesystem, hardware, etc. (From C/C++! In Rust, it’s idiomatic to use explicitly sized types in more situations).

That mostly leaves int vs size_t.

This is hotly debated. Here’s my rule of thumb:

  • For application code, dealing with “small” integers and counts: Use int. This lets you assert for non-negative. The downside is that signed int overflow is UB, and thus dangerous, but the idea is that you should not be anywhere near overflowing since the integers are small. This is approximately the position of the Google C++ Style Guide.
  • For “systems” code dealing with buffer sizes, offsets, raw memory: Use size_t. int is often too small for cases like this. size_t is “safer” in that overflow is defined. This is approximately the position of the Chromium C++ style guide.

In the above code that’s computing the buffer size, there are a few reasons to avoid int.

First of all, int might be too small, depending on the use case (e.g. generic libraries).

Secondly, it’s risky to use int because it’s easier to invoke dangerous UB. Since int overflow is UB, you risk triggering miscompiles (e.g. if guards being compiled out) if you accidentally overflow. This has caused many subtle security bugs.

In general, reasoning about overflow is more complicated with int than with size_t, since size_t has fully defined behavior.

However, there are downsides to size_t. Notably, you can’t assert it as non-negative, so accidental underflow (e.g. subtracting past zero) silently wraps around to a huge value.

What about unsigned int?

Rule of thumb: Don’t use this. It’s a worse size_t because it’s smaller and of platform-dependent width, and a worse int because it can’t be asserted for non-negative.

What about ssize_t?

This is a signed counterpart to size_t that can also hold negative values for errors (e.g. POSIX read returns a byte count, or -1 on error).

It seems somewhat useful, but it is also error-prone: it must always be checked for negativity before being used as a size_t, otherwise you hit the classic issue where the implicit conversion produces a dangerously large unsigned value.

In general, it seems like one should prefer explicit distinction between error state and integer state (e.g. bool return, and size_t out parameter, or std::optional<size_t>) over this.
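To illustrate the pitfall, a sketch using POSIX read (error handling abbreviated):

#include <stddef.h>
#include <unistd.h>

void consume(int fd, char *buf, size_t len)
{
    // read() returns a byte count, or -1 on error.
    ssize_t n = read(fd, buf, len);
    if (n < 0)
        return; // must handle the error *before* treating n as a size

    size_t validBytes = (size_t)n; // safe only after the check above;
                                   // converting -1 would yield SIZE_MAX
    (void)validBytes;
}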

Which types to choose for the example

The input arguments should be size_t, since we’re dealing with buffers, offsets, and raw memory.

The return value is a bit of a trick question. Any function like this must be able to return either a value, or an error if overflow happened.

The core return value should be size_t. But then how to express error in case of overflow?

std::optional<size_t> is a good choice if using C++.

ssize_t is an option, if using C.

But changing the signature to return a bool, and then having an out size_t parameter might be even better, and is the pattern used by the compiler builtins.
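A sketch of that final shape, using the compiler builtins from earlier:

#include <stdbool.h>
#include <stddef.h>

bool computeBufferSize(size_t headerLen, size_t numElements,
                       size_t elementSize, size_t *outSize)
{
    size_t payload;
    // Each builtin returns true if the operation overflowed.
    if (__builtin_mul_overflow(numElements, elementSize, &payload))
        return false;
    return !__builtin_add_overflow(headerLen, payload, outSize);
}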

Related:

osdev journal: Gotchas with cc-runtime/libgcc

libclang_rt/libgcc are compiler runtime support libraries, which the compiler occasionally emits calls into instead of directly inlining the codegen. Usually this is for software implementations of math operations (division/mod). Generally you’ll need a runtime support library for all but the most trivial projects.
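For example (a sketch, assuming a 32-bit target like i386, where there is no single instruction for 64-bit division):

#include <stdint.h>

// On i386, compilers typically lower this division to a call to
// __udivdi3, the 64-bit unsigned division helper provided by
// libgcc/libclang_rt, instead of emitting inline instructions.
uint64_t div64(uint64_t a, uint64_t b)
{
    return a / b;
}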

cc-runtime is a utility library for hobby OS developers. It is a standalone version of libclang_rt, which can be vendored into an OS build.

The main advantage for me is that it lets me use a prebuilt clang from Homebrew. The problem with the prebuilt clang from Homebrew is that it doesn’t come with a libclang_rt compiled for i386 (which makes sense, why would it — I’m on an ARM64 Mac).

(This is unlike the prebuilt i386-elf-gcc in Homebrew, which does come with a prebuilt libgcc for i386).

Since it doesn’t come with libclang_rt for i386, my options are:

  • Keep using libgcc from i386-elf-gcc in Homebrew: Undesirable — the goal is to only depend on one toolchain, and here I’d depend on both clang and gcc.
  • Build clang and libclang_rt from source: Undesirable — it’s convenient to avoid building the toolchain from source if possible.
  • Vendor in a libgcc.a binary from https://codeberg.org/osdev/libgcc-binaries: Undesirable — vendoring binaries should be a last resort.
  • Use cc-runtime: Best — no vendored binaries, no gcc dependency, no building the toolchain from source.

However, cc-runtime does have a gotcha. If you’re not careful, you’ll balloon your binary size.

This is because the packed branch of cc-runtime (which is the default and easiest to integrate) packs all the libclang_rt source files into a single C file, which produces a single .o file. So the final .a library has a single object file in it.

This is in contrast to libgcc.a (or a typical compilation of libclang_rt) where the .a library probably contains multiple .o files — one for each .c file.

By default, linkers will optimize and only pull in the .o files from a .a library that are actually needed. But since cc-runtime is a single .o file, the whole thing gets included! This means the binary will potentially include many libclang_rt functions that are unused.

In my case, the size of one of my binaries went from 36k (libgcc) to 56k (cc-runtime, naive).

To work around this, you can use the trunk branch of cc-runtime (which doesn’t pack everything into one .c file). This is ~30 .c files and slightly more annoying to integrate into the build system.

Or, you can use some compiler/linker flags to make the linker optimization more granular and work at the function level, instead of the object file level.

Those are:

Compiler flag:

-ffunction-sections -fdata-sections

Linker flag:

--gc-sections

With this, my binary size was reduced to 47k. So there is still a nontrivial size increase, but the situation is slightly improved.

Ultimately, my preferred solution is the first: to use the trunk branch. The build integration is really not that bad, and the advantage is you don’t need to remember to use the special linker flag, which you’d otherwise need to ensure is in any link command for any binary that links against cc-runtime.

That said, those compiler/linker flags are probably a good idea to use anyway, so the best solution might be to do both.

Pure GNU Make is (unfortunately) not enough

I’d love to simply use GNU Make for my C/C++ projects because it’s so simple to get started with. Unfortunately it’s lacking a few essential quality of life features:

  • Out of tree builds
  • Automatic header dependency detection
  • Recompile on CFLAGS change
  • First class build targets

Out of tree builds

If you want your build to happen in an isolated build directory (as opposed to creating object files in your source tree), you need to implement this yourself.

It involves a lot of juggling paths. Not fun!

Automatic header dependency detection

In C/C++, when a header file changes, you must recompile all translation units (i.e. object files, roughly) that depend on (i.e. include) that header. If you don’t, the object file will become stale and none of your changes to constants, defines, or struct definitions (for example) will be picked up.

In Make rules, you typically express dependencies between source files, and object files, e.g:

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

This will recompile the object file when the source file changes, but won’t recompile when any headers included by that source file change. So it’s not good enough out of the box.

To fix this, you need to manually implement this by:

  1. Passing the proper flags to the compiler to cause it to emit header dependency information. (Something like -MMD. I don’t know them exactly because that’s my whole point =)
  2. Instructing the build to include that generated dependency info (Something like
    -include $(OBJECTS:.o=.d)
    )

The generated dependency info looks like this:

pmap.c.obj: \
 kern/pmap.c \
 inc/x86.h \
 inc/types.h \
 inc/mmu.h \
 inc/error.h \
 inc/string.h \

Recompile on CFLAGS change

In addition to recompiling if headers change, you also want to recompile if any of your compiler, linker, or other dev tool flags change.

Make doesn’t provide this out of the box; you’ll have to implement this yourself as well.

This is somewhat nontrivial. For an example, check out how the JOS OS (from MIT 6.828 (2018)) does it: https://github.com/offlinemark/jos/blob/1d95b3e576dd5f84b739fa3df773ae569fa2f018/kern/Makefrag#L48

First class build targets

In general, it’s nice to have build targets as a first class concept. They express which source files a module compiles, and which include paths are needed to reach its headers. Targets can depend on each other and seamlessly access the headers of another target (the build system makes sure all -I flags are passed correctly).

This is also something you’d have to implement yourself, and there are probably limitations to how well you can do it in pure Make.


Make definitely has its place for certain tasks (easily executing commonly used command lines), but I find it hard to justify using it for anything non-trivially sized compared to more modern alternatives like CMake, Meson, Bazel, etc.

That said, large projects like Linux use it, so somehow they must make it work!

Mutexes, atomics, lockfree programming

Some rough lab notes on these topics to record the current state of my knowledge. I’m not an expert, so there may be inaccuracies.

Mutexes

  • On Linux, libpthread mutexes are implemented using the underlying futex syscall
  • They are basically a combination of a spinlock (in userspace), backed by the kernel for wait/signal operations only when absolutely necessary (i.e. when there’s contention). In the common case of an uncontended lock acquire, there is no context switch which improves performance
  • The userspace spinlock portion uses atomics, as spinlocks usually do, specifically because the compare-and-set must be atomic (see the sketch after this list)
  • Jeff Preshing (see below) writes that each OS/platform has an analogous concept to this kind of “lightweight” mutex — Windows and macOS have them too
  • Before futex(2), other syscalls were used for blocking. One option might have been the semaphore API, but commit 56c910668cff9131a365b98e9a91e636aace337a in glibc is before futex, and it seems like they actually use signals. (pthread_mutex_lock -> __pthread_lock (still has spinlock elements, despite being before futex) -> suspend() -> __pthread_suspend -> __pthread_wait_for_restart_signal -> sigsuspend)
  • A primary advantage of futex over previous implementations is that futexes only require kernel resources when there’s contention
  • Like atomics, mutex implementations include memory barriers (maybe even implicitly due to atomics) to prevent loads/stores from inappropriately crossing the lock/unlock boundary due to compiler and/or hardware instruction reordering optimizations
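A minimal sketch of that userspace compare-and-set fast path, using C11 atomics (an illustration, not glibc’s actual implementation):

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} spinlock_t;

static void spinlock_acquire(spinlock_t *lock)
{
    bool expected = false;
    // Atomically: if locked == false, set it to true (lock acquired).
    // A futex-based mutex would sleep in the kernel under contention
    // instead of spinning indefinitely like this.
    while (!atomic_compare_exchange_weak(&lock->locked, &expected, true)) {
        expected = false; // a failed exchange overwrites 'expected'
    }
}

static void spinlock_release(spinlock_t *lock)
{
    // The default seq_cst ordering provides the memory barriers
    // mentioned above.
    atomic_store(&lock->locked, false);
}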
Continue reading

How the hardware/software interface works

It’s really all about memory. But to start at the beginning, the rough stack looks like this:

  • Userspace application
  • Kernel driver
  • Hardware device

I find it easier to think about this from the middle out. On Linux, the kernel exposes hardware devices as files backed by the /dev virtual filesystem. Userspace can do normal syscalls like open, read, write, and mmap on them, as well as the less typical ioctl (for more arbitrary, device-specific functionality)1.

The files are created by kernel drivers, which are modules of kernel code whose sole purpose is to interface with and abstract hardware so it can be used by other parts of the operating system, or userspace. They are implemented using internal driver “frameworks” in the kernel, e.g. the I2C or SPI frameworks. When you interface with a file in /dev, you are directly triggering callback handlers in a driver which execute in the process context.

That’s how userspace interfaces with the kernel. How do drivers interface with hardware? These days, mostly via memory mapped I/O (MMIO)2. This is when device hardware “appears” at certain physical addresses, and can be interfaced with via load and store instructions using an “API” that the device defines. For example, you can read data from a sensor by simply reading a physical address, or write data out to a device by writing to an address. The technical term for the hardware component these reads/writes interface with is “registers” (i.e. memory mapped registers).

(Aside: Other than MMIO, the other main interface the kernel has with hardware is interrupts, for interrupt driven I/O processing (as opposed to polling, which is what MMIO enables). I’m not very knowledgeable about this, so I won’t get into it other than to say drivers can register handlers for specific IRQ (interrupt requests) numbers, which will be invoked by the kernel’s generic interrupt handling infrastructure.)

Using MMIOs looks a lot like embedded bare metal programming you might do on a microcontroller like a PIC or Arduino (AVR). At the lowest level, a kernel driver is really just embedded bare metal programming.

Here’s an example of a device driver for UART (serial port) hardware for ARM platforms: linux/drivers/tty/serial/amba-pl011.c. If you’re debugging an ARM Linux system via a serial connection, this might be the driver being used to e.g. show the boot messages.

The lines like:

cr = readb(uap->port.membase + UART010_CR);

are where the real magic happens.

This is simply doing a read from a memory address derived from some base address for the device, plus some offset of the specific register in question. In this case it’s reading some control information from a Control Register.

#define UART010_CR		0x14	/* Control register. */

linux/include/linux/amba/serial.h#L28

Device interfaces may range from having just a few to many registers.
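In bare-metal terms, the readb() call above boils down to a volatile load at a fixed offset from the device’s base address. A sketch (UART_BASE is a made-up address; the register offset matches the header above):

#include <stdint.h>

#define UART_BASE  0x10000000u /* hypothetical device base address */
#define UART010_CR 0x14u       /* control register offset, per the header above */

// 'volatile' forces the compiler to actually perform every device
// register access, rather than caching or eliminating it.
static inline uint8_t mmio_read8(uintptr_t addr)
{
    return *(volatile uint8_t *)addr;
}

static inline void mmio_write8(uintptr_t addr, uint8_t value)
{
    *(volatile uint8_t *)addr = value;
}

// Analogous to: cr = readb(uap->port.membase + UART010_CR);
uint8_t read_uart_cr(void)
{
    return mmio_read8(UART_BASE + UART010_CR);
}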

To go one step deeper down the rabbit hole, how do devices “end up” at certain physical addresses? How is this physical memory map interface implemented?3

The device/physical address mapping is implemented in digital logic outside the CPU, either on the System on Chip (SOC) (for embedded systems), or on the motherboard (PCs)4. The CPU’s physical interface includes the address, data, and control buses. Digital logic converts bits of the address bus into signals that mutually exclusively enable the devices physically connected to the bus. The implementations of load/store instructions in the CPU set a read/write bit appropriately on the control bus, which lets devices know whether a read or write is happening. The data bus is where data is transferred out of or into the CPU.

In practice, documentation for real implementations of these systems can be hard to find, unless you’re a customer of the SoC manufacturer. But there are some out there for older chips, e.g.

Here’s a block diagram for the Tegra 2 SoC architecture, which shipped in products like the Motorola Atrix 4G, Motorola Droid X2, and Motorola Photon. Obviously it’s much more complex than my description above. Other than the two CPU cores in the top left, and the data bus towards the middle, I can’t make sense of it. (link)

While not strictly a “System on Chip”, a classic PIC microcontroller shares many characteristics with a SoC (CPU, memory, peripherals, all in one chip package), but is much more approachable.

We can see the single MIPS core connected to a variety of peripheral devices on the peripheral bus. There are even layers of peripheral bussing, with a “Peripheral Bridge” connected to a second peripheral bus for things like I2C and SPI.

Linux Internals: How /proc/self/mem writes to unwritable memory

Introduction

An obscure quirk of the /proc/*/mem pseudofile is its “punch through” semantics. Writes performed through this file will succeed even if the destination virtual memory is marked unwritable. In fact, this behavior is intentional and actively used by projects such as the Julia JIT compiler and rr debugger.

This behavior raises some questions: Is privileged code subject to virtual memory permissions? In general, to what degree can the hardware inhibit kernel memory access?

By exploring these questions5, this article will shed light on the nuanced relationship between an operating system and the hardware it runs on. We’ll examine the constraints the CPU can impose on the kernel, and how the kernel can bypass these constraints.
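As a tiny demonstration of the punch-through semantics (a sketch; error handling omitted): a direct store to a read-only page would segfault, but a write through /proc/self/mem succeeds.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    // Map a page with read-only permissions; *page = 'x' would crash.
    char *page = mmap(NULL, 4096, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    // Seek to the page's virtual address and write through the pseudofile.
    int fd = open("/proc/self/mem", O_RDWR);
    lseek(fd, (off_t)(uintptr_t)page, SEEK_SET);
    write(fd, "hi", 2);
    close(fd);

    printf("%.2s\n", page); // prints "hi" despite PROT_READ
    return 0;
}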

Continue reading

The tradeoffs in using -Weverything

In addition to the well-known -Wall, clang offers another interesting warning flag: -Weverything. While it sounds like a “strictly better” option (the more warnings, the better?), it’s not.

This topic is already well covered by Arthur O’Dwyer’s blog post and the clang documentation.

Arthur makes a strong case against use of this flag. The clang docs generally agree, though offering a slightly more lenient position:

If you do use -Weverything then we advise that you address all new compiler diagnostics as they get added to Clang, either by fixing everything they find or explicitly disabling that diagnostic with its corresponding Wno- option.

But what I have not seen clearly expressed is how use of -Weverything is a tradeoff.

Continue reading

Surgical formatting with git-clang-format

If you’re already a 10x engineer, you probably won’t need this article. But for the rest of us, this is what I wish I knew about clang-format as an inexperienced C++ programmer: how to only format the changes in your pull request.


You may have already heard of clang-format. It auto-formats source files for languages including C and C++. You can aim it at a source file and format the entire thing using clang-format -i file.cpp.

If you’re contributing to a project that is already 100% clang-format clean, then this workflow works fine. But you’ll occasionally encounter a project that is not quite 100% formatted, such as LLVM, osquery, or Electron6.

For these projects, the “format entire files” workflow doesn’t work because you’ll incidentally format parts of the files that are unrelated to your contribution. This will add noise to your diff and make it harder for your reviewers.

In this case, you need a way to surgically format only the lines changed in your contribution. To do this, you can use the clang-format git extension. This article will cover the basics of git-clang-format, including a practical workflow that allows for messy development, and formatting at the end.

Continue reading

How setjmp and longjmp work (2016)

Pretty recently I learned about setjmp() and longjmp(). They’re a neat pair of libc functions which allow you to save your program’s current execution context and resume it at an arbitrary point in the future (with some caveats7). If you’re wondering why this is particularly useful, to quote the manpage, one of their main use cases is “…for dealing with errors and interrupts encountered in a low-level subroutine of a program.” These functions can be used for more sophisticated error handling than simple error code return values.
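A minimal sketch of that error-handling pattern (the function names are made up):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void low_level_subroutine(void)
{
    // Something went wrong: unwind back to the setjmp() call site,
    // making setjmp() return 1 this time.
    longjmp(env, 1);
}

int main(void)
{
    // setjmp() returns 0 when called directly, and the value passed
    // to longjmp() when execution resumes here later.
    if (setjmp(env) == 0) {
        low_level_subroutine();
    } else {
        printf("recovered from error via longjmp\n");
    }
    return 0;
}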

I was curious how these functions worked, so I decided to take a look at musl libc’s implementation for x86. First, I’ll explain their interfaces and show an example usage program. Next, since this post isn’t aimed at the assembly wizard, I’ll cover some basics of x86 and Linux calling convention to provide some required background knowledge. Lastly, I’ll walk through the source, line by line.

Continue reading