Monthly Archives: January 2026

WIP: Integers, safe arithmetic, and handling overflow (lab notes)

Some rough lab notes on how to do integer math with computers. Surprisingly deep topic! The usual caveat applies: this is just what I’ve learned so far, and there may be inaccuracies. This is mainly from a C/C++ perspective, though there is a small discussion of Rust included.

It all starts with a simple prompt:

return_type computeBufferSize(integer_type headerLen, integer_type numElements, integer_type elementSize)
{
  // We conceptually want to do: headerLen + (numElements * elementSize)
}

How do we safely do something as simple as this arithmetic expression in C/C++? Furthermore, what type do we choose for integer_type and return_type?

Overflow

  • You can’t just do headerLen + (numElements * elementSize) literally, because that can overflow for valid input arguments
  • Why is overflow bad?
  • Reason 1: Logic errors. The result of the expression will wrap around, producing incorrect values. At best these make the program behave incorrectly, in between they crash it, and at worst they become security vulnerabilities.
  • Reason 2: Undefined behavior. This only applies to overflow on signed types. Signed overflow is UB, which means the fundamental soundness of the program is compromised, and it can behave unpredictably at runtime. Apparently this allows the compiler to perform useful optimizations, but can also cause critical miscompiles. (e.g. if guards later being compiled out of the program, causing security issues)
  • (Note: Apparently there are compiler flags to force signed overflow to have defined behavior. -fno-strict-overflow and -fwrapv)
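To make the wrap-around concrete, here is a minimal sketch of the naive expression with size_t arguments. The function names and the input values are made up for illustration; the only thing relied on is that unsigned arithmetic wraps modulo 2^N.

```cpp
#include <cstddef>
#include <cstdint>

// Naive version: no overflow checks, so the unsigned math silently wraps.
std::size_t naiveBufferSize(std::size_t headerLen, std::size_t numElements,
                            std::size_t elementSize)
{
    return headerLen + (numElements * elementSize);
}

// Hypothetical hostile input: the product is exactly 2^N (N = bit width of
// size_t), which wraps to 0, so only the header length is left over.
std::size_t wrappedExample()
{
    const std::size_t hugeCount = SIZE_MAX / 2 + 1;  // 2^(N-1)
    return naiveBufferSize(16, hugeCount, 2);
}
```

A caller that allocates wrappedExample() bytes gets a 16-byte buffer while still believing it holds hugeCount elements, which is the classic setup for an out-of-bounds write.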

How to handle overflow

  • Ok, overflow is bad. What does one do about it?
  • Even for the simplest operations of multiplication and addition, you need to be careful of how large the input operands can be, and whether overflow is possible.
  • For application code, it might often be the case that numbers are so small that this is not a concern, but if you’re dealing with untrusted input, buffer sizes, or memory offsets, then you need to be more careful, as these numbers can be larger.
  • In situations where you need to defend against overflow, you cannot just inline the whole arithmetic expression the natural way.
  • You need to evaluate the expression operation by operation, at each step checking if overflow would occur, and failing the function if so.
  • For example, here you’d first evaluate the multiplication in a checked fashion. Then, if that succeeds, do the addition, also checked.
  • For addition a + b, the typical check for size_t looks like: bool willOverflow = (a > SIZE_MAX - b);
  • For multiplication a * b, it looks like bool willOverflow = (b != 0 && a > SIZE_MAX / b);
  • For signed ints, the checks are more involved. This is a reason to prefer unsigned types in situations where overflow checking is needed.
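Putting the two size_t checks together, the step-by-step evaluation could look roughly like this. The helper names and the bool-plus-out-parameter shape are my own choices for the sketch, not a fixed convention.

```cpp
#include <cstddef>
#include <cstdint>

// True if a + b would exceed SIZE_MAX.
bool addWouldOverflow(std::size_t a, std::size_t b)
{
    return a > SIZE_MAX - b;
}

// True if a * b would exceed SIZE_MAX. The b != 0 test avoids dividing by zero.
bool mulWouldOverflow(std::size_t a, std::size_t b)
{
    return b != 0 && a > SIZE_MAX / b;
}

// Returns false on overflow; *outSize is only written on success.
bool computeBufferSizeChecked(std::size_t headerLen, std::size_t numElements,
                              std::size_t elementSize, std::size_t* outSize)
{
    if (mulWouldOverflow(numElements, elementSize)) return false;
    const std::size_t payload = numElements * elementSize;

    if (addWouldOverflow(headerLen, payload)) return false;
    *outSize = headerLen + payload;
    return true;
}
```

Note that each check runs before the operation it guards, so the wrapped value is never computed in the first place.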

Helper libraries

Manually adding inline overflow checking code everywhere is tedious, error prone, and harms readability. It would be nice if one could just write the arithmetic and spend less time worrying about the subtleties of overflow.

GCC and Clang offer builtins for doing checked arithmetic. They also have the benefit of checking signed integer overflow without UB.
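As a sketch of what that looks like, the example below uses __builtin_mul_overflow and __builtin_add_overflow, which both compilers provide. It assumes GCC or Clang; the wrapper function names are mine.

```cpp
#include <cstddef>

// Assumes GCC or Clang. Each builtin returns true if the operation overflowed
// and stores the (possibly wrapped) result through the pointer.
bool computeBufferSizeBuiltin(std::size_t headerLen, std::size_t numElements,
                              std::size_t elementSize, std::size_t* outSize)
{
    std::size_t payload = 0;
    if (__builtin_mul_overflow(numElements, elementSize, &payload)) return false;

    std::size_t total = 0;
    if (__builtin_add_overflow(headerLen, payload, &total)) return false;

    *outSize = total;
    return true;
}

// The same builtin on signed operands: overflow is reported, not UB.
// Returns true on success; on failure *outSum holds a wrapped value.
bool addIntsChecked(int a, int b, int* outSum)
{
    return !__builtin_add_overflow(a, b, outSum);
}
```

The builtins report true on overflow, so the wrappers here invert that to return true on success. Either polarity works as long as it is consistent within a codebase.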

In practice, it seems like some industrial C/C++ projects use helper utilities to make overflow-safe programming more ergonomic.
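I have not copied any particular project's utility here, but the general idea can be sketched with a small wrapper type that remembers whether any step overflowed. CheckedSize and everything in it is hypothetical, written only to show the shape.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper, not taken from any real project: a size_t wrapper that
// remembers whether any step overflowed, so the caller checks once at the end.
class CheckedSize {
public:
    explicit CheckedSize(std::size_t v) : value_(v) {}

    CheckedSize operator+(CheckedSize rhs) const
    {
        if (!valid_ || !rhs.valid_) return invalid();
        if (value_ > SIZE_MAX - rhs.value_) return invalid();
        return CheckedSize(value_ + rhs.value_);
    }

    CheckedSize operator*(CheckedSize rhs) const
    {
        if (!valid_ || !rhs.valid_) return invalid();
        if (rhs.value_ != 0 && value_ > SIZE_MAX / rhs.value_) return invalid();
        return CheckedSize(value_ * rhs.value_);
    }

    // Empty if any operation in the chain overflowed.
    std::optional<std::size_t> value() const
    {
        if (!valid_) return std::nullopt;
        return value_;
    }

private:
    static CheckedSize invalid()
    {
        CheckedSize r(0);
        r.valid_ = false;
        return r;
    }

    std::size_t value_;
    bool valid_ = true;
};

std::optional<std::size_t> computeBufferSizeWithHelper(std::size_t headerLen,
                                                       std::size_t numElements,
                                                       std::size_t elementSize)
{
    return (CheckedSize(headerLen) +
            CheckedSize(numElements) * CheckedSize(elementSize)).value();
}
```

The appeal is that the arithmetic reads like the original expression again, with a single validity check at the end instead of one per operation.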

When to think about overflow

So when coding, when do you actually need to care about this?

You do not literally have to use checked-arithmetic for every single addition or multiplication in the program. Checked arithmetic clutters the code and can impact runtime performance, so you should only use it when it’s relevant.

It’s relevant when coding on the “boundary”. Arithmetic on untrusted or weakly trusted input data, e.g. from the network, filesystem, hardware, etc. Or where you’re implementing an API to be called by external users.

But for many normal, everyday math operations in your code, you probably don’t need it. (Unless you are particularly dealing with numbers so large that overflow is a possibility. So a certain amount of vigilance is needed.)

Rust

  • Rust has significant differences from C/C++ for its overflow behavior, which make it much less error prone.
  • Difference 1 – All overflow is defined, unlike signed overflow being UB in C/C++. This prevents critical miscompiles if signed overflow happens.
  • What about logic bugs where numbers wrap around and produce dangerously large values that are mishandled down the line?
  • Rust sometimes protects against these.
  • Rust protects against these in debug builds by injecting overflow checks, which panic if this happens. This impacts performance, but debug builds are slow anyway, and this helps find bugs faster in development.
  • In release builds, wrap-around logic bugs are still possible. Rust prevents undefined behavior and memory unsafety, but incorrect arithmetic can still cause panics, allocation failures, or incorrect program behavior.
  • Difference 2 – Rust has first-class support for checked arithmetic operations in the form of “methods” one can call on an integer. These are a clean and ergonomic way to safely handle overflow if that is relevant for code.

Choosing an integer type

There are a lot of int types. Which do you choose?

  • int
  • unsigned int
  • size_t
  • ssize_t
  • uint32_t, uint64_t
  • off_t
  • ptrdiff_t
  • uintptr_t

Here’s my current understanding.

First, split off the specific-purpose types:

  • off_t: (Signed) For file offsets
  • ptrdiff_t: (Signed) for pointer differences
  • uintptr_t: (Unsigned) For representing a pointer as an integer, in order to do math on it

Next, split off the explicit sized types: uint32_t, uint64_t, etc

These are mainly used when interfacing with external data: the network, filesystem, hardware, etc. (From C/C++! In Rust, it’s idiomatic to use explicit sized types in more situations).

That mostly leaves int vs size_t.

This is hotly debated. Here’s my rule of thumb:

  • For application code, dealing with “small” integers, counts: Use int. This lets you assert for non-negative. The downside is signed int overflow is UB, and thus dangerous, but the idea is that you should not be anywhere near overflowing since the integers are small. This is approximately the position of the Google C++ Style Guide.
  • For “systems” code dealing with buffer sizes, offsets, raw memory: Use size_t. int is often too small for cases like this. size_t is “safer” in that overflow is defined. This is approximately the position of the Chromium C++ style guide.

In the above code that’s computing the buffer size, there are a few reasons to avoid int.

First of all, int might be too small, depending on the use case (e.g. generic libraries).

Secondly, it’s risky to use int because it’s easier to invoke dangerous UB. Since int overflow is UB, you risk triggering miscompiles (e.g. if guards being compiled out) if you accidentally overflow. This has caused many subtle security bugs.

In general, reasoning about overflow is more complicated with int than with size_t, since size_t has fully defined behavior.

However there are downsides of size_t. Notably, you can’t assert it as non-negative.
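A small sketch of what that downside means in practice. The function names are invented for the example; the point is only where the assert can go.

```cpp
#include <cassert>
#include <cstddef>

// Unchecked unsigned subtraction: if used > capacity, the result wraps to a
// huge positive value instead of going negative.
std::size_t remainingUnchecked(std::size_t capacity, std::size_t used)
{
    return capacity - used;
}

// With int, the same mistake shows up as a negative number, which an assert
// on the result can catch.
int remainingSigned(int capacity, int used)
{
    const int remaining = capacity - used;
    assert(remaining >= 0);  // fires in debug builds if used > capacity
    return remaining;
}

// With size_t, asserting "result >= 0" would always pass, so the check has to
// compare the operands before subtracting.
std::size_t remainingUnsigned(std::size_t capacity, std::size_t used)
{
    assert(used <= capacity);
    return capacity - used;
}
```

So the non-negativity check is still possible with size_t, but it moves from the result to the inputs, and it is easier to forget.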

What about unsigned int?

Rule of thumb: Don’t use this. It’s a worse size_t because it’s typically narrower and its width varies by platform, and a worse int because it can’t be asserted for non-negative.

What about ssize_t?

This is a size_t that can also contain negative values for error codes.

It seems somewhat useful, but is also error-prone to use, as it must always be checked for negativity before use as a size_t, otherwise you’ll have the classic issue where it produces a dangerously large unsigned variable.
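Here is a rough illustration of that pitfall. It assumes a POSIX system (ssize_t is not standard C++), and readLengthOrError is a made-up stand-in for any API that returns a length or -1.

```cpp
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX, not standard C++)

// Hypothetical API in the ssize_t style: returns a length, or -1 on error.
ssize_t readLengthOrError(bool simulateFailure)
{
    return simulateFailure ? -1 : 128;
}

// Bug: the error value is converted straight to size_t, so -1 becomes the
// largest possible size_t value.
std::size_t lengthUnchecked(bool simulateFailure)
{
    return static_cast<std::size_t>(readLengthOrError(simulateFailure));
}

// Fix: test for negativity first, then convert.
bool lengthChecked(bool simulateFailure, std::size_t* outLen)
{
    const ssize_t n = readLengthOrError(simulateFailure);
    if (n < 0) return false;
    *outLen = static_cast<std::size_t>(n);
    return true;
}
```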

In general, it seems like one should prefer explicit distinction between error state and integer state (e.g. bool return, and size_t out parameter, or std::optional<size_t>) over this.

Which types to choose for the example

The input arguments should be size_t, since we’re dealing with buffers, offsets, and raw memory.

The return value is a bit of a trick question. Any function like this must be able to return either a value, or an error if overflow happened.

The core return value should be size_t. But then how to express error in case of overflow?

std::optional<size_t> is a good choice if using C++.

ssize_t is an option, if using C.

But changing the signature to return a bool, and then having an out size_t parameter might be even better, and is the pattern used by the compiler builtins.
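For comparison, here is a sketch of the std::optional<size_t> signature next to the bool-plus-out-parameter one, with the second written as a thin adapter over the first. This is one possible arrangement, assuming C++17; note that my adapter returns true on success, whereas the compiler builtins return true on overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// std::optional flavor: an empty result means the computation overflowed.
std::optional<std::size_t> computeBufferSize(std::size_t headerLen,
                                             std::size_t numElements,
                                             std::size_t elementSize)
{
    if (elementSize != 0 && numElements > SIZE_MAX / elementSize) return std::nullopt;
    const std::size_t payload = numElements * elementSize;

    if (headerLen > SIZE_MAX - payload) return std::nullopt;
    return headerLen + payload;
}

// bool + out-parameter flavor, usable from C-style callers.
// Returns true on success; *outSize is untouched on failure.
bool computeBufferSize(std::size_t headerLen, std::size_t numElements,
                       std::size_t elementSize, std::size_t* outSize)
{
    const std::optional<std::size_t> size =
        computeBufferSize(headerLen, numElements, elementSize);
    if (!size.has_value()) return false;
    *outSize = *size;
    return true;
}
```

With either shape, the caller cannot get at the size without first going through the success check, which is the main advantage over ssize_t.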

osdev journal: The virtual memory unit test experiment

In streams 97-100 or so, I start a basic implementation of code that manages the page tables of an address space. I figured this is fairly generic data structure code in the end, so it might be useful to write unit tests for it.

In the end the experiment sort of succeeded, but I’ve also abandoned the strong unit testing focus for the foreseeable future.

The theory was mostly correct that this code is fairly generic data structure code, with some exceptions.

The code needs to allocate physical memory frames (in production), for page tables. It uses a frame allocator in the kernel to do so. That’s not available in the userspace unit tests, so that needs to be mocked out by a mock physical memory allocator.

Also, there’s address conversion. The VM code needs to convert between virtual and physical addresses because page tables store physical frame numbers. Those have to be converted back to virtual addresses in order to work with the data in the kernel, e.g. when walking the page tables.

These were a bit annoying to mock, but overall the approach worked and I could write unit tests for the map function.
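To give a feel for the two mocks described above, here is a rough userspace sketch. It is not the code from the streams: the interface, the class names, the 4096-byte page size, and the fake address scheme are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size

// Hypothetical seam between the page-table code and its environment.
class PhysicalMemory {
public:
    virtual ~PhysicalMemory() = default;
    // Returns the physical address of a fresh, zeroed frame.
    virtual std::uintptr_t allocFrame() = 0;
    // Converts a physical address into a dereferenceable pointer,
    // or nullptr if the address does not belong to a known frame.
    virtual void* physToVirt(std::uintptr_t phys) = 0;
};

// Userspace mock: frames live on the heap, and "physical" addresses are fake
// numbers of the form (frame index + 1) * kPageSize, so address 0 stays invalid.
class MockPhysicalMemory : public PhysicalMemory {
public:
    std::uintptr_t allocFrame() override
    {
        frames_.push_back(std::make_unique<Frame>());  // value-initialized: zeroed
        return frames_.size() * kPageSize;
    }

    void* physToVirt(std::uintptr_t phys) override
    {
        const std::size_t slot = phys / kPageSize;
        if (slot == 0 || slot > frames_.size()) return nullptr;
        return frames_[slot - 1]->data() + (phys % kPageSize);
    }

private:
    using Frame = std::array<std::uint8_t, kPageSize>;
    std::vector<std::unique_ptr<Frame>> frames_;
};
```

Keeping the fake physical addresses distinct from real pointers means a test fails quickly if the code under test ever dereferences a physical address without converting it first.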

But the reason I abandoned the approach in general for the kernel is because there are so many other things that will eventually need to be mocked. TLB flushing after page tables are modified. Other kinds of hardware accesses. And that’s not even to mention timing and concurrency which will play a major role.

A major difference between a kernel and a normal application is the extent to which the code interfaces with “external services”. It’s commonplace to mock out external services, but in app code, there is usually a finite set of them. In the kernel, there are so many touchpoints with hardware and external services that the mock layer would grow huge.

This actively gets in the way of development and causes friction. It’s not to say it’s not worth it, but there are reasons it’s hard to unit test kernels and why higher level system and integration tests are preferred.

Review & tips for Chaos Communication Congress

https://events.ccc.de/congress

I just got back from 39c3.

Review

  • This is, hands down, the largest, most technically sophisticated, most well-organized event I’ve ever been to.
  • 16000 people, 4 days in Hamburg’s CCH convention center
  • While there is massive chaos from the nature of 16k people in one place, with so much to do, it is also the most organized, controlled, and structured form of chaos I’ve ever seen
  • For a sense of this, just watch any of the infra reviews: https://media.ccc.de/v/39c3-infrastructure-review
  • These people obviously just love taking things to an extreme in terms of how organized, visualized, and well-functioning they can make things. They did it so well that at 39c3, infra teams ran out of problems to solve and started fixing the actual CCH facilities themselves (a broken accessibility ramp). This is also not just digital work, this is real, physical labor and handiwork.
  • The online documentation is incredible. Everything is very clearly written out either on the blog or wiki, or somewhere else. My only nit is that it can sometimes be hard to find info.
  • There is a mind-blowing amount of custom technical infrastructure created for this. Everything from managing and submitting events, to securely selling the actual open tickets (which sell out in seconds), to dashboards for the queue times for checking in (with comparisons to previous years), to dashboards for the CCH power consumption, network traffic, … you name it
  • Massive lines for check-in or coat check are well organized, often with volunteers clearly marking the end of the line, or at least some kind of laminated paper held by the last person in line and passed on when new people join the line.
  • Talks are all live-streamed AND often have live-translation in multiple languages by volunteers AND are all archived on https://media.ccc.de
  • Incredible focus on accessibility. They add custom blind-accessible braille bathroom maps in front of all bathrooms, just for the congress(?)
  • There is a locally running phone system (?) and even GSM (mobile phone network) running at the event
  • There is on-site security, autism support, accessibility support, etc.
  • There is complete automated infrastructure for volunteers: for posting jobs that need to be done, for signing up for work, and for tracking that it’s done. Plus break room and food for volunteers (angels).
  • The queues are super-optimized for speed, e.g. check in and baggage claim.

Tips

  • The food on site is somewhat expensive and the portions are small. Buy a few sandwiches and other items at Dammtor station on the way in to last you for the day.
  • There is apparently c3print printing service on site if you need to print flyers (e.g. for a meetup)
  • If you are organizing a meetup last minute, a long table in the food/bar area is a decent option. If you go a bit early, you can often claim an entire table for the meetup. This is nice as it requires no reservation (the room reservations easily get booked out), and allows for standing space with a table so people can mingle around and laptops can be shown.
  • Early check in to get your wristband opens on Day -1 in the afternoon until late. Go there to avoid a large queue.
  • If you want to go to the opening talk, go a bit early. It can fill up completely.
  • In general you need to be fast for things. With so many people, things are often booked quickly.
  • Consider not connecting to the Wifi, and using mobile data instead. Also consider using Lockdown mode on iOS devices, especially near the Hack Different assembly 😉
  • The line for retrieving coats/luggage gets very long on Day 4 towards the end of the day – reserve up to ~30 min for waiting in line.
  • Coat check and baggage claim can fill up.
  • Everyone leaves and the party ends on Day 4 – no need to book hotel lodging for that night, unless you especially want to stay.
  • The infodesk does not love it if you need to borrow a marker from them for your signs. They’re a bit more accommodating for the special tape needed to hang things up on the walls.
  • Time goes extremely fast during the congress due to all the chaos and things to do.
  • It can be lonely, since there are so many people there, many seemingly with their own friend groups.
  • Talks can sometimes be a waste of time (as with any conference). Prefer other ways of spending time unless you really want to see a talk, want to hang out with someone by seeing a talk together, or want to sit in a comfortable seat, take a break, and watch a talk.
  • It’s easy to get sick. Take immune supplements and consider masking.