Linux Internals: How /proc/self/mem writes to unwritable memory

Introduction

An obscure quirk of the /proc/*/mem pseudofile is its “punch through” semantics. Writes performed through this file will succeed even if the destination virtual memory is marked unwritable. In fact, this behavior is intentional and actively used by projects such as the Julia JIT compiler and rr debugger.
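Here’s a minimal sketch of the effect (error handling omitted; assumes a Linux system where the mapping’s address fits in off_t):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* A page we cannot write to with a normal store (that would segfault). */
    char *page = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Writing through /proc/self/mem at that address succeeds anyway. */
    int fd = open("/proc/self/mem", O_RDWR);
    const char msg[] = "hello";
    pwrite(fd, msg, sizeof msg, (off_t)(uintptr_t)page);

    return page[0] == 'h' ? 0 : 1;  /* the read-only page now contains "hello" */
}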

This behavior raises some questions: Is privileged code subject to virtual memory permissions? In general, to what degree can the hardware inhibit kernel memory access?

By exploring these questions, this article will shed light on the nuanced relationship between an operating system and the hardware it runs on. We’ll examine the constraints the CPU can impose on the kernel, and how the kernel can bypass these constraints.

Double fetches, scheduling algorithms, and onion rings

Most people thought I was crazy for doing this, but I spent the last few months of my gap year working as a short-order cook at a family-owned fast-food restaurant. (More on this here.) I’m a programmer by trade, so I enjoyed thinking about the restaurant’s systems from a programmer’s point of view. Here are some thoughts about two such systems.

What they don’t tell you about demand paging in school

This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.

Keep reading and you’ll learn:

  • Internal details of the Linux kernel’s demand paging implementation
  • How to exploit virtual memory to implement highly efficient sparse data structures
  • What page tables are and how to calculate the memory overhead incurred by them
  • A cute way to get killed by the OOM killer while appearing to consume very little memory (great for parties)

Note: Victor Michel wrote a great follow up to this post here.

How setjmp and longjmp work (2016)

Pretty recently I learned about setjmp() and longjmp(). They’re a neat pair of libc functions which allow you to save your program’s current execution context and resume it at an arbitrary point in the future (with some caveats). If you’re wondering why this is particularly useful, to quote the manpage, one of their main use cases is “…for dealing with errors and interrupts encountered in a low-level subroutine of a program.” These functions can be used for more sophisticated error handling than simple error code return values.

I was curious how these functions worked, so I decided to take a look at musl libc’s implementation for x86. First, I’ll explain their interfaces and show an example usage program. Next, since this post isn’t aimed at the assembly wizard, I’ll cover some basics of x86 and Linux calling convention to provide some required background knowledge. Lastly, I’ll walk through the source, line by line.

osdev journal: bootloaders and booting (grub, multiboot, limine, BIOS, EFI)

Here are my rough lab notes from what I learned during weeks 69-73 of streaming, where I did my “boot tour” and ported JOS to a variety of boot methods.


JOS originally used a custom i386 BIOS bootloader. This is a classic approach for osdev: up to 512 bytes of hand-written 16-bit real-mode assembly packed into the first sector of a disk.

I wanted to move away from this though — I had the sense that using a real third party bootloader was the more professional way to go about this.

Grub

First I ported to Grub, which is a widely used, standard bootloader on Linux systems.

This requires integrating the OS with the Multiboot standard. Grub is actually designed as a reference implementation of a generic boot protocol called Multiboot. The goal is to let different bootloader and operating system implementations transparently interoperate with each other, as opposed to the OS-specific bootloaders that were common at the time of its development.

(Turns out Multiboot never really took off. Linux and the BSDs already had bootloaders and boot protocols and never ported to Multiboot; Grub supports them by implementing their specific boot protocols in addition to Multiboot. I’m not sure any mainstream OS natively uses Multiboot; it’s probably mostly hobby OS projects.)

This integration looks like:

  • Adding a Multiboot header
  • Optionally making use of an info struct pointer in EBX

The Multiboot header is interesting. Multiboot was designed to be binary-format agnostic. While there is native ELF support, OSes need to advertise that they are Multiboot compatible by including magic bytes in the first few KB of their binary, along with some other metadata (e.g. about the architecture). The Multiboot-conforming bootloader will scan for this header. Exotic binary formats can add basic metadata about what load address they need and get a basic form of loading done (probably just memcpying the entire OS binary into place; the OS might need to load itself further from there if it has non-contiguous segments).
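As a concrete illustration, here’s roughly what that header looks like for Multiboot v1 (a minimal sketch in C; in practice it’s often written directly in the boot assembly, and the .multiboot section name plus its placement near the start of the binary via the linker script are assumptions on my part):

#include <stdint.h>

#define MB1_MAGIC 0x1BADB002u
#define MB1_FLAGS 0x0u

struct mb1_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;  /* magic + flags + checksum must sum to zero */
};

/* Placed in its own section so the linker script can keep it within the
   first few KB of the binary, where the bootloader scans for it. */
__attribute__((section(".multiboot"), aligned(4), used))
static const struct mb1_header multiboot_header = {
    MB1_MAGIC,
    MB1_FLAGS,
    -(MB1_MAGIC + MB1_FLAGS),
};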

Then, for Multiboot v1, the OS receives a pointer to an info struct in EBX. This contains useful information provided by the bootloader (CLI args, memory maps, etc.), which is the second major reason to use a third-party bootloader.

There are two versions of the Multiboot standard. V1 is largely considered obsolete and deprecated because this method of passing a struct wasn’t extensible in a backward compatible way. An OS coded against a newer version of the struct (which might have grown) would possibly crash if loaded against an older bootloader that only provided a smaller struct (because it might dereference struct offsets that go out of bounds of the given struct).

So the Multiboot V2 standard was developed to fix this. Instead of passing a struct, it uses a TLV format where the OS receives an array of tagged values, and can interpret only those whose tags it’s aware of.
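Consuming that looks roughly like this (a sketch; the field layout follows the Multiboot2 spec, but the struct and function names here are illustrative, not from a real header):

#include <stdint.h>

struct mb2_tag {
    uint32_t type;  /* a tag with type 0 terminates the list */
    uint32_t size;  /* size of this tag, including this 8-byte header */
};

void parse_boot_info(void *info) {
    /* info points at: uint32_t total_size; uint32_t reserved; then the tags. */
    uint8_t *p = (uint8_t *)info + 8;
    for (;;) {
        struct mb2_tag *tag = (struct mb2_tag *)p;
        if (tag->type == 0)
            break;
        /* Handle the tag types we know about (memory map, cmdline, ...)
           and silently skip the rest. */
        p += (tag->size + 7) & ~(uint32_t)7;  /* tags are 8-byte aligned */
    }
}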

The build process is also a bit nicer with Grub compared with a custom bootloader. Instead of creating a “disk image” by concatenating a 512-byte assembly block and my kernel, with Grub you can use an actual filesystem.

You simply create a directory with a specific directory structure, then you can use grub-mkrescue to convert that into an .iso file with some type of CD-ROM based filesystem format. (Internally it uses xorriso). You can then pass the .iso to QEMU with -cdrom instead of -drive as I was doing previously.
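Roughly like this (a sketch; the kernel name, paths, and grub.cfg contents are illustrative):

mkdir -p isodir/boot/grub
cp kernel.elf isodir/boot/
cp grub.cfg isodir/boot/grub/   # a menuentry with a multiboot2 line pointing at /boot/kernel.elf
grub-mkrescue -o os.iso isodir
qemu-system-i386 -cdrom os.iso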

Limine

Limine is a newer, modern bootloader aimed at hobby OS developers. I tried it out because it’s very popular, which I now think is well deserved. In addition to implementing essentially every boot protocol, it includes its own native boot protocol with some advanced features like automatic SMP setup, which is otherwise fairly involved.

It uses a similar build process to grub-mkrescue: creating a special directory structure and running xorriso to produce an iso.

I integrated against Limine, but kept my OS as Multiboot2 since Limine’s native protocol only supported 64 bit.

BIOS vs UEFI

Everything I’ve mentioned so far has been in the context of legacy BIOS booting.

Even though I ported away from a custom bootloader to these fancy third-party ones, I’m still using them in BIOS mode. I don’t know exactly what’s in these .iso files, but that means they must populate the first 512 bytes of the media with their own version of the 16-bit real-mode assembly, and bootstrap from there.

But BIOS is basically obsolete — the modern way to boot a PC is UEFI.

The nice thing about integrating against a mature third-party bootloader is that it abstracts the low-level boot interface for you. So all you need to do is target Grub or Limine, and then you can (nearly) seamlessly boot from either BIOS or UEFI.

It was fairly easy to get this working with Limine, because Limine provides prebuilt UEFI binaries (BOOTIA32.EFI) and has good documentation.

The one tricky thing is that QEMU doesn’t come with UEFI firmware by default, unlike with BIOS (where SeaBIOS is included). So you need to get a copy of OVMF to pass to QEMU to do a UEFI boot. (Conveniently, there are prebuilt OVMF binaries available from the Limine author.)
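Once you have an IA32 OVMF build, the QEMU invocation is something like this (a sketch; the firmware filename is illustrative):

qemu-system-i386 -bios ovmf-ia32.fd -cdrom os.iso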

I failed at getting UEFI booting with Grub to work on my macOS based dev setup, because I couldn’t easily find a prebuilt Grub BOOTIA32.EFI. There is a package on apt, but I didn’t have a Linux machine quickly available to investigate if I could extract the file out of that.

Even though UEFI is the more modern standard, I’m proceeding with just using BIOS simply to avoid dealing with the whole OVMF thing.

Comparison table

Custom

Pros:
  • No external dependency
Cons:
  • More finicky custom code to support, more surface area for bugs
  • Doable to get the basics working, but nontrivial effort is required to reimplement the more advanced features of Grub/Limine (a boot protocol, CLI args, memory map, etc.)
  • No UEFI support

Grub

Pros:
  • Well tested, industrial strength
  • Available prebuilt from Homebrew
  • Simple build process: a single i386-grub-mkrescue invocation creates the iso
Cons:
  • Difficult to get working in UEFI mode on Mac (hard to find a prebuilt BOOTIA32.EFI)

Limine

Pros:
  • Good documentation
  • Easy to get working for both BIOS and UEFI
  • Supports Multiboot/Multiboot2; near drop-in replacement for Grub
  • Can opt into its custom boot protocol with advanced features (SMP bringup)
Cons:
  • Not used industrially, mostly for hobby osdev
  • Not packaged in Homebrew, requires building its tool from source (but this is trivial)

osdev journal: Gotchas with cc-runtime/libgcc

libclang_rt and libgcc are implementations of the compiler runtime support library: routines the compiler occasionally emits calls into instead of inlining the codegen directly. Usually this is for software implementations of math operations (division/mod). Generally you’ll need a runtime support library for all but the most trivial projects.
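For example, 64-bit division on i386 is one such case (a minimal sketch; the exact symbol depends on the target and runtime, e.g. __udivdi3 in libgcc/compiler-rt):

#include <stdint.h>

/* On a 32-bit x86 target there is no single instruction for 64-bit division,
   so this typically compiles to a call into the runtime support library
   (something like `call __udivdi3`) rather than inline division code. */
uint64_t div64(uint64_t a, uint64_t b) {
    return a / b;
}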

cc-runtime is a utility library for hobby OS developers. It is a standalone version of libclang_rt that can be vendored into an OS build.

The main advantage for me is that it lets me use a prebuilt clang from Homebrew. The problem with the prebuilt clang from Homebrew is that it doesn’t come with a libclang_rt compiled for i386 (which makes sense, why would it — I’m on an ARM64 Mac).

(This is unlike the prebuilt i386-elf-gcc in Homebrew, which does come with a prebuilt libgcc for i386).

Since it doesn’t come with libclang_rt for i386, my options are:

  • Keep using libgcc from i386-elf-gcc in Homebrew
    Undesirable — the goal is to only depend on one toolchain, and here I’d depend on both clang and gcc.
  • Build clang and libclang_rt from source
    Undesirable — it’s convenient to avoid building the toolchain from source if possible.
  • Vendor in a libgcc.a binary from https://codeberg.org/osdev/libgcc-binaries
    Undesirable — vendoring binaries should be a last resort.
  • Use cc-runtime
    Best — no vendored binaries, no gcc dependency, no building the toolchain from source.

However, cc-runtime does have a gotcha. If you’re not careful, you’ll balloon your binary size.

This is because the packed branch of cc-runtime (which is the default and easiest to integrate) packs all the libclang_rt source files into a single C file, which produces a single .o file. So the final .a library has a single object file in it.

This is in contrast to libgcc.a (or a typical compilation of libclang_rt) where the .a library probably contains multiple .o files — one for each .c file.

By default, linkers only pull in the .o files from a .a library that are actually needed. But since cc-runtime is a single .o file, the whole thing gets included! This means the binary will potentially include many libclang_rt functions that are unused.

In my case, the size of one of my binaries went from 36k (libgcc) to 56k (cc-runtime, naive).

To work around this, one option is to use the trunk branch of cc-runtime (which doesn’t pack everything into one .c file). This is ~30 .c files and slightly more annoying to integrate into the build system.

Or, you can use some compiler/linker flags to make the linker optimization more granular and work at the function level, instead of the object file level.

Those are:

Compiler flags:

-ffunction-sections -fdata-sections

Linker flag:

--gc-sections
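In a Makefile-style build, that might look something like this (a sketch; if you link through the compiler driver instead of invoking ld directly, the linker flag is passed as -Wl,--gc-sections):

CFLAGS  += -ffunction-sections -fdata-sections
LDFLAGS += -Wl,--gc-sections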

With this, my binary size reduced to 47k. So there is still a nontrivial size increase, but the situation is slightly improved.

Ultimately, my preferred solution is the first: to use the trunk branch. The build integration is really not that bad, and the advantage is you don’t need to remember to use the special linker flag, which you’d otherwise need to ensure is in any link command for any binary that links against cc-runtime.

That said, those compiler/linker flags are probably a good idea to use anyway, so the best solution might be to do both.

Idea pools: A simple AI metaphor (WIP)

(A WIP sketch about AI, productivity, tech)


At work, every so often the product teams take a break from normal work and do a ‘hack sprint’ for working on creative, innovative ideas that aren’t necessarily relevant to hot topics in the main work streams.

This time, many of the designers used AI tools to generate code and build prototypes. Normally, they would have required a developer to collaborate with.

In the end, there were simply more hacks done than there otherwise would have been. So in this local scope, AI didn’t put devs “out of a job” in the hack sprint just because designers no longer needed them.

Instead it just allowed the same fixed pool of people to make more things happen, pulling more ideas into reality, from the infinitely deep idea pool, than before.


The “infinitely deep idea pool” is my preferred mental model here.

There are people on one end, the pool on the other, and the people can pull ideas out of the pool into reality at a fixed rate.

Here, productivity is defined as “ideas pulled, per person, per second”.

Improvements to tech increase that “idea pull” rate.

People become redundant when technology improves productivity and the goal is just to maintain the status quo: a smaller number of people with higher productivity can pull the same number of ideas as the previously larger, less productive team.

But often, the goal is not to just maintain the status quo. It’s way too tempting to try to outdo it, and push beyond. We want to pull more ideas out of the pool, which is always possible because the idea pools are infinitely deep.

And if that’s true, then no one becomes redundant — the team could use as much help as it can get to pull more ideas out. (People * Productivity = Ideas Pulled Per Second) This is the phenomenon I observed in the hack sprint.

But that’s an if. Some organizations might be fine to maintain the status quo, or only grow it a small amount, relative to the productivity increase. Then real redundancy is created.

But that’s only in the local scope of that organization. In the global scope, the global idea pool from which we all draw is infinite — there will always be ideas for someone to pull.


This metaphor can help explain why technological advancements haven’t yielded the relaxation and leisure promised by past futurists. In order to really benefit like this, you need to hold your needs constant (maintain the status quo) while productivity improves. And that’s very difficult to do.

Tips for networking

I’m not a pro, but here’s what I’ve learned along the way:


Randomly add value to the lives of the people you want to keep in touch with.

This looks like:

  • Meet interesting people
  • Learn what they care about
  • Watch out for related things you see
  • Send them their way

These “things” can be serious, like useful tools, apps, or news — or can be silly, like memes.


Simply check in once in a while.


  • People want to help you, but you need to put in the work too
  • Craft good, compelling, detailed requests for help, as opposed to lazy asks.
  • If someone does something nice for you — like making an intro — always follow up with them and let them know how things went.

VIM tips + lab notes

Underrated commands:

z commands

  • z<cr> – redraw with current line at the top
  • zz – redraw with current line at middle

H and L — great for quickly jumping to the top or bottom of the visible window and browsing slightly offscreen content

Enable VSCode’s “Editor smooth scrolling” setting, or find a vim smooth-scrolling plugin, to make ctrl+d and ctrl+u actually usable

[[ and ]] — great for quickly jumping between opening braces while browsing a file

Diminishing returns of worrying

Writing this just because I’ve never heard anyone talk about it before:

Worrying about things has increasing, diminishing, and negative returns, just like anything else.

The increasing returns are when a bit of worrying about something causes you to prepare for a situation or otherwise act differently in a way that benefits you.

But after a point, the worrying starts to saturate. You’re already aware of the potential problem, and more worrying doesn’t necessarily help you become more prepared or choose better actions.

Lastly, worrying even more can actively harm you. Maybe it causes undue stress or prompts you to make poor investments, relative to the likelihood of the event you’re worrying about.

So worry, but just enough.

How to be consistent and achieve success

I think the most important part of achieving consistency is detaching yourself from the outcome of each individual work session, whatever that might be. Here are some example ‘work sessions’ from my life:

  • A particular workout
  • Releasing a song
  • Releasing a blog post
  • Doing a stream
  • Making a YouTube video

Attaching yourself to the outcome (e.g. number of views) will only set you up for failure, since inevitably one of the work sessions will ‘flop’.

To detach yourself from individual outcomes, you have to love the long-term journey of whatever you’re doing. The absolute most important part is simply being there, day after day, week after week, over a long period of time.

This can be compressed down to “Showing Up = Winning”.

If you can reframe the requirement for “winning” from “getting a lot of views” or “breaking my personal record” to simply “I showed up”, you give yourself a massive psychological advantage.


P.S. One extra tip:

An extra tip for the creatives: Reframe each release as another piece of progress in building your large public body of work. It may not be today, but someday, your large public body of work will make it all happen for you — and every release is a step towards that, no matter how “well” it does.

P.S. Another tip

Establish the habit by simply doing the activity at the same time each week/day and scheduling your life around that as much as possible. Ideally find a time slot with the least contention against other things that come up.

For streaming, I found that Sunday afternoons were usually free and didn’t compete too much against other plans.

But the “scheduling your life around it” is where the rubber really meets the road. That’s where you prove to yourself that this is a high priority by putting your time where your mouth is.

Tips for going to conferences alone

Going to a conference alone can be an intimidating experience, but it’s completely doable (I’ve done it many times). Here are my tips:

Optional: Look people up ahead of time and reach out

If you can, try to research ahead of time people who will be attending the conference and reach out online with a LinkedIn or Twitter message. This might give you a nice head start.

Be friendly, open, and seek out others in your situation

You might be surprised how many other solo attendees are at conferences or conventions. These will be the easiest people to meet as your ‘first friends’ — don’t be afraid to approach and say hello!

Set a goal: Don’t eat dinner alone

If a conference doesn’t include dinner, set an explicit goal for yourself to not have dinner alone.

Actively try to meet people throughout the day, specifically seeking out other solo attendees who might want to get dinner later.

Exchange contact info with people you enjoyed meeting, and float the idea of possibly getting dinner if they don’t already have plans.

Detach politely from uninteresting people

Don’t spend excessively long around people you don’t connect with.

After meeting someone, if you don’t find them very interesting and would prefer to keep mingling, it’s completely acceptable to do so. You can say something like “Well it was great to meet you — I think I’d like to mingle around a bit more. Have a great conference.”

Just try to make one new friend

Don’t set the bar too high for what would make it a successful event for you. For me, if I make even one solid new friend or connection, I consider it a win.

Just try to have one takeaway from talks

This is unrelated to going solo, but like the above tip, I set the bar pretty low for what I aim to get out of talks. If I get even one solid insight, thought, or takeaway, I consider it a win. You’d be surprised how hard it is to get one solid takeaway from some talks.

Volunteer

Volunteering can be a great way to automatically meet people (organizers, other volunteers) and get in contact with well known people in the community.

Make it easy for others to strike up a conversation

You can do yourself a favor by wearing slightly more interesting clothing or accessories than you typically might. For example, for me it might be wearing a shirt for my favorite band. Or maybe something topical for the conference/convention. The goal is to give people something easy to comment on which you can talk about, helping to get a conversation going, or keep one going if you run out of things to talk about.

How to do custom commands/targets in CMake

To run custom build stages in CMake, you’ll often want what I call a “custom command/target pair”:

set(CLEAN_FS_IMG ${CMAKE_CURRENT_BINARY_DIR}/clean-fs.img)
add_custom_target(CleanFsImage DEPENDS ${CLEAN_FS_IMG})
add_custom_command(
    OUTPUT ${CLEAN_FS_IMG}
    COMMAND ${CMAKE_CURRENT_BINARY_DIR}/fsformat ${CLEAN_FS_IMG} 1024 ${FS_IMG_FILES}
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
    DEPENDS fsformat ${FS_IMG_FILES}
    VERBATIM
)

This is an example from my CMake-rewrite of the JOS OS build system. It generates a filesystem image, using a custom host tool (fsformat). The fs image includes a number of files generated by the build process. If any of those files changes, we want to rebuild the fs image.

At the core, we want to have a CMake target, just like an executable or library, that runs an arbitrary command line instead of a compiler.

Surprisingly, we need this “command/target pair” pattern to accomplish this.

add_custom_target() alone is not correct: such a target is always considered out of date. If we put the command there, we will regenerate the fs image every build, even if nothing has changed. This makes custom targets only suitable for helper commands that must always run (e.g. a helper target to run QEMU).

add_custom_command() does implement efficient rebuilding (only rebuilding when an input has changed), but it is also not sufficient alone, because it does not produce a named target.

This is admittedly a matter of personal taste — I prefer having named targets available because it allows manually triggering just this target in a build command. This would not otherwise be possible with a custom command.
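For example (assuming a build directory named build):

cmake --build build --target CleanFsImage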

If you don’t have this requirement, just a custom command could be fine, since you can depend on the output file path elsewhere in your build.

The combination of both produces what I want:

  • A named target that can be independently triggered with a build command
  • A build stage that runs an arbitrary command that only rebuilds when necessary

In other words, what this pair states is:

  • Build CleanFsImage target always
  • When building it, ensure ${CLEAN_FS_IMG} is created/up to date by running whatever operation necessary (i.e. the custom command)
  • Then it’s up to the custom command to decide whether to run the command or not, based on whether it’s necessary

A gotcha

One gotcha to be aware of is with chaining these command/target pairs.

set(FS_IMG ${CMAKE_CURRENT_BINARY_DIR}/fs.img)
add_custom_target(FsImage ALL DEPENDS ${FS_IMG})
add_custom_command(
    OUTPUT ${FS_IMG}
    COMMAND cp ${CLEAN_FS_IMG} ${FS_IMG} 
    # We cannot depend on CleanFsImage target here, because custom targets don't create
    # a rebuild dependency (only native targets do).
    DEPENDS ${CLEAN_FS_IMG}
    VERBATIM
)

In my build, I also have a FsImage target that just copies the clean one. This is the one mounted by the OS, and it might be mutated.

This custom command cannot depend on the CleanFsImage target, but rather must depend on the ${CLEAN_FS_IMG} path directly. That’s because custom targets don’t create a rebuild dependency (unlike native targets), just a build ordering.

In practice, the FsImage wasn’t being regenerated when the CleanFsImage was. To properly create a rebuild dependency, you must depend on the custom command’s output file path.

Pure GNU Make is (unfortunately) not enough

I’d love to simply use GNU Make for my C/C++ projects because it’s so simple to get started with. Unfortunately, it’s lacking a few essential quality-of-life features:

  • Out of tree builds
  • Automatic header dependency detection
  • Recompile on CFLAGS change
  • First class build targets

Out of tree builds

If you want your build to happen in an isolated build directory (as opposed to creating object files in your source tree), you need to implement this yourself.

It involves a lot of juggling paths. Not fun!

Automatic header dependency detection

In C/C++, when a header file changes, you must recompile all translation units (i.e. object files, roughly) that depend on (i.e. include) that header. If you don’t, the object file will become stale and none of your changes to constants, defines, or struct definitions (for example) will be picked up.

In Make rules, you typically express dependencies between source files and object files, e.g.:

%.o: %.c
  # run compiler command here

This will recompile the object file when the source file changes, but won’t recompile when any headers included by that source file change. So it’s not good enough out of the box.

To fix this, you need to implement it manually by:

  1. Passing the proper flags to the compiler to cause it to emit header dependency information. (Something like -MMD. I don’t know them exactly because that’s my whole point =)
  2. Instructing the build to include that generated dependency info (Something like
    -include $(OBJECTS:.o=.d)
    ). A sketch combining both steps is shown after this list.
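Putting those together, a minimal sketch (assuming GCC or Clang; -MP additionally emits dummy rules so that deleting a header doesn’t break the build):

DEPFLAGS = -MMD -MP

%.o: %.c
	$(CC) $(CFLAGS) $(DEPFLAGS) -c $< -o $@   # recipe lines must be tab-indented

-include $(OBJECTS:.o=.d)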

The generated dependency info looks like this:

pmap.c.obj: \
 kern/pmap.c \
 inc/x86.h \
 inc/types.h \
 inc/mmu.h \
 inc/error.h \
 inc/string.h \

Recompile on CFLAGS change

In addition to recompiling if headers change, you also want to recompile if any of your compiler, linker, or other dev tool flags change.

Make doesn’t provide this out of the box either; you’ll have to implement it yourself.

This is somewhat nontrivial. For an example, check out how the JOS OS (from MIT 6.828 (2018)) does it: https://github.com/offlinemark/jos/blob/1d95b3e576dd5f84b739fa3df773ae569fa2f018/kern/Makefrag#L48
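The rough idea: write the flags to a stamp file that is only rewritten when they differ from last time, and make the object files depend on that stamp. A sketch of the approach (not JOS’s exact code):

.cflags: FORCE
	@echo '$(CFLAGS)' | cmp -s - $@ || echo '$(CFLAGS)' > $@

FORCE:

%.o: %.c .cflags
	$(CC) $(CFLAGS) -c $< -o $@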

First class build targets

In general, it’s nice to have build targets as a first-class concept. A target expresses which source files it compiles and which include paths are needed to reach its headers. Targets can depend on each other and seamlessly access another target’s headers (the build system makes sure all the -I flags are passed correctly).

This is also something you’d have to implement yourself, and there are probably limitations to how well you can do it in pure Make.


Make definitely has its place for certain tasks (easily executing commonly used command lines), but I find it hard to justify using it for anything non-trivially sized compared to more modern alternatives like CMake, Meson, Bazel, etc.

That said, large projects like Linux use it, so somehow they must make it work!