Featured post

mem writes to unwritable memory

Introduction

An obscure quirk of the /proc/*/mem pseudofile is its “punch through” semantics. Writes performed through this file will succeed even if the destination virtual memory is marked unwritable. In fact, this behavior is intentional and actively used by projects such as the Julia JIT compiler and rr debugger.

This behavior raises some questions: Is privileged code subject to virtual memory permissions? In general, to what degree can the hardware inhibit kernel memory access?

By exploring these questions¹, this article will shed light on the nuanced relationship between an operating system and the hardware it runs on. We’ll examine the constraints the CPU can impose on the kernel, and how the kernel can bypass these constraints.

Double fetches, scheduling algorithms, and onion rings

Most people thought I was crazy for doing this, but I spent the last few months of my gap year working as a short order cook at a family-owned fast-food restaurant. (More on this here.) I’m a programmer by trade, so I enjoyed thinking about the restaurant’s systems from a programmer’s point of view. Here are some thoughts about two such systems.

What they don’t tell you about demand paging in school

This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.

Keep reading and you’ll learn:

Internal details of the Linux kernel’s demand paging implementation
How to exploit virtual memory to implement highly efficient sparse data structures
What page tables are and how to calculate the memory overhead incurred by them
A cute way to get killed by the OOM killer while appearing to consume very little memory (great for parties)

Note: Victor Michel wrote a great follow up to this post here.

How setjmp and longjmp work (2016)

Pretty recently I learned about setjmp() and longjmp(). They’re a neat pair of libc functions which allow you to save your program’s current execution context and resume it at an arbitrary point in the future (with some caveats¹). If you’re wondering why this is particularly useful, to quote the manpage, one of their main use cases is “…for dealing with errors and interrupts encountered in a low-level subroutine of a program.” These functions can be used for more sophisticated error handling than simple error code return values.

I was curious how these functions worked, so I decided to take a look at musl libc’s implementation for x86. First, I’ll explain their interfaces and show an example usage program. Next, since this post isn’t aimed at the assembly wizard, I’ll cover some basics of x86 and Linux calling convention to provide some required background knowledge. Lastly, I’ll walk through the source, line by line.

Removing dates from post URLs

Originally I used a YYYY/MM/DD/<slug> URL scheme for my blog, which felt nice since it creates namespacing and gives some date context about a blog post simply from the URL.

However, I eventually removed all date context from the URLs entirely. Namespacing isn’t a real benefit in practice (name collisions are rare) and neither is date context. I also found it annoying that I couldn’t type post URLs from memory, which is occasionally useful. Shorter URLs are also often a plus.

To migrate to this new URL scheme without breaking links, I used the “Redirection” WordPress plugin. Yet another reason why I like WordPress.

This simple no-date naming scheme is also inspired by bloggers like Paul Graham, Patrick Collison, and Sam Altman.

blog reflection: After a few years of blogging, I slightly regret using the YYYY/MM/DD/<slug> URL scheme

The main reason is that it's occasionally nice to be able to directly type out links to popular posts from memory
— Mark (@offlinemark) March 14, 2024

Release it and move on to the next

A common artist pitfall is getting too stuck on a particular piece, which builds high expectations for it when it’s eventually released. It can be painful if that piece isn’t appreciated like you hoped it would be.

A remedy is to zoom out and maintain a global perspective over all the art you’ll release in your life. In the grand scheme, this one piece is hopefully a drop in the bucket of all the many other pieces you’ll make, some of which (hopefully) will be many times better than the one you’re stuck on. Staying stuck on one prevents you from moving forward to creating those amazing future works.

This advice doesn’t apply to every artist, but I think it does for many: Release it and move on to the next.

i just get so excited to think of all the art i'll be able to make in my lifetime
— comfort (@sendcomfort) May 16, 2024

x86 kernel development lab notes

Here’s what I know about x86 kernel development. The usual caveat applies for my lab notes: this is not considered a high quality document and there may be inaccuracies.

Main Processor Modes

Real Mode (16 bit)
- CPU boots into this mode for backward compatibility
- The IDT is instead the IVT here (Interrupt Vector Table)
- Legacy BIOS booting begins here — the BIOS loads the first sector of disk into memory at a fixed address and begins executing it in Real Mode.
Protected Mode (32 bit)
- Segmentation is mandatory — a minimal GDT is necessary.
- Paging is not mandatory
Long Mode (64 bit)
- Paging is mandatory

Segmentation

Originally solved the problem of CPUs having more physical memory than could be addressed with 16 bit registers. (Note, this is the opposite situation of what we have today where virtual address spaces are vastly greater than physical ones)
Introduces the concept of “segments”, which are variable length “windows” into a larger address space.
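
To make the “window” idea concrete, here is a minimal sketch of Real Mode address formation. This is my own illustration and not part of the notes above: the function name is made up, and it ignores the wraparound behaviour that depends on the A20 line. The CPU shifts the 16 bit segment value left by 4 bits and adds the 16 bit offset.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for a 16 bit segment:offset pair in Real Mode."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Each segment is a 64 KiB window that starts at segment * 16.
    return (segment << 4) + offset


# Two different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x07C0, 0x0000)))  # 0x7c00
print(hex(real_mode_address(0x0000, 0x7C00)))  # 0x7c00
# The highest address that can be formed is a little past 1 MiB.
print(hex(real_mode_address(0xFFFF, 0xFFFF)))  # 0x10ffef
```

So 16 bit registers can reach roughly 1 MiB, instead of the 64 KiB a single 16 bit pointer could address.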

Data structures

GDT (Global Descriptor Table)

Contains “descriptors” to describe the memory segments (“windows”) available. Segment registers effectively contain an index into this table.
There are “normal” descriptors which describe memory segments and “system” descriptors which point to more exotic things, like Task State Segments (TSS) or Local Descriptor Tables (LDT)
These days OSs use the GDT as little as possible, only as much as strictly necessary. On 32 bit, this looks like 4 entries that start at base 0x0, and cover the entire 32 bit space. 2 for kernel, 2 for user — 1 for code, 1 for data for each. (?)
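On 64 bit, segmentation is mostly switched off rather than absent: a small GDT is still required (for code segment and TSS descriptors), but base and limit are ignored for nearly all segment registers. The exceptions are FS and GS, whose base addresses still apply and can be set through dedicated MSRs. OSs typically use them for thread-local and per-CPU data.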
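
As a rough sketch of what such a flat 32 bit GDT could look like, the snippet below packs the 8 byte descriptor layout with Python’s struct module. The helper names and the order of the entries are my own assumptions (real kernels order their GDTs differently), and entry 0 is the null descriptor the CPU requires in addition to the 4 entries mentioned above.

```python
import struct


def encode_segment_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack one legacy 8 byte code/data segment descriptor (little-endian)."""
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF):
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    if not (0 <= access <= 0xFF and 0 <= flags <= 0xF):
        raise ValueError("access must fit in 8 bits and flags in 4 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                # limit bits 0..15
        base & 0xFFFF,                 # base bits 0..15
        (base >> 16) & 0xFF,           # base bits 16..23
        access,                        # present bit, DPL, segment type
        (flags << 4) | (limit >> 16),  # flags nibble + limit bits 16..19
        (base >> 24) & 0xFF,           # base bits 24..31
    )


def make_selector(index: int, rpl: int) -> int:
    """Value loaded into a segment register: GDT index plus requested privilege level."""
    return (index << 3) | rpl


# Flat model: base 0, limit 0xFFFFF counted in 4 KiB units (flags 0xC = 4 KiB granularity, 32 bit).
FLAT = {"base": 0, "limit": 0xFFFFF, "flags": 0xC}
gdt = [
    bytes(8),                                        # entry 0: null descriptor
    encode_segment_descriptor(access=0x9A, **FLAT),  # entry 1: kernel code (ring 0)
    encode_segment_descriptor(access=0x92, **FLAT),  # entry 2: kernel data (ring 0)
    encode_segment_descriptor(access=0xFA, **FLAT),  # entry 3: user code (ring 3)
    encode_segment_descriptor(access=0xF2, **FLAT),  # entry 4: user data (ring 3)
]

for index, descriptor in enumerate(gdt):
    print(index, f"{int.from_bytes(descriptor, 'little'):#018x}")
print(hex(make_selector(1, 0)), hex(make_selector(3, 3)))
```

With this layout the kernel code selector is 0x08 and the user code selector is 0x1b. All four segments cover the same 4 GiB, so the only things that differ between them are the privilege level and the code/data type.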

LDT (Local Descriptor Table)

My understanding is LDTs are really no longer used by nearly any OS. Some parts of segmentation are still required by OSs, like the GDT, but LDT is not required and almost completely unused in modern OSs.
These would contain segments only accessible to a single task, unlike the regions in the GDT (?)

IDT (Interrupt Descriptor Table)

Interrupts: Generally externally triggered, i.e. from hardware devices
Exceptions: Internally generated, e.g. a division by zero exception, or a software breakpoint
When the processor receives an interrupt or exception, it handles that by executing code — interrupt handler routines.
These routines are registered via the IDT — an array of descriptors that describes how to handle a particular interrupt.
Interrupts/exceptions have numbers which directly map to entries in the IDT.
IDT descriptors are a polymorphic structure. There are three kinds of entries: interrupt, trap, and task gates. (Call gates are a related structure, but they live in the GDT/LDT rather than the IDT.)
Interrupt/trap gates are nearly identical and differ only in their handling of the interrupt flag. They contain a pointer to code to execute. This is expressed via a segment/offset.
Task gates make use of HW task switching and offer a more “turnkey” solution for running code in a separate context when an interrupt happens — but generally aren’t used for other reasons (?). Context switch is automatic?
Task gates in the IDT point to a TSS descriptor in the GDT, which points to a TSS (?)
Some interrupt/exceptions have an associated error code, some don’t.
Interrupt gates describe the minimum privilege level required to trigger them with an int instruction. This prevents userspace from triggering arbitrary interrupts with int. If userspace tries to trigger an int without permission, that is converted into a General Protection fault (#GP)
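
Here is a small sketch of how a 32 bit interrupt or trap gate could be packed, plus the privilege check described in the last point. The function names, the handler address, and the selector value 0x08 are assumptions for illustration only.

```python
import struct

INTERRUPT_GATE_32 = 0xE  # CPU clears the interrupt flag on entry
TRAP_GATE_32 = 0xF       # CPU leaves the interrupt flag unchanged


def encode_gate(handler: int, selector: int, gate_type: int, dpl: int) -> bytes:
    """Pack one 8 byte 32 bit IDT gate: code segment selector + handler offset."""
    if not (0 <= handler <= 0xFFFFFFFF and 0 <= selector <= 0xFFFF):
        raise ValueError("handler must fit in 32 bits and selector in 16 bits")
    if gate_type not in (INTERRUPT_GATE_32, TRAP_GATE_32) or not (0 <= dpl <= 3):
        raise ValueError("unsupported gate type or privilege level")
    type_attr = 0x80 | (dpl << 5) | gate_type  # present bit, DPL, gate type
    return struct.pack(
        "<HHBBH",
        handler & 0xFFFF,          # handler offset bits 0..15
        selector,                  # code segment the handler runs in
        0,                         # reserved byte
        type_attr,
        (handler >> 16) & 0xFFFF,  # handler offset bits 16..31
    )


def software_int_allowed(cpl: int, gate_dpl: int) -> bool:
    """An `int n` instruction is allowed only if CPL <= gate DPL; otherwise the CPU raises #GP."""
    return cpl <= gate_dpl


# Hypothetical handler at 0x00101234, running in a kernel code segment at selector 0x08.
page_fault_gate = encode_gate(0x00101234, 0x08, INTERRUPT_GATE_32, dpl=0)
syscall_gate = encode_gate(0x00105678, 0x08, TRAP_GATE_32, dpl=3)

print(f"{int.from_bytes(page_fault_gate, 'little'):#018x}")
print(software_int_allowed(cpl=3, gate_dpl=0))  # userspace cannot `int` into this gate
print(software_int_allowed(cpl=3, gate_dpl=3))  # but it can use the syscall gate
```

The DPL check only applies to software-triggered interrupts. A real page fault raised by the CPU still reaches the DPL 0 gate even when the faulting code was running in userspace.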

Hardware task switching

Although long considered obsolete in favor of software task/context switching, x86 provides significant facilities for modeling OS “tasks” at the hardware level, including functionality for automatic context switching between them.
Hardware task switching may require copying much more machine state than is necessary. Software context switches can be optimized to copy less and be faster, which is one reason why they’re preferred.
Hardware task switching puts a fixed limit on the number of tasks in the system (?)

TSS (Task State Segment)

This is a large structure modeling a “task” (thread of execution)
Contains space for registers & execution context
Even if HW task switching is not used, one TSS is still needed as the single HW Task running on the system, which internally implements all software context switching
TSS is minimally used for stack switches when handling interrupts across privilege levels — when switching from userspace to kernel during interrupt, kernel stack is taken from TSS
Linux task_struct is probably named with “task” due to being originally created for i386
The Task Register (TR) holds a segment selector for the TSS descriptor (in the GDT) of the currently active HW task

The JOS boot process

JOS is the OS used in MIT 6.828 (2018).

Bootloader

JOS includes a small BIOS bootloader in addition to the kernel
The bootloader begins with typical 16 bit Real Mode assembly to do the typical steps to initialize the CPU (Set A20 line, etc)
Transition to protected mode
Set the stack immediately at the start of the code, and transition to C
The kernel is loaded from disk using Port IO
Loaded into physical memory around the 1MB mark, which is generally considered a “safe” area to load into. (Below the 1MB mark has various regions where devices, BIOS data, or other “things” reside and it’s best to not clobber them.)
Call into the kernel entrypoint

Early kernel boot

Receive control from the bootloader in protected mode
Transition to paging
The kernel is linked to run in high memory, starting at 0xf0000000 (KERNBASE)
The transition from segmentation to paging virtual memory happens in a few steps. There’s first an initial basic transition using a set of minimal page tables.
After that transition is made, a basic memory allocator is set up, which is then used to allocate memory for the production page tables which implement the production virtual memory layout used for the rest of runtime.
The minimal page tables contain two mappings:
- 1 – Identity map the first 4MB to itself
- 2 – Map the 4MB region starting at KERNBASE also to the first 4MB
One page directory entry maps a 4MB region, so only two page directory entries are needed
These page tables are constructed statically at compile time
The first identity mapping is critical because without it the kernel would crash immediately after paging is turned on (CR3 loaded and the paging bit set in CR0), because the next instruction would be unmapped. The identity mapping allows the low mem addresses the kernel resides in to remain valid until the kernel can jump to high mem
The assembly there looks a bit strange because the jump appears redundant. But all the asm labels are linked using highmem addresses, so jumping to them transitions from executing in low mem, to executing in high mem, where the kernel will remain executing for the rest of its lifetime.
Set the stack pointer to a global data/BSS section of internal storage within the kernel and enter C code
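
The two mappings can be modeled with a toy two-level translation. This is only a sketch of the idea, not the JOS code: it assumes KERNBASE is 0xF0000000 with 4 KiB pages, and it stores bare frame addresses where real entries also carry permission bits.

```python
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 1024
KERNBASE = 0xF0000000  # assumed JOS value

# One page table that maps 1024 pages onto physical frames 0..1023 (the first 4 MiB).
entry_pgtable = [frame * PAGE_SIZE for frame in range(ENTRIES_PER_TABLE)]

# The page directory only needs two entries, both pointing at that same page table.
entry_pgdir = {
    0: entry_pgtable,               # virtual [0, 4 MiB)                 -> physical [0, 4 MiB)
    KERNBASE >> 22: entry_pgtable,  # virtual [KERNBASE, KERNBASE+4 MiB) -> physical [0, 4 MiB)
}


def translate(va: int) -> int:
    """Walk the toy page tables: top 10 bits pick the directory entry, next 10 the table entry."""
    page_table = entry_pgdir.get(va >> 22)
    if page_table is None:
        raise LookupError(f"page fault at {va:#x}")
    frame = page_table[(va >> 12) & 0x3FF]
    return frame | (va & 0xFFF)


print(KERNBASE >> 22)              # 960: the directory slot used for the high mapping
print(hex(translate(0x00100000)))  # 0x100000: low address, still valid via the identity map
print(hex(translate(0xF0100000)))  # 0x100000: same physical byte seen through the high mapping
```

Both virtual addresses land on the same physical memory, which is why the kernel can keep executing at its low address right after paging is enabled and then jump to its high address.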

Memory allocators

The goal is to transition to a production virtual memory setup
This requires allocating memory for page tables
To build the dynamic page/frame allocator, we start with a basic bump allocator
It starts allocating simply from the end of the kernel in memory. We have access to a symbol for the end of the kernel via a linker script.
Kernel queries the physical memory size of the system and dynamically allocates data structures for the dynamic page/frame allocator. This is an array of structures, one for each available frame of physical memory. These structures have an embedded linked list pointer (intrusive linked list) and a refcount. They are linked together into a linked list, to implement a stack data structure where frames can be popped (when allocating) and pushed (when freeing).
Using this frame allocator, pages for the production page tables are allocated.
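
The two allocators can be sketched as a small simulation. The class names, the kernel end address, the 32 MiB memory size, and the 8 byte size per bookkeeping structure are all assumptions made up for this example. Real JOS also skips reserved regions such as the IO hole, which is left out here.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096


def round_up(value: int, align: int) -> int:
    return (value + align - 1) // align * align


class BumpAllocator:
    """Hands out page-aligned memory starting right after the kernel image; never frees."""

    def __init__(self, kernel_end: int) -> None:
        self.next_free = round_up(kernel_end, PAGE_SIZE)

    def alloc(self, nbytes: int) -> int:
        addr = self.next_free
        self.next_free = round_up(addr + nbytes, PAGE_SIZE)
        return addr


@dataclass
class PageInfo:
    next_free: Optional[int] = None  # intrusive link: index of the next free frame
    refcount: int = 0


class FrameAllocator:
    """One PageInfo per physical frame, linked into a free list that is used as a stack."""

    def __init__(self, nframes: int, first_free_frame: int) -> None:
        self.pages = [PageInfo() for _ in range(nframes)]
        self.free_head: Optional[int] = None
        for frame in range(first_free_frame, nframes):
            self.free(frame)

    def alloc(self) -> Optional[int]:
        frame = self.free_head
        if frame is None:
            return None  # out of memory
        self.free_head = self.pages[frame].next_free  # pop
        self.pages[frame].next_free = None
        return frame

    def free(self, frame: int) -> None:
        if self.pages[frame].refcount != 0:
            raise ValueError("frame is still referenced")
        self.pages[frame].next_free = self.free_head  # push
        self.free_head = frame


boot = BumpAllocator(kernel_end=0x115A30)      # hypothetical end-of-kernel symbol
nframes = 32 * 1024 * 1024 // PAGE_SIZE        # pretend the machine has 32 MiB
pages_array = boot.alloc(nframes * 8)          # room for the PageInfo array itself
frames = FrameAllocator(nframes, boot.next_free // PAGE_SIZE)

first = frames.alloc()
frames.free(first)
print(hex(pages_array), first, frames.alloc() == first)
```

Because the free list behaves like a stack, the most recently freed frame is the next one handed out. Frames below boot.next_free never enter the list, since they hold the kernel and the bookkeeping array.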

You don’t even need to be successful

Something I’ve learned through my streaming project is that you don’t even need to be super “successful” at what you’re doing to build an audience of people that are excited for you and want to support you.

Based on my experience, you just need to be:

trying hard
at something hard
consistently
in public

I wouldn’t say I’ve been so “successful” at building an OS. It currently doesn’t even really boot or do anything yet. It does not support running programs at all. All it can do is kind of initialize the hardware and slowly initialize itself to the point where it’s almost ready to run programs. (And I didn’t even write a lot of that code. A lot of it was provided by the base foundation for the course I’m following.)

And none of that matters. People are still excited about what I’m doing, even though it’s not novel in the slightest, and I’m not that “successful”. What matters is simply that I’ve been trying, a lot, and talking about it.

Writing is like exercise

Sometimes I struggle to justify why I spend time writing for my blog. Here’s the argument I keep coming back to:

A regular writing habit is just like a regular exercise habit for your brain. Yes, it takes time, energy, and money, but it’s also good for you, and well worth it.

Writing and especially publishing it is just like going to the gym for your brain. Plus it increases your luck surface area and helps keep you young. So, well worth the investment.

6 months of live-streaming

I’ve been live-streaming weekly for six months now, and I’ve noticed a virtuous flywheel develop.

Publicly committing to doing this activity creates pressure to follow through and do it
The more I do it, the more of a streak develops
The more the streak develops, the more I want to keep it up and not break it (especially if I’m very public about the streak)
The more I want to keep it up and not break it, the more I prioritize it
The more I prioritize it, the more importance it gets in my schedule, and other things are scheduled around it

This has been a powerful cycle to harness because the activity (learning OS development) is deeply aligned with my interests and aspirations — so I’ve effectively designed a system that provides positive pressure towards doing something good for me.

WIP: Humans need variety

I believe this is a basic insight at the core of many aspects of human life. For example:

Variety of diet
Variety of physical positions/movement (i.e. Why you shouldn’t sit all day, or why it’s good to exercise)
Variety of physical location (i.e. Why 100% remote work is difficult for many people)
Variety of daily experience (i.e. Why people travel to other places for vacation)
Variety of people to be around (i.e. Why people can get annoyed with each other if they’re constantly around each other too much – like families)
Variety of daily occupation (i.e. Why people switch jobs every few years)

How I live-stream programming

I’ve been live-streaming myself doing operating systems programming every week for 6 months now. This is not meant to be a comprehensive guide on streaming, just an overview of how I do it.

The bare basics

OBS — https://obsproject.com
Youtube or Twitch

I stream to Youtube using OBS. I use Youtube because:

It automatically archives the streams indefinitely to your channel by default. (Twitch only keeps the video for a limited time IIRC).
It exposes you to all of Youtube for potential viewership.

I tried Twitch once but didn’t get many viewers, so I stopped trying there.

A bit more (chat widget, alert boxes, chatbot)

Streamlabs — https://streamlabs.com/

I use Streamlabs’ free plan for my Chat Widget, Alert Boxes, and Chatbot. It works very well for a free offering and these add some more flair and professionalism to your stream.

Chat Widget: The messages in the chat are rendered directly into the stream video. The chat is an important part of the streaming experience, and it would be a shame to lose those messages. This ensures they are at least captured in the video.
Alert Boxes: These are visual effects that show in the stream when someone subscribes or does other actions.
Chatbot: I have a few chatbot commands for answering common questions about my tools, discord, or recommended sources. The chatbot is also useful for moderating the chat and e.g. restricting offensive language.

After using a basic OBS setup for a while, I customized it with a text box at the top that gives viewers a quick sense of what I do (“Streaming OS/Kernel Dev, Assembly & Low Level Programming”), the topic of today’s stream, and a friendly invitation to engage in the chat.

Misc tips

OBS Background removal — https://github.com/locaal-ai/obs-backgroundremoval

I used this plugin for background removal for that streamer green-screen effect (without a green-screen). Its quality is just ok. I eventually stopped using it and just show my background.

After streaming to Youtube, be careful not to use their built-in trimming tools because this removes the chat replay from the video. I used to start the stream and trim the beginning afterwards, but now I start the stream and begin right away, which preserves the native Youtube chat replay.

Resolution — Decrease your screen resolution so text appears larger and people can read it more easily. On my monitor, I reduce the resolution to 720p and live with the fact that it’s comically big for me when I stream.

Music — I use a 3 hour LoFi hip hop video on Youtube that explicitly claims to be No Copyright. I have OBS set to do a macOS capture which records audio. I then play it, and actually mute my laptop so I don’t hear it (it distracts me when programming) but it gets recorded in the stream.

Multi-streaming — You can multi-stream to Youtube and Twitch with paid Streamlabs, but there are other ways to do it, apparently even including using ffmpeg locally.

Youtube Thumbnails — I use CapCut, which is perfect for this. I took a screenshot of one of my streams, loaded it in CapCut, then overlaid a few text objects and images on top. The paid version of CapCut can also conveniently remove backgrounds from people and add a colored border for that authentic Youtuber effect.

WIP: We grow old because we stop making art

We don’t stop playing because we grow old; we grow old because we stop playing.
George Bernard Shaw

I’d like to offer a variation of this.

We don’t stop making art because we grow old; we grow old because we stop making art.

Children are naturally curious, creative, and artistic. They freely draw, sing, and ask questions. At the core is a youthful fearlessness. They’re not afraid — of being judged or looking stupid (yet).

We lose this as we grow older. We become concerned with appearances, and learn to avoid actions that might cause us to be judged or look stupid. We become afraid.

There is something deeply healthy about engaging in a creative practice that connects us back to this youthful fearlessness. Just like how a personal fitness practice is essential for maintaining physical function despite the natural progression of entropy, a personal creative practice is essential for resisting the tendency to become fearful.

Where do you feel creative? At a piano? Taking photos? Writing words? Cooking? Working out? Playing sports? Dancing? Look more closely — that might be your fountain of youth.

To have good ideas, stop auto-rejecting your ideas

Leave a reply

Half of having good ideas is not immediately rejecting the ideas you do have, but rather allowing yourself to respect them, give them a chance, and even consider them as worth sharing.

This is just an observation of my own shift in mental state over the last few years. I don’t consider myself particularly smart or insightful, compared to all those “wise” people with “famous quotes”.

But I’ve found that releasing myself from this automatic “self-doubt instinct” has led to a more nurturing mental space where weak, fledgling ideas have the space to potentially grow into stronger ones. And that is what eventually leads to genuinely amazing, novel ideas.

At least, I hope. We’ll see if I have one one day.