Monthly Archives: July 2023

TIL: ARM64 doesn’t include conditional instructions

A major difference between x86 and ARM32 is that while x86 generally1 only offers conditional execution of branch instructions (e.g. BNE) ARM32 offers conditional execution for many more instructions (e.g. ALU instructions, like ADD.EQ- add if not equal). The idea was that these can be used to avoid emitting traditional branching instruction sequences which may suffer from pipeline stalls when a branch is mispredicted โ€” instead, straight line code with equivalent semantics can be emitted.

This was actually removed in ARM64. The official quote:

The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.

Turns out that the whole pipeline stall problem is generally not a huge issue anymore as branch prediction has gotten so good, while support for the feature still requires allocating valuable instructions bits to encode the conditions. Note that ARM64 still uses 32 bit instructions, so conserving bits is still useful.

What is very interesting is that Intel’s recent APX extensions to x86-64 (whose purpose is to primarily add more general purpose registers) moves closer to this conditional instructions direction.

The performance features introduced so far will have limited impact in workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates performance of such workloads. Branch predictor improvements can mitigate this to a limited extent only as data-dependent branches are fundamentally hard to predict.

To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intelยฎ Pentiumยฎ Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by todayโ€™s compilers, but they are too limited for broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).”

…including support for conditional loads and stores which is apparently tricky with modern out of order and superscalar architectures.

What cured my writer’s/publisher’s block

I’ve managed to make writing fun again and have been publishing a lot on my blog lately.

It’s due to three reasons:

1 – Having explicit content categories creates the freedom to post things that aren’t highly effort intensive deep dives.

By introducing explicit post categories (i.e. Micropost, Lab Notes, Essay, Deep Dive, etc) I feel much more free to post without the expectation that every post needs to be a time consuming, deeply researched technical post (which are the kind of posts that are popular).

I don’t have time for those lately, but that doesn’t mean I can’t write or post anything! Explicitly tagging something as a micropost or lab notes takes a lot of the pressure off, and makes me much more willing to write and publish.

2 – I stopped sharing on Twitter.

The curse of growing an audience is that posting to that audience has increasing weight as the audience grows, which creates stress and friction about posting. What if I post something that people don’t like and I lose a bunch of followers? What if I’m straight up wrong? What if I want to share something but don’t have time to fully polish it and people judge me? What if I post too often?

It’s also distracting to share on Twitter, and very difficult to not monitor notifications afterwards.

Furthermore, writing on Twitter was actually difficult in that it took extra energy to abide by the character limits, or fit things into threads otherwise. Writing freeform of my own blog is easier in this regard.

Not posting on Twitter removes a nontrivial amount of friction and stress that used to prevent me from sharing.

3 – Using WordPress (i.e. not a static site generator) removes a ton of friction.

The fact that I can click buttons in a web interface, write and post as easily as sending a tweet makes all the difference.

It’s so nice being able to change the website without coding. For instance, I just added a “Popular posts” feature in the sidebar in the last 5 minutes. Turns out Jetpack already has the feature included and I just had to enable it. I don’t even want to imagine what it would have taken to implement that by hand or with a static site generator.

Though not as fast a static site, the blog loads fast enough and I’m more than happy to take the performance hit.

It’s also awesome that I can quickly edit posts to fix typos, even from my phone with the WordPress app. I do this very often.

(bonus reason: It’s also due to the realization that posts don’t have to be long, can be written in one sitting, and don’t have to be absolutely perfect!)

How to not feel dumb around “smart” programmers

When going to programming meetups, conferences, or starting a new job, it can be extremely easy to feel dumb around other engineers there. Sometimes these engineers truly are geniuses, but many times they’re not as “smart” as your immediate imposter syndrome is leading you to believe.

My advice, especially for beginners, is to remember that these people are probably talking about a domain they are extremely familiar with, and have spent probably 1000x the time with than you. You probably have stumbled upon to a topic that they are specifically knowledgeable about, which gives the impression that they’re ultra smart (and you’re ultra dumb, for not knowing anything about it).

The key is to remember that this impressive depth of knowledge is likely limited in breadth. If you were talking about your domain, you’d might know a lot more than them.

It’s also easy to feel dumb when watching a conference talk. My trick here is to remember that the reason this person can speak so confidently about such a technical topic is because, again, they’ve spent 1000x more time on it than you have. They’re had their face pressed directly against this problem for a long time, which is why they know all this and can talk about it.

This comes from my experience:

  • Entering the programming world (feeling dumb at hacker club in college)
  • Enter the infosec world (feeling dumb at infosec conferences)
  • Going to conferences not directly in my expertise (i.e. the LLVM Dev Mtg)
  • Entering the audio programming world (feeling dumb at audio conferences)
  • Feeling dumb in many, many conference talks

Distance between application programming and standard library programming

This is a rough, potentially half-baked thought:

This might an interesting metric to compare programming languages: distance between application programming and standard library programming.

In general, standard library programming is more difficult than average application programming for any programming language. Or at the very least “different” in some way — language features used, programming style required, etc. That difference might vary depending on the language.

Mainly, I’m thinking about C++, where standard library programming requires expertise in C++ template meta-programming (TMP) (i.e. to implement std::variant), which is effectively an entirely different programming language, existing in the C++ type system. While many application developers may also be competent in TMP, there are many that aren’t (including myself). It’s possible to be a very productive C++ programmer, and STL user, without being an expert at implementing generic libraries using TMP.

Given this, my impression that C++ has a relatively high distance between application programming and stdlib programming.

Python of course also has some distance here. I’m not qualified to speak on it, but I can imagine it also involved more obscure language feature that do not occur often in normal app development. I would guess that this distance is less than C++.

Lastly, C is an interesting language to consider because it offers such a spartan feature set that there aren’t particularly that many more language features available to be used. (I’m probably wrong here and there are obscure things I’m not aware of, including symbol versioning for compatibility). But in general, my assumption would be that C has a some limited distance, given it’s restrained feature set.

Ultimately, this metric is probably impossible to quantify and may have limited value, but is something I find intriguing anyway if it were to exist.

“What could go wrong” considered harmful

The retort “What could go wrong” is one of my big pet peeves.

It’s often used in response to a failure of a complex system or operation. Sometimes the system had clearly poor design, making it warranted. But more often these comments reek of hindsight bias and carry an arrogance โ€” as if the speaker could have easily avoided the failure if they were the one in charge.

It’s possible to construct that kind of retort for almost anything if you try hard enough:

  • “Flying massive metal tubes around in the sky filled with hundreds of people, what could go wrong?”
  • “Cementing metal wires into the mouths of children, what could go wrong?”
  • “Shooting lasers into peoples’ eyes, what would go wrong?”

But if you did so, you’d actually be wrong a lot โ€” because there are many complex systems that function correctly for most users, most of the time. They function because many people have poured blood, sweat, and human ingenuity into them to make them reliable. And it’s often not intuitive that they can work.

Even if sometimes people are truly negligent and deserve it, I don’t find the phrase of net benefit to the culture. I consider it harmful because it ups the consequences of failure โ€” a necessity for innovation โ€” in exchange for cheap virtue signaling from bystanders who often have no experience in the domain.

So rather than assuming incompetence, let’s all be a bit more charitable. The world is complex and less intuitive than it looks.

Core devs are not necessarily product experts

A common misconception I long held is that core devs of a product must be the top product experts.

The reality is that, as a core dev

  • there isn’t enough time to be both a dev and a user
  • knowledge of the implementation taints you and can prevent you from seeing the product clearly
  • it’s extremely difficult to focus deep on the details, and also view the product from a high level, holistically

Yes, you will have the absolute expertise on certain product behaviors or possibilities. But almost certainly never for the whole product at once; just the part you’ve been working on lately, where the knowledge is most fresh.

This is why it’s so important to surround yourself with “power users” โ€” those that are untainted and unburdened by the implementation, and can use their full mental power to absorb and innovate on the product simply as a product.

These are often the people that find the most interesting uses and abuses of systems, that the core devs weren’t even aware of.

This can happen for any kind of product, including and especially programming languages. Many of the interesting programming “patterns” are created not by the developers of the language, but by “power users”.[citation needed]

WIP: Reflections on audience building

Here are a few reflections after building two separate online audiences with 1000+ followers (comfort and offlinemark).

Audiences require active maintenance .

  • If you don’t actively maintain the audience by regularly posting content, the actual effective audience size declines over time (despite the concrete number staying the same) as audience members become inactive on the platform.

They grow in fits in bursts.

  • In my experience, the audience growth was not linear, but occurred in fits and bursts of about a hundred followers for major events or releases. Sometimes things just pop off. For offlinemark there were a few major events that got me several hundreds of followers each time. Plus, these tweets sometimes get retweeted and have second lives on the internet, getting me a bunch of new followers for free.
    • Demand paging blog post
    • /proc/self/mem blog post
    • Git tweet about referencing commits by commit message
    • Thread about forking being unsafe in real-time contexts due to page faults
    • A few other tweets.

Audiences grow within some specific context where you’ve established reputation, and do not engage with content that’s outside that context.

  • offlinemark is within the technical context and other kinds of content like about my music or other thinking is not engaged with nearly as much.

Releasing content to a large audience can be very distracting.

  • It’s extremely difficult to have content being going viral and not be glued to your notifications.
  • What if you’re wrong in a major way? What if you’re getting cancelled somehow for something you said?
  • At the very least, it’s an incredible amount of dopamine and affirmation that takes immense self control to not bask in.

Goal: Become a pro systems programmer

That’s a personal goal of mine. I’m not sure why, but I think it comes from spending a long time in computer security and feeling unsatisfied with just breaking “low level” software โ€” I want to be able to build it too.

But what does “pro” mean? What does “systems programmer” even mean?

Answer #1: Here are some people that come to mind

Answer #2:

(Incomplete list)

Little skillSome skillPractical experience with
Fluent in a systems programming language. Able to write idiomatic code.x
Understanding of modern systems programming concepts โ€” e.g. smart pointers.x
Understanding of operating system and hardware/software interfacex
Competent with C/C++ build concepts โ€” e.g. headers, source, compilation, linkingx
Able to engage professionally in open sourcex
Competent with git โ€” e.g. can rebase, rewrite history and prepare perfect patch series (no stray newlines, commits build on each other, etc)x
Understanding of multithreaded programming and concurrency. Able to write idiomatic multithreaded code.x
Skilled with development tools โ€” e.g. debuggerx
Competent with performance profilingx
Understanding of how to model data with a static type system โ€” e.g. static vs dynamic dispatch, polymorphism, OOP concepts, reference vs value semantics.x
Understanding of I/O programming, esp. async I/O.x
Understanding of different OS platforms โ€” e.g Linux, Windows, macOSx
Experience with a variety of architecturesx


  • Become highly skilled in C++ template meta-programming


  • 2014 โ€” First professional Linux kernel experience (internship)
  • 2016 โ€” Landed a job working on symbolic execution engine (Trail of Bits)
  • 2019 โ€” First professional C++ work (Trail of Bits / osquery)
  • 2019 โ€” Landed commit in lldb
  • 2020 โ€” Landed commit in Linux kernel
  • 2021 โ€” Landed a full time C++ job (Ableton)
  • 2021 โ€” Built significant git competence (Ableton)
  • 2023 โ€” Built significant Linux kernel & hardware / software interface knowledge at work (Ableton)
  • 2023 โ€” Built knowledge of data modeling with static type system (Ableton)

Runtime polymorphism cheat sheet

While C++ is used here as an example, the concepts apply to any statically typed programming language that supports polymorphism.

For example, while Rust doesn’t have virtual functions and inheritance, it’s traits/dyn/Boxes are conceptually equivalent and. Rust enums are conceptually equivalent to std::variant as a closed set runtime polymorphism feature.

Virtual Functions/Inheritancestd::variant
Runtime PolymorphismYes – dynamic dispatch via vtableYes – dynamic dispatch via internal union tag (discriminant) and compile-time generated function pointer table
SemanticsReference – clients must operate using pointer or referenceValue – clients use value type
Open/Closed?Open – Can add new types without recompiling (even via DLL). Clients do not need to be adjusted.Closed – Must explicitly specify the types in the variant. Generally clients/dispatchers may need to be adjusted.
CodegenClient virtual call + virtual methodsClient function table dispatch based on union tag + copy of callable for each type in the dispatch. If doing generic dispatch (virtual function style), then also need the functions in each struct. Inlining possible.
Class definition boilerplateClass/pure virtual methods boilerplate.Almost none.
Client callsite boilerplateAlmost nonestd::visit() boilerplate can be onerous.
Must handle all cases in dispatch?No support โ€” the best you can do is an error-prone chain of dynamic_cast<>. If you need this, virtual functions are not the best tool.Yes, can support this.

Overall, virtual functions and std::variant are similar, though not completely interchangeable features. Both allow runtime polymorphism, however each has different strengths.

Virtual functions excels when the interface/concept for the objects is highly uniform and the focus is around code/methods; this allows callsites to be generic and avoid manual type checking of objects. Adding a const data member to the public virtual interface is awkward and must go through a virtual call.

std::variant excels when the alternative types are highly heterogenous, containing different data members, and the focus is on data. The dispatch/matching allows one to safely and maintainably handle the different cases, and be forced to update when a new alternative type is added. Accessing data members is much more ergonomic than for virtual functions, but the opposite is true for generic function dispatch across all alternative types, because the std::visit() is not ergonomic.

Building on these low level primitives, one can build:

  • Component pattern (using virtual functions typically) (value semantics technique; moves away from static typing and towards runtime typing)
  • Type erase pattern (also virtual functions internally) (value semantics wrapper over virtual functions)

Fun facts:

  • Rust also has exactly these, but just with different names and syntax. The ergonomics and implementation are different, but the concepts are the same. Rust uses fat pointers instead of normal pointer pointing to a vtable. Rust’s match syntax is more ergonomic for the variant-equivalent. Rust uses fat pointers apparently because it allows “attaching a vtable to an object whose memory layout you cannot control” which is apparently required due to Rust Traits. (source)
  • Go uses type erasure internally, but offers this as a first class language feature.

Case study: Component pattern

The component pattern is a typical API layer alternative to classical virtual functions. With classical runtime polymorphism via virtual functions, the virtual functions and inheritance are directly exposed to the client โ€” the client must use reference semantics and does direct invocation of virtual calls.

With the component pattern, virtual functions are removed from the API layer. Clients use value semantics and then “look up” a component for behavior that would have previously been inherited.

API classes, instead of inheriting, contain a container of Components, who are themselves runtime polymorphic objects of heterogenous types. The components can classically use virtual functions for this, inheriting from some parent class. Then the API class contains a container of pointers to the parent class. API clients look up the component they are interested in via its type, and the API class implements a lookup method that iterates the components and identifies the right one using dynamic_cast or similar.

However, variants offer another way to implement this. Rather than having all components inherit from the superclass, they can be separate classes that are included in a variant. The API class then has a container of this variant type. In the lookup method, instead of using dynamic_cast, it uses std::holds_alternative which is conceptually equal.

This is a somewhat unusual application of runtime polymorphism and neither implementation method stands out as strictly better. Since components do not share a common interface really (they would just inherit so they can be stored heterogenously in a container), virtual functions does not offer a strong benefit. But also since the component objects are never dispatched on (they are always explicitly looked up by type), the variant method also does not offer a strong benefit.

The main difference in this scenario is the core difference between virtual functions and variants: whether the set of “child” types is open or closed. With virtual functions being open, it offers the advantage that new components can be added by simply inheriting from the parent class and no existing code needs to be touched. Potentially new components could even be loaded dynamically and this would work.

With variants, when new components are added, the core definition of the component variant needs to be adjusted to include the new type. No dynamic loading is supported.

So it appears that virtual functions have slight advantage here.


Q: What about std::any?

std::any is loosely similar to virtual functions or std::variant in that it implements type erasure, allowing a set of heterogenous objects of different types, to be referenced using a single type. Virtual functions and std::variant aren’t typically called “type erasure” as far as I’ve heard, but this is effectively what they do.

However that’s where the similarities end. std::any represents type erasure, but not any kind of object polymorphism. With std::any, there is no notion of a common interface that can be exercised across a variety of types. In fact, there is basically nothing you can do with a std::any but store it and copy it. In order to extract the internally stored object, it must be queried using its type (via std::any_cast()) which tends to defeat the purpose of polymorphism.

std::any is exclusively designed to replace instances where you might have previously used a void * in C code, offering improved type safety and possibly efficiency. 1 The classic use case is implementing a library that allows clients to pass in some context object that will later be passed to callbacks supplied by the client.

For this use case, the library must be able to store and retrieve the user’s context object. It’s it. It literally never will interpret the object or access it in any other way. This is why std::any fits here.

Another use case for std::any might be the component pattern in C++, where objects store a list of components, which are then explicitly queried for by client code. In this case, the framework also never deals directly with the components, but simply stores and exposes the to clients on request.


Smartphone anxiety

Why do I have such anxiety when my phone isn’t on me, which prompts me to periodically tap my pocket to check on it?

A few key properties of phones:

  • Will cause significant pain and expense, in both time and money if lost
  • Small
  • Carried around everywhere

Beyond keys and wallets, there are not many other objects that can inflict so much pain & expense on one’s life if lost, while simultaneously being small and carried everywhere, which increases the chance of losing it. Thus producing a natural reaction to check if it’s there, and anxiety if not found. And furthermore, unlike keys and wallets, phones actively provide dopamine.