Four exercises that touch basic multithreaded and lockfree programming concepts.
Implement a program that attempts to use two threads to increment a global counter to 10,000 with each thread incrementing 5000. But make it buggy so that there are interleaving problems and the end result of the counter is less than 10,000.
Fix the above with atomics.
Implement a variant of the program: instead of simply incrementing the counter, make the counter wrap every 16 increments (as if incrementing through indices of an array of length 16). Make two threads each attempt to increment the counter (16 * 5000) times. The end state should have the counter be back at index zero. Implement it in a buggy naive way that causes the counter to often be nonzero, even if atomics are used.
Fix the above using a CAS loop.
(Bonus question for the above: Why isn’t std::atomic::compare_exchange_strong a good fit here?)
Some rough lab notes on these topics to record the current state of my knowledge. I’m not an expert, so there may be inaccuracies.
On Linux, libpthread mutexes are implemented using the underlying futex syscall
They are basically a combination of a spinlock (in userspace), backed by the kernel for wait/signal operations only when absolutely necessary (i.e. when there’s contention). In the common case of an uncontended lock acquire, there is no context switch which improves performance
The userspace spinlock portion uses atomics as spinlocks usually do, specifically because the compare and set must be atomic
Jeff Preshing (see below) writes that each OS/platform has an analogous concept to this kind of “lightweight” mutex — Windows and macOS have them too
Before futex(2), other syscalls were used for blocking. One option might have been the semaphore API, but commit 56c910668cff9131a365b98e9a91e636aace337a in glibc is before futex, and it seems like they actually use signals. (pthread_mutex_lock -> __pthread_lock (still has spinlock elements, despite being before futex) -> suspend() -> __pthread_suspend -> __pthread_wait_for_restart_signal -> sigsuspend)
A primary advantage of futex over previous implementations is that futexes only require kernel resources when there’s contention
Like atomics, mutexes implementations include memory barriers (maybe even implicitly due to atomics) to prevent loads/stores from inappropriately crossing the lock/unlock boundary due to compiler and/or hardware instruction reordering optimizations
While C++ is used here as an example, the concepts apply to any statically typed programming language that supports polymorphism.
For example, while Rust doesn’t have virtual functions and inheritance, it’s traits/dyn/Boxes are conceptually equivalent and. Rust enums are conceptually equivalent to std::variant as a closed set runtime polymorphism feature.
Yes – dynamic dispatch via vtable
Yes – dynamic dispatch via internal union tag (discriminant) and compile-time generated function pointer table
Reference – clients must operate using pointer or reference
Value – clients use value type
Open – Can add new types without recompiling (even via DLL). Clients do not need to be adjusted.
Closed – Must explicitly specify the types in the variant. Generally clients/dispatchers may need to be adjusted.
Client virtual call + virtual methods
Client function table dispatch based on union tag + copy of callable for each type in the dispatch. If doing generic dispatch (virtual function style), then also need the functions in each struct. Inlining possible.
Class definition boilerplate
Class/pure virtual methods boilerplate.
Client callsite boilerplate
std::visit() boilerplate can be onerous.
Must handle all cases in dispatch?
No support — the best you can do is an error-prone chain of dynamic_cast<>. If you need this, virtual functions are not the best tool.
Yes, can support this.
Overall, virtual functions and std::variant are similar, though not completely interchangeable features. Both allow runtime polymorphism, however each has different strengths.
Virtual functions excels when the interface/concept for the objects is highly uniform and the focus is around code/methods; this allows callsites to be generic and avoid manual type checking of objects. Adding a const data member to the public virtual interface is awkward and must go through a virtual call.
std::variant excels when the alternative types are highly heterogenous, containing different data members, and the focus is on data. The dispatch/matching allows one to safely and maintainably handle the different cases, and be forced to update when a new alternative type is added. Accessing data members is much more ergonomic than for virtual functions, but the opposite is true for generic function dispatch across all alternative types, because the std::visit() is not ergonomic.
Building on these low level primitives, one can build:
Component pattern (using virtual functions typically) (value semantics technique; moves away from static typing and towards runtime typing)
Type erase pattern (also virtual functions internally) (value semantics wrapper over virtual functions)
Rust also has exactly these, but just with different names and syntax. The ergonomics and implementation are different, but the concepts are the same. Rust uses fat pointers instead of normal pointer pointing to a vtable. Rust’s match syntax is more ergonomic for the variant-equivalent. Rust uses fat pointers apparently because it allows “attaching a vtable to an object whose memory layout you cannot control” which is apparently required due to Rust Traits. (source)
Go uses type erasure internally, but offers this as a first class language feature.
Case study: Component pattern
The component pattern is a typical API layer alternative to classical virtual functions. With classical runtime polymorphism via virtual functions, the virtual functions and inheritance are directly exposed to the client — the client must use reference semantics and does direct invocation of virtual calls.
With the component pattern, virtual functions are removed from the API layer. Clients use value semantics and then “look up” a component for behavior that would have previously been inherited.
API classes, instead of inheriting, contain a container of Components, who are themselves runtime polymorphic objects of heterogenous types. The components can classically use virtual functions for this, inheriting from some parent class. Then the API class contains a container of pointers to the parent class. API clients look up the component they are interested in via its type, and the API class implements a lookup method that iterates the components and identifies the right one using dynamic_cast or similar.
However, variants offer another way to implement this. Rather than having all components inherit from the superclass, they can be separate classes that are included in a variant. The API class then has a container of this variant type. In the lookup method, instead of using dynamic_cast, it uses std::holds_alternative which is conceptually equal.
This is a somewhat unusual application of runtime polymorphism and neither implementation method stands out as strictly better. Since components do not share a common interface really (they would just inherit so they can be stored heterogenously in a container), virtual functions does not offer a strong benefit. But also since the component objects are never dispatched on (they are always explicitly looked up by type), the variant method also does not offer a strong benefit.
The main difference in this scenario is the core difference between virtual functions and variants: whether the set of “child” types is open or closed. With virtual functions being open, it offers the advantage that new components can be added by simply inheriting from the parent class and no existing code needs to be touched. Potentially new components could even be loaded dynamically and this would work.
With variants, when new components are added, the core definition of the component variant needs to be adjusted to include the new type. No dynamic loading is supported.
So it appears that virtual functions have slight advantage here.
std::any is loosely similar to virtual functions or std::variant in that it implements type erasure, allowing a set of heterogenous objects of different types, to be referenced using a single type. Virtual functions and std::variant aren’t typically called “type erasure” as far as I’ve heard, but this is effectively what they do.
However that’s where the similarities end. std::any represents type erasure, but not any kind of object polymorphism. With std::any, there is no notion of a common interface that can be exercised across a variety of types. In fact, there is basically nothing you can do with a std::any but store it and copy it. In order to extract the internally stored object, it must be queried using its type (via std::any_cast()) which tends to defeat the purpose of polymorphism.
std::any is exclusively designed to replace instances where you might have previously used a void * in C code, offering improved type safety and possibly efficiency. 1 The classic use case is implementing a library that allows clients to pass in some context object that will later be passed to callbacks supplied by the client.
For this use case, the library must be able to store and retrieve the user’s context object. It’s it. It literally never will interpret the object or access it in any other way. This is why std::any fits here.
Another use case for std::any might be the component pattern in C++, where objects store a list of components, which are then explicitly queried for by client code. In this case, the framework also never deals directly with the components, but simply stores and exposes the to clients on request.
Turns out, we do have them. libstdc++ contains a heuristic kludge where it checks if libpthread is linked into the process and conditionally executes an atomic add or non-atomic add depending on the result. Post 2020, it doesn’t directly use the libpthread hack, but uses glibc’s native support for this.
These kinds of things always make me sad about C++ (especially compared to how nice and clean things seem to be in Rust world), but at least it’s nice knowing that the committee did consider it and there are good reasons why to not have this in the C++ stdlib. The most compelling reason is that in the absence of a borrow checker, it is all too easy to silently use a non-atomic shared_ptr in multithreaded code.
LLDB supports custom scripts (“variable formatters”) to pretty print C++ data structures. For example, std::vector is typically implemented as a struct with three pointers: begin, end, and capacity. But if you wanted to print out a std::vector variable during a debugging session, printing out these three pointers isn’t likely to be helpful. What you actually want is to print the contents of the vector. Pretty printer scripts allow for doing this for your own data structures.1
Arthur makes a strong case against use of this flag. The clang docs generally agree, though offering a slightly more lenient position:
If you do use -Weverything then we advise that you address all new compiler diagnostics as they get added to Clang, either by fixing everything they find or explicitly disabling that diagnostic with its corresponding Wno- option.
But what I have not seen clearly expressed is how use of -Weverything is a tradeoff.
If you’re already a 10x engineer, you probably won’t need this article. But for the rest of us, this is what I wish I knew about clang-format as an inexperienced C++ programmer: how to only format the changes in your pull request.
You may have already heard of clang-format. It auto-formats source files for languages including C and C++. You can aim it at a source file and format the entire thing using clang-format -i file.cpp.
If you’re contributing to a project that is already 100% clang-format clean, then this workflow works fine. But you’ll occasionally encounter a project that is not quite 100% formatted, such as LLVM, osquery, or Electron1.
For these projects, the “format entire files” workflow doesn’t work because you’ll incidentally format parts of the files that are unrelated to your contribution. This will add noise to your diff and make it harder for your reviewers.
In this case, you need a way to surgically format only the lines changed in your contribution. To do this, you can use the clang-format git extension. This article will cover the basics of git-clang-format, including a practical workflow that allows for messy development, and formatting at the end.
This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.
Keep reading and you’ll learn:
Internal details of the Linux kernel’s demand paging implementation
How to exploit virtual memory to implement highly efficient sparse data structures
What page tables are and how to calculate the memory overhead incurred by them
A cute way to get killed by the OOM killer while appearing to consume very little memory (great for parties)
Note: Victor Michel wrote a great follow up to this post here.