https://github.com/offlinemark/suicide
You know the common C idiom, “undefined behavior can format your hard drive”?
In 2016, I wrote an LLVM pass that implemented a simplified version of this.
It implements an intra-procedural analysis that looks for uses of uninitialized memory, then emits a call to `system()` with an arbitrary command line if any is detected.
Here’s roughly how the analysis works:
- For every function:
  - Discover all local variables (by finding all `alloca` LLVM IR instructions)
  - For each local variable, find any reads of that variable before any writes happen to it. To do this:
    - Do a DFS of the function’s control flow graph, visiting every basic block once and scanning its instructions in order
    - If a basic block contains a load from the alloca, record that as a UB instance
    - If the basic block contains a store to the alloca, stop searching in the basic block, and also terminate that path of the DFS
    - If the basic block contains a function call to which the alloca is passed, we can’t be sure whether later loads are UB, because the function might have written to that memory. (We don’t know, because this is just an intra-procedural analysis.) Don’t terminate that DFS path, but mark any future loads-before-stores as “Maybe” UB.
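The steps above can be sketched in Python. This is not the actual LLVM pass: it runs over a hypothetical toy IR in which a function is a dict mapping each basic block's name to its list of tuple instructions and its successors, and the names `find_allocas`, `find_ub_loads` and `analyze` are made up for illustration.

```python
# Hypothetical toy IR, not LLVM's API. A function is a dict mapping each basic
# block's name to (instructions, successor names). Instructions are tuples:
#   ("alloca", v)               declare local variable v
#   ("load", v) / ("store", v)  read / write v
#   ("call", callee, args)      call callee, passing the variables in args

def find_allocas(blocks):
    """Every local variable is introduced by an alloca."""
    return [inst[1] for insts, _ in blocks.values() for inst in insts
            if inst[0] == "alloca"]

def find_ub_loads(blocks, var, entry="entry"):
    """DFS the CFG from entry, reporting loads of var that happen before any store."""
    results = []
    visited = set()

    def visit(name, maybe):
        if name in visited:  # each basic block is visited only once
            return
        visited.add(name)
        insts, succs = blocks[name]
        for i, inst in enumerate(insts):
            if inst[0] == "load" and inst[1] == var:
                results.append((name, i, "Maybe" if maybe else "Definite"))
            elif inst[0] == "store" and inst[1] == var:
                return  # var is initialized: stop this block and this DFS path
            elif inst[0] == "call" and var in inst[2]:
                maybe = True  # the callee may have written var
        for succ in succs:
            visit(succ, maybe)

    visit(entry, False)
    return results

def analyze(blocks):
    return {var: find_ub_loads(blocks, var) for var in find_allocas(blocks)}

# Roughly: int x, c = 0; if (c) x = 1; return x;
f = {
    "entry": ([("alloca", "x"), ("alloca", "c"), ("store", "c"), ("load", "c")],
              ["then", "exit"]),
    "then":  ([("store", "x")], ["exit"]),
    "exit":  ([("load", "x")], []),
}
print(analyze(f))  # {'x': [('exit', 0, 'Definite')], 'c': []}
```

Recursion mirrors the DFS directly. The `maybe` flag is passed down each path, so a call that receives the variable only downgrades loads that come after it on that path.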
That’s it. In a conforming program there should be no load from an alloca before a store to it; if the program reads uninitialized memory, there will be.
Then, for every UB load, emit a call to `system()` at that point.
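A toy version of the instrumentation step, again over a hypothetical tuple-based IR rather than LLVM's API: given the `(block, index)` positions of flagged loads, it inserts a `system` call just before each one. In a real pass this amounts to declaring `system` in the module and creating a call instruction before each flagged load; the function name `instrument` and the command string are placeholders.

```python
# Hypothetical toy IR: a dict mapping block names to (instructions, successors).
# ub_loads holds (block, index) positions of flagged loads.

def instrument(blocks, ub_loads, cmd):
    """Return a copy of blocks with a system(cmd) call inserted before each UB load."""
    out = {name: (list(insts), succs) for name, (insts, succs) in blocks.items()}
    # Insert at higher indices first so earlier insertions don't shift later positions.
    for name, idx in sorted(ub_loads, key=lambda p: p[1], reverse=True):
        # In this toy IR the call's args hold the command string itself.
        out[name][0].insert(idx, ("call", "system", (cmd,)))
    return out

f = {"entry": ([("alloca", "x"), ("load", "x"), ("load", "x")], [])}
patched = instrument(f, [("entry", 1), ("entry", 2)], "echo UB")
print(patched["entry"][0])
```

Each flagged load ends up directly preceded by its own `system` call, and the input function is left unmodified.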
The analysis probably has many limitations. A key one is that it can misclassify definite UB as “Maybe”, because each basic block is only visited once. If a load-before-store is first reached along a “Maybe” path, but the same basic block is also reachable along a definite-UB path, the latter won’t be recognized because that block won’t be visited again.
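To make the limitation concrete, here is a compact, self-contained Python sketch over a hypothetical toy IR (not LLVM's API; alloca discovery is omitted and the variable is passed in directly). With `revisit=False` it visits each block once, as described. The `revisit=True` mode, which keys the visited set on `(block, maybe)` so a block can be visited at most twice, is an illustrative fix and is not taken from the original pass.

```python
# Hypothetical toy IR: blocks maps a name to (instructions, successors);
# instructions are ("load", v), ("store", v) or ("call", callee, args).

def find_ub_loads(blocks, var, entry="entry", revisit=False):
    """Report loads of var before any store, tagged Definite or Maybe.

    revisit=False visits each block once.
    revisit=True keys `visited` on (block, maybe), so a block first reached
    on a Maybe path is visited again when reached on a definite path.
    """
    found = {}  # (block, index) -> "Definite" or "Maybe"
    visited = set()

    def visit(name, maybe):
        key = (name, maybe) if revisit else name
        if key in visited:
            return
        visited.add(key)
        insts, succs = blocks[name]
        for i, inst in enumerate(insts):
            if inst[0] == "load" and inst[1] == var:
                # A Definite finding is never downgraded to Maybe.
                if found.get((name, i)) != "Definite":
                    found[(name, i)] = "Maybe" if maybe else "Definite"
            elif inst[0] == "store" and inst[1] == var:
                return
            elif inst[0] == "call" and var in inst[2]:
                maybe = True
        for succ in succs:
            visit(succ, maybe)

    visit(entry, False)
    return found

cfg = {
    "entry": ([], ["a", "use"]),                    # DFS explores "a" first
    "a":     ([("call", "init", ("x",))], ["use"]),  # makes the path "Maybe"
    "use":   ([("load", "x")], []),                  # also reachable directly
}
print(find_ub_loads(cfg, "x"))                # {('use', 0): 'Maybe'}
print(find_ub_loads(cfg, "x", revisit=True))  # {('use', 0): 'Definite'}
```

The direct path `entry -> use` reads `x` uninitialized, yet the single-visit version reports only “Maybe” because `use` was already visited via `a`.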
Overall, a fun project and nice exercise to try out LLVM.