I love spelunking into unknown codebases with nothing but find and grep. It’s one of the most valuable skills one can develop as a programmer, imo, and in this video you can see how I approach it.
This video focuses on debugging GUI event handling. At first the bug seemed related to the app’s waveform selection, but I then realized it was a more general issue with the SerenityOS GUI UX — selecting a dropdown entry retains focus, and requires an explicit Escape key press to dismiss.
Ultimately I made progress by accident: hitting the keyboard while the selection was still active revealed that behavior (which I hadn’t noticed before).
You can see my general debugging flow:

- Get things building
- How to run the app from the command line (to see stdout)?
- How to print to stdout?
- Using debug prints to understand the GUI event handling
Overall I’m quite impressed with SerenityOS. I only realized after looking into the code exactly how much code they had written and how fully featured the system is. Well done to the team.
Four exercises that touch basic multithreaded and lock-free programming concepts (a rough sketch of one possible approach follows the list).

1. Implement a program that attempts to use two threads to increment a global counter to 10,000, with each thread incrementing 5,000. But make it buggy so that there are interleaving problems and the end result of the counter is less than 10,000.
2. Fix the above with atomics.
3. Implement a variant of the program: instead of simply incrementing the counter, make the counter wrap every 16 increments (as if incrementing through indices of an array of length 16). Make two threads each attempt to increment the counter (16 * 5000) times. The end state should have the counter back at index zero. Implement it in a buggy, naive way that often leaves the counter nonzero at the end, even if atomics are used.
4. Fix the above using a CAS loop.

(Bonus question for the above: Why isn’t std::atomic::compare_exchange_strong a good fit here?)
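Here is a rough C++ sketch of how all four exercises might look. This is just my own illustration (names and structure are invented), and it’s one possible approach rather than the intended solution:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

int counter = 0;               // exercise 1: plain int, racy
std::atomic<int> acounter{0};  // exercise 2: atomic fix
std::atomic<int> index16{0};   // exercises 3 & 4: counter that wraps every 16 increments

void racy() {                  // exercise 1: lost updates, result is often < 10000
    for (int i = 0; i < 5000; i++)
        counter++;             // non-atomic read-modify-write
}

void atomic_inc() {            // exercise 2: always reaches 10000
    for (int i = 0; i < 5000; i++)
        acounter.fetch_add(1, std::memory_order_relaxed);
}

void wrap_naive() {            // exercise 3: buggy even though each individual access is atomic
    for (int i = 0; i < 16 * 5000; i++)
        index16.store((index16.load() + 1) % 16);  // another thread can run between load and store
}

void wrap_cas() {              // exercise 4: a CAS loop makes the whole read-modify-write atomic
    for (int i = 0; i < 16 * 5000; i++) {
        int cur = index16.load();
        // compare_exchange_weak may fail spuriously, which is fine in a retry loop;
        // on failure, cur is reloaded with the current value and we try again
        while (!index16.compare_exchange_weak(cur, (cur + 1) % 16)) {
        }
    }
}

int main() {
    std::thread a(racy), b(racy);
    a.join(); b.join();
    std::printf("racy counter:       %d\n", counter);          // often < 10000

    std::thread c(atomic_inc), d(atomic_inc);
    c.join(); d.join();
    std::printf("atomic counter:     %d\n", acounter.load());  // 10000

    std::thread e(wrap_naive), f(wrap_naive);
    e.join(); f.join();
    std::printf("naive wrapping idx: %d\n", index16.load());   // often nonzero

    index16 = 0;
    std::thread g(wrap_cas), h(wrap_cas);
    g.join(); h.join();
    std::printf("CAS wrapping idx:   %d\n", index16.load());   // back at 0
}
```

Compile with something like `g++ -O0 -pthread`; with optimizations enabled the compiler may transform the racy loop, which changes how (but not whether) the bug shows up.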
It’s really all about memory. But to start at the beginning, the rough stack looks like this:
I find it easier to think about this from the middle out. On Linux, the kernel exposes hardware devices as files backed by the /dev virtual filesystem. Userspace can do normal syscalls like open, read, write, and mmap on them, as well as the less typical ioctl (for more arbitrary, device-specific functionality)1.
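As a rough sketch of what that looks like from userspace (using a serial port node as the example; whether /dev/ttyS0 exists and what its driver supports depends on your system):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/serial.h>
#include <cstdio>

int main() {
    // An ordinary open() on a device file created by the driver.
    int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0) { perror("open"); return 1; }

    // Ordinary read()/write() calls are routed to the driver's handlers.
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf));  // may return -1/EAGAIN if no data is pending

    // ioctl() is the escape hatch for device-specific functionality.
    struct serial_struct info;
    if (ioctl(fd, TIOCGSERIAL, &info) == 0)
        std::printf("read %zd bytes, UART type %d\n", n, info.type);

    close(fd);
    return 0;
}
```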
The files are created by kernel drivers, which are modules of kernel code whose sole purpose is to interface with and abstract hardware so it can be used by other parts of the operating system, or userspace. They are implemented using internal driver “frameworks” in the kernel, e.g. the I2C or SPI frameworks. When you interface with a file in /dev, you are directly triggering callback handlers in a driver, which execute in the process context.
That’s how userspace interfaces with the kernel. How do drivers interface with hardware? These days, mostly via memory mapped I/O (MMIO)2. This is when device hardware “appears” at certain physical addresses, and can be interfaced with via load and store instructions using an “API” that the device defines. For example, you can read data from a sensor by simply reading a physical address, or write data out to a device by writing to an address. The technical term for the hardware component these reads/writes interface with is “registers” (i.e. memory mapped registers).
(Aside: Other than MMIO, the other main interface the kernel has with hardware is interrupts, for interrupt-driven I/O processing (as opposed to polling, which is what MMIO enables). I’m not very knowledgeable about this, so I won’t get into it other than to say drivers can register handlers for specific IRQ (interrupt request) numbers, which will be invoked by the kernel’s generic interrupt handling infrastructure.)
Using MMIOs looks a lot like embedded bare metal programming you might do on a microcontroller like a PIC. At the lowest level, a kernel driver is really just embedded bare metal programming.
Here’s an example of a device driver for UART (serial port) hardware for ARM platforms: linux/drivers/tty/serial/amba-pl011.c. If you’re debugging an ARM Linux system via a serial connection, this might be the driver being used to e.g. show the boot messages.
The lines like:
cr = readb(uap->port.membase + UART010_CR);
are where the real magic happens.
This is simply doing a read from a memory address derived from some base address for the device, plus some offset of the specific register in question. In this case it’s reading some control information from a Control Register.
Device interfaces may range from having just a few to many registers.
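To make the bare-metal flavor concrete, here is a minimal sketch of what an MMIO register access boils down to. The base address, register offset, and bit value are invented for illustration and don’t correspond to any real device:

```cpp
#include <cstdint>

constexpr std::uintptr_t UART_BASE = 0x10000000;  // hypothetical base address from the SoC memory map
constexpr std::uintptr_t UART_CR   = 0x30;        // hypothetical control register offset

// volatile so the compiler actually performs every load/store
static inline volatile std::uint32_t* reg(std::uintptr_t offset) {
    return reinterpret_cast<volatile std::uint32_t*>(UART_BASE + offset);
}

void uart_enable() {
    std::uint32_t cr = *reg(UART_CR);  // load instruction -> read transaction on the bus
    cr |= 0x1;                         // set a hypothetical "enable" bit
    *reg(UART_CR) = cr;                // store instruction -> write transaction on the bus
}
```

The kernel’s readb/writel helpers (and register offset macros like UART010_CR) wrap essentially this pattern, plus architecture-specific mapping and ordering details.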
To go one step deeper down the rabbit hole, how do devices “end up” at certain physical addresses? How is this physical memory map interface implemented?3
The device/physical address mapping is implemented in digital logic outside the CPU, either on the System on Chip (SoC) (for embedded systems), or on the motherboard (PCs)4. The CPU’s physical interface includes the address, data, and control buses. Digital logic converts bits of the address bus into signals that mutually exclusively enable the devices physically connected to the bus. The implementations of load/store instructions in the CPU set a read/write bit appropriately on the control bus, which lets devices know whether a read or write is happening. The data bus is where data is transferred out of or into the CPU.
In practice, documentation for real implementations of these systems can be hard to find, unless you’re a customer of the SoC manufacturer. But there are some out there for older chips, e.g.
Here’s a block diagram for the Tegra 2 SoC architecture, which shipped in products like the Motorola Atrix 4G, Motorola Droid X2, and Motorola Photon. Obviously it’s much more complex than my description above. Other than the two CPU cores in the top left, and the data bus towards the middle, I can’t make sense of it. (link)
While not strictly a “System on Chip”, a classic PIC microcontroller shares many characteristics with a SoC (CPU, memory, peripherals, all in one chip package), but is much more approachable.
We can see the single MIPS core connected to a variety of peripheral devices on the peripheral bus. There are even layers of peripheral bussing, with a “Peripheral Bridge” connected to a second peripheral bus for things like I2C and SPI.
I was debugging a microcontroller recently and was surprised to see that I was limited to 4 breakpoints. I’m used to seeing that kind of limitation only with watchpoints, not breakpoints. (See my youtube video for more on this).
But it makes sense after all — code running on this microcontroller is different from code running as a process in an OS’s userspace, because the code will probably be flashed into some kind of ROM (Read Only Memory) or EEPROM. Since the code is unwritable at a hardware level, the typical method of inserting software breakpoint instructions isn’t possible. So the alternative is to use the processor’s support for hardware debugging, which occupies hardware resources and is thus finite.
Business models are an interesting topic that lives directly at the boundary between business and technology.
The business model describes an abstract plan for how the business is to function sustainably, and developing one is a full-time job in and of itself. Then, it must be implemented in the product using technology.
For example, a typical SaaS business model involves subscription pricing at various intervals (monthly & yearly), with a discount given for the yearly plan in exchange for more money paid up front and longer commitment. There may or may not be a limited time free trial period, or alternatively a limited time guaranteed refund. Furthermore, there may be different pricing tiers that unlock more advanced features.
Ultimately, this is all going to end up as code that models the various pricing tiers and plans, handles timing deadlines, and enforces security (i.e. making sure the advanced features are only available to pro users). Stripe is a common service used to model arbitrary business models and handle payment processing. They provide client libraries offering data models for concepts like users and plans.
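For illustration, here is a hypothetical sketch of the kind of data model and feature gating this turns into. All names are invented; this isn’t Stripe’s actual object model:

```cpp
#include <chrono>

enum class Plan { Basic, Pro };
enum class BillingInterval { Monthly, Yearly };

struct Subscription {
    Plan plan;
    BillingInterval interval;
    std::chrono::system_clock::time_point current_period_end;  // when access lapses unless renewed
    bool in_trial;
};

// Feature gating: advanced features only for paying (non-lapsed) Pro users.
bool can_use_advanced_features(const Subscription& sub,
                               std::chrono::system_clock::time_point now) {
    return sub.plan == Plan::Pro && now < sub.current_period_end;
}

int main() {
    auto now = std::chrono::system_clock::now();
    Subscription sub{Plan::Pro, BillingInterval::Yearly, now + std::chrono::hours(24 * 365), false};
    return can_use_advanced_features(sub, now) ? 0 : 1;
}
```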
While subscription models have grown in popularity as software shifts to the web, it’s not the only model. The older model of buying discrete software packaged versions still exists, mostly used by vendors of desktop software. In this model, customers pay a larger sum for unlimited use of a specific major version of software. Then they can optionally pay again to upgrade to a newer major version when it’s released. A variant of this exists where there’s no upgrade fee (i.e. “lifetime free updates”). Another model exists called “Rent to own” which is like a subscription, except payments stop at a certain point after which there is free unlimited use.
Whether the software runs on the client or server is another dimension to consider which may influence the business model — server side software is most commonly sold using subscriptions these days.
- SaaS web apps
- One time payment per version: Dash, Ableton Live
- One time payment, lifetime free updates: FL Studio, Tailwind CSS (technically more assets than software)
- SaaS web apps, lifetime plans: e.g. Roam Research
- Rent to own: audio plugins via Splice
The business model impacts technical strategy in a few ways:

- Branching and release management
- Compatibility of document artifacts
Subscriptions, “lifetime free updates”, and “rent to own” are simplest in terms of branching and release management. All users are expected to run the latest software (because it’s either free or automatic, in the case of server side), so development can largely happen on a single main branch which releases are cut from.
“One time payment per version” is more complex because potentially multiple major versions of the software need to be maintained in parallel. Ideally bugfixes in Version 2 would also be applied to V3 and V4 when appropriate, but no new feature development should make its way back to V2 and risk being included in binaries sent to users that only paid for V2. For this situation, long term release branches make sense as they provide code isolation, though they require some mechanism to forward bugfixes, etc.
The second aspect is document artifact compatibility. Again in this scenario, subscriptions, “lifetime free updates”, and “rent to own” are simpler in that all users can be assumed to be running on the latest version — or at least can be told to upgrade at no cost.
“One time payment per version” is again more complex because multiple versions of the software exist in the wild, all producing artifacts of slightly different versions. If artifacts may be sent between users of different versions, it creates a mess of trying to deal with all the compatibility issues that may arise.
A major difference between x86 and ARM32 is that while x86 generally1 offers conditional execution only for branch instructions (e.g. JNE), ARM32 offers conditional execution for many more instructions (e.g. ALU instructions, like ADDEQ: add if equal). The idea was that these can be used to avoid emitting traditional branching instruction sequences, which may suffer from pipeline stalls when a branch is mispredicted — instead, straight-line code with equivalent semantics can be emitted.
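For a concrete flavor, here is a tiny if-conversion candidate. The ARM32 assembly in the comments is hand-written to illustrate the idea, not the exact output of any particular compiler:

```cpp
// Clamp negative values to zero without a conditional branch.
int clamp_negative_to_zero(int x) {
    if (x < 0)      //   cmp   r0, #0
        x = 0;      //   movlt r0, #0    ; executes only when the "less than" flags are set
    return x;       //   bx    lr        ; straight-line code, no branch to mispredict
}

int main() { return clamp_negative_to_zero(-5); }  // returns 0
```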
This was actually removed in ARM64. The official quote:
The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.
It turns out that the whole pipeline stall problem is generally not a huge issue anymore, as branch prediction has gotten so good, while support for the feature still requires allocating valuable instruction bits to encode the conditions. Note that ARM64 still uses 32-bit instructions, so conserving bits is still useful.
What is very interesting is that Intel’s recent APX extensions to x86-64 (whose primary purpose is to add more general purpose registers) move further in this conditional-instructions direction.
The performance features introduced so far will have limited impact in workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates performance of such workloads. Branch predictor improvements can mitigate this to a limited extent only as data-dependent branches are fundamentally hard to predict.
To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).
This includes support for conditional loads and stores, which is apparently tricky with modern out-of-order and superscalar architectures.
This might be an interesting metric to compare programming languages by: the distance between application programming and standard library programming.
In general, standard library programming is more difficult than average application programming for any programming language. Or at the very least “different” in some way — language features used, programming style required, etc. That difference might vary depending on the language.
Mainly, I’m thinking about C++, where standard library programming requires expertise in C++ template meta-programming (TMP) (e.g. to implement std::variant), which is effectively an entirely different programming language, existing in the C++ type system. While many application developers may also be competent in TMP, there are many that aren’t (including myself). It’s possible to be a very productive C++ programmer, and STL user, without being an expert at implementing generic libraries using TMP.
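To give a flavor of that “different language”, here is a small, simplified sketch of the kind of compile-time machinery a std::variant implementation needs (my own illustration, not real libc++/libstdc++ code): computing the index of a type within a parameter pack.

```cpp
#include <cstddef>
#include <type_traits>

// index_of<T, Ts...>::value is the position of T within Ts...
template <typename T, typename... Ts>
struct index_of;

// Base case: T is at the front of the pack.
template <typename T, typename... Rest>
struct index_of<T, T, Rest...> : std::integral_constant<std::size_t, 0> {};

// Recursive case: skip the first type and add one.
template <typename T, typename U, typename... Rest>
struct index_of<T, U, Rest...>
    : std::integral_constant<std::size_t, 1 + index_of<T, Rest...>::value> {};

static_assert(index_of<int, char, double, int>::value == 2);

int main() {}
```

Recursion over type lists, partial specialization as pattern matching, and values computed entirely at compile time: this is the programming style in question.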
Given this, my impression is that C++ has a relatively high distance between application programming and stdlib programming.
Python of course also has some distance here. I’m not qualified to speak on it, but I can imagine it also involves more obscure language features that do not occur often in normal app development. I would guess that this distance is less than C++’s.
Lastly, C is an interesting language to consider because it offers such a spartan feature set that there aren’t many more language features available to reach for. (I’m probably wrong here and there are obscure things I’m not aware of, including symbol versioning for compatibility.) But in general, my assumption would be that C has a relatively small distance, given its restrained feature set.
Ultimately, this metric is probably impossible to quantify and may have limited value, but it’s something I find intriguing to think about anyway.
The Linux kernel has an interesting difference compared to the Windows and macOS kernels: it offers syscall ABI compatibility.
This means that applications that program directly against the raw syscall interface are more or less guaranteed to always keep working, even with arbitrarily newer kernel versions. “Programming against the raw syscall interface” means including assembly code in your app that triggers syscalls:
- setting the appropriate syscall number in the syscall register
- setting arguments in the defined argument registers
- executing a syscall instruction
- reading the syscall return value register
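As a concrete sketch, here is what that looks like on x86-64 Linux with GCC/Clang inline assembly, invoking write(2) directly without going through libc (a minimal illustration, not production code):

```cpp
#include <cstddef>

long raw_write(int fd, const void* buf, std::size_t len) {
    long ret;
    asm volatile(
        "syscall"                          // execute the syscall instruction
        : "=a"(ret)                        // return value comes back in RAX
        : "a"(1L),                         // syscall number 1 = write on x86-64
          "D"(fd), "S"(buf), "d"(len)      // arguments in RDI, RSI, RDX
        : "rcx", "r11", "memory");         // the syscall instruction clobbers RCX and R11
    return ret;
}

int main() {
    const char msg[] = "hello from a raw syscall\n";
    raw_write(1, msg, sizeof(msg) - 1);    // fd 1 = stdout
}
```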
Here are the ABIs for some common architectures.

| Architecture | Syscall Number Register | Argument Registers | Syscall Return Value |
| --- | --- | --- | --- |
| x86 (32-bit) | EAX | EBX, ECX, EDX, ESI, EDI, EBP | EAX |
| x86-64 | RAX | RDI, RSI, RDX, R10, R8, R9 | RAX |
Manticore is my go-to source to quickly look these up: https://github.com/trailofbits/manticore
Once you’ve done this, now you’re relying on the kernel to not change any part of this. If the kernel changes any of these registers, or changes the syscall number mapping, your app will no longer trigger the desired syscall correctly and will break.
Aside from writing raw assembly in your app, there’s a more innocuous way of accidentally “programming directly against the syscall interface”: statically linking to libc. When you statically link to a library, that library’s code is directly included in your binary. libc is generally the system component responsible for implementing the assembly to trigger syscalls, and by statically linking to it, you effectively inline those assembly instructions directly into your application.
So why does Linux offer this and Windows and macOS don’t?
In general, compatibility is cumbersome. As a developer, if you can avoid having to maintain compatibility, it’s better. You have more freedom to change, improve, and refactor in the future. So by default it’s preferable to not maintain compatibility — including for kernel development.
Windows and macOS are able to not offer compatibility because they control the libc for their platforms and the rules for using it. And one of their rules is “you are not allowed to statically link libc”. For the exact reason that this would encourage apps that depend directly on the syscall ABI, hindering the kernel developers’ ability to freely change the kernel’s implementation.
If all app developers are forced to dynamically link against libc, then as long as kernel developers also update libc with the corresponding changes to the syscall ABI, everything works. Old apps run on a new kernel will dynamically link against the new libc, which properly implements the new ABI. Compatibility is of course still maintained at the app/libc level — just not at the libc/kernel level.
Linux doesn’t control the libc in the same way Windows and macOS do because in the Linux world, there is a distinct separation between kernel and userspace that isn’t present in commercial operating systems. Strictly speaking Linux is just the kernel, and you’re free to run whatever userspace on top. Most people run GNU userspace components (glibc), but alternatives are not unheard of (musl libc, also bionic libc on Android).
So because Linux kernel developers can’t 100% control the libc that resides on the other end of the syscall interface, they bite the bullet and retain ABI compatibility. This technically allows you to statically link with more confidence than on other OSs. That said, there are other reasons why you shouldn’t statically link libc, even on Linux.
The Linux kernel documents this stability guarantee (Documentation/ABI/README):

This directory documents the interfaces that the developer has
defined to be stable. Userspace programs are free to use these
interfaces with no restrictions, and backward compatibility for
them will be guaranteed for at least 2 years. Most interfaces
(like syscalls) are expected to never change and always be available.
What: The kernel syscall interface
This interface matches much of the POSIX interface and is based
on it and other Unix based interfaces. It will only be added to
over time, and not have things removed from it.
Note that this interface is different for every architecture
that Linux supports. Please see the architecture-specific
documentation for details on the syscall numbers that are to be
mapped to each syscall.
Apple’s developer documentation is explicit that statically linked binaries are unsupported on macOS:

Q: I'm trying to link my binary statically, but it's failing to link because it can't find crt0.o. Why?
A: Before discussing this issue, it's important to be clear about terminology:
A static library is a library of code that can be linked into a binary that will, eventually, be dynamically linked to the system libraries and frameworks.
A statically linked binary is one that does not import system libraries and frameworks dynamically, but instead makes direct system calls into the kernel.
Apple fully supports static libraries; if you want to create one, just start with the appropriate Xcode project or target template.
Apple does not support statically linked binaries on Mac OS X. A statically linked binary assumes binary compatibility at the kernel system call interface, and we do not make any guarantees on that front. Rather, we strive to ensure binary compatibility in each dynamically linked system library and framework.
If your project absolutely must create a statically linked binary, you can get the Csu (C startup) module from Darwin and try building crt0.o for yourself. Obviously, we won't support such an endeavor.
Have you ever updated your operating system and had your computer reboot multiple times during the process? Why does this happen? Why isn’t one reboot enough?
In general, if you have some running computer system that wants to upgrade itself, it’s easiest to download the new version, exit, and simply restart from scratch.
This is why you usually have to reboot for OS updates to take effect. On a different level, this is also why you usually have to reboot applications for their software updates to be applied.
The alternative to rebooting is hot swapping: the running program dynamically swapping itself out with the new one — while staying running the entire time. This is harder than rebooting and requires the developer to actually write code to implement this. A reboot-based update generally does not require much code beyond downloading the artifact into the place it will be looked for the next time things start.
If you opt for dynamic swapping, you need to make sure that the update applies cleanly and absolutely in memory — such that there are no lingering pieces of state from the old software lurking. You don’t want to be running half old software, half new software (or half old data and half new data in memory).
In general, this is not a problem if you reboot. Reboots start from a clean slate, so you can be sure that after reboot, you’re running with 100% the new software, 0% the old.1
These fundamental principles apply to any update on any kind of system: an application, its dependencies, an operating system (including core userspace libraries), or firmware running on a device.
So why would a computer reboot multiple times during an update?
It’s probably because it has updates A, B, and C to make, and they each require a reboot to take effect.
If there are blocking dependencies between them (C requires B first, which requires A first), there’s no way around doing a reboot after applying each.
If there are no dependencies, then it might be technically feasible to try to apply them all at the same time, and have them all apply together on the next reboot. But even so, such mass changes with multiple things updating all at once can be riskier. So it might still be safest to sacrifice update speed and update in a slower, more controlled manner, one update & reboot at a time.
For example, imagine a chain of dependent updates like this:

- There is a migration to a newer, fancier update server infrastructure.
- Using this new infrastructure requires a newer version of a core library that is in use on the running old system.
- One last update is pushed to the old infrastructure which updates the core library. Reboot once.
- Now the system can fetch a new update ZZ that’s sitting in the new infrastructure, using its new library support. Reboot again.
- And then maybe version ZZ of the software has some init code that downloads a firmware update for a hardware component that ZZ needs to run. This firmware update is incompatible with the previous version and is only applied once ZZ is confirmed to be present and working. Download the firmware update and reboot a third time.
One concrete technique I’m aware of that results in multiple boots is a disk image space optimization.
When building a disk image for a device, it’s common to include a data partition for user data which is separate from the system partition. In real use, the data partition will likely be very large, but when initially building a fresh image, the data partition will be all zeros. This is wasteful and makes the disk image occupy much more space than necessary, which slows down file transfers.
An optimization would be to build the disk image with a smaller data partition — just as big as necessary to hold pre-existing files on the partition. Then on the first boot, resize the partition to occupy the full available space. An easy way to apply that change to the system is to then reboot.
1. The exception is persistent state. If your system leaves persistent state on disk or elsewhere, you need to be careful that those artifacts are either compatible with the new system, or migrated.
LLDB supports custom scripts (“variable formatters”) to pretty print C++ data structures. For example, std::vector is typically implemented as a struct with three pointers: begin, end, and capacity. But if you wanted to print out a std::vector variable during a debugging session, printing out these three pointers isn’t likely to be helpful. What you actually want is to print the contents of the vector. Pretty printer scripts allow for doing this for your own data structures.1
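For a sense of what the debugger sees by default, here is a simplified sketch of that three-pointer layout (my own illustration, not the actual library source):

```cpp
#include <cstdio>

// Rough shape of a typical std::vector implementation: three pointers. A naive
// debugger print shows these raw pointers rather than the elements you care about.
template <typename T>
struct vector_layout_sketch {
    T* begin_;    // start of the element storage
    T* end_;      // one past the last element (size = end_ - begin_)
    T* cap_end_;  // one past the allocated storage (capacity = cap_end_ - begin_)
};

int main() {
    int storage[4] = {1, 2, 3, 4};
    vector_layout_sketch<int> v{storage, storage + 3, storage + 4};
    std::printf("size=%td capacity=%td\n", v.end_ - v.begin_, v.cap_end_ - v.begin_);
}
```

A variable formatter’s job is to walk a structure like this and present the size and elements instead of the raw pointers.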