Building the Actor Runtime

· 7 min read
Building Hew: Part 3 of 16

By the end of Part 2, the compiler could emit C and produce binaries — but only for sequential programs. Fibonacci was running; concurrency was not. The whole premise of Hew is that every concurrent unit owns its state and processes messages one at a time, and none of that machinery existed yet.

Prototyping in Rust, Shipping in C

If the codegen target is C, why prototype the runtime in Rust first? Because the design space was wide open. The first prototype sketched out mailboxes with overflow policies (Block, DropNew, DropOld, Fail, Coalesce), supervision strategies (OneForOne, OneForAll, RestForOne), and an actor state machine (Init → Running → Waiting/Stopping → Stopped/Crashed). Rust let me iterate on abstractions without fighting memory management at the same time. Once the API stopped changing every hour, I threw it away and rewrote in C.

The C runtime started simple: one OS thread per actor, each with a linked-list mailbox protected by a mutex and condition variable.

typedef struct hew_msg_node {
    int msg_type;
    void* data;
    size_t data_size;
    struct hew_msg_node* next;
} hew_msg_node;

typedef struct {
    hew_msg_node* head;
    hew_msg_node* tail;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    int closed;
} hew_mailbox;

A sender locks the mailbox, appends a message node, signals the condition variable, unlocks. The receiver blocks on pthread_cond_wait until something arrives. The closed flag handles graceful shutdown — once set, the receiver drains remaining messages and exits. Nothing clever. Just pthreads doing what pthreads does.
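In code, that send/receive pair looks roughly like this — a sketch against the structs above, with function names of my own choosing rather than the runtime's actual API:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct hew_msg_node {
    int msg_type;
    void* data;
    size_t data_size;
    struct hew_msg_node* next;
} hew_msg_node;

typedef struct {
    hew_msg_node* head;
    hew_msg_node* tail;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    int closed;
} hew_mailbox;

/* Sender: lock, append to the tail, signal, unlock. */
void hew_mailbox_send(hew_mailbox* mb, int type, const void* data, size_t size) {
    hew_msg_node* node = malloc(sizeof *node);
    node->msg_type = type;
    node->data = malloc(size);
    memcpy(node->data, data, size);
    node->data_size = size;
    node->next = NULL;

    pthread_mutex_lock(&mb->lock);
    if (mb->tail) mb->tail->next = node; else mb->head = node;
    mb->tail = node;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

/* Receiver: block until a message arrives or the mailbox is closed.
 * Once closed, remaining messages still drain; NULL means closed and empty. */
hew_msg_node* hew_mailbox_recv(hew_mailbox* mb) {
    pthread_mutex_lock(&mb->lock);
    while (!mb->head && !mb->closed)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    hew_msg_node* node = mb->head;
    if (node) {
        mb->head = node->next;
        if (!mb->head) mb->tail = NULL;
    }
    pthread_mutex_unlock(&mb->lock);
    return node;
}
```

The while loop around pthread_cond_wait matters: condition variables can wake spuriously, so the receiver re-checks the predicate every time.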

The First Real Actor Program

If you’ve built concurrent systems before, you know the moment — the first time output from two threads interleaves in your terminal and everything is still correct. The counter program was that moment:

actor Counter {
    count: i32,

    receive fn increment(amount: i32) -> i32 {
        count = count + amount;
        println(f"  [actor] count is now: {count}");
        count
    }
}

fn main() -> i32 {
    let counter = spawn Counter(0);
    counter.increment(10);
    counter.increment(20);
    sleep_ms(50);
    stop(counter);
    0
}

Seeing [actor] count is now: 10 print from a separate thread, unprompted, on its own schedule — that landed differently than a passing test. Under the hood, spawn allocates actor state, creates a mailbox, starts a thread, and returns a handle. counter.increment(10) compiles into a mailbox enqueue. The actor’s receive function runs in a loop on its own thread, pattern-matching on incoming message types. All the plumbing disappeared behind syntax that reads like a method call.
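The dispatch side of that plumbing might look something like this sketch — every name here is invented for illustration, since the post doesn't show the actual emitted C:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shapes for the generated code. */
enum { MSG_COUNTER_INCREMENT = 1 };

typedef struct { int32_t count; } Counter_state;
typedef struct { int32_t amount; } Counter_increment_msg;

/* The body of the actor's receive loop: pattern-match on the incoming
 * message type and run the corresponding handler. Because the actor
 * processes one message at a time on its own thread, the handler can
 * mutate state without any locking. */
static void counter_dispatch(Counter_state* self, int msg_type, void* data) {
    switch (msg_type) {
    case MSG_COUNTER_INCREMENT: {
        Counter_increment_msg* m = data;
        self->count += m->amount;
        printf("  [actor] count is now: %d\n", self->count);
        break;
    }
    }
}
```

On the caller's side, counter.increment(10) would then reduce to packing a Counter_increment_msg and enqueueing it with the message type tag.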

Fault Tolerance

Working actors immediately raised the next question: what happens when one crashes? The answer, borrowed from Erlang, is supervision trees. Three strategies cover the common dependency patterns:

  • OneForOne — restart just the failed child. The others are unaffected.
  • OneForAll — restart every child. Right for tightly coupled groups where partial state is worse than a full reset.
  • RestForOne — restart the failed child and everything started after it. Handles ordered dependencies.

The implementation tracks child actors in a linked list, installs a signal handler for thread exits, and applies the configured strategy on failure. It’s nowhere near the depth of testing that OTP has accumulated over decades. (I’m not sure it ever will be.) But it provides the structural foundation that actor programs need to recover gracefully.
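With children tracked in start order, the three strategies reduce to a small predicate. A sketch with illustrative names, assuming children are indexed 0..n-1 by start order:

```c
#include <stdbool.h>

typedef enum { ONE_FOR_ONE, ONE_FOR_ALL, REST_FOR_ONE } hew_restart_strategy;

/* Should child `i` be restarted when child `failed` crashes? */
static bool should_restart(hew_restart_strategy s, int failed, int i) {
    switch (s) {
    case ONE_FOR_ONE:  return i == failed;  /* just the crashed child */
    case ONE_FOR_ALL:  return true;         /* the whole group */
    case REST_FOR_ONE: return i >= failed;  /* crashed child + later siblings */
    }
    return false;
}
```

The supervisor would walk its child list on failure and restart every child for which the predicate holds, in start order.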

Supervision also made the scaling problem concrete. A supervisor managing a pool of workers was fine at 10 actors. At 100 it was getting heavy. At 10,000 — one OS thread per actor, each with a 2–8 MB stack — you’d exhaust memory on a 16 GB machine long before running out of work.

M:N Work-Stealing

The path forward was the same one Go, Erlang, and Tokio all converged on: M:N scheduling. A 1,165-line design document came first, then the scheduler. The architecture shift:

BEFORE (1:1)                        AFTER (M:N)
┌──────────┐ ┌──────────┐           ┌──────────────────────────┐
│ Actor A  │ │ Actor B  │           │       hew_sched          │
│ Thread A │ │ Thread B │           │  Worker 0   Worker 1     │
│ Mailbox  │ │ Mailbox  │           │  [A][B][C]  [D][E]       │
└──────────┘ └──────────┘           │      steal ←→ steal      │
  ...thousands of threads            └──────────────────────────┘
                                      N workers, M actors

Several decisions shaped the scheduler:

  • Configurable worker pool — HEW_WORKERS env var, or auto-detect via sysconf(_SC_NPROCESSORS_ONLN).
  • Per-worker local run queues with lock-free steal operations. A worker processes its own queue first, then steals from a random peer.
  • Message budget — each actor processes at most 1,000 messages per dispatch before yielding, preventing a hot actor from starving others.
  • Worker parking — idle workers block on a condition variable with a 10 ms timeout, then wake to check for stolen work.
  • CAS for state transitions — actor states update with __atomic_compare_exchange_n, no locks needed.

The actor state machine became an explicit enum:

typedef enum {
    HEW_ACTOR_IDLE     = 0,
    HEW_ACTOR_RUNNABLE = 1,
    HEW_ACTOR_RUNNING  = 2,
    HEW_ACTOR_BLOCKED  = 3,
    HEW_ACTOR_STOPPED  = 4
} hew_actor_state;

When a message arrives for an IDLE actor, the sender CAS-transitions it to RUNNABLE and pushes it onto a worker’s run queue. The worker pops it, transitions to RUNNING, dispatches messages up to the budget, then transitions back to IDLE (if the mailbox is empty) or re-enqueues as RUNNABLE (if messages remain). The design draws on ideas from Go’s GMP model and Tokio’s work-stealing pool.
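The sender-side transition can be sketched directly with that builtin — the function name and memory orders here are my choices, not necessarily the runtime's:

```c
#include <stdbool.h>

typedef enum {
    HEW_ACTOR_IDLE     = 0,
    HEW_ACTOR_RUNNABLE = 1,
    HEW_ACTOR_RUNNING  = 2,
    HEW_ACTOR_BLOCKED  = 3,
    HEW_ACTOR_STOPPED  = 4
} hew_actor_state;

/* Sender side: try to move an IDLE actor to RUNNABLE. If several
 * senders race, exactly one CAS succeeds, and only that winner pushes
 * the actor onto a run queue — so an actor can never be enqueued twice. */
static bool try_make_runnable(int* state) {
    int expected = HEW_ACTOR_IDLE;
    return __atomic_compare_exchange_n(
        state, &expected, HEW_ACTOR_RUNNABLE,
        false,                            /* strong CAS: no spurious failure */
        __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

A failed CAS means another sender already made the actor RUNNABLE (or it is currently RUNNING), in which case the message simply waits in the mailbox for the next dispatch pass.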

The public API stayed backwards-compatible throughout the rewrite. Every actor program that had already been written — including that counter — continued to run without changes. (I was more surprised by this than I probably should have been.)