Throwing Away the First Compiler

· 6 min read
Building Hew: Part 5 of 16

The Rust compiler had tests passing. It could parse Hew, type-check it, and emit LLVM IR through Inkwell. By the end of February 13, function codegen, control flow, structs, enums, string interpolation, actors, lambdas, and impl blocks were all working. You don’t walk away from that lightly.

(I walked away from it.)

The Ceiling

The Inkwell scaffold went up at 9:01 AM on February 13, and by 11:15 AM the first end-to-end test was passing. The pace felt great. Over the next several hours, feature after feature landed — object emission and linking, actor codegen with state structs, lambda closures. But with each feature, the same friction kept showing up.

LLVM’s C API (which Inkwell wraps) gives you a flat builder interface. You construct IR instruction by instruction, and that’s it. No progressive lowering, no dialect system, no way to teach the optimizer about your language’s semantics. For most languages that’s fine, but for one with actors and structured concurrency, it’s a problem.

Consider what happens to a spawn. With the C API, it becomes a function call immediately — just hew_rt_spawn, indistinguishable from any other call. The optimizer can’t reason about actor lifecycles, can’t fuse message sends, can’t eliminate dead actors. The abstraction is gone before optimization begins. What I needed were custom operations like hew.spawn and hew.send that could carry semantic meaning through the pipeline and get lowered progressively.

The 1 AM Decision

At 1:21 AM on February 14, I committed a 204-line design document laying out the new architecture:

Source (.hew) → Lexer → Parser → AST → TypeChecker
    → MLIR (Hew Dialect)
    → Progressive Lowering (func/scf/arith/memref)
    → LLVM Dialect → LLVM IR
    → Object Code → Link with libhew_rt.a
    → Executable

MLIR lets you define a custom dialect — your own operations, your own types, your own optimization passes — and then lower them step by step through standard dialects until you reach LLVM IR. Actor operations start as high-level hew.spawn and hew.send ops with full semantic information. They get lowered to function calls and memory operations only when the high-level passes are done with them.

The catch: MLIR is a C++ framework. Rust bindings exist but they’re incomplete. Going down this road meant building a new compiler frontend in C++20. A second compiler. From scratch.

(I spent longer than I should have staring at the working test suite before committing to this.)

There was a 1,024-line implementation plan covering 19 tasks. I committed it and started writing C++.

Second First Light

The rebuild was absurdly fast. The C++ lexer was complete by 10:45 AM — 1,749 lines, 64 keywords, 28 test cases. By 11:24 AM, the parser was working: 4,797 lines, built on Pratt precedence climbing. By 12:07 PM, the MLIR dialect had been defined in TableGen with 8 custom operations.
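Pratt precedence climbing is why an expression parser can stay compact: every operator carries a binding power, and the parser loops while the next operator binds tighter than the current context. A minimal self-contained sketch of the technique in C++ — the names and structure here are invented for illustration, not taken from the real hewcpp parser:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Minimal Pratt-style parser over digits, '+', and '*'.
// Illustrative only: hypothetical names, not the hewcpp implementation.
struct Parser {
    std::string src;
    size_t pos = 0;

    char peek() const { return pos < src.size() ? src[pos] : '\0'; }

    static int bindingPower(char op) {
        if (op == '+') return 10;
        if (op == '*') return 20;  // '*' binds tighter than '+'
        return 0;                  // not an operator: stop climbing
    }

    // Parse an expression whose operators all bind tighter than minBp.
    long parseExpr(int minBp) {
        assert(std::isdigit(static_cast<unsigned char>(peek())));
        long lhs = src[pos++] - '0';   // primary: a single digit
        while (true) {
            char op = peek();
            int bp = bindingPower(op);
            if (bp <= minBp) break;    // next operator binds too loosely
            ++pos;                     // consume the operator
            long rhs = parseExpr(bp);  // equal precedence stops: left-assoc
            lhs = (op == '+') ? lhs + rhs : lhs * rhs;
        }
        return lhs;
    }
};

long eval(const std::string& s) { return Parser{s}.parseExpr(0); }
```

The appeal over a grammar-per-precedence-level recursive descent parser is that adding an operator is one line in the binding-power table, not a new grammar rule.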

TableGen definitions are declarative:

def Hew_ConstantOp : Hew_Op<"constant", [Pure]> {
  let summary = "Produce a constant value.";
  let arguments = (ins AnyAttr:$value);
  let results = (outs AnyType:$result);
}

The full set of Hew-specific operations: hew.constant, hew.global_string, hew.print, hew.cast, hew.struct_init, hew.field_get, hew.field_set, and hew.enum_variant. Everything else — arithmetic, control flow, memory management — uses standard MLIR dialects: arith, scf, func, memref. The custom dialect stays deliberately small.
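To make the split concrete, here is roughly what a trivial program could look like in that dialect before lowering. This is a hand-written sketch, not actual hewcpp output — the operand and type syntax of the hew.* ops is my guess:

```mlir
func.func @main() -> i32 {
  // Custom dialect ops carry Hew-level meaning...
  %msg = hew.global_string "hello" : !hew.string
  hew.print %msg : !hew.string
  // ...while plain arithmetic stays in the standard arith dialect.
  %c0 = arith.constant 0 : i32
  return %c0 : i32
}
```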

At 1:36 PM, Fibonacci compiled and ran through the MLIR pipeline. Twelve hours from design document to working codegen.

// Lowers Hew MLIR through the standard pipeline:
//   hew dialect -> func/arith/llvm ops
//   SCF -> ControlFlow -> LLVM dialect
//   LLVM dialect -> LLVM IR -> object file -> linked executable

Three conversion passes. The first converts Hew dialect operations to standard dialects — hew.constant becomes arith.constant, hew.print becomes a func.call to the runtime. The second converts structured control flow to basic blocks with explicit branches. The third converts everything to the LLVM dialect.
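The shape of those passes — walk the ops, rewrite the ones a pass recognizes, leave the rest for a later pass — can be mimicked on a toy string-based IR. A self-contained C++ sketch of the idea, with no real MLIR involved; the op names only echo the ones above, and the runtime symbol hew_rt_print is an invented stand-in:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy "progressive lowering": each pass rewrites the ops it knows and
// passes everything else through unchanged, mirroring how an MLIR
// conversion pass maps one dialect onto lower ones. Not real MLIR.
using Module = std::vector<std::string>;

Module runPass(const Module& in,
               const std::map<std::string, std::string>& rules) {
    Module out;
    for (const auto& op : in) {
        auto it = rules.find(op);
        out.push_back(it != rules.end() ? it->second : op);  // rewrite or keep
    }
    return out;
}

Module lower(Module m) {
    // Pass 1: Hew dialect -> standard dialects.
    m = runPass(m, {{"hew.constant", "arith.constant"},
                    {"hew.print", "func.call @hew_rt_print"}});
    // Pass 2: structured control flow -> explicit branches.
    m = runPass(m, {{"scf.if", "cf.cond_br"}});
    // Pass 3: everything -> the LLVM dialect.
    m = runPass(m, {{"arith.constant", "llvm.mlir.constant"},
                    {"func.call @hew_rt_print", "llvm.call @hew_rt_print"},
                    {"cf.cond_br", "llvm.cond_br"}});
    return m;
}
```

The point of the staging is visible even in the toy: hew.print survives pass 1 as a generic func.call, so any pass that wants to reason about printing specifically has to run before then.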

Fifty for Fifty

By 8:38 PM, lambda closure capture was working via lambda lifting, and all fifty end-to-end tests were passing. The closure approach: walk the AST to find free variables in lambda bodies, add them as extra parameters to the generated function, pass captured values at call sites. No heap-allocated closure objects, no runtime indirection. This works because Hew’s lambdas capture by value and can’t escape their defining scope in most cases. When they do escape — passed to an actor — the actor machinery handles ownership transfer.
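The free-variable walk is the heart of lambda lifting. A self-contained sketch of that analysis in C++ over a toy expression tree — the node shapes are invented for illustration, not hewcpp's real AST:

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy AST: variables, single-parameter lambdas, and calls.
// Invented for illustration; not the real hewcpp node types.
struct Expr {
    enum Kind { Var, Lambda, Call } kind;
    std::string name;                         // Var: its name; Lambda: its parameter
    std::vector<std::shared_ptr<Expr>> kids;  // Lambda: body; Call: callee + args
};

// Collect variables used in `e` that no enclosing lambda binds.
// These become the extra parameters of the lifted function.
void freeVars(const Expr& e, std::set<std::string> bound,
              std::set<std::string>& out) {
    switch (e.kind) {
    case Expr::Var:
        if (!bound.count(e.name)) out.insert(e.name);
        break;
    case Expr::Lambda:
        bound.insert(e.name);  // the parameter is bound inside the body
        for (const auto& k : e.kids) freeVars(*k, bound, out);
        break;
    case Expr::Call:
        for (const auto& k : e.kids) freeVars(*k, bound, out);
        break;
    }
}
```

For a lambda like |x| x + y, the walk reports y as free; the lifted function then takes y as an extra parameter, and each call site passes the captured value.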

Fifty tests, all green, through a compiler that hadn’t existed eighteen hours earlier.

Hew now has two compilers: the Rust-based hewc (with both C and Inkwell LLVM backends) and the C++ hewcpp (with MLIR). The MLIR compiler is the primary path going forward. The Rust compiler stays on as a reference implementation and powers the playground backend.