Grammar Tooling and the MLIR Dialect Overhaul

Building Hew: Part 10 of 16

328 commits landed between Part 9 and now. String interpolation codegen. Drop/RAII. Move semantics at actor boundaries. Structured concurrency. A REPL. A formatter. Pretty error diagnostics. First-class regex literals. The default integer width went from i32 to i64. The old C++ frontend is gone, all 4,900 lines of it. And an MLIR dialect overhaul I’ll get to below.

But none of that is what a new user sees first — they see the website.

Building a grammar from scratch

Syntax highlighting is the first thing a developer notices, and monochrome code says “the tooling isn’t there yet” louder than any feature list.

I built a TextMate grammar across ten commits: comments, strings with interpolation, keywords, types, operators, attributes, function declarations. Written to match the Hew spec exactly, including constructs like receive fn and wire type that don’t exist in other languages. Shiki replaced the old tokenizer:

Before — keywords only:

fn fibonacci(n: i32) -> i32 {
    if n <= 1 { return n; }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

After — functions, types, operators, numbers, all scoped:

fn fibonacci(n: i32) -> i32 {
    if n <= 1 { return n; }
    return fibonacci(n - 1) + fibonacci(n - 2);
}
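To make "all scoped" a little more concrete, here is a minimal TypeScript sketch of how scope rules of this kind assign names to pieces of a line. It is not the published Hew grammar: the scope names, the patterns, and the `tokenize` helper are all assumptions for illustration. It is also simpler than a real TextMate engine, which uses Oniguruma regexes and picks whichever rule matches earliest in the line instead of trying rules in a fixed order at each offset.

```typescript
// Sketch only: scope names and patterns are illustrative guesses,
// not the actual Hew TextMate grammar.
interface Rule {
  scope: string;
  match: RegExp; // sticky (y) so each attempt is anchored at the current offset
}

interface Token {
  text: string;
  scope: string;
}

const rules: Rule[] = [
  { scope: "comment.line.double-slash.hew", match: /\/\/.*/y },
  // Multi-word keywords such as "receive fn" are matched as a single token.
  { scope: "keyword.other.hew", match: /\b(?:receive\s+fn|wire\s+type|fn|if|return|match)\b/y },
  { scope: "storage.type.hew", match: /\b(?:i32|i64|bool)\b/y },
  { scope: "constant.numeric.hew", match: /\b\d+\b/y },
  // An identifier directly followed by "(" is treated as a function name.
  { scope: "entity.name.function.hew", match: /\b[A-Za-z_]\w*(?=\s*\()/y },
  { scope: "keyword.operator.hew", match: /->|<=|[+\-*/<>=]/y },
];

function tokenize(line: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    let advanced = false;
    for (const rule of rules) {
      rule.match.lastIndex = pos;
      const m = rule.match.exec(line);
      if (m !== null && m[0].length > 0) {
        tokens.push({ text: m[0], scope: rule.scope });
        pos += m[0].length;
        advanced = true;
        break;
      }
    }
    if (!advanced) pos += 1; // no rule applies here: leave the character unscoped
  }
  return tokens;
}

// Print each scoped token for the first line of the fibonacci example.
for (const t of tokenize("fn fibonacci(n: i32) -> i32 {")) {
  console.log(`${t.text}: ${t.scope}`);
}
```

A keywords-only grammar corresponds to keeping just the second rule; everything else in the line would fall through unscoped, which is the monochrome "before" state.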

A 967-line ANTLR4 grammar landed in the main repo for validation. Editor grammars followed for nano, Emacs, Sublime Text, and tree-sitter. The website brought both together in an interactive grammar explorer with railroad diagrams and a toggle between spec EBNF and ANTLR4:

match_expr
    : 'match' expression '{' match_arm+ '}'
    ;

match_arm
    : pattern ('if' expression)? '=>' (expression | block) ','?
    ;

The MLIR dialect overhaul

This was the biggest compiler change of the batch. The dialect overhaul replaced raw LLVM dialect ops — llvm.call @hew_vec_new scattered everywhere through the IR — with typed Hew dialect operations that lower semantically.

// Before: raw LLVM function calls
%0 = llvm.call @hew_vec_new(%size) : (i64) -> !llvm.ptr
llvm.call @hew_vec_push(%0, %elem) : (!llvm.ptr, i64) -> ()

// After: typed Hew dialect ops with semantic lowering
%0 = hew.vec.new : !hew.vec<i64>
hew.vec.push(%0, %elem) : (!hew.vec<i64>, i64) -> ()

Parameterized types for collections, actors, and handles. Dedicated ops for Vec, HashMap, string, and runtime calls. Trait dispatch with scf.if chain lowering. The IR went from LLVM function-call soup to something that actually reads like Hew.
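One way to picture "lower semantically" is as a single table that maps each typed op to the runtime symbol it eventually calls, so the knowledge lives in one lowering step and not at every call site. The TypeScript below is a toy model of that idea, not the compiler's code: the `Op` shape and the `lowerOp` helper are hypothetical, and only the two op names and two runtime symbols are taken from the snippet above.

```typescript
// Toy model of a lowering step. The Op interface and lowerOp are hypothetical;
// only "hew.vec.new", "hew.vec.push" and their runtime symbols come from the post.
interface Op {
  name: string;       // e.g. "hew.vec.push" or "llvm.call"
  callee?: string;    // runtime symbol, present only on llvm.call
  operands: string[]; // SSA value names such as "%0"
}

// Each typed dialect op knows which runtime function implements it.
const runtimeSymbols = new Map<string, string>([
  ["hew.vec.new", "hew_vec_new"],
  ["hew.vec.push", "hew_vec_push"],
]);

function lowerOp(op: Op): Op {
  const callee = runtimeSymbols.get(op.name);
  if (callee === undefined) {
    return op; // not an op this table covers: leave it untouched
  }
  return { name: "llvm.call", callee, operands: op.operands };
}

const lowered = lowerOp({ name: "hew.vec.push", operands: ["%0", "%elem"] });
console.log(`${lowered.name} @${lowered.callee}(${lowered.operands.join(", ")})`);
```

Until that step runs, every pass before it sees `hew.vec.push` with its element type attached, not an opaque pointer-taking call.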

That mattered for the next step: a WASM target in hew-cli with a native WASM runtime crate. The Emscripten-compiled hewcpp now runs diagnostics in the browser — same MLIR compiler that produces native binaries, checking your code as you type in the playground. A built-in profiler with per-actor timing and pprof export rounded out the runtime instrumentation.

A Tailwind naming collision

While adding light/dark mode to hew.sh, I named the background color token base. This broke things in a way that took longer to diagnose than I’m comfortable admitting.

// Named the background token "base"
colors: {
    base: 'rgb(var(--bg-base) / <alpha-value>)',
}

// Tailwind's md:text-base means "font-size: 1rem" at medium breakpoints
// But with a color named "base", it became "color: white"
<h1 class="text-4xl md:text-base">  // <-- white text, not 16px

// Fix: rename to "canvas"
colors: {
    canvas: 'rgb(var(--bg-canvas) / <alpha-value>)',
}

Tailwind’s text-base is a font-size utility — font-size: 1rem. But adding a color named base to the Tailwind config also generates text-base as a color utility. The color variant won. md:text-base on the hero heading went from “16px at medium breakpoints” to “white text.” The hero just vanished on light backgrounds.

The fix was a one-word rename: the token became canvas. (One of those bugs that takes an hour to find and one line to fix.)
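A small check can catch this class of collision before it ships. The sketch below is a hypothetical helper, not part of Tailwind: it compares top-level color token names against font-size keys and reports any `text-*` class name that both would claim. The font-size entries shown are a subset of Tailwind's default scale, and nested color palettes are deliberately ignored to keep the example short.

```typescript
// Hypothetical lint helper, not a Tailwind API. It only compares key names.
function findTextUtilityCollisions(
  colors: Record<string, unknown>,
  fontSizes: Record<string, unknown>,
): string[] {
  return Object.keys(colors)
    .filter((name) => Object.prototype.hasOwnProperty.call(fontSizes, name))
    .map((name) => `text-${name}`);
}

// Subset of Tailwind's default font-size scale (assumed; check your installed version).
const fontSizes: Record<string, string> = {
  xs: "0.75rem",
  sm: "0.875rem",
  base: "1rem",
  lg: "1.125rem",
  xl: "1.25rem",
};

// The token name that caused the bug, next to its replacement.
const before = { base: "rgb(var(--bg-base) / <alpha-value>)" };
const after = { canvas: "rgb(var(--bg-canvas) / <alpha-value>)" };

console.log(findTextUtilityCollisions(before, fontSizes)); // the colliding class names
console.log(findTextUtilityCollisions(after, fontSizes));  // nothing to report
```

The same comparison would also flag color tokens named sm, lg, or xl, which are just as easy to pick by accident.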