Pre-Alpha Syntax Cleanup

Building Hew: Part 14 of 16

Before tagging v0.1.0-alpha, I spent a day doing nothing but removing things. (It was more satisfying than I expected.) Syntax is hard to change later — once people start writing code in your language, every removal is a migration and every inconsistency becomes somebody’s mental model. Seemed like a good time to clean house.

Seven removals

This was the largest single syntax change in Hew’s history. It removed the struct keyword, enforced semicolons as field separators, required fn after receive, switched loop labels from 'label: to @label:, and deleted template literals.

The struct keyword was a historical artifact. Hew already used type for type declarations — you could write type Point { ... } or struct Point { ... } and they meant exactly the same thing. Two ways to declare a type, zero semantic difference. I don’t know why I kept both around as long as I did:

type Point {
    x: f64,
    y: f64,
}

became:

type Point {
    x: f64;
    y: f64;
}

Semicolons instead of commas came from the same principle. Hew uses semicolons to terminate statements. Commas between fields were a grammar irregularity borrowed from languages where structs aren’t statement-bearing. Hew types can contain methods, so semicolons are the natural separator: one rule for everything.

Then I removed try/catch and race from the AST and every downstream consumer: parser, type checker, serializer, enricher, C++ codegen, LSP, doc extractor. 107 lines deleted from expression codegen alone. race was select without the ability to name what you’re waiting for, so it was strictly less powerful. try/catch had been an escape hatch for error recovery, but Result<T, E> with ? already handled the common case, and match handles the rest. The old form:

let value = try {
    let x = divide(10, 0)?;
    x * 3
} catch e {
    -1
};

is now written with the ? operator in a function that returns Result:

fn compute() -> Result<i32, i32> {
    let x = divide(10, 0)?;   // propagates Err
    Ok(x * 3)
}
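
The claim that ? covers the common case and match covers the rest can be sketched in Rust, whose Result and ? behave much like the Hew snippets here. This is an analogy rather than Hew code: divide, its error code 1, and compute_or_default are hypothetical names chosen for illustration.

```rust
// Rust analogue of the Hew snippets; names and error codes are illustrative.
fn divide(a: i32, b: i32) -> Result<i32, i32> {
    if b == 0 {
        Err(1) // hypothetical error code for division by zero
    } else {
        Ok(a / b)
    }
}

// The common case: `?` returns early with the Err, like the old try body.
fn compute(a: i32, b: i32) -> Result<i32, i32> {
    let x = divide(a, b)?;
    Ok(x * 3)
}

// The recovery case: `match` plays the role of the old catch arm.
fn compute_or_default(a: i32, b: i32) -> i32 {
    match compute(a, b) {
        Ok(value) => value,
        Err(_code) => -1,
    }
}
```

With these definitions, compute_or_default(10, 0) yields -1, the same fallback the removed try/catch example produced, without a dedicated language construct.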

Template literals (backtick strings with ${expr}) were removed in the same pass. Hew already had f-strings — f"hello {name}" — which do the same thing with clearer syntax. Two interpolation mechanisms is one too many.

Scope gets a binding

The scope syntax redesign was the one change that added something while removing more. The old design used scope.launch, scope.cancel(), and scope.is_cancelled() as bare keywords that magically referred to the enclosing scope. (I’m still not sure what I was thinking with that.) The new design makes the scope a named binding:

scope.launch { fetch_user(id1) };
scope.cancel();
if scope.is_cancelled() { ... }

became:

scope |s| {
    let a = s.launch { fetch_user(id1) };
    let b = s.launch { fetch_user(id2) };
    s.cancel();
    if s.is_cancelled() { ... }
}

No more implicit scope reference. The |s| is explicit — you can see where the scope enters, you can name it, and nested scopes don’t shadow each other. Three pseudo-keywords removed from the grammar, one binding syntax added.
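
For comparison, Rust’s standard library uses the same explicit-binding shape for scoped threads. The sketch below is Rust, not Hew, and only mirrors the launch half of the design: std::thread::scope has no cancel or is_cancelled, and fetch_user here is a hypothetical stand-in.

```rust
use std::thread;

// Stand-in for the fetch_user call in the Hew example (hypothetical).
fn fetch_user(id: u32) -> String {
    format!("user-{id}")
}

fn fetch_pair(id1: u32, id2: u32) -> (String, String) {
    // `s` is an ordinary binding: work is spawned through it, and the
    // scope does not return until every spawned thread has finished.
    thread::scope(|s| {
        let a = s.spawn(move || fetch_user(id1));
        let b = s.spawn(move || fetch_user(id2));
        (a.join().unwrap(), b.join().unwrap())
    })
}
```

Because the scope is a value with a name, a nested scope can pick a different name and both remain addressable, which is the property the |s| syntax is after.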

Named arguments

Actor spawn already took named fields — spawn Counter(count: 0) — because constructing an actor is conceptually creating a struct. But function calls were positional-only, which meant API signatures with multiple integer parameters were a readability hazard. (The kind of thing you don’t notice until you’re debugging the third wrong-argument-order bug in a week.)

I added named arguments through the full pipeline. The parser uses two-token lookahead (identifier followed by colon) to detect named args. The type checker validates that named arguments match parameter names and rejects positional args after named ones. The C++ codegen reorders arguments to match parameter position:

fn connect(host: String, port: i32, timeout: i32) -> Connection { ... }

// Positional — which integer is which?
connect("db.local", 5432, 30);

// Named — reads like documentation
connect("db.local", port: 5432, timeout: 30);

Named arguments are optional: you can still call everything positionally. But when a function takes more than one integer, being able to write timeout: 30 instead of hoping you got the order right is the difference between readable code and a bug waiting to happen.
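
The validation and reordering rules described above can be sketched in a few lines of Rust. This is a minimal model under stated assumptions, not the Hew compiler’s code: the Arg enum, resolve_args, and the error messages are all hypothetical, and expressions are represented as plain strings.

```rust
// Hypothetical model of a call argument after parsing.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Positional(String),    // expression text, e.g. "5432"
    Named(String, String), // (parameter name, expression text)
}

// Returns the argument expressions in parameter order, or an error message.
fn resolve_args(params: &[&str], args: &[Arg]) -> Result<Vec<String>, String> {
    let mut slots: Vec<Option<String>> = vec![None; params.len()];
    let mut seen_named = false;

    for (i, arg) in args.iter().enumerate() {
        match arg {
            Arg::Positional(expr) => {
                if seen_named {
                    return Err("positional argument after named argument".to_string());
                }
                if i >= params.len() {
                    return Err("too many arguments".to_string());
                }
                slots[i] = Some(expr.clone());
            }
            Arg::Named(name, expr) => {
                seen_named = true;
                let idx = params
                    .iter()
                    .position(|p| *p == name.as_str())
                    .ok_or_else(|| format!("no parameter named `{name}`"))?;
                if slots[idx].is_some() {
                    return Err(format!("argument `{name}` given more than once"));
                }
                slots[idx] = Some(expr.clone());
            }
        }
    }

    // Every parameter must end up with exactly one argument.
    slots
        .into_iter()
        .zip(params)
        .map(|(slot, p)| slot.ok_or_else(|| format!("missing argument `{p}`")))
        .collect()
}
```

Given the connect signature, a call written as connect("db.local", timeout: 30, port: 5432) resolves to the order host, port, timeout, so later stages only ever see positional arguments.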

Making the formatter honest

A code formatter that changes your program’s meaning is worse than no formatter at all. (Ask me how I know.) I fixed eight semantic-breaking bugs in hew fmt:

// Before: hew fmt changed semantics
await expr          →  expr.await           // wrong: Hew uses prefix await
spawn Foo(x: 1)     →  spawn Foo(1)         // dropped named field
0xFF                →  255                  // lost hex intent
wire { n: i32 @1 }  →  wire { n: i32 = 1 }  // wrong tag syntax

The root cause in most cases was the formatter reconstructing syntax from the AST without enough context. Operator precedence wasn’t tracked, so necessary parentheses got dropped. Integer literals were stored as values, losing their radix. Named call arguments were serialized as positional. The fix added operator precedence tracking, an IntRadix enum for faithful literal round-tripping, and field-aware call argument formatting. Then I added --check mode for CI and in-place formatting with -i.
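
Two of those fixes, radix-preserving literals and precedence-aware parentheses, fit in a small Rust sketch. The post names IntRadix but not its variants, so the variants, the Expr shape, and fmt_expr below are assumptions made for illustration, with only two precedence levels.

```rust
// Hypothetical AST shapes; only the name IntRadix comes from the post.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntRadix {
    Decimal,
    Hex,
    Binary,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int { value: u64, radix: IntRadix },
    Binary { op: char, lhs: Box<Expr>, rhs: Box<Expr> },
}

// Higher number binds tighter. Only two levels, for illustration.
fn precedence(op: char) -> u8 {
    match op {
        '*' | '/' => 2,
        _ => 1, // '+' and '-'
    }
}

// Prints `expr`, parenthesizing it when its operator binds more loosely
// than the surrounding context requires (`min_prec`).
fn fmt_expr(expr: &Expr, min_prec: u8) -> String {
    match expr {
        // The stored radix decides the spelling; digit case is normalized.
        Expr::Int { value, radix } => match radix {
            IntRadix::Decimal => format!("{value}"),
            IntRadix::Hex => format!("0x{value:X}"),
            IntRadix::Binary => format!("0b{value:b}"),
        },
        Expr::Binary { op, lhs, rhs } => {
            let prec = precedence(*op);
            // All operators here are left-associative, so the right operand
            // needs a strictly tighter operator to go without parentheses.
            let text = format!("{} {} {}", fmt_expr(lhs, prec), op, fmt_expr(rhs, prec + 1));
            if prec < min_prec { format!("({text})") } else { text }
        }
    }
}
```

The design point is that the printer never has to guess: the literal carries its own radix, and each operand is told how tightly its context binds.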

After the fixes, I ran hew fmt across every example program in the repository. Nine programs had parse errors that the formatter surfaced. A formatter that catches bugs by being strict about what it accepts is earning its keep.

Three crates become one

hew-doc, hew-eval, and hew-test had been separate crates since the early days, back when it seemed like the language might have standalone tooling for documentation, REPL evaluation, and test running. In practice all three were always invoked through hew-cli, shared the same dependencies, and had identical version numbers. Three crates, three Cargo.toml files, three things to version-bump on every release. Whether splitting them originally was over-engineering or just premature, I honestly can’t tell.

I moved the source files into hew-cli/src/ as submodules, rewired the imports, and deleted 3,356 lines of now-redundant crate infrastructure:

hew-cli/
├── src/
│   ├── main.rs          // CLI entry point
│   ├── compile.rs       // compiler driver
│   ├── doc/             // ← was hew-doc crate
│   │   ├── extract.rs
│   │   ├── highlight.rs
│   │   └── render.rs
│   ├── eval/            // ← was hew-eval crate
│   │   ├── repl.rs
│   │   ├── session.rs
│   │   └── classify.rs
│   └── test_runner/     // ← was hew-test crate
│       ├── runner.rs
│       ├── discovery.rs
│       └── output.rs

One crate, one binary, one thing to ship. The workspace went from nine crates to six.

v0.1.0-alpha

I added the first CHANGELOG. Writing it required answering: what does Hew actually have? Two compilers (Rust/MLIR and C++/MLIR). A runtime with M:N scheduling, per-actor heaps, and supervision trees. An LSP with completions, go-to-definition, and semantic tokens. A formatter, a debugger, a test runner, a REPL, a package manager. Wire types with binary and JSON serialization. Installers for Debian, RPM, Arch, Alpine, Nix, Homebrew, and Docker. (I had to go look at the repo to remember half of this.)

Before the CHANGELOG, 45 example programs needed updating for the new syntax. After it came a generative compiler fuzzer with 48 categories, to make sure nothing else broke. The fuzzer found one crash: a stack overflow on deeply nested expressions (1,000+ chained binary operators). The fix was a 64 MiB stack thread for the CLI and dynamic stack growth in the type checker using stacker::maybe_grow.
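
The first half of that fix, running recursive passes on a thread with a larger stack, can be sketched with the standard library alone. The Expr type, eval, and chain below are hypothetical stand-ins for the compiler’s own AST and passes, and the sketch does not cover the stacker::maybe_grow half, which needs the external stacker crate.

```rust
use std::thread;

// Hypothetical expression tree; a chain of `+` nests one level per operator.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Recursive walk standing in for a compiler pass: one stack frame per level.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Lit(v) => *v,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}

// Builds `1 + 1 + ... + 1` iteratively, so construction itself never recurses.
fn chain(ops: usize) -> Expr {
    let mut expr = Expr::Lit(1);
    for _ in 0..ops {
        expr = Expr::Add(Box::new(expr), Box::new(Expr::Lit(1)));
    }
    expr
}

// Runs the whole build, walk, and drop cycle on a thread with a 64 MiB stack.
fn eval_on_big_stack(ops: usize) -> i64 {
    const STACK_SIZE: usize = 64 * 1024 * 1024;
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(move || {
            let expr = chain(ops);
            eval(&expr) // `expr` is also dropped, recursively, on this thread
        })
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}
```

A fixed larger stack only raises the ceiling. Growing the stack on demand inside the recursive pass, as the type checker fix does, is what removes the dependence on a chosen limit.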