Replacing FlatBuffers with MessagePack

Ten posts of feature work and the codebase was starting to creak. Not in ways that showed up in tests — everything passed. But the small frictions were accumulating: a serialization layer chosen for properties I never actually used, a flat namespace that puts print next to a PostgreSQL driver, a standard library with no mechanism for users to extend it.

So I stopped building features and spent two days tearing things apart. Fifty-nine commits later, the language looked exactly the same from the outside.

The FlatBuffers tax

FlatBuffers had been in the pipeline since the early two-compiler architecture. The Rust frontend serializes the AST, the C++ MLIR backend deserializes it, and FlatBuffers handled the boundary with zero-copy reads. In theory, zero-copy was elegant — the C++ side walks the AST directly from the serialized buffer without allocating. In practice, the C++ side is invoked as a subprocess. The AST gets serialized, written to a pipe, read back, and processed. (Zero-copy across a process boundary isn’t really zero-copy — it’s just a fast format with a complex build dependency.)

The tax was real: a FlatBuffers schema that had to stay in sync with the Rust AST, generated code on both sides, accessor methods that turned field access into function calls, and a build that needed flatc available. Every AST change meant updating the schema, regenerating code, and verifying both sides agreed. One more thing that could drift.

The migration happened over seven commits. Serde derives on the Rust AST types to make them serializable. A MessagePack serializer alongside FlatBuffers so both paths could coexist during the transition. Native C++ AST types and a MessagePack reader replaced the generated FlatBuffers types. Then the big one: converting all seven MLIRGen*.cpp files from FlatBuffers accessors to the new types. Tests updated. CLI switched over. And finally, every FlatBuffers dependency removed.

The difference in the codegen files was immediate:

// FlatBuffers: generated accessors, nested indirection
auto expr = stmt->expr();
auto binop = expr->expr_as_BinaryOp();
auto lhs = binop->lhs();
auto rhs = binop->rhs();
auto op = binop->op();

// MessagePack: native C++ types, direct field access
auto& expr = stmt.expr;
auto& binop = std::get<ast::BinaryOp>(expr.kind);
auto& lhs = *binop.lhs;
auto& rhs = *binop.rhs;
auto op = binop.op;

The migration deleted more code than it added. MessagePack is self-describing, so there’s no schema file. The C++ types are regular structs, so field access is just field access. The build no longer needs flatc. The only thing lost was zero-copy deserialization across a boundary that was never zero-copy to begin with.
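To make "self-describing" concrete, here is a minimal sketch of a MessagePack reader that recovers structure from the bytes alone, with no schema file. It is written by hand only so that it needs no external crates; the real pipeline uses Serde on the Rust side and a C++ reader on the other. The `Value` type, the `decode` and `field` helpers, and the map layout with `op`/`lhs`/`rhs` keys are all assumptions for illustration, since the post does not describe the actual wire layout. Only four of the format's type families are handled.

```rust
/// A decoded MessagePack value. Only the types this sketch needs.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Array(Vec<Value>),
    Map(Vec<(Value, Value)>),
}

/// Decode one value starting at `*pos`, advancing `*pos` past it.
/// Handles positive fixint, fixstr, fixarray, and fixmap only;
/// any other tag byte, or a truncated buffer, yields `None`.
pub fn decode(buf: &[u8], pos: &mut usize) -> Option<Value> {
    let tag = *buf.get(*pos)?;
    *pos += 1;
    match tag {
        // positive fixint: the tag byte is the value itself
        0x00..=0x7f => Some(Value::Int(i64::from(tag))),
        // fixmap: low 4 bits hold the number of key/value pairs
        0x80..=0x8f => {
            let n = (tag & 0x0f) as usize;
            let mut pairs = Vec::with_capacity(n);
            for _ in 0..n {
                let key = decode(buf, pos)?;
                let val = decode(buf, pos)?;
                pairs.push((key, val));
            }
            Some(Value::Map(pairs))
        }
        // fixarray: low 4 bits hold the element count
        0x90..=0x9f => {
            let n = (tag & 0x0f) as usize;
            let mut items = Vec::with_capacity(n);
            for _ in 0..n {
                items.push(decode(buf, pos)?);
            }
            Some(Value::Array(items))
        }
        // fixstr: low 5 bits hold the byte length of the UTF-8 payload
        0xa0..=0xbf => {
            let len = (tag & 0x1f) as usize;
            let end = (*pos).checked_add(len)?;
            let bytes = buf.get(*pos..end)?;
            *pos = end;
            Some(Value::Str(String::from_utf8(bytes.to_vec()).ok()?))
        }
        _ => None,
    }
}

/// Look up a string key in a decoded map.
pub fn field<'a>(v: &'a Value, key: &str) -> Option<&'a Value> {
    match v {
        Value::Map(pairs) => pairs.iter().find_map(|(k, val)| match k {
            Value::Str(s) if s.as_str() == key => Some(val),
            _ => None,
        }),
        _ => None,
    }
}
```

Every value starts with a tag byte that says what follows, which is why the reader can walk an unknown buffer. The trade-off is that field names (or positions) become the contract between the two sides instead of a schema file.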

Drawing the lines

The standard library had been a flat namespace. std::http, std::db::postgres, std::crypto::jwt — everything under std::. Fine when the library was small. Less fine when you start wondering whether print is really the same kind of thing as a PostgreSQL driver. (They sat at the same level of the namespace hierarchy, which implied they had equivalent status. They don’t.)

The split drew a line. Language fundamentals (print, string operations, math, collections) stay in std::. Ecosystem libraries (databases, HTTP, caching, crypto) move to the hew:: namespace:

// Before: everything under std::
use std::http::server;
use std::db::postgres;
use std::crypto::jwt;

// After: language fundamentals in std::, ecosystem in hew::
use std::string;
use std::math;
use hew::http::server;
use hew::db::postgres;
use hew::crypto::jwt;

hew::db::postgres sits alongside hew::db::sqlite. hew::cache::redis alongside hew::cache::memcached. The namespace itself communicates what kind of thing each library is. Documentation updates followed, then fixes for stale references, and finally module-path import resolution, so that use hew::db::postgres resolves to an actual .hew file on disk via a predictable path convention.
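A path convention like that can be sketched in a few lines. The exact rule is an assumption here: each `::`-separated segment becomes a directory component under some library root, and the last one gets a `.hew` extension. The function name `module_to_file` and the `lib` root used in the note below are hypothetical, not taken from the compiler.

```rust
use std::path::{Path, PathBuf};

/// Map a module path such as `hew::db::postgres` to a source file under `root`.
/// Assumed convention: one directory per segment, `.hew` on the last segment.
/// Returns `None` for malformed paths (empty segments, or characters outside
/// ASCII letters, digits, and underscore), which also rules out `..` tricks.
pub fn module_to_file(root: &Path, module: &str) -> Option<PathBuf> {
    let mut file = root.to_path_buf();
    for segment in module.split("::") {
        let ok = !segment.is_empty()
            && segment
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '_');
        if !ok {
            return None;
        }
        file.push(segment);
    }
    // Segments contain no dots, so this appends rather than replaces.
    file.set_extension("hew");
    Some(file)
}
```

With a root of `lib`, the module path `hew::db::postgres` maps to `lib/hew/db/postgres.hew`. Whether the file actually exists is a separate check that a real resolver would make before reporting an unresolved import.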

Package manager and export machinery

With namespaces sorted, the next question was how third-party packages fit in. The answer was Adze — a package manager using hew.toml manifests. Registry support, dependency resolution, and add/install/publish/list commands landed together, followed by integration with hew build.

On the runtime side, a proc macro crate introduced #[hew_export]. Annotate a runtime function and it generates the metadata the type checker needs: parameter types, return type, which feature flag gates it. The hew-stdlib-gen crate uses these annotations to produce standard library definitions automatically. Adding a new runtime function used to mean updating code in three places — now it means annotating one function.
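As a rough picture of the data involved, the sketch below models the kind of record such an annotation could emit and how a generator might turn it into a declaration line. Everything here is hypothetical: the `ExportMeta` struct, its field names, the `render_decl` helper, and the output format are illustrations, not the real output of #[hew_export] or hew-stdlib-gen, and the rendered text is not real Hew syntax.

```rust
/// Hypothetical metadata record for one exported runtime function.
pub struct ExportMeta {
    /// Name the type checker would know the function by.
    pub name: &'static str,
    /// Parameter types, in order, as type names.
    pub params: &'static [&'static str],
    /// Return type name.
    pub ret: &'static str,
    /// Feature flag that gates the function, if any.
    pub feature: Option<&'static str>,
}

/// Render one record as a single declaration line.
/// The format is illustrative only.
pub fn render_decl(m: &ExportMeta) -> String {
    let sig = format!("{}({}) -> {}", m.name, m.params.join(", "), m.ret);
    match m.feature {
        Some(f) => format!("{} [feature: {}]", sig, f),
        None => sig,
    }
}
```

The point of the single record is that the signature, the type-checker entry, and the feature gate all derive from one annotated function, so they cannot drift apart.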

Runtime feature flags got more granular too. Instead of linking the entire runtime, features like HTTP, TLS, and database drivers are now gated independently. When a program uses a feature that isn’t enabled, the linker now produces a diagnostic explaining what’s missing instead of dumping raw undefined-symbol errors. (Whether that diagnostic is actually clear enough, I’m not sure yet. But it’s better than ld: undefined symbol.)
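One way such a diagnostic could be produced is to map each undefined symbol back to the feature that would have provided it. The sketch below assumes runtime symbols carry a per-feature prefix; the prefixes, the `FEATURE_PREFIXES` table, the `explain_undefined` function, and the message wording are all hypothetical, and the post does not say how the real mapping works.

```rust
/// Hypothetical table: runtime symbol prefix -> feature flag that provides it.
pub const FEATURE_PREFIXES: &[(&str, &str)] = &[
    ("hew_http_", "http"),
    ("hew_tls_", "tls"),
    ("hew_pg_", "postgres"),
];

/// Turn one undefined symbol into a friendlier message, or `None` if no
/// known feature provides it (in which case the raw error should be shown).
pub fn explain_undefined(symbol: &str) -> Option<String> {
    for (prefix, feature) in FEATURE_PREFIXES {
        if symbol.starts_with(*prefix) {
            return Some(format!(
                "`{}` requires the `{}` runtime feature, which is not enabled",
                symbol, feature
            ));
        }
    }
    None
}
```

Returning `None` for unrecognized symbols matters: a genuinely missing symbol should still surface as the original linker error rather than being hidden behind a guess.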

Historical research notes and design journals got archived in the same sweep. They’d served their purpose during the early design phase. Keeping them in the working tree was just clutter.