How Fast Is It, Actually

Building Hew: Part 6 of 16

I’d been putting off benchmarks. Two compilers, 50 tests passing — but none of that tells you whether the generated code is actually fast. (I kept finding reasons not to measure. You can probably guess why.)

The Benchmark Suite

The harness compiles Hew programs with hewcpp and equivalent C programs with gcc -O2, runs each binary five times, and takes the median. Four categories: compute, collections, strings, actors.
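The run-five-times-and-take-the-median step can be sketched in a few lines of Python. This is not the actual harness: the function names, the wall-clock timing, and the stand-in command are all assumptions made for illustration. Timing a whole process this way includes its startup cost in every measurement.

```python
import statistics
import subprocess
import sys
import time


def time_once(cmd):
    """Run cmd to completion and return elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0


def median_ms(cmd, runs=5):
    """Run cmd `runs` times and return the median elapsed time in milliseconds."""
    return statistics.median(time_once(cmd) for _ in range(runs))


if __name__ == "__main__":
    # Hypothetical stand-in so the sketch runs anywhere; a real harness
    # would pass the path of a compiled benchmark binary instead.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    print(f"median of 5 runs: {median_ms(cmd):.1f} ms")
```

The median is less sensitive than the mean to a single slow run, for example one that hit a cold cache.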

Compute: Both Compilers Outsmarted the Test

Fibonacci was supposed to be the compute benchmark. Instead, both Hew and gcc -O2 constant-fold fib(30). Ten million iterations finish in ~3.5ms for both — within noise.

So the interesting result isn’t a number. It’s that MLIR’s constant folding — just canonicalization and CSE — matches gcc’s decades of optimization passes on this input. (I’m choosing to feel good about that rather than embarrassed about the benchmark design.)

Collections

1M push + sum: Hew at ~13ms, gcc at ~5ms. About 2.6× slower. No inlining, no loop unrolling in the Hew pipeline yet. This is the real number — not the fibonacci result where both compilers optimized the work away.

2.6× is not great. Whether it’s acceptable depends on how much the optimization passes close the gap once I actually write them. I don’t know yet.

Strings

The string benchmarks are dominated by process startup at this scale. No meaningful difference to report.

Actors vs. Pthreads

This was the one I actually cared about.

At 100 spawns, actors and pthreads are effectively identical — both around 6ms. At 1,000, Hew actors are ~2× faster: 13ms vs 28ms. Actor spawn is a struct allocation, while pthread spawn creates an OS thread. The M:N scheduler pays for itself at scale.
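To make the spawn-cost difference concrete, here is a rough Python analogy, not Hew's runtime and not the benchmark code. It runs the same trivial task either on one OS thread per task, or as small task objects queued onto a fixed pool of workers (a simplified M:N arrangement). The function names and the pool size of 4 are assumptions, and the timings depend heavily on the machine and on Python itself.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def work(counter, lock):
    """Trivial task body: bump a shared counter."""
    with lock:
        counter[0] += 1


def one_thread_per_task(n):
    """Spawn n OS threads, one per task, and wait for all of them."""
    counter, lock = [0], threading.Lock()
    threads = [threading.Thread(target=work, args=(counter, lock))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


def pooled(n, workers=4):
    """Queue n small task objects onto a fixed pool of worker threads."""
    counter, lock = [0], threading.Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n):
            # "Spawning" here only allocates and enqueues a task object;
            # at most `workers` OS threads are ever created.
            pool.submit(work, counter, lock)
    # Leaving the with-block waits for every queued task to finish.
    return counter[0]


if __name__ == "__main__":
    for n in (100, 1000):
        for fn in (one_thread_per_task, pooled):
            start = time.perf_counter()
            fn(n)
            elapsed = (time.perf_counter() - start) * 1000.0
            print(f"{fn.__name__}({n}): {elapsed:.1f} ms")
```

The point is the shape of the work rather than the numbers: in the pooled version the per-task cost is an allocation and a queue push, while the thread-per-task version pays for an OS thread every time.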

Binary Size

Hew binaries are 328KB–1.9MB. Equivalent C programs are 16KB. The actor runtime is big, and the only ways around that are stripping unused runtime components or building a lighter runtime for programs that don’t use actors. I haven’t done either yet. (That’s a problem for later.)

CI Pipeline

GitHub Actions went up to build both compilers on Ubuntu 24.04:

hewcpp:
  runs-on: ubuntu-24.04
  steps:
    - name: Install dependencies
      run: |
        sudo apt-get install -y cmake ninja-build \
          llvm-21-dev libmlir-21-dev mlir-21-tools clang-21

The workflow builds hewcpp with LLVM 21, runs the Rust workspace checks (cargo build + test + clippy), and compiles the C runtime. Every push gets checked.

Generators in the Spec

Spec v0.6.1 added generators. The basic form is lazy sequences via yield. The question I kept circling back to: what if generators could stream across actor boundaries?

// Sync generator — lazy sequences
gen fn fibonacci() -> i32 {
    let (a, b) = (0, 1);
    loop {
        yield a;
        (a, b) = (b, a + b);
    }
}

// Cross-actor streaming
actor DataSource {
    receive gen fn stream_data() -> Record {
        for record in records {
            yield record;
        }
    }
}

// Consumer
for await record in source.stream_data() {
    process(record);
}

That receive gen fn bridges generators and the actor model. One actor produces, another consumes lazily. The mailbox handles backpressure — the producer yields into it and blocks when full, the consumer pulls as it’s ready.
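The backpressure behaviour can be modelled outside Hew with a bounded queue. The Python sketch below is an analogy under stated assumptions, not the Hew runtime: a thread stands in for the producing actor, a `queue.Queue` with a fixed capacity stands in for the mailbox, and the names `stream`, `capacity`, and `_DONE` are invented for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream(records, capacity=2):
    """Yield records produced on another thread through a bounded mailbox."""
    mailbox = queue.Queue(maxsize=capacity)

    def producer():
        for record in records:
            mailbox.put(record)  # blocks while the mailbox is full
        mailbox.put(_DONE)

    # Daemon thread: if the consumer stops early, a producer blocked on
    # put() will not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = mailbox.get()  # blocks while the mailbox is empty
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    for record in stream(range(5)):
        print("got", record)
```

With `capacity=2` the producer can run at most two records ahead of the consumer (plus one it is waiting to enqueue), so a slow consumer throttles the producer instead of letting the queue grow without bound.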