Most of the gaps microgpt found turned out to be shallow. The compiler already had backend support for numeric conversions and math intrinsics — they just needed stdlib wrappers. Once those landed, the workarounds came out one commit at a time.
The refactoring sequence
Each commit removed one category of workaround. The order followed the dependency chain — you can’t remove hand-rolled math until the stdlib has math, and you can’t simplify the PRNG until you have .to_f64().
.to_f64() intrinsic. The compiler emits sitofp now. The 15-line int_to_f64_fast function and all its call sites disappeared. Every place that kept parallel integer and float counters collapsed to a single integer with .to_f64() where needed.
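The shape of that change, as a sketch in Hew — variable names like `base_lr` and `num_steps` are illustrative, not from the codebase:

```
// Before: a second float counter marched alongside the int,
// fed through the hand-rolled int_to_f64_fast.
// After: one integer, converted at the point of use.
let step = 0;
while step < num_steps {
    let progress = step.to_f64() / num_steps.to_f64();  // compiles to sitofp
    let lr = base_lr * (1.0 - progress);
    step += 1;
}
```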
math.* stdlib. exp, log, pow, sqrt, floor, abs, max — thin wrappers around LLVM intrinsics. 97 lines of Taylor series deleted. The training loss now matches Python to 4+ significant figures because both sides now run full-precision exp/log (llvm.exp.f64 and friends on the Hew side, libm on Python's) instead of my Taylor approximations.
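For instance, a numerically stable log-sum-exp now leans on the wrappers directly — the logit names here are illustrative, and the two-token vocabulary is just to keep the sketch short:

```
// Cross-entropy over a two-token vocabulary, via the new math.* wrappers.
// This arithmetic used to run through the deleted Taylor-series code.
let m = math.max(logit_a, logit_b);
let lse = m + math.log(math.exp(logit_a - m) + math.exp(logit_b - m));
let loss = lse - logit_a;  // loss when token a is the target
```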
random.* stdlib. The 230-line hand-rolled Mersenne Twister became random.seed(42), random.gauss(0.0, std), random.shuffle(doc_order), and random.choices(cum_weights, cum_total, vocab_size). The stdlib implementation matches CPython’s MT19937 exactly, so the output stayed byte-identical.
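Usage now mirrors the Python reference script. The four call shapes below are the ones named above; `std` is whatever init scale the model uses:

```
random.seed(42);                 // same seed as the Python reference
let w = random.gauss(0.0, std);  // normal weight init via MT19937
random.shuffle(doc_order);       // same permutation CPython's shuffle produces
let tok = random.choices(cum_weights, cum_total, vocab_size);  // cumulative-weight sampling
```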
break/continue and unary negation. All fourteen instances of 0.0 - x became -x. The done-flag while-loop patterns became clean while loops with break. The tokenizer sort that had the compound corruption bug became straightforward:
while j2 >= 0 {
    let a_code = string_char_at(uchars[j2], 0);
    if a_code > key_code {
        uchars[j2 + 1] = "" + uchars[j2];
        j2 -= 1;
    } else {
        break;
    }
}

impl Tape and HashMap tokenizer. Methods moved onto the Tape type with impl Tape { ... }, so vadd(t, a, b) became t.vadd(a, b). The character-to-token lookup switched from linear search over Vec<String> to HashMap<String, Int>. Compound assignment operators (+=, -=) replaced verbose x = x + 1 patterns.
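Side by side, the three changes look like this — the names (`q`, `k`, `char_to_tok`, `ch`, `count`) are illustrative, and the bracket-indexing syntax for HashMap lookup is assumed:

```
// Before: free function, linear scan over Vec<String>, verbose assignment.
//   let h = vadd(t, q, k);  ...scan vocab for ch...  count = count + 1;

// After:
let h = t.vadd(q, k);      // method on Tape via impl Tape { ... }
let id = char_to_tok[ch];  // HashMap<String, Int> lookup (indexing syntax assumed)
count += 1;                // compound assignment
```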
for i in 0..n and const. Hyperparameters became const declarations. Most indexed while loops became for loops. Redundant length parameters dropped from function signatures — v.len() replaced a separately-tracked n.
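Together those last two turn the typical hand-tracked loop into the shape below — the hyperparameter value is illustrative:

```
const n_embd = 16;  // hyperparameter as a const declaration (value illustrative)

// Before: let i = 0; while i < n { ... i += 1; } with n passed alongside v.
// After:
let total = 0.0;
for i in 0..v.len() {
    total = total + v[i];
}
```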
The final state
677 lines. The Model struct holds all nine weight matrices:
type Model {
    wte: Vec<Val>;      // token embeddings
    wpe: Vec<Val>;      // position embeddings
    lm_head: Vec<Val>;  // output projection
    attn_wq: Vec<Val>;  // query weights
    attn_wk: Vec<Val>;  // key weights
    attn_wv: Vec<Val>;  // value weights
    attn_wo: Vec<Val>;  // output weights
    mlp_fc1: Vec<Val>;  // MLP first layer
    mlp_fc2: Vec<Val>;  // MLP second layer
}

The struct feeds into gpt_forward, which takes a model, a token ID, a position, and the KV cache, and returns logits. The training loop creates a fresh tape each step — copy parameter values, run forward, compute cross-entropy loss, backpropagate, apply Adam. Inference copies the trained parameters into a fresh tape and samples autoregressively with temperature scaling.
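A sketch of one training step as described — every helper here other than gpt_forward is a hypothetical name standing in for the real code, and gpt_forward's argument list is the one given above:

```
for step in 0..200 {
    let t = fresh_tape(model);                         // copy parameter values onto a fresh tape
    let logits = gpt_forward(model, tok_id, pos, kv);  // forward pass
    let loss = cross_entropy(t, logits, target_id);    // loss on the next token
    backward(t, loss);                                 // backpropagate through the tape
    adam_step(model, t, step);                         // apply the Adam update
}
```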
Python compatibility
With seed 42 on the same names dataset, the first six generated names are byte-identical to Python’s output:
sample 1: jamin
sample 2: aorti
sample 3: jadin
sample 4: lira
sample 5: dede
sample 6: ede

Training losses agree to four decimal places across all 200 steps. The tiny drift after sample 6 comes from hardware math: llvm.exp.f64 and Python's math.exp produce different last-bit rounding, and those one-ulp differences accumulate over thousands of operations. Both answers are valid IEEE 754 doubles; the two implementations just round the last bit differently.
What’s still there
The 677-line file is 5.6x the 120-line Python original. Some of that expansion is inherent — Hew is statically typed, doesn’t have operator overloading, and requires explicit allocation. tape.vadd(a, b) will always be more verbose than a + b. The Tape struct with seven parallel arrays will always be more explicit than Python’s __add__ magic.
But some of it is still language gaps. Vec<Vec<T>> still doesn’t compile, so matrices remain flat with manual indexing. HashMap<String, Vec<Val>> hasn’t been tested, so weight matrices are named fields on a struct. And the tokenizer still does "" + uchars[j2] to deep-copy strings during sorting — Vec<String> element access semantics haven’t been clarified yet.