fak · the reuse demo

ladder: …

Same model · same tokens · same answers. fak prefills the shared agent setup once and reuses it; the naive loop makes the model re-read the whole growing context every turn. Watch the wall-clock gap open — that gap is the value.
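To make the gap concrete, here is a minimal sketch that counts prefilled tokens for both strategies. It is not fak's code: the function names, the `perTurn` parameter, and the numbers are all hypothetical, and it assumes the reuse arm keeps everything it has already read (the shared setup plus earlier turns), so each turn only prefills its own new tokens.

```go
package main

import "fmt"

// naivePrefillTokens counts the tokens prefilled when every turn re-reads
// the whole context so far: the shared prefix plus all turns up to this one.
func naivePrefillTokens(prefix, turns, perTurn int) int {
	total := 0
	for t := 1; t <= turns; t++ {
		total += prefix + t*perTurn
	}
	return total
}

// reusePrefillTokens counts the tokens prefilled when the shared prefix is
// read once and each turn only prefills its own new tokens (assumed model).
func reusePrefillTokens(prefix, turns, perTurn int) int {
	return prefix + turns*perTurn
}

func main() {
	// Illustrative session shape only; not the demo's measured workload.
	prefix, turns, perTurn := 512, 5, 32

	naive := naivePrefillTokens(prefix, turns, perTurn)
	reuse := reusePrefillTokens(prefix, turns, perTurn)
	fmt.Printf("naive=%d reuse=%d ratio=%.2f\n",
		naive, reuse, float64(naive)/float64(reuse))
}
```

Under these assumptions the naive count grows with the square of the number of turns, while the reuse count grows linearly, which is why the gap widens as a session gets longer.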

① Live race: fak vs naive (both arms run live, same model)

model workload: P=512 T=5 C=5 D=16 R=32 → 25 requests

fak: idle · prefilled 0 · decoded 0

naive (re-prefill every turn): idle · prefilled 0 · decoded 0

② Reuse curve across the model ladder (fak arm LIVE · naive arm projected from measured prefill cost)

Same workload, with a smaller P=128 for tractability on CPU. As the model grows, the absolute minutes saved grow with it, while the ratio holds.

Legend: naive (re-prefill) · fak (reuse)

Each rung: fak runs the session live; the naive bar is projected from that model's measured prefill cost, because running the naive arm live at 3B would take roughly an hour (it re-prefills the whole context every turn). The A/C ratio is timing-free and model-independent: it is fixed by the session shape.
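One way to read the projection is sketched below, under stated assumptions. It treats prefill time as token count times a measured per-token cost, and it shows why a ratio of token counts needs no timing at all. The helper names, the token counts, and the per-token costs are placeholders, not values from the demo, and the token ratio here is not claimed to be the page's exact A/C definition.

```go
package main

import "fmt"

// projectMillis projects prefill wall-clock time from a token count and a
// measured per-token prefill cost. Hypothetical helper, linear cost assumed.
func projectMillis(prefillTokens int, msPerToken float64) float64 {
	return float64(prefillTokens) * msPerToken
}

// tokenRatio is the naive-to-reuse ratio of prefilled tokens.
// No timing enters it, so it is the same for every model.
func tokenRatio(naiveTokens, reuseTokens int) float64 {
	return float64(naiveTokens) / float64(reuseTokens)
}

func main() {
	// Placeholder token counts for one session shape.
	naiveTokens, reuseTokens := 3040, 672

	// Placeholder per-token costs standing in for a small and a large model.
	for _, msPerToken := range []float64{0.5, 4.0} {
		naive := projectMillis(naiveTokens, msPerToken)
		reuse := projectMillis(reuseTokens, msPerToken)
		fmt.Printf("cost=%.1fms/token saved=%.0fms ratio=%.2f\n",
			msPerToken, naive-reuse, tokenRatio(naiveTokens, reuseTokens))
	}
}
```

In this sketch the saved time scales with the per-token cost while the ratio stays put, which matches the claim that larger models save more absolute time at the same ratio.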

fak in-kernel engine · pure-Go Q8 forward pass · tokens are real model output (anchor-quality on the 135M reference, chat-quality on the Qwen2.5 rungs). No network, no API, all on this box.