the agent kernel — running live on GCP
Every effect an agent has on the world — call a tool, admit a result into memory, reuse a cached answer — passes through a kernel the model doesn't control. From the security seat it's a permission gate the agent can't talk its way past; from the performance seat it does the shared work once. These pages drive the real, running kernel behind this endpoint.
Propose any tool call and watch the kernel return ALLOW or DENY — the verdict before it even decodes the arguments. Default-deny, fail-closed, independent of why the model asked.
Open the playground → real modelTalk to a real 14-billion-parameter model on an L4 GPU, served through the kernel. Human-scale, conversational proof that the gate sits on the hot path without getting in the way.
Start chatting → pure kernel · CPU vs GPUThe same model, the same kernel, decoded on the CPU reference path vs the wired-in CUDA backend
(--backend cuda) on the L4. No ollama either side. Same prompt, both engines, live —
~3.5× faster on the GPU.
The fleet thesis, exact and timing-free: a shared prefix prefilled once and cloned into N agents. Pick a scenario (or 5 agents × 50 turns) and read the precise token-work each strategy does — naive re-prefill vs warm KV vs fak. Runs entirely in the kernel.
Open the proof → side by sideTwo lanes race in real time: a SOTA 2-pass agent loop vs fak's 1-shot kernel, replaying the same class-labeled tool-call trace. Every turn fak saves — grammar repair, a vDSO cache hit, a poison quarantine — is visible. Self-contained, no model needed.
Run the comparison → how it worksThe architecture, the two-gate security model, and where the performance win comes from — the project's own diagrams, served live. The one idea, drawn out.
See how it works → OpenAI-compatibleDrop-in /v1/chat/completions, plus the kernel's own /v1/fak/adjudicate,
/healthz, and Prometheus /metrics. Point any OpenAI client at this host.
Boot timeline, adjudication counts and latency, vDSO hits — the kernel's own Prometheus surface, scraped straight from the running process.
GET /metrics →