checking…

the agent kernel — running live on GCP

Treat the model like an untrusted program,
and the tool call like a syscall.

Every effect an agent has on the world — call a tool, admit a result into memory, reuse a cached answer — passes through a kernel the model doesn't control. From the security seat it's a permission gate the agent can't talk its way past; from the performance seat it does the shared work once. These pages drive the real, running kernel behind this endpoint.

live API

⚖️ Adjudication Playground

Propose any tool call and watch the kernel return ALLOW or DENY — the verdict before it even decodes the arguments. Default-deny, fail-closed, independent of why the model asked.

Open the playground →
real model

💬 Live Chat — qwen2.5:14b

Talk to a real 14-billion-parameter model on an L4 GPU, served through the kernel. Human-scale, conversational proof that the gate sits on the hot path without getting in the way.

Start chatting →
pure kernel · CPU vs GPU

⚡ In-kernel engine: CPU vs GPU

The same model, the same kernel, decoded on the CPU reference path vs the wired-in CUDA backend (--backend cuda) on the L4. No ollama either side. Same prompt, both engines, live — ~3.5× faster on the GPU.

Run the head-to-head →
pure kernel · no GPU

🧩 Multi-agent reuse proof

The fleet thesis, exact and timing-free: a shared prefix prefilled once and cloned into N agents. Pick a scenario (or 5 agents × 50 turns) and read the precise token-work each strategy does — naive re-prefill vs warm KV vs fak. Runs entirely in the kernel.

Open the proof →
side by side

⚖️ Turn-tax — fak vs a SOTA loop

Two lanes race in real time: a SOTA 2-pass agent loop vs fak's 1-shot kernel, replaying the same class-labeled tool-call trace. Every turn fak saves — grammar repair, a vDSO cache hit, a poison quarantine — is visible. Self-contained, no model needed.

Run the comparison →
how it works

🗺️ Visual Gallery

The architecture, the two-gate security model, and where the performance win comes from — the project's own diagrams, served live. The one idea, drawn out.

See how it works →
OpenAI-compatible

🔌 The API

Drop-in /v1/chat/completions, plus the kernel's own /v1/fak/adjudicate, /healthz, and Prometheus /metrics. Point any OpenAI client at this host.

GET /v1/models →
observability

📈 Live Metrics

Boot timeline, adjudication counts and latency, vDSO hits — the kernel's own Prometheus surface, scraped straight from the running process.

GET /metrics →