Treat the model like an untrusted program,
and the tool call like a syscall.

Every effect an agent has on the world (calling a tool, admitting a result into memory, reusing a cached answer) passes through a kernel the model doesn't control. From the security seat, the kernel is a permission gate the agent can't talk its way past; from the performance seat, it does shared work once instead of once per agent. These pages drive the real, running kernel behind this endpoint.

live API

⚖️ Adjudication Playground

Propose any tool call and watch the kernel return ALLOW or DENY, reaching its verdict before it even decodes the arguments. The gate is default-deny and fail-closed, and it is independent of why the model asked.
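A minimal sketch of what "default-deny, fail-closed, decided before decoding the arguments" means in code. It assumes the policy is a function of the tool name alone. `Verdict`, `make_adjudicator` and the allowlist below are hypothetical names for illustration, not the playground's actual API.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"


def make_adjudicator(policy: Callable[[str], bool]) -> Callable[[str, bytes], Verdict]:
    """Wrap a hypothetical tool-name policy in a default-deny, fail-closed gate."""

    def adjudicate(tool_name: str, raw_args: bytes) -> Verdict:
        # raw_args is accepted but never decoded: the verdict depends only on
        # the tool name, so a malformed or adversarial payload cannot sway it.
        # The model's stated reason is not an input at all.
        try:
            allowed = policy(tool_name)
        except Exception:
            return Verdict.DENY  # fail closed: an error in the check is a denial
        # Default deny: only an explicit True allows; any other value denies.
        return Verdict.ALLOW if allowed is True else Verdict.DENY

    return adjudicate


# Usage with an illustrative allowlist.
adjudicate = make_adjudicator(lambda name: name in {"read_file", "web_search"})
print(adjudicate("read_file", b"{not even json"))    # Verdict.ALLOW
print(adjudicate("delete_repo", b'{"force": true}'))  # Verdict.DENY
```

Requiring `allowed is True` rather than mere truthiness is a deliberate choice in this sketch. A policy that returns something odd (a string, `None`) is treated as "no" rather than "yes".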

Open the playground →

real model

💬 Live Chat — qwen2.5:14b

Talk to a real 14-billion-parameter model on an L4 GPU, served through the kernel. Human-scale, conversational proof that the gate sits on the hot path without getting in the way.

Start chatting →

pure kernel · CPU vs GPU

⚡ In-kernel engine: CPU vs GPU

The same model and the same kernel, decoded on the CPU reference path vs the wired-in CUDA backend (--backend cuda) on the L4, with no ollama on either side. Same prompt, both engines, live: ~3.5× faster on the GPU.

Run the head-to-head →

pure kernel · no GPU

🧩 Multi-agent reuse proof

The fleet thesis, exact and timing-free: a shared prefix is prefilled once and cloned into N agents. Pick a scenario (or 5 agents × 50 turns) and read the precise token work each strategy performs: naive re-prefill vs warm KV vs fak. Runs entirely in the kernel.
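Here is a back-of-the-envelope version of that accounting. It assumes a shared prefix of P tokens, N agents, T turns each, and a constant t new tokens appended per turn. These are a simplified cost model of our own, not the proof page's exact bookkeeping. `Fleet` and the three function names are hypothetical. The third function models the prefill-once-and-clone idea described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    agents: int    # N agents sharing one prefix
    turns: int     # T turns per agent
    prefix: int    # P tokens in the shared prefix
    per_turn: int  # t new tokens per turn (assumed constant)


def naive_reprefill(f: Fleet) -> int:
    # Every turn re-prefills the whole context: the prefix plus turns 1..k.
    per_agent = sum(f.prefix + k * f.per_turn for k in range(1, f.turns + 1))
    return f.agents * per_agent


def warm_kv(f: Fleet) -> int:
    # Each agent keeps its own KV cache: its prefix once, then only new tokens.
    return f.agents * (f.prefix + f.turns * f.per_turn)


def shared_prefix(f: Fleet) -> int:
    # The prefix is prefilled once for the whole fleet and cloned into every
    # agent; afterwards each agent pays only for its own new tokens.
    return f.prefix + f.agents * f.turns * f.per_turn


# 5 agents x 50 turns, with an illustrative 1000-token prefix and 20 tokens/turn.
fleet = Fleet(agents=5, turns=50, prefix=1000, per_turn=20)
print(naive_reprefill(fleet), warm_kv(fleet), shared_prefix(fleet))  # 377500 10000 6000
```

In this model, warm KV removes the quadratic re-prefill but still pays the prefix N times. Cloning a single prefill removes the remaining (N - 1) × P term, and that saving grows with fleet size.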

Open the proof →

side by side

⚖️ Turn-tax — fak vs a SOTA loop

Two lanes race in real time: a SOTA 2-pass agent loop vs fak's 1-shot kernel, both replaying the same class-labeled tool-call trace. Every turn fak saves is visible, whether through grammar repair, a vDSO cache hit, or a poison quarantine. Self-contained, no model needed.
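On Linux, the vDSO lets calls such as `clock_gettime` return without a full trip into the kernel. Below is a minimal sketch of the analogous trick for tool calls. It assumes the caller declares which tools are idempotent. Repeated calls with the same canonicalized arguments are answered from a cache instead of paying another round trip. `VdsoCache`, `call` and the `run` callback are hypothetical names, not fak's API.

```python
import json
from typing import Any, Callable


class VdsoCache:
    """Answer repeated idempotent tool calls without re-executing them.

    Which tools are safe to cache is an assumption supplied by the caller;
    this class cannot infer it.
    """

    def __init__(self, idempotent: set[str]):
        self.idempotent = idempotent
        self.cache: dict[tuple[str, str], Any] = {}
        self.hits = 0  # round trips the uncached path would have paid

    def call(self, tool: str, args: dict, run: Callable[[str, dict], Any]) -> Any:
        if tool not in self.idempotent:
            return run(tool, args)  # side-effecting tools always execute
        # Canonicalize JSON-serializable args so key order does not matter.
        key = (tool, json.dumps(args, sort_keys=True))
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        result = run(tool, args)
        self.cache[key] = result
        return result
```

Replaying a trace such as two identical `read_file` calls (with arguments in different key order) followed by two `write_file` calls executes `read_file` once and `write_file` twice. One hit is recorded, and that hit is the saved round trip.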

Run the comparison →

how it works

⚖️ Adjudication Playground

💬 Live Chat — qwen2.5:14b

⚡ In-kernel engine: CPU vs GPU

🧩 Multi-agent reuse proof

⚖️ Turn-tax — fak vs a SOTA loop

🗺️ Visual Gallery

🔌 The API

📈 Live Metrics