← fak

💬 Live Chat

qwen2.5:14b · 1× L4 GPU · through the kernel
One L4 GPU serves everyone, so replies are queued — usually a few seconds when the model is warm. The kernel must see the whole reply to adjudicate it before releasing it, so text appears once it's done (not token-by-token). Every message is a real /v1/chat/completions call through fak serve.