fak · Live Chat (qwen2.5:14b)

One L4 GPU serves everyone, so replies are queued — usually a few seconds when the model is warm. The kernel must see the whole reply to adjudicate it before releasing it, so text appears once it's done (not token-by-token). Every message is a real /v1/chat/completions call through fak serve.

💬 Live Chat