One L4 GPU serves everyone, so replies are
queued — usually a few seconds when the
model is warm. The kernel must see the whole reply to adjudicate it before releasing it, so text appears
once it's done (not token-by-token). Every message is a real
/v1/chat/completions call through
fak serve.