LLM

Authors: Philip Gerdes

File Change History:

Date	Change	Author
2026-04-22	Benchmarking	Philip

All benchmarks were run with the following parameters:

Time to First Token (TTFT)

Time per Output Token (TPOT)
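
For reference, a minimal sketch of how the two metrics can be derived from the token arrival times of a single streamed request. The `RequestTrace` structure and the function names are hypothetical and are not taken from the benchmark tooling; the sketch assumes TPOT is averaged over the tokens that follow the first one.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timestamps in seconds for one streamed request (hypothetical structure)."""
    sent_at: float            # moment the request was sent
    token_times: list[float]  # arrival time of each output token, in order


def ttft_ms(trace: RequestTrace) -> float:
    """Time to First Token: delay between sending the request and the first token."""
    if not trace.token_times:
        raise ValueError("trace contains no output tokens")
    return (trace.token_times[0] - trace.sent_at) * 1000.0


def tpot_ms(trace: RequestTrace) -> float:
    """Time per Output Token: mean gap between consecutive tokens after the first."""
    n = len(trace.token_times)
    if n < 2:
        raise ValueError("TPOT needs at least two output tokens")
    decode_time = trace.token_times[-1] - trace.token_times[0]
    return decode_time / (n - 1) * 1000.0


if __name__ == "__main__":
    # Illustrative timestamps only, not measured values.
    example = RequestTrace(sent_at=0.0, token_times=[0.25, 0.5, 0.75, 1.0])
    print(f"TTFT: {ttft_ms(example):.1f} ms, TPOT: {tpot_ms(example):.1f} ms")
```

Excluding the first token from TPOT keeps the prefill cost in TTFT and the decode cost in TPOT, so the two numbers can be compared independently.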

Results:

TTFT is decisive for K.ai (latency minimization)
- Focus on N = 6200 requests
  - Possible measurement error at N = 10
  - Longer runtime, so the figure may also reflect degradation from heat buildup (e.g. thermal throttling)
TPOT (per-token latency, which drives throughput) is ~6 ms lower with SGLang
TTFT (time until the response starts) is ~52 ms lower with vLLM
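
To put the two differences in relation, here is a rough sketch of how they trade off in total response time. It assumes end-to-end latency is approximately TTFT + (n - 1) × TPOT for n output tokens, and that the approximate differences above (~52 ms and ~6 ms) are constant across output lengths. Both are simplifying assumptions, and the function names are illustrative. The priority on TTFT for K.ai is unaffected; the sketch only shows the output length at which the per-token difference would outweigh the TTFT difference.

```python
import math

# Approximate differences taken from the results above.
TTFT_ADVANTAGE_VLLM_MS = 52.0    # vLLM starts responding ~52 ms earlier
TPOT_ADVANTAGE_SGLANG_MS = 6.0   # SGLang needs ~6 ms less per output token


def latency_gap_ms(output_tokens: int) -> float:
    """Total-latency gap under the assumed model TTFT + (n - 1) * TPOT.

    Positive: vLLM finishes earlier. Negative: SGLang finishes earlier.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return TTFT_ADVANTAGE_VLLM_MS - TPOT_ADVANTAGE_SGLANG_MS * (output_tokens - 1)


def break_even_tokens() -> int:
    """Smallest output length at which SGLang's total latency is strictly lower."""
    # The gap turns negative once (n - 1) exceeds 52 / 6.
    return math.floor(TTFT_ADVANTAGE_VLLM_MS / TPOT_ADVANTAGE_SGLANG_MS) + 2


if __name__ == "__main__":
    n = break_even_tokens()
    print(f"Break-even output length: {n} tokens")
    print(f"Gap at {n - 1} tokens: {latency_gap_ms(n - 1):+.1f} ms")
    print(f"Gap at {n} tokens: {latency_gap_ms(n):+.1f} ms")
```

Under these assumptions the start of the response always arrives earlier with vLLM, while the time to the complete response depends on the output length.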