Fastest LLMs
Ranked by output tokens per second (t/s) via API. Sourced from the Artificial Analysis speed leaderboard: a 72-hour rolling average measured under controlled API conditions, not self-reported by providers. Faster does not mean smarter; see the intelligence rankings for capability.
fastest model (API)
Llama 4 Scout
Meta · 180 tokens/sec
Speed caveats
API speed fluctuates. Provider infrastructure, load, and geographic routing all affect actual latency. These figures are averages from Artificial Analysis over 72 hours. Your real-world experience will vary, particularly during peak hours.
Time to first token matters too. Tokens-per-second measures generation throughput once the model starts responding. For interactive applications, time-to-first-token (TTFT) can matter just as much — a model that starts instantly at 80 t/s can feel faster than one that pauses for 2 seconds then runs at 200 t/s.
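The tradeoff above is simple arithmetic: wall-clock time to receive a response is TTFT plus output tokens divided by throughput. A minimal sketch, using the illustrative numbers from the paragraph (0.1 s vs 2 s TTFT are assumed values for the "instant" model):

```python
def total_time(ttft_s: float, tokens_per_sec: float, n_tokens: int) -> float:
    """Wall-clock seconds to receive n_tokens: first-token latency plus generation time."""
    return ttft_s + n_tokens / tokens_per_sec

# Hypothetical models from the example: A starts near-instantly at 80 t/s,
# B pauses 2 seconds then generates at 200 t/s.
for n in (100, 300, 1000):
    a = total_time(0.1, 80, n)
    b = total_time(2.0, 200, n)
    print(f"{n:>5} tokens: A={a:5.2f}s  B={b:5.2f}s")
```

For short responses the low-TTFT model wins (at 100 tokens, 1.35 s vs 2.50 s); the higher-throughput model only pulls ahead once responses run past roughly 250 tokens, which is why TTFT dominates perceived speed in chat-style use.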
Reasoning models are slower by design. Models with extended thinking modes generate many internal tokens before producing output. Speed rankings here use standard mode only — not extended thinking.
Values marked “est.” have not been directly measured for that specific model version by Artificial Analysis; they will be updated once AA completes indexing.