
Fastest LLMs

Ranked by output tokens per second (t/s) over the API. Figures come from the Artificial Analysis speed leaderboard — a 72-hour rolling average measured under controlled API conditions, not self-reported by providers. Faster does not mean smarter — see the intelligence rankings for capability.
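For context on what is actually being measured: output speed is completion tokens divided by generation time, with time-to-first-token tracked separately. Below is a minimal single-request sketch, assuming the OpenAI Python SDK and a placeholder model name; AA's harness averages many such requests over 72 hours, so one run proves nothing.

```python
# One streaming request, timed. Counting stream chunks approximates
# counting tokens; a real harness would use the usage field or a tokenizer.
import time
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()

def measure(model: str, prompt: str) -> dict:
    start = time.perf_counter()
    first = None
    tokens = 0
    stream = client.chat.completions.create(
        model=model,  # placeholder name; any streaming chat model works
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()  # first token arrived
            tokens += 1
    end = time.perf_counter()
    assert first is not None, "empty response"
    return {
        "ttft_s": round(first - start, 3),
        "tokens_per_s": round(tokens / (end - first), 1),
    }

print(measure("llama-4-scout", "Explain TCP slow start in 300 words."))
```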

Fastest model (API): Llama 4 Scout (Meta) · 180 tokens/sec

Speed tiers

200+ t/s: Blazing fast. Near-instant responses for most outputs. Best for real-time applications, streaming UIs, and high-volume API workloads.
120–200 t/s: Very fast. Noticeably snappier than typing speed. Good for interactive applications.
80–120 t/s: Fast. Comfortable for interactive use. You will not be waiting on the model.
50–80 t/s: Adequate. Acceptable for most chat and document tasks. May feel slow on long outputs.
Under 50 t/s: Slow for interactive use. Acceptable for batch processing where speed is not a constraint.
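
These cut-offs are simple enough to encode directly; a minimal sketch using the thresholds above:

```python
# Hypothetical helper mapping a measured t/s figure onto the tiers above.
def speed_tier(tps: float) -> str:
    if tps >= 200:
        return "Blazing fast"
    if tps >= 120:
        return "Very fast"
    if tps >= 80:
        return "Fast"
    if tps >= 50:
        return "Adequate"
    return "Slow for interactive use"

print(speed_tier(180))  # "Very fast" -- the current chart-topping figure
```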
| Rank | Model | Vendor | Speed | Quality | Intelligence (AA) | API price ($/1M, blended) |
|---:|---|---|---:|---:|---:|---:|
| 1 | Llama 4 Scout | Meta | 180 t/s | 7.5 | 38.5 est. | $0.11 |
| 2 | | | 170 t/s | 7.3 | 35.0 | $1.13 |
| 3 | | | 125 t/s | 4.4 | 18.0 | $0.44 |
| 4 | Grok 4.1 (est.) | xAI | 90 t/s | 8.0 | 41.4 | $6.00 |
| 5 | Claude Sonnet 4.6 (est.) | Anthropic | 85 t/s | 8.0 | 44.3 | $6.00 |
| 6 | | | 73 t/s | 7.3 | 39.0 | $0.69 |
| 7 | | | 67 t/s | 7.5 | 46.0 | $10.00 |
| 8 | GPT-5.2 (est.) | OpenAI | 65 t/s | 8.3 | 46.6 | $4.81 |
| 9 | | | 56 t/s | 4.6 | 23.0 | $0.75 |
| 10 | Gemini 3 Pro (est.) | Google | 55 t/s | 8.8 | 48.4 | $4.50 |
| 11 | DeepSeek V3.2 (est.) | DeepSeek | 45 t/s | 6.3 | 41.6 | $0.48 |
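A note on the price column: a "blended" $/1M figure folds input and output token prices into a single number. The sketch below assumes the common 3:1 input-to-output weighting (verify against AA's methodology page); the per-token prices used are illustrative, not the table's actual rates.

```python
# Blended $/1M-token price under an assumed 3:1 input:output token ratio.
def blended_price(input_per_m: float, output_per_m: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    total = input_ratio + output_ratio
    return (input_per_m * input_ratio + output_per_m * output_ratio) / total

# Illustrative numbers only:
print(blended_price(0.08, 0.20))  # -> 0.11, i.e. a $0.11/1M blended figure
```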

Speed caveats

API speed fluctuates. Provider infrastructure, load, and geographic routing all affect actual latency. These figures are averages from Artificial Analysis over 72 hours. Your real-world experience will vary, particularly during peak hours.
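The smoothing itself is plain arithmetic; a sketch of a 72-hour window over timestamped samples (the data layout here is an assumption, not AA's actual schema):

```python
# 72-hour rolling average over (timestamp, tokens/sec) samples.
from datetime import datetime, timedelta

def rolling_average(samples: list[tuple[datetime, float]],
                    now: datetime,
                    window: timedelta = timedelta(hours=72)) -> float:
    recent = [tps for t, tps in samples if now - t <= window]
    return sum(recent) / len(recent) if recent else float("nan")
```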

Time to first token matters too. Tokens-per-second measures generation throughput once the model starts responding. For interactive applications, time-to-first-token (TTFT) can matter just as much — a model that starts instantly at 80 t/s can feel faster than one that pauses for 2 seconds then runs at 200 t/s.
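That comparison follows directly from total wait ≈ TTFT + tokens ÷ throughput; a sketch using the numbers above:

```python
# Total perceived wait ≈ TTFT + output_tokens / throughput.
def total_wait(ttft_s: float, tps: float, n_tokens: int) -> float:
    return ttft_s + n_tokens / tps

for n in (100, 267, 500):
    a = total_wait(0.0, 80, n)   # starts instantly, 80 t/s
    b = total_wait(2.0, 200, n)  # 2 s pause, then 200 t/s
    print(f"{n} tokens: {a:.2f}s vs {b:.2f}s")
# Break-even: n/80 = 2 + n/200  =>  n = 2 / (1/80 - 1/200) ≈ 267 tokens.
# Below that, the instant-start model finishes first; above it, the
# 200 t/s model wins.
```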

Reasoning models are slower by design. Models with extended thinking modes generate many internal tokens before producing output. Speed rankings here use standard mode only — not extended thinking.

Values marked “est.” have not been directly measured for that specific model version by Artificial Analysis; they will be updated when AA completes indexing.