
Rankings

Fastest LLMs

Ranked by output tokens per second (t/s) via API. Sourced from the Artificial Analysis speed leaderboard: a 72-hour rolling average, measured under controlled API conditions and not self-reported by providers. Faster does not mean smarter; see the intelligence rankings for capability.

fastest model (API)

GPT OSS 120B

OpenAI · 304 tokens/sec

Speed tiers

200+ t/s: Blazing fast. Near-instant responses for most outputs. Best for real-time applications, streaming UIs, and high-volume API workloads.
120–200 t/s: Very fast. Noticeably snappier than typing speed. Good for interactive applications.
80–120 t/s: Fast. Comfortable for interactive use. You will not be waiting on the model.
50–80 t/s: Adequate. Acceptable for most chat and document tasks. May feel slow on long outputs.
Under 50 t/s: Slow for interactive use. Acceptable for batch processing where speed is not a constraint.
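
The tiers above can be applied to any measured speed with a small lookup. This is a minimal sketch: the function name `speed_tier` is hypothetical, and treating each lower bound as inclusive (so 120 t/s counts as "Very fast") is an assumption, since the tier list does not say which side a boundary value falls on.

```python
# Tier thresholds copied from the speed tiers list. Treating each lower
# bound as inclusive is an assumption; the list does not specify it.
SPEED_TIERS = [
    (200, "Blazing fast"),
    (120, "Very fast"),
    (80, "Fast"),
    (50, "Adequate"),
]


def speed_tier(tokens_per_second: float) -> str:
    """Return the tier label for an output speed measured in t/s."""
    if tokens_per_second < 0:
        raise ValueError("tokens_per_second must be non-negative")
    # Thresholds are sorted from highest to lowest, so the first match wins.
    for lower_bound, label in SPEED_TIERS:
        if tokens_per_second >= lower_bound:
            return label
    return "Slow for interactive use"


if __name__ == "__main__":
    for tps in (304, 138, 91, 55, 35):
        print(f"{tps} t/s -> {speed_tier(tps)}")
```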
1. GPT OSS 120B (OpenAI) · 304 t/s · 5.4 quality · Intelligence: AA 33.3 · API: $0.26/1M blended
2. 214 t/s · 7.8 quality · Intelligence: AA 46.4 · API: $1.13/1M blended
3. 138 t/s · 7.8 quality · Intelligence: AA 48.4 · API: $4.50/1M blended
4. 135 t/s · 3.9 quality · Intelligence: AA 13.5 · API: $0.17/1M blended
5. 133 t/s · 4.7 quality · Intelligence: AA 23.6 · API: $0.25/1M blended
6. 115 t/s · 4.0 quality · Intelligence: AA 18.4 · API: $0.31/1M blended
7. 111 t/s · 5.5 quality · Intelligence: AA 31.1 · API: $2.00/1M blended
8. 99 t/s · 7.2 quality · Intelligence: AA 54.0 · API: $4.81/1M blended
9. GPT-5.2 (OpenAI) · 91 t/s · 7.5 quality · Intelligence: AA 51.3 · API: $4.81/1M blended
10. 91 t/s · 8.7 quality · Intelligence: AA 57.2 · API: $4.50/1M blended
11. Grok 4.2 (xAI, est.) · 85 t/s · 5.2 quality · Intelligence: AA 43.0 est. · API: $9.00/1M blended
12. 76 t/s · 6.3 quality · Intelligence: AA 41.2 · API: $0.69/1M blended
13. 69 t/s · 6.4 quality · Intelligence: AA 46.5 · API: $10.00/1M blended
14. 55 t/s · 6.6 quality · Intelligence: AA 44.4 · API: $6.00/1M blended
15. Mistral Large 3 (Mistral AI) · 50 t/s · 3.2 quality · Intelligence: AA 22.8 · API: $0.75/1M blended
16. 43 t/s · 4.0 quality · Intelligence: AA 32.1 · API: $0.49/1M blended
17. Kimi K2 (Moonshot AI) · 42 t/s · 4.0 quality · Intelligence: AA 26.3 · API: $0.77/1M blended
18. 37.2 t/s · 5.8 quality · Intelligence: AA 37.1 · API: $6.00/1M blended
19. 35 t/s · 2.7 quality · Intelligence: AA 17.0 · API: $0.37/1M blended

Speed caveats

API speed fluctuates. Provider infrastructure, load, and geographic routing all affect actual latency. These figures are averages from Artificial Analysis over 72 hours. Your real-world experience will vary, particularly during peak hours.

Time to first token matters too. Tokens-per-second measures generation throughput once the model starts responding. For interactive applications, time-to-first-token (TTFT) can matter just as much — a model that starts instantly at 80 t/s can feel faster than one that pauses for 2 seconds then runs at 200 t/s.
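
The trade-off in that example can be made concrete with a little arithmetic. The sketch below assumes a simple model in which total response time is TTFT plus output length divided by throughput, and it reads "starts instantly" as a TTFT of 0 seconds. The function names are hypothetical, and real responses will not follow this model exactly.

```python
def response_time(output_tokens: int, ttft_s: float, tokens_per_second: float) -> float:
    """Seconds until the full response has arrived under the simple model
    total = TTFT + output_tokens / throughput."""
    return ttft_s + output_tokens / tokens_per_second


def break_even_tokens(ttft_a: float, tps_a: float, ttft_b: float, tps_b: float) -> float:
    """Output length at which model A and model B finish at the same moment.

    Solves ttft_a + n / tps_a == ttft_b + n / tps_b for n.
    """
    if tps_a == tps_b:
        raise ValueError("equal throughputs never cross; compare TTFT directly")
    return (ttft_b - ttft_a) / (1 / tps_a - 1 / tps_b)


if __name__ == "__main__":
    # Model A: starts instantly (assumed TTFT 0 s) at 80 t/s.
    # Model B: pauses for 2 s, then runs at 200 t/s.
    for n in (100, 1000):
        a = response_time(n, ttft_s=0.0, tokens_per_second=80)
        b = response_time(n, ttft_s=2.0, tokens_per_second=200)
        print(f"{n} tokens: A finishes in {a:.2f} s, B in {b:.2f} s")
    print(f"break-even at about {break_even_tokens(0.0, 80, 2.0, 200):.0f} tokens")
```

Under these assumptions the slower-starting model only finishes first once the output exceeds roughly 267 tokens. For shorter replies, the 80 t/s model is done sooner, and it shows its first words sooner at any length.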

Reasoning models are slower by design. Models with extended thinking modes generate many internal tokens before producing output. Speed rankings here use standard mode only — not extended thinking.
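
To see why extended thinking changes the picture, the sketch below estimates the wait before any visible output and the resulting visible throughput. It assumes internal tokens are generated at the same rate as visible output tokens, which may not hold for a given provider. The function names and the token counts in the example are hypothetical, not measurements.

```python
def delay_before_visible_output(
    thinking_tokens: int, tokens_per_second: float, ttft_s: float = 0.0
) -> float:
    """Seconds before the first visible token, assuming internal tokens are
    generated at the same rate as visible output."""
    return ttft_s + thinking_tokens / tokens_per_second


def effective_visible_speed(
    visible_tokens: int, thinking_tokens: int, tokens_per_second: float, ttft_s: float = 0.0
) -> float:
    """Visible tokens divided by total wall-clock time, in t/s."""
    total_s = ttft_s + (thinking_tokens + visible_tokens) / tokens_per_second
    return visible_tokens / total_s


if __name__ == "__main__":
    # Hypothetical request: 2,000 internal tokens, 500 visible tokens, 100 t/s.
    print(delay_before_visible_output(2000, 100))   # seconds of waiting
    print(effective_visible_speed(500, 2000, 100))  # visible t/s over the whole request
```

In this hypothetical case a model rated at 100 t/s would keep the user waiting 20 seconds and deliver visible text at an effective 20 t/s, which is why standard-mode figures should not be read as extended-thinking performance.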

Values marked “est.” have not been directly measured for this specific model version by Artificial Analysis. They will be updated when AA completes indexing.