
Fastest LLMs

Ranked by output tokens per second (t/s) over the API. Figures come from the Artificial Analysis speed leaderboard — a 72-hour rolling average measured under controlled API conditions, not self-reported by providers. Faster does not mean smarter — see the intelligence rankings for capability.
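For context on what is actually being measured: output speed is completion tokens divided by generation time, with time-to-first-token tracked separately. Below is a minimal single-request sketch, assuming the OpenAI Python SDK and a placeholder model name; AA's harness averages many such requests over 72 hours, so one run proves nothing.

```python
# One streaming request, timed. Counting stream chunks approximates
# counting tokens; a real harness would use the usage field or a tokenizer.
import time
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()

def measure(model: str, prompt: str) -> dict:
    start = time.perf_counter()
    first = None
    tokens = 0
    stream = client.chat.completions.create(
        model=model,  # placeholder name; any streaming chat model works
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()  # first token arrived
            tokens += 1
    end = time.perf_counter()
    assert first is not None, "empty response"
    return {
        "ttft_s": round(first - start, 3),
        "tokens_per_s": round(tokens / (end - first), 1),
    }

print(measure("llama-4-scout", "Explain TCP slow start in 300 words."))
```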

Fastest model (API): Llama 4 Scout (Meta) · 180 tokens/sec

Speed tiers

200+ t/s: Blazing fast. Near-instant responses for most outputs. Best for real-time applications, streaming UIs, and high-volume API workloads.
120–200 t/s: Very fast. Noticeably snappier than typing speed. Good for interactive applications.
80–120 t/s: Fast. Comfortable for interactive use. You will not be waiting on the model.
50–80 t/s: Adequate. Acceptable for most chat and document tasks. May feel slow on long outputs.
Under 50 t/s: Slow for interactive use. Acceptable for batch processing where speed is not a constraint.
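
These cut-offs are simple enough to encode directly; a minimal sketch using the thresholds above:

```python
# Hypothetical helper mapping a measured t/s figure onto the tiers above.
def speed_tier(tps: float) -> str:
    if tps >= 200:
        return "Blazing fast"
    if tps >= 120:
        return "Very fast"
    if tps >= 80:
        return "Fast"
    if tps >= 50:
        return "Adequate"
    return "Slow for interactive use"

print(speed_tier(180))  # "Very fast" -- the current chart-topping figure
```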
| Rank | Model | Vendor | Speed | Quality | Intelligence (AA) | API price ($/1M, blended) |
|---:|---|---|---:|---:|---:|---:|
| 1 | Llama 4 Scout | Meta | 180 t/s | 7.5 | 38.5 est. | $0.11 |
| 2 | | | 170 t/s | 7.3 | 35.0 | $1.13 |
| 3 | | | 125 t/s | 4.4 | 18.0 | $0.44 |
| 4 | Grok 4.1 (est.) | xAI | 90 t/s | 8.0 | 41.4 | $6.00 |
| 5 | Claude Sonnet 4.6 (est.) | Anthropic | 85 t/s | 8.0 | 44.3 | $6.00 |
| 6 | | | 73 t/s | 7.3 | 39.0 | $0.69 |
| 7 | | | 67 t/s | 7.5 | 46.0 | $10.00 |
| 8 | GPT-5.2 (est.) | OpenAI | 65 t/s | 8.3 | 46.6 | $4.81 |
| 9 | | | 56 t/s | 4.6 | 23.0 | $0.75 |
| 10 | Gemini 3 Pro (est.) | Google | 55 t/s | 8.8 | 48.4 | $4.50 |
| 11 | DeepSeek V3.2 (est.) | DeepSeek | 45 t/s | 6.3 | 41.6 | $0.48 |
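A note on the price column: a "blended" $/1M figure folds input and output token prices into a single number. The sketch below assumes the common 3:1 input-to-output weighting (verify against AA's methodology page); the per-token prices used are illustrative, not the table's actual rates.

```python
# Blended $/1M-token price under an assumed 3:1 input:output token ratio.
def blended_price(input_per_m: float, output_per_m: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    total = input_ratio + output_ratio
    return (input_per_m * input_ratio + output_per_m * output_ratio) / total

# Illustrative numbers only:
print(blended_price(0.08, 0.20))  # -> 0.11, i.e. a $0.11/1M blended figure
```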

Speed caveats

API speed fluctuates. Provider infrastructure, load, and geographic routing all affect actual latency. These figures are averages from Artificial Analysis over 72 hours. Your real-world experience will vary, particularly during peak hours.
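The smoothing itself is plain arithmetic; a sketch of a 72-hour window over timestamped samples (the data layout here is an assumption, not AA's actual schema):

```python
# 72-hour rolling average over (timestamp, tokens/sec) samples.
from datetime import datetime, timedelta

def rolling_average(samples: list[tuple[datetime, float]],
                    now: datetime,
                    window: timedelta = timedelta(hours=72)) -> float:
    recent = [tps for t, tps in samples if now - t <= window]
    return sum(recent) / len(recent) if recent else float("nan")
```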

Time to first token matters too. Tokens-per-second measures generation throughput once the model starts responding. For interactive applications, time-to-first-token (TTFT) can matter just as much — a model that starts instantly at 80 t/s can feel faster than one that pauses for 2 seconds then runs at 200 t/s.
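That comparison follows directly from total wait ≈ TTFT + tokens ÷ throughput; a sketch using the numbers above:

```python
# Total perceived wait ≈ TTFT + output_tokens / throughput.
def total_wait(ttft_s: float, tps: float, n_tokens: int) -> float:
    return ttft_s + n_tokens / tps

for n in (100, 267, 500):
    a = total_wait(0.0, 80, n)   # starts instantly, 80 t/s
    b = total_wait(2.0, 200, n)  # 2 s pause, then 200 t/s
    print(f"{n} tokens: {a:.2f}s vs {b:.2f}s")
# Break-even: n/80 = 2 + n/200  =>  n = 2 / (1/80 - 1/200) ≈ 267 tokens.
# Below that, the instant-start model finishes first; above it, the
# 200 t/s model wins.
```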

Reasoning models are slower by design. Models with extended thinking modes generate many internal tokens before producing output. Speed rankings here use standard mode only — not extended thinking.

Values marked “est.” have not been directly measured for that specific model version by Artificial Analysis; they will be updated when AA completes indexing.