By Price
Best Budget LLM API
Under $1.00 per million tokens (blended). The price gap between budget-tier LLMs and flagship models has never been larger — and for many use cases, the quality gap is smaller than you’d expect. Here’s the best quality you can get on a tight API budget.
Blended cost = (input price × 3 + output price) ÷ 4, i.e. a 3:1 input:output token ratio. Updated February 2026.
How to think about budget API pricing
Cheap doesn’t automatically mean good value — here’s what to actually evaluate:
Calculate at your actual ratio — API pricing is quoted separately for input and output tokens. Your real cost depends entirely on your input:output ratio. A summarization pipeline (long input, short output) will have a very different blended cost than a chatbot (short input, long output). The 3:1 ratio here is a reasonable average — adjust it for your workload.
Quality floor matters more than price — A model at $0.10/1M that produces unreliable output costs more in engineering time to wrangle than a model at $0.50/1M that just works. The right question is: what's the cheapest model that's still good enough for my specific use case?
Rate limits at scale — Budget API tiers often have tighter rate limits than premium tiers. At low volume, you won't notice. At production scale, hitting rate limits repeatedly is expensive in latency and engineering overhead. Check the rate limits before committing.
Open weights = potentially free at scale — Several budget models are open-weight — you can download and self-host them for near-zero per-token cost at scale if you have the GPU capacity. The hosted API price is the fallback for teams without infrastructure.
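The blended-cost formula above is easy to parameterize for your own workload. A minimal sketch, using the $0.25/$2.00 per 1M prices quoted below purely as example inputs:

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
    """Blended $/1M tokens, weighted by the input:output token ratio."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# At the article's 3:1 ratio: ($0.25 * 3 + $2.00) / 4
print(blended_price(0.25, 2.00))                  # 0.6875

# A summarization-heavy workload (say 10:1) shifts weight onto cheap input tokens:
print(round(blended_price(0.25, 2.00, input_ratio=10.0), 4))
```

Run it with your own measured token ratio before comparing providers; the rankings below can reorder at extreme ratios.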
Our pick
OpenAI's small-but-smart model and the best value in the GPT-5 family. At $0.25/$2.00 per 1M tokens it costs roughly one-seventh as much as GPT-5.2 while delivering an AA Intelligence Index of 41, higher than Claude Haiku and Gemini Flash. The 400K context window and multimodal input make it a strong default for cost-sensitive production pipelines.
Cheapest option: OpenAI
$0.25/1M input · $2.00/1M output · $0.690 blended
400K context. Function calling, structured outputs, and vision supported.
Also consider
GPT OSS 120B is OpenAI's first large open-weight language model, released August 2025. It uses a Mixture-of-Experts architecture with 117 billion total parameters and 5.1 billion active per forward pass — designed so it can run on a single H100 GPU. With an AA Intelligence Index of 33 (#1 of 50 in reasoning open-weight models), it's the most capable officially released open-weight model from a frontier lab. At $0.15/$0.60 per 1M tokens and 336 tokens/second, it's both cheap and fast. The open weights are available on Hugging Face and can be self-hosted. A smaller companion model, GPT OSS 20B, runs on consumer 16GB GPUs at $0.05/$0.20 per 1M.
Weights & Biases Inference / OpenRouter: $0.15/$0.60/1M · $0.260 blended
Full review →

Released November 17, 2025, Grok 4.1 is xAI's most refined model — a post-training upgrade to Grok 4 that briefly claimed the #1 spot on LMArena (a 30-position jump) before Gemini 3 Pro and Claude Opus 4.6 overtook it. It leads every frontier model on emotional intelligence (EQ-Bench3: 1586 Elo) and creative writing. It's not trying to win on coding or reasoning — it's trying to be the most compelling AI personality, with the cheapest entry point and real-time X data.
xAI (Grok 4.1 Fast): $0.20/$0.50/1M · $0.250 blended
Full review →

DeepSeek's open-weights frontier model and one of the most cost-effective APIs available. V3.2 punches far above its price — at $0.28/$1.10 per 1M tokens it costs roughly 20× less than Claude Sonnet while delivering an AA Intelligence Index of 32. Strong on coding and reasoning tasks, but hosted in China with the privacy implications that brings.
Fireworks AI: $0.22/$0.88/1M · $0.490 blended
Full review →

Bottom line
The best budget API pick is whichever model gives you the highest quality score under the $1/1M threshold. But before committing to a budget model, honestly assess whether the quality difference from a frontier model matters for your use case. For summarization, classification, and structured extraction — it often doesn’t. For nuanced reasoning, complex instructions, and open-ended generation — it often does.
Prices change constantly — verify before budgeting. How we rate →