How We Rate LLMs
Every rating on this site is computed from a defined formula — not assigned by hand. The quality score measures what a model can do. Price is kept separate so cost does not distort capability rankings. For budget-aware recommendations, see the best value, free web chat, and budget API pages.
Why price is not in the quality score
“Which model is smarter?” and “Which is cheapest for my workload?” are different questions. Mixing them into a single score produces misleading results — a cheap model with mediocre capability can outscore an excellent model just because it costs less. Quality and price are both shown on every model page; they are just scored separately.
Quality scoring categories
Intelligence
40 pts max
Artificial Analysis Intelligence Index v4.0 — an independently measured composite of 10 standard benchmarks: GPQA Diamond, Humanity's Last Exam, SWE-bench Verified, Terminal-Bench Hard, τ²-Bench Telecom, GDPval-AA, SciCode, AA-LCR, AA-Omniscience, and IFBench. Standard/medium mode only — extended thinking scores are excluded. Normalized against the current field: floor is AA Index 20 (below any practical frontier model), ceiling is 5% above the current leader. Recalibrates automatically when models are added.
Current field range: 20.0 (floor) – 50.9 (ceiling, 5% above field leader)
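The page does not publish the exact scaling function, but a linear normalization between the floor and ceiling is the natural reading. A minimal sketch, assuming linear scaling and using the floor of 20 and current ceiling of 50.9 as a snapshot (the ceiling recalibrates as the field moves):

```python
def intelligence_pts(aa_index, floor=20.0, ceiling=50.9, max_pts=40):
    """Scale an AA Intelligence Index value to 0-40 pts.

    Assumes linear normalization between the floor and a ceiling set
    5% above the current field leader; clamps out-of-range values.
    """
    frac = (aa_index - floor) / (ceiling - floor)
    return round(max_pts * min(max(frac, 0.0), 1.0), 1)
```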
Context Window
15 pts max
Absolute thresholds — qualitative capability differences that matter regardless of competition. <32K=2, 32K–128K=5, 128K–200K=7, 200K–400K=9, 400K–1M=11, 1M–5M=13, 5M–10M=14, 10M+=15 pts. Steps extended to 1B tokens for future-proofing.
Speed
10 pts max
Output tokens per second (Artificial Analysis speed leaderboard). <15 t/s=1, 15–30=2, 30–50=4, 50–80=6, 80–120=8, 120–200=9, 200+=10 pts. Steps extended to 2000+ t/s for future inference hardware.
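The speed bands follow the same pattern. Again assuming lower-edge-inclusive ranges (exactly 200 t/s scores 10):

```python
def speed_pts(tokens_per_sec):
    """Map median output speed (tokens/sec) to points."""
    thresholds = [  # (minimum t/s, points), checked largest-first
        (200, 10), (120, 9), (80, 8), (50, 6), (30, 4), (15, 2),
    ]
    for floor, pts in thresholds:
        if tokens_per_sec >= floor:
            return pts
    return 1  # below 15 t/s
```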
Accessibility
10 pts max
Practical access for everyday users: free tier (+4), chat UI (+3), mobile app (+2), open source / self-hostable (+1). Max 10 pts.
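The four access factors are additive; since they sum to exactly 10, the cap only matters if factors are ever added. A minimal sketch:

```python
def accessibility_pts(free_tier, chat_ui, mobile_app, open_source):
    """Additive accessibility score from four booleans, capped at 10 pts."""
    pts = 4 * free_tier + 3 * chat_ui + 2 * mobile_app + 1 * open_source
    return min(pts, 10)
```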
Trust & Privacy
5 pts max
Company jurisdiction: US/EU = 3 pts, other = 2 pts, Chinese company = 1 pt. Strong published privacy policy: +2 pts.
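As a sketch, with jurisdiction codes as an illustrative convention (the page does not define a canonical input format):

```python
def trust_pts(jurisdiction, strong_privacy_policy):
    """Jurisdiction base points plus privacy-policy bonus, max 5 pts."""
    base = {"US": 3, "EU": 3, "CN": 1}.get(jurisdiction, 2)  # other = 2
    return base + (2 if strong_privacy_policy else 0)
```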
Final Rating
total / 8
Total points across the five categories (max 80) divided by 8, rounded to one decimal place, giving a rating out of 10. This is the quality rating shown on all model pages and comparisons. No pricing influence.
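Putting the category maxima together (40 + 15 + 10 + 10 + 5 = 80), the final rating is:

```python
def final_rating(intelligence, context, speed, accessibility, trust):
    """Sum category points (max 80) and divide by 8, to one decimal."""
    total = intelligence + context + speed + accessibility + trust
    assert 0 <= total <= 80, "category points out of range"
    return round(total / 8, 1)
```

A perfect score in every category yields 10.0; a mid-field model with, say, 20 + 7 + 6 + 7 + 4 = 44 points rates 5.5.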
Price-aware recommendations
These pages rank models by quality score within specific pricing constraints.
Quality score audit — all models
Sorted by total quality score. Click a model name to view its full review.
Data sources
- Intelligence Index: Artificial Analysis Intelligence Index v4.0. Independently measured — not self-reported by providers. Incorporates 10 benchmarks: GPQA Diamond, Humanity's Last Exam, SWE-bench Verified, Terminal-Bench Hard, τ²-Bench Telecom, GDPval-AA, SciCode, AA-LCR, AA-Omniscience, IFBench. Standard/medium inference mode only.
- Speed (tokens/sec): Artificial Analysis speed leaderboard — API-measured output tokens per second, 72-hour rolling average.
- Pricing: Official provider API documentation and product pages. Verified monthly.
- Context window, accessibility, trust: Manually verified from official product pages and provider privacy policies.
Values labeled “est.” in the audit table above have not yet been directly measured by Artificial Analysis for that specific model version. They are extrapolated from the closest available AA measurement. These will be updated to verified values as AA completes indexing.
Quality scores recalculate automatically when model data is updated. Last data verification: February 2026.