LLM Rankings by Capability
Pure capability — price not included. For budget-aware picks see best by price. For image and video generators see Images and Video.
Overall leader
Gemini 3.1 Pro
8.7/10
Intelligence leader
Gemini 3.1 Pro
AA 57.18
Context leader
Llama 4 Scout
10M tokens
Speed leader
GPT OSS 120B
304 t/s
Overall quality score
0–10 score. The composite quality score blends intelligence, context, speed, accessibility, and trust. The most useful single ranking.
Reasoning & intelligence
AA Index score. Ranked by the Artificial Analysis Intelligence Index, an independently measured composite of 10 benchmarks including GPQA Diamond, τ²-Bench, and Humanity's Last Exam.
Context window
Max tokens. How many tokens can the model process in a single request? Matters for long documents, full codebases, and multi-turn conversations.
Speed
Tokens/sec. Output tokens per second via API, i.e. how fast the model actually responds. Sourced from the Artificial Analysis speed leaderboard.
Coding
τ²-bench / LCB. Best LLMs for code generation, debugging, and software engineering tasks. Ranked using τ²-bench and LiveCodeBench scores from Artificial Analysis.
Multimodal
Quality score. LLMs that accept text and image input, ranked by overall quality score. Useful for document analysis, screenshot debugging, and visual Q&A.
How rankings are generated
Every number on these pages comes from a defined source — Artificial Analysis for intelligence and speed, official documentation for context windows. Nothing is estimated or manually assigned except where labeled “est.” See how we rate for the full methodology.
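A composite score like the 0–10 overall quality score is, at its simplest, a weighted average of normalized sub-scores. The sketch below is illustrative only: the dimension names come from this page, but the weights and sub-score values are hypothetical assumptions, not the actual methodology.

```python
# Illustrative sketch of a 0-10 composite quality score.
# Dimension names match the page; weights and sub-scores are hypothetical.
def composite_score(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-10 sub-scores, normalized by total weight."""
    total_weight = sum(weights.values())
    return sum(subscores[k] * weights[k] for k in weights) / total_weight

# Hypothetical weighting: intelligence counts most, trust least.
weights = {"intelligence": 0.40, "context": 0.15, "speed": 0.20,
           "accessibility": 0.15, "trust": 0.10}
# Hypothetical sub-scores for one model, each already on a 0-10 scale.
subscores = {"intelligence": 9.1, "context": 7.5, "speed": 8.0,
             "accessibility": 8.8, "trust": 9.0}

score = composite_score(subscores, weights)
print(f"{score:.1f}")
```

Normalizing by the total weight means the weights need not sum exactly to 1, and a model strong on one heavily weighted dimension can't fully mask weakness elsewhere.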