
Meta

Llama 4 Scout

Open Source
3.9
out of 10

Meta's ultra-long-context open-weights model with a 10M token window — the largest of any publicly available model. Scout is the smaller MoE variant in the Llama 4 family (109B total parameters, ~17B active) optimized for speed and context length over raw intelligence. At 135 t/s and AA Intelligence Index 13, it's the right call when you need to process enormous documents or codebases that would overflow any other model.

Context window

10.0M tokens

API (blended)

$0.17/1M

Consumer access

Free (limited)

Multimodal

Yes

Score Breakdown

Total: 39.1/100 → 3.9/10

Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →

Strengths

  • +10M token context — largest of any publicly available model; processes entire repositories or books in one call
  • +Open weights (Llama 4 Community License) — fully self-hostable and auditable
  • +135 t/s (AA-measured) — fastest of any model reviewed, including much smaller ones
  • +Ultra-cheap API: ~$0.17/1M blended via Groq — cheapest in the dataset
  • +Natively multimodal: image + text input
  • +Active ecosystem: Groq, Together, Fireworks, Ollama, and LM Studio support
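Since the 10M window is Scout's headline feature, it helps to estimate up front whether a corpus will actually fit. A minimal Python sketch, assuming a rough 4-characters-per-token heuristic (a common rule of thumb; the real count depends on the model's tokenizer):

```python
# Rough check of whether a text corpus fits in a context window.
# Assumes ~4 characters per token; the model's actual tokenizer
# will produce a different (usually similar-magnitude) count.
CHARS_PER_TOKEN = 4

def estimated_tokens(text: str) -> int:
    """Return a rough token estimate for `text`."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], window: int = 10_000_000) -> bool:
    """Check whether the combined corpus fits under `window` tokens."""
    return sum(estimated_tokens(t) for t in texts) <= window

# Example: a 2 MB file is roughly 500k tokens — comfortably under 10M.
print(fits_in_context(["x" * 2_000_000]))  # True
```

Note that hosted providers often cap the window well below 10M (see the weaknesses below), so pass the provider's actual limit as `window` rather than the model's theoretical maximum.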

Weaknesses

  • -AA Intelligence Index 13 — the lowest in this dataset; not suitable for complex reasoning tasks
  • -GPQA Diamond 58.7%, HLE 4.3% (AA-measured) — near the lower bound for frontier science tasks
  • -τ²-bench 15.5% (AA-measured) — very limited agentic tool use capability
  • -Meta data practices: not ideal for privacy-sensitive enterprise use
  • -10M context not available through all hosted providers — check limits before integrating
  • -No MCP support

Best for

  • Ultra-long document processing
  • Large codebase analysis
  • Self-hosting
  • High-throughput summarization
  • Budget-constrained pipelines

Not ideal for

  • Expert reasoning
  • Complex agentic tasks
  • High-quality creative work
  • Sensitive enterprise data

Pricing details

Subscription plans

  • Free (self-hosted) — Full model weights via Hugging Face; run on your own infrastructure. Price: Free
  • Free (meta.ai) — Web chat access with rate limits. Daily message caps; full 10M context only via API/self-host. Price: Free

API pricing

  • Groq (free tier) — Extremely fast; free tier with rate limits; context window varies by provider. $0.11 / $0.34 per 1M tokens (input / output)
  • Together AI — Competitive pricing; good for production workloads. $0.18 / $0.66 per 1M tokens (input / output)
  • Fireworks AI — Fast inference options available. $0.15 / $0.60 per 1M tokens (input / output)

Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.

Last updated: February 27, 2026