Anthropic

Claude Sonnet 4.6

6.6 out of 10

Released February 17, 2026, Claude Sonnet 4.6 is the model most people should use. It's Anthropic's default for free and paid users on Claude.ai for a reason: near-Opus performance at 40% lower API cost, a newer knowledge cutoff, and, in a twist worth understanding, it actually beats Claude Opus 4.6 on the everyday tasks that matter most: office productivity, financial analysis, and real-world tool use. For coding agents, it's the model even serious engineers preferred 59% of the time over the previous-generation flagship, at one-fifth its cost.

Context window: 200K tokens
API (blended): $6.00/1M
Consumer access: Free (limited) / $20/mo
Multimodal: Yes

Score Breakdown

Total: 65.9/100 → 6.6/10

Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →

Strengths

  • Best writing quality of any mainstream model
  • Highly reliable instruction-following
  • 200K context handles most real-world long-document tasks
  • Strong at nuanced reasoning and analysis
  • More honest about uncertainty than most models

Weaknesses

  • Most expensive API in the group at $6/1M blended
  • Smallest context window of the six models reviewed
  • No image generation capability

Best for

writing · editing · analysis · long documents · research synthesis

Not ideal for

real-time data · image generation · ultra-budget API use

Adaptive Thinking — Same Engine as Opus

Sonnet 4.6 runs the same four-tier adaptive thinking system as Opus 4.6. The difference isn't the ceiling — it's that Sonnet reaches it at a fraction of the cost.

| Effort level | Latency | Best for | Cost impact |
| --- | --- | --- | --- |
| Low | Fast | Data retrieval, formatting, simple Q&A | Minimal |
| Medium | Moderate | Summaries, code tasks, email drafts, analysis | Standard |
| High (default) | Slower | Complex reasoning, multi-step research, debugging | Standard |
| Max | Slowest | Hard constraint problems, deep architecture planning | Highest |

At medium effort, Sonnet 4.6 matches or beats Opus 4.5 performance while consuming dramatically fewer tokens. Match effort to task complexity — max isn't always better, and it's always more expensive.
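As a sketch, that guidance can be encoded as a small effort router. The task labels and tier strings below are illustrative assumptions drawn from the table above, not an official Anthropic API; pass the result to whatever thinking-effort control your SDK exposes.

```python
# Illustrative effort router based on the four tiers in the table above.
# Task categories and the mapping are assumptions, not an official API.
EFFORT_BY_TASK = {
    "retrieval": "low",        # data lookup, formatting, simple Q&A
    "formatting": "low",
    "summary": "medium",       # summaries, email drafts, routine analysis
    "code": "medium",
    "research": "high",        # multi-step research, debugging
    "debugging": "high",
    "architecture": "max",     # hard constraint problems, deep planning
}

def pick_effort(task_type: str) -> str:
    """Return the cheapest effort tier that fits the task; default to high."""
    return EFFORT_BY_TASK.get(task_type, "high")
```

The default-to-high fallback mirrors the model's own default tier; the savings come from explicitly downgrading tasks you know are simple.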

How It Benchmarks vs. Competitors

Pass@1 (single-attempt) scores, all AA-measured in standard mode with no extended thinking, so the comparison is apples-to-apples.

Knowledge & Reasoning (AA-measured)

| Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| GPQA Diamond (PhD science reasoning) | 79.9% | 84.0% | 90.3% | 94.1% |
| HLE (expert-level knowledge) | 13.2% | 18.6% | 35.4% | 44.7% |

Sonnet trails Opus on deep science reasoning — GPQA Diamond is the honest gap. If your work requires expert-level hard-science QA at scale, Opus or Gemini 3.1 Pro is the right choice. For most knowledge work, the gap is smaller in practice than the raw numbers suggest.

Coding & Tool Use (AA-measured)

| Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| τ²-bench (real-world tool use) | 79.5% | 84.8% | 84.8% | 95.6% |
| AA Coding Index | 46.43 | 47.56 | 48.67 | 55.50 |

On tool use, Sonnet is within 5 points of Opus and GPT-5.2. For most agentic pipelines, that gap is not decision-relevant — the cost difference ($3/$15 vs $5/$25 per MTok) almost certainly is.

Where Sonnet 4.6 Actually Beats Opus 4.6

This is the part of the marketing narrative Anthropic underplays. On several high-value real-world tasks, Sonnet wins — not just ties.

Tasks where Sonnet 4.6 leads

| Task | Sonnet 4.6 | Opus 4.6 | Edge |
| --- | --- | --- | --- |
| GDPval-AA (office productivity, Elo) | 1,633 | 1,559 | Sonnet +74 Elo |
| Finance Agent v1.1 (financial analysis) | 63.3% | 62.0% | Sonnet +1.3pp |
| MCP-Atlas (scaled tool use) | 61.3% | 60.3% | Sonnet +1.0pp |
| OSWorld (computer use) | 72.5% | 72.7% | Essentially tied (−0.2pp) |
| Knowledge cutoff (training data through) | Jan 2026 | Aug 2025 | Sonnet 5 months newer |

GDPval-AA is Anthropic's measure of everyday office AI tasks — writing, analysis, scheduling, planning. A 74-Elo gap is meaningful at the top of the distribution. The knowledge cutoff difference matters for anything involving events from late 2025 onward.

Context Window & Output Capacity

Sonnet's context window is the same as Opus's; the main spec difference is output capacity.

| Capability | Claude Sonnet 4.6 | Claude Opus 4.6 | Notes |
| --- | --- | --- | --- |
| Standard context | 200,000 tokens | 200,000 tokens | ~150K words; handles most real-world documents |
| Extended context (beta) | 1,000,000 tokens | 1,000,000 tokens | Available via API beta flag (Tier 4+); same as Opus |
| Max output tokens | 64,000 tokens | 128,000 tokens | Sonnet gets half; still large enough for full reports |
| Knowledge cutoff (reliable) | August 2025 | May 2025 | Sonnet is more current on recent events |
| Training data through | January 2026 | August 2025 | Sonnet trained on ~5 months more data |

64K output is enough for nearly all production use cases: full code files, long-form reports, migration plans. You only need 128K for very large single-document generation. If that's your use case, Opus is the right call.
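For a sanity check on the word-count figures above, the conversions follow the common rough heuristic of ~0.75 words per token. That ratio is an assumption and varies by language and content:

```python
WORDS_PER_TOKEN = 0.75  # rough heuristic; varies by language and content

def approx_words(tokens: int) -> int:
    """Back-of-envelope token-to-word estimate."""
    return int(tokens * WORDS_PER_TOKEN)

standard_context = approx_words(200_000)  # ≈ 150,000 words
max_output = approx_words(64_000)         # ≈ 48,000 words
```

By this estimate, a full 64K-token output is roughly a 48,000-word document, which is why it covers nearly all single-response use cases.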

Claude Code — Preferred Over the Previous Flagship

Sonnet 4.6 is the default model in Claude Code. The preference numbers are stark.

Claude Code model preference (internal A/B testing)

| Comparison | Sonnet 4.6 preference | Sample |
| --- | --- | --- |
| Sonnet 4.6 vs Claude Opus 4.5 (previous flagship) | 59% | Production Claude Code sessions |
| Opus 4.6 vs Sonnet 4.6 on Sonnet 4.6 tasks | Varies by task type | Coding tasks: often similar |

The 59% preference stat means: when engineers had a choice between Sonnet 4.6 and the previous-generation flagship Opus 4.5, they chose Sonnet 4.6 more than half the time — a remarkable result for a model at one-fifth the cost. Common reason: Sonnet 4.6 applies effort more efficiently and doesn't over-reason on simple tasks.

Practical routing recommendation

Most Claude Code power users route by task type: Sonnet 4.6 for incremental development, PR reviews, and test writing; Opus 4.6 for greenfield architecture, hard debugging sessions, and anything requiring sustained multi-file reasoning over hours. If you're unsure, start with Sonnet — upgrade to Opus only when you hit a real capability wall.
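That routing rule of thumb can be written down as a lookup. The task labels and model identifiers below are illustrative assumptions for the sketch, not official model IDs:

```python
# Illustrative task→model router following the recommendation above.
# Task labels and model names are assumptions, not official identifiers.
SONNET, OPUS = "claude-sonnet-4.6", "claude-opus-4.6"

ROUTES = {
    "incremental_dev": SONNET,
    "pr_review": SONNET,
    "test_writing": SONNET,
    "greenfield_architecture": OPUS,
    "hard_debugging": OPUS,
    "multi_file_reasoning": OPUS,
}

def route(task: str) -> str:
    """Default to Sonnet; upgrade to Opus only at known capability walls."""
    return ROUTES.get(task, SONNET)
```

The default branch encodes the "start with Sonnet" advice: unknown task types go to the cheaper model first.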

Computer Use — Near-Opus Performance

Sonnet 4.6 and Opus 4.6 score within 0.2 percentage points on OSWorld. For most computer-use deployments, the choice comes down to cost, not capability.

Computer use performance

| Benchmark | Sonnet 4.6 | Opus 4.6 | GPT-5.2 |
| --- | --- | --- | --- |
| OSWorld (GUI navigation & desktop tasks) | 72.5% | 72.7% | 38.2% |

OSWorld measures a model's ability to navigate operating system GUIs, run terminal commands, and interact with applications autonomously. Claude models lead the field by a significant margin — GPT-5.2's 38.2% reflects that OpenAI hasn't invested comparably in computer-use training.

Safety Profile — More Reassuring Than Opus

Sonnet 4.6's system card was notably more positive than Opus 4.6's; the red-team findings tell different stories at the two tiers.

System card summary

Anthropic's evaluation described Sonnet 4.6 as showing 'a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns.' Prompt injection resistance improved significantly over Sonnet 4.5. One documented caveat: in computer use contexts, Sonnet 4.6 showed 'overeager behavior' in some GUI tasks — acting before confirming intent. For autonomous computer-use agents, add confirmation checkpoints.
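One way to implement those confirmation checkpoints is to gate risky actions behind an approval callback before execution. The action names and risk list below are illustrative assumptions, not part of any Anthropic SDK:

```python
# Sketch of a confirmation gate for computer-use agents, per the
# 'overeager behavior' caveat above. The risk list is an assumption.
RISKY_ACTIONS = {"delete_file", "submit_form", "send_email", "install_app"}

def execute_with_checkpoint(action: str, confirm) -> str:
    """Run `action` only if it is low-risk or the confirm callback approves it."""
    if action in RISKY_ACTIONS and not confirm(action):
        return f"blocked: {action}"
    return f"executed: {action}"
```

In a real deployment, `confirm` would prompt a human (or a stricter policy model) before the agent commits to an irreversible GUI action.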

Key safety contrast: Opus 4.6 vs Sonnet 4.6

| Concern | Opus 4.6 | Sonnet 4.6 |
| --- | --- | --- |
| Scheming / manipulation in agentic contexts | Documented in system card | No major concerns flagged |
| Prompt injection resistance | Improved (0.77% mitigated ASR) | Significantly improved vs 4.5 |
| ASL classification | ASL-3 | ASL-2 / ASL-3 boundary |
| Morally-motivated sabotage | Occasional 'whistleblowing' in edge cases | Not documented |
| Computer use overeagerness | Not flagged | Documented; add confirmation steps |

ASL-3 is Anthropic's classification for models that 'substantially increase the risk of catastrophic misuse.' Opus 4.6 operates under full ASL-3 protections. Sonnet 4.6's classification reflects a lower risk profile for the same type of deployment.

Pricing — The Real Reason Sonnet Wins

Sonnet costs 40% less than Opus with comparable performance on most tasks; for the bulk of workloads, the math is straightforward.

API pricing comparison

| Model | Input (per MTok) | Output (per MTok) | Blended (3:1 ratio) | vs Sonnet |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $6.00 | baseline |
| Claude Opus 4.6 | $5.00 | $25.00 | $10.00 | +67% more expensive |
| GPT-5.2 | $1.25 | $5.00 | $2.19 | 64% cheaper than Sonnet |
| Gemini 3.1 Pro | $1.25 | $10.00 | $3.44 | 43% cheaper than Sonnet |

Blended cost calculated at 3:1 input:output ratio. Sonnet's $6/1M blended puts it above GPT-5.2 and Gemini 3.1 Pro on raw price — but Claude's lower iteration count and superior instruction-following mean total cost per completed task is often comparable. Batch API (50% discount) and prompt caching (90% savings on cached reads) can bring Sonnet's effective cost well below GPT-5.2 for repeated-context workloads.
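The blended figures reduce to simple weighted-average arithmetic. This sketch reproduces the table's numbers and shows how the cached-input rate ($0.30/1M, from the pricing notes in this review) shifts Sonnet's blend:

```python
def blended(input_per_mtok: float, output_per_mtok: float, ratio: int = 3) -> float:
    """Blended $/MTok at a ratio:1 input:output mix."""
    return (ratio * input_per_mtok + output_per_mtok) / (ratio + 1)

# Reproduces the table above:
sonnet = blended(3.00, 15.00)   # $6.00
opus = blended(5.00, 25.00)     # $10.00
gpt = blended(1.25, 5.00)       # $2.1875 ≈ $2.19
gemini = blended(1.25, 10.00)   # $3.4375 ≈ $3.44

# With all input reads hitting the $0.30/MTok cache rate, the blend drops:
sonnet_cached = blended(0.30, 15.00)  # ≈ $3.98
```

The cached figure assumes every input token is a cache hit, which is the best case; real workloads land somewhere between the two Sonnet numbers.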

Consumer plan pricing

| Plan | Price/month | Sonnet 4.6 access | Opus 4.6 access |
| --- | --- | --- | --- |
| Free | $0 | ✓ (with daily limits) | |
| Pro | $20 | ✓ (full access) | |
| Max (5×) | $100 | ✓ (5× Pro capacity) | |
| Max (20×) | $200 | ✓ (20× Pro capacity) | |
| Team | $30/user/mo | | |

Sonnet 4.6 is the default for Free and Pro users. Pro at $20/month gets you Opus 4.6 too. Max plans are for power users who exhaust Pro limits daily.

Sonnet 4.6 vs Sonnet 4.5 — What Actually Changed

Both cost $3/$15 per 1M tokens. Sonnet 4.6 is the free upgrade — here's what you get.

| Dimension | Sonnet 4.6 | Sonnet 4.5 | Delta |
| --- | --- | --- | --- |
| AA Intelligence Index | 44.38 | 37.14 | +7.2 pts |
| OSWorld (computer use) | 72.5% | 61.4% | +11.1pp, a major jump |
| GDPval-AA (office tasks Elo) | 1,633 | Not reported | New leading score |
| Knowledge cutoff (training) | Jan 2026 | July 2025 | +6 months newer |
| SWE-bench Verified (provider) | 79.6% | 77.2% | +2.4pp |
| Claude Code preference vs 4.5 | 59% preferred | 41% preferred | 4.6 wins majority |
| Adaptive thinking tiers | 4 tiers | 4 tiers | Same |
| API price | $3/$15 | $3/$15 | No cost to upgrade |

The OSWorld jump (+11.1pp) is the biggest practical improvement. Computer use in Sonnet 4.5 worked — in Sonnet 4.6, it's substantially more reliable. If your team has autonomous browser or desktop agents on Sonnet 4.5, this is the upgrade that matters.

Sonnet 4.6 vs Gemini 3.1 Pro: Where Each Wins

The most common $3-tier decision for professional API users.

| Dimension | Claude Sonnet 4.6 | Gemini 3.1 Pro |
| --- | --- | --- |
| GPQA Diamond (AA) | 79.9% | 94.1%, Gemini leads |
| HLE (AA) | 13.2% | 44.7%, Gemini leads significantly |
| τ²-bench (AA) | 79.5% | 95.6%, Gemini leads significantly |
| GDPval-AA (office tasks) | 1,633 Elo, Sonnet leads | 1,317 Elo |
| Writing quality | Best-in-class | Strong but less nuanced prose |
| Context window | 200K (1M beta) | 1M (GA) |
| API price (blended) | $6.00/1M | $3.44/1M, Gemini cheaper |
| Data jurisdiction | US / EU (Anthropic) | US (Google Cloud) |

The decision tree: if you need hard science reasoning, agentic tool use, or production 1M-context — Gemini 3.1 Pro. If you need the best writing, nuanced analysis, or GDPval-style office work — Claude Sonnet. If price is the constraint: Gemini 3.1 Pro is 43% cheaper at blended rates.
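That decision tree reads naturally as code. A minimal sketch, with requirement flags that are illustrative assumptions:

```python
def pick_model(needs_hard_science=False, needs_agentic_tools=False,
               needs_1m_context_ga=False, needs_best_writing=False,
               price_constrained=False) -> str:
    """Sketch of the decision tree above; flags are illustrative assumptions."""
    if needs_hard_science or needs_agentic_tools or needs_1m_context_ga:
        return "Gemini 3.1 Pro"
    if needs_best_writing:
        return "Claude Sonnet 4.6"
    if price_constrained:
        return "Gemini 3.1 Pro"
    return "Claude Sonnet 4.6"
```

The ordering matters: a hard capability requirement (science QA, tool use, GA 1M context) overrides a price preference, which in turn only applies when writing quality isn't the priority.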

Bottom line

Claude Sonnet 4.6 is the right default choice for the vast majority of professional users. It beats the previous-generation flagship on office tasks, matches Opus on computer use, has a more current knowledge cutoff, and costs 40% less. The cases where you actually need Opus 4.6 are narrower than the marketing suggests: sustained multi-file reasoning over hours, hard science QA at scale, or maximum-output-token generation (128K). For everything else — writing, analysis, coding, agentic workflows, customer-facing products — Sonnet is the answer.

Pricing details

Subscription plans

  • Free ($0): Claude Sonnet access with daily limits (message cap; no file uploads; no Projects)
  • Pro ($20/mo): 5× more usage, Projects, file uploads, priority access during peak hours
  • Team ($25/mo, annual): all Pro features, admin console, centralized billing, higher rate limits

API pricing

  • Anthropic ($3/$15): prompt caching with cached input at $0.30/1M; Batch API 50% discount
  • OpenRouter ($3.10/$15.50): small markup over direct Anthropic pricing
  • AWS Bedrock ($3/$15): same pricing as direct; useful if already on AWS
  • Google Vertex AI ($3/$15): same pricing as direct; useful if already on GCP

Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.

Last updated: February 17, 2026