Anthropic
Claude Sonnet 4.6
Released February 17, 2026, Claude Sonnet 4.6 is the model most people should use. It's Anthropic's default for free and paid users on Claude.ai for a reason: near-Opus performance at 40% less cost, a newer knowledge cutoff, and, in a twist worth understanding, it actually beats Claude Opus 4.6 on the everyday tasks that matter most: office productivity, financial analysis, and real-world tool use. For coding agents, it's the model serious engineers preferred 59% of the time over the previous-generation flagship.
Context window
200K tokens
API (blended)
$6.00/1M
Consumer access
Free (limited) / $20/mo
Multimodal
Yes
Score Breakdown
65.9/100 → 6.6/10. Intelligence, Reliability, Speed, and Context are field-relative; scores shift as models are added. Accessibility and Trust are absolute checklists.
Strengths
- Best writing quality of any mainstream model
- Highly reliable instruction-following
- 200K context handles most real-world long-document tasks
- Strong at nuanced reasoning and analysis
- More honest about uncertainty than most models
Weaknesses
- Most expensive API in the group at $6/1M blended
- Smallest context window of the six models reviewed
- No image generation capability
Adaptive Thinking — Same Engine as Opus
Sonnet 4.6 runs the same four-tier adaptive thinking system as Opus 4.6. The difference isn't the ceiling — it's that Sonnet reaches it at a fraction of the cost.
| Effort level | Latency | Best for | Cost impact |
|---|---|---|---|
| Low | Fast | Data retrieval, formatting, simple Q&A | Minimal |
| Medium | Moderate | Summaries, code tasks, email drafts, analysis | Standard |
| High (default) | Slower | Complex reasoning, multi-step research, debugging | Standard |
| Max | Slowest | Hard constraint problems, deep architecture planning | Highest |
At medium effort, Sonnet 4.6 matches or beats Opus 4.5 performance while consuming dramatically fewer tokens. Match effort to task complexity — max isn't always better, and it's always more expensive.
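As a rough illustration of matching effort to task complexity, the tiers above can be expressed as a simple lookup. The tier names mirror the table; the task categories and the `select_effort` helper are hypothetical and not part of any Anthropic SDK.

```python
# Hypothetical effort router mirroring the four-tier table above.
# Task categories are illustrative only; extend them for your workload.
EFFORT_BY_TASK = {
    "formatting": "low",
    "simple_qa": "low",
    "summary": "medium",
    "code_task": "medium",
    "email_draft": "medium",
    "debugging": "high",
    "multi_step_research": "high",
    "constraint_problem": "max",
    "architecture_planning": "max",
}

def select_effort(task_type: str) -> str:
    """Return the effort tier for a task, defaulting to 'high' (the model default)."""
    return EFFORT_BY_TASK.get(task_type, "high")
```

Defaulting unknown tasks to high matches the model's own default, while routine categories drop to cheaper tiers explicitly.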
How It Benchmarks vs. Competitors
Pass@1 (single-attempt) scores, all AA-measured in standard mode with no extended thinking, so comparisons are apples-to-apples.
Knowledge & Reasoning (AA-measured)
| Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
|---|---|---|---|---|
| GPQA Diamond (PhD science reasoning) | 79.9% | 84.0% | 90.3% | 94.1% |
| HLE (expert-level knowledge) | 13.2% | 18.6% | 35.4% | 44.7% |
Sonnet trails Opus on deep science reasoning — GPQA Diamond is the honest gap. If your work requires expert-level hard-science QA at scale, Opus or Gemini 3.1 Pro is the right choice. For most knowledge work, the gap is smaller in practice than the raw numbers suggest.
Coding & Tool Use (AA-measured)
| Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | GPT-5.2 | Gemini 3.1 Pro |
|---|---|---|---|---|
| τ²-bench (real-world tool use) | 79.5% | 84.8% | 84.8% | 95.6% |
| AA Coding Index | 46.43 | 47.56 | 48.67 | 55.5 |
On tool use, Sonnet is within about five points of Opus and GPT-5.2. For most agentic pipelines, that gap is not decision-relevant; the cost difference ($3/$15 vs $5/$25 per MTok) almost certainly is.
Where Sonnet 4.6 Actually Beats Opus 4.6
This is the part of the marketing narrative Anthropic underplays. On several high-value real-world tasks, Sonnet wins — not just ties.
Tasks where Sonnet 4.6 leads or ties
| Task | Sonnet 4.6 | Opus 4.6 | Edge |
|---|---|---|---|
| GDPval-AA (office productivity, Elo) | 1,633 | 1,559 | Sonnet +74 Elo |
| Finance Agent v1.1 (financial analysis) | 63.3% | 62.0% | Sonnet +1.3pp |
| MCP-Atlas (scaled tool use) | 61.3% | 60.3% | Sonnet +1.0pp |
| OSWorld (computer use) | 72.5% | 72.7% | Essentially tied (−0.2pp) |
| Knowledge cutoff (training data through) | Jan 2026 | Aug 2025 | Sonnet +5 months newer |
GDPval-AA is Anthropic's measure of everyday office AI tasks — writing, analysis, scheduling, planning. A 74-Elo gap is meaningful at the top of the distribution. The knowledge cutoff difference matters for anything involving events from late 2025 onward.
Context Window & Output Capacity
Sonnet's context window is identical to Opus's; the real spec differences are output capacity and knowledge cutoff.
| Capability | Claude Sonnet 4.6 | Claude Opus 4.6 | Notes |
|---|---|---|---|
| Standard context | 200,000 tokens | 200,000 tokens | ~150K words — handles most real-world documents |
| Extended context (beta) | 1,000,000 tokens | 1,000,000 tokens | Available via API beta flag (Tier 4+); same as Opus |
| Max output tokens | 64,000 tokens | 128,000 tokens | Sonnet gets half — still large enough for full reports |
| Knowledge cutoff (reliable) | August 2025 | May 2025 | Sonnet is more current on recent events |
| Training data through | January 2026 | August 2025 | Sonnet trained on ~5 months more data |
64K output is enough for nearly all production use cases: full code files, long-form reports, migration plans. You only need 128K for very large single-document generation. If that's your use case, Opus is the right call.
Claude Code — Preferred Over the Previous Flagship
Sonnet 4.6 is the default model in Claude Code. The preference numbers are stark.
Claude Code model preference (internal A/B testing)
| Comparison | Sonnet 4.6 preference | Sample |
|---|---|---|
| Sonnet 4.6 vs Claude Opus 4.5 (previous flagship) | 59% | Production Claude Code sessions |
| Sonnet 4.6 vs Claude Opus 4.6 (current flagship) | Varies by task type | Coding tasks: often similar |
The 59% preference stat means: when engineers had a choice between Sonnet 4.6 and the previous-generation flagship Opus 4.5, they chose Sonnet 4.6 more than half the time — a remarkable result for a model at one-fifth the cost. Common reason: Sonnet 4.6 applies effort more efficiently and doesn't over-reason on simple tasks.
Practical routing recommendation
Most Claude Code power users route by task type: Sonnet 4.6 for incremental development, PR reviews, and test writing; Opus 4.6 for greenfield architecture, hard debugging sessions, and anything requiring sustained multi-file reasoning over hours. If you're unsure, start with Sonnet — upgrade to Opus only when you hit a real capability wall.
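That routing habit can be sketched as a small helper. The task labels and model names here are placeholders for illustration, not verified API model identifiers.

```python
# Hypothetical task-type router following the recommendation above.
# Defaults to Sonnet; escalate to Opus only for the heavy categories.
SONNET_TASKS = {"incremental_dev", "pr_review", "test_writing"}
OPUS_TASKS = {"greenfield_architecture", "hard_debugging", "multi_file_reasoning"}

def pick_model(task_type: str) -> str:
    """Return a placeholder model name for a Claude Code task type."""
    if task_type in OPUS_TASKS:
        return "opus-4.6"
    # Unknown tasks start on Sonnet: upgrade only at a real capability wall.
    return "sonnet-4.6"
```

The asymmetry is deliberate: the default branch is Sonnet, which mirrors the "start with Sonnet, upgrade on a wall" advice above.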
Computer Use — Near-Opus Performance
Sonnet 4.6 and Opus 4.6 score within 0.2 percentage points on OSWorld. For most computer-use deployments, the choice comes down to cost, not capability.
Computer use performance
| Benchmark | Sonnet 4.6 | Opus 4.6 | GPT-5.2 |
|---|---|---|---|
| OSWorld (GUI navigation & desktop tasks) | 72.5% | 72.7% | 38.2% |
OSWorld measures a model's ability to navigate operating system GUIs, run terminal commands, and interact with applications autonomously. Claude models lead the field by a significant margin — GPT-5.2's 38.2% reflects that OpenAI hasn't invested comparably in computer-use training.
Safety Profile — More Reassuring Than Opus
Sonnet 4.6's system card was notably more positive than Opus 4.6's; the red-team findings differ meaningfully between the two tiers.
System card summary
Anthropic's evaluation described Sonnet 4.6 as showing 'a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns.' Prompt injection resistance improved significantly over Sonnet 4.5. One documented caveat: in computer use contexts, Sonnet 4.6 showed 'overeager behavior' in some GUI tasks — acting before confirming intent. For autonomous computer-use agents, add confirmation checkpoints.
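One way to add the confirmation checkpoints recommended above is to gate risky GUI actions behind a callback before executing them. The action names and the `run_action` interface are illustrative, not part of any computer-use SDK.

```python
from typing import Callable

# Actions that should never run without explicit confirmation.
# The set is illustrative; tune it to your own deployment.
RISKY_ACTIONS = {"delete_file", "submit_form", "send_email", "execute_command"}

def run_action(action: str,
               execute: Callable[[str], None],
               confirm: Callable[[str], bool]) -> bool:
    """Execute a GUI action, pausing at a confirmation checkpoint for risky ones.

    Returns True if the action ran, False if confirmation was declined.
    """
    if action in RISKY_ACTIONS and not confirm(action):
        return False  # checkpoint declined: do not act
    execute(action)
    return True
```

In a real agent loop, `confirm` would surface the pending action to a human (or a stricter policy model) instead of acting on the model's first impulse.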
Key safety contrast: Opus 4.6 vs Sonnet 4.6
| Concern | Opus 4.6 | Sonnet 4.6 |
|---|---|---|
| Scheming / manipulation in agentic contexts | Documented in system card | No major concerns flagged |
| Prompt injection resistance | Improved (0.77% mitigated ASR) | Significantly improved vs 4.5 |
| ASL classification | ASL-3 | ASL-2 / ASL-3 boundary |
| Morally-motivated sabotage | Occasional 'whistleblowing' in edge cases | Not documented |
| Computer use overeagerness | Not flagged | Documented — add confirmation steps |
ASL-3 is Anthropic's classification for models that 'substantially increase the risk of catastrophic misuse.' Opus 4.6 operates under full ASL-3 protections. Sonnet 4.6's classification reflects a lower risk profile for the same type of deployment.
Pricing — The Real Reason Sonnet Wins
At 60% of Opus's blended cost ($6 vs $10 per 1M tokens) with comparable performance on most tasks, the math is straightforward for most workloads.
API pricing comparison
| Model | Input (per MTok) | Output (per MTok) | Blended (3:1 ratio) | vs Sonnet |
|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $6.00 | — |
| Claude Opus 4.6 | $5.00 | $25.00 | $10.00 | +67% more expensive |
| GPT-5.2 | $1.25 | $5.00 | $2.19 | 64% cheaper than Sonnet |
| Gemini 3.1 Pro | $1.25 | $10.00 | $3.44 | 43% cheaper than Sonnet |
Blended cost calculated at 3:1 input:output ratio. Sonnet's $6/1M blended puts it above GPT-5.2 and Gemini 3.1 Pro on raw price — but Claude's lower iteration count and superior instruction-following mean total cost per completed task is often comparable. Batch API (50% discount) and prompt caching (90% savings on cached reads) can bring Sonnet's effective cost well below GPT-5.2 for repeated-context workloads.
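The blended figures in the table follow directly from the 3:1 weighting, and the discount math is easy to check. The 80% cache-hit rate below is an assumption for illustration; real billing depends on your actual hit rate.

```python
def blended(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blended $/MTok at a given input:output token ratio (3:1 here)."""
    return (ratio * input_price + output_price) / (ratio + 1)

print(round(blended(3.00, 15.00), 2))   # Claude Sonnet 4.6 -> 6.0
print(round(blended(5.00, 25.00), 2))   # Claude Opus 4.6   -> 10.0
print(round(blended(1.25, 5.00), 2))    # GPT-5.2           -> 2.19
print(round(blended(1.25, 10.00), 2))   # Gemini 3.1 Pro    -> 3.44

# Effective Sonnet input cost if 80% of input tokens hit the prompt cache
# (90% off cached reads) -- an assumed hit rate, for illustration only.
cached_input = 0.8 * (3.00 * 0.10) + 0.2 * 3.00  # -> 0.84 $/MTok
```

At that assumed hit rate, Sonnet's effective input price drops from $3.00 to $0.84/MTok, which is how it can undercut GPT-5.2 on repeated-context workloads despite the higher list price.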
Consumer plan pricing
| Plan | Price/month | Sonnet 4.6 access | Opus 4.6 access |
|---|---|---|---|
| Free | $0 | ✓ (with daily limits) | ✗ |
| Pro | $20 | ✓ (full access) | ✓ |
| Max (5×) | $100 | ✓ | ✓ (5× Pro capacity) |
| Max (20×) | $200 | ✓ | ✓ (20× Pro capacity) |
| Team | $30/user/mo | ✓ | ✓ |
Sonnet 4.6 is the default for Free and Pro users. Pro at $20/month gets you Opus 4.6 too. Max plans are for power users who exhaust Pro limits daily.
Sonnet 4.6 vs Sonnet 4.5 — What Actually Changed
Both cost $3/$15 per 1M tokens. Sonnet 4.6 is the free upgrade — here's what you get.
| Dimension | Sonnet 4.6 | Sonnet 4.5 | Delta |
|---|---|---|---|
| AA Intelligence Index | 44.38 | 37.14 | +7.2 pts |
| OSWorld (computer use) | 72.5% | 61.4% | +11.1pp — major jump |
| GDPval-AA (office tasks Elo) | 1,633 | Not reported | New leading score |
| Knowledge cutoff (training) | Jan 2026 | July 2025 | +6 months newer |
| SWE-bench Verified (provider) | 79.6% | 77.2% | +2.4pp |
| Claude Code preference vs 4.5 | 59% preferred | 41% preferred | 4.6 wins majority |
| Adaptive thinking tiers | 4 tiers | 4 tiers | Same |
| API price | $3/$15 | $3/$15 | No cost to upgrade |
The OSWorld jump (+11.1pp) is the biggest practical improvement. Computer use in Sonnet 4.5 worked — in Sonnet 4.6, it's substantially more reliable. If your team has autonomous browser or desktop agents on Sonnet 4.5, this is the upgrade that matters.
Sonnet 4.6 vs Gemini 3.1 Pro: Where Each Wins
The most common $3-tier decision for professional API users.
| Dimension | Claude Sonnet 4.6 | Gemini 3.1 Pro |
|---|---|---|
| GPQA Diamond (AA) | 79.9% | 94.1% — Gemini leads |
| HLE (AA) | 13.2% | 44.7% — Gemini leads significantly |
| τ²-bench (AA) | 79.5% | 95.6% — Gemini leads significantly |
| GDPval-AA (office tasks) | 1,633 Elo | 1,317 Elo — Sonnet leads |
| Writing quality | Best-in-class | Strong but less nuanced prose |
| Context window | 200K (1M beta) | 1M (GA) |
| API price (blended) | $6.00/1M | $3.44/1M — Gemini cheaper |
| Data jurisdiction | US / EU (Anthropic) | US (Google Cloud) |
The decision tree: if you need hard science reasoning, agentic tool use, or production 1M-context — Gemini 3.1 Pro. If you need the best writing, nuanced analysis, or GDPval-style office work — Claude Sonnet. If price is the constraint: Gemini 3.1 Pro is 43% cheaper at blended rates.
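The same decision tree, written out as a sketch; the requirement labels are invented for illustration and the priority order follows the prose above.

```python
def choose_model(needs: set[str]) -> str:
    """Encode the $3-tier decision tree above; requirement labels are illustrative."""
    if needs & {"hard_science", "agentic_tools", "1m_context_ga"}:
        return "Gemini 3.1 Pro"
    if needs & {"best_writing", "nuanced_analysis", "office_work"}:
        return "Claude Sonnet 4.6"
    # No hard requirement: price decides, and Gemini is 43% cheaper blended.
    return "Gemini 3.1 Pro"
```

Note the checks run in the order the text gives them, so a workload that needs both hard-science reasoning and office work resolves to Gemini.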
Bottom line
Claude Sonnet 4.6 is the right default choice for the vast majority of professional users. It beats the previous-generation flagship on office tasks, matches Opus on computer use, has a more current knowledge cutoff, and costs 40% less. The cases where you actually need Opus 4.6 are narrower than the marketing suggests: sustained multi-file reasoning over hours, hard science QA at scale, or maximum-output-token generation (128K). For everything else — writing, analysis, coding, agentic workflows, customer-facing products — Sonnet is the answer.
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 17, 2026