Anthropic
Claude Sonnet 4.5
Released September 29, 2025, Claude Sonnet 4.5 was the most consequential Anthropic release of 2025 — a mid-tier model that outperformed its own flagship Opus 4.1 on most tasks at one-fifth the price, set the high-water mark on SWE-bench Verified (77.2%), and proved that AI could sustain autonomous coding sessions for 30+ continuous hours. It has since been succeeded by Sonnet 4.6 (February 2026). If you're starting fresh, use Sonnet 4.6. But Sonnet 4.5 remains a proven, heavily safety-tested model at the same price — and for teams already running on it, there's no urgent reason to upgrade.
Context window
200K tokens
API (blended)
$6.00/1M
Consumer access
Free (limited) / $20/mo
Multimodal
Yes
Score Breakdown
58.1/100 → 5.8/10. Intelligence, Reliability, Speed, and Context scores are field-relative — they shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →
Strengths
- 77.2% SWE-bench Verified — best coding model in the world at launch (Sept 2025)
- 61.4% OSWorld — best computer-use score at release, a 45% jump over Sonnet 4
- 30+ hour autonomous coding sessions — 4× longer than the prior-generation Claude Opus 4
- 98% τ²-bench telecom (tool-use orchestration) — near-perfect agentic performance
- Heavily safety-tested: 148-page system card, ASL-3, 99.29% harmless response rate
- Same $3/$15 API pricing as Sonnet 4.6 — proven model, no upgrade cost
Weaknesses
- Succeeded by Sonnet 4.6 (Feb 2026) — most use cases should prefer 4.6
- Knowledge cutoff of July 2025 — six months behind Sonnet 4.6's January 2026 training data
- Hedges in 34% of code-review comments — more cautious than necessary on actionable feedback
- Visual reasoning (MMMU 77.8%) trails GPT-5.2 significantly
- UI/frontend work is still a known weak spot vs. Gemini models
Where Sonnet 4.5 Stood at Launch
At release in September 2025, Sonnet 4.5 was the best coding model in the world by most measures — and it beat its own company's flagship at a fraction of the cost.
Launch benchmark comparison (provider-reported, extended thinking)
| Benchmark | Sonnet 4.5 | GPT-5 (Oct 2025) | GPT-5 Codex | What it measures |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% (82% parallel) | 72.8% | 74.5% | Real GitHub bug-fixing |
| Terminal-Bench (thinking) | 61.3% | — | 58.8% | Autonomous CLI coding |
| OSWorld (computer use) | 61.4% | ~38% | Not reported | Desktop GUI navigation |
| Finance Agent v1.1 | 55.3% | 46.9% | — | Financial analysis tasks |
| GPQA Diamond (science) | 83.4% | 85.7% | — | Expert scientific reasoning |
These are provider-reported scores using extended thinking — higher than AA standard-mode measurements. They're useful for understanding Sonnet 4.5's position at launch relative to competitors at the time. For current apples-to-apples AA-measured data, see the Score Breakdown panel above.
It beat its own flagship
Sonnet 4.5 surpassed Claude Opus 4.1 — Anthropic's own flagship at the time — on nearly every metric while costing $3/$15 per 1M tokens vs. Opus 4.1's $15/$75. Anthropic CPO Mike Krieger described Sonnet 4.5 as smaller than Opus 4.1 but smarter 'in almost every single way.' This tier compression — where mid-range models outperform prior-generation flagships — has become the defining pattern of the Claude 4.x family.
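The pricing gap is easy to quantify. A quick sketch using the list prices above — the monthly token volumes are illustrative assumptions, not figures from this article:

```python
# Cost comparison: Sonnet 4.5 vs. Opus 4.1 at their per-1M-token list prices.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-opus-4.1": (15.00, 75.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a given token volume at the model's list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Illustrative workload: 100M input tokens, 20M output tokens per month.
sonnet = monthly_cost("claude-sonnet-4.5", 100_000_000, 20_000_000)
opus = monthly_cost("claude-opus-4.1", 100_000_000, 20_000_000)
print(f"Sonnet 4.5: ${sonnet:,.2f}  Opus 4.1: ${opus:,.2f}  ratio: {opus / sonnet:.0f}x")
# → Sonnet 4.5: $600.00  Opus 4.1: $3,000.00  ratio: 5x
```

Because both input and output prices scale by exactly 5×, the ratio holds regardless of the input/output mix.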
Coding & Agentic Performance
The two capabilities that defined Sonnet 4.5's reputation: SWE-bench leadership and 30-hour autonomous runs.
Coding benchmark performance
| Benchmark | Score | Context |
|---|---|---|
| SWE-bench Verified (standard) | 77.2% | Best in class at launch — #1 of any model |
| SWE-bench Verified (parallel compute) | 82.0% | With multi-sample test-time compute |
| Terminal-Bench (extended thinking) | 61.3% | First model to crack 60% |
| τ²-bench (retail) | 86.2% | Tool orchestration, retail domain |
| τ²-bench (telecom) | 98.0% | Near-perfect on multi-tool agentic tasks |
| Finance Agent v1.1 | 55.3% | #1 at launch, 8pp above GPT-5 |
SWE-bench Verified measures whether a model can resolve real, unsimplified GitHub issues. 77.2% means it resolved roughly 77 of every 100 real issues on the first attempt. It held the top spot for five months, until Sonnet 4.6 pushed higher.
30-hour autonomous agent — what that means in practice
| Metric | Sonnet 4.5 | Claude Opus 4 (previous) |
|---|---|---|
| Sustained autonomous task horizon | 30+ hours | ~7 hours |
| Improvement | 4× longer | — |
| Real-world demo | 11,000-line chat app (Slack-like) | — |
| Tasks completed autonomously | Write code, stand up DB, buy domain, SOC 2 audit | — |
Anthropic researcher David Hershey documented an enterprise trial where Sonnet 4.5 autonomously built a Slack-like chat application — 11,000 lines of code — including database setup, domain purchase, and compliance auditing. Human checkpoints were in place but not required for most steps.
Computer Use — OSWorld 61.4%
Sonnet 4.5 improved its predecessor's computer-use score by roughly 45% in just four months.
OSWorld progression — Anthropic Claude family
| Model | OSWorld score | Release |
|---|---|---|
| Claude Sonnet 4 (Sonnet 4.0) | 42.2% | May 2025 |
| Claude Sonnet 4.5 | 61.4% | Sept 2025 (+45% improvement) |
| Claude Sonnet 4.6 | 72.5% | Feb 2026 |
| Claude Opus 4.6 | 72.7% | Feb 2026 |
OSWorld measures a model's ability to operate desktop and browser UIs autonomously — clicking, typing, navigating applications, running terminal commands. The 45% jump from Sonnet 4 to Sonnet 4.5 in just four months reflects Anthropic's investment in computer-use training data during this period.
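The "45% jump" is a relative improvement, not percentage points — worth disambiguating, since the tables mix both conventions. The arithmetic, using the scores from the table above:

```python
# OSWorld scores from the progression table above.
sonnet_4, sonnet_45 = 42.2, 61.4

absolute_gain = sonnet_45 - sonnet_4               # in percentage points
relative_gain = (sonnet_45 - sonnet_4) / sonnet_4  # fractional improvement

print(f"{absolute_gain:.1f} points absolute, {relative_gain:.0%} relative")
# → 19.2 points absolute, 45% relative
```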
Where Sonnet 4.5 Falls Short
The model has consistent weaknesses that developers hit in production, and they were never fixed in 4.5 itself — Sonnet 4.6 addresses most of them.
Known limitations
| Limitation | Evidence | Fixed in 4.6? |
|---|---|---|
| Hedging in code review | 34% of actionable comments use 'might,' 'could,' 'possibly' — worse than Opus 4.1 (28%) | Improved |
| Visual reasoning | MMMU 77.8% — well behind GPT-5.2 (84.2%) | Partial |
| UI / frontend work | Multiple developers report 15-20% slower than Gemini on short UI fixes | Partial |
| Instruction literalness | Treats MUST/ALWAYS as contextual, not absolute — broke production prompts for some teams | Improved |
| Knowledge cutoff | July 2025 — events after are unreliable | Yes (4.6 has Jan 2026) |
CodeRabbit's 25-PR benchmark found Sonnet 4.5 catching 41% of important bugs vs. Opus 4.1's 50% — closer than expected, but the hedging behavior made its reviews less actionable. CodeRabbit's verdict: 'a thoughtful colleague where Opus is surgical.'
Safety — Most Thoroughly Tested Claude at Release
Sonnet 4.5 shipped with a 148-page system card — the most comprehensive of any Claude release at the time.
Safety evaluation results
| Metric | Sonnet 4.5 | vs. Sonnet 4 |
|---|---|---|
| Harmless response rate | 99.29% | Up from 98.22% |
| Over-refusal rate | 0.02% | Down from 0.15% |
| Shortcut behaviors | 65% reduction | vs. Sonnet 3.7 |
| Malicious agentic requests rejected | 98.7% (148/150) | Up from 89.3% |
| ASL classification | ASL-3 | ASL-3 |
| Third-party auditors | UK AI Safety Institute, US AISI, Apollo Research | — |
First Claude model evaluated using mechanistic interpretability — probing internal neural representations for alignment rather than relying solely on behavioral tests. Political bias dropped to 3.3% (1.3% with extended thinking).
Sonnet 4.5 vs Sonnet 4.6 — Should You Upgrade?
Both cost $3/$15 per 1M tokens. Here's what changed.
| Dimension | Sonnet 4.5 | Sonnet 4.6 | Upgrade worth it? |
|---|---|---|---|
| AA Intelligence Index | 37.14 | 44.38 | Yes — measurable gap |
| GDPval-AA (office tasks) | Not reported | 1,633 Elo | Yes |
| OSWorld (computer use) | 61.4% | 72.5% | Yes — significant jump |
| Knowledge cutoff | July 2025 | Aug 2025 (training data: Jan 2026) | Yes — training data 6 months newer |
| Adaptive thinking tiers | 4 tiers | 4 tiers | No change |
| Context window | 200K (1M beta) | 200K (1M beta) | No change |
| Max output tokens | 64K | 64K | No change |
| API price | $3/$15 | $3/$15 | Free upgrade |
If you're calling via API, switching from claude-sonnet-4-5-20250929 to claude-sonnet-4-6 is a zero-cost upgrade with meaningful capability improvements. The main reason to stay on 4.5 is if you've tuned prompts specifically for its behavior and don't want to re-evaluate outputs.
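In practice the swap is a one-line change to the model string. A minimal sketch assuming the official `anthropic` Python SDK — the helper function and the pinning flag are illustrative, not part of the SDK:

```python
# Swapping Sonnet 4.5 for 4.6 in a Messages API call: only the model
# string changes — pricing, context window, and max output are unchanged.
PINNED_45 = "claude-sonnet-4-5-20250929"
LATEST_46 = "claude-sonnet-4-6"

def request_kwargs(prompt: str, pin_to_45: bool = False) -> dict:
    """Illustrative helper: identical request body, swappable model ID."""
    return {
        "model": PINNED_45 if pin_to_45 else LATEST_46,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the SDK installed and ANTHROPIC_API_KEY set, the call would be:
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request_kwargs("Summarize this diff: ..."))
```

Keeping the dated 4.5 snapshot behind a flag like this makes it easy to A/B the two models on your own prompts before committing to the upgrade.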
What Sonnet 4.5 meant for the industry
Sonnet 4.5 arrived when Anthropic was valued at $183 billion (Series F). Claude Code reached $500M+ run-rate revenue in its first six months — largely on the back of Sonnet 4.5's coding capability. By the time Sonnet 4.6 shipped in February 2026, Anthropic had raised another $30B at a $380B valuation. The coding arms race that Sonnet 4.5 helped define — 30-hour autonomous sessions, SWE-bench leadership, agentic tool use — is now the primary competitive battleground for every major AI lab.
Bottom line
Claude Sonnet 4.5 set the standard for what mid-tier AI could do in 2025 — and it holds up well. For new projects, use Sonnet 4.6: it's meaningfully better on intelligence, computer use, and knowledge recency at the same price. For existing systems built on Sonnet 4.5, the prompts and behavior you've tuned are stable — upgrade on your own timeline. The model's core strengths (coding, agentic persistence, τ²-bench tool use) remain competitive even against 2026 alternatives.
Pricing details
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 27, 2026