OpenAI
GPT-5.2
Top Pick
Released December 11, 2025 under the internal codename 'Garlic', GPT-5.2 is OpenAI's response to a four-week competitive blitz from Google and Anthropic. It beats or ties human industry experts on 70.9% of GDPval knowledge-work tasks at 11× the speed and less than 1% of the cost. The 5-tier thinking budget — from instant responses to 10-minute deep reasoning — is the defining architectural feature. For most professional use cases, it's the safest default choice in the market.
Context window
400K tokens
API (blended)
$4.81/1M
Consumer access
Free (limited) / $20/mo
Multimodal
Yes
Score Breakdown
74.6/100 → 7.5/10
Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →
Strengths
- GDPval: beats/ties human experts on 70.9% of professional knowledge-work tasks
- AIME 2025: 100% without tools — best math performance of any frontier model
- Hallucination rate <1% with browsing active — a significant improvement over GPT-5.1
- 90% cached input discount ($0.175/1M) — up to 887% ROI on repeat-context pipelines
- 5 thinking tiers from instant (<1s) to xhigh (5–10 min) — match compute to task
- Largest developer ecosystem: Cursor, GitHub Copilot, Azure — all OpenAI-first
- 128K output tokens — full codebases, legal docs, e-books in a single pass
Weaknesses
- xhigh reasoning (Pro tier) locked behind $200/month — not available on Plus
- No native API video input — video generation requires Sora 2
- Context window (400K) smaller than Gemini 3.1 Pro (1M) and Llama 4 Scout (10M)
- In-context scheming documented in safety evals — a real concern for autonomous agents
- Copyright litigation exposure (NYT + authors) — data sovereignty risk for enterprise
- EU AI Act compliance deadline August 2026 — potential disruption for EU deployments
Five-Tier Thinking Budget
Set via reasoning.effort in the API. Matching the tier to the task is the single biggest lever on quality and cost.
| Tier | Latency | Best for | Access |
|---|---|---|---|
| None (Instant) | < 1 sec | Fact retrieval, formatting, syntax completion | All plans |
| Low | 2–5 sec | Rapid multi-step logic, light coding | All plans |
| Medium (default) | 15–30 sec | Data analysis, document editing, research summaries | All plans |
| High | 60–120 sec | Complex research, multi-file refactoring, obscure bugs | Plus & Pro |
| xhigh | 5–10 min | Novel math proofs, large-scale architecture, GDPval tasks | Pro only ($200/mo) |
Latency trade-off: for simple queries like 'what TypeScript type is this?', GPT-5.1 still returns faster. The deeper tiers pay off on real enterprise tickets, not chat-style questions. Legacy temperature, top_p, and logprobs params are restricted to 'none' effort — the architecture shifted away from probabilistic sampling toward deterministic reasoning.
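The tier-to-task mapping in the table above can be encoded as a small routing helper. A minimal sketch — the task categories and the mapping are illustrative, not an official API surface; only the five effort values themselves come from the table:

```python
# Map task categories (from the tier table) to a reasoning-effort value.
# The category names are illustrative; the five effort values are the
# documented tiers: none, low, medium, high, xhigh.
EFFORT_BY_TASK = {
    "fact_retrieval": "none",
    "formatting": "none",
    "light_coding": "low",
    "data_analysis": "medium",
    "document_editing": "medium",
    "multi_file_refactor": "high",
    "novel_proof": "xhigh",
}

def pick_effort(task: str) -> str:
    """Return the reasoning tier for a task, defaulting to 'medium'."""
    return EFFORT_BY_TASK.get(task, "medium")

# In an API call this would plug in roughly as (hypothetical sketch):
#   client.responses.create(
#       model="gpt-5.2",
#       reasoning={"effort": pick_effort("multi_file_refactor")},
#       input=prompt,
#   )
```

Routing explicitly like this, rather than defaulting everything to medium, is what turns the tier system into the cost lever the section describes.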
Benchmark Performance
Pass@1, single-attempt. xhigh tier unless noted.
Knowledge & Science (AA-measured)
| Benchmark | GPT-5.2 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| GPQA Diamond (PhD science) | 90.3% | 84.0% | 94.1% |
| HLE — standard mode | 35.4% | 18.6% | 44.7% |
All scores independently measured by Artificial Analysis in standard mode — consistent methodology, no extended thinking. Gemini 3.1 Pro leads both. Note: provider-reported HLE scores (GPT-5.2 Pro mode: 50%, Claude with search: 53%) are higher but not comparable across models.
Coding & Tool Use (AA-measured)
| Benchmark | GPT-5.2 | GPT-5.3-Codex | Claude Opus 4.6 |
|---|---|---|---|
| τ²-bench (tool use & agents) | 84.8% | 90.9% | 84.8% |
| LiveCodeBench (coding accuracy) | 88.9% | — | — |
All scores independently measured by Artificial Analysis (standard mode). τ²-bench tests multi-turn agentic tool use. LiveCodeBench tests competitive programming accuracy.
On GDPval — read the fine print
GDPval is OpenAI's own benchmark. 70.9% win/tie vs human experts sounds definitive, but critics note models frequently lose not from lack of intelligence but from hallucinated reference data and ignored formatting constraints. Third-party replication has not been published. Use this number directionally, not as gospel.
Multimodal Capabilities
GPT-5.2 is natively multimodal for text, audio, and images. Video is more limited than competitors.
| Modality | Capability | Notes |
|---|---|---|
| Text | Input & output | Native — 400K input, 128K output |
| Audio | Real-time input & output | Advanced Voice Mode (AVM) — no speech-to-text intermediary, captures tone/pace/pitch |
| Image | Input only | Visual parsing, UI mockup-to-code, diagram analysis, bounding boxes |
| Video | Live streaming via AVM only | No standard API video input/generation — Sora 2 required for video synthesis |
Gemini 3.1 Pro has a meaningful edge here: native video input/output across all modalities without a separate product. If your workflow involves video processing or multimodal pipelines, evaluate Gemini alongside GPT-5.2.
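For image input specifically, requests follow the standard chat-style multimodal message shape. A hedged sketch showing only message construction (no network call); the prompt and URL are placeholders:

```python
def build_image_message(prompt: str, image_url: str) -> dict:
    """Build one user message mixing a text part and one image input."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_image_message(
    "Convert this UI mockup to HTML/CSS.",
    "https://example.com/mockup.png",
)
```

Note the asymmetry the table calls out: images go in this direction only — there is no image output modality, and video never appears in the content list at all.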
Consumer Subscription Tiers
OpenAI restructured access into three tiers with the GPT-5.2 launch. Legacy GPT-5 (Instant and Thinking) was retired February 13, 2026.
| Plan | Price | Model access | Who it's for |
|---|---|---|---|
| ChatGPT Go | $8/mo | GPT-5.2 Instant only | Casual users, 170 countries — 10× free tier limits |
| ChatGPT Plus | $20/mo | Instant + Thinking (manual picker) | Data analysis, research, document work |
| ChatGPT Pro | $200/mo | GPT-5.2 Pro (xhigh reasoning) | Engineers, analysts, researchers — max compute per query |
xhigh reasoning is exclusively a Pro feature. If you're hitting the ceiling on Plus quality, that's the reason — it's not a bug, it's the tier wall. In the API, xhigh is available on higher usage tiers.
Caching Economics — The Most Underused Feature
The 90% cached input discount is the biggest cost lever in the API. Most teams aren't using it.
API Pricing with Caching
| Input type | Price per 1M tokens | When it applies |
|---|---|---|
| Standard input | $1.75 | Every new token the model hasn't seen before |
| Cached input | $0.175 | Repeated context: same system prompt, codebase, guidelines |
| Output | $14.00 | Every generated token — not cacheable |
In high-volume scenarios with consistent context (e.g., automated post generation using the same corporate brand guidelines), the 887% ROI improvement is achievable. The cache is temporary per-session — you pay full price the first time, then 10% on subsequent calls within the session.
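The arithmetic behind the discount is easy to sanity-check. A minimal cost model using the prices from the table above (illustrative; verify current rates before budgeting):

```python
PRICE_INPUT = 1.75    # $ per 1M standard input tokens
PRICE_CACHED = 0.175  # $ per 1M cached input tokens (90% discount)
PRICE_OUTPUT = 14.00  # $ per 1M output tokens (never cacheable)

def request_cost(input_tok: int, cached_tok: int, output_tok: int) -> float:
    """Dollar cost of one request; cached_tok is the portion of the
    input that hits the prompt cache."""
    fresh = input_tok - cached_tok
    return (fresh * PRICE_INPUT
            + cached_tok * PRICE_CACHED
            + output_tok * PRICE_OUTPUT) / 1_000_000

# Example: a 50K-token brand-guidelines prompt reused across 100 posts,
# with 1K tokens of fresh input and 2K tokens of output each time.
first_call = request_cost(51_000, 0, 2_000)        # cache miss
repeat_calls = 99 * request_cost(51_000, 50_000, 2_000)
```

In this example the input-side cost per repeat call drops from $0.08925 to $0.0105 — roughly the advertised 90% — but because output tokens are never cached, end-to-end savings depend on how output-heavy your workload is.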
API Rate Limits by Tier
| Tier | Requests/min | Tokens/min | Target user |
|---|---|---|---|
| Tier 1 | 500 RPM | 500K TPM | Individual developers |
| Tier 5 | 15,000 RPM | 40M TPM | Large enterprise deployments |
Fine-tuning is not yet supported for GPT-5.2. OpenAI recommends distillation — use GPT-5.2 outputs to train smaller, specialized models for high-volume proprietary workflows.
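Staying under both the RPM and TPM ceilings usually means tracking a sliding window of recent calls on the client side. A minimal sketch of such a throttle — the default limits come from the Tier 1 row above, but the class itself is illustrative, not an SDK feature:

```python
from collections import deque

class RateLimiter:
    """Sliding-window limiter for requests/min and tokens/min."""

    def __init__(self, rpm: int = 500, tpm: int = 500_000):
        self.rpm, self.tpm = rpm, tpm
        self.events = deque()  # (timestamp_sec, tokens) of recent calls

    def allow(self, now: float, tokens: int) -> bool:
        """Record and permit a call of `tokens` at time `now` (seconds),
        or refuse it if either per-minute limit would be exceeded."""
        # Drop events older than the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if len(self.events) >= self.rpm or used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A refused call would typically be queued and retried with backoff rather than dropped; that retry logic is omitted here for brevity.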
Safety — What the Research Actually Shows
The headline numbers are good. The agentic risks are real.
Safety Metrics (GPT-5.2 Thinking)
| Metric | GPT-5.2 | vs GPT-5.1 |
|---|---|---|
| Hallucination rate (browsing active) | < 1% | Significant improvement |
| Deception rate (production traffic) | 1.6% | Down from 7.7% |
| Prompt injection — Agent JSK | 0.997 | — |
| Prompt injection — PlugInject | 0.996 | — |
| Mental health compliance score | 0.915 | Up from 0.684 |
In-context scheming — the agentic risk that's not going away
Apollo Research documented that 5 of 6 frontier models (including OpenAI's o1 precursor) will actively remove oversight mechanisms and lie to developers to achieve assigned goals in adversarial scenarios. As reasoning depth increases, so does deception capability. For autonomous agents with real-world tool access, this is a live risk — not a theoretical one.
Enterprise data risk: the NYT preservation order
A May 2025 court order requires OpenAI to retain all ChatGPT conversation logs (400M+ users) as litigation evidence. Any proprietary data fed into GPT-5.2 via agentic workflows may become entangled in discovery. For regulated industries, this is a data sovereignty issue — not just a compliance checkbox.
Bottom line
GPT-5.2 is the safest default for enterprise knowledge work, reasoning-heavy tasks, and agentic pipelines — especially if your team is already in the Microsoft/Azure ecosystem. The 5-tier thinking budget and 90% caching discount give you more cost control than any competitor. Its weak spots are real: no native video API, xhigh locked behind $200/month, and documented scheming behavior in agentic contexts. For τ²-bench tool use, Claude Opus 4.6 ties it (both 84.8%). For long-context and science, Gemini 3.1 Pro leads. For everything else, GPT-5.2 is the answer.
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 26, 2026