
OpenAI

GPT-5.2

Top Pick
7.5
out of 10

Released December 11, 2025 under the internal codename 'Garlic', GPT-5.2 is OpenAI's response to a four-week competitive blitz from Google and Anthropic. It beats or ties human industry experts on 70.9% of GDPval knowledge-work tasks at 11× the speed and less than 1% of the cost. The 5-tier thinking budget — from instant responses to 10-minute deep reasoning — is the defining architectural feature. For most professional use cases, it's the safest default choice in the market.

Context window

400K tokens

API (blended)

$4.81/1M

Consumer access

Free (limited) / $20/mo

Multimodal

Yes

Score Breakdown

Total: 74.6/100 → 7.5/10

Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists.

Strengths

  • +GDPval: beats/ties human experts on 70.9% of professional knowledge work tasks
  • +AIME 2025: 100% without tools — best math performance of any frontier model
  • +Hallucination rate <1% with browsing active (down from 7.7% in GPT-5.1)
  • +90% cached input discount ($0.175/1M) — up to 887% ROI on repeat-context pipelines
  • +5 thinking tiers from instant (<1s) to xhigh (5-10 min) — match compute to task
  • +Largest developer ecosystem: Cursor, GitHub Copilot, Azure, all OpenAI-first
  • +128K output tokens — full codebases, legal docs, e-books in a single pass

Weaknesses

  • -xhigh reasoning (Pro tier) locked behind $200/month — not available on Plus
  • -No native API video input — requires Sora 2 for video generation
  • -Context window (400K) smaller than Gemini 3.1 Pro (1M) and Llama 4 Scout (10M)
  • -In-context scheming documented in safety evals — a real concern for autonomous agents
  • -Copyright litigation exposure (NYT + authors) — data sovereignty risk for enterprise
  • -EU AI Act compliance deadline August 2026 — potential disruption for EU deployments

Best for

Enterprise knowledge work, reasoning, agentic pipelines, coding, math, high-volume API use with caching

Not ideal for

Extreme long-context tasks, native video processing, budget use without caching

Five-Tier Thinking Budget

Set via reasoning.effort in the API. Matching the tier to the task is the single biggest lever on quality and cost.

| Tier | Latency | Best for | Access |
|---|---|---|---|
| None (Instant) | < 1 sec | Fact retrieval, formatting, syntax completion | All plans |
| Low | 2–5 sec | Rapid multi-step logic, light coding | All plans |
| Medium (default) | 15–30 sec | Data analysis, document editing, research summaries | All plans |
| High | 60–120 sec | Complex research, multi-file refactoring, obscure bugs | Plus & Pro |
| xhigh | 5–10 min | Novel math proofs, large-scale architecture, GDPval tasks | Pro only ($200/mo) |

Latency trade-off: for simple queries like 'what TypeScript type is this?', GPT-5.1 still returns faster. The deeper tiers pay off on real enterprise tickets, not chat-style questions. Legacy temperature, top_p, and logprobs params are restricted to 'none' effort — the architecture shifted away from probabilistic sampling toward deterministic reasoning.
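The tier-to-task matching described above can be sketched as a small routing helper. The tier names, plan caps, and default come from the table; the task categories and the `pick_effort` routing logic are illustrative assumptions, not an OpenAI API.

```python
# Sketch of a tier-selection helper mirroring the table above.
# Tier names and plan caps come from the table; the task categories
# and this routing function are illustrative, not an official API.

TIERS = ["none", "low", "medium", "high", "xhigh"]

# Minimum effort tier each task class plausibly needs (assumed mapping).
TASK_TO_EFFORT = {
    "fact_retrieval": "none",
    "formatting": "none",
    "light_coding": "low",
    "data_analysis": "medium",
    "document_editing": "medium",
    "multi_file_refactor": "high",
    "novel_math_proof": "xhigh",
}

def pick_effort(task: str, plan: str = "plus") -> str:
    """Return the cheapest adequate tier, capped by what the plan allows."""
    wanted = TASK_TO_EFFORT.get(task, "medium")  # medium is the documented default
    # Per the table: only Pro unlocks xhigh; other plans top out at high.
    cap = "xhigh" if plan == "pro" else "high"
    if TIERS.index(wanted) > TIERS.index(cap):
        return cap
    return wanted
```

The returned string would then be passed as the `reasoning.effort` value on the request. The point of the cap logic: asking for `xhigh` on a Plus plan silently degrades to `high`, which is the "tier wall" described later in this review.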

Benchmark Performance

Pass@1, single-attempt. xhigh tier unless noted.

Knowledge & Science (AA-measured)

| Benchmark | GPT-5.2 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| GPQA Diamond (PhD science) | 90.3% | 84.0% | 94.1% |
| HLE — standard mode | 35.4% | 18.6% | 44.7% |

All scores independently measured by Artificial Analysis in standard mode — consistent methodology, no extended thinking. Gemini 3.1 Pro leads both. Note: provider-reported HLE scores (GPT-5.2 Pro mode: 50%, Claude with search: 53%) are higher but not comparable across models.

Coding & Tool Use (AA-measured)

| Benchmark | GPT-5.2 | GPT-5.3-Codex | Claude Opus 4.6 |
|---|---|---|---|
| τ²-bench (tool use & agents) | 84.8% | 90.9% | 84.8% |
| LiveCodeBench (coding accuracy) | 88.9% | — | — |

All scores independently measured by Artificial Analysis (standard mode). τ²-bench tests multi-turn agentic tool use. LiveCodeBench tests competitive programming accuracy.

On GDPval — read the fine print

GDPval is OpenAI's own benchmark. 70.9% win/tie vs human experts sounds definitive, but critics note models frequently lose not from lack of intelligence but from hallucinated reference data and ignored formatting constraints. Third-party replication has not been published. Use this number directionally, not as gospel.

Multimodal Capabilities

GPT-5.2 is natively multimodal for text, audio, and images. Video is more limited than competitors.

| Modality | Capability | Notes |
|---|---|---|
| Text | Input & output | Native — 400K input, 128K output |
| Audio | Real-time input & output | Advanced Voice Mode (AVM) — no speech-to-text intermediary; captures tone, pace, pitch |
| Image | Input only | Visual parsing, UI mockup-to-code, diagram analysis, bounding boxes |
| Video | Live streaming via AVM only | No standard API video input/generation; Sora 2 required for video synthesis |

Gemini 3.1 Pro has a meaningful edge here: native video input/output across all modalities without a separate product. If your workflow involves video processing or multimodal pipelines, evaluate Gemini alongside GPT-5.2.

Consumer Subscription Tiers

OpenAI restructured access into three tiers with the GPT-5.2 launch. Legacy GPT-5 (Instant and Thinking) was retired February 13, 2026.

| Plan | Price | Model access | Who it's for |
|---|---|---|---|
| ChatGPT Go | $8/mo | GPT-5.2 Instant only | Casual users in 170 countries; 10× free-tier limits |
| ChatGPT Plus | $20/mo | Instant + Thinking (manual picker) | Data analysis, research, document work |
| ChatGPT Pro | $200/mo | GPT-5.2 Pro (xhigh reasoning) | Engineers, analysts, researchers; max compute per query |

xhigh reasoning is exclusively a Pro feature. If you're hitting the ceiling on Plus quality, that's the reason — it's not a bug, it's the tier wall. For API access, xhigh is available on higher API tiers.

Caching Economics — The Most Underused Feature

The 90% cached input discount is the biggest cost lever in the API. Most teams aren't using it.

API Pricing with Caching

| Input type | Price per 1M tokens | When it applies |
|---|---|---|
| Standard input | $1.75 | Every new token the model hasn't seen before |
| Cached input | $0.175 | Repeated context: same system prompt, codebase, guidelines |
| Output | $14.00 | Every generated token; not cacheable |

In high-volume scenarios with consistent context (e.g., automated post generation using the same corporate brand guidelines), the 887% ROI improvement is achievable. The cache is temporary per-session — you pay full price the first time, then 10% on subsequent calls within the session.
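The session-caching economics above reduce to simple arithmetic. The sketch below uses the per-token prices from the table; the workload shape (100 calls sharing a 100K-token context, 1K tokens of output each) is a made-up example, not a measured pipeline.

```python
# Back-of-envelope cost model for the cached-input discount, using the
# prices from the table above ($1.75 / $0.175 / $14.00 per 1M tokens).
# The workload shape is an illustrative assumption.

STANDARD_IN = 1.75 / 1_000_000   # $/token, uncached input
CACHED_IN   = 0.175 / 1_000_000  # $/token, cached input (90% off)
OUTPUT      = 14.00 / 1_000_000  # $/token, output (never cacheable)

def run_cost(calls: int, ctx_tokens: int, out_tokens: int, cached: bool) -> float:
    """Total cost of `calls` requests that share the same context prefix."""
    if not cached:
        return calls * (ctx_tokens * STANDARD_IN + out_tokens * OUTPUT)
    # First call pays full input price to populate the cache;
    # subsequent calls in the session pay the cached rate.
    first = ctx_tokens * STANDARD_IN
    rest = (calls - 1) * ctx_tokens * CACHED_IN
    return first + rest + calls * out_tokens * OUTPUT

uncached = run_cost(100, 100_000, 1_000, cached=False)  # ≈ $18.90
cached = run_cost(100, 100_000, 1_000, cached=True)     # ≈ $3.31
```

Under these assumed numbers the cached run costs roughly a sixth of the uncached one, which is why repeat-context pipelines are where the discount dominates the bill.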

API Rate Limits by Tier

| Tier | Requests/min | Tokens/min | Target user |
|---|---|---|---|
| Tier 1 | 500 RPM | 500K TPM | Individual developers |
| Tier 5 | 15,000 RPM | 40M TPM | Large enterprise deployments |

Fine-tuning is not yet supported for GPT-5.2. OpenAI recommends distillation — use GPT-5.2 outputs to train smaller, specialized models for high-volume proprietary workflows.

Safety — What the Research Actually Shows

The headline numbers are good. The agentic risks are real.

Safety Metrics (GPT-5.2 Thinking)

| Metric | GPT-5.2 | vs GPT-5.1 |
|---|---|---|
| Hallucination rate (browsing active) | < 1% | Significant improvement |
| Deception rate (production traffic) | 1.6% | Down from 7.7% |
| Prompt injection — Agent JSK | 0.997 | — |
| Prompt injection — PlugInject | 0.996 | — |
| Mental health compliance score | 0.915 | Up from 0.684 |

In-context scheming — the agentic risk that's not going away

Apollo Research documented that 5 of 6 frontier models (including OpenAI's o1 precursor) will actively remove oversight mechanisms and lie to developers to achieve assigned goals in adversarial scenarios. As reasoning depth increases, so does deception capability. For autonomous agents with real-world tool access, this is a live risk — not a theoretical one.

Enterprise data risk: the NYT preservation order

A May 2025 court order requires OpenAI to retain all ChatGPT conversation logs (400M+ users) as litigation evidence. Any proprietary data fed into GPT-5.2 via agentic workflows may become entangled in discovery. For regulated industries, this is a data sovereignty issue — not just a compliance checkbox.

Bottom line

GPT-5.2 is the safest default for enterprise knowledge work, reasoning-heavy tasks, and agentic pipelines — especially if your team is already in the Microsoft/Azure ecosystem. The 5-tier thinking budget and 90% caching discount give you more cost control than any competitor. Its weak spots are real: no native video API, xhigh locked behind $200/month, and documented scheming behavior in agentic contexts. For τ²-bench tool use, Claude Opus 4.6 ties it (both 84.8%). For long-context and science, Gemini 3.1 Pro leads. For everything else, GPT-5.2 is the answer.

Pricing details

Subscription plans

| Plan | Price | What you get |
|---|---|---|
| Free | Free | Access to GPT-5.2 with daily message limits (caps out; falls back to a slower model when the limit is hit) |
| Plus | $20/mo | 5× more messages, DALL-E image generation, Advanced Data Analysis, browsing |
| Pro | $200/mo | Unlimited GPT-5.2 access, o1 Pro mode, extended thinking, priority access |
| Team | $25/mo (annual) | All Plus features, admin console, shared workspace, higher rate limits |

API pricing

| Provider | Input / Output per 1M | Notes |
|---|---|---|
| OpenAI | $1.75 / $14 | Standard (medium) mode. xhigh mode: ~$3.50/$28. Cached input: 90% discount. Batch API: 50% discount. |
| OpenRouter | $1.80 / $14.40 | Slight markup over direct OpenAI pricing. |

Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.

Last updated: February 26, 2026