OpenAI
GPT-5.3-Codex
GPT-5.3-Codex is the best autonomous coding agent available right now — faster than its predecessor, more token-efficient than Claude Code, and now baked into GitHub Copilot. The catch: no API yet (delayed by an unprecedented 'High cybersecurity capability' classification), and it's an async delegation tool, not a pair programmer. If you want to fire off tasks and check back later, this is it.
Context window
400K tokens
API (blended)
$4.81/1M
Consumer access
$20/mo
Multimodal
Text only
Score Breakdown
71.7/100 → 7.2/10. Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →
Strengths
- τ²-bench: 90.9% (AA-measured) — second only to Gemini 3.1 Pro; best OpenAI model for agentic tool use
- GPQA Diamond: 91.5% and HLE: 39.9% (AA-measured) — strong science and reasoning for a coding-focused model
- ~3× more token-efficient than Claude Code (72K vs 235K tokens on equivalent TypeScript tasks)
- 25% faster than GPT-5.2-Codex — meaningful for long-horizon agentic runs
- First model 'instrumental in its own creation' — used to debug, deploy, and evaluate itself
- GitHub Copilot integration (Feb 9, 2026) — reaches millions of developers in existing workflows
- Supports parallel task delegation — run 7+ simultaneous Codex instances without context loss
- Low, medium, high, and xhigh reasoning-effort settings — tune cost vs. depth per task
- 40–60% chance of merge-ready code on minor tasks with no intervention
Weaknesses
- API not yet available — delayed by 'High capability' cybersecurity classification under the Preparedness Framework
- First OpenAI model to acknowledge possible cyberattack-enablement capability — Apollo Research sabotage score: 0.88/1.00
- Claude Code leads on complex refactors: ~23% fewer runtime errors in large TypeScript codebases
- Codex UI designed for task delegation, not pair programming — 'cumbersome' for interactive back-and-forth (Proser review)
- Desktop app is Mac-only at launch
- GPT-5.3-Codex-Spark (fastest variant, on Cerebras hardware) is a research preview only — Pro users only, 128K context, text-only
- Context window estimated at 400K — smaller than Gemini 3.1 Pro's 1M for very large repository analysis
- California SB 53 alleged violation filed — regulatory status unresolved as of Feb 2026
This is not a chatbot — it's an autonomous coding agent
GPT-5.3-Codex powers the Codex product (chatgpt.com/codex). You delegate a task, it runs autonomously in a cloud sandbox pre-loaded with your repo — writing code, running tests, fixing bugs, opening PRs — for hours or even days. One developer ran it for 25 hours uninterrupted, generating ~30,000 lines of code across ~13 million tokens. You steer it mid-task but you're managing it, not typing alongside it.
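The scale of that 25-hour run is easier to grasp as a rate. A quick back-of-the-envelope using the approximate figures above:

```python
# Rough throughput from the reported 25-hour uninterrupted run.
# All three inputs are the approximate figures cited above.
hours = 25
tokens = 13_000_000   # ~13M tokens processed
lines = 30_000        # ~30,000 lines of code generated

print(f"~{tokens // hours:,} tokens/hour")  # ~520,000 tokens/hour
print(f"~{lines // hours:,} lines/hour")    # ~1,200 lines/hour
```

Roughly half a million tokens and over a thousand lines of code per hour, sustained — a pace no interactive pair-programming session approaches.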
Benchmark performance (AA-measured)
| Benchmark | GPT-5.3-Codex | Claude Opus 4.6 | Gemini 3.1 Pro | GPT-5.2 |
|---|---|---|---|---|
| GPQA Diamond (PhD science) | 91.5% | 84.0% | 94.1% | 90.3% |
| HLE — standard mode | 39.9% | 18.6% | 44.7% | 35.4% |
| τ²-bench (tool use & agents) | 90.9% | 84.8% | 95.6% | 84.8% |
All scores independently measured by Artificial Analysis in standard mode — consistent methodology across all models.
τ²-bench: 90.9% — second only to Gemini 3.1 Pro
On Artificial Analysis's τ²-bench (multi-turn agentic tool use), GPT-5.3-Codex scores 90.9% — ahead of Claude Opus 4.6 (84.8%) and GPT-5.2 (84.8%), trailing only Gemini 3.1 Pro (95.6%). For real-world autonomous coding pipelines that rely on tool calling and multi-step execution, this is the meaningful number.
Token efficiency vs Claude Code
| Task | GPT-5.3-Codex tokens | Claude Code tokens | Codex advantage |
|---|---|---|---|
| TypeScript feature implementation | 72,579 | 234,772 | ~3.2× fewer |
| Figma-to-code conversion | ~1.5M | ~6.2M | ~4.1× fewer |
Token usage translates directly to cost. On API-equivalent tasks, Codex's efficiency advantage is substantial — particularly on longer jobs.
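To see what the table means in dollars, here is a minimal sketch applying this page's blended rate of $4.81/1M tokens to both token counts. This is a simplification — Claude Code bills at Anthropic's own rates, so only the token ratio carries over directly:

```python
BLENDED_RATE = 4.81  # $/1M tokens, blended rate listed on this page

def task_cost(tokens: int, rate_per_million: float = BLENDED_RATE) -> float:
    """Cost of a task in dollars at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million

# TypeScript feature-implementation task from the table above.
codex_tokens, claude_tokens = 72_579, 234_772
print(f"Codex: ${task_cost(codex_tokens):.2f}")
print(f"Same task at Claude Code's token count: ${task_cost(claude_tokens):.2f}")
print(f"Token ratio: {claude_tokens / codex_tokens:.1f}x")  # ~3.2x
```

At any fixed per-token rate, a ~3.2× token gap is a ~3.2× cost gap, which compounds over long agentic runs.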
Codex vs Claude Code: which one to use
| Dimension | GPT-5.3-Codex | Claude Code (Opus 4.6) |
|---|---|---|
| Interaction model | Async delegation — fire & check back | Sync pair programming — stay in loop |
| Token efficiency | ✓ ~3× fewer tokens | Higher token usage |
| Speed advantage | ✓ 40% faster on simple tasks | Slower |
| Complex refactors | More errors reported | ✓ ~23% fewer runtime errors |
| τ²-bench (AA) | ✓ 90.9% | 84.8% |
| GPQA Diamond (AA) | 91.5% | 84.0% |
| Context window | ~400K (est) | ✓ 200K standard / 1M beta |
| API availability | Not yet (Q1 2026 est) | ✓ Available now |
| Ecosystem | ✓ GitHub Copilot, VS Code, ChatGPT | Claude.ai, Cursor, IDE extensions |
These are complementary tools, not direct substitutes. Expert consensus is to use both.
The Codex model family (modern product — not the 2021 model)
| Model | Release | Key milestone |
|---|---|---|
| codex-1 (based on o3) | May 16, 2025 | First agentic Codex; research preview |
| GPT-5-Codex | ~Sep 2025 | First GPT-5 variant for agentic coding |
| GPT-5.1-Codex | ~Nov 2025 | Incremental improvement |
| GPT-5.1-Codex-Max | ~Dec 2025 | Long-horizon variant |
| GPT-5.2-Codex | Jan 14, 2026 | Context compaction, Windows support, cybersecurity features |
| GPT-5.3-Codex | Feb 5, 2026 | Current flagship — combines Codex + GPT-5.2 training stacks; 25% faster |
| GPT-5.3-Codex-Spark | Feb 12, 2026 | 1,000+ t/s on Cerebras; 128K context; text-only; Pro preview only |
The modern Codex product is unrelated to OpenAI's deprecated GPT-3-based Codex of 2021–2023. They share a name only.
⚠️ First OpenAI model classified 'High capability' in cybersecurity
OpenAI's System Card states this is the first model treated as High capability under their Preparedness Framework for Cybersecurity. Apollo Research found a mean best-of-10 sabotage score of 0.88/1.00 (vs 0.75 for GPT-5.2). OpenAI doesn't claim it can fully automate cyberattacks but "cannot rule out the possibility." This is why the API is delayed. A California SB 53 violation was alleged by a watchdog organization — OpenAI disputes the interpretation.
Access options
| Access path | Entry price | What you get |
|---|---|---|
| ChatGPT Plus | $20/mo | Codex Web + CLI + VS Code; standard usage limits |
| ChatGPT Pro | $200/mo | Higher limits + Codex-Spark research preview (Cerebras) |
| GitHub Copilot | ~$10/mo | Codex in github.com, Mobile, VS, VS Code (from Feb 9, 2026) |
| ChatGPT Team | $30/user/mo | Shared workspace, admin controls |
| Enterprise / Edu | Custom | SOC 2, HIPAA, zero data retention |
| API | Not yet available | GPT-5.2-Codex API ($1.75/$14 per 1M) is current alternative |
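Since the GPT-5.2-Codex API is the current stand-in, its split rates from the table ($1.75 input / $14.00 output per 1M tokens) are what a production budget would use today. A hedged sketch of the per-call arithmetic — the example token counts are hypothetical, and rates should be re-verified before budgeting:

```python
# Per-call cost at the GPT-5.2-Codex API rates listed above:
# $1.75 per 1M input tokens, $14.00 per 1M output tokens.
IN_RATE, OUT_RATE = 1.75, 14.00  # $/1M tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call given separate input/output token counts."""
    return input_tokens / 1_000_000 * IN_RATE + output_tokens / 1_000_000 * OUT_RATE

# Hypothetical agentic run: 200K tokens of repo context in, 50K tokens out.
print(f"${api_cost(200_000, 50_000):.2f}")  # $1.05
```

Note how output tokens dominate: at an 8× output premium, a verbose model costs far more than its input size suggests.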
Enterprise adoption (named customers)
| Company | Use case | Reported result |
|---|---|---|
| Cisco | Accelerating engineering teams | Named in OpenAI enterprise report |
| Virgin Atlantic | Development productivity | "Markedly increased productivity" |
| Temporal | Feature dev, debugging, refactoring | Named customer |
| Kodiak | Debugging tools for autonomous driving | Named customer |
| GitHub Copilot users (all) | IDE coding assistance | 1M+ developers on Codex product |
Bottom line
GPT-5.3-Codex is the best tool for autonomous, parallelizable coding tasks — τ²-bench at 90.9% (AA-measured, second only to Gemini 3.1 Pro), ~3× token efficiency over Claude Code, and native GitHub integration put it ahead for async workflows. But it doesn't replace Claude Code for complex interactive refactoring, and the API delay blocks it from production pipelines today. If you're on ChatGPT Plus or GitHub Copilot, it's worth adding to your workflow now for ticket-level delegation. If you need API access or tight reasoning control, wait or use GPT-5.2-Codex.
Pricing details
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 26, 2026