
Google

Gemini 3 Pro

7.8
out of 10

Released November 18, 2025, Gemini 3 Pro was the first model to break 1,500 Elo on LMArena and led 13 of 16 major benchmarks at launch. Three months later, Google shipped Gemini 3.1 Pro at the same price — better reasoning across the board — and scheduled Gemini 3 Pro for deprecation on March 9, 2026. If you're starting fresh, use 3.1 Pro. For existing deployments, the migration is a model string swap. The model is still capable: 138 t/s output, a real 1M-token context window, and native multimodal inputs including up to 9.5 hours of audio and an hour of video per call. Just know what you're working with.

Context window

1.0M tokens

API (blended)

$4.50/1M

Consumer access

Free (limited) / $20/mo

Multimodal

Yes

Score Breakdown

Total: 77.6/100 → 7.8/10

Intelligence, Reliability, Speed, and Context are field-relative — scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →

Strengths

  • 1M token context — handles book-length documents, entire codebases, and multi-hour transcripts natively
  • GPQA Diamond 90.8% and HLE 37.2% (AA-measured) — top-tier scientific reasoning
  • τ²-bench 87.1% and LiveCodeBench 91.7% (AA-measured) — excellent for coding and agentic tool use
  • AA Intelligence Index 48 — second only to Gemini 3.1 Pro in the Gemini family
  • Natively multimodal: text, image, audio, video input in a single API call
  • 138 t/s output speed (AA-measured) — significantly faster than Claude and GPT-5.2
  • Free tier via AI Studio — good for development and prototyping

Weaknesses

  • Deprecated March 9, 2026 — migrate to gemini-3.1-pro-preview (same price, better reasoning)
  • 88% hallucination rate (AA-Omniscience) — fabricates confident answers when it doesn't know something
  • Extreme verbosity inflates real API costs: Artificial Analysis spent $892 evaluating it vs ~$100 for other models
  • GDPval-AA: 1,317 Elo — trails Claude Sonnet 4.6 by 316 points on office and professional judgment tasks
  • No free API tier — free API access is limited to Gemini 3 Flash
  • Output truncation bug cuts code generation at ~21K tokens despite 65K stated cap (fixed in 3.1 Pro)

Best for

long documents · multimodal tasks · coding · scientific research · large codebase analysis

Not ideal for

maximum reasoning depth (use 3.1 Pro) · strict data residency requirements

Deprecated March 9, 2026 — plan your migration

Gemini 3 Pro (model string: gemini-3-pro-preview) will be decommissioned on March 9, 2026. API calls to this endpoint fail after that date. The replacement is gemini-3.1-pro-preview — same pricing, no migration cost, and significantly better on ARC-AGI-2 (31.1% → 77.1%), GPQA Diamond (91.9% → 94.3%), and SWE-Bench Verified (76.2% → 80.6%). Update the model string, test your prompts. That's the whole migration.

Benchmark Performance

All scores independently measured by Artificial Analysis in standard mode — no extended thinking, same methodology across all models.

Knowledge & Science (AA-measured)

| Benchmark | Gemini 3 Pro | GPT-5.2 | Claude Opus 4.6 |
| --- | --- | --- | --- |
| GPQA Diamond (PhD science) | 90.8% | 90.3% | 84.0% |
| HLE (expert-level knowledge) | 37.2% | 35.4% | 18.6% |

Gemini 3 Pro edges GPT-5.2 on both. Gemini 3.1 Pro extended those leads further: GPQA Diamond 94.3%, HLE 44.7%.

Coding & Tool Use (AA-measured)

| Benchmark | Gemini 3 Pro | GPT-5.2 | Claude Opus 4.6 |
| --- | --- | --- | --- |
| τ²-bench (multi-turn tool use) | 87.1% | 84.8% | 84.8% |
| LiveCodeBench (coding accuracy) | 91.7% | 88.9% | — |

Gemini 3 Pro leads on both benchmarks — tool use in particular is a consistent strength across the Gemini 3 family. Gemini 3.1 Pro pushed τ²-bench further to 95.6%.

Where competitors still lead

GDPval-AA — everyday office tasks, strategic planning, financial analysis — is where Claude models win. Gemini 3 Pro scores 1,317 Elo against Claude Sonnet 4.6's 1,633. That gap is real. For professional judgment work requiring contextual reasoning over structured data, the Claude models are the better choice.

What 1M Tokens Actually Means

The 1,048,576-token input limit is real and tested. Here's what fits.

| Input type | Capacity | Notes |
| --- | --- | --- |
| Text / code | ~750,000 words | Full codebases, legal filings, research archives |
| Images | Up to 900 per prompt | PNG, JPEG, WEBP, HEIC, HEIF — ~1,120 tokens each |
| Audio | Up to 9.5 hours | 32 tokens per second — speech and acoustic signals natively |
| Video | Up to 1 hour | 45 min with audio; YouTube URLs supported; up to 10 files per prompt |
| PDFs | Up to ~1,000 pages | Text, diagrams, and layout extracted natively |
| Max output | 65,536 tokens (~49K words) | Gemini 3 Pro had a bug cutting code at ~21K tokens; fixed in 3.1 Pro |

Prompts over 200K tokens trigger the long-context pricing tier: $4/$18 per 1M instead of $2/$12. Build that into your cost estimates for large-document jobs.
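The tier switch is easy to fold into a cost estimate. A minimal sketch using the rates quoted above; it assumes, as the pricing table reads, that crossing 200K input tokens moves both input and output onto the long-context rates:

```python
def gemini_3_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD under the tiered pricing.

    Prompts over 200K input tokens are billed at the long-context
    rate ($4 input / $18 output per 1M) instead of $2/$12.
    """
    if input_tokens > 200_000:
        in_rate, out_rate = 4.00, 18.00
    else:
        in_rate, out_rate = 2.00, 12.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# A ~100K-token document with a 5K-token summary: standard tier, $0.26.
assert round(gemini_3_pro_cost(100_000, 5_000), 2) == 0.26
# A 500K-token codebase dump with the same output: long-context tier, $2.09.
assert round(gemini_3_pro_cost(500_000, 5_000), 2) == 2.09
```

Note the jump: the same output costs roughly 8× more attached to the larger prompt, almost entirely from re-billing the input at the higher rate.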

Pricing — Cheap Listed Rate, Higher Real Cost

The headline numbers are competitive. Verbosity is where the budget goes.

API pricing

| Context tier | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| ≤200K tokens | $2.00 | $12.00 |
| >200K tokens | $4.00 | $18.00 |
| Batch API (≤200K) | $1.00 | $6.00 |
| Context cache read | $0.20 | — |
| Cache storage | $4.50 per 1M tokens/hr | — |
| Search grounding | 5,000 queries/mo free | then $14.00 per 1,000 queries |

No free API tier for Gemini 3 Pro. Free API access is limited to Gemini 3 Flash. Consumer access starts at Google AI Pro ($19.99/month). Context caching cuts repeated input costs by 90% — essential for long-session document analysis.
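Whether caching actually saves money depends on how often the cached prefix is reused, because storage is billed per hour. A back-of-envelope sketch from the rates above (standard tier, per 1M cached tokens):

```python
FRESH_INPUT = 2.00    # $/1M tokens, standard-tier input
CACHE_READ = 0.20     # $/1M tokens — the 90% cut quoted above
CACHE_STORAGE = 4.50  # $/1M tokens per hour of cache storage


def caching_saves_money(reads_per_hour: float) -> bool:
    """Caching pays off once hourly read savings beat storage cost."""
    savings_per_read = FRESH_INPUT - CACHE_READ  # $1.80 per 1M tokens
    return reads_per_hour * savings_per_read > CACHE_STORAGE


# Break-even is 4.50 / 1.80 = 2.5 reads of the cached prefix per hour.
assert not caching_saves_money(2)
assert caching_saves_money(3)
```

So "essential for long-session document analysis" has a concrete threshold: below roughly 2.5 reuses per hour, let the cache expire and re-send the prompt instead.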

The verbosity problem — real costs are much higher than listed

Artificial Analysis spent $892 evaluating Gemini 3 Pro on their Intelligence Index. Other frontier models averaged roughly $100 for the same evaluation. Gemini 3 Pro generated 57 million output tokens where competitors generated around 12 million. At $12 per million output tokens, that's a 4–5× cost multiplier at equivalent task quality. Set max_tokens limits in production or the bill will surprise you.
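The multiplier falls straight out of the token counts in that evaluation. A quick check, using the figures cited above and the standard-tier output rate:

```python
OUTPUT_RATE = 12.00 / 1_000_000  # $ per output token, standard tier

# Output token counts from the Artificial Analysis evaluation cited above.
gemini_output = 57_000_000
typical_output = 12_000_000

gemini_cost = gemini_output * OUTPUT_RATE    # $684 in output tokens alone
typical_cost = typical_output * OUTPUT_RATE  # $144

multiplier = gemini_cost / typical_cost
assert round(multiplier, 2) == 4.75  # the "4–5×" figure in the text
```

An output-token cap (exposed in the Gemini API's generation config as a max output tokens setting) turns that open-ended multiplier into a hard per-request ceiling.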

Built-in Tools

  • Google Search grounding — Live web search baked into the API; the primary mitigation for the 88% hallucination rate. 5,000 free queries/month, then $14/1,000. Required for anything time-sensitive.
  • Code execution — Sandboxed Python environment for calculations, data analysis, and image manipulation. Runs inside the API call — no separate container needed.
  • URL context — Fetch and analyze live web pages inline in a prompt. The model reads the current page, not a cached version.
  • Function calling with parallel execution — Multiple tool calls in a single inference step. Streaming arguments and multimodal responses — images and PDFs in tool results — added in Gemini 3.
  • Thought signatures — Cryptographic reasoning state preserved across multi-turn conversations. Prevents the drift that breaks long autonomous agent runs on other models.
  • Computer use (preview) — See digital screens and perform UI actions — clicking, typing, navigating apps. Still in preview; not production-ready for most teams.
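Two of these mitigations, Search grounding and an output cap, combine in a single request. A sketch of a generateContent request body built as a plain dict; the field names follow the public Gemini REST API as best we can tell, but verify them against current documentation before shipping:

```python
import json

# Sketch of a generateContent request body combining Search grounding
# (the hallucination mitigation) with a hard output cap (the verbosity
# mitigation). Field names are assumptions based on the public Gemini
# REST API; check the current docs before relying on them.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "What changed in the latest release?"}]}
    ],
    "tools": [
        {"google_search": {}}  # enables live Search grounding
    ],
    "generationConfig": {
        "maxOutputTokens": 8192,  # cap verbose output at the request level
        "temperature": 0.2,
    },
}

payload = json.dumps(request_body)
assert "google_search" in payload and "maxOutputTokens" in payload
```

The same shape works through the SDKs, which expose these fields as typed config objects rather than raw dicts.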

88% hallucination rate — read this before deploying

Artificial Analysis measured an 88% hallucination rate on their Omniscience evaluation. When the model can't reliably answer something, it produces a confident wrong answer 88% of the time rather than acknowledging uncertainty. For comparison: Claude 4.5 Haiku was 26%, Claude 4.5 Sonnet 48%. Gemini 3.1 Pro reportedly improved to around 50% — still high. For anything where factual accuracy matters, pair this model with Search grounding and build output verification into your pipeline.

Gemini 3 Pro vs Gemini 3.1 Pro — Is the Upgrade Worth It?

Same price. Significantly better reasoning. Short answer: yes.

| Dimension | Gemini 3 Pro | Gemini 3.1 Pro | Change |
| --- | --- | --- | --- |
| ARC-AGI-2 | 31.1% | 77.1% | +46.0pp — more than doubled |
| GPQA Diamond (provider-reported) | 91.9% | 94.3% | +2.4pp |
| SWE-Bench Verified | 76.2% | 80.6% | +4.4pp |
| Humanity's Last Exam (no tools) | 37.5% | 44.4% | +6.9pp |
| Thinking levels | LOW + HIGH | LOW + MEDIUM + HIGH | MEDIUM added |
| Output truncation bug | Cuts at ~21K tokens | Fixed | — |
| Token efficiency | Verbose | Improved | Lower real cost per task |
| Deprecation date | March 9, 2026 | None set | — |
| API pricing | $2/$12 per 1M | $2/$12 per 1M | Identical |

The output truncation bug in Gemini 3 Pro silently cut code generation at ~21,000 tokens despite the stated 65,536-token cap. If you've hit this in production, that alone is reason to switch — not just a benchmark improvement.
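Because the truncation was silent, it's worth a heuristic guard if you must stay on 3 Pro until migration. A sketch, assuming you can read the output token count and a finish reason from the response; the ~21K ceiling is the figure reported above, and the finish-reason string is illustrative (check your SDK's actual enum values):

```python
STATED_CAP = 65_536
BUG_CEILING = 21_000  # approximate point where Gemini 3 Pro cut output


def possibly_truncated(output_tokens: int, finish_reason: str) -> bool:
    """Heuristic check for the silent truncation described above.

    A response that stops near ~21K tokens without reporting a
    max-tokens finish reason, well under the 65,536 stated cap,
    matches the bug's signature and deserves a retry or a manual look.
    """
    near_bug_ceiling = 0.9 * BUG_CEILING <= output_tokens <= 1.1 * BUG_CEILING
    return near_bug_ceiling and finish_reason != "MAX_TOKENS"


assert possibly_truncated(20_900, "STOP")       # suspicious: cut near 21K
assert not possibly_truncated(5_000, "STOP")    # short answers are fine
```

On 3.1 Pro the check is unnecessary; that's part of the migration argument.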

Bottom line

Gemini 3 Pro is a capable model at a competitive price — real 1M-token context, strong science and coding benchmarks, and built-in Search grounding. But it has a March 9, 2026 deprecation date, an 88% hallucination rate, and verbosity that pushes real API costs well above the listed rate. Migrate to Gemini 3.1 Pro: same price, meaningfully better reasoning, fixed truncation bug. If you're deciding which Gemini model to use from scratch, the 3.1 Pro review is where to start.

Pricing details

Subscription plans

| Plan | Includes | Price |
| --- | --- | --- |
| Free | Gemini app access (rate-limited; some features require premium) | Free |
| Google One AI Premium | Gemini Advanced with 3 Pro access, 2TB Google Drive, Google Workspace integration | $20/mo |

API pricing

| Provider | Details | Price |
| --- | --- | --- |
| Google AI Studio | ≤200K context: $2/$12; >200K: $4/$18 per 1M. Free tier: 60 req/min (rate-limited). Context caching and function calling supported. | $2/$12 |
| Google Vertex AI | Same pricing as AI Studio. Enterprise SLAs and VPC support available. | $2/$12 |

Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.

Last updated: February 27, 2026