Gemini 3 Flash
Fastest
Released December 17, 2025, Gemini 3 Flash was distilled from Gemini 3 Pro, then outperformed it on SWE-bench Verified (78% vs 76.2%). Google made it the default model for the Gemini app and AI Mode in Google Search within days of launch. At $0.50/$3.00 per 1M tokens with a 1M context window, 214 t/s output speed, and 90.4% on GPQA Diamond, it's the new baseline against which everything else gets measured. The main catches: a 91% hallucination rate that needs to be mitigated, and text-only output with no image or audio generation.
Context window
1.0M tokens
API (blended)
$1.13/1M
Consumer access
Free
Multimodal
Yes
Score Breakdown
77.5/100 → 7.8/10
Intelligence, Reliability, Speed, and Context are field-relative: scores shift as models are added. Accessibility and Trust are absolute checklists. Full methodology →
Strengths
- +SWE-bench Verified: 78% — beats Gemini 3 Pro (76.2%) on real-world coding despite being the smaller model
- +GPQA Diamond: 90.4% — within 1.5 points of Gemini 3 Pro at one-quarter the API cost
- +214 t/s output speed (AA-measured) — significantly faster than Claude Sonnet 4.6 (56 t/s) or GPT-5.2
- +AIME 2025: 95.2% without tools, 99.7% with code execution — top-tier math reasoning
- +Four thinking levels (minimal/low/medium/high) — more granular cost-quality control than any competitor
- +1M token context window at $0.50/$3.00 per 1M — 4× cheaper than Gemini 3 Pro
- +Default model in the Gemini app and Google Search — available free at gemini.google.com
Weaknesses
- -91% hallucination rate (AA-Omniscience) — fabricates confident answers even more often than Gemini 3 Pro (88%)
- -Text-only output — no native image or audio generation
- -Image segmentation removed — use Gemini 2.5 Flash (thinking off) for pixel-level masks
- -ARC-AGI-2 abstract reasoning: 33.6% vs GPT-5.2's 53% — 19-point gap on the hardest reasoning tasks
- -Free API tier quotas cut ~90% in December 2025 — from ~250 to ~20 requests/day
- -Preview status — no production SLA; model may change before GA
A distilled model that beat its teacher on coding
Gemini 3 Flash was built using knowledge distillation from Gemini 3 Pro — reasoning pathways from the larger model compressed into a faster, cheaper architecture. The distillation sharpened specific paths rather than just compressing them: Flash outperforms Pro on SWE-bench Verified (78% vs 76.2%), MMMU Pro (81.2% vs 81.0%), and ARC-AGI-2 (33.6% vs 31.1%). SWE-bench tests real GitHub bug-fixing on unmodified issues. Beating the teacher model on it is not a statistical artifact.
Benchmark Performance
Numbers from Google's published model card. Rows where Flash beats Gemini 3 Pro are marked ✓.
Knowledge, Science & Math
| Benchmark | Gemini 3 Flash | Gemini 3 Pro | Gemini 2.5 Pro |
|---|---|---|---|
| GPQA Diamond (PhD science) | 90.4% | 91.9% | 86.4% |
| Humanity's Last Exam (no tools) | 33.7% | 37.5% | 21.6% |
| AIME 2025 (no tools) | 95.2% | 95.0% | 88.0% |
| AIME 2025 (with code execution) | 99.7% | — | — |
| MMMU Pro (multimodal reasoning) | 81.2% ✓ | 81.0% | 68.0% |
✓ marks where Flash beats Gemini 3 Pro. AIME 2025 (no tools): Flash edges Pro by 0.2 points. GPQA and HLE: Pro holds a modest lead. These are provider-reported scores from Google's model card, not AA-measured in standard mode.
Coding & Tool Use
| Benchmark | Gemini 3 Flash | Gemini 3 Pro | Gemini 2.5 Pro |
|---|---|---|---|
| SWE-bench Verified (real-world coding) | 78.0% ✓ | 76.2% | 59.6% |
| ARC-AGI-2 (abstract reasoning) | 33.6% ✓ | 31.1% | 4.9% |
| MCP Atlas (tool orchestration) | 57.4% | — | 8.8% |
| ScreenSpot-Pro (UI navigation) | 69.1% | — | 11.4% |
| Toolathlon (multi-tool use) | 49.4% | — | 10.5% |
| Video-MMMU | 86.9% | 87.6% | 83.6% |
✓ marks where Flash beats Gemini 3 Pro. The tool-use improvements over Gemini 2.5 Pro are dramatic — MCP Atlas jumped from 8.8% to 57.4%, a 6.5× improvement. GPT-5.2 leads ARC-AGI-2 at 53%, a 19-point gap over Flash.
Where GPT-5.2 leads
ARC-AGI-2 abstract reasoning: GPT-5.2 scores 53% to Flash's 33.6% — a 19-point gap. For tasks requiring the hardest abstract problem-solving that doesn't involve coding, Flash isn't the ceiling. Gemini 3.1 Pro (77.1%) and GPT-5.2 are the right alternatives there.
The Value Proposition: Frontier Intelligence at Sub-Frontier Prices
At $0.50 input / $3.00 output per 1M tokens, Flash undercuts every frontier competitor by 2× or more on input; only budget-tier models like GPT-4o mini come in cheaper.
API price comparison
| Model | Input (per 1M) | Output (per 1M) | vs Flash |
|---|---|---|---|
| Gemini 3 Flash | $0.50 | $3.00 | — |
| Gemini 3 Pro / 3.1 Pro | $2.00 | $12.00 | 4× more on input |
| Claude Haiku 4.5 | $1.00 | $5.00 | 2× more on input |
| GPT-5.2 | ~$1.25 | ~$10.00 | 2.5× more on input |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 6× more on input |
| GPT-4o mini | $0.15 | $0.60 | 3× cheaper but far less capable |
Batch API: 50% discount across all tiers. Context caching: $0.05/1M cached tokens (90% off) — makes repeated long-context queries cheap. Prompts over 200K tokens are billed at 2×. Audio input billed separately at $1.00/1M tokens. Free API tier available — but quotas were cut ~90% in December 2025 (from ~250 to ~20 requests/day for Flash models).
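The pricing rules above combine in non-obvious ways, so here is a minimal cost estimator that encodes them. Two assumptions not confirmed by the text: the 2× long-context multiplier applies to the whole request once the prompt exceeds 200K tokens, and the batch discount stacks multiplicatively on top of the other rules.

```python
# Back-of-envelope cost estimator for the Gemini 3 Flash pricing rules above.
# Assumptions (flagged in the lead-in): the 2x long-context multiplier applies
# to the entire request, and the batch discount stacks multiplicatively.

INPUT_PER_M = 0.50            # USD per 1M fresh input tokens
OUTPUT_PER_M = 3.00           # USD per 1M output tokens
CACHED_PER_M = 0.05           # USD per 1M cached input tokens (90% off)
LONG_CONTEXT_THRESHOLD = 200_000
LONG_CONTEXT_MULT = 2.0
BATCH_DISCOUNT = 0.5

def estimate_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate USD cost of one Gemini 3 Flash request."""
    fresh = input_tokens - cached_tokens
    cost = (fresh * INPUT_PER_M
            + cached_tokens * CACHED_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        cost *= LONG_CONTEXT_MULT
    if batch:
        cost *= BATCH_DISCOUNT
    return round(cost, 6)

# 100K-token prompt, 2K-token reply: $0.056
# Same request with 90K of the prompt cached: $0.0155 (~3.6x cheaper)
```

The caching case is where the pricing gets interesting: a 100K-token prompt with 90K cached drops from $0.056 to $0.0155 per call, which is what makes repeated long-context queries viable.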
214 tokens per second
Artificial Analysis measured Gemini 3 Flash at 214 t/s output speed — significantly faster than Claude Sonnet 4.6 (56 t/s) or GPT-5.2. In reasoning mode, time-to-first-token is ~12 seconds while the model thinks. Set thinking_level to 'minimal' when latency matters more than reasoning depth, and the response feels near-instant.
Four Thinking Levels — More Control Than Any Competitor
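A minimal sketch of selecting one of the four levels per request. The dict shape mirrors the `thinking_config` / `thinking_level` fields seen in Google's published examples, but treat the exact field names and the `gemini-3-flash-preview` model string as assumptions, not verified API surface.

```python
# Sketch: choosing a thinking_level per request. Field names follow the
# google-genai SDK's config shape as an assumption; verify against the
# current Gemini API reference before relying on them.

VALID_LEVELS = ("minimal", "low", "medium", "high")

def flash_config(thinking_level="medium", **extra):
    """Build a generate_content config dict with the requested thinking level."""
    if thinking_level not in VALID_LEVELS:
        raise ValueError(f"thinking_level must be one of {VALID_LEVELS}")
    return {"thinking_config": {"thinking_level": thinking_level}, **extra}

# Latency-sensitive chat turn: skip deep reasoning, avoid the ~12s TTFT.
fast = flash_config("minimal")

# Hard bug-fix task: let the model think.
deep = flash_config("high", temperature=0.2)
```

The resulting dict would be passed as the `config` argument to a `generate_content` call; the point of the four levels is that this one field is your cost-quality dial per request, rather than per deployment.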
91% hallucination rate — worse than Gemini 3 Pro
Artificial Analysis measured a 91% hallucination rate on their Omniscience evaluation, slightly worse than Gemini 3 Pro's 88%. When the model can't reliably answer something, it fabricates a confident wrong answer 91% of the time rather than admitting uncertainty. For comparison, Claude Haiku 4.5 sits at 26% and Claude Sonnet 4.5 at 48%. For any task where factual accuracy matters, Search grounding is the only reliable mitigation: build it into your default configuration, not as a per-request decision.
What Gemini 3 Flash Cannot Do
Six hard limits worth knowing before you build on it.
| Limitation | Detail | Workaround |
|---|---|---|
| Text-only output | No native image or audio generation | Nano Banana 2 (Gemini 3.1 Flash Image) for images; separate TTS model for audio |
| Image segmentation removed | Pixel-level object masks not supported — a regression from Gemini 2.5 Flash | Use Gemini 2.5 Flash with thinking_budget set to 0 |
| Built-in tools + function calling | Cannot combine Search grounding and custom functions in one request | Use separate requests or pick one tool type per call |
| No fine-tuning during preview | Model customization unavailable until GA | Prompt engineering only |
| Knowledge cutoff: January 2025 | Events after this date are unreliable | Search grounding for anything time-sensitive |
| Preview status | No SLA; model may change before GA | Pin to explicit model version string in production |
The image segmentation removal is a real regression. Google explicitly recommends Gemini 2.5 Flash with thinking disabled for workloads that depend on pixel-level masks.
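The version-pinning workaround from the table is worth spelling out, since a preview-to-GA swap can silently change behavior. A sketch; both model strings below are illustrative assumptions, not confirmed identifiers.

```python
# Preview-safety sketch: pin production traffic to an explicit model version
# string rather than a floating alias. Both strings are hypothetical examples,
# not confirmed Gemini model IDs.

FLOATING_ALIAS = "gemini-3-flash"              # may be redirected at GA
PINNED_MODEL = "gemini-3-flash-preview-12-17"  # hypothetical dated snapshot

def model_for(env):
    """Production uses the pinned snapshot; dev tracks the floating alias."""
    return PINNED_MODEL if env == "prod" else FLOATING_ALIAS
```

Dev environments tracking the alias surface breaking changes early, while production stays on a known snapshot until you choose to migrate.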
Flash vs Pro — Which One to Use
Same context window, same ecosystem, 4× price difference. Most workloads belong on Flash.
| Dimension | Gemini 3 Flash | Gemini 3 Pro / 3.1 Pro |
|---|---|---|
| API cost | $0.50/$3.00 per 1M | $2.00/$12.00 per 1M — 4× more |
| Output speed | 214 t/s | 138 t/s |
| SWE-bench Verified (coding) | 78% — Flash wins | 76.2% |
| GPQA Diamond (science) | 90.4% | 91.9% (3.1 Pro: 94.3%) |
| Humanity's Last Exam | 33.7% | 37.5% (3.1 Pro: 44.4%) |
| Hallucination rate | 91% — slightly worse | 88% (3.1 Pro: ~50%) |
| Abstract reasoning (ARC-AGI-2) | 33.6% | 31.1% (3.1 Pro: 77.1%) |
| Thinking levels | 4 (minimal/low/medium/high) | 3 (low/medium/high) |
| Context window | 1M tokens | 1M tokens |
| Free consumer access | Yes — default in Gemini app | Requires AI Pro ($19.99/mo) |
| Deprecation | No date set | 3 Pro: March 9, 2026 / 3.1 Pro: no date |
The coding result is the most counterintuitive finding: Flash beats Pro on SWE-bench. For pure coding workloads at scale, Flash is the smarter API choice.
Bottom line
Gemini 3 Flash is the best price-to-capability model available for most production workloads. It beats Gemini 3 Pro on coding, matches it within 1.5 points on GPQA Diamond, runs at 214 t/s, and costs a quarter of the price. The 91% hallucination rate is the one real production risk; Search grounding mitigates it. If you need frontier abstract reasoning (ARC-AGI-2), use Gemini 3.1 Pro or GPT-5.2. If you need image or audio output, pair Flash with a separate model. For everything else, Flash is the right default.
Pricing details
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 27, 2026