OpenAI
GPT OSS 120B
GPT OSS 120B is OpenAI's first large open-weight language model, released in August 2025. It uses a Mixture-of-Experts (MoE) architecture with 117 billion total parameters, only 5.1 billion of which are active per forward pass, so it can run on a single H100 GPU. With an AA Intelligence Index of 33 (#1 of 50 open-weight reasoning models), it is the most capable open-weight model officially released by a frontier lab. At $0.15/$0.60 per 1M input/output tokens and 336 tokens/second, it is both cheap and fast. The weights are available on Hugging Face and can be self-hosted. A smaller companion model, GPT OSS 20B, runs on consumer 16 GB GPUs and costs $0.05/$0.20 per 1M tokens.
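The single-H100 claim is easy to sanity-check with back-of-envelope arithmetic. This is a rough sketch, assuming the released weights are quantized at about 4.25 bits per parameter (4-bit MXFP4 values plus shared scaling factors; the exact packing overhead is our assumption, not a published spec):

```python
# Estimate the on-GPU footprint of 117B total parameters and check it
# fits within an H100's 80 GB of HBM.
TOTAL_PARAMS = 117e9       # total parameter count from the model card
BITS_PER_PARAM = 4.25      # assumed: 4-bit MXFP4 values + shared scales
H100_HBM_GB = 80

weights_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~62 GB, leaving headroom
print(f"fits on one H100: {weights_gb < H100_HBM_GB}")
```

Roughly 62 GB of weights leaves headroom for KV cache and activations, which is why only the quantized MoE release fits on a single card.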
- Context window: 131K tokens
- API price (blended): $0.26/1M tokens
- Consumer access: Free (limited)
- Multimodal: Text only
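The $0.26/1M blended figure follows from the $0.15/$0.60 per-direction rates if you assume the common 3:1 input-to-output token mix (the exact weighting used for the blended quote is an assumption here):

```python
# Reproduce the blended API price from the per-direction rates.
INPUT_PRICE = 0.15    # USD per 1M input tokens
OUTPUT_PRICE = 0.60   # USD per 1M output tokens
INPUT_SHARE = 0.75    # assumed 3:1 input:output token mix

blended = INPUT_PRICE * INPUT_SHARE + OUTPUT_PRICE * (1 - INPUT_SHARE)
print(f"${blended:.2f}/1M")  # $0.26/1M after rounding
```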
Strengths
- #1 AA Intelligence Index among all open-weight reasoning models (score: 33)
- OpenAI provenance: the first frontier-lab open-weight model with real RLHF training
- 336 tokens/second, among the top 5 fastest models measured by Artificial Analysis
- 117B total parameters with 5.1B active (MoE): runs on a single H100, deployable in enterprise data centers
- $0.15/$0.60 per 1M tokens, among the cheapest paths to a frontier-quality open model
- Open weights on Hugging Face: self-host, fine-tune, quantize, or build on commercially
Weaknesses
- No official consumer chat interface: ChatGPT runs GPT-5.x, not GPT OSS
- AA Index of 33 vs Gemini 3.1 Pro's 57: still a significant gap behind the proprietary frontier
- Knowledge cutoff of May 31, 2024, over a year out of date
- 131K context window, the smallest of the new open models reviewed here
- Mixed community benchmark reception: some users report weaker common-sense performance than expected
- Very high reasoning-mode verbosity (78M tokens consumed during benchmarking vs a 12M average)
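The verbosity gap has a direct cost implication. As an illustration, treating the 78M benchmark tokens (vs the 12M peer average) as billable output at the $0.60/1M rate:

```python
# Dollar cost of the reasoning-mode verbosity gap, using the token
# counts quoted above as billable output tokens (an illustrative
# simplification; benchmark runs also consume input tokens).
OUTPUT_PRICE = 0.60   # USD per 1M output tokens
VERBOSE_MTOK = 78     # GPT OSS 120B tokens during benchmarking (millions)
AVERAGE_MTOK = 12     # peer-model average (millions)

verbose_cost = VERBOSE_MTOK * OUTPUT_PRICE
average_cost = AVERAGE_MTOK * OUTPUT_PRICE
print(f"${verbose_cost:.2f} vs ${average_cost:.2f}")  # $46.80 vs $7.20
```

The low per-token price can thus be partly offset by a roughly 6.5x higher token volume in reasoning mode.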
Pricing details
Prices verified February 2026. LLM pricing changes frequently — verify at the provider's site before budgeting.
Last updated: February 2026