AI model pricing is listed in tokens. Conversations are limited in tokens. Models are compared by tokens per second. And most people have no idea what a token actually is.
Here's the plain-English version.
Tokens are not words
A token is a chunk of text that the model processes as a single unit. In English, a token is roughly 0.75 words — or about 4 characters.
The word “fantastic” is 1 token. The word “unfortunately” might be 2. Short common words are usually 1 token. Longer or rarer words get split up.
As a quick mental shortcut: 1,000 tokens ≈ 750 words ≈ roughly a page and a half of single-spaced text.
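That shortcut is easy to turn into code. The sketch below is only the rule of thumb from above (4 characters or 0.75 words per token), not a real tokenizer — libraries like OpenAI's tiktoken give exact counts, and the estimate can be off for code, other languages, or unusual text.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars / ~0.75 words heuristics."""
    by_chars = len(text) / 4            # ~4 characters per token
    by_words = len(text.split()) / 0.75 # ~0.75 words per token
    # Average the two heuristics and round to a whole token count.
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Short common words are usually one token each."))
```

Good enough for budgeting; use a real tokenizer when you need exact numbers.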
Why does pricing use tokens?
Because compute cost scales with tokens, not words or characters. Every token the model reads or writes requires processing. Longer inputs and outputs cost more.
Most AI APIs charge separately for input tokens (what you send) and output tokens (what the model writes back). Output is almost always more expensive: the model generates output tokens one at a time, while input can be processed in a single parallel pass.
A typical API call with a short prompt and a medium-length response might use 200 input tokens and 400 output tokens. At current prices for mid-tier models, that's a fraction of a cent.
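To make "a fraction of a cent" concrete, here's the arithmetic as a tiny sketch. The per-million-token prices are placeholder assumptions, not any provider's actual rates — check the pricing page of whatever API you use.

```python
# Assumed prices for a hypothetical mid-tier model, in dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50  # output typically costs more than input

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# The example from the text: 200 input + 400 output tokens.
print(f"${call_cost(200, 400):.6f}")  # $0.000700 — well under a cent
```

At these assumed rates, you could make over a thousand such calls for a dollar.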
Why speed is measured in tokens per second
When you see “120 t/s” for a model, it means it generates 120 tokens per second. At 0.75 words per token, that's about 90 words per second — fast enough to feel instant for most responses.
Slower models — typically the larger, more capable ones — might do 30-50 t/s. That's still fast for reading, but you'll notice a delay on long responses.
What this means for free tiers
When an API says “free up to 1 million tokens per month,” that's about 750,000 words of combined input and output. For a developer sending occasional test requests, that's plenty. For production traffic, it evaporates quickly.
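A quick way to see both sides of that is to divide the monthly allowance by a typical call's total token count. The 1-million-token budget matches the example above; the per-call sizes are illustrative assumptions.

```python
FREE_TOKENS_PER_MONTH = 1_000_000  # the hypothetical free tier from the text

def calls_per_month(avg_input_tokens: int, avg_output_tokens: int) -> int:
    """How many average-sized calls fit in the monthly free allowance."""
    return FREE_TOKENS_PER_MONTH // (avg_input_tokens + avg_output_tokens)

# The earlier example call: 200 input + 400 output tokens.
print(calls_per_month(200, 400))  # 1666 calls per month
```

Over 1,600 test calls a month is plenty for development, but a production app handling that many requests per hour would exhaust the allowance on day one.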
See which models offer the best free API tiers: Best free API tiers →
The practical takeaway
If you're just using a consumer product like Claude.ai or ChatGPT, you don't need to think about tokens at all. The free and paid plans just work.
If you're calling APIs directly or building something, tokens matter. Keep prompts tight, don't repeat context unnecessarily, and be aware that long conversations accumulate cost because the whole history gets sent with each message.
Want to compare models on price and specs? See the full data table →