Every AI model has a memory limit. It's called the context window, and it's one of the most important specs you're probably ignoring.
Here's the short version: when you send a message to an AI, it reads everything in the conversation — your message, its previous replies, any files you pasted in. The context window is how much it can read at once, measured in tokens.
What actually happens when you hit the limit
The model starts forgetting things. Specifically, most chat applications drop the oldest parts of the conversation to make room for new ones (some APIs instead refuse the request with an error).
You've probably experienced this. You're deep in a long conversation and the AI suddenly forgets something you mentioned 20 messages ago. That's not a bug — it's the context window filling up.
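That oldest-first trimming can be sketched in a few lines. This is a simplified illustration, not any particular product's implementation: the `count_tokens` helper is a hypothetical stand-in using the word-based rule of thumb (real systems use the model's actual tokenizer).

```python
def count_tokens(text: str) -> int:
    # Rough estimate: ~0.75 words per token, so tokens ≈ words / 0.75.
    # Real applications use the model's tokenizer instead.
    return round(len(text.split()) / 0.75)

def fit_to_window(messages: list[str], limit: int) -> list[str]:
    """Drop the oldest messages until the total fits the token budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # the oldest message is "forgotten" first
    return kept
```

That `pop(0)` is the forgetting: once the budget is exceeded, whatever came earliest in the conversation is the first thing to go.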
For short conversations, it doesn't matter at all. For long documents, research sessions, or complex coding projects, it matters a lot.
What is a token, exactly?
Tokens are chunks of text — not words, not characters, something in between. A rough rule of thumb: 1 token ≈ 0.75 words in English.
So 128,000 tokens (a common context size) is roughly 96,000 words, or about a full novel.
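The conversion is simple enough to do in your head, but here it is as code. This uses the same ~0.75 words-per-token rule of thumb as above, which is a rough average for English text, not a property of any specific tokenizer:

```python
def tokens_to_words(tokens: int) -> int:
    # ~0.75 English words per token (rough rule of thumb)
    return round(tokens * 0.75)

def words_to_tokens(words: int) -> int:
    # Inverse: ~1.33 tokens per English word
    return round(words / 0.75)
```

So `tokens_to_words(128_000)` gives 96,000 words, and pasting a 96,000-word manuscript would fill that window entirely, leaving no room for the model's reply.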
That sounds like a lot until you're asking an AI to analyze an entire codebase, or compare 20 research papers, or work through a 300-page legal document.
What does "1 million tokens" actually mean?
Some models — Gemini 3 Pro, Gemini 3.1 Pro, Llama 4 Scout — support 1 million token context windows. That's about 750,000 words, or roughly 10 full-length novels at once.
For most people, that's overkill. But if you work with large codebases, lengthy research, or need to dump entire datasets into a prompt, that headroom genuinely changes what's possible.
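The capacity math behind those figures, assuming ~0.75 words per token and ~75,000 words per novel (both ballpark assumptions, not specs):

```python
TOKENS = 1_000_000
words = TOKENS * 0.75        # 750,000 words
novels = words / 75_000      # ~10 full-length novels
print(f"{words:,.0f} words ≈ {novels:.0f} novels")
```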
Context window vs. context quality
Bigger isn't always better. Some models handle long contexts poorly — they pay less attention to information buried in the middle of a long prompt, a documented failure mode sometimes called the "lost in the middle" problem.
Claude is known for being unusually good at this. It tends to stay accurate and coherent even deep into a long conversation, which matters more in practice than the raw number.
A 200K context window used well beats a 1M context window used badly.
What to actually look for
If you're a casual user: anything above 32K is plenty for normal conversations. You won't notice the difference.
If you work with long documents or code: look for 200K minimum. At that point, you can fit entire books or large projects without worrying about the limit.
If you're doing serious multi-document research or enterprise data work: models with 1M+ context windows start to become the right tool for the job.
See how the major models stack up on context window size: Context window rankings →