ToolPilot

Context Budget Planner

Calculate how your system prompt, user input, examples, and output fit within any AI model's context window. Get token estimates, cost breakdown, and fit verdict.


Token estimates use ~4 chars/token (BPE approximation). Actual counts vary by model tokenizer. For exact counts, use your model provider's tokenizer API.

How to Plan Your AI Context Budget Before Running Out of Tokens

Every AI model has a context window — a fixed limit on how much text it can process at once. Feed it too much and your request fails. Feed it too little and the AI lacks context for a good answer. Planning your token budget is essential for reliable AI applications.

The context window must fit everything: your system prompt, user input, any documents or examples you include, and the space reserved for the AI's response. Many developers overlook that output tokens count against the same window; reserving too little means truncated responses.
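The fit check itself is simple addition. Here is a minimal sketch of the idea; the window size and the individual token counts are illustrative assumptions, not any particular model's values:

```python
CONTEXT_WINDOW = 200_000  # illustrative: a 200K-token model


def fits(system_tokens: int, user_tokens: int,
         example_tokens: int, reserved_output: int) -> tuple[bool, int]:
    """Return (fits?, tokens remaining) for one planned API call.

    Every component, including the reserved output, counts against
    the same window.
    """
    used = system_tokens + user_tokens + example_tokens + reserved_output
    return used <= CONTEXT_WINDOW, CONTEXT_WINDOW - used


ok, remaining = fits(1_500, 2_000, 500, 4_096)
# ok is True; remaining is 191_904 tokens of headroom
```

The key design point is that `reserved_output` is a first-class term in the sum, not an afterthought.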

Our Context Budget Planner lets you paste your actual system prompt and user input, set your example count, and choose your reserved output size. It calculates the token breakdown for any model (Claude, GPT-4, Gemini, Llama, DeepSeek, Mistral), shows whether it fits, estimates the API cost per call, and suggests optimizations if you're over budget.

Token estimation uses the ~4 characters per token heuristic common to BPE tokenizers. For exact counts, use your model provider's tokenizer. But for planning and budgeting, this approximation is within 10% for English text — more than enough to avoid context overflows.
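The heuristic above can be written in one line. This is a sketch of the approximation the article describes, not any provider's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token.

    A BPE approximation for English prose; code, JSON, and
    non-English text can deviate noticeably.
    """
    return max(1, round(len(text) / 4))


estimate_tokens("Summarize the attached report in three bullet points.")
```

For exact counts, replace this with your provider's tokenizer (for example, a library such as `tiktoken` for OpenAI models).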

Frequently Asked Questions

How accurate are the token estimates?
We use ~4 characters per token, which is the standard BPE approximation for English text. This is typically within 10% of the actual token count. For code, JSON, or non-English text, accuracy may vary. For exact counts, use your provider's tokenizer API.
Do output tokens count against the context window?
Yes. The context window must fit input + output. If you have a 128K window and use 120K for input, you only have 8K left for the response. This is the most common context budget mistake.
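The arithmetic in that answer can be captured in a small helper; the numbers below mirror the 128K example and are illustrative:

```python
def reserve_output(window: int, input_tokens: int) -> int:
    """Tokens left for the model's response after input is budgeted.

    Clamped at zero: once input fills the window, there is no
    room for output at all.
    """
    return max(0, window - input_tokens)


reserve_output(128_000, 120_000)  # 8_000 tokens left for the response
```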
What happens if I exceed the context window?
The API will return an error and your request will fail. Some providers truncate silently. Either way, you lose money on the failed request and need to retry with less input.
Which model has the largest context window?
As of 2026, GPT-4.1, Gemini 2.5 Pro, Gemini 2.5 Flash, and Llama 4 Maverick all support 1M tokens. Claude Opus/Sonnet support 200K.
How do I reduce my token usage?
Common strategies: condense system prompts (remove redundancy), use fewer few-shot examples, summarize long documents before including them, and only reserve as much output as you actually need.
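One of those strategies, dropping few-shot examples until the remainder fits, can be sketched as a greedy trim. The `estimate` default reuses the ~4 chars/token heuristic and is an assumption, not a provider API:

```python
from typing import Callable


def trim_examples(examples: list[str], budget_tokens: int,
                  estimate: Callable[[str], int] = lambda t: max(1, len(t) // 4)
                  ) -> list[str]:
    """Keep leading examples, in order, until the token budget is spent.

    Assumes examples are already sorted by importance, so the
    least valuable ones are dropped first from the tail.
    """
    kept: list[str] = []
    used = 0
    for ex in examples:
        cost = estimate(ex)
        if used + cost > budget_tokens:
            break
        kept.append(ex)
        used += cost
    return kept
```

In practice you would apply the same idea to documents as well, summarizing rather than dropping when the content is essential.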

Related Tools