Estimate tokens and API cost for GPT, Claude, Gemini, Llama, and Mistral from any prompt. Per-model pricing, context-window usage, and a total-cost calculator.
Live token and character statistics for the entered text.
Cost inputs
Input tokens come from your prompt above. Output is an estimate of how many tokens the model will generate per call. Total cost = (input cost + output cost) × requests.
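The arithmetic behind the total is straightforward. A minimal sketch, using GPT-4o mini's assumed rates of $0.15 input / $0.60 output per 1M tokens (verify current pricing with the provider):

```python
def api_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m, requests=1):
    """Total cost in dollars: per-call input + output cost, times the number of calls.

    Prices are quoted per 1M tokens, so each term is divided by 1e6.
    """
    per_call = (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1e6
    return per_call * requests

# 300 input + 500 output tokens per call, 100 calls, at $0.15 / $0.60 per 1M tokens
cost = api_cost(300, 500, 0.15, 0.60, requests=100)  # → 0.0345, about $0.035
```

The same function reproduces the worked example elsewhere on this page: 100 calls of 300 input and 500 output tokens on GPT-4o mini land near $0.035.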
Per-model tokens, context window and cost
Model
Tokens (in)
Context
Input $
Output $
Total $
Cheapest model—
Most expensive—
Across all calls—
Heads up. Token counts are heuristic approximations, not exact BPE tokenization. Real tokenizers (tiktoken cl100k, Anthropic, SentencePiece) can differ by 5 to 15 percent, especially on code, JSON, and non-English text. Pricing is listed per 1M tokens and may change — always verify with the provider before committing a budget. Nothing you type is sent to a server.
Frequently asked questions
A token is the atomic unit large language models read and write. It can be a whole short word like "cat", a piece of a longer word such as "tokeni" + "zation", a punctuation mark, or a byte. As a rough rule for English prose, 1 token is about 4 characters or 0.75 of a word, so 1,000 tokens is roughly 750 English words. Code, JSON, URLs, and non-English text use more tokens per character.
Byte Pair Encoding (BPE) builds a vocabulary by repeatedly merging the most frequent adjacent character pairs in a training corpus. GPT uses tiktoken with the cl100k_base vocabulary (about 100,000 merges). At encode time, input text is broken into characters (or bytes) and adjacent pieces are merged greedily in the order the merges were learned. Common words collapse into a single token; rare words split into several subword tokens. Claude uses a similar BPE variant tuned on different data, Gemini uses SentencePiece, and Llama uses its own tiktoken-compatible vocab.
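The train-then-encode loop described above can be sketched as a toy BPE in a few lines of Python. This is an illustration of the merge mechanic, not the real cl100k_base implementation (which works on bytes and a much larger corpus):

```python
from collections import Counter

def merge_word(word, pair):
    """Replace every adjacent occurrence of `pair` in a symbol tuple with the merged symbol."""
    out, i = [], 0
    while i < len(word):
        if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def train_bpe(corpus, num_merges):
    """Learn merge rules by repeatedly fusing the most frequent adjacent symbol pair."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, count in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = Counter({merge_word(w, best): c for w, c in words.items()})
    return merges

def encode(word, merges):
    """Apply the learned merges in order; common words end up as a single token."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_word(symbols, pair)
    return list(symbols)

merges = train_bpe("low low low lower lowest", 2)
print(encode("low", merges))     # frequent word -> one token
print(encode("lowest", merges))  # rarer word -> several subword tokens
```

After two merges on this tiny corpus, "low" encodes to a single token while "lowest" splits into "low" plus leftover characters, which is exactly the common-word/rare-word behavior described above.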
Price reflects a mix of model size, inference hardware cost, engineering overhead, and the provider's margin. Flagship models (GPT-4 Turbo, Claude Opus) run on larger weights and cost more per token. Small or distilled models (GPT-4o mini, Claude Haiku, Gemini Flash) are 10 to 60 times cheaper. Output tokens usually cost 3 to 5 times more than input because generation is autoregressive and slower. Prompt caching, batch APIs, and self-hosting open models like Llama can cut costs further.
The context window is the maximum number of tokens (prompt + output) a model can handle in a single call. Classic GPT-3.5 allowed 4K or 16K tokens, GPT-4 family now ships 128K, Claude 4.5 supports 200K, and Gemini 1.5 scales to 1M. If your input plus expected output exceeds the limit, the request is rejected or silently truncated. This calculator shows a usage bar per model so you can see whether a long prompt still fits.
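The fit check behind the per-model usage bar reduces to a single comparison. A sketch, assuming the same 85% warning threshold this calculator uses:

```python
def context_fit(input_tokens, output_tokens, context_window, warn_frac=0.85):
    """Classify a call as 'ok', 'warn' (past the warning threshold), or 'over' the limit."""
    used = input_tokens + output_tokens
    if used > context_window:
        return "over"   # request would be rejected or truncated
    if used > warn_frac * context_window:
        return "warn"   # still fits, but little headroom left
    return "ok"

print(context_fit(300, 500, 128_000))        # short prompt on a 128K model
print(context_fit(110_000, 10_000, 128_000)) # long prompt approaching the limit
```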
The tool uses a char-per-token heuristic calibrated for English prose: GPT and Gemini at about 4 chars per token, Claude at 3.5, Llama at 3.8, Mistral at 3.9. It cross-checks with a word-based estimate (1.3 tokens per word) and takes the larger of the two. For typical articles the error is within 5 percent. Code, minified JSON, base64 blobs, and languages like Chinese or Japanese push the error higher — in those cases use the provider's official tokenizer for critical numbers.
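The dual estimate described above, a character-based guess cross-checked against a word-based one, amounts to roughly this (ratios per the calibration listed above; they are heuristics, not tokenizer output):

```python
def estimate_tokens(text, chars_per_token=4.0, tokens_per_word=1.3):
    """Heuristic token estimate: take the larger of the char-based and word-based guesses."""
    char_est = len(text) / chars_per_token
    word_est = len(text.split()) * tokens_per_word
    return round(max(char_est, word_est))

print(estimate_tokens("a" * 1200))   # 1,200 chars of prose-like text -> ~300 tokens
print(estimate_tokens("hi " * 100))  # many short words: word estimate dominates
```

Taking the max of the two guesses guards against text with unusually short words, where a pure char-per-token ratio undercounts.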
No. Every calculation runs locally in your browser. The tool loads zero external scripts or fonts, stores nothing in cookies, and never uploads your prompt to a server. You can safely paste proprietary prompts, draft code, or unreleased content.
This calculator estimates how many tokens your prompt will consume on major large language models and then converts that into an API cost estimate. Paste any text into the box and the tool shows token counts, characters, words, and the tokens-per-word ratio. The model table lists GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-3.5 Turbo, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4, Gemini 1.5 Pro and Flash, Llama 3.1 70B, and Mistral Large with per-1M-token prices, context window usage bars, and split input and output costs. Adjust expected output tokens and number of API calls to get a scenario total. Example: a 1,200-character prompt resolves to roughly 300 tokens; with 500 output tokens on GPT-4o mini across 100 calls the total cost lands near $0.035. A yellow bar warns when prompt plus output exceeds 85% of the context window.