Compare 22 large language models by price, context window, max output and release date, then estimate monthly API cost for your own token volume.
Heads up: pricing is a snapshot from April 15, 2026, and can change. Figures are USD per 1M tokens. Open-weights models list reference prices from hosted-inference APIs, which vary by provider. Benchmark scores are the best publicly reported numbers from official model cards; treat them as a rough ranking, not an exact measurement. Always verify with the provider before committing budget.
Frequently asked questions
How do I choose a model?
Start with quality, context size, and cost. Chatbots: Claude Haiku 4.6, GPT-5.4 Nano, Gemini 3.1 Flash-Lite, DeepSeek V3.2. Agents and code: GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4, DeepSeek R2. Long documents (1M+ tokens): Claude Opus 4.6 / Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, Grok 4.1 Fast (2M), Llama 4 Scout (10M). Use the workload presets above as a starting point.
Which models are included?
Snapshot from April 15, 2026. Includes Claude Opus / Sonnet / Haiku 4.6, GPT-5.4 (Pro / Standard / Mini / Nano), Gemini 3.1 Pro / Flash / Flash-Lite, DeepSeek V4 and R2, Grok 4 and Grok 4.1 Fast, Llama 4.5 Behemoth and Maverick, Qwen 3.5 Plus, Mistral Large 3, and Phi-4.5. Always verify on provider pricing pages before committing.
Should I use a closed API or open weights?
Closed APIs (GPT, Claude, Gemini) are fastest to ship: zero infrastructure, frontier quality, and an SLA. Downsides: vendor lock-in, data-residency constraints, and output-token pricing. Open-weights models (Llama, Mistral, DeepSeek, Qwen, Phi) let you self-host for predictable cost and full data control. A common middle ground is hosted inference (Together, Groq, Fireworks, Bedrock) for open-weights checkpoints, as sketched below.
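One practical upside of that middle ground: many hosted-inference providers expose OpenAI-compatible endpoints, so moving an open-weights checkpoint between providers is often just a base URL and model-id change. A minimal sketch using the openai Python client; the base URL is Together's OpenAI-compatible endpoint, while the model id is a placeholder you would look up in the provider's catalog:

```python
# Hosted inference for open-weights models usually speaks the OpenAI-compatible
# chat API, so switching providers mostly means changing base_url and model id.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # or a Groq/Fireworks/self-hosted vLLM endpoint
    api_key="TOGETHER_API_KEY",              # placeholder; read from an env var in real code
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick",     # hypothetical id; verify in the provider catalog
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```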
Are the benchmark scores official measurements?
No. They are the best publicly reported scores from official model cards and vendor blog posts. Benchmarks get gamed as they age. Use them as a rough ranking, not a measurement. For a real decision, run your own small evaluation on a slice of your data, as in the sketch below.
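Such an evaluation can be as small as a loop over a few dozen real prompts. A minimal sketch, where ask_model is a stand-in for whatever client you use and cases.jsonl is a hypothetical file of {"prompt": ..., "expected": ...} records sampled from your traffic:

```python
# Tiny DIY eval: score a model on a slice of your own data.
import json

def run_eval(ask_model, path: str = "cases.jsonl") -> float:
    hits, total = 0, 0
    with open(path) as f:
        for line in f:
            case = json.loads(line)
            answer = ask_model(case["prompt"])
            # Exact-match scoring; swap in a regex or judge model for looser tasks.
            hits += int(case["expected"].strip().lower() in answer.lower())
            total += 1
    return hits / total  # accuracy on your slice, comparable across models
```

Run the same cases against each candidate model and compare the returned accuracy; even 50 cases usually separate models more reliably than published leaderboard deltas.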
This calculator puts 22 current LLMs from Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, and xAI into one interactive table. Every row shows input and output price per million tokens, context window, max output, license (open weights or closed API), and release month. Three modes cover the common workflows. Grid view is a sortable table with filters on provider, license, price ceiling, and minimum context. Head-to-head lets you tick 2 or 3 rows and renders a stacked card comparison with the winning metric highlighted in green and the losing one in red. By use case picks the top models for coding, reasoning, vision, longest context, lowest cost, fastest, or open weights only.

A cost calculator sits above the table: enter input tokens, output tokens, and calls per month, and every row recomputes a monthly dollar figure for your actual workload. Example: 2,000 input tokens plus 500 output at 10,000 calls per month on Gemini 3.1 Flash runs about $2.50, while Claude Opus 4.6 on the same load is roughly $975.
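For readers who want the arithmetic rather than the widget, a minimal sketch of the formula each row applies, in Python; the per-million-token prices below are illustrative placeholders, not figures from the table:

```python
# Monthly cost = (input_tokens * price_in + output_tokens * price_out) * calls / 1e6,
# where prices are USD per 1M tokens.

def monthly_cost(input_tokens: int, output_tokens: int, calls_per_month: int,
                 price_in: float, price_out: float) -> float:
    """Estimate monthly API spend in USD. Prices are per 1M tokens."""
    per_call = input_tokens * price_in + output_tokens * price_out
    return per_call * calls_per_month / 1_000_000

# The worked example from the text: 2,000 input + 500 output tokens, 10,000 calls/month.
# $0.10 / $0.10 per 1M tokens is a hypothetical cheap-tier price, not a quote.
print(monthly_cost(2000, 500, 10_000, price_in=0.10, price_out=0.10))  # 2.5
```

Plugging in a frontier model's much higher per-token prices is what pushes the same workload from a few dollars to hundreds per month.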