Anthropic Claude API Pricing Guide 2026: Haiku, Sonnet,…

Anthropic's Claude API has a straightforward four-tier model hierarchy, but the pricing differences between tiers are large enough that picking the wrong one for a given workload can multiply your monthly bill by 5x or more. This guide covers every current Claude model, works through realistic cost scenarios with real numbers, and explains when the savings levers, specifically prompt caching and the Batch API, are worth the engineering effort.

The Model Hierarchy and What Each Tier Is For

Claude's lineup in 2026 runs from Haiku at the budget end through Sonnet as the workhorse, up through Opus for demanding tasks, and into Fable 5 (also branded as Mythos 5) at the frontier tier. The pricing spread between Haiku and Fable is dramatic.

Model	Input (per MTok)	Output (per MTok)	Best For
Claude Haiku 4.5	$1.00	$5.00	High-volume, latency-sensitive, simple tasks
Claude Sonnet 4.6	$3.00	$15.00	Balanced: code, reasoning, multi-turn chat
Claude Opus 4.5-4.8	$5.00	$25.00	Complex reasoning, long-form generation
Claude Fable 5 / Mythos 5	$10.00	$50.00	Frontier capability, research, hardest tasks

A quick note on Haiku 3.5: it has been retired from Anthropic's first-party API. If you are still running it, you can access it through AWS Bedrock or Google Cloud Vertex AI, but it is no longer available directly. Haiku 4.5 is the current fast-and-cheap option.

All current Claude models support a 200,000 token context window, with some configurations extending to 1 million tokens. The context window itself is not separately priced, but longer contexts mean more input tokens per call, so longer prompts cost more regardless of which model you use.

Three Realistic Cost Scenarios

Rather than quoting theoretical numbers, here is what these models actually cost at production-scale usage patterns.

Scenario 1: Customer support chatbot. Average call is 500 tokens input and 200 tokens output. Volume is 10,000 calls per day. Using Haiku 4.5: input cost is 500 times 10,000 divided by 1,000,000 times $1.00, which equals $5.00 per day. Output cost is 200 times 10,000 divided by 1,000,000 times $5.00, which equals $10.00 per day. Total is $15.00 per day, or roughly $450 per month. For a customer support context, this is a manageable number. The same workload on Sonnet 4.6 would cost $45 per day, or $1,350 per month. The quality delta between Haiku and Sonnet for straightforward FAQ-style support is usually not worth 3x the cost.

Scenario 2: Code review assistant. Average call is 3,000 tokens input and 800 tokens output. Volume is 1,000 calls per day. Using Sonnet 4.6: input cost is 3,000 times 1,000 divided by 1,000,000 times $3.00, which equals $9.00 per day. Output cost is 800 times 1,000 divided by 1,000,000 times $15.00, which equals $12.00 per day. Total is $21.00 per day, or $630 per month. Code review is a task where Sonnet's instruction-following and reasoning justify the step up from Haiku. Running this on Haiku would save about $14 per day, but code review errors have real downstream costs.

Scenario 3: Complex document analysis. Average call is 10,000 tokens input and 2,000 tokens output. Volume is 100 calls per day. Using Opus 4.8: input cost is 10,000 times 100 divided by 1,000,000 times $5.00, which equals $5.00 per day. Output cost is 2,000 times 100 divided by 1,000,000 times $25.00, which equals $5.00 per day. Total is $10.00 per day, or $300 per month. For 100 calls per day, the absolute dollar amount is manageable at the Opus tier. The tradeoff changes quickly if volume scales up. At 1,000 calls per day, you are looking at $3,000 per month for the same workload.

How Prompt Caching Changes the Math

Anthropic's prompt caching is one of the most significant cost reduction tools in the API, and it is underused by most teams. The mechanism: when you write content to the cache, reads from that cache cost 0.10x the base input price, a 90% discount. There are two write tiers: a 5-minute cache write costs 1.25x base input, and a 1-hour cache write costs 2.0x base input.

Here is a concrete example. Say you have a system prompt that runs 2,000 tokens and you are making 50,000 calls per day with Sonnet 4.6. Without caching, the system prompt alone costs 2,000 times 50,000 divided by 1,000,000 times $3.00, which equals $300 per day. With a 5-minute cache write (cache fills once per 5-minute window, reads the rest of the time): the write cost is 2,000 times some small number of writes per day times $3.75 per MTok, which is negligible. The read cost for the bulk of calls is 2,000 times 50,000 divided by 1,000,000 times $0.30 (10% of $3.00), which equals $30 per day. That is $270 per day in savings from one change.

For the customer support chatbot in Scenario 1, where the system prompt is stable across all calls, caching the system prompt at 1,000 tokens would reduce input costs by roughly 80%. The total monthly bill drops from $450 to under $200, depending on average cache hit rates across sessions.

The main constraint: cached content must be at the beginning of your prompt (system prompt position), and the same content must appear across many calls to justify the write cost. For workloads with stable system prompts, this is almost always a net win.

Batch API: 50% Off Everything

Anthropic's Batch API applies a 50% discount to all models for asynchronous workloads. Results are returned within 24 hours. For anything that does not need a real-time response, such as data enrichment, nightly analysis pipelines, content moderation queues, or classification jobs, the Batch API halves your costs with minimal code change.

The Scenario 3 document analysis workload above, at $300 per month for Opus 4.8, would cost $150 per month via the Batch API if the use case permits a delayed response. For background pipelines, that is an easy decision.

How Anthropic Compares to OpenAI by Tier

The comparison varies significantly depending on which tier you are evaluating.

Use Case Tier	Anthropic Option	Cost (Input/Output)	OpenAI Comparable	Cost (Input/Output)
Budget	Haiku 4.5	$1.00 / $5.00	GPT-4o-mini	$0.15 / $0.60
Mid-range	Sonnet 4.6	$3.00 / $15.00	GPT-5.4	$2.50 / $15.00
Premium	Opus 4.8	$5.00 / $25.00	GPT-5.4	$2.50 / $15.00

The budget tier comparison is stark. Haiku 4.5 at $1.00/$5.00 is 6 to 8 times more expensive than GPT-4o-mini at $0.15/$0.60 per million tokens. If you are running a high-volume classification or extraction workload and cost is the primary constraint, GPT-4o-mini is the obvious starting point. Haiku 4.5 competes on a different basis: strong instruction following, excellent multi-turn conversation, and consistently high quality on complex short-context tasks.

At the mid-range, Sonnet 4.6 and GPT-5.4 are almost identically priced. The input cost difference is $0.50 per million tokens, and the output cost is identical at $15.00. At this tier, the selection decision should be driven by task performance, not price. See our comparison of cross-provider routing strategies for more on making that call programmatically.

At the premium tier, OpenAI's GPT-5.4 undercuts Opus 4.8 significantly, at half the price on both input and output. If you need the best available reasoning and are price-sensitive, GPT-5.4 is worth testing against Opus before committing to Opus pricing.

The frontier tier comparison is harder because the models serve different use cases. Claude Fable 5 at $10.00/$50.00 is positioned for tasks where the most capable model available is a hard requirement, and cost is secondary. The OpenAI equivalent in that tier (GPT-5.5 at $5.00/$30.00) is cheaper, but benchmark comparisons at this level are still emerging.

When to Choose Anthropic Over OpenAI

This is not a purely cost-based decision. Claude Sonnet and Haiku consistently score well on instruction following, code generation, and multi-turn conversation, particularly for tasks that require the model to maintain context across many turns without drifting. If your application involves long conversations, document-grounded Q&A, or structured data extraction from complex text, Claude's performance characteristics often justify the price premium over the cheapest OpenAI equivalent.

The practical approach: run your actual workload on both providers using a sample of 200 to 500 representative inputs. Score outputs on your quality metrics. Then apply the pricing math. You will often find that the total cost-adjusted quality tradeoff favors one provider clearly for your specific use case. For more detail on making this decision systematically, see the LLM model routing guide.

Tracking Costs Across Providers

PromptUnit provides per-call cost tracking across Anthropic, OpenAI, and Google, so you can see exactly what each workload costs at the model level without manually tracking token counts. It also surfaces caching opportunities, specifically identifying high-frequency calls with stable prefixes that would benefit from cache writes.

If your OpenAI or Anthropic bill has grown faster than you expected, PromptUnit gives you the granular data to find out where the cost is coming from and what to do about it.

For a step-by-step plan to act on these numbers, see How to Reduce Claude API Costs: A Practical 5-Step Guide, which walks through model tier audits, prompt caching, the Batch API, and output length control with the specific savings each step produces.

Anthropic Claude API Pricing Guide 2026: Haiku, Sonnet, Opus, and What Each Tier Actually Costs