
OpenAI Doubled the Frontier Price. Claude Opus 4.7 Took the Coding Lead a Week Earlier.

GPT-5.5 launched April 23 at $5/$30 per 1M tokens, double GPT-5.4. Claude Opus 4.7 shipped April 16 at $5/$25 with the SWE-Bench lead. Here is what changed for cross-provider routing.

llm cost optimization · cross-provider llm routing · anthropic api cost · openai api cost · model routing llm

OpenAI launched GPT-5.5 on April 23, 2026 at $5 per million input tokens and $30 per million output tokens. That is exactly double the rate of GPT-5.4. Seven days earlier, on April 16, Anthropic shipped Claude Opus 4.7 at $5 per million input and $25 per million output, and it scored 64.3% on SWE-Bench Pro and 87.6% on SWE-Bench Verified, beating every public OpenAI model on coding benchmarks.

For the first time in two years, the cheapest path to frontier-grade coding output runs through Anthropic, not OpenAI. If your routing logic still defaults coding traffic to OpenAI by reflex, the math on your bill changed this month.

This is not a tribal argument. It is a pricing and benchmark walk-through that any engineering team running real LLM volume should run before the end of the week.

What launched, when, and at what price

Claude Opus 4.7 shipped on April 16, 2026. Pricing held flat from Opus 4.5 at $5 per million input tokens and $25 per million output tokens. The 1M context window remained, vision support extended to 2,576-pixel images, and the headline numbers across coding, agentic reasoning, and knowledge benchmarks all moved up.

GPT-5.5 shipped on April 23, 2026. ChatGPT Plus, Pro, Business, and Enterprise got it first; the public API is described as "coming very soon" pending additional safety guardrails for serving at scale. Pricing landed at $5 per million input tokens and $30 per million output tokens, with a 1M context window. That is exactly 2x the price of GPT-5.4 on both axes. There is also a Pro variant at $30 input and $180 output, plus Batch and Flex tiers at half the standard rate and Priority at 2.5x.
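Those tiers compound with the base hike, so it is worth writing them down. A minimal sketch of the tier math, using only the rates quoted above (the Pro variant is its own SKU, not a multiplier on the standard rate):

```python
# GPT-5.5 pricing tiers as described above, in $ per 1M tokens.
# Batch and Flex run at half the standard rate; Priority at 2.5x.
STANDARD = {"input": 5.00, "output": 30.00}

tiers = {
    "standard": {k: v * 1.0 for k, v in STANDARD.items()},
    "batch":    {k: v * 0.5 for k, v in STANDARD.items()},
    "flex":     {k: v * 0.5 for k, v in STANDARD.items()},
    "priority": {k: v * 2.5 for k, v in STANDARD.items()},
    "pro":      {"input": 30.00, "output": 180.00},  # separate variant, not a multiplier
}

for name, rate in tiers.items():
    print(f"{name:>8}: ${rate['input']:.2f} in / ${rate['output']:.2f} out")
```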

For anyone tracking frontier API costs over time, this is the largest single-step price increase on a flagship OpenAI model in three years.

The benchmarks

On SWE-Bench Pro, the standard for autonomous software engineering tasks, Claude Opus 4.7 scored 64.3% against GPT-5.5's 58.6%. That is a 5.7-point Anthropic lead.

On SWE-Bench Verified, Opus 4.7 scored 87.6%, up from 80.8% on Opus 4.5. This is the strongest publicly reported number for autonomous coding agents on this benchmark.

On MCP Atlas, which tests Model Context Protocol fluency for multi-tool orchestration, Opus 4.7 scored 77.3%. That number is best-in-class across all generally available models.

GPQA Diamond, the graduate-level reasoning benchmark, gave Opus 4.7 a 94.2%. CharXiv visual navigation hit 79.5% at full resolution, up from 57.7% on Opus 4.5. Finance Agent landed at 64.4%, also state-of-the-art.

GPT-5.5 wins where it always has: agentic browsing, knowledge work, multimodal generation, and tasks where OpenAI's tooling stack already integrates deeply. On Terminal-Bench, GDPval, OSWorld, and τ2-bench, GPT-5.5 still leads. The split, in other words, is real and stable: Anthropic owns coding and tool orchestration; OpenAI owns the broader agentic and knowledge surface.

OpenAI's cost defense

OpenAI's public defense of the GPT-5.5 price hike is that the model uses about 40% fewer output tokens to complete the same Codex task as GPT-5.4. If that holds in production, the effective per-task cost increase is closer to 20% than 100%.
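The arithmetic behind that claim deserves to be explicit: double the per-token price multiplied by 0.6 of the output tokens is 1.2x the per-task spend, a 20% increase. A quick sanity check in code, treating OpenAI's 40% figure and the task size as assumptions, not facts:

```python
# Per-task output cost before and after, using the published rates and
# OpenAI's claimed 40% output-token reduction as an assumed input.
gpt54_output_rate = 15.00   # $ per 1M output tokens
gpt55_output_rate = 30.00   # $ per 1M output tokens (2x)

tokens_per_task_54 = 10_000                      # illustrative task size
tokens_per_task_55 = tokens_per_task_54 * 0.60   # the 40%-fewer-tokens claim

cost_54 = tokens_per_task_54 / 1e6 * gpt54_output_rate   # $0.150
cost_55 = tokens_per_task_55 / 1e6 * gpt55_output_rate   # $0.180

print(f"effective increase: {cost_55 / cost_54 - 1:.0%}")  # 20%
```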

That is a meaningful claim and worth taking seriously. But it is also the kind of claim that is true on average and false for plenty of individual workloads, and it does not change the per-token sticker price on the bill. If your traffic is dominated by long-output generation tasks where token efficiency drops, you will pay closer to the full 2x. If your traffic is dominated by terse tool-call sequences, you will pay closer to the 20%. The honest answer is that you have to measure your own workload before assuming you got the average.

The other point to acknowledge: GPT-5.5 is not yet on the API. Direct apples-to-apples cost comparison against Opus 4.7 in production will not be possible until the API ships. Until then, GPT-5.4 at $2.50/$15 is still your cheap-OpenAI option, and Opus 4.7 at $5/$25 is your top-coding-benchmark option.

What changed about cross-provider routing

Cross-provider routing has always been technically possible. The question has been whether it was worth the operational overhead. Different APIs, different rate limit behaviors, different failure modes, different prompt formats. We covered why the calculus has tilted toward yes for most production workloads in our piece on cross-provider LLM routing.

What this week did was sharpen the calculus on coding workloads specifically.

Before April 16, the default routing logic for a coding agent was: send everything to GPT-5.4, fall back to GPT-5.4 mini for cost, escalate to Opus 4.5 for hard cases. The default model and the cheap model both lived inside OpenAI's stack. Cross-provider routing was an option, not a requirement.

After April 23, the math looks different. The cheapest path to frontier-grade coding output is Opus 4.7 at $5/$25. The next-cheapest path is GPT-5.4 at $2.50/$15, but it is now two generations behind on coding benchmarks. The new flagship from OpenAI, GPT-5.5, costs more output-side than Opus 4.7 ($30 vs $25) and loses the coding benchmarks. So if your workload is coding-heavy and quality matters, you have one provider winning on benchmarks at lower output cost than the OpenAI flagship.
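To make the comparison concrete, here is the per-request math for a hypothetical coding call, say 8,000 input tokens and 2,000 output tokens. The task size is made up; the rates are the published ones above:

```python
RATES = {  # $ per 1M tokens (input, output), from the launch pricing above
    "claude-opus-4.7": (5.00, 25.00),
    "gpt-5.4":         (2.50, 15.00),
    "gpt-5.5":         (5.00, 30.00),  # API not yet live
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

for model in RATES:
    print(f"{model}: ${request_cost(model, 8_000, 2_000):.4f}")
# claude-opus-4.7: $0.0900
# gpt-5.4:         $0.0500
# gpt-5.5:         $0.1000
```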

This is the exact scenario cross-provider routing was built for, and it just became the dominant case for any coding-heavy production workload.

What to do with this

If you are running a coding agent, a code review pipeline, an IDE assistant, or any workflow where SWE-Bench-style task quality is the bottleneck, this week is the moment to add Anthropic as a primary route, not a fallback. The path looks like this, with a code sketch of the combined routing logic after the four rules:

Route to Opus 4.7: hard coding tasks, multi-file refactors, autonomous agent loops with five or more steps, tool-orchestration heavy work, and anything requiring deep reasoning over long context.

Route to GPT-5.4 mini: bounded coding tasks, classification, intent detection, structured extraction, and all the other categories we covered in our analysis of when the cheaper model wins after the GPT-5.4 mini release.

Hold off on GPT-5.5 routing until the API ships and you can run your own token-efficiency tests. The 40% fewer output tokens claim is plausible, but production workloads vary, and the 2x sticker price means you cannot afford to take the average on faith.

Keep GPT-5.4 in the mix for non-coding agentic work, knowledge synthesis, and tool-use tasks where OpenAI still leads on Terminal-Bench, OSWorld, and τ2-bench. That decision has not changed, and the routing logic should reflect it.
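Collapsed into code, those four rules look something like this. A minimal sketch; the task labels, the step threshold, and the classifier inputs are placeholders for whatever signals your pipeline already produces:

```python
def pick_model(task: str, steps: int = 1) -> str:
    """Route a request per the four rules above.
    Task labels and thresholds are illustrative, not a drop-in policy."""
    coding_heavy = task in {"hard_coding", "multi_file_refactor",
                            "tool_orchestration", "long_context_reasoning"}
    bounded = task in {"bounded_coding", "classification",
                       "intent_detection", "structured_extraction"}

    if coding_heavy or steps >= 5:
        return "claude-opus-4.7"   # frontier coding, agent loops, tool work
    if bounded:
        return "gpt-5.4-mini"      # cheap tier for bounded tasks
    # GPT-5.5 is deliberately absent until its API ships and you can
    # verify the token-efficiency claim against your own traffic.
    return "gpt-5.4"               # non-coding agentic and knowledge work
```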

The deeper pattern

There is a longer arc here that engineering teams should pay attention to. For two years, the default frontier model assumption was OpenAI, and routing decisions inherited that default. The result was the default-flagship trap we wrote about in the hidden cost of defaulting to GPT-4o in production: paying flagship rates for traffic that did not need flagship capability.

April 2026 is the first month where that default has visibly broken. The OpenAI flagship doubled in price. The Anthropic flagship took the coding crown without raising prices. The mini-tier OpenAI model approached flagship benchmarks at one-third the cost. Each of these shifts independently would justify revisiting routing logic. Together, they make any team that is still on a single-provider, single-model setup measurably overpaying.

The fix is not to switch providers. The fix is to route per request to the cheapest model that meets the quality bar for that specific task. The model lineup has gotten too good and too price-stratified for static defaults to keep working.

How a routing proxy handles this

PromptUnit's Inferio engine does cross-provider routing at the request level. When Opus 4.7 shipped on April 16, the routing weights updated within hours, and customer coding traffic that previously went to GPT-5.4 by default started landing on Opus 4.7 when the quality classifier flagged it as a coding-heavy request. Customers do not change code. They swap a base URL, run for 14 days in observation mode to see projected savings, and then flip the switch. Pricing is 20% of verified savings, with no flat fee.
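For a sense of what that base-URL swap looks like with the OpenAI Python SDK: the endpoint, key name, and the `model="auto"` convention below are illustrative assumptions, not documented PromptUnit API.

```python
from openai import OpenAI

# Same SDK, same call shape; only the base URL changes. The endpoint
# and the "auto" model name are assumptions for illustration.
client = OpenAI(
    base_url="https://api.promptunit.ai/v1",  # hypothetical proxy endpoint
    api_key="YOUR_PROMPTUNIT_KEY",
)

resp = client.chat.completions.create(
    model="auto",  # let the proxy's router pick per request (assumed convention)
    messages=[{"role": "user", "content": "Refactor this function..."}],
)
print(resp.choices[0].message.content)
```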

If your team is running a coding-heavy LLM workload and has not benchmarked Opus 4.7 against your current default, that is the highest-leverage thing you can do this week. Start at promptunit.ai.

Start your 14-day observation period

See exactly how much you'd save before paying anything. Zero risk — if we save you $0, you pay $0.

Get started free →