o3-mini Is Now More Expensive Than o3. Here Is What That Changes.

After OpenAI's 80% price cut on o3, the 'mini' model now costs more per token than the full model. This flips the usual routing logic for reasoning tasks entirely.

o3-mini costs $1.10 per million input tokens and $4.40 per million output tokens. o3 costs $0.40 per million input tokens and $1.60 per million output tokens. The "mini" model costs 2.75x as much as the full model on both input and output. This is not a typo or a temporary pricing anomaly. OpenAI cut o3's prices by 80% in 2025, from $2.00/$8.00 to $0.40/$1.60, while o3-mini did not see a comparable reduction. The result is a pricing structure where the smaller model is the more expensive one, which inverts the routing logic most teams built when o3-mini launched.
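As a quick sanity check on those ratios, here is a minimal sketch. The prices are the ones quoted in this article and may be out of date; the `PRICES` table and the `price_ratio` helper are illustrative names, not part of any SDK.

```python
# USD per million tokens, as quoted in this article (verify before relying on them).
PRICES = {
    "o3": {"input": 0.40, "output": 1.60},
    "o3-mini": {"input": 1.10, "output": 4.40},
}

def price_ratio(model_a: str, model_b: str, kind: str) -> float:
    """Return model_a's per-token price divided by model_b's for 'input' or 'output'."""
    return PRICES[model_a][kind] / PRICES[model_b][kind]

if __name__ == "__main__":
    for kind in ("input", "output"):
        ratio = price_ratio("o3-mini", "o3", kind)
        print(f"o3-mini / o3 {kind} price ratio: {ratio:.2f}x")
```

Both lines print a ratio of 2.75x with the prices above.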

If your application currently routes "simple reasoning tasks" to o3-mini and "hard reasoning tasks" to o3, your cost optimization is running backwards. You are paying a premium for the less capable model.

Understanding Why This Happened

OpenAI's pricing strategy for reasoning models has evolved faster than most teams' routing logic. When o3-mini launched, it was the affordable entry point to o3-class reasoning. The full o3 model was priced at $2.00/$8.00, so o3-mini at $1.10/$4.40 cut per-token costs by roughly 45% for workloads that did not need the full model's capability. At that pricing, the routing decision was obvious: use o3-mini unless the task genuinely required o3's top performance.

The 80% price cut on o3 changed the calculus entirely. At $0.40/$1.60, o3 is now not only cheaper than o3-mini, it is cheaper than many non-reasoning models that teams might have considered as alternatives. For context: o3 at $1.60/M output is cheaper per output token than Claude Haiku 4.5 ($5.00/M output) and within roughly 3x of GPT-4o-mini's output price ($0.60/M output). The performance profiles of these models are dramatically different, which means the value-per-dollar math for reasoning tasks has shifted substantially in o3's favor.

The Reasoning Token Factor

Both o3 and o3-mini use internal chain-of-thought reasoning that does not appear in the visible output but is billed as tokens. This is important for cost calculations because a complex math or coding problem can generate 2,000 to 5,000 reasoning tokens or more internally before producing the final response. These tokens are charged at the model's listed output token rate.

This means the effective cost per call for hard reasoning tasks is higher than the listed price suggests. For a problem that generates 3,000 reasoning tokens plus 500 visible output tokens, at o3's pricing ($1.60/M output): (3,000 + 500) / 1,000,000 * $1.60 = $0.0056. At o3-mini's pricing ($4.40/M output): (3,000 + 500) / 1,000,000 * $4.40 = $0.0154. The reasoning token overhead amplifies the price difference between the two models. The harder the task, the more reasoning tokens generated, and the larger the absolute cost gap between o3 and o3-mini.
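The arithmetic above can be sketched as a small helper. This is an illustrative calculation, not an official billing formula: it assumes reasoning tokens are billed at the listed output rate, as described above, and the function and parameter names are made up for this example. Input tokens are set to zero to match the worked example, which only counts output-side cost.

```python
def call_cost_usd(
    input_tokens: int,
    reasoning_tokens: int,
    visible_output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one call in USD.

    Reasoning tokens are added to visible output tokens and charged
    at the output rate (the assumption stated in the text above).
    """
    billed_output = reasoning_tokens + visible_output_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000

if __name__ == "__main__":
    # 3,000 reasoning tokens + 500 visible output tokens, prices as quoted in the article.
    o3 = call_cost_usd(0, 3_000, 500, 0.40, 1.60)
    o3_mini = call_cost_usd(0, 3_000, 500, 1.10, 4.40)
    print(f"o3:      ${o3:.4f}")       # $0.0056
    print(f"o3-mini: ${o3_mini:.4f}")  # $0.0154
```

In a real workload you would take the reasoning token count from the usage data your provider returns rather than estimating it.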

For simpler tasks that generate fewer reasoning tokens, the gap is smaller in absolute terms but the ratio holds. o3-mini costs 2.75x more per reasoning token than o3 regardless of how many tokens are generated.

Performance Comparison

Cost aside, o3 outperforms o3-mini meaningfully on hard benchmarks. On AIME (advanced math competition problems), competitive programming problems, and multi-step logical deduction, o3's performance advantage is significant. On simpler tasks, the gap narrows, but o3 still holds its own and now costs less.

The practical question is not whether o3 is better than o3-mini (it is) but whether o3 is good enough for your task type compared to cheaper alternatives. For some moderate reasoning tasks, GPT-4o or GPT-5.4 mini ($0.75 input / $4.50 output) may be competitive on quality and cheaper in total cost depending on your output length requirements. For tasks that genuinely require deep reasoning, o3 is now both the quality leader and the better value.

A nuanced consideration is latency. o3-mini may offer lower latency in certain configurations than full o3, particularly when reasoning token count is constrained. If your application has hard real-time latency requirements (under 2 seconds for first token), o3-mini might still be worth evaluating. But for the vast majority of async and near-real-time use cases, the latency difference does not justify the 2.75x cost premium.

A Practical Routing Framework for Reasoning Tasks

Given the current pricing, here is how to think about routing across OpenAI's reasoning and non-reasoning models. For hard math problems, competitive programming, formal verification tasks, and complex multi-step logical deduction, route to o3. It delivers the best performance and is now the cheapest reasoning model in OpenAI's lineup.

For moderate coding assistance, explanations of complex technical concepts, and single-step reasoning tasks where the answer can be verified quickly, consider GPT-5.4 mini at $0.75/$4.50 as an alternative. It is not a reasoning model in the o3 sense, but for tasks that do not require deep chain-of-thought reasoning, it is competitive on quality. Its listed output price is slightly higher than o3-mini's ($4.50 vs. $4.40), so whether it is cheaper per call depends on how many fewer output tokens it is billed for on your workload.

For simple reasoning tasks, intent classification with a reasoning component, or tasks where you were previously using o3-mini as a more capable alternative to GPT-4o-mini, re-evaluate whether o3-mini is necessary at all. The correct comparison is now o3 for genuine reasoning needs and cheaper non-reasoning models for everything else.

o3-mini should only be your choice when you have a specific, validated latency requirement that o3 cannot meet, and you have confirmed on your actual workload that o3-mini's latency profile is materially better. That is a narrow set of circumstances.
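The framework above could be expressed as a small rule-based router. This is a sketch under assumptions: the task categories, the `choose_model` function, and the `gpt-5.4-mini` identifier string are placeholders for illustration (check your provider's model list for the real identifiers), and the latency flag should only be set after you have measured it on your own workload.

```python
from enum import Enum

class TaskKind(Enum):
    HARD_REASONING = "hard_reasoning"  # hard math, competitive programming, formal verification
    MODERATE = "moderate"              # coding help, technical explanations, single-step reasoning
    SIMPLE = "simple"                  # classification and other light tasks

# Placeholder model identifiers; assumptions for this sketch.
REASONING_MODEL = "o3"
LOW_LATENCY_REASONING_MODEL = "o3-mini"
MODERATE_MODEL = "gpt-5.4-mini"
SIMPLE_MODEL = "gpt-4o-mini"

def choose_model(kind: TaskKind, validated_latency_need: bool = False) -> str:
    """Pick a model following the routing framework described in the text.

    o3-mini is returned only for reasoning tasks with a latency requirement
    that has been validated against o3 on the real workload.
    """
    if kind is TaskKind.HARD_REASONING:
        return LOW_LATENCY_REASONING_MODEL if validated_latency_need else REASONING_MODEL
    if kind is TaskKind.MODERATE:
        return MODERATE_MODEL
    return SIMPLE_MODEL

if __name__ == "__main__":
    print(choose_model(TaskKind.HARD_REASONING))
    print(choose_model(TaskKind.HARD_REASONING, validated_latency_need=True))
    print(choose_model(TaskKind.SIMPLE))
```

Keeping the model names in constants rather than inline strings makes the next pricing change a one-line edit.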

Updating Existing Routing Logic

If you have routing rules in production that direct "reasoning tasks" to o3-mini and "non-reasoning tasks" to standard models, you need to update them. The standard update is: replace o3-mini with o3 in any routing rule where you were using o3-mini for its reasoning capability. The cost goes down and the quality goes up. Apart from the latency caveat discussed earlier, there is no tradeoff here.

For teams using rule-based routing, this is a one-line change in a configuration file. For teams using learned routing or cost-based optimization, you may need to update the cost table that your routing layer uses to reflect current pricing. An outdated cost table will continue to route toward o3-mini as the "cheaper" option based on stale numbers.
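To make the stale-cost-table failure concrete, here is a minimal sketch of a cost-based selector run against two tables: one with o3's pre-cut price and one with the prices quoted in this article. The `cheapest_model` function and the token counts are hypothetical; a real routing layer would also weigh quality and latency.

```python
# (input, output) prices in USD per million tokens, as quoted in this article.
STALE_COSTS = {"o3": (2.00, 8.00), "o3-mini": (1.10, 4.40)}    # before the o3 price cut
CURRENT_COSTS = {"o3": (0.40, 1.60), "o3-mini": (1.10, 4.40)}  # after the o3 price cut

def cheapest_model(cost_table, candidates, input_tokens, billed_output_tokens):
    """Return the candidate with the lowest estimated cost for one call."""
    def estimated_cost(model):
        input_price, output_price = cost_table[model]
        return (input_tokens * input_price + billed_output_tokens * output_price) / 1_000_000
    return min(candidates, key=estimated_cost)

if __name__ == "__main__":
    candidates = ["o3", "o3-mini"]
    # Hypothetical call: 1,000 input tokens, 3,500 billed output tokens.
    print(cheapest_model(STALE_COSTS, candidates, 1_000, 3_500))    # o3-mini
    print(cheapest_model(CURRENT_COSTS, candidates, 1_000, 3_500))  # o3
```

The selection logic is identical in both calls; only the table differs, which is why the fix is a data update rather than a code change.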

The broader lesson is that LLM pricing changes frequently enough that any hardcoded cost assumptions in routing logic require regular review. The OpenAI API cost calculator and pricing guide is a useful reference, but prices change and that reference may lag current rates. The cross-provider LLM routing guide covers the tooling needed to keep routing logic synchronized with current provider pricing automatically.
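One low-tech way to enforce that review is to store a "last verified" date next to each price and flag entries that have gone too long without a check. The sketch below is an assumption about how you might structure this, not a description of any particular tool; the `PriceEntry` type, the 30-day threshold, and the dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class PriceEntry:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    last_verified: date  # when someone last checked this against the provider's pricing page

def stale_models(table: dict, today: date, max_age_days: int = 30) -> list:
    """Return the names of models whose prices were last verified before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, entry in table.items() if entry.last_verified < cutoff)

if __name__ == "__main__":
    # Hypothetical verification dates, for illustration only.
    table = {
        "o3": PriceEntry(0.40, 1.60, date(2025, 8, 20)),
        "o3-mini": PriceEntry(1.10, 4.40, date(2025, 3, 1)),
    }
    print(stale_models(table, today=date(2025, 9, 1)))  # ['o3-mini']
```

Running a check like this in CI or a scheduled job turns "review pricing regularly" into something that fails loudly when it is skipped.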

This inversion of o3 vs o3-mini pricing is also a useful reminder of why defaulting to a specific model without revisiting that default as prices change is a costly habit. The model that was the right default six months ago may not be today.

PromptUnit maintains a current pricing table for all major providers and automatically surfaces cases where your active routing rules are directing traffic to a more expensive model when a cheaper, equivalent or better option exists. When a pricing change like the o3 price cut creates a new dominant routing option, the platform flags it.

Stop routing reasoning tasks to o3-mini by default. Benchmark your actual workload on o3 today, and if quality is equal or better at well under half the per-token price, the routing change is straightforward. Start with PromptUnit to audit which models your current reasoning tasks are hitting and what they are costing you.
