
How a SaaS Team Cut Their AI Bill From $12K to $6.9K, Without Changing a Line of Product Code

A B2B analytics team was spending $12,000 per month on AI API calls. After 14 days in observation mode, PromptUnit identified that 68% of their requests didn't need GPT-4o. Monthly spend dropped to $6,960, a 42% reduction, with zero impact on output quality.

llm cost optimization · ai cost reduction case study · model routing · gpt-4o cost · ai infrastructure savings

This analysis is based on a real account that ran through PromptUnit's 14-day observation period. Company details are anonymized. All figures reflect actual logged API calls and per-token pricing as of April 2026.


A B2B product analytics company came to PromptUnit spending $12,000 per month on AI API calls. They were using GPT-4o for everything — feature extraction, user session summarization, anomaly classification, natural-language query answering, and a handful of complex reasoning tasks in their backend pipeline.

The engineering team had tried switching some endpoints to GPT-4o-mini six months earlier. Quality dropped on two features, a customer complained, and they reverted everything back to GPT-4o within a week. After that, the policy became "GPT-4o everywhere" and nobody touched it again. This is one of the most common patterns we see. The hidden costs go further than the invoice suggests.

By the time they connected to PromptUnit, their monthly AI spend had grown from $4K to $12K over eight months, tracking almost exactly with their user growth. The assumption was that this was just the cost of scale — AI is expensive, and there was nothing to do about it except find a cheaper provider.

That assumption was wrong.

What 14 days of observation showed

PromptUnit's observation mode logs every API call and runs the routing engine in shadow mode. It classifies requests and decides what it would have routed them to, but sends them to the original model unchanged. No production risk. No quality impact. Just 14 days of signal.

After two weeks, the traffic breakdown looked like this:

| Task type | Share of calls | Share of cost |
|---|---|---|
| Classification (intent, labels, flags) | 31% | 18% |
| Extraction (parse fields, structured output) | 22% | 14% |
| Summarization (session recap, feature digest) | 15% | 11% |
| Q&A / natural language query | 19% | 28% |
| Complex reasoning (anomaly analysis, multi-step) | 13% | 29% |

The top three categories — classification, extraction, summarization — made up 68% of all calls and 43% of the total cost. These were being handled by GPT-4o at $2.50 input / $10.00 output per million tokens. This 60–68% routeable share matches what we found when we analyzed 10,000 GPT-4o calls across different production workloads — the pattern is remarkably consistent.

The routing engine's recommendation: move those three categories to GPT-4o-mini ($0.15 input / $0.60 output) for extraction and classification, and to Claude Haiku 4.5 ($1.00 input / $5.00 output) for summarization tasks where the output needed to be more coherent. The quality case for this split is well-established in benchmarks and in production data.
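The shape of the savings follows directly from the per-million-token prices above. Here is a minimal sketch of that arithmetic; the token volumes are hypothetical placeholders, since the article gives prices but not the team's actual token counts:

```python
# Per-million-token prices quoted in the article (USD, input/output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku-4.5": (1.00, 5.00),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a month of traffic, volumes in millions of tokens."""
    p_in, p_out = PRICES[model]
    return input_mtok * p_in + output_mtok * p_out

# Hypothetical volume: 300M input / 60M output tokens of classification
# and extraction traffic in a month.
before = monthly_cost("gpt-4o", 300, 60)      # 750 + 600 = 1350.0
after = monthly_cost("gpt-4o-mini", 300, 60)  # 45 + 36 = 81.0
print(f"before=${before:,.0f}  after=${after:,.0f}  saved={1 - after / before:.0%}")
```

On these prices, any classification or extraction token moved from GPT-4o to GPT-4o-mini costs about 6% of what it did before, which is why even a partial traffic shift moves the invoice so much.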

Keep Q&A on GPT-4o. Keep complex reasoning on GPT-4o.

Why the earlier GPT-4o-mini attempt failed

The team's previous attempt failed because they applied the switch at the API level — one endpoint, one model, all requests. That endpoint handled both simple session summaries and complex multi-step queries from power users. When a complex query hit GPT-4o-mini, the output degraded visibly.

PromptUnit routes per request, not per endpoint. The same endpoint can receive a simple extraction request (routes to GPT-4o-mini) and a complex reasoning query (stays on GPT-4o) in the same minute. The routing decision is made on the content of each request, not on which URL it came from.
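The per-request idea can be sketched in a few lines. This is illustrative only: PromptUnit's actual classifier is not public, so the keyword heuristic below is a stand-in that just shows the shape of the decision, keyed off request content rather than endpoint:

```python
# Toy per-request router. The classify() heuristic is a hypothetical
# stand-in for a real task classifier; the routing table mirrors the
# article's recommendation.
ROUTES = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "summarization": "claude-haiku-4.5",
    "qa": "gpt-4o",
    "complex": "gpt-4o",
}

def classify(prompt: str) -> str:
    """Stand-in task classifier (keyword heuristic, illustrative only)."""
    p = prompt.lower()
    if "label" in p or "categorize" in p:
        return "classification"
    if "extract" in p or "parse" in p:
        return "extraction"
    if "summarize" in p:
        return "summarization"
    if "analyze" in p or "step by step" in p:
        return "complex"
    return "qa"

def route(prompt: str) -> str:
    """Pick a model from the request content, not the endpoint."""
    return ROUTES[classify(prompt)]

# Same endpoint, same minute, different models:
route("Summarize this session")             # -> "claude-haiku-4.5"
route("Analyze this anomaly step by step")  # -> "gpt-4o"
```

The point of the sketch is the signature: `route()` takes the request body, not a URL, so simple and complex traffic on one endpoint can land on different models.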

The classification that failed before wasn't wrong — routing classification tasks to cheaper models does work. The mistake was routing all tasks on an endpoint to a cheaper model.

The routing configuration

After the observation period, the team enabled routing with the default Balanced setting. The final configuration:

  • Classification tasks — GPT-4o-mini. Confidence threshold: 90. Domain lock: none.
  • Extraction tasks — GPT-4o-mini. Confidence threshold: 88. Domain lock: none.
  • Summarization tasks — Claude Haiku 4.5. Confidence threshold: 85. Domain lock: none.
  • Q&A tasks — GPT-4o. No downgrade.
  • Complex reasoning — GPT-4o. No downgrade. (DeepSeek V4 Pro flagged as candidate for future routing — pending internal benchmark.)
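The list above is effectively a declarative routing table. One way it might look as config is sketched below; the field names are hypothetical (the article does not show PromptUnit's actual config schema), but the values come straight from the bullets:

```python
# Hypothetical config representation of the routing table above.
# Field names are assumptions; models and thresholds are from the article.
ROUTING_CONFIG = {
    "mode": "balanced",
    "rules": [
        {"task": "classification", "model": "gpt-4o-mini", "confidence_threshold": 90},
        {"task": "extraction", "model": "gpt-4o-mini", "confidence_threshold": 88},
        {"task": "summarization", "model": "claude-haiku-4.5", "confidence_threshold": 85},
        {"task": "qa", "model": "gpt-4o"},                 # no downgrade
        {"task": "complex_reasoning", "model": "gpt-4o"},  # no downgrade
    ],
}

def model_for(task: str) -> str:
    """Look up the configured model; unknown tasks fall back to GPT-4o."""
    for rule in ROUTING_CONFIG["rules"]:
        if rule["task"] == task:
            return rule["model"]
    return "gpt-4o"  # safe default: keep unrecognized traffic on the original model
```

Defaulting unknown tasks to the original model is the conservative choice: a routing miss costs money, not quality.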

No product code was changed. The base URL in their OpenAI client config changed from api.openai.com/v1 to api.promptunit.ai/api/proxy/openai. That was the entire migration.
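For a team on the official OpenAI Python SDK, that swap can be a one-line environment change, since the SDK reads `OPENAI_BASE_URL` when the client is constructed. A minimal sketch, assuming that SDK:

```python
import os

# The entire "migration": point the OpenAI client at the proxy instead of
# api.openai.com. No product code changes; the SDK picks this up at
# client construction time.
os.environ["OPENAI_BASE_URL"] = "https://api.promptunit.ai/api/proxy/openai"

# Equivalent explicit form (sketch; requires the openai package installed):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.promptunit.ai/api/proxy/openai")
```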

The numbers

After 30 days with routing live:

| | Before | After |
|---|---|---|
| Monthly AI spend | $12,000 | $6,960 |
| Savings | | $5,040 (42%) |
| PromptUnit fee (20% of savings) | | $1,008 |
| Net monthly savings | | $4,032 |
| Annualized net savings | | $48,384 |

Quality flags in the first 30 days: zero customer complaints, zero support tickets related to AI output. The team's internal quality threshold — set at 87 out of 100 for their account — was met on 97.3% of requests. The 2.7% that fell below threshold were automatically escalated back to GPT-4o by the routing engine.
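The escalation path for that 2.7% can be sketched as a simple try-then-retry loop. The scorer below is a placeholder, since the article does not describe how PromptUnit computes its quality score:

```python
# Sketch of the auto-escalation described above: a routed response scored
# below the account's quality threshold (87/100 for this team) is retried
# on the original model. score() is a hypothetical stand-in.
QUALITY_THRESHOLD = 87

def handle(prompt, cheap_call, fallback_call, score):
    """Try the cheaper model first; escalate to GPT-4o if quality falls short."""
    response = cheap_call(prompt)
    if score(response) >= QUALITY_THRESHOLD:
        return response
    return fallback_call(prompt)  # the 2.7% path: back to GPT-4o

# Toy scorer: treat an empty response as low quality.
result = handle(
    "Summarize this session",
    cheap_call=lambda p: "",                  # cheap model comes up empty
    fallback_call=lambda p: "gpt-4o answer",  # escalation target
    score=lambda r: 95 if r else 0,
)
```

The user never sees the below-threshold response; the escalated GPT-4o answer is what ships, which is why the flagged 2.7% produced no tickets.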

What the complexity breakdown actually looked like

One of the more useful outputs of the observation period was the task complexity distribution. Across 14 days and roughly 840,000 API calls:

  • Low complexity (classification, extraction, summarization): 68% of calls, 43% of cost
  • Medium complexity (standard Q&A): 19% of calls, 28% of cost
  • High complexity (multi-step reasoning, agentic tasks): 13% of calls, 29% of cost

The high-complexity slice — 13% of calls — was consuming 29% of the cost. That 13% stayed on GPT-4o. The other 87% of traffic was evaluated for cheaper routing, and the 68% in the low-complexity bucket ultimately moved.

This distribution is consistent with what we see across most production AI workloads: the majority of calls are simpler than the product team assumes, because the product was built when expensive models were the only reliable option. The habit of routing everything to GPT-4o made sense in 2023. It costs real money in 2026.

The quality question

The concern that stopped the team from trying again after their first failed attempt was quality. They had a data point — the earlier GPT-4o-mini rollback — and that data point was blocking further experimentation.

The observation period resolves this directly. Before enabling routing, you can see exactly which of your requests the router would have changed, and what model they would have gone to. You can audit specific examples. You can run the Balanced setting for another two weeks in shadow mode and compare outputs side by side before committing to anything.

The team ran one additional shadow week after the observation period ended. Spot-checking 50 randomly selected requests that would have been routed to GPT-4o-mini showed no meaningful difference in output quality on classification and extraction tasks. That was enough to enable.

The compounding effect

Once routing is live, the system continues learning. Every request produces a quality signal — did the routed response meet the threshold? Was there a retry? Did the same request pattern produce a different quality outcome on a different model? Those signals feed back into the routing decisions over time.

After 90 days, the per-org routing model has seen enough of the team's specific request patterns to make more accurate predictions than a general-purpose classifier. The longer the system runs, the better the routing. The savings trajectory is not flat — it improves as the signal set grows.

Starting your own observation period

The 14-day observation period is free, requires no commitment, and makes no changes to your production traffic. You connect one API key, change one base URL, and watch the data come in.

For most teams spending $3K or more per month on AI APIs, routing pays for itself on the first day it's enabled. The only cost of the observation period itself is the time to review the data and make the call.

Start your free observation period at promptunit.ai.

Start your 14-day observation period

See exactly how much you'd save before paying anything. Zero risk: if we save you $0, you pay $0.

Get started free →