AI Cost Per User: The SaaS Unit Economics Metric You Are Probably Not Tracking

Your monthly LLM bill is the wrong number to watch. AI cost per monthly active user is the metric that predicts whether your margins hold as you scale.

Tags: saas unit economics, ai cost per user, llm cost management, saas margins

Most SaaS founders and engineering leads watch their total monthly LLM bill. It is the number on the invoice, it is what their finance team asks about, and it is what spikes when usage grows. But it is the wrong metric for understanding whether your AI features are sustainable. A $10,000 monthly LLM bill means nothing without knowing how many users generated it. The number that actually tells you whether your business is viable is AI cost per monthly active user, and if you are not tracking it, you are flying blind on one of the most important levers in your unit economics.

Why AI Cost Per MAU Is the Right Metric

The formula is simple: AI cost per MAU equals total monthly LLM spend divided by monthly active users. If you spend $8,000 per month on LLM calls and have 5,000 monthly active users, your AI cost per MAU is $1.60.

The reason this metric matters is that it connects your LLM spend directly to your revenue model. Most B2B SaaS products price between $20 and $100 per user per month. At $49 per seat, you can sustain roughly $2 to $5 per user per month in AI costs and maintain margins that are acceptable for a software business. At $99 per seat, you have more room, maybe $10 per user per month. If your AI cost per MAU exceeds 10% of your average revenue per user (ARPU), you have a unit economics problem that will get worse, not better, as you scale, unless you actively intervene.
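As a minimal sketch, the formula and the ARPU check can be written as two small functions. The function names are illustrative, not part of any library, and the 10% ceiling is the rule of thumb described above rather than a universal constant.

```python
def ai_cost_per_mau(total_llm_spend: float, monthly_active_users: int) -> float:
    """Total monthly LLM spend divided by monthly active users."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return total_llm_spend / monthly_active_users


def share_of_arpu(cost_per_mau: float, arpu: float) -> float:
    """AI cost per MAU as a fraction of average revenue per user."""
    if arpu <= 0:
        raise ValueError("arpu must be positive")
    return cost_per_mau / arpu


# Figures from the example above: $8,000 spend, 5,000 MAU, $49 per seat.
cost = ai_cost_per_mau(8_000.0, 5_000)
share = share_of_arpu(cost, 49.0)
over_ceiling = share > 0.10  # assumed 10% rule-of-thumb ceiling

print(f"${cost:.2f} per MAU, {share:.1%} of ARPU, over ceiling: {over_ceiling}")
```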

The practical danger is that a growing total bill can mask a worsening per-user cost. You add 500 new users and your LLM bill grows from $8,000 to $14,000. On the invoice, that reads as the price of growth. But linear growth from 5,000 to 5,500 users would have put the bill near $8,800. If those 500 new users are heavier users than your existing base, your cost per MAU has quietly climbed from $1.60 to about $2.55 ($14,000 divided by 5,500), and you are on a margin compression trajectory that compounds over time.
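One way to catch this early is to compare the blended cost per MAU with the marginal cost of the users you just added. The sketch below uses made-up figures and a hypothetical helper name; it only illustrates the arithmetic.

```python
def marginal_cost_per_new_user(
    old_spend: float, old_users: int, new_spend: float, new_users: int
) -> float:
    """Extra LLM spend divided by extra users between two months."""
    added_users = new_users - old_users
    if added_users <= 0:
        raise ValueError("new_users must be greater than old_users")
    return (new_spend - old_spend) / added_users


# Illustrative figures only (assumptions, not data from a real product).
old_spend, old_users = 8_000.0, 5_000
new_spend, new_users = 12_600.0, 6_000

blended_before = old_spend / old_users
blended_after = new_spend / new_users
marginal = marginal_cost_per_new_user(old_spend, old_users, new_spend, new_users)

print(f"blended before: ${blended_before:.2f}")
print(f"blended after:  ${blended_after:.2f}")
print(f"marginal cost of each new user: ${marginal:.2f}")
```

When the marginal figure sits well above the blended one, the newest cohort is heavier than the base, even if the blended number has barely moved.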

What Happens to Per-User Costs at Scale

Here is the part that surprises most teams: LLM costs do not benefit from the same economies of scale as hosting infrastructure. When a cloud-hosted application adds users, its fixed infrastructure costs spread across a larger base and cost per user drops. LLM API costs are nearly linear with usage. Ten times the users means roughly ten times the token spend, all else being equal.

This is a fundamental difference from traditional SaaS COGS. A database query that costs $0.0001 to run does not get cheaper just because you run it 10 million times per month. LLM tokens work the same way. The only mechanisms that actually reduce cost per user at scale are architectural: prompt caching, model routing, output compression, and semantic caching on repeated queries. These are engineering choices, not volume discounts.
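To make the architectural point concrete, here is a minimal sketch of an exact-match response cache. It is a deliberately simpler stand-in for semantic caching: it only catches prompts that are identical after whitespace and case normalization. The class name and the `fake_model` function are hypothetical; in a real system `call_model` would wrap your provider client.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """Exact-match cache keyed on a hash of the normalized prompt."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and lowercase so trivial variations share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(prompt)  # the only place tokens are paid for
        self._store[key] = result
        return result


paid_calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; records each billable request."""
    paid_calls.append(prompt)
    return f"answer to: {prompt}"


cache = PromptCache(fake_model)
first = cache.complete("How do I reset my password?")
second = cache.complete("how do I   reset my password?")
print(f"paid calls: {len(paid_calls)}, hits: {cache.hits}, misses: {cache.misses}")
```

Lowercasing the prompt is a design choice that suits FAQ-style queries; it would be wrong for case-sensitive inputs such as code.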

There is one exception worth noting. If your workloads are suitable for batch processing, both OpenAI and Anthropic offer a 50% discount on asynchronous batch requests for the models their batch APIs support. As your volume grows, more of your workload may become eligible for batching, giving you some downward pressure on per-unit costs. But this requires deliberate pipeline design; it is not an automatic benefit of growth.
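The effect of batching on the bill is simple arithmetic. The sketch below assumes a flat 50% discount on whatever fraction of spend can move to batch; the function name and the 30% batch fraction are illustrative assumptions.

```python
def blended_spend(
    monthly_spend: float, batch_fraction: float, batch_discount: float = 0.50
) -> float:
    """Monthly spend after moving a fraction of the workload to discounted batch calls."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    return monthly_spend * (1.0 - batch_fraction * batch_discount)


# Assumption: 30% of an $8,000 bill is latency-insensitive and can be batched.
spend_after_batching = blended_spend(8_000.0, batch_fraction=0.30)
print(f"${spend_after_batching:,.2f} per month after batching")
```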

Where the Cost Actually Hides

The most common mistake teams make is treating LLM spend as a single line item instead of attributing it to specific features and users. In practice, a small number of features usually drive a disproportionate share of spend.

Consider a product with four AI features: a smart search, a document summarizer, a code assistant, and an automated email drafter. The search might use GPT-4o-mini with short prompts and run 50,000 times per day. The document summarizer uses a larger model with 10,000-token contexts and runs 2,000 times per day. Despite running 25x less often, the summarizer likely costs more in absolute dollars because each call is far more expensive. If you are not tagging your API calls by feature and aggregating cost per feature, you cannot see this. You just see a total that is growing.
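A rough sketch of that comparison follows. The call volumes come from the example above; the token counts and per-million-token prices are placeholder assumptions chosen for illustration, so substitute your provider's current rates before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureProfile:
    name: str
    calls_per_day: int
    input_tokens: int            # average per call (assumed)
    output_tokens: int           # average per call (assumed)
    input_price_per_mtok: float  # USD per million tokens (placeholder)
    output_price_per_mtok: float

    def daily_cost(self) -> float:
        per_call = (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        return per_call * self.calls_per_day


search = FeatureProfile("smart search", 50_000, 400, 100, 0.15, 0.60)
summarizer = FeatureProfile("document summarizer", 2_000, 10_000, 500, 3.00, 15.00)

for feature in (search, summarizer):
    print(f"{feature.name}: ${feature.daily_cost():.2f} per day")
```

Under these assumptions the summarizer costs more than ten times as much per day as the search, despite far fewer calls.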

The fix is to tag every LLM API call with, at minimum, a feature identifier and a user identifier. Some LLM providers let you pass custom metadata or a user identifier with requests; where they do not, record the tags alongside token usage in your own logging layer. Aggregate these monthly and you will quickly see which features and which users drive your spend. It is not unusual to find that 5% of users generate 40% of LLM costs. That is not necessarily a problem, but it is information you need to make good decisions about pricing tiers, rate limits, and feature design.
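Once calls are tagged, the aggregation itself is small. The sketch below assumes each logged call is a dictionary with `user_id`, `feature`, and `cost_usd` keys; that record shape is an assumption for illustration, not a provider format.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def total_cost_by(records: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Sum cost_usd over call records, grouped by the given tag."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


def top_user_share(cost_by_user: Mapping[str, float], top_fraction: float = 0.05) -> float:
    """Fraction of total spend generated by the most expensive users."""
    if not cost_by_user:
        return 0.0
    costs = sorted(cost_by_user.values(), reverse=True)
    n_top = max(1, math.ceil(len(costs) * top_fraction))
    total = sum(costs)
    return sum(costs[:n_top]) / total if total else 0.0


# Hypothetical call log for one month.
call_log = [
    {"user_id": "u1", "feature": "summarizer", "cost_usd": 6.00},
    {"user_id": "u2", "feature": "search", "cost_usd": 1.50},
    {"user_id": "u2", "feature": "summarizer", "cost_usd": 0.50},
    {"user_id": "u3", "feature": "search", "cost_usd": 2.00},
]

by_feature = total_cost_by(call_log, "feature")
by_user = total_cost_by(call_log, "user_id")
print(by_feature)
print(f"top users' share of spend: {top_user_share(by_user):.0%}")
```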

A related mistake is using the same model for all users regardless of their usage tier. If you have a free tier and a paid tier, running both through Claude Sonnet 4.6 at $3 input and $15 output per million tokens means you are subsidizing free users with paid-tier infrastructure costs. Free or low-tier users often have simpler needs that a cheaper model handles adequately. Routing by task type and user tier is one of the highest-leverage interventions you can make once you have attribution data.
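A routing rule of this kind can start as a few lines of application code. The model names, tier names, and task list below are placeholders, and the policy itself is an assumption; which tasks a cheaper model handles well is something to verify against your own quality checks.

```python
CHEAP_MODEL = "small-model"    # placeholder, not a real model identifier
PREMIUM_MODEL = "large-model"  # placeholder, not a real model identifier

# Tasks assumed to be simple enough for the cheaper model.
SIMPLE_TASKS = {"search", "email_draft"}


def pick_model(user_tier: str, task: str) -> str:
    """Route by user tier first, then by task type."""
    if user_tier == "free":
        return CHEAP_MODEL
    if task in SIMPLE_TASKS:
        return CHEAP_MODEL
    return PREMIUM_MODEL


print(pick_model("free", "summarize"))  # free tier always gets the cheap model
print(pick_model("paid", "search"))     # simple task, cheap model
print(pick_model("paid", "summarize"))  # paid tier, complex task, premium model
```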

Setting Per-User Cost Budgets

Once you know your AI cost per MAU, the next step is setting a target and monitoring for breaches. A reasonable starting target for most B2B SaaS products in the $30 to $99 per month range is keeping AI COGS below 8% of ARPU, which leaves some headroom under the 10% danger line. At $49/month ARPU, that is $3.92 per user per month.

The way to operationalize this is to set per-user monthly spending limits in your application layer. Track cumulative token spend by user_id across the month. When a user approaches a threshold, either throttle AI feature access, route them to cheaper models, or flag the account for review. This is not punitive; it is financial hygiene. A user who generates $15 in LLM costs on a $49/month plan consumes about 31% of their subscription revenue in AI COGS alone, nearly four times the 8% target, before accounting for any other cost of serving them.
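A minimal in-memory sketch of such a limit is shown below. The class, the soft-limit ratio, and the three actions are illustrative assumptions; a production version would persist spend in a shared store and reset it at the start of each billing month.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict


class Action(Enum):
    ALLOW = "allow"
    DOWNGRADE = "route to cheaper model"
    THROTTLE = "throttle and flag for review"


class UserBudget:
    """Tracks cumulative LLM spend per user against a share-of-ARPU limit."""

    def __init__(self, arpu: float, target_share: float = 0.08, soft_ratio: float = 0.8) -> None:
        self.hard_limit = arpu * target_share         # e.g. 8% of ARPU
        self.soft_limit = self.hard_limit * soft_ratio
        self._spend: Dict[str, float] = defaultdict(float)

    def record(self, user_id: str, cost_usd: float) -> None:
        self._spend[user_id] += cost_usd

    def check(self, user_id: str) -> Action:
        spent = self._spend.get(user_id, 0.0)
        if spent >= self.hard_limit:
            return Action.THROTTLE
        if spent >= self.soft_limit:
            return Action.DOWNGRADE
        return Action.ALLOW

    def reset(self) -> None:
        """Call at the start of each billing month."""
        self._spend.clear()


budget = UserBudget(arpu=49.0)  # hard limit of about $3.92
status_start = budget.check("user_123")
budget.record("user_123", 3.20)
status_near_limit = budget.check("user_123")
budget.record("user_123", 1.00)
status_over_limit = budget.check("user_123")
print(status_start, status_near_limit, status_over_limit)
```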

Per-user limits also give you data for pricing decisions. If a significant portion of your users regularly hit limits, that is a signal that your pricing structure does not reflect the value you are delivering. If almost no users hit limits, your thresholds may be too generous, and you could be subsidizing heavy usage without noticing.

A concrete example of how this plays out: suppose you have 500 MAU generating $800 per month in LLM costs. Your AI cost per MAU is $1.60, your ARPU is $30, and the ratio is 5.3%. That is acceptable. You add 4,500 new users. If usage per user stays uniform, your total bill grows to $8,000 per month and your per-MAU cost is still $1.60. But if your growth is coming from a customer segment that uses your document processing feature heavily, and that feature was not costing much with 500 users but now runs at scale, the bill could reach $15,000 and the per-MAU cost $3.00, which is 10% of ARPU. In both cases the total bill grew, but only the ratio tells you whether growth is making you more or less profitable per user.
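Tracking the ratio over time can be as simple as the sketch below, which computes cost per MAU and its share of ARPU for a series of snapshots and flags any that exceed a target. The snapshot figures and the 8% target are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    label: str
    mau: int
    llm_spend: float


def arpu_share_trend(
    snapshots: List[Snapshot], arpu: float, target_share: float = 0.08
) -> List[Tuple[str, float, float, bool]]:
    """Return (label, cost per MAU, share of ARPU, over target?) for each snapshot."""
    rows = []
    for snap in snapshots:
        cost_per_mau = snap.llm_spend / snap.mau
        share = cost_per_mau / arpu
        rows.append((snap.label, cost_per_mau, share, share > target_share))
    return rows


# Illustrative snapshots at a $30 ARPU.
history = [
    Snapshot("early", 500, 800.0),
    Snapshot("after heavy-usage growth", 5_000, 15_000.0),
]
trend = arpu_share_trend(history, arpu=30.0)

for label, cost_per_mau, share, over in trend:
    print(f"{label}: ${cost_per_mau:.2f} per MAU, {share:.1%} of ARPU, over target: {over}")
```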

PromptUnit provides per-user and per-feature cost attribution out of the box, letting you see exactly which parts of your application are driving spend and set alerts when per-user costs exceed defined thresholds. It works by intercepting API calls and attaching metadata before passing them through to the provider.

The teams that manage AI unit economics well are not the ones with the most engineering resources. They are the ones that started measuring the right thing early. Cross-provider routing and model selection are optimization levers, but they only work if you have the attribution data to know where to apply them.

Build the measurement layer first. Know your AI cost per MAU today, set a target, and track it weekly. Then optimize. Start by instrumenting your LLM calls at PromptUnit.
