AI Cost Observability
Stop guessing where
your AI budget goes.
Most teams know their total OpenAI bill. Nobody knows which feature is responsible for 60% of it. PromptUnit shows you, without adding a single line of logging code.
When AI costs start to matter, the first instinct is to check the OpenAI dashboard. It tells you total spend, total tokens, and a breakdown by model. What it cannot tell you is which part of your product is responsible for the majority of that spend: which feature, which team, which user segment, or which prompt template is driving the bill.
PromptUnit solves this at the proxy layer. Every request that flows through the proxy is automatically tagged with its task type (summarization, classification, reasoning, extraction), scored for token efficiency, and attributed to the feature that made the call. You get a real-time view of cost by feature, by model, by task, and by time period, without instrumentation work on your side.
The feature tagging system uses a single optional request header. Add x-promptunit-feature: chat-assistant to your requests and those calls appear as their own line item in the dashboard. Teams with multiple AI features typically discover within the first week that one feature, usually a chat or summarization flow, is responsible for 50–70% of total spend. That insight alone changes how engineering prioritizes optimization work.
Cost by feature
Tag requests with x-promptunit-feature and see cost broken down per feature: chat, summarizer, assistant, search, or whatever you name it.
Cost by model
See exactly how much each model costs you per day, week, and month, across all providers.
Cost by task type
Understand the composition of your traffic: how much is classification vs reasoning vs generation, and what each costs.
Token efficiency scoring
Every prompt is scored 0–100 for efficiency. You see which features have bloated prompts and exactly how to fix them.
Spend alerts
Set hourly and daily spend limits. Automatic circuit breakers prevent runaway costs from a single misbehaving feature.
Real-time dashboard
Live cost tracking with latency, token counts, and routing decisions for every request.
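The spend-limit circuit breaker described above lives server-side, but clients still need to handle the requests it rejects. The snippet below is a minimal sketch assuming the proxy answers a tripped limit with HTTP 429 and an optional Retry-After header; that response shape is an assumption for illustration, not documented PromptUnit behavior.

```javascript
// Decide how long a client should back off after a spend-limit
// rejection. The 429 + Retry-After behavior is an assumption for
// illustration, not documented PromptUnit API.
function backoffMs(statusCode, retryAfter, attempt) {
  if (statusCode !== 429) return 0;                  // not a limit trip
  if (retryAfter) return Number(retryAfter) * 1000;  // honor server hint
  return Math.min(60_000, 1000 * 2 ** attempt);      // else exponential, capped
}

// Usage around a proxied call (sketch):
// const res = await fetch(proxyUrl, { method: "POST", headers, body });
// if (res.status === 429) {
//   const wait = backoffMs(res.status, res.headers.get("retry-after"), attempt);
//   await new Promise((resolve) => setTimeout(resolve, wait));
// }
```

Honoring the server's hint first keeps clients from hammering the proxy while a limit is active; the exponential fallback only applies when no hint is given.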
Tagging is optional, but powerful
Add one header to see cost broken down by feature. Everything else works with zero instrumentation.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.promptunit.ai/api/proxy/openai",
  defaultHeaders: {
    "x-promptunit-key": process.env.PROMPTUNIT_API_KEY,
    "x-promptunit-feature": "chat-assistant", // optional tag
  },
});
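The same two headers work with a plain fetch call if you are not using the SDK. A small helper keeps feature tags consistent across call sites; only the x-promptunit-* header names come from the docs above, while the helper itself is our own convenience sketch.

```javascript
// Build the headers PromptUnit's proxy reads. Only the two
// x-promptunit-* header names come from the docs above; this
// helper and its shape are our own sketch.
function promptUnitHeaders(apiKey, feature) {
  const headers = {
    "content-type": "application/json",
    "x-promptunit-key": apiKey,
  };
  if (feature) headers["x-promptunit-feature"] = feature; // optional tag
  return headers;
}

// Usage (sketch; the exact proxied path is an assumption):
// await fetch("https://api.promptunit.ai/api/proxy/openai/chat/completions", {
//   method: "POST",
//   headers: promptUnitHeaders(process.env.PROMPTUNIT_API_KEY, "summarizer"),
//   body: JSON.stringify({ model: "gpt-4o-mini", messages }),
// });
```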
Who this is for
AI cost observability is most valuable for teams shipping multiple AI-powered features into production. If your product has a chat assistant, a summarization pipeline, a classification engine, and a search feature all running on LLM APIs, you need per-feature cost attribution to make informed trade-offs about where optimization effort pays off.
Teams spending $2,000 or more per month on AI APIs typically find that the observability dashboard pays for itself within the first two weeks, either by identifying a single high-cost feature that can be optimized, or by surfacing a prompt efficiency issue that was invisibly inflating token counts.