
OpenAI Was Down on April 20. The Real Question Is Whether Your Users Noticed.

OpenAI's API platform, ChatGPT, and Codex went down on April 20, 2026 for hours. If your product broke with it, you have a single-provider problem. Here is the architecture fix.

llm proxy · ai inference proxy · production llm optimization · cross-provider llm routing · ai infrastructure cost

OpenAI's API platform, ChatGPT, and Codex went down on April 20, 2026, starting around 10:05am ET. At the peak, Downdetector logged over 8,700 reports in the UK and 1,900 in the US. Twelve major ChatGPT components and one critical Codex system showed degraded status. Users hit gateway timeouts, 403 Forbidden errors, and authentication failures for nearly three hours. OpenAI applied a mitigation around 11:45am ET, and availability returned to baseline by 12:48pm ET. No root cause was disclosed.

If you build on the OpenAI API and your product broke that morning, you have a single-provider dependency problem. If your product did not break, congratulations: someone on your team thought about failover before they had to.

This post is for the first group. The fix is not to switch providers. It is to architect your LLM stack so that any single provider going down for three hours does not turn into your worst customer-support week of the quarter.

What actually happened

Reports started spiking on Downdetector around 10:05am ET. The pattern was the usual mix: gateway timeouts on API requests, 403 errors on auth-related endpoints, login failures inside ChatGPT, and Codex sessions failing to start. By the time OpenAI posted to its status page, twelve services were already affected.

The mitigation went out around 11:45am ET (4:45pm BST), and ChatGPT availability returned to baseline by 12:48pm ET. The Codex partial outage took longer to fully resolve. Total impact window: roughly three hours for end users, longer for some API consumers depending on which endpoints they hit.

For context, this was not a freak event. Looking back at the last 18 months, every major LLM provider has had at least one multi-hour outage. Anthropic had a notable Claude API incident in February 2025. Google had a Gemini availability event in November 2025. OpenAI itself had a global outage in December 2024 that took down ChatGPT for over four hours. The base rate of "your single LLM provider goes down for 1 to 4 hours per quarter" is approximately 100%.

What single-provider failure costs

The dollar cost of an LLM outage is rarely about the failed API calls. Those are pennies. The cost is in second-order effects:

Customer-facing features that fail closed instead of degrading gracefully. A summarization feature that returns "an error occurred" looks broken. A summarization feature that says "long-form summary unavailable, here is the bullet-point version" looks like a feature (see the sketch after this list).

Support volume. Every customer who hits the broken feature opens a ticket, often two. In practice, three hours of API outage produces follow-up email volume that takes 30 to 60 days to die down.

Trust decay. Users do not separate "OpenAI is down" from "your product is broken." They remember the second one, and they bring it up the next time something glitches.

Retry storms that make the recovery worse. Naive exponential backoff without jitter, running inside thousands of clients that all converge on the recovering API, guarantees that the recovery is slower than it needed to be. The post-recovery retry storm often does more damage than the outage itself.
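The antidote is equally well known: cap the retry budget and add jitter so clients spread out instead of synchronizing. A minimal sketch, with `TransientError` standing in for whatever retryable failure your client surfaces:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for your client's retryable errors (429s, timeouts, 5xx)."""

def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # "Full jitter": sleep a random duration up to the exponential cap, so
    # thousands of clients do not converge on the recovering API in lockstep.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(make_request, max_attempts: int = 4):
    for attempt in range(max_attempts):
        try:
            return make_request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error instead of hammering
            time.sleep(backoff_with_jitter(attempt))
```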

The point is that the cost of single-provider dependency is not denominated in API tokens. It is denominated in customer trust, support load, and engineering time spent firefighting things that should have been graceful.
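To make the graceful-degradation point concrete: the summarization example above is a few lines of failover logic, not a redesign. A hypothetical sketch, where `long_form_summary` stands in for the LLM-backed call:

```python
class ProviderUnavailable(Exception):
    """Stand-in for the error your LLM client raises during an outage."""

def summarize(document: str) -> dict:
    try:
        # long_form_summary() is the assumed LLM-backed path (not shown).
        return {"kind": "long_form", "text": long_form_summary(document)}
    except ProviderUnavailable:
        # Fail open: cheap extractive fallback (first sentence of each paragraph).
        # Worse than the model's summary, but it reads as a feature, not a bug.
        bullets = [p.split(". ")[0] for p in document.split("\n\n") if p.strip()]
        return {
            "kind": "bullets",
            "text": bullets,
            "notice": "Long-form summary unavailable; showing key points.",
        }
```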

The architecture pattern

A reliable LLM stack has three properties that a single-provider stack does not.

First, it routes per request, not per service. Static failover (where you flip a feature flag from OpenAI to Anthropic when something breaks) takes minutes to hours of human attention. Per-request routing makes the failover decision automatic, in milliseconds, with no human in the loop. We covered the broader case for per-request routing in our guide to LLM model routing, and reliability is one of the strongest reasons to go that direction even if cost is not your primary motivation.
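In code, "per request, not per service" means the provider choice is a function evaluated on every call, not a constant set by a flag. A minimal sketch; the provider names and the `healthy` callable are stand-ins (a circuit breaker, sketched below, is one way to implement the health signal):

```python
PROVIDERS = ["openai", "anthropic", "google"]  # preference order

def pick_provider(healthy) -> str:
    # The failover decision runs in-process on every request: milliseconds,
    # no feature flag, no human in the loop.
    for name in PROVIDERS:
        if healthy(name):
            return name
    # Everything looks degraded: fall back to the primary and let
    # retry/backoff logic absorb the failures.
    return PROVIDERS[0]
```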

Second, it has a circuit breaker. When a provider's error rate crosses a threshold (we use 5% in our default config, with a 30-second sample window), traffic automatically routes to a backup provider. The circuit breaker prevents two failure modes: cascading retries against a struggling provider, and false-positive failovers caused by a single bad request. When the primary provider's error rate drops back below threshold, traffic resumes gradually, never as a thundering herd.
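A sketch of such a breaker, using the defaults named above (5% error rate over a 30-second window). The minimum sample size is what guards against the single-bad-request false positive, and the recovery ramp admits a growing fraction of traffic rather than the full load at once:

```python
import random
import time
from collections import deque

class CircuitBreaker:
    def __init__(self, threshold=0.05, window_s=30, min_samples=20, ramp_s=60):
        self.threshold, self.window_s = threshold, window_s
        self.min_samples, self.ramp_s = min_samples, ramp_s
        self.outcomes = deque()    # (timestamp, succeeded) within the window
        self.tripped_until = None  # set while the breaker is open or recovering

    def record(self, succeeded: bool) -> None:
        now = time.monotonic()
        self.outcomes.append((now, succeeded))
        while self.outcomes and self.outcomes[0][0] < now - self.window_s:
            self.outcomes.popleft()
        errors = sum(1 for _, ok in self.outcomes if not ok)
        # min_samples prevents one bad request from tripping the breaker.
        if len(self.outcomes) >= self.min_samples and errors > self.threshold * len(self.outcomes):
            self.tripped_until = now + self.ramp_s

    def allow(self) -> bool:
        if self.tripped_until is None:
            return True  # closed: this provider takes its normal traffic
        remaining = self.tripped_until - time.monotonic()
        if remaining <= 0:
            self.tripped_until = None
            return True
        # Recovering: admit a fraction of traffic that grows as the ramp
        # elapses, so the provider never sees a thundering herd on recovery.
        return random.random() < 1 - remaining / self.ramp_s
```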

Third, it speaks multiple provider dialects. OpenAI, Anthropic, Google, and Groq all have different request formats, different streaming protocols, different tool-use schemas, and different rate-limit semantics. If your application code can only emit OpenAI-format requests, your "failover to Anthropic" plan is a six-week migration, not an automatic decision. Dialect translation needs to live somewhere in the stack so the application never has to know which provider served the request.
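The translation itself is mechanical; what matters is that it lives in the proxy and not in application code. A simplified sketch of one direction (OpenAI chat format to Anthropic's Messages format), ignoring streaming and tool use:

```python
def openai_to_anthropic(request: dict) -> dict:
    # Anthropic takes the system prompt as a top-level field rather than a
    # message role, and requires max_tokens, which the OpenAI format leaves optional.
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    translated = {
        "model": request["model"],  # model-name mapping is handled elsewhere in the proxy
        "messages": [m for m in request["messages"] if m["role"] != "system"],
        "max_tokens": request.get("max_tokens", 1024),
    }
    if system_parts:
        translated["system"] = "\n".join(system_parts)
    return translated
```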

The pattern that combines these three is the inference proxy. We laid out the long-form case in what an AI inference proxy is and why engineering teams need one, and outages like April 20 are the most concrete reason it stops being optional.

The implementation, in practical terms

If you are starting from a single-provider setup today, the migration is shorter than it looks. The minimum viable reliability upgrade is:

  1. A second provider account with credit on file. Pick the one whose model lineup most closely matches your current OpenAI usage. Anthropic is the natural pair for most coding and tool-use workloads. Google or Groq are options depending on your latency and price targets.

  2. A routing layer in front of the application code. This can be a self-built reverse proxy, an open-source tool like LiteLLM, or a managed proxy service. The exact choice matters less than the fact that the layer exists.

  3. A health check loop that watches each provider's error rate and latency. Most teams under-build this part. The signal you want is not "is the provider's status page green," it is "are my requests succeeding right now." Status pages lag the actual incident by 10 to 30 minutes. (See the sketch after this list.)

  4. A failover policy that says, in plain language, what should happen when each provider degrades. Default to "route to backup provider, log the failover event, alert on duration over 5 minutes." Customize per workload as needed.

  5. Dialect translation, so the application never knows which provider served the request. If the application is OpenAI-format-native, the proxy layer translates outgoing requests and incoming responses to and from Anthropic, Google, and Groq formats.
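For steps 3 and 4 together, the health check and the policy can stay small. A sketch that measures your own requests instead of trusting status pages; the `probe`, `route_to_backup`, and `alert` callables are stand-ins for your client and paging setup:

```python
import time

FAILOVER_POLICY = {
    "error_rate_threshold": 0.05,  # route to backup above 5% probe failures
    "probe_interval_s": 10,
    "alert_after_s": 300,          # alert on failover duration over 5 minutes
}

def health_loop(probe, route_to_backup, alert, policy=FAILOVER_POLICY):
    # probe() issues a cheap real request against the primary and returns True
    # on success. This answers "are my requests succeeding right now," which
    # leads the provider's status page by 10 to 30 minutes.
    recent, failed_since, alerted = [], None, False
    while True:
        recent = (recent + [probe()])[-30:]  # roughly 5 minutes of signal
        error_rate = recent.count(False) / len(recent)
        if error_rate > policy["error_rate_threshold"]:
            if failed_since is None:
                failed_since = time.monotonic()
                route_to_backup()  # log the failover event here as well
            elif not alerted and time.monotonic() - failed_since > policy["alert_after_s"]:
                alert("failover active for over 5 minutes")
                alerted = True
        else:
            failed_since, alerted = None, False
        time.sleep(policy["probe_interval_s"])
```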

The total engineering cost of this setup is less than one engineer-week if you use an off-the-shelf proxy, and one to three engineer-weeks if you build it yourself. The cost of not having it is one bad outage away from being measured in churn.

The cost objection

The standard objection to multi-provider setups is cost. Two accounts, two minimum spends, two contract negotiations, two sets of rate limits. This was a real concern in 2023 and 2024. It is not a real concern in 2026.

Anthropic, Google, and Groq all run pay-as-you-go pricing with no minimum spend. Setting up a backup account costs zero dollars and one hour of engineering time. Token costs across providers have converged to within a factor of 2 to 3 on equivalent tasks (Claude Opus 4.7 at $5/$25 per million input/output tokens versus the new GPT-5.5 at $5/$30 is the latest data point), so the routing decision is no longer "the secondary provider is 10x more expensive." It is "the secondary provider is roughly the same cost, sometimes cheaper, and routing per request can capture the cheaper option as a side effect."

If anything, multi-provider routing tends to lower costs, not raise them, because it gives the routing layer more options to pick from. We covered this dynamic in our piece on cross-provider LLM routing.

What the April 20 outage should change

If you took a hit on April 20 and your team has been deferring this work, the calculus is now harder to defend. The base rate of provider outages is not zero. The cost of customer trust is not zero. The engineering work to add a second provider and a routing layer is finite and bounded.

The teams that did not notice the outage on April 20 were not lucky. They were architected for it.

How an inference proxy handles this

PromptUnit's Inferio engine has a circuit breaker layer (one of the 22 routing layers) that watches per-provider error rates and latencies in real time. When a provider's error rate crosses threshold, traffic for affected request types routes to Anthropic and Google automatically, with no customer code changes and no manual flip. After recovery, traffic resumes on the primary provider gradually, following the circuit breaker's recovery curve, never as a thundering herd. The SDK also fails over around the proxy itself: if the proxy becomes unavailable, traffic falls back to a direct provider call, so the application is never left without an inference path. Customers swap a base URL, run for 14 days in observation mode to see projected savings, then flip the switch. Pricing is 20% of verified savings, with no flat fee.
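The integration is the base-URL swap mentioned above. A hypothetical sketch using the official OpenAI Python SDK; the proxy endpoint and key shown here are illustrative, not documented values:

```python
from openai import OpenAI

# Before: client = OpenAI() talks straight to https://api.openai.com/v1.
# After: the same client talks to the proxy; application code is unchanged.
client = OpenAI(
    base_url="https://proxy.promptunit.ai/v1",  # illustrative endpoint
    api_key="YOUR_PROMPTUNIT_KEY",              # illustrative credential
)

response = client.chat.completions.create(
    model="gpt-5.5",  # during an outage, the proxy may serve this from a failover provider
    messages=[{"role": "user", "content": "Summarize this support thread."}],
)
print(response.choices[0].message.content)
```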

If your team had a rough April 20 morning and is still on a single-provider setup, that is the highest-leverage thing to fix this quarter. Start at promptunit.ai.

Start your 14-day observation period

See exactly how much you'd save before paying anything. Zero risk: if we save you $0, you pay $0.

Get started free →