Field Notes
Zombie Caches and Stolen Keys: A Teardown of Two Runaway AI Bills
Reverse-engineering how Google's billing system broke from the shape of a BigQuery export — and an honest audit of which defenses fibon has built, and which one is still missing
Quick summary: two runaway Gemini API bills from the first half of 2026 — “deleted caches that kept billing” and “a stolen key that burned $82,000 in 48 hours.” Both look like exploding invoices; the failure mechanisms are completely different. What amplified the damage, though, is one and the same structural flaw: cloud billing is an open-loop system. The note ends, as always, with fibon: which bug classes have no surface to attach to here, and which defense we still haven’t built.
Skip this if: you don’t use any pay-as-you-go LLM API and have no curiosity about what a metering pipeline’s internals look like.
Incident one: the deleted cache that kept billing
On June 7, 2026, Brazilian developer Danilo Oliveira posted an SOS on the Google AI Developers Forum. His system ran analysis jobs using Gemini 3 Flash’s context caching. On the afternoon of June 6 he noticed the bill was wrong: after shutting down the script that created the caches and confirming via the official REST API that the cache list was completely empty, a billing SKU called “cached text storage token hours” kept charging him over 1,000 Brazilian reais per hour. By the early hours of June 7, the cumulative bill hit R$17,847 (several thousand US dollars). His last-resort tourniquet: disabling the Gemini API service for the entire Google Cloud project.
He did something valuable for everyone: he exported his billing data to BigQuery and posted the hour-by-hour breakdown. The shape of that data is more honest than any prose. It has three phases:
- The first two days (June 3 – midday June 5): a steady 4–5M token·hours, 20–30 reais per hour — the baseline of his script running normally.
- The runaway phase (from the afternoon of June 5): usage starts compounding, climbing all the way to 200 million token·hours per hour.
- The frozen phase (after killing the script on June 6): the hourly billed quantity locks at exactly 200.7142M token·hours — identical to four decimal places, charged like clockwork every hour, until he pulled the plug on the entire API.
Reverse-engineering the failure from the bill’s shape
To read this data you first need the billing model of explicit context caching. You upload a large block of text (say, a long document) as a cache; subsequent requests reference it instead of re-sending it. The price is a storage fee: cached token count × hours stored. Note the essential difference from a normal API call: a call is a one-shot event, while a cache is a stateful cloud resource that bills continuously — a rented storage unit with a meter that runs every hour until you move out.
So under normal operation, deleting a cache (or its TTL expiring) should stop the meter. From here on is my speculation — but the three-phase bill shape is nearly impossible to explain by any mechanism other than this one:
The resource plane and the billing plane are two separately-governed states. When you call the cachedContents list/delete API, you operate on the resource plane’s registry; billing runs on a different pipeline — periodically snapshotting “total tokens currently in storage” and multiplying by hours. At some point, deletion and expiry events stopped propagating to the billing plane:
- the runaway phase = the script was still creating new caches, but the old ones never disappeared from the billing plane, so the stock kept accumulating;
- the frozen phase = with the script off, nothing new was added; the zombie stock froze in place and became a fixed-amount hourly perpetual-motion charging machine;
- and the cruelest part: the user-facing API showed an empty list — the state you can see is clean, while the state being billed is invisible to you, and undeletable.
What gives the speculation real footing: this wasn’t the first time. In March 2026, another developer, Liz2k, reported the exact same pattern — she created five test caches with a 5-second TTL, the list query came back empty all day, yet her bill showed 3.4 million “storage hours,” then burned a flat $36 every day after. She called it the “infinite zombie cache.” She, too, ended up disabling the entire API — and then observed a crucial detail: about three days after the disable, the bill was retroactively corrected down to the true figures. In other words, reconciliation logic exists — but it apparently only triggers when the API is forcibly severed. The same bug class has publicly detonated at least twice within three months.
Incident two: the stolen key — $82,000 in 48 hours
The second incident dates back to February, but it only shows its full shape next to incident one. A three-person team in Mexico had their Google Cloud API key leaked. Between February 11 and 12, the thief used it to hammer Gemini 3 Pro image and text generation, racking up $82,314 in 48 hours — against the team’s normal monthly spend of $180. When they appealed to Google, they got the cloud industry’s standard answer: the “Shared Responsibility Model” — the platform protects the platform; the key is your problem.
The key leak was, of course, the team’s lapse. But what let $180 become $82,314 with nothing tapping the brakes along the way is structural:
- GCP’s Gemini API has no hard spending cap. A Budget Alert notifies you; it doesn’t stop anything — it’s an observability tool, not a control. Compare the prepaid-credit models at OpenAI and Anthropic: when the balance hits zero, service stops. Naturally capped.
- Billing signals lag. Billing exports can trail reality by 24 hours or more. By the time you see the anomaly, the money is gone.
- Google API keys start with
AIzain a fixed format, trivially harvested by scanners crawling public repos and frontend bundles. These keys were never designed to be high-value authentication credentials — until Gemini turned them into something directly convertible into money.
The common root cause: billing is open-loop
Two incidents — one a provider-side state machine breaking, the other a client-side credential failure — superficially unrelated. But what amplified both disasters a hundredfold is the same structural flaw:
There is no closed loop between the rate of spending and the authorization to spend.
Billing is an asynchronous, eventually-consistent aggregation pipeline. Every signal you can get — dashboards, budget alerts, BigQuery exports — lags by hours to days. And inside that lag window, no mechanism automatically connects “anomalous spend rate” back to “stop authorizing spend.” Control theory calls this an open-loop system: the throttle is pressed, but no sensor feeds back to the wheel. The victims of incident one and incident two were both left with the same manual brake: ripping out the entire API service.
What this means for fibon
Per this section’s convention, we end at home: can fibon withstand these two bug classes? The honest answer comes in three layers.
Layer one: the zombie-cache bug class has no surface to attach to in fibon — but that’s luck plus selection, not foresight. fibon’s prompt cache strategy (Deep Dive C has the full teardown) uses per-request cache_control breakpoints with a 5-minute TTL on Anthropic, and automatic prefix caching on OpenAI and Google. None of these mechanisms carries a separate storage billing SKU — fibon never holds any “stateful cloud resource that bills continuously,” so the failure type “an undeletable zombie resource” has nothing to attach to. When I chose automatic prefix caching over explicit caching, the reasons were engineering simplicity and sufficiency — not a premonition of this incident. The conclusion preceded the correct justification. Noted for the record; no credit claimed.
Layer two: fibon keeps an independent ledger outside the provider — that’s the capital for detection. fibon’s observability layer (Deep Dive A) already writes every LLM call’s token usage — including cache hits — into its own metrics. That means “what I believe I used” exists as a record that doesn’t depend on the provider. The victim of incident one needed three days and a manual BigQuery dig to spot the anomaly; with a daily “own ledger vs. provider bill” reconciliation job, this kind of thing pages you in hour two. The difference is R$50 versus R$18,000.
Layer three: key protection is a defense fibon’s architecture already built. fibon’s API keys live only on the server side; the frontend never touches them. The Worker that runs untrusted code is confined to an isolated network, with the design goal written in black and white: even if compromised, it cannot reach the API keys (Chapter 6). Incident two’s failure mode — keys sitting in frontend code or a public repo — has no path to occur under this architecture.
To close, four lines of defense for anyone using pay-as-you-go APIs, sorted by value for money:
- A billing-layer firewall — put high-risk workloads in a separate cloud project, and wire budget alerts to a function that detaches billing automatically (GCP’s own docs describe this pattern; it is the only true hard cap).
- Keys never leave the server — frontends and public repos only ever see a proxy endpoint.
- Keep your own ledger — record every call’s usage yourself and reconcile against the provider’s bill; don’t outsource “knowing what you spent” to the billing system.
- Avoid stateful, continuously-billed features — unless you truly need a “rented storage unit” service like explicit caching, use per-request alternatives. No state, no zombies.
You can never prevent a provider-side bug. But the blast radius is yours to draw.
Sources
- Google Gemini API cache billing bug — BlockTempo (2026-06-09, zh-TW)
- URGENT: Huge cost cache increase issue (2) — Google AI Developers Forum (2026-06-07)
- URGENT: Huge cost cache increase issue — the same bug class, three months earlier (2026-03-18)
- API key leak turns into a 48-hour nightmare — TechBang (2026-03-17, zh-TW)
- Dev stunned by $82K Gemini API key bill after theft — The Register (2026-03-03)
- Gemini API key thief racks up $82,314 in charges in just two days — Tom's Hardware (2026-03)