Why Gemini 3.5 Flash Eats Your Tokens (and How Google Just Fixed It)

June 30, 2026 5:49 AM

Gemini 3.5 Flash token drain and the Flash Low fix

---Advertisement---

If you have used Gemini 3.5 Flash and watched your token balance vanish, you are not imagining it. Google’s newest fast model is brilliant at coding and agentic work, but it has a habit of burning through tokens at a shocking rate, in some cases faster and more expensively than the pricier Pro model it was supposed to undercut. The good news is that Google just shipped a fix. Here is what is going on and what changed.

Let me explain why Gemini 3.5 Flash eats tokens, what real developers found, and how the new Flash Low option and a full quota reset are meant to solve it.

Why Gemini 3.5 Flash burns tokens so fast

Gemini 3.5 Flash launched generally available on May 19, 2026, with a 1 million token input window and frontier level performance for its tier. The catch is how it thinks. Flash generates a lot of internal reasoning tokens to reach its answers, which means a single task can quietly consume far more than you expect. Independent comparisons suggested it could run roughly 2 times more expensive than the older Gemini 3.1 Pro, and far more than the previous Gemini 3 Flash, on the same kind of work. For a model marketed as the cheap and fast option, that stung.

What real developers actually found

This is where it gets painful, and very human. On the Google AI developer forum, one user said Gemini 3.5 Flash “wreaked havoc” on their codebase and they had to buy 60,000 extra tokens just to recover work lost over several days. On r/google_antigravity, a widely shared benchmark claimed Flash was about 2 times more expensive than 3.1 Pro and around 5 times more than Gemini 3 Flash in practice. My favorite line came from another developer who joked that the model is called Flash because of how fast it drains your tokens, not just how fast it responds. When you are paying per million tokens, that adds up quickly.

How Google is fixing the token drain

Google heard the complaints and responded on two fronts. First, it introduced a new lighter variant, Gemini 3.5 Flash Low, which generates roughly 45 percent fewer tokens than the standard model by spending less effort on simple tasks. The original model has effectively been renamed Gemini 3.5 Flash Medium to make the difference clear. Second, Google reset the usage quota counters across both free and paid plans, giving developers a clean slate after weeks of unexpected burn.

There was a wrinkle worth knowing. An earlier low effort version of the model cut token use by about 45 percent but caused a sharp drop in output quality on harder, more analytical tasks. Google says it patched that specific blind spot in the refined version, so the goal now is fewer wasted tokens without the quality hit. If you tried an early low effort build and hated it, it is worth testing again.

What this means for you

If you use Gemini 3.5 Flash for simple, repetitive work, switch to Flash Low and you should see your token bill drop noticeably. Save the standard Flash Medium for genuinely complex tasks where the extra reasoning earns its keep. And if you were burned before the quota reset, check your dashboard, because your counters were likely refreshed. For heavier reasoning jobs, the bigger question is whether to wait for the flagship, which I covered in our guide to the Gemini 3.5 Pro release date. The delay behind that model is part of the same quality tuning story I broke down in our piece on the Gemini 3.5 Pro delay.

For the official details, see Google’s Gemini 3.5 developer documentation and Android Authority’s coverage of the Flash Low fix. If you want everyday AI tools that do not eat your wallet, check our roundup of the best free AI apps for Android.

Gemini 3.5 Flash FAQ

Why does Gemini 3.5 Flash use so many tokens?

It generates a lot of internal reasoning to reach answers, so a single task can cost more than expected, sometimes more than the older 3.1 Pro on the same work.

What is Gemini 3.5 Flash Low?

A lighter variant that generates about 45 percent fewer tokens by spending less effort on simple tasks. The original is now called Flash Medium.

Did Google reset Gemini token limits?

Yes. Google reset quota counters across free and paid plans alongside a performance patch aimed at fixing the token drain and quality issues.

The bottom line

Gemini 3.5 Flash is fast and capable, but its hunger for tokens caught a lot of users off guard. With the new Flash Low variant, the Flash Medium rename, and a full quota reset, Google is trying to make the cheap model actually feel cheap again. If you pick the right variant for each task, you get the speed without watching your tokens evaporate.

Your turn: how bad was your token bill?

Now be honest in the comments. Did Gemini 3.5 Flash quietly drain your tokens too, and how much did it cost you before you noticed? Do not just scroll past, share your experience below so others know what to watch for. Have you tried the new Flash Low variant yet, and did it actually cut your usage? And if you switched to a different model over this, tell us which one and why, because plenty of readers are weighing the exact same move.