AI should feel like it’s getting cheaper. After all, compute costs fall, models get optimized, and every year brings new claims of a 10x drop in inference prices. But as Ethan Ding argues in *Tokens Are Getting More Expensive*, the opposite is true: the economics of AI subscriptions are in a squeeze.
Ding’s Core Argument
The paradox is simple. While yesterday’s models do get cheaper, users don’t want them. Demand instantly shifts to the latest frontier model, which always carries a premium. GPT-3.5 may cost a fraction of what it once did, but the market moved on to GPT-4, Claude 3, and beyond.
At the same time, token consumption is exploding. A task that once required 1,000 tokens now consumes 100,000, thanks to advances in reasoning, retrieval, and long-context processing. Unlimited-use subscription models can’t withstand this surge. As Ding shows with examples like Claude Code, even the most creative pricing experiments eventually collapse under runaway token demand.
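A back-of-the-envelope calculation makes the squeeze concrete. The prices below are purely illustrative placeholders, not Ding’s figures, but the arithmetic holds for any ratio where token growth outpaces price decline:

```python
# Illustrative sketch: per-token prices fall 10x, but tokens per task
# grow 100x (1,000 -> 100,000), so the cost per task still rises 10x.
old_price_per_token = 0.00002    # hypothetical legacy-model price, $/token
new_price_per_token = 0.000002   # hypothetical price after a 10x drop

old_tokens_per_task = 1_000      # simple single-pass Q&A task
new_tokens_per_task = 100_000    # reasoning/retrieval-heavy task

old_cost = old_price_per_token * old_tokens_per_task
new_cost = new_price_per_token * new_tokens_per_task

print(f"old task cost: ${old_cost:.2f}")   # $0.02
print(f"new task cost: ${new_cost:.2f}")   # $0.20 -- ten times higher
```

Cheaper tokens, more expensive tasks: that single inequality is the core of the subscription squeeze.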
He suggests three possible ways out:
- Usage-based pricing from the start.
- Enterprise sales where switching costs create defensible margins.
- Vertical integration, where inference is the loss leader for cloud and developer services.
Extending the Argument
Ding is right: flat-rate consumer subscriptions are unsustainable. But the future might not be a strict choice between usage-based and enterprise-only strategies. There are other avenues worth exploring:
- Hybrid models: Offer flat-rate tiers with defined token quotas, then metered billing for overages. This mimics mobile data plans and could ease users into variable pricing without shocking them with unpredictable bills.
- Freemium for light tasks: Everyday consumer use—chatting, drafting short notes—could remain “free” or bundled, while heavier research or agent-based workloads become paid tiers.
- Bundling with value-added services: Just as telecom bundles data with phones and streaming, AI providers could wrap agents with hosting, monitoring, or compliance features. This shifts the conversation from “pay for tokens” to “pay for outcomes.”
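The hybrid model above can be sketched as a simple billing function. All figures here are hypothetical plan parameters, chosen only to mirror the mobile-data analogy:

```python
def monthly_bill(tokens_used: int,
                 flat_fee: float = 20.0,            # hypothetical base tier
                 included_tokens: int = 5_000_000,  # hypothetical quota
                 overage_per_million: float = 3.0) -> float:
    """Hybrid plan: a flat fee covers a token quota; usage beyond the
    quota is metered, like overage charges on a mobile data plan."""
    overage = max(0, tokens_used - included_tokens)
    return flat_fee + (overage / 1_000_000) * overage_per_million

# A light user stays on the predictable flat fee...
print(monthly_bill(2_000_000))    # 20.0
# ...while a heavy user pays for the extra load: 20 + 20 * 3.0
print(monthly_bill(25_000_000))   # 80.0
```

The design point is that the provider’s downside is capped: no single subscriber can consume unbounded tokens at a fixed price, while light users keep the simple bill they expect.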
Another extension of Ding’s point concerns agentic AI behavior. As models increasingly operate in loops—planning, critiquing, and iterating—they consume tokens at rates orders of magnitude greater than simple Q&A interactions require. This means demand for compute is effectively unbounded. Any pricing model that assumes stable or predictable consumption is ignoring this reality.
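A toy model (all parameters hypothetical) shows why looping inflates usage so aggressively: each iteration re-reads a context that has grown from the previous passes, so consumption compounds rather than adds:

```python
def agent_tokens(base_tokens: int, iterations: int,
                 context_growth: float = 1.5) -> int:
    """Toy model of an agent loop: each plan/critique/revise pass
    re-processes a context that grows by `context_growth` per iteration.
    Both parameters are illustrative assumptions, not measured values."""
    total = 0
    context = float(base_tokens)
    for _ in range(iterations):
        total += int(context)       # tokens spent this pass
        context *= context_growth   # carried context keeps growing
    return total

single_turn = agent_tokens(1_000, 1)   # plain one-shot Q&A: 1,000 tokens
agent_run = agent_tokens(1_000, 12)    # a dozen agentic loops
print(agent_run // single_turn)        # hundreds of times more tokens
```

Even with these modest assumptions, twelve loops multiply token consumption by a few hundred—exactly the kind of growth curve that breaks any flat-rate plan.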
Takeaway for Builders
The dream of unlimited AI for $20 a month is exactly that: a dream. Token economics are forcing product teams to confront a hard truth: the marginal cost of AI isn’t vanishing, it’s multiplying as capabilities expand.
AI builders should look beyond consumer SaaS metaphors and study the pricing strategies of cloud infrastructure, telecom, and enterprise software. Ding’s framing of a “token short squeeze” is spot-on. The next challenge is designing models that align incentives across users, providers, and investors—before the squeeze becomes a choke.
Credit to Ethan Ding for sparking this discussion with his original article.