The Open Bar Is Closing: Thinking in Tokens, Not Just Hours

For two years, we consumed AI the way you use electricity in a loft: without ever looking at the meter. That era is ending.

The Open Bar Is Over

We could feel it coming. The models have become too efficient, too powerful, too expensive to operate for their use to keep being served as an all-you-can-eat buffet. The latest models can run for hours uninterrupted, produce results of astonishing precision, mobilize entire chains of agents — and every one of those seconds has a real cost somewhere. Anthropic, for example, positions Claude Opus 4.7 around long-horizon autonomous tasks, with field reports of 7- to 8-hour unattended sessions now part of normal testing.¹ On the provider side, OpenAI's inference costs are projected to jump from about $8.4B in 2025 to roughly $14.1B in 2026, and the company is tracking toward a ~$14B loss on $20B of annualized revenue.² The economics that kept the buffet open are running out of runway.

The new norm is no longer asking anything to anyone at any time. It's using the right model for the right task, giving it enough information to deliver the result without getting lost, and without ignoring the context in which we operate.

The time when you could start a conversation in the morning with one task and end the day with the same task, without thinking about what you consumed in between, is gone.

A New Way to Use Models

Using models today is a pragmatic exercise. Not ideological, not lazy. Pragmatic.

That means:

Understanding how much information the model actually uses to complete the task.
Making sure it uses the right tools — not all tools, not the most powerful by default, but the ones that make sense.
Getting closer to the metrics providers make available, to appreciate their depth.

The notion of token was already part of our technical vocabulary. But going forward, it will become part of our way of thinking, in the same way person-hours have always been part of how we estimate a project.

A token is not an implementation detail. It's the cost unit of a new class of intellectual work.

The Unit of Management Is Shifting

Before, our unit of management was time.

Time spent developing a feature. Time spent analyzing a client's needs. Time spent delivering value. All our management — projects, budgets, capacity, velocity — rested on that familiar unit.

That time hasn't disappeared. But it isn't alone anymore.

The transition between not using AI and using it well is also over. The comfortable learning pause is past. Today, we have to take into account two metrics in parallel:

The time we spend with the AI — because a human is always in the loop, orchestrating, validating, deciding.
The number of tokens the AI consumes to reach the goal we set.

It's the combination of these two metrics that will let us put a realistic cost on each of our interventions. One without the other gives a distorted view. A task can take 10 human minutes but consume $40 worth of tokens. Another can run for 2 hours in the background for $0.80. Both scenarios exist, and they don't follow the same management logic.

💡 The Reflex to Build

Before launching a task to an agent, ask yourself two questions: how long will this take me, and how many tokens will it burn? If you can't answer the second one, you're managing only half of the picture.

The End of Plans That Hide the Truth

The plans and licenses that let us predict our model usage will become less and less common. Not because providers are mean, but because the underlying economics no longer hold.

We'll likely keep paying for the tool that gives us access to the model — the IDE, the agent, the extension, the platform — without consumption being included in that subscription. We'll pay for Copilot. We'll pay for Claude Code. We'll pay the monthly fee for the tool. But what we do with it will depend on:

The number of tokens consumed.
The intensity of model usage.
And especially the choice of models activated behind the tool.

This is a structural shift. Organizations that negotiated global licenses and forgot about the topic until renewal will need to put real consumption governance in place. Not just a usage policy. Governance, with thresholds, alerts, per-team budgets, and regular trade-offs.

The market is already moving that way. Cursor retired its flat "fast request" quota in June 2025 and replaced it with a usage-based credit pool tied to actual token consumption.³ GitHub Copilot followed: on April 20, 2026, Microsoft paused new sign-ups for Copilot Pro, Pro+ and Student plans, tightened weekly limits, and pulled Opus models out of the Pro tier — while internal documents describe an accelerating plan to move all individual users to token-based billing, driven by a week-over-week cost of running Copilot that has nearly doubled since January 2026.⁴⁵ Even Anthropic, in April 2026, began testing the removal of Claude Code from its $20/month Pro plan for a slice of new prosumer sign-ups.⁶ The flat-fee era isn't just in question — it's actively being dismantled.

Educating People About Consumption

In the future — a very near future — there will be an education effort to make. For developers, but also for individuals at large.

An education about token consumption.

Just as we look at our electricity bill today, or in some regions our water bill, we'll look at our token bill. And with that habit will come a new skill: choosing the best model based on your own needs, your own usage patterns, and your own budget.

That assumes you can read what you consume. That you understand why a poorly framed prompt costs three times more than a precise one. That you realize an agent looping for an hour because it was misconfigured is the equivalent of leaving the air conditioner running all summer in an empty room.

As long as you don't read the bill, you don't change behavior. And as long as you don't change behavior, the bill keeps climbing.

Is This Really a Paradigm Shift?

On the other hand — let's be honest — are we really talking about a paradigm shift? Or are we talking about a paradigm we already know and just haven't applied to AI yet?

When you need a professional, you take the one most aligned with your need. You don't hire the best lawyer in the country to handle a fence dispute with your neighbor. You don't book an Olympic-grade strength coach for a sore shoulder.

You always choose the professional best suited to your need and your budget.

Why would it be different with AI? Why would we systematically send the most trivial task to the most powerful and most expensive model, just because we have a button that allows it?

Some organizations will always prefer to use the best of the best, all the time, because the advantage they get from it is so obvious that they refuse to worry about cost. That's a legitimate choice, and for some markets, it's probably the right one.

But the majority of organizations will need to develop AI management at the same level as the management of their other vendors: with selection grids, contracts, alternatives, periodic evaluations, and a sharp sense of value for money.

Sonnet or Opus? The Real Question

The question keeps coming up in teams: is there really a major, or even noticeable, difference between a more standard model like Sonnet and a premium model like Opus when you're doing a simple task?

The honest answer is nuanced. There's probably a slight advantage to using a bigger model, even on a simple task. A slightly more accurate phrasing, a slightly better-structured response, a marginally finer grasp of the context. But that's not always the case, and even when it is, the gap is often invisible to the naked eye.

So the real question isn't "is the bigger model better?". It's "is it better enough to justify paying the token premium, sometimes 5 to 10 times more, on this specific use case?"

The numbers make this concrete. As of April 2026, Claude Opus 4.7 is priced at $5 / $25 per million input/output tokens, while Sonnet 4.6 sits at $3 / $15 — roughly 1.7× at the API sticker price.⁷ But many organizations are still running on older contracts and legacy models: Claude 3 Opus was $15 / $75 per million tokens, an exact 5× premium over Sonnet.⁷ And if you compare a premium model to an entry-level one — say Opus 4.7 vs Haiku, or vs a small local model — the ratio can easily climb to 10× or more. The "5–10×" you hear in the field isn't hyperbole; it's what your invoice actually looks like when you default to the biggest model for every task.

For the vast majority of simple cases, the answer is no. We probably won't pay the premium for:

A translation of an email.
A proofread or grammar correction.
A clearer rewording of a paragraph.
A simple classification or short summary.
A mechanical refactor or a format transformation.

For these tasks, a base model — or even a small local one — does the job at a fraction of the cost. Keeping Opus for those cases is exactly like calling a surgeon to put on a band-aid: technically they can do it, but it's not a good allocation of the resource.

Conversely, as soon as you enter complex reasoning, architectural analysis, synthesis of ambiguous documents, or high-stakes decision-making, the gap between models becomes obvious — and paying the premium becomes equally obvious.

The trap isn't using a big model. It's using it by default, without asking the question.

Intelligence Has a Cost

Education about token consumption will happen through a natural parallel: humans.

It's artificial intelligence, but it's still intelligence. And intelligence, whether biological or silicon-based, has a cost.

You don't ask a principal researcher to reply to a routine email. You don't mobilize a $400-an-hour senior consultant to draft a meeting summary. You don't have a trivial contract reviewed by a partner at a top-tier law firm. Not because they couldn't do it — quite the opposite — but because the value produced doesn't justify the cost mobilized.

This intuition we all have, naturally, for human resources is exactly what we need to develop for AI models. It's the same reflex: match the quality of the intelligence mobilized to the quality of the intelligence required by the task.

And as with humans, this doesn't mean looking down on smaller models. It means using them for what they do well, and reserving premium models for what truly justifies them.

Concretely, Where to Start?

If this resonates, here are a few immediate actions:

Measure before judging. Before deciding whether AI is expensive or cheap, instrument it. How many tokens per developer per day? Which models are actually being used? Which tasks consume 80% of the budget? Most organizations have no idea.

Map tasks to the right model. Not every task deserves the same model. Trivial refactors, boilerplate generation, format transformation: a small local model or an entry-level model is enough. Architectural analysis, risky refactors, complex reasoning: that's where the premium model justifies itself.

Give the model what it needs, and nothing more. Context costs tokens. An entire project dumped into context "just in case" is wilful waste. Learn to provide the minimum relevant context. Your results will be better and cheaper.

Set up lightweight governance. Per-team budgets, overage alerts, monthly reviews. Not to police, to empower. A team that sees its consumption is a team that optimizes it.

Train teams to read their bill. Literally. Understanding the difference between input and output tokens, between hidden and explicit prompts, between a synchronous call and an agent that iterates. This economic literacy will become as fundamental as reading a cloud infrastructure cost.

What's Next?

The unit of management for intellectual work is shifting. Time remains, but it now comes with a second dimension: consumption. The organizations that will succeed in this transition aren't the ones with the biggest AI budgets. They're the ones that will have built a culture of measurement and choice.

The open bar was comfortable. But comfort has never been a sustainable business model.

Next time you launch an agent on a task, ask yourself: how much will this really cost? If you don't know how to answer, that's already the answer.

Sources

Footnotes

Anthropic Releases Claude Opus 4.7: A Major Upgrade for Agentic Coding, High-Resolution Vision, and Long-Horizon Autonomous Tasks — MarkTechPost; Claude Opus 4.7 Field Report: Eight Hours of Autonomous Work — dev.to. ↩
OpenAI Lost $5B on $3.7B Revenue: The AI Inference Cost Crisis — AI Automation Global; OpenAI's own forecast predicts $14 billion loss in 2026 — Yahoo Finance. ↩
Clarifying our pricing — Cursor blog, June 2025; Cursor Pricing Explained 2026 — Vantage. ↩
Changes to GitHub Copilot Individual plans — The GitHub Blog, April 2026; GitHub Copilot premium requests — GitHub Docs. ↩
Microsoft To Shift GitHub Copilot Users To Token-Based Billing, Tighten Rate Limits — Where's Your Ed At; Microsoft Brings Token-Based Billing to GitHub Copilot — Thurrott.com. ↩
Anthropic tests reaction to yanking Claude Code from Pro — The Register, April 2026. ↩
Anthropic API Pricing in 2026: Complete Guide — Finout; Pricing — Anthropic / Claude API Docs; Claude 3 Opus API Pricing — pricepertoken.com. ↩ ↩²