Demand for Intelligence Is Near-Infinite, at a Price

AI’s next bottleneck is not intelligence, but affordability

Jun 28, 2026

Hello,

For a long time after OpenAI made large language models a household phenomenon, AI felt like a new toy in town to play around with. It was only after newer models began consuming significantly more energy and resources that the reality of running into token limits dawned on many users.

Even in our editorial calls and conversations on AI forums, each new model prompted people to push it to its full potential and compare the results. Most of those debating AI models paid less attention to how each marginal improvement in a thinking model increased unit cost disproportionately, often for similar or only slightly better output.

Better thinking models meant you exhausted your daily token limit faster, with fewer queries and prompts. This changed user behaviour almost immediately. Even after I started hitting my daily limits more often, I still used AI, but I became more careful about choosing models based on the task I wanted to carry out.

That extra decision annoys users, but it raises a much larger question. The trillions of dollars of capex flowing into the AI industry, the series of data centres being set up, and the investments flowing into the portfolios of companies operating in compute suggest that demand for intelligence is infinite. But most people and companies don’t repeatedly buy capacity for intelligence unless they have a clear task to achieve - such as writing faster, coding better, selling more, supporting customers, cutting costs, or making decisions.

This is why presuming that the demand for intelligence is infinite is a flawed assumption.

In today’s piece, Kevin Simback, COO at Delphi Labs, explains this clearly. Demand for intelligence may be enormous, but it is not infinite at every price. It becomes near-infinite only when the cost of useful work falls below the point where buyers can justify it.

On to Kevin’s story,
Prathik

Follow Kevin on X

I keep coming back to one assumption that only some people are starting to say out loud.

The entire buildout - the data centres, the chips, the trillion-plus dollars of committed capital, and the large, growing share of the stock market now riding on it - rests on a single belief. That the demand for intelligence is infinite.

It isn’t. Demand for intelligence is near-infinite at a price. That qualifier is the difference maker.

In the first half of this year, we’ve mostly ignored price and just devoured intelligence. You can see it clearly in Anthropic’s revenue numbers.

We’ve even coined a new term, tokenmaxxing, to gloat about our unbounded consumption.

For most companies, this just isn’t sustainable. Companies need to find ROI, otherwise they will absolutely throttle demand.

In the first half of this year, we got away with convincing the CFO to loosen the purse strings, but the second half will be about tightening them back up.

This article double-clicks on that thesis, what to do about it, and what happens if we get this wrong.

The Demand Curve Nobody Drew

“Near-infinite demand” and “infinite demand” may sound like the same thing. They are actually two completely different economies.

Infinite demand is a vertical line. It says: people will buy intelligence at whatever price is offered, so the only thing that matters is making more of it.

Just keep scaling the supply and the demand is already there waiting. Most of the capital being deployed today behaves as if this is true.

How many people have you heard say, “If Anthropic had more compute, they could be doing $100B ARR?”

Near-infinite demand is a downward-sloping line with a hard floor underneath it. It says: demand for intelligence explodes, but only once the price drops below the level where the buyer earns a return on it.

Above the ROI line, demand is thin and more volatile. Below it, demand is effectively bottomless. The floor is ROI, and where a company sits relative to it decides everything.
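One way to picture that curve is a toy model in which each use case is bought only when its return per task exceeds the price per task. The sketch below is an illustration under stated assumptions, not data: the use cases, their values, and their volumes are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    value_per_task: float  # dollars of return the buyer earns per completed task
    tasks_per_month: int   # volume the buyer would run if the task pays for itself


# Hypothetical use cases; every number here is invented for illustration.
USE_CASES = [
    UseCase("contract review", 40.00, 2_000),
    UseCase("support ticket triage", 0.50, 400_000),
    UseCase("invoice field extraction", 0.05, 5_000_000),
]


def monthly_demand(price_per_task: float, use_cases=USE_CASES) -> int:
    """Tasks bought per month: only use cases whose return exceeds the price."""
    return sum(
        u.tasks_per_month for u in use_cases if u.value_per_task > price_per_task
    )


for price in (10.00, 0.25, 0.01):
    print(f"${price:>5.2f} per task -> {monthly_demand(price):,} tasks/month")
```

In this toy, demand is thin at $10 per task and grows by orders of magnitude as the price slips under each use case’s own ROI line. The large volumes sit at the bottom of the curve, which is the point of the “near-infinite at a price” framing.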

This shouldn’t be considered crazy talk, yet many speak of the demand curve as vertical.

The actual buyers are telling us exactly where the line is. An MIT study last year found that only about 5% of enterprise generative-AI pilots produced a rapid, measurable financial impact. The rest stalled.

A large cross-country survey of executives this year found that nearly 90% of firms reported no measurable effect from AI on productivity or headcount over three years.

The typical response is to call this a “skill issue” and just assume companies will get there.

That may be true; perhaps we just need more Anthropic and OpenAI Forward Deployed Engineers (FDEs) embedded within enterprises to make ROI a reality. That seems to be what the new “deploycos” are betting on.

But it’s hard to assume both things can be true at once - either the demand for intelligence is infinite (in which case, why do the labs need deploycos?) OR demand is near-infinite but only under the right conditions.

Tokenmaxxing: Optimising against Your Own Customer

Here is where it gets a bit self-inflicted.

The reflex across the industry is to reach for the biggest model and the deepest reasoning setting for every task.

If you leave this up to users, that’s exactly what you’ll get - they’ll just run everything through Opus 4.8/GPT-5.5 because, why not?

But this behaviour is what will quietly kill ROI.

The newest reasoning models make it so easy to burn tokens. They generate long internal chains of thought before they answer, and you pay for every one of those hidden tokens.

A request that returns a few hundred visible words can burn many thousands of billed tokens under the hood, which means the real cost per useful answer can run an order of magnitude higher than the sticker suggests.
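The arithmetic can be sketched in a few lines. This is a rough illustration, assuming hidden reasoning tokens are billed at the same rate as visible output and ignoring input tokens; the price and token counts are hypothetical, not any vendor’s actual figures.

```python
def cost_per_answer(
    visible_tokens: int,
    reasoning_tokens: int,
    price_per_million_output: float,
) -> float:
    """Dollar cost of one answer when hidden reasoning tokens are billed as output."""
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * price_per_million_output / 1_000_000


PRICE = 15.00  # hypothetical dollars per million output tokens

# What the visible answer alone would suggest, versus what is actually billed.
sticker = cost_per_answer(visible_tokens=400, reasoning_tokens=0,
                          price_per_million_output=PRICE)
actual = cost_per_answer(visible_tokens=400, reasoning_tokens=6_000,
                         price_per_million_output=PRICE)

print(f"sticker ${sticker:.4f}, actual ${actual:.4f}, ratio {actual / sticker:.0f}x")
```

With these made-up numbers, a 400-token answer backed by 6,000 reasoning tokens costs about 16 times what the visible output implies.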

In one controlled evaluation, a top reasoning model cost roughly 14x more than a cheaper general model on the same task and produced four times as many failures. Maximum intelligence, applied indiscriminately, made the result both more expensive and worse.

The deeper problem is an incentive mismatch - the seller is paid by the token, but the buyer is paid by the outcome.

Tokenmaxxing maximises the first and starves the second. It pushes the price of every decision up the demand curve, away from the floor where demand actually lives, at the exact moment the whole system needs to drive price down through it.

You cannot fill near-infinite demand by selling the most expensive version of the product to everyone. You fill it by getting the price of a good-enough answer under the ROI line and letting volume do the rest.

Why “But Costs Always Fall” Isn’t the Rescue

I know you want to say “Jevons paradox” right now, but that’s not going to let you off the hook.

First, the cost that has to fall is the cost per useful outcome, not the cost per token. Those two can be related, but not if we don’t address the behaviour mentioned above.

If per-token prices drop while tokenmaxxing pushes token consumption per task up even faster, the cost of getting a real job done doesn’t fall at all. It could even rise.
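A small worked example makes the gap between the two costs concrete. It is a sketch under assumptions: the prices, token counts, and success rates are hypothetical, and dividing by a success rate is just one simple way to charge failed attempts to the outcomes that succeed.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one attempt at a task."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


def cost_per_useful_outcome(
    price_per_million_tokens: float,
    tokens_per_task: int,
    success_rate: float,
) -> float:
    """Spend per successful outcome; failed attempts still get billed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task(price_per_million_tokens, tokens_per_task) / success_rate


# Hypothetical "before": pricier tokens, modest token use per task.
before = cost_per_useful_outcome(10.00, 5_000, success_rate=0.80)
# Hypothetical "after": token price halves, but tokens per task quadruple.
after = cost_per_useful_outcome(5.00, 20_000, success_rate=0.85)

print(f"before ${before:.4f} per outcome, after ${after:.4f} per outcome")
```

In this scenario the per-token price falls by half and the success rate even improves slightly, yet the cost per useful outcome nearly doubles because each task burns four times as many tokens.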

Falling unit prices and rising bills coexist comfortably, and I hear reports of teams living that contradiction right now.

Second, the savings have to actually reach the buyer. Efficiency captured entirely by the seller, or burned on more reasoning that the task never needed, never crosses the ROI line where the demand is.

If you lower the cost of tokens for a model, but the reasoning and tool calling under the hood drive up the number of tokens consumed, it’s not clear that the outcome itself is getting cheaper.

This is a markets problem, not a procurement problem

If this were only about enterprise software disappointing, it wouldn’t be that big of a deal. But we’ve now put most of the global economy’s chips in the “AI trade” basket, so any slip-up here and the implications get big.

The major US hyperscalers will spend more than $1 trillion in 2025-26. That spend is underwritten by an expectation of future revenue that depends on enterprise adoption, which in turn is dependent on ROI.

The gap between capex and revenue is being bridged with financing, some of which looks quite circular. I’m not ready to raise any alarm bells yet, but we need to connect all this back to the demand curve.

In the simplest view, if demand turns out not to be infinite, but is actually near-infinite and gated by ROI that hasn’t yet materialised, then the big risk is that the revenue that justifies the capex doesn’t arrive on schedule.

The financing loop can’t keep substituting for real cash flow forever. A technology this leveraged and this concentrated doesn’t get to underdeliver without major economic reverberations.

Said bluntly, if ROI doesn’t show up at scale, the AI trade begins to unwind, and a reflexive, circular, concentrated structure unwinds the way it was built: fast, and on everyone at once.

That is the worst-case scenario. I’m not saying that is the most likely, but it is a scenario that starts with enterprises becoming unhappy about their pilots and throttling demand.

“ROI takes time” - yes, and that is exactly the danger

The most serious objection to the ROI issue is one of timing.

Every general-purpose technology lags, and the payoff can be real. But it might not be on the same quarterly clock that drives the markets.

The mismatch is between a payoff that runs on the slow clock of organisational redesign and a financing structure that runs on the fast clock of capital markets.

The technology can be a genuine multi-decade transformation, and the trade financing can break in the meantime. Both can happen at once.

That is why time is the scarce input here, and why we need to get moving on it.

What ROI Already Looks Like

The picture is not entirely bleak, and that’s not what I want as the takeaway either.

Some enterprises are getting real, measured returns right now. But, in many cases, it is not the teams chasing the smartest possible model on the broadest possible mandate.

It is the teams running narrow, bounded, repeatable workloads, with the right-sized model for the job and an actual measurement of the outcome.

The ROI is happening in companies that deploy intelligent routing - sending easy queries to cheap models and escalating only the hard ones.

The ROI is happening in teams that swap a frontier model for a small, fine-tuned one on high-volume structured tasks.

The ROI is happening in enterprises that use prompt and context-trimming tools to guard against users sending everything to the most expensive, highest-reasoning model.

The ROI is happening in projects that build custom agent harnesses that manage tool access, memory, permissions, workflow state, evals, logging, human approvals, fallbacks, and cost controls.

None of these are examples of weaker AI. They are examples of the same outcome delivered below the ROI line instead of above it.

The early winners, those finding ROI, obey one discipline: intelligence per dollar, not intelligence at any price.

The Reframe

The entire premise of tokenmaxxing measures the wrong number.

The number that decides whether this entire AI trade is sustainable is the outcome per dollar.

Useful work delivered, divided by what it cost to deliver. That ratio is what moves a use case from above the ROI line to below it, and the volume of demand waiting below that line is the only thing large enough to justify what’s already been spent.

So the urgent work, the genuinely urgent work, is driving cost-per-outcome down hard enough and fast enough to pull the mass of stranded use cases under the line, and getting those methods adopted before the markets run out of patience.

The primary levers already exist:

Route every task to the cheapest model that can actually do it, instead of defaulting to the most expensive one.
Trim context and reasoning to what the task needs, rather than dumping everything into the biggest model on the highest setting.
Wrap agents in harnesses that control tools, memory, evals, approvals, and budgets, so reliability comes from the system and not from brute-force model size.
Fine-tune smaller open-weight models on proprietary data for the high-volume, repeatable workloads that don’t need a frontier brain.
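The first lever, routing, can be sketched as a cheapest-first ladder with escalation. This is a minimal illustration rather than a production router: the `Model` type, the flat per-call costs, the stub models, and the `accept` check are all hypothetical stand-ins for real model calls and real evals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float                 # hypothetical flat dollar cost per call
    run: Callable[[str], Optional[str]]  # returns an answer, or None if it gives up


def route(
    task: str,
    ladder: list[Model],
    accept: Callable[[str], bool],
) -> tuple[Optional[str], float, list[str]]:
    """Try models from cheapest to most expensive; stop at the first accepted answer."""
    spent = 0.0
    tried: list[str] = []
    for model in sorted(ladder, key=lambda m: m.cost_per_call):
        spent += model.cost_per_call  # a failed cheap attempt is still billed
        tried.append(model.name)
        answer = model.run(task)
        if answer is not None and accept(answer):
            return answer, spent, tried
    return None, spent, tried


# Stub models standing in for real API calls.
small = Model("small", 0.001, lambda t: t.upper() if len(t) < 20 else None)
frontier = Model("frontier", 0.05, lambda t: t.upper())


def non_empty(answer: str) -> bool:
    """Placeholder eval; a real check would score the answer against the task."""
    return len(answer) > 0


easy = route("short task", [frontier, small], non_empty)
hard = route("a much longer and harder task", [frontier, small], non_empty)
print(easy)
print(hard)
```

The easy task never touches the frontier model. The hard one pays for the failed cheap attempt plus the escalation, so a ladder like this only saves money when most traffic resolves on the lower rungs, which is why the outcome check matters as much as the routing itself.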

Each of these is a way of doing the same job for less, which is the only thing that grows near-infinite demand into the real thing.

The optimistic reading of all of this is that the demand really is there. It is enormous, it is waiting, and it is gated by a single solvable variable - ROI. We just need to collectively make it happen soon.

P.S. This piece was originally published here.

We will be featuring good writing and writers we love from time to time. If you have recommendations, send them our way.

Token Dispatch is a daily crypto newsletter handpicked and crafted with love by human bots. If you want to reach the Token Dispatch community of 170,000 subscribers, you can explore partnership opportunities with us 🙌

Disclaimer: This newsletter contains analysis and opinions of the author. Content is for informational purposes only, not financial advice. Trading crypto involves substantial risk - your capital is at risk. Do your own research.

A guest post by

Kevin Simback

Building and investing in AI and crypto
