
ZDNET's key takeaways
- AI usage is moving to token-based pricing.
- Token pricing is far more expensive than the previous flat-fee model.
- Measuring the value derived from AI remains an unsolved problem.
SAN DIEGO -- A few months ago, most people paid a flat fee for their AI access. That was then. This is now. The days of AI pricing as a loss-leader are over. As everyone has discussed here at FinOps X 2026, AI's token-based pricing model is becoming the foundation of the entire generative AI economy, and it's far more expensive than older models. Just ask Copilot users who are having fits over the new token-based pricing.
For many enterprise customers, this recalls the early days of cloud pricing, when they had to deal with volatile invoices and business models shifting under their feet. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.
Tokens: The atomic units of AI
In this new world, the token is the basic unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it "the atomic unit of AI." In his FinOps keynote, Storment said that "tokens serve more roles in the modern economy than almost any other commodity has in modern history, maybe, possibly oil in the 20th century." Tokens, he told the FinOps X audience, are simultaneously "the unit of output from all of the hardware and compute and data centers," "how the labs price their outputs and inputs," and "the value unit that enterprises are looking to monetize."
That abstraction is precisely why labs and hyperscalers like it. Instead of charging for GPU types, memory, and power directly, they can expose a single unit -- tokens, priced per million -- over a bewildering mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per‑model rate cards with separate prices for input tokens (everything you send the model) and output tokens (everything it generates back), usually quoted in dollars per million tokens.
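To make the rate-card arithmetic concrete, here is a minimal sketch of how a single request might be priced when input and output tokens carry separate per-million rates. The `RateCard` type, the `request_cost` function, and the prices are illustrative assumptions, not any provider's actual rates or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-model rate card, in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the given rate card."""
    return (input_tokens * card.input_per_million
            + output_tokens * card.output_per_million) / 1_000_000


# Placeholder prices chosen for easy arithmetic, not real provider pricing.
example_card = RateCard(input_per_million=2.0, output_per_million=10.0)

# 500,000 input tokens and 100,000 output tokens on the placeholder card.
print(request_cost(example_card, input_tokens=500_000, output_tokens=100_000))
```

Because output tokens are typically priced separately from input tokens, the same total token count can produce very different bills depending on the mix.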
So what are tokens anyway? An AI token, said Storment, "is the smallest unit a word or phrase can be broken down into when being processed by a large language model (LLM)." Before a model can work with text, it breaks it into fragments, a process called tokenization. For English, a common rule of thumb is that "one token is about four characters, or about three-quarters of a word," so "100 tokens ≈ 75 words."
The token hides enormous complexity. As SAP's FinOps team put it in their session, "You pay per token, and this little token hides an enormous complexity underneath predictability," from model choice and quantization to how aggressively you use caching or agents. That complexity is precisely what FinOps teams are now being asked to decode.
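The rule of thumb above can be turned into a rough estimator. This is only a sketch under that rule: the function names are made up for illustration, and a real tokenizer will give different counts, especially for code or non-English text.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate for English text: about four characters per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate for English text: about three-quarters of a word per token."""
    return round(word_count / 0.75)


# 75 words should come out at roughly 100 tokens under the rule of thumb.
print(estimate_tokens_from_words(75))
```

Estimates like these are useful for back-of-the-envelope budgeting; billing itself is based on the provider's own token counts.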
The all‑you‑can‑eat token era is over
If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Storment describes three distinct phases: The "old days of AI" before ChatGPT, the "good old days of AI" when chatbots "could write some decent code," and then the post‑November‑2025 world when major model releases "took AI from pretty good to really good."
In the good old days, the era of all-you-could-eat tokens and subscriptions, we went through a brief period of token maxing. Back then, everybody was excited about their token leaderboard, which showed who had the most token usage. Today, token leaderboards are painfully obsolete because no one can afford to waste tokens. As Amazon senior vice president Dave Treadwell begged, "Please don't use AI just for the sake of using AI."
Between June and November last year, Storment said, global token usage grew in a "nice linear path." Then those new models and agentic patterns landed. Context windows "went from a few thousand or tens of thousands or hundreds of thousands up to millions of tokens in a single conversation," and "agentic hit the scene and exploded," adding "loops and retries and corrections and all this insanity."
Companies had happily subsidized that behavior… until they saw the bills. Storment recounted how some "$200-a-month" power users actually cost "upwards of tens of thousands of dollars a month when you were running everything on the latest model." For example, SemiAnalysis, an AI analytics company, recently estimated that a $200 Anthropic plan used to give $8,000 worth of Claude tokens, while a similar OpenAI offering gave $14,000 worth of Codex tokens.
Those days and prices are done. Moving forward, companies will have to pay the real cost of AI tokens.
"So now what matters more than anything is AI value," Storment told the room. "We've got to bring value back to what we're doing… We're in an era where tokens are the main measurement. We're in an era where tokens are in everything in software, and they're driving a lot of the global token economy."
Scarcity keeps token prices from collapsing
If Moore's law and hyperscale competition were the only forces at work, you'd expect token prices to keep falling. To some extent, they have. "Since 2023, token prices have fallen dramatically," Storment acknowledged. SAP's internal telemetry tells a similar story. "This is our cost per token over the same time period," said SAP data scientist Maida Nazifi, showing their internal chart. "It's clearly trending down, even with a bit of flattening at the end. And honestly, it matches the story that everyone wants to believe, right? Token prices keep on falling."
But both stress the caveat: The floor may be in sight. Storment notes that if "you look at the top labs and their pricing, you go back to the Wayback Machine. Token prices have been pretty flat since November 2025," which he links directly to hardware and power constraints: "We can't get enough hardware, we can't get enough power… we're seeing backlogs, we're seeing long commitment periods, and we're seeing shortages."
He cited Intel's CEO saying he doesn't expect real relief in GPU and related component supply "until 2028." Nazifi and SAP VP Frederik Pohl are seeing the same patterns at their company. Pohl warned, "We have supply chain constraints, we have hardware prices that are rising, and the prices of new frontier models are growing ever more expensive."
The net effect is a classic Jevons paradox: Falling unit cost, exploding total spend. "Even with falling token prices, we see that our spend is still rising, and that's the famous paradox," Pohl said. "At our scale, we had unit costs falling, but we saw in some months that spend was doubling."
Storment thinks the paradox is just beginning. Goldman Sachs, he said, estimates global usage rising from "6 quadrillion tokens" today to "120 quadrillion forecasted tokens" within about 3.5 years, a 20x increase. Even if token prices drop further when supply loosens, they are unlikely to fall as fast as volume grows.
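The arithmetic behind that paradox is simple: total spend scales with volume times unit price. The sketch below uses the 6 and 120 quadrillion figures quoted above; the "prices halve" scenario and the function names are illustrative assumptions, not a forecast from any of the speakers.

```python
def volume_multiplier(tokens_now: float, tokens_forecast: float) -> float:
    """How many times larger the forecast volume is than today's."""
    return tokens_forecast / tokens_now


def spend_multiplier(volume_mult: float, price_mult: float) -> float:
    """Total spend scales with volume times unit price."""
    return volume_mult * price_mult


QUADRILLION = 10**15
growth = volume_multiplier(6 * QUADRILLION, 120 * QUADRILLION)

# Unit price multiplier at which total spend would stay flat.
breakeven_price_mult = 1 / growth

# Hypothetical scenario (an assumption, not a forecast): unit prices halve.
spend_if_prices_halve = spend_multiplier(growth, 0.5)

print(f"volume grows {growth:.0f}x")
print(f"flat spend needs unit prices to fall {1 - breakeven_price_mult:.0%}")
print(f"if prices only halve, spend grows {spend_if_prices_halve:.0f}x")
```

With these inputs, volume grows 20x, so global spend stays flat only if unit prices fall by 95 percent. A mere halving of prices would still leave spend 10 times higher.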
FinOps discovers token economics
For the FinOps community, which cut its teeth on cloud right‑sizing and reserved instances, token pricing is both familiar and completely alien. The familiar part is that it's usage‑based, the invoices are big, and forecasting is hard. The alien part? The unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules.
Pohl asserted that "AI does not just stretch the cloud playbook, it breaks it; AI is more different from the cloud than cloud was to the data center." Unlike CPUs, "AI models are nothing like that… they have their unique strengths and weaknesses… They have different cost profiles, and swapping out an LLM is not just a pricing decision. It's also a quality-of-output decision."
SAP's experience is a case study in how enterprises are retooling. Its Business AI platform, Pohl explained, runs across "multiple different LLMs," including "ChatGPT, Anthropic, Gemini… other open source models," layered on "different hyperscalers."
When SAP first went looking for AI cost data, "we immediately hit a wall," Nazifi recalled. "The existing [cloud] tools were very blind to the nuance of LLMs, so they could tell us we spent this amount on [a provider], but not really which model, or how much the model. It really was like trying to optimize your gold mining operation by looking at the total weight of ore."
So they did it the hard way: "We pulled data manually, we merged data across tables, and then we had this first picture by hand." That picture, once it reached their global infrastructure lead and then the CTO, transformed the conversation. "Within days, it went from like, 'OK, this is interesting, keep me posted,' to… 'I need this regularly, I need more,'" Nazifi said. Pohl added the FinOps lesson: "If you have a CTO asking for a number, that's not a question, it's a mandate."
That demand forced SAP to formalize an internal AI FinOps model built around three pillars:
- Spend visibility: "What we consume, how we consume it, and where we consume it," across models, platforms, business units, and regions.
- Economics: "How efficiently are you leveraging AI," measured with token‑level metrics like input/output ratios, cached token ratios, and "token to spend drift" to see whether costs are rising because of volume or mix shifts to pricier models.
- Value: Connecting AI spend to business outcomes with "cost per use case" and "inference cost by revenue," so they can tell "which AI features are economically viable" and whether "your AI product margins actually work."
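SAP did not share its formulas, so the following is only one plausible reading of the token-level metrics named in the pillars above. The `UsagePeriod` record and every definition here are assumptions for illustration. In particular, "token to spend drift" is interpreted as spend growth minus token growth between two periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePeriod:
    """Hypothetical usage summary for one billing period."""
    input_tokens: int
    cached_input_tokens: int  # subset of input_tokens served from cache
    output_tokens: int
    spend_usd: float


def input_output_ratio(p: UsagePeriod) -> float:
    """Input tokens sent per output token generated."""
    return p.input_tokens / p.output_tokens


def cached_token_ratio(p: UsagePeriod) -> float:
    """Share of input tokens that were served from cache."""
    return p.cached_input_tokens / p.input_tokens


def token_to_spend_drift(prev: UsagePeriod, curr: UsagePeriod) -> float:
    """Spend growth minus token growth (assumed definition).

    Near zero: spend is rising roughly in line with volume.
    Positive: spend is outpacing volume, e.g. a shift to pricier models.
    """
    prev_tokens = prev.input_tokens + prev.output_tokens
    curr_tokens = curr.input_tokens + curr.output_tokens
    spend_growth = curr.spend_usd / prev.spend_usd - 1
    token_growth = curr_tokens / prev_tokens - 1
    return spend_growth - token_growth


# Made-up numbers: token volume doubles while spend triples.
last_month = UsagePeriod(800_000, 200_000, 200_000, 100.0)
this_month = UsagePeriod(1_600_000, 400_000, 400_000, 300.0)

print(input_output_ratio(this_month))
print(cached_token_ratio(this_month))
print(token_to_spend_drift(last_month, this_month))
```

In the made-up example, tokens grow by 100 percent but spend grows by 200 percent, so the drift of 1.0 would flag that something other than volume, such as model mix, is pushing costs up. The sketch assumes non-zero token counts and spend in both periods.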
"Every token needs to gain its cost," Pohl said, echoing Nvidia CEO Jensen Huang's operation "token mill effectiveness." That mill spans everything from silicon and information halfway leases to exemplary routing and punctual design.
Tokenomics: beyond just counting tokens
If FinOps is about cost control and accountability, tokenomics, at least as the Linux Foundation is positioning it, is about the full lifecycle of tokens as an economic good. Storment defines it as "the emerging discipline of converting energy and capital into AI tokens and resources, consuming those tokens and all the related technology to drive efficient intelligence, and then ultimately drive value on the backend."
In his view, that breaks into three buckets:
- Production: "Take energy and capital and make tokens," whether in cloud data centers, colos, edge devices, or, as Elon Musk likes to imagine, "data centers in space."
- Consumption: "All the allocation, forecasting, and optimization, which kind of sounds a lot like FinOps for AI," spanning model routing, quantization choices, agent limits, and cache strategies.
- Value: "How do we monetize those tokens? How do we adjust our pricing based on the cost of those tokens? What are the labor implications in our entire company based on the cost of that AI?"
That last part is where token pricing directly collides with software-as-a-service (SaaS) business models. As Storment told me in an interview, "Tokenomics is getting over to the price of the tokens and how effectively we manage this production and consumption of them is changing pricing models for Fortune 100 companies."
He points to Microsoft's GitHub moves, shifting Copilot toward more explicit usage‑based charging, as an early example. Developers "who love the unlimited tokens" are now "really just mad at Microsoft," because their implicit subsidy vanished.
The labs themselves are also tightening the screws in ways that are invisible at the token level. He raised as a recent example Anthropic's Fable model card: "If you're going to use Claude at Fable to try to build an LLM, they will silently drop you to a different model, and you aren't going to know." Since then, Anthropic has walked back this policy, but other companies may not. Such silent policies make a mockery of any naive "cost per token" metric, because "not all tokens are created equal by any stretch of the imagination."
Storment agrees. "A token can cost 2 cents per million, or it can cost 35 per million, just from a cost perspective," he said, and even at the same rate, "one might drive a lot of value, and one doesn't, based on how you're using it." For him, the point of embracing "tokenomics" as a term is to harness the fact that the C‑suite has already latched onto tokens as a mental model.
It also doesn't help that today's advanced LLMs, such as Anthropic Fable 5, can chase after an answer and burn tokens without users having a clue what's actually happening. For instance, Simon Willison, co-creator of the Django web framework, reported that "Based on a screenshot and a one-line prompt, Claude Fable 5 + Claude Code" launched a web server, used many different web browsers, built and launched its own web server, and performed many other tricks, all to track down a simple CSS performance bug. Had he used token pricing, it would have cost him only $12. It's easy to envision a frontier model taking on a more complex job and burning hundreds or thousands of dollars.
Business models: from credits and seats to blended token bundles
These pricing experiments point to a pricey future. Most customers will never see a line item labeled "120 quadrillion tokens." Instead, vendors are building layers of abstraction on top:
- Credits and opaque consumption: Storment described signing up for an unnamed service where "every time I ran a video, it was like, 'Put more quarters in the machine, put your credit card down. These credits go fast.'" Under the hood, those quarters are tokens.
- Hybrid subscription + usage: Others use "a base monthly, and then some level of consumption," giving customers a predictable base and then exposing them to token‑denominated overages at the margin.
- Direct pass‑through models: A smaller set, especially in infrastructure‑adjacent products, are "starting to direct allocation, direct pass through," essentially showing customers the token meter more honestly but wrapped in their own dashboards and guardrails.
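As an illustration of the hybrid pattern in the list above, here is a minimal sketch of an invoice with a flat monthly base plus token-denominated overage. The function name, the included allowance, and all prices are made-up assumptions rather than any vendor's actual plan.

```python
def hybrid_invoice(base_fee: float, included_tokens: int,
                   used_tokens: int, overage_per_million: float) -> float:
    """Flat monthly base plus per-million charges beyond the included allowance."""
    overage_tokens = max(0, used_tokens - included_tokens)
    return base_fee + overage_tokens * overage_per_million / 1_000_000


# Hypothetical plan: $200 base, 10 million tokens included, $8 per extra million.
light_month = hybrid_invoice(200.0, 10_000_000, 5_000_000, 8.0)
heavy_month = hybrid_invoice(200.0, 10_000_000, 25_000_000, 8.0)

print(light_month, heavy_month)
```

Under the allowance, the customer sees only the predictable base. Beyond it, every additional million tokens shows up on the bill, which is where upstream price changes reach the end customer.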
These are all vulnerable to upstream shocks. Storment warned, "Anything changes in this, your token factory changes, you route to the wrong model and blow your cache up, you inefficiently forecast or estimate. Anything changes, this affects consumer pricing at the end, and you may have to change your prior pricing model for how you go to market, and this isn't just software companies, it's cascading into banks and everyone else today."
That cascading effect is why the Linux Foundation is spinning up a Tokenomics Foundation alongside the FinOps Foundation: to give large consumers and suppliers a vendor‑neutral place to hash out specifications and best practices for measuring and allocating token‑based costs. The FinOps Focus specification, originally designed to normalize cloud billing data, is already being extended for token‑level telemetry. A new "FinOps certified Focus generator" program aims to validate that providers' billing pipelines conform.
The human side: AI haves versus have‑nots
Beyond the spreadsheets, token pricing is already shaping who gets to use powerful AI -- and who doesn't. Storment sees a "societal divide between those who can afford the AI and those who can't" if high token costs persist. At the enterprise level, you can already see the outlines: "Certain teams are being deemed worthy of getting the latest model, and others are not," with some users routed automatically "to cheaper model[s]" and others granted exceptions.
Yet there is also a strong argument against crude caps. One Fortune 100 executive told Storment to "look across your usage… and you're going to find some outliers of people… Don't cap them, don't shut them down. Go talk to them, find out what they're doing, because they might actually be doing something really interesting." In a world where YC‑backed startups have "millions of dollars of tokens" from frontier labs to disrupt incumbents, shutting down internal experimentation could be an existential threat.
For individuals, and especially new workers, trying to use AI, token pricing feeds into broader anxieties about AI and jobs. Witness the backlash to AI‑heavy commencement speeches and the sense among graduates that AI is "coming straight for their jobs in an already tough job market."
Storment's view is more nuanced but still stark: "I don't think AI is immediately coming for everybody's job, but I think the person who's better at AI is coming for the job of the person who's not using AI." If token prices and quotas restrict who can learn and experiment, that divide will only deepen.
For both companies and individuals, we're moving rapidly into an AI-token-based economy. This, in turn, will lead to a far more expensive AI world. What all that will mean is a question we don't yet have an answer to. The one thing we know for sure is that it will be orders of magnitude more expensive than it has been.
