AI Agents Spend on Your Behalf. When They Fail, Who Pays?

The OpenClaw token refund request is exposing a bigger problem

March 27, 2026 · 10 min read

Somewhere in the last week of March 2026, a person sat down and wrote an email to an AI developer. They explained that the agent they had used, OpenClaw, had made mistakes on their sensitive financial documents. It had fabricated data.

They had spent hours cleaning up the fallout. They wanted compensation.
Specifically, they wanted a “token session refund.”

Peter Steinberger, OpenClaw’s creator and now Agent Lead at OpenAI, posted the request on X on 23 March. Nearly 765,000 people saw it. The tweet drew 6,800 likes and almost a thousand replies, which is the internet’s version of a sold-out show.

Source: X

Many people laughed. The phrase “token session refund” has the cadence of someone confidently mispronouncing a word they have only ever read.

It sounds like the complaint of a person who fundamentally does not understand what they bought.

Except that is not quite right.

What A Token Actually Is, and Why the Bill is Confusing

Image source: NVIDIA

Before judging the request, it helps to understand what the person was actually asking for.

A token is a small fragment of text. Not quite a word, not quite a letter. It’s something in between.

The word “running” might be one token. The word “uncharacteristically” might be three or four. Punctuation marks are tokens. Spaces can be tokens.

The rough rule of thumb is that a token is about four characters of English text, which means a thousand tokens is roughly 750 words.
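
That rule of thumb can be sketched in a few lines. This is a heuristic only; real tokenizers (such as OpenAI’s tiktoken) split text differently per model, and the per-1k-token price below is a placeholder, not any provider’s actual rate.

```python
# Back-of-the-envelope token estimate using the ~4 characters-per-token
# heuristic from the article. Real tokenizers vary by model.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters of English per token."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, usd_per_1k_tokens: float) -> float:
    """Approximate cost of sending `text` at a given per-1k-token rate."""
    return estimate_tokens(text) / 1000 * usd_per_1k_tokens

# Roughly 750 words comes out near 1,000 tokens under the heuristic.
sample = "word " * 750
print(estimate_tokens(sample))
```

The point of the sketch is not precision; it is that a user can sanity-check a bill only if they have some mental model of the text-to-token conversion in the first place.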

Here is the part that matters for billing: you are charged for every token going in and every token coming out.

Input tokens are the words you send to the model, including your message, the system instructions, and everything else in the conversation so far. Output tokens are the words the model sends back. And in an agentic setup like OpenClaw, the meter runs far more aggressively than most users expect.

Think of it like this. Imagine you hired a contractor and agreed to pay them by the word, for every word they read and every word they write, across every conversation they have about your project. That would feel reasonable if they were just responding to your occasional questions.

However, it would feel very different if you discovered they were re-reading your entire project brief from the beginning every single time you asked them anything, even if you had only asked them to check one small detail.

That is essentially what happens with agentic AI. Every time OpenClaw takes an action, whether it is browsing the web, checking your calendar, writing a piece of code, or just waking up on its heartbeat schedule to check for new tasks, it sends a full conversation history to the model.

If that history is long, you pay for all of it, every single time.
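
A toy calculation makes the compounding visible. This is not OpenClaw’s actual accounting, just the arithmetic of resending the whole history on every turn: cumulative input tokens grow roughly quadratically with the number of actions.

```python
# Illustrative sketch: if an agent re-sends its full conversation history
# on every action, total billed input tokens grow quadratically.

def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens billed when each turn resends all prior turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the history grows by one more turn
        total += history            # and the entire history is billed again
    return total

# 200 tokens per turn over 50 actions: the final action alone carries
# 10,000 tokens of context, and the session bills 255,000 input tokens.
print(cumulative_input_tokens(200, 50))
```

Double the number of actions and the bill roughly quadruples, which is why long-running sessions surprise people in a way a chat window never does.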

One user discovered that his heartbeat alone, the feature that lets OpenClaw proactively check in on its environment at regular intervals, was costing $50 a day because he had set it to trigger every five minutes and each trigger was carrying the full session context. Another reported spending $200 in a single day because a task got stuck in a loop and kept calling the API until someone noticed.
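
The heartbeat figure is easy to reproduce with assumed numbers. The per-token price and context size below are illustrative placeholders, not OpenClaw’s actual rates; the point is only that a five-minute interval multiplied by a heavy carried-over context reaches $50 a day without anything going wrong.

```python
# Back-of-the-envelope check on the $50/day heartbeat figure.
# Price and context size are assumptions for illustration only.

MINUTES_PER_DAY = 24 * 60
heartbeat_interval_min = 5
triggers_per_day = MINUTES_PER_DAY // heartbeat_interval_min  # 288 per day

usd_per_1k_input_tokens = 0.01       # assumed illustrative rate
context_tokens_per_trigger = 17_500  # assumed large session context

daily_cost = (triggers_per_day * context_tokens_per_trigger / 1000
              * usd_per_1k_input_tokens)
print(f"${daily_cost:.2f} per day")  # lands just above $50
```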

Tech blogger Federico Viticci burned through 1.8 million tokens in a month. His bill was $3,600. He knew what he was doing. He is not a confused novice. The costs still surprised him.

The billing is not hidden in a malicious sense. The per-token pricing is documented. The behaviour of agentic systems is explained, if you go looking for it.

But the practical reality of how quickly those costs accumulate in an autonomous, always-on system is not something most users can intuitively model before they have experienced it firsthand.

And that is a product design problem, not just a user education problem.

Photo by Jp Valery on Unsplash

This Is Not the Story of a Naive User. It Is the Story of a Gap.

Photo by Francisco De Legarreta C. on Unsplash

The instinctive read on the token refund tweet was to frame it as user error.

Someone who did not understand what they were buying, complained when they got exactly what they paid for, and is learning an expensive lesson about how AI billing works. That reading is too comfortable.

Yes, technically, the user got what they paid for. The tokens were processed. The compute was consumed. The model ran. That the output was bad does not change the fact that the infrastructure did its job in the narrow sense of completing the request.

But we are in the early phase of a significant shift in how software works. The norms, the transparency standards, the consumer expectations, and the liability frameworks are all still catching up to the reality of what agentic systems actually do.

The frustration behind that refund request is not the frustration of someone who bought a product and used it correctly and is now complaining about the result.

It is the frustration of someone who did not fully understand what they were participating in and feels, with some justification, that the gap between expectation and reality was larger than it should have been.

Blaming users for not understanding token economics is a bit like blaming early internet users for not understanding bandwidth caps. The concept existed. The documentation was available. The blame for confusion still sits partly with the industry that designed an opaque pricing model and shipped it to a general consumer audience without adequate guardrails.

Steinberger posted a clear disclaimer alongside OpenClaw:

"The Service is provided “as is” and “as available” without warranties of any kind, whether express or implied, including but not limited to implied warranties of merchantability, fitness for a particular purpose, and non-infringement."

Source: X

That disclaimer is legally bulletproof. It is also the kind of language that anyone who has ever shipped consumer software writes almost reflexively, because it has to be there.

But there is a cultural gap between “our lawyers are covered” and “our users genuinely understood what they were signing up for.”

That gap is where the token refund request lives. It is where a lot of the brewing frustration in the OpenClaw community lives too.

The Refund Question, Sorted

Photo by Glenn Carstens-Peters on Unsplash

On the specific question of whether token refunds should happen: no, not for tokens consumed during a session that ran as designed.

The economic model breaks if completed compute usage is refundable. API providers cannot absorb the cost of every bad output any more than a taxi company can refund fares because the passenger did not like where they ended up.

That said, “refunds for tokens” and “some form of remediation for genuinely failed experiences” are not the same question.

A handful of AI providers and platforms have already started offering credit for sessions that produced clear system failures: not bad outputs due to poor prompting, but demonstrable malfunctions where the agent looped endlessly, failed to complete a task it explicitly claimed to be handling, or behaved in ways that no reasonable configuration would have predicted. That is a more nuanced position than blanket refunds, and it is a reasonable one.

The subscription model is a separate matter again. If you are paying a flat monthly fee for access to an agentic platform and the platform consistently fails to deliver on its core promise, the argument for credit or refund is considerably stronger than in a pay-per-token arrangement.

You are not paying for compute. You are paying for outcomes, or at least for a reasonable expectation of outcomes. Several platforms are already navigating this tension, quietly crediting accounts after high-profile failures rather than litigating the ToS in public.

The honest answer is that the consumer norms here are still being written. What feels absurd today, a token refund request, might look like a reasonable early signal of consumer expectations that the industry will eventually need to take more seriously.

The Liability Bomb Has Not Gone Off Yet

Photo by Immo Wegmann on Unsplash

Here is the part of the conversation that the laughter on X was drowning out, and the part that will matter a great deal more as agentic AI moves into serious commercial territory.

The “AS IS, without warranty of any kind” shield that covers OpenClaw and most AI services today was designed for a world in which software was a tool humans operated.

It made sense. If you use a hammer badly, the hammer manufacturer is not responsible for the result.

However, agentic AI is different in kind, not just degree.

When an AI agent is given a high-level goal and the autonomy to pursue it across days and weeks, making decisions and taking actions without per-step human approval, the question of where human oversight ends and machine accountability begins gets genuinely complicated.

Steinberger’s own tweet thread spawned exactly this discussion: if an agent autonomously executes a financial transaction, or deletes files, or sends a communication, or generates a legal document, based on a misunderstanding of its instructions, how far does “AS IS” actually stretch?

The answer is that we do not yet know, because it has not been litigated.

In enterprise contexts, it will be. As agents move into financial modelling, legal document review, medical information, and complex code in production systems, the current liability framework will be stress-tested in ways that go well beyond refund requests.

The companies building in these spaces are already thinking about it. NanoClaw, one of the newer agent frameworks, has positioned its entire product philosophy around the premise that AI agents will misbehave and that the right response is sandboxing every action, not trusting the agent and hoping for the best.

A Forbes piece published in the same news cycle as the refund tweet ran with the headline “Don’t Trust AI Agents.” That framing is a bit strong for everyday use cases. But for high-stakes work, it is not wrong.

The principle is straightforward:

If you would not give a junior hire unsupervised access to your board documents, your production codebase, or your client communications, you should not give today’s AI agents that access either.

Not because the agents are malicious, but because they are capable, confident, and wrong in ways that are sometimes difficult to predict or catch before the damage is done.

The disclaimer in OpenClaw’s documentation is legally sound. The cultural work of helping users understand what it actually means, in practical terms, for the work they are trusting agents to do, is still very much in progress.

The Real Takeaway

The token refund request was not a story about one confused user. It was an early, clear signal about a set of tensions that are going to intensify as agentic AI scales.

Token costs are opaque in ways that matter for real people making real financial decisions. The “you agreed to the terms” defence is technically correct and culturally insufficient. Liability frameworks designed for passive software tools are going to be tested in environments where agents act autonomously on behalf of users in consequential situations. And the gap between what users expect when they hand a task to an AI and what they should reasonably expect from current systems is still much wider than most of the industry is acknowledging in public.

None of this means the refund request was right. It was not, in the narrow sense of how the billing model actually works.

It means the instinct behind it, that something about this exchange did not feel fair, was pointing at something real. The industry would do well to take that signal more seriously than the joke invited.