Stronger models did better, and users did not notice
An internal Anthropic experiment suggests a subtle but important form of AI inequality may already be emerging: people represented by stronger models can secure better outcomes without anyone around them realizing the gap exists. According to the supplied source text, Anthropic ran a one-week internal marketplace called “Project Deal” in December 2025 in which 69 employees used Claude-based AI agents to buy and sell real goods over Slack.
Each participant received a $100 budget. Before the marketplace opened, Claude interviewed volunteers about what they wanted to buy or sell, their price preferences and the negotiating style they wanted their agent to use. Anthropic then used those inputs to generate custom system prompts. After that, the AI agents handled the process end to end: writing listings, finding counterparties, making offers, haggling and closing transactions. Humans stepped back in only at the end to exchange the goods.
The key experimental twist was hidden from participants. Anthropic ran parallel versions of the market. In some, every participant was represented by Claude Opus 4.5, described in the source text as Anthropic’s frontier model at the time. In others, participants had a 50% chance of being represented by Claude Haiku 4.5, the company’s smallest model.
The outcome was not just technical. It was social.
According to the source, the more capable Opus model consistently secured better prices and closed more deals on average than Haiku. At the same time, more aggressive negotiation instructions did not produce a statistically significant difference in outcomes. In other words, model capability mattered more than simply telling the system to bargain harder.
That result cuts against a common instinct in enterprise AI adoption, where organizations sometimes assume prompt style or surface behavior will determine most of the value. Anthropic’s findings suggest that underlying model strength can matter more than tone. If that pattern generalizes, the quality of the agent itself may quietly shape who gets favorable terms in digital transactions.
The most striking finding may be perceptual rather than economic. Anthropic says users whose weaker Haiku agents obtained objectively worse outcomes still rated their transactions as just as fair as users represented by Opus. That mismatch is what the company flags as a form of “invisible inequality” in AI-assisted decision-making.
This is a consequential idea. Traditional forms of inequality are often visible in pricing, access or service quality. What Anthropic is pointing to is more difficult to detect: two people can feel equally satisfied while one has systematically received worse representation from the machine acting on their behalf.
AI agents are becoming intermediaries
Project Deal matters because it moves the discussion beyond chatbots and into agency. These systems were not just answering questions. They were representing people in negotiations with other machines. That makes them less like productivity tools and more like intermediaries operating in markets.
As that role expands, model differences could have direct consequences in commerce, procurement, hiring, customer service and internal business operations. If stronger systems routinely negotiate better, sort information more effectively or identify better counterparties, then access to a frontier model becomes a practical advantage. The people on the weaker side of that divide may not even know they are at a disadvantage.
The source text does not claim that this result automatically extends to all markets. The experiment was internal, short in duration and limited in scale. Even so, it offers a concrete demonstration of something policymakers and companies are likely to confront more often: once AI agents begin acting for users, capability gaps can become outcome gaps.
Prompting may not be enough
One of the more useful findings in the report is that aggressive negotiation instructions did not produce a statistically significant improvement. That suggests organizations cannot assume they can compensate for weaker models simply by tuning prompts toward assertiveness.
For developers and buyers of AI systems, that is a practical warning. Agent performance may depend less on personality framing and more on core reasoning and decision quality. A slick interface or a forceful style does not necessarily translate into stronger representation.
That distinction matters because many AI deployments are justified on the basis of adequacy rather than excellence. If a cheaper or smaller model appears good enough in conversation, it may still perform materially worse once it is trusted to make or negotiate decisions on a user’s behalf.
The policy question is already here
Anthropic’s language around invisible inequality should resonate well beyond this single experiment. If organizations deploy different classes of AI agents across employee ranks, customer segments or public services, they may create uneven treatment without clear signs of unfairness at the point of use.
That is a harder governance problem than simple transparency. Telling users that an AI was involved does not answer whether the AI was as capable as the one used for someone else. And when the user experience still feels fair, the market or institution may not face immediate pressure to correct the imbalance.
Project Deal therefore reads as an early warning. It suggests that AI access is not just about whether a person gets a digital assistant, but about which assistant they get and how capable it is when stakes are attached to the outcome.
- Anthropic ran a one-week internal Slack marketplace using Claude agents for real transactions.
- Claude Opus 4.5 secured better prices and more deals on average than Claude Haiku 4.5.
- Users represented by weaker agents rated fairness similarly, despite worse outcomes.
This article is based on reporting by The Decoder. Read the original article.
Originally published on the-decoder.com







