Stronger models did better, and users did not notice

An internal Anthropic experiment suggests a subtle but important form of AI inequality may already be emerging: people represented by stronger models can secure better outcomes without anyone around them realizing the gap exists. In December 2025, Anthropic ran a one-week internal marketplace called “Project Deal” in which 69 employees used Claude-based AI agents to buy and sell real goods over Slack.

Each participant received a $100 budget. Before the marketplace opened, Claude interviewed volunteers about what they wanted to buy or sell, their price preferences and the negotiating style they wanted their agent to use. Anthropic then used those inputs to generate custom system prompts. After that, the AI agents handled the process end to end: writing listings, finding counterparties, making offers, haggling and closing transactions. Humans stepped back in only at the end to exchange the goods.
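
The source does not reproduce Anthropic’s actual prompt templates, but the mechanics are straightforward to sketch. The following is an illustrative outline, with hypothetical field names and wording, of how interview answers might be folded into a per-agent system prompt:

```python
from dataclasses import dataclass

@dataclass
class InterviewAnswers:
    role: str            # "buyer" or "seller"
    wants: str           # e.g. "a standing desk"
    price_limit: float   # ceiling to pay, or floor to accept, in dollars
    style: str           # e.g. "friendly but persistent"

def build_system_prompt(a: InterviewAnswers, budget: float = 100.0) -> str:
    # Fold the participant's stated preferences into a custom system prompt.
    return (
        f"You are a negotiating agent acting as a {a.role} in an internal marketplace.\n"
        f"Objective: {a.wants}. Total budget: ${budget:.2f}.\n"
        f"Price constraint: do not go past ${a.price_limit:.2f}.\n"
        f"Negotiation style: {a.style}.\n"
        "Write listings, contact counterparties, make and respond to offers, "
        "and close deals on the participant's behalf."
    )
```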

The key experimental twist was hidden from participants. Anthropic ran parallel versions of the market. In some, every participant was represented by Claude Opus 4.5, Anthropic’s frontier model at the time. In others, each participant had a 50% chance of instead being represented by Claude Haiku 4.5, the company’s smallest model.
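
In code terms, that design amounts to a simple randomized assignment. A minimal sketch, assuming illustrative model identifiers (the source does not describe the actual implementation):

```python
import random

OPUS = "claude-opus-4.5"    # frontier model (identifier is an assumption)
HAIKU = "claude-haiku-4.5"  # smallest model (identifier is an assumption)

def assign_models(participants: list[str], mixed_market: bool, seed: int = 0) -> dict[str, str]:
    # In all-Opus markets everyone gets the frontier model; in mixed markets
    # each participant is independently downgraded with probability 0.5.
    rng = random.Random(seed)
    return {
        p: HAIKU if mixed_market and rng.random() < 0.5 else OPUS
        for p in participants
    }
```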

The outcome was not just technical. It was social.

According to Anthropic, the more capable Opus model consistently secured better prices and closed more deals on average than Haiku. At the same time, more aggressive negotiation instructions did not produce a statistically significant difference in outcomes. In other words, model capability mattered more than simply telling the system to bargain harder.
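
Anthropic does not say which statistical test underpins that claim. As a rough illustration of the comparison involved, both effects could be checked with something like Welch’s t-test on per-deal prices; the test choice and variable names here are assumptions, not Anthropic’s method:

```python
from scipy import stats

def report_difference(label: str, group_a: list[float], group_b: list[float]) -> None:
    # Welch's t-test: compares group means without assuming equal variances.
    t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: t = {t:.2f}, p = {p:.3f} ({verdict} at alpha = 0.05)")

# Per the reported results, one would expect roughly:
#   report_difference("Opus vs. Haiku", opus_prices, haiku_prices)            # significant
#   report_difference("aggressive vs. default", aggressive_px, default_px)    # not significant
```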

That result cuts against a common instinct in enterprise AI adoption, where organizations sometimes assume prompt style or surface behavior will determine most of the value. Anthropic’s findings suggest that underlying model strength can matter more than tone. If that pattern generalizes, the quality of the agent itself may quietly shape who gets favorable terms in digital transactions.

The most striking finding may be perceptual rather than economic. Anthropic says users whose weaker Haiku agents obtained objectively worse outcomes still rated their transactions as just as fair as users represented by Opus. That mismatch is what the company flags as a form of “invisible inequality” in AI-assisted decision-making.

This is a consequential idea. Traditional forms of inequality are often visible in pricing, access or service quality. What Anthropic is pointing to is more difficult to detect: two people can feel equally satisfied while one has systematically received worse representation from the machine acting on their behalf.