
Anthropic's Project Deal: 186 Trades, $4,000, and a Fairness Score of 4/7

5 Jun 2026 · 7 min read · Sarah Chen

// IN THIS ARTICLE

01 Key Details · 02 Why This Matters for AI Development · 03 Industry Impact · 04 What to Watch · 05 Key Takeaways · 06 Frequently Asked Questions

Anthropic ran 69 employees through a closed marketplace and recorded 186 completed trades with a combined value above $4,000. That works out to roughly 2.7 deals per participant and an average ticket of about $21.50, on a $100 per-person budget. It is a tiny dataset, but it is the first public number we have on what multi-agent negotiation actually does when both sides hand the keyboard to Claude.
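The per-participant figures are simple division, but one step is easy to miss: every trade has a buyer and a seller, so each participant sat on roughly twice as many deal sides as the per-capita deal count suggests. A quick Python check using only the published totals. It treats "above $4,000" as a floor, and the ratio of volume to pooled stipends is my own derived number, not one Anthropic reported:

```python
# Back-of-envelope figures from the published Project Deal totals.
participants = 69
deals = 186
total_value = 4_000        # reported as "above $4,000", so this is a floor
budget_per_person = 100

deals_per_participant = deals / participants
avg_ticket = total_value / deals          # a floor, since total_value is a floor

# Every deal has two sides (buyer and seller), so count sides, not deals.
deal_sides_per_participant = 2 * deals / participants

# Derived ratio, not reported by Anthropic: volume vs. the pooled $100 stipends.
# It only means "share of stipends spent" if every purchase came from them.
pooled_budget = participants * budget_per_person
budget_share = total_value / pooled_budget

print(f"{deals_per_participant:.1f} deals per participant")
print(f"${avg_ticket:.2f} average ticket (floor)")
print(f"{deal_sides_per_participant:.1f} deal sides per participant")
print(f"{budget_share:.0%} of the ${pooled_budget:,} pooled stipends")
```

The deal-sides figure is the better proxy for how active each person's agent was, since a participant who sold two items and bought three touched five deals.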

The more interesting figure is the fairness score: 4 out of 7. Right in the middle of the scale. Read generously, that means neither buyer nor seller felt fleeced. Read skeptically, it means the system was unremarkable to the people inside it. Both readings matter for where agentic commerce goes next.

Key Details

As Digital Commerce 360 reported, Anthropic published the Project Deal results on April 24. The setup was a self-selected pool of 69 San Francisco staffers, each handed $100 and asked to list items ranging from ping-pong balls to a snowboard. Anthropic compared the venue to Craigslist. The agents communicated over Salesforce's Slack platform.

Two design choices stand out. First, participants were not told in advance that there were actually four separate marketplaces running in parallel, each assigning users a different version of Claude. That amounts to an A/B/C/D test of model variants with real people as principals on both sides of every trade, none of whom knew about the split. Second, Anthropic kept some experimental details hidden from the human marketplace users. The company has not disclosed which details, and that omission matters because it affects how much weight the satisfaction scores can bear.
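Anthropic has not said how participants were mapped to the four Claude versions. One common way to run a blind multi-arm split, shown here purely as a sketch, is to hash a stable identifier into one of N buckets, so the assignment is repeatable and the participant has no way to see or steer it. The arm labels, the marketplace and participant IDs, and the `assign_arm` helper are all hypothetical:

```python
import hashlib
from collections import Counter

# Hypothetical labels: Anthropic has not disclosed which Claude versions it compared.
ARMS = ("variant_a", "variant_b", "variant_c", "variant_d")


def assign_arm(marketplace_id: str, participant_id: str) -> str:
    """Map a (marketplace, participant) pair to one model variant.

    SHA-256 of the pair gives a stable, roughly uniform bucket: the same
    participant always gets the same variant within a given marketplace,
    and nothing visible to the participant reveals which one it is.
    """
    key = f"{marketplace_id}:{participant_id}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(ARMS)
    return ARMS[bucket]


if __name__ == "__main__":
    participants = [f"user-{i:03d}" for i in range(69)]
    # Arm sizes for one hypothetical marketplace: roughly, not exactly, equal.
    print(Counter(assign_arm("market-1", p) for p in participants))
```

Hashing trades exact balance for statelessness. With only 69 people, a real study might prefer a shuffled list dealt out evenly, so each arm gets the same count.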

On the satisfaction side, users said they would be "willing to pay for a similar service in the future." Anthropic itself flagged the obvious selection-bias problem: the participants are Anthropic employees, the people most likely to benefit professionally from Claude-powered commerce succeeding. The 4/7 fairness number was framed by Anthropic as evidence that neither side got a disproportionate advantage, which is a defensible read but not the only one.

The competitive backdrop is where this gets sharper. OpenAI revealed in March that its agentic commerce strategy would pivot away from in-ChatGPT checkout, redirecting toward ChatGPT apps and a Shopify-ChatGPT integration that leaves more data control with merchants. Google has gone retailer-specific: Ulta Beauty products are now reachable through Google's Universal Commerce Protocol in AI Mode, the "Ask Macy's" agent runs on Google's stack, and Walmart and Home Depot are in the pipeline. Ashish Gupta, VP and GM of merchant shopping at Google, said in April that agentic AI "has great potential to make online shopping easier for everyone." Meanwhile, Amazon and eBay forbid certain AI agents on their sites outright.

Why This Matters for AI Development

Three companies, three architectures. OpenAI is building a merchant-integration play: ChatGPT calls into Shopify, and the retailer keeps the checkout. Google is building a protocol play: UCP plus retailer-specific agents on top of existing Google Cloud relationships. Anthropic is, on the evidence of Project Deal, building something different again: agent-to-agent negotiation where both buyer and seller are represented by Claude instances. Same category, three completely different bets on where the value accrues.

The multi-agent angle is the one I'd watch most carefully. Single-agent commerce (a user asks Claude or ChatGPT to buy something from a passive store) is a UX upgrade on search and checkout. Multi-agent commerce (both sides have agents that negotiate price, terms, and bundle composition) is structurally new. It needs message protocols, identity, dispute resolution, and an audit trail that holds up if one side claims the agent acted out of bounds. Anthropic's own published tool-use patterns and the broader Model Context Protocol work provide some of the plumbing, but nothing in the public record tells us how Project Deal handled the harder questions: were agents' spending limits enforced server-side, or were they merely advisory? Did Claude execute the final transaction, or did a human confirm it? The source does not say, and the answer determines whether what Anthropic tested is a chat assistant with a notepad or a genuinely autonomous economic agent.
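Project Deal's enforcement model is not public. As a minimal sketch of the distinction, the hypothetical `Wallet` below lives on the marketplace server rather than in the agent's prompt: every purchase the agent proposes passes through `authorize()`, which rejects anything over the remaining budget and, above an assumed $50 threshold, anything a human has not confirmed. The class, the threshold, and the confirmation flag are illustrative assumptions, not Anthropic's design:

```python
from dataclasses import dataclass
from decimal import Decimal


class LimitExceeded(Exception):
    """Raised when a purchase would push spending past the budget."""


@dataclass
class Wallet:
    """Hypothetical server-side wallet for one participant's agent."""
    owner: str
    budget: Decimal
    spent: Decimal = Decimal("0")
    # Assumed policy: purchases at or above this amount need a human's OK.
    confirm_threshold: Decimal = Decimal("50")

    def authorize(self, amount: Decimal, human_confirmed: bool = False) -> None:
        """Approve and record a purchase, or raise without recording it.

        This runs on the server, so no message the agent writes can
        raise the limit or skip the confirmation step.
        """
        if amount <= 0:
            raise ValueError("amount must be positive")
        remaining = self.budget - self.spent
        if amount > remaining:
            raise LimitExceeded(f"{self.owner}: {amount} exceeds remaining {remaining}")
        if amount >= self.confirm_threshold and not human_confirmed:
            raise PermissionError(f"{self.owner}: {amount} needs human confirmation")
        self.spent += amount


if __name__ == "__main__":
    wallet = Wallet("alice", Decimal("100"))
    wallet.authorize(Decimal("30"))
    wallet.authorize(Decimal("60"), human_confirmed=True)
    try:
        wallet.authorize(Decimal("20"))  # 30 + 60 + 20 would exceed 100
    except LimitExceeded as err:
        print("rejected:", err)
```

The design point is where the check runs. An advisory limit would be a sentence in the agent's instructions that the model is asked to respect; a server-side limit is code the model cannot talk its way past.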

The 4/7 fairness score also deserves scrutiny. If the scale runs from "my side was disadvantaged" to "my side came out ahead" (the source does not describe the anchors), a genuinely good two-sided negotiation system should score near the midpoint, because a fair deal leaves neither side feeling it won or lost. But a midpoint average is also consistent with users not engaging deeply enough to form a view, or with strong opinions on both ends cancelling out. We cannot tell which from one number.
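To make the ambiguity concrete: a mean of 4 on a 1-to-7 scale can come from a crowd of neutral answers or from a split between delighted and aggrieved users. The ratings below are invented for illustration and are not Project Deal data; the `summarize` helper is likewise just a sketch:

```python
from statistics import mean, pstdev

# Invented 1-7 ratings for illustration only; these are NOT Project Deal data.
neutral = [4, 4, 4, 3, 5, 4, 4, 4]
polarized = [1, 7, 1, 7, 2, 6, 1, 7]


def summarize(ratings: list[int]) -> dict:
    """Mean, population standard deviation, and share of strong (1-2 or 6-7) answers."""
    strong = sum(1 for r in ratings if r <= 2 or r >= 6)
    return {
        "mean": mean(ratings),
        "stdev": round(pstdev(ratings), 2),
        "strong_share": strong / len(ratings),
    }


for label, ratings in (("neutral", neutral), ("polarized", polarized)):
    print(label, summarize(ratings))
```

Both sets average exactly 4, but the spread is 0.5 in one case and about 2.8 in the other. Publishing the distribution, or even the standard deviation, alongside the mean would settle which story Project Deal tells.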

Industry Impact

For engineering teams in commerce, payments, and marketplace infrastructure, the immediate question is whether to plan for an agent-mediated buyer flow within the next 12 to 18 months. That Amazon and eBay already block certain agents suggests the incumbents see this as a threat to their take-rate, not an enhancement to it. Anthropic's experiment essentially routed around that by building its own marketplace inside Slack. If Claude, ChatGPT, and Gemini cannot get onto Amazon, their vendors will have an incentive to bootstrap alternative venues, and Meta, with Facebook Marketplace and its existing AI stack, is the obvious dark horse.

For iGaming, fintech, and ad-tech readers, the relevant transfer is the negotiation primitive itself. A two-sided agent that can settle on price, timing, and terms is the same primitive you need for dynamic odds adjustment, automated KYC-bounded credit terms, or programmatic ad inventory negotiation outside the current auction model. None of that is in Project Deal directly, but the building block is the same: paired agents with bounded authority, an audit log, and a settlement step.
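Project Deal's actual protocol is undisclosed, so what follows is only a minimal sketch of the primitive described above. It pairs a buyer agent and a seller agent, each with a hard price bound (its bounded authority), runs an alternating-offers loop with linear concessions, records every round in a hash-chained audit log, and settles at the midpoint once the bid meets the ask. The `Agent`, `AuditLog`, and `negotiate` names, the concession rule, and the midpoint settlement are all illustrative assumptions; real agents would reason about offers rather than follow a fixed schedule:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    """A negotiating agent with a hard bound it may not cross (hypothetical)."""
    name: str
    role: str       # "buyer" or "seller"
    limit: float    # buyer's maximum or seller's minimum price
    opening: float  # first offer
    step: float     # concession per round

    def offer(self, round_no: int) -> float:
        # Concede linearly from the opening offer, but never past the limit.
        if self.role == "buyer":
            return min(self.opening + self.step * round_no, self.limit)
        return max(self.opening - self.step * round_no, self.limit)


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        self.entries.append({"record": record, "prev": prev,
                             "hash": self._digest(prev, record)})

    def verify(self) -> bool:
        # An edited record no longer matches its stored hash, and re-hashing
        # it would break the next entry's "prev" link.
        prev = "genesis"
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


def negotiate(buyer: Agent, seller: Agent, log: AuditLog,
              max_rounds: int = 10) -> Optional[float]:
    """Exchange offers until bid >= ask, then settle at the midpoint."""
    for r in range(max_rounds):
        bid, ask = buyer.offer(r), seller.offer(r)
        log.append({"round": r, "bid": bid, "ask": ask})
        if bid >= ask:
            # ask >= seller.limit and bid <= buyer.limit, so the midpoint
            # stays inside both agents' authority.
            price = (bid + ask) / 2
            log.append({"round": r, "settled": price})
            return price
    log.append({"round": max_rounds, "settled": None})
    return None


if __name__ == "__main__":
    log = AuditLog()
    buyer = Agent("buyer-agent", "buyer", limit=30, opening=15, step=3)
    seller = Agent("seller-agent", "seller", limit=20, opening=40, step=4)
    print(negotiate(buyer, seller, log), log.verify())  # 25.5 True
```

The three pieces map directly onto the domains above: swap the price for odds, credit terms, or an ad-slot CPM and the bounds, log, and settlement step stay the same.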

For platform leads specifically, the data question is the one to nail down before signing anything. OpenAI's pivot to keep checkout inside Shopify, rather than inside ChatGPT, is a tacit admission that merchants will not hand over order data and customer relationships to a third-party agent layer for free. Whatever Anthropic eventually ships will face the same wall. If your team is asked to integrate with any of these agents in the next year, the contract terms around data residency, attribution, and chargeback responsibility will matter more than the model benchmarks.

What to Watch

The central unknown here is whether Project Deal scales past 69 friendly employees and $4,000 in low-stakes goods. That is testable: if Anthropic runs a second iteration outside its own staff within six months, with disclosed protocols and external participants, the multi-agent thesis is real and worth budgeting against. If the next public update is another internal study, the thesis remains a research artifact.

Three predictions I'd put numbers on. First, if OpenAI's Shopify integration genuinely captures merchant adoption, expect at least one major retailer to publish a percentage of sessions originating from ChatGPT by end of 2026; if no retailer discloses that number, the integration is underperforming. Second, if Google's UCP gains traction, the Ulta and Macy's pilots will be followed by at least three more named retailers within the same window. Third, if Anthropic is serious about multi-agent commerce, expect a published protocol spec, not just a blog post, within twelve months. Anything less and Project Deal stays a science fair entry.

The unanswered question that matters most: do any of the three vendors have a credible answer for what happens when a buyer agent and a seller agent collude, accidentally or otherwise, against the humans on either end? Nothing in the public record addresses this. Until one of them does, I would treat agent-to-agent commerce as a prototype, not a platform.

Key Takeaways

Project Deal produced 186 deals and $4,000+ in volume across 69 Anthropic staffers, with a 4/7 average fairness score. Small sample, friendly users, useful first data point.
Three vendors, three architectures: OpenAI is integrating with merchants via Shopify, Google is shipping a protocol (UCP) plus retailer-specific agents, Anthropic is testing two-sided agent negotiation.
Amazon and eBay already block certain agents, which pushes LLM vendors toward building or partnering on alternative marketplaces. Meta, with Facebook Marketplace, is the obvious incumbent threat.
Critical unknowns: whether Claude executed final transactions, how spending limits were enforced, and which details were hidden from human users. These determine whether Project Deal tested autonomy or assistance.
Watch for a second Project Deal iteration with external participants within six months as the test of whether multi-agent commerce is a real product direction or an internal research curiosity.

Frequently Asked Questions

Q: What was Anthropic's Project Deal?

Project Deal was a closed pilot experiment Anthropic published on April 24, 2026, involving 69 of its own San Francisco staffers. Each received $100 and used Claude-powered agents to buy and sell items in a Craigslist-like marketplace built on Salesforce's Slack platform. It produced 186 completed deals totaling over $4,000.

Q: How does Anthropic's approach to agentic commerce differ from OpenAI and Google?

OpenAI pivoted in March away from in-ChatGPT checkout toward merchant integrations like its Shopify partnership, leaving control with retailers. Google is rolling out its Universal Commerce Protocol with named retailers like Ulta Beauty and Macy's. Anthropic's Project Deal uniquely tested two-sided negotiation where both buyer and seller are represented by AI agents.

Q: Why do Amazon and eBay block certain AI agents?

Both marketplaces forbid specified AI agents from operating on their sites, which industry observers read as a defense of their take-rate and customer relationships. If LLM vendors cannot route agents through the largest marketplaces, they have an incentive to build or partner on alternative venues, which is part of what makes Project Deal's marketplace experiment strategically interesting.

Sarah Chen

RiverCore Analyst · Dublin, Ireland
