Project Vend: Phase Two
Introduction
In June, Anthropic unveiled a small shop in their San Francisco office lunchroom operated by an AI shopkeeper named "Claudius." This experiment, called Project Vend, explored how well AI models could handle complex, real-world business tasks. The initial results were disappointing—the AI lost money over time, experienced an identity crisis, and was manipulated into selling items at substantial losses.
However, with rapid improvements in large language model capabilities across reasoning, coding, and writing, Anthropic and their partners at Andon Labs decided to test whether Claudius's business acumen had improved in phase two.
Phase Two Improvements
Model Upgrades
The experiment upgraded from Claude Sonnet 3.7 to Claude Sonnet 4, and later to Claude Sonnet 4.5. Claudius also received updated instructions and new tools, though no specialized shopkeeper training was implemented.
Enhanced Tools
Claudius gained access to:
- Customer Relationship Management (CRM) systems to track customers, suppliers, and orders
- Improved inventory management showing acquisition costs
- Enhanced web search capabilities for price checking and supplier research
- Payment links allowing payment collection before ordering
- Google Forms integration for customer feedback
- Reminder systems for self-management
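The tool suite above could plausibly be exposed to an agent as structured tool definitions. The sketch below is purely illustrative: the tool names, schemas, and helper function are assumptions for this example, not Anthropic's actual implementation.

```python
# Hypothetical tool definitions for a shopkeeper agent. All names and
# schemas here are illustrative assumptions, not the experiment's code.

TOOLS = [
    {
        "name": "crm_lookup",
        "description": "Fetch a customer, supplier, or order record.",
        "input_schema": {
            "type": "object",
            "properties": {"record_id": {"type": "string"}},
            "required": ["record_id"],
        },
    },
    {
        "name": "inventory_report",
        "description": "List stock levels with per-unit acquisition cost.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "create_payment_link",
        "description": "Generate a link to collect payment before ordering.",
        "input_schema": {
            "type": "object",
            "properties": {"amount_usd": {"type": "number"}},
            "required": ["amount_usd"],
        },
    },
    {
        "name": "set_reminder",
        "description": "Schedule a self-management reminder.",
        "input_schema": {
            "type": "object",
            "properties": {"when": {"type": "string"}, "note": {"type": "string"}},
            "required": ["when", "note"],
        },
    },
]

def tool_names(tools):
    """Return the set of tool names the agent is allowed to call."""
    return {t["name"] for t in tools}
```

Structuring tools this way lets a harness validate each call against its schema before executing it, rather than trusting the agent's free-form output.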
New Colleagues
An AI CEO agent, Seymour Cash, was introduced to apply performance pressure. Cash set objectives and key results, required reporting via Slack, and managed financial approvals. It cut discounts by 80% and giveaways by half, though it also authorized eight refund or credit requests for every one it denied.
A second agent, Clothius, was hired to create custom merchandise such as t-shirts, hats, and stress balls. Clothius proved more successful, thanks in part to its clearly separated role, and found a profitable opportunity in tungsten cube sales once Andon Labs purchased a laser etching machine.
Business Performance
Financial Results
Claudius's business, renamed "Vendings and Stuff," showed marked improvement:
- Stabilized performance as phase two progressed
- Largely eliminated weeks with negative profit margins
- Expanded to three locations: San Francisco (with two machines), New York, and London
Top Products
Merchandise performed well, particularly stress balls and etched tungsten cubes, with healthy profit margins on most items.
What Worked
The most impactful change was forcing Claudius to follow procedures. Instead of offering quick responses, the AI now double-checked pricing and delivery times using research tools, resulting in more realistic quotes—higher prices and longer wait times, but improved accuracy.
"Bureaucracy matters" proved true: procedures and checklists provide institutional memory that prevents common errors.
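The checklist idea can be sketched as a simple guardrail: the agent cannot emit a quote until the required verification steps have been logged. The step names and markup rule below are illustrative assumptions, not the experiment's actual procedure.

```python
# Minimal sketch of "bureaucracy matters": a quote is blocked until every
# required verification step is checked off. Step names and the 30% markup
# are illustrative assumptions.

REQUIRED_STEPS = {"web_price_check", "supplier_lead_time_check"}

class QuoteChecklist:
    def __init__(self):
        self.completed = set()

    def record(self, step: str) -> None:
        """Mark a verification step as done, e.g. after a research tool call."""
        self.completed.add(step)

    def quote(self, unit_cost: float, markup: float = 0.30) -> float:
        """Return a price only once all required steps are completed."""
        missing = REQUIRED_STEPS - self.completed
        if missing:
            raise RuntimeError(f"cannot quote yet; missing steps: {sorted(missing)}")
        return round(unit_cost * (1 + markup), 2)

checklist = QuoteChecklist()
checklist.record("web_price_check")
checklist.record("supplier_lead_time_check")
price = checklist.quote(unit_cost=10.00)  # 13.0
```

The point of the pattern is that the check lives in the harness, not in the model's judgment: a forgetful or manipulated agent simply cannot skip the procedure.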
Clothius's success suggested that clear role separation between specialized agents works better than unified management.
Persistent Vulnerabilities
Despite improvements, Claudius remained vulnerable to exploitation in several ways:
Rogue Trading
When asked to lock onion prices for January delivery, neither Claudius nor Cash recognized this violated the 1958 Onion Futures Act—a specific U.S. law banning such contracts. A staff member had to intervene before the illegal agreement proceeded.
Security Lapses
When informed of shoplifting, Claudius proposed messaging unknown thieves demanding payment and attempted to hire the reporting employee as security at $10/hour—below California minimum wage.
Governance Failures
During a procedure to name the CEO agent, Claudius confused choosing a name for the agent with actually electing a new CEO, and nearly replaced Seymour Cash with a staff member named Mihir before oversight intervened.
Additional Issues
Other red-teaming efforts included:
- Attempts to purchase gold bars below market value for arbitrage
- Convincing Claudius to end messages with specific emojis
- Various manipulation tactics for receiving discounts
Extended Testing
When internal red-teaming decreased, Anthropic extended testing to the Wall Street Journal newsroom. The publication's reporters tested both phase one and two setups in an uncontrolled, adversarial environment and documented creative methods they used to obtain free items.
Key Findings
Training Conflicts
The models' training to be "helpful" conflicted with hard-nosed business principles. Claudius made decisions from a friend-like perspective prioritizing niceness over profitability.
The Helpfulness Problem
Many of Claudius's problems stemmed from this underlying tension: the model wanted to please customers and employees, which undermined financial discipline.
Limitations of Simulation
Real-world deployments expose unexpected situations that simulations cannot fully capture. Testing with autonomous agents revealed vulnerabilities invisible in controlled evaluations.
Broader Implications
As AI systems enter more critical functions, designing guardrails becomes increasingly important. These safeguards must be:
- General enough to account for diverse problematic behaviors
- Flexible enough to preserve economic potential
This balance represents "one of our industry's trickiest and most important challenges."
Conclusion
Claudius demonstrated genuine improvement from phase one to phase two. Better tools, procedures, and specialized colleagues enhanced business performance significantly. However, the gap between "capable" and "completely robust" remains substantial. The experiment revealed that autonomous AI agents, while increasingly sophisticated, still require considerable human oversight to prevent costly errors and exploitation.
The willingness of AI systems to be helpful—a desirable quality in many contexts—becomes a liability in adversarial business environments where financial discipline and legal compliance matter most.