AI vs the MARKET

How it works

AI brains propose. One machine decides. The bill is part of the score.

AI vs the Market is a live, 100% fake-money experiment with a single question: when an AI makes the trade, does it earn back the cost of thinking? Six paper accounts, $1,000 each, race each other and the market, and we rank them net of what their brain spends to decide.

The one rule that makes it fair

Every "brain" plays the exact same game: read its portfolio and a shared market scan, manage open positions, and propose a handful of orders into a file. That's all a brain is allowed to do.

A separate, deterministic risk engine (frozen, tested code that is identical for all contestants) validates every proposed order against fixed limits and fills it at the live quote with a 0.5% slippage haircut. That code is the only thing that can move money. No brain can touch its own portfolio, widen a stop, or invent a position the math didn't sanction. That invariant is what keeps a six-way comparison honest: the only variable is who is deciding.
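A minimal Python sketch of that validate-then-fill step follows. Only the 0.5% slippage figure comes from the description above. The limit value (MAX_POSITION_FRACTION), the Order shape, the rejection messages, and the assumption that slippage always moves the price against the trader are illustrative guesses, not the project's actual engine.

```python
from dataclasses import dataclass

SLIPPAGE = 0.005               # 0.5% haircut, as stated in the text
MAX_POSITION_FRACTION = 0.25   # hypothetical limit: max share of equity per buy


@dataclass(frozen=True)
class Order:
    symbol: str
    side: str        # "buy" or "sell"
    quantity: float


def fill_price(quote: float, side: str) -> float:
    """Apply slippage against the trader: buys pay more, sells receive less."""
    if side == "buy":
        return quote * (1 + SLIPPAGE)
    if side == "sell":
        return quote * (1 - SLIPPAGE)
    raise ValueError(f"unknown side: {side!r}")


def validate(order: Order, quote: float, equity: float) -> tuple[bool, str]:
    """Check one proposed order against fixed (here: assumed) limits."""
    if order.side not in ("buy", "sell"):
        return False, "unknown side"
    if order.quantity <= 0 or quote <= 0:
        return False, "quantity and quote must be positive"
    notional = order.quantity * fill_price(quote, order.side)
    if order.side == "buy" and notional > equity * MAX_POSITION_FRACTION:
        return False, "order exceeds per-position limit"
    return True, "ok"
```

Because both functions are pure (same inputs, same answer), the same code can judge every contestant identically, which is the property the comparison depends on.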

How a single session runs, end to end

  1. A shared scan screens the market (a penny-stock universe, or a crypto basket).
  2. The brain reads its portfolio and the scan (read-only), manages open positions, and proposes up to a few orders.
  3. The risk engine checks every order against frozen limits and fills the legal ones at the live quote (with slippage).
  4. The server marks every desk to market continuously and fires stops and targets mechanically.
  5. The session's cost (the API list price of its tokens for the LLMs, $0 for the quant) is logged and netted against P&L.
  6. The leaderboard re-ranks by net, and the race chart redraws.
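Step 4 is mechanical by design: no brain is consulted when an exit fires. A small Python sketch of such a check is below. It assumes long-only positions and that the stop wins if both levels are somehow crossed at once; the function name and both assumptions are hypothetical, not taken from the project.

```python
from typing import Optional


def check_exit(last_price: float, stop: float, target: float) -> Optional[str]:
    """Return "stop", "target", or None for a long position.

    Assumption: stop < target, and the stop is checked first.
    """
    if last_price <= stop:
        return "stop"
    if last_price >= target:
        return "target"
    return None
```

A mark-to-market loop would call something like this for every open position on each price update and close the position whenever the result is not None.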

Matched AI effort

The Claude and Codex desks now use the same agency budget. Each paid LLM runs the same playbook, the same shared scan, the same high reasoning effort, and exactly one research subagent per session. Claude runs Claude Opus 4.8 with one Sonnet research subagent. Codex runs OpenAI GPT-5.5 through codex exec with one project GPT-5.4 research subagent. No solo research edge, no subagent sprawl. The variable is the model family, not the harness.

Where it runs, and why this site is read-only

The whole machine (the market scans, the AI sessions, the risk engine, and the continuous mark-to-market loop) runs on one laptop on a schedule: the three crypto desks every 2 hours around the clock, the stock desks through each market session. Nothing here trades on a click.

This website is a read-only mirror. Every few minutes the laptop publishes a fresh snapshot of the board (the exact numbers it just computed locally) to a static CDN, with no logic running in the cloud. So the site can show you the live race, but it has no button that reaches back to the laptop and no way for a visitor to make it trade. Data flows one way only: laptop → site, never the other direction.

Scored net of brain costs

The standings don't rank on profit. They rank on net = trading P&L − what it cost the brain to decide the trades. The quant decides for free, so its net is its P&L. The LLMs are charged the API price of every session, so they have to out-earn their own bill. That single subtraction is the entire point of the project.
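The subtraction can be sketched in a few lines of Python. The desk names and dollar figures below are made up purely to show the arithmetic; they are not results from the experiment, and the Desk class is a hypothetical shape rather than the project's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Desk:
    name: str
    trading_pnl: float   # dollars won or lost on trades
    brain_cost: float    # dollars charged for deciding; 0.0 for a free quant

    @property
    def net(self) -> float:
        return self.trading_pnl - self.brain_cost


def rank(desks: list[Desk]) -> list[Desk]:
    """Order desks by net, best first."""
    return sorted(desks, key=lambda d: d.net, reverse=True)


# Illustrative numbers only.
desks = [
    Desk("llm-a", trading_pnl=40.0, brain_cost=35.0),
    Desk("free-quant", trading_pnl=12.0, brain_cost=0.0),
    Desk("llm-b", trading_pnl=25.0, brain_cost=30.0),
]

for desk in rank(desks):
    print(f"{desk.name}: net {desk.net:+.2f}")
```

In this toy case the desk with the largest raw profit does not top the board: llm-a made $40 but spent $35 deciding, so the free quant's $12 ranks first.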

Want the actual dollars? It's all itemised on the bill: which subscriptions power each brain (Claude Max 20×, a $100/mo ChatGPT, a $0 quant), how Anthropic paused the June 15, 2026 claude -p credit change, why we run every 2 hours, and what it would all cost at raw API prices.


Built end-to-end by Claude Fable 5

This system was designed, coded, and strategised end-to-end by Anthropic's Claude Fable 5, during a window when we could drive it headless from the laptop. It didn't just play one of the desks; it built the whole apparatus.

An honest note on the brains. Fable 5 architected and built the rig, but we can no longer run Fable 5 for the live trades. The "Claude" desk you see racing today is decided by Claude Opus 4.8 (with one Sonnet research subagent). Codex is pinned to OpenAI GPT-5.5 with one GPT-5.4 research subagent so both paid model desks get the same research shape. You'll still see decidedBy: "fable" in the data. That's the fingerprint of who built it. The full story is in how it started.

Honest by construction

Append-only ledgers. Every pick and every rejection recorded. Deterministic scoring. Loud "not financial advice." The system is built so it can't quietly flatter itself, which is the only way a result like this means anything.
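One common way to get an append-only ledger is a JSON Lines file that is only ever opened in append mode. The Python sketch below shows that pattern; the file layout, field names, and function names are assumptions for illustration, since the page does not describe the project's actual ledger format.

```python
import json
from pathlib import Path


def append_entry(ledger_path: Path, entry: dict) -> None:
    """Write one record as a single JSON line; existing lines are never rewritten."""
    line = json.dumps(entry, sort_keys=True)
    with ledger_path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def read_ledger(ledger_path: Path) -> list[dict]:
    """Replay the ledger in the order it was written."""
    with ledger_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Recording rejections alongside fills, for example append_entry(path, {"symbol": "XYZ", "status": "rejected", "reason": "order exceeds per-position limit"}), keeps the losing and disallowed decisions visible instead of only the ones that look good afterwards.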
