Anthropic Releases Hiring Exam Claude Keeps Beating as Open Engineering Challenge

Anthropic

Anthropic's performance engineering take-home, completed by more than 1,000 candidates since 2024, has been defeated by successive Claude models, forcing three complete redesigns. Anthropic is releasing the original exam on GitHub, where the fastest human solution still beats Claude's best after hours of compute.

Anthropic has published an account of how three successive Claude models defeated its performance engineering take-home. The exam, built around a Python simulator in which candidates optimize a parallel tree traversal, has been completed by more than 1,000 candidates since 2024. Claude Opus 4 outperformed most humans within the 4-hour limit, and Claude Opus 4.5 matched the best human performance in 2 hours.

Anthropic redesigned the exam three times. The current version uses Zachtronics-inspired puzzles with constrained instruction sets: problems novel enough that models can't draw on training data. The takeaway is that domain-specific problems fall to models when the domain is well covered in training data, while novel, constrained puzzles are where human reasoning still wins.

The original exam is now open source. Anyone who scores below 1,487 cycles can email performance-recruiting@anthropic.com with their code and resume.

Anthropic (@AnthropicAI) on X:

New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij
