New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij
Anthropic Releases Hiring Exam Claude Keeps Beating as Open Engineering Challenge
Anthropic published how its performance engineering take-home has been defeated by three successive Claude models. The exam, a Python simulator in which candidates optimize a parallel tree traversal, has been completed by 1,000+ candidates since 2024. Claude Opus 4 outperformed most humans within the 4-hour limit, and Claude Opus 4.5 matched the best human performance in 2 hours.
Anthropic redesigned the exam three times. The current version uses Zachtronics-inspired puzzles with constrained instruction sets: problems sufficiently novel that models can't draw on training data. The takeaway: domain-specific problems fall to models with extensive training coverage, while novel, constrained puzzles are where human reasoning still wins.
The original exam is open-sourced. Score below 1,487 cycles and email performance-recruiting@anthropic.com with your code and resume.
Anthropic
@AnthropicAI


