HeadsUpAI

Anthropic Releases Hiring Exam Claude Keeps Beating as Open Engineering Challenge

Anthropic published an account of how its performance engineering take-home exam has been beaten by three successive Claude models. The exam, a Python simulator in which candidates optimize a parallel tree traversal, has been completed by more than 1,000 candidates since 2024. Claude Opus 4 outperformed most human candidates within the 4-hour limit, and Claude Opus 4.5 matched the best human performance in 2 hours.

Anthropic redesigned the exam three times. The current version uses Zachtronics-inspired puzzles with constrained instruction sets, problems novel enough that models can't draw on training data. The takeaway: problems in domains with extensive training coverage fall to models, while novel, constrained puzzles are where human reasoning still wins.

The original exam is open-sourced. Score below 1,487 cycles and email performance-recruiting@anthropic.com with your code and resume.

Anthropic
@AnthropicAI
X

New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij
