Anthropic Releases Hiring Exam Claude Keeps Beating as Open Engineering Challenge

Anthropic

Jan 22, 2026 · Updated Apr 25, 2026

Anthropic's performance engineering take-home, completed by 1,000+ candidates since 2024, has been defeated by successive Claude models, forcing three complete redesigns. Anthropic is releasing the original exam on GitHub, where the fastest human solution still beats Claude's best after hours of compute.

Anthropic has published an account of how three successive Claude models defeated its performance engineering take-home. The exam, a Python simulator in which candidates optimize a parallel tree traversal, has been completed by more than 1,000 candidates since 2024. Claude Opus 4 outperformed most humans within the 4-hour limit, and Claude Opus 4.5 matched the best human performance in 2 hours.
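Scores on this exam are cycle counts (the open challenge's threshold is 1,487 cycles), and the work is making a tree traversal run in fewer of them. The sketch below illustrates that idea under stated assumptions. It is a hypothetical toy model, not Anthropic's simulator or its instruction set.

It assumes the following:
- A machine can issue up to `width` independent operations per cycle.
- Each tree level costs three dependent stages: load, mix, and choose child.
- The multiplicative hash constant is arbitrary.

All names (`ToyMachine`, `traverse`) are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyMachine:
    """Hypothetical machine: each cycle can issue up to `width` independent ops."""
    width: int
    cycles: int = 0

    def issue(self, n_ops: int) -> None:
        # Pack n independent ops into bundles of `width`; each bundle costs one cycle.
        self.cycles += math.ceil(n_ops / self.width)


def traverse(values: list[int], tree: list[int], depth: int, m: ToyMachine) -> list[int]:
    """Walk every input down an implicit binary tree (children of i at 2i+1, 2i+2)."""
    idx = [0] * len(values)
    vals = list(values)
    n = len(vals)
    for _ in range(depth):
        # Walkers are independent of each other, but the three stages below
        # depend on one another, so each stage is charged separately.
        m.issue(n)  # stage 1: load tree[idx] for every walker
        m.issue(n)  # stage 2: mix the node value into the walker's value
        m.issue(n)  # stage 3: pick the left or right child from the low bit
        for i in range(n):
            # Arbitrary toy hash (assumption), kept within 32 bits.
            vals[i] = ((vals[i] ^ tree[idx[i]]) * 2654435761) % 2**32
            idx[i] = 2 * idx[i] + 1 + (vals[i] & 1)
    return idx


tree = list(range(15))  # full binary tree with 4 levels (15 nodes)
for width in (1, 4, 8):
    machine = ToyMachine(width)
    leaves = traverse(list(range(8)), tree, depth=4, m=machine)
    print(width, machine.cycles)
```

With 8 walkers over 4 levels, the toy charges 3 × 8 × 4 = 96 cycles at width 1, 24 at width 4, and 12 at width 8. The final leaf indices are the same in every case. Only the packing of independent work into each cycle changes, which is the kind of lever a cycle-count score rewards.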

Anthropic redesigned the exam three times. The current version uses Zachtronics-inspired puzzles with constrained instruction sets, problems novel enough that models can't draw on their training data. The takeaway: domain-specific problems fall to models with extensive training coverage, while novel, constrained puzzles are where human reasoning still wins.

The original exam is now open source. If your solution runs in fewer than 1,487 cycles, email performance-recruiting@anthropic.com with your code and resume.


Anthropic

@AnthropicAI · Jan 22

New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij

