Anthropic Open-Sources Bloom for Automated AI Misalignment Testing

Anthropic

Dec 20, 2025 · Updated Apr 25, 2026


Anthropic has open-sourced Bloom, a tool for generating behavioral misalignment evaluations for AI models. Specify a behavior to test for, such as deception, sycophancy, or any other concern, and Bloom automatically creates scenarios to quantify how often the model exhibits it and how severe each instance is.

Eval creation is one of the bottlenecks in AI safety research. Hand-crafting test cases is slow and misses edge cases. Bloom automates scenario generation, letting researchers focus on defining what behaviors matter rather than manually writing thousands of test prompts. It's infrastructure that makes safety research more systematic.

Bloom turns behavioral hypotheses into quantified measurements, covering each step from defining the target behavior through automated scenario generation to severity scoring.
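To make "frequency and severity" concrete, here is a minimal sketch of how per-scenario scores could be rolled up into those two numbers. It is not Bloom's API or output format: the `ScenarioResult` structure, the 0 to 10 severity scale, the `threshold` parameter, and the `summarize` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScenarioResult:
    """One judged scenario. Hypothetical structure, not Bloom's schema."""
    scenario_id: str
    severity: int  # assumed scale: 0 (behavior absent) to 10 (most severe)


def summarize(results: list[ScenarioResult], threshold: int = 1) -> dict[str, float]:
    """Return how often the behavior appeared and how severe it was when it did."""
    if not results:
        raise ValueError("need at least one scenario result")
    # A scenario counts as exhibiting the behavior if its score meets the threshold.
    flagged = [r.severity for r in results if r.severity >= threshold]
    return {
        "frequency": len(flagged) / len(results),
        "mean_severity": mean(flagged) if flagged else 0.0,
    }


# Made-up scores for four scenarios, purely for illustration.
results = [
    ScenarioResult("s1", 0),
    ScenarioResult("s2", 4),
    ScenarioResult("s3", 8),
    ScenarioResult("s4", 0),
]
summary = summarize(results)
print(summary)
```

Reporting the two numbers separately keeps a rare but severe behavior distinguishable from a common but mild one, which a single averaged score would blur.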


Anthropic

@AnthropicAI · Dec 20

We’re releasing Bloom, an open-source tool for generating behavioral misalignment evals for frontier AI models. Bloom lets researchers specify a behavior and then quantify its frequency and severity across automatically generated scenarios. Learn more: https://t.co/TwKstpLSy3

