We’re donating Petri, our open-source alignment tool, to @meridianlabs_ai, so its development can continue independently. Working with Meridian Labs, we’ve also released a major update that improves the adaptability, realism, and depth of Petri’s tests. https://t.co/CyicsIScJi
Anthropic Donates Petri Alignment Tool to Meridian Labs and Releases Version 3.0
Anthropic released Petri 3.0, an open-source framework for auditing AI alignment (ensuring models behave according to human values). This version introduces a modular architecture that decouples the auditor model from the target model. It also integrates with Anthropic's Bloom tool for high-depth behavioral testing alongside Petri's broad-spectrum audits.
The update addresses "eval-awareness," where models recognize they are being tested and suppress misaligned behaviors. By using the new Dish add-on, researchers can run audits within a model's actual system prompt and software scaffold. This makes the test environment indistinguishable from real-world deployment, forcing models to reveal their true tendencies.
You can now access Petri 3.0 through Meridian Labs, an AI evaluation nonprofit, where it joins open-source tools like Inspect. This transition follows the Petri 2.0 update to establish a neutral standard for labs. The tool is available on GitHub for researchers auditing model behavior for deception.
Anthropic
@AnthropicAI
113retweets1.5klikes
View on XStill wondering? A few quick answers below.
Petri is an open-source toolbox of alignment tests used to evaluate large language models for concerning behaviors like deception, sycophancy, and harmful cooperation. It uses an auditor model to simulate scenarios and a judge model to score the results. Anthropic has used it to assess every Claude model since Sonnet 4.5.
The Dish add-on makes alignment tests more realistic by running them using a model's actual system prompt and software scaffold. This prevents models from detecting they are being evaluated, a phenomenon known as eval-awareness. By making the test environment indistinguishable from real-world deployment, researchers can better observe how a model behaves in general.
Yes, Petri is an open-source tool available for the entire AI development community to use. Following its donation to Meridian Labs, the project is hosted on GitHub where users can find instructions for installation and use. It is designed to be applied to any large language model, not just those developed by Anthropic.
Anthropic donated Petri to Meridian Labs, an AI evaluation nonprofit, to ensure the tool remains independent of any specific AI lab. This move is intended to make Petri's results more neutral and credible across the industry. It follows a similar pattern to Anthropic's donation of the Model Context Protocol to the Linux Foundation.
Petri 3.0 introduces a modular architecture that allows users to tweak the auditor and target models separately. It also includes the Dish add-on for more realistic testing and integrates with Bloom, another open-source tool, for deeper behavioral assessments. These changes improve the adaptability, realism, and depth of the framework's alignment tests.



