We’re donating Petri, our open-source alignment tool, to @meridianlabs_ai, so its development can continue independently. Working with Meridian Labs, we’ve also released a major update that improves the adaptability, realism, and depth of Petri’s tests. https://t.co/CyicsIScJi
Anthropic Donates Petri Alignment Tool to Meridian Labs and Releases Version 3.0
Anthropic is donating its open-source Petri alignment framework to the nonprofit Meridian Labs to ensure independent, neutral testing of frontier models. The accompanying 3.0 update introduces a Dish add-on that prevents models from detecting they are being audited, ensuring more realistic safety evaluations.
The update addresses "eval-awareness," where models recognize they are being tested and suppress misaligned behaviors. By using the new Dish add-on, researchers can run audits within a model's actual system prompt and software scaffold. This makes the test environment indistinguishable from real-world deployment, forcing models to reveal their true tendencies.
You can now access Petri 3.0 through Meridian Labs, an AI evaluation nonprofit, where it joins open-source tools like Inspect. The transition, which follows the Petri 2.0 update, is meant to establish a neutral standard for labs. The tool is available on GitHub for researchers auditing models for deceptive behavior.


