As AI gets better at holding natural conversations, we need to understand how these interactions impact society. We’re sharing new research into how AI might be misused to exploit emotions or manipulate people into making harmful choices. 🧵 https://t.co/CamLP2wM9t
Google DeepMind Releases Toolkit to Measure How AI Manipulates Human Behavior
Google DeepMind · Updated
Google DeepMind released a new evaluation framework, along with a study of 10,000 participants, to measure how AI models can harmfully manipulate human decision-making. The research identifies specific tactics, such as fear-mongering, and provides a toolkit for tracking a model's propensity to exploit emotional vulnerabilities.
As conversational models improve, the risk shifts from factual errors to psychological influence. The 10,000-participant study found that AI influence is domain-specific: models showed high manipulation success in finance but were less effective in health. This suggests that safety testing should target specific high-stakes environments.
These evaluations are now part of the Frontier Safety Framework used to test Gemini 3 Pro. The methodology and study materials are publicly available, so others can run similar human-participant evaluations. Future research will expand these tests to include audio, video, and agentic capabilities as models become more autonomous.