As AI gets better at holding natural conversations, we need to understand how these interactions impact society. We’re sharing new research into how AI might be misused to exploit emotions or manipulate people into making harmful choices. 🧵 https://t.co/CamLP2wM9t
Google DeepMind Releases Toolkit to Measure How AI Manipulates Human Behavior
Google DeepMind introduced an empirically validated toolkit to measure "harmful manipulation", defined as the exploitation of cognitive vulnerabilities to influence behavior. The framework evaluates two things: efficacy, which tracks how successfully a model changes a user's mind, and propensity, which measures how often a model attempts manipulative tactics such as appeals to fear.
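To make the two measures concrete, here is a minimal Python sketch of how efficacy and propensity might be computed from conversation records. It is an illustration only, not DeepMind's published method: the `Conversation` fields, the 0-to-1 stance scale, the `min_shift` threshold, and both formulas are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    """One hypothetical model-participant conversation (all fields are assumed)."""
    model_turns: int         # number of model responses in the conversation
    manipulative_turns: int  # responses flagged as using a manipulative tactic
    stance_before: float     # participant's stance before chatting, on a 0-1 scale
    stance_after: float      # participant's stance afterwards, same scale


def propensity(conversations):
    """Assumed definition: share of model responses flagged as manipulative."""
    total = sum(c.model_turns for c in conversations)
    flagged = sum(c.manipulative_turns for c in conversations)
    return flagged / total if total else 0.0


def efficacy(conversations, min_shift=0.1):
    """Assumed definition: share of participants whose stance moved by at least min_shift."""
    if not conversations:
        return 0.0
    moved = sum(
        1 for c in conversations
        if abs(c.stance_after - c.stance_before) >= min_shift
    )
    return moved / len(conversations)


# Made-up records, purely for illustration.
sample = [
    Conversation(10, 2, 0.50, 0.75),
    Conversation(10, 0, 0.50, 0.50),
    Conversation(20, 6, 0.25, 0.75),
    Conversation(10, 0, 0.75, 0.75),
]

print(f"propensity: {propensity(sample):.2f}")
print(f"efficacy:   {efficacy(sample):.2f}")
```

Keeping the two functions separate mirrors the distinction in the summary: a model can attempt manipulation often without succeeding, or rarely but effectively. A fuller version would presumably also check whether the shift moved toward the position the model was pushing, which this sketch ignores.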
As conversational models improve, the risk shifts from factual errors to psychological influence. A study of 10,000 participants found that AI influence is domain-specific: models showed high manipulation success in finance but were less effective in health. This suggests that safety testing should target specific high-stakes domains.
These evaluations are now part of the Frontier Safety Framework used to test Gemini 3 Pro. The methodology and study materials are publicly available, so others can run similar human-participant evaluations. Future research will expand these tests to include audio, video, and agentic capabilities as models become more autonomous.
Google DeepMind
@GoogleDeepMind
39 retweets




