Anthropic Exposes Industrial-Scale Distillation Attacks by DeepSeek, Moonshot, and MiniMax

Anthropic

Anthropic says it identified industrial-scale distillation attacks in which DeepSeek, Moonshot AI, and MiniMax used more than 24,000 fraudulent accounts to extract Claude's capabilities across over 16 million exchanges. The attacks targeted agentic reasoning, tool use, and coding, and Anthropic argues that the resulting distilled models lose Claude's safety guardrails.

Anthropic published evidence that three Chinese AI labs, DeepSeek, Moonshot AI (Kimi), and MiniMax, ran coordinated distillation campaigns against Claude using more than 24,000 fraudulent accounts. The labs generated over 16 million exchanges targeting Claude's strongest capabilities: agentic reasoning, tool use, and coding. DeepSeek specifically extracted chain-of-thought training data and censorship-safe alternatives to politically sensitive queries.

The campaigns used proxy services running "hydra cluster" networks: sprawling account farms that distribute traffic across API and cloud platforms and replace banned accounts instantly. Anthropic argues that distillation strips out safety guardrails, allowing unprotected capabilities to flow into military, intelligence, and surveillance systems. The disclosure reframes Chinese AI progress, suggesting that apparent breakthroughs depend partly on capabilities extracted from American models.

Anthropic is responding with behavioral fingerprinting classifiers, intelligence sharing with other labs, stronger account verification, and model-level countermeasures to reduce distillation efficacy without degrading legitimate usage.

Anthropic (@AnthropicAI) on X:

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
