Detecting and Preventing Distillation Attacks
Published: February 23, 2026
Overview
Anthropic has identified large-scale campaigns by three Chinese AI laboratories—DeepSeek, Moonshot, and MiniMax—attempting to unlawfully extract Claude's capabilities. These operations generated over 16 million interactions through approximately 24,000 fraudulent accounts, violating terms of service and regional access restrictions.
What is Distillation?
Model distillation trains a weaker AI system on the outputs of a stronger one. It has legitimate uses, such as creating smaller, more affordable variants of a model, but it can also let competitors acquire advanced capabilities far faster and more cheaply than developing them independently.
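To make the mechanics concrete, below is a minimal sketch of the classic white-box distillation loss (Hinton et al., 2015), where a student model is pushed toward a teacher's softened output distribution. The names and values here are illustrative only. Note that the API-based campaigns described in this post are the black-box variant: the attacker never sees logits, so the harvested teacher responses simply become supervised fine-tuning data for the student.

```python
# Minimal sketch of the classic distillation loss (Hinton et al., 2015).
# Assumes white-box access to teacher logits; all tensors here are toy
# placeholders, not anyone's actual training setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions with a temperature, then push the
    student's distribution toward the teacher's via KL divergence."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # temperature**2 rescales gradients to match the hard-label loss scale
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-way vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```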
Security Implications
Illicitly distilled models bypass necessary safeguards, creating national security concerns. These unprotected systems could enable state actors to deploy AI for offensive cyber operations, disinformation, and mass surveillance. Open-sourcing distilled models compounds these risks: once the weights are public, access can no longer be monitored or revoked.
The Three Campaigns
DeepSeek (150,000+ exchanges)
- Targeted reasoning capabilities and reward model functions
- Used synchronized accounts with shared payment methods
- Generated chain-of-thought training data at scale
- Produced "censorship-safe alternatives" to politically sensitive queries
Moonshot AI (3.4+ million exchanges)
- Focused on agentic reasoning, tool use, and coding
- Deployed hundreds of fraudulent accounts across multiple access channels
- Attempted to extract and reconstruct Claude's reasoning traces
MiniMax (13+ million exchanges)
- Concentrated on agentic coding and tool orchestration
- Pivoted within 24 hours when Anthropic released new models
- Redirected nearly half of its traffic toward the latest capabilities
Access Methods
The labs circumvent geographic restrictions through commercial proxy services that operate "hydra cluster" architectures: sprawling networks that distribute fraudulent accounts across APIs and cloud platforms. One such proxy network managed over 20,000 simultaneous fraudulent accounts, mixing illicit traffic with legitimate customer requests.
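One standard way to surface such clusters, sketched below, is to link accounts through any shared infrastructure signal (payment fingerprint, proxy IP, device hash) and take connected components. The records, field names, and signals here are hypothetical; this illustrates the general technique, not Anthropic's actual pipeline.

```python
# Hedged illustration of "hydra cluster" linkage: accounts that share any
# infrastructure signal are grouped with a union-find. All records below
# are fabricated examples.
from collections import defaultdict

accounts = [
    {"id": "acct-001", "payment": "pay-9f2", "ip": "203.0.113.7"},
    {"id": "acct-002", "payment": "pay-9f2", "ip": "198.51.100.4"},
    {"id": "acct-003", "payment": "pay-1aa", "ip": "198.51.100.4"},
    {"id": "acct-004", "payment": "pay-777", "ip": "192.0.2.55"},
]

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Link every account to each signal it exhibits; accounts sharing a
# signal end up in the same component.
for acct in accounts:
    for key in ("payment", "ip"):
        union(acct["id"], f"{key}:{acct[key]}")

clusters = defaultdict(set)
for acct in accounts:
    clusters[find(acct["id"])].add(acct["id"])

# acct-001/002/003 chain together through a shared card and a shared IP.
print([sorted(c) for c in clusters.values() if len(c) > 1])
```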
Detection Indicators
Distillation attacks display characteristic patterns that differ sharply from normal usage: massive volume concentrated on a narrow set of capabilities, highly repetitive prompt structures, and request content that maps directly onto valuable training data.
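As a toy illustration of these signals, the sketch below scores an account on two of them: how concentrated its traffic is on a single capability and how structurally repetitive its prompts are. The features, equal weighting, and scoring scale are assumptions for illustration, not Anthropic's actual classifier.

```python
# Hedged sketch of the pattern described above: a per-account suspicion
# score from topic concentration and prompt-template repetition.
from collections import Counter
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def shingles(prompt: str, n: int = 3) -> set:
    """Word n-grams, a cheap proxy for prompt structure."""
    toks = prompt.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def suspicion_score(prompts: list[str], topics: list[str]) -> float:
    # Topic concentration: fraction of traffic on the single top capability.
    concentration = Counter(topics).most_common(1)[0][1] / len(topics)
    # Structural repetition: mean pairwise shingle overlap between prompts.
    pairs = list(combinations([shingles(p) for p in prompts], 2))
    repetition = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Equal weighting is an arbitrary illustrative choice.
    return 0.5 * concentration + 0.5 * repetition

prompts = [
    "Solve step by step, showing all reasoning: problem 1 ...",
    "Solve step by step, showing all reasoning: problem 2 ...",
    "Solve step by step, showing all reasoning: problem 3 ...",
]
print(suspicion_score(prompts, ["reasoning"] * 3))  # 0.8: templated, concentrated traffic
```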
Response Strategy
Anthropic is implementing:
- Detection: Classifiers and behavioral fingerprinting systems that identify attack patterns (see the sketch after this list)
- Intelligence Sharing: Technical indicators distributed to industry partners and authorities
- Access Controls: Strengthened verification for educational and research accounts
- Countermeasures: Product- and model-level safeguards that reduce the utility of outputs for illicit distillation
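The fingerprinting idea referenced in the first item can be illustrated as follows: reduce each request to a coarse structural signature, build a per-account signature histogram, and flag account pairs whose histograms nearly coincide. The signature function, bucket sizes, and similarity measure below are all hypothetical choices.

```python
# Hedged sketch of behavioral fingerprinting for linking accounts that
# run the same generation template. Thresholds and features are
# illustrative only.
import hashlib
from collections import Counter

def signature(prompt: str) -> str:
    """Collapse a prompt to a structural skeleton: a length bucket plus
    the leading tokens, which survive template-style generation."""
    toks = prompt.lower().split()
    skeleton = f"{len(toks) // 50}|{' '.join(toks[:5])}"
    return hashlib.sha256(skeleton.encode()).hexdigest()[:12]

def fingerprint(prompts: list[str]) -> dict[str, float]:
    """An account's fingerprint: its normalized signature histogram."""
    counts = Counter(signature(p) for p in prompts)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}

def overlap(fp_a: dict, fp_b: dict) -> float:
    """Histogram intersection in [0, 1]; near 1 means matching behavior."""
    return sum(min(fp_a.get(s, 0.0), fp_b.get(s, 0.0)) for s in fp_a | fp_b)

acct_a = fingerprint(["write a python function to parse logs"] * 40)
acct_b = fingerprint(["write a python function to merge files"] * 40)
print(overlap(acct_a, acct_b))  # 1.0: same template, likely linked accounts
```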
Broader Context
These attacks undermine export controls designed to preserve American AI competitiveness. Apparently rapid foreign advances in fact depend significantly on capabilities extracted from American models. Addressing the issue demands a coordinated response from industry, cloud providers, and policymakers.