Detecting and Preventing Distillation Attacks
Published: February 23, 2026
Overview
Anthropic has identified large-scale campaigns by three Chinese AI laboratories—DeepSeek, Moonshot, and MiniMax—attempting to unlawfully extract Claude's capabilities. These operations generated over 16 million interactions through approximately 24,000 fraudulent accounts, violating terms of service and regional access restrictions.
What is Distillation?
Model distillation trains a weaker AI system on the outputs of a stronger one. It has legitimate uses, such as creating smaller, more affordable variants of a model, but it can also let competitors acquire advanced capabilities far faster and more cheaply than developing them independently.
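To make the mechanics concrete, below is a minimal sketch of the classic white-box distillation loss (Hinton et al., 2015), where a student model is pushed toward a teacher's softened output distribution. The names and values here are illustrative only. Note that the API-based campaigns described in this post are the black-box variant: the attacker never sees logits, so the harvested teacher responses simply become supervised fine-tuning data for the student.

```python
# Minimal sketch of the classic distillation loss (Hinton et al., 2015).
# Assumes white-box access to teacher logits; all tensors here are toy
# placeholders, not anyone's actual training setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions with a temperature, then push the
    student's distribution toward the teacher's via KL divergence."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # temperature**2 rescales gradients to match the hard-label loss scale
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-way vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```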
Security Implications
Illicitly distilled models bypass necessary safeguards, creating national security concerns. These unprotected systems could enable state actors to deploy AI for offensive cyber operations, disinformation, and mass surveillance. Open-sourcing distilled models compounds these risks: once the weights are public, access can no longer be monitored or revoked.
The Three Campaigns
DeepSeek (150,000+ exchanges)
- Targeted reasoning capabilities and reward model functions
- Used synchronized accounts with shared payment methods
- Generated chain-of-thought training data at scale
- Produced "censorship-safe alternatives" to politically sensitive queries
Moonshot AI (3.4+ million exchanges)
- Focused on agentic reasoning, tool use, and coding
- Deployed hundreds of fraudulent accounts across multiple access channels
- Attempted to extract and reconstruct Claude's reasoning traces
MiniMax (13+ million exchanges)
- Concentrated on agentic coding and tool orchestration
- Pivoted within 24 hours when Anthropic released new models
- Redirected nearly half of its traffic toward the latest capabilities
Access Methods
The labs circumvent geographic restrictions through commercial proxy services that operate "hydra cluster" architectures: sprawling networks that distribute fraudulent accounts across APIs and cloud platforms. One such proxy network managed over 20,000 simultaneous fraudulent accounts, mixing illicit traffic with legitimate customer requests.
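One standard way to surface such clusters, sketched below, is to link accounts through any shared infrastructure signal (payment fingerprint, proxy IP, device hash) and take connected components. The records, field names, and signals here are hypothetical; this illustrates the general technique, not Anthropic's actual pipeline.

```python
# Hedged illustration of "hydra cluster" linkage: accounts that share any
# infrastructure signal are grouped with a union-find. All records below
# are fabricated examples.
from collections import defaultdict

accounts = [
    {"id": "acct-001", "payment": "pay-9f2", "ip": "203.0.113.7"},
    {"id": "acct-002", "payment": "pay-9f2", "ip": "198.51.100.4"},
    {"id": "acct-003", "payment": "pay-1aa", "ip": "198.51.100.4"},
    {"id": "acct-004", "payment": "pay-777", "ip": "192.0.2.55"},
]

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Link every account to each signal it exhibits; accounts sharing a
# signal end up in the same component.
for acct in accounts:
    for key in ("payment", "ip"):
        union(acct["id"], f"{key}:{acct[key]}")

clusters = defaultdict(set)
for acct in accounts:
    clusters[find(acct["id"])].add(acct["id"])

# acct-001/002/003 chain together through a shared card and a shared IP.
print([sorted(c) for c in clusters.values() if len(c) > 1])
```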
Detection Indicators
Distillation attacks display characteristic patterns that differ sharply from normal usage: massive volume concentrated on a narrow set of capabilities, highly repetitive prompt structures, and request content that maps directly onto valuable training data.
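As a toy illustration of these signals, the sketch below scores an account on two of them: how concentrated its traffic is on a single capability and how structurally repetitive its prompts are. The features, equal weighting, and scoring scale are assumptions for illustration, not Anthropic's actual classifier.

```python
# Hedged sketch of the pattern described above: a per-account suspicion
# score from topic concentration and prompt-template repetition.
from collections import Counter
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def shingles(prompt: str, n: int = 3) -> set:
    """Word n-grams, a cheap proxy for prompt structure."""
    toks = prompt.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def suspicion_score(prompts: list[str], topics: list[str]) -> float:
    # Topic concentration: fraction of traffic on the single top capability.
    concentration = Counter(topics).most_common(1)[0][1] / len(topics)
    # Structural repetition: mean pairwise shingle overlap between prompts.
    pairs = list(combinations([shingles(p) for p in prompts], 2))
    repetition = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Equal weighting is an arbitrary illustrative choice.
    return 0.5 * concentration + 0.5 * repetition

prompts = [
    "Solve step by step, showing all reasoning: problem 1 ...",
    "Solve step by step, showing all reasoning: problem 2 ...",
    "Solve step by step, showing all reasoning: problem 3 ...",
]
print(suspicion_score(prompts, ["reasoning"] * 3))  # 0.8: templated, concentrated traffic
```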
Response Strategy
Anthropic is implementing:
- Detection: Classifiers and behavioral fingerprinting systems that identify attack patterns (see the sketch after this list)
- Intelligence Sharing: Technical indicators distributed to industry partners and authorities
- Access Controls: Strengthened verification for educational and research accounts
- Countermeasures: Product- and model-level safeguards that reduce the utility of outputs for illicit distillation
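The fingerprinting idea referenced in the first item can be illustrated as follows: reduce each request to a coarse structural signature, build a per-account signature histogram, and flag account pairs whose histograms nearly coincide. The signature function, bucket sizes, and similarity measure below are all hypothetical choices.

```python
# Hedged sketch of behavioral fingerprinting for linking accounts that
# run the same generation template. Thresholds and features are
# illustrative only.
import hashlib
from collections import Counter

def signature(prompt: str) -> str:
    """Collapse a prompt to a structural skeleton: a length bucket plus
    the leading tokens, which survive template-style generation."""
    toks = prompt.lower().split()
    skeleton = f"{len(toks) // 50}|{' '.join(toks[:5])}"
    return hashlib.sha256(skeleton.encode()).hexdigest()[:12]

def fingerprint(prompts: list[str]) -> dict[str, float]:
    """An account's fingerprint: its normalized signature histogram."""
    counts = Counter(signature(p) for p in prompts)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}

def overlap(fp_a: dict, fp_b: dict) -> float:
    """Histogram intersection in [0, 1]; near 1 means matching behavior."""
    return sum(min(fp_a.get(s, 0.0), fp_b.get(s, 0.0)) for s in fp_a | fp_b)

acct_a = fingerprint(["write a python function to parse logs"] * 40)
acct_b = fingerprint(["write a python function to merge files"] * 40)
print(overlap(acct_a, acct_b))  # 1.0: same template, likely linked accounts
```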
Broader Context
These attacks undermine export controls designed to preserve American AI competitiveness. Apparently rapid foreign advances in fact depend significantly on capabilities extracted from American models. Addressing the issue demands a coordinated response from industry, cloud providers, and policymakers.