Anthropic's Responsible Scaling Policy: Version 3.0

Published: February 24, 2026

Overview

Anthropic has released the third version of its Responsible Scaling Policy (RSP), a voluntary framework designed to mitigate catastrophic risks from AI systems. This update reflects learnings from over two years of implementation and restructures the policy to better address practical challenges that have emerged.

Background: The Original RSP and Theory of Change

The RSP operates on "conditional, or if-then, commitments." When AI models exceed certain capability thresholds—such as biological science abilities relevant to weapons creation—corresponding safeguards must be implemented.

The original framework established AI Safety Levels (ASLs), with ASL-2 and ASL-3 defined in detail, while higher levels remained largely undefined pending better understanding of advanced capabilities.

Anthropic's initial theory of change relied on four mechanisms:

  • Internal forcing function: Requiring safeguards as prerequisites for model launches
  • Race to the top: Encouraging industry-wide adoption of similar standards
  • Risk consensus: Using capability thresholds as pivotal moments for multilateral action
  • Future coordination: Assuming governments would engage substantially on AI safety

Assessment: What Worked and What Didn't

Successes

The RSP successfully incentivized development of stronger safeguards. Anthropic implemented sophisticated input and output classifiers to comply with ASL-3 standards addressing chemical and biological weapons risks. The company activated ASL-3 protections in May 2025.

Other AI companies—OpenAI and Google DeepMind among them—adopted broadly similar frameworks within months. Jurisdictions including California, New York, and the European Union have incorporated principles from voluntary standards like the RSP into emerging AI legislation.

Challenges

The policy encountered significant obstacles in creating public consensus about AI risks. Capability thresholds proved "far more ambiguous than anticipated." Models now demonstrate sufficient biological knowledge to pass readily available tests, yet insufficient evidence exists to make a strong public case for high risk levels—creating a "zone of ambiguity."

Government action on AI safety has moved slowly. The political environment has shifted toward prioritizing competitiveness and economic growth, with safety discussions gaining limited traction at the federal level.

Higher ASL requirements appear potentially impossible for Anthropic to implement unilaterally. A RAND report on model weight security concludes that achieving robust protection against state-level actors is "currently not possible" and will require assistance from the national security community.

Three Key Elements of RSP Version 3.0

1. Separating Company Plans from Industry Recommendations

The updated RSP now distinguishes between mitigations Anthropic commits to pursuing independently and an ambitious industry-wide capabilities-to-mitigations map. This acknowledges that some safeguards require collective action beyond single-company capacity.

2. Frontier Safety Roadmap

Rather than making hard commitments, Anthropic will publish publicly graded goals across the Security, Alignment, Safeguards, and Policy domains. Example objectives include:

  • Launching moonshot R&D projects for unprecedented information security
  • Developing red-teaming methods surpassing hundreds of bug bounty participants
  • Implementing systematic measures ensuring Claude aligns with its constitution
  • Establishing comprehensive records of critical AI development activities
  • Publishing policy proposals for a "regulatory ladder" scaling with increasing risk

3. Risk Reports and External Review

Anthropic will publish Risk Reports every 3-6 months detailing model safety profiles, explaining how capabilities, threat models, and mitigations interconnect, and assessing overall risk levels.

The RSP requires external review in certain circumstances. Third-party reviewers with AI safety expertise, incentivized toward honesty, and free from major conflicts of interest will receive unredacted or minimally redacted access and conduct public reviews of Anthropic's reasoning and decision-making.

Risk Reports will address gaps between current measures and more ambitious industry-wide recommendations, potentially contributing to beneficial policy development.

Conclusion

As a "living document," the RSP will continue to evolve as AI capabilities advance. This revision builds on the elements that worked while separating realistic unilateral commitments from broader industry recommendations that require multilateral coordination.