Introducing Claude Sonnet 4.6

Published: February 17, 2026

Overview

Claude Sonnet 4.6 represents Anthropic's most capable Sonnet model to date. This upgrade enhances capabilities across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. The model features a 1M token context window currently in beta.

For Free and Pro plan users, Claude Sonnet 4.6 is now the default model in claude.ai and Claude Cowork. Pricing remains consistent with Sonnet 4.5 at $3/$15 per million tokens.
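At these rates, per-request cost is simple arithmetic over token counts. A minimal sketch (the example token counts are hypothetical; only the $3/$15 per-million-token rates come from the announcement):

```python
# Sonnet 4.6 pricing per million tokens, as stated above.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD at the published Sonnet 4.6 rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: a long-context request with 200k input and 4k output tokens.
print(f"${estimate_cost(200_000, 4_000):.2f}")  # → $0.66
```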

Key Improvements

Coding Performance

Developers with early access strongly prefer Sonnet 4.6 over its predecessor, with many even favoring it over Claude Opus 4.5 from November 2025. The model demonstrates superior consistency and instruction-following capabilities.

Performance previously requiring Opus-class models—including real-world office tasks—is now accessible through Sonnet 4.6. The model shows major improvements in computer use skills compared to earlier Sonnet versions.

Safety Evaluation

Extensive safety testing indicates Sonnet 4.6 is as safe or safer than other recent Claude models. Researchers noted the model displays "very strong safety behaviors" with no major misalignment concerns.

Computer Use Capabilities

In October 2024, Anthropic introduced the first general-purpose computer-using model. Sixteen months of development show steady progress on OSWorld benchmarks, which test AI performance across real software like Chrome, LibreOffice, and VS Code.

Early users report near-human-level capability on complex tasks such as navigating spreadsheets and completing multi-step web forms. The model still lags behind skilled human users overall, but the rate of improvement is substantial.

Prompt Injection Resistance

Computer use poses security risks through prompt injection attacks. Safety evaluations show Sonnet 4.6 significantly improves resistance compared to Sonnet 4.5 and performs similarly to Opus 4.6.

Benchmark Performance

Claude Sonnet 4.6 approaches Opus-level intelligence while maintaining a more practical price point. Key improvements include:

  • Claude Code testing: Users preferred Sonnet 4.6 over Sonnet 4.5 approximately 70% of the time
  • Comparison to Opus 4.5: Users preferred Sonnet 4.6 59% of the time, citing better instruction following and fewer hallucinations
  • Long-context reasoning: The 1M token window enables effective reasoning across entire codebases, lengthy contracts, and multiple research papers
  • Vending-Bench Arena: Sonnet 4.6 developed sophisticated business strategy, investing heavily early then pivoting to profitability

Design and Frontend Quality

Customers report notably more polished visual outputs from Sonnet 4.6, featuring superior layouts, animations, and design sensibility. Fewer iteration rounds are needed to reach production-quality results.

Product Updates

Platform Features

  • Adaptive thinking and extended thinking support
  • Context compaction in beta, which automatically summarizes older context
  • Enhanced web search and fetch tools with automatic filtering and code execution
  • Memory tool, programmatic tool calling, tool search, and tool use examples now generally available
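Context compaction is described only at a high level here. Conceptually, it resembles the following hypothetical routine, which replaces older messages with a summary once a token budget is exceeded. Everything in this sketch — the threshold, the word-count token approximation, the `summarize` callable, and the message shapes — is an illustrative assumption, not the actual API behavior:

```python
def compact(messages, summarize, max_tokens=100_000, keep_recent=10):
    """Hypothetical sketch of context compaction.

    `messages` is a list of dicts with a "content" string; `summarize` is
    any callable that turns a list of messages into one summary string.
    Token counting is crudely approximated by whitespace word count.
    """
    def count(msgs):
        return sum(len(m["content"].split()) for m in msgs)

    if count(messages) <= max_tokens:
        return messages  # under budget: leave the history untouched

    # Summarize everything except the most recent turns, then prepend
    # that summary so recent context stays verbatim.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": summarize(older)}
    return [summary] + recent
```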

Excel Integration

Claude in Excel now supports MCP connectors, enabling integration with tools like S&P Global, LSEG, Daloopa, PitchBook, Moody's, and FactSet. MCP connections configured in claude.ai automatically work in Excel.

Recommendation

Opus 4.6 remains the better choice for tasks demanding the deepest reasoning, such as codebase refactoring, multi-agent coordination, and precision-critical work. Sonnet 4.6 offers strong performance across thinking effort levels.

Availability

Claude Sonnet 4.6 is available across:

  • All Claude plans
  • Claude Cowork
  • Claude Code
  • Claude API (using claude-sonnet-4-6)
  • Major cloud platforms
  • Free tier (now includes file creation, connectors, skills, and compaction)
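The `claude-sonnet-4-6` model ID above is what an API request would reference. A minimal sketch of a Messages API request payload — the `max_tokens` value and the prompt are illustrative placeholders, and only the model ID comes from this announcement:

```python
# Illustrative Messages API request payload for Claude Sonnet 4.6.
payload = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
        {"role": "user",
         "content": "Summarize this contract in three bullet points."},
    ],
}

# With the official Python SDK, this payload would typically be sent as:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**payload)
```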

Customer Testimonials

Industry leaders report significant improvements:

  • Databricks: Matches Opus 4.6 on document comprehension tasks
  • Replit: Extraordinary performance-to-cost ratio for agentic workloads
  • Cursor: Notable improvements on long-horizon tasks
  • GitHub: Excels at complex code fixes across large codebases
  • Cognition: Meaningfully closed gap with Opus on bug detection
  • Pace: Achieved 94% on insurance benchmark for computer use

Methodology Notes

Benchmark comparisons reference the best available API versions. OSWorld tests specific controlled tasks but does not fully capture real-world messiness. Terminal-Bench 2.0 uses the Terminus-2 harness with a 1x guaranteed / 3x ceiling resource allocation. SWE-bench Verified scores are averaged over 10 trials. Tool evaluations used web search, fetch, code execution, and various reasoning configurations as specified.