Introducing Claude Sonnet 4.6
Published: February 17, 2026
Overview
Claude Sonnet 4.6 represents Anthropic's most capable Sonnet model to date. This upgrade enhances capabilities across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. The model features a 1M token context window currently in beta.
For Free and Pro plan users, Claude Sonnet 4.6 is now the default model in claude.ai and Claude Cowork. Pricing remains consistent with Sonnet 4.5 at $3/$15 per million tokens.
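At $3 per million input tokens and $15 per million output tokens, per-request costs are easy to estimate. A minimal sketch (the token counts in the example are illustrative):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost at Sonnet 4.6 list pricing:
    $3 per million input tokens, $15 per million output tokens."""
    INPUT_PER_MTOK = 3.00
    OUTPUT_PER_MTOK = 15.00
    return (
        (input_tokens / 1_000_000) * INPUT_PER_MTOK
        + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK
    )

# e.g. a 200k-token context producing a 4k-token reply:
cost = estimate_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # → $0.66
```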
Key Improvements
Coding Performance
Developers with early access strongly prefer Sonnet 4.6 over its predecessor, with many even favoring it over Claude Opus 4.5 from November 2025. The model demonstrates superior consistency and instruction-following capabilities.
Performance previously requiring Opus-class models—including real-world office tasks—is now accessible through Sonnet 4.6. The model shows major improvements in computer use skills compared to earlier Sonnet versions.
Safety Evaluation
Extensive safety testing indicates Sonnet 4.6 is as safe or safer than other recent Claude models. Researchers noted the model displays "very strong safety behaviors" with no major misalignment concerns.
Computer Use Capabilities
In October 2024, Anthropic introduced the first general-purpose computer-using model. Sixteen months of development show steady progress on OSWorld benchmarks, which test AI performance across real software like Chrome, LibreOffice, and VS Code.
Early users report near-human capability on complex tasks such as navigating spreadsheets and completing multi-step web forms. While the model still trails skilled human users, the rate of improvement is substantial.
Prompt Injection Resistance
Computer use poses security risks through prompt injection attacks. Safety evaluations show Sonnet 4.6 significantly improves resistance compared to Sonnet 4.5 and performs similarly to Opus 4.6.
Benchmark Performance
Claude Sonnet 4.6 approaches Opus-level intelligence while maintaining a more practical price point. Key improvements include:
- Claude Code testing: Users preferred Sonnet 4.6 over Sonnet 4.5 approximately 70% of the time
- Comparison to Opus 4.5: Users preferred Sonnet 4.6 59% of the time, citing better instruction following and fewer hallucinations
- Long-context reasoning: The 1M token window enables effective reasoning across entire codebases, lengthy contracts, and multiple research papers
- Vending-Bench Arena: Sonnet 4.6 developed sophisticated business strategy, investing heavily early then pivoting to profitability
Design and Frontend Quality
Customers report notably more polished visual outputs from Sonnet 4.6, featuring superior layouts, animations, and design sensibility. Fewer iteration rounds are needed to reach production-quality results.
Product Updates
Platform Features
- Adaptive thinking and extended thinking support
- Context compaction in beta, which automatically summarizes older context
- Enhanced web search and fetch tools with automatic filtering and code execution
- Memory tool, programmatic tool calling, tool search, and tool use examples now generally available
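Extended thinking is configured per request. Below is a minimal sketch of a Messages API payload with a thinking budget; the field names follow Anthropic's public Messages API, but the specific values (model ID, budgets, prompt) are illustrative:

```python
# Sketch of a Messages API request payload with extended thinking enabled.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    # Extended thinking: reserve a token budget for the model's
    # internal reasoning before it produces the final answer.
    "thinking": {"type": "enabled", "budget_tokens": 4096},
    "messages": [
        {"role": "user", "content": "Summarize the key risks in this contract."}
    ],
}

# The thinking budget must stay below max_tokens, since thinking
# tokens count toward the response limit.
assert request["thinking"]["budget_tokens"] < request["max_tokens"]
```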
Excel Integration
Claude in Excel now supports MCP connectors, enabling integration with tools like S&P Global, LSEG, Daloopa, PitchBook, Moody's, and FactSet. MCP connections configured in Claude.ai automatically work in Excel.
Recommendation
Opus 4.6 remains optimal for tasks demanding the deepest reasoning, such as codebase refactoring, multi-agent coordination, and precision-critical work. Sonnet 4.6 offers strong performance across thinking effort levels.
Availability
Claude Sonnet 4.6 is available across:
- All Claude plans
- Claude Cowork
- Claude Code
- Claude API (using claude-sonnet-4-6)
- Major cloud platforms
- Free tier (now includes file creation, connectors, skills, and compaction)
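For API access, requests target the Messages endpoint with the model ID above. The sketch below builds (but does not send) such a request using only the standard library; the endpoint and header names follow Anthropic's documented public API, and `YOUR_API_KEY` is a placeholder:

```python
import json
import urllib.request

# Request body naming the model ID from the availability list.
payload = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Sonnet 4.6"}],
}

# Construct the HTTP request; call urllib.request.urlopen(req) to send it.
req = urllib.request.Request(
    "https://api.anthropic.com/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    method="POST",
)
```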
Customer Testimonials
Industry leaders report significant improvements:
- Databricks: Matches Opus 4.6 on document comprehension tasks
- Replit: Extraordinary performance-to-cost ratio for agentic workloads
- Cursor: Notable improvements on long-horizon tasks
- GitHub: Excels at complex code fixes across large codebases
- Cognition: Meaningfully closed gap with Opus on bug detection
- Pace: Achieved 94% on insurance benchmark for computer use
Methodology Notes
Benchmark comparisons reference the best available API versions. OSWorld tests specific controlled tasks but does not fully capture real-world messiness. Terminal-Bench 2.0 uses the Terminus-2 harness with a 1x guaranteed/3x ceiling resource allocation. SWE-bench Verified scores are averaged over 10 trials. Tool evaluations used web search, fetch, code execution, and various reasoning configurations as specified.