The "think" Tool: Enabling Claude to Stop and Think in Complex Tool Use Situations

Published: Mar 20, 2025

Overview

Anthropic has introduced a "think" tool that improves Claude's performance on complex problem-solving tasks. The tool gives Claude dedicated space for structured reasoning during tool use, and it is a different mechanism from Anthropic's extended thinking capability.

What is the "think" Tool?

The "think" tool allows Claude to pause during response generation and work through reasoning about whether it has sufficient information to proceed. As explained in the article, it differs from extended thinking: "Extended thinking is all about what Claude does before it starts generating a response...The "think" tool is for Claude to add a step to stop and think about whether it has all the information it needs."

The tool is designed for scenarios where Claude must process external information from tool results rather than relying solely on initial user queries.

Implementation

A basic implementation follows this format:

```json
{
  "name": "think",
  "description": "Use the tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.",
  "input_schema": {
    "type": "object",
    "properties": {
      "thought": {
        "type": "string",
        "description": "A thought to think about."
      }
    },
    "required": ["thought"]
  }
}
```
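
Because the tool has no side effects, handling it in an agent loop can be very light. The following is a minimal sketch, not the article's implementation: the handler name `handle_tool_use`, the in-memory `thought_log`, and the acknowledgement text are illustrative assumptions. The `tool_use` and `tool_result` block shapes follow the Anthropic Messages API.

```python
from typing import Any

# The definition above as a Python dict, suitable for the `tools` list
# of a Messages API request.
THINK_TOOL: dict[str, Any] = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change the database, but just append the thought "
        "to the log. Use it when complex reasoning or some cache memory "
        "is needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to think about.",
            }
        },
        "required": ["thought"],
    },
}

# Hypothetical in-memory log; a real system might write to a file or tracer.
thought_log: list[str] = []


def handle_tool_use(block: dict[str, Any]) -> dict[str, Any]:
    """Build the tool_result block for one tool_use content block."""
    if block["name"] == "think":
        # No external action: record the thought and acknowledge it.
        thought_log.append(block["input"]["thought"])
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": "Thought recorded.",
        }
    # Any other tool name is outside this sketch.
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": f"Unknown tool: {block['name']}",
        "is_error": True,
    }
```

The returned block would be sent back to Claude in the next user message, after which the model continues with its next tool call or final answer.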

Performance Results

tau-bench Evaluation

Testing on tau-bench, a customer service benchmark, revealed significant improvements:

Airline Domain:

"Think" tool with optimized prompt: 0.570 on pass^1 metric
Baseline: 0.370
Improvement: 54% relative increase

Retail Domain:

"Think" tool alone: 0.812 on pass^1 metric
Baseline: 0.783
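
The relative figures can be checked directly from the scores above. This short sketch only redoes the arithmetic; the function name `relative_increase` is an illustrative choice.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Return the fractional increase of `new` over `baseline`."""
    return (new - baseline) / baseline


# Scores quoted above (pass^1).
airline = relative_increase(0.570, 0.370)
retail = relative_increase(0.812, 0.783)

print(f"Airline: {airline:.1%}")  # Airline: 54.1%
print(f"Retail: {retail:.1%}")    # Retail: 3.7%
```

The airline gain is 0.200 in absolute terms, which is where the 54% relative figure comes from. The retail gain is much smaller, 0.029 absolute, or roughly 3.7% relative.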

The evaluation used the pass^k metric, which measures the probability that all k independent trials of a task succeed, rather than pass@k, which measures whether at least one of k trials succeeds. Consistency of this kind is critical for customer service applications.
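
To make the difference concrete, here is a sketch of how both metrics can be estimated from repeated trials. It assumes the combinatorial estimator described for tau-bench: for a task with n trials of which c succeeded, the chance that k trials drawn without replacement all succeed is C(c, k) / C(n, k). The function names and the sample counts are illustrative, not data from the article.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k for one task: all k sampled trials succeed."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    return comb(c, k) / comb(n, k)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task: at least one of k sampled trials succeeds."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass^k over tasks, each given as (trials, successes)."""
    return sum(pass_hat_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for three tasks, five trials each.
results = [(5, 5), (5, 3), (5, 0)]
for k in (1, 2, 5):
    print(k, round(mean_pass_hat_k(results, k), 3))
```

For a task that succeeds in 3 of 5 trials, pass^2 is 0.3 while pass@2 is 0.9, so pass^k penalizes inconsistency that pass@k hides. At k = 1 the two metrics coincide with the plain success rate.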

SWE-bench Results

When added to a SWE-bench evaluation setup for software engineering tasks, the "think" tool improved performance by 1.6% on average, a statistically significant difference (p < .001).

Key Insights

Prompting effectiveness varies by domain complexity: The tool showed dramatic improvements when paired with domain-specific examples in complex policy environments, while simpler domains benefited from the tool alone.
Consistency improvements: Benefits were maintained across multiple trial runs (k=1 through k=5), indicating the tool helps handle edge cases effectively.

When to Use the "think" Tool

Best suited for:

Tool output analysis: Processing previous results before taking action
Policy-heavy environments: Following detailed guidelines while verifying compliance
Sequential decision-making: Tasks where each step builds on previous ones and errors are costly

Not recommended for:

Non-sequential tool calls: Single or parallel calls don't require additional reasoning space
Simple instruction following: When default behavior already performs adequately

Implementation Recommendations

Provide domain-specific guidance: Include clear examples of reasoning approaches tailored to your use case, demonstrating detail levels, decision-making processes, and information-gathering checklists.
Place complex instructions in system prompts: Longer or more intricate guidelines work better in system prompts than tool descriptions, allowing broader contextual integration.
Monitor and iterate: Observe how Claude uses the tool in practice and refine prompts to encourage effective thinking patterns.
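
One way to act on the first two recommendations is to assemble the domain guidance as a block of system prompt text rather than packing it into the tool description. The sketch below is an assumption-laden illustration: the helper name `build_think_guidance`, the prompt wording, and the refund scenario are hypothetical placeholders, not text from the article.

```python
def build_think_guidance(domain: str, checklist: list[str], example_thought: str) -> str:
    """Return system prompt text telling the model how to use the think tool."""
    steps = "\n".join(f"- {item}" for item in checklist)
    return (
        f"Using the think tool ({domain})\n\n"
        "Before taking any action or responding to the user after receiving "
        "tool results, use the think tool as a scratchpad to:\n"
        f"{steps}\n\n"
        "Example of the level of detail expected:\n"
        f"{example_thought}\n"
    )


# Hypothetical refund policy domain.
prompt = build_think_guidance(
    domain="refund requests",
    checklist=[
        "List the specific rules that apply to the current request",
        "Check that all required information has been collected",
        "Verify that the planned action complies with policy",
        "Review tool results for correctness",
    ],
    example_thought=(
        "User wants a refund for order 123.\n"
        "- Need: order date, payment method, item condition\n"
        "- Rule check: is the order inside the return window?\n"
        "- Plan: look up the order, confirm eligibility, then ask for confirmation"
    ),
)
print(prompt)
```

Keeping the guidance in a function like this makes the third recommendation easier to follow: the checklist and the example can be revised as you observe how Claude actually uses the tool.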

Comparison with Extended Thinking

The article notes that extended thinking is preferable for: simpler tool scenarios, non-sequential calls, and domains like coding or mathematics without tool requirements. The "think" tool excels when handling complex tool chains requiring careful analysis of outputs.

Conclusion

The research demonstrates meaningful performance gains on complex tasks that require policy adherence, with "minimal implementation complexity." The tool does not change external behavior unless Claude chooses to use it, and it does not interfere with existing tools or workflows.

Note: The article includes an update indicating extended thinking has since improved sufficiently that it's now recommended over the dedicated "think" tool in most cases.
