Introducing Advanced Tool Use on the Claude Developer Platform

Overview

Anthropic has released three beta features enabling Claude to dynamically discover, learn, and execute tools more efficiently:

  • Tool Search Tool – Allows Claude to discover tools on-demand without loading all definitions upfront
  • Programmatic Tool Calling – Enables Claude to orchestrate tools through code rather than sequential API calls
  • Tool Use Examples – Provides concrete usage patterns beyond schema definitions

Tool Search Tool

The Problem

Large tool libraries consume significant tokens. A five-server setup (GitHub, Slack, Sentry, Grafana, Splunk) totals approximately 55K tokens before any conversation begins. At Anthropic's own scale, the announcement reports that "tool definitions consume 134K tokens before optimization."

Tool selection errors occur frequently, especially with similarly named tools like notification-send-user versus notification-send-channel.

The Solution

Rather than loading all tool definitions upfront, the Tool Search Tool discovers relevant tools as needed. According to the documentation, this achieves "an 85% reduction in token usage while maintaining access to your full tool library."

Internal testing showed accuracy improvements: Opus 4 improved from 49% to 74%, and Opus 4.5 improved from 79.5% to 88.1%.

Implementation

Tools are marked with defer_loading: true to enable on-demand discovery:

json
{
  "tools": [
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    {
      "name": "github.createPullRequest",
      "description": "Create a pull request",
      "defer_loading": true
    }
  ]
}

For MCP servers, entire servers can be deferred while keeping high-use tools loaded.
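As a hedged sketch of that pattern, the tools list below defers an entire MCP server while pinning one high-traffic tool. The field names here (`mcp_toolset`, `default_config`, `configs`) are assumptions about the beta API shape and may differ in your SDK version:

```python
# Sketch: defer every tool on an MCP server by default, but keep the
# most frequently used tool always loaded. Field names are assumptions
# and may differ from the current beta.
tools = [
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    {
        "type": "mcp_toolset",
        "mcp_server_name": "github",
        # Defer every tool on this server by default...
        "default_config": {"defer_loading": True},
        # ...but keep the highest-traffic tool loaded upfront.
        "configs": {"createPullRequest": {"defer_loading": False}},
    },
]
```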

When to Use

Best for:

  • Tool definitions consuming >10K tokens
  • Tool selection accuracy issues
  • MCP systems with multiple servers
  • 10+ tools available

Less beneficial for:

  • Small libraries (<10 tools)
  • Frequently used tools in every session
  • Compact tool definitions
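One way to check whether you are near the >10K-token threshold above is to roughly estimate the size of your serialized tool definitions. A minimal sketch using the common ~4-characters-per-token heuristic (an approximation; for exact counts, use the API's token-counting endpoint):

```python
import json

def estimate_tool_tokens(tools, chars_per_token=4):
    """Rough token estimate for a list of tool definitions.

    Uses a ~4 chars/token heuristic, which is approximate; exact
    numbers require a real tokenizer or the token-counting API.
    """
    serialized = json.dumps(tools)
    return len(serialized) // chars_per_token

# Illustrative tool definition (names and schema are hypothetical).
tools = [
    {
        "name": "search_customer_orders",
        "description": "Search for customer orders by date range, status, or amount.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    }
]
print(estimate_tool_tokens(tools))
```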

Programmatic Tool Calling

The Problem

Traditional tool calling creates two bottlenecks:

  1. Context pollution – Intermediate results accumulate unnecessarily. Analyzing a 10MB log file for error patterns loads the entire file into context despite only needing a summary.

  2. Inference overhead – Each tool call requires a full model inference pass. A five-tool workflow means five separate inferences plus manual result synthesis.

The Solution

Claude writes Python code that orchestrates tools, processes outputs, and controls what enters its context. The code executes in a sandboxed environment, with only final results visible to the model.

Example use case: Finding team members who exceeded Q3 travel budgets.

Traditional approach: Fetch 20 team members → fetch each person's 50-100 expense items (20 calls) → fetch budgets → all 2,000+ line items enter context → Claude manually sums and compares.

With Programmatic Tool Calling: Claude writes code that fetches data in parallel, filters locally, and returns only the 2-3 people who exceeded limits.
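The orchestration code Claude writes for a task like this might look roughly as follows. The helper functions (`get_team_members`, `get_expenses`, `get_budget`) are hypothetical stand-ins for tools exposed to the code-execution environment, passed in here as parameters so the sketch is self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def find_over_budget(get_team_members, get_expenses, get_budget):
    """Fetch expenses in parallel, sum locally, return only offenders."""
    members = get_team_members("engineering")
    # Fetch each member's expense items concurrently instead of
    # one sequential tool call per person.
    with ThreadPoolExecutor() as pool:
        expense_lists = list(pool.map(get_expenses, [m["id"] for m in members]))
    over = []
    for member, expenses in zip(members, expense_lists):
        spent = sum(e["amount"] for e in expenses)
        budget = get_budget(member["id"])
        if spent > budget:
            over.append({"name": member["name"], "spent": spent, "budget": budget})
    # Only this small summary reaches Claude's context, not the
    # thousands of raw expense line items.
    return over
```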

Efficiency Gains

  • Token savings: 37% reduction (43,588 to 27,297 tokens on complex tasks)
  • Latency: Eliminates 19+ inference passes by orchestrating 20+ calls in one code block
  • Accuracy: Knowledge retrieval improved from 25.6% to 28.5%; GIA benchmarks from 46.5% to 51.2%

Implementation

Step 1 – Mark tools as callable from code:

json
{
  "tools": [
    {"type": "code_execution_20250825", "name": "code_execution"},
    {
      "name": "get_team_members",
      "allowed_callers": ["code_execution_20250825"]
    }
  ]
}

Step 2 – Claude generates orchestration code in Python

Step 3 – Tools execute within the Code Execution environment, not Claude's context

Step 4 – Only final output returns to Claude
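Putting the four steps together, a request payload might look like the sketch below. The `get_team_members` schema and the user prompt are hypothetical illustrations; with the Anthropic Python SDK, the dict would be sent via `client.beta.messages.create(**request)`:

```python
# Sketch of a full Programmatic Tool Calling request payload.
# The get_team_members schema and prompt are hypothetical.
request = {
    "betas": ["advanced-tool-use-2025-11-20"],
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 4096,
    "tools": [
        {"type": "code_execution_20250825", "name": "code_execution"},
        {
            "name": "get_team_members",
            "description": (
                "Retrieve members of a team. Returns: list of objects "
                "with id (string) and name (string)."
            ),
            "input_schema": {
                "type": "object",
                "properties": {"team": {"type": "string"}},
                "required": ["team"],
            },
            # Step 1: opt this tool in for invocation from generated code.
            "allowed_callers": ["code_execution_20250825"],
        },
    ],
    "messages": [
        {"role": "user", "content": "Which team members exceeded their Q3 travel budget?"}
    ],
}
```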

When to Use

Most beneficial for:

  • Large datasets requiring aggregation
  • Multi-step workflows (3+ dependent calls)
  • Filtering/transforming results before Claude sees them
  • Parallel operations across many items

Less beneficial for:

  • Single-tool invocations
  • Tasks requiring Claude to see all intermediate results
  • Quick lookups with small responses

Tool Use Examples

The Problem

JSON schemas define structural validity but cannot express usage patterns: when to include optional parameters, which combinations make sense, or API conventions.

Example: A support ticket API with required title field and many optional nested fields leaves questions unanswered:

  • Date format: "2024-11-06", "Nov 6, 2024", or a full ISO 8601 timestamp?
  • ID conventions: UUID, "USR-12345", or "12345"?
  • When to populate nested structures?
  • Relationship between escalation.level and priority?

The Solution

Tool Use Examples show concrete usage patterns:

json
{
  "name": "create_ticket",
  "input_examples": [
    {
      "title": "Login page returns 500 error",
      "priority": "critical",
      "labels": ["bug", "authentication", "production"],
      "reporter": {
        "id": "USR-12345",
        "name": "Jane Smith",
        "contact": {"email": "jane@acme.com", "phone": "+1-555-0123"}
      },
      "due_date": "2024-11-06",
      "escalation": {"level": 2, "notify_manager": true, "sla_hours": 4}
    },
    {
      "title": "Add dark mode support",
      "labels": ["feature-request", "ui"],
      "reporter": {"id": "USR-67890", "name": "Alex Chen"}
    },
    {"title": "Update API documentation"}
  ]
}

Internal testing showed accuracy improvement from 72% to 90% on complex parameter handling.

When to Use

Most beneficial for:

  • Complex nested structures with non-obvious patterns
  • Many optional parameters with conditional inclusion
  • Domain-specific API conventions
  • Similar tools needing clarification

Less beneficial for:

  • Simple single-parameter tools
  • Standard formats (URLs, emails)
  • Validation better handled by schema constraints

Best Practices

Layer Features Strategically

Start with the biggest bottleneck:

  • Context bloat from tool definitions → Tool Search Tool
  • Large intermediate results → Programmatic Tool Calling
  • Parameter errors → Tool Use Examples

Layer additional features as needed; they're complementary.

Tool Search Setup

Use clear, descriptive tool names and descriptions:

json
{
  "name": "search_customer_orders",
  "description": "Search for customer orders by date range, status, or total amount. Returns order details including items, shipping, and payment info."
}

Keep three to five most-used tools always loaded; defer the rest.
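A small helper can apply that policy mechanically, keeping a short allowlist of always-loaded tools and deferring everything else. A minimal sketch (tool names are illustrative):

```python
def apply_defer_policy(tools, always_loaded):
    """Mark every tool not in `always_loaded` with defer_loading=True."""
    out = []
    for tool in tools:
        tool = dict(tool)  # shallow copy so the input list is untouched
        if tool["name"] not in always_loaded:
            tool["defer_loading"] = True
        out.append(tool)
    return out

tools = [
    {"name": "search_customer_orders", "description": "Search customer orders."},
    {"name": "github.createPullRequest", "description": "Create a pull request."},
]
configured = apply_defer_policy(tools, always_loaded={"search_customer_orders"})
```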

Programmatic Tool Calling Setup

Document return formats clearly to help Claude write correct parsing logic:

json
{
  "name": "get_orders",
  "description": "Retrieve orders for a customer. Returns: List of order objects with id, total (float), status, items array, and created_at timestamp"
}

Opt in only tools that are idempotent and safe to run in parallel.

Tool Use Examples Setup

Craft examples showing realistic data with minimal, partial, and full specification patterns. Keep examples concise (1-5 per tool) and focus on actual ambiguities.

Getting Started

Enable features with the beta header:

python
import anthropic

client = anthropic.Anthropic()

client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Create a pull request for my fix."}],
    tools=[
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
)

See Platform Documentation for detailed guides and cookbooks.

Real-World Application

Claude for Excel uses Programmatic Tool Calling to read and modify spreadsheets with thousands of rows without pulling every cell into the model's context window, a workload that conventional tool-calling patterns could not handle without exhausting context.