The AI Agent Cost Dilemma
Run AI agents in production for any length of time and you hit a dilemma. Use a smart model and costs spiral; use a cheap model and quality slips. Keep a top-tier model running end to end on complex tasks and the bill becomes unmanageable — swap in a budget model and it stumbles on exactly the decisions that matter most.

The 'Advisor StrategyAdvisor Strategy', which Anthropic unveiled on April 9, resolves this dilemma structurally. The core idea is simple: instead of having the expensive model handle everything, a cheaper model does the work and consults the expensive model only when a hard judgment call comes up.

Why the Old Approaches Get Expensive
There have traditionally been two ways to build an AI agent.

The first is running a top-tier model (Opus) end to end. Accuracy is high, but every step — calling tools, reading results, iterating — burns expensive tokens. Even a simple file read gets billed at Opus rates.

The second is the orchestrator pattern, where a big model draws up a plan and farms tasks out to smaller models. This requires decomposition logic, worker pools, and orchestration management — a complex build.

The Advisor Strategy flips both on their head.

The Small Model Works, the Big Model Advises
Here's the structure. Sonnet (or Haiku) acts as the 'executor,' handling the entire task — calling tools, reading results, and iterating toward a solution. The executor completes most tasks entirely on its own.

When the executor hits a decision it can't confidently make alone, it requests 'advice' from Opus. Opus reviews the shared context and returns a plan, a course correction, or a stop signal. The executor takes that input and keeps going.

There's one key rule. The advisor (Opus) never calls tools directly. It never sends output straight to the user. It only points the executor in the right direction. All execution stays with the cheaper model.

A workplace analogy makes it click. It's like a junior employee handling the day-to-day work and stopping by a senior colleague's desk only when a tough call comes up. The senior doesn't have to do everything personally — stepping in on the key judgments alone is enough to raise the quality of the whole.

Real-World Performance and Cost
The benchmark results show why this strategy works.

The Sonnet + Opus advisor combination scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone. At the same time, cost per agent task dropped 11.9%. More accurate, and cheaper.

On BrowseComp and Terminal-Bench 2.0 as well, Sonnet + advisor posted higher scores than Sonnet alone — at lower cost per task.

The Haiku + Opus advisor pairing is even more dramatic. On BrowseComp, Haiku alone scored 19.7%. Add an Opus advisor and it jumps to 41.2% — more than double. That's 29% below Sonnet alone on score, but at 85% lower cost. For high-volume workloads, it's a powerful option.

The reason costs fall is straightforward. All the advisor (Opus) generates is a short plan, typically 400–700 tokens. Every other token of output is billed at the cheaper executor model's rates. Compared with running Opus end to end, the cost difference is substantial.

How to Put It into Practice
In the API, it works with a single added line.

```pythonresponse = client.messages.create( model="claude-sonnet-4-6", # executor tools=[ { "type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6", # advisor "max_uses": 3, # cap on advisor calls }, # keep your existing tools as-is ], messages=[...])```

Add the beta header `anthropic-beta: advisor-tool-2026-03-01` and you're ready to go.

A few practical points are worth knowing.

Cost control. Use `max_uses` to cap advisor calls per request. Advisor tokens are reported separately in the usage block, so you can track costs precisely by tier.

Works alongside existing tools. The advisor tool slots into a Messages API request right next to your other tools. Web search, code execution, and Opus consultations can all run inside the same loop.

No extra plumbing. The model handoff happens inside a single /v1/messages request. No additional round trips, no context-management code. The executor decides for itself when to call the advisor.

When Should You Use It?

Question

The AI Agent Cost Dilemma
Run AI agents in production for any length of time and you hit a dilemma. Use a smart model and costs spiral; use a cheap model and quality slips. Keep a top-tier model running end to end on complex tasks and the bill becomes unmanageable — swap in a budget model and it stumbles on exactly the decisions that matter most.

The 'Advisor StrategyAdvisor Strategy​', which Anthropic unveiled on April 9, resolves this dilemma structurally. The core idea is simple: instead of having the expensive model handle everything, a cheaper model does the work and consults the expensive model only when a hard judgment call comes up.

Why the Old Approaches Get Expensive
There have traditionally been two ways to build an AI agent.

The first is running a top-tier model (Opus) end to end. Accuracy is high, but every step — calling tools, reading results, iterating — burns expensive tokens. Even a simple file read gets billed at Opus rates.

The second is the orchestrator pattern, where a big model draws up a plan and farms tasks out to smaller models. This requires decomposition logic, worker pools, and orchestration management — a complex build.

The Advisor Strategy flips both on their head.

The Small Model Works, the Big Model Advises
Here's the structure. Sonnet (or Haiku) acts as the 'executor,' handling the entire task — calling tools, reading results, and iterating toward a solution. The executor completes most tasks entirely on its own.

When the executor hits a decision it can't confidently make alone, it requests 'advice' from Opus. Opus reviews the shared context and returns a plan, a course correction, or a stop signal. The executor takes that input and keeps going.

There's one key rule. The advisor (Opus) never calls tools directly. It never sends output straight to the user. It only points the executor in the right direction. All execution stays with the cheaper model.

A workplace analogy makes it click. It's like a junior employee handling the day-to-day work and stopping by a senior colleague's desk only when a tough call comes up. The senior doesn't have to do everything personally — stepping in on the key judgments alone is enough to raise the quality of the whole.

Real-World Performance and Cost
The benchmark results show why this strategy works.

The Sonnet + Opus advisor combination scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone. At the same time, cost per agent task dropped 11.9%. More accurate, and cheaper.

On BrowseComp and Terminal-Bench 2.0 as well, Sonnet + advisor posted higher scores than Sonnet alone — at lower cost per task.

The Haiku + Opus advisor pairing is even more dramatic. On BrowseComp, Haiku alone scored 19.7%. Add an Opus advisor and it jumps to 41.2% — more than double. That's 29% below Sonnet alone on score, but at 85% lower cost. For high-volume workloads, it's a powerful option.

The reason costs fall is straightforward. All the advisor (Opus) generates is a short plan, typically 400–700 tokens. Every other token of output is billed at the cheaper executor model's rates. Compared with running Opus end to end, the cost difference is substantial.

How to Put It into Practice
In the API, it works with a single added line.

```pythonresponse = client.messages.create(    model="claude-sonnet-4-6",  # executor    tools=[        {            "type": "advisor_20260301",            "name": "advisor",            "model": "claude-opus-4-6",  # advisor            "max_uses": 3,  # cap on advisor calls        },        # keep your existing tools as-is    ],    messages=[...])```

Add the beta header `anthropic-beta: advisor-tool-2026-03-01` and you're ready to go.

A few practical points are worth knowing.

Cost control. Use `max_uses` to cap advisor calls per request. Advisor tokens are reported separately in the usage block, so you can track costs precisely by tier.

Works alongside existing tools. The advisor tool slots into a Messages API request right next to your other tools. Web search, code execution, and Opus consultations can all run inside the same loop.

No extra plumbing. The model handoff happens inside a single /v1/messages request. No additional round trips, no context-management code. The executor decides for itself when to call the advisor.

When Should You Use It?

Accepted Answer

Anthropic recommends a simple comparison: run your existing eval set under three configurations.

The Advisor Strategy: How to Cut AI Agent Costs in Half

The AI Agent Cost Dilemma

Why the Old Approaches Get Expensive

The Small Model Works, the Big Model Advises

Real-World Performance and Cost

How to Put It into Practice

When Should You Use It?

Org-Chart Logic, Applied to AI

References

리브레토의 인기글

리브레토 인사이트 구독

The AI Agent Cost Dilemma

Why the Old Approaches Get Expensive

The Small Model Works, the Big Model Advises

Real-World Performance and Cost

How to Put It into Practice

When Should You Use It?

Org-Chart Logic, Applied to AI

References

Books to read with this insight

Recommended

리브레토의 인기글