The AI Agent Cost Dilemma
Run AI agents in production for any length of time and you hit a dilemma. Use a smart model and costs spiral; use a cheap model and quality slips. Keep a top-tier model running end to end on complex tasks and the bill becomes unmanageable — swap in a budget model and it stumbles on exactly the decisions that matter most.
The 'Advisor StrategyAdvisor Strategy', which Anthropic unveiled on April 9, resolves this dilemma structurally. The core idea is simple: instead of having the expensive model handle everything, a cheaper model does the work and consults the expensive model only when a hard judgment call comes up.
Why the Old Approaches Get Expensive
There have traditionally been two ways to build an AI agent.
The first is running a top-tier model (Opus) end to end. Accuracy is high, but every step — calling tools, reading results, iterating — burns expensive tokens. Even a simple file read gets billed at Opus rates.
The second is the orchestrator pattern, where a big model draws up a plan and farms tasks out to smaller models. This requires decomposition logic, worker pools, and orchestration management — a complex build.
The Advisor Strategy flips both on their head.
The Small Model Works, the Big Model Advises
Here's the structure. Sonnet (or Haiku) acts as the 'executor,' handling the entire task — calling tools, reading results, and iterating toward a solution. The executor completes most tasks entirely on its own.
When the executor hits a decision it can't confidently make alone, it requests 'advice' from Opus. Opus reviews the shared context and returns a plan, a course correction, or a stop signal. The executor takes that input and keeps going.
There's one key rule. The advisor (Opus) never calls tools directly. It never sends output straight to the user. It only points the executor in the right direction. All execution stays with the cheaper model.
A workplace analogy makes it click. It's like a junior employee handling the day-to-day work and stopping by a senior colleague's desk only when a tough call comes up. The senior doesn't have to do everything personally — stepping in on the key judgments alone is enough to raise the quality of the whole.
Real-World Performance and Cost
The benchmark results show why this strategy works.
The Sonnet + Opus advisor combination scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone. At the same time, cost per agent task dropped 11.9%. More accurate, and cheaper.
On BrowseComp and Terminal-Bench 2.0 as well, Sonnet + advisor posted higher scores than Sonnet alone — at lower cost per task.
The Haiku + Opus advisor pairing is even more dramatic. On BrowseComp, Haiku alone scored 19.7%. Add an Opus advisor and it jumps to 41.2% — more than double. That's 29% below Sonnet alone on score, but at 85% lower cost. For high-volume workloads, it's a powerful option.
The reason costs fall is straightforward. All the advisor (Opus) generates is a short plan, typically 400–700 tokens. Every other token of output is billed at the cheaper executor model's rates. Compared with running Opus end to end, the cost difference is substantial.
How to Put It into Practice
In the API, it works with a single added line.
```pythonresponse = client.messages.create( model="claude-sonnet-4-6", # executor tools=[ { "type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6", # advisor "max_uses": 3, # cap on advisor calls }, # keep your existing tools as-is ], messages=[...])```
Add the beta header `anthropic-beta: advisor-tool-2026-03-01` and you're ready to go.
A few practical points are worth knowing.
Cost control. Use `max_uses` to cap advisor calls per request. Advisor tokens are reported separately in the usage block, so you can track costs precisely by tier.
Works alongside existing tools. The advisor tool slots into a Messages API request right next to your other tools. Web search, code execution, and Opus consultations can all run inside the same loop.
No extra plumbing. The model handoff happens inside a single /v1/messages request. No additional round trips, no context-management code. The executor decides for itself when to call the advisor.
When Should You Use It?
Anthropic recommends a simple comparison: run your existing eval set under three configurations.
Three tiers of AI model deployment
Compare accuracy and cost across the three, and you'll see immediately which combination is optimal for your workload.
In practice, the most useful scenarios look like this.
Coding agents. Sonnet handles most code changes, with Opus stepping in only for architectural decisions and gnarly debugging. The SWE-bench results validate this scenario directly.
Web research agents. Haiku handles information gathering and synthesis at speed, while Opus sets the direction on judgments that demand complex reasoning. The jump from 19.7% for Haiku alone to 41.2% with an advisor on BrowseComp is exactly this scenario.
Bulk processing. When classifying or summarizing thousands of documents, the Haiku + Opus advisor combination shines — 85% cheaper than Sonnet alone, with double the accuracy of plain Haiku.
Org-Chart Logic, Applied to AI
The significance of the Advisor Strategy goes beyond saving money.
AI agent design used to be a binary choice: pay for the best model, or save on cost. The Advisor Strategy breaks that binary. It offers a structure that delivers top-tier judgment and cost efficiency at the same time.
Human organizations already work this way. If senior staff did every task, the payroll would be unsustainable; if juniors handled everything, quality couldn't be guaranteed. The efficient structure is juniors executing while seniors weigh in on the critical calls. Anthropic has designed collaboration between AI models on the same logic.
If cost has been the thing holding you back from putting AI agents into production, the Advisor Strategy may be the practical answer: use Opus only in the moments that demand Opus-level judgment, and let Sonnet or Haiku handle the rest. It takes one line of code.




