Anthropic · Model release
Claude Sonnet 5: The Agentic Upgrade That Changes Your Daily Workflow
Claude Sonnet 5 shipped June 30 with a 13.5-point Terminal-bench jump, configurable reasoning effort, and introductory pricing that undercuts Opus for most automation. Here's what changes for Claude Code users today.
Sonnet 4.6 was the workhorse. Solid, predictable, occasionally frustrating when it stalled halfway through a complex migration and decided to stop short rather than push through ambiguity. Sonnet 5 — released June 30, 2026 — is the first Sonnet model that reliably finishes what it starts. Anthropic calls it "the most agentic Sonnet yet" and, unusually for marketing copy, the benchmarks back that claim up.
If you run Claude Code daily, this post covers exactly what changed, what didn't, and what you should reconfigure before the end of July.
What changed in 90 seconds
Sonnet 5 is now the default model for Free and Pro plans. Three things meaningfully changed for developers:
- The model finishes complex multi-step tasks where Sonnet 4.6 stopped short — checking its own output without being asked to.
- It introduces configurable reasoning effort — low/medium/high — so you tune compute per task rather than accepting a fixed budget.
- It's safer by design: lower hallucination rate, better refusal of malicious requests, stronger resistance to prompt injection. For production agents running autonomously, these are more than checkbox improvements.
The one thing that did not change: the gap with Opus 4.8 is real. 63.2% vs 69.2% on agentic coding. For daily automation that gap rarely bites you; for complex open-ended architecture work, it shows. More on that in the pricing section.
Agentic coding benchmarks
The headline number is Terminal-bench 2.1: Sonnet 5 scores 80.5%, up from 67% for Sonnet 4.6. That +13.5 percentage-point jump is the biggest single-release improvement in Sonnet history.
Terminal-bench tests what Claude Code users actually care about: multi-step command-line workflows requiring planning, iteration, and coordinated tool use. Not static code completion in a bubble. Not clever one-shot answers to contrived puzzles. Planning and error-recovery in a real shell. That's the benchmark that predicts whether your CI automation pipeline actually finishes.
- Terminal-bench 2.1: 80.5% (vs 67% for Sonnet 4.6)
- Agentic coding overall: 63.2% (vs 69.2% for Opus 4.8)
- Sonnet 4.6 baseline: 58.1% (for context)
The overall agentic coding score of 63.2% sits between its predecessor (58.1%) and Opus 4.8 (69.2%). The interesting data point is that Sonnet 5 outperforms Opus 4.8 on knowledge work benchmarks — the gap is coding-specific and concentrated in the kinds of tasks where context accumulation and sustained multi-hour reasoning actually matter.
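A percentage-point gap and a relative improvement are easy to conflate, so here is a minimal sketch of the arithmetic behind the figures above. It uses only the scores quoted in this post; the function names are illustrative.

```python
def point_gap(new: float, old: float) -> float:
    """Absolute difference between two scores, in percentage points."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Change expressed as a percentage of the old score."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    # Terminal-bench 2.1: Sonnet 5 vs Sonnet 4.6
    print(point_gap(80.5, 67.0), relative_gain(80.5, 67.0))
    # Agentic coding overall: Sonnet 5 vs Sonnet 4.6
    print(point_gap(63.2, 58.1), relative_gain(63.2, 58.1))
    # Agentic coding overall: Sonnet 5 vs Opus 4.8 (negative means behind)
    print(point_gap(63.2, 69.2))
```

On these numbers the Terminal-bench jump is 13.5 points, or roughly a 20% relative gain, while the agentic coding gain over Sonnet 4.6 is 5.1 points and the deficit to Opus 4.8 is 6.0 points.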
Configurable reasoning effort
This is the practical change that will matter most day-to-day. Instead of a fixed reasoning budget, Sonnet 5 exposes three effort levels. Pick the right tier for the task:
- low — simple edits, fast file reads, one-liner fixes. Near-instant latency, minimal token cost.
- medium — the default. Standard one-file refactors, most debugging sessions, routine Claude Code sessions.
- high — multi-file refactors, complex migrations, security audits. Tokens run 2–4× medium. Worth it when a partial result is worse than no result.
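One way to apply the tiers above consistently is a small lookup from task category to effort level. This is a hypothetical helper, not part of Claude Code: the category names are assumptions chosen to mirror the bullets, and anything unrecognized falls back to the stated default of medium.

```python
# Hypothetical mapping; the category names are illustrative and mirror the tiers above.
EFFORT_BY_TASK = {
    "simple_edit": "low",
    "file_read": "low",
    "one_line_fix": "low",
    "single_file_refactor": "medium",
    "debugging": "medium",
    "multi_file_refactor": "high",
    "migration": "high",
    "security_audit": "high",
}

DEFAULT_EFFORT = "medium"  # the post describes medium as the default tier


def pick_effort(task_category: str) -> str:
    """Return the effort tier for a task, falling back to the default."""
    return EFFORT_BY_TASK.get(task_category, DEFAULT_EFFORT)


def cli_command(task_category: str, model: str = "claude-sonnet-5") -> str:
    """Build a session command in the form this post shows for the CLI."""
    return f"claude --model {model} --effort {pick_effort(task_category)}"


if __name__ == "__main__":
    for task in ("file_read", "debugging", "migration", "something_else"):
        print(task, "->", cli_command(task))
```

Keeping the mapping in one place makes it easy to revisit once you have a week of real sessions to compare against.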
You can set effort per-project in your CLAUDE.md config block, or override it per-session from the CLI:
{
"model": "claude-sonnet-5",
"reasoningEffort": "high"
}
# Run a session at high effort
claude --model claude-sonnet-5 --effort high
# Pin to medium for routine work (matches the default)
claude config set model claude-sonnet-5
claude config set reasoningEffort medium
Pricing and the Opus trade-off
Introductory pricing through August 31: $2/M input, $10/M output. After that: $3/M input, $15/M output. Cheaper than Opus 4.8, pricier than Gemini 3.5 Flash for the same output volume.
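To see what the August 31 change does to a bill, a rough sketch of the arithmetic follows. The rates are the ones quoted above; the `Pricing` class and the monthly workload of 50M input and 10M output tokens are made-up illustrations, not measured usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total cost in USD for the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


INTRO = Pricing(input_per_million=2.0, output_per_million=10.0)     # through August 31
STANDARD = Pricing(input_per_million=3.0, output_per_million=15.0)  # after August 31

if __name__ == "__main__":
    # Hypothetical monthly workload, chosen only for illustration.
    input_tokens, output_tokens = 50_000_000, 10_000_000
    print(f"intro:    ${INTRO.cost(input_tokens, output_tokens):,.2f}")
    print(f"standard: ${STANDARD.cost(input_tokens, output_tokens):,.2f}")
```

Both rates rise by the same factor (1.5×), so the bill rises by 50% whatever your input/output mix.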
The practical trade-off: Opus 4.8 scores 69.2% on agentic coding vs Sonnet 5's 63.2%. For the category of task where that gap bites — novel architecture problems, complex multi-file migrations with unexpected edge cases, anything requiring sustained multi-hour reasoning — Opus is still the safer bet. For everything else, Sonnet 5 finishes the job and costs less.
Daniel Shepard (Zapier) put it directly: "That used to stall halfway. For day-to-day automation, it's a no-brainer." The Zapier use case is the right mental model — multi-part workflows that need a model to coordinate tools from start to finish without human re-prompting. Sonnet 5 is purpose-built for that pattern.
Switching in Claude Code today
Sonnet 5 is already the default if you're on Free or Pro plans. Max, Team, and Enterprise users can select it explicitly. To pin it and configure reasoning effort across your projects:
# Pin Sonnet 5 globally
claude config set model claude-sonnet-5
# Verify the active model
claude config get model
For project-specific configuration in a CLAUDE.md-managed repo:
## Model configuration
- Model: claude-sonnet-5
- Reasoning effort: high (security audits, migrations)
- Reasoning effort: medium (day-to-day coding sessions)
- Reasoning effort: low (file reads, simple edits, lint fixes)
Skills that benefit most from high effort: code-review, security audit, codebase analysis. Skills that are fine on low: file reads, simple refactors, lint fixes, docstring generation.
Install the effort-tuner skill from skills-hub.ai to get a guided walkthrough that audits your current Claude Code sessions and recommends the right effort level per workflow type:
npx @skills-hub-ai/cli install sonnet5-effort-tuner
Caveats worth knowing
Three things to verify before migrating production workloads to Sonnet 5:
Tokenizer change
The same input now maps to 1.0–1.35× as many tokens, so your token budget math may be off by up to 35%. Run representative prompts through the API and measure before you commit.
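A minimal sketch of that budget check, assuming you have already collected token counts for the same prompts under the old and new models. The helper names and sample numbers are hypothetical, and no API call is shown.

```python
TOKEN_RATIO_RANGE = (1.0, 1.35)  # range quoted in this post


def projected_tokens(old_tokens: int) -> tuple[int, int]:
    """Best-case and worst-case token counts for a budget sized on the old counts."""
    low, high = TOKEN_RATIO_RANGE
    return round(old_tokens * low), round(old_tokens * high)


def measured_ratio(pairs: list[tuple[int, int]]) -> float:
    """Aggregate new/old token ratio over (old_count, new_count) pairs."""
    old_total = sum(old for old, _ in pairs)
    new_total = sum(new for _, new in pairs)
    if old_total == 0:
        raise ValueError("need at least one prompt with a non-zero old token count")
    return new_total / old_total


if __name__ == "__main__":
    # A budget of 1M tokens sized on the old counts.
    print(projected_tokens(1_000_000))
    # Hypothetical measurements for two representative prompts.
    print(measured_ratio([(1000, 1100), (1000, 1300)]))
```

Summing before dividing weights long prompts more heavily than short ones, which usually matches how the bill is computed.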
Cybersecurity capabilities
Anthropic deliberately weakened Sonnet 5's cybersecurity capabilities compared to Sonnet 4.6. This was a safety decision, not an oversight. If your workflow includes security tooling — pentest automation, CVE analysis, exploit research — test on representative inputs before switching. Some tasks that Sonnet 4.6 completed may require Opus 4.8 instead.
Rate limits
Sonnet 5 ships with higher rate limits across all plans, specifically to accommodate the increased token usage from high-effort reasoning sessions. If you've been hitting rate limits on Sonnet 4.6 during agentic runs, those limits expanded with Sonnet 5. Check the API dashboard for your updated tier.
Verdict
Switch now, while introductory pricing holds through August 31. Run your most demanding Claude Code sessions on Sonnet 5 with high effort for a week. If it feels like the model that should have shipped six months ago, you're not imagining it: that's the 13.5-point Terminal-bench jump showing up in your day.
Keep Opus 4.8 on the bench for tasks where a partial result is worse than no result: novel architecture rewrites with no prior art in your codebase, migrations touching hundreds of files with cascading dependencies, anything where stopping halfway costs more than starting over. Outside that narrower set, Sonnet 5 is the right default.
Sonnet 5 is a strict improvement over Sonnet 4.6 on agentic search and computer use evaluations across different effort levels.
The August 31 pricing change is the date to watch. At $3/M input after that, the Opus trade-off math shifts again. For now, run Sonnet 5 hard and build your intuition for where it holds and where it falls short. That intuition will matter when you're calibrating your agent pipelines at standard pricing.
Written by
Skills-Hub Team
Anthropic ecosystem coverage
Skills-Hub is the open registry for AI coding skills, with SKILL.md files synced daily from Anthropic, Google, Microsoft, and 90+ official sources. Free + MIT.