Claude Opus 4.8: 88.6% SWE-bench, Cheaper Fast Mode, and What Developers Need to Know
Opus 4.8 lands at 88.6% SWE-bench Verified and 69.2% on the harder Pro set — nearly 5 points ahead of 4.7. Here's what the benchmark gap means in practice, how the 3× cheaper fast mode changes cost math, and when to route tasks to Opus vs Sonnet.
Anthropic released Claude Opus 4.8 on May 28, 2026. The headline number is 88.6% on SWE-bench Verified, a one-point gain over Opus 4.7's 87.6%. That sounds incremental. The more interesting number is on SWE-bench Pro: 69.2% versus 64.3% — a 4.9-point jump on the harder evaluation that better predicts real-world coding agent performance. The gap between 4.7 and 4.8 is larger than the headline suggests, and it shows up exactly where it matters most.
Pricing is unchanged at $5/M input and $25/M output. But fast mode — which runs at 2.5× speed for 2× the token rate — is now three times cheaper than the equivalent mode on previous-generation models. For teams running high-throughput pipelines, that math changes the build.
The 4.7 → 4.8 benchmark gap
SWE-bench Verified is the industry's de facto coding benchmark: 500 real GitHub issues, each paired with tests that fail until the model patches the code correctly. Opus 4.7 scored 87.6%. Opus 4.8 scores 88.6%, the highest any model has posted at general availability. But the one-point delta undersells what changed.
88.6% on SWE-bench Verified (vs 87.6% for Opus 4.7)
69.2% on SWE-bench Pro (+4.9 pts over Opus 4.7)
93.6% on GPQA Diamond (graduate-level reasoning)
SWE-bench Pro is the same format but harder: the problems require navigating larger codebases, the tests are less forgiving, and fewer hints appear in issue descriptions. A 5-point jump there is substantial. TrueFoundry ran Opus 4.8 and 4.7 head-to-head on their own gateway infrastructure (50 real issues) and saw 50/50 versus 47/50 — a 6-point gap on their set, directionally consistent with Anthropic's Pro numbers.
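To put the head-to-head numbers in proportion, here is a minimal sketch of the percentage-point arithmetic. The helper name `pointGap` is illustrative; the counts are the ones quoted above.

```typescript
// Percentage-point gap between two pass counts on the same issue set.
function pointGap(passedA: number, passedB: number, total: number): number {
  return ((passedA - passedB) * 100) / total;
}

// TrueFoundry's reported head-to-head: 50/50 for Opus 4.8 vs 47/50 for Opus 4.7.
const gatewayGap = pointGap(50, 47, 50);

// On a 50-issue set, every single issue is worth 2 percentage points.
const pointsPerIssue = 100 / 50;

console.log(`gap: ${gatewayGap} points, ${pointsPerIssue} points per issue`);
```

The 6-point gap therefore comes down to three issues, which is why a result on a set this small is best read as directional rather than as a precise measurement.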
Terminal-Bench 2.1, the agentic coding evaluation that measures multi-turn tool use across a shell session, puts Opus 4.8 at 74.6%. On GDPval-AA, Artificial Analysis's Elo-rated evaluation built on the GDPval set of real-world knowledge-work tasks, it scores 1890 Elo, an improvement over 4.7.
SWE-bench Verified vs Pro: what they measure
The distinction matters when you're deciding whether to upgrade. SWE-bench Verified samples from a cleaned, human-verified subset of the full SWE-bench dataset. Problems are real but bounded: the issue title usually contains a strong signal, the failing test is often directly related to the description, and the repo context is manageable.
SWE-bench Pro adds full repo context, noisier issue descriptions, and requires the model to identify which files are even relevant before attempting a fix. It more closely approximates what a coding agent encounters on a real ticket from a backlog — no one has pre-filtered the noise for you.
Fast mode: 2.5× speed, 3× cheaper than before
Fast mode is the option that changes the cost math for teams running Claude Code at scale. Enabling it on Opus 4.8 runs the model at 2.5× the throughput for 2× the per-token rate: approximately $10/M input and $50/M output.
3× cheaper than the previous generation's fast mode (Opus 4.8 fast mode vs Opus 4.7 fast mode at equivalent throughput)
The previous fast mode on Opus 4.7 ran at roughly 3× the base token rate to achieve a similar throughput multiplier. Opus 4.8 achieves 2.5× speed at only 2× rate. Net result: if you were already using Opus fast mode, switching to 4.8 cuts your fast-mode bill by about 33% while also delivering higher benchmark accuracy.
Model               Input $/M   Output $/M   Speed
Sonnet 4.6          $3.00       $15.00       1×
Opus 4.8 (normal)   $5.00       $25.00       1×
Opus 4.8 (fast)     $10.00      $50.00       2.5×
Opus 4.7 (fast)     $15.00      $75.00       ~2×    ← previous generation

For interactive Claude Code sessions where latency is user-visible, fast mode on Opus 4.8 is now competitive with normal-mode Sonnet 4.6 on speed, while running a significantly more capable model. The tradeoff is about 3× the cost, which is worthwhile when the task complexity justifies Opus anyway.
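The cost comparison can be worked through with a short calculation. This is a sketch using only the per-million-token rates listed in the pricing table above; the workload size (200M input tokens, 40M output tokens) and the key names are made up for illustration.

```typescript
// Per-million-token rates in USD, copied from the pricing table above.
interface Rate {
  inputPerM: number;
  outputPerM: number;
}

type ModelKey = "sonnet-4.6" | "opus-4.8" | "opus-4.8-fast" | "opus-4.7-fast";

const RATES: Record<ModelKey, Rate> = {
  "sonnet-4.6": { inputPerM: 3, outputPerM: 15 },
  "opus-4.8": { inputPerM: 5, outputPerM: 25 },
  "opus-4.8-fast": { inputPerM: 10, outputPerM: 50 },
  "opus-4.7-fast": { inputPerM: 15, outputPerM: 75 },
};

// Cost in USD for a workload with the given token counts.
function costUsd(model: ModelKey, inputTokens: number, outputTokens: number): number {
  const rate = RATES[model];
  return (inputTokens / 1e6) * rate.inputPerM + (outputTokens / 1e6) * rate.outputPerM;
}

// Hypothetical monthly workload, chosen only to make the arithmetic easy to follow.
const inputTokens = 200_000_000;
const outputTokens = 40_000_000;

const oldFast = costUsd("opus-4.7-fast", inputTokens, outputTokens);
const newFast = costUsd("opus-4.8-fast", inputTokens, outputTokens);
const saving = 1 - newFast / oldFast;

console.log(`Opus 4.7 fast: $${oldFast.toFixed(2)}`);
console.log(`Opus 4.8 fast: $${newFast.toFixed(2)}`);
console.log(`Saving: ${(saving * 100).toFixed(1)}%`);
```

On these listed rates the per-token bill drops from $6,000 to $4,000 for the example workload, a reduction of about a third. Swap in your own token counts to see where fast mode pays for itself.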
Dynamic workflows and mid-task messages
Opus 4.8 ships alongside two Claude Code features that work specifically with this model generation. Dynamic workflows allow a parent agent to spawn up to 100 parallel subagents, each with isolated context windows and independent tool access. We covered dynamic workflows in depth separately; the short version is that fan-out parallelism is now first-class in Claude Code rather than a workaround using shell scripts.
Mid-task system messages are the less-covered addition. On the Messages API, you can now inject a new system message into a running conversation without breaking the thread. This matters for long-running agents that need runtime context updates — new environment variables, changed constraints, updated access tokens — without restarting the session and losing accumulated state.
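import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// The Messages API is stateless, so the caller keeps the history and resends it each turn.
const history = [{ role: "user", content: "Fix the failing test in src/auth.ts" }];

// ... agent runs several turns, appending each message to history ...

// Inject updated context mid-task without restarting.
// Assumption: the API accepts a "system" entry mid-conversation, as described above;
// earlier SDK versions allow only "user" and "assistant" roles in this array.
history.push({
  role: "system",
  content: "Deployment environment changed to production. Apply stricter validation.",
});

// Agent continues with updated constraints, full history preserved
const response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: history,
});

When to use Opus vs Sonnet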
The practical question after any model release is the same: for which tasks does the accuracy delta justify the cost delta? Opus 4.8 costs about 67% more than Sonnet 4.6 per token in normal mode ($5 vs $3 per million input tokens, $25 vs $15 per million output tokens). That premium is worth paying for specific task shapes and actively wasteful for others.
Route to Opus 4.8 when the task involves any of these signals: a multi-file refactor with no clear starting point, a security audit where false negatives have real consequences, a real-issue-with-failing-tests workflow (the SWE-bench-class task), or an extended agentic session expected to run more than 10 tool calls where accumulated context errors compound.
Opus 4.8: multi-file bugs, security, architecture. $5/M input; SWE-bench Pro 69.2%.
Sonnet 4.6: feature additions, tests, PR review. $3/M input; 40% cheaper than Opus.
Haiku 4.5: formatting, completions, leaf agents. $0.80/M input; about 3.75× cheaper than Sonnet.
The correct mental model is an org chart, not a single hire. Claude Code's dynamic workflows make this concrete: use Opus as the orchestrator for complex root tasks, Sonnet for the specialist subagents doing bounded work, and Haiku for high-fan-out leaf tasks like per-file formatting or per-chunk summarization. The skills-hub registry includes a claude-model-router skill that automates this decision based on task complexity signals.
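The routing rules above can be sketched as a small function. This is an illustration under stated assumptions, not the claude-model-router skill's actual logic: the `TaskSignals` shape and its field names are hypothetical, and the function returns a tier name rather than a model ID.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

// Hypothetical task descriptor; the field names are illustrative, not from any SDK.
interface TaskSignals {
  multiFileRefactor: boolean;
  securityAudit: boolean;
  failingTestsOnRealIssue: boolean;
  expectedToolCalls: number;
  leafTask: boolean; // e.g. per-file formatting or per-chunk summarization
}

function routeModel(task: TaskSignals): Tier {
  // Any one of the Opus signals from the article is enough to justify the premium.
  const needsOpus =
    task.multiFileRefactor ||
    task.securityAudit ||
    task.failingTestsOnRealIssue ||
    task.expectedToolCalls > 10;
  if (needsOpus) return "opus";

  // High-fan-out leaf work goes to the cheapest tier.
  if (task.leafTask) return "haiku";

  // Bounded, well-specified work defaults to Sonnet.
  return "sonnet";
}

const base: TaskSignals = {
  multiFileRefactor: false,
  securityAudit: false,
  failingTestsOnRealIssue: false,
  expectedToolCalls: 3,
  leafTask: false,
};

console.log(routeModel(base));
console.log(routeModel({ ...base, securityAudit: true }));
console.log(routeModel({ ...base, leafTask: true }));
```

Checking the Opus signals first is a deliberate choice in this sketch: a leaf task that is also a security audit still goes to Opus, on the reasoning that a false negative costs more than the token premium.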
Upgrading: API, SDK, Claude Code
The model ID for Opus 4.8 is claude-opus-4-8. If you're on the Anthropic SDK, the upgrade is one line. If you're using Claude Code, the latest version already defaults to Opus 4.8 when Opus is selected — no config change required.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-8", // was: "claude-opus-4-7"
  max_tokens: 8192,
  messages: [{ role: "user", content: "Fix the failing test in src/auth.ts" }],
});

# In your project .claude/settings.json
{
  "model": "claude-opus-4-8"
}

# Or per-session via flag
claude --model claude-opus-4-8

// Bedrock cross-region inference profile
const modelId = "us.anthropic.claude-opus-4-8-20260528-v1:0";

Opus 4.8 is available on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry in addition to the direct Anthropic API. If you're routing through a gateway, check your provider's model ID catalog; Bedrock uses the versioned model ID format shown above.
"A modest but tangible improvement over Opus 4.7 — the Pro benchmark gains reflect real progress on the hardest class of multi-file issues that agents encounter in production."
For most teams, the upgrade path is: update the model ID, run your existing eval suite against Opus 4.8 outputs for a week, and decide whether the fast mode cost savings justify enabling it on latency-sensitive pipelines. The base capability upgrade is backward-compatible — nothing about the API surface or skill format changed.
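The eval step in that upgrade path can be sketched as a comparison of two runs of the same suite. The `EvalResult` and `Comparison` shapes are assumptions for illustration, and the sample data is made up; plug in whatever your own harness records.

```typescript
// One eval case's outcome; `id` is whatever your suite uses to name cases.
interface EvalResult {
  id: string;
  passed: boolean;
}

interface Comparison {
  baselinePassRate: number;
  candidatePassRate: number;
  regressions: string[]; // passed on baseline, failed on candidate
  fixes: string[]; // failed on baseline, passed on candidate
}

function passRate(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}

function compareRuns(baseline: EvalResult[], candidate: EvalResult[]): Comparison {
  const candidateById = new Map<string, boolean>();
  for (const r of candidate) candidateById.set(r.id, r.passed);

  const regressions: string[] = [];
  const fixes: string[] = [];
  for (const b of baseline) {
    const c = candidateById.get(b.id);
    if (c === undefined) continue; // case missing from the candidate run
    if (b.passed && !c) regressions.push(b.id);
    if (!b.passed && c) fixes.push(b.id);
  }

  return {
    baselinePassRate: passRate(baseline),
    candidatePassRate: passRate(candidate),
    regressions,
    fixes,
  };
}

// Made-up sample: the baseline is the old model's run, the candidate is the new one.
const opus47Run: EvalResult[] = [
  { id: "auth-timeout", passed: true },
  { id: "cache-race", passed: false },
  { id: "migration-rollback", passed: false },
  { id: "csv-encoding", passed: true },
];
const opus48Run: EvalResult[] = [
  { id: "auth-timeout", passed: true },
  { id: "cache-race", passed: true },
  { id: "migration-rollback", passed: false },
  { id: "csv-encoding", passed: false },
];

const comparison = compareRuns(opus47Run, opus48Run);
console.log(comparison);
```

Tracking regressions separately from the aggregate pass rate matters here: in the sample the two runs score the same overall, yet one case got fixed and another broke, which a single headline number would hide.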
If you're evaluating whether to upgrade at all: the 5-point SWE-bench Pro gain is the deciding signal for agentic workflows. If your agents are primarily doing bounded, well-specified tasks, the 1-point Verified gain may not move your production metrics. If they're doing open-ended bug triage, migration work, or security review — the Pro gap predicts real improvement.
# Install the claude-model-router skill for automated model selection
npx @skills-hub-ai/cli install claude-model-router
# The skill scores task complexity signals and outputs a routing recommendation
# including cost estimates and fast-mode guidance

Written by
Skills-Hub Team
Anthropic ecosystem coverage
Skills-Hub is the open registry for AI coding skills, 4,400+ SKILL.md files synced daily from Anthropic, Google, Microsoft, and 90+ official sources. Free + MIT.