Claude Opus 4.8: 88.6% SWE-bench, Cheaper Fast Mode, and What Developers Need to Know

Opus 4.8 lands at 88.6% on SWE-bench Verified and 69.2% on the harder Pro set, nearly 5 points ahead of 4.7 on Pro. Here's what the benchmark gap means in practice, how the cheaper fast mode changes cost math, and when to route tasks to Opus vs Sonnet.

88.6% SWE-bench Verified (all-time high at launch)
By Skills-Hub Team · Anthropic ecosystem coverage · 8 min read
Claude Opus 4.8 · SWE-bench · Model Release

Anthropic released Claude Opus 4.8 on May 28, 2026. The headline number is 88.6% on SWE-bench Verified, a one-point gain over Opus 4.7's 87.6%. That sounds incremental. The more interesting number is on SWE-bench Pro: 69.2% versus 64.3% — a 4.9-point jump on the harder evaluation that better predicts real-world coding agent performance. The gap between 4.7 and 4.8 is larger than the headline suggests, and it shows up exactly where it matters most.

Pricing is unchanged at $5/M input and $25/M output. But fast mode, which runs at 2.5× speed for 2× the token rate, is now about a third cheaper per token than the equivalent mode on Opus 4.7, which charged roughly 3× the base rate. For teams running high-throughput pipelines, that changes the cost math.

The 4.7 → 4.8 benchmark gap

SWE-bench Verified is the industry's de facto coding benchmark: 500 real GitHub issues, each paired with tests that fail until the model's patch fixes the underlying bug. Opus 4.7 scored 87.6%. Opus 4.8 scores 88.6%, the highest any model has posted at general availability. But the one-point delta undersells what changed.

Benchmark             Opus 4.8   Context
SWE-bench Verified    88.6%      vs 87.6% for Opus 4.7
SWE-bench Pro         69.2%      +4.9 pts over Opus 4.7
GPQA Diamond          93.6%      graduate-level reasoning

SWE-bench Pro is the same format but harder: the problems require navigating larger codebases, the tests are less forgiving, and fewer hints appear in issue descriptions. A nearly 5-point jump there is substantial. TrueFoundry ran Opus 4.8 and 4.7 head-to-head on their own gateway infrastructure (50 real issues) and saw 50/50 versus 47/50, a gap of 6 percentage points on their set. With only 50 issues that is a three-issue difference, so read it as directionally consistent with Anthropic's Pro numbers rather than as independent confirmation.
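One way to see why the Pro gain is the bigger story is to look at the share of issues left unresolved instead of the headline score. The sketch below uses only the percentages quoted in this article; the `compare` helper and its field names are illustrative and not part of any benchmark tooling.

```typescript
// Compare two benchmark scores (in percent) as an absolute gain and as a
// relative cut in unresolved issues. Helper names are illustrative.
interface Comparison {
  pointGain: number;        // absolute gain in percentage points
  unresolvedCutPct: number; // relative reduction in unresolved issues, in percent
}

function round1(x: number): number {
  return Math.round(x * 10) / 10;
}

function compare(oldScore: number, newScore: number): Comparison {
  const oldUnresolved = 100 - oldScore;
  const newUnresolved = 100 - newScore;
  return {
    pointGain: round1(newScore - oldScore),
    unresolvedCutPct: round1(((oldUnresolved - newUnresolved) / oldUnresolved) * 100),
  };
}

const verified = compare(87.6, 88.6); // SWE-bench Verified, Opus 4.7 vs 4.8
const pro = compare(64.3, 69.2);      // SWE-bench Pro, Opus 4.7 vs 4.8

console.log(verified); // { pointGain: 1, unresolvedCutPct: 8.1 }
console.log(pro);      // { pointGain: 4.9, unresolvedCutPct: 13.7 }
```

On these figures, the Verified gain removes about 8% of the remaining failures, while the Pro gain removes about 14%.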

Terminal-Bench 2.1, the agentic coding evaluation that measures multi-turn tool use across a shell session, puts Opus 4.8 at 74.6%. On GDPval-AA, Artificial Analysis's Elo-rated evaluation of economically valuable knowledge-work tasks, it scores 1890, an improvement over 4.7.

SWE-bench Verified vs Pro: what they measure

The distinction matters when you're deciding whether to upgrade. SWE-bench Verified samples from a cleaned, human-verified subset of the full SWE-bench dataset. Problems are real but bounded: the issue title usually contains a strong signal, the failing test is often directly related to the description, and the repo context is manageable.

SWE-bench Pro adds full repo context and noisier issue descriptions, and it requires the model to identify which files are even relevant before attempting a fix. It more closely approximates what a coding agent encounters on a real ticket from a backlog, where no one has pre-filtered the noise for you.

Fast mode: 2.5× speed, about a third cheaper than before

Fast mode is the option that changes the cost math for teams running Claude Code at scale. Enabling it on Opus 4.8 runs the model at 2.5× the throughput for 2× the per-token rate: approximately $10/M input and $50/M output.

The previous fast mode on Opus 4.7 ran at roughly 3× the base token rate to achieve a similar throughput multiplier. Opus 4.8 achieves 2.5× speed at only 2× rate. Net result: if you were already using Opus fast mode, switching to 4.8 cuts your fast-mode bill by about 33% while also delivering higher benchmark accuracy.

Cost comparison
                    Input $/M   Output $/M   Speed
Sonnet 4.6           $3.00       $15.00        1×
Opus 4.8 (normal)    $5.00       $25.00        1×
Opus 4.8 (fast)     $10.00       $50.00       2.5×
Opus 4.7 (fast)     $15.00       $75.00        ~2×   ← previous generation

For interactive Claude Code sessions where latency is user-visible, fast mode on Opus 4.8 is now competitive with normal-mode Sonnet 4.6 on speed, while running a significantly more capable model. The tradeoff is about 3.3× Sonnet's per-token price ($10 vs $3 input, $50 vs $15 output), which is worthwhile when the task complexity justifies Opus anyway.
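To make the cost math concrete, here is a minimal sketch that prices one workload under each row of the table. The prices are copied from the table; the workload size is hypothetical, and the keys in `PRICES` are labels for this example, not API model IDs.

```typescript
// Per-million-token prices in USD, copied from the cost comparison table.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

const PRICES: Record<string, Price> = {
  "sonnet-4.6": { inputPerM: 3, outputPerM: 15 },
  "opus-4.8": { inputPerM: 5, outputPerM: 25 },
  "opus-4.8-fast": { inputPerM: 10, outputPerM: 50 },
  "opus-4.7-fast": { inputPerM: 15, outputPerM: 75 },
};

// Cost in USD for a workload measured in tokens.
function cost(mode: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[mode];
  if (!p) throw new Error(`unknown mode: ${mode}`);
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM;
}

// Hypothetical monthly workload: 200M input tokens, 40M output tokens.
const monthlyInput = 200e6;
const monthlyOutput = 40e6;

const oldFast = cost("opus-4.7-fast", monthlyInput, monthlyOutput); // 6000
const newFast = cost("opus-4.8-fast", monthlyInput, monthlyOutput); // 4000
const sonnet = cost("sonnet-4.6", monthlyInput, monthlyOutput);     // 1200

// Saving from moving fast-mode traffic from 4.7 to 4.8, in percent.
const savingPct = Math.round(((oldFast - newFast) / oldFast) * 100); // 33

// How much more Opus 4.8 fast mode costs than normal-mode Sonnet 4.6.
const fastVsSonnet = Math.round((newFast / sonnet) * 10) / 10; // 3.3
```

Because both fast modes scale input and output prices by the same factor, the 33% saving holds for any input/output mix; only the absolute dollar figures depend on the workload.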

Dynamic workflows and mid-task messages

Opus 4.8 ships alongside two Claude Code features that work specifically with this model generation. Dynamic workflows allow a parent agent to spawn up to 100 parallel subagents, each with isolated context windows and independent tool access. We covered dynamic workflows in depth separately; the short version is that fan-out parallelism is now first-class in Claude Code rather than a workaround using shell scripts.
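If a job has more units of work than the 100-subagent cap, one simple approach is to split it into waves. The sketch below is plain TypeScript and does not call any Claude Code API; `planWaves` and the file names are hypothetical, and this article does not say whether Claude Code queues excess subagents on its own.

```typescript
// Cap on parallel subagents, as quoted in this article.
const MAX_PARALLEL_SUBAGENTS = 100;

// Split work items into consecutive waves, each no larger than the cap.
function planWaves<T>(items: T[], cap: number = MAX_PARALLEL_SUBAGENTS): T[][] {
  if (!Number.isInteger(cap) || cap < 1) {
    throw new Error("cap must be a positive integer");
  }
  const waves: T[][] = [];
  for (let i = 0; i < items.length; i += cap) {
    waves.push(items.slice(i, i + cap));
  }
  return waves;
}

// Hypothetical example: 250 files, one subagent per file.
const files = Array.from({ length: 250 }, (_, i) => `src/file-${i}.ts`);
const waves = planWaves(files);

console.log(waves.map((w) => w.length)); // [ 100, 100, 50 ]
```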

Mid-task system messages are the less-covered addition. On the Messages API, you can now inject a new system message into a running conversation without breaking the thread. This matters for long-running agents that need runtime context updates — new environment variables, changed constraints, updated access tokens — without restarting the session and losing accumulated state.

Mid-task system message (Messages API)
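import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// The Messages API is stateless: the caller keeps the conversation history
// and resends it on every request. There is no server-side thread object.
const history: Array<{ role: string; content: string }> = [
  { role: "user", content: "Roll out the schema migration to staging." }, // illustrative task
];

// ... agent runs several turns, appending each assistant and user turn to `history` ...

// Inject updated context mid-task without restarting.
// Assumption: the mid-conversation "system" role is the feature described above;
// check the current API reference for its exact shape before relying on it.
history.push({
  role: "system",
  content: "Deployment environment changed to production. Apply stricter validation.",
});

// The next request carries the full history plus the new constraint.
const response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: history as any, // cast: SDK typings may not yet list the "system" role
});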

When to use Opus vs Sonnet

The practical question after any model release is the same: for which tasks does the accuracy delta justify the cost delta? Opus 4.8 costs about 67% more than Sonnet 4.6 per token in normal mode ($5 vs $3 input, $25 vs $15 output). That premium is worth paying for specific task shapes and actively wasteful for others.

Route to Opus 4.8 when the task involves any of these signals: a multi-file refactor with no clear starting point, a security audit where false negatives have real consequences, a real-issue-with-failing-tests workflow (the SWE-bench-class task), or an extended agentic session expected to run more than 10 tool calls where accumulated context errors compound.

Model        Best for                                  Input price
Opus 4.8     Multi-file bugs, security, architecture   $5/M in (SWE-bench Pro 69.2%)
Sonnet 4.6   Feature additions, tests, PR review       $3/M in (40% cheaper than Opus)
Haiku 4.5    Formatting, completions, leaf agents      $0.80/M in (about 3.75× cheaper than Sonnet)

The correct mental model is an org chart, not a single hire. Claude Code's dynamic workflows make this concrete: use Opus as the orchestrator for complex root tasks, Sonnet for the specialist subagents doing bounded work, and Haiku for high-fan-out leaf tasks like per-file formatting or per-chunk summarization. The skills-hub registry includes a claude-model-router skill that automates this decision based on task complexity signals.
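The routing signals described in this section can be sketched as a small decision function. This is an illustrative heuristic, not the logic of the claude-model-router skill; the `TaskSignals` fields, the `routeTask` name, and the model labels (which are not API model IDs) are assumptions made for the example.

```typescript
type Model = "opus-4.8" | "sonnet-4.6" | "haiku-4.5";

// Hypothetical task description; field names are invented for this sketch.
interface TaskSignals {
  multiFileNoEntryPoint: boolean;   // multi-file refactor with no clear starting point
  securityAudit: boolean;           // false negatives have real consequences
  failingTestsOnRealIssue: boolean; // the SWE-bench-class workflow
  expectedToolCalls: number;        // length of the agentic session
  leafTask: boolean;                // per-file formatting, per-chunk summarization
}

function routeTask(s: TaskSignals): Model {
  // Any one of the Opus signals is enough to justify the premium.
  const needsOpus =
    s.multiFileNoEntryPoint ||
    s.securityAudit ||
    s.failingTestsOnRealIssue ||
    s.expectedToolCalls > 10;
  if (needsOpus) return "opus-4.8";
  // High-fan-out leaf work goes to the cheapest tier.
  if (s.leafTask) return "haiku-4.5";
  // Everything else is bounded specialist work.
  return "sonnet-4.6";
}

const base: TaskSignals = {
  multiFileNoEntryPoint: false,
  securityAudit: false,
  failingTestsOnRealIssue: false,
  expectedToolCalls: 3,
  leafTask: false,
};

const audit = routeTask({ ...base, securityAudit: true });         // "opus-4.8"
const longSession = routeTask({ ...base, expectedToolCalls: 25 }); // "opus-4.8"
const formatting = routeTask({ ...base, leafTask: true });         // "haiku-4.5"
const feature = routeTask(base);                                   // "sonnet-4.6"
```

Checking the Opus signals first is a deliberate choice: a leaf task that happens to be a security check still goes to the stronger model.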

Upgrading: API, SDK, Claude Code

The model ID for Opus 4.8 is claude-opus-4-8. If you're on the Anthropic SDK, the upgrade is one line. If you're using Claude Code, the latest version already defaults to Opus 4.8 when Opus is selected — no config change required.

SDK upgrade
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-8",   // was: "claude-opus-4-7"
  max_tokens: 8192,
  messages: [{ role: "user", content: "Fix the failing test in src/auth.ts" }],
});

Claude Code: set model explicitly
# Project setting, in .claude/settings.json
{
  "model": "claude-opus-4-8"
}

# Or per session, via the CLI flag
claude --model claude-opus-4-8

Amazon Bedrock
// Bedrock cross-region inference profile
const modelId = "us.anthropic.claude-opus-4-8-20260528-v1:0";

Opus 4.8 is available on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry in addition to the direct Anthropic API. If you're routing through a gateway, check your provider's model ID catalog: Bedrock uses the versioned inference profile ID shown above, not the bare model name.

"A modest but tangible improvement over Opus 4.7 — the Pro benchmark gains reflect real progress on the hardest class of multi-file issues that agents encounter in production."
(Anthropic, on Opus 4.8)

For most teams, the upgrade path is: update the model ID, run your existing eval suite against Opus 4.8 outputs for a week, and decide whether the fast mode cost savings justify enabling it on latency-sensitive pipelines. The base capability upgrade is backward-compatible — nothing about the API surface or skill format changed.
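For the eval step, a per-case diff is usually more informative than two aggregate pass rates, because it separates newly fixed cases from regressions. The sketch below assumes your eval suite can emit a pass/fail result per case ID; `diffEvalRuns`, the result shape, and the sample data are all hypothetical.

```typescript
// Per-case pass/fail results from running the same eval suite on one model.
type EvalResults = Record<string, boolean>;

interface EvalDiff {
  baselinePassRate: number;  // fraction of shared cases the baseline passed
  candidatePassRate: number; // fraction of shared cases the candidate passed
  fixed: string[];           // failed on baseline, pass on candidate
  regressed: string[];       // passed on baseline, fail on candidate
}

function diffEvalRuns(baseline: EvalResults, candidate: EvalResults): EvalDiff {
  // Only compare cases present in both runs, in a stable order.
  const shared = Object.keys(baseline).filter((id) => id in candidate).sort();
  if (shared.length === 0) throw new Error("no shared eval cases");
  const passRate = (r: EvalResults) => shared.filter((id) => r[id]).length / shared.length;
  return {
    baselinePassRate: passRate(baseline),
    candidatePassRate: passRate(candidate),
    fixed: shared.filter((id) => !baseline[id] && candidate[id]),
    regressed: shared.filter((id) => baseline[id] && !candidate[id]),
  };
}

// Hypothetical results for four cases on each model.
const opus47: EvalResults = { "auth-1": true, "auth-2": false, "db-1": true, "db-2": false };
const opus48: EvalResults = { "auth-1": true, "auth-2": true, "db-1": false, "db-2": true };

const diff = diffEvalRuns(opus47, opus48);
console.log(diff.fixed);     // [ 'auth-2', 'db-2' ]
console.log(diff.regressed); // [ 'db-1' ]
```

A higher candidate pass rate with a non-empty `regressed` list is still worth a look before switching production traffic.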

If you're evaluating whether to upgrade at all: the 5-point SWE-bench Pro gain is the deciding signal for agentic workflows. If your agents are primarily doing bounded, well-specified tasks, the 1-point Verified gain may not move your production metrics. If they're doing open-ended bug triage, migration work, or security review — the Pro gap predicts real improvement.

Install the model-routing skill
# Install the claude-model-router skill for automated model selection
npx @skills-hub-ai/cli install claude-model-router

# The skill scores task complexity signals and outputs a routing recommendation
# including cost estimates and fast-mode guidance
