Cursor · Release deep dive

Cursor Composer 2.5: Build in Parallel, 10x Cheaper, and What Changes for Your Workflow

Cursor's third-generation agentic coding model launched May 18, 2026. Composer 2.5 matches Claude Opus 4.7 on SWE-bench at one-tenth the cost, introduces a 100-sub-agent parallel architecture, and brings targeted RL training that makes it dramatically more reliable on long-running tasks.

10x cheaper than Opus 4.7 at near-identical SWE-bench performance

By Skills-Hub Team · AI coding tool ecosystem coverage · May 23, 2026 · 8 min read

Cursor · Composer 2.5 · Build in Parallel

Cursor shipped Composer 2.5 on May 18, 2026, five days ago as of this writing. If you blinked you might have missed it in the noise of Anthropic's SpaceX compute deal and Google's Antigravity launch the same week. That would be a mistake. Composer 2.5 matches Claude Opus 4.7 on SWE-bench Multilingual at one-tenth the token cost and introduces a parallel task architecture that changes how long-running agentic work actually behaves.

This post covers what's new, how the model was built, what the numbers mean in practice, and how to adjust your workflow to take advantage of it.

What changed in Composer 2.5

Cursor describes it in one line: "better at sustained work on long-running tasks, follows complex instructions more reliably, and more pleasant to collaborate with." That's marketing. The real changes are three technical bets that shipped together.

First: a training method called targeted RL with textual feedback, where corrections are inserted at the exact decision point that went wrong rather than applied as a blanket signal across the full conversation. Second: 25x more synthetic training tasks than Composer 2, generated with a "feature deletion" technique — strip real code from repos with test suites, then train the model to reimplement it using the tests as ground truth. Third: a parallel execution architecture (more on that below) that lets the model decompose tasks into concurrent subtasks rather than grinding through them serially.

Build in Parallel: the architecture

The headline feature is what Cursor calls Build in Parallel. The underlying model uses an Agent Swarm design: when the model encounters a task it can decompose, it spawns up to 100 specialized sub-agents and delegates portions of the work concurrently. The parent coordinates; the children execute. This isn't a UI feature — it's how the model reasons about multi-step work.
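
The parent/child split described above can be sketched as a bounded fan-out. This is a minimal illustration, not Cursor's implementation: the `Subtask` type and `runSwarm` function are hypothetical names, and the only detail taken from the post is the cap of 100 concurrent sub-agents.

```typescript
// Hypothetical types: nothing here is Cursor's actual API.
type Subtask = { id: string; run: () => Promise<string> };

// Run subtasks concurrently, never exceeding `maxAgents` in flight at once.
async function runSwarm(
  subtasks: Subtask[],
  maxAgents = 100, // cap taken from the post's "up to 100 sub-agents"
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  let next = 0;

  // Each worker claims the next unclaimed subtask until none remain.
  // JavaScript runs this on a single thread, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < subtasks.length) {
      const task = subtasks[next++];
      results.set(task.id, await task.run());
    }
  }

  const workerCount = Math.min(maxAgents, subtasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: three illustrative subtasks coordinated by one parent call.
const demo: Subtask[] = ["feature", "tests", "docs"].map((id) => ({
  id,
  run: async () => `${id}: done`,
}));
runSwarm(demo).then((r) => console.log(r.get("tests"))); // prints "tests: done"
```

The worker-pool shape is one of several ways to cap concurrency; it is used here because it keeps the coordinator (the `Promise.all` call) separate from the executors (the workers), mirroring the parent/child description.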

The infrastructure underneath it runs Sharded Muon with a dual-grid HSDP (Hybrid Sharded Data Parallel) layout that overlaps parallel training dimensions. The practical effect: optimizer step times on Cursor's 1-trillion-parameter model came down to 0.2 seconds through asynchronous communication overlapping. The training infrastructure and the runtime architecture share the same parallel decomposition principle.

100 · max parallel sub-agents · Agent Swarm decomposition

0.2s · optimizer step time · 1T-param model, async overlap

25x · more synthetic training tasks · vs Composer 2

In practice, this means tasks that used to require you to manually fan out work — "implement this feature, then separately write tests, then separately update the docs" — can now be issued as a single instruction and the model will handle the decomposition. Whether it does so correctly depends on how well you've specified the task, but the capability is there in a way it wasn't in Composer 2.

How it was trained

The two training innovations are worth unpacking because they explain the behavioral improvements people are reporting.

Targeted RL with textual feedback

Standard RLHF applies a reward signal to the full trajectory. If the model calls a tool that doesn't exist, the penalty is diffused across the entire response. Targeted RL inserts a correction at the exact step: when the model attempts an unavailable tool, the training inserts a local hint — "Reminder: Available tools are…" — immediately after that decision point and continues training from there. The signal is surgical.
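
To make the "insert a correction at the exact step" idea concrete, here is a rough sketch of the data manipulation only. The `Step` shape and `insertTargetedHint` are hypothetical, the hint wording after the quoted "Reminder: Available tools are" prefix is an assumption, and truncating the trajectory at the corrected point is one possible reading of "continues training from there". The actual training loop is not described in enough detail to reproduce.

```typescript
// Hypothetical trajectory shape, for illustration only.
type Step =
  | { kind: "tool_call"; tool: string }
  | { kind: "message"; text: string }
  | { kind: "hint"; text: string };

// Insert a textual correction right after the first call to an unavailable tool.
// Returns the original trajectory unchanged if every tool call is valid.
function insertTargetedHint(trajectory: Step[], availableTools: string[]): Step[] {
  const badIndex = trajectory.findIndex(
    (s) => s.kind === "tool_call" && !availableTools.includes(s.tool),
  );
  if (badIndex === -1) return trajectory;

  const hint: Step = {
    kind: "hint",
    // Assumed wording: the post only quotes the start of the reminder.
    text: `Reminder: Available tools are ${availableTools.join(", ")}`,
  };
  // Keep everything up to and including the bad step, attach the hint there,
  // and drop later steps so a rollout could resume from the corrected point
  // (an assumption about what "continues from there" means).
  return [...trajectory.slice(0, badIndex + 1), hint];
}
```

The contrast with a trajectory-level reward is that the correction is attached to one index in the step list rather than to the conversation as a whole.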

This is why Composer 2.5 is noticeably better at staying in bounds on complex codebases. It's not that the model learned to follow rules in general — it learned to catch itself at the specific moment it was about to break them.

Feature deletion synthetic data

To generate realistic training tasks, Cursor built a pipeline that takes real open-source repos with test suites, removes specific features from the production code, and trains the model to reimplement them using the tests as the only specification. The ground truth is testable — the model either makes the tests pass or it doesn't.

Running this at 25x the scale of Composer 2's training set produces a model that has seen an enormous variety of realistic coding tasks with machine-verifiable success criteria. That's a different quality of training data than human-labeled examples.

What the feature-deletion training task looks like

// BEFORE (in training corpus): working implementation
function parseConfig(raw: string): Config {
  return JSON.parse(raw) as Config;
}

// AFTER deletion (what the model sees):
// (function body removed — tests still present)
function parseConfig(raw: string): Config {
  // TODO: implement
}

// Model must reimplement to pass:
// ✓ test: parseConfig('{"key":"val"}') returns {key:"val"}
// ✓ test: parseConfig('invalid') throws SyntaxError
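
The "tests as ground truth" idea amounts to a binary, machine-checkable reward. The sketch below assumes hypothetical `TestCase` and `reward` names and encodes the two tests listed above as predicates; how Cursor's pipeline actually scores candidates is not stated in the post.

```typescript
// Hypothetical shapes; the real pipeline's interfaces are not described in this post.
type Config = Record<string, unknown>;
type Impl = (raw: string) => Config;
type TestCase = { name: string; check: (impl: Impl) => boolean };

// The two tests from the example above, written as predicates over a candidate.
const tests: TestCase[] = [
  {
    name: "parses valid JSON",
    check: (impl) => impl('{"key":"val"}')["key"] === "val",
  },
  {
    name: "throws SyntaxError on invalid input",
    check: (impl) => {
      try {
        impl("invalid");
        return false; // no throw means the test fails
      } catch (err) {
        return err instanceof SyntaxError;
      }
    },
  },
];

// Binary reward: 1 only if every test passes.
// An unexpected crash inside a test counts as a failure.
function reward(impl: Impl): 0 | 1 {
  const allPass = tests.every((t) => {
    try {
      return t.check(impl);
    } catch {
      return false;
    }
  });
  return allPass ? 1 : 0;
}

// Usage: the original implementation earns the reward, the deleted stub does not.
const original: Impl = (raw) => JSON.parse(raw) as Config;
const deleted: Impl = () => {
  throw new Error("TODO: implement");
};
console.log(reward(original), reward(deleted)); // prints "1 0"
```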

Benchmarks and pricing

SWE-bench Multilingual is the most widely cited benchmark for agentic coding as of 2026 because it tests real GitHub issue resolution across multiple programming languages, not just Python. Composer 2.5 scores 79.8%. Claude Opus 4.7 scores 80.5%. The gap is 0.7 percentage points.

79.8% · SWE-bench Multilingual · Composer 2.5, vs 80.5% for Claude Opus 4.7 (0.7 pp gap)

The pricing delta is larger. Opus 4.7 costs roughly $15/M input and $75/M output (Anthropic API pricing). Composer 2.5 is $0.50/M input and $2.50/M output via the Cursor API, which makes the standard variant 30x cheaper on both input and output at those rates. The fast variant ($3.00/M input, $15.00/M output) is still 5x cheaper than Opus 4.7 at comparable latency.
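
The ratios follow directly from the per-million prices quoted in this section. The sketch below only restates that arithmetic; the workload of 2M input and 1M output tokens is a hypothetical example, and `cost` and `timesCheaper` are illustrative helper names.

```typescript
// Per-million-token prices as quoted in this post (USD).
type Price = { inputPerM: number; outputPerM: number };

const opus47: Price = { inputPerM: 15, outputPerM: 75 };
const composerStandard: Price = { inputPerM: 0.5, outputPerM: 2.5 };
const composerFast: Price = { inputPerM: 3, outputPerM: 15 };

// Cost in USD for a workload measured in millions of tokens.
function cost(p: Price, inputM: number, outputM: number): number {
  return p.inputPerM * inputM + p.outputPerM * outputM;
}

// How many times cheaper `cheap` is than `baseline` for the same workload.
function timesCheaper(
  baseline: Price,
  cheap: Price,
  inputM: number,
  outputM: number,
): number {
  return cost(baseline, inputM, outputM) / cost(cheap, inputM, outputM);
}

// Hypothetical workload: 2M input tokens, 1M output tokens.
console.log(cost(opus47, 2, 1)); // 105
console.log(cost(composerStandard, 2, 1)); // 3.5
console.log(cost(composerFast, 2, 1)); // 21
console.log(timesCheaper(opus47, composerStandard, 2, 1)); // 30
console.log(timesCheaper(opus47, composerFast, 2, 1)); // 5
```

Because each Composer price is the same fraction of the Opus price on input and on output, the ratio does not depend on the input/output mix of the workload.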

$0.50 · per million input tokens · standard variant

$2.50 · per million output tokens · standard variant

$3.00 / $15.00 · fast variant (in / out) · lower latency, still cheaper than Opus

What changes for your workflow

Three workflow changes are worth making today.

1. Give it compound tasks

The old pattern with Composer 2 was to break work into single-step prompts because the model struggled to track compound state. With 2.5, issuing a prompt like "implement the auth refresh logic, write unit tests, and update the README" is more likely to produce a coherent result because the Agent Swarm handles the decomposition. You'll still want to review each piece — parallel doesn't mean correct — but the scaffolding work is gone.

2. Use acceptance criteria in every prompt

Targeted RL trained the model to respond well to in-prompt constraints. Stating explicitly what done looks like (for example, "done means all existing tests still pass and the new feature has ≥2 test cases") gives the model the same kind of signal its training was built around. Compared with Composer 2, vague prompts yield vaguer results and precise prompts yield more precise ones.

Prompt structure that works well

Task: Add rate limiting to the /api/auth/token endpoint.

Acceptance criteria:
- Existing test suite passes (run: pnpm test)
- New tests cover: (a) requests within limit succeed,
  (b) requests over limit return 429 with Retry-After header
- No changes to other endpoints
- Implementation uses existing Redis client (src/lib/redis.ts)

Do not modify the database schema.
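
If you issue prompts like this often, the structure is easy to template. The helper below is a hypothetical convenience, not part of Cursor or any skills-hub package; the `TaskSpec` field names are assumptions chosen to mirror the prompt above.

```typescript
// Hypothetical helper: the field names are illustrative, not a Cursor API.
type TaskSpec = {
  task: string;
  acceptanceCriteria: string[];
  constraints?: string[];
};

// Render a task spec in the Task / Acceptance criteria / constraints layout.
function buildPrompt(spec: TaskSpec): string {
  // Refuse to build a prompt with no definition of done.
  if (spec.acceptanceCriteria.length === 0) {
    throw new Error("At least one acceptance criterion is required");
  }
  const lines = [
    `Task: ${spec.task}`,
    "",
    "Acceptance criteria:",
    ...spec.acceptanceCriteria.map((c) => `- ${c}`),
  ];
  if (spec.constraints && spec.constraints.length > 0) {
    lines.push("", ...spec.constraints);
  }
  return lines.join("\n");
}

// Usage: reproduces the shape of the rate-limiting prompt above.
const prompt = buildPrompt({
  task: "Add rate limiting to the /api/auth/token endpoint.",
  acceptanceCriteria: [
    "Existing test suite passes (run: pnpm test)",
    "No changes to other endpoints",
  ],
  constraints: ["Do not modify the database schema."],
});
console.log(prompt);
```

Throwing on an empty criteria list is a deliberate choice in this sketch: it makes the "acceptance criteria in every prompt" habit hard to skip.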

3. Stack SKILL.md files for sub-agent specialization

When Composer 2.5 decomposes a task into parallel sub-agents, each sub-agent benefits from having a narrow, well-defined scope. Loading a code-review skill from the skills-hub registry before asking for a review pass, or a unit-test skill before a test generation pass, narrows the sub-agent's behavior in exactly the way the model's training expects.

Terminal — install the parallel workflow stack

# install skills that compose well with Composer 2.5 parallel tasks
npx @skills-hub-ai/cli install unit-test
npx @skills-hub-ai/cli install code-review
npx @skills-hub-ai/cli install tech-debt

# or install the full ship-it composition (all three + orchestrator)
npx @skills-hub-ai/cli install ship-it

What's next

Cursor confirmed a collaboration with SpaceX AI on the next model, using 10x more total compute on Colossus 2's infrastructure. The timing is notable: Anthropic signed its own Colossus 1 deal the same week Composer 2.5 shipped, doubling Claude Code's rate limits in the process. Both of the major agentic coding platforms are now betting on compute scale as the path to the next capability jump.

For users, the near-term implication is that Composer 2.5 is effectively a base to optimize against, not a ceiling. The parallel architecture and targeted RL training are production-proven now. Whatever ships next will run the same patterns at larger scale.

"Composer 2.5 is better at sustained work on long-running tasks, follows complex instructions more reliably, and is more pleasant to collaborate with." (Cursor engineering blog)

If you're already on Cursor Pro, Composer 2.5 is available now with a double-usage launch promotion through the end of May. The fast variant is worth testing on latency-sensitive workflows — it's priced below what most teams were paying for frontier models six months ago.

For a broader view of how Composer 2.5 fits into the current AI coding landscape, see our Windsurf 2.0 deep dive and the three-way Cursor vs Windsurf vs Claude Code comparison.


Skills-Hub is the open registry for AI coding skills, with SKILL.md files synced daily from Anthropic, Google, Microsoft, and 90+ official sources. Free + MIT.
