Fable 5 · Deep dive

Fable 5's 1M-Token Context: A Practical Guide to Whole-Codebase Analysis

Fable 5 ships with a 1-million-token context window — enough to load your entire codebase without summarization tricks. Here's what actually works in practice: sizing repos, caching strategy, recall tradeoffs, and a four-phase migration playbook.

1M tokens: load your entire codebase in one shot

By Skills-Hub Team · Anthropic ecosystem coverage · June 15, 2026 · 8 min read

Fable 5 · 1M Context · Codebase Migration

Every large-codebase task used to require a trade-off: load enough context to understand the system, or stay within the model's window and lose the big picture. Fable 5's 1-million-token context collapses that trade-off. Released June 9, 2026, it's the first production model where "load the whole repo" is a real, affordable strategy — not a benchmark flex.

In practice, though, one million tokens is not magic. There's a usable envelope, a recall curve, a cost structure, and a set of workflows where full-context wins decisively over retrieval. This guide covers all of it.

95% SWE-bench score: a generational leap from Opus 4.8's 88%

1M-token context window, with up to 128K output tokens per response

~830K-token practical usable envelope, since Claude Code reserves a buffer for auto-compaction

Why 1M tokens changes codebase analysis

The core insight is simple: when a model can see all of the relevant code simultaneously, it reasons about it differently. Cross-file dependencies, implicit invariants, naming inconsistencies that span ten modules — these are things that chunked retrieval breaks into fragments and reassembles badly. Full-context doesn't.

The real-world signal on this is strong. Stripe used Fable 5 to migrate a 50-million-line codebase in a single day — work that previously took two months with traditional tooling. That's not a benchmark; that's production throughput at a company that has seen every AI coding tool.

Sizing your codebase for the window

Before loading a repo, estimate its token footprint. The usable envelope in Claude Code is approximately 830K tokens, because Claude Code reserves ~170K tokens as an auto-compaction buffer that prevents mid-session truncation. If you exceed the envelope, Claude Code silently compacts older context.

A fast sizing command:

Terminal

# Count bytes in the source tree (excluding artifacts); divide by ~4 for tokens
# Piping through cat yields one grand total even when xargs splits a long file list
find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.py" \
  -o -name "*.go" -o -name "*.rs" -o -name "*.java" \) \
  ! -path "*/node_modules/*" ! -path "*/.git/*" ! -path "*/dist/*" \
  -print0 | xargs -0 cat | wc -c
# ~4 bytes per token on Fable 5 (vs ~3 bytes on Opus 4.8)

Rough rule of thumb: divide the byte count by 4 to get an approximate Fable 5 token count. If you land under 600K tokens, you're comfortably inside the usable window with room for a large system prompt and conversation history. Between 600K and 830K, you're in the working zone — use prompt caching (see below). Above 830K, trim before loading.
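The thresholds above fit in a small helper. This is a minimal sketch that assumes the article's ~4 bytes-per-token ratio and its 600K/830K cutoffs; `sizeZone` is a hypothetical name, and real token counts will vary with language, comments, and formatting.

```typescript
// Sizing zones from this guide: under 600K comfortable, 600K-830K working, above 830K trim.
type Zone = "comfortable" | "working" | "trim";

// Converts a byte count to an approximate token count (~4 bytes/token assumed)
// and reports which zone it falls into.
function sizeZone(bytes: number, bytesPerToken = 4): { tokens: number; zone: Zone } {
  const tokens = Math.round(bytes / bytesPerToken);
  if (tokens < 600_000) return { tokens, zone: "comfortable" };
  if (tokens <= 830_000) return { tokens, zone: "working" };
  return { tokens, zone: "trim" };
}

// Example: pass in the byte total printed by the find | wc -c command.
console.log(sizeZone(2_800_000)); // { tokens: 700000, zone: 'working' }
```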

What to trim first

Adding a .claudeignore (same syntax as .gitignore) strips junk before it hits the context:

.claudeignore

# Always exclude
node_modules/
dist/
build/
.git/
*.lock
*.snap
coverage/
__pycache__/
*.pyc

# Usually safe to exclude
*.min.js
*.min.css
vendor/
fixtures/large-*.json
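To see how much a set of exclusions saves before loading anything, you can apply the rules to a file list yourself. The sketch below implements a deliberately simplified subset of .gitignore semantics (comments, `dir/` rules, `*` wildcards, and patterns containing a slash anchored at the repo root; no `!` negation, `**`, `?`, or character classes), so treat it as an approximation of full .gitignore matching. `parseRules`, `isIgnored`, and `keptBytes` are hypothetical names.

```typescript
// Simplified .gitignore-style rule (subset only: no `!`, `**`, `?`, or `[...]`).
interface IgnoreRule {
  regex: RegExp;
  dirOnly: boolean;  // pattern ended with "/": matches directories only
  anchored: boolean; // pattern contained "/": matched against the path from the root
}

function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, then turn `*` into "any run of non-slash characters"
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&").replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

function parseRules(text: string): IgnoreRule[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const dirOnly = line.endsWith("/");
      const body = dirOnly ? line.slice(0, -1) : line;
      return {
        regex: globToRegExp(body.replace(/^\//, "")),
        dirOnly,
        anchored: body.includes("/"),
      };
    });
}

function isIgnored(relPath: string, rules: IgnoreRule[]): boolean {
  const segments = relPath.split("/");
  return rules.some((rule) => {
    // Directory rules may only match parent segments, never the file name itself
    const limit = rule.dirOnly ? segments.length - 1 : segments.length;
    if (rule.anchored) {
      // Try every root-relative prefix: "a", "a/b", "a/b/c.json", ...
      for (let i = 1; i <= limit; i++) {
        if (rule.regex.test(segments.slice(0, i).join("/"))) return true;
      }
      return false;
    }
    return segments.slice(0, limit).some((seg) => rule.regex.test(seg));
  });
}

// Total bytes that survive the rules; divide by ~4 for a Fable 5 token estimate
function keptBytes(files: { path: string; bytes: number }[], rules: IgnoreRule[]): number {
  return files
    .filter((f) => !isIgnored(f.path, rules))
    .reduce((sum, f) => sum + f.bytes, 0);
}
```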

Three workflows that work

1. Whole-repo architecture review

The canonical use case: load the entire source tree, ask Fable 5 to produce an architecture map, identify coupling hotspots, and flag design violations. This is where full-context wins decisively over RAG — the model needs to see all the modules simultaneously to reason about circular dependencies and abstraction leaks.

Terminal

# Load trimmed repo and request architecture analysis
claude --model claude-fable-5 \
  "Load all source files under src/ (exclude node_modules, dist).
   Produce: (1) module dependency graph in mermaid, (2) top 5 coupling
   hotspots with file paths and line counts, (3) any abstraction leaks
   where implementation details cross module boundaries."
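When the model reports circular dependencies, it is worth spot-checking them with a deterministic pass. A minimal sketch, assuming you have already extracted an import graph as a map from file path to the files it imports (the `ImportGraph` shape and the `findCycles` name are hypothetical): a depth-first search that records one cycle per back edge it meets. That is enough to confirm a reported cycle exists; it does not enumerate every elementary cycle.

```typescript
// Module import graph: file path -> files it imports (hypothetical shape)
type ImportGraph = Record<string, string[]>;

// Depth-first search; each back edge to a file still on the stack closes a cycle.
function findCycles(graph: ImportGraph): string[][] {
  const cycles: string[][] = [];
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): void => {
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const s = state.get(dep);
      if (s === "visiting") {
        // The cycle is the stack slice from `dep` to the current node, closed by `dep`
        cycles.push([...stack.slice(stack.indexOf(dep)), dep]);
      } else if (s === undefined) {
        visit(dep);
      }
    }
    stack.pop();
    state.set(node, "done");
  };

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) visit(node);
  }
  return cycles;
}
```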

2. Log archaeology for incident post-mortems

Despite the large window, 1M tokens holds only about 16,000–17,000 typical log lines. The winning pattern is filter-first: extract the incident time window and the suspect services, then load the filtered slice. This captures cross-service causal chains that chunked retrieval breaks.

Terminal

# Filter logs for the incident window before loading
grep -E "2026-06-14T14:[23][0-9]" /var/log/api/*.log \
  | grep -E "(ERROR|WARN|timeout|5[0-9]{2})" \
  > /tmp/incident-slice.log

claude --model claude-fable-5 \
  "Analyze /tmp/incident-slice.log. Identify the root cause, the
   propagation chain across services, and the first failure signal.
   Output a 5-bullet post-mortem and a fix recommendation."
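The 16,000–17,000-line figure is simple division, and redoing it for your own logs tells you how aggressive the filter needs to be. A minimal sketch, assuming lines stay close to a fixed average length; the 240-byte average used in the tests is an assumption that happens to reproduce the article's estimate at ~4 bytes per token, so measure your own slice (for example with `wc -c` and `wc -l`). Both function names are hypothetical.

```typescript
// Estimates how many log lines fit in a token budget, assuming every line
// is close to a fixed average length and ~4 bytes per token.
function logLinesThatFit(budgetTokens: number, avgBytesPerLine: number, bytesPerToken = 4): number {
  const tokensPerLine = avgBytesPerLine / bytesPerToken;
  return Math.floor(budgetTokens / tokensPerLine);
}

// Does a filtered slice (measured in bytes) fit under a token budget?
function sliceFits(sliceBytes: number, budgetTokens: number, bytesPerToken = 4): boolean {
  return sliceBytes / bytesPerToken <= budgetTokens;
}
```

With a 240-byte average, `logLinesThatFit(1_000_000, 240)` returns 16,666, and the ~830K practical envelope holds 13,833 lines.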

3. Multi-document synthesis

Loading a complete set of design documents — RFCs with their superseded predecessors, migration specs with their constraints — lets Fable 5 find contradictions and evolving assumptions that a human reading docs linearly would miss. Per Anthropic's guidance: skip RAG for document corpora under 200K tokens and use full-context loading with caching instead.
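One way to lay out such a corpus is the structure Anthropic's long-context prompting tips describe: each document wrapped in XML tags with its source, the documents placed first, and the question last. The sketch below builds that string; `SourceDoc` and `buildCorpusPrompt` are hypothetical names, the file names in the test are made up, and the tag names follow Anthropic's examples rather than anything Fable 5 specifically requires.

```typescript
interface SourceDoc {
  source: string;  // file name or URL, kept so the model can cite where a claim came from
  content: string; // full document text
}

// Wraps each document in XML tags, puts the whole corpus first,
// and appends the question at the end.
function buildCorpusPrompt(docs: SourceDoc[], question: string): string {
  const body = docs
    .map((d, i) =>
      [
        `  <document index="${i + 1}">`,
        `    <source>${d.source}</source>`,
        `    <document_content>`,
        d.content,
        `    </document_content>`,
        `  </document>`,
      ].join("\n"),
    )
    .join("\n");
  return `<documents>\n${body}\n</documents>\n\n${question}`;
}
```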

Cost reality and prompt caching

Fable 5 is priced at $10/M input tokens and $50/M output tokens. An uncached 800K-token context costs $8.00 per turn in input alone. At any meaningful frequency, that's prohibitive without caching.

70% input cost reduction with prompt caching

Cache the static codebase snapshot; each subsequent query pays a discounted cache-read rate on the snapshot and full price only for the new tokens.

The pattern for multi-question sessions: load and cache the codebase snapshot once, then run all queries against the cached prefix. Each additional query pays full price only for the new question tokens and any uncached context; the cached 800K prefix is billed at the much lower cache-read rate instead of the full input price.
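To estimate what caching saves for your own session shape, the arithmetic is short. A minimal sketch with pricing passed in as parameters: the $10/M base input price comes from above, while the 1.25× cache-write and 0.1× cache-read multipliers are assumptions borrowed from Anthropic's published prompt-caching rates for current Claude models (5-minute cache), so check Fable 5's actual pricing before relying on the result. `sessionInputCost` is a hypothetical name.

```typescript
interface CachePricing {
  inputPerMTok: number;    // base input price in $ per million tokens
  writeMultiplier: number; // cache-write price as a multiple of base input (assumed)
  readMultiplier: number;  // cache-read price as a multiple of base input (assumed)
}

// Input-side cost of `turns` questions that all share one large prefix.
// Assumes every turn lands while the cache is still warm; output tokens are ignored.
function sessionInputCost(
  prefixTokens: number,
  questionTokens: number,
  turns: number,
  p: CachePricing,
): { uncached: number; cached: number; savings: number } {
  const perToken = p.inputPerMTok / 1_000_000;
  const uncached = turns * (prefixTokens + questionTokens) * perToken;
  const cached =
    prefixTokens * perToken * p.writeMultiplier + // turn 1 writes the cache
    (turns - 1) * prefixTokens * perToken * p.readMultiplier + // later turns read it
    turns * questionTokens * perToken; // question tokens are always full price
  return { uncached, cached, savings: 1 - cached / uncached };
}

const assumedPricing: CachePricing = { inputPerMTok: 10, writeMultiplier: 1.25, readMultiplier: 0.1 };
```

Under these assumptions a single-turn session costs more with caching than without, because the cache write carries a premium; the savings come from the turns that follow.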

Caching via the API

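import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Trimmed repo snapshot as one string (hypothetical path; e.g. source files
// concatenated with path headers)
const repoContents = readFileSync("repo-snapshot.txt", "utf8");

// Identical system prefix on every call; the cache breakpoint sits on the repo block
const system = [
  { type: "text" as const, text: "You are an expert codebase analyst." },
  {
    type: "text" as const,
    text: repoContents,
    cache_control: { type: "ephemeral" as const }, // cache everything up to this block
  },
];

// First call: writes the cache (billed at the cache-write rate)
const first = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 8192,
  system,
  messages: [{ role: "user", content: "Summarize the module architecture." }],
});

// Later calls with the same prefix read from the cache while it is warm
// (the default ephemeral cache lasts about five minutes, refreshed on each hit)
const second = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 8192,
  system,
  messages: [{ role: "user", content: "List the top 5 coupling hotspots." }],
});

// Cache writes on the first call, cache reads on the second
console.log(first.usage.cache_creation_input_tokens, second.usage.cache_read_input_tokens);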

For non-interactive scheduled reviews (nightly architecture audits, weekly dependency scans), the Batch API offers an additional 50% discount. The tradeoff is up to 24h latency — acceptable for scheduled work, obviously not for interactive sessions.
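For a scheduled audit, most of the work is assembling the request list. A minimal sketch that builds entries in the `custom_id` + `params` shape the Message Batches API accepts, with the repo snapshot in an identical system prefix on every request; `buildAuditBatch` and the `audit-N` ID scheme are hypothetical, and the model name is the article's.

```typescript
// Request entry in the shape the Message Batches API accepts: a caller-chosen
// custom_id plus the same params a single Messages API call would take.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    system: { type: "text"; text: string; cache_control?: { type: "ephemeral" } }[];
    messages: { role: "user"; content: string }[];
  };
}

// One request per audit question, all sharing the same system prefix
// (instructions + repo snapshot) so they can reuse a cache entry where the API allows.
function buildAuditBatch(
  repoSnapshot: string,
  questions: string[],
  model = "claude-fable-5",
): BatchRequest[] {
  return questions.map<BatchRequest>((question, i) => ({
    custom_id: `audit-${i + 1}`, // hypothetical scheme; IDs must be unique within a batch
    params: {
      model,
      max_tokens: 8192,
      system: [
        { type: "text", text: "You are an expert codebase analyst." },
        { type: "text", text: repoSnapshot, cache_control: { type: "ephemeral" } },
      ],
      messages: [{ role: "user", content: question }],
    },
  }));
}
```

You would then submit the array with `client.messages.batches.create({ requests })` from `@anthropic-ai/sdk` and poll the batch until processing ends.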

Recall degradation and content placement

Full-context is not perfect recall. Independent testing shows approximately a 2% effectiveness loss per 100K tokens of context depth. At 800K tokens, that adds up to a loss of roughly 15–16%, which bites hardest for content buried deep in the middle of the window.

The mitigation is placement strategy: put the most important content either at the very start or at the very end of the context. Content in the first 50K tokens and the last 50K tokens has near-100% recall. Content in the 400K–600K range (the "lost in the middle" zone) is where recall softens.
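A simple way to apply that placement rule when assembling a prompt: split the high-priority material between the head and the tail, let bulk code fill the middle, and end with the question. This is one possible policy, sketched under the assumption that you tag sections yourself; `Section`, `arrangeForRecall`, and the `<section>` wrapper tag are hypothetical.

```typescript
interface Section {
  label: string;
  text: string;
  priority: "high" | "normal";
}

// Splits high-priority sections between the head and the tail of the prompt,
// puts everything else in the middle, and ends with the question.
function arrangeForRecall(sections: Section[], question: string): string {
  const high = sections.filter((s) => s.priority === "high");
  const normal = sections.filter((s) => s.priority === "normal");
  const headCount = Math.ceil(high.length / 2);
  const ordered = [...high.slice(0, headCount), ...normal, ...high.slice(headCount)];
  const render = (s: Section) => `<section label="${s.label}">\n${s.text}\n</section>`;
  return [...ordered.map(render), question].join("\n\n");
}
```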

A four-phase migration playbook

Codebase migrations are where the 1M context window earns its premium. Here's the playbook we used for a 400K-line TypeScript monorepo migration from CommonJS to ES Modules — a task that took four hours instead of the projected two weeks.

.claude/agents/migration-planner.md

---
name: migration-planner
description: Analyzes a codebase and produces a migration plan with ordered steps.
tools:
  - Read
  - Glob
  - Bash
skills:
  - codebase-migration
---

You are a migration analyst. Load the full source tree provided.
Produce a migration plan in three sections:
1. Impact map — which files change, grouped by type of change
2. Ordered execution steps — smallest safe batches, each independently testable
3. Rollback checkpoints — git branch points after each safe batch

Phase 1: Full-context impact analysis

Load the entire codebase into Fable 5. Ask it to identify every file that will change, the nature of each change, and cross-file dependencies that create ordering constraints. This produces an impact map that would take a human three days to assemble.

Phase 2: Ordered batch planning

From the impact map, have Fable 5 produce a sequenced execution plan: smallest independent batches first, each batch ending with a passing test suite. The key constraint is that each batch is independently testable; that is what prevents "big bang" migration failures.
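The ordering part of Phase 2 is essentially a layered topological sort. A minimal sketch, assuming the impact map has been turned into a map from each file to the files that must migrate before it (a hypothetical `Prereqs` format): Kahn's algorithm groups files into layers whose prerequisites all sit in earlier layers, and each layer is cut into batches of at most `maxBatchSize` files (an arbitrary default here), which is safe because files in one layer never depend on each other. A cycle in the constraints makes a safe order impossible, so the function throws.

```typescript
// file -> files that must be migrated before it (hypothetical format derived from the impact map)
type Prereqs = Record<string, string[]>;

// Kahn-style layering, then each layer is cut into batches of at most `maxBatchSize` files.
function planBatches(prereqs: Prereqs, maxBatchSize = 20): string[][] {
  const files = new Set<string>(Object.keys(prereqs));
  for (const deps of Object.values(prereqs)) for (const d of deps) files.add(d);

  const pending = new Map<string, number>();      // unmet prerequisites per file
  const dependents = new Map<string, string[]>(); // reverse edges
  for (const f of files) {
    pending.set(f, 0);
    dependents.set(f, []);
  }
  for (const [file, deps] of Object.entries(prereqs)) {
    for (const dep of deps) {
      pending.set(file, (pending.get(file) ?? 0) + 1);
      dependents.get(dep)?.push(file);
    }
  }

  const batches: string[][] = [];
  let layer = [...files].filter((f) => pending.get(f) === 0).sort();
  let placed = 0;
  while (layer.length > 0) {
    for (let i = 0; i < layer.length; i += maxBatchSize) {
      batches.push(layer.slice(i, i + maxBatchSize));
    }
    placed += layer.length;
    const next: string[] = [];
    for (const f of layer) {
      for (const d of dependents.get(f) ?? []) {
        const left = (pending.get(d) ?? 0) - 1;
        pending.set(d, left);
        if (left === 0) next.push(d); // all prerequisites now placed
      }
    }
    layer = next.sort();
  }
  if (placed !== files.size) {
    throw new Error("Ordering constraints contain a cycle; no safe batch order exists");
  }
  return batches;
}
```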

Phase 3: Execution with subagents

Run each batch as a Claude Code subagent loaded with only its batch's files. The parent orchestrator (also Fable 5) maintains the migration state and gates each subsequent batch on the prior batch's test suite passing. Subagents stay small; the orchestrator stays authoritative.
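The gating rule the orchestrator applies is small enough to write down. A minimal sketch in which `runBatch` (for example, dispatching a subagent with only that batch's files) and `runTests` (for example, running the project's test suite and reporting pass/fail) are hypothetical hooks you supply; the loop stops at the first failing batch, since later batches may build on it.

```typescript
interface BatchResult {
  batch: number;   // 1-based batch index
  passed: boolean; // did the test suite pass after this batch?
}

// Runs batches in order and gates each one on the test suite.
async function runGated(
  batches: string[][],
  runBatch: (files: string[]) => Promise<void>,
  runTests: () => Promise<boolean>,
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  let index = 0;
  for (const files of batches) {
    index += 1;
    await runBatch(files);
    const passed = await runTests();
    results.push({ batch: index, passed });
    // A passing batch is a natural rollback checkpoint (e.g. a git branch point).
    if (!passed) break; // later batches may build on this one, so stop here
  }
  return results;
}
```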

Phase 4: Cross-codebase validation

After all batches complete, reload the entire migrated codebase into Fable 5 for a final consistency check. Ask it to find any files missed by the batch plan and any inconsistencies introduced across batch boundaries. This full-context pass catches the gaps that batch execution misses.
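Two parts of that final check can also be done mechanically, as a cross-check on the model's answer: diffing the impact map against what the batches actually covered, and scanning for CommonJS syntax that survived the ESM migration. A minimal sketch; the regex is a heuristic (it will flag `require(` inside strings or comments and miss dynamic patterns), and both function names are hypothetical.

```typescript
// Impact-map files that no executed batch covered
function findMissedFiles(impactMap: string[], executedBatches: string[][]): string[] {
  const covered = new Set<string>();
  for (const batch of executedBatches) for (const file of batch) covered.add(file);
  return impactMap.filter((file) => !covered.has(file));
}

// Heuristic: require(...), module.exports, or exports.name = ... (a regex, not a parser)
const CJS_PATTERN = /\brequire\s*\(|\bmodule\.exports\b|\bexports\.\w+\s*=/;

// Lines that still look like CommonJS after the migration
function findCjsRemnants(files: Record<string, string>): { file: string; line: number }[] {
  const hits: { file: string; line: number }[] = [];
  for (const [file, text] of Object.entries(files)) {
    text.split("\n").forEach((source, i) => {
      if (CJS_PATTERN.test(source)) hits.push({ file, line: i + 1 });
    });
  }
  return hits;
}
```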

"We loaded the entire module graph into Fable 5 and asked for the migration plan. It came back in four minutes with a 2,000-step sequenced plan. We ran it. One day later the migration was done." (Stripe engineering, June 2026)

When retrieval still beats full-context

Full-context isn't always the right tool. Retrieval wins when:

High-frequency lookup over large or fast-changing corpora. If you're running 50+ queries per hour against a 2M-line monorepo that changes daily, RAG with contextual retrieval is cheaper and more current.
Corpora that exceed the practical envelope. Repositories above 830K tokens need trimming or chunking regardless. For the largest codebases, a hybrid approach works: keep lightweight identifiers (function signatures, file paths) in full context and pull full content on demand via subagents.
Real-time interactive queries. The latency of processing 800K tokens is meaningful. For sub-second interactive autocomplete, keep the window small.
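For the hybrid approach described above, the lightweight layer can be as simple as one line per exported declaration. A minimal sketch for TypeScript sources, assuming a regex heuristic is good enough for an index (it only sees single-line declaration heads and drops return types); `buildSignatureIndex` is a hypothetical name, and a sturdier version would use the TypeScript compiler API.

```typescript
// Matches the head of a single-line exported declaration:
// functions (with their parameter list), classes, interfaces, and type aliases.
const EXPORT_DECL =
  /^export\s+(?:default\s+)?(?:async\s+)?(?:function\s+\w+\s*\([^)]*\)|(?:class|interface|type)\s+\w+)/;

// Builds "path: declaration" lines to keep in context as a lightweight index.
function buildSignatureIndex(files: Record<string, string>): string[] {
  const index: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    for (const line of text.split("\n")) {
      const match = EXPORT_DECL.exec(line.trim());
      if (match) index.push(`${path}: ${match[0]}`);
    }
  }
  return index;
}
```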

The practical heuristic: use full-context loading for any analysis task where you need the model to reason across the whole codebase simultaneously. Use retrieval for lookup tasks where the answer lives in a predictable subset of the code.

Fable 5 and the analysis skills catalog on skills-hub.ai work together: install the codebase-migration skill, which gives Claude Code a phased migration protocol that uses the 1M context window at exactly the right moments. Browse the full analysis category at /browse?category=analysis.

Terminal

# Install the codebase migration skill
npx @skills-hub-ai/cli install codebase-migration

# Run the skill against your repo
claude "Analyze this repo for migration to ESM. Use /codebase-migration."

Written by

Skills-Hub Team

Anthropic ecosystem coverage

Skills-Hub is the open registry for AI coding skills, with SKILL.md files synced daily from Anthropic, Google, Microsoft, and 90+ official sources. Free + MIT.
