AI coding glossary
Prompt Cache
Also known as: LLM prompt caching, context caching
In one sentence
A provider-side cache that reuses the model's processing of a repeated prompt prefix, cutting the cost of the cached portion by roughly 90% and first-token latency by up to roughly 80% when the prefix is identical between calls.
Full definition
Prompt caching is a provider-side optimization that reuses the model's processing of a repeated prompt prefix. When multiple calls share the same long system prompt and the same long context (e.g., the same codebase and the same skill instructions across an agent's turn loop), the provider stores the prefix's intermediate state instead of recomputing it. Repeat calls that hit the cache pay roughly 10% of the normal input price for the cached prefix and can start streaming up to roughly 80% faster. Exact discounts, minimum cacheable lengths, and cache lifetimes vary by provider and model.
Google, Anthropic, and OpenAI all shipped some form of prompt or context caching during 2024. As of 2026 it is often the largest single cost optimization for production AI agents: Claude Code's sub-agents rely on it heavily, as does Cursor's Composer.
Caching works best when the prefix structure is consistent (system prompt, then skill body, then workspace context) and only the tail of the prompt changes. Matching is done on the exact prefix, so any change early in the prompt, such as a timestamp or reordered context, invalidates everything that follows it.
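To make the prefix rule concrete, here is a minimal Python sketch that models a prompt as an ordered list of segments and estimates how much of a second call would be billed at the cached rate. It is a toy model, not any provider's API: the names `cached_prefix_tokens` and `billed_tokens`, the token counts, and the segment granularity are all hypothetical, and the 10% cached rate is simply the rough figure quoted above. It ignores real-world details such as cache-write surcharges, minimum cacheable lengths, and cache expiry.

```python
# Toy model of prefix caching. All names and numbers here are illustrative
# assumptions, not a real provider's API or pricing.

Segment = tuple[str, int]  # (segment text, made-up token count)


def cached_prefix_tokens(previous: list[Segment], current: list[Segment]) -> int:
    """Tokens in the leading run of segments identical to the previous call."""
    tokens = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first difference ends the reusable prefix
        tokens += new[1]
    return tokens


def billed_tokens(previous: list[Segment], current: list[Segment],
                  cached_rate: float = 0.10) -> float:
    """Input cost of `current`, expressed in full-price token equivalents.

    `cached_rate` is the assumed fraction of the normal price paid for
    cached tokens (roughly 10%, per the definition above).
    """
    total = sum(count for _, count in current)
    cached = cached_prefix_tokens(previous, current)
    return cached * cached_rate + (total - cached)


stable = [
    ("system prompt", 2_000),
    ("skill body", 3_000),
    ("workspace context", 45_000),
]

# Stable content first, per-turn content last: the 50,000-token prefix is reused.
good_turn_1 = stable + [("user: fix the failing test", 200)]
good_turn_2 = stable + [("user: now add a changelog entry", 300)]

# A value that changes every call, placed at the front, makes everything miss.
bad_turn_1 = [("time: 12:01", 10)] + stable + [("user: fix the failing test", 200)]
bad_turn_2 = [("time: 12:02", 10)] + stable + [("user: now add a changelog entry", 300)]

print(cached_prefix_tokens(good_turn_1, good_turn_2))  # 50000
print(billed_tokens(good_turn_1, good_turn_2))
print(cached_prefix_tokens(bad_turn_1, bad_turn_2))    # 0
print(billed_tokens(bad_turn_1, bad_turn_2))
```

Under these assumptions the well-ordered second turn is billed at about 5,300 token equivalents instead of 50,300, while the timestamp-first layout is billed in full at 50,310 because the very first segment differs. The practical takeaway is the ordering: keep anything that changes per call at the end of the prompt.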