Skip to main content

AI coding glossary

Prompt Cache

Also known as: llm prompt caching, context caching

In one sentence

A provider-side cache that reuses the model's processing of a repeated prompt prefix, cutting cost ~90% and first-token latency ~80% when the prefix is identical between calls.

Full definition

Prompt caching is the provider-side optimization that reuses the model's processing of a repeated prompt prefix. When you make multiple calls with the same long system prompt + same long context (e.g., the same codebase + the same skill instructions across an agent's turn loop), the provider caches the prefix's intermediate state. Repeat calls hit the cache and pay roughly 10% of the prefix cost and start streaming roughly 80% faster. Anthropic launched prompt caching in 2024; OpenAI and Google added it in 2025. In 2026 it's the largest single cost optimization for production AI agents, Claude Code's sub-agents leverage it heavily, as does Cursor's Composer. Effective with consistent prefix structure (system prompt + skill body + workspace context); breaks down when prompts vary.

On skills-hub.ai

Related terms