Skip to main content

AI coding glossary

Prompt Injection

Also known as: prompt injection attack, indirect prompt injection

In one sentence

A class of attack where untrusted content hidden in an LLM's input (a webpage, a document, a tool result) overrides the system prompt and makes the model execute the attacker's instructions.

Full definition

Prompt injection is the LLM-era equivalent of SQL injection: untrusted text fed into the model's input window overrides its system instructions. Direct prompt injection happens when a user types adversarial input ('ignore previous instructions and…'). Indirect prompt injection, the more dangerous form in agentic systems, happens when an LLM agent reads a webpage, document, or tool response that contains hidden instructions and obeys them. Real 2026 examples: emails with hidden text that make Claude leak prior conversation, READMEs with malicious instructions that make coding agents exfiltrate secrets, and webpages that get an agent to run rm -rf via a 'helpful' shell suggestion. Mitigations: sanitize tool inputs, restrict tool scope per subagent, use approval gates on destructive operations, and treat all external content as adversarial.

On skills-hub.ai

Related terms