Prompt Injection Defenses Adotta

security agents governance llm-security owasp prompt-injection indirect-prompt-injection rag-security tool-security

Mai 2026

Overview

Prompt injection is a primary risk for LLM applications because user input, retrieved documents, web pages, emails, images, tool outputs, and memory can all contain instructions that compete with the developer’s intended policy. OWASP distinguishes direct prompt injection, where the user input alters model behavior, from indirect prompt injection, where external content such as websites or files changes model behavior when interpreted by the model (OWASP LLM01).

Defenses must be layered because OWASP explicitly notes that foolproof prevention is unclear given how generative models work (OWASP LLM01). Strong systems separate instructions from data, treat remote content as untrusted, constrain tools with least privilege, validate outputs, and require human approval for high-risk actions.

This belongs in Adopt for any production LLM, RAG, or agent system. Prompt-only defenses are insufficient; the security boundary must include retrieval, tool execution, identity, authorization, logging, and downstream output handling.

Adoption Signals

OWASP lists prompt injection as LLM01 in the 2025 Top 10 for LLM Applications, with impacts including sensitive information disclosure, unauthorized function access, arbitrary command execution in connected systems, and manipulation of decision-making processes (OWASP LLM01).
OWASP’s cheat sheet documents direct, remote/indirect, encoded, typoglycemia, HTML/Markdown, multimodal, RAG poisoning, and agent-specific attacks, reflecting the breadth of attack surfaces in modern LLM applications (OWASP Cheat Sheet Series).
Recommended mitigations now include input validation, structured prompts with instruction/data separation, output monitoring, human-in-the-loop controls, remote-content sanitization, model-based guardrails, and least-privilege tool access (OWASP Cheat Sheet Series).
OWASP recommends segregating and identifying external content so untrusted text is clearly separated from privileged instructions (OWASP LLM01).
OWASP recommends adversarial testing and attack simulations that treat the model as an untrusted user when testing trust boundaries and access controls (OWASP LLM01).

Risks

No single filter is enough. Attackers can use obfuscation, encoding, hidden markup, multi-turn setup, tool-output poisoning, and RAG poisoning to bypass simple keyword checks (OWASP Cheat Sheet Series).

Agents increase blast radius. If the model can call tools, write files, send messages, query private systems, or persist memory, a successful injection can become a real action rather than a bad answer.

Guardrails can be attacked too. OWASP notes that guardrail models are themselves susceptible to prompt injection, so they should be one layer in a defense-in-depth design rather than the only control (OWASP Cheat Sheet Series).

Overblocking is a product risk. Strict filtering can break legitimate workflows, so teams need task-specific risk scoring, user experience fallbacks, escalation paths, and continuous evals for both security and usefulness.

Pros & Cons

Advantages

Reduces risk from malicious instructions in retrieved content, tool outputs, and user input.
Encourages layered controls such as isolation, allowlists, permissions, and output checks.
Improves confidence for agents that can access sensitive systems or perform actions.

Disadvantages

No single defense fully solves prompt injection across all contexts.
Overly strict filters can block legitimate workflows or reduce answer quality.
Controls must evolve as attackers target tools, memory, and context pipelines.

Recommendation

Adopt defense in depth for every production LLM workflow: instruction/data separation, remote-content quarantine, least-privilege tools, scoped credentials, parameter validation, output validation, action allowlists, rate limits, audit logs, and human approval for high-impact actions. For agentic systems, validate every tool call against the original user intent and current permissions before execution.

Treat prompt injection as an application-security problem, not a better-prompt problem. Include adversarial evals, red-team tests, incident runbooks, and regression tests for known injection patterns whenever prompts, retrievers, tools, models, or memory behavior change.