Toxic Flow Analysis for AI Assess

security governance agents ai-security threat-modeling prompt-injection mcp data-exfiltration tool-security least-privilege

May 2026

Overview

Toxic flow analysis is a threat-modeling technique for AI agents and tool-using LLM applications. It focuses on dangerous runtime paths where untrusted instructions can influence an AI system that also has access to sensitive data and a way to act externally, such as sending data, calling APIs, creating pull requests, browsing the web, or invoking MCP tools. Invariant Labs describes toxic flow analysis as a framework for constructing flow graphs of agent systems, modeling tool sequences with trust, sensitivity, and exfiltration-sink properties, then scoring flows that could lead to security violations at runtime (Invariant Labs: Toxic Flow Analysis).

The core risk maps closely to Simon Willison's lethal-trifecta framing: private data, exposure to untrusted content, and the ability to communicate externally. If an agent combines those three capabilities, malicious content can trick the system into reading private data and sending it to an attacker (Simon Willison: The lethal trifecta). OWASP similarly treats prompt injection as a top LLM application risk because direct or indirect prompts can lead to sensitive information disclosure, unauthorized tool access, arbitrary command execution, and manipulation of critical decisions (OWASP LLM01: Prompt Injection).

The reason to classify toxic flow analysis as Assess is that the underlying risk is real and recurring, but the practice is still becoming standardized. It should be evaluated for agentic systems, MCP deployments, RAG pipelines, browser agents, enterprise copilots, and workflow automation where private data and external actions meet untrusted input. It should not be treated as a standalone security product or a replacement for least privilege, deterministic authorization, sandboxing, monitoring, and human approval for high-risk actions.

Adoption Signals

Invariant Labs and Snyk introduced toxic flow analysis in July 2025 as a framework for identifying toxic flows in agentic systems and MCP servers, with an early preview available through MCP-scan (Invariant Labs: Toxic Flow Analysis, Snyk Labs: Toxic Flow Analysis).
The technique aligns with source-sink analysis used in agent security: OpenAI describes prompt-injection-resistant agent design in terms of an attacker-controlled source and a risky sink, such as transmitting information to a third party, following a link, or interacting with a tool (OpenAI: Designing agents to resist prompt injection).
OWASP's 2025 LLM guidance explicitly recommends least privilege, segregating external content, validating outputs, human approval for high-risk actions, and adversarial testing, all of which map naturally to toxic-flow modeling and test generation (OWASP LLM01: Prompt Injection).
MCP-specific attacks have made flow-level modeling more urgent. OWASP describes MCP tool poisoning as an indirect prompt-injection attack where external tool responses can cause an agent to call restricted tools, read sensitive files, or send data to an attacker-controlled endpoint (OWASP: MCP Tool Poisoning).
Microsoft's MCP guidance highlights indirect prompt injection, tool poisoning, hosted-server rug pulls, prompt shields, spotlighting, datamarking, and supply-chain controls, reinforcing the need to classify trust boundaries and tool response channels rather than reviewing prompts alone (Microsoft: Protecting against indirect prompt injection attacks in MCP).
OpenAI's product guidance emphasizes layered controls such as sandboxing, logged-out modes, link approvals, confirmations before consequential actions, and limiting agent access to only the sensitive data or credentials needed for the task (OpenAI: Understanding prompt injections).

Risks

The analysis can become stale quickly. Tool definitions, MCP servers, hosted tool metadata, permissions, prompts, and connected data sources can change after review; Microsoft specifically warns that hosted MCP tool definitions can be dynamically amended after a user previously approved them, creating a rug-pull risk (Microsoft: Protecting against indirect prompt injection attacks in MCP).
Prompt-level defenses are insufficient. OWASP states there are no foolproof methods to prevent prompt injection, and recommends layered mitigations such as least privilege, external-content segregation, output validation, user approval, and adversarial testing (OWASP LLM01: Prompt Injection).
Internal and external tools can collapse into one trust zone. OWASP's MCP tool-poisoning guidance warns that risk increases when external MCP servers and privileged internal tools share the same agent context, because an untrusted tool response can trigger trusted tools unless enforcement happens outside the model (OWASP: MCP Tool Poisoning).
False assurance is a major operational risk. A flow graph that lists forbidden data-action paths does not guarantee enforcement unless those paths are backed by server-side authorization, scoped credentials, network restrictions, structured tool outputs, logging, and review gates.
Coverage is hard in agentic systems. Invariant notes that any combination of available tools may be used at runtime, so teams need to reason about tool combinations and not just individual tools or static prompts (Invariant Labs: Toxic Flow Analysis).
Human approval can be bypassed if the approval surface is weak. Confirmation prompts need to show the actual action, recipient, destination, and data being transmitted outside the LLM context; otherwise users may approve an action without seeing the toxic flow it completes.

Pros & Cons

Advantages

Gives teams a concrete way to model dangerous agent paths that combine untrusted content, sensitive data, and external communication or privileged tool actions.
Extends threat modeling beyond prompt-level defenses by representing tools, data sources, trust boundaries, and exfiltration sinks as analyzable flows.
Fits emerging agent and MCP security practices, including least-privilege tools, isolated privileged contexts, structured tool outputs, runtime monitoring, and adversarial tests.

Disadvantages

The technique is still early and mostly appears through research previews, scanning tools, and design patterns rather than mature platform standards.
Flow models can miss real runtime behavior if tool metadata, MCP servers, permissions, prompts, or connected data sources change after review.
Toxic-flow findings are only useful when engineering teams can enforce controls at the tool, identity, network, and approval layers rather than relying on the model to follow policy text.

Recommendation

Assess toxic flow analysis for any AI system that can read private or enterprise data, process untrusted content, and use tools that communicate externally or mutate state. Start with high-risk workflows: MCP-enabled desktops, coding agents with repository and network access, RAG systems over confidential documents, browser agents, email or calendar assistants, customer-support agents, workflow automations, and agents that can call internal APIs. The output should be a small set of explicit forbidden flows, such as "untrusted web page -> private CRM record -> outbound email," plus the controls that make each path impossible or require audited approval.

Implement it as an engineering practice rather than a one-time scan. Inventory data sources, label content trust levels, classify tools by privilege and sink behavior, separate untrusted retrieval from privileged action contexts, and require structured tool outputs where possible. Enforce least privilege at the tool execution layer, use scoped service identities instead of user-wide credentials, block or review network egress, and keep privileged tools out of the same context as arbitrary external MCP responses.

Use toxic-flow scenarios as tests. For each critical workflow, create adversarial fixtures that simulate malicious emails, documents, tickets, web pages, repository issues, MCP tool responses, and RAG chunks. The expected result should be refusal, safe summarization, blocked egress, scoped data access, or a user confirmation that clearly displays the destination and data being shared. Move toward Adopt only when those controls are automated, monitored, and tied to security ownership rather than depending on prompt wording alone.