Multi-Agent Security: Why One Compromised Agent Can Take Down the Whole System - By Sourav Mishra (@souravvmishra)

In multi-agent systems, Agent A's output becomes Agent B's input with no verification. Cascade risk and how to defend.


In most multi-agent setups, Agent A's output is piped straight into Agent B with no check that it's allowed, on-topic, or safe. One compromised or manipulated agent can therefore steer the rest, and the whole pipeline can end up exfiltrating data or running the wrong tools. Research in 2026 showed that systems like CrewAI could be pushed into data exfiltration in a majority of tests when a single agent was compromised; one bad agent was enough. In this post I explain cascade risk, why frameworks don't fix it by default, and what I do to defend: schema checks, allowlists, gatekeepers, and knowing when not to use multiple agents at all.

Why Multi-Agent Is Riskier Than Single-Agent

Single-agent has one decision-maker. You control the prompt, the tools, and the stop condition. If something goes wrong, the blast radius is that agent's scope. Multi-agent multiplies the attack surface. Agent A's output becomes Agent B's input. If A is compromised or manipulated, it can feed B malicious instructions, out-of-scope data, or prompts that cause B to run the wrong tools. So one bad agent can take down the whole pipeline. That's cascade risk.

Frameworks make it easy to pass one agent's result to the next. The docs rarely say "validate and sanitize between agents." Default is "trust the previous agent." In production that's unsafe. So you have to add verification yourself: schema checks, allowlists for tool calls or handoff payloads, or a gatekeeper that validates handoffs before the next agent runs.

What a Cascade Looks Like

A typical cascade: Agent A (e.g. "researcher") fetches or summarizes data and passes it to Agent B (e.g. "writer" or "executor"). If A is manipulated into exfiltrating data, it can embed that data in the "summary" it sends to B, and B might forward it to an external tool or API. If A is compromised into suggesting harmful actions, it can output instructions that B then executes. Either way, the compromise propagates. Real 2026 incidents (EchoLeak, Drift, Kiro, Uncrew) are covered in my security fact-check. The pattern is the same: one weak link, broad access or unverified handoffs, mass impact.

So the fix is to break the chain: don't pass raw text from A to B. Validate structure and content; reject out-of-scope payloads; give each agent the least privilege it needs.

What I Do: Verify Handoffs

I add explicit verification between agents. Options:

Schema checks. Agent A's output must match a defined schema (e.g. JSON with allowed fields and types). If it doesn't, reject it and log. That blocks malformed or unexpected payloads from propagating.
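A minimal stdlib-only sketch of what such a check can look like. The field names in HANDOFF_SCHEMA are hypothetical, and a real system might use Pydantic instead; the point is that anything outside the contract is rejected before the next agent sees it:

```python
import json

# Hypothetical handoff contract: field name -> required type.
HANDOFF_SCHEMA = {"topic": str, "summary": str, "sources": list}

def validate_handoff(raw: str) -> dict:
    """Parse Agent A's raw output and enforce the schema before Agent B
    ever sees it. Raises ValueError on any deviation, so the caller can
    reject the handoff and log the failure."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"handoff is not valid JSON: {exc}")
    if not isinstance(payload, dict):
        raise ValueError("handoff must be a JSON object")
    extra = set(payload) - set(HANDOFF_SCHEMA)
    if extra:
        # Unexpected fields are a classic smuggling channel; fail closed.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    for field, required_type in HANDOFF_SCHEMA.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        if not isinstance(payload[field], required_type):
            raise ValueError(f"field {field!r} must be {required_type.__name__}")
    return payload
```

Failing closed matters here: an extra field is treated as an attack, not an inconvenience.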

Allowlists. Only certain tool names, actions, or topics are allowed for Agent B. If A's output suggests something off the allowlist, B doesn't run it (or a gatekeeper strips it). Reduces "agent A told B to do X" when X is out of scope.
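A sketch of the allowlist idea, with hypothetical tool names. Requested tool calls are split into allowed and rejected sets rather than trusted wholesale; rejected calls get logged and dropped instead of executed:

```python
# Hypothetical allowlist for Agent B: the only tools it may invoke,
# no matter what Agent A's output suggests.
ALLOWED_TOOLS = {"search_docs", "summarize"}

def filter_tool_calls(requested: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split requested tool calls into (allowed, rejected) based on the
    allowlist. Only the allowed list is ever executed; the rejected list
    is logged for auditing."""
    allowed, rejected = [], []
    for call in requested:
        if call.get("tool") in ALLOWED_TOOLS:
            allowed.append(call)
        else:
            rejected.append(call)
    return allowed, rejected
```

The allowlist belongs to B (or the gatekeeper), not to A, so a compromised A can't widen it.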

Gatekeeper. A separate component (could be a small service or a validated step) inspects A's output before B sees it. It validates structure, checks against policy, and only forwards safe content. B only ever receives sanitized input.
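A toy gatekeeper step, assuming a dict payload. The regex heuristic below is illustrative only; real policy checks would be stricter (and a naive pattern match is easy to evade), but the shape is the point: strip unknown fields, run policy, and only forward what passes:

```python
import re

# Illustrative injection heuristic; a real policy engine would do more.
SUSPICIOUS = re.compile(r"(ignore previous|system prompt|BEGIN PRIVATE KEY)", re.I)

def gatekeep(payload: dict, allowed_fields: set[str]) -> dict:
    """Gatekeeper between agents: drop fields outside the contract,
    refuse payloads whose text trips the policy check, and return the
    sanitized payload that the next agent is allowed to see."""
    clean = {k: v for k, v in payload.items() if k in allowed_fields}
    for value in clean.values():
        if isinstance(value, str) and SUSPICIOUS.search(value):
            raise ValueError("handoff failed policy check")
    return clean
```

Because the gatekeeper sits outside both agents, a compromised A can't talk it out of the policy the way it might talk B out of it.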

In all cases I don't pass raw, unvalidated text from A to B. I also use least privilege per agent: B doesn't get credentials or permissions that were intended for A. And I log tool calls and handoffs so I can audit what happened if something goes wrong.
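The audit part can be as simple as an append-only record of every handoff. A minimal sketch (an in-memory list standing in for whatever append-only store you actually use):

```python
import json
import time

# In production this would be an append-only log store, not a list.
AUDIT_LOG: list[dict] = []

def log_handoff(source: str, target: str, payload: dict) -> None:
    """Record every inter-agent handoff so that, after an incident, you
    can reconstruct exactly what A told B and when."""
    AUDIT_LOG.append({
        "ts": time.time(),
        "from": source,
        "to": target,
        "payload": json.dumps(payload, sort_keys=True),
    })
```

Serializing the payload at log time matters: you want the bytes B actually received, not a mutable reference that changes later.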

When Not to Use Multiple Agents

If I don't need multiple agents, I don't use them. A single agent with good tools is easier to secure and reason about. I only add a second (or third) agent when the task genuinely benefits from different roles or models—e.g. one agent for research, one for synthesis, with a clear handoff contract. And when I do, I add verification at every handoff. For the single-agent pattern with tools and step limits, see building an agentic chatbot. For the four production guardrails (step limit, least privilege, human-in-the-loop, verify handoffs), see production-ready agents.

Key Takeaways

  • Cascade risk: In multi-agent systems, Agent A's output becomes Agent B's input. One compromised or manipulated agent can steer the rest. Default in frameworks is "trust the previous agent"—unsafe in production.
  • Defense: Schema checks, allowlists, or a gatekeeper at handoffs. Don't pass raw text from A to B. Least privilege per agent; log tool calls and handoffs.
  • When to use multiple agents: Only when the task really needs different roles or models. Otherwise one agent with the right tools is simpler and safer. See the agentic chatbot and production-ready agents posts.

Written by Sourav Mishra. Full Stack Engineer, Next.js and AI.

Frequently Asked Questions

Q: Why are multi-agent systems riskier? Agents trust each other's output by default. One compromised agent can make the next run harmful or unintended actions. Cascade = one bad output propagates through the pipeline.

Q: What's a cascade? When compromise or bad output from one agent propagates to the next—whole pipeline affected. I prevent it by validating handoffs (schema, allowlist, or gatekeeper).

Q: Do CrewAI and LangGraph require verification between agents? No. They make it easy to pass output; verification is on you. I add schema checks or gatekeepers in production. See production-ready agents.

Q: Should I use multiple agents? Only when the task really needs different roles or models. Otherwise one agent with the right tools is simpler and safer. Building an agentic chatbot has the single-agent pattern.
