How do I stop tool poisoning in MCP?

Tool poisoning hides instructions inside the tool metadata that the model reads but humans rarely review. Defend against it by treating every tool description as untrusted input, pinning tool definitions to a reviewed version, diffing them on every update, and never auto-trusting tools added at runtime. Keep a human in the loop before a newly discovered tool can act.

MCP Server Security Checklist 2026

Q: What is an MCP server and why is it a security risk?

An MCP (Model Context Protocol) server exposes tools an AI agent can call, like reading a database, sending an email, or executing code. It is a risk because the agent will follow instructions it reads, and a malicious or misconfigured server can hand the agent dangerous tools, poisoned descriptions, or overbroad access. Researchers this year found thousands of MCP servers reachable on the public internet, many with no authentication.

Q: Do MCP servers need authentication?

Yes. Any MCP server that exposes a real tool should require authentication and should not be reachable from the open internet without it. Put it behind an identity layer, scope each client to only the tools it needs, and rotate credentials. An unauthenticated server with write access is one prompt injection away from being abused.

Why MCP became an attack surface this year

The Model Context Protocol, or MCP, is a standard way to give an AI agent a set of tools. A tool can be anything: read a row from a database, look up a customer, send an email, run a shell command, move money. The agent reads a list of available tools, decides which to call, and the MCP server runs it. That single idea is why agents went from chatbots to systems that do real work this year.

It is also why agents became a real target. The moment an agent can call a tool, two things are true at once. The agent will do what it is told, and it cannot reliably tell the difference between an instruction from you and an instruction smuggled into the data it is reading. Connect that to a tool that writes, deletes, or pays, and a single poisoned web page or email can turn your helpful assistant into someone else's.

This is not theoretical anymore. Through the first half of this year, security researchers scanned the public internet and reported thousands of MCP servers reachable directly, with a large share requiring no authentication at all. Separate research into agentic systems found that prompt injection, instructions hidden in data the agent reads, remained the single most common cause of real security failures in production. Both findings point at the same gap: teams shipped the tool layer faster than they secured it.

You do not fix this with one setting. You fix it by hardening the boundary between the agent and the tools it can reach. Below is the checklist we walk through before connecting any agent to an MCP server. It applies whether you wrote the server yourself or pulled it off the shelf.

1. Put authentication in front of every server

Start with the most basic failure and the most common one. An MCP server that exposes a real tool should never be reachable without authentication, and it should not sit on the open internet at all unless it has to. Treat it like a database, not like a public web page.

Concretely: every client that connects should present a credential, that credential should map to a specific identity, and each identity should be limited to the tools it actually needs. If your agent only reads support tickets, its credential should not be able to call the refund tool. Rotate these credentials on a schedule and revoke them the moment a client is decommissioned. An unauthenticated server with write access is one bad prompt away from being abused by anyone who finds it.

2. Treat tool descriptions as untrusted input

This is the attack that catches teams off guard, because it hides in a place humans almost never look. Every MCP tool ships with a description: a short block of text that tells the model what the tool does and how to call it. The model reads that text and follows it. Tool poisoning is the practice of hiding instructions inside that description, instructions the model obeys but a reviewer skimming a UI never sees.

A poisoned tool can tell the model to silently copy a secret into one of its arguments, to call a second tool first, or to ignore an earlier safety instruction. Because the metadata is machine-facing, it is the highest-leverage place to attack an agent and the lowest-visibility place to defend.

Defend it the way you would defend any input you do not control:

Pin tool definitions to a reviewed version. Do not let a server quietly change what a tool claims to do.
Diff every change to a tool description before it reaches production, the same way you review a code change.
Never auto-trust runtime tools. A tool that appears after the session starts should not be callable until a human has looked at it.
Read the full metadata, not the rendered label. The dangerous part is usually the part the UI hides.

3. Scope every tool to least privilege

Overbroad permissions are the quiet killer. A tool that was built to read one table ends up with credentials for the whole database. An email tool can reach any address instead of a fixed list. When something goes wrong, the blast radius is whatever that tool could touch, not whatever the task needed.

For each tool, write down the smallest set of actions it needs and cut everything else. A lookup tool gets read-only access to one resource. A payment tool gets a per-call cap and an allowlist of destinations. A code tool runs as a user that owns nothing important. The goal is simple: if an attacker takes control of one tool, they should inherit almost nothing.

4. Vet and pin third-party servers

Pulling a community MCP server into your stack is a dependency decision, and you should treat it like one. You would not run an unreviewed package with production credentials, and a third-party tool server is the same risk wearing a friendlier name. The difference is that this dependency can read your data and act on your behalf.

Before a third-party server touches anything real:

Pin a specific version. Floating to latest means trusting a future change you have not read.
Read the tool definitions yourself. If you cannot see what the tools do, you cannot trust them.
Give it no standing credentials. Hand it short-lived, scoped access at call time, not a key that lives forever.
Watch what it sends. A tool server that phones home to an address you do not recognize is a finding, not a feature.

5. Sandbox tool execution

Some tools run code, shell commands, or queries. Those should never run with direct access to your host, your secrets store, or your internal network. Run them in a sandbox: a container or isolated environment with a tight allowlist of what it can read, write, and reach.

The point of the sandbox is to assume the tool will be misused and make that survivable. If a code tool gets a malicious instruction, the worst case should be a wrecked throwaway container, not a path into your production systems. Rebuild the environment between runs so nothing an attacker leaves behind carries into the next call.

6. Control what the agent can reach

Two boundaries matter here: where the agent can send data, and which secrets it can see.

On the network side, default to deny. An agent and its tools should only reach the specific endpoints the task needs. This is your backstop against exfiltration: even if a prompt injection convinces the agent to leak data, it has nowhere to send it. On the secrets side, the agent should never receive a raw API key or private key in its context. Keep secrets in the tool layer, inject them at call time, and make sure they never appear in a prompt, a log, or a tool argument the model can echo back.

7. Log every tool call

You cannot defend what you cannot see. Log every tool call the agent makes: which tool, which arguments, which identity, what came back, and when. Keep those logs somewhere the agent cannot edit.

Good logging does two jobs. It lets you catch a compromise while it is happening, by alerting on unusual patterns like a sudden burst of writes or a tool being called far more than normal. And it lets you reconstruct exactly what happened after the fact, which is the difference between a one-line incident note and a week of guessing. If an agent ever does something it should not, the tool log is your single source of truth.

8. Separate instructions from data

The root cause behind most agent compromises is that the model treats everything in its context as equally authoritative. Your system instructions, the user's request, and the contents of a web page the agent just fetched all arrive as text, and the model has no built-in sense of which one is allowed to give orders.

You cannot make that separation perfect, but you can make it much stronger. Keep your trusted instructions in a channel the model is told to prioritize. Wrap untrusted content, fetched pages, emails, documents, and tool outputs, in clear markers that label it as data to consider, not commands to follow. And design the system so that no amount of clever text inside that data can grant new permissions. The permissions live in the tool layer, where text cannot rewrite them.

9. Put a human gate on irreversible actions

Not every action deserves the same trust. Reading a record is cheap to get wrong. Sending money, deleting data, emailing a customer, or signing a transaction is not. For that second category, the agent should propose and a human should approve.

A good gate is specific: it shows the human exactly what will happen, the tool, the arguments, the effect, in plain language, and requires a deliberate yes. It is the cheapest insurance you can buy, because it converts a silent automated mistake into a question someone can catch. As agents get more autonomous, the discipline is knowing which actions you are willing to let run unattended and which you are not.

The checklist, in order

Here is the order we work through before connecting an agent at Shazra Labs to any MCP server:

Authentication on every server. No tool reachable without an identity.
Tool descriptions reviewed and pinned. No runtime tool trusted automatically.
Every tool scoped to least privilege. Read-only stays read-only.
Third-party servers version-pinned, read, and given no standing credentials.
Code and command tools sandboxed, rebuilt between runs.
Network egress default-deny. Secrets kept out of the model's context.
Every tool call logged to a place the agent cannot edit.
Untrusted data clearly separated from trusted instructions.
A human gate on every irreversible action.

None of these steps is exotic. They are the same instincts good engineers already use for any system that touches money or data, applied to a layer that got built in a hurry. The teams that get burned this year are not the ones that hit an unknowable zero-day. They are the ones that shipped an agent with a tool that could do too much, reachable by too many, watched by no one.

Where this connects to the rest

MCP security is the tool-layer half of a bigger picture. The agent-side practices, least privilege at the model level, instruction and data separation, red teaming for indirect injection, sit alongside it. If you want that side, we wrote it up in the AI agent security checklist. And if your agent holds a wallet, the on-chain controls matter even more, which we covered in giving an agent a wallet without getting drained.

Put together, the rule is the same at every layer. Assume the agent will be tricked, and make sure that when it is, the damage it can do is small, visible, and reversible.

FAQ

What is an MCP server and why is it a security risk?
It is a server that exposes tools an AI agent can call, like reading a database or sending an email. It is a risk because the agent follows instructions it reads, so a misconfigured or malicious server can hand it dangerous tools, poisoned descriptions, or too much access. Researchers this year found thousands of MCP servers exposed online, many with no authentication.

How do I stop tool poisoning?
Tool poisoning hides instructions in the tool metadata the model reads but humans rarely review. Treat every description as untrusted, pin it to a reviewed version, diff every change, and never auto-trust a tool added at runtime.

Should I use third-party MCP servers?
You can, but treat them like any dependency you would not run unreviewed. Pin a version, read the tool definitions, run them sandboxed with no standing credentials, and restrict what they can reach on the network.

Do MCP servers need authentication?
Yes. Any server exposing a real tool should require authentication and should not be open to the internet without it. Scope each client to only the tools it needs and rotate credentials.

MCP Server Security · The Checklist for Connecting Agents to Tools Safely