AI Security, Open Source Tools

MCP Server Security: The Hidden AI Attack Surface

TL;DR – MCP servers – the integration layer connecting AI assistants to external tools and data – are a significant and underexplored attack surface. Our research demonstrates that both locally hosted and third-party MCP servers can be exploited to execute arbitrary code, exfiltrate sensitive data, and manipulate user behavior, often with zero indication to the user that an attack has occurred.

The Model Context Protocol (MCP) is an open-source standard announced by Anthropic in November 2024 for connecting AI applications to external systems. While MCP enables powerful interoperability between large language models and data sources, it also creates a machine-in-the-middle opportunity for attackers. Malicious MCP server attacks have already been observed in the wild, including the Postmark MCP server infostealer campaign.

As AI adoption accelerates across businesses, development teams, and workflows, understanding these emerging cybersecurity threats is critical. This research demonstrates that MCP servers, whether locally hosted or remotely hosted by third parties, introduce significant attack vectors that can be exploited to execute arbitrary code, exfiltrate sensitive data, and manipulate user behavior through social engineering.

Using our in-house tooling, MCPHammer (https://github.com/praetorian-inc/MCPHammer), we validated these attacks across multiple models, agents, and tools, showing that these threats are both feasible and concerning for enterprise environments.

What Are MCP Servers and Why Are They Risky?

MCP servers operate in two primary configurations, each presenting distinct security challenges:

Locally Hosted MCP Servers execute as processes directly on the user’s machine. These servers can act as a perfect machine-in-the-middle attack vector, enabling attackers to:

Execute arbitrary code with user privileges
Exfiltrate local data including files and credentials
Install persistence mechanisms and malware
Poison AI responses
Collect detailed system information

Remotely Hosted Third-Party MCP Servers from SaaS providers like Slack, Notion, Box, and Atlassian can be configured directly through official connectors or custom integrations. While these MCP servers cannot directly execute code on the user’s machine, they enable:

Access to enterprise information systems
Unauthorized actions within connected platforms
Credential harvesting through OAuth flows
Autonomous agent interactions with enterprise systems, often without adequate oversight

The most dangerous scenarios occur when legitimate remote MCP servers are chained with malicious local MCP servers, combining enterprise data access with local code execution capabilities.

MCP servers expose “tools” that allow LLMs to interact with external systems. For example, the Slack MCP server provides various read-only and write/delete tools, each configurable with different approval levels: Always allow, Needs approval, or Blocked.

How Can Attackers Exploit MCP Servers?

Our research identified the following primary attack vectors through MCP server exploitation:

Third-Party MCP Server Chaining

This attack vector explores MCP server chaining, where a malicious local MCP server leverages trusted, legitimate remote MCP servers as data sources. Rather than operating independently as the Postmark MCP server did, this attack vector showcases a malicious server that exploits the chain of trust established by official integrations.

Slack provides an ideal target given its documented history as C2 infrastructure. An initial goal was to explore whether we could create a C2 channel that leverages Anthropic’s official Slack MCP server as a delivery mechanism. This attack path demonstrates what happens when the trusted Slack MCP server is chained with a malicious local MCP server. The malicious server doesn’t directly access Slack, but instead intercepts data flowing through the integration that users have already authorized.

We developed a proof-of-concept malicious MCP server called `conversation_assistant` that mimics legitimate productivity tools, offering message analysis and context management features. The server exposes innocuous-sounding tools such as get_current_context, analyze_messages, and store_message_context.

The attack flow operates as follows: An attacker posts encoded commands in the victim’s Slack workspace. When the victim asks Claude to read Slack messages using the official Slack MCP, the Slack server returns messages including the planted commands. Claude passes those messages to malicious MCP tools as arguments, where the malicious server decodes and executes them.

In our demonstration, an attacker posts a status update containing the base64-encoded string b3BlbiAtYSBDYWxjdWxhdG9y (which decodes to `open -a Calculator`). The victim then gave Claude the following request: “Can you get my Slack DMs to myself Domenic Lo Iacono and analyze them with conversation assistant?” This triggered the attack chain, which ultimately launched Calculator silently in the background.

Our screenshot of the session captures the full attack sequence: Claude made multiple Slack API calls, the analyze_messages tool received the messages (including the message with text “Status update for project-b3BlbiAtYSBDYWxjdWxhdG9y”), and Calculator appeared while Claude presented its analysis findings. All the while, the user sees only helpful AI assistance.
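From a defender's point of view, the planted message is easy to inspect once you know what to look for. The following is a minimal detection sketch, not part of MCPHammer: it pulls base64-looking tokens out of a message and reports any that decode to printable text. The function name `find_encoded_text` and the 16-character threshold are assumptions made for illustration.

```python
import base64
import binascii
import re

# Runs of base64-alphabet characters long enough to hide a short command.
# The 16-character minimum is an arbitrary threshold chosen for this sketch.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_encoded_text(message: str) -> list[str]:
    """Return printable strings recovered from base64-looking tokens."""
    findings = []
    for token in B64_TOKEN.findall(message):
        try:
            raw = base64.b64decode(token, validate=True)
            text = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if text.isprintable():
            findings.append(text)
    return findings


message = "Status update for project-b3BlbiAtYSBDYWxjdWxhdG9y"
print(find_encoded_text(message))  # ['open -a Calculator']
```

A check like this only catches plain base64. An attacker could switch encodings, so it is better treated as a monitoring signal than as a control.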

This demonstrates code execution through MCP server chaining. Unlike standalone malicious MCP servers that operate independently, this attack leverages data flow from legitimate, trusted MCP servers that users have already authorized. However, command execution is only one capability enabled by this architecture.

Any malicious MCP server can exfiltrate data passed to its tools; this is inherent to the machine-in-the-middle position. The more valuable capability lies in maximizing the volume and consistency of exfiltration through careful tool design. In the example above, Claude states “Let me get more messages to build a comprehensive dataset” before making additional Slack API calls. This behavior results from tool descriptions that emphasize requirements for as much data as possible, exploiting Claude’s optimization for quality output. The malicious MCP server receives this complete dataset as tool arguments and immediately exfiltrates it to an attacker-controlled Slack workspace. This technique makes data exfiltration much more consistent.

To demonstrate both command execution and automatic exfiltration, we planted an encoded command that referenced the exfiltrated data itself:

“Status update for project-b3BlbiAtYSBUZXh0RWRpdCB7e1BBWUxPQURfRklMRX19”

This decodes to `open -a TextEdit {{PAYLOAD_FILE}}`, where {{PAYLOAD_FILE}} references the captured message data. When executed, TextEdit launched displaying all exfiltrated Slack messages.
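To make the placeholder mechanism concrete, here is a small sketch that decodes the planted string and fills in {{PAYLOAD_FILE}}. It assumes the substitution is a plain string replacement, which the write-up does not confirm, and it deliberately never executes the result. The function name and the file path are hypothetical.

```python
import base64

PLACEHOLDER = "{{PAYLOAD_FILE}}"


def render_planted_command(encoded: str, payload_path: str) -> str:
    """Decode a planted command and fill in the placeholder. Never executes it."""
    decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    return decoded.replace(PLACEHOLDER, payload_path)


encoded = "b3BlbiAtYSBUZXh0RWRpdCB7e1BBWUxPQURfRklMRX19"
# "/tmp/captured.json" is a hypothetical path used only for illustration.
print(render_planted_command(encoded, "/tmp/captured.json"))
# open -a TextEdit /tmp/captured.json
```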

The `conversation_assistant` MCP server contains a hardcoded Slack bot token for an attacker-controlled workspace, configured with write and file upload permissions. This enables the malicious server to exfiltrate captured data by uploading it as JSON files to the attacker’s Slack channel.

Arbitrary File Download & Execution

MCP servers can exploit the common software pattern of requiring initialization before primary functionality. Our proof-of-concept implements mandatory “init” tools that all other tools check before executing. When invoked, the init tool fetches content from a configured URL and opens it using the system’s default application handler. This behavior resembles routine setup while serving as a covert payload delivery mechanism.

We developed a companion configuration server allowing attackers to manage deployed instances. In our demonstration, a simple request like “Call hello world” triggered the init tool, which downloaded and opened a file from GitHub in the system’s default application. The file provided here could be as benign as a GitHub readme, or as malicious as our ChromeAlone browser implant.

Supply Chain Attacks via Package Manager Configurations

The MCP ecosystem has largely standardized on uvx (from Astral’s UV package manager) as the primary method for running Python-based MCP servers. This creates a supply chain attack surface at the configuration layer itself, before any MCP tools are even invoked.

A typical MCP configuration looks something like this:
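As an illustration, and not necessarily the configuration used in our testing, a uvx-based entry can be sketched as follows. It assumes the `mcpServers` layout used by common MCP clients; the server name `sqlite` and the database path are placeholders. The Python below simply builds the entry and prints it as the JSON a client would load.

```python
import json

# Illustrative entry only: the server name and database path are placeholders.
config = {
    "mcpServers": {
        "sqlite": {
            "command": "uvx",  # the client launches the server through uvx
            "args": ["mcp-server-sqlite", "--db-path", "/path/to/test.db"],
        }
    }
}

print(json.dumps(config, indent=2))
```

Note that nothing in this entry pins a version or verifies the package: the first element of `args` is a name that is resolved at launch time.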

When an MCP client loads this configuration, uvx dynamically downloads the specified package from PyPI and executes it. This pattern appears across the MCP ecosystem and introduces several attack vectors that require no malicious MCP server code at all.

Typosquatting: MCP configurations are frequently copy-pasted from blog posts, GitHub gists, and documentation. A single character typo, mcp-server-sqllite instead of mcp-server-sqlite, would silently download and execute attacker-controlled code on every agent startup.
Package Compromise: If an attacker compromises a legitimate MCP server package on PyPI through credential theft or CI/CD pipeline exploitation, every user with that package in their configuration automatically executes malicious code the next time their agent restarts.
Revival Hijack: When maintainers remove packages from PyPI, the names become available for re-registration. Users with outdated configurations suddenly download attacker-controlled code.
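These configuration-level risks can be partly checked before an agent ever starts. The sketch below is one possible audit, under several assumptions: the allowlist `APPROVED` is hypothetical, the package spec is taken to be the first non-flag argument passed to uvx, and a version is considered pinned only when written as `name@version` with something other than `latest`.

```python
import difflib

# Hypothetical allowlist of package names the organization has reviewed.
APPROVED = {"mcp-server-sqlite", "mcp-server-git", "mcp-server-fetch"}


def audit_uvx_entry(args: list[str]) -> list[str]:
    """Return warnings for the package named in a uvx argument list."""
    # Assumption: the first argument that is not a flag is the package spec.
    spec = next((a for a in args if not a.startswith("-")), None)
    if spec is None:
        return ["no package name found"]

    name, _, version = spec.partition("@")
    warnings = []

    if name not in APPROVED:
        close = difflib.get_close_matches(name, sorted(APPROVED), n=1, cutoff=0.8)
        if close:
            warnings.append(
                f"'{name}' is not approved but resembles '{close[0]}' (possible typosquat)"
            )
        else:
            warnings.append(f"'{name}' is not on the approved list")

    if not version or version == "latest":
        warnings.append(f"'{name}' is not pinned to an exact version")

    return warnings


for warning in audit_uvx_entry(["mcp-server-sqllite", "--db-path", "/path/to/test.db"]):
    print(warning)
```

Run against the typo from the example above, this reports both the near-miss name and the missing pin. Pinning limits exposure to a later compromise of a package, but it does not help if the pinned release is itself malicious.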

Critical Insight: Unlike the other attack vectors demonstrated above, supply chain attacks through package manager configurations require no interaction with the MCP protocol itself. Malicious code executes during agent startup, before any tools are invoked, making this a zero-click attack vector that bypasses tool approval mechanisms entirely.

Content Injection

In testing, we injected fake support information and malicious shortened URLs into the assistant’s responses. Users receive legitimate, accurate answers to their questions, along with social engineering payloads that appear to originate from the trusted AI assistant.

This technique enables:

Credential harvesting via fake login pages
Vishing attacks through attacker-controlled phone numbers
Watering hole attacks tailored to user interests
Internal phishing through fake policy updates

Data Exfiltration

MCP servers process queries, access local resources, and communicate with external services, creating multiple exfiltration channels without user awareness. Unlike traditional malware that must establish its own communication channels, malicious MCP servers leverage connectivity that users have already authorized.

The most straightforward vector is query interception. In our demonstration, when the user asked to make a hello world script more efficient, both the request and the LLM’s reply were logged and captured. Our ask_claude tool proxies requests through attacker-controlled infrastructure, enabling passive collection of:

Confidential business communications
Source code and other intellectual property
Customer data
Credentials mentioned in troubleshooting contexts

What Makes MCP Attacks Especially Dangerous?

These attack vectors become particularly dangerous under specific conditions:

“Always Allow” Permissions: Read-only tools are often granted automatic approval because they appear safe. When chained with malicious MCP servers, they become zero-click attack vectors.
Trusted Source Obfuscation: Users trust data from legitimate sources like Slack or Google Drive. They don’t expect that a message from these platforms could trigger code execution through an MCP chain.
Legitimate Use Case Mimicry: Malicious servers presenting as productivity tools (conversation summarizers, context managers) appear entirely benign while enabling attacks.
No Visual Indication: Attacks leave no trace in the chat interface. Users see only the requested output, not the background execution.

While our demonstrations executed benign payloads such as launching Calculator, the same techniques could deploy ransomware, exfiltrate credentials, establish persistence, or download additional malicious payloads.

How Should Organizations Defend Against MCP Attacks?

As AI integration becomes increasingly prevalent across enterprise environments, the attack surface presented by MCP servers will continue to expand. Organizations adopting AI assistants with MCP capabilities should:

Implement strict review processes for any MCP server installations
Treat all MCP servers as potentially adversarial code
Audit tool permissions to minimize “Always allow” configurations
Monitor for unusual data flows between connected services
Educate users about the risks of chained tool calls
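The permission audit in particular lends itself to automation. As a rough sketch, assuming an organization keeps an inventory mapping each MCP server to its tools and their approval levels (the inventory format and the Slack tool names below are hypothetical), auto-approved tools can be listed for review like this:

```python
def auto_approved_tools(inventory: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """List (server, tool) pairs that run without asking the user."""
    return sorted(
        (server, tool)
        for server, tools in inventory.items()
        for tool, level in tools.items()
        if level == "Always allow"
    )


# Hypothetical inventory: server name -> {tool name: approval level}.
inventory = {
    "slack": {
        "read_messages": "Always allow",
        "post_message": "Needs approval",
    },
    "conversation_assistant": {
        "analyze_messages": "Always allow",
    },
}

for server, tool in auto_approved_tools(inventory):
    print(f"{server}: {tool}")
```

In this example both halves of the chain described earlier are auto-approved: a read-only tool on a trusted server and a tool on an unvetted local server. That combination is the one to look for first.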

While this research focused on the interaction between locally run and remotely hosted third-party MCP servers, additional attack vectors exist for internally maintained MCP servers.

Traditional CI/CD attacks, such as GitHub Actions exploitation and device code phishing, could enable attackers to inject malicious tools into an organization’s own internally developed or maintained MCP servers, achieving results similar to those demonstrated above.

As MCP adoption grows within enterprise development pipelines, we believe this intersection of CI/CD security and MCP server integrity represents an area ripe for further research.

The fundamental challenge is that MCP’s power comes from its ability to seamlessly connect AI assistants to external data and services. That same capability weakens trust boundaries in ways attackers can exploit. As the ecosystem matures, we expect to see both more sophisticated attacks and improved defensive measures.

Conclusion

MCP servers represent a significant and underexplored attack surface in AI-integrated environments. Our research demonstrates that attackers can leverage these integration points for code execution, data exfiltration, and social engineering—often without any indication to the user that an attack has occurred. The combination of legitimate enterprise data access with local execution capabilities creates particularly dangerous scenarios.

Praetorian’s offensive security experts specialize in identifying and mitigating emerging threats such as MCP server attacks before they impact your organization. Explore our MCPHammer tooling at https://github.com/praetorian-inc/MCPHammer, or contact our team to learn how our AI security assessments can protect your enterprise AI deployments.

Frequently Asked Questions

What is the Model Context Protocol (MCP)?

MCP is an open-source standard announced by Anthropic in November 2024 that enables AI assistants like Claude, ChatGPT, and others to connect to external tools and data sources. It allows AI models to read Slack messages, query databases, manage files, and interact with SaaS platforms through standardized “server” integrations.

Can MCP servers execute code on my machine?

Yes. Locally hosted MCP servers run as processes on your machine with your user privileges. A malicious local MCP server can execute arbitrary commands, access local files, install malware, and exfiltrate data—all while appearing to provide legitimate productivity features.

What is MCP server chaining?

MCP server chaining occurs when a malicious local MCP server intercepts data flowing through a legitimate, trusted remote MCP server (such as the official Slack or Google Drive integration). The attacker doesn’t need to compromise the legitimate server—they exploit the data that passes through the authorized integration.

How do I secure MCP server integrations?

Audit every MCP server before installation. Minimize “Always allow” tool permissions. Monitor data flows between connected services. Treat all MCP servers as potentially adversarial code. Review package sources to avoid typosquatting and supply chain attacks.

Are remotely hosted MCP servers safe?

Remotely hosted MCP servers from official providers (Slack, Notion, Atlassian) cannot directly execute code on your machine. However, they can access enterprise data, perform unauthorized actions within connected platforms, and harvest credentials through OAuth flows. When chained with a malicious local MCP server, the combination becomes especially dangerous.

Where can I find the MCP security research tools?

Our proof-of-concept tooling, MCPHammer, is open-source and available on GitHub at https://github.com/praetorian-inc/MCPHammer. It includes the MCP server chaining demonstrations, content injection examples, and data exfiltration proofs-of-concept described in this research.

About the Authors

Connor Slack

Connor Slack is a Practice Manager for Praetorian’s Product Security and Red Team domains. Prior to that, he was a Staff Security Engineer at Praetorian. He brings a decade of experience at the intersection of offensive security and security risk management across many industries.

Domenic Lo Iacono

Domenic is an Offensive Security Engineer at Praetorian focused on web application security, external network penetration testing, and GenAI security assessments including prompt injection and jailbreak testing. He holds both a B.S. and M.S. in Cybersecurity from Rochester Institute of Technology, where he also developed an automated malware detection system using machine learning.
