Ghost Calls: Abusing Web Conferencing for Covert Command & Control (Part 1 of 2)

In the middle of a particularly tight red team engagement, we hit a familiar wall. Our long-term implant was rock solid—quiet, persistent, and thoroughly under the radar. But when it came time to pivot into something more interactive—proxy traffic, tunnel HVNC, relay NTLM—we started running into limits. The channel that worked so well for low-and-slow operations wasn’t built for real-time interaction.

That experience stuck with us. What we needed wasn’t just a faster channel—it was a different kind of channel entirely. One that could light up briefly, handle the burst of activity, and disappear when the job was done. We started looking for cover in plain sight. Where, in a modern enterprise network, do you find short-lived, high-bandwidth traffic that no one thinks twice about? The answer was obvious: web conferencing. Zoom. Microsoft Teams. Encrypted, high-volume, and everywhere.

So we asked ourselves: what if joining a web conferencing call was all it took to spin up an interactive C2 tunnel?

This blog post series answers that question. It walks through our research, our approach, and the tool we built—TURNt—to tunnel real-time command and control traffic through the same infrastructure that powers your daily standups. 

We’re releasing TURNt shortly after our talk, Ghost Calls: Abusing Web Conferencing for Covert Command & Control, presented at Black Hat USA 2025. The conference will be the first place this technique is shared publicly—this post serves as a deeper dive for those who want to dig into the details.

Note: This post is part of a multi-part series. Think of this series as a technical deep dive—a brain dump of some of the more interesting artifacts and lessons we uncovered along the way. It’s meant for operators, researchers, and anyone curious about the internals of web conferencing tech and how it can be creatively abused for post-exploitation operations. 

In part one, we explore the internals and technical architecture of web conferencing applications with a particular focus on Zoom as it’s the most popular web conferencing application (and the first one we happened to look into). In part two, we explore our implementation of tunneling traffic through trusted web conferencing providers using the TURN protocol.

Note: We were informed that Zoom pushed a mitigation for this particular technique: their TURN infrastructure now only pairs a client with a media server, and support for peer-to-peer connections through that infrastructure has been disabled. A mitigation like this isn’t viable for providers that rely on TURN infrastructure for peer-to-peer communication, but it works in cases where the infrastructure is only used for communication with a centralized server. At the time of writing, we haven’t tested whether any workarounds are possible against this mitigation.

Why Web Conferencing Protocols?

Once we realized web conferencing traffic might be the perfect cover, we started digging into what makes it so useful from both a performance and evasion standpoint. These platforms—Zoom, Microsoft Teams, and others—generate a ton of real-time, encrypted media traffic. They use both TCP and UDP, often over common ports like 443, and rely heavily on relay infrastructure such as TURN servers to route traffic across NAT and firewall boundaries.

The beauty of these solutions is that they are designed to function even in environments with relatively strict egress controls. In our analysis, we observed them attempting multiple methods to egress through client networks, including traditional UDP traffic as well as tunneling over TCP using the TURN protocol to bypass UDP-related egress controls. Zoom’s documentation even mentions a service called “Zoom HTTP Tunnel” for tunneling outbound traffic as well.

Additionally, this traffic is often end-to-end encrypted using AES or other strong ciphers. The traffic is therefore heavily obfuscated and extremely difficult to analyze in depth, which makes it a perfect place to hide as an attacker. These solutions also tend to use a fairly large amount of bandwidth due to the low-latency, real-time connections they establish.

The chef’s kiss was discovering that both Zoom and Microsoft Teams documentation recommend excluding their respective subdomains from TLS inspection. Additionally, they suggest configuring split-tunneling so that their traffic bypasses the corporate VPN, reducing load on VPN infrastructure. This guidance implies that many commercial proxies and similar solutions might include these services’ IP ranges in their default whitelists, though we weren’t able to confirm this definitively during our research.

Built-In Cover Traffic

The beauty of web conferencing traffic is that it’s noisy by design. A user joining a meeting creates a burst of outbound UDP or TCP packets. High-frequency, sustained traffic to a handful of remote IPs is the norm. Trying to distinguish a real Zoom call from an imposter tunnel at a network level is extremely difficult, especially when everything is encrypted and routed through vendor infrastructure.

In fact, the architecture of these platforms helps us. Most conferencing traffic is relayed through large globally distributed networks of proxy-like servers. This means there’s no persistent direct peer-to-peer connection to a known endpoint, which is exactly the kind of ambiguity we desire as attackers.

WebRTC is commonly used across the major web conferencing platforms, and it provides an excellent source of cover traffic for communication between nodes. Modern platforms built on WebRTC typically don’t use a peer-to-peer architecture; instead, they pair WebRTC with a Selective Forwarding Unit (SFU), a centralized node that all clients use to send and consume video streams. This addresses the scaling problem introduced by peer-to-peer traffic.

Other Channels We Considered

Before committing to this approach, we explored a few other options. One strong candidate was HTTP/3, which runs over QUIC. QUIC’s support for encrypted ClientHello and its out-of-order, UDP-based transport makes it a good fit for tunneling. Paired with major CDNs like Cloudflare or Google Cloud, QUIC/HTTP/3 would give us access to infrastructure that’s already deeply integrated into most networks.

But even with those advantages, web conferencing traffic still had the edge. It was richer in behavioral cover, more predictable in flow, and more difficult to distinguish from normal user activity—especially in environments where Zoom and Teams are already part of the daily workflow.

Understanding the Architecture of Zoom

We began our research by focusing on Zoom to better understand its architecture and how its components function. In parallel, we reviewed common protocols used in web conferencing—particularly WebRTC—and explored prior research related to covert command and control.

At a high level, most web conferencing platforms share a broadly similar architecture and rely on common protocol stacks, or variations thereof. Once you understand the design of one platform, it becomes easier to reason about the others. Typically, these services consist of a centralized web interface—hosted on domains like zoom.us or webex.com—which users access to create, join, and manage meetings through either desktop clients or web browsers.

When it comes to transmitting media content, there are generally two architectural models:

Peer-to-Peer Communication

In peer-to-peer (P2P) setups, clients transmit media streams directly to one another, without routing through a centralized media server. This approach is bandwidth-efficient for small meetings—typically with two or three participants—but doesn’t scale well. As participant count increases, each client must maintain multiple simultaneous media streams, quickly consuming both bandwidth and CPU. Establishing P2P connections across NATs often requires traversal techniques such as STUN, which helps discover public IP/port mappings to facilitate direct communication.
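To make the STUN step concrete, here is a minimal sketch in Python that sends a STUN Binding Request (RFC 5389) and decodes the XOR-MAPPED-ADDRESS attribute from the response to recover the client’s public IP/port mapping. The server shown is just a well-known public STUN endpoint, not anything specific to a conferencing vendor.

```python
import os
import socket
import struct

MAGIC_COOKIE = bytes.fromhex("2112a442")  # fixed STUN magic cookie (RFC 5389)

def stun_public_mapping(server=("stun.l.google.com", 19302)):
    """Send a STUN Binding Request and return our public (ip, port) mapping."""
    txid = os.urandom(12)
    # Header: type 0x0001 (Binding Request), zero-length body, cookie, transaction ID
    request = struct.pack("!HH4s12s", 0x0001, 0, MAGIC_COOKIE, txid)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)
    sock.sendto(request, server)
    data, _ = sock.recvfrom(2048)

    pos = 20  # attributes start after the 20-byte STUN header
    while pos + 4 <= len(data):
        attr_type, attr_len = struct.unpack_from("!HH", data, pos)
        if attr_type == 0x0020:  # XOR-MAPPED-ADDRESS
            port = struct.unpack_from("!H", data, pos + 6)[0] ^ 0x2112
            ip = bytes(b ^ m for b, m in zip(data[pos + 8:pos + 12], MAGIC_COOKIE))
            return socket.inet_ntoa(ip), port
        pos += 4 + ((attr_len + 3) & ~3)  # attribute values are padded to 4 bytes
    return None

print(stun_public_mapping())
```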

Centralized Media Servers

To support larger meetings or scenarios where direct connections cannot be established (e.g., due to strict NATs or firewalls), conferencing solutions rely on centralized media servers. These servers act as intermediaries, ingesting media streams from all participants and relaying them as needed. This architecture enables several optimizations, such as:

  • Mixing multiple streams into a single composite stream to reduce bandwidth consumption.
  • Dynamically adjusting stream quality per participant based on their network conditions.

What Happens in Practice?

While some platforms support peer-to-peer communication, our research shows that most default to centralized media servers for all sessions. This appears to be driven by two primary factors:

  1. Connectivity Constraints: Establishing reliable P2P connections is not always feasible, even with STUN. In such cases, platforms fall back to intermediaries like TURN or dedicated media servers.
  2. Scalability Requirements: Beyond a handful of participants, P2P becomes impractical due to exponential bandwidth demands. Centralized infrastructure provides a more scalable solution.

As a result, most conferencing applications first attempt to establish outbound connections over UDP to reach their media servers. If this is blocked, they often escalate to more evasive behaviors, such as tunneling media over TCP using TURN over TLS (TURNS) via port 443, an egress path that is typically open in most environments. The architecture of most web conferencing applications is conceptually simple; the real challenge lies in operating this kind of media-server network at scale, which is often non-trivial.
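To make that last hop concrete, the sketch below builds a bare TURN Allocate request (RFC 5766) and sends it over TLS to port 443, matching the TURNS pattern described above. The relay hostname is a hypothetical placeholder; a real relay answers the first unauthenticated Allocate with a 401 error carrying the REALM and NONCE values needed to retry with credentials.

```python
import os
import socket
import ssl
import struct

RELAY_HOST = "turn.example.com"  # hypothetical TURNS relay
MAGIC_COOKIE = bytes.fromhex("2112a442")

def build_allocate_request() -> bytes:
    """Build an unauthenticated TURN Allocate request (RFC 5766)."""
    # REQUESTED-TRANSPORT attribute (0x0019): protocol 17 (UDP) + 3 reserved bytes
    attr = struct.pack("!HHB3x", 0x0019, 4, 17)
    # STUN header: type 0x0003 (Allocate), attribute length, cookie, transaction ID
    header = struct.pack("!HH4s12s", 0x0003, len(attr), MAGIC_COOKIE, os.urandom(12))
    return header + attr

ctx = ssl.create_default_context()
with socket.create_connection((RELAY_HOST, 443), timeout=5) as raw:
    with ctx.wrap_socket(raw, server_hostname=RELAY_HOST) as tls:
        tls.sendall(build_allocate_request())
        reply = tls.recv(4096)
        # 0x0113 is an Allocate error response; a 401 here carries REALM/NONCE
        print(f"reply message type: 0x{reply[0]:02x}{reply[1]:02x}")
```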

Understanding the Zoom Multi-Media Router (MMR)

Zoom supports a self-hosted deployment called the Zoom Meeting Connector, which offers some insight into how the Zoom backend works: a similar backend architecture is used by the Zoom software-as-a-service platform when no self-hosted connector component is present. The Zoom Meeting Connector typically includes two components: the Zone Controller (ZC) and the Multi-Media Router (MMR).

The Zone Controller is the mechanism that provisions or selects the MMR server used to host a meeting, and this is how Zoom is able to scale to handle millions of meetings all over the world. The Zoom Meeting Connector Introduction post by Jaron Davis does a great job explaining how the Zoom Meeting Connector component works within on-premises environments. Figure 1 shows how multiple Zoom Connectors can be provisioned to handle hosting while the Zone Controller component allocates meetings to specific MMR servers depending on their available capacity.

Figure 1
Figure 1: A diagram showing the relationship between the Zone Controller and MultiMedia Router (MMR) components used by Zoom in self-hosted connector environments (source).

It’s a little over-simplified, but Zoom basically just maintains a globally distributed network of “relays” that are used by various media clients to send data to each other. Natalie Silvanovich has an excellent presentation titled Zooming in on Zero Click Exploits which discusses her work reverse engineering the Zoom MMR component and provides some additional useful context on Zoom internals. 

Understanding the Zoom Desktop Client Transport Protocol

After gaining an understanding of Zoom’s architecture, we reviewed some published research from Princeton on the Zoom protocol. During the pandemic, many universities switched to an online-learning model overnight, which generated significant interest among academic researchers in the performance of Zoom traffic and the Zoom protocol.

Princeton published a paper titled Enabling Passive Measurement of Zoom Performance in Production Networks which provided a lot of useful context on how the Zoom protocol functioned in the early pandemic era. This research was quite useful and even included a Wireshark protocol dissector, which we were able to leverage during our own research. After reviewing the existing work, we performed live packet captures of real-world Zoom meetings to see if the protocol had changed since the original research was published. It hadn’t.

The core Zoom protocol is essentially a custom UDP header wrapping the RTP protocol header, with some custom extensions applied to the RTP traffic. These headers are sent in cleartext across the network, while the bodies of the packets, the actual video, audio, or screen-sharing payloads, are encrypted. This is the default configuration in Zoom unless end-to-end encryption is enabled, which is discussed in a subsequent section. Figure 2 shows a screenshot from Wireshark where we inspected Zoom traffic using the custom Wireshark dissector developed by the Princeton researchers.

Figure 2
Figure 2: An example UDP packet frame sent by the Zoom meeting client to a backend multi-media router server during a web conferencing call/meeting.
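Because the RTP header travels in the clear in this default mode, recovering stream metadata from a capture requires nothing more than a 12-byte parse. The sketch below decodes the standard RTP header (RFC 3550); we intentionally don’t reproduce the layout of Zoom’s outer custom header here, so the input is assumed to be the RTP portion of the datagram.

```python
import struct

def parse_rtp_header(rtp: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550).

    In a default (non-E2EE) Zoom meeting these fields are visible on the
    wire inside Zoom's custom UDP encapsulation, even though the media
    payload that follows them is encrypted.
    """
    v_p_x_cc, m_pt, sequence = struct.unpack_from("!BBH", rtp, 0)
    timestamp, ssrc = struct.unpack_from("!II", rtp, 4)
    return {
        "version": v_p_x_cc >> 6,
        "payload_type": m_pt & 0x7F,  # hints at audio vs. video vs. screen share
        "marker": m_pt >> 7,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,  # identifies the stream within the session
    }

# Example with a synthetic header: version 2, payload type 111, sequence 42
print(parse_rtp_header(struct.pack("!BBHII", 0x80, 111, 42, 123456, 0xDEADBEEF)))
```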

An interesting implication of this is that since the RTP headers aren’t encrypted at the network level, it’s possible to see which type of data stream each RTP packet belongs to, such as video, audio, or screen-sharing data sent over a given channel. The underlying payload itself is encrypted, but a decent bit of metadata remains available. Figure 3 shows a screenshot from the Zoom end-to-end encryption whitepaper which discusses how encryption works by default in Zoom.

Figure 3
Figure 3: The Zoom end-to-end encryption whitepaper describes how encryption in Zoom works in the default settings with “enhanced” encryption mode (source). 

Understanding How End-to-End Encryption Works within Zoom

One of the neat things about Zoom is that it appears to be the only major web-conferencing platform that supports end-to-end encryption between clients with an untrusted backend architecture. The other major meeting application platforms such as Google Meet, Cisco WebEx, and Microsoft Teams all leverage a model where a compromise of the backend would lead to the ability to intercept and record meetings as the decryption key for those meetings is stored on a remote server.

Historically, there has been some contention around Zoom’s end-to-end encryption claims. CitizenLab’s Move Fast and Roll Your Own Crypto: A Quick Look at the Confidentiality of Zoom Meetings originally documented that Zoom wasn’t actually end-to-end encrypted in the traditional sense despite its claims. It was true that Zoom meetings were encrypted between clients, but end-to-end encryption typically implies that the server can’t decrypt the traffic, which wasn’t the case here, as Zoom’s servers also held the decryption key.

In response, Zoom developed a more robust end-to-end encryption protocol and even published a specification describing how the protocol works in detail. Our testing indicated that end-to-end encryption isn’t used by default in Zoom, as it breaks certain functionality such as dialing into a meeting from phones and other legacy clients that don’t support the native Zoom protocol. The desktop client must also be used when joining end-to-end encrypted meetings. This may be why true end-to-end encryption isn’t enabled by default despite being supported: the company is very focused on maintaining a high degree of usability.

From a research perspective, end-to-end encryption seemed particularly compelling since one of our concerns with regular Zoom traffic was that the Multi-Media Router might decrypt RTP packets and modify them (e.g., reducing video quality to send lower-quality frames to clients with slower connections when lag is present). It also seemed a bit simpler: tunneling over the default Zoom protocol, which doesn’t encrypt the RTP packet headers (headers aren’t normally encrypted in practice even when leveraging SRTP, for performance reasons), would require us to convincingly mimic the RTP header values.

Analyzing the Zoom Web Client

Zoom offers two official client applications: the web client and the desktop client. The web client is used in scenarios where a user wants to join a meeting but, for whatever reason, can’t use the desktop client. There are some notable implementation differences between the two clients worth noting in our analysis. An interesting one is that while the desktop client uses the custom Zoom protocol, which wraps an in-house extension of the RTP protocol, the web client uses WebRTC for its communication.

The interesting thing from an attacker’s perspective is that communication with the same backend MMR servers over 8801/UDP occurs by default in both scenarios; only the protocols differ depending on the client application in use. We observed both DTLS and STUN traffic over 8801/UDP when using the Zoom web client, which is typical of a WebRTC connection setup process. The desktop client, meanwhile, communicates with the same MMR server over the same 8801/UDP port but using the custom Zoom meeting protocol. This suggests that the MMR server performs some sort of inspection of incoming UDP packets, distinguishing between STUN, Zoom’s custom UDP protocol, and WebRTC with DTLS in order to route each packet to the appropriate handler.

Figure 4
Figure 4: A screenshot showing UDP traffic observed being sent to the Zoom MMR server using the STUN protocol as part of the ICE candidate gathering mechanism used during the WebRTC connection initiation process.

Figure 5
Figure 5: A screenshot showing how DTLS is used over 8801/UDP during the WebRTC connection which is distinct from the custom UDP protocol used by Zoom in the desktop application.
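Reproducing this kind of same-port multiplexing is straightforward. WebRTC stacks conventionally disambiguate protocols sharing a socket by inspecting the first byte of each datagram (RFC 7983), and a minimal classifier along those lines might look like the following; the final bucket reflects our assumption about Zoom’s custom transport rather than anything documented.

```python
def classify_datagram(payload: bytes) -> str:
    """Classify a datagram seen on 8801/UDP by its first byte.

    The ranges follow the demultiplexing scheme WebRTC stacks use when
    several protocols share one socket (RFC 7983). The final bucket is
    our assumption about Zoom's custom transport, not documented behavior.
    """
    if not payload:
        return "empty"
    first = payload[0]
    if first <= 3:
        return "STUN"
    if 20 <= first <= 63:
        return "DTLS"
    if 128 <= first <= 191:
        return "RTP/RTCP"
    return "other (possibly Zoom's custom protocol)"

# Example: a STUN Binding Request begins 0x00 0x01, so it lands in "STUN"
print(classify_datagram(bytes.fromhex("000100002112a442") + b"\x00" * 12))
```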

Another key difference is that the desktop client uses port 443/TCP on the MMR servers as a control channel to send and receive control messages. The web client instead uses a WebSocket connection to a backend server referred to as a Real Time Web Gateway (RWG), which appears to serve a similar purpose. The communication protocol used by the web client is very similar to, but not identical to, the one used by the desktop client (see Figure 6).

Figure 6
Figure 6: We observed the Real Time Web Gateway (RWG) being used in a very similar manner to the TCP control channel leveraged by the Zoom desktop client.

The connection between the web client and the MMR server is negotiated using the WebSocket connection to the Real Time Web Gateway (RWG) as a signaling channel. The signaling channel carries the Session Description Protocol (SDP) offer and answer exchanged during the WebRTC handshake to establish a DTLS connection between the browser and the MMR server. Figure 7 shows the offer generated by the client. Figure 8 shows the answer returned by the RWG to the client browser to finalize the creation of the WebRTC connection.

Figure 7
Figure 7: An SDP offer generated by the Zoom web client which is used to initiate the WebRTC signaling process with the server to connect to the MMR server over WebRTC.
Figure 8
Figure 8: An answer received from the Zoom RWG backend when using the Zoom web client to negotiate a WebRTC connection with a MMR server running an SFU.

Once the WebRTC session is established, the Zoom web client opens two dedicated data channels: ZoomWebclientAudioDataChannel and ZoomWebclientVideoDataChannel. Interestingly, each channel is created using a separate WebRTC peer connection, even though WebRTC natively supports multiple data channels over a single connection.

Figure 9
Figure 9: We observed that the web client used two WebRTC data channels, ZoomWebclientAudioDataChannel and ZoomWebclientVideoDataChannel, for the audio and video streams respectively.
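This layout is easy to reproduce with an off-the-shelf WebRTC stack. The sketch below uses the Python aiortc library (our tooling choice, not anything Zoom ships) to mirror the one-data-channel-per-peer-connection pattern and generate the SDP offers that would then be handed to a signaling server such as the RWG.

```python
import asyncio
from aiortc import RTCPeerConnection

CHANNELS = ("ZoomWebclientAudioDataChannel", "ZoomWebclientVideoDataChannel")

async def build_offers():
    """Create one peer connection per data channel, as the web client does."""
    offers = {}
    for name in CHANNELS:
        pc = RTCPeerConnection()
        pc.createDataChannel(name)  # negotiated over SCTP/DTLS
        offer = await pc.createOffer()
        await pc.setLocalDescription(offer)
        # This SDP is what a signaling server (the RWG, in Zoom's case)
        # would relay to the far side to complete the handshake.
        offers[name] = pc.localDescription.sdp
    return offers

sdp_offers = asyncio.run(build_offers())
print(list(sdp_offers))
```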

One surprising detail in Zoom’s web client is its use of WebAssembly (WASM) for what appears to be video encoding. The WASM module was compiled with Emscripten, a tool commonly used to port native C++ code to run in the browser. This suggests that Zoom likely compiled portions of their C++ desktop client to WASM, enabling reuse of the same encoding logic within the web client.

This is an unexpected design choice, as most web applications tend to implement media processing using pure JavaScript or leverage built-in Web APIs. By opting for WASM, Zoom can maintain consistency across platforms while likely benefiting from the performance advantages of near-native code execution.

While we didn’t fully reverse-engineer the custom WASM modules, a review of their exported functions and surrounding JavaScript integration clearly indicates that they are responsible for video encoding, working in conjunction with WebRTC data channels negotiated with the MMR server. Figure 10 shows the video.mtsimd.wasm module which is ostensibly used for video stream encoding.

Figure 10
Figure 10: We observed that the video.mtsimd.wasm module appeared to be used for video encoding based on the available function exports.
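Surveying a module’s exports doesn’t require fully reverse engineering it; the export section of the WebAssembly binary format is simple enough to walk directly. The sketch below is roughly how one might triage a file like video.mtsimd.wasm in pure Python (tools such as wasm-objdump provide the same view).

```python
def wasm_export_names(path: str) -> list[str]:
    """List exported function names from a .wasm module, e.g. video.mtsimd.wasm.

    A minimal parser for the export section (section id 7) of the
    WebAssembly binary format; enough to survey what a module exposes.
    """
    data = open(path, "rb").read()
    assert data[:4] == b"\0asm", "not a wasm binary"
    pos = 8  # skip the 4-byte magic and 4-byte version

    def read_leb128(p: int) -> tuple[int, int]:
        result = shift = 0
        while True:
            byte = data[p]
            p += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, p
            shift += 7

    names = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_leb128(pos + 1)
        end = pos + size
        if section_id == 7:  # export section
            count, pos = read_leb128(pos)
            for _ in range(count):
                name_len, pos = read_leb128(pos)
                name = data[pos:pos + name_len].decode()
                pos += name_len
                kind = data[pos]  # 0 = function, 1 = table, 2 = memory, 3 = global
                _, pos = read_leb128(pos + 1)  # export index
                if kind == 0:
                    names.append(name)
        pos = end
    return names

print(wasm_export_names("video.mtsimd.wasm"))
```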

We observed that the Zoom web client would leverage multiple different methods to egress from a network:

  1. It first attempts to egress over the standard 8801/UDP port, using WebRTC as the media channel during the initial meeting join process to communicate with the MMR server functioning as an SFU.
  2. If this fails, it tries to egress using the TURN credentials provisioned to the client in order to establish a WebRTC connection with the MMR server.
  3. If both of these vectors fail, such as when the client can only egress via HTTP through a web proxy that enforces TLS inspection, it falls back to the RWG with separate WebSocket connections for the audio and video streams (sketched below).
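Taken together, the web client’s behavior amounts to a simple cascade. The sketch below expresses that decision logic in Python; every connect_* helper (and each hostname) is a hypothetical stand-in for behavior we observed, not a real Zoom API, so the transports are stubbed out.

```python
import asyncio

class EgressBlocked(ConnectionError):
    """Raised when a given egress path is filtered."""

# The connect_* helpers are hypothetical stand-ins for observed behavior,
# stubbed out here so the cascade itself is runnable.
async def connect_webrtc_udp(host: str, port: int = 8801):
    raise EgressBlocked("direct 8801/UDP is filtered")

async def connect_webrtc_turn(turn_uri: str, credentials: dict):
    raise EgressBlocked("TURN relay path is filtered")

async def connect_rwg_websockets(rwg_host: str, streams: tuple):
    return f"media via WebSockets to {rwg_host} for {streams}"

async def establish_media_channel():
    """Walk the egress methods in the order we observed the web client use."""
    try:
        return await connect_webrtc_udp("mmr.example.zoom.us")  # hypothetical host
    except EgressBlocked:
        pass
    try:
        return await connect_webrtc_turn("turns:turn.example.zoom.us:443", {})
    except EgressBlocked:
        pass
    # Last resort: survives even a TLS-inspecting web proxy
    return await connect_rwg_websockets("rwg.example.zoom.us", ("audio", "video"))

print(asyncio.run(establish_media_channel()))
```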

Analyzing the Zoom Desktop Client

We already touched on how the Zoom desktop client functions in previous sections covering Zoom’s architecture and the web client. In this section, we cover our research into the Zoom desktop client in a little more detail.

One of the things we were particularly interested in during our research was the “Zoom HTTP Tunnel” functionality we saw in various documentation sources. This seemed to suggest that there was some sort of tunneling service leveraged by Zoom we could potentially take advantage of if it didn’t include restrictions on where we could tunnel traffic to (see Figure 11).

Figure 11
Figure 11: We identified several instances in the Zoom documentation that referenced an HTTP tunneling service that could be used by clients facing egress restrictions within their environment.

By default, when the Zoom desktop application joins a meeting, it first contacts a Zone Controller (ZC) to determine which MMR server it should use. Zone Controllers serve as a routing layer that distributes meeting sessions across multiple MMR servers. Each meeting is assigned to a specific MMR, which may concurrently handle other meetings up to a certain capacity threshold.

The Zoom client initiates these connections primarily over TCP port 443, which it uses to communicate with both the Zone Controller and the MMR server. For real-time media transport, the client typically relies on UDP port 8801, using Zoom’s custom-wrapped version of the RTP protocol, as previously discussed.

To study this behavior, we configured a lab environment with a virtual machine running the Zoom desktop client. This environment was intentionally constrained: the only available egress path was through an HTTP proxy running Burp Suite with TLS inspection enforced. This setup ensured that any outbound Zoom traffic had to pass through the proxy in order to exit the network.

Unsurprisingly, the Zoom client is built to operate effectively in environments with strict egress controls — a necessity for enterprise deployments where proxies and TLS inspection are common. To maintain functionality in such conditions, the client adapts by rerouting traffic that would normally use UDP or its custom TCP control protocol over a WebSocket tunnel on TCP port 443, directly to the MMR server.

The MMR server supports a minimal set of HTTP operations that allow the client to upgrade to a WebSocket connection. Once established, this WebSocket tunnel is used not only for control signaling but also for transmitting real-time media traffic, effectively replacing the default UDP path. This fallback mechanism enables the Zoom client to remain fully operational even in environments with highly restrictive outbound filtering. We found this to be a rather ingenious technique for operating in networks where egress is only permitted through a web proxy.
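The upgrade itself is ordinary HTTP. A minimal sketch of the handshake, using a hypothetical host and path since we aren’t reproducing Zoom’s exact URIs, is shown below; a 101 Switching Protocols response means the TLS socket is now a bidirectional tunnel for control and media frames.

```python
import base64
import os
import socket
import ssl

MMR_HOST = "mmr.example.zoom.us"  # hypothetical MMR hostname
WS_PATH = "/wc/media"             # hypothetical path

ctx = ssl.create_default_context()
with socket.create_connection((MMR_HOST, 443), timeout=5) as raw:
    with ctx.wrap_socket(raw, server_hostname=MMR_HOST) as tls:
        key = base64.b64encode(os.urandom(16)).decode()
        handshake = (
            f"GET {WS_PATH} HTTP/1.1\r\n"
            f"Host: {MMR_HOST}\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Key: {key}\r\n"
            "Sec-WebSocket-Version: 13\r\n"
            "\r\n"
        )
        tls.sendall(handshake.encode())
        # Expect "HTTP/1.1 101 Switching Protocols" before exchanging frames
        print(tls.recv(4096).decode(errors="replace"))
```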

In Figure 12, we observe the Zoom client performing its initial handshake with the Zone Controller. This includes the transmission of session metadata and connection parameters. Subsequently, as illustrated in Figure 13, the client successfully resolves the appropriate MMR server for the meeting and initiates a dedicated WebSocket connection to it.

Figure 12
Figure 12: An initial handshake message sent from the Zoom desktop client to the Zone Controller as part of the connection process.
Figure 13
Figure 13: A response generated by the Zone Controller giving the desktop client key details, such as the MMR server associated with the meeting it is joining.

During our research, we also observed references to several other domains related to tunneling Zoom traffic out of a corporate environment. For example, during the connection process with an MMR server, we noticed the client sent a “TUNNEL” value to the MMR server along with a domain referencing the Zoom tunneling service (see Figure 14).

Figure 14
Figure 14: We also observed references to a tunneling service through various responses while observing traffic within the Zoom application such as references to “TUNNEL” and “HT”.

In our lab environment, we spent some time performing testing by blocking the MMR servers and then trying to get the Zoom client to attempt to egress through the tunneling service. However, due to the timeboxed nature of our research we didn’t have a chance to fully investigate this vector.

Conclusion

In part one of this series, we explored how web conferencing applications function under the hood, using Zoom as our primary example. While each platform has its quirks, they generally share a common architecture — and Zoom serves as a representative model for understanding how these systems are built and operate.

What stood out most was Zoom’s persistence and flexibility in maintaining connectivity. Its ability to adapt to a wide range of network environments, often with more resilience than many red team implants, highlights the platform’s strong focus on reliability and user experience. From a network egress perspective, Zoom is impressively tenacious.

In part two, we’ll pivot to the offensive side of the house. We’ll walk through the various strategies we considered for routing traffic through web conferencing platforms and explain why we ultimately chose TURN as our preferred egress channel. We’ll also introduce TURNt — a tool we developed specifically to take advantage of this approach — and provide a brief overview of how to use it effectively.
