Why Service Meshes Matter

Over the last few years, the pace of moving workloads to the cloud has continued to accelerate. Mostly, this has been a boon for innovation, allowing complex monolithic on-prem applications to be broken into microservice architectures, which provide decoupling, agility, and stability. From a development perspective, life has in some ways never been easier. However, this decomposition has not been entirely free: breaking a monolith apart produces many services that must find, authenticate, and talk to each other over the network, and managing that complexity has led to the widespread adoption of service meshes. A service mesh typically has two main components: a proxy that is deployed alongside each service, and a control plane that coordinates these proxies and provides the desired features to application developers. Many projects offer service mesh capabilities, such as Consul and Linkerd, but this blog post focuses primarily on perhaps the most well-known mesh: Istio.

Service meshes bring many benefits to a microservices architecture. Service-to-service authorization is handled transparently by the control plane and proxies, removing this burden from each individual service’s codebase. The mesh gives developers fine-grained control over which services are allowed to talk to each other, and via which endpoints and HTTP methods. The proxies also provide mutual TLS (mTLS), which allows for automatic authentication between services. Together, these features provide the potential for a strong microservice architecture from a security perspective. Lastly, service meshes contribute other useful features to an application team: request tracing, which increases an application’s observability, and failure handling, which allows for automated request retries and backoff.

As with any sufficiently complicated software system, service meshes are not without faults, and misconfigurations in the mesh can lead to security issues that help an attacker quickly learn about, and interact with, potentially sensitive services. To begin our examination of Istio, and the inner workings of service meshes in general, we set out to develop a tool that can discover and audit core pieces of Istio configuration and detect common misconfigurations in its rules. Such tools, when built correctly, can be included in the DevOps toolchain, preventing ‘badness’ from ever reaching production.

 

Snowcat Overview

(Snowcat is available on GitHub.)

Goals

Snowcat has two primary goals: to obtain information about an Istio deployment and to report on misconfigurations or deviations from best practices where possible. Istio publishes a full list of security best practices in its documentation. Snowcat currently reports on adherence to the following best practices:

Mutual TLS: Strict vs Permissive

By default, Istio runs in permissive mode and does not require mTLS for all connections: sidecar proxies accept both mTLS and plaintext traffic. The operator must opt in to strict mTLS by deploying a PeerAuthentication policy in each namespace as desired. Enforcing mTLS is important to prevent attackers outside the mesh from communicating with services within the mesh. Note that mTLS policies alone are not enough; to further protect workloads within the mesh, additional controls (e.g. AuthorizationPolicies) are recommended.
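As a rough illustration of this check, the sketch below computes which namespaces are not covered by a strict policy. It assumes the PeerAuthentication resources have already been parsed into Python dictionaries and that `istio-system` is the root namespace for mesh-wide policies; the function names are hypothetical and this is not Snowcat's actual implementation. Workload selectors and port-level overrides are deliberately ignored.

```python
def effective_mtls_modes(peer_authentications, namespaces, root_namespace="istio-system"):
    """Map each namespace to its effective namespace-wide mTLS mode.

    Simplification: only policies without a workload selector are
    considered, and port-level mTLS overrides are ignored.
    """
    namespace_mode = {}
    for policy in peer_authentications:
        spec = policy.get("spec") or {}
        if spec.get("selector"):
            continue  # workload-scoped policy, does not cover the whole namespace
        mode = (spec.get("mtls") or {}).get("mode", "UNSET")
        if mode != "UNSET":
            namespace = (policy.get("metadata") or {}).get("namespace", "default")
            namespace_mode[namespace] = mode
    # Assumption: a policy in the root namespace acts as the mesh-wide default,
    # and Istio falls back to PERMISSIVE when nothing is configured.
    mesh_mode = namespace_mode.get(root_namespace, "PERMISSIVE")
    return {ns: namespace_mode.get(ns, mesh_mode) for ns in namespaces}


def non_strict_namespaces(peer_authentications, namespaces, root_namespace="istio-system"):
    """Return the namespaces whose effective mode is anything other than STRICT."""
    modes = effective_mtls_modes(peer_authentications, namespaces, root_namespace)
    return sorted(ns for ns, mode in modes.items() if mode != "STRICT")
```

A namespace-level PERMISSIVE policy overrides a mesh-wide STRICT one here, which is why the lookup prefers the namespace's own mode before falling back to the mesh default.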

Unsafe Authorization Policy Patterns

Istio allows operators to apply fine-grained authorization policies to connections between workloads using the AuthorizationPolicy resource. As with firewall rules, the safest approach is to configure “default deny” policies first and make exceptions for known good cases. The following policy demonstrates a deviation from this best practice:

    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: allow-with-negative
    spec:
      action: ALLOW
      rules:
      - to:
        - operation:
            notPaths: ["/private"]

The above policy allows all requests to endpoints that are not `/private`. If the team responsible for the microservice adds another sensitive endpoint group (e.g. `/secret`), this would be permitted under the policy.
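One way to detect this pattern is to look for negative match fields inside ALLOW policies. The sketch below is a minimal illustration, assuming the policy has been parsed into a Python dictionary; the helper name `negative_matches` is hypothetical, and the field lists cover the `not*` fields of the AuthorizationPolicy `source` and `operation` blocks.

```python
# Negative match fields, grouped by the rule clause they appear under.
NEGATIVE_FIELDS = {
    "from": ("source", ("notPrincipals", "notRequestPrincipals", "notNamespaces",
                        "notIpBlocks", "notRemoteIpBlocks")),
    "to": ("operation", ("notHosts", "notPorts", "notMethods", "notPaths")),
}


def negative_matches(policy):
    """Return the negative match fields used by an ALLOW AuthorizationPolicy."""
    spec = policy.get("spec") or {}
    # An omitted action defaults to ALLOW.
    if spec.get("action", "ALLOW") != "ALLOW":
        return []
    found = []
    for rule in spec.get("rules") or []:
        for clause, (inner, fields) in NEGATIVE_FIELDS.items():
            for entry in rule.get(clause) or []:
                block = entry.get(inner) or {}
                found.extend(f"{clause}.{inner}.{name}" for name in fields if name in block)
    return found
```

Applied to the policy above, this reports the `notPaths` field under `to.operation`; a DENY policy using the same field is not reported, since negative matches are only risky when they widen what is allowed.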

TLS Certificate Validation in DestinationRules

Istio allows operators to use DestinationRules to control egress traffic routing. For example, if a service routinely communicates with an external API without TLS, an operator can create a rule to add TLS in the Istio sidecar proxy. However, by default, the sidecar does not validate TLS certificates when configured with `SIMPLE` TLS mode. If egress TLS rules are used, they should explicitly define a set of `caCertificates` to use when validating certificates.
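A simplified version of this check might look like the following sketch. It assumes DestinationRules are available as parsed dictionaries and only inspects the top-level `trafficPolicy.tls` block; the function name is hypothetical. Rules that set `credentialName` are skipped, since whether that secret carries a CA certificate cannot be judged from the rule alone, and port-level settings are ignored.

```python
def unvalidated_tls_rules(destination_rules):
    """Return names of DestinationRules that use SIMPLE TLS without caCertificates."""
    findings = []
    for rule in destination_rules:
        spec = rule.get("spec") or {}
        tls = (spec.get("trafficPolicy") or {}).get("tls") or {}
        if tls.get("mode") != "SIMPLE":
            continue
        if tls.get("credentialName"):
            continue  # CA may come from the referenced secret; not decidable here
        if not tls.get("caCertificates"):
            findings.append((rule.get("metadata") or {}).get("name", "<unnamed>"))
    return findings
```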

Weak Service Account Authentication

The Istio control plane validates connections from sidecars to the control plane using JWTs signed by the cluster. By default, Kubernetes will mount a JWT into a pod for the pod’s service account. Kubernetes also supports JWTs with custom audiences and expiration using the projected service account feature.

If the Istio JWT policy is set to “first-party-jwt”, the control plane will not validate the audience in JWTs. This allows the automatically mounted JWTs to access the control plane. However, if the JWT policy is set to “third-party-jwt”, the control plane will require the audience in the JWT to be set to “istio-ca”. This allows the sidecar to authenticate using a projected service account, and prevents any other containers in the pod from accessing the Istio control plane by default.
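To see which kind of token a pod actually holds, the audience claim can be read directly from the JWT. The sketch below decodes the claims without verifying the signature, which is enough for inspection but must never be used for authentication; the helper names are hypothetical.

```python
import base64
import json


def jwt_claims(token):
    """Decode the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def has_audience(token, audience="istio-ca"):
    """Report whether the token's `aud` claim contains the given audience."""
    aud = jwt_claims(token).get("aud", [])
    if isinstance(aud, str):
        aud = [aud]  # `aud` may be a single string or a list of strings
    return audience in aud
```

A token for which `has_audience` returns false would only be accepted by a control plane running with the “first-party-jwt” policy described above.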

Vulnerable Istio Versions

Known security vulnerabilities in Istio are documented via security bulletins on Istio’s website. Snowcat will attempt to discover the cluster’s version and then compare it against known Istio bulletins to determine if there are any potential security issues.
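The comparison step can be sketched as a range check over parsed version numbers. The bulletin table below is entirely HYPOTHETICAL placeholder data, included only to show the shape of the lookup; real entries would come from Istio's published security bulletins, and the function names are our own.

```python
def parse_version(text):
    """Turn a version string such as '1.10.2' into a comparable tuple of ints."""
    core = text.strip().lstrip("v").split("-")[0]  # drop any '-rc.1' style suffix
    return tuple(int(part) for part in core.split("."))


# HYPOTHETICAL bulletin data for illustration only; not real Istio bulletins.
EXAMPLE_BULLETINS = [
    {"id": "EXAMPLE-BULLETIN-1", "affected": [("1.9.0", "1.9.5"), ("1.10.0", "1.10.1")]},
]


def matching_bulletins(version, bulletins):
    """Return IDs of bulletins whose affected ranges (inclusive) contain the version."""
    current = parse_version(version)
    hits = []
    for bulletin in bulletins:
        for low, high in bulletin["affected"]:
            if parse_version(low) <= current <= parse_version(high):
                hits.append(bulletin["id"])
                break
    return hits
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.5", which would put releases in the wrong order.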

Usage Modes

Snowcat is designed to work in two primary operation modes: unauthenticated and static analysis.

By default, Snowcat will attempt to enumerate and discover information about the Istio control plane using one of the three techniques described in the Technical Details section. These techniques provide access to Istio resources like AuthorizationPolicies, VirtualServices, etc. This operational mode is intended for security engineers attempting to enumerate weaknesses from an unauthenticated point of view (e.g. a compromised workload in the cluster). The resources discovered using these techniques are then scanned and exported for the operator.

In the future, we intend to implement an authenticated mode that uses Kubernetes API access and/or authenticated XDS access to obtain the same results.

Alternatively, Snowcat can scan static YAML files using the same analysis engine. This mode is intended for continuous scanning of configuration files. We’re biased here, but we firmly believe that this type of continuous security posture management frees up developers for more valuable tasks that they are uniquely qualified for. If it can be done by a computer, it should be. That’s the design philosophy behind Praetorian’s Chariot product, which provides integration with tools like this covering the entire DevSecOps continuum. Our plan is to significantly expand Chariot’s reach into the environment, and we hope to add Snowcat to the arsenal in the near future. Even before that, if you want to check out the functionality Chariot has today, you can sign up for free here.

Different Istio Deployments

There are many factors that impact the underlying implementation details of an Istio deployment. Deploying Istio in AWS versus in a Google Cloud cluster can affect which version of Istio is used, what metadata services are exposed, and many other deployment minutiae which can make the difference between a secure and insecure cluster.

For the first release of Snowcat, service provider-agnostic discovery approaches were used. For example, there is no direct (ab)use of the Google Cloud metadata services. Despite these efforts, however, Snowcat’s current discovery mechanisms are most effective in Google Cloud, where we performed the majority of our testing.

For example, when we tested Snowcat in AWS, we found that the deployed version of Istio was newer and blocked the majority of the Istiod Debug APIs, and that there were greater restrictions on accessing the Kubelet Read-Only API. We were hoping that these mechanisms would be cloud provider agnostic, but like many security challenges, the devil is in the details. Future releases of Snowcat will have additional cloud-specific discovery mechanisms to ensure the widest possible range of use.

Alternatives

Istio itself ships a configuration analyzer that operators can invoke with `istioctl analyze`. This analysis tool is not specifically designed as a security tool and covers all types of misconfiguration (e.g. it checks that your authorization policy has matching pods). Snowcat focuses specifically on security issues, and also has an explicit goal of operating in environments with limited credentials.

Technical Details

While static analysis of a dumped Istio cluster configuration is fairly straightforward, a great deal more legwork is necessary to perform the same analysis from a less privileged position.

Snowcat relies on abusing several different metadata discovery mechanisms within Istio to help gather information which is necessary for analysis. For example, identifying if third-party service accounts are properly handled requires access to the Istiod pod specification or the Debug API. We tried to describe what information sources could be used and quickly discovered a tangled web of metadata services, specifications, and potential configuration issues.

Rough attempt to map metadata discovery components to findings

In building the above map we relied on three primary discovery mechanisms: the plaintext XDS API, the Istiod Debug API, and the Kubelet Read-Only API.

Plaintext XDS

The Istio data plane uses Envoy proxies to manage traffic routing, mTLS, authentication, etc. The Istio control plane exposes information to these proxies using XDS (the “*” Discovery Service). Typically, an istio-proxy sidecar authenticates to XDS at istiod.istio-system.svc:15012 using TLS and its JWT. However, istiod also exposes a plaintext XDS on port 15010. The plaintext service does not validate JWTs by default (controlled by the XDS_AUTH_PLAINTEXT environment variable) and, as a result, is accessible to an attacker with network access to the port.
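A first step when assessing this exposure is simply to check whether the plaintext port accepts connections from the current workload. The sketch below uses only a TCP connect, so it shows reachability and nothing more; confirming that XDS answers unauthenticated requests would additionally require a gRPC client, which is out of scope here. The function name is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

From inside a cluster, calling this with `istiod.istio-system.svc` and port 15010 would indicate whether the plaintext XDS listener is reachable from that pod.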

XDS is a gRPC service with the following protobuf definitions:

gRPC service exposed on ports 15010 and 15012


Request protobuf for XDS


When making unauthenticated requests to the plaintext XDS, the most important fields in the request are the resource_names and the type_url. The type_url specifies the type of data being queried (e.g. Envoy listeners) and the resource_names control the individual resources that are returned. Istio supports a large number of type URLs not present in Envoy, as shown in the code snippet below:

Type URL generators from pkg/xds/discovery.go


Additionally, if a type URL is unrecognized, it is generally handled by the api generator. This generator is the most useful for Snowcat, as it allows querying objects in the Istio Kubernetes schema. For example, we can query the plaintext XDS for a list of all AuthorizationPolicies within the cluster. This generator is enough to provide Snowcat with all the information it needs to make decisions.
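To make the two request fields concrete, the sketch below builds the portion of a request that selects Istio configuration objects. It rests on an assumption that should be verified against the Istio version in use: that the api generator accepts type URLs of the form group/version/kind, which is the `apiVersion` of the resource followed by its `kind`. The helper names and the plain-dictionary representation are hypothetical; a real client would populate the equivalent protobuf message.

```python
def api_type_url(api_version, kind):
    """Build a group/version/kind type URL (ASSUMED format for the api generator)."""
    if api_version.count("/") != 1:
        raise ValueError("expected an apiVersion of the form 'group/version'")
    return f"{api_version}/{kind}"


def discovery_request_fields(api_version, kind, resource_names=()):
    """Return the type_url and resource_names fields of a discovery request."""
    return {
        "type_url": api_type_url(api_version, kind),
        # An empty list is used here to ask for every resource of the type.
        "resource_names": list(resource_names),
    }
```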

Istiod Debug API

In addition to XDS, Istio exposes cluster metadata through a separate debug API. This is what enables troubleshooting commands like proxy-config and proxy-status. While the debugging APIs aren’t explicitly specified, their functionality is described in documents like the Mesh Troubleshooting Architecture RFC. The exposed API also helpfully provides a webpage documenting its features when visited via the HTTP interface.

All enabled debug endpoints are exposed via the HTTP debug interface

By default, this is exposed without requiring authentication on port 8080 for any Istiod servers. Istio 1.11 and later have reduced this attack surface by requiring access to go through the istio-proxy sidecar and restricting access to many of these endpoints, but it is still possible to retrieve metadata about the cluster this way.

While the API is intended for identifying issues with proxy configuration, it makes a solid alternative for gathering configuration information to identify issues. Some interesting APIs for gathering cluster metadata are:

  • The /debug/configz endpoint exposes information about DestinationRules, AuthorizationPolicies, EnvoyFilters, and Gateways – all of which can be parsed for potential security issues.
  • /debug/syncz leaks the pod name, namespace name, and Istio version of every istio-proxy within the cluster.
  • /debug/inject describes the active injection template applied to all Istio managed pods – this can be analyzed for additional misconfigurations of the cluster.
  • /debug/endpointz describes Kubernetes service resources for the cluster.

Newer versions of Istio lock down the majority of this API, but it is still recommended to ensure that the API is not accessible from cluster workloads. Removing all service references from the cluster configuration is not sufficient to block this access either: even if there is no DNS entry to resolve an endpoint, port 8080 will still be exposed on the IP address of the Istiod pod. This means that an attacker on a workload could attempt to port scan the cluster, hunting for IPs that respond to a HEAD request to http://<IP>:8080/debug. Given that this could require scanning the entire 10.0.0.0/8 range, a brute-force scan would take a non-trivial amount of time; thanks to the Kubelet Read-Only API, however, identifying Istiod pod IPs can be much quicker.
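The probing step described here can be sketched with the standard library. The example below sends a HEAD request to `/debug` on each candidate address and, as an assumption, treats a 200 response as a sign that the debug interface is present; the function names are hypothetical, and a practical scanner would add concurrency and rate limiting.

```python
import http.client


def responds_to_debug(ip, port=8080, timeout=1.0):
    """Return True if a HEAD request to /debug on ip:port answers with 200."""
    conn = http.client.HTTPConnection(str(ip), port, timeout=timeout)
    try:
        conn.request("HEAD", "/debug")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False  # closed port, timeout, or a non-HTTP service
    finally:
        conn.close()


def find_debug_hosts(candidates, port=8080, timeout=1.0):
    """Return the candidate addresses that appear to expose the debug interface."""
    return [str(ip) for ip in candidates if responds_to_debug(ip, port, timeout)]
```

Candidates could come from `ipaddress.ip_network(...).hosts()` for a brute-force sweep, or from a much shorter list of known Istiod pod IPs.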

Kubelet Read-Only Port

Finally, another useful source of information is the cluster’s Kubelet Read-Only API. By default, this service listens on the node’s port 10255. The availability of this service is configuration dependent: AWS disables the port by default, whereas GKE enables it for metrics collection. The read-only API exposes the /pods endpoint, which returns all data for pods on that node. In Snowcat, we use this API to examine the Istio control plane and the injected sidecars. For example, we use the pod spec of Istiod to obtain the IP address of the control plane pods, which allows us to test direct connections to XDS and the Istiod Debug API.

Snippet of data returned from Kubelet API identifying the Istiod pod specification.
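Extracting the control plane addresses from the /pods response could look like the sketch below. It assumes the endpoint returns a Kubernetes PodList as JSON and that Istiod pods carry the label `app: istiod`; both function names are hypothetical, and this is an illustration rather than Snowcat's own code.

```python
import json
import urllib.request


def fetch_pods(node_ip, port=10255, timeout=3.0):
    """Fetch the pod list from a node's Kubelet read-only port."""
    url = f"http://{node_ip}:{port}/pods"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def istiod_pod_ips(pod_list):
    """Return the pod IPs of Istiod pods found in a PodList dictionary."""
    ips = []
    for pod in pod_list.get("items") or []:
        labels = (pod.get("metadata") or {}).get("labels") or {}
        pod_ip = (pod.get("status") or {}).get("podIP")
        # Assumption: the control plane pods are labelled app=istiod.
        if labels.get("app") == "istiod" and pod_ip:
            ips.append(pod_ip)
    return ips
```

Because the read-only API only reports pods scheduled on the queried node, the lookup has to be repeated per node until one hosting an Istiod pod is found.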

To prevent unwanted access to this service, Kubernetes Network Policies can be applied to control traffic to the configured Kubelet API port.

Future Work

We hope to continue work on Snowcat to expand its capabilities, both with additional features and future research into the underlying components. Some future work on our horizon includes:

  • Adding an authenticated mode to allow scanning with credentialed access to a Kubernetes cluster.
  • Additional collection strategies (e.g. extracting information from Envoy config dumps).
  • Additional support for cloud-provider specific configurations.
  • Visualization of various Istio data (e.g. authorization policies).
