You are viewing documentation for Kubernetes version: v1.24

Kubernetes v1.24 documentation is no longer actively maintained. The version you are currently viewing is a static snapshot. For up-to-date information, see the latest version.

A Closer Look at NSA/CISA Kubernetes Hardening Guidance

Tuesday, October 05, 2021

Authors: Jim Angel (Google), Pushkar Joglekar (VMware), and Savitha Raghunathan (Red Hat)

Disclaimer

The open source tools listed in this article are to serve as examples only and are in no way a direct recommendation from the Kubernetes community or authors.

Background

USA's National Security Agency (NSA) and the Cybersecurity and Infrastructure Security Agency (CISA) released, "Kubernetes Hardening Guidance" on August 3rd, 2021. The guidance details threats to Kubernetes environments and provides secure configuration guidance to minimize risk.

The following sections of this blog correlate to the sections in the NSA/CISA guidance. Any missing sections are skipped because of limited opportunities to add anything new to the existing content.

Note: This blog post is not a substitute for reading the guide. Reading the published guidance is recommended before proceeding as the following content is complementary.

Introduction and Threat Model

Note that the threats identified as important by the NSA/CISA, or the intended audience of this guidance, may be different from the threats that other enterprise users of Kubernetes consider important. This section is still useful for organizations that care about data, resource theft and service unavailability.

The guidance highlights the following three sources of compromises:

Supply chain risks
Malicious threat actors
Insider threats (administrators, users, or cloud service providers)

The threat model tries to take a step back and review threats that not only exist within the boundary of a Kubernetes cluster but also include the underlying infrastructure and surrounding workloads that Kubernetes does not manage.

For example, when a workload outside the cluster shares the same physical network, it has access to the kubelet and to control plane components: etcd, controller manager, scheduler and API server. Therefore, the guidance recommends having network level isolation separating Kubernetes clusters from other workloads that do not need connectivity to Kubernetes control plane nodes. Specifically, scheduler, controller-manager, etcd only need to be accessible to the API server. Any interactions with Kubernetes from outside the cluster can happen by providing access to API server port.

List of ports and protocols for each of these components are defined in Ports and Protocols within the Kubernetes documentation.

Special note: kube-scheduler and kube-controller-manager uses different ports than the ones mentioned in the guidance

The Threat modelling section from the CNCF Cloud Native Security Whitepaper + Map provides another perspective on approaching threat modelling Kubernetes, from a cloud native lens.

Kubernetes Pod security

Kubernetes by default does not guarantee strict workload isolation between pods running in the same node in a cluster. However, the guidance provides several techniques to enhance existing isolation and reduce the attack surface in case of a compromise.

"Non-root" containers and "rootless" container engines

Several best practices related to basic security principle of least privilege i.e. provide only the permissions are needed; no more, no less, are worth a second look.

The guide recommends setting non-root user at build time instead of relying on setting runAsUser at runtime in your Pod spec. This is a good practice and provides some level of defense in depth. For example, if the container image is built with user 10001 and the Pod spec misses adding the runAsuser field in its Deployment object. In this case there are certain edge cases that are worth exploring for awareness:

Pods can fail to start, if the user defined at build time is different from the one defined in pod spec and some files are as a result inaccessible.
Pods can end up sharing User IDs unintentionally. This can be problematic even if the User IDs are non-zero in a situation where a container escape to host file system is possible. Once the attacker has access to the host file system, they get access to all the file resources that are owned by other unrelated pods that share the same UID.
Pods can end up sharing User IDs, with other node level processes not managed by Kubernetes e.g. node level daemons for auditing, vulnerability scanning, telemetry. The threat is similar to the one above where host file system access can give attacker full access to these node level daemons without needing to be root on the node.

However, none of these cases will have as severe an impact as a container running as root being able to escape as a root user on the host, which can provide an attacker with complete control of the worker node, further allowing lateral movement to other worker or control plane nodes.

Kubernetes 1.22 introduced an alpha feature that specifically reduces the impact of such a control plane component running as root user to a non-root user through user namespaces.

That (alpha stage) support for user namespaces / rootless mode is available with the following container runtimes:

Some distributions support running in rootless mode, like the following:

Immutable container filesystems

The NSA/CISA Kubernetes Hardening Guidance highlights an often overlooked feature readOnlyRootFileSystem, with a working example in Appendix B. This example limits execution and tampering of containers at runtime. Any read/write activity can then be limited to few directories by using tmpfs volume mounts.

However, some applications that modify the container filesystem at runtime, like exploding a WAR or JAR file at container startup, could face issues when enabling this feature. To avoid this issue, consider making minimal changes to the filesystem at runtime when possible.

Building secure container images

Kubernetes Hardening Guidance also recommends running a scanner at deploy time as an admission controller, to prevent vulnerable or misconfigured pods from running in the cluster. Theoretically, this sounds like a good approach but there are several caveats to consider before this can be implemented in practice:

Depending on network bandwidth, available resources and scanner of choice, scanning for vulnerabilities for an image can take an indeterminate amount of time. This could lead to slower or unpredictable pod start up times, which could result in spikes of unavailability when apps are serving peak load.
If the policy that allows or denies pod startup is made using incorrect or incomplete data it could result in several false positive or false negative outcomes like the following:
- inside a container image, the openssl package is detected as vulnerable. However, the application is written in Golang and uses the Go crypto package for TLS. Therefore, this vulnerability is not in the code execution path and as such has minimal impact if it remains unfixed.
- A vulnerability is detected in the openssl package for a Debian base image. However, the upstream Debian community considers this as a Minor impact vulnerability and as a result does not release a patch fix for this vulnerability. The owner of this image is now stuck with a vulnerability that cannot be fixed and a cluster that does not allow the image to run because of predefined policy that does not take into account whether the fix for a vulnerability is available or not
- A Golang app is built on top of a distroless image, but it is compiled with a Golang version that uses a vulnerable standard library. The scanner has no visibility into golang version but only on OS level packages. So it allows the pod to run in the cluster in spite of the image containing an app binary built on vulnerable golang.

To be clear, relying on vulnerability scanners is absolutely a good idea but policy definitions should be flexible enough to allow:

Creation of exception lists for images or vulnerabilities through labelling
Overriding the severity with a risk score based on impact of a vulnerability
Applying the same policies at build time to catch vulnerable images with fixable vulnerabilities before they can be deployed into Kubernetes clusters

Special considerations like offline vulnerability database fetch, may also be needed, if the clusters run in an air-gapped environment and the scanners require internet access to update the vulnerability database.

Pod Security Policies

Since Kubernetes v1.21, the PodSecurityPolicy API and related features are deprecated, but some of the guidance in this section will still apply for the next few years, until cluster operators upgrade their clusters to newer Kubernetes versions.

The Kubernetes project is working on a replacement for PodSecurityPolicy. Kubernetes v1.22 includes an alpha feature called Pod Security Admission that is intended to allow enforcing a minimum level of isolation between pods.

The built-in isolation levels for Pod Security Admission are derived from Pod Security Standards, which is a superset of all the components mentioned in Table I page 10 of the guidance.

Information about migrating from PodSecurityPolicy to the Pod Security Admission feature is available in Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller.

One important behavior mentioned in the guidance that remains the same between Pod Security Policy and its replacement is that enforcing either of them does not affect pods that are already running. With both PodSecurityPolicy and Pod Security Admission, the enforcement happens during the pod creation stage.

Hardening container engines

Some container workloads are less trusted than others but may need to run in the same cluster. In those cases, running them on dedicated nodes that include hardened container runtimes that provide stricter pod isolation boundaries can act as a useful security control.

Kubernetes supports an API called RuntimeClass that is stable / GA (and, therefore, enabled by default) stage as of Kubernetes v1.20. RuntimeClass allows you to ensure that Pods requiring strong isolation are scheduled onto nodes that can offer it.

Some third-party projects that you can use in conjunction with RuntimeClass are:

As discussed here and in the guidance, many features and tooling exist in and around Kubernetes that can enhance the isolation boundaries between pods. Based on relevant threats and risk posture, you should pick and choose between them, instead of trying to apply all the recommendations. Having said that, cluster level isolation i.e. running workloads in dedicated clusters, remains the strictest workload isolation mechanism, in spite of improvements mentioned earlier here and in the guide.

Network Separation and Hardening

Kubernetes Networking can be tricky and this section focuses on how to secure and harden the relevant configurations. The guide identifies the following as key takeaways:

Using NetworkPolicies to create isolation between resources,
Securing the control plane
Encrypting traffic and sensitive data

Network Policies

Network policies can be created with the help of network plugins. In order to make the creation and visualization easier for users, Cilium supports a web GUI tool. That web GUI lets you create Kubernetes NetworkPolicies (a generic API that nevertheless requires a compatible CNI plugin), and / or Cilium network policies (CiliumClusterwideNetworkPolicy and CiliumNetworkPolicy, which only work in clusters that use the Cilium CNI plugin). You can use these APIs to restrict network traffic between pods, and therefore minimize the attack vector.

Another scenario that is worth exploring is the usage of external IPs. Some services, when misconfigured, can create random external IPs. An attacker can take advantage of this misconfiguration and easily intercept traffic. This vulnerability has been reported in CVE-2020-8554. Using externalip-webhook can mitigate this vulnerability by preventing the services from using random external IPs. externalip-webhook only allows creation of services that don't require external IPs or whose external IPs are within the range specified by the administrator.

CVE-2020-8554 - Kubernetes API server in all versions allow an attacker who is able to create a ClusterIP service and set the spec.externalIPs field, to intercept traffic to that IP address. Additionally, an attacker who is able to patch the status (which is considered a privileged operation and should not typically be granted to users) of a LoadBalancer service can set the status.loadBalancer.ingress.ip to similar effect.

Resource Policies

In addition to configuring ResourceQuotas and limits, consider restricting how many process IDs (PIDs) a given Pod can use, and also to reserve some PIDs for node-level use to avoid resource exhaustion. More details to apply these limits can be found in Process ID Limits And Reservations.

Control Plane Hardening

In the next section, the guide covers control plane hardening. It is worth noting that from Kubernetes 1.20, insecure port from API server, has been removed.

Etcd

As a general rule, the etcd server should be configured to only trust certificates assigned to the API server. It limits the attack surface and prevents a malicious attacker from gaining access to the cluster. It might be beneficial to use a separate CA for etcd, as it by default trusts all the certificates issued by the root CA.

Kubeconfig Files

In addition to specifying the token and certificates directly, .kubeconfig supports dynamic retrieval of temporary tokens using auth provider plugins. Beware of the possibility of malicious shell code execution in a kubeconfig file. Once attackers gain access to the cluster, they can steal ssh keys/secrets or more.

Secrets

Kubernetes Secrets is the native way of managing secrets as a Kubernetes API object. However, in some scenarios such as a desire to have a single source of truth for all app secrets, irrespective of whether they run on Kubernetes or not, secrets can be managed loosely coupled with Kubernetes and consumed by pods through side-cars or init-containers with minimal usage of Kubernetes Secrets API.

External secrets providers and csi-secrets-store are some of these alternatives to Kubernetes Secrets

Log Auditing

The NSA/CISA guidance stresses monitoring and alerting based on logs. The key points include logging at the host level, application level, and on the cloud. When running Kubernetes in production, it's important to understand who's responsible, and who's accountable, for each layer of logging.

Kubernetes API auditing

One area that deserves more focus is what exactly should alert or be logged. The document outlines a sample policy in Appendix L: Audit Policy that logs all RequestResponse's including metadata and request / response bodies. While helpful for a demo, it may not be practical for production.

Each organization needs to evaluate their own threat model and build an audit policy that complements or helps troubleshooting incident response. Think about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official audit logging documentation. It's crucial to tune your audit logs to only include events that meet your threat model. A minimal audit policy that logs everything at metadata level can also be a good starting point.

Audit logging configurations can also be tested with kind following these instructions.

Streaming logs and auditing

Logging is important for threat and anomaly detection. As the document outlines, it's a best practice to scan and alert on logs as close to real time as possible and to protect logs from tampering if a compromise occurs. It's important to reflect on the various levels of logging and identify the critical areas such as API endpoints.

Kubernetes API audit logging can stream to a webhook and there's an example in Appendix N: Webhook configuration. Using a webhook could be a method that stores logs off cluster and/or centralizes all audit logs. Once logs are centrally managed, look to enable alerting based on critical events. Also ensure you understand what the baseline is for normal activities.

Alert identification

While the guide stressed the importance of notifications, there is not a blanket event list to alert from. The alerting requirements vary based on your own requirements and threat model. Examples include the following events:

Changes to the securityContext of a Pod
Updates to admission controller configs
Accessing certain files / URLs

Additional logging resources

Upgrading and Application Security practices

Kubernetes releases three times per year, so upgrade-related toil is a common problem for people running production clusters. In addition to this, operators must regularly upgrade the underlying node's operating system and running applications. This is a best practice to ensure continued support and to reduce the likelihood of bugs or vulnerabilities.

Kubernetes supports the three most recent stable releases. While each Kubernetes release goes through a large number of tests before being published, some teams aren't comfortable running the latest stable release until some time has passed. No matter what version you're running, ensure that patch upgrades happen frequently or automatically. More information can be found in the version skew policy pages.

When thinking about how you'll manage node OS upgrades, consider ephemeral nodes. Having the ability to destroy and add nodes allows your team to respond quicker to node issues. In addition, having deployments that tolerate node instability (and a culture that encourages frequent deployments) allows for easier cluster upgrades.

Additionally, it's worth reiterating from the guidance that periodic vulnerability scans and penetration tests can be performed on the various system components to proactively look for insecure configurations and vulnerabilities.

Finding release & security information

To find the most recent Kubernetes supported versions, refer to https://k8s.io/releases, which includes minor versions. It's good to stay up to date with your minor version patches.

If you're running a managed Kubernetes offering, look for their release documentation and find their various security channels.

Subscribe to the Kubernetes Announce mailing list. The Kubernetes Announce mailing list is searchable for terms such as "Security Advisories". You can set up alerts and email notifications as long as you know what key words to alert on.

Conclusion

In summary, it is fantastic to see security practitioners sharing this level of detailed guidance in public. This guidance further highlights Kubernetes going mainstream and how securing Kubernetes clusters and the application containers running on Kubernetes continues to need attention and focus of practitioners. Only a few weeks after the guidance was published, an open source tool kubescape to validate cluster against this guidance became available.

This tool can be a great starting point to check the current state of your clusters, after which you can use the information in this blog post and in the guidance to assess where improvements can be made.

Finally, it is worth reiterating that not all controls in this guidance will make sense for all practitioners. The best way to know which controls matter is to rely on the threat model of your own Kubernetes environment.

A special shout out and thanks to Rory McCune (@raesene) for his inputs to this blog post

←Previous