

Kubernetes RBAC 101: Overview


Member Post

Guest post originally published on the Kublr blog by Oleg Chunikhin

Cloud native and open source technologies have modernized how we develop software, and although they have led to unprecedented developer productivity and flexibility, they were not built with enterprise needs in mind.

A primary challenge is bridging the gap between cloud native and enterprise reality. Enterprises need a centralized Kubernetes management control plane with logging and monitoring that supports security and governance requirements extended through essential Kubernetes frameworks.

But the job doesn’t end with reliable, enterprise-grade Kubernetes clusters. Organizations are also struggling to define new practices around this new stack. They find they must adjust established practices and processes and learn how to manage these new modern applications. Managing roles and permissions is part of that learning process.

Role-based access control (RBAC) is critical, but it can cause quite a bit of confusion. Organizations seek guidance on where to start, what can be done, and what a real-life implementation looks like. In this first blog in a three-part series on Kubernetes RBAC, we’ll provide an overview of the terminology and available authentication and authorization methods. Parts two and three will take a deeper dive into authentication and authorization.

RBAC is a broad topic. Keeping practicality in mind, we’ll focus on those methods that are most useful for enterprise users.

What is Kubernetes RBAC?

RBAC, or role-based access control, is a way to define which users can do what within a Kubernetes cluster. These roles and permissions are defined declaratively or through various extensions.

If you are familiar with Kubernetes, you already know that there are different resources and subjects. But if this is new to you, here is a quick summary.

Kubernetes provides a single API endpoint through which users manage containers across multiple distributed physical and virtual nodes. Following standard REST conventions, everything managed within Kubernetes is handled as a resource; objects managed by the Kubernetes API server are available as API objects like pods, nodes, config maps, secrets, deployments, etc.

In addition to resources, you also need to consider subjects and operations which are all connected through access control.

Operations and Subjects

Operations on resources are expressed through HTTP verbs sent to the API. Based on the REST URL called, Kubernetes translates the HTTP verbs of incoming requests into a wider set of operations. For example, while a GET verb applied to a specific Kubernetes object is interpreted as a “get” operation for that object, a GET verb applied to a class of objects in the Kubernetes API is interpreted as a “list” operation. This distinction is important when writing Kubernetes RBAC rules, as we’ll explain in detail in part three (RBAC 101: Authorization).

Subjects represent actors in the Kubernetes API and RBAC, namely processes, users, or clients that call the API and perform operations on Kubernetes objects. There are three subject categories: users, groups, and service accounts. Technically, only service accounts exist as objects within the Kubernetes cluster API; users and groups are virtual. They don’t exist in the Kubernetes database, but Kubernetes identifies them by a string ID.

When an API request is sent to the Kubernetes API, Kubernetes will first authenticate the request by identifying the user and the groups the sender belongs to. Depending on the authentication method used, Kubernetes may also extract additional information and represent it as a map of key-value pairs associated with the subject.

Resource versus Non-Resource Requests

The Kubernetes API server adds an additional attribute to the request by tagging it as a resource or a non-resource request. This is necessary because, in addition to operating on resources and objects via the Kubernetes API, users can also send requests to non-resource API endpoints, such as the “/version” URL, the list of available APIs, and other metadata.

For an API resource request, Kubernetes decodes the API request verb, namespace (in case of a namespaced resource), API group, resource name, and, if available, sub-resource. The set of attributes for API non-resource requests is smaller: an HTTP request verb and request path. The access control framework uses these attributes to analyze and decide whether a request should be authorized or not.

Kubernetes API Request Attributes

With non-resource requests, the HTTP request verb is obvious. In the case of resource requests, the verb gets mapped to an API resource action. The most common actions are get, list, create, delete. But there are also some less evident actions, such as watch, patch, bind, escalate, and use.
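To make this concrete, here is how both kinds of attributes surface in authorization rules; this is a minimal sketch with illustrative names, and rules like these are the subject of part three:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: basic-reader
rules:
# resource request attributes: API group, resource, and action
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
# non-resource request attributes: request path and HTTP verb
- nonResourceURLs: ["/version", "/healthz"]
  verbs: ["get"]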

Authentication Methods for Kubernetes 

There are a number of authentication mechanisms, from client certificates to bearer tokens to HTTP basic authentication to authentication proxy.

Client certificates. There are two ways to sign client certificates so clients can use them to authenticate their Kubernetes API server requests. One is manual creation of a certificate signing request (CSR) that is signed by an administrator, or signing the certificate through an enterprise certificate authority (PKI) infrastructure, in which case the external infrastructure signs the client certificates.

Another way, which doesn’t require external infrastructure (although it is not suitable for large-scale deployments), is leveraging Kubernetes itself, which can also sign client certificates.
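As a sketch of this second approach on a 2020-era cluster (the user name and file names are illustrative, and the API version and required fields vary across Kubernetes versions), a CSR can be submitted to and approved by the cluster itself:

# generate a private key and a CSR for user "jane" in group "developers"
openssl genrsa -out jane.key 2048
openssl req -new -key jane.key -subj "/CN=jane/O=developers" -out jane.csr

# submit the CSR to the cluster, then approve it as an administrator
kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  request: $(base64 < jane.csr | tr -d '\n')
  usages: ["digital signature", "key encipherment", "client auth"]
EOF
kubectl certificate approve jane

# fetch the signed certificate for use in a kubeconfig
kubectl get csr jane -o jsonpath='{.status.certificate}' | base64 -d > jane.crt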

Bearer token. There are a number of ways to get a bearer token. Bootstrap and node authentication tokens won’t be covered here, as they are mostly used internally by Kubernetes for initialization and bootstrapping. Static token files are another option we won’t discuss, because they are considered bad practice and insecure. The most practical and useful methods are service account tokens and OIDC tokens, and we’ll cover them in detail in our next blog.

HTTP basic auth. HTTP basic auth is considered insecure as it can only be done through static configuration files in Kubernetes.

Authentication proxy. Mainly used by vendors, authentication proxies are often applied to set up different Kubernetes architectures. A proxy server processes requests to the Kubernetes API, with a trusted connection established between the Kubernetes API and the proxy. The proxy can authenticate users and clients any way it likes, and it adds the user identification into the request headers for requests sent through to the Kubernetes API. This allows the Kubernetes API to know who is calling it. Kublr, for example, uses this method to proxy dashboard requests, general web console requests, or provide a proxy Kubernetes API endpoint. Again, if you aren’t a vendor, you don’t really need to worry about this.

Impersonation. If you already have credentials that provide access to the Kubernetes API, those credentials can be used to “impersonate” users by sending additional headers in the request with the impersonated user’s identity information. The Kubernetes API will switch your authentication context to that impersonated user based on the headers. Naturally, this capability is only available if the “main” user account has permission to impersonate.
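For example, with kubectl, assuming your own credentials are allowed to impersonate:

# kubectl adds Impersonate-User / Impersonate-Group headers to the request;
# "jane" and "developers" are illustrative names
kubectl get pods --as=jane --as-group=developers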

Authorization Methods for Kubernetes

There are a few ways to manage authorization requests in Kubernetes.

First, we will quickly scan through the methods that you will not see or use in the everyday Kubernetes administrator’s life. Node authorization is used internally to authorize kubelet’s API calls, and should never be used by other clients. Authorization methods, such as ABAC and AlwaysDeny / AlwaysAllow are rarely used in real-life clusters: ABAC is based on a static config file and is considered insecure, and AlwaysDeny / AlwaysAllow are generally used for testing and are not approaches you’d use for production deployments.

WebHook is an external service the Kubernetes API server can call when it needs to decide whether a request should be allowed. The API for this service is well documented in the Kubernetes documentation; in fact, the Kubernetes API server itself exposes the same review API. The most common use case for this mechanism is extensions: extension API servers provide authorization webhook endpoints to the main API server to authorize access to extension objects.
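The exchange is based on SubjectAccessReview objects: the webhook receives the request attributes and answers whether the call should be allowed. A rough sketch of such a review object, with illustrative attribute values:

apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: jane
  groups: ["developers"]
  resourceAttributes:
    namespace: default
    verb: get
    group: ""
    resource: pods
# the webhook responds by filling in status.allowed (true or false)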

From a practical standpoint, the most useful authorization method is RBAC. RBAC is based on declarative definitions of permissions, stored and managed as cluster API objects. The main objects are roles and cluster roles, both representing a set of permissions on certain objects in the API. Those objects are identified by API groups, resource names, and the actions performed on them. You can have a number of rules within a role or cluster role object.
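As a preview of part three, here is a minimal sketch of a role and a role binding; all names are illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]        # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: jane             # a "virtual" user identified by its string ID
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io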

To properly authorize users in a production grade deployment, it’s important to use RBAC. In the next blogs in this series, we’ll discuss how you can set up and use RBAC.

Conclusion

As we’ve seen, everything managed by Kubernetes is referred to as a resource. Operations on those resources are expressed as HTTP verbs, and subjects are the actors who’ll need to be authenticated and authorized.

There are a few ways to authenticate subjects: client certificates, bearer tokens, HTTP basic auth, an authentication proxy, or impersonation (which requires prior authentication). Of these, only client certificates and bearer tokens (service account and OIDC tokens) are viable options for production deployments. We’ll explore them in more detail in our next blog (Kubernetes RBAC 101: Authentication).

For authorization, you also have a few options. There is ABAC, AlwaysDeny / AlwaysAllow, WebHook, and RBAC. Here too, only one option is viable for external clients in production deployments and that is RBAC. We’ll cover it in detail in part three of this series (Kubernetes RBAC 101: Authorization).

If you’d like to experiment with RBAC, download Kublr and play around with its RBAC feature. The intuitive UI helps flatten the steep learning curve of dealing with RBAC YAML files.

Testing Kubernetes Deployments within CI Pipelines


Member Post

Guest post originally published on eficode Praqma by Michael Vittrup Larsen, Cloud Infrastructure and DevOps Consultant at Eficode-Praqma

Low overhead, on-demand Kubernetes clusters deployed on CI Workers Nodes with KIND

How to test Kubernetes artifacts like Helm charts and YAML manifests in your CI pipelines with a low-overhead, on-demand Kubernetes cluster deployed with KIND – Kubernetes in Docker.

Containers have become very popular for packaging applications because they solve the dependency management problem. An application packaged in a container includes all necessary run-time dependencies so it becomes portable across execution platforms. In other words, if it works on my machine it will very likely also work on yours.

Automated testing is ubiquitous in DevOps and we should containerize our tests for exactly the same reasons as we containerize our applications: if a certain test validates reliably on my machine it should work equally well on yours, irrespective of which libraries and tools you have installed natively.

Testing with Containers

The following figure illustrates a pipeline (or maybe two, depending on how you organize your pipelines) where the upper part builds and packages the application in a container and the lower part does the same with the tests that will be used to validate the application. The application is only promoted if the container-based tests pass.

[Figure: test operations through the network]

If we assume that the application is a network-attached service where black-box testing can be executed through network connectivity, a setup like the one above is easily implemented by:

  1. Build application and test containers, e.g. using ‘docker build …’
  2. Start an instance of the application container attached to a network, e.g. with ‘docker run …’
  3. Start an instance of the test container attached to the same network as the application, e.g. with ‘docker run …’
  4. The exit code of the test container determines the application test result

This is illustrated in the figure below.

[Figure: test operations through the network]
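Concretely, steps 1 through 4 can be scripted with plain Docker commands; the image names match the compose file below, while the network name and build paths are illustrative:

# 1. build the application and test containers
docker build -t application:latest ./app
docker build -t test-container:latest ./test

# 2./3. run the application and the test on a shared network
docker network create testnet
docker run -d --name application --network testnet application:latest
docker run --rm --network testnet \
  -e APPLICATION_URL=http://application:8080 \
  test-container:latest

# 4. the exit code of the test container run is the test result
echo "Test exit code: $?"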

Steps 2 through 4 outlined above can also be described in a docker-compose definition with two services, e.g. as follows (the test container is configured with the application’s network location through an environment variable):

version: '3.7'
services:
  test:
    image: test-container:latest
    environment:
      APPLICATION_URL: http://application:8080
    depends_on:
      - application
  application:
    image: application:latest
    ports:
      - 8080:8080

A test using the two containers can now be executed with:

docker-compose up --exit-code-from test

Testing Kubernetes Artifacts in the CI Pipeline

The process described above works well for tests at the ‘container level’. But what if the output artifacts of the CI pipelines include Kubernetes artifacts, e.g. YAML manifests or Helm charts, or need to be deployed to a Kubernetes cluster to be validated? How do we test in those situations?

One option is to have a Kubernetes cluster deployed which the CI pipelines can deploy to. However, this gives us some issues to consider:

  • A shared cluster which all CI pipelines can deploy to basically becomes a multi-tenant cluster which might need careful isolation, security, and robustness considerations.
  • How do we size the CI Kubernetes cluster? Most likely the cluster capacity will be disconnected from the CI worker capacity i.e. they cannot share compute resources. This will result in low utilization. Also, we cannot size the CI cluster too small because we do not want tests to fail due to other pipelines temporarily consuming resources.
  • We might want to test our Kubernetes artifacts against many versions and configurations of Kubernetes, i.e. we basically need N CI clusters available.

We could also create a Kubernetes cluster on demand for each CI job. This requires:

  • Access to a cloud-like platform where we can dynamically provision Kubernetes clusters.
  • Giving our CI pipelines the necessary privileges to create infrastructure, which might be undesirable from a security point of view.

For some test scenarios we need a production-like cluster and we will have to consider one of the above solutions, e.g. characteristics tests or scalability tests. However, in many situations, the tests we want our CI pipelines to perform can be managed within the capacity of a single CI worker node. The following section describes how to create on-demand clusters on a container-capable CI worker node.

On-Demand Private Kubernetes Cluster with KIND

Kubernetes-in-Docker (KIND) is an implementation of a Kubernetes cluster using Docker-in-Docker (DIND) technology. Docker-in-Docker means that we can run containers inside containers, and those inner containers are only visible inside the outer container. KIND uses this to implement a cluster: each outer container implements a Kubernetes cluster node, and when a Kubernetes pod is started on a node it is implemented as containers inside the outer node container.

With KIND we can create on-demand and multi-node Kubernetes clusters on top of the container capabilities of our CI worker node.

[Figure: a KIND Kubernetes cluster]

The cluster capacity will obviously be limited by CI worker node capacity, but otherwise the Kubernetes cluster will have many of the capabilities of a production cluster, including HA capabilities.

Let’s demonstrate how to test an application deployed with Helm to a KIND cluster. The application is the k8s-sentences-age application, which can be found on Github, including a Github action that implements the CI pipeline described in this blog. The application is a simple service that returns a random number (an ‘age’) between 0 and 100 and also provides appropriate Prometheus-compatible metrics.

Installing KIND

KIND is a single executable, named kind, which basically talks to the container runtime on the CI worker. It will create an (outer) container for each node in the cluster using container images containing the Kubernetes control-plane. An example of installing kind as part of a Github action can be found here.
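A minimal installation on a Linux CI worker looks like this; the version number is illustrative, so pick the release that matches your pipeline:

# download the kind binary and put it on the PATH
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.8.1/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
kind version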

Creating a Cluster

With the kind tool our CI pipelines can create a single node Kubernetes cluster with the following command:

kind create cluster --wait 5m

We can also create multi-node clusters if we need them for our tests. Multi-node clusters require a configuration file that lists node roles:

# config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

With the above configuration file we can create a three-node cluster with the following command:

kind create cluster --config config.yaml

We can specify which container image the KIND Kubernetes nodes should use and thereby control the version of Kubernetes:

kind create cluster --image "kindest/node:v1.16.4"

With this we can easily test compatibility against multiple versions of Kubernetes as part of our CI pipeline.
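For example, a pipeline can loop over several node images; the tags below are illustrative, so check the kind release notes for the tags published for your kind version:

for version in v1.15.11 v1.16.4 v1.17.5; do
  kind create cluster --name "ci-${version}" \
    --image "kindest/node:${version}" --wait 5m
  # ... load images, install the chart, and run the test job here ...
  kind delete cluster --name "ci-${version}"
done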

Building Application Images and Making Them Available to KIND

The example k8s-sentences-age application is packaged in a container named ‘age’ and the tests for the application are packaged in a container named ‘age-test’. These containers are built in the usual way as follows:

docker build -t age:latest ../app
docker build -t age-test:latest .

We can make the new version of these images available to our KIND Kubernetes nodes with the following command:

kind load docker-image age:latest
kind load docker-image age-test:latest

Loading the images onto KIND cluster nodes copies the image to each node in the cluster.

Running a Test

Our pipeline will deploy the application using its Helm chart and run the tests against this deployed application instance.

Deploying the application with the application Helm chart means that we not only test the application container when deployed to Kubernetes, but we also validate the Helm chart itself. The Helm chart contains the YAML manifests defining the application Kubernetes blueprint and this is particularly important to validate – not only against different versions of Kubernetes, but also in various configurations, e.g. permutations of values given to the Helm chart.

We install the application with the following Helm command. Note that we override the Helm chart default settings for image repository, tag, and pullPolicy such that the local image is used.

helm install --wait age ../helm/age \
--set image.repository=age \
--set image.tag=latest \
--set image.pullPolicy=Never

The test container is deployed using a Kubernetes Job resource. Kubernetes Job resources define workloads that run to completion and report completion status. The job will use the local ‘age-test’ container image we built previously and will connect to the application pod(s) using the URLs provided in environment variables. The URLs reference the Kubernetes service created by the Helm chart.

apiVersion: batch/v1
kind: Job
metadata:
  name: component-test
spec:
  template:
    metadata:
      labels:
        type: component-test
    spec:
      containers:
      - name: component-test
        image: age-test
        imagePullPolicy: Never
        env:
        - name: SERVICE_URL
          value: http://age:8080
        - name: METRICS_URL
          value: http://age:8080/metrics
      restartPolicy: Never

The job is deployed with this command:

kubectl apply -f k8s-component-test-job.yaml

Checking the Test Result

We need to wait for the component test job to finish before we can check the result. The kubectl tool allows waiting for various conditions on different resources, including job completion, i.e. our pipeline will wait for the test to complete with the following command:

kubectl wait --for=condition=complete \
--timeout=1m job/component-test

The component test job will have test results as part of its logs. To include these results as part of the pipeline output we print the logs of the job with kubectl and with a label selector to select the job pod.

kubectl logs -l type=component-test

The overall status of the component test is read from the job’s .status.succeeded field and stored in a SUCCESS variable as shown below. If the status indicates failure, the pipeline terminates with an error:

SUCCESS=$(kubectl get job component-test \
-o jsonpath='{.status.succeeded}')
if [ "$SUCCESS" != "1" ]; then exit 1; fi
echo "Component test successful"

The full pipeline can be found in the k8s-sentences-age repository on Github.

It is worth noting here that starting a test job and validating the result is what a helm test does. Helm test is a way of formally integrating tests into Helm charts such that users of the chart can run these tests after installing the chart. It therefore makes good sense to include the tests in your Helm charts and make the test container available to users of the Helm chart. To include the test job above into the Helm chart we simply need to add the annotation shown below and include the YAML file as part of the chart.

...
metadata:
  name: component-test
  annotations:
    "helm.sh/hook": test

When a KIND Cluster Isn’t Sufficient

In some situations a local Kubernetes cluster on a CI worker might not be ideal for your testing purposes. This could be when:

  • Unit tests that call functions or use classes from the application directly. In this case the application and tests most likely form a single container, which can be executed without Kubernetes.
  • Component tests that involve no Kubernetes-related artifacts. If the example shown above did not have a Helm chart to test, the docker-compose solution would have been sufficient.
  • Characteristics tests, e.g. measuring the performance and scalability of your application. In such situations you need infrastructure that is more stable with respect to capacity.
  • Integration tests that depend on other artifacts that cannot easily be deployed in the local KIND cluster, like a large database with customer data.
  • Functional, integration, or acceptance tests that require the whole ‘application’ to be deployed. Some applications might not fit within the limited size of the KIND cluster.
  • Tests with external dependencies, e.g. cloud provider-specific ingress/load balancing, storage solutions, key management services, etc. In some cases these can be simulated by deploying e.g. a database on the KIND cluster, and in other cases they cannot.

However, there are still many cases where testing with a KIND Kubernetes cluster is ideal, e.g. when you have Kubernetes-related artifacts to test like a Helm chart or YAML manifests, and when an external CI/staging Kubernetes cluster involves too much maintenance overhead or is too resource-inefficient.

Identifying Kubernetes Config Security Threats: Pods Running as Root


Member Post

Guest post by Joe Pelletier, VP of Strategy at Fairwinds

With different teams – development, security, and operations – and the prioritization of speedy delivery over perfect configuration, mistakes are inevitable, especially if the only safeguard is an application developer remembering to adjust Kubernetes’ default configurations. Security, efficiency, and reliability end up suffering.

Having individual contributors design their own Kubernetes security configuration all but ensures inconsistency and mistakes. It doesn’t often happen intentionally; often it’s because engineers are focused on getting containers to run in Kubernetes. Unfortunately, many neglect to revisit configurations along the way, causing gaps in security and efficiency.

A prime example is overpermissioning a deployment with root access to just get something working. Malicious attackers are constantly looking for holes to exploit and root access is ideal for them.

Platform teams responsible for security can attempt to manually go through each pod to check for misconfigured deployments. But many DevOps teams are under-staffed and don’t have the bandwidth to manually inspect every change introduced by a variety of engineering teams. They need a way to proactively audit workloads and validate configurations to identify weaknesses, container vulnerabilities, and misconfigured deployments. Configuration validation provides a tool to proactively identify holes in security instead of waiting for a breach to happen.

Kubernetes configuration validation ensures consistent security: 

  • Built-in centralized control: Often security teams require DevOps to implement a variety of infrastructure controls to meet internal standards. When it comes to Kubernetes, most security teams lack visibility beyond basic metrics, straining the relationship with DevOps. Consolidating this data in a single location bridges the gap between these two stakeholders.
  • Reduced risk of mistakes: A configuration validation platform dramatically reduces the risk of errors from either Kubernetes inexperience or oversight by adding an expert configuration review into the development process.
  • Ensured security: Configuration validation is key to security for Kubernetes; getting it right dramatically reduces the risk of security incidents in production. Configuration validation ensures that security best practices are being followed organization-wide.

Platform teams can opt to build their own tool for absolute control, but few companies gain a competitive edge from having their own tool. There are open source options available, but teams must evaluate, manage, and maintain them, and building a holistic platform can be time-consuming.

As the fastest way to identify Kubernetes misconfiguration, a purpose-built solution offers baked-in guidance curated by Kubernetes experts with dedicated support when needed. It allows teams to focus time on developing and deploying applications while simplifying operations.

Check out an example of a configuration validation solution. 

Learn more about configuration validation by visiting https://www.fairwinds.com/.

Interested in the Future of Cloud Native Observability? Join SIG-Observability


CNCF Staff Post

The Special Interest Group for observability was formed recently under the umbrella of CNCF, with the goal of fostering the ecosystem around observation of cloud native workloads. Chairs Matt Young and Richard Hartmann are spearheading the SIG’s activities, which include producing supporting material and best practices for end users as well as providing guidance for CNCF observability-related projects.

Among its stated scope:

  • Identify and report gaps in the CNCF’s project portfolio on topics of observability to the TOC and the wider CNCF community.
  • Collect, curate, champion, and disseminate patterns and current best practices related to the observation of cloud-native systems that are effective and actionable. Educate and inform users with unbiased, accurate, and pertinent information. Educate and help other CNCF projects regarding observability techniques and best current practices available within the CNCF.
  • Provide and maintain a vendor-neutral venue for relevant thought validation, discussion, and project feedback.
  • Provide a ladder for community members to become involved with the technical oversight of projects within the SIG’s scope in an open, transparent, and inclusive way.

Hartmann says that while “it’s always the right time” for an observability SIG, the organizers got together now because of the recent growth in the area of observability within CNCF. “Up to now, you mainly had Prometheus, but there are more and more efforts around [observability],” he says. “Cortex and Thanos are up for review to move from the sandbox to incubating. OpenMetrics is finally moving. OpenTelemetry is progressing. We needed a space to talk about the cooperation that will come from all of this.”

The short-term goals include “working through both the review and project progression backlog first, and we are making great strides here,” Hartmann says. “We also want to talk about BCPs, best current practices, so we’re able to actually make suggestions for how to operate observability in a cloud native manner.”

Looking ahead, Hartmann says the SIG is starting to talk about data analysis, “which will most likely lay some groundwork for machine learning and AI, i.e., doing data science on your monitoring data.”

As to why he and Young decided to take on this work, he says, “Personally, as silly as this might sound, I want to make the world a better place. Make things cleaner. You know the phrase from The Dark Knight, ‘Some people just want to see the world burn’? My tagline is ‘People just want to see the world turn.’”

To that end, he says he’s focused on “leading calls and conversations, sniffing out and making explicit the agreements between people which they don’t see, and bringing this to consensus.”

If you’re interested in getting involved with SIG-Observability, you are invited to attend the SIG call on the 2nd and 4th Tuesdays of every month at 1600 UTC. (See details on the CNCF Community Calendar.) Or join the conversation in the #sig-observability channel on the CNCF Slack. 

Introducing the CNCF Technology Radar


Today, we are publishing our first CNCF Technology Radar, a new initiative from the CNCF End User Community. This is a group of over 140 top companies and startups that meet regularly to discuss challenges and best practices when adopting cloud native technologies. The goal of the CNCF Technology Radar is to share which tools are actively being used by end users, which tools they would recommend, and their patterns of usage.

Slides: github.com/cncf/enduser-public/blob/master/CNCFTechnologyRadar.pdf

How it works

A technology radar is an opinionated guide to a set of emerging technologies. The popular format originated at Thoughtworks and has been adopted by dozens of companies including Zalando, AOE, Porsche, Spotify, and Intuit.

The key idea is to place solutions at one of four levels, reflecting advice you would give to someone who is choosing a solution:

  • Adopt: We can clearly recommend this technology. We have used it for long periods of time in many teams, and it has proven to be stable and useful.
  • Trial: We have used it with success and recommend you take a closer look at the technology.
  • Assess: We have tried it out, and we find it promising. We recommend having a look at these items when you face a specific need for the technology in your project.
  • Hold: This category is a bit special. Unlike the other categories, we recommend you hold off on using something. That does not mean that these technologies are bad, and it often might be OK to use them in existing projects. But technologies are moved to this category if we think we shouldn’t use them because we see better options or alternatives now.

The CNCF Technology Radar is inspired by the format but with a few differences:

  • Community-driven: The data is contributed by the CNCF End User Community and curated by community representatives.
  • Focuses on future adoption, so there are only three rings: Assess, Trial, and Adopt.
  • Instead of covering several hundred items, one radar will display 10-20 items on a specific use case. This removes the need to organize into quadrants.
  • Instead of publishing annually, the cadence will be on a shorter time frame, targeting quarterly.

Our first technology radar focuses on Continuous Delivery.

CNCF Technology Radar: Continuous Delivery, June 2020

During May 2020, the members of the End User Community were asked which CD solutions they had assessed, trialed, and subsequently adopted. 177 data points were sorted and reviewed to determine the final positions.

This may be read as:

  • Flux and Helm are widely adopted, and few or none of the respondents recommended against them.
  • Multiple companies recommend CircleCI, Kustomize, and GitLab, but something was lacking in the results; for example, not enough responses, or a few respondents recommended against them.
  • Projects in Assess lacked clear consensus. For example, Jenkins has wide awareness, but the placement in Assess reflects comments from companies that are moving away from Jenkins for new applications. Spinnaker also showed broad awareness, but while many had tried it, none in this cohort positively recommended adoption. Those who are looking for a new CD solution should consider those in Assess given their own requirements.

The Themes

The themes describe interesting patterns and editor observations:

  1. Publicly available solutions are combined with in-house tools: Many end users had tried up to 10 options and settled on adopting 2-4. Several large enterprise companies have built their own continuous delivery tools and open sourced components, including LunarWay’s release-manager, Box’s kube-applier, and stackset-controller from Zalando. The public cloud managed solutions on the CNCF landscape were not suggested by any of the end users, which may reflect the options available a few years ago.
  2. Helm is more than packaging applications: While Helm has not positioned itself as a Continuous Delivery tool (it’s the Kubernetes package manager first), it’s widely used and adopted as a component in different CD scenarios. 
  3. Jenkins is still broadly deployed, while cloud native-first options emerge. Jenkins and its ecosystem tools (Jenkins X, Jenkins Blue Ocean) are widely evaluated and used. However, several end users stated Jenkins is primarily used for existing deployments, while new applications have migrated to other solutions. Hence end users who are choosing a new CD solution should assess Jenkins alongside tools that support modern concepts such as GitOps (for example, Flux).

The Editor

Cheryl Hung is the Director of Ecosystem at CNCF. Her mission is to make end users successful and productive with cloud native technologies such as Kubernetes and Prometheus. Twitter: @oicheryl

Read more

CNCF Projects for Continuous Delivery: 

  • Argo is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. A CNCF incubating project, it is composed of Argo CD, Argo Workflows, and Argo Rollouts.
  • Flux is the open source GitOps operator for Kubernetes. It is a CNCF sandbox project.
  • Helm is the open source package manager for Kubernetes. It recently graduated within CNCF.

Case studies: Read how Babylon and Intuit are handling continuous delivery.

What’s next

The next CNCF Technology Radar is targeted for September 2020, focusing on a different topic in cloud native such as security or storage. Vote to help decide the topic for the next CNCF Technology Radar.

Join the CNCF End User Community to: 

  • Find out who exactly is using each project and read their comments
  • Contribute to and edit future CNCF Technology Radars. Subsequent radars will be edited by people selected from the End User Community.

We are excited to provide this report to the community, and we’d love to hear what you think. Email feedback to info@cncf.io.

About the methodology

In May 2020, the 140 companies in the CNCF End User Community were asked to describe what their companies recommended for different solutions: Hold, Assess, Trial, or Adopt. They could also give more detailed comments. As the answers were submitted via a Google Spreadsheet, they were neither private nor anonymized within the group.

33 companies submitted 177 data points on 21 solutions. These were sorted in order to determine the final positions. Finally, the themes were written to reflect broader patterns, in the opinion of the editors.

Statement from CNCF General Manager Priyanka Sharma on the Black Lives Matter Movement


CNCF stands in solidarity with the Black Lives Matter movement and racial equality for all. As a foundation that serves a diverse, global ecosystem of members, we also stand in solidarity with members of our community who challenge us all to do better — not just for right now — but for two months from now, two years from now, and beyond that.

For several weeks, CNCF has been listening, watching, learning, and feeling. We are not the heroes of this movement, Black people are, but what’s happening — and what needs to happen — matters to all of us. One way we are supporting the movement is by amplifying the voices of our community members through our social channels – see a short list below this post and feel free to ping us if you want to see yours amplified as well. As part of our everyday work, we provide diversity scholarships and diversity and inclusion activities at our events, support intern programs, sponsor programs to bring student developers into open source development, help developers and first time open source contributors to contribute to open source communities, and provide support for anyone who wants to present at CNCF events. Recent events have taught us that we need to understand, empathize, and do more. 

In my view, open source often reflects what is best in our society. It brings together people from across the world, working on problems relevant to anyone touched by technology, for the better of our collective human race. The challenges faced by underrepresented communities, however, are meaningful even in this supportive environment. Systemic issues with the education system have prevented many talented individuals from discovering and thriving in our industry.  

So I call out to each and every one of us in the cloud native ecosystem – do for others what you did for me when I first joined the community. Seek out and support members of our community with mentorship, guidance, and collaboration. If each one of us helps one person, that’s more opinions and experiences that will make our community and technology that much better. 

The very core of open source is to welcome, collaborate, and unify. If you have a beautiful story to share, reach out to us. Let’s work together to change the narrative of our society. Let’s work together, no matter how long it takes – until Black Lives Matter is no longer a statement but is just part of our collective quilt.

Thank you.


Kubernetes Resources Management – QoS, Quota, and LimitRange


Member Post

Guest post originally published on the Darumatic blog by Brandon Tsai

Before Kubernetes, software applications were typically run standalone in a VM and used up all of its resources. Operators and developers needed to carefully choose the VM size for running them. In Kubernetes, however, pods/containers can run on any machine, which requires sharing resources with others. That is where Quality of Service (QoS) classes and resource quotas come in.

Resource Request and Limits

When you create a pod for your application, you can set requests and limits for CPU and memory for every container inside. Properly setting these values is the only way to instruct Kubernetes on how to reserve enough resources for your applications.

For example,

spec:
  containers:
  - image: k8s/hello-k8s
    name: hello-k8s
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
      limits:
        cpu: 200m
        memory: 400Mi

Requests: These values are used for scheduling. They represent the minimum amount of resources a container needs to run. A pod will remain in the “Pending” state if no node has enough free resources.

Limits: The maximum amount for this kind of resource that the node will allow the containers to use.

  • If a container attempts to exceed the specified CPU limit, the system will throttle the container.
  • If a container exceeds the specified memory limit, it will be terminated and potentially restarted, depending on the container restart policy.

Quality of Service Classes (QoS)

A node can be overcommitted when it has pods scheduled that make no requests, or when the sum of limits across all pods on that node exceeds the available machine capacity. In an overcommitted environment, the pods on the node may attempt to use more compute resources than are available at any given point in time.

When this occurs, the node must give priority to one container over another. Containers with the lowest priority are terminated or throttled first. The entity used to make this decision is referred to as the Quality of Service (QoS) class.

Priority | Class name | Description
1 (highest) | Guaranteed | Limits and optionally requests are set (not equal to 0) for all resources, and they are equal.
2 | Burstable | Requests and optionally limits are set (not equal to 0) for all resources, and they are not equal.
3 (lowest) | BestEffort | Requests and limits are not set for any of the resources.

Therefore, if the developer does not declare CPU/memory requests and limits, the container will be terminated first. We should protect the critical pods in production projects by setting requests and limits so they are classified as Guaranteed; BestEffort or Burstable pods should be used in development projects only.
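For example, the container from the earlier snippet becomes Guaranteed if its requests and limits are set and equal; a minimal sketch:

spec:
  containers:
  - image: k8s/hello-k8s
    name: hello-k8s
    resources:
      requests:
        cpu: 200m
        memory: 400Mi
      limits:
        cpu: 200m        # requests == limits for every resource => Guaranteed
        memory: 400Mi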

Project Quota and Limit Ranges

The administrator can set a project quota to restrict resource consumption. This has an additional effect: if you set a memory request in the quota, then all pods need to set a memory request in their definition. A new pod will not be scheduled and will remain pending if it tries to allocate more resources than the quota restriction allows.
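A minimal quota sketch, with illustrative namespace and values:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"      # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi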

A limit range is a policy to constrain resources by pod or container in a namespace; a minimal sketch follows the list below. It can:

  • Set default request/limit for computing resources in a namespace and automatically inject them to Containers at runtime.
  • Enforce minimum and maximum resource usage per Pod or Container in a namespace.
  • Enforce minimum and maximum storage requests per PersistentVolumeClaim in a namespace.
  • Enforce a ratio between request and limit for a resource in a namespace.
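Here is a minimal LimitRange sketch covering the first two capabilities above, with illustrative values:

apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:        # injected when a container declares no request
      cpu: 100m
      memory: 200Mi
    default:               # injected when a container declares no limit
      cpu: 200m
      memory: 400Mi
    max:                   # hard ceiling per container
      cpu: "1"
      memory: 1Gi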

What should we monitor for managing cluster resources?

Node Status

Make sure all nodes are in “Ready” state

Pod Status

Make sure no pod is stuck in the “Pending” state.
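Both of these checks are easy to script with kubectl, for example:

# list nodes and their Ready status
kubectl get nodes
# list pods stuck in Pending across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending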

Percentage of resource (CPU/Memory) allocated from the total available resource in the cluster

A good warning threshold would be (n-1)/n*100, where n is the number of nodes; for example, 75% for a four-node cluster.

Over this threshold, you may not be able to reallocate your workloads on the remaining nodes.

Percentage of Resource (CPU/Memory) Usage in the node

The OS Kernel invokes OOMKiller when Memory usage comes under pressure in the node.

CPU Pressure will restrain processes and affect their performance.

Set a warning threshold to notify the administrator that the node may have issues or be about to trigger its eviction policies.

  • Check the “Eviction Policies” setting. Make sure alerts have triggered before reaching the eviction-hard thresholds.

CPU and Memory Request vs Capacity in the node

Add the following warning thresholds to notify the administrator that this node may not be able to allocate new pods.

  • Less than 10% CPU can be allocated to CPU Request
  • Less than 10% Memory can be allocated to Memory Request

If n-1 nodes cannot allocate new pods, then it is time to scale up, or to check whether the CPU/memory requests are set too high.

Disk Space in the node

If the node runs out of disk, it will try to free Docker space, with a fair chance of pod eviction.

Memory and CPU usage per container

Kubernetes limits are per container, not per pod, so it is not necessary to monitor resource usage per pod.

Ideally, containers should use an amount of resources similar to the ones requested. If usage is much lower than the request, valuable resources are wasted and it may become harder to allocate new pods; in the opposite case, where usage is higher than the request, you might face performance issues.
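If the metrics-server is installed in the cluster, per-container usage can be inspected with kubectl and compared against the declared requests, for example:

# show CPU/memory usage per container (requires metrics-server)
kubectl top pod --containers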

Conclusion

It is important to make sure requests and limits are declared and tested before deploying to production. Cluster admins can set up a namespace quota to enforce all of the workloads in the namespace to have a request and limit in every container. A good configuration of requests and limits will make your applications much more stable.

Appropriate monitoring and alerts will help the cluster admin reduce wasted cluster resources and avoid performance issues. Ask us today if you need help monitoring your Kubernetes system! 🙂

CNCF Project Spotlight: Helm


Congratulations to Helm! Recently graduated within CNCF, Helm is the latest subject of our project spotlight.

We spoke to two Helm maintainers — Matt Butcher, a Helm co-founder and Principal Software Development Engineer at Microsoft, and Matt Farina, a Senior Staff Engineer at Samsung SDS — about how the project reached this point and where it’s headed next.

Graduation is of course the big milestone, but what are you proudest of along the way?

Matt Butcher: We hit one million Helm downloads per month in November 2019. That was a really exciting milestone for us. Knowing that so many organizations and users have put trust in our project is a momentous occasion — though one that brings a high burden of responsibility.

At KubeCon San Diego, we got to do our first big introduction of Helm 3. Helm 3 had been a long and frustrating journey for us. The development work was arduous. It took longer than we wanted, and all the while we were frantically trying to keep Helm 2 up to date. KubeCon felt like the victory lap, and a good chance to celebrate all of the contributions and contributors that made Helm 3 a reality.

Matt Farina: When it comes to developing software, there are a lot of competing voices with opinions. There are Kubernetes insiders who know how the sausage is made. There are new application operators who are just getting started. There are those at fast-moving startups. There are those at slow-moving enterprises with regulations they need to meet. And just about everywhere in between.

The Helm developers have taken the time to listen to users and potential users. For example, some of the changes made to Helm v3 and the changes we are talking about in the coming year are targeted at making application operator experiences better and easier. I am proud that the Helm maintainers take the time to listen to end users and work on solutions to their real-world needs.

I think this is one of the things that’s made Helm a success.

Can you talk a bit about the graduation requirements, and how well Helm measured up on all the criteria?

Matt Farina: Graduating is no easy task. We didn’t just want to pass but wanted to excel at the graduation criteria.

For example, one of the criteria is to obtain a Core Infrastructure Initiative Best Practices badge. We not only obtained the best practices badge but are almost all the way to the silver level. This included the creation of a Helm Security Assurance Case, which looks at the way Helm is developed and operated from a security perspective.

Another requirement related to security is having completed a third-party independent security audit. This audit looked at both Helm and the security processes we have around the project. I was blown away at the conclusion from the security auditors, who wrote:

To conclude, in light of the findings stemming from this CNCF-funded project, Cure53 can only state that the Helm project projects the impression of being highly mature. This verdict is driven by a number of different factors described above and essentially means that Helm can be recommended for public deployment, particularly when properly configured and secured in accordance to recommendations specified by the development team.

A third criterion deals with adoption. The criteria ask for the names of some organizations using Helm, which we provided; beyond that, we also looked at Helm downloads over time as another view of adoption. Between KubeCon + CloudNativeCon San Diego and Helm’s graduation, a little less than six months, the number of downloads per month had doubled. I had to double-check those numbers as I was just so surprised by them.

What are your goals or personal hopes for Helm?

Matt Butcher: I think the most interesting thing about Helm, vis-a-vis CNCF, is the fact that we are an old project. Helm was introduced at the very first KubeCon, and has grown up alongside the cloud native community. Over time, we have watched the broad CNCF community change from the “wild west” of research, experiment, refine to the mainstream enterprise market. I think every single one of the Helm core maintainers will tell you that today we spend most of our time talking about stability and maintainability, not new features and fun ideas.

In this vein, Helm will have two major challenges over the next few years.

On the one hand, we will have to learn how to say “no” to exciting features that are either too experimental or are of only niche appeal. I think Helm 3 is the last release where we’ll see any major feature changes. On the other hand, Helm has become the main “interface point” to Kubernetes for so many people that it is now incumbent upon us to protect people from the changes in Kubernetes. In this way, the Helm community has asked us to be the “backward compatibility” layer for Kubernetes. It’s an interesting circumstance to find ourselves in, but if we want to welcome and support the mainstream enterprise with its steady cadence, this is very much something that we should prioritize.

The days of “moving fast and breaking things” are over for Kubernetes, for Helm, and for the rest of the graduated CNCF projects. Sometimes that is hard for people like me to accept. I look back with fondness on the days when we could spike out a new Helm feature in a few days, and then cut a brand-new release a week later. Now, it routinely takes months to get a single pull request merged. An idea for a new feature might require months of debate, only to be rejected at the end. But if we are to keep the user’s best interests in mind, then this is absolutely the way we should go. So it is not just the project that must mature, it is us as core maintainers as well.

Any shoutouts you want to give to the community?

Matt Farina: In the past year we have seen some new people and new contribution areas that, I think, deserve to be highlighted. These are beyond the usual suspects and typical Helm topics.

Marc Khouzam rewrote the way Helm handles auto-completion to make it easier to work with. The feature worked so well, he was able to get it upstreamed into Cobra, which Helm uses for its base console functionality.

The OCI is working on artifact support in distributions, which will allow us to store Helm charts in container repositories. Josh Dolitsky has done a great job working with the OCI and getting code, currently in experimental form, into Helm to support this.

Helm is moving to a distributed Helm repository environment, where we make it easier for organizations to run their own repositories. Scott Rigby and Reinhard Nägele have been working on tools, including GitHub Actions, to make the automation easier. You can find the actions in the GitHub Marketplace.

These are just a subset of the people doing amazing work in the community. There are too many to name, and we appreciate all of them.

Matt Butcher: I almost feel like it would be unjust to call out only a few. Thousands of people have contributed over time. And sometimes it’s those small patches or updates to the documentation that make a world of difference. But I do feel like I should call out Martin Hickey, one of the Helm core maintainers, for his long-term “chop wood, carry water” mentality. If one were to assemble a room full of Helm users, and then ask those in the room to raise a hand if Martin had helped them (via Slack, via issues, or even with his helpful utilities and conference talks), I would not be surprised if 75% of the audience put a hand in the air.

Of course, Martin is not the only generous soul in the community, but I am deeply appreciative of the work he has done day after day.

What do you want the greater community to know about Helm?

Matt Butcher: Since the very first day of Helm development, we have called it the “package manager for Kubernetes.” Hidden in that phrase, though, are two goals that we keep at the top of our minds:

  1. Helm should be a reliable tool for installing, upgrading, and deleting Kubernetes applications.
  2. Helm should be the easiest way for a new Kubernetes user to install their first application.

It has been hard to stick to these two goals. Many people want Helm to be a replacement for Chef or Puppet or tools like that. Others pit Helm as a competitor to the operator design pattern. Still others want us to morph Helm into an application management platform. It was not until recently that we, as the core group of maintainers, had to begin to draw some lines. Helm cannot be the Swiss army knife of the cloud. And we shouldn’t be. There are thousands of brilliant developers out there building excellent tools in each of those niches. The mentality that we need to “own the space” is, to put it bluntly, nonsense.

At the end of the day, Helm needs to stay true to its role as the package manager for Kubernetes, and strive to meet those same two goals we set out to achieve five years ago.

Maturing as a project is hard. Maturing as a developer and a project leader is hard. But I earnestly believe that if we can build a project that keeps a tight and narrow focus, we can produce a superior tool that solves a core set of problems well. If we let our focus drift, we may solve more problems — but we will solve them poorly in ways that frustrate and ultimately drive away the users we care about.

For more about Helm, check out the CNCF webinar Charting Your Voyage to Helm 3 on June 12.

 

CNCF Ambassador Spotlight: Ariel Jatib


We’re shining the spotlight on Ariel Jatib, CNCF Ambassador and webinar moderator extraordinaire.

Ariel first got involved with the cloud native community when he started the NYC Kubernetes Meetup in 2015, and helped organize many of the groups in cities like Seattle, San Francisco, and L.A. He is currently a Business Development Manager for cloud native at NetApp, which acquired his company, StackPointCloud, in 2018. 

More recently, he’s become very active in moderating CNCF webinars — 9 and counting! “With no meetups happening [right now], I’ve had more time to host webinars,” he says. And it fits in with what he considers his overall goal as a CNCF ambassador: “To promote cloud native technologies and the spirit of the community.”

Ariel took some time to chat with us about being an ambassador.

What’s the best part of being an ambassador? Any fun KubeCon + CloudNativeCon memories?

I’ve always enjoyed the ambassador breakfast at KubeCon. It’s there I join up with [fellow ambassador and MLB Principal DevOps Engineer] Mike Goodness and we go to catch the opening keynotes. We’ll say hi to Nanci Lancaster from [the Linux Foundation events team] after the talks — she was key in helping the NYC group grow while working at DigitalOcean. Coffees with Joonas Bergius. Those traditions bring me joy, and it’s something I look forward to at each KubeCon. 

Do you have any favorite moments from the webinars?

I really enjoyed chatting with Kaslin Fields of Google on “Welcome to CloudLand.”

Are there any shoutouts you’d like to give to the community?

A big shoutout to the NYC blueberries — Paul, Stephen, Pop, Liz. May we all soon enjoy each other’s company and some deviled eggs. Shoutouts to Mark Coleman, Daniel Sasche a.k.a. RUNK8S, and the old StackPoint crew — Matt, Pabs, Fran, Nate, and Tareque.  

Any final thoughts?

Thank you all for a magical ride. May you and your loved ones be safe and be well during these trying times.

CNCF Community Leader Spotlight: Liz Rice


You’ve seen her on the keynote stage as co-chair of the 2018 KubeCon + CloudNativeCon events in Copenhagen, Shanghai, and Seattle. Now Liz Rice is settling into her new role as chair of CNCF’s Technical Oversight Committee (TOC). In this community spotlight, we’re celebrating Liz and her many contributions to the cloud native world.

Liz, who’s the technical evangelist at container security specialists Aqua Security, took time to tell us about her journey from open source contributor to TOC chair.

Please tell us a bit about how you got into the cloud native world.

My interest in programming started as a child when we got a ZX80, and from that point on I knew I’d work with computers. At the start of my career I worked on network protocol software, and then spent a few years at consumer-facing companies Skype and Last.fm. I was working on a TV and film recommendation service when I first heard of Docker from one of our neighbors at an accelerator, and ended up a few months later co-founding another startup called Microscaling, where we explored container auto-scaling (way before it was fashionable!). Then I joined Aqua Security, where I got really immersed in cloud native security.

Do you remember your first contribution to an open source project?

I couldn’t remember it, so I looked it up — the PR was called “Make social-friends-finder work with django-allauth” back in 2012.

Is there any advice you’d give to people who want to start contributing? 

By the time I started wanting to contribute, I already had years of development experience, and my initial fears were more about the social side of it: I was worried about getting the process wrong, or unintentionally offending someone, or not having the credibility to have my changes accepted. Open source is as much about collaboration as it is about code, so it’s helpful to remember that you’re dealing with other people! I’ve seen people get disheartened because they tried to do too much at once without discussing changes first, creating a giant PR that’s hard to review. It’s a good idea to start with small changes, or by explaining first what you’re thinking of doing, rather than jump in with thousands of lines of code. But my experience of open source, particularly in cloud native, is that maintainers are generally excited that you want to contribute, so they’re likely to welcome you and point you in the right direction if you ask for help.  

Why did you decide to get more involved?

The cloud native world is based on open source, so when I started working with containers I found myself using lots of open source code. I was also starting to do more talks, and it made sense to publish the demos and code from those so that people could try them for themselves. I love experimenting with ideas and building proof-of-concepts to see how things work or whether certain ideas might fly, and the feedback loop from people using and building on my work was really rewarding. Fast forward to today, where I manage open source engineering at Aqua Security, with a team of great folks contributing to projects and building open source security tools that complement our commercial products.

How did you become KubeCon + CloudNativeCon co-chair?

[CNCF Executive Director] Dan Kohn reached out to me, and I jumped at the opportunity! My first co-chair was Kelsey Hightower, and I learnt a ton from him, particularly around trying to build a program that reflected what the community wanted to see (even though you can’t please everyone all of the time!).

Do you have any favorite moments from the KubeCons you co-chaired? 

So many! In Copenhagen I remember asking the audience to help me pronounce the word “hygge” properly, and being amazed that people responded with such enthusiasm! It’s a real privilege – and SO MUCH FUN – getting to interact with so many people.

What led to your becoming TOC chair? 

Once my time as co-chair came to an end, I really wanted to carry on being involved in some deep way with the CNCF, and I realised that I had built up some useful knowledge and experience. I had put together the project update keynote for three KubeCons, and that involved research across the whole breadth of the CNCF landscape. Whatever my inner imposter was telling me, I recognised that this knowledge was fairly rare. Combined with my general software engineering experience, plus some confidence-inspiring conversations with people who encouraged me to put myself forward, I figured it was worth throwing my hat into the ring.

What are your goals for this role?

One of our biggest challenges right now is dealing with the pace of incoming TOC work, which has increased dramatically as the cloud native ecosystem grows. TOC members are all doing this work alongside our full-time jobs at our various companies, so it’s a tricky balance! We’re working hard to streamline the processes, and leverage skills, time and enthusiasm from a broader range of folks to help us, through the CNCF SIGs. But it’s important to recognise that this work can’t be reduced to a simple pass-or-fail checklist; there will always be judgement involved – and the TOC members are elected for having the skills and experience to apply that judgement.

Also as the CNCF grows, and we have more end user members with more production experience to learn from, I’m keen for us to work more closely with them to make sure our project portfolio addresses their needs.

What message would you like to send to newcomers to the cloud native community?

Welcome! You won’t be unusual if you feel confused or overwhelmed at times, but there are a ton of people here who are really keen to help you find your feet.

Any fun facts about you that you’d be willing to share? 

In my spare time I’m a decent cyclist and a very mediocre drummer!

 
