
Open Sourcing the etcd Security Audit


Guest post from Sahdev Zala and Xiang Li, maintainers for etcd 

We are proud to announce that the etcd team has successfully completed a third-party security audit for etcd's latest major release, 3.4. The audit was performed on etcd v3.4.3 by Trail of Bits. We are thankful to the CNCF for sponsoring this audit. Our big thanks also go to all the etcd maintainers, especially Gyuho Lee, Hitoshi Mitake, and Brandon Philips, for working with us during the whole auditing process.

A report from the security audit is available in the etcd community repo. We recommend that you take a look at it for the details. The report covers the process, what was reviewed, and the issues that were identified to be addressed. The audit was performed as a mixture of manual and automated review. Automated review consisted of running various static analysis tools, such as errcheck, ineffassign, and go-sec. The google/gofuzz and dvyukov/go-fuzz testing harnesses were developed to test the etcd Write Ahead Log (WAL) implementation, and the results were subsequently reviewed and triaged as necessary. Manual review focused on gaining familiarity with the implementation details of etcd in various areas, such as configuration options, default settings, service discovery, WAL operations, Raft consensus and leader election, and the proxy and gateway. The security areas evaluated include data validation, access controls, cryptography, logging, authentication, data exposure, denial of service, and configuration.

We are glad to see that no major issues were found in the core components of etcd. According to the report summary, the etcd codebase overall represents a mature and heavily adopted product. Among the reported issues there was only one of high severity, found in the etcd gateway, a simple TCP proxy that forwards network data to the etcd cluster. All the issues and their severity are explained in great detail in the report. The issues have already been addressed with the needed code updates, documentation, and better logging. The fixes are backported to the supported versions of etcd, v3.3 and v3.4, and these updated releases are now available. Security advisories were created using the GitHub Security tool to publish information about the vulnerabilities.

It is worth noting that a security audit is part of the graduation criteria for CNCF projects. Specifically, the graduation criteria state:

Have completed an independent and third party security audit with results published of similar scope and quality as the following example (including critical vulnerabilities addressed): https://github.com/envoyproxy/envoy#security-audit and all critical vulnerabilities need to be addressed before graduation.

The next step for the etcd team is to work on the project graduation. 

Security audits are one of the benefits of being a CNCF project, and we are grateful for them and for the analysis performed by Trail of Bits, who were also involved in the detailed Kubernetes security audit last year. This analysis has given us some concrete areas we can work to improve, and confidence in what we have.

A Guide to Untangling the CNCF Cross-Community Relationships


Ambassador Post

Guest post from Diane Mueller, Director of Community Development at Red Hat

The adoption of CNCF technology and continuous growth in terms of projects, contributors, and end users has created one of the most active, dynamic open source ecosystems in the world. Since March 2018, when the first project, Kubernetes, officially ‘graduated’ as a CNCF project, 10 more projects have ‘Graduated,’ and a boatload of new ones (19) are currently being “Incubated” by the CNCF. It has become a daunting, almost impossible, task to navigate the complexity of the community, even as an active participant since the inception of the CNCF. 

While I am eternally grateful for the filters that are baked into the CNCF Interactive Landscape enabling us to easily navigate to each of the project’s home pages, the Landscape diagram doesn’t help explain in a digestible way the burgeoning complexity of the relationships between projects, participants, and the myriad of repositories that are bursting at the seams with innovations, ideas, and almost continuous release announcements. 

For the past few years at Red Hat, with the help of some open source tooling from Bitergia, I’ve been applying data visualization techniques to help us navigate the uncharted territory of project/participant relationships using network analysis. These interactive visualizations enable us to help nurture, track, and support the multiple CNCF and other open source communities that we participate in. 

Leveraging this data-driven approach has become a bit of a minor obsession of mine. It has helped untangle key relationships, identify new entrants (both projects and participants), and facilitate conversations with participants who often were not even aware of the key roles they were playing in bridging multiple communities.

Figure #1: Year-to-date diagram of developers' (pink dots) Git activity in CNCF Graduated and Incubated projects as of July 2020. Note: There is a line (pink) when a developer has contributed to a given project (blue dots). A developer has more than one line if they have contributed to more than one project; these are the 'connector' participants.

Interactive Visualization tools help us gain a better understanding of the underlying relationships between the projects and their participants. One still needs a basic understanding of the roles each of the projects plays in the ecosystem to utilize the information accurately. 

The ‘jellyfish’ diagram represents all of the participants and their relationships with both Graduated and Incubated CNCF projects. The current data set includes 13K+ participants and 250+ repositories, and covers the year to date as of July 2020.

There are quite a few more ancillary repositories that we have not yet incorporated into the data set as it’s a bit of a moving target, as you might imagine. For this visualization, we’ve just included the participants’ code contributions.

Each dot represents a developer who has committed at least one piece of code, logged an issue, or made a pull request to these repositories. All the data points come from publicly available GitHub data, similar to the data behind CNCF's Devstats dashboards.

Untangling the Community Connections

If you take a look at the left side of the ‘jellyfish’ diagram above, you will notice that Linkerd, TUF, and Falco appear a bit less entangled with the rest of the CNCF projects, but still share participants with Kubernetes. This makes it relatively easy to ‘see’ the ‘connector’ personas that are bridging these communities and (one hopes) facilitating communication. At the very least, the network analysis helps us pinpoint participants with knowledge of both projects who can potentially be tapped to help bridge any gaps.

In the denser section of the diagram, we see the projects whose participants are highly interconnected with multiple projects. This activity highlights the interdependencies of the communities and the complexity of the communication required to ensure healthy, aligned project releases and roadmaps. These connectors are the ones who know how each project works, understand their idiosyncrasies, and are the glue that keeps the projects aligned and facilitates cooperation.

To illustrate, let's dive into a specific example. When we filter on just Kubernetes, Operator Framework, Rook, and Helm, we see more clearly the ‘connector’ participants between the projects.

Figure #2: Kubernetes in the middle, Rook at the right side, Operator Framework at the top, and Helm at the left side of the chart.

There appears to be a healthy amount of cross-community collaboration and communication between Operator Framework and Kubernetes (a), Helm and Kubernetes (b), and also signals that there’s an opportunity to encourage more collaboration between Helm and Operator Framework (c). 

Nurturing and Onboarding New Participants

With an average of 350 first-time committers joining the CNCF ecosystem each month, we can watch their points of entry to see which projects they have previously engaged with (if any), which ‘connectors’ might be able to help them get better engaged, and how we can ensure they are plugged into the whole ecosystem.

Identifying Emerging Projects

An additional insight we gain from this network analysis is the ability to ‘see’ movement over time as participants migrate from one project to the next. Incorporating a time filter into the visualizations can often give us the ability to predict the emergence of new projects and initiatives within and outside of the CNCF ecosystem, which is often a signal to get ready for a new submission to the CNCF Technical Oversight Committee for Sandbox status.

How to Connect

If you'd like to learn more about applying network analysis to community development, check out this recent presentation, “Data Driven Approach to Community Development,” with OpenShift's Diane Mueller and Bitergia's Daniel Izquierdo.

If you are involved in any of the CNCF projects or initiatives, there’s a great new CNCF Contributor Strategy SIG responsible for contributor experience, sustainability, governance, and openness guidance to help CNCF community groups and projects with their own contributor strategies for a healthy project.

If you are just looking to get started contributing to any of the CNCF projects, check out the CNCF's Guide to Contributing to the CNCF ecosystem.

If you’d like to learn more about Red Hat OpenShift’s Open Source Community Initiatives, please visit OpenShift Commons.

 

 

 

4 Tips for Maximizing Your Virtual KubeCon Experience


Member Post

KubeCon + CloudNativeCon sponsor guest post from Amanda Katona, Cloud Native Community Engagement Director at VMware

We are fast approaching the first ever virtual KubeCon. What started a few years ago with a handful of people in a hotel conference room evolved into thousands crowding convention center hallways. And now, it’s a party of one as we all have our own virtual experience.

At this point we have all attended multiple virtual conferences so far this year. With hundreds of hours of programmed content, it risks becoming an overwhelming (and occasionally numbing) experience. So, we wanted to share some ideas for how you can get the most value from the time you invest in KubeCon this year.

Build your agenda

It's worth stating the obvious … don't miss that all-important session (if it's your first KubeCon, you might start here). Explore the KubeCon agenda and pinpoint the top few sessions you must attend. Create invites on your calendar and protect the time. When attending remotely, it's easy to let other responsibilities infringe upon your session time, so be prepared to defend it from meetings and fires that can smolder for a few days (if you were in Amsterdam, they'd have to wait).

Prepare questions for your top 5 sessions

Dig into the KubeCon agenda and pinpoint the 5 sessions that you find most compelling. Rather than come with a clean slate, reflect on what resonated and write down questions you want to ask the presenter. Either ask them in the session, or follow-up directly with speakers to request more resources or recommendations of peers with whom you should also connect. And for additional perspectives, bring your questions to the CNCF project maintainers at the Project Pavilion.

Take incredible notes

Open up an Evernote file or a Moleskine notepad and scratch out everything that makes you think. And don't stop there: the best note-takers add their own color commentary and opinions. That additional context adds a ton of value when it's time to circle back through those notes days, weeks, or months later and draw insight.

Get your swag on

The expo floor at a physical conference is like Halloween for adults—tech or treat. While you won’t be collecting any new swag this year, it’s a chance to show off your favorite swag from yesteryear. Pull out a vintage KubeCon 2017 t-shirt, don your branded socks and guzzle from your most envy-inspiring water bottle.

Be extra social and extra positive

Since we can’t be there in person to trade stories, we’ll need to be extra diligent about using Slack to engage. Find the Slack channels for your favored KubeCon tracks and hang out there to trade ideas with peers. Highlight standout ideas, important projects and opportunities to contribute. Most importantly, use social to spread positivity and celebrate the incredible effort that goes into nurturing this vibrant community year-round.

Eat french fries with mayonnaise

Cause hey, if we were in Amsterdam, that’s what we SHOULD be doing. You can take the KubeCon out of Amsterdam, but you can’t wrestle that last french fry from me. Put a batch in the oven and bring a little bit of the experience to your home office or kitchen table.

This isn’t what we expected. It’s a remarkably challenging time. And yet, we have little doubt this community will find ways to make the best of it. See you at KubeCon soon!

 

Kubernetes RBAC 101: Authentication


Member Blog Post

Guest post originally published on the Kublr blog by Oleg Chunikhin

Leveraging Client Certificates and Bearer Tokens to Authenticate in Kubernetes

In part one of this series on Kubernetes RBAC, we introduced authentication and authorization methods. In this article, we’ll dive a little deeper into authentication — a prerequisite for RBAC.

As we saw, there are a few authentication methods including client certificates, bearer tokens, HTTP basic auth, auth proxy, and impersonation. Because HTTP basic auth and statically configured bearer tokens are considered insecure, we won’t cover them here. Instead, we’ll focus on the authentication mechanisms that are viable options for production deployments.

Client certificates

When authenticating through client certificates, the client must first obtain a valid x509 client certificate which the Kubernetes API server will accept as authentication. This usually means that the client certificate must be signed by the cluster CA certificate.

Externally Signed Certificates

The client certificate can be signed by the Kubernetes API server itself, or externally by an administrator or an enterprise PKI. Let's first look at how the certificate is signed externally, outside the Kubernetes API server.

Authentication: X509 Client Cert, PKI

  1. The client (user) generates a CSR (certificate signing request) using a personal private key
  2. The client (user) sends the CSR to the signing authority (an administrator or an enterprise PKI)
  3. The signing authority signs a client certificate based on the CSR and the Kubernetes API server CA private key
  4. The signing authority sends the signed certificate to the client
  5. The client can now use the client certificate with the private key to authenticate the API server requests

There is a drawback, however. The server CA private key will be exposed to an external system or administrator. While that may be acceptable with an enterprise PKI, it likely isn’t with manual certificate signatures.

Here is a sequence of signing certificate commands:

User: generate user private key (if one does not exist):

openssl genrsa -out user1.key 2048

User: generate user CSR:

openssl req -new -key user1.key -out user1.csr -subj "/CN=user1/O=group1/O=group2"

Admin: sign user client cert:

openssl x509 -req -in user1.csr -CA cluster-ca.crt -CAkey cluster-ca.key \
    -set_serial 101 -extensions client -days 365 -outform PEM -out user1.crt

User: use with kubectl via options or kubeconfig:

kubectl --client-key=user1.key --client-certificate=user1.crt get nodes

kubectl config set-credentials user1 --client-key user1.key --client-certificate user1.crt --embed-certs
kubectl config set-context user1 --cluster demo-rbac --user user1
kubectl --context=user1 get nodes

kubectl config use-context user1
kubectl config get-contexts
kubectl get nodes

Internally Signed Certificates

Alternatively, you can have the client certificate signed by the cluster itself. As before, the client creates a certificate signing request, but in this case neither a system administrator nor an external system signs it. Instead, the CSR is submitted to the Kubernetes cluster, which signs the certificate; the administrator can then extract the signed certificate from the Kubernetes API and send it back to the client. This is done with a special object in the Kubernetes API called CertificateSigningRequest.


Authentication: X509 Client Cert, Kubernetes CSR

Here is a sequence of commands:

User: generate user private key (if one does not exist):

openssl genrsa -out user2.key 2048

User: generate user CSR:

openssl req -new -key user2.key -out user2.csr -subj "/CN=user2/O=group1/O=group2"

Admin: use Kubernetes API server to sign the CSR:

kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: user2
spec:
  request: $(cat user2.csr | base64 | tr -d '\n')
  usages: ['digital signature', 'key encipherment',
    'client auth']
EOF

Admin (approver): approve or deny the CSR in the Kubernetes API:

kubectl certificate approve user2
kubectl certificate deny user2

Admin: extract the approved and signed certificate from the Kubernetes API:

kubectl get csr user2 -o jsonpath='{.status.certificate}' | \
base64 --decode > user2.crt

User: use with kubectl via options or kubeconfig:

kubectl --client-key=user2.key --client-certificate=user2.crt get nodes

kubectl config set-credentials user2 --client-key user2.key --client-certificate user2.crt --embed-certs
kubectl config set-context user2 --cluster demo-rbac --user user2

Bearer Tokens

Service Account

Instead of client certificates, you can also use bearer tokens to authenticate subjects in Kubernetes. The easiest way to get a token is by creating a service account in the Kubernetes API. The Kubernetes server will then automatically issue a token associated with the service account, and anyone using that token will be identified as using this service account to access the cluster.


Authentication: Service Account

Here is a sequence of commands you can use to create a service account, get a token from it and use that token to access Kubernetes API:

Create service account:

kubectl create serviceaccount sa1

Get service account token:

kubectl get -o yaml sa sa1
SA_SECRET="$(kubectl get sa sa1 -o jsonpath='{.secrets[0].name}')"

kubectl get -o yaml secret "${SA_SECRET}"
SA_TOKEN="$(kubectl get secret "${SA_SECRET}" -o jsonpath='{.data.token}' | base64 -d)"

Send request:

kubectl "--token=${SA_TOKEN}" get nodes

kubectl config set-credentials sa1 "--token=${SA_TOKEN}"
kubectl config set-context sa1 --cluster demo-rbac --user sa1

Side note:

Please note that, generally, when working with kubectl you won't specify secrets and credentials on the command line; instead you will use a kubectl configuration file, which you can modify using kubectl commands. This lets you add the token to your kubeconfig file as an additional set of credentials and use it via a new context. Learn more about using kubeconfig files to organize access to different clusters with multiple sets of credentials in the Kubernetes documentation on this subject.

Using a kubeconfig file allows you to run kubectl without specifying any sensitive information in the command line while relying on the current context set within that config file.

Note that you can always operate with your config file using various command-line options.

OIDC Token

Alternatively, you can leverage an external identity provider via OIDC to authenticate through a token. At Kublr, we use Keycloak. We love this identity provider as it's a powerful, scalable open source tool supporting all modern standards like SAML, OIDC, XACML, etc. It also integrates with most identity management systems. Kublr uses it by default as a user management system for Kublr and managed Kubernetes clusters, but it can also serve as an identity broker when integrated with enterprise identity management tools, or even be used as an identity manager for user applications via its powerful “realms.” A realm is a completely separate domain for users, groups, authentication, federation, etc.

How does authentication with OIDC work?

First, your Kubernetes API server must be configured to talk to an OIDC endpoint / OIDC provider. This is done via Kubernetes API server configuration parameters. The following snippet shows the additions to the Kublr cluster specification necessary to set up the Kubernetes API server:

spec:
  master:
    kublrAgentConfig:
      kublr:
        kube_api_server_flag:
          oidc_client_id: '--oidc-client-id=kubernetes'
          oidc_groups_claim: '--oidc-groups-claim=user_groups'
          oidc_issuer_url: '--oidc-issuer-url=https://***'
          oidc_username_claim: '--oidc-username-claim=preferred_username'

When a client connects to the Kubernetes API, it talks to the identity provider using one of the flows defined in the OIDC protocol to get an access token and a refresh token. The identity provider sends the tokens back for the client to authenticate with the Kubernetes API.

The Kubernetes API server talks directly with the OIDC identity provider via the OIDC API to verify whether the client-provided token is valid. The token provides all the information needed for the Kubernetes API server to identify the client. The client, on the other hand, can also refresh that token using a “refresh token.”


Authentication: OIDC

Let's see how this looks in the command-line world. We will use cURL to talk to the identity provider and kubectl to talk to the Kubernetes server, although in most real-life scenarios this will be hidden under the hood of the framework or client library of your choice.

Login request from the client with visualization of the response:

curl \
  -d "grant_type=password" \
  -d "scope=openid" \
  -d "client_id=kubernetes" \
  -d "client_secret=${CLIENT_SECRET}" \
  -d "username=da-admin" \
  -d "password=${USER_PASSWORD}" \
  https://kcp.kublr-demo.com/auth/realms/demo-app/protocol/openid-connect/token | jq .

Login – the same request, but with the response tokens saved in environment variables for use in the follow-up commands:

eval "$(curl -d "grant_type=password" -d "scope=openid" -d "client_id=kubernetes" \
  -d "client_secret=${CLIENT_SECRET}" -d "username=da-admin" -d "password=${USER_PASSWORD}" \
  https://kcp.kublr-demo.com/auth/realms/demo-app/protocol/openid-connect/token | \
  jq -r '"REFRESH_TOKEN="+.refresh_token,"TOKEN="+.access_token,"ID_TOKEN="+.id_token')" ; \
  echo ; echo "TOKEN=${TOKEN}" ; echo ; echo "ID_TOKEN=${ID_TOKEN}" ; echo ; \
  echo "REFRESH_TOKEN=${REFRESH_TOKEN}"

Refresh token request with visualized response:

curl \
  -d "grant_type=refresh_token" \
  -d "client_id=kubernetes" \
  -d "client_secret=${CLIENT_SECRET}" \
  -d "refresh_token=${REFRESH_TOKEN}" \
  https://kcp.kublr-demo.com/auth/realms/demo-app/protocol/openid-connect/token | jq -r .

Refresh – the same request with response tokens saved in the environment variables:

eval "$(curl -d "grant_type=refresh_token" -d "client_id=kubernetes" \
-d "client_secret=${CLIENT_SECRET}" -d "refresh_token=${REFRESH_TOKEN}" \
https://kcp.kublr-demo.com/auth/realms/demo-app/protocol/openid-connect/token | \
jq -r '"REFRESH_TOKEN="+.refresh_token,"TOKEN="+.access_token,"ID_TOKEN="+.id_token')" ; \
echo ; echo "TOKEN=${TOKEN}" ; echo ; echo "ID_TOKEN=${ID_TOKEN}" ; echo ; \
echo "REFRESH_TOKEN=${REFRESH_TOKEN}"

Token introspection request:

curl \
  --user "kubernetes:${CLIENT_SECRET}" \
  -d "token=${TOKEN}" \
  https://kcp.kublr-demo.com/auth/realms/demo-app/protocol/openid-connect/token/introspect | jq .

kubectl kubeconfig configuration and request:

kubectl config set-credentials da-admin \
   "--auth-provider=oidc" \
   "--auth-provider-arg=idp-issuer-url=https://kcp.kublr-demo.com/auth/realms/demo-app" \
   "--auth-provider-arg=client-id=kubernetes" \
   "--auth-provider-arg=client-secret=${CLIENT_SECRET}" \
   "--auth-provider-arg=refresh-token=${REFRESH_TOKEN}" \
   "--auth-provider-arg=id-token=${ID_TOKEN}"

kubectl config set-context da-admin --cluster=demo-rbac --user=da-admin

kubectl --context=da-admin get nodes

Access tokens are usually short-lived, while refresh tokens have a longer shelf life. You can refresh tokens from the command line by sending the refresh token back to the identity provider, which returns a fresh set of tokens (as in the refresh requests above).

You can also introspect the token with an identity provider endpoint. That’s essentially an API the Kubernetes API server can use to check who is sending a specific request.

As mentioned above, there are two more ways to provide access to a Kubernetes cluster. One is using an auth proxy, mainly used by vendors to set up different Kubernetes architectures. It assumes that you start a proxy server which is responsible for authenticating user requests and forwarding them to the Kubernetes API. That proxy can authenticate users and clients any way it likes and will add the user's identification into the request headers for requests that are sent to the Kubernetes API.


Authentication: Authenticating Proxy

This allows the Kubernetes API server to know whom it is working with. Kublr, for example, uses this authentication method to proxy dashboard requests and web console requests, and to provide a proxy Kubernetes API endpoint.
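To make this more concrete, here is a rough sketch of what such a setup involves. The kube-apiserver flags below are the standard request-header authentication options; the certificate file names, header values, and user/group names are illustrative assumptions only:

# kube-apiserver flags that enable request-header (authenticating proxy) authentication:
#   --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
#   --requestheader-allowed-names=front-proxy-client
#   --requestheader-username-headers=X-Remote-User
#   --requestheader-group-headers=X-Remote-Group
#   --requestheader-extra-headers-prefix=X-Remote-Extra-

# The proxy authenticates the user however it likes, then calls the API server over TLS
# with its own client certificate (signed by the request-header CA) and passes the
# identity in the trusted headers:
curl --cert front-proxy-client.crt --key front-proxy-client.key \
  --cacert cluster-ca.crt \
  -H 'X-Remote-User: user3' \
  -H 'X-Remote-Group: group1' \
  'https://52.44.121.181:443/api/v1/nodes?limit=50'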

Lastly, there is authentication through impersonation. If you already have credentials providing access to the Kubernetes API, those credentials can be used to “impersonate” users through authorization rules. This allows the client to send impersonation headers, so the Kubernetes API server switches the authentication context to the impersonated user.


Authentication: Impersonation
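As a quick illustration (assuming your current credentials are allowed to impersonate users and groups via RBAC), kubectl exposes impersonation via the --as and --as-group flags, which translate into Impersonate-User and Impersonate-Group request headers; the user and group names here are just examples:

# impersonate user1 in group1 while using your existing kubeconfig credentials
kubectl --context=demo-rbac get nodes --as=user1 --as-group=group1

# the equivalent raw request with impersonation headers
curl -H 'Authorization: Bearer ***' \
  -H 'Impersonate-User: user1' \
  -H 'Impersonate-Group: group1' \
  'https://52.44.121.181:443/api/v1/nodes?limit=50'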

For regular clients and for production purposes, you only really have two options: client certificates or bearer tokens.

Roll up your sleeves

If you want to get your hands dirty, there are some tools you can use to analyze and debug apps connected to a Kubernetes API. cURL is great for experimenting with REST APIs. Then, of course, there is kubectl, the Kubernetes CLI. The jq command-line JSON processor helps visualize and process JSON data. JSON and YAML, as you may know, are commonly used file formats for Kubernetes and the Kubernetes API.

cURL

Here is an example of using cURL on Linux to call the Kubernetes API:

curl -k -v -XGET -H 'Authorization: Bearer ***' \
  'https://52.44.121.181:443/api/v1/nodes?limit=50' | jq -C . | less -R
  • The "-k" switch disables checking the server certificate
  • "-v" enables verbose output
  • "-H 'Authorization: ...'" adds a token authorization header
  • Piping curl output through "jq -C" formats and colorizes the server's JSON output
  • "less -R" allows you to scroll up and down the output

kubectl

Examples of using kubectl:

kubectl --kubeconfig=kc.yaml get nodes

export KUBECONFIG="$(pwd)/config.yaml"
kubectl get nodes

kubectl get nodes --v=9
  • The "--kubeconfig" option allows you to specify the location of the kubeconfig file containing the information and credentials necessary to locate and authenticate with a Kubernetes API server
  • The "KUBECONFIG" environment variable can be used to specify a default kubeconfig file location
  • If neither the "--kubeconfig" option nor the "KUBECONFIG" environment variable is specified, kubectl will look for a kubeconfig file at the default location "$HOME/.kube/config"

Example of a kubeconfig file:

apiVersion: v1
kind: Config
clusters:
- name: demo-rbac
  cluster:
    certificate-authority-data: ***
    server: https://52.44.121.181:443
users:
- name: demo-rbac-admin-token
  user:
    token: ***
contexts:
- name: demo-rbac
  context:
    cluster: demo-rbac
    user: demo-rbac-admin-token
current-context: demo-rbac

Conclusion

Authentication in Kubernetes can be handled in different ways. For production-grade deployments you have several options. You can use client certificates, signed either externally or by Kubernetes through the Kubernetes API. Alternatively, you can use a bearer token, either by creating a service account in Kubernetes or by leveraging an external identity provider via OIDC. These are all valid approaches. Which route you go will ultimately be determined by your system and application architecture and requirements.

If you’d like to experiment with RBAC, download Kublr and play around with its RBAC feature. The intuitive UI helps speed up the steep learning curve when dealing with complexities of Kubernetes deployment and RBAC YAML files.

Jaeger Project Journey Report: A 917% increase in companies contributing code


Project Post

Today we are excited to release our next project journey report for Jaeger. This is the sixth such report we have compiled for one of our graduated projects. It assesses the state of the Jaeger project and how CNCF has impacted its progress and growth.

Jaeger is an open source, end-to-end distributed tracing platform built to help companies of all sizes monitor and troubleshoot their cloud native architectures. Contributors to Jaeger include many of the world's largest tech companies, such as Uber, Red Hat, Ryanair, IBM, and Ticketmaster, as well as fast-growing mid-size companies like CloudBees.

The report highlights the growth of Jaeger since the project joined CNCF on September 13, 2017. Between then and now:

  • The total number of companies contributing code has increased by 917%, from 29 to 295.
  • Jaeger has enjoyed a 603% expansion in individual contributors, adding 1,682 new contributors.
  • The number of authors and companies committing documentation to Jaeger has grown by 496% and 415%, respectively.

Since joining CNCF Jaeger has added:

  • 1,753 contributors
  • 4,310 code commits
  • 3.2K pull requests
  • 39.3K contributions
  • 302 contributing companies

Be sure to read the full report for more impressive growth stats!

Next month, Jaeger will celebrate its fifth birthday. It has been fantastic to be a part of the project’s journey for three of those years, and we can’t wait to see the milestones to come!

 

How to avoid the 503 error!


Member Post

Guest post originally published on the Cuemby blog 


The dreaded 503 Service Unavailable error is an HTTP status code that technically means the website's server is simply not available at this time. Usually, it occurs because the server is too busy, there are too many simultaneous requests, or there is a maintenance window, either scheduled or, worse, an emergency.

What the 503 error really means is that your application isn't working, and your customers, potential customers, and other interested parties cannot see your information, log in to your application, or buy products from you.

In some ways, you have become a victim of your own success: the more activity you have, the more requests to the network, the more the infrastructure needs to scale, and then BANG! There's a crash. 503!

But the crash can be avoided. Just as virtualization gave more resiliency to server infrastructure, containerized services can do the same for cloud infrastructure. But you need to understand it and manage it properly, which is not always easy.

That's where cloud-native technologies such as Kubernetes come into play. Containers allow you to securely run applications and all of their associated dependencies independently, with no impact on other containers or their operating systems.

 

That sounds great because you gain scale in resources, but those containers need to be managed properly; really, they need to be orchestrated properly. For example, scaling the resources of a container vertically gives you more horsepower for the timeframe you need it (say, a burst of simultaneous logins), but there could still be a central chokepoint at the point of entry.

Think about it in real estate terms: if a building has ten floors and a capacity of 100 people per floor, you can easily and effectively have 1,000 people in the building at one time. But what if you need 2,000 or even more? You can add resources and “floors,” but what if the fire code only allows 1,000 people? Scaling vertically is doable, but it only adds cost, not value.

Worse, what if those 2,000 people all need to enter the building at the same time and there are only two doors? Now you have twice the capacity all entering simultaneously… scenes from the 6 o'clock news every Black Friday!

Kubernetes can not only automate that vertical scale but also provide simultaneous horizontal scale. Now instead of having one building with ten stories, you can have three buildings with four stories, each with a 1,000-person fire code capacity and two entry points.
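In Kubernetes terms, the horizontal version of “adding more buildings” is just a matter of adding replicas, and it can even be automated. A minimal sketch (the deployment name “web” and the numbers are only examples):

# add more "buildings": run three replicas of the same deployment
kubectl scale deployment web --replicas=3

# or let Kubernetes adjust the horizontal scale automatically based on CPU usage
kubectl autoscale deployment web --cpu-percent=70 --min=3 --max=12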

Now you can have three times the capacity, at the same or slightly higher cost, and three times the simultaneous points of entry. This works brilliantly, but high rises exist in the city because real estate space is at a premium. That's the beauty of cloud infrastructure: horizontal real estate is never an issue, and Kubernetes allows you to deploy that application and scale it over multiple clouds and even on-premises.

So containers allow you to have multiple instances of the application, and Kubernetes allows you to scale the resources and instances of that application both vertically and horizontally, over multiple clouds and in multiple regions. But how do you automate the multi-cloud aspect? That is where a declarative API, which several firms are working on creating, would come into play, presenting end users with one dashboard to orchestrate over multiple clouds.

Until then organizations like Cuemby can provide this multi-cloud, multi-region experience for you through our CuembyIO platform. (Platform video)

 

Announcing Vitess 7


Project Post

Guest post originally published on the Vitess blog by Deepthi Sigireddi, Vitess maintainer

On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 7.

Major Themes

Improved SQL Support

We continued to make progress towards (almost) full MySQL compatibility. The highlights in Vitess 7 are replica transactions, savepoint support, and the ability to set system variables per session. We expect to continue down this path for Vitess 8.

Stability

Vitess had accumulated significant technical debt because functionality had been added organically, and some parts of the code had become unmaintainable. In this release, VTGate's healthcheck and VTTablet's tabletserver and tabletmanager have been rewritten. The rewrites have already paid dividends: replica transaction support and system variable support are built on the foundation of the new healthcheck and tabletserver. The VTTablet rewrites are expected to facilitate several new features in upcoming releases.

Innovation

Vitess 7 adds ease-of-use and many new features built on top of VReplication. VStream Copy allows streaming of entire tables or databases, thus enabling change data capture applications. Schema Versioning enables correct handling of binlog events on replication streams based on older versions of the schema. VExec and Workflow commands make it possible to manage vreplication workflows without manual edits to metadata. A novel framework has been built to allow dedicated connections alongside connection pooling. Locks and system variables have been implemented using this.

Tutorials

Vitess 7 adds three new tutorials to the documentation. We have added a tutorial that demonstrates how to use the open source vitess-operator from PlanetScale, a tutorial for region-based sharding, and one for a local Docker installation.

There is a short list of incompatible changes in this release. We encourage you to spend a moment reading the release notes.

Please download Vitess 7 and take it for a spin!

Logging in Kubernetes: EFK vs PLG Stack


Member Post

Guest post originally published on the InfraCloud blog by Anjul Sahu, Solution Architect at InfraCloud

With ever-increasing complexity in distributed systems and growing cloud-native solutions, monitoring and observability become very important aspects of understanding how the systems are behaving. There is a need for scalable tools that can collect data from all the services and provide engineers with a unified view of performance, errors, logs, and availability of components. These tools also need to be cost-effective and performant. In this article, we will go through two popular stacks, EFK (Elasticsearch) and PLG (Loki), and understand their architecture and differences.

EFK Stack

You might have heard of the ELK or EFK stack, which has been very popular. It is a set of monitoring tools: Elasticsearch (object store), Logstash or Fluentd (log routing and aggregation), and Kibana for visualization.

A typical workflow would be like the following:

Typical EFK Workflow

Elasticsearch is a real-time, distributed object storage, search, and analytics engine based on the Apache Lucene search engine library. It excels at indexing semi-structured data such as logs. The information is serialized as JSON documents, indexed in real time, and distributed across nodes in the cluster. For full-text search, Elasticsearch uses an inverted index, which lists all unique words and the documents they appear in.
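As a small illustration of that workflow (the index name, port, and document fields here are just examples), a log entry can be indexed and then found via full-text search using nothing more than the REST API:

# index a log document; Elasticsearch updates the inverted index automatically
curl -X POST 'http://localhost:9200/app-logs/_doc' \
  -H 'Content-Type: application/json' \
  -d '{"timestamp": "2020-08-01T12:00:00Z", "level": "error", "message": "connection refused"}'

# full-text search across everything indexed so far
curl 'http://localhost:9200/app-logs/_search?q=message:refused'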

Fluentd is a data collector which unifies data collection and consumption for better use. It tries to structure data as JSON as much as possible. It has a plugin architecture and is supported by hundreds of community-provided plugins for many use cases.

Kibana is the visualization engine for Elasticsearch data, with features like time-series analysis, machine learning, and graph and location analysis.

Elasticsearch Architecture

Typically, in an Elasticsearch cluster, the data is stored in shards across the nodes. The cluster consists of many nodes to improve availability and resiliency. Any node is capable of performing all the roles, but in a large-scale deployment, nodes can be assigned specific duties.

There are the following types of nodes in the cluster:

  1. Master Nodes – controls the cluster, requires a minimum of 3, one is active at all times
  2. Data Nodes – to hold index data and perform data-related tasks
  3. Ingest Nodes – used for ingest pipelines to transform and enrich the data before indexing
  4. Coordinating Nodes – to route requests, handle search reduce phase, coordinates bulk indexing
  5. Alerting Nodes – to run alerting jobs
  6. Machine Learning Nodes – to run machine learning jobs

The diagram below shows how the data is stored in primary and replica shards to spread the load across nodes and to improve data availability.

The data in each shard is stored in an inverted index. The figure below shows how the data would be stored in an inverted index.

source – grafana.com

EFK Stack – Quick Installation

For the detailed steps, I found a good article on DigitalOcean. Here, I am installing it using a Helm chart in my demo.

Quickstart:

$ helm install efk-stack stable/elastic-stack --set logstash.enabled=false --set fluentd.enabled=true --set fluentd-elasticsearch.enabled=true

PLG Stack (Promtail, Loki and Grafana)

Don't be surprised if you can't find this acronym elsewhere; the stack is mostly known as Grafana Loki. Anyway, it is gaining popularity due to its opinionated design decisions. You might know Grafana, which is a popular visualization tool. Grafana Labs designed Loki, a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It indexes only metadata and doesn't index the content of the logs. This design decision makes it very cost-effective and easy to operate.

Promtail is an agent that ships the logs from the local system to the Loki cluster. Grafana is the visualization tool which consumes data from Loki data sources.

Loki is built on the same design principles as Prometheus, so it is a good fit for storing and analyzing the logs of Kubernetes.

Loki Architecture

Loki can be run in single-process mode or in multi-process mode, which provides independent horizontal scalability.

source: https://github.com/grafana/loki/blob/master/docs/architecture.md

Loki is designed so that it can be used as a single monolith or as a set of microservices. The single-process model is good for local development and small monitoring setups. For production and scalable workloads, it is recommended to go with the microservices model. The write path and read path in Loki are decoupled, so it is highly tunable and can be scaled independently based on need.
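In practice this is controlled by the same binary: Loki takes a -target flag that selects which component(s) to run. A rough sketch follows; the config file name is an assumption, and the exact set of target values depends on the Loki version:

# single-process ("monolith") mode: all components in one process
loki -config.file=loki.yaml -target=all

# microservices mode: run and scale each component independently
loki -config.file=loki.yaml -target=distributor
loki -config.file=loki.yaml -target=ingester
loki -config.file=loki.yaml -target=querier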

Let's look at its logging architecture at a high level with the diagram below.

source: grafana.com

Below is a breakdown of Loki in the microservices model.

Source: grafana.com

Components:

Promtail – This is the agent which is installed on the nodes (as a DaemonSet). It pulls the logs from the jobs and talks to the Kubernetes API server to get the metadata, and it uses this information to tag the logs. It then forwards the logs to the Loki central service. The agent supports the same labelling rules as Prometheus to make sure the metadata matches.

Distributor – Promtail sends logs to the distributor, which acts as a buffer. To handle millions of writes, it batches the inflow and compresses it into chunks as the data comes in. There are multiple ingesters, and the logs belonging to each stream end up in the same ingester, so all relevant entries land in the same chunk. This is done using a ring of ingesters and consistent hashing. To provide resiliency and redundancy, each write is replicated n (default 3) times.

Ingester – As the chunks come in, they are gzipped and appended with logs. Once a chunk fills up, it is flushed to the database. The metadata goes into the Index, and the log chunk data goes into Chunks (usually an object store). After flushing, the ingester creates a new chunk and adds new entries to it.

source: grafana.com

Index – The index is a database such as DynamoDB, Cassandra, Google Bigtable, etc.

Chunks – Chunks of logs in a compressed format are stored in object stores like S3.

Querier – This is in the read path and does all the heavy lifting. Given the time range and label selector, it looks at the index to figure out which are the matching chunks. Then it reads through those chunks and greps for the result.

Now let’s see it in action.

To install the stack in Kubernetes, the easiest way is to use Helm, assuming that you have Helm installed and configured.

Add the Loki chart repository and install the Loki stack.

$ helm repo add loki https://grafana.github.io/loki/charts
$ helm repo update
$ helm upgrade --install loki loki/loki-stack --set grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false

Below is a sample dashboard showing the data from Prometheus for ETCD metrics and Loki for ETCD pod logs.


Now that we have discussed the architecture of both logging technologies, let's see how they compare against each other.

Comparison

Query Language

Elasticsearch uses the Query DSL and Lucene query language, which provide full-text search capability. It is a mature, powerful search engine with extensive operator support. It can search in the content and sort it using a relevance score. On the other side, Loki uses LogQL, which is inspired by PromQL (the Prometheus query language). It uses log labels for filtering and selecting the log data. You can use some operators and arithmetic, as documented here, but it is not as mature as the Elasticsearch query language.

Combined with label information, queries in Loki are simple for operational monitoring and can easily be correlated with metrics.
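For a rough side-by-side feel, the same "find error lines from etcd" question looks like this in each system. The index name, labels, and addresses below are made up, and the logcli example assumes LOKI_ADDR is already configured:

# Loki / LogQL: select streams by label, then filter the log lines
logcli query '{namespace="kube-system", app="etcd"} |= "error"'

# Elasticsearch Query DSL: full-text match against the indexed log content
curl 'http://localhost:9200/app-logs/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"message": "error"}}}'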

Scalability

Both are horizontally scalable, but Loki has an advantage because of its decoupled read and write paths and its microservices-based architecture. It can be customized to your specific needs and can be used to consume a very large amount of logging data.

Multi-tenancy

Having multiple tenants in a shared cluster is a common way to reduce OPEX. Both technologies provide ways to host multiple tenants. With Elasticsearch, there are various ways to keep tenants separate: one index per tenant, tenant-based routing, unique tenant fields, and search filters.

In Loki, multi-tenancy is supported by setting the X-Scope-OrgID HTTP header on each request.
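For example, a query on behalf of a tenant would be sent with that header. The tenant name, address, and label selector below are placeholders, and the endpoint path is the one used by recent Loki versions:

curl -G -H 'X-Scope-OrgID: team-a' \
  'http://loki:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={app="etcd"} |= "error"'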

Cost

Loki is an extremely cost-effective solution because of the design decision to avoid indexing the actual log data. Only metadata is indexed, which saves on storage and memory (cache). Object storage is also cheaper than the block storage required by Elasticsearch clusters.

Conclusion

The EFK stack can be used for a variety of purposes, providing the utmost flexibility and the feature-rich Kibana UI for analytics, visualization, and querying. It is also equipped with machine learning capabilities.

The Loki stack is useful in the Kubernetes ecosystem because of its metadata discovery mechanism. One can easily correlate time-series data and logs in Grafana for observability.

When cost matters and you need to store logs for a long time, Loki is a great choice for logging in cloud-native solutions.

There are more alternatives in the market which may be better for you. For example, in GKE, Stackdriver is integrated and provides a great observability solution. We haven’t included those in our analysis in this post.

Please let us know your thoughts or comments.

References

  1.  Loki / Promtail / Grafana vs EFK by Grafana
  2. https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
  3. https://www.elastic.co/blog/found-elasticsearch-in-production/

Open Application Model: Carving building blocks for Platforms


Member Post

Guest Post from Andy Shi, developer advocate for Alibaba Cloud

As a platform engineer, I often feel like a sandwich: Being squashed between the customer and the underlying infrastructure. The complaint I get most from users is about Kubernetes YAML.

As Kelsey Hightower says, “We like to hate YAML files.”

Yes, YAML has its own problems. But we shouldn’t use YAML as the scapegoat.

We've tried to use GUIs to simplify the YAML files, but DevOps/operators wanted more options. We've also tried all-in-one YAML, and developers really hated those "additional" fields. Finally, I realized the problem is that the Kubernetes API is not team-centric. In many organizations, developers and DevOps/operators are two different roles. Yet when using Kubernetes, they have to work on the same YAML file. And that means trouble. Take a look at this all-in-one YAML:

The developers care about a small part of the fields, while the DevOps/operators care about the rest. But these fields are tangled together in those APIs.

Even with Kubernetes built-in objects, the problem remains:

Take this Deployment YAML for example. Developers only care about the mid-section where the application is specified; they couldn't care less about the other sections. But those sections are there. Operators face a similar situation. And the tension worsens when new fields or even new CRDs are introduced. When things don't work out, they take their frustration out on YAML.
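To make the split concrete, here is a minimal Deployment sketch with the fields roughly annotated by who typically cares about them; the names, image, and sizing values are made up:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                        # operators: naming and labelling conventions
  labels:
    app: my-app
spec:
  replicas: 3                         # operators: capacity and availability
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example/my-app:1.2.3   # developers: the application itself
        ports:
        - containerPort: 8080         # developers: what the app exposes
        resources:                    # operators: sizing and quotas
          requests:
            cpu: 100m
            memory: 128Mi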

Speaking of the new capabilities installed by CRDs, that's another problem. Over time, the K8s community has gained more active developers than Linux. Most of them are extending K8s through CRDs and customized controllers/operators. Right now, for almost every extended workload and operational capability you can think of, there is a CRD or some controller somewhere in my cluster.

The first issue I have is: how do I keep track of these capabilities? Can I get a list of them? If I query an individual CRD, will it give me the details users need to know?

And it's not just the capabilities out of the Kubernetes community. There are so many managed services on different cloud vendors, like databases, message queues, logging systems, etc. There are also great pieces of functionality from other technologies, such as Terraform and Nomad, that we can reuse.

How do we converge these resources and capabilities and offer the users a smooth user experience in a unified approach?

We face the dilemma of the level of abstraction. If we wrap these capabilities up really nicely, it's going to take a long time and we need to fork them to be really opinionated, which will lead to hefty maintenance costs. If we expose them raw, we will get more complaints about YAML. This got me thinking: what is a good way to abstract these capabilities so that they are not too rigid for users and still allow us to provide added value at the same time?

K8s is a declarative system: each controller tries to reach the desired state of its objects, and the controllers provide the CRUD operations on those objects. Many call this design "infrastructure as data". But given such a rich source of capabilities, we should call it "infrastructure as a database". Using the database analogy, it's easier to describe the situation.

The right term in database to describe what I want is a view.

A view is a virtual table. The main purpose of a view is to abstract away the complexity of creating a specific result set. This approach has been discussed and tried out in the K8s community for quite some time. A couple of months ago, there was a nice blog post on how Pinterest created custom CRDs to abstract and customize several upstream workloads into one. This is an example of "view" creation.

But we need to take it one step further.

To make all the capabilities manageable, we need a standard for them so that we can collect, query, and share them in a unified approach. That calls for a spec. The benefit of a spec goes beyond manageability, though: it also enables reusability of the views we create. The Pinterest CRDs may benefit other platforms and vice versa. In the end, this will create an ecosystem of CRDs, a CRD market that will flourish.

This spec should also take into account the previous requirement, i.e. being team-centric. It should make the lives of developers and operators/DevOps easier as well. I'm glad to introduce the Open Application Model: the solution I was searching for.

OAM is a spec, proposed by Alibaba Cloud and Azure. The latest version of the spec is Alpha 0.2. The spec focuses on the definition of Components and Traits, and how they are combined together.

OAM is team-centric. It separates the application definition into two parts: Components and Traits. Components are the natural pieces of an application: the front end, the back end, the database, the storage, and other dependencies of your application. They define the application from the developer's view. A Component contains only application-related configuration and is filled in by developers.

Traits define operational capabilities: your workloads, rollout strategy, update policy, ingress, etc. By assembling Components and Traits together, you get the whole picture of the application, which is called the application configuration. This modularized design ensures that developers and operators can focus on their own concerns.
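To give a feel for the shape of these objects, here is a minimal sketch loosely based on the v1alpha2 examples in the OAM Kubernetes runtime; the names, image, and replica count are made up, and exact field names may differ between spec revisions:

apiVersion: core.oam.dev/v1alpha2
kind: Component
metadata:
  name: example-component            # written and owned by developers
spec:
  workload:
    apiVersion: core.oam.dev/v1alpha2
    kind: ContainerizedWorkload
    spec:
      containers:
        - name: web
          image: example/web:1.0.0
          ports:
            - containerPort: 8080
              name: http
---
apiVersion: core.oam.dev/v1alpha2
kind: ApplicationConfiguration
metadata:
  name: example-app                  # assembled by operators/DevOps
spec:
  components:
    - componentName: example-component
      traits:
        - trait:
            apiVersion: core.oam.dev/v1alpha2
            kind: ManualScalerTrait
            metadata:
              name: example-scaler
            spec:
              replicaCount: 3        # an operational concern, kept out of the Component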

And for us platform builders, OAM helps create a view layer for the platform with the idea of separation of concerns.

OAM is also a framework. The standard OAM framework for Kubernetes is co-maintained by OAM and Crossplane community. If you are intrigued by how we do it, please check out the git repo (https://github.com/crossplane/oam-kubernetes-runtime).

Author: Andy Shi is a developer advocate for Alibaba Cloud. He works on the open source cloud native platform technologies. He is passionate about making devops’ tasks easier and platforms more powerful.

Conftest joins the Open Policy Agent project


Project Post

Guest post from Gareth Rushgrove, maintainer of the Open Policy Agent project

Today the Open Policy Agent maintainers are happy to announce that Conftest has formally joined the project.

A bit of history

Conftest is a command line tool for testing configuration files and uses Open Policy Agent under the hood. I built it for two reasons: the first was that I wanted a developer-friendly tool for testing configuration data; the second was that I wanted an excuse to learn Open Policy Agent!

Conftest was first demoed at KubeCon + CloudNativeCon EU in Barcelona in May 2019. The first version provided a simple command line tool, really nothing more than some built-in conventions and a high-level CLI user interface wrapping Open Policy Agent functionality.

It turned out that the use case, having a friendly way of testing a range of different configuration file formats and integrating the results into developer tools, resonated with the community. Surprisingly quickly pull requests started to flow in.

Step forward to today, and Conftest has seen over 200 pull requests from more than 30 contributors. The project has a team of maintainers and regular contributors from a range of different organizations. We’ve had an active channel on the Open Policy Agent Slack for the last year. We have integrations with CircleCI, GitHub Actions and Tekton Pipelines. Support for testing a wide range of configuration formats, including YAML, JSON, HCL, TOML, Dockerfile and more. We’ve also been leading on work to help users share OPA Bundles more easily, with tools for sharing via Git, HTTP, S3 and OCI registries. 

Conftest today

Conftest fits nicely into the overall Open Policy Agent project. OPA itself provides the policy engine, a general-purpose CLI tool and defines the Rego language that’s used by users to write policies. But it’s designed mainly as a component that can be used for lots of different use cases. Conftest is focused purely on building the best developer experience for testing configuration files on top of that more generic engine.

$ conftest test deployment.yaml

FAIL - deployment.yaml - Containers must not run as root

FAIL - deployment.yaml - Deployments are not allowed

2 tests, 0 passed, 0 warnings, 2 failures

While the above example is for Kubernetes, Conftest can be used to test a wide range of configuration file formats. You can write tests for your Envoy JSON configuration files or your Linkerd YAML files, or use the Helm plugin to test Helm charts, for example.
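For reference, a policy like the one behind the output above is just a Rego file in the policy/ directory, using the main package and deny rules. This is a minimal, hypothetical sketch rather than a complete root-user check:

mkdir -p policy
cat <<'EOF' > policy/deployment.rego
package main

# fail any Deployment that does not explicitly opt out of running as root
deny[msg] {
  input.kind == "Deployment"
  not input.spec.template.spec.securityContext.runAsNonRoot
  msg := "Containers must not run as root"
}
EOF

conftest test deployment.yaml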

You can find out more about using Conftest, including lots of examples, by reading the documentation at conftest.dev.

Conftest also works well with the other Open Policy Agent subproject, Gatekeeper. While Gatekeeper focuses on securing a Kubernetes cluster, Conftest’s focus is earlier in the development process. By virtue of them both using Open Policy Agent under the hood, the same policies can be used in both tools, making using them together a real end-to-end solution.

The future

With Conftest formally joining the Open Policy Agent project we’re already talking about moving some of the features from Conftest into OPA itself. Conftest has acted as a great place to innovate on top of the core engine with things like input format parsing and sharing tools. With the projects now working even more closely together, it’s even easier to benefit all Open Policy Agent users. 

Conftest itself will retain its focus on developer experience, and making Open Policy Agent as easy as possible to adopt for testing configuration.

Join us in the #conftest channel on the Open Policy Agent Slack and head over to the GitHub repository to get started using or contributing to Conftest.
