Intuit Wins CNCF End User Award


The Cloud Native Computing Foundation (CNCF) has announced Intuit as the winner of the top end user award. Intuit is being recognized for our contributions to the cloud native community and for how we leverage cloud native technologies in production, including CNCF projects like Kubernetes, Istio, Prometheus, Fluentd, Jaeger, and Open Policy Agent, to build a modern developer platform that provides continuous integration, continuous delivery, and continuous operations to accelerate developer productivity.

As part of our journey to the cloud and mobile, we continue to modernize all aspects of our technology, including our platform, tools, and processes, and to advance the way we leverage cloud native technologies. This includes how we create, deploy, run, and monitor applications and services at scale. In January 2018, we acquired Applatix, a talented team with deep systems and infrastructure knowledge and expertise in building scalable production systems with containers and Kubernetes in both public and private cloud environments. As part of this acquisition, we also gained their flagship open source offering, Argoproj, a set of Kubernetes-native projects.

In the past year alone, Intuit has operationalized more than a hundred Kubernetes clusters that run more than 500 services in production and pre-production across multiple business units. As part of that deployment, Intuit solved many common issues for teams deploying Kubernetes and related technologies, and shared those solutions with the larger Kubernetes community in order to increase developer productivity.

We continue to be actively involved in the cloud native community, joining the Cloud Native Computing Foundation as an End User Silver member in January 2018 and serving as a founding member of the GraphQL Foundation since March 2019. Intuit is an active member of the CNCF End User Community, which meets regularly to share adoption best practices and to give feedback on project roadmaps and future projects for CNCF technical leaders to consider.

“It is such an honor to receive the CNCF End User Award, recognizing our commitment and contributions to the cloud native community,” said Jeff Brewer, Vice President, Chief Architect of the Small Business and Self Employed Group at Intuit. “We’ve undergone several transformations as a company, from the desktop to the web to the cloud and now to AI and ML, and each of these transformations requires us to move faster with increased speed of innovation. Through Kubernetes and other cloud native tools, we are able to deploy code faster than we ever have been able to. I’m excited to see how these tools help us to not only better serve our developer community, but also our customers overall as we work to power their prosperity.”

To learn more about Argo and other Intuit open source tools, check out our open source page at https://opensource.intuit.com.

A Brief History of OpenTelemetry (So Far)


by Ben Sigelman, co-creator of OpenTracing and member of the OpenTelemetry governing committee, and Morgan McLean, Product Manager for OpenCensus at Google since the project’s inception

After many months of planning, discussion, prototyping, more discussion, and more planning, OpenTracing and OpenCensus are merging to form OpenTelemetry, which is now a CNCF sandbox project. The seed governance and technical committees are composed of representatives from Google, LightStep, Microsoft, and Uber, and more organizations are getting involved every day.

We couldn’t be happier about it – here’s why.

Observability, Outputs, and High-Quality Telemetry

Observability is a fashionable word with some admirably nerdy and academic origins. In control theory, “observability” measures how well we can understand the internals of a given system using only its external outputs. If you’ve ever deployed or operated a modern, microservice-based software application, you have no doubt struggled to understand its performance and behavior, and that’s because those “outputs” are usually meager at best. We can’t understand a complex system if it’s a black box. And the only way to light up those black boxes is with high-quality telemetry: distributed traces, metrics, logs, and more.

So how can we get our hands – and our tools – on precise, low-overhead telemetry from the entirety of a modern software stack? One way would be to carefully instrument every microservice, piece by piece and layer by layer. This would technically work, but it’s a complete non-starter – we’d spend as much time on the measurement as we would on the software itself! We need telemetry as a built-in feature of our services.

The OpenTelemetry project is designed to make this vision a reality for our industry, but before we describe it in more detail, we should first cover the history and context around OpenTracing and OpenCensus.

OpenTracing and OpenCensus

In practice, there are several flavors (or “verticals”) of telemetry data, and several integration points (or “layers”) available for each. Broadly, the cloud-native telemetry landscape is dominated by distributed traces, timeseries metrics, and logs; end-users typically integrate via a thin instrumentation API or via straightforward structured data formats that describe those traces, metrics, or logs.

 

For several years now, there has been a well-recognized need for industry-wide collaboration in order to amortize the shared cost of software instrumentation. OpenTracing and OpenCensus have led the way in that effort, and while each project made different architectural choices, the biggest problem with either project has been the fact that there were two of them, and that the two projects weren’t working together or striving for mutual compatibility.

Having two similar-yet-not-identical projects out in the world created confusion and uncertainty for developers, and that made it harder for both efforts to realize their shared mission: built-in, high-quality telemetry for all.

Getting to One Project

If there’s a single thing to understand about OpenTelemetry, it’s that the leadership of OpenTracing and OpenCensus is committed to migrating their respective communities to this single, unified initiative. Although all of us have numerous ideas about how we could boil the ocean and start from scratch, we are resisting those impulses and focusing instead on preparing our communities for a successful transition. Our priorities for the merger are clear:

  • Straightforward backwards compatibility with both OpenTracing and OpenCensus (via software bridges)
  • Minimizing the time during which OpenTelemetry, OpenTracing, and OpenCensus are co-developed: we plan to put OpenTracing and OpenCensus into “read-only mode” before the end of 2019
  • And, again, simplifying and standardizing the telemetry solutions available to developers

In many ways, it’s most accurate to think of OpenTelemetry as the next major version of both OpenTracing and OpenCensus. Like any version upgrade, we will try to make it easy for both new and existing end-users, but we recognize that the main benefit to the ecosystem is the consolidation itself – not some specific and shiny new feature – and we are prioritizing our own efforts accordingly.

How you can help

OpenTelemetry’s timeline is an aggressive one. While we have many open-source and vendor-licensed observability solutions providing guidance, we will always want as many end-users involved as possible. The single most valuable thing any end-user can do is also one of the easiest: check out the actual work we’re doing and provide feedback via GitHub, Gitter, email, or whatever channel feels easiest.

Of course we also welcome code contributions to OpenTelemetry itself, code contributions that add OpenTelemetry support to existing software projects, documentation, blog posts, and the rest of it. If you’re interested, you can sign up to join the integration effort by filling in this form.

Going Big: Harbor 1.8 Takes Security and Replication to New Heights


By Michael Michael, Harbor Core Maintainer, Director of Product Management, VMware (Twitter: @michmike77)

Happy release day everyone! We are very excited to present the latest release of Harbor. The release cycle for version 1.8 was one of our longest cycles, and version 1.8 involved the highest number of contributions from community members of any Harbor release to date. As a result, 1.8 is our best release so far and comes packed with a great number of new features and improvements, including enhanced automation integration, security, monitoring, and cross-registry replication support.

Support for OpenID Connect

In many environments, Harbor is integrated with existing enterprise identity solutions to provide single sign-on (SSO) for developers and users. OpenID Connect (OIDC), which is an authentication layer on top of OAuth 2.0, allows Harbor to verify the identity of users based on authentication performed by an external authorization server or identity provider. Administrators can now enable an OIDC provider as the authentication mode for Harbor users, who can then use their single sign-on credentials to log in to the Harbor portal.

In most situations, tools like the Docker client are incapable of logging in by using SSO and federated identity when the user has to be redirected to an external identity provider. To remedy this issue, Harbor now includes CLI secrets, which can provide end users with a token that can be used to access Harbor via the Docker or Helm clients.

Robot Accounts

In a similar scenario to the Docker client SSO issue mentioned above, Harbor is often integrated with CI/CD tools that are unable to perform SSO with federated enterprise identity providers. With version 1.8, administrators can now create robot accounts, a type of special account that allows Harbor to be integrated and used by automated systems, such as CI/CD tools. You can configure robot accounts to provide administrators with a token that can be granted appropriate permissions for pulling or pushing images. Harbor users can continue operating Harbor using their enterprise SSO credentials, and use robot accounts for CI/CD systems that perform Docker client commands.

Replication Advancements

Many users need to replicate images and Helm charts across many different environments, from the data center to the edge. In certain situations, users may have deployed applications on a public cloud and rely on the cloud provider’s built-in registry. Those built-in registries don’t offer many of Harbor’s capabilities and features, most notably the static analysis of images.

Harbor 1.8 expands the Harbor-to-Harbor replication feature to add the ability to replicate resources between Harbor and Docker Hub, Docker Registry, and the Huawei Cloud registry by using both push- and pull-mode replication. Harbor can act as the central repository for all images, scan them for vulnerabilities, enforce compliance and other policies, and then replicate images to other registries acting as a pure content repository. One use case is creating replicas of your Harbor image repository on different types of repositories spread across data centers in different regions. This new Harbor feature has been created using a provider interface, and we expect our developer community to add support for more registries in the future.

Additional Features

Harbor 1.8 brings numerous other capabilities for both administrators and end users:

  1. Health check API, which shows detailed status and health of all Harbor components.
  2. Harbor extends and builds on top of the open source Docker Registry to facilitate registry operations like the pushing and pulling of images. In this release, we upgraded our Docker Registry to version 2.7.1.
  3. Support for defining cron-based scheduled tasks in the Harbor UI. Administrators can now use cron strings to define the schedule of a job. Scan, garbage collection, and replication jobs are all supported.
  4. API explorer integration. End users can now explore and trigger Harbor’s API via the Swagger UI nested inside Harbor’s UI.
  5. Enhancement of the Job Service engine to include internal webhook events, additional APIs for job management, and numerous bug fixes to improve the stability of the service.

Growing End User Support for Harbor

We’re proud of the functionality we’re delivering in Harbor 1.8. We’re also fortunate to have a growing community willing to try Harbor and provide us with feedback. Here are some comments shared by end users on their use of Harbor:

Fanjian Kong, Senior Engineer, 360 Total Security

“Through Harbor’s Web UI, we can conveniently manage the access rights of projects, members and images. We take advantage of Harbor’s remote replication features to create replicas of image repository in data centers across different regions.”

De Chen, Cloud Platform Senior Software Engineer, CaiCloud

“In Caicloud’s product of cloud native platform, we leverage Harbor to implement the capability of image management, including Harbor’s image synchronization and vulnerability scanning function. Delivered as an important component in our product, Harbor has been used by many of our enterprise customers.”

Mingming Pei, Senior development engineer, Netease Cloud

“Harbor provides rich functions in container image management. It solves our challenges of transferring images and Helm charts between container clusters. Harbor does allow us to save a lot of resources in image repository. The community is very active and the features are constantly being improved.”

Since becoming a Cloud Native Computing Foundation (CNCF) Incubating project, there’s been a tremendous increase in participation by our community, evident in the breadth of new features included with this release. We want to extend a huge thank you to the community for making this release possible through all your contributions of code, testing, and feedback. If you are a new or aspiring contributor, there are many ways to get involved as a developer or a user. You can join us on Slack, GitHub, or Twitter to help advance the Harbor vision.

Join the Harbor Community!

Get updates on Twitter (@project_harbor)

Chat with us on Slack (#harbor on the CNCF Slack)

Collaborate with us on GitHub: github.com/goharbor/harbor

Michael Michael

Harbor Core Maintainer

Director of Product Management, VMware

@michmike77

TOC Votes to Move TiKV into CNCF Incubator


Today, the Cloud Native Computing Foundation’s (CNCF) Technical Oversight Committee (TOC) voted to accept TiKV as an incubation-level hosted project.

TiKV, which entered the CNCF Sandbox in August 2018, is an open source distributed transactional key-value database built in Rust. The project serves as a unifying distributed storage layer that supports strong data consistency, distributed transactions, horizontal scalability, and cloud native architecture.

“There is a huge need for an open-source, unifying distributed storage layer that supports cloud native architectures,” said Siddon Tang, Chief Engineer at PingCAP and TiKV project lead. “Since joining CNCF, not only have our user adoption and project maturity been steadily improving, but we’ve navigated a multitude of new real-world scenarios. We look forward to the growth and development to come as we move into the Incubator.”

TiKV was designed from the ground up to be cloud native, and it integrates well into existing CNCF ecosystems. The project uses Prometheus for metrics reporting, and gRPC for communication. It can also be deployed on top of Kubernetes with an operator to ease installation, upgrades and maintenance.

Since joining CNCF, TiKV has been adopted by hundreds of companies in production, including several CNCF members such as JD.com, UCloud, and PingAn Technology. The team also announced General Availability of TiKV 2.1 with the help of 39 new contributors.

TiKV currently serves millions of users in industries including banking, fintech, insurance, ridesharing, and gaming. Some of the largest internet companies, most notably Xiaomi, Bank of Beijing, Zhihu, Shopee, BookMyShow, and many others, are using TiKV for mission-critical systems, both with and without TiDB, a stateless SQL layer that speaks the MySQL protocol. Additionally, several storage systems are built on top of TiKV, including three Redis-on-TiKV projects (Tidis, Titan, and Titea) and a Prometheus-metrics-in-TiKV project, TiPrometheus.

“The community needs more cloud native storage options that support consistency and scalability, and TiKV offers this without dependency on any distributed file system,” said Chris Aniszczyk, CTO/COO of the Cloud Native Computing Foundation. “Since it joined CNCF, we’ve seen impressive growth of the project in and outside of China. As it moves into incubation, we are excited to see this continue as new contributors continue to add new capabilities.”

TiKV was originally developed at PingCAP in 2016, and today includes contributions from Samsung, Mobike, Zhihu, Ele.me, Tencent Cloud, and UCloud.  

Main TiKV Features:

  • Geo-Replication – uses Raft and the Placement Driver (PD) to support Geo-Replication.
  • Horizontal scalability – with PD and carefully designed Raft groups, TiKV excels in horizontal scalability and can easily scale to 100+ TBs of data.
  • Consistent distributed transactions – similar to Google’s Spanner, TiKV supports externally-consistent distributed transactions.
  • Coprocessor support – similar to HBase, TiKV implements a coprocessor framework to support distributed computing.
  • Cooperates with TiDB – thanks to internal optimizations, TiKV and TiDB can work together to form a compelling database solution with high horizontal scalability, externally-consistent transactions, and support for both RDBMS and NoSQL design patterns.

Notable Milestones:

  • 247 contributors
  • 5,120 GitHub stars
  • 54 releases
  • 3,654 commits
  • 743 forks

As a CNCF hosted project, joining incubating technologies like gRPC, rkt, CNI, Jaeger, Notary, TUF, Vitess, NATS, Linkerd, Helm, Rook, Harbor, etcd, Open Policy Agent and CRI-O, TiKV is part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provides governance, marketing support, and community outreach.

Every CNCF project has an associated maturity level: sandbox, incubating, or graduated project. For more information on what qualifies a technology for each level, please visit the CNCF Graduation Criteria v.1.1.

For more on TiKV, please visit https://github.com/tikv/tikv.

Cloud Native Logging with Fluentd: New Online Course Available on Linux Foundation Training


The Cloud Native Computing Foundation and The Linux Foundation have designed a new, self-paced and hands-on course to introduce individuals with a technical background to the Fluentd log forwarding and aggregation tool for use in cloud native logging. Available starting today, Cloud Native Logging with Fluentd will provide users with the necessary skills to deploy Fluentd in a wide range of production settings.

As large-scale, distributed systems become increasingly prevalent, the challenges of managing logs have become acute. It is increasingly common to have thousands of nodes and tens of thousands of services all emitting data that needs to be attributed, normalized, and aggregated, i.e., “logged”. Known as the “unified logging layer”, Fluentd provides fast and efficient log transformation and enrichment, as well as aggregation and forwarding.

“This course will explore the full range of Fluentd features, from installing Fluentd and running it in a container, to using it as a simple log forwarder or a sophisticated log aggregator and processor,” said Eduardo Silva, Principal Engineer at Arm Treasure Data. “As we see the Fluentd project growing into a full ecosystem of third party integrations and components, we are thrilled that this course will be offered so more people can realize the benefits it provides.”

Upon course completion, developers will be able to:

  • Install and configure Fluentd in Cloud Native environments
  • Configure Fluentd to process log data from multiple inputs
  • Configure Fluentd to filter and transform data
  • Configure Fluentd to distribute log data to various backends
  • Configure Fluentd for high availability and high performance

Course Outline

  1. Introduction to Fluentd and Unified Logging
    1. Lab: Run Fluentd on Linux
    2. Lab: Run Fluentd Using the Docker Container Runtime
    3. Lab: Run Fluentd Using Kubernetes
  2. Fluentd Configuration
    1. Lab: Configuring Fluentd
  3. Extending Fluentd with Plugins
    1. Lab: Extending Fluentd with Plugins: Working with Input and Output Plugins
  4. Filtering Data and Creating Pipelines
    1. Lab: Extending Fluentd with Plugins: Working with Input and Output Plugins
  5. Parsing and Formatting Data
    1. Lab: Working with Parser and Formatter Plugins, and Processing Apache2 Log Data
  6. Effective Configurations Design Using Labels and Includes
    1. Lab: Organizing Complex Configurations by Working with Container Logs
  7. Multi-Instance Deployments with Fluentd
    1. Lab: Using Multiple Fluentd Instances, Configuring High Availability and Testing Failover in Fluentd
  8. Monitoring the Unified Logging Layer
    1. Lab: Monitoring Fluentd
  9. Debugging, Tuning and Securing Fluentd Configurations
    1. Lab: Debugging Fluentd Configurations and Creating Intermediate Configurations
  10. Introduction to Fluent Bit
    1. Lab: Fluent Bit with Fluentd

To take this course, some familiarity with logging and log management is helpful. Labs require a minimal Ubuntu 16.04 system with Docker installed (Lab 1.b. supplies Docker installation instructions).

Interested in this course? Please visit the course’s Linux Foundation Training page to get more detailed information.

Helm 3 Preview: Helm 3 Alpha Release Available and What’s Next


First published on https://helm.sh/blog by Matt Fisher @bacongobbler

Helm 3 Preview: Charting Our Future – Part 1: A History of Helm

On October 15th, 2015, the project now known as Helm was born. Only one year later, the Helm community joined the Kubernetes organization as Helm 2 was fast approaching. In June 2018, the Helm community joined the CNCF as an incubating project. Fast forward to today, and Helm 3 is nearing its first alpha release.

In this blog post, I’ll provide some history on Helm’s beginnings, illustrate how we got where we are today, showcase some of the new features available in the first alpha release of Helm 3, and explain how we move forward from here.

In order, I’ll discuss:

  1. The history of the creation of Helm
  2. A Gentle Farewell to Tiller
  3. Chart Repositories
  4. Release Management
  5. Changes to Chart Dependencies
  6. Library Charts
  7. What’s Next?

A History of Helm

Helm was Born

Helm 1 began as an open source project created by Deis. We were a small startup company acquired by Microsoft in the spring of 2017. Our other open source project – also called Deis – had a tool called deisctl that was used for (among other things) installing and operating the Deis platform on a Fleet cluster. Fleet was one of the first “container orchestrator” platforms to exist at the time.

In mid-2015, we decided to shift gears, and the foundation of Deis (now re-named “Deis Workflow”) moved from Fleet to Kubernetes. One of the first things we had to rewrite was the installation tool, deisctl. We used this tool to install and manage Deis Workflow on a Fleet cluster.

Modeled after package managers like Homebrew, apt, and yum, the focus of Helm 1 was to make it easy for users to package and install their applications on Kubernetes. We officially announced Helm in 2015 at the inaugural KubeCon in San Francisco.

Our first attempt at Helm worked, but had its fair share of limitations. It took a set of Kubernetes manifests – sprinkled with generators as YAML front-matter – and loaded the generated results into Kubernetes.

For example, to substitute a field in a YAML file, one would add the following to a manifest:

#helm:generate sed -i -e s|ubuntu-debootstrap|fluffy-bunny| my/pod.yaml

Makes you really happy that template languages exist today, eh?

For many reasons, this early Kubernetes installer required a hard-coded list of manifest files and performed only a small fixed sequence of events. It was painful enough to use that the Deis Workflow R&D team was having a tough time replatforming their product around it, but the seed of an idea was there. Our first attempt was a very successful learning opportunity: we learned that we were passionate about building pragmatic solutions that solved real day-to-day problems for our users.

Learning from our past mistakes, we started designing Helm 2.

Designing Helm 2

As 2015 wound to a close, a team from Google reached out to the Helm team. They, too, had been working on a similar tool for Kubernetes. Deployment Manager for Kubernetes was a port of an existing tool they used for Google Cloud Platform. Would we be interested, they asked, in spending a few days talking about similarities and differences?

In January 2016, the Helm and Deployment Manager teams sat down in Seattle to share some ideas. We walked out with a bold plan: merge the projects to create Helm 2. Along with Deis and Google, Skippbox joined the development team, and we started work on Helm 2.

Our goal was to maintain Helm’s ease of use, but add the following:

  • Chart templates for customization
  • In-cluster management for teams
  • A first-class chart repository
  • A stable and signable package format
  • A strong commitment to semantic versioning and retaining backward compatibility version-to-version

To accomplish these goals, we added a second component to the Helm ecosystem. This in-cluster component was called Tiller, and it handled installing and managing Helm charts.

Since the release of Helm 2 in 2016, Kubernetes has added several major features. Role-Based Access Control (RBAC) was added and eventually replaced Attribute-Based Access Control (ABAC). Many new resource types were introduced (Deployments were still in beta at the time). Custom Resource Definitions (then called Third Party Resources, or TPRs) were invented. And most importantly, a set of best practices emerged.

Throughout all of these changes, Helm continued to serve the needs of Kubernetes users. After three years and many new feature additions, it was clearly time to introduce some major changes to the code base so that Helm could continue to meet the needs of this evolving ecosystem.

Helm 3 Preview: Charting Our Future – Part 2: A Gentle Farewell to Tiller

During the Helm 2 development cycle, we introduced Tiller as part of our integration with Google’s Deployment Manager. Tiller played an important role for teams working on a shared cluster – it made it possible for multiple different operators to interact with the same set of releases.

With role-based access controls (RBAC) enabled by default in Kubernetes 1.6, locking down Tiller for use in a production scenario became more difficult to manage. Due to the vast number of possible security policies, our stance was to provide a permissive default configuration. This allowed first-time users to start experimenting with Helm and Kubernetes without having to dive headfirst into the security controls. Unfortunately, this permissive configuration could grant a user a broad range of permissions they weren’t intended to have. DevOps and SREs had to learn additional operational steps when installing Tiller into a multi-tenant cluster.

After hearing how community members were using Helm in certain scenarios, we found that Tiller’s release management system did not need to rely upon an in-cluster operator to maintain state or act as a central hub for Helm release information. Instead, we could simply fetch information from the Kubernetes API server, render the Charts client-side, and store a record of the installation in Kubernetes.

Tiller’s primary goal could be accomplished without Tiller, so one of the first decisions we made regarding Helm 3 was to completely remove Tiller.

With Tiller gone, the security model for Helm is radically simplified. Helm 3 now supports all the modern security, identity, and authorization features of modern Kubernetes. Helm’s permissions are evaluated using your kubeconfig file. Cluster administrators can restrict user permissions at whatever granularity they see fit. Releases are still recorded in-cluster, and the rest of Helm’s functionality remains.
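
As an illustrative sketch (not taken from the Helm documentation; the namespace, Role, and user names are hypothetical), a cluster administrator could confine a user, and therefore Helm 3 acting with that user’s kubeconfig, to a single namespace using ordinary Kubernetes RBAC:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: helm-release-manager     # hypothetical Role name
  namespace: team-a              # hypothetical namespace
rules:
- apiGroups: ["", "apps"]
  resources: ["secrets", "configmaps", "services", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: helm-release-manager
  namespace: team-a
subjects:
- kind: User
  name: jane                     # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: helm-release-manager
  apiGroup: rbac.authorization.k8s.io

Because Helm 3 acts with the user’s own credentials, it needs no special privileges beyond what a binding like this grants.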

Helm 3 Preview: Charting Our Future – Part 3: Chart Repositories

At a high level, a Chart Repository is a location where Charts can be stored and shared. The Helm client packs and ships Helm Charts to a Chart Repository. Simply put, a Chart Repository is a basic HTTP server that houses an index.yaml file and some packaged charts.
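
As a rough illustration (the chart, URL, digest, and timestamps below are placeholders, not a real repository), an index.yaml is simply a YAML listing of the charts a repository serves:

apiVersion: v1
entries:
  wordpress:
  - name: wordpress
    version: 5.9.0                      # hypothetical chart version
    appVersion: "5.1.1"
    urls:
    - https://example.com/charts/wordpress-5.9.0.tgz
    digest: sha256:placeholder
    created: "2019-05-01T00:00:00Z"
generated: "2019-05-01T00:00:00Z"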

While there are several benefits to the Chart Repository API meeting the most basic storage requirements, a few drawbacks have started to show:

  • Chart Repositories have a very hard time abstracting most of the security implementations required in a production environment. Having a standard API for authentication and authorization is very important in production scenarios.
  • Helm’s Chart provenance tools used for signing and verifying the integrity and origin of a chart are an optional piece of the Chart publishing process.
  • In multi-tenant scenarios, the same Chart can be uploaded by another tenant, doubling the storage needed for the same content. Smarter chart repositories have been designed to handle this, but it’s not a part of the formal specification.
  • Using a single index file for search, metadata information, and fetching Charts has made it difficult or clunky to design around in secure multi-tenant implementations.

Docker’s Distribution project (also known as Docker Registry v2) is the successor to the Docker Registry project, and is the de-facto toolset to pack, ship, store, and deliver Docker images. Many major cloud vendors have a product offering of the Distribution project, and with so many vendors offering the same product, the Distribution project has benefited from many years of hardening, security best practices, and battle-testing, making it one of the most successful unsung heroes of the open source world.

But did you know that the Distribution project was designed to distribute any form of content, not just container images?

Thanks to the efforts of the Open Container Initiative (or OCI for short), Helm Charts can be hosted on any instance of Distribution. The work is experimental, with login support and other features considered “table stakes” for Helm 3 yet to be finished, but we’re very excited to learn from the discoveries the OCI and Distribution teams have made over the years, and to benefit from their mentorship and guidance on what it means to run a highly available service at scale.

I wrote a more detailed deep-dive on some of the upcoming changes to Helm Chart Repositories if you’d like to read more on the subject.

Helm 3 Preview: Charting Our Future – Part 4: Release Management

In Helm 3, an application’s state is tracked in-cluster by a pair of objects:

  • The release object: represents an instance of an application
  • The release version secret: represents an application’s desired state at a particular instance of time (the release of a new version, for example)

A helm install creates a release object and a release version secret. A helm upgrade requires an existing release object (which it may modify) and creates a new release version secret that contains the new values and rendered manifest.

The release object contains information about a release, where a release is a particular installation of a named chart and values. This object describes the top-level metadata about a release. The release object persists for the duration of an application lifecycle, and is the owner of all release version secrets, as well as of all objects that are directly created by the Helm chart.

The release version secret ties a release to a series of revisions (install, upgrades, rollbacks, delete).

In Helm 2, revisions were merely incremental. helm install created v1, a subsequent upgrade created v2, and so on. The release and release version secret were collapsed into a single object known as a revision. Revisions were stored in the same namespace as Tiller, meaning that each release name was “globally” namespaced; as a result, only one instance of a name could be used.

For Helm 3, a release has one or more release version secrets associated with it. The release object always describes the current release deployed to Kubernetes. Each release version secret describes just one version of that release. An upgrade operation, for example, will create a new release version secret, and then modify the release object to point to this new version. Rollback operations can use older release version secrets to roll back a release to a previous state.

With Tiller gone, Helm 3 stores release data in the same namespace as the release’s destination. This change allows one to install a chart with the same release name in another namespace, and data is persisted between cluster upgrades/reboots in etcd. You can install WordPress into namespace “foo” as well as namespace “bar”, and both releases can be referred to as “wordpress”.
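
To make that concrete, here is a rough sketch of the kind of Secret Helm 3 uses to record a release version in the release’s own namespace (field names reflect how Helm 3 stores releases today and may differ slightly in the alpha; the payload is shown as a placeholder):

apiVersion: v1
kind: Secret
type: helm.sh/release.v1                  # Helm-specific Secret type
metadata:
  name: sh.helm.release.v1.wordpress.v1   # <release name>.v<revision>
  namespace: foo                          # stored alongside the release itself
  labels:
    owner: helm
    name: wordpress
    status: deployed
data:
  release: <base64-encoded release payload>   # placeholder

A second install named “wordpress” in namespace “bar” would simply create its own sh.helm.release.v1.wordpress.v1 Secret there, with no naming conflict.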

Helm 3 Preview: Charting Our Future – Part 5: Changes to Chart Dependencies

Charts that were packaged (with helm package) for use with Helm 2 can be installed with Helm 3, but the chart development workflow received an overhaul, so some changes are necessary to continue developing charts with Helm 3. One of the components that changed was the chart dependency management system.

The Chart dependency management system moved from requirements.yaml and requirements.lock to Chart.yaml and Chart.lock, meaning that charts that relied on the helm dependency command will need some tweaking to work in Helm 3.

Let’s take a look at an example. Let’s add a dependency to a chart in Helm 2 and then look at how that changed in Helm 3.

In Helm 2, this is how a requirements.yaml looked:

dependencies:
- name: mariadb
  version: 5.x.x
  repository: https://kubernetes-charts.storage.googleapis.com/
  condition: mariadb.enabled
  tags:
    - database

In Helm 3, the same dependency is expressed in your Chart.yaml:

dependencies:
- name: mariadb
  version: 5.x.x
  repository: https://kubernetes-charts.storage.googleapis.com/
  condition: mariadb.enabled
  tags:
    - database

Charts are still downloaded and placed in the charts/ directory, so subcharts vendored into the charts/ directory will continue to work without modification.
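
For illustration, running helm dependency update against the Chart.yaml above would generate a Chart.lock next to it that pins the resolved version; the resolved version, digest, and timestamp below are hypothetical:

dependencies:
- name: mariadb
  repository: https://kubernetes-charts.storage.googleapis.com/
  version: 5.11.0                # hypothetical version resolved from 5.x.x
digest: sha256:placeholder
generated: "2019-05-16T00:00:00Z"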

Helm 3 Preview: Charting Our Future – Part 6: Introducing Library Charts

Helm 3 supports a class of chart called a “library chart”. This is a chart that is shared by other charts, but does not create any release artifacts of its own. A library chart’s templates can only declare define elements. Globally scoped non-define content is simply ignored. This allows users to share snippets of code that can be re-used across many charts, avoiding redundancy and keeping charts DRY.

Library charts are declared in the dependencies directive in Chart.yaml, and are installed and managed like any other chart.

dependencies:
  - name: mylib
    version: 1.x.x
    repository: quay.io

We’re very excited to see the use cases this feature opens up for chart developers, as well as any best practices that arise from consuming library charts.
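
As an illustration (the chart and template names here are hypothetical), a library chart ships templates that contain only define blocks, which a consuming chart can then include:

# templates/_common.tpl in the hypothetical "mylib" library chart:
# define-only content, so the chart renders no resources of its own.
{{- define "mylib.labels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

A chart that lists mylib as a dependency can then pull these labels into its own templates with something like {{ include "mylib.labels" . | nindent 4 }}.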


Helm 3 Preview: Charting Our Future – Part 7: What’s Next?

Helm 3.0.0-alpha.1 is the foundation upon which we’ll begin to build the next version of Helm. The features above are some of the big promises we made for Helm 3. Many of those features are still in their early stages and that is OK; the idea of an alpha release is to test out an idea, gather feedback from early adopters, and validate those assumptions.

Once the alpha has been released, we can start accepting patches from the community for Helm 3. We should have a stable foundation on which to build and accept new features, and users should feel empowered to open tickets and contribute fixes.

In this blog post, I have tried to highlight some of the big improvements coming to Helm 3, but this list is by no means exhaustive. The full plan for Helm 3 includes features such as improved upgrade strategies, deeper integrations with OCI registries, and applying JSON schemas against chart values for validation purposes. We’re also taking a moment to clean up the codebase and update parts that have languished over the last three years.

If you feel like a topic was missed, we’d love to hear your thoughts!

Feel free to join the discussion in our Slack channels:

  • #helm-users for questions and just to hang out with the community
  • #helm-dev for discussing PRs, code, and bugs

A year later – updating Container Attached Storage


Guest post by Evan Powell, CEO at MayaData

Last year, with a good amount of coaching and feedback from the CNCF team, we published a blog that set out to define the Container Attached Storage (CAS) approach. As a reminder, we tend to include OpenEBS, of course, as well as solutions with similar architectures such as the proprietary Portworx and StorageOS, in the CAS category.

https://www.cncf.io/blog/2018/04/19/container-attached-storage-a-primer/

Now that OpenEBS has been contributed to the CNCF as a Sandbox project (as of May 14th, 2019), serving as an open source example of the CAS approach, I thought it timely to update this overview of the category.

Last year’s category-defining blog built on a vision of our approach that I had shared some years before at the Storage Developer Conference, and which Jeffry Molanus, MayaData’s CTO, has discussed in more depth at FOSDEM (the Free and Open source Software Developers’ European Meeting) and elsewhere, including a demonstration of soon-to-be-available software breaking the million-IOPS barrier:

https://ftp.osuosl.org/pub/fosdem/2019/H.2214/openebs_breaking_million_iops_barrier.mp4

As a quick review:

Key attributes of CAS include:

  • Using Kubernetes and containers to deliver storage and data management services to workloads running on Kubernetes;
  • Additive to underlying storage whether those are cloud volumes, traditional storage SANs, a bunch of disks, NVMe or whatever;
  • Per-workload storage, meaning each group and workload has its own micro storage system, composed of one or more controllers, which themselves are stateless, plus underlying data containers.

Key benefits of CAS:

  • Immediate deployment – a few seconds and there you go – storage protecting your data and managing the underlying environment and even providing snapshots and clones for common CI/CD and migration use cases (note that some CAS solutions do have kernel modules which could slow this down depending on your environment).
  • Zero operations – there really isn’t any such thing as NoOps – however, embedding your storage intelligence into Kubernetes itself can reduce the storage operations burden considerably.
  • Run anywhere the same way – especially with a solution like OpenEBS that is in the user space, you can abstract away from the various flavors of storage; this is consistent with the mission of Kubernetes itself of course!
  • Save money, improve the resilience of your cloud storage – thanks to thin provisioning and the ability to span availability zones and to spin up and down easily, in some cases users are saving 30% or more on their cloud storage through the use of container attached storage.

And the key drivers – why is it possible and even necessary now?

  • Applications have changed – applications and the teams that build them now have very different requirements; see for example the growth of NoSQL and so-called NewSQL solutions.
  • Kubernetes is becoming ubiquitous – providing for the first time a means to scale solutions like Container Attached Storage software.
  • Containers are much more efficient – and more ephemeral – so you see 10x to 100x more containers in a typical environment than VMs and they are much more dynamic than traditional VMs.
  • Storage media are perhaps 10,000x faster than when Ceph was written – the bottleneck in your environment used to be disk drives, and storage software heroically worked around this bottleneck by striping across environments; now the storage media are insanely fast, and your storage software’s inclination to stripe data adds latency.

If you are interested in a hopefully humorous view of the history of storage in three slides, please take a look at this GIF-filled presentation:

https://docs.google.com/presentation/d/11dNx7-HEqUg6ZeUAtaKAzfqfscmuDcXaKWbjxbM3us4/edit#slide=id.g4d8fc3d7a4_1_0

Data on what we have learned in the last year:  

The momentum that MayaData and other CAS solution providers are experiencing validates that the Container Attached Storage approach makes a lot of sense for many users.

We have learned a lot about these deployment patterns and common workloads, some of which I summarize here. The following data is from a survey of OpenEBS users; however, the patterns are similar for CAS more broadly, keeping in mind that OpenEBS is open source and especially lightweight, so it may skew slightly more towards cloud deployments, for example.

Here are a few interesting data points:

  1. When asked where they are running OpenEBS, users respond yes (i.e. all of the above):

(Survey charts: Public or private cloud? Which cloud? What Kubernetes?)

  2. Similarly, when asked what workloads they are running on OpenEBS, we again see a great diversity of answers:

What we have found is similar to what I think everyone that has worked in storage has found over the years – workloads and solutions pull storage along.  

Over the last year, we have seen opportunities for collaboration with other CAS solution providers as well. As an example, we collaborated with Portworx in supporting and improving WeaveScope. The MayaData team contributed extensions to the WeaveScope project to add visibility into PVCs, PVs, and the underlying storage, and Portworx engineers provided feedback and insight into use cases. The software is of course upstream in WeaveScope itself, and we also offer a free monitoring and management solution called MayaOnline for community members that incorporates this code.

While CAS approaches allow Kubernetes to orchestrate the data along with the workloads, the natural question that is often asked is essentially “Dude, where’s my data?” As you can see, WeaveScope and MayaOnline provide an up-to-date answer to that and related questions.

Conclusion

Thank you all for your feedback and support over the last couple of years, since we first open sourced OpenEBS in early 2017 and since we helped to define the Container Attached Storage category in the spring of 2018. We increasingly hear and see the CAS category used by commentators and “thought leaders” – which is great. The idea of an additive and truly Kubernetes-native solution for data services just makes too much sense.

After all – what is good for the goose (the applications and workloads themselves) is good for the gander (in this case all manner of infrastructure, including storage services).  Even though addressing cloud lock-in could impact the primary source of revenues for clouds, we still, on the whole, see support in the Kubernetes and cloud-native community for the CAS pattern.  

We are getting closer than ever to the openness and agility we need for our data to keep up with our applications.  Thanks again for your interest in the CAS pattern. Please keep in touch with your feedback, use cases, and insights.  

 

ICYMI: May 2019 San Francisco Linkerd Meetup


The San Francisco Linkerd May Meetup was a fun night filled with Linkerd enthusiasts, education, great food, and lots of good conversation. If you missed it, we’ve got ya covered: all the talks were recorded!

Talk 1: Meshing from monolith to microservices with Linkerd


In this talk, Leo Liang, engineering manager at Cruise Automation, spoke about how OfferUp (his previous employer) evolved a high-growth startup architecture into the microservices world, using practical architecture examples. Over the past 2.5 years, Leo and his team worked with the Linkerd community, leveraged Consul, Nginx, and Prometheus, and deeply customized Linkerd with plugins to build up the service mesh. While at OfferUp, his team scaled to billions of requests per day with ever-improving system reliability and flexibility.

Talk 2: REST to gRPC with Linkerd

In this talk, Kevin Lingerfelt, a software engineer at Buoyant and a core contributor to the Linkerd project, shared how companies are moving their architectures from REST-based APIs to gRPC. His talk covered reasons for moving and best practices for running gRPC in production. He detailed how Linkerd enhances gRPC by providing metrics, load balancing, and support for timeouts and retries. He also described how the Linkerd project itself employs multiple gRPC features to facilitate robust communication between its control plane and its data plane.

The San Francisco Linkerd Meetup group is a gathering spot for like-minded developers and engineers of all skill levels who are interested in Linkerd, the open source service mesh. Join us now! Want to give a talk, or have a venue in SF that can host this meetup? Please email events@buoyant.io!

Performance optimization of etcd in web-scale data scenarios


By Xingyu Chen, Software Engineer at Alibaba Cloud

Abstract

etcd is an open source distributed key-value storage system that recently joined CNCF as an incubating project. etcd is widely used in many distributed systems; for example, Kubernetes uses etcd as the store for various meta information inside the cluster. This article first introduces the background of the optimization. It then describes the working mechanism of etcd’s internal storage and the specific optimization implementation. The evaluation results are presented at the end.

Background

Due to the large Kubernetes cluster sizes at Alibaba, there is a remarkably high capacity requirement for etcd, which exceeds the supported limit. Therefore, we implemented a solution based on an etcd proxy that dumps the overflow data to another Redis-like key-value storage system. Although this solution solves the storage capacity problem, the drawbacks are obvious: operation latency is much higher than with native etcd, since the etcd proxy needs to move data around, and operation and maintenance costs are higher due to the use of another KV storage system. Therefore, we wanted to understand the fundamental factor that determines the etcd storage limit and to optimize it for a higher capacity.

To understand the etcd capacity problem, we first carried out a stress test that kept injecting data into etcd. When the amount of data stored in etcd exceeded 40GB, after a compact operation, we found that the latency of put operations increased significantly and that many put operations timed out. Looking at the monitoring tool closely, we found that the latency increase was due to a slowdown in boltdb’s internal spill operation (see below for a definition), which took around 8 seconds, much higher than the usual 1ms. The monitoring results are presented in Figure 1. The experiment results were consistent across multiple runs, which means that once etcd capacity goes beyond 40GB, all read and write operations are much slower than normal, which is unacceptable for large-scale data applications.


Figure 1. Performance degradation of etcd when data exceeds 40GB.

etcd Internal

The etcd storage layer consists of two major parts: an in-memory btree-based index layer and a boltdb-based disk storage layer. We focus on the underlying boltDB layer in the rest of this document because it is the optimization target. Here is the introduction to boltDB, quoted from https://github.com/boltdb/bolt/blob/master/README.md:

Bolt was originally a port of LMDB so it is architecturally similar.

Both use a B+tree, have ACID semantics with fully serializable transactions, and support lock-free MVCC using a single writer and multiple readers.

Bolt is a relatively small code base (<3KLOC) for an embedded, serializable, transactional key/value database so it can be a good starting point for people interested in how databases work.

As mentioned above, boltDB has a concise design and can be embedded into other software as a database. For example, etcd has boltDB built in as the engine for storing key/value data internally. boltDB uses a B+ tree to store data, and the leaf nodes store the real keys/values. It stores all data in a single file, maps that file into memory using the mmap syscall, and reads and updates the file using the write syscall. The basic unit of data is called a page, which is 4KB by default. When pages are deleted, boltdb does not directly reclaim the storage of the deleted pages. Instead, it saves them temporarily to form a free page pool for subsequent use. This free page pool is referred to as the freelist in boltDB. Figure 2 presents an example of boltDB page metadata.

Figure 2. The boltDB page meta data

Pages 43, 45, 46, and 50 (shown in red) are in use, while pages 42, 44, 47, 48, 49, and 51 are free for later use.

Problem

When user data is frequently written into etcd, the internal B+ tree structure is adjusted (for example, by rebalancing and splitting nodes). The spill operation is a key step in boltDB for persisting user data to disk, and it occurs after the tree structure is adjusted. It releases unused pages to the freelist or requests pages from the freelist to save data.

Through an in-depth investigation of the spill operation, we found that its performance bottleneck lies in the following code:

 

// arrayAllocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) arrayAllocate(txid txid, n int) pgid {
    ...
    var initial, previd pgid
    for i, id := range f.ids {
        if id <= 1 {
            panic(fmt.Sprintf("invalid page allocation: %d", id))
        }

        // Reset initial page if this is not contiguous.
        if previd == 0 || id-previd != 1 {
            initial = id
        }

        // If we found a contiguous block then remove it and return it.
        if (id-initial)+1 == pgid(n) {
            if (i + 1) == n {
                f.ids = f.ids[i+1:]
            } else {
                copy(f.ids[i-n+1:], f.ids[i+1:])
                f.ids = f.ids[:len(f.ids)-n]
            }

            ...
            return initial
        }

        previd = id
    }
    return 0
}

The above code shows that when boltDB reassigns pages from the freelist, it tries to allocate n consecutive free pages for use and returns the starting page id if such consecutive space is found. f.ids in the code is an array that records the ids of the internal free pages. For example, for the case illustrated in Figure 2, f.ids = [42, 44, 47, 48, 49, 51].

This method performs a linear scan for n consecutive pages. When there is a lot of internal fragmentation in the freelist (for example, when the consecutive runs in the freelist are mostly small sizes such as 1 or 2 pages), the algorithm takes a long time if the requested number of consecutive pages is large. In addition, the algorithm needs to move the elements of the array. When there are a lot of array elements, i.e., a large amount of data is stored internally, this operation is very slow.

Optimization

From the above analysis, we understand that the linear scan for free pages is not a scalable algorithm. Inspired by Udi Manber, Yahoo’s former chief scientist, who once said that the three most important algorithms at Yahoo were hashing, hashing, and hashing, we attempted to use multiple hash maps to solve the scalability problem.

In our optimization, consecutive pages of the same size are organized into sets, and a hash algorithm is used to map different span sizes to different sets. See the freemaps data structure in the new freelist structure below. When a user needs n continuous pages, we simply query freemaps for the starting page of a span of that size.

type freelist struct {
    ...
    freemaps    map[uint64]pidSet // key is the size of continuous pages (span), value is a set which contains the starting pgids of the same size
    forwardMap  map[pgid]uint64   // key is start pgid, value is its span size
    backwardMap map[pgid]uint64   // key is end pgid, value is its span size
    ...
}
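
For completeness, here is a simplified sketch (not the project’s exact code) of how allocation can work against these maps; it assumes pidSet is a set of page ids and that the addSpan/delSpan helpers keep freemaps, forwardMap, and backwardMap in sync:

// hashmapAllocate is a simplified sketch of hash-based allocation:
// take a span of exactly n pages if one exists, otherwise split a larger span.
func (f *freelist) hashmapAllocate(n uint64) pgid {
    // Fast path: a span of exactly n contiguous pages is available.
    if ids, ok := f.freemaps[n]; ok {
        for pid := range ids {
            f.delSpan(pid, n)
            return pid
        }
    }
    // Otherwise, split the first span that is larger than n and
    // return the unused remainder to the freelist.
    for size, ids := range f.freemaps {
        if size < n {
            continue
        }
        for pid := range ids {
            f.delSpan(pid, size)
            f.addSpan(pid+pgid(n), size-n)
            return pid
        }
    }
    return 0 // no suitable span found
}

Both lookups are hash-map operations, which is where the O(1) allocation claim below comes from.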

In addition, when consecutive pages are released, we need to merge them as much as possible into larger consecutive spans. The original algorithm used a time-consuming approach (O(n log n)). We optimized it by using hash algorithms as well. The new approach uses two new data structures, forwardMap and backwardMap, which are explained in the comments above.

When a page is released, it tries to merge with the previous page by querying backwardMap and tries to merge with the following page by querying forwardMap. The specific algorithm is shown in the following mergeWithExistingSpan function.

// mergeWithExistingSpan merges pid to the existing free spans, try to merge it backward and forward
func (f *freelist) mergeWithExistingSpan(pid pgid) {
    prev := pid - 1
    next := pid + 1

    preSize, mergeWithPrev := f.backwardMap[prev]
    nextSize, mergeWithNext := f.forwardMap[next]
    newStart := pid
    newSize := uint64(1)

    if mergeWithPrev {
        // merge with previous span
        start := prev + 1 - pgid(preSize)
        f.delSpan(start, preSize)

        newStart -= pgid(preSize)
        newSize += preSize
    }

    if mergeWithNext {
        // merge with next span
        f.delSpan(next, nextSize)
        newSize += nextSize
    }

    f.addSpan(newStart, newSize)
}

 

The new algorithm is illustrated in Figure 3. When pages 45 and 46 are released, the algorithm tries to merge them with page 44, and then with pages 47, 48, and 49, to form a new free page span.

Figure 3. Illustration of merging full page spans

The above algorithm is similar to the segregated freelist algorithm used in memory management. It reduces the page allocation time complexity from O(n) to O(1), and the release from O(n log n) to O(1).

Evaluation

The following tests were conducted on a one-node etcd cluster in order to exclude other factors such as the network. The test simulates 100 clients putting 1 million key-value pairs into etcd at the same time. The key/value contents are random, and we limit the throughput to 5,000 ops/s. The test tool is the official etcd benchmark tool. The latency results are presented below.

Performance with the new segregated hashmap

Performance with the old algorithm

(Some requests timed out and did not complete the test.)

Comparison

Less time means better performance. The performance boost factor is the old algorithm’s run time normalized to that of the new hash algorithm.

Scenario | Completion time | Performance boost
New hash algorithm | 210s | baseline
Old array algorithm | 4974s | 24x

The new algorithm’s performance will be even better in larger scale scenarios.

Conclusion

The new optimization reduces the time complexity of the internal freelist allocation algorithm in etcd from O(n) to O(1), and the page release algorithm from O(n log n) to O(1), which solves the performance problem of etcd at large database sizes. Effectively, etcd’s performance is no longer bound by the storage size: read and write operations when etcd stores 100GB of data can be as quick as when it stores 2GB. The new algorithm is fully backward compatible; you can get its benefits without data migration or data format changes. At present, the optimization has been tested repeatedly at Alibaba for more than two months with no surprises. It has been contributed back to the open source community, and you can enjoy it in new versions of boltdb and etcd.

About the Author

Xingyu Chen (GitHub id: WIZARD-CXY) is a software engineer at Alibaba Cloud. He is the owner of etcd cluster management at Alibaba and an active etcd/Kubernetes contributor. His main interests are the performance and stability of etcd clusters.

Running Kubernetes locally on Linux with Minikube – now with Kubernetes 1.14 support

By | Blog

Originally posted on Kubernetes.io.

By Ihor Dvoretskyi, Developer Advocate, Cloud Native Computing Foundation

A few days ago, the Kubernetes community announced Kubernetes 1.14, the most recent version of Kubernetes. Alongside it, Minikube, a part of the Kubernetes project, recently hit the 1.0 milestone and supports Kubernetes 1.14 by default.

Kubernetes is a real winner (and a de facto standard) in the world of distributed Cloud Native computing. While it can handle up to 5000 nodes in a single cluster, local deployment on a single machine (e.g. a laptop, a developer workstation, etc.) is an increasingly common scenario for using Kubernetes.

A few weeks ago I ran a poll on Twitter asking the community to specify their preferred option for running Kubernetes locally on Linux.

This is post #1 in a series about the local deployment options on Linux, and it will cover Minikube, the most popular community-built solution for running Kubernetes on a local machine.

Minikube is a cross-platform, community-driven Kubernetes distribution targeted primarily at local environments. It deploys a single-node cluster, which is an excellent option for getting a simple Kubernetes cluster up and running on localhost.

Minikube is designed to run Kubernetes inside a virtual machine (VM), and the default VM runtime is VirtualBox, a cross-platform solution that runs on a variety of operating systems, including GNU/Linux, Windows, and macOS. At the same time, extensibility is one of Minikube’s critical benefits, so it’s possible to use it with drivers other than VirtualBox.

At the same time, QEMU/KVM is a Linux-native virtualization solution, which may offer benefits compared to VirtualBox. For example, it’s much easier to use KVM on a GNU/Linux server, so you can run a single-node Minikube cluster not only on a Linux workstation or laptop with a GUI, but also on a remote headless server.

Unfortunately, VirtualBox and KVM can’t be used simultaneously, so if you are already running KVM workloads on a machine and want to run Minikube there as well, the KVM Minikube driver is the preferred way to go.

In this guide, we’ll focus on running Minikube with the KVM driver on Ubuntu 18.04 (I am using a bare metal machine running on packet.com.)

Disclaimer

This is not an official guide to Minikube. You can find detailed information on running and using Minikube on its official webpage, where different use cases, operating systems, environments, etc. are covered. Instead, the purpose of this guide is to provide clear and easy guidelines for running Minikube with KVM on Linux.

Prerequisites

  • Any Linux distribution you like (this tutorial uses Ubuntu 18.04 LTS, and all the instructions below apply to it; if you prefer a different Linux distribution, please check out the relevant documentation)
  • libvirt and QEMU-KVM installed and properly configured
  • The Kubernetes CLI (kubectl) for operating the Kubernetes cluster

QEMU/KVM and libvirt installation

NOTE: skip if already installed

Before we proceed, we have to verify if our host can run KVM-based virtual machines. This can be easily checked using the kvm-ok tool, available on Ubuntu.

sudo apt install cpu-checker && sudo kvm-ok

If you receive the following output after running kvm-ok, you can use KVM on your machine (otherwise, please check out your configuration):

$ sudo kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

Now let’s install KVM and libvirt and add our current user to the libvirt group to grant sufficient permissions:

sudo apt install libvirt-clients libvirt-daemon-system qemu-kvm \
    && sudo usermod -a -G libvirt $(whoami) \
    && newgrp libvirt

After installing libvirt, you can verify that the host is able to run virtual machines with the virt-host-validate tool, which is part of libvirt.

sudo virt-host-validate

kubectl (Kubernetes CLI) installation

NOTE: skip if already installed

In order to manage the Kubernetes cluster, we need to install kubectl, the Kubernetes CLI tool.

The recommended way to install it on Linux is to download the pre-built binary and move it to a directory in your $PATH.

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl \
    && sudo install kubectl /usr/local/bin && rm kubectl

Alternatively, kubectl can be installed in a variety of other ways (e.g. as a .deb or snap package – check out the kubectl documentation to find the best one for you).

Minikube installation

Minikube KVM driver installation

A VM driver is an essential requirement for local deployment of Minikube. As we’ve chosen to use KVM as the Minikube driver in this tutorial, let’s install the KVM driver with the following command:

curl -LO https://storage.googleapis.com/minikube/releases/latest/docker-machine-driver-kvm2 \
    && sudo install docker-machine-driver-kvm2 /usr/local/bin/ && rm docker-machine-driver-kvm2

Minikube installation

Now let’s install Minikube itself:

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
    && sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64

Verify the Minikube installation

Before we proceed, we need to verify that Minikube is correctly installed. The simplest way to do this is to check the Minikube version:

minikube version

Now let’s run the local Kubernetes cluster with Minikube, using the KVM2 driver:

minikube start --vm-driver kvm2

Set KVM2 as a default VM driver for Minikube

If KVM is used as the single driver for Minikube on our machine, it’s more convenient to set it as a default driver and run Minikube with fewer command-line arguments. The following command sets the KVM driver as the default:

minikube config set vm-driver kvm2

So now let’s run Minikube as usual:

minikube start

Verify the Kubernetes installation

Let’s check if the Kubernetes cluster is up and running:

kubectl get nodes

Now let’s run a simple sample app (nginx in our case):

kubectl create deployment nginx --image=nginx

Let’s also check that the Kubernetes pods are correctly provisioned:

kubectl get pods

Next steps

At this point, a Kubernetes cluster with Minikube and KVM is adequately set up and configured on your local machine.

To proceed, you may check out the Kubernetes tutorials on the project website.

It’s also worth checking out the “Introduction to Kubernetes” course by The Linux Foundation/Cloud Native Computing Foundation, available for free on edX.