Phippy + Cloud Native Friends Make CNCF Their Home


In 2016, Deis (now part of Microsoft) platform architect Matt Butcher was looking for a way to explain Kubernetes to technical and non-technical people alike. Inspired by his daughter’s prolific stuffed animal collection, he came up with the idea of “The Illustrated Children’s Guide to Kubernetes.” Thus Phippy, the yellow giraffe and PHP application, along with her friends, were born.

Today we’re excited to welcome Phippy and the cast of snuggly, cloud native characters into CNCF. As Kubernetes continues to see unprecedented momentum, her story offers developers an easy way to explain their work to parents, friends, and children.

Today, live from the keynote stage at KubeCon + CloudNativeCon North America, Matt and co-author Karen Chu announced Microsoft’s donation and presented the official sequel to “The Illustrated Children’s Guide to Kubernetes” in a live reading of “Phippy Goes to the Zoo: A Kubernetes Story” – the tale of Phippy and her niece as they take an educational trip to the Kubernetes Zoo.

As part of Microsoft’s donation of both books and the characters Phippy, Goldie, Captain Kube, and Zee, CNCF has licensed all of this material under the Creative Commons Attribution License (CC-BY), which means that you can remix, transform, and build upon the material for any purpose, even commercially. If you use the characters, please include the text “phippy.io” to provide attribution (and online, please include a link to https://phippy.io).

The characters were created by Matt Butcher, Karen Chu, and Bailey Beougher. Goldie is based on the Go Gopher, created by Renee French, which is also licensed under CC-BY. Images of the characters are available in the CNCF artwork repo in svg, png, and ai formats and in color, black, and white.

Now that Phippy and her cloud native friends have made CNCF their home, make sure to keep an eye out for the fun adventures the characters will find themselves on as the Kubernetes global community continues to grow!

CNCF to Host etcd


Today, at KubeCon + CloudNativeCon Seattle, the Cloud Native Computing Foundation (CNCF) Technical Oversight Committee (TOC) voted to accept etcd as an incubation-level hosted project.

etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines, with best-in-class stability, reliability, scalability, and performance. The project – frequently teamed with applications such as Kubernetes, M3, Vitess, and Doorman – handles leader elections during network partitions and tolerates machine failure, including failure of the leader.

“etcd acts as a source of truth for systems like Kubernetes,” said Brian Grant, TOC representative and project sponsor, Principal Engineer at Google, and Kubernetes SIG Architecture Co-Chair and Steering Committee member. “As a critical component of every cluster, having a reliable way to automate its configuration and management is essential. etcd offers the necessary coordination mechanisms for cloud native distributed systems, and is cloud native itself.”

All Kubernetes clusters use etcd as their primary data store. As such, it handles storing and replicating data for Kubernetes cluster state, and it uses the Raft consensus algorithm to recover from hardware failure and network partitions. In addition to Kubernetes, Cloud Foundry also uses etcd as its distributed key-value store. Consequently, etcd is used in production by companies such as Ancestry, ING, Pearson, Pinterest, The New York Times, Nordstrom, and many more.

“Alibaba uses etcd for several critical infrastructure systems, given its superior capabilities in providing high availability and data reliability,” said Xiang Li, senior staff engineer, Alibaba. “As a maintainer of etcd we see the next phase for etcd to focus on usability and performance. Alibaba looks forward to continuing co-leading the development of etcd and making etcd easier to use and more performant.”

“AWS is proud to have dedicated maintainers of etcd on our team to help ensure the bright future ahead for etcd. We look forward to continuing work alongside the community to continue the project’s stability,” said Deepak Singh, Director of Container Services at AWS.

“The Certificate Transparency team at Google works on both implementations and standards to fundamentally improve the security of internet encryption. The open source Trillian project is used by many organizations as part of that effort to detect the mis-issuance of trusted TLS certificates on the open internet,” said Al Cutter, the team’s lead. “etcd continues to play a role in the project by safely storing API quota data to protect Trillian instances from abusive requests, and by reliably coordinating critical operations.”

Written in Go, etcd has unrivaled cross-platform support, small binaries, and a thriving contributor community. It also integrates with existing cloud native tooling like Prometheus monitoring, which can track important metrics like latency from the etcd leader and provide alerting and dashboards.

“Kubernetes and many other projects like Cloud Foundry depend on etcd for reliable data storage. We’re excited to have etcd join CNCF as an incubation project and look forward to cultivating its community by improving its technical documentation, governance and more,” said Chris Aniszczyk, COO of CNCF. “etcd is a fantastic addition to our community of projects.”

“When we introduced etcd during the early days of CoreOS, we wanted it to be this ubiquitously available component of a larger system. Part of the way you get ubiquity is to get everyone using it, and etcd hit critical mass with Kubernetes and has extended to many other projects and users since. As etcd goes into the CNCF, maintainers from Amazon, Alibaba, Google Cloud, and Red Hat all have nurtured the project as its user base has grown. In fact, etcd is deployed in every major cloud provider now and is a part of products put forward by all these companies and across the cloud native ecosystem,” said Brandon Philips, CTO of CoreOS at Red Hat. “Having a neutral third party stewarding the copyrights, DNS and other project infrastructure is the reasonable next step for the etcd project and users.”

Other common use cases of etcd include storing important application configuration like database connection details or feature flags as key value pairs. These values can be watched, allowing the application to reconfigure itself when changed. Advanced uses take advantage of the consistency guarantees to implement database leader elections or do distributed locking across a cluster of workers.
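
As a rough sketch of that pattern, the following Go program stores a feature flag in etcd and watches it for changes using the official clientv3 package; the endpoint and key name are placeholders, not part of the original post:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// Connect to a local etcd member; adjust endpoints for a real cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Store a feature flag as a key-value pair.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	_, err = cli.Put(ctx, "/config/feature-x", "enabled")
	cancel()
	if err != nil {
		panic(err)
	}

	// Watch the key so the application can reconfigure itself on change.
	for resp := range cli.Watch(context.Background(), "/config/feature-x") {
		for _, ev := range resp.Events {
			fmt.Printf("%s changed to %q\n", ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```

Watches arrive as a stream of events, which is what allows an application to reconfigure itself the moment a value changes rather than polling for it.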

Main etcd Features:

  • Easily manages cluster coordination and state management across any distributed system
  • Written in Go and uses the Raft consensus algorithm to manage a highly available replicated log
  • Enables reliable distributed coordination through distributed locking, leader election, and write barriers
  • Handles leader elections during network partitions and will tolerate machine failure, including the leader
  • Supports dynamic cluster membership reconfiguration
  • Offers stable read/write under high load
  • Includes a multi-version concurrency control data model
  • Empowers reliable key monitoring that never silently drops events
  • Lease primitives for expiring keys (these coordination primitives are sketched in Go below)
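
A minimal sketch of the election and locking primitives from the list above, using etcd's clientv3/concurrency package; the key prefixes and node name are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/clientv3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// A session wraps a lease that is kept alive in the background; if this
	// process dies, keys tied to the session (locks, candidacy) expire.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// Leader election: Campaign blocks until this node wins.
	election := concurrency.NewElection(session, "/my-service/election")
	if err := election.Campaign(context.Background(), "node-1"); err != nil {
		panic(err)
	}
	fmt.Println("node-1 is now the leader")

	// Distributed lock across a cluster of workers.
	mutex := concurrency.NewMutex(session, "/my-service/lock")
	if err := mutex.Lock(context.Background()); err != nil {
		panic(err)
	}
	fmt.Println("in critical section")
	if err := mutex.Unlock(context.Background()); err != nil {
		panic(err)
	}
}
```

Because candidacy and locks are tied to the session's lease, a crashed leader's claim expires automatically and leadership passes to another campaigner.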

Notable Milestones:

  • 469 contributors
  • 21,627 GitHub stars
  • 157 releases
  • 14,825 commits
  • 4,310 forks
  • 9 maintainers representing 8 companies

As a CNCF hosted project, etcd joins incubated technologies like OpenTracing, Fluentd, Linkerd, gRPC, CoreDNS, containerd, rkt, CNI, Envoy, Jaeger, Notary, TUF, Vitess, NATS, Helm, Rook, and Harbor. It is now part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provides governance, marketing support, and community outreach.

Every CNCF project has an associated maturity level: sandbox, incubating, or graduated. For more information on what qualifies a technology for each level, please visit the CNCF Graduation Criteria v1.1.

For more on etcd, please visit github.com/etcd-io/etcd. You can also watch this session from KubeCon + CloudNativeCon Copenhagen on distributed consensus in etcd and Kubernetes.

A Look Back At KubeCon + CloudNativeCon Shanghai 2018


Now that we’ve finally caught our breath after a fantastic two days at KubeCon + CloudNativeCon Shanghai, let’s dive into some of the key highlights and news. The best part is that we get to see so many of you again so soon at KubeCon + CloudNativeCon Seattle in December!

The sold-out event with more than 2,500 attendees (technologists, maintainers and end users of CNCF’s hosted projects) was full of great keynotes, presentations, discussions and deep dives on projects including Rook, Jaeger, Kubernetes, gRPC, containerd – and many more!  Attendees had the opportunity to hear a slew of compelling talks from CNCF project maintainers, community members and end users including eBay, JD.com and Alibaba.

After hosting successful events in Europe and North America over the past few years, it’s no wonder China was the next stop on the tour. Asia has seen a spike in cloud native adoption to the tune of 135% since March 2018. You can find more information about the tremendous growth of cloud usage in China in CNCF’s survey.

The conference kicked off by welcoming 53 new global members and end users to the Foundation, including several from China such as R&D China Information Technology, Shanghai Qiniu Information Technologies, Beijing Yan Rong Technology and Yuan Ding Technology. The CNCF has 40 members in China, which represents a little more than 10% of the CNCF’s total membership.

AI Fuels Even Greater Kubernetes Growth

It wouldn’t be KubeCon without some great talks on Kubernetes! Opening the first KubeCon conference in China, conference co-chair Janet Kuo said: “Kubernetes is boring and that’s good.”

During the conference there were many great sessions on AI, as well as a Kubernetes AI panel discussion that included product managers, data scientists, engineers, and architects from Google, CaiCloud, eBay, and JD.com.

“We are extending AI to edge with Kubernetes.”

Harbor

In further exciting China news, Harbor, the first project contributed by VMware to CNCF, successfully moved from sandbox into incubation. “Harbor is not only the first project donated by VMware to CNCF, but it is also the first Chinese project, developed in the Chinese open source community, to be donated to CNCF.”

More details on a Harbor integration will be shared at KubeCon Seattle.

CNCF Diversity Scholarship

As always, the Foundation offered scholarships to members of traditionally underrepresented groups in the technology and/or open source communities. With $122,000 in diversity scholarship funds available, travel and/or conference registration was covered for 68 KubeCon attendees. Scholarship funding was provided by AWS, CNCF, Google, Helm Summit, Heptio and VMware.


End User Awards

The end user community continues to grow at an incredible pace and a very deserving JD.com won the CNCF End User Award.

Missed out on KubeCon + CloudNativeCon Shanghai? Don’t worry as you have more chances to attend. KubeCon + CloudNativeCon North America 2018 is taking place in Seattle, WA from December 10-13. The Conference is currently sold out, but if you’d like to be added to the waitlist, fill out this form. You will be notified as new spots become available.

CNCF News

The Linux Foundation And Cloud Native Computing Foundation Announce New Training Course For Prometheus

Cloud Native Computing Foundation Announces JD.com as Winner of Top End User Award

Cloud Native Computing Foundation Opens Inaugural KubeCon + CloudNativeCon China and Welcomes 53 New Members

TOC Votes to Move Harbor into CNCF Incubator


Save the Dates!

The call for speakers and proposals is now open for KubeCon + CloudNativeCon Europe 2019, which takes place in Barcelona from May 20-23. Registration will open in early 2019.

We’ll be back in China for KubeCon + CloudNativeCon Shanghai from June 24-26. Registration will go live in early 2019.

And to finish off 2019, there’s KubeCon + CloudNativeCon San Diego from November 18-21.

We hope to see you at one or all of these 2019 events!


The “Function” Package – Creating a Spec for Serverless Application Packaging


by Yoav Landman, CTO, JFrog

What’s in a function?

Creating serverless applications is a multi-step process. One of the critical steps in this process is packaging the serverless functions you want to deploy into your FaaS (Function as a Service) platform of choice.

Before a function can be deployed, it needs two types of dependencies: direct function dependencies and runtime dependencies. Let’s examine these two types.

Direct function dependencies – These are objects that are part of the function process itself and include:

  • The function source code or binary
  • Third party binaries and libraries the function uses
  • Application data files that the function directly requires

Runtime function dependencies – This is data related to the runtime aspects of your function. It is not directly required by the function but configures or references the external environment in which the function will run. For example:

  • Event message structure and routing setup
  • Environment variables
  • Runtime binaries such as OS-level libraries
  • External services such as databases, etc.

For a function service to run, all dependencies, direct and runtime, need to be packaged and uploaded to the serverless platform. Today, however, there is no common spec for packaging functions. Serverless package formats are vendor-specific and are highly dependent on the type of environment in which your functions are going to run. This means that, for the most part, your serverless applications are locked in to a single provider, even if your function code itself abstracts away provider-specific details.

This article explores the opportunity to create a spec for an open and extensible “Function Package” that enables the deployment of a serverless function binary, along with some extra metadata, across different FaaS vendors.

Function runtime environments

When looking at today’s common runtime FaaS providers, we see two mainstream approaches for running functions: container function and custom function runtimes. Let’s have a closer look at each type.

Container function runtimes

This method uses a container-based runtime, such as Kubernetes, and is the common runtime method for on-prem FaaS solutions. As its name suggests, it usually exposes container runtime semantics to users at one level or another. Here’s how it works.

A container entry-point is used as a function, together with an eventing layer, to invoke a specific container with the right set of arguments. The function is built with docker build or any other image generator, producing an OCI container image as the final package format.

Examples of serverless container-based runtimes include Google’s Knative, Bitnami’s Kubeless, Iguazio’s Nuclio, and Apache’s OpenWhisk.

The main issue with container functions is that developing a function often requires understanding of the runtime environment; developers need to weave their function into the container. Sometimes this process is hidden by the framework, but it is still very common for the developer to need to write a Dockerfile for finer control over operating system-level services and image structure. This leads to high coupling of the function with the runtime environment.

Custom function runtimes

Custom function runtimes are commonly offered by cloud providers. They offer a “clean” model where functions are created by simply writing handler callbacks in your favorite programming language. In contrast to container-based functions, runtime details are left entirely to the cloud provider (even in cases where, behind the scenes, the runtime is based on containers).

Examples of serverless custom function runtimes are: AWS Lambda, Azure Functions and Google Cloud Functions.
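
To make the handler-callback model concrete, here is a minimal AWS Lambda function written in Go using the official github.com/aws/aws-lambda-go runtime library; the event shape and names are illustrative assumptions, not part of the article:

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-lambda-go/lambda"
)

// GreetEvent is an application-defined event shape (illustrative).
type GreetEvent struct {
	Name string `json:"name"`
}

// handler is a plain callback; packaging and runtime wiring are
// left entirely to the provider.
func handler(ctx context.Context, ev GreetEvent) (string, error) {
	return fmt.Sprintf("Hello, %s!", ev.Name), nil
}

func main() {
	// Hand the callback to the Lambda runtime library.
	lambda.Start(handler)
}
```

Note that nothing in the function itself says how it is packaged or invoked; that is exactly the clean separation the article argues for.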

Custom runtime environments use a language-centric model for functions. Software languages already have healthy packaging practices (such as Java JARs, npm packages, or Go modules). Still, no common packaging format exists for their equivalent functions. This creates cloud-vendor lock-in. Some efforts have been made to define a common deployment model (such as the AWS Serverless Application Model (SAM) and the open source Serverless Framework), but these models assume a custom binary package already exists, or they include the process of building it to each cloud provider’s standards.

The need for a Function Package

To be able to use functions in production, we need to use stable references to them to enable repeatability in deploying, upgrading and rolling back to a specific function version. This can be achieved with an immutable, versioned, sealed package that contains the function with all its direct dependencies.

Container images may meet these requirements because they offer a universal package format. However, there’s a side effect of “polluting” the function by tightly coupling it with details of the container runtime. Custom runtimes also exhibit coupling with their function packages. While they offer clean functions, they use proprietary package formats that mix the function dependencies with the runtime dependencies.

What we need is a clear separation: a clean function together with its dependencies in a native “Function Package” separate from external definitions of runtime-specific dependencies. This separation would allow us to take a function and reuse it across different serverless platforms, only adding external configuration as needed.

Getting a function to run

Let us reexamine for a moment the steps required to build and run a container-based function. We can look at this as a four-step process:

  1. Packaging – Build a container image together with the (direct + runtime) function dependencies and an entry point definition
  2. Persisting – Push the image to a container registry so that we have a stable reference to it
  3. Installing – Pull the image to the runtime and configure it according to runtime-specific dependencies
  4. Running – Accept function events at the defined entry point


This process works pretty well for container runtimes. We can try to formulate it into a more generalized view:

  1. Packaging – Build a function package that contains its direct dependencies (or references to them) and an entry point definition
  2. Persisting – Upload the package to a package registry
  3. Installing – Download the package (and its direct dependencies) to the runtime and configure runtime-specific dependencies
  4. Running – Accept function events at the defined entry point

Creating a function package

This general process can be applied across many serverless providers! Developers only need to worry about creating a clean Function Package and keeping it persistent. Runtime configuration can be provided upon installation and can be vendor-specific.

A Function Package would contain (a hypothetical sketch follows this list):

  1. Function files – in source code or binary format
  2. Direct dependencies – libraries and data files
  3. Entry point definition
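
To make this concrete, here is a purely hypothetical sketch, in Go, of what such a package’s metadata might carry. No such spec exists yet; every type and field name below is an assumption for illustration only:

```go
// Package funcpkg is a hypothetical sketch of a Function Package
// manifest; no such spec exists yet, and all names are illustrative.
package funcpkg

// Manifest describes an immutable, versioned, sealed Function Package.
type Manifest struct {
	Name    string // e.g. "image-resizer" (placeholder)
	Version string // stable reference for deploy, upgrade, and rollback
	Profile string // language profile: "java", "golang", "docker", ...

	// Function files: source or binary artifacts inside the package.
	Files []string

	// Direct dependencies: embedded artifacts, or stable references
	// to be pulled by the runtime at install time.
	Dependencies []DependencyRef

	// Entry point, interpreted per profile (a Java class reference,
	// a Go "main" package, or an image entry point).
	EntryPoint string
}

// DependencyRef is a stable, sealed reference to an external artifact.
type DependencyRef struct {
	Coordinates string // e.g. Maven coordinates or a Go module path
	Version     string
	Checksum    string // seals the reference for repeatability
}
```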

The experience for the developer is simple and programming-focused, and therefore we can create spec “profiles” that map to a specific language type. For example:

Java Profile:

  • Function files: JAR file
  • Direct dependencies: generated POM file for the JAR with a full, flat list of dependencies + data files
  • Entry point definition: class reference

Golang Profile:

  • Function files: Go module source files
  • Direct dependencies: go.mod file containing the dependencies + data files
  • Entry point definition: “main” package reference

Docker Profile:

  • Function files: build file references to generic files run by the entry point
  • Direct dependencies: build file references to base image + other service files and data
  • Entry point definition: build file references to image entry point

What about dependencies?

A function package does not necessarily have to physically embed binary dependencies if they can be reliably provided to the runtime prior to installation. Instead, only references to dependencies could be declared – for example, external JAR coordinates declared in a POM file, or Go packages declared in a go.mod file. These dependencies would be pulled by the runtime during installation – similar to how a container image is pulled from a Docker registry by function runtimes.
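
For the Go profile, declaring dependencies by reference is essentially what a go.mod file already does. A minimal example, where the module path and versions are placeholders:

```
// go.mod (module path and versions are placeholders)
module example.com/my-function

require (
	github.com/aws/aws-lambda-go v1.6.0
	github.com/sirupsen/logrus v1.2.0
)
```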

By using stable references, we guarantee repeatability and reuse. We also create lighter packages that allow quicker and more economical installation of functions, since dependencies can be pulled from a registry close to the runtime instead of being re-uploaded each time.

Summary

Creating profiles for common runtime languages allows functions to be easily moved across different serverless FaaS providers, adding vendor-specific information pertaining to runtime configuration and eventing only at the installation phase. For application developers, this means they can avoid vendor lock-in and focus on writing their applications’ business logic without having to become familiar with operational aspects. This goal can be achieved more quickly by creating shareable Function Packages as a CNCF Serverless spec. If you are interested in discussing more – let’s talk!

KubeCon + CloudNativeCon Barcelona 2019 Call for Proposals (CFP) Is Open


KubeCon + CloudNativeCon has expanded from its start with 500 attendees in 2015 to become one of the largest and most successful open source conferences ever. With that growth comes challenges, and CNCF is eager to evolve the conference over time to best serve the cloud native community. Our upcoming event in Seattle (December 10-13, 2018), our biggest yet, is sold out several weeks ahead of time with 7,500 attendees.

From the start and throughout this growth, we’ve appreciated the feedback and input the community has shared.  We carefully review the post-event surveys and listen closely to suggestions and new ideas. This feedback loop is crucial and allows us to iterate and improve.

As we open the call for proposals (CFP) for Barcelona (May 20-23, 2019), we want to share several changes we’re planning to make in 2019, as well as some changes we considered but decided not to implement at this time. CNCF is part of the Linux Foundation (LF) and leverages the LF’s decade of experience running open source events, including more than 100 in 2018 with more than 30,000 attendees from more than 11,000 organizations and 113 countries. We’ve also received a lot of feedback from previous events, much of it laudatory and some with specific proposals for improvement.

Here are some changes we’re planning to implement in 2019:

  • CFPs will have room for longer submissions to encourage presenters to share additional background and technical information in their proposals.  
  • We will be willing to provide additional feedback to submitters whose talks are not selected. This feedback will fall in a set of categories rather than be personalized.
  • We have improved our tooling to only accept a single CFP talk from each speaker (or two co-presenter talks), and are limiting submitters to two solo submissions, four co-presenter submissions, or a combination.
  • In the maintainer track offered to CNCF-hosted projects, the Kubernetes SIGs and working groups, and CNCF working groups, we’re offering the opportunity to combine the intro and deep dive sessions into a longer 80-minute session. (Note that maintainer talks do not count against the single CFP talk per speaker quota.)
  • We are introducing two new kinds of smaller events in 2019. Kubernetes Days will be single day, single track events targeted at regions with large numbers of developers who cannot necessarily travel easily to our premiere events in Europe, China, and North America. Cloud Native Community Days will be regional events run by community members and will provide additional opportunities for speakers, practitioners and end users to come together.
  • We are encouraging any of our partner summits that would like to try a double-blind talk submission process to do so.

Here are some of the core elements of how we run the event that we are not planning to change:

  • KubeCon + CloudNativeCon is a conference for developers and end users (broadly defined) to communicate about open source, cloud native technologies.
  • Talks are rated by a program committee of community leaders and highly-rated speakers from past conferences, organized by conference co-chairs. The program committee is selected by the conference co-chairs, who also select the tracks and make the final talk selections.
  • Whether a company is a sponsor of the event or a member of CNCF has no impact on whether talks from their developers are selected. The only exception is that each diamond sponsor (6 in total) gets a 5-minute sponsored keynote. The co-chairs are now working with all keynote speakers, including the sponsored ones, to avoid vendor pitches so that the talks resonate with a community audience.
  • All talks are about using and/or developing open source software. Although many speakers are employed by software vendors, the conference content is focused on working with open source, not vendor offerings.
  • Talks can discuss one of CNCF’s 19 graduated/incubating projects or 11 sandbox projects or any other open source technology that adds value to the cloud native ecosystem.
  • We remain committed to increasing the voice of those who have been traditionally underrepresented in tech.

We select community leaders to serve as conference co-chairs and represent the cloud native community. The co-chairs for Barcelona are Janet Kuo of Google and Bryan Liles of Heptio. They are in the process of selecting a program committee of around 80 experts, which includes project maintainers, active community members, and highly-rated presenters from past events. Program committee members register for the topic areas they’re comfortable covering, and CNCF staff randomly assign a subset of relevant talks to each member. We then collate all of the reviews and the conference co-chairs spend a very challenging week assembling a coherent set of topic tracks and keynotes from the highest-rated talks. Here are the scoring guidelines we provide to the program committee. There is not a one-to-one mapping of topic areas to session tracks. We look to the conference co-chairs to craft a program that reflects current trends and interests in the cloud native community.

The above process is used to select the ~180 CFP sessions, which are offered in ~10 rooms. The keynote talks are selected by the conference co-chairs from highly-rated CFP submissions, or in rare cases, by invitation of the co-chairs to specific speakers.

In addition, KubeCon + CloudNativeCon also includes ~90 maintainer sessions spread across ~5 rooms. This is content produced by the maintainers of CNCF-hosted projects to inform users about the projects, add new adopters, and transition some of them from users to contributors. Sessions in the maintainer track are open to each of CNCF’s 29 hosted projects, the Kubernetes SIGs and working groups, and CNCF working groups. Each of these can do one 35-minute Intro and one 35-minute Deep Dive session. New for 2019, we’re offering to schedule these back-to-back to enable one 80-minute session.

Another fast-growing part of KubeCon + CloudNativeCon is the partner summits held the day before the event. This is an opportunity for projects and companies in the cloud native community to engage with KubeCon + CloudNativeCon attendees. For Seattle, there are 27 separate events! They range from community-organized events like the Kubernetes Contributor Summit and EnvoyCon to member-organized summits to open source projects from adjacent communities like networking initiatives FD.io and Tungsten Fabric. The content and pricing of these events are determined by the organization that runs each one.

The Review Process

Submissions for the CFP sessions are selected in a single-blind process. That is, the reviewer can see information on who is proposing the talk but the submitter does not see who reviewed their submission. Some academic conferences have switched to double-blind submissions, where the submitter removes all identifying information from their submission and the reviewers judge it based solely on the quality of the content. The downside is that it would require significantly more detailed submissions.

Submissions for Barcelona consist of a title and up to a 900 character description, which is used in the schedule if the talk is selected. There is an additional Benefits to the Ecosystem section of up to 1,500 characters to make the case for the submission (this is up significantly from the 300 characters allowed in 2018). To support double-blind selection, we would need to require submissions of 9,000 characters (~3 pages) or more, which is typical of academic-style conferences to encourage effective review. We believe this would discourage many of the practitioners and end users of cloud native technologies from submitting, and more talks would come from academics and those with the time and proclivity to make longer submissions.

This has pros and cons, but it would be a very significant change and unprecedented among open source conferences run by the LF. We considered testing a double-blind process with one topic area (such as service mesh) but decided that it would be too big of a change for an unknown improvement. Instead, we are encouraging any of our partner summits that would like to try a double-blind talk submission process to do so. The LF Events staff is happy to work with them to organize such a process for Barcelona or future events, and if the results go well, we might expand to one or more tracks at a future KubeCon + CloudNativeCon.

For Seattle, the acceptance rate was only 13%, which we understand creates a lot of disappointment and frustration when a very good talk is not accepted. For 2019, we will be providing additional feedback to submitters whose talks are not selected. This feedback will fall in a set of categories such as “not in top half of scores submitted” and “highly-rated but a similar talk was accepted instead.”

How to Get Your Talk Accepted

Whether a company is a member or end user supporter of CNCF or is sponsoring the event has no impact on whether talks from their developers will be selected. The only exception is that the 6 diamond sponsors each get a 5-minute sponsored keynote. However, being a community leader does have an impact, as program committee members will often rate talks from the creators or leaders of an open source project more highly.

Avoid the common pitfall of submitting a sales or marketing pitch for your product or service, no matter how compelling it is. Focus on your work with an open source project, whether it is one of the CNCF’s 29 hosted projects or a new project that adds value to the cloud native ecosystem.

KubeCon + CloudNativeCon is fundamentally a community conference focusing on the development and deployment of cloud native open source projects. So, pick your presenter and target audience accordingly. Our participants range from top experts to total beginners, so we explicitly ask what level of technical difficulty your talk is targeted for (beginner, intermediate, advanced, or any) and aim to provide a range.

We often get many submissions covering almost the same concept, so even if there are several great submissions, the co-chairs will probably only pick one. Consider choosing a more unique topic that is relevant, but less likely to be submitted by multiple people.

Our community is particularly interested in end users adopting cloud native technology. End users are companies that use cloud native technologies internally but do not sell any cloud native services externally. End users generally do not have a commercial product on the Cloud Native Landscape, though they may have created an open source project to share their internal technology. For more information, please see the kinds of companies in CNCF’s End User Community. If you don’t work for an end user company, consider co-presenting with an end user who has adopted your technology.

Given that talk recordings are available on YouTube, and there is very limited space on the agenda, avoid submissions that were already presented at a previous KubeCon + CloudNativeCon or any other event. If your submission is similar to a previous talk, please include information on how this version will be different. Make sure your presentation is timely, relevant, and new.

We’ve improved our tooling to only accept a single CFP talk from each speaker, and are limiting submitters to two submissions. Specifically, we’re counting being a co-presenter as 0.5 of a talk, and limiting submissions to 2.0 talks in all. So, at most, you can submit as a co-presenter on 4 talks, a solo presenter on 2 talks, or as a solo presenter on 1 and a co-presenter on 2.

Look through the talks that were selected for Copenhagen and Seattle and notice that most have clear, compelling titles and descriptions. The CFP form has a section for including resources that will help reviewers assess your submission. If you have given a talk before that was recorded, please include a link to it. Blog posts, code repos, and other contributions can also help establish your credentials, especially if this will be your first public talk (and we encourage first-time speakers to apply).

Finally, we are explicitly interested in increasing the voice of those who have been traditionally underrepresented in tech. All submissions are reviewed on merit, but we remain dedicated to having a diverse and inclusive conference and we will continue to actively take this into account when finalizing the list of speakers and the overall schedule. For example, we don’t accept panel proposals where all speakers are men. We also provide diversity scholarships to offset travel costs.

Other Cloud Native Conferences

In 2019, we plan to continue to hold three KubeCon + CloudNativeCon events, in Barcelona (May 20-23, 2019), Shanghai (June 24-26, 2019), and San Diego (November 18-21, 2019). In addition, we support 160 Meetup groups in 38 countries, which have hosted more than 1,600 events and have more than 80,000 members.

We also help CNCF-hosted projects hold their own specialized events, whether in conjunction with KubeCon + CloudNativeCon (including EnvoyCon and the Observability Practitioner’s Summit) or standalone (including PromCon).

New for 2019, we are going to launch two new kinds of smaller events. Kubernetes Days will be single-day, single-track events targeted at regions with large numbers of developers who cannot necessarily travel easily to our premiere events in Europe, China, and North America. The first one will be held in Bengaluru, India on March 23, 2019.

In addition, we are planning to support a set of community-organized events called Cloud Native Community Days. These will be regional events run by community members in those areas and provide additional opportunities for speakers, practitioners and end users to come together. We’ll have more details about these programs early in 2019.

Barcelona and Shanghai Submissions

The CFP for Barcelona is open now and the deadline is January 18, 2019. The deadline for submitting talks to KubeCon + CloudNativeCon Shanghai (June 24-26, 2019) will be February 1, 2019. The submission and selection processes are separate. If you submit the same talk to both and it is accepted for one, it will be rejected from the other, so we encourage you to submit different content to each conference.

Follow-Up

If you have questions on our processes for selecting talks or ideas on how to improve them, or other thoughts on CNCF events, please reach out to me at Dee Kumar <dkumar@linuxfoundation.org> or book a time for us to speak at https://calendly.com/deekumar.

JD.com Recognized for Cloud Native Open Source Technology Usage with CNCF Top End User Award


By JD.com

JD.com, China’s largest retailer, has been presented with the Top End User Award by the Cloud Native Computing Foundation (CNCF) for its unique usage of cloud native open source projects. The award was announced at China’s first KubeCon + CloudNativeCon conference hosted by CNCF, which gathered thousands of technologists and end users in Shanghai from November 13-15 to discuss the future of open source technology development.

Providing the ultimate e-commerce experience to customers requires JD to house and process enormous amounts of information that must be accessible at incredibly fast speeds. To put it in perspective, five years ago there were only about two billion images in JD’s product databases for customers. Today, there are more than one trillion, and that figure increases by 100 million images each day. This is why JD turned to CNCF’s Kubernetes project in recent years to accommodate its clusters.

JD currently runs the world’s largest Kubernetes cluster in production. The company first rolled out its containerized infrastructure a few years ago and, as the clusters grew, JD was one of the early adopters to shift to Kubernetes. The move, known as JDOS 2.0, marked the beginning of JD’s partnership with CNCF to build stronger collaborative relationships with the industry’s top developers, end users, and vendors. Ultimately, CNCF provided a window for JD to both contribute to and benefit from open source development.

In April, JD became the CNCF’s first platinum end user member, and took a seat on the organization’s governance board in order to help shape the direction of future Foundation initiatives. JD’s overall commitment to open source is highly aligned with its broader Retail as a Service strategy in which the company is empowering other retailers, partners, and industries with a broad range of capabilities in order to increase efficiency, reduce costs, and provide a higher level of customer service.

JD’s Kubernetes clusters support a wide range of workloads and big data and AI-based applications. The platform has boosted collaboration and enhanced productivity by reducing silos between operations and DevOps teams. As a result, JD has contributed code to projects such as Vitess, Prometheus, Kubernetes, CNI (Container Networking Interface), and Helm as part of its collaboration with CNCF.

“One contribution that we are very proud of is Vitess, the CNCF project for scalable MySQL cluster management,” said Haifeng Liu, chief architect, JD.com. “We are not only the largest end user of Vitess, but also a very active and significant contributor. We’re looking forward to working together with CNCF and its members to pave the way for future development of open source technology.”

Vitess allows JD to manage resources much more flexibly and efficiently, reducing operational and maintenance costs, and JD has one of the world’s most complex Vitess deployments. The company is actively collaborating with the CNCF community to add new features such as subquery support and global transactions, setting industry benchmarks.

“JD spearheads the use of cloud native technologies at scale within the APAC market, and is responsible for one of the largest Kubernetes deployments in the world,” said Chris Aniszczyk, COO of Cloud Native Computing Foundation. “The company also makes significant contributions to CNCF projects and its involvement in the community made JD a natural fit for this award.”

JD will continue to work on contributions to cloud native technologies as well as release its own internal and homegrown open source projects to empower others in the community.



TOC Votes to Move Harbor into CNCF Incubator

By James Zabala

Harbor started in 2014 as a humble internal project meant to address a simple use case: storing images for developers leveraging containers. The cloud native landscape was wildly different, and tools like Kubernetes were just starting to see the light of day. It took a few years for Harbor to mature to the point of being open sourced in 2016, but the project was a breath of fresh air for individuals and organizations attempting to find a solid container registry solution. Its strong early growth in user base made us confident that Harbor was addressing critical use cases.

We were incredibly excited when Harbor was accepted to the Cloud Native Sandbox in the summer of 2018. Although Harbor had been open sourced for some years by this point, having a vendor-neutral home immediately impacted the project resulting in increased engagement via our community channels and GitHub activity.

There were many things we immediately began tackling after joining the Sandbox, including addressing some technical debt, laying out a roadmap based solely on community feedback, and expanding the number of contributors to include folks from other organizations who have consistently worked on improving Harbor. We’ve also started a bi-weekly community call where we hear directly from Harbor users on what’s working well and what’s not. Finally, we’ve ratified a project governance model that defines how the project operates at various levels.

Given Harbor’s already-large global user base across organizations small and large, proposing the project mature into the CNCF Incubator was a natural next step. The processes around progressing to Incubation are defined here. In order to be considered, certain growth and maturity characteristics must first be demonstrated by the project:

  • Production usage: There must be users of the project that have deployed it to production environments and depend on its functionality for their business needs. We’ve worked closely with a number of large organizations leveraging Harbor over the last few years, so: check!
  • Healthy maintainer team: There must be a healthy number of members on the team that can approve and accept new contributions to the project from the community. We have a number of maintainers who founded the project and continue to work on it full time, in addition to new maintainers joining the party: check!
  • Healthy flow of contributions: The project must have a continuous and ongoing flow of new features and code being submitted and accepted into the codebase. Harbor released v1.6 in the summer of 2018, and we’re on the verge of releasing v1.7: check!

CNCF’s Technical Oversight Committee (TOC) evaluated the proposal from the Harbor team and concluded that we had met all the required criteria. It is both deeply humbling and an honor to be in the company of other highly respected incubated projects like gRPC, Fluentd, Envoy, Jaeger, Rook, NATS, and more.

What’s Harbor anyway?

Harbor is an open source cloud native registry that stores, signs, and scans container images for vulnerabilities.

Harbor solves common challenges by delivering trust, compliance, performance, and interoperability. It fills a gap for organizations and applications that cannot use a public or cloud-based registry, or want a consistent experience across clouds.

Harbor addresses the following common use cases:

  • On-prem container registry – organizations with the desire to host sensitive production images on-premises can do so with Harbor.
  • Vulnerability scanning – organizations can scan images before they are used in production. Images with failed vulnerability scans can be blocked from being pulled.
  • Image signing – images can be signed via Notary to ensure provenance.
  • Role-based Access Control – integration with LDAP (and AD) to provide user- and group-level permissions.
  • Image replication – production images can be replicated to disparate Harbor nodes, providing disaster recovery, load balancing and the ability for organizations to replicate images to different geos to provide a more expedient image pull.

Architecture

The “Harbor stack” is composed of various third-party components, including nginx, Docker Distribution v2, Redis, and PostgreSQL. Harbor also relies on Clair for vulnerability scanning and Notary for image signing.

The Harbor components, highlighted in blue in the architecture diagram, are the heart of Harbor and are responsible for most of the heavy lifting:

  • Core Services provides the API and UI. It intercepts docker pushes/pulls to provide role-based access control and to prevent vulnerable images from being pulled and subsequently used in production (all of this is configurable).
  • Admin Service is being phased out in v1.7, with its features and functionality being merged into the core service.
  • Job Service is responsible for running background tasks (e.g., replication, one-shot or recurring vulnerability scans, etc.). Jobs are submitted by the core service and run in the job service component.

Currently, Harbor is packaged as both a docker-compose service definition and a Helm chart.

Want to learn more?

The best way to learn about Harbor is through:

  • The Harbor homepage: https://goharbor.io/
  • The Harbor CNCF webinar: https://www.cncf.io/event/webinar-harbor/
  • The webinar slides: https://drive.google.com/file/d/1F6nvZhtw6-bgwdlySXLq3OIvlC6f6Vvb/view?usp=sharing

Community stats and graphs

Harbor has continued an upward trajectory of community growth through 2018. The stats below visualize the consistent growth pre- and post-acceptance into the Cloud Native Sandbox:

Where we are

Harbor is both mature and production-ready. We know of dozens of large organizations leveraging Harbor in production, including at least one serving millions of container images to tens of thousands of compute nodes. The various components that comprise Harbor’s overall architecture are battle-tested in real-world deployments.

Harbor is API-driven and is used in custom SaaS and on-prem products by various vendors and companies. It’s easy to integrate Harbor into your environment, whether a customer-facing SaaS or an internal development pipeline.
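
As an illustration of that API-driven integration, the sketch below lists projects over Harbor’s v1.x REST API in Go. The host, credentials, and even the exact path are assumptions that will vary by deployment:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// List projects on a Harbor instance; the host and credentials are
	// placeholders, and /api/projects assumes the v1.x API surface.
	req, err := http.NewRequest("GET", "https://harbor.example.com/api/projects", nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("admin", "change-me") // use your own credentials

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body)) // JSON array of projects
}
```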

The Harbor team strives to release quarterly. We’re currently working on our eighth major release, v1.7, due out soon. Over the last two releases alone we’ve made marked strides toward our long-term goals:

  • Native support of Helm charts
  • Initial support for deploying Harbor via Helm chart
  • Refactoring of our persistence layer, now relying solely on PostgreSQL and Redis – this will help us achieve our high-availability goals
  • Added labels and replication filtering based on labels
  • Improvements to RBAC, including LDAP group-based access control
  • Architecture simplification (i.e., collapsing admin server component responsibilities into core component)

Where we’re going

This is the fun part. 🙂

Harbor is a vibrant community of users – those who use Harbor and publicly share their experiences, the individuals who report and respond to issues, the folks who hang around in our Slack community, and those who spend time on GitHub improving our code and documentation. We’re all incredibly fortunate for the rich and exciting ideas that are proposed via GitHub issues on a regular basis.

We’re still working on our v1.8 roadmap, but here are some major features we’re considering and might land at some point in the future (timing to be determined, and contributions are welcome!):

  • Quotas – system- and project-level quotas; networking quotas; bandwidth quotas; user quotas; etc.
  • Replication – the ability to replicate to non-Harbor nodes.
  • Image proxying and caching – a docker pull would proxy the request to, say, Docker Hub, then scan the image before providing it to the developer. Alternatively, pre-cache images and block images that do not meet vulnerability requirements.
  • One-click upgrades and rollbacks of Harbor.
  • Clustering – Harbor nodes should cluster, replicate metadata (users, RBAC and system configuration, vulnerability scan results, etc.). Support for wide-area clustering is a stretch goal.
  • BitTorrent-backed storage – images are transparently transferred via BT protocol.
  • Improved multi-tenancy – provide additional multi-tenancy construct (system → tenant → project)

Please feel free to share your wishlist of features via GitHub; just open an issue and share your thoughts. We keep track of the items the community desires and will prioritize them based on demand.

How to get involved

Getting involved in Harbor is easy. Step 1: don’t be shy. We’re a friendly bunch of individuals working on an exciting open source project.

The lowest barrier to entry is joining us on Slack. Ask questions, give feedback, request help, share your ideas on how to improve the project, or just say hello!

We love GitHub issues and pull requests. If you think something can be improved, let us know. If you want to spend a few minutes fixing something yourself – docs, code, error messages, you name it – please feel free to open a PR. We’ve previously discussed how to contribute, so don’t be shy. If you need help with the PR process, the quickest way to get an answer is probably to ping us on Slack.

See you on GitHub!



CNCF Survey: Cloud Usage in Asia Has Grown 135% Since March 2018


The biannual CNCF survey takes the pulse of the community to better understand the adoption of cloud native technologies. This is the second time CNCF has conducted its cloud native survey in Mandarin to better gauge how Asian companies are adopting open source and cloud native technologies. The previous Mandarin survey was published in March 2018. This post also makes comparisons to the most recent North American/European version of this survey, from August 2018.

Key Takeaways

  • Usage of public and private clouds in Asia has grown 135% since March 2018, while on-premise has dropped 48%.
  • Usage of nearly all container management tools in Asia has grown, with commercial off-the-shelf solutions up 58% overall, and home-grown solutions up 690%. Kubernetes has grown 11%.
  • The number of Kubernetes clusters in production is increasing. Organizations in Asia running 1-5 production clusters decreased 37%, while respondents running 11-50 clusters increased 154%.
  • Use of serverless technology in Asia has spiked 100% with 29% of respondents using installable software and 21% using a hosted platform.

300 people responded to the Chinese version of the survey, with 83% of respondents from Asia, compared to 187 respondents to the March 2018 survey.

CNCF has a total of 42 members across China, Japan, and South Korea, including four platinum members: Alibaba Cloud, Fujitsu, Huawei, and JD.com. A number of these members are also end users.

Growth of Containers

Container usage is becoming prevalent in all phases of the development cycle. There has been a significant jump in the use of containers for testing, up to 42% from 24% in March 2018, with an additional 27% of respondents citing future plans. There has also been an increase in the use of containers for proof of concept (14%, up from 8%).

As the usage of containers becomes more prevalent across all phases of development, the use of container management tools is growing. Since March 2018, there has been a significant jump in the usage of nearly all container management tools.

Usage of Kubernetes has grown 11% since March 2018. Other tools have also grown:

  • Amazon ECS: up to 22% from 13%
  • CAPS: up to 13% from 1%
  • Cloud Foundry: up to 20% from 1%
  • Docker Swarm: up to 27% from 16%
  • Shell Scripts: up to 14% from 5%

There are also two new tools that were not cited in the March 2018 survey. 16% of respondents are using Mesos and an additional 8% are using Nomad for container management.

Commercial off-the-shelf solutions (Kubernetes, Docker Swarm, Mesos, etc.) have grown 58% overall, while home-grown management (Shell Scripts and CAPS) have grown 690%, showing that home-grown solutions are still widely popular in Asia while North American and European markets moved away from those in favor of COTS solutions.

Cloud vs. On-Premise

While on-premise solutions are widely used in the North American and European markets (64%), that number seems to be declining for the Asian market. Only 31% of respondents reported using on-premise solutions in this survey, compared to 60% in March 2018. Cloud usage is growing with 43% of respondents using private clouds (up from 24%) and 51% using public clouds (up from 16%).

Kubernetes

As for where Kubernetes is being run, Alibaba still remains No. 1 with 38% of respondents reporting usage, but is down from 52% in March 2018. Following Alibaba is Amazon Web Services (AWS), with 24% of respondents citing usage, slightly down from 26%.

New environments that were not previously reported and are taking up additional market share are Huawei Cloud (13%), VMware (6%), Baidu Cloud (21%), Tencent Cloud (7%), IBM Cloud (8%), and Packet (5%).

The decline of on-premise usage is also evident in these responses, with 24% of respondents reporting that they run Kubernetes on-prem compared to 38% in March 2018. OpenStack usage has also declined significantly, down to 9% from 26% in March 2018.

For organizations running Kubernetes, the number of production clusters is also increasing. Respondents running 1-5 production clusters decreased 37%, while respondents running 11-50 clusters increased 154%. Still, respondents are mostly running 6-10 production clusters, with 29% reporting that number.

We also asked respondents about the tools they are using to manage various aspects of their applications:

Packaging

The most popular method of packaging Kubernetes applications is Managed Kubernetes Offerings (37%), followed by Ksonnet (27%) and Helm (24%).  

Autoscaling

Respondents are primarily using autoscaling for Task and Queue processing applications (44%) and Java Applications (44%). This is followed by stateless applications (33%) and stateful databases (29%).  

The top reasons respondents aren’t using Kubernetes autoscaling capabilities are that they are using a third-party autoscaling solution (32%), were not aware these capabilities existed (30%), or have built their own autoscaling solution (26%).

Ingress Providers

The top Kubernetes ingress providers reported are F5 (36%), nginx (34%), and GCP (22%).

Exposing Cluster External Services

The most popular ways to expose Cluster External Services were Load-Balancer Services (43%), integration with a third party load-balancer (37%), and L7 Ingress (35%).

Separating Kubernetes in an Organization with Multiple Teams

Respondents are separating multiple teams within their organization using namespaces (49%), separate clusters (42%), and only labels (34%).

Separating Kubernetes Applications

Respondents are primarily separating their Kubernetes applications using namespaces (46%), separate clusters (45%), and only labels (33%).

Cloud Native Projects

What are the benefits of cloud native projects in production? Respondents cited the top four reasons as:

  • Improved Availability (47%)
  • Improved Scalability (46%)
  • Cloud Portability (45%)
  • Improved Developer Productivity (45%)

Compared to the North American and European markets, improved availability and developer productivity are more important in the Asian market, while faster deployment time is less important (only 38% cited this compared to 50% in the English version of this survey).

As for the cloud native projects that are being used in production and evaluated:

Many cloud native projects have grown in production usage since March 2018. Projects with the largest spike in production usage are: gRPC (22% up from 13%), Fluentd (11% up from 7%), Linkerd (11% up from 7%), and OpenTracing (27% up from 20%).

The number of respondents evaluating cloud native projects also grew with: gRPC (20% up from 11%), OpenTracing (27% up from 18%), and Zipkin (12% up from 9%).

Challenges Ahead

As cloud native technologies continue to be adopted, especially into production, there are still challenges to address. The top challenges respondents are facing are:

  • Lack of training (46%)
  • Difficulty in choosing an orchestration solution (30%)
  • Complexity (28%)
  • Finding vendor support (28%)
  • Monitoring (25%)

One interesting note is that many of the challenges have significantly declined since our previous survey in March 2018 as more resources are added to address these concerns. A new challenge that has come up is lack of training. While CNCF has invested heavily in Kubernetes training over the past year, including courses and certification for Kubernetes Administrators and Application Developers, we are still actively working to make translated versions of the courses and certifications available and more easily accessible in Asia. CNCF is also working with a global network of Kubernetes Training Partners to expand these resources, as well as Kubernetes Certified Service Providers to help support organizations with the complexity of embarking on their cloud native journey.  

Serverless

The use of serverless technology has spiked, with 50% of organizations using the technology compared to 25% in March 2018. Of that 50%, 29% are using installable software and 21% are using a hosted platform. An additional 17% plan to use the technology within the next 12-18 months.

For installable serverless platforms, Apache OpenWhisk is the most popular, with 11% of respondents citing usage. This is followed by Dispatch (6%), Fn (5%), and OpenFaaS, Kubeless, and Fission tied at 4%.

For hosted serverless platforms, AWS Lambda is the most popular, with 11% of respondents citing usage. This is followed by Azure Functions (8%), and Alibaba Cloud Compute Functions, Google Cloud Functions, and Cloudflare Functions tied at 7%.

Serverless usage in Asia is higher than what we saw in North American and European markets, where 38% of organizations were using serverless technology. There, hosted platforms (32%) were also much more popular compared to installable software (6%), whereas in Asia both options are more evenly used. There is also much more variety in the solutions used, whereas AWS Lambda and Kubeless were the clear leaders in North America and Europe.

Relating back to CNCF projects, a small percentage of respondents are now evaluating (3%) or using CloudEvents in production (2%). CloudEvents is an effort organized by CNCF’s Serverless Working Group to create a specification for describing event data in a common way.

Cloud Native is Growing in China

As cloud native continues to grow in China, the methods for learning about these technologies become increasingly important. Here are the top ways respondents are learning about cloud native technologies:

Documentation

50% of respondents are learning through documentation. Each CNCF project hosts extensive documentation on its website. Kubernetes, in particular, is currently working on translating its documentation and website into multiple languages, including Japanese, Korean, Norwegian, and Chinese.

Technical Podcasts

48% of respondents are learning through technical podcasts. There are a variety of cloud native podcasts out there to learn from such as the Kubernetes Podcast from Google and PodCTL by Red Hat. Also, check out this list of 10 China tech podcasts you should follow.


Technical Webinars

29% of respondents are learning through technical webinars. CNCF runs a weekly webinar series that takes place every Tuesday from 10am-11am PT. You can see the upcoming schedule and view recordings and slides of previous webinars.


Business Case Studies

27% of respondents are learning through business case studies. CNCF has a collection of end user case studies while Kubernetes also maintains an extensive list of Kubernetes-specific user case studies.


The Cloud Native Community in China

As cloud native continues to grow in Asia, CNCF is excited to be hosting the first annual KubeCon + CloudNativeCon in Shanghai this week. With over 1,500 attendees at the inaugural event, we look forward to seeing the continued growth of cloud native technologies at a global scale.

To keep up with the latest news and projects, join us at one of the 22 cloud native Meetups across Asia. We hope to see you at one of our upcoming Meetups!  

A huge thank you to everyone who participated in our survey!

You can also view the findings from past surveys.

About the Survey Methodology & Respondents

The pool of respondents represented a variety of company sizes, with the majority being in the 50-499 employee range (48%). As for job function, respondents identified mostly as Developers (22%), Development Managers (15%), and IT Managers (12%).

Respondents represented 31 different industries, the largest being software (13%) and financial services (11%).

This survey was conducted in Mandarin.

The Beginner’s Guide to the CNCF Landscape

By | Blog

This blog was originally posted on the CloudOps blog by Ayrat Khayretdinov.

The cloud native landscape can be complicated and confusing. Its myriad of open source projects are supported by the constant contributions of a vibrant and expansive community. The Cloud Native Computing Foundation (CNCF) has a landscape map that shows the full extent of cloud native solutions, many of which are under their umbrella.

As a CNCF ambassador, I am actively engaged in promoting community efforts and cloud native education throughout Canada. At CloudOps I lead workshops on Docker and Kubernetes that provide an introduction to cloud native technologies and help DevOps teams operate their applications.

I also organize Kubernetes and Cloud Native meetups that bring in speakers from around the world and represent a variety of projects. They are run quarterly in Montreal, Ottawa, Toronto, Kitchener-Waterloo, and Quebec City. Reach out to me @archyufa or email CloudOps to learn more about becoming cloud native.

In the meantime, I have written a beginner’s guide to the cloud native landscape. I hope that it will help you understand the landscape and give you a better sense of how to navigate it.

CNCF

The History of the CNCF

In 2014 Google open sourced an internal project called Borg that they had been using to orchestrate containers. Not having a place to land the project, Google partnered with the Linux Foundation to create the Cloud Native Computing Foundation (CNCF), which would encourage the development of and collaboration around Kubernetes and other cloud native solutions. The Borg implementation was rewritten in Go, renamed Kubernetes, and donated as the inception project. It became clear early on that Kubernetes was just the beginning and that a swarm of new projects would join the CNCF, extending the functionality of Kubernetes.

The CNCF Mission

The CNCF fosters this landscape of open source projects by helping provide end-user communities with viable options for building cloud native applications. By encouraging projects to collaborate with each other, the CNCF hopes to enable fully-fledged technology stacks comprised solely of CNCF member projects. This is one way that organizations can own their destinies in the cloud.

CNCF Processes

A total of twenty-five projects have followed Kubernetes and been adopted by the CNCF. In order to join, projects must be selected and then elected with a supermajority by the Technical Oversight Committee (TOC). The voting process is aided by a healthy community of TOC contributors, who are representatives from CNCF member companies, including myself. Member projects join the Sandbox, Incubation, or Graduation phase depending on their level of code maturity.

Sandbox projects are in a very early stage and require significant code maturity and community involvement before being deployed in production. They are adopted because they offer unrealized potential. The CNCF’s guidelines state that the CNCF helps encourage the public visibility of Sandbox projects and facilitates their alignment with existing projects. Sandbox projects receive minimal funding and marketing support from the CNCF and are subject to review and possible removal every twelve months.

Projects enter Incubation when they meet all Sandbox criteria and also demonstrate certain growth and maturity characteristics. They must be in production usage by at least three companies and maintain a healthy team that approves and accepts a steady flow of contributions, including new features and code from the community.

Once Incubation projects have reached a tipping point in production use, the TOC can vote to move them to the Graduation phase. Graduated projects have to demonstrate thriving adoption rates and meet all Incubation criteria. They must also have committers from at least two organizations, have documented and structured governance processes, and earn the Linux Foundation Core Infrastructure Initiative’s Best Practices Badge. So far, only Kubernetes and Prometheus have graduated.

The Projects Themselves

Below I’ve grouped the projects into twelve categories: orchestration; app development; monitoring; logging and tracing; container registries; storage and databases; runtimes; service discovery; service meshes; service proxies; security; and streaming and messaging. For each project, I’ve provided information that can help companies or individuals evaluate what it does, how it integrates with other CNCF projects, and its evolution and current state.

Orchestration

Kubernetes (Graduated) – Kubernetes automates the deployment, scaling, and management of containerized applications, emphasizing automation and declarative configuration. The name means ‘helmsman’ in ancient Greek. Kubernetes orchestrates containers, which are packages of portable and modular microservices, and adds a layer of abstraction by grouping containers into pods. Kubernetes helps engineers schedule workloads and allows containers to be deployed at scale over multi-cloud environments. Having graduated, Kubernetes has reached a critical mass of adoption: in a recent CNCF survey, over 40% of respondents from enterprise companies reported running Kubernetes in production.

App Development

Helm (Incubating) – Helm is an application package manager that allows users to find, share, install, and upgrade Kubernetes applications (aka charts) with ease. It helps end users deploy existing applications (including MySQL, Jenkins, Artifactory, etc.) using KubeApps Hub, which displays charts from the stable and incubator repositories maintained by the Kubernetes community. With Helm you can install all the other CNCF projects that run on top of Kubernetes. Helm also lets organizations create and then deploy custom applications or microservices to Kubernetes. Doing this by hand involves creating YAML manifests with hardcoded values that are not suitable for deployment across different environments or CI/CD pipelines. Helm instead creates single charts that can be versioned based on application or configuration changes, deployed in various environments, and shared across organizations.

Helm originated at Deis from an attempt to create a ‘homebrew’ experience for Kubernetes users. Helm V1 consisted of the client side of what is currently the Helm project. The server-side component, Tiller, was added in Helm V2 by Deis in collaboration with Google, at around the same time that Kubernetes 1.2 was released. This was how Helm became the standard way of deploying applications on top of Kubernetes.

Helm is currently making a series of changes and updates in preparation for the release of Helm V3, which is expected to happen by the end of the year. Companies that rely on Helm for their daily CI/CD development, including Reddit, Ubisoft, and Nike, have suggested improvements for the redesign.

Telepresence (Sandbox) – It can be challenging to develop containerized applications on Kubernetes. Popular tools for local development include Docker Compose and Minikube. Unfortunately, most cloud native applications today are resource intensive and involve multiple databases, services, and dependencies. Moreover, it can be complicated to mimic cloud dependencies, such as messaging systems and databases, in Compose and Minikube. An alternative approach is to use fully remote Kubernetes clusters, but this precludes you from developing with your local tools (e.g., IDE, debugger) and creates a slow developer ‘inner loop’ that makes developers wait for CI to test changes.

Telepresence, which was developed by Datawire, offers the best of both worlds. It allows developers to ‘live code’ by running single microservices locally for development purposes while remaining connected to remote Kubernetes clusters that run the rest of the application. Telepresence deploys pods containing two-way network proxies on remote Kubernetes clusters, connecting local machines to those proxies. It implements realistic development and test environments without giving up local tools for coding, debugging, and editing.

Monitoring

Prometheus (Graduated) – Following in the footsteps of Kubernetes, Prometheus was the second project to join the CNCF and the second (and so far last) project to graduate. It’s a monitoring solution suited to dynamic cloud and container environments, inspired by Google’s internal monitoring system, Borgmon. Prometheus is a pull-based system: its configuration decides when and what to scrape, unlike push-based monitoring systems, where an agent running on each node pushes metrics to a central server. Prometheus stores scraped metrics in a time-series database (TSDB) and allows you to create meaningful graphs inside Grafana dashboards with its powerful query language, PromQL. You can also generate and send alerts to various destinations, such as Slack and email, using the built-in Alertmanager.

Hugely successful, Prometheus has become the de facto standard in cloud native metric monitoring. With Prometheus one can monitor VMs, Kubernetes clusters, and microservices running anywhere, especially in dynamic systems like Kubernetes. Prometheus’ metrics can also automate scaling decisions by leveraging Kubernetes features including HPA, VPA, and Cluster Autoscaling. Prometheus can monitor other CNCF projects such as Rook, Vitess, Envoy, Linkerd, CoreDNS, Fluentd, and NATS, and its exporters integrate with many other applications and distributed systems. Use Prometheus’ official Helm chart to get started.
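
To illustrate the pull model, here is a minimal Go sketch using the official client_golang library; the metric name and port are arbitrary choices for the example. The application registers a counter and exposes a /metrics endpoint for Prometheus to scrape:

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // A hypothetical counter; Prometheus pushes nothing here itself –
    // it scrapes whatever the process exposes on /metrics.
    var requests = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "myapp_requests_total",
        Help: "Total number of handled requests.",
    })

    func main() {
        prometheus.MustRegister(requests)
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            requests.Inc() // count each served request
            w.Write([]byte("ok"))
        })
        // Prometheus pulls from this endpoint on its own schedule.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":8080", nil)
    }

A PromQL query such as rate(myapp_requests_total[5m]) would then graph the request rate in Grafana.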

OpenMetrics (Sandbox) – OpenMetrics creates a neutral standard for an application’s metric exposition format. Its modern metric standard enables users to transmit metrics at scale. OpenMetrics is based on the popular Prometheus exposition format, which has over 300 existing exporters and builds on operational experience from Borgmon, which enabled ‘white-box monitoring’ and mass data collection with low overheads. Before OpenMetrics, the monitoring landscape was largely based on outdated standards and techniques (such as SNMP) that used proprietary formats and placed minimal focus on metrics. OpenMetrics builds on the Prometheus exposition format with a tighter, cleaner, and more enhanced syntax. While OpenMetrics is only in the Sandbox phase, it is already being used in production by projects and companies including AppOptics, Cortex, Datadog, Google, InfluxData, OpenCensus, Prometheus, Sysdig, and Uber.

Cortex (Sandbox) – Operational simplicity has always been a primary design objective of Prometheus. Consequently, Prometheus itself runs without clustering (as a single node or container) and uses only local storage that is not designed to be durable or long-term. Clustering and distributed storage come with additional operational complexity that Prometheus forgoes in favour of simplicity. Cortex is a horizontally scalable, multi-tenant, long-term storage solution that complements Prometheus. It allows large enterprises to use Prometheus while retaining high availability (HA) and long-term storage. There are other projects in this space that are gaining community interest, such as Thanos, Timbala, and M3DB, but Cortex has already been battle-tested as a SaaS offering at both Grafana Labs and Weaveworks and is also deployed on-prem by both EA and StorageOS.

Logging and Tracing

Fluentd (Incubating) – Fluentd collects, interprets, and transmits application logging data. It unifies data collection and consumption so you can better use and understand your data. Fluentd structures data as JSON and brings together the collecting, filtering, buffering, and outputting of logs across multiple sources and destinations. It can collect logs from VMs and traditional applications, but it really shines in cloud native environments running microservices on top of Kubernetes, where applications are created dynamically.

Fluentd runs in Kubernetes as a daemonset (a workload that runs on each node). It collects logs from all applications run as containers (including CNCF ones) that emit logs to STDOUT, then parses and buffers the incoming log entries and sends formatted logs to configured destinations, such as Elasticsearch, Hadoop, and Mongo, for further processing.
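
Because Fluentd tails container STDOUT, an application only has to write structured logs there. A minimal Go sketch (the field names are arbitrary, assuming a typical JSON-parsing Fluentd configuration):

    package main

    import (
        "encoding/json"
        "os"
        "time"
    )

    func main() {
        // One JSON object per line on stdout; the kubelet writes this
        // to node log files, which the Fluentd daemonset tails, parses,
        // buffers, and forwards to its configured destinations.
        enc := json.NewEncoder(os.Stdout)
        enc.Encode(map[string]interface{}{
            "time":    time.Now().Format(time.RFC3339),
            "level":   "info",
            "message": "order accepted",
            "orderID": 12345,
        })
    }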

Fluentd was initially written in Ruby and takes over 50MB of memory at runtime, making it unsuitable for running alongside containers in sidecar patterns. Fluent Bit is being developed alongside Fluentd as a solution. Fluent Bit is written in C and uses only a few KB of memory at runtime; it is more efficient in CPU and memory usage, but has a more limited feature set than Fluentd. Fluentd was originally developed by Treasure Data as an open source project.

Fluentd is available as a Kubernetes plugin and can be deployed as version 0.12, an older and more stable version that is currently widely deployed in production. The new version (1.x) was recently developed and has many improvements, including new plugin APIs, nanosecond resolution, and Windows support. Fluentd is becoming the standard for log collection in the cloud native space and is a solid candidate for CNCF Graduation.

OpenTracing (Incubating) – Do not underestimate the importance of distributed tracing for building microservices at scale. Developers must be able to view each transaction and understand the behaviour of their microservices. However, distributed tracing can be challenging because the instrumentation must propagate the tracing context both within and between the processes that exist throughout services, packages, and application-specific code. OpenTracing allows developers of application code, OSS packages, and OSS services to instrument their own code without locking into any particular tracing vendor. It provides a distributed tracing standard for applications and OSS packages, with vendor-neutral APIs and libraries available in nine languages, making OpenTracing ideal for service meshes and distributed systems. OpenTracing itself is not a tracing system with a UI for analyzing spans; it is an API that works with application business logic, frameworks, and existing instrumentation to create, propagate, and tag spans. It integrates with both open source (e.g., Jaeger, Zipkin) and commercial (e.g., Instana, Datadog) tracing solutions, and creates traces that are either stored in a backend or rendered in a UI. Try a tutorial, or start instrumenting your own system with Jaeger, a compatible tracing solution.
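
A minimal Go sketch of the vendor-neutral API using the opentracing-go package; the operation names are made up, and the global tracer is a no-op unless a concrete tracer (e.g., Jaeger’s) has been registered:

    package main

    import (
        "context"

        opentracing "github.com/opentracing/opentracing-go"
    )

    func loadData(ctx context.Context) {
        // Child span; the parent/child linkage travels in the context.
        span, _ := opentracing.StartSpanFromContext(ctx, "load-data")
        defer span.Finish()
        // ... fetch data ...
    }

    func main() {
        // Root span for one logical operation.
        span, ctx := opentracing.StartSpanFromContext(context.Background(), "handle-request")
        span.SetTag("component", "example")
        loadData(ctx)
        span.Finish()
    }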

Jaeger (Incubating) – Jaeger is a distributed tracing system that is compatible with OpenTracing and was originally developed and battle-tested by Uber. Its name is pronounced yā′gər and means hunter. It was inspired by Dapper, Google’s internal tracing system, and by Zipkin, an alternative open source tracing system written by Twitter. Zipkin has limited OpenTracing integration support, but Jaeger provides backwards compatibility with Zipkin by accepting spans in Zipkin formats over HTTP. Jaeger is used to monitor and troubleshoot microservices-based distributed systems, providing distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance and latency optimization. Jaeger’s data model and instrumentation libraries are compatible with OpenTracing. Its modern web UI is built with React/JavaScript, and its backend supports multiple storage options, including Cassandra, Elasticsearch, and in-memory storage. Jaeger integrates with service meshes including Istio and Linkerd, making tracing instrumentation much easier.

Jaeger is itself observable: it exposes Prometheus metrics by default and integrates with Fluentd for log shipping. You can deploy Jaeger to Kubernetes using a Helm chart or the recently developed Jaeger Operator. Most contributions to the Jaeger codebase come from Uber and Red Hat, but there are hundreds of companies adopting Jaeger for cloud native, microservices-based distributed tracing.

Container Registries

Harbor (Sandbox) – Harbor is an open source trusted container registry that stores, signs, and scans Docker images. It provides free-of-charge, enhanced Docker registry features and capabilities, including a web interface with role-based access control (RBAC) and LDAP support. It integrates with Clair, an open source project developed by CoreOS, for vulnerability scanning, and with Notary, a CNCF incubating project described below, for content trust. Harbor provides activity auditing and Helm chart management, and replicates images from one Harbor instance to another for HA and DR. Harbor was originally developed by VMware as an open source solution. It is now being used by companies of many sizes, including TrendMicro, Rancher, Pivotal, and AXA.

Storage and Databases

Rook (Incubating) – Rook is an open source cloud native storage orchestrator for Kubernetes. With Rook, ops teams can run software-defined storage (SDS) systems (such as Ceph) on top of Kubernetes. Developers can then use that storage to dynamically create Persistent Volumes (PVs) in Kubernetes to deploy applications such as Jenkins, WordPress, and any other app that requires state. Ceph is a popular open source SDS that can provide many popular types of storage, such as object, block, and file system, and runs on top of commodity hardware. While it is possible to run Ceph clusters outside of Kubernetes and connect them to Kubernetes using the CSI plugin, deploying and then operating Ceph clusters on hardware is a challenging task, which has limited the system’s popularity.

Rook deploys and integrates Ceph inside Kubernetes as a first-class object using Custom Resource Definitions (CRDs) and turns it into a self-managing, self-scaling, and self-healing storage service using the Operator Framework. The goal of Operators in Kubernetes is to encode human operational knowledge into software that is more easily packaged and shared with end users. In comparison to Helm, which focuses on packaging and deploying Kubernetes applications, an Operator can deploy and manage the life cycle of a complex application. In the case of Ceph, the Rook Operator automates storage administrator tasks such as deployment, bootstrapping, configuration, provisioning, horizontal scaling, healing, upgrades, backups, disaster recovery, and monitoring.

Initially, the Rook Operator supported only Ceph; as of version 0.8, Ceph support has moved to beta. The Rook project later announced the Rook Framework for storage providers, which extends Rook into a general-purpose cloud native storage orchestrator supporting multiple storage solutions with reusable specs, logic, policies, and testing. Currently Rook supports CockroachDB, Minio, and NFS, all in alpha, with Cassandra, Nexenta, and Alluxio to follow. The list of companies using the Rook Operator with Ceph in production is growing, especially among companies deploying on-prem – among them CENGN, Gini, and RPR – with many more in the evaluation stage.

Vitess (Incubating) – Vitess is a middleware layer for databases. It employs generalized sharding to distribute data across MySQL instances. It scales horizontally and can keep scaling indefinitely without affecting your application: when your shards reach full capacity, Vitess reshards your underlying database with zero downtime and good observability. Vitess solves many of the problems associated with ever-growing transactional data.

TiKV (Sandbox) – TiKV is a transactional key-value database that offers simplified scheduling and auto-balancing. It acts as a distributed storage layer that supports strong data consistency, distributed transactions, and horizontal scalability. TiKV was inspired by the designs of Google Spanner and HBase, but has the advantage of not depending on a distributed file system. It was developed by PingCAP and currently has contributors from Samsung, Tencent Cloud, and UCloud.

Runtimes

rkt (Incubating) – rkt (read as ‘Rocket’) is an application container runtime that was originally developed at CoreOS. Back when Docker was the default runtime for Kubernetes and was baked into the kubelet, the Kubernetes and Docker communities had challenges working with each other. Docker Inc., the company behind the development of Docker as open source software, had its own roadmap and was adding complexity to Docker – for example, adding swarm mode and changing the filesystem from AUFS to overlay2 without providing notice. These changes were generally not well coordinated with the Kubernetes community and complicated roadmap planning and release dates. At the end of the day, Kubernetes users need a simple runtime that can start and stop containers and provide functionality for scaling, upgrading, and uptime. With rkt, CoreOS intended to create an alternative runtime to Docker that was purpose-built to run with Kubernetes. This eventually led the SIG-Node team of Kubernetes to develop the Container Runtime Interface (CRI), which lets Kubernetes connect to any type of container runtime and removes Docker code from its core. rkt can consume both OCI images and Docker-format images. While rkt had a positive impact on the Kubernetes ecosystem, the project was never adopted by end users, specifically by developers who are used to the Docker CLI and don’t want to learn alternatives for packaging applications. Additionally, due to the popularity of Kubernetes, there is a sprawl of container solutions competing for this niche. Projects like gVisor and CRI-O (based on OCI) are gaining popularity these days while rkt is losing its position, making it a potential candidate for removal from the CNCF Incubator.

containerd (Incubating) – containerd is a container runtime that emphasizes simplicity, robustness, and portability. In contrast to rkt, containerd is designed to be embedded into a larger system, rather than being used directly by developers or end users. Like rkt, containerd can consume both OCI and Docker image formats. containerd was donated to the CNCF by the Docker project. In its early days, Docker’s platform was a monolithic application, but over time the addition of features such as swarm mode turned it into a complex system. The growing complexity made Docker increasingly hard to manage, and its extra features were redundant when running Docker under systems like Kubernetes that require simplicity. As a result, Kubernetes started looking for alternative runtimes, such as rkt, to replace Docker as the default container runtime, and the Docker project decided to break itself up into loosely coupled components and adopt a more modular architecture. This became the Moby Project, with containerd providing the core runtime functionality. containerd was later integrated into Kubernetes via a CRI shim known as cri-containerd. However, cri-containerd is no longer required, because containerd ships with a built-in CRI plugin that is enabled by default starting from Kubernetes 1.10 and avoids an extra gRPC hop. While containerd has its place in the Kubernetes ecosystem, projects like CRI-O (based on OCI) and gVisor are gaining popularity these days and containerd is losing community interest. However, it is still an integral part of the Docker platform.

Service Discovery

CoreDNS (Incubating) – CoreDNS is a DNS server that provides service discovery in cloud native deployments, and it is the default cluster DNS in Kubernetes starting with the 1.12 release. Prior to that, Kubernetes used SkyDNS and later KubeDNS. SkyDNS – a dynamic DNS-based service discovery solution – had an inflexible architecture that made it difficult to add new functionality or extensions. KubeDNS ran as three containers (kube-dns, dnsmasq, and sidecar), was prone to dnsmasq vulnerabilities, and had similar issues extending the DNS system with new functionality. CoreDNS, by contrast, was written in Go from scratch as a flexible, plugin-based, extensible DNS solution. It runs inside Kubernetes as a single container versus KubeDNS’s three, has had no issues with vulnerabilities, and can update its configuration dynamically using ConfigMaps. Additionally, CoreDNS fixed a lot of KubeDNS issues introduced by its rigid design (e.g., verified Pod records). CoreDNS’ architecture allows you to add or remove functionality using plugins; it currently has over thirty built-in plugins and over twenty external ones. By chaining plugins, you can enable monitoring with Prometheus, tracing with Jaeger, logging with Fluentd, configuration with the Kubernetes API or etcd, and advanced DNS features and integrations.
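
To see what cluster DNS gives applications, here is a hedged Go sketch that resolves a hypothetical Service named “web” in namespace “demo” via SRV records; inside a pod, the query is answered by CoreDNS (or KubeDNS on older clusters):

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // SRV lookup for a named port "http" over TCP on the Service;
        // the cluster DNS returns one record per ready endpoint.
        _, addrs, err := net.LookupSRV("http", "tcp", "web.demo.svc.cluster.local")
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        for _, srv := range addrs {
            fmt.Printf("%s:%d\n", srv.Target, srv.Port)
        }
    }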

Service Meshes

Linkerd (Incubating) – Linkerd is an open source network proxy designed to be deployed as a service mesh: a dedicated layer for managing, controlling, and monitoring service-to-service communication within an application. Linkerd helps developers run microservices at scale by improving an application’s fault tolerance via programmable configuration of circuit breaking, rate limiting, timeouts, and retries, without application code changes. It also provides visibility into microservices via distributed tracing with Zipkin, as well as advanced traffic-control instrumentation to enable canary, staging, and blue-green deployments. SecOps teams will appreciate Linkerd’s ability to transparently encrypt all cross-node communication in a Kubernetes cluster via TLS. Linkerd is built on top of Twitter’s Finagle project, which has extensive production usage and attracts the interest of many companies exploring service meshes. Today Linkerd can be used with Kubernetes, DC/OS, and AWS ECS. The Linkerd service mesh is deployed on Kubernetes as a DaemonSet, meaning it runs one Linkerd pod on each node of the cluster.

Service Proxies

Envoy (Incubating) – Envoy is a modern edge and service proxy designed for cloud native applications. It is a vendor-agnostic, high-performance, lightweight (written in C++) production-grade proxy that was developed and battle-tested at Lyft. Envoy provides fault-tolerance capabilities for microservices (timeouts, security, retries, circuit breaking) without changing any lines of existing application code. It provides automatic visibility into what’s happening between microservices via integrations with Prometheus, Fluentd, Jaeger, and Kiali. Envoy can also be used as an edge proxy (e.g., an L7 ingress controller for Kubernetes) thanks to its capabilities for traffic routing and splitting, as well as zone-aware load balancing with failover.

While the service proxy landscape already has many options, Envoy is a great addition that has sparked a lot of interest and revolutionary ideas around service meshes and modern load balancing. Heptio announced project Contour, an ingress controller for Kubernetes that works by deploying the Envoy proxy as a reverse proxy and load balancer. Contour supports dynamic configuration updates and multi-team Kubernetes clusters, with the ability to limit the namespaces that may configure virtual hosts and TLS credentials, and provides advanced load-balancing strategies. Another project that uses Envoy at its core is Datawire’s Ambassador, a powerful Kubernetes-native API gateway. Because Envoy is written in C++, it is super lightweight and a perfect candidate for the sidecar pattern inside Kubernetes; combined with its API-driven configuration updates, it has become a perfect candidate for service mesh data planes. Notably, the Istio service mesh adopted Envoy as the default service proxy for its data plane, deploying Envoy proxies alongside each instance inside Kubernetes using a sidecar pattern. This creates a transparent service mesh that is controlled and configured by Istio’s control plane. The approach contrasts with the DaemonSet pattern used in Linkerd v1, which provides visibility into each service as well as the ability to create secure TLS for each service inside Kubernetes, or even in hybrid cloud scenarios. Recently HashiCorp announced that its open source project Consul Connect will use Envoy to establish secure TLS connections between microservices.

Today Envoy has a large and active open source community that is not driven by any single vendor or commercial project. If you want to start using Envoy, try Istio, Ambassador, or Contour, or join the Envoy community at KubeCon (Seattle, WA) on December 10th, 2018 for the very first EnvoyCon.

Security

Falco (Sandbox) – Falco is an open source runtime security tool developed by Sysdig. It was designed to detect anomalous activity and intrusions in Kubernetes-orchestrated systems. Falco is more of an auditing tool than an enforcement tool (such as seccomp or AppArmor). It runs in user space with the help of a Sysdig kernel module that retrieves system calls.

Falco runs inside Kubernetes as a DaemonSet with a preconfigured set of rules that define the behaviours and events to watch for. Based on those rules, Falco detects and alerts on suspicious behaviour expressed through Linux system calls (such as shells run inside containers or binaries making outbound network connections). These events can be captured from STDERR via Fluentd and then sent to Elasticsearch for filtering, or to Slack. This can help organizations quickly respond to security incidents, such as container exploits and breaches, and minimize the financial penalties posed by such incidents.

With the addition of Falco to the CNCF sandbox, we hope that there will be closer integrations with other CNCF projects in the future. To start using Falco, find an official Helm Chart.

SPIFFE (Sandbox) – SPIFFE provides a secure production identity framework. It enables communication between workloads by verifying identities. It’s policy-driven, API-driven, and can be entirely automated. It’s a cloud native solution to the complex problem of establishing trust between workloads, which becomes difficult and even dangerous as workloads scale elastically and are dynamically scheduled. SPIFFE is a relatively new project, and it was designed to integrate closely with SPIRE.

SPIRE (Sandbox) – SPIRE is SPIFFE’s runtime environment: a set of software components that can be integrated into cloud providers and middleware layers. SPIRE has a modular architecture that supports a wide variety of platforms, and the communities around SPIFFE and SPIRE are growing very quickly. HashiCorp just announced support for SPIFFE IDs in Vault, so they can be used for key material and rotation. SPIFFE and SPIRE are both currently in the Sandbox.

TUF (Incubating) – TUF is short for ‘The Update Framework’, a framework used for trusted content distribution. TUF helps solve content trust problems, which can be a major security issue. It helps validate the provenance of software and verify that only the latest version is being used. TUF plays many important roles within the Notary project, described below. It is also used in production by many companies, including Docker, DigitalOcean, Flynn, Cloudflare, and VMware, to build their internal tooling and products.

Notary (Incubating) – Notary is a secure software distribution implementation. In essence, Notary is based on TUF and ensures that all pulled Docker images are signed, correct, and untampered-with versions at any stage of your CI/CD workflow, addressing one of the major security concerns for Docker-based deployments in Kubernetes systems. Notary publishes and manages trusted collections of content. It allows DevOps engineers to approve trusted published data and create signed collections, similar to the software repository management tools in modern Linux systems, but for Docker images. Notary’s goals include guaranteeing image freshness (always having up-to-date content so vulnerabilities are avoided), trust delegation between users, and trusted distribution over untrusted mirrors or transport channels. While TUF and Notary are generally not used directly by end users, their solutions are integrated into various commercial products and open source projects for content or image signing of trusted distributions, such as Harbor, Docker Enterprise Registry, Quay Enterprise, and Aqua. Another interesting open source project in this space is Grafeas, an open source metadata API that can store ‘attestations’, or image signatures, which can then be checked as part of admission control. It is used in products such as GCP’s Container Analysis API and Binary Authorization, as well as products from JFrog and Aqua Security.

Open Policy Agent (Sandbox) – By requiring policies to be specified declaratively, Open Policy Agent (OPA) allows different kinds of policies to be distributed across a technology stack and have updates enforced automatically without being recompiled or redeployed. Living at the application and platform layers, OPA works by receiving queries from services and returning policy decisions. It integrates well with Docker, Kubernetes, Istio, and many more.
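
A minimal sketch of this query model using OPA’s Go API; the policy, package name, and input here are hypothetical. The service asks OPA whether a request is allowed and receives a decision, with no policy logic compiled into the service itself:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/open-policy-agent/opa/rego"
    )

    // A tiny Rego policy: only user "alice" is allowed.
    const policy = `
    package authz

    default allow = false

    allow {
        input.user == "alice"
    }
    `

    func main() {
        ctx := context.Background()
        r := rego.New(
            rego.Query("data.authz.allow"),
            rego.Module("authz.rego", policy),
            rego.Input(map[string]interface{}{"user": "alice"}),
        )
        rs, err := r.Eval(ctx)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("allowed:", rs[0].Expressions[0].Value) // true
    }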

Streaming and Messaging

NATS (Incubating) – NATS is a messaging service focused on middleware, allowing infrastructures to send and receive messages between distributed systems. Its clustering and auto-healing technologies make it highly available, and its log-based streaming provides guaranteed delivery for replaying historical data and receiving all messages. NATS has a relatively straightforward API and supports a diversity of technical use cases, including messaging in the cloud (general messaging, microservices transport, control planes, and service discovery) and IoT messaging. Unlike the solutions for logging, monitoring, and tracing listed above, NATS works at the application layer.
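
A minimal Go sketch of the publish/subscribe model with the maintainer-supported Go client; the subject name is illustrative, and a NATS server is assumed to be running locally:

    package main

    import (
        "fmt"
        "log"
        "time"

        nats "github.com/nats-io/nats.go"
    )

    func main() {
        // Connect to a local server (nats://127.0.0.1:4222 by default).
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Subscribe to a subject, publish to it, then read the message back.
        sub, _ := nc.SubscribeSync("orders.created")
        nc.Publish("orders.created", []byte(`{"id":42}`))

        msg, err := sub.NextMsg(2 * time.Second)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("received:", string(msg.Data))
    }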

gRPC (Incubating) – A high-performance RPC framework, gRPC allows communication between libraries, clients, and servers on multiple platforms. It can run in any environment and provides support for proxies such as Envoy and nginx. gRPC efficiently connects services with pluggable support for load balancing, tracing, health checking, and authentication. Connecting devices, applications, and browsers with back-end services, gRPC is an application-level tool that facilitates messaging.
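
As a sketch of how a client connects, the snippet below dials a hypothetical server address and calls gRPC’s standard health-checking service; real services would use stubs generated from .proto definitions:

    package main

    import (
        "context"
        "log"
        "time"

        "google.golang.org/grpc"
        healthpb "google.golang.org/grpc/health/grpc_health_v1"
    )

    func main() {
        // Dial a server; localhost:50051 is a placeholder address.
        conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        // The health-checking protocol ships with gRPC itself.
        resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
        if err != nil {
            log.Fatal(err)
        }
        log.Println("server status:", resp.Status)
    }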

CloudEvents (Sandbox) – CloudEvents provides developers with a common way to describe events that happen across multi-cloud environments. By providing a specification for describing event data, CloudEvents simplifies event declaration and delivery across services and platforms. Still in the Sandbox phase, CloudEvents should greatly increase the portability and productivity of applications.
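
To show the shape of the specification, here is a hedged Go sketch that marshals a CloudEvents-style envelope by hand; the struct mirrors the spec’s core attributes, and the event type and data are made up. It is an illustration of the format, not an SDK:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Core CloudEvents attributes (illustrative, not the official SDK).
    type Event struct {
        SpecVersion string          `json:"specversion"`
        Type        string          `json:"type"`
        Source      string          `json:"source"`
        ID          string          `json:"id"`
        Data        json.RawMessage `json:"data,omitempty"`
    }

    func main() {
        e := Event{
            SpecVersion: "0.1",
            Type:        "com.example.object.created", // hypothetical event type
            Source:      "/demo/storage",
            ID:          "A234-1234-1234",
            Data:        json.RawMessage(`{"bucket":"photos"}`),
        }
        b, _ := json.MarshalIndent(e, "", "  ")
        fmt.Println(string(b))
    }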


What’s Next?

The cloud native ecosystem is continuing to grow at a fast pace. More projects will be adopted into the Sandbox in the near future, giving them a chance to gain community interest and awareness. That said, we hope that infrastructure-related projects like Vitess, NATS, and Rook will continue to get attention and support from the CNCF, as they will be important enablers of cloud native deployments on-prem. Another area we hope the CNCF will continue to focus on is cloud native continuous delivery, where there is currently a gap in the ecosystem.

While the CNCF accepts and graduates new projects, it is also important to have a working mechanism for removing projects that have lost community interest, because they cease to have value or are replaced by other, more relevant projects. While the project submission process is open to anybody, I hope that the TOC will continue to sponsor only the best candidates, making the CNCF a diverse ecosystem of projects that work well with each other.

As a CNCF ambassador, I hope to teach people how to use these technologies. At CloudOps I lead workshops on Docker and Kubernetes that provide an introduction to cloud native technologies and help DevOps teams operate their applications. I also organize Kubernetes and Cloud Native meetups that bring in speakers from around the world and represent a variety of projects. They are run quarterly in Montreal, Ottawa, Toronto, Kitchener-Waterloo, and Quebec City. I would also encourage people to join the Ambassador team at CloudNativeCon North America 2018 on December 10th. Reach out to me @archyufa or email CloudOps to learn more about becoming cloud native.

Connect Everything: A Look at How NATS.io can Lead to a Securely Connected World

By | Blog

By Colin Sullivan

Developing and deploying applications that communicate in distributed systems, especially in cloud computing, is complex. Messaging has evolved to address the general needs of distributed applications but hasn’t gone far enough. We need a messaging system that takes the next steps to address cloud, edge, and IoT needs. These include ever-increasing scalability requirements in terms of millions, if not billions, of endpoints; a new emphasis on resiliency of the system as a whole over individual components; end-to-end security; and the ability to have a zero-trust system. In this post we’ll discuss the steps NATS is taking to address these needs, leading toward a securely connected world.

Let’s break down the challenges into scalability, resiliency at scale, and security.

Scalability

To support millions, or even billions of endpoints spanning the globe, most architectures would involve a federated approach with many layers filtering up to a control layer, driven by a required central authority for configuration and security. Instead, NATS is taking a distributed and decentralized approach.

We are introducing a resilient, self-healing way to connect NATS clusters together – NATS superclusters (think clusters of clusters) that optimize data flow. If one NATS server cluster can handle millions of clients, imagine hundreds of them connected to support truly global connectivity with no need for a central authority or hub – no single point of failure.

Traditional messaging systems require precise knowledge of a server cluster’s topology. This was acceptable in the past, as there was no requirement otherwise, but it severely hinders scalability in a hybrid cloud environment, and beyond to edge and IoT. NATS addresses this with the cloud native feature of auto-discovery: NATS servers share topology by automatically exchanging cluster information with each other. When scaling up and adding a NATS server to an existing cluster, topology information is automatically shared, and boom – each server has complete knowledge of the cluster in real time with zero configuration changes. And it gets better: clients also receive this topology information. NATS maintainer-supported clients use it to keep an up-to-date list of available servers, enabling clients to connect to new servers and avoid servers that are no longer available – again with no configuration changes. Viewed in the context of scaling up (or down), rolling upgrades, or cruising past instance failures, this is an operator’s dream.
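
As a sketch of what auto-discovery looks like from the client side (the server URLs are placeholders), a maintainer-supported Go client seeded with one or two servers learns the rest of the cluster by gossip:

    package main

    import (
        "fmt"
        "log"

        nats "github.com/nats-io/nats.go"
    )

    func main() {
        // Seed with a couple of known servers; the cluster gossips the
        // full topology to the client after connect.
        nc, err := nats.Connect("nats://10.0.0.1:4222,nats://10.0.0.2:4222")
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Servers learned via gossip rather than the seed list above.
        fmt.Println("discovered servers:", nc.DiscoveredServers())
    }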

Resiliency at Scale

In messaging, priorities have changed with the move from traditional on-premise computing to cloud computing: the needs of the system as a whole must be prioritized over individual components in the system. Messaging systems evolved to suit predictable and well-defined systems, where we knew exactly what our hardware resources were and where they lived, and could accurately identify, predict, and shore up the weak points of the system. While cloud vendors have done a great job, you cannot count on the same levels of predictability unless you are willing to pay for dedicated hardware.

Today we have machine instances disappearing (often intentionally), spurious network issues, or even unpredictable availability of CPU cycles on multi-tenant instances. Add to this the decomposition of applications into microservices and we have many more moving parts in a distributed system – and many more places to fail.

Unhealthy servers or clients that can’t keep up are known to NATS as slow consumers. The NATS messaging system will not assist slow consumers; instead, they are disconnected, protecting the system as a whole. Server clusters self-heal, and clients might be restarted or simply reconnect elsewhere. If you are using NATS Streaming, missed messages are redelivered.
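
From the application’s side, the Go client surfaces the condition asynchronously before the server disconnects it; a minimal hedged sketch:

    package main

    import (
        "log"

        nats "github.com/nats-io/nats.go"
    )

    func main() {
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Called when the client detects it is falling behind; the
        // server will disconnect consumers that fall too far behind.
        nc.SetErrorHandler(func(_ *nats.Conn, sub *nats.Subscription, err error) {
            if err == nats.ErrSlowConsumer {
                pending, _, _ := sub.Pending()
                log.Printf("slow consumer on %q: %d messages pending", sub.Subject, pending)
            }
        })

        select {} // keep the process alive to receive async errors
    }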

This works so well that we have users who deploy on spot instances knowing the instances will terminate at some point, counting on the holistically resilient behavior of NATS to reduce their operational costs.

Security

The NATS team aims to enable the creation of a securely connected world where data transfer is protected end to end and metadata is anonymous. NATS is taking a novel approach to solving this.

Concerning identities, we’ll continue to support user/password authentication, but our preference will be a new approach using NKeys – a form of Ed25519 key made extremely simple to use.

NKeys are fast and resistant to side-channel attacks. Public NKeys may be registered with the NATS server; during connect, the endpoint (client application) signs a nonce with its private key and returns it to the server, where the identity can be verified. The NATS messaging system never has access to, and never stores, private keys. Operators or organizations manage their own private keys. In this new model, authorization to publish and receive data is tied to an NKey identity.
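
A hedged Go sketch of the challenge handshake using the nats-io/nkeys library; the nonce bytes here are made up, since in practice the server supplies them:

    package main

    import (
        "fmt"
        "log"

        "github.com/nats-io/nkeys"
    )

    func main() {
        // Generate a user keypair; only the public key is ever shared.
        user, err := nkeys.CreateUser()
        if err != nil {
            log.Fatal(err)
        }
        pub, _ := user.PublicKey()

        // On connect, the server sends a nonce; the client signs it and
        // the server verifies the signature against the public key.
        nonce := []byte("server-issued-nonce") // placeholder
        sig, _ := user.Sign(nonce)
        if err := user.Verify(nonce, sig); err != nil {
            log.Fatal("verification failed: ", err)
        }
        fmt.Println("signature verified for", pub)
    }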

NKeys may be assigned to accounts, which are securely isolated communication contexts that allow multi-tenancy. Servers may be configured with multiple accounts to support true multi-tenancy, separating topology from data flow. The decision to silo data is driven by use case rather than software limitations, and operators only need to maintain one NATS deployment. An account can span several NATS server clusters (enterprise multi-tenancy), be limited to just one cluster (data siloed by cluster), exist within a subset of a cluster, or even live in a single server.

Between accounts, specific data can be shared or exchanged through a service (request/reply) or a stream (publish/subscribe). Mutual agreement between accounts allows data flow, allowing for decentralized account management.

Of course, NATS will continue to support TLS connections, but applications can go a step further by using NKeys to sign payloads.

Connecting It All Together

These features, along with other work the NATS team is doing around scalability, resiliency at scale, and security, will allow NATS to provide secure, decentralized global connectivity. We’ll be discussing these topics in depth at KubeCon + CloudNativeCon North America in December, where we have three sessions – we’d love it if you can attend. We always enjoy a good conversation about solving hard problems like these, and we invite you to stop by the Synadia booth to visit the NATS maintainers and learn more.