rkt: The pod-native container engine launches in the CNCF

By: Jonathan Boulle, rkt project co-founder, CNCF TOC representative, and head of containers and Berlin site lead at CoreOS

Earlier this month, we announced that CoreOS had proposed adding rkt, the pod-native container engine, as a new incubated project within the Cloud Native Computing Foundation (CNCF). Today we are happy to celebrate that rkt has been formally accepted into the CNCF.

With rkt now housed in the CNCF, the rkt and container community can continue to thrive in a neutral home for collaboration. We are excited to work alongside the CNCF community to push forward the conversation around container execution in a cloud native environment, to further the development of the rkt community, and to develop interoperability between Kubernetes, OCI and containerd.

This is a historic moment: the CNCF now has the opportunity to push progress on container execution for the future of the ecosystem, under a neutral and collaborative home. The future of container execution is important for cloud native, and rkt joins the CNCF family alongside other critical projects like gRPC, Kubernetes and Prometheus.

Working with the community and next steps

rkt developers already actively collaborate on container specifications in the OCI project, and we are happy to collaborate more on the implementation side with the CNCF. We are actively working to integrate rkt with Kubernetes, the container cluster orchestration system, and together we can work to refine and solidify the shared API for how Kubernetes communicates with container runtimes. Having container engine developers work side-by-side on the testing and iteration of this API ensures a more robust solution beneficial for users in our communities.

The OCI project is hard at work on the standards side, and we expect to be able to share code implementing those image and runtime specifications. rkt closely tracks OCI development and has developers involved in the specification process. rkt features early implementation support for the formats, with the intention of being fully compliant once the critical 1.0 milestone is reached.

What can rkt users expect from this announcement? All of the rkt maintainers will continue working on the project as usual. Better still, with the help of the CNCF we can encourage new users and maintainers to contribute to and rely on rkt.

We encourage the community to continue using rkt or to try it out; you can get involved on the rkt page on GitHub or on the mailing list.

A big thank you to all the supporters of rkt over the years. We would also like to thank Brian Grant of Google for being the official sponsor of the proposal for rkt's contribution to the CNCF.

FAQ

What is rkt? A pod-native container engine

rkt, an open source project, is an application container engine developed for modern production cloud-native environments. It features a pod-native approach, a pluggable execution environment, and a well-defined surface area that makes it ideal for integration with other systems.

The core execution unit of rkt is the *pod*, a collection of one or more applications executing in a shared context (rkt’s pods are synonymous with the concept in the Kubernetes orchestration system). rkt allows users to apply different configurations (like isolation parameters) at both pod-level and at the more granular per-application level. rkt’s architecture means that each pod executes directly in the classic Unix process model (i.e. there is no central daemon), in a self-contained, isolated environment. rkt implements a modern, open, standard container format, the App Container (appc) spec, but can also execute other container images, like those created with Docker.

Since its introduction by CoreOS in December 2014, the rkt project has greatly matured and is widely used. It is available for most major Linux distributions and every rkt release builds self-contained rpm/deb packages that users can install. These packages are also available as part of the Kubernetes repository to enable testing of the rkt + Kubernetes integration. rkt also plays a central role in how Google Container Image and CoreOS Container Linux run Kubernetes.

How were rkt and containerd contributed to the CNCF?

On March 15, 2017, at the CNCF TOC meeting, CoreOS and Docker made proposals to add rkt and containerd as new projects for inclusion in the CNCF. During the meeting, we, as rkt co-founders, proposed rkt, and Michael Crosby, a containerd project lead and co-founder, proposed containerd. Those presentations were the first step; the projects then went through formal proposals to the TOC and were finally called to a vote last week. Today these projects have been accepted into the organization.

What does this mean for rkt and other projects in the CNCF?

As part of the CNCF, we believe rkt will continue to advance and grow. The donation ensures ongoing shared ecosystem collaboration around the various projects, where interoperability is key. Finding a well-respected, neutral home at the CNCF benefits the entire community by fostering interoperability with OCI, Kubernetes, and containerd. There are also a number of exciting opportunities for cross-collaboration with other projects like gRPC and Prometheus.

Container execution is a core part of cloud-native. By housing rkt under the CNCF, a neutral, respected home for projects, we see benefits including help with community building and engagement, and overall, fostering of interoperability with other cloud native projects like Kubernetes, OCI, and containerd.

How should we get involved?

The community is encouraged to keep using, or begin using rkt, and you can get involved on the rkt page on GitHub or on the mailing list. Note that this repo will be moved into a new vendor-neutral GitHub organisation over the coming weeks.

Deploying 2048 OpenShift nodes on the CNCF Cluster (Part 2)

By Jeremy Eder, Red Hat, Senior Principal Software Engineer

Overview

The Cloud Native community has been incredibly busy since our last set of scaling tests on the CNCF cluster back in August. In particular, the Kubernetes (and by extension, OpenShift) communities have been hard at work pushing scalability to entirely new levels. As significant contributors to Kubernetes, Red Hat engineers are involved in this process both upstream and in our enterprise distribution of Kubernetes, Red Hat OpenShift Container Platform.

It’s time to put the new OpenShift 3.5 release to the test again with more benchmarking on the CNCF community cluster that has been donated and built out by Intel.

For more information about what Red Hat is doing in the Kubernetes community, be sure to attend our talks at CloudNativeCon + KubeCon Europe this week.

Recap

The previous round of benchmarking on CNCF’s cluster provided us with a wealth of information, which greatly aided our work on this new release. The last series of scaling tests on the CNCF cluster consisted of using a cluster-loader utility (as demonstrated at CloudNativeCon + KubeCon in Seattle last year) to load the environment with realistic content such as Django/Wordpress, along with multi-tier apps that included databases such as PostgreSQL and MariaDB.  We did this on Red Hat OpenShift Container Platform running on a 1,000 node cluster provisioned and managed using Red Hat OpenStack Platform. We scaled the number of applications up, analyzed the state of the system while under load and folded all of the lessons learned into Kubernetes and then downstream into OpenShift.

What we built on the CNCF Cluster

This time we wanted to leverage bare metal as well.  So we built two OpenShift clusters:  one cluster of 100 nodes on bare metal and another cluster of 2,048 VM nodes on Red Hat OpenStack Platform 10.  We chose 2,048 because it’s a power of 2, and that makes engineers happy.

Goals

We kept some of our goals from last time, and added some cool new ones:

  • Deploy a 2,000+ node OpenShift cluster and research future reference designs
  • Use Overlay2 graph driver for improved density & performance, along with recent SELinux support added in kernel v4.9
  • Saturation test for OpenShift’s HAProxy-based network ingress tier
  • Persistent volume scalability and performance using Red Hat’s Container-Native Storage (CNS) product
  • Saturation test for OpenShift’s integrated container registry and CI/CD pipeline

Network Ingress/Routing Tier

The routing tier in OpenShift consists of machine(s) running HAProxy as the ingress point into the cluster.  As our tests verified, HAProxy is one of the most performant open source solutions for load balancing.  In fact, we had to re-work our load generators several times in order to push HAProxy as required.  By super popular demand from our customers, we also added SNI and TLS variants to our test suite.

Our load generator runs in a pod and its configuration is passed in via configmaps.  It queries the Kubernetes API for a list of routes and builds its list of test targets dynamically.
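
As an illustration of that discovery step (this is not the actual load generator code), the sketch below uses the Kubernetes dynamic client to list OpenShift Route objects and collect their hostnames as test targets. The route.openshift.io/v1 group is the standard OpenShift routes API; everything else here is assumed, and exact client-go signatures vary across releases.

```go
// Illustrative sketch only: discover HAProxy test targets by listing
// OpenShift routes through the Kubernetes API.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// Assume we run in a pod whose service account may list routes.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// OpenShift routes are served from the route.openshift.io/v1 API group.
	routeGVR := schema.GroupVersionResource{Group: "route.openshift.io", Version: "v1", Resource: "routes"}

	routes, err := client.Resource(routeGVR).Namespace(metav1.NamespaceAll).List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Build the list of test targets from each route's host field.
	var targets []string
	for _, r := range routes.Items {
		host, found, err := unstructured.NestedString(r.Object, "spec", "host")
		if err != nil || !found {
			continue
		}
		targets = append(targets, "https://"+host)
	}
	fmt.Printf("discovered %d targets\n", len(targets))
}
```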

In our scenario, we found that HAProxy was indeed exceptionally performant. From field conversations, we identified a trend: clients (i.e. web browsers) typically open a large number of low-throughput cluster ingress connections to HAProxy, rather than a small number of high-throughput connections. Because such connections are individually cheap, the previous default connection limit of 2,000 left plenty of headroom on commonly available CPU cores, so we have raised the default connection limit to 20,000 out of the box in OpenShift 3.5.

If you have other needs to customize the configuration for HAProxy, our networking folks have made it significantly easier — as of OpenShift 3.4, the router pod now uses a configmap, making tweaks to the config that much simpler.

As we were pushing HAProxy we decided to zoom in on a particularly representative workload mix – a combination of HTTP with keepalive and TLS terminated at the edge.  We chose this because it represents how most OpenShift production deployments are used – serving large numbers of web applications for internal and external use, with a range of security postures.

Let's take a closer look at this data, noting that since this is a throughput test with a Y-axis of requests per second, higher is better.

nbproc is the number of HAProxy processes spawned. nbproc=1 is currently the only supported value in OpenShift, but we wanted to see what, if anything, increasing nbproc bought us from a performance and scalability standpoint.

Each bar represents a different potential tweak:

  • 1p-mix-cpu*:  HAProxy nbproc=1, run on any CPU
  • 1p-mix-cpu0: HAProxy nbproc=1, run on core 0
  • 1p-mix-cpu1: HAProxy nbproc=1, run on core 1
  • 1p-mix-cpu2: HAProxy nbproc=1, run on core 2
  • 1p-mix-cpu3: HAProxy nbproc=1, run on core 3
  • 1p-mix-mc10x: HAProxy nbproc=1, run on any core, sched_migration_cost=5000000
  • 2p-mix-cpu*: HAProxy nbproc=2, run on any core
  • 4p-mix-cpu02: HAProxy nbproc=4, run on core 2

We can learn a lot from this single graph:

  • CPU affinity matters.  But why are certain cores nearly 2x faster?  This is because HAProxy is now hitting the CPU cache more often due to NUMA/PCI locality with the network adapter.
  • Increasing nbproc helps throughput.  nbproc=2 is ~2x faster than nbproc=1, BUT we get no more boost from going to 4 cores, and, in fact, nbproc=4 is slower than nbproc=2.  This is because there were 4 cores in this guest, and 4 busy HAProxy threads left no room for the OS to do its thing (like process interrupts).

In summary, we know that we can improve performance more than 20 percent from baseline with no changes other than sched_migration_cost.  What is that knob? It is a kernel tunable that weights processes when deciding if/how the kernel should load balance them amongst available cores.  By increasing it by a factor of 10, we keep HAProxy on the CPU longer, and increase our likelihood of CPU cache hits by doing so.

This is a common technique amongst the low-latency networking crowd, and is in fact recommended tuning in our Low Latency Performance Tuning Guide for RHEL7.

We’re excited about this one, and will endeavor to bring this optimization to an OpenShift install near you :-).

Look for more of this sort of tuning to be added to  the product as we’re constantly hunting opportunities.

Network Performance

In addition to providing a routing tier, OpenShift also provides an SDN.  Similar to many other container fabrics, OpenShift-SDN is based on OpenvSwitch+VXLAN.  OpenShift-SDN defaults to multitenant security as well, which is a requirement in many environments.

VXLAN is a standard overlay network technology.  Packets of any protocol on the SDN are wrapped in UDP packets, making the SDN capable of running on any public or private cloud (as well as bare metal).

Incidentally, both the ingress/routing and SDN tier of OpenShift are pluggable, so you can swap those out for vendors who have certified compatibility with OpenShift.

When using overlay networks, the encapsulation technology comes at a cost of CPU cycles to wrap/unwrap packets and is mostly visible in throughput tests.  VXLAN processing can be offloaded to many common network adapters, such as the ones in the CNCF Cluster.

Web-based workloads are mostly transactional, so the most valid microbenchmark is a ping-pong test of varying payload sizes.

Below you can see a comparison of various payload sizes and stream count.  We use a mix like this as a slimmed down version of RFC2544.

  • tcp_rr-64B-1i:  tcp, round-robin, 64byte payload, 1 instance (stream)
  • tcp_rr-64B-4i:  tcp, round-robin, 64byte payload, 4 instances (streams)
  • tcp_rr-1024B-1i:  tcp, round-robin, 1024byte payload, 1 instance (stream)
  • tcp_rr-1024B-4i:  tcp, round-robin, 1024byte payload, 4 instances (streams)
  • tcp_rr-16384B-1i:  tcp, round-robin, 16384byte payload, 1 instance (stream)
  • tcp_rr-16384B-4i:  tcp, round-robin, 16384byte payload, 4 instances (streams)

The X-axis is number of transactions per second.  For example, if the test can do 10,000 transactions per second, that means the round-trip latency is 100 microseconds.  Most studies indicate the human eye can begin to detect variations in page load latencies in the range of 100-200ms.  We’re well within that range.

Bonus network tuning:  large clusters with more than 1,000 routes or nodes require increasing the default kernel arp cache size.  We’ve increased it by a factor of 8x, and are including that tuning out of the box in OpenShift 3.5.

Overlay2, SELinux

Since Red Hat began looking into Docker several years ago, our products have defaulted to using Device Mapper for Docker’s storage graph driver.  The reasons for this are maturity, supportability, security, and POSIX compliance.  Since the release of RHEL 7.2 in early 2016, Red Hat has also supported the use of overlay as the graph driver for Docker.

Red Hat engineers have since added SELinux support for overlay to the upstream kernel as of Linux 4.9.  These changes were backported to RHEL7, and will show up in RHEL 7.4.  This set of tests on the CNCF Cluster used a candidate build of the RHEL7.4 kernel so that we could use overlay2 backend with SELinux support, at scale, under load, with a variety of applications.

Red Hat’s posture toward storage drivers has been to ensure that we have the right engineering talent in-house to provide industry-leading quality and support.  After pushing overlay into the upstream kernel, as well as extending support for SELinux, we feel that the correct approach for customers is to keep Device Mapper as the default in RHEL, while moving to change the default graph driver to overlay2 in Fedora 26.  The first Alpha of Fedora 26 will show up sometime next month.

As of RHEL 7.3, we also have support for the overlay2 backend.  The overlay filesystem has several advantages over device mapper (most importantly, page cache sharing among containers).  Support for the overlay filesystem was added to RHEL with important caveats: it is not fully POSIX compliant, and at the time its use was incompatible with SELinux (a key security/isolation technology).

That said, the density improvements gained by page cache sharing are very important for certain environments where there is significant overlap in base image content.

We constructed a test that used a single base image for all pods, and created 240 pods on a node.  The cluster-loader utility used to drive this test has a feature called a "tuningset" which we use to control the rate of creation of pods.  You can see there are 6 bumps in each line.  Each of those represents a batch of 40 pods that cluster-loader created.  Before it moves to the next batch, cluster-loader makes sure the previous batch is in running state.  In this way, we avoid crushing the API server with requests, and can examine the system's profiles at each plateau.
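
For readers who want to experiment with the same "create a batch, wait for Running, repeat" pattern, here is a rough client-go sketch. It is not the cluster-loader tool itself: the namespace, pod names and image are placeholders, and exact client-go signatures vary across releases.

```go
// Illustrative sketch of batched pod creation: create 40 pods, wait until they
// are all Running, then start the next batch (not the actual cluster-loader).
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const (
	namespace = "scale-test" // placeholder namespace dedicated to this test
	batchSize = 40
	batches   = 6 // 6 x 40 = 240 pods, as in the test described above
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	for b := 0; b < batches; b++ {
		// Create one batch of pods from the shared base image.
		for i := 0; i < batchSize; i++ {
			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("scale-pod-%d-%d", b, i)},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "app", Image: "registry.example.com/base:latest"}}, // placeholder image
				},
			}
			if _, err := cs.CoreV1().Pods(namespace).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
				panic(err)
			}
		}
		// Wait for the whole batch to be Running before continuing, so the API
		// server is not flooded and each plateau can be profiled.
		for {
			pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
			if err != nil {
				panic(err)
			}
			running := 0
			for _, p := range pods.Items {
				if p.Status.Phase == corev1.PodRunning {
					running++
				}
			}
			if running >= (b+1)*batchSize {
				break
			}
			time.Sleep(5 * time.Second)
		}
		fmt.Printf("batch %d of %d is running\n", b+1, batches)
	}
}
```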

Below are the differences between device mapper and overlay for memory consumption.  The memory savings are reasonable (again, this is a "perfect world" scenario and your mileage may vary).

The reduction in disk operations below is due to subsequent container starts leveraging the kernel’s page cache rather than having to repeatedly fetch base image content from storage:

We have found overlay2 to be very stable, and it becomes even more interesting with the addition of SELinux support.

Container Native Storage

In the early days of Kubernetes, the community identified the need for stateful containers.  To that end, Red Hat has contributed significantly to the development of persistent volume support in Kubernetes.

Depending on the infrastructure you're on, Kubernetes and OpenShift support dozens of volume providers: Fibre Channel, iSCSI, NFS, Gluster and Ceph, as well as cloud-specific storage providers such as Amazon EBS, Google persistent disks, Azure blob storage and OpenStack Cinder.  Pretty much anywhere you want to run, OpenShift can bring persistent storage to your pods.

Red Hat Container Native Storage is a Gluster-based persistent volume provider that runs on top of OpenShift in a hyper-converged manner.  That is, it is deployed in pods, scheduled like any other application running on OpenShift.  We used the NVME disks in the CNCF nodes as “bricks” for gluster to use, out of which CNS provided 1GB secure volumes to each pod running on OpenShift using “dynamic provisioning.”

If you look closely at our deployment architecture, while we have deployed CNS on top of OpenShift, we also labeled those nodes as “unschedulable” from the OpenShift standpoint, so that no other pods would run on the same node.  This helps control variability — reliable, reproducible data makes performance engineers happy :-).

We know that cloud providers limit volumes attached to each instance in the 16-128 range (often it is a sliding scale based on CPU core count).  The prevailing notion seems to be that field implementations will see numbers in the range of 5-10 per node, particularly since (based on your workload) you may hit CPU/memory/IOPS limits long before you hit PV limits.

In our scenario we wanted to verify that CNS could allocate and bind persistent volumes at a consistent rate over time, and that Heketi, the API control plane for CNS, could withstand an API load test.  We ran throughput numbers for create/delete operations, as well as API parallelism.

The graph below indicates that CNS can allocate volumes in constant time – roughly 6 seconds from submit to the PVC going into “Bound” state.  This number does not vary when CNS is deployed on bare metal or virtualized.  Not pictured here are our tests verifying that several other persistent volume providers respond in a very similar timeframe.
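
A simple way to reproduce that kind of measurement is to submit a claim against the CNS storage class and time how long it takes to reach Bound. The client-go sketch below does exactly that; the storage class name, namespace and claim name are placeholders, and field names and signatures vary somewhat across client-go releases.

```go
// Illustrative sketch: time how long a dynamically provisioned PVC takes to
// go from submitted to Bound. Storage class and names are placeholders.
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	sc := "glusterfs-storage" // placeholder storage class name
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "cns-test-claim"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &sc,
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("1Gi")},
			},
		},
	}

	start := time.Now()
	if _, err := cs.CoreV1().PersistentVolumeClaims("default").Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// Poll until the claim is Bound and report the elapsed time.
	for {
		c, err := cs.CoreV1().PersistentVolumeClaims("default").Get(ctx, "cns-test-claim", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		if c.Status.Phase == corev1.ClaimBound {
			fmt.Printf("claim bound in %s\n", time.Since(start))
			return
		}
		time.Sleep(500 * time.Millisecond)
	}
}
```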

OpenStack and Ceph

As we had approximately 300 physical machines for this set of tests, and a goal of hitting the "engineering feng shui" value of 2,048 nodes, we first had to deploy Red Hat OpenStack Platform 10, and then build the second OpenShift environment on top.  Several things were unique to this deployment of OpenStack:

  • We used the new Composable roles feature to deploy OpenStack
    • 3 Controllers
    • 2 Networker nodes
    • A bare metal role for OpenShift
  • Multiple Heat stacks
    • Bare metal Stack
    • OpenStack Stack
  • Ceph was also deployed through Director.  Ceph’s role in this environment is to provide boot-from-volume service for our VMs (via Cinder).

We deployed a 9-node Ceph cluster on the CNCF “Storage” nodes, which include (2) SSDs and (10) nearline SAS disks.  We know from our counterparts in the Ceph team that Ceph performs significantly better when deployed with write-journals on SSDs.  Based on the CNCF storage node hardware, that meant creating two write-journals on the SSDs and allocating 5 of the spinning disks to each SSD.  In all, we had 90 Ceph OSDs, equating to 158TB of available disk space.

From a previous "teachable moment," we learned that when importing a KVM image into Glance, if it is first converted to "raw" format, creating instances from that image takes a snapshot/boot-from-volume approach.  The net result is that for each VM we create, we end up with approximately 700MB of disk space consumed.  For the 2,048 node environment, the VM pool in Ceph took only approximately 1.5TB of disk space.  Compare this to the last (internal) test, when 1,000 VMs took nearly 22TB.

In addition to reduced I/O to create VMs and reduced disk space utilization, booting from snapshots on Ceph was incredibly fast.  We were able to deploy all 2,048 guests in approximately 15 minutes.  This was really cool to watch!

Bonus deployment optimization:  use image-based deploys!  Whether it’s on OpenStack, or any other infrastructure public or private, image-based deploys reduce much of what would otherwise be repetitive tasks, and can reduce the burden on your infrastructure significantly.  

Bake in as much as you can.  Review our (unsupported) Ansible-based image provisioner for a head start.

Improved Documentation for Performance and Scale

Phew! That was a lot of work. How do we ensure that the community and customers benefit?

First, we push absolutely everything upstream.  Next, we bake as many of the tunings, best practices and config optimizations into the product as possible, and we document everything else.

Along with OpenShift 3.5, the performance and scale team at Red Hat will deliver a dedicated  Scaling and Performance Guide within the official product documentation.  This provides a consistently updated section of documentation to replace our previous whitepaper, and a single location for all performance and scalability-related advice and best practices.

Summary

The CNCF Cluster is an extremely valuable asset for the open source community.  This 2nd round of benchmarking on CNCF’s cluster has once again provided us with a wealth of information to incorporate into upcoming releases. The Red Hat team hopes that the insights gained from our work will provide benefit for the many Cloud Native communities upon which this work was built:

  • Kubernetes
  • Docker
  • OpenStack
  • Ceph
  • Kernel
  • OpenvSwitch
  • Golang
  • Gluster
  • And many more!

Our team also wishes to thank the CNCF and Intel for making this valuable community resource available.  We look forward to tackling the next levels of scalability and performance along with the community!

Want to know what Red Hat’s Performance and Scale Engineering team is working on next? Check out our Trello board. Oh, and we’re hiring Principal-level engineers!

Tell Us Your Opinion About Diversity in Tech at Google Cloud Next 2017

Author: Leah Petersen, Systems Engineer, Samsung CNCT

Contributed blog from CNCF Platinum member Samsung

“Tell me your opinion about diversity in tech.”

…not something you expect to be asked at a technology conference booth. This year at the Samsung Cloud Native Computing Team sponsor booth we decided to ask Google Cloud Next attendees their opinion about a problem in our industry – the lack of diversity. We could immediately see a trend – some people nervously shied away from us, laughing it off, and others marched straight up to us and began speaking passionately about the subject.

We chose this unique approach to interacting with conference goers for a few reasons. As big supporters of diversity in tech, we wanted to gather ideas. We asked if their companies were proactively taking any measures to increase the diversity at their company or retain diverse individuals. We also wanted to get people thinking about this issue, since the cloud native computing space is relatively young and we see a great benefit in assembling a diverse group of people to move it forward. Finally, as information-bombarded, weary attendees navigated through the booth space, we wanted to offer something beyond yet another sales pitch.

The three-day conference turned up a lot of interesting ideas and new perspectives. A favorite was how one person defined lack of diversity by describing a group of entirely "Western-educated males" making decisions for a company. After defining what a diverse workforce does or doesn't look like, lots of people talked about what their companies were doing to take action.

Many companies were involved in youth programs, coding camps, university outreach, and mentoring programs. Salesforce hired a Chief Equality Officer to put words into action. One CTO from a Singapore-based company told us that women in Asian countries more commonly choose a STEM education track, and that 65% of his engineering team is female. Another company removed names and university names from resumes to address interviewers' implicit biases. Most of the women we talked to simply thanked us for bringing up this issue and described how isolating it can be to be the lone female on a team.

Another common, less positive story was that a company had tried a diversity program but gave up. This scenario underscores an unavoidable truth: bringing diversity into tech isn't easy. Encouraging children to choose STEM careers is a long game that will bring change, but bringing diversity into the workplace right now is another story.

There's a diverse, willing, and intelligent pool of workers, but they need training. People of different backgrounds, who weren't able to get the classic Western STEM education, need opportunities to transition into tech. As one man pointed out, adult training programs are the answer, but companies need to do more than just offer training. Finding the time and energy to break out of the demanding lifestyle of a single parent or low-income adult is near impossible.

Apprenticeships with financial support are the answer to getting a mature, diverse workforce.

The week’s undeniable message from everyone was: we NEED diversity and more specifically we need diversity in leadership positions. We need more points of view and we need a better representation of our society in the tech industry.

Diverse teams are more adaptable overall and build better products that serve more people.

FOSDEM 2017 Recap: Monitoring and Cloud Devroom & Linux Containers and Microservices Devroom Sponsored by CNCF

Each year, FOSDEM attracts more than 8,000 developers. As Josh Berkus, the Project Atomic community lead at Red Hat, puts it, the event is "a great way to reach a large number of open source geeks, community members and potential contributors." Richard Hartmann, project director and system architect at SpaceNet AG, even dubbed it "the largest of its kind in Europe and, most likely, the world."

To showcase some of the cloud native space's brightest insights and engage with our ever-growing community, we sponsored the Monitoring and Cloud Devroom and the Linux Containers and Microservices Devroom.

On Sunday, the Linux Containers and Microservices Devroom room was overflowing with 200+ infrastructure hackers, devops admins and more – all gathered to learn more about new containerized application stacks, from the practical to the theoretical. The room was at capacity throughout the day. According to Diane Mueller-Klingspor, director of community development for Red Hat OpenShift, it was “so popular that very few relinquished their seats between talks. When people did leave, the entire row just shifted to fill the seat if it was in the middle, leaving an open seat on the end of the row for newcomers. We called this the ‘Open Shift’ which the audience got a good kick out of.”

That same day the Monitoring and Cloud Devroom kicked off with an equally eager group of developers. The Devroom, which was the largest FOSDEM has to offer, was also packed and demonstrated that as the world moves to cloud- and microservice-based architectures, monitoring is super important to aid observability and diagnose problems.

Here are more highlights from our community ambassadors on why these “grassroots” gatherings foster such excitement in the cloud native community:

Mueller-Klingspor: “Attendees of these Devrooms weren’t just hackers or people hacking on Kube, they were downright serious about the topic of microservices and containers. These developers were looking for information to help empower them to get their organizations up and running using microservices, containers and, of course, orchestrating it all with Kubernetes. Unlike the FOSDEM Devrooms I’ve attended in the past, the Linux Containers and Microservices Devroom was not filled with dilettantes; everyone here had already committed to the cloud native mindset and was looking for deep technical information to help them get the job done.”

Berkus: "The technical content for the day was really good. In the end, we ended up with a program where most of the talks effectively built on each other, really rewarding the attendees who sat through the whole day (and there were more than a few). From my perspective, the top takeaways for developers who dropped into the Containers and Microservices Devroom were:

  • Kubernetes is the main cloud native orchestration platform;
  • Communications between containers is moving from REST to gRPC in upstream development; and
  • You can containerize Java applications too, and there are some real benefits to that.”

Chris Aniszczyk, COO of CNCF: “The presentations were especially amazing for those new to cloud native monitoring. We kicked off with a talk about the history of monitoring and then transitioned into the general categories of metrics, logs, profiling and distributed tracing, along with dives into how each of these is important. My main takeaway from the Monitoring and Cloud Devroom was how critical monitoring is becoming as companies scale out their operations in the cloud. We heard from Booking.com and Wikimedia about some of the challenges they had with monitoring at scale. I was also thrilled to hear the Prometheus project woven into almost every monitoring talk in the devroom; it’s becoming a fantastic open source choice for cloud native monitoring.”

Hartmann: “From my perspective, the main theme of the Monitoring and Cloud Devroom was to help people lead their teams to cloud-native technology and how to enact said change in a way that their teams want to play along. One of the biggest takeaways for developers was to make sure, first and foremost, to focus on user-visible services. FOSDEM is the world’s single largest FLOSS developer conference with dozens of satellite mini-conferences, team meetings and more. Sponsoring these Devrooms helps to support these efforts, gives back to the community and ensures that CNCF gets better acquainted with traditional developers.”

In case you missed FOSDEM, a full list of speakers and their topics, with links to the session videos, is available on the FOSDEM website.

Linkerd Celebrates One Year with One Hundred Billion Production Requests

By William Morgan, Linkerd co-creator and Buoyant co-founder

We’re happy to announce that, one year after version 0.1.0 was released, Linkerd has processed over 100 billion production requests in companies around the world. Happy birthday, Linkerd! Let’s take a look at all that we’ve accomplished over the past year.

We released Linkerd into the wild in February 2016, with nothing more than a couple commits, a few early contributors, and some very big dreams. Fast-forward by one year, and Linkerd has already grown to 30+ releases, 800+ commits, 1500+ stars, 30+ contributors, 600+ people in the Linkerd Slack, and 30-odd companies around the globe using it in production (or on the path to production)—including folks like Monzo, Zooz, NextVR, Houghton Mifflin Harcourt, Olark and Douban.

Not to mention, of course, that Linkerd is now officially a CNCF project, alongside Kubernetes, Prometheus, gRPC, and a couple of other amazing projects that are defining the very landscape of cloud native infrastructure.

To the many contributors, users, and community members—thank you for helping us make Linkerd so successful this past year. (And thank you for privately sharing your production request volumes and deployment dates, which allow us to make big claims like the one above!) We couldn’t have asked for a better community. We’d especially like to thank Oliver Beattie, Jonathan Bennet, Abdel Dridi, Borys Pierov, Fanta Gizaw, Leo Liang, Mark Eijsermans, Don Petersen, and Oleksandr Berezianskyi for their contributions to the project and the community.

You can read the full press release here.

Finally, here’s a fun vanity metric graph, courtesy of Tim Qian’s excellent Github star history plotter:

Linkerd GitHub star history

Here’s to another great year for Linkerd!

* Blog originally posted on https://blog.buoyant.io/2017/03/07/linkerd-one-hundred-billion-production-requests/

Cloud Native Computing Foundation Becomes Steward of Service Naming And Discovery Project CoreDNS

The CNCF’s Technical Oversight Committee (TOC) recently voted CoreDNS into the CNCF portfolio of projects. CoreDNS, a fast, flexible and modern DNS server, joins a growing number of projects integral to the adoption of cloud native computing. CoreDNS was voted in as an inception project – see explanation of maturity levels and graduation criteria here.

"Kubernetes and other technology projects use DNS for service discovery, so we are a key component to the implementation of cloud native architectures. Additionally, CoreDNS' design allows easy extension of DNS functionality to various container stacks," said John Belamaric, CoreDNS core maintainer and distinguished architect at Infoblox. "As a CNCF project, we are looking forward to the Foundation helping to fuel developer contributions and user growth. Increased integration with other CNCF projects and cloud native technologies and priority usage of its Community Cluster are also important to us."

"A focused, lightweight DNS server with a microservice philosophy guiding its internal design, CoreDNS is a critical component of the cloud native architecture and an important project for CNCF," said Jonathan Boulle, CNCF TOC representative and head of containers and Berlin site lead at CoreOS. The TOC has made it easier to find, submit and follow the progress of project proposals. By making the process more streamlined, we have been able to add several new projects over the past several months to speed the development of open source software stacks that enable cloud native deployments.

Here’s more about the young, but growing distributed systems-friendly DNS project started by Miek Gieben, a Google Site Reliability Engineer.

CoreDNS

Founded in March 2016, CoreDNS is the successor to the popular SkyDNS server. It is built as a server plugin for the widely-used Caddy webserver and uses the same model: it chains middleware.

The SkyDNS architecture did not lend itself to the flexible world of cloud deployments (an organically grown code base, monitoring, caching, etc.). Additionally, other DNS servers (BIND9, NSD, Knot) may be good at serving DNS, but are not as flexible and do not support etcd as a backend, for instance.

The CoreDNS creators built on the concept of SkyDNS and took into account its limitations, and the limitations of other DNS servers, to create a generic DNS server that can talk to multiple backends (etcd, Consul, Kubernetes, etc.). CoreDNS aims to be a fast and flexible DNS server, allowing users to access and use their DNS data however they please.

CoreDNS has been extended to operate directly with Kubernetes to access the service data. This “middleware” implementation for CoreDNS provides the same client-facing behavior as KubeDNS. The pipeline-based design of CoreDNS allows easy extension to use any container orchestrator as a DNS data source.

“In creating CoreDNS, our goal is to become the cloud DNS server allowing others like Docker, Hashicorp, and Weaveworks to use this technology as a library,” said Gieben. “Additionally, CoreDNS is useful outside cloud environments as well, so whether you use a private, public, on-premises or multi-cloud environment, you can use CoreDNS.” With the future addition of a pluggable policy engine, CoreDNS will extend its capabilities to sophisticated security and load balancing use cases.

Architecture

  • Chains DNS middleware, each “feature” is contained in a middleware, for instance:
    • monitoring (Prometheus),
    • logging,
    • file based DNS,
    • etcd and k8s backends.
  • Only load the middleware(s) you need
  • Each middleware is self-contained, making it easy to add new behavior (see the sketch below)

Figure 1: CoreDNS Architecture
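
As referenced in the architecture list above, here is a minimal sketch of a do-nothing CoreDNS handler in Go, showing how each link in the chain does its work and hands the query to the next handler. Note that CoreDNS has since renamed "middleware" to "plugin"; the interface and import paths below are the current ones and are given for illustration only.

```go
// Minimal sketch of a CoreDNS handler: do (nothing) and pass the query on.
package example

import (
	"context"

	"github.com/coredns/coredns/plugin"
	"github.com/miekg/dns"
)

// Example is one link in the handler chain; Next is the rest of the chain.
type Example struct {
	Next plugin.Handler
}

// ServeDNS implements the plugin.Handler interface. A real handler would
// answer, rewrite, cache or log the query here before (or instead of)
// delegating to the next handler.
func (e Example) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
	return plugin.NextOrFailure(e.Name(), e.Next, ctx, w, r)
}

// Name is used in logging and metrics.
func (e Example) Name() string { return "example" }
```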

Currently CoreDNS is able to:

  • Serve zone data from a file; both DNSSEC (NSEC only) and DNS are supported (middleware/file).
  • Retrieve zone data from primaries, i.e., act as a secondary server (AXFR only) (middleware/secondary).
  • Sign zone data on-the-fly (middleware/dnssec).
  • Load balancing of responses (middleware/loadbalance).
  • Allow for zone transfers, i.e., act as a primary server (middleware/file).
  • Caching (middleware/cache).
  • Health checking (middleware/health).
  • Use etcd as a backend, i.e., a replacement for SkyDNS (middleware/etcd).
  • Use k8s (kubernetes) as a backend (middleware/kubernetes).
  • Serve as a proxy to forward queries to some other (recursive) nameserver, using a variety of protocols like DNS, HTTPS/JSON and gRPC (middleware/proxy).
  • Rewrite queries (qtype, qclass and qname) (middleware/rewrite).
  • Provide metrics (by using Prometheus) (middleware/metrics).
  • Provide Logging (middleware/log).
  • Support the CH class: version.bind and friends (middleware/chaos).
  • Profiling support (middleware/pprof)
  • Integrate with OpenTracing-based distributed tracing solutions (middleware/trace)

For more on CoreDNS, check out the CNCF project proposal here and follow @corednsio to stay in touch.

“CoreDNS provides essential naming services efficiently and integrates effectively with many other cloud native (CNCF) projects,” said Chris Aniszczyk, COO, CNCF.  “Furthermore, with a highly modular and extensible framework, CoreDNS is a compelling option for Kubernetes service discovery.”

As a CNCF inception project, CoreDNS will come to a vote with the TOC every 12 months. A supermajority vote is required to renew a project at the inception stage for another 12 months or to move it to the incubating or graduated stage. If there is not a supermajority for any of these options, the project is not renewed.

To learn more, check out these in-depth blogs on adding middleware to CoreDNS and using CoreDNS for Kubernetes service discovery.

Discuss this post on Hacker News!

Slack Gives Back to K8s and CNCF Community

Slack is giving back to the Kubernetes and CNCF communities with free access as part of its not-for-profit program, which it has now extended to include 501(c)(6) organizations like CNCF.

Like many projects, companies and foundations, CNCF is a very active Slack user, leveraging the tool to communicate with members, ambassadors, and our larger cloud native community. Additionally, we have Slack channels for many of our projects like OpenTracing and Prometheus, while Kubernetes has its own channel with 8,062 registered users.

We're excited about this move from Slack – and certainly appreciate improved archiving at no extra cost. Even more importantly, free Slack frees up CNCF to invest more in the k8s community, funding things like better documentation, continuous integration and scholarships to attend community events.

Insert your favorite Slack emoji here: !!!

Thank you @SlackHQ!

Cloud Native Computing Foundation To Host gRPC from Google

CNCF is the new home for gRPC and its existing ecosystem projects (https://github.com/grpc and https://github.com/grpc-ecosystem). The sixth project voted in by CNCF’s Technical Oversight Committee (TOC), gRPC is a modern, open source, high performance remote procedure call (RPC) framework originally developed by Google that can run in any environment.  

Designed to make connecting and operating distributed systems easy and efficient, gRPC builds on underlying technologies and concepts that Google has long used internally. The current implementation is being used in several of Google's cloud products and externally facing APIs.

Outside of Google, there’s a growing number of public users. According to The New Stack: “Within the first year of its launch, gRPC was adopted by CoreOS, Lyft, Netflix, Square, and Cockroach Labs among others. Etcd by CoreOS, a distributed key/value store, uses gRPC for peer to peer communication and saw huge performance improvements. Telecom companies such as Cisco, Juniper, and Arista are using gRPC for streaming the telemetry data and network configuration from their networking devices.”

Developers often work with multiple languages, frameworks and technologies, as well as multiple first- and third-party services. This can make it difficult to define and enforce service contracts and to have consistency across cross-cutting features such as authentication and authorization, health checking, load balancing, logging, monitoring and tracing, all while maintaining the efficiency of teams and underlying resources. gRPC can provide one uniform horizontal layer where service developers don't have to think about these issues and can code in their native language. (Read this Container Solutions blog for an introduction to gRPC.)
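
To give a flavor of the programming model, here is a minimal Go sketch that starts a gRPC server and issues an RPC against it. It uses the health-checking service that ships with grpc-go so no generated protobuf code is needed; the address is a placeholder, and a real service would define its own .proto API and use TLS credentials rather than an insecure connection.

```go
// Minimal gRPC sketch in Go: serve the standard health service and call it.
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Server side: register the standard health service and start serving.
	lis, err := net.Listen("tcp", "127.0.0.1:50051") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go func() {
		if err := srv.Serve(lis); err != nil {
			log.Fatal(err)
		}
	}()

	// Client side: dial the server and issue a health-check RPC.
	conn, err := grpc.Dial("127.0.0.1:50051", grpc.WithInsecure()) // use TLS credentials in real deployments
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("health status: %s", resp.GetStatus())
}
```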

“For large-scale Internet companies and high throughput storage and streaming systems where performance really matters, gRPC can shine. In addition, having a uniform framework to connect and operate cross-language services where difficult concepts like circuit breaking, flow control, authentication, and tracing are taken care of can be very useful,” said Varun Talwar, product manager at Google in charge of gRPC.

In the same New Stack article, Janakiram MSV wrote, “When compared to REST+JSON combination, gRPC offers better performance. It heavily promotes the use of SSL/TLS to authenticate the server and to encrypt all the data exchanged between the client and the server.”

gRPC aims to be the protocol that becomes a next-generation standard for server-to-server communications in an age of cloud microservices; InfoWorld recently reported on the 1.0 release and its ease of use, API stability, and breadth of support.

"We are excited to have gRPC be the sixth project voted in by CNCF's Technical Oversight Committee (TOC). Being a part of the CNCF can help bolster the gRPC community and tap into new use cases with microservices, cloud, mobile and IoT," continued Talwar.

Just a little more than one and a half years old, the project already has 12K GitHub stars (combined), >2,500 forks (combined) and >100 contributors.

“As the neutral home of Kubernetes and four additional projects in the cloud native technology space (Fluentd, Linkerd, Prometheus, and OpenTracing), having gRPC join CNCF will attract more developers to collaborate, contribute and grow into committers. We also look forward to bringing gRPC into the CNCF family by hosting a gRPC project update at our upcoming CloudNativeCon EU event,” said Chris Aniszczyk, COO, CNCF.

The move into CNCF comes with a change of license from the BSD-3 license plus a patent grant to Apache v2. Read "Why CNCF Recommends ASLv2" to understand why CNCF believes this is the best software license for open source projects today.

"ASL v2 is a well-known and familiar license for companies, and thus is more suitable for our next wave of adoption while likely requiring fewer legal reviews from new potential gRPC users and contributors," said Dan Kohn, Executive Director, CNCF.

With a strong focus on working with existing stacks, gRPC's pluggable architecture allows for integrations with service discovery systems (Consul, ZooKeeper, etcd, the Kubernetes API), tracing and metrics systems (Prometheus, Zipkin, OpenTracing) and proxies (nghttp2, linkerd, Envoy). gRPC also encapsulates authentication and provides support for TLS mutual auth, or you can plug in your own auth model (like OAuth).

Being a part of the broader CNCF ecosystem will encourage future technical collaborations, according to Talwar.

Interested in hearing more from core gRPC developers? Look for a future post from Talwar on why the project joined CNCF. He’ll also talk about how the project hopes to gain new industry friends and partners to pave the way for becoming the de-facto RPC framework for the industry to adopt and build on top of. This Google Cloud Platform Podcast on gRPC with Talwar from last spring is also worth a listen.

Discuss this post on Hacker News!

Prometheus User Profile: How DigitalOcean Uses Prometheus

DigitalOcean – a CNCF member and devoted Prometheus user – is approaching one million registered users with more than 40,000 active teams. With workloads becoming more complex, it is focused on delivering the tools and performance that are required to seamlessly deploy, scale and manage any sized application.

However, before transitioning to Prometheus, the team's experience collecting metrics from all of DigitalOcean's physical servers wasn't so great. Many of the company's teams were disappointed with the previous monitoring system for a myriad of reasons, with more than one team expressing frustration with the offering's query language and the visualization tools available.

This led to DigitalOcean’s transition to Prometheus.

In the blog below, originally published by Prometheus, Ian Hansen, who works on DigitalOcean's platform metrics team, talks about how they use Prometheus.

Going to CloudNativeCon + KubeCon Berlin? Ian's DigitalOcean colleague, Joonas Bergius, will be presenting "Kubernetes at DigitalOcean: Building a Platform for the Future [B]" on March 29th. Make sure to catch his session!

Interview with DigitalOcean

Posted at: September 14, 2016 by Brian Brazil

Next in our series of interviews with users of Prometheus, DigitalOcean talks about how they use Prometheus. Carlos Amedee also talked about the social aspects of the rollout at PromCon 2016.

Can you tell us about yourself and what DigitalOcean does?

My name is Ian Hansen and I work on the platform metrics team. DigitalOcean provides simple cloud computing. To date, we’ve created 20 million Droplets (SSD cloud servers) across 13 regions. We also recently released a new Block Storage product.

What was your pre-Prometheus monitoring experience?

Before Prometheus, we were running Graphite and OpenTSDB. Graphite was used for smaller-scale applications and OpenTSDB was used for collecting metrics from all of our physical servers via Collectd. Nagios would pull these databases to trigger alerts. We do still use Graphite but we no longer run OpenTSDB.

Why did you decide to look at Prometheus?

I was frustrated with OpenTSDB because I was responsible for keeping the cluster online, but found it difficult to guard against metric storms. Sometimes a team would launch a new (very chatty) service that would impact the total capacity of the cluster and hurt my SLAs.

We were able to blacklist/whitelist new metrics coming in to OpenTSDB, but didn't have a great way to guard against chatty services except for organizational process (which was hard to change/enforce). Other teams were frustrated with the query language and the visualization tools available at the time. I was chatting with Julius Volz about push vs. pull metric systems and was sold on trying Prometheus when I saw that I would really be in control of my SLA, because I get to determine what I'm pulling and how frequently. Plus, I really, really liked the query language.

How did you transition?

We were gathering metrics via Collectd sending to OpenTSDB. Installing the Node Exporter in parallel with our already running Collectd setup allowed us to start experimenting with Prometheus. We also created a custom exporter to expose Droplet metrics. Soon, we had feature parity with our OpenTSDB service and started turning off Collectd and then turned off the OpenTSDB cluster.

People really liked Prometheus and the visualization tools that came with it.

Suddenly, my small metrics team had a backlog that we couldn’t get to fast enough to make people happy, and instead of providing and maintaining Prometheus for people’s services, we looked at creating tooling to make it as easy as possible for other teams to run their own Prometheus servers and to also run the common exporters we use at the company.

Some teams have started using Alertmanager, but we still have a concept of pulling Prometheus from our existing monitoring tools.

What improvements have you seen since switching?

We’ve improved our insights on hypervisor machines. The data we could get out of Collectd and Node Exporter is about the same, but it’s much easier for our team of golang developers to create a new custom exporter that exposes data specific to the services we run on each hypervisor.

We’re exposing better application metrics. It’s easier to learn and teach how to create a Prometheus metric that can be aggregated correctly later. With Graphite it’s easy to create a metric that can’t be aggregated in a certain way later because the dot-separated-name wasn’t structured right.

Creating alerts is much quicker and simpler than what we had before, plus in a language that is familiar. This has empowered teams to create better alerting for the services they know and understand because they can iterate quickly.
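
To illustrate how small a custom exporter like the ones Ian mentions can be, here is a minimal client_golang sketch that exposes a hypothetical per-droplet gauge on a /metrics endpoint. The metric name, label and port are made up for the example; this is not DigitalOcean's actual exporter.

```go
// Minimal custom Prometheus exporter sketch. The metric and its label are
// hypothetical examples, not DigitalOcean's real hypervisor exporter.
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var cpuSteal = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hypervisor_droplet_cpu_steal_seconds",
		Help: "Hypothetical per-droplet CPU steal time observed on the hypervisor.",
	},
	// Labels keep the data aggregatable later, unlike dot-separated Graphite names.
	[]string{"droplet_id"},
)

func init() {
	prometheus.MustRegister(cpuSteal)
}

func collect() {
	// A real exporter would read values from the hypervisor; we fake them here.
	for {
		cpuSteal.WithLabelValues("12345").Set(rand.Float64())
		time.Sleep(10 * time.Second)
	}
}

func main() {
	go collect()
	// Prometheus scrapes this endpoint on its own schedule (the pull model).
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9101", nil))
}
```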

What do you think the future holds for DigitalOcean and Prometheus?

We’re continuing to look at how to make collecting metrics as easy as possible for teams at DigitalOcean. Right now teams are running their own Prometheus servers for the things they care about, which allowed us to gain observability we otherwise wouldn’t have had as quickly. But, not every team should have to know how to run Prometheus. We’re looking at what we can do to make Prometheus as automatic as possible so that teams can just concentrate on what queries and alerts they want on their services and databases.

We also created Vulcan so that we have long-term data storage, while retaining the Prometheus Query Language that we have built tooling around and trained people how to use.

Measuring the Popularity of Kubernetes Using BigQuery

By Dan Kohn, CNCF Executive Director, @dankohn1

As the executive director of CNCF, I’m proud to host Kubernetes, which is one of the highest development velocity projects in the history of open source. I know this because I can do a web search and see… quite a few people being quoted saying that, but does the data support this claim?

This blog post works through the process of investigating that question. CNCF licenses a dashboard from Bitergia, but it’s more useful for project trends over time than comparing to other open source projects. Project velocity matters because developers, enterprises and startups are more interested in working with a technology that others are adopting, so that they can leverage the investments of their peers. So, how does Kubernetes compare to the other 53 million GitHub repos?

By way of excellent blog posts from Felipe Hoffa and Jess Frazelle (the latter a Kubernetes contributor and speaker at our upcoming CloudNativeCon/KubeCon Berlin), I got started on using BigQuery to analyze the public GitHub data set. You can re-run any of the gists below by creating a free BigQuery account. All of the data below is for 2016, though you can easily run against different time periods.

My first attempt found that the project with the highest commit rate on GitHub is… KenanSulayman/heartbeat, a repo with 9 stars which appears to be an hourly update from a Tor exit node. Well, that’s kind of a cool use of GitHub, but not really what I’m looking for. I learned from Krihelinator (a thoughtful though arbitrary new metric that currently ranks Kubernetes #4, right in front of Linux), that some people use GitHub as a backup service. So, rerunning with a filter of more than 10 contributors puts Kubernetes at #29 based on its 8,703 commits. For reference, that’s almost exactly one commit an hour, around the clock, for the entire year.

That metric also leaves off torvalds/linux, because the kernel's git tree is mirrored to GitHub, but that mirroring does not generate GitHub events that are stored in that data set. Instead, there is a separate BigQuery data set that just measures commits. When I run a query to show the projects with the most commits, I unhelpfully get dozens of forks of Linux and also many forks of a git learning tool. Here is a better query that manually checks for committers, authors, and commits of 8 popular projects, and shows Kubernetes as #2, with about 1/5th the authors and commits of Linux.[1]

To see how many unique committers Kubernetes had in 2016, I used this query, which showed that there were… 59, because Kubernetes uses a GitHub robot to do the vast majority of the actual commits. The correct query requires looking inside the commits at the actual authors, and when ranked by unique authors, Kubernetes comes in at #10 with 868.

Updating Hoffa’s query about issues opened to include data for all of 2016 (while still ignoring robot comments), Kubernetes remains #1 with 42,703, with comments from 3,077 different developers. Frazelle’s analysis of pull requests (updated for all of 2016 and to require more than 10 contributors to avoid backup projects) now shows Kubernetes at #2 with 10,909, just behind a Java intranet portal. (Rather than GitHub issues and pull requests, Linux uses its own email-based workflow described in a talk last year by stable kernel maintainer Greg Kroah-Hartman, so it doesn’t show up in these comparisons.)
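
If you would rather run these comparisons programmatically than in the BigQuery console, the Go client works well. The sketch below counts 2016 issue-comment events per repository, assuming the public githubarchive.year.2016 table and its type and repo.name columns; the project ID is a placeholder, and this is not one of the exact queries linked above.

```go
// Sketch: count 2016 GitHub issue-comment events per repository with the
// BigQuery Go client. Project ID and the exact query are illustrative.
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-gcp-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	q := client.Query("SELECT repo.name AS repo, COUNT(*) AS comments " +
		"FROM `githubarchive.year.2016` " +
		"WHERE type = 'IssueCommentEvent' " +
		"GROUP BY repo ORDER BY comments DESC LIMIT 10")

	it, err := q.Read(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for {
		var row struct {
			Repo     string `bigquery:"repo"`
			Comments int64  `bigquery:"comments"`
		}
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-50s %d\n", row.Repo, row.Comments)
	}
}
```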

Kubernetes 2016 Rankings

| Measure | Ranking |
| --- | --- |
| Krihelinator | 4 |
| Commits | 29 |
| Authors | 10 |
| Issue Comments | 1 |
| Pull Requests | 2 |

In conclusion, I’m not sure that any of these metrics represents the definitive one. You can pick your preferred statistic, such as that Kubernetes is in the top 0.00006% of the projects on GitHub. I prefer to just think of it as one of the fastest moving projects in the history of open source.

What’s your preferred metric(s)? Please let me know at @dankohn1 or in the Hacker News comments, and I’m happy to provide t-shirts in exchange for cool visualizations.


[1] OpenHub incorrectly showed more than 3x as many authors and 5x the commits for Linux in 2016 as the BigQuery data set. I confirmed this is an error with Linux stable kernel maintainer Greg Kroah-Hartman (who checked the actual git results) and reported it to OpenHub. They've since fixed the bug.