

Announcing “CNCF Community Awards” – Winners to be Recognized at CloudNativeCon


Nominations open today for the CNCF Community Awards to honor those who have made the greatest impact over the last year in the cloud native space. Within the fast-growing Kubernetes and Prometheus communities, there’s an incredible amount of talent, hard work and commitment worthy of recognition. Rising open source communities require champions to lead the charge.

So, if you know a great advocate or project contributor in the cloud native space, click here to nominate the next CNCF Top Ambassador or CNCF Top Committer today!

The CNCF Community Awards will reward the community members, developers and advocates working hardest to advance born-in-the-cloud technologies through these two awards:

  • CNCF Top Ambassador: A champion for the cloud native space, this individual helps spread awareness of the CNCF (and its incubated projects). The CNCF Ambassador leverages multiple platforms, both online and through speaking engagements, to drive interest and excitement around the project.

  • CNCF Top Committer: This award recognizes excellence in technical contributions to CNCF and its incubated projects. The CNCF Top Committer has made key commits to the project and, more importantly, contributes commits that benefit the project neutrally, as a whole (versus commits that primarily benefit the committer’s employer/sponsor).

CNCF Member Session: Vipin Jain of Cisco on “Leveraging Enterprise Networks for Containers”



At the recent May 2016 CoreOS Fest in San Francisco, Vipin Jain of Cisco gave a persuasive presentation to attendees on “Leveraging Enterprise Networks for Containers.” The distinguished engineer covered everything from the traditional enterprise data center “status quo” to the journey toward new application environments, leveraging native connectivity in today’s networks for better visibility, and the growing organizational desire to move toward an agile, microservices-based architecture. Find a full video of his session here.

CNCF Member Session: Tim Hockin of Google on “Building Longevity into Kubernetes”


This year’s CoreOS Fest in San Francisco was jam-packed with exciting sessions. Tim Hockin of Google gave one of the event’s most informative sessions, titled “Building Longevity into Kubernetes.” The veteran Kubernetes engineer discussed the importance of building modularity, extensibility, and pluggability into the famed container cluster. Drawing on his experience and perspective on what does and does not work for open source containers, Hockin also covered how to make enterprise systems more adaptable. Find a full video of his session here.

Why Prometheus Developers Love Open Source


“Open source is a lot of people working hard to build the best software they can to give it to you for free,” said Fabian Reinartz, engineer at CoreOS and core Prometheus developer.

We asked the Prometheus developers why they work so hard to give software away. Their responses showed a sense of community, sharing, thirst for knowledge, desire to be challenged, and deep love of open source.

Open Source Democratizes Knowledge

“I have always felt strongly about the importance of open source software, open culture, and democratization of knowledge and power in our society in general. Contributing to open source is one way to further that goal,” said Julius Volz, infrastructure engineer and co-creator of Prometheus. “The particular reason why we initially started Prometheus was because no other open-source monitoring systems met our needs at SoundCloud. We created it as an open-source project from the onset so that everyone else in the world would be able to benefit from it and contribute back as well. This motivation remains until today, where countless organizations around the globe are now using Prometheus and helping us to improve it. I believe that this is the model that works best in the long term, especially for infrastructure software of this kind.”

Open Source Challenges a Person

“Writing software in the open source world adds a lot of challenging dimensions to a project. One does not solve problems against a single organisation’s requirements, but against hundreds. This forces you to think about which features you actually want to build and how to solve them to fit as many use cases as possible,” said Reinartz. “Ultimately it enables you to write better, more flexible software and gives you the opportunity to discuss your ideas with a wide range of people.”

Open Source is Give as Much as Take

“I hope companies consider that open source is not a free lunch,” said Brian Brazil, founder of Robust Perception and core developer of Prometheus. “As part of using OSS they should at least contribute back bug reports with fixes to the projects they use, and preferably new features too. Without people and companies collaborating there is no open source.”

Open Source as a Recruitment Tool

“Open source is a great way for companies to attract developers. On one hand, developers will find working at your company more attractive and purposeful if they can work on open-source software and thus benefit the entire world instead of just one company. On the other hand, managing an open-source project is also a great way for discovering possible people to hire among the contributors to your project,” said Volz.

Open Source Advances Careers

A study by the Linux Foundation found that 86% of open source professionals said that knowing open source has advanced their careers.


Figure 1: The 2016 Open Source Jobs Report – What Employers and Tech Pros Want – The Linux Foundation

“Co-founding Prometheus has helped my career tremendously – although I didn’t have a reason to complain before – however, now I am one of the top experts on a piece of software which is used worldwide by companies large and small. This is already opening up many new doors and I’m sure it will continue to do so in the future,” said Volz.

“In many ways open source has been defining my career. Contributing to Prometheus got me my first real job at SoundCloud and subsequently my new role at CoreOS, where I keep working on the project. Open source is hands down the fastest way to gain visibility as an aspiring programmer. More importantly, it exposes me to an incredible amount of collective knowledge and ideas. I doubt I would have learned at the pace I did in a more closed down environment,” said Reinartz.

Open Source Drives Cutting-Edge Technologies


Figure 2: The 2016 Open Source Jobs Report – What Employers and Tech Pros Want – The Linux Foundation

“Open source is a great way to gain access to the best software, and to collaborate with others to improve it, in order to solve real world problems,” said Brazil. “I wouldn’t be where I am today without open source, as it was instrumental to my growth as a developer. Today supporting open source, in particular the Prometheus project, is the basis of my business.”

CNCF Project Session: Björn Rabenstein & Fabian Reinartz of Prometheus on “Monitoring Kubernetes Clusters with Prometheus”


Björn Rabenstein & Fabian Reinartz, both of Prometheus, treated attendees at this summer’s CoreOS Fest in Berlin to a presentation on “Monitoring Kubernetes Clusters with Prometheus.” Their interactive session covered installing a Prometheus server in a Kubernetes cluster, monitoring the cluster components, and monitoring application services with an example of a Prometheus node exporter. Find a full video of their joint session here. For those who missed the session, Fabian followed up with this step-by-step blog post – “How to Monitor Kubernetes Clusters with Prometheus.”

Prometheus User Profile: Monitoring the World’s Largest Digital Festival – DreamHack


A local area network gathering with live concerts and competitions in digital art and esports, DreamHack is considered the world’s largest digital festival. In fact, it was recognized by the Guinness Book of Records and Twin Galaxies as being the world’s largest LAN party and computer festival, as well as having the world’s fastest Internet connection, and the most generated traffic.

DreamHack hosts several of these events throughout Europe: in Stockholm and Jönköping, Sweden; Tours, France; Bucharest and Cluj, Romania; Valencia, Spain; London, England and Leipzig, Germany; and in North America: Austin, Texas; and Montreal, Quebec.

The Network team at DreamHack builds the infrastructure for each amazing pop-up event from scratch in just five days! To put this in perspective, an ordinary infrastructure of this size takes months to build. Tackling something of this magnitude in such a short time frame takes an impressively talented team and the ability to have eyes and ears in every facet of the network. This is where Prometheus helps.

In this post, originally published on the Prometheus blog, Christian Svensson from DreamHack’s Network team shares his experiences evaluating and using Prometheus.

Going to PromCon? Svensson will be presenting the below case study on August 25th. Make sure to check him out live!

Monitoring DreamHack – the World’s Largest Digital Festival

Posted at: June 24, 2015 by Christian Svensson (DreamHack Network Team). Editor’s note: This article is a guest post written by a Prometheus user.

If you are operating the network for tens of thousands of demanding gamers, you need to really know what is going on inside your network. Oh, and everything needs to be built from scratch in just five days.

If you have never heard about DreamHack before, here is the pitch: Bring 20,000 people together and have the majority of them bring their own computer. Mix in professional gaming (eSports), programming contests, and live music concerts. The result is the world’s largest festival dedicated solely to everything digital.

To make such an event possible, there needs to be a lot of infrastructure in place. Ordinary infrastructures of this size take months to build, but the crew at DreamHack builds everything from scratch in just five days. This of course includes stuff like configuring network switches, but also building the electricity distribution, setting up stores for food and drinks, and even building the actual tables.

The team that builds and operates everything related to the network is officially called the Network team, but we usually refer to ourselves as tech or dhtech. This post is going to focus on the work of dhtech and how we used Prometheus during DreamHack Summer 2015 to try to kick our monitoring up another notch.

The equipment

Turns out that to build a highly performant network for 10,000+ computers, you need at least the same number of network ports. In our case these come in the form of ~400 Cisco 2950 switches. We call these the access switches. These are everywhere in the venue where participants will be seated with their computers.

Dutifully standing in line, the access switches are ready to greet the DreamHackers with high-speed connectivity.

Obviously just connecting all these computers to a switch is not enough. That switch needs to be connected to the other switches as well. This is where the distribution switches (or dist switches) come into play. These are switches that take the hundreds of links from all access switches and aggregate them into more manageable 10-Gbit/s high-capacity fibre. The dist switches are then further aggregated into our core, where the traffic is routed to its destination.

On top of all of this, we operate our own WiFi networks, DNS/DHCP servers, and other infrastructure. When completed, our core looks something like the image below.


The planning map for the distribution and core layers. The core is clearly visible in “Hall D”

All in all this is becoming a lengthy list of stuff to monitor, so let’s get to the reason you’re here: How do we make sure we know what’s going on?

Introducing: dhmon

dhmon is the collective name of the systems that not only monitor the network, but also allow other teams to collect metrics on whatever they want.

Since the network needs to be built in five days, it’s essential that the monitoring systems are easy to set up and keep in sync if we need to do last minute infrastructural changes (like adding or removing devices). When we start to build the network, we need monitoring as soon as possible to be able to discover any problems with the equipment or other issues we hadn’t foreseen.

In the past we have tried a mix of commonly available software such as Cacti, SNMPc, and Opsview, among others. While these worked, they are closed systems that provide only the bare minimum. A few years back, a few people from the team said “Enough, we can do better ourselves!” and started writing a custom monitoring solution.

At the time the options were limited. Over the years the system went from Graphite (scalability issues) to a custom Cassandra store (high complexity) to InfluxDB (immature software), before finally landing on Prometheus. I first learned about Prometheus back in 2014 when I met Julius Volz, and I had been eager to try it ever since. This summer we finally replaced the custom InfluxDB-based metrics store that we had written with Prometheus. Spoiler: We’re not going back.

The architecture

The monitoring solution consists of three layers: collection, storage, presentation. Our most critical collectors are snmpcollector (SNMP) and ipplan-pinger (ICMP), closely followed by dhcpinfo (DHCP lease stats). We also have some scripts that dump stats about other systems into node_exporter’s textfile collector.
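
For a concrete picture of that last piece, here is a minimal sketch of such a script (the output directory and metric name are illustrative assumptions, not the ones actually used); it simply writes Prometheus’ text exposition format into a .prom file in the directory the node_exporter textfile collector is configured to read:

#!/bin/sh
# Illustrative only: dump one stat for node_exporter's textfile collector.
# Assumes node_exporter is started with its textfile collector pointed at $OUTDIR.
OUTDIR=/var/lib/dhmon/textfile
LOGGED_IN=$(who | wc -l)
# Write to a temp file and rename, so node_exporter never reads a half-written file.
printf 'dhmon_logged_in_users %s\n' "$LOGGED_IN" > "$OUTDIR/users.prom.$$"
mv "$OUTDIR/users.prom.$$" "$OUTDIR/users.prom"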


The current architecture plan of dhmon as of Summer 2015

We use Prometheus as a central timeseries storage and querying engine, but we also use Redis and memcached to export snapshot views of binary information that we collect but cannot store in Prometheus in any sensible way, or when we need to access very fresh data.

One such case is in our presentation layer. We use our dhmap web application to get an overview of the overall health of the access switches. In order to be effective at resolving errors, we need a latency of ~10 seconds from data collection to presentation. Our goal is to have fixed the problem before the customer notices, or at least before they have walked over to the support people to report an issue. For this reason, we have been using memcached since the beginning to access the latest snapshot of the network.

We continued to use memcached this year for our low-latency data, while using Prometheus for everything that’s historical or not as latency-sensitive. This decision was made simply because we were unsure how Prometheus would perform at very short sampling intervals. In the end, we found no reason for why we can’t use Prometheus for this data as well – we will definitely try to replace our memcached with Prometheus at the next DreamHack.


The overview of our access layer visualized by dhmon

Prometheus setup

The block that has so far been referred to as Prometheus really consists of three products: Prometheus, PromDash, and Alertmanager. The setup is fairly basic, and all three components run on the same host. Everything is served by an Apache web server that just acts as a reverse proxy.

ProxyPass /prometheus http://localhost:9090/prometheus
ProxyPass /alertmanager http://localhost:9093/alertmanager
ProxyPass /dash http://localhost:3000/dash
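
To round out the picture, a minimal Prometheus configuration for a layout like this might look roughly as follows; this is a sketch using current YAML syntax, and the job names, ports and file paths are assumptions rather than the actual dhmon setup:

global:
  scrape_interval: 15s                  # latency-critical views still came from memcached

rule_files:
  - /etc/prometheus/recording.rules     # precomputed queries (see the next section)
  - /etc/prometheus/alerts.rules

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']     # node_exporter's default port
  - job_name: 'snmpcollector'           # assumed job name and port
    static_configs:
      - targets: ['localhost:9190']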

Exploring the network

Prometheus has a powerful querying engine that allows you to do pretty cool things with the streaming information collected from all over your network. However, sometimes the queries need to process too much data to finish within a reasonable amount of time. This happened to us when we wanted to graph the top 5 utilized links out of ~18,000 in total. While the query worked, it would take roughly the amount of time we set our timeout limit to, meaning it was both slow and flaky. We decided to use Prometheus’ recording rules for precomputing heavy queries.

precomputed_link_utilization_percent = rate(ifHCOutOctets{layer!='access'}[10m])*8/1000/1000
  / on (device,interface,alias)
  ifHighSpeed{layer!='access'}

After this, running topk(5, precomputed_link_utilization_percent) was blazingly fast.

Being reactive: alerting

So at this stage we had something we could query for the state of the network. Since we are humans, we don’t want to spend our time running queries all the time to see if things are still running as they should, so obviously we need alerting.

For example: we know that all our access switches use GigabitEthernet0/2 as an uplink. Sometimes when the network cables have been in storage for too long they oxidize and are not able to negotiate the full 1000 Mbps that we want.

The negotiated speed of a network port can be found in the SNMP OID IF-MIB::ifHighSpeed. People familiar with SNMP will however recognize that this OID is indexed by an arbitrary interface index. To make any sense of this index, we need to cross-reference it with data from SNMP OID IF-MIB::ifDescr to retrieve the actual interface name.

Fortunately, our snmpcollector supports this kind of cross-referencing while generating Prometheus metrics. This allows us not only to query the data in a simple way, but also to define useful alerts. In our setup we configured the SNMP collection to annotate any metric under the IF-MIB::ifTable and IF-MIB::ifXTable OIDs with ifDescr. This comes in handy now that we need to specify that we are only interested in the GigabitEthernet0/2 port and no other interface.

Let’s have a look at what such an alert definition looks like.

ALERT BadUplinkOnAccessSwitch
  IF ifHighSpeed{layer='access', interface='GigabitEthernet0/2'} < 1000
  FOR 2m
  SUMMARY "Interface linking at {{$value}} Mbps"
  DESCRIPTION "Interface {{$labels.interface}} on {{$labels.device}} linking at {{$value}} Mbps"

Done! Now we will get an alert if a switch’s uplink suddenly links at a non-optimal speed.

Let’s also look at what an alert for an almost full DHCP scope looks like:

ALERT DhcpScopeAlmostFull
  IF ceil((dhcp_leases_current_count / dhcp_leases_max_count)*100) > 90
  FOR 2m
  SUMMARY "DHCP scope {{$labels.network}} is almost full"
  DESCRIPTION "DHCP scope {{$labels.network}} is {{$value}}% full"

We found the syntax to define alerts easy to read and understand even if you had no previous experience with Prometheus or time series databases.


Oops! Turns out we have some bad uplinks, better run out and fix it!

Being proactive: dashboards

While alerting is an essential part of monitoring, sometimes you just want to have a good overview of the health of your network. To achieve this we used PromDash. Every time someone asked us something about the network, we crafted a query to get the answer and saved it as a dashboard widget. The most interesting ones were then added to an overview dashboard that we proudly displayed.

The DreamHack Overview dashboard powered by PromDash

The future

While changing an integral part of any system is a complex job and we’re happy that we managed to integrate Prometheus in just one event, there are without a doubt a lot of areas to improve. Some areas are pretty basic: using more precomputed metrics to improve performance, adding more alerts, and tuning the ones we have. Another area is to make it easier for operators: creating an alert dashboard suitable for our network operations center (NOC), figuring out if we want to page the people on-call, or just let the NOC escalate alerts.

Some bigger features we’re planning on adding: syslog analysis (we have a lot of syslog!), alerts from our intrusion detection systems, integrating with our Puppet setup, and also integrating more across the different teams at DreamHack. We managed to create a proof-of-concept where we got data from one of the electrical current sensors into our monitoring, making it easy to see if a device is faulty or if it simply doesn’t have any electricity anymore. We’re also working on integrating with the point-of-sale systems that are used in the stores at the event. Who doesn’t want to graph the sales of ice cream?

Finally, not all services that the team operates are on-site, and some even run 24/7 after the event. We want to monitor these services with Prometheus as well, and in the long run when Prometheus gets support for federation, utilize the off-site Prometheus to replicate the metrics from the event Prometheus.

Closing words

We’re really excited about Prometheus and how easy it makes setting up scalable monitoring and alerting from scratch.

A huge shout-out to everyone who helped us in #prometheus on FreeNode during the event. Special thanks to Brian Brazil, Fabian Reinartz and Julius Volz. Thanks for helping us even in the cases where it was obvious that we hadn’t read the documentation thoroughly enough.

Finally, dhmon is all open-source, so head over to https://github.com/dhtech/ and have a look if you’re interested. If you feel like you would like to be a part of this, just head over to #dreamhack on QuakeNet and have a chat with us. Who knows, maybe you will help us build the next DreamHack?

Deploying 1000 nodes of OpenShift on the CNCF Cluster (Part 1)


By Jeremy Eder, Red Hat, Senior Principal Software Engineer

Imagine being able to stand up thousands of tenants with thousands of apps, running thousands of Docker-formatted container images and routes, on a self-healing cluster. Take that one step further with all those images being updatable through a single upload to the registry, all without downtime. We did just that on Red Hat OpenShift Container Platform running on Red Hat OpenStack on a 1,000-node cluster, and this blog tells you how we deployed:

Nodes: 1,000
Namespaces (projects): 13,000
Pods: 52,000
Build Configs: 39,000
Templates: 78,000
Image Streams: 13,000
Deployment Configs and Services: 39,000 (incl. 13,000 Replication Controllers)
Secrets: 260,000
Routes: 39,000

The mission of the Cloud Native Computing Foundation (CNCF) is to create and drive the adoption of a new computing paradigm that is optimized for modern, distributed systems environments capable of scaling to tens of thousands of self healing multi-tenant nodes. That’s a great vision, but how do you get there?

The CNCF community is making the CNCF cluster available to advance this mission. Comprising 1,000 nodes, it provides a great utility to the open source community. It’s rare to come across large swaths of high-end bare metal, and over the last 8 weeks engineers at Red Hat have put the environment to great use to stress test our open source solutions for our customers.

Why OpenStack?

We were granted only 300 of the 1,000 nodes at CNCF, and since we wanted to test scaling up to 1,000 nodes, we decided to also deploy OpenStack. We deployed Red Hat OpenStack Platform 8, based on OpenStack Liberty, to provide virtual machines upon which we would install Red Hat OpenShift Container Platform 3.3 (based on Kubernetes 1.3, and currently in beta).

OpenShift Container Platform on top of OpenStack is also a common customer configuration, and we wanted to put it through its paces as well. We’re looking forward to a future testing scenario where we deploy OpenShift directly onto a 1,000-node bare metal cluster, and we aim to write up a comparison of deployment, performance, and other findings in a future blog post here.

Many cloud native systems are moving toward being container-packaged, dynamically managed and micro-services oriented. That describes Red Hat OpenShift Container Platform (built on Red Hat Enterprise Linux, Docker and Kubernetes) to a tee. Red Hat OpenShift Container Platform and Red Hat OpenStack Platform are modern, distributed systems capable of deployment at scale.

This blog post documents the first phase of our ongoing testing efforts on the CNCF environment.

We were able to stand up thousands of tenants with thousands of apps, thousands of Docker-formatted container images, pods and routes, on a self-healing cluster, all updatable with a single new upload to the registry and without taking downtime. It was glorious: a 1,000-node container platform based on the leading Kubernetes orchestration project that can house an enormous application load from a diverse set of workloads. Even under these incredible load levels, we had plenty of headroom.

Hardware


Software


Goals

  1. 1000 node OpenShift Cluster and Reference Design
  2. Push the system to its limit
  3. Identify config changes and best practices to increase capacity and performance
  4. Document and file issues upstream and send patches where applicable

Inside the CNCF environment, Red Hat engineers deployed an OpenShift-on-OpenStack environment. Because of our previous experience with scalability and performance of both products, we first set out to implement an OpenStack environment that would meet the needs of the OpenShift scalability tests to be run on top.

We used three OpenStack controllers, and configured OpenStack to place VMs on the local NVMe disk in each node. We then configured two host aggregate groups so that we could run two separate 1,000-node OpenShift clusters simultaneously in order to parallelize our efforts. We configured Neutron to use VXLAN tunnels, and created flavors to match our OpenShift v3 Scaling, Performance and Capacity Planning guide.
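
As a rough sketch of what that host-aggregate and flavor setup involves (resource sizes and names here are invented, and recent python-openstackclient syntax is shown rather than the exact commands used at the time):

# Illustrative only: group hypervisors into an aggregate per OpenShift cluster.
openstack aggregate create --property cluster=openshift-a openshift-a-aggregate
openstack aggregate add host openshift-a-aggregate overcloud-compute-0.localdomain

# Flavor for an OpenShift node VM, keyed to that aggregate (values are made up).
openstack flavor create --vcpus 4 --ram 16384 --disk 40 \
  --property aggregate_instance_extra_specs:cluster=openshift-a ocp-node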

We deployed a single director (TripleO) node that also served as our jumpbox/utility node, and three OpenStack controllers which served images to the 300 OpenStack nodes from the image service (Glance).

This deployment topology represents best practices for large scale OpenShift-on-OpenStack deployments.

To help speed installation and reduce burden on support systems (i.e. yum repos and docker registries), we pre-pulled the necessary OpenShift containers onto each node.
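
A minimal sketch of that pre-pull step, run on each node before the install (the image list and tag below are assumptions for illustration, not the exact images used):

#!/bin/sh
# Illustrative only: warm the local Docker cache so the installer does not
# hammer the registry and yum repos during cluster bring-up.
TAG=v3.3.0                                   # assumed tag
for image in ose-pod ose-haproxy-router ose-docker-registry; do
  docker pull "registry.access.redhat.com/openshift3/${image}:${TAG}"
done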

Standing up this cloud took less time and effort than anticipated, thanks in part to the open source communities around OpenStack, Ansible, and Kubernetes. Those communities are working to push, pull and contort these projects in every possible direction, as well as generate tools to make customers happy — the community around Ansible is incredibly helpful and fast moving.

Once we had the tenants, host aggregates, networks, and base images imported into Glance, we were able to begin deploying OpenShift Container Platform.

More concretely, we deployed a separate, complete OpenShift environment inside each of the two host aggregates.

Workloads

To deliver on our main objectives to scale OpenShift up to 1000 nodes, Red Hat’s OpenShift performance and scalability team has written test harnesses (open source of course) to drive Kubernetes and OpenShift.

The primary workload utility we use to deploy content into an OpenShift environment is called the “cluster-loader”. The cluster-loader takes as input a yaml file that describes a complete environment as you’d like it to be deployed:

I’d like an environment with thousands of deployment configs (which include services and replication controllers), thousands more routes, pods (each with a persistent storage volume automatically attached), secrets, image streams, build definitions, etc.

Cluster-loader provides flexible, sophisticated and feature-rich scalability test capabilities for Kubernetes. You only need to vary the content of the yaml file to represent what you expect your environment to look like. If you have created templates for your own in-house application stack and have workload generators, it’s easy to point cluster-loader at them, and optionally feed them test-specific variables in the yaml config.

In addition to cluster-loader, we have also written application performance, network performance, reliability and disk I/O performance automation in the same repository.

The cluster-loader yaml configuration we used for the 1,000-node test specifies that we want to create:

  • 20,000 projects (namespaces), and within each project also create:
    • 1 user
    • 3 build configs
    • 6 templates
    • 1 image stream
    • 2 deploymentconfigs, each with a 256-byte environment variable
    • 1 deploymentconfig with 2 replication controllers and a 256-byte environment variable
    • 20 secrets
    • 3 routes

This config represents a reasonable mix of features used by developers in a shared web hosting cluster.

The last stanza of the configuration is a “tuningset” (our rate-limiting mechanism for the cluster-loader), which creates projects in steps of 5, waiting 250ms between each project and 10 seconds between each step. However, due to the scale we were trying to achieve, this stanza was not used in this test.
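
Purely to illustrate the shape of such a file (the keys below are hypothetical and are not the actual cluster-loader schema), a config describing the per-project mix above might look something like this:

# Hypothetical sketch only; not the real cluster-loader config format.
projects:
  - num: 20000
    basename: svt-project
    users: 1
    buildconfigs: 3
    templates: 6
    imagestreams: 1
    deploymentconfigs: 3      # one of them with 2 replication controllers
    secrets: 20
    routes: 3
tuningset:                    # rate limiting; not used in this particular run
  projects:
    stepping:
      stepsize: 5
      pause: 10s
    rate_limit:
      delay: 250ms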

Because of the amount of Kubernetes “objects” this test run creates, we noted that etcd was using a fair amount of disk space, and thus our guidance for large environments has increased to 20GB (max during our test was 12.5GB, shown below).

etcd disk utilization during the test run, peaking at 12.5GB

Another issue encountered during test runs was https://github.com/kubernetes/kubernetes/pull/29093, which you can see below as a cycle of panics/restarts of the Kubernetes master service, occurring around 19:00 (approx. 13,000 cluster-loader projects, 52,000 pods). This issue has already been resolved upstream and in the OpenShift 3.3 product.

Master CPU utilization during the test run, showing the panic/restart cycle around 19:00

A brief detour through Ansible

Somewhat surprisingly, our use of CNCF took a brief detour through Ansible-land. The product installer for OpenShift is written in Ansible, and it is a great reference for truly sophisticated roles and playbooks in the field. To meet the needs of customers, OpenShift 3.3 includes Ansible 2.1, which adds advanced certificate management among other capabilities. As with any major version upgrade, lots of refactoring and optimization occurred, not only inside Ansible but in the OpenShift installer as well.

When we began scale testing the first builds of OpenShift 3.3, we noticed that large installs (100+ nodes) took longer than they had in the past. We also saw Ansible’s memory usage increase, and even hit a few OOMs along the way.

We did some research, and found that users had reported the issue to Ansible a few weeks earlier, and work was underway to resolve the issue. We collaborated with James Cammarata and Andrew Butcher from Ansible on performance optimizations to both Ansible core, as well as how OpenShift used it.

With the updated code merged into Ansible and OpenShift, 100-node installs of the OpenShift 3.3 alpha were back to normal (around 22 minutes), and memory usage was back down to expected levels, in line with the resource requirements of Ansible 1.9.

Summary

Working with the CNCF cluster taught us a number of important lessons about key areas where we can add value to Kubernetes moving forward, including:

  1. Adding orchestration-level support for additional workloads
  2. Moving aggressively to a core+plugin model
  3. Working on more elegant support for high-performance workloads
  4. Continuing to refine developer-friendly workflows that deliver on the promise of cloud-native computing

Benchmarking on CNCF’s 1,000-node cluster provided us with a wealth of information. While there were some teething pains as we brought this new lab gear online, we worked very closely with Intel’s lab support folks to get our environment successfully deployed. We’ll be sending a pull request to cncf/cluster soon, continuing to test bare metal scenarios, and posting the results in a follow-up blog post (Part 2).

Want to know what Red Hat’s Performance and Scale Engineering team is working on next? Check out our Trello board. Oh, and we’re hiring!

Follow us on Twitter @jeremyeder @timothysc @mffiedler @jtaleric @thejimic @akbutcher

Ambassador Program, Meetup Program + Community Store Available for Growing Cloud Native Community


Now Open: CNCF Community Store, Ambassador Program, and Meetup Community


Today’s cloud native ecosystem is growing at an incredibly rapid pace – as new technologies are continuously introduced and current applications are ever-evolving.

Our new Cloud Native Ambassador Program will help spread cloud native practices and technology across the world. Cloud Native Ambassadors are individuals who are passionate about cloud native technology and are recognized for their expertise and willingness to help others learn about the wider CNCF community. Our Ambassadors are experts working at CNCF member organizations.

 

Chuck Svoboda (@CharlesRishard) on Twitter: “Pretty excited about the #CNCF Ambassador program. @CloudNativeFdn #Kubernetes” (7:44 PM – 17 Jun 2016)

“CNCF Ambassadors are uniquely positioned as advocates to support both developers and operators in not only defining tomorrow’s infrastructure, but also how applications natively leverage this infrastructure,” said Lee Calcote, technology leader in Clouds, containers and their management. “Functioning as an underground conference scene, meetups are surprisingly refreshing in their candidness, content, and convenience – three C’s that deliver value to me as an organizer.”

We are also launching a meetup program to help support our CNCF community and related meetups across the world. CNCF covers meetup costs and more for qualifying meetups. To start your own CNCF meetup, email us at meetups@cncf.io.

We currently have CNCF meetups all over the world, providing the ongoing education and community gathering space needed to navigate this new computing paradigm. Starting today, we have more than 41 meetups, representing more than 11,409 members across the globe. Go to http://www.meetup.com/pro/cncf/ to keep up to date on future CNCF meetups like “Kubernetes deep dive and anniversary celebration” in Bangalore on August 27th and “Kubernetes with Weaveworks and Apcera” in San Francisco on September 7th.

“The number of meetups focused on cloud computing, containers, cloud native, CNCF’s projects and related open source technologies continues to grow on a global scale. There’s major interest among developers to learn more about any facet of open source, cloud native technologies from our very own subject matter experts,” said Chris Aniszczyk, COO of Cloud Native Computing Foundation.

“I am honored to participate in this program as a Cloud Native Ambassador and I look forward to hosting our first Canada Cloud Native meetups with my co-organizer Todd Wilson from the Office of the CIO for the Province of British Columbia in both Vancouver and Victoria,” said Diane Mueller, Director Community Development at Red Hat. “Community-based meetup programs combine all the topics that are part of the Cloud Native’s new computing paradigm under a single umbrella and allow developers to gain a better understanding of how to optimize them for modern distributed systems environments, and focus on the advancement of and innovation within promising technologies in context; this unified approach was, until now, noticeably missing from the tech community.”

The meetups discuss CNCF related technology like Kubernetes and Prometheus; topics range from container networking and application services to multi-cloud engineering, scalability, microservices, and much more.

Popular meetups include The Bay Area Kubernetes Meetup, which has an impressive 1,675 members; the Cloud Computing Meetup, based in Madrid, which gathers more than 800 members; and Kubernetes and Cloud Native New York, which has more than 530 “Kubelets” discussing how to push the boundaries of what K8S can do.

The CNCF is also excited to announce the launch of a new community store, featuring items including stickers, tees, hoodies, and more. Fans of CNCF, Kubernetes, and Prometheus can gear up to show their support at https://store.cncf.io/


To keep up with community developments like these, join us on Slack: https://slack.cncf.io/!

To join CNCF, go to: https://cncf.io/join.

Only at CloudNativeDay: Your Checklist for a Cloud Native World



Founder and Chief Architect at Mesosphere, Ben Hindman, knows that building cloud native applications is only half of the challenge. Once you start running them at scale, you then run into outages and the need to make upgrades. In his keynote, “2nd Day Operations: Your Checklist for a Cloud Native World,” Hindman will share the critical challenges you’ll want to be prepared for, and the surprising realizations that you do not want to be surprised by.

Hindman co-created Apache Mesos as a PhD student at UC Berkeley before bringing it to Twitter. He will also share advice from his years as a leader in the cloud native movement, working on the front lines with 2nd day operators around the world.

Don’t miss out on this and other perspectives that you’ll only find at CloudNativeDay. Register Today!

Only at CloudNativeDay: Abby Kearns “Opens Up” the Cloud Foundry Service Broker API


Services are integral to the success of a platform, and, for Cloud Foundry, the ability to connect to and manage services is a crucial piece of its platform.

During CloudNativeDay on August 25th in Toronto, Abby Kearns, VP of industry strategy for Cloud Foundry Foundation (CFF), will discuss why CFF created a cross-foundation Working Group with CNCF. The group is working to determine how the Cloud Foundry Service Broker API can be opened up and leveraged as an industry-standard specification for connecting services to platforms.

In her presentation, “How Cloud Foundry Foundation & Cloud Native Computing Foundation Are Collaborating to Make the Cloud Foundry Service Broker API the Industry Standard,” Kearns will share the vision of the Cloud Foundry Service Broker API, its importance to the industry, and the latest progress on the proof of concepts underway to allow services to connect to multiple platforms with a single API.

Don’t miss out on this and other perspectives that you’ll only find at CloudNativeDay. Register Today!
