Prometheus User Profile: Compose Explains their Monitoring Journey from Graphite + InfluxDB to Prometheus

By | Blog

Compose is a fully-managed platform used by developers to deploy, host and scale databases. Its platform streamlines spinning up and managing MongoDB, Elasticsearch, Redis, RethinkDB, PostgreSQL, MySQL, RabbitMQ and etcd databases – both for single developers hacking away on an app and for large projects running millions of queries.

Before joining forces with Prometheus, the company tried a number of different metrics systems. The team now monitors more than 400,000 distinct metrics every 15 seconds for around 40,000 containers on 450 hosts with a single m4.xlarge Prometheus instance with 1TB of storage.

In the blog below, originally published by Prometheus, Compose weighs in on their monitoring journey and transition from Graphite and InfluxDB to Prometheus.

To hear more stories about Prometheus’ production use, participate in technical sessions on the monitoring tool, and learn how it integrates with Kubernetes and other open source technologies, attend PromCon 2017, August 17-18 at Google Munich. Speaking submissions close May 31st. Submit here.

Interview with Compose

Posted at: September 21, 2016 by Brian Brazil

Continuing our series of interviews with users of Prometheus, Compose talks about their monitoring journey from Graphite and InfluxDB to Prometheus.

Can you tell us about yourself and what Compose does?

Compose delivers production-ready database clusters as a service to developers around the world. An app developer can come to us and in a few clicks have a multi-host, highly available, automatically backed up and secure database ready in minutes. Those database deployments then autoscale up as demand increases so a developer can spend their time on building their great apps, not on running their database.

We have tens of clusters of hosts across at least two regions in each of AWS, Google Cloud Platform and SoftLayer. Each cluster spans availability zones where supported and is home to around 1000 highly-available database deployments in their own private networks. More regions and providers are in the works.

What was your pre-Prometheus monitoring experience?

Before Prometheus, a number of different metrics systems were tried. The first system we tried was Graphite, which worked pretty well initially, but the sheer volume of different metrics we had to store, combined with the way Whisper files are stored and accessed on disk, quickly overloaded our systems. While we were aware that Graphite could be scaled horizontally relatively easily, it would have been an expensive cluster. InfluxDB looked more promising so we started trying out the early-ish versions of that and it seemed to work well for a good while. Goodbye Graphite.

The earlier versions of InfluxDB occasionally had issues with data corruption, and we semi-regularly had to purge all of our metrics. It wasn’t normally a devastating loss for us, but it was irritating. The continued promises of features that never materialised frankly wore on us.

Why did you decide to look at Prometheus?

It seemed to combine better efficiency with simpler operations than other options.

Pull-based metric gathering puzzled us at first, but we soon realised the benefits. Initially it seemed like it could be far too heavyweight to scale well in our environment where we often have several hundred containers with their own metrics on each host, but by combining it with Telegraf, we can arrange to have each host export metrics for all its containers (as well as its overall resource metrics) via a single Prometheus scrape target.

How did you transition?

We are a Chef shop so we spun up a largish instance with a big EBS volume and then reached right for a community chef cookbook for Prometheus.

With Prometheus up on a host, we wrote a small Ruby script that uses the Chef API to query for all our hosts, and write out a Prometheus target config file. We use this file with a file_sd_config to ensure all hosts are discovered and scraped as soon as they register with Chef. Thanks to Prometheus’ open ecosystem, we were able to use Telegraf out of the box with a simple config to export host-level metrics directly.
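The wiring described above can be sketched in a short scrape configuration; the file path, job name, and port below are illustrative assumptions (9273 is Telegraf's default port for its prometheus_client output), not Compose's actual values:

```yaml
# prometheus.yml (fragment) -- scrape every host's Telegraf endpoint,
# with targets discovered from the file the Chef-querying script writes out
scrape_configs:
  - job_name: 'telegraf'
    scrape_interval: 15s
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/chef-hosts.json
        refresh_interval: 5m
```

The target file itself is a JSON list of target groups, e.g. `[{"targets": ["host-1:9273", "host-2:9273"], "labels": {"provider": "aws"}}]`; Prometheus picks up changes whenever the script rewrites the file.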

We were testing how far a single Prometheus would scale and waiting for it to fall over. It didn’t! In fact it handled the load of host-level metrics scraped every 15 seconds for around 450 hosts across our newer infrastructure with very little resource usage.

We have a lot of containers on each host so we were expecting to have to start to shard Prometheus once we added all memory usage metrics from those too, but Prometheus just kept on going without any drama and still without getting too close to saturating its resources. We currently monitor over 400,000 distinct metrics every 15 seconds for around 40,000 containers on 450 hosts with a single m4.xlarge Prometheus instance with 1TB of storage. You can see our host dashboard for this host below. Disk IO on the 1TB gp2 SSD EBS volume will probably be the limiting factor eventually. Our initial sizing is well over-provisioned for now, but we are growing fast in both metrics gathered and hosts/containers to monitor.

At this point the Prometheus server we’d thrown up to test with was vastly more reliable than the InfluxDB cluster we had doing the same job before, so we did some basic work to make it less of a single-point-of-failure. We added another identical node scraping all the same targets, then added a simple failover scheme with keepalived + DNS updates. This was now more highly available than our previous system so we switched our customer-facing graphs to use Prometheus and tore down the old system.
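A failover scheme of this kind can be illustrated with a minimal keepalived fragment; the interface name, router ID, and virtual IP below are placeholder assumptions rather than Compose's configuration:

```
# /etc/keepalived/keepalived.conf (sketch) -- two identical Prometheus
# nodes share a virtual IP; DNS points at the VIP, so a failed master
# triggers VRRP failover without client-side changes
vrrp_instance prometheus_vip {
    state MASTER              # BACKUP on the second node
    interface eth0            # assumed interface name
    virtual_router_id 51
    priority 100              # lower priority on the backup node
    virtual_ipaddress {
        10.0.0.100/24         # placeholder virtual IP
    }
}
```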

What improvements have you seen since switching?

Our previous monitoring setup was unreliable and difficult to manage. With Prometheus we have a system that’s working well for graphing lots of metrics, and we have team members suddenly excited about new ways to use it rather than wary of touching the metrics system we used before.

The cluster is simpler too, with just two identical nodes. As we grow, we know we’ll have to shard the work across more Prometheus hosts and have considered a few ways to do this.

What do you think the future holds for Compose and Prometheus?

Right now we have only replicated the metrics we already gathered in previous systems – basic memory usage for customer containers as well as host-level resource usage for our own operations. The next logical step is enabling the database teams to push metrics to the local Telegraf instance from inside the DB containers so we can record database-level stats too, without increasing the number of targets to scrape.

We also have several other systems that we want to get into Prometheus to get better visibility. We run our apps on Mesos and have integrated basic Docker container metrics already, which is better than previously, but we also want to have more of the infrastructure components in the Mesos cluster recording to the central Prometheus so we can have centralised dashboards showing all elements of supporting system health from load balancers right down to app metrics.

Eventually we will need to shard Prometheus. We already split customer deployments among many smaller clusters for a variety of reasons so the one logical option would be to move to a smaller Prometheus server (or a pair for redundancy) per cluster rather than a single global one.

For most reporting needs this is not a big issue as we usually don’t need hosts/containers from different clusters in the same dashboard, but we may keep a small global cluster with much longer retention and just a modest number of down-sampled and aggregated metrics from each cluster’s Prometheus using Recording Rules.

CNCF Hosts Container Networking Interface (CNI)

By | Blog

Today, the Cloud Native Computing Foundation (CNCF) Technical Oversight Committee (TOC) voted to accept CNI (Container Networking Interface) as the 10th hosted project alongside Kubernetes, Prometheus, OpenTracing, Fluentd, Linkerd, gRPC, CoreDNS, containerd, and rkt.

Container-based applications are rapidly moving into production. Just as Kubernetes allows enterprise developers to run containers en masse across thousands of machines, containers at scale also need to be networked.

The CNI project is a network interface created by multiple companies and projects, including CoreOS, Red Hat OpenShift, Apache Mesos, Cloud Foundry, Kubernetes, Kurma and rkt. First proposed by CoreOS to define a common interface between the network plugins and container execution, CNI is designed to be a minimal specification concerned only with the network connectivity of containers and with removing allocated resources when the container is deleted.

“The CNCF TOC wanted to tackle the basic primitives of cloud native and formed a working group around cloud native networking,” said Ken Owens, TOC project sponsor and CTO at Cisco. “CNI has become the de facto network interface today and has several interoperable solutions in production. Adopting CNI as the CNCF’s initial network interface for connectivity and portability is our primary order of business. With support from CNCF, our working group is in an excellent position to continue our work and look at models, patterns, and policy frameworks.”

“Interfaces really need to be as simple as possible. What CNI offers is a nearly trivial interface against which to develop new plugins. Hopefully this fosters new ideas and new ways of integrating containers and other network technologies,” said Tim Hockin, Principal Software Engineer at Google. “CNCF is a great place to nurture efforts like CNI, but CNI is still young, and it almost certainly needs fine-tuning to be as air-tight as it should be. At this level of the stack, networking is one of those technologies that should be ‘boring’ – it needs to work, and work well, in all environments.”

Used by companies like Ticketmaster, Concur, CDK Global, and BMW, CNI is now used for Kubernetes network plugins and has been adopted by the community and many product vendors for this use case. In the CNI repo there is a basic example for connecting Docker containers to CNI networks.

“CoreOS created CNI years ago to enable simple container networking interoperability across container solutions and compute environments. Today CNI has a thriving community of third-party networking solutions users can choose from that plug into the Kubernetes container infrastructure,” said Brandon Philips, CTO of CoreOS. “And since CoreOS Tectonic uses pure-upstream Kubernetes in an Enterprise Ready configuration we help customers deploy CNI-based networking solutions that are right for their environment whether on-prem or in the cloud.”

Automated Network Provisioning in Containerized Environments

CNI has three main components:

  1. CNI Specification: defines an API between runtimes and network plugins for container network setup. No more, no less.
  2. Plugins: provide network setup for a variety of use cases and serve as reference examples of plugins conforming to the CNI specification.
  3. Library: provides a Go implementation of the CNI specification that runtimes can use to more easily consume CNI.

The CNI specification and libraries exist for writing plugins that configure network interfaces in Linux containers. The plugins support the addition and removal of container network interfaces to and from networks. Defined by a JSON schema, CNI’s templated code makes it straightforward to create a CNI plugin for an existing container networking project, and it provides a good framework for creating a new container networking project from scratch.
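As an illustration of that JSON schema, a minimal network configuration for the standard bridge plugin looks roughly like this (the network name, bridge name, and subnet are arbitrary examples):

```json
{
  "cniVersion": "0.3.1",
  "name": "examplenet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
```

A runtime hands this configuration to the plugin on standard input, with the operation (ADD or DEL) and the container’s network namespace supplied through environment variables such as CNI_COMMAND and CNI_NETNS.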

“As early supporters and contributors to the Kubernetes CNI design and implementation efforts, Red Hat is pleased that the Cloud Native Computing Foundation has decided to add CNI as a hosted project and to help extend CNI adoption. Once again, the power of cross-community, open source collaboration has delivered a specification that can help enable faster container innovation. Red Hat OpenShift Container Platform embraced CNI both to create a CNI plugin for its default OpenShift SDN solution based on Open vSwitch, and to allow for easier replacement by other third party CNI-compatible networking plugins. CNI is now the recommended way to enable networking solutions for OpenShift. Other projects like Open Virtual Networking (OVN) project have used CNI to integrate more cleanly and quickly with Kubernetes. As CNI gets widely adopted, the integration can automatically extend to other popular frameworks.” — Diane Mueller, Director, Community Development Red Hat OpenShift


Graphic courtesy of Lee Calcote, Sr. Director, Technology Strategy at SolarWinds

Notable Milestones:

  • 56 Contributors
  • 591 GitHub stars
  • 17 releases
  • 14 plugins

Adopters (Plugins):

“CNI provides a much-needed common interface between network layer plugins and container execution,” said Chris Aniszczyk, COO of the Cloud Native Computing Foundation. “Many of our members and projects have adopted CNI, including Kubernetes and rkt. CNI works with all the major container networking runtimes.”

As a CNCF hosted project, CNI will be part of a neutral community aligned with technical interests, and it will receive help in defining an initial guideline for a network interface specification focused on connectivity and portability of cloud native application patterns. CNCF will also assist with CNI marketing and documentation efforts.

“The CNCF network working group’s first objective of curating and promoting a networking project for adoption was a straightforward task – CNI’s ubiquity across the container ecosystem is unquestioned,” said Lee Calcote, Sr. Director, Technology Strategy at SolarWinds. “The real challenge is addressing the remaining void around higher-level network services. We’re preparing to set forth on this task, and on defining and promoting common cloud-native networking models.” Anyone interested in seeing CNI in action should check out Calcote’s talk on container networking at Velocity on June 21.

For more on CNI, read The New Stack article or take a look at the KubeCon San Francisco slide deck “Container Network Interface: Network Plugins for Kubernetes and beyond” by Eugene Yakubovich of CoreOS. CNI will also have a technical salon at CloudNativeCon + KubeCon North America 2017 in Austin on December 6.

To join or learn more about the Kubernetes SIGs and Working Groups, including the Networking SIG, click here. To join the CNCF Networking WG, click here.

Stay up to date on all CNCF happenings by signing up for our monthly newsletter.

Diversity Scholarship Series: Inspired by Kubernetes and Its Community

By | Blog

CNCF offered six diversity scholarships to developers to attend CloudNativeCon + KubeCon Europe 2017. In this post, our scholarship recipient Kim Lehtinen, a second-year student at the University of Vaasa in Finland, shares his experience attending sessions and meeting the community. Anyone interested in applying for the CNCF diversity scholarship to attend CloudNativeCon + KubeCon North America 2017 in Austin, December 6-8, can submit an application here. Applications are due October 13th.

By Kim Lehtinen, second-year student at University of Vaasa, in Finland

Kim Lehtinen with Lucas Käldström, keynote speaker, at the Berlin Congress Center

As a second-year student at University of Vaasa in Finland studying information technology, I plan to get my bachelor’s degree next year and continue my studies to receive a master’s degree in software engineering. In Finland, it is actually uncommon not to pursue a master’s if you study at a university.

My friend and Kubernetes maintainer Lucas Käldström introduced me to Kubernetes last year and since then we have been hacking with Kubernetes. He has taught me a lot. It’s also been amazing to follow his work, as he has achieved so much in such a short time.

Lucas attended CloudNativeCon + KubeCon North America 2016 on a diversity scholarship last year and was selected as a keynote speaker for CloudNativeCon + KubeCon Europe 2017. He encouraged me to apply for the diversity scholarship from the Linux Foundation to attend the show with him and I was lucky enough to be accepted.

I still can’t believe it; will I ever? It’s been such an honor to attend the event, and I’ve learned so much from such nice, smart and passionate people.

There were a lot of great companies at the event. It was amazing to see how companies use Kubernetes to make new platforms and software to meet internal and user needs. The most amazing thing about the people at the conference was not how smart they were or what they have accomplished, but how passionate they are about what they are doing. It’s such an inspiration.

I had the chance to talk with one of my biggest inspirations: Mr. Kelsey Hightower himself. I’ve followed his work for a long time now, watched all his talks on YouTube and all the podcasts he has been featured in, and I finally got to meet him! Of course I had to take a selfie!

A bit of a blurry selfie with Kelsey Hightower, but I still got one!

Other great highlights from the event include Lucas’s talk about multi-platform Kubernetes, and Joe Beda’s talk on growing the Kubernetes user base. In general, what I enjoyed the most was talking with all the amazing people.

Although I haven’t had a lot of time to experiment and hack with all these amazing cloud native technologies as I am very busy with school, I’ve still learned so much from all the amazing speeches, companies and attendees. You do not have to be a genius to realize how these people are changing the world. My goal is to start contributing to Kubernetes. Right now, I’m most interested in the cluster lifecycle SIG. My long-term goal is to convince companies in Finland to start using Kubernetes because it is not that widespread here yet.

Thank you Linux Foundation and CNCF for the scholarship, the opportunity and experience…thank you for everything!

Prometheus User Profile: ShuttleCloud Explains Why Prometheus Is Good for Your Small Startup

By | Blog

ShuttleCloud is a small startup specializing in email and contacts migrations. The company developed a reliable, highly available migration platform used by clients like Gmail, Gcontacts and Comcast. For example, Gmail alone has imported data for 3 million users with the company’s API, and ShuttleCloud processes hundreds of terabytes every month.

Before transitioning to Prometheus, the company had near-zero monitoring. Now they have all of their infrastructure monitored with the necessary metrics and alerts. ShuttleCloud currently has around 200 instances monitored with a cost-effective in-house monitoring stack based on Prometheus.

In the blog below, originally published by Prometheus, ShuttleCloud explains why a company does not need a big fleet to embrace Prometheus and why it is an inexpensive solution for monitoring.

Ignacio P. Carretero, Software Engineer at ShuttleCloud, also spoke on this topic at CloudNativeCon + KubeCon North America 2016. The video of his presentation can be found here and slides can be found here.

To hear more stories about Prometheus’ production use, participate in technical sessions on the monitoring tool, and learn how it integrates with Kubernetes and other open source technologies, attend PromCon 2017, August 17-18 at Google Munich. Speaking submissions close May 31st. Submit here.

Interview with ShuttleCloud

Posted at: September 7, 2016 by Brian Brazil

Continuing our series of interviews with users of Prometheus, ShuttleCloud talks about how they began using Prometheus. Ignacio from ShuttleCloud also explained how Prometheus Is Good for Your Small Startup at PromCon 2016.

What does ShuttleCloud do?

ShuttleCloud is the world’s most scalable email and contacts data importing system. We help some of the leading email and address book providers, including Google and Comcast, increase user growth and engagement by automating the switching experience through data import.

By integrating our API into their offerings, our customers allow their users to easily migrate their email and contacts from one participating provider to another, reducing the friction users face when switching to a new provider. The 24/7 email providers supported include all major US internet service providers: Comcast, Time Warner Cable, AT&T, Verizon, and more.

By offering end users a simple path for migrating their emails (while keeping complete control over the import tool’s UI), our customers dramatically improve user activation and onboarding.

ShuttleCloud’s integration with Google’s Gmail Platform. Gmail has imported data for 3 million users with our API.

ShuttleCloud’s technology encrypts all the data required to process an import, in addition to following the most secure standards (SSL, OAuth) to ensure the confidentiality and integrity of API requests. Our technology allows us to guarantee our platform’s high availability, with up to 99.5% uptime assurances.

What was your pre-Prometheus monitoring experience?

In the beginning, a proper monitoring system for our infrastructure was not one of our main priorities. We didn’t have as many projects and instances as we currently have, so we worked with other simple systems to alert us if anything was not working properly and get it under control.

  • We had a set of automatic scripts to monitor most of the operational metrics for the machines. These were cron-based and executed, using Ansible from a centralized machine. The alerts were emails sent directly to the entire development team.
  • We trusted Pingdom for external blackbox monitoring and checking that all our frontends were up. They provided an easy interface and alerting system in case any of our external services were not reachable.

Fortunately, big customers arrived, and the SLAs started to be more demanding. Therefore, we needed something else to measure how we were performing and to ensure that we were complying with all SLAs. One of the features we required was to have accurate stats about our performance and business metrics (i.e., how many migrations finished correctly), so reporting was more on our minds than monitoring.

We developed the following system:

  • The source of all necessary data is a status database in CouchDB. There, each document represents one status of an operation. This information is processed by the Status Importer and stored in a relational manner in a MySQL database.
  • A component gathers data from that database, with the information aggregated and post-processed into several views.
    • One of the views is the email report, which we needed for reporting purposes. This is sent via email.
    • The other view pushes data to a dashboard, where it can be easily controlled. The dashboard service we used was external. We trusted Ducksboard, not only because the dashboards were easy to set up and looked beautiful, but also because they provided automatic alerts if a threshold was reached.

With all that in place, it didn’t take us long to realize that we would need a proper metrics, monitoring, and alerting system as the number of projects started to increase.

Some drawbacks of the systems we had at that time were:

  • No centralized monitoring system. Each metric type had a different one:
    • System metrics → Scripts run by Ansible.
    • Business metrics → Ducksboard and email reports.
    • Blackbox metrics → Pingdom.
  • No standard alerting system. Each metric type had different alerts (email, push notification, and so on).
  • Some business metrics had no alerts. These were reviewed manually.

Why did you decide to look at Prometheus?

We analyzed several monitoring and alerting systems. We were eager to get our hands dirty and check whether a solution would succeed or fail. The system we decided to put to the test was Prometheus, for the following reasons:

  • First of all, you don’t have to define a fixed metric system to start working with it; metrics can be added or changed in the future. This provides valuable flexibility when you don’t know all of the metrics you want to monitor yet.
  • If you know anything about Prometheus, you know that metrics can have labels that abstract us from the fact that different time series are considered. This, together with its query language, provided even more flexibility and a powerful tool. For example, we can have the same metric defined for different environments or projects and get a specific time series or aggregate certain metrics with the appropriate labels:
    • http_requests_total{job="my_super_app_1",environment="staging"} – the time series corresponding to the staging environment for the app "my_super_app_1".
    • http_requests_total{job="my_super_app_1"} – the time series for all environments for the app "my_super_app_1".
    • http_requests_total{environment="staging"} – the time series for all staging environments for all jobs.
  • Prometheus supports a DNS service for service discovery. We happened to already have an internal DNS service.
  • There is no need to install any external services (unlike Sensu, for example, which needs a data-storage service like Redis and a message bus like RabbitMQ). This might not be a deal breaker, but it definitely makes the test easier to perform, deploy, and maintain.
  • Prometheus is quite easy to install, as you only need to download an executable Go file. The Docker container also works well and it is easy to start.
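Building on the label examples above, the query language can also aggregate those time series on the fly; for instance (using the same hypothetical http_requests_total metric):

```
# per-second request rate over the last 5 minutes,
# summed across all jobs but split by environment
sum by (environment) (rate(http_requests_total[5m]))
```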

How do you use Prometheus?

Initially we were only using some metrics provided out of the box by the node_exporter, including:

  • hard drive usage.
  • memory usage.
  • if an instance is up or down.

Our internal DNS service is integrated for service discovery, so every new instance is automatically monitored.
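DNS-based discovery of this kind takes only a few lines of configuration; the SRV record name below is a made-up example, not ShuttleCloud's actual zone:

```yaml
# prometheus.yml (fragment) -- discover node_exporter targets from
# an internal DNS SRV record; new instances are picked up automatically
scrape_configs:
  - job_name: 'node'
    dns_sd_configs:
      - names: ['_node-exporter._tcp.internal.example.com']
        type: 'SRV'
        refresh_interval: 30s
```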

Some of the metrics we used, which were not provided by the node_exporter by default, were exported using the node_exporter textfile collector feature. The first alerts we declared on the Prometheus Alertmanager were mainly related to the operational metrics mentioned above.
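The textfile collector mentioned above works by reading files in the Prometheus exposition format from a configured directory; a metric file might look like this (the metric name and value are illustrative, not ShuttleCloud's actual metrics):

```
# HELP backup_last_success_timestamp_seconds Unix time of the last successful backup.
# TYPE backup_last_success_timestamp_seconds gauge
backup_last_success_timestamp_seconds 1.474416000e+09
```

A cron job typically writes such a file to a temporary name and then renames it into the textfile collector directory, so the exporter never reads a half-written file.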

We later developed an operation exporter that allowed us to know the status of the system almost in real time. It exposed business metrics, namely the statuses of all operations, the number of incoming migrations, the number of finished migrations, and the number of errors. We could aggregate these on the Prometheus side and let it calculate different rates.

We decided to export and monitor the following metrics:

  • operation_requests_total
  • operation_statuses_total
  • operation_errors_total
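With those counters exported, Prometheus can derive the rates mentioned above at query time; for example (assuming the counters share compatible labels):

```
# fraction of operations that ended in an error over the last 5 minutes
sum(rate(operation_errors_total[5m]))
  /
sum(rate(operation_requests_total[5m]))
```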

We have most of our services duplicated in two Google Cloud Platform availability zones. That includes the monitoring system. It’s straightforward to have more than one operation exporter in two or more different zones, as Prometheus can aggregate the data from all of them and make one metric (i.e., the maximum of all). We currently don’t have Prometheus or the Alertmanager in HA — only a metamonitoring instance — but we are working on it.

For external blackbox monitoring, we use the Prometheus Blackbox Exporter. Apart from checking if our external frontends are up, it is especially useful for having metrics for SSL certificates’ expiration dates. It even checks the whole chain of certificates. Kudos to Robust Perception for explaining it perfectly in their blogpost.
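The certificate-expiry metric from the Blackbox Exporter lends itself to a simple alert expression; a sketch, with the 14-day threshold chosen arbitrarily:

```
# fire when any probed certificate chain expires within 14 days
probe_ssl_earliest_cert_expiry - time() < 86400 * 14
```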

We set up some charts in Grafana for visual monitoring in some dashboards, and the integration with Prometheus was trivial. The query language used to define the charts is the same as in Prometheus, which simplified their creation a lot.

We also integrated Prometheus with PagerDuty and created an on-call schedule for the critical alerts. For alerts that were not considered critical, we only sent an email.

How does Prometheus make things better for you?

We can’t compare Prometheus with our previous solution because we didn’t have one, but we can talk about what features of Prometheus are highlights for us:

  • It has very few maintenance requirements.
  • It’s efficient: one machine can handle monitoring the whole cluster.
  • The community is friendly—both dev and users. Moreover, Brian’s blog is a very good resource.
  • It has no third-party requirements; it’s just the server and the exporters. (No RabbitMQ or Redis needs to be maintained.)
  • Deployment of Go applications is a breeze.

What do you think the future holds for ShuttleCloud and Prometheus?

We’re very happy with Prometheus, but new exporters are always welcome (Celery or Spark, for example).

One question that we face every time we add a new alarm is: how do we test that the alarm works as expected? It would be nice to have a way to inject fake metrics in order to raise an alarm, to test it.
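One answer that has since appeared in the Prometheus toolchain is promtool's rule unit-testing support, which injects exactly the kind of fake series described above; a sketch (the file names, alert name, and series are invented for illustration):

```yaml
# test_alerts.yml -- run with: promtool test rules test_alerts.yml
rule_files:
  - alerts.yml            # assumed file containing an InstanceDown alert

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      - series: 'up{job="api", instance="host1"}'
        values: '1 1 0 0 0 0'   # instance goes down after two minutes
    alert_rule_test:
      - eval_time: 6m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              instance: host1
              job: api
```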

Developing Cloud Native Applications

By | Blog

By Ken Owens, Technologist and Innovation Engineer, currently CTO of Cloud Native Platforms at Cisco Systems, and Cloud Native Computing Foundation (CNCF) Technical Oversight Committee (TOC) representative

*Blog originally posted on DevNetCreate.io

Figure 1: Cloud Native Reference architecture

Software engineering and developer communities are driving the market for cloud consumption and leading each industry into a new era of software-defined disruption. There are no longer questions about elastic and flexible agile development as the way to innovate and reduce time to market for businesses. Open source software plays a key role in the digital transformation to cloud native and understanding how your business strategy needs to address this next disruption in software development is crucial to the success of your business.

Cloud Native applications are a combination of existing and new software development patterns. The existing patterns are software automation (infrastructure and systems), API integrations, and services oriented architectures. The new cloud native pattern consists of microservices architecture, containerized services, and distributed management and orchestration. The journey towards cloud native has started and many organizations are already testing with this new pattern. To be successful in developing cloud native applications it is important that you prepare for the cloud native journey and understand the impact to your infrastructure.

Preparing for the Journey

The first step in a successful business transformation is setting the vision and the goal for the organization. Many organizations struggle with transformation because they start with the technology first.

Technology is exciting and challenging, but lessons learned from the industry are not to start there, but with your business mission, vision, and your people.

At the outset of the transformation, it is critical to get your leadership, partners, and customers on board with your plans, set clear expectations, and gather feedback often. It is important to over-communicate at this initial stage and ensure that buy-in is strong. As you progress on the cloud native journey, you will need to get the support and cover from your leadership.

The next step is to assemble the right team and break down the vision for the journey into phases or steps that are further decomposed into development actions or sprints. It is critical to evaluate team members’ strengths and weaknesses against the organizational goals needed to accomplish your vision, and to invest upfront in training. Most good operations staff and engineers are interested in furthering their careers with training.

Lastly, evaluate technology choices and plan for technology integration with your existing back office, support, and IT systems including existing processes you have in place as an organization. You will need to work with other organizations across the business to identify skill sets needed and support the training or staff augmentation requirements of the other organizations.

Infrastructure Impact

If everyone is using the cloud and everyone is using the same services, your service will perform, fail, and be as secure as your competitors’. Perhaps you’re going for “good enough” service offerings, but businesses need differentiation, and you only get differentiation by taking advantage of both the cloud native software patterns and the latest technology advances in the underlying hardware infrastructure.

When evaluating the impact on infrastructure, it is important to start with the end goal of software defined, automated, and integrated in mind. Software defined is often an industry buzzword, but it is very important to look at your existing network, compute, virtualization, storage, and security solutions as a set of software abstractions that can be programmatically configured (set) and consumed (get). Software defined infrastructure comprises the infrastructure services (network, compute, storage, and security) that can be configured for the business services to be deployed into; alternatively, the business can deploy a set of services leveraging these abstraction layers, often called blueprints.

Automated infrastructure is the ability to leverage APIs for provisioning a set of services and configuration primitives that deploy pre-defined sets of configuration and then validate the installation and readiness for the application to be deployed. Automation is key for cloud native, as applications need to be able to configure and re-configure in real time based on a number of inputs. These inputs range from failure states, to user demand, to application performance.
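One common shape for this kind of automation is a reconciliation step: compare the desired state against what is actually running, and derive the provisioning actions needed to converge. The sketch below is a minimal, hypothetical illustration of that idea (the service names and action tuples are invented, not drawn from any specific product):

```python
# Hypothetical sketch of an infrastructure reconciliation step:
# compare the desired set of service instances against what is
# observed to be running, and derive the actions needed to converge.

def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions needed to converge observed state to desired.

    desired/observed map service name -> replica count.
    """
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if want > have:
            actions.append(("scale_up", name, want - have))
        elif want < have:
            actions.append(("scale_down", name, have - want))
    for name in observed:
        if name not in desired:
            actions.append(("deprovision", name, observed[name]))
    return actions

print(reconcile({"web": 3, "db": 1}, {"web": 1, "cache": 2}))
```

Running such a loop continuously, driven by the failure, demand, and performance inputs mentioned above, is what lets the infrastructure re-configure itself in real time.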

Integrated infrastructure considerations are often a secondary thought; however, they should be part of your initial planning. Many applications have dependencies that are internal to the IT infrastructure (i.e., behind the firewall). These dependencies can be database-related, IT compliance-related, OSS-related, or BSS-related. Often, multiple dependencies are discovered as part of the application composition exercise. When evaluating your existing infrastructure, it is important to look at the integration points that your application’s dependent services have on the cloud native architecture. Many of these integration points can be abstracted as a set of services and API endpoints.

Defining Business Services

Once you have prepared your organization for the journey and addressed the infrastructure impact across existing IT, back office, and cloud native technologies, it is time to define business services for the application(s) you are developing. In general, the best way to define business services is to understand the four subsystems of a cloud native architecture:

  • Application Composition
  • Policy and Event Framework
  • Application Delivery
  • Common Control and Ops

Figure 2: Cloud Native Architecture Sub-Systems

There are several ways to decompose your application, and in general there is not a specific right or wrong way. The best guideline from experience is to start by looking at the composition of your application as a set of services. Application composition is as much an art as a science. My recommendation is to look at the application composition as a black box with three components:

Figure 3: Cloud Native Application Composition

  1. Internals of the application black box, which consists of the application design leveraging the functions that comprise the application logic.
  2. Northbound interfaces from the black box, which consist of external API interfaces to the customer and to external services (external to your firewall).
  3. Southbound interfaces from the black box, which consist of internal API interfaces to internal services, including OSS and BSS.

The application design (the internals of the black box) in a cloud native design methodology should be thought of as a set of function calls and dependencies. Each function should be independent of the other functions and operate independently as much as possible. State and scalability should be completely independent.
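To make the independence of state and scalability concrete, here is an illustrative sketch (not from the article): each application function is stateless, receiving its inputs and returning a result, while any state lives in an external store that can scale on its own. The store class and function below are invented for illustration:

```python
# Illustrative sketch: application functions stay stateless so any
# replica can serve any request, while state lives in an external
# store (a stand-in here for a shared key-value service).

class ExternalStore:
    """Stand-in for a shared state service (e.g. a key-value store)."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

def record_order(store: ExternalStore, user: str, amount: float) -> float:
    # Stateless function: the running total is read from and written
    # back to the external store, never held in the process itself.
    total = store.get(user, 0.0) + amount
    store.put(user, total)
    return total

store = ExternalStore()
record_order(store, "alice", 10.0)
print(record_order(store, "alice", 5.0))
```

Because no replica holds state of its own, the function tier and the state tier can be scaled (and fail) independently, which is exactly the property the design methodology above calls for.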

Each function needs a defined set of policies and events to enable the scalability and resiliency of the service functionality, as well as an independent set of common control and operational primitives so that each function’s health and control can be managed independently of the other application components. The cloud native application composition can then be codified into a Cloud Native Application Blueprint, as shown below:

Figure 4: Cloud Native Application Composition Blueprint

As mentioned above, the most complicated aspect of application composition consists of the external and internal services the application depends on. The business models that leverage the OSS and BSS components need to be evaluated in light of business process and function. Some of these services can be containerized while others cannot. How to enable a cloud native application to integrate with the OSS and BSS services should not be overlooked.

In addition, external services, especially from cloud providers, are very common. The top concern experienced in cloud native deployments is the latency between the business application services and the external services they consume. This latency can manifest as timeouts, packet loss, and delays that cause user experience issues. To address this aspect of your cloud native design, you need to understand the networking impact of your external interfaces and leverage DNS and SDN controllers to optimize routing within the application services.
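Alongside routing optimization, application code usually needs a defensive wrapper around external calls so that transient latency spikes do not surface as user-facing failures. The sketch below shows one common form of that pattern (a retry loop with exponential backoff); the function names and the simulated service are invented for illustration:

```python
import time

# Sketch of a defensive pattern for external-service calls: retry
# on timeout with exponential backoff, so transient latency spikes
# are absorbed rather than propagated to the user.

def call_with_retries(fn, attempts=3, base_delay=0.01):
    """Invoke fn(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # budget exhausted; let the caller handle it
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_external_service():
    # Simulates an external dependency that times out twice
    # before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated network delay")
    return "ok"

print(call_with_retries(flaky_external_service))
```

In a real deployment the retry budget would be tuned against the overall request deadline, since unbounded retries can amplify load on an already-slow dependency.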

Deploy Business Services

Application deployment must be kept separate from application composition. Application portability is a key business requirement, and one of the more reliable methods to achieve it is to decouple the application code from the underlying deployment target.

The following guidelines are based on best practices for application portability:

  • Deploying the application into different environments (dev, test, production), each of which can run on different targets (laptop, server, bare metal, private cloud, or public cloud)
  • Deploying to different locations (data center(s), availability zones, geo-location constraints)
  • Continuous Integration and Continuous Delivery of the application services across environments, locations, and hybrid models to ensure that the application and its services are continuously refreshed
  • Continually improve and optimize performance and scale. Once the basics and some of the underlying issues of the technology are understood, the team can then focus on improvements from a process and technology perspective. Sprints should incorporate user stories to address performance and stability issues. Deployments can then reflect these improvements along with enhancements to the underlying infrastructure, BSS, and OSS aspects.
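The first guideline above, running the same artifact unchanged across environments, is often realized by reading all environment-specific settings from the process environment rather than baking them into the code. The settings and defaults below are invented for illustration:

```python
import os

# Illustrative sketch: decouple application code from its deployment
# target by sourcing environment-specific settings from the process
# environment. The same artifact then runs on a laptop, private
# cloud, or public cloud unchanged.

def load_config(env=None):
    env = os.environ if env is None else env
    return {
        "db_url": env.get("DB_URL", "sqlite:///dev.db"),  # dev default
        "region": env.get("REGION", "local"),
        "log_level": env.get("LOG_LEVEL", "DEBUG"),
    }

# Same code, different environment -> different deployment target.
print(load_config({"DB_URL": "postgres://prod-db/app",
                   "REGION": "eu-west-1"}))
```

Nothing in the application code names a specific cloud or data center; the deployment pipeline supplies the environment, which is what makes the portability guidelines above achievable in practice.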

Join the Community

Join CNCF in San Francisco May 23 and 24th for DevNet Create 2017, where we will hear from Dan Kohn, Executive Director of CNCF on Migrating Legacy Monoliths, a panel on “Becoming Cloud Native: Taking it One Container at a Time”, and many other sessions on Cloud and DevOps, IoT and Apps, and more!

Diversity Scholarship Series: My experience at CloudNativeCon + KubeCon Europe 2017

By | Blog

CNCF offered six diversity scholarships to developers to attend CloudNativeCon + KubeCon Europe 2017. In this post, our scholarship recipient Konrad Djimeli, University of Buea student, shares his experience meeting the community, participating in technical sessions and bringing his experience back to his community in Africa. Anyone interested in applying for the CNCF diversity scholarship to attend CloudNativeCon + KubeCon North America 2017 in Austin December 6-8 can submit an application here. Applications are due October 13th.

By Konrad Djimeli, Software Developer, Community organizer (GDG Buea), GitHub Campus Expert and Google Summer of Code 2015 and 2017 Software engineering intern

Being a CloudNativeCon + KubeCon Europe 2017 Diversity Scholarship recipient was actually a life changing experience for me. I came across this scholarship while browsing the Linux Foundation website for upcoming events. When I stumbled on this event and realized it had a scholarship opportunity, I was very happy. I am passionate about cloud computing and container related technologies, which is what the conference was all about. I applied for the scholarship knowing that it would give me the opportunity to learn from experts and also gain experience, which I could share with members of my community to inspire and motivate them.

One day, I opened my email inbox and had received an email saying I had been selected as a CloudNativeCon + KubeCon Europe 2017 scholarship recipient. This was like a dream to me, as I had never had the opportunity to attend any conference outside Cameroon. This was also going to be my first time traveling out of my country. My contact at the Linux Foundation, Katie Schultz, was very helpful in enabling me to obtain a visa and every other requirement for my travel and accommodation. When I arrived in Berlin, just the trip alone was an interesting experience, and my hotel was just a few minutes’ walk from the Berlin Congress Center, where the conference was taking place. I went to the conference hall after arriving and obtained my conference badge.

While at the conference center, the meals were great and the talks were very interesting, although I got lost during some of the talks because they required certain technical knowledge I do not possess yet. Even so, I was inspired and motivated to work hard and improve my technical knowledge. I was excited to visit the sponsor booths, and it was great to talk with employees from some of the tech giants of the world like Google, Microsoft and others. I had a very interesting chat with some employees at Bitnami, including the CEO Daniel Lopez Ridruejo, who was very humble and down to earth.

Photo Caption: Konrad and Daniel Lopez Ridruejo

Talking with Sebastien Goasguen, Senior Director of Cloud at Bitnami, led me to start contributing code to a project developing Jupyter Notebooks for the Kubernetes Python Client. These notebooks could be used by others to learn about and explore Kubernetes and its functionality interactively, and the project is currently hosted on GitHub. This is actually going to be the topic of my Google Summer of Code (GSOC) project this summer.

I was also very glad when I got the chance to meet with Katie Schultz who had been very helpful in making it possible for me to be at the conference.

Photo Caption: Konrad and Katie Schultz

Everyone I met at the conference was very smart and hardworking and it made me realize how much we as developers in Africa have to work in order to become world class developers.

After the conference, I returned to Cameroon and sometimes feel like it was all a dream, as the experience seemed too good to be real. As a member of the Silicon Mountain community in Cameroon, I have shared my experience with other members of my community and they have all been motivated. I have also shared with them the importance of cloud and container technologies in our community and how integrating these technologies into our applications could improve their performance and maintenance.

I think it is extremely necessary for developers from African communities like mine to get an opportunity to attend such an international tech conference. This experience gives us so much awareness of the hard work required to achieve our goals.

Where In the World is CNCF? Find Us at These Community Events 🗓

By | Blog

Throughout the next few weeks, CNCF is sponsoring, speaking and exhibiting at a number of exciting community events, including: Amazonia, OpenStack Summit Boston, OSCON, DevNet Create, Open Source Summit Japan, CoreOS Fest and LinuxCon + ContainerCon + CloudOpen China.

These open source conferences help foster collaborative conversation and feature expert insight around containerized applications, microservices, the modern infrastructure stack, uniting women in tech, Kubernetes and much more.

We hope to see you at our booths, meetups and talks during the following events!


Amazonia

May 6, 2017


As London is home to many Women Who Code meetups and workshop events, Amazonia is a hyperlocal gathering open to all female-identified (cis and trans) and non-binary people – showcasing technical might, promoting diversity in tech and fostering cross-community relationships.

CNCF is a proud sponsor of Amazonia!

OpenStack Summit Boston

May 8-11, 2017


On May 9, from 8 AM – 5 PM, CNCF will host a Kubernetes Day as part of OpenStack’s Open Source Days. Kubernetes is one of the highest-velocity projects in the history of open source; please join us for a deep dive into the system and to learn how Kubernetes is changing computing.

CNCF will also be exhibiting on the show floor all week. Our booth, staffed with the Foundation team and member company technologists, will be located at C20 – don’t forget to stop by!

Don’t miss “Migrating Legacy Monoliths to Cloud Native Microservices Architectures on Kubernetes” – a session from Dan Kohn, executive director of the Cloud Native Computing Foundation, at 4:10 PM, Thursday, May 11 in MR 210.


OSCON

May 8-11, 2017


CNCF will be exhibiting at Austin Convention Center (Hall 4) Booth 609 all week. Don’t miss the following presentations and speaking engagements from CNCF community members and ambassadors:

Monday, May 8
Kubernetes Hands-On – Kelsey Hightower, 1:30pm–5:00pm

Location: Ballroom F

Tuesday, May 9
From zero to distributed traces: An OpenTracing tutorial – Ben Sigelman, Yuri Shkuro, Priyanka Sharma

Location: Ballroom E

Wednesday, May 10
Shifting to Kubernetes on OpenShift – Seth Jennings, 2:35pm–3:15pm

Location: Meeting Room 14

Thursday, May 11
Hands-on with containerized infrastructure services – Shannon Williams, Darren Shepherd

Location: Ballroom E

DevNet Create

May 23-24, 2017

San Francisco

On May 23, CNCF ambassador Val Bercovici; Mackenzie Burnett, product at CoreOS; Mark Thiele, chief strategy officer at Apcera; and Stephen Day, senior software engineer at Docker, will participate in a panel titled “Becoming Cloud Native: Taking it One Container at a Time.”

On the same day, Dan Kohn will also present “Migrating Legacy Monoliths to Cloud Native Microservices Architectures on Kubernetes.”

Open Source Summit Japan

May 31-June 2


CNCF is a proud Gold Sponsor of the Summit and will host a Fluentd Mini Summit on May 31 during the event, which will introduce attendees to the CNCF technology project, cloud native logging and more. To view the schedule, learn about the sessions and register to attend the Fluentd Mini Summit, please visit http://bit.ly/2q5Lqwi.

On day one, attendees will hear Dan Kohn present Migrating Legacy Monoliths to Cloud Native Microservices Architectures on Kubernetes at 11 AM in Room 1.

Also on May 31, do not miss a panel discussion on “The Future is Cloud Native: How Projects Like Kubernetes, Fluentd, OpenTracing, and Linkerd Will Help Shape Modern Infrastructure,” moderated by Chris Aniszczyk, COO of CNCF, and featuring speakers Keiko Harada, Program Manager at Microsoft; Ian Lewis, Developer Advocate at Google; and Eduardo Silva, Open Source Engineer at Treasure Data.

CoreOS Fest

May 31-June 1

San Francisco

CNCF is proud to be a Star Sponsor and can’t wait to gather with systems architects, DevOps engineers, sysadmins, application developers, security engineers and more at Pier 27 later this month!

LinuxCon + ContainerCon + CloudOpen China

June 19-20, 2017


On June 20, catch Dan Kohn presenting “Migrating Legacy Monoliths to Cloud Native Microservices Architectures on Kubernetes” and Chris Aniszczyk presenting “How the Open Container Initiative (OCI) is Setting Standards for Container Format and Runtime.”

Kubernetes Making a Splash at OpenStack Summit Boston (May 7-11)

By | Blog

By: OpenStack Special Interest Group leaders, Ihor Dvoretskyi from Mirantis and Steve Gordon from Red Hat, highlighting the status of collaboration between Kubernetes and OpenStack

OpenStack Summit is happening next week in Boston (May 7-11). It is one of the most important and notable bi-annual events of the cloud computing world, especially from the world of open source cloud computing.

This summit will feature an increased number of high-quality tracks, talks, panels and other community-gathering events that are dedicated not just to OpenStack, but to other “friendly” technologies from the world of open source. These technologies are used by many in conjunction with OpenStack, and one of the most noteworthy among them is Kubernetes.

OpenStack is an open source solution that allows you to build your own cloud, leveraging both the OpenStack software itself and the ecosystem around it, on hardware you control – whether in a single location or on a distributed, worldwide, multi-datacenter infrastructure. At the same time, end users need to run a mix of applications – not just pure virtual machines, but also bare-metal and containerized workloads – and here Kubernetes is increasingly being used to assist with the orchestration of these workloads.

Kubernetes brings a different layer of abstraction between infrastructure (OpenStack) and end-user applications. Kubernetes provides the ability to manage containerized applications providing enough contextual awareness to take advantage of the capabilities of the underlying clouds while ensuring the technical independence of the applications themselves from that infrastructure.

In this context OpenStack exposes a rich set of infrastructure-level services for distributed applications to consume via Kubernetes, much as the Linux kernel traditionally exposed hardware resources on a single physical host for consumption by userspace processes. Together, OpenStack and Kubernetes provide the workflow automation and repeatability required to scale these concepts across a distributed cluster. Here OpenStack works as a cloud provider for Kubernetes, and several Kubernetes-related projects (for example kargo, Tectonic, and OpenShift) address the deployment and lifecycle management of Kubernetes on OpenStack.
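For concreteness, the cloud-provider integration of that era was typically enabled by pointing Kubernetes components (kubelet, controller manager) at an OpenStack credentials file via the `--cloud-provider=openstack --cloud-config=...` flags. The fragment below is illustrative only; the exact field names and the supported sections depend on the Kubernetes release in use, so consult the cloud-provider documentation for your version:

```ini
# /etc/kubernetes/cloud.conf -- illustrative excerpt, not a
# complete or version-accurate configuration.
[Global]
auth-url=https://keystone.example.com:5000/v2.0
username=kube
password=<secret>
tenant-id=<project-id>
```

With this in place, Kubernetes can ask OpenStack to provision load balancers and block storage volumes on behalf of workloads, which is the "cloud provider" role described above.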

Kubernetes increasingly also plays a different role for OpenStack – it can act as an underlay for the containerization and management of the OpenStack services themselves. OpenStack projects including Kolla-Kubernetes and OpenStack-Helm are actively pursuing this approach to evolving the deployment and management of OpenStack itself.

At OpenStack Summit Boston, besides the general track, where you will find many Kubernetes-related talks, panels and other community-gathering events (schedule), the Cloud Native Computing Foundation and the OpenStack Foundation are organizing a community day specifically dedicated to Kubernetes – Kubernetes Day. Refer to the recent article in Superuser, the OpenStack Foundation’s publication highlighting the contributions of superusers, for more information and background on this event focused on interoperability of not just technologies but communities. Also this year, at the first OpenStack Forum – an enhanced version of what was formerly part of the Developers and Operators Summits, co-located with the main OpenStack Summit – there will be a Kubernetes Ops on OpenStack session. This session will focus on gathering feedback from users and determining how to apply it to the future development roadmaps of both projects.

In the Kubernetes community, the OpenStack Special Interest Group is the focal point for cross-collaboration between the OpenStack and Kubernetes communities to build integrated solutions. We are always looking for new contributors to join us on this journey; if you are interested, jump on the mailing list and say hi!

CNCF Brings Kubernetes, CoreDNS, OpenTracing and Prometheus to Google Summer of Code 2017

By | Blog

The Google Summer of Code (GSOC) program allows university students (over the age of 18) from around the world to spend their summer breaks writing code and learning about open source development. Accepted students work with a mentor and become part of the open source community. Now in its 13th year, the program has so far accepted 12,000+ students from 104 countries to work on 568 open source projects, writing over 30 million lines of code.

201 organizations were accepted to participate in GSOC 2017 to bring new, excited developers into their communities and the world of open source. The Cloud Native Computing Foundation is proud to be one of these organizations, bringing on seven interns this summer. Mentors were paired with interns to help advance the following CNCF projects: four Kubernetes, one CoreDNS, one OpenTracing and one Prometheus.

“As a former GSOC mentor, I have seen the amazing impact this program has on the students, projects and larger open source community. CNCF is very proud to have 7 projects in the 2017 program that cover a range of our cloud native technologies. We look forward to watching the progress and results of the students over the summer.” – Chris Aniszczyk (@cra)

Additional details on the projects, mentors, and students can be found below. Coding begins May 30, and we’ll report back on their progress in a few months.


Create and Implement a Data Model to Standardize Kubernetes Logs

Student: Amit Kumar Jaiswal, UIET CSJM University

Mentor: Miguel Perez Colino, Red Hat

This project aims to build and implement a data model for logs in a large Kubernetes cluster that makes them easier to process, correlate, and query, easing troubleshooting and reducing the time to find root causes.
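As a toy illustration of what such a data model might normalize, the sketch below maps heterogeneous container log lines (structured JSON or plain text) into one queryable record shape. The field names here are hypothetical, not the project's actual schema:

```python
import json

# Hypothetical sketch: normalize heterogeneous container log lines
# into a single record shape so logs from many pods can be
# correlated and queried. Field names are illustrative only.

def normalize(namespace, pod, container, raw_line):
    try:
        payload = json.loads(raw_line)        # structured app log
        if not isinstance(payload, dict):
            raise ValueError
        message = payload.get("msg", raw_line)
        level = payload.get("level", "INFO")
    except ValueError:                        # plain-text log line
        message, level = raw_line, "INFO"
    return {
        "kubernetes": {"namespace": namespace, "pod": pod,
                       "container": container},
        "level": level,
        "message": message,
    }

rec = normalize("prod", "web-1", "nginx",
                '{"level": "ERROR", "msg": "boom"}')
print(rec["level"], rec["message"])
```

Once every line carries the same envelope of cluster metadata, a query like "all ERROR lines from namespace prod in the last hour" becomes a simple filter, which is the troubleshooting win the project description is after.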

Develop a Set of Jupyter Notebooks for the Kubernetes Python Client + Kubernetes Python Client Update

Student: Konrad Djimeli, University of Buea       

Mentor: Sebastien Goasguen, Skippbox (acquired by Bitnami)

The Kubernetes Python client is a Kubernetes incubator project that makes it possible to access Kubernetes from Python. Jupyter Notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process. The aim of this project is to develop a set of notebooks that highlight the Kubernetes primitives. The project also includes updating the Python client to make certain operations easier for users.
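A notebook cell built on the client might, for example, summarize cluster workloads. Since no live cluster can be assumed here, the sketch below separates the summarizing step so it runs against a hand-written sample; in a real notebook, `pod_list` would come from the client (e.g. `client.CoreV1Api().list_pod_for_all_namespaces()`), and the dict shape below only mirrors, rather than exactly matches, that response:

```python
# Illustrative sketch of the kind of helper such notebooks might
# wrap around the Kubernetes Python client. The sample data below
# stands in for a live API response so the example is self-contained.

def pods_by_phase(pod_list):
    """Count pods per lifecycle phase (Running, Pending, ...)."""
    counts = {}
    for item in pod_list["items"]:
        phase = item["status"]["phase"]
        counts[phase] = counts.get(phase, 0) + 1
    return counts

sample = {"items": [
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Pending"}},
]}
print(pods_by_phase(sample))
```

Pairing small helpers like this with live client calls is what makes a notebook an effective interactive tour of the Kubernetes primitives.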

Improve ThirdPartyResources  

Student: Nikhita Raghunath, Veermata Jijabai Technological Institute (Mumbai)

Mentor: Stefan Schimanski, Red Hat

ThirdPartyResources are already available, but the implementation has languished, with multiple outstanding capabilities missing; it never completed the list of requirements for graduating to beta. Hence, there are multiple problems in the current implementation of ThirdPartyResources. This project works through a number of known shortcomings to drive the ongoing effort toward a stable TPR release.

Integrate Unikernel Runtime

Student: Hao Zhang, Zhejiang University, Computer Science (master)

Mentor: Harry Zhang, Lob and Pengfei Ni, HyperHQ

This work will focus on why and how to integrate unikernel technology as a runtime into the Kubernetes/frakti project. This will allow Kubernetes to use a unikernel instance just like it uses Docker, which will eventually open Kubernetes up to more application scenarios.


CoreDNS: Middleware

Student: Antoine Debuisson (University of Paris-Sud)

Mentor: Miek Gieben, CoreDNS and John Belamaric, Infoblox

The goal of the project is to capture the DNS data within a CoreDNS middleware and write it to a “dnstap log file” (perhaps over the network).

Codebase to build upon:


Instrument OpenTracing with Go-restful Web Framework

Student: Liang Mingqiang, Hyogo University and Carnegie Mellon University

Mentor: Ted Young, LightStep and Wu Sheng, OpenTracing

Go-restful (https://github.com/emicklei/go-restful) is a widely-used library for building REST-style web services in the Go programming language. With powerful built-in modules, including intelligent request routing, RESTful support and filters for intercepting HTTP requests, go-restful makes it very convenient to build a web application from scratch. This proposal aims to instrument go-restful with OpenTracing.
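The general shape of such instrumentation (shown here as a language-neutral sketch in Python; the actual project targets go-restful's filter mechanism in Go) is a wrapper that opens a trace span before each handler runs and finishes it afterwards. The `Span`/`Tracer` classes below are invented stand-ins, not the OpenTracing API:

```python
import time

# Language-neutral sketch of request-tracing instrumentation: a
# filter starts a span per request, runs the handler, then records
# the span with its operation name and duration. The classes here
# are illustrative stand-ins, not the OpenTracing API.

class Span:
    def __init__(self, operation):
        self.operation = operation
        self.start = time.time()
        self.duration = None

    def finish(self):
        self.duration = time.time() - self.start

class Tracer:
    def __init__(self):
        self.finished = []

    def start_span(self, operation):
        return Span(operation)

    def record(self, span):
        span.finish()
        self.finished.append(span)

def traced(tracer, operation, handler):
    """Wrap a request handler so every call is recorded as a span."""
    def wrapper(request):
        span = tracer.start_span(operation)
        try:
            return handler(request)
        finally:
            tracer.record(span)  # recorded even if the handler raises
    return wrapper

tracer = Tracer()
get_user = traced(tracer, "GET /users/{id}", lambda req: {"id": req})
get_user(42)
print([s.operation for s in tracer.finished])
```

In go-restful the equivalent hook is a route filter, so existing handlers gain tracing without modification, which is the appeal of instrumenting at the framework level.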


Storage and Query Engine Improvements to Prometheus

Student: Goutham Veeramachaneni, Indian Institute of Technology, Hyderabad

Mentor: Ben Kochie, SoundCloud, Fabian Reinartz, CoreOS and Julius Volz, Prometheus

While the Prometheus monitoring system solves most use cases, improvements will further reduce the already minimal load on the ops team, including checking alerts over time, unit-testing alerts, backups, and figuring out which queries OOM.

Meeting Challenges in Using and Deploying Containers

By | Blog

The Cloud Native Computing Foundation (CNCF) surveyed attendees at CloudNativeCon + KubeCon in late 2016 on a range of topics related to container management and orchestration. In a previous blog, we examined the implications of the survey results, in particular how Kubernetes had advanced from the test bench to real-world production deployments in the course of the preceding year.

An equally interesting data set coming out of that survey, and from earlier surveys conducted by Google, highlighted challenges respondents faced as they increasingly used and deployed applications with containers. Respondents could include multiple topics in their responses and, in the earlier Google surveys, also insert freeform commentary.

Figure 1 summarizes the three response sets from the CloudNativeCon + KubeCon (Nov. ‘16) and previous Google surveys (June and March ‘16):

Figure 1.  Challenges to Adoption
(Non-exclusive Survey Responses)

Characterizing the Challenges

Let’s take a moment to examine the leading concerns in the survey, and also in the earlier ones:

Networking – 50 percent of CloudNativeCon + KubeCon respondents pointed to “networking” as their greatest challenge. Those who articulated their concerns further focused on:

  • Debugging network connectivity, especially across managed containers and containers deployed on geographically disparate clouds
  • More configurable and secure networking with multi-tenancy

Cloud native networking certainly appeals to networking engineers as well as a growing number of developers who increasingly find it a part of their daily work. In this recent CNCF webinar, Christopher Liljenstolpe, CTO of Tigera and Founder of Project Calico, and Brian Boreham, Director of Engineering at Weaveworks, dive into networking for containers and microservices.

Security – as production deployment increases, so do security risks, in particular for containers hosting Internet- and customer-facing applications. With 42 percent of CloudNativeCon + KubeCon respondents highlighting security, specific concerns included:

  • Applying security patches and updates to container contents
  • Network isolation and secure isolation/communication among managed containers
  • Understanding the scope of potential attack surfaces

Storage & Resource Management – “storage” led the responses for the earlier Google surveys, and almost half (42 percent) of respondents at CloudNativeCon + KubeCon still voiced concerns in this area:

  • Lack of appropriate and accessible network storage
  • Secure and standards-compliant network storage (e.g., for HIPAA)
  • Persistent and performant storage
  • Meeting legacy storage requirements and storage portability
  • Better load management
  • Standardization of / patterns for file systems and container layouts

Complexity – 39 percent of CloudNativeCon + KubeCon respondents also cited “complexity” as a challenge, and certainly issues with networking, security and storage contribute to these concerns.

Logging and Monitoring – also high on respondents’ list of concerns at 42 percent was logging and monitoring, in particular:

  • The need for more detailed k8s manifests
  • More insight into operational metrics
  • More robust application logging capabilities

Meeting the Challenges

The challenges posed by respondents are being addressed incrementally by CNCF project developers and the container management ecosystem.  In particular, CNCF projects that address the above technical hurdles in networking, security and storage, and also logging, tooling and automation, and beyond, include:

Linkerd – A resilient service mesh for cloud native apps, including a transparent proxy that adds service discovery, routing, failure handling, and visibility to modern software applications.

Learn more at https://linkerd.io/

Fluentd – A data collector for a unified logging layer. Fluentd lets you unify data collection and consumption for better use and understanding of data.

Learn more at http://www.fluentd.org/

Kubernetes – Kubernetes itself focuses on automating deployment, scaling, and management of containerized applications. It groups the containers that make up an application into logical units for easy management and discovery.

Learn more at https://kubernetes.io/

Prometheus – A systems monitoring and alerting toolkit with a very active developer and user community. Prometheus works well for recording numeric time series, and it fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures.

Learn more at https://prometheus.io/

The CNCF Cloud Native Landscape Project categorizes many of the most popular projects and startups in the cloud native space. This is another resource where people can find technologies that might help solve their technical challenges. It is under development by CNCF, Redpoint and Amplify.

Learn more about all CNCF projects at https://www.cncf.io/projects.
