


Decoding the Self-Healing Kubernetes: Step by Step


Member Post

Guest post originally published on the Msys Technology blog by Atul Jadhav 


A business application that fails to operate 24/7 would be considered inefficient in the market. The idea is that applications run uninterrupted regardless of a technical glitch, feature update, or natural disaster. In today’s heterogeneous environments, where infrastructure is intricately layered, a continuous application workflow is possible only via self-healing.

Kubernetes, a container orchestration tool, facilitates the smooth working of applications by abstracting away the underlying physical machines. Moreover, the pods and containers in Kubernetes can self-heal.

Captain America asked Bruce Banner in The Avengers to get angry so he could transform into the Hulk. Bruce replied, “That’s my secret, Captain. I’m always angry.”

You must have understood the analogy here. Let’s simplify – Kubernetes will self-heal organically, whenever the system is affected.

Kubernetes’s self-healing property ensures that clusters always function in their optimal state. Kubernetes can self-detect two types of object status – PodStatus and ContainerStatus. Kubernetes’s orchestration capabilities can monitor and replace unhealthy containers as per the desired configuration. Likewise, Kubernetes can fix pods, which are the smallest units encompassing one or more containers.

The three container states are:

  1. Waiting – created but not running. A container in the Waiting state is still performing operations such as pulling images or applying secrets. To check the status of a Waiting container, use the command below.

kubectl describe pod [POD_NAME]

Along with this state, a reason and a message are displayed to provide more information.

State: Waiting
Reason: ErrImagePull
  2. Running – the containers are executing without issues. The same kubectl describe pod command shows the Running state along with the time the container entered it.

  State:          Running
    Started:      Wed, 30 Jan 2019 16:46:38 +0530
  3. Terminated – a container that fails or completes its execution stands terminated. The same command shows the Terminated state along with the start and finish times of the container.

State:          Terminated
Reason:       Completed
Exit Code:    0
Started:      Wed, 30 Jan 2019 11:45:26 +0530
Finished:     Wed, 30 Jan 2019 11:45:26 +0530
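The Waiting state with reason ErrImagePull shown earlier can be reproduced with a minimal pod manifest like this sketch (the pod name and the broken image tag are made up for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: waiting-demo                       # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: nginx:this-tag-does-not-exist   # nonexistent tag forces ErrImagePull
```

After applying it, kubectl describe pod waiting-demo should show State: Waiting with Reason: ErrImagePull (or ImagePullBackOff once retries begin).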

Kubernetes’ self-healing concepts – the pod phase, probes, and the restart policy

The pod phase in Kubernetes offers insight into the pod’s placement. We can have:

  • Pending Pods – created but not yet running
  • Running Pods – running all of their containers
  • Succeeded Pods – successfully completed the container lifecycle
  • Failed Pods – at least one container failed and all containers have terminated
  • Unknown Pods – the state of the pod cannot be determined, typically because the node hosting it cannot be reached

Kubernetes executes liveness and readiness probes for the pods to check whether they function as per the desired state. The liveness probe checks a container for its running status. If a container fails the probe, Kubernetes terminates it and creates a new container in accordance with the restart policy. The readiness probe checks whether a container is able to serve service requests. If a container fails the probe, Kubernetes removes the pod’s IP address from the endpoints of the matching services.

Liveness probe example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - args:
    - /server
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        # when "host" is not defined, "PodIP" will be used
        # host: my-host
        # when "scheme" is not defined, "HTTP" scheme will be used. Only "HTTP" and "HTTPS" are allowed
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15
      timeoutSeconds: 1
    name: liveness

The probe handlers include:

  • ExecAction – executes a specified command inside the container.
  • TCPSocketAction – performs a TCP check against the container’s IP address on a specified port.
  • HTTPGetAction – performs an HTTP GET request against the container’s IP address.
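For comparison with the HTTP GET example above, the other two handler types can be sketched roughly as follows (the container names, images, file path, and port are illustrative assumptions, not from the original post):

```yaml
containers:
- name: exec-probe-demo            # hypothetical container
  image: busybox
  livenessProbe:
    exec:                          # ExecAction: healthy while the command exits 0
      command: ["cat", "/tmp/healthy"]
    initialDelaySeconds: 5
    periodSeconds: 5
- name: tcp-probe-demo             # hypothetical container
  image: redis
  livenessProbe:
    tcpSocket:                     # TCPSocketAction: healthy while the port accepts connections
      port: 6379
    initialDelaySeconds: 15
    periodSeconds: 20
```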

Each probe gives one of three results:

  • Success: The Container passed the diagnostic.
  • Failure: The Container failed the diagnostic.
  • Unknown: The diagnostic failed, so no action should be taken.
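A readiness probe is declared the same way as a liveness probe; a minimal sketch, assuming a service that exposes a /ready endpoint on port 8080:

```yaml
readinessProbe:
  httpGet:
    path: /ready         # assumed endpoint, adjust for your service
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3    # marked not-ready after 3 consecutive failures
```

Unlike a failed liveness probe, a failed readiness probe does not restart the container; the pod is simply removed from the matching services’ endpoints until the probe passes again.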

Demo description of Self-Healing Kubernetes – Example 1

We need to set the replica count to trigger the self-healing capability of Kubernetes.

Let’s look at an example Nginx deployment file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-sample
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

In the above manifest, replicas: 4 specifies that the total number of pods across the cluster must be 4.

Let’s now deploy the file.

kubectl apply -f nginx-deployment-sample.yaml

Let’s list the pods, using

kubectl get pods -l app=nginx

The output lists the 4 pods we have created.

Let’s delete one of the pods.

kubectl delete pod nginx-deployment-test-83586599-r299i

The pod is now deleted. We get the following output

pod "nginx-deployment-test-83586599-r299i" deleted

Now again, list the pods.

kubectl get pods -l app=nginx

The output shows that we have 4 pods again, despite deleting one.

Kubernetes has self-healed, creating a new pod to maintain the count of 4.

Demo description of Self-Healing Kubernetes – Example 2

Get pod details

$ kubectl get pods -o wide

Get first nginx pod and delete it – one of the nginx pods should be in ‘Terminating’ status

$ NGINX_POD=$(kubectl get pods -l app=nginx --output=jsonpath="{.items[0].metadata.name}")
$ kubectl delete pod $NGINX_POD; kubectl get pods -l app=nginx -o wide
$ sleep 10

Get pod details – one nginx pod should be freshly started

$ kubectl get pods -l app=nginx -o wide

Get deployment details and check the events for recent changes

$ kubectl describe deployment nginx-deployment

Halt one of the nodes (node2)

$ vagrant halt node2
$ sleep 30

Get node details – node2 Status=NotReady

$ kubectl get nodes

Get pod details – everything looks fine – you need to wait 5 minutes

$ kubectl get pods -o wide

A pod will not be evicted until it is 5 minutes old (see Tolerations in the ‘describe pod’ output). This prevents Kubernetes from spinning up new containers when it is not necessary.

$ NGINX_POD=$(kubectl get pods -l app=nginx --output=jsonpath="{.items[0].metadata.name}")
$ kubectl describe pod $NGINX_POD | grep -A1 Tolerations
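The output should include the two tolerations that recent Kubernetes versions add to every pod by default; it is their tolerationSeconds value that produces the 5-minute delay:

```yaml
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # tolerate a not-ready node for 5 minutes before eviction
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # same for an unreachable node
```

Setting a lower tolerationSeconds in a pod spec makes Kubernetes evict and reschedule the pod sooner when its node goes down.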

Sleeping for 5 minutes

$ sleep 300

Get pods details – Status=Unknown/NodeLost and new container was started

$ kubectl get pods -o wide

Get deployment details – again AVAILABLE=3/3

$ kubectl get deployments -o wide

Power on the node2 node

$ vagrant up node2
$ sleep 70

Get node details – node2 should be Ready again

$ kubectl get nodes

Get pods details – ‘Unknown’ pods were removed

$ kubectl get pods -o wide

Source: GitHub. Author: Petr Ruzicka


Kubernetes can self-heal applications and containers, but what about healing itself when the nodes are down? For Kubernetes to continue self-healing, it needs a dedicated set of infrastructure, with access to self-healing nodes all the time. The infrastructure must be driven by automation and powered by predictive analytics to preempt and fix issues beforehand. The bottom line is that at any given point in time, the infrastructure nodes should maintain the required count for uninterrupted services.


Istio Service Mesh in 2020


Member Blog Post

Guest Post by Alon Berger, Technical Marketing Engineer, Alcide

Since 2017, Kubernetes has soared and has played a key role within the cloud-native computing community. With this movement, more and more companies who had already embraced microservices realized that a dedicated software layer for managing service-to-service communication is required. Enter the service mesh, and its leading contender as a preferred control plane – Istio, a platform built around the Envoy proxy to manage, control, and monitor traffic flow, and to secure services and the connections between them.

According to the CNCF Survey 2019, Istio is at the top of the chart as the preferred service mesh project:

While Istio has clearly made its mark as a powerful service mesh tool, it still comes with relatively complex operational and integration requirements.

Istio’s roadmap for 2020 is all about supporting companies as they adopt microservices architectures for application development. The main focus of Istio’s latest release is simply making it faster and easier to use.

What Should We Expect?

Istio’s offering is a complete solution for orchestrating a network of deployed services with ease. It handles complex operational requirements like load balancing, service-to-service authentication, monitoring, rate limiting, and more.

To achieve that, Istio provides its core features as key capabilities across a network of services:

  • Traffic management
  • Security
  • Observability
  • Platform support
  • Integration and customization

With the latest release, along with some of the most anticipated improvements, those features are being enhanced as well.

During 2019, Istio’s build and test infrastructure improved significantly, resulting in higher quality and easier release cycles. A big focus was on improving user experience, with many additional commands added to allow easier operations and a smoother troubleshooting experience.

Furthermore, Istio’s team reported exceptional growth in contributors within the product’s community.

Mixer Out, Envoy In

Extensibility in Istio was enabled by Mixer, a component responsible for providing policy controls and telemetry collection. It acts as an intermediation layer that allows fine-grained control over all interactions between the mesh and infrastructure backends.

This entire model has now been migrated directly into the proxies in order to remove extra dependencies, resulting in a substantial reduction in latency and a significant improvement in overall performance. Eventually, Mixer will be released as a separate add-on, as part of the Istio ecosystem.

The new model replacing Mixer uses Envoy’s extensions, which paves the path to even more capabilities and flexibility. There is already an ongoing implementation of a WebAssembly runtime in Envoy, which will potentially extend platform efficiency. This type of flexibility was much more challenging to achieve with Mixer.

Another key takeaway from this new model is the ability to avoid using a unique CRD for every integration with Istio.

Control Plane Simplified

The desire to have fewer moving parts during deployments drove the Istio team towards istiod, a new single binary, which now acts as a single daemon, responsible for the various microservices deployments. 

This binary combines features from known key components such as Pilot, Citadel, Galley, and the sidecar injector.

This approach reduces complexity within domains across the board. 

Installation, ongoing maintenance, and troubleshooting efforts will become much more straightforward while supporting all functionalities from previous releases.
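For example, with the consolidated control plane, a demo installation can be described declaratively with a single IstioOperator resource (a minimal sketch; the resource name here is hypothetical, while the profile names come from the standard Istio distribution):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: demo-install      # hypothetical name
spec:
  profile: demo           # built-in profile; deploys the single istiod binary
```

Applied with istioctl (istioctl manifest apply -f in Istio 1.5, istioctl install -f in later releases), this replaces the long list of per-component options that earlier releases required.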

Additionally, the node agent’s functionality for distributing certificates has moved to the istio-agent, which already runs in each pod, reducing dependencies even further.

Below is a “Before and After” of Istio’s high-level architecture. Can you spot the differences?



Securing all fronts

Another major focus is on buffing up several security fundamentals like reliable workload identity, robust access policies, and comprehensive audit logging. The imperative nature of such requirements is what pushes the team to double down on stabilizing the API for these features.

Inevitably, network traffic will gain several security reinforcements, including the automated rollout of mutual TLS and the leveraging of the Secret Discovery Service, which introduces a safer way of distributing certificates and reduces the risk of them being exposed to other workloads running on the machine.

These upgrades will trim down both dependencies and requirements for cluster-wide security policies, leading to a much more robust system.
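As an illustration of where this is heading, mesh-wide mutual TLS can already be enforced with a single resource in recent Istio releases (a sketch, not from the original post):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applying in the root namespace makes the policy mesh-wide
spec:
  mtls:
    mode: STRICT            # only mutual-TLS traffic is accepted
```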

There is a lot more to expect from Istio during 2020. Check out this page and Istio’s blog for more information and additional features to come.

How Cloud Native Is Driving Zerodha, the World’s Largest Retail Stock Investment Platform


CNCF Staff Post

The Indian stock brokerage Zerodha handles 8 million trades a day, making it the largest retail stock investment platform in the world. “Our mission is to make trading and investing easy and accessible to the masses,” says CTO Kailash Nadh.

Given its industry and scale, Zerodha requires infrastructure that spans a public cloud (AWS for most in-house applications) and physical machines in multiple data centers, with specific regulatory and technical requirements for capital market connectivity via leased lines and adapters from various stock exchanges. 

That complexity, along with a heavily regulated technology stack and end-user applications and internal systems with disparate external dependencies, led the company to embrace cloud native technologies.

“We needed a centralized, uniform monitoring infrastructure that worked across a wide variety of environments,” Nadh says. “Prometheus gave us powerful monitoring for critical, low-latency financial systems. It helped us aggregate and monitor metrics infra-wide. The large number of existing exporters and the ease of writing custom exporters enabled us to attain wide coverage in a short period of time.”

Additionally, Zerodha began moving its services from VMs to containers, and gradually to Kubernetes in 2020. Because all of its apps had already been developed with a service-oriented architecture and 12-factor approach, the migration was straightforward. The infrastructure team began by creating CI pipelines with GitLab as the company has a well-defined process of pushing changes to production with its CI/CD process. With a focus on infrastructure-as-code practices, Zerodha uses a mix of Terraform, Packer, and eksctl to create its Kubernetes infrastructure on AWS and hosts container artifacts on an internal registry powered by AWS (ECR).

“We have been conscious of not creating an ops vs. dev divide,” says Nadh. “Developers are responsible for the entire lifecycle of their projects, including deployments. We created a standard template for deployments with Kubernetes that allows developers to craft their own deployments with minimal scaffolding or direct involvement of DevOps engineers.”

As a result, deployment rollouts are faster and more frequent: Complete environments with all the dependencies can be brought up in minutes, rather than hours, with very little manual intervention. “Kubernetes has helped us standardize the deployment process of applications built on many different kinds of stacks across teams,” he says. “We’ve gained scale and modularity, especially in an environment where sweeping regulatory changes often demand significant changes to systems. Kubernetes also allows us fine-grained resource allocation for workloads, reducing cost of compute instances by at least 50%.”

Zerodha is using a CNCF incubating project, NATS, “for transmitting large volumes of real time-market data across infrastructures across applications at high throughputs,” says Nadh. “Many of our components depend on the ease of subscriptions and instant and ‘automagical’ failovers NATS offers. We had at least three other technologies that we had tried over the years before stumbling upon NATS. It pretty much solved all the issues we had faced with other message streams and PubSub systems.”

To find out more about Zerodha’s cloud native journey, read the full case study.


Cloud Computing: Choose a Multi-Cloud Strategy or Fly Solo


Member Blog Post

Guest Post originally published on Medium by Saurabh Gupta, Sr. Developer Advocate @DigitalOcean

Multi-cloud strategy has slowly become a buzzword in the cloud industry over the last couple of years, with 2019 research from RightScale (now Flexera) indicating that 84% of enterprises are investigating or actively pursuing a multi-cloud strategy.

What is a multi-cloud strategy?

It means that instead of using a single Infrastructure-as-a-Service (IaaS) cloud provider that best meets the needs of your business, you adopt a mixture of IaaS services from two or more cloud providers, sharing workloads between them and choosing the services that provide the greatest flexibility, reliability, or features, or that are offered at a better price point.

In simple terms, Multi-cloud is a strategy where an organisation uses two or more clouds from different cloud providers. This can be a combination of software as a service (SaaS), platform as a service (PaaS) or infrastructure as a service (IaaS) models.

Is hybrid cloud another name for multi-cloud? Understanding the difference between them

Many times, people assume that multi-cloud is just another way of saying hybrid cloud. Is this another IT rendition of “You say tomato, I say toe-mah-toe”? Not quite.

So to put across simply, a Hybrid cloud setup is a combination of a public cloud with a private cloud or on-premises infrastructure. On-premises infrastructure can be an internal data center or any other IT infrastructure that runs within a corporate network. Businesses may choose to adopt a hybrid cloud strategy in order to keep some processes and data in a more controlled environment (e.g. a private cloud or on-premises data center), while taking advantage of the greater resources and low overhead of public cloud computing.

“Multi-cloud,” on the other hand, refers to the combination and integration of multiple public clouds. A business may use one public cloud as a database, one for PaaS, another for user authentication, and so on.

Advantages of having a Multi-cloud setup

Adopting a multi-cloud approach helps enterprises avoid scenarios such as vendor lock-in and dependence on a single provider’s availability.

Cons of a Multi-cloud setup

Role of Containers & Microservices in Revolutionizing IT and Telecom Sector


Member Post

Guest post originally published on the OVOO blog 

Nowadays, businesses face great challenges. Previously, the software services in an organization were solely for back-office functions; the main services offered by companies were physical. Things have changed: more and more businesses have shifted to digitization and deliver their core services digitally to their customers. Customers now demand features and functionality delivered quickly, with optimal user experiences. Keeping in view the competition in the market, organizations today need a software architecture that resembles a highly efficient factory assembly line. To cater to the demands of today’s competitive market, containers and microservices have emerged.

Containers & Microservices = The Best Match



Microservices is a significant architectural style for software applications, primarily focused on cloud-native deployment to achieve quick and continuous delivery. Usually, microservices are positioned within containers to enable the continuous deployment of large and complex IT applications. Every microservice can be scaled, deployed, and reused independently from the other services in the application. Every microservice is self-contained, so it doesn’t share data with others.

Microservices’ reusability allows endless updates to the main application. It also brings automation capabilities through well-defined intercommunication APIs. Many tech companies and businesses have shifted to cloud-native infrastructure with the help of microservices. Moreover, they are now able to achieve a high degree of automation in upgrades for new features. In this way, the delivery time of services to market is greatly reduced.


Containers are basically a method of operating system virtualization with which one can run an application and its dependent resources in isolation. With containers, one can easily encase application code, configurations, and dependencies into building blocks. These blocks deliver environmental consistency, operational efficiency, developer productivity, and version control. Virtualization has revolutionized the whole IT industry and has provided an opportunity for tech vendors to offer different IT-based services to consumers.

Containers are commonly used to run each microservice. They basically serve as lightweight “envelopes” that make software portable. The containers needed for each microservice can be dynamically created or destroyed, depending on the load. For this reason, automation is essential.

Usability of Containers

Containers are useful:

  • At the edge of networks, where low latency, resiliency, and portability requirements are tremendously significant.
  • For positioning short-lived and ephemeral services.
  • In machine learning models, where a problem can be separated into small sets of tasks.


Nowadays, innovation is still the differentiator. When you have to compete with your rivals, you do not wait for the next means of production, technology, or business model to be handed to you. You have to create the change you want; to create change, you have to adopt modern technologies in order to be successful.

Network consumers now expect more from MNOs and the IT sector. After the rollout of 5G, there will be new demands and requirements that companies will have to fulfil. Containers and microservices are two technologies that will be able to deliver on these new requirements of network users. Moreover, microservices and service virtualization enable tech giants and vendors to offer a wide variety of IT services based on cloud computing.

Happy Developers: Navigators of the data age


Member Post

Guest post originally published on the Rookout blog by Or Weis

In the age of discovery, navigators changed the world. Their unique skills won them fame, riches, and glory, as well as the ears and support of kings and emperors.
The rulers of old who knew the importance of investing in these skilled frontiersmen rewarded their nations with the longest and wealthiest golden ages they’d ever seen. Nowadays, in the age of data, developers are the new navigators. Their happiness is the key to the success of modern business, and the employers and companies who understand this have the opportunity to become market leaders.

Traversing the oceans of data

Software engineers, DevOps engineers, SREs, data scientists, and developers at large are the new helmsmen, navigators, and cartographers. The skills developers have and their unique access to the tools of their trade are the key to solving modern problems – problems of scale, automation, AI training, complex calculation and prediction, and, in general, of data manipulation. Tools were and still are a huge part of both the developer and navigator professions. While navigators had the sextant, star charts, kamals, compasses, and containers, developers have an even more impressive list of tools such as IDEs, compilers, CI/CD, ML/AI models, programming languages, cloud services, serverless, Istio, Kubernetes, and containers, to name just a few.

As you’ve probably noticed, “containers” appears on both lists, and indeed developers have named many of their modern tools after maritime namesakes. The same is true for Kubernetes (‘helmsman’ in Greek), Istio (‘sail’ in Greek), and many more. When surveying modern software projects, it quickly becomes apparent that the required toolchain is constantly growing, and hence the know-how and effort required from developers are constantly growing as well. Of course, there is no doubt that without both the tools and the developers, organizations wouldn’t be able to approach, let alone traverse, the oceans of data.

The importance of quality data

Data is not the new gold or oil, it’s the new oxygen. Every part of the modern business needs it, ranging from sales to marketing to product, all the way through security, data-science, and of course to engineering itself. However, the pursuit and effort to obtain data is not about blindly collecting, as opposed to what some vendors of big-data solutions might be claiming. Data is about quality before quantity. Each voyage is about getting to the right data at the right time and how to derive the right products from it. You don’t want to drown in data, you want to swim in it. As historian Yuval Noah Harari put it in his bestselling book Homo Deus: A History of Tomorrow: “In ancient times having power meant having access to data. Today having power means knowing what to ignore.”

Looking at data science really highlights this fact. The better data scientists are able to label and curate their data sets, the better outcomes they can achieve. While more flexibility is afforded with deep learning, the quality of data still remains pivotal. Quality, as with many other aspects of life, translates not only to skill, but to motivation and guidance. The ability to see the new frontier that lies beyond the veils of data at the horizon is directly linked to creativity, freedom, and the ability to persevere through obstacles. If we boil all these parameters down to a key one, happiness would be it. We need our developers happy.

Developers – You need them happy

The basic fact is that in order to truly succeed at their jobs at the level you would need to spark a golden age, your developers have to be happy and motivated. Just like their discovery age counterparts, good developers are hard to find and so it becomes a simple matter of supply and demand. If you want to get this supply, you better listen to their demands. It is currently estimated that by 2021 US companies will be experiencing a shortage of 1.4 million software developers to fill positions.
So how do we make developers happy?

Top causes of dev unhappiness

Before we can discuss how to make developers happy, we need to delve into the root of the issue and understand the cause for their unhappiness. According to the article “On the Unhappiness of Software Developers”, the way to foster happiness is to limit unhappiness. Yes, agreed, this seems quite evident. So what exactly makes these developers, these people who are standing at the helm of the future of technology, unhappy? 10 key causes were found to be the source. The first three originate from the software developer’s own being. This was found in instances when devs were stuck solving a problem, felt their skills and/or knowledge were inadequate, and when experiencing personal issues. The other seven causes are produced by external causes, such as colleagues underperforming, unexplained broken code, and bad decisions. As we can see, much of their unhappiness stems from sources directly relating to their job. So, how can we, with this knowledge, flip it to benefit our devs?

What makes developers happy

The following is a list of key concepts companies can adopt to improve developer wellness and happiness. The list focuses on the unique aspects that are relevant for developers, taking into account you are already doing the best to take care of their happiness as people first.

Reduce context switches:

  • Context switches are interruptions to the workflow that require devs to shift attention from one task to the other. When CPU running software performs context switches it hurts performance. When people do it – it hurts performance and happiness.
    Most developers know that in order to truly get the job done right one needs to get “into the zone”, a focused deep thinking state of mind. Context-switches are the death of that.

    • This can be achieved via methods like:
      • Planning a supportive schedule that doesn’t burden developers with meetings and that concentrates blocks of sequential work in which developers can get into their zone.
      • Creating a quiet and supportive environment and culture.
      • Investing in high-quality workstation gear – desks, screens, mice, keyboards, and, possibly most important, good noise-reducing headphones.
      • Investing in tools that streamline dev work – such as IDEs (e.g. JetBrains) or productivity apps (e.g. Alfred).

Improve software knowledge and understanding – allocate time for learning:

  • You need to understand the great professional pressure devs are constantly under: developer work constantly requires them to learn and relearn topics and technologies, as new methods, solutions, and technology in general are constantly rushing forward. From this understanding, you can work to alleviate the pressure and help your devs invest the time they need to remain up to date, both personally as professionals and, more specifically, as engineers combating technical debt for your organization.

Make resolving issues easy and blame-free:

  • Like a car, which is only truly tested when the rubber hits the road, software is only truly tested when it meets reality and production workloads. This makes testing, debugging, and handling incidents both difficult and extremely stressful. True developer agility is gained with a focus on quick iteration, learning, and improvement. This requires an enabling culture, one that values learning over blaming. In addition, investing in infrastructure and tools that enable agility in these processes – modern APM (e.g. AppDynamics, Datadog), exception management (e.g. Sentry), and production debugging (e.g. Rookout) – can dramatically reduce friction, as well as save your devs a lot of time.

Make communication between devs and the rest of the org easy:

  • Developers have their own ways of communication – on average somewhat more introverted, sarcastic, critical, and of course technological. Embrace it, and encourage them and the rest of your organization to communicate. If your devs aren’t invested in your business goals, don’t be surprised when they fail to be motivated by them and ultimately fail to deliver on them.
  • Developer excellence: As a theme developer excellence or wellness is becoming something companies are putting emphasis on, even hiring key personnel to lead this focus, in some cases VP and C-Level. While not a magic cure-all, this is a good strategy to communicate how important the wellbeing of developers is and allocate mind-share, time, and resources to driving it.

The Future is Dev

Looking at human history, there are distinct ages – periods in which key roles in society lead revolutions that forever change the fate of mankind. Shamans and chieftains, philosophers, kings, renaissance-men, and most notably in the age of discovery, explorers and navigators whose unique skills and spirit drove civilization forward, quite literally, by connecting the old world and the new world.

In this age of data, developers are taking the lead, harnessing an ever growing arsenal of tools, constantly requiring them to learn, adapt, and perform, while the challenges are constantly growing in scale and complexity. As the problems faced grow, so do the rewards. Consequently, the companies that best support their developers and take care of their happiness, will win a new world that holds a future that’s probably beyond our wildest dreams.

Call to Participate: 1H 2020 CNCF Cloud Native Survey


CNCF Staff Post

Our 1H 2020 cloud native survey has kicked off!

The goal of this survey is to capture the current state of Kubernetes, CNCF projects, and cloud native technologies including service mesh, serverless, and storage. 

This is the 8th time we have taken the temperature of the infrastructure software marketplace to better understand the adoption of cloud native technologies. We will collect and share insights on:

  • The production usage of CNCF-hosted projects
  • The changing landscape of application development
  • How companies are managing their software development cycles
  • Cloud native in production and the benefits
  • Challenges in using and deploying containers

The information gathered from the survey is used by CNCF to better understand the current cloud native ecosystem. It can be used by the community as a data point to consider as they develop their cloud native strategies.

Help out CNCF and the community by filling out the survey! The results will be open sourced and shared on GitHub as well as a report in the June time frame. To see last year’s results, read the 2019 survey report.

Harbor 2.0 takes a giant leap in expanding supported artifacts with OCI support

By | Blog

Project Post

Originally published on goharbor.io by Alex Xu, Harbor Contributor and Senior Product Manager, VMware

We are pleased to announce general availability of Harbor 2.0. This release makes Harbor the first OCI (Open Container Initiative)-compliant open source registry capable of storing a multitude of cloud-native artifacts like container images, Helm charts, OPAs, Singularity, and much more.

If you’re interested in learning more about Project Harbor 2.0, register today for the CNCF Project Webinar on Harbor on May 28, 2020 at 10:00am PDT.

Let’s first dive into what OCI is and what the release of Harbor 2.0 means for the community.

OCI is a tried-and-true industry standard that defines specifications around format, runtime, and the distribution of cloud-native artifacts. Most users are familiar with some of the more popular OCI-compliant artifacts, like docker images and Helm charts. The OCI specification helps bring artifact authors and registry vendors together behind a common standard. As a developer, I can now adopt the OCI standard for my artifacts and be confident that I can use an OCI-compliant registry like Harbor with minimal to no changes.

At a high level, OCI puts forth two specifications: an image specification and a runtime specification. The image specification defines what the image looks like, including the archival format and the contents, including the manifest, the (optional) image index, the ordinal set of filesystem layers, and a configuration file. The OCI runtime then takes that configuration and converts it into an executable that consumes the filesystem bundle in accordance with the runtime specification. Put another way, the image specification facilitates the creation of interoperable tools for building, transporting, and preparing images for running whereas the runtime specification dictates the configuration, execution environment, and lifecycle of a container. 

Supporting OCI-compliant images in Harbor means supporting its set of APIs and interpreting key information. Such information includes the OCI schemas and media types that are used to determine what can or cannot be pushed onto Harbor. For example, the manifest.config.mediaType field is critical for identifying itself to the registry while the layer.mediaType defines the filesystem layers that are to be stored and persisted on the registry—without the registry having to pull and dissect the layers first. 
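
As an illustration of that dispatch, here is a Python sketch; the media-type-to-kind mapping below is a simplified stand-in for Harbor's actual logic, and the sample manifest is abbreviated:

```python
# Sketch: identify an OCI artifact kind from its manifest's config mediaType.
# The mapping covers a few well-known media types; a real registry like
# Harbor supports a broader, extensible set.
ARTIFACT_KINDS = {
    "application/vnd.oci.image.config.v1+json": "container image",
    "application/vnd.docker.container.image.v1+json": "container image",
    "application/vnd.cncf.helm.config.v1+json": "Helm chart",
    "application/vnd.cnab.config.v1+json": "CNAB bundle",
}

def identify_artifact(manifest: dict) -> str:
    """Return the artifact kind implied by manifest.config.mediaType."""
    media_type = manifest["config"]["mediaType"]
    return ARTIFACT_KINDS.get(media_type, "unknown artifact")

helm_manifest = {
    "schemaVersion": 2,
    "config": {"mediaType": "application/vnd.cncf.helm.config.v1+json"},
    "layers": [{"mediaType": "application/vnd.cncf.helm.chart.content.v1.tar+gzip"}],
}
print(identify_artifact(helm_manifest))  # Helm chart
```

Note that the registry never has to open the layers themselves; the media types alone tell it what it is storing.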

For example, Helm charts can now be pushed onto Harbor via Helm 3. Instead of being hosted separately in ChartMuseum, Helm charts are now stored under artifacts alongside container images. In the figure below, we see a container image, a Helm chart, and a Cloud Native Application Bundle (CNAB) hosted in the same project.

Harbor gets another key benefit from being OCI-compliant: It is now fully capable of handling an OCI index, a higher-level manifest representing a bundling of image manifests that’s ideal for multi-architecture scenarios. Imagine pulling an image without having to specify the operating system and platform and instead relying entirely on the client tooling to ensure the correct version of that image is fetched. This index structure is widely leveraged by artifacts like CNAB for managing distributed cloud-agnostic applications. 
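
That client-side resolution can be sketched as follows, using a simplified index structure (digests abbreviated for illustration):

```python
# Sketch: selecting the right manifest from an OCI index, the way client
# tooling resolves a multi-architecture image without the user naming an
# OS/architecture explicitly.
def select_manifest(index: dict, os: str, arch: str) -> str:
    """Return the digest of the manifest matching the requested platform."""
    for m in index["manifests"]:
        p = m.get("platform", {})
        if p.get("os") == os and p.get("architecture") == arch:
            return m["digest"]
    raise LookupError(f"no manifest for {os}/{arch}")

index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa...", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb...", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}
print(select_manifest(index, "linux", "arm64"))  # sha256:bbb...
```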

Although Harbor is now OCI-compliant, existing users should not worry; all of the familiar operations and key benefits of Harbor translate well to OCI. You can push, pull, delete, retag, copy, scan, and sign indexes just like you’ve been able to do with images. Vulnerability scanning and project policies, key ingredients to enforcing security and compliance, have been revamped to work with OCI artifacts. We also provided a new, key capability: you now have the ability to delete an image tag without deleting the underlying manifest and all other associated image tags. You can also view untagged images, and have the option to exclude them from being garbage-collected.

As artifact types will undoubtedly come and go, it’s crucial that Harbor exists outside of any particular container format, and be flexible enough to onboard and discard any artifact type based on community demand and adherence to common standards.

Shipping Aqua Trivy as the default scanner

This release also replaces Clair with Aqua’s Trivy as the default image scanner. Trivy takes container image scanning to higher levels of usability and performance than ever before. Since adding support for Trivy through our pluggable scanning framework in Harbor v1.10, we have received great feedback and have seen increasing traction among the Harbor community, making Trivy the perfect complement to Harbor. Trivy has wide coverage for scanning different operating systems and application package managers, and is easy to integrate into CI/CD systems. It also conducts deep scans and picks up vulnerabilities across popular distros like CentOS, Photon OS, Debian, and Ubuntu, among others. Clair also continues to be supported in Harbor as a built-in scanner. In fact, during an upgrade to Harbor v2.0, projects using Clair as the scanner of choice will be unaffected; Trivy will be set as the default scanner only for new installations.

Notable features

We listened to user feedback and are making strides towards an improved design for Harbor robot accounts, a design that reflects common usage patterns. Harbor v2.0 introduces the ability to set an expiration date on each individual robot account as opposed to a system-wide setting. In a future release, we will grant robot accounts the ability to be targeted to one or more projects, and will offer better credential handling for Kubernetes deployments.

Also new in Harbor v2.0 is the ability to configure SSL for core Harbor services. When configured, internal Harbor services will encrypt their service-to-service communication. This feature enhances the security posture of Harbor and reduces the likelihood of man-in-the-middle attacks.

Webhooks can now be individually triggered, and come with Slack integration. Some users may not want to receive callbacks for every supported webhook action, so this update enables users to configure, at the project level, which webhooks to receive and the preferred callback method, HTTP or Slack.

Did you also notice the all-new dark mode in the updated Harbor UI? Download Harbor v2.0 and give it a shot!

Hopefully Harbor v2.0 has your attention. Join us for the CNCF Project Webinar on Harbor v2.0 on May 28, 2020 at 10:00am PDT by registering here.

Community Shoutouts!

About Harbor

Harbor is an open source, trusted cloud native registry project that stores, signs, and scans container images, Helm charts, and any other OCI-compliant artifacts. Harbor extends the open-source Docker Distribution by adding key enterprise-level features in authentication and access control (LDAP and AD as well as OIDC support for RBAC), two-way replication to and from other third-party registries, advanced garbage collection, and authenticity and provenance capabilities through third-party image scanning and signing solutions. Harbor, which supports Docker Compose and Kubernetes, deploys in under 30 minutes. Harbor can be fully managed through a single web console and comes with a rich set of APIs managed with Swagger.

Collaborate with the Harbor Community!

Get updates on Twitter (@project_harbor)

Chat with us on Slack (#harbor on the CNCF Slack)

Collaborate with us on GitHub: github.com/goharbor/harbor

Attend the community meetings: https://github.com/goharbor/community/wiki/Harbor-Community-Meetings

Alex Xu

Harbor Contributor

Senior Product Manager, VMware


Introduction to OpenTelemetry (Overview Part 1/2)

By | Blog

Member Blog Post

Guest post originally published on the Epsagon blog by Ran Ribenzaft, co-founder and CEO at Epsagon

OpenTelemetry is an exciting new observability ecosystem with a number of leading monitoring companies behind it. It is a provider-agnostic observability solution supported by the CNCF and represents the third evolution of open observability after OpenCensus and OpenTracing. Supporting APIs for both tracing and metrics, OpenTelemetry provides rich auto instrumentation and SDKs for a number of programming languages and aims to support provider-agnostic instrumentation, allowing you to avoid vendor lock-in with its OpenTelemetry collector.

This article provides a technical overview of OpenTelemetry and its major components: metrics, tracing, SDKs, and its collector agent. It explains why a new approach to telemetry is important, discusses its current state and supported languages, and talks about the reasoning behind some of its implementation details. Finally, we cover some considerations when getting started with OpenTelemetry as well.

What is OpenTelemetry?

OpenTelemetry is a set of standards, libraries, SDKs, and agents that provide full application-level observability. It uses the same standards-based approach as OpenCensus and OpenTracing, which helps avoid vendor lock-in by decoupling application instrumentation and data export. OpenTelemetry’s vast ecosystem comprises:

  • Standards/Specifications
  • APIs
  • SDK: concrete implementation of an API
    • Metrics
    • Tracing
    • Auto-Instrumentation
    • Exporters
  • Collector

OpenTelemetry Ecosystem

Standards & Specifications

OpenTelemetry takes a standards-based approach to implementation. The focus on standards is especially important for OpenTelemetry, since it demands tracing interoperability across languages. Each language’s API ships type definitions, such as interfaces, that implementations use to build reusable components.

API: Language-Specific Types and Interfaces

Each language implements the specification through its API. APIs contain language-specific type and interface definitions, which are abstract classes, types, and interfaces meant to be consumed by concrete language implementations. They also contain no-op implementations to enable local testing and provide tooling for unit testing. The definition of an API is located in each language’s implementation. As stated in the OpenTelemetry Python Client:

“The opentelemetry-api package includes abstract classes and no-op implementations that comprise the OpenTelemetry API following the specification.”

You can see a similar definition in the OpenTelemetry Javascript Client:

“This package provides everything needed to interact with the OpenTelemetry API, including all TypeScript interfaces, enums, and no-op implementations. It is intended for use both on the server and in the browser.”

SDK: Exportable Implementation of the Specification

SDKs are the glue that combines exporters with the API. SDKs are concrete, executable implementations of the API. The rest of this section will explore each of the major OpenTelemetry components: exporters, metrics, tracing, auto-instrumentation, and the collector.

Exporters

Exporters enable you to extract data from applications and transform data into specific instrumentation protocols and vendors. The concept of exporters here is the same as with OpenCensus and OpenTracing. Thus, you can instrument the application using OpenTelemetry and then configure an exporter to determine where the OpenTelemetry data is sent. This decouples the instrumentation from any specific vendor or protocol, avoiding vendor lock-in.

Metrics

If you’ve already used OpenCensus, you should be very familiar with metrics. The primitive for combining measures (actual metric events) with an exporter is called a Meter in OpenTelemetry. The metric primitives are generic to capture a wide variety of metric events, as shown below:

Meter usage example
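
The pattern can also be sketched in plain Python. To be clear, this is not the real OpenTelemetry API, just its shape: a Meter creates instruments, instruments record measurements, and aggregated values are collected for export.

```python
# Pure-Python sketch of the Meter pattern (NOT the real OpenTelemetry API).
class Counter:
    """A monotonically increasing instrument."""
    def __init__(self, name):
        self.name, self.value = name, 0

    def add(self, amount, labels=None):  # labels ignored in this sketch
        self.value += amount

class Meter:
    """Creates instruments and collects their aggregated values."""
    def __init__(self):
        self.instruments = {}

    def create_counter(self, name):
        self.instruments[name] = Counter(name)
        return self.instruments[name]

    def collect(self):
        # In a real SDK, collected values would be handed to an exporter.
        return {name: c.value for name, c in self.instruments.items()}

meter = Meter()
requests = meter.create_counter("http.requests")
requests.add(1)
requests.add(1)
print(meter.collect())  # {'http.requests': 2}
```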

Tracing

Tracing in OpenTelemetry is very similar to that in OpenTracing. OpenTelemetry introduces the concept of a TracerProvider, which can model global tracer instances in a singleton pattern, similar to OpenTracing’s global tracer. OpenTelemetry also introduces additional abstractions, such as SpanProcessors, which are how exporters are attached to the OpenTelemetry API calls:

Tracer/exporter configuration

Auto-Instrumentation

Auto-instrumentation is the ability to dynamically instrument language-specific libraries for tracing. Instrumenting libraries for tracing requires propagating a trace context throughout all call sites. Modifying code to propagate this can be difficult with legacy projects and large projects and is extremely difficult to do in languages like node.js, which have historically lacked thread-local storage. Auto-instrumenting will automatically patch common libraries (such as HTTP clients/servers, web frameworks, and database clients) to automatically add tracing!
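
Under the hood, patched clients and servers carry the trace context in a standard form, most commonly the W3C `traceparent` header. A minimal, SDK-independent sketch of that inject/extract step:

```python
# Sketch of W3C Trace Context propagation, the mechanism auto-instrumented
# HTTP clients/servers rely on: inject the current span's context into an
# outgoing `traceparent` header, and parse it back on the receiving side.
import re

def inject(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def extract(traceparent: str) -> dict:
    """Parse a traceparent header back into a context dict."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", traceparent)
    if not m:
        raise ValueError("malformed traceparent")
    trace_id, parent_span_id, flags = m.groups()
    return {"trace_id": trace_id, "parent_span_id": parent_span_id,
            "sampled": flags == "01"}

header = inject("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(extract(header)["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
```

Auto-instrumentation automates exactly this bookkeeping across every call site, which is why it is so valuable in large or legacy codebases.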

Epsagon is also incorporating its language-specific auto-instrumentation frameworks into OpenTelemetry for Python, Ruby, Java, Go, Node.js, PHP, and .NET, which drastically cuts down on the time it takes to instrument tracing.

Collector

One of the biggest new features of OpenTelemetry is the Collector: a standalone agent/daemon that receives telemetry, processes it, and exports it. This decouples the collection of telemetry from its transformation and exportation. OpenTelemetry also offers a new vendor-agnostic wire protocol to go along with the Collector. While the protocol is still in its infancy, the goal is to further decouple observability instrumentation from specific vendors!

Why OpenTelemetry?

Here are a few of the reasons behind OpenTelemetry’s development.

Evolution of Standards

One reason for these new components and abstractions is an evolution of standards. OpenTracing defined a vendor-neutral, standards-based API for tracing. OpenCensus started at Google, grew out of Google’s internal instrumentation libraries, and shipped concrete tracing and metrics implementations. Both projects embraced the concept of “exporters” and decoupled instrumentation from exportation. OpenTelemetry is a merger of these two frameworks. Once OpenTelemetry is stable, there shouldn’t be a need to use multiple frameworks, but until it is, it’s important to consider OpenCensus and OpenTracing as well.

Avoiding Vendor Lock-In

At the heart of OpenTelemetry is the decoupling of language instrumentation code from vendors. With OpenTelemetry, applications only need to be instrumented once, regardless of the provider. This allows companies to choose the best provider for their needs, and they can even change providers with minimal changes to their code. And in the case of the OpenTelemetry Collector, no code changes are required!

More Generic APIs

OpenTelemetry also evolves a number of the OpenTracing and OpenCensus APIs, introducing new concepts and abstractions. For example, OpenTracing has the concept of a span “tag,” which is a way to attach key/value data to individual spans. Best practices for choosing them haven’t changed, but in OpenTelemetry the concept is generalized into span “attributes.” OpenTelemetry has introduced similarly more generic abstractions for several other components.

Additionally, OpenTelemetry encodes concepts that were previously only conventions, such as the OpenTracing semantic conventions, into the API itself. In OpenTracing, the span.kind tag was a convention that was not enforced by the API but had significance for some tracing providers (OpenCensus specifies SpanKind). OpenTelemetry pulls this concept from OpenCensus into the API and makes SpanKind a property of spans. The example below shows a span created with an explicit kind in OpenTelemetry:

SpanKind on Span Creation

Asynchronous Events

OpenTelemetry treats asynchronous events as first-class citizens through its Links API. In OpenTracing, there are two ways to model causal relationships between spans, specified at span creation in the Tracer.StartSpan() call:

  • ChildOf: the parent span depends on the new span’s results.
  • FollowsFrom: the parent span does not depend on the new span’s results.

OpenTelemetry establishes causality explicitly through the Links API, which collapses this distinction. The example below shows the Go API for creating a new span and specifying links:

Linked span example in Go (from the opentelemetry-go GitHub repo)

Supported Languages

All major programming languages are supported by OpenTelemetry. Detailed information on the status of different projects is available on the OpenTelemetry website and on each language’s GitHub page.

OpenTelemetry Language Progress

Progress is being made quickly, so check back often! A missing feature today could easily be implemented in a couple of days or weeks.

Getting Started with OpenTelemetry

Since this is still a young project, you need to perform some background research before getting started. It’s a good idea to:

  • Check the OpenTelemetry language version.
  • Check feature support for your target languages.
  • Check available exporters for your target languages.

After this, you should check out examples for your chosen language in its given GitHub repo.

Summary

OpenTelemetry has all the components necessary to be a one-stop observability solution:

  • Standards-first approach
  • Language-specific SDKs
  • Metrics
  • Traces
  • Collectors
  • Auto-instrumentation

OpenTelemetry aims to embody metrics and tracing, two of the three pillars of observability. But before making the switch, check whether it supports the languages you want to use, because each language is in a different phase of implementation and some features may not be available everywhere. OpenTelemetry has made significant progress in the last six months and continues to do so. If you’re looking to adopt it, OpenTelemetry provides backward compatibility with both OpenCensus and OpenTracing as well, reducing the friction involved in getting started.

How to manage Secrets in Kubernetes environment

By | Blog

Member Blog Post

Guest post originally published on Medium by Saurabh Gupta, Sr. Developer Advocate at DigitalOcean


There are times when you want to bake sensitive information into your Kubernetes cluster and share it across workloads when needed. You do not want to put this information into a Pod definition YAML or a Docker image. This is where the Kubernetes Secret comes to your rescue.

In this post, we will try to gain more insight into how we can manage secrets effectively in Kubernetes.

Why Use a Secret?

To solve this challenge, Kubernetes provides an object called Secret, which we can use to store sensitive data. Secrets management also enables better management of microservices-based software architectures.

What is the Kubernetes Secret?

A Kubernetes Secret is an API object that holds sensitive data. A Secret can be injected into a Pod’s containers either as environment variables or as files on a mounted volume.
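
Both injection paths can be sketched as the Pod-spec fragments involved; the Secret and key names below (db-creds, password) are made up for illustration:

```python
# Sketch: the two ways a Pod consumes a Secret, expressed as the API
# structures involved (field names follow the Kubernetes Pod spec).

# 1. As an environment variable, via valueFrom.secretKeyRef:
env_entry = {
    "name": "DB_PASSWORD",
    "valueFrom": {"secretKeyRef": {"name": "db-creds", "key": "password"}},
}

# 2. As files, via a secret volume mounted into the container; each key in
#    the Secret becomes a file under the mount path:
volume = {"name": "creds", "secret": {"secretName": "db-creds"}}
volume_mount = {"name": "creds", "mountPath": "/etc/creds", "readOnly": True}

print(env_entry["valueFrom"]["secretKeyRef"]["name"])  # db-creds
```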

What is the difference between a ConfigMap and a Secret in Kubernetes?

Use Secrets for storing critical data like API keys, passwords, and service-account credentials, and use ConfigMaps for non-secret configuration data like the app theme or base platform URL.

How does Kubernetes Secrets Work?

  • Built-in secrets — Kubernetes Service Accounts automatically create secrets and attach them to containers with API Credentials. The automatic creation and use of API credentials can be disabled or overridden if desired.
  • Custom secrets — you can define your own sensitive data and create a custom secret to store it.

Built-in Kubernetes secrets

These secrets are stored in plaintext on the cluster’s etcd server and, unless etcd is configured to encrypt communication using TLS, are visible on the wire as the etcd cluster is synchronized. Furthermore, anyone who has, or can gain, root access to any node in the cluster can read all secret data by impersonating the kubelet.

Creating your own Custom secrets:

$ kubectl create secret generic <secret-name> --from-literal=<key>=<value>

If you want this secret to be added to a specific namespace or context, add the --namespace or --context argument to this command. (Otherwise, it will be added to the default namespace.)

You can use $ kubectl describe secret <secret-name> to view a summary of your secret. Using $ kubectl get secret or $ kubectl describe secret will not reveal the information in the secret.

  • Creating a Secret manually: you can also define a Secret in a file first, in JSON or YAML format, and then create the object from that file. The name of a Secret object must be a valid DNS subdomain name. The Secret contains two maps: data and stringData. The data field is used to store arbitrary data, encoded using base64. The stringData field is provided for convenience and allows you to provide secret data as unencoded strings.
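
To make the data/stringData distinction concrete, here is a small sketch that builds a Secret manifest the way a client library might; the Secret name and keys (db-creds, password, username) are made up for illustration:

```python
# Sketch: building a Secret manifest programmatically. Values under `data`
# must be base64-encoded by the client; `stringData` accepts plain strings,
# which the API server encodes on write.
import base64

def make_secret(name, data, string_data=None):
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # data: caller provides plain strings, we base64-encode them here
        "data": {k: base64.b64encode(v.encode()).decode() for k, v in data.items()},
    }
    if string_data:
        secret["stringData"] = dict(string_data)  # left unencoded on purpose
    return secret

secret = make_secret("db-creds", {"password": "s3cr3t"}, {"username": "admin"})
print(secret["data"]["password"])  # czNjcjN0
```

Remember that base64 is an encoding, not encryption; anyone who can read the Secret object can decode its values.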

Secret Management in Cloud:

Some of the popular secret management solutions from different cloud vendors are shared below:

  1. Secrets Manager from AWS
  2. Key Vault from Azure Cloud
  3. Cloud Key Management Service from Google Cloud