
A Look Back At KubeCon + CloudNativeCon Shanghai 2018

By | Blog

Now that we’ve finally caught our breath after a fantastic two days at KubeCon + CloudNativeCon in Shanghai, let’s dive into some of the key highlights and news. The best part is we get to see so many of you again so soon at KubeCon + CloudNativeCon Seattle in December!

The sold-out event with more than 2,500 attendees (technologists, maintainers and end users of CNCF’s hosted projects) was full of great keynotes, presentations, discussions and deep dives on projects including Rook, Jaeger, Kubernetes, gRPC, containerd – and many more! Attendees had the opportunity to hear a slew of compelling talks from CNCF project maintainers, community members and end users including eBay, JD.com and Alibaba.

After hosting successful events in Europe and North America the past few years, it’s no wonder China was the next stop on the tour. Asia has seen a spike in cloud native adoption to the tune of 135% since March 2018. You can find more information about the tremendous growth of cloud usage in China from CNCF’s Survey.

The conference kicked off by welcoming 53 new global members and end users to the Foundation, including several from China such as R&D China Information Technology, Shanghai Qiniu Information Technologies, Beijing Yan Rong Technology and Yuan Ding Technology. The CNCF has 40 members in China, which represents a little more than 10% of the CNCF’s total membership.

AI Fuels Even Greater Kubernetes Growth

It wouldn’t be KubeCon without some great talks on Kubernetes! Opening the first KubeCon conference in China, Janet Kuo, co-chair of the conference, said: “Kubernetes is boring and that’s good.”

During the conference there were many great sessions on AI as well as a Kubernetes AI panel discussion that included product managers, data scientists, engineers, and architects from Google, CaiCloud, eBay, and JD.com.

“We are extending AI to edge with Kubernetes.”

Harbor

In further exciting China news, Harbor, the first project contributed by VMware to CNCF, successfully moved from sandbox into incubation.  “Harbor is not only the first project donated by VMware to CNCF, but it is also the first Chinese program developed in the Chinese open source community donated to CNCF.”

More details on an integration with Harbor will be shared at KubeCon + CloudNativeCon Seattle.

CNCF Diversity Scholarship

As always, the Foundation offered scholarships to members of traditionally underrepresented groups in the technology and/or open source communities. With $122,000 in diversity scholarship funds available, travel and/or conference registration was covered for 68 KubeCon attendees. Scholarship funding was provided by AWS, CNCF, Google, Helm Summit, Heptio and VMware.


End User Awards

The end user community continues to grow at an incredible pace and a very deserving JD.com won the CNCF End User Award.

Missed out on KubeCon + CloudNativeCon Shanghai? Don’t worry as you have more chances to attend. KubeCon + CloudNativeCon North America 2018 is taking place in Seattle, WA from December 10-13. The Conference is currently sold out, but if you’d like to be added to the waitlist, fill out this form. You will be notified as new spots become available.

CNCF News

The Linux Foundation And Cloud Native Computing Foundation Announce New Training Course For Prometheus

Cloud Native Computing Foundation Announces JD.com as Winner of Top End User Award

Cloud Native Computing Foundation Opens Inaugural KubeCon + CloudNativeCon China and Welcomes 53 New Members

TOC Votes to Move Harbor into CNCF Incubator


Save the Dates!

The call for speakers and proposals is now open for KubeCon + CloudNativeCon Europe 2019 which takes place in Barcelona from May 20-23. Registration will open early in 2019.

We’ll be back in China for KubeCon + CloudNativeCon Shanghai from June 24-26. Registration will go live early 2019.

And to finish off 2019, there’s KubeCon + CloudNativeCon San Diego from November 18-21.

We hope to see you at one, or all, of these 2019 events!

The “Function” Package – Creating a Spec for Serverless Application Packaging

By | Blog

by Yoav Landman, CTO, JFrog

What’s in a function?

Creating serverless applications is a multi-step process. One of the critical steps in this process is packaging the serverless functions you want to deploy into your FaaS (Function as a Service) platform of choice.

Before a function can be deployed it needs two types of dependencies: direct function dependencies and runtime dependencies. Let’s examine these two types.

Direct function dependencies – These are objects that are part of the function process itself and include:

  • The function source code or binary
  • Third party binaries and libraries the function uses
  • Application data files that the function directly requires

Runtime function dependencies – This is data related to the runtime aspects of your function. It is not directly required by the function but configures or references the external environment in which the function will run. For example:

  • Event message structure and routing setup
  • Environment variables
  • Runtime binaries such as OS-level libraries
  • External services such as databases, etc.

For a function service to run, all dependencies, direct and runtime, need to be packaged and uploaded to the serverless platform. Today, however, there is no common spec for packaging functions. Serverless package formats are vendor-specific and are highly dependent on the type of environment in which your functions are going to run. This means that, for the most part, your serverless applications are locked-down to a single provider, even if your function code itself abstracts provider-specific details.

This article explores the opportunity to create a spec for an open and extensible “Function Package” that enables the deployment of a serverless function binary, along with some extra metadata, across different FaaS vendors.

Function runtime environments

When looking at today’s common runtime FaaS providers, we see two mainstream approaches for running functions: container function and custom function runtimes. Let’s have a closer look at each type.

Container function runtimes

This method uses a container-based runtime, such as Kubernetes, and is the common runtime method for on-prem FaaS solutions. As its name suggests, it usually exposes container runtime semantics to users at one level or another. Here’s how it works.

A container entry point is used as a function, together with an eventing layer, to invoke a specific container with the right set of arguments. The function is created with docker build or any other image generator to produce an OCI container image as the final package format.

Examples of serverless container-based runtimes include Google’s Knative, Bitnami’s Kubeless, Iguazio’s Nuclio and Apache’s OpenWhisk.

The main issue with container functions is that developing a function often requires understanding of the runtime environment; developers need to weave their function into the container. Sometimes this process is hidden by the framework, but it is still very common for the developer to need to write a Dockerfile for finer control over operating system-level services and image structure. This leads to high coupling of the function with the runtime environment.

Custom function runtimes

Custom function runtimes are commonly offered by cloud providers. They offer a “clean” model where functions are created by simply writing handler callbacks in your favorite programming language. In contrast to container-based functions, runtime details are left entirely to the cloud provider (even in cases where, behind the scenes, the runtime is based on containers).

Examples of serverless custom function runtimes are: AWS Lambda, Azure Functions and Google Cloud Functions.

Custom runtime environments use a language-centric model for functions. Software languages already have healthy packaging practices (such as Java JARs, npm packages or Go modules). Still, no common packaging format exists for their equivalent functions. This creates cloud-vendor lock-in. Some efforts have been made to define a common deployment model (such as the AWS Serverless Application Model (SAM) and the open-source Serverless Framework), but these models assume a custom binary package already exists, or they include the process of building it to each cloud provider’s standards.

The need for a Function Package

To be able to use functions in production, we need to use stable references to them to enable repeatability in deploying, upgrading and rolling back to a specific function version. This can be achieved with an immutable, versioned, sealed package that contains the function with all its direct dependencies.

Container images may meet these requirements because they offer a universal package format. However, there’s a side effect of “polluting” the function by tightly coupling it with details of the container runtime. Custom runtimes also exhibit coupling with their function packages. While they offer clean functions, they use proprietary package formats that mix the function dependencies with the runtime dependencies.

What we need is a clear separation: a clean function together with its dependencies in a native “Function Package” separate from external definitions of runtime-specific dependencies. This separation would allow us to take a function and reuse it across different serverless platforms, only adding external configuration as needed.

Getting a function to run

Let us reexamine for a moment the steps required to build and run a container-based function. We can look at this as a four step process:

(1) Packaging – Build a container image together with the (direct + runtime) function dependencies and an entry point definition
(2) Persisting – Push the image to a container registry so that we have a stable reference to it
(3) Installing – Pull the image to the runtime and configure it according to runtime-specific dependencies
(4) Running – Accept function events at the defined entry point


This process works pretty well for container runtimes. We can try to formulate it into a more generalized view:

(1) Packaging – Build a function package that contains its direct dependencies (or references to them) and an entry point definition
(2) Persisting – Upload the package to a package registry
(3) Installing – Download the package (and its direct dependencies) to the runtime and configure runtime-specific dependencies
(4) Running – Accept function events at the defined entry point
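The four generalized steps above can be sketched in code. This is a toy, in-memory illustration of the lifecycle, not part of any published spec; the function names, package layout, and registry model are assumptions made for the example:

```python
import hashlib
import json

REGISTRY = {}  # stands in for a package registry


def package(source, direct_deps, entry_point):
    """(1) Packaging: bundle the function with its direct dependencies."""
    return {"source": source, "dependencies": direct_deps, "entry_point": entry_point}


def persist(pkg):
    """(2) Persisting: upload to a registry; a content hash gives a stable reference."""
    blob = json.dumps(pkg, sort_keys=True).encode()
    ref = hashlib.sha256(blob).hexdigest()[:12]
    REGISTRY[ref] = pkg
    return ref


def install(ref, runtime_config):
    """(3) Installing: download the package and attach runtime-specific dependencies."""
    pkg = REGISTRY[ref]
    return {**pkg, "runtime": runtime_config}


def run(installed, event):
    """(4) Running: deliver an event to the defined entry point."""
    return f"{installed['entry_point']} handled {event}"


ref = persist(package("def handler(e): ...", ["requests==2.31.0"], "handler"))
fn = install(ref, {"env": {"STAGE": "prod"}})
print(run(fn, "order.created"))  # → handler handled order.created
```

Note that runtime configuration only enters the picture at step (3), which is exactly the separation the generalized process is after.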

Creating a function package

This general process can be applied across many serverless providers! Developers only need to worry about creating a clean Function Package and keeping it persistent. Runtime configuration can be provided upon installation and can be vendor-specific.

A Function Package would contain:

  1. Function files – in source code or binary format
  2. Direct dependencies – libraries and data files
  3. Entry point definition
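The three parts above could be expressed as a simple manifest. The field names and layout below are hypothetical, invented for illustration; the article proposes that such a spec be created, it does not define one:

```python
# Hypothetical Function Package manifest; field names are assumptions.
manifest = {
    "function": {
        "files": ["handler.py"],            # 1. function files (source or binary)
        "profile": "python",
    },
    "dependencies": {
        "libraries": ["requests==2.31.0"],  # 2. direct dependencies: libraries...
        "data": ["countries.json"],         #    ...and data files
    },
    "entry_point": "handler.handle",        # 3. entry point definition
}


def validate(m):
    """Check that the three required parts of the package are present."""
    missing = {"function", "dependencies", "entry_point"} - m.keys()
    if missing:
        raise ValueError(f"manifest missing: {sorted(missing)}")
    return True


print(validate(manifest))  # → True
```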

The experience for the developer is simple and programming-focused, and therefore we can create spec “profiles” that map to a specific language type. For example:

Java Profile:
  • Function files: JAR file
  • Direct dependencies: generated POM file for the JAR with a full, flat list of dependencies + data files
  • Entry point definition: class reference

Golang Profile:
  • Function files: Go module source files
  • Direct dependencies: go.mod file containing the dependencies + data files
  • Entry point definition: “main” package reference

Docker Profile:
  • Function files: build file references to generic files run by the entry point
  • Direct dependencies: build file references to base image + other service files and data
  • Entry point definition: build file references to image entry point

What about dependencies?

A function package does not necessarily have to physically embed binary dependencies if they can be reliably provided by the runtime prior to installation. Instead, only references to dependencies could be declared. For example, external JAR coordinates declared in a POM file, or Go packages declared in a go.mod file. These dependencies would be pulled by the runtime during installation – similar to how a container image is pulled from a docker registry by function runtimes.

By using stable references, we guarantee repeatability and reuse. We also create lighter packages that allow for quicker and more economical installation of functions, since dependencies can be pulled from a registry close to the runtime instead of being re-uploaded each time.
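A minimal sketch of this install-time resolution, with a local cache standing in for a registry near the runtime; the reference format and resolver are assumptions made for the example:

```python
# Declared references, in the spirit of POM coordinates or go.mod lines.
declared = [
    "maven:com.google.guava:guava:32.1.2-jre",
    "go:github.com/spf13/cobra@v1.8.0",
]

cache = {}  # a cache near the runtime, so repeat installs skip the download


def resolve(ref):
    """Fetch a dependency by its stable reference, hitting the cache first."""
    if ref not in cache:
        cache[ref] = f"<binary for {ref}>"  # stand-in for a registry download
        fetched = True
    else:
        fetched = False
    return cache[ref], fetched


first = [resolve(r) for r in declared]   # downloads both dependencies
second = [resolve(r) for r in declared]  # served entirely from the cache
print(all(f for _, f in first), any(f for _, f in second))  # → True False
```

Because the references are stable, the second installation is guaranteed to receive exactly the same binaries as the first.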

Summary

Creating profiles for common runtime languages allows functions to be easily moved across different serverless FaaS providers, adding vendor-specific information pertaining to runtime configuration and eventing only at the installation phase. For application developers, this means they can avoid vendor lock-in and focus on writing their applications’ business logic without having to become familiar with operational aspects. This goal can be achieved more quickly by creating shareable Function Packages as a CNCF Serverless spec. If you are interested in discussing more – let’s talk!

JD.com Recognized for Cloud Native Open Source Technology Usage with CNCF Top End User Award

By | Blog

By JD.com

JD.com, China’s largest retailer, has been presented with the Top End User Award by the Cloud Native Computing Foundation (CNCF) for its unique usage of cloud native open source projects. The award was announced at China’s first KubeCon + CloudNativeCon conference hosted by CNCF, which gathered thousands of technologists and end users in Shanghai from November 13-15 to discuss the future of open source technology development.

Providing the ultimate e-commerce experience to customers requires JD to house and process enormous amounts of information that must be accessible at incredibly fast speeds. To put it in perspective, five years ago there were only about two billion images in JD’s product databases for customers. Today, there are more than one trillion, and that figure increases by 100 million images each day. This is why JD turned to CNCF’s Kubernetes project in recent years to accommodate its clusters.

JD currently runs the world’s largest Kubernetes cluster in production. The company first rolled out its containerized infrastructure a few years ago and, as the clusters grew, JD was one of the early adopters to shift to Kubernetes. The move, known as JDOS 2.0, marked the beginning of JD’s partnership with CNCF to build stronger collaborative relationships with the industry’s top developers, end users, and vendors. Ultimately, CNCF provided a window for JD to both contribute to and benefit from open source development.

In April, JD became the CNCF’s first platinum end user member, and took a seat on the organization’s governance board in order to help shape the direction of future Foundation initiatives. JD’s overall commitment to open source is highly aligned with its broader Retail as a Service strategy in which the company is empowering other retailers, partners, and industries with a broad range of capabilities in order to increase efficiency, reduce costs, and provide a higher level of customer service.

JD’s Kubernetes clusters support a wide range of workloads and big data and AI-based applications. The platform has boosted collaboration and enhanced productivity by reducing silos between operations and DevOps teams. As a result, JD has contributed code to projects such as Vitess, Prometheus, Kubernetes, CNI (Container Networking Interface), and Helm as part of its collaboration with CNCF.

“One contribution that we are very proud of is Vitess, the CNCF project for scalable MySQL cluster management,” said Haifeng Liu, chief architect, JD.com. “We are not only the largest end user of Vitess, but also a very active and significant contributor. We’re looking forward to working together with CNCF and its members to pave the way for future development of open source technology.”

Vitess allows JD to manage resources much more flexibly and efficiently, reducing operational and maintenance costs, and JD has one of the world’s most complex Vitess deployments. The company is actively collaborating with the CNCF community to add new features such as subquery support and global transactions, setting industry benchmarks.

“JD spearheads the use of cloud native technologies at scale within the APAC market, and is responsible for one of the largest Kubernetes deployments in the world,” said Chris Aniszczyk, COO of Cloud Native Computing Foundation. “The company also makes significant contributions to CNCF projects and its involvement in the community made JD a natural fit for this award.”

JD will continue to work on contributions to cloud native technologies as well as release its own internal and homegrown open source projects to empower others in the community.



TOC Votes to Move Harbor into CNCF Incubator

By James Zabala | Blog

Harbor started in 2014 as a humble internal project meant to address a simple use case: storing images for developers leveraging containers. The cloud native landscape was wildly different and tools like Kubernetes were just starting to see the light of day. It took a few years for Harbor to mature to the point of being open sourced in 2016, but the project was a breath of fresh air for individuals and organizations attempting to find a solid container registry solution. We were confident Harbor was addressing critical use cases based on its strong growth in user base early on.

We were incredibly excited when Harbor was accepted to the Cloud Native Sandbox in the summer of 2018. Although Harbor had been open sourced for some years by this point, having a vendor-neutral home immediately impacted the project, resulting in increased engagement via our community channels and GitHub activity.

There were many things we immediately began tackling after joining the Sandbox, including addressing some technical debt, laying out a roadmap based solely on community feedback, and expanding the number of contributors to include folks that have consistently worked on improving Harbor from other organizations. We’ve also started a bi-weekly community call where we hear directly from Harbor users on what’s working well and what’s not. Finally, we’ve ratified a project governance model that defines how the project operates at various levels.

Given Harbor’s already-large global user base across organizations small and large, proposing the project mature into the CNCF Incubator was a natural next step. The processes around progressing to Incubation are defined here. In order to be considered, certain growth and maturity characteristics must first be demonstrated by the project:

  • Production usage: There must be users of the project that have deployed it to production environments and depend on its functionality for their business needs. We’ve worked closely with a number of large organizations leveraging Harbor over the last few years, so: check!
  • Healthy maintainer team: There must be a healthy number of members on the team that can approve and accept new contributions to the project from the community. We have a number of maintainers that founded the project and continue to work on it full time, in addition to new maintainers joining the party: check!
  • Healthy flow of contributions: The project must have a continuous and ongoing flow of new features and code being submitted and accepted into the codebase. Harbor released v1.6 in the summer of 2018, and we’re on the verge of releasing v1.7: check!

CNCF’s Technical Oversight Committee (TOC) evaluated the proposal from the Harbor team and concluded that we had met all the required criteria. It is both deeply humbling and an honor to be in the company of other highly-respected incubated projects like gRPC, Fluentd, Envoy, Jaeger, Rook, NATS, and more.

What’s Harbor anyway?

Harbor is an open source cloud native registry that stores, signs, and scans container images for vulnerabilities.

Harbor solves common challenges by delivering trust, compliance, performance, and interoperability. It fills a gap for organizations and applications that cannot use a public or cloud-based registry, or want a consistent experience across clouds.

Harbor addresses the following common use cases:

  • On-prem container registry – organizations with the desire to host sensitive production images on-premises can do so with Harbor.
  • Vulnerability scanning – organizations can scan images before they are used in production. Images with failed vulnerability scans can be blocked from being pulled.
  • Image signing – images can be signed via Notary to ensure provenance.
  • Role-based Access Control – integration with LDAP (and AD) to provide user- and group-level permissions.
  • Image replication – production images can be replicated to disparate Harbor nodes, providing disaster recovery, load balancing and the ability for organizations to replicate images to different geos to provide a more expedient image pull.

Architecture

The “Harbor stack” comprises various third-party components, including nginx, Docker Distribution v2, Redis, and PostgreSQL. Harbor also relies on Clair for vulnerability scanning and Notary for image signing.

The Harbor components, highlighted in blue, are the heart of Harbor and are responsible for most of the heavy lifting in Harbor:

  • Core Services – provides the API and UI, and intercepts docker pushes/pulls to enforce role-based access control and to prevent vulnerable images from being pulled and subsequently used in production (all of this is configurable).
  • Admin Service – being phased out in v1.7, with its features and functionality merged into the core service.
  • Job Service – responsible for running background tasks (e.g., replication, one-shot or recurring vulnerability scans). Jobs are submitted by the core service and run in the job service component.
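As a toy illustration of the core/job-service split described above (a sketch of the pattern, not Harbor’s actual implementation), the core service only enqueues background work, and the job service drains the queue:

```python
from collections import deque

job_queue = deque()  # stands in for the channel between the two components


def core_submit(kind, target):
    """Core service: accept an API request and enqueue a background task."""
    job_queue.append({"kind": kind, "target": target})


def job_service_drain():
    """Job service: run every queued task and report what was done."""
    done = []
    while job_queue:
        job = job_queue.popleft()
        done.append(f"{job['kind']}:{job['target']}")
    return done


core_submit("replication", "library/nginx")
core_submit("vulnerability-scan", "library/redis")
print(job_service_drain())  # → ['replication:library/nginx', 'vulnerability-scan:library/redis']
```

Keeping long-running work out of the request path is what lets the core service stay responsive while replications and scans run in the background.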

Currently Harbor is packaged via both docker-compose service definition and a Helm chart.

Want to learn more?

The best ways to learn about Harbor are:

  • The Harbor homepage: https://goharbor.io/
  • The Harbor CNCF webinar: https://www.cncf.io/event/webinar-harbor/
  • The webinar slides: https://drive.google.com/file/d/1F6nvZhtw6-bgwdlySXLq3OIvlC6f6Vvb/view?usp=sharing

Community stats and graphs

Harbor has continued an upward trajectory of community growth through 2018. The stats below visualize the consistent growth pre- and post-acceptance into the Cloud Native Sandbox:

Where we are

Harbor is both mature and production-ready. We know of dozens of large organizations leveraging Harbor in production, including at least one serving millions of container images to tens of thousands of compute nodes. The various components that comprise Harbor’s overall architecture are battle-tested in real-world deployments.

Harbor is API driven and is being used in custom SaaS and on-prem products by various vendors and companies. It’s easy to integrate Harbor in your environment, whether a customer-facing SaaS or an internal development pipeline.

The Harbor team strives to release quarterly. We’re currently working on our eighth major release, v1.7, due out soon. Over the last two releases alone we’ve made marked strides toward our long-term goals:

  • Native support of Helm charts
  • Initial support for deploying Harbor via Helm chart
  • Refactoring of our persistence layer, now relying solely on PostgreSQL and Redis – this will help us achieve our high-availability goals
  • Added labels and replication filtering based on labels
  • Improvements to RBAC, including LDAP group-based access control
  • Architecture simplification (i.e., collapsing admin server component responsibilities into core component)

Where we’re going

This is the fun part. 🙂

Harbor is a vibrant community of users – those who use Harbor and publicly share their experiences, the individuals who report and respond to issues, the folks who hang around in our Slack community, and those who spend time on GitHub improving our code and documentation. We’re all incredibly fortunate to receive the rich and exciting ideas that are proposed via GitHub issues on a regular basis.

We’re still working on our v1.8 roadmap, but here are some major features we’re considering and might land at some point in the future (timing to be determined, and contributions are welcome!):

  • Quotas – system- and project-level quotas; networking quotas; bandwidth quotas; user quotas; etc.
  • Replication – the ability to replicate to non-Harbor nodes.
  • Image proxying and caching – a docker pull would proxy a request to, say, Docker Hub, then scan the image before providing to developer. Alternatively, pre-cache images and block images that do not meet vulnerability requirements.
  • One-click upgrades and rollbacks of Harbor.
  • Clustering – Harbor nodes should cluster, replicate metadata (users, RBAC and system configuration, vulnerability scan results, etc.). Support for wide-area clustering is a stretch goal.
  • BitTorrent-backed storage – images are transparently transferred via BT protocol.
  • Improved multi-tenancy – provide additional multi-tenancy construct (system → tenant → project)

Please feel free to share your wishlist of features via GitHub; just open an issue and share your thoughts. We keep track of the items the community desires and will prioritize them based on demand.

How to get involved

Getting involved in Harbor is easy. Step 1: don’t be shy. We’re a friendly bunch of individuals working on an exciting open source project.

The lowest-barrier of entry is joining us on Slack. Ask questions, give feedback, request help, share your ideas on how to improve the project, or just say hello!

We love GitHub issues and pull requests. If you think something can be improved, let us know. If you want to spend a few minutes fixing something yourself – docs, code, error messages, you name it – please feel free to open a PR. We’ve previously discussed how to contribute, so don’t be shy. If you need help with the PR process, the quickest way to get an answer is probably to ping us on Slack.

See you on GitHub!



The Beginner’s Guide to the CNCF Landscape

By | Blog

This blog was originally posted on the CloudOps blog, by Ayrat Khayretdinov

The cloud native landscape can be complicated and confusing. Its myriad of open source projects are supported by the constant contributions of a vibrant and expansive community. The Cloud Native Computing Foundation (CNCF) has a landscape map that shows the full extent of cloud native solutions, many of which are under their umbrella.

As a CNCF ambassador, I am actively engaged in promoting community efforts and cloud native education throughout Canada. At CloudOps I lead workshops on Docker and Kubernetes that provide an introduction to cloud native technologies and help DevOps teams operate their applications.

I also organize Kubernetes and Cloud Native meetups that bring in speakers from around the world and represent a variety of projects. They are run quarterly in Montreal, Ottawa, Toronto, Kitchener-Waterloo, and Quebec City. Reach out to me @archyufa or email CloudOps to learn more about becoming cloud native.

In the meantime, I have written a beginner’s guide to the cloud native landscape. I hope that it will help you understand the landscape and give you a better sense of how to navigate it.

CNCF

The History of the CNCF

In 2014 Google open sourced Kubernetes, an orchestration system inspired by Borg, the internal project it had long used to orchestrate containers. Needing a neutral home for the project, Google partnered with the Linux Foundation to create the Cloud Native Computing Foundation (CNCF), which would encourage the development and collaboration of Kubernetes and other cloud native solutions. The Borg concepts were reimplemented in Go, the result was named Kubernetes, and it was donated as the CNCF's founding project. It became clear early on that Kubernetes was just the beginning and that a swarm of new projects would join the CNCF, extending the functionality of Kubernetes.

The CNCF Mission

The CNCF fosters this landscape of open source projects by helping provide end-user communities with viable options for building cloud native applications. By encouraging projects to collaborate with each other, the CNCF hopes to enable fully-fledged technology stacks comprised solely of CNCF member projects. This is one way that organizations can own their destinies in the cloud.

CNCF Processes

A total of twenty-five projects have followed Kubernetes and been adopted by the CNCF. In order to join, projects must be selected and then elected with a supermajority by the Technical Oversight Committee (TOC). The voting process is aided by a healthy community of TOC contributors, which are representatives from CNCF member companies, including myself. Member projects will join the Sandbox, Incubation, or Graduation phase depending on their level of code maturity.

Sandbox projects are in a very early stage and require significant code maturity and community involvement before being deployed in production. They are adopted because they offer unrealized potential. The CNCF’s guidelines state that the CNCF helps encourage the public visibility of sandbox projects and facilitate their alignment with existing projects. Sandbox projects receive minimal funding and marketing support from the CNCF and are subject to review and possible removal every twelve months.

Projects enter the Incubation phase when they meet all sandbox criteria and also demonstrate certain growth and maturity characteristics. They must be in production usage by at least three companies and maintain a healthy team that approves and accepts a steady flow of contributions, including new features and code from the community.
Once Incubation projects have reached a tipping point in production use, the TOC can vote them into the Graduation phase. Graduated projects have to demonstrate thriving adoption rates and meet all Incubation criteria. They must also have committers from at least two organizations, have documented and structured governance processes, and earn the Linux Foundation Core Infrastructure Initiative's Best Practices Badge. So far, only Kubernetes and Prometheus have graduated.

The Projects Themselves

Below I’ve grouped the projects into twelve categories: orchestration, app development, monitoring, logging and tracing, container registries, storage and databases, runtimes, service discovery, service meshes, service proxies, security, and streaming and messaging. For each project I’ve provided information that can help companies or individuals evaluate what it does, how it integrates with other CNCF projects, and understand its evolution and current state.

Orchestration

Kubernetes (Graduated) – Kubernetes automates the deployment, scaling, and management of containerised applications, emphasising automation and declarative configuration. Its name means helmsman in ancient Greek. Kubernetes orchestrates containers, which are packages of portable and modular microservices. It adds a layer of abstraction, grouping containers into pods, helps engineers schedule workloads, and allows containers to be deployed at scale over multi-cloud environments. Having graduated, Kubernetes has reached a critical mass of adoption: in a recent CNCF survey, over 40% of respondents from enterprise companies reported running Kubernetes in production.
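Declarative configuration is the heart of this model. As a minimal illustration (the names and image tag below are placeholders, not from any particular deployment), a Deployment manifest declares a desired state that Kubernetes continuously reconciles:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx            # illustrative name
spec:
  replicas: 3            # desired state: keep three pods running
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15   # illustrative image tag
        ports:
        - containerPort: 80
```

Applying this with `kubectl apply -f deployment.yaml` is all that's needed; if a pod dies, the control plane schedules a replacement to restore the declared state.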

App Development

Helm (Incubating) – Helm is an application package manager that allows users to find, share, install, and upgrade Kubernetes applications (aka charts) with ease. It helps end users deploy existing applications (including MySQL, Jenkins, Artifactory, and more) using KubeApps Hub, which displays charts from the stable and incubator repositories maintained by the Kubernetes community. With Helm you can install all other CNCF projects that run on top of Kubernetes. Helm also lets organizations create and then deploy custom applications or microservices to Kubernetes. Hand-writing YAML manifests with hard-coded values is not well suited to deployment across different environments or in CI/CD pipelines; with Helm, teams instead create single charts that can be versioned based on application or configuration changes, deployed to various environments, and shared across organizations.
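As a sketch of how a chart separates configuration from templates (the application name and values here are illustrative), a chart's `values.yaml` holds per-environment defaults:

```yaml
# values.yaml: per-environment defaults, overridable with `helm install --set`
replicaCount: 2
image:
  repository: myapp   # illustrative image name
  tag: "1.0.0"
```

A template in the chart then references these as `{{ .Values.replicaCount }}` and `{{ .Values.image.tag }}`, so the same chart can be installed into dev and production with different values files.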

Helm originated at Deis from an attempt to create a ‘homebrew’ experience for Kubernetes users. Helm V1 consisted of the client side of what is currently the Helm project. The server-side component, ‘Tiller’, was added by Deis in collaboration with Google at around the same time that Kubernetes 1.2 was released, producing Helm V2. This was how Helm became the standard way of deploying applications on top of Kubernetes.

Helm is currently making a series of changes and updates in preparation for the release of Helm V3, which is expected to happen by the end of the year. Companies that rely on Helm for their daily CI/CD development, including Reddit, Ubisoft, and Nike, have suggested improvements for the redesign.

Telepresence (Sandbox) – It can be challenging to develop containerized applications on Kubernetes. Popular tools for local development include Docker Compose and Minikube. Unfortunately, most cloud native applications today are resource intensive and involve multiple databases, services, and dependencies. Moreover, it can be complicated to mimic cloud dependencies, such as messaging systems and databases, in Compose and Minikube. An alternative approach is to use fully remote Kubernetes clusters, but this precludes you from developing with your local tools (e.g., IDE, debugger) and creates slow developer “inner loops” that make developers wait for CI to test changes.

Telepresence, which was developed by Datawire, offers the best of both worlds. It allows the developer to ‘live code’ by running single microservices locally for development purposes while remaining connected to remote Kubernetes clusters that run the rest of their application. Telepresence deploys pods that contain two-way network proxies on remote Kubernetes clusters, connecting local machines to those proxies. This gives developers realistic development and test environments without sacrificing local tools for coding, debugging, and editing.

Monitoring

Prometheus (Graduated) – Following in the footsteps of Kubernetes, Prometheus was the second project to join the CNCF and the second (and so far last) project to have graduated. It’s a monitoring solution suited to dynamic cloud and container environments, inspired by Borgmon, Google’s internal monitoring system. Prometheus is a pull-based system: its configuration decides when and what to scrape. This is unlike push-based monitoring systems, where an agent running on each node pushes metrics to a central server. Prometheus stores scraped metrics in a time-series database (TSDB). With its powerful query language, PromQL, you can create meaningful graphs inside Grafana dashboards. You can also generate and send alerts to various destinations, such as Slack and email, using the built-in Alertmanager.
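To make the pull model concrete, here is a minimal sketch of a `prometheus.yml` (the target address is illustrative). Note that it is Prometheus's configuration, not the monitored application, that decides what gets scraped and how often:

```yaml
global:
  scrape_interval: 15s          # how often Prometheus pulls metrics
scrape_configs:
  - job_name: node              # scrape a node_exporter instance
    static_configs:
      - targets: ['node-exporter:9100']   # illustrative address
```

A PromQL expression such as `rate(http_requests_total[5m])` then turns the scraped counters into a per-second request rate suitable for graphing or alerting.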

Hugely successful, Prometheus has become the de facto standard in cloud native metric monitoring. With Prometheus one can monitor VMs, Kubernetes clusters, and microservices being run anywhere, especially in dynamic systems like Kubernetes. Prometheus’ metrics also automate scaling decisions by leveraging Kubernetes’ features including HPA, VPA, and Cluster Autoscaling. Prometheus can monitor other CNCF projects such as Rook, Vitess, Envoy, Linkerd, CoreDNS, Fluentd, and NATS. Prometheus’ exporters integrate with many other applications and distributed systems. Use Prometheus’ official Helm Chart to start.

OpenMetrics (Sandbox) – OpenMetrics creates neutral standards for an application’s metric exposition format. Its modern metric standard enables users to transmit metrics at scale. OpenMetrics is based on the popular Prometheus exposition format, which has over 300 existing exporters and draws on operational experience from Borgmon, which enabled ‘white-box monitoring’ and mass data collection with low overhead. The monitoring landscape before OpenMetrics was largely based on outdated standards and techniques (such as SNMP) that use proprietary formats and place minimal focus on metrics. OpenMetrics builds on the Prometheus exposition format, but with a tighter, cleaner, and more enhanced syntax. While OpenMetrics is only in the Sandbox phase, it is already backed in production by companies including AppOptics, Cortex, Datadog, Google, InfluxData, OpenCensus, Prometheus, Sysdig, and Uber.
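The Prometheus exposition format that OpenMetrics builds on is plain text. A scrape of an application's `/metrics` endpoint returns lines like these (metric name and values are illustrative):

```
# HELP http_requests_total Total number of HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
```

Each line is a metric name, a set of labels, and a sample value; OpenMetrics tightens this syntax while keeping it human-readable and trivially parseable.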

Cortex (Sandbox) – Operational simplicity has always been a primary design objective of Prometheus. Consequently, Prometheus itself runs without clustering (as a single node or container) and uses only local storage that is not designed to be durable or long-term. Clustering and distributed storage come with additional operational complexity that Prometheus forgoes in favour of simplicity. Cortex is a horizontally scalable, multi-tenant, long-term storage solution that complements Prometheus. It allows large enterprises to use Prometheus while retaining high availability (HA) and long-term storage. There are other projects in this space gaining community interest, such as Thanos, Timbala, and M3DB, but Cortex has already been battle-tested as a SaaS offering at both Grafana Labs and Weaveworks and is also deployed on-prem by both EA and StorageOS.

Logging and Tracing

Fluentd (Incubator) – Fluentd collects, interprets, and transmits application logging data. It unifies data collection and consumption so you can better use and understand your data. Fluentd structures data as JSON and brings together the collecting, filtering, buffering, and outputting of logs across multiple sources and destinations. Fluentd can collect logs from VMs and traditional applications, however it really shines in cloud native environments that run microservices on top of Kubernetes, where applications are created in a dynamic fashion.

Fluentd runs in Kubernetes as a DaemonSet (a workload that runs on each node). It collects logs from all applications run as containers (including CNCF projects) that emit their logs to STDOUT. Fluentd also parses and buffers incoming log entries and sends formatted logs to configured destinations, such as Elasticsearch, Hadoop, and Mongo, for further processing.
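A minimal Fluentd configuration sketching this flow, tailing container log files and routing them to Elasticsearch (the paths and host name are illustrative):

```
<source>
  @type tail                          # follow container log files on the node
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
</source>

<match kubernetes.**>
  @type elasticsearch                 # ship parsed logs to Elasticsearch
  host elasticsearch.logging          # illustrative service name
  port 9200
</match>
```

In a real deployment this file is typically mounted into the DaemonSet pods via a ConfigMap, with filter sections in between to enrich records with Kubernetes metadata.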

Fluentd was initially written in Ruby and takes over 50MB of memory at runtime, making it unsuitable for running alongside containers in sidecar patterns. Fluentbit is being developed alongside Fluentd as a solution: written in C, it uses only a few KB of memory at runtime. Fluentbit is more efficient in CPU and memory usage, but has a more limited feature set than Fluentd. Fluentd was originally developed by Treasure Data as an open source project.

Fluentd is available as a Kubernetes plugin and can be deployed as version 0.12, an older and more stable version that is currently widely deployed in production. The new version (1.x) was recently developed and has many improvements, including new plugin APIs, nanosecond resolution, and Windows support. Fluentd is becoming the standard for log collection in the cloud native space and is a solid candidate for CNCF graduation.

OpenTracing (Incubator) – Do not underestimate the importance of distributed tracing for building microservices at scale. Developers must be able to view each transaction and understand the behaviour of their microservices. However, distributed tracing can be challenging because the instrumentation must propagate the tracing context both within and between the processes that exist throughout services, packages, and application-specific code. OpenTracing allows developers of application code, OSS packages, and OSS services to instrument their own code without locking into any particular tracing vendor. It provides a distributed tracing standard for applications and OSS packages, with vendor-neutral APIs and libraries available in nine languages, making OpenTracing ideal for service meshes and distributed systems. OpenTracing itself is not a tracing system that collects traces and analyzes spans in a UI. It is an API that works with application business logic, frameworks, and existing instrumentation to create, propagate, and tag spans. It integrates with both open source (e.g. Jaeger, Zipkin) and commercial (e.g. Instana, Datadog) tracing solutions, and creates traces that are either stored in a backend or rendered in a UI. Click here to try a tutorial, or start instrumenting your own system with Jaeger, a compatible tracing solution.

Jaeger (Incubator) – Jaeger is a distributed tracing system that is compatible with OpenTracing and was originally developed and battle-tested by Uber. Its name is pronounced yā′gər and means hunter. It was inspired by Dapper, Google’s internal tracing system, and by Zipkin, an alternative open source tracing system written at Twitter. Jaeger was built with the OpenTracing standard in mind, whereas Zipkin has only limited OpenTracing integration support; Jaeger does, however, provide backwards compatibility with Zipkin by accepting spans in Zipkin formats over HTTP. Jaeger’s use cases include monitoring and troubleshooting microservices-based distributed systems: it provides distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance and latency optimization. Jaeger’s data model and instrumentation libraries are compatible with OpenTracing. Its modern web UI is built with React/JavaScript, and its backend supports multiple storage options, including Cassandra, Elasticsearch, and in-memory storage. Jaeger integrates with service meshes including Istio and Linkerd, making tracing instrumentation much easier.

Jaeger is itself observable: it exposes Prometheus metrics by default and integrates with Fluentd for log shipping. You can start deploying Jaeger to Kubernetes using a Helm chart or the recently developed Jaeger Operator. Most contributions to the Jaeger codebase come from Uber and Red Hat, but hundreds of companies are adopting Jaeger for cloud native, microservices-based, distributed tracing.

Container Registries

Harbor (Sandbox) – Harbor is an open source trusted container registry that stores, signs, and scans docker images. It provides free-of-charge, enhanced docker registry features and capabilities, including a web interface with role-based access control (RBAC) and LDAP support. It integrates with Clair, an open source project developed by CoreOS, for vulnerability scanning, and with Notary, a CNCF Incubation project described below, for content trust. Harbor provides activity auditing and Helm chart management, and replicates images from one Harbor instance to another for HA and DR. Harbor was originally developed by VMware as an open source solution. It is now used by companies of many sizes, including TrendMicro, Rancher, Pivotal, and AXA.

Storage and Databases

Rook (Incubator) – Rook is an open source cloud native storage orchestrator for Kubernetes. With Rook, ops teams can run software-defined storage (SDS) systems (such as Ceph) on top of Kubernetes. Developers can then use that storage to dynamically create Persistent Volumes (PVs) in Kubernetes to deploy applications such as Jenkins, WordPress, and any other app that requires state. Ceph is a popular open source SDS that can provide many popular types of storage (object, block, and file system) and runs on top of commodity hardware. While it is possible to run Ceph clusters outside of Kubernetes and connect them to Kubernetes using the CSI plugin, deploying and then operating Ceph clusters on hardware is a challenging task, which has reduced the system’s popularity. Rook deploys and integrates Ceph inside Kubernetes as a first-class object using Custom Resource Definitions (CRDs) and turns it into a self-managing, self-scaling, and self-healing storage service using the Operator Framework. The goal of Operators in Kubernetes is to encode human operational knowledge into software that is more easily packaged and shared with end users. In comparison to Helm, which focuses on packaging and deploying Kubernetes applications, an Operator can deploy and manage the life cycle of a complex application. In the case of Ceph, the Rook Operator automates storage administration tasks such as deployment, bootstrapping, configuration, provisioning, horizontal scaling, healing, upgrading, backups, disaster recovery, and monitoring. Initially, the Rook Operator supported Ceph only; as of version 0.8, Ceph support has been moved to beta. The Rook project later announced the Rook Framework for storage providers, which extends Rook into a general-purpose cloud native storage orchestrator that supports multiple storage solutions with reusable specs, logic, policies, and testing. Currently Rook supports CockroachDB, Minio, and NFS, all in alpha, with Cassandra, Nexenta, and Alluxio to come.
The list of companies using the Rook Operator with Ceph in production is growing, especially among companies deploying on-prem, among them CENGN, Gini, and RPR, with many more in the evaluation stage.
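Because Rook models Ceph as a first-class Kubernetes object, standing up a cluster is a matter of applying a custom resource. A sketch along the lines of Rook's CephCluster CRD (field names and values are illustrative and may differ between Rook versions):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook   # where Ceph keeps its state on each node
  mon:
    count: 3                       # three monitors for quorum
  storage:
    useAllNodes: true              # consume storage from every node
    useAllDevices: true
```

The Rook Operator watches for this resource and performs the bootstrapping, configuration, and healing that a storage administrator would otherwise do by hand.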

Vitess (Incubator) – Vitess is a middleware for databases. It employs generalized sharding to distribute data across MySQL instances. It scales horizontally and can keep scaling indefinitely without affecting your application. When your shards reach full capacity, Vitess reshards your underlying database with zero downtime and good observability. Vitess solves many of the problems associated with ever-growing transactional data.

TiKV (Sandbox) – TiKV is a transactional key-value database that offers simplified scheduling and auto-balancing. It acts as a distributed storage layer that supports strong data consistency, distributed transactions, and horizontal scalability. TiKV was inspired by the designs of Google Spanner and HBase, but has the advantage of not depending on a distributed file system. TiKV was developed by PingCAP and currently has contributors from Samsung, Tencent Cloud, and UCloud.

Runtimes

RKT (Incubator) – RKT (read as ‘Rocket’) is an application container runtime that was originally developed at CoreOS. Back when Docker was the default runtime for Kubernetes and was baked into the kubelet, the Kubernetes and Docker communities had challenges working with each other. Docker Inc., the company behind the development of Docker as open source software, had its own roadmap and was adding complexity to Docker: for example, adding swarm mode or changing the filesystem from AUFS to overlay2 without providing notice. These changes were generally not well coordinated with the Kubernetes community and complicated roadmap planning and release dates. At the end of the day, Kubernetes users need a simple runtime that can start and stop containers and provide functionality for scaling, upgrading, and uptime. With RKT, CoreOS intended to create an alternative runtime to Docker that was purpose-built to run with Kubernetes. This eventually led the SIG-Node team of Kubernetes to develop the Container Runtime Interface (CRI), which lets Kubernetes connect to any compliant container runtime and removed Docker-specific code from its core. RKT can consume both OCI images and Docker-format images. While RKT had a positive impact on the Kubernetes ecosystem, the project was never widely adopted by end users, specifically by developers who are used to the docker CLI and don’t want to learn alternatives for packaging applications. Additionally, due to the popularity of Kubernetes, a sprawl of container solutions are competing for this niche. Projects like gVisor and CRI-O (based on OCI) are gaining popularity these days while RKT is losing its position, which makes RKT a potential candidate for removal from the CNCF Incubator.

Containerd (Incubator) – Containerd is a container runtime that emphasises simplicity, robustness, and portability. In contrast to RKT, containerd is designed to be embedded into a larger system rather than being used directly by developers or end users. Similar to RKT, containerd can consume both OCI and Docker image formats. Containerd was donated to the CNCF by the Docker project. Docker’s platform was once a monolithic application, but over time the addition of features such as swarm mode turned it into a complex system. The growing complexity made Docker increasingly hard to manage, and its complex features were redundant when Docker was used with systems like Kubernetes that required simplicity. As a result, Kubernetes started looking for alternative runtimes, such as RKT, to replace docker as the default container runtime. The Docker project then decided to break itself up into loosely coupled components and adopt a more modular architecture. This became known as the Moby Project, in which containerd provides the core runtime functionality. Containerd was later integrated into Kubernetes via a CRI shim known as cri-containerd. However, cri-containerd is no longer required, because containerd ships with a built-in CRI plugin that is enabled by default starting from Kubernetes 1.10 and avoids an extra gRPC hop. While containerd has its place in the Kubernetes ecosystem, projects like CRI-O (based on OCI) and gVisor are gaining popularity these days, and containerd is losing community interest. However, it is still an integral part of the Docker platform.

Service Discovery

CoreDNS (Incubator) – CoreDNS is a DNS server that provides service discovery in cloud native deployments. CoreDNS has been the default cluster DNS in Kubernetes since the 1.12 release. Prior to that, Kubernetes used SkyDNS and later KubeDNS. SkyDNS, a dynamic DNS-based service discovery solution, had an inflexible architecture that made it difficult to add new functionalities or extensions. Kubernetes then moved to KubeDNS, which ran as three containers (kube-dns, dnsmasq, sidecar), was prone to dnsmasq vulnerabilities, and had similar issues extending the DNS system with new functionality. CoreDNS, on the other hand, started as a fork of the Caddy web server and was written in Go from scratch as a flexible, plugin-based, extensible DNS solution. It runs inside Kubernetes as a single container, versus the three that KubeDNS requires, has no known issues with vulnerabilities, and can update its configuration dynamically using ConfigMaps. Additionally, CoreDNS fixed a lot of the issues that KubeDNS had introduced due to its rigid design (e.g. verified pod records). CoreDNS’ architecture allows you to add or remove functionality using plugins. Currently, CoreDNS has over thirty built-in plugins and over twenty external plugins. By chaining plugins, you can enable monitoring with Prometheus, tracing with Jaeger, logging with Fluentd, configuration with the Kubernetes API or etcd, as well as advanced DNS features and integrations.
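Plugin chaining is configured through a Corefile. The example below, close to the stock Kubernetes configuration, chains the kubernetes, prometheus, forwarding, and caching plugins:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
    }
    prometheus :9153          # expose metrics for Prometheus to scrape
    forward . /etc/resolv.conf
    cache 30                  # cache responses for 30 seconds
    reload                    # pick up ConfigMap changes dynamically
}
```

Adding a capability such as tracing is a matter of adding one more plugin line to this chain, with no code changes to CoreDNS itself.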

Service Meshes

Linkerd (Incubator) – Linkerd is an open source network proxy designed to be deployed as a service mesh: a dedicated layer for managing, controlling, and monitoring service-to-service communication within an application. Linkerd helps developers run microservices at scale by improving an application’s fault tolerance via the programmable configuration of circuit breaking, rate limiting, timeouts, and retries, all without application code changes. It also provides visibility into microservices via distributed tracing with Zipkin. Finally, it provides advanced traffic-control instrumentation to enable canary, staging, and blue-green deployments. SecOps teams will appreciate Linkerd’s ability to transparently encrypt all cross-node communication in a Kubernetes cluster via TLS. Linkerd is built on top of Twitter’s Finagle project, which has extensive production usage and attracts the interest of many companies exploring service meshes. Today Linkerd can be used with Kubernetes, DC/OS, and AWS/ECS. The Linkerd service mesh is deployed on Kubernetes as a DaemonSet, meaning it runs one Linkerd pod on each node of the cluster.

Service Proxies

Envoy (Incubator) – Envoy is a modern edge and service proxy designed for cloud native applications. It is a vendor-agnostic, high-performance, lightweight (written in C++), production-grade proxy that was developed and battle-tested at Lyft. Envoy is now a CNCF incubating project. Envoy provides fault-tolerance capabilities for microservices (timeouts, security, retries, circuit breaking) without changing any lines of existing application code. It provides automatic visibility into what’s happening between microservices via integration with Prometheus, Fluentd, Jaeger, and Kiali. Envoy can also be used as an edge proxy (e.g. an L7 ingress controller for Kubernetes) thanks to its capabilities for traffic routing and splitting, as well as zone-aware load balancing with failover.

While the service proxy landscape already has many options, Envoy is a great addition that has sparked a lot of interest and revolutionary ideas around service meshes and modern load balancing. Heptio announced project Contour, an ingress controller for Kubernetes that works by deploying the Envoy proxy as a reverse proxy and load balancer. Contour supports dynamic configuration updates and multi-team Kubernetes clusters, with the ability to limit which namespaces may configure virtual hosts and TLS credentials, and provides advanced load-balancing strategies. Another project that uses Envoy at its core is Datawire’s Ambassador, a powerful Kubernetes-native API gateway. Because Envoy is written in C++, it is super lightweight and a perfect candidate for running in a sidecar pattern inside Kubernetes; in combination with its API-driven configuration updates, it has become a perfect candidate for service mesh data planes. The Istio service mesh announced Envoy as the default service proxy for its data plane, where Envoy proxies are deployed alongside each instance inside Kubernetes using a sidecar pattern, creating a transparent service mesh that is controlled and configured by Istio’s control plane. This approach contrasts with the DaemonSet pattern used in Linkerd v1, and provides visibility into each service as well as the ability to create secure TLS for each service inside Kubernetes, or even in hybrid cloud scenarios. Recently HashiCorp announced that its open source project Consul Connect will use Envoy to establish secure TLS connections between microservices.

Today Envoy has a large and active open source community that is not driven by any single vendor or commercial project. If you want to start using Envoy, try Istio, Ambassador, or Contour, or join the Envoy community at KubeCon (Seattle, WA) on December 10th, 2018 for the very first EnvoyCon.

Security

Falco (Sandbox) – Falco is an open source runtime security tool developed by Sysdig. It was designed to detect anomalous activity and intrusions in Kubernetes-orchestrated systems. Falco is more an auditing tool than an enforcement tool (such as SecComp or AppArmor). It runs in user space with the help of a Sysdig kernel module that retrieves system calls.

Falco runs inside Kubernetes as a DaemonSet with a preconfigured set of rules that define the behaviours and events to watch out for. Based on those rules, Falco detects suspicious behaviour expressed as Linux system calls (such as a shell being spawned inside a container or a binary making an outbound network connection) and raises alerts. These events can be captured on STDERR via Fluentd and then sent to Elasticsearch for filtering, or to Slack. This can help organizations quickly respond to security incidents, such as container exploits and breaches, and minimize the financial penalties posed by such incidents.
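Falco rules are written in YAML on top of Sysdig's filter syntax. A condensed, illustrative version of the well-known "shell in a container" rule (the shipped rule is more elaborate) looks like this:

```yaml
- rule: Terminal shell in container
  desc: A shell was spawned inside a container
  condition: container.id != host and proc.name = bash
  output: "Shell spawned in container (user=%user.name container=%container.id)"
  priority: WARNING
```

The condition is evaluated against every captured system call event, and the output template is rendered into the alert that Fluentd or Slack receives.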

With the addition of Falco to the CNCF sandbox, we hope that there will be closer integrations with other CNCF projects in the future. To start using Falco, find an official Helm Chart.

Spiffe (Sandbox) – Spiffe provides a secure production identity framework. It enables communication between workloads by verifying identities. It’s policy-driven, API-driven, and can be entirely automated. It’s a cloud native solution to the complex problem of establishing trust between workloads, which becomes difficult and even dangerous as workloads scale elastically and become dynamically scheduled. Spiffe is a relatively new project, but it was designed to integrate closely with Spire.

Spire (Sandbox) – Spire is Spiffe’s runtime environment. It’s a set of software components that can be integrated into cloud providers and middleware layers. Spire has a modular architecture that supports a wide variety of platforms. In particular, the communities around Spiffe and Spire are growing very quickly. HashiCorp just announced support for Spiffe IDs in Vault, so it can be used for key material and rotation. Spiffe and Spire are both currently in the sandbox.

Tuf (Incubator) – Tuf is short for ‘The Update Framework’, a framework used for trusted content distribution. Tuf helps solve content trust problems, which can be a major security issue. It helps validate the provenance of software and verify that only the latest version is being used. The TUF project plays many important roles within the Notary project described below. It is also used in production by many companies, including Docker, DigitalOcean, Flynn, Cloudflare, and VMware, to build their internal tooling and products.

Notary (Incubator) – Notary is a secure software distribution implementation. In essence, Notary is based on TUF and ensures that all pulled docker images are signed, correct, and untampered-with versions of an image at any stage of your CI/CD workflow, addressing one of the major security concerns for Docker-based deployments on Kubernetes. Notary publishes and manages trusted collections of content. It allows DevOps engineers to approve trusted data that has been published and to create signed collections. This is similar to the software repository management tools present in modern Linux systems, but for Docker images. Some of Notary’s goals include guaranteeing image freshness (always having up-to-date content so vulnerabilities are avoided), trust delegation between users, and trusted distribution over untrusted mirrors or transport channels. While Tuf and Notary are generally not used directly by end users, their solutions integrate into various commercial products and open source projects for content or image signing of trusted distributions, such as Harbor, Docker Enterprise Registry, Quay Enterprise, and Aqua. Another interesting open source project in this space, Grafeas, is an open source API for metadata that can be used to store ‘attestations’ or image signatures, which can then be checked as part of admission control and used in products such as GCP’s Container Analysis API and Binary Authorization, as well as products from JFrog and AquaSec.

Open Policy Agent (Sandbox) – By requiring policies to be specified declaratively, Open Policy Agent (OPA) allows different kinds of policies to be distributed across a technology stack and have updates enforced automatically without recompiling or redeploying. Living at the application and platform layers, OPA works by answering queries sent from services to inform policy decisions. It integrates well with Docker, Kubernetes, Istio, and many more.
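OPA policies are written declaratively in its Rego language. As a sketch of a Kubernetes admission policy (the registry name is illustrative), the following denies pods whose images come from outside an approved registry:

```
package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    # iterate over every container image in the pod spec
    image := input.request.object.spec.containers[_].image
    not startswith(image, "registry.example.com/")
    msg := sprintf("image %v is not from the approved registry", [image])
}
```

The admission webhook queries OPA with the incoming object as `input`; any message produced by a `deny` rule rejects the request, and the policy can be updated without redeploying anything.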

Streaming and Messaging

NATS (Incubator) – NATS is a messaging service that focuses on middleware, allowing infrastructure to send and receive messages between distributed systems. Its clustering and self-healing technologies make it highly available, and its log-based streaming provides guaranteed delivery for replaying historical data and receiving all messages. NATS has a relatively straightforward API and supports a diversity of technical use cases, including messaging in the cloud (general messaging, microservices transport, control planes, and service discovery) and IoT messaging. Unlike the solutions for logging, monitoring, and tracing listed above, NATS works at the application layer.

gRPC (Incubator) – A high-performance RPC framework, gRPC allows communication between libraries, clients, and servers across multiple platforms. It can run in any environment and provides support for proxies such as Envoy and NGINX. gRPC efficiently connects services with pluggable support for load balancing, tracing, health checking, and authentication. Connecting devices, applications, and browsers with back-end services, gRPC is an application-level tool that facilitates messaging.

CloudEvents (Sandbox) – CloudEvents provides developers with a common way to describe events that happen across multi-cloud environments. By providing a specification for describing event data, CloudEvents simplifies event declaration and delivery across services and platforms. Still in the Sandbox phase, CloudEvents should greatly increase the portability and productivity of applications.

 

What’s Next?

The cloud native ecosystem is continuing to grow at a fast pace. More projects will be adopted into the Sandbox in the near future, giving them a chance to gain community interest and awareness. That said, we hope that infrastructure-related projects like Vitess, NATS, and Rook will continue to get attention and support from the CNCF, as they will be important enablers of cloud native deployments on premises. Another area we hope the CNCF will continue to focus on is cloud native continuous delivery, where there is currently a gap in the ecosystem.

While the CNCF accepts and graduates new projects, it is also important to have a working mechanism for removing projects that have lost community interest, either because they have ceased to provide value or because they have been replaced by other, more relevant projects. While the project submission process is open to anybody, I hope that the TOC will continue to sponsor only the best candidates, making the CNCF a diverse ecosystem of projects that work well with each other.

As a CNCF ambassador, I hope to teach people how to use these technologies. At CloudOps I lead workshops on Docker and Kubernetes that provide an introduction to cloud native technologies and help DevOps teams operate their applications. I also organize Kubernetes and Cloud Native meetups that bring in speakers from around the world and represent a variety of projects. They are run quarterly in Montreal, Ottawa, Toronto, Kitchener-Waterloo, and Quebec City. I would also encourage people to join the Ambassador team at KubeCon + CloudNativeCon North America 2018 on December 10th. Reach out to me @archyufa or email CloudOps to learn more about becoming cloud native.

Connect Everything: A Look at How NATS.io can Lead to a Securely Connected World

By | Blog

By Colin Sullivan

Developing and deploying applications that communicate in distributed systems, especially in cloud computing, is complex. Messaging has evolved to address the general needs of distributed applications but hasn’t gone far enough. We need a messaging system that takes the next steps to address cloud, edge, and IoT needs. These include ever-increasing scalability requirements in terms of millions, if not billions of endpoints, a new emphasis toward resiliency of the system as a whole over individual components, end-to-end security, and the ability to have a zero-trust system. In this post we’ll discuss the steps NATS is taking to address these needs, leading toward a securely connected world.

Let’s break down the challenges into scalability, resiliency at scale, and security.

Scalability

To support millions, or even billions of endpoints spanning the globe, most architectures would involve a federated approach with many layers filtering up to a control layer, driven by a required central authority for configuration and security. Instead, NATS is taking a distributed and decentralized approach.

We are introducing a resilient, self-healing way to connect NATS clusters together – NATS superclusters (think clusters of clusters) that optimize data flow. If one NATS server cluster can handle millions of clients, imagine hundreds of them connected to support truly global connectivity with no need for a central authority or hub – no single point of failure.

Traditional messaging systems require precise knowledge of a server cluster topology. This was acceptable in the past as there was no requirement otherwise. However, it severely hinders scalability in a hybrid cloud environment, and beyond to edge and IoT. NATS addresses this with the cloud-native feature of auto-discovery. NATS servers share topology by automatically exchanging cluster information with each other. When scaling up and adding a NATS server to an existing cluster, topology information is automatically shared, and boom – each server has complete knowledge of the cluster in real time with zero configuration changes. And it gets better – clients also receive this topology information. NATS maintainer-supported clients use it to keep an up-to-date list of available servers, enabling clients to connect to new servers and avoid connecting to servers that are no longer available – again with no configuration changes. View this feature in the context of scaling up (or down), rolling upgrades, or cruising past instance failures and this is an operator’s dream.

Resiliency at Scale

In messaging, priorities have changed with the move from traditional on-premise computing to cloud computing: The needs of the system as a whole must be prioritized over individual components in the system. Messaging systems evolved to suit predictable and well-defined systems. We knew exactly what and where our hardware resources were to accurately identify, predict, and shore up weak points of the system. While cloud vendors have done a great job, you cannot count on the same levels of predictability unless you are willing to pay for dedicated hardware.

Today we have machine instances disappearing (often intentionally), spurious network issues, or even unpredictable availability of CPU cycles on multi-tenant instances. Add to this the decomposition of applications into microservices and we have many more moving parts in a distributed system – and many more places to fail.

Unhealthy servers or clients that can’t keep up are known to NATS as slow consumers. The NATS messaging system will not assist slow consumers. Instead, they are disconnected, protecting the system as a whole. Server clusters self-heal, and clients might be restarted or simply reconnect elsewhere. If using NATS streaming, missed messages are redelivered.

This works so well that we have users who deploy on spot instances knowing the instances will terminate at some point, counting on the holistically resilient behavior of NATS to reduce their operational costs.

Security

The NATS team aims to enable the creation of a securely connected world where data transfer is protected end to end and metadata is anonymous. NATS is taking a novel approach to solving this.

Concerning identities, we’ll continue to support user/password authentication, but our preference will be a new approach using NKeys – a form of Ed25519 key made extremely simple.

NKeys are fast and resistant to side-channel attacks. Public NKeys may be registered with the NATS server; during connect, the endpoint (client application) signs a nonce with its private key and returns it to the server, where the identity can be verified. The NATS messaging system never accesses or stores private keys – operators or organizations manage their own. In this new model, authorization to publish and receive data is tied to an NKey identity.

NKeys may be assigned to accounts, which are securely isolated communication contexts that allow multi-tenancy. Servers may be configured with multiple accounts, to support true multi-tenancy, bifurcating topology from the data flow. The decision to silo data is driven by use case, rather than software limitations, and operators only need to maintain one NATS deployment. An account can span several NATS server clusters (enterprise multi-tenancy), be limited to just one cluster (data silo by cluster), exist within a subset of a cluster or even in a single server.

Between accounts, specific data can be shared or exchanged through a service (request/reply) or a stream (publish/subscribe). Mutual agreement between accounts allows data flow, allowing for decentralized account management.

Of course, NATS will continue to support TLS connections, but applications can go a step further by using NKeys to sign payloads.

Connecting It All Together

These features, along with other work the NATS team is doing around scalability, resiliency at scale, and security, will allow NATS to provide secure and decentralized global connectivity. We’ll be discussing these in depth at KubeCon + CloudNativeCon North America in December, where we have three sessions – we’d love it if you can attend. We always enjoy a good conversation around solving hard problems like this and invite you to stop by the Synadia booth to visit the NATS maintainers and learn more.

GSoC 2018: Building a Conditional Name Server Identifier for CoreDNS

By | Blog

The Google Summer of Code (GSoC) 2018 program has come to an end, and we followed up with CNCF’s seven interns, previously featured in this blog post, to check in on how their summer projects progressed.

Over the summer, Jiacheng Xu, a graduate student in Computational Science and Engineering at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, worked with mentors Miek Gieben and Yong Tang on “Conditional Name Server Identifier for CoreDNS,” a project to identify nodes without domain name collisions in distributed TensorFlow.

My Google Summer of Code project was to write a CoreDNS plugin named idetcd, used for identifying nodes in a cluster without domain name collisions. CoreDNS is a fast and flexible DNS server, and it is now a part of Kubernetes.

Motivation

In a distributed system, identifying nodes in the cluster is a big challenge: in a cluster containing thousands of nodes, machines frequently go down or start and restart, and it would be quite annoying if a reboot were needed every time cluster membership changed. Tackling this problem usually requires complicated protocols or additional DevOps work; what’s more, in most cases the configuration needs to be customized for every node, which requires lots of effort and also adds risk to managing the system.

Distributed TensorFlow encounters similar problems [1]. In older versions of distributed TensorFlow, adding a node to the cluster was not easy: first you had to bring the whole system down, then write a custom configuration for the new node, and finally restart the whole system. This required additional DevOps work, and it was not “friendly” to machine learning enthusiasts who wanted to set up their own distributed TensorFlow clusters. For people who are not familiar with distributed systems, it would be great to be able to add or delete a node with one or two commands.

In practice, there are several approaches to this problem. For example, as mentioned before, building protocols on top of the current TensorFlow codebase could achieve the goal, but it is probably not a good way, since it might require changing the structure of TensorFlow and would bring unnecessary complexity. Also, people unfamiliar with TensorFlow’s internals would still be stuck if they hit problems deploying their own clusters. A more flexible approach is to add a separate module, such as a DNS server, and let nodes expose themselves through DNS.

CoreDNS has a plugin-based architecture and is a really lightweight, flexible, and extensible DNS server that can easily load a customized plugin. To solve this issue, we can run CoreDNS with a custom plugin on every node in the TensorFlow cluster and use the plugin to write and read DNS records in a distributed key-value store such as ZooKeeper or etcd. This is what idetcd does.

How it works

The figure [2] above shows how it works. The idea is quite simple: set up a CoreDNS server on every node, and each node exposes itself by taking a free domain name.

In detail, before the cluster is started, we set up a CoreDNS server on every node, each with the same configuration (see the example below for details), which specifies the domain name pattern for nodes in this cluster (such as worker{{.ID}}.tf.local.), the maximum number of nodes allowed in the cluster, and the etcd endpoints. Then we just start all the nodes. That’s it!

Note that at start time, none of the nodes has exposed itself to the others. When the CoreDNS server starts on a node, the node tries to find a free slot in etcd to expose itself. For example, a node may first try to take worker1.tf.local. and check whether this domain name already exists in etcd: if it does, the node increases the ID to 2 and looks into etcd again; otherwise, it takes the name and writes it to etcd. In this way, every node can dynamically find a domain name for itself without any collision. We also don’t need to customize the configuration for each node; instead, every node gets the same configuration, and the nodes expose themselves!

Usage

Syntax

CoreDNS uses a configuration file called Corefile to specify the configuration, please go to CoreDNS Github repo for more details. Here is a snippet for idetcd syntax:

  • endpoint ENDPOINT – the etcd endpoints. Defaults to “http://localhost:2379“.
  • limit LIMIT – the maximum number of nodes in the cluster. If a node tries to expose itself after the cluster has hit this limit, it will fail.
  • pattern PATTERN – the domain name pattern that every node in the cluster follows, written as a Go template.

Example

In the following example, we are going to start a cluster containing 5 nodes. On every node we can get this project by:

Before you move to the next step, make sure that you’ve already set up an etcd instance, and don’t forget to write down its endpoints.

Then you need to add a Corefile specifying the configuration of the CoreDNS server in the same directory as main.go. A simple Corefile example is as follows; please go to the CoreDNS GitHub repo for more details.
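The original example did not survive formatting; based on the syntax described above, a minimal Corefile for idetcd might look like the following (the endpoint, limit, and pattern values are placeholders to adapt to your own etcd instance and cluster size):

```
tf.local {
    idetcd {
        endpoint http://localhost:2379
        limit 5
        pattern worker{{.ID}}.tf.local.
    }
}
```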

And then you can generate binary file by:

Alternatively, if you have docker installed, you could also execute the following to build:

Then run it by:

After that, all the nodes in the cluster try to find free slots in etcd to expose themselves. Once they succeed, you can resolve the domain name of every node from any node in the same cluster by:

IPv6 is also supported:

Integration with AWS

Using CoreDNS with the idetcd plugin to configure the cluster is a one-time process, which is different from the usual configuration process. For example, if you want to set up a cluster of several instances on AWS, you can use the same configuration for every instance and let all the instances expose themselves during initialization. This can be achieved by using cloud-init in the user data. Here is a bash script example for AWS instances to execute at launch:
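The original script did not survive formatting; a hedged sketch of what such user data might look like follows. The download URL and file paths are placeholders, not the post's actual values – the point is only that every instance runs the identical script and then claims its own worker name.

```shell
#!/bin/bash
# Hypothetical cloud-init user data: every instance runs this same script.
# Fetch a coredns binary built with the idetcd plugin (placeholder URL),
# install the shared Corefile, and start the server; each instance then
# claims a free worker<N>.tf.local. name for itself in etcd.
curl -sL -o /usr/local/bin/coredns https://example.com/coredns-with-idetcd
chmod +x /usr/local/bin/coredns
/usr/local/bin/coredns -conf /etc/coredns/Corefile &
```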

Experience with GSoC and CoreDNS

I really appreciate my mentor Yong Tang. Every time, he answered my questions patiently. Instead of telling me the correct answer directly, he always gave me some background on the problem, offered several approaches, and then explained the pros and cons of each. I really learned a lot from talking with Yong. He also shared experience from his own career development and gave me a lot of super useful advice about my future career. He was really responsible, always replied to my emails within an hour, and provided super solid technical support. I think he is the best mentor I’ve ever met – and I don’t know how he could be any better!

Also, CoreDNS is a really nice project. All the developers there were so welcoming, and every time I posted issues on the CoreDNS GitHub repo, they would give me a clear answer and technical help within a couple of hours.

Overall, I am really happy to have been a part of GSoC this summer. I enjoyed it a lot, learned a lot during the project, and am now an open source lover!

FoundationDB Summit Program Announced

By | Blog

This post was originally published by the FoundationDB team, here.

We’re pleased to announce the program for FoundationDB Summit, our inaugural community conference taking place on December 10 during KubeCon + CloudNativeCon in Seattle, Washington.

Register today to join the FoundationDB community in person!

What to expect

We’ve organized FoundationDB Summit into a single-track event, with plenty of time to meet members of the community and learn from one another.

The conference will kick off with an introduction to the project, share updates since it was open sourced in April, and offer a technical overview of FoundationDB.

The rest of the day will pick up the pace and take us on a tour of what’s happening in the community, focusing on three areas:

These are just samples of some of the talks that are planned. The event schedule is now online, and we encourage you to check out the full program which includes lightning talks.

Who should attend

FoundationDB Summit is a technical conference that will bring together early adopters, core developers, and other community members. If you’re interested in learning more about the project, we encourage you to join us!

We’ve also established diversity scholarships for applicants from underrepresented and/or marginalized groups. The deadline to apply for scholarships is next Friday, November 2.

FoundationDB Summit is also part of the KubeCon + CloudNativeCon Community Events Day, and will be in the same Washington Convention Center as thousands of other open source engineers that week.

Join the community

FoundationDB Summit is a great opportunity to join the community, learn from and meet others, and get involved. We believe that a strong open source project requires a vibrant community, and we hope you’ll join us in Seattle to help grow that for FoundationDB. Registration is open!

Before the conference, if you’re looking to get involved we also encourage your participation on the community forums, and welcome contributions.

We look forward to seeing you at FoundationDB Summit in Seattle!

CNCF and BVP Host Discussions with Leading Open Source Contributors and End Users

By | Blog

This blog was originally posted on the Bessemer Venture Partners Blog.

Bessemer was proud to partner with the CNCF to host an evening devoted to discussion around exciting new open source infrastructure projects and best practices for enterprises making the cloud native transformation. Given the high concentration of financial services firms in NYC, we were fortunate to have great input from many of these leaders on the added challenges of making these transitions in highly-regulated institutions.

Across all of these discussions, it was clear that everyone is dealing with similar macro challenges. Today’s infrastructure leaders are in a tough spot, having to meet increasingly strict data governance and security policies while also providing the ever increasing scalability that modern enterprises demand.  In addition, these leaders are now faced with a myriad of infrastructure options (public/private/hybrid cloud), compute paradigms (VM/Container/Serverless), open source projects, and startup vendors to choose from. Fortunately, governing bodies such as the CNCF are providing some direction and guidance for enterprises as they begin to navigate the crowded space.

Some of the best actionable tips that we heard regarding helping enterprises succeed with cloud native transitions include:

1)    Empower engineers to explore the open source landscape and experiment with new projects and then establish a point person, such as a Director of Open Source, to help centralize enforcement of best practices before new open source projects go into production.

2)    Create a funnel process to help qualify open source projects during testing and help teams get approval to use production data or build mission critical apps with open source software. This enables development teams to move quickly and experiment with new technologies while also ensuring an enterprise can manage their security and compliance concerns before new deployments go into production.

3)    Assign clear ownership to microservices and internal open source projects. When you have hundreds or thousands of microservices in production, knowing who to contact for what service can become a real problem.

4)    If you need help navigating this rapidly changing ecosystem, check out some of the guidance provided by the CNCF here.

In addition, the evening featured talks from three great speakers, as outlined below.

Ken Owens, VP of Cloud Engineering at Mastercard, shared lessons learned from his experience adopting two of the leading open source technologies, Kubernetes and Istio, in production at Mastercard. Based on our conversations with many large enterprises in the financial services industry, we were really impressed to learn from Ken just how far ahead Mastercard is in this transition. His main advice: to get any large enterprise, particularly a highly regulated company in the financial services industry, to cross the open source chasm, you need very strong buy-in from senior executives to go cloud native. He advised team leads to lay the necessary communications and relationship-building groundwork, since this infrastructure change will be a multi-month, sometimes multi-year endeavor.

Priyanka Sharma, Director of Alliances at GitLab, shared tips on how to avoid some of the DevOps horror stories she has seen at enterprises trying to keep up with the latest trends: go for quality over quantity in dev tools. Latching onto every new trend in DevOps can ultimately lead to more headache and heartbreak than efficiency savings. Instead of having a “toolbox” of ten different DevOps tools, try to standardize on 2 or 3 tools that fit the bulk of your needs.

Andrew Jessup, project maintainer for SPIFFE and co-founder of Scytale, shared the origin story of the SPIFFE open source project and the need they saw in the market for scalable identity and access management infrastructure for microservices. He also highlighted how the pain points SPIFFE solves become increasingly acute as enterprises move to microservices-based architectures distributed across heterogeneous infrastructure that requires high scalability and elasticity. As these environments have workloads constantly spinning up and disappearing across various network boundaries, traditional approaches to security and authentication break down and an identity-focused approach becomes necessary.

CNCF is proud to foster and support the next wave of open source communities. We want to give a special thanks to BVP for supporting CNCF, and making this event possible with support for their portfolio companies, npm, ScyllaDB, and Scytale. We also want to thank the speakers for sharing their first hand open source knowledge, helping to give back and push the community forward.

 

GSoC 2018: Extending Envoy’s fuzzing coverage

By | Blog

The Google Summer of Code (GSoC) 2018 program has come to an end, and we followed up with CNCF’s seven interns, previously featured in this blog post, to check in on how their summer projects progressed.

Over the summer, Anirudh Morali of Anna University worked with mentors Matt Klein, Constance Caramanolis, and Harvey Tuch on “Extending Envoy’s Fuzzing Coverage,” a project focused on extending fuzz coverage to include proto, data plane, and H2-level frame fuzzing.

I am Anirudh, a final year computer science undergraduate at Anna University, India. I previously participated in Google Summer of Code 2017 with Haiku OS, and this time, Google Summer of Code 2018 with Cloud Native Computing Foundation.

It was February 12th when Google posted the list of organizations participating in Google Summer of Code 2018. During my internship with Hasura, I learned about the Cloud Native Computing Foundation and its projects. Having heard about the experiences of people who participated in GSoC with CNCF the previous year, I was interested in sending a proposal to CNCF, but had no knowledge of any specific project or domain to work on. The CNCF projects participating in GSoC (Kubernetes, Prometheus, Envoy, CoreDNS, and a few more) had huge codebases, which was overwhelming for me at first. Though most codebases had parts written in different languages, the majority were in Go. I wanted to learn Go, find low-hanging issues, fix them, and then work on a proposal, but I realized it was too late to learn a new language with just a few weeks left before the program began – and it would be even more difficult to write code for projects that are not beginner friendly.

I had previously worked on a C++ codebase and was looking for projects built with C++. I came across Envoy and noticed that this was the first time Envoy was participating in GSoC. You can read about Envoy here: https://www.envoyproxy.io/ – Envoy is a high-performance service and edge proxy initially created at Lyft, with a codebase in C++. It’s designed to minimize memory and CPU footprint while offering load balancing, network tracing, and visibility into database activity in microservice architectures.

I spent a few days learning what Envoy was and took part in one of the community meetings. Even though I didn’t understand the technical discussion, it was a good experience to see how people kept in touch remotely and worked collaboratively on a project. I talked with Harvey Tuch, who was already working on fuzzing for Envoy, and Matt Klein told me that I’d be working with him on extending the fuzzing support for Envoy.

My last exam and the community bonding period both fell in the week of May 2nd. With no more exams, I was able to spend more time on GSoC. We had video calls to get things started. There was an ongoing tracking issue for all the fuzzing support needed in Envoy: https://github.com/envoyproxy/envoy/issues/508

I became part of the Envoy GitHub organization and was assigned the issue to work on. Harvey gave me a quick intro to the infrastructure Envoy uses for fuzzing, OSS-Fuzz, and there was a guide on getting started with writing fuzz tests. I started working on fuzzing utility functions in Envoy, such as the string functions, and then opened a pull request: https://github.com/envoyproxy/envoy/pull/3493. After a series of code reviews, suggestions, and fixes, the pull request was merged, marking my first contribution to Envoy. 😀

For the second evaluation, I picked up the configuration validation fuzzing tasks. The task was supposed to be a small fix to the existing server fuzz test, but it took me some time to figure it out. I also started working on the H1 capture fuzzing test. H1 fuzzing now covers both the request and response paths, i.e. fuzzing happens both downstream and upstream; with direct response enabled, fuzzing happens only downstream, and a canned response (such as a file) is returned instead of connecting to the upstream.

OSS-Fuzz is a project by Google that helps open source projects become more secure and stable by fuzzing their codebases with fuzzing engines. I got access to Envoy’s OSS-Fuzz dashboard and started working on the issues present there. I picked up simple tests that were failing because of assert failures and fixed them with proper error capture or by adding Bazel build constraints. Some PRs needed changes to other repositories, so I raised issues there as well.

Work done during the summer:

Pull requests:

Issues:

Owe my biggest thanks to:

  • Envoy open-source community (Specifically, my kind mentors for taking time to clarify doubts, and helping me learn).
  • Nikhita, Chris Aniszczyk (CNCF organization admins) and Amit, for introducing me to Envoy.
  • People at my university for the permission and time to pursue GSoC.

If you’re an aspiring GSoCer, go ahead and apply to CNCF – you’ll have an awesome summer. You can get in touch with me with any queries on getting started; you can find my contact details here. Will be happy to help! 🙂