Q&A with JD.com: Kubernetes, Cloud Native, and CNCF Projects Driving Big Data and AI

August 2, 2018

Liu Haifeng, Chief Architect at JD.com, sat down with the Cloud Native Computing Foundation (CNCF) to talk about cloud native, JD.com’s Kubernetes implementation, and tips for other companies looking to get started with open source. Below is the interview.

 

CNCF: How do you see your CNCF membership and cloud native technologies helping JD.com realize its “Retail as a Service” vision?

Haifeng: The goal of our Retail as a Service (RaaS) strategy is to open up our capabilities and resources to empower our partners, suppliers, and other industries. This is very much in line with our commitment to open source technologies. We’ve already benefited tremendously from the CNCF projects we have been a part of and our new commitment to CNCF enables us to build even stronger collaborative relationships with the industry’s top developers, end users, and vendors and ultimately enables us to contribute more to the open source community. Joining CNCF is an important step for us as we develop new container-native technologies towards an open platform to realize our RaaS vision.

CNCF: What impact has Kubernetes had on your company and/or development team?

Haifeng: JD.com is one of the earliest adopters of Kubernetes. The company currently manages the world’s largest Kubernetes clusters in production, with more than 20,000 bare metal servers in several clusters across data centers in multiple regions.

CNCF: How has Kubernetes helped JD to conduct AI or big-data analytics to revolutionize e-commerce?

Haifeng: JDOS, our customized and optimized Kubernetes, supports a wide range of workloads and applications, including big data and AI. JDOS provides a unified platform for managing both physical servers and virtual machines, including containerized GPUs, and delivers big data and deep learning frameworks such as Flink, Spark, Storm, and TensorFlow as services. By co-scheduling online services with big data and AI computing tasks, we significantly improve resource utilization and reduce IT costs.

CNCF: How big is the Kubernetes cluster JD runs? Please describe it and the team using Kubernetes.

Haifeng: JD currently manages the world’s largest Kubernetes clusters in production, with more than 20,000 bare metal servers in several clusters across data centers in multiple regions.

CNCF: How are Kubernetes and cloud native empowering JD developers? What can they do now that they couldn’t do before?

Haifeng: The old deployment tools required different processes for different environments, from application packaging and container application to deployment, configuration, and scaling. The overall process was complicated and time-consuming. The introduction of Kubernetes dramatically simplified it. Applications are now automatically packaged into images and deployed onto containers in near-real time. Scaling is now a simple one-click operation that completes within a few seconds.

CNCF: JD runs one of the largest Kubernetes clusters in production in the world. How has the company overcome hurdles to make this possible?

Haifeng: We are constantly monitoring the performance of our systems. To address performance issues in the past, we collected and analyzed several key performance indicators and generated a detailed bottleneck analysis report. We then customized Kubernetes by removing unnecessary functions and optimizing the default scheduler. We also enhanced multiple controllers to avoid cascading failures. In addition, we developed an operational toolkit for inspection, monitoring, alerting, and fault handling, which helps operators troubleshoot and quickly resolve issues as they arise.

CNCF: JD just celebrated its famous June 18 anniversary sale (“618”), clocking transaction volume of over $24.7 billion during the 18-day period. That is a lot of orders. Can you talk about how your system is able to handle this much volume?

Haifeng: JDOS uses a prediction-based algorithm to proactively allocate resources to meet forecasted demand and improve resource utilization. It also provides millisecond-level elastic scaling to handle extreme workloads. Our annual June 18 anniversary sale generated $24.7 billion in transaction volume this year. With over 300 million customers on our platform, we experience a significant peak in traffic during this period. We scheduled approximately 460,000 containers (Pods) and 3,000,000 CPU cores to support the massive volume of orders.
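
As a rough illustration of what prediction-based allocation can look like, the Python sketch below extrapolates the next interval's request rate from recent samples and sizes the Pod count ahead of demand. It is not JDOS code: the linear-trend forecast, the per-Pod capacity, the headroom factor, and every function and parameter name are assumptions made up for this example.

```python
import math
from typing import Sequence


def forecast_next(samples: Sequence[float], window: int = 3) -> float:
    """Extrapolate the next value from the average step over the recent window."""
    if not samples:
        raise ValueError("need at least one sample")
    recent = list(samples[-window:])
    # Average change between consecutive recent samples (0 with a single sample).
    trend = (recent[-1] - recent[0]) / (len(recent) - 1) if len(recent) > 1 else 0.0
    return max(0.0, recent[-1] + trend)


def pods_needed(forecast_rps: float, rps_per_pod: float,
                headroom: float = 0.2, min_pods: int = 1) -> int:
    """Size the Pod count for the forecast load plus a safety margin."""
    if rps_per_pod <= 0:
        raise ValueError("rps_per_pod must be positive")
    return max(min_pods, math.ceil(forecast_rps * (1 + headroom) / rps_per_pod))


if __name__ == "__main__":
    history = [1000.0, 1200.0, 1400.0]  # hypothetical requests per second
    predicted = forecast_next(history)
    print(predicted, pods_needed(predicted, rps_per_pod=100.0))
```

A production system would presumably use a far richer forecasting model; the point of the sketch is only that capacity is computed from predicted rather than observed load, so resources are in place before the peak arrives.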

CNCF: Tell us about your use of Vitess. What impact has this had?

Haifeng: Our elastic database is one of the largest and most complex Vitess deployments in the world. We have successfully scaled Vitess to manage large volumes of complex transactional data on JD’s Kubernetes platform. The salient features include support for RocksDB and TokuDB as new storage engines, automatic re-sharding, automatic load balancing, and migration. Our system currently manages 2,600 MySQL clusters, 9,000 MySQL instances, 350,000 tables, 160 billion records, and 65 TB of data in support of various business applications and services at JD. The use of Vitess enables us to manage resources much more flexibly and efficiently, which significantly reduces operations and maintenance costs. We are actively collaborating with the CNCF community to add new features such as subquery support and global transactions to Vitess.
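
To make the re-sharding idea concrete: Vitess assigns rows to shards by ranges of keyspace IDs, so splitting one range only moves the rows that fall inside it. The Python sketch below illustrates that principle only. The one-byte keyspace, the SHA-256 hash, and all function names are assumptions for this example and do not reflect Vitess internals or JD's implementation.

```python
import hashlib
from typing import List, Tuple

# A shard owns a half-open range [start, end) of a toy one-byte keyspace (0..256).
Shard = Tuple[int, int]


def keyspace_id(key: str) -> int:
    """Map a row key to a position in 0..255 (illustrative hash, not Vitess's)."""
    return hashlib.sha256(key.encode("utf-8")).digest()[0]


def find_shard(shards: List[Shard], key: str) -> Shard:
    """Return the shard whose range contains the key's keyspace ID."""
    kid = keyspace_id(key)
    for start, end in shards:
        if start <= kid < end:
            return (start, end)
    raise LookupError(f"no shard covers keyspace id {kid}")


def split_shard(shards: List[Shard], target: Shard) -> List[Shard]:
    """Split one shard's range in half, leaving all other shards untouched."""
    if target not in shards:
        raise ValueError("target shard not found")
    start, end = target
    mid = (start + end) // 2
    if mid == start:
        raise ValueError("shard range too small to split")
    result: List[Shard] = []
    for shard in shards:
        result.extend([(start, mid), (mid, end)] if shard == target else [shard])
    return result


if __name__ == "__main__":
    before = [(0, 128), (128, 256)]
    after = split_shard(before, (128, 256))
    for key in ("order-1", "order-2", "user-42"):
        print(key, find_shard(before, key), "->", find_shard(after, key))
```

Because only the split range changes, keys that lived in the untouched shard resolve to the same shard before and after, which is what allows re-sharding to proceed without moving the whole data set.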

CNCF: What’s next for Kubernetes and other cloud native technologies (GitLab, Jenkins, Logstash, Harbor, Elasticsearch, and Prometheus) in your company?

Haifeng: Our containerized platform separates the application and infrastructure layers by deploying a DevOps stack on Kubernetes that includes Vitess, Prometheus, GitLab, Jenkins, Logstash, Harbor, Elasticsearch, etc. We have contributed code to some of these projects and would like to contribute more in the future. One example of where we think we can really add value is Vitess, the CNCF project for scalable MySQL cluster management. We are not only the largest end user of Vitess, but also a very active and significant contributor. We look forward to working with others in the CNCF community to add new features to Vitess, including subquery support, global transactions, etc. Separately, we are extending Prometheus to create a real-time, high-performance monitoring system. We’d like to improve Kubernetes to support multiple, diverse workloads and hopefully contribute code to Kubernetes as well.

We plan to release our internal, homegrown projects too. There are a bunch of them on github.com/tiglabs already. We also plan to propose new CNCF projects. One such project is ContainerFS, a large-scale, container-native cluster file system that has been seamlessly integrated with Kubernetes.

CNCF: What other technologies or practices (DevOps, CI/CD) are you currently evaluating?

Haifeng: We are actively working on our own open source projects centered around cloud native or container-native software and technology, from computing, storage and middleware to applications. One focus is container platforms for diverse workloads, including online services, data analytics, edge computing, and IoT. Another focus is scalable and high-performance data storage for container platforms.  

CNCF: For other Chinese companies just getting started with cloud native, what are the most important things to consider?

Haifeng: With Docker, Kubernetes, and microservices, you can get a lot of value out of cloud native without having to endure high costs. A cloud native solution doesn’t only function in the cloud. It is flexible enough to be deployed across on-premises, private cloud, public cloud, and hybrid environments. It is important to keep a close eye on new technology and industry trends, leverage open source technologies, and actively engage with open source communities.

CNCF: What advice do you have for other companies looking to deploy a cloud native infrastructure?

Haifeng: Think about how to meet your business needs from an ecosystem perspective, including containerized infrastructure, data storage, microservice platform, messaging, monitoring systems, etc. In terms of container orchestration and management, Kubernetes is the de facto standard and a sure thing to bet on. You should also take advantage of emerging serverless architectures to simplify the process of application development, packaging, deployment, and management.

CNCF: Why is cloud native such a business imperative today for JD?

Haifeng: With over 300 million customers, in addition to our merchants, it is imperative that our infrastructure is scalable and extremely efficient. To put it in perspective, five years ago, there were about two billion images in our product images system. Today, there are more than one trillion, and that figure increases by 100 million every day. Furthermore, we are not only China’s largest retailer, online or offline, but also the operator of China’s largest e-commerce logistics infrastructure, developed fully in-house, so our business is complex and changing rapidly by the day. Accordingly, our infrastructure has to be extremely agile and support a wide range of workloads and application scenarios in a host of areas such as online services, data analytics, AI, supply chain, finance, IoT, and edge computing. Cloud native technologies are well suited to handle our ever-changing requirements.

CNCF: Is this the first open source foundation JD has joined?

Haifeng: Yes. We are firm believers in open source, and it closely aligns with our own strategy. Through CNCF, we aim to have more and stronger engagement with the open source community, and we fully see the potential mutual benefit of contributing to it. As the third largest internet company in the world by revenue, JD has already developed many leading technology innovations and we recognize our responsibility to take a leadership role in the open source community.

CNCF: How do you plan to work hand-in-hand with the CNCF?

Haifeng: The areas where we can collaborate are unlimited. Joining CNCF and working with the other members will be extremely helpful as we take some of our new projects forward. In addition, CNCF provides us with a platform with which to raise awareness about some of our projects and recruit leading developers to collaborate and contribute to our efforts.  

CNCF: What are you excited to learn or see at KubeCon China?

Haifeng: We look forward to meeting with the industry’s top developers, end users, and vendors and continuing to learn about the newest technological developments. We also plan to showcase our own work and identify potential collaboration opportunities with companies, end users, and independent developers.

 

Want to learn more about how technology leaders in China are leveraging Cloud Native technologies? Join us for our inaugural KubeCon + CloudNativeCon China in Shanghai from Nov. 14-15. Hope to see you there!