
How LY Corporation is leveraging Kubernetes as the foundation of its large-scale internal PaaS
Challenge
LY Corporation offers a wide range of internet services and manages hundreds of web application development projects. At that scale, the company faced several challenges in maximizing development efficiency.
One major challenge was creating a large-scale, unified platform to run web applications across numerous projects while centralizing maintenance and operations within a dedicated platform team. Previously, developers were responsible for managing VMs and Kubernetes clusters, which increased their operational workload and costs. By transferring these responsibilities to the platform team, the aim was to optimize costs and enhance quality, enabling developers to concentrate on core development tasks.
Another challenge was the redundancy of tasks across multiple projects. Many projects shared similar tasks in deploying and operating applications, so automating them was necessary to reduce the developers’ workload and enhance the development experience. The company also sought to ensure consistent operational quality across its numerous projects by standardizing operations.
Solution
LY Corporation adopted a platform approach to address these challenges, leveraging Kubernetes as the central component for building a large-scale PaaS. By utilizing Kubernetes’ flexibility, they created a logically unified cluster composed of multiple clusters, allowing for scalability beyond a single cluster’s limits. Their PaaS also integrates with other cloud native technologies to automate tasks like application exposure, metrics collection, and endpoint protection. A simple manifest file defines each application, which streamlines deployment and applies established practices for running applications stably.
Impact
This approach transformed LY Corporation’s operations, enabling them to run 140,000 pods and 10,000 nodes across their platform, supporting 600 projects in production. The PaaS significantly reduces developers’ workload, allowing them to focus on business value creation. With around 3,000 releases to production environments daily, the platform accelerates application improvement.
By the numbers
140k pods and 10k nodes in production
600 app projects supported by 1 multi-tenant cluster
3k releases to production daily
LY Corporation offers a wide range of internet services and holds a significant market share in Japan.
LY Corporation’s services span industries such as e-commerce, internet portals, and entertainment. Japanese internet users have likely interacted with their services in some form. Due to the need to provide diverse services to numerous end-users, they maintain large-scale data centers and run hundreds of application development projects on this extensive infrastructure.
Against this backdrop, they have addressed several challenges to maximize application development efficiency. One challenge is establishing a unified platform for running web applications and centralizing the platform’s maintenance and operation within the platform team. Although they already offered internal computing services that provide virtual machines and Kubernetes clusters, managing those VMs and clusters was the developers’ responsibility. Transferring this responsibility to the platform team aims to optimize operational costs and quality, allowing developers to concentrate on core application development.
Another challenge is eliminating redundant tasks across multiple projects. Almost all projects share many of the same tasks in deploying, running, and operating applications. Automating these tasks further reduces the burden on developers and enhances the development experience. Additionally, standardizing these tasks removes variations between projects, so that every one of their numerous, diverse projects meets a consistent level of operational quality.
The platform approach is the key to success
An underlying concept in these initiatives is the effective use of standardization and centralization approaches. The large number of projects means that centralizing common elements can yield significant benefits proportional to the number of projects.
They have long enhanced company-wide development efficiency by providing various internal services to developers. These include DB as a Service and Messaging as a Service, in addition to the computing services mentioned earlier. While platform engineering is becoming widely recognized today, their initiatives can be considered pioneering examples.
Therefore, adopting such a platform approach to streamline application development was a natural choice for them, and its benefits were unquestionable. This shared understanding led them to develop a large-scale PaaS for running web applications.
Kubernetes as the platform for building platforms
They adopted Kubernetes as the central building block for constructing their solution. Kubernetes is the de facto container orchestrator and is undoubtedly the most mature and reliable foundation for running various workloads. Moreover, its flexibility and high customizability are significant advantages. Well-established customization points such as Custom Resource Definitions (CRDs), custom controllers, and mutating/validating admission webhooks make it easy to extend Kubernetes and adapt it to organizational requirements.
The challenge of large-scale PaaS
Although Kubernetes is inherently scalable, reaching the scale they needed with just a single cluster wasn’t easy. Therefore, they leveraged Kubernetes’ customizability to combine multiple clusters into one logically unified cluster.
Their PaaS is composed of multiple workload clusters where applications run and a single control plane cluster that coordinates everything. The control plane handles all developer operations, such as application deployment. Custom controllers on the control plane cluster launch applications in the appropriate workload clusters. These controllers automatically generate application-specific URLs and expose endpoints outside the clusters. When end-users access these URLs, name resolution routes the requests to the clusters where the applications run.
Thanks to this mechanism, neither developers nor end-users need to be aware of the multiple clusters underneath, and the platform functions as a single logical cluster.
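The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the placement policy (most free capacity wins), the `WorkloadCluster` type, the URL naming scheme, and the `paas.example.internal` domain are all hypothetical and are not taken from LY Corporation’s implementation. The sketch shows a control plane choosing a workload cluster, generating an application-specific URL, and recording which cluster the name should resolve to.

```python
from dataclasses import dataclass


@dataclass
class WorkloadCluster:
    name: str
    capacity_pods: int      # hypothetical per-cluster pod budget
    running_pods: int = 0

    @property
    def free_pods(self) -> int:
        return self.capacity_pods - self.running_pods


def place_application(clusters, replicas):
    """Pick the cluster with the most free capacity (an assumed policy)."""
    candidates = [c for c in clusters if c.free_pods >= replicas]
    if not candidates:
        raise RuntimeError("no workload cluster has room; add another cluster")
    chosen = max(candidates, key=lambda c: c.free_pods)
    chosen.running_pods += replicas
    return chosen


def endpoint_host(app, project, base_domain="paas.example.internal"):
    """Build an application-specific host name; the scheme is hypothetical."""
    return f"{app}.{project}.{base_domain}"


def deploy(clusters, dns_records, app, project, replicas):
    """Control-plane view: place the app, then publish a DNS record for it."""
    cluster = place_application(clusters, replicas)
    host = endpoint_host(app, project)
    # Name resolution is what later sends end-user requests to this cluster.
    dns_records[host] = cluster.name
    return "https://" + host
```

A caller only sees the returned URL; which workload cluster was picked stays an internal detail held in `dns_records`, which mirrors the idea of one logical cluster.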
Additionally, by adding workload clusters, they can scale beyond the limits of a single Kubernetes cluster. The platform currently operates 140,000 pods and 10,000 nodes, supporting applications for 600 projects in production.
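To put those figures in context, the short calculation below compares them with the per-cluster limits published in the Kubernetes documentation on large clusters (no more than 5,000 nodes and 150,000 total pods per cluster). The source does not say how many workload clusters the platform actually runs, so the result is only a lower bound derived from those documented limits, not a description of the real topology.

```python
import math

# Per-cluster limits from the Kubernetes "large clusters" guidance.
MAX_NODES_PER_CLUSTER = 5_000
MAX_PODS_PER_CLUSTER = 150_000


def min_clusters(nodes, pods):
    """Smallest number of clusters that stays within both documented limits."""
    by_nodes = math.ceil(nodes / MAX_NODES_PER_CLUSTER)
    by_pods = math.ceil(pods / MAX_PODS_PER_CLUSTER)
    return max(by_nodes, by_pods)


nodes, pods = 10_000, 140_000      # figures quoted in the case study
print(min_clusters(nodes, pods))   # 2: the node count alone exceeds one cluster
print(pods / nodes)                # 14.0 pods per node on average
```

In practice, operators usually keep clusters well below the documented ceilings, so the real number of workload clusters may be considerably higher than this bound.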
Excellent development experience through integration with other cloud native technologies
Their PaaS excels in more than scalability. It also takes care of the yak-shaving tasks in the application lifecycle, so developers can handle them easily and efficiently. For example, it includes features to automate application exposure; metrics, log, and trace collection; and endpoint protection through authentication and authorization (authN/Z). The following paragraphs explore how some of these features work.
In their PaaS, applications can be defined using a very simple manifest file. The minimum manifest required to run an application is a single file, which can launch the application and automatically expose a TLS-protected endpoint. It also applies practices for safely running applications, such as a PodDisruptionBudget (PDB) and a preStop sleep hook, and automatically collects basic metrics like resource usage.
A Kubernetes CRD defines the straightforward format of this manifest file, and they’ve further enhanced usability by incorporating validation with Open Policy Agent (OPA). The validation returns user-friendly warning messages that help prevent developers’ configuration errors, and it sets up guardrails to ensure that resource overuse by one project doesn’t impact the applications of other projects.
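The kind of check described here can be illustrated with a small Python function. Real OPA policies are written in Rego, so this is not how the platform implements validation; it is only a sketch of the idea. The manifest fields (`spec.image`, `spec.replicas`, `spec.cpuMillicores`) and the limits are hypothetical, chosen to show readable messages and a per-application resource guardrail.

```python
def validate_manifest(manifest, max_replicas=50, max_cpu_millicores=4000):
    """Return human-readable problems; an empty list means the manifest passes.

    Field names and limits are hypothetical, for illustration only.
    """
    problems = []
    spec = manifest.get("spec", {})

    # Configuration error: a required field is missing.
    if not spec.get("image"):
        problems.append(
            "spec.image is required: set it to the container image to run."
        )

    # Guardrail: cap replicas so one project cannot crowd out the others.
    replicas = spec.get("replicas", 1)
    if not isinstance(replicas, int) or replicas < 1:
        problems.append("spec.replicas must be a positive integer.")
    elif replicas > max_replicas:
        problems.append(
            f"spec.replicas is {replicas}, above the limit of {max_replicas}."
        )

    # Guardrail: cap the CPU request per pod.
    cpu = spec.get("cpuMillicores", 500)
    if cpu > max_cpu_millicores:
        problems.append(
            f"spec.cpuMillicores is {cpu}, above the limit of "
            f"{max_cpu_millicores}."
        )

    return problems
```

Returning every problem at once, instead of stopping at the first, lets a developer fix a manifest in one pass.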
Once developers pass the manifest to the control plane cluster, the custom controllers launch pods in the workload cluster and automatically create the necessary Kubernetes resources, such as Service and Ingress, for endpoint exposure. The controllers also work with CoreDNS and a certificate issuance system (which they plan to replace with cert-manager) to create endpoint URLs and certificates for their domains, automatically exposing endpoints protected by TLS.
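As a rough illustration of this expansion step, the Python sketch below turns a small application description into standard Kubernetes objects: a Deployment with a preStop sleep, a Service, an Ingress with a TLS section, and a PodDisruptionBudget. The input fields (`name`, `project`, `image`, `port`, `replicas`), the host naming scheme, the default values, and the `-tls` secret name are hypothetical. The actual manifest format and the exact set of resources the controllers create are not given in the source.

```python
def expand(app):
    """Expand a minimal (hypothetical) app description into Kubernetes objects."""
    name = app["name"]
    labels = {"app": name}
    port = app.get("port", 8080)
    host = f"{name}.{app['project']}.paas.example.internal"  # assumed scheme

    container = {
        "name": name,
        "image": app["image"],
        "ports": [{"containerPort": port}],
        # preStop sleep: delay shutdown so in-flight requests can drain.
        "lifecycle": {"preStop": {"exec": {"command": ["sleep", "10"]}}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": app.get("replicas", 2),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": port}]},
    }
    backend = {"service": {"name": name, "port": {"number": 80}}}
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [
                    {"path": "/", "pathType": "Prefix", "backend": backend}
                ]},
            }],
        },
    }
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {"maxUnavailable": 1, "selector": {"matchLabels": labels}},
    }
    return [deployment, service, ingress, pdb]
```

For example, `expand({"name": "shop", "project": "ecom", "image": "registry.example/shop:1.0"})` yields four objects from three input fields, which is the kind of leverage a short manifest gives developers.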
As outlined above, their PaaS significantly streamlines developers’ tasks, allowing them to concentrate on generating business value. Currently, LY Corporation’s developers execute around 3,000 releases to the PaaS’s production environments each business day, highlighting its role in accelerating application improvement.
Conclusion
With its exceptional flexibility and extensibility, Kubernetes effectively meets the demanding requirements of LY Corporation’s platform, serving as a robust foundational component. Over the decade since its inception, Kubernetes has matured significantly, and the recent rise of platform engineering is introducing new use cases and revitalizing its role. LY Corporation will continue leveraging Kubernetes and other cloud native technologies within their internal platform. They aim to contribute to the ecosystem’s further growth through the insights gained from their experiences.