In this guest post, the Kubernetes team from Alibaba shares how they are building hard multi-tenancy on top of upstream Kubernetes by leveraging a group of plugins named “Virtual Cluster” and extending the tenant design in the community. The team has decided to open source these K8s plugins and contribute them to the Kubernetes community at the upcoming KubeCon.
At Alibaba, the internal Kubernetes team uses one web-scale cluster to serve a large number of business units as end users. In this setup, every end user effectively becomes a “tenant” of this K8s cluster, which makes hard multi-tenancy a strong requirement.
However, instead of hacking the Kubernetes APIServer and resource model, the team at Alibaba tried to build a “Virtual Cluster” multi-tenancy layer without changing any Kubernetes code. With this architecture, every tenant is assigned a dedicated K8s control plane (kube-apiserver + kube-controller-manager) and several “Virtual Nodes” (pure Node API objects with no corresponding kubelets), so there is no concern about naming or node conflicts at all. Meanwhile, the tenant workloads still run mixed together in the same underlying “Super Cluster”, so resource utilization is preserved. This design is detailed in the [virtual cluster proposal], which has received lots of feedback.
Although this design introduces a new concept of “tenant master”, virtual cluster is simply an extension built on top of the existing namespace-based multi-tenancy in the K8s community, which is referred to as “namespace group” in the rest of this document. Virtual cluster fully relies on the resource isolation mechanisms proposed by namespace group, and we eagerly expect and push for them to be addressed in the ongoing efforts of the Kubernetes WG-multitenancy.
If you want to know more details about the Virtual Cluster design, please do not hesitate to read the [virtual cluster proposal]. In this document, we will focus on the high-level idea behind virtual cluster, elaborate on how we extend namespace group with a “tenant cluster” view, and explain why this extension is valuable for Kubernetes multi-tenancy use cases.
What is Namespace Group Multi-Tenancy
This section briefly reviews the architecture of the namespace group multi-tenancy proposal.
We borrow a diagram from the K8s Multi-tenancy WG Deep Dive presentation, shown in Figure 1, to explain the high-level idea of using namespaces to organize tenant resources.
Figure 1. Namespace group multi-tenancy architecture
In namespace group, all tenant users share the same access point, the K8s apiserver, to utilize tenant resources. Their accounts, assigned namespaces and resource isolation policies are all specified in tenant CRD objects, which are managed by the tenant admin. The tenant user view is limited to the per-tenant namespaces. The tenant resource isolation policies are defined to disable direct communication between tenants and to protect tenant Pods from security attacks. They are realized by native Kubernetes resource isolation mechanisms including RBAC, Pod security policy, network policy, admission control and sandbox runtime. Multiple security profiles can be configured and applied for different levels of isolation requirements. In addition, resource quotas, chargeback and billing happen at the tenant level.
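To make the tenant CRD described above more concrete, the sketch below shows what such an object might look like. The API group and field names here are illustrative assumptions, not the exact schema from the community Tenant proposal:

```yaml
# Hypothetical Tenant custom resource; field names are illustrative only.
apiVersion: tenancy.x-k8s.io/v1alpha1
kind: Tenant
metadata:
  name: tenant-a
spec:
  users:                       # tenant user accounts managed by the tenant admin
    - kind: User
      name: alice
  namespaces:                  # namespaces assigned to this tenant
    - tenant-a-dev
    - tenant-a-prod
  isolationPolicy:             # realized via RBAC / Pod security policy / network policy
    networkIsolation: true
    securityProfile: restricted
```

A tenant controller watches objects like this and sets up the corresponding namespaces, RBAC rules and isolation policies in the cluster.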
How Virtual Cluster Extends the View Layer
Conceptually, virtual cluster provides a view layer extension on top of the namespace group solution. Its technical details can be found in [virtual cluster]. In virtual cluster, tenant admin still needs to use the same tenant CRD used in namespace group to specify the tenant user accounts, namespaces and resource isolation policy in the tenant resource provider, i.e., the super master.
Figure 2. View Layer Extension By Virtual Cluster
As illustrated in Figure 2, thanks to the new virtual cluster view layer, tenant users now have different access points and tenant resource views. Instead of accessing the super master and viewing the tenant namespaces directly, tenant users interact with dedicated tenant masters to utilize tenant resources and are offered complete K8s master views. All tenant requests are synchronized to the super master by the sync-manager, which creates corresponding custom resources on behalf of the tenant users in the super master, following the resource isolation policy specified in the tenant CRD. That being said, virtual cluster primarily changes the tenant user view from namespaces to an APIserver. From the super master perspective, the same workflow is triggered by the tenant controller in response to the tenant CRD.
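The namespace translation performed by the sync-manager can be sketched in a few lines. This is a minimal illustration only, under the assumption of a simple `tenant-namespace` prefixing scheme; the label key and mapping below are hypothetical, and the real sync-manager handles far more (status sync-back, policy enforcement, and so on):

```python
# Illustrative sketch of the sync-manager's namespace translation.
# The prefixing scheme and the label key are assumptions for this example.

def to_super_namespace(tenant: str, tenant_namespace: str) -> str:
    """Map a namespace in a tenant master to a unique super-master namespace."""
    return f"{tenant}-{tenant_namespace}"

def sync_object(tenant: str, obj: dict) -> dict:
    """Produce the copy of a tenant object to be created in the super master."""
    synced = dict(obj)
    synced["metadata"] = dict(obj.get("metadata", {}))
    # Rewrite the namespace so objects from different tenants never collide.
    synced["metadata"]["namespace"] = to_super_namespace(
        tenant, obj["metadata"]["namespace"])
    # Record the owner tenant so isolation policies from the tenant CRD
    # can be applied in the super master (hypothetical label key).
    labels = dict(synced["metadata"].get("labels", {}))
    labels["tenancy.example.com/tenant"] = tenant
    synced["metadata"]["labels"] = labels
    return synced

# Two tenants may each create a namespace called "dev" in their own masters;
# the mapped super-master namespaces are distinct, so there is no conflict.
pod_a = {"kind": "Pod", "metadata": {"name": "web", "namespace": "dev"}}
pod_b = {"kind": "Pod", "metadata": {"name": "web", "namespace": "dev"}}
print(sync_object("tenant-a", pod_a)["metadata"]["namespace"])  # tenant-a-dev
print(sync_object("tenant-b", pod_b)["metadata"]["namespace"])  # tenant-b-dev
```

This also shows why self-service namespace creation in a tenant master is safe: conflicts are resolved at sync time, not at creation time.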
Benefits of Virtual Cluster View Extension
There are quite a few benefits of having a Virtual Cluster view on top of the existing namespace view for tenant users:
- It provides flexible and convenient tenant resource management for tenant users. For example, a nested namespace hierarchy, as illustrated in Figure 3(a), can easily resolve hard problems in the namespace group solution such as naming conflicts, namespace visibility, and sub-partitioning of tenant resources [Tenant Concept]. However, it is almost impractical to change the native K8s master to support nested namespaces. With a virtual cluster view, the namespaces created in the tenant master, along with the corresponding namespace group in the super master, can achieve a user experience similar to nested namespaces.
As shown in Figure 3(b), tenant users can create namespaces in their own tenant master in a self-service fashion, without worrying about naming conflicts with other tenants. The conflict is resolved by the sync-manager when it adds the tenant namespaces to the super master namespace group. Tenant A users can never view tenant B users’ namespaces since they access different masters. It is also convenient for a tenant to customize policies for different tenant users, which take effect only locally in the tenant master.
- It provides stronger tenant isolation and security, since it avoids certain problems caused by sharing the same K8s master among multiple tenant users. For example, DoS attacks, API access rate control among tenants, and tenant controller isolation are no longer concerns.
- It allows tenant users to create cluster-scope objects in tenant masters without affecting other tenants. For instance, a tenant user can now freely create CRDs, ClusterRoles/ClusterRoleBindings, PersistentVolumes, ResourceQuotas, ServiceAccounts and NetworkPolicies in the tenant master without worrying about conflicts with other tenants.
- It alleviates the scalability stress on the super master. First, RBAC rules, policies and user accounts managed in the super master can be offloaded to the tenant masters, which can be scaled independently. Second, tenant controllers and operators access multiple tenant masters instead of a single super master, which again can be scaled independently.
- It makes it much easier to create accounts for tenant users. Today, if a tenant user wants to expose their tenant resources to other users (for example, a team leader wants to add team members so they can use the resources assigned to the team), the tenant admin has to create all the user accounts. When a tenant admin serves hundreds of such teams in a big organization, creating users on their behalf can be a big burden. Virtual cluster completely offloads this burden from the tenant admin to the tenant users.
Limitations
Since virtual cluster mainly extends the multi-tenancy view option and prevents the problems caused by sharing the apiserver, it inherits the same limitations/challenges that the namespace group solution faces in making Kubernetes node components tenant-aware. The node components that need to be enhanced include, but are not limited to:
- Kubelet and CNI plugin. They need to be tenant-aware to support strong network isolation scenarios like VPC.
- For example, how does a readiness/liveness probe work if a pod is isolated in a different VPC from the node? This is one of the issues on which we have already started to cooperate with SIG-Node upstream.
- Kube-proxy/Kube-dns. They need to be tenant-aware to make the cluster-IP type of tenant services work.
- Tools. For example, monitoring tools should be tenant-aware to avoid leaking tenant information, and performance tuning tools should be tenant-aware to avoid unexpected performance interference between tenants.
Of course, virtual cluster needs extra resources to run a tenant master for each tenant, which may not be affordable in some cases.
Virtual cluster extends the namespace group multi-tenancy solution with a user-friendly cluster view. It leverages the underlying K8s resource isolation mechanisms and the existing Tenant CRD & controller in the community, but provides users with the experience of a dedicated tenant cluster. Overall, we believe that virtual cluster, together with namespace-based multi-tenancy, can offer comprehensive solutions for various Kubernetes multi-tenancy use cases in production clusters, and we are actively working on contributing this plugin to the upstream community.
See ya at KubeCon!