Member post originally published on the SuperOrbital blog by Keegan McCallum

NVIDIA Device Plugin for Kubernetes plays a crucial role in enabling organizations to harness the power of GPUs for accelerating machine learning workloads.

Introduction

Generative AI is having a moment right now, in no small part due to the immense scale of computing resources being leveraged to train and serve these models. Kubernetes has revolutionized the way we deploy and manage applications at scale, making it a natural choice for building large-scale computing platforms.

GPUs, with their parallel processing capabilities and high memory bandwidth, have become the go-to hardware for accelerating machine learning tasks. NVIDIA’s CUDA platform has emerged as the dominant framework for GPU computing, enabling developers to harness the power of GPUs for a wide range of applications. By combining the capabilities of Kubernetes with the extreme parallel computing power of modern GPUs like the NVIDIA H100, organizations are pushing the boundaries of what is possible with computers, from realistic video generation to analyzing entire novels’ worth of text and accurately answering questions about the contents.

However, orchestrating GPU-accelerated workloads in Kubernetes environments presents its own set of challenges. This is where the NVIDIA Device Plugin comes into play. It seamlessly integrates with Kubernetes, allowing you to expose GPUs on each node, monitor their health, and enable containers to leverage these powerful accelerators. By combining these two best-of-breed solutions, organizations are building robust, performant computing platforms to power the next generation of intelligent software.

Understanding the NVIDIA Device Plugin for Kubernetes

The NVIDIA Device Plugin is a Kubernetes DaemonSet that simplifies the management of GPU resources across a cluster. Its primary function is to automatically expose the GPUs on each node, making them discoverable and allocatable by the Kubernetes scheduler. This allows pods to request and consume GPU resources in much the same way they request CPU and memory. Under the hood, the device plugin communicates with the kubelet on each node, advertising the available GPUs and their capacities. It also monitors the health of the GPUs, ensuring they are functioning properly and reporting any issues to Kubernetes.
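Once the plugin is running, each node’s GPUs show up as a schedulable nvidia.com/gpu resource. A quick way to confirm this is to inspect a node’s capacity and allocatable fields (the node name and GPU count below are illustrative):

kubectl describe node gpu-node-1 | grep -i "nvidia.com/gpu"

# Illustrative output -- the resource appears under Capacity,
# Allocatable, and Allocated resources:
#   nvidia.com/gpu:  8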

Some of the benefits of the NVIDIA Device Plugin include:

  1. Automatic GPU discovery and allocation, eliminating the need to manually configure GPU resources on each node.
  2. Seamless integration with Kubernetes, allowing you to manage GPUs with familiar tools and workflows.
  3. GPU health monitoring, allowing Kubernetes to maintain stability and reliability for GPU-accelerated workloads.
  4. Resource sharing, which allows multiple pods to utilize the same GPU, crucial in today’s environment where GPUs are scarce and expensive.

Installing and Configuring the NVIDIA Device Plugin

Prerequisites

Before installing the device plugin, each GPU node needs the NVIDIA drivers and the NVIDIA Container Toolkit installed, with the container runtime configured to use the NVIDIA runtime. See the device plugin’s documentation for the currently supported versions.

Deploying the Device Plugin

First, we’ll install the DaemonSet using Helm. To install the latest version (v0.14.5 at the time of writing) into a cluster with default settings, the most basic command is:

helm upgrade -i nvdp nvidia-device-plugin \
  --repo https://nvidia.github.io/k8s-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --version v0.14.5

This will install (or upgrade) a Helm release named nvdp in the nvidia-device-plugin namespace with default settings.
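After the release is installed, it’s worth confirming that the plugin DaemonSet is healthy and that one plugin pod is running on each GPU node before moving on:

kubectl get daemonset -n nvidia-device-plugin
kubectl get pods -n nvidia-device-plugin -o wide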

This will give you a basic setup, but there are many reasons you may want to customize the chart via values.yaml. We’ll dive into some of the most useful options as well as some best practices, but you can see the full set of values here. You’ll likely want to add taints to your GPU nodes (the method will depend on your Kubernetes setup and how you provision nodes) and then configure tolerations so that the device plugin can be scheduled onto those GPU-enabled nodes, as sketched below. We’ll dive deeper into these types of configurations in part 2 of this series.
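As a minimal sketch, assuming your GPU nodes are tainted with nvidia.com/gpu=present:NoSchedule and labeled nvidia.com/gpu.present=true (both the taint and the label are illustrative and depend on how you provision nodes; they are not chart defaults), a values.yaml along these lines keeps the plugin pods on GPU nodes:

# values.yaml -- scheduling-related overrides for the device plugin chart
# (the taint key and node label must match whatever your provisioner applies)
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

nodeSelector:
  nvidia.com/gpu.present: "true"

Pass the file to the install command above with -f values.yaml.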

Configuring GPU Sharing and Oversubscription

The nvidia-device-plugin supports three strategies for GPU sharing and oversubscription, allowing you to optimize GPU utilization based on your specific workload’s requirements: time-slicing, Multi-Instance GPU (MIG), and the CUDA Multi-Process Service (MPS). A quick overview of each, with examples of how to configure them via values.yaml:

Time-slicing interleaves workloads on the same GPU, advertising each physical GPU as a configurable number of replicas. Here, each GPU is advertised as 10 schedulable nvidia.com/gpu resources:

config:
  map:
    default: |-
      version: v1
      sharing:
        timeSlicing:
          resources:
          - name: nvidia.com/gpu
            replicas: 10

MIG partitions supported GPUs (such as the A100 and H100) into isolated instances at the hardware level. The mixed strategy exposes each MIG profile as its own resource type (for example, nvidia.com/mig-1g.5gb):

config:
  map:
    default: |
      version: v1
      flags:
        migStrategy: "mixed"

MPS allows CUDA kernels from multiple processes to execute concurrently on the same GPU. As with time-slicing, replicas controls how many pods can share each GPU:

config:
  map:
    default: |-
      version: v1
      sharing:
        mps:
          resources:
          - name: nvidia.com/gpu
            replicas: 10
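Whichever strategy you choose, apply it by passing your customized values file to the same Helm command used for the initial install:

helm upgrade -i nvdp nvidia-device-plugin \
  --repo https://nvidia.github.io/k8s-device-plugin \
  --namespace nvidia-device-plugin \
  --version v0.14.5 \
  -f values.yaml

Once the rollout finishes, the nodes’ advertised nvidia.com/gpu capacity reflects the configuration; with time-slicing and replicas: 10, for example, a node with 8 physical GPUs advertises 80 allocatable GPUs.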

This should be enough of an introduction to GPU sharing to get you started. We will go into more detail about advanced configuration and best practices in part 2 of this series.

Allocating GPUs to Pods Using the NVIDIA Device Plugin

Allocating GPUs to pods when using the nvidia-device-plugin is straightforward and should feel familiar to anyone comfortable with Kubernetes. It is highly recommended to use NVIDIA base images for your containers so that all the necessary dependencies are installed and configured properly for your underlying workload. Setting a limit for nvidia.com/gpu is crucial; otherwise, all GPUs will be exposed inside the container. Finally, make sure to include tolerations for any taints set on your nodes so that the pod can be scheduled appropriately. Here’s a barebones example of a GPU-enabled pod:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
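After applying the manifest and letting the pod schedule onto a GPU node, the sample’s logs confirm that the container actually reached a GPU (the manifest filename here is illustrative):

kubectl apply -f gpu-pod.yaml
kubectl logs gpu-pod

# The vectorAdd sample prints "Test PASSED" when it successfully runs on a GPU.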

Conclusion

The NVIDIA Device Plugin for Kubernetes plays a crucial role in enabling organizations to harness the power of GPUs for accelerating machine learning workloads. By abstracting the complexities of GPU management and providing seamless integration with Kubernetes, it empowers developers and data scientists to focus on building and deploying their models without worrying about the underlying infrastructure.

We’re just scratching the surface here, so if you’re interested in learning more, please check out part 2 of this series, where we’ll go into detail on advanced configuration, troubleshooting common issues, and some of the limitations of using the nvidia-device-plugin alone to manage GPUs. Also, check out the additional resources at the end of this article!

Further Reading and Resources