Dynamic Resource Allocation (DRA) reached GA in Kubernetes v1.34, and I believe many of us are eager to give it a try. Adding to the momentum, NVIDIA has moved dra-driver-nvidia-gpu into Kubernetes SIGs, and the documentation has dropped its Beta label, a sign that the technology and its standards are gradually maturing.

For this post, I borrowed all the NVIDIA GPUs currently available at CNTUG Infra Labs to learn how to elegantly allocate devices and resources with DRA.

CNTUG Infra Labs: Lab environment overview

CNTUG Infra Labs was founded to nurture the next generation of students and engineers in Taiwan’s software infrastructure field. The lab is hosted in Equinix’s Tokyo data center and is jointly funded by several CNTUG community members. The environment is built on a stack of open source projects, including OpenStack, Ceph, and Ansible.

Since infrastructure software has a steep learning curve and requires substantial compute, storage, and network resources, CNTUG Infra Labs aims to provide a cloud platform where students and community members can experiment with and host related services. Spare capacity is also offered to the open source community for hosting services such as websites, Mattermost, and Jitsi Meet, or for workshop events. You can review the use cases for more details.

Lab Environment

We’ll use a Kubernetes cluster built with Cluster API + OpenStack. For brevity, the setup process is omitted here — feel free to refer to other blog posts for the details, or wait for a future post once I finish writing it up.

Running kubectl get node should return something like:

NAME                                   STATUS   ROLES           AGE     VERSION
capi-dralabs-control-plane-xtcth       Ready    control-plane   8m7s    v1.35.3
capi-dralabs-md-0-p4xkh-rpfxc          Ready    <none>          6m55s   v1.35.3
capi-dralabs-md-gpua5000-jw4mx-d64jz   Ready    <none>          2m37s   v1.35.3
capi-dralabs-md-gput10-gzl84-f2m2d     Ready    <none>          6m49s   v1.35.3

Installing NVIDIA GPU Operator

Before installing the GPU Operator, label the Nodes that have GPUs. For my environment, this looks like:

kubectl label node capi-dralabs-md-gpua5000-jw4mx-d64jz \
  nvidia.com/dra-kubelet-plugin=true
kubectl label node capi-dralabs-md-gput10-gzl84-f2m2d \
  nvidia.com/dra-kubelet-plugin=true

Add the NVIDIA Chart Repo:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Create a values-gpu-operator.yaml file that we’ll use during installation:

values-gpu-operator.yaml

# version: v26.3.1
devicePlugin:
  enabled: false

driver:
  manager:
    env:
    - name: NODE_LABEL_FOR_GPU_POD_EVICTION
      value: "nvidia.com/dra-kubelet-plugin"


NOTE

If you’re using a different Kubernetes distribution (e.g., Rancher or K3s), the default Containerd installation path may differ — remember to add the following settings to values-gpu-operator.yaml:

toolkit:
  env:
  - name: CONTAINERD_SOCKET
    value: /run/k3s/containerd/containerd.sock

Install NVIDIA GPU Operator:

helm upgrade --install gpu-operator nvidia/gpu-operator \
  --version=v26.3.1 \
  --create-namespace \
  --namespace gpu-operator -f values-gpu-operator.yaml

Wait for the GPU Operator to come up. It will install the NVIDIA GPU Driver and tweak the Container Runtime configuration. For specific tuning needs, refer to the NVIDIA official documentation.

Installing NVIDIA DRA Driver GPU

Create a values-nvidia-dra-driver-gpu.yaml file that we’ll use during installation:

values-nvidia-dra-driver-gpu.yaml

# version: 25.12.0
nvidiaDriverRoot: /run/nvidia/driver
gpuResourcesEnabledOverride: true
image:
  pullPolicy: IfNotPresent
kubeletPlugin:
  nodeSelector:
    nvidia.com/dra-kubelet-plugin: "true"
resources:
  gpus:
    enabled: true
  computeDomains:
    enabled: false # No NVLink here
# featureGates:
#   TimeSlicingSettings: true

If you’d like to try out Scenario IV’s GPU Time Slicing later, you can enable the TimeSlicingSettings Feature Gate now; otherwise, leave it commented out and helm upgrade later when needed.

Install NVIDIA DRA Driver GPU:

helm upgrade -i nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --version="25.12.0" \
  --namespace nvidia-dra-driver-gpu \
  --create-namespace -f values-nvidia-dra-driver-gpu.yaml

Use kubectl get pod to confirm the NVIDIA DRA Driver GPU is up:

kubectl get pod -n nvidia-dra-driver-gpu
NAME                                         READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-kubelet-plugin-6skhp   1/1     Running   0          10m
nvidia-dra-driver-gpu-kubelet-plugin-jswk6   1/1     Running   0          10m

A first look at DRA

DeviceClass

Once installed, you’ll find that DeviceClass and ResourceSlice have been set up by NVIDIA DRA Driver GPU. As the name suggests, DeviceClass represents categories of devices — opening it up reveals regular GPUs, MIG, and VFIO. (If ComputeDomains isn’t disabled, you’ll also see ComputeDomains information.)

kubectl get deviceclass

DeviceClass example output

NAME                  AGE
gpu.nvidia.com        44m
mig.nvidia.com        44m
vfio.gpu.nvidia.com   44m

ResourceSlice

ResourceSlice is automatically updated by the DRA driver on each node, recording all devices the driver manages on that node.

Devices on the same node managed by the same driver belong to the same Pool. When the device count exceeds what fits in a single object (up to 128 entries, or 64 if any device uses taints or counters), the driver splits the Pool across multiple ResourceSlices.

.spec.pool.generation and .spec.pool.resourceSliceCount let the scheduler determine whether it has the complete and latest device list for a given node.
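To make the role of those two fields concrete, here is a minimal Python sketch of the completeness check as I understand it: for each (driver, pool) pair, only the slices at the highest generation count, and the pool is complete when their number equals resourceSliceCount. The function name pool_status and the input shape (the parsed output of kubectl get resourceslices -o json) are my own assumptions; this is not the scheduler's actual code.

```python
from collections import defaultdict


def pool_status(slice_list: dict) -> dict:
    """Map (driver, pool name) -> (latest generation, is_complete).

    `slice_list` is assumed to be the parsed JSON document returned by
    `kubectl get resourceslices -o json`.
    """
    by_pool = defaultdict(list)
    for item in slice_list.get("items", []):
        spec = item["spec"]
        by_pool[(spec["driver"], spec["pool"]["name"])].append(spec["pool"])

    status = {}
    for key, pools in by_pool.items():
        # Older generations are stale and must be ignored.
        latest = max(p["generation"] for p in pools)
        current = [p for p in pools if p["generation"] == latest]
        expected = current[0]["resourceSliceCount"]
        status[key] = (latest, len(current) == expected)
    return status


# Trimmed-down version of this lab's two ResourceSlices.
lab = {"items": [
    {"spec": {"driver": "gpu.nvidia.com",
              "pool": {"name": "capi-dralabs-md-gpua5000-jw4mx-d64jz",
                       "generation": 1, "resourceSliceCount": 1}}},
    {"spec": {"driver": "gpu.nvidia.com",
              "pool": {"name": "capi-dralabs-md-gput10-gzl84-f2m2d",
                       "generation": 1, "resourceSliceCount": 1}}},
]}

for (driver, pool), (generation, complete) in pool_status(lab).items():
    print(f"{driver} {pool}: generation={generation} complete={complete}")
```

A pool that advertises resourceSliceCount: 2 but has only one slice visible at the newest generation would come back as incomplete, which is the situation where a consumer should wait rather than allocate from a partial view.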

kubectl get resourceslice

ResourceSlice example output

NAME                                                        NODE                                   DRIVER           POOL                                   AGE
capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-w9fnv   capi-dralabs-md-gpua5000-jw4mx-d64jz   gpu.nvidia.com   capi-dralabs-md-gpua5000-jw4mx-d64jz   5m13s
capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-dtgtc     capi-dralabs-md-gput10-gzl84-f2m2d     gpu.nvidia.com   capi-dralabs-md-gput10-gzl84-f2m2d     23m

You can expand the full content with -o yaml:

kubectl get resourceslices -o yaml

The full output follows. Each ResourceSlice records its node in .metadata.ownerReferences and the devices in .spec.devices. Every device carries .attributes including (but not limited to) architecture, product name, and driver version.

Since each node in this lab has at most 2 GPUs — far below a single ResourceSlice’s 128-entry limit — every node only shows one ResourceSlice.

Full ResourceSlice output

apiVersion: v1
items:
- apiVersion: resource.k8s.io/v1
  kind: ResourceSlice
  metadata:
    creationTimestamp: "2026-05-04T14:40:17Z"
    generateName: capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-
    generation: 1
    name: capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-w9fnv
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: capi-dralabs-md-gpua5000-jw4mx-d64jz
      uid: 83aafab6-7eb3-42d0-9faf-6118f78341ef
    resourceVersion: "11490"
    uid: d03fd27e-f6cb-4386-ae61-80aa84309e77
  spec:
    devices:
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Ampere
        brand:
          string: NvidiaRTX
        cudaComputeCapability:
          version: 8.6.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: NVIDIA RTX A5000
        resource.kubernetes.io/pciBusID:
          string: "0000:00:06.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-e13ce856-7474-797f-d143-16e99b65c0c3
      capacity:
        memory:
          value: 23028Mi
      name: gpu-0
    driver: gpu.nvidia.com
    nodeName: capi-dralabs-md-gpua5000-jw4mx-d64jz
    pool:
      generation: 1
      name: capi-dralabs-md-gpua5000-jw4mx-d64jz
      resourceSliceCount: 1
- apiVersion: resource.k8s.io/v1
  kind: ResourceSlice
  metadata:
    creationTimestamp: "2026-05-04T14:21:53Z"
    generateName: capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-
    generation: 1
    name: capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-dtgtc
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: capi-dralabs-md-gput10-gzl84-f2m2d
      uid: d7ecdc93-1d6c-4868-8503-4251bcf8cf3b
    resourceVersion: "7408"
    uid: 66f32713-c547-4369-84de-97f86430d18d
  spec:
    devices:
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Turing
        brand:
          string: Nvidia
        cudaComputeCapability:
          version: 7.5.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: Tesla T10
        resource.kubernetes.io/pciBusID:
          string: "0000:00:06.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652
      capacity:
        memory:
          value: 16Gi
      name: gpu-0
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Turing
        brand:
          string: Nvidia
        cudaComputeCapability:
          version: 7.5.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: Tesla T10
        resource.kubernetes.io/pciBusID:
          string: "0000:00:07.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-d1bf2033-42f6-096c-b0c6-470575bc08df
      capacity:
        memory:
          value: 16Gi
      name: gpu-1
    driver: gpu.nvidia.com
    nodeName: capi-dralabs-md-gput10-gzl84-f2m2d
    pool:
      generation: 1
      name: capi-dralabs-md-gput10-gzl84-f2m2d
      resourceSliceCount: 1
kind: List
metadata:
  resourceVersion: ""
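The YAML above is long, so a small helper can flatten it into one row per device. The following is a minimal Python sketch that reads the same structure in its JSON form; device_rows and the embedded sample are my own illustrative names, and the sample is cut down to just the fields the function reads.

```python
def device_rows(slice_list: dict) -> list:
    """Return (node, device name, product name, memory) for every device."""
    rows = []
    for item in slice_list.get("items", []):
        spec = item["spec"]
        for dev in spec.get("devices", []):
            attrs = dev.get("attributes", {})
            product = attrs.get("productName", {}).get("string", "?")
            memory = dev.get("capacity", {}).get("memory", {}).get("value", "?")
            rows.append((spec.get("nodeName", "?"), dev["name"], product, memory))
    return rows


# Cut-down copy of the lab output shown above.
sample = {"items": [
    {"spec": {"nodeName": "capi-dralabs-md-gpua5000-jw4mx-d64jz", "devices": [
        {"name": "gpu-0",
         "attributes": {"productName": {"string": "NVIDIA RTX A5000"}},
         "capacity": {"memory": {"value": "23028Mi"}}},
    ]}},
    {"spec": {"nodeName": "capi-dralabs-md-gput10-gzl84-f2m2d", "devices": [
        {"name": "gpu-0",
         "attributes": {"productName": {"string": "Tesla T10"}},
         "capacity": {"memory": {"value": "16Gi"}}},
        {"name": "gpu-1",
         "attributes": {"productName": {"string": "Tesla T10"}},
         "capacity": {"memory": {"value": "16Gi"}}},
    ]}},
]}

for node, name, product, memory in device_rows(sample):
    print(f"{node:40} {name:6} {product:18} {memory}")
```

Against a live cluster, the sample could be replaced with json.load(sys.stdin) and the script fed from kubectl get resourceslices -o json.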

With this information, how does a Pod tell Kubernetes which devices it wants? That’s where ResourceClaim and ResourceClaimTemplate come in!

ResourceClaim & ResourceClaimTemplate

If you’d like multiple Pods to share the same device, you can manually create a ResourceClaim. Its lifecycle is independent of any Pod: creating or deleting Pods neither creates nor deletes the claim.

ResourceClaim diagram/flow chart

What if you want each Pod to have its own dedicated device? ResourceClaimTemplate lets you predefine a ResourceClaim. Once a Deployment references the template by name, every new Pod automatically gets a corresponding ResourceClaim; conversely, deleting the Pod removes its claim.

ResourceClaimTemplate diagram

Do these concepts feel familiar? DRA is modeled after Storage in Kubernetes: ResourceClaim corresponds to PersistentVolumeClaim, ResourceClaimTemplate to the volumeClaimTemplates field (which exists only inside StatefulSet), and DeviceClass plays roughly the role of StorageClass.

Hands-On with DRA

Scenario I: Two Containers Sharing One GPU

Use a ResourceClaim to declare that we need one NVIDIA GPU, then run a Pod with two containers that share it.

lab01-rc.yaml

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: must-nvidia-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        count: 1

Apply the resource:

kubectl apply -f lab01-rc.yaml

Use get to inspect the ResourceClaim:

kubectl get resourceclaim

The status is pending because no Pod is consuming it yet.

NAME              STATE     AGE
must-nvidia-gpu   pending   10s

Now define a Pod with two containers, both referencing the must-nvidia-gpu ResourceClaim we just created.

lab01-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: must-nvidia-gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: ctr0
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: must-nvidia-gpu

Apply the Pod:

kubectl apply -f lab01-pod.yaml

Check the ResourceClaim again:

kubectl get resourceclaim

The status changes to allocated and reserved because a Pod is now using the resource.

NAME              STATE                AGE
must-nvidia-gpu   allocated,reserved   16s
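The STATE column is not a stored field; as far as I can tell, kubectl derives it from the claim's status. The Python sketch below is my simplified reading of that derivation, based on the status.allocation and status.reservedFor fields of a ResourceClaim. The function name claim_state is hypothetical, and deletion-related states are deliberately left out.

```python
def claim_state(claim: dict) -> str:
    """Approximate the STATE column of `kubectl get resourceclaim`."""
    status = claim.get("status", {})
    if not status.get("allocation"):
        # No device has been allocated to the claim yet.
        return "pending"
    states = ["allocated"]
    if status.get("reservedFor"):
        # At least one consumer (here: a Pod) is using the allocation.
        states.append("reserved")
    return ",".join(states)


before_pod = {"status": {}}
with_pod = {"status": {
    "allocation": {"devices": {"results": [{"device": "gpu-0"}]}},
    "reservedFor": [{"resource": "pods", "name": "must-nvidia-gpu-pod"}],
}}

print(claim_state(before_pod))
print(claim_state(with_pod))
```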

Now we can use logs to print the output:

kubectl logs pod/must-nvidia-gpu-pod --all-containers --prefix
[pod/must-nvidia-gpu-pod/ctr0] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)
[pod/must-nvidia-gpu-pod/ctr1] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)

In practice, it might not be a T10 — it could just as easily be an A5000.

Now delete the Pod:

kubectl delete -f lab01-pod.yaml

Check the ResourceClaim once more:

kubectl get resourceclaim

The status returns to pending because no Pod is using the resource anymore.

NAME              STATE     AGE
must-nvidia-gpu   pending   3m39s

Delete the ResourceClaim:

kubectl delete -f lab01-rc.yaml

The example above only asked for one GPU but didn’t tell us which one we’d get.

This scenario isn’t all that different from the original Device Plugin, right? The next scenarios are where DRA truly shines!

Scenario II: ResourceClaimTemplate — Prefer A5000 in a Deployment

Today, an engineer asks me to deploy an inference model that should preferably run on the A5000, but since A5000s are scarce, they’re fine falling back to T10 when scaling up.

Beyond exactly, ResourceClaim also supports firstAvailable for ranked preferences. Going back to the full ResourceSlice output, we can target GPUs by name using .attributes.productName.

The configuration looks like this:

lab02.yaml

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: first-a5000
spec:
  spec:
    devices:
      requests:
      - name: gpu
        firstAvailable:
        - name: a5000
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA RTX A5000"
        - name: fallback-t10
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.attributes["gpu.nvidia.com"].productName == "Tesla T10"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: first-a5000-deploy
  labels:
    app: first-a5000-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: first-a5000-deploy
  template:
    metadata:
      labels:
        app: first-a5000-deploy
    spec:
      containers:
      - name: ctr0
        image: ubuntu:24.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: first-a5000

Save the above as lab02.yaml and apply it:

kubectl apply -f lab02.yaml

Confirm the Pod is Running and use the nvidia-smi -L output to verify it got the A5000:

kubectl get pod
NAME                                 READY   STATUS    RESTARTS   AGE
first-a5000-deploy-8c6cf4568-2lsv9   1/1     Running   0          9s

kubectl logs deployments/first-a5000-deploy --all-pods
[pod/first-a5000-deploy-8c6cf4568-2lsv9/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)

Now scale up to 2 replicas to see which GPU the new Pod gets:

kubectl scale deployment first-a5000-deploy --replicas 2
kubectl logs deployments/first-a5000-deploy --all-pods
[pod/first-a5000-deploy-8c6cf4568-2lsv9/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)
[pod/first-a5000-deploy-8c6cf4568-865jj/ctr0] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)

The first Pod has taken the only A5000, so the second Pod falls back to T10 — exactly the expected behavior of firstAvailable when the top choice is unavailable.
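The ranking behavior can be modeled in a few lines. This is a deliberately simplified Python sketch, not the scheduler's real allocator (which also has to fit the Pod onto a node and satisfy all of its claims together): it walks the subrequests in order and takes the first free device that matches. The function pick_first_available, the device dictionaries, and the short node names are illustrative assumptions.

```python
def pick_first_available(subrequests, devices, allocated):
    """Return (subrequest name, device) for the highest-ranked match.

    subrequests: list of (name, predicate) pairs in priority order.
    devices:     list of dicts with "node", "name" and "productName".
    allocated:   set of (node, device name) pairs already in use; it is
                 updated in place when a device is picked.
    """
    for sub_name, matches in subrequests:
        for dev in devices:
            key = (dev["node"], dev["name"])
            if key not in allocated and matches(dev):
                allocated.add(key)
                return sub_name, dev
    return None  # nothing fits: the claim stays unallocated


devices = [
    {"node": "gpua5000", "name": "gpu-0", "productName": "NVIDIA RTX A5000"},
    {"node": "gput10", "name": "gpu-0", "productName": "Tesla T10"},
    {"node": "gput10", "name": "gpu-1", "productName": "Tesla T10"},
]
subrequests = [
    ("a5000", lambda d: d["productName"] == "NVIDIA RTX A5000"),
    ("fallback-t10", lambda d: d["productName"] == "Tesla T10"),
]

allocated = set()
first = pick_first_available(subrequests, devices, allocated)
second = pick_first_available(subrequests, devices, allocated)
print(first[0], first[1]["productName"])    # a5000 NVIDIA RTX A5000
print(second[0], second[1]["productName"])  # fallback-t10 Tesla T10
```

The second call only reaches fallback-t10 because the A5000 is still in the allocated set. Any claim created while another claim holds the A5000 ends up on a T10 in the same way.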

We can also see the corresponding ResourceClaims:

kubectl get resourceclaim
NAME                                           STATE                AGE
first-a5000-deploy-8c6cf4568-2lsv9-gpu-bdz9j   allocated,reserved   4m29s
first-a5000-deploy-8c6cf4568-865jj-gpu-mqfcx   allocated,reserved   3m29s

⚠️ WARNING — If we delete the A5000 Pod, will the rebuilt Pod return to A5000?

With the configuration above, no, it won’t return to A5000. As soon as the old Pod enters Terminating, the Deployment’s ReplicaSet creates a replacement Pod, but the old Pod’s ResourceClaim hasn’t been released yet. (The default strategy.type, RollingUpdate, can likewise start new Pods before the old ones are gone during a rollout.)

A new ResourceClaim is generated for the replacement Pod from the ResourceClaimTemplate. Since the A5000 is still held by the old Pod, the new claim falls back to T10.

Finally, clean up:

kubectl delete -f lab02.yaml

Scenario III: GPUs With at Least 20 GiB of Memory

Today an engineer wants to deploy an LLM that needs a single GPU with at least 20 GiB of memory. Since it’s still in testing, compute requirements are flexible — any available GPU meeting the memory threshold will do.

Beyond .attributes, we can also use .capacity.memory. How do we express comparison rules? Take a look at line 15:

lab03.yaml

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gt20g
spec:
  spec:
    devices:
      requests:
      - name: gpu
        firstAvailable:
        - name: gt20g
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.capacity["gpu.nvidia.com"].memory.isGreaterThan(quantity("20Gi"))
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gt20g-deploy
  labels:
    app: gt20g-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gt20g-deploy
  template:
    metadata:
      labels:
        app: gt20g-deploy
    spec:
      containers:
      - name: ctr0
        image: ubuntu:24.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: gt20g

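We use CEL’s isGreaterThan(quantity("20Gi")) to require strictly more than 20 GiB. To also accept a GPU with exactly 20 GiB ("at least"), compareTo(quantity("20Gi")) >= 0 expresses that instead.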

Apply the YAML:

kubectl apply -f lab03.yaml

Confirm the Pod is Running and that we got the A5000:

kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
gt20g-deploy-5ff576476-hdz8f   1/1     Running   0          5m16s

kubectl logs deployments/gt20g-deploy --all-pods
[pod/gt20g-deploy-5ff576476-hdz8f/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)

Now let’s see what happens when we scale up:

kubectl scale deployment gt20g-deploy --replicas 2

After scaling, check whether a new Pod was added:

kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
gt20g-deploy-5ff576476-hdz8f   1/1     Running   0          8m9s
gt20g-deploy-5ff576476-vjss8   0/1     Pending   0          26s

Run describe on the gt20g-deploy-5ff576476-vjss8 Pod:

kubectl describe pod gt20g-deploy-5ff576476-vjss8
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  98s   default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint(s), 3 cannot allocate all claims. still not schedulable, preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.

Since there’s no other GPU with at least 20 GiB of memory in the cluster — the T10 has only 16 GiB — the new Pod is stuck in Pending.
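The arithmetic behind that result is easy to check by hand. Below is a minimal Python sketch that converts the memory values from the ResourceSlice into bytes and compares them against the 20Gi threshold. The parser is my own simplification and only handles whole numbers with binary suffixes; the real Kubernetes quantity grammar accepts more forms.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_binary_quantity(text: str) -> int:
    """Parse strings such as "16Gi" or "23028Mi" into a byte count."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # no suffix: already plain bytes


threshold = parse_binary_quantity("20Gi")  # 20480Mi

# Memory capacities as reported in this lab's ResourceSlices.
gpus = {"NVIDIA RTX A5000": "23028Mi", "Tesla T10": "16Gi"}
for product, memory in gpus.items():
    matches = parse_binary_quantity(memory) > threshold
    print(f"{product}: {memory} > 20Gi is {matches}")
```

The A5000 passes (23028Mi is more than 20480Mi) and both T10s fail, so once the single A5000 is taken there is nothing left for the second replica.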

Clean up the resources:

kubectl delete -f lab03.yaml

For more sophisticated matching logic, see CEL in Kubernetes.

Scenario IV: GPU Time Slicing in DRA

Note for the reader:

As of June 2026, neither the NVIDIA official documentation nor the NVIDIA DRA Driver GPU wiki contains any tutorials on Time Slicing.

The configuration below is adapted from demo/specs/quickstart/v1/gpu-test5.yaml, supplemented by reading parts of the source code; the Feature Gate part draws from third-party articles.

The setup may change in future releases — keep that in mind!

Today, an engineer comes back to me asking: “I know DRA is great for resource allocation, but is there a way to fall back to Time Slicing mode?”

Want to go back to the old mode? Nooo… problem at all!

Just specify the device under .spec.devices.config and switch the sharing strategy to TimeSlicing. Here’s an example:

lab04.yaml

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: time-slicing-manual
spec:
  devices:
    requests:
    - name: ts-gpu
      exactly:
        deviceClassName: gpu.nvidia.com
    config:
    - requests: ["ts-gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing
            timeSlicingConfig:
              interval: Long
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ts-gpu-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ts-gpu
  template:
    metadata:
      labels:
        app: ts-gpu
    spec:
      containers:
      - name: ctr
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.6.0-ubuntu18.04
        command: ["bash", "-c"]
        args: ["trap 'exit 0' TERM; /tmp/sample --benchmark --numbodies=4226000 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimName: time-slicing-manual

Save the above as lab04.yaml and apply it:

kubectl apply -f lab04.yaml

Verify all the Pods are running:

kubectl get pod
NAME                                 READY   STATUS    RESTARTS   AGE
ts-gpu-deployment-549c945798-6t2dx   1/1     Running   0          4s
ts-gpu-deployment-549c945798-tlgp4   1/1     Running   0          4s
ts-gpu-deployment-549c945798-x2gbv   1/1     Running   0          4s
ts-gpu-deployment-549c945798-xbv22   1/1     Running   0          4s

Since all 4 Pods share the same ResourceClaim, kubectl get resourceclaim returns only a single entry — which itself is evidence that they’re sharing:

kubectl get resourceclaim
NAME                  STATE                AGE
time-slicing-manual   allocated,reserved   30s

Drill in with describe and you’ll see all 4 Pods listed under Reserved For:

kubectl describe resourceclaim time-slicing-manual
Status:
  Allocation:
    Devices:
      Config:
        Opaque:
          Driver:  gpu.nvidia.com
          Parameters:
            API Version:  resource.nvidia.com/v1beta1
            Kind:         GpuConfig
            Sharing:
              Strategy:  TimeSlicing
              Time Slicing Config:
                Interval:  Long
        Requests:
          ts-gpu
        Source:  FromClaim
      Results:
        Device:   gpu-0
        Driver:   gpu.nvidia.com
        Pool:     capi-dralabs-md-gpua5000-jw4mx-d64jz
        Request:  ts-gpu
    Node Selector:
      Node Selector Terms:
        Match Fields:
          Key:       metadata.name
          Operator:  In
          Values:
            capi-dralabs-md-gpua5000-jw4mx-d64jz
  Reserved For:
    Name:      ts-gpu-deployment-549c945798-x2gbv
    Resource:  pods
    UID:       be21eecf-2d58-4891-9cac-ac3674f4ff09
    Name:      ts-gpu-deployment-549c945798-6t2dx
    Resource:  pods
    UID:       37d504a0-966e-4263-9fcd-b713d73c0e77
    Name:      ts-gpu-deployment-549c945798-xbv22
    Resource:  pods
    UID:       2b5ec297-9889-4cb9-8079-b626a854b292
    Name:      ts-gpu-deployment-549c945798-tlgp4
    Resource:  pods
    UID:       ee836856-8c6b-4161-9582-4bc4ff63a606


That’s how Time Slicing is enabled under DRA — the effect is similar to the legacy Device Plugin’s Time Slicing mode, except we don’t need to specify how many slices to divide into. We just configure timeSlicingConfig and its interval.

Finally, clean up the resources:

kubectl delete -f lab04.yaml

Summary

Compared to the Device Plugin, DRA now offers a much cleaner usage model that lets developers and cluster admins allocate devices more precisely. There’s no longer a need to colocate the same kind of device on the same node, nor to write complex rules in nodeSelector or Affinity.

Starting with K8s v1.36, device health reporting is also available, so Pods no longer simply show Error — we can tell whether the failure stems from the device or from the application.

Previously, when a K8s cluster ran low on CPU or memory, Cluster Autoscaler could spin up new machines. In the future, the same may apply to GPU shortages — Cluster Autoscaler may provision GPU nodes on demand, enabling more efficient resource allocation.