Project post by Volcano project maintainers

CNCF Volcano 1.6.0 is now available with new features such as elastic job management, dynamic scheduling and rescheduling based on actual resource utilization, and MPI job plugin.

Volcano v1.6.0 is now available

Volcano is the  first cloud native batch computing project in CNCF. It was open sourced at Shanghai KubeCon in June 2019 and accepted as a CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubation project. By now, more than 400 global developers have committed code to the project. The community is growing popularity among developers, partners, and users.

Key Features

Scheduling for elastic job

This feature, working with Volcano Jobs or PyTorch Jobs, accelerates AI training and big data analytics and reduces costs by using spot instances on the cloud.

The number of replicas allowed for an elastic job falls within [min, max]. min corresponds to minAvailable of the job, and max indicates the number of replicas of the job. The elastic scheduling module preferentially allocates resources to the minAvailable pods to ensure that their minimum resource requests are met. 

Resources, when idle, will be allocated by the scheduler to elastic pods to accelerate computing. However, when the cluster is resource-starved, the scheduler preferentially preempts the resources of elastic pods, which triggers scale-in. The scheduler also balances resource allocation based on priorities. For example, a high-priority job can preempt resources of an elastic pod of a low-priority job.

Job 1-1

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/elastic-scheduler.md

Issue:

https://github.com/volcano-sh/volcano/issues/1876

Dynamic scheduling

The current scheduling mechanism, based on resource request and allocation, may cause unbalanced node resource utilization. For example, a pod may be scheduled to a node with a extremely high resource usage and cause a node exception, while there are some other nodes in the cluster that are not heavily used

In version 1.6.0, Volcano collaborates with Prometheus to make scheduling decisions. Prometheus collects data about cluster node resource use, and Volcano uses this data to balance node resource usage as much as possible. You can also configure the limits of CPUs and memory of each node. This prevents node exceptions caused by pods using too many resources.

Example scheduling policy:

actions: "enqueue, allocate, backfill"  
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: usage  # usage based scheduling plugin
        arguments:
          thresholds:
            CPUUsageAvg.5m: 90 # The node whose average usage in 5 minute is higher than 90% will be filtered in predicating stage
            MEMUsageAvg.5m: 80 # The node whose average usage in 5 minute is higher than 80% will be filtered in predicating stage
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
metrics:                         # Metrics Server-related configuration
  address: http://192.168.0.10:9090  # (mandatory) Prometheus server address
  interval: 30s                    # (optional) The scheduler pulls metrics from Prometheus with this interval. 5s by default.

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md

Issue:

https://github.com/volcano-sh/volcano/issues/1777

Rescheduling

Improper scheduling policies and dynamic job lifecycles lead to unbalanced node resource utilization. In version 1.6.0, Volcano allows you to add rescheduling policies based on the actual resource utilization or custom metrics. Pods will be evicted from some high-load nodes to low-load nodes, and the resource utilization of all nodes will be periodically checked.  

Rescheduling further balances the loads of each node and improves the cluster resource utilization.

## Configuration Option actions: “enqueue, allocate, backfill, shuffle”  ## Add ‘shuffle’ at the end of the actions tiers:

- plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: rescheduling       ## Rescheduling plugin
        arguments:
          interval: 5m           ## (optional) The strategies will be called in this duration periodically. 5 minutes by default. 
          strategies:            ## (mandatory) The strategies work in order.
            - name: offlineOnly
            - name: lowPriorityFirst
            - name: lowNodeUtilization
              params:
                thresholds:
                  "cpu" : 20
                  "memory": 20
                  "pods": 20
                targetThresholds:
                  "cpu" : 50
                  "memory": 50
                  "pods": 50
          queueSelector:         ## (optional) Select workloads in specified queues as potential evictees. All queues by default.
            - default
            - test-queue
          labelSelector:         ## (optional) Select workloads with specified labels as potential evictees. All labels by default.
            business: offline
            team: test
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md 

Issue:

https://github.com/volcano-sh/volcano/issues/1777

MPI plugin

You can use Volcano Jobs to run MPI jobs. Volcano Job build-in plugins such as svc, env, and ssh automatically configure password-free communications and environment variable injection for the masters and workers of MPI jobs.

The new version of Volcano further eases your running of MPI jobs by providing the MPI plugin. No more worries about the shell syntax, the communications between masters and workers, or manual SSH authentication. You can start an MPI job in a simple and graceful manner.

Example configuration:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 1
  schedulerName: volcano
  plugins:
    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"]  ## MPI plugin register
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
              image: volcanosh/example-mpi:0.0.1
              name: mpimaster
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.1
              name: mpiworker
              workingDir: /home
          restartPolicy: OnFailure

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md 

Issue:

https://github.com/volcano-sh/volcano/pull/2194

Links:

Release note: https://github.com/volcano-sh/volcano/releases/tag/v1.6.0 

Branch: https://github.com/volcano-sh/volcano/tree/release-1.6 

About Volcano

Website: https://volcano.sh 

Github: https://github.com/volcano-sh/volcano 

Volcano is designed for high-performance batch computing such as AI, big data, gene sequencing, and rendering jobs. The project has got more than 2400 Stars and 550 Forks on GitHub. 26,000 developers around the world join the community. Contributing enterprises include Huawei, AWS, Baidu, Tencent, JD.com, and Xiaohongshu. 

Volcano supports mainstream computing frameworks, including Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, and KubeGene. A comprehensive, robust ecosystem has been developed.