OpenKruise is an open-source cloud-native application automation management suite and an incubating project hosted by the Cloud Native Computing Foundation (CNCF). It is a standard extension component based on Kubernetes that is widely used in production by internet-scale companies. It closely follows upstream community standards and adapts technical improvements and best practices to internet-scale scenarios.

In February 2025, OpenKruise released its latest version, 1.8 [2]. This version brings many important updates and enhancements aimed at further improving the efficiency, elasticity, and reliability of cloud-native application management. This article provides a comprehensive overview of the new version.

1. Embrace In-Place VPA: Unlock New Potential in Resource Management

Authors: @LavenderQAQ, @ABNER-1

Kubernetes 1.27 introduced InPlacePodVerticalScaling (In-Place VPA), a significant enhancement that boosts the flexibility and efficiency of resource management. This feature recently advanced to Beta in Kubernetes 1.33, reflecting its improved stability and suitability for production environments.

Kruise is dedicated to boosting users’ workload management capabilities. In Kruise 1.8, we integrated InPlacePodVerticalScaling with our advanced workload types, such as CloneSet, Advanced StatefulSet, and Advanced DaemonSet, for the first time. Users can directly modify the resource settings in a workload to resize the resources of all pods under it in place without restarts, or resize resources and upgrade images simultaneously during in-place pod upgrades without recreating pods. This integration delivers an optimized resource management experience that has not yet been incorporated into native Kubernetes workloads, giving users a distinct advantage in leveraging this innovative feature.

1.1 Core Highlights

● Easier and More Stable Implementation of Specification Changes

Historically, both manual user-initiated and VPA-recommended resource adjustments triggered instance recreation, potentially introducing operational disruptions or system instability. This approach poses critical risks in high-load scenarios — upgrading database resources during peak demand becomes a high-stakes endeavor with potential cascading effects. Furthermore, the current VPA implementation relies on webhook modifications for newly scaled instances, creating specification drift between runtime configurations and pod templates. This architectural limitation impedes visibility and often prevents intended resource optimizations from materializing effectively.

● Future Integration with VPA for More Stable Vertical Scaling

By leveraging workload controllers’ inherent capability to orchestrate pod lifecycle operations, we can implement coordinated change management with built-in availability safeguards. Our strategic roadmap includes enabling VPA to initiate resource adjustments through standardized workload interfaces, ensuring consistent configuration updates across both existing and newly provisioned instances. This controller-centric approach eliminates direct pod manipulation, mitigating availability degradation risks caused by conflicting modifications from concurrent controllers while maintaining strict service level objectives.

1.2 Enabling the Feature

1. Ensure Kubernetes Cluster Support

Verify that your Kubernetes cluster has the InPlacePodVerticalScaling feature gate enabled.

2. Enable During Kruise Installation/Upgrade

When installing or upgrading Kruise, enable the InPlaceWorkloadVerticalScaling feature gate (see the sketch after this list).

3. Configure Update Strategy

Set the update strategy to either InPlaceIfPossible or InPlaceOnly.
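For reference, here is a minimal sketch of the Helm values for step 2 (assuming the openkruise/kruise chart, whose featureGates value takes a comma-separated list of gate assignments; the cluster-side InPlacePodVerticalScaling gate from step 1 must also be on):

# values.yaml excerpt for the openkruise/kruise Helm chart
# featureGates takes a comma-separated list of "Gate=bool" pairs
featureGates: "InPlaceWorkloadVerticalScaling=true"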

1.3 Current Capabilities and Limitations

● Supported Resource Types

Only CPU and memory resources can be adjusted.

● Environment Constraints

Certain limitations exist when operating in a cgroup v1 environment.

● Adjustment Restrictions

If a resource adjustment would change a Pod’s Quality of Service (QoS) class, kruise-manager will automatically fall back to recreating the Pod.
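As a hypothetical illustration of this constraint (the values below are made up):

resources:
  requests:
    cpu: "500m" # requests below limits -> Burstable QoS class
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
# Raising requests while keeping them below limits preserves the Burstable
# QoS class, so the change can be applied in place. Setting requests equal
# to limits for every container would change the class to Guaranteed, so
# kruise-manager would recreate the Pod instead.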

1.4 Example Configuration

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  template:
    spec:
      containers:
        - name: example-container
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m" # Adjusted from 1 -> 500m / 2
  updateStrategy:
    type: InPlaceIfPossible

Key Considerations:

● The InPlacePodVerticalScaling feature in Kubernetes is still evolving. Carefully assess its suitability before deploying it in production environments.

● Keep up to date with the latest upstream developments of the feature.

2. Redefining Storage Management for Stateful Workloads

Authors: @ABNER-1

Throughout the evolution of Kubernetes, the Volume Expansion feature [6] was introduced in version 1.8 and reached General Availability (GA) in version 1.24. However, Persistent Volume Claims (PVCs) managed by the built-in StatefulSet have not fully leveraged this feature: users have had to manually batch-update PVCs for capacity maintenance, while newly added PVCs retain the original configuration, making storage management for stateful workloads complex and inefficient.

Starting from its early versions, Kruise has allowed users to manage the capacity of new PVCs by directly modifying the volume claim template of a StatefulSet. In the latest Kruise 1.8, we have introduced in-place volume expansion support for Advanced StatefulSet, aiming to revolutionize the storage management of stateful applications.

2.1 Core Highlights: Seamless Expansion, Simplified Operations

1. In-place Expansion: No Pod Restart or Data Migration Required

For stateful applications, storage capacity expansion often entails high maintenance costs and potential application downtime risks. Kruise 1.8’s in-place expansion feature allows users to directly increase the PVC capacity managed by Advanced StatefulSet without restarting pods or migrating data. This not only significantly reduces application downtime but also greatly simplifies the storage management process.

2. Gradual Changes: Combined with Rolling Update Strategy

Kruise 1.8 also introduces a gradual change mechanism tied to the pod rolling update sequence. With it, users can gradually and safely apply storage capacity expansion to existing PVCs, ensuring a smooth and reliable change process that minimizes the impact on business operations.

3. Automated Management: No Manual Operations

In the past, users needed to manually batch update PVC configurations, which was cumbersome and prone to errors. Now, with Kruise’s automated management capabilities, you can easily achieve unified management for both new and existing PVCs, significantly enhancing operational efficiency.

2.2 Feature Limitations

  1. The Kubernetes cluster in use must be version 1.24 or later, or have the Volume Expansion feature enabled.
  2. The storage class of the PVC to be expanded must be managed by a CSI driver that supports volume expansion. You can check the allowVolumeExpansion field of the StorageClass object to determine this (see the sketch after this list).
  3. To use this feature, enable the StatefulSetAutoResizePVCGate feature gate when installing or upgrading Kruise.
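For reference, a StorageClass that permits expansion looks like the following minimal sketch (its name matches the usage example below; the provisioner is a hypothetical placeholder):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: allow-volume-expansion
provisioner: csi.example.com # hypothetical CSI driver
allowVolumeExpansion: true # must be true for in-place expansion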

2.3 Usage Example

Control the volume update strategy through the volumeClaimUpdateStrategy field, which supports two modes: OnPodRollingUpdate and OnDelete.

The OnPodRollingUpdate mode automatically adjusts the PVC size during pod rolling updates, whereas the OnDelete mode requires manually deleting the old PVC before the new volume claim template is used to recreate it. This feature greatly enhances the flexibility and efficiency of managing stateful application storage, allowing easier adjustments when storage requirements change. An example configuration is as follows:

apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  volumeClaimUpdateStrategy:
    # Options are OnPodRollingUpdate and OnDelete
    #   OnPodRollingUpdate: controller will automatically expand PVC during pod upgrade
    #   OnDelete: controller will only use the new volume claim template to reconstruct PVC after PVC deletion
    #             i.e., the default PVC update strategy before Kruise version 1.7.
    type: OnPodRollingUpdate
  volumeClaimTemplates:
    - metadata:
        name: data0 # matches volumeClaimName in the status below
      spec:
        resources:
          requests:
            storage: 2Gi # expanded from 1Gi to 2Gi
        storageClassName: allow-volume-expansion

After modifying the Advanced StatefulSet spec, you can observe the status to track the progress of the PVC changes:

status:
  #...
  volumeClaimTemplates:
    - compatibleReadyReplicas: 0 # Number of PVCs with completed changes
      compatibleReplicas: 1 # Number of PVCs with updated spec sizes
      volumeClaimName: data0

3. Empowering AI Workloads with WorkloadSpread’s Capabilities

Authors: @AiRanthem

In the Kubernetes ecosystem, multi-region management and flexible scheduling have always been core needs for complex workloads. Since its introduction in Kruise 0.10.0, WorkloadSpread has provided users with a non-intrusive way to manage workloads in detail across regions and nodes. Whether spreading workloads horizontally across hosts or availability zones, or managing partitions by proportion and priority, WorkloadSpread demonstrates powerful flexibility and adaptability.

However, in practical applications, we found that not all workloads fully meet WorkloadSpread’s assumptions. For example, AI workloads (such as TFJob in Kubeflow), due to their multi-role design, do not implement the Kubernetes scale subresource, so they cannot directly use the capabilities WorkloadSpread provides. These workloads often need to span dedicated hardware or different availability zones to achieve better performance and flexibility, making the demand for multi-region management particularly urgent.

In Kruise 1.8, we have made significant upgrades targeting this pain point: WorkloadSpread now supports workloads that do not implement the scale subresource. With the newly added targetFilter configuration, users can easily apply WorkloadSpread’s capabilities to complex AI workloads like TFJob, enabling more efficient resource allocation and management.

Below is a sample configuration of WorkloadSpread’s targetFilter. For more details, please refer to the WorkloadSpread documentation.

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
spec:
  # ...
  targetRef:
    apiVersion: kubeflow.org/v1alpha1
    kind: TFJob
    name: tfjob-demo
  # Use targetFilter to support workloads without the scale subresource
  targetFilter:
    selector:
      # Use selector to filter the instances managed by targetRef
      matchLabels:
        role: worker
    # Use replicasPathList to specify the field paths from which the total
    # number of replicas of targetRef is read
    replicasPathList:
      - spec.tfReplicaSpecs.Worker.replicas

4. Custom Probing: Injecting New Vitality into Serverless Scenarios

Authors: @zmberg

Since the introduction of the PodProbeMarker custom probing feature in Kruise 1.3, it has rapidly become a trusted tool among developers thanks to its flexible extension capabilities and wide range of applications. In particular, in the game solution OpenKruise Game (OKG), the service quality probing capabilities based on PodProbeMarker have helped numerous game developers achieve efficient and stable service management, making it a preferred tool in the industry.

However, in serverless scenarios, PodProbeMarker’s custom probing capabilities once faced challenges. Before Kruise 1.8, the implementation of PodProbeMarker relied on the node-level component kruise-daemon, which cannot be deployed in a serverless environment, limiting its probing abilities.

Now, with the release of Kruise 1.8, this issue has finally been resolved! The Kruise team has optimized for serverless scenarios, extending the PodProbeMarker protocol to serverless pods, providing new possibilities for custom probing. You can now fully utilize resources in serverless scenarios while enjoying high-quality custom service probing capabilities, safeguarding your service operations.

The extension of the PodProbeMarker protocol in serverless scenarios is as follows:

  1. Kruise-manager adds the required probes to serverless pods via the kruise.io/podprobe annotation.
  2. The serverless PodProbeMarker implementation reads the probe information from the kruise.io/podprobe annotation, executes the probes, and writes the results to the pod’s .status.conditions[x].
  3. Kruise-manager reads the probe execution results from .status.conditions[x] of serverless pods and performs the marking actions defined in markerPolicy.
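To make the protocol concrete, here is an illustrative sketch of a serverless pod carrying a probe (the container name, probe command, and condition type are hypothetical; consult the PodProbeMarker documentation for the authoritative annotation schema):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Step 1: kruise-manager writes the required probes as a serialized list
    kruise.io/podprobe: |
      [
        {
          "containerName": "main",
          "name": "healthy",
          "podConditionType": "game.kruise.io/healthy",
          "probe": {
            "exec": { "command": ["bash", "/probe.sh"] }
          }
        }
      ]
status:
  conditions:
    # Step 2: the serverless implementation writes the probe result here;
    # step 3: kruise-manager reads it and applies the markerPolicy
    - type: game.kruise.io/healthy
      status: "True"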

For more detailed protocols and a list of supported container service providers, please refer to the PodProbeMarker documentation.

5. SidecarSet Gradual Injection: More Granular Version Control

Authors: @AiRanthem

In the cloud-native ecosystem of Kubernetes, SidecarSet has become one of the most popular features of Kruise, serving as a powerful tool for simplifying the management and operations of sidecar containers. Whether it’s for log collection, monitoring agents, or service mesh components, SidecarSet’s elegant design helps users easily handle complex production environment requirements. However, in previous versions, support for gradual injection scenarios was somewhat lacking, causing some inconvenience for users requiring fine-grained version control.

Now, Kruise 1.8 introduces new enhancements by adding gradual injection capabilities, providing SidecarSet with unprecedented flexibility and control. Whether you wish to gradually verify the stability of a new version or need to implement complex release strategies, Kruise 1.8 offers robust support. Here is an example configuration:

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset
spec:
  # ...
  injectionStrategy:
    revision:
      # Specify the version to inject
      revisionName: revision-a
      # Options: Always and Partial
      #  Always: Always inject the specified version, in this case, revision-a
      #  Partial: Combine injection with the partition percentage to control the percentage of the specified version injected
      policy: Partial
  updateStrategy:
    partition: 70%

This configuration allows you to control the injection of sidecar containers with precision: in this example, the Partial policy combined with partition: 70% means roughly 70% of newly injected sidecars use revision-a while the rest receive the latest version, supporting advanced release strategies tailored to your specific needs.

6. Helm Pre-delete Hook: Kruise Accidental Deletion Protection

Authors: @AiRanthem

In versions prior to Kruise 1.7.3, uninstalling Kruise with Helm carried a significant risk: the operation would remove Kruise itself, its custom resource definitions (CRDs), and the associated custom resources (CRs).

Starting from Kruise 1.7.3, a pre-delete hook has been introduced into the Helm uninstall process. During uninstallation, this hook checks for custom resources managed by Kruise within the cluster. If such resources are found, the uninstall is aborted, preventing data loss and service interruptions caused by accidental uninstallation. This improvement significantly enhances the safety of managing Kruise with Helm.

Future Outlook

For comprehensive upgrade and usage guidelines, please refer to the official Kruise documentation [4]. We believe that Kruise 1.8 will empower you to manage your cloud-native applications more effectively and elevate your application management experience.

Continuous Technological Evolution

Kruise is committed to providing users with exceptional cloud-native solutions. To achieve this vision, we have outlined three exciting upcoming releases:

1. Release 1.9:

2. Release 2.0:

3. Release 2.1:

Get Involved

You are welcome to get involved with OpenKruise by joining us on GitHub/Slack/DingTalk/WeChat. Have something you’d like to broadcast to our community? Share your voice at our bi-weekly community meeting (Chinese) [3], or through the channels below:

● Join the community on Slack (English) [5].

● Join the community on DingTalk: search group ID 23330762 (Chinese).

● Join the community on WeChat (new): search for the user openkruise and let the robot invite you (Chinese).


● [1] OpenKruise GitHub repository
● [2] Kruise 1.8 changelog
● [3] Bi-weekly community meeting (Chinese)
● [4] Kruise documentation
● [5] Slack channel
● [6] Kubernetes volume expansion GA