Guest post originally published on the ARMO blog by Amir Kaushansky
Kubernetes’ new version – version 1.25 – will be released on Tuesday 23rd August 2022, and it comes with 40 new enhancements in various areas and numerous bug fixes.
This post focuses on the highlighted changes from each special interest group (SIG) in the upcoming release so that you can upgrade your clusters with confidence.
There are two new and shiny enhancements from the API machinery group.
CRD Validation Expression Language (graduation to beta)
Custom resources are the key extension point for creating and managing new resource types in the Kubernetes API. With the upcoming release, CRD validation using the Common Expression Language (CEL) graduates to beta. Instead of deploying and operating webhooks for validation, you can now add validation rules to the CRD schema and manage them side by side with the resource specification.
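As a minimal sketch of what such a rule can look like, the CRD below attaches one validation rule to its schema through the x-kubernetes-validations extension. The group example.com, the Widget kind, and the replicas and minReplicas fields are hypothetical names chosen for illustration only.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com        # hypothetical; must be <plural>.<group>
spec:
  group: example.com               # hypothetical API group
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              # Rule checked by the API server on create and update;
              # `self` refers to the spec object this rule is attached to.
              x-kubernetes-validations:
                - rule: "self.minReplicas <= self.replicas"
                  message: "replicas must be greater than or equal to minReplicas"
              required:
                - replicas
                - minReplicas
              properties:
                replicas:
                  type: integer
                minReplicas:
                  type: integer
```

Both fields are marked as required here so that the rule never has to deal with a missing value; with optional fields, the rule would need an explicit presence check.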
Retriable and Non-Retriable Pod Failures for Jobs (alpha release)
Job resources are the way to run one-time tasks in Kubernetes. However, the Job API in Kubernetes is minimal regarding failure handling. With this new alpha feature, the Job specification gains a new field, podFailurePolicy. It lets you define rules that match container exit codes or pod conditions and take an action depending on the outcome. The following example Job spec with a failure policy is based on the example in the Kubernetes project on GitHub:
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      restartPolicy: Never   # required when podFailurePolicy is set
      containers:
        - name: main-job-container
          image: job-image
          command: ["./program"]
        - name: monitoring-job-container
          image: job-monitoring
          command: ["./monitoring"]
  backoffLimit: 3
  podFailurePolicy:
    rules:
      # Fail the whole Job when the main container exits with 1, 2, or 3.
      - action: FailJob
        onExitCodes:
          containerName: main-job-container
          operator: In
          values: [1, 2, 3]
      # Do not count pods that failed because of a disruption.
      - action: Ignore
        onPodConditions:
          - type: DisruptionTarget
The Apps SIG focuses on deploying and managing complex applications in Kubernetes. In the 1.25 release, there are two crucial enhancements in this area.
Add minReadySeconds to StatefulSets (graduation to stable)
minReadySeconds is now a stable field in StatefulSet resources. It sets the minimum number of seconds a newly created pod must be running and ready, without any of its containers crashing, before it is considered available. This buffer is useful when containers start quickly but the application needs more time before it can reliably accept requests.
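A minimal sketch of a StatefulSet that uses the field might look like the following. The name web, the nginx image tag, and the 10-second value are placeholders picked for illustration.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                 # placeholder name
spec:
  serviceName: web
  replicas: 3
  # A pod counts as available only after it has been ready for 10 seconds.
  minReadySeconds: 10
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.23   # placeholder image
          ports:
            - containerPort: 80
```

During a rolling update, the controller waits for this period on each pod before moving on to the next one, so a larger value trades rollout speed for extra safety.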
TimeZone Support in CronJob (graduation to beta)
A CronJob creates Job instances according to the schedule in its resource specification. Until now, however, that schedule has been interpreted in the time zone of the machine where the controller-manager is running. With the new enhancement, you get a new field, spec.timeZone, which accepts a valid time zone name from the tz database.
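For illustration, a CronJob that should fire at 02:30 local time in Berlin could be sketched as follows. The name, schedule, and image are placeholders.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report        # placeholder name
spec:
  schedule: "30 2 * * *"      # 02:30 every day
  timeZone: "Europe/Berlin"   # tz database name; interpreted instead of the controller's local zone
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox:1.36   # placeholder image
              command: ["sh", "-c", "date"]
```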
Here, we have one critical removal and one new alpha release from the authorization, authentication, and cluster security policy area.
Removal of PodSecurityPolicy
In Kubernetes 1.25, PodSecurityPolicy is completely removed after its deprecation in version 1.21. PodSecurityPolicy was the built-in way to define rules on a pod's capabilities, but it became complex and confusing over time. As a replacement, Kubernetes now provides the Pod Security Admission controller, along with a clear migration path.
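Pod Security Admission is configured per namespace through labels. A minimal sketch, assuming a hypothetical namespace called payments and an arbitrary choice of levels, could look like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments   # hypothetical namespace
  labels:
    # Reject pods that violate the "baseline" Pod Security Standard.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.25
    # Only warn and audit against the stricter "restricted" level.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with warn and audit before tightening enforce is one way to see which workloads would be rejected without breaking them.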
KMS v2 Improvements (alpha release)
Kubernetes stores all of its data in etcd, and that data is not encrypted by default. Because of this, Kubernetes supports external Key Management Service (KMS) providers that encrypt data before it is stored in etcd. The new v2alpha1 API focuses on making KMS handle key rotation automatically. In addition, it improves KMS plugin health checks and the observability of operations between the API server and the KMS.
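As a rough sketch, an API server encryption configuration that selects a KMS v2 provider might look like the following. The plugin name and socket path are hypothetical, and since the feature is alpha in 1.25, this assumes the KMSv2 feature gate is enabled on the API server.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes; here, a KMS v2 plugin.
      - kms:
          apiVersion: v2
          name: example-kms-plugin                    # hypothetical plugin name
          endpoint: unix:///var/run/kms/example.sock  # hypothetical socket path
          timeout: 3s
      # Fallback so that data written before encryption was enabled stays readable.
      - identity: {}
```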
There are two graduations from the networking area in the upcoming release.
NetworkPolicy Port Range (graduation to stable)
Until now, ingress and egress network policies required you to specify each port one by one. The now stable endPort field lets you declare a port range instead. For instance, you can apply a rule to ports 32000 through 32768 as follows:
spec:
  egress:
    - ports:
        - protocol: TCP
          port: 32000
          endPort: 32768
Reserve Service IP Ranges for Dynamic and Static IP Allocation (graduation to beta)
Kubernetes service resources expose applications inside and outside the cluster. There are two ways to choose an IP for a service resource: either Kubernetes assigns a random IP from a configured range, or the user statically specifies an IP from the same range. The ServiceIPStaticSubrange feature gate, which has graduated to beta, divides the range so that dynamic allocation prefers the upper part and static assignments can use the lower part. This reduces the risk of collisions while assigning IP addresses to services in Kubernetes.
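To illustrate the static case, the sketch below pins a service to an address near the bottom of the range. It assumes a service CIDR of 10.96.0.0/16, which is not taken from this post, and a hypothetical service called internal-dns.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-dns        # hypothetical service
spec:
  # Assumes the cluster's service CIDR is 10.96.0.0/16; the address sits in
  # the lower part of the range, which dynamic allocation tries to avoid.
  clusterIP: 10.96.0.10
  selector:
    app: internal-dns
  ports:
    - name: dns
      protocol: UDP
      port: 53
      targetPort: 53
```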
In the 1.25 release, there are three general availability (GA) graduations, as well as one beta and one alpha release, in the node area.
Ephemeral Containers (graduation to stable)
Debugging a live distributed system is always challenging since it is not easy to connect, send requests, and check the results. With ephemeral containers, you can add a container to a running pod. Since application container images are often minimal, without a shell, curl, or any debugging tools, ephemeral containers are useful for quickly spinning up a debugging container.
For instance, you can attach an interactive ephemeral busybox image to db-pod with the following command and start debugging:
$ kubectl debug db-pod -it --image=busybox
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #
cgroups v2 (graduation to stable)
cgroups are one of the key Linux kernel features for organizing and managing container resources on nodes. In the early days of Kubernetes, all container runtimes were built on cgroups v1, but cgroups v2 support has now graduated to general availability. With cgroups v2, container workloads, including rootless containers, run more securely and more reliably with the latest kernel functionality.
Add Configurable Grace Period to Probes (graduation to stable)
Liveness probes now have a stable terminationGracePeriodSeconds field of their own, in addition to the terminationGracePeriodSeconds field at the pod level. Separating the two lets you decide independently how long Kubernetes waits before killing a container during a normal shutdown and after a failed liveness probe.
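A minimal sketch of a pod that shuts down slowly under normal conditions but should be restarted quickly after a failed liveness probe could look like this. The image, the /healthz path, the port, and the timing values are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown       # placeholder name
spec:
  # Normal shutdown: allow up to an hour for in-flight work to drain.
  terminationGracePeriodSeconds: 3600
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical image
      livenessProbe:
        httpGet:
          path: /healthz    # hypothetical health endpoint
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
        # Failed liveness probe: wait only 60 seconds before killing the container.
        terminationGracePeriodSeconds: 60
```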
seccomp by Default (graduation to beta)
Kubernetes lets you harden containers with seccomp profiles, which restrict the system calls a container may make. Applying the container runtime's default seccomp profile to all workloads has been available as an alpha feature since the 1.22 release, and it graduates to beta in 1.25. Enabling seccomp by default adds a security layer that reduces exposure to CVEs and 0-days.
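Even in beta, the default profile still has to be switched on for each kubelet. A minimal sketch of the relevant kubelet configuration, assuming the node is configured through a KubeletConfiguration file, could be:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Use the container runtime's default seccomp profile for workloads
# that do not specify a profile themselves.
seccompDefault: true
```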
Add CPUManager Policy Option to Align CPUs by Socket Instead of NUMA Node (alpha release)
With newer CPU architectures, the number of NUMA (non-uniform memory access) nodes per socket is increasing. The new alpha feature adds a CPUManager policy option called align-by-socket. With this option, CPUs are considered aligned at socket boundaries instead of NUMA node boundaries.
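A sketch of a kubelet configuration that turns the option on might look like the following. It assumes the static CPU manager policy and the CPUManagerPolicyAlphaOptions feature gate, which alpha policy options depend on as far as I am aware; the reserved CPU list is an arbitrary example value.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManagerPolicyAlphaOptions: true   # assumed prerequisite for alpha policy options
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  align-by-socket: "true"
# The static policy needs some CPUs reserved for system use; "0,1" is an example.
reservedSystemCPUs: "0,1"
```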
Version 1.25 has a single critical enhancement from the security area.
Auto-Refreshing Official CVE Feed (alpha release)
Kubernetes is one of the most active open-source repositories and thus has many issues and PRs, which makes it hard to filter out those related to CVEs. The new alpha feature uses automation to label the relevant issues and PRs and keeps a feed of them up to date. This lets you list CVEs with the relevant information as an end user, maintainer, or platform provider.
You’ll get one new alpha release from the scheduling area.
Respect PodTopologySpread After Rolling Upgrades (alpha release)
PodTopologySpread is a part of the pod API that defines constraints on how pods are distributed over the cluster, such as per region, zone, node, or any other user-defined topology. For instance, assume you have a 20-node cluster and an auto-scaling application with a minimum of 2 and a maximum of 15 replicas. When only the minimum of 2 replicas are running, you would not want both of them on the same node or in the same availability zone. These constraints are helpful, as they increase availability in case of failures in the cluster. With the 1.25 release, Kubernetes can also respect the spread constraints during rolling upgrades, as an alpha feature.
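In practice, the alpha feature adds a matchLabelKeys field to spread constraints. The sketch below assumes the MatchLabelKeysInPodTopologySpread feature gate is enabled and uses placeholder names and images; pod-template-hash is the label the Deployment controller adds to each revision's pods.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
          # Compute skew per revision, so pods of the old ReplicaSet
          # do not distort the spread of the new one during a rollout.
          matchLabelKeys:
            - pod-template-hash
      containers:
        - name: web
          image: nginx:1.23   # placeholder image
```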
From the storage area, there are two essential general availability graduations and one alpha release.
Local Ephemeral Storage Capacity Isolation (graduation to stable)
Pods use temporary local storage for their logs, emptyDir volumes, and caches. Without any isolation, every pod on the node shares the same temporary storage pool on a "best-effort" basis. In other words, pods do not know how much space is allocated to them or left on the node. With the storage capacity isolation feature, which becomes generally available in the upcoming release, pods can request and limit their own share of the ephemeral storage pool.
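As a minimal sketch, a pod can reserve and cap its ephemeral storage through the ephemeral-storage resource, in the same way as CPU and memory. The names, image, and sizes below are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-worker          # placeholder name
spec:
  containers:
    - name: worker
      image: busybox:1.36     # placeholder image
      command: ["sh", "-c", "sleep 3600"]
      resources:
        requests:
          ephemeral-storage: "1Gi"   # used by the scheduler to pick a node
        limits:
          ephemeral-storage: "2Gi"   # exceeding this can get the pod evicted
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 500Mi      # separate cap for this emptyDir volume
```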
In-Tree Storage Plugin to CSI Driver Migration (graduation to stable)
Migration of the in-tree plugins to external CSI plugins graduates to stable in version 1.25. This is an important step that includes the deprecation and removal of many volume plugins:
- Deprecation: GlusterFS, Portworx
- Removal: Flocker, Quobyte, StorageOS
- Migration to CSI plugin: AWS EBS, GCE PD, vSphere
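Existing volumes that reference a migrated in-tree plugin are meant to be served by the corresponding CSI driver without manifest changes, but new storage classes can point at the CSI driver directly. A sketch for AWS EBS, assuming the AWS EBS CSI driver is installed in the cluster and using a placeholder class name and volume type:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                 # placeholder name
# CSI driver name, used instead of the in-tree kubernetes.io/aws-ebs provisioner.
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                     # example volume type
volumeBindingMode: WaitForFirstConsumer
```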
Retroactive Default StorageClass Assignment (alpha release)
The default storage class is usually configured by the cluster admin during cluster creation. However, when the underlying storage provider or the business requirements change, you may need to change the default storage class in the cluster as well. The new alpha feature makes the assignment retroactive: PVCs that were created without a storage class can receive the default storage class once one is set.
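The two objects involved can be sketched as follows: a storage class marked as the default, and a PVC that does not name any storage class. The provisioner and object names are hypothetical, and the retroactive assignment itself assumes the alpha feature gate (RetroactiveDefaultStorageClass, to my knowledge) is enabled.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                # placeholder name
  annotations:
    # Marks this class as the cluster default.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.example.com    # hypothetical CSI driver
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                    # placeholder name
spec:
  # No storageClassName here: this is the kind of PVC the feature targets.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```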
Kubernetes 1.25 aims to make Kubernetes more secure, reliable, and flexible. Ensure that you are well-equipped for the latest changes in the release and upgrade your infrastructure promptly. Check the Kubernetes blog and release notes to learn more about the enhancements and the latest changes.