Project post originally published on the Flux Blog

We are thrilled to announce the release of Flux v2.2.0! In this post, we will highlight some of the new features and improvements included in this release, with the primary theme being the many changes made to the helm-controller.

This new release will also be demoed by Priyanka “Pinky” Ravi and Max Werner on Monday, December 18. To attend this demo and ask any questions, you can register here.

Important things first: API changes

This release is accompanied by a series of (backwards compatible) API changes and introductions. Please refer to the release notes for a comprehensive list, and make sure to read them before updating your Flux installation.

Enhanced HelmRelease reconciliation model

The reconciliation model of the helm-controller has been rewritten to better determine the state a Helm release is in, and then decide which Helm action should be performed to reach the desired state.

Effectively, this means that the controller is now capable of continuing where it left off, and to run Helm tests as soon as they are enabled without a Helm upgrade having to take place first.
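For example, under the new model, enabling Helm tests on an existing HelmRelease takes effect on the next reconciliation, without an upgrade being triggered first. A minimal sketch (the release name and chart reference are illustrative; spec.test.enable is the documented field):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: demo
spec:
  interval: 10m
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
  # With the rewritten reconciliation model, enabling this runs the
  # chart's test hooks right away, without requiring a Helm upgrade.
  test:
    enable: true
```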

In addition, the controller now records releases while they are happening, instead of making observations afterward. This ensures that when performing a rollback remediation, the version it reverts to is always exactly the one it previously released. In cases where it is uncertain about state, it will always decide to (re)attempt a Helm upgrade.

This also allows it to count, with certainty, only release attempts that actually mutated the Helm storage as failures towards retry attempts, improving continuity because such attempts are retried instantly instead of remediated first.

Improved observability of Helm releases

The enhanced reconciliation model also allowed us to improve how we report state back to you, as a user.

The improvements range from the introduction of Reconciling and Stalled Condition types to become kstatus compatible, to an enriched overview of Helm releases up to the previous successful release in the Status, and more informative Kubernetes Event and Condition messages.

Events:
  Type    Reason            Age   From             Message
  ----    ------            ----  ----             -------
  Normal  HelmChartCreated  25s   helm-controller  Created HelmChart/demo/demo-podinfo with SourceRef 'HelmRepository/demo/podinfo'
  Normal  InstallSucceeded  20s   helm-controller  Helm install succeeded for release demo/podinfo.v1 with chart podinfo@6.5.3
  Normal  TestSucceeded     12s   helm-controller  Helm test succeeded for release demo/podinfo.v1 with chart podinfo@6.5.3: 3 test hooks completed successfully

For more details around these changes, refer to the Status section in the HelmRelease v2beta2 specification.
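To give a flavor of the enriched overview, here is an illustrative excerpt of the new Status history. The values are made up; the field names follow the HelmRelease v2beta2 Status documentation:

```yaml
# One entry is recorded per release attempt, up to the previous
# successful release.
status:
  history:
    - chartName: podinfo
      chartVersion: 6.5.3
      version: 1
      status: deployed
      lastDeployed: "2023-12-15T10:00:00Z"
```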

Recovery from pending-* Helm release state

A much-reported issue was the helm-controller being unable to recover from “another operation (install/upgrade/rollback) is in progress” errors, which could occur when the controller Pod was forcefully killed. From this release on, the controller recovers from such errors by unlocking the Helm release from a pending-* to a failed state, and retrying it with a Helm upgrade.

Helm Release drift detection and correction

Around April we launched cluster state drift detection and correction for Helm releases as an experimental feature. At that time, it could only be enabled using a global feature flag on the controller, which made it impractical to use at scale: charts vary widely, and for some the effects of drift correction were unpredictable.

For charts with lifecycle hooks, or for cluster resources like Horizontal and Vertical Pod Autoscalers whose controllers write updates back into their own spec, those updates would always be flagged as drift by the helm-controller unless the resource was ignored in full.

To address the above pain points, Helm drift detection can now be enabled on the HelmRelease itself, while also allowing you to ignore specific fields using JSON Pointers:

spec:
  driftDetection:
    mode: enabled
    ignore:
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment

Using these settings, any drift detected will now be corrected by recreating and patching the Kubernetes objects (instead of doing a Helm upgrade) while changes to the .spec.replicas fields for Deployments will be ignored.

For more information, refer to the drift detection section in the HelmRelease v2beta2 specification.

Forcing and retrying Helm releases

Another much-reported issue was the impractical steps one had to take to recover from “retries exhausted” errors. To instruct the helm-controller to retry installing or upgrading a Helm release when it is out of retries, you can now run one of:

flux reconcile helmrelease <release> --reset
flux reconcile helmrelease <release> --force

For in-depth explanations about these new command options, refer to the “resetting remediation retries” and “forcing a release” sections in the HelmRelease v2beta2 specification.
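For GitOps workflows where the CLI is not available, the flags map to reconcile annotations on the HelmRelease object. A sketch, assuming the annotation names documented in the v2beta2 specification (timestamps are illustrative; both values must match for the reset to take effect):

```yaml
metadata:
  annotations:
    # Triggers a reconciliation, as flux reconcile does.
    reconcile.fluxcd.io/requestedAt: "2023-12-15T10:00:00Z"
    # Resets the failure counter; the --force flag uses
    # reconcile.fluxcd.io/forceAt in the same way.
    reconcile.fluxcd.io/resetAt: "2023-12-15T10:00:00Z"
```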

Benchmark results

To measure the real world impact of the helm-controller overhaul, we have set up benchmarks that measure Mean Time To Production (MTTP). The MTTP benchmark measures the time it takes for Flux to deploy application changes into production. Below are the results of the benchmark that ran on a GitHub hosted runner (Ubuntu, 16 cores):

| Objects | Type          | Flux component       | Duration | Max Memory |
|---------|---------------|----------------------|----------|------------|
| 100     | OCIRepository | source-controller    | 25s      | 38Mi       |
| 100     | Kustomization | kustomize-controller | 27s      | 32Mi       |
| 100     | HelmChart     | source-controller    | 25s      | 40Mi       |
| 100     | HelmRelease   | helm-controller      | 31s      | 140Mi      |
| 500     | OCIRepository | source-controller    | 45s      | 65Mi       |
| 500     | Kustomization | kustomize-controller | 2m2s     | 72Mi       |
| 500     | HelmChart     | source-controller    | 45s      | 68Mi       |
| 500     | HelmRelease   | helm-controller      | 2m55s    | 350Mi      |
| 1000    | OCIRepository | source-controller    | 1m30s    | 67Mi       |
| 1000    | Kustomization | kustomize-controller | 4m15s    | 112Mi      |
| 1000    | HelmChart     | source-controller    | 1m30s    | 110Mi      |
| 1000    | HelmRelease   | helm-controller      | 8m2s     | 620Mi      |

The benchmark uses a single application (podinfo) for all tests, with intervals set to 60m. The results may change when deploying Flux objects with a different configuration.

For more information about the benchmark setup and how you can run them on your machine, check out the fluxcd/flux-benchmark repository.

Breaking changes to Kustomizations

All Flux components have been updated from Kustomize v5.0.3 to v5.3.0.

You should be aware that this update comes with a breaking change in Kustomize, as components are now applied after generators. If you use Kustomize components or .spec.components in Kustomizations along with generators, then please make the necessary changes before upgrading to avoid any undesirable behavior. For more information, see the relevant Kustomize issue.
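A hypothetical kustomization.yaml affected by this ordering change: with Kustomize v5.3.0, the component is applied after the ConfigMap generator runs, so a component that patches generated resources now behaves differently than it did under v5.0.3 (the resource and component paths are illustrative):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
# Applied AFTER the generator below since Kustomize v5.3.0;
# previously components ran before generators.
components:
  - ../components/logging
configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=info
```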

Other notable changes

Installing or upgrading Flux

To install Flux, take a look at our installation and get started guides.

To upgrade Flux from v2.x to v2.2.0, either rerun flux bootstrap or use the Flux GitHub Action.

To upgrade the APIs, make sure the new Custom Resource Definitions and controllers are deployed, and then change the manifests in Git:

  1. Set apiVersion: helm.toolkit.fluxcd.io/v2beta2 in the YAML files that contain HelmRelease definitions.
  2. Set apiVersion: notification.toolkit.fluxcd.io/v1beta3 in the YAML files that contain Alert and Provider definitions.
  3. Commit, push and reconcile the API version changes.

Bumping the APIs version in manifests can be done gradually. It is advised to not delay this procedure as the deprecated versions will be removed after 6 months.
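As an example of step 1, the change in Git is a one-line edit per manifest (the resource name is illustrative):

```yaml
# Before: apiVersion: helm.toolkit.fluxcd.io/v2beta1
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: podinfo
```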

Over and out

If you have any questions, or simply like what you read and want to get involved, here are a few good ways to reach us: