Guest post originally published on Logz.io’s blog by Dotan Horovits
We all know that observability is a must-have for running systems in production. But we often neglect our own backyard: our software release process.
We noticed we had made that mistake here at Logz.io. We were wasting time and energy handling failures in the CI/CD pipeline, which made our Developer-on-Duty (DoD) shifts tedious. That's why it's critical to incorporate your observability practices into your CI/CD pipeline.
Some CI/CD tools provide observability capabilities out of the box. At Logz.io, we use Jenkins and have explored its capabilities and plugins in that area. Jenkins lets you drill into individual runs and see how each run went.
But that is often not enough when you want to monitor aggregated information across all of the pipeline's runs, branches and machines, with your own filters and time ranges, to really understand the patterns.
We found basic aggregate questions such as these tricky or cumbersome to answer:
- Did all runs fail on the same step?
- Did all runs fail for the same reason?
- Did the failure occur only in a specific branch?
- Did the failure occur on a specific machine?
- Which steps fail the most?
- What's the normal run time, so that we can identify outliers?
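Once run data is collected in a structured form, each of these questions becomes a simple aggregation. The following is a minimal Python sketch, not a description of Logz.io's own tooling; the `PipelineRun` record, its field names, the sample runs and the "more than twice the median" outlier rule are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class PipelineRun:
    """One pipeline run (hypothetical schema for illustration)."""
    run_id: str
    branch: str
    machine: str
    duration_sec: float
    failed_step: Optional[str] = None      # None means the run succeeded
    failure_reason: Optional[str] = None


def failures_by(runs, field):
    """Count failed runs grouped by a field such as 'failed_step',
    'failure_reason', 'branch' or 'machine'."""
    return Counter(getattr(r, field) for r in runs if r.failed_step is not None)


def duration_outliers(runs, factor=2.0):
    """Return IDs of runs slower than `factor` times the median duration (assumed rule)."""
    normal = median(r.duration_sec for r in runs)
    return [r.run_id for r in runs if r.duration_sec > factor * normal]


# Hypothetical sample data.
runs = [
    PipelineRun("101", "main", "agent-1", 610, "integration-tests", "timeout"),
    PipelineRun("102", "main", "agent-2", 580),
    PipelineRun("103", "feature-x", "agent-1", 1900, "integration-tests", "timeout"),
    PipelineRun("104", "main", "agent-1", 600, "build", "compile error"),
]

print(failures_by(runs, "failed_step").most_common())  # which steps fail the most
print(failures_by(runs, "machine"))                     # is one machine to blame?
print(duration_outliers(runs))                          # unusually slow runs
```

In this made-up sample, every failed run executed on `agent-1`, the kind of pattern that is hard to spot when inspecting runs one at a time.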
If you have also exhausted the built-in observability capabilities of your CI/CD tool, it's time to set up proper observability, just as you have for your production environment, with a dedicated monitoring and observability setup.
You can achieve observability into your CI/CD pipeline in four steps. In the longform guide Fighting Slow and Flaky CI/CD Pipelines Starts with Observability, I use Jenkins as the reference tool, since many people know this popular open source project and we've used it extensively at my company.
But even if you're using other tools, you'll find much of it applicable. To achieve observability into your CI/CD pipeline, you'll need to:
- Collect data on CI/CD pipeline runs
- Index and store the data for fast query and retrieval
- Visualize the data with custom dashboards
- Build reports and set alert rules on the data
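The four steps can be illustrated end to end in a few lines. This is a minimal Python sketch under stated assumptions: the event schema and function names are hypothetical, an in-memory SQLite table stands in for a real indexing and storage backend, a SQL aggregate stands in for a dashboard panel, and the 75% failure-rate alert threshold is arbitrary.

```python
import json
import sqlite3
from datetime import datetime, timezone


# Step 1, collect: emit one structured JSON event per pipeline run (hypothetical schema).
def run_event(run_id, branch, machine, status, duration_sec, failed_step=None):
    return json.dumps({
        "run_id": run_id,
        "branch": branch,
        "machine": machine,
        "status": status,              # "success" or "failure"
        "duration_sec": duration_sec,
        "failed_step": failed_step,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


# Step 2, index and store: an in-memory SQLite table stands in for a real backend.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE runs (run_id TEXT, branch TEXT, machine TEXT, status TEXT,"
    " duration_sec REAL, failed_step TEXT, ts TEXT)"
)
db.execute("CREATE INDEX idx_runs_branch ON runs (branch)")


def ingest(event_json):
    e = json.loads(event_json)
    db.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?)",
        (e["run_id"], e["branch"], e["machine"], e["status"],
         e["duration_sec"], e["failed_step"], e["timestamp"]),
    )


# Step 3, visualize: the kind of aggregate query a dashboard panel would run.
def failure_rate_by_branch():
    rows = db.execute(
        "SELECT branch, AVG(status = 'failure') FROM runs GROUP BY branch ORDER BY branch"
    ).fetchall()
    return dict(rows)


# Step 4, alert: a threshold rule over the same aggregate (threshold is arbitrary).
def branches_to_alert(threshold=0.75):
    return sorted(b for b, rate in failure_rate_by_branch().items() if rate >= threshold)


# Hypothetical sample runs.
for args in [
    ("1", "main", "agent-1", "success", 600),
    ("2", "main", "agent-2", "failure", 640, "integration-tests"),
    ("3", "feature-x", "agent-1", "failure", 1900, "integration-tests"),
    ("4", "feature-x", "agent-2", "failure", 700, "build"),
]:
    ingest(run_event(*args))

print(failure_rate_by_branch())
print(branches_to_alert())
```

The point of the design is that all four steps share one structured record: once every run is emitted with the same fields, dashboards and alert rules are just different queries over the same stored data.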
Investing in good CI/CD observability will pay off with a significant improvement in your Lead Time for Changes, shortening the time it takes for a commit to reach production.
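Lead Time for Changes is one of the DORA software delivery metrics, commonly defined as the time from a commit to that change running in production. Below is a minimal Python sketch of one common way to compute it, taking the median over (commit time, deploy time) pairs; the input shape and the sample timestamps are assumptions, and real data would come from your version control and deployment records.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time_for_changes(changes):
    """Median time from commit to production deploy.

    `changes` is a list of (commit_time, deploy_time) datetime pairs;
    this input shape is an assumption for illustration.
    """
    return median(deploy - commit for commit, deploy in changes)


# Hypothetical sample: three commits and when each reached production.
changes = [
    (datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 1, 15, 0)),    # 6 hours
    (datetime(2023, 5, 2, 10, 0), datetime(2023, 5, 3, 10, 0)),   # 24 hours
    (datetime(2023, 5, 3, 11, 0), datetime(2023, 5, 3, 19, 0)),   # 8 hours
]

print(lead_time_for_changes(changes))
```

Because every commit passes through the pipeline on its way to production, slow runs and reruns of flaky steps add directly to each of these intervals.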
Can we standardize CI/CD observability? CNCF's OpenTelemetry project could be a perfect fit, as it's a unified open platform for collecting observability data. This is the idea behind my OpenTelemetry Enhancement Proposal (OTEP); feel free to check out the PR on GitHub and chime in to help get it going.