Post by LitmusChaos maintainers
Cloud native adoption continues to increase, and it is no surprise that new challenges associated with scale are arising. The modern DevOps ecosystem, driven by cloud native technologies, is helping software changes ship faster than ever. But the speed of shipping cannot become an excuse for a decline in the reliability of the deployed service. It is becoming increasingly common to use well-knit chaos experiments in CD pipelines to ensure that reliability is tested before and after deployments.
Apart from the well-known use case of chaos testing by SREs, the LitmusChaos project has witnessed growth in its usage by QA teams. The project architecture revolves around simplicity, being highly declarative and API driven. This approach has been validated over the last few quarters for the new QA use case as well, and we are seeing some large enterprise users running thousands of chaos experiments in their QA environments every week.
Here is a list of important capabilities that were built out and project updates since the last KubeCon in Valencia!
- Introduction of HTTP Chaos experiment suite
- Support for network & stress experiments on new versions of Kubernetes (1.21+) & OpenShift (>4.x)
- Support for network chaos experiments (Latency, Packet Loss, HTTP, DNS) on service-mesh enabled environments
- Randomization (across range) support for fault inputs
- Redundancy (HA) for the Chaos Operator
- Chaos Workflow CRUD support using CLI (litmusctl)
- Improved support for Air Gapped environments
- Improved support for containerd & CRI-O runtimes (DNS, HTTP)
- More powerful experiment bootstrapping with Litmus SDK
Many great individuals have joined the Litmus project, and special thanks go to the teams at the following companies that came forward to declare their usage of Litmus.
For more details on the full list of adopters and their stories – see our Adopters list.
New contributors and maintainers
Heightened awareness around cloud-native chaos engineering and the need for a holistic approach towards the practice have resulted in organizations investing time and personnel in it. This has naturally benefited the Litmus project and, along with its increased adoption, brought about a welcome growth in contributor interest. Some important community contributions over the past couple of quarters include:
- Simplified setup of Chaos Agents on Kubernetes clusters via dedicated Helm charts
- New category of Application Specific Chaos Experiments (starting with Spring Boot)
- Newer faults centered on Kubernetes Nodes (targeting storage volumes, node network)
- Enabling more powerful hypothesis validation (by widening the scope of command probes)
- Refactoring of Chaos Server APIs to make them more user-friendly
- Improvements to the automated e2e test suite
- Creation of Google Codelabs-based Litmus usage tutorials
- Improvement of security posture (optimization of execution privileges, simplifying docs)
The contributions were made by developers from a wide range of organizations, including Orange, Red Hat, Microsoft, Oracle, Klanik, T-Mobile, VoerEir AB, FIS, Lowe's, Wipro, JFrog, and HCL.
Increased community traction calls for better governance and maintenance effort, and the Litmus project has now onboarded six new maintainers, overseeing these specific areas:
- Chaos Control Plane
- Chaos Execution Plane
- Chaos Experiments
- Test automation & e2e
- Deployment & Releases
Announcing Litmus 3.0 Beta
LitmusChaos maintainers have announced the plan for development of the next major release: Litmus 3.0. The project team is now encouraging reviews, contributions, and feedback on this Beta release. Litmus 3.0 will have a completely redesigned UI console and an improved UX that makes creating new experiments more intuitive. Just like the previous major version, Litmus 2.0, this version will remain in Beta for about 6 months to gather feedback from the community before becoming the preferred choice for usage. The release details are available here.
Other notable upcoming features that will be launched as part of the 3.0 beta programme include:
- Improved Scalability for Litmus experiments (fewer resources, one transient fault pod per node)
- Chaos Provisioned Clusters – Terraform Templates
- Integration with OpenTelemetry, OPA Gatekeeper and other CNCF projects
To get quick access to Litmus, sign up for the hosted Litmus cloud at litmuschaos.cloud or use the getting-started guide.