Project post by LitmusChaos maintainers
As promised, we are back with another edition of monthly updates from the LitmusChaos community. As the Chaos Engineering community and the LitmusChaos community grow, we appreciate the massive participation and engagement, and we strive to help the community prosper and contribute back to the project's development.
This article shares the July 2022 monthly update, covering the latest happenings around the LitmusChaos project.
LitmusChaos is a dynamic open source chaos engineering platform that enables teams to identify weaknesses and potential outages in infrastructures by inducing chaos engineering tests/experiments in a controlled manner. LitmusChaos is driven by the principles of Cloud-Native innovation and gave rise to the principles of Cloud-Native Chaos Engineering. Chaos engineering verifies the resilience of business services and helps DevOps pipelines proactively build code that is more resilient against software and infrastructure faults.
The LitmusChaos project was started in late 2017 to provide simple chaos jobs in Kubernetes. It became a CNCF sandbox project in 2020 and was promoted to a CNCF incubating project in January 2022. Today, it has maintainers from 5 different organizations across cloud-native vendors, solution providers, and end-users.
The project is used in production by more than 30 organizations, including large end-users like Adidas, FIS, iFood, Cyren, Intuit, Lenskart, Orange, and more, as well as technology organizations like Red Hat and VMware.
LitmusChaos Releases 2.11.0
LitmusChaos version 2.11.0 was released on the 16th of July with some great new updates to the core components, ChaosCenter, and litmusctl.
The community is excited about the addition of the remaining HTTP chaos experiments, which had been pending.
The addition of the following http chaos experiments is a big boost to the already vibrant ChaosHub:
- pod-http-reset-peer: It simulates a TCP reset (connection reset by peer error) in the pod, which stops outgoing http requests by closing the connection, and reverts to the original state after the specified duration.
- pod-http-modify-status-code: It can modify the http response code for the http request.
- pod-http-modify-header: It can modify/add/remove http response or request headers.
- pod-http-modify-body: It can modify the complete body of the http response or request.
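One way to watch faults like these from the outside is to poll the target service while the experiment runs and tally the HTTP status codes that come back. The following is only a minimal sketch, not part of LitmusChaos: `poll_status` and `tally_codes` are hypothetical helper names, the URL in the usage comment is a placeholder, and it assumes the service is reachable from wherever the script runs. It relies on curl reporting `000` when no HTTP response was received, which is what a reset connection would typically look like.

```shell
#!/bin/sh
# Hypothetical helpers for observing an HTTP chaos experiment from the outside.

# poll_status URL COUNT
# Requests URL COUNT times, one second apart, printing one status code per line.
# curl prints 000 when no HTTP response was received (e.g. the connection was reset).
poll_status() {
  url=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$url"
    i=$((i + 1))
    sleep 1
  done
}

# tally_codes
# Reads one status code per line on stdin and prints "<code>: <count>" per code.
tally_codes() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example usage (placeholder URL, replace with your own service):
#   poll_status http://my-service.example:8080/ 30 | tally_codes
```

Running it once before and once during the chaos window gives a rough before/after comparison of the response codes.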
Check out the release notes for deeper details on the release:
Release Notes (2.11.0)
Core Component Updates –
- Introduces new HTTP chaos experiments for Kubernetes targets, allowing users to inject the following faults:
  - pod-http-response-code (modifies the status code of the response)
  - pod-http-modify-body (modifies the body of the response)
  - pod-http-header-modify (modifies headers of incoming requests or of the response from the targeted service)
  - pod-http-peer-reset (stops outgoing http requests by resetting the TCP connection)
  - To learn more, check out the experiment-docs.
- Upgrades the operator SDK version to 1.14.0 for the chaos operator and refactors other Litmus components for compatibility.
- Adds support for non-default VPCs in the AWS ELB az-down experiment. Previously the experiment only targeted ELBs associated with the default VPC; users can now also target ELBs that are not associated with it.
- Enhances the cmd probe spec to support different configurations for probe pods and containers, such as imagePullPolicy, cmd, args and so on. This allows the user to run the cmd probe in a more controlled way when it is used in source mode.
- Fixes the error handling for application status check with litmus annotation for pod-level experiments.
- Adds litmusctl docker image which will allow the users to install the agents (delegates) from a pod/container.
ChaosCenter Updates –
- Updates terminology for different entities:
  - Agents –> Chaos Delegates
  - Workflows –> Chaos Scenarios
  - Charts –> Chaos Experiments
- Reduces the permissions in namespaced mode for execution plane components
- Fixes an issue in the SyncHub API where a communication error between graphql-server and MongoDB caused all hubs to be deleted while trying to re-clone the current ChaosHub.
NOTE: Along with the terminology updates above, we will also be updating the directory structure of ChaosHub for better readability and scalability of experiments and scenarios. With the upcoming release, the charts directory will be renamed to experiments and the workflows directory will be renamed to scenarios. The same changes have already been made in the 2.11.0 version of ChaosHub, so users are requested to update their ChaosHub directory structure accordingly before upgrading to the upcoming release, version 2.12.0 of ChaosCenter.
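For anyone maintaining their own ChaosHub, the rename described in the note could be sketched roughly as follows. This is an illustration under stated assumptions, not an official migration script: `HUB_DIR` and `rename_dir` are hypothetical names, and it assumes the change is limited to the two top-level directories named in the note. Compare the result against the 2.11.0 version of ChaosHub before relying on it.

```shell
#!/bin/sh
# Sketch: rename the top-level ChaosHub directories to the new terminology.
# HUB_DIR is a hypothetical variable pointing at a local clone of your own hub;
# it defaults to the current directory.
HUB_DIR="${HUB_DIR:-.}"

# rename_dir OLD NEW
# Renames $HUB_DIR/OLD to $HUB_DIR/NEW. Skips the rename when OLD is missing
# or NEW already exists, so the script is safe to run more than once.
rename_dir() {
  if [ -d "$HUB_DIR/$1" ] && [ ! -e "$HUB_DIR/$2" ]; then
    mv "$HUB_DIR/$1" "$HUB_DIR/$2"
    echo "renamed $1 -> $2"
  else
    echo "skipped $1"
  fi
}

rename_dir charts experiments
rename_dir workflows scenarios
```

If the hub is a Git repository, using `git mv` in place of `mv` records the rename directly in the index.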
Litmus-2.11.0 (Stable) cluster scope manifest
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/2.11.0/mkdocs/docs/2.11.0/litmus-2.11.0.yaml
Litmus-2.11.0 (Stable) namespace scope manifest
# Create a namespace, e.g. litmus
kubectl create ns litmus
# Install CRDs, if SELF_AGENT env is set to TRUE
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/master/mkdocs/docs/2.11.0/litmus-portal-crds-2.11.0.yml
# Install ChaosCenter
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/master/mkdocs/docs/2.11.0/litmus-namespaced-2.11.0.yaml -n litmus
Upgrading from 2.10.0 to 2.11.0
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/2.11.0/mkdocs/docs/2.11.0/upgrade-agent.yaml
Latest from the LitmusChaos Community
LitmusChaos user and community member Bruno Barin (Software Developer, iFood) shares how iFood, a leading food delivery company in Brazil, leveraged Chaos Engineering with LitmusChaos. Check out their end-user story: https://www.cncf.io/blog/2022/07/08/how-ifood-leveraged-chaos-engineering-with-litmuschaos/
As the community continues to grow, so does the content. Over the month the community members have created some amazing and exciting content to uplift the presence of LitmusChaos on the Cloud Native map. Check out all the latest content curated by the community for the community:
Here are our monthly updates from June 2022. Check out the LitmusChaos June update blog: https://www.cncf.io/blog/2022/07/05/litmuschaos-june-2022-update/
LitmusChaos was at KubeCon EU 2022. Check out the highlights from LitmusChaos at KubeCon EU ’22: https://www.cncf.io/blog/2022/07/01/litmuschaos-at-kubecon-eu-2022/
Chaos Engineering Meetup:
To spread knowledge of Chaos Engineering, we started organizing monthly Chaos Engineering Meetups to talk about all things Chaos Engineering and move beyond the scope of LitmusChaos, giving the community an opportunity to learn principles, concepts and larger ideas around chaos.
In the last couple of editions of the Chaos Engineering Meetup, we had the SRE team from Accenture talk about how they built capabilities around their organization to bring in chaos as a practice and how they continue to run gamedays to bolster their chaos engineering story.
Let’s get an overview of both the meetups:
- “Achieve digital product resiliency with Chaos Engineering –> How to build from scratch a CE Capability at scale”
Accenture suggests that the journey for SRE teams to build a Chaos Engineering capability across multiple business value streams is challenging. Chaos Engineering is built around the human factor and cyber resiliency practices, aiming to uncover the “unknown unknowns” and build confidence in the system. DevOps, SRE, and Chaos Engineering teams will progressively understand the complex infrastructure and distributed systems concerns. Engineering teams must have the opportunity to build trust with each other. They show how working entirely remotely enabled teams to think and work with a broader horizon, and how LitmusChaos sped up this process for them.
This session was curated to help us understand the relevance of building the proper foundation for SRE and Chaos Engineering while working fully remotely.
Achieve Digital Product Resiliency with Chaos Engineering | Chaos Engineering Meetup June- 2022
- “Chaos Engineering hands-on -> the practical experience of an SRE ideating Chaos Experiment and use LitmusChaos”
As suggested before, Chaos Engineering is the discipline of experimenting with a complex system to build confidence in the system’s capability to withstand turbulent conditions in production. DevOps, SRE, and Chaos Engineering teams usually collaborate to ideate, create and run Chaos Experiments to achieve production resiliency. In this meetup, Accenture shares its journey of creating Chaos Experiments, running them, and sharing the success story across the organization. An essential milestone for teams measuring the successful activation of the CE capability is to plan, run and review a Game Day.
Chaos Engineering hands-on – An SRE ideating Chaos Experiments and using LitmusChaos | July 2022
The LitmusChaos Community meetings continue as a monthly cadence call to discuss the latest updates, happenings, and questions from the community. They are hosted every 3rd Wednesday of the month. Check out the latest from our last community meeting, held on July 20th. Community members Crystal Lam & Jonathas Barosa shared their adoption stories, followed by community contributor Akash Srivastava (Software Engineer, Harness), who shared the latest updates on the release and demoed the http chaos experiments.
LitmusChaos-Community-Sync-up- July 2022| Open Source Chaos Engineering |
Logging Using EFK for LitmusChaos: https://dev.to/litmus-chaos/logging-using-efk-for-chaoscenter-5gp6
Cloud Native Live: Litmus Chaos Engine and a microservices demo app: https://youtu.be/hOghvd9qCzI
In the end…
The LitmusChaos community continues to grow with amazing contributions (issues, suggestions, PRs) from the community and looks forward to more members joining in and contributing to the growth of the project.
Join the #litmus channel on the Kubernetes Slack to become a part of the community. Learn, Ask, and Contribute by being a part of the community.
Check out the Contributing Guide to get started with contributions.
Subscribe to the LitmusChaos YouTube Channel for the latest videos.
Follow @LitmusChaos on Twitter for the latest social updates.
Check out the LitmusChaos blogs to learn more about LitmusChaos. You can write one too, using the tag #litmuschaos on DEV.to.