Welcome to the Chaos Workshop! Follow the steps below to complete the workshop. Earn your certificate and win prizes by sharing your workshop completion details on the #chaos-carnival channel in the Harness Community Slack Workspace!
To catch the workshop steps live, join the session during Day-1 of Chaos Carnival (March 15, 13:45 CDT) or refer to this Recording.
-
Sign up on the Harness SaaS platform via email.
-
Open the verification email you receive and click through to verify your account.
-
Choose the Chaos Engineering Module. This will enable a 14-day enterprise trial license.
Note: Record the Account ID (underlined in red in the above screenshot). You will need it when submitting the request for the sandbox environment.
-
You will see a modal asking you to "Enable Chaos Infrastructure To Run Your First Chaos Experiment". At this point, pause on the Harness UI & proceed with the next step to obtain the sandbox environment.
-
Fill out and submit this form to request a time-limited (6-hour) sandbox environment in which to carry out the chaos workshop steps.
-
Within about 2 minutes, check your email and verify receipt of your sandbox config information, which consists of:
- A KubeConfig file, which you can download and use as the context for navigating the environment provided
  - Note: You will need kubectl set up in your local workspace to view the resources in your sandbox environment (which is a Kubernetes Namespace bearing the name firstname-lastname-ns).
- URLs to a sample microservices application (which will be subjected to chaos during the workshop), the Grafana dashboard where it is monitored, and the corresponding Prometheus endpoint.
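As a minimal sketch of how the KubeConfig file and namespace fit together, the snippet below builds a kubectl invocation scoped to the sandbox. The attendee name, the kubeconfig path, and both helper functions are hypothetical; the lower-casing is an assumption based on Kubernetes requiring lowercase namespace names, so always prefer the exact namespace stated in your email.

```python
import shlex


def sandbox_namespace(first_name: str, last_name: str) -> str:
    """Build a namespace name following the firstname-lastname-ns pattern."""
    # Assumption: the sandbox namespace is the lower-cased name plus "-ns".
    return f"{first_name.strip().lower()}-{last_name.strip().lower()}-ns"


def kubectl_command(kubeconfig_path: str, namespace: str, *args: str) -> list[str]:
    """Return a kubectl argument list bound to one kubeconfig and namespace."""
    return ["kubectl", "--kubeconfig", kubeconfig_path,
            "--namespace", namespace, *args]


if __name__ == "__main__":
    ns = sandbox_namespace("Jane", "Doe")  # hypothetical attendee
    cmd = kubectl_command("./sandbox-kubeconfig.yaml", ns, "get", "pods")
    # Print the command so it can be copied into a terminal.
    print(shlex.join(cmd))
```

Passing the kubeconfig explicitly on each call keeps the sandbox credentials separate from any cluster contexts already configured on your machine.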
Note: If you don't receive the email containing access info to the sandbox environment within 5 minutes, please send an email to [email protected], [email protected] OR reach out on the #chaos-carnival channel on the Harness Community Slack Workspace
- Now, proceed with setting up the chaos infrastructure on the "default project". You can create a dedicated/new project if you wish.
- You will need to create a new "Environment", configure your chaos infrastructure, download the installation manifest for the chaos infra and apply it in the provided sandbox environment. The detailed set of steps to achieve this can be found here: https://developer.harness.io/docs/chaos-engineering/user-guides/connect-chaos-infrastructures
Notes:
- Please select the "Namespace Mode" option for chaos infrastructure and provide the appropriate namespace name (use the namespace provided as part of your sandbox environment instead of the default hce).
- Ignore the instructions to create the namespace and to apply the CRDs (these steps are already performed for you as part of the sandbox env creation).
- Selecting the "Cluster Wide" option can result in failure; it is strictly unsupported for this workshop.
-
The project contains the default "Enterprise ChaosHub", which consists of all the supported faults. However, to simplify things, we have a dedicated custom chaos artifact source for this workshop.
-
Add a new chaoshub by following the steps outlined here: https://developer.harness.io/docs/chaos-engineering/user-guides/add-chaos-hub. Use the GitHub repo URL https://github.com/chaoscarnival/hub-workshop-2023
-
Browse the newly added chaoshub. You will see 4 chaos experiments ready to be launched.
-
The workshop details chaos experiments against (an instrumented version of) the demo microservices application Online Boutique. The application is simulated to be constantly under "usage" via a load generator component.
-
The experiments involve injection of different types of chaos faults on a given microservice (e.g., carts) OR multiple microservices, accompanied by validation of specific constraints (hypotheses) around application behaviour and user impact.
Steps to launch chaos experiments from the ChaosHub & view their progress are outlined here: https://developer.harness.io/docs/chaos-engineering/user-guides/construct-and-run-custom-chaos-experiments#launch-an-experiment-from-chaos-hub
-
The chaos experiment's progress, its logs and, eventually, its results can be viewed on the respective overview page, while real-time impact can be observed on the Grafana dashboard.
-
In this experiment, we randomly bounce/delete pods belonging to the carts microservice. The intent of state chaos such as this is to verify the impact of pod kills that occur as a result of evictions, upgrades, etc.
-
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)
- Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.
- While the resource health is maintained before & after the experiment, the availability and performance constraints are not met, leading to probe failures and hence, a low resilience score.
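To make the link between probe failures and a low score concrete, here is a minimal sketch. It assumes each hypothesis above maps to exactly one probe and that a fault's score is simply the percentage of probes that passed; the probe names and the function are hypothetical, and the platform's actual scoring may differ.

```python
def probe_success_percentage(probe_results: dict[str, bool]) -> float:
    """Return the percentage of probes that passed (True) for one fault."""
    if not probe_results:
        raise ValueError("at least one probe result is required")
    passed = sum(1 for ok in probe_results.values() if ok)
    return 100.0 * passed / len(probe_results)


if __name__ == "__main__":
    # Outcome described for the pod-delete run: resource health holds,
    # while the availability and latency constraints are not met.
    pod_delete = {
        "resource-health": True,   # hypothetical probe name
        "availability": False,     # hypothetical probe name
        "latency": False,          # hypothetical probe name
    }
    print(f"{probe_success_percentage(pod_delete):.1f}%")
```

Under these assumptions only one of three probes passes, which is why the run is reported as having low resilience.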
-
In this experiment, we inject network latency (with jitter, to randomize extent of latency) to the carts microservice to simulate a degraded cluster network. This is also one of the most popular ways to simulate latency between services across AZs/regions. The intent is to evaluate if the network delay is handled within the system OR is propagated upwards to cause degraded user experience on the website's transactions.
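The effect of jitter can be illustrated with a small simulation. This is only a sketch of the idea, assuming each request's added delay is drawn uniformly from the base latency plus or minus the jitter; the 2000 ms and 500 ms values and the function name are hypothetical, and the fault's real delay distribution may differ.

```python
import random


def sample_delays(base_ms: float, jitter_ms: float, count: int,
                  seed: int = 0) -> list[float]:
    """Draw `count` delays uniformly from [base - jitter, base + jitter]."""
    rng = random.Random(seed)  # seeded so repeated runs give the same samples
    low = max(0.0, base_ms - jitter_ms)  # a delay cannot be negative
    high = base_ms + jitter_ms
    return [rng.uniform(low, high) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical settings: 2000 ms base latency with 500 ms of jitter.
    for delay in sample_delays(2000, 500, count=5):
        print(f"{delay:.0f} ms")
```

Because every request sees a different delay, jitter exercises timeouts and retries more realistically than a fixed latency would.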
-
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)
- Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.
- While the resource health is maintained before & after the experiment and the website continues to be available, the performance constraints are not met, leading to probe failure and hence, a low resilience score.
-
In this experiment, we hog the CPU resources in the pod belonging to the carts microservice, simulating a high-traffic situation in which the service is deprived of CPU cycles, leading to slower responses. The intent is to evaluate whether the slowness is handled within the system OR is propagated upwards to cause degraded user experience on the website's transactions.
-
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)
- Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.
- While the resource health is maintained before & after the experiment and the website continues to be available, the performance constraints are not met, leading to probe failure and hence, a low resilience score.
- In this experiment, we illustrate the ability to string faults together in a desired fashion to generate complex scenarios that reproduce past outage conditions OR are used as stressors/mechanisms to evaluate multi-component failure.
- Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice, app=paymentservice, app=adservice, respectively) in the Target Application section of each individual fault. Proceed to run the chaos experiment.
- This experiment is oriented towards illustrating multi-fault capabilities. The probe successes/failures are aligned with the ones explained in the previous runs.
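When several faults run in one experiment, their individual results need to be combined into one score. The sketch below assumes a weighted average of per-fault probe success percentages, which is how resilience scoring is commonly described for LitmusChaos-style experiments; the weights, percentages, and function name are hypothetical, and the workshop's actual scoring may differ.

```python
def resilience_score(faults: list[tuple[int, float]]) -> float:
    """Weighted average of per-fault probe success percentages.

    Each entry is (weight, probe_success_percentage) for one fault.
    """
    total_weight = sum(weight for weight, _ in faults)
    if total_weight == 0:
        raise ValueError("total fault weight must be greater than zero")
    return sum(weight * pct for weight, pct in faults) / total_weight


if __name__ == "__main__":
    # Hypothetical results for three chained faults with equal weights.
    chained = [
        (10, 100.0),  # every probe passed for this fault
        (10, 50.0),   # half of the probes passed
        (10, 0.0),    # no probe passed
    ]
    print(resilience_score(chained))
```

With equal weights the score is the plain average; raising a fault's weight makes its probe outcomes count for more in the overall result.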
- Share screenshots of the "Chaos Experiment Overview Page" for all 4 chaos experiments in the #chaos-carnival channel of the Harness Community Slack Workspace.
Note: Please ensure that the screenshots include the browser address bar with the account ID visible in the URL!