Home

Krkn aka Kraken

Chaos and resiliency testing tool for Kubernetes. Kraken injects deliberate failures into Kubernetes clusters to check whether they are resilient to turbulent conditions.

Workflow

Kraken workflow

Demo

Kraken demo

Chaos Testing Guide

Guide encapsulates:

The guide is hosted at https://krkn-chaos.github.io/krkn.

How to Get Started

Instructions on how to set up, configure, and run Kraken can be found at Installation.

Before starting chaos runs, consider using the chaos recommendation tool to profile the application service(s) under test. This tool discovers a list of Krkn scenarios with a high probability of causing failures or disruptions to your application service(s). The tool can be accessed at Chaos-Recommender.

See the getting started doc for help with creating your own custom scenario or editing existing scenarios for your specific use case.

After installation, refer to the sections below for the supported scenarios and how to tweak the Kraken config to load them on your cluster.

Running Kraken with minimal configuration tweaks

For cases where you want to run Kraken with minimal configuration changes, refer to krkn-hub. One use case is CI integration where you do not want to carry around different configuration files for the scenarios.

Config

Instructions on how to set up the config and the options supported can be found at Config.

Kubernetes chaos scenarios supported

| Scenario type | Kubernetes |
| --- | --- |
| Pod Scenarios | :heavy_check_mark: |
| Pod Network Scenarios | :x: |
| Container Scenarios | :heavy_check_mark: |
| Node Scenarios | :heavy_check_mark: |
| Time Scenarios | :heavy_check_mark: |
| Hog Scenarios: CPU, Memory | :heavy_check_mark: |
| Cluster Shut Down Scenarios | :heavy_check_mark: |
| Service Disruption Scenarios | :heavy_check_mark: |
| Zone Outage Scenarios | :heavy_check_mark: |
| Application_outages | :heavy_check_mark: |
| PVC scenario | :heavy_check_mark: |
| Network_Chaos | :heavy_check_mark: |
| ManagedCluster Scenarios | :heavy_check_mark: |
| Service Hijacking Scenarios | :heavy_check_mark: |
| SYN Flood Scenarios | :heavy_check_mark: |

Kraken scenario pass/fail criteria and report

It is important to check whether the targeted component recovered from the chaos injection and whether the Kubernetes cluster is healthy, since failures in one component can have an adverse impact on other components. Kraken does this by:

Signaling

In CI runs or any external job, it is useful to stop Kraken once a certain test or state is reached. Kraken can be told to pause the chaos or stop it completely using a signal posted to a port of your choice.

For example, if a test run is loading the cluster while Kraken runs separately, we want to be able to start or stop the Kraken run when the test run completes or reaches a certain loaded state.

More detailed information on enabling and leveraging this feature can be found here.
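As a rough illustration of how an external job might post such a signal, here is a minimal Python sketch. The one-path-per-state convention (`/RUN`, `/PAUSE`, `/STOP`), the helper names `signal_path` and `send_signal`, and the host and port used in the usage note are assumptions made for this example; consult the signaling documentation for the actual endpoint and accepted states.

```python
import http.client

# Signal states assumed for illustration: resume, pause, or stop the chaos.
VALID_STATES = {"RUN", "PAUSE", "STOP"}


def signal_path(state: str) -> str:
    """Return the request path for a signal state, e.g. "/PAUSE" (assumed convention)."""
    normalized = state.strip().upper()
    if normalized not in VALID_STATES:
        raise ValueError(f"unknown signal state: {state!r}")
    return "/" + normalized


def send_signal(host: str, port: int, state: str, timeout: float = 5.0) -> int:
    """POST the signal to the listening port and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", signal_path(state))
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

A load-test job could then call something like `send_signal("localhost", 8081, "PAUSE")` when it reaches the desired loaded state and `send_signal("localhost", 8081, "STOP")` when it completes, with the host and port replaced by whatever the Kraken run was configured to listen on.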

Performance monitoring

Monitoring the Kubernetes/OpenShift cluster to observe the impact of Kraken chaos scenarios on various components is key to finding bottlenecks. It is important to make sure the cluster is healthy in terms of both recovery and performance during and after the failure injection. Instructions on enabling it can be found here.

SLOs validation during and post chaos

Information on enabling and leveraging this feature can be found here.

OCM / ACM integration

Kraken supports injecting faults into Open Cluster Management (OCM) and Red Hat Advanced Cluster Management for Kubernetes (ACM) managed clusters through ManagedCluster Scenarios.

Blogs and other useful resources

Roadmap

Enhancements being planned can be found in the roadmap.

Contributions

We are always looking for enhancements and fixes to make Kraken better, and any contributions are most welcome. Feel free to report or work on the issues filed on GitHub.

More information on how to Contribute

If you add a new scenario or tweak the main config, be sure to update the CI as well so that it stays up to date. Please read this file for more information on updates.

Community

Key Members (slack_usernames/full name): paigerube14/Paige Rubendall, mffiedler/Mike Fiedler, tsebasti/Tullio Sebastiani, yogi/Yogananth Subramanian, sahil/Sahil Shah, pradeep/Pradeep Surisetty, and ravielluri/Naga Ravi Chaitanya Elluri.

The Linux Foundation® (TLF) has registered trademarks and uses trademarks. For a list of TLF trademarks, see Trademark Usage.