Home

Awesome

Chaos injection using Simmy in a Microservice architecture

A microservice based application to demonstrate how chaos engineering works with Simmy using chaos policies in a distributed system and how we can inject even a custom behavior given our needs or infrastructure, this time injecting custom behavior to generate chaos in our Service Fabric Cluster.

Prerequisites and Installation Requirements

  1. Install Docker for Windows.
  2. Install .NET Core SDK
  3. Install Visual Studio 2017 15.7 or later.
  4. Share drives in Docker settings, in order to deploy and debug with Visual Studio 2017 (See the below image)
  5. Clone this Repo
  6. Set docker-compose project as startup project.
  7. Press F5 and that's it!

Note: All images into the docker-compose.override.yml are configured to run on Production environment in order to inject the chaos policies (I'll explain it later) except the duber.chaos.api in order to you'll be able to use In-Memory cache locally, otherwise it will use Redis cache.

Note 2: If you want to test out the chaos in your cluster, you need to create a service principal, then set up the values forGeneralChaosSetting section into the appsettings file of Duber.Chaos.API project, or their respective environment variables inside docker-compose.override.

Note: The first time you hit F5 it'll take a few minutes, because in addition to compile the solution, it needs to pull/download the base images (SQL for Linux Docker, ASPNET, MongoDb and RabbitMQ images) and register them in the local image repo of your PC. The next time you hit F5 it'll be much faster.

Tuning Docker for better performance

It is important to set Docker up properly with enough memory RAM and CPU assigned to it in order to improve the performance, or you will get errors when starting the containers with VS 2017 or "docker-compose up". Once Docker for Windows is installed in your machine, enter into its Settings and the Advanced menu option so you are able to adjust it to the minimum amount of memory and CPU (Memory: Around 4096MB and CPU:3) as shown in the image.

The Example

This repo provides an example/approach of how to use Simmy in a kind of real but simple scenario over a distributed architecture to inject chaos in your system in a configurable and automatic way.

The example demonstrates the following patterns with Simmy:

The Architecture

The Chaos API

The example provides an API to save and get the chaos configuration in Redis Cache.

The Chaos UI - Configuring the chaos policies (monkeys)

The example provides a UI to set up the general chaos settings and also settings at operation level.

General chaos settings

Operations chaos settings

The WatchMonkey

Is an Azure Function with a timer trigger which is executed every 5 minutes (value set arbitrarily for this example) in order to watch the monkeys (chaos settings/policies) set up in the previous UI. So, if the automatic chaos injection is enabled it releases all the monkeys for the given frequency within the time window configured (Max Duration), after that time window all the monkeys are caged (disabled) again. It also watches monkeys with a specific duration, allowing you to disable specific faults in a smaller time window.

How the chaos is injected:

Calls guarded by Polly policies often wrap a series of policies around a call using PolicyWrap. The policies in the PolicyWrap act as nesting middleware around the outbound call. The recommended technique for introducing Simmy is to use one or more Simmy chaos policies as the innermost policies in a PolicyWrap. By placing the chaos policies innermost, they subvert the usual outbound call at the last minute, substituting their fault or adding extra latency. The existing Polly policies - further out in the PolicyWrap - still apply, so you can test how the Polly resilience you have configured handles the chaos/faults injected by Simmy.

Adding Simmy chaos without changing existing configuration code

As mentioned above, the usual technique to add chaos-injection is to configure Simmy policies innermost in your app's PolicyWraps. One of the simplest ways to do this all across your app is to make all policies used in your app be stored in and drawn from PolicyRegistry. This is the technique demonstrated in this sample app. In StartUp, all the Polly policies which will be used are configured, and registered in PolicyRegistry:

Setting up some Http policies

var policyRegistry = services.AddPolicyRegistry();
policyRegistry["ResiliencePolicy"] = GetResiliencePolicy();

services.AddHttpClient<ResilientHttpClient>()
    .AddPolicyHandlerFromRegistry("ResiliencePolicy");

Setting up some Azure SQL policies

services.AddSingleton<IPolicyAsyncExecutor>(sp =>
{
    var sqlPolicyBuilder = new SqlPolicyBuilder();
    return sqlPolicyBuilder
        .UseAsyncExecutor()
        .WithDefaultPolicies()
        .Build();
});

Injecting chaos policies (monkeys) to our resilient strategies

The AddChaosInjectors/AddHttpChaosInjectors extension methods on IPolicyRegistry<> simply takes every policy in your PolicyRegistry and wraps Simmy policies (as the innermost policy) inside.

if (env.IsDevelopment() == false)
{
    // injects chaos to our Http policies defined previously.
    var httpPolicyRegistry = app.ApplicationServices.GetRequiredService<IPolicyRegistry<string>>();
    httpPolicyRegistry?.AddHttpChaosInjectors();
    
    // injects chaos to our Sql policies defined previously.
    var sqlPolicyExecutor = app.ApplicationServices.GetRequiredService<IPolicyAsyncExecutor>();
    sqlPolicyExecutor?.PolicyRegistry?.AddChaosInjectors();
}

This allows you to inject Simmy into your app without changing any of your existing app configuration of Polly policies. This extension method configures the policies in your PolicyRegistry with Simmy policies which react to chaos configured trhough the UI.

How does it get the chaos settings

We're injecting a factory which takes care of getting the current chaos settings from the Chaos API. So we're injecting the factory as a Lazy Task Scoped service because we want to avoid to add additional overhead/latency to our system, that way we only retrieve the configuration once per request no matter how many times the factory is executed.

Injecting chaos settings factory

public static IServiceCollection AddChaosApiHttpClient(this IServiceCollection services, IConfiguration configuration)
{
    services.AddHttpClient<ChaosApiHttpClient>(client =>
    {
        client.Timeout = TimeSpan.FromSeconds(5);
        client.BaseAddress = new Uri(configuration.GetValue<string>("ChaosApiSettings:BaseUrl"));
    });

    services.AddScoped<Lazy<Task<GeneralChaosSetting>>>(sp =>
    {
        // we use LazyThreadSafetyMode.None in order to avoid locking.
        var chaosApiHttpClient = sp.GetRequiredService<ChaosApiHttpClient>();
        return new Lazy<Task<GeneralChaosSetting>>(() => chaosApiHttpClient.GetGeneralChaosSettings(), LazyThreadSafetyMode.None);
    });

    return services;
}

We're using LazyThreadSafetyMode.None to avoid locking.

Using chaos settings factory from consumers

// constructor
public TripController(Lazy<Task<GeneralChaosSetting>> generalChaosSettingFactory,...)
{
    ...
}

public async Task<IActionResult> SimulateTrip(TripRequestModel model)
{
    ...
    generalChaosSetting = await _generalChaosSettingFactory.Value;
    var context = new Context(OperationKeys.TripApiCreate.ToString()).WithChaosSettings(generalChaosSetting);
    var response = await _httpClient.SendAsync(request, context);
    ...
}

private async Task UpdateTripLocation(Guid tripId, LocationModel location)
{
    ...
    generalChaosSetting = await _generalChaosSettingFactory.Value;
    var context = new Context(OperationKeys.TripApiUpdateCurrentLocation.ToString()).WithChaosSettings(generalChaosSetting);
    var response = await _httpClient.SendAsync(request, context);
    ...
}

References:

Support

If you find this project helpful you can support me!

Note:

Visit my blog where I explain this example deeper and all that we need to know about Simmy! :smiley: