ServiceFabric.Watchdog

ServiceFabric.Watchdog is a small framework that helps you emit custom metrics from your Azure Service Fabric service and monitor and act on those metrics in a watchdog. For example, you can expose metrics such as AverageResponseTime and RequestsPerSecond, and then set up rules that scale the number of instances of that service up or down based on those metrics. Interesting? Here are the details!

Emitting metrics

The first thing you have to do is emit custom metrics. In the ServiceFabric.Watchdog solution you will find two projects:

In addition to emitting the metrics, we need to specify load metrics when we create the service. You can do this in several ways depending on how you create the service; I do it through the Service Fabric Explorer UI. On the MetricsBuilderSample service type, press Create and specify the metrics like this:

[Screenshot: Create Service with load metrics]

Creating the watchdog

OK, we now emit the metrics we want. Next we want to monitor them and act on them. For that we use the RuleWatchdog class and add rules to it. The properties in the Rule class are:

Putting it together, here is the code in WatchdogSampleService that sets up the rule and starts the RuleWatchdog instance:

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // create the watchdog and create our sample rule
    var ruleWatchdog = new RuleWatchdog();
    ruleWatchdog.Rules.Add(new Rule()
    {
        // only applicable for applications with Watchdog in their name
        RuleFilter = new StringExpression("Application == \'*Watchdog*\'"),
        // the action triggers when Iterations is above 45
        TriggerExpression = new IntExpression("Iterations > 45"),
        // aggregate the data across the instances of the service
        AggregateData = true,
        // the expression needs to hold for 1 minute before the action kicks in
        TriggerPeriod = new TimeSpan(0, 1, 0),
        // after an action has run, 2 minutes need to pass before the expression is considered again
        ActionGracePeriod = new TimeSpan(0, 2, 0),
        // scale the service up, 1 instance at a time, until it runs on all nodes in the cluster
        TriggerAction = new ScaleStatelessServiceUpRuleAction()
        {
            MaxNumInstances = -1,
            MinNumInstances = 1,
            ScaleDeltaNumInstances = 1
        }
    });
    // start the watchdog and check the rules every 10 seconds
    ruleWatchdog.Start(new TimeSpan(0, 0, 10));

    try
    {
        while (true)
        {
            // just to keep the loop running
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
    finally
    {
        // stop the watchdog on cancellation or failure and let the exception propagate to the runtime
        ruleWatchdog.Stop();
    }
}
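
To make the interplay between TriggerPeriod and ActionGracePeriod more concrete, here is a minimal sketch of how such timing could be evaluated on each check. It is not taken from the ServiceFabric.Watchdog source: the RuleTimer class and its ShouldRunAction method are hypothetical names used only for illustration, and the real RuleWatchdog may handle edge cases differently.

```csharp
using System;

// Hypothetical helper, NOT part of ServiceFabric.Watchdog. It only illustrates
// how a trigger period and an action grace period could work together.
public sealed class RuleTimer
{
    private readonly TimeSpan _triggerPeriod;
    private readonly TimeSpan _actionGracePeriod;
    private DateTime? _triggeredSince; // when the expression first became true in the current streak
    private DateTime? _lastActionAt;   // when the action last ran

    public RuleTimer(TimeSpan triggerPeriod, TimeSpan actionGracePeriod)
    {
        _triggerPeriod = triggerPeriod;
        _actionGracePeriod = actionGracePeriod;
    }

    // Call once per check interval. Returns true when the action should run now.
    public bool ShouldRunAction(bool expressionIsTrue, DateTime now)
    {
        // During the grace period the expression is not considered at all.
        if (_lastActionAt.HasValue && now - _lastActionAt.Value < _actionGracePeriod)
        {
            _triggeredSince = null;
            return false;
        }

        // A false result breaks the streak, so the trigger period starts over.
        if (!expressionIsTrue)
        {
            _triggeredSince = null;
            return false;
        }

        // Remember when the current streak of true results started.
        if (!_triggeredSince.HasValue)
        {
            _triggeredSince = now;
        }

        // The expression has not been true for long enough yet.
        if (now - _triggeredSince.Value < _triggerPeriod)
        {
            return false;
        }

        // The expression held for the whole trigger period: run the action
        // and start the grace period.
        _lastActionAt = now;
        _triggeredSince = null;
        return true;
    }
}
```

With the values from the sample rule (a trigger period of 1 minute and a grace period of 2 minutes), this sketch first returns true on the check that happens 60 seconds after the expression first evaluated to true. It then ignores the expression for the next 2 minutes, after which a new 1-minute streak is required before the action runs again.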

That's it! Test out the sample, implement it in your own solution, and improve it by submitting a pull request!