
Grafana dashboard for Logstash monitoring using Prometheus

This Grafana dashboard lets you monitor various aspects of your Logstash instances using Prometheus. The full list of monitored components is given below:

System
- Average CPU Load
- Logstash process total virtual memory usage
- Logstash process file descriptors
JVM
- Average time spent in GC (young & old generations)
- Thread count
- Heap used percentage
- Heap used in MB
- GC old generation events count
- GC young generation events count
Pipeline
- Events processing times
- Processed input/output events per second
- Output events count
- Input plugins events average waiting times
- Beats input plugins connections
- Input events per second over the last hour
- Output events per second over the last hour
Filters average duration

Setup instructions

The dashboard has been tested on Grafana v6.4. Older versions (>2.5) will probably work out of the box, but may require minor modifications; e.g. the $__range variable (available since v5.6) is used in some PromQL queries.

The setup is based on this Prometheus exporter for Logstash written by alxrem. You can either run it locally on your Logstash instance or deploy it in a Docker container using the image from Docker Hub.

In this use case it is assumed that:

- A Docker container with the Prometheus exporter is deployed for each Logstash instance we want to monitor.
- The Logstash API has been configured to be accessible from the Docker host.
- One Prometheus job named logstash is defined, with multiple targets that are the containers running the exporters.
- A Prometheus-type data source named Prometheus is configured in Grafana.

Because the exporters run in containers, as mentioned above, the Prometheus instance label is overwritten to reflect the actual Logstash FQDN instead of the hostname:port of the target, which in this case is the Docker host. Additionally, a custom label named instance_pqdn has been added to expose only the PQDN (the short hostname part of the FQDN) where needed in Grafana visualizations. Below is an example of the Prometheus configuration for the job and its targets:

  - job_name: 'logstash'
    scrape_interval:     10s
    static_configs:
      - targets: ['dockerhost.example.com:9304']
        labels:
          instance: 'logstash01.example.com'
          instance_pqdn: 'logstash01'
      - targets: ['dockerhost.example.com:9305']
        labels:
          instance: 'logstash02.example.com'
          instance_pqdn: 'logstash02'
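
With many Logstash instances, writing these target entries by hand gets repetitive. The following is a minimal Python sketch that generates them, assuming the PQDN is simply the first dot-separated part of the FQDN. The function names build_static_configs and render_job are hypothetical helpers for illustration and are not part of Prometheus or the exporter.

```python
def build_static_configs(exporters):
    """Turn (exporter_address, logstash_fqdn) pairs into static_configs entries.

    Assumption: instance_pqdn is the first label of the FQDN,
    e.g. 'logstash01' for 'logstash01.example.com'.
    """
    configs = []
    for target, fqdn in exporters:
        configs.append({
            "targets": [target],
            "labels": {
                "instance": fqdn,                        # overrides the default host:port value
                "instance_pqdn": fqdn.split(".", 1)[0],  # short hostname only
            },
        })
    return configs


def render_job(job_name, scrape_interval, static_configs):
    """Render one job as YAML text, indented to sit under scrape_configs."""
    lines = [
        f"  - job_name: '{job_name}'",
        f"    scrape_interval: {scrape_interval}",
        "    static_configs:",
    ]
    for entry in static_configs:
        lines.append(f"      - targets: ['{entry['targets'][0]}']")
        lines.append("        labels:")
        for key, value in entry["labels"].items():
            lines.append(f"          {key}: '{value}'")
    return "\n".join(lines)


if __name__ == "__main__":
    pairs = [
        ("dockerhost.example.com:9304", "logstash01.example.com"),
        ("dockerhost.example.com:9305", "logstash02.example.com"),
    ]
    print(render_job("logstash", "10s", build_static_configs(pairs)))
```

Run as a script, this prints a job definition equivalent to the example above, which can then be pasted under scrape_configs in the Prometheus configuration.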

Miscellaneous

The dashboard relies on repeated panels, rows and templated variables. You can filter the graphs by selecting instance, plugin_id, input_plugin and output_plugin from the top menu bar.

- instance: The Logstash FQDN.
- plugin_id: The unique id of each plugin that is used within Logstash filters.
- input_plugin: The unique id of each plugin that is used as a Logstash input.
- output_plugin: The unique id of each plugin that is used as a Logstash output.

When Logstash initializes a plugin, it assigns a random hash to it as its id. Such an id is not descriptive, which is unhelpful when you want to monitor the performance characteristics of each plugin. It is recommended to override it by setting the id field in each plugin definition in your Logstash configuration.

Plugins can also be grouped in the selection menus by using Grafana tags. In our case the grouping is based on the name label, which the Prometheus exporter returns for each plugin. In the example below, the plugin_id messages_date_1 will be grouped under the tag date in the menu bar.

logstash_pipeline_plugins_filters_events_duration_in_millis{id="messages_date_1",name="date",pipeline="main"} 354
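
To illustrate how the name label drives this grouping, here is a small Python sketch that parses samples such as the one above and collects plugin ids under their name label. It is a simplified parser for illustration only, assuming every sample carries id and name labels; parse_sample and group_ids_by_name are hypothetical helpers, not part of Grafana or the exporter.

```python
import re
from collections import defaultdict

# Simplified patterns for one exposition-format sample: metric{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("metric"), labels, float(match.group("value"))


def group_ids_by_name(lines):
    """Group plugin ids under their 'name' label, mirroring the tag grouping."""
    groups = defaultdict(list)
    for line in lines:
        _, labels, _ = parse_sample(line)
        groups[labels["name"]].append(labels["id"])
    return dict(groups)


if __name__ == "__main__":
    sample = (
        'logstash_pipeline_plugins_filters_events_duration_in_millis'
        '{id="messages_date_1",name="date",pipeline="main"} 354'
    )
    print(group_ids_by_name([sample]))
```

For the sample line above, the id messages_date_1 ends up in the group keyed by date, which corresponds to the tag shown in the menu bar.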

The drop-down menus also allow multiple selection. For example, if you have multiple Logstash instances you can select All, and the panels will adapt to display graphs from all hosts. Below is an example of System stats from two Logstash instances. Average CPU load is shown in a distinct (repeated) panel for each instance, while total vmem usage and open file descriptors are visualized for both instances together in the Logstash ps total vmem and Logstash ps fds panels respectively.

The same logic also applies to plugins, where you can select specific id(s) as well as instance(s) to visualize (e.g. the average duration of filters). This section is displayed under rows that are repeated based on the selected instance FQDN.
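
For Prometheus data sources, Grafana expands a multi-value variable into a regex alternation, which is why a single query can cover several selected instances. The sketch below mimics that idea in Python, assuming the panel queries filter on the instance label with a regex matcher; the dashboard's actual queries may differ, and instance_matcher is a hypothetical helper.

```python
import re


def instance_matcher(selected):
    """Build a PromQL label matcher for one or more selected instances.

    Assumption: a single selection uses an exact match and several
    selections use a regex alternation, as in instance=~"a|b".
    """
    if not selected:
        raise ValueError("at least one instance must be selected")
    if len(selected) == 1:
        return f'instance="{selected[0]}"'
    # Escape regex metacharacters (the dots in an FQDN), then double the
    # backslashes so they survive inside a double-quoted PromQL string.
    escaped = [re.escape(name).replace("\\", "\\\\") for name in selected]
    return 'instance=~"' + "|".join(escaped) + '"'


if __name__ == "__main__":
    hosts = ["logstash01.example.com", "logstash02.example.com"]
    print(instance_matcher(hosts))
```

The resulting matcher can be placed inside the braces of a metric selector, so the same panel query serves one instance or several.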

Other sample graphs from the JVM monitoring: