cgroups_exporter

A Prometheus exporter for cgroup-level metrics.

Compiling from Source

This project is written primarily in Go and requires Go v1.16 or later to compile.

To build, run go build and Go will handle the rest. Alternatively, if you have Make installed, run make. Either method produces an executable binary called cgroups_exporter.

Usage

To view help, run ./cgroups_exporter -help.

$ ./cgroups_exporter -help
Usage of ./cgroups_exporter:
  -cgroups-root string
        path to the root of the cgroupsv1 hierarchy (default "/sys/fs/cgroup")
  -file string
        path to the cgroup specification file to use if method is file, ignored otherwise (default "/proc/1/cgroup")
  -help
        print usage
  -method string
        one of: file, slurm (default "slurm")
  -port string
        the port to listen on (default "9821")

The -cgroups-root option overrides the default location of the cgroupsv1 hierarchy in case it is mounted somewhere unusual. This is also useful in a container, where you may want to mount the host's hierarchy at a different path.
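As a rough illustration of what a different root implies, the sketch below assumes the conventional cgroupsv1 layout, in which each controller is mounted in its own directory under the root. The helper cgroupDir, the /host/cgroup mount point, and the Slurm-style cgroup path are hypothetical and are not taken from the exporter's source.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cgroupDir is a hypothetical helper that builds the directory for one
// controller and cgroup path under a given cgroupsv1 root. It assumes the
// conventional <root>/<controller>/<cgroup path> layout.
func cgroupDir(root, controller, cgroupPath string) string {
	// filepath.Join cleans the result, so the leading slash of
	// cgroupPath does not produce a double separator.
	return filepath.Join(root, controller, cgroupPath)
}

func main() {
	// Default root, as reported by -help.
	fmt.Println(cgroupDir("/sys/fs/cgroup", "memory", "/slurm/uid_1000/job_42"))
	// Hypothetical alternate mount point, e.g. inside a container.
	fmt.Println(cgroupDir("/host/cgroup", "memory", "/slurm/uid_1000/job_42"))
}
```

Under these assumptions, only the root prefix changes between the two calls; the controller and cgroup path stay the same.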

The -method option specifies which cgroup hierarchies will be monitored. Valid options are file and slurm. If set to file, the cgroup specification file given by the additional -file option is read and used to determine which cgroups to monitor. If set to slurm, the program monitors the node for any jobs running under the Slurm scheduler and outputs labeled statistics for each job over time.

The -file option is only used if -method is set to file. It specifies the path to the cgroup specification file, which indicates which cgroups will be used. These cgroup specification files have a structured format and are commonly found at /proc/$$/cgroup for some process ID $$. The cgroups listed in the specification file will be tracked by this exporter while all other cgroups will be ignored. If you want to track cgroups for two or more different processes, you should run two or more copies of this exporter on different ports.
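For reference, each line of a /proc/<pid>/cgroup file has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch of reading that format; the cgroupEntry type and the parseCgroupSpec function are hypothetical names chosen for illustration, the sample data is made up, and the exporter's own parser may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupEntry is a hypothetical type holding one line of a
// /proc/<pid>/cgroup file.
type cgroupEntry struct {
	HierarchyID string
	Controllers []string
	Path        string
}

// parseCgroupSpec parses text in the /proc/<pid>/cgroup format, where each
// line reads "hierarchy-ID:controller-list:cgroup-path".
func parseCgroupSpec(data string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Split into at most 3 fields so colons in the path survive.
		fields := strings.SplitN(line, ":", 3)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed cgroup line: %q", line)
		}
		var controllers []string
		if fields[1] != "" {
			controllers = strings.Split(fields[1], ",")
		}
		entries = append(entries, cgroupEntry{
			HierarchyID: fields[0],
			Controllers: controllers,
			Path:        fields[2],
		})
	}
	return entries, nil
}

func main() {
	// Made-up sample resembling a process inside a Slurm job.
	sample := "11:memory:/slurm/uid_1000/job_42\n4:cpu,cpuacct:/slurm/uid_1000/job_42\n"
	entries, err := parseCgroupSpec(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%v -> %s\n", e.Controllers, e.Path)
	}
}
```

To inspect a real file, the contents returned by os.ReadFile for a path such as /proc/1/cgroup could be converted to a string and passed to the same function.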

The -port option allows you to change the port that the Prometheus HTTP server will listen on for requests. The default port is recommended unless you need to run two or more copies of this exporter.

Docker

A Docker image is provided for systems where that is more convenient (such as a Kubernetes cluster). You can build it manually using the provided Dockerfile, or pull the pre-built image from Docker Hub. Example usage follows:

docker run -t --rm \
    --mount type=bind,src=/sys/fs/cgroup,dst=/sys/fs/cgroup,readonly \
    phphavok/cgroups_exporter -method file -file /proc/31337/cgroup

We specify -t so that a pseudo-terminal is allocated, which keeps the logging output nicely formatted. The --rm option automatically cleans up the container on exit. The --mount option passes the cgroupsv1 hierarchy (/sys/fs/cgroup) on the host through to the same location within the container. By default, Docker often exposes only part of the cgroup hierarchy inside the container. This application needs to see the full hierarchy, so the read-only bind mount takes care of that. If you run into issues with mounting over the existing hierarchy within the container, you can change the target to some other location and then pass the -cgroups-root option to the program to accommodate that change. The entrypoint of the container is the program itself, so you can pass its parameters directly to the run command.

Singularity

You can use Singularity (e.g., on an HPC cluster) to run the above Docker image from Docker Hub. If using the slurm method, make sure Singularity does not create a PID namespace for the job (i.e., leave off the -p option); otherwise the container will be unable to properly detect Slurm jobs running outside the container.

singularity run \
    -B /sys/fs/cgroup:/sys/fs/cgroup \
    -B `pwd`:/data \
    docker://phphavok/cgroups_exporter -method file -file /proc/31337/cgroup

Grafana Dashboard

A Grafana dashboard is available in this repository and is also published as public dashboard 14587 for quick installation. The provided dashboard works with cgroups_exporter instances running with -method slurm.