starboard-exporter

Exposes Prometheus metrics from Trivy Operator's VulnerabilityReport, ConfigAuditReport, and other custom resources (CRs).

Metrics

This exporter exposes several types of metrics:

CIS Benchmarks

Report Summary

A report summary series exposes the count of checks of each status reported in a given CISKubeBenchReport. For example:

starboard_exporter_ciskubebenchreport_report_summary_count{
    node_name="bj56o-master-bj56o-000000",
    status="FAIL"
    } 31

Section Summary

For slightly more granular reporting, a section summary series exposes the count of checks of each status reported in a given CISKubeBenchSection. For example:

starboard_exporter_ciskubebenchreport_section_summary_count{
    node_name="bj56o-master-bj56o-000000",
    node_type="controlplane",
    section_name="Control Plane Configuration",
    status="WARN"
    } 4

Result Detail

A CIS benchmark result info series exposes fields from each instance of an Aqua CISKubeBenchResult. For example:

starboard_exporter_ciskubebenchreport_result_info{
    node_name="bj56o-master-bj56o-000000",
    node_type="controlplane",
    pod="starboard-exporter-859955f485-cwkj6",
    section_name="Control Plane Configuration",
    test_desc="Client certificate authentication should not be used for users (Manual)",
    test_number="3.1.1",
    test_status="WARN"
    } 1

Vulnerability Reports

Report Summary

A summary series exposes the count of CVEs of each severity reported in a given VulnerabilityReport. For example:

starboard_exporter_vulnerabilityreport_image_vulnerability_severity_count{
    image_digest="",
    image_namespace="demo",
    image_registry="quay.io",
    image_repository="giantswarm/starboard-operator",
    image_tag="0.11.0",
    report_name="replicaset-starboard-app-6894945788-starboard-app",
    severity="MEDIUM"
    } 4

This indicates that the giantswarm/starboard-operator image in the demo namespace contains 4 medium-severity vulnerabilities.
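As a rough illustration of how these summary samples might be consumed, the sketch below sums the per-report counts into one total per severity. The `SAMPLES` list and the `severity_totals` helper are hypothetical and not part of the exporter; only the first sample mirrors the example above, and the rest are invented for the demonstration.

```python
from collections import defaultdict

# Hypothetical samples of the severity count series as (labels, value) pairs.
# Only the first entry mirrors the example above; the others are made up.
SAMPLES = [
    ({"image_namespace": "demo", "image_repository": "giantswarm/starboard-operator",
      "image_tag": "0.11.0", "severity": "MEDIUM"}, 4),
    ({"image_namespace": "demo", "image_repository": "giantswarm/starboard-operator",
      "image_tag": "0.11.0", "severity": "HIGH"}, 2),
    ({"image_namespace": "demo", "image_repository": "example/other-image",
      "image_tag": "1.0.0", "severity": "MEDIUM"}, 1),
]


def severity_totals(samples):
    """Sum the per-report counts into a single total per severity."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["severity"]] += value
    return dict(totals)


print(severity_totals(SAMPLES))  # {'MEDIUM': 5, 'HIGH': 2}
```

In practice this kind of aggregation would more likely be done in a Prometheus query than in client code; the snippet only shows how the severity label and the sample value relate.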

Vulnerability Details

A "detail" or "vulnerability" series exposes fields from each instance of an Aqua Vulnerability. The value of the metric is the Score for the vulnerability. For example:

starboard_exporter_vulnerabilityreport_image_vulnerability{
    fixed_resource_version="1.1.1l-r0",
    image_digest="",
    image_namespace="demo",
    image_registry="quay.io",
    image_repository="giantswarm/starboard-operator",
    image_tag="0.11.0",
    installed_resource_version="1.1.1k-r0",
    report_name="replicaset-starboard-app-6894945788-starboard-app",
    severity="HIGH",
    vulnerability_id="CVE-2021-3712",
    vulnerability_link="https://avd.aquasec.com/nvd/cve-2021-3712",
    vulnerability_title="openssl: Read buffer overruns processing ASN.1 strings",
    vulnerable_resource_name="libssl1.1"
    } 7.4

This indicates that the vulnerability with the ID CVE-2021-3712 was found in the giantswarm/starboard-operator image in the demo namespace, and that it has a CVSS 3.x score of 7.4.

An additional series would be exposed for every combination of those labels.

Config Audit Reports

Report Summary

A summary series exposes the count of checks of each severity reported in a given ConfigAuditReport. For example:

starboard_exporter_configauditreport_resource_checks_summary_count{
  resource_name="replicaset-chart-operator-748f756847",
  resource_namespace="giantswarm",
  severity="LOW"
  } 7

A Note on Cardinality

For some use cases, it is helpful to export additional fields from VulnerabilityReport CRs. However, because many fields contain unbounded arbitrary data, including them as Prometheus labels can lead to extremely high cardinality, which can drastically degrade Prometheus performance. For this reason, we only expose summary data by default and allow users to opt in to higher-cardinality fields.
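To make the cardinality concern concrete, here is a minimal sketch that counts how many distinct series a set of vulnerability records would produce for a given label selection. The records, the CVE identifiers, and the `count_series` helper are all hypothetical; the sketch assumes one series per distinct combination of label values and does not reflect the exporter's internal code.

```python
def count_series(records, target_labels):
    """Count distinct label-value combinations, i.e. the number of series."""
    combos = {tuple(r.get(label, "") for label in target_labels) for r in records}
    return len(combos)


# Hypothetical vulnerability records for one image (IDs are placeholders).
RECORDS = [
    {"image_repository": "giantswarm/starboard-operator", "image_tag": "0.11.0",
     "vulnerability_id": "CVE-0000-0001", "vulnerable_resource_name": "libssl1.1"},
    {"image_repository": "giantswarm/starboard-operator", "image_tag": "0.11.0",
     "vulnerability_id": "CVE-0000-0001", "vulnerable_resource_name": "libcrypto1.1"},
    {"image_repository": "giantswarm/starboard-operator", "image_tag": "0.11.0",
     "vulnerability_id": "CVE-0000-0002", "vulnerable_resource_name": "libssl1.1"},
]

image_only = ["image_repository", "image_tag"]
with_cve = image_only + ["vulnerability_id"]
with_package = with_cve + ["vulnerable_resource_name"]

print(count_series(RECORDS, image_only))    # 1
print(count_series(RECORDS, with_cve))      # 2
print(count_series(RECORDS, with_package))  # 3
```

Each added label can only keep or increase the number of series, which is why unbounded fields such as titles or links are the ones to be most careful with.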

Sharding Reports

In large clusters or environments with many reports and/or vulnerabilities, a single exporter can consume a large amount of memory, and Prometheus may need so long to scrape it that scrapes time out. To spread resource consumption and scrape effort, starboard-exporter watches its own service endpoints and shards metrics for all report types across the available endpoints. In other words, if there are 3 exporter instances, each instance serves roughly 1/3 of the metrics. This behavior is enabled by default and requires no additional configuration: to use it, simply change the number of replicas in the Deployment. Sharding does not change how much data Prometheus itself has to ingest, however, so read the section on cardinality and be aware that consuming large amounts of high-cardinality data can have performance impacts on Prometheus.
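The general idea of sharding by endpoint count can be sketched as follows. This is not the exporter's actual algorithm, which is not described here; it uses a plain hash-modulo assignment as a stand-in, and the function names and report names are hypothetical.

```python
import hashlib


def shard_for(report_name: str, shard_count: int) -> int:
    """Map a report name to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(report_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def owned_reports(report_names, shard_index, shard_count):
    """Return the reports that the instance with this shard index would serve."""
    return [n for n in report_names if shard_for(n, shard_count) == shard_index]


# Hypothetical report names spread across 3 exporter instances.
names = [f"replicaset-example-{i}" for i in range(12)]
for index in range(3):
    print(index, owned_reports(names, index, 3))
```

Every report lands on exactly one instance, so the union of all instances covers the full set. A known limitation of plain modulo assignment is that most reports move to a different shard when the instance count changes; consistent-hashing schemes reduce that movement.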

Customization

Summary metrics of the format described above are always enabled.

To enable an additional detail series per Vulnerability, use the --target-labels flag to specify which labels should be exposed. For example:

# Expose only select image and CVE fields.
--target-labels=image_namespace,image_repository,image_tag,vulnerability_id

# Run with (almost) all fields exposed as labels, if you're feeling really wild.
--target-labels=all

Target labels can also be set via Helm values:

exporter:
  vulnerabilityReports:
    targetLabels:
      - image_namespace
      - image_repository
      - image_tag
      - vulnerability_id
      - ...

The same can be done for CIS Benchmark Results. To enable an additional detail series per CIS Benchmark Result, use the --cis-detail-report-labels flag to specify which labels should be exposed. For example:

# Expose only section_name, test_name and test_status
--cis-detail-report-labels=section_name,test_name,test_status

# Run with (almost) all fields exposed as labels.
--cis-detail-report-labels=all

CIS detail target labels can also be set via Helm values:

exporter:
  CISKubeBenchReports:
    targetLabels:
      - node_name
      - node_type
      - section_name
      - test_name
      - test_status
      - ...

Helm

To install starboard-exporter using Helm:

helm repo add giantswarm https://giantswarm.github.io/giantswarm-catalog
helm repo update
helm upgrade -i starboard-exporter --namespace <trivy operator namespace> giantswarm/starboard-exporter

Scaling for Prometheus scrape timeouts

When exporting a large volume of metrics, Prometheus might time out before retrieving them all from a single exporter instance. It is possible to automatically scale the number of exporters to keep the scrape time below the configured timeout. To enable HPA scaling based on Prometheus metrics, here