Awesome

Utility that exposes the expiry of TLS certificates as Prometheus metrics

Building

To build the Docker image, simply run docker build:

docker build . -t muxinc/certificate-expiry-monitor:latest

Running

Run the Docker image using the executable at /app:

→ docker run muxinc/certificate-expiry-monitor:latest /app --help
Usage of ./certificate-expiry-monitor:
  -domains string
        Comma-separated SNI domains to query
  -frequency duration
        Frequency at which the certificate expiry times are polled (default 1m0s)
  -hostIP
        If true, then connect to the host that the pod is running on rather than to the pod itself.
  -ignoredDomains string
        Comma-separated list of domains to exclude from the discovered set. This can be a regex if the string is wrapped in forward-slashes like /.*\.domain\.com$/ which would exclude all domain.com subdomains.
  -ingressNamespaces string
        If provided, a comma-separated list of namespaces that will be searched for ingresses with domains to automatically query
  -insecure
        If true, then the InsecureSkipVerify option will be used with the TLS connection, and the remote certificate and hostname will be trusted without verification (default true)
  -kubeconfig string
        Path to kubeconfig file if running outside the Kubernetes cluster
  -labels string
        Label selector that identifies pods to query
  -logformat string
        Log format (text or json) (default "text")
  -loglevel string
        Log-level threshold for logging messages (debug, info, warn, error, fatal, or panic) (default "error")
  -metricsPort int
        TCP port that the Prometheus metrics listener should use (default 8888)
  -namespaces string
        Comma-separated Kubernetes namespaces to query (default "default")
  -port int
        TCP port to connect to each pod on (default 443)

Kubernetes Manifest

You're probably going to want to run the certificate-expiry monitor in a Kubernetes cluster. The following manifest shows how you might monitor a set of ingress pods matching the label k8s-app=my-ingresses in the default namespace for the foobar.example.com domain:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: certificate-expiry-monitor
  namespace: default
spec:
  minReadySeconds: 5
  revisionHistoryLimit: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: certificate-expiry-monitor
    spec:
      containers:
      - command:
        - /app
        - -labels
        - k8s-app=my-ingresses
        - -namespaces
        - default
        - -frequency
        - 1m
        - -domains
        - foobar.example.com
        image: muxinc/certificate-expiry-monitor:latest
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8888
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: certificate-expiry-monitor
        resources:
          limits:
            cpu: 20m
            memory: 50Mi
          requests:
            cpu: 20m
            memory: 50Mi

Monitoring

A Prometheus endpoint is available at /metrics on TCP port :8888 (customizable with metricsPort).

Labels

Name	Description
`ns`	Namespace of the pod that was queried
`pod`	Pod being queried for TLS certificates
`domain`	Domain being verified against TLS certificates
`status`	Certificate is either `valid`, `expired`, `soon` (not yet valid), or `notfound`

Gauges

Name	Labels	Description
`certificate_expiry_monitor_matching_pods`	`ns`	Number of pods that match the label filter in a namespace
`certificate_expiry_monitor_certificate`	`ns`, `pod`, `domain`, `status`	Number of pods with a certificate in a given status for the domain
`certificate_expiry_monitor_seconds_since_cert_issued`	`ns`, `pod`, `domain`	Seconds since the certificate was issued
`certificate_expiry_monitor_seconds_until_cert_expires`	`ns`, `pod`, `domain`	Seconds until the certificate expires

Counters

Name	Labels	Description
`certificate_expiry_monitor_tls_open_connection_error`	`ns`, `pod`, `domain`	Number of times an error occurred while opening a TLS connection to a pod
`certificate_expiry_monitor_tls_close_connection_error`	`ns`, `pod`, `domain`	Number of times an error occurred while closing a TLS connection to a pod

Healthcheck

A simple healthcheck is available at /healthz on the TCP port :8888 (customizable with metricsPort):

→ curl -v http://localhost:8888/healthz
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8888 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8888 (#0)
> GET /healthz HTTP/1.1
> Host: localhost:8888
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 04 Mar 2019 17:56:45 GMT
< Content-Length: 7
< Content-Type: text/plain; charset=utf-8
<
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
Healthy