Awesome
Overview
This repostitory contains three nagios plugins:
- check_jenkins_job_extended.pl - The original, as documented below. Designed to check for failures, not how long since success.
- check_jenkins_cron.pl - A from-scratch copy designed to check jobs that should build periodically.
- check_jenkins_nodes.pl - Checks the number of nodes with a status of "offline".
check_jenkins_cron.pl
Usage
usage: ./check_jenkins_cron.pl -j <job> -l <url> -w <threshold> -c <threshold> [-f] [-u username -p password] [-v]
Required arguments
-j <job> : Jenkins job name
The name of the job to examine.
-l <url> : Jenkins URL
Protocol assumed to be http if none specified.
-w <threshold> : Warning Threshold (seconds)
WARNING when the last successful run was over <threshold> seconds ago.
CRITICAL when last successful run was over <threshold> and failures
have occured since then.
-c <threshold> : Critical Threshold (seconds)
CRITICAL when the last successful run was over <threshold> seconds ago.
Optional arguments
-f : WARNING when the last run was not successful, even if the last
successful run is within the -w and -c thresholds.
-u <username> : Jenkins Username if anonymous API access is not available
-p <password> : Jenkins Password if anonymous API access is not available
-v : Increased verbosity.
This will confuse nagios, and should only be used for debug purposes
when testing this plugin.
Sample nagios configuration
Command definition
define command {
command_name check_jenkins_cron
command_line $USER1$/check_jenkins_cron.pl -j '$ARG1$' -l $ARG2$ -w $ARG3$ -c $ARG4$ -f -u $ARG5$ -p $ARG6$
}
Service definition to warn when a job hasn't built for 24 hours, and crit when it hasn't built for 36 hours.
define service {
use local-service
host_name buildserver.mycompany.com
service_description Jenkins - prod build
check_interval 1
check_command check_jenkins_cron!Producuction build!buildserver.mycompany.com!86400!129600!myuser!mypassword
contacts bob,bill
}
nagios-jenkins-plugin (check_jenkins_job_extended.pl)
A nagios plugin for which lets you check jenkins jobs according to various criteria.
How to use it
The plugin supports several options, which you can pass "0" to disable that particular threshold.
Usage: check_jenkins_job_extended url jobname concurrentFailsThreshold buildDurationThresholdMilliseconds lastStableBuildThresholdInMinutesWarn lastStableBuildThresholdInMinutesCrit
-
url: The URL to your jenkins server
-
username: The username for auth to your jenkins server [optional]
-
password: The password for auth to your jenkins server [optional]
-
jobname: The name of the jenkins job you'd like to check
-
concurrentFailsThreshold: The number of concurrent failing builds it should CRIT alert on
-
buildDurationThresholdMilliseconds: It will alert if the last build took longer than this number of milliseconds to complete
-
lastStableBuildThresholdInMinutesWarn: WARN if it's been this number of minutes since the last stable build
-
lastStableBuildThresholdInMinutesCrit: CRIT if it's been this number of minutes since the last stable build
Example
A sample nagios command using this plugin.
define command {
command_name check_jenkins_job_ext
command_line $USER1$/check_jenkins_job_extended.pl $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$
}
A sample nagios service using the above command to warn when it's been 4 mins since the last stable build, and crit when it's been 20.
define service {
use local-service
host_name buildserver.mycompany.com
service_description Jenkins - prod build
check_interval 1
check_command check_jenkins_job_ext!http://buildserver.mycompany.com!prod!0!0!4!20
contacts bob,bill
}
check_jenkins_nodes.pl
Usage
Usage: check_jenkins_nodes.pl -s [jenkins server hostname & path] -w [integer or %] -c [integer or %] [-h this help message] [-u username] [-p password] [-v]
Required Arguments:
-s <server hostname> : jenkins CI server hostname
-c <threshold> : integer or percentage (ex: 2 or 50%)
CRITICAL if <threshold> nodes or greater are offline
-w <threshold> : integer or percentage (ex: 2 or 50%)
WARNING if <threshold> nodes or greater are offline
Optional arguments
-h This help message
-p <password> : password to the jenkins CI server
-u <username> : username to the jenkins CI server
-v verbose output
Command definition
define command{
command_name check_jenkins_nodes
command_line $USER1$/check_jenkins_nodes.pl -s$ARG1$ -u$ARG2$ -p$ARG3$ -w$ARG4$ -c$ARG5$
}
Service definition to warn when a job hasn't built for 24 hours, and crit when it hasn't built for 36 hours.
define service {
use local-service
host_name buildserver.mycompany.com
service_description Jenkins - node check
check_interval 1
check_command check_jenkins_nodes!https://buildserver.mycompany.com!myuser!mypassword!2!51%
contacts bob,bill
}