System & Process Supervisor for Linux
<a href="https://www.clipartof.com/435776"><img align="right" src="./doc/logo.png" alt="http://toonclips.com/design/788" title="Copyright © Ron Leishman"></a>
Introduction
watchdogd(8) is an advanced system and process supervisor daemon, primarily intended for embedded Linux and server systems. By default it periodically kicks the system watchdog timer (WDT) to prevent it from resetting the system. In its more advanced guise it monitors critical system resources, supervises the heartbeat of processes, records deadline transgressions, and initiates a controlled reset if needed.
When a system starts up, watchdogd determines the reset cause by
querying the kernel. After a system reset, as opposed to a power loss,
the reset reason has already been saved to a file for later analysis by
an operator or network management system (NMS). This information can in
turn be used to put the system in an operational safe state, or a
non-operational safe state.
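As a rough illustration of what a reset cause looks like at the kernel level: Linux watchdog drivers can report boot status flags (the WDIOF_* bits from linux/watchdog.h), and kernels built with watchdog sysfs support expose them in /sys/class/watchdog/watchdog0/bootstatus. The shell sketch below decodes such a value. The function name wdt_decode_bootstatus and the printed labels are made up for this example, and this is not necessarily how watchdogd itself queries the kernel.

```shell
# Decode a watchdog boot status bitmask into readable labels.
# Bit values are the WDIOF_* flags from <linux/watchdog.h>; the labels
# and the function name are illustrative only.
wdt_decode_bootstatus() {
    bits=$1
    causes=""

    if [ $((bits & 1))  -ne 0 ]; then causes="$causes overheat";    fi  # WDIOF_OVERHEAT   0x0001
    if [ $((bits & 2))  -ne 0 ]; then causes="$causes fan-fault";   fi  # WDIOF_FANFAULT   0x0002
    if [ $((bits & 4))  -ne 0 ]; then causes="$causes extern1";     fi  # WDIOF_EXTERN1    0x0004
    if [ $((bits & 8))  -ne 0 ]; then causes="$causes extern2";     fi  # WDIOF_EXTERN2    0x0008
    if [ $((bits & 16)) -ne 0 ]; then causes="$causes power-under"; fi  # WDIOF_POWERUNDER 0x0010
    if [ $((bits & 32)) -ne 0 ]; then causes="$causes card-reset";  fi  # WDIOF_CARDRESET  0x0020
    if [ $((bits & 64)) -ne 0 ]; then causes="$causes power-over";  fi  # WDIOF_POWEROVER  0x0040

    causes=${causes# }            # drop the leading separator
    echo "${causes:-none}"
}

# Possible use, assuming the sysfs attribute exists on your system:
#   wdt_decode_bootstatus "$(cat /sys/class/watchdog/watchdog0/bootstatus)"
```

Which flags a given driver actually sets depends on the hardware; many drivers only ever report the card-reset bit, if anything.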
News: as of v4.0, multiple watchdog devices are supported.
What is a watchdog timer?
Most server and laptop motherboards today come equipped with a watchdog timer (WDT). It is a small timer connected to the reset circuitry so that it can reset the board if the timer expires. The WDT driver, and this daemon, periodically "kick" (reset) the timer to prevent it from firing.
Most embedded systems utilize watchdog timers as a way to automatically recover from malfunctions: lock-ups, live-locks, CPU overload. With a bit of logic sprinkled on top the cause can more easily be tracked down.
The Linux kernel provides a common userspace interface, /dev/watchdog,
created automatically when the appropriate watchdog driver is loaded.
If your board does not have a WDT, the kernel provides a softdog.ko
module which in many cases can be good enough.
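To make the kicking concrete, here is a minimal shell sketch of the kernel interface: opening the device arms the timer, each write counts as a kick, and writing the character V just before closing asks drivers that support "magic close" to stop the timer again. The function name wdt_kick and its arguments are hypothetical, and this is not how watchdogd is implemented. Try it only on a test system, for example with softdog loaded, since a missed kick reboots the machine.

```shell
# Kick a watchdog device a fixed number of times, then disarm it.
# Usage: wdt_kick DEVICE INTERVAL COUNT
# (hypothetical helper, for illustration only)
wdt_kick() {
    wdt_dev=$1
    wdt_interval=$2
    wdt_count=$3

    # The redirection on the brace group opens the device once and
    # keeps it open for the whole loop; opening it arms the timer.
    {
        wdt_i=0
        while [ "$wdt_i" -lt "$wdt_count" ]; do
            printf 'k'                 # any write counts as a kick
            sleep "$wdt_interval"
            wdt_i=$((wdt_i + 1))
        done

        # "Magic close": a 'V' right before closing asks the driver to
        # stop the timer. Drivers running with nowayout ignore it.
        printf 'V'
    } >"$wdt_dev"
}

# Possible use on a test system (as root):
#   modprobe softdog
#   wdt_kick /dev/watchdog 10 6
```

The interval must be shorter than the timeout of the WDT, otherwise the timer fires between two kicks.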
The idea of a watchdog daemon in userspace is to run in the background
of your system. When there is no more CPU time for the watchdog daemon
to run it will fail to "kick" the WDT. This will in turn cause the WDT
to reboot the system. When it does, watchdogd has already saved the
reset reason for your post mortem.
As a background process, watchdogd can of course also be used to
monitor other aspects of the system ...
What can watchdogd do?
Without arguments, watchdogd runs in the background, monitoring the CPU,
and as long as there is CPU time it "kicks" /dev/watchdog every 10
seconds. If the daemon is stopped, or does not get enough CPU time to
run, the underlying WDT hardware will detect this and reboot the system.
This is the normal mode of operation.
With a few lines in watchdogd.conf(5), it can also monitor other aspects of the system, such as:
- File descriptor leaks
- File system usage
- Generic script
- Load average
- Memory leaks
- Process live locks
- Reset counter, e.g., for snmpEngineBoots (RFC 2574)
- Temperature
Read more about Built-in Monitors in the extended documentation.
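To illustrate what a monitor such as the load average check boils down to, the shell sketch below reads the 1-minute load average from /proc/loadavg and classifies it against a warning and a critical level. The function names and the example thresholds are invented for this illustration; it shows neither watchdogd's implementation nor the syntax of watchdogd.conf(5).

```shell
# Print the 1-minute load average, the first field of /proc/loadavg.
# An alternative file can be given as argument, e.g. for testing.
current_load() {
    cut -d' ' -f1 "${1:-/proc/loadavg}"
}

# Classify a load value against two thresholds.
# Usage: load_level LOAD WARNING CRITICAL
# Prints "ok", "warning" or "critical".
load_level() {
    awk -v load="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        # "+ 0" forces numeric rather than string comparison
        if (load + 0 >= crit + 0)      print "critical"
        else if (load + 0 >= warn + 0) print "warning"
        else                           print "ok"
    }'
}

# Possible use, with made-up thresholds:
#   load_level "$(current_load)" 1.5 2.0
```

A real monitor would run such a check periodically and decide what to do at each level, for instance log at the warning level and start a controlled reset at the critical level.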
Build & Install
Note: To enable any of the extra monitors and the process supervisor, see ./configure --help
watchdogd is tailored for Linux systems and builds against most modern
C libraries. However, three external libraries are required: libite,
libuEv, and libConfuse. None of them should present any surprises;
all use de facto standard configure scripts and support pkg-config.
The latter is used by the watchdogd configure script to locate the
required libraries and header files.
The common ./configure --some --args --here && make is usually
sufficient to build watchdogd. However, if libraries are installed in
non-standard locations you may need to provide their paths, e.g. with
PKG_CONFIG_PATH. The following also sets the most common install
and search paths for the build:
PKG_CONFIG_PATH=/opt/lib/pkgconfig:/home/ian/lib/pkgconfig \
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
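Before running configure it can be handy to check that pkg-config actually finds the three libraries. A small sketch, assuming the pkg-config module names are libite, libuev, and libconfuse (verify against the .pc files installed on your system); the helper name missing_libs is made up for this example:

```shell
# Print every pkg-config module from the argument list that cannot be
# found. The PKG_CONFIG variable may name an alternative command.
missing_libs() {
    for pc_mod in "$@"; do
        "${PKG_CONFIG:-pkg-config}" --exists "$pc_mod" || echo "$pc_mod"
    done
}

# Possible use (module names are assumptions, see above):
#   export PKG_CONFIG_PATH=/opt/lib/pkgconfig:/home/ian/lib/pkgconfig
#   missing_libs libite libuev libconfuse
```

No output means all listed modules were found with the current PKG_CONFIG_PATH.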
If you're not building from a released tarball but instead use the GIT sources, see the Contributing section below.
Contributing
If you find bugs or want to contribute fixes or features, check out the code from GitHub:
git clone https://github.com/troglobit/watchdogd
cd watchdogd
./autogen.sh
The autogen.sh script runs autoconf, automake, et al. to create the
configure script and other generated files that are not part of the VCS
tree. For more details, see the file CONTRIBUTING in the GIT sources.
Origin & References
watchdogd(8) is an improved version of the original, created by Michele d'Amico and adapted to uClinux-dist by Mike Frysinger. It is maintained by Joachim Wiberg collaboratively at GitHub.
The original code in uClinux-dist is available in the public domain, whereas this version is distributed under the ISC license. See the file LICENSE for more details on this.
The logo, "Watch Dog Detective Taking Notes", is licensed for use by
the watchdogd project, copyright © Ron Leishman.