
Nagios Plugin for dmesg Monitoring

So far I have found no easy way to monitor Linux kernel messages. So here is a simple Nagios plugin that scans dmesg output for interesting stuff:


#!/bin/bash

SEVERITIES="err,alert,emerg,crit"
WHITELIST="microcode: |\
Firmware Bug|\
i8042: No controller|\
Odd, counter constraints enabled but no core perfctrs detected|\
Failed to access perfctr msr|\
echo 0 > /proc/sys"

# Check for critical dmesg lines from this day
date=$(date "+%a %b %e")
output=$(dmesg -T -l "$SEVERITIES" | grep -E -v "$WHITELIST" | grep "$date" | tail -5)

if [ "$output" == "" ]; then
    echo "All is fine."
    exit 0
fi

# Join the remaining lines into a single line of plugin output
echo "$output" | xargs
exit 1
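
To try out the whitelist and the date filter without waiting for a real kernel error, the same filter chain can be fed sample lines on stdin. This is only a sketch: the function name `filter_today`, the shortened whitelist and the three log lines are made up for illustration and are not part of the plugin.

```shell
#!/bin/bash
# Hypothetical test helper: same filter chain as the plugin, but reading
# from stdin instead of calling dmesg.

# Shortened whitelist, for the demo only
WHITELIST="microcode: |Firmware Bug|i8042: No controller"

filter_today() {
    # $1: date prefix as printed by "dmesg -T", e.g. "Mon Jan  5"
    grep -E -v "$WHITELIST" | grep -F -- "$1" | tail -5
}

# Invented sample lines: one whitelisted, one from the previous day, one real hit
sample='[Mon Jan  5 03:12:44 2015] microcode: CPU0 update failed
[Sun Jan  4 23:59:01 2015] EXT4-fs error (device sda1): old entry
[Mon Jan  5 03:14:07 2015] Buffer I/O error on dev sda1, logical block 42'

result=$(printf '%s\n' "$sample" | filter_today "Mon Jan  5")
echo "$result"
```

Only the "Buffer I/O error" line should survive: the microcode line is whitelisted and the EXT4 line carries the previous day's date. Note the two spaces in "Jan  5", which is what `date "+%a %b %e"` prints for single-digit days.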
"Features" of the script above:

This script helped a lot in detecting I/O errors early, both recoverable ones and actual corruption. It even worked when the entire root partition wasn't readable anymore: the Nagios check then failed with "NRPE: unable to read output", which indicated that dmesg itself didn't work anymore. Because the check keeps reporting the most recent errors from the entire current day (up to five lines), one cannot miss recovered errors that happened outside office hours.

Another good thing about the check is that it also detects OOM kills and fast spawning of processes.
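
The plugin as shown always exits with 1 (WARNING in Nagios terms). If OOM kills or I/O errors should page someone, one possible extension is to escalate them to CRITICAL. The following is a rough sketch, not part of the original script: the function name `nagios_status` is hypothetical, and the patterns are assumptions about typical kernel wording that may need adjusting for your kernel version.

```shell
#!/bin/bash
# Sketch: map filtered dmesg lines to a Nagios exit code.
# Assumption: these substrings appear in the kernel messages you care about.
CRITICAL_PATTERNS="Out of memory|I/O error"

nagios_status() {
    # $1: filtered dmesg output (may be empty)
    if [ -z "$1" ]; then
        echo 0    # OK
    elif printf '%s\n' "$1" | grep -E -q "$CRITICAL_PATTERNS"; then
        echo 2    # CRITICAL
    else
        echo 1    # WARNING
    fi
}

# Invented sample messages, for illustration only
nagios_status ""
nagios_status "[Mon Jan  5 03:20:11 2015] Out of memory: Killed process 1234 (java)"
nagios_status "[Mon Jan  5 03:21:00 2015] some other error"
```

In the plugin itself this could replace the final `exit 1` with something like `exit "$(nagios_status "$output")"`.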
