Nagios Plugin for dmesg Monitoring

So far I have found no easy way to monitor Linux kernel messages. So here is a simple Nagios plugin that scans dmesg output for interesting entries:

#!/bin/bash

SEVERITIES="err,alert,emerg,crit"
WHITELIST="microcode: |\
Firmware Bug|\
i8042: No controller|\
Odd, counter constraints enabled but no core perfctrs detected|\
Failed to access perfctr msr|\
echo 0 > /proc/sys"

# Check for critical dmesg lines from today
# (%e pads single-digit days with a space, matching the "dmesg -T" timestamps)
date=$(date "+%a %b %e")
output=$(dmesg -T -l "$SEVERITIES" | grep -Ev "$WHITELIST" | grep "$date" | tail -5)

if [ "$output" == "" ]; then
	echo "All is fine."
	exit 0
fi

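# Join the messages into a single line. paste is used instead of xargs,
# which aborts on unmatched quotes in kernel messages.
echo "$output" | paste -sd ' ' -
# Exit code 1 is reported as WARNING by Nagios
exit 1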

"Features" of the script above:

  • It gives you the 5 most recent messages from today
  • It lets you whitelist common but useless errors in $WHITELIST
  • It uses "dmesg" so that it still works when you already have disk I/O errors, and so that it is faster than parsing syslog
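
The filter chain can be tried without touching the real kernel log. The following is only a minimal sketch: the helper name filter_kernel_lines and the sample lines are made up for illustration and are not part of the plugin. It reads from stdin instead of calling dmesg.

```shell
#!/bin/bash
# Sketch only: same filter steps as the plugin, but fed from stdin.

# filter_kernel_lines WHITELIST DAY
# (hypothetical helper, not part of the original plugin)
filter_kernel_lines() {
	local whitelist="$1" day="$2"
	# 1. drop whitelisted lines (extended regex, alternatives joined by |)
	# 2. keep only lines stamped with the given day
	# 3. return at most the 5 most recent lines
	grep -Ev "$whitelist" | grep -F "$day" | tail -5
}

# Synthetic sample input in the "dmesg -T" timestamp layout (not real kernel output)
sample='[Mon Jan  1 10:00:00 2024] microcode: synthetic whitelisted line
[Mon Jan  1 10:05:00 2024] synthetic error line A
[Sun Dec 31 23:59:00 2023] synthetic error line from the day before'

printf '%s\n' "$sample" | filter_kernel_lines "microcode: |Firmware Bug" "Mon Jan  1"
```

Only the second sample line should survive: the first is whitelisted and the third carries a different date.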

This script has helped a lot in detecting I/O errors early, both recoverable ones and corruptions. It even worked when the entire root partition was no longer readable, because the Nagios check then failed with "NRPE: unable to read output", which indicated that dmesg no longer worked. Because the check considers errors from the entire day, one cannot miss recovered errors that happened outside office hours.
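
The plugin above relies on NRPE failing when dmesg can no longer run. A possible variation, shown here only as a sketch, reports that case explicitly using the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The helper name nagios_result is an assumption for illustration and not part of the original script.

```shell
#!/bin/bash
# Sketch only: map a dmesg exit status and the filtered output to a Nagios state.

# nagios_result DMESG_RC FILTERED_OUTPUT
# (hypothetical helper, not part of the original plugin)
nagios_result() {
	local rc="$1" output="$2"
	if [ "$rc" -ne 0 ]; then
		# dmesg itself failed, so nothing can be said about the kernel log
		echo "UNKNOWN: dmesg exited with status $rc"
		return 3
	fi
	if [ -z "$output" ]; then
		echo "All is fine."
		return 0
	fi
	# Join all messages into one line for the Nagios status text
	printf '%s\n' "$output" | paste -sd ' ' -
	return 1
}

# Possible use inside a plugin (left as a comment so the sketch has no side effects):
#   raw=$(dmesg -T -l err,alert,emerg,crit); rc=$?
#   nagios_result "$rc" "$raw"; exit $?
```

Whether an unreadable kernel log should count as UNKNOWN or CRITICAL is a matter of local policy; the sketch picks UNKNOWN.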

The check is also good at detecting OOM kills and rapid spawning of processes.
