Nagios Plugin for dmesg Monitoring

So far I found no easy solution to monitor for Linux kernel messages. So here is a simple Nagios plugin to scan dmesg output for interesting stuff:

#!/bin/bash

SEVERITIES="err,alert,emerg,crit"
WHITELIST="microcode: |\
Firmware Bug|\
i8042: No controller|\
Odd, counter constraints enabled but no core perfctrs detected|\
Failed to access perfctr msr|\
echo 0 > /proc/sys"

# Check for critical dmesg lines from this day
date=$(date "+%a %b %e")
output=$(dmesg -T -l "$SEVERITIES" | egrep -v "$WHITELIST" | grep "$date" | tail -5)

if [ "$output" == "" ]; then
	echo "All is fine."
	exit 0
fi

echo "$output" | xargs
exit 1

"Features" of the script above:

  • It gives you the 5 most recent messages from today
  • It allows to whitelist common but useless errors in $WHITELIST
  • It uses "dmesg" to work when you already have disk I/O errors and to be faster than syslog parsing

This script helped a lot to early on detect I/O errors, recoverable as well as corruptions. It even worked when entire root partition wasn't readable anymore, because then the Nagios check failed with "NRPE: unable to read output" which indicated that dmesg didn't work anymore. By always showing all errors from the entire day one cannot miss recovered errors that happened in non-office hours.

Another good thing about the check is detecting OOM kills or fast spawning of processes.

Comments

Nagios Plugin for dmesg Monitoring | LZone

obtaining suggestions

obtaining suggestions

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

To prevent automated spam submissions leave this field empty.
Syndicate content