Server Monitoring Script
Yesterday I got an email from Rimuhosting warning me about excessive load on my server. The cause turned out to be a runaway process, left over from a buggy CGI script, that was consuming over 90% of the CPU on my VPS.
Of course, I immediately killed the process, which solved the problem.
However, it made me realize that I need a way to monitor server load and get an email if the one-minute load average goes over, say, 70%. I want cron to run the check every 15 minutes, but I want at most one warning email per hour. That email should list each incident of the load exceeding 70%, with the load level and the time of each.
Here's a bash script that does this. The minimum interval between warning emails, the trigger load level, and the averaging period (one, five, or 15 minutes) are all configurable. It is a very lightweight solution for anyone who only needs to monitor load.
#!/bin/bash
# server load monitoring script by Lloyd Standish, lloyd at crnatural.net
# This script is freeware released under GNU GPL.

loadpath="/home/lloyd/loadmon"
maxload=.70           # example .75 = 75% load average
minwarninterval=3600  # minimum interval between warning emails, seconds

# uncomment only one of the following 3 lines
loadavginterval=1     # for one minute load averages
#loadavginterval=2    # for 5 minute load averages
#loadavginterval=3    # for 15 minute load averages

#cat /proc/loadavg    # debugging

mkdir -p "$loadpath"

# time (epoch seconds) of the last warning, 0 if none sent yet
now=$(date +%s)
prev=0
if [ -f "$loadpath/loadsecs" ]; then
    prev=$(cat "$loadpath/loadsecs")
fi

# check if average load is too high; bc compares the decimal values
loadavg=$(cut -d ' ' -f "$loadavginterval" /proc/loadavg)
if [ "$(echo "$loadavg > $maxload" | bc)" -eq 1 ]; then
    echo "$loadavg $(date +%T)" >> "$loadpath/loadrpt"
fi

# warn only if incidents were logged and the minimum interval has passed
if (( now > prev + minwarninterval )) && [ -f "$loadpath/loadrpt" ]; then
    case $loadavginterval in
        1) loadminutes="one";;
        2) loadminutes="five";;
        3) loadminutes="fifteen";;
    esac
    # printed to stdout; cron mails this output (see the crontab note)
    echo "Server $(hostname): Warning, $loadminutes minute load average above $maxload! Incidents since the last warning:" | cat - "$loadpath/loadrpt"
    rm -f "$loadpath/loadrpt"
    echo "$now" > "$loadpath/loadsecs"
fi
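The rate-limiting rule at the end of the script can be tried out on its own. This sketch restates it as a function, warning_due (a hypothetical name, not part of the original script), and feeds it made-up timestamps that mimic a 15-minute cron schedule with the previous warning sent at t=0. Like the script, it uses a strictly-greater-than comparison.

```bash
#!/bin/bash
# Sketch of the rate-limiting rule at the end of the load script.
# warning_due is a hypothetical helper name, not part of the original script.

# warning_due NOW PREV INTERVAL REPORTFILE
# Succeeds only if more than INTERVAL seconds have passed since the last
# warning (PREV, epoch seconds) and the incident report file exists.
warning_due() {
    local now=$1 prev=$2 interval=$3 report=$4
    (( now > prev + interval )) && [ -f "$report" ]
}

report=$(mktemp)                      # stands in for $loadpath/loadrpt
echo "0.85 10:15:00" >> "$report"     # one logged incident

# Simulated cron runs every 900 s; the previous warning was sent at t=0.
for now in 900 1800 2700 3600 4500; do
    if warning_due "$now" 0 3600 "$report"; then
        echo "t=$now: send warning"
    else
        echo "t=$now: hold"
    fi
done

rm -f "$report"
```

Because the comparison is strict, the run at exactly 3600 seconds holds and the warning goes out at 4500 seconds, so with a 15-minute schedule the gap between emails can stretch to about 75 minutes.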
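The script does not send mail itself: it prints the warning to standard output, and cron mails any output from a job to the user it runs as (or to the address in MAILTO), provided the server has a working mail transfer agent. Make the script executable (chmod +x /home/lloyd/load.sh) and install it with a line like this in /etc/crontab: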
*/15 * * * * lloyd /home/lloyd/load.sh
Notes:
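1. bc is a command-line calculator that reads expressions from standard input. The script uses it to compare the decimal load average with maxload, because bash arithmetic handles only integers. You may need to install it on your server (Debian: apt-get install bc).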
2. /proc/loadavg contains a single line such as:
0.20 0.18 0.12 1/80 11206
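The first three fields are the load averages over the last 1, 5, and 15 minutes: the average number of processes that are runnable or waiting for disk I/O. The fourth field gives the number of currently runnable processes/threads and, after the slash, the total number that exist. The last field is the PID of the most recently created process. Note that a load average is not a percentage: on a single-CPU VPS, 1.00 roughly means the CPU was fully busy, so maxload=.70 corresponds approximately to 70% load; on a server with several CPUs, scale the threshold by the number of CPUs.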
3. The configuration options are described in the comments at the top of the script.
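Notes 1 and 2 together make up the core of the check: pick one field of /proc/loadavg and compare it, as a decimal, with the threshold. The sketch below does both steps on the example line from note 2, splitting the fields with bash's read instead of cut. The function names loadavg_field and load_exceeds are hypothetical and not part of the original script; like the script, it assumes GNU bc, which prints 1 or 0 for a bare comparison.

```bash
#!/bin/bash
# Sketch: pick one load average out of a /proc/loadavg-style line and
# compare it with a decimal threshold using bc.
# loadavg_field and load_exceeds are hypothetical helper names.

# loadavg_field LINE N
# Prints field N of LINE: 1 = 1-minute, 2 = 5-minute, 3 = 15-minute average.
loadavg_field() {
    local one five fifteen procs lastpid
    read -r one five fifteen procs lastpid <<< "$1"
    case "$2" in
        1) echo "$one" ;;
        2) echo "$five" ;;
        3) echo "$fifteen" ;;
        *) return 1 ;;   # no other field is a load average
    esac
}

# load_exceeds VALUE LIMIT
# Succeeds if VALUE > LIMIT. GNU bc evaluates the comparison and prints
# 1 or 0, since bash arithmetic cannot compare decimals.
load_exceeds() {
    [ "$(echo "$1 > $2" | bc)" -eq 1 ]
}

line="0.20 0.18 0.12 1/80 11206"   # example line from note 2
avg=$(loadavg_field "$line" 1)
if load_exceeds "$avg" .70; then
    echo "over threshold: $avg"
else
    echo "below threshold: $avg"
fi
```

On a live system, pass "$(cat /proc/loadavg)" instead of the example line.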
