No, I am not dead. I have just moved into management… I’ll let you come up with the jokes!
Today I’m going to write a technical document on how to monitor the age of a file to ensure that it is newer than a given threshold, i.e. make sure that file ‘X’ is less than ‘5 days’ old, for example. This came up during my day as I wanted to make sure that the diary I use at home (running on WordPress) is backed up to a remote location successfully once a week, so it pays to be in monitoring today!
First, I set up my crontab entries:
[root@rhelserver log]# crontab -l
0 23 * * 0 mysqldump --single-transaction -u sam -p wpblog --password=removed | gzip > "/media/nfs2/Backups/Diary/Blog-$(date '+\%Y\%m\%d').sql.gz"
0 23 * * 0 echo "Backup completed" > /var/log/diary-backup
Here we are essentially running a mysqldump against the MySQL DB behind my WordPress installation (wpblog), and storing it on a remote NFS mount point as a .gz file, with the date in the file name (so I can roll back if needed).
Also, I am creating a new file in /var/log called ‘diary-backup’. Why? Because my plugin will be executed by the nagios user, and I don’t really want to give it access to my nfs2 share (plus, it is a hassle that I don’t have time to play with). So I’m creating a file in /var/log that I’m going to chmod 755, so that nagios can access it and scrutinize the file age, which, as the file is created after the backup job, will be a real-world representation of the age of the .gz file created.
For this exercise, I used the ‘check_file_age’ plugin that ships with Opsview. However, the standard output was rather annoying and not very humanised, for example:
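One thing to watch: the two cron entries above are independent and fire at the same minute, so the marker file gets rewritten even if the dump itself fails. A minimal sketch of one way around that, assuming you are happy to move the job into a small script; the function name backup_with_marker is hypothetical and not part of the setup described here:

```shell
# Hypothetical helper (not from the original crontab): run a backup command
# and refresh the marker file only if that command exits with status 0.
# Usage: backup_with_marker MARKER_FILE COMMAND [ARGS...]
backup_with_marker() {
    marker=$1
    shift
    if "$@"; then
        # Backup succeeded: rewriting the marker resets its modification time.
        echo "Backup completed" > "$marker"
    else
        # Backup failed: leave the old marker alone so its age keeps growing.
        echo "Backup failed; $marker left untouched" >&2
        return 1
    fi
}
```

With something like this, a failed backup leaves the old marker in place, its age keeps climbing, and the file age check eventually goes critical instead of staying green.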
root@opsview-monitor:/usr/local/nagios/libexec# ./check_file_age -w 691199 -c 691200 /home/sam/.bash_history
FILE_AGE OK: /home/sam/.bash_history is 425197 seconds old and 434 bytes
This isn’t very useful to me, as I am not a computer and can’t work out if 425,000 seconds is a good thing or a bad thing. So, I modified the check_file_age plugin with the help of this guide (http://www.krzywanski.net/archives/429). Essentially, replace the line:
print "FILE_AGE $result: $opt_f is $age seconds old and $size bytes\n";
with:
my $days = $age/86400;
$days = sprintf("%.1f", $days);
print "FILE_AGE $result: $opt_f is $age seconds ($days days) old and $size bytes\n";
This way the output shows ‘days’ alongside the seconds. Next, I tested my command locally on my WordPress server:
[root@rhelserver log]# su - nagios
[nagios@rhelserver ~]$ cd /usr/local/nagios/libexec/
[nagios@rhelserver libexec]$ ./check_file_age -c 691200 /var/log/diary-backup
FILE_AGE OK: /var/log/diary-backup is 961 seconds (0.0 days) old and 17 bytes
(If you’re curious, 691200 seconds is 8 days.) So here we can see that the nagios user has access to the file in question, and we are getting data in a usable format, i.e. days, not just seconds.
Next, we need to create the NRPE entry, so the command above can be executed remotely by the Opsview monitoring server. Doing this is very simple: just add a line similar to the below in your /usr/local/nagios/etc/nrpe_local/override.cfg file (if this doesn’t exist, just create one):
[nagios@rhelserver libexec]$ tail -n1 /usr/local/nagios/etc/nrpe_local/override.cfg
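To save doing the arithmetic by hand each time, here is a small sketch of converting between days and the seconds that -w and -c expect. The function names are made up for illustration and are not part of check_file_age:

```shell
# Hypothetical helpers for converting between days and the seconds that
# check_file_age expects for -w and -c.
SECONDS_PER_DAY=86400

days_to_seconds() {
    # Whole days in, seconds out.
    echo $(( $1 * SECONDS_PER_DAY ))
}

seconds_to_days() {
    # Seconds in, days out with one decimal place (like the sprintf("%.1f") tweak).
    awk -v s="$1" -v d="$SECONDS_PER_DAY" 'BEGIN { printf "%.1f\n", s / d }'
}

days_to_seconds 8       # prints 691200
seconds_to_days 425197  # prints 4.9
seconds_to_days 961     # prints 0.0
```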
check_command[diary_backup]=/usr/local/nagios/libexec/check_file_age -c 691200 /var/log/diary-backup
The ‘diary_backup’ element is the command we will be executing from Opsview. Finally, give the opsview-agent a bounce to apply the changes:
[nagios@rhelserver libexec]$ exit
[root@rhelserver log]# /etc/init.d/opsview-agent restart
We can now test this locally:
[root@rhelserver log]# cd /usr/local/nagios/libexec/
[root@rhelserver libexec]# ./check_nrpe -H localhost -c diary_backup
FILE_AGE WARNING: /var/log/diary-backup is 1272 seconds (0.0 days) old and 17 bytes
Voilà, it’s working.
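The state at the front of that output (OK, WARNING or CRITICAL) comes from comparing the file age against the thresholds, and Nagios-style plugins report it through their exit code as well (0, 1 and 2 respectively). A rough sketch of that comparison, assuming simple "older than" thresholds; this is an illustration only, not the actual check_file_age source, and the 7-day warning value is a hypothetical choice:

```shell
# Sketch of a Nagios-style threshold check; NOT the real check_file_age code.
# Usage: file_age_status AGE_SECONDS WARN_SECONDS CRIT_SECONDS
# Prints the state and returns the matching plugin exit code.
file_age_status() {
    age=$1 warn=$2 crit=$3
    if [ "$age" -gt "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$age" -gt "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# Hypothetical thresholds: warn after 7 days (604800s), critical after 8 (691200s).
file_age_status 1272 604800 691200                            # prints OK
file_age_status 650000 604800 691200 || echo "exit code: $?"  # prints WARNING, then exit code: 1
file_age_status 700000 604800 691200 || echo "exit code: $?"  # prints CRITICAL, then exit code: 2
```

Passing an explicit -w alongside -c to the real plugin means both thresholds are ones you chose rather than whatever the plugin falls back to.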
Bring it all together in the GUI
So lastly, we need to log in to the Opsview GUI and bring this all together. First, create a new service check with the plugin as ‘check_nrpe’ and the arguments as ‘-H $HOSTADDRESS$ -c diary_backup’. Then, add this to your host (the WordPress server in my example). Finally, give it a reload and it will now be running and monitoring your backup.
There are then hundreds of things you can do, for example be notified when it goes critical or warning (ignore the WARNING above, I didn’t set a -w flag, whoops), or show it in a keyword (Monitoring > Keywords) as I have done at home.
So there you have it: I am now monitoring my diary backup cron job with Opsview to make sure it completes every week. You can use this for anything: logs, files, logins, you name it. Happy hunting!