Centralized configuration management like Puppet is a blessing, if it runs properly. So it makes sense to monitor the runs of the Puppet agent, and I wrote an NRPE plugin to do just that.
Puppet is great, and many of our customers use it, often in combination with an Icinga monitoring setup. However, it might happen that the Puppet agent, for some reason, does not run, or does not run properly. If the infrastructure is large enough, that might slip through your fingers. Thus it makes sense to monitor the Puppet client.
There are already several solutions out there to do just that. Since the Puppet agent writes plenty of status information to /var/lib/puppet/state/last_run_summary.yaml, current solutions check the time stamp of the last run in that file, or try to verify the validity of the YAML structure. However, a correct YAML structure does not tell you anything about when the Puppet agent last ran. Also, the time stamp is written even if the Puppet agent run fails in the end. There is even a Bash script which does both, but it is a difficult-to-maintain piece of code and cannot really speak YAML; it just greps for certain elements.
Thus I wrote my own script, inspired by the solutions mentioned above, which checks the last run state and also verifies that the YAML file has proper content. It is written in Python, by the way. The script can be tested on the command line:
$ sudo /usr/local/lib/nagios/plugins/check_puppetagent -w 3600 -c 9000
OK: Puppet was last run 13 minutes and 21 seconds ago
If the YAML file is not properly formatted, the script reports a critical error:
$ python check_puppetagent -w 3600 -c 9000
CRIT: Yaml file not properly formatted, last puppet run failed.
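To make the behaviour above more concrete, here is a minimal sketch of how such a check could be structured. It is not the published script. The function names are made up for illustration, and it assumes that the summary file stores the time of the last run as a Unix time stamp under time -> last_run, and that -w and -c are age thresholds in seconds. Loading the file relies on the third-party PyYAML package.

```python
import time

# Standard Nagios/NRPE plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def describe_age(seconds):
    """Render a duration as 'M minutes and S seconds'."""
    minutes, secs = divmod(int(seconds), 60)
    return "%d minutes and %d seconds" % (minutes, secs)


def evaluate(summary, warn, crit, now=None):
    """Map a parsed summary file to an (exit_code, message) pair.

    Assumption: the time of the last run is a Unix time stamp stored
    under summary["time"]["last_run"].
    """
    now = time.time() if now is None else now
    try:
        last_run = int(summary["time"]["last_run"])
    except (KeyError, TypeError, ValueError):
        # Missing keys or unexpected types: treat the run as failed.
        return CRITICAL, ("CRIT: Yaml file not properly formatted, "
                          "last puppet run failed.")
    age = now - last_run
    message = "Puppet was last run %s ago" % describe_age(age)
    if age >= crit:
        return CRITICAL, "CRIT: " + message
    if age >= warn:
        return WARNING, "WARN: " + message
    return OK, "OK: " + message


def load_summary(path="/var/lib/puppet/state/last_run_summary.yaml"):
    """Parse the summary file; raises yaml.YAMLError on malformed input."""
    # PyYAML is a third-party package, imported here so that evaluate()
    # stays usable (and testable) without it.
    import yaml
    with open(path) as handle:
        return yaml.safe_load(handle)
```

A plugin built on this would call evaluate(load_summary(), warn, crit), print the message and exit with the returned code, mapping any parse error to the same CRIT result.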
The script does not support any further options or functions. Since the YAML file contains much more information, it might make sense to pass more of it back to the monitoring server, for example the number of failures recorded in the status file, if there are any. But for now, that is not implemented.
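As a rough idea of what such an extension might look like, the following sketch reads failure counters from an already parsed summary and returns them together with Nagios performance data. This is not part of the script. The function names are hypothetical, and the key names events -> failure and resources -> failed are an assumption about the layout of the summary file that should be checked against the Puppet version in use.

```python
def count_failures(summary):
    """Return (failed_events, failed_resources) from a parsed summary.

    Assumption: the counters live under summary["events"]["failure"]
    and summary["resources"]["failed"]; missing entries count as zero.
    """
    events = summary.get("events") or {}
    resources = summary.get("resources") or {}
    return int(events.get("failure", 0)), int(resources.get("failed", 0))


def failure_status(summary):
    """Map the failure counters to an (exit_code, message) pair."""
    failed_events, failed_resources = count_failures(summary)
    # Everything after the pipe is Nagios performance data (label=value).
    perfdata = "failed_events=%d failed_resources=%d" % (
        failed_events, failed_resources)
    if failed_events or failed_resources:
        return 2, ("CRIT: Puppet run reported %d failed events and "
                   "%d failed resources | %s"
                   % (failed_events, failed_resources, perfdata))
    return 0, "OK: Puppet run reported no failures | %s" % perfdata
```

Whether a single failed resource should already be critical, or only a warning, is a policy decision. The sketch simply picks the stricter option.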
The script was also uploaded to Monitoringexchange. Since my employer strongly supports the ideas behind Open Source, I was able to publish the script under the MIT licence. I also wrote a blog post about the script on my German company’s blog.
What I totally forgot: there is also a Ruby check script which does largely the same as the Python script I wrote, and which was a good inspiration for my code.