cron

Wednesday, 24. 02. 2010  –  Category: sw

Obviously cron jobs are abundantly useful for so many things, all the way from basic housekeeping up to big application functionality.

They’re also the source of plenty of flail. What do I mean?

  • They are neither code nor data, so they often get overlooked, or shonkily installed, by application deployment tools
  • They run with a minimal environment that can catch out the unwary: scripts that work in an interactive shell sometimes don’t work from cron
  • The default behaviour of mailing output to the cronjob owner generates large amounts of mail that gets ignored, filtered or bounced
  • Jobs can fail silently and no-one notices until, say, you need to restore that backup that hasn’t run for the last six months
  • Jobs that helpfully append their output to a log commonly don’t rotate that log
  • It’s easy to have jobs overlapping if they get stuck or take longer than expected to complete. This is a splendid way of wedging a machine.
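
The minimal-environment point is easy to reproduce before a job ever reaches a crontab. Here is a rough sketch in Python that runs a command with a stripped-down environment, roughly like the one cron hands to a job. The exact values (a PATH of /usr/bin:/bin, a SHELL of /bin/sh) are assumptions that vary between cron implementations, and run_like_cron is a made-up helper name.

```python
import os
import subprocess

# Assumed approximation of cron's sparse environment; check your own
# cron's man page, as the real values differ between implementations.
CRON_LIKE_ENV = {
    "PATH": "/usr/bin:/bin",
    "SHELL": "/bin/sh",
    "HOME": os.path.expanduser("~"),
    "LOGNAME": os.environ.get("LOGNAME", "nobody"),
}


def run_like_cron(command):
    """Run a shell command line with a cron-like environment (hypothetical helper)."""
    return subprocess.run(
        ["/bin/sh", "-c", command],
        env=CRON_LIKE_ENV,          # replaces the inherited environment entirely
        stdin=subprocess.DEVNULL,   # no terminal to read from, as under cron
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )


if __name__ == "__main__":
    result = run_like_cron("echo PATH is $PATH")
    print(result.returncode, result.stdout, end="")
```

A script that relies on something from your login profile (an extended PATH, say) will usually fail here in the same way it would fail from cron.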

The mail aspect is a particular peeve. In some workplaces my mailbox has enjoyed several thousand cron-generated mails a day, and there’s no way I’m able to accurately look at each one and react to it. Mostly they contain expected output from successful job execution, so they’re easy to skip. But I don’t trust my eyes to get that right all the time.

One approach to this is to arrange for jobs to only send mail on error. This is an improvement, but it can lead you into thinking that a job is happily succeeding when in fact it’s either not running or the only-on-error logic is bust. Since cron jobs often cover essential system tasks like backing up, syncing data around and reporting, it’s vital that they don’t fail silently.

I’ve worked somewhere that tackled this by collating cron-generated mails from diverse systems into a system mailbox and pattern matching them for failure signs. This seems slightly dubious — it’s fragile and labour intensive — but at least the system also flagged if expected jobs failed to arrive and got our inboxes tamed.
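
The valuable part of that setup was noticing jobs that never reported in. A minimal sketch of that check follows, assuming each job's most recent report time is recorded somewhere and each job has a known longest acceptable silence. The job names and the find_overdue function are hypothetical, not taken from that system.

```python
from datetime import datetime, timedelta


def find_overdue(last_seen, max_gap, now):
    """Return the sorted names of jobs whose last report is older than allowed.

    last_seen: job name -> datetime of most recent report (missing = never seen)
    max_gap:   job name -> timedelta, the longest acceptable silence
    """
    overdue = []
    for job, gap in max_gap.items():
        seen = last_seen.get(job)
        if seen is None or now - seen > gap:
            overdue.append(job)
    return sorted(overdue)


if __name__ == "__main__":
    now = datetime(2010, 2, 24, 12, 0)
    expected = {
        "backup": timedelta(hours=25),
        "sync": timedelta(minutes=10),
        "report": timedelta(days=1),
    }
    seen = {
        "backup": now - timedelta(days=180),  # silently dead for months
        "sync": now - timedelta(minutes=5),
    }
    print(find_overdue(seen, expected, now))  # prints ['backup', 'report']
```

Working from a list of expected jobs, rather than from whatever happens to arrive, is what catches the job that has stopped running altogether.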

To tackle these problems I find myself writing wrappers for cronjobs. I’ve written several variants to meet the needs of different situations. Unhelpfully I call them all cronwrap. These wrappers set out to

  • Engage the amazingly useful lockrun utility to guard against multiple execution of stuck crons
  • Place cron output into timestamped logs that can be both aged out and made available to interested parties
  • Hook into local monitoring systems:
    1. On execution, update a run counter (SNMP data or some simple text file)
    2. On failure, send an SNMP trap or leave some bait for Nagios. Also, update a fail counter
    3. If lockrun has prevented a job running owing to overlap, send an SNMP trap or similarly bait Nagios
  • If required, send output by mail somewhere (sometimes this is necessary, even with the concerns listed above)
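
To make that list concrete, here is a minimal sketch in Python of the general shape. It is not any of the cronwrap variants described here: it swaps lockrun for a plain fcntl.flock lock, uses text-file counters in place of SNMP or Nagios hooks, leaves out mail entirely, and the file names and the cron_wrap and bump functions are all made up for illustration.

```python
import fcntl
import os
import subprocess
import time


def bump(path):
    """Increment an integer counter stored in a small text file."""
    try:
        with open(path) as fh:
            count = int(fh.read().strip() or 0)
    except FileNotFoundError:
        count = 0
    with open(path, "w") as fh:
        fh.write(f"{count + 1}\n")


def cron_wrap(name, command, state_dir):
    """Run command under a lock, log its output, and keep run/fail counters.

    Returns the command's exit status, or None if an earlier run still
    holds the lock. All file names here are illustrative assumptions.
    """
    os.makedirs(state_dir, exist_ok=True)
    base = os.path.join(state_dir, name)

    with open(base + ".lock", "w") as lock:
        try:
            # Non-blocking exclusive lock: fails at once if a run is in progress.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            bump(base + ".overlap")  # a monitoring check could watch this file
            return None

        bump(base + ".runs")
        # One timestamped log per run, so old logs can be aged out by date.
        log_path = f"{base}.{time.strftime('%Y%m%dT%H%M%S')}.log"
        with open(log_path, "a") as log:
            status = subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)
        if status != 0:
            bump(base + ".fails")
        return status
```

The lock is released when the lock file is closed at the end of the with block, so a job that crashes cannot leave a stale lock behind. A monitoring system would then poll the .runs, .fails and .overlap files instead of reading mail.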

So, nothing surprising there. Using such wrappers helps keep cron jobs tamed and reliable, and it’s monitoring them near to where the action occurs, rather than mediating via SMTP.

This is hardly invention either: there’s plenty of prior art with different nuances in behaviour to meet the needs of different environments. Perhaps I’ll merge the variants of my efforts and publish too.

What’s curious is that this functionality isn’t available inside the cron daemon[1] itself. It is perfectly placed to catch exit status, divert output and know if a job has overrun; and would remove the need for all this additional monkeying to make jobs reliable and well behaved. If my C wasn’t just read-only I’d have a crack at it!

There, I’ve finally condensed all my cron rant into one sustained piece.

Update: I posted a cron wrapper at https://github.com/zomo/cronwrap.

  1. To be clear, I’m talking about the BSD cron written by Paul Vixie. None of the variants I’ve seen address these concerns either. I’d love to know if there are any I’ve missed.

6 Responses to “cron”

  1. Jeff Says:

    I realize that no one uses it outside of Mac OS X, but I like launchd for a lot of the reasons you don’t like cron.

  2. Zach Peters Says:

    Weird, I was just on the same vibe earlier. I’m working up a blog post on a sort of cron dashboard. It won’t be anything close to production quality, but hopefully it will be usable.

    Can I pick your brain for an example cron wrapper script?

  3. lemon Says:

    Sure, one variant has a diddy web interface that a generous onlooker might call a dashboard at a stretch! It was written on someone else’s dime though, so I’ll check if they’re cool for me to publish.

  4. Cyde Weys Says:

    As a suggestion for a follow-up post, could you post some example cronwraps you’ve written? I would find those very useful and informative. I currently have problems with one job occasionally overrunning past its next invocation, so I’m going to give lockrun a go, but I’d also like to see how you handle logging and output direction.

  5. JohnK Says:

    Cron follows UNIX’s philosophy for programs: do one thing and do it well. Cron simply schedules jobs/scripts/programs. Whatever runs or doesn’t run is your problem. (.. and it does seem to be a problem to you.) I far prefer being able to do my own monitoring and management of failed jobs than be constrained to one method built into cron.
    Your problems all seem to fit into a scripting category. So maybe that’s what you should be looking at.

  6. lemon Says:

    Absolutely. Using such wrappers is my monitoring and management solution to failing and misconstructed cron jobs. I don’t get to vet every cron that is installed on platforms I admin, but I do get to encourage the use of these wrappers to sidestep common pain points.
