
at job spool growing

How it works

The at command is often used to start a process at a later time. It can run any script at a specified date and time.

For instance, if you want the computer to send you a signal in one hour, use something like:

cat signal | at -q z now + 60 minutes

This assumes that the file named signal contains the commands necessary to generate the signal. Then, in 60 minutes, the script will be executed and the signal will be heard or shown.
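
As a concrete example, the signal file could broadcast a message to every logged-in terminal. This is only a hypothetical sketch: at normally mails a job's output to you rather than printing it on your screen, so wall is a simple way to make the reminder actually show up:

echo "One hour has passed." | wall    # broadcast to all logged-in terminals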

Too many files

Whenever you run on a system that may randomly kill your processes, some of your daemons simply disappear from time to time. If you do not pay attention, that daemon could be cron or atd... and those you may not notice right away (I'll tell you, when it is PostgreSQL or Apache, we notice immediately!)
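
One way to soften the blow is a tiny watchdog entry in root's crontab. This is just a sketch, and it assumes cron itself survived the killing spree, which is exactly the kind of assumption this story undermines:

# hypothetical watchdog: restart atd within a minute if it disappears
* * * * * pidof atd > /dev/null || /etc/init.d/atd start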

It happened to us! atd died. Since I have processes that wake up all the time and add at commands, files kept piling up in the /var/spool/cron/atjobs folder. Our server's folder had over 25,000 files! We create a new at job each time a hacker comes in and attempts to hack one of our sites...
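
If you wonder how bad it is on your own system, counting the spooled jobs is a one-liner (the path is the Debian one used here; other distributions may use /var/spool/at instead):

ls /var/spool/cron/atjobs | wc -l    # number of pending at jobs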

So...

1) atd runs happily

2) it gets killed because the system decided so

3) it does not run and files accumulate like crazy

4) it cannot be restarted because there are too many files!

Point 4 was the problem for me. I still get that silly error saying that some PAM setup is preventing the daemon from starting, but that's just because PAM is like that with most daemons.

Sample of errors you can ignore:

May  2 01:44:45 web atd[9583]: PAM adding faulty module: /lib/security/pam_unix.so

May  2 01:25:09 web atd[19653]: PAM _pam_init_handlers: [Critical error - immediate abort]

Notice that the pam_unix module is called faulty! And the [Critical error - immediate abort] has nothing to do with PAM itself; I guess the fact that something crashed at a lower level made the system print out that error.

An error that I missed is the "Cannot allocate memory" presented below. I think that's because I did not expect it to happen. But thinking of it... when atd comes back up, it apparently tries to fork a process for every overdue job, and doing that for 25,000 jobs at once requires quite a bit of RAM. Nevertheless, once I deleted all the files, it was just fine.

May  2 01:44:45 web atd[9441]: Cannot fork: Cannot allocate memory
May  2 01:44:45 web atd[9563]: Error in fork: Cannot allocate memory
May  2 01:44:45 web atd[9560]: Error in fork: Cannot allocate memory
May  2 01:44:45 web atd[9659]: Exec failed for /bin/sh: Success
May  2 01:44:45 web atd[9583]: Authentication failure
May  2 01:44:45 web atd[9706]: Exec failed for /bin/sh: Success
May  2 01:44:45 web atd[9536]: Error in fork: Cannot allocate memory
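
To check whether your own atd is hitting the same wall, grep the system log for its messages (this assumes a Debian-style /var/log/syslog):

grep atd /var/log/syslog | tail -n 20    # last 20 atd log entries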

Solution

1) atd cannot be started with:

/etc/init.d/atd start

This is true even though the command says that it succeeded: atd simply crashes a little later, so it really was getting started; it just did not stay up.
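
A quick way to tell whether the daemon actually stayed up is to wait a moment and then look for its process (a small sketch using pidof):

/etc/init.d/atd start
sleep 10
pidof atd || echo "atd died again"    # no PID means it crashed after starting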

2) Check the /var/spool/cron/atjobs folder. If you see a really large number of files (larger than expected), delete them and try again. Note that you may want or need to execute all the job files instead of deleting them! (Both options are sketched below.)
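
Here is a minimal cleanup sketch, assuming the Debian spool path and that the pending jobs are disposable. The bare * glob conveniently skips the hidden .SEQ bookkeeping file that at keeps in the same folder:

rm /var/spool/cron/atjobs/*    # throw away the pending jobs
/etc/init.d/atd start          # now the daemon should start and stay up

If the jobs must not be lost, each file is essentially a self-contained shell script, so something along these lines could run them before removal. Again, just a rough sketch; it assumes all the jobs were queued by the same user you run this as and that they are safe to execute one after the other:

# run each pending job, removing it only once it succeeds
for job in /var/spool/cron/atjobs/*; do
    sh "$job" && rm "$job"
done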