Real World Mac Maintenance and Backups

We present four troubleshooting scenarios to demonstrate different strategies for resolving some common problems with cron. The examples are limited to cron because cron is by far the most popular of the utilities in this chapter. Also, these examples should provide the information needed to troubleshoot problems in the other utilities covered.

The first scenario shows how to use the information cron provides. Many issues can be resolved just by understanding what the emails from cron and cron's syslog messages mean. The second scenario shows that cron can start a process only to have it hang. The third scenario shows a problem with the cron daemon. Finally, we cover a problem common to many log files.

Scenario 10-1: Using cron Output for Debugging

The database administrator complains that his report doesn't run. Not that our DBA would mislead us intentionally, but let's check the crontab to verify the job entry.

We can confirm the job entry with the crontab command. We use the -u dbprod option to look at another user's crontab. Only root can do this.

# crontab -u dbprod -l # DO NOT EDIT THIS FILE - edit the master and reinstall. # (/tmp/crontab.3389 installed on Sun Aug 15 15:04:14 2004) # (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $) 06 15 * * * /prod_serv/scripts/prod_daily_reports

We could just cat the crontab file as well:

# cat /var/spool/cron/dbprod # DO NOT EDIT THIS FILE - edit the master and reinstall. # (/tmp/crontab.3389 installed on Sun Aug 15 15:04:14 2004) # (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $) 06 15 * * * /prod_serv/scripts/prod_daily_reports

The crontab entry looks good. The next step is to check the cron log to see if we can determine whether cron started the job. There should be a line in /var/log/cron similar to the following:

Aug 15 15:06:00 sawnee CROND[3400]: (dbprod) CMD (/prod_serv/scripts/prod_daily_reports)

The cron log shows that cron did start the job at 15:06, and the process ID is 3400. We know the job ran. But did it finish? The crontab command line doesn't redirect standard output (stdout) or standard error (stderr), so any output should go to dbprod's mail.

From dbprod@sawnee.somecomp.com Sun Aug 15 15:06:00 2004 Date: Sun, 15 Aug 2004 15:06:00 -0400 From: root@sawnee.somecomp.com (Cron Daemon) To: dbprod@sawnee.somecomp.com Subject: Cron <dbprod@sawnee> /prod_serv/scripts/prod_daily_reports X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <HOME=/home/dbprod> X-Cron-Env: <PATH=/usr/bin:/bin> X-Cron-Env: <LOGNAME=dbprod> /bin/sh: line 1: /prod_serv/scripts/prod_daily_reports: No such file or directory

We can see from the message that the DBA probably made a typo. We should confirm this idea:

# ll /prod_serv/scripts/ total 12 -rwx------ 1 dbprod root 73 Aug 5 17:32 prod_backup_1 -rwx------ 1 dbprod root 42 Jul 27 18:15 prod_backup_2 -rwx------ 1 dbprod root 79 Aug 15 13:51 prod_daily_report

Looks like the DBA meant to run prod_daily_report rather than prod_daily_reports.

If dbprod's crontab redirects stdout (standard out) and stderr (standard error) to /dev/null, the output is thrown away. 1 is always the file number of stdout, and 2 is the file number for stderr. The >/dev/null 2>&1 on the following crontab entry means make stdout /dev/null and make stderr the same file as stdout, which throws away the script output and any error messages:

06 15 * * * /prod_serv/scripts/prod_daily_reports >/dev/null 2>&1

The /dev/null file is not a real file. The device file throws away anything written to it:

# ls -l /dev/null crw-rw-rw- 1 root root 1, 3 Oct 2 2003 /dev/null

cron jobs often redirect output to /dev/null to avoid having to trim a log file or delete lots of emails from cron. If the cron job stops working, the crontab line should be modified to send stdout and stderr to a log file so that any output from the script can be examined:

06 15 * * * /prod_serv/scripts/prod_daily_reports >/tmp/dbprod.out 2>&1

The crontab could be changed to email the user with any output:

06 15 * * * /prod_serv/scripts/prod_daily_reports

Set the time fields so that the job runs again and see whether any output is generated.

Scenario 10-2: Using pstree to Find a Hanging cron Job

Ted, a software engineer, complains that his code sweep didn't run. Using the troubleshooting steps from the first scenario, we verify that Ted's job should have been run at 7:47 a.m.:

# crontab -u ted -l # DO NOT EDIT THIS FILE - edit the master and reinstall. # (/tmp/crontab.5691 installed on Mon Aug 16 07:45:59 2004) # (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $) 47 07 * * * /home/ted/sweep.sh

The job was started according to the cron log:

Aug 16 07:47:00 sawnee CROND[5697]: (ted) CMD (/home/ted/sweep.sh)

The user ted has not received email from cron. We know the job was started but never finished. Thus, it is probably still running. The pstree command is a good way to see what cron is doing. pstree shows process parent-child relationships in a tree format. Here is the entry for ted's job:

If necessary, we could look at the output from ps -ef to see more detail. We can see from the cron log that the process ID for the job is 5697. The process might be working normally or could be hanging, but cron is doing its job.

Scenario 10-3: Jobs Not Started Because cron Daemon Is Not Running

Ted's back. He says his code sweep didn't run. Running through the steps, we check the crontab and cron log.

The crontab has a job that should have run at 9:15 a.m.

[root@sawnee root]# crontab -u ted -l # DO NOT EDIT THIS FILE - edit the master and reinstall. # (/tmp/crontab.5962 installed on Mon Aug 16 09:13:24 2004) # (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $) 15 09 * * * /home/ted/sweep.sh

Ted modified his crontab this morning but was finished before 9:15. cron should have run the job but apparently did not because there is no line in the cron log indicating that the job was started.

Aug 16 09:12:48 sawnee crontab[5962]: (ted) BEGIN EDIT (ted) Aug 16 09:13:24 sawnee crontab[5962]: (ted) REPLACE (ted) Aug 16 09:13:24 sawnee crontab[5962]: (ted) END EDIT (ted)

Next, we check whether crond is running:

# ps -ef|grep cron root 5977 5832 0 09:23 pts/7 00:00:00 grep cron

The crond binary is not running. Let's restart it and then try to figure out what happened.

# /etc/init.d/crond start Starting crond: [ OK ]

We can see crond is now running.

# ps -ef|grep cron root 6001 1 0 09:31 ? 00:00:00 crond root 6005 5832 0 09:31 pts/7 00:00:00 grep cron

We check /var/log/cron, and it seems crond is working:

Aug 16 09:31:14 sawnee crond[6001]: (CRON) STARTUP (fork ok)

We probably need to run a test cron job just to confirm that cron is working. If so, we move on and try to determine what happened. We check /var/log/messages for system messages that may show a problem. We verify that the cron packages are installed properly.

What cron packages can we find?

# rpm -q -a|grep cron crontabs-1.10-5 vixie-cron-3.0.1-74 anacron-2.3-25

The crond binary is probably delivered with vixie-cron. We confirm this suspicion:

# rpm -q -filesbypkg vixie-cron-3.0.1-74 vixie-cron /etc/cron.d vixie-cron /etc/rc.d/init.d/crond vixie-cron /usr/bin/crontab vixie-cron /usr/sbin/crond vixie-cron /usr/share/man/man1/crontab.1.gz vixie-cron /usr/share/man/man5/crontab.5.gz vixie-cron /usr/share/man/man8/cron.8.gz vixie-cron /usr/share/man/man8/crond.8.gz vixie-cron /var/spool/cron

We see /usr/sbin/crond is delivered with vixie-cron. Now, we just make sure vixie-cron-3.0.1-74 is installed properly:

# rpm -V vixie-cron-3.0.1-74

No output means that the package is installed properly. Maybe someone with root access killed crond by accident. We could also check for updates to the package that might correct a new bug.

See Chapter 8, "Linux Processes: Structure, Hangs, and Core Dumps," for further troubleshooting guidance.

Scenario 10-4: cron Log Not Updating

Users say cron is working fine, but a system administrator notices that the cron log isn't being updated. The cron log is zero bytes in size.

# ls -l /var/log/cron -rw------- 1 root root 0 Aug 31 20:26 /var/log/cron

Let's run a command to generate a cron log update.

# crontab -l no crontab for root

That should have generated an entry in the cron log even though root has no crontab file, but it didn't.

# ls -l /var/log/cron -rw------- 1 root root 0 Aug 31 20:26 /var/log/cron

We know cron uses syslogd to do its logging. Let's make sure syslogd is running and working:

# ps -ef|grep syslog root 1566 1 0 Aug23 ? 00:00:00 syslogd -m 0 root 28889 28848 0 20:21 pts/3 00:00:00 grep syslog

We can see syslogd is running. Is it still logging messages?

# tail -3 /var/log/messages Aug 31 17:36:59 sawnee sshd(pam_unix)[28511]: session closed for user root Aug 31 20:19:36 sawnee sshd(pam_unix)[28846]: session opened for user root by (uid=0)

From the previous output, we can see syslogd was working as of Aug 31 8:19 p.m. If there were no recent entries in the messages log, we could use the logger command to generate a test message:

# logger -p kern.info "test from sysadmin" # tail -1 /var/log/messages Aug 31 20:30:47 sawnee root: test from sysadmin

Where do cron messages go?

# grep cron /etc/syslog.conf *.info;mail.none;news.none;authpriv.none;cron.none /var/log/messages # Log cron stuff cron.* /var/log/cron

Messages should be routed to /var/log/cron. Because syslogd is doing its job, something must be wrong with the cron file itself.

Someone with root access might have tried to trim the file incorrectly by removing or moving it and then touching cron to start a new log. If we look for cron files, we can see a cron.bkup that seems out of place.

# ls -l /var/log/cron* -rw------- 1 root root 0 Aug 31 20:26 /var/log/cron -rw------- 1 root root 19795 Aug 29 04:02 /var/log/cron.1 -rw------- 1 root root 13506 Aug 22 04:02 /var/log/cron.2 -rw------- 1 root root 19512 Aug 15 04:02 /var/log/cron.3 -rw------- 1 root root 12288 Jul 28 22:01 /var/log/cron.4 -rw------- 1 root root 10254 Aug 31 20:27 /var/log/cron.bkup

Sure enough, our crontab entry went to the old file.

# til -1 /var/log/cron.bkup Aug 31 20:27:19 sawnee crontab[28906]: (root) LIST (root)

Moving a file using the mv command within the same filesystem while a file is open for writing generally causes problems. The program attached to the open file descriptor continues writing to the old file until the file descriptor is closed. This happens even though the filename has changed. The file descriptor points to the open inodes on the disk. In this instance, even though the file has a new name, syslogd keeps writing to it. The solution is to restart syslogd or put the file back. The syslogd(8) man page tells us to restart syslogd with:

kill -SIGHUP 'cat /var/run/syslogd.pid'

This command tells syslog to close and open the files again and fixes the logging issue. More syslog information can be found in the syslogd(8) and syslog.conf(5) man pages.

# crontab -l no crontab for root # tail -1 /var/log/cron Aug 31 20:39:49 sawnee crontab[28949]: (root) LIST (root)

The cron log is updating properly.

Methodology

The flow chart in Figure 10-2 shows the troubleshooting methodology that was used in the examples.

Figure 10-2. Troubleshooting methodology flow chart

Категории