Backup & Recovery: Inexpensive Backup Solutions for Open Systems

3.8. Backing Up and Restoring with the cpio Utility

cpio is a powerful utility. Unlike dump, it works on the file level. For this reason, it handles changing filesystems a little better than dump, but it changes the access time (atime) of files as it is backing them up. (It does have an option to reset atime, but this changes ctime.) Unless you're using GNU cpio, one of cpio's biggest challenges is compatibility between different operating systems. In addition, cpio requires you to specify the files to include on standard input, which sets it apart from most other backup tools.

cpio does make you do more work than dump does. This means you need to know a little bit more about how it works if you want to use it for regular system backups. You need to understand:

  • How to use find with cpio to do full and incremental backups of a filesystem, while leaving the access time (atime) of the files unmodified

  • What arguments give you the best results

  • How to use rsh or ssh to send a cpio backup to a remote backup drive

  • How to get a table of contents of that volume

  • How to manipulate a tape drive and restore from a backup created by cpio

One good thing about cpio is that its name is usually cpio. (A great advantage over dump to be sure!)

Mac OS users: Remember to use the native cpio if you're running a version of Mac OS later than 10.4. Otherwise, use ditto if you need cpio format.

Let's start with the basic syntax of cpio, followed by some example commands.

cpio's backup syntax is as follows:

cpio -o [aBcv]

cpio's restore syntax is as follows:

cpio -i [Btv] [patterns]

The following example command creates a full backup of /home to a local tape drive:

$ cd /home
$ touch level.0.cpio.timestamp

The touch command is optional, but it makes incremental backups possible.

$ find . -print|cpio -oacvB > device

Of course, device in the preceding command could also be a local file, for example if you are backing up to an optical device such as a CD. The following commands create an incremental backup of /home to a local tape drive:

$ cd /home
$ touch level.1.cpio.timestamp
$ find . -newer level.0.cpio.timestamp -print \
|cpio -oacvB > device

These commands create a full backup of /home to a remote tape drive:

$ cd /home
$ find . -print|cpio -oacvB \
|(rsh remote_system dd of=device bs=5120)

Here's a more secure method that uses ssh:

$ find . -print|cpio -oacvB \
|(ssh remote_system dd of=device bs=5120)

3.8.1. The Syntax of cpio When Backing Up

The cpio command takes its list of files from standard input (stdin) and by default sends its data stream to standard output (stdout). To provide a list of files to back up, do anything that generates a list of files:

  • Use ls or find (e.g., ls | cpio -oacvB).

  • Create an include file, then send it to the stdin of cpio (e.g., cat /tmp/include | cpio -oacvB, or cpio -oacvB </tmp/include).

All the preceding examples generate an include list whose paths are relative to the current working directory. dump does this automatically, but with cpio you can use either relative paths (e.g., cd /home;find .) or absolute paths (e.g., find /home1). However, using absolute paths severely limits your restore flexibility. If a table of contents of your cpio file shows /home1/directory/somefile, you can restore it only to /home1/directory/somefile. (Sometimes it is possible to use chroot to work around this, but it is very tricky!) On the other hand, if the table of contents shows ./home1/directory/somefile or home1/directory/somefile, you can restore it anywhere you want by changing to another directory and running the restore from there. Therefore, you should always use relative paths when creating include lists for cpio or tar. (GNU tar suppresses absolute paths during a restore, but it is still better to develop the habit of using relative paths when creating include lists for either of these backup utilities.)
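If you already have an include list of absolute paths, you don't necessarily have to regenerate it to get relative names. A minimal sketch, assuming every line in the list is an absolute path: strip the leading slashes with sed and run cpio from /, so /home1/directory/somefile is stored as home1/directory/somefile. The function name relativize_list is hypothetical, not a standard command.

```shell
# Hypothetical helper (not a standard command): read an include list of
# absolute paths on stdin and print the same paths relative to /.
# Blank lines are dropped first, so they cannot turn into "." and pull in
# everything; a bare "/" becomes "." on purpose.
relativize_list() {
    sed -e '/^$/d' -e 's|^/*||' -e 's|^$|.|'
}
```

For example, cd / && relativize_list < /tmp/include | cpio -oacvB > device writes an archive whose names can later be restored under any directory.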

find is the usual method for making regular system backups because it can make cpio perform incremental backups. Before beginning a full backup of a filesystem or directory, create a timestamp file in the top-level directory. For example, if you want to do incremental backups of /home1 with the native version of cpio, create a file called /home1/level.0.cpio.timestamp. Then perform the full backup, using a find command that lists the entire contents of that directory or filesystem (e.g., find . -print). When it is time for a level 1 backup, create the file /home1/level.1.cpio.timestamp and use a find command that looks for files newer than /home1/level.0.cpio.timestamp (e.g., find . -newer level.0.cpio.timestamp). The level.1.cpio.timestamp file then becomes the reference point for a level 2 backup, which uses a find command that looks for files newer than that file. You can use this technique to generate as many levels of backups as you wish.
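The timestamp procedure above lends itself to a small shell function. The following is a minimal sketch, not part of any cpio distribution: the function name cpio_level_list is hypothetical, it assumes the level.N.cpio.timestamp naming used in this section, and it only prints the file list, leaving the choice of cpio options to you. Like the manual procedure, it creates the new timestamp before listing files, so a file that changes while the backup runs is picked up again by the next level.

```shell
# Hypothetical helper (name and interface are illustrative): print the file
# list for a level-N backup of the current directory, following the
# level.N.cpio.timestamp convention. Pipe the output to cpio.
cpio_level_list() {
    level=$1
    case $level in
        ''|*[!0-9]*) echo "usage: cpio_level_list LEVEL" >&2; return 2 ;;
    esac
    if [ "$level" -gt 0 ]; then
        prev="level.$((level - 1)).cpio.timestamp"
        # Refuse to run if the lower-level timestamp is missing.
        if [ ! -f "$prev" ]; then
            echo "cpio_level_list: $prev not found; run level $((level - 1)) first" >&2
            return 1
        fi
    fi
    # Mark the start of this backup so the next level can be based on it.
    touch "level.$level.cpio.timestamp" || return 1
    if [ "$level" -eq 0 ]; then
        find . -print                  # level 0: everything
    else
        find . -newer "$prev" -print   # level N: changed since level N-1 began
    fi
}
```

For example, cd /home1 && cpio_level_list 1 | cpio -oacvB > device performs a level 1 backup. Checking for the lower-level timestamp first keeps a mistyped level number from quietly producing the wrong set of files.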

3.8.2. The Options to the cpio Command

Six options are worth using when making regular cpio backups. The first five usually are listed together (e.g., -oacvB), and the last one usually is listed as a separate argument (e.g., -C 5120). Note that the -B and -C options are mutually exclusive, so any single command uses only one of them.

o

The o option specifies that a backup should be created.

a

The a option resets atime to its value before the backup.

c

The c option tells cpio to use the ASCII header format.

v

The v option results in verbose output.

B, C

The B and C options let you specify the block size.

In addition, some versions of cpio let you specify a device or file to which cpio sends its output (the -O option), rather than sending it to stdout. All of these options and more are available in the GNU version of cpio, as is the ability to use remote devices.

Use GNU cpio if You Can!

GNU cpio brings a lot of functionality to the table, and there are three very good reasons for using it if you can:

The native cpio utility is not very portable, even when it says it is. However, if you write a backup using GNU cpio, you can always read it as long as you have GNU cpio on your system, no matter what platform it is.

The portable ASCII format also has limitations. For example, it cannot handle a filesystem with more than 65,536 inodes. The newc header format available in GNU cpio has overcome this limitation.

It supports remote devices just like dump! As long as it's OK to use rsh authentication, all you have to do is give the -O option a host:device argument:

$ find . -print|cpio -oacv -O remote_host:/device_name

GNU cpio is available at http://www.gnu.org.

3.8.2.1. Specifying the output mode (o)

The o option is one of the three modes of cpio (o, i, and p) and is used to create a backup. It is listed as the first of several arguments.

3.8.2.2. Restoring access times (a)

One of the differences between dump and cpio is that dump backs up directly using the disk device, whereas cpio must go through the filesystem. Therefore, when cpio reads a file to back it up, it changes the file's access time (atime). System administrators typically use this value to see when a user last read or otherwise used a file, and files that have not been accessed in a long time are often removed from the system as part of a cleanup process. If your backup program changes the access time of a file, it appears as if every file is used every night. The a option tells cpio to reset atime to its original value after reading the file.

Restoring access times causes ctime to change. This could trigger some hacker alerts if you're watching these things closely.

3.8.2.3. Specifying the ASCII format (c)

When cpio backs up, it can send the data to the backup device using a number of header formats. These formats can be very platform-dependent, and therefore not very exchangeable between systems. The most exchangeable format (although not completely exchangeable) is called the ASCII format. The c option tells cpio to use this format. As mentioned in the sidebar "Use GNU cpio if You Can!", this format may not be as interchangeable as you might think. If you are really concerned with portability, you should consider using GNU cpio. If you can't use it, you should try transferring cpio files between the different flavors of Unix that you have. At least you will know where you stand. Either way, using the c option can't hurt.

3.8.2.4. Requesting verbose output (v)

The v option causes cpio to print the list of files that it backs up to standard error (stderr). The actual data of the cpio backup goes to standard output (stdout). (The backup data always goes to stdout, unless your version of cpio supports the -O option, which can specify an output file or device.)
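Because the file list and the backup data travel on separate streams, you can keep a record of what went into each backup simply by redirecting stderr. This sketch assumes a cpio that writes its verbose list to stderr as described above; the wrapper name cpio_backup_with_log is hypothetical. With GNU cpio, the log also picks up cpio's own messages, such as the block count it prints when it finishes.

```shell
# Hypothetical wrapper: back up the current directory to $1 (a file or a
# device) and save cpio's verbose file list, which goes to stderr, in the
# log file $2. Only cpio's stderr is redirected; find's own error messages
# still go to the terminal.
cpio_backup_with_log() {
    out=$1 log=$2
    find . -print | cpio -oacvB > "$out" 2> "$log"
}
```

Keep the output file and the log outside the directory being backed up, or cpio will try to archive them too.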

3.8.2.5. Specifying a blocking factor of 5,120 (B)

The B option simply tells cpio to send its data to stdout in blocks of 5,120 bytes, instead of the default block size of 512 bytes. This can help the backup to go faster. However, it is nowhere near the large blocking factors that many modern backup drives prefer. You should therefore use the C option listed next if it is available on your system. The two options are mutually exclusive.

3.8.2.6. Specifying an I/O block size (C)

The C option requires an argument, which specifies the actual block size. If you are on AIX, the value is a blocking factor, which is multiplied by the minimum block size of 512 bytes. Most other Unix versions take the value in bytes.[‡]

[‡] This time, it's HP that's the strange one! It doesn't have a similar method for setting block size, and the -C option on HP does something totally different, causing it to use checkpoints. It has nothing to do with the blocking factor at all. (The feature isn't such a bad idea, but couldn't they have used another letter?)

Either way, you can set this value to be quite large, allowing cpio to perform much better with modern backup drives. Once again, this option is mutually exclusive with the B option and usually is listed separately with its argument, as in the following example:

$ find . -print|cpio -oacv -C 129024 >device
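Because -C means different things on different platforms, a small helper can compute the right argument. This is a sketch based only on the behavior described above (AIX takes a blocking factor in 512-byte units, HP-UX uses -C for checkpoints, most others take bytes); the name cpio_c_arg is hypothetical, and you should confirm the details in your own manpage.

```shell
# Hypothetical helper: print the value to pass to cpio -C for a desired
# I/O block size in bytes. The OS name defaults to the output of uname -s.
cpio_c_arg() {
    bytes=$1
    os=${2:-$(uname -s)}
    case $bytes in
        ''|*[!0-9]*) echo "cpio_c_arg: block size must be a number of bytes" >&2; return 2 ;;
    esac
    case $os in
        AIX)
            # AIX: -C takes a blocking factor, multiplied by 512 bytes.
            if [ $((bytes % 512)) -ne 0 ]; then
                echo "cpio_c_arg: $bytes is not a multiple of 512" >&2
                return 1
            fi
            echo $((bytes / 512)) ;;
        HP-UX)
            # HP-UX: -C controls checkpointing, not the block size.
            echo "cpio_c_arg: -C does not set the block size on HP-UX" >&2
            return 1 ;;
        *)
            # Most other systems: -C takes the size in bytes.
            echo "$bytes" ;;
    esac
}
```

For example, find . -print|cpio -oacv -C "$(cpio_c_arg 129024)" >device passes -C 252 on AIX (129,024 / 512 = 252) and -C 129024 elsewhere.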

3.8.2.7. Specifying an output device or file (O)

Some versions of cpio allow you to specify a -O device argument, which causes the output to go to device. (This option is not always available.) All versions of cpio, however, default to sending the backup data to stdout. Once again, for simplicity, you don't have to use the -O option even if it is available. To specify a backup device, simply redirect stdout to a file or device. This method always works, no matter what version of Unix you are using.

3.8.2.8. Backing up to a remote device (piping to an rsh or ssh command)

The native version of cpio does not automatically support remote devices in the way that dump does. (The GNU cpio version does do this.) So, in order to back up to a remote backup drive, you need to replace the > device option with a pipe to an rsh or ssh command:

$ find . -print|cpio -oacv \
| rsh remote_system dd of=device bs=5k

Here's a more secure version:

$ find . -print|cpio -oacv \
| ssh remote_system dd of=device bs=5k

Notice that it is piped to a dd command on the remote host. Since the input file is stdin, you need only specify the output file (of=) and the block size. You need to specify the 5 K block size because that is readable by any version of cpio.
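The rsh and ssh pipelines differ only in the transport, so they can share one wrapper. A minimal sketch: the function name remote_cpio_backup and the REMOTE_SHELL variable are hypothetical, and it uses the -B / bs=5120 pairing from the remote examples at the start of this section.

```shell
# Hypothetical wrapper: back up the current directory to a tape drive on
# another host. REMOTE_SHELL (an illustrative variable name) picks the
# transport: ssh by default, or rsh where rsh authentication is acceptable.
# It is left unquoted on purpose so it can carry options, e.g. "ssh -p 2222".
remote_cpio_backup() {
    host=$1 device=$2
    find . -print | cpio -oacvB |
        ${REMOTE_SHELL:-ssh} "$host" dd of="$device" bs=5120
}
```

For example, cd /home && remote_cpio_backup remote_system device uses ssh, while REMOTE_SHELL=rsh remote_cpio_backup remote_system device uses rsh.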

3.8.3. Restoring with cpio

The same rules apply to cpio as to any other restore command. I hope that you aren't sitting there with a cpio volume in your hand that contains your very critical system backup, and you've never restored with cpio before. Remember, test, test, test, and practice, practice, practice! OK, now that I'm off my soapbox, don't worry. Restoring from a cpio volume isn't that hard, although there are a number of possible challenges that you may face when trying to read a cpio volume.
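One way to put that advice into practice is to restore every new backup procedure into a scratch directory and compare the result with the original. The sketch below does this for a directory tree using only cpio options covered in this section plus find, mktemp, and diff -r. The function name cpio_verify_roundtrip is hypothetical; it writes the archive to a file rather than a tape, and it only checks file contents, not ownership or permissions.

```shell
# Hypothetical helper: back up directory $1 to archive file $2 with cpio,
# restore that archive into an empty scratch directory, and compare the two
# trees with diff -r. Exit status 0 means the restored files match.
# Keep $2 outside $1, or the archive will try to back up itself.
cpio_verify_roundtrip() (
    src=$1 archive=$2
    scratch=$(mktemp -d) || exit 1
    status=1
    # Back up with relative paths, then restore from inside the scratch
    # directory, creating directories (d) and keeping modification times (m).
    if (cd "$src" && find . -print | cpio -oc) > "$archive" 2>/dev/null &&
        (cd "$scratch" && cpio -icdm) < "$archive" 2>/dev/null
    then
        diff -r "$src" "$scratch"   # prints nothing and exits 0 if identical
        status=$?
    fi
    rm -rf "$scratch"
    exit "$status"
)
```

Run it against a small, representative directory (for example, cpio_verify_roundtrip /home1/curtis /tmp/test.cpio) before you trust a new combination of cpio options.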

This next section assumes that you know the volume was made with cpio and that you know its block size. If you do not have this information, see the section "How Do I Read This Volume?" in Chapter 23.

3.8.3.1. Different versions of cpio

Just because you know that a backup volume was written in cpio format doesn't mean you can read it easily. This is because, although most versions of cpio are called cpio, they don't always produce the same format. Even the ASCII header that is intended to provide portability is not readable among all platforms. If you just want to see if you can read the volume, try a simple cpio -itv < device. If that works, then you're golden! If it doesn't work, you might get errors like:

Not a cpio file, bad header

or:

Impossible header type

GNU cpio can save you hours of work. If you have GNU cpio, you could skip this whole section. The following is an excerpt from the GNU cpio manpage: "By default, cpio creates binary format archives, for compatibility with older cpio programs. When extracting from archives, cpio automatically recognizes which kind of archive it is reading and can read archives created on machines with a different byte-order."

3.8.3.2. Byte-order problems

If you are reading the volume on a type of platform that is different from the one on which the volume was written, you might have a byte-order problem, and you will probably get the first of the two preceding errors. The b, s, and S options to cpio are designed to help with byte-order problems:

$ cpio -itbv < device   # Reverse the order of the bytes within each word.
$ cpio -itsv < device   # Reverse the order of the bytes within each half word.
$ cpio -itSv < device   # Swap half words within each word.

Reversing the byte order may allow you to read the cpio header, but it may render the restored files useless. If the volume was not made with the c option, your best bet is to restore it on a system with the same byte order. (Consult the section "How Do I Read This Volume?" in Chapter 23 for more information about byte order.)
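To find out which byte order your restore host uses, and to see exactly what a pairwise byte swap does to data, you can use od and dd. This sketch assumes a POSIX od that supports -An -tu2 and a dd with conv=swab; the function names host_byte_order and swab_demo are hypothetical. The two bytes 0x01 0x00 read as the number 1 on a little-endian machine and as 256 on a big-endian one, and conv=swab swaps each pair of bytes, which is the same swap the s option describes.

```shell
# Hypothetical helper: print the byte order of this machine by reading the
# bytes 0x01 0x00 back as one unsigned 2-byte integer with od.
host_byte_order() {
    n=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case $n in
        1)   echo little-endian ;;
        256) echo big-endian ;;
        *)   echo unknown; return 1 ;;
    esac
}

# Hypothetical demo: swap every pair of bytes in $1 with dd conv=swab,
# so "abcd" comes out as "badc". dd's statistics on stderr are discarded.
swab_demo() {
    printf '%s' "$1" | dd conv=swab 2>/dev/null
}
```

Run host_byte_order on both the machine that wrote the volume and the one you plan to restore on; if they match, you can skip the byte-swapping options entirely.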

3.8.3.3. Wrong header type

If you don't have a byte-order problem, the cpio data might have been written with a different type of header. Some versions of cpio can automatically detect some of the headers, but they can't detect all of them, and some versions of cpio can detect only one type automatically. You may have to experiment with different headers to see which one it was written in. If this is your problem, you are probably getting the "Impossible header type" error. (Again, GNU cpio is able to detect any header type automatically.) Try some of the following commands:

$ cpio -ictv <device          # Try reading the incoming data in ASCII format
$ cpio -itv -H header <device  # Try reading with a header of value header

The value header could be crc, tar, ustar, odc, and so on. Consult your manpage. This option is not available everywhere.

$ cpio -ictv -H header <device # Combining ASCII and header options
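If you have to experiment, a loop can try the candidates for you. A minimal sketch: the function name probe_cpio_header is hypothetical, the list of -H values is only an example (check your manpage for the ones your cpio accepts), and it assumes that your cpio exits with a nonzero status when it cannot read the archive, which you should confirm for your version. It also assumes a file or a rewinding device name, because each attempt must start reading at the beginning of the volume.

```shell
# Hypothetical helper: try to list archive $1 with a series of header
# options and print the first option that cpio accepts.
probe_cpio_header() {
    archive=$1
    if cpio -itv < "$archive" >/dev/null 2>&1; then
        echo "(default)"; return 0
    fi
    if cpio -ictv < "$archive" >/dev/null 2>&1; then
        echo "-c"; return 0
    fi
    # Example -H values only; not every cpio supports -H or all of these.
    for h in crc odc newc bin ustar tar; do
        if cpio -itv -H "$h" < "$archive" >/dev/null 2>&1; then
            echo "-H $h"; return 0
        fi
    done
    echo "probe_cpio_header: nothing worked; check byte order and block size" >&2
    return 1
}
```

With GNU cpio, the first attempt normally succeeds because it detects the header type on its own.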

3.8.3.4. Strange block size

Finally, the cpio volume could have been written with a block size other than the one cpio expects. If the block size of your cpio backup is 5 K, you can try telling cpio to use that block size by adding the B option to any of the preceding commands (cpio -itBv). If the block size is not 5 K, you can tell cpio to use it by adding -C blocksize at the end of the cpio command (e.g., cpio -itv -C 10240).

3.8.3.5. Full or partial restore, or table of contents only?

Once you determine that you can read the cpio backup volume, you have several choices of what to do with it:

  • Restore the contents into the current directory or filesystem.

  • Restore files that match the pattern you specify. This "pattern" can be the output of a command.

  • Do either of the preceding while interactively renaming the files.

  • Read the table of contents.

3.8.4. cpio's Restore Options

Before doing any of the things just described, you have several options available to read from a cpio volume. Many of these are the same options that you used to create a cpio volume, such as (B) for 5 K blocks, (c) to read an ASCII header, and (v) to give verbose output. In addition, you have the following:

i

The i option starts out the restore options string and tells cpio that it is in input mode.

t

If the i option is followed by a t, cpio generates a table of contents. It does not actually restore anything from the volume.

k

The k option tells cpio to attempt to skip bad spots in the volume.[§]

[§] This option is also in GNU cpio for compatibility reasons with legacy shell scripts but is actually ignored. GNU cpio always attempts to skip bad spots on the tape. Therefore, if you are using gcpio, you can drop this option. Some other versions do not have the option at all.

d

The d option causes cpio to make directories as needed.

m

The m option tells cpio to give restored files the modification times they had when they were backed up. Without it, cpio sets the modification time of each restored file to the time of the restore.

Note that cpio's default action in this regard is the opposite of tar's default action.

u

This option tells cpio to unconditionally overwrite all files.

"*pattern*"

This option restores files that match the pattern.

f "*pattern*"

This option restores all files except those that match the pattern.

r

This option tells cpio to rename files interactively. As each file is restored, cpio prompts the user for a new name; if the user enters a null (empty) value, that file is not restored.

3.8.5. Telling cpio Which Device to Use

Unlike tar or dump, cpio does not take the name of the backup device as an argument.[||]

[||] That is, unless you want to use the -I option supported by some versions of cpio. Once again, though, this book concentrates on those options that work almost everywhere.

You must feed cpio the data through stdin. You can do this the hard way by using dd or cat:

$ dd if=device bs=blocksize | cpio -options

Alternatively, you can simply redirect stdin to read from the device:

$ cpio -options < device

3.8.6. Examples of a cpio Restore

The only question now is what options are needed. The easiest way to explain this is to show you example commands for the things that you can do with a cpio volume. Several "optional" options are listed in these example commands. Many of these options, while not required, make the operation easier or more robust. Some of the options may not be applicable to your particular application, so feel free to not use them.

3.8.6.1. Listing the files on a cpio volume

The following command reads the cpio volume in (B) blocks of 5120 bytes, uses the (c) ASCII format when reading the header, (k) skips bad spots on the volume when possible, and lists only the (t) table of contents with a (v) verbose (ls -l) style listing:

$ cpio -iBcktv < device

3.8.6.2. Doing an entire filesystem restore

The following command reads the cpio volume in (B) blocks of 5,120 bytes, uses the (c) ASCII format when reading the header, and makes (d) directories where needed. It (k) skips bad spots on the volume when possible, retains the original file (m) modification times, (u) unconditionally overwrites files, and (v) lists the names of the files that it recovers as it reads them:

$ cpio -iBcdkmuv < device

Of course, you can do the same thing, but without the (u) unconditional overwrite:

$ cpio -iBcdkmv < device

3.8.6.3. Doing a pattern-match restore

To restore files that match a certain pattern, simply list the pattern(s) you are looking for after the command:

$ cpio -iBcdkmuv "pattern1" "pattern2" "pattern3" < device

The pattern uses filename expansion wildcards, not regular expressions.[#]

[#] For learning more than you ever thought possible about regular expressions, I highly recommend Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly). Understanding what they are and what they do is an eye-opening experience and will make your use of tools such as grep, sed, awk, and vi much more fruitful.

Filename expansion wildcards work like the ones on the command line (e.g., *ome* finds both home1 and rome). The cpio command is the only native restore utility that supports wildcard restores in this way. For example, if you want to restore all of the files that were in my home directory (/home1/curtis), you can type:

$ cpio -iBcdkmuv "*curtis*" < device

Quoting the pattern as shown in the previous code causes the filename expansion to be applied to the files in the archive. If you don't quote the pattern, the shell expands the wildcard for you, and cpio sees a list of filenames that currently exist on the system and match the pattern *curtis*. If you have deleted some of these files or if you are in a different directory, the results will not be what you expect!
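Before running a pattern restore, you can preview which archived names a pattern will select. The shell's case statement uses the same filename-expansion wildcards, and in case patterns * also matches slashes, which is what lets *curtis* select files under /home1/curtis. This is a sketch, not a cpio feature: the function name preview_pattern is hypothetical, and your cpio's matching rules may differ in corner cases, so treat the result as a preview only. The pattern is quoted when it is passed in, for the same reason given above.

```shell
# Hypothetical helper: read archive member names on stdin (for example,
# the output of cpio -it) and print the ones that match the wildcard
# pattern in $1, using the shell's own case-statement matching.
preview_pattern() {
    pattern=$1
    while IFS= read -r name; do
        case $name in
            $pattern) printf '%s\n' "$name" ;;
        esac
    done
}
```

For example, cpio -it < device | preview_pattern '*curtis*' lists the names that cpio -iBcdkmuv "*curtis*" < device would be expected to restore.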

To restore all files except those matching a certain pattern, use the f option, and list the excluded pattern(s):

$ cpio -iBcfdkmuv "pattern1" "pattern2" "pattern3" < device

3.8.6.4. Renaming files interactively

The following is the same command as that in the previous section "Doing an entire filesystem restore" but prompts the user to interactively (r) rename any files that are restored:

$ cpio -iBcdkmruv < device

The following is the same command as that in the previous section "Doing a pattern-match restore" but prompts the user to interactively (r) rename any files that are restored:

$ cpio -iBcdkmruv "pattern" < device

3.8.6.5. Other useful options

b, s, S

These options are used to swap bytes when you have byte-order problems. Use them as a last resort, because I've yet to see them used with unqualified success. There is one scenario in which they might come in handy: if you are trying to read a volume that was made on a little-endian machine, but you're on a big-endian machine. (See the section "How Do I Read This Volume?" in Chapter 23 for more information.) The person making the cpio backup did not use the -c option, so the only way that you can read the volume is to perform a byte swap:

$ dd if=device bs=10240 conv=swab | cpio -options

Afterwards you discover that the words in the backup are now reversed from the order in which you need them, resulting in restored files that can't be read. Allegedly, you could have cpio swap the words for you as they are restored. Notice the addition of the b option to the regular cpio command:

$ dd if=device bs=10240 conv=swab | cpio -iBcdkmubv

The b option is equivalent to using both the s and S options together. The problem here is that all this byte-swapping is going on without dd or cpio knowing what the format of the file is. cpio assumes 4-byte words; what if the words in your data aren't 4 bytes at all? What if they're 8? Again, I have not met anyone who has used these options with complete success, so if you do, send me an email!

6

The 6 option reads a Unix sixth-edition archive. Use it for reading really old cpio backups.

3.8.6.6. Restoring to a different directory

If you made your backup volumes using relative pathnames, this is not a problem. Simply cd to the directory where you want to restore, and issue your cpio restore commands from there. If you don't know whether the volume was written with relative pathnames, enter the command cpio -itv < device, and look at the filenames. If they start with a /, the volume was made with absolute paths. In that case, you can do one of two things:

Use a symbolic link

If you are on Unix, the chroot command should be available. If you are on a non-Unix platform or the chroot command is not available, you may have to be more creative. If you have to restore to a different directory, and the backup was made with absolute pathnames, you might create /home1 as a symbolic link that points to /home2 (e.g., ln -s /home2 /home1). That way, any files that are supposed to go into /home1 actually go into /home2. This works only if /home1 is not mounted on that system. If /home1 already exists, you must unmount it (or move it aside) first. This, of course, is a pain, which is why you should be making your backup volumes with relative pathnames.

Use GNU cpio

This is really the best option. GNU cpio has a --no-absolute-filenames option that removes the leading slash (/) from any absolute paths and restores the files relative to the current directory.
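Checking a table of contents by eye is fine for a few files; for a large volume, a small filter can do it. A minimal sketch: the function name archive_path_style is hypothetical. Feed it the output of cpio -it, without the v option, because the verbose listing puts each name at the end of an ls -l style line.

```shell
# Hypothetical helper: read archive member names on stdin, such as the
# output of "cpio -it < device", and print "absolute" if any name starts
# with "/" or "relative" otherwise.
archive_path_style() {
    if grep -q '^/'; then
        echo absolute
    else
        echo relative
    fi
}
```

If cpio -it < device | archive_path_style prints absolute and you have GNU cpio, change to the directory you want to restore into and run cpio -idmv --no-absolute-filenames < device.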

3.8.7. Using cpio's Directory Copy Feature

If you need to move a directory from one place to another, you can try this little-used feature of cpio. Issue the following command:

$ cd old-directory ; find . -print | cpio -padlmuv new-directory

This copies the contents of old-directory into new-directory, resetting (a) access times, creating (d) directories when needed, (l) linking files instead of copying them when possible, retaining the original (m) modification times, and (u) unconditionally overwriting all files, while giving a (v) verbose output of the files that get copied.
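The same command can be wrapped in a function that guards against one easy mistake: because the cd happens first, a relative new-directory is interpreted relative to old-directory. This sketch (the name cpio_copy_tree is hypothetical) converts the destination to an absolute path before changing directories and otherwise uses the options from the example above. Because of the l option, files on the same filesystem are hard-linked rather than copied, so drop l if you need independent copies, and don't point the destination inside the source tree.

```shell
# Hypothetical helper: copy the tree under $1 into $2 with cpio's
# pass-through (p) mode. The destination is made absolute first, since a
# relative path would otherwise be resolved inside the source directory.
cpio_copy_tree() (
    src=$1 dst=$2
    mkdir -p "$dst" || exit 1
    dst=$(cd "$dst" && pwd) || exit 1
    cd "$src" || exit 1
    find . -print | cpio -padlmuv "$dst"
)
```

For example, cpio_copy_tree old-directory new-directory behaves like the one-line command above, even when new-directory is given as a relative path.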

Some versions of Unix also have a -L option that causes cpio to follow symbolic links, copying the directories and files to which they point, instead of the symbolic link itself. If you use this option, make sure that the find command that is feeding cpio its file list uses the -follow option. If you do not, you will get unpredictable results.

If you were to compile a list of all the options that are available on all Unix platforms, it would be very long. Depending on your platform, there may be a lot of other neat options that can make cpio more useful for you. There are also a number of extra features in GNU's version of cpio. Make sure you read the manpage for your version of cpio. Please be aware that if you use any of the options that affect how the cpio backup is written, it may reduce its portability.
