Backup & Recovery: Inexpensive Backup Solutions for Open Systems

3.11. Using rsync

Think of rsync as simply a copy command that can copy between systems. It's most like rcp in its syntax, but it's also like the Windows copy command to some degree. However, it has gone beyond a simple copy program by adding features such as the following:

Copies links, devices, owners, groups, and permissions

This means that rsync can copy everything properly from the source to the destination, including special files and all of the appropriate permissions. It can copy symbolic (soft) links as well, and it can preserve hard links when given the -H option.

Can use any transparent remote shell, including ssh or rsh

rsync now uses ssh as its default remote shell, but this can easily be overridden by setting the RSYNC_RSH environment variable to rsh (or by using the -e option).

Can run as authenticated or anonymous daemon

In addition to connecting via rsh or ssh, rsync can run as a daemon in either authenticated or anonymous mode. The former provides a more secure authentication mechanism, and the latter works well for mirroring.

Has advanced exclude options

rsync can exclude files in the same way GNU tar does, using exclude strings on the command line (--exclude) or by creating an exclude file and specifying it with the --exclude-from option. In addition, the -C (--cvs-exclude) option tells rsync to skip the same files that CVS would ignore.

Sends only changed blocks of changed files

This is the biggest difference between rsync and rcp, and it is rsync's greatest feature, yet a lot of people don't realize it exists. When a file has changed, the destination splits its existing copy into blocks and sends two checksums for each block to the source: a fast rolling checksum and a stronger hash. The source then searches its version of the file for matching blocks, and only the data that doesn't match any existing block is transferred. This allows rsync to keep large files that change a lot in sync across much smaller pipes.

Sends several changed files as one large file

Since rsync performs many small operations on individual files and parts of files, it can bundle them together into a single large transfer to reduce latency.

Can delete files

This is another big difference between rcp and rsync. rsync can delete files on the destination that are no longer present on the source.

Many people, including myself, have not really thought of rsync as a backup utility. One reason for this is that it is really a synchronization tool, not a backup tool. This means that, without some sort of intervention, a subsequent run of rsync overwrites the backup with a bad copy of the original, or deletes from the backup a file that was deleted on the original. That doesn't sound like a very good backup tool, does it?

However, it doesn't take a whole lot of work to put some history behind rsync. If you save previous versions before you overwrite them with newer versions or delete them, rsync can make an excellent backup tool. This book provides two examples of using rsync as a backup utility. Chapter 5 discusses BackupPC, and Chapter 7 describes near-continuous data protection using rsync and related utilities.

3.11.1. Basic rsync Syntax

Here are the basic ways to run rsync:

% rsync source [source ...] destination

This command copies one or more source files or directories to a destination directory on the same machine.

% rsync source [source ...] username@hostname:destination

This command copies one or more source files or directories to a destination directory on a different machine, authenticating with ssh by default (or with rsh if the RSYNC_RSH variable has been set to rsh).

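% rsync source [source ...] username@hostname::destination

This command copies one or more source files or directories to a machine running an rsync daemon, where destination begins with the name of a module defined in the daemon's configuration.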

Since the most common use for rsync for backup purposes is to transfer an entire directory tree from one machine to another, let's show that as an example. We want to transfer the directory /home to /backup on backupserver. We want to back up everything under /home (recursive, or -r); we want to back up soft links (-l); we want their times (-t) preserved, and permissions (-p) including owner (-o) and group (-g) preserved; and we want any special files transferred as well (-D). This command could look like this:

% rsync -rlptgoD /home backupserver:/backup

Luckily for us, the rsync team realized that these options were very common for backup and archive purposes, so they created a single -a option that means the same as -rlptgoD. So the following simple command is the same as the previous one:

% rsync -a /home backupserver:/backup

Let's add verbosity (-v) and compression (-z) to the command:

% rsync -avz /home backupserver:/backup

To be truly synchronized, we need to add the --delete flag to our command:

% rsync -avz --delete /home backupserver:/backup

Now, every time rsync runs, it copies everything from /home to /backup/home and deletes any files on /backup/home that aren't present in /home. All we've got to do is add some type of history collector on the other end, and we've got ourselves a backup system!

Be sure to read Chapter 7 on open-source near-continuous data protection systems and Chapter 5 on BackupPC to learn more about how to use rsync in a backup setting.

3.11.1.1. A few twists

All of these commands copy /home and its contents to the /backup directory on backupserver. That means they create /backup/home. If what you want to do is copy the contents of /home to /backup and not create a /home subdirectory, just add a trailing slash to the source directory:

% rsync -avz /home/ backupserver:/backup

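Put another way, a trailing slash on the source means "copy the contents of this directory." The following command therefore produces exactly the same /backup/home as the earlier commands that omitted the slash:

% rsync -avz /home/ backupserver:/backup/home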

By default, rsync commands authenticate using ssh. You can authenticate using rsh instead by changing the RSYNC_RSH variable to rsh. You can also tell rsync to connect to an rsync daemon running on another machine by putting two colons instead of one after the hostname. In that case, the name after the colons is a module defined in the daemon's configuration rather than a filesystem path:

% rsync -avz /home/ backupserver::backup

If the rsync daemon you're connecting to requires a password, you can specify that password using the RSYNC_PASSWORD variable.

3.11.1.2. rsync on Windows

rsync is really a Unix-style binary, but it can be run on Windows if you use a Unix emulator such as cygwin. Fortunately, the hard work has already been done: some members of the rsync team have created precompiled packaged binaries that come with the cygwin1.dll file and an rsync.exe file. Instructions on how to run rsync on Windows, including how to run it as a service/daemon, can be found from the main rsync web page at http://samba.org/rsync/nt.html.

3.11.1.3. rsync on Mac OS

Using rsync on Mac OS is quite simple. The only thing you have to add is the -E, or extended-attributes, flag, which tells rsync to transfer the additional attributes that Mac OS files have. Basically, this is the option that tells it to transfer the resource forks. (The only odd thing is that -E was an existing rsync option that meant to preserve the executable bit of a file being transferred.)

3.11.2. Restoring with rsync

Restoring with rsync is exactly the same as backing up with rsync, except that you reverse the order of the arguments. Specify as the source the location that is normally the destination, specify as the destination the location that's normally the source, and you've got yourself a restore. Let's take the system from our earlier example and reverse the source and destination directories:

% rsync -avz backupserver:/backup/home/ /home

This tells rsync to restore everything from /backup/home on backupserver to /home on the local server. Of course, you can specify a single file as well:

% rsync -avz backupserver:/backup/home/curtis/resume.doc /home/curtis

The real challenge with rsync restores is not the syntax of the command; it's keeping track of which files should be brought back and which are the same corrupted copies that you don't want to restore. That is the responsibility of the backup program you're using. If you were using a snapshot-style utility like the ones covered in Chapter 7, you'd simply add something like daily.1 to the path to get yesterday's version:

% rsync -avz backupserver:/backup/daily.1/home/curtis/resume.doc /home/curtis

You can read more about using rsync to make snapshots in Chapter 7.
