Linux Annoyances for Geeks: Getting the Most Flexible System in the World Just the Way You Want It
2.2. My Boss Insists on Real-Time Backups
There are two key reasons to do backups: data redundancy in case of local hardware failure and data availability in case of disaster. Appropriate RAID arrays can ensure the availability of your data in case of hardware failure. Careful network backups can help keep your data available in case of natural or man-made disaster. Naturally, this is an administrative function beyond the capabilities of regular users.
High-speed network backups saved the data of a number of financial firms after the tragedies of September 11, 2001. Without those backups, a lot more financial data would have been lost, and I suspect the subsequent economic declines might have been much worse.
Many standard books and documents tell you how to back up your system while it's down and unavailable to users. But Linux is increasingly being used as a server in environments where downtime is considered a sin. When used with removable hard drives, RAID does not require downtime. And the removable hard drives can be sent to safe locations.
Unfortunately, hardware RAID solutions are more expensive than software RAID, and they go beyond the packages that are included with most Linux distributions. There are other high-capacity/high-availability commercial solutions, such as Red Hat Cluster Manager and SUSE Heartbeat. But if you're stuck and need to configure a real-time backup using just the software available with a Linux distribution, consider software RAID, as described in this annoyance.
2.2.1. RAID Basics
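As software RAID in Linux has limits, it's useful to review some basic characteristics of RAID. Linux supports four different levels of software RAID: RAID 0 (striping without redundancy), RAID 1 (mirroring), RAID 4 (striping with a dedicated parity disk), and RAID 5 (striping with parity distributed across the disks).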
There are other levels of RAID supported by Linux. For example, there is partial support for RAID 6, which stores a second set of parity information compared to RAID 5 (and therefore can tolerate failures of two disks in the array). RAID 10 is a combination of RAID 0 and RAID 1: a striped volume built on two RAID 1 arrays.
Until the development of Serial ATA (SATA) and multidisk Parallel ATA (PATA) IDE hard disks (and controllers), effective use of RAID was limited to SCSI drives.
Naturally, if you want a real-time backup, you're looking for some implementation of a RAID 1 array. While hardware RAID is not dependent on the operating system, there is an excellent introduction to how to use it with Linux in the DPT Hardware RAID HOWTO at http://www.ram.org/computing/linux/dpt_raid.html.
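To make the trade-offs between these levels concrete, here is a minimal sketch that computes usable capacity and the number of disk failures an array is guaranteed to survive. It assumes equal-sized disks and, for RAID 10, two-disk mirrors; the function name raid_summary and the minimum disk counts are illustrative assumptions, not part of any Linux tool.

```python
def raid_summary(level, disks, disk_gb):
    """Return (usable_gb, guaranteed_survivable_failures) for equal-sized disks."""
    # Conventional minimum disk counts per level (an assumption of this sketch).
    minimum = {0: 2, 1: 2, 4: 3, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")

    if level == 0:
        # Striping only: all space is usable, but any failure loses the array.
        return disks * disk_gb, 0
    if level == 1:
        # Every disk holds a full copy of the data.
        return disk_gb, disks - 1
    if level in (4, 5):
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_gb, 1
    if level == 6:
        # Two disks' worth of space go to parity.
        return (disks - 2) * disk_gb, 2
    # RAID 10: a stripe across two-disk mirrors; only one failure is guaranteed
    # to be survivable, because a second could hit the same mirror.
    if disks % 2:
        raise ValueError("RAID 10 with two-disk mirrors needs an even disk count")
    return (disks // 2) * disk_gb, 1


if __name__ == "__main__":
    for level, disks in [(0, 2), (1, 2), (5, 3), (6, 4), (10, 4)]:
        usable, failures = raid_summary(level, disks, 80)
        print(f"RAID {level:<2} with {disks} x 80 GB: {usable} GB usable, "
              f"survives {failures} failure(s)")
```

For a real-time backup, the RAID 1 row is the one that matters: you give up half of the raw space in exchange for a complete second copy.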
2.2.2. Tools for Software RAID
Software RAID is configured through the operating system. Unlike hardware RAID, there is no dedicated hardware controller for the disk array. You can control a software RAID array with commands and configuration files. While software RAID takes more work than hardware RAID, it can be surprisingly efficient because there is no potential bottleneck at the non-RAID hardware controller.
But when you configure a software RAID array, be careful. Make sure that the partitions in an array are on different physical drives. Otherwise, you can't get the benefit of fast access through different controllers. Furthermore, if you've configured more than one partition from the same RAID 1 or RAID 4/5 array on a single physical drive and that drive fails, you'll lose all data in that array.
Before you can use software RAID, your system must meet two requirements:
A couple of other packages may be useful:
2.2.3. Typical RAID 1 Configuration
A Linux software RAID 1 array keeps an exact copy of the files from one partition, such as /dev/hda5, on a second partition, such as /dev/hdc5. It's important to place each partition in a RAID 1 array on separate hard disks. While details vary, this is essentially how to set up the /home directory on a RAID 1 array:
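As a rough sketch of such a setup, the following script prints one plausible command sequence rather than running it, since the commands are destructive and must be run as root. It assumes the mdadm tool is installed; the array name /dev/md0, the ext3 filesystem, the staging mount point /mnt/newhome, and the helper names parent_disk and raid1_home_plan are all assumptions made for illustration. The script also refuses to plan a mirror whose two partitions sit on the same physical disk.

```python
import re
import shlex

# Classic IDE/SCSI/SATA partition names such as /dev/hda5 or /dev/sdb1.
# Other naming schemes are deliberately not handled in this sketch.
_PARTITION = re.compile(r"^(/dev/(?:hd|sd)[a-z]+)\d+$")


def parent_disk(partition):
    """Return the disk that holds a partition, e.g. /dev/hda5 -> /dev/hda."""
    match = _PARTITION.match(partition)
    if match is None:
        raise ValueError(f"unrecognized partition name: {partition!r}")
    return match.group(1)


def raid1_home_plan(first, second, array="/dev/md0", staging="/mnt/newhome"):
    """Return the commands, in order, to mirror /home across two partitions."""
    if parent_disk(first) == parent_disk(second):
        raise ValueError("both mirrors are on the same physical disk")
    return [
        # Build the mirror from the two partitions.
        ["mdadm", "--create", array, "--level=1", "--raid-devices=2", first, second],
        # Put a filesystem on the new array (ext3 is an assumption).
        ["mkfs.ext3", array],
        # Copy the existing home directories onto the array.
        ["mkdir", "-p", staging],
        ["mount", array, staging],
        ["cp", "-a", "/home/.", staging],
        ["umount", staging],
        # Mount the array in place of the old /home.
        ["mount", array, "/home"],
    ]


if __name__ == "__main__":
    for command in raid1_home_plan("/dev/hda5", "/dev/hdc5"):
        print(shlex.join(command))
```

To make the mount survive a reboot, you would also add a line for the array to /etc/fstab; the exact options depend on your distribution. You can watch the mirror synchronize by reading /proc/mdstat.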
2.2.4. Networking RAID 1
You can set up one of the mirrors in a RAID array in a remote location. Naturally, this requires a dedicated high-speed connection. Details would take up another full book. There are several techniques for real-time networked RAID mirrors, based on the Enhanced Network Block Device (ENBD). With an ENBD, you can configure a remote partition from a RAID array so that it appears local on your computer. ENBD is an academic research project funded in part by Realm Software; the home page is http://www.it.uc3m.es/ptb/enbd.
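To see why the connection has to be fast, consider a back-of-the-envelope estimate of how long the first full synchronization of a remote mirror would take. This is only a sketch: the function name initial_sync_hours and the efficiency factor (the share of the nominal link speed actually available for mirror traffic) are assumptions, and real throughput depends on latency and protocol overhead.

```python
def initial_sync_hours(partition_gb, link_mbit_per_s, efficiency=0.8):
    """Estimate hours needed to copy a whole partition over a network link."""
    if partition_gb <= 0 or link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    total_bits = partition_gb * 1000**3 * 8           # decimal gigabytes to bits
    usable_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return total_bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    for link in (10, 100, 1000):
        hours = initial_sync_hours(100, link)
        print(f"100 GB over {link} Mbit/s: about {hours:.1f} hours")
```

The same limit applies after the initial copy: every write to a RAID 1 array goes to both mirrors, so sustained write traffic to the array has to fit within what the link can carry.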