Data Protection and Information Lifecycle Management


General backup software makes copies of data either by opening a file and copying the bytes out of it or by transferring raw blocks of data from a disk to backup media. In either case, general backup is a binary copy. In the event of catastrophic failure of the object, such as corruption of the volume or a disk crash, the entire object can be recovered from the backup medium and restored to its state as of the last backup. This is called an image copy.
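To make the distinction concrete, here is a minimal Python sketch of the two copy styles. The paths and chunk size are illustrative, not any vendor's implementation; note that both produce an opaque stream of bytes:

    import shutil

    CHUNK = 1024 * 1024  # copy one megabyte at a time

    def file_level_copy(src_path, dst_path):
        """Open the file and copy the bytes out of it."""
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst, CHUNK)

    def image_copy(device_path, image_path):
        """Transfer raw blocks from a disk device (reading a /dev node
        requires privileges). The copy is exact but opaque: nothing in
        the stream is individually addressable."""
        with open(device_path, "rb") as dev, open(image_path, "wb") as img:
            while block := dev.read(CHUNK):
                img.write(block)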

This creates problems for multielement, structured objects such as databases, e-mail systems, and similar applications. The elements inside structured objects are unknown to the general backup software. A database, for example, is copied to the backup media as a file or volume (depending on how it is implemented). It can be restored only as that type of object.

What if the event is not a catastrophic one? Perhaps a critical table was accidentally deleted, and the database is unusable. An important e-mail may have been deleted and needs to be recovered. It would be unacceptable to return an e-mail or database system to a day-old state, losing all changes that have occurred since then, just to retrieve a single element.

With image copies, the data is copied precisely, but there is no catalog of the elements within. Individual elements are inaccessible, because the software does not record what is in the structured objects. A Microsoft Exchange Server database can be copied to backup media as a whole volume or a file. General backup software does not know about the e-mails, attachments, contacts, and calendar items that are in the database and, hence, has no way to find them later. When it comes time to restore a single e-mail, the software has no way to reference it.
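To see what is missing, consider a hypothetical sketch of the catalog an application-aware backup would have to build as it copies. The field names here are invented for illustration; general backup software records nothing of the kind:

    from dataclasses import dataclass

    # Hypothetical element catalog -- the metadata a general backup
    # lacks. The field names are invented for illustration.
    @dataclass
    class CatalogEntry:
        element_type: str  # "e-mail", "attachment", "contact", ...
        element_id: str    # application-level identifier
        offset: int        # where the element begins in the backup image
        length: int        # how many bytes it occupies

    # Application-aware backup builds a list of CatalogEntry records as
    # it copies. An image copy is just bytes, so a single e-mail cannot
    # be located within it afterward.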

To alleviate this problem, many applications that use structured data stores have backup software that can catalog the elements within the structured objects. Oracle's Recovery Manager, also known as RMAN, is one such utility. Capable of backing up a database to various media, including disk, it allows for selective restores of database elements.

Most backup software vendors offer application-specific versions of their products that catalog the internal elements of structured data objects. It is easy to find support for Oracle, Microsoft SQL Server, Microsoft Exchange Server, and IBM's Lotus Notes/Domino, either as a standalone product or as an add-on to the primary backup software.

Structured Object Backup Constraints

Structured data objects have certain constraints that must be taken into account when backing them up. To begin with, many of these objects live outside the normal file system. The server processes that create and manage these objects use direct block I/O to access data on the disk. Consequently, the normal discovery mechanisms that backup software uses cannot find the application data. As far as the backup software is concerned, the data doesn't exist. Luckily, most backup software vendors and application vendors have add-ons that allow for discovery of these objects. During installation of Microsoft Exchange Server, for example, a version of the standard Windows backup software capable of seeing and backing up the Exchange database is installed along with it.
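A rough sketch of why that is, assuming the discovery pass is a simple walk of the file system (the device name below is an example):

    import os

    def discover(root):
        """The discovery pass: enumerate everything under a mount point."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    # A raw partition such as /dev/sdb1 (an example device name) holds
    # the database's blocks, but it sits under no mounted file system
    # root, so discover() never yields it -- to the backup tool, the
    # data doesn't exist.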

Most structured objects are difficult to back up while still in use. Many applications, especially database-oriented ones, require complete control over the underlying data store to ensure the integrity of the data. The database or application software has various controls built in to ensure that two processes do not try to change the same data at the same time. The software does this by placing locks on certain elements of the data. Other software is unaware of these controls, in much the same way that it is unaware of the internal structure. Backup software, by the very act of reading the data, can collide with these locks or capture an inconsistent copy. Many system administrators therefore take database-oriented applications offline while backing them up. Because this practice makes the application or group of applications unavailable, it is not always desirable. Application-aware backup alleviates this constraint: by understanding the internal structure and locking mechanisms of the underlying databases, it can run backups without disrupting database operations or application availability.
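As one concrete illustration of the principle, using SQLite rather than any of the products named in this chapter, a database engine's own backup API can copy a live database consistently because it cooperates with the engine's locking:

    import sqlite3

    # Online backup through the application's own API, illustrated with
    # SQLite via Python's sqlite3 module. The engine copies pages in a
    # way that respects its own locks, so writers are not disrupted and
    # the copy is consistent -- the same principle application-aware
    # backup software applies to larger systems.
    src = sqlite3.connect("live.db")      # database still in use
    dst = sqlite3.connect("backup.db")    # destination copy
    src.backup(dst, pages=100)            # copy 100 pages at a time,
                                          # sleeping between steps so
                                          # writers can proceed
    dst.close()
    src.close()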

Backup Software for Structured and Unstructured Objects

Backing up structured objects and backing up unstructured objects are dissimilar processes. Different software is required to perform the backup and, more important, to restore lost data: general backup software for normal file system objects and specialized versions for structured objects.

"Enterprise" backup software handles this process fairly well. This type of software generally has a server that sits on the network backup server and an agent that resides on the host managing a particular data store. Even more modest backup software may integrate general and application backup into one package.

Unfortunately, many system administrators are forced by budget concerns to use native application backup alongside a general package for the file system. This creates a process problem: uncoordinated backup programs running on different servers in the network. Using a variety of backup software creates management issues, especially when the hardware is heterogeneous, because training, tuning, and network behavior differ from product to product. Whenever possible, it is better to purchase one product, with agents or plug-ins for application servers, that also handles general file system backup.

Off-Site Backups

An interesting option for smaller organizations is to have a service provider perform backups to an off-site location. When the backup process is outsourced, most of the headaches are removed, and capital costs can be reduced to a more manageable monthly fee. This is a boon for capital-poor organizations or those that want to put their dollars into other projects.

A WAN link or virtual private network (VPN) over the Internet is established to connect the service provider's network to the customer's. The service provider runs backup servers on its side of the connection, which interact with agents on the customer's servers or desktop computers. The service provider then takes responsibility for ensuring that backups are performed on a prearranged schedule and that the data is available for restore operations as needed. Pricing for this type of service is usually based on the amount of data backed up, though some service providers prefer to charge by the host.
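A skeletal sketch of the agent side of such a service follows; the host name, port, and wire format are invented for illustration:

    import json
    import socket
    import ssl

    # Hypothetical service endpoint; host, port, and protocol are
    # placeholders, not any real provider's interface.
    PROVIDER = ("backup.example-provider.net", 4433)

    def send_backup(manifest: dict, payload: bytes):
        """Ship one backup job to the provider over TLS."""
        context = ssl.create_default_context()
        with socket.create_connection(PROVIDER) as sock:
            with context.wrap_socket(sock,
                                     server_hostname=PROVIDER[0]) as tls:
                tls.sendall(json.dumps(manifest).encode() + b"\n")
                tls.sendall(payload)

The provider's servers receive these uploads across the WAN or VPN link, enforce the prearranged schedule, and can meter payload sizes when pricing is per gigabyte.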

For a small company with limited IT resources, using a service provider has benefits. Money is not tied up in capital equipment, and capital expenditures do not increase as the amount of data grows. Limited personnel are not stretched as backup needs grow, and the organization gains the extra protection of having off-site backups without the mishaps, hassles, and expense of having to move tapes.

What is given up is control. People outside the company are making decisions for the company. The service provider has to be trusted to provide for the security, safety, and availability of the data it holds. A reasonable amount of bandwidth on an Internet connection is also needed; for some companies, that can be expensive.

Backing Up Home Systems

Most people do not have backup devices at home. Although some people in the computer industry and IT do, they are the minority. Home systems are often not included in backup strategies. Who cares, anyway? The organization will not suffer if a hard drive full of MP3 files is lost forever!

That's true, but along with those MP3s and the children's homework, there will also be work files. As more and more people bring work home or telecommute part time, a considerable amount of corporate data ends up on home PCs.

A strategy for backing up just the corporate data is necessary. In many cases, home workers can copy files to a corporate server, which is then backed up regularly. The copy might happen over a VPN through the Internet, by dialing into the corporate network directly, or simply by carrying a solid-state memory stick back and forth. This hardly ever works well: people forget to copy files to the server, and e-mail mailboxes on home computers get ignored.

Instead, consider using a service provider that all workers can access through the Internet or backup software that can work across relatively slow links. Mobile workers and occasional laptop users can also be covered. As an incentive for users to participate, offer to back up other files of their own (with the exception of MP3s and graphic files, which may be illegal and are certainly huge).
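One way such software copes with slow links is to send only what has changed since the last run. A minimal sketch of the idea, with a placeholder upload function and an invented state-file name:

    import hashlib
    import json
    import os

    STATE = "backup_state.json"  # hypothetical record of the last run

    def backup_changed(root, upload):
        """Walk root and hand only changed files to upload()."""
        try:
            with open(STATE) as f:
                seen = json.load(f)          # path -> content hash
        except FileNotFoundError:
            seen = {}
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if seen.get(path) != digest:
                    upload(path)             # send only this file
                    seen[path] = digest
        with open(STATE, "w") as f:
            json.dump(seen, f)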

It is bad policy to assume that home computers don't matter. Remember that the next time the CEO calls from home on Saturday and says, "I just lost an important file, and . . . "

SAN Backup Deployment Steps

Relieving the stress of backup has been one of the most important justifications for installing a SAN, and SANs remain an important tool in architecting efficient backup and restore systems. They provide fast I/O and allow for consolidation of backup resources, which in turn saves money: fewer personnel are needed to run the systems as they grow, and storage resources have higher utilization rates.

A SAN-based backup system deployment does not happen all at once. The amount of new equipment alone creates a lot of work for systems personnel, so deployment is usually done in stages. First, the basics of the SAN are put in place, including switches and host adapters. The basic SAN should be fully tested and operational before any live storage systems are added; otherwise, data corruption and downtime may occur, crippling the applications that depend on those data stores. Only after the infrastructure is proven in place is it time to build the rest of the system.

Next, the backup drives are consolidated into one larger unit. This means replacing individual tape drives with a library. It is also an opportunity to build in a disk-to-disk virtual tape system. All backup jobs are now pointed at the consolidated backup storage.

Even with storage consolidation, there is still redundancy in the operations, because several uncoordinated backup jobs are still running. Software designed for a single machine may run in a SAN environment, but it is inefficient. Backup software designed for a SAN, on the other hand, has a more distributed architecture that better mirrors the design of the SAN itself. Once the backup system is up and running on the SAN, software that utilizes a dedicated backup server and agents running on different computers is installed.

Finally, in the interest of even more efficiency, a data mover may be brought in to perform server-less backup. It should be deployed last, after everything else in the system is running well.

A SAN is great technology for backup. A staged deployment helps bring components online gradually, allowing for less disruption of operations.

Archive Is Different from Backup

Archive and backup seem to be very similar on the surface. In both cases, data is copied from a primary to a secondary medium. The goals, however, are very different. The purpose of backup is to store copies of data so that it can be available in case of disaster, allowing a recovery operation to take place. The primary data is left in place.

Archive assumes that it is no longer necessary to have the data online. Much of the purpose of archiving is to make room for new data. Whereas backed-up data is expected to be restored in a short amount of time, archive is considered to be much less accessible. When data has to be brought back from archival stores, there is not supposed to be a rush. Archive is also selective, copying only data that is specifically slated for removal and not entire volumes.

For these reasons, archive systems can be much slower than backup systems, for both reads and writes. On the other hand, archive media need to last a long time in less-than-hospitable environments. Though tape is still the dominant archive format, CD-ROMs and DVD-ROMs make excellent archive media. They are tough and long-lasting (some predictions of CD longevity run into the hundreds of years) but have a small capacity and are slow relative to disk and tape.
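The selectivity that distinguishes archive from backup can be sketched in a few lines: pick only files untouched for a set period, copy them to archive media, and remove the primaries. The cutoff and paths below are illustrative:

    import os
    import shutil
    import time

    CUTOFF_DAYS = 365                # illustrative retention threshold
    ARCHIVE_ROOT = "/mnt/archive"    # e.g., a staging area for optical media

    def archive_old_files(primary_root):
        """Move files not modified in CUTOFF_DAYS to the archive store."""
        cutoff = time.time() - CUTOFF_DAYS * 86400
        for dirpath, _dirs, names in os.walk(primary_root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.stat(path).st_mtime < cutoff:
                    dest = os.path.join(
                        ARCHIVE_ROOT, os.path.relpath(path, primary_root))
                    os.makedirs(os.path.dirname(dest), exist_ok=True)
                    shutil.move(path, dest)  # copy out, remove the primary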
