Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and File Systems (Vol 1)
Backup has been a problematic systems management application for many years as the amount of data needing backup has consistently outpaced the capabilities of the networks and equipment to do the job. SANs provide several architectural advantages for backup processing and promise to provide solutions that keep pace into the future. The following section explores some of the new SAN backup technologies that are likely to change the nature of data management for years to come.
Problems with Legacy Network Backup
Legacy network backup has several architectural limitations that are making it less viable over time. Two fundamental areas are creating most of the problems:
LAN Limitations
Legacy network backup depends on a LAN to function as the backup transfer network to copy data over. Unfortunately, most LANs, even many Gigabit Ethernet LANs, do not have the bandwidth to handle the massive data transfer requirements of backup. This is not only unacceptable for backup performance, but it also wreaks havoc with the other applications running on the LAN by creating long periods of network congestion.
When backup performance is too slow for the backup job to finish in the allotted backup window, nothing earth-shattering happens, and business processing goes on as normal. Most administrators simply stop the backup operation prematurely, which means the data on tape is inconsistent and untrustworthy for restores. The only time backup failures ever have a real impact is during restores, and by then it is clearly too late to do anything about it.
It is possible to use a dedicated LAN for backup, but realistically, there are many environments where LANs are simply inadequate and impractical to get the job done.
Part of the problem lies in the multiple transfers and processes involved in LAN-based backup. A backup agent initially reads the data from a storage device or subsystem and writes it into system memory, where it is temporarily held until it is transferred over the LAN. The data is then processed by the system's network protocol stack and transmitted by the network interface over the network. In the network, the transmission may encounter congestion in switches en route, which slows performance and often results in dropped frames and retransmissions. When the transmission is received by the backup server system, it is processed in inverse fashion through the network interface and protocol stacks, buffered in memory, and then written to a tape device of some sort.
NOTE The problem of clobbering LAN traffic with backup traffic in the wee hours of the night has given many administrators troubles for many years.
There is nothing quite like coming to work in the morning and finding out that the network didn't do half as much as it was supposed to. If you are reading this and nodding in agreement, and you are not using a SAN for backups, you need to get out of your seat and do something about it!
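The backup window problem described above comes down to simple arithmetic, sketched here in Python. All figures are illustrative assumptions rather than measurements: a 2 TB data set, a Gigabit Ethernet line rate of 125 MB/s, a guessed 60 percent effective utilization once protocol overhead and congestion are accounted for, and a 6-hour window. The function names are hypothetical.

```python
def backup_hours(data_bytes: float, link_bytes_per_sec: float, efficiency: float) -> float:
    """Hours needed to move data_bytes over a link running at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    effective_rate = link_bytes_per_sec * efficiency  # usable bytes per second
    return data_bytes / effective_rate / 3600


def fits_window(data_bytes: float, link_bytes_per_sec: float,
                efficiency: float, window_hours: float) -> bool:
    """True if the backup job can finish inside the allotted backup window."""
    return backup_hours(data_bytes, link_bytes_per_sec, efficiency) <= window_hours


# Illustrative assumptions only; substitute figures from your own environment.
DATA_BYTES = 2e12               # 2 TB to back up
GIGABIT_BYTES_PER_SEC = 125e6   # 1 Gbit/s raw line rate = 125 MB/s
EFFICIENCY = 0.6                # assumed share of line rate actually achieved
WINDOW_HOURS = 6                # assumed nightly backup window

hours = backup_hours(DATA_BYTES, GIGABIT_BYTES_PER_SEC, EFFICIENCY)
print(f"Estimated backup time: {hours:.1f} hours")
print("Fits window:", fits_window(DATA_BYTES, GIGABIT_BYTES_PER_SEC, EFFICIENCY, WINDOW_HOURS))
```

Under these assumed numbers the job needs roughly 7.4 hours and overruns the 6-hour window, which is the situation in which administrators end up stopping the backup prematurely.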
Distributed Backup Systems and Management
To circumvent the performance problems of LAN-based backup, backup devices and subsystems can be dedicated to individual application servers running backup software. In this case, backup traffic travels from disk storage through the system and out to tape storage. This approach is often used on large systems that cannot be backed up over a LAN. While the performance is optimal, these multiple backup systems create administrative problems and overhead. One of the best things about LAN-based backup is that the management of backup can be centralized and leverage tape automation equipment. The implementation of multiple, dedicated backup systems results in high management costs; instead of leveraging management across multiple servers, the cost of management is multiplied by the number of independent backup systems. When individual servers have their own tape equipment, administrators have to manage multiple rotation schemes and tape collections. This not only takes a great deal of time, but it also contributes to confusion and errors in media management.
Optimal Performance and Management with SANs
LAN-based backup forces administrators to choose between inadequate performance and distributed management. In contrast, SANs allow administrators to have the best of both worlds: fast performance and centralized management. That's not asking too much, is it? Whereas LAN-based backup performance is limited by LAN performance, SAN-based backup is limited by the performance of server systems, backup devices, and subsystems. In other words, the network is no longer the bottleneck; backup runs as fast as the servers and backup equipment allow. Centralization can be achieved by virtue of the longer distances supported by SANs. Servers do not have to be in the data center, but can be spread throughout the organization and still have their data backed up to centralized tape equipment.
Separating the Control and Data Paths
One of the interesting architectural notions with SAN-based backup is that the control of backup operations and execution of data copying during the backup operation can occur over different networks. This is sometimes called separating the control and data path. SANs are used as the data path, and the LAN is used as the control path. Figure 13-4 illustrates.
Figure 13-4. Data Path in a SAN and Control Path in a LAN
LAN-Free Backup
LAN-free backup is a simple concept where application servers copy their data to centralized backup storage over a SAN, as opposed to transmitting data over a LAN. It was one of the first applications identified to justify SANs when the technology was originally introduced, and it continues to be a strong reason to implement SANs. Figure 13-5 shows a simple design for a LAN-free backup system in a SAN.
Figure 13-5. LAN-Free Backup
Backup operations in LAN-free backup are basically the same as with legacy network backup. The scheduling of backup operations and tape rotations is similar to that used for standalone backup, although more systems, tape subsystems, and tapes are integrated under a common management system. The main feature of LAN-free backup is that the I/O path avoids the LAN altogether. LAN-free backup agents residing in application servers transfer data directly to tape subsystems in the SAN. This is similar to using dedicated direct-attached backup devices, except that the tape equipment is located in a SAN, and the management of the process, including backup metadata operations, is performed by a centralized backup management engine.
While many LAN-free backup systems today are designed as a single, distributed, heterogeneous system with a single point of control, it is also possible to set up a "poor man's" LAN-free backup system simply by running multiple single-system backup systems connecting to centralized SAN tape subsystems and/or devices. Using this approach does not integrate the operations and media management in a single system, but it at least allows high-speed, centralized backup processing.
Serverless Backup Using Extended Copy
Another new and potentially powerful backup technology is called serverless backup, which is based on the concept of SCSI extended copy. This is discussed in Chapter 6, "SCSI Storage Fundamentals and SAN Adapters," in the section "Extended Copy and Third-Party Copy" and is pictured in Figure 6-7. The basic idea of serverless backup is for the backup engine to send EXTENDED COPY commands to a dual-mode controller in the SAN, which acts as a "remote control" initiator to read data from server disk storage and write it again to backup tape storage. The word "serverless" is used because backup data transfers are not processed or conducted through the application server that creates and processes data. The communications model used by a simple serverless backup system is shown in Figure 13-6.
Figure 13-6. A Simple, Serverless Backup Design
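The basic idea can be illustrated with a small Python simulation. This is a toy model under stated assumptions, not the SCSI wire protocol: `BlockDevice`, `TapeDevice`, `ExtendedCopyRequest`, and `DataMover` are hypothetical names, the request carries only a list of block addresses instead of a real EXTENDED COPY parameter list, and the returned strings merely borrow the SCSI status names GOOD and CHECK CONDITION.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockDevice:
    """Hypothetical stand-in for a disk LUN: block address -> block contents."""
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]  # raises KeyError for an unknown block


@dataclass
class TapeDevice:
    """Hypothetical stand-in for a tape drive: an append-only list of records."""
    records: List[bytes] = field(default_factory=list)

    def write(self, data: bytes) -> None:
        self.records.append(data)


@dataclass
class ExtendedCopyRequest:
    """Simplified request: only the source block addresses to copy."""
    source_lbas: List[int]


class DataMover:
    """Dual-mode controller: target for the engine, initiator to disk and tape."""

    def __init__(self, disk: BlockDevice, tape: TapeDevice) -> None:
        self.disk = disk
        self.tape = tape

    def extended_copy(self, request: ExtendedCopyRequest) -> str:
        try:
            # Act as initiator: read every requested block from disk first.
            data = [self.disk.read(lba) for lba in request.source_lbas]
        except KeyError:
            return "CHECK CONDITION"  # nothing is written if any read fails
        for block in data:
            self.tape.write(block)    # then write the blocks to tape in order
        return "GOOD"


# The backup engine only sends a request; the data never passes through it
# or through the application server.
disk = BlockDevice({10: b"alpha", 11: b"beta", 12: b"gamma"})
tape = TapeDevice()
mover = DataMover(disk, tape)
status = mover.extended_copy(ExtendedCopyRequest(source_lbas=[10, 11, 12]))
print(status, tape.records)
```

The point of the sketch is the division of roles: the engine sees only the status returned for its request, while the reads and writes happen between the data mover and the storage devices.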
Serverless Backup Data Movers
The key architectural element in serverless backup is the dual-mode controller, which is more commonly referred to as a data mover. The data mover receives EXTENDED COPY commands and generates READ and WRITE commands to storage subsystems in the SAN. Theoretically, the data mover in a serverless backup system can be located in any system, subsystem, or device connected to a SAN, including computer systems, disk subsystems, tape subsystems, and networking equipment. There are many possible designs and architectures.
The EXTENDED COPY command contains information about the target/LUN address, the block locations where the data is located, and the command to perform. For instance, when reading data in a serverless backup process, the backup engine's initiator sends an EXTENDED COPY command that tells the data mover to read a certain group of blocks from a certain subsystem port and LUN address. When the data mover completes the READ command, it sends a response for the EXTENDED COPY command confirming a successful READ to the backup engine. Then another EXTENDED COPY command is generated by the backup engine for the data mover to WRITE data to a tape subsystem at a given target/LUN address in the SAN.
Serverless Backup Engines
Like other backup systems, serverless backup systems need to maintain backup metadata and recover from processing and transmission errors that sometimes occur. In other backup system designs, there is a direct SCSI connection between the initiator directing backup transmissions and the target receiving them. Acknowledging the completion of a file copy or recognizing problems with a transfer are relatively straightforward processes. However, in serverless backup, the data mover is a second, proxy initiator that executes SCSI commands on behalf of the backup engine. This means that all commands, responses, and status/error messages need to be conducted through the EXTENDED COPY mechanism. Obviously, the engine software in a serverless backup system needs to be designed for a completely different processing model.
Serverless Backup Agent Software
Backup agent software used in serverless backup systems also requires different designs. Instead of transferring data over a LAN or a SAN, the backup agent is responsible for identifying the block storage locations that are to be transferred during EXTENDED COPY operations. This means that the agent needs to get block address locations from the file system instead of data. While this might seem like a small detail, it is not, because file systems are typically designed to deliver bytes of data to applications, not the block locations for data.
It is essential that other system processes that run during serverless backup do not change the block locations of data after the agent has transmitted them to the backup engine. This includes the COW process described earlier. For instance, an alternative COW method that copies old data to temporary storage for backup processing and writes new updates over old data locations could result in an inconsistent backup.
NOTE No insult is intended toward any type of COW. As with nearly all storage processes, there can be many workable designs. COW is one example. Either old or new data could be placed in temporary storage and processes developed to make things work correctly. It is also possible to make adjustments to serverless backup EXTENDED COPY commands and processes to accommodate such changes. It is important not to take your COW for granted. Love and know your COW.
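The agent's job of reporting block locations rather than data, and the hazard of those locations changing afterward, can be sketched in Python. This is a toy model only: `ToyFileSystem`, `blocks_for`, and `is_still_valid` are hypothetical names, and a real agent would obtain the mapping through whatever extent-mapping interface its file system provides.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (starting block address, number of blocks)


class ToyFileSystem:
    """Hypothetical file system that exposes per-file extent maps."""

    def __init__(self) -> None:
        self._extents: Dict[str, List[Extent]] = {}

    def place(self, path: str, extents: List[Extent]) -> None:
        """Record (or replace) where a file's blocks live on the volume."""
        self._extents[path] = list(extents)

    def extent_map(self, path: str) -> List[Extent]:
        """Return block locations for a file, not the file's bytes."""
        return list(self._extents[path])


def blocks_for(fs: ToyFileSystem, path: str) -> List[int]:
    """What the agent sends to the backup engine: the flat list of block addresses."""
    blocks: List[int] = []
    for start, count in fs.extent_map(path):
        blocks.extend(range(start, start + count))
    return blocks


def is_still_valid(fs: ToyFileSystem, path: str, reported: List[int]) -> bool:
    """Check that the blocks reported earlier still describe the file."""
    return blocks_for(fs, path) == reported


fs = ToyFileSystem()
fs.place("/db/data", [(100, 3), (500, 2)])
reported = blocks_for(fs, "/db/data")
print(reported)                                  # block addresses, not file contents
print(is_still_valid(fs, "/db/data", reported))  # locations unchanged so far

# Another process relocates the file after the agent has reported its blocks.
fs.place("/db/data", [(900, 5)])
print(is_still_valid(fs, "/db/data", reported))  # the reported list is now stale
```

If the data mover acted on the stale list in the last step, it would copy blocks that no longer belong to the file, which is the inconsistency the text warns about.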
Virtualization and Serverless Backup
Serverless backup depends on sharing precise block storage information between the application server's file system and the data mover in the SAN. For that reason, the data mover and the application server must use the exact same virtualization "lens" to access the blocks of data. If the virtualized view of data is different, serverless backup will be inconsistent and worthless.
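A minimal Python sketch of the "lens" requirement follows. It assumes, purely for illustration, that each party's virtualization layer can be modeled as a dictionary from virtual block addresses to physical ones; the names `resolve` and `views_agree` and all addresses are hypothetical.

```python
from typing import Dict, List


def resolve(mapping: Dict[int, int], virtual_blocks: List[int]) -> List[int]:
    """Translate virtual block addresses to physical ones through one lens."""
    return [mapping[v] for v in virtual_blocks]


def views_agree(a: Dict[int, int], b: Dict[int, int], virtual_blocks: List[int]) -> bool:
    """True if both lenses lead to the same physical blocks for this request."""
    return resolve(a, virtual_blocks) == resolve(b, virtual_blocks)


# The application server's view of the virtualized volume (illustrative values).
server_view = {0: 1000, 1: 1001, 2: 2000}

# A data mover with an identical view, and one with a differing view of block 2.
mover_view_same = dict(server_view)
mover_view_different = {0: 1000, 1: 1001, 2: 3000}

request = [0, 1, 2]  # virtual blocks the agent reported for a file
print(views_agree(server_view, mover_view_same, request))
print(views_agree(server_view, mover_view_different, request))
```

With the differing view, the data mover would read physical block 3000 where the server's file actually lives at 2000, so the copy on tape would not match the file.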