Data Protection and Information Lifecycle Management
All kinds of networks can support remote copy, including Fibre Channel, Gigabit Ethernet, and a variety of wide-area networks. The choice of network depends on the topology, the distance that needs to be covered, the amount of data to be moved, and the amount of money that can be spent on the system. Block-level storage applications are especially sensitive to latency and throughput. Distance drives network latency, and the amount of data to be moved defines the throughput requirement. Remote copy can easily fail if the round-trip delay is too great or if there is not enough bandwidth to move all the data. It is safe to say that more bandwidth and lower network latency are always better.

Tip: Network latency and storage latency are similar. In both cases, latency refers to the time it takes to send and receive data. What is different is the cause of the latency. For storage devices, the root cause of latency is the mechanical properties of the device and media. For networks, latency is a function of the electrical and software properties of the network connection. Both must be taken into account when designing remote copy systems.
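To get a feel for the round-trip delays involved, propagation delay in fiber can be estimated at roughly 5 microseconds per kilometer, one way. The following minimal Python sketch uses that common approximation (not a measured value) to show how quickly distance adds up:

```python
# Estimate round-trip propagation delay over fiber. Light travels
# through fiber at roughly 200,000 km/s, or about 5 microseconds per
# kilometer; switches, routers, and protocol processing add more.

FIBER_US_PER_KM = 5.0  # approximate one-way delay per kilometer

def round_trip_us(distance_km: float) -> float:
    """Out to the remote site and back with the acknowledgment."""
    return 2 * distance_km * FIBER_US_PER_KM

for km in (10, 100, 1000, 4000):
    print(f"{km:>5} km: {round_trip_us(km):>9,.0f} microseconds round trip")
```

Even before any equipment delays are counted, a 4,000-kilometer link imposes tens of milliseconds of round-trip delay on every acknowledged I/O.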
Bandwidth
Bandwidth needs depend on the remote copy application, which differs from vendor to vendor. A good rule of thumb is to look at the underlying storage architecture and see what its bandwidth requirements are. If the storage infrastructure consists of 2-gigabit Fibre Channel, and the data transfer is operating at full rate, remote copy may need 2 gigabits per second of bandwidth. The cost of network connections increases as more bandwidth is needed and distances increase. A 1-gigabit connection within a local-area network, using Fibre Channel or Gigabit Ethernet, costs much less than a 1-gigabit connection from a long-distance carrier. In practice, most storage applications do not run at full bandwidth, and most individual data transfers are smaller than the maximum the link allows. Because more bandwidth costs more money, it is important to measure the actual bandwidth the application is using before buying expensive network services or components. Matching storage bandwidth needs with network connections is a key part of remote copy design. Table 4-1 lists some storage connections and corresponding network connections.
The connection matching assumes that the applications using the storage need the full bandwidth of the connection. This would be the case in a SAN architecture or with an external disk array requiring high bandwidth. In most instances, full-speed network connections will not be necessary, because the applications using the storage will not use all the available link speed. There are two methods of getting the bandwidth needed from a network connection. The simplest method is to obtain a connection with enough bandwidth to handle the application's throughput. This can be costly, and in many areas high-bandwidth connections are not available. The other method is to aggregate several connections at the switch level to provide the required bandwidth. DWDM optical switching products do this well, combining several high-speed network connections into one fiber optic link. This method can also be used with long-distance WAN connections such as T-1/E-1 and T-3/E-3 leased lines. Though their bandwidth is small by storage standards (1.544 megabits per second and 44.736 megabits per second, respectively), combining several connections can provide sufficient bandwidth for many remote copy applications.
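As an illustration of that matching exercise, the sketch below compares a measured application transfer rate against common link speeds. The measured rate is a hypothetical stand-in for what monitoring tools would actually report; the link rates are the standard ones for each technology:

```python
# Compare a measured storage transfer rate against candidate network
# links. Speeds are in megabits per second; the measured rate is a
# hypothetical stand-in for what monitoring tools would report.

LINKS_MBPS = {
    "T-1": 1.544,
    "T-3": 44.736,
    "OC-3": 155.52,
    "OC-12": 622.08,
    "Gigabit Ethernet": 1000.0,
    "2Gb Fibre Channel": 2000.0,
}

measured_peak_mbps = 310.0  # hypothetical measured application peak

for name, speed in LINKS_MBPS.items():
    verdict = "sufficient" if speed >= measured_peak_mbps else "too slow"
    print(f"{name:>18}: {speed:>8.2f} Mb/s -> {verdict}")
```

In this hypothetical case, an OC-12 circuit would suffice, and paying for a full gigabit link would be wasted money.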
The Causes of Network Latency

Network latency, the time it takes for a packet or frame to get from the source to the destination, depends on many factors. These include the distance the data must travel, the number of switches and routers it must pass through, congestion on shared links, and the processing overhead of the network protocols involved.
It is important to take these factors into account. Keep in mind the following:
Because all storage was originally local, storage software works under the assumption that a long delay means the resource is unavailable. This is in sharp contrast to network applications, which assume that network latency will happen and are willing to wait longer periods of time. With remote copy, problems occur when the application must wait for acknowledgment of a frame. To move on to the next I/O, the application needs to know that the last one was successful. When the application has to wait a long time for the last I/O to respond, it will assume that the storage is no longer available and will fail in some fashion. Long delays caused by network latency result in slow acknowledgment of the whole transaction and possible failure of the application.

Vendors of remote copy applications employ a variety of tricks to overcome network latency problems. All these techniques are designed to make it appear that the I/O was completed. Storage applications have also adapted to environments in which network latency is more of a problem, such as SANs. By queuing I/Os, retrying before abandoning, and caching, applications have become more tolerant of delays in completing storage transactions.
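The "retry before abandoning" behavior can be sketched as a simple loop. Everything here is hypothetical: send_io stands in for the real transport call, and the timeout and retry counts are illustrative defaults:

```python
import time

# Sketch of "retry before abandoning": the I/O is retried a few times,
# each with its own timeout, before the application is told the remote
# storage is unavailable. send_io is a hypothetical transport call that
# returns True when the remote acknowledgment arrives in time.

def remote_write(send_io, payload, timeout_s=2.0, retries=3):
    for attempt in range(1, retries + 1):
        if send_io(payload, timeout_s):
            return True               # remote array acknowledged the I/O
        time.sleep(0.1 * attempt)     # brief backoff before the next try
    return False                      # caller rolls back and raises an error
```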
Distances

Distances for remote copy are thought of in the traditional networking manner: network connections come in local, metropolitan, and wide-area (long-distance) varieties.

Local remote copy is performed within the confines of the LAN or SAN, in the same building or campus. It can assume very fast connections, usually Fibre Channel or Gigabit Ethernet, and few switches and routers to pass through.

The options are greater for metropolitan areas. The MAN distance is defined as being within the local area of a city or region, usually less than 100 kilometers. Intercity connections that are close together are also considered metropolitan. Direct fiber optic links can be leased or built and then used directly by Fibre Channel and Gigabit Ethernet. Native Fibre Channel can be used when the distance is less than 10 kilometers, and Gigabit Ethernet when the distance is less than 40 kilometers. High-capacity fiber optic connections, such as SONET OC-48 and OC-96 circuits, are also available within large metropolitan areas.

Long-distance remote copy can use any of the data communications connections available for long-haul transmissions, including T-1/E-1 and T-3/E-3 circuits and leased fiber optic lines (dark fiber). Remote copy at these distances, however, poses some difficulties for the system architect. The network latency caused by distance alone is significant and can create problems for remote copy applications. Another difficulty is the type of network connection available. The distances are too long for direct Gigabit Ethernet, Fibre Channel, and SONET protocols. A single fiber optic link is usually not available, so packets have to be routed through the networks of a telecommunications carrier, adding more delay. To operate over long distances, a remote copy application needs to be tuned very carefully.
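To see why distance alone is significant, consider the ceiling it places on serialized, acknowledged I/Os. A back-of-the-envelope sketch, again assuming roughly 5 microseconds per kilometer of fiber and ignoring all device and protocol overhead:

```python
# Rough upper bound on serialized, acknowledged I/Os per second imposed
# by distance alone. Real numbers will be worse once device latency and
# protocol processing are added.

def max_serial_iops(distance_km: float) -> float:
    rtt_s = 2 * distance_km * 5e-6  # round trip, propagation only
    return 1.0 / rtt_s

for label, km in (("campus", 1), ("metro", 100), ("long haul", 4000)):
    print(f"{label:>9} ({km:>4} km): at most "
          f"{max_serial_iops(km):>10,.0f} I/Os per second per stream")
```

At campus distances the ceiling is effectively invisible; at 4,000 kilometers it drops to a few dozen serialized I/Os per second, which is why long-haul designs need special handling.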
Synchronous and Asynchronous Remote Copy

Problems with timely acknowledgment of remote copy I/O have led to two different ways of implementing remote copy: synchronous remote copy and asynchronous remote copy.

The synchronous form of remote copy has the host wait for acknowledgment from both the primary disk array and the remote array. This is the more secure method of remote copy. The application is assured that all I/Os have been completed on both arrays and that an exact copy of the data and logs exists. If the I/O to the primary array fails, the remote array can be rolled back to the same state as the primary and an error produced for the application. If the remote copy fails, the remote copy software can resend the I/O while the application waits for a response. If the response does not come in a reasonable amount of time, the I/O can be rolled back on the primary and an error code generated. Synchronous remote copy assumes that sufficient bandwidth exists on the network link to the remote array to perform I/Os normally. When the primary data path is using 1 gigabit per second of bandwidth, and the remote array is serviced by an OC-24 network link, there will be sufficient bandwidth for synchronous remote copy. When that link is shared by several applications, there may be times when the I/Os to the remote array cannot be completed in the allotted time and the connection times out. Depending on the applications, the host may be able to wait for the packet to be resent, or an error may be generated.

With asynchronous remote copy, the remote copy software (whether it is housed in an appliance or a storage device, or is host based) acknowledges the I/O as soon as the primary storage completes it. The I/O is then sent to the remote array and managed independently of the primary I/O. The host does not have to wait for acknowledgment from the remote array to continue (Figure 4-5).

Figure 4-5. Synchronous and asynchronous remote copy
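The difference between the two modes comes down to which acknowledgments the host waits for before issuing the next I/O. A schematic sketch, in which write_primary, write_remote, and enqueue_remote are hypothetical stand-ins for the real array and transport operations:

```python
# Schematic contrast of the two acknowledgment models. write_primary,
# write_remote, and enqueue_remote are hypothetical stand-ins for the
# real array and transport operations.

def synchronous_write(io, write_primary, write_remote):
    ok_primary = write_primary(io)
    ok_remote = write_remote(io)   # host blocks until the remote ACK arrives
    if not (ok_primary and ok_remote):
        raise IOError("write failed; roll back to a consistent state")

def asynchronous_write(io, write_primary, enqueue_remote):
    if not write_primary(io):
        raise IOError("primary write failed")
    enqueue_remote(io)             # remote copy proceeds in the background
    # the host continues immediately without waiting for the remote ACK
```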
Even when the I/O to the remote array does not fail, waiting for the acknowledgment can drag down the host's performance. Network latency, retries, and other delays can cause the host to spend time waiting instead of processing data. With asynchronous remote copy, there is no waiting. This is a vitally important characteristic when the network link to the remote array is slow or spans a very long distance.

Asynchronous remote copy has also allowed for less costly implementations. Slower connections mean more network latency and retries, but with asynchronous remote copy these have less effect on the overall performance of the host. Lower-bandwidth connections can be used, which cost much less on a recurring basis.

There is a downside to this approach. The host has no way of knowing whether the remote copy actually occurred correctly, or at all. In the event of problems on the remote network link, the remote array could fall out of sync with the primary array. To mitigate this, remote copy applications often have a facility for resyncing the data. That is a time-consuming process that has to happen offline, causing downtime in the overall system. With this form of remote copy, the state of the remote array cannot be verified at all times by the host.

It should be noted that the steps involved in remote copy are not always sequential. Some implementations write the I/O to the primary and remote arrays at the same time. What is important is that with synchronous remote copy, the host has to wait for both acknowledgments before continuing with the next I/O, regardless of the order in which they arrive. With asynchronous remote copy, only the acknowledgment from the primary disk array is necessary before the next I/O can begin.
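The resync facility mentioned above is typically built on a record of which regions changed while the link was down, so that only those regions need to be re-sent. A minimal sketch of the idea; real products use specialized bitmaps and logs rather than a simple set:

```python
# Minimal dirty-region tracking for resynchronization. While the remote
# link is down, changed block numbers are recorded; on reconnect, only
# those blocks are re-sent instead of recopying the whole volume.
# read_block and send_block are hypothetical I/O callables.

class DirtyTracker:
    def __init__(self):
        self.dirty = set()

    def record_write(self, block_no: int):
        self.dirty.add(block_no)

    def resync(self, read_block, send_block):
        for block_no in sorted(self.dirty):
            send_block(block_no, read_block(block_no))
        self.dirty.clear()
```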
Bunkering

For some organizations, asynchronous remote copy does not afford the level of protection needed for critical applications. This is true in the financial-services industry, for example. Synchronous remote copy is instead used over a short distance, allowing for a high-bandwidth connection such as direct Fibre Channel or Gigabit Ethernet. Metropolitan-area network connections such as SONET are also used to get high bandwidth over short distances. When the need exists for long-distance but high-performance remote copy, a different architecture is needed; otherwise, costs will be high and system performance less than desired.

One such architecture is called bunkering. With bunkering, a hardened facility (the bunker) housing only storage and networking equipment is maintained a short distance away. A separate facility that contains not only storage but also application servers is kept at a much greater distance. Data is copied, using synchronous remote copy, to the arrays in the bunker, where it is available over a high-bandwidth local or MAN connection. The bunker storage then acts as a staging ground for asynchronous remote copy over a longer distance but slower link. From here, data is copied over the long distance using standard data communications links (Figure 4-6). Copies of the data are kept on the primary array, the bunkered array, and the remote array.

Figure 4-6. Bunkering
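The two-stage flow can be summarized in a few lines: a synchronous stage in the host's I/O path, and an asynchronous forwarding stage run by the bunker. All function names here are hypothetical stand-ins:

```python
from collections import deque

# Two-stage bunkering flow. The host waits only for the primary and
# bunker acknowledgments (the synchronous stage); the bunker forwards
# to the far site on its own schedule over the slower long-haul link.
# write_primary, write_bunker, and send_long_haul are hypothetical.

def host_write(io, write_primary, write_bunker, staging: deque):
    if not (write_primary(io) and write_bunker(io)):
        raise IOError("primary or bunker write failed")
    staging.append(io)  # models the bunker's staging area

def bunker_forwarder(staging: deque, send_long_haul):
    while staging:
        send_long_haul(staging.popleft())  # asynchronous stage
```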
Bunkering solves several problems with long-haul remote copy: cost, performance, and link failure. Because the primary storage has already copied its I/O over to the bunker storage, the application is not affected by the slower, less costly long-distance connection. If the long-distance connection should fail, the copy of the data in the bunker still protects the data. When the connection is brought back online, the bunker storage and long-distance storage can synchronize without disturbing the primary storage.

Bunkering provides other advantages over direct remote copy. By making three copies of the data instead of only two, the data is made safer. In the event of a regional disaster that destroys both the primary data center and the bunker, the third copy of the data is far enough away to remain unharmed. The far site can also be an operating data center to which employees can travel, allowing the company to return to normal operations sooner. By staging the data, it is also possible to run backups at one or more of the remote facilities; backups can be performed at almost any time without disrupting applications. Also, because bunkering involves three facilities, a network connection can be established between the primary and long-distance facilities, allowing remote copy to continue in the event of a major disruption at the bunker. With traditional remote copy, a disruption in the network link or remote facility leaves no options for continuing remote copy. Bunkering provides an alternative for businesses that have very high availability and data protection requirements.
Cost Considerations
Remote copy can be a very expensive method of data protection, especially over long distances. When designing remote copy systems, it is important to keep these cost factors in mind:

- Network connections. Recurring charges rise with both bandwidth and distance, and long-haul, high-bandwidth circuits are often the most expensive component of the design.
- Duplicate storage hardware. Every remote site needs arrays large enough to hold a full copy of the protected data.
- Facilities. Remote and bunkered sites must be equipped, powered, and maintained; bunkering requires three facilities instead of two.
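One way to frame the network tradeoff is to compare recurring link charges over the planning horizon of the system. The monthly prices below are placeholders, not real carrier tariffs:

```python
# Compare cumulative recurring charges for two link options over a
# planning horizon. All prices are hypothetical placeholders; use real
# carrier quotes in practice.

MONTHS = 36
options_per_month = {
    "high-bandwidth link (synchronous-capable)": 25_000,  # hypothetical $
    "lower-bandwidth link (asynchronous)": 5_000,         # hypothetical $
}

for name, monthly in options_per_month.items():
    print(f"{name}: ${monthly * MONTHS:,} over {MONTHS} months")
```

Over a multiyear horizon, the recurring charges usually dwarf the one-time equipment costs, which is why asynchronous remote copy over slower links is so attractive.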