Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and File Systems (Vol 1)
Some of the biggest challenges in owning and developing a SAN are identifying potential problems and planning to minimize their impact. This section discusses a few topics that should be considered when implementing volume management and SAN virtualization systems.

The SAN Network Environment
Although storage subsystems are not the subject of this chapter, comparing their connection architectures with volume managers and SAN virtualization systems can be useful. The connection environment inside a storage subsystem is very well known and predictable. There are few surprises, if any, involving downstream communications within the subsystem. More importantly, the risk of the interconnect failing inside a disk subsystem is minimal due to designs with passive backplanes and protected connectors.

As is often true, the greatest strength of a technology is also its main weakness. In this case, the flexibility of volume managers and SAN virtualization systems creates greater ambiguity about the connecting environment. Volume managers and SAN virtualization systems have more networking and storage variables to work with than storage subsystems. Whereas the reliability of a disk subsystem is essentially a function of disk drive failure modes, the reliability of SAN virtualization systems includes additional networking variables such as GBICs and HBAs in the virtualization system or line cards in switches. Beyond component failures, congestion problems in the SAN can also become a reliability issue if they become serious enough. SAN virtualization systems may be more sensitive to network congestion problems due to the actions required of proxy LUs and initiators to manage I/O communications.

Unlike SAN virtualization systems, volume management software does not depend on any additional hardware in the system. However, the more connections that exist between systems and storage, the more likely it is that problems will occur. Insofar as volume management software may use more SAN connections (fan-out), there is a correspondingly higher probability that a connection failure will affect a system running volume management software; the sketch that follows illustrates this effect with hypothetical numbers.
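As a rough illustration of this fan-out effect, the short sketch below computes the probability that at least one of several independent connections fails during some interval. The per-connection failure probability and connection counts are hypothetical values chosen only for illustration; they are not from the text.

```python
# Hypothetical illustration of the fan-out effect: if each SAN connection
# fails independently with probability p over some interval, the chance that
# a host using n connections sees at least one failure grows quickly with n.

def prob_any_connection_fails(n_connections: int, p_fail: float) -> float:
    """Probability that at least one of n independent connections fails."""
    return 1.0 - (1.0 - p_fail) ** n_connections

if __name__ == "__main__":
    p = 0.001  # assumed per-connection failure probability (hypothetical)
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} connections -> P(at least one failure) = "
              f"{prob_any_connection_fails(n, p):.4f}")
```

With these assumed numbers, a host fanning out across 16 connections is roughly 16 times more likely to be affected by a connection failure than a host using a single connection.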
Multipathing with Volume Managers and SAN Virtualization Systems

Volume managers and SAN virtualization products are often implemented along with multipathing technology to overcome connection failures that might occur. For SAN virtualization systems, this means redundant SAN virtualization systems may be used in different paths. Host systems with two initiators could connect to two different SAN virtualization systems, both providing the same virtualization services with access to the same downstream storage resources. To do this, the virtualization systems need to agree on the LU UUID or serial number for the proxy LUs that communicate with host systems. The best way to implement SAN virtualization systems with multipathing is probably in cluster configurations, which are discussed in the next section.

The situation with volume managers is much easier to understand because volume managers typically operate between the file system and multipathing software in the I/O stack in host systems. Multipathing software running beneath the volume manager defines the connections between multiple initiators and a single logical unit in a disk subsystem. A volume manager might "see" a single storage resource, while multipathing software "sees" two or more ways to access that resource. Figure 12-14 is the same drawing of the I/O stack as in Figure 12-2, except for the presence of multipathing software.

Figure 12-14. Storage I/O Software Stack, Including File Systems, Volume Manager, Multipathing, and Storage Device Drivers
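The following minimal sketch models the layering just described: multipathing software presents several physical paths to the same logical unit as a single device, and the volume manager above it never needs to know how many paths exist. The class and path names are invented for illustration and do not correspond to any real multipathing product.

```python
# Minimal sketch (hypothetical, not a real multipathing implementation) of
# how multipathing software can present several initiator-to-LUN paths as a
# single storage resource to the volume manager running above it.

class Path:
    """One initiator-to-LUN connection (for example, through one HBA)."""
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def read(self, block: int) -> bytes:
        if not self.healthy:
            raise IOError(f"path {self.name} is down")
        return b"..."  # placeholder for a real read over this path

class MultipathDevice:
    """Presents multiple paths to the same logical unit as one device."""
    def __init__(self, paths):
        self.paths = list(paths)

    def read(self, block: int) -> bytes:
        # Try each path in turn; fail over transparently if one is down.
        for path in self.paths:
            try:
                return path.read(block)
            except IOError:
                continue
        raise IOError("all paths to the logical unit have failed")

# The volume manager is handed `device` and "sees" a single resource,
# while the multipath layer "sees" two ways to reach it.
device = MultipathDevice([Path("hba0->lun7"), Path("hba1->lun7")])
device.paths[0].healthy = False   # simulate a connection failure
data = device.read(block=0)       # still succeeds via the second path
```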
SAN Virtualization System Clusters
A single SAN virtualization system carries the risk of being a single point of failure. To alleviate this risk, multiple SAN virtualization systems can be used in a clustered configuration. Host systems using the virtualization cluster need multipathing software so that they can continue working by accessing another cluster member if one fails. The subject of system clustering is complex, especially when you consider that the cluster may be supporting high-performance I/O between servers and storage. To keep the discussion simple, this analysis focuses primarily on two-node clusters.

All nodes in the SAN virtualization cluster must be able to access all storage used by the cluster. Furthermore, they must share the same configuration information that defines the various virtualization lenses for all hosts using the cluster. As long as any cluster member can present a host system with the correct virtualization lens, a host system can access its storage through any of the cluster members.

One of the main challenges of operating a SAN virtualization cluster is dealing with any I/O actions that may be pending when a failure occurs on a cluster member. There is a chance that newly updated data has been written to write-back cache memory on the failed system while stale data remains on disk storage. It is essential that subsequent reads access the most recent copy of the data, whether it is in cache or on disk. A simple solution to this problem is to turn off caching in the cluster; that way, there is no risk of data consistency problems. However, this might not be acceptable if the SAN virtualization cluster is being used for high-throughput I/O applications.

A better approach is to ensure that cache contents are shared between cluster members. One way to do this is to mirror cache contents between nodes in the cluster. With mirrored cache, a write I/O is written to cache in both cluster members before it is acknowledged to the host initiator. This architecture is similar to that used for synchronous remote copy operations, as discussed in Chapter 10, "Redundancy Over Distance with Remote Copy," except that the connection speeds between the cluster members are much faster.
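The sketch below illustrates the mirrored write-back cache idea in a two-node cluster: a write is acknowledged to the host initiator only after it has been placed in the cache of both cluster members, so a surviving node can always serve the newest data even if disk has not yet been updated. This is a conceptual illustration under those assumptions, not the implementation of any particular product; the class and node names are invented.

```python
# Hypothetical sketch of mirrored write-back cache in a two-node
# SAN virtualization cluster.

class CacheNode:
    """Write-back cache on one SAN virtualization cluster member."""
    def __init__(self, name: str):
        self.name = name
        self.cache = {}          # block number -> most recent data

    def store(self, block: int, data: bytes) -> None:
        self.cache[block] = data

class MirroredCacheCluster:
    """Two-node cluster that mirrors cache contents before acknowledging."""
    def __init__(self, node_a: CacheNode, node_b: CacheNode):
        self.nodes = (node_a, node_b)

    def write(self, block: int, data: bytes) -> str:
        # The write lands in both caches before the host sees an ack,
        # analogous to a synchronous remote copy operation.
        for node in self.nodes:
            node.store(block, data)
        return "ACK"             # only now is the host initiator acknowledged

    def read_after_failure(self, block: int, survivor: CacheNode) -> bytes:
        # After a node failure, the surviving member still holds the newest
        # copy, even if stale data remains on disk.
        return survivor.cache[block]

cluster = MirroredCacheCluster(CacheNode("virt-node-1"), CacheNode("virt-node-2"))
cluster.write(block=42, data=b"new data")
print(cluster.read_after_failure(42, survivor=cluster.nodes[1]))  # b"new data"
```

If caching were simply turned off instead, the write path would go straight to disk and this consistency problem would disappear, at the cost of the throughput that write-back caching provides.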