
Perhaps no other area of storage management has gotten as much attention as virtualization. One explanation is that no other area of storage management has effectively covered such a wide variety of functions under one banner. For better or worse, virtualization has cast its shadow far and wide, positioned by many companies, large and small alike, as the panacea to solve all storage problems.

The rise of virtualization as a hot storage topic may be more indicative of customer pain than technological innovation. Storage management has become increasingly complex as users balance features, functions, and vendors in a myriad of combinations. In fact, the dizzying array of options would send any rational human being towards a solution that can "virtualize" the pain away.

Looking more closely into what makes up virtualization, we see that many features bundled under this banner have been prevalent in enterprise storage management for years. Therefore, we will define virtualization by looking at what it does as opposed to what it is.

3.8.1 Basic Concept of Virtualization

In its simplest form, virtualization presents heterogeneous storage homogeneously. More specifically, it provides a mechanism to aggregate different storage devices into a single storage pool that can be parsed and allocated through central administration. This process creates a presentation layer of logical views of physical storage devices that can come from multiple vendors.

This basic concept is shown in Figure 3-17A. In this environment all of the storage devices are virtualized into a common pool that can be distributed across the application servers accordingly. Administrators no longer need to know how each storage device is configured, where it is located, or what the capacity or utilization may be. Since all capacity gets lumped into a common pool, the virtualization engine will take care of the rest. In this respect, virtualization simplifies storage management by inserting an abstraction layer between the storage and the user assignment of that storage.
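To make the abstraction concrete, the following minimal Python sketch (all device and class names are hypothetical, not drawn from any vendor's product) models a pooling engine that aggregates capacity from heterogeneous devices and carves out logical volumes, recording the logical-to-physical mapping that the presentation layer hides from administrators.

    # Minimal sketch of a capacity-pooling virtualization engine (hypothetical names).
    from dataclasses import dataclass, field

    @dataclass
    class PhysicalDevice:
        name: str          # e.g., an array from any vendor
        capacity_gb: int
        free_gb: int

    @dataclass
    class StoragePool:
        devices: list = field(default_factory=list)
        # logical volume name -> list of (device name, gigabytes) extents
        volume_map: dict = field(default_factory=dict)

        def add_device(self, device: PhysicalDevice) -> None:
            self.devices.append(device)

        def allocate(self, volume: str, size_gb: int) -> None:
            """Carve a logical volume out of whichever devices have free space."""
            extents, remaining = [], size_gb
            for dev in self.devices:
                if remaining == 0:
                    break
                take = min(dev.free_gb, remaining)
                if take:
                    dev.free_gb -= take
                    extents.append((dev.name, take))
                    remaining -= take
            if remaining:
                raise RuntimeError("pool exhausted")
            self.volume_map[volume] = extents

    pool = StoragePool()
    pool.add_device(PhysicalDevice("vendorA-array", 500, 500))
    pool.add_device(PhysicalDevice("vendorB-array", 200, 200))
    pool.allocate("app-server-1", 600)
    print(pool.volume_map)  # {'app-server-1': [('vendorA-array', 500), ('vendorB-array', 100)]}

The point of the sketch is simply that the administrator asks for capacity by name and size; which physical arrays supply it becomes the engine's bookkeeping problem.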

However, the reality of the situation is quite different. Aggregating storage into a single pool works well when devices share common characteristics and can be treated equally. In practice, organizations maintain a wide variety of equipment that translates into unique capacity, performance, and functionality characteristics for each device. Therefore, the representative environment will more likely be similar to that in Figure 3-17B.

Figure 3-17. Virtualization in homogeneous and heterogeneous environments.

In this case, value-added features of each storage device may be lost in the virtualization process, hampering some of the initial goals. For example, some storage devices may have disk drives and RAID sets configured more for speed than availability. Simply aggregating such devices into a larger pool of available storage reduces their value and contribution. To address this issue, some vendors implement storage classes within the virtualization process. However, taken to the extreme, each device, or group of homogeneous devices, would be in its own class, thereby restricting the capability to create an aggregated pool of storage.
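The sketch below extends the same idea with invented class labels ("fast", "resilient") to show how class-aware allocation preserves device characteristics at the cost of shrinking the set of devices a request can draw from.

    # Hypothetical class-aware allocation: each device carries a class label,
    # and a request only draws from devices whose class matches.
    devices = [
        {"name": "raid0-array", "class": "fast", "free_gb": 300},
        {"name": "raid5-array", "class": "resilient", "free_gb": 400},
    ]

    def allocate(size_gb: int, storage_class: str) -> list:
        extents, remaining = [], size_gb
        for dev in (d for d in devices if d["class"] == storage_class):
            take = min(dev["free_gb"], remaining)
            if take:
                dev["free_gb"] -= take
                extents.append((dev["name"], take))
                remaining -= take
        if remaining:
            raise RuntimeError(f"no capacity left in class '{storage_class}'")
        return extents

    print(allocate(200, "fast"))        # [('raid0-array', 200)]
    print(allocate(350, "resilient"))   # [('raid5-array', 350)]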

Freedom to choose among storage devices and to substitute one for another has been a driving force behind virtualization. The ability to break the lock-in of one particular vendor by substituting another compels customers to evaluate such solutions. Stepping back, that may represent just a shift of the vendor lock-in from the storage device vendor to the virtualization vendor.

By virtualizing storage, the virtualization engine facilitates storage flow to and from the devices and represents a central clearinghouse for this information. Placing a variety of storage devices within an aggregated pool, front-ended by a virtualization engine, implies that the virtualization engine controls access to the entire pool. Further, consider the processing involved with the recovery of one of the devices in a virtualized pool. Of course, backing up may be simple enough, with the various devices feeding into a single stream to a consolidated backup device such as a tape library. But what about the restore? The virtualization engine must remember where each piece of data resided and return it to its initial location. While this may be technically feasible, the time and processing required to complete such an operation makes the recovery of virtualized storage a daunting prospect.
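As a rough illustration of why the restore is the hard part, the sketch below (record formats and names are invented for illustration) shows that a backup can be written as one consolidated stream, but replaying it depends entirely on the engine's logical-to-physical map to put each extent back on its original device.

    # Hypothetical illustration: backup is a flat stream, but restore needs the
    # virtualization engine's map to return each extent to its original device.
    backup_stream = [
        {"volume": "app-server-1", "offset_gb": 0, "size_gb": 500, "data": "..."},
        {"volume": "app-server-1", "offset_gb": 500, "size_gb": 100, "data": "..."},
    ]

    # The engine's metadata: (volume, offset) -> physical device that held it.
    volume_map = {
        ("app-server-1", 0): "vendorA-array",
        ("app-server-1", 500): "vendorB-array",
    }

    def restore(stream, vmap):
        for chunk in stream:
            device = vmap.get((chunk["volume"], chunk["offset_gb"]))
            if device is None:
                raise RuntimeError("mapping lost; data cannot be placed")
            print(f"writing {chunk['size_gb']} GB back to {device}")

    restore(backup_stream, volume_map)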

The reality leads one to understand that control points within the infrastructure matter and must be examined closely. Chapter 4, "Storage System Control Points," takes a detailed look at this subject.

3.8.2 Separating Virtualization from Data Coordination and Protection Functions

Virtualization functions have been a part of data storage infrastructures for many years, and they will continue to be. Reactions to virtualization in the storage networking industry frequently boil down to debates about definition rather than function.

In fact, the core aspects of virtualization are the same as data coordination functions like volume management. Veritas Software has delivered this functionality in its volume manager and file system products for years, allowing administrators to create virtual file storage or volumes from LUNs on different arrays. Typically, Veritas Volume Manager (VxVM) or File System (VxFS) implementations do not span virtual volumes across multivendor environments, but no technical reason prohibits such use. New virtualization solutions have taken this simple concept one step further and focused on volume management across heterogeneous environments.
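The sketch below is not the VxVM or VxFS API; it is a hypothetical illustration of the underlying volume-management idea, striping a virtual volume's blocks across two LUNs that could just as easily come from different vendors' arrays.

    # Hypothetical striping: a virtual volume maps block n to LUN (n mod k).
    # Nothing in the arithmetic cares which vendor supplies each LUN.
    luns = ["vendorA-lun0", "vendorB-lun7"]   # invented LUN names
    stripe_unit_kb = 64

    def locate(block_number: int) -> tuple:
        """Return (lun, offset within that lun) for a virtual-volume block."""
        lun = luns[block_number % len(luns)]
        offset_kb = (block_number // len(luns)) * stripe_unit_kb
        return lun, offset_kb

    for block in range(4):
        print(block, locate(block))
    # 0 ('vendorA-lun0', 0)
    # 1 ('vendorB-lun7', 0)
    # 2 ('vendorA-lun0', 64)
    # 3 ('vendorB-lun7', 64)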

Data-protection functions such as backup and mirroring are frequently, and mistakenly, bundled as virtualization solutions. These storage applications do send virtual volumes across local or remote networks, and may rely on underlying virtualization techniques. But bundling these data protection functions into a single virtualization basket only confuses an already misunderstood technology.

3.8.3 Areas of Virtualization Deployment

Basic virtualization functions can be deployed across layers in the enterprise storage environment, including host, fabric, and subsystem. These are illustrated in Figure 3-18 and compared in Table 3-2. Pros and cons exist within each of these areas, and a myriad of implementations are offered by different vendors.

Figure 3-18. Areas of virtualization deployment: host, fabric, and subsystem.

The most common form of virtualization is host-based and has been present in enterprise configurations via volume management and file system applications. The advantages of this approach include support for heterogeneous environments, ease of installation, and delivery of APIs to applications. However, host-based implementations run in software, thereby consuming CPU cycles that could otherwise be used for application processing. Additionally, each host must be configured independently for its own logical disk volumes, which leads to scalability limitations. Because each host imports its own independent LUNs, those LUNs cannot be easily shared across multiple hosts.

Subsystem-based virtualization provides greater performance than host-based implementations through the use of best-of-breed RAID technologies and optimized caching. However, subsystem-based virtualization often applies to only a single array. Some vendors have implemented virtualization capabilities that easily span multiple disk arrays. However, subsystem approaches are vendor-specific and unlikely to exploit features across a variety of vendor platforms.

Fabric-based virtualization comes in two methods that can reside in different areas of the fabric. Figure 3-19 helps clarify the dimensions and effects. The two types, in-band and out-of-band, refer to the placement of the virtualization engine in relation to the data path. In the case of in-band, the engine intercepts all I/O commands and redirects them to the appropriate resources. All command requests and data pass through the engine. This limits overall performance to the performance of the in-band device.

Figure 3-19. Types of fabric virtualization.
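A minimal sketch of the in-band pattern, with invented names, follows; the point is that every command and every byte of data funnel through the engine, so the engine's own throughput caps the configuration.

    # Hypothetical in-band engine: both the command and its data traverse the engine.
    VOLUME_MAP = {"vol1": "arrayA", "vol2": "arrayB"}   # invented logical-to-physical map

    def inband_write(volume: str, data: bytes) -> str:
        device = VOLUME_MAP[volume]       # the engine redirects the command...
        # ...and carries the payload as well; because every byte passes through here,
        # total throughput is capped by the engine's own bandwidth and processing.
        return f"{len(data)} bytes written to {device} via the engine"

    print(inband_write("vol1", b"x" * 4096))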

With out-of-band virtualization, only command information is passed through the virtualization engine. Once the required meta-information for storage is shared with the initiator, that initiator has direct access to the storage device. This provides the highest level of performance. Essentially, an out-of-band virtualization engine introduces the right host to the right device, then gets out of the way.
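By contrast, here is a sketch of the out-of-band pattern (again with invented names): the initiator asks the engine once where a volume lives, then moves data directly to that device.

    # Hypothetical out-of-band flow: only metadata passes through the engine;
    # the data path runs directly from host to device.
    VOLUME_MAP = {"vol1": "arrayA", "vol2": "arrayB"}   # invented logical-to-physical map

    def resolve(volume: str) -> str:
        """Engine call: return where the volume lives, then get out of the way."""
        return VOLUME_MAP[volume]

    def direct_write(device: str, data: bytes) -> str:
        return f"{len(data)} bytes written directly to {device}"

    device = resolve("vol2")                  # one metadata round trip to the engine
    print(direct_write(device, b"x" * 4096))  # the data never touches the engine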

Both types of virtualization can reside within different areas of the fabric. One platform for fabric-based virtualization is the appliance. Appliances are frequently rack-mounted Windows or Linux PCs that have a few additional storage ports on PCI cards. These devices provide simplicity, ease of installation, and low cost. However, an appliance not specifically designed for networking may impact performance.

An alternative to the appliance-based approach is switch-based virtualization. Switch-based virtualization overcomes speed limitations by using a network-centric platform designed for wire-speed throughput. By placing functionality directly within the switch and providing switch-to-switch redundancy, this approach offers a more scalable architecture. Switch-based virtualization eliminates the need for additional devices in the data path, delivers faster reactions to network-related events, acts as a central configuration control for zoning, and assists in better path management.

3.8.4 Understanding the Multivendor Virtualization Game

Looking back at the features of storage management, one can easily be overwhelmed by the laundry lists of features required by enterprise storage deployments. Storage software aims to automate as much of the process as possible, but still leaves a heavy burden on the shoulders of the storage administrator. Virtualization helps alleviate that burden by masking lower layer functions of the storage management chain and presenting clear, intuitive interfaces for tasks like storage allocation and data protection. By doing so, virtualization engines firmly demarcate areas of control within the infrastructure. They serve as central distribution points that dictate the look and feel, as well as the performance and reliability, of the entire configuration.

With standards emerging for this sector, multivendor virtualization solutions may one day be a reality. In the meantime, it is unlikely that these solutions will operate in lock step, and more probable that segmentation and isolation of solutions will take place. Unfortunately, this is exactly the problem that virtualization set out to solve. Segmentation will likely take place based on areas of intelligence, and customers have options to react accordingly. For example, within the subsystem layer, it would seem unlikely for large storage subsystem manufacturers to provide similar feature sets across other vendors' products.

Figure 3-20 shows that the interoperability challenge spans the horizontal host, fabric, and subsystem layers. Recognizing the market forces that inherently present such challenges, strategic options for balancing multivendor solutions should focus on the vertical axis across areas of intelligence. For example, it is common to support host-based volume management software along with SAN switches and intelligent subsystems. Each layer has the ability to perform some virtualization functions and can be used to balance other areas in the food chain. This approach helps balance both control points and vendor lock-in, and ultimately reduces total ownership costs.

Figure 3-20. Interoperability strategies and challenges for virtualization.

Table 3-2. Host, Fabric, and Subsystem Virtualization

Host

Definition: Volume management or file-system services that reside in software on the host. The software intercepts I/O commands and redirects them to the appropriately mapped storage devices.

Benefits: Support for heterogeneous targets; delivers APIs to applications.

Limitations: Configuration required on a per-host basis; performance scaling consumes host CPU cycles; LUNs imported by a host cannot be shared with other hosts.

Fabric

Definition: Network-based handling of I/O requests from the host; commands are redirected to mapped physical storage. In-band virtualization engines reside in the data path, handling both redirection and data flow. Out-of-band virtualization engines reside outside the data path and maintain metadata for redirection, while data flows directly between host and device. Virtualization engines can reside in stand-alone appliances or within network switches.

Benefits: Centralization of storage administration; offloading from servers (compared to the host approach) frees CPU cycles for applications; simple support for heterogeneous environments.

Limitations: Vendor-specific storage features not applicable to the complete environment; appliance approach not scalable; in-band approach requires massive processing power to avoid impacting the data path.

Subsystem

Definition: Target-based services that map sets of drives to logical volumes presented to hosts.

Benefits: Best-of-breed RAID 0 and RAID 5; higher performance than the host approach; scales to multiterabyte capacity; subsystem cache delivers high I/O throughput.

Limitations: Administration on a per-target basis; vendor-specific storage features not applicable to the complete environment; no path awareness.
