Software Development: Building Reliable Systems

Disk Storage Architecture

Traditionally, disk storage was attached to a single host. Workstations, Unix servers, and mainframes each had their own directly attached storage.

If you wanted to share data between hosts, you typically copied it to the remote host's storage system. There was no easy way to share storage devices, other than a few dual-ported disk subsystems that might support two hosts at most. This led to the creation of server-centric storage islands, as illustrated in Figure 16-3.

Figure 16-3. Host Attached Storage

The next major architectural change in storage was to build large storage systems, front-ended by intelligent controllers, that could attach to many servers (several dozen or more). This approach made it easier to share available storage, possibly between heterogeneous hosts. If you need 500 GB on a mainframe and 100 GB on a Unix server today, but next year your plans call for 100 GB on the mainframe and 500 GB on the Unix server, this is a good approach. Entire companies, such as EMC, have built their businesses on large storage systems designed around this exact approach.
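
To make the reallocation benefit concrete, here is a minimal sketch of a shared capacity pool. The class, host names, and sizes are illustrative assumptions, not a vendor API: the point is simply that the same 600 GB can be split 500/100 this year and 100/500 next year without buying host-specific disks.

    # Hypothetical capacity pool; names and sizes are assumptions.
    class StoragePool:
        def __init__(self, total_gb):
            self.free_gb = total_gb
            self.allocations = {}

        def allocate(self, host, gb):
            # Carve capacity out of the shared pool for one host.
            if gb > self.free_gb:
                raise ValueError("pool exhausted")
            self.free_gb -= gb
            self.allocations[host] = self.allocations.get(host, 0) + gb

        def release(self, host):
            # Return a host's capacity to the pool for reuse.
            self.free_gb += self.allocations.pop(host, 0)

    pool = StoragePool(600)
    pool.allocate("mainframe", 500)    # this year's plan
    pool.allocate("unix-server", 100)

    pool.release("mainframe")          # next year: rebalance the same capacity
    pool.release("unix-server")
    pool.allocate("mainframe", 100)
    pool.allocate("unix-server", 500)
    print(pool.allocations)            # {'mainframe': 100, 'unix-server': 500}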

However, upon closer examination, such centralized storage approaches have their drawbacks. The diverse applications supported in a typical enterprise today have vastly different storage requirements. For instance, Online Transaction Processing (OLTP) systems typically perform a large number of small, random writes to a storage device. A disk subsystem optimized for OLTP typically includes a large nonvolatile cache to speed up synchronous database write operations. On the other hand, an Online Analytical Processing (OLAP) application, such as a data warehouse, typically performs large read operations and simply requires the highest disk read throughput available. A large disk system cache might do nothing but slow such a system down. It is difficult to satisfy both requirements with a single storage subsystem.
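
The sketch below illustrates (but does not benchmark) the two access patterns just described. The file name, block sizes, and operation counts are arbitrary assumptions chosen for illustration.

    import os
    import random
    import tempfile

    BLOCK = 4 * 1024            # OLTP-style small block (4 KB)
    CHUNK = 1024 * 1024         # OLAP-style large read (1 MB)
    FILE_SIZE = 64 * 1024 * 1024

    path = os.path.join(tempfile.gettempdir(), "io_pattern_demo.dat")
    with open(path, "wb") as f:
        f.truncate(FILE_SIZE)   # preallocate a demo file

    # OLTP pattern: many small writes at random offsets, each synced so the
    # subsystem sees synchronous write traffic -- exactly the load that a
    # nonvolatile write cache is meant to absorb.
    with open(path, "r+b") as f:
        for _ in range(100):
            f.seek(random.randrange(0, FILE_SIZE - BLOCK))
            f.write(os.urandom(BLOCK))
            f.flush()
            os.fsync(f.fileno())

    # OLAP pattern: large sequential reads that want raw throughput; a cache
    # tuned for small synchronous writes does nothing useful here.
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass

    os.remove(path)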

Furthermore, as an organization grows, even the largest centralized storage subsystem becomes too small and you have to add another. With large centralized systems, there is a high marginal cost to adding a new storage subsystem. Eventually, you simply extend your original server-centric storage islands into larger storage-centric islands, as shown in Figure 16-4.

Figure 16-4. Centralized Storage

The best solution to the storage problem will likely come from network-attached storage subsystems. Such systems are being enabled by the rapid adoption of Fibre Channel Arbitrated Loop (FC-AL) disk drives and the associated hubs and switches. This architecture allows you to deploy storage networks using the FC-AL standard, mixing and matching FC-AL storage subsystems to suit the storage requirements of each application. Much as in Ethernet networks, the storage devices are shared among all hosts on the FC-AL network. When needed, intelligent controllers can provide nonvolatile write caching, RAID, mirroring, or even orthogonal storage functions such as backup and recovery. The resulting architecture would look much like that shown in Figure 16-5.
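
As a minimal sketch of the sharing and layering ideas above (not FC-AL protocol code; every class and device name is an illustrative assumption), the following models two hosts reaching the same loop-attached device, with an intelligent controller layered in front to add mirroring:

    class Disk:
        """A trivially modeled block device on the storage network."""
        def __init__(self, name):
            self.name = name
            self.blocks = {}

        def write(self, lba, data):
            self.blocks[lba] = data

        def read(self, lba):
            return self.blocks.get(lba)

    class MirroringController:
        """Intelligent controller: presents one logical device but
        writes every block to both underlying disks."""
        def __init__(self, primary, secondary):
            self.primary, self.secondary = primary, secondary

        def write(self, lba, data):
            self.primary.write(lba, data)
            self.secondary.write(lba, data)

        def read(self, lba):
            # Fall back to the mirror if the primary has no copy.
            return self.primary.read(lba) or self.secondary.read(lba)

    # Unlike host-attached storage, both hosts see the same shared device.
    loop_device = MirroringController(Disk("fcal-disk-0"), Disk("fcal-disk-1"))
    loop_device.write(42, b"written by host A")
    print(loop_device.read(42))   # host B reads host A's data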

Figure 16-5. Network Attached Storage
