
6.3 Fault-tolerant, high-availability, and clustering systems

The approach taken for service protection depends largely upon the budget, the relative importance of service outages to the business or organization, and the type of data being protected. The network designer may choose from a basic high-availability solution through to a full fault-tolerant system. These systems can be described as follows:

Figure 6.13 illustrates two of these systems.

Figure 6.13: General architecture for servers in (a) fault-tolerant mode and (b) clustered high-availability (HA) mode.

Fault-tolerant solutions are explicitly designed to eliminate all unplanned and planned computer downtime by incorporating reliability, maintainability, and serviceability features. Fault-tolerant systems claim as much as 99.999 percent uptime (colloquially known as five-nines). High-availability systems can deliver average levels of uptime in the 99 to 99.9 percent range.

6.3.1 Maintaining service levels

When we talk about system availability, we need to differentiate between data loss, transaction loss, and loss of service. These issues are tackled via different techniques. Data can be protected by recording them simultaneously to multiple storage devices; the most widely used techniques are disk mirroring and RAID. Fault-tolerant systems offer complete protection at the transaction level but are typically very expensive and do not scale. In contrast, HA solutions offer cost-effective and scalable protection; however, transactions may be lost. Regardless of the solution provided, overall service availability depends heavily on the architecture of the applications. Application-based redundancy refers to application-level routines designed to protect data integrity, such as two-phase commit or various data replication processes. Simple applications generally have no protection against transactions lost in midsession; a transaction lost in this way is gone for good. Sophisticated database management systems that support a two-phase commit model are typically much more robust against data loss.
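A minimal sketch of the two-phase commit pattern may help here; the class and method names are illustrative, not drawn from any particular DBMS:

```python
# A minimal two-phase commit sketch: no participant applies the
# transaction until every participant has voted to commit, so a
# midsession failure aborts cleanly instead of half-applying data.

class Participant:
    def __init__(self, name):
        self.name = name
        self.log = []                       # stands in for a stable log

    def prepare(self, txn_id):
        self.log.append(("prepared", txn_id))
        return True                         # vote yes; a failure votes no

    def commit(self, txn_id):
        self.log.append(("committed", txn_id))

    def rollback(self, txn_id):
        self.log.append(("rolled-back", txn_id))


def two_phase_commit(participants, txn_id):
    # Phase 1: collect votes from all participants.
    if all(p.prepare(txn_id) for p in participants):
        # Phase 2: everyone voted yes, so apply the decision everywhere.
        for p in participants:
            p.commit(txn_id)
        return True
    # Any "no" vote (or failure to vote) aborts the whole transaction.
    for p in participants:
        p.rollback(txn_id)
    return False


if __name__ == "__main__":
    print(two_phase_commit([Participant("db1"), Participant("db2")], 42))
```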

6.3.2 Application models and availability

In general the purpose of improving availability is to protect mission- or business-critical services; it is, therefore, important to understand the dynamics of these services in detail to ensure that the solutions provided are appropriate for, and consistent with, application behavior. Networked applications today often use a multitier architecture based on a client/server or distributed model. For availability purposes, it is useful to consider applications as three layers: communications services, application services, and database services, since the availability issues for each of these layers can be quite different, as follows:

At a large central site, the three layers are often distributed across several server platforms, in which case it is generally possible to provide a tailored availability solution for each layer. For example, fault-tolerant communications servers can provide continuous availability for higher-level communications services, enabling front-end applications to route messages among multiple back-end systems and to store and later submit transactions if back-end systems are offline. For smaller sites it is normal practice to select a single solution that is most suited for the needs of all three layers.

With fault-tolerant systems much of the hard work is transparent to the network designer, so these systems can be relatively straightforward to implement. However, fault-tolerant systems have finite resources and do not scale (so even FT systems may have to use clustering techniques). HA solutions provide scalability and can provide equivalent availability; an HA solution can be harder to implement, since much of the inner workings of the HA solution is exposed and may require special tuning or have topological restrictions. Currently there are no generic communications solutions that provide the equivalent level of reliability provided by two-phase commit database management software (although sessions may be maintained by HA solutions, they typically cannot guarantee against transaction loss). Further complications arise from the heterogeneous nature of many communications environments, as well as the need to incorporate existing legacy protocols, multiple operating systems, and a variety of networking devices in the end-to-end delivery path. All of these factors impact reliability.

6.3.3 Fault-tolerant systems

At the high end of the server market there are a number of vendors that offer truly fault-tolerant machines (such as Tandem [11] and Stratus [12]). These machines are designed to eliminate most single points of failure within the internal processor, memory, IO, and power subsystems. They also typically offer multiple network adaptors. Key features to look for in a fault-tolerant system are as follows:

The most resilient of fault-tolerant architectures include full hardware-based fault tolerance. Within these systems, hardware is engineered to include continuous self-checking logic, and all of the main subsystems are physically duplicated (CPUs, main memory, I/O controllers, system bus, power subsystems, disks, etc.). Self-checking logic is typically resident on every major circuit board to detect and immediately isolate failures. Since every separate element of the computer is duplicated, normal application processing continues even if one of these components should fail. In such systems, hardware-based fault tolerance is transparent to application software, and there is no performance degradation if a component failure occurs. Also, logic self-checking allows data errors to be isolated at each clock tick, assuring that erroneous data never enter the bus and, as a result, cannot corrupt other parts of the system. Finally, onboard diagnostics built into continuous availability system architectures often automatically detect problems before they lead to failures and initiate a service call instantaneously should a component fail.
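The pair-and-compare principle behind this self-checking can be illustrated in software terms, although real systems implement it in checking logic on every clock tick; this is a conceptual sketch only:

```python
# Conceptual sketch of pair-and-compare self-checking: duplicate the
# computation, compare the outputs, and refuse to emit a result on
# mismatch, so that erroneous data never reach the rest of the system.

def checked(replica_a, replica_b, *args):
    a = replica_a(*args)
    b = replica_b(*args)
    if a != b:
        # One of the duplicated units is faulty; isolate it rather
        # than let a bad value propagate.
        raise RuntimeError("self-check mismatch: isolate the faulty unit")
    return a

# usage: result = checked(cpu0_add, cpu1_add, 2, 3)
```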

Conventional computers (even in HA mode) are not fault tolerant; they have many single points of failure that can crash the system running the end user's mission-critical application. Their goal is to recover as soon as possible from a crash, rather than to assure that a crash does not occur in the first place. Recovery times can vary from seconds to many minutes depending on the complexity of the application and the communications involved. Recovery can take much longer if the application must be returned to the point where it was before the failure occurred. In many cases it may actually be impossible to recover the application context, and users may simply have to log in and start again. Disks may have to be resynchronized, databases synchronized, and screens refreshed. If a crash occurs in the middle of a transaction, corrupt data may enter the system, entailing additional time and cost to rectify. Obviously, transaction data also may be permanently lost during a system crash. Conventional computers reconfigured for high availability rely on layered systems software or custom application code residing above the operating system in order to help an application recover from a system crash. These configurations, however, have limited capabilities to identify hardware failures. They cannot detect transient hardware failures, for example. As a result, although the hardware platform may continue to run, the mission-critical software application can be rendered useless by bad data.

While fault-tolerant systems clearly have their advantages, there are several issues with fault-tolerant systems from the designer's perspective: they tend to be very expensive; since the system sits in a single location, that location is itself a single point of failure; and finite resources mean that for very high traffic volumes there can be scalability issues.

Operating systems

The server operating system can be proprietary or an industry standard OS (such as UNIX or Windows NT/2000). Clearly, the more standard the OS the more likely that more applications will be available to run on the platform. For example, the initial Stratus fault-tolerant platforms ran the proprietary Virtual Operating System (VOS). Stratus subsequently released support for two flavors of UNIX (FTX and HP/UX). Stratus Continuum systems are based on the HP PA-RISC microprocessor family and run a standard version of HP-UX. These systems are reportedly fully ABI and API compatible with HP-9000 servers and can run both HP and third-party software without modification. Stratus recently offered support for Windows 2000 via its Melody fault-tolerant server platform.

Scalability

While fault tolerance can be essential for mission- or business-critical servers, these systems have finite resources. As traffic and transaction levels increase, fault-tolerant systems eventually run out of steam, and an intelligent way of clustering systems (either with or without fault tolerance) is required. Clustering builds availability into a solution through external system redundancy and control procedures, much the same way that fault-tolerant systems internalize those processes. Clustering enables systems to provide scalability through modular addition of further systems into a cooperating group.

Example fault-tolerant application

Fault-tolerant systems are routinely employed to support Automated Teller Machine (ATM) and Point of Sale (POS) card authorization systems, since these systems typically support transactions on a global, 24/7 basis. These systems are characterized by continuously available front-end communications, authorization, and logging services, with back-end mainframe systems handling the customer account databases and settlements. With a fault-tolerant communications front end, service can continue during periods where the back-end systems are unavailable. Another common application is a 24/7 call center, integrated either with a customer service or an order-entry application. By deploying fault-tolerant systems in the call center front end, the call center can provide temporary processing or transaction capture if the back-end database systems are unavailable. The front end can also route transactions to multiple back-end systems if required.

Fault-tolerant systems may be used to support applications such as mission-critical Intelligent Network (IN) elements and Service Control Points (SCPs). These systems provide application and database services to the voice switches using the SS7 protocol running over WAN links. SCP applications require very high reliability and rely on large in-memory databases to fulfill the subsecond response times required. Network management systems, especially those used in large enterprises or for telecommunications network monitoring, may be deployed using fault-tolerant systems. These applications often keep large amounts of network status information in memory, and some fault-tolerant systems can provide persistent memory for in-context recovery after system failure.

6.3.4 Clustering and high-availability systems

Application and file servers are typically key resources on a network. For many reasons IT managers often prefer to centralize these resources, whether for cost, management, or performance reasons. Since many organizations are heavily dependent on these services, it is imperative that you protect them by deploying some fault-tolerance measures. In the past many enterprise networks were often designed with multiple routers between LAN segments in order to provide redundancy. The effectiveness of this design was limited by the speed at which the hosts on those LANs detected a topology failure and switched to an alternate router. As we have already discussed, many IP hosts tend to be configured with a default gateway or are configured to use Proxy ARP in order to find the nearest router on their LAN. Forcing an IP host to change its default router often requires manual intervention to clear the ARP cache or to change the default gateway. This section describes how this and other problems can be resolved. The main approaches we will consider are as follows:

There are many vendors currently offering standard and proprietary load-balancing hardware or software products. The techniques vary widely and have advantages and disadvantages that we will now examine.

Design features of HA systems

Specialized networking devices, such as routers, switches, and firewalls, generally use purpose-built hardware designs, particularly at the high end, that enable fault tolerance or simply improve reliability. These include the following:

Server mirroring

Server mirroring enables servers to be distributed, eliminating the problems associated with a single location while also enabling more cost-effective platforms to be purchased. With this technique, a backup server simultaneously duplicates all the processes and transactions of the primary server. If the primary server fails, the backup server can immediately take its place without any downtime. Server mirroring is an expensive but effective strategy for achieving fault tolerance. It's expensive because each server must be mirrored by an identical server whose only purpose is to be there in the event of a failure.

This technique generally works by providing hot standby servers. A primary and a backup server exchange keep-alive messages; when the primary stops transmitting, the standby system automatically takes over. These solutions work by capturing disk writes and replicating them to two live servers. Each piece of data written to the volume is captured on the backup server. The backup server is always online, ready to step in (this can take anything from milliseconds to minutes depending upon the architecture of the system). Products in this category are available from several vendors, including Novell (SFT Level III), Vinca Corp. (StandbyServer), IBM (HACMP/HAGEO), and HP (SwitchOver/UX). A less expensive technique that is becoming more and more popular is clustering.
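The keep-alive mechanism just described can be sketched as follows; the three-second dead interval and the class interface are assumptions, not taken from any of the products named above:

```python
import time

DEAD_INTERVAL = 3.0   # assumed; real products make this configurable

class StandbyServer:
    """Hot standby that takes over when the primary's keep-alives stop."""

    def __init__(self):
        self.last_keepalive = time.monotonic()
        self.active = False

    def on_keepalive(self):
        # Invoked whenever a keep-alive message arrives from the primary.
        self.last_keepalive = time.monotonic()

    def poll(self):
        # Invoked periodically; assume the primary's workload if it has
        # been silent for longer than the dead interval.
        if not self.active and \
           time.monotonic() - self.last_keepalive > DEAD_INTERVAL:
            self.active = True
```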

Clustering techniques

Clustering is a technique generally associated with the grouping of two or more systems together in such a way that they behave like a single logical system. It is used to provide a scalable, resilient solution for mission-critical networking components such as servers, routers, firewalls, and VPN appliances, and it serves three broad purposes: parallel processing, load balancing, and fault tolerance. Clustering is a popular strategy for implementing parallel processing applications, because it enables companies to leverage the investment already made in PCs and workstations. In addition, it is relatively easy to add new CPUs simply by adding a new PC to the network.

Clustering may be achieved using proprietary protocols and signaling systems or via standard protocols. Protocols such as VRRP or HSRP offer a crude form of clustering for routers.

Clustering software

High-availability server clustering software is available from a number of vendors, including Hewlett-Packard's MC/ServiceGuard [13], IBM's HACMP [5], and Microsoft's Cluster Server software (Wolfpack) [14]. These enable multiple servers, in conjunction with shared disk storage units, to rapidly recover from failures. Whenever a hardware or software failure is detected that affects a critical component (or an entire system), the software automatically triggers a failover from one cluster node to another. Data integrity and data access are preserved using a shared-access RAID system. Application processing and access to disk-based data are typically restored within minutes (recovery times will vary depending upon specific characteristics of the application and system configuration). To ensure proper failover behavior, the user must customize the configuration to match the environment by creating a number of failover scripts. Management of a cluster is generally more complex than for a single system, since multiple systems must be managed and configuration information must be consistent across systems. Software upgrades and other hardware or software maintenance operations can be done with minimal disruption by migrating applications from one node in a cluster to another.
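Failover scripts are vendor specific (MC/ServiceGuard, HACMP, and Cluster Server each use their own formats), but their general shape is similar. The following sketch is purely illustrative, with hypothetical command names, and shows only the typical ordering of steps when a protected service moves to a surviving node:

```python
import subprocess

def fail_over(volume_group, service_ip, app_start_cmd):
    # 1. Take exclusive ownership of the shared (RAID) storage.
    #    "activate_storage" is a placeholder for the platform command.
    subprocess.run(["activate_storage", "--exclusive", volume_group],
                   check=True)
    # 2. Bring up the floating service IP address on this node so that
    #    clients reconnect here without reconfiguration.
    subprocess.run(["ip", "addr", "add", service_ip, "dev", "eth0"],
                   check=True)
    # 3. Restart the protected application against the recovered data.
    subprocess.run(app_start_cmd, check=True)
```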

Availability of this software is dependent upon the operating system used. Router and firewall platforms typically rely on proprietary, or at least heavily modified, operating systems, which means that this software is unlikely to be available. These devices either implement proprietary clustering protocols or standards-based protocols such as VRRP.

6.3.5 Virtual Router Redundancy Protocol (VRRP)

Virtual Router Redundancy Protocol (VRRP) is a standards-based protocol [15]. VRRP enables devices (normally routers) to act as a logical cluster, offering themselves on a single virtual IP address. The clustering model is simple but effective, based on master-slave relationships. VRRP slave devices continually monitor a master's status and offer hot standby redundancy. A crude form of load sharing is possible through the use of multiple virtual groups. VRRP is similar in operation to Cisco's proprietary Hot Standby Router Protocol (HSRP) [16]. Although primarily used for fault-tolerant router deployment, VRRP has also been employed with other platforms (such as Nokia's range of firewall appliances [17]). The current version of VRRP is version 2.

The real problem VRRP attempts to address is the network vulnerability caused by the lack of end-system routing capabilities on most workstations and desktop devices. The vast majority of end systems interact with routers via default routes; the problem with default gateway functionality is that it creates a single point of failure—if the default router goes down, then all host communications may be lost outside the local subnet, either permanently or until some timer has expired. A mechanism was required to solve this problem quickly and transparently, so that no additional host configuration or software is required. VRRP solves this by clustering routing nodes that reside on a common shared media interface, offering them to end systems under a single virtual IP address. End systems continue to use the default gateway approach; nodes within the VRRP cluster resolve who should forward this traffic. Before proceeding any further let us reflect on the following definitions:

It is worth pointing out that VRRP is essentially a LAN-based protocol. To my knowledge there are no VRRP implementations available for wide area interfaces (although multiaccess technologies such as Frame Relay or SMDS could conceivably support it). Since the default gateway problem does not manifest itself in the wide area, it makes little sense to use VRRP on WAN interfaces, and dynamic routing protocols generally do a much better job.

VRRP packet format

VRRP messages are encapsulated in IP packets (protocol 112) and addressed to the IPv4 multicast address 224.0.0.18. This is a link-local scope multicast address; routers must not forward a datagram with this destination address regardless of its TTL. The TTL must be set to 255, and a VRRP router receiving a packet with a TTL not equal to 255 must discard it; this check ensures that the packet cannot have been forwarded from off the local link. The function of VRRP messages is to communicate priority and status information. In a stable state these messages originate from the master only. (See Figure 6.14.)

Figure 6.14: VRRP packet format.

Field Definitions
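For illustration, the VRRPv2 advertisement layout defined in [15] (version/type, VRID, priority, address count, authentication type, advertisement interval, checksum, virtual addresses, authentication data) can be built as follows. This sketch constructs only the VRRP message itself; a real implementation would send it in an IP packet with protocol 112, destination 224.0.0.18, and TTL 255, as described above:

```python
import socket
import struct

def vrrp_checksum(data):
    # Standard Internet checksum computed over the whole VRRP message.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_advertisement(vrid, priority, ips, interval=1):
    header = struct.pack("!BBBBBBH",
                         (2 << 4) | 1,      # version 2, type 1 (advert)
                         vrid,
                         priority,
                         len(ips),          # count of virtual addresses
                         0,                 # auth type 0: none
                         interval,          # advertisement interval (s)
                         0)                 # checksum placeholder
    body = b"".join(socket.inet_aton(ip) for ip in ips)
    body += b"\x00" * 8                     # authentication data
    csum = vrrp_checksum(header + body)
    return header[:6] + struct.pack("!H", csum) + body

pkt = build_advertisement(vrid=1, priority=254, ips=["194.34.4.1"])
```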

VRRP operation

VRRP operations are fairly straightforward and are summarized as follows:

If Proxy ARP is to be used on a VRRP router, then the VRRP router must advertise the VMAC address in the Proxy ARP message; otherwise, hosts might learn the real MAC address of the VRRP router.
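The VMAC itself is derived deterministically from the VRID: the IANA OUI 00-00-5E, followed by 00-01 and the one-byte VRID [15]. For example:

```python
def vrrp_vmac(vrid):
    # IANA OUI 00-00-5E, VRRP block 00-01, then the one-byte VRID.
    assert 1 <= vrid <= 255
    return "00:00:5e:00:01:%02x" % vrid

print(vrrp_vmac(1))   # 00:00:5e:00:01:01, the address hosts must learn
```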

Example design—simple hot standby

Figure 6.15 illustrates a topology where VRRP is used between two routers to provide resilience for client/server access for two LANs. In this configuration, both routers run VRRP on all interfaces, and on both LAN interfaces both routers simultaneously participate in a single VRRP group. Note that the VRIDs used could be the same in this case, since the two broadcast LANs are physically separate.

Figure 6.15: VRRP configuration with resilience for high-speed server farm.

End systems on the client LAN (VRID-1) install a default route to the virtual IP address (194.34.4.1), and Router-1 (with a priority of 254 on this interface) is configured as the master VRRP router for that group. Router-2 acts as backup for VRID-1 and only starts forwarding if the master router dies.

End systems on the server LAN (VRID-2) install a default route to the virtual IP address (193.168.32.12), and Router-2 (with a priority of 254 on this interface) is configured as the master VRRP router for that group. Router-1 acts as backup for VRID-2 and starts forwarding only if the master router dies.

This configuration enables full transparent resilience for both clients and servers. The hosts require no special software or configuration and are oblivious to the VRRP operations.

The more observant of you may have noticed that in the topology shown in Figure 6.15, we have effectively created asymmetrical paths across the VRRP cluster; traffic from the client network (VRID-1) is forwarded via Router-1 and is returned from the server network (VRID-2) via Router-2. It would have been just as easy to force the path to be symmetrical by making Router-1 master on both interfaces. In this scenario asymmetry is not a problem, assuming both routers are evenly resourced; in fact, this configuration distributes some of the workload between routers. In cases where the routers have very different performance characteristics (i.e., processor speeds and buffer sizes), this would not be advisable. In such cases the router with the most resources should be configured as master for both interfaces, or at least master for the server side configuration (assuming the bulk of the traffic is server-to-client oriented). Path asymmetry can also be an issue for VRRP routers that also offer firewall applications (session states may be maintained between firewalls), and, depending on the state update frequency, Router-1 and Router-2 may be out of synchronization.

Note also that in this configuration Router-2 is not backed up by Router-1 on the 194.34.4.0 network, and Router-1 is not backed up by Router-2 on the 193.168.32.0 network. Backup in these cases can be achieved by configuring another virtual router on each LAN, this time with the primary and backup roles reversed. This configuration could be enhanced quite easily to support full load sharing in addition to backup, as sketched below.
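The enhanced design might be captured as follows; this is an illustrative data model rather than any vendor's configuration syntax, and the second pair of VRIDs and virtual addresses are hypothetical:

```python
# Two virtual routers per LAN, with master and backup roles alternated,
# so each physical router backs the other up and hosts can be split
# between the two virtual addresses for load sharing.

client_lan = [
    {"vrid": 1, "vip": "194.34.4.1",    "master": "Router-1", "backup": "Router-2"},
    {"vrid": 3, "vip": "194.34.4.2",    "master": "Router-2", "backup": "Router-1"},
]
server_lan = [
    {"vrid": 2, "vip": "193.168.32.12", "master": "Router-2", "backup": "Router-1"},
    {"vrid": 4, "vip": "193.168.32.13", "master": "Router-1", "backup": "Router-2"},
]
```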

VRRP issues

The VRRP default router technique is relatively simple and effective but not without its drawbacks, and there are some subtle VRRP configuration issues that can be difficult to analyze for inexperienced engineers. These include the following:

Assuming that we run a dynamic routing protocol, or R1's VRRP process somehow gains knowledge of the broken link, it will at best start sending ICMP redirects to clients on the 194.34.4.1 interface, redirecting them to the real IP address of R2. This can lead to further problems, since R2 sees that VRRP operations on the client side are working well (R1 never actually relinquishes master status for VRID-1). This results in routing inefficiencies, since every new user session must be explicitly redirected (some client stacks also handle ICMP redirects badly). To solve this problem some vendors allow the monitoring of specified interfaces, so that transitions on those interfaces automatically trigger a change to the VRRP master status. In this case we could monitor 140.0.0.1, and any failure would be treated as a soft system failure, so the master stops advertising or lowers its priority to force reelection (note that this feature is not specified in the standards).
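The effect of such interface tracking can be sketched as follows; the decrement value is an assumption, since vendors implement this nonstandard feature differently:

```python
TRACK_DECREMENT = 100   # assumed; vendor implementations vary

def effective_priority(configured_priority, tracked_interfaces_up):
    # Lower the advertised priority when any monitored interface is
    # down, so that a healthier backup wins the reelection.
    prio = configured_priority
    if not all(tracked_interfaces_up.values()):
        prio -= TRACK_DECREMENT
    return max(prio, 1)   # priority 0 is reserved: "master releasing"

# Master at 254 drops to 154 when the WAN interface (140.0.0.1) fails.
print(effective_priority(254, {"wan0": False}))
```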

This works, but there is yet another subtle problem that this enhancement does not address. Since dynamic routing knowledge is not available to VRRP, in some scenarios it is quite possible that a dynamic routing protocol (e.g., RIP or OSPF) will be announcing a more optimal next-hop address, based on superior topological knowledge. In Figure 6.16 consider a failure of interface 140.0.0.1; assuming circuit monitoring is available, this will result in VRRP transitioning, so that R2 starts forwarding traffic as expected. However, OSPF (running concurrently on R1) may announce a better route to 140.0.0.0 via R1 and network 150.0.0.0 (e.g., there could be problems upstream of R2 at the interface to 140.0.0.0 that VRRP is unaware of). These problems could include the following:

VRRP is clearly useful but can be problematic for anything other than simple clustering applications. Subtle interactions with ARP, ping, and interior routing protocols often result in confusion for engineers, making diagnostic work protracted. VRRP does not provide efficient load sharing; in practice it distributes traffic on a node-by-node basis (i.e., not at the session or packet level). This means that a heavy traffic producer always goes to the same gateway, regardless of load. For large client populations there is no easy way for a network administrator to automate fair allocation of default gateways (ironically, this is the scenario in which VRRP, used as a quick fix, would be of most benefit). In summary, VRRP is a useful but very basic tool. For real high-bandwidth load-sharing and fault-tolerant applications, a more granular, more intelligent, transparent clustering technique is required.

6.3.6 Hot Standby Routing Protocol (HSRP)

The Hot Standby Router Protocol (HSRP) predates VRRP and is described in [16]. HSRP is a Cisco proprietary protocol (see Cisco's patent [18]) with functionality similar to VRRP. HSRP handles network topology changes transparently to the host using a virtual group IP address. HSRP has its own terminology, as follows:

HSRP is supported over Ethernet, Token Ring, FDDI, Fast Ethernet, and ATM. HSRP runs over UDP and uses port number 1985. Routers use their actual IP address as the source address for protocol packets, not the virtual IP address. This is necessary so that the HSRP routers can identify each other. Packets are sent to multicast address 224.0.0.2 with a TTL of 1. As with VRRP, an HSRP group can be defined on each LAN. One member of the group is elected master (the active router), and this router forwards all packets sent to the HSRP virtual group address. The other routers are in standby mode and constantly monitor the status of the active router. All members of the group know the standby IP address and the standby MAC address. If the active router becomes unavailable, the highest-priority standby router is elected and inherits the HSRP MAC address and IP address. HSRP typically allows hosts to reroute in approximately ten seconds. High-end routers (Cisco 4500, 7000, and 7500 families) are able to support multiple MAC addresses on the same Ethernet or FDDI interface, allowing the routers to simultaneously handle traffic sent to the standby MAC address and to the private MAC address. As with VRRP, if multiple groups are configured on a single LAN, load sharing is possible using different standby groups and appropriate default routes in end systems.
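As an illustration, an HSRPv1 hello can be sketched as follows, assuming the documented defaults (three-second hello, ten-second hold time, plaintext authentication data of "cisco") [16]. This builds only the UDP payload; a real implementation sends it from the router's actual IP address to 224.0.0.2, port 1985, with a TTL of 1:

```python
import socket
import struct

STATE_ACTIVE = 16   # HSRP states: initial 0, learn 1, listen 2,
                    # speak 4, standby 8, active 16

def build_hsrp_hello(group, priority, virtual_ip,
                     hello=3, hold=10, state=STATE_ACTIVE):
    return struct.pack("!BBBBBBBB8s4s",
                       0,            # version
                       0,            # opcode 0 = hello
                       state,
                       hello,        # hellotime in seconds
                       hold,         # holdtime in seconds
                       priority,
                       group,
                       0,            # reserved
                       b"cisco\x00\x00\x00",       # default auth data
                       socket.inet_aton(virtual_ip))

payload = build_hsrp_hello(group=1, priority=110, virtual_ip="194.34.4.1")
```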

Differences between HSRP and VRRP

The main differences between VRRP and HSRP are as follows:

The same reservations I made for VRRP in sophisticated designs apply to HSRP also. For further information on HSRP, the interested reader is directed to [16, 19].

6.3.7 Proxy server and interception techniques

There are a number of techniques used by gateways and proxies to enable load balancing as a kind of value-added service. These techniques usually work on the premise that since these proxies are often placed in strategic network locations (say in front of a server farm or on the perimeter WAN interface), and they need to modify or respond to requests as part of their basic function, they might add value without the user knowing. Note that the term balancing here is generally misleading, and the term sharing is perhaps more appropriate. Most of these techniques rely on quite crude methods to distribute load, and generally there are no guarantees about the actual load levels. Techniques in this class include the following:

When deploying proxy load balancers, careful attention should be paid to the design, since the balancer may otherwise become a single point of failure. If the load balancer dies, then remote clients may not be able to reach any of the servers in the cluster behind it. Since this functionality is increasingly being integrated into general-purpose load-sharing systems, switches, and firewalls, these devices can often be clustered to provide appropriate levels of fault tolerance. Many of the techniques listed above are closely associated with load sharing and performance optimization.
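As a simple illustration of why sharing is the more honest term, consider round-robin assignment, one of the crudest methods in this class: it spreads new connections evenly across back-end servers but makes no attempt to measure the load each connection generates. The server addresses below are hypothetical:

```python
import itertools

class RoundRobinProxy:
    """Assign each new connection to the next back end in turn."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick_backend(self):
        # Even connection counts, but not necessarily even load.
        return next(self._cycle)

proxy = RoundRobinProxy(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(4):
    print(proxy.pick_backend())   # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1
```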

6.3.8 Other related techniques

There are a number of other protocol techniques used to provide clustered fault tolerance, some standard, some proprietary, including router discovery protocol and IGP default routes.

6.3.9 Combining HA clusters with fault-tolerant servers

An architecture that combines HA clustering and fault-tolerant servers in a front-end/back-end configuration provides customers with the best combination of performance, flexibility, scalability, and availability. High-end servers, combined with MC/ServiceGuard software, provide a robust, scalable, back-end database service. Fault-tolerant servers provide a continuously available front-end communication service. Application services can run on the back end, front end, or both, depending on the specific application requirements. In some cases, the Application Services layer may warrant a separate set of server systems, again depending on the particular application environment and availability needs. This front-end/back-end architecture is actually very similar to the traditional mainframe architecture that has supported enterprise applications for over 30 years. Mainframes have used an intelligent communications controller to offload communications processing from the host (much of this communication is IBM SNA) and to allow routing of transactions among multiple hosts, providing both higher availability and load sharing. The combination of approaches brings the benefits of this traditional architecture to the world of open systems.

Deciding between an HA cluster and a fault-tolerant system will depend on the particular characteristics of the application and operational features of the network. HA clusters are a good choice if the following conditions apply:

A fault-tolerant system is a good choice if the following conditions apply:
