Scalable Internet Architectures

Wackamole is a product from the Center for Networking and Distributed Systems at the Johns Hopkins University and is an integral part of the Backhand Project (www.backhand.org).

Wackamole's goal is to leverage the powerful semantics of group communication (Spread specifically) to drive a deterministic responsibility algorithm to manage IP address assignment over a cluster of machines. The technology differs from other IP failover solutions in that it can flexibly support both traditional active-passive configuration, as well as power multi-machine clusters where all nodes are active.

In Chapter 4, we briefly discussed the technical aspects of Wackamole; now we can discuss why it is the "right" solution for this problem.

Reasoning

There are several reasons for choosing Wackamole aside from my clearly biased preference toward the solution. The alternative solutions require placing a machine or device in front of this cluster of machines. Although this works well from a technical standpoint, its cost-efficiency is somewhat lacking. As discussed in Chapter 4, to meet any sort of high availability requirements, the high-availability and load-balancing (HA/LB) device needs to be one of a failover pair.

You also might argue that you already have a hardware load balancer in place for the dynamic content of the site, and you can simply use those unsaturated resources as shown in Figure 6.5. This is a good argument and a good approach. You have already swallowed the costs of owning and managing the solution, so it does not incur additional costs, while the management of a Wackamole-based solution does. However, another reason has not been mentioned yet: growth.

Figure 6.5. Simple HA/LB configuration.

The last section of this chapter discusses how to split the solution geographically to handle higher load and faster load times for users. Clearly, if this architecture is to be replicated twice over, the costs of two HA/LB content switches twice over will dramatically increase the price and complexity of the solution.

Although it is arguable that Wackamole is not the right solution for a single site if an HA/LB content-switching solution is already deployed at that site, it will become clear that as the architecture scales horizontally, it is a cost-effective and appropriate technology.

Installation

Wackamole, first and foremost, requires Spread. Appendix A, "Spread," details the configuration and provides other tips and tricks to running Spread in production. For this installation, we will configure Spread to run listening to port 3777.

Wackamole is part of the Backhand project and can be obtained at www.backhand.org. Compiling Wackamole is simple with a typical ./configure; make; make install. For cleanliness (and personal preference), we'll keep all our core service software in /opt. We issue the following commands for our install:

    ./configure --prefix=/opt/wackamole
    make
    make install

Installed as it is, we want a configuration that achieves the topology portrayed in Figure 6.6. This means that the six machines should manage the six IP addresses published through DNS. Now the simplicity of peer-based failover shines.

Figure 6.6. Peer-based high-availability configuration.

Which machine should get which IP address? As a part of the philosophy of peer-based HA, that question is better left up to the implementation itself. Wackamole should simply be told to manage the group of IP addresses by creating a wackamole.conf file as follows:

    Spread = 3777
    Group = wack1
    SpreadRetryInterval = 5s
    Control = /var/run/wack.it
    Prefer None
    VirtualInterfaces {
        { fxp0:192.0.2.21/32 }
        { fxp0:192.0.2.22/32 }
        { fxp0:192.0.2.23/32 }
        { fxp0:192.0.2.24/32 }
        { fxp0:192.0.2.25/32 }
        { fxp0:192.0.2.26/32 }
    }
    Arp-Cache = 90s
    mature = 5s
    Notify {
        # Let's notify our router
        fxp0:192.0.2.1/32
        # And everyone we've been speaking with
        arp-cache
    }

Let's walk through this configuration file step-by-step before we see it in action:

  • Spread Connects to the Spread daemon running on port 3777.

  • Group All Wackamole instances running this configuration file will converse over a Spread group named wack1.

  • SpreadRetryInterval If Spread were to crash or otherwise become unavailable, Wackamole should attempt to reconnect every 5 seconds.

  • Control Wackamole should listen on the file /var/run/wack.it for commands from the administrative program wackatrl.

  • Prefer Instructs Wackamole that no artificial preferences exist toward any one IP. In other words, all the Wackamoles should collectively decide which servers will be responsible for which IP addresses.

  • VirtualInterfaces Lists the IP addresses that the group of servers will be responsible for. These are the IP addresses published through DNS for images.example.com that will "always be up" assuming that at least one machine running Wackamole is alive and well.

  • Arp-Cache Instructs each instance to sample the local machine's ARP cache and share it with the other cluster members. The ARP cache contains the IP address to Ethernet MAC address mapping that is used by the operating system's network stack to communicate. It contains every IP address that a machine has been communicating with "recently." If machine A fails, and B is aware of the contents of A's ARP cache, B can inform all the necessary machines that have been communicating with A that the MAC addresses for the services they need have changed.

  • Mature To reduce "flapping," 5 seconds are allowed to pass before a new member is eligible to assume responsibility for any of the virtual interfaces listed.

  • Notify When Wackamole assumes responsibility for an IP address, it informs its default route at 192.0.2.1 and every IP address in the cluster's collective ARP cache. This is an effort to bring quick awareness of the change to any machines that have been using the services of that IP address.
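The idea behind Prefer None, that the cluster rather than the administrator decides which machine owns which address, can be illustrated with a small sketch. This is not Wackamole's actual algorithm. The function name assign_vips and the rule it uses (sort the members, then hand out contiguous, near-equal slices) are assumptions chosen only to show how every peer can compute the same answer independently from the same membership list.

```python
import ipaddress

# The six virtual addresses from the VirtualInterfaces block.
VIPS = ["192.0.2.%d" % host for host in range(21, 27)]


def assign_vips(members, vips):
    """Deterministically split vips across members.

    Hypothetical illustration only: any peer that sees the same
    membership list computes the same mapping without negotiation.
    """
    if not members:
        return {}
    # Sorting by numeric address gives every peer the same ordering.
    ordered = sorted(members, key=ipaddress.ip_address)
    base, extra = divmod(len(vips), len(ordered))
    assignment = {}
    start = 0
    for index, member in enumerate(ordered):
        # The first `extra` members absorb any remainder.
        count = base + (1 if index < extra else 0)
        assignment[member] = vips[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    for owner, owned in assign_vips(["192.0.2.12", "192.0.2.11"], VIPS).items():
        print(owner, owned)
```

A real implementation also has to decide how to move as few addresses as possible when membership changes. This sketch ignores that and simply recomputes the whole mapping.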

Testing the High Availability

Now that Wackamole is installed, let's crank it up and see whether it works. First we will bring up Spread on all the machines (it should be in the default start scripts already) and test it as described in Appendix A. Next, we start Wackamole on image-0-1:

    root@image-0-1# /usr/local/sbin/wackamole
    root@image-0-1# /usr/local/sbin/wackatrl -l
    Owner: 192.0.2.11
      *   fxp0:192.0.2.21/32
      *   fxp0:192.0.2.22/32
      *   fxp0:192.0.2.23/32
      *   fxp0:192.0.2.24/32
      *   fxp0:192.0.2.25/32
      *   fxp0:192.0.2.26/32
    root@image-0-1# ifconfig fxp0
    fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 192.0.2.11 netmask 0xffffff00 broadcast 192.0.2.255
            inet6 fe80::202:b3ff:fe3a:2e97%fxp0 prefixlen 64 scopeid 0x1
            inet 192.0.2.21 netmask 0xffffffff broadcast 192.0.2.21
            inet 192.0.2.22 netmask 0xffffffff broadcast 192.0.2.22
            inet 192.0.2.23 netmask 0xffffffff broadcast 192.0.2.23
            inet 192.0.2.24 netmask 0xffffffff broadcast 192.0.2.24
            inet 192.0.2.25 netmask 0xffffffff broadcast 192.0.2.25
            inet 192.0.2.26 netmask 0xffffffff broadcast 192.0.2.26
            ether 00:02:b3:3a:2e:97
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

So far, so good. Let's make sure that it works. From another location, we should ping all six of the virtual IP addresses to ensure that each is reachable. After successfully passing ICMP packets to these IP addresses, the router or firewall through which image-0-1 passes packets will have learned that all six IP addresses can be found at the Ethernet address 00:02:b3:3a:2e:97.
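The reachability check can be scripted rather than typed six times. The following is a minimal sketch, assuming a Unix-like ping that accepts -c for the packet count. The names is_reachable and check_all are illustrative and not part of Wackamole.

```python
import subprocess

# The six virtual addresses managed by the cluster.
VIRTUAL_IPS = ["192.0.2.%d" % host for host in range(21, 27)]


def is_reachable(ip, count=1, timeout=5):
    """Return True if the system ping exits successfully for ip."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Either ping hung or it is not installed; treat both as unreachable.
        return False
    return result.returncode == 0


def check_all(ips, probe=is_reachable):
    """Map each address to the result of probing it."""
    return {ip: probe(ip) for ip in ips}


if __name__ == "__main__":
    for ip, ok in check_all(VIRTUAL_IPS).items():
        print(ip, "reachable" if ok else "UNREACHABLE")
```

Passing the probe as a parameter keeps the loop testable without a network and allows a different check, such as an HTTP request, to be swapped in later.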

Now we bring up image-0-2:

    root@image-0-2# /usr/local/sbin/wackamole
    root@image-0-2# /usr/local/sbin/wackatrl -l
    Owner: 192.0.2.11
      *   fxp0:192.0.2.21/32
      *   fxp0:192.0.2.22/32
      *   fxp0:192.0.2.23/32
    Owner: 192.0.2.12
      *   fxp0:192.0.2.24/32
      *   fxp0:192.0.2.25/32
      *   fxp0:192.0.2.26/32
    root@image-0-2# ifconfig fxp0
    fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 192.0.2.12 netmask 0xffffff00 broadcast 192.0.2.255
            inet6 fe80::202:b3ff:fe3a:2f97%fxp0 prefixlen 64 scopeid 0x1
            inet 192.0.2.24 netmask 0xffffffff broadcast 192.0.2.24
            inet 192.0.2.25 netmask 0xffffffff broadcast 192.0.2.25
            inet 192.0.2.26 netmask 0xffffffff broadcast 192.0.2.26
            ether 00:02:b3:3a:2f:97
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

Everything looks correct, but we should make sure that image-0-1 sees the same thing. Because the output of wackatrl -l will certainly be the same on both machines, ifconfig is the true tool for confirming what each machine has actually configured. After bringing up image-0-2's Wackamole instance, we see the appropriate messages in /var/log/message, and ifconfig shows that the three complementary IP addresses are assigned to image-0-1.

    root@image-0-1# tail /var/log/message | grep wackamole
    image-0-1 wackamole[201]: DOWN: fxp0:192.0.2.24/255.255.255.255
    image-0-1 wackamole[201]: DOWN: fxp0:192.0.2.25/255.255.255.255
    image-0-1 wackamole[201]: DOWN: fxp0:192.0.2.26/255.255.255.255
    root@image-0-1# ifconfig fxp0
    fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 192.0.2.11 netmask 0xffffff00 broadcast 192.0.2.255
            inet6 fe80::202:b3ff:fe3a:2e97%fxp0 prefixlen 64 scopeid 0x1
            inet 192.0.2.21 netmask 0xffffffff broadcast 192.0.2.21
            inet 192.0.2.22 netmask 0xffffffff broadcast 192.0.2.22
            inet 192.0.2.23 netmask 0xffffffff broadcast 192.0.2.23
            ether 00:02:b3:3a:2e:97
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

Although the local configuration on each server looks sound, there is more to this than meets the eye. The traffic from other networks is being delivered to and from this one server through a router, and that router has an ARP cache. If the ARP cache was not updated, the router will continue to send packets destined for 192.0.2.24, 192.0.2.25, and 192.0.2.26 to image-0-1. Although we could be clever and send ICMP packets to each IP address and use a packet analyzer such as tcpdump or ethereal to determine whether the ICMP packets are indeed being delivered to the correct machine, there is a simpler and more appropriate method of testing this: turn off image-0-1.

Wackamole employs a technique called ARP spoofing to update the ARP cache of fellow machines on the local Ethernet segment. Machines use their ARP cache to label IP packet frames for delivery to their destination on the local subnet. When two machines on the same Ethernet segment want to communicate over IP, they each must ascertain the Ethernet hardware address (MAC address) of the other. This is accomplished by sending an ARP request asking what MAC address is hosting the IP address in question. This request is followed by a response that informs the curious party with the IP address and MAC address. The crux of the problem is that this result is cached to make IP communications efficient.

After we yank the power cord from the wall, we should see image-0-2 assume responsibility for all the IP addresses in the Wackamole configuration. Now a ping test will determine whether Wackamole's attempts to freshen the router's ARP cache via unsolicited ARP responses were successful.

If pings are unsuccessful and suddenly start to work after manually flushing the ARP cache on our router, we are unfortunate and have a router that does not allow ARP spoofing. The only device I am aware of that acts in this fashion is a Cisco PIX firewall, but I am sure there are others lingering out there to bite us when we least expect it.

If a server is communicating over IP with the local router, that router will inevitably have the server's MAC address associated with that server's IP address in its ARP cache. However, if that server were to crash and another machine were to assume responsibility for one of the IP addresses previously serviced by the crashed machine, the router would have the incorrect MAC address cached. Additionally, it would not know that it needs to re-ARP for that IP. So, Wackamole sends ARP response packets (also known as unsolicited or gratuitous ARPing) to various machines on the local Ethernet segment whenever an IP address is juggled from one server to another.
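To make the unsolicited ARP response concrete, the sketch below builds the bytes of such a frame: a 14-byte Ethernet header followed by the 28-byte ARP payload with the opcode set to 2 (reply). It is an illustration of the packet layout and not Wackamole's code. The router MAC address 00:00:5e:00:53:01 is a made-up placeholder, and the function names are assumptions. Actually transmitting the frame requires a raw socket and root privileges, which are deliberately left out.

```python
import socket
import struct


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def build_arp_reply(sender_mac, sender_ip, target_mac, target_ip):
    """Build an Ethernet frame carrying an ARP reply (opcode 2)."""
    sha = mac_to_bytes(sender_mac)
    tha = mac_to_bytes(target_mac)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet_header = tha + sha + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        2,        # opcode: reply
        sha,                          # "this MAC ...
        socket.inet_aton(sender_ip),  # ... now answers for this IP"
        tha,
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload


if __name__ == "__main__":
    # image-0-2 announcing to the router that it now holds 192.0.2.21.
    frame = build_arp_reply(
        "00:02:b3:3a:2f:97", "192.0.2.21",
        "00:00:5e:00:53:01",  # hypothetical router MAC
        "192.0.2.1",
    )
    print(len(frame), frame.hex())
```

A device that refuses to update its cache from a reply it never asked for will ignore this frame, which is the failure mode described above for some firewalls.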

Assuming that all has gone well, our cluster is ready for some serious uptime. After bringing up all six Wackamole instances, we will see the following output from wackatrl -l:

    root@image-0-2# /usr/local/sbin/wackatrl -l
    Owner: 192.0.2.11
      *   fxp0:192.0.2.21/32
    Owner: 192.0.2.12
      *   fxp0:192.0.2.22/32
    Owner: 192.0.2.13
      *   fxp0:192.0.2.23/32
    Owner: 192.0.2.14
      *   fxp0:192.0.2.24/32
    Owner: 192.0.2.15
      *   fxp0:192.0.2.25/32
    Owner: 192.0.2.16
      *   fxp0:192.0.2.26/32

Now, even if five of these machines fail, all six virtual IP addresses will be publicly accessible and serving whatever services necessary. Granted, our previous calculations let us know that one machine would never be capable of coping with the peak traffic load, but we still should be able to have two of them offline (unexpectedly or otherwise) and be able to handle peak load.

The next step is to advertise these six IP addresses via DNS so that people visiting images.example.com will arrive at our servers.

The DNS RR records for this service should look as follows:

    $ORIGIN example.com.
    images  900     IN      A       192.0.2.21
            900     IN      A       192.0.2.22
            900     IN      A       192.0.2.23
            900     IN      A       192.0.2.24
            900     IN      A       192.0.2.25
            900     IN      A       192.0.2.26

This sample bind excerpt advertises the six listed IP addresses for the name images.example.com, and clients should "rotate" through these IP addresses. Clients, in this context, are not actually end users but rather the caching name server closest to the client. Each name server is responsible for cycling the order of the records it presents from response to response. So each new query for images.example.com results in a list of IP addresses in a new order. Typically, web browsers tend to use the name service resolution provided by the host machine that they run on, and most hosts choose the first DNS RR record when several are presented. That means different clients accessing the same name server will contact different IPs, and there will be a general distribution across all the advertised IPs.
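The combined effect of a name server cycling its answers and clients taking the first record can be simulated in a few lines. This is a simplified model using the virtual addresses from the Wackamole configuration. It assumes a single name server that rotates strictly by one position per query, whereas real servers may shuffle randomly and many caching servers sit between users and the zone. The function names are illustrative.

```python
from collections import Counter, deque

ADDRESSES = ["192.0.2.%d" % host for host in range(21, 27)]


def rotating_answers(addresses):
    """Yield the record list in a new cyclic order for each query."""
    ring = deque(addresses)
    while True:
        yield list(ring)
        ring.rotate(-1)  # the next response starts one record later


def simulate(queries, addresses=ADDRESSES):
    """Count which address a 'first record wins' client would contact."""
    answers = rotating_answers(addresses)
    return Counter(next(answers)[0] for _ in range(queries))


if __name__ == "__main__":
    for address, hits in sorted(simulate(600).items()):
        print(address, hits)
```

In this idealized model the split is perfectly even whenever the query count is a multiple of six. Real traffic is lumpier because each cached answer is reused by many clients for the 900-second TTL.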

The balancing will not be even, but you should see roughly the same number of requests per second across each of the IP addresses. For example, in our production reference implementation, we see an average 30% deviation from machine to machine on a second-to-second basis, and less than 3% deviation from minute to minute. So, although the balancing is not perfect, it is entirely sufficient.
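Why the imbalance shrinks as the measurement window grows can be seen with a small simulation. The text does not define exactly how its deviation figures were computed, so the sketch assumes one plausible metric: the mean absolute deviation from the per-machine average, expressed as a fraction of that average. The traffic is synthetic (uniformly random assignment of requests to machines) and is not the production data, so the numbers it prints will not match the figures quoted above.

```python
import random


def mean_relative_deviation(counts):
    """Average absolute deviation from the mean, as a fraction of the mean."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return sum(abs(c - mean) for c in counts) / len(counts) / mean


def simulate_minute(machines=6, requests_per_second=120, seconds=60, seed=1):
    """Return (average per-second deviation, whole-minute deviation)."""
    rng = random.Random(seed)
    per_second = []
    totals = [0] * machines
    for _ in range(seconds):
        counts = [0] * machines
        for _ in range(requests_per_second):
            counts[rng.randrange(machines)] += 1
        per_second.append(mean_relative_deviation(counts))
        totals = [t + c for t, c in zip(totals, counts)]
    second_level = sum(per_second) / len(per_second)
    minute_level = mean_relative_deviation(totals)
    return second_level, minute_level


if __name__ == "__main__":
    second_level, minute_level = simulate_minute()
    print("per-second deviation: %.1f%%" % (100 * second_level))
    print("per-minute deviation: %.1f%%" % (100 * minute_level))
```

Random fluctuations largely cancel out when summed over sixty one-second samples, which is the same effect behind the gap between the second-to-second and minute-to-minute figures.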
