Network Redundancy
The last topic we need to discuss in order to create a highly available, scalable cluster is network redundancy. Although you can use a normal network setup, doing so introduces a single point of failure. If all the network traffic in the cluster crosses only one or two routers and one of them shuts down, the nodes lose contact with each other and the cluster shuts down. Again, there are a few different solutions for this.
The first option is to get dual-port network cards that support automatic network failover. These cards should provide failover that is completely transparent to the cluster nodes. When purchasing these cards, you should talk to the manufacturer to ensure that good driver support exists for the operating system on which you intend to use them. With this setup, you essentially have two networks, with one acting as the backup in case the primary fails.
A second option is to build redundancy into the network itself. It is possible to set up a network such that if one network component fails, others can take over its load. In a typical setup like this, you should never have all the nodes of one node group plugged in to the same physical network device. That way, if the network device fails, you lose at most part of each node group rather than a whole node group, which does not result in an entire cluster shutdown.
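The placement rule above can be checked mechanically. The following is a minimal sketch in Python, assuming you keep your own record of which node group and which network device each node belongs to. The function name, the node IDs, and the switch names are all hypothetical and are not part of any cluster tool.

```python
from collections import defaultdict


def find_exposed_node_groups(placement):
    """Return the node groups whose nodes all share one network device.

    placement maps a node ID to a (node_group, device) pair. This layout
    is an assumption made for illustration only.
    """
    devices_by_group = defaultdict(set)
    for node_id, (group, device) in placement.items():
        devices_by_group[group].add(device)
    # A group that sits behind a single device is lost if that device fails.
    return sorted(g for g, devices in devices_by_group.items() if len(devices) == 1)


# Hypothetical layout: node group 1 has both of its nodes on switch-a.
placement = {
    2: (0, "switch-a"),
    3: (0, "switch-b"),
    4: (1, "switch-a"),
    5: (1, "switch-a"),
}

print(find_exposed_node_groups(placement))  # [1]
```

Moving node 5 to switch-b in this example would make the function return an empty list, meaning that no single device failure can take out a whole node group.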
If you decide to go with the redundant network setup, you need to be very careful to ensure that the larger network doesn't affect performance. As mentioned in Chapter 5, "Performance," network latency is extremely important, and each additional layer of networking can slow down overall cluster processing.
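One rough way to see whether an extra network layer has added latency is to time TCP connections between two cluster hosts before and after the change. The sketch below is only an approximation: it assumes that the time to open a TCP connection is a reasonable stand-in for one network round trip, and the function name and sample count are hypothetical choices. The demonstration at the bottom uses a loopback listener so that it runs anywhere. In practice you would pass the host and port of a service on another cluster machine.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return the median time in milliseconds to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open the connection and close it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Loopback listener used only so the example is self-contained.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    print(f"median connect time: {connect_latency_ms('127.0.0.1', port):.3f} ms")
    listener.close()
```

Running the same measurement between the same pair of hosts on the old and new network layouts gives a like-for-like comparison. The median is used so that a single slow sample does not dominate the result.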