Failover and Redundancy
Architectural Overview
When two identical Cisco ASAs are set up in failover, one of them (known as the active Cisco ASA) is responsible for creating the state and translation tables, transferring the data packets, and monitoring the other unit. The other Cisco ASA (referred to as the standby Cisco ASA) is responsible for monitoring the status of the active unit. The active and standby Cisco ASAs are connected through a dedicated network link over which they send failover-related messages to each other. This connection, known as the failover control link, is established over a dedicated failover LAN interface. When a failure occurs on the active Cisco ASA, the standby unit takes over the active role and starts forwarding traffic. The standby Cisco ASA (now the active unit) also takes over the IP and MAC addresses that were used by the original active Cisco ASA. After the original active unit recovers from the failure, it can either assume the standby role or become the active Cisco ASA again, depending on its configuration.
The failover control link provides a medium over which the two Cisco ASAs can communicate with each other. Using this link, the Cisco ASAs update one another about:
- The unit state (active or standby)
- Network link status
- Hello messages (which are sent on all interfaces)
- MAC address exchange
- Configuration synchronization
Figure 11-1 shows two Cisco ASA 5540 devices connected to each other through the Gigabit Ethernet interfaces to send the failover hello messages. For the failover link, they are using the GigabitEthernet0/2 interface, shown as the dashed line.
Figure 11-1. Failover Connection Between Two ASAs
Conditions that Trigger Failover
For a failover to occur, any one of the following conditions has to be met:
- When an administrator manually switches over from active to standby. This happens when either the no failover active command is issued on the active unit or the failover active command is issued on the standby unit.
- When a standby Cisco ASA stops receiving keepalive packets on the failover command interface. In this condition, the standby unit waits for two consecutive polling periods before sending additional testing packets to the remaining interfaces. If it still does not receive a response from the active unit, it assumes that a failure has occurred and takes over the role as the active Cisco ASA.
- When the command interface link goes down. In this scenario, the Cisco ASA sends additional testing packets to the remaining interfaces to determine whether the peer's command interface is also down. If the peer's command interface is not down and this unit is the active Cisco ASA, it uses the other interfaces to notify the standby unit to become active. If the peer's command interface is also down, the active Cisco ASA remains in the active state.
- When the link state of an interface goes down. In this condition, the Cisco ASA marks the interface as failed and initiates the failover process. Additionally, if the standby Cisco ASA does not receive keepalive packets for two consecutive polling periods on an interface, the Cisco ASA uses additional tests on the interface to determine the root cause of the problem. These tests are discussed in detail in the following section.
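The "two consecutive polling periods" rule in the conditions above can be sketched as a simple counter. This is only an illustration under stated assumptions: the function name, the list-of-booleans input, and the `missed_limit` parameter are hypothetical and do not correspond to any Cisco ASA API.

```python
from typing import Iterable, Optional


def polls_until_testing(hellos: Iterable[bool], missed_limit: int = 2) -> Optional[int]:
    """Return the polling period (1-based) at which additional tests would start.

    `hellos` holds one entry per polling period: True if a keepalive was
    received in that period, False if it was missed. Returns None if the
    limit of consecutive misses is never reached.
    """
    missed = 0
    for period, received in enumerate(hellos, start=1):
        # A received keepalive resets the count; a miss extends the streak.
        missed = 0 if received else missed + 1
        if missed >= missed_limit:
            return period
    return None


# A single missed keepalive is tolerated; two in a row trigger the tests.
print(polls_until_testing([True, False, True, False, False]))
```

A single missed keepalive followed by a received one resets the count, so only an unbroken run of misses leads to the additional interface tests.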
Failover Interface Tests
To ensure that an interface failure is properly detected before initiating a failover, the Cisco ASA goes through four different interface tests. These tests are discussed in the order in which they are run:
- Link up/down test: The Cisco ASA determines the status of its network interface card (NIC) by running the link up/down test, which checks whether one of its ports is not plugged into an operational network. If the link is down, the Cisco ASA marks the interface as failed and initiates the failover process. Examples of this type of failure include a hardware port failure, an unplugged interface cable, and a failure of the hub or switch to which the interface is connected. If the interface passes the link up/down test, the Cisco ASA moves to the network activity test.
- Network activity test: In this test, the Cisco ASA counts all received packets for up to 5 seconds. If it receives any packet during this time interval, it stops the test and marks the interface as operational. If no traffic is received, the test is inconclusive regarding whether the interface is faulty, so the Cisco ASA proceeds to the next test.
- ARP test: In the ARP test, the Cisco ASA reads the ten most recently acquired entries from its ARP table. It sends an ARP request to those machines one at a time, and then counts packets for up to 5 seconds. If it receives traffic during this time window, it marks the interface as operational. If it does not receive a response from the host, it moves to the next host and sends an ARP request, and so on. If the Cisco ASA reaches the end of the list without receiving any traffic, it moves on to the ping test.
- Broadcast ping test: In this test, the Cisco ASA sends out a broadcast ping request and then counts all received packets for up to 5 seconds. If it receives any packets during this time window, the Cisco ASA declares the interface operational and stops the test. If the Cisco ASA does not receive any traffic, it marks the interface as failed and initiates failover.
Note
Although the network activity, ARP, and broadcast ping tests are time consuming, they help avoid false failovers on the Cisco ASA. While an interface is going through these tests, the Cisco ASA continues to forward traffic on its interfaces.
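The order of the four interface tests can be sketched as a short decision cascade. This is a minimal illustration, not the device's implementation: the `InterfaceProbe` record and its fields are hypothetical stand-ins for what the unit would observe during each 5-second counting window.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InterfaceProbe:
    """Hypothetical observations for one monitored interface."""
    link_up: bool
    packets_in_activity_window: int      # packets counted during the network activity test
    packets_after_each_arp: List[int]    # packets counted after each ARP request, in order
    packets_after_broadcast_ping: int    # packets counted after the broadcast ping


def run_interface_tests(probe: InterfaceProbe) -> Tuple[str, str]:
    """Return (verdict, deciding_test) by applying the tests in order."""
    if not probe.link_up:
        return ("failed", "link up/down")
    if probe.packets_in_activity_window > 0:
        return ("operational", "network activity")
    # Only the ten most recently acquired ARP entries are tried.
    for packets_seen in probe.packets_after_each_arp[:10]:
        if packets_seen > 0:
            return ("operational", "ARP")
    if probe.packets_after_broadcast_ping > 0:
        return ("operational", "broadcast ping")
    return ("failed", "broadcast ping")


# A quiet interface that only responds once the third ARP target is queried.
quiet = InterfaceProbe(True, 0, [0, 0, 4], 0)
print(run_interface_tests(quiet))
```

Each later test runs only when the earlier one is inconclusive, which is why a quiet but healthy interface takes longer to clear than a busy one.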
Stateful Failover
When a connection is established through the active Cisco ASA, the Cisco ASA updates its connection table. A connection entry includes the source and destination IP addresses, the protocol used, the current state of the connection, the interface it is tied to, and the number of bytes transferred. Depending on the failover configuration, the Cisco ASA takes one of the following actions:
- Stateless failover: The Cisco ASA maintains the connection table but does not replicate entries to the standby Cisco ASA.
- Stateful failover: The Cisco ASA maintains the connection table and replicates it to the standby Cisco ASA.
In a stateless failover, the active Cisco ASA is not responsible for sending the state table updates to the standby Cisco ASA. When the standby unit becomes active (whether by detecting a failure or by manually switching over), it has to build all the connection entries from scratch. This causes all the stateful traffic, such as TCP-based connections, to get disrupted.
In a stateful failover, the active Cisco ASA sends an update to the standby unit whenever there is a change in the state table. In this mode, the active Cisco ASA sends stateful updates over a dedicated link to the standby unit. When the standby unit becomes active, it does not need to build any connection entries because all the entries already exist in its database.
Note
You can use the same physical interface for both failover control and stateful link updates.
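The difference between the two modes can be illustrated with a toy model of a failover pair. This is a sketch under simplifying assumptions: the `Unit` and `FailoverPair` classes are hypothetical, a connection is reduced to a key and a state string, and replication is modeled as an immediate copy.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ConnKey = Tuple[str, str, str]  # (source IP, destination IP, protocol)


@dataclass
class Unit:
    name: str
    connections: Dict[ConnKey, str] = field(default_factory=dict)


class FailoverPair:
    def __init__(self, stateful: bool) -> None:
        self.stateful = stateful
        self.active = Unit("primary")
        self.standby = Unit("secondary")

    def update_connection(self, key: ConnKey, state: str) -> None:
        """Record a state change on the active unit; replicate only if stateful."""
        self.active.connections[key] = state
        if self.stateful:
            self.standby.connections[key] = state

    def fail_over(self) -> None:
        """Swap roles; the new active unit keeps whatever table it already had."""
        self.active, self.standby = self.standby, self.active


for mode in (True, False):
    pair = FailoverPair(stateful=mode)
    pair.update_connection(("10.1.1.10", "192.0.2.20", "tcp"), "ESTABLISHED")
    pair.fail_over()
    print("stateful" if mode else "stateless", len(pair.active.connections))
```

In the stateless case the new active unit starts with an empty table, which is why established TCP sessions are disrupted and must be rebuilt.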
Table 11-1 lists the entries and the types of traffic that are replicated to the standby Cisco ASA in the stateful failover.
Type of Traffic | Stateful Replication |
---|---|
HTTP connection | Yes, if enabled |
TCP connection | Yes |
UDP connection | Yes |
Xlate | Yes |
Uauth cache | No |
URL filtering cache | No |
TCP intercept | No |
SNMP firewall MIB | No |
Routing table | No |
IKE/IPSec SA | Yes |
Note
The Cisco ASA replicates IPSec states only if stateful failover is used in Active/Standby mode. IPSec VPN is not supported in multimode firewall, and Active/Active failover works only in multimode.
Hardware and Software Requirements
For failover to work properly, the following specifications must be identical:
- Product or model number of the Cisco ASA: For example, both units should be Cisco ASA 5520s. You cannot use an ASA 5520 and an ASA 5540 in failover.
- Amount of RAM: You cannot use 512 MB of RAM in one Cisco ASA and 1024 MB in the other.
- Amount of Flash memory: You cannot use 64 MB of Flash memory in one Cisco ASA and 128 MB in the other.
- Number of interfaces: The current hardware does not allow additional physical interfaces to be added. If the number of interfaces on a Cisco ASA changes in the future, you cannot have mismatched interfaces on the two Cisco ASAs.
- Activation key with the same features: The activation keys should enable the same features, such as the failover mode, encryption level, and number of VPN peers.
Note
On the Cisco ASA, the two units do not have to run the same software version while in failover. This makes a zero-downtime software upgrade possible, which is covered later in this chapter.
Before setting up the Cisco ASAs for failover, verify that they have a valid activation key to run failover. After you verify the activation key, you can proceed with failover configuration.
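The requirements in this section amount to a field-by-field comparison of the two units. The following sketch shows one way to express that check; the `UnitSpec` record, its field names, and the sample version strings are hypothetical and used only for illustration. The software version is deliberately excluded from the comparison, in line with the note above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class UnitSpec:
    model: str
    ram_mb: int
    flash_mb: int
    interface_count: int
    licensed_features: FrozenSet[str]
    software_version: str  # recorded, but not required to match


# Fields that must be identical on both units.
MUST_MATCH = ["model", "ram_mb", "flash_mb", "interface_count", "licensed_features"]


def failover_mismatches(a: UnitSpec, b: UnitSpec) -> List[str]:
    """Return the names of the required fields that differ between the two units."""
    return [name for name in MUST_MATCH if getattr(a, name) != getattr(b, name)]


features = frozenset({"failover", "3des-aes"})
unit_a = UnitSpec("ASA5520", 512, 64, 5, features, "7.0(1)")
unit_b = UnitSpec("ASA5540", 1024, 64, 5, features, "7.0(2)")
print(failover_mismatches(unit_a, unit_b))
```

An empty result means the pair meets the hardware and licensing requirements listed here; it says nothing about whether the failover configuration itself is correct.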
Types of Failover
Cisco ASA supports two different types of failover:
- Active/Standby failover
- Active/Active failover
Active/Standby Failover
Active/Standby failover is identical to the failover scenario described in the previous section. In this failover type, when two Cisco ASAs are in failover, the active unit is responsible for passing the traffic. The standby unit's role is to monitor the status of the active Cisco ASA by sending periodic keepalive messages. The active Cisco ASA also sends keepalive messages to monitor the status of the standby Cisco ASA.
Note
You can have only two Cisco ASAs set up in failover.
In Active/Standby failover, the Cisco ASAs go through the following election process. They assume roles based on their designated status, whether primary or secondary.
- When both Cisco ASAs are up and running, one of them is designated as the active unit, while the other assumes the standby role.
- If both devices boot up simultaneously, the primary Cisco ASA takes the active role, and the secondary Cisco ASA goes into the standby state.
- If one of the Cisco ASAs boots up and detects an active failover unit, it goes into the standby state regardless of its primary or secondary designation.
- If one of the Cisco ASAs boots up and does not detect an active failover unit, it goes into the active state regardless of its primary or secondary designation.
- If both Cisco ASAs become active, the secondary changes its state to standby, while the primary remains active.
- If both Cisco ASAs become standby, the primary changes its state to active, while the secondary remains standby after they detect each other's state.
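The election rules above can be condensed into two small functions. This is only a sketch of the logic as described in the list; the function names and the use of plain strings for roles are hypothetical choices made for illustration.

```python
from typing import Tuple


def role_on_boot(peer_is_active: bool) -> str:
    """Role taken by a unit that boots alone, regardless of its designation."""
    return "standby" if peer_is_active else "active"


def resolve_pair(primary_state: str, secondary_state: str) -> Tuple[str, str]:
    """Resolve a conflict once both units can see each other's state.

    Returns (primary_role, secondary_role). When both units claim the same
    role, the primary ends up active and the secondary standby; otherwise
    the existing roles are left alone.
    """
    if primary_state == secondary_state:
        return ("active", "standby")
    return (primary_state, secondary_state)


print(role_on_boot(peer_is_active=True))
print(resolve_pair("active", "active"))
print(resolve_pair("standby", "active"))
```

The designation (primary or secondary) only matters as a tie-breaker. A secondary unit that is already active stays active when the primary boots later and detects it.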
Active/Active Failover
Active/Active failover is a methodology in which both Cisco ASAs, while monitoring the status of their peers, actively pass traffic. Cisco ASAs in Active/Active failover mode can be deployed only in multimode. Figure 11-2 shows a network topology where two Cisco ASAs are set up in stateful multimode Active/Active failover. They are set up for two customer contexts: Cubs and Bears. In this deployment, the Cubs security context is active on FO1 and standby on FO2, while the Bears security context is active on FO2 and standby on FO1. If FO1 fails, the standby security context for Cubs on FO2 becomes active and takes over the IP and MAC addresses of FO1. As a result, both security contexts are active on FO2.
Figure 11-2. Cisco ASA in Active/Active Multiple Mode
The failover is completely transparent to the end hosts because the state and connection tables are replicated. When the Cisco ASAs are deployed in this mode, the total throughput is doubled, because each Cisco ASA can allocate 100 percent of its system resources to inspecting and routing packets. However, if one of the Cisco ASAs fails, the total throughput is reduced by half, to the capacity of one Cisco ASA. It is therefore recommended that you do not oversubscribe the failover pair.
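The oversubscription advice is simple arithmetic, shown here as a short sketch. The 500-Mbps per-unit capacity is an assumed figure chosen for the example, not a specification of any particular model, and the function names are hypothetical.

```python
def pair_capacity(unit_capacity_mbps: int, healthy_units: int) -> int:
    """Aggregate throughput of the pair for the given number of healthy units."""
    return unit_capacity_mbps * healthy_units


def is_oversubscribed(offered_load_mbps: int, unit_capacity_mbps: int) -> bool:
    """True if the offered load could not be carried by one surviving unit."""
    return offered_load_mbps > unit_capacity_mbps


UNIT_CAPACITY = 500  # Mbps, assumed for illustration only

print(pair_capacity(UNIT_CAPACITY, 2))        # both units healthy
print(pair_capacity(UNIT_CAPACITY, 1))        # after one unit fails
print(is_oversubscribed(700, UNIT_CAPACITY))  # 700 Mbps fits the pair, but not one unit
```

A load that only fits when both units are healthy will exceed the surviving unit's capacity after a failure, so the pair is best sized against the capacity of a single unit.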
A key point to remember is that the failover in the Cisco ASA is per failover redundancy group (discussed in the section, "Failover Configuration") as opposed to per-context failover. The Cisco ASA's failover is currently limited to only two failover groups.
Note
If an interface is used as the shared interface between multiple contexts, then all of those contexts need to be in the same failover redundancy group.
The biggest challenge in running Active/Active failover is that packets can leave from one active unit and can come back to the other active unit. Cisco ASA implements a feature known as asymmetric routing to work around this problem. This feature is discussed in the following section.
Asymmetric Routing
Many enterprises use multiple ISPs to get connectivity to the Internet or to remote locations. Depending on their implementation, these enterprises can use the ISPs either to load-balance traffic or to back each other up in the event of a failure.
Figure 11-3 depicts two Cisco ASAs connected to two different ISPs and running in Active/Active failover with multiple contexts. Context Cubs is active on FO1, while context Bears is active on FO2. The problem arises when both ISPs are load-balancing the traffic out to the cloud and the Cisco ASAs are set up in Active/Active mode. If Host A, sitting behind context Cubs, sends out a TCP SYN packet to Host B, the packet can leave through the active Cisco ASA (FO1). However, there is no guarantee that the SYN-ACK, the reply from Host B, will be routed back through the same unit. If the SYN-ACK packet lands on the other active Cisco ASA (FO2), FO2 will drop the packet because it is not active for the security context Cubs.
Figure 11-3. Asymmetric Routing
Note
The asymmetric routing feature is supported only in multimode. Asymmetric routing is not supported if the Cisco ASAs are using shared interfaces.
Using the asymmetric routing feature, FO1 will replicate the connection table entry for the SYN packet to FO2 over the stateful failover link. Thus, when the active context on FO2 (Bears) receives the SYN-ACK packet, it will forward the packet to FO1 because it belongs to context Cubs, which is active on FO1. Figure 11-3 depicts all the steps when Host A communicates with Host B.
- Host A sends the SYN packet to its gateway router.
- The gateway router consults its routing table and forwards the SYN packet to FO1, because it belongs to context Cubs.
- FO1 looks at the routing table and forwards the SYN packet out to the Internet through ISP1.
- Host B sends SYN-ACK, which gets routed to FO2 through ISP2.
- FO2 receives the packet on context Bears (which is active on FO2) but does not have an active connection for it. It checks the other interfaces that are in the same asymmetric routing group for the corresponding connection. In this case, it detects an active connection from FO1 for context Cubs and therefore forwards the packet to FO1. It continues to forward packets until the connection is terminated on FO1.
- FO1 forwards the packet to its next-hop router (R1).
- R1 forwards the packet to Host A, after checking the routing table.
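The decision FO2 makes for the returning SYN-ACK in the steps above can be sketched as a lookup against two tables: its own connections and the entries replicated from its peer. This is a simplified model, not the device's implementation: the `AsrUnit` class, the reduction of a flow to a pair of host names, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Flow = Tuple[str, str]  # (inside host, outside host)


@dataclass
class AsrUnit:
    name: str
    own_flows: Set[Flow] = field(default_factory=set)
    # Entries learned over the stateful failover link: flow -> owning unit.
    replicated_flows: Dict[Flow, str] = field(default_factory=dict)


def handle_return_packet(unit: AsrUnit, flow: Flow) -> str:
    """Decide what a unit does with a returning packet for `flow`."""
    if flow in unit.own_flows:
        return "forward to inside"
    owner = unit.replicated_flows.get(flow)
    if owner is not None:
        # The connection belongs to a context that is active on the peer.
        return f"redirect to {owner}"
    # No local or replicated state: the packet is dropped.
    return "drop"


flow = ("Host A", "Host B")
fo2 = AsrUnit("FO2")
print(handle_return_packet(fo2, flow))   # state update not yet processed
fo2.replicated_flows[flow] = "FO1"       # state update from FO1 arrives
print(handle_return_packet(fo2, flow))
```

The first call models the race in which the SYN-ACK reaches FO2 before the state update from FO1 has been processed. The second shows the normal case once the replicated entry is present.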
Note
A race condition is possible: if the SYN-ACK packet arrives at FO2 before FO2 has had the chance to process the state update message from FO1, FO2 drops the SYN-ACK packet. This problem can be remedied by using a high-bandwidth link as the stateful failover interface.