Cisco Network Security Troubleshooting Handbook
This section explores the failover features available on PIX Firewall in greater detail (the discussion is based on information from the New Product Introduction training by PIX marketing team with modification to be able to easily comprehend). The resiliency of the connections through the PIX firewall can be achieved with failover, which means that if one PIX failed, the other PIX still would be available to process the packets. Starting from PIX Version 7.0, PIX can be configured in one of the following two failover modes:
Active/Standby Model
In this model, one PIX acts as the active PIX that processes the traffic at the time, and the other unit acts as standby. In the event of failure of the active unit, the standby unit becomes active and starts processing packets. If stateful failover is configured, the transition of connections from active to standby is very smooth. The new features added to Version 7.0 are as follows:
Active/Active Model
The active/standby model is not the best way to use the resources. Ideally, both of the units in the failover should be used, and act as backups for each other in the event of failure. This is exactly what active/active failover can provide, as introduced in Version 7.0. In this model, both units can actively process firewall traffic while at the same time serving as backups for each other. In PIX Version 7.0, only a two-node setup is possible, in which each device handles 50 percent of the traffic. When one unit fails, the remaining unit takes over and processes 100 percent of the traffic. Remember that in this model, PIX itself does not perform the load balancing of the traffic. So, it is your responsibility to engineer the network to route 50 percent of the traffic to each unit. This can be accomplished either by static routes or dynamic routing protocols on both upstream and downstream routers. A new command, failover groups, is being introduced to support active/active failover. Conceptually it is very similar to the Cisco's Multi-Group Hot Standby Router Protocol (HSRP) or Virtual Router Redundancy Protocol (VRRP). As mentioned before, PIX 7.0 supports a two-node active/active failover configuration with two failover groups. The active/active failover feature leverages the Security Context feature. Figure 3-10 shows a typical two-node cluster. On unit A, ctx1 is active and ctx2 is standby. Unit B has ctx2 as active and ctx1 is standby. If unit A fails, unit B will have both contexts active and will process 100 percent of the firewall traffic. Figure 3-10. Active/Active FO setup
Each failover group contains separate state machines to keep track of the failover state. The command failover active group <group#> can be used to change the active state of a failover group. Hardware and License Requirements
You must have the following minimum requirement for hardware and licensing to configure active/active failover:
System and User Failover Group
To support active/active failover, PIX 7.0 failover added support for failover groups. Each failover group has its own state machine and can switch over independently. There are two types of failover group: system failover group and user failover group. The concept is similar to the system and user context. The system group is used internally by the failover process. Its main purpose is to allow the failover process to manage the unit-wide activities under the failover group scheme. These activities include: unit health monitoring, failover command interface health monitoring, running config synchronizing, and so on. There is only one system failover group per unit, and it is created automatically when the user enables the failover. The system context is bound to the system failover group. User failover groups are used to manage the user contexts under the active/active failover scheme. Figure 3-10 shows a simple example of two user contexts that are bound to two user failover groups. Note PIX Firewall Version 7.0 supports a maximum of two (user) failover groups. However, there can be more than two user contexts configured in the system.
The failover-group subcommand under the context command is used to bind a user context to a failover group. The failover group 1 is the default failover group. If a user context is not bound to a failover group through the command interface, it is bound to failover group 1 by default. Failover group 1 must be created first and must be the last group to be removed. If a context is bound to a failover group, the failover group cannot be deleted unless the binding is removed. Initialization, Configuration Synchronization/Command Replication
When a unit boots up, the system failover group will contact the active peer to obtain the running configuration. If both units boot up at the same time, the System failover group of the Primary unit will become active and synchronize its configuration to the Secondary unit. After configuration syncing has finished, the state machine of the user failover group will start running to elect the active unit for each group and start the Active/Active failover. Even though both units could be actively processing user traffic, the command replication is uni-directional. The command will be replicated to the peer only if failover group 1 is in active state. Users will be warned if they try to enter a configuration command from the standby unit. This means that a user context can be in the active state, but the user will need to enter commands from the user context of the standby unit if it is bound to failover group 2. Configuration Examples
Work through the steps that follow to configure Active/Active Failover:
Asymmetrical Routing Support
For better performance and reliability, you may have the network set up with redundant connections to the same Internet Service Provider (ISP) or two different ISPs as shown in Figure 3-11. This poses a problem for PIX, as the ISP may receive traffic from one PIX but might return traffic back to the other PIX. This is because these two PIX firewall units (or pairs) do not share session information. Therefore, while the return packet is allowed by the PIX that allowed the initial connection, it is denied by the other PIX. This is also how it works in Failover mode when it is deployed with Active/Standby Mode. Figure 3-11. Asymmetric Routing Support with Active/Active FO Setup
The same problem can occur when running active/active failover. A unit may receive a packet that belongs to its peer. If this happens, the receiving unit will forward the packet back to its peer for processing. Stateful failover must be enabled to support asymmetric routing. Figure 3-11 shows two units running Active/Active failover, where Unit 1 has context ctx1 active and Unit 2 has context ctx2 active. An inside host initiates a connection through context ctx1 of Unit 1 (solid line). Context ctx1 creates the connection, replicates the connection to Unit 2, and then forwards the packet. If the return packet is routed through context ctx2 of Unit 2, connection information already exists in Unit 2's connection table. But, because context ctx1 is not active in Unit 2, it forwards the packet back to Unit 1 (arrows). Following are some of the restrictions and cautions you should remember when configuring asymmetric routing in Active/Active Failover mode.
To configure asymmetric routing support, you need to use asr-group command under the interfaces of contexts. Asymmetric routing support is needed for the outside interface only; you need to have the following commands configured on both ctx1 and ctx2: PIX/ctx1(config)# interface Ethernet 0.30 PIX/ctx1(config-if)# asr-group 1 PIX/ctx2(config)# interface Ethernet 0.40 PIX/ctx2(config-if)# asr-group 1
Troubleshooting Steps
Before working through some of the common scenarios, it would be helpful for you to examine the syslog messages that are shown on the secondary PIX firewall after you turn on logging on as shown in Example 3-29. Example 3-29. Syslog on the Secondary PIX
After the units are synchronized with each other, you can find out the status of a unit on both the primary and secondary with the show failover command. Example 3-30 shows the output of the show failover command on the primary unit. Example 3-30. Monitoring Failover Status on Primary PIX Firewall
Work through the following steps to troubleshoot the failover problem:
|