OSPF Network Design Solutions


Case Study: In the Trenches with OSPF

This case study describes a “real world” case in which this problem actually occurred, and how it was identified and corrected. It then outlines some lessons to help prevent a recurrence. The troubleshooting model introduced earlier in this chapter serves as the reference process throughout.

Recently, a large broadcast storm occurred in an enterprise OSPF network. It affected a region of approximately 50 geographically separate sites, with over 75 routers serving approximately 3,000 users, and brought all user WAN traffic in the impacted area to a standstill.

Through the course of our troubleshooting, we identified how and why “localized” broadcasts were erroneously being propagated across the WAN, resulting in a dramatic degradation in network performance. Additionally, several Cisco OSPF router configuration problems were identified and corrected during the course of troubleshooting.

Troubleshooting Methodology

When troubleshooting in any type of networking environment, a systematic troubleshooting methodology works best. The seven steps outlined throughout this section will help you to clearly define the specific symptoms associated with network problems, identify potential problems that could be causing the symptoms, and then systematically eliminate each potential problem (from most likely to least likely) until the symptoms disappear.

This process is not a rigid outline for troubleshooting an internetwork. Rather, it is a foundation from which you can build a problem-solving process to suit your particular internetworking environment.

The following troubleshooting steps detail the problem-solving process:

1.  Clearly define the problem.
2.  Gather facts.
3.  Consider possible problems.
4.  Create an action plan.
5.  Implement the action plan.
6.  Gather results.
7.  Repeat steps 4 through 7, if needed.

Customer Reports a Network Slowdown

The customer called the Network Operations Center (NOC) and reported a network slowdown at a number of critical sites. The situation was made more urgent because the customer was preparing to run the end-of-inventory reconciliation report. The network had to be available at this critical time or the customer would lose money.

Step 1: Define the Problem

The first step in any troubleshooting and repair scenario is to define the problem. What is actually happening is sometimes very different from what is reported, so the key to this step is establishing the actual problem. You need to do two things: identify the symptoms and perform an impact assessment.

Our customer called us and explained that the network response was extremely slow. This, of course, was a rather vague and broad description from a network operations standpoint. Because it can be difficult to define “slow,” we needed a clear understanding of the problem before we could proceed with developing an action plan. We obtained it by gathering facts and asking the users who reported the problem several questions. According to the users, the general symptoms included:

•  Slow response (including 30-40 percent packet loss) while connecting to any device on the WAN from the Downtown location (see Figure 8-6). We confirmed the slow response by executing a ping from Router B to Router C, which returned an 800-millisecond round-trip delay. The normal round-trip delay for other routers in our network had consistently been in the range of 100 to 150 milliseconds.
•  Nearly 100 percent utilization was found on Router B’s Frame Relay Permanent Virtual Circuit (PVC) to Router C. This was observed by our long-distance carrier’s Frame Relay network group, who had been monitoring Frame Relay switch statistics at our request.
•  We were informed that the impact on user productivity was so great that several users were sent home because they could not reliably access critical network resources.
•  Other users reported that their ability to run inventory reports was being impaired and the deadline was quickly approaching.
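
To put the measured delay in perspective, the ping result can be compared with the baseline range quoted in the first symptom. The following is a minimal Python sketch rather than part of the original case study. The 800 ms and 100 to 150 ms figures come from the text, while the function name `slowdown_factor` is hypothetical.

```python
# Figures taken from the case study text.
BASELINE_MS = (100, 150)  # normal round-trip delay range for other routers
MEASURED_MS = 800         # ping from Router B to Router C


def slowdown_factor(measured_ms, baseline_ms):
    """Return (best_case, worst_case) ratio of measured delay to baseline."""
    low, high = baseline_ms
    # Dividing by the top of the baseline range gives the most
    # conservative ratio; dividing by the bottom gives the largest.
    return measured_ms / high, measured_ms / low


best, worst = slowdown_factor(MEASURED_MS, BASELINE_MS)
print(f"Round-trip delay is {best:.1f}x to {worst:.1f}x the normal range")
```

Even by the most conservative comparison, the delay was more than five times the normal figure, which agrees with the users' description of the network as extremely slow.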

Step 2: Gather Facts

After the problem is defined, the next step is to gather the facts surrounding it. This section presents the facts we collected in this case study.

Before starting to troubleshoot any type of networking problem, it is usually helpful to have a network diagram. Figure 8-6 shows the diagram we used.

Following the troubleshooting methodology described earlier, we collected as many facts as possible and made some general observations by connecting to the routers in question. We gathered facts from several sources on the routers, including the Cisco log buffer and various Cisco show commands. Our observations revealed the following facts and occurrences within the network.

Router B in Figure 8-6 reported high traffic input to the Ethernet segment at Headquarters. This made Ethernet connectivity so unstable that the links became unavailable for brief periods. Consequently, OSPF adjacencies were being re-formed repeatedly. The following is an excerpt from the SYSLOG on Router A:

Mar 1 00:08:17 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to down
Mar 1 00:08:29 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to up
Mar 1 00:08:35 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to down
Mar 1 00:08:39 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to up

As you can see by the SYSLOG entries, Ethernet connectivity was being lost for brief periods of time. The router was definitely showing us a contributing factor to the problems being reported by our customer.
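
The flapping in the SYSLOG excerpt can be quantified with a short script. The following is a minimal Python sketch and not part of the original case study. It assumes that every entry has exactly the format shown in the excerpt and that all entries fall within the same day. The function name `flap_intervals` is hypothetical.

```python
import re

# Log entries copied from the Router A SYSLOG excerpt in the case study.
LOG = """\
Mar 1 00:08:17 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to down
Mar 1 00:08:29 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to up
Mar 1 00:08:35 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to down
Mar 1 00:08:39 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to up
"""

# Captures hh, mm, ss, the interface name, and the new state.
PATTERN = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) UTC: %LINEPROTO-5-UPDOWN: "
    r"Line protocol on Interface ([^,\s]+), changed state to (up|down)"
)


def flap_intervals(log_text):
    """Return a list of (interface, seconds_down) for each down/up pair."""
    down_since = {}  # interface -> second of day when it went down
    outages = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m is None:
            continue  # skip lines that are not UPDOWN messages
        hh, mm, ss, intf, state = m.groups()
        # Assumption: all entries are on the same day, so seconds of day suffice.
        t = int(hh) * 3600 + int(mm) * 60 + int(ss)
        if state == "down":
            down_since[intf] = t
        elif intf in down_since:
            outages.append((intf, t - down_since.pop(intf)))
    return outages


for intf, secs in flap_intervals(LOG):
    print(f"{intf} was down for {secs} s")
```

Run against the excerpt, this reports two outages on Ethernet0, of 12 and 4 seconds, within a 22-second window.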

Figure 8-6  Case study WAN diagram.

Missing OSPF Adjacencies
