Mitigating Route Instabilities with Route Flap Damping

Problem

You want to deal with potential route instabilities caused by routes being withdrawn in a series of BGP Update messages only to be readvertised as active routes a few minutes later when an intermittently failing link is restored.

Solution

Route flap damping is a way to prevent flapping routes from destabilizing BGP. In the JUNOS software, you set up damping by using routing policy. There are four steps in setting up damping:

  1. Define the damping parameters:

    [edit policy-options]
    aviva@Router3# set damping damping-normal suppress 6000
    aviva@Router3# set damping damping-normal half-life 15
    aviva@Router3# set damping damping-normal reuse 3000
    aviva@Router3# set damping damping-normal max-suppress 30

  2. Create a routing policy that references the damping parameters:

    [edit policy-options]
    aviva@Router3# set policy-statement damping-policy from route-filter 10.0.31.1/32 exact
    aviva@Router3# set policy-statement damping-policy then damping damping-normal
    aviva@Router3# set policy-statement damping-policy then accept

  3. Enable route flap damping for BGP:

    [edit protocols bgp]
    aviva@Router3# set damping

  4. Apply the damping policy to a BGP group:

    [edit protocols bgp]
    aviva@Router3# set group session-to-AS65505 import damping-policy

Discussion

If a link on the network is intermittently failing, routes can be withdrawn and readvertised in quick succession as the link goes down and then comes back up. This route flapping forces BGP to change any next hops that use the failed interface each time the link goes down. BGP then has to update its routing tables and propagate the new routing information. If many routes are being recalculated, the flapping link could make BGP very unstable.

Route damping is a mechanism for preventing flapping routes from destabilizing a BGP network. Damping slows or stops the "vibrations," or rapid changes, in the routing table. When a route flaps, it is given a specified number of demerits. The route's accumulated demerits are reduced over time according to a configured decay rate. If the route's accumulated demerits exceed a configured threshold, the route is suppressed until the number of demerits decays below a second configured threshold.

Route damping is most useful in large service provider networks that have many attached peers and that carry many prefixes, a scenario in which the chances of one or more routes flapping at any given time are high.

In the first part of the configuration, you set four damping parameters that are used to calculate a figure of merit, which controls how long a route can be suppressed. The figure-of-merit value correlates to the probability of a route's future instability, and the value decays exponentially over time. BGP suppresses routes with higher figure-of-merit values for longer periods of time.

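For a new route, BGP assigns a figure-of-merit value of 0. Each time the route experiences instability, the value is increased based on the following rules:

  - 1,000 when the route is withdrawn
  - 1,000 when the route is readvertised
  - 500 when the route's path attributes change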

The points, or demerits, given to a route decay exponentially over time. The half-life is the time it takes for the figure-of-merit value to drop to half its current value. If the demerits decay faster than new demerits accumulate, the route will not be suppressed. When the figure-of-merit value increases beyond a cutoff value, called the suppression threshold (also called the cutoff threshold), the route is suppressed and is considered unusable. The router will ignore any new information about the route received from its peers and will not install it into the forwarding table or forward the route to any other routing protocols. The figure-of-merit value continues to decay based on the half-life. When the value drops below the reuse threshold, the route is unsuppressed and again considered usable.
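The suppress-and-reuse cycle described above can be sketched as a small simulation. This is an illustrative model only, not the JUNOS implementation: the class name DampedRoute, the per-flap penalty of 1,000, and the timing of the flaps are assumptions chosen for the example, and the thresholds are the default values discussed in this recipe.

```python
from dataclasses import dataclass


@dataclass
class DampedRoute:
    """Toy model of one route's figure of merit (hypothetical, not JUNOS code)."""
    half_life: float = 15.0   # minutes
    suppress: float = 3000.0  # suppression (cutoff) threshold
    reuse: float = 750.0      # reuse threshold
    merit: float = 0.0        # a new route starts at 0
    suppressed: bool = False

    def advance(self, minutes: float) -> None:
        """Decay the figure of merit exponentially, then re-check the reuse threshold."""
        self.merit *= 0.5 ** (minutes / self.half_life)
        if self.suppressed and self.merit < self.reuse:
            self.suppressed = False

    def flap(self, penalty: float = 1000.0) -> None:
        """Add demerits for one flap (the penalty value is an assumption)."""
        self.merit += penalty
        if self.merit >= self.suppress:
            self.suppressed = True


route = DampedRoute()
for _ in range(4):          # four flaps, one minute apart
    route.flap()
    route.advance(1)
print("after flapping:", route.suppressed, round(route.merit))

route.advance(45)           # three quiet half-lives let the merit decay
print("after 45 quiet minutes:", route.suppressed, round(route.merit))
```

In this model the route becomes suppressed on the fourth flap, because the decay between flaps is too slow to keep the merit under the suppression threshold, and it becomes usable again only after the merit has decayed below the reuse threshold.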

The damping parameters that you configure play into the figure of merit. The suppress statement controls the suppression threshold. By default, when a route's figure-of-merit value reaches 3,000, it is suppressed. The figure-of-merit value decays exponentially over the half-life that you set with the half-life statement. The default half-life is 15 minutes. To illustrate how the decay works, if a route has a figure-of-merit value of 1,000 and no incidents occur, the value decays to 500 after 15 minutes, then to 250 after another 15 minutes. You set the reuse threshold with the reuse statement. The default is 750. As the figure-of-merit value continues to decay, when it drops below the reuse threshold, the route becomes usable again. The maximum amount of time a route can be suppressed is 60 minutes by default, which you can modify with the max-suppress statement.
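The decay arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch: the function names decayed_merit and minutes_until_reuse are hypothetical, the defaults are the ones quoted above, and the second function ignores the max-suppress limit, which can release a route sooner.

```python
import math


def decayed_merit(merit: float, minutes: float, half_life: float = 15.0) -> float:
    """Figure of merit after `minutes` with no further flaps."""
    return merit * 0.5 ** (minutes / half_life)


def minutes_until_reuse(merit: float, reuse: float = 750.0,
                        half_life: float = 15.0) -> float:
    """Minutes of stability needed for `merit` to decay to the reuse threshold."""
    if merit <= reuse:
        return 0.0
    return half_life * math.log2(merit / reuse)


print(decayed_merit(1000, 15))     # 500.0
print(decayed_merit(1000, 30))     # 250.0
print(minutes_until_reuse(3000))   # 30.0
```

With the defaults, a route that has just reached the suppression threshold of 3,000 needs two half-lives, or 30 minutes, of stability to decay to the reuse threshold of 750.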

The first step in configuring damping parameters is to create a named parameter list. In this recipe, the list is named damping-normal, which sets up a standard set of damping parameters. The figure-of-merit value decays over 15 minutes, which is the default half-life. Routes are suppressed when their figure of merit reaches a value of 6,000 (instead of the default 3,000) and are unsuppressed at half that value (3,000) instead of at the default value (750). Finally, in the recipe, routes remain suppressed for a maximum of 30 minutes instead of the default 60 minutes.

The figure of merit doesn't increase forever but stops when it reaches the merit ceiling, c, which is a value that is calculated based on the reuse threshold (r); half-life (l), in minutes; and maximum suppression time (t), in minutes:

    c = r * e^((t / l) * ln 2)

Using the default reuse threshold of 750, a maximum suppression time of 60 minutes, and a half-life of 30 minutes (rather than the default of 15), the calculation looks like this:

    c = 750 * e^((60 / 30) * ln 2) = 750 * 4 = 3,000

In this case, a route's figure-of-merit value will stop increasing when it reaches 3,000. If you change the default damping parameter values, use this formula to make sure that the suppression threshold is not greater than the merit ceiling. If it is, routes will never be suppressed and route flap damping will never occur.
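The check recommended here is easy to script. The sketch below assumes the merit ceiling is the reuse threshold multiplied by 2 raised to the power (maximum suppression time / half-life), which is the same as r * e^((t / l) * ln 2). The function names merit_ceiling and can_suppress are hypothetical, and the values JUNOS itself reports as computed ceilings can differ slightly from this idealized figure.

```python
def merit_ceiling(reuse: float, max_suppress: float, half_life: float) -> float:
    """Idealized merit ceiling: reuse * 2 ** (max_suppress / half_life)."""
    return reuse * 2.0 ** (max_suppress / half_life)


def can_suppress(suppress: float, reuse: float,
                 max_suppress: float, half_life: float) -> bool:
    """True if the suppression threshold does not exceed the merit ceiling."""
    return suppress <= merit_ceiling(reuse, max_suppress, half_life)


# The worked example from the text: reuse 750, max-suppress 60, half-life 30.
print(merit_ceiling(750, 60, 30))          # 3000.0

# The damping-normal parameters from this recipe:
# suppress 6000, reuse 3000, max-suppress 30, half-life 15.
print(merit_ceiling(3000, 30, 15))         # 12000.0
print(can_suppress(6000, 3000, 30, 15))    # True
```

A combination such as a suppression threshold of 6,000 with a reuse threshold of 750, a 60-minute maximum suppression time, and a 30-minute half-life fails this check, because the ceiling of 3,000 sits below the threshold.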

After setting the damping parameters, you are ready to create the routing policy for route flap damping. In this recipe, the policy named damping-policy applies to a particular peer, 10.0.31.1.

In larger networks, it is common to set up different degrees of damping policy to apply to different types of routes. In addition to the normal damping parameters set in this recipe, you can also set up parameters to suppress routes for longer periods of time:

    [edit policy-options damping damping-medium]
    aviva@Router3# show
    half-life 15;
    reuse 1500;
    suppress 6000;
    max-suppress 45;

    [edit policy-options damping damping-high]
    aviva@Router3# show
    half-life 30;
    reuse 1640;
    suppress 6000;
    max-suppress 60;

Compared with damping-normal, the damping-medium parameters keep the 15-minute half-life but lower the reuse threshold from 3,000 to 1,500 and raise the maximum suppression time from 30 to 45 minutes. The damping-high parameters increase the half-life to 30 minutes and the maximum suppression time to 60 minutes. You apply these two sets of damping parameters to routes that flap a bit more than normal or severely more than normal. Then, instead of applying the policy to specific BGP peers, you can apply it to a range of prefixes:

    [edit policy-options policy-statement flap-damping]
    aviva@Router3# show
    from {
        route-filter 0.0.0.0/0 upto /21 damping damping-normal;
        route-filter 0.0.0.0/0 upto /23 damping damping-medium;
        route-filter 0.0.0.0/0 orlonger damping damping-high;
    }
    then accept;

Once the routing policy is set up, enable damping for BGP with the set damping command. Then apply the damping policy to the EBGP group with an import statement so the damping policy is applied to all routes before they are placed into the routing table.

Verify the damping configuration with the show policy damping command:

    aviva@Router3> show policy damping
    Default damping information:
      Halflife: 15 minutes
      Reuse merit: 750 Suppress/cutoff merit: 3000
      Maximum suppress time: 60 minutes
      Computed values:
        Merit ceiling: 12110
        Maximum decay: 6193
    Damping information for "damping-high":
      Halflife: 30 minutes
      Reuse merit: 1640 Suppress/cutoff merit: 6000
      Maximum suppress time: 60 minutes
      Computed values:
        Merit ceiling: 6577
        Maximum decay: 24933
    Damping information for "damping-medium":
      Halflife: 15 minutes
      Reuse merit: 1500 Suppress/cutoff merit: 6000
      Maximum suppress time: 45 minutes
      Computed values:
        Merit ceiling: 12049
        Maximum decay: 12449
    Damping information for "damping-normal":
      Halflife: 15 minutes
      Reuse merit: 3000 Suppress/cutoff merit: 6000
      Maximum suppress time: 30 minutes
      Computed values:
        Merit ceiling: 12017
        Maximum decay: 24963

The output shows the default damping information and the three configured sets of parameters. The first portion of the output lists the default damping parameters. The Computed values fields show the merit ceiling value calculated from the damping parameters. In the default policy, you can see that the merit ceiling of 12,110 is well above the suppression threshold of 3,000.

The show bgp summary command output shows whether any BGP routes have been damped:

    aviva@Router3> show bgp summary
    Groups: 2 Peers: 3 Down peers: 0
    Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
    inet.0                 8          5          0          0          1          0
    Peer               AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Damped…
    192.168.15.1    65500        503        517       0       0     4:11:20 0/0/0                0/0/0
    192.168.17.1    65500        501        515       0       0     4:10:25 0/0/0                0/0/0
    10.0.31.1       65505        181        182       0       3     1:27:46 4/8/1                0/0/0

The Damp State field in the first line shows that one route in the inet.0 routing table has been damped. Farther down in the output, you can tell that the session with BGP peer 10.0.31.1 in AS 65505 is established because the State field shows route counts: the router has received eight routes from that peer, and four of them are active. The third number in the State field shows that one route is currently suppressed as a result of the damping policy.

The show route damping command provides more information about damped routes. The suppressed detail option shows specific prefixes that are or have been suppressed:

    aviva@Router3> show route damping suppressed detail
    inet.0: 173318 destinations, 1533437 routes (172602 active, 11 holddown, 108105 hidden)
    10.4.10.0/19 (1 entry, 0 announced)
         BGP                 /-101
            Next-hop reference count: 18064
            Source: 192.168.106.33
            Next hop: 192.168.106.33 via so-6/3/0.0, selected
            State:
            Local AS: 65000 Peer AS: 65530
            Age: 1:36
            Task: BGP_65530.192.168.106.33+179
            AS path: 65530 65531 65532 I ( )
            Communities: 65501:390 65501:2000 65501:3000 65504:6453
            Localpref: 100
            Router ID: 192.168.103.240
            Merit (last update/now): 12866/11594
            damping-parameters: damping-normal
            Last update: 00:01:36 First update: 1w3d 03:00:51
            Flaps: 13718
            Suppressed. Reusable in: 00:19:40
            Preference will be: 170

Here, the prefix 10.4.10.0/19 is suppressed. While suppressed, the prefix is not active in the forwarding table, so there is no asterisk next to BGP on the second line of the output, and the prefix is hidden (noted in the State field) and is not exported to any BGP peers. The last several lines show the damping information. The damping-parameters line indicates that this route is being damped with the damping-normal parameters. The current figure-of-merit value is 11,594, which is above the reuse threshold of 3,000 that is set for damping-normal. The Last update and First update fields give an idea of how long the prefix has been unusable: First update shows when the path attributes for the route were first changed (here, more than a week ago), and Last update shows when they were last changed (about a minute and a half ago). The Flaps line shows that the route has flapped a total of 13,718 times. If the route remains stable and the path information for it does not change, the router will unsuppress this route and reuse it in 19 minutes 40 seconds with a preference of 170, which is the default JUNOS preference for routes learned from BGP.

You can also check to see the routes that have flapped but have not been suppressed:

    aviva@Router3> show route damping decayed detail
    inet.0: 173319 destinations, 1533668 routes (172625 active, 4 holddown, 108083 hidden)
    10.0.111.0/24 (7 entries, 1 announced)
        *BGP    Preference: 170/-101
            Next-hop reference count: 151973
            Source: 172.23.2.129
            Next hop: via so-1/2/0.0
            Next hop: via so-5/1/0.0, selected
            Next hop: via so-6/0/0.0
            Protocol next hop: 172.23.2.129 Indirect next hop: 89a1a00 264185
            State: <Active Ext>
            Local AS: 65000 Peer AS: 65490
            Age: 3:28 Metric2: 0
            Task: BGP_65490.172.23.2.129+179
            Announcement bits (6): 0-KRT 1-RT 4-KRT 5-BGP.0.0.0.0+179 6-Resolve tree 2 7-Resolve tree 3
            AS path: 65490 65520 65525 65525 65525 65525 I ( )
            Communities: 65501:390 65501:2000 65501:3000 65504:701
            Localpref: 100
            Router ID: 172.23.2.129
            Merit (last update/now): 1934/1790
            damping-parameters: damping-high
            Last update: 00:03:28 First update: 00:06:40
            Flaps: 2

The prefix 10.0.111.0/24 is using the damping-high parameters, which have a suppression threshold of 6,000. The route has not yet crossed this threshold but has a nonzero figure of merit of 1,790. The asterisk before BGP on the second line and the Active in the State field both indicate that this route is still active.

The show route damping history command shows whether any routes have been withdrawn:

    aviva@Router3> show route damping history
    inet.0: 173320 destinations, 1533529 routes (172624 active, 6 holddown, 108122 hidden)
    + = Active Route, - = Last Active, * = Both
    10.108.0.0/15       [BGP ] 2d 22:47:58, localpref 100
                          AS path: 65220 65501 65502 I
                        > to 192.168.60.85 via so-3/1/0.0

The prefix 10.108.0.0/15 has been withdrawn. Use the detail option to get more information:

    aviva@Router3> show route damping history detail
    inet.0: 173319 destinations, 1533435 routes (172627 active, 2 holddown, 108105 hidden)
    10.108.0.0/15 (3 entries, 1 announced)
         BGP                 /-101
            Next-hop reference count: 69058
            Source: 192.168.60.85
            Next hop: 192.168.60.85 via so-3/1/0.0, selected
            State:
            Inactive reason: Unusable path
            Local AS: 65000 Peer AS: 65220
            Age: 2d 22:48:10
            Task: BGP_65220.192.168.60.85+179
            AS path: 65220 65501 65502 I ( )
            Communities: 65501:390 65501:2000 65501:3000 65504:3561
            Localpref: 100
            Router ID: 192.168.80.25
            Merit (last update/now): 1000/932
            damping-parameters: set-normal
            Last update: 00:01:05 First update: 00:01:05
            Flaps: 1
            History entry. Expires in: 00:22:20

This output is similar to that of the show route damping suppressed detail command. It also shows in the Inactive reason line that the path is hidden because it is unusable. Unusable path can mean one of three things: that the route was rejected as the result of an import routing policy, that the route has been damped (which is the case here), or that the next hop to the route cannot be resolved.

Checking the routing table entries for 10.108.0.0/15 confirms that the route is unusable:

    aviva@Router3> show route 10.108.0.0/15 exact all
    inet.0: 173321 destinations, 1533468 routes (172617 active, 14 holddown, 108123 hidden)
    + = Active Route, - = Last Active, * = Both
    10.108.0.0/15      *[BGP/170] 02:59:16, localpref 120, from 172.24.250.123
                          AS path: (64603) 65503 65503 65503 I
                        > via so-2/0/0.0, label-switched-path 1
                          via so-2/0/0.0, label-switched-path 2
                        [BGP/170] 5w3d 11:43:01, localpref 100, from 172.24.20.129
                          AS path: 65520 65521 I
                          via so-1/2/0.0
                        > via so-5/1/0.0
                          via so-6/0/0.0
                        [BGP ] 2d 22:49:33, localpref 100
                          AS path: 65220 65501 65502 I
                        > to 192.168.60.85 via so-3/1/0.0

The third route to 10.108.0.0/15, using the SONET interface so-3/1/0.0, is the one that is suppressed. You can confirm this because no preference value is associated with the route. You see [BGP ] instead of [BGP/170].

See Also

RFC 2439, BGP Route Flap Damping; Recipe 9.1
