Network Working Group I. Hokelek
Internet-Draft M. Fecko
Expires: August 28, 2008 P. Gurung
S. Samtani
S. Cevher
J. Sucec
Applied Research
Telcordia Technologies, Inc.
February 25, 2008
Loop-Free IP Fast Reroute Using Local and Remote LFAPs
draft-hokelek-rlfap-01.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 28, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2008).
Abstract
This draft describes a new loop-free IP fast reroute mechanism which
enhances the IP Fast-ReRoute (IPFRR) [1,15] by introducing the
concept of pre-computed remote Loop-Free Alternate Paths (rLFAPs) on
top of the IPFRR local LFAP. In rLFAP, a router which is adjacent to
the failed resource switches over to pre-computed LFAPs, if they
exist, immediately after failure detection. Multi-hop neighbors
(MNBs) are notified about this remote failure as quickly as possible
using fast failure notification mechanism. Upon receipt of failure
information, MNBs activate their pre-computed remote LFAPs that they
maintain for protecting against remote failures within their
multi-hop neighborhoods. In the worst case, where IPFRR results in
forming micro-loops, rLFAP completely prevents micro-loops for
single link failures and quickly converges to a loop-free path in
case of multiple link failures.
1. Introduction
Fast reroute in communication networks is defined as the dynamic and
timely redirection of a primary path to an alternate secondary path
in response to degradation/failure of the primary path. IP Fast-
ReRoute (IPFRR) drafts [1,15] provide a framework for the fast-
reroute mechanism which minimizes the adverse effects of link or
node failures by timely invoking the pre-computed repair paths.
IPFRR performs well when a router which is adjacent to the failed
resource (link or router) has Loop-Free Alternate Paths (LFAPs).
However, major issues arise either when there is no local LFAP or
the router assumes that the alternate path is loop-free but this
might not be true if there are inconsistencies in the routers' FIBs
(e.g., due to failure propagation and FIB update delays). In the
latter case, a micro-loop might be formed when the primary path is
replaced with the alternate path.
A relative performance of a fast-reroute mechanism should not be
worse than the performance with relying on the routing protocol's
(e.g., OSPF or IS-IS) standard repair mechanism. The performance
metrics include repair delay, additional overhead, ability to handle
micro-loops, backward compatibility, complexity (processing and
memory), and the repair path coverage and quality. An ideal but
non-realistic fast-reroute mechanism would have zero repair delay,
create no extra overhead, either be micro-loop free or handle all
micro-loops, be backward compatible, require minimum amount of
processing power and memory, and cover all failure scenarios.
Fast reroute can be severely impaired by micro-loops which represent
the worst case scenario for IPFRR. Ref. [2] summarizes the
techniques proposed for minimizing the adverse effects of micro-
loops. These techniques either can resolve the micro-loops only
partially [3], or can suffer high delays [4, "incremental cost
advertisement"], or are highly complex [5,6]. Therefore, a fast
reroute mechanism that is micro-loop free and has comparable
performance results to IPFRR in terms of the above metrics will
significantly help IETF to reach its goal of dynamic and timely
rerouting.
This document describes a new micro-loop free fast reroute mechanism
which enhances IPFRR [1,15] by introducing the concept of pre-
computed remote LFAPs (rLFAPs) on top of the IPFRR local LFAP. This
mechanism achieves complete and fast micro-loop prevention at the
minimal amount of extra complexity. In rLFAP, a router that is
adjacent to the failed resource immediately switches over pre-
computed LFAPs if LFAPs for protecting against this local failure
exist (the same as IPFRR) and instantly propagates failure
information to multi-hop neighbors (MNBs) (e.g., X-hop neighbors,
where X is an integer number representing how many hops away from
the failure). Upon receipt of failure information, MNBs activate
their pre-computed LFAPs that they maintain for protecting against
remote failures within their MNBs. Note that this draft uses the
existing well-defined LFAP criteria in [15] but extends its
applicability by calculating remote LFAPs to protect against remote
failures.
2. Overview of rLFAP
The number of LFAP choices to bypass a failed resource is either
equal or higher at routers which are multi-hop away from the failed
component compared to the router adjacent to the failure (the latter
is the subset of the first). This fact together with the concept of
remote LFAP, which protects against failures at multi-hop routers,
is the key motivation behind rLFAP.
One might think that rLFAP would not be fast enough since there is a
certain failure notification delay before activating remote LFAPs.
However, in rLFAP, each router pre-computes two sets of LFAPs: the
first set includes local LFAPs which are activated instantly (if
exist) when a local failure is detected and the second set includes
remote LFAPs which are activated when notifications of remote
failures arrive from multi-hop routers. Therefore, rLFAP follows the
approach proposed in the IPFRR draft [1] for its local best-effort
fast reroute and quickly continues to correct the previous best
effort decisions as failure notifications keep on arriving from
MNBs. Since rLFAP gradually corrects the previous decisions, micro
loops are detected and corrected before or shortly after they are
formed (rLFAP completely prevents micro-loops for single link
failures and quickly converges to a loop-free path in case of
multiple link failures).
The main features of rLFAP are as follows:
- Prevention of micro-loops without tunneling
- Handling multiple link/node failures
- Faster than IPFRR in the worst case scenarios; comparable
otherwise
- Ability to distinguish link and node failures
- Scalability due to limiting the scope of MNBs to X hops
- Minimal additional overhead for fast failure notification
2.1 Micro-loop Prevention Using Remote LFAPs
S----------------A------------------B
| |
| |
| |
| |
E-----------------------------------D
| |
| |
| |
| |
F-----------G-----------H-----------J
Figure 1: Micro-loop prevention with rLFAP (link E-D fails)
An example link failure scenario is shown in Figure 1, where link
metrics are unity (i.e., hop-count). The primary path from S to D is
S-E-D before link E-D fails. After E detects the failure, IPFRR
switches instantly over the shortest alternate path E-S-A-B-D.
In this case, a micro-loop is formed between S and E and will cause
network disruption until a new primary path S-A-B-D converges if
none of the techniques in Ref. [2] is employed. However, rLFAP
immediately starts using LFAP S-A-B-D when S is notified about the
failure of link E-D. Note that LFAP S-A-B-D is pre-computed at S to
protect against the failure of link E-D (this is a remote link
failure from router S point of view and the path S-A-B-D is a remote
LFAP from link E-D point of view).
2.2 Handling Multiple Simultaneous Failures
S-----------A-----------B-----------C
| |
| |
| |
| |
E-----------------------F-----------D
| |
| |
| |
| |
G-----------------------H
Figure 2: A simple topology with unit link costs demonstrating
rLFAP's capability for handling multiple simultaneous failures
(links E-F and H-F fail at the same time)
Figure 2 shows a scenario where two link failures (links E-F and H-
F) occur. Using IPFRR, E switches over the shortest alternate
path E-G-H-F-D as soon as it detects the failure of link E-F.
However, after the switchover, H sends all packets coming from the
source E back to G since link H-F also failed. The micro-loop
between E and H should be resolved. After resolving the micro-loop
between E and H, another micro-loop between E and S should also be
resolved to enable successful reroute. If S is notified about these
two link failures, then the best LFAP S-A-B-C-D can be utilized
quickly (X=3 is needed in this scenario). The number of LFAPs needed
for the multiple link failures will not be scalable if they are
required to be pre-computed by each router for any combination of
failures within the entire network. However, rLFAP only needs to
protect failures within its MNBs (explained in Section 3.1);
therefore rLFAP significantly enhances the IPFRR scalability in
resolving micro-loops in case of multiple failures (e.g., compared
to the NOT-VIA mechanism [6]).
2.3 Distinguishing Link and Node Failures
A------------B-------------C
| | |
| | |
| | |
| | |
S------------E-------------D
\ /
\ /
\ /
\ /
\ /
\ /
F
Figure 3: rLFAP's ability to distinguish link and node failures
(either link S-E or router E fail)
Another important rLFAP feature is the ability to distinguish
between link and node failures. Figure 3 shows an example topology
with unit link costs for demonstrating the rLFAP's ability to
distinguish link and node failures. The primary path from S to D is
S-E-D before any failure. Initially, a router assumes that it is a
link failure whenever its communication to a neighboring node is
disrupted (e.g., S detects that it can't reach node E through link
S-E). In this case, the pre-computed LFAP S-F-E-D will be activated
immediately without waiting for distinguishing whether it is a link
or node failure. However, if it is a router failure (e.g., node E
fails), the failure notification concept of rLFAP makes it possible
for routers to receive the failure notification from other
interfaces of the failed router (e.g., node S receives failure
notifications of links B-E and F-E using 2-hop (i.e., X=2) failure
notification mechanism). Upon detection of router's failure, the
pre-computed LFAP S-A-B-C-D is activated. Hence, in rLFAP, routers
can distinguish between link and node failures. The significance of
this feature is supported by the recent work by Gjoka et. al. [7],
which shows that the failure coverage of the fast reroute mechanisms
is different for link and node failure cases.
3. Design Details
rLFAP achieves loop-free convergence by introducing two additional
mechanisms on top of IPFRR: multi-hop failure notification and remote
alternate paths (remote LFAPs or SAPs) for protecting against
failures at multi-hop routers. Apart from these mechanisms, rLFAP
implements all its functionalities using the IPFRR framework [1]. In
this section, we first describe multi-hop neighborhoods (MNBHs) which
limit the number of the required remote LFAPs/SAPs for
scalability. And then, two fast failure notification mechanisms and
LFAP calculations for local and remote failures within MNBH are
presented. It is crucial for all fast reroute mechanisms that the
underlying dynamic routing protocol should converge back to its
optimum routes without causing micro-loops once the topology change
is disseminated to all routers in the network. rLFAP provides a fast
re-convergence from LFAPs to the optimum new routes by utilizing
new routes shortly after they are calculated.
3.1 Multi-hop Neighborhood (MNBH)
N11----N12----N13----N14----N15----N16----N17
| | | | | | |
| | | | | | |
N21----N22----N23----N24----N25----N26----N27
| | | | | | |
| | | | | | |
N31----N32----N33----N34----N35----N36----N37
| | | | | | |
| | | | | | |
N41----N42----N43----N44----N45----N46----N47
| | | | | | |
| | | | | | |
N51----N52----N53----N54----N55----N56----N57
| | | | | | |
| | | | | | |
N61----N62----N63----N64----N65----N66----N67
| | | | | | |
| | | | | | |
N71----N72----N73----N74----N75----N76----N77
Figure 4: An example network
Figure 4 shows an example network consists of 49 nodes. A node on
the boundary has two neighbors if it is located on one of four
corner positions (e.g., N11); otherwise three neighbors (e.g., N12).
A non-boundary node has four neighbors (e.g., N22) irrespective of
its location. The issue is to decide the structure of two adjacent
MNBHs: overlapping or non-overlapping. Inconsistencies or micro-
loops may arise on boundaries of adjacent MNBHs if they are
non-overlapping since each MNBH has only a partial view of the
global topology (e.g., failures close to the boundary of adjacent
MNBHs). Also, an additional mechanism is needed to explicitly
maintain the boundaries if multi-hop NHs are non-overlapping.
Therefore, rLFAP uses overlapping MNBHs.
|
|
----N24----
| | |
| | |
----N33----N34----N35----
| | | | |
| | | | |
----N42----N43----N44----N45----N46----
| | | | |
| | | | |
----N53----N54----N55----
| | |
| | |
----N64----
|
|
Figure 5: 2-hop MNBH for N44
The MNBH for each node only defines a local scope within which to
propagate failure notifications. For example, 2-hop (i.e., X-hop
where X=2) MNBH of N11 consists of nodes together with their all
links which are at most 2-hop away from N11. These nodes include N12,
N13, N21, N22, and N31; and hence, there are 5 nodes and 11 links
within 2-hop MNBH of N11. However, N44's 2-hop MNBH includes 12 nodes
and 36 links as shown in Figure 5. N44 has to calculate separate
LFAPs/SAPs for each destination in the network to protect against
link/node failures within its MNBH. Since MNBHs are overlapping and
define only a local scope for each node, no additional mechanism is
needed to explicitly maintain the MNBHs in the network (e.g., a
simple flooding mechanism similar to OSPF LSAs but limited to X-hop
away routers is sufficient for maintaining MNBHs). Each node takes
its best fast reroute action independently in case of a failure
within its MNBH. Minimal inconsistencies (hence micro-loops) are
expected as a result of each node's independent decision since two
overlapping MNBHs' partial topologies have a lot common information.
3.2 Alternate Path Calculation
We describe the alternate path calculation methodology for the node
N44 in Figure 5. Other nodes in the network repeat the same steps
but using their own MNBHs. For simplicity, we assume that only a
single link failure occurs.
N44 has four local links. For each local link LL_k (k=1,2,3,4), the
following calculations are done per each destination Nij
(i=1,2,3,5,6,7 and j=1,2,3,5,6,7):
- Remove LL_k from the topology to anticipate the failure
- Calculate the shortest path to Nij using the new topology. If
the new shortest path to Nij (after the failure) is the same as
the primary path to Nij (before the failure) then break (do not
perform the following three steps). Do not store any alternate
path to Nij for the failure of LL_k. If the primary path to Nij
before the failure has equal cost multipaths (ECMPs) and at
least one of these ECMPs is affected from the failure, then
the following three steps should also be performed.
- Calculate three local shortest alternate paths (SAPs) via
immediate neighbors to Nij (there are three immediate neighbors
after the link failure and therefore three SAPs need to be
calculated). Here SAP is defined as the shortest distance path to
Nij via an immediate neighbor calculated by the Djikstra
algorithm after removing LL_k from the topology
- Store the shortest local LFAP among these three local SAPs (if
LFAP exists) and modify the entry, corresponding to this
source-destination pair, in the path safety matrix. This matrix
keeps track of loop-free source-destination pairs which will be
used during the remote LFAP calculation
A similar method will be used for calculating remote alternate
paths. N44 has 32 remote links within its MNBH. For each remote link
RL_k (k=5,6,...,32), the following calculations are done per each
destination Nij (i=1,2,3,5,6,7 and j=1,2,3,5,6,7):
- Remove RL_k from the topology to anticipate the failure
- Calculate the shortest path to Nij using the new topology. If the
shortest path to Nij is the same as the primary path to Nij
then break (do not perform the following three steps). Do not
store any alternate path to Nij for the failure of RL_k
- Calculate four remote shortest alternate paths (SAPs) via
immediate neighbors to Nij (since RL_k is not a local interface
there are four immediate neighbors for this node).
- Store the shortest remote LFAP among these four remote SAPs and
update the path safety matrix if any LFAP exists. If there is no
LFAP, then check the path safety matrix if there is any neighbor
which has a loop-free path to this destination indicated by
the path safety matrix. If there is/are such neighbor(s), then
use the shortest one as remote LFAP and update the corresponding
entry in the path safety matrix.
The above algorithms make sure that a local or remote alternate
path will be stored only if the primary path does not function
anymore after the failure. Therefore, rLFAP stores only the
required alternate paths and is scalable. Note that routers only
store alternate next-hops (not the alternate path itself).
3.3 Fast Failure Notification Mechanism
The objectives of a multi-hop failure notification mechanism are as
follows:
- Routers should be notified about failures as fast as possible to
minimize the reroute delay from the primary path to an LFAP
- Failure notification should create minimal overhead in terms of
bandwidth consumption. For example, generating too many LSA
packets in OSPF consumes the available bandwidth and may cause
disruption in other parts of the network beyond the failure
point
- A solution which does not require modifications to the
underlying routing protocol is preferable
- A failure notification should not cause the network instability
under some worst-case scenarios (e.g., sending too many OSPF LSA
messages for transient failures (link flapping) may cause the
network instability)
There are two options for the multi-hop failure notification in
rLFAP when a router detects a local failure:
1) Configuring routing protocol's parameters for the fast failure
propagation by relying on periodic link state updates (e.g.,
LSAs for OSPF and LSPs for IS-IS)
2) Implementing a new efficient fast failure notification mechanism
within the MNBH
3.3.1 Configuring Routing Protocol's Parameters
rLFAP is proposed initially for link state IGPs, where link state
update packets are LSAs in OSPF and LSPs in IS-IS. For simplicity, we
describe the concept for OSPF; all the procedures are applicable to
IS-IS as well. This option does not introduce a new signaling
mechanism but optimizes the existing link state update mechanisms for
rLFAP's performance efficiency.
The flooding procedure by which OSPF distributes LSAs is reliable. A
router packages its new LSA within a link state update packet (may
contain multiple distinct LSAs) and transmits it on each of its
interfaces which are in the same OSPF area impacted by the LSA. Each
recipient acknowledges the router from which the LSA was received,
repackages the LSA within a new link state update packet and sends
it out on each of its interfaces except for the interface on which
the LSA was received.
When the content of an LSA changes, a new LSA is originated
[8]. However, two instances of the same LSA may not be originated
within the time period MinLSInterval. This may require that the
generation of the next instance may be delayed by up to
inLSInterval. Although MinLSInterval is an architectural constant
(default is 5 secs), implementations could make this interval
configurable in order to speed up the failure propagation [12].
Ref. [9] studies how to achieve sub-second IGP convergence in large
IP networks by configuring the routing protocol parameters. Ref.
[10] shows that minLSInterval around 20-30 ms does not generate much
overhead while providing fast failure propagation to multi-hop
routers. Based on these studies, we suggest that minLSInterval which
is around 20-30 ms will provide rLFAP with fast failure notification
mechanism due to the MNBH, where the maximum number of hops between
two nodes in which one node includes another within its MNBH is
limited to X hops. Note also that this interval limits the successive
generations of the same LSA. The maximum delay (i.e., 20-30 ms) is
realized only if there are two successive topology changes in which
the second failure occurs just after an LSA for the first failure is
generated.
3.3.2 An Efficient Failure Flooding Mechanism within MNBH
For the stable wired backbone networks, configuring routing
protocols' parameters for fast failure notification will neither
create much additional signaling overhead nor network instability
since transient failures (i.e., link flapping) are rarely occured in
these networks. However, for wireless mobile ad-hoc or backbone
networks, the drawback of configuring the routing protocol's
parameters for fast failure propagation is that the routing protocol
overhead will be enormous in the case where frequent transient
failures (e.g., link flapping) occur. This overhead is due to too
many LSA updates which are generated for each transient topology
change and flooded to the entire network. The frequent transient
failures may also cause the network instability since the routing
protocol may repeat shortest path calculations and FIB updates too
frequently for each topology change without allowing a common
convergence.
The LSA flooding scope is more explicit in OSPF IPv6 and appears in
the LS type field [13]. There are three separate flooding scopes for
LSAs:
- Link-local scope: LSA is flooded only on the local link and no
further.
- Area scope: LSA is flooded only throughout a single OSPF area.
- AS scope: LSA is flooded throughout the routing domain.
However, rLFAP needs LSA's scope to be configurable to its MNBH and
none of these scopes satisfies this requirement. Moreover, an LSA is
needed by the routing protocol and should be at least flooded within
the same OSPF area.
Therefore, another option for the fast failure notification mechanism
in rLFAP is to implement a new flooding procedure within MNBH which
minimizes routing protocol instability and overhead. The new flooding
mechanism defines a new link update packet (LUP) which is similar to
LSA in OSPF but includes two new fields: Time-To-Live (TTL) and
Stop-Flooding (SF) bit. TTL indicates the maximum number of hops a
new LUP will be transmitted. The transmission is stopped if TTL field
is zero and SF is one (i.e., true). SF decides whether the flooding
of LUP will continue beyond the MNBH. A router continues flooding an
LUP within the MNBH irrespective of its SF value. However, if TTL is
zero, then a router continues the flooding procedure only if the
LUP's SF value is zero (i.e., false). An intermediate router changes
SF value to one if LFAPs, which protect against this particular
failure described in LUP, are found for all destinations in the
network. An intermediate router is not allowed to change SF value
from one to zero because the SF value 1 indicates that LFAPs are
found for all destinations (i.e., full coverage).
LUP has its own timers for controlling its new packet generation and
flooding mechanism. This is another reason for introducing a new
packet format since the timers for controlling LSA flooding can not
be set independently for each LSA. The rLFAP flooding procedure
will be the same as OSPF's flooding procedure but with the following
modifications:
- Parameters (e.g., timers) for the LUP flooding are set to
aggressive values for fast failure notifications while parameters
for LSAs are set to relatively conservative for minimizing the
routing protocol instability and overhead.
- Each router sets TTL field to the number of maximum hops defined
by the MNBH (i.e., parameter X in X-hop MNBH) and SF bit to 0
upon generation of a new LUP packet.
- Each recipient of a LUP packet decrements TTL and sets SF bit to
1 if all required LFAPs are found and then transmits it on each
of its interfaces except for the interface on which the LUP was
received only if TTL field is nonzero.
- Each recipient which observes zero TTL field continue the
flooding procedure if SF bit is still zero; otherwise stops the
flooding procedure.
There are two advantages in this method: i) minimal routing overhead
since flooding LUP update messages are limited to MNBH, ii) fast
failure notification since the parameters of the flooding procedure
(e.g., timers) can be set independently from the routing protocol's
flooding mechanism's parameters. This efficient flooding mechanism
is expected to substantially decrease the amount of additional
overhead and routing protocol instability, which are observed during
the transient failures, while fast failure notifications are still
provided.
3.4 Remote LFAPs and Scalability
In rLFAP, remote LFAPs are calculated only to protect against node
and link failures within the MNBH. Therefore, the number of
LFAPs for a single failure scenario will not be an important issue
in terms of scalability.
However, in case of protecting against multiple failures, the number
of pre-computed LFAPs to protect against some combinations of these
failures might not be scalable in rLFAP depending on the depth of
the MNBH (i.e., parameter X) and the average number of links for
each node. Our design utilizes the concept of KEYLINKS introduced
in Ref. [14] for the purpose of handling this scalability problem.
The main idea behind KEYLINKS is that LFAP for a certain
destination is needed only if the primary path fails. Therefore,
each node maintains what nodes/links in its MNBH are used for
reaching a certain destination and calculates an LFAP only for the
links on its primary path (i.e., LFAP, which is needed only if at
least one of the links on the primary path fails, should not use any
of multiple failures). Maintaining key links and nodes adds
additional complexity while the gain is obtained in terms of
scalability (i.e., fewer LFAPs need to be pre-computed and stored).
For example, N44's 2-hop MNBH includes 12 nodes and 36 links as
shown in Figure 5. For a certain destination, assume that 2 nodes
and 3 links from the 2-hop MNBH of N44 are on the primary path.
Therefore, LFAPs are needed only to protect against 5 members (5
LFAPs) instead of 48 members (48 LFAPs) of the 2-hop MNBH for a
single failure scenario. This reduction for a single destination is
significant since each node needs to pre-compute LFAPs for all
possible destinations in the network.
3.5 Routing Convergence from LFAPs to New Primary Paths
It is crucial for all fast reroute mechanisms that the underlying
dynamic routing protocol should converge to its new optimum routes
once the topology change is disseminated to all routers in the
network. The loops can still form in this phase if the FIB updates
are not done in the right order.
In rLFAP, all nodes within the MNBH have pre-computed alternate
paths protecting against the failures within the MNBH. This makes
sure that all nodes within the MNBH are aware of these failures
and therefore their current routes do not include the failed
resources. This feature provides a flexibility of timely switching
back to new primary paths when new paths are calculated by the
routing protocol. rLFAP switches back to new primary paths after
waiting for a constant delay representing the rLFAP convergence
time within the MNBH (this delay starts after new paths are found
by the routing protocol).
Another important feature of rLFAP is that the minimum number of
next hop changes is expected during switching from LFAPs to new
primary paths of the dynamic routing protocol. The reason is that
the next hop change will occur only if the next hop in the new
primary path is different than the next hop in the alternate path.
Due to the MNBH concept, it is conjectured that the next hops for
LFAPs and the new primary paths will be the same with a high
probability and the minimum number of FIB updates is expected
during switching from LFAPs to new primary paths. Our initial
analysis shows that
3.6 Applicability of rLFAP Mechanism to Support Fast PIM-SM Tree
Repair
Protocol Independent Multicast - Sparse Mode (PIM-SM) [16] is a
widely available multicast protocol that achieves efficient
distribution of multicast content through on-demand construction of
shared trees and shortest path trees. Key to its efficiency is its
use of Join messages to build the multicast tree from the multicast
group members towards the Rendezvous Point (RP) or the content
source for the cases of shared trees and shortest path trees (SPTs),
respectively.
Unfortunately, PIM-SM reacts to link failure events even more slowly
than the underlying routing protocol (e.g., OSPF, IS-IS, etc.).
Specifically, not only must the underlying unicast routing protocol
converge to reflect the degraded network topology, but the
appropriate PIM Join messages must be sent, received, and processed
before multicast tree repair is completed. Furthermore, depending
on the router implementation, PIM-SM may not even recognize a
change in the underlying unicast paths for several seconds when
time-based events are used to trigger the PIM-SM process to compare
the current routing information for changes that impact the reverse
path forwarding (RPF) to RPs and multicast sources.
The rLFAP features can help speed up PIM-SM tree repair by
accelerating the convergence to correct RPF information about a
failure at affected nodes in the LUP TTL MNBH. To illustrate
potential benefits of the rLFAP mechanism in the context of PIM-SM,
the topology of Figure 2 is considered again. Here, however, Node S
is supposed to denote the source node for packets to be delivered by
an SPT to the multicast group Z comprising Nodes D and G. The SPT
constructed by PIM-SM (assuming unit cost links) is shown in Figure
6.
S A B C
|
|
|
|
E-----------------------F-----------D
|
|
|
|
G H
Figure 6: SPT constructed by PIM-SM for multicast group Z = (D,G)
with source node S based on the topology of Figure 2 (links not
part of the SPT are omitted from the figure).
Now, per the scenario of Figure 2, it is assumed that links E-F and
F-H of Figure 2 fail at the same time. However, when the LUP
originated by Node F arrives at Node D, D will recognize that the RPF
to S has changed and D will use the pre-computed LFAP to S via the
link D-C as the link for the new RPF to S. The processing described
thus far is native to the rLFAP mechanisms already presented herein.
In order to complete the speed-up of PIM-SM tree repair, routers
employing rLFAP mechanisms MUST trigger the PIM-SM process to check
the impact on RPF information of those multicast groups for which
PIM-SM state is maintained locally whenever a LUP is received and
processed. In doing so, the PIM-SM process will learn of the new
correct RPF information for affected multicast groups within
milliseconds of the received LUP being processed. The PIM-SM
implementation MUST then immediately issue the appropriate (*,G)
and/or (S,G) Join messages. For the scenario of Figure 6, Node D
will issue a (S,G) Join message to its PIM neighbor Node C via the
link D-C which, in turn, sends a Join message to Node B, and so
forth. The resulting repaired PIM-SM SPT for Group Z is shown in
Figure 7.
S-----------A-----------B-----------C
| |
| |
| |
| |
E F D
|
|
|
|
G H
Figure 7: Reconstructed by PIM-SM for multicast group Z = (D,G)
following fast tree repair based on LUPs originated by Node F (links
not part of the SPT are omitted from the figure).
A few additional points regarding the illustrative scenario of
Figures 6 and 7 are worth noting. First, the routing instance at
Node F does not originate a new Join message for Group Z as it is no
longer "upstream" to Group Z members. That is, a router applying
the rLFAP mechanism to speed up PIM-SM tree repair SHOULD suppress
sending PIM Join messages if the neighbors for which it maintained
Join state are no longer "downstream" from it with respect to the
source. Second, the illustrative scenario given here applies without
loss of generality to RP-based (i.e., shared) trees: The only
difference being (in the example here) that Node D would issue a
(*,G) Join message to repair its branch of the multicast tree. Last,
the SPT branch going to multicast group member G was not affected by
the link failures and, therefore, LUPs originated by Nodes E and H
received at Node G (correctly) did not initiate tree repair.
4. Experimental Results
4.1 LFAP Coverage Analysis
We performed the coverage analysis of the fast reroute mechanism
presented here on realistic topologies, which were generated by the
BRITE topology generator in bottom-up mode [17]. The LFAP coverage
percentage is defined here as the percentage of the number of LFAPs
for protecting the primary paths which are failed because of link
failures to the number of all failed primary paths. Only local LFAPs
are considered in the coverage calculation for the neighborhood depth
of 0 (i.e., X=0) while both local and remote LFAPs are taken into
account when the neighborhood depth is set to a value greater than 0
(i.e., X>0).
The realistic topologies include AT&T and DFN using pre-determined
BRITE parameter values from Ref.[17] and various random topologies
with different number of nodes and varying network connectivity. For
example, the number of nodes for AT&T and DFN are 154 and 30,
respectively, while the number of nodes for other random topologies
is varied from 20 to 100. The BRITE parameters which are used in our
topology generation process are summarized in Figure 10 (see Ref.[17]
for the details of each parameter). In summary, m represents the
average number of edges per node and is set to either 2 or 3. A
uniform bandwidth distribution in the range 100-1024 Mbps is selected
and the link cost is obtained deterministically from the link
bandwidth (i.e., inversely proportional to the link bandwidth as used
by many vendors). Since the values for p(add) and beta determine the
number of edges in the generated topologies, their values are varied
to obtain network topologies with varying connectivity (e.g., sparse
and dense).
|----------------------------|-----------------------|
| | Bottom up |
|----------------------------|-----------------------|
| Grouping Model | Random pick |
| Model | GLP |
| Node Placement | Random |
| Growth Type | Incremental |
| Preferential Connectivity | On |
| BW Distribution | Uniform |
| Minimum BW | 100 |
| Maximum BW | 1024 |
| m | 2-3 |
| Number of Nodes (N) | 20,30,50,100,154 |
| p(add) | 0.01,0.05,0.10,0.42 |
| beta | 0.01,0.05,0.15,0.62 |
|----------------------------|-----------------------|
Figure 10: BRITE topology generator parameters
The coverage percentage of our fast reroute method is reported for
different network topologies (e.g., different number of nodes and
varying network connectivity) using neighborhood depths of 0, 1,
and 2. (i.e., X=0, 1, and 2). For a particular failure, LFAPs
protecting the failed primary paths are calculated only by those
nodes which are within the multi-hop neighborhood of this failure.
Note that these nodes are determined by the parameter X as follows:
For X=0, two nodes which are directly connected to the failed link,
for X=1, two nodes which are directly connected to the failed link
and also neighboring nodes which are adjacent to one of the
outgoing links of these two nodes, and so on.
The LFAP coverage percentage for a certain topology is computed by
the following formula: LFAP Coverage Percentage = N_lfaps*100/N_fpp
where N_lfaps is the number of source-destination pairs whose
primary paths are failed because of link failures and have LFAPs for
protecting these failed paths, and N_fpp is the number of
source-destination pairs whose primary paths are failed because of
link failures. The source-destination pairs, in which source and
destination nodes do not have any physical connectivity after a
failure, are excluded from N_fpp. Note that the coverage percentage
includes a network-wide result which is calculated by averaging all
coverage results obtained by individually failing all edges for a
certain network topology.
Figure 11 shows the LFAP coverage percentage results for random
topologies with different number of nodes (N) and network
connectivity, and Figure 12 shows these results for AT&T and DFN
topologies. In these figures, E_mean represents the average number
of edges per node for a certain topology. Note that the average
number of edges per node is determined by the parameters m, p(add),
and beta. We observed that E_mean increases when p(add) and beta
values increase. For each topology, LFAP coverage analysis is
repeated for 10 topologies generated randomly by using the same
BRITE parameters. E_mean and LFAP coverage percentage are
obtained by averaging the results of these ten experiments.
|------------|-----|------|------|------|------|
| Case |N |E_mean|X=0 |X=1 |X=2 |
|------------|-----|------|------|------|------|
|p(add)=0.01 |20 |3.64 |82.39 |98.85 |100.0 |
|beta=0.01 |50 |3.86 |82.10 |98.69 |100.0 |
| |100 |3.98 |83.21 |98.04 |100.0 |
|------------|-----|------|------|------|------|
|p(add)=0.05 |20 |3.70 |85.60 |99.14 |100.0 |
|beta=0.05 |50 |4.01 |84.17 |99.09 |100.0 |
| |100 |4.08 |83.35 |98.01 |100.0 |
|------------|-----|------|------|------|------|
|p(add)=0.1 |20 |5.52 |93.24 |100.0 |100.0 |
|beta=0.15 |50 |6.21 |91.46 |99.87 |100.0 |
| |100 |6.39 |91.17 |99.86 |100.0 |
|------------|-----|------|------|------|------|
Figure 11: Coverage percentage results for random topologies
|------------|-----------|------|------|------|------|
| Case |N |E_mean|X=0 |X=1 |X=2 |
|------------|-----------|------|------|------|------|
|p(add)=0.42 |154 (AT&T) |6.88 |91.04 |99.81 |100.0 |
|beta=0.62 |30 (DFN) |8.32 |93.76 |100.0 |100.0 |
|------------|-----------|------|------|------|------|
Figure 12: Coverage percentage results for AT&T and DFN topologies
There are two main observations from these results:
1. As the neighborhood depth (X) increases the LFAP coverage
percentage increases and the complete coverage is obtained using
a low neighborhood depth value (i.e., X=2). This result is
significant since failure notification message needs to be sent
only to nodes which are two-hop away from the point of failure
for the complete coverage. This result supports that our method
provides fast convergence by introducing minimal signaling
overhead within only the two-hop neighborhood.
2. The topologies with higher connectivity (i.e., higher E_mean
values) have better LFAP coverage compared to the topologies
with lower connectivity (i.e., lower E_mean values). This is
an intuitive result since the number of possible alternate hops
in dense network topologies is higher than the number of
possible alternate hops in sparse topologies. This phenomenon
increases the likelihood of finding LFAPs, and therefore the
LFAP coverage percentage.
4.2 Convergence Analysis Based on Testbed Experiments
F-----------E------------A------------|
| | | |
| | | |
| | | |
| | | |
G-----------D------------| B
| |
| |
| |
| |
C-------------------------|
Figure 13: 7-node network topology for local LFAP convergence
experiments
F-----------E------------A------------|
| | |
| | |
| | |
| | |
G-----------D B
| |
| |
| |
| |
C-------------------------|
Figure 14: 7-node network topology for remote LFAP convergence
experiments
We performed the convergence analysis of our new fast reroute
presented here using 7-node network consisting of 2600 and 3600
series Cisco routers as shown in Figures 13 and 14. Link costs are
set to the same value for all links. A dedicated computer
(e.g., an agent), which is running our fast reroute technology,
is connected to each router through an Ethernet cable. These
agents obtain the network topology from the routers in real-time
by issuing an SNMP request. Using the retrieved network topology
information, a set of LFAPs, which protects the failed primary
paths when a real failure occurs, is pre-computed and stored. For
both networks, the link between routers A and E is failed for all
experiments.
The convergence time on the alternate path is measured by running
a session, similar to a ping application, between routers A and E.
When the link between routers A and E is failed on the topology as
shown in Figure 13, the primary path A-E (and hence the session
between them) fails. Once the failure is detected, the router A (E)
reroutes its traffic over the alternate path A-D-E (E-D-A) by
installing its pre-computed local LFAP. However, when the same
scenario is applied to the topology in Figure 14, there is no local
LFAP in the router A (E) to protect the primary path A-E (E-A) for
this failure. For the above scenario, our fast reroute technology
propagates the failure information to router B (D) which is
already pre-computed a set of remote LFAPs for this particular
failure. As a result, the session is rerouted through the
alternate path A-B-C-D-E (E-D-C-B-A).
For both local and remote LFAP scenarios, the experiments are
repeated for 10 times. The mean convergence time for the local LFAP
scenario is 602 milliseconds while the mean convergence time for
the remote LFAP scenario is 612 milliseconds. Note that LFAPs are
stored in the agents which are external to the routers. These agents
issue an IOS command to install the right set of LFAPs when a
failure is detected. In reality, the agent technology should run
in the router as a GPP and LFAPs should be installed beforehand
and waiting for a failure signal to be activated. Therefore, the
results should be used to compare the difference between local and
remote LFAPs rather than their absolute values.
The convergence times do not include the failure detection time
since our primary objective in these experiments is to compare local
and remote LFAP rather than proposing a new failure detection
mechanism. For the sake of experiments, we implemented a heuristic
based failure detection mechanism using periodic light-weight
hearbeat messages similar to the Bidirectional Forwarding Detection.
Note that the alternate path between A and E in the remote LFAP
scenario is longer (A-D-E vs. A-B-C-D-E). The failure information is
reached to one-hop neighbor within a few milliseconds since the round
trip time between two neighboring agents is measured around 1-2
milliseconds. These results indicate that the convergence time of the
remote LFAP mechanism is only slightly higher compared to the only
local LFAP mechanism due to the failure notification. This increase
is bounded by the neighborhood depth times a few milliseconds.
However, the remote LFAP significantly increases the alternate path
coverage since there is no local LFAP to protect the session between
routers A and E when the link between routers A and E fails in
Figure 14.
5. Scope and Applicability
This work is proposed initially for link state IGPs (i.e., OSPF and
IS-IS). Further study is needed for extending its applicability to
non-link state IGPs or BGP.
6. Acknowledgements
The authors would like to thank John Scudder, the co-chair of the
IETF RTGWG, for his valuable comments and suggestions on the
preliminary version of this draft.
7. References
Internet-drafts are works in progress available from
[1] M. Shand and S. Bryant, "IP Fast Reroute Framework",
draft-ietf-rtgwg-ipfrr-framework-06.txt, Oct. 2006 (work in
progress).
[2] S. Bryant and M. Shand, "A Framework for Loop-free Convergence",
draft-bryant-shand-lf-conv-frmwk-03.txt, Oct. 2006 (work in
progress).
[3] Alex Zinin, "Analysis and Minimization of Microloops in Link-
state Routing Protocols", draft-ietf-rtgwg-microloop-analysis-
01.txt, Oct. 2005 (work in progress).
[4] Francois et. al., "Loop-free convergence using ordered FIB
updates", , Oct. 2006 (work
in progress).
[5] S. Bryant, M. Shand, "IP Fast Reroute using tunnels", , Apr 2005 (work in progress).
[6] S. Bryant, M. Shand, and S. Previdi, "IP Fast Reroute Using Not-
via Addresses", , Oct. 2006 (Work in progress).
[7] M. Gjoka, V. Ram, and X. Yang, "Evaluation of IP Fast Reroute
Proposals", to appear in IEEE/Create-Net/ICST COMSWARE 2007.
[8] J. Moy, "OSPF Version 2", RFC 2328, April 1998.
[9] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure,
"Achieving sub-second IGP convergence in large IP network",
SIGCOMM Comput. Commun. Rev., 35(3):35-44, 2005.
[10] M. Pitkanen and M. Luoma, "OSPF Flooding Process Optimization",
Workshop on High Performance Switcing and Routing (HPSR),
pp.448-452, May 2005.
[11] G. Iannaccone et. al., "Analysis of link failures in an IP
backbone", in Proc. ACM Sigcomm Internet Measurement Workshop,
Nov. 2002.
[12] Cisco Systems, Inc., Cisco IOS IP Routing Protocols
Configuration Guide, Release 12.4.
[13] R. Coltun, D. Ferguson, and J. Moy, "OSPF for IPv6", RFC 2740,
December 1999.
[14] S. Lee, Y. Yu, S. Nelakuditi, Z. Zhang, and C. Chuah,
"Proactive vs. reactive approaches to failure resilient
routing", Proc. INFOCOM 2004.
[15] A. Atlas and A. Zinin, "Basic Specification for IP Fast-
Rerorute: Loop-free Alternates", , March 2007 (work in progress).
[16] B. Fenner, M. Handley, H. Holbrook and I. Kouvelas, "Protocol
Independent Multicast (PIM-SM): Protocol Specification,"RFC
4601, August 2006.
[17] Oliver Heckmann et al., "How to use topology generators to
create realistic topologies", Technical Report, Dec 2002.
8. Authors' Addresses
Ibrahim Hokelek
Applied Research,
Telcordia Technologies, Inc.
RRC-1E313
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: ihokelek@research.telcordia.com
Mariusz A. Fecko
Applied Research,
Telcordia Technologies, Inc.
RRC-1E326
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: mfecko@research.telcordia.com
Provin Gurung
Applied Research,
Telcordia Technologies, Inc.
RRC-1D305
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: pgurung@research.telcordia.com
Sunil Samtani
Applied Research,
Telcordia Technologies, Inc.
RRC-1P387
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: ssamtani@research.telcordia.com
Selcuk Cevher
Applied Research,
Telcordia Technologies, Inc.
RRC-1A212
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: cevhers@research.telcordia.com
John Sucec
Applied Research,
Telcordia Technologies, Inc.
RRC-1G313
One Telcordia Drive,
Piscataway, NJ 08854
United States. Email: sucecj@research.telcordia.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.