Personal M. Scott Corson Ansible Systems Internet Draft A. O'Neill BT G. Tsirtsis Flarion Document: draft-corson-fastrestore-00.txt Category: Informational November 2000 IP Fast Restoration Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This memo describes a generalized approach for achieving IP fast restoration in response to link and router failures by using IP routing to support high network availability. The approach relies on the utilization of flat, yet highly-localized routing technology to reduce far-reaching control message propagation, thereby limiting the effects of a failure and enabling sub-second restoration. It then outlines how the Temporally-Ordered Routing Algorithm (TORA) (a Mobile Ad hoc Network (MANET) routing protocol), when suitably modified for use in fixed networks, can be used within this framework. The general approach to routing topology management within the fast restoration framework is immediate reaction on link failure (to preserve existing flows) and gradually integration of new network elements on activation into the domain (to further assure stability). Internet Draft Fast Restore August 2000 INDEX 1. Introduction . . . . . . . . . . . . . . . . . . . . . 2 1.1 Terminology. . . . . . . . . . . . . . . . . . . 2 2. Motivation . . . . . . . . . . . . . . . . . . . . . . 3 3. The Fast Restoration Framework . . . . . . . . . . . . 4 3.1 Highly Localized Routing Protocol. . . . . . . . 5 3.2 Link Status Assessment Protocol. . . . . . . . . 5 3.3 Reaction to Failure. . . . . . . . . . . . . . . 6 4. The Temporally-Ordered Routing Algorithm . . . . . . . 7 4.1 Overview . . . . . . . . . . . . . . . . . . . . 7 4.2 Router Activation (Route Creation) . . . . . . . 8 4.3 Link Failure . . . . . . . . . . . . . . . . . . 9 4.4 Link Activation. . . . . . . . . . . . . . . . . 9 4.5 Router Failure (Route Erasure) . . . . . . . . . 10 5. Detecting Router Failures with Single-Homed Prefixes . 10 5.1 Neighbor Advertisement Protocol. . . . . . . . . 11 5.2 Neighbor Reachability Protocol . . . . . . . . . 12 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . 14 7. References . . . . . . . . . . . . . . . . . . . . . . 15 1. Introduction This memo describes a generalized approach for achieving IP fast restoration in response to link and router failures by using IP routing to support high network availability. The approach relies on the utilization of flat, yet highly-localized routing technology to reduce far-reaching control message propagation, thereby limiting the effects of a failure and enabling sub-second restoration. It then outlines how the Temporally-Ordered Routing Algorithm (TORA) (a Mobile Ad hoc Network (MANET) routing protocol), when suitably modified for use in fixed networks, can be used within this framework. The general approach to routing topology management within the fast restoration framework is an immediate reaction on link failure (to preserve existing flows) and gradually integration of new network elements on activation into the domain (to further assure stability). The draft first discusses the motivation for this draft and its relevance to a recent IAB report [IAB99] on routing convergence. It then outlines a framework of protocol mechanisms, and then gives a fuller description of the main mechanisms. It then describes a candidate routing algorithm that can be used within this framework. It then additionally describes the mechanisms within the framework to address efficiency and localization issues associated with the failure of routers advertising single-homed prefixes. The draft concludes with a discussion of the benefits of this fast restoration framework. 1.1 Terminology Internet Draft Fast Restore August 2000 Here are some new terms defined in this draft. RID Router ID. A unique ID used by a routing protocol to identify a router. RPLF Routing Protocol Link Failure. A RPLF packet is generated as a routing protocol's reaction to a link failure for each destination prefix affected by the failure RPDF Routing Protocol Destination Failure. RPDF is a packet generated to eliminate routing to a destination that has been partitioned from the rest of the network LFDT Link Failure Detection Timer. Failure to receive an FRLS packet with the neighbor heard flag set within the LFDT setting indicates link failure. LFUB Link Failure Upper Bound. This is the upper bounds of the time required to detect failure. FRNA Fast Restore Neighbor Advertisement. A router's FRNA packet also contains the RIDs and corresponding prefixes of the router's one-hop neighbors. FRLS Fast Restore Link Status. A FRLS packet contains a router's RID and an indication of whether or not the router has recently heard its neighbor. 2. Motivation The report from the recent IAB Network Layer Workshop [IAB99] raised several routing issues. Regarding routing convergence, it noted that "at least one backbone operator has concerns about the convergence time of internetwork-wide routing during a failover. This operator believes that current convergence times are on the order of half a minute, and possibly getting worse. Others in the routing community did not believe that the convergence times are a current issue. Some, who believe that real-time applications (e.g. telephony) require sub-second convergence, are concerned about the implications of convergence times of a half minute on such applications. [IAB99]" It is agreed that existing Internet intra-domain routing protocols (e.g. [RIP,OSPF]) have limited responsiveness in reacting to topological changes (i.e. slow convergence) and consequently have limitations on the size of domains they may support. These limitations stem directly from the use of shortest-path routing based on global topological information exchange. For any premium service with critically tight availability requirements, operators are left with the use of hot standby routers and link level restoration to improve restoration times. Both Internet Draft Fast Restore August 2000 mechanisms are however very expensive, and applicable mainly to large-scale core elements. However, these cannot be cost-effectively applied in edge networks such as IP enabled Radio Access Networks and base stations and flat enterprise networks. These will likely be composed of large numbers of small and necessarily cheap IP routers which may be interconnected by ad-hoc, non-resilient or unprotected link layers such as ethernet and various serial lines. The need for a fast restoring IGP is therefore apparent. In addition, if a cost- effective IP layer solution could be found which could be made sympathetic to traffic engineering requirements then the requirement for hot standby and link layer restoration could be reduced resulting in cost savings. The traffic engineering implications are not however considered in this draft. By relaxing the requirement for shortest path routing only slightly (with acceptable impact on forwarding efficiency), one can approximate shortest-path routing while enabling IP fast restoration and higher levels of domain scalability. This can be achieved by utilizing adaptive routing algorithms that localize their response to topological changes while ensuring loop-free paths. It is localization that yields the desired benefits of reduced restoration time and increased domain size. The proposed fast restoration approach therefore offers a potential intra-domain solution to this convergence problem in reaction to failure. It is presently unclear whether the fast restoration techniques discussed in this memo have potential applicability to the associated inter-domain routing problem. 3. Fast Restoration Framework Four general protocol mechanisms are required in the proposed fast restoration framework. The first two are mandatory in that they provide the basic fault detection and reaction mechanisms. The second two are optional and illustrate one way to enable the reaction to be tailored to whether the failure point is a link or a neighbor router. 1) A highly localized routing protocol This is required to enable the protocol to be able to react extremely quickly to failures, with minimal impact on other routers within the domain. 2) A link status assessment protocol This is required to enable a router to rapidly detect link or router failures. 3) A neighbor advertisement protocol This is used to enable a router to learn about its local two hop topology including its neighbors' neighbors and the degree of connectivity (single/multi-homed) of subnets. Internet Draft Fast Restore August 2000 4) A neighbor reachability protocol This is used to enable a router to identify whether it is the link or the neighbor router that has failed through collaboration with its two hop neighbors. 3.1 Highly Localized Routing The general approach to routing topology management within the fast restoration framework is immediate reaction on link failure (to preserve existing flows) and gradually integration of new network elements on activation into the domain (to further assure stability). The method by which the aforementioned route adjustment occurs is a function of the routing algorithm. It is essential that a routing algorithm have the following properties to operate within this fast restoration framework. 1. The protocol reaction to a link failure that does not partition the network be localized for all destination prefixes. 2. The protocol reaction to a router failure that does not partition the network should be localized for all destination prefixes. (i.e. it should be treated as a link failure with respect to these other prefixes). 3. The protocol reaction to a link failure that does partition the network be capable of detecting the partition (and erasing routes if desired). 4. The protocol reaction to a router failure that does partition the network be capable of erasing routes for these prefixes (if desired). Single-homed prefixes on the failed router will by definition create a partition. The preceding description assumes that a routing protocol reacts to a link or router failure by sending generic Routing Protocol Link Failure (RPLF). Routing Protocol Destination Failure (RPDF) packets as generated as necessary in response to partitions. 3.2 Link Status Assessment The link status assessment protocol is principally concerned with fast detection of link failures. Work within the IETF on a generalized IP level link status assessment algorithm was recently proposed [FLIP]. A similar capability is required here where a link consists of a pair of Router IDs (RIDs). After discovery of a neighbor (identified by its RID) via the neighbor discovery protocol, the objective of a link status assessment algorithm would be to monitor with high resolution (in time) the bi-directional connectivity of a link with an adjacent neighbor router. Internet Draft Fast Restore August 2000 It is desired that a link failure be detectable on millisecond time scales enabling the higher level mechanisms (to be described) to quickly react thereby preserving ongoing sessions. The protocol operates by periodically sending Fast Restore Link Status (FRLS) packets between adjacent interfaces to reconfirm link bi- directionality. The FRLS retransmission period is set to LINK_PROBE_INTERVAL time units. A FRLS packet contains a router's RID and an indication of whether or not the router has recently heard its neighbor. FRLS packets are initially sent indicating that the sending router has not heard its neighbor. Reception of a FRLS packet at a router from a neighbor causes the router to set the "neighbor heard" flag to TRUE for ROUTER_HEARD TIME (perhaps 3.5 times LINK_PROBE_INTERVAL) during which time it sends FRLS packets back towards its neighbor indicating that it has heard its neighbor. Reception of any packet from its neighbor causes the router to reset the ROUTER_HEARD_TIMER associated with this flag setting. Reception of a FRLS packet with the neighbor heard flag set to TRUE causes the router to consider the link with its neighbor as bi-directional. At that time the Link Failure Detection Timer (LFDT) is also set to ROUTER_HEARD_TIME. Reception of a subsequent FRLS packet with the neighbor heard flag set resets the LFDT. Failure to receive an FRLS packet with the neighbor heard flag set within the LFDT setting indicates link failure. Also, if the router considers the link to be bi-directional, subsequent reception of a FRLS packet without the neighbor heard flag set immediately indicates link failure (i.e. this indicates that its neighbor no longer hears the router). Thus the sum of ROUTER_HEARD_TIME plus LINK_PROBE_INTERVAL equals Link Failure Upper Bound (LFUB) and effectively upper bounds the time required to detect failure. Alternatively, the protocol may also rely on a link layer notification of link failure (e.g. loss of carrier) if available. Such a signal may provide a more accurate and timely indication of link status. 3.3 Reaction to Failure After the link status assessment protocol detects a link failure (possible router failure) at a router X with a neighbor Y, the following processing steps occur at router X: 1) Router X initially drops (or buffers) data packets for destination routes whose next hop was over the failed link with neighbor Y. This clearly includes any single-homed or multi- homed destination prefixes on the (potentially failed) neighboring router Y. 2) Router X adjusts routes for all affected destination prefixes aiming to locally route around the failed link and/or router. Let Internet Draft Fast Restore August 2000 us assume that the routing protocol's reaction to a link failure generates a Routing Protocol Link Failure (RPLF) packet for each destination prefix. Note that the RPLF activity for a particular destination will definitely fail in either of two cases. 3) The first would be if the link failure had generated a partition in the topology towards that destination which places an obvious requirement that restoration requires a sufficient level of connectivity between nodes to guarantee the availability of an alternate path. Partition detection would lead to the generation of a Routing Protocol Destination Failure (RPDF) packet that would be used to eliminate the routing for that destination throughout the domain. 4) The second more specific example occurs if the destination prefix is single-homed on a router, and that router has itself failed, rendering restoration around the failure futile. Therefore, for singly homed prefixes it may be useful to react differently to the failure of the router. This firstly requires a method to discover which routers have single-homed prefixes, a method to discover when such routers fail, and an alternative strategy to react to this type of failure. These are discussed in section 5. One such strategy though would be to hold back on pointless RPLF processing to give the router time to re-boot. This avoids the RPLF processing causing partition detection and subsequent RPDF generation, resulting in the elimination of the routing for that destination throughout the domain. In this case, the domain relies on interim ICMP unreachable messages to inform applications of the loss of connectivity to the destination. If re-boot does not occur within a fixed period then the routes to that destination can be then be reasonably removed from the domain through RPDF propagation. This is somewhat similar to the `graceful' strategy for avoiding routing flaps in BGP [BGPrestart] by suppressing route erasure on router failure in anticipation of fast reboot and graceful BGP restart. 4 The Temporally-Ordered Routing Algorithm 4.1 Overview We now map these generic packets to a specific routing protocol --- the Temporally-Ordered Routing Algorithm (TORA) [TORA, TORAdraft]. In TORA, RPLF packets map to Update (UPD) packets while RPDF packets map to Erase (ERS) packets. This latter packet type does not yet appear in [TORA, TORAdraft]. TORA is not a shortest-path routing protocol. It is a "link reversal" routing protocol that differs significantly from existing routing technology. TORA is similar to distance vector protocols (e.g. RIP) in that a separate version of the protocol runs independently for each destination prefix. Thus its storage Internet Draft Fast Restore August 2000 requirements scale with the number of destination prefixes. But its similarity ends there. In its original usage mode, TORA is an on- demand protocol that builds routes reactively in response to route requests from a MANET IP kernel. However, for usage in fixed networks it is transformed into a more traditional, proactive protocol. The key attribute of the protocol remains the same for both versions, and this is the protocol's aggressive, optimistic reaction to link failures which is critical within the MANET environment but is becoming equally important in fixed network scenario's. On link failure, when necessary, the protocol immediately reverses the directions of links in the area near the failure effectively searching for alternative routes to the destination. If such exist, the protocol typically finds them in a time and communication- efficient, single-pass reaction. Otherwise (in case of partition), the reaction detects the partition enabling route erasure if desired. We now briefly describe how TORA would function within the fast restoration framework. 4.2 Router Activation (Route Creation) When a router boots up, it establishes new links with its neighbors. The router proactively initiates a localized process of height database synchronization with each of its new neighbors. Its neighbors transmit the contents of their height databases to the new router. From this and any received routing advertisements (if any), a router can determine which of its own prefixes are single or multi-homed. It then generates an Optimization (OPT) packet for each destination prefix it wishes to advertise. If the prefix is single-homed and heights exist in the neighboring routers, the OPT packet stops at these neighbors and is not propagated further. Otherwise, the OPT packet floods fully (or partially) throughout a domain installing (or reorienting) routing entries for its destination prefix. The collection of routing entries forms a Directed Acyclic Graph (DAG). From an individual router's perspective, this means that it will have one or more feasible next hops for the destination, all of which are guaranteed to be loop-free. Loop-free multipath routing is achieved through the use of "heights". Each router maintains a height with respect to the destination, and the destination has the lowest height. A link is directed from a higher height to a lower height, and packets may only be forwarded from routers with higher heights to routers with lower heights. Thus packets flow "downhill" towards the destination. Each height is guaranteed to be unique and thus the collection of heights forms a total order. The protocol is Internet Draft Fast Restore August 2000 sometimes also referred to as the "Totally-Ordered Routing Algorithm". From a router's perspective, a neighbor that is higher is referred to as an "upstream" neighbor while a neighbor that is lower is referred to as a "downstream" neighbor. A destination router always has only upstream neighbors, and no other router is allowed to have upstream neighbors without having at least one downstream neighbor. It should be understood that all heights and associated link directions are per IP destination prefix DAG. 4.3 Link Failure A router may lose an upstream neighbor (due to a link failure) at any time without requiring protocol reaction. Similarly, it may lose a downstream neighbor provided it has other downstream neighbors, or has no downstream neighbors and no upstream neighbors, without requiring protocol reaction. However, if due to a link failure a router finds itself with at least one upstream neighbor without having any downstream neighbors (i.e. it just lost its last downstream neighbor and is a black hole), the router must adjust its height so that it has at least one downstream neighbor. The manner in which this occurs is described in [TORA,TORAdraft]. The adjustment results in transmission of an UPD (RPLF) packet to the router's neighbors advertising its new height. The effect of this height adjustment is to reverse the direction of some or all of the router's links; hence the name "link reversal" algorithm. Reception of an UPD (RPLF) packet at a neighbor may cause that neighbor to lose its last downstream link, which is cause for subsequent height modification and UPD packet transmission. Examination of the algorithm's reaction on mesh-like topologies reveals that the number of subsequent UPD (RPLF) transmissions is typically quite low; often only one UPD (RPLF) transmission is required. The resultant reaction is thus highly localized. It is also strongly de-correlated with the size of the domain. 4.4 Link Activation When a router restarts, it re-establishes links with its neighbors. The router initiates a localized process of height database synchronization with each of its new neighbors. Its neighbors transmit the contents of their height databases to the new router. By default, the router's new heights are NULL for each destination. (Note: A NULL height is higher than any non-NULL height, and packets may be forwarded from NULL to non-NULL heights. A link between two routers with NULL heights for a destination is marked as "unassigned", and packet forwarding over such links is disallowed.) The re-started router then performs its own non-NULL height selection with respect to the existing heights of its neighbors. Internet Draft Fast Restore August 2000 The algorithm for height selection is not discussed here, but it will be described in an upcoming draft detailing modification of TORA for use in fixed networks. The procedure only involves a router's one-hop neighbors. The height selection procedure occurs slowly, and a link is gradually brought back into active usage within the routing domain. 4.5 Router Failure (Route Erasure) When a router fails, then the default behavior is that each neighbor individually reacts to the router failure as a link failure, sending UPD (RPLF) packets as described in 3.3 above. This will restore paths around the failed router if sufficient physical connectivity exists. The failed router itself may have been advertising destination prefixes into the network and in the case of multi-homed subnets/prefixes the TORA reaction will result in the paths towards the failed routing being redirected towards the nearest alternate router also advertising that prefix. In the case of single-homed prefixes, the restoration attempt will clearly fail as those subnets are now partitioned from the core network. The collective UPD (RPLF) reaction of the routers connected to the failed router, followed by partition detection via the normal TORA reflection, would eventually clear the network of routing state for these single-homed destination prefixes. This is wasteful if the router is simply undergoing a re-boot and therefore additional protocol mechanisms may be warranted as highlighted in 3.3 and described in section5 below. The aim of these mechanisms in TORA would be to avoid sending UPD (RPLF) packets for the single-homed destination prefixes, wait to see if the associated router recovers in a set time, and if not, only then use ERS (RPDF) packets to efficiently clear this routing state from the network. 5. Detecting Router Failures with Single-Homed Prefixes As has previously been discussed, it may be useful to use additional protocol mechanisms to be able to distinguish router failures and associated single-homed destinations so that futile restoration processing and premature route elimination can be avoided. One such approach is outlined below, but the necessity and some of the mechanics for solving this problem may be critically dependent on the particular IGP. In this approach, a bootstrapping neighbor advertisement protocol enables a router to learn its one-hop neighbors, as well as its neighbors' one-hop neighbors, called the two-hop neighbor set. It works in combination with IPv6 router advertisements (or equivalent ICMP router messages in IPv4 (RFC1256)) to enable a router to assess its routing neighbors. Once a link failure is detected, the neighbor reachability protocol attempts to determine whether the router at Internet Draft Fast Restore August 2000 the other end of the failed link is still reachable, and whether it is advertising single-homed and/or multi-homed destination prefixes. With input from the preceding protocols, the highly localized routing protocol then reacts differently for router failures involving single-homed prefixes. 5.1 Neighbor Advertisement Protocol This protocol is principally concerned with neighbor discovery and protocol bootstrapping. As with OSPF, each router is identified by a Router ID (RID). It also is associated with at least one IP destination prefix by which it is globally reachable. Routes for this prefix are advertised out of all its interfaces and built by the routing algorithm. It may also advertise other destination prefixes. These prefixes may be single-homed (i.e. only this router advertises itself as the last hop for these prefixes) or multi-homed (i.e. they are advertised by multiple routers as being last hop routers). The neighbor advertisement protocol requires that each router periodically advertises its RID and destination prefix(es) in Fast Restore Neighbor Advertisement (FRNA) packets to its one-hop routing neighbors out interfaces on which routing is activated. FRNA packets would likely be multicast (TTL scope == 1) to a well-known multicast address. The transmission period would be relatively long (on the order of 30 seconds). A router's FRNA packet also contains the RIDs and corresponding prefixes of the router's one-hop neighbors that it has learned from their previous FRNA transmissions. In this way, by receiving its neighbors' FRNA packets, a router learns the identities of its neighbors' neighbors in a fashion very similar to the neighbor advertisement algorithm of [OLSR]. Associated with each link a router has with a routing neighbor is an instance of a link status assessment protocol (see section 2.2). For each instance a router has a Link Failure Detection Timer (LFDT) for that link. This is the time after which a router will declare a link to its neighbor down if no packets are received from its neighbor. A router also computes an Link Failure Upper Bound (LFUB) on the overall time required to detect a failure on each link, which is equal to the LFDT value plus a link polling period (to be described). A router X also advertises in its FRNA transmissions the maximum computed value over all its links of these bounds, known as the maximum LFUB (LFUBmax(X)). The neighbor advertisement protocol operates in conjunction with IPv6 Router Advertisement (RA) messages (or an equivalent in IPv4) which are sent out interfaces on which routing is not activated (e.g. over subnets with attached hosts). Router advertisement messages convey the destination prefixes being advertised by other routers on a given subnet. By listening to these advertisements, a router can learn if other routers are advertising destination Internet Draft Fast Restore August 2000 prefixes that itself is also advertising. It thus determines which of its own last-hop prefixes are single-homed or multi-homed. 5.2 Neighbor Reachability Protocol We now describe the basic operation of a neighbor reachability protocol. The protocol is assumed to run at a router X, and is triggered by link failure detection on a link to router Y. After a short time (estimated sufficient for localized route re- computation), router X executes a distributed neighbor reachability protocol to determine reachability of the single-homed destination prefixes advertised by neighbor Y as follows. The (potentially failed) neighbor Y's maximum LFUB (LFUBmax) is known from its previous FRNA transmissions. With this information a router X creates a RPLF packet as it would if this were a link failure, but delays transmission of the packet for a time T somewhat greater than its neighbor Y's LFUBmax; thus T > LFUBmax(Y). In the meantime it immediately sends a Fast Restore Link Failure (FRLF) packet to each of the (potentially failed) neighbor Y's one-hop neighbors. Reception of a FRLF packet at such a neighbor Z instructs that neighbor to mark the link between the indicated router Y and the sender X as "failed". But the receiver Z keeps the sender X in its list of the neighbor Y's one-hop neighbors. This is to ensure that Z will still send a FRLF packet to X if it also detects a link failure with Y. Z will change its notion of Y's neighbors only on reception of a FRNA message from Y itself. Subsequent reception of a FRLS packet at router Z from neighbor Y indicates to the receiver Z that the link between routers X and Y may indeed have failed, but that router Y itself is still functioning. Let us assume that router Y has indeed failed, and that its neighbor X was the first neighbor to detect this failure, thus sending FRLF packets to router Y's previous one-hop neighbors. Subsequently others of Y's previous neighbor set begin to declare link failures with Y, each scheduling delayed RPLF transmissions and immediately sending an FRLF packet to each of Y's past neighbors. Let us also assume that Y's neighbor Z is the last neighbor to detect a link failure with router Y. It also schedules a RPLF packet for transmission (suitably delayed) and sends FRLF packets to each of Y's previous neighbors. By this time (or very shortly thereafter) Z will likely to be the first of Y's past neighbors to both declare link failure with Y and have received FRLF messages from all of Y's previous neighbors (without having heard a subsequent message from Y). At that point router Z declares "router failure" of previously adjacent neighbor Y, and cancels its pending RPLF transmission for Y. The router Z then queues a Routing Protocol Destination Failure (RPDF) packet for each of neighbor Y's single-homed destination prefixes (Note that for efficiency, multiple RPDF packets may be aggregated and sent within a single aggregate RPDF packet). Internet Draft Fast Restore August 2000 Assuming T (T > LFUBmax(Y)) is set appropriately at each of Y's previous neighbors, none of Y's previous neighbors would have yet transmitted their pending RPLF packets. The router also sets a NEIGHBOR_RESTART_TIMER to some value (possibly 30 seconds) while it waits for neighbor Y to reboot. During this time packets sent to Y's single-homed prefixes will be undeliverable resulting in "host unreachable" ICMP message being returned to the sources. Assuming neighbor Y reboots within this time, the queued RPDF transmission is cancelled. Otherwise the queued RPDF packet is transmitted when the timer expires and the routes for Y's single-homed destination prefixes are erased from the domain. The objective of this timer is to suppress route erasure in the case of fast router reboot in a fashion similar to that proposed in [BGPrestart]. If the NEIGHBOR_RESTART_TIMER expires and T (T > LFUBmax(Y)) is not set appropriately at each of Y's previous neighbors, possibly others may also have queued and then issued RPDF packets prior to receiving an FRDF message originating from router Z. This is not detrimental. RPDF packets are destination-specific and source-independent. Their collective propagation results in only a single aggregated flood, regardless of the number of routers originating RPDF packets. An RPDF packet floods throughout the network immediately, efficiently clearing state information regarding the effected destinations. An RPDF packet is sent to the ALL_ROUTERS multicast address. For maximal efficiency the aforementioned protocol assumes a domain topology with the characteristic that, after any single router or link failure, the remaining domain remains well connected. Thus router failure does not partition the remaining domain permitting the failed neighbors to exchange FRLF messages. This is required so fast restoration can be achieved. Nevertheless, if the network becomes partitioned the above algorithm need not result in undo harm provided the routing protocol is capable of detecting partitions. A router/link failure, in such a case, will result in multiple link failures being detected since the failed router Y's neighbors on one side of the partitioned network will not be able to reach Y's one hop neighbor's on the other side of the partitioned domain (i.e. FRLF will fail to reach some neighbors) to reach a more definite conclusion. Consequently each such neighbor will transmit a RPLF packet for each single-homed destination prefix on the potentially failed router. Each packet will initiate a distributed computation that performs "distributed partition detection" and eventually clears the domain of routing state for its destination. This approach is less efficient than that using RPDF packets. The approach also assumes the FRLF messages are not frequently dropped. Loss of at least one FRLF message destined for each of the failed destination's neighbors would ensure that no neighbor generates an RPDF packet. If there are k such neighbors with each needing to receive all k-1 FRLF packets from the other neighbors, and the probability of packet dropping is p, then an upper bound on Internet Draft Fast Restore August 2000 the probability of this undesirable, joint packet dropping event is p^k. That is, each of the k neighbors must have missed at least one packet from some other neighbor. If control packets are given reasonable priority (suitably small p), then this will not occur very frequently (i.e. only with probability p^k). If it does occur, each of these neighbors would start to independently transmit their queued RPLF packets (which is what this router failure detection mechanism intends to avoid). This results in the aforementioned distributed partition detection process and the worst-case consequences regarding convergence in this case depend on the particular routing algorithm. 6. Conclusions The fast restoration processing occurs entirely at layer 3 and does not require special layer 2 considerations, although a layer 2 link activation/failure notification trigger may be used if available. The overall restoration approach is constrained only by the speed at which a router can create, transmit and process the localized routing updates. Increases in processing and transmission speeds will only improve restoration performance over time. It is anticipated that restoration can frequently be achieved with little or no noticeable effect on voice over IP sessions. Some degradation would likely be experienced by higher rate sessions given current hardware capabilities. The approach offers immunity to link instability (such as that triggered by malfunctioning interface cards). A link that flip- flops between UP and DOWN will effectively be removed from the forwarding paths of its adjacent routers without triggering domain- wide routing updates or routing re-convergence computations. Thus the network will not melt down due to localized link problems. The approach offers reasonable immunity to router instability as well. A router that frequently goes UP and DOWN will not create domain-wide routing messages (route building followed by erasure) if it can quickly reboot. If it cannot, then these messages still only effect its own prefixes. With regards to the remaining network prefixes, the routing protocol will treat such router instability as a set of individually unstable links, effectively removing them from the forwarding paths for these prefixes and isolating the network from harm. The net effect then is that one can deploy much larger domains with greatly reduced risk of domain melt down while enabling fast restoration of IP flows when necessary. Internet Draft Fast Restore August 2000 7. References [BGPrestart] S. Ramachandra et al., "Graceful Restart mechanism for BGP", Internet-Draft, draft-ramachandra-bgp-restart-00.txt, (work in progress), August 2000. [FLIP] H.Sandick et al., "Fast LIveness Protocol (FLIP)", Internet Draft, draft-sandiick-flip-00.txt, February 2000. [IAB99] M. Kaat, "Overview of 1999 IAB Network Layer Workshop", Internet-Draft, draft-iab-ntwlyrws-over-03.txt, July 2000. [MobileIP] C.E. Perkins, ``IP Mobility Support," Internet RFC 2002, Oct 1996. [OLSR] P. Jacquet, P. Muhlethaler and A. Qayyum, "Optimized Link State Routing Protocol", Internet-Draft, draft-ietf-manet-olsr- 01.txt, February 2000. [OSPF] J. Moy, "OSPF Version 2", Internet RFC 2328, April 1998. [RIP] G. Malkin, "RIP Version 2", Internet RFC 2453, November 1998. [TORA] V. Park, M. S. Corson, "The Temporally-Ordered Routing Algorithm", Proc. IEEE INFOCOM '97, Kobe, Japan, April 1997. [TORAdraft] V. Park, M. S. Corson, "Temporally-Ordered Routing Algorithm (TORA) Version 1 Functional Specification", Internet-Draft (work in progress), draft-ietf-manet-tora-spec-02.txt, October 1999. Author's Addresses M. Scott Corson University of Maryland Ansible Systems (+1) 301-405-6630 corson@isr.umd.edu mscorson@ix.netcom.com Alan O'Neill BT (+44) 1473-646459 alan.w.oneill@bt.com George Tsirtsis Flarion Technologies (+44) 020 88260073 g.tsirtsis@Flarion.com gtsirt@hotmail.com Internet Draft Fast Restore August 2000 Copyright Notice Placeholder for ISOC copyright.