Internet draft T. Hardie Expires: April, 2002 R. White November, 2001 Category: Work-in-progress draft-hardie-bounded-longest-match-02.txt Bounding Longest Match Considered Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt To view the list Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html. Copyright Copyright (C) The Internet Society 2001. All Rights Reserved. Abstract Some ASes currently use length-based filters to manage the size of the routing table they use and propagate. This draft explores an alternative to length-based filters which allows for more automatic configuration and which provides for better redundancy. Rather than use a filter, this draft proposes a method of modifying the BGP longest match algorithm by setting a bound on the prefix lengths eligible for preference. A bound would operate on long prefixes when covering route announcements are available; in certain circumstances it would cause a router to prefer an aggregate over a more specific route announcement. 1. What problem would modifying longest match address? Modifying longest match would limit the rate of growth in the routing table seen by many BGP speakers. The current rate of growth and the time to convergence represent threats to the stability to the Internet. In the short term, the IETF is considering efforts to curb these threats while new routing paradigms that attack the fundamental limitations of path vector protocols are developed and deployed. A number of the practical efforts to limit the rate of growth of the routing table have focused on filter policies, arguing that aggressive filtering will return the Internet to a state in which provider aggregates are a majority of the routes in the routing table[3]. This draft proposes an approach along those same lines, but using a bound on the longest match algorithm rather than a filter policy. The authors believe that this approach can produce a similar (though not identical) effect while retaining full reachability and allowing multi-homing non-transit networks to achieve the main goals which have motivated their becoming independent ASes. 2. How might this work? As each prefix is received by a BGP speaker from an external peer, it would be evaluated in the light of other prefies already received. If two prefixes overlap in space (such as 192.168.0.0/16 and 192.168.1.0/24), the longer prefix would be marked (for example with a new, as yet unspecified PATH_TYPE) in order to indicate it should not be used for forwarding traffic, and propogated to its internal peers. Any peer receiving a prefix with the "bounding" marker would then compare this path to other paths towards the same prefix and prefix length. If the router is not advertising a bounded path for this prefix as well, it will simply filter (or block) advertising of this bounded prefix from its Local-RIB to other BGP speakers within the autonomous system (internal speakers). If the router is advertising a bounded path for this prefix as well, it will compare the router ID of the advertising router to its router ID; if the loacl router has the lower router ID, it will continue to advertise the prefix marked as bounded. Otherwise, it will stop advertising this prefix altogether. Rather than using the standard BGP prefix comparison (decision process), the path originated by the BGP speaker with the lowest router ID will be preferred, so only one BGP speaker will advertise a bounded path into an autonomous system. Since the bounded path is not to be inserted into the Local-RIB, it will not be propogated to BGP speakers outside the autonoumous system. 2.1 Examples of Bounding the Longer Prefix 2.1.1 Assume the following configuration of autonomous systems: ( ) /-------( AS2 )--------\ ( ) / ( ) \ ( ) ( ) ( AS1 ) ( AS4 )-----( AS5 ) ( ) \ ( ) / ( ) ( ) \-------( AS3 )--------/ ( ) AS1 is advertising 192.168.1.0/24 to both AS2 and AS3. AS2 is advertising both 192.168.1.0/24 and 192.168.0.0/16 into AS4 AS3 is advertising 192.168.1.0/24 into AS3 Each connection (session) is handled by a seperate router within each AS (AS4 peers with AS2 on a seperate router than AS3 does) When the peering router in AS4 between AS4 and AS2 receives both the 192.168.1.0/24 and the 192.168.0.0/16 prefixes, it will mark the 192.168.1.0/24 as bounded and will then propogate this through AS4. When the router in AS4 between AS4 and AS3 receives this bounded advertisement, it will stop advertising 192.168.1.0/24 into AS4, leaving only the shorter prefix (the aggregate) for forwarding purposes within the AS. If the link between AS1 and AS2 fails, the longer length prefix will be withdrawn from AS2, and thus the peering point between AS2 and AS4 will no longer have an overlapping set of prefixes. Thus, within AS4, the border router which peers with AS2 will cease advertising the 192.168.1.0/24 prefix which is marked as bounded. When the border router peering with AS3 receives this withdraw, it will note it no longer has any restriction on advertising this longer prefix, and will begin propagating it. 2.1.2 Assume this configuration of autonomous systems: ( ) /-------( AS2 )--------\ ( ) / ( ) \ ( ) ( ) ( AS1 ) ( AS4 )-----( AS5 ) ( ) \ ( ) /--( AS6 )-----/ ( ) ( ) \-( AS3 ) / ( ) \--( AS7 )---/ And we have the following assumptions: AS1 is advertising 192.168.1.0/24 to both AS2 and AS3. AS2 is advertising both 192.168.1.0/24 and 192.168.0.0/16 into AS4 AS3 is advertising 192.168.1.0/24 into AS6 and AS7 AS6 and AS7 are advertising 192.168.1.0/24 into AS4 Each connection (session) is handled by a seperate router within each AS (AS4 peers with AS2 on a seperate router than AS3 does) The operation is much the same as 2.1.1, except that the border routers within AS4 peering with AS6 and AS7 now both block the longer prefix from being advertised into AS4. AS5 only receives the single aggregate originally advertised by AS2. 2.2 Advantages to the Service Provider AS4, in each of the situations, reduces the number of prefixes carred through the autonomous system by the number of longer prefixes that overlap with aggregates of those prefixes. While one copy of the prefix continues to be carried through the autonomous system, this entry is not placed in the forwarding table, nor is it propogated outside the autonomous system. AS5 receives one prefix instead of two (or possibly more). 2.3 Advantages to the Customer In this case, the customer is respresented as AS1. The customer will continue to receive some amount of traffic over both peering sessions, and dual homing through two Service Providers is still effective. If the customer's primary link fails, the alternate link through AS3 will take over receving all inbound traffic automatically. With most other schemes presented to this point, the customer loses all impact of dual-homing into the Internet, unless both connections are through one Service Provider. 2.4 Advantages to the Internet Beyond the second AS hop, aggregation is preserved in all cases. While this would not reduce the backbone routing table by the dramatic amounts that other methods might, the advantages to the community are great, and at greatly reduced risk to customers. 2.5 What are the costs of using bounds of this type? 2.5.1 Router processing This proposal clearly adds to the work which needs to be done during overall BGP processing. Because a check needs to be done for both covered and covering routes, some part of this work is required for routes of lengths on either side of the bound. Should this become common, however, the rate of growth in the number of routes should be smaller and a balance should be struck between the extra processing per route and the number of routes. 2.5.2 Traffic engineering The implementation of a bound risks magnifying or removing the effect of certain widely deployed traffic engineering methods. If, for example, an AS chose to prepend its own route to an announcement in order to alter the preference for that route, a BGP neighbor using a bounded longest match might now see that route as eligible for discard in favor of an aggregate. While it is fairly easy to code around that particular problem, to avoid this class of problems it might be preferable to allow this to apply to specific AS Sets as well as to all BGP neighbors. 2.5.3 Propagation delay and increased convergence time. If the route to the AS providing the route to the aggregate should be lost, the more-specific must propagate into the ASes which had formerly heard only the aggregate. This increases convergence time and may create situations in which reachability is temporarily compromised. Unlike the filter case, however, normal BGP behavior should restore reachability without changes to the router configuration. There is a also a risk that during a pathological event the increased processing required by this change will degrade propagation times during those events. This depends on both the speed of specific implementations and the character of the topology. 2.5.4 Routing Loops In cases where neighbor ASes do not apply the same bounds, short-lived routing loops could occur. As an example, imagine an AS which has applies a bound at /23 and has two routers, rtrA and rtrB, each with an eBGP peer in a different AS. Both routers learn a /24. rtrA's is the best path; and rtrB propagates that path to its eBGP peer. If rtrB learns a covering /16 from its eBGP peer the bound will cause that /16 to be the best path, so it will point traffic to the /24 to its peer. If that peer is not using a similar bound, however, it will point traffic to the /24 through rtrB to the route learned at rtrA. This is transient, but it does cause a loop. 2.6 What are the advantages of using bounds of this type? As noted above, using a bound on the longest match algorithm limits the number of routes used and propagated by an AS without risking a loss in reachability. Many of the same effects can be achieved with active management of length-based filters, but this method should allow a single knob to achieve the same thing as many lines of router configuration. It also avoids the risk of long lived black holes inherent in the application of length-based filters where no covering aggregate is announced or when that aggregate is withdrawn by a loss in reachability. 2.7 Who gets the benefits? The AS which ignores the route longer than the bound gets the benefit for all routers past the router which undertakes the processing to apply the bound. The ASes which are BGP neighbors of that AS derive the same benefit, however, without incurring the extra processing. 2.8 What does all this mean for multi-homing? Multi-homed non-transit ASes may maintain multiple BGP relationships to ensure redundancy in the presence of provider network or business failure, to manage performance to key traffic exchange partners, and to reduce expenditures on transit through peering. The deployment of bounded longest match as described here should not affect redundancy. In the case of failure of the AS from whom the address space is initially derived, the propagation delay of the longer match past the next-hop BGP neighbor may be subject to the problems discussed in 2.5.3. Performance to key traffic exchange partners who are direct BGP neighbors is not affected at all by the deployment of the described algorithm for bounded longest match. For those who are not direct BGP neighbors, the ability to manage the outbound path is not changed, but the return path may be changed. This is, in any case, notoriously difficult to control. The deployment of bounded longest match should not affect the traffic exchange between peers, though the ability of such a system to recognize AS Sets may be required where peers are defined by an AS Sets rather than an single AS. 3. Security Considerations This document presumes that the implementation of bounded longest match is a knob inside a router config. Since the use of the knob affects route announcements not originating within the router's AS or its direct neighbors, the new behavior may result in surprises to the announcing AS. It is possible that this behavior might be considered a denial of service or mistaken for a denial of service by systems designed to detect black-holing on behalf of the origin AS. 4. Full copyright statement Copyright (C) The Internet Society 1999. All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 5. Acknowledgements Cengiz Alaentinoglu, Alvaro Retana, Russ White, and Barry Greene gave valuable comments on the -00.txt version of this draft. A number of colleagues also gave the author valuable comments on the white board markings that gave rise to this paper; among them are Lane Patterson, Ian Cooper, Gerd Besch, Bill Norton, Diarmuid Flynn, and Sean Donelan. 6. References [1] Huston, Geoff. http://www.telstra.net/ops/bgp/index.html [2] Ahuja, Abha. http://www.merit.edu/~ahuja/ptomaine-bof/ahuja-ietf-ptomaine/index.htm [3] Bush, Randy. Plenary, IETF 51. Eventually at: http://www.ietf.org/proceedings/01aug/ 7. Authors' addresses Ted Hardie Ted.Hardie@nominum.com Russ White ruwhite@cisco.com