Internet draft T. Hardie Expires: March, 2002 October, 2001 Category: Work-in-progress draft-hardie-bounded-longest-match-01.txt Bounding Longest Match Considered Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt To view the list Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html. Copyright Copyright (C) The Internet Society 2001. All Rights Reserved. Abstract Some ASes currently use length-based filters to manage the size of the routing table they use and propagate. This draft explores an alternative to length-based filters which allows for more automatic configuration and which provides for better redundancy. Rather than use a filter, this draft proposes a method of modifying the BGP longest match algorithm by setting a bound on the prefix lengths eligible for preference. A bound would operate on long prefixes when covering route announcements are available; in certain circumstances it would cause a router to prefer an aggregate over a more specific route announcement. 1. What problem would modifying longest match address? Modifying longest match would limit the rate of growth in the routing table seen by many BGP speakers. The current rate of growth and the time to convergence represent threats to the stability to the Internet. In the short term, the IETF is considering efforts to curb these threats while new routing paradigms that attack the fundamental limitations of path vector protocols are developed and deployed. A number of the practical efforts to limit the rate of growth of the routing table have focused on filter policies, arguing that aggressive filtering will return the Internet to a state in which provider aggregates are a majority of the routes in the routing table[3]. This draft proposes an approach along those same lines, but using a bound on the longest match algorithm rather than a filter policy. The author believes that this approach can produce a similar (though not identical) effect while retaining full reachability and allowing multi-homing non-transit networks to achieve the main goals which have motivated their becoming independent ASes. 2. How might this work? Assume that router vendors have added a knob like "route length bound" that is either applied globally or to a route map; that bound would represent the CIDR block length (e.g. /23) up to which the router would use normal longest-match BGP processing. Routes beyond that length would be pre-processed to check for the presence of covering aggregates, and in certain circumstances the longer prefix would be ignored in favor of the covering aggregates. Because longest-match processing is now commonly a function of ASICs, practical deployment in the short term would require that this processing occur before the tables are handed to the ASIC; eventually, however, this could become part of forwarding engine behavior. The author presumes a particular order to application of rules, however, and careful attention by reviewers to unintended consequences of this ordering is encouraged; full optimization will, of course, be dependent on the vendor implementation. An outline of the steps envisioned is below: Step 1. Check the source of the route. If the route being advertised originates in a direct BGP neighbor: accept it for normal BGP processing. If it does not originate in a direct BGP neighbor: proceed to Step 2. Step 2. Check the route length. If the route is longer than the bound: check for a covering aggregate. If there is no covering aggregate: accept it for normal BGP processing. If there is a covering route: check to see if NO EXPORT or NO PEER has been set for the covering route. If the covering route is not exportable: accept the longer prefix for normal BGP processing. If the covering route is exportable: ignore the route longer than the bound. If the route is shorter than the bound: check the route for exportability. If it is not exportable, proceed to normal BGP processing. If it is exportable: Check for routes longer than the bound which are covered by this route. If any covered routes longer than the bound are present: check source of routes. Ignore any covered routes which do not originate in direct BGP neighbors. Step 3. Continue normal BGP processing. 2.1 What happens when we set bounds of this type? Like setting up a filter, setting a bound on longest match reduces the propagation of routes beyond a certain length. It does so in a way that retains any route which is not reachable by an aggregate; it also ensures that normal BGP behavior will allow more-specifics to propagate whenever an aggregate is withdrawn. This does break the use of longest-match for determining an optimal route when the route in question is longer than the bound. The behavior described above can already be achieved with filters, full knowledge of available aggregates, and an active network engineering staff to remove filters when aggregates are withdrawn. Moving from filter to bound may, however, encourage networks without access to those benefits to participate. 2.1.1 Example Below is a diagram of a non-transit multi-homed AS advertising a /24 derived from an upstream's /16: /16| /16| /16| /16| /24 | | | | | ----- ----- ----- /16 ----- ----- | A | | B | /-| C |-------| D | | F | ----- ----- / ----- ----- ----- \ /16| / | / / \ /24| / | / / \ | / | | / \ | | | /16 | / /16 \ | | /16 | /24 | /24 / /24 /24 \ | | /24 | | / ------ ------ |-------- | G |-------| H | | I | ------ /16 ------ ------- \ /24 | / \ | / /24 \ /24| /24 / \ | / \ | / \ | / \ | / --------------- | J | --------------- Assume that all of these networks have bounded longest match at /23, invoking special processing for /24s and longer prefixes. J is a multi-homed network with its own AS and three BGP neighbors, G, H, and I. It uses a /24 from G's /16, and announces it to all three neighbors. G, H, and I, as direct BGP neighbors, will process that /24 according to normal BGP longest match rules. F hears the /24 from I and hears no other route covering it, so F retains the /24. A, B, C are BGP neighbors of G and hear both the /24 and its covering /16 from G; they propagate only the /16. D hears the /24 from H and the /16 from C; it uses and propagates only the /16. In this example, all of the next-hop ASes to J use the route it announces, but four out of the five ASes one hop further from J use and propagate only an aggregate, causing the longer prefix to fade from the routing system very quickly but without loss of reachability. Note that C and D's return path to J have likely changed because of the bound. D now routes traffic to J via C,G,J rather than I,J and C now uses G,J rather than {G|H},J. If G should become unavailable, the /16 it announces will disappear and C, D, and F will revert to the /24 they hear through H and I. 2.2 What are the costs of using bounds of this type? 2.2.1 Router processing The preprocessing steps described in 2 clearly add to the work which needs to be done during overall BGP processing. Because a check needs to be done for both covered and covering routes, some part of this work is required for routes of lengths on either side of the bound. Should this become common, however, the rate of growth in the number of routes should be smaller and a balance should be struck between the extra processing per route and the number of routes. 2.2.2 Traffic engineering The implementation of a bound risks magnifying or removing the effect of certain widely deployed traffic engineering methods. If, for example, an AS chose to prepend its own route to an announcement in order to alter the preference for that route, a BGP neighbor using a bounded longest match might now see that route as eligible for discard in favor of an aggregate. While it is fairly easy to code around that particular problem, to avoid this class of problems it might be preferable to allow pre-processing Rule 1. to consider AS Sets as BGP neighbors. The AS Set of a prepending AS is then just a special case of a general handling mechanism for cases in which that information should be retained into normal BGP processing. This, of course, further increases the router processing required. 2.2.3 Propagation delay and increased convergence time. If the route to the AS providing the route to the aggregate should be lost, the more-specific must propagate into the ASes which had formerly heard only the aggregate. This increases convergence time and may create situations in which reachability is temporarily compromised. Unlike the filter case, however, normal BGP behavior should restore reachability without changes to the router configuration. There is a also a risk that during a pathological event the increased processing required by this change will degrade propagation times during those events. This depends on both the speed of specific implementations and the character of the topology. 2.2.4 Routing Loops In cases where neighbor ASes do not apply the same bounds, short-lived routing loops could occur. As an example, imagine an AS which has applies a bound at /23 and has two routers, rtrA and rtrB, each with an eBGP peer in a different AS. Both routers learn a /24. rtrA's is the best path; and rtrB propagates that path to its eBGP peer. If rtrB learns a covering /16 from its eBGP peer the bound will cause that /16 to be the best path, so it will point traffic to the /24 to its peer. If that peer is not using a similar bound, however, it will point traffic to the /24 through rtrB to the route learned at rtrA. This is transient, but it does cause a loop. 2.3 What are the advantages of using bounds of this type? As noted above, using a bound on the longest match algorithm limits the number of routes used and propagated by an AS without risking a loss in reachability. Many of the same effects can be achieved with active management of length-based filters, but this method should allow a single knob to achieve the same thing as many lines of router configuration. It also avoids the risk of long lived black holes inherent in the application of length-based filters where no covering aggregate is announced or when that aggregate is withdrawn by a loss in reachability. 2.4 Who gets the benefits? The AS which ignores the route longer than the bound gets the benefit for all routers past the router which undertakes the processing to apply the bound. The ASes which are BGP neighbors of that AS derive the same benefit, however, without incurring the extra processing. 2.5 What does all this mean for multi-homing? Multi-homed non-transit ASes may maintain multiple BGP relationships to ensure redundancy in the presence of provider network or business failure, to manage performance to key traffic exchange partners, and to reduce expenditures on transit through peering. The deployment of bounded longest match as described here should not affect redundancy. In the case of failure of the AS from whom the address space is initially derived, the propagation delay of the longer match past the next-hop BGP neighbor may be subject to the problems discussed in 2.2.3. Performance to key traffic exchange partners who are direct BGP neighbors is not affected at all by the deployment of the described algorithm for bounded longest match. For those who are not direct BGP neighbors, the ability to manage the outbound path is not changed, but the return path may be changed. This is, in any case, notoriously difficult to control. The deployment of bounded longest match should not affect the traffic exchange between peers, though the ability of such a system to recognize AS Sets may be required where peers are defined by an AS Sets rather than an single AS. 3. Security Considerations This document presumes that the implementation of bounded longest match is a knob inside a router config. Since the use of the knob affects route announcements not originating within the router's AS or its direct neighbors, the new behavior may result in surprises to the announcing AS. It is possible that this behavior might be considered a denial of service or mistaken for a denial of service by systems designed to detect black-holing on behalf of the origin AS. 4. Full copyright statement Copyright (C) The Internet Society 1999. All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 5. Acknowledgements Cengiz Alaentinoglu, Alvaro Retana, Russ White, and Barry Greene gave valuable comments on the -00.txt version of this draft. A number of colleagues also gave the author valuable comments on the white board markings that gave rise to this paper; among them are Lane Patterson, Ian Cooper, Gerd Besch, Bill Norton, Diarmuid Flynn, and Sean Donelan. 6. References [1] Huston, Geoff. http://www.telstra.net/ops/bgp/index.html [2] Ahuja, Abha. http://www.merit.edu/~ahuja/ptomaine-bof/ahuja-ietf-ptomaine/index.htm [3] Bush, Randy. Plenary, IETF 51. Eventually at: http://www.ietf.org/proceedings/01aug/ 7. Author's address Ted Hardie hardie@oakthorn.com