Internet draft T. Hardie Expires: February, 2002 Equinix, Inc Category: Work-in-progress September, 2001 draft-hardie-bounded-longest-match-00.txt Bounding Longest Match Considered Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt To view the list Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html. Copyright Notice Copyright (C) The Internet Society 2001. All Rights Reserved. Abstract This is a very drafty draft; so much so that the author isn't even sure what adjective should follow "considered" in the title. It attempts to explore what would happen to the routing system if network engineers were able to modify the BGP longest match algorithm by setting a bound on the prefix lengths eligible for preference. A bound would operate on long prefixes when covering route announcements are available; in certain circumstances it would cause a router to prefer an aggregate over a more specific route announcement. 1. Why modify longest match? Available evidence indicates both that the rate of growth in the number of prefixes in the routing table has increased and that the time to convergence has grown [1][2]. These trends represent a threat to the future stability of the Internet. In the short term, the IETF is considering efforts to curb these threats while new routing paradigms that attack the fundamental limitations of path vector protocols are developed and deployed. A number of the practical efforts to limit the rate of growth of the routing table have focused on filter policies, arguing that aggressive filtering will return the Internet to a state in which provider aggregates are a majority of the routes in the routing table[3]. This draft proposes an approach along those same lines, but using a bound on the longest match algorithm rather than a filter policy. The author believes that this approach can produce a similar (though not identical) effect while retaining full reachability and allowing multi-homing non-transit networks to achieve the main goals which have motivated their becoming independent ASes. 2. How might this work? Assume that router vendors have added a knob "route length bound" that is either applied globally or to a route map; that bound would represent the CIDR block length (e.g. /23) up to which the router would use normal longest-match BGP processing. Routes beyond that length would be pre-processed to check for the presence of covering aggregates, and in certain circumstances the longer prefix would be discarded in favor of the covering aggregates. Although described as pre-processing, the steps outlined below would become part of the BGP processing for the router. The author presumes a particular order to application of rules, however, and careful attention by reviewers to unintended consequences of this ordering is encouraged; full optimization will, of course, be dependent on the vendor implementation. Step 1. Check the origin of the route. If the route being advertised originates in a direct BGP neighbor: accept it for normal BGP processing. If it does not originate in a direct BGP neighbor: proceed to Step 2. Step 2. Check the route length. If the route is longer than the bound: check for a covering aggregate. If there is no covering aggregate: accept it for normal BGP processing. If there is a covering route: check the exportability of the covering route. If the covering route is not exportable: accept the longer prefix for normal BGP processing. If the covering route is exportable: delete the route longer than the bound. If the route is shorter than the bound: check the route for exportability. If it is not exportable, proceed to normal BGP processing. If it is exportable: Check for routes longer than the bound which are covered by this route. If any covered routes longer than the bound are present: check origin of routes. Delete any covered routes which do not originate in direct BGP neighbors. Step 3. Continue normal BGP processing. 2.1 What happens when we set bounds of this type? Like setting up a filter, setting a bound on longest match reduces the propagation of routes beyond a certain length. It does so in a way that retains any route which is not reachable by an aggregate; it also ensures that normal BGP behavior will allow more-specifics to propagate whenever an aggregate is withdrawn. The behavior described above can already be achieved with filters, full knowledge of available aggregates, and an active network engineering staff to remove filters when aggregates are withdrawn. Moving from filter to bound may, however, encourage networks without access to those benefits to participate. 2.1.1 What would bad ascii art on this look like? /16| /16| /16| /16| /24 | | | | | ----- ----- ----- /16 ----- ----- | A | | B | /-| C |-------| D | | F | ----- ----- / ----- ----- ----- \ /16| / | / / \ /24| / | / / \ | / | | / \ | | | /16 | / /16 \ | | /16 | /24 | /24 / /24 /24 \ | | /24 | | / ------ ------ |-------- | G |-------| H | | I | ------ /16 ------ ------- \ /24 | / \ | / /24 \ /24| /24 / \ | / \ | / \ | / \ | / --------------- | J | --------------- 2.1.2 What does the bad ascii art mean? Assume that all of these networks have bounded longest match at /23, invoking special processing for /24s and longer prefixes. J is a multi-homed network with its own AS and three BGP neighbors, G, H, and I. It uses a /24 from G's /16, and announces it to all three neighbors. G, H, and I, as direct BGP neighbors, will process that /24 according to normal BGP longest match rules. F hears the /24 from I and hears no other route covering it, so F retains the /24. A, B, C are BGP neighbors of G and hear both the /24 and its covering /16 from G; they propagate only the /16. D hears the /24 from H and the /16 from C; it uses and propagates only the /16. In this example, all of the next-hop ASes to J use the route it announces, but four out of the five ASes one hop further from J use and propagate only an aggregate, causing the longer prefix to fade from the routing system very quickly but without loss of reachability. Note that C and D's return path to J have likely changed because of the bound. D now routes traffic to J via C,G,J rather than I,J and C now uses G,J rather than {G|H},J. If G should become unavailable, the /16 it announces will disappear and C, D, and F will revert to the /24 they hear through H and I. 2.2 What are the costs of using bounds of this type? 2.2.1 Router processing The preprocessing steps described in 2 clearly add to the work which needs to be done during overall BGP processing. Because a check needs to be done for both covered and covering routes, some part of this work is required for routes of lengths on either side of the bound. Should this become common, however, the rate of growth in the number of routes should be smaller and a balance should be struck between the extra processing per route and the number of routes. 2.2.2 Traffic engineering The implementation of a bound risks magnifying or removing the effect of certain widely deployed traffic engineering methods. If, for example, an AS chose to prepend its own route to an announcement in order to alter the preference for that route, a BGP neighbor using a bounded longest match might now see that route as eligible for discard in favor of an aggregate. While it is fairly easy to code around that particular problem, to avoid this class of problems it might be preferable to allow pre-processing Rule 1. to consider AS Macros as BGP neighbors. The AS Macro of a prepending AS is then just a special case of a general handling mechanism for cases in which that information should be retained into normal BGP processing. This, of course, further increases the router processing required. 2.2.3 Propagation delay. On the basis of absolutely no empirical evidence, the author believes that the fading of more specifics from route announcements will improve propagation times in general. There is a risk, however, that during a pathological event the increased processing required by this change will degrade propagation times during those events. This depends on both the speed of specific implementation and the character of the topology. 2.3 What are the advantages of using bounds of this type? As noted above, using a bound on the longest match algorithm limits the number of routes used and propagated by an AS without risking a loss in reachability. Many of the same effects can be achieved with active management of length-based filters, but this method should allow a single knob to achieve the same thing as many lines of router configuration. 2.4 Who gets the benefits? The AS which discards the route longer than the bound gets the initial benefit. The ASes which are BGP neighbors of that AS derive the same benefit, however, without incurring the extra processing. 2.5 What does all this mean for multi-homing? Multi-homed non-transit ASes may maintain multiple BGP relationships to ensure redundancy in the presence of provider network or business failure, to manage performance to key traffic exchange partners, and to reduce expenditures on transit through peering. The deployment of bounded longest match as described here should not affect redundancy. In the case of failure of the AS from whom the address space is initially derived, the propagation delay of the longer match past the next-hop BGP neighbor may be subject to the problems discussed in 2.2.3. Performance to key traffic exchange partners who are direct BGP neighbors is not affected at all by the deployment of bounded longest match. For those who are not direct BGP neighbors, the ability to manage the outbound path is not changed, but the return path may be changed. This is, in any case, notoriously difficult to control. The deployment of bounded longest match should not affect the traffic exchange between peers, though the ability of such a system to recognize AS Macros may be required where peers are defined by an AS Macro rather than an single AS. 3. Security Considerations This document presumes that the implementation of bounded longest match is a knob inside a router config. Since the use of the knob affects route announcements not originating within the router's AS or its direct neighbors, the new behavior may result in surprises to the announcing AS. It is possible that this behavior might be considered a denial of service or mistaken for a denial of service by systems designed to detect black-holing on behalf of the origin AS. 4. Full copyright statement Copyright (C) The Internet Society 1999. All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 5. Acknowledgements A number of colleagues at Equinix have watched the white board scribblings that gave rise to this paper; among them are Lane Patterson, Ian Cooper, Gerd Besch, Bill Norton, Diarmuid Flynn, and Sean Donelan. 6. References [1] Huston, Geoff. http://www.telstra.net/ops/bgp/index.html [2] Ahuja, Abha. http://www.merit.edu/~ahuja/ptomaine-bof/ahuja-ietf-ptomaine/index.htm [3] Bush, Randy. Plenary, IETF 51. Eventually at: http://www.ietf.org/proceedings/01aug/ 7. Author's address Ted Hardie Equinix, Inc. 2450 Bayshore Parkway Mountain View, CA 94043-1107 hardie@equinix.com Tel: 1.650.316.6226 Fax: 1.650.315.6903