Internet Engineering Task Force                              Rohit Dube
Internet Draft                           Bell Labs, Lucent Technologies
Expiration Date: May 1999                               John G. Scudder
                                        Internet Engineering Group, LLC


                  Route Reflection Considered Harmful

              draft-dube-route-reflection-harmful-00.txt


1. Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

   To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

2. Abstract

   Route reflection as defined by [2] is a popular way of reducing the full-mesh IBGP peering required by routers running the Border Gateway Protocol [1]. There are cases where a topology built using route reflectors produces persistent forwarding loops, or produces routing decisions different from those one would expect with a full IBGP mesh. This document describes these problems.

3. Introduction

   Route reflectors are, by design, selective about which routes they forward (i.e., reflect) to their peers. Specifically, if multiple routes to the same NLRI are available, a route reflector reflects only the route it has selected for its own use. This typically reduces both the number of routes each router in the AS must store in its RIB and the volume of BGP update traffic.
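   The following sketch illustrates this selectivity. It contrasts what a single client learns for one prefix under a full IBGP mesh with what it learns behind a route reflector. It is a minimal Python sketch under several assumptions. The `Route` type, the border-router names BR1 to BR3, and the documentation prefix 192.0.2.0/24 are hypothetical. The `select_best` rule (highest LOCAL_PREF, then lowest name) is a deliberate simplification, not the full decision process of [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    egress: str      # hypothetical name of the border router that learned the route
    local_pref: int


def select_best(routes):
    """Simplified stand-in for the BGP decision process (an assumption, not
    the full RFC 1771 rules): highest LOCAL_PREF, then lowest egress name."""
    return min(routes, key=lambda r: (-r.local_pref, r.egress))


def routes_seen_by_client(candidates, reflected):
    """Routes one client receives for a single prefix.

    Full IBGP mesh: every border router advertises its route directly, so the
    client sees all candidates. Behind a route reflector: the client sees only
    the single route the reflector selected for its own use.
    """
    if not candidates:
        return []
    if reflected:
        return [select_best(candidates)]
    return list(candidates)


candidates = [
    Route("192.0.2.0/24", "BR1", 100),
    Route("192.0.2.0/24", "BR2", 100),
    Route("192.0.2.0/24", "BR3", 90),
]

print(len(routes_seen_by_client(candidates, reflected=False)))  # 3
print(len(routes_seen_by_client(candidates, reflected=True)))   # 1
print(routes_seen_by_client(candidates, reflected=True)[0].egress)  # BR1
```

   Under a full mesh the client stores all three candidate routes. Behind the reflector it stores only one. That is the saving described above, and it is also the loss of visibility that the rest of this document examines.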
   By the very nature of route reflection, not every router in the AS has a full view of all the routes to a prefix from which to choose. This, coupled with the specifics of the BGP decision process, causes the problems described below.

Dube, Scudder                                                   [Page 1]

Internet Draft                                             November 1998

4. Persistent Loops

   Consider the topology in Figure 1.

              +----------------------+
              |      +-------------+ |
              |      |             | |
      E1=====RR1=====R3=====R4=====RR2=====E2
        <---> |             |         <--->
              +-------------+

                                Figure 1
                                --------

   RR1, RR2, R3 and R4 are BGP routers in the same AS. E1 and E2 are BGP routers in some other AS, peering via EBGP with RR1 and RR2 respectively. RR1 is configured as a route reflector with R4 as a client, and RR2 is configured as a route reflector in a different cluster with R3 as a client. In the diagram above, IBGP sessions are denoted by +---+ and EBGP sessions by <--->. For simplicity, assume that all the physical links (denoted by ===) have the same IGP cost.

   Now, if E1 and E2 advertise the same prefix to RR1 and RR2 respectively, all other things being equal, RR1 picks the route through E1 for this prefix on account of its lower IGP cost. RR1 then reflects this route to its client R4, which now routes to the prefix in question through R3 and RR1. Similarly, RR2 picks the route through E2 and reflects it to its client R3, which now routes to the prefix in question through R4 and RR2. A data packet for this prefix will therefore loop between R3 and R4: R3 forwards it toward RR2 via R4, and R4 forwards it back toward RR1 via R3.

   Note that the problem would disappear if the topology were reverted to a full IBGP mesh: R3 would pick the route through RR1 and R4 would pick the route through RR2, both on account of lower IGP cost.

5. Incorrect Routing Decision

   Consider the topology in Figure 2.

            [RR1]------------------[RR2]
             /\                      |
            /  \                     |
           /    \                    |
       [R1]     [R2]                [R3]
        |        |                   |
        |        |                   |
        |        |                   |
       [E1]     [F1]                [E2]

                                Figure 2
                                --------

   RR1, RR2, R1, R2 and R3 are BGP routers in the same AS, R.
   RR1 is a route reflector with clients R1 and R2, and RR2 is a route reflector in a different cluster with client R3. E1 and E2 are BGP routers in AS E and EBGP peer with R1 and R3 respectively. F1 is a BGP router in AS F which EBGP peers with R2. Assume that E1, F1 and E2 advertise the same prefix to R1, R2 and R3 respectively, in accordance with the following table:

         Router   AS   Router-id   MED
         ---------------------------------
         E1       E    3.3.3.3      50
         F1       F    2.2.2.2      -
         E2       E    1.1.1.1     100

   All other attributes of the prefix in question are the same. Further assume that RR1's IGP cost to R1 (and E1) is the same as its IGP cost to R2 (and F1), and that RR2's IGP cost to R3 (and E2) is the same as its IGP cost to R1 (and E1) and to R2 (and F1). (The lines in Figure 2 denote both physical and BGP connectivity.)

   RR1 chooses the route through F1 over the route through E1 on account of F1's lower router-id. The route through E1 in turn wins over the route through E2 on account of its lower MED. RR2, on the other hand, chooses the route through E2 on account of its lower router-id as compared to the route through F1. Note that RR1 sends RR2 only the route through F1 and not the route through E1. As a result, RR2 never sees the one route that would have eliminated E2 on MED.

   If instead we had a full IBGP mesh, RR2 would see all three routes and pick the one through F1: the route through E1 wins over the route through E2 on MED, and the route through F1 wins over the route through E1 on account of its lower router-id.

   A network operator migrating from a full-mesh topology without reflectors to the one above with reflectors would therefore see a change in routing. Packets destined for the prefix in question would flow from RR2 through E2 instead of through F1 as before.

6. Characterization

   Problem 1 (Section 4) has two ingredients: a) the selective nature of route reflectors, which prevents some routes from reaching some clients, and b) the fact that some of the BGP decision criteria, specifically the "prefer lowest IGP cost" rule, depend on the router's location in the network. Thus the route reflector's decision cannot, in general, mirror the decision its client would have made.
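   The interaction of ingredients a) and b) can be reproduced with a small simulation of Figure 1. This is a hypothetical Python sketch, not part of the original analysis. It rests on several assumptions. All other attributes are equal, so each router simply prefers the route whose exit point is nearest in IGP cost. Every link costs 1. The route learned from E1 exits at RR1, and the route learned from E2 exits at RR2. Each reflector sees both routes, because RR1 learns E2's route from RR2 and vice versa. The route labels "via-E1" and "via-E2" and all helper names are illustrative only.

```python
from collections import deque

# Hypothetical model of Figure 1: internal routers and physical links, each IGP cost 1.
LINKS = [("RR1", "R3"), ("R3", "R4"), ("R4", "RR2")]
# Exit point for each external route (assumption: E1's route exits at RR1, E2's at RR2).
EGRESS = {"via-E1": "RR1", "via-E2": "RR2"}
# Route reflector -> its clients.
CLIENTS = {"RR1": ["R4"], "RR2": ["R3"]}

ADJ = {}
for a, b in LINKS:
    ADJ.setdefault(a, []).append(b)
    ADJ.setdefault(b, []).append(a)


def igp_paths(src):
    """Breadth-first search (all links cost 1): IGP distance and first hop
    from src to every internal router."""
    dist, first = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in ADJ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                first[v] = v if u == src else first[u]
                queue.append(v)
    return dist, first


def choose(router, offered):
    """All other attributes being equal, prefer the route with the lowest
    IGP cost to its exit point, as seen from this router's location."""
    dist, _ = igp_paths(router)
    return min(offered, key=lambda r: (dist[EGRESS[r]], r))


def selections(reflection):
    """Route selected by each internal router, with or without reflection."""
    all_routes = sorted(EGRESS)
    chosen = {rr: choose(rr, all_routes) for rr in CLIENTS}
    for rr, clients in CLIENTS.items():
        for client in clients:
            # Behind a reflector a client hears only the reflector's choice;
            # in a full mesh it hears every route.
            offered = [chosen[rr]] if reflection else all_routes
            chosen[client] = choose(client, offered)
    return chosen


def forward(src, chosen):
    """Hop-by-hop forwarding: each router sends the packet toward the exit
    point of its own selected route. Returns (path, looped)."""
    path, node = [src], src
    while node != EGRESS[chosen[node]]:
        _, first = igp_paths(node)
        node = first[EGRESS[chosen[node]]]
        if node in path:
            return path + [node], True
        path.append(node)
    return path, False


for reflection in (True, False):
    chosen = selections(reflection)
    label = "reflection" if reflection else "full mesh"
    print(label, forward("R3", chosen), forward("R4", chosen))
```

   With reflection, forwarding from R3 yields R3, R4, R3, and forwarding from R4 yields R4, R3, R4. Both are flagged as loops. With a full mesh, R3 exits via RR1 and R4 via RR2, matching the analysis in Section 4. The only difference between the two runs is which routes each client is offered.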
   Note that b) implies that the reflector topology can be out of sync with the physical topology. Problems arise only when the two diverge far enough that clients would decide differently from their reflectors if reflection were replaced by a full mesh (in this case, in decisions based on IGP cost).

   Problem 2 (Section 5) also has two components: a) the selective nature of route reflectors, as above, and b) the partial order that MEDs impose upon competing routes (MEDs can be compared only between routes from the same neighboring AS). If all decision criteria used by BGP imposed a total order on the routes (i.e., all BGP routes for a prefix could be arranged in a strict order of precedence), then b) would not be an issue and, in spite of a), this problem would not occur.

   For both problems, it is possible to construct several other topologies that exhibit them.

7. Avoidance Guidelines

   Since no protocol mechanisms are currently available to detect the problems described above, we provide guidelines for avoiding situations in which they could surface.

   As noted in Section 6, Problem 1 occurs because the IBGP reflector topology does not follow the physical topology. A simple way to avoid this problem is to ensure that reflector clusters are constrained to follow the physical connectivity between the routers. It is always safe (at least with respect to this problem) to deploy route reflection such that no IBGP session between a pair of route reflectors ever physically transits a reflector client. One common mode of deployment is to fully mesh all the routers in a "backbone" region, and to do route reflection to/from/between the routers in a POP, using one or more of the backbone routers as the reflector(s).

   Problem 2 can be avoided by making sure that reflectors are never forced to decide on the best BGP route based on MEDs.
   This can be achieved either by setting the local preference of a route at the border router to reflect its MED value, or by configuring community-based policies with which the reflector can decide on the best route.

8. Acknowledgments

   The first author would like to thank Harry Mantakos, James Da Silva and Arvind Srivaths (all at Torrent Networking Technologies Corp.), Rob Coltun (Fore Systems) and Tony Przygienda (Bell Labs, Lucent Technologies) for discussions on this topic. The second author would like to thank Ravi Chandra and Tony Bates (both at Cisco Systems) for similar discussions.

9. References

   [1] Rekhter, Y., and Li, T., "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

   [2] Bates, T., and Chandra, R., "BGP Route Reflection - An alternative to full mesh IBGP", RFC 1966, June 1996.

10. Author Information

   Rohit Dube
   Bell Labs, Lucent Technologies Inc.
   4C-508, 101 Crawfords Corner Road
   Holmdel, NJ 07724
   e-mail: rohitd@dnrc.bell-labs.com

   John G. Scudder
   Internet Engineering Group, LLC
   122 S. Main, Suite 280
   Ann Arbor, MI 48104
   e-mail: jgs@ieng.com