Network Working Group                                        A. Bashandy
Internet Draft                                              B. Pithawala
Intended status: Standards Track                           Cisco Systems
Expires: January 2012                                        Jakob Heitz
                                                                Ericsson
                                                           July 10, 2011


            Scalable, Loop-Free BGP FRR using Repair Label
              draft-bashandy-idr-bgp-repair-label-02.txt

Abstract

   Consider a BGP free core scenario. Suppose the provider edge BGP
   speakers PE1, PE2,..., PEn know about a prefix P/p via the external
   routers CE1, CE2,..., CEm. If the PE router PEi loses connectivity
   to the primary path, whether it is another PE router or a CE
   router, it is desirable to immediately restore traffic by rerouting
   packets arriving at PEi and destined to the prefix P/p to one of
   the other PE routers that advertised P/p, say PEj, until BGP
   re-converges. However, if the loss of connectivity of PEi to the
   primary path also resulted in the loss of connectivity between PEj
   and CEj, rerouting a packet before the control plane converges may
   result in a loop. In this document, we propose using a repair label
   for traffic restoration while avoiding loops. We propose
   advertising the "repair" label through BGP.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s)
   controlling the copyright in such materials, this document may not
   be modified outside the IETF Standards Process, and derivative
   works of it may not be created outside the IETF Standards Process,
   except to format it for publication as an RFC or to translate it
   into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

Bashandy               Expires January 10, 2012                [Page 1]
Internet-Draft        BGP FRR using Repair Label              July 2011

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 10, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. Terminology
   2. Protocol Operation
      2.1. Control plane Operation
         2.1.1. Additional Rules for allocating and advertising a
                Repair label
      2.2. Forwarding Plane Operation
      2.3. Example
   3. How to Disseminate Repair Label Information
         3.1.1. Structure of the Repair Label Path Attribute
         3.1.2. Semantics of the Repair Label Attribute
         3.1.3. Additional Rule when Forwarding Advertisements
                Containing the Repair Path Attribute
   4. Security Considerations
   5. IANA Considerations
   6. Conclusions
   7. References
      7.1. Normative References
      7.2. Informative References
   8. Acknowledgments

1. Introduction

   In a BGP free core, where traffic is tunneled between edge routers
   and edge routers assign labels to prefixes, BGP speakers advertise
   reachability information about prefixes and associate a local label
   with each prefix, as in L3VPN [9], 6PE [10], and Softwire [8].
   Suppose that a given edge router is chosen as the best next-hop for
   a prefix P/p. An ingress router that receives a packet from an
   external router destined for the prefix P/p pushes the label
   advertised by the egress edge router and then "tunnels" the packet
   across the core to that egress router. Upon receiving the labeled
   packet from the core, the egress router uses the label on the
   packet to make the appropriate forwarding decision.

   In modern networks, it is not uncommon to have a prefix reachable
   via multiple edge routers. One example is the best external path
   [7]. Another more common and widely deployed scenario is L3VPN [9]
   with multi-homed VPN sites.
   As an example, consider the L3VPN topology depicted in Figure 1.

        +--------------------------+
        |                          |
        |      BGP free Core       |
        |                          |
        |     +------------------PE1----+
        |    /                     |     \
        |   /                      |      \
        |  /                       |       \
        | /                        |        \
        |/                         |         *
       PE3                         |          CE....... VPN prefix
        |\                         |         *            (P/p)
        | \                        |        /
        |  \                       |       /
        |   \                      |      /
        |    \                     |     /
        |     +------------------PE2----+
        |                          |
        |                          |
        +--------------------------+

           Figure 1 VPN prefix reachable via multiple PEs

   PE3 is the ingress PE. PE1 and PE2 are both egress PEs connected to
   CE. CE advertises one or more VPN prefixes, denoted by P/p. PE1 and
   PE2 advertise P/p as VPNv4 or VPNv6 routes to all ingress PEs,
   including PE3, and associate a label with each route. Suppose that
   the ingress PE, PE3, chooses PE1 as the next-hop for the prefix
   P/p. In order to minimize traffic loss, it is highly desirable for
   PE1 to reroute all traffic destined to P/p to PE2 as soon as the
   connectivity to CE is lost, without waiting for the control plane
   (whether it is IGP or BGP) to re-converge and compute the new best
   path. In doing so, PE1 pushes the label advertised by PE2 for the
   prefix P/p, and then "tunnels" the packet to PE2. However, if the
   loss of PE1-CE connectivity was due to a CE crash, then PE2 will
   also reroute the traffic back to PE1, resulting in a loop.

   Due to ultra scalability requirements, where there is a need to
   support thousands of peers and hundreds of thousands of prefixes,
   there is a need to support quick traffic restoration without
   waiting for the control plane to converge and without risking
   loops.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [1].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2. Terminology

   This section outlines the terms used in this document. For ease of
   use, we will use terms similar to those used by L3VPN [9].

   o  Protected prefix: a prefix P/p (of any AFI) that a BGP speaker
      has an external path to. The BGP speaker may learn about the
      prefix from an external peer through BGP, some other protocol,
      or manual configuration. The protected prefix is advertised to
      some or all of the internal peers.

   o  Primary egress PE: an IBGP peer that can reach the protected
      prefix P/p through an external path and advertised the prefix to
      the other IBGP peers. The primary egress PE was chosen as the
      best path by one or more internal peers. In other words, the
      primary egress PE is an egress PE that will normally be used
      when there is no failure. Referring to Figure 1, PE1 is a
      primary egress PE.

   o  CE: an external router through which an egress PE can reach a
      prefix P/p. The router "CE" in Figure 1 is an example of such a
      CE.

   o  Ingress PE: a BGP speaker that learns about a prefix through
      another IBGP peer and chooses that IBGP peer as the next-hop for
      the prefix. PE3 in Figure 1 is an example of an ingress PE.

   o  Repairing PE: the PE that attempts to restore traffic when the
      primary path is no longer reachable, without waiting for BGP to
      re-converge. The repairing PE restores the traffic by rerouting
      it (through a tunnel) towards the pre-calculated repair PE when
      it detects that the primary path is no longer reachable. The
      primary path may be a CE or another egress PE. Referring to
      Figure 1, if PE3 chooses PE1 as the primary egress PE and PE1
      decides to reroute traffic to PE2 on losing reachability with
      CE, then PE1 is a repairing PE. If PE3 chooses PE1 as a primary
      path and PE3 decides to use PE2 as a repair path when it loses
      reachability to PE1, then PE3 is a repairing PE.

   o  Primary label: the label advertised by the primary egress PE to
      be used for normal traffic forwarding.

   o  Repair egress PE: an egress PE other than the primary egress PE
      that can reach the protected prefix P/p through an external
      neighbor. The repair PE is pre-calculated via other PEs prior to
      any failure.

   o  Repair label: the label that will be pushed on the packet when
      the repairing PE reroutes the traffic (through a tunnel) towards
      the repair egress PE. Section 2 discusses how the repair label
      is used. Section 3 discusses semantics of and the method for
      disseminating repair label information.

   o  Repair path: the repair egress PE and the repair label.

   o  Internal and external: internal or external to the core.

2. Protocol Operation

   This section explains the operation of the control and forwarding
   planes of routers participating in BGP-free core traffic
   restoration.

2.1. Control plane Operation

   1. As usual, each PE allocates a local label for each prefix it can
      reach through an external neighbor CE. This is the primary label
      used for normal traffic forwarding.

   2. To provide repair path information to all PEs, the PE also
      allocates a repair label to the prefix if it can reach that
      prefix via an external neighbor. Different repair label
      allocation schemes are proposed in Section 3.

   3. The PE advertises both the primary and repair labels to all IBGP
      peers.

   4. When a PE receives the label advertisement from egress PEs, it
      calculates a primary egress PE and a repair egress PE based on
      its internal path selection criteria. Note that the method of
      choosing the repair path is beyond the scope of this document.

   5. In the end, for some of the prefixes advertised by more than one
      PE, a PE will have

      o  a primary path

      o  a repair path consisting of a repair PE and a repair label
         advertised by the chosen repair PE.

   6. A PE "never" protects a repair label.
      Hence on any PE, a repair label only has paths towards the CE.
      However, a primary label may have a repair path towards a chosen
      repair PE.

2.1.1. Additional Rules for allocating and advertising a Repair label

   o  A repair PE MUST NOT advertise a repair label for a prefix if it
      does NOT have an external path to the prefix.

   o  A repair PE MUST NOT associate an internal path with a repair
      label.

   o  Repair labels SHOULD be advertised with labeled address families
      only. That is AFI/SAFI 1/4, 2/4, 1/128, and 2/128.

2.2. Forwarding Plane Operation

   This section specifies the forwarding plane operation when a PE
   receives a packet and either of the following two conditions is
   true:

   o  The PE lost the primary path and has not yet calculated another
      primary path and programmed it in the forwarding plane. The
      primary path may be external or internal.

   o  The packet arrived from the core and the PE does not have an
      external path. Note that this should be a temporary condition
      that lasts only until all ingress PEs converge and stop sending
      traffic to that PE.

   The forwarding plane processes arriving traffic as follows:

   1. If the repairing PE is an egress PE, the packet arrives at the
      repairing PE with the primary label at the top because the
      packet is "tunneled" from the ingress PE(s). In that case, the
      repairing PE swaps the incoming label stack with the "repair
      label stack" advertised by the repair egress PE. Section 3.1.2
      specifies the details.

   2. If the repairing PE is an ingress PE, it MAY push the "repair
      label stack" advertised by the repair egress PE. Section 3.1.2
      specifies the details.

   3. The repairing PE tunnels the packet to the repair PE.

   4. At the repair PE, the packet arrives with the repair label at
      the top. The repair PE uses the incoming label stack to make
      forwarding decisions.

   5. If the repair egress PE can reach the CE, the repair PE forwards
      the packet towards the CE.

   6. If the repair PE cannot reach the CE, the traffic will be
      dropped because a PE never protects a repair label.

2.3. Example

   Consider the L3VPN [9] topology depicted in Figure 2 where two PEs
   are connected to the same CE. Assume that the core runs LDP. We
   will be using an advertised repair label.

          PE1
             \
              \
               \
                \
      LDP core   CE....... VPN prefix
                /          (10.0.0.0/8)
               /
              /
             /
          PE2

                     Figure 2 : L3VPN Example

   PE1: Repairing egress PE
   PE2: Repair PE
   Primary VPN label advertised by PE1 to all PEs: 4000
   Repair VPN label advertised by PE1 to all PEs: 5000
   Primary VPN label advertised by PE2 to all PEs: 2000
   Repair VPN label advertised by PE2 to all PEs: 3000
   LDP label for PE2 on PE1 is 1234
   LDP label for PE1 on PE2 is 4567

   Before failure
   ''''''''''''''
   PE1 has the following FIB entries:

      4000 -----> CE (unlabeled)
           -----> PE2, swap 4000 with 3000 and then push 1234
      5000 -----> CE (unlabeled)

   PE2 has the following FIB entries:

      2000 -----> CE (unlabeled)
           -----> PE1, swap 2000 with 5000 and then push 4567
      3000 -----> CE (unlabeled)

   After the CE crashes
   ''''''''''''''''''''
   PE1 has the following FIB entries:

      4000 -----> PE2, swap 4000 with 3000 and then push 1234
      5000 -----> Drop

   PE2 has the following FIB entries:

      2000 -----> PE1, swap 2000 with 5000 and then push 4567
      3000 -----> Drop

   Because of the above routing entries, any traffic arriving from the
   core at PE1 and destined for 10.0.0.0/8 is rerouted towards PE2
   using the repair VPN label 3000. PE2 will just drop it instead of
   looping it back towards PE1.
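
   The CE-crash case above can be traced with a small simulation. The
   following Python sketch is illustrative only: the FIB layout, the
   function name "forward", and the NAIVE_FIB table (which reroutes
   using the peer's primary label instead of its repair label) are
   assumptions made for this example and are not part of the proposal.
   Only the VPN label values come from the example. The LDP transport
   labels are omitted because they do not affect the loop behavior.

```python
# Hypothetical model of the Section 2.3 example. Only the VPN label values
# (4000/5000 on PE1, 2000/3000 on PE2) are taken from the example text.

# FIB entry: incoming VPN label -> ordered list of paths (first usable wins).
# ("ce",) forwards unlabeled to the CE; ("repair", pe, label) swaps the
# incoming label with `label` and tunnels the packet to `pe`.
REPAIR_FIB = {
    "PE1": {4000: [("ce",), ("repair", "PE2", 3000)],  # primary label, protected
            5000: [("ce",)]},                          # repair label, never protected
    "PE2": {2000: [("ce",), ("repair", "PE1", 5000)],
            3000: [("ce",)]},
}

# Assumed counter-example: rerouting with the peer's *primary* label.
NAIVE_FIB = {
    "PE1": {4000: [("ce",), ("repair", "PE2", 2000)]},
    "PE2": {2000: [("ce",), ("repair", "PE1", 4000)]},
}


def forward(fib, pe, label, ce_reachable, max_hops=8):
    """Trace a labeled packet; return (list of (pe, label) hops, outcome)."""
    trace = []
    for _ in range(max_hops):
        trace.append((pe, label))
        for path in fib[pe][label]:
            if path[0] == "ce" and ce_reachable[pe]:
                return trace, "delivered"
            if path[0] == "repair":
                _, pe, label = path  # swap the label and tunnel to the repair PE
                break
        else:
            return trace, "dropped"  # no usable path for this label
    return trace, "loop"             # hop budget exhausted


ce_crashed = {"PE1": False, "PE2": False}
print(forward(REPAIR_FIB, "PE1", 4000, ce_crashed))
# ([('PE1', 4000), ('PE2', 3000)], 'dropped')
print(forward(NAIVE_FIB, "PE1", 4000, ce_crashed)[1])
# loop
```

   With the repair label, the packet visits PE2 once and is dropped.
   With the assumed naive table, the packet bounces between PE1 and
   PE2 until the hop budget of the simulation is exhausted.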

   After the link between PE1 and CE fails (CE did not crash)
   ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
   PE1 has the following FIB entries:

      4000 -----> PE2, swap 4000 with 3000 and then push 1234
      5000 -----> Drop

   PE2 has the following FIB entries:

      2000 -----> CE (unlabeled)
           -----> PE1, swap 2000 with 5000 and then push 4567
      3000 -----> CE

   Because of the above routing entries, any traffic arriving from the
   core at PE1 and destined for 10.0.0.0/8 is rerouted towards PE2
   using the repair VPN label 3000. PE2 will forward the traffic
   towards CE.

3. How to Disseminate Repair Label Information

   We propose to advertise the repair label as an optional path
   attribute. Advertising the repair label as an optional path
   attribute has the following advantages:

   o  An egress PE can benefit from a scalable repair label allocation
      scheme such as per-CE repair label allocation.

   o  Allows the repairing PE to share the same repair path among
      multiple protected prefixes. Since the repair path is shared by
      all labels sharing the path attribute, the repairing PE can
      optimize its RIB and FIB by sharing the same repair path data
      structure among a large number of protected prefixes.

   o  Reduces the BGP update message size. Instead of having to send
      additional labels per prefix, multiple prefixes can share the
      same repair label.

   o  The number of labels used for traffic restoration does not
      depend on the number of protected prefixes.

   o  Allows for incremental deployment because the attribute is
      optional.

   The main disadvantage of sharing the same repair path among
   multiple primary paths is loss of fine grain control. It is not
   possible to manage, control, or provide differentiated handling to
   traffic on a per-prefix basis until the network re-converges. The
   loss of fine grain control is limited to the BGP re-convergence
   period.
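
   As an illustration of why the number of repair labels need not grow
   with the number of protected prefixes, the following Python sketch
   allocates one repair label per CE and groups prefixes by the repair
   label they share. The class RepairLabelAllocator, the function
   group_by_repair_label, the starting label value, and the sample
   prefixes are all hypothetical. The grouping also simplifies real
   update packing, which requires every other path attribute to match
   as well.

```python
from itertools import count


class RepairLabelAllocator:
    """Hypothetical per-CE allocator: one repair label per external neighbor."""

    def __init__(self, first_label=3000):
        self._next = count(first_label)  # assumed label range, for illustration
        self._by_ce = {}

    def repair_label(self, ce):
        # Allocate on first use; every later prefix behind the same CE reuses it.
        if ce not in self._by_ce:
            self._by_ce[ce] = next(self._next)
        return self._by_ce[ce]


def group_by_repair_label(routes, alloc):
    """Map each repair label to the prefixes that can share one attribute."""
    groups = {}
    for prefix, ce in routes:
        groups.setdefault(alloc.repair_label(ce), []).append(prefix)
    return groups


routes = [("10.0.0.0/8", "CE1"), ("10.1.0.0/16", "CE1"),
          ("192.168.0.0/16", "CE2")]
print(group_by_repair_label(routes, RepairLabelAllocator()))
# {3000: ['10.0.0.0/8', '10.1.0.0/16'], 3001: ['192.168.0.0/16']}
```

   Three prefixes behind two CEs consume two repair labels. Adding
   more prefixes behind CE1 or CE2 would not consume any more.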

   Note that per-CE repair label allocation has some advantages over
   per-prefix repair label allocation. First, it results in using
   fewer labels. Second, it allows for better packing in BGP messages.
   Third, it does not require special handling in the forwarding plane
   at the repair PE. Fourth, it simplifies the forwarding plane while
   maximizing the packet switching performance because the egress PE
   can make a forwarding decision with a single FIB lookup.

3.1.1. Structure of the Repair Label Path Attribute

   This document defines the repair label attribute as an optional
   non-transitive path attribute [2] as follows:

   Attribute name: REPAIR_LABEL
   Type code: TBD
   Attribute Flags:
      Optional bit: 1
      Transitive bit: 0
      Partial bit: 0
      Extended Length bit: 0
   Length of the attribute: length in octets of the attribute
   Attribute Value: The attribute value contains a stack of one or
      more labels. The encoding of the labels is identical to the
      encoding of the "label" field in [4]. The value of the bottom of
      stack (BOS) bit is determined at traffic restoration time as
      specified in Section 3.1.2.

3.1.2. Semantics of the Repair Label Attribute

   This document specifies the semantics of the repair label attribute
   when the attribute carries one repair label only. The semantics of
   more than one repair label is beyond the scope of this document.

   Suppose a BGP speaker PE1 receives an update message with a repair
   label attribute containing the label "Lr2" from the IBGP peer PE2.
   Suppose the NLRI in the MP_REACH_NLRI attribute [3] contains the
   prefixes R1, R2,..., Rn, each bound to a label L21, L22,..., L2n,
   respectively. This means the following:

   1. PE2 will never attempt to repair a packet arriving with the
      label "Lr2". Hence PE2 will either forward the packet to an
      external CE or drop the packet.

   2. PE2 expects the following from PE1:

      a. Case a: The route Ri on PE1 is bound to a local label "L1i".
         Suppose PE1 receives a packet with the label "L1i" at the top
         of the stack. If PE1 loses the primary path for a prefix Ri,
         or PE1 receives a packet from the core while not having an
         external path, and PE1 decides that PE2 is the repair PE for
         the prefix Ri, then PE1 has to swap the label "L1i" on the
         packet with the repair label "Lr2" and then tunnel the packet
         to PE2. The bottom of stack (BOS) bit MUST be copied from the
         label arriving on the packet to the label "Lr2".

      b. Case b: The route Ri on PE1 is not bound to any local label.
         If PE1 loses the primary path for a prefix Ri and PE1 decides
         that PE2 is the repair PE for the prefix Ri, then PE1 MAY
         push the label "Lr2" and then tunnel the packet to PE2. The
         bottom of stack (BOS) bit in "Lr2" MUST be set as specified
         in [5].

      c. Case c: The route Ri on PE1 is bound to an aggregate label
         (e.g. per-VRF label). In that case, PE1 has to perform more
         than one route lookup to determine the primary path.
         Eventually, there will either be an IP lookup or a label
         lookup that points to the primary path:

         i.  A label lookup points to the primary path: In that case,
             PE1 handles the packet as described in item 2.a above.

         ii. An IP lookup points to the primary path: In that case,
             PE1 handles the packet as described in item 2.b above.

3.1.3. Additional Rule when Forwarding Advertisements Containing the
       Repair Path Attribute

   As specified in Section 3.1.1, the repair label attribute is a
   non-transitive attribute. However, there may be cases, such as
   inter-AS option (b) [9], route reflectors [11], or confederation
   [12], where a router may replace the advertised next-hop with its
   own before forwarding an advertisement.

   If a BGP speaker replaces the next-hop attribute with its own and
   the advertisement contains a repair label attribute with label
   stack "Sr", there are two options:

   o  Option 1: The BGP speaker MUST NOT advertise the repair label
      attribute.

   o  Option 2: The BGP speaker MUST replace the repair label stack
      "Sr" with a locally allocated label stack "Sr1" before
      advertising the route and then advertise the stack "Sr1" in the
      repair label attribute. For the forwarding plane, the BGP
      speaker MUST install a swap forwarding entry such that if the
      BGP speaker receives a packet with the label stack "Sr1", it
      swaps "Sr1" with the stack "Sr".

   Note that whether the router advertises the repair label attribute
   depends on whether it understands the semantics of, and supports,
   the repair label attribute at the time it receives an advertisement
   containing that attribute.

4. Security Considerations

   No additional security risk is introduced by using the mechanisms
   proposed in this document.

5. IANA Considerations

   This document defines a new BGP path attribute. IANA maintains a
   list of the current BGP attribute typecodes in [6]. This document
   proposes defining a new typecode value of "TBD" for the
   REPAIR_LABEL path attribute.

6. Conclusions

   This document proposes using a repair label to allow restoring
   traffic prior to BGP convergence while avoiding loops.

7. References

7.1. Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
        4 (BGP-4)", RFC 4271, January 2006.

   [3]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
        "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

   [4]  Rosen, E. and Y. Rekhter, "Carrying Label Information in
        BGP-4", RFC 3107, May 2001.

   [5]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci,
        D., Li, T., and A. Conta, "MPLS Label Stack Encoding", RFC
        3032, January 2001.

7.2. Informative References

   [6]  BGP Parameters,
        http://www.iana.org/assignments/bgp-parameters/bgp-parameters.xhtml

   [7]  Marques, P., Fernando, R., Chen, E., and P. Mohapatra,
        "Advertisement of the best external route in BGP",
        draft-ietf-idr-best-external-02.txt, April 2004.

   [8]  Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
        Framework", RFC 5565, June 2009.

   [9]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
        Networks (VPNs)", RFC 4364, February 2006.

   [10] De Clercq, J., Ooms, D., Prevost, S., and F. Le Faucheur,
        "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
        Edge Routers (6PE)", RFC 4798, February 2007.

   [11] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An
        Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April
        2006.

   [12] Traina, P., McPherson, D., and J. Scudder, "Autonomous System
        Confederations for BGP", RFC 5065, August 2007.

8. Acknowledgments

   Special thanks to Keyur Patel, Robert Raszuk, and Eric Rosen for
   the valuable comments.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Ahmed Bashandy
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bashandy@cisco.com

   Burjiz Pithawala
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bpithaw@cisco.com

   Jakob Heitz
   Ericsson
   100 Headquarters Drive, San Jose, CA, 95134
   Email: jakob.heitz@ericsson.com