Network Working Group                                           K. Patel
Internet-Draft                                                 A. Lindem
Intended status: Standards Track                                  A. Roy
Expires: January 9, 2017                                        D. Yeung
                                                            V. Venugopal
                                                           Cisco Systems
                                                            July 8, 2016


           Shortest Path Routing Extensions for BGP Protocol
                   draft-keyupate-idr-bgp-spf-00.txt

Abstract

   Many Massively Scaled Data Centers (MSDCs) have converged on
   simplified layer 3 routing.  Furthermore, requirements for
   operational simplicity have led many of these MSDCs to converge on
   BGP as their single routing protocol for both their fabric routing
   and their Data Center Interconnect (DCI) routing.  This document
   describes a solution which leverages BGP Link-State distribution and
   the Shortest Path First (SPF) algorithm, similar to Interior Gateway
   Protocols (IGPs) such as OSPF.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Patel, et al.            Expires January 9, 2017                [Page 1]
Internet-Draft         BGP Protocol SPF Extensions             July 2016

   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  BGP Peering Models
     2.1.  BGP Single-Hop Peering on Network Node Connections
     2.2.  BGP Peering Between Directly Connected Network Nodes
     2.3.  BGP Peering in Route-Reflector or Controller Topology
   3.  Extensions to BGP-LS
     3.1.  Node NLRI Usage and Modifications
     3.2.  Link NLRI Usage
     3.3.  Prefix NLRI Usage
   4.  Shortest Path Routing (SPF) Capability
   5.  Decision Process with SPF Algorithm
     5.1.  Impact on BGP Tie-breaking attributes
     5.2.  Dual Stack Support
     5.3.  NEXT_HOP Manipulation
     5.4.  Error Handling
   6.  IANA Considerations
   7.  Security Considerations
     7.1.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Many Massively Scaled Data Centers (MSDCs) have converged on
   simplified layer 3 routing.  Furthermore, requirements for
   operational simplicity have led many of these MSDCs to converge on
   BGP [RFC4271] as their single routing protocol for both their fabric
   routing and their Data Center Interconnect (DCI) routing.
   Requirements and procedures for using BGP are described in
   [I-D.ietf-rtgwg-bgp-routing-large-dc].  This document describes an
   alternative solution which leverages BGP-LS [RFC7752] and the
   Shortest Path First (SPF) algorithm, similar to Interior Gateway
   Protocols (IGPs) such as OSPF [RFC2328].

   [RFC4271] defines the Decision Process that is used to select routes
   for subsequent advertisement by applying the policies in the local
   Policy Information Base (PIB) to the routes stored in its
   Adj-RIBs-In.  The output of the Decision Process is the set of
   routes that are announced by a BGP speaker to its peers.  These
   selected routes are stored by a BGP speaker in the speaker's
   Adj-RIBs-Out according to policy.

   [RFC7752] describes a mechanism by which link-state and TE
   information can be collected from networks and shared with external
   components using BGP.  This is achieved by defining an NLRI carried
   within BGP-LS AFIs and BGP-LS SAFIs.  The BGP-LS extensions defined
   in [RFC7752] make use of the Decision Process defined in [RFC4271].

   This draft modifies [RFC7752] by replacing its use of the existing
   Decision Process; in particular, the Phase 1 and 2 decision
   functions of the Decision Process are replaced with the Shortest
   Path First (SPF) algorithm, also known as the Dijkstra algorithm.
   The Phase 3 decision function is also simplified since it is no
   longer dependent on the previous phases.

   This solution offers the benefits of both BGP and SPF-based IGPs.
   These include TCP-based flow control, no periodic link-state
   refresh, and completely incremental NLRI advertisement.  These
   advantages can reduce the overhead in MSDCs where there is a high
   degree of Equal Cost Multi-Path (ECMP) and the topology is very
   stable.  Additionally, using an SPF-based computation can support
   fast convergence and the computation of Loop-Free Alternates (LFAs)
   [RFC5286] in the event of link failures.  Furthermore, a BGP-based
   solution lends itself to multiple peering models including those
   incorporating route-reflectors [RFC4456] or controllers.

   Support for Multiple Topology Routing (MTR) as described in
   [RFC4915] is an area for further study dependent on deployment
   requirements.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  BGP Peering Models

   Depending on the requirements, scaling, and capabilities of the BGP
   speakers, various peering models are supported.  The only
   requirement is that all BGP speakers in the BGP SPF routing domain
   receive link-state NLRI on a timely basis, run an SPF calculation,
   and update their data plane appropriately.  The content of the Link
   NLRI is described in Section 3.2.

2.1.  BGP Single-Hop Peering on Network Node Connections

   The simplest peering model is the one described in section 5.2.1 of
   [I-D.ietf-rtgwg-bgp-routing-large-dc].  In this model, EBGP
   single-hop sessions are established over direct point-to-point links
   interconnecting the network nodes.  For the purposes of BGP SPF,
   Link NLRI is only advertised if a single-hop BGP session has been
   established, the Link-State address family capability has been
   exchanged, and the SPF capability has been exchanged on the
   corresponding session.  If the session goes down, the NLRI will be
   withdrawn.

2.2.  BGP Peering Between Directly Connected Network Nodes

   In this model, BGP speakers peer with all directly connected network
   nodes, but the sessions may be multi-hop, and the direct connection
   discovery and liveness detection for those connections are
   independent of the BGP protocol.  How this is accomplished is
   outside the scope of this document.  Consequently, there will be a
   single session even if there are multiple direct connections between
   BGP speakers.  For the purposes of BGP SPF, Link NLRI is advertised
   as long as a BGP session has been established, the Link-State
   address family capability has been exchanged, the SPF capability has
   been exchanged, and the corresponding link is up and considered
   operational.

2.3.  BGP Peering in Route-Reflector or Controller Topology

   In this model, BGP speakers peer solely with one or more Route
   Reflectors [RFC4456] or controllers.  As in the previous model,
   direct connection discovery and liveness detection for those
   connections are done outside the BGP protocol.  For the purposes of
   BGP SPF, Link NLRI is advertised as long as the corresponding link
   is up and considered operational.

3.  Extensions to BGP-LS

   [RFC7752] describes a mechanism by which link-state and TE
   information can be collected from networks and shared with external
   components using the BGP protocol.  It contains two parts:
   definition of a new BGP NLRI that describes links, nodes, and
   prefixes comprising IGP link-state information, and definition of a
   new BGP path attribute (BGP-LS attribute) that carries link, node,
   and prefix properties and attributes, such as the link and prefix
   metric or auxiliary Router-IDs of nodes.

   The BGP protocol will be used in the Protocol-ID field specified in
   table 1 of [I-D.ietf-idr-bgpls-segment-routing-epe].  The local and
   remote node descriptors for all NLRI will be the BGP Router-ID (TLV
   516) and either the AS Number (TLV 512) [RFC7752] or the BGP
   Confederation Member (TLV 517)
   [I-D.ietf-idr-bgpls-segment-routing-epe].  However, if the BGP
   Router-ID is known to be unique within the BGP routing domain, it
   can be used as the sole descriptor.

3.1.  Node NLRI Usage and Modifications

   The SPF capability is a new Node Attribute TLV that will be added to
   those defined in table 7 of [RFC7752].  The new attribute TLV will
   only be applicable when BGP is specified in the Node NLRI Protocol
   ID field.  The TBD TLV type will be defined by IANA.  The new Node
   Attribute TLV will contain a single-octet SPF algorithm field:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Algorithm |
   +-+-+-+-+-+-+-+-+

   The SPF Algorithm may take the following values:

      1 - Normal SPF
      2 - Strict SPF

   When computing the SPF for a given BGP routing domain, only BGP
   nodes advertising the SPF capability attribute will be included in
   the Shortest Path Tree (SPT).

3.2.  Link NLRI Usage

   The criteria for advertisement of Link NLRI are discussed in
   Section 2.

   Link NLRI is advertised with local and remote node descriptors as
   described above and unique link identifiers dependent on the
   addressing.  For IPv4 links, the link's local IPv4 (TLV 259) and
   remote IPv4 (TLV 260) addresses will be used.  For IPv6 links, the
   local IPv6 (TLV 261) and remote IPv6 (TLV 262) addresses will be
   used.  For unnumbered links, the link local/remote identifiers (TLV
   258) will be used.  For links having both IPv4 and IPv6 addresses,
   both sets of descriptors may be included in the same Link NLRI.  The
   link identifiers are described in table 5 of [RFC7752].

   The link IGP metric attribute TLV (TLV 1095) as well as any others
   required for non-SPF purposes SHOULD be advertised.  Algorithms such
   as setting the metric inversely to the link speed, as done in the
   OSPF MIB [RFC4750], may be supported.  However, this is beyond the
   scope of this document.

3.3.  Prefix NLRI Usage

   Prefix NLRI is advertised with a local descriptor as described above
   and the prefix and length used as the descriptors (TLV 265) as
   described in [RFC7752].  The prefix metric attribute TLV (TLV 1155)
   as well as any others required for non-SPF purposes SHOULD be
   advertised.  For loopback prefixes, the metric should be 0.  For
   non-loopback prefixes, the setting of the metric is beyond the scope
   of this document.

4.  Shortest Path Routing (SPF) Capability

   In order to replace the Phase 1 and 2 decision functions of the
   existing Decision Process with an SPF-based Decision Process, this
   draft introduces a new capability to signal the support of an SPF
   Decision Process.  The SPF Capability is a new BGP Capability
   [RFC5492].  The Capability Code for this capability is allocated by
   IANA as specified in Section 6.  The Capability Length field of this
   capability has a value of 0.

5.  Decision Process with SPF Algorithm

   The Decision Process described in [RFC4271] takes place in three
   distinct phases.
   The Phase 1 decision function of the Decision Process is
   responsible for calculating the degree of preference for each route
   received from a speaker's peer.  The Phase 2 decision function is
   invoked on completion of the Phase 1 decision function and is
   responsible for choosing the best route out of all those available
   for each distinct destination, and for installing each chosen route
   into the Loc-RIB.  The combination of the Phase 1 and 2 decision
   functions is also known as a path vector algorithm.

   The SPF-based Decision Process starts with selecting only those Node
   NLRI whose SPF capability TLV matches the local BGP speaker's SPF
   capability TLV value.  These selected Node NLRI and their
   Link/Prefix NLRI are used to build a directed graph during the SPF
   computation.  The best paths for BGP prefixes are installed as a
   result of the SPF process.

   The Phase 3 decision function of the Decision Process [RFC4271] is
   also simplified since it is no longer based on the output of the
   previous phases.  Since Link-State NLRI always contains the local
   descriptor [RFC7752], it will only be originated by a single BGP
   speaker in the BGP routing domain.  Hence, for each valid NLRI, the
   Phase 3 decision function will simply need to advertise a valid NLRI
   instance dependent on policy.

5.1.  Impact on BGP Tie-breaking attributes

   The modified Decision Process with SPF algorithm uses the metric
   from Link and Prefix NLRI Attribute TLVs [RFC7752].  As a result,
   any attributes that would influence the Decision Process defined in
   [RFC4271], such as the ORIGIN, MULTI_EXIT_DISC, and LOCAL_PREF
   attributes, are ignored by the SPF algorithm.  Furthermore, the
   NEXT_HOP attribute value is preserved and validated but otherwise
   ignored in any received BGP Update messages.

5.2.  Dual Stack Support

   The SPF-based Decision Process operates on Node, Link, and Prefix
   NLRIs that support both IPv4 and IPv6 addresses.
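
   As a non-normative illustration, the SPF-based Decision Process of
   Section 5 could be sketched as follows.  The sketch assumes that
   Node, Link, and Prefix NLRI have already been decoded into simple
   records; the names NodeNlri, LinkNlri, PrefixNlri, and bgp_spf are
   hypothetical and are not defined by this document or by [RFC7752].
   It also assumes that an absent SPF capability TLV never matches, and
   it omits next-hop and ECMP tracking for brevity.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class NodeNlri:                      # hypothetical decoded Node NLRI
    router_id: str                   # BGP Router-ID node descriptor
    spf_algorithm: Optional[int]     # SPF capability TLV value, None if absent


@dataclass(frozen=True)
class LinkNlri:                      # hypothetical decoded Link NLRI
    local: str
    remote: str
    metric: int                      # link IGP metric attribute


@dataclass(frozen=True)
class PrefixNlri:                    # hypothetical decoded Prefix NLRI
    local: str
    prefix: str
    metric: int                      # prefix metric attribute


def bgp_spf(root: str, local_algorithm: int,
            nodes: Iterable[NodeNlri], links: Iterable[LinkNlri],
            prefixes: Iterable[PrefixNlri]
            ) -> Tuple[Dict[str, int], Dict[str, int]]:
    # Select only Node NLRI whose SPF capability matches the local value.
    eligible = {n.router_id for n in nodes
                if n.spf_algorithm == local_algorithm}

    # Build a directed graph from Link NLRI between selected nodes.
    graph: Dict[str, List[Tuple[str, int]]] = {r: [] for r in eligible}
    for link in links:
        if link.local in eligible and link.remote in eligible:
            graph[link.local].append((link.remote, link.metric))

    # Dijkstra's algorithm from the local speaker.
    dist: Dict[str, int] = {root: 0}
    heap: List[Tuple[int, str]] = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                 # stale heap entry
        for neighbor, metric in graph.get(node, []):
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))

    # Prefix cost is the cost to the advertising node plus the prefix
    # metric; keep the lowest cost seen for each prefix.
    best: Dict[str, int] = {}
    for p in prefixes:
        if p.local in dist:
            total = dist[p.local] + p.metric
            if p.prefix not in best or total < best[p.prefix]:
                best[p.prefix] = total
    return dist, best
```

   In this sketch, a node advertising a different SPF algorithm value
   than the local speaker is left out of the graph, so neither paths
   through it nor prefixes it advertises are installed.
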
   Whether to run a single SPF instance or multiple SPF instances for
   separate AFs is a local implementation matter.  Normally, IPv4
   next-hops are calculated for IPv4 prefixes and IPv6 next-hops are
   calculated for IPv6 prefixes.  However, an interesting use-case is
   deployment of [RFC5549] where IPv6 link-local next-hops are
   calculated for both IPv4 and IPv6 prefixes.  As stated in Section 1,
   support for Multiple Topology Routing (MTR) is an area for future
   study.

5.3.  NEXT_HOP Manipulation

   A BGP speaker that supports SPF extensions MAY interact with peers
   that don't support SPF extensions.  If the BGP Link-State address
   family is advertised to a peer not supporting the SPF extensions
   described herein, then the BGP speaker MUST conform to the NEXT_HOP
   rules mentioned in [RFC4271] when announcing the Link-State address
   family routes to those peers.

   All BGP peers that support SPF extensions would locally compute the
   NEXT_HOP values as a result of the SPF process.  As a result, the
   NEXT_HOP attribute is always ignored on receipt.  However, BGP
   speakers should set the NEXT_HOP address according to the NEXT_HOP
   attribute rules mentioned in [RFC4271].

5.4.  Error Handling

   When a BGP speaker receives a BGP Update containing a malformed SPF
   Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST
   ignore the received TLV and the Node NLRI and not pass it to other
   BGP peers as specified in [RFC7606].  When discarding a Node NLRI
   with a malformed TLV, a BGP speaker SHOULD log an error for further
   analysis.

6.  IANA Considerations

   This document defines a new capability for BGP known as an SPF
   Capability.  We request IANA to assign a BGP capability number from
   the BGP Capability Codes Registry.

   This document also defines a new attribute TLV for BGP-LS Node NLRI.
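
   For illustration only, the Node Attribute TLV layout of Section 3.1
   and the handling described in Section 5.4 could be sketched as
   below.  Because the TLV type is still TBD, SPF_CAPABILITY_TLV_TYPE
   is a made-up placeholder, and the function names are hypothetical.
   The sketch treats only a wrong type or length as malformed; how to
   treat reserved or unassigned algorithm values is not specified here
   and is left to the caller.

```python
import logging
import struct
from typing import Optional

log = logging.getLogger("bgp_spf")

# Placeholder only: the real TLV type is TBD and will be assigned by IANA.
SPF_CAPABILITY_TLV_TYPE = 0xFFFF


class MalformedTlv(ValueError):
    """The octets do not match the Type/Length/SPF Algorithm layout."""


def encode_spf_capability(algorithm: int) -> bytes:
    # 2-octet Type, 2-octet Length (always 1), 1-octet SPF Algorithm.
    return struct.pack("!HHB", SPF_CAPABILITY_TLV_TYPE, 1, algorithm)


def decode_spf_capability(tlv: bytes) -> int:
    if len(tlv) < 4:
        raise MalformedTlv("truncated TLV header")
    tlv_type, length = struct.unpack("!HH", tlv[:4])
    if tlv_type != SPF_CAPABILITY_TLV_TYPE:
        raise MalformedTlv(f"unexpected TLV type {tlv_type}")
    if length != 1 or len(tlv) != 4 + length:
        raise MalformedTlv(f"bad length {length} for {len(tlv)}-octet TLV")
    return tlv[4]


def spf_algorithm_or_discard(tlv: bytes) -> Optional[int]:
    # On a malformed TLV, log an error and return None so that the caller
    # ignores the TLV and the Node NLRI and does not propagate them.
    try:
        return decode_spf_capability(tlv)
    except MalformedTlv as exc:
        log.error("discarding Node NLRI, malformed SPF capability TLV: %s",
                  exc)
        return None
```

   A caller would compare the returned value with its own SPF
   algorithm value when selecting Node NLRI for the SPF computation.
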
   We request IANA to assign a new TLV for the SPF capability from the
   "BGP-LS Node Descriptor, Link Descriptor, Prefix Descriptor, and
   Attribute TLVs" Registry.  Additionally, IANA is requested to create
   a new registry for "BGP-LS SPF Capability Algorithms" for the value
   of the algorithm both in the BGP-LS Node Attribute TLV and the BGP
   SPF Capability.  The initial assignments are:

   +-------------+-----------------------------------+
   | Value(s)    | Assignment Policy                 |
   +-------------+-----------------------------------+
   | 0           | Reserved (not to be assigned)     |
   |             |                                   |
   | 1           | SPF                               |
   |             |                                   |
   | 2           | Strict SPF                        |
   |             |                                   |
   | 3-254       | Unassigned (IETF Review)          |
   |             |                                   |
   | 255         | Reserved (not to be assigned)     |
   +-------------+-----------------------------------+

            BGP-LS SPF Capability Algorithms

7.  Security Considerations

   This extension to BGP does not change the underlying security issues
   inherent in the existing [RFC4724] and [RFC4271].

7.1.  Acknowledgements

   The authors would like to thank .... for the review and comments.

8.  References

8.1.  Normative References

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Filsfils, C., Ray, S., Patel, K., Dong, J.,
              and M. Chen, "Segment Routing BGP Egress Peer Engineering
              BGP-LS Extensions",
              draft-ietf-idr-bgpls-segment-routing-epe-05 (work in
              progress), May 2016.

   [I-D.ietf-rtgwg-bgp-routing-large-dc]
              Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP
              for routing in large-scale data centers",
              draft-ietf-rtgwg-bgp-routing-large-dc-11 (work in
              progress), June 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
              2009.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

8.2.  Informative References

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              DOI 10.17487/RFC4724, January 2007.

   [RFC4750]  Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed.,
              Coltun, R., and F. Baker, "OSPF Version 2 Management
              Information Base", RFC 4750, DOI 10.17487/RFC4750,
              December 2006.

   [RFC4915]  Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P.
              Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF",
              RFC 4915, DOI 10.17487/RFC4915, June 2007.

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5549]  Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network
              Layer Reachability Information with an IPv6 Next Hop",
              RFC 5549, DOI 10.17487/RFC5549, May 2009.

Authors' Addresses

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: keyupate@cisco.com


   Acee Lindem
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: acee@cisco.com


   Abhay Roy
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: akr@cisco.com


   Derek Yeung
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: myeung@cisco.com


   Venu Venugopal
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: venuv@cisco.com