Inter-Domain Routing P. Marques, Ed. Internet-Draft R. White Intended status: Standards Track Cisco Systems, Inc. Expires: June 24, 2011 December 21, 2010 Topology-based aggregation draft-marques-idr-aggregate-00 Abstract This document defines a mechanism which allows more-specific IP address prefixes to be aggregated when they are topologically equivalent or less preferable than a less-specific advertisement. It is designed to allow multi-homed sites to use "Provider Aggregatable" (PA) addresses and obtain both redundancy and local traffic optimizations when using multiple service providers. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on June 24, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of Marques & White Expires June 24, 2011 [Page 1] Internet-Draft Topology-based aggregation December 2010 the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Topology-based aggregation . . . . . . . . . . . . . . . . . . 4 3. BGP AGGREGATE_INFO attribute . . . . . . . . . . . . . . . . . 6 4. BGP extension deployment . . . . . . . . . . . . . . . . . . . 8 5. Path selection criteria . . . . . . . . . . . . . . . . . . . 8 6. Network deployment . . . . . . . . . . . . . . . . . . . . . . 9 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 10 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 10. Security Considerations . . . . . . . . . . . . . . . . . . . 11 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 11.1. Normative References . . . . . . . . . . . . . . . . . . 11 11.2. Informative References . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 Marques & White Expires June 24, 2011 [Page 2] Internet-Draft Topology-based aggregation December 2010 1. Introduction With the existing inter-domain routing functionality as defined by RFC 4271 [RFC4271], multi-homed sites feel compelled to advertise their individual prefixes to the entire Internet in order to achieve the desired reliability and traffic-engineering behavior. Multi-homed sites typically advertise "Provider Independent" (PI) prefixes. An alternative approach would be for "Provider Aggregatable" (PA) space to be used along with a set of procedures that allow for route advertisements to be aggregated. This option must retain the functionality that is provided today by PI advertisements. One assumption made here is that renumbering of a multi-homed site is economically feasible given the increased usage of dynamic host configuration protocols and/or network address translation. This document is being written at a time when IP addresses are becoming scarse. It is difficult to predict whether Internet address allocation and assignment policies will drift torwards the use of PI space in order to achieve more efficient allocation. Or whether scarcity will make it harder to obtain PI space. In the latter case, this document define an approach that would allow multi-homed sites a method for using PA addresses without bumping into address space filtering rules that may be in place to limit the growth of the internet table size. In order to meet the requirements stated above for multi-home site routing, the following is proposed: The routing advertisement must be taken out of "Provider Aggregatable" (PA) space. The routing advertisement must be leaked through one or more alternate providers, other than the one owning the PA space. These more-specific route advertisements shall be automatically aggregated, depending on the network topology. If the multi-homed site becomes disconnected from the owner of the address space it must be possible to unsuppress the most-specific adververtisement. In order to provide topology-dependent aggregation, this document defines a new BGP path attribute, AGGREGATE_INFO, which defines a BGP prefix as being a more specific of a given aggregate prefix. A BGP Marques & White Expires June 24, 2011 [Page 3] Internet-Draft Topology-based aggregation December 2010 speaker that receives such a prefix MUST compare the received prefix with the specified aggregate, if present in its Loc-RIB. The standard path selection algorithm is applied between the paths of the more-specific prefix and the best-path of the aggregate. If the best-path of the aggregate is preferable, the more-specific prefix should be considered as "Inactive". It SHOULD NOT be further re- advertised into External BGP sessions. It MAY BE re-advertised into Internal BGP sessions, if the path-selection criteria between the aggregate and more-specific justifies it. Conceptually, the aggregate prefix conveys implicit path information that applies to the delegated more-specifics. Path selection occurs between the explicit paths that are present in the routing system and these implicit paths represented by the aggregates. The AGGREGATE_INFO attribute contains an operational status field. This field is used to indicate the status of the connectivity between the multi-homed site and the provider owning the aggregate. It can be used in a situation of failure in which the customer becomes detached from the service provider originating the PA aggregate. When the operational status denotes connectivity failure this will result on the more-specific being unsuppressed and attracting traffic through the failover paths. The operational status is used explicitly in order to inform downstreams that the more-specific is temporary and will be removed from the routing system once connectivity is restored. The operational status field uses three colors: green, yellow and red. Green means full connectivity. Red means no connectivity. Yellow informs the routing system that while the site itself has no direct connectivity to the primary provider, it believes that there is sufficient redundant connectivity in the network that its prefix is still reachable through it. 2. Topology-based aggregation The intent of this extension is to achieve the same semantics as "Provider Independent" (PI) advertisements, while removing the more specifics from the BGP routing table in locations of the network where the aggregate provides equal or better service to the IP destination prefix in question. Marques & White Expires June 24, 2011 [Page 4] Internet-Draft Topology-based aggregation December 2010 +------+ | AS 10| +------+ / \ / \ +------+ +------+ | AS 1 | | AS 2 | +------+ +------+ | \ / | | \ / | | \/ | | /\ | | / \ | | / \ | +------+ +------+ | AS 3 | | AS 4 | +------+ +------+ \ / \ / +------+ | AS 20| +------+ Figure 1 Figure 1 contains an example of the usage of the BGP AGGREGATE_INFO attribute. AS 10 in the example above has been delegated "10.0.1/24" prefix by AS 1. Using this extension, it will advertise the prefix into AS 2, which will likely prefer a customer router over a peer route to AS 1. When AS 2 re-advertises the more-specific "10.0.1/24" to its peers, AS 3 and 4 in this example, the peers will compare the more-specific to the "10.0/16" aggregate received from AS 1. Typically AS 3 will prefer the aggregate (as-path: "1", length 1) over the more-specific (as-path: "2 10", length: 2). When this is the case, the more-specific will be suppressed and no longer propagated in the network. If, for any reason, AS 1 becomes disconnected from AS 3, the more-specific route to "10.0.1/24" will become active again, achieving the required failover protection. From a traffic-engineering perspective, the more-specific is selected in locations in the network where AS 10 is topologically closer than AS 1. In the example described above, the aggregate route may have a shorter as-path than the equivalent PI prefix that is in use currently. A PI prefix that is injected by the customer AS (AS 10) would be advertised to AS 3 with an as-path of "1 10". In order to Marques & White Expires June 24, 2011 [Page 5] Internet-Draft Topology-based aggregation December 2010 provide multi-homed sites with equivalent functionality as it is available to them using PI space, the AGGREGATE_INFO BGP attribute allows the originator to specify an AS_PATH attribute to be appended with the path contained in the aggregate route. This allows the customer AS (AS 10) to indicate to AS 3 that the attribute comparison should be performed between the explicitly advertised more-specific with as-path "2 10" and an implicit more-specific path with an as- path of "1 10". This implicit path is derived from the aggregate prefix. 3. BGP AGGREGATE_INFO attribute The BGP AGGREGATE_INFO attribute is a well-known, transitive attribute with Type Code 129. It contains a list of one or more aggregate target elements. Each aggregate target contains a mandatory part, with the operational status field followed by a route prefix. That may be followed by additional BGP PATH attributes that apply to the specified aggregate target prefix. The operational status is encoded as a 1-octect field with the following values: +-------+--------+-----------------------------------------------+ | Value | Color | Description | +-------+--------+-----------------------------------------------+ | 0 | Red | No connectivity between customer and provider | | 1 | Yellow | Direct connectivity unavailable | | 2 | Green | Connectivity fully operational | +-------+--------+-----------------------------------------------+ The prefix is encoded as a 2 byte AFI [RFC1700] value, followed by a variable length prefix encoded as a 1 byte prefix-length in bits and the prefix itself padded to a byte boundary. This is the same encoding used for NLRI in BGP UPDATE messages. The prefix contained in the AGGREGATE_INFO attribute SHOULD be a less-specific prefix containing all the NLRI specified in the BGP UPDATE message that includes this attribute. Following the route prefix, the encoding allows for one of more BGP path attributes using the encoding specified the BGP [RFC4271] protocol specification. An implementation MAY choose to include an AS_PATH attribute in this optional element. When an AS_PATH attribute is contained inside an AGGREGATE_INFO attribute, the path segments that it contains shall be appended to the AS_PATH of the implicit path represented by the aggregate prefix. Marques & White Expires June 24, 2011 [Page 6] Internet-Draft Topology-based aggregation December 2010 This implicit path is then compared with the best path of NLRI prefix(es) included in the UPDATE message containing this attribute. Example encoding for prefix 10.0/16, as-path "10": Attr Flags = 0x40, Attr Code = 0x81, Attr Length = 0x0e OpStatus=0x2, AFI = 0x00 0x01, Prefix Length = 0x10, Prefix Data = 0x0a 0x00 Attr Flags = 0x40, Attr Code = 0x02, Attr Length = 0x04, Data = 0x02 0x01 0x00 0x0a In the example given above, an AS_PATH segment of "10" in the aggregate-info attribute and an aggregate path with an AS_PATH of "1" would result in a as-path of "1 10", of length 2. When multiple aggregate target prefixes are present in a AGGREGATE_INFO attribute, the most significant prefix present in the Loc-Rib is used to generate the implicit path used in path selection. Multiple targets can be used when prefix assignment and delegation happens at more than one level. As an example, a provider X may have a /16 out of which it delegates to Y a specific /22 block. Y then allocates a /24 to a specific multi-homed customer Z. If Y itself is using aggregation its prefix may be suppressed. Where Z to originate a route with a single aggregation-target (/22), that prefix would not be aggregated in regions of the network where the /22 had itself be aggregated. For this mechanism to behave as expected one would have to ensure that if Y's prefix has been suppress then Z's has also been suppressed. Otherwise if Z's prefix is present, its aggregation target of Y will be ignored. Since this condition cannot be guaranteed, the protocol allows the originator of the more-specific prefix (Z) to include multiple aggregation targets (Y and X) in its route advertisement. Whenever Y is present in the Loc-Rib of BGP speaker, Y is used as source of the implicit aggregation path. Otherwise X is used if present. The choice of explicitly listing the aggregation targets rather than automatically deriving the parent is designed to avoid situations in which the less-specific is being artificially generated such as, for instance, the default route. Marques & White Expires June 24, 2011 [Page 7] Internet-Draft Topology-based aggregation December 2010 4. BGP extension deployment BGP speakers that support the extensions described in this document SHALL use the Capability Advertisement [RFC5492] BGP extension to advertise that support to its BGP peers. Compliant implementations should advertise the BGP Capability Code TBD. The capability data should contain a 1-byte value which is interpreted as the version of this specification. It should contain the value 1. When a BGP route is placed in the Out-RIB for a given external BGP peer and the peer in question doesn't support this capability, if the path in the Loc-Rib contains the AGGREGATE_INFO attribute this should result in the prefix being suppressed. If a previous path was advertised to this peer that path shall be withdrawn. If the peer in question is an internal BGP peer which doesn't support this capability an implementation MAY choose to replace this attribute with the NO_EXPORT [RFC1997] BGP community attribute, rather than suppress the path. This mechanism assures that a path that originated with an AGGREGATE_INFO attribute is not used by a router without being compared to the respective aggregate. This is intended to facilitate the incremental deployment of this functionality. 5. Path selection criteria A BGP implementation shall run its path selection algorithm unmodified between all the paths for a given prefix. If the selected best-path contains the BGP AGGREGATE_INFO attribute, this path shall be compared with the best-path of the aggregate prefix indicated by the attribute in question. The AGGREGATE_INFO attribute represents an implicit path for the more-specific prefix (the NLRI containing that attribute). The BGP path attributes of this implicit prefix are the attributes of the best-path of the aggregate prefix. If the AGGREGATE_INFO contains an optional AS_PATH attribute, the AS_PATH segments in that attribute shall be appended to the AS_PATH of the aggregate prefix best-path before comparison. When the Operational Status of the specified aggregate target is "Red" the corresponding implicit path is considered to be unreachable. When the Operational Status is "Yellow" the originating AS of the aggregate target prefix MUST treat the implicit path as Marques & White Expires June 24, 2011 [Page 8] Internet-Draft Topology-based aggregation December 2010 unreachable also and use the more-specific. Autonomous-systems further downstream MAY choose whether to ignore or use the aggregation information. The "Yellow" state represents that the originator of the prefix believes that there is a path between the primary and backup providers for the site such that this path always prefers the more- specific advertisement. This is often the case if both providers have a direct peering relationship. When comparing the more-specific path with its implicit path (represented by the aggregate), the following changes to the standard path selection algorithm should be taken into account: o The Origin attributes of both paths are not comparable. This is step b) in the path selection algorithm and should be bypassed. o If the paths in question are equal upto step d) of path selection algorithm, if both paths are EBGP paths, the less-specific (aggregate) should be preferred. This replaces the step in path selection where the oldest EBGP path is preferred [RFC5004]. o If both paths are iBGP paths, the less-specific (aggregate) should be preferred in case where the paths are equal up-to the router-id comparison step of path selection. When the aggregate path is considered to be preferable over the more- specific, the more-specific should be considered inactive and should not be installed in the FIB or subsequently advertised to other peers. 6. Network deployment The objective of this document is to provide multi-homed sites with the resilience to failures and limited traffic-engineering capabilities without the need to recurse to PI advertisements. Instead of using a PI prefix, a multi-homed site can choose to address its network with PA prefix from one service provider which it then advertises through a secondary provider. Or it may choose to dual address its hosts and/or NAT appliances. In order for a multi-homed site to achieve the required resilience it should be allowed by other service providers to inject the more- specifics that have been delegated to it with the BGP AGGREGATE_INFO attribute. Marques & White Expires June 24, 2011 [Page 9] Internet-Draft Topology-based aggregation December 2010 The AGGREGATE_INFO attribute should only be added to a BGP path by the originator of the route advertisement. This rule is intended to ensure that there aren't instances of the same BGP path information flowing through the Internet routing system with and without the specified attribute. In order to maintain the loop free properties of BGP one must ensure that when suppressing a more-specific this doesn't result in traffic being forwarded in a way which results in a loop. For this to occur, the following conditions would be necessary: A transit AS (X) prefers the more-specific route. Another AS (Y) receives both aggregate and more-specific from X and prefers the former. Y is in the transit path for the more-specific. The last condition cannot occur since Y, by definition prefers the aggregate path and will not advertise the more-specific. 7. Acknowledgements There have been several prior proposals to reduce routing information used in muli-homing scenarios. For instance, using BGP communities [I-D.white-bounded-longest-match] and AS hops [I-D.ietf-idr-as-hopcount]. The current document builds upon the previous work and proposes the use of standard BGP path selection using both implicit and explicit paths in order limit information to parts of the network where it is useful. 8. Contributors Central parts of the protocol operation where defined by Robert Raszuk and Keyur Patel. Russ White, Enke Chen, Dave Meyer and Vince Fuller provided essential input in the early stages of the proposal. 9. IANA Considerations This memo requests IANA to allocate a BGP attribute type code value, for the BGP aggregate-info attribute defined herein. It also requests IANA to allocate a Capability Code according to the Marques & White Expires June 24, 2011 [Page 10] Internet-Draft Topology-based aggregation December 2010 procedures defined in RFC 5492 [RFC5492]. 10. Security Considerations The BGP aggregate-info attribute in itself doesn't create a new security threat. This attribute can only lead to the route being suppressed. The presence of more-specifics in the routing system makes a stronger case for the usefulness of performing origin authentication of route advertisements. 11. References 11.1. Normative References [RFC1700] Reynolds, J. and J. Postel, "Assigned Numbers", RFC 1700, October 1994. [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, August 1996. [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006. [RFC5004] Chen, E. and S. Sangli, "Avoid BGP Best Path Transitions from One External to Another", RFC 5004, September 2007. [RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, February 2009. 11.2. Informative References [I-D.ietf-idr-as-hopcount] Li, T., "The AS_HOPCOUNT Path Attribute", draft-ietf-idr-as-hopcount-00 (work in progress), December 2005. [I-D.white-bounded-longest-match] Hares, S., "Bounding Longer Routes to Remove TE", draft-white-bounded-longest-match-02 (work in progress), July 2008. Marques & White Expires June 24, 2011 [Page 11] Internet-Draft Topology-based aggregation December 2010 Authors' Addresses Pedro Marques (editor) Cisco Systems, Inc. 170 W. Tasman Dr. San Jose, CA 94040 US Phone: +1 408 853 1193 Email: roque@cisco.com Russ White Cisco Systems, Inc. 7025 Kit Creek Road Research Triangle Park, NC 27709 US Email: riw@cisco.com Marques & White Expires June 24, 2011 [Page 12]