Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                               K. Majumdar
Expires: January 3, 2024                                       Microsoft
                                                                 H. Wang
                                                                  Huawei
                                                               G. Mishra
                                                                 Verizon
                                                            July 3, 2023

            BGP Usage for 5G Edge Computing Service Metadata
            draft-dunbar-rtgwg-5g-edge-metadata-bgp-usage-01

Abstract

   This draft describes the problems in the 5G Edge Computing
   environment and how BGP can be used to propagate additional
   IP-layer detectable information about the 5G edge data centers, so
   that the ingress routers in the 5G Local Data Network can make path
   selections based not only on the routing distance but also on the
   IP-layer metrics of the destinations. The goal is to improve latency
   and performance for 5G Edge Computing (EC) services even when the
   detailed running status of the servers is unavailable.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 3, 2024.
Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. 5G Edge Computing Background
      1.2. 5G Edge Computing Network Properties
      1.3. ANYCAST in 5G EC Environment
      1.4. Problem of Unbalanced Anycast Distribution
      1.5. Problem of Application Service instance Relocation
   2. Conventions used in this document
   3. Destination Metadata for 5G Edge Computing
      3.1. Assumptions
      3.2. IP Layer Metrics to Gauge Traffic Load
      3.3. Metadata Constrained Optimal Path Selection
   4. Soft Anchoring of an ANYCAST Flow
   5. Manageability Considerations
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments

1. Introduction

   This document describes the problems in the 5G Edge Computing
   environment and how BGP can be used to propagate additional
   IP-layer relevant information about the destination, so that the
   ingress routers in the 5G Local Data Network can make path
   selections based not only on the routing distance but also on the
   IP-layer metrics of the destinations. The goal is to improve latency
   and performance for 5G Edge Computing (EC) services even when the
   detailed running status of the servers is unavailable.

1.1. 5G Edge Computing Background

   In 5G Edge Computing (EC), one application can have multiple
   instances hosted in different edge data centers that are close to
   one another. The 5G Local Data Networks (LDNs) that connect the edge
   data centers with the 5G base stations (a.k.a. UPFs) consist of a
   small number of dedicated routers.

   When a User Equipment (UE) sends packets using the destination
   address from a DNS reply or from its cache, the packets are carried
   in a PDU session through the 5G Core [5GC] to the 5G UPF-PSA (User
   Plane Function - PDU Session Anchor). The UPF-PSA decapsulates the
   5G GTP outer header and forwards the packets from the UE to its
   directly connected ingress router of the 5G LDN. The LDN for 5G EC
   is responsible for forwarding the packets to the intended
   destination(s).

   When the UE moves out of the coverage of its current gNB
   (next-generation Node B) and anchors to a new gNB, the 5G SMF
   (Session Management Function) could select the same UPF or a new UPF
   for the UE per the standard handover procedures described in 3GPP
   TS 23.501 and TS 23.502. If the UE is anchored to a new UPF-PSA when
   the handover process is complete, the packets to/from the UE are
   carried by a GTP tunnel to the new UPF-PSA. Per TS 23.501-h20
   Section 5.8.2, the UE may maintain its IP address when anchored to a
   new UPF-PSA unless the new UPF-PSA belongs to a different mobile
   operator.
   The 5GC may maintain a path from the old UPF to the new UPF for a
   short time in SSC (Session and Service Continuity) mode 3 to make
   the handover process more seamless.

1.2. 5G Edge Computing Network Properties

   In this document, 5G Edge Computing Network refers to multiple Local
   Data Networks (LDNs) in one region that interconnect the Edge
   Computing data centers. From the 3GPP 5G perspective, those (IP)
   LDNs are the N6 interfaces.

   The ingress routers of the 5G Edge Computing Network are directly
   connected to the 5G UPFs. The egress routers of the 5G Edge
   Computing (EC) Network are the routers directly connected to the EC
   service instances. The EC service instances and the egress routers
   are co-located. Some Edge Computing data centers may have virtual
   switches or Top of Rack (ToR) switches between the egress routers
   and the service instances, but the transmission delay between the
   egress routers and the EC service instances is negligible and is not
   considered in this document.

   For an application that has multiple service instances clustered
   together behind an application-layer load balancer, it is usually
   the load balancer's IP address(es) that are visible to the 5G LDNs.
   How an application-layer load balancer distributes traffic to a
   group of service instances is out of the scope of the network layer.
   This document only addresses optimizing the traffic delivery from
   UEs to the IP addresses visible to the 5G LDN, which can be
   application-layer load balancers or the actual service instances.

   The 5G EC services are registered premium services that require very
   low latency and stringent SLAs. Most services used by UEs are not
   registered 5G EC services.
    +--+
    |UE|---\+---------+                 +------------------+
    +--+    |  5G     |    +--------+   |  S1: aa08::4450  |
    +--+    | Site +--+-+---+        +----+                |
    |UE|----|  A   |PSA1| Ra|        | R1 | S2: aa08::4460 |
    +--+    |      +----+---+        +----+                |
   +---+    |         |   |           | |  S3: aa08::4470  |
   |UE1|---/+---------+   |           | +------------------+
   +---+                  |IP Network |        L-DN1
                          |(3GPP N6)  |
      |                   |           | +------------------+
      | UE1               |           | |  S1: aa08::4450  |
      | moves to          |          +----+                |
      | Site B            |          | R3 | S2: aa08::4460 |
      v                   |          +----+                |
                          |           | |  S3: aa08::4470  |
                          |           | +------------------+
                          |           |        L-DN3
    +--+                  |           |
    |UE|---\+---------+   |           | +------------------+
    +--+    |  5G     |   |           | |  S1: aa08::4450  |
    +--+    | Site +--+-+----+       +----+                |
    |UE|----|  B   |PSA2| Rb |       | R2 | S2: aa08::4460 |
    +--+    |      +----+----+       +----+                |
    +--+    |         |   +-----------+ |  S3: aa08::4470  |
    |UE|---/+---------+                 +------------------+
    +--+                                       L-DN2

        Figure 1: Service instances in multiple edge DCs

1.3. ANYCAST in 5G EC Environment

   Anycast is increasingly used by application providers and CDNs
   because it provides better and faster resiliency to failover events
   than geo-database, DNS-based load balancing, which relies on DNS to
   return a different IP address based on the source address.

   An Anycast address leverages the proximity information present in
   the network (routing) layer. It eliminates the single point of
   failure and the bottleneck at the DNS resolvers. An Anycast address
   can be assigned to instances located in multiple data centers to
   leverage network conditions for balanced forwarding.

   Another benefit of using an ANYCAST address is removing the
   dependency on UEs. Some UEs (or clients) might use their cached IP
   addresses for an extended period instead of querying DNS.

   Having clients use a virtual IP address is a common practice in
   cloud-native networking, e.g., Kubernetes, to accommodate dynamic
   changes of application instances.
   A virtual IP address requires the destination gateway node to
   perform address translation for return traffic, which is unsuitable
   for underlay network nodes with millions of flows passing through.
   A cloud-native network can also leverage network conditions to
   balance forwarding among multiple cloud gateway nodes by assigning
   them the same virtual IP address.

   Having the same IP address at multiple locations in the 5G EC LDN
   can be problematic if path selection is based solely on routing
   cost, because the differences in routing cost to reach different
   egress routers can be very small. The following list elaborates the
   issues in detail:

   a) Path Selection: When a new flow arrives at an ingress node (Ra),
      Anycast can become unstable if the selected path flips among
      multiple paths to the same address. The same problem exists in a
      BGP multipath environment where the optimal path is selected
      based on routing cost metrics. The ingress node needs to forward
      all packets of one flow to the same service instance, a.k.a. Flow
      Affinity or flow-based load balancing. The ingress node (Ra/Rb)
      can use the Flow Label (in the IPv6 header) or the UDP/TCP port
      numbers combined with the source address to ensure that the
      packets of one flow are placed in one tunnel to one egress
      router. No new features are needed.

   b) When a UE moves to a new 5G site in the middle of a communication
      session with an EC service instance, a method is needed to stick
      the flow to the same EC service instance, as required by 5G Edge
      Computing: 3GPP TR 23.748. [5G-Edge-Sticky] describes several
      approaches to achieve stickiness in the IPv6 domain.

   Note: most EC services have short sessions, e.g., short TCP
   sessions. Most likely, by the time a UE moves to a new 5G site, the
   TCP session via the old UPF to an EC service instance has already
   finished.
   Only a very small percentage of registered EC services need to stick
   to the original service instance when the UE is handed over to a new
   cell tower.

   From the BGP perspective, multiple service instances with the same
   (ANYCAST) IP address attached to different egress routers are the
   same as multiple next hops for that IP address. This draft describes
   using BGP UPDATE messages to propagate some metrics about the
   destination data centers to the ingress routers, so that both
   network and destination conditions can be considered when computing
   the optimal path to the egress routers.

1.4. Problem of Unbalanced Anycast Distribution

   It is common to place higher-capacity EC service instances in a
   metro data center to accommodate more UEs in proximity, and fewer in
   remote sites. Sometimes UEs swarm to a specific site unexpectedly,
   e.g., for a special event at a remote site that lasts one to two
   days. The EC service instances at the remote site might then be
   heavily utilized while the EC service instances of the same
   application in the metro DC are under-utilized. Since the condition
   can be short-lived or unexpected, it might not make business sense
   to adjust EC capacity among DCs.

1.5. Problem of Application Service instance Relocation

   When an EC service instance is added to, moved within, or deleted
   from a 5G EC data center, it is useful to propagate the change to
   the 5G PSA or the PSA-adjacent routers. After the change, the cost
   associated with the site might change as well.

2. Conventions used in this document

   A-ER:     Egress Router to an Application Service instance. [A-ER]
             is used to describe the last router to which the
             Application Service instance is attached. For a 5G EC
             environment, the A-ER can be the gateway router to a
             (mini) Edge Computing Data Center.

   EC:       Edge Computing

   Edge DC:  Edge Data Center, which provides the Edge Computing
             Hosting Environment.
             An Edge DC might host 5G core functions in addition to the
             frequently used application service instances.

   gNB:      next generation Node B

   LDN:      Local Data Network

   PSA:      PDU Session Anchor (UPF)

   SSC:      Session and Service Continuity

   UE:       User Equipment

   UPF:      User Plane Function

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Destination Metadata for 5G Edge Computing

   The destination metadata consists of metrics about the running
   environment at the egress routers to which EC service instances are
   directly attached.

3.1. Assumptions

   At the IP layer, the EC service instances or their respective load
   balancers are identified by their IP addresses. Those IP addresses
   are the identifiers of the EC service instances throughout this
   document.

   Here are some assumptions about the 5G EC services:

   - Only the registered EC services, which are a small portion of all
     services, need to incorporate the destination-related metrics for
     optimal forwarding.

   - The 5G EC controller or management system can send those EC
     service identifiers to the relevant routers.

   - The ingress routers' local BGP path computation algorithm includes
     a special plugin that can compute the path to the optimal Next Hop
     (egress router) based on the BGP Metadata TLV received for the
     registered EC services.

   In the proposed solution, the egress routers that have direct links
   to the EC service instances, a.k.a. A-ERs in this document, collect
   various measurements about the service instances' running status
   [5G-EC-Metrics] and advertise the metrics to the ingress routers in
   the 5G EC LDN (Local Data Network).

3.2. IP Layer Metrics to Gauge Traffic Load

   [5G-EC-Metrics] describes the IP-layer metrics that can be used to
   estimate the running status and environment of the service
   instances:

   - IP-Layer Metric for Load Measurement: The Load Measurement is a
     weighted combination of the number of packets/bytes sent to the IP
     address and the number of packets/bytes sent from the address, as
     collected by the A-ER to which the service instance is directly
     attached. The A-ER is configured with an ACL that can filter out
     the packets for the Application Service instance.

   - Site Degradation Index: a number representing the percentage of
     the site that is functional. When a data center goes dark (i.e.,
     loses power), the A-ER can announce a capacity index of 0 for all
     the IP addresses reachable through the A-ER.

   - Site Preference Index: describes that some sites are preferred
     over others. For example, a site with higher bandwidth has a
     higher preference number than the others.

   In this document, the term "Application Service instance Egress
   Router" [A-ER] is used to describe the last router to which
   Application Service instances are attached. For the 5G EC
   environment, the A-ER can be the gateway router to the EC DC where
   the Application Service instances are hosted.

3.3. Metadata Constrained Optimal Path Selection

   The main benefit of using ANYCAST is to leverage the network-layer
   conditions to select an optimal path to an application instantiated
   in multiple locations.

   When the ingress routers of the 5G LDN are informed of the Load and
   Capacity Index of the App Service instances at different EC data
   centers, they can combine those metrics with the network path
   conditions for path selection.

   Here is an algorithm that computes the cost to reach the App Service
   instances attached to Site-i relative to another site, say Site-b.
   When the reference site, Site-b, is plugged into the formula, the
   cost is 1.
   So, if the formula returns a value less than 1, the cost to reach
   Site-i is less than the cost to reach Site-b.

                CP-b * Load-i               Pref-b * Network-Delay-i
   Cost-i = w * ------------- + (1 - w) * ------------------------
                CP-i * Load-b               Pref-i * Network-Delay-b

   Load-i:  Load Index at Site-i. It is the weighted combination of the
            total packets and/or bytes sent to and received from the
            Application Service instance at Site-i during a fixed time
            period.

   CP-i:    Degraded capacity index at Site-i. A higher value means
            higher capacity.

   Network-Delay-i:
            Network latency measurement (RTT) to the A-ER to which the
            Application Service instance at Site-i is attached.

   Pref-i:  Preference index for Site-i. A higher value means higher
            preference.

   w:       Weight for load and site information, which is a value
            between 0 and 1. If w is smaller than 0.5, the network
            latency and the site preference have more influence;
            otherwise, the service instance load and its capacity have
            more influence.

4. Soft Anchoring of an ANYCAST Flow

   "Sticky Service" in the 3GPP Edge Computing specification (3GPP TR
   23.748) is about flows from a UE sticking to a specific location
   when the UE moves from one 5G site to another.

   "Soft Anchoring" is a mechanism for ingress routers to apply a
   preference to the path towards the previous service instance
   location when the UE is anchored to a new UPF and continues using
   its cached IP address for the EC service instance.

   Let's assume one application, "App.net", is instantiated on four
   service instances that are attached to four different routers, R1,
   R2, R3, and R4 respectively. It is desired that packets to "App.net"
   from UE-1 stick with one service instance, say the App Service
   instance attached to R1, even when the UE moves from one 5G site to
   another.
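As a brief aside on the Cost-i formula of Section 3.3, the following Python sketch shows one way an ingress router's path-selection plugin might evaluate it. The names SiteMetrics and relative_cost, the sample metric values, and the choice to treat a site whose capacity index is 0 as having infinite cost are assumptions made for illustration only; this document does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteMetrics:
    """Per-site inputs to the Cost-i formula (hypothetical container)."""
    load: float        # Load-i: load index measured at the A-ER
    capacity: float    # CP-i: degraded capacity index, higher is better
    preference: float  # Pref-i: site preference index, higher is better
    delay_ms: float    # Network-Delay-i: RTT to the A-ER


def relative_cost(site: SiteMetrics, ref: SiteMetrics, w: float) -> float:
    """Cost of reaching `site` relative to the reference site `ref`."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    if site.capacity == 0:
        # Assumption: a dark site (capacity index 0) is never preferred.
        return float("inf")
    load_term = (ref.capacity * site.load) / (site.capacity * ref.load)
    delay_term = (ref.preference * site.delay_ms) / (
        site.preference * ref.delay_ms
    )
    return w * load_term + (1.0 - w) * delay_term


# Illustrative values only.
site_b = SiteMetrics(load=100.0, capacity=100.0, preference=1.0, delay_ms=10.0)
site_i = SiteMetrics(load=50.0, capacity=100.0, preference=1.0, delay_ms=20.0)
dark = SiteMetrics(load=0.0, capacity=0.0, preference=1.0, delay_ms=5.0)

cost_ref = relative_cost(site_b, site_b, w=0.5)   # 1.0 for the reference site
cost_i = relative_cost(site_i, site_b, w=0.5)     # 0.5*0.5 + 0.5*2.0 = 1.25
cost_dark = relative_cost(dark, site_b, w=0.5)    # infinite cost
```

With w = 0.5, the site carrying half the load but twice the delay evaluates to 1.25, so an ingress router using this sketch would keep preferring Site-b.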
   However, when there is a failure reaching R1 or the Application
   Service instance attached to R1, the packets of the flow "App.net"
   from UE-1 need to be forwarded to the Application Service instance
   attached to R2, R3, or R4.

   We call this kind of sticky service "Soft Anchoring", meaning that
   anchoring to the site of R1 is preferred, but other sites can be
   chosen when the preferred site encounters a failure.

   Here is a mechanism to achieve Soft Anchoring:

   - Assign a group of ANYCAST addresses to one application. For
     example, "App.net" is assigned four ANYCAST addresses, L1, L2, L3,
     and L4. L1/L2/L3/L4 represent the location-preferred ANYCAST
     addresses.

   - For the App.net service instance attached to a router, the router
     has four stub links to the same service instance, for L1, L2, L3,
     and L4 respectively. The costs to L1, L2, L3, and L4 are assigned
     differently on different egress routers. For example:

     o  ANYCAST L1 has the lowest cost, say 10, when attached to R1,
        and a higher cost, say 30, when attached to R2, R3, and R4.

     o  ANYCAST L2 has the lowest cost when attached to R2, and a
        higher cost when attached to R1, R3, and R4.

     o  ANYCAST L3 has the lowest cost when attached to R3, and a
        higher cost when attached to R1, R2, and R4.

     o  ANYCAST L4 has the lowest cost when attached to R4, and a
        higher cost when attached to R1, R2, and R3.

   - When a UE queries for "App.net" for the first time, the DNS reply
     contains the location-preferred ANYCAST address, say L1, based on
     where the query is initiated.

   - When the UE moves from 5G Site-A to Site-B, the UE continues
     sending packets for "App.net" to ANYCAST address L1. The routers
     will continue sending the packets to R1 because the total cost to
     the App.net instance for ANYCAST L1 is lowest at R1.
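The cost assignment in the bullets above can be illustrated with a small Python sketch. It assumes that the total cost seen by an ingress router is the network path cost to an egress router plus that router's stub-link cost for the ANYCAST address. The function names, the path-cost values, and the tie-break by router name are hypothetical; the stub costs 10 and 30 follow the example values given above.

```python
from typing import Dict, Iterable, List, Optional


def build_stub_costs(
    routers: List[str], addresses: List[str], low: int = 10, high: int = 30
) -> Dict[str, Dict[str, int]]:
    """Give the k-th address the low cost at the k-th router, high elsewhere."""
    return {
        router: {
            addr: (low if i == j else high)
            for j, addr in enumerate(addresses)
        }
        for i, router in enumerate(routers)
    }


def select_egress(
    address: str,
    path_cost: Dict[str, int],
    stub_cost: Dict[str, Dict[str, int]],
    reachable: Iterable[str],
) -> Optional[str]:
    """Pick the reachable egress router with the lowest total cost."""
    totals = {r: path_cost[r] + stub_cost[r][address] for r in reachable}
    if not totals:
        return None
    # Ties are broken by router name (an arbitrary illustrative choice).
    return min(totals, key=lambda r: (totals[r], r))


ROUTERS = ["R1", "R2", "R3", "R4"]
ADDRESSES = ["L1", "L2", "L3", "L4"]
STUB = build_stub_costs(ROUTERS, ADDRESSES)

# Hypothetical path costs from ingress Rb after the UE moved to Site-B.
PATH_FROM_RB = {"R1": 15, "R2": 5, "R3": 10, "R4": 12}

# R1 totals 15 + 10 = 25, lower than R2 (35), R3 (40), and R4 (42).
anchored = select_egress("L1", PATH_FROM_RB, STUB, ROUTERS)

# With R1 unreachable, R2 totals 5 + 30 = 35 and becomes the choice.
fallback = select_egress("L1", PATH_FROM_RB, STUB, ["R2", "R3", "R4"])
```

In this example the anchoring to R1 holds because the gap of 20 between the low and high stub costs exceeds the largest difference in path cost (10). With a gap smaller than that difference, the closer R2 would win instead.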
     If any failure makes R1 unreachable, the packets for "App.net"
     from UE-1 will be sent to R2, R3, or R4 (depending on the total
     cost to reach L1 attached to R2/R3/R4).

   If the Application Service instance supports HTTP redirect, more
   optimal forwarding can be achieved:

   - When a UE queries for "App.net" for the first time, the global DNS
     reply contains the ANYCAST address G1, which has the same cost
     regardless of where the Application Service instances are
     attached.

   - When the UE initiates communication to G1, the packets from the UE
     are sent to the Application Service instance that has the lowest
     cost, say the service instance attached to R1. The Application
     Service instance is instructed to reply with an HTTP redirect to a
     location-specific URL, say App.net-Loc1. The client on the UE then
     queries the DNS for App.net-Loc1 and gets ANYCAST L1 in response.
     The subsequent packets from UE-1 for App.net are sent to L1.

5. Manageability Considerations

   The Edge Service Metadata described in this document is intended to
   be propagated only between the ingress and egress routers of a
   single BGP domain, i.e., the 5G Local Data Networks, which form a
   limited domain with edge services a few hops away from the ingress
   nodes. Only selected services used by UEs are considered 5G Edge
   Services. The 5G LDN is usually managed by one operator, even though
   the routers can come from different vendors.

6. Security Considerations

   The proposed Edge Service Metadata is advertised within the trusted
   domain of the 5G LDN's ingress and egress routers. There are no
   extra security threats compared with iBGP.

7. IANA Considerations

   This document doesn't require any IANA action.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
             May 2017.

   [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6
             (IPv6) Specification", RFC 8200, July 2017.

8.2. Informative References

   [3GPP-EdgeComputing]
             3GPP TR 23.748, "3rd Generation Partnership Project;
             Technical Specification Group Services and System Aspects;
             Study on enhancement of support for Edge Computing in 5G
             Core network (5GC)", Release 17, work in progress, August
             2020.

   [5G-EC-Metrics]
             Dunbar, L., Song, H., and J. Kaippallimalil, "IP Layer
             Metrics for 5G Edge Computing Service",
             draft-dunbar-ippm-5g-edge-compute-ip-layer-metrics-00,
             work in progress, October 2020.

   [5G-Edge-Sticky]
             Dunbar, L. and J. Kaippallimalil, "IPv6 Solution for 5G
             Edge Computing Sticky Service",
             draft-dunbar-6man-5g-ec-sticky-service-00, work in
             progress, October 2020.

   [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
             Subsequent Address Family Identifier (SAFI) and the BGP
             Tunnel Encapsulation Attribute", RFC 5512, April 2009.

   [BGP-SDWAN-Port]
             Dunbar, L., Wang, H., and W. Hao, "BGP Extension for SDWAN
             Overlay Networks",
             draft-dunbar-idr-bgp-sdwan-overlay-ext-03, work in
             progress, November 2018.

   [SDWAN-EDGE-Discovery]
             Dunbar, L., Hares, S., Raszuk, R., and K. Majumdar, "BGP
             UPDATE for SDWAN Edge Discovery",
             draft-dunbar-idr-sdwan-edge-discovery-00, work in
             progress, July 2020.

   [Tunnel-Encap]
             Rosen, E., et al., "The BGP Tunnel Encapsulation
             Attribute", draft-ietf-idr-tunnel-encaps-10, August 2018.

9. Acknowledgments

   Acknowledgements to Sue Hares and Donald Eastlake for their review
   and contributions.

   This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: ldunbar@futurewei.com

   Kausik Majumdar
   Microsoft
   Email: kmajumdar@microsoft.com

   Haibo Wang
   Huawei
   Email: rainsword.wang@huawei.com

   Gyan Mishra
   Verizon
   Email: gyan.s.mishra@verizon.com