BESS WG                                                          Y. Wang
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                         ZTE Corporation
Expires: 1 March 2022                                     28 August 2021


                   EVPN VPWS as VRF Attachment Circuit
                   draft-wz-bess-evpn-vpws-as-vrf-ac-02

Abstract

   When a VRF Attachment Circuit (VRF-AC) is far away from its IP-VRF
   instance, an EVPN VPWS ([RFC8214]) can be deployed between that
   VRF-AC and its IP-VRF instance.  From the viewpoint of the IP-VRF
   instance, a local virtual interface takes the place of that remote
   "VRF-AC".  The IP address of that VRF-AC is now configured on the
   virtual interface; in other words, the virtual interface is the
   actual VRF-AC of the IP-VRF instance.  The virtual interface is also
   the AC of the VPWS instance; in other words, the virtual interface
   is cross-connected to that remote "VRF-AC" by the VPWS instance.

   This document proposes an extension to
   [I-D.ietf-bess-evpn-inter-subnet-forwarding] to support this
   scenario.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 1 March 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Integrated Routing and Cross-connecting
     1.2.  Terminology
   2.  ARP/ND Synching and IP Prefix Synching
     2.1.  Constructing MAC/IP Advertisement Route
       2.1.1.  When CEs are Hosts
       2.1.2.  When CEs are Routers
     2.2.  Constructing Ethernet A-D Route
     2.3.  Constructing IP Prefix Advertisement Route
       2.3.1.  Direct-Prefixes Advertisement
       2.3.2.  Exclusive CE-Prefixes of Each CE
   3.  Packet Walk Through
     3.1.  When CEs are Hosts
     3.2.  When CEs are Routers
   4.  Fast Convergence for Routed Traffic
   5.  Considerations on ABRs and Route Reflectors
   6.  For Common CE-prefixes behind R1 and R2
     6.1.  Solution 1: Independent CE-BGP sessions
     6.2.  Solution 2: ECMP-Merging for RT-5G routes
       6.2.1.  ECMP-Merging by RT-5L
       6.2.2.  ECMP-Merging by RT-2R
     6.3.  Solution 3: RT-5E Routes Advertisement
       6.3.1.  CE-Prefix Advertisement by RT-5E Routes
         6.3.1.1.  When Internal Remote PEs Receive the RT-5E
         6.3.1.2.  When External Remote PEs Receive the RT-5E
         6.3.1.3.  Packet Walk Through
       6.3.2.  The Advertisement of SOI-mapping Routes
       6.3.3.  IP-mapping SOI Extended Community
   7.  Security Considerations
   8.  IANA Considerations
   9.  Normative References
   10. Informative References
   Authors' Addresses

1.  Introduction

   When a VRF Attachment Circuit (VRF-AC) is far away from its IP-VRF
   instance, an EVPN VPWS ([RFC8214]) can be deployed between that
   VRF-AC and its IP-VRF instance.  From the viewpoint of the IP-VRF
   instance, a local virtual interface takes the place of that remote
   "VRF-AC".  The IP address of that VRF-AC is now configured on the
   virtual interface; in other words, the virtual interface is the
   actual VRF-AC of the IP-VRF instance.  The virtual interface is also
   the AC of the VPWS instance; in other words, the virtual interface
   is cross-connected to that remote "VRF-AC" by the VPWS instance.

   The requirements of this scenario are described in Section 1.1.

1.1.  Integrated Routing and Cross-connecting

   When an IP-VRF instance and an EVPN VPWS instance are connected by a
   virtual interface, the scenario is called the Integrated Routing and
   Cross-connecting (IRC) use case, and the virtual interface that
   connects the EVPN VPWS and the IP-VRF is called an IRC interface.
   The name reflects that packets received from the virtual interface
   are routed in the IP-VRF, while data packets sent to the virtual
   interface are cross-connected to the remote AC of that EVPN VPWS.

   The IRC use case is illustrated by the following figure:

                          PE1
                +---------------------+
                |      IRC1=10.9      |
                | +-----+   +------+  |.
               .| |VPWS1|---|IPVRF1|  | .
              . | +-----+   +------+  |  .
     PE4     .  |                     |   .    PE3
  +--------+.   +---------------------+    +---------+
  |        |              |                |         |
  |+-----+ |              | RT-2           |+------+ |
  ||VPWS1| |              | <10.2, M1>     ||IPVRF1| |
  |+-----+ |              | label2=IPVRF1  |+------+ |
  |   |    |              | label1=VPWS1   |   |     |
  +---|----+.             | RT=VPWS1      .+---|-----+
      |      .   PE2      V              .     |
      |       . +---------------------+ .      |
      |        .|      IRC1=10.9      |.       |
   N1=10.2      | +-----+   +------+  |      N3=30.2
                | |VPWS1|---|IPVRF1|  |
  Behind N1:    | +-----+   +------+  |
   60.0/24      |                     |
   70.0/24      +---------------------+

           Figure 1: ARP/ND Synchronizing for IRC Interfaces

   There are four PE nodes, named PE1, PE2, PE3 and PE4, in the above
   network.  PE4 is a pure EVPN VPWS PE; there may be no IP-VRFs on it.
   PE3 is a pure L3 EVPN PE; there may be no VPWSes or MAC-VRFs on it.
   PE1 and PE2 are on the border between the EVPN VPWS domain and the
   L3 EVPN domain, so each of them is both an EVPN VPWS PE and an L3
   EVPN PE, and both EVPN IP-VRFs and EVPN VPWSes exist on them.

   N1/N2/N3/N1b may each be a host or an IP router.  N1, N1b and IRC1
   are in the subnet 10.0.0.0/24, where N1's IP is 10.0.0.2, N1b's IP
   is 10.0.0.3 and IRC1's IP is 10.0.0.9 (10.9).  N2 and IRC2 (see
   Figure 3) are in the subnet 20.0.0.0/24, where N2's IP is 20.0.0.2
   and IRC2's IP is 20.0.0.9 (20.9).  N3 is in the subnet 30.0.0.0/24.

   When N1/N2/N3/N1b is a host, it is also called H1/H2/H3/H1b in this
   document.  When N1/N2/N3/N1b is a router, it is also called
   R1/R2/R3/R1b in this document.  The MAC addresses of N1/N2/N3/N1b
   are M1/M2/M3/M1b respectively.

   When N1 is a router, there are two subnets behind N1: 60.0/24 and
   70.0/24.

   Note that there may be L2 switches between N1/N2/N3/N4 and their
   PEs.  These switches are not shown in Figure 1.

   Note that the IRC interfaces are considered AC interfaces in EVPN
   VPWS instances.  At the same time, they are considered VRF-ACs in
   IP-VRF instances.

   When N1 sends an ARP Request REQ_P1, PE4 forwards REQ_P1 to either
   PE1 or PE2, not to both.  The IRC1 interfaces on PE1 and PE2 are
   both N1's subnet gateway (SNGW).
   But when N3 sends an ARP Reply REP_P2 to N1, PE3 may load-balance
   REP_P2 to either PE1 or PE2, not to both.  If REQ_P1 is
   load-balanced to PE1 (not to PE2) but PE3 load-balances REP_P2 to
   PE2, the ARP entry of N1 will not be ready on PE2 for REP_P2, so the
   forwarding of REP_P2 will be delayed by the ARP miss.

   We use RT-2 routes to advertise the ARP entry of N1 from PE2 to PE3.

   Note that an ESI may be assigned to IRC1 and IRC2, but in some
   scenarios it is not necessary to advertise that ESI in the L3 EVPN
   domain.  In such scenarios the ESI may be advertised in the EVPN
   VPWS domain only.

1.2.  Terminology

   Most of the terminology used in this document comes from [RFC7432]
   and [I-D.ietf-bess-evpn-prefix-advertisement], except for the
   following:

   *  VRF AC: VRF Attachment Circuit, an Attachment Circuit (AC) that
      attaches a CE to an IP-VRF.  It is defined in [RFC4364].

   *  IRC: Integrated Routing and Cross-connecting.  An IRC interface
      is the virtual interface connecting an IP-VRF and an EVPN VPWS.

   *  L3 EVI: An EVPN instance spanning the Provider Edge (PE) devices
      participating in that EVPN, which contains VRF ACs and may
      contain IRB interfaces or IRC interfaces.

   *  IP-AD/EVI: Ethernet Auto-Discovery route per EVI, where the EVI
      is an IP-VRF.

   *  IP-AD/ES: Ethernet Auto-Discovery route per ES, where the EVI for
      one of its route targets is an IP-VRF.

   *  CE-BGP: The BGP session between a PE and a CE.  Note that a
      CE-BGP route does not have an RD or a Route Target.

   *  RMAC: Router's MAC, which is signaled in the Router's MAC
      extended community.

   *  RT-2R: A MAC/IP Advertisement Route used in the context of an
      IP-VRF is called an RT-2R in this draft.

   *  RT-5E: An EVPN Prefix Advertisement Route with a non-reserved
      ESI.

   *  RT-5G: An EVPN Prefix Advertisement Route with a zero ESI and a
      non-zero GW-IP.
   *  RT-5L: An EVPN Prefix Advertisement Route with both a zero ESI
      and a zero GW-IP, but a valid MPLS label.

   *  SOI: Supplementary Overlay Index (see Section 6.3.3).  The SOI is
      used together with an ESI to select IP A-D per EVI routes.

   *  Internal Remote PE: PEx is called an internal remote PE of an
      EVPN route ERy when PEx is on the ES identified by ERy's ESI
      field.  When ERy's SOI is not zero, this also means that PEx is
      attached to the Ethernet tag identified by the <ESI, SOI>.

   *  External Remote PE: PEx is called an external remote PE of an
      EVPN route ERy when PEx is not on the ES identified by ERy's ESI
      field.  When ERy's SOI is not zero, PEx may also be a PE that is
      not attached to the Ethernet tag identified by the <ESI, SOI>.

   *  CE-Prefix: When an IP prefix can be reached through CEx from PEy,
      that IP prefix is called PEy's CE-prefix behind CEx in this
      draft, or PEy's CE-prefix for short.

   *  Common CE-Prefix: When a CE-Prefix can be reached through either
      CEy or CEz from PEy, it is called a common CE-Prefix of CEy and
      CEz, from the viewpoint of PEy.

   *  Exclusive CE-Prefix: When a CE-Prefix of PEy can be reached
      through CEy and cannot be reached through any other CE of PEy, it
      is called an exclusive CE-Prefix of CEy, from the viewpoint of
      PEy.

   *  SNGW: Subnet-specific Gateway IP address.  The SNGW of a subnet
      is the IP address that the hosts of that subnet use as the
      nexthop of their default route.

   *  Intermediate subnet: The subnet that connects a PE and a CE of an
      L3 EVI.

   *  Intermediate SNGW: The SNGW of an intermediate subnet.  In this
      draft it is the IP address of an IRC interface.

   *  Intermediate nexthop: The CE's IP address in the intermediate
      subnet.
   *  Overlay nexthop: The CE-Prefix's nexthop IP address, which is in
      the address space of the L3 EVI.

   *  Original Overlay nexthop: The overlay nexthop that is advertised
      by the CE through a PE-CE routing protocol.

2.  ARP/ND Synching and IP Prefix Synching

   IP-MAC relations of hosts are learnt by PEs on the access side via a
   control plane protocol such as ARP.  When N1 is multihomed to
   multiple L3 EVPN PE nodes by an All-Active EVPN VPWS, N1's host
   IP/MAC is learnt, and advertised in a MAC/IP Advertisement Route,
   only by the PE that receives the ARP packet.  The MAC/IP
   Advertisement with a non-zero ESI is received by the other
   multihomed PEs.

   As a result, after PE2 receives the MAC/IP Advertisement and imports
   it into the VPWS Service Instance, PE2 installs an ARP entry on the
   VPWS Service Instance's IRC interface.  Such an ARP entry is called
   a remote synched ARP entry in this document.

   Note that PE3 follows the DGW1 behavior of Section 4.1 of
   [I-D.ietf-bess-evpn-prefix-advertisement] to achieve load balancing
   based on recursive route resolution by the GW-IP Overlay Index.

   When PE3 load-balances traffic towards PE1/PE2, both PE1 and PE2
   will already hold the corresponding ARP entry, thanks to the ARP
   synching procedures below.

2.1.  Constructing MAC/IP Advertisement Route

   The CEs may be hosts or routers, and this influences how the MAC/IPs
   of these CEs should be advertised.

   *  The CEs are Hosts - In this case, there may be many hosts in the
      subnet of an IRC interface.  It is not necessary for the MAC/IP
      routes of these hosts to be imported by their external remote PEs
      (e.g. PE3).  These MAC/IP routes only need to be imported by
      their internal remote PEs (e.g. PE1/PE2).

   *  The CEs are Routers - In this case, there may be few routers in
      the subnet of an IRC interface.
      The MAC/IP routes of these routers should be imported by their
      external remote PEs (e.g. PE3), because the GW-IP of the RT-5G
      routes (see Section 2.3) of the CE-prefixes behind these routers
      has to be resolved to these MAC/IP routes.

   This draft introduces a new usage/construction of the MAC/IP
   Advertisement route to enable ARP/ND synching for IP addresses in
   EVPN IRC use cases.  The usage/construction of this route remains
   similar to that described in
   [I-D.ietf-bess-evpn-inter-subnet-forwarding], with a few notable
   exceptions as described below.

2.1.1.  When CEs are Hosts

   *  The Route-Distinguisher should be set to the corresponding EVPN
      VPWS context.

   *  The Ethernet Tag should be set to the VPWS Service Instance
      Identifier of the IRC interface.

   *  The MAC/IP Advertisement SHOULD carry one EVI-RT (for the EVPN
      VPWS instance) and one ES-Import RT (for the ESI of the IRC
      interface).

   *  The ESI can be set to the ESI of the IRC interface or the I-ESI
      of VPWS1's L2 EVI.

      Note that the receiver uses the ESI and Ethernet Tag ID to
      determine the VPWS Service Instance on whose IRC interface the
      synced ARP entry will be installed.

      Note that VPWS1 and VPWS2 are two VPWS Service Instances of the
      same L2 EVPN Instance, so they have different VPWS Service
      Instance Identifiers.  An I-ESI can then be assigned to that L2
      EVI.  The ESI of the Ethernet A-D per EVI routes for these two
      VPWS Service Instances is set to this I-ESI.  The Ethernet Tag ID
      of each of these Ethernet A-D per EVI routes (for the EVPN VPWS
      domain) is set to its VPWS Service Instance ID.

   *  The MPLS Label1 should be set to the label of the .

2.1.2.  When CEs are Routers

   *  Route-Distinguisher: The RD of VPWS1's EVI.

   *  Ethernet Tag ID: The same as Section 2.1.1.

   *  SOI: The same as the ET-ID of Section 2.1.1.

   *  Route Target: IPVRF1's export RTs and the EVPN VPWS's export RTs.

   *  ESI: The same as Section 2.1.1.
   *  MPLS Label1: The same as Section 2.1.1.

   *  MPLS Label2: The MPLS Label2 should be set to IPVRF1's EVPN
      label.

   *  RMAC: The Router's MAC Extended Community attribute SHOULD be
      carried in VXLAN EVPN.

2.2.  Constructing Ethernet A-D Route

   When CEs are hosts, the ESI of the IRC interface is mainly used in
   the EVPN VPWS domain.  That ESI typically has nothing to do with the
   fundamental function of the L3 EVPN domain.

   Note that PE3 or PE4 will not import an RT-2 route with an ES-Import
   RT that it does not recognize.

   Note that the Ethernet A-D route advertisement in the EVPN VPWS
   domain still follows [RFC8214].  The IRC interface is considered an
   ordinary AC in the EVPN VPWS domain.

   When CEs are routers, the of the RT-2R route for the GW-IP of the
   RT-5G routes will be used to do recursive resolution.  Thus a
   corresponding IP A-D per EVI route should be advertised for the IRC1
   interface in the context of IPVRF1:

   *  Route-Distinguisher: IPVRF1's RD.

   *  Ethernet Tag ID: IRC1 interface's local VPWS service instance ID.

   *  Route Target: IPVRF1's export RT.

   *  ESI: IRC1's ESI or the I-ESI of VPWS1's L2 EVI.

   *  MPLS Label: IPVRF1's EVPN label.

   *  RMAC: The Router's MAC Extended Community should be set as per
      [I-D.sajassi-bess-evpn-ip-aliasing].

2.3.  Constructing IP Prefix Advertisement Route

   There may be two types of IP prefixes on PE1/PE2: direct-prefixes
   and CE-prefixes.  The direct-prefixes are the subnets of the PE's
   own interfaces (e.g. the intermediate subnet of the IRC interface).
   The CE-prefixes are the prefixes behind the CE node N1 (especially
   when N1 is a router).

2.3.1.  Direct-Prefixes Advertisement

   PE1/PE2 can install synced ARP entries on the proper IRC interface
   thanks to the RT-2 routes of Section 2.  This ensures that both PE1
   and PE2 know all hosts of the IRC interface's own subnet.
   So it is not necessary for PE1/PE2 to advertise per-host IP prefixes
   of that subnet to PE3 by RT-2 routes.  It is recommended that
   PE1/PE2 advertise a single RT-5L route for that subnet to PE3
   instead.  The ESI of these RT-5 routes can simply be set to zero,
   because when PE3 receives such RT-5 routes from both PE1 and PE2, it
   can treat them as ECMP or FRR even though their ESI is zero.

2.3.2.  Exclusive CE-Prefixes of Each CE

   There may be two types of CE-Prefixes on PE1/PE2: the common
   CE-prefixes (e.g. SN9) of R1 and R2, and the exclusive CE-prefixes
   (which can only be reached through one specific CE) of R1 or R2.
   The exclusive CE-Prefixes are discussed first; the common
   CE-prefixes are discussed in Section 6.

   Note that N1 may be a host or a router.  When it is a router, there
   may be some prefixes behind N1 on PE1.  Those prefixes are learnt
   via a PE-CE routing protocol (e.g. CE-BGP).  N1's IP address may be
   considered the overlay nexthop of those prefixes.  The overlay
   nexthop of those prefixes is carried in the RT-5 route's GW-IP
   field.  Those RT-5 routes are called RT-5G routes because their
   Overlay Indexes are their GW-IPs (and their ESI and label are zero).

   Note that these RT-5G routes are advertised by PE1 to both PE2 and
   PE3.

   If the IRC1 interface of PE1 fails, these CE-prefixes converge
   faster on PE3 through the withdrawal (by PE1) of the corresponding
   IP A-D per EVI route.

   Note that when PE3 receives the withdrawal of the RT-2R of 10.2 from
   PE1, and that RT-2R is the only RT-2R of 10.2, and the of the RT-2R
   can be resolved to an IP A-D per EVI route from another PE (e.g.
   PE2), PE3 should trigger a delayed deletion of that RT-2R, so that
   the ARP/ND refresh can happen on PE2 before the deletion.

3.  Packet Walk Through

   The procedures for local/remote host learning and for constructing
   the MAC/IP Advertisement route are described above.

3.1.  When CEs are Hosts

   When N3 sends a data packet P301 to 10.2, which is a host in the
   subnet of IRC1, P301 matches prefix 10.0/24 on PE3.

   Both PE1 and PE2 have advertised the RT-5L route of 10.0/24 to PE3.
   PE3 may treat them as ECMP or FRR, depending on their route
   attributes.

   PE3 then forwards P301 to PE1 or PE2, depending on the ECMP/FRR
   procedures.  Assume that PE2 receives P301 from PE3.

   The outgoing interface for P301 (whose destination IP is 10.2) is
   the IRC1 interface.  The destination MAC is found from the ARP
   entries on IRC1.

   The ARP entry for 10.2 is a synched ARP entry, because N1 sent the
   ARP Request only to PE1.  It is installed on the IRC1 interface
   because the RT-2 route's route target matches VPWS1's L2 EVI and the
   RT-2 route's <ESI, Ethernet Tag ID> matches the IRC1 interface's ESI
   and VPWS Service Instance ID.

   P301 is then encapsulated with an Ethernet header and becomes an
   Ethernet packet P301E.  The destination MAC address of P301E is N1's
   MAC address, which is determined by that ARP entry.  The source MAC
   address of P301E is IRC1's MAC address.  P301E is then sent over the
   IRC1 interface.

   After P301E is sent over the IRC1 interface, it is forwarded to PE4
   in the EVPN VPWS instance according to [RFC8214].

3.2.  When CEs are Routers

   When N3 sends a data packet P301b to a host 60.1 located behind
   R1 (N1), P301b matches prefix 60.0/24 on PE3.

   The RT-5G route for 60.0/24 is used to forward P301b.  The GW-IP of
   that RT-5G route is 10.2 (R1), so PE3 uses 10.2 to do recursive
   route resolution and matches the RT-2R route of 10.2.  Note that the
   recursive route resolution follows the DGW1 behavior of Section 4.1
   of [I-D.ietf-bess-evpn-prefix-advertisement].

   Both PE1 and PE2 have advertised the IP A-D per EVI route for the of
   the RT-2R route of 10.2.  PE3 may treat them as ECMP or FRR,
   depending on whether the ESI is all-active or single-active.
   PE3 can then forward P301b to PE1 or PE2, depending on the ECMP/FRR
   procedures.  Assume that PE2 receives P301b from PE3.

   The destination IP of P301b is in prefix 60.0/24.  That prefix has
   been installed into IPVRF1 on PE2.  PE2 previously received that
   prefix either from a PE-CE routing protocol or from an RT-5G route
   from PE1.

   The overlay nexthop or GW-IP of prefix 60.0/24 is 10.2, which is a
   host in IRC1's subnet.  The outgoing interface for P301b is the IRC1
   interface.  The ARP entry for 10.2 is found in the same way as in
   Section 3.1, the Ethernet header is added in the same way as in
   Section 3.1, and the packet is forwarded to PE4 in the same way as
   in Section 3.1.

4.  Fast Convergence for Routed Traffic

   When the IRC1 interface goes down, PE1 withdraws the RT-5L route of
   10.0/24, and the RT-5G routes of 60.0/24 and 70.0/24 are just
   changed to the stale state.

   When PE3 receives the withdrawal of that RT-5L route, it stops
   forwarding the data packets of those two subnets to PE1, but it
   continues to forward these data packets to PE2.

5.  Considerations on ABRs and Route Reflectors

   When an ABR or ASBR receives a MAC/IP Advertisement Route that
   contains both an EVI-RT and an ES-Import RT, it should re-advertise
   that route even if the route's MPLS label1 is null (it should not
   consider that route malformed).  When it changes that route's
   nexthop to itself, it does not have to allocate a new label for each
   RT-2 route's MPLS label1 field separately.  That field can be
   rewritten to a single preconfigured MPLS label that blackholes the
   data packets received with it.  The MPLS label2 field (if it is not
   null), however, should be rewritten normally along with the nexthop
   rewriting.

6.  For Common CE-prefixes behind R1 and R2

   Assume that there is a common prefix (SN9) and two exclusive
   prefixes (SN7 and SN8).
   SN9 is behind both R1 and R2; SN7 is particular to R1, while SN8 is
   particular to R2.  In other words, PE5 can reach SN9 through either
   R1 or R2.

6.1.  Solution 1: Independent CE-BGP sessions

   R1 and R2 do not know which prefixes are their common prefixes and
   which are their exclusive prefixes.  So R1 establishes its own
   CE-BGP session S1 to PE1, and R2 establishes its own CE-BGP session
   S2 to PE2.  When R1 (or R2) advertises IP prefixes to PE1 (or PE2),
   the BGP next hop of these prefixes is set to R1's (or R2's) IP
   address in IRC1's (or IRC2's) subnet.

PE4 +-----------------------+ +-------------+ PE1 | | SN7 | VPWS1 | +------+------+ ----------> | + | +---------+ | | (VPN1) | RT5(SN9) | | | | _|_|_________|__ / IRC1 | GW-IP=R1 | | | | PW1 / | | P | (VPWS1) | RT2(R1,M1) | +-R1----O=====< | | +-----+-------+ |PE3 | | | \_|_|___ | +--+---+ | | | | | B \ +-----+-------+ | | | | +---------+ | \____|__(VPWS1) | | | | | | | \ IRC1 | ----------> | | SN9 | | |PE5 (VPN1) | RT2(R1,M1) |(VPN1)--R3 | | VPWS2 | ____|__ / IRC2 | RT2(R2,M2) | | | | +---------+ | / | (VPWS2) | | | | | | _|_|___/ +-----+-------+ | | | | | PW2 / | | B | +--+---+ +-R2----O=====< | | +-----+-------+ | | | | \_|_|_________|__(VPWS2) | ----------> | | | | | | P | \ IRC2 | RT5(SN9) | + | +---------+ | | (VPN1) | GW-IP=R2 | SN8 | | +------+------+ RT2(R2,M2) | +-------------+ PE2 | | +-----------------------+

        Figure 2: Common CE-Prefixes and Exclusive CE-Prefixes

   In this case, the route advertisement is the same as in Section 2
   (provided that the CEs are routers).

   Note that according to the recursive route resolution behavior of
   Section 4.1 of [I-D.ietf-bess-evpn-prefix-advertisement], if both
   RT-5G routes of SN9 are equally preferable and ECMP is enabled, SN9
   is added to the routing table with both Overlay Index 10.2 and
   Overlay Index 20.2.

6.2.  Solution 2: ECMP-Merging for RT-5G routes

   In some scenarios, R1 and R2 do not have any exclusive prefixes
   (e.g. SN7 or SN8 in Figure 2) at all; in other words, all of their
   prefixes are common prefixes.  In such a case, when R1 advertises
   SN9 to PE1 over CE-BGP session S1, 10.2 may not be the best choice
   for SN9's BGP next hop.

+-----------------------+ PE1 | | +------+------+ ----------> | | (VPN1) | RT5(SN9) | _____________|__ / IRC1 | GW-IP=IP201 | PW1 / P | (VPWS1) | RT2(IP201) | +-R1----O=====< +-----+-------+ MAC201 |PE3 | VPWS1 \_______ | +--+---+ | B \ +-----+-------+ | | | \____|__(VPWS1) | | | | | \ IRC1 | ----------> | | SN9 |PE5 (VPN1) | RT2(IP201) |(VPN1)--R3 | ____|__ / IRC2 | MAC201 | | | / | (VPWS2) | | | | _______/ +-----+-------+ | | | PW2 / B | +--+---+ +-R2----O=====< +-----+-------+ | VPWS2 \_____________|__(VPWS2) | ----------> | P | \ IRC2 | RT2(IP201) | | (VPN1) | MAC201 | +------+------+ | PE2 | | +-----------------------+

              Figure 3: IP Aliasing of Common CE-Prefixes

   In such a case, a common anycast loopback address (say IP201, whose
   value is 7.7.7.7) can be configured on R1 and R2.  Then, when R1
   advertises SN9 to PE1, R1 chooses IP201 as the BGP next hop of the
   advertisement.  Thus the RT-5G of SN9 from PE1 is advertised with
   GW-IP=IP201.

   In such a case, a static route for IP201 can be configured in VPN1
   on PE1, PE2 and PE5.  The static route on PE1 (called SRE1) uses NH1
   as its overlay next hop.  The static route on PE2 (called SRE2) uses
   NH2 as its overlay next hop.  The static route on PE5 (called SRE5)
   uses both NH1 and NH2 as its overlay next hops.

   If SRE1, SRE2 and SRE5 are advertised by RT-5G routes too, the
   recursive resolution becomes complicated.  There are two ways to
   simplify the recursive resolution.

6.2.1.  ECMP-Merging by RT-5L

   Note that IRC1 and IRC2 are on the same I-ES (say ESI512).  Thus
   10.2 (say NH1) and 20.2 (say NH2) are behind different Ethernet Tags
   of the same I-ESI.  Assume that the ET-ID of IRC1 is ETI100, while
   the ET-ID of IRC2 is ETI200.  Thus 10.2 is behind <ESI512, ETI100>,
   while 20.2 is behind <ESI512, ETI200>.

   Each of the three PEs then separately advertises an RT-5L route (say
   RT5L_201, whose ESI is zero) for IP201 (in fact it is for SRE1, SRE2
   or SRE5).

   An RT-5G route is then advertised for SN9 (say RT5G_SN9).  The GW-IP
   of RT5G_SN9 is IP201, its ESI is 0 and its ET-ID is 0.

   When PE3 receives RT5G_SN9 and RT5L_201, the GW-IP of RT5G_SN9 can
   be resolved to RT5L_201.  The corresponding data packets of RT5G_SN9
   are then forwarded according to IP201's ECMP paths, which are formed
   by the corresponding RT-5L routes.

   Note that this approach can be used to merge the two ECMP path
   collections (e.g. s and s) for the CE-prefixes (e.g. SN9) behind a
   specified anycast IP address (e.g. 7.7.7.7 or IP201, which is the IP
   address of a loopback interface).

6.2.2.  ECMP-Merging by RT-2R

   An RT-2 route (say RT2R_201) can be substituted for RT5L_201
   (Section 6.2.1).  The IP address of RT2R_201 is IP201, its MAC
   address is MAC201, its RD is VPN1's RD, its ESI is 0 and its ET-ID
   is 0.  Such RT-2 routes MUST NOT carry any Route Targets of a
   Broadcast Domain.  The MPLS Label2 field should be set to VPN1's
   EVPN label, so in VXLAN EVPN the RMAC should be set to the PE's MAC
   address.  The MPLS Label1 field should be set to a value that is
   pre-configured for all such RT-2 routes.

   Note that MAC201 is a pre-configured MAC address for IP201, and
   MAC201 MUST be advertised along with the Sticky flag.

   Note that the differences between RT2R_201 and RT5L_201 exist only
   in the control plane.  Once they are installed into the FIB of VPN1
   in the data plane, they are the same.
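
   The resolution step of Section 6.2.1 can be sketched in Python as
   follows.  This is only an illustrative model under simplifying
   assumptions, not a normative procedure: the names Rt5L and
   resolve_gw_ip, the label values, and the extra PE9 route are
   hypothetical, and route preference, FRR and withdrawals are ignored.
   The GW-IP of RT5G_SN9 is looked up by longest-prefix match among the
   received RT-5L routes, and every PE that advertised the matching
   prefix contributes one ECMP path.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rt5L:
    """Hypothetical model of a received RT-5L route (zero ESI, zero GW-IP)."""
    prefix: str   # advertised IP prefix, e.g. the host route of IP201
    pe: str       # advertising PE (BGP next hop)
    label: int    # MPLS label carried in the route (made-up values below)


def resolve_gw_ip(gw_ip, rt5l_routes):
    """Return the ECMP paths (pe, label) that the GW-IP resolves to.

    All RT-5L routes whose prefix is the longest match for gw_ip are kept,
    so the same prefix advertised by several PEs yields several paths.
    """
    addr = ipaddress.ip_address(gw_ip)
    parsed = [(ipaddress.ip_network(r.prefix), r) for r in rt5l_routes]
    covering = [(net, r) for net, r in parsed if addr in net]
    if not covering:
        return []
    longest = max(net.prefixlen for net, _ in covering)
    return sorted({(r.pe, r.label) for net, r in covering
                   if net.prefixlen == longest})


# RT5L_201 as advertised by PE1, PE2 and PE5 for IP201 (7.7.7.7).
received = [
    Rt5L("7.7.7.7/32", "PE1", 1001),
    Rt5L("7.7.7.7/32", "PE2", 2001),
    Rt5L("7.7.7.7/32", "PE5", 5001),
    Rt5L("7.7.7.0/24", "PE9", 9001),  # hypothetical less-specific route
]

# RT5G_SN9 carries GW-IP = IP201; PE3 resolves it to the merged ECMP set.
paths_for_sn9 = resolve_gw_ip("7.7.7.7", received)
print(paths_for_sn9)
```

   In this model the merged path set for SN9 contains one entry per PE
   that advertised RT5L_201, which is the ECMP-merging effect the
   section describes.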

6.3.  Solution 3: RT-5E Routes Advertisement

   For direct-prefixes and exclusive CE-prefixes behind each CE, no
   ESIs need to be advertised along with them.  For the common
   CE-prefixes behind R1 and R2, however, a virtual ESI can be used to
   achieve the ECMP-merging.

6.3.1.  CE-Prefix Advertisement by RT-5E Routes

   This use case differs from Section 6.2 in the following ways:

   *  There are common prefixes behind R1 and R2, but there are also
      other prefixes which can only be reached through R1 or R2.

   *  For the common prefixes behind R1 and R2, the combination of R1
      and R2 can be considered a vRouter whose two LPUs are R1 and R2.
      Note that the vRouter is a logical entity that exists only for
      the common prefixes behind R1 and R2; it should not be used for
      other prefixes.

   *  The CE-prefixes are IPv6 prefixes with an IPv6 nexthop (NH21).

   *  The vRouter is identified by VR621 (Virtual Router-ID 621).

   *  VR621 can be mapped to form an IPv6 address VRID_IP.  The VRID_IP
      is selected from a 96-bit IPv6 prefix VRID_Prefix, and the lowest
      32 bits of the VRID_IP may be set to a constant X.  The lowest 32
      bits of the VRID_Prefix (of those 96 bits) should be set to
      VR621.
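
   As a rough illustration of the mapping in the last item above, the
   following Python sketch derives VRID_Prefix and VRID_IP from a VRID.
   It assumes that the upper 64 bits come from an operator-chosen /64
   (here the documentation prefix 2001:db8:0:1::/64), that the VRID is
   an unsigned 32-bit integer, and that the constant X is 1.  The
   function names and these values are illustrative only and are not
   defined by this document.

```python
import ipaddress


def vrid_prefix(base, vrid):
    """Build the 96-bit VRID_Prefix: a 64-bit base followed by the VRID."""
    net = ipaddress.ip_network(base)
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError("base must be an IPv6 /64 prefix")
    if not 0 <= vrid < 2**32:
        raise ValueError("VRID must fit in 32 bits")
    # The VRID occupies the lowest 32 bits of the 96-bit prefix.
    value = int(net.network_address) | (vrid << 32)
    return ipaddress.IPv6Network((value, 96))


def vrid_ip(base, vrid, x):
    """Build VRID_IP: VRID_Prefix with the lowest 32 bits set to X."""
    if not 0 <= x < 2**32:
        raise ValueError("X must fit in 32 bits")
    prefix = vrid_prefix(base, vrid)
    return ipaddress.IPv6Address(int(prefix.network_address) | x)


BASE = "2001:db8:0:1::/64"   # assumed operator-chosen upper 64 bits
print(vrid_prefix(BASE, 621))
print(vrid_ip(BASE, 621, 1))
```

   Because the VRID sits directly above the constant X, a PE that knows
   VRID_Prefix's length and X can recover the VRID from a VRID_IP, and
   the reverse.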

+-------------------------------+ PE1 | | +------+--------+ ----------------> | vRouter | (VRF1:vES) | RT5(SN9) | +---------+ _______|__ / IRC1 | ESI=vES251 | | | VPWS1 / P | (VPWS1) | SOI=VR612 | | R1---+---O==< +-----+---------+ RT1(vES251,VR612) |PE3 | | \___ | Label=VRF1 +--+--+ | | B \ +-----+---------+ | | | VR612 | \__|__(VPWS1) | | | | | | \ IRC1 | ----------------> | | | | PE5 | (VRF1:vES) | RT1(vES251,VR612) | | | | __|__ / IRC2 | Label=VRF1 | | | (NH612) | / | (VPWS2) | | | | | ___/ +-----+---------+ | | | | VPWS2 / B | +--+--+ | R2---+---O==< +-----+---------+ | | | \_______|__(VPWS2) | ----------------> | +---+-----+ P | \ IRC2 | RT1(vES251,VR612) | | | (VRF1:vES) | Label=VRF1 | | +------+--------+ | + PE2 | | SN9(Common Prefix) +-------------------------------+

                       Figure 4: VRID as ET-ID

   *  SOI-mapping Route per each VRID

      A special static route (called an SOI-mapping route) is
      configured for prefix VRID_Prefix on PE1, PE2 and PE5.  These
      routes are VRID_MR1 (VRID Mapping Route 1), VRID_MR2 and VRID_MR5
      respectively.  VRID_MR1's nexthop is IP102 of R1, which is
      allocated from IRC1's subnet.  VRID_MR2's nexthop is IP202 of R2,
      which is allocated from IRC2's subnet.  VRID_MR5's nexthops are
      both IP102 and IP202.

   *  I-ESI per L3 EVPN Instance

      An I-ESI (illustrated as vES251 in the figure) is then assigned
      to that L3 EVI.  Note that a single RT-1 per ES route is
      advertised for vES251, because vES251 is dedicated to that L3
      EVI.  The RT-4 route is advertised for DF election of vES251.
      AC-DF mode should be used for vES251.

   *  Ethernet Tag per each vRouter

      Each vRouter of that L3 EVI is considered to be attached to an
      Ethernet Tag of vES251.  The ET-ID of such an Ethernet Tag is the
      vRouter's VRID.
      The advertisement of the Ethernet A-D per EVI route is triggered
      on each PE by the SOI-mapping route (which represents the
      vRouter), where:

      -  RD: VRF1's RD.

      -  ESI: VRF1's I-ESI (vES251).

      -  ET-ID: The vRouter's VRID.

      -  MPLS Label: VRF1's EVPN label.

      -  Route Target: VRF1's eRT (export Route Target).

   *  RT-5E Route per each CE-Prefix

      The CE-prefixes are advertised using RT-5E routes instead of
      RT-5G routes.  When PE1 learns a CE-prefix SN9 from the CE-BGP
      session between PE1 and the vRouter, PE1 will advertise an RT-5E
      route RT5E_SN9, where:

      -  RD: VRF1's RD.

      -  Ethernet Tag ID: The ET-ID should be set to 0.

      -  ESI: VRF1's I-ESI (vES251).

      -  Supplementary Overlay Index: The VRID of the vRouter that
         advertises the CE-prefix.  The SOI can be carried in the
         IP-mapping SOI extended community.

      -  MPLS Label: VRF1's EVPN label.

      -  Route Target: VRF1's eRT (export Route Target).

6.3.1.1.  When Internal Remote PEs Receive the RT-5E

   PE5 receives RT5E_SN9, whose VRID_IP matches the local SOI-mapping
   route VRID_MR5.  VRID_MR5 indicates that RT5E_SN9 should be
   installed as if its overlay nexthop were the VRID_IP.  The VRID_IP
   can be inferred from the SOI, VRID_MR5, and the constant X.

6.3.1.2.  When External Remote PEs Receive the RT-5E

   PE3 receives RT5E_SN9, whose SOI does not match any local
   SOI-mapping route.  RT5E_SN9 should be installed (as FIB_Entry_6)
   with as its Overlay Index.

6.3.1.3.  Packet Walk Through

   When PE3 uses that RT-5E route to forward a data packet DP6, it
   follows [I-D.wang-bess-evpn-ether-tag-id-usage].  When PE2 receives
   DP6 from PE3, it forwards DP6 according to FIB_Entry_6.

6.3.2.  The Advertisement of SOI-mapping Routes

   VRID_MR1, VRID_MR2, and VRID_MR5 can be advertised using RT-5L
   routes along with the EVI-RT and ES-Import RT.  This precludes the
   external remote PEs from importing these routes into their IP-VRFs,
   because the routes do not have to be used on the external remote
   PEs.

6.3.3.
IP-mapping SOI Extended Community

   The IP-mapping SOI extended community is an extension of the
   Supplementary Overlay Index extended community.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type=0x06   | Sub-Type=TBD  |Type=4 |O|Z|F=1| Flags |V|G|Rsv|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IP-mapping SOI                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 5: IP-mapping SOI Extended Community

   Where:

   IP-mapping SOI:  A SOI that is derived from or mapped to an IP
      address, Router ID, static route, etc.

   V Flag:  IPv6 Flag.  When it is set to 1, it indicates that the SOI
      should be mapped to an overlay IPv6 nexthop on internal remote
      PEs; otherwise the SOI should be mapped to an overlay IPv4
      nexthop (whose value is the same as the IP-mapping SOI field) on
      internal remote PEs.

      When the V Flag is 1, on the internal remote PEs, the IP-mapping
      SOI will be mapped to an IPv6 address (like the VRID_IP in
      Section 6.3.1, Paragraph 2, Item 5) in the address space of the
      IP-VRF, and that address is then used in the same way as in the
      IPv4 case above.

      When the V Flag is zero, on the internal remote PEs, the
      IP-mapping SOI does not need to be mapped to an IPv6 address.

   F:  Format Indicator.  It is set to 1 to indicate that this is a
      type-specific SOI.

   Type:  The type code is 4, to indicate that this is an IP-mapping
      SOI.

   Rsv:  Reserved for future use.

   G Flag:  When the G Flag is zero, on the external remote PEs, the
      SOI-mapped IP address can be used as if it were the GW-IP field
      of the RT-5 route it belongs to, except that it is not required
      to find an RT-2 route (which is discussed in Appendix B.2 of
      [I-D.wang-bess-evpn-arp-nd-synch-without-irb]) before the
      recursive resolution.

      When the V Flag is 1, the SOI-mapped IP address is an IPv6
      address like the VRID_IP in Section 6.3.1, Paragraph 2, Item 5.
      When the V Flag is 0, the SOI-mapped IP address is the SOI
      itself.

      When the G Flag is set to 1, the advertising PE should advertise
      an RT-5L route for that SOI-mapped IP address, and the RT-5L
      route should not use the EVI-RT and ES-Import RT.

   Other fields:  The same as in
      [I-D.wang-bess-evpn-ether-tag-id-usage].

7.  Security Considerations

   TBD.

8.  IANA Considerations

   There are no IANA considerations.

9.  Normative References

   [I-D.ietf-bess-srv6-services]
              Dawra, G., Filsfils, C., Talaulikar, K., Raszuk, R.,
              Decraene, B., Zhuang, S., and J. Rabadan, "SRv6 BGP based
              Overlay Services", Work in Progress, Internet-Draft,
              draft-ietf-bess-srv6-services-07, 11 April 2021.

   [I-D.ietf-bess-evpn-prefix-advertisement]
              Rabadan, J., Henderickx, W., Drake, J., Lin, W., and A.
              Sajassi, "IP Prefix Advertisement in EVPN", Work in
              Progress, Internet-Draft,
              draft-ietf-bess-evpn-prefix-advertisement-11, 18 May
              2018.

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN", Work
              in Progress, Internet-Draft,
              draft-ietf-bess-evpn-inter-subnet-forwarding-15, 26 July
              2021.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015.

   [RFC8214]  Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
              Rabadan, "Virtual Private Wire Service Support in
              Ethernet VPN", RFC 8214, DOI 10.17487/RFC8214, August
              2017.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018.
   [I-D.wang-bess-evpn-ether-tag-id-usage]
              Wang, Y., "Ethernet Tag ID Usage Update for Ethernet A-D
              per EVI Route", Work in Progress, Internet-Draft,
              draft-wang-bess-evpn-ether-tag-id-usage-03, 26 August
              2021.

   [I-D.sajassi-bess-evpn-ip-aliasing]
              Sajassi, A., Badoni, G., Warade, P., Pasupula, S., Drake,
              J., and J. Rabadan, "EVPN Support for L3 Fast Convergence
              and Aliasing/Backup Path", Work in Progress,
              Internet-Draft, draft-sajassi-bess-evpn-ip-aliasing-02,
              8 June 2021.

10.  Informative References

   [I-D.wang-bess-evpn-arp-nd-synch-without-irb]
              Wang, Y. and Z. Zhang, "ARP/ND Synching And IP Aliasing
              without IRB", Work in Progress, Internet-Draft,
              draft-wang-bess-evpn-arp-nd-synch-without-irb-07,
              9 August 2021.

Authors' Addresses

   Yubao Wang
   ZTE Corporation
   No. 68 of Zijinghua Road, Yuhuatai District
   Nanjing
   China

   Email: wang.yubao2@zte.com.cn


   Zheng(Sandy) Zhang
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: zhang.zheng@zte.com.cn