Internet-Draft EVPN Applicability for NVO3 June 2022
Rabadan, et al. Expires 23 December 2022 [Page]
Workgroup:
NVO3 Workgroup
Internet-Draft:
draft-ietf-nvo3-evpn-applicability-04
Published:
Intended Status:
Informational
Expires:
Authors:
J. Rabadan, Ed.
Nokia
M. Bocci
Nokia
S. Boutros
Ciena
A. Sajassi
Cisco

Applicability of EVPN to NVO3 Networks

Abstract

In NVO3 networks, Network Virtualization Edge (NVE) devices sit at the edge of the underlay network and provide Layer-2 and Layer-3 connectivity among Tenant Systems (TSes) of the same tenant. The NVEs need to build and maintain mapping tables so that they can deliver encapsulated packets to their intended destination NVE(s). While there are different options to create and disseminate the mapping table entries, NVEs may exchange that information directly among themselves via a control-plane protocol, such as Ethernet Virtual Private Network (EVPN). EVPN provides an efficient, flexible and unified control-plane option that can be used for Layer-2 and Layer-3 Virtual Network (VN) service connectivity. This document describes the applicability of EVPN to NVO3 networks and how EVPN solves the challenges in those networks.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 23 December 2022.

Table of Contents

1. Introduction

In NVO3 networks, Network Virtualization Edge (NVE) devices sit at the edge of the underlay network and provide Layer-2 and Layer-3 connectivity among Tenant Systems (TSes) of the same tenant. The NVEs need to build and maintain mapping tables so that they can deliver encapsulated packets to their intended destination NVE(s). While there are different options to create and disseminate the mapping table entries, NVEs may exchange that information directly among themselves via a control-plane protocol, such as EVPN. EVPN provides an efficient, flexible and unified control-plane option that can be used for Layer-2 and Layer-3 Virtual Network (VN) service connectivity.

In this document, we assume that the EVPN control-plane module resides in the NVEs. The NVEs can be virtual switches in hypervisors, TOR/Leaf switches or Data Center Gateways. As described in [RFC7365], Network Virtualization Authorities (NVAs) may be used to provide the forwarding information to the NVEs, and in that case, EVPN could be used to disseminate the information across multiple federated NVAs. The applicability of EVPN would then be similar to the one described in this document. However, for simplicity, the description assumes control-plane communication among NVE(s).

2. EVPN and NVO3 Terminology

3. Why is EVPN Needed in NVO3 Networks?

Data Centers have adopted NVO3 architectures mostly due to the issues discussed in [RFC7364]. The architecture of a Data Center is nowadays based on a CLOS design, where every Leaf is connected to a layer of Spines, and there is a number of ECMP paths between any two leaf nodes. All the links between Leaf and Spine nodes are routed links, forming what we also know as an underlay IP Fabric. The underlay IP Fabric does not have issues with loops or flooding (like old Spanning Tree Data Center designs did), convergence is fast and ECMP provides a fairly optimal bandwidth utilization on all the links.

On this architecture and as discussed by [RFC7364] multi-tenant intra-subnet and inter-subnet connectivity services are provided by NVO3 tunnels, being VXLAN [RFC7348] or GENEVE [RFC8926] two examples of such tunnels.

Why is a control-plane protocol along with NVO3 tunnels required? There are three main reasons:

  1. Auto-discovery of the remote NVEs that are attached to the same VPN instance (Layer-2 and/or Layer-3) as the ingress NVE is.
  2. Dissemination of the MAC/IP host information so that mapping tables can be populated on the remote NVEs.
  3. Advanced features such as MAC Mobility, MAC Protection, BUM and ARP/ND traffic reduction/suppression, Multi-homing, Prefix Independent Convergence (PIC) like functionality, Fast Convergence, etc.

A possible approach to achieve points (a) and (b) above for multipoint Ethernet services, is "flood and learn". "Flood and learn" refers to not using a specific control-plane on the NVEs, but rather "flood" BUM traffic from the ingress NVE to all the egress NVEs attached to the same BD. The egress NVEs may then use data path MAC SA "learning" on the frames received over the NVO3 tunnels. When the destination host replies back and the frames arrive at the NVE that initially flooded BUM frames, the NVE will also "learn" the MAC SA of the frame encapsulated on the NVO3 tunnel. This approach has the following drawbacks:

EVPN provides a unified control-plane that solves the NVE auto-discovery, tenant MAP/IP dissemination and advanced features in a scalable way and keeping the independence of the underlay IP Fabric, i.e., there is no need to enable PIM in the underlay network and maintain multicast states for tenant BDs.

Section 4 describes how EVPN can be used to meet the control-plane requirements in an NVO3 network.

4. Applicability of EVPN to NVO3 Networks

This section discusses the applicability of EVPN to NVO3 networks. The intend is not to provide a comprehensive explanation of the protocol itself but give an introduction and point at the corresponding reference document, so that the reader can easily find more details if needed.

4.1. EVPN Route Types Used in NVO3 Networks

EVPN supports multiple Route Types and each type has a different function. For convenience, Table 1 shows a summary of all the existing EVPN route types and its usage. We will refer to these route types as RT-x routes throughout the rest of the document, where x is the type number included in the first column of Table 1.

Table 1: EVPN route types
Type Description Usage
1 Ethernet Auto-Discovery Multi-homing: Per-ES: Mass withdrawal, Per-EVI: aliasing/backup
2 MAC/IP Advertisement Host MAC/IP dissemination, supports MAC mobility and protection
3 Inclusive Multicast Ethernet Tag NVE discovery and BUM flooding tree setup
4 Ethernet Segment Multi-homing: ES auto-discovery and DF Election
5 IP Prefix IP Prefix dissemination
6 Selective Multicast Ethernet Tag Indicate interest for a multicast S,G or *,G
7 Multicast Join Synch Multi-homing: S,G or *,G state synch
8 Multicast Leave Synch Multi-homing: S,G or *,G leave synch
9 Per-Region I-PMSI A-D BUM tree creation across regions
10 S-PMSI A-D Multicast tree for S,G or *,G states
11 Leaf A-D Used for responses to explicit tracking

4.2. EVPN Basic Applicability for Layer-2 Services

Although the applicability of EVPN to NVO3 networks spans multiple documents, EVPN's baseline specification is [RFC7432]. [RFC7432] allows multipoint layer-2 VPNs to be operated as [RFC4364] IP-VPNs, where MACs and the information to setup flooding trees are distributed by MP-BGP [RFC4760]. Based on [RFC7432], [RFC8365] describes how to use EVPN to deliver Layer-2 services specifically in NVO3 Networks.

Figure 1 represents a Layer-2 service deployed with an EVPN BD in an NVO3 network.

                              +--TS2---+
                              *        | Single-Active
                              *        |   ESI-1
                            +----+  +----+
                            |BD1 |  |BD1 |
              +-------------|    |--|    |-----------+
              |             +----+  +----+           |
              |              NVE2    NVE3          NVE4
              |           EVPN NVO3 Network       +----+
         NVE1(IP-A)                               | BD1|-----+
        +-------------+      RT-2                 |    |     |
        |             |    +-------+              +----+     |
        |   +----+    |    |MAC1   |               NVE5     TS3
 TS1--------|BD1 |    |    |IP1    |              +----+     |
 MAC1   |   +----+    |    |Label L|--->          | BD1|-----+
 IP1    |             |    |NH IP-A|              |    | All-Active
        | Hypervisor  |    +-------+              +----+  ESI-2
        +-------------+                              |
              +--------------------------------------+
Figure 1: EVPN for L2 in an NVO3 Network - example

In a simple NVO3 network, such as the example of Figure 1, these are the basic constructs that EVPN uses for Layer-2 services (or Layer-2 Virtual Networks):

  • BD1 is an EVPN Broadcast Domain for a given tenant and TS1, TS2 and TS3 are connected to it. The five represented NVEs are attached to BD1 and are connected to the same underlay IP network. That is, each NVE learns the remote NVEs' loopback addresses via underlay routing protocol.
  • NVE1 is deployed as a virtual switch in a Hypervisor with IP-A as underlay loopback IP address. The rest of the NVEs in Figure 1 are physical switches and TS2/TS3 are multi-homed to them. TS1 is a virtual machine, identified by MAC1 and IP1. TS2 and TS3 are physically dual-connected to NVEs, hence they are normally not considered virtual machines.

4.2.1. Auto-Discovery and Auto-Provisioning

Auto-discovery is one of the basic capabilities of EVPN. The provisioning of EVPN components in NVEs is significantly automated, simplifying the deployment of services and minimizing manual operations that are prone to human error.

These are some of the Auto-Discovery and Auto-Provisioning capabilities available in EVPN:

  • Automation on Ethernet Segments (ES): an ES is defined as a group of NVEs that are attached to the same TS or network. An ES is identified by an Ethernet Segment Identifier (ESI) in the control plane, but neither the ESI nor the NVEs that share the same ES are required to be manually provisioned in the local NVE:

    • If the multi-homed TS or network are running protocols such as LACP (Link Aggregation Control Protocol) [IEEE.802.1AX_2014], MSTP (Multiple-instance Spanning Tree Protocol), G.8032, etc. and all the NVEs in the ES can listen to the protocol PDUs to uniquely identify the multi-homed TS/network, then the ESI can be "auto-sensed" or "auto-provisioned" following the guidelines in [RFC7432] section 5. The ESI can also be auto-derived out of other parameters that are common to all NVEs attached to the same ES.
    • As described in [RFC7432], EVPN can also auto-derive the BGP parameters required to advertise the presence of a local ES in the control plane (RT and RD). Local ESes are advertised using RT-4 routes and the ESI-import Route-Target used by RT-4 routes can be auto-derived based on the procedures of [RFC7432], section 7.6.
    • By listening to other RT-4 routes that match the local ESI and import RT, an NVE can also auto-discover the other NVEs participating in the multi-homing for the ES.
    • Once the NVE has auto-discovered all the NVEs attached to the same ES, the NVE can automatically perform the DF Election algorithm (which determines the NVE that will forward traffic to the multi-homed TS/network). EVPN guarantees that all the NVEs in the ES have a consistent DF Election.
  • Auto-provisioning of services: when deploying a Layer-2 Service for a tenant in an NVO3 network, all the NVEs attached to the same subnet must be configured with a MAC-VRF and the BD for the subnet, as well as certain parameters for them. Note that, if the EVPN service model is VLAN-based or VLAN-bundle, implementations do not normally have a specific provisioning for the BD (since it is in that case the same construct as the MAC-VRF). EVPN allows auto-deriving as many MAC-VRF parameters as possible. As an example, the MAC-VRF's RT and RD for the EVPN routes may be auto-derived. Section 5.1.2.1 in [RFC8365] specifies how to auto-derive a MAC-VRF's RT as long as VLAN-based service model is implemented. [RFC7432] specifies how to auto-derive the RD.

4.2.2. Remote NVE Auto-Discovery

Auto-discovery via MP-BGP [RFC4760] is used to discover the remote NVEs attached to a given BD, the NVEs participating in a given redundancy group, the tunnel encapsulation types supported by an NVE, etc.

In particular, when a new MAC-VRF and BD are enabled, the NVE will advertise a new RT-3 route. Besides other fields, the RT-3 route will encode the IP address of the advertising NVE, the Ethernet Tag (which is zero in case of VLAN-based and VLAN-bundle models) and also a PMSI Tunnel Attribute (PTA) that indicates the information about the intended way to deliver BUM traffic for the BD.

In the example of Figure 1, when BD1 is enabled, NVE1 will send an RT-3 route including its own IP address, Ethernet-Tag for BD1 and the PTA to the remote NVEs. Assuming Ingress Replication (IR) is used, the RT-3 route will include an identification for IR in the PTA and the VNI that the other NVEs in the BD must use to send BUM traffic to the advertising NVE. The other NVEs in the BD will import the RT-3 route and will add NVE1's IP address to the flooding list for BD1. Note that the RT-3 route is also sent with a BGP encapsulation attribute [RFC9012] that indicates what NVO3 encapsulation the remote NVEs should use when sending BUM traffic to NVE1.

Refer to [RFC7432] for more information about the RT-3 route and forwarding of BUM traffic, and to [RFC8365] for its considerations on NVO3 networks.

4.2.3. Distribution of Tenant MAC and IP Information

Tenant MAC/IP information is advertised to remote NVEs using RT-2 routes. Following the example of Figure 1:

  • In a given EVPN BD, TSes' MAC addresses are first learned at the NVE they are attached to, via data path or management plane learning. In Figure 1 we assume NVE1 learns MAC1/IP1 in the management plane (for instance, via Cloud Management System) since the NVE is a virtual switch. NVE2, NVE3, NVE4 and NVE5 are TOR/Leaf switches and they normally learn MAC addresses via data path.
  • Once NVE1's BD1 learns MAC1/IP1, NVE1 advertises that information along with a VNI and Next Hop IP-A in an RT-2 route. The EVPN routes are advertised using the RD/RTs of the MAC-VRF where the BD belongs. All the NVEs in BD1 learn local MAC/IP addresses and advertise them in RT-2 routes in a similar way.
  • The remote NVEs can then add MAC1 to their mapping table for BD1 (BT). For instance, when TS3 sends frames to NVE4 with MAC DA = MAC1, NVE4 does a MAC lookup on the BT that yields IP-A and Label L. NVE4 can then encapsulate the frame into an NVO3 tunnel with IP-A as the tunnel IP DA and L as the Virtual Network Identifier. Note that the RT-2 route may also contain the host's IP address (as in the example of Figure 1). While the MAC of the received RT-2 route is installed in the BT, the IP address may be installed in the Proxy-ARP/ND table (if enabled) or in the ARP/IP-VRF tables if the BD has an IRB. See Section 4.7.3 to see more information about Proxy-ARP/ND and Section 4.3. for more details about IRB and Layer-3 services.

Refer to [RFC7432] and [RFC8365] for more information about the RT-2 route and forwarding of known unicast traffic.

4.3. EVPN Basic Applicability for Layer-3 Services

[RFC9136] and [RFC9135] are the reference documents that describe how EVPN can be used for Layer-3 services. Inter Subnet Forwarding in EVPN networks is implemented via IRB interfaces between BDs and IP-VRFs. An EVPN BD corresponds to an IP subnet. When IP packets generated in a BD are destined to a different subnet (different BD) of the same tenant, the packets are sent to the IRB attached to the local BD in the source NVE. As discussed in [RFC9135], depending on how the IP packets are forwarded between the ingress NVE and the egress NVE, there are two forwarding models: Asymmetric and Symmetric model.

The Asymmetric model is illustrated in the example of Figure 2 and it requires the configuration of all the BDs of the tenant in all the NVEs attached to the same tenant. In that way, there is no need to advertise IP Prefixes between NVEs since all the NVEs are attached to all the subnets. It is called Asymmetric because the ingress and egress NVEs do not perform the same number of lookups in the data plane. In Figure 2, if TS1 and TS2 are in different subnets, and TS1 sends IP packets to TS2, the following lookups are required in the data path: a MAC lookup (on BD1's table), an IP lookup (on the IP-VRF) and a MAC lookup (on BD2's table) at the ingress NVE1 and then only a MAC lookup at the egress NVE. The two IP-VRFs in Figure 2 are not connected by tunnels and all the connectivity between the NVEs is done based on tunnels between the BDs.

               +-------------------------------------+
               |             EVPN NVO3               |
               |                                     |
             NVE1                                 NVE2
       +--------------------+            +--------------------+
       | +---+IRB +------+  |            |  +------+IRB +---+ |
 TS1-----|BD1|----|IP-VRF|  |            |  |IP-VRF|----|BD1| |
       | +---+    |      |  |            |  |      |    +---+ |
       | +---+    |      |  |            |  |      |    +---+ |
       | |BD2|----|      |  |            |  |      |----|BD2|----TS2
       | +---+IRB +------+  |            |  +------+IRB +---+ |
       +--------------------+            +--------------------+
               |                                     |
               +-------------------------------------+
Figure 2: EVPN for L3 in an NVO3 Network - Asymmetric model

In the Symmetric model, depicted in Figure 3, the same number of data path lookups is needed at the ingress and egress NVEs. For example, if TS1 sends IP packets to TS3, the following data path lookups are required: a MAC lookup at NVE1's BD1 table, an IP lookup at NVE1's IP-VRF and then IP lookup and MAC lookup at NVE2's IP-VRF and BD3 respectively. In the Symmetric model, the Inter Subnet connectivity between NVEs is done based on tunnels between the IP-VRFs.

               +-------------------------------------+
               |             EVPN NVO3               |
               |                                     |
             NVE1                                 NVE2
       +--------------------+            +--------------------+
       | +---+IRB +------+  |            |  +------+IRB +---+ |
 TS1-----|BD1|----|IP-VRF|  |            |  |IP-VRF|----|BD3|-----TS3
       | +---+    |      |  |            |  |      |    +---+ |
       | +---+IRB |      |  |            |  +------+          |
 TS2-----|BD2|----|      |  |            +--------------------+
       | +---+    +------+  |                        |
       +--------------------+                        |
               |                                     |
               +-------------------------------------+
Figure 3: EVPN for L3 in an NVO3 Network - Symmetric model

The Symmetric model scales better than the Asymmetric model because it does not require the NVEs to be attached to all the tenant's subnets. However, it requires the use of NVO3 tunnels on the IP-VRFs and the exchange of IP Prefixes between the NVEs in the control plane. EVPN uses RT-2 and RT-5 routes for the exchange of host IP routes (in the case of the RT-2 and the RT-5 routes) and IP Prefixes (RT-5 routes) of any length. As an example, in Figure 3, NVE2 needs to advertise TS3's host route and/or TS3's subnet, so that the IP lookup on NVE1's IP- VRF succeeds.

[RFC9135] specifies the use of RT-2 routes for the advertisement of host routes. Section 4.4.1 in [RFC9136] specifies the use of RT-5 routes for the advertisement of IP Prefixes in an "Interface-less IP-VRF-to-IP-VRF Model". The Symmetric model for host routes can be implemented following either approach:

  1. [RFC9135] uses RT-2 routes to convey the information to populate L2, ARP/ND and L3 FIB tables in the remote NVE. For instance, in Figure 3, NVE2 would advertise a RT-2 route with TS3's IP and MAC addresses, and including two labels/VNIs: a label-3/VNI-3 that identifies BD3 for MAC lookup (that would be used for L2 traffic in case NVE1 was attached to BD3 too) and a label-1/VNI-1 that identifies the IP-VRF for IP lookup (and will be used for L3 traffic). NVE1 imports the RT-2 route and installs TS3's IP in the IP-VRF route table with label-1/VNI-1. Traffic from e.g., TS2 to TS3, will be encapsulated with label-1/VNI-1 and forwarded to NVE2.
  2. [RFC9136] uses RT-2 routes to convey the information to populate the L2 FIB and ARP/ND tables, and RT-5 routes to populate the IP-VRF L3 FIB table. For instance, in Figure 3, NVE2 would advertise a RT-2 route including TS3's MAC and IP addresses with a single label-3/VNI-3. In this example, this RT-2 route wouldn't be imported by NVE1 because NVE1 is not attached to BD3. In addition, NVE2 would advertise a RT-5 route with TS3's IP address and label-1/VNI-1. This RT-5 route would be imported by NVE1's IP-VRF and the host route installed in the L3 FIB associated to label-1/VNI-1. Traffic from TS2 to TS3 would be encapsulated with label-1/VNI-1.

4.4. EVPN as Control Plane for NVO3 Encapsulations and GENEVE

[RFC8365] describes how to use EVPN for NVO3 encapsulations, such us VXLAN, nvGRE or MPLSoGRE. The procedures can be easily applicable to any other NVO3 encapsulation, in particular GENEVE.

The Generic Network Virtualization Encapsulation [RFC8926] has been recommended to be the proposed standard for NVO3 Encapsulation. The EVPN control plane can signal the GENEVE encapsulation type in the BGP Tunnel Encapsulation Extended Community (see [RFC9012]).

The NVO3 encapsulation design team has made a recommendation in [I-D.ietf-nvo3-encap] for a control plane to:

  1. Negotiate a subset of GENEVE option TLVs that can be carried on a GENEVE tunnel
  2. Enforce an order for GENEVE option TLVs and
  3. Limit the total number of options that could be carried on a GENEVE tunnel.

The EVPN control plane can easily extend the BGP Tunnel Encapsulation Attribute sub-TLV [RFC9012] to specify the GENEVE tunnel options that can be received or transmitted over a GENEVE tunnels by a given NVE. [I-D.ietf-bess-evpn-geneve] describes the EVPN control plane extensions to support GENEVE.

4.5. EVPN OAM and Application to NVO3

EVPN OAM (as in [I-D.ietf-bess-evpn-lsp-ping]) defines mechanisms to detect data plane failures in an EVPN deployment over an MPLS network. These mechanisms detect failures related to P2P and P2MP connectivity, for multi-tenant unicast and multicast L2 traffic, between multi-tenant access nodes connected to EVPN PE(s), and in a single-homed, single-active or all-active redundancy model.

In general, EVPN OAM mechanisms defined for EVPN deployed in MPLS networks are equally applicable for EVPN in NVO3 networks.

4.6. EVPN as the Control Plane for NVO3 Security

EVPN can be used to signal the security protection capabilities of a sender NVE, as well as what portion of an NVO3 packet (taking a GENEVE packet as an example) can be protected by the sender NVE, to ensure the privacy and integrity of tenant traffic carried over the NVO3 tunnels [I-D.sajassi-bess-secure-evpn].

4.7. Advanced EVPN Features for NVO3 Networks

This section describes how EVPN can be used to deliver advanced capabilities in NVO3 networks.

4.7.1. Virtual Machine (VM) Mobility

[RFC7432] replaces the traditional Ethernet Flood-and-Learn behavior among NVEs with BGP-based MAC learning, which in return provides more control over the location of MAC addresses in the BD and consequently advanced features, such as MAC Mobility. If we assume that VM Mobility means the VM's MAC and IP addresses move with the VM, EVPN's MAC Mobility is the required procedure that facilitates VM Mobility. According to [RFC7432] section 15, when a MAC is advertised for the first time in a BD, all the NVEs attached to the BD will store Sequence Number zero for that MAC. When the MAC "moves" within the same BD but to a remote NVE, the NVE that just learned locally the MAC, increases the Sequence Number in the RT-2 route's MAC Mobility extended community to indicate that it owns the MAC now. That makes all the NVE in the BD change their tables immediately with no need to wait for any aging timer. EVPN guarantees a fast MAC Mobility without flooding or black-holes in the BD.

4.7.2. MAC Protection, Duplication Detection and Loop Protection

The advertisement of MACs in the control plane, allows advanced features such as MAC protection, Duplication Detection and Loop Protection.

[RFC7432] MAC Protection refers to EVPN's ability to indicate - in a RT-2 route - that a MAC must be protected by the NVE receiving the route. The Protection is indicated in the "Sticky bit" of the MAC Mobility extended community sent along the RT-2 route for a MAC. NVEs' ACs that are connected to subject-to-be-protected servers or VMs, may set the Sticky bit on the RT-2 routes sent for the MACs associated to the ACs. Also, statically configured MAC addresses should be advertised as Protected MAC addresses, since they are not subject to MAC Mobility procedures.

[RFC7432] MAC Duplication Detection refers to EVPN's ability to detect duplicate MAC addresses. A "MAC move" is a relearn event that happens at an access AC or through a RT-2 route with a Sequence Number that is higher than the stored one for the MAC. When a MAC moves a number of times N within an M-second window between two NVEs, the MAC is declared as Duplicate and the detecting NVE does not re-advertise the MAC anymore.

[RFC7432] provides MAC Duplication Detection, and with an extension it can protect the BD against loops created by backdoor links between NVEs. The same principle (based on the Sequence Number) may be extended to protect the BD against loops. When a MAC is detected as duplicate, the NVE may install it as a black-hole MAC and drop received frames with MAC SA and MAC DA matching that duplicate MAC. The MAC Duplication extension to support Loop Protection is described in [I-D.ietf-bess-rfc7432bis].

4.7.3. Reduction/Optimization of BUM Traffic in Layer-2 Services

In BDs with a significant amount of flooding due to Unknown unicast and Broadcast frames, EVPN may help reduce and sometimes even suppress the flooding.

In BDs where most of the Broadcast traffic is caused by ARP (Address Resolution Protocol) and ND (Neighbor Discovery) protocols on the TSes, EVPN's Proxy-ARP and Proxy-ND capabilities may reduce the flooding drastically. The use of Proxy-ARP/ND is specified in [RFC9161].

Proxy-ARP/ND procedures along with the assumption that TSes always issue a GARP (Gratuitous ARP) or an unsolicited Neighbor Advertisement message when they come up in the BD, may drastically reduce the unknown unicast flooding in the BD.

The flooding caused by TSes' IGMP/MLD or PIM messages in the BD may also be suppressed by the use of IGMP/MLD and PIM Proxy functions, as specified in [I-D.ietf-bess-evpn-igmp-mld-proxy] and [I-D.skr-bess-evpn-pim-proxy]. These two documents also specify how to forward IP multicast traffic efficiently within the same BD, translate soft state IGMP/MLD/PIM messages into hard state BGP routes and provide fast-convergence redundancy for IP Multicast on multi-homed Ethernet Segments (ESes).

4.7.4. Ingress Replication (IR) Optimization for BUM Traffic

When an NVE attached to a given BD needs to send BUM traffic for the BD to the remote NVEs attached to the same BD, Ingress Replication is a very common option in NVO3 networks, since it is completely independent of the multicast capabilities of the underlay network. Also, if the optimization procedures to reduce/suppress the flooding in the BD are enabled (Section 4.7.3), in spite of creating multiple copies of the same frame at the ingress NVE, Ingress Replication may be good enough. However, in BDs where Multicast (or Broadcast) traffic is significant, Ingress Replication may be very inefficient and cause performance issues on virtual-switch-based NVEs.

[I-D.ietf-bess-evpn-optimized-ir] specifies the use of AR (Assisted Replication) NVO3 tunnels in EVPN BDs. AR retains the independence of the underlay network while providing a way to forward Broadcast and Multicast traffic efficiently. AR uses AR-REPLICATORs that can replicate the Broadcast/Multicast traffic on behalf of the AR-LEAF NVEs. The AR-LEAF NVEs are typically virtual-switches or NVEs with limited replication capabilities. AR can work in a single-stage replication mode (Non-Selective Mode) or in a dual-stage replication mode (Selective Mode). Both modes are detailed in [I-D.ietf-bess-evpn-optimized-ir].

In addition, [I-D.ietf-bess-evpn-optimized-ir] also describes a procedure to avoid sending Broadcast, Multicast or Unknown unicast to certain NVEs that do not need that type of traffic. This is done by enabling PFL (Pruned Flood Lists) on a given BD. For instance, an virtual-switch NVE that learns all its local MAC addresses for a BD via Cloud Management System, does not need to receive the BD's Unknown unicast traffic. Pruned Flood Lists help optimize the BUM flooding in the BD.

4.7.5. EVPN Multi-Homing

Another fundamental concept in EVPN is multi-homing. A given TS can be multi-homed to two or more NVEs for a given BD, and the set of links connected to the same TS is defined as Ethernet Segment (ES). EVPN supports single-active and all-active multi-homing. In single-active multi-homing only one link in the ES is active. In all-active multi-homing all the links in the ES are active for unicast traffic. Both modes support load-balancing:

  • Single-active multi-homing means per-service load-balancing to/from the TS. For example, in Figure 1, for BD1, only one of the NVEs can forward traffic from/to TS2. For a different BD, the other NVE may forward traffic.
  • All-active multi-homing means per-flow load-balanding for unicast frames to/from the TS. That is, in Figure 1 and for BD1, both NVE4 and NVE5 can forward known unicast traffic to/from TS3. For BUM traffic only one of the two NVEs can forward traffic to TS3, and both can forward traffic from TS3.

There are two key aspects in the EVPN multi-homing procedures:

  • DF (Designated Forwarder) election: the DF is the NVE that forwards the traffic to the ES in single-active mode. In case of all-active, the DF is the NVE that forwards the BUM traffic to the ES.
  • Split-horizon function: prevents the TS from receiving echoed BUM frames that the TS itself sent to the ES. This is especially relevant in all-active ESes, where the TS may forward BUM frames to a non-DF NVE that can flood the BUM frames back to the DF NVE and then the TS. As an example, in Figure 1, assuming NVE4 is the DF for ES-2 in BD1, BUM frames sent from TS3 to NVE5 will be received at NVE4 and, since NVE4 is the DF for DB1, it will forward them back to TS3. Split-horizon allows NVE4 (and any multi-homed NVE for that matter) to identify if an EVPN BUM frame is coming from the same ES or different, and if the frame belongs to the same ES2, NVE4 will not forward the BUM frame to TS3, in spite of being the DF.

While [RFC7432] describes the default algorithm for the DF Election, [RFC8584] and [I-D.ietf-bess-evpn-pref-df] specify other algorithms and procedures that optimize the DF Election.

The Split-horizon function is specified in [RFC7432] and it is carried out by using a special ESI-label that it identifies in the data path, all the BUM frames being originated from a given NVE and ES. Since the ESI-label is an MPLS label, it cannot be used in all the non-MPLS NVO3 encapsulations, therefore [RFC8365] defines a modified Split-horizon procedure that is based on the IP SA of the NVO3 tunnel, and it is known as "Local-Bias". It is worth noting that Local-Bias only works for all-active multi-homing, and not for single-active multi-homing.

4.7.6. EVPN Recursive Resolution for Inter-Subnet Unicast Forwarding

Section 4.3 describes how EVPN can be used for Inter Subnet Forwarding among subnets of the same tenant. RT-2 routes and RT-5 routes allow the advertisement of host routes and IP Prefixes (RT-5 route) of any length. The procedures outlined by Section 4.3 are similar to the ones in [RFC4364], only for NVO3 tunnels. However, [RFC9136] also defines advanced Inter Subnet Forwarding procedures that allow the resolution of RT-5 routes to not only BGP next-hops but also "overlay indexes" that can be a MAC, a GW IP or an ESI, all of them in the tenant space.

Figure 4 illustrates an example that uses Recursive Resolution to a GW-IP as per [RFC9136] section 4.4.2. In this example, IP-VRFs in NVE1 and NVE2 are connected by a SBD (Supplementary BD). An SBD is a BD that connects all the IP-VRFs of the same tenant, via IRB, and has no ACs. NVE1 advertises the host route TS2-IP/L (IP address and Prefix Length of TS2) in an RT-5 route with overlay index GWIP=IP1. Also, IP1 is advertised in an RT-2 route associated to M1, VNI-S and BGP next-hop NVE1. Upon importing the two routes, NVE2 installs TS2-IP/L in the IP-VRF with a next-hop that is the GWIP IP1. NVE2 also installs M1 in the SBD, with VNI-S and NVE1 as next-hop. If TS3 sends a packet with IP DA=TS2, NVE2 will perform a Recursive Resolution of the RT-5 route prefix information to the forwarding information of the correlated RT-2 route. The RT-5 route's Recursive Resolution has several advantages such as better convergence in scaled networks (since multiple RT-5 routes can be invalidated with a single withdrawal of the overlay index route) or the ability to advertise multiple RT-5 routes from an overlay index that can move or change dynamically. [RFC9136] describes a few use-cases.

               +-------------------------------------+
               |             EVPN NVO3               |
               |                                     +
             NVE1                                 NVE2
       +--------------------+            +--------------------+
       | +---+IRB +------+  |            |  +------+IRB +---+ |
 TS1-----|BD1|----|IP-VRF|  |            |  |IP-VRF|----|BD3|-----TS3
       | +---+    |      |-(SBD)------(SBD)-|      |    +---+ |
       | +---+IRB |      |IRB(IP1/M1)    IRB+------+          |
 TS2-----|BD2|----|      |  |            +-----------+--------+
       | +---+    +------+  |                        |
       +--------------------+                        |
               |   RT-2(M1,IP1,VNI-S,NVE1)-->        |
               |     RT-5(TS2-IP/L,GWIP=IP1)-->      |
               +-------------------------------------+
Figure 4: EVPN for L3 - Recursive Resolution example

4.7.7. EVPN Optimized Inter-Subnet Multicast Forwarding

The concept of the SBD described in Section 4.7.6 is also used in [I-D.ietf-bess-evpn-irb-mcast] for the procedures related to Inter Subnet Multicast Forwarding across BDs of the same tenant. For instance, [I-D.ietf-bess-evpn-irb-mcast] allows the efficient forwarding of IP multicast traffic from any BD to any other BD (or even to the same BD where the Source resides). The [I-D.ietf-bess-evpn-irb-mcast] procedures are supported along with EVPN multi-homing, and for any tree allowed on NVO3 networks, including IR or AR. [I-D.ietf-bess-evpn-irb-mcast] also describes the interoperability between EVPN and other multicast technologies such as MVPN (Multicast VPN) and PIM for inter-subnet multicast.

[I-D.ietf-bess-evpn-mvpn-seamless-interop] describes another potential solution to support EVPN to MVPN interoperability.

4.7.8. Data Center Interconnect (DCI)

Tenant Layer-2 and Layer-3 services deployed on NVO3 networks must be extended to remote NVO3 networks that are connected via non-NOV3 WAN networks (mostly MPLS based WAN networks). [RFC9014] defines some architectural models that can be used to interconnect NVO3 networks via MPLS WAN networks.

When NVO3 networks are connected by MPLS WAN networks, [RFC9014] specifies how EVPN can be used end-to-end, in spite of using a different encapsulation in the WAN. [RFC9014] also supports the use of NVO3 or Segment Routing (encoding 32-bit or 128-bit Segment Identifiers into labels or IPv6 addresses respectively) transport tunnels in the WAN.

Even if EVPN can also be used in the WAN for Layer-2 and Layer-3 services, there may be a need to provide a Gateway function between EVPN for NVO3 encapsulations and IPVPN for MPLS tunnels, if the operator uses IPVPN in the WAN. [I-D.ietf-bess-evpn-ipvpn-interworking] specifies the interworking function between EVPN and IPVPN for unicast Inter Subnet Forwarding. If Inter Subnet Multicast Forwarding is also needed across an IPVPN WAN, [I-D.ietf-bess-evpn-irb-mcast] describes the required interworking between EVPN and MVPN (Multicast Virtual Private Networks).

5. Conclusion

EVPN provides a unified control-plane that solves the NVE auto-discovery, tenant MAP/IP dissemination and advanced features required by NVO3 networks, in a scalable way and keeping the independence of the underlay IP Fabric, i.e. there is no need to enable PIM in the underlay network and maintain multicast states for tenant BDs.

This document justifies the use of EVPN for NVO3 networks, discusses its applicability to basic Layer-2 and Layer-3 connectivity requirements, as well as advanced features such as MAC-mobility, MAC Protection and Loop Protection, multi-homing, DCI and much more.

6. Conventions Used in this Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

7. Security Considerations

This document does not introduce any new procedure or additional signaling in EVPN, and relies on the security considerations of the individual specifications used as a reference throughout the document. In particular, and as mentioned in [RFC7432], control plane and forwarding path protection are aspects to secure in any EVPN domain, when applied to NVO3 networks.

[RFC7432] mentions security techniques such as those discussed in [RFC5925] to authenticate BGP messages, and those included in [RFC4271], [RFC4272] and [RFC6952] to secure BGP are relevant for EVPN in NVO3 networks as well.

8. IANA Considerations

None.

9. References

9.1. Normative References

[RFC7432]
Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, , <https://www.rfc-editor.org/info/rfc7432>.
[RFC7365]
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, DOI 10.17487/RFC7365, , <https://www.rfc-editor.org/info/rfc7365>.
[RFC7364]
Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, , <https://www.rfc-editor.org/info/rfc7364>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

[RFC9136]
Rabadan, J., Ed., Henderickx, W., Drake, J., Lin, W., and A. Sajassi, "IP Prefix Advertisement in Ethernet VPN (EVPN)", RFC 9136, DOI 10.17487/RFC9136, , <https://www.rfc-editor.org/info/rfc9136>.
[RFC9135]
Sajassi, A., Salam, S., Thoria, S., Drake, J., and J. Rabadan, "Integrated Routing and Bridging in Ethernet VPN (EVPN)", RFC 9135, DOI 10.17487/RFC9135, , <https://www.rfc-editor.org/info/rfc9135>.
[RFC8365]
Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., Uttaro, J., and W. Henderickx, "A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, DOI 10.17487/RFC8365, , <https://www.rfc-editor.org/info/rfc8365>.
[RFC8926]
Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., "Geneve: Generic Network Virtualization Encapsulation", RFC 8926, DOI 10.17487/RFC8926, , <https://www.rfc-editor.org/info/rfc8926>.
[I-D.ietf-nvo3-encap]
Boutros, S. and D. E. Eastlake, "Network Virtualization Overlays (NVO3) Encapsulation Considerations", Work in Progress, Internet-Draft, draft-ietf-nvo3-encap-08, , <https://www.ietf.org/archive/id/draft-ietf-nvo3-encap-08.txt>.
[RFC9012]
Patel, K., Van de Velde, G., Sangli, S., and J. Scudder, "The BGP Tunnel Encapsulation Attribute", RFC 9012, DOI 10.17487/RFC9012, , <https://www.rfc-editor.org/info/rfc9012>.
[I-D.ietf-bess-evpn-lsp-ping]
Jain, P., Salam, S., Sajassi, A., Boutros, S., and G. Mirsky, "LSP-Ping Mechanisms for EVPN and PBB-EVPN", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-lsp-ping-07, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-lsp-ping-07.txt>.
[RFC9161]
Rabadan, J., Ed., Sathappan, S., Nagaraj, K., Hankins, G., and T. King, "Operational Aspects of Proxy ARP/ND in Ethernet Virtual Private Networks", RFC 9161, DOI 10.17487/RFC9161, , <https://www.rfc-editor.org/info/rfc9161>.
[I-D.ietf-bess-evpn-igmp-mld-proxy]
Sajassi, A., Thoria, S., Mishra, M., Drake, J., and W. Lin, "IGMP and MLD Proxy for EVPN", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-igmp-mld-proxy-21, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-igmp-mld-proxy-21.txt>.
[I-D.skr-bess-evpn-pim-proxy]
Rabadan, J., Kotalwar, J., Sathappan, S., Zhang, Z., and A. Sajassi, "PIM Proxy in EVPN Networks", Work in Progress, Internet-Draft, draft-skr-bess-evpn-pim-proxy-01, , <https://www.ietf.org/archive/id/draft-skr-bess-evpn-pim-proxy-01.txt>.
[I-D.ietf-bess-evpn-optimized-ir]
Rabadan, J., Sathappan, S., Lin, W., Katiyar, M., and A. Sajassi, "Optimized Ingress Replication Solution for Ethernet VPN (EVPN)", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-optimized-ir-12, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-optimized-ir-12.txt>.
[RFC8584]
Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet VPN Designated Forwarder Election Extensibility", RFC 8584, DOI 10.17487/RFC8584, , <https://www.rfc-editor.org/info/rfc8584>.
[I-D.ietf-bess-evpn-pref-df]
Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., Drake, J., Sajassi, A., and S. Mohanty, "Preference-based EVPN DF Election", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-pref-df-08, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-pref-df-08.txt>.
[I-D.ietf-bess-evpn-irb-mcast]
Lin, W., Zhang, Z., Drake, J., Rosen, E. C., Rabadan, J., and A. Sajassi, "EVPN Optimized Inter-Subnet Multicast (OISM) Forwarding", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-irb-mcast-06, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-irb-mcast-06.txt>.
[RFC9014]
Rabadan, J., Ed., Sathappan, S., Henderickx, W., Sajassi, A., and J. Drake, "Interconnect Solution for Ethernet VPN (EVPN) Overlay Networks", RFC 9014, DOI 10.17487/RFC9014, , <https://www.rfc-editor.org/info/rfc9014>.
[I-D.ietf-bess-evpn-ipvpn-interworking]
Rabadan, J., Sajassi, A., Rosen, E., Drake, J., Lin, W., Uttaro, J., and A. Simpson, "EVPN Interworking with IPVPN", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-ipvpn-interworking-06, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-ipvpn-interworking-06.txt>.
[RFC7348]
Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, , <https://www.rfc-editor.org/info/rfc7348>.
[RFC7510]
Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", RFC 7510, DOI 10.17487/RFC7510, , <https://www.rfc-editor.org/info/rfc7510>.
[RFC4364]
Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, , <https://www.rfc-editor.org/info/rfc4364>.
[CLOS1953]
Clos, C., "A Study of Non-Blocking Switching Networks", .
[I-D.ietf-bess-evpn-geneve]
Boutros, S., Sajassi, A., Drake, J., Rabadan, J., and S. Aldrin, "EVPN control plane for Geneve", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-geneve-04, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-geneve-04.txt>.
[I-D.ietf-bess-evpn-mvpn-seamless-interop]
Sajassi, A., Thiruvenkatasamy, K., Thoria, S., Gupta, A., and L. Jalil, "Seamless Multicast Interoperability between EVPN and MVPN PEs", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-mvpn-seamless-interop-03, , <https://www.ietf.org/archive/id/draft-ietf-bess-evpn-mvpn-seamless-interop-03.txt>.
[I-D.sajassi-bess-secure-evpn]
Sajassi, A., Banerjee, A., Thoria, S., Carrel, D., Weis, B., and J. Drake, "Secure EVPN", Work in Progress, Internet-Draft, draft-sajassi-bess-secure-evpn-05, , <https://www.ietf.org/archive/id/draft-sajassi-bess-secure-evpn-05.txt>.
[RFC5925]
Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", RFC 5925, DOI 10.17487/RFC5925, , <https://www.rfc-editor.org/info/rfc5925>.
[RFC4271]
Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, , <https://www.rfc-editor.org/info/rfc4271>.
[RFC4272]
Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, DOI 10.17487/RFC4272, , <https://www.rfc-editor.org/info/rfc4272>.
[RFC6952]
Jethanandani, M., Patel, K., and L. Zheng, "Analysis of BGP, LDP, PCEP, and MSDP Issues According to the Keying and Authentication for Routing Protocols (KARP) Design Guide", RFC 6952, DOI 10.17487/RFC6952, , <https://www.rfc-editor.org/info/rfc6952>.
[RFC4760]
Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, DOI 10.17487/RFC4760, , <https://www.rfc-editor.org/info/rfc4760>.
[I-D.ietf-bess-rfc7432bis]
Sajassi, A., Burdet, L. A., Drake, J., and J. Rabadan, "BGP MPLS-Based Ethernet VPN", Work in Progress, Internet-Draft, draft-ietf-bess-rfc7432bis-04, , <https://www.ietf.org/archive/id/draft-ietf-bess-rfc7432bis-04.txt>.
[IEEE.802.1AX_2014]
IEEE, "IEEE Standard for Local and metropolitan area networks -- Link Aggregation", .

Appendix A. Acknowledgments

The authors want to thank Aldrin Isaac for his comments.

Appendix B. Contributors

Appendix C. Authors' Addresses

Authors' Addresses

Jorge Rabadan (editor)
Nokia
520 Almanor Ave
Sunnyvale, CA 94085
United States of America
Matthew Bocci
Nokia
Sami Boutros
Ciena
Ali Sajassi
Cisco Systems, Inc.