Generic Protocol Extension for VXLAN (VXLAN-GPE)

Cisco Systems, fmaino@cisco.com
Arrcus, lkreeger@gmail.com
Intel, uri.elzur@intel.com

Internet
Network Working Group
LISP; L2 Overlay, L3 Overlay

This document describes extending Virtual eXtensible Local Area
Network (VXLAN), via changes to the VXLAN header, with four new
capabilities: support for multi-protocol encapsulation, support for
operations, administration and maintenance (OAM) signaling, support for
ingress-replicated BUM Traffic (i.e. Broadcast, Unknown unicast, or
Multicast), and explicit versioning. New protocol capabilities can be
introduced via shim headers.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

Virtual eXtensible Local Area Network (VXLAN)
defines an encapsulation format that encapsulates Ethernet frames in an
outer UDP/IP transport. As data centers evolve, there is a need to carry
other protocols encapsulated in an IP packet, as well as a need
to provide increased visibility and diagnostic capabilities within the
overlay. The VXLAN header does not specify the protocol being
encapsulated and therefore is currently limited to encapsulating only
Ethernet frame payload, nor does it provide the ability to define OAM
protocols. In addition, requires that new
transports not use transport layer port numbers to identify tunnel
payload, rather it encourages encapsulations to use their own
identifiers for this purpose. VXLAN-GPE is intended to extend the
existing VXLAN protocol to provide protocol typing, OAM, and versioning
capabilities.

The Version and OAM bits are introduced in this document, and the choice
of location for these fields is driven by minimizing the impact on
existing deployed hardware.

In order to facilitate deployments of VXLAN-GPE with hardware
currently deployed to support VXLAN, changes from legacy VXLAN have been
kept to a minimum. The backward compatibility discussion later in this
document details how VXLAN-GPE addresses the requirement for backward
compatibility with VXLAN.

The capabilities of the VXLAN-GPE protocol can be extended by
defining next protocol "shim" headers that are used to implement new
data plane functions. For example, Group-Based Policy (GBP) or In-situ
Operations, Administration, and Maintenance (IOAM) metadata
functionalities can be added as specified in and .

VXLAN provides a method of creating multi-tenant overlay networks by
encapsulating packets in IP/UDP along with a header containing a network
identifier which is used to isolate tenant traffic in each overlay
network from each other. This allows the overlay networks to run over an
existing IP network.

Through this encapsulation, VXLAN creates stateless tunnels between
VXLAN Tunnel End Points (VTEPs), which are responsible for
adding/removing the IP/UDP/VXLAN headers and providing tenant traffic
isolation based on the VXLAN Network Identifier (VNI). Tenant systems
are unaware that their networking service is being provided by an
overlay.

When encapsulating packets, a VTEP must know the IP address of the
proper remote VTEP at the far end of the tunnel that can deliver the
inner packet to the Tenant System corresponding to the inner destination
address. The control plane used to distribute inner to outer mappings is
out of the scope of this document.

The VXLAN Network Identifier (VNI) provides scoping for the addresses
in the header of the encapsulated PDU. If the encapsulated packet is an
Ethernet frame, this means the Ethernet MAC addresses are only unique
within a given VNI and may overlap with MAC addresses within a different
VNI. If the encapsulated packet is an IP packet, this means the IP
addresses are only unique within that VNI.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|Ver|I|P|B|O|       Reserved                |Next Protocol  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                VXLAN Network Identifier (VNI) |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags (8 bits):  The first 8 bits of the header are
   the flag field. The bits designated "R" above are reserved flags.
   These MUST be set to zero on transmission and ignored on
   receipt.

Version (Ver):  Indicates VXLAN-GPE protocol version.
   The initial version is 0. If a receiver does not support the
   version indicated, it MUST drop the packet.

Instance Bit (I bit):  The I bit MUST be set to
   indicate a valid VNI.

Next Protocol Bit (P bit):  The P bit is set to
   indicate that the Next Protocol field is present.

BUM Traffic Bit (B bit):  The B bit is set to
   indicate that this is ingress-replicated BUM Traffic (i.e.,
   Broadcast, Unknown unicast, or Multicast).

OAM Flag Bit (O bit):  The O bit is set to indicate
   that the packet is an OAM packet.

Next Protocol:  This 8-bit field indicates the
   protocol header immediately following the VXLAN-GPE header.

VNI:  This 24-bit field identifies the VXLAN overlay
   network the inner packet belongs to. Inner packets belonging to
   different VNIs cannot communicate with each other (unless
   explicitly allowed by policy).

Reserved:  Reserved fields MUST be set to zero on
   transmission and ignored on receipt.

This draft defines the following two changes to the VXLAN header in
order to support multi-protocol encapsulation:

P Bit:  Flag bit 5 is defined as the Next Protocol
   bit. The P bit MUST be set to 1 to indicate the presence of the
   8-bit Next Protocol field.

   When the UDP destination port is 4790 and P = 0, the Next Protocol
   field must be set to zero and the payload MUST be ETHERNET (L2) as
   defined by .

   Flag bit 5 was chosen as the P bit because this
   flag bit is currently reserved in VXLAN.

Next Protocol Field:  The lower 8 bits of the first
   word are used to carry a next protocol. This Next Protocol field
   contains the protocol of the encapsulated payload packet. A new
   protocol registry will be requested from IANA, see section
   10.2.

This draft defines the following Next Protocol values:

   +---------------+--------------------------------------------+
   | Next Protocol | Description                                |
   +---------------+--------------------------------------------+
   | 0x0           | Reserved                                   |
   | 0x1           | IPv4                                       |
   | 0x2           | IPv6                                       |
   | 0x3           | Ethernet                                   |
   | 0x4           | Network Service Header                     |
   | 0x05..0x7D    | Unassigned                                 |
   | 0x7E, 0x7F    | Experimentation and testing                |
   | 0x80..0xFD    | Unassigned (shim headers)                  |
   | 0xFE, 0xFF    | Experimentation and testing (shim headers) |
   +---------------+--------------------------------------------+

Next protocol values 0x7E, 0x7F and 0xFE, 0xFF are assigned for
experimentation and testing as per .

Next protocol values from 0x80 to 0xFD are assigned to protocols
encoded as generic "shim" headers. All shim protocols MUST use the
common shim header structure defined in this document, which includes
a Type, a Length, and a Next Protocol field. When shim headers are used
with other protocols identified by next protocol values from 0x0 to
0x7F, all the shim headers MUST come first.

Shim headers can be used to incrementally deploy new GPE features
without updating the implementation of each transit node between two
tunnel endpoints, and without punting the packet with shim headers of
unknown type to the 'slow' path. Transit nodes that are not aware of a
given shim header type MUST ignore that shim header and proceed to
parse the next protocol.

VTEP implementations can keep the processing of known shim headers
in the 'fast' path (typically an ASIC), while punting the processing
of the remaining new GPE features to the 'slow' path.

Shim protocols MUST have the first 32 bits defined with the following
fields:

Type:  This field MAY be used to identify different
   messages of this protocol.

Length:  The length, in 4-octet units, of this
   protocol message not including the first 4 octets.

Reserved:  The use of this field is reserved to the
   protocol defined in this message.

Next Protocol:  This next protocol field
   contains the protocol of the encapsulated payload. The protocol
   registry will be requested from IANA as per section 10.2.

Flag bit 6 is defined as the B bit. When the B bit is set to 1, the
packet is marked as ingress-replicated BUM traffic (i.e.,
Broadcast, Unknown unicast, or Multicast) to help the egress VTEP
differentiate between known and unknown unicast. The details of using
the B bit are out of scope for this document, but please see for an example in the EVPN context. As with the
P bit, bit 6 is currently a reserved flag in VXLAN.

Flag bit 7 is defined as the O bit. When the O bit is set to 1, the
packet is an OAM packet and OAM processing MUST occur. Other header
fields, including Next Protocol, MUST adhere to the definitions in
this document. The OAM protocol details are out of scope for
this document. As with the P bit, bit 7 is currently a reserved flag
in VXLAN.

VXLAN-GPE bits 2 and 3 are defined as version bits. These bits are
reserved in VXLAN. The version field is used to ensure backward
compatibility going forward with future VXLAN-GPE updates.

The initial version for VXLAN-GPE is 0.

In addition to the VXLAN-GPE header, the packet is further
encapsulated in UDP and IP. Data centers based on Ethernet will then
send this IP packet over Ethernet.

Outer UDP Header:

Destination UDP Port: IANA has assigned the value 4790 for the
VXLAN-GPE UDP port. This well-known destination port is used when
sending VXLAN-GPE encapsulated packets.

Source UDP Port: The source UDP port is used as entropy for devices
forwarding encapsulated packets across the underlay (ECMP for IP
routers, or load splitting for link aggregation by bridges). All
packets of a given tenant traffic flow should use the same source UDP
port to lower the chances of packet reordering by the underlay for
that flow. It is recommended for VTEPs to generate this port number
using a hash of the inner packet headers. Implementations MAY use the
entire 16-bit source UDP port for entropy.

UDP Checksum: see the UDP checksum discussion later in this document
for considerations related to UDP checksum processing.

Outer IP Header:

This is the header used by the underlay network to deliver packets
between VTEPs. The destination IP address can be a unicast or a
multicast IP address. The source IP address must be the source VTEP IP
address which can be used to return tenant packets to the tenant system
source address within the inner packet header.

When the outer IP header is IPv4, VTEPs MUST set the DF bit.

Outer Ethernet Header:

Most data center networks are built on Ethernet. Assuming the outer
IP packet is being sent across Ethernet, there will be an Ethernet
header used to deliver the IP packet to the next hop, which could be the
destination VTEP or be a router used to forward the IP packet towards
the destination VTEP. If VLANs are in use within the data center, then
this Ethernet header would also contain a VLAN tag.

The following figures show the entire stack of protocol headers that
would be seen on an Ethernet link carrying encapsulated packets from a
VTEP across the underlay network for both IPv4 and IPv6 based underlay
networks.

If the inner packet (as indicated by the VXLAN-GPE Next Protocol
field) is an Ethernet frame, it is recommended that it does not
contain a VLAN tag. In the most common scenarios, the tenant VLAN tag
is translated into a VXLAN Network Identifier. In these scenarios,
VTEPs should never send an inner Ethernet frame with a VLAN tag, and a
VTEP performing decapsulation should discard any inner frames received
with a VLAN tag. However, if the VTEPs are specifically configured to
support it for a specific VXLAN Network Identifier, a VTEP may support
transparent transport of the inner VLAN tag between all tenant systems
on that VNI. The VTEP never looks at the value of the inner VLAN tag,
but simply passes it across the underlay.

VTEPs MUST NOT fragment an encapsulated VXLAN-GPE packet, and
when the outer IP header is IPv4, VTEPs MUST set the DF bit in the
outer IPv4 header. It is recommended that the underlay network be
configured to carry an MTU at least large enough to accommodate the
added encapsulation headers. It is recommended that VTEPs perform Path
MTU discovery to
determine if the underlay network can carry the encapsulated payload
packet.

VXLAN-GPE conforms, as a UDP-based encapsulation protocol, to the
UDP usage guidelines as specified in [RFC8085]. The
applicability of these guidelines is dependent on the underlay IP
network and the nature of the encapsulated payload. [RFC8085] outlines
two applicability scenarios for
UDP applications, 1) general Internet and 2) controlled environment.
The controlled environment means a single administrative domain or
adjacent set of cooperating domains. A network in a controlled
environment can be managed to operate under certain conditions whereas
in the general Internet this cannot be done. Hence requirements for a
tunnel protocol operating under a controlled environment can be less
restrictive than the requirements of the general Internet.

VXLAN-GPE is intended to be deployed in a data center network
environment operated by a single operator or adjacent set of
cooperating network operators that fits with the definition of
controlled environments in [RFC8085].

For the purpose of this document, a traffic-managed controlled
environment (TMCE), outlined in , is defined
as an IP network that is traffic-engineered and/or otherwise managed
(e.g., via use of traffic rate limiters) to avoid congestion.
Significant portions of text in this Section are based on .

It is the responsibility of the network operators to ensure that
the guidelines/requirements in this section are followed as applicable
to their VXLAN-GPE deployments.

VXLAN-GPE does not natively provide congestion control
functionality and relies on the payload protocol traffic for
congestion control. As such VXLAN-GPE MUST be used with congestion
controlled traffic or within a network that is traffic managed to
avoid congestion (TMCE). An operator of a traffic managed network
(TMCE) may avoid congestion by careful provisioning of their networks,
rate-limiting of user data traffic and traffic engineering according
to path capacity.

In order to provide integrity of VXLAN-GPE headers and payload, for
example to avoid mis-delivery of payload to different tenant systems
in case of data corruption, the outer UDP checksum SHOULD be used with
VXLAN-GPE when transported over IPv4. The UDP checksum provides a
statistical guarantee that a payload was not corrupted in transit.
These integrity checks are not strong from a coding or cryptographic
perspective and are not designed to detect physical-layer errors or
malicious modification of the datagram (see Section 3.4 of [RFC8085]).
In deployments where such a risk exists, an
operator SHOULD use additional data integrity mechanisms such as
those offered by IPsec.

An operator MAY choose to disable UDP checksum and use zero
checksum if VXLAN-GPE packet integrity is provided by other data
integrity mechanisms such as IPsec or additional checksums, or if one
of the conditions a, b, or c listed in this section for IPv6 is
met.

By default, UDP checksum MUST be used when VXLAN-GPE is
transported over IPv6. A tunnel endpoint MAY be configured for use
with zero UDP checksum if additional requirements described in this
section are met.

When VXLAN-GPE is used over IPv6, UDP checksum is used to protect
IPv6 headers, UDP headers and VXLAN-GPE headers and payload from
potential data corruption. As such, by default VXLAN-GPE MUST use UDP
checksum when transported over IPv6. An operator MAY choose to
configure a tunnel to operate with zero UDP checksum if operating in a
traffic-managed controlled environment, as described earlier in this
document, and if one of the following conditions is met:

a.  It is known that packet corruption is exceptionally
    unlikely (perhaps based on knowledge of equipment types in their
    underlay network) and the operator is willing to take the risk of
    undetected packet corruption.

b.  It is judged through observational measurements (perhaps
    through historic or current traffic flows that use non-zero
    checksum) that the level of packet corruption is tolerably low
    and the operator is willing to take the risk of undetected
    corruption.

c.  The VXLAN-GPE payload is carrying applications that are tolerant
    of misdelivered or corrupted packets (perhaps through higher
    layer checksum validation and/or reliability through
    retransmission).

In addition, VXLAN-GPE tunnel implementations using zero UDP
checksum MUST meet the following requirements:

*  Use of UDP checksum over IPv6 MUST be the default
   configuration for all VXLAN-GPE tunnels.

*  If VXLAN-GPE is used with zero UDP checksum over IPv6, then
   such VTEP implementation MUST meet all the requirements
   specified in section 4 of and
   requirements 1 as specified in section 5 of .

*  The VTEP that decapsulates the packet SHOULD check that the source
   and destination IPv6 addresses are valid for the VXLAN-GPE
   tunnel that is configured to receive zero UDP checksum and
   discard other packets for which such check fails.

*  The VTEP that encapsulates the packet MAY use different IPv6
   source addresses for each VXLAN-GPE tunnel that uses zero UDP
   checksum mode in order to strengthen the decapsulator's check of
   the IPv6 source address (i.e., the same IPv6 source address is not
   to be used with more than one IPv6 destination address,
   irrespective of whether that destination address is a unicast or
   multicast address). When this is not possible, it is RECOMMENDED
   to use each source address for as few VXLAN-GPE tunnels that use
   zero UDP checksum as is feasible.

*  Measures SHOULD be taken to prevent VXLAN-GPE traffic over
   IPv6 with zero UDP checksum from escaping into the general
   Internet. Examples of such measures include employing packet
   filters at the gateways or edge of a VXLAN-GPE network, and/or
   keeping logical or physical separation of the VXLAN network from
   networks carrying general Internet traffic.

The above requirements do not change either the
requirements specified in as modified by
or the requirements specified in .

The requirement to check the source IPv6 address in addition to
the destination IPv6 address, plus the recommendation against reuse
of source IPv6 addresses among VXLAN-GPE tunnels, collectively
provide some mitigation for the absence of UDP checksum coverage of
the IPv6 header. A traffic-managed controlled environment that
satisfies at least one of the three conditions listed at the beginning
of this section provides additional assurance.

A VXLAN VTEP conforms to VXLAN frame format and uses UDP
destination port 4789 when sending traffic to a VXLAN-GPE VTEP. As per
VXLAN, reserved bits 5 and 7 (the VXLAN-GPE P and O bits, respectively)
must be set to zero. The remaining reserved bits must be zero, including
the VXLAN-GPE version field, bits 2 and 3. The encapsulated payload
MUST be Ethernet.

A VXLAN-GPE VTEP MUST NOT encapsulate non-Ethernet frames to a
VXLAN VTEP. When encapsulating Ethernet frames to a VXLAN VTEP, the
VXLAN-GPE VTEP MUST conform to VXLAN frame format and hence will set
the P bit to 0, the Next Protocol to 0 and use UDP destination port
4789. A VXLAN-GPE VTEP MUST also set O = 0 and Ver = 0 when
encapsulating Ethernet frames to a VXLAN VTEP. The receiving VXLAN VTEP
will treat this packet as a VXLAN packet.

A method for determining the capabilities of a VXLAN VTEP (GPE or
non-GPE) is out of the scope of this draft.

VXLAN-GPE uses an IANA-assigned UDP destination port, 4790, when
sending traffic to VXLAN-GPE VTEPs.

When encapsulating IP (including over Ethernet) packets, provides guidance for mapping DSCP between
inner and outer IP headers. The Pipe model typically fits
network virtualization better. The DSCP value on the tunnel header is set
based on a policy (which may be a fixed value, one based on the inner
traffic class, or some other mechanism for grouping traffic). Some
aspects of the Uniform model (which treats the inner and outer DSCP
value as a single field by copying on ingress and egress) may also
apply, such as the ability to remark the inner header on tunnel egress
based on transit marking. However, the Uniform model is not
conceptually consistent with network virtualization, which seeks to
provide strong isolation between encapsulated traffic and the physical
network. describes the mechanism for exposing ECN
capabilities on IP tunnels and propagating congestion markers to the
inner packets. This behavior MUST be followed for IP packets
encapsulated in VXLAN-GPE.

Though Uniform or Pipe models could be used for TTL (or Hop Limit
in case of IPv6) handling when tunneling IP packets, the Pipe model is
more aligned with network virtualization.
provides guidance on handling TTL between inner IP header and outer IP
tunnels; this model is more aligned with the Pipe model and is
recommended for use with VXLAN-GPE for network virtualization
applications.

When a VXLAN-GPE router performs Ethernet encapsulation, the inner
802.1Q 3-bit priority code point (PCP) field MAY be mapped from the
encapsulated frame to the DSCP codepoint of the DS field defined in
.

When a VXLAN-GPE router performs Ethernet encapsulation, the inner
header 802.1Q VLAN Identifier (VID) MAY be mapped to, or used to
determine, the VXLAN Network Identifier (VNI) field.

This section provides three examples of protocols encapsulated using
the Generic Protocol Extension for VXLAN described in this document.

VXLAN-GPE encapsulation does not affect security for the payload
protocol. The security considerations for VXLAN apply to VXLAN-GPE,
see .

When crossing an untrusted link, such as the public Internet, IPsec
may be used to provide authentication and/or
encryption of the IP packets formed as part of VXLAN-GPE
encapsulation.

Operators have to make an assessment based on their network
environment and determine the risks that are applicable to their
specific environment and use appropriate mitigation approaches as
applicable.

A special thank you goes to Dino Farinacci for his guidance and
detailed review. Thanks to Tom Herbert for the suggestion to assign
codepoints for experimentation and testing.

UDP port 4790 has been assigned by IANA for VXLAN-GPE.

IANA is requested to set up a registry of "Next Protocol". These
are 8-bit values. Next Protocol values in the table below are defined
in this draft. New values are assigned via Standards Action.

   +---------------+--------------------------------------------+---------------+
   | Next Protocol | Description                                | Reference     |
   +---------------+--------------------------------------------+---------------+
   | 0x0           | Reserved                                   | This Document |
   | 0x1           | IPv4                                       | This Document |
   | 0x2           | IPv6                                       | This Document |
   | 0x3           | Ethernet                                   | This Document |
   | 0x4           | NSH                                        | This Document |
   | 0x05..0x7D    | Unassigned                                 |               |
   | 0x7E, 0x7F    | Experimentation and testing                | This Document |
   | 0x80..0xFD    | Unassigned (shim headers)                  |               |
   | 0xFE, 0xFF    | Experimentation and testing (shim headers) | This Document |
   +---------------+--------------------------------------------+---------------+

There are eight flag bits at the beginning of the VXLAN-GPE header,
followed by 16 reserved bits and an 8-bit reserved field at the end of
the header. New bits are assigned via Standards Action.

   Bits 0-1 - Reserved
   Bits 2-3 - Version
   Bit 4 - Instance ID (I bit)
   Bit 5 - Next Protocol (P bit)
   Bit 6 - BUM Traffic (B bit)
   Bit 7 - OAM (O bit)
   Bits 8-23 - Reserved
   Bits 24-31 in the 2nd Word -- Reserved

Reserved bits/fields MUST be set to 0 by the sender and ignored by
the receiver.
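
As an informal illustration of the bit assignments listed above, the following Python sketch packs and parses the 8-octet VXLAN-GPE header. It is not part of the specification. The names GpeHeader, pack_header, and unpack_header are hypothetical, bit 0 is taken to be the most significant bit of the first octet, and rejecting a header whose I bit is clear is an assumption of this sketch rather than a stated requirement.

```python
import struct
from dataclasses import dataclass

# Flag masks in the first octet, with bit 0 as the most significant bit.
VER_MASK, VER_SHIFT = 0x30, 4   # bits 2-3: Version
I_BIT = 0x08                    # bit 4: Instance ID (valid VNI)
P_BIT = 0x04                    # bit 5: Next Protocol present
B_BIT = 0x02                    # bit 6: ingress-replicated BUM traffic
O_BIT = 0x01                    # bit 7: OAM packet

NP_IPV4, NP_IPV6, NP_ETHERNET, NP_NSH = 0x1, 0x2, 0x3, 0x4


@dataclass
class GpeHeader:
    """Hypothetical container for the fields this sketch handles."""
    vni: int
    next_protocol: int = NP_ETHERNET
    version: int = 0
    bum: bool = False
    oam: bool = False


def pack_header(h: GpeHeader) -> bytes:
    """Build the two 32-bit words; reserved bits are sent as zero."""
    if not 0 <= h.vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= h.next_protocol <= 0xFF:
        raise ValueError("Next Protocol must fit in 8 bits")
    if not 0 <= h.version <= 3:
        raise ValueError("Version must fit in 2 bits")
    flags = (h.version << VER_SHIFT) | I_BIT | P_BIT
    if h.bum:
        flags |= B_BIT
    if h.oam:
        flags |= O_BIT
    word0 = (flags << 24) | h.next_protocol   # bits 8-23 stay zero
    word1 = h.vni << 8                        # low 8 bits stay zero
    return struct.pack("!II", word0, word1)


def unpack_header(data: bytes) -> GpeHeader:
    """Parse a header; reserved bits are ignored on receipt."""
    if len(data) < 8:
        raise ValueError("VXLAN-GPE header is 8 octets")
    word0, word1 = struct.unpack("!II", data[:8])
    flags = word0 >> 24
    version = (flags & VER_MASK) >> VER_SHIFT
    if version != 0:
        # A receiver that does not support the version drops the packet.
        raise ValueError("unsupported VXLAN-GPE version")
    if not flags & I_BIT:
        # Assumption of this sketch: treat a clear I bit as invalid.
        raise ValueError("I bit not set")
    # With P = 0 the payload is Ethernet, as in legacy VXLAN.
    next_protocol = (word0 & 0xFF) if flags & P_BIT else NP_ETHERNET
    return GpeHeader(
        vni=word1 >> 8,
        next_protocol=next_protocol,
        version=version,
        bum=bool(flags & B_BIT),
        oam=bool(flags & O_BIT),
    )
```

For example, pack_header(GpeHeader(vni=0x123456, next_protocol=NP_IPV4)) yields a first octet of 0x0C (I and P bits set), and unpack_header applied to the result returns an equal GpeHeader.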