Generic Protocol Extension for VXLAN

Cisco Systems
fmaino@cisco.com
lkreeger@gmail.com
Intel
uri.elzur@intel.com

Internet
Network Working Group
LISP; L2 Overlay, L3 Overlay

This draft describes extending Virtual eXtensible Local Area Network
(VXLAN), via changes to the VXLAN header, with three new capabilities:
support for multi-protocol encapsulation, operations, administration and
management (OAM) signaling, and explicit versioning.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

Virtual eXtensible Local Area Network (VXLAN) defines an encapsulation format that
encapsulates Ethernet frames in an outer UDP/IP transport. As data
centers evolve, there is a growing need to carry other protocols
encapsulated in an IP packet, as well as to provide increased visibility
and diagnostic capabilities within the overlay. The VXLAN header does
not specify the protocol being encapsulated and is therefore currently
limited to encapsulating only Ethernet frame payloads, nor does it
provide the ability to define OAM protocols. In addition, new transports
are required not to use transport layer port numbers to identify tunnel
payload; rather, encapsulations are encouraged to use their own
identifiers for this purpose. VXLAN GPE
is intended to extend the existing VXLAN protocol to provide protocol
typing, OAM, and versioning capabilities.

The Version and OAM bits are introduced later in this document, and the
choice of location for these fields is driven by minimizing the impact on
existing deployed hardware.

In order to facilitate deployments of VXLAN GPE with hardware
currently deployed to support VXLAN, changes from legacy VXLAN have been
kept to a minimum. This document also provides a
detailed discussion about how VXLAN GPE addresses the requirement for
backward compatibility with VXLAN.

VXLAN provides a method of creating multi-tenant overlay networks by
encapsulating packets in IP/UDP along with a header containing a network
identifier which is used to isolate tenant traffic in each overlay
network from each other. This allows the overlay networks to run over an
existing IP network.

Through this encapsulation, VXLAN creates stateless tunnels between
VXLAN Tunnel End Points (VTEPs) which are responsible for adding/removing
the IP/UDP/VXLAN headers and providing tenant traffic isolation
based on the VXLAN Network Identifier (VNI). Tenant systems are unaware
that their networking service is being provided by an overlay.

When encapsulating packets, a VTEP must know the IP address of the
proper remote VTEP at the far end of the tunnel that can deliver the
inner packet to the Tenant System corresponding to the inner destination
address. In the case of tenant multicast or broadcast, the outer IP
address may be an IP multicast group address, or the VTEP may replicate
the packet and send it to all known VTEPs. If multicast is used in the
underlay network to send encapsulated packets to remote VTEPs, Any
Source Multicast is used and each VTEP serving a particular VNI must
perform a (*, G) join to the same group IP address.

Inner to outer address mapping can be determined in two ways. One is
source-based learning in the data plane, and the other is distribution
via a control plane.

Source-based learning requires a receiving VTEP to create an inner to
outer address mapping by gleaning the information from the received
packets by correlating the inner source address to the outer source IP
address. When a mapping does not exist, a VTEP forwards the packets to
all remote VTEPs participating in the VNI by using IP multicast in the
IP underlay network. Each VTEP must be configured with the IP multicast
address to use for each VNI. How this occurs is out of scope.

The control plane used to distribute inner to outer mappings is also
out of scope. It could use a centralized authority or be distributed, or
use a hybrid.

The VXLAN Network Identifier (VNI) provides scoping for the addresses
in the header of the encapsulated PDU. If the encapsulated packet is an
Ethernet frame, this means the Ethernet MAC addresses are only unique
within a given VNI and may overlap with MAC addresses within a different
VNI. If the encapsulated packet is an IP packet, this means the IP
addresses are only unique within that VNI.

Flags (8 bits): The first 8 bits of the header are
the flag field. The bits designated "R" above are reserved flags.
These MUST be set to zero on transmission and ignored on
receipt.

Version (Ver): Indicates VXLAN GPE protocol version.
The initial version is 0. If a receiver does not support the
version indicated it MUST drop the packet.

Instance Bit (I bit): The I bit MUST be set to
indicate a valid VNI.

Next Protocol Bit (P bit): The P bit is set to
indicate that the Next Protocol field is present.

BUM Traffic Bit (B bit): The B bit is set to
indicate that this is ingress-replicated BUM traffic (i.e.,
Broadcast, Unknown unicast, or Multicast).

OAM Flag Bit (O bit): The O bit is set to indicate
that the packet is an OAM packet.

Next Protocol: This 8 bit field indicates the
protocol header immediately following the VXLAN GPE header.

VXLAN Network Identifier (VNI): This 24 bit field identifies the VXLAN overlay
network the inner packet belongs to. Inner packets belonging to
different VNIs cannot communicate with each other (unless
explicitly allowed by policy).

Reserved: Reserved fields MUST be set to zero on
transmission and ignored on receipt.

This draft defines the following two changes to the VXLAN header in
order to support multi-protocol encapsulation:

P Bit: Flag bit 5 is defined as the Next Protocol
bit. The P bit MUST be set to 1 to indicate the presence of the 8
bit next protocol field.

When the UDP destination port is 4790 and P = 0, the "Next Protocol"
field must be set to zero and the payload MUST be Ethernet (L2) as
defined by VXLAN.

Flag bit 5 was chosen as the P bit because this
flag bit is currently reserved in VXLAN.

Next Protocol Field: The lower 8 bits of the first
word are used to carry a next protocol. This next protocol field
contains the protocol of the encapsulated payload packet. A new
protocol registry will be requested from IANA, see section
9.2.

This draft defines the following Next Protocol
values:

   1 - IPv4
   2 - IPv6
   3 - Ethernet
   4 - Network Service Header
   5 - Multiprotocol Label Switching

Please see for more details.

Flag bit 6 is defined as the B bit. When the B bit is set to 1, the
packet is marked as ingress-replicated BUM traffic (i.e., Broadcast,
Unknown unicast, or Multicast) to help the egress VTEP
differentiate between known and unknown unicast. The details of using
the B bit are out of scope for this document, although an example of
its use can be found in the EVPN
context. As with the P bit, bit 6 is currently a reserved flag in
VXLAN.

Flag bit 7 is defined as the O bit. When the O bit is set to 1, the
packet is an OAM packet and OAM processing MUST occur. Other header
fields including Next Protocol MUST adhere to the definitions given in
this document. The OAM protocol details are out of scope
for this document. As with the P bit, bit 7 is currently a reserved
flag in VXLAN.

VXLAN GPE bits 2 and 3 are defined as version bits. These bits are
reserved in VXLAN. The version field is used to ensure backward
compatibility going forward with future VXLAN GPE updates.

The initial version for VXLAN GPE is 0.

In addition to the VXLAN GPE header, the packet is further
encapsulated in UDP and IP. Data centers based on Ethernet will then
send this IP packet over Ethernet.

Outer UDP Header:

Destination UDP Port: IANA has assigned the value 4790 for the VXLAN
GPE UDP port. This well-known destination port is used when sending
VXLAN GPE encapsulated packets.

Source UDP Port: The source UDP port is used as entropy for devices
forwarding encapsulated packets across the underlay (ECMP for IP
routers, or load splitting for link aggregation by bridges). All packets
of a given tenant traffic flow should use the same source UDP port to
lower the chances of packet reordering by the underlay for that flow. It is
recommended for VTEPs to generate this port number using a hash of the
inner packet headers. Implementations MAY use the entire 16 bit source
UDP port for entropy.

UDP Checksum: Source VTEPs MAY either calculate a valid checksum, or
if this is not possible, set the checksum to zero. When calculating a
checksum, it MUST be calculated across the entire packet (outer IP
header, UDP header, VXLAN GPE header and payload packet). All receiving
VTEPs must accept a checksum value of zero. If the receiving VTEP is
capable of validating the checksum, it MAY validate a non-zero checksum
and MUST discard the packet if the checksum is determined to be
invalid.

Outer IP Header:

This is the header used by the underlay network to deliver packets
between VTEPs. The destination IP address can be a unicast or a
multicast IP address. The source IP address must be the source VTEP IP
address which can be used to return tenant packets to the tenant system
source address within the inner packet header.

When the outer IP header is IPv4, VTEPs MUST set the DF bit.

Outer Ethernet Header:

Most data center networks are built on Ethernet. Assuming the outer
IP packet is being sent across Ethernet, there will be an Ethernet
header used to deliver the IP packet to the next hop, which could be the
destination VTEP or be a router used to forward the IP packet towards
the destination VTEP. If VLANs are in use within the data center, then
this Ethernet header would also contain a VLAN tag.

The following figures show the entire stack of protocol headers that
would be seen on an Ethernet link carrying encapsulated packets from a
VTEP across the underlay network for both IPv4 and IPv6 based underlay
networks.

If the inner packet (as indicated by the VXLAN GPE Next Protocol
field) is an Ethernet frame, it is recommended that it does not
contain a VLAN tag. In the most common scenarios, the tenant VLAN tag
is translated into a VXLAN Network Identifier. In these scenarios,
VTEPs should never send an inner Ethernet frame with a VLAN tag, and a
VTEP performing decapsulation should discard any inner frames received
with a VLAN tag. However, if the VTEPs are specifically configured to
support it for a specific VXLAN Network Identifier, a VTEP may support
transparent transport of the inner VLAN tag between all tenant systems
on that VNI. The VTEP never looks at the value of the inner VLAN tag,
but simply passes it across the underlay.

VTEPs MUST NOT fragment an encapsulated VXLAN GPE packet, and
when the outer IP header is IPv4, VTEPs MUST set the DF bit in the
outer IPv4 header. It is recommended that the underlay network be
configured to carry an MTU at least large enough to accommodate the
added encapsulation headers. It is recommended that VTEPs perform Path
MTU discovery to determine if the underlay network can
carry the encapsulated payload packet.

A VXLAN VTEP conforms to the VXLAN frame format and uses UDP
destination port 4789 when sending traffic to a VXLAN GPE VTEP. As per
VXLAN, reserved bits 5 and 7 (the VXLAN GPE P and O bits, respectively) must
be set to zero. The remaining reserved bits must be zero, including
the VXLAN GPE version field, bits 2 and 3. The encapsulated payload
MUST be Ethernet.

A VXLAN GPE VTEP MUST NOT encapsulate non-Ethernet frames to a
VXLAN VTEP. When encapsulating Ethernet frames to a VXLAN VTEP, the
VXLAN GPE VTEP MUST conform to VXLAN frame format and hence will set
the P bit to 0, the Next Protocol to 0 and use UDP destination port
4789. A VXLAN GPE VTEP MUST also set O = 0 and Ver = 0 when
encapsulating Ethernet frames to a VXLAN VTEP. The receiving VXLAN VTEP
will treat this packet as a VXLAN packet.

A method for determining the capabilities of a VXLAN VTEP (GPE or
non-GPE) is out of the scope of this draft.

VXLAN GPE uses an IANA assigned UDP destination port, 4790, when
sending traffic to VXLAN GPE VTEPs.

When encapsulating and decapsulating IPv4 and IPv6 packets, certain
fields, such as IPv4 Time to Live (TTL) from the inner IP header, need
to be considered. VXLAN GPE IP encapsulation and decapsulation
utilizes the techniques described in ,
section 5.3.

This section provides three examples of protocols encapsulated using
the Generic Protocol Extension for VXLAN described in this document.

VXLAN's security is focused on issues around L2 encapsulation into
L3. With VXLAN GPE, issues such as spoofing, flooding, and traffic
redirection are dependent on the particular protocol payload
encapsulated.

A special thank you goes to Dino Farinacci for his guidance and
detailed review.

UDP port 4790 has been assigned by IANA for VXLAN GPE.

IANA is requested to set up a registry of "Next Protocol". These
are 8-bit values. Next Protocol values 0, 1, 2, 3, 4 and 5 are defined in
this draft. New values are assigned via Standards Action.

   +---------------+-------------+---------------+
   | Next Protocol | Description | Reference     |
   +---------------+-------------+---------------+
   | 0             | Reserved    | This Document |
   | 1             | IPv4        | This Document |
   | 2             | IPv6        | This Document |
   | 3             | Ethernet    | This Document |
   | 4             | NSH         | This Document |
   | 5             | MPLS        | This Document |
   | 6..253        | Unassigned  |               |
   +---------------+-------------+---------------+

There are eight flag bits at the beginning of the VXLAN GPE header,
followed by 16 reserved bits and an 8-bit reserved field at the end of
the header. New bits are assigned via Standards Action.

   Bits 0-1 - Reserved
   Bits 2-3 - Version
   Bit 4 - Instance ID (I bit)
   Bit 5 - Next Protocol (P bit)
   Bit 6 - BUM Traffic (B bit)
   Bit 7 - OAM (O bit)
   Bits 8-23 - Reserved
   Bits 24-31 in the 2nd Word - Reserved

Reserved bits/fields MUST be set to 0 by the sender and ignored by
the receiver.
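
As an informal illustration of the header layout described in this
document, the following Python sketch packs and parses the 8-octet
VXLAN GPE header. It is not part of the specification. It assumes
network bit numbering (bit 0 is the most significant bit of the first
octet), places the B bit at flag bit 6 as defined in the body of this
document, and always sets the I and P bits on transmission. The
function and constant names (pack_gpe_header, parse_gpe_header, and
the mask names) are hypothetical and chosen only for this example.

```python
import struct

# Flag masks within the first octet, assuming bit 0 is the most significant bit.
VER_MASK = 0x30   # bits 2-3: Version
VER_SHIFT = 4
I_BIT = 0x08      # bit 4: Instance ID (valid VNI)
P_BIT = 0x04      # bit 5: Next Protocol field present
B_BIT = 0x02      # bit 6: ingress-replicated BUM traffic
O_BIT = 0x01      # bit 7: OAM packet

# Next Protocol values defined in this draft.
NEXT_PROTOCOLS = {1: "IPv4", 2: "IPv6", 3: "Ethernet", 4: "NSH", 5: "MPLS"}


def pack_gpe_header(vni, next_protocol, oam=False, bum=False, version=0):
    """Build the 8-octet header; reserved bits and fields are left as zero."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= next_protocol <= 0xFF:
        raise ValueError("Next Protocol must fit in 8 bits")
    if not 0 <= version <= 3:
        raise ValueError("Version must fit in 2 bits")
    flags = (version << VER_SHIFT) | I_BIT | P_BIT
    if bum:
        flags |= B_BIT
    if oam:
        flags |= O_BIT
    word1 = (flags << 24) | next_protocol   # flags | 16 reserved bits | Next Protocol
    word2 = vni << 8                        # VNI | 8 reserved bits
    return struct.pack("!II", word1, word2)


def parse_gpe_header(data, supported_versions=(0,)):
    """Decode the header; raise ValueError where a receiver would drop the packet."""
    if len(data) < 8:
        raise ValueError("truncated VXLAN GPE header")
    word1, word2 = struct.unpack("!II", data[:8])
    flags = word1 >> 24
    version = (flags & VER_MASK) >> VER_SHIFT
    if version not in supported_versions:
        # A receiver that does not support the indicated version MUST drop.
        raise ValueError("unsupported VXLAN GPE version")
    return {
        "version": version,
        "i_bit": bool(flags & I_BIT),
        "p_bit": bool(flags & P_BIT),
        "b_bit": bool(flags & B_BIT),
        "o_bit": bool(flags & O_BIT),
        "next_protocol": word1 & 0xFF,
        "vni": word2 >> 8,               # reserved low octet is ignored on receipt
    }


header = pack_gpe_header(vni=0x123456, next_protocol=1)
fields = parse_gpe_header(header)
print(header.hex(), NEXT_PROTOCOLS.get(fields["next_protocol"]), hex(fields["vni"]))
```

The parser ignores the reserved bits rather than checking them, which
follows the rule that reserved bits and fields are ignored on receipt.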