Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit Replication (BIER) Layer
ZTE Corp., gregimirsky@gmail.com
Juniper Networks, prz@juniper.net
Nokia, andrew.dolganow@nokia.com
Routing
BIER Working Group
Internet-Draft
BIEROAM
This document describes Path Maximum Transmission Unit Discovery (PMTUD) in the Bit Index Explicit Replication (BIER) layer.
In packet-switched networks, when a host seeks to transmit data to a
target destination, the data is transmitted as a set of packets. In many cases it is more efficient to
use the largest packets that are less than or equal to the least
Maximum Transmission Unit (MTU) of any forwarding device
along the routed path to the IP destination of these packets.
This least MTU is known as the Path MTU (PMTU).
Fragmentation or packet drop, silent or not, may occur on hops along the route where the MTU is smaller
than the size of the datagram. To avoid these behaviors, the packet source must find
the value of the least MTU, i.e., the PMTU, that will be encountered along the route that
a set of packets will follow to reach the given set of destinations.
Such MTU determination along a specific path is referred to as Path MTU Discovery (PMTUD).
The BIER architecture specification introduces and explains the Bit Index Explicit Replication (BIER)
architecture and how it supports forwarding of multicast data packets.
A BIER domain consists of
Bit-Forwarding Routers (BFRs) that are uniquely identified by their respective BFR-ids. An ingress border router (acting as a
Bit-Forwarding Ingress Router (BFIR)) inserts a Forwarding Bit Mask (F-BM) into a packet. Each targeted
egress node (referred to as a Bit-Forwarding Egress Router (BFER)) is represented by a Bit Mask Position (BMP)
in the BMS. A transit or intermediate BIER node, referred to as a BFR, forwards BIER-encapsulated packets to
BFERs, identified by their respective BMPs, according to a Bit Index Forwarding Table (BIFT).
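The replication step a BFR performs with its BIFT can be sketched as follows. This is a simplified illustration of BIFT-based forwarding, not a normative procedure, and it rests on stated assumptions: bit positions are 0-based integers, a BitString is a Python int, and each entry of a hypothetical BIFT maps a bit position to an (F-BM, neighbor) pair. The function name `replicate` and the example topology (D and E reached via B, F and G via C) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical BIFT: bit position -> (F-BM, neighbor BFR name).
Bift = Dict[int, Tuple[int, str]]

def replicate(bitstring: int, bift: Bift) -> List[Tuple[str, int]]:
    """Return the (neighbor, BitString) copies a BFR sends for one packet."""
    copies: List[Tuple[str, int]] = []
    remaining = bitstring
    while remaining:
        low = remaining & -remaining          # lowest set bit
        pos = low.bit_length() - 1            # its 0-based bit position
        entry = bift.get(pos)
        if entry is None:
            remaining &= ~low                 # no BIFT entry: drop this bit
            continue
        fbm, neighbor = entry
        # The copy carries only the BFERs reachable through this neighbor.
        copies.append((neighbor, remaining & fbm))
        # Clear those bits (and the current one) so each BFER gets one copy.
        remaining &= ~(fbm | low)
    return copies

# Hypothetical BIFT at BFIR A: D (bit 0), E (bit 1) via B; F (bit 2), G (bit 3) via C.
bift_a: Bift = {
    0: (0b0011, "B"), 1: (0b0011, "B"),
    2: (0b1100, "C"), 3: (0b1100, "C"),
}
print(replicate(0b1111, bift_a))  # [('B', 3), ('C', 12)]
print(replicate(0b0001, bift_a))  # [('B', 1)]
```

The second call shows why a partial bitmask is useful for PMTUD: a probe addressed only to D is replicated only toward B.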
BFR: Bit-Forwarding Router
BFER: Bit-Forwarding Egress Router
BFIR: Bit-Forwarding Ingress Router
BIER: Bit Index Explicit Replication
BIFT: Bit Index Forwarding Table
F-BM: Forwarding Bit Mask
MTU: Maximum Transmission Unit
OAM: Operations, Administration and Maintenance
PMTUD: Path MTU Discovery
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
The BIER OAM requirements document sets forth the requirement to define a PMTUD
protocol for the BIER domain. This document describes an extension to BIER Ping
for use in the BIER PMTUD solution.
Current PMTUD mechanisms are primarily targeted to work on point-to-point,
i.e., unicast, paths. These mechanisms control fragmentation by disabling
fragmentation of the probe packet. As a result, a transit node that cannot
forward a probe packet that is bigger than its link MTU sends an error
notification to the packet source; otherwise, the packet destination may
respond with a positive acknowledgement. Thus, possibly through a series of
iterations that vary the size of the probe packet, the packet source discovers
the PMTU of the particular path.
Applied in this way, existing PMTUD solutions are inefficient for point-to-multipoint paths
constructed for multicast traffic: probe packets must be flooded through the whole set of
multicast distribution paths over and over again until the very last egress responds with a
positive acknowledgement. Consider, without loss of generality, the example
multicast network in the accompanying figure, with root A and leaf nodes D, E, F, and G,
where the MTU on all links but one, (B, D), is the same. If the MTU on link (B, D) is
smaller than the MTU on the other links, then with existing PMTUD mechanisms probes
will be unnecessarily flooded to leaf nodes E, F, and G for the second and subsequent
times, and positive responses will repeatedly be generated by them and received by root A.
A BFIR selects a set of BFERs for a specific multicast distribution. The BFIR then determines
the MTU of that multicast distribution tree by explicitly controlling the subset of targeted
BFERs and transmitting a series of probe packets. The critical step is that when an
intermediate BFR fails to forward toward a subset of the targeted downstream BFERs, the BFR
returns to the originating BFIR, in an error notification, a bitmask that is partial compared
to the one it received in the request. This allows the next, smaller probe to be addressed
only to the downstream BFERs that failed, instead of to all BFERs addressed in the previous probe.
In the example scenario discussed above, the second and all following probes (if needed)
will be sent only to node D, since MTU discovery toward E, F, and G was already completed
successfully by the first probe.
The BIER Ping specification introduced BIER Ping as a transport-independent
OAM mechanism to detect and localize failures in the BIER data plane. This document specifies
how BIER Ping can be used to perform efficient PMTUD in the BIER domain.
Consider the example network to represent a BIER domain in which all nodes are BFRs.
To discover the MTU over the BIER domain to BFERs D, E, F, and G, BFIR A uses
BIER Ping with the Data TLV defined in this document. The size of the first probe is set
to M_max, determined as the minimal MTU value among the BFIR's links to the BIER domain.
As assumed in the example above, the MTUs of all links except link (B, D) are the same.
Thus, BFERs E, F, and G receive the BIER Echo Request and send their
respective replies to BFIR A. BFR B may pass the packet, which is too large to
forward over egress link (B, D), to the appropriate network layer for error processing,
where it is recognized as a BIER Echo Request packet.
BFR B MUST send a BIER Echo Reply to BFIR A and MUST include the Downstream Mapping TLV,
defined in the BIER Ping specification, setting its fields as follows:
* MTU SHOULD be set to the minimal MTU value among all egress BIER links (logical links
  between this BFR and its downstream BFRs) that could be used to reach B's downstream BFERs;
* Address Type MUST be set to 0 [Ed.note: we need to define 0 as a valid value for the Address
  Type field with the specific semantics to "Ignore" it.];
* the I flag MUST be cleared;
* the Downstream Interface Address field (4 octets) MUST be zeroed;
* the Egress Bitstring sub-TLV MUST include the list of all BFERs that cannot be reached
  because the attempted MTU turned out to be too small.
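How BFR B might derive these field values can be sketched as follows. This is a minimal sketch of the field values only, not the on-wire encoding of the Downstream Mapping TLV. It assumes a hypothetical BIFT (bit position to (F-BM, neighbor)) and a hypothetical per-neighbor link MTU table, and it interprets "B's downstream BFERs" as the BFERs in the BitString B received. Address Type 0 follows the editor's note and is not yet a defined value. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class DownstreamMappingFields:
    """Logical view of the fields BFR B sets (not the wire format)."""
    mtu: int
    address_type: int                    # 0, pending the Ed.note ("Ignore")
    i_flag: bool                         # cleared
    downstream_interface_address: bytes  # 4 zero octets
    egress_bitstring: int                # BFERs unreachable at the probe size

def build_negative_reply(probe_size: int, received_bitstring: int,
                         bift: Dict[int, Tuple[int, str]],
                         link_mtu: Dict[str, int]) -> Optional[DownstreamMappingFields]:
    unreachable = 0
    min_mtu: Optional[int] = None
    remaining = received_bitstring
    while remaining:
        low = remaining & -remaining
        pos = low.bit_length() - 1
        if pos in bift:
            fbm, neighbor = bift[pos]
            mtu = link_mtu[neighbor]
            # MTU field: smallest MTU among egress links toward the requested BFERs.
            min_mtu = mtu if min_mtu is None else min(min_mtu, mtu)
            if probe_size > mtu:           # probe too large for this egress link
                unreachable |= remaining & fbm
            remaining &= ~(fbm | low)
        else:
            remaining &= ~low              # no BIFT entry for this bit
    if not unreachable:
        return None                        # probe fits every link: no negative reply
    return DownstreamMappingFields(
        mtu=min_mtu, address_type=0, i_flag=False,
        downstream_interface_address=bytes(4),
        egress_bitstring=unreachable,
    )

# Hypothetical BFR B: D (bit 0) over a 1400-octet link, E (bit 1) over 1500.
bift_b = {0: (0b0001, "D"), 1: (0b0010, "E")}
link_mtu_b = {"D": 1400, "E": 1500}
reply = build_negative_reply(1500, 0b0011, bift_b, link_mtu_b)
print(reply.mtu, bin(reply.egress_bitstring))  # 1400 0b1
```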
The BFIR will receive either of two types of packets:
* a positive Echo Reply from one of the BFERs to which the probe has been sent. In this
  case, the bit corresponding to that BFER MUST be cleared from the BMS;
* a negative Echo Reply with a bit string listing unreached BFERs and a recommended MTU
  value, MTU'. The BFIR MUST add the bit string to its BMS and set the size of the next
  probe to min(MTU, MTU'), where MTU is the size of the probe just sent.
If, upon expiration of the Echo Request timer, the BFIR has not received any Echo Replies,
then the size of the probe SHOULD be decreased. There are scenarios in which
an implementation of PMTUD would not decrease the size of the probe.
For example, if upon expiration of the Echo Request timer the BFIR has not received
any Echo Reply, then the BFIR MAY continue to retransmit the probe using the initial
size and MAY apply delayed probe retransmission procedures. The algorithm used to
delay retransmission on the BFIR is outside the scope of this specification.
The BFIR sends probes using the BMS and locally defined retransmission
procedures until either the bit string is clear, i.e., contains no set bits, or the BFIR
retransmission procedure terminates and PMTU discovery is declared unsuccessful.
If the procedure converges, the size of the last probe indicates the PMTU
that can be used for all BFERs in the initial BMS without incurring fragmentation.
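The BFIR procedure above can be exercised end to end with a small simulation. This is a minimal sketch under stated assumptions: the network is a hypothetical per-BFER list of hops (the BFR transmitting over each link and that link's MTU); a BFR that cannot forward reports the smallest failing link MTU as MTU' (a simplification); the BMS is a Python int; and the reaction to a timer expiry without replies (shrinking the probe to 3/4 of its size) plus the `max_probes` limit are hypothetical stand-ins for the locally defined retransmission procedure this document leaves out of scope. The topology and MTU values only mirror the example in which link (B, D) has the smaller MTU.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Hop:
    bfr: str  # BFR transmitting over this link
    mtu: int  # MTU of the link

# (kind, bitstring, recommended MTU'); MTU' is None for positive replies.
Reply = Tuple[str, int, Optional[int]]

def send_probe(size: int, bms: int, paths: Dict[int, List[Hop]]) -> List[Reply]:
    """Simulate one BIER Echo Request of `size` octets addressed to `bms`."""
    replies: List[Reply] = []
    negatives: Dict[str, Tuple[int, int]] = {}  # BFR -> (unreached bits, MTU')
    for bit, hops in paths.items():
        if not (bms >> bit) & 1:
            continue
        for hop in hops:
            if size > hop.mtu:  # this BFR cannot forward the probe
                bits, mtu = negatives.get(hop.bfr, (0, hop.mtu))
                negatives[hop.bfr] = (bits | (1 << bit), min(mtu, hop.mtu))
                break
        else:  # probe reached the BFER: positive Echo Reply
            replies.append(("positive", 1 << bit, None))
    replies += [("negative", bits, mtu) for bits, mtu in negatives.values()]
    return replies

def discover_pmtu(bfir_link_mtus: List[int], targets: int,
                  paths: Dict[int, List[Hop]], max_probes: int = 8):
    size = min(bfir_link_mtus)  # M_max: smallest MTU of the BFIR's BIER links
    bms = targets
    log: List[Tuple[int, int]] = []  # (probe size, BMS the probe was sent to)
    for _ in range(max_probes):
        log.append((size, bms))
        replies = send_probe(size, bms, paths)
        if not replies:  # timer expired with no replies (hypothetical policy)
            size = size * 3 // 4
            continue
        next_size = size
        for kind, bits, mtu in replies:
            if kind == "positive":
                bms &= ~bits               # BFER reached: clear its bit
            else:
                bms |= bits                # unreached BFERs stay in the BMS
                next_size = min(next_size, mtu)
        if bms == 0:
            return size, log               # converged: last probe size is the PMTU
        size = next_size
    return None, log                       # discovery declared unsuccessful

# Bits: D=0, E=1, F=2, G=3; only link (B, D) has a smaller MTU.
paths = {
    0: [Hop("A", 1500), Hop("B", 1400)],
    1: [Hop("A", 1500), Hop("B", 1500)],
    2: [Hop("A", 1500), Hop("C", 1500)],
    3: [Hop("A", 1500), Hop("C", 1500)],
}
pmtu, log = discover_pmtu([1500], 0b1111, paths)
print(pmtu, log)  # 1400 [(1500, 15), (1400, 1)]
```

The log reflects the behaviour described for the example: the second probe is addressed only to D (BMS 0b0001), because E, F, and G already acknowledged the first probe.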
Thus, in order to comply with the requirement in the BIER OAM requirements document:
* a BFR SHOULD support PMTUD;
* a BFR MAY use a defined per-BIER-sub-domain MTU value as the initial MTU value for
  discovery, or use it as the MTU for this BIER sub-domain to reach BFERs;
* a BFIR MUST have a locally defined PMTUD probe retransmission procedure.
To support BIER PMTUD, the size of the probe must be controllable. The Data TLV format
is presented in the accompanying figure; its fields are:
* Type: indicates the Data TLV; the value is to be allocated by IANA.
* Length: the length of the Data field in octets.
* Data: n octets (n = Length) of arbitrary data. The receiver SHOULD ignore it.
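Padding a probe to a chosen size with the Data TLV can be sketched as follows. Because the TLV format figure is not reproduced here, the sketch assumes 2-octet Type and 2-octet Length fields in network byte order. The Type value is still TBA1, so the constant below is a hypothetical placeholder, not an assigned code point. The `request_without_data` argument is assumed to hold all other octets of the probe, which simplifies away encapsulation details.

```python
import struct

# Hypothetical placeholder: the real Data TLV Type (TBA1) is to be assigned by IANA.
DATA_TLV_TYPE_PLACEHOLDER = 0xFFFF
TLV_HEADER_LEN = 4  # assumed: 2-octet Type + 2-octet Length

def build_data_tlv(n: int, tlv_type: int = DATA_TLV_TYPE_PLACEHOLDER) -> bytes:
    """Encode a Data TLV carrying n octets of arbitrary data (zeros here)."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("Data length must fit in the 16-bit Length field")
    return struct.pack("!HH", tlv_type, n) + bytes(n)

def pad_probe(request_without_data: bytes, probe_size: int) -> bytes:
    """Append a Data TLV so the probe is exactly probe_size octets long."""
    n = probe_size - len(request_without_data) - TLV_HEADER_LEN
    if n < 0:
        raise ValueError("probe_size is smaller than the unpadded request")
    return request_without_data + build_data_tlv(n)

probe = pad_probe(bytes(40), 1400)
print(len(probe), struct.unpack("!HH", probe[40:44]))  # 1400 (65535, 1356)
```

Since the receiver ignores the Data field, zero octets are as good as any other filler; only the Length value matters for sizing the probe.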
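IANA is requested to assign a new Type value for the Data TLV from its registry of TLV and sub-TLV Types of BIER Ping,
as follows:

+-------+-------------+---------------+
| Value | Description | Reference     |
+-------+-------------+---------------+
| TBA1  | Data        | This document |
+-------+-------------+---------------+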
Routers that support PMTUD based on this document are subject to the same security considerations as those defined in the BIER Ping specification.
The authors greatly appreciate the thorough review and detailed comments by Eric Gray.