IP Parcels and Advanced Jumbos

Boeing Research & Technology
P.O. Box 3707
Seattle, WA 98124
USA
fltemplin@acm.org

Internet-Draft

IP packets (both IPv4 and IPv6) contain a single unit of transport
layer protocol data which becomes the retransmission unit in case of loss.
Transport layer protocols including the Transmission Control Protocol (TCP)
and reliable delivery protocol users of the User Datagram Protocol (UDP)
prepare data units known as "segments", with individual IP packets
including only a single segment. This document presents new constructs
known as "IP Parcels" and "Advanced Jumbos". IP parcels permit a single
packet to carry multiple transport layer protocol segments in a
"packet-of-packets", while advanced jumbos provide significant
operational advantages over standard jumbograms for carrying truly
large segments. IP parcels and advanced jumbos provide essential
building blocks for improved performance, efficiency and integrity
while encouraging larger Maximum Transmission Units (MTUs) in the
Internet.

IP packets (both IPv4 and IPv6) contain a single unit of transport layer protocol
data which becomes the retransmission unit in case of loss. Transport
layer protocols such as the Transmission Control Protocol (TCP) and reliable delivery protocol users of the User
Datagram Protocol (UDP) (including QUIC, LTP and others)
prepare data units known as "segments", with individual IP packets
including only a single segment. This document presents a new construct
known as the "IP Parcel" which permits a single packet to carry
multiple transport layer protocol segments. This essentially creates
a "packet-of-packets" with the full {TCP,UDP}/IP headers appearing
only once but with possibly more than one segment.

Transport layer protocol entities form parcels by preparing a
data buffer (or buffer chain) beginning with an Integrity Block of at
most 256 2-octet Checksums followed by their corresponding transport
layer protocol segments that can be broken out into individual packets
and/or smaller sub-parcels if necessary. All segments except the final
one must be equal in length and no larger than 65535 octets (minus
headers), while the final segment must not be larger than the others
but may be smaller. The transport layer protocol entity then delivers
the buffer(s), number of segments and non-final segment size to the
network layer which copies the buffer(s) into the body of a parcel
then includes a {TCP,UDP} header and an IP header plus extensions
that identify this as a parcel and not an ordinary packet.

The network layer then forwards each parcel over consecutive
parcel-capable links in a path until they arrive at a next hop
link that does not support parcels, a parcel-capable link with a
size restriction, or an ingress middlebox Overlay Multilink Network
(OMNI) Interface that
spans intermediate Internetworks using adaptation layer encapsulation
and fragmentation. In the first case, the original source or next hop
router applies packetization to break the parcel into individual IP
packets. In the second case, the source/router applies network layer
parcellation to form smaller sub-parcels. In the final case, the
OMNI interface applies adaptation layer parcellation to form smaller
sub-parcels if necessary then applies adaptation layer encapsulation
and fragmentation if necessary before forwarding.

These adaptation layer sub-parcels may then be reunified into
one or more larger sub-parcels by an egress middlebox OMNI interface
which either delivers them locally or forwards them over additional
parcel-capable links in the network path to the final destination.
The final destination can then apply network layer reunification (or
restoration) to concatenate elements of the same original parcel
into a single unit so as to present the largest possible number of
segments to the transport layer in a single system call. Reordering
and even loss or damage of individual segments within the network is
therefore possible, but what matters is that the parcels delivered
to the final destination's transport layer should be the largest
practical size for best performance and that loss or receipt of
individual segments (and not parcel size) determines the
retransmission unit.

This document further specifies an "advanced jumbo" service that
provides useful extensions beyond the "basic" IPv6 jumbogram service
defined in . Advanced jumbos are defined for
both IP protocol versions and provide end systems and routers with a
more robust service when truly large segment sizes are necessary.

The following sections discuss rationale for creating and shipping
IP parcels and advanced jumbos as well as actual protocol constructs
and procedures involved. IP parcels and advanced jumbos provide
essential building blocks for improved performance, efficiency and
integrity while encouraging larger Maximum Transmission Units (MTUs).
These services will further inspire future innovation in applications,
transport protocols, operating systems, network equipment and data
links in ways that will transform the Internet architecture.

The Oxford Languages dictionary defines a "parcel" as "a thing or
collection of things wrapped in paper in order to be carried or sent
by mail". Indeed, there are many examples of parcel delivery services
worldwide that provide an essential transit backbone for efficient
business and consumer transactions.

In this same spirit, an "IP parcel" is simply a collection of at most
256 transport layer protocol segments wrapped in an efficient package
for transmission and delivery (i.e., a "packet-of-packets") while a
"singleton IP parcel" is simply a parcel that contains a single segment.
IP parcels are distinguished from ordinary packets through the
constructs specified in this document.

The IP parcel construct is defined for both IPv4 and IPv6. Where the
document refers to "IPv4 header length", it means the total length of
the base IPv4 header plus all included options, i.e., as determined by
consulting the Internet Header Length (IHL) field. Where the document
refers to "IPv6 header length", however, it means only the length of the
base IPv6 header (i.e., 40 octets), while the length of any extension
headers is referred to separately as the "IPv6 extension header length".
Finally, the term "IP header plus extensions" refers generically to an
IPv4 header plus all included options or an IPv6 header plus all
included extension headers.

Where the document refers to "{TCP,UDP} header length", it means
the length of either the TCP header plus options (20 or more octets)
or the UDP header (8 octets). It is important to note that only a
single IP header and a single full {TCP,UDP} header appear in
each parcel regardless of the number of segments included. This
distinction often provides significant savings in overhead made
possible only by IP parcels.

Where the document refers to checksum calculations, it means the
standard Internet checksum unless otherwise specified. As for
TCP, UDP and IPv4, the standard Internet checksum is defined as
(sic) "the 16-bit one's complement of the one's complement sum of all
(pseudo-)headers plus data, padded with zero octets at the end (if
necessary) to make a multiple of two octets". A notional Internet
checksum algorithm can be found in , while
practical implementations require special attention to byte ordering
("endianness") to ensure interoperability between diverse architectures.

The terms "application layer (L5 and higher)", "transport layer
(L4)", "network layer (L3)", "(data) link layer (L2)" and "physical
layer (L1)" are used consistently with common Internetworking
terminology, with the understanding that reliable delivery protocol
users of UDP are considered as transport layer elements. The OMNI
specification further defines an "adaptation layer" logically positioned
below the network layer but above the link layer (which may include
physical links and Internet- or higher-layer tunnels). The adaptation
layer is simply known as "the layer below L3 but above L2" and does
not assign a layer number itself. A network interface is a node's
attachment to a link (via L2), and an OMNI interface is therefore
a node's attachment to an OMNI link (via the adaptation layer).

The term "parcel-capable link/path" refers to paths that traverse
interfaces to adaptation and/or link layer media (either physical or
virtual) capable of transiting {TCP,UDP}/IP packets that employ the
parcel constructs specified in this document. The source and each
router in the path has a "next hop link" that forwards parcels toward
the final destination, while each router and the final destination has
a "previous hop link" that accepts en route parcels. Each next hop link
must be capable of forwarding parcels (after first applying parcellation
if necessary) with segment lengths no larger than can transit the link.
Currently only the OMNI link satisfies these properties, but new and
existing link types are also encouraged to support parcels.

The term "5-tuple" refers to a transport layer protocol entity
identifier consisting of (source address, destination address,
source port, destination port, protocol number). The term "3-tuple"
refers to a network layer parcel entity identifier, also used by the
adaptation layer, consisting of (source address, destination address,
Parcel ID).

The term "Maximum Transmission Unit (MTU)" is widely understood
in Internetworking terminology to mean the largest packet size that
can traverse a single link ("link MTU") or an entire path ("path MTU")
without requiring network layer IP fragmentation. If the MTU value
returned during parcel path qualification is larger than 65535 (plus
the length of the parcel headers), it determines the maximum parcel
size that can traverse the link/path without requiring a router to
perform packetization/parcellation. Otherwise, the MTU determines
the "Maximum Segment Size (MSS)" for the leading portion of the
path up to a router that cannot forward the parcel further. (Note
that this size may still be larger than the MSS that can traverse
the remainder of the path to the final destination, which can
only be determined through additional probing.)

The terms "packetization" and "restoration" refer to a network
layer process in which the original source or a router on the path
breaks a parcel out into individual IP packets that can transit
the remainder of the path without loss due to a size restriction.
The final destination then restores the combined packet contents
into a parcel before delivery to the transport layer. In current
practice, packetization/restoration are considered to be one and
the same as Generic Segmentation/Receive Offload (GSO/GRO).

The terms "parcellation" and "reunification" refer to either
network layer or adaptation layer processes in which the original
source or a router on the path breaks a parcel into smaller
sub-parcels that can transit the path without loss due to a size
restriction. These sub-parcels are then reunified into larger
(sub-)parcels before delivery to the transport layer. As a network
layer process, the sub-parcels resulting from parcellation may
only be reunified at the final destination. As an adaptation
layer process, the resulting sub-parcels may be first reunified
at an adaptation layer egress node then possibly further
reunified by the network layer of the final destination.

The parcel sizing variables "J", "K", "L" and "M" are cited
extensively throughout the document. "J" denotes one less than the
number of segments included in the parcel (i.e., the parcel contains
(J + 1) segments; also termed "Nsegs"), "L" is the length of each
non-final segment, "K" is the length of the final segment and "M" is
the overall parcel length (also termed "Parcel Payload Length").

The term "advanced jumbo" refers to a new type of IP jumbogram
defined for both IP protocol versions and derived from "basic" IPv6
jumbograms as defined in . Advanced jumbos
include a 32-bit Jumbo Payload Length field the same as for basic
IPv6 jumbograms, but are differentiated by including a non-zero
integer "Type" value (i.e., 1 or 2) in the IP {Total, Payload}
Length field. Type 1 or 2 advanced jumbos can be in either
minimal or expanded format, with expanded format including
additional Jumbo Payload option control information.

Automatic Extended Route Optimization (AERO) and the Overlay Multilink Network
Interface (OMNI) provide an
architectural framework for transmission of IP parcels over existing
Internetworks. AERO/OMNI will provide an operational environment for
IP parcels beginning from the earliest deployment phases and extending
indefinitely to accommodate continuous future growth. As more and more
parcel-capable links are deployed (e.g., in data centers, edge networks,
space-domain, and other high data rate services) AERO/OMNI will continue
to provide an essential service for true IP parcel Internetworking.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP 14
when, and only when,
they appear in all capitals, as shown here.

Studies have shown that applications can improve their performance by
sending and receiving larger packets due to reduced numbers of system
calls and interrupts as well as larger atomic data copies between kernel
and user space. Larger packets also result in reduced numbers of network
device interrupts and better network utilization (e.g., due to header
overhead reduction) in comparison with smaller packets.

A first study involved performance enhancement
of the QUIC protocol using the Linux Generic
Segment/Receive Offload (GSO/GRO) facility. GSO/GRO provides a robust
service that has shown significant performance increases based on a
multi-segment transfer capability between the operating system kernel
and QUIC applications. GSO/GRO performs fragmentation and reassembly at
the transport layer with the transport protocol segment size limited by
the path MTU (typically 1500 octets or smaller in today's Internet).

A second study showed that
GSO/GRO also improves performance for the Licklider Transmission
Protocol (LTP) used for the Delay Tolerant
Networking (DTN) Bundle Protocol for segments
larger than the actual path MTU through the use of OMNI interface
encapsulation and fragmentation. Historically, the NFS protocol also
saw significant performance increases using larger (single-segment)
UDP datagrams even when IP fragmentation is invoked, and LTP still
follows this profile today. Moreover, LTP shows this (single-segment)
performance increase profile extending to the largest possible segment
size which suggests that additional performance gains are possible
using (multi-segment) IP parcels that approach or even exceed
65535 octets.

TCP also benefits from larger packet sizes and efforts have
investigated TCP performance using jumbograms internally with changes
to the Linux GSO/GRO facilities . The approach
proposed to use the Jumbo Payload option internally and to allow GSO/GRO
to use buffer sizes larger than 65535 octets, but with the understanding
that links that support jumbograms natively are not yet widely available.
Hence, IP parcels provide a packaging that can be considered in the
near term under current deployment limitations.

A limiting consideration for sending large packets is that they are
often lost at links with MTU restrictions, and the resulting Packet Too
Big (PTB) message may
be lost somewhere in the return path to the original source. This "Path
MTU black hole" condition can degrade performance unless robust path
probing techniques are used; however, the best case performance always
occurs when loss of packets due to size restrictions is minimized.

These considerations therefore motivate a design where transport
protocols can employ segment sizes as large as 65535 octets (minus
headers), while parcels that carry multiple segments may themselves
be significantly larger. This would allow the receiving transport
layer protocol entity to process multiple segments in parallel
instead of one at a time per existing practices. Parcels therefore
support improvements in performance, integrity and efficiency for
the original source, final destination and networked path as a whole.
This is true even if the network and lower layers need to apply
packetization/restoration, parcellation/reunification and/or
fragmentation/reassembly.

An analogy: when a consumer orders 50 small items from a major online
retailer, the retailer does not ship the order in 50 separate small
boxes. Instead, the retailer packs as many of the small items as
possible into one or a few larger boxes (i.e., parcels) then places the
parcels on a semi-truck or airplane. The parcels may then pass through
one or more regional distribution centers where they may be repackaged
into different parcel configurations and forwarded further until they
are finally delivered to the consumer. But most often, the consumer will
only find one or a few parcels at their doorstep and not 50 separate
small boxes. This flexible parcel delivery service greatly reduces
shipping and handling cost for all including the retailer, regional
distribution centers and finally the consumer.

A transport protocol entity identified by its 5-tuple
forms a parcel body when it prepares a data buffer (or buffer chain)
containing an Integrity Block of at most 256 2-octet Checksums
followed by their corresponding transport layer protocol segments,
with each TCP non-first segment preceded by a 4-octet Sequence
Number header. All non-final segments MUST be equal in length
while the final segment MUST NOT be larger and MAY be smaller.

The non-final segment size L MUST be set to a value between
16 and 65535 octets and SHOULD be no larger than the minimum of
65535 octets and the path MTU, minus the length of the {TCP,UDP}
header (plus options), minus the length of the IP header (plus
options/extensions), minus 2 octets for the per-segment Checksum
(see: ). The transport layer protocol
entity then presents the buffer(s) and size L to the network
layer, noting that the combined buffer length(s) may exceed
65535 octets if there are sufficient segments of a large
enough size.

If the next hop link is not parcel capable, the network layer
performs packetization to configure each segment as an individual IP
packet as discussed in . Otherwise,
the network layer forms a parcel by appending a single full {TCP,UDP}
header (plus options) and a single full IP header (plus options/extensions).
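To make the header savings concrete, the following Python sketch counts header octets for a UDP parcel against packetization of the same segments into individual packets. It is an illustrative calculation only, assuming IPv6 with the Parcel Payload option as the sole Hop-by-Hop option (2 octets of Hop-by-Hop header, 2 octets of option type/length and 12 octets of option data, for 16 octets in total). The constant and function names are hypothetical, and the 4-octet per-segment Sequence Number headers of TCP parcels are not modeled.

```python
# Illustrative header sizes for an IPv6/UDP parcel (assumptions, not normative)
IPV6_HDR = 40     # base IPv6 header
PARCEL_HBH = 16   # Hop-by-Hop header holding only the 12-octet Parcel Payload option data
UDP_HDR = 8       # UDP header
CHECKSUM = 2      # one Integrity Block Checksum per segment


def parcel_overhead(nsegs: int) -> int:
    """Header octets when nsegs UDP segments travel in a single parcel."""
    # One IP header plus extensions and one UDP header for the whole parcel,
    # plus one 2-octet Checksum per segment in the Integrity Block.
    return IPV6_HDR + PARCEL_HBH + UDP_HDR + CHECKSUM * nsegs


def packetized_overhead(nsegs: int) -> int:
    """Header octets when the same segments travel as individual packets."""
    return nsegs * (IPV6_HDR + UDP_HDR)


if __name__ == "__main__":
    for n in (1, 8, 32, 256):
        print(n, parcel_overhead(n), packetized_overhead(n))
```

Under these assumptions a singleton parcel carries more header octets than a single packet (66 versus 48), while 32 segments need 128 header octets as a parcel versus 1536 when packetized, so the savings grow with the number of segments sharing the headers.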
The network layer finally includes a specially-formatted "Parcel Payload"
option as an extension to the IP header of each parcel prior to
transmission over a network interface.

The Parcel Payload Option formats for both IP protocol versions
are derived from the Jumbo Payload option specified in and appear as shown in :
For IPv4, the network layer includes the Parcel Payload option
as an IPv4 header option with Option Type set to '00001011' and
Option Data Length set to '00010000' (noting that the length also
distinguishes this type from its obsoleted use as the "IPv4 Probe
MTU" option ). The network layer sets Code
to 255 and sets Check to the same value that will appear in the
IPv4 header TTL field upon transmission to the next hop. The
network layer also sets Parcel Payload Length to a 3-octet value
M that encodes the length of the IPv4 header plus the length of
the {TCP,UDP} header plus the combined length of the Integrity
Block plus all concatenated segments. The network layer then sets
the IPv4 header DF bit to 1 and Total Length field to the
non-final segment size L.

For IPv6, the network layer includes the Parcel Payload option
as an IPv6 Hop-by-Hop option with Option Type set to '01101110'
and Option Data Length set to '00001100'. (Note that the Option
Type is coded with the upper 2 bits set to '01' so that routers
that do not recognize the option will drop the packet without
returning an ICMPv6 message, and with bit 3 set to '1' since
the contents may change on the path. For further Hop-by-Hop
option processing considerations, see: .) The network layer then
sets Parcel Payload Length to a 3-octet value M that encodes the
lengths of all IPv6 extension headers present plus the length of
the {TCP,UDP} header plus the combined length of the Integrity
Block plus all concatenated segments. The network layer also sets
the IPv6 header Payload Length field to L.

For both IP protocol versions, the network layer then sets
Nsegs to a value J between 0 and 255 and sets Identification
and PMTU as specified in . The network
layer finally sets the "(P)robe Path MTU" flag to '1' for
probes or '0' for non-probes, and sets the "More (S)ub-parcels"
flag to '1' for non-final sub-parcels or '0' for the final
(sub-)parcel.

Following transport and network layer processing, {TCP,UDP}/IP
parcels therefore have the structures shown in :

where the total number of segments is (J + 1), L
is the length of each non-final segment (between 16 and 65535
octets), and K is the length of the final segment, which MUST
be no larger than L.

The {TCP,UDP}/IP header is immediately followed by an Integrity
Block containing (J + 1) 2-octet Checksums concatenated in numerical
order as shown in :
The Integrity Block is then followed by (J + 1) transport
layer segments. For TCP, the TCP header Sequence Number field
encodes a 4-octet starting sequence number for the first segment
only, while each additional segment is preceded by its own 4-octet
Sequence Number field. For this reason, the length of the first
segment is only (L-4) octets since the 4-octet TCP header
Sequence Number field applies to that segment. (All non-first
TCP segments instead begin with their own Sequence Number
headers, with the 4-octet length included in L and K.)

The Parcel Payload option Nsegs value unambiguously determines
the number of 2-octet Checksums present in the Integrity Block and
(together with the IP {Total, Payload} Length and Parcel Payload
Length) also determines the number of parcel data segments present.
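The relationship between Nsegs (J), the non-final segment size L and the remaining parcel length can be sketched in Python as follows. This is a minimal, non-authoritative illustration that assumes UDP-style segments (the (L-4) first segment and per-segment Sequence Number headers of TCP parcels are not modeled) and uses a checksum routine that follows the standard Internet checksum definition given earlier. The function names are hypothetical. build_udp_parcel_body() lays out an Integrity Block of (J + 1) Checksums followed by the segments, and segment_lengths() applies the receive-side requirements listed next in this section. One further assumption: when the payload ends partway through a segment, the sketch returns that trailing partial segment rather than dropping it.

```python
from __future__ import annotations

import struct


def internet_checksum(data: bytes) -> int:
    """Standard Internet checksum: 16-bit one's complement of the one's
    complement sum, zero-padding odd-length input to a multiple of two."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):  # network byte order
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_parcel_body(segments: list[bytes]) -> tuple[int, int, bytes]:
    """Return (J, L, Integrity Block + concatenated segments)."""
    if not 1 <= len(segments) <= 256:
        raise ValueError("a parcel carries 1 to 256 segments")
    L = len(segments[0])
    if not 16 <= L <= 65535:
        raise ValueError("L must be between 16 and 65535 octets")
    if any(len(s) != L for s in segments[:-1]) or len(segments[-1]) > L:
        raise ValueError("non-final segments must be L octets; final <= L")
    J = len(segments) - 1
    # UDP parcels convert a calculated '0' checksum to 'ffff'
    sums = [internet_checksum(s) or 0xFFFF for s in segments]
    return J, L, struct.pack(f"!{J + 1}H", *sums) + b"".join(segments)


def segment_lengths(remaining: int, J: int, L: int) -> list[int] | None:
    """Segment lengths a receiver processes, given the octets remaining
    after the {TCP,UDP} header (Integrity Block plus segments)."""
    ib_len = 2 * (J + 1)
    if remaining < ib_len:
        return None  # no room for the full Integrity Block: discard
    payload = remaining - ib_len
    if payload <= J * L:
        # Short parcel: process segments up to the end of the payload and
        # ignore leftover Checksums (ASSUMPTION: keep a partial tail).
        full, rest = divmod(payload, L)
        return [L] * full + ([rest] if rest else [])
    # J full segments plus a final segment capped at L octets; any
    # payload beyond the final segment is ignored.
    return [L] * J + [min(payload - J * L, L)]
```

For example, three segments of 20, 20 and 7 octets produce J = 2, L = 20 and a 53-octet body (6 octets of Integrity Block plus 47 octets of data), from which segment_lengths(53, 2, 20) recovers [20, 20, 7]. Packing the Checksums with the "!H" format fixes them in network byte order regardless of host endianness.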
Nodes that process and forward IP parcels therefore observe the
following requirements:

*  if the Parcel Payload Length indicates insufficient space for
   the full Integrity Block, the receiver discards the parcel.

*  if the length of the payload following the Integrity Block is
   (J * L) or less, the receiver processes all initial Checksums
   along with their corresponding segments up to the end of the
   payload and ignores any remaining Checksums (note that this
   also addresses the case of K less than 16).

*  if the length of the payload following the Integrity Block is
   greater than ((J + 1) * L), the receiver processes all Checksums
   with their corresponding segments and ignores any remaining
   payload beyond the end of the final segment.

Note: Per-segment Checksums appear in a contiguous Integrity Block
immediately following the {TCP,UDP}/IP headers instead of inline with
the parcel segments to greatly increase the probability that they will
appear in the contiguous head of a kernel receive buffer even if the
parcel was subject to OMNI interface IPv6 fragmentation. This condition
may not always hold if the IPv6 fragments also incur IPv4 encapsulation
and fragmentation over paths that traverse IPv4 links with small MTUs.
Even then, only the fragmented Integrity Block (i.e., and not the
entire parcel) may need to be pulled/copied into the contiguous
head of a kernel receive buffer.

Note: For IPv4 parcels, the first 2 octets of the Parcel Payload
option include Code and Check fields in case a router on the path
overwrites the values in a wayward attempt to implement . IPv4 parcel recipients should therefore regard
an incorrect Code or Check value as evidence that the field was
accidentally or intentionally corrupted by a previous hop node.

A TCP Parcel is an IP Parcel that includes an IP header plus
extensions with a Parcel Payload option formed as shown in
with Nsegs/J encoding one less than
the number of segments and Parcel Payload Length encoding a
value up to 16,777,215 (2**24 - 1). The IP header plus extensions
is then followed by a TCP header plus options (20 or more octets),
which is then followed by an Integrity Block with (J + 1) consecutive
2-octet Checksums. The Integrity Block is then followed by (J + 1)
consecutive segments, where the first segment is (L-4) octets in length
and uses the 4-octet sequence number found in the TCP header, each
intermediate segment is L octets in length (including its own 4-octet
Sequence Number header) and the final segment is K octets in
length (including its own 4-octet Sequence Number header). The
value L is encoded in the IP header {Total, Payload} Length field
while J is encoded in the Nsegs octet. The overall length of the
parcel as well as final segment length K are determined by Nsegs
and the Parcel Payload Length M as discussed above.

The source prepares TCP Parcels in an alternative adaptation of
TCP jumbograms . The source calculates a checksum
of the TCP header plus IP pseudo-header only (see: ),
but with the TCP header Sequence Number field temporarily set to 0
during the calculation since the true sequence number will be included
as an integrity pseudo-header for the first segment. The source then
writes the calculated value in the TCP header Checksum field as-is (i.e.,
without converting calculated '0' values to 'ffff') and finally re-writes
the actual sequence number back into the Sequence Number field. (Nodes
that verify the header checksum first perform the same operation of
temporarily setting the Sequence Number field to 0 and then resetting
to the actual value following checksum verification.)

The source then calculates the checksum of the first segment
beginning with the sequence number found in the full TCP header as a
4-octet pseudo-header then extending over the remaining (L-4) octet
length of the segment. The source next calculates the checksum for
each L octet intermediate segment independently over the length of
the segment (beginning with its sequence number), then finally
calculates the checksum of the K octet final segment (beginning
with its sequence number). As the source calculates each segment(i)
checksum (for i = 0 through J), it writes the value into the
corresponding Integrity Block Checksum(i) field as-is.

Note: The parcel TCP header Source Port, Destination Port and
(per-segment) Sequence Number fields apply to all parcel segments,
while the TCP control bits and all other fields apply only to the
first segment (i.e., "segment(0)"). Therefore, only parcel segment(0)
may be associated with control bit settings while all other
segment(i)'s must be simple data segments.

See for additional TCP considerations. See
for additional integrity considerations.

A UDP Parcel is an IP Parcel that includes an IP header plus
extensions with a Parcel Payload option formed as shown in
with Nsegs/J encoding one less than
the number of segments and Parcel Payload Length encoding a value
up to 16,777,215 (2**24 - 1). The IP header plus extensions is then
followed by an 8-octet UDP header followed by an Integrity Block
with (J + 1) consecutive 2-octet Checksums followed by (J + 1)
transport layer segments. Each segment must begin with a
transport-specific start delimiter (e.g., a segment identifier)
included by the transport layer user of UDP. The length of the
first segment L is encoded in the IP {Total, Payload} Length
field while J is encoded in the Nsegs octet. The overall
length of the parcel as well as the final segment length are
determined by the Parcel Payload Length M as discussed above.

The source prepares UDP Parcels in an alternative adaptation of
UDP jumbograms . The source MUST first set the UDP
header Length field to 0, then calculates the checksum of the UDP header
plus IP pseudo-header (see: ) and writes the
calculated value in the UDP header Checksum field as-is (i.e., without
converting calculated '0' values to 'ffff').

The source then calculates a separate checksum for each segment
for which checksums are enabled independently over the length of the
segment. As the source calculates each segment(i) checksum (for
i = 0 through J), it writes the value into the corresponding Integrity
Block Checksum(i) field with calculated '0' values converted to
'ffff'; for segments with checksums disabled, the source instead
writes the value '0'.

See: for additional integrity considerations.

During {TCP,UDP} parcel assembly, the network layer of the source
fully populates IP header fields including the source address,
destination address and Parcel Payload option as discussed above. The
source also sets IP {Total, Payload} Length to L (between 16 and
65535) to distinguish the parcel from a basic or advanced jumbogram
(see: ).

The network layer of the source also maintains a randomly-initialized
32-bit cached Identification value for each destination. For each parcel
transmission, the source sets the Parcel Payload option PMTU to the
minimum of the next hop link MTU and (2**24 - 1) then sets Identification
to the current cached value for this destination. The source then
increments the cached value by 1 (modulo 2**32) for each successive
transmission and can later reset the cached value to a new random
number, e.g., to maintain an unpredictable profile.

The network layer of the source next presents each parcel to an
interface for transmission to the next hop. For ordinary interface
attachments to parcel-capable links, the source simply admits each
parcel into the interface the same as for any IP packet where it
may be forwarded by one or more routers over additional consecutive
parcel-capable links possibly even traversing the entire forward
path to the final destination. If any node in the path does not
recognize the parcel construct, it drops the parcel and may return
an ICMP "Parameter Problem" message.

When the next hop link does not support parcels at all, or when
the next hop link is parcel-capable but configures an MTU that is
too small to pass the entire parcel, the source breaks the parcel
up into individual IP packets (in the first case) or into smaller
sub-parcels (in the second case). In the first case, the source
can apply "packetization" using Generic Segment Offload (GSO), and
the final destination can apply "restoration" using Generic Receive
Offload (GRO) to deliver the largest possible parcel buffer(s)
to the transport layer. In the second case, the source can apply
"parcellation" to break the parcel into sub-parcels which each
contain the same Identification value and with the S flag set
appropriately. The final destination can then apply "reunification"
to deliver the largest possible parcel buffer(s) to the transport
layer. In all other ways, the source's processes for breaking a
parcel up into individual IP packets or smaller sub-parcels
entail the same considerations as for a router on the path that
invokes these processes as discussed in the following subsections.

Each parcel serves as an implicit probe that tests the forward
path's ability to pass parcels. Each parcel header also includes a
PMTU field initialized by the source as specified above and each
router in the path rewrites PMTU in the same fashion as for .
In particular, each router compares the parcel PMTU value with
the next hop link MTU in the parcel path and MUST (re)set PMTU to
the minimum value. Note that the parcel's traversal of a
previous hop link should provide sufficient evidence of forward
progress since parcel path MTU determination is unidirectional in
the forward path only. However, nodes can also include the previous
hop link MTU in their minimum PMTU calculations in case the link
may have an ingress size restriction (such as a receive buffer
limitation). Each parcel also includes one or more transport layer
segments corresponding to the 5-tuple for the flow, which may also
include {TCP,UDP} segment size probes used for packetization layer
path MTU discovery .
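The per-hop PMTU handling described above amounts to a running minimum over the links of the forward path. A minimal Python sketch under that reading follows; the helper names and the list-of-link-MTUs model of the path are hypothetical, and PMTU_FIELD_MAX reflects the (2**24 - 1) ceiling the source applies when it initializes PMTU. The optional previous hop MTU models the case where a node also accounts for an ingress size restriction.

```python
from __future__ import annotations

PMTU_FIELD_MAX = 2**24 - 1  # ceiling applied by the source when setting PMTU


def source_pmtu(next_hop_mtu: int) -> int:
    """Initial PMTU: minimum of the next hop link MTU and (2**24 - 1)."""
    return min(next_hop_mtu, PMTU_FIELD_MAX)


def router_pmtu(pmtu: int, next_hop_mtu: int,
                prev_hop_mtu: int | None = None) -> int:
    """PMTU rewrite at a router.

    The router (re)sets PMTU to the minimum of the received value and the
    next hop link MTU; optionally it also folds in the previous hop link
    MTU in case that link imposes an ingress size restriction.
    """
    pmtu = min(pmtu, next_hop_mtu)
    if prev_hop_mtu is not None:
        pmtu = min(pmtu, prev_hop_mtu)
    return pmtu


def forward_path_pmtu(link_mtus: list[int]) -> int:
    """PMTU value reaching the final destination over consecutive
    parcel-capable links; link_mtus[0] is the source's next hop link."""
    pmtu = source_pmtu(link_mtus[0])
    for mtu in link_mtus[1:]:
        pmtu = router_pmtu(pmtu, mtu)
    return pmtu
```

Because each router can only lower the value, the final destination observes the smallest link MTU encountered on the forward path, capped at (2**24 - 1); for example, forward_path_pmtu([9000, 65535, 4000, 9000]) yields 4000.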
(See: for further details on parcel path
probing.)When a router receives an IPv4 parcel it first compares Code with
255 and Check with the IPv4 header TTL; if either value differs, the
router drops the parcel and returns a negative Jumbo Report (see ). For all other IP parcels, the router next compares
the value L with the next hop link MTU. If the next hop link is
parcel-capable but has an MTU too small to pass a parcel with a single
segment of length L, the router discards the parcel and returns a
positive Jumbo Report with MTU set to the next hop link MTU. If the
next hop link is not parcel-capable and has an MTU too small to pass
an individual IP packet with a single segment of length L, the router
discards the parcel and instead returns a positive Parcel Report with
MTU set to the next hop link MTU. Otherwise, for IPv4 parcels, if the
next hop link is parcel-capable the router MUST reset Check to the
same value that would appear in the IPv4 header TTL field upon
transmission to the next hop.

If the router recognizes parcels but the next hop link in the path
does not, or if the entire parcel would exceed the next hop link MTU, the
router instead opens the parcel. The router then forwards each enclosed
segment in individual IP packets or in a set of smaller sub-parcels that
each contain a subset of the original parcel's segments. If the next
hop link is via an OMNI interface, the router instead proceeds according
to OMNI Adaptation Layer procedures. These considerations are discussed
in detail in the following sections.

For transmission of individual IP packets over links that do not
support parcels, the source or router (i.e., the node) engages GSO
to perform packetization. The node first determines whether an
individual packet with segment of length L can fit within the next
hop link MTU. If not, the node drops the parcel and returns a positive
Parcel Report message with MTU set to the next hop link MTU and with the
leading portion of the parcel beginning with the IP header as the
"packet in error". Otherwise, the node removes the Parcel Payload
option, sets aside and remembers the Integrity Block (and for TCP
also sets aside and remembers the Sequence Number header values of
each non-first segment) then copies the {TCP,UDP}/IP headers (but
with the Parcel Payload option removed) followed by segment(i)
(for i = 0 thru J) into (J + 1) individual IP packets ("packet(i)").

For TCP, the node then clears the TCP control bits in all but
packet(0), and includes only those TCP options that are permitted
to appear in data segments in all packets other than packet(0)
(which may also include control segment options; see: for further
discussion). The node then sets IP {Total, Payload} Length for each
packet(i) based on the length of segment(i) according to the IP
protocol standards .

For each IPv6 packet(i), the node includes an IPv6 Fragment Header
and sets the Identification field to the value found in the parcel
header. For each IPv4 packet(i), the node sets the Identification
field to the least significant 16 bits of the value found in the
parcel header and sets the (D)ont Fragment flag to '1'. For each
IP packet(i), the node then sets both the Fragment Offset field
and (M)ore fragments flag to '0' to produce an unfragmented IP
packet (IPv6 destinations will process these "atomic fragments"
as whole packets instead of admitting them into the reassembly
cache, i.e., the same as for IPv4). The node then processes
further according to transport layer protocol conventions
as follows.

For TCP, the node calculates the checksum up to the end of
packet(0)'s TCP/IP headers only according to
but with the sequence number value saved and the field set to 0. The
node then adds Integrity Block Checksum(0) to the calculated value
and writes the sum into packet(0)'s TCP Checksum field. The node then
resets the Sequence Number field to packet(0)'s saved sequence number
and forwards packet(0) to the next hop. The node next calculates the
checksum of packet(1)'s TCP/IP headers with the Sequence Number field
set to 0 and saves the calculated value. In each non-first packet(i)
(for i = 1 thru J), the node then adds the saved value to Integrity
Block Checksum(i), writes the sum into packet(i)'s TCP Checksum
field, sets the TCP Sequence Number field to packet(i)'s sequence
number, then forwards packet(i) to the next hop.

For UDP, the node sets the UDP length field according to in each packet(i) (for i = 0 thru J). If Integrity
Block Checksum(i) is 0, the node then sets the UDP Checksum field
to 0, forwards packet(i) to the next hop and continues with the next packet.
The node next calculates the checksum over packet(i)'s UDP/IP
headers only according to . If Integrity Block
Checksum(i) is not 'ffff', the node then adds the value to the header
checksum; otherwise, the node re-calculates the checksum for segment(i).
If the re-calculated segment(i) checksum value is 'ffff' or '0' the
node adds the value to the header checksum; otherwise, it continues
to the next packet(i). The node finally writes the total checksum
value into the packet(i) UDP Checksum field (or writes 'ffff' if
the total was '0') and forwards packet(i) to the next hop.

Note: For each UDP packet(i), the node must recalculate
the segment checksum if Checksum(i) is 'ffff', since that value is
used to represent both '0' and 'ffff' calculated checksums. If
recalculating the checksum produces an incorrect value, the node can
either drop or forward the packet (noting that the forwarded packet would simply be
discarded as an error by the final destination). For each {TCP,UDP}
packet(i), the node can optionally re-calculate and verify the
segment checksum unconditionally before forwarding, but this may
introduce unacceptable delay and processing overhead.

Note: Packets resulting from packetization may be too large
to transit the remaining path to the final destination, such that
a router may drop the packet(s) and possibly also return an
ordinary ICMP PTB message. Since these messages cannot be
authenticated or may be lost on the return path, the original
source should take care when setting a segment size larger than
the known path MTU.

For transmission of smaller sub-parcels over parcel-capable links,
the source or router (i.e., the node) first determines whether a single
segment of length L can fit within the next hop link MTU if packaged as
a (singleton) sub-parcel. If not, the node returns a positive Jumbo Report
message with MTU set to the next hop link MTU and containing the leading
portion of the parcel beginning with the IP header, then drops the parcel.
Otherwise, the node employs network layer parcellation to break the original
parcel into smaller groups of segments that would fit within the path MTU
by determining the number of segments of length L that can fit into each
sub-parcel under the size constraints. For example, if the node determines
that a sub-parcel can contain 3 segments of length L, it creates sub-parcels
with the first containing Integrity Block Checksums/Segments 0-2, the
second containing Checksums/Segments 3-5, etc., and with the final
containing any remaining Checksums/Segments.

The node then appends identical {TCP,UDP}/IP headers (including the
Parcel Payload option and any other extensions) to each sub-parcel while
resetting ({Total, Payload} Length/L) and (Parcel Payload Length/M) in
each according to the above equations, with Nsegs/J set to one less
than the number of segments in each intermediate sub-parcel (i.e., 2
in the example above) and with Nsegs/J set to one less than the
remaining number of segments for the final sub-parcel. For TCP, the
node then clears the TCP control bits in all but the first sub-parcel
and includes only those TCP options that are permitted to appear in
data segments in all but the first sub-parcel (which may also include
control segment options). For both TCP and UDP, the node then resets
the {TCP,UDP} Checksum according to ordinary parcel formation
procedures (see above). The node then sets the TCP Sequence Number
field to the value that appears in the first sub-parcel segment while
removing the first segment's Sequence Number header (if present).

When the node performs parcellation, it examines the "(S)ub-parcel"
flag in the original parcel's Parcel Payload option. If S is '0', the
node sets S to '1' in all resulting sub-parcels except the last (i.e.,
the one containing the final segment of length K, which may be shorter
than L) for which it sets S to '0'. If the S flag is '1', the node
instead sets S to '1' in all resulting sub-parcels including the last.
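The grouping and S flag rules above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm. All names are hypothetical. The size model in segments_per_subparcel() is also an assumption: a fixed header length plus, for each segment, a 2-octet Integrity Block Checksum and an optional extra per-segment cost (such as the 4-octet Sequence Number carried with TCP non-first segments, applied conservatively to every segment).

```python
from dataclasses import dataclass


@dataclass
class SubParcel:
    first_index: int         # index of the first original segment carried
    seg_lengths: list[int]   # lengths of the segments carried, in order
    s_flag: int              # Parcel Payload option (S)ub-parcel flag

    @property
    def j(self) -> int:
        """Nsegs/J: one less than the number of segments carried."""
        return len(self.seg_lengths) - 1


def segments_per_subparcel(mtu: int, header_len: int, seg_len: int,
                           extra_per_seg: int = 0) -> int:
    """How many segments of length L fit in one sub-parcel.

    Hypothetical size model: headers plus, per segment, a 2-octet
    Integrity Block Checksum, the segment itself and any extra octets.
    """
    return max(0, (mtu - header_len) // (2 + seg_len + extra_per_seg))


def parcellate(seg_lengths: list[int], per_sub: int,
               original_s: int) -> list[SubParcel]:
    """Split a parcel's segments into sub-parcels and set their S flags."""
    if not seg_lengths:
        raise ValueError("a parcel carries at least one segment")
    if per_sub < 1:
        # Not even a singleton sub-parcel fits: the node drops the
        # parcel and returns a positive Jumbo Report instead.
        raise ValueError("singleton sub-parcel exceeds next hop link MTU")
    subs = [SubParcel(start, seg_lengths[start:start + per_sub], 1)
            for start in range(0, len(seg_lengths), per_sub)]
    if original_s == 0:
        # Only the sub-parcel holding the final segment carries S = '0';
        # if the original S was '1', every sub-parcel keeps S = '1'.
        subs[-1].s_flag = 0
    return subs
```

Consider eight segments, seven of length 1000 and a final one of length 400, with three segments per sub-parcel. For this input, parcellate() yields sub-parcels starting at segments 0, 3 and 6 with J values 2, 2 and 1. This matches the Checksums/Segments 0-2, 3-5 example above.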
The node finally sets PMTU to the next hop link MTU, then forwards each
(sub-)parcel over the parcel-capable next hop link.

For transmission of original parcels or sub-parcels over OMNI
interfaces, the node admits all parcels into the interface
unconditionally since the OMNI interface MTU is unrestricted. The
OMNI Adaptation Layer (OAL) of this First Hop Segment (FHS) OAL
source node then forwards the parcel to the next OAL hop which may
be either an intermediate node or a Last Hop Segment (LHS) OAL
destination. OMNI interface parcellation and reunification
procedures are specified in detail in the remainder of this
section, while parcel encapsulation and fragmentation procedures
are specified in .

When the OAL source forwards a parcel (whether generated
by a local application or forwarded over a network path that
traversed one or more parcel-capable links), it first assigns a
monotonically-incrementing (modulo 255) adaptation layer "Parcel ID".
If the parcel is larger than the OAL maximum segment size of 65535
octets, the OAL source then employs adaptation layer parcellation to
break the parcel into sub-parcels the same as for the network layer
procedures discussed above. The OAL source next assigns a different
monotonically-incrementing adaptation layer Identification value for
each sub-parcel of the same Parcel ID then performs adaptation layer
encapsulation and fragmentation and finally forwards each fragment
to the next OAL hop toward the OAL destination as necessary. (During
encapsulation, the OAL source examines the Parcel Payload option S
flag to determine the setting for the adaptation layer fragment
header S flag according to the same rules specified in .)

When the sub-parcels arrive at the OAL destination, it can
optionally retain them along with their Parcel ID and Identifications
for a brief time to support reunification with peer sub-parcels of
the same original (sub-)parcel identified by the 3-tuple information
supplied by the OAL source. This reunification entails the
concatenation of Checksums/Segments included in sub-parcels with
the same Parcel ID and with Identification values within 255 of one
another to create a larger sub-parcel possibly even as large as the
entire original parcel. The OAL destination concatenates each
sub-parcel in ascending Identification value order, while ensuring
that any sub-parcel with TCP control bits set appears as the first
concatenated element in a reunified larger parcel and any sub-parcel
with S flag set to '0' appears as the final concatenation. The
OAL destination then sets S to '0' in the reunified (sub-)parcel
if and only if one of its constituent elements also had S set to
'0'; otherwise, it sets S to '1'.

The OAL destination then appends a common {TCP,UDP}/IP header
plus extensions to each reunified sub-parcel while resetting J, K, L
and M in the corresponding header fields of each. For TCP, if any
sub-parcel has TCP control bits set the OAL destination regards it
as sub-parcel(0) and uses its TCP header as the header of the
reunified (sub-)parcel with the TCP options including the union of
the TCP options of all reunified sub-parcels. The OAL destination
then resets the {TCP,UDP}/IP header checksum. If the OAL destination
is also the final destination, it then delivers the sub-parcels to
the network layer which processes them according to the 5-tuple
information supplied by the original source. Otherwise, the OAL
destination forwards each sub-parcel toward the final destination
the same as for an ordinary IP packet as discussed above.

Note: Adaptation layer parcellation over OMNI links occurs only
at the OAL source while the adaptation layer reunification occurs
only at the OAL destination. The OAL destination can instead avoid
this process if it would negatively impact performance, noting that
forwarding individual sub-parcels without delay and without
reunification is always acceptable (but not always optimal).
Intermediate OAL nodes do not participate in the parcellation or
reunification processes.

Note: OMNI interface parcellation and reunification are OAL
processes based on the adaptation layer 3-tuple and not the network
layer 5-tuple. This is true even if the OAL has visibility into
network layer information since some sub-parcels of the same
original parcel may be forwarded over different network paths.

When the original source or a router on the path opens a parcel
and forwards its contents as individual IP packets, these packets
will arrive at the final destination which can hold them in a
restoration buffer for a short time and then restore the
original parcel using GRO. The 5-tuple information plus the
Identification value provides sufficient context for GRO
restoration, and practical implementations have shown that GRO
provides a robust service at high data rates even for IPv4 with
its 16-bit Identification limitation.

When the original source or a router on the path opens a parcel
and forwards its contents as smaller sub-parcels, these sub-parcels
will arrive at the final destination which can hold them in a
reunification buffer for a short time or until a sub-parcel with
the S flag set to '0' arrives. The 5-tuple information plus the
Identification value provides sufficient context for reunification,
and both IPv4 and IPv6 will see a full 32-bit Identification.

In both the restoration and reunification cases, the
final destination concatenates segments in the order they were
received even if some small degree of reordering and/or loss may
have occurred in the networked path. When the final destination
performs restoration/reunification on TCP segments, however,
it must place the packet or sub-parcel with any TCP control bits
set as the first concatenation, with the TCP options including the
union of the TCP options of all concatenated packets or sub-parcels.
For both TCP and UDP, any packet or sub-parcel containing the final
segment (i.e., as indicated by either the segment length or the S
flag) must appear as the final concatenation.

The final destination can then present the concatenated parcel
contents to the transport layer with segments arranged in (nearly)
the same order in which they were originally transmitted. Strict
ordering is not required since each segment will include a transport
layer protocol specific start delimiter with positional coordinates.
These procedures eliminate the need for a Fragment Offset value
since each sub-parcel or individual IP packet contains an integral
number of whole transport layer protocol segments which are not
themselves fragmented.

Note: Since loss and/or reordering may occur in the network,
the final destination may receive a "short" packet or sub-parcel
with S set to '0' before all other elements of the same original
parcel have arrived. This condition does not represent an error,
but in some cases may cause the network layer to deliver
sub-parcels that are smaller than the original parcel to the
transport layer. The transport layer simply accepts any segments
received from all such deliveries and will request retransmission
of any segments that were lost and/or damaged.

Note: Restoration and/or reunification buffer congestion may
indicate that the network layer cannot sustain the service(s) at
current arrival rates. The network layer should then begin to
deliver partial concatenations or even individual segments to
transport layer receive queues (e.g., a socket buffer) instead
of waiting for all segments to arrive. The network layer can
manage restoration/reunification buffers, e.g., by maintaining
buffer occupancy high/low watermarks.

All parcels also serve as implicit probes and may cause either a
router in the path or the final destination to return an ordinary
ICMP error and/or
Packet Too Big (PTB) message concerning the parcel. A router in the
path or the final destination may also return either a "Parcel
Report" or "Jumbo Report" (subject to rate limiting per ) as discussed below.

To determine whether parcels can transit at least an initial
portion of the forward path toward the final destination, the
original source can also send IP parcels with the Parcel Payload
option P flag set to '1' as an explicit "Parcel Probe". The probe
will cause the final destination or a router on the path to return
a Parcel/Jumbo Report. (The original source should be conservative
in sending explicit Parcel Probes to avoid loss of Reports due to
rate limiting.)

A Parcel Probe can be included either in an ordinary data parcel
or a {TCP,UDP}/IP parcel with destination port set to '9' (discard)
. The probe will still contain a valid
{TCP,UDP} parcel header Checksum that any intermediate hops as
well as the final destination can use to detect mis-delivery,
while the final destination will process any parcel data in
probes with correct Checksums.

If the original source receives a positive Parcel/Jumbo Report,
it marks the path as "parcels supported" and ignores any ordinary
ICMP and/or PTB messages concerning the probe. If the original
source instead receives a negative Jumbo Report or no report, it
marks the path as "parcels not supported" and may regard any
ordinary ICMP and/or PTB messages concerning the probe (or its
contents) as indications of a possible path limitation.

The original source can therefore send Parcel Probes in the
same IP parcels used to carry real data. The probes will traverse
parcel-capable links joined by routers on the forward path possibly
extending all the way to the destination. If the original source
receives a positive Parcel/Jumbo Report, it can continue using IP
parcels after adjusting its segment size if necessary.

The original source sends Parcel Probes unidirectionally in the
forward path toward the final destination to elicit a Parcel/Jumbo
Report, since it will often be the case that IP parcels are supported
only in the forward path and not in the return path. Parcel Probes
may be dropped in the forward path by any node that does not
recognize IP parcels, but Parcel/Jumbo Reports must be packaged to
avoid return path filtering. For this reason, the Parcel Payload
options included in Parcel Probes are always packaged as IPv4
header options or IPv6 Hop-by-Hop options while Parcel/Jumbo
Reports are returned as UDP/IP encapsulated ICMPv6 PTB messages
with a "Parcel/Jumbo Report" Code value (see: ).

Original sources send ordinary parcels or discard parcels as
explicit Parcel Probes by setting the Parcel Payload option P
flag to '1' and PMTU to the minimum of the next hop link MTU
and (2**24 - 1). The source then sets Nsegs, Parcel Payload
Length, and {Total, Payload} Length, then calculates the header
and per-segment checksums the same as for an ordinary parcel.
The source finally sends the Parcel Probe via the outbound
IP interface.

Original sources can send Parcel Probes that include a large
segment size, but these may be dropped by a router on the path even
if the next hop link is parcel-capable. The original source would
then receive a Parcel Report that contains only the MTU of the leading
portion of the path up to the router with the restrictive link. The
original source can instead send Parcel Probes with smaller segments
that would be likely to transit the entire forward path to the final
destination if all links are parcel-capable. For parcel-capable
paths, this may allow the original source to discover both the path
MTU and the MSS in a single message exchange instead of several.

According to , IPv4 middleboxes (i.e.,
routers, security gateways, firewalls, etc.) that do not observe this
specification should drop IPv4 packets that contain option type
'00001011' ("IPv4 Probe MTU") but some might instead either attempt
to implement or ignore the option altogether.
IPv4 middleboxes that observe this specification instead MUST process
the option as an implicit or explicit Parcel Probe as specified below.

According to , IPv6 middleboxes (i.e.,
routers, security gateways, firewalls, etc.) that recognize the IPv6
Jumbo Payload option but do not observe this specification should
return an ICMPv6 Parameter Problem message (and presumably also drop
the packet) due to validation rules for ordinary jumbograms since
the parcel includes a non-zero IP {Total, Payload} Length. IPv6
middleboxes that observe this specification instead MUST process
the option as an implicit or explicit Parcel Probe as specified below.

When a router that observes this specification receives an IPv4
Parcel Probe, it first compares Code with 255 and Check with the IP
header TTL; if either value differs, the router drops the probe
and returns a negative Jumbo Report (see below). For all other IP
Parcel Probes, if the next hop link is non-parcel-capable the router
compares PMTU with the next hop link MTU and returns a positive
Parcel Report (see below) with MTU set to the minimum value. If the
next hop link configures a sufficiently large MTU, the router then
applies packetization to convert the probe into individual IP
packet(s) and forwards each packet to the next hop; otherwise,
it drops the probe.

If the next hop link both supports parcels and configures an MTU
that is large enough to pass the probe, the router instead compares
the probe PMTU with the next hop link MTU and MUST (re)set PMTU to
the minimum value then forward the probe to the next hop (and for
IPv4 first reset Check to the same value that will appear in the
IPv4 header TTL upon transmission to the next hop). If the next
hop link supports parcels but configures an MTU that is too small
to pass the probe, the router resets PMTU (and Check if necessary)
then applies parcellation to break the probe into multiple smaller
sub-parcels that can traverse the link while setting the P flag to
'1' only for the first sub-parcel. If the next hop link supports
parcels but configures an MTU that is too small to pass a singleton
sub-parcel of the probe, the router instead drops the probe and
returns a positive Jumbo Report with MTU set to the next hop
link MTU.

The final destination may therefore receive one or more individual
IP packets or sub-parcels including an intact Parcel Probe. If the
final destination receives individual IP packets, it performs any
necessary integrity checks, applies restoration/GRO if possible
then delivers the (restored) parcel contents to the transport
layer. If the final destination receives an IPv4 Parcel Probe, it
first compares Code with 255 and Check with the IPv4 header TTL;
if either value differs, the final destination drops the probe and
returns a negative Jumbo Report. For all other Parcel Probes, the
final destination instead returns a positive Jumbo Report, applies
reunification then delivers the (reunified) parcel contents
to the transport layer.

When a router or final destination returns a Parcel/Jumbo Report,
it prepares an ICMPv6 PTB message with Code
set to either "Parcel Report" or "Jumbo Report" (see: ) and with MTU set to either the
minimum MTU value for a positive report or to '0' for a negative
report. The node then writes its own IP address as the Parcel/Jumbo
Report source and writes the source address of the packet that
invoked the report as the Parcel/Jumbo Report destination (for
IPv4 Parcel Probes, the node writes the Parcel/Jumbo Report
address as an IPv4-Compatible IPv6 address ).
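An IPv4-Compatible IPv6 address consists of 96 zero bits followed by the 32-bit IPv4 address. A minimal Python sketch of the Report address selection, using the standard ipaddress module, is shown below. The helper name is hypothetical.

```python
import ipaddress


def report_address(addr: str) -> ipaddress.IPv6Address:
    """Address written into a Parcel/Jumbo Report (hypothetical helper).

    IPv6 addresses are used as-is; an IPv4 address is written as an
    IPv4-Compatible IPv6 address: 96 zero bits, then the IPv4 address.
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address):
        return ip
    return ipaddress.IPv6Address(bytes(12) + ip.packed)
```

This format differs from the IPv4-mapped form, which places 'ffff' immediately before the IPv4 address.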
The node next copies as much of the leading portion of the invoking
packet as possible (beginning with the IP header) into the "packet
in error" field without causing the entire Parcel/Jumbo Report
(beginning with the IPv6 header) to exceed 512 octets in length. The
node then sets the Checksum field to 0 instead of calculating and
setting a true checksum.

Since IPv6 packets cannot traverse IPv4 paths, and since middleboxes
often filter ICMPv6 messages as they traverse IPv6 paths, the node next
wraps the Parcel/Jumbo Report in UDP/IP headers of the correct IP version
with the IP source and destination addresses copied from the Parcel/Jumbo
Report and with UDP port numbers set to the OMNI UDP port number . The node then calculates and sets
the UDP Checksum (and for IPv4 clears the DF bit). The node finally
sends the prepared Parcel/Jumbo Report to the original source of
the probe.

After sending a Parcel Probe (or an ordinary parcel), the original
source may therefore receive a UDP/IP encapsulated Parcel/Jumbo Report
and/or one or more transport layer protocol probe replies. If the
source receives a Parcel/Jumbo Report, it verifies the UDP Checksum
then verifies that the ICMPv6 Checksum is 0. If both Checksums are
correct, the source then matches the enclosed PTB message with an original
probe/parcel by examining the ICMPv6 "packet in error" containing the
leading portion of the invoking packet. If the "packet in error" does
not match one of its previous packets, the source discards the
Parcel/Jumbo Report; otherwise, it continues processing.

If the Parcel/Jumbo Report has MTU '0', the
source marks the path as "parcels not supported"; otherwise, it marks
the path as "parcels supported" and also records the MTU value as the
parcel path MTU (i.e., the portion of the path up to and including
the node that returned the Parcel/Jumbo Report). If the MTU value is
65535 (plus headers) or larger, the MTU determines the largest whole
parcel that can traverse the path without packetization/parcellation
while using any segment size up to and including the maximum. For
Reports that include a smaller MTU, the value represents both the
largest whole parcel size and a maximum segment size limitation.
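The following is a minimal Python sketch of how the original source might record the MTU from a Report. The header_len parameter is a hypothetical stand-in for the unspecified "(plus headers)" amount. Deriving the segment limit for smaller MTUs as the MTU less header_len is also an assumption. Both are illustrative rather than normative.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SEGMENT = 65535  # maximum segment size referenced in the text above


@dataclass
class ParcelPath:
    parcels_supported: bool
    max_parcel: Optional[int] = None   # largest whole parcel (octets)
    max_segment: Optional[int] = None  # largest segment size (octets)


def record_report_mtu(mtu: int, header_len: int) -> ParcelPath:
    """Interpret the MTU field of a received Parcel/Jumbo Report.

    header_len is a hypothetical stand-in for the header octets that
    accompany a maximum-size segment.
    """
    if mtu == 0:
        # Negative report: mark the path "parcels not supported".
        return ParcelPath(parcels_supported=False)
    if mtu >= MAX_SEGMENT + header_len:
        # Any segment size up to the maximum can be used; the MTU only
        # bounds the size of a whole parcel.
        return ParcelPath(True, max_parcel=mtu, max_segment=MAX_SEGMENT)
    # Smaller MTU: bounds both the whole parcel and each segment
    # (assumed here to be the MTU less the header octets).
    return ParcelPath(True, max_parcel=mtu, max_segment=mtu - header_len)
```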
When the Report MTU is smaller, the maximum parcel size that can
traverse the initial portion of the path may be larger than the
maximum segment size that can continue to traverse the remaining
path to the final destination.

Note: If a router or final destination receives a Parcel Probe but
does not recognize the parcel construct, it drops the probe without
further processing (and may return an ICMP error). The original
source will then consider the probe as lost, but may attempt to
probe again later, e.g., in case the path has changed.

Note: When the source examines the "packet in error" portion of
a Parcel/Jumbo Report, it can easily match the Report against its
recent transmissions if the Identification value is available. For
"packets in error" that do not include an Identification, the source
can attempt to match based on any other identifying information
available; otherwise, it should discard the message.

The {TCP,UDP}/IP header plus each segment of a (multi-segment) IP
parcel includes its own integrity check. This means that IP parcels can
support stronger and more discrete integrity checks for the same amount
of transport layer protocol data compared to an individual IP packet or
jumbogram. The {TCP,UDP} Checksum header integrity check can be verified
at each hop to ensure that parcels with errored headers are detected.
The per-segment Integrity Block Checksums are set by the source and
verified by the final destination, noting that TCP parcels must honor
the sequence number discipline discussed in .

IP parcels can range in length from as small as only the {TCP,UDP}/IP
headers plus a single Integrity Block Checksum with a single segment to
as large as the headers plus (256 * 65535) octets. Although link layer
integrity checks such as CRC-32 provide sufficient protection for
contiguous data blocks up to approximately 9KB, reliance on link-layer
integrity checks may be inadvisable for links with significantly larger
MTUs and may not be possible at all for links such as tunnels over IPv4
that invoke fragmentation. Moreover, the segment contents of a received
parcel may arrive in an incomplete and/or rearranged order with respect
to their original packaging.

Each network layer forwarding hop as well as the final destination
should verify the {TCP,UDP}/IP Checksum at its layer, since an errored
header could result in mis-delivery. If a network layer protocol entity
on the path detects an incorrect {TCP,UDP}/IP Checksum it should discard
the entire IP parcel unless the header(s) can somehow first be repaired
by lower layers.

To support the parcel header checksum calculation, the network
layer uses modified versions of the {TCP,UDP}/IPv4 "pseudo-header"
found in ,
or the {TCP,UDP}/IPv6 "pseudo-header" found in Section 8.1 of
. Note that while the contents of the
two IP protocol version-specific pseudo-headers beyond the address
fields are the same, the order in which the contents are arranged
differs and must be honored according to the specific IP protocol
version as shown in . This allows for maximum
reuse of widely deployed code while ensuring interoperability.

The following fields appear in both pseudo-headers:

   Source Address is the 4-octet IPv4 or 16-octet IPv6 source
   address of the prepared parcel.

   Destination Address is the 4-octet IPv4 or 16-octet IPv6
   destination address of the prepared parcel.

   zero encodes the constant value '0'.

   Next Header is the IP protocol number corresponding to the
   transport layer protocol, i.e., TCP or UDP.

   Segment Length is the value that appears in the IP
   {Total, Payload} Length field of the prepared parcel.

   Nsegs is the 1-octet value that appears in the Parcel
   Payload Option field of the same name.

   Parcel Payload Length is the 3-octet value that appears in
   the Parcel Payload Option field of the same name.

Transport layer protocol entities coordinate per-segment checksum
processing with the network layer using a control mechanism such as
a socket option. If the transport layer sets a SO_NO_CHECK(TX) socket
option, the transport layer is responsible for supplying per-segment
checksums on transmission and the network layer forwards the IP parcel
to the next hop without further processing; otherwise, the network
layer supplies the per-segment checksums before forwarding. If the
transport layer sets a SO_NO_CHECK(RX) socket option, the transport
layer is responsible for verifying per-segment checksums on reception
and the network layer delivers each received parcel body to the
transport layer without further processing; otherwise, the network
layer verifies the per-segment parcel checksums before delivering.

When the transport layer protocol entity of the source delivers a
parcel body to the network layer, it prepends an Integrity Block of
(J + 1) 2-octet Checksum fields and includes a 4-octet Sequence Number
field with each TCP non-first segment. If the SO_NO_CHECK(TX) socket
option is set, the transport layer protocol either calculates each
segment checksum and writes the value into the corresponding Checksum
field (and for UDP with '0' values written as 'ffff') or writes the
value '0' to disable specific UDP segment checksums. If the
SO_NO_CHECK(TX) socket option is clear, for UDP the transport
layer instead writes the value '0' to disable or any non-zero value
to enable checksums for specific segments (for TCP, the transport
layer instead writes any value).

When the network layer of the source accepts the parcel body from
the transport layer protocol entity, if the SO_NO_CHECK(TX) socket option
is set the network layer appends the {TCP,UDP}/IP headers and forwards
the parcel to the next hop without further processing. If the
SO_NO_CHECK(TX) socket option is clear, the network layer instead
calculates the checksum for each TCP segment (or each UDP segment
with a non-zero value in the corresponding Integrity Block Checksum
field) and overwrites the calculated value into the Checksum field
(and for UDP with '0' values written as 'ffff').

When the network layer of the destination receives a
parcel from the source, if the SO_NO_CHECK(RX) socket option is set the
network layer delivers the parcel body to the transport layer protocol
entity without further processing, and the transport layer is responsible
for per-segment checksum verification. If the SO_NO_CHECK(RX) socket
option is clear, the network layer instead verifies the checksum for
each TCP segment (or each UDP segment with a non-zero value in the
corresponding Integrity Block Checksum field) and marks a corresponding
flag for the segment in an ancillary data structure as either "correct"
or "incorrect". (For UDP, if the Checksum is '0' the network layer
unconditionally marks the segment as "correct".) The network layer
then delivers both the parcel body (beginning with the Integrity Block)
and ancillary data to the transport layer, which can then determine
which segments have correct/incorrect checksums.

Note: The Integrity Block itself is intentionally omitted from the IP
Parcel {TCP,UDP} header checksum calculation. This permits destinations
to accept as many intact segments as possible from received parcels with
Integrity Block bit errors, whereas the entire parcel would need to be
discarded if the header checksum also covered the Integrity Block.

This specification introduces an IP "advanced jumbo" service as an
alternative to basic IPv6 jumbograms that also includes a path probing
function based on the mechanisms specified in .
The function employs an "Advanced Jumbo Option" with the same Option
Type and Option Data Length values as for the Parcel Payload option,
but with the Nsegs and Parcel Payload Length fields converted to a
single 32-bit Jumbo Payload Length field and with the final 4 octets
converted to a single 32-bit PMTU field as shown in :
The Advanced Jumbo format is shared by both
advanced jumbos and "Jumbo Probes". The purpose of the Jumbo
Probe is to determine whether the entire path from the source
to the destination is advanced jumbo capable (i.e., one in
which all links can forward advanced jumbos) as well as to
determine the jumbo path MTU.

The source prepares a Jumbo Probe by first setting the IP
{Total, Payload} length field to the special "Type" value '1'
to distinguish this as a Jumbo Probe and not a basic jumbogram
or parcel. The source then sets {Protocol, Next Header} to
{TCP,UDP}, sets the {TCP,UDP} destination port to '9' (discard) and
either includes no octets beyond the {TCP,UDP} header or a single
discard segment of the desired probe size immediately following
the header, with no Integrity Block included. The source then
sets Jumbo Probe Length to the length of the {TCP,UDP} header
plus the length of the discard segment plus the length of the
full IP header for IPv4 or the extension headers for IPv6.

The source next sets Identification the same as for an IP
Parcel Probe, sets the Jumbo Probe PMTU to the full 32-bit MTU
of the (jumbo-capable) next hop link, and for IPv4 sets Code to
255 and Check to the next hop TTL. The source then calculates the
{TCP,UDP} Checksum using the same pseudo header as for an
ordinary parcel (see: ), but with the
{Nsegs; Parcel Payload Length} fields replaced with a 32-bit
Jumbo Probe Length field and with the Segment Length replaced
with the Type value. The calculation begins with the pseudo
header and then continues over the entire length of the probe
segment. The source then sends the Jumbo Probe via the next hop
link toward the final destination.

At each IPv4 forwarding hop, the router examines Code and Check
and returns a negative "Jumbo Report" (i.e., one prepared the same way
as a Parcel Report) if either value is incorrect. Otherwise, if
the next hop link is jumbo-capable, the router compares PMTU to
the next hop link MTU, resets PMTU to the smaller of the two values
(and for IPv4 sets Check to the next hop TTL), then forwards the probe
to the next hop. If the next hop link is not jumbo-capable, the
router instead drops the probe and returns a negative Jumbo
Report.

If the Jumbo Probe encounters an OMNI link, the OAL source can
either drop the probe and return a negative Jumbo Report or forward
the probe further toward the OAL destination using adaptation layer
encapsulation. If the OAL source already knows the OAL path MTU
for this OAL destination, it can encapsulate and forward the Jumbo
Probe with PMTU set to the minimum of its current value and the known
value (minus the adaptation layer header size), without adding any
padding octets.

If the OAL path MTU is unknown, the OAL source can instead
encapsulate the Jumbo Probe in an adaptation layer IPv6 header
with a Jumbo Payload option and with NULL padding octets added
beyond the end of the encapsulated Jumbo Probe to form an
adaptation layer jumbogram no larger than the minimum of PMTU
and (2**24 - 1) octets (minus the adaptation layer header size).

The OAL source then writes this size into the Jumbo Probe
PMTU field and forwards the newly-created adaptation layer
jumbogram toward the OAL destination, where it may be lost
due to a link restriction. If the jumbogram does traverse
the path, the OAL destination removes the adaptation layer
encapsulation, discards the padding, and then forwards the probe
onward toward the final destination (with each hop reducing
PMTU if necessary).

When a router on the path forwards a Jumbo Probe, it drops the probe
and returns a Jumbo Report if the next hop MTU is insufficient;
otherwise, it forwards to the next hop toward the final destination.
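The per-hop Jumbo Probe rules described in this section (the IPv4 Code/Check test, the jumbo-capable link test, the MTU check and the PMTU reduction) can be sketched as follows. This is a minimal illustration under stated assumptions, not a specified algorithm: the JumboProbe and NextHopLink types and the forward_jumbo_probe function are hypothetical, and the sketch assumes that Check is "correct" when it equals the TTL of the received IPv4 packet, that "insufficient" means the probe is larger than the next hop MTU, and that the link and MTU checks apply to IPv6 as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JumboProbe:
    """Hypothetical in-memory view of a Jumbo Probe (not a wire format)."""
    size: int          # assumed: probe size in octets as sent on the next hop
    pmtu: int          # Jumbo Probe PMTU field
    ttl: int = 64      # IPv4 TTL from the IP header
    code: int = 255    # IPv4 only: set to 255 by the source
    check: int = 64    # IPv4 only: set to the next hop TTL by the source


@dataclass
class NextHopLink:
    mtu: int
    jumbo_capable: bool


def forward_jumbo_probe(probe: JumboProbe, link: NextHopLink,
                        ipv4: bool = True) -> Optional[str]:
    """Return None to forward the probe (updated in place), or the reason
    for dropping it and returning a negative Jumbo Report."""
    if ipv4 and (probe.code != 255 or probe.check != probe.ttl):
        # Assumed meaning of "incorrect": a router that did not process
        # the probe decremented TTL without updating Check.
        return "incorrect Code/Check"
    if not link.jumbo_capable:
        return "next hop link is not jumbo-capable"
    if probe.size > link.mtu:
        return "next hop MTU is insufficient"
    probe.pmtu = min(probe.pmtu, link.mtu)  # keep the smaller of the two
    if ipv4:
        probe.ttl -= 1            # ordinary forwarding decrement
        probe.check = probe.ttl   # Check tracks the next hop TTL
    return None
```

With this design, a legacy router that decrements TTL but leaves Check untouched causes the next probe-aware IPv4 router to see a mismatch and return a negative report.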
When the final destination receives the Jumbo Probe, it returns
a Jumbo Report with PMTU set to the size of the largest jumbo that
can transit the path.

After successfully probing the path, the original source can
begin sending advanced jumbos by setting the IP {Total, Payload}
length field to the special Type value '2' and with the Checksum
calculated the same as described for Type '1' above. When the
final destination receives an advanced jumbo, it first verifies
the Checksum then delivers the data to the transport layer without
returning a Jumbo Report. The source can continue to send advanced
jumbos into the path, although the path may change. If the path MTU
decreases, a router in the network may return an ICMP error, an ICMPv6
PTB, or a Jumbo Report.

Note: If the original source can in some way determine that a
Jumbo Probe is likely to transit the path without loss due to a
size restriction, it can optionally include a real {TCP,UDP} data
segment instead of a discard segment. The network layer of the
final destination will then deliver the data to the transport
layer and return a Jumbo Report the same as discussed above.

Note: If the OAL source can in some way determine that a very
large packet is likely to transit the OAL path, it can encapsulate
a Jumbo Probe to form an adaptation layer jumbogram larger than
(2**24 - 1) octets with the understanding that the time required
to transit the path determines acceptable jumbogram sizes.

Note: The Jumbo Report message types returned in response to
both Parcel and Jumbo Probes are one and the same, and signify
that both parcels and jumbos at least as large as the reported
MTU can transit the path. However, a Jumbo Report response to a
Parcel Probe is limited to a maximum MTU size of (2**24 - 1)
while the response to a Jumbo Probe may report a larger size
even over the same path.

Minimal IP parcels and advanced jumbos are distinguished from
expanded-format parcels/jumbos by including an Option Data Length
of '00000100' for IPv6 and '00001000' for IPv4. For IPv6, the source
also sets Option Type to '010001110' (i.e., with bit 3 cleared)
since the Option Data cannot change in the path and can therefore
be included in an upper layer protocol integrity check.

Minimal advanced jumbos also include a Type value of 1 or 2 in
the IP {Total, Payload} Length field, while basic IPv6 jumbograms
with Payload Length of 0 are processed per .
(IPv4 packets with Total Length of 0 are undefined and must be dropped.)

The option formats for IPv4 are shown in
and the option formats for IPv6 are shown in .
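The rules for distinguishing minimal from expanded formats, together with the handling of basic jumbograms given later in this section, can be summarized as a small classifier. This is a hypothetical sketch: classify_jumbo and its string results are illustrative names, it assumes the packet is already known to carry the jumbo/parcel option (Option Type is not checked), it treats any Option Data Length other than the minimal value as the expanded format, and it does not attempt to classify parcels.

```python
# Minimal-format Option Data Length per IP version:
# '00001000' (8) for IPv4, '00000100' (4) for IPv6.
MINIMAL_OPT_DATA_LEN = {4: 8, 6: 4}


def classify_jumbo(version: int, length_field: int, opt_data_len: int) -> str:
    """Classify a packet that carries a jumbo/parcel option.

    version      -- IP version (4 or 6)
    length_field -- value of the IP {Total, Payload} Length field
    opt_data_len -- Option Data Length of the jumbo/parcel option
    """
    if version not in MINIMAL_OPT_DATA_LEN:
        raise ValueError("version must be 4 or 6")

    if length_field == 0:
        if version == 4:
            return "discard"  # no basic jumbogram service for IPv4
        # Basic IPv6 jumbograms are processed only with Opt Data Len 4.
        return "basic jumbogram" if opt_data_len == 4 else "discard"

    if length_field in (1, 2):
        kind = "jumbo probe" if length_field == 1 else "advanced jumbo"
        # Assumption: any non-minimal length is treated as expanded.
        minimal = opt_data_len == MINIMAL_OPT_DATA_LEN[version]
        return ("minimal " if minimal else "expanded ") + kind

    # Parcels and other Jumbo Type values are outside this sketch.
    return "unclassified"
```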
The original source can send minimal parcels or advanced jumbos
after successfully probing a path to confirm that it can transit
a given size over its entire length to the final destination.
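The source-side sequence implied by this section (probe first, send advanced jumbos no larger than the PMTU reported by the final destination, and re-probe when routers report errors after a path change) can be sketched as a small per-destination state holder. JumboPathState and its method names are hypothetical; the sketch ignores the choice between minimal and expanded formats and any timers.

```python
from typing import Optional


class JumboPathState:
    """Hypothetical per-destination state kept by an original source."""

    def __init__(self) -> None:
        # None until a Jumbo Report from the final destination confirms the path.
        self.pmtu: Optional[int] = None

    def on_jumbo_report(self, reported_pmtu: int) -> None:
        """Record the PMTU carried in a Jumbo Report from the final destination."""
        self.pmtu = reported_pmtu

    def on_path_error(self) -> None:
        """Handle an ICMP error, ICMPv6 PTB or Parcel/Jumbo Report from a router.

        The probed value may now be stale, so forget it and re-probe.
        """
        self.pmtu = None

    def next_action(self, size: int) -> str:
        """Decide how to send `size` octets toward the destination."""
        if self.pmtu is None:
            return "send Jumbo Probe (Type 1)"
        if size <= self.pmtu:
            return "send advanced jumbo (Type 2)"
        return "exceeds probed PMTU"
```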
Minimal parcels/jumbos use a reduced-length IP option that omits
the Identification and Path MTU fields and therefore cannot transit
a router that performs network layer packetization or parcellation
(although they can transit an OMNI interface that applies adaptation
layer parcellation if necessary).

End systems and routers process minimal parcels the same as
expanded parcels, as specified in previous sections. If a router needs
to drop a minimal parcel, it returns a Parcel/Jumbo Report the same
as for an expanded parcel (noting that the encapsulated parcel body
will not contain Identification or Path MTU fields).

End systems and routers process minimal advanced jumbos with
Type value '1' or '2' in the IP {Total, Payload} Length field
the same as for expanded advanced jumbos as specified in . If a router needs to drop a minimal advanced
jumbo, it returns a Jumbo Report the same as for an expanded
advanced jumbo.

End systems and routers silently discard basic IPv6 jumbograms with
the value '0' in the IPv6 Payload Length field and any value other
than '00000100' in the Option Data Length field. End systems and routers
process basic IPv6 jumbograms with the value '00000100' in the
Option Data Length field as specified in . End systems and routers silently discard all IPv4 jumbograms
with the value '0' in the IPv4 Total Length field, as no basic
jumbogram service is defined for IPv4.

Note: If the path changes, routers in the path may cease
forwarding minimal parcels/jumbos and begin returning ICMP
errors, ICMPv6 PTBs and/or Parcel/Jumbo Reports. In response, the
original source should re-probe to determine whether the path
MTU has been reduced and/or whether the path can still support
parcels/jumbos at all.

Common widely-deployed implementations include services such as TCP
Segmentation Offload (TSO) and Generic Segmentation/Receive Offload
(GSO/GRO). These robust offload services have been shown to improve
performance in many instances.

UDP/IPv4 parcels have been implemented in the linux-5.10.67 kernel and
ION-DTN ion-open-source-4.1.0 source distributions. The patch
distribution is available at: "https://github.com/fltemplin/ip-parcels.git".

Performance analysis with a single-threaded receiver has shown that
including more segments in a single parcel produces measurable
performance gains over parcels with fewer segments, due to more
efficient packaging and reduced system calls/interrupts. For example,
sending parcels with 30 2000-octet segments shows a 48% performance
increase in comparison with ordinary IP packets with a single
2000-octet segment.

Since performance is strongly bounded by single-segment receiver
processing time (with larger segments producing dramatic performance
increases), it is expected that parcels with increasing numbers of
segments will provide a performance multiplier on multi-threaded
receivers in parallel processing environments.

The IANA is instructed to change the "MTUP - MTU Probe" entry in the
'ip option numbers' registry to the "JUMBO - IPv4 Jumbo Payload" option.
The Copy and Class fields must both be set to 0, and the Number and
Value fields must both be set to '11'. The reference must be changed to
this document [RFCXXXX].

The IANA is instructed to create and maintain a new registry entitled
"IP Jumbogram Types". For IP packets that include a Jumbo Payload Option,
the IP {Total, Payload} Length field encodes a "Jumbo Type" value instead
of an ordinary length. Initial values are given below:
In the control plane, original sources match any identifying
information in received Parcel/Jumbo Reports with their
corresponding probes. If the information matches, the report is
likely authentic. In environments where stronger authentication
is necessary, nodes that send Parcel and/or Jumbo Reports can
apply the message authentication services specified for AERO/OMNI.

In the data plane, multi-layer security solutions may be needed
to ensure confidentiality, integrity and availability. Since parcels
are defined only for TCP and UDP, IP layer security services such as
IPsec-AH/ESP cannot be applied directly to
parcels, although they can certainly be used below the network or
adaptation layers such as for transmission of parcels over VPNs
and/or OMNI link secured spanning trees. Since the network layer
does not manipulate transport layer segments, parcels do not
interfere with transport- or higher-layer security services such
as (D)TLS/SSL which may provide greater
flexibility in some environments.

Further security considerations related to IP parcels are found
in the AERO/OMNI specifications.

This work was inspired by ongoing AERO/OMNI/DTN investigations. The
concepts were further motivated through discussions with colleagues.

A considerable body of work over recent years has produced useful
"segmentation offload" facilities available in widely-deployed
implementations.

With the advent of networked storage, big data, streaming media
and other high data rate uses, Internetworking has evolved since its
early days to accommodate the need for improved performance. This need
fostered a concerted effort in the industry to pursue performance
optimizations at all layers that continues in the modern era. All
who supported and continue to support advances in Internetworking
performance are acknowledged.

Accelerating UDP packet transmission for QUIC,
https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

BIG TCP, Netdev 0x15 Conference (virtual),
https://netdevconf.info/0x15/session.html?BIG-TCP

TCP Extensions for High Performance are specified in , which updates earlier work that began in the late
1980's and early 1990's. These efforts determined that the TCP 16-bit
Window was too small to accommodate sustained transmission at high
data rates and devised a TCP Window Scale option to allow window
sizes up to 2^30. The work also defined a Timestamp option used
for round-trip time measurements and for Protection Against Wrapped
Sequences (PAWS) at high data rates. TCP users of IP parcels are
strongly encouraged to adopt these measures.

Since TCP/IP parcels only include control bits for the first
segment ("segment(0)"), nodes must regard all other segments of the
same parcel as data segments. When a node breaks a TCP/IP parcel out
into individual packets or sub-parcels, only the first packet/sub-parcel
contains the original segment(0) and therefore only its TCP header
retains the control bit settings from the original parcel TCP header.
If the original TCP header included TCP options such as Maximum Segment
Size (MSS), Window Scale (WS) and/or Timestamp, the node copies those
same options into the options section of the new TCP header.

For all other packets/sub-parcels, the node sets all TCP header
control bits to '0' to mark the contents as data segment(s). Then, if the original parcel
contained a Timestamp option, the node copies the Timestamp option
into the options section of the new TCP header. Appendix A of
provides implementation guidelines for
the Timestamp option layout.

Appendix A of also discusses Interactions
with the TCP Urgent Pointer as follows: "if the Urgent Pointer
points beyond the end of the TCP data in the current segment, then
the user will remain in urgent mode until the next TCP segment arrives.
That segment will update the Urgent Pointer to a new offset, and the
user will never have left urgent mode". In the case of IP parcels,
however, it will often be the case that the "next TCP segment" is
included in the same (sub-)parcel as the segment that contained
the urgent pointer such that the urgent pointer can be updated
immediately.

Finally, if the parcel contains more than 65535 octets of data
(i.e., spread across multiple segments), then the Urgent Pointer
can be regarded in the same manner as for jumbograms as described
in Section 5.2 of .

The transport layer can specify any L value between 16 and
65535 octets. While acceptable within standard parcel parameters,
"extreme" L values as small as 16 should appear only in control
segments since transport protocols normally exchange data segments
that are considerably larger. Transport protocols that send small
isolated control and/or data segments may instead elect to package
them as ordinary packets while packaging larger data segments as
parcels. Transport protocol streams therefore often include a
mix of parcels and ordinary packets.

The transport layer should also specify an L value small enough
that a single segment plus the maximum-sized transport and network
layer headers that the source will include does not exceed 65535
octets. For example, if the source will include a 28-octet TCP header
plus a 40-octet IPv6 header with 24 extension header octets (plus a
2-octet per-segment checksum), the transport should specify an L
value no larger than (65535 - 28 - 40 - 24 - 2) = 65441 octets.

The transport can specify still larger
to 65535 octets, but the resulting parcels might be lost along
some paths with unpredictable results. For example, a parcel
with an extreme L value set as large as 65535 might be able to
transit paths that can pass jumbograms natively but might not
be able to transit a path that includes non-jumbo links. The
transport layer should therefore carefully weigh the high risk of
loss for parcels with extreme L values larger than the recommended
maximum against their only minor potential performance benefits.

Parcels that include extreme L values larger than the
recommended maximum and the maximum number of included
segments could also exceed 16,777,215 (2**24 - 1) octets in
total length. Since the Parcel Payload Length field is limited to
24 bits, however, the largest possible parcel is limited to this
size. See also the above risk/benefit analysis for parcels that
include extreme L values larger than the recommended maximum.

Both historic and modern-day data links configure Maximum
Transmission Units (MTUs) that are far smaller than desirable for
the future of Internetworking. When the first Ethernet data
links were deployed many decades ago, their 1500 octet MTU set a
strong precedent that was widely adopted. This same size now
appears as the predominant MTU limit for most paths in the
Internet today, although modern link deployments with MTUs
as large as 9KB have begun to emerge.

In the late 1980's, the Fiber Distributed Data Interface (FDDI)
standard defined a new link type with MTU slightly larger than 4500
octets. The goal of the larger MTU was to increase performance by a
factor of 10 over the ubiquitous 10Mbps and 1500-octet MTU Ethernet
technologies of the time. Many factors including a failure to harmonize
MTU diversity and an Ethernet performance increase to 100Mbps led to
poor FDDI market reception. In the next decade, the 1990's saw new
initiatives including ATM/AAL5 (9KB MTU) and HiPPI (64KB MTU) which
offered high-speed data link alternatives with larger MTUs but again
the inability to harmonize diversity derailed their momentum. By the
end of the 1990's and leading into the 2000's, the emergence of 1Gbps,
10Gbps and even faster Ethernet performance levels seen today has
obscured the fact that the modern Internet of the 21st century is
still operating with 20th century MTUs!

To bridge this gap, increased OMNI interface deployment in the
near future will provide a virtual link type that can pass IP parcels
over paths that traverse legacy data links with small MTUs. Performance
analysis has shown that (single-threaded) receive-side performance is
bounded by transport layer protocol segment size, with performance
increasing in direct proportion with segment size. Experiments have
also shown measurable (single-threaded) performance increases that
grow steadily as more segments are included per parcel. Beyond this,
parallel receive-side processing is expected to provide performance
multiplier benefits since the multiple segments that arrive in a
single parcel can be processed simultaneously instead of serially.

In addition to the clear near-term benefits, IP parcels and
advanced jumbos will increase performance to new levels as future
links with very large MTUs in excess of 65535 octets begin to emerge.
With such large MTUs, the traditional CRC-32 (or even CRC-64) error
checking with errored packet discard discipline will no longer apply
for large parcels and advanced jumbos. Instead, packets larger than
a link-specific threshold will include Forward Error Correction (FEC)
codes so that errored packets can be repaired at the receiver's data
link layer then delivered to higher layers rather than being discarded
and triggering retransmission of large amounts of data. Even if the
FEC repairs are incomplete or imperfect, all parcels can still be
delivered to higher layers where the individual segment checksums
will detect and discard any damaged data not repaired by the link
and/or adaptation layers (advanced jumbos on the other hand would
require complete FEC repair).

These new "super-links" will begin to appear mostly in the
network edges (e.g., high-performance data centers); however, some
space-domain links that extend over enormous distances may also
benefit. For this reason, a common use case will include super-links
in the edge networks of both parties of an end-to-end session with an
OMNI link connecting the two over wide area Internetworks. Medium-
to moderately large-sized IP parcels over OMNI links will already
provide considerable performance benefits for wide-area end-to-end
communications while truly large parcels and advanced jumbos over
super-links can provide boundless increases for localized bulk
transfers in edge networks or for deep space long haul transmissions.
The ability to grow and adapt without practical bound enabled by IP
parcels and advanced jumbos will inevitably encourage new data link
development leading to future innovations in new markets that will
revolutionize the Internet.

Until these new links begin to emerge, however, parcels will already
provide a tremendous benefit to end systems by allowing applications to
send and receive segment buffers larger than 65535 octets in a single
system call. By expanding the operating system call data copy limit
from its current 16-bit length to a 32-bit length, applications
will be able to send and receive maximum-length parcel buffers even if
parcellation is needed to fit within the interface MTU. For applications
such as the Delay Tolerant Networking (DTN) Bundle Protocol , this will allow transfer of entire large protocol
objects (such as DTN bundles) in a single system call.

Continuing into the future, a natural progression beginning with
IP packets then moving to IP parcels should also lead to wide-scale
adoption of advanced jumbos. Since advanced jumbos carry only a
single very large transport layer data segment, loss of even a
single jumbogram could invoke a major retransmission event. However, with
the advent of error correcting codes, future link types could offer
truly large MTUs. Advanced jumbos sent over such links would then be
equipped with an error correction "repair kit" that the link far
end can use to "patch" the jumbogram allowing it to be processed
further by upper layers. Delay Tolerant Networking (DTN) over
high-speed and long-delay optical links provides an example
environment suitable for such large packets.

<< RFC Editor - remove prior to publication >>

Changes from earlier versions:

Submit for review.