IP Parcels

Boeing Research & Technology
P.O. Box 3707
Seattle, WA 98124
USA
fltemplin@acm.org

Internet-Draft

IP packets (both IPv4 and IPv6) contain a single unit of upper layer
protocol data which becomes the retransmission unit in case of loss.
Upper layer protocols including the Transmission Control Protocol (TCP)
and transports over the User Datagram Protocol (UDP) prepare data units
known as "segments", with each individual IP packet including only a
single segment. This document presents a new construct known as the
"IP Parcel" which permits a single packet to carry multiple upper
layer protocol segments, essentially creating a "packet-of-packets". IP
parcels provide an essential building block for improved performance,
efficiency and integrity while encouraging larger Maximum Transmission
Units (MTUs) in the Internet.

IP packets (both IPv4 and IPv6) contain a single unit of upper layer protocol data
which becomes the retransmission unit in case of loss. Upper layer
protocols such as the Transmission Control Protocol (TCP) and transports
over the User Datagram Protocol (UDP) (including QUIC, LTP and others)
prepare data units known as
"segments", with each individual IP packet including only a single
segment. This document presents a new construct known as the "IP
Parcel" which permits a single packet to carry multiple upper layer
protocol segments. This essentially creates a "packet-of-packets" with
the IP layer and full {TCP,UDP} headers appearing only once but with
possibly more than one segment included.

Parcels are formed when an upper layer protocol entity identified
by the "5-tuple" (source address, destination address, source port,
destination port, protocol number) prepares a data buffer (or buffer
chain) beginning with an Integrity Block of up to 256 2-octet Checksums
followed by their corresponding upper layer protocol segments that can
be broken out into smaller sub-parcels and/or individual packets if
necessary. All segments except the final one must be equal in length
and no larger than 65535 octets (minus headers), while the final segment
must not be larger than the others but may be smaller. The upper layer
protocol entity then delivers the buffer(s), number of segments and
non-final segment size to lower layers which copy the buffer(s) into
the body of the parcel then include a {TCP,UDP} header and an IP
header plus extensions that identify this as a parcel and not an
ordinary packet.

Parcels can be forwarded over consecutive parcel-capable links in
a path until they arrive at a router where the next hop is via a
link that does not support parcels, a parcel-capable link with a
size restriction, or an ingress middlebox Overlay Multilink Network
(OMNI) Interface that
spans intermediate Internetworks using adaptation layer encapsulation
and fragmentation. In the first case, the router breaks the parcel
into individual IP packets and forwards them via the next hop link.
In the second case, the router breaks the parcel into smaller
sub-parcels and forwards them via the next hop link. In the final
case, the OMNI interface breaks the parcel into smaller sub-parcels
if necessary then applies adaptation layer encapsulation and
fragmentation if necessary.

These OMNI interface sub-parcels may then be reconstituted into
one or more larger parcels by an egress middlebox OMNI interface
which either delivers them locally or forwards them over additional
parcel-capable links on the path to the final destination. The
final destination can then further reconstitute sub-parcels of the
same original parcel into a single buffer (or buffer chain) so as
to present the largest possible number of segments to upper layers
in a single system call. Reordering and even loss or damage of
individual segments within the network is therefore possible, but
what matters is that the parcels delivered to the final destination
should be the largest practical size for best performance and that
loss or receipt of individual segments (and not parcel size)
determines the retransmission unit.

The following sections discuss the rationale for creating and shipping
IP parcels as well as the actual protocol constructs and procedures
involved. IP parcels provide an essential building block for improved
performance, efficiency and integrity while encouraging larger Maximum
Transmission Units (MTUs) in the Internet. It is further expected that
the parcel concept will inspire future innovation in applications,
operating systems, network equipment and data links.

The Oxford Languages dictionary defines a "parcel" as "a thing or
collection of things wrapped in paper in order to be carried or sent by
mail". Indeed, there are many examples of parcel delivery services
worldwide that provide an essential transit backbone for efficient
business and consumer transactions.

In this same spirit, an "IP parcel" is simply a collection of up to
256 upper layer protocol segments wrapped in an efficient package for
transmission and delivery (i.e., a "packet-of-packets") while a
"singleton IP parcel" is simply a parcel that contains a single segment.
IP parcels are distinguished from ordinary packets through the special
header constructions discussed in this document.

The IP parcel construct is defined for both IPv4 and IPv6. Where the
document refers to "IPv4 header length", it means the total length of
the base IPv4 header plus all included options, i.e., as determined by
consulting the Internet Header Length (IHL) field. Where the document
refers to "IPv6 header length", however, it means only the length of the
base IPv6 header (i.e., 40 octets), while the length of any extension
headers is referred to separately as the "IPv6 extension header length".
Finally, the term "IP header plus extensions" refers generically to an
IPv4 header plus all included options or an IPv6 header plus all
included extension headers.

Where the document refers to "{TCP, UDP} header length", it means
the length of either the TCP header plus options (20 or more octets)
or the UDP header (8 octets). It is important to note that only a
single IP header and a single full upper layer header appear in each
parcel regardless of the number of segments included. This distinction
often provides significant savings in overhead made possible only
by IP parcels.

Where the document refers to checksum calculations, it means the
standard Internet checksum unless otherwise specified. As for TCP, UDP
and IPv4, the standard Internet checksum is defined as
(sic) "the 16-bit one's complement of the one's complement sum of all
(pseudo-)headers plus data, padded with zero octets at the end (if
necessary) to make a multiple of two octets". A notional Internet
checksum algorithm can be found in , while
practical implementations require special attention to byte ordering
"endianness" to ensure interoperability between diverse architectures.

The Automatic Extended Route Optimization (AERO) and Overlay Multilink Network
Interface (OMNI) technologies
provide an architectural framework for transmission of IP parcels over
existing Internetworks. AERO/OMNI are expected to provide an operational
environment for IP parcels beginning from the earliest deployment phases
and extending to accommodate continuous growth. As more and more
parcel-capable links are deployed (e.g., in data centers, edge networks,
space-domain, and other high data rate services) AERO/OMNI will continue
to provide an essential service for true IP parcel Internetworking.

The term "parcel-capable link" refers to any data link medium
(physical or virtual) capable of transiting a {TCP,UDP}/IP packet
that employs the parcel-specific constructions specified in this
document. The source and each router in the path has a "next hop
link" that forwards parcels toward the final destination, while
each router and the final destination has a "previous hop link"
that accepts en route parcels. Each next hop link MUST be capable
of forwarding parcels with segment lengths that fit within the
minimum of the link Maximum Transmission Unit (MTU) and 65535,
while first applying parcel subdivision if necessary (see:
). Currently, only the OMNI link satisfies
these properties, but new and existing link types are encouraged
to incorporate parcel support in their designs.

The term "Maximum Transmission Unit (MTU)" is widely understood
in Internetworking terminology to mean the largest packet size that
can traverse a single link ("link MTU") or an entire path ("path MTU")
without requiring IP layer fragmentation. If the MTU value returned
during parcel path qualification is larger than 65535, it determines
the maximum parcel size that a router can forward over the path/link
without requiring a router to perform subdivision and with no segment
size restrictions; otherwise, it determines both the maximum parcel
and segment sizes (see: ).

The terms "parcellation" and "reconstitution" refer to either
network layer or adaptation layer processes in which an original
(sub-)parcel is first sub-divided into smaller (sub-)parcels that
can transit the path without loss due to a size restriction then
finally re-combined into larger (sub-)parcels before delivery to
upper layers. As a network layer process, the (sub-)parcels
resulting from parcellation may only be reconstituted at the
final destination. As an adaptation layer process, the resulting
(sub-)parcels may be first reconstituted at the adaptation layer
egress node then further reconstituted at the final destination.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP 14
when, and only when,
they appear in all capitals, as shown here.

Studies have shown that applications can improve their performance by
sending and receiving larger packets due to reduced numbers of system
calls and interrupts as well as larger atomic data copies between kernel
and user space. Larger packets also result in reduced numbers of network
device interrupts and better network utilization (e.g., due to header
overhead reduction) in comparison with smaller packets.

A first study involved performance enhancement
of the QUIC protocol using the Linux Generic
Segment/Receive Offload (GSO/GRO) facility. GSO/GRO provides a robust
service that has shown significant performance increases based on a
multi-segment transfer capability between the operating system kernel
and QUIC applications. GSO/GRO performs fragmentation and reassembly at
the transport layer with the transport protocol segment size limited by
the path MTU (typically 1500 octets or smaller in today's Internet).

A second study showed that
GSO/GRO also improves performance for the Licklider Transmission
Protocol (LTP) used for the Delay Tolerant
Networking (DTN) Bundle Protocol for segments
larger than the actual path MTU through the use of OMNI interface
encapsulation and fragmentation. Historically, the NFS protocol also
saw significant performance increases using larger (single-segment)
UDP datagrams even when IP fragmentation is invoked, and LTP still
follows this profile today. Moreover, LTP shows this (single-segment)
performance increase profile extending to the largest possible segment
size which suggests that additional performance gains are possible
using (multi-segment) IP parcels that approach or even exceed
65535 octets.

TCP also benefits from larger packet sizes and efforts have
investigated TCP performance using jumbograms internally with changes
to the Linux GSO/GRO facilities. The approach
proposed to use the Jumbo Payload option internally and to allow GSO/GRO
to use buffer sizes larger than 65535 octets, but with the understanding
that links that support jumbos natively are not yet widely available.
Hence, IP parcels provide a packaging that can be considered in the
near term under current deployment limitations.

A limiting consideration for sending large packets is that they are
often lost at links with MTU restrictions, and the resulting Packet Too
Big (PTB) message may
be lost somewhere in the return path to the original source. This "Path
MTU black hole" condition can degrade performance unless robust path
probing techniques are used; however, the best-case performance always
occurs when loss of packets due to size restrictions is minimized.

These considerations therefore motivate a design where transport
protocols should employ a maximum segment size no larger than 65535
octets (minus headers), while parcels that carry multiple segments may
themselves be significantly larger. Then, even if lower layers need to
apply parcellation/reconstitution (and/or fragmentation/reassembly),
improvements in performance, integrity and efficiency are enabled
for the original source, final destination and networked path as
a whole.

An analogy: when a consumer orders 50 small items from a major online
retailer, the retailer does not ship the order in 50 separate small
boxes. Instead, the retailer packs as many of the small items as
possible into one or a few larger boxes (i.e., parcels) then places the
parcels on a semi-truck or airplane. The parcels may then pass through
one or more regional distribution centers where they may be repackaged
into different parcel configurations and forwarded further until they
are finally delivered to the consumer. But most often, the consumer will
only find one or a few parcels at their doorstep and not 50 separate
small boxes. This flexible parcel delivery service greatly reduces
shipping and handling cost for all including the retailer, regional
distribution centers and finally the consumer.

An upper layer protocol entity (identified by the 5-tuple as
above) forms a parcel body when it prepares a data buffer (or buffer
chain) containing an Integrity Block of up to 256 2-octet Checksums
followed by their corresponding upper layer protocol segments (with
each TCP non-first segment preceded by a 4-octet Sequence Number header).
All non-final segments MUST be equal in length while the final segment
MUST NOT be larger and MAY be smaller. Each non-final segment MUST NOT
be larger than the minimum of 65535 octets and the path MTU, minus the
length of the {TCP,UDP} header, minus the length of the IP header (plus
options/extensions), minus 2 octets for the per-segment Checksum.
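
The size limit just stated can be sketched as a small calculation. This is only an illustrative sketch: the function name `max_segment_size` and its parameter names are hypothetical and do not appear in this document, and the caller is assumed to supply header lengths that already include any options or extensions.

```python
def max_segment_size(path_mtu: int, ip_header_len: int,
                     transport_header_len: int) -> int:
    """Upper bound on the non-final segment size L.

    Follows the rule in the text: min(65535, path MTU), minus the
    {TCP,UDP} header length, minus the IP header length (including
    options/extensions), minus 2 octets for the per-segment Checksum.
    """
    limit = min(65535, path_mtu)
    size = limit - transport_header_len - ip_header_len - 2
    if size <= 0:
        # Hypothetical error handling; the text does not define this case.
        raise ValueError("path MTU too small to carry any segment")
    return size
```

For example, with a 9000-octet path MTU, 56 octets of IP header plus extensions and an 8-octet UDP header, the bound is 9000 - 8 - 56 - 2 = 8934 octets; with a path MTU above 65535 the 65535 ceiling applies instead.
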
(Note that this also satisfies the case of ingress middlebox OMNI
interfaces in the path that would process the headers as upper layer
protocol payload during IPv6 encapsulation/fragmentation.)

The upper layer protocol entity then presents the buffer(s) and
non-final segment size L to lower layers (noting that the combined
buffer length(s) may exceed 65535 octets if there are sufficient
segments of a large enough size). If the next hop link is not parcel
capable, the lower layer prepares each segment as an individual IP
packet as will be discussed further below. Otherwise, the lower
layer forms a parcel by prepending a single full {TCP,UDP} header
(plus options) and a single full IP header (plus options/extensions).

Upper layers request lower layers to include a specially-formatted
"Jumbo Payload" option (e.g., by asserting a socket option) as an
extension to the IP header of each parcel prior to transmission over
a network interface. For IPv4, the Jumbo Payload option is included as
an IPv4 header option with format derived from
except that the IP layer sets option type to '00001011' and option
length to '00010000' (noting that the length also distinguishes this
type from its obsoleted use as the "IPv4 Probe MTU" option). The option is formed as shown in :

Following {TCP,UDP} parcel assembly but prior to
transmission, the IP layer sets Code to 255 and sets Check to the
same value that will appear in the TTL of the outgoing IPv4 header.
The IP layer next sets Nsegs to a value J between 0 and 255 and sets
Jumbo Payload Length to a 3-octet value M that encodes the length of
the IPv4 header plus the length of the {TCP,UDP} header plus the
combined length of the Integrity Block plus all concatenated segments.
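
As a rough illustration of the Nsegs and Jumbo Payload Length values just described, the following sketch computes J and M for a UDP-style parcel in which every listed length is a whole segment. The function name `jumbo_payload_length` is hypothetical, and TCP parcels (whose first segment is shorter, as described later) are deliberately out of scope here.

```python
def jumbo_payload_length(ip_header_len: int, transport_header_len: int,
                         segment_lengths: list[int]) -> tuple[int, int]:
    """Return (J, M) for a parcel carrying the given segment lengths.

    J is one less than the number of segments (the Nsegs value) and M is
    the header lengths plus the Integrity Block (2 octets per segment)
    plus all concatenated segments.
    """
    nsegs = len(segment_lengths)
    if not 1 <= nsegs <= 256:
        raise ValueError("a parcel carries between 1 and 256 segments")
    first = segment_lengths[0]
    # All non-final segments must be equal; the final one may be smaller.
    if any(n != first for n in segment_lengths[:-1]):
        raise ValueError("non-final segments must be equal in length")
    if segment_lengths[-1] > first:
        raise ValueError("final segment must not be larger than the others")
    m = (ip_header_len + transport_header_len
         + 2 * nsegs + sum(segment_lengths))
    if m > 0xFFFFFF:
        raise ValueError("M does not fit in a 3-octet field")
    return nsegs - 1, m
```

With a 36-octet IPv4 header (20 octets plus the 16-octet option), an 8-octet UDP header and segments of 1000, 1000 and 400 octets, this gives J = 2 and M = 36 + 8 + 6 + 2400 = 2450. Assuming network byte order for the 3-octet field, `(2450).to_bytes(3, "big")` would produce the encoded value.
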
Next, the IP layer sets Identification as discussed in , sets the "(P)robe Path MTU" flag to '1' for probes
or '0' for non-probes and sets the "More (S)ub-parcels" flag to '1'
for non-final sub-parcels or '0' for the final (sub-)parcel. The
IP layer finally sets the IPv4 header DF bit to 1 and Total Length
field to the non-final segment size L.

For IPv6, the Jumbo Payload option is included as an IPv6
Hop-by-Hop option formatted the same as for IPv4 above, but with
option type set to '11001110', option length set to '00001100'
and with the Code/Check fields omitted. The option is formed as
shown in :

Following {TCP,UDP} parcel assembly but prior to
transmission, the IP layer sets Nsegs to a 1-octet value J
between 0 and 255 and sets the Jumbo Payload Length field to a
3-octet value M that encodes the lengths of all IPv6 extension
headers present plus the length of the {TCP,UDP} header plus the
combined length of the Integrity Block plus all concatenated
segments. Next, the IP layer sets Identification as discussed
in , sets the P flag to '1' for probes or
'0' for non-probes and sets the S flag to '1' for non-final
sub-parcels or '0' for the final (sub-)parcel. The IP layer
finally sets the IPv6 header Payload Length field to L.

The lower layers then assemble the {TCP,UDP}/IP parcel
according to the formats shown in :

where the total number of segments is (J + 1), L
is the length of each non-final segment which MUST NOT be larger than
65535 octets (minus headers) and K is the length of the final segment
which MUST NOT be larger than L. (Note that when J is 0, K and L
are one and the same value.)

The {TCP,UDP} header is then immediately followed by an Integrity
Block containing (J + 1) 2-octet Checksums concatenated in numerical
order as shown in :
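
As a minimal sketch of the Integrity Block layout just described, the following code computes the standard Internet checksum of each segment and concatenates the 2-octet results in segment order. The names `internet_checksum` and `build_integrity_block` are hypothetical. The sketch assumes network byte order and intentionally omits the protocol-specific rules given later in this document (TCP sequence number pseudo-headers, and the UDP handling of '0' and 'ffff' values).

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"          # pad with a zero octet to an even length
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_integrity_block(segments: list[bytes]) -> bytes:
    """Concatenate one 2-octet Checksum per segment, in numerical order."""
    if not 1 <= len(segments) <= 256:
        raise ValueError("a parcel carries between 1 and 256 segments")
    return b"".join(struct.pack("!H", internet_checksum(s))
                    for s in segments)
```

The resulting block is always 2 * (J + 1) octets long, so a receiver that knows Nsegs can locate the first segment without inspecting the Checksums themselves.
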
The Integrity Block is then followed by (J + 1) upper layer
protocol segments. For TCP, the TCP header Sequence Number field
encodes a 4-octet starting sequence number for the first segment
only, while each additional segment is preceded by its own 4-octet
Sequence Number field. For this reason, the length of the first
segment is only (L-4) octets since the 4-octet TCP header
Sequence Number field applies to that segment. (All non-first
TCP segments instead begin with their own Sequence Number
headers, with the 4-octet length included in L and K.)

Following parcel construction, the Nsegs value unambiguously
determines the number of 2-octet Checksums present in the Integrity
Block and (together with the IP {Total, Payload} length and Jumbo
Payload Length) also determines the number of parcel data segments
present. Receiving nodes that process IP parcels therefore observe
the following requirements:

* if the Jumbo Payload Length indicates insufficient space for
  the full Integrity Block plus at least one data segment of
  length K, the receiver discards the parcel.

* if the length of the payload following the Integrity Block
  is (J * L) or less, the receiver processes all initial
  Checksums along with their corresponding segments up to the
  end of the payload and ignores any remaining Checksums.

* if the length of the payload following the Integrity Block is
  greater than ((J + 1) * L) the receiver processes all Checksums
  with their corresponding segments and ignores any remaining
  payload beyond the end of the final segment.

Note: per-segment Checksums appear in a contiguous Integrity Block
immediately following the {TCP,UDP}/IP headers instead of inline with
the parcel segments to greatly increase the probability that they will
appear in the contiguous head of a kernel receive buffer even if the
parcel was subject to OMNI interface IPv6 fragmentation. This condition
may not always hold if the IPv6 fragments also incur IPv4 encapsulation
and fragmentation over paths that traverse fast IPv4 links with small
MTUs. Even in that case, however, only the fragmented Integrity Block
(i.e., and not the entire parcel) must be pulled into the contiguous
head of a kernel receive buffer.

Note: For IPv4 parcels, the first 2 octets of the Jumbo Payload option
include Code and Check fields in case a router on the path overwrites
the values in a wayward attempt to implement .
IPv4 parcel recipients should therefore regard an incorrect Code or
Check value as evidence that the field was either intentionally or
accidentally altered by a previous hop node.

A TCP Parcel is an IP Parcel that includes an IP header plus
extensions with a Jumbo Payload option formed as shown in
with Nsegs/J encoding one
less than the number of segments and Jumbo Payload length encoding
a value up to 16,777,215 (2**24 - 1). The IP header plus extensions
is then followed by a TCP header plus options (20 or more octets),
which is then followed by an Integrity Block with (J + 1) consecutive
2-octet Checksums. The Integrity Block is then followed by (J + 1)
consecutive segments, where the first segment is (L-4) octets in length
and uses the 4-octet sequence number found in the TCP header, each
intermediate segment is L octets in length (including its own 4-octet
Sequence Number header) and the final segment is K octets in
length (including its own 4-octet Sequence Number header).
The value L is encoded in the IP header {Total, Payload} Length field
while J is encoded in the Nsegs octet. The overall length of the
parcel as well as final segment length K are determined by Nsegs
and the Jumbo Payload length M as discussed above.

The source prepares TCP Parcels in a similar fashion as for simple
TCP jumbograms. The source calculates a checksum
of the TCP header plus IP pseudo-header only (see: ),
but with the TCP header Sequence Number field temporarily set to 0
during the calculation since the true sequence number will be included
as an integrity pseudo header for the first segment. The source then
writes the calculated value in the TCP header Checksum field as-is (i.e.,
without converting calculated '0' values to 'ffff') and finally re-writes
the actual sequence number back into the Sequence Number field. (Nodes
that verify the header checksum first perform the same operation of
temporarily setting the Sequence Number field to 0 and then resetting
to the actual value following checksum verification.)

The source then calculates the checksum of the first segment
beginning with the sequence number found in the full TCP header as a
4-octet pseudo-header then extending over the remaining (L-4) octet
length of the segment. The source next calculates the checksum for
each L octet intermediate segment independently over the length of
the segment (beginning with its sequence number), then finally
calculates the checksum of the K octet final segment (beginning
with its sequence number). As the source calculates each segment(i)
checksum (for i = 0 thru J), it writes the value into the
corresponding Integrity Block Checksum(i) field as-is.

Note: The parcel TCP header Source Port, Destination Port and
(per-segment) Sequence Number fields apply to all parcel segments,
while the TCP control bits and all other fields apply only to the
first segment (i.e., "segment(0)"). Therefore, only parcel segment(0)
may be associated with control bit settings while all other
segment(i)'s must be simple data segments.

See for additional TCP considerations. See
for additional integrity considerations.

A UDP Parcel is an IP Parcel that includes an IP header plus
extensions with a Jumbo Payload option formed as shown in
with Nsegs/J encoding one less than
the number of segments and Jumbo Payload length encoding a value
up to 16,777,215 (2**24 - 1). The IP header plus extensions is then
followed by an 8-octet UDP header followed by an Integrity Block
with (J + 1) consecutive 2-octet Checksums followed by (J + 1)
upper layer protocol segments. Each segment must begin with a
transport-specific start delimiter (e.g., a segment identifier)
included by the transport layer user of UDP. The length of the first
segment L is encoded in the IP {Total, Payload} Length field while
J is encoded in the Nsegs octet. The overall length of the parcel
as well as the final segment length are determined by the Jumbo
Payload length M as discussed above.

The source prepares UDP Parcels in a similar fashion as for simple
UDP jumbograms and therefore MUST set the UDP
header length field to 0. The source then calculates the checksum of
the UDP header plus IP pseudo-header (see: )
and writes the calculated value in the UDP header Checksum field as-is
(i.e., without converting calculated '0' values to 'ffff').

The source then calculates a separate checksum for each segment
for which checksums are enabled independently over the length of the
segment. As the source calculates each segment(i) checksum (for
i = 0 thru J), it writes the value into the corresponding Integrity
Block Checksum(i) field with calculated '0' values converted to
'ffff'; for segments with checksums disabled, the source instead
writes the value '0'.

See: for additional integrity considerations.

Following {TCP,UDP} parcel assembly, the IP layer of the source
fully populates all IP header fields including the source address,
destination address and Jumbo Payload option as discussed above.
The source also maintains a randomly-initialized 32-bit cached
Identification value for each destination. For each parcel
transmission, the IP layer sets the Jumbo Payload Identification
field to the current cached value for this destination then
increments the cached value by 1 (modulo 2**32). The IP layer
can subsequently reset each cached value to a new random value
at any time, e.g., to maintain an unpredictable profile.

The IP layer of the source next presents each parcel to a network
interface for transmission to the next hop. For ordinary IP interface
attachments to parcel-capable links, the interface simply admits each
parcel into the link the same as for any IP packet where it may be
forwarded by one or more routers over additional consecutive
parcel-capable links possibly even traversing the entire forward
path to the final destination. If any node in the path does not
recognize the parcel construct, it may drop the parcel and return
an ICMP "Parameter Problem" message.

When the next hop link does not support parcels at all, or when
the next hop link is parcel-capable but configures an MTU that is
too small to pass the entire parcel, the source breaks the parcel
up into individual IP packets (in the first case) or into smaller
sub-parcels (in the second case). In the first case, the parcel is
replaced by individual IP packets, but the process can engage
Generic Segment Offload (GSO) and the final destination can apply
Generic Receive Offload (GRO) to recombine the packets into a
larger buffer for upper layer delivery. In the second case (termed
"parcellation"), each sub-parcel will contain the same Identification
value and with the S flag set appropriately. The final destination
can then apply "reconstitution" to deliver the largest possible
parcel buffers to its upper layer protocols. In all other ways,
the source processes of breaking a parcel up into individual IP
packets or smaller sub-parcels entails the same considerations
as for a router on the path that invokes these processes as
discussed in the following subsections.

Each parcel serves as an implicit probe that tests the forward
path's ability to pass parcels. Each parcel header also includes a
24-bit "Path MTU (PMTU)" field into which the source writes the
minimum of the next hop link MTU and (2**24 - 1) and each router
in the path rewrites PMTU in a similar fashion as for and .
In particular, each router compares the parcel PMTU value with
the next hop link MTU for the parcel and MUST (re)set PMTU to
the minimum value. Note that the fact that the parcel traversed
a previous hop link should provide acceptable evidence of forward
progress since parcel path MTU determination is unidirectional in
the forward path only. However, nodes can also include the previous
hop link MTU in their minimum PMTU calculations in case the link
may have an ingress size restriction (such as a receive buffer
limitation). Each parcel also includes one or more upper layer
protocol segments corresponding to the 5-tuple for the flow, which
may also include {TCP,UDP} segment size probes used for packetization
layer path MTU discovery.
(See: for further details on implicit/explicit
path probing.)

When a router receives an IPv4 parcel it first compares Code with
255 and Check with the IPv4 header TTL; if either value differs, the
router drops the parcel and returns a negative Parcel Reply (see ). For all IP parcels, the router next compares
the value L with the next hop link MTU. If the next hop link MTU is
too small to pass either a singleton parcel or an individual IP packet
with a single segment of length L, the router discards the parcel and
returns a positive Parcel Reply with MTU set to the next hop link MTU.
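
The checks in this paragraph up to this point can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Action` and `qualify` are hypothetical, the caller is assumed to have already computed the sizes of a singleton parcel and of an individual IP packet carrying one segment of length L, and "too small to pass either" is read here as meaning that neither form fits within the next hop link MTU.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    NEGATIVE_REPLY = "drop and return a negative Parcel Reply"
    POSITIVE_REPLY = "drop and return a positive Parcel Reply"
    CONTINUE = "continue processing"


def qualify(next_hop_mtu: int, singleton_parcel_len: int,
            single_packet_len: int, ipv4_code: Optional[int] = None,
            ipv4_check: Optional[int] = None,
            ipv4_ttl: Optional[int] = None) -> Tuple[Action, Optional[int]]:
    """Return the action and, for a positive reply, the MTU to report."""
    # IPv4 parcels only: Code must be 255 and Check must equal the TTL.
    if ipv4_code is not None:
        if ipv4_code != 255 or ipv4_check != ipv4_ttl:
            return Action.NEGATIVE_REPLY, None
    # Assumed reading: reject only when neither form fits the link MTU.
    if next_hop_mtu < min(singleton_parcel_len, single_packet_len):
        return Action.POSITIVE_REPLY, next_hop_mtu
    return Action.CONTINUE, None
```

Passing no IPv4 arguments models an IPv6 parcel, for which the Code/Check comparison does not apply.
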
For IPv4 parcels, if the next hop link is parcel capable the router MUST
then reset Check to the same value that would appear in the TTL of the
outgoing IPv4 header for forwarding the parcel to the next hop.

If the router recognizes parcels but the next hop link in the path
does not, or if the entire parcel would exceed the next hop link MTU, the
router instead opens the parcel. The router then forwards each enclosed
segment in individual IP packets or in a set of smaller sub-parcels that
each contain a subset of the original parcel's segments. If the next
hop link is via an OMNI interface, the router instead proceeds according
to OMNI Adaptation Layer procedures. These considerations are discussed
in detail in the following sections.

For transmission of individual IP packets over links that do not
support parcels, the source or router (i.e., the node) engages GSO.
The node first determines whether an individual packet with segment of
length L can fit within the next hop link MTU. If not, the node drops
the parcel and returns a positive Parcel Reply message with MTU set
to the next hop link MTU and with the leading portion of the parcel
beginning with the IP header as the "packet in error". Otherwise,
the node removes the Jumbo Payload option, sets aside and remembers
the Integrity Block (and for TCP also sets aside and remembers the
Sequence Number header values of each non-first segment)
then copies the {TCP,UDP}/IP headers (but with the Jumbo Payload
option removed) followed by segment(i) (for i = 0 thru J) into (J + 1)
individual IP packets ("packet(i)"). The node then clears the TCP
control bits in all but packet(0), and includes only those TCP
options that are permitted to appear in data segments in all but
packet(0) (which may also include control segment options). The
node then sets IP {Total, Payload} length for each packet(i) based
on the length of segment(i) according to the IP protocol standards.

For each IPv6 packet(i), the node includes an IPv6 Fragment Header
and sets the Identification field to the value found in the parcel
header. For each IPv4 packet(i), the node sets the Identification
field to the least significant 16 bits of the value found in the
parcel header and sets the (D)on't Fragment flag to '1'. For each
IP packet(i), the node then sets both the Fragment Offset field
and (M)ore fragments flag to '0' to produce an unfragmented IP
packet. For IPv6, destinations should process these "atomic
fragments" as whole packets instead of admitting them into the
reassembly cache (i.e., the same as for IPv4).

The node then processes further according to upper layer protocol
conventions. For TCP, the node calculates the checksum for packet(0)'s
TCP/IP headers only according to but with the
sequence number value saved and the field set to 0. The node then adds
Integrity Block Checksum(0) to the calculated value and writes the sum
into packet(0)'s TCP Checksum field. The node then resets the Sequence
Number field to packet(0)'s saved sequence number and forwards
packet(0) to the next hop. The node next calculates the checksum
of packet(1)'s TCP/IP headers with the Sequence Number field set
to 0 and saves the calculated value. In each non-first packet(i)
(for i = 1 thru J), the node then adds the saved value to Integrity
Block Checksum(i), writes the sum into packet(i)'s TCP Checksum
field, sets the TCP Sequence Number field to packet(i)'s sequence
number then forwards packet(i) to the next hop.

For UDP, the node sets the UDP length field according to in each
packet(i) (for i = 0 thru J). If Integrity
Block Checksum(i) is 0, the node then sets the UDP Checksum field
to 0, forwards packet(i) to the next hop and continues to the next.
The node next calculates the checksum over packet(i)'s UDP/IP
headers only according to . If Integrity Block
Checksum(i) is not 'ffff', the node then adds the value to the header
checksum; otherwise, the node re-calculates the checksum for segment(i).
If the re-calculated segment(i) checksum value is 'ffff' or '0' the
node adds the value to the header checksum; otherwise, it continues
to the next packet(i). The node finally writes the total checksum
value into the packet(i) UDP Checksum field (or writes 'ffff' if
the total was '0') and forwards packet(i) to the next hop.

Note: for each UDP packet(i), the node must recalculate
the segment checksum if Checksum(i) is 'ffff', since that value is
shared by both '0' and 'ffff' calculated checksums. If recalculating
the checksum produces an incorrect value, segment(i) is considered
errored and the node can optionally drop or forward (noting that
the forwarded packet would simply be discarded as an error by the
final destination).

Note: for each {TCP,UDP} packet(i), the node can optionally
re-calculate and verify the segment checksum unconditionally before
forwarding, but this may introduce undesirable extra delay and
processing overhead.

For transmission of smaller sub-parcels over parcel-capable links,
the source or router (i.e., the node) first determines whether a single
segment of length L can fit within the next hop link MTU if packaged as
a (singleton) sub-parcel. If not, the node returns a positive Parcel Reply
message with MTU set to the next hop link MTU and containing the leading
portion of the parcel beginning with the IP header, then drops the parcel.
Otherwise, the node employs network layer parcellation to break the original
parcel into smaller groups of segments that would fit within the path MTU
by determining the number of segments of length L that can fit into each
sub-parcel under the size constraints. For example, if the node determines
that a sub-parcel can contain 3 segments of length L, it creates sub-parcels
with the first containing Integrity Block Checksums/Segments 0-2, the
second containing Checksums/Segments 3-5, etc., and with the final
containing any remaining Checksums/Segments.

The node then appends identical {TCP,UDP}/IP headers (including the
Jumbo Payload option and any other extensions) to each sub-parcel while
resetting L and M in each according to the above equations with Nsegs/J
set to 2 for each intermediate sub-parcel (i.e., one less than the 3
segments carried in the example above) and with Nsegs/J set to one
less than the remaining number of segments for the final sub-parcel.
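
The grouping step can be sketched as follows. This is a minimal illustration and not the normative equations of this document: the function name group_segments, its parameters, and the size accounting (header length plus one 2-octet Integrity Block Checksum and L octets per segment, ignoring TCP Sequence Number headers and a shorter final segment) are assumptions made for the example.

```python
def group_segments(nsegs_total, seg_len, hdr_len, mtu, per_segment_overhead=2):
    """Return (first_index, last_index, nsegs_field) for each sub-parcel.

    Assumption for illustration: a sub-parcel occupies hdr_len octets of
    {TCP,UDP}/IP headers plus (per_segment_overhead + seg_len) octets per
    segment, where the overhead is the 2-octet Integrity Block Checksum.
    Every segment is treated as having length L (seg_len), which is
    conservative when the final segment is shorter.
    """
    per_sub = (mtu - hdr_len) // (seg_len + per_segment_overhead)
    if per_sub < 1:
        # Not even a singleton sub-parcel fits: the node would return a
        # Parcel Reply and drop the parcel instead of subdividing it.
        raise ValueError("singleton sub-parcel does not fit next hop MTU")
    groups = []
    for first in range(0, nsegs_total, per_sub):
        last = min(first + per_sub, nsegs_total) - 1
        # Nsegs is encoded as one less than the number of segments carried.
        groups.append((first, last, last - first))
    return groups


# Hypothetical values: 8 segments of 2000 octets, 48 octets of headers,
# and a next hop MTU that admits 3 segments per sub-parcel.
groups = group_segments(8, 2000, 48, 6100)
```

With these hypothetical inputs the intermediate sub-parcels carry segments 0-2 and 3-5 with Nsegs set to 2, and the final sub-parcel carries segments 6-7 with Nsegs set to 1.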
For TCP, the node then clears the TCP control bits in all but the first
sub-parcel and includes only those TCP options that are permitted to
appear in data segments in all but the first sub-parcel (which may also
include control segment options). For both TCP and UDP, the node then
resets the {TCP,UDP} Checksum according to ordinary parcel formation
procedures (see above). The node then sets the TCP Sequence Number
field to the value that appears in the first sub-parcel segment while
removing the first segment's Sequence Number header (if present).

When the node breaks an original parcel into sub-parcels, it also
checks the "(S)ub-parcel" flag in the Jumbo Header. If the S flag is
'0', the node sets S to '1' in all resulting sub-parcels except the last
(i.e., the one containing the final segment of length K, which may be
shorter than L) for which it sets S to '0'. If the S flag is '1', the
node instead sets S to '1' in all resulting sub-parcels including the
last. The node finally sets PMTU to the next hop link MTU then forwards
each (sub-)parcel over the parcel-capable next hop link.

For transmission of original parcels or sub-parcels over OMNI
interfaces, the node admits all parcels no larger than the path MTU
into the interface unconditionally since the OMNI interface MTU is
unrestricted. The OMNI Adaptation Layer (OAL) of this First Hop Segment
(FHS) OAL source node then forwards the parcel to the next OAL hop
which may be either an OAL intermediate node or a Last Hop Segment
(LHS) OAL destination. OMNI interface upper layer protocol processing
procedures are specified in detail in the remainder of this section,
while lower layer encapsulation and fragmentation procedures are
specified in .

When the OAL source forwards a parcel or sub-parcel (whether
generated by a local application or forwarded by other nodes over
one or more parcel-capable links), it first assigns a
monotonically-incrementing (modulo 255) adaptation layer "Parcel ID".
If the parcel is larger than the OAL maximum segment size of 65535
octets, the OAL source then employs adaptation layer parcellation to
break the parcel into sub-parcels the same as for the network layer
procedures discussed above. The OAL source next assigns a different
monotonically-incrementing adaptation layer Identification value for
each sub-parcel of the same Parcel ID then performs adaptation layer
encapsulation and fragmentation and finally forwards each fragment
to the next OAL hop toward the OAL destination as necessary. (During
encapsulation, the OAL source examines the Jumbo Payload option S
flag to determine the setting for the adaptation layer fragment
header S flag according to the same rules specified in .)

When the sub-parcels arrive at the OAL destination, the node can
optionally retain them along with their Parcel ID and Identifications
for a brief time to support reconstitution with peer sub-parcels of
the same original (sub-)parcel identified by the adaptation layer
4-tuple (source, destination, Identification, Parcel ID). This
reconstitution entails the concatenation of Checksums/Segments
included in sub-parcels with the same Parcel ID and with
Identification values within 255 of one another to create a
larger sub-parcel possibly even as large as the entire original
(sub-)parcel. Order of concatenation need not be strictly enforced,
except that if a sub-parcel has TCP control bits set it must appear
as a first concatenated element in a reconstituted larger parcel,
and that the sub-parcel with S flag set to '0' must occur as a
final concatenation. The reconstituted (sub-)parcel then sets S
to '0' if and only if one of its constituent elements also had S
set to '0'; otherwise, it sets S to '1'.

The OAL destination then appends a common {TCP,UDP}/IP header plus
extensions to each reconstituted sub-parcel while resetting J, K, L
and M in each. For TCP, if any sub-parcel has TCP control bits set the
OAL destination regards it as sub-parcel(0) and uses its TCP header as
the header of the reconstituted (sub-)parcel. The OAL destination then
resets the {TCP,UDP}/IP header checksum. If the OAL destination is
also the final destination, it then delivers the sub-parcels to the
IP layer which processes them according to the 5-tuple information
supplied by the original source. Otherwise, the OAL destination
forwards each sub-parcel toward the final destination the same as
for an ordinary IP packet as discussed above.

Note: Adaptation layer parcellation over OMNI links occurs only
at the OAL source while the adaptation layer reconstitution occurs
only at the OAL destination. The OAL destination can instead avoid
this process if it would negatively impact performance, noting that
forwarding individual sub-parcels without delay and without
reconstitution is always acceptable (but not always optimal).
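
The concatenation rules for reconstitution at the OAL destination can be sketched as follows. This is only an illustration under stated assumptions: the SubParcel type, its field names, and the reconstitute function are hypothetical, and the sketch assumes the caller has already selected sub-parcels that share the same adaptation layer source, destination, and Parcel ID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubParcel:
    checksums: List[int]        # Integrity Block Checksums, one per segment
    segments: List[bytes]       # upper layer protocol segments
    s_flag: int = 1             # 0 marks the sub-parcel holding the final segment
    tcp_control: bool = False   # True if TCP control bits are set


def reconstitute(parts: List[SubParcel]) -> SubParcel:
    """Concatenate sub-parcels into one larger (sub-)parcel."""
    def rank(p: SubParcel) -> int:
        if p.tcp_control:
            return 0                     # must be the first element
        return 2 if p.s_flag == 0 else 1  # S=0 must be concatenated last

    # sorted() is stable, so arrival order is kept wherever the two
    # ordering constraints do not apply.
    ordered = sorted(parts, key=rank)
    return SubParcel(
        checksums=[c for p in ordered for c in p.checksums],
        segments=[s for p in ordered for s in p.segments],
        # S is '0' if and only if a constituent element had S set to '0'.
        s_flag=0 if any(p.s_flag == 0 for p in ordered) else 1,
        tcp_control=any(p.tcp_control for p in ordered),
    )
```

Each Checksum stays paired with its segment because both lists are concatenated in the same order.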
Intermediate OAL nodes do not participate in the parcellation or
reconstitution processes.

Note: Adaptation layer parcellation and reconstitution is an OAL
process based on the adaptation layer 4-tuple and not the network
layer 5-tuple. This is true even if the OAL has visibility into
network layer information since some sub-parcels of the same
original parcel may be forwarded over different network paths.

When a large parcel transits a path that includes links with
restrictive MTUs, the final destination may receive multiple
sub-parcels having the same 5-tuple and Identification value. The
final destination can hold the sub-parcels in a reconstitution
buffer for a short time or until a sub-parcel with the S flag set
to '0' arrives. The final destination then concatenates the segments
of all non-final sub-parcels, then finally concatenates the segments
of the final sub-parcel and passes the reconstituted parcel to
upper layers.

Due to the possibility of network loss and/or reordering, the
final destination may receive a sub-parcel with S set to '0' before
all other sub-parcels of the same original parcel have arrived.
This condition does not represent an error, but in some cases
may cause the IP layer to deliver sub-parcels to upper layers
that are smaller than the original parcel. Upper Layers simply
process any segments received from all such lower layer deliveries
and will request retransmission of any segments that were lost
and/or damaged.

If the original source or a router on the path opens a parcel
and forwards its contents as individual IP packets, these packets
will arrive at the final destination which may collectively
reconstitute them using GRO. The 5-tuple information plus the
Identification value provides sufficient context for GRO
reconstitution which practical implementations have proven
can provide a robust service at high data rates even for IPv4
with its 16-bit Identification limitation (see note).

Note: in both the sub-parcel and GRO reassembly cases,
reconstitution entails concatenation of the segments in the
order they were received even though some small degree of
reordering and/or loss may have occurred in the networked path.
This eliminates the need for a reconstitution offset value,
since each sub-parcel or individual IP packet contains an
integral number of whole upper layer protocol segments which
are not themselves fragmented. The IP layer can then present
the reconstituted parcel contents to upper layers with segments
arranged in roughly the same order in which they were originally
transmitted, but strict ordering is not required since each
segment will include an upper layer protocol-specific start
delimiter with positional coordinates.

Note: GRO and/or parcel reconstitution buffer congestion
may indicate that full reconstitution cannot be sustained at
current arrival rates. Lower layers should then begin delivering
partial reconstitutions or even individual segments to an upper
layer receive queue (e.g., a socket buffer) instead of waiting
for all segments to arrive. Lower layers can manage GRO/parcel
reconstitution, e.g., by maintaining buffer occupancy high/low
watermarks.

All parcels serve as implicit probes and may cause either a router
in the path or the final destination to return an ordinary ICMP error
and/or Packet Too Big (PTB) message concerning the parcel. A router
in the path or the final destination
may also return an unsolicited negative "Parcel Reply" if the parcel
cannot make further forward progress.

To unambiguously determine whether parcels can transit at least
an initial portion of the forward path toward the final destination,
the original source can also send IP parcels with the Jumbo Payload
option P flag set to '1' as an explicit "Parcel Probe". The probe
will elicit a Parcel Reply from a router or the final destination
(and possibly also one or more upper layer protocol-specific probe
replies from the final destination) while the parcel itself may
continue to make forward progress.

A Parcel Probe can be included either in an ordinary data parcel
or a {TCP,UDP}/IP parcel with destination port set to '9' (discard).
The probe will still contain a valid UDP
Checksum that any intermediate hops as well as the final destination
can use to detect mis-delivery, but the final destination will
discard any parcel data unconditionally.

If the original source receives a positive Parcel Reply, it marks
the path as "parcels supported" and ignores any ordinary ICMP and/or
PTB messages concerning the probe. If the original source instead
receives a negative Parcel Reply or no reply, it marks the path as
"parcels not supported" and may regard any ordinary ICMP and/or PTB
messages concerning the probe (or its contents) as indications of
a possible path limitation.

The original source can therefore send Parcel Probes in the
same IP parcels used to carry real data. The probes will traverse
parcel-capable links joined by routers on the forward path possibly
extending all the way to the destination. If the original source
receives a positive Parcel Reply, it can continue using IP parcels
(while also adjusting its current segment size if necessary).

The original source sends Parcel Probes unidirectionally in the
forward path toward the final destination to elicit a Parcel Reply,
since it will often be the case that IP parcels are supported only
in the forward path and not in the return path. Parcel Probes may be
dropped in the forward path by any node that does not recognize IP
parcels, but Parcel Replys must be packaged to avoid return path
filtering. For this reason, the Jumbo Payload options included in
Parcel Probes are always packaged as IPv4 header options or IPv6
Hop-by-Hop options while Parcel Replys are returned as UDP/IP
encapsulated ICMPv6 PTB messages with a "Parcel Reply" Code
value (see: ).

Original sources send ordinary parcels or discard parcels as
explicit Parcel Probes by setting the Jumbo Payload P flag to '1'
and PMTU to the minimum of the next hop link MTU and (2**24 - 1).
The source then sets Nsegs, Jumbo Payload Length, and {Total,
Payload} Length, then calculates the header and per-segment
checksums the same as for an ordinary parcel. The source finally
sends the Parcel Probe via the outbound IP interface.

According to , IPv4 middleboxes (i.e.,
routers, security gateways, firewalls, etc.) that do not observe this
specification SHOULD drop IPv4 packets that contain option type
'00001011' ("IPv4 Probe MTU") but some might instead either attempt
to implement or ignore the option altogether.
IPv4 middleboxes that observe this specification instead MUST process
the option as an implicit or explicit Parcel Probe as specified below.

According to , IPv6 middleboxes (i.e.,
routers, security gateways, firewalls, etc.) that recognize the IPv6
Jumbo Payload option but do not observe this specification SHOULD
return an ICMPv6 Parameter Problem message (and presumably also drop
the packet) due to validation rules for ordinary jumbograms. IPv6
middleboxes that observe this specification instead MUST process
the option as an implicit or explicit Parcel Probe as specified below.

When a router that observes this specification receives an IPv4
Parcel Probe it first compares Code with 255 and Check with the IP
header TTL; if either value differs, the router MUST drop the probe
and return a negative Parcel Reply (see below). For all other IP
Parcel Probes, if the next hop link is non-parcel-capable the router
compares PMTU with the next hop link MTU and MUST return a positive
Parcel Reply (see below) with MTU set to the minimum value. If the
next hop link configures a sufficiently large MTU, the router then
converts the probe into individual IP packet(s) the same as specified
in and forwards each packet to the next hop;
otherwise, it drops the probe.

If the next hop link both supports parcels and configures an MTU
that is large enough to pass the probe, the router instead compares
the probe PMTU with the next hop link MTU and MUST (re)set PMTU to
the minimum value then forward the probe to the next hop (and for
IPv4 first reset Check to the same value that will appear in the
outgoing IPv4 TTL). If the next hop link supports parcels but
configures an MTU that is too small to pass the probe, the router
resets PMTU (and Check if necessary) then applies parcellation to
break the probe into multiple smaller sub-parcels that can traverse
the link while setting the P flag to '1' only for the first
sub-parcel. If the next hop link supports parcels but configures
an MTU that is too small to pass a singleton sub-parcel of the
probe, the router instead MUST drop the probe and return a
positive Parcel Reply with MTU set to the next hop link MTU.

The final destination may therefore receive one or more individual
IP packets or intact Parcel Probes. If the final destination receives
individual IP packets, it performs any necessary integrity checks,
applies GRO if possible then delivers the (reconstituted) buffer
contents to upper layers which will return one or more upper layer
probe response(s) if necessary. If the final destination receives an
IPv4 Parcel Probe, it first compares Code with 255 and Check with
the IPv4 header TTL; if either value differs, the final destination
MUST drop the probe and return a negative Parcel Reply. Otherwise,
the final destination then MUST return a positive Parcel Reply and
deliver the (reconstituted) buffer contents to upper layers the
same as for an ordinary IP parcel.

When a router or final destination returns a Parcel Reply, it
prepares an ICMPv6 PTB message with Code set to
"Parcel Reply" (see: ) and with
MTU set to either the minimum MTU value for a positive reply or to '0'
for a negative reply. The node then writes its own IP address as the
Parcel Reply source and writes the source of the Parcel Probe as the
Parcel Reply destination (for IPv4 Parcel Probes, the node writes the
Parcel Reply address as an IPv4-Compatible IPv6 address). The node
next copies as much of the leading
portion of the probe/parcel (beginning with the IP header) as possible
into the "packet in error" field without causing the entire Parcel
Reply (beginning with the IPv6 header) to exceed 512 octets in length,
then calculates the ICMPv6 Checksum. Since IPv6 packets cannot traverse
IPv4 paths, and since middleboxes often filter ICMPv6 messages as they
traverse IPv6 paths, the node next wraps the Parcel Reply in UDP/IP
headers of the correct IP version with the IP source and destination
addresses copied from the Parcel Reply and with UDP port numbers set
to the OMNI UDP port number.
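
Two of the steps above lend themselves to a short sketch: limiting the "packet in error" so that the whole Parcel Reply stays within 512 octets, and forming an IPv4-Compatible IPv6 address. The function names and constants below are hypothetical. The sketch also assumes the reply carries only the fixed 40-octet IPv6 header and the 8-octet ICMPv6 PTB header (Type, Code, Checksum, MTU) ahead of the "packet in error", with no extension headers.

```python
import ipaddress

IPV6_HEADER_LEN = 40        # fixed IPv6 header
ICMPV6_PTB_HEADER_LEN = 8   # Type (1), Code (1), Checksum (2), MTU (4)
MAX_REPLY_LEN = 512         # limit stated for the entire Parcel Reply


def packet_in_error(probe: bytes) -> bytes:
    """Return the leading portion of the probe/parcel that fits."""
    room = MAX_REPLY_LEN - IPV6_HEADER_LEN - ICMPV6_PTB_HEADER_LEN
    return probe[:room]


def ipv4_compatible(addr: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low-order 32 bits of an otherwise
    all-zero IPv6 address (the IPv4-Compatible form)."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(addr)))


# Hypothetical inputs: a 1000-octet probe and a documentation address.
excerpt = packet_in_error(bytes(1000))
reply_addr = ipv4_compatible("192.0.2.1")
```

Under these assumptions at most 464 octets of the probe are echoed, which is still ample to carry the IP header and the Identification value that the original source uses for matching.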
In the process, the node either calculates or omits the UDP Checksum
as appropriate and (for IPv4) clears the DF bit. The node finally
sends the prepared Parcel Reply to the original source of the probe.

After sending a Parcel Probe (or an ordinary parcel) the original
source may therefore receive a UDP/IP encapsulated Parcel Reply (see
above) and/or one or more upper layer protocol probe replies. If the
source receives a Parcel Reply, it verifies the checksum and matches
the enclosed PTB message with an original probe/parcel by examining
the Identification echoed in the ICMPv6 "packet in error" containing
the leading portion of the probe. If the Identification does not
match, the source discards the Parcel Reply; otherwise, it continues
to process. If the Parcel Reply MTU is '0', the source marks the path
as "parcels not supported"; otherwise, it marks the path as "parcels
supported" and also records the MTU value as the parcel path MTU (i.e.,
the portion of the path up to and including the node that returned
the Parcel Reply). If the MTU value is 65535 or larger, the MTU
determines the largest whole parcel size that can traverse the
parcel path without subdivision while using any segment size up to
and including the maximum. If the MTU value is smaller than 65535,
the MTU represents both the largest whole parcel size and a maximum
segment size limitation. In both cases, the maximum parcel size
that can traverse the initial portion of the path may be larger
than the maximum segment size that can continue to traverse the
remaining path to the final destination, which can only be
determined through upper layer protocol probes (i.e., either
as individual probe packets or as payloads of the Parcel Probes).

Note: If a router or final destination receives a Parcel Probe but
does not recognize the parcel construct, it drops the probe without
further processing (and may return an ICMP error). The original
source will then consider the probe as lost, but may attempt to
probe again later, e.g., in case the path may have changed.

The {TCP,UDP}/IP header plus each segment of a (multi-segment) IP
parcel includes its own integrity check. This means that IP parcels can
support stronger and more discrete integrity checks for the same amount
of upper layer protocol data compared to an individual IP packet or
jumbogram. The {TCP,UDP} Checksum header integrity check can be verified
at each hop to ensure that parcels with errored headers are detected.
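
Separate header and per-segment checksums can be combined by simple addition because the Internet checksum is a 16-bit ones'-complement sum. The sketch below illustrates only that arithmetic property; it does not reproduce this document's pseudo-header, Sequence Number handling, or complementing rules, and the function names are hypothetical.

```python
def ones_sum(data: bytes) -> int:
    """16-bit ones'-complement sum of data (not complemented)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero octet
    acc = 0
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    while acc >> 16:                 # fold carries back into the low 16 bits
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def ones_add(a: int, b: int) -> int:
    """Ones'-complement addition of two 16-bit partial sums."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def udp_checksum_field(value: int) -> int:
    """UDP reserves '0' for "no checksum", so a computed '0' is sent as 'ffff'."""
    return 0xFFFF if value == 0 else value


# Hypothetical data: an even-length header followed by one segment.
header = bytes(range(20))
segment = b"hello world!"
combined = ones_add(ones_sum(header), ones_sum(segment))
```

Here combined equals ones_sum(header + segment), provided the header has an even length so that the 16-bit word boundaries of the segment are unchanged.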
The per-segment Integrity Block Checksums are set by the source and
verified by the final destination, noting that TCP parcels must
honor the sequence number discipline discussed in .

IP parcels can range in length from as small as only the {TCP,UDP}/IP
headers plus a single Integrity Block Checksum with a non-zero length
segment to as large as the headers plus (256 * (65535 minus headers)) octets.
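
As a worked example of the upper bound just stated, the sketch below evaluates the expression for a hypothetical combined header length. The constant and function names are illustrative, and the 48-octet header length is an assumption chosen only to give concrete numbers.

```python
MAX_SEGMENTS = 256          # an Integrity Block holds up to 256 Checksums
MAX_UNIT = 65535            # the constant 65535 used in the bound above


def max_parcel_len(hdr_len: int) -> int:
    """Headers plus (256 * (65535 minus headers)) octets."""
    return hdr_len + MAX_SEGMENTS * (MAX_UNIT - hdr_len)


# Hypothetical 48-octet {TCP,UDP}/IP header length.
largest = max_parcel_len(48)
fits_in_24_bits = largest <= 2**24 - 1
```

For the assumed 48-octet headers the bound is 16764720 octets, which is below (2**24 - 1).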
Although 32-bit link layer integrity checks provide sufficient protection
for contiguous data blocks up to approximately 9KB, reliance on link-layer
integrity checks may be inadvisable for links with significantly larger
MTUs and may not be possible at all for links such as tunnels over IPv4
that invoke fragmentation. Moreover, the segment contents of a received
parcel may arrive in an incomplete and/or rearranged order with respect
to their original packaging.

Lower layer protocol entities calculate and verify the {TCP,UDP}/IP
Checksum at their layer, since an errored header could result in
mis-delivery to the wrong upper layer protocol entity. If a lower
layer protocol entity on the path detects an incorrect {TCP,UDP}/IP
Checksum it discards the entire IP parcel unless the header(s) can
somehow be repaired.

To support the parcel header checksum calculation, lower layer
protocol entities use modified versions of the {TCP,UDP}/IPv4
"pseudo-header" found in ,
or the {TCP,UDP}/IPv6 "pseudo-header" found in Section 8.1 of
. Note that while the contents of the
two IP protocol version-specific pseudo-headers beyond the address
fields are the same, the order in which the contents are arranged
differs and must be honored according to the specific IP protocol
version as shown in . This allows for maximum
reuse of widely deployed code while ensuring interoperability.

where the following fields appear in both pseudo-headers:

Source Address is the 4-octet IPv4 or 16-octet IPv6 source
address of the prepared parcel.

Destination Address is the 4-octet IPv4 or 16-octet IPv6
destination address of the prepared parcel.

zero encodes the constant value '0'.

Next Header is the IP protocol number corresponding to the upper
layer protocol, i.e., TCP or UDP.

Segment Length is the value that appears in the IPv4 Total
Length or IPv6 Payload Length field of the prepared parcel.

Nsegs is a 1-octet value one less than the number of segments
included, and must contain a number between 0 and 255 (this is
the same value that appears in the Jumbo Payload Option Nsegs
field).

Upper-Layer Packet Length is the 3-octet length of the
{TCP,UDP} header plus data (this value can be derived from
the Jumbo Payload Length by subtracting the IPv4 header length
for IPv4 or IPv6 extension header length for IPv6).

Upper layer protocol entities use socket options to coordinate
per-segment checksum processing with lower layers. If the upper layer
sets a SO_NO_CHECK(TX) socket option, the upper layer is responsible for
supplying per-segment checksums on transmission and the lower layer
forwards the IP parcel to the next hop without further processing;
otherwise, the lower layer supplies the per-segment checksums before
forwarding. If the upper layer sets a SO_NO_CHECK(RX) socket option,
the upper layer is responsible for verifying per-segment checksums on
reception and the lower layer delivers each received parcel body to
the upper layer without further processing; otherwise, the lower
layer verifies the per-segment parcel checksums before delivering.

When the upper layer protocol entity of the source sends a parcel
body to lower layers, it prepends an Integrity Block of (J + 1) 2-octet
Checksum fields and includes a 4-octet Sequence Number field with each
TCP non-first segment. If the SO_NO_CHECK(TX) socket option is set, the
upper layer protocol either calculates each segment checksum and writes
the value into the corresponding Checksum field (and for UDP with '0'
values written as 'ffff') or writes the value '0' to disable checksums
for specific UDP segments. If the SO_NO_CHECK(TX) socket option is
clear, for UDP the upper layer instead writes the value '0' to disable
or any non-zero value to enable checksums for specific segments (for
TCP, the upper layer instead writes any zero or non-zero value).

When the lower layer protocol entity of the source receives the
parcel body from upper layers, if the SO_NO_CHECK(TX) socket option
is set the lower layer appends the {TCP,UDP}/IP headers and forwards
the parcel to the next hop without further processing. If the
SO_NO_CHECK(TX) socket option is clear, the lower layer instead
calculates the checksum for each TCP segment (or each UDP segment
with a non-zero value in the corresponding Integrity Block Checksum
field) and overwrites the calculated value into the Checksum field
(and for UDP with '0' values written as 'ffff').

When the lower layer protocol entity of the destination receives a
parcel from the source, if the SO_NO_CHECK(RX) socket option is set the
lower layer delivers the parcel body to the upper layer without further
processing, and the upper layer is responsible for per-segment checksum
verification. If the SO_NO_CHECK(RX) socket option is clear, the lower
layer instead verifies the checksum for each TCP segment (or each
UDP segment with a non-zero value in the corresponding Integrity Block
Checksum field) and marks a corresponding field for the segment in an
ancillary data structure as either "correct" or "incorrect". (For UDP,
if the Checksum is '0' the lower layer protocol unconditionally marks
the segment as "correct".) The lower layer then delivers both the parcel
body (beginning with the Integrity Block) and ancillary data to the
upper layer which can then determine which segments have
correct/incorrect checksums.

Note: The Integrity Block itself is intentionally omitted from the IP
Parcel {TCP,UDP} header checksum calculation. This permits destinations
to accept as many intact segments as possible from received parcels with
checksum block bit errors, whereas the entire parcel would need to be
discarded if the header checksum also covered the Integrity Block.

True IPv6 jumbograms are distinguished from IPv6 parcels by
including a zero IPv6 Payload Length and an IPv6 Hop-by-Hop
Option with type '11001110' and length '00000100'. The Jumbo
Payload option format and all aspects of IPv6 jumbogram processing
are exactly as specified in .

True IPv4 jumbograms are distinguished from IPv4 parcels by
including a zero IPv4 Total Length and an IPv4 option with type
'00001011' and length '00000110'. The Jumbo Payload option format
and all aspects of IPv4 jumbogram processing are exactly the same
as for IPv6 jumbograms.

This specification augments IP jumbograms by also providing a
Jumbo Path Qualification function using the mechanisms specified
in . The function employs a "Jumbo Probe"
formed exactly the same as for Parcel Probes, but with Nsegs/Jumbo
Payload Length set to '0' and with the final 4 octets converted to
a single 32-bit PMTU field. The Jumbo Probe also sets the IP {Total,
Payload} length fields to '0', sets {Protocol, Next Header} to
{TCP,UDP}, sets the {TCP,UDP} port to '9' (discard) and includes
no octets beyond the {TCP,UDP} header. The purpose of the Jumbo
Probe is to determine whether the entire path from the source to
the destination is jumbo-capable (i.e., one in which all links
recognize jumbograms and configure an MTU larger than 65535
octets) as well as to determine the jumbo path MTU.

The source sets the Jumbo Probe PMTU to the full 32-bit MTU
of the (jumbo-capable) next hop link, (and for IPv4 sets Code to
255 and Check to the next hop TTL) then calculates the UDP Checksum
and sends the probe via the link toward the final destination. At
each IPv4 forwarding hop, the router examines Code and Check and
returns a negative "Jumbo Reply" (i.e., prepared the same as a
Parcel Reply) if either value is incorrect. Otherwise, if the
next hop link is jumbo-capable the router compares PMTU to the
next hop link MTU, resets PMTU to the minimum value (and for
IPv4 sets Check to the next hop TTL) then silently forwards the
probe to the next hop. If the next hop link is not jumbo-capable,
the router instead drops the probe and returns a negative
Jumbo Reply.

If the Jumbo Probe encounters an OMNI link, the OAL source can
either drop the probe and return a negative Jumbo Reply or forward
the probe further toward the OAL destination using adaptation layer
encapsulation. If the OAL source already knows the OAL path MTU
for this OAL destination, it can encapsulate and forward the Jumbo
Probe with PMTU set to the minimum of itself and the known value
(minus the adaptation layer header size), and without adding any
padding octets. If the OAL path MTU is unknown, the OAL source
can instead encapsulate the Jumbo Probe in an adaptation layer
IPv6 header with a Jumbo Payload option and with NULL padding
octets added beyond the end of the encapsulated Jumbo Probe to
form an adaptation layer jumbogram no larger than the minimum
of PMTU and (2**24 - 1) octets (minus the adaptation layer header
size). The OAL source then writes this size into the Jumbo Probe
PMTU field and forwards the newly-created adaptation layer
jumbogram toward the OAL destination, where it may be lost
due to a link restriction. If the jumbogram somehow traverses
the path, the OAL destination then removes the adaptation layer
encapsulation, discards the padding, then forwards the probe
onward toward the final destination (with each hop reducing
PMTU if necessary).

If the Jumbo Probe reaches the final destination, the final
destination returns a positive Jumbo Reply with the PMTU set to
the maximum-sized jumbogram that can transit the path. (Note that
the jumbo probing process is conducted independently of any parcel
probing, and the two processes may yield very different results.)

Note: if the OAL path MTU is unknown but the OAL source can in
some way determine that the path is capable of transiting very large
jumbograms, it MAY encapsulate a Jumbo Probe to form an adaptation
layer jumbogram larger than (2**24 - 1) octets with the understanding
that the time for the probe to transit the path may be considerable.

Common widely-deployed implementations include services such as TCP
Segmentation Offload (TSO) and Generic Segmentation/Receive Offload
(GSO/GRO). These services are robust and have been
shown to improve performance in many instances.

UDP/IPv4 parcels have been implemented in the linux-5.10.67 kernel and
ION-DTN ion-open-source-4.1.0 source distributions. The patch distribution
is found at: "https://github.com/fltemplin/ip-parcels.git".

Performance analysis with a single-threaded receiver has shown that
including increasing numbers of segments in a single parcel produces
measurable performance gains over fewer numbers of segments due to more
efficient packaging and reduced system calls/interrupts. For example,
sending parcels with 30 2000-octet segments shows a 48% performance
increase in comparison with ordinary IP packets with a single
2000-octet segment.

Since performance is strongly bounded by single-segment receiver
processing time (with larger segments producing dramatic performance
increases), it is expected that parcels with increasing numbers of
segments will provide a performance multiplier on multi-threaded
receivers in parallel processing environments.

The IANA is instructed to change the "MTUP - MTU Probe" entry in the
'ip option numbers' registry to the "JUMBO - IPv4 Jumbo Payload" option.
The Copy and Class fields must both be set to 0, and the Number and
Value fields must both be set to '11'. The reference must be changed to
this document [RFCXXXX].

In the control plane, original sources match the Identification
values in received Parcel Replys with their corresponding Parcels
or Parcel Probes. If the values match, the reply is likely authentic.
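
The matching step described above can be illustrated with a minimal sketch. The class and method names are hypothetical and are not defined by this document; the sketch also assumes that a matched Identification is consumed so that a repeated reply is not accepted twice, which is a choice of the example rather than a stated requirement.

```python
class PendingProbes:
    """Tracks Identification values of outstanding Parcels or Parcel Probes.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._pending = set()

    def sent(self, identification: int) -> None:
        """Record the Identification of a Parcel or Parcel Probe just sent."""
        self._pending.add(identification)

    def accept_reply(self, identification: int) -> bool:
        """Return True if a Parcel Reply matches an outstanding Identification.

        A match is removed from the pending set (assumption of this sketch),
        so a duplicate reply with the same value is rejected.
        """
        if identification in self._pending:
            self._pending.remove(identification)
            return True
        return False
```

A match only indicates that the reply is likely authentic; it does not replace message authentication.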
In environments where stronger authentication is necessary, nodes
that send Parcel Replys can apply the message authentication
services specified for AERO/OMNI.

In the data plane, multi-layer security solutions may be needed
to ensure confidentiality, integrity and availability. Since parcels
are defined only for TCP and UDP, IP layer securing services such as
IPsec-AH/ESP cannot be applied directly to
parcels, although they can certainly be used at lower layers such as
for transmission of parcels over VPNs and/or OMNI link secured
spanning trees. Since the IP layer does not manipulate segments
exchanged with upper layers, parcels do not interfere with
transport- or higher-layer security services such as (D)TLS/SSL
which may provide greater flexibility in
some environments.

Further security considerations related to IP parcels are found
in the AERO/OMNI specifications.

This work was inspired by ongoing AERO/OMNI/DTN investigations. The
concepts were further motivated through discussions on the IETF intarea
and 6man lists as well as with Boeing colleagues.

A considerable body of work over recent years has produced useful
"segmentation offload" facilities available in widely-deployed
implementations.

In the 1980's, the early days of Internetworking gave way to a need
for improved performance with the advent of networked storage, diskless
workstations, etc. The need drove a concerted effort in the industry to
pursue performance optimizations at all layers. All who took part in
efforts to advance the Internet to the current high-performance state
we enjoy today are acknowledged.

Accelerating UDP packet transmission for QUIC,
https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

BIG TCP, Netdev 0x15 Conference (virtual),
https://netdevconf.info/0x15/session.html?BIG-TCP

TCP Extensions for High Performance are specified in , which updates earlier work that began in the late
1980's and early 1990's. These efforts determined that the TCP 16-bit
Window was too small to accommodate sustained transmission at high
data rates and devised a TCP Window Scale option to allow window
sizes up to 2**30. The work also defined a Timestamp option used
for round-trip time measurements and as a Protection Against Wrapped
Sequences (PAWS) at high data rates. TCP users of IP parcels are
strongly encouraged to adopt these measures.

Since TCP/IP parcels only include control bits for the first
segment ("segment(0)"), nodes must regard all other segments of the
same parcel as data segments. When a node breaks a TCP/IP parcel out
into individual packets or sub-parcels, only the first packet/sub-parcel
contains the original segment(0) and therefore only its TCP header
retains the control bit settings from the original parcel TCP header.
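
The control bit handling during breakout can be sketched as follows. This is a minimal illustration, assuming the control bits are held as an integer bit field and that every piece after the first is treated as a pure data segment with all control bits cleared; the function name is hypothetical.

```python
from typing import List


def control_bits_for_breakout(parcel_flags: int, num_pieces: int) -> List[int]:
    """Return the TCP control bits for each packet/sub-parcel of a breakout.

    Only the first piece (the one containing segment(0)) keeps the control
    bits of the original parcel TCP header; every later piece is treated as
    data and gets control bits of 0.
    """
    if num_pieces < 1:
        raise ValueError("a breakout must produce at least one piece")
    return [parcel_flags] + [0] * (num_pieces - 1)
```

For example, a parcel whose header carries PSH and ACK (0x18) and is broken into three packets yields control bits 0x18, 0 and 0.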
If the original TCP header included TCP options such as Maximum Segment
Size (MSS), Window Scale (WS) and/or Timestamp, the node copies those
same options into the options section of the new TCP header.

For all other packets/sub-parcels, the node sets all TCP header
control bits to '0' as data segment(s). Then, if the original parcel
contained a Timestamp option, the node copies the Timestamp option
into the options section of the new TCP header. Appendix A of
provides implementation guidelines for
the Timestamp option layout.

Appendix A of also discusses Interactions
with the TCP Urgent Pointer as follows: "if the Urgent Pointer
points beyond the end of the TCP data in the current segment, then
the user will remain in urgent mode until the next TCP segment arrives.
That segment will update the Urgent Pointer to a new offset, and the
user will never have left urgent mode". In the case of IP parcels,
however, it will often be the case that the "next TCP segment" is
included in the same (sub-)parcel as the segment that contained
the urgent pointer such that the urgent pointer can be updated
immediately.

Finally, if the parcel contains more than 65535 octets of data
(i.e., spread across multiple segments), then the Urgent Pointer
can be regarded in the same manner as for jumbograms as described
in Section 5.2 of .

Both historic and modern-day data links configure Maximum Transmission
Units (MTUs) that are far smaller than what is desired for future IP
parcel transmission. When the first Ethernet data links were deployed
many decades ago, their 1500 octet MTU set a strong precedent that was
widely adopted. This same size now appears as the predominant MTU limit
for most paths in the Internet today, although modern link deployments
with MTUs as large as 9KB have begun to emerge.

In the late 1980's, the Fiber Distributed Data Interface (FDDI)
standard defined a new link type with MTU slightly larger than 4500
octets. The goal of the larger MTU was to increase performance by a
factor of 10 over the ubiquitous 10Mbps and 1500-octet MTU Ethernet
technologies of the time. Many factors including a failure to harmonize
MTU diversity and an Ethernet performance increase to 100Mbps led to
poor FDDI market reception. In the next decade, the 1990's saw new
initiatives including ATM/AAL5 (9KB MTU) and HiPPI (64KB MTU) which
offered high-speed data link alternatives with larger MTUs but again
the inability to harmonize diversity derailed their momentum. By the
end of the 1990s and leading into the 2000's, emergence of the 1Gbps,
10Gbps and even faster Ethernet performance levels seen today has
obscured the fact that the modern Internet of the 21st century is
still operating with 20th century MTUs!

To bridge this gap, increased OMNI interface deployment in the
near future will provide a virtual link type that can
pass IP parcels over paths that traverse traditional data links with
small MTUs. Performance analysis has shown that (single-threaded)
receive-side performance is bounded by upper layer protocol segment
size, with performance increasing in direct proportion with segment
size. Experiments have also shown measurable (single-threaded) performance
increases from including larger numbers of segments per parcel, with
steady gains as the number of segments grows. Beyond this, parallel
receive-side processing will provide performance multiplier benefits
since the multiple segments that arrive in a single parcel can be
processed simultaneously instead of serially.

In addition to the clear near-term benefits, IP parcels will increase
performance to new levels as future parcel-capable links with very
large MTUs begin to emerge. These links will provide MTUs far in excess
of 64KB to as large as 16MB. With such large MTUs, the traditional CRC-32
(or even CRC-64) error checking with errored packet discard discipline
will no longer apply for large parcels. Instead, parcels larger than a
link-specific threshold will include Forward Error Correction (FEC)
codes so that errored parcels can be repaired at the receiver's data
link layer then delivered to upper layers rather than being discarded
and triggering retransmission of large amounts of data. Even if the
FEC repairs are incomplete or imperfect, all parcels can still be
delivered to upper layers where the individual segment checksums
will detect and discard any damaged data not repaired by lower layers.

These new "super-links" will appear mostly in the network edges
(e.g., high-performance data centers) and not as often in the middle
of the Internet. (However, some space-domain links that
extend over enormous distances may also benefit.) For this reason, a
common use case will include parcel-capable super-links in the edge
networks of both parties of an end-to-end session with an OMNI link
connecting the two over wide area Internetworks. Medium- to moderately
large-sized IP parcels over OMNI links will already provide considerable
performance benefits for wide-area end-to-end communications while truly
large IP parcels over super-links can provide boundless increases for
localized bulk transfers in edge networks or for deep space long haul
transmissions. The ability to grow and adapt without practical bound
enabled by IP parcels will inevitably encourage new data link
development leading to future innovations in new markets that will
revolutionize the Internet.

Until these new links begin to emerge, however, parcels will already
provide a tremendous benefit to end systems by allowing applications to
send and receive segment buffers larger than 65535 octets in a single
system call. By expanding the operating system call data copy
limit from its current 16-bit length to a 32-bit length, applications
will be able to send and receive maximum-length parcel buffers even if
lower layers need to break them into multiple parcels to fit within the
underlying interface MTU. For applications such as the Delay Tolerant
Networking (DTN) Bundle Protocol, this will
allow applications to send and receive entire large upper layer
protocol constructs (such as DTN bundles) in a single system call.

<< RFC Editor - remove prior to publication >>

Changes from earlier versions:

Submit for Intarea Standards Track RFC Publication.