The QUIC Loss Bits
Orange Labs
2 av P.Marzin
Lannion
France
alexandre.ferrieux@orange.com
Orange Labs
2 av P.Marzin
Lannion
France
isabelle.hamchaoui@orange.com
Akamai Technologies
150 Broadway
Cambridge
MA
1122
USA
ilubashe@akamai.com
Loss Signaling
QUIC
Internet-Draft
This draft adapts the general technique described in
draft-ferrieuxhamchaoui-tsvwg-lossbits for QUIC using reserved bits in
the QUIC v1 header. It describes a method that employs two bits to allow
endpoints to signal packet loss in a way that can be used by network
devices to measure and locate the source of the loss.
Packet loss is a hard and pervasive problem of day-to-day network
operation, and proactively detecting, measuring, and locating it is
crucial to maintaining high QoS and timely resolution of crippling
end-to-end throughput issues. To this end, in a TCP-dominated
world, network operators have been heavily relying on information
present in the clear in TCP headers: sequence and acknowledgment
numbers, and SACK when enabled. These allow for quantitative
estimation of packet loss by passive on-path
observation. Additionally, the lossy segment (upstream or downstream
from the observation point) can be quickly identified by moving the
passive observer around.
With QUIC, the equivalent transport headers are encrypted and passive
packet loss observation is not possible, as described in “The Impact of
Transport Header Confidentiality on Network Operation and Evolution of
the Internet”.
The network could route QUIC traffic differently from TCP traffic, and
the fraction of Internet traffic delivered using QUIC is increasing
every year. Therefore, it is imperative to measure packet loss
experienced by QUIC users directly instead of relying on measuring TCP
loss between similar endpoints.
Since explicit path signals are preferred by “Transport Protocol Path
Signals”, this document proposes adding two explicit loss bits to the
clear portion
of short headers to restore network operators’ ability to maintain
high QoS for QUIC users.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this
document are to be interpreted as described in “Key words for use in
RFCs to Indicate Requirement Levels”.
The proposal introduces two bits that are to be present in packets
with a short header. Therefore, only loss of short header packets is
reported using loss bits. Whenever this specification refers to
packets, it is referring only to packets with short headers.
Q: The “sQuare signal” bit is toggled every N outgoing packets, as
explained below.
L: The “Loss event” bit is set to 0 or 1 according to the Unreported
Loss counter, as explained below.
Each endpoint maintains appropriate counters independently and
separately for each connection 4-tuple and destination Connection ID.
The sQuare Value is initialized to the Initial Q Value (0 or 1) and is
reflected in the Q bit of every outgoing packet. The sQuare Value is
inverted after sending every N packets (Q Period is 2*N), where N is a
parameter of the method, discussed below.
Observation points can estimate the upstream losses by counting the
number of packets during a half period of the square signal, as
described below.
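The sender-side behavior of the sQuare signal can be sketched in Python as follows. This is an illustration only: the class and method names (`SquareSignal`, `next_q_bit`) are hypothetical and not part of the proposal, and the default N=64 and the power-of-two check reflect the parameter discussion later in this document.

```python
class SquareSignal:
    """Sender-side sQuare Value for one connection (illustrative sketch)."""

    def __init__(self, n: int = 64, initial_q: int = 0) -> None:
        # N must be a positive power of two (see the discussion of N).
        if n <= 0 or n & (n - 1):
            raise ValueError("N must be a positive power of two")
        if initial_q not in (0, 1):
            raise ValueError("Initial Q Value must be 0 or 1")
        self.n = n
        self.q = initial_q
        self.sent_in_block = 0

    def next_q_bit(self) -> int:
        """Return the Q bit for the next outgoing short header packet."""
        q = self.q
        self.sent_in_block += 1
        if self.sent_in_block == self.n:
            # A block of N packets is complete: invert the sQuare Value.
            self.q ^= 1
            self.sent_in_block = 0
        return q


signal = SquareSignal(n=4)
bits = [signal.next_q_bit() for _ in range(10)]
```

With N=4, the ten packets above carry four Q bits of 0, four of 1, and then two of 0, so the Q Period is 2*N = 8 packets.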
The Unreported Loss counter is initialized to 0, and the L bit of
every outgoing packet indicates whether the Unreported Loss counter is
positive (L=1 if the counter is positive, and L=0 otherwise).
The value of the Unreported Loss counter is decremented every time a
packet with L=1 is sent.
The value of the Unreported Loss counter is incremented for every
packet that the protocol declares lost, using QUIC’s existing loss
detection machinery.
Observation points can estimate the end-to-end loss, as determined by
the upstream endpoint’s loss detection machinery, by counting packets
in this direction with an L bit equal to 1, as described below.
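The Unreported Loss counter rules above can be sketched as a small Python class. The names (`UnreportedLoss`, `on_packet_lost`, `next_l_bit`) are hypothetical; the sketch assumes the QUIC stack calls `on_packet_lost` once for each packet its loss detection declares lost.

```python
class UnreportedLoss:
    """Sender-side Unreported Loss counter for one connection (sketch)."""

    def __init__(self) -> None:
        self.counter = 0  # initialized to 0

    def on_packet_lost(self) -> None:
        """Called for every packet that loss detection declares lost."""
        self.counter += 1

    def next_l_bit(self) -> int:
        """Return the L bit for the next outgoing short header packet."""
        if self.counter > 0:
            # Sending a packet with L=1 reports one loss.
            self.counter -= 1
            return 1
        return 0


loss = UnreportedLoss()
loss.on_packet_lost()
loss.on_packet_lost()
l_bits = [loss.next_l_bit() for _ in range(4)]
```

After two declared losses, the next two outgoing packets carry L=1 and later packets carry L=0, so the number of L=1 packets matches the number of packets declared lost.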
There are three sources of observable loss:
upstream loss - loss between the sender and the observation point
downstream loss - loss between the observation point and the
destination
observer loss - loss by the observer itself that does not cause
downstream loss
The upstream and downstream loss together constitute end-to-end loss.
The Q and L bits allow detection and measurement of the types of loss
listed above.
The Loss Event bit allows an observer to calculate the end-to-end loss
rate by counting packets with L bit value of 0 and 1 for a given
connection. The end-to-end loss rate is the fraction of packets with
L=1.
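As a small illustration, an observer could compute this fraction as in the Python sketch below. The function name is hypothetical, and the input is assumed to be the L bits of the short header packets of a single connection in one direction.

```python
from typing import Iterable, Optional


def end_to_end_loss_rate(l_bits: Iterable[int]) -> Optional[float]:
    """Fraction of observed packets with L=1, or None if nothing was seen."""
    total = 0
    marked = 0
    for bit in l_bits:
        total += 1
        if bit:
            marked += 1
    if total == 0:
        return None
    return marked / total


rate = end_to_end_loss_rate([0, 0, 1, 0])  # one L=1 packet out of four
```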
The simplifying assumption here is that upstream loss affects packets
with L=0 and L=1 equally. This may be an oversimplification if some
loss is caused by tail drop in a network device: if the sender’s
congestion controller reduces the packet send rate after loss, packets
with L=1 may be sent after enough of a delay that they have a greater
chance of arriving at the observer.
Blocks of N (half of Q Period) consecutive packets are sent with the
same value of the Q bit, followed by another block of N packets with
the inverted value of the Q bit. Hence, knowing the value of N, an
on-path observer can estimate the amount of loss after observing at
least N packets. The upstream loss rate is 1 - (A / N), where A is the
average number of packets observed in a block of packets with the same
Q value.
The observer needs to be able to tolerate packet reordering that can
blur the edges of the square signal.
The observer also needs to differentiate packets as belonging to
different connections, since they use independent counters.
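The upstream loss estimate can be sketched in Python as follows. This is a deliberately simplified illustration with a hypothetical function name. It assumes the Q bits come from a single connection, that packets arrive in order (so it does not implement the reordering tolerance required above), and that no entire block is lost, since that would merge two neighboring blocks. The first and last runs are discarded because the observation may start or stop in the middle of a block.

```python
from itertools import groupby
from typing import Optional, Sequence


def upstream_loss_rate(q_bits: Sequence[int], n: int) -> Optional[float]:
    """Estimate upstream loss as 1 - (average block length / N)."""
    # Lengths of consecutive runs of identical Q values.
    runs = [len(list(group)) for _, group in groupby(q_bits)]
    # Drop the first and last runs: they may be truncated by the capture.
    complete = runs[1:-1]
    if not complete:
        return None
    average = sum(complete) / len(complete)
    return max(0.0, 1.0 - average / n)


# Observed blocks of 3, 4, 3 and 4 packets between two partial blocks, N=4.
observed = [1] * 2 + [0] * 3 + [1] * 4 + [0] * 3 + [1] * 4 + [0] * 1
rate = upstream_loss_rate(observed, n=4)
```

In this example the complete blocks average 3.5 packets, so the estimated upstream loss rate is 1 - 3.5/4 = 0.125.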
The choice of N strikes a compromise: the observation could become too
unreliable in case of packet reordering and loss if N is too small;
and when N is too large, short connections may not yield a useful
upstream loss measurement.
To leave some room for adaptation, we only constrain the sender to
select an N that is (1) constant for a given connection and (2) equal
to a power of two. The latter allows on-path observers to derive N
after a few periods. It is thus also acceptable for a simple
implementation to choose a global constant; N=64 has been extensively
tried in large-scale field tests and yielded good results.
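This document does not specify how an observer derives N. One possible heuristic, shown here purely as an assumption, relies on the fact that loss can only shorten a block: the observer rounds the longest observed block up to the next power of two. The sketch assumes in-order arrival, that no whole block was lost (which would merge two blocks into one long run), and that at least one observed block lost fewer than half of its packets. The function name is hypothetical.

```python
from typing import Sequence


def estimate_n(block_lengths: Sequence[int]) -> int:
    """Round the longest observed block length up to a power of two."""
    if not block_lengths:
        raise ValueError("need at least one complete block")
    longest = max(block_lengths)
    n = 1
    while n < longest:
        n *= 2
    return n


n_estimate = estimate_n([61, 64, 59, 63])  # blocks seen by the observer
```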
Upstream loss is calculated by observing the actual packets that did
not suffer the upstream loss. End-to-end loss, however, is calculated
by observing subsequent packets after the sender’s protocol detected
the loss. Hence, end-to-end loss is generally observed with a delay
of between 1 RTT (loss declared due to multiple duplicate
acknowledgments) and 1 RTO (loss declared due to a timeout) relative
to the upstream loss.
The connection RTT can sometimes be estimated by timing protocol
handshake messages. This RTT estimate can be greatly improved by
observing a dedicated protocol mechanism for conveying RTT
information, such as the Latency Spin bit of QUIC.
Whenever the observer needs to perform a computation that uses both
upstream and end-to-end loss rate measurements, it SHOULD pair each
end-to-end loss rate measurement with the upstream loss rate measured
approximately 1 RTT earlier. If the observer is unable to estimate the
RTT of the connection, it should accumulate loss measurements over time
periods of at least 4 times the typical RTT for the observed
connections.
If the calculated upstream loss rate exceeds the calculated end-to-end
loss rate, then either the Q Period is too short for the amount of
packet reordering or there is observer loss, described below. If this
happens, the observer SHOULD adjust the calculated upstream loss rate
to match the end-to-end loss rate.
Because downstream loss affects only those packets that did not suffer
upstream loss, the end-to-end loss rate (e) relates to the upstream
loss rate (u) and downstream loss rate (d) as
(1-u)(1-d)=1-e. Hence, d=(e-u)/(1-u).
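The relation above, together with the adjustment applied when the measured upstream loss rate exceeds the end-to-end loss rate, can be sketched in Python. The function name is hypothetical, and both inputs are assumed to be rates in the range 0 to 1 that were measured over matching time periods.

```python
from typing import Optional


def downstream_loss_rate(u: float, e: float) -> Optional[float]:
    """Return d = (e - u) / (1 - u), or None when it is undefined."""
    if not (0.0 <= u <= 1.0 and 0.0 <= e <= 1.0):
        raise ValueError("loss rates must be between 0 and 1")
    # If upstream loss exceeds end-to-end loss, adjust it down to match.
    u = min(u, e)
    if u == 1.0:
        return None  # every packet was lost upstream; d cannot be derived
    return (e - u) / (1.0 - u)


d = downstream_loss_rate(u=0.1, e=0.19)
```

With u=0.1 and e=0.19, the result is d=0.1 (up to floating-point rounding), which agrees with (1-0.1)(1-0.1) = 0.81 = 1-0.19.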
A typical deployment of a passive observation system includes a
network tap device that mirrors network packets of interest to a
device that performs analysis and measurement on the mirrored
packets. The observer loss is the loss that occurs on the mirror path.
Observer loss affects upstream loss rate measurement since it causes
the observer to account for fewer packets in a block of identical Q
bit values, as described above. The end-to-end loss rate
measurement, however, is unaffected by the observer loss, since it is
a measurement of the fraction of packets with the L bit set, and
the observer loss would affect all packets equally.
The need to adjust the upstream loss rate down to match the end-to-end
loss rate, as described above, is a strong indication of observer
loss, whose magnitude is between the amount of such adjustment and the
entirety of the measured upstream loss.
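The bounds on observer loss stated above can be written as a short Python sketch. The function name is hypothetical. It returns no bounds when the measured upstream loss rate does not exceed the end-to-end loss rate, because in that case no adjustment was needed and the measurements give no indication of observer loss.

```python
from typing import Optional, Tuple


def observer_loss_bounds(u_measured: float,
                         e: float) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) bounds on the observer loss rate, or None."""
    if u_measured <= e:
        return None  # no downward adjustment was required
    adjustment = u_measured - e
    # Observer loss lies between the adjustment and all measured upstream loss.
    return (adjustment, u_measured)


bounds = observer_loss_bounds(u_measured=0.05, e=0.02)
```

Here the observer loss rate is estimated to lie between roughly 0.03 and 0.05.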
Accurate loss information is not critical to the operation of any
protocol, though its presence for a sufficient number of connections
is important for the operation of the networks.
The loss bits are amenable to “greasing” described in “Applying GREASE
to TLS Extensibility”, if the protocol designers are not ready to
dedicate (and ossify) bits used for loss reporting to this function.
The greasing could be accomplished similarly to the Latency Spin bit
greasing in QUIC. Namely, implementations could decide that a
fraction of connections should not encode loss information in the loss
bits and, instead, the bits would be set to arbitrary values. The
observers would need to be ready to ignore connections whose loss
information resembles noise more than the expected signal.
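One way an implementation might grease the loss bits is sketched below in Python. This is an assumption for illustration, not a mechanism defined by this document: the class name, the `grease_fraction` parameter, and the choice of uniformly random bit values are all hypothetical.

```python
import random
from typing import Optional, Tuple


class GreasedLossBits:
    """Decides once per connection whether its loss bits carry noise."""

    def __init__(self, grease_fraction: float,
                 rng: Optional[random.Random] = None) -> None:
        self.rng = rng if rng is not None else random.Random()
        # random() is in [0, 1), so 0.0 never greases and 1.0 always does.
        self.greased = self.rng.random() < grease_fraction

    def bits(self, q: int, l: int) -> Tuple[int, int]:
        """Return the (Q, L) bits to put on the wire for one packet."""
        if self.greased:
            return (self.rng.getrandbits(1), self.rng.getrandbits(1))
        return (q, l)


connection = GreasedLossBits(grease_fraction=0.0)
wire_bits = connection.bits(q=1, l=0)  # not greased: bits pass through
```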
Passive loss observation has been a part of network operations for
a long time, so exposing loss information to the network does not add
new security concerns.
Guarding users’ privacy is an important goal for modern protocols and
protocol extensions. While an explicit loss signal, a preferred way to
share loss information per “Transport Protocol Path Signals”, helps
to minimize unintentional exposure of additional information,
implementations of loss reporting must ensure that loss information
does not compromise the protocol’s privacy goals.
For example, QUIC allows changing Connection IDs in the
middle of a connection to reduce the likelihood of a passive observer
linking old and new subflows to the same device. A QUIC implementation
would need to reset all counters when it changes the Connection ID used
for outgoing packets. It would also need to avoid incrementing the
Unreported Loss counter for loss of packets sent with a different
Connection ID.
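The two Connection ID rules above can be sketched in Python as follows. This is a minimal illustration with hypothetical names (`LossBitsState`, `on_send`, `on_loss`). It assumes the QUIC stack knows which destination Connection ID each lost packet was sent with, and it resets the sQuare Value to an Initial Q Value of 0.

```python
from typing import Optional, Tuple


class LossBitsState:
    """Per-connection loss bit state that is reset on Connection ID change."""

    def __init__(self, n: int = 64) -> None:
        self.n = n
        self.current_cid: Optional[bytes] = None
        self._reset()

    def _reset(self) -> None:
        self.q = 0
        self.sent_in_block = 0
        self.unreported_loss = 0

    def on_send(self, cid: bytes) -> Tuple[int, int]:
        """Return the (Q, L) bits for a packet sent with Connection ID cid."""
        if cid != self.current_cid:
            # New Connection ID for outgoing packets: reset all counters.
            self.current_cid = cid
            self._reset()
        l_bit = 1 if self.unreported_loss > 0 else 0
        self.unreported_loss -= l_bit
        q_bit = self.q
        self.sent_in_block += 1
        if self.sent_in_block == self.n:
            self.q ^= 1
            self.sent_in_block = 0
        return (q_bit, l_bit)

    def on_loss(self, cid_of_lost_packet: bytes) -> None:
        """Count a loss only if the packet used the current Connection ID."""
        if cid_of_lost_packet == self.current_cid:
            self.unreported_loss += 1


state = LossBitsState(n=2)
first = state.on_send(b"a")
state.on_loss(b"a")
second = state.on_send(b"a")   # reports the loss with L=1
third = state.on_send(b"b")    # Connection ID changed: counters reset
state.on_loss(b"a")            # ignored: sent with a different Connection ID
fourth = state.on_send(b"b")
```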
This document makes no request of IANA.
The sQuare Bit was originally specified by Kazuho Oku in early
proposals for loss measurement, and is an instance of “alternate
marking” as defined in “Alternate-Marking Method for Passive and Hybrid
Performance Monitoring”.
Transport Protocol Path Signals
This document discusses the nature of signals seen by on-path elements examining transport protocols, contrasting implicit and explicit signals. For example, TCP's state machine uses a series of well-known messages that are exchanged in the clear. Because these are visible to network elements on the path between the two nodes setting up the transport connection, they are often used as signals by those network elements. In transports that do not exchange these messages in the clear, on-path network elements lack those signals. Often, the removal of those signals is intended by those moving the messages to confidential channels. Where the endpoints desire that network elements along the path receive these signals, this document recommends explicit signals be used.
Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
Alternate-Marking Method for Passive and Hybrid Performance Monitoring
This document describes a method to perform packet loss, delay, and jitter measurements on live traffic. This method is based on an Alternate-Marking (coloring) technique. A report is provided in order to explain an example and show the method applicability. This technology can be applied in various situations, as detailed in this document, and could be considered Passive or Hybrid depending on the application.
The Impact of Transport Header Confidentiality on Network Operation and Evolution of the Internet
This document describes some implications of applying end-to-end encryption at the transport layer. It first identifies in-network uses of transport layer header information. Then, it reviews some implications of developing end-to-end transport protocols that use encryption to provide confidentiality of the transport protocol headers, or that use authentication to protect the integrity of transport header information. Since measurement and analysis of the impact of network characteristics on transport protocols has been important to the design of current transports, it also considers the impact of transport encryption on transport and application evolution.
QUIC: A UDP-Based Multiplexed and Secure Transport
This document defines the core of the QUIC transport protocol. Accompanying documents describe QUIC's loss detection and congestion control and the use of TLS for key negotiation. Note to Readers: Discussion of this draft takes place on the QUIC working group mailing list (quic@ietf.org), which is archived at <https://mailarchive.ietf.org/arch/search/?email_list=quic>. Working Group information can be found at <https://github.com/quicwg>; source code and issues list for this draft can be found at <https://github.com/quicwg/base-drafts/labels/-transport>.
Applying GREASE to TLS Extensibility
This document describes GREASE (Generate Random Extensions And Sustain Extensibility), a mechanism to prevent extensibility failures in the TLS ecosystem. It reserves a set of TLS protocol values that may be advertised to ensure peers correctly handle unknown values.
Application-Layer Traffic Optimization (ALTO) Protocol
Applications using the Internet already have access to some topology information of Internet Service Provider (ISP) networks. For example, views to Internet routing tables at Looking Glass servers are available and can be practically downloaded to many network application clients. What is missing is knowledge of the underlying network topologies from the point of view of ISPs. In other words, what an ISP prefers in terms of traffic optimization -- and a way to distribute it. The Application-Layer Traffic Optimization (ALTO) services defined in this document provide network information (e.g., basic network location structure and preferences of network paths) with the goal of modifying network resource consumption patterns while maintaining or improving application performance. The basic information of ALTO is based on abstract maps of a network. These maps provide a simplified view, yet enough information about a network for applications to effectively utilize them. Additional services are built on top of the maps. This document describes a protocol implementing the ALTO services. Although the ALTO services would primarily be provided by ISPs, other entities, such as content service providers, could also provide ALTO services. Applications that could use the ALTO services are those that have a choice to which end points to connect. Examples of such applications are peer-to-peer (P2P) and content delivery networks.