QUIC                                                         B. Trammell
Internet-Draft                                                ETH Zurich
Intended status: Informational                           August 20, 2018
Expires: February 21, 2019


         Why do we need passive measurement of round trip time?
                   draft-trammell-why-measure-rtt-00

Abstract

   This document describes the utility of passive two-way latency
   measurement, both for the generation of latency metrics and for
   other measurement tasks when passive latency measurement is the only
   facility available for measurement.  It additionally discusses
   other metrics derivable from the transport-independent latency spin
   signal defined in [TSVWG-SPIN].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 21, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of


Trammell                Expires February 21, 2019               [Page 1]

Internet-Draft              Why measure RTT?                 August 2018


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  About This Document . . . . . . . . . . . . . . . . . . .   3
   2.  Direct Utility of Passive RTT Measurement . . . . . . . . . .   3
     2.1.  Inter-domain Troubleshooting  . . . . . . . . . . . . . .   3
     2.2.  Bufferbloat Mitigation in Cellular Networks . . . . . . .   4
     2.3.  Locating WiFi Problems in Home Networks . . . . . . . . .   4
     2.4.  Internet Measurement Research . . . . . . . . . . . . . .   5
   3.  Indirect Utility of RTT Measurements  . . . . . . . . . . . .   5
   4.  Additional Metrics Derivable from the Spin Bit  . . . . . . .   6
     4.1.  Derived Loss and Reordering . . . . . . . . . . . . . . .   6
     4.2.  Two-Point Intradomain Measurement . . . . . . . . . . . .   7
   5.  Contributors  . . . . . . . . . . . . . . . . . . . . . . . .   7
   6.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .   8
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   8
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .   8
     7.2.  Informative References  . . . . . . . . . . . . . . . . .   8
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   9

1.  Introduction

   Latency is a key metric for understanding network operation and
   performance.  Passive measurement allows inspection of latency on
   productive traffic, avoiding problems with different treatment of
   productive and measurement traffic, and enables opportunistic
   measurement of latency without active measurement overhead.

   Passive measurement of RTT, in particular, has both direct utility
   (see Section 2), generating RTT samples for use cases that require
   latency data, and indirect utility (see Section 3), since RTT is
   correlated with other useful metrics.  In addition, the passive
   latency signal proposed in [TSVWG-SPIN] provides other
   opportunities for metric generation which are a consequence of its
   design (see Section 4).

   This document describes these use cases in order to motivate why
   passive measurability of RTT on a per-flow basis is an interesting
   and useful feature for a transport protocol to have.  This is
   especially so in the absence of other directly observable metrics
   such as loss and retransmission, as is the case for protocols with
   mostly-encrypted wire images [WIRE-IMAGE] such as QUIC [QUIC].


1.1.  About This Document

   This document is maintained in the GitHub repository
   https://github.com/britram/draft-trammell-tsvwg-spin, and the
   editor's copy is available online at https://britram.github.io/draft-
   trammell-tsvwg-spin.  Current open issues on the document can be seen
   at https://github.com/britram/draft-trammell-tsvwg-spin/issues.
   Comments and suggestions on this document can be made by filing an
   issue there, or by contacting the editor.

   This document is based in part on [QUIC-SPIN]; however, aside from
   Section 4, it is not specific to the spin bit proposal.

2.  Direct Utility of Passive RTT Measurement

   RTT measurement generates two-way latency metric samples; these
   samples are useful in many measurement tasks which directly require
   latency data.  The measurement methodologies using two-way latency
   measurement samples follow one of a few basic variants:

   o  The RTT evolution of a flow or a set of flows can be compared to
      baseline or expected RTT measurements for flows with the same
      characteristics in order to detect or localize latency issues in a
      specific network.

   o  The RTT evolution of a single flow can also be examined in detail
      to diagnose performance issues with that flow.

   o  Samples of RTT for a flow aggregate (e.g., all flows between two
      given networks) can be used without regard to temporal evolution
      of the RTT, in order to examine the distribution of RTTs for a
      group of flows that should have similar RTT (e.g., because they
      should share the same path(s)).
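
   As an illustration of the first and third variants, the following
   is a minimal sketch, not a definitive method.  It summarizes the RTT
   samples of a flow aggregate and compares the median against a
   baseline.  The function names, the use of the median and 95th
   percentile, and the tolerance factor of 1.5 are assumptions made for
   this example only.

```python
from statistics import median, quantiles


def rtt_summary(samples_ms):
    """Return (median, 95th percentile) of a list of RTT samples in ms."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) yields 19 cut points; the last is the 95th percentile.
    p95 = quantiles(samples_ms, n=20)[-1]
    return median(samples_ms), p95


def exceeds_baseline(samples_ms, baseline_median_ms, tolerance=1.5):
    """True if the aggregate's median RTT exceeds tolerance x baseline.

    The tolerance value is a hypothetical threshold, not one taken from
    this document.
    """
    med, _ = rtt_summary(samples_ms)
    return med > tolerance * baseline_median_ms
```

   A monitoring system might evaluate exceeds_baseline() periodically
   per aggregate and raise an alert when it returns True.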

2.1.  Inter-domain Troubleshooting

   Network access providers are often the first point of contact for
   their customers when network problems impact the performance of
   bandwidth-intensive and latency-sensitive applications such as video,
   regardless of whether the root cause lies within the access
   provider's network, the service provider's network, on the Internet
   paths between them, or within the customer's own network.

   On-path observers can extract spatial delay metric samples [RFC6049]
   from fields of the transport layer (e.g., TCP) or application layer
   (e.g., RTP).  The information is captured in the upper layer because
   neither the IP header nor the UDP layer includes fields allowing the
   measurement of upstream and downstream delay.

   Local network performance problems are detected with monitoring tools
   which observe the variation of upstream latency and downstream
   latency.

   Inter-domain troubleshooting relies on the same metrics but is not a
   proactive task; instead, it is a recursive process which homes in on
   the domain and link responsible for the failure.  In practice, inter-
   domain troubleshooting is a communication process between the Network
   Operations Center (NOC) teams of the networks on the path, because
   the root cause of a problem is rarely located on a single network,
   and requires cooperation and exchange of data between the NOCs.

   One example is troubleshooting the performance degradation that
   results from a change of routing policy on one side of the path which
   increases queueing on the other side of the path.

2.2.  Bufferbloat Mitigation in Cellular Networks

   Cellular networks consist of multiple Radio Access Networks (RAN)
   where mobile devices are attached to base stations.  It is common
   that base stations from different vendors and different generations
   are deployed in the same cellular network.

   Due to the dynamic nature of RANs, base stations have typically been
   provisioned with large buffers to maximize throughput despite rapid
   changes in capacity.  As a side effect, bufferbloat has become a
   common issue in such networks [WWMM-BLOAT].

   An effective way of mitigating bufferbloat without sacrificing too
   much throughput is to deploy Active Queue Management (AQM) in
   bottleneck routers and base stations.  However, due to the variation
   in deployed base stations, it is not always possible to enable AQM at
   the bottlenecks without massive infrastructure investments.

   An alternative approach is to deploy AQM as a network function in a
   more centralized location than the traditional bottleneck nodes.
   Such an AQM monitors the RTT progression of flows and drops or marks
   packets when the measured latency is indicative of congestion.  Such
   a function also has the possibility to detect misbehaving flows and
   reduce the negative impact they have on the network.
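
   The marking decision of such a function could be sketched as
   follows.  This is an illustrative example under stated assumptions,
   not a description of any deployed AQM: the class name, the use of
   the lowest observed RTT as a baseline, the exponentially weighted
   moving average, and the 50 ms target are all hypothetical choices.

```python
class RttMarker:
    """Per-flow marking decision driven by passively measured RTT."""

    def __init__(self, target_queue_delay_ms=50.0, alpha=0.125):
        self.target = target_queue_delay_ms  # assumed threshold
        self.alpha = alpha                   # EWMA gain (assumed)
        self.min_rtt = None                  # baseline: lowest RTT seen
        self.srtt = None                     # smoothed RTT

    def on_rtt_sample(self, rtt_ms):
        """Update state; return True if packets should be marked/dropped."""
        if self.min_rtt is None or rtt_ms < self.min_rtt:
            self.min_rtt = rtt_ms
        if self.srtt is None:
            self.srtt = rtt_ms
        else:
            self.srtt += self.alpha * (rtt_ms - self.srtt)
        # Smoothed RTT above the baseline approximates queueing delay.
        return (self.srtt - self.min_rtt) > self.target
```

   The difference between the smoothed RTT and the baseline serves here
   as a rough estimate of queueing delay along the whole path, so the
   function reacts to congestion wherever the bottleneck is located.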

2.3.  Locating WiFi Problems in Home Networks

   Many residential networks use WiFi (802.11) on the last segment, and
   WiFi signal strength degradation manifests in high first-hop delay,
   due to the fact that the MAC layer will retransmit packets lost at
   that layer.  Measuring the RTT between endpoints on the customer
   network and parts of the service provider's own infrastructure (which
   have predictable delay characteristics) can be used to isolate this
   cause of performance problems.

   The network provider can measure the RTT at the home gateway, or at
   an upstream point if there is no access to the home gateway.  A
   problem in the WiFi network is identified by observing high delay
   together with low packet loss.
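
   The rule above can be written as a simple predicate.  This sketch
   assumes that a loss rate for the same traffic is available from some
   other source; the function name and both thresholds are hypothetical
   and would need to be calibrated for a given network.

```python
def looks_like_wifi_problem(rtt_ms, baseline_rtt_ms, loss_rate,
                            delay_factor=3.0, max_loss=0.01):
    """Heuristic: high delay combined with low loss suggests WiFi trouble.

    delay_factor and max_loss are assumed thresholds for illustration.
    """
    high_delay = rtt_ms > delay_factor * baseline_rtt_ms
    low_loss = loss_rate < max_loss
    return high_delay and low_loss
```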

   These measurements are particularly useful for traffic which is
   latency sensitive, such as interactive video applications.  However,
   since high latency is often correlated with other network-layer
   issues such as chronic interconnect congestion [IMC-CONGESTION], they
   are also useful for general troubleshooting of network-layer issues
   in an interdomain setting.

   In this case, multiple RTT samples per flow are useful less for
   observing intraflow behavior, and more for generating sufficient
   samples for a given aggregate to make a high-quality measurement.

2.4.  Internet Measurement Research

   As a large, distributed, engineered system with no centralized
   control, the Internet has emergent properties of interest to the
   research community not just for purely scientific curiosity, but also
   to provide applicable guidance to Internet engineering, Internet
   protocol design and development, network operations, and policy
   development.  Latency measurements in particular are both an active
   area of research and an important tool for certain measurement
   studies (see, e.g., [IMC-TCPSIG], from the most recent Internet
   Measurement Conference).  While much of this work is currently done
   with active measurements, the ability to generate latency samples
   passively or using a hybrid measurement approach (i.e., through
   passive observation of purpose-generated active measurement traffic;
   see [RFC7799]) can drastically increase the efficiency and
   scalability of these studies.

3.  Indirect Utility of RTT Measurements

   In addition to the direct generation of RTT metric samples, RTT
   measurement can also be used for indirect generation of other metrics
   when more direct means are not available.

   A variety of tools are used for detailed troubleshooting of the
   performance of single flows, both for debugging transport- and
   application-layer protocol implementations and for determining
   whether a particular end-to-end performance issue is related to
   particular network conditions.  One common type of visualization used
   for TCP (implemented, for example, in the TCP Stream Graphs feature
   of Wireshark, https://www.wireshark.org/) shows the development over
   time of the sequence and acknowledgment numbers, including
   retransmissions, and the evolution of the inflight and receiver flow
   control windows over time.  By analyzing the relationship among loss,
   latency, and throughput, the precise cause of an observed performance
   issue on a given flow can be determined.

   While RTT measurements on their own are not enough to drive such a
   visualization, many similar techniques can be built on high-
   resolution time series RTT data.  Here we exploit two properties of
   transport protocols:

   o  The size of the inflight window is equal to the number of bytes/
      packets sent per RTT, so inflight window evolution can be
      generated by taking each RTT sample r observed at time t and
      summing the number of bytes/packets sent between t - r and t.

   o  Changes in the inflight window can be related to sender reactions
      to congestion.  For common loss- and ECN-based congestion control
      protocols such as NewReno [RFC6582] and Cubic [RFC8312], inflight
      window reductions are correlated with sender-experienced
      congestion or loss.
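
   The first property can be sketched in a few lines.  This is a
   minimal illustration, assuming that the observer has recorded a
   (timestamp, size) pair for every packet in one direction of the flow
   and that timestamps and RTT samples share the same time unit.  The
   function names and the half-open interval (t - r, t] are choices
   made for this example.

```python
def inflight_bytes(packets, t, r):
    """Sum the sizes of packets observed in the interval (t - r, t].

    packets: iterable of (timestamp, size_bytes) for one flow direction.
    """
    return sum(size for ts, size in packets if t - r < ts <= t)


def inflight_series(packets, rtt_samples):
    """Return [(t, estimated inflight bytes)] for each RTT sample (t, r)."""
    return [(t, inflight_bytes(packets, t, r)) for t, r in rtt_samples]
```

   A linear scan per sample is used for clarity; an implementation
   handling long flows would likely keep a sliding window instead.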

   Inflight window evolution over time, together with heuristic
   assumptions about server behavior, can go a long way toward replacing
   direct visibility of transport protocol dynamics (sequence and
   acknowledgment number sequences over time) for encrypted transports;
   the exact details of this are a subject of present and future
   research.
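
   One such heuristic is sketched below: given a time series of
   inflight window estimates, it flags the points at which the window
   shrinks sharply.  The function name and the factor of 0.8 are
   assumptions for illustration; the factor is chosen loosely above the
   multiplicative decrease of 0.5 used by NewReno and 0.7 used by
   CUBIC [RFC8312].  A flagged reduction is only a hint, since an
   application-limited sender also reduces its inflight data.

```python
def window_reductions(series, factor=0.8):
    """Return the times at which inflight fell below factor * previous.

    series: list of (time, inflight_bytes) in time order.
    factor: assumed detection threshold, not a value from this document.
    """
    events = []
    for (_, prev), (t, cur) in zip(series, series[1:]):
        if prev > 0 and cur < factor * prev:
            events.append(t)
    return events
```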

4.  Additional Metrics Derivable from the Spin Bit

   The latency spin signal mechanism itself [TSVWG-SPIN] has additional
   measurement utility; these observations do not apply to other
   methodologies for measuring RTT.

4.1.  Derived Loss and Reordering

   When used alone (as a one-bit signal), measurement systems using the
   latency spin bit must use heuristics to reject samples which are
   potentially-lost, potentially-reordered, or potentially-delayed.
   When these heuristics are instrumented to note their sample rejection
   rate, this rate itself is a potentially-useful proxy metric for
   "difficulty" (vaguely defined) experienced by a flow.
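
   A single-direction observer instrumented in this way might look
   like the following sketch.  It is an illustration only: the class
   name, the smoothing gain, and the specific heuristic (reject a
   sample shorter than one tenth of the current RTT estimate) are
   assumptions, not part of the signal definition in [TSVWG-SPIN].

```python
class SpinObserver:
    """One-direction spin bit observer that counts rejected samples."""

    def __init__(self, reject_fraction=0.1):
        self.reject_fraction = reject_fraction  # assumed heuristic threshold
        self.last_spin = None
        self.last_edge_time = None
        self.rtt_estimate = None
        self.accepted = 0
        self.rejected = 0

    def on_packet(self, time, spin):
        """Process one packet; return an accepted RTT sample or None."""
        if self.last_spin is None:
            self.last_spin = spin
            return None
        if spin == self.last_spin:
            return None
        # The spin value changed: this packet is an edge.
        self.last_spin = spin
        if self.last_edge_time is None:
            self.last_edge_time = time
            return None
        sample = time - self.last_edge_time
        self.last_edge_time = time
        if (self.rtt_estimate is not None
                and sample < self.reject_fraction * self.rtt_estimate):
            self.rejected += 1  # implausibly short: likely reordering
            return None
        self.accepted += 1
        if self.rtt_estimate is None:
            self.rtt_estimate = sample
        else:
            self.rtt_estimate = 0.875 * self.rtt_estimate + 0.125 * sample
        return sample

    def rejection_rate(self):
        """Fraction of candidate samples rejected by the heuristic."""
        total = self.accepted + self.rejected
        return self.rejected / total if total else 0.0
```

   The value returned by rejection_rate() is the proxy metric described
   above; it can be exported alongside the accepted RTT samples.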

   When the latency signal is used with the Valid Edge Counter (VEC),
   additional information is available in the wire image to reject
   samples due to loss, delay, or reordering.  Analysis of the VEC
   together with the series of spin bit values can be used to recognize
   single loss and reordering events, which can be used to generate loss
   and reordering metrics at the resolution of the flow's round trip
   time.  Optimal use of the VEC signal to generate loss and reordering
   metric signals is a subject of ongoing research.

4.2.  Two-Point Intradomain Measurement

   The spin bit is also useful as a basic signal for instantaneous
   measurement of the treatment of traffic carrying the latency spin
   signal within a single network.  Though the primary design goal of
   the spin bit signal is to enable single-observer on-path measurement
   of end-to-end RTT, the spin bit can also be used by two cooperating
   observers with access to traffic flowing in the same direction as an
   alternate marking signal, as described in [ALT-MARK].  The only
   difference from alternate marking with a generated signal is that the
   size of the alternation will change with the flight size each RTT.
   However, these changes do not affect the applicability of the
   method, which operates on each marking batch separately between two
   measurement points in the same direction.  This two-point measurement
   is an additional feature enabled "for free" by the spin bit signal.

   So, with more than one observer in the same direction, it can be
   useful to segment the RTT and deduce the contribution to the RTT of
   the portion of the network between two on-path observers.  This can
   be done by calculating the delay between two or more measurement
   points in a single direction by applying [ALT-MARK].  In this way,
   packet loss, delay and delay variation can be measured for each
   segment of the network depending on the number and distribution of
   the available on-path observation points.  When these observation
   points are placed at network borders, the alternate-marking signal
   can be used to measure the performance of QUIC traffic within a
   network operator's own domain of responsibility.
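
   The two-point calculation can be sketched as follows.  This is a
   simplified illustration rather than the method specified in
   [ALT-MARK]: it assumes that both observers have synchronized
   clocks, that each records a (time, spin) pair per packet, that no
   packets are reordered across a spin edge, and that no batch is lost
   entirely.  The function names are hypothetical.

```python
def batches(observations):
    """Group (time, spin) observations into marking batches.

    Returns one (first_packet_time, packet_count) tuple per run of
    consecutive packets carrying the same spin value.
    """
    result = []
    last_spin = None
    for time, spin in observations:
        if spin != last_spin:
            result.append([time, 0])  # a new batch starts at this edge
            last_spin = spin
        result[-1][1] += 1
    return [tuple(b) for b in result]


def segment_metrics(upstream, downstream):
    """Per-batch (delay, packets lost) between two observers."""
    return [(t_b - t_a, n_a - n_b)
            for (t_a, n_a), (t_b, n_b)
            in zip(batches(upstream), batches(downstream))]
```

   Delay variation for the segment follows from the differences
   between consecutive per-batch delay values.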

5.  Contributors

   This document contains text from [QUIC-SPIN], which is the work of
   the following authors in addition to the editor of this document:

   o  Piet De Vaere, ETH Zurich

   o  Roni Even, Huawei

   o  Giuseppe Fioccola, Telecom Italia

   o  Thomas Fossati, Nokia

   o  Marcus Ihlar, Ericsson

   o  Al Morton, AT&T Labs

   o  Emile Stephan, Orange

6.  Acknowledgments

   Thanks to Mark Nottingham for suggesting that this document should
   exist.

   This work is partially supported by the European Commission under
   Horizon 2020 grant agreement no. 688421 Measurement and Architecture
   for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat
   for Education, Research, and Innovation under contract no. 15.0268.
   This support does not imply endorsement.

7.  References

7.1.  Normative References

   [TSVWG-SPIN]
              Trammell, B., "A Transport-Independent Explicit Signal for
              Hybrid RTT Measurement", draft-trammell-tsvwg-spin-00
              (work in progress), July 2018.

7.2.  Informative References

   [ALT-MARK]
              Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L.,
              Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate Marking method for passive and hybrid
              performance monitoring", draft-ietf-ippm-alt-mark-14 (work
              in progress), December 2017.

   [IMC-CONGESTION]
              Luckie, M., Dhamdhere, A., Clark, D., Huffaker, B., and k.
              claffy, "Challenges in Inferring Internet Interdomain
              Congestion (in Proc. ACM IMC 2014)", November 2014.

   [IMC-TCPSIG]
              Sundaresan, S., Dhamdhere, A., Allman, M., and k. claffy,
              "TCP Congestion Signatures (in Proc. ACM IMC 2017)",
              November 2017.

   [QUIC]     Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-14 (work
              in progress), August 2018.


   [QUIC-SPIN]
              Trammell, B., De Vaere, P., Even, R., Fioccola, G.,
              Fossati, T., Ihlar, M., Morton, A., and E. Stephan,
              "Adding Explicit Passive Measurability of Two-Way Latency
              to the QUIC Transport Protocol",
              draft-trammell-quic-spin-03 (work in progress), May 2018.

   [RFC6049]  Morton, A. and E. Stephan, "Spatial Composition of
              Metrics", RFC 6049, DOI 10.17487/RFC6049, January 2011,
              <https://www.rfc-editor.org/info/rfc6049>.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012,
              <https://www.rfc-editor.org/info/rfc6582>.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016, <https://www.rfc-editor.org/info/rfc7799>.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              RFC 8312, DOI 10.17487/RFC8312, February 2018,
              <https://www.rfc-editor.org/info/rfc8312>.

   [WIRE-IMAGE]
              Trammell, B. and M. Kuehlewind, "The Wire Image of a
              Network Protocol", draft-trammell-wire-image-04 (work in
              progress), April 2018.

   [WWMM-BLOAT]
              Alfredsson, S., Giudice, G., Garcia, J., Brunstrom, A.,
              Cicco, L., and S. Mascolo, "Impact of TCP Congestion
              Control on Bufferbloat in Cellular Networks (in Proc. IEEE
              WoWMoM 2013)", June 2013.

Author's Address

   Brian Trammell
   ETH Zurich

   Email: ietf@trammell.ch

