Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                                 ICIR/ICSI
draft-allman-rto-backoff-01.txt                            Ethan Blanton
Expires: March 2006                                    Purdue University
                                                            Josh Blanton
                                                         Ohio University
                                                            October 2005


   Using Spurious Retransmissions to Adapt the Retransmission Timeout
                    draft-allman-rto-backoff-01.txt

Status of this Memo

    By submitting this Internet-Draft, each author represents that any
    applicable patent or other IPR claims of which he or she is aware
    have been or will be disclosed, and any of which he or she becomes
    aware will be disclosed, in accordance with Section 6 of BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

Copyright Notice

    Copyright (C) The Internet Society (2005).

Abstract

    This document describes a method for using spurious retransmission
    timeouts as the trigger for slightly changing the way TCP's
    retransmission time is computed in an effort to avoid subsequent
    unnecessary retransmissions.

Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
    "OPTIONAL" in this document are to be interpreted as described
    in [RFC2119].

    The reader is expected to be familiar with the algorithm and
    terminology from [RFC2988].

Expires: March 2006                                             [Page 1]

draft-allman-rto-backoff-01.txt                             October 2005


1.  Introduction

    Various studies have shown that the retransmission timeout (RTO)
    estimator in [RFC2988] can trigger spurious retransmissions.  [AP99]
    shows that such unnecessary retransmissions are generally fairly
    rare.  However, [LK00] shows that in some networks (e.g., wireless
    networks) spurious retransmissions are more problematic due to
    occasional delay spikes that are not well predicted by TCP's RTO
    estimator.  In this document we outline one possible approach to
    mitigate the impact of premature RTO firings by altering the RTO
    estimator specified in [RFC2988].

    Several methods for detecting spurious timeouts have been developed
    [RFC3522,RFC3708,RFC4138].  Additionally, [RFC4015] outlines one
    possible response to detecting spurious timeouts.  This document
    outlines an alternative to [RFC4015].  In general terms, [RFC4015]
    specifies two actions upon the detection of an unnecessary RTO-based
    retransmission.  First, the sending rate prior to the spurious
    retransmission is restored.  Furthermore, the RTO is adapted by
    re-initializing the RTO estimator with the long round-trip time
    (RTT) measurement that caused the spurious RTO.  The approach given
    in [RFC4015] is reasonable if the underlying cause of the problem is
    a shift in the path RTT.  For instance, if the route a TCP
    connection is traversing changes and the new path's RTT is
    significantly longer than the previous path's RTT then simply
    re-initializing the RTO is a reasonable action.

    As specified in the next section, this document takes a slightly
    different approach than [RFC4015].  It treats the failure of the
    RTO to wait long enough before triggering a retransmission as an
    indication that the RTO estimator itself is not properly capturing
    the variance present in the RTTs experienced by the TCP connection.
    Therefore, upon detecting a spurious retransmission timeout, this
    document calls for increasing the contribution of the variance
    component in the RTO estimator.  This change reflects a preference
    for avoiding future spurious timeouts rather than simply reacting
    to each spurious retransmission.
    
    We note that TCP implementations using the RTTM mechanism [RFC1323]
    to assess the RTT multiple times per RTT with the standard
    exponentially-weighted moving average (EWMA) gains from [RFC2988]
    use less RTT history than when taking one RTT measurement per RTT.
    [AP99] shows that "fast" EWMAs yield more spurious retransmissions
    than when using the standard gains with one RTT sample per RTT.
    Therefore, an orthogonal change that may prevent spurious RTOs in
    TCP implementations that use RTTM is to set the EWMA gains based on
    the number of RTT samples taken per RTT, such that the amount of
    history kept, in terms of time, is the same regardless of the
    number of RTT samples taken [Flo98,LS00].
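
    As a rough illustration of such gain scaling, the sketch below
    picks a per-sample gain g' so that n samples taken within one RTT
    decay the history by the same factor as one sample with the
    standard gain g, i.e., (1 - g')^n = 1 - g.  This particular
    formula and the function name are assumptions of the sketch; it is
    one plausible scaling and is not necessarily the scheme described
    in [Flo98] or [LS00].

```python
def scaled_gain(base_gain, samples_per_rtt):
    """Per-sample EWMA gain g' chosen so that (1 - g')**n == 1 - base_gain.

    base_gain       -- gain intended for one RTT sample per RTT (e.g. 1/8)
    samples_per_rtt -- hypothetical count n of RTT samples taken per RTT
    """
    if samples_per_rtt < 1:
        raise ValueError("need at least one sample per RTT")
    return 1.0 - (1.0 - base_gain) ** (1.0 / samples_per_rtt)


# With one sample per RTT the standard gain is returned unchanged;
# with more samples per RTT each sample is weighted less.
alpha_one = scaled_gain(1 / 8, 1)
alpha_eight = scaled_gain(1 / 8, 8)
```

    With this choice, the weight given to measurements older than one
    RTT is the same whether one or eight samples are taken per RTT.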

2.  Parameter Changes


    As the basis for the changes proposed below, a TCP MUST support an
    IETF-specified spurious timeout detection method.  Currently,
    [RFC3522], [RFC3708] and [RFC4138] are such detection methods.  We
    note that the research literature includes alternate methods for
    detecting spurious retransmissions, e.g., the "retransmit bit"
    [LK00], but these schemes MUST NOT be used as part of the changes
    specified in this document until such time that the IETF approves a
    specification of these schemes.

    We also note that [RFC2988] explicitly allows for an RTO estimator
    that is more conservative than that given in [RFC2988].

    We also note that a TCP able to distinguish needed from unneeded
    retransmission timeouts does not need to use Karn's algorithm
    [KP87,RFC2988] and can accurately determine the RTT that caused a
    spurious retransmission.

    Upon detection of a spurious RTO-based retransmission a TCP MAY
    alter the RTO estimator given in [RFC2988] in any way to be more
    conservative.

    The RECOMMENDED method for changing the RTO estimator given in
    [RFC2988] upon detection of a spurious timeout is to increase "K",
    the multiplier applied to RTTVAR in the RTO calculation given in
    step (2.3) of [RFC2988].  Specifically, before altering SRTT and
    RTTVAR based on the measured RTT R' (from step (2.3) in [RFC2988]) a
    K' should be calculated based on the multiplier that would have
    prevented the unneeded RTO-based retransmit:

        K' = ceil ((R' - SRTT) / RTTVAR)                           (1)
    
    After calculating K' the R' RTT sample MUST be used to adjust SRTT
    and RTTVAR and therefore the RTO, per [RFC2988].
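
    As a non-normative illustration, the sketch below computes K' per
    equation (1) from the SRTT and RTTVAR in place before the sample
    is applied, and then folds R' into the estimator using the gains
    from [RFC2988].  The function names and the handling of a
    non-positive RTTVAR are assumptions of this sketch, not part of
    the specification.

```python
import math

ALPHA = 1 / 8   # SRTT gain from RFC 2988
BETA = 1 / 4    # RTTVAR gain from RFC 2988


def k_prime(r_prime, srtt, rttvar):
    """Equation (1): smallest integer multiplier that would have covered R'."""
    if rttvar <= 0:
        # Assumption: the draft does not discuss RTTVAR == 0, so this
        # sketch refuses to guess rather than divide by zero.
        raise ValueError("RTTVAR must be positive")
    return math.ceil((r_prime - srtt) / rttvar)


def update_estimator(r_prime, srtt, rttvar):
    """Step (2.3) of RFC 2988: update RTTVAR first, then SRTT."""
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r_prime)
    srtt = (1 - ALPHA) * srtt + ALPHA * r_prime
    return srtt, rttvar


def on_spurious_timeout(r_prime, srtt, rttvar):
    """Compute K' from the old state, then apply the R' sample."""
    kp = k_prime(r_prime, srtt, rttvar)
    srtt, rttvar = update_estimator(r_prime, srtt, rttvar)
    return kp, srtt, rttvar
```

    For example, with SRTT = 200 ms, RTTVAR = 50 ms and R' = 900 ms
    (hypothetical values), the sketch yields K' = 14, after which SRTT
    becomes 287.5 ms and RTTVAR becomes 212.5 ms.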

    The actual K that is used in the RTO calculation is determined by
    the size of the congestion window.  When a TCP has only a small
    number of outstanding segments, advanced loss recovery that relies
    on the receipt of three duplicate acknowledgments as a recovery
    trigger is not as effective as when the congestion window is
    larger.  Therefore, TCP relies more heavily on the RTO in this
    regime.  Furthermore, the impact caused by spurious timeouts in
    this situation---in terms of congestion window reduction and
    resource wastage by go-back-N transmission---is small.  Hence,
    when the congestion window is less than or equal to 4*SMSS bytes
    then the standard K of 4 SHOULD be used when calculating the RTO
    via step (2.3) from [RFC2988].

    When the congestion window is greater than 4*SMSS bytes the K used
    in step (2.3) from [RFC2988] SHOULD be K'.  In this situation,
    advanced loss recovery will more likely deal with losses without
    invoking the RTO.  In addition, this regime is where spurious RTOs
    cause the most problems.  Using K' raises the multiplier to the
    point where it would have prevented the previously sent spurious
    retransmission.
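
    The selection between the standard K and K' can be sketched as
    follows.  This is a minimal, non-normative illustration; the
    function names, the use of None to mean "no spurious timeout has
    been detected yet", and the omission of the one-second minimum
    from step (2.4) of [RFC2988] are assumptions of the sketch.

```python
STANDARD_K = 4  # RTTVAR multiplier from step (2.3) of RFC 2988


def select_k(cwnd_bytes, smss_bytes, k_prime):
    """Return the RTTVAR multiplier to use for the current congestion window.

    k_prime is None until a spurious timeout has produced a value via
    equation (1); that convention is an assumption of this sketch.
    """
    if k_prime is None or cwnd_bytes <= 4 * smss_bytes:
        return STANDARD_K
    return k_prime


def compute_rto(srtt, rttvar, k, clock_granularity):
    """RTO = SRTT + max(G, K*RTTVAR), the RFC 2988 form with a variable K."""
    return srtt + max(clock_granularity, k * rttvar)
```

    Because the check is made each time the RTO is computed, a
    connection whose congestion window shrinks to 4*SMSS bytes or less
    falls back to the standard K without discarding K'.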


    This specification deliberately offers no way to reduce K' after
    it has been inflated.  The spurious timeouts that inflated K'
    indicate that the standard estimator does not accurately capture
    the variance of the RTT across the network path, so reducing K'
    would increase the chance of further spurious retransmissions.

    Finally, we note that bounding K' is not a good idea.  Suppose
    equation (1) would set K' to 20.  If K' were instead bounded at 10,
    then legitimate RTOs would be forced to wait longer than with the
    standard K, yet the TCP would still lack solid protection against
    delay spikes (we have observed delay spikes that a K' of 10 will
    not alleviate).
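
    The following sketch works through this argument with
    hypothetical numbers: an SRTT of 200 ms, an RTTVAR of 50 ms, and a
    1200 ms delay spike.  None of these values come from measurements
    in this document, and the clock granularity and minimum RTO of
    [RFC2988] are left out for brevity.

```python
import math


def rto(srtt, rttvar, k):
    """RTO = SRTT + K*RTTVAR (granularity and the RFC 2988 minimum omitted)."""
    return srtt + k * rttvar


srtt, rttvar = 200.0, 50.0   # milliseconds, hypothetical
spike = 1200.0               # hypothetical RTT sample R' behind a spurious RTO

k_unbounded = math.ceil((spike - srtt) / rttvar)   # equation (1)
k_bounded = min(k_unbounded, 10)                   # the cap argued against

for label, k in (("standard", 4), ("bounded", k_bounded),
                 ("unbounded", k_unbounded)):
    value = rto(srtt, rttvar, k)
    print(f"{label:9s} K={k:2d} RTO={value:6.1f} ms "
          f"covers spike: {value >= spike}")
```

    Under these assumptions the bounded RTO (700 ms) is longer than
    the standard RTO (400 ms) but still shorter than the spike, so the
    cap costs legitimate timeouts extra delay without preventing the
    spurious one.  Only the unbounded K' of 20 yields an RTO that
    covers the spike.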

3.  Advantages

    The advantage of tuning the RTO calculation to be more conservative
    after detecting spurious RTO-based retransmissions is in preventing
    further spurious RTOs.  In addition, spurious RTOs can cause
    go-back-N behavior [LK00] which can also be avoided by adapting the
    RTO to be more conservative.

4.  Disadvantages    

    The disadvantage of tuning the RTO calculation to be more
    conservative is that legitimate RTO firings take longer and could
    hurt performance.  However, an important note is that the RTO should
    not be TCP's primary loss recovery strategy.  [RFC3782] and
    [RFC3517] provide methods for TCP to effectively repair multiple
    lost segments from a single window of data without falling back to
    using the RTO.  Further, research shows that these changes are
    widely implemented [MAF05].  Therefore, making TCP's RTO calculation
    more conservative should not hinder performance under normal
    circumstances.  Put differently, when using advanced loss recovery
    techniques the firing of the RTO should be an indication that the
    congestion situation in the network is fairly bad.  In this case, it
    may well be that making the RTO estimator more conservative is the
    right general approach.

5.  Summary

    This document specifies a small change that makes the RTO
    calculation given in [RFC2988] more conservative upon the detection
    of spurious RTO-based retransmissions.  The root cause of spurious
    retransmits is an inaccurate assessment of the network conditions
    (in this case, of the RTT).  Therefore, we tackle this by making the
    RTO calculation take into account RTT variance to a larger degree.
    While this does lengthen the time required for legitimate
    retransmissions to fire, the RTO should not be TCP's primary means
    for retransmitting data and therefore this lengthened interval
    should only minimally impact overall performance and should only
    come into play when conditions along the network path have
    deteriorated significantly.  Finally, we note that this document
    makes the estimator given in [RFC2988] strictly more conservative
    and is therefore allowed via [RFC2988].

6.  Security Considerations 

    This document calls for a simple parameter tweak and does not change
    the security considerations given in [RFC2988].

7.  IANA Considerations

    None.

Acknowledgments

    This document has benefited from discussions with Ted Faber, Aaron
    Falk, Janardhan Iyengar, Sally Floyd and Joe Touch.

Normative References

    [RFC2119] S. Bradner.  Key words for use in RFCs to Indicate
        Requirement Levels, March 1997.  BCP 14, RFC 2119.
    
    [RFC2988] V. Paxson, M. Allman.  Computing TCP's Retransmission
        Timer, November 2000.  RFC 2988.

    [RFC3522] R. Ludwig, M. Meyer.  The Eifel Detection Algorithm for
        TCP, April 2003.  RFC 3522.

    [RFC3708] E. Blanton, M. Allman.  Using TCP Duplicate Selective
        Acknowledgement (DSACKs) and Stream Control Transmission
        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
        to Detect Spurious Retransmissions, February 2004.  RFC 3708. 

    [RFC4138] P. Sarolahti, M. Kojo.  Forward RTO-Recovery (F-RTO): An
        Algorithm for Detecting Spurious Retransmission Timeouts with
        TCP and the Stream Control Transmission Protocol (SCTP), August
        2005.  RFC 4138.

Informative References

    [AP99] Mark Allman, Vern Paxson. On Estimating End-to-End Network
        Path Properties. ACM SIGCOMM, September 1999.

    [Flo98] Sally Floyd.  Comments on RFC1323.bis, TCP-LW mailing list,
        May 1998.

    [KP87] Phil Karn, Craig Partridge.  Improving Round-Trip Time
        Estimates in Reliable Transport Protocols.  ACM SIGCOMM, August
        1987.

    [LK00] R. Ludwig, R. H. Katz.  The Eifel Algorithm: Making TCP
        Robust Against Spurious Retransmissions.  ACM Computer
        Communication Review, 30(1), January 2000.
    
    [LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM
        Computer Communication Review, Vol. 30, No. 3, July 2000.

    [MAF05] A. Medina, M. Allman, S. Floyd.  Measuring the Evolution of
        Transport Protocols in the Internet. ACM Computer Communication
        Review, 35(2), April 2005.

    [RFC3517] E. Blanton, M. Allman, K. Fall, L. Wang.  A Conservative
        Selective Acknowledgment (SACK)-based Loss Recovery Algorithm
        for TCP, April 2003.  RFC 3517.
    
    [RFC3782] S. Floyd, T. Henderson, A. Gurtov.  The NewReno
        Modification to TCP's Fast Recovery Algorithm, April 2004.  RFC
        3782.
    
    [RFC4015] R. Ludwig, A. Gurtov.  The Eifel Response Algorithm for
        TCP, February 2005.  RFC 4015.

Authors' Addresses

    Mark Allman
    ICSI Center for Internet Research
    1947 Center Street, Suite 600
    Berkeley, CA 94704-1198
    Phone: (440) 235-1792
    Email: mallman@icir.org
    URL: http://www.icir.org/mallman/

    Ethan Blanton
    Purdue University Computer Sciences
    250 North University Street
    West Lafayette, IN  47907
    Email: eblanton@cs.purdue.edu
    URL: http://www.cs.purdue.edu/homes/eblanton/

    Josh Blanton
    Ohio University Internetworking Research Group
    301 Stocker Center
    Athens, OH  45701
    Email: jblanton@cs.ohiou.edu
    URL: http://irg.cs.ohiou.edu/~jblanton/

Intellectual Property Statement

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.

    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at
    ietf-ipr@ietf.org.

Disclaimer of Validity

    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

    Copyright (C) The Internet Society (2005).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.

Acknowledgment

    Funding for the RFC Editor function is currently provided by the
    Internet Society.

