INTERNET-DRAFT           Expires Aug 1999             INTERNET-DRAFT

  Network Working Group                                    Matt Mathis
  INTERNET-DRAFT                      Pittsburgh Supercomputing Center
  Expiration Date:  Aug 1999                                  Feb 1999

                     TReno Bulk Transfer Capacity

                 < draft-ietf-ippm-treno-btc-03.txt >


  Status of this Document

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as
     Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


  Abstract:

    TReno is a tool to measure Bulk Transport Capacity (BTC) as defined
    in [ippm-btc-framework].  This document specifies the details of
    the TReno algorithm as required by the BTC framework document.

  2. Introduction:

    This memo defines a Bulk Transport Capacity (BTC) based on the TReno
    (``tree-no'') diagnostic [Mathis97a].  It builds on notions
    introduced in the BTC framework document [ippm-btc-framework] and
    the IPPM Framework document, RFC 2330 [@@]; the reader is assumed to
    be familiar with both documents.

    The BTC framework document defines pure Congestion Avoidance
    Capacity (CAC) as the data rate (bits per second) of the Congestion
    Avoidance algorithm, subject to the restriction that the
    Retransmission Timeout and Slow-Start algorithms are not invoked.
    In principle a CAC metric would be an ideal BTC metric, but there
    are rather substantial difficulties with using it as such.  The
    Self-Clocking of the Congestion Avoidance algorithm can be very
    fragile, depending on the specific details of the Fast Retransmit,
    Fast Recovery or other advanced recovery algorithms.  When TCP
    loses Self-Clock it is reestablished through a retransmission
    timeout and Slow-Start.  These algorithms nearly always take more
    time than Congestion Avoidance would have taken.

    The TReno program implements BTC, CAC and ancillary metrics.  The
    ancillary metrics are designed to instrument all network events that
    might cause discrepancies between an ideal CAC metric and the TReno
    BTC, other BTC metrics or real TCP implementations.

    We use this multiple metrics approach because the CAC metric is more
    suitable for analytic modeling while the BTC metric is more suited
    to applied measurement.  We believe that future research will lead
    to a strong analytic framework (A-frame) [ippm-btc-framework] that
    will result in understanding the relationship between CAC metrics
    and other metrics, including simple metrics (delay, loss) as well as
    the various different BTC metrics and TCP implementations.

  3. The TReno BTC Definition

  3.1. Metric Name:

    TReno-Type-P-Bulk-Transfer-Capacity

  3.2. Metric Parameters:

   +  Src, the IP address of a host

   +  Dst, the IP address of a host

   +  Initial Maximum Segment size

   +  a test duration
 
   +  T, a time

  3.3. Metric Units:

    Bits per second

  3.4. Definition: 

    The average data rate attained by the TReno program over the path
    under test.
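
    As an illustration only, the average rate can be derived from the
    amount of data moved and the test duration.  The sketch below is
    not taken from the TReno distribution; the function name and the
    choice to count bytes transferred are assumptions of this example.

```python
def average_rate_bps(bytes_transferred: int, duration_s: float) -> float:
    """Average data rate in bits per second over the whole test."""
    if duration_s <= 0:
        raise ValueError("test duration must be positive")
    return bytes_transferred * 8 / duration_s


# Example: 1,250,000 bytes moved in 10 seconds is 1 Mbit/s.
rate = average_rate_bps(1_250_000, 10.0)
```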

  3.5 Congestion Control Algorithms

    The BTC framework document [ippm-btc-framework] makes the
    observation that the standard specifying congestion control
    algorithms [RFC2001.bis] allows more latitude in their
    implementation than is appropriate for a metric.  Some of the
    details of the congestion control algorithms that are left to the
    discretion of the implementor must be fully specified in a metric.

  3.5.1 Congestion Avoidance details

    TReno computes the window size in bytes.  Each acknowledgment opens
    the congestion window (cwnd) by MSS*MSS/cwnd bytes.  The actual
    number of outstanding bytes in the network is always an integral
    number of segments such that the total size is less than or equal to
    cwnd.
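
    The window arithmetic above can be sketched as follows.  This is a
    minimal illustration, not the TReno source: the function names are
    hypothetical, cwnd is held as a floating point byte count, and one
    acknowledgment per segment is assumed (no delayed ACK emulation).

```python
def open_window(cwnd: float, mss: int) -> float:
    """Open cwnd for one acknowledgment during Congestion Avoidance."""
    return cwnd + mss * mss / cwnd


def sendable_bytes(cwnd: float, mss: int) -> int:
    """Largest whole number of segments whose total size fits in cwnd."""
    return int(cwnd // mss) * mss


# One round trip, assuming one ACK per outstanding segment.
mss = 1460
cwnd = 10.0 * mss
for _ in range(sendable_bytes(cwnd, mss) // mss):
    cwnd = open_window(cwnd, mss)
```

    After the ten acknowledgments of this round trip, cwnd has grown by
    slightly less than one MSS, so the number of outstanding segments
    is still ten.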

    @@@ the framework needs to require that delayed Acks emulation be
    specified.

    When a loss is detected the window is reduced using an algorithm
    that sends one segment per two acknowledgments for exactly one round
    trip (as determined by sequence numbers).  This reduces the window
    to exactly half of the data that was actually held by the network at
    the time the first loss was detected.  This algorithm, called
    Rate-Halving, is described in detail in a separate technical note
    [facknote].  The new cwnd will be (old_cwnd - loss)/2.
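
    A small worked example may help show why sending one segment per
    two acknowledgments for one round trip arrives at
    (old_cwnd - loss)/2.  The sketch is hypothetical: the names, the
    integer rounding, and the assumption of one acknowledgment per
    delivered segment are ours, and the full algorithm in [facknote]
    has more detail than is shown here.

```python
def segments_sent_in_recovery(acks_received: int) -> int:
    """Rate-Halving sends one segment for every two acknowledgments."""
    return acks_received // 2


def cwnd_after_rate_halving(old_cwnd: int, loss: int) -> int:
    """New cwnd in bytes: half of the data the network actually held."""
    return (old_cwnd - loss) // 2


# Example: a 20 segment window in which 2 segments are lost.
mss = 1000
old_cwnd = 20 * mss
loss = 2 * mss
acks = (old_cwnd - loss) // mss         # 18 segments were delivered
sent = segments_sent_in_recovery(acks)  # segments sent in one round trip
new_cwnd = cwnd_after_rate_halving(old_cwnd, loss)
```

    In this example the data sent during the recovery round trip
    (sent * mss) equals new_cwnd, which is half of the 18 segments the
    network delivered.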

    The technical note also describes an additional group of algorithms,
    collectively called bounding parameters, that assure that rate
    halving always arrives at a reasonable congestion window, even
    under pathological conditions.  The bounding parameter algorithms
    have no effect on TReno under normal conditions.  If the bounding
    parameters are invoked, they are instrumented and reported as an
    exceptional network event.

    One of the bounding parameters sets ssthresh to 1/4 of the
    pre-recovery cwnd.  Thus recovery normally ends with cwnd larger
    than ssthresh, so TReno does not do a one segment slow-start as
    permitted by RFC2001.  However, if more than half a window of data
    was lost, rate halving can arrive at a new cwnd which is smaller
    than ssthresh, resulting in a slow-start up to ssthresh (which would
    be 1/4 the prior value of cwnd).
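
    The interaction between the ssthresh bound and the post-recovery
    window can be illustrated with a short sketch.  The type and
    function names are hypothetical and integer byte arithmetic is
    assumed; the comparison follows the Slow-start condition given in
    Section 3.5.3.

```python
from typing import NamedTuple


class RecoveryResult(NamedTuple):
    cwnd: int         # bytes, after Rate-Halving
    ssthresh: int     # bytes, 1/4 of the pre-recovery cwnd
    slow_start: bool  # True if Slow-start follows recovery


def end_of_recovery(old_cwnd: int, loss: int) -> RecoveryResult:
    """Window state at the end of recovery under the stated rules."""
    ssthresh = old_cwnd // 4
    cwnd = (old_cwnd - loss) // 2
    return RecoveryResult(cwnd, ssthresh, cwnd <= ssthresh)


# Light loss: cwnd stays above ssthresh, so there is no Slow-start.
light = end_of_recovery(old_cwnd=40_000, loss=4_000)
# More than half the window lost: cwnd falls below ssthresh.
heavy = end_of_recovery(old_cwnd=40_000, loss=24_000)
```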

  3.5.2 Retransmission Timeouts

    The current version of TReno does not include an accurate model for
    the TCP retransmission timer.  Under nearly all normal conditions
    the timers in TReno are much more conservative than real TCP
    implementations.  TReno takes the view that timeouts indicate a
    failure to attain a CAC measurement, which is an abnormality in the
    network that should be diagnosed.  TReno does not experience
    timeouts unless an entire window of data is lost.

  3.5.3 Slow-Start

    TReno invokes Slow-start if cwnd is equal to or less than ssthresh.
    Unlike most TCP implementations this condition is not normally true
    at the end of recovery.

  3.5.4 Advanced Recovery Algorithms

    The algorithm used by TReno to emulate the TCP reassembly queue
    naturally emulates SACK [RFC2018] with the Forward Acknowledgment
    Algorithm [Mathis96] as updated by [facknote].

  3.5.5 Segment Size

    TReno can dynamically discover the correct Maximum Segment Size
    through path MTU discovery.  A smaller MTU can be explicitly
    selected.

  3.6 Ancillary results:

	@@@ expand

        - Statistics over the entire test
          (data transferred, duration and average rate)
        - Statistics over the Congestion Avoidance portion of the test
          (data transferred, duration and average rate)
        - Path property statistics (MTU, minimum RTT, maximum congestion
          window during Congestion Avoidance and during Slow-start)
        - Direct measures of the analytic model parameters (Number
          of congestion signals, average RTT)
        - Indications of which TCP algorithms must be present to
          attain the same performance.
        - The estimated load/BW/buffering used on the return path
        - Warnings about data transmission abnormalities.
          (e.g. packets out-of-order, events that cause timeouts)
        - Warnings about conditions which may affect metric
          accuracy. (e.g. insufficient tester buffering)
        - Alarms about serious data transmission abnormalities.
          (e.g. data duplicated in the network)
        - Alarms about internal inconsistencies of the tester and
          events which might invalidate the results.
        - IP address/name of the responding target.
        - TReno version.

  3.7 Manual calibration checks:

    The following discussion assumes that the TReno diagnostic is
    implemented as a user mode program running under a standard
    operating system.  Other implementations, such as those in dedicated
    measurement instruments, can have stronger built-in calibration
    checks.

  3.7.1 Tester performance
    
    Verify that the tester and target have sufficient data rates to
    sustain the test.

    The raw performance (data rate) limitations of both the tester and
    target should be measured by running TReno in a controlled
    environment (e.g. a bench test).  Ideally the observed performance
    limits should be validated by determining the nature of the
    bottleneck and verifying that it agrees with other benchmarks of the
    tester and target (e.g. that TReno performance agrees with direct
    measures of backplane or memory bandwidth or other bottleneck as
    appropriate).  Currently no routers are reliable targets, although
    under some conditions they can be used for meaningful measurements.
    When testing between a pair of modern computer systems at a few
    megabits per second or less, the tester and target are unlikely to
    be the bottleneck.

    TReno may be less accurate at average rates above half of the known
    tester or target limits.  This is because during the initial
    Slow-start TReno needs to send bursts which are twice the average
    data rate.
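
    The half-of-limit rule above lends itself to a simple check once
    the bench test limits are known.  The following is only a sketch;
    the function and parameter names are hypothetical and all rates are
    assumed to be in bits per second.

```python
def rate_is_within_tester_limits(avg_rate_bps: float,
                                 tester_limit_bps: float,
                                 target_limit_bps: float) -> bool:
    """True if the average rate is at most half of the lower limit.

    Slow-start bursts run at about twice the average rate, so a rate
    above half of either limit may be less accurate.
    """
    return avg_rate_bps <= 0.5 * min(tester_limit_bps, target_limit_bps)


# Example: bench limits of 100 Mbit/s (tester) and 80 Mbit/s (target).
ok = rate_is_within_tester_limits(30e6, 100e6, 80e6)
suspect = not rate_is_within_tester_limits(45e6, 100e6, 80e6)
```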

    Likewise, if the link to the first hop is not more than twice as
    fast as the entire path, some of the path properties such as max
    congestion window during Slow-start may reflect the tester's link
    interface, and not the path itself.

  3.7.2 Tester Buffering

    Verify that the tester and target have sufficient buffering to
    support the window needed by the test.

    If they do not have sufficient buffer space, then losses at their
    own queues may contribute to the apparent losses along the path.
    There are several difficulties in verifying the tester and target
    buffer capacity.  First, there are no good tests of the target's
    buffer capacity at all.  Second, all validation of the tester's
    buffering depends in some way on the accuracy of reports by the
    tester's own operating system.  Third, there is the confusing result
    that under many circumstances (particularly when there is much more
    than sufficient average tester performance) insufficient buffering
    in the tester does not adversely impact measured performance.

    TReno reports (as calibration alarms) any events in which transmit
    packets were refused due to insufficient buffer space.  It reports a
    warning if the maximum measured congestion window is larger than the
    reported buffer space.  Although these checks are likely to be
    sufficient in most cases they are probably not sufficient in all
    cases, and will be the subject of future research.
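
    The two buffering checks just described can be sketched as below.
    This is an illustration under assumptions: the function name, its
    parameters and the message strings are hypothetical and do not
    reproduce the actual TReno output.

```python
def buffering_reports(refused_transmits: int,
                      max_cwnd_bytes: int,
                      reported_buffer_bytes: int) -> list:
    """Return calibration messages for the tester buffering checks."""
    reports = []
    if refused_transmits > 0:
        # Any refused transmit is treated as a calibration alarm.
        reports.append("ALARM: transmit refused for lack of buffer space")
    if max_cwnd_bytes > reported_buffer_bytes:
        # The window outgrew what the operating system says it can hold.
        reports.append("WARNING: max cwnd exceeds reported buffer space")
    return reports


clean = buffering_reports(0, 32_768, 65_536)
noisy = buffering_reports(3, 131_072, 65_536)
```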

    Note that on a timesharing or multi-tasking system, other activity
    on the tester introduces burstiness due to operating system
    scheduler latency.  Since some queuing disciplines discriminate
    against bursty sources, it is important that there be no other
    system activity during a test.  This should be confirmed with other
    operating system specific tools.

  3.7.3 Return Path performance

    Verify that the return path is not a bottleneck at the load needed
    to sustain the test.

    In ICMP mode TReno measures the net effect of both the forward and
    return paths on a single data stream.  Bottlenecks and packet losses
    in the forward and return paths are treated equally.

    In traceroute mode, TReno computes and reports the load it
    contributes to the return path.  Unlike real TCP, TReno cannot
    distinguish between losses on the forward and return paths, so
    ideally we want the return path to introduce as little loss as
    possible.  A good way to test whether the return path has a large
    effect on a measurement is to reduce the forward path messages down
    to ACK size (40 bytes), and verify that the measured packet rate is
    improved by at least a factor of two.  [More research is needed.]
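
    One way to apply this check to two measurements is sketched below.
    The names are hypothetical, both rates are assumed to be in packets
    per second, and reading a passing result as "the return path is
    probably not the limiting factor" is our interpretation of the
    test rather than an established rule.

```python
def return_path_check_passes(full_size_pps: float,
                             ack_size_pps: float) -> bool:
    """True if 40 byte probes at least double the measured packet rate."""
    return ack_size_pps >= 2.0 * full_size_pps


# Example: 800 packets/s at full size, then 2000 packets/s at ACK size.
passes = return_path_check_passes(800.0, 2000.0)
# Little improvement suggests the return path affects the measurement.
fails = not return_path_check_passes(800.0, 1000.0)
```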

  3.8 Discussion:

    There are many possible reasons why a TReno measurement might not
    agree with the performance obtained by a TCP-based application.
    Some key ones include: older TCPs missing key algorithms such as MTU
    discovery, support for large windows or SACK, or mistuning of
    either the data source or sink.  Network conditions which require
    the newer TCP algorithms are detected by TReno and reported in the
    ancillary results.  Other documents will cover methods to diagnose
    the difference between TReno and TCP performance.

    People using the TReno metric as part of procurement documents
    should be aware that in many circumstances MTU has an intrinsic
    and large impact on overall path performance.  Under some
    conditions the difficulty in meeting a given performance
    specification is inversely proportional to the square of the
    path MTU.  (e.g. Halving the specified MTU makes meeting the
    bandwidth specification 4 times harder.)
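
    The square relationship can be seen from the model in [Mathis97b],
    in which the rate is approximately (MSS/RTT) * C/sqrt(p) for loss
    probability p.  The sketch below solves that model for the largest
    tolerable p.  It is an illustration only: the function name is
    hypothetical, the MSS is used as a stand-in for the MTU (headers
    are ignored), and the value of C is an assumption that does not
    affect the ratio.

```python
import math

# Assumption: constant of proportionality from the simple derivation
# in [Mathis97b]; its exact value cancels out of the ratio below.
C = math.sqrt(3.0 / 2.0)


def max_loss_rate(target_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Largest loss probability p at which the model rate
    (MSS/RTT) * C/sqrt(p) still meets target_bps."""
    return (C * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# Same 10 Mbit/s target and 70 ms RTT, with the segment size halved.
p_full = max_loss_rate(10e6, 1460, 0.070)
p_half = max_loss_rate(10e6, 730, 0.070)
ratio = p_full / p_half  # factor by which the tolerable loss shrinks
```

    Under this model the ratio is 4 (up to floating point rounding):
    with half the segment size the path must deliver one quarter of
    the loss rate to meet the same bandwidth specification.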

    When used as an end-to-end metric TReno presents exactly the same
    load to the network as a properly tuned state-of-the-art bulk TCP
    stream between the same pair of hosts.  Although the connection
    is not transferring useful data, it is no more wasteful than
    fetching an unwanted web page with the same transfer time.

  References

    [Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
    Proceedings of SIGCOMM '88, Stanford, CA., August 1988.

    [Mathis96] Mathis, M. and Mahdavi, J., "Forward Acknowledgment:
    Refining TCP Congestion Control", Proceedings of ACM SIGCOMM '96,
    Stanford, CA., August 1996.

    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
    Selective Acknowledgment Options", 1996.  Obtain via:
    ftp://ds.internic.net/rfc/rfc2018.txt

    [Mathis97a] Mathis, M., TReno source distribution, Obtain via:
    ftp://ftp.psc.edu/pub/networking/tools/treno.shar

    [Mathis97b] Mathis, M., Semke, J., Mahdavi, J., Ott, T.,
    "The Macroscopic Behavior of the TCP Congestion Avoidance
    Algorithm", Computer Communications Review, 27(3), July 1997.

    [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance,
    Fast Retransmit, and Fast Recovery Algorithms",
    ftp://ds.internic.net/rfc/rfc2001.txt

    [facknote] Mathis, M., Mahdavi, J., "TCP Rate-Halving with Bounding
    Parameters", http://www.psc.edu/networking/papers/FACKnotes/current/

  Author's Address

    Matt Mathis
    email: mathis@psc.edu
    Pittsburgh Supercomputing Center
    4400 Fifth Ave.
    Pittsburgh PA 15213