Internet Engineering Task Force Mark Allman INTERNET DRAFT BBN/NASA GRC File: draft-ietf-ippm-btc-cap-00.txt February, 2001 Expires: August, 2001 A Bulk Transfer Capacity Methodology for Cooperating Hosts Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This document specifies a specific Bulk Transfer Capacity (BTC) metric based on the BTC framework outlined in [MA00]. 1 Introduction This document specifies a methodology that performs Bulk Transfer Capacity (BTC) measurements based on the BTC framework outlined in [MA00]. This particular methodology assumes cooperating processes on the sender and receiver. As outlined in [MA00], there are a number of considerations that need to be made when writing a particular BTC metric. This document specifies these items for a BTC methodology that uses cooperating processes on the sender and receiver. Readers are assumed to be familiar with [MA00] and [RFC2581]. The terminology used to describe the congestion control algorithms in this document is taken from [RFC2581]. We implemented this methodology in two programs, cap and capd. The discussion in this document is conducted in terms of these two programs. However, alternate programs can be written that conform to this BTC methodology. Expires: August 2001 [Page 1] draft-ietf-ippm-btc-cap-00.txt February 2001 2 Congestion Control Algorithm Specifications As specified in section 2 of [MA00], each BTC document must tightly specify several details of the congestion control algorithms that are not tightly specified in [RFC2581]. The following is the specification of those details for the sender's behavior in the defined methodology: * Window Increase During Congestion Avoidance: During congestion avoidance, cap counts the number of packets that are acknowledged (ACKed) by the cumulative acknowledgment, denoted SA. When SA becomes greater than or equal to the current value of the congestion window (cwnd), SA is decreased by the current value of cwnd and cwnd is increased by 1 segment unless the increase is not possible due to the configured advertised window size. * When To Enter Congestion Avoidance. [RFC2581] allows TCP to use either slow start or congestion avoidance when cwnd equals ssthresh. Cap uses congestion avoidance. * Cap uses a segment size of 1500 bytes by default. The segment size can be changed using a command-line option. Cap does not use Path MTU Discovery [RFC1191]. * By default, cap assumes 40 bytes of header are prepended to each segment (default TCP and IP headers). When using timestamps [RFC1323] the header size is increased by 12 bytes. Additionally, when using selective acknowledgments (SACKs) [RFC2018] the header size on returning ACKs depends on the number of SACK blocks being returned (per [RFC2018]). * The algorithm for calculating the retransmission timeout (RTO) is similar to the algorithm outlined in [RFC2988]. The algorithm is fully specified in section 3. [MA00] recommends each BTC take a number of ancillary metrics, in addition to a simple BTC measurement. Cap does not perform any of these ancillary metrics, but can produce a segment trace which may be used to derive these metrics via post-processing. 3 Calculating the Retransmission Timeout The RTO used in this BTC methodology is generally outlined in [RFC2988]. The following is a sketch of the initial conditions, as well as a discussion of how our estimator differs from the one outlined in [RFC2988]. The reader is assumed familiar with [RFC2988]. Cap takes high-precision round-trip time (RTT) measurements and converts these into a retransmission timeout (RTO) based on a clock with a given granularity. The RTO is initialized as follows: Expires: August 2001 [Page 2] draft-ietf-ippm-btc-cap-00.txt February 2001 * The default clock granularity, G, is 500 ms. However, the clock granularity may be changed via a command-line option. * The initial RTO in clock ticks is: (int)(3 seconds / G). * When cap is started, the first heartbeat is determined by generating a uniform random number between 0-G and subtracting the obtained value from the current time. The time of the first heartbeat is denoted HB_FIRST. * We define bounds on the RTO, as follows: MIN_TICKS = ceil (1.0 / G) MAX_TICKS = ceil (64.0 / G) The RTO is calculated based on RTT measurements. We derive RTT measurements in one of two ways. When the timestamp option is enabled by the user, we use the timestamps in incoming ACKs to take RTT measurements. Otherwise, we time one segment and its corresponding ACK per RTT, as outlined in [RFC2988]. We update the SRTT and RTTVAR upon taking each sample as defined in [RFC2988]. The timer is armed in the situations outlined in [RFC2988]. Each time we are the timer the following algorithm is used to convert the fine-grained SRTT and RTTVAR values to a course-grained RTO estimate. now = get_current_time; if (!SRTT) ticks = 3.0 / G else rto = SRTT + (4 * RTTVAR) ticks = ceil (rto / G) ticks *= BACKOFF if (ticks < MIN_TICKS) ticks = MIN_TICKS else if (ticks > MAX_TICKS) ticks = MAX_TICKS so_far = now - HB_FIRST; so_far_ticks = (int)(so_far / G) gone = so_far - (so_far_ticks * G) partial = G - gone; full = (ticks - 1) * G real_rto = full + partial arm_timer (real_rto) 4 Receiver Specification The receiving process, capd, sends ``ACKs'', UDP datagrams, to the sender with the following properties. * Each ACK contains a cumulative sequence number, as done in TCP. * The default size of an ACK is 40 bytes. Expires: August 2001 [Page 3] draft-ietf-ippm-btc-cap-00.txt February 2001 * In the case when capd echoes the timestamp sent by cap the ACK consists of 52 bytes. * By default ACKs are sent in response to every incoming data segment. * The user may enable the use of delayed ACKs [RFC1122,RFC2581] via a command-line option. 5 Conclusion This document specifies a BTC methodology involving two processes based on the framework outlined in [MA00]. This methodology has been shown to accurately gauge the BTC of a given network path over various network conditions [All01]. 6 Security Considerations The BTC methodology outlined in this document does not pose security problems beyond those expressed in the BTC framework document [MA00]. Acknowledgments Thanks to Vern Paxson for encouraging the development of cap. References [All01] Mark Allman Measuring End-to-End Bulk Transfer Capacity, February 2001. Under review. [MA00] Matt Mathis, Mark Allman. A Framework for Defining Empirical Bulk Transfer Capacity Metrics, February 2001. Internet-Draft draft-ietf-ippm-btc-framework-05.txt (work in progress). [RFC1191] Jeff Mogul, Steve Deering, "Path MTU Discovery", RFC 1191, November 1990. [RFC1323] Van Jacobson, Robert Braden, David Borman, "TCP Extensions for High Performance", RFC 1323, May 1992. [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow, "TCP Selective Acknowledgment Options", RFC 2018, 1996. [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens, "TCP Congestion Control", RFC 2581, April 1999. [RFC2988] Vern Paxson, Mark Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000. Expires: August 2001 [Page 4] draft-ietf-ippm-btc-cap-00.txt February 2001 Author's Address: Mark Allman BBN Technologies/NASA Glenn Research Center Lewis Field 21000 Brookpark Rd. MS 54-5 Cleveland, OH 44135 Phone: 216-433-6586 Fax: 216-433-8705 mallman@bbn.com http://roland.grc.nasa.gov/~mallman Expires: August 2001 [Page 5]