TCP Maintenance and Minor                                  M. Kuehlewind
Extensions (tcpm)                                University of Stuttgart
Internet-Draft                                     R. Scheffenegger, Ed.
Intended status: Standards Track                            NetApp, Inc.
Expires: September 8, 2011                                 March 7, 2011


   Additional negotiation in the TCP Timestamp Option field during the
                             TCP handshake
           draft-scheffenegger-tcpm-timestamp-negotiation-00

Abstract

   RFC 1323 defines the TSecr field of a SYN packet as not valid, so
   this field will always be zero.  This document specifies the use of
   this field to signal and negotiate additional information about the
   content of the TSopt field as well as the behavior of the receiver.
   If the receiver understands this extension, it will use the TSecr
   field of the SYN/ACK to reply.  Otherwise the receiver will ignore
   the TSecr field and set a timestamp in the TSecr field as specified
   in RFC 1323.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 8, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Kuehlewind & Scheffenegger  Expires September 8, 2011           [Page 1]

Internet-Draft            Timestamp Negotiation               March 2011

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Overview
   3.  Definitions
   4.  Signaling
     4.1.  Capability Flags
   5.  Discussion
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   The TCP Timestamps Option (TSopt) provides timestamp echoing for
   Round-trip Time (RTT) measurements.  TSopt is widely deployed and
   activated by default in many systems.
   RFC 1323 [RFC1323] specifies TSopt the following way:

   Kind: 8

   Length: 10 bytes

     +-------+-------+---------------------+---------------------+
     |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
     +-------+-------+---------------------+---------------------+
         1       1              4                     4

                             RFC1323 TSopt

      "The Timestamps option carries two four-byte timestamp fields.
      The Timestamp Value field (TSval) contains the current value of
      the timestamp clock of the TCP sending the option.

      The Timestamp Echo Reply field (TSecr) is only valid if the ACK
      bit is set in the TCP header; if it is valid, it echos a timestamp
      value that was sent by the remote TCP in the TSval field of a
      Timestamps option.  When TSecr is not valid, its value must be
      zero.  The TSecr value will generally be from the most recent
      Timestamp option that was received; however, there are exceptions
      that are explained below.

      A TCP may send the Timestamps option (TSopt) in an initial SYN
      segment (i.e., segment containing a SYN bit and no ACK bit), and
      may send a TSopt in other segments only if it received a TSopt in
      the initial SYN segment for the connection."

   Comparing the timestamp in the TSecr field with the current time
   gives an estimate of the RTT.  RFC 1323 [RFC1323] specifies various
   cases in which more than one timestamp is available to echo.  The
   solution it proposes might not always be the best choice, e.g. when
   the TCP Selective Acknowledgment Option (SACK) is used.  Moreover,
   more and more use cases arise where one-way delay (OWD) measurements
   are needed.  These mechanisms usually misuse TSopt to estimate the
   variation in OWD.  To enable such mechanisms, the TSecr field in the
   TCP SYN packet could be used for additional negotiation.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Overview

3.  Definitions

   The reader is expected to be familiar with the definitions given in
   [RFC1323].

4.  Signaling

   During the initial TCP three-way handshake, timestamp options are
   negotiated using the TSecr field.  A compliant TCP receiver XORs its
   flags with the received TSval and returns the result in the TSecr
   field of the SYN+ACK.  The timestamp option flags MAY only be present
   when the SYN bit is set.

4.1.  Capability Flags

   In order to signal the supported capabilities, the TSecr field is
   overloaded with the following flags and fields during the three-way
   handshake.  If optional capabilities such as the TCP clock range are
   offered, the host needs to keep minimal state in order to decode the
   returned flags, which are XORed with the TSval.

   Kind: 8

   Length: 10 bytes

     +-------+-------+---------------------+---------------------+
     |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
     +-------+-------+---------------------+---------------------+
         1       1              4          |          4          |
                                          /                      |
     .-----------------------------------'                       |
    /                                                             \
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|R|R|B|M|     |  EXP12hi  |  FRAC12hi |  EXP12lo  |  FRAC12lo |
   |X|E|N|I|I| MSK +-----------------------+-----------------------+
   |O|S|G|A|R|     |      RES      |S|  EXP16  |       FRAC16      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         timestamp option flags

   EXO - Extended Options
      Indicates that the sender supports extended timestamp options as
      defined by this document.  MUST be set ("1") by compliant
      implementations.

   RES - Reserved
      Reserved for future use.  MUST NOT be set ("0").
      If a timestamp option is received with this bit set, the receiver
      MUST ignore the extended options field and react as if the flags
      were not set (compatibility mode).

   RNG - Range negotiation
      Indicates that the sender is capable of adjusting the timestamp
      clock rate within the bounds of the two 12 bit fields (see ).
      Only the active sender of a TCP session is allowed to offer a
      range, while the receiver MAY choose a rate within these bounds.

   BIA - Exponent Bias
      When set, the 16 and 12 bit floating point exponents are presented
      with a bias of 21 instead of 15.  This allows negotiation of
      extremely fine-grained timestamp clock resolutions, for example in
      hardware implementations and high speed (>10 Gigabit/s)
      environments.  See section for more details.

   MIR - Always Mirror Timestamp
      To disambiguate segments and aid timing calculations even during
      loss episodes, the timestamp will always be mirrored regardless of
      the state of the receiver.  A sender SHOULD use this option only
      in conjunction with Selective Acknowledgements (SACK [RFC2018]).

   MSK - Mask Timestamps
      If the timestamp is used for congestion control purposes, an
      incentive exists for malicious receivers to reflect tampered
      timestamps.  A sender MAY choose to protect timestamps from such
      modifications by including a fingerprint (a secure hash of some
      kind) in some of the least significant bits.  However, doing so
      would prevent a receiver from using the timestamp for other
      purposes.  The MSK field indicates how many least significant
      nibbles the receiver should exclude when processing a timestamp.
      Note that this does not affect the reflected timestamp in any
      way - TSecr will always be equal to an appropriate TSval.

      Another use case arises when the sender does not support a
      timestamp clock that can guarantee unique timestamps for
      retransmitted segments.  To unambiguously distinguish regular
      from retransmitted segments, the timestamp must be unique for
      otherwise identical segments.
      Reserving the lowest nibble for this purpose allows senders with
      slow running timestamp clocks to make use of this feature.

   S - binary16 Sign
      This is the sign bit of the IEEE 754-2008 binary16 floating point
      representation of the timestamp clock.  Timestamp clocks MUST be
      positive, thus this bit MUST be zero.

   EXP16 - binary16 Exponent
      The exponent component of a binary16 floating point number
      indicating the timestamp clock.  When BIA is not set, the exponent
      bias is 15 (identical to the binary16 definition in IEEE
      754-2008).  If BIA is set, the exponent bias is 21, allowing
      faster timestamp clock rates.  Subnormal numbers (lower
      precision), where the exponent is zero, extend the range to 2^-24
      and 2^-30 respectively.  Infinity and NaN (all exponent bits set)
      MUST NOT be used, and a timestamp option with NaN/Infinity SHOULD
      be ignored.

   FRAC16 - binary16 Fraction
      The fraction component of a binary16 floating point number
      indicating the timestamp clock.  The clock rate is expressed in
      seconds between ticks.  The least significant bit therefore
      corresponds to a time interval of 59.6 ns with the default bias of
      15, and 0.931 ns with the bias set to 21.  The longest time
      interval would be 65504 sec with the default bias, and 511.75 sec
      with the bias set to 21.

   EXP12hi and EXP12lo - binary12 Exponent
      The exponent component of a truncated, 12 bit floating point
      number indicating the possible timestamp clock ranges.  Only the
      host initiating a TCP session MAY offer a timestamp clock range,
      while the receiver SHOULD select a timestamp clock within these
      bounds.  If the receiver cannot adjust its timestamp clock to
      match the range, it MAY use a timestamp clock rate outside these
      bounds.  If the receiver indicates a timestamp clock rate within
      the indicated bounds, the sender MUST set its timestamp clock rate
      to the negotiated rate.
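
   The S, EXP16 and FRAC16 fields described above can be decoded with a
   few lines of code.  The following is a minimal sketch, assuming the
   usual binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits)
   with only the exponent bias switched between 15 and 21.  The function
   name and the use of an exception for rejected values are
   illustrative choices, not part of this document.  Under this reading
   the largest finite value with a bias of 21 evaluates to 1023.5 sec
   rather than the 511.75 sec given in the text, so the intended
   encoding may differ from this sketch.

```python
def decode_clock_interval(bits16: int, bia: bool = False) -> float:
    """Return seconds per tick encoded in S/EXP16/FRAC16.

    Assumption: standard binary16 bit layout; `bia` selects an exponent
    bias of 21 instead of the default 15.
    """
    sign = (bits16 >> 15) & 0x1
    exp = (bits16 >> 10) & 0x1F
    frac = bits16 & 0x3FF
    bias = 21 if bia else 15

    if sign:
        # Timestamp clocks must be positive, so S must be zero.
        raise ValueError("sign bit set")
    if exp == 0x1F:
        # All exponent bits set means Infinity/NaN; the caller should
        # ignore a timestamp option carrying such a value.
        raise ValueError("Infinity/NaN is not a valid clock interval")
    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias.
        return frac * 2.0 ** (1 - bias - 10)
    # Normal number: implicit leading one in front of the fraction.
    return (1.0 + frac / 1024.0) * 2.0 ** (exp - bias)


if __name__ == "__main__":
    for word, bia in ((0x0001, False), (0x0001, True), (0x7BFF, False)):
        print(hex(word), bia, decode_clock_interval(word, bia))
```

   The subnormal branch is what yields the 2^-24 and 2^-30 lower limits
   mentioned for EXP16.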
      If the receiver uses a timestamp clock rate outside the indicated
      bounds, it MUST NOT use timestamps where knowledge of the
      timestamp clock rate is required (i.e. congestion control).  The
      exponent bias is 15 when BIA is not set, and 21 otherwise.

   FRAC12hi and FRAC12lo - binary12 Fraction
      The fraction component of a 12 bit floating point number.
      Subnormal numbers are allowed, while Infinity/NaN MUST NOT be
      used.  Timestamp options with Infinity/NaN values SHOULD be
      ignored.  The smallest representable value is 238 ns with the
      default bias, and 3.73 ns with the bias set to 21, while the
      largest values would be virtually identical to the 16 bit floating
      point values (65024 and 508 sec).

5.  Discussion

   One-way delay (variation) based congestion controls would benefit
   from knowing the clock resolution on both sides.

   RTT variance during loss episodes is not deeply researched.  Current
   heuristics (RFC1122, RFC1323, Karn's algorithm, RFC2988) explicitly
   exclude (and prevent) the use of RTT samples when loss occurs.
   However, solving the retransmission ambiguity problem - and the
   related reliable ACK delivery problem - may allow these algorithms to
   be refined further, and may enable new research to distinguish
   between corruption loss (without RTT / one-way delay impact) and
   congestion loss (with RTT / one-way delay impact).

   Research into this field appears to be rather neglected, especially
   when it comes to large scale, public internet investigations.  By
   their very nature, passive investigations without signals contained
   within the headers are only of limited use in empirical research.

   Retransmission ambiguity detection during loss recovery would allow
   an additional level of loss recovery control without reverting to
   timer-based methods.  As with the deployment of SACK, separating
   "what" to send from "when" to send it could be driven one step
   further.
   In particular, less conservative loss recovery schemes, which do not
   trade principles of packet conservation against timeliness, require a
   reliable way of obtaining prompt and best possible feedback from the
   receiver about any delivered segments and their ordering.  SACK alone
   goes quite a long way, but using Timestamp information in addition
   could remove any ambiguity.  However, the current specification in
   RFC1323 makes that use impossible, so a modified signaling (receiver
   behavior) is a necessity.

6.  Acknowledgements

   The authors would like to thank Dragana Damjanovic for some initial
   thoughts around Timestamps and their extended potential use.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   The algorithm presented in this document shares security
   considerations with [RFC1323].

   Some implementations address the vulnerabilities of [RFC1323] by
   dedicating a few low-order bits of the timestamp fields to a (secure)
   hash that protects against malicious tweaking of TSecr values.  A
   flag field has been provided to transparently notify the receiver
   about that use of low-order bits, so that they can be excluded from
   one-way delay calculations.

9.  References

9.1.  Normative References

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [Chirp]    Kuehlewind, M. and B. Briscoe, "Chirping for Congestion
              Control - Implementation Feasibility", Nov 2010, .

   [I-D.ietf-tcpm-tcp-security]
              Gont, F., "Security Assessment of the Transmission Control
              Protocol (TCP)", draft-ietf-tcpm-tcp-security-02 (work in
              progress), January 2011.
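
Appendix A.  Signaling Sketch (Non-Normative)

   The XOR-based signaling of Section 4 and the flag layout of Section
   4.1 can be illustrated as follows.  This is a minimal sketch under
   stated assumptions: the leftmost bit of the flags diagram (EXO) is
   taken to be the most significant bit of the 32 bit TSecr word, MSK is
   taken to be a 3 bit nibble count, and the remaining 24 bits are left
   undecoded.  All function and type names are hypothetical.

```python
from typing import NamedTuple, Optional

MASK32 = 0xFFFFFFFF


class TsFlags(NamedTuple):
    """Flag bits of the overloaded TSecr word (bit positions assumed)."""
    exo: bool     # extended options supported
    res: bool     # reserved, expected to be zero
    rng: bool     # range negotiation offered
    bia: bool     # exponent bias 21 instead of 15
    mir: bool     # always mirror timestamp
    msk: int      # number of low-order nibbles to exclude
    payload: int  # remaining 24 bits, not decoded in this sketch


def parse_flags(word: int) -> TsFlags:
    """Split a 32 bit flags word, EXO being the most significant bit."""
    return TsFlags(
        exo=bool((word >> 31) & 1),
        res=bool((word >> 30) & 1),
        rng=bool((word >> 29) & 1),
        bia=bool((word >> 28) & 1),
        mir=bool((word >> 27) & 1),
        msk=(word >> 24) & 0x7,
        payload=word & 0xFFFFFF,
    )


def synack_tsecr(receiver_flags: int, syn_tsval: int) -> int:
    """Receiver side: XOR its flags with the TSval seen in the SYN."""
    return (receiver_flags ^ syn_tsval) & MASK32


def decode_synack(tsecr: int, sent_tsval: int) -> Optional[TsFlags]:
    """Sender side: undo the XOR using the TSval it sent in the SYN.

    Returns None when the flags are to be treated as not set, that is
    when EXO is clear or the reserved bit is set.
    """
    flags = parse_flags((tsecr ^ sent_tsval) & MASK32)
    if not flags.exo or flags.res:
        return None
    return flags


def strip_masked_nibbles(tsval: int, msk: int) -> int:
    """Clear the `msk` least significant nibbles of a timestamp."""
    return tsval & ~((1 << (4 * msk)) - 1) & MASK32


if __name__ == "__main__":
    syn_tsval = 0x12345678
    offer = (1 << 31) | (1 << 27) | (2 << 24)  # EXO, MIR, MSK = 2
    print(decode_synack(synack_tsecr(offer, syn_tsval), syn_tsval))
    print(decode_synack(syn_tsval, syn_tsval))  # plain RFC 1323 echo
```

   A receiver that follows RFC 1323 simply echoes the TSval of the SYN,
   so the XOR yields zero, EXO reads as clear, and the sender falls back
   to unmodified timestamp behavior.  The only state the sender needs
   for this is the TSval it placed in its SYN.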
Authors' Addresses

   Mirja Kuehlewind
   University of Stuttgart
   Pfaffenwaldring 47
   Stuttgart  70569
   Germany

   Email: mirja.kuehlewind@ikr.uni-stuttgart.de


   Richard Scheffenegger (editor)
   NetApp, Inc.
   Am Euro Platz 2
   Vienna,   1120
   Austria

   Phone: +43 1 3676811 3146
   Email: rs@netapp.com