Transport Area Working Group                                    S. Baset
Internet-Draft                                            H. Schulzrinne
Intended status: Experimental                        Columbia University
Expires: November 8, 2009                                    May 7, 2009


                              TCP-over-UDP
                   draft-baset-tsvwg-tcp-over-udp-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 8, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Baset & Schulzrinne      Expires November 8, 2009                [Page 1]
Internet-Draft                 Abbreviated Title                May 2009

Abstract

   We present TCP-over-UDP (ToU), an instance of TCP on top of UDP. It provides exactly the same congestion control, flow control, reliability, and extension mechanisms as TCP. It is intended for scenarios in which applications running on two hosts may be unable to establish a direct TCP connection but are able to exchange UDP packets.

Table of Contents

   1.  Introduction
     1.1.  Conventions
     1.2.  Terminology
   2.  Model of Operation
     2.1.  Setup and tear down
     2.2.  Connection tracking
   3.  Congestion Control, Flow Control, and Reliability
   4.  Header Format
   5.  ToU, TLS, and DTLS
   6.  Implementation Guidelines
   7.  Design Alternatives
     7.1.  Simplified TCP
     7.2.  TCP-like mechanism within an application layer protocol
     7.3.  Tunneling
     7.4.  TFRC
     7.5.  SCTP
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1. Introduction

   Applications running on hosts behind restrictive network address translators (NATs) may be unable to establish a direct TCP connection with each other. Instead, these applications must establish a TCP connection with a reachable host, which relays traffic between the application on the first host and the application on the second host. While this works, it is undesirable because it creates a dependency on a reachable host.

   With certain NAT types, even though the applications cannot establish a direct TCP connection, they may be able to exchange UDP traffic by using techniques such as ICE-UDP [I-D.ietf-mmusic-ice]. Using UDP is therefore attractive for such applications, as it removes the dependency on a reachable host. However, these applications require a reliable underlying transport. Further, they may run on machines with heterogeneous network connectivity and therefore require flow control. UDP provides neither reliability, nor congestion control, nor flow control. These applications must therefore either use TCP through a reachable host, or invent their own reliable, congestion-controlled, and flow-controlled transport protocol to establish a direct connection.

   We present TCP-over-UDP (ToU), a reliable, congestion-controlled, and flow-controlled transport protocol that runs on top of UDP. The underlying idea is that TCP is a well-designed transport protocol whose reliability, congestion control, and flow control mechanisms must be reused as much as possible.
   Further, a transport protocol that provides reliability and flow control mechanisms must not be tied to a specific application and must be designed to provide modular functionality. To accomplish this, ToU uses almost the same header as TCP, which makes it easy to incorporate TCP's reliability and congestion control algorithms as defined in the TCP congestion control document [I-D.ietf-tcpm-rfc2581bis]. In essence, ToU is not a new protocol but merely an instance (or profile) of TCP over UDP, minus the TCP checksum, urgent data, and the PSH flag.

   We think that our approach is attractive for several reasons. First, we are not proposing a new congestion control algorithm; designing one is complex and requires a large validation effort. Second, our approach can take advantage of existing user-level TCP implementations (such as Daytona [Daytona] and MINET [MINET]) and TCP-over-UDP implementations (such as atou [atou]). Finally, since ToU replicates TCP semantics over UDP, including the TCP header, existing TCP options such as the selective acknowledgement (SACK) option [RFC2018], or proposed TCP options such as TCP-Auth [I-D.ietf-tcpm-tcp-auth-opt], can easily be incorporated into ToU without a new standardization effort.

1.1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Terminology

   We use terms such as congestion window (cwnd), initial window (IW), restart window (RW), receiver window (rwnd), and sender maximum segment size (SMSS) as defined in the TCP congestion control document [I-D.ietf-tcpm-rfc2581bis].

2. Model of Operation

   Below, we describe the key ToU operations.

2.1. Setup and tear down

   Like TCP, ToU uses a three-way handshake to establish a connection.
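   Because ToU reuses TCP's header fields and control bits, its handshake is TCP's. The following Python sketch walks through the three segments and the TCP rule that a SYN consumes one sequence number. It is a minimal illustration under stated assumptions, not an implementation: the Segment class, the helper functions, and the fixed initial sequence numbers are hypothetical and chosen only for readability. The flag values are bit masks matching the control-bit positions of Figure 1 when the 16-bit Data Offset/Reserved/Control word is read as an integer.

```python
from dataclasses import dataclass

# Control-bit masks within the 16-bit Data Offset/Reserved/Control word
# (FIN is the least significant bit). The R bit (0x08) is reserved in ToU.
ACK, RST, SYN, FIN = 0x10, 0x04, 0x02, 0x01

MOD = 2 ** 32  # sequence numbers are 32 bits, as in TCP


@dataclass
class Segment:
    """Hypothetical container holding only the fields the handshake uses."""
    seq: int
    ack: int
    flags: int


def client_syn(client_isn: int) -> Segment:
    # Step 1: the client sends a SYN carrying its initial sequence number.
    return Segment(seq=client_isn % MOD, ack=0, flags=SYN)


def server_syn_ack(syn: Segment, server_isn: int) -> Segment:
    # Step 2: the server acknowledges the SYN (which consumes one sequence
    # number) and sends its own SYN in the same segment.
    if syn.flags != SYN:
        raise ValueError("expected a bare SYN")
    return Segment(seq=server_isn % MOD, ack=(syn.seq + 1) % MOD,
                   flags=SYN | ACK)


def client_ack(syn: Segment, syn_ack: Segment) -> Segment:
    # Step 3: the client verifies that its SYN was acknowledged, then
    # acknowledges the server's SYN; the connection is now established.
    if syn_ack.flags != SYN | ACK or syn_ack.ack != (syn.seq + 1) % MOD:
        raise ValueError("unexpected SYN-ACK")
    return Segment(seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD, flags=ACK)


syn = client_syn(1000)
syn_ack = server_syn_ack(syn, 5000)
final = client_ack(syn, syn_ack)
print(syn, syn_ack, final, sep="\n")
```

   Teardown reuses the same FIN and ACK semantics as TCP, so the sketch would extend to connection close in the obvious way.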
   Likewise, ToU follows TCP's semantics when tearing down a connection.

2.2. Connection tracking

   A key difference between TCP and UDP is that the former is connection-oriented whereas the latter is not. A ToU server must therefore keep track of existing connections itself. It does so using the source IP address and source port of each received UDP packet.

3. Congestion Control, Flow Control, and Reliability

   ToU follows the TCP congestion control algorithms described in the TCP congestion control document [I-D.ietf-tcpm-rfc2581bis]. Thus, a ToU sender goes through the slow-start and congestion-avoidance phases. A ToU sender starts with an initial window (IW) chosen following the guidelines in RFC 3390 [RFC3390]. During slow start, a ToU sender increments the congestion window (cwnd) by at most SMSS bytes for each ACK received that cumulatively acknowledges new data. It switches to congestion avoidance when cwnd exceeds the slow start threshold (ssthresh).

   A ToU receiver generates acknowledgements following the guidelines in Section 4.2 of the TCP congestion control document [I-D.ietf-tcpm-rfc2581bis]. It generates an ACK immediately when an out-of-order segment arrives. The ToU sender uses the fast retransmit algorithm to detect and repair losses, and the fast recovery algorithm to govern the transmission of new data until a non-duplicate ACK arrives. When a ToU sender has not received a segment for more than one retransmission timeout (RTO), it reduces cwnd to the restart window (RW) before transmission begins.

   The ToU sender may also use the selective acknowledgement (SACK) option [RFC2018] to improve loss recovery when multiple packets are lost from one window of data. Like TCP, ToU uses the receiver window (rwnd) to achieve flow control.

4. Header Format

   The ToU header is like a TCP header [RFC0793] except that it does not include the source port, destination port, and checksum fields, as these are already present in the UDP header. The ToU header also does not include the 1-bit PSH and 1-bit URG flags; the bits corresponding to these flags are reserved in the ToU header. Further, it does not include the 16-bit Urgent Pointer. Between the sequence number and the acknowledgement number, ToU inserts a 32-bit magic cookie that allows ToU to be demultiplexed from other UDP-based protocols such as STUN [RFC5389]. The remaining fields in a ToU header have exactly the same meaning as those in a TCP header.

   The fixed ToU header is 16 bytes, whereas the fixed TCP header is 20 bytes. Together, the fixed ToU header and the 8-byte UDP header occupy 24 bytes, four more than a fixed TCP header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Magic Cookie                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |             |A| |R|S|F|                               |
   | Offset|  Reserved   |C|R|S|Y|I|            Window             |
   |       |             |K| |T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Header for TCP-over-UDP (ToU)

                              Figure 1

   Since the ToU header fields other than the Magic Cookie are the same as TCP's, we have borrowed their descriptions from the TCP specification [RFC0793].

   Sequence Number (32 bits): Same as a TCP sequence number.

   Magic Cookie (32 bits): A fixed value of 0x7194B32E, in network byte order, used to demultiplex ToU from other application layer protocols.

   Acknowledgement Number (32 bits): Same as a TCP acknowledgement number.

   Data Offset (4 bits): The number of 32-bit words in the ToU header. Like a TCP header, the ToU header is an integral number of 32 bits long.

   Reserved (7 bits): Reserved for future use. Must be zero.

   Control Bits (5 bits): From left to right. Unlike TCP, there is no URG bit (its position is part of the Reserved field), and the position that TCP uses for PSH is reserved.

      ACK:  Acknowledgment field significant
      R:    Reserved in ToU (used for the PSH function in the TCP header)
      RST:  Reset the connection
      SYN:  Synchronize sequence numbers
      FIN:  No more data from sender

   Window (16 bits): Same as the window in the TCP header: the number of data octets, beginning with the one indicated in the acknowledgment field, that the sender of this segment is willing to accept.

   Options: Same as TCP options.

   Padding: Like TCP, the ToU header padding is used to ensure that the ToU header ends and data begins on a 32-bit boundary. The padding is composed of zeros.

5. ToU, TLS, and DTLS

   Transport Layer Security (TLS) [RFC5246] and Datagram Transport Layer Security (DTLS) [RFC4347] provide privacy and data integrity between two communicating applications. TLS is layered on top of a reliable transport protocol such as TCP, whereas DTLS only assumes a datagram service. This raises the question of how ToU should be layered relative to TLS and DTLS. Figure 2 shows four possible options. We think that Option-2 and Option-4 are not feasible, since the ToU layer would have to be made aware of the size of the headers that TLS and DTLS may add. Since ToU provides the same reliable, in-order delivery semantics as TCP, we prefer Option-1, in which TLS is layered on top of ToU, over Option-3.
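   Option-1 works because TLS only needs a reliable, in-order byte stream and treats the transport that carries it as opaque. The Python sketch below illustrates this with the standard ssl module's in-memory interface (ssl.MemoryBIO and SSLContext.wrap_bio): the TLS engine writes its records into a buffer, and those bytes are simply handed to a ToU connection as application data, so ToU never needs to know TLS header sizes. The ToUConnection class, its send method, and the host name "peer.example" are hypothetical placeholders for a ToU library's socket-like interface of the kind suggested in Section 6; here ToUConnection merely records bytes, and only the client's first flight (the ClientHello) is produced because no peer is present.

```python
import ssl


class ToUConnection:
    """Hypothetical stand-in for a ToU library's socket-like object.

    A real implementation would segment these bytes, add ToU headers, and
    deliver them reliably and in order over UDP. Here we only record them.
    """

    def __init__(self) -> None:
        self.sent = bytearray()

    def send(self, data: bytes) -> int:
        self.sent.extend(data)
        return len(data)


def pump_tls_to_tou(outgoing: ssl.MemoryBIO, tou: ToUConnection) -> int:
    # Move any TLS records produced by the TLS engine into the ToU byte
    # stream. ToU treats them as opaque payload (layering Option-1).
    data = outgoing.read()
    if data:
        tou.send(data)
    return len(data)


ctx = ssl.create_default_context()
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
# "peer.example" is a placeholder used only for SNI and hostname checking.
tls = ctx.wrap_bio(incoming, outgoing, server_side=False,
                   server_hostname="peer.example")
tou = ToUConnection()

try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    # Expected: the client has written its ClientHello and now waits for the
    # server's reply, which would arrive over ToU and be written to `incoming`.
    pass

pump_tls_to_tou(outgoing, tou)
print(f"handed {len(tou.sent)} bytes of TLS records to ToU")
```

   The same pattern runs in the other direction: bytes received from ToU are written into the incoming buffer and the TLS engine is called again, which is why a reliable, in-order transport such as ToU is sufficient for TLS.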

   +-+-+-+-+     +-+-+-+-+     +-+-+-+-+     +-+-+-+-+
   |  TLS  |     |  ToU  |     |  DTLS |     |  ToU  |
   +-+-+-+-+     +-+-+-+-+     +-+-+-+-+     +-+-+-+-+
   |  ToU  |     |  TLS  |     |  ToU  |     |  DTLS |
   +-+-+-+-+     +-+-+-+-+     +-+-+-+-+     +-+-+-+-+
   |  UDP  |     |  UDP  |     |  UDP  |     |  UDP  |
   +-+-+-+-+     +-+-+-+-+     +-+-+-+-+     +-+-+-+-+
   Option-1      Option-2      Option-3      Option-4

               Layering options for ToU, TLS, DTLS

                            Figure 2

6. Implementation Guidelines

   From the implementer's perspective, ToU should be as modular as possible. One way to achieve this modularity is to implement ToU as a user-level library that provides socket-like function calls to applications. The library may have its own thread of execution and can be instantiated at the start of the program. It implements the reliable, in-order delivery, congestion control, and flow control semantics of TCP, and applications interact with it only through its socket-like function calls.

7. Design Alternatives

   ToU is strictly meant for scenarios where end-points want to establish a TCP connection but are unable to do so due to the presence of NATs and firewalls. Below, we briefly discuss the design alternatives.

7.1. Simplified TCP

   It may be argued that TCP semantics are too complicated and that it might be easier to define a protocol that adds per-packet retransmission, acknowledgements, and sequencing on top of UDP. However, unless one is content with stop-and-wait congestion control (and roughly modem data rates), a transport protocol needs either AIMD or rate-based congestion control (such as TFRC). As discussed in Section 7.4, rate-based congestion control is not suitable for mid-sized transfers and is not any simpler than AIMD. Further, since hosts may have heterogeneous network connectivity, a transport protocol needs to provide flow control. Moreover, it may not be easy to validate a new transport protocol that provides only a subset of TCP semantics.

7.2. TCP-like mechanism within an application layer protocol

   In this approach, key TCP mechanisms such as reliability, congestion control, and flow control are designed as part of the application layer protocol. This approach has several disadvantages. First, every application layer protocol that cannot establish TCP connections in the presence of NATs and firewalls, but can use UDP, will need to invent its own reliable, congestion-controlled, and flow-controlled transport protocol. Second, it is non-trivial to get the first implementations of a conceptually new protocol right. Third, any new transport protocol, even if it is specified within an application layer protocol, must undergo a large validation effort. Finally, most long-term successful protocols are those that provide modular functionality, not extremely narrowly tailored ones.

7.3. Tunneling

   Another design option is a VPN-like tunnel for sending and receiving TCP packets over UDP. This could work as follows. An application uses the regular TCP socket calls, which make use of the kernel's TCP stack. Just before a packet is transmitted, a kernel module or a virtual Ethernet driver intercepts it and sends the TCP packet, along with its payload, over UDP. Similarly, when a packet is received over UDP, the virtual Ethernet driver checks whether it is an encapsulated TCP packet and, if so, passes it to the appropriate kernel-level TCP handler.

   This approach is not desirable for several reasons. First, it creates a dependency on a kernel-level module or a virtual Ethernet driver that must capture TCP packets before transmission and immediately upon reception. Kernel-level modules and virtual Ethernet drivers require root access to a machine. Peer-to-peer applications, which run in user space, are expected to be the main users of ToU, and it is unrealistic to make these applications depend on a kernel-level module.
   Second, sending a full-sized TCP segment over UDP may cause fragmentation, since the encapsulation overhead pushes a packet already sized for the path MTU beyond it. Lastly, other UDP-based protocols such as STUN may need to run on the same port as the tunnel, which can complicate distinguishing these protocols from the tunneled TCP.

7.4. TFRC

   TFRC [RFC5348] is a congestion control mechanism (not a protocol) designed for long-lived media streams. Its main benefit is smoothing the sending rate of such streams. It does not provide any packet formats, reliability, or flow control. Its congestion control mechanism is not suited for exchanging data objects that range from a few dozen to a few hundred packets. The reason is that TFRC estimates the loss rate over 8 loss intervals. With a loss rate of 1%, each interval spans about 100 packets, so this translates, very roughly, into 800 packets (roughly 800 kB, assuming packets of about 1 kB) before a reliable estimate of a better (higher) rate is computed. Further, its main benefit, smoothing rates, is of no importance to applications seeking to replicate TCP functionality over UDP.

7.5. SCTP

   SCTP [RFC4960] is significantly more complicated to implement than TCP, and its performance is generally the same, except in circumstances involving head-of-line blocking. Further, SCTP will have trouble gaining traction in the consumer and enterprise Internet space unless it (also) runs over UDP, as few NATs seem to know how to handle SCTP; it is thus effectively unusable by a fair fraction of the Internet user population.

8. Acknowledgements

   The draft incorporates comments from the discussion on the P2PSIP mailing list.

9. IANA Considerations

   TBD.

10. Security Considerations

   ToU is subject to the same security considerations as TCP.

11. References

11.1. Normative References

   [I-D.ietf-tcpm-rfc2581bis]  Allman, M., "TCP Congestion Control", draft-ietf-tcpm-rfc2581bis-04 (work in progress), April 2008.

   [I-D.ietf-tcpm-tcp-auth-opt]  Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", draft-ietf-tcpm-tcp-auth-opt-04 (work in progress), March 2009.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3390]  Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's Initial Window", RFC 3390, October 2002.

   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security", RFC 4347, April 2006.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, September 2008.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, "Session Traversal Utilities for NAT (STUN)", RFC 5389, October 2008.

11.2. Informative References

   [Daytona]  Pradhan, P., Kandula, S., Xu, W., Sheikh, A., and E. Nahum, "Daytona: A User-Level TCP Stack", 2004, .

   [I-D.ietf-mmusic-ice]  Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", draft-ietf-mmusic-ice-19 (work in progress), October 2007.

   [MINET]  Dinda, P., "The Minet TCP/IP Stack", 2002, .

   [atou]  Dunigan, T. and F. Fowler, "A TCP-over-UDP Test Harness", 2002, .

Authors' Addresses

   Salman A. Baset
   Columbia University
   1214 Amsterdam Avenue
   New York, NY
   USA

   Email: salman@cs.columbia.edu

   Henning Schulzrinne
   Columbia University
   1214 Amsterdam Avenue
   New York, NY
   USA

   Email: hgs@cs.columbia.edu