Transport Area Working Group                                    S. Baset
Internet-Draft                                            H. Schulzrinne
Intended status: Experimental                        Columbia University
Expires: November 8, 2009                                    May 7, 2009


                              TCP-over-UDP
                   draft-baset-tsvwg-tcp-over-udp-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 8, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal


Baset & Schulzrinne     Expires November 8, 2009                [Page 1]

Internet-Draft              Abbreviated Title                   May 2009


   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   We present TCP-over-UDP (ToU), an instance of TCP on top of UDP.  It
   provides exactly the same congestion control, flow control,
   reliability, and extension mechanisms as offered by TCP.  It is
   intended for use in scenarios where applications running on two hosts
   may not be able to establish a direct TCP connection but are able to
   exchange UDP packets.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Conventions  . . . . . . . . . . . . . . . . . . . . . . .  4
     1.2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Model of Operation . . . . . . . . . . . . . . . . . . . . . .  4
     2.1.  Setup and tear down  . . . . . . . . . . . . . . . . . . .  4
     2.2.  Connection tracking  . . . . . . . . . . . . . . . . . . .  4
   3.  Congestion Control, Flow Control, and Reliability  . . . . . .  4
   4.  Header Format  . . . . . . . . . . . . . . . . . . . . . . . .  5
   5.  ToU, TLS, and DTLS . . . . . . . . . . . . . . . . . . . . . .  6
   6.  Implementation Guidelines  . . . . . . . . . . . . . . . . . .  7
   7.  Design Alternatives  . . . . . . . . . . . . . . . . . . . . .  7
     7.1.  Simplified TCP . . . . . . . . . . . . . . . . . . . . . .  7
     7.2.  TCP-like mechanism within an application layer protocol  .  8
     7.3.  Tunneling  . . . . . . . . . . . . . . . . . . . . . . . .  8
     7.4.  TFRC . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     7.5.  SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  9
   10. Security Considerations  . . . . . . . . . . . . . . . . . . .  9
   11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     11.1. Normative References . . . . . . . . . . . . . . . . . . . 10
     11.2. Informative References . . . . . . . . . . . . . . . . . . 10
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11




1.  Introduction

   Applications running on hosts behind restrictive network address
   translators (NATs) may not be able to establish a direct TCP
   connection with each other.  Instead, these applications must
   establish a TCP connection with a reachable host, which relays the
   traffic of the application on the first host to the application on
   the second host and vice versa.  While this works, it is undesirable
   because it creates a dependency on a reachable host.  With certain
   NAT types, even though the applications cannot establish a direct
   TCP connection, they may be able to exchange UDP traffic by using
   techniques such as ICE-UDP [I-D.ietf-mmusic-ice].  Thus, using UDP
   is attractive for such applications as it removes the dependency on
   a reachable host.  However, these applications require that the
   underlying transport be reliable.  Further, these applications may
   run on machines with heterogeneous network connectivity, thereby
   requiring flow control.  UDP does not provide reliability,
   congestion control, or flow control semantics.  Therefore, these
   applications can either use TCP with a reachable host, or invent
   their own transport protocol that provides reliability, congestion
   control, and flow control in order to establish a direct connection.

   We present TCP-over-UDP (ToU), a transport protocol on top of UDP
   that provides reliability, congestion control, and flow control.
   The idea is that TCP is a well-designed transport protocol that
   provides reliability, congestion control, and flow control
   mechanisms, and these mechanisms must be reused as much as possible.
   Further, a transport protocol that provides reliability and flow
   control mechanisms must not be tied to a specific application and
   must be designed to provide modular functionality.  To accomplish
   this, ToU uses almost the same header as TCP, which makes it easy to
   incorporate TCP's reliability and congestion control algorithms as
   defined in the TCP congestion control document
   [I-D.ietf-tcpm-rfc2581bis].  In essence, ToU is not a new protocol
   but merely an instance (or profile) of TCP over UDP minus the TCP
   checksum, urgent data, and PSH flag.

   We think that our approach is attractive for several reasons.  First,
   we are not proposing a new congestion control algorithm.  Designing
   new congestion control algorithms is complex and requires a large
   validation effort.  Second, our approach takes advantage of existing
   user-level TCP implementations (such as Daytona [Daytona] and MINET
   [MINET]) and TCP-over-UDP implementations (such as atou [atou]).
   Finally, since we are replicating TCP semantics over UDP, including
   the TCP header, any TCP option such as the selective acknowledgement
   (SACK) option [RFC2018], or proposed TCP options such as TCP-Auth
   [I-D.ietf-tcpm-tcp-auth-opt], can be easily incorporated in ToU
   without a new standardization effort.




1.1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   We use terms such as congestion window (cwnd), initial window (IW),
   restart window (RW), receiver window (rwnd), and sender maximum
   segment size (SMSS) as defined in the TCP congestion control
   document [I-D.ietf-tcpm-rfc2581bis].


2.  Model of Operation

   Below, we describe the key ToU operations.

2.1.  Setup and tear down

   Like TCP, ToU uses a three-way handshake to establish a connection.
   Similarly, it follows TCP's semantics in tearing down the connection.
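
   As an illustration of the handshake, the following Python sketch
   lists the three segments exchanged and their sequence and
   acknowledgement numbers, following TCP's rules [RFC0793].  The
   function name and the dictionary representation of a segment are
   assumptions made for this illustration; they are not defined by ToU.

```python
SEQ_MODULUS = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries.

    client_isn and server_isn are the initial sequence numbers chosen by
    each side.  This hypothetical helper models only the numbering, not
    timers, retransmission, or options.
    """
    syn = {"flags": {"SYN"}, "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MODULUS,  # SYN consumes one number
    }
    ack = {
        "flags": {"ACK"},
        "seq": (client_isn + 1) % SEQ_MODULUS,
        "ack": (server_isn + 1) % SEQ_MODULUS,
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

   Each SYN consumes one sequence number, which is why both
   acknowledgement numbers are the peer's initial sequence number plus
   one.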

2.2.  Connection tracking

   A key difference between TCP and UDP is that the former is
   connection-oriented whereas the latter is not.  This means that a ToU
   server must provide a way to keep track of existing connections.  It
   does so through the source IP address and source port of the UDP
   packet.
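
   A minimal sketch of such connection tracking is shown below.  It
   assumes that a server keeps one table per local UDP socket and keys
   it on the (source IP address, source port) pair of each received
   datagram.  The class name, method names, and the per-connection
   fields are hypothetical and serve only as an illustration.

```python
class ConnectionTable:
    """Maps a peer's (IP address, UDP port) pair to its connection state."""

    def __init__(self):
        self._connections = {}

    def get_or_create(self, peer):
        """Return the state for this peer, creating it on first contact."""
        conn = self._connections.get(peer)
        if conn is None:
            # Hypothetical per-connection record; a real implementation
            # would hold sequence numbers, windows, timers, and buffers.
            conn = {"peer": peer, "state": "LISTEN"}
            self._connections[peer] = conn
        return conn

    def remove(self, peer):
        """Forget a connection, for example after it has been torn down."""
        self._connections.pop(peer, None)

    def __len__(self):
        return len(self._connections)


table = ConnectionTable()
first = table.get_or_create(("192.0.2.10", 40000))
again = table.get_or_create(("192.0.2.10", 40000))   # same peer, same state
other = table.get_or_create(("192.0.2.10", 40001))   # different source port
```

   Two datagrams from the same address but different source ports map to
   different connections, which mirrors how TCP distinguishes
   connections that originate from the same host.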


3.  Congestion Control, Flow Control, and Reliability

   ToU follows the TCP congestion control algorithms described in the
   TCP congestion control document [I-D.ietf-tcpm-rfc2581bis].  Thus, a
   ToU sender goes through the slow-start and congestion-avoidance
   phases.  A ToU sender starts with an initial window (IW) following
   the guidelines in RFC 3390 [RFC3390].  During slow start, a ToU
   sender increments the congestion window (cwnd) by at most SMSS bytes
   for each ACK received that cumulatively acknowledges new data.  It
   switches to congestion avoidance when cwnd exceeds the slow start
   threshold (ssthresh).  A ToU receiver generates acknowledgements
   following the guidelines in Section 4.2 of the TCP congestion control
   document [I-D.ietf-tcpm-rfc2581bis].  It immediately generates an ACK
   when an out-of-order segment arrives.  The ToU sender uses the fast
   retransmit algorithm to detect and repair losses, and the fast
   recovery algorithm to govern the transmission of new data until a
   non-duplicate ACK arrives.  When a ToU sender has not received a
   segment for more than one retransmission timeout (RTO), cwnd is
   reduced to the value of the restart window (RW) before transmission
   begins.  The ToU sender may also use the selective acknowledgement
   (SACK) option [RFC2018] to improve loss recovery when multiple
   packets are lost from one window of data.  Like TCP, ToU uses the
   receiver window (rwnd) to achieve flow control.
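
   The window arithmetic described above can be sketched in Python as
   follows.  The formulas are our reading of RFC 3390 and of the TCP
   congestion control document; the class name, method names, default
   SMSS, and initial ssthresh value are assumptions made for this
   illustration.  Retransmission itself and the window inflation that
   takes place during fast recovery are omitted.

```python
def initial_window(smss):
    """Initial window in bytes, per the upper bound given in RFC 3390."""
    return min(4 * smss, max(2 * smss, 4380))


class CongestionState:
    """Hypothetical holder for cwnd and ssthresh of one ToU sender."""

    def __init__(self, smss=1460, ssthresh=65535):
        self.smss = smss
        self.cwnd = initial_window(smss)
        self.ssthresh = ssthresh  # assumption: an arbitrarily high start

    def on_new_ack(self, bytes_acked):
        """An ACK arrived that cumulatively acknowledges new data."""
        if self.cwnd <= self.ssthresh:
            # Slow start: grow by at most SMSS bytes per ACK.
            self.cwnd += min(bytes_acked, self.smss)
        else:
            # Congestion avoidance: roughly one SMSS per round-trip time.
            self.cwnd += max(1, self.smss * self.smss // self.cwnd)

    def on_third_duplicate_ack(self, flight_size):
        """Fast retransmit: halve the window instead of collapsing it."""
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.ssthresh + 3 * self.smss

    def on_recovery_ack(self):
        """A non-duplicate ACK ends fast recovery and deflates cwnd."""
        self.cwnd = self.ssthresh

    def on_retransmission_timeout(self, flight_size):
        """RTO expiry: fall back to one segment and slow start again."""
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.smss

    def on_idle_restart(self):
        """Idle for more than one RTO: reduce cwnd to the restart window."""
        self.cwnd = min(initial_window(self.smss), self.cwnd)

    def usable_window(self, rwnd):
        """Flow control: never exceed the receiver's advertised window."""
        return min(self.cwnd, rwnd)


state = CongestionState(smss=1460)
state.on_new_ack(1460)          # slow start adds one SMSS
```

   The sender may transmit no more than the minimum of cwnd and rwnd,
   which is how congestion control and flow control combine.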


4.  Header Format

   The ToU header is like a TCP header [RFC0793] except that it does not
   include the source port, destination port, and checksum, as they are
   already included in the UDP header.  The ToU header also does not
   include the 1-bit PSH flag and the 1-bit Urgent flag; the bits
   corresponding to these flags are reserved in the ToU header.
   Further, it does not include the 16-bit Urgent Pointer.  Between the
   sequence number and the acknowledgement number, we have inserted a
   32-bit magic cookie that allows ToU to be demultiplexed from other
   UDP-based protocols such as STUN [RFC5389].  The rest of the fields
   in a ToU header have exactly the same meaning as those in a TCP
   header.  The size of the fixed ToU header is 16 bytes, whereas the
   size of the fixed TCP header is 20 bytes.  The fixed ToU header and
   the UDP header have a cumulative size of 24 bytes, four more than a
   fixed TCP header.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Sequence Number                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Magic Cookie                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Acknowledgment Number                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Data |             |A| |R|S|F|                               |
      | Offset|  Reserved   |C|R|S|Y|I|            Window             |
      |       |             |K| |T|N|N|                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Options                    |    Padding    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             data                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                       Header for TCP-over-UDP (ToU)

                                 Figure 1




   Since the ToU header fields other than the magic cookie have the same
   meaning as in TCP, we have borrowed their descriptions from the TCP
   RFC [RFC0793].

   Sequence Number (32-bits):  Same as a TCP sequence number.

   Magic Cookie (32-bits):  A fixed value of 0x7194B32E in network byte
      order to demultiplex ToU from other application layer protocols.

   Acknowledgement Number (32-bits):  Same as a TCP acknowledgement
      number.

   Data offset (4-bits):  The number of 32-bit words in the ToU header.
      Like a TCP header, the ToU header is an integral number of 32-bit
      words long.

   Reserved (7-bits):  Reserved for future use.  Must be zero.

   Control Bits (5-bits):  From left to right.  Four bits are used and
      one is reserved.  Unlike TCP, the Urgent and PSH bits are
      excluded.

      ACK: Acknowledgment field significant

      R: Reserved in ToU.  In the TCP header, it is used for the PSH
      function.

      RST: Reset the connection

      SYN: Synchronize sequence numbers

      FIN: No more data from sender

   Window (16-bits):  Same as the window in TCP header.  The number of
      data octets beginning with the one indicated in the acknowledgment
      field which the sender of this segment is willing to accept.

   Options:  Same as TCP options.

   Padding:  Like TCP, the ToU header padding is used to ensure that the
      ToU header ends and data begins on a 32-bit boundary.  The padding
      is composed of zeros.
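
   The layout in Figure 1 can be encoded and decoded with a few lines of
   Python, as in the sketch below.  The flag bit positions are read from
   Figure 1; the function names, constant names, and the dictionary
   returned by the parser are assumptions made for this illustration.
   Options are treated as opaque bytes.

```python
import struct

TOU_MAGIC_COOKIE = 0x7194B32E      # fixed value given in the field list
FIXED_HEADER_LEN = 16              # bytes, without options
_FIXED = struct.Struct("!IIIHH")   # seq, cookie, ack, offset+flags, window

# Bit positions within the 16-bit word that precedes the window field.
FLAG_ACK, FLAG_RST, FLAG_SYN, FLAG_FIN = 0x10, 0x04, 0x02, 0x01
_FLAG_MASK = FLAG_ACK | FLAG_RST | FLAG_SYN | FLAG_FIN


def pack_header(seq, ack, flags, window, options=b""):
    """Build a ToU header; options are zero-padded to a 32-bit boundary."""
    padded = options + b"\x00" * (-len(options) % 4)
    data_offset = (FIXED_HEADER_LEN + len(padded)) // 4  # 32-bit words
    if data_offset > 0xF:
        raise ValueError("options too long for the 4-bit data offset")
    # Reserved bits, including the former URG and PSH positions, stay zero.
    offset_and_flags = (data_offset << 12) | (flags & _FLAG_MASK)
    fixed = _FIXED.pack(seq, TOU_MAGIC_COOKIE, ack, offset_and_flags, window)
    return fixed + padded


def unpack_header(datagram):
    """Parse a UDP payload; raise ValueError if it is not a ToU segment."""
    if len(datagram) < FIXED_HEADER_LEN:
        raise ValueError("datagram shorter than the fixed ToU header")
    seq, cookie, ack, offset_and_flags, window = _FIXED.unpack_from(datagram)
    if cookie != TOU_MAGIC_COOKIE:
        raise ValueError("magic cookie mismatch: not a ToU segment")
    header_len = (offset_and_flags >> 12) * 4
    if not FIXED_HEADER_LEN <= header_len <= len(datagram):
        raise ValueError("invalid data offset")
    return {
        "seq": seq,
        "ack": ack,
        "window": window,
        "flags": offset_and_flags & _FLAG_MASK,
        "options": datagram[FIXED_HEADER_LEN:header_len],  # incl. padding
        "data": datagram[header_len:],
    }


segment = pack_header(100, 200, FLAG_SYN | FLAG_ACK, 4096) + b"payload"
parsed = unpack_header(segment)
```

   A receiver that shares its UDP port with another protocol can use the
   ValueError raised on a cookie mismatch as the signal to hand the
   datagram to that protocol's parser instead.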


5.  ToU, TLS, and DTLS

   Transport Layer Security (TLS) [RFC5246] and Datagram Transport Layer
   Security (DTLS) [RFC4347] protocols provide privacy and data
   integrity between two communicating applications.  TLS is layered on
   top of some reliable transport protocol such as TCP, whereas DTLS
   only assumes a datagram service.  A question is what the layering
   relationship between the ToU protocol, TLS, and DTLS should be.
   Figure 2 shows four possible options.  We think that Option-2 and
   Option-4 are not feasible since the ToU layer would have to be made
   aware of the size of the headers that the TLS and DTLS protocols may
   add.  Since ToU provides the same reliable and in-order delivery
   semantics as TCP, we prefer Option-1, in which TLS is layered on top
   of ToU, over Option-3.


      +-+-+-+-+   +-+-+-+-+   +-+-+-+-+   +-+-+-+-+
      |  TLS  |   |  ToU  |   |  DTLS |   |  ToU  |
      +-+-+-+-+   +-+-+-+-+   +-+-+-+-+   +-+-+-+-+
      |  ToU  |   |  TLS  |   |  ToU  |   |  DTLS |
      +-+-+-+-+   +-+-+-+-+   +-+-+-+-+   +-+-+-+-+
      |  UDP  |   |  UDP  |   |  UDP  |   |  UDP  |
      +-+-+-+-+   +-+-+-+-+   +-+-+-+-+   +-+-+-+-+
      Option-1    Option-2    Option-3    Option-4


                    Layering options for ToU, TLS, DTLS

                                 Figure 2


6.  Implementation Guidelines

   From the implementer's perspective, the use of ToU should be as
   modular as possible.  One way to achieve this modularity is to
   implement ToU as a user-level library that provides socket-like
   function calls to the applications.  The library may have its own
   thread of execution and can be instantiated at the start of the
   program.  The library implements the reliable, in-order, congestion
   control, and flow control semantics of TCP.  Applications can
   interact with the ToU library through socket-like function calls.
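
   The following Python skeleton illustrates the structure suggested
   above: a library object that owns a UDP socket, runs its own receiver
   thread, and exposes socket-like calls.  It is only a sketch under
   stated assumptions.  The class and method names are hypothetical, and
   the skeleton passes datagrams through unchanged; the reliability,
   ordering, congestion control, and flow control logic that a ToU
   library must implement is deliberately left out.

```python
import queue
import socket
import threading


class TouLibrary:
    """Hypothetical skeleton: one UDP socket plus one receiver thread."""

    def __init__(self, bind_addr=("127.0.0.1", 0)):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(bind_addr)
        self._sock.settimeout(0.1)      # lets the thread notice shutdown
        self._inbox = queue.Queue()     # datagrams for the application
        self._running = threading.Event()
        self._running.set()
        self._thread = threading.Thread(target=self._receive_loop,
                                        daemon=True)
        self._thread.start()

    @property
    def address(self):
        """Local (IP address, port) pair the library is bound to."""
        return self._sock.getsockname()

    def _receive_loop(self):
        # A full implementation would parse the ToU header here and run
        # the TCP state machine before delivering in-order data.
        while self._running.is_set():
            try:
                data, peer = self._sock.recvfrom(65535)
            except socket.timeout:
                continue
            except OSError:
                break
            self._inbox.put((data, peer))

    def sendto(self, data, peer):
        """Socket-like send call used by the application."""
        self._sock.sendto(data, peer)

    def recvfrom(self, timeout=None):
        """Socket-like receive call; blocks until a datagram is queued."""
        return self._inbox.get(timeout=timeout)

    def close(self):
        """Stop the receiver thread and release the socket."""
        self._running.clear()
        self._thread.join()
        self._sock.close()
```

   An application would create one such object at program start, use
   sendto() and recvfrom() much like the calls of a UDP socket, and call
   close() on exit.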


7.  Design Alternatives

   ToU is strictly meant for scenarios where end-points desire to
   establish a TCP connection but are unable to do so due to the
   presence of NATs and firewalls.  Below, we briefly discuss the design
   alternatives.

7.1.  Simplified TCP

   It may be argued that TCP semantics are too complicated and that it
   might be easier to define a protocol that adds retransmission of
   individual UDP packets, ACK mechanisms, and a sequencing layer.
   However, unless one is content with stop-and-wait congestion control
   (and roughly modem data rates), it is necessary for a transport
   protocol to have AIMD or rate-based congestion control (TFRC).  As
   discussed in Section 7.4, rate-based congestion control is not
   suitable for mid-sized transfers and is not any simpler than AIMD.
   Further, since hosts may have heterogeneous network connectivity, a
   transport protocol needs to provide flow control.  Moreover, it may
   not be easy to validate a new transport protocol that provides only
   a subset of TCP semantics.

7.2.  TCP-like mechanism within an application layer protocol

   In this approach, key TCP mechanisms such as reliability, congestion
   control, and flow control are designed as part of the application
   layer protocol.  This approach has several disadvantages.  First,
   every application layer protocol that is unable to establish TCP
   connections in the presence of NATs and firewalls but can use UDP
   will need to invent its own transport protocol providing
   reliability, congestion control, and flow control.  Second, it is
   non-trivial to get the first implementations of a conceptually new
   protocol right.  Third, any new transport protocol, even if it is
   specified within an application layer protocol, must undergo a large
   validation effort.  Finally, most long-term successful protocols are
   those that provide modular functionality, not extremely narrowly
   tailored protocols.

7.3.  Tunneling

   Another design option is to provide a VPN-like tunneling option for
   sending and receiving TCP packets over UDP.  This is conceivable as
   follows.  An application uses the regular TCP socket calls, which
   make use of the TCP stack.  Just before the transmission of the
   packet, a module or a virtual Ethernet driver intercepts the packet
   and sends the TCP packet along with its payload over UDP.
   Similarly, when a packet is received over UDP, the virtual Ethernet
   driver checks whether it is an encapsulated TCP packet and, if so,
   passes it to the appropriate kernel-level TCP handler.

   This approach is not desirable for several reasons.  First, it
   creates a dependency on a kernel-level module or a virtual Ethernet
   driver that must capture TCP packets before transmission and
   immediately upon reception.  Kernel-level modules and virtual
   Ethernet drivers require root access to a machine.  Peer-to-peer
   applications, which are user space applications, are expected to be
   the main users of ToU.  It is unrealistic to create a dependency
   between these user space applications and a kernel-level module.
   Second, sending a full-sized TCP segment over UDP may cause
   fragmentation.  Lastly, other UDP-based protocols such as STUN may
   need to be run on the same port as the tunneling port, which can
   complicate the disambiguation of these protocols from the tunneled
   TCP.

7.4.  TFRC

   TFRC [RFC5348] is a congestion control mechanism (not a protocol)
   that is designed for long-lived media streams.  Its main benefit is
   smoothing the sending rate of these media streams.  It does not
   provide any packet formats, reliability, or flow control.  Its
   congestion control mechanism is not suited for exchanging data
   objects that range from a few dozen to a few hundred packets.  The
   reason is that TFRC is based on estimating loss rates within 8 loss
   intervals.  With a loss rate of 1%, this translates, very roughly,
   into 800 packets or roughly 800 kB, before a reliable estimate of a
   better (higher) rate is computed.  Further, its main benefit,
   smoothing rates, is of no importance to applications desiring to
   replicate TCP functionality over UDP.
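
   The rough estimate above can be reproduced with the short Python
   calculation below.  It assumes that each loss interval spans about
   1/p packets at loss rate p, and that a packet carries roughly 1000
   bytes; the packet size is our assumption, implied by the equivalence
   of 800 packets and 800 kB.  The function name is hypothetical.

```python
def packets_before_estimate(loss_rate, loss_intervals=8):
    """Approximate number of packets covered by the loss history.

    On average one loss occurs every 1/loss_rate packets, so each loss
    interval is about 1/loss_rate packets long.
    """
    return round(loss_intervals / loss_rate)


PACKET_SIZE_BYTES = 1000  # assumption: roughly 1 kB per packet

packets = packets_before_estimate(0.01)
kilobytes = packets * PACKET_SIZE_BYTES // 1000
```

   A data object of a few dozen to a few hundred packets is therefore
   fully transferred before such a history has been collected.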

7.5.  SCTP

   SCTP [RFC4960] is significantly more complicated than TCP in its
   implementation and its performance is generally the same, except in
   circumstances involving head-of-line blocking.  Further, SCTP will
   have trouble getting traction in the consumer and enterprise Internet
   space unless it (also) runs over UDP, as there seem to be few NATs
   that know how to handle SCTP and thus it is effectively unusable by a
   fair fraction of the Internet user population.


8.  Acknowledgements

   The draft incorporates comments from the discussion on the P2PSIP
   mailing list.


9.  IANA Considerations

   TBD.


10.  Security Considerations

   ToU is subject to the same security considerations as TCP.


11.  References




11.1.  Normative References

   [I-D.ietf-tcpm-rfc2581bis]
              Allman, M., "TCP Congestion Control",
              draft-ietf-tcpm-rfc2581bis-04 (work in progress),
              April 2008.

   [I-D.ietf-tcpm-tcp-auth-opt]
              Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", draft-ietf-tcpm-tcp-auth-opt-04
              (work in progress), March 2009.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3390]  Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
              Initial Window", RFC 3390, October 2002.

   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
              Security", RFC 4347, April 2006.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
              October 2008.

11.2.  Informative References

   [Daytona]  Pradhan, P., Kandula, S., Xu, W., Sheikh, A., and E.
              Nahum, "Daytona: A User-Level TCP Stack", 2004,
              <http://nms.lcs.mit.edu/~kandula/data/daytona.pdf>.

   [I-D.ietf-mmusic-ice]
              Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols",
              draft-ietf-mmusic-ice-19 (work in progress), October 2007.

   [MINET]    Dinda, P., "The Minet TCP/IP Stack", 2002,
              <http://cs.northwestern.edu/~pdinda/minet/NWU-CS-02-08.pdf>.

   [atou]     Dunigan, T. and F. Fowler, "A TCP-over-UDP Test Harness",
              2002, <http://www.csm.ornl.gov/~dunigan/atou.ps>.


Authors' Addresses

   Salman A. Baset
   Columbia University
   1214 Amsterdam Avenue
   New York, NY
   USA

   Email: salman@cs.columbia.edu


   Henning Schulzrinne
   Columbia University
   1214 Amsterdam Avenue
   New York, NY
   USA

   Email: hgs@cs.columbia.edu

