Internet Engineering Task Force                           Yunhong Gu 
   Internet Draft                                    Robert L. Grossman 
   Intended status: Informational     University of Illinois at Chicago 
   Expires: April 15, 2008                                 October 2007 
    
    
                   UDT: UDP-based Data Transfer Protocol 
                            draft-gg-udt-02.txt 
    
Status of this Memo 
    
   By submitting this Internet-Draft, each author represents that any 
   applicable patent or other IPR claims of which he or she is aware 
   have been or will be disclosed, and any of which he or she becomes 
   aware will be disclosed, in accordance with Section 6 of BCP 79. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt. 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This Internet-Draft will expire on April 15, 2008. 
 
Copyright Notice 
 
   Copyright (C) The IETF Trust (2007). 
    
Abstract 
    
   This document proposes UDT, or UDP based Data Transfer protocol, as 
   an alternative data transfer protocol for the situation when TCP does 
   not work well. One of the most common cases, and also the original 
   motivation of UDT, is to overcome TCP's inefficiency in high 
   bandwidth-delay product (BDP) networks. Another important target use 
   scenario is to allow networking researchers, students, and 
   application developers to easily implement and deploy new data 
   transfer algorithms and protocols. 
    
   UDT is completely based on UDP. UDT is connection oriented, unicast, 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 1] 

                                 UDT                     October 2007 
 
 
   and duplex. It supports both reliable data streaming and partial 
   reliable messaging. The congestion control module is an open 
   framework that can be used to implement and/or deploy different 
   control algorithms. UDT also have a native/default control algorithm 
   based on AIMD rate control. 


Gu, Grossman           Expires - April 15, 2008               [Page 2] 

                                 UDT                     October 2007 
 
 
Table of Contents 
    
   1. Introduction...................................................4 
   2. Packet Structures..............................................5 
   3. UDP Multiplexer................................................7 
   4. Timers.........................................................8 
   5. Connection Setup and shutdown..................................9 
      5.1 Client/Server Connection Setup.............................9 
      5.2 Rendezvous Connection Setup...............................10 
      5.3 Shutdown..................................................11 
   6. Data Sending and Receiving....................................11 
      6.1 The Sender's Algorithm....................................11 
      6.2 The Receiver's Algorithm..................................12 
      6.3 Flow Control..............................................15 
      6.4 Loss Information Compression Scheme.......................15 
   7. Configurable Congestion Control (CCC).........................15 
      7.1 CCC Interface.............................................15 
      7.2 UDT's Native Control Algorithm............................16 
   Security Considerations..........................................17 
   IANA Considerations..............................................18 
   Normative References.............................................18 
   Informative References...........................................18 
   Author's Addresses...............................................19 
   Full Copyright Statement.........................................20 
   Intellectual Property............................................20 
   Acknowledgment...................................................20 
    

Gu, Grossman           Expires - April 15, 2008               [Page 3] 

                                 UDT                     October 2007 
 
         
1. Introduction 
    
   The Transmission Control Protocol (TCP) [RFC2581] has been very 
   successful and greatly contributes to the popularity of today's 
   Internet. Today TCP still contributes the majority of the traffic on 
   the Internet. 
    
   However, TCP is not perfect and it is not designed for every specific 
   applications. In the last several years, with the rapid advance of 
   optical networks and rich Internet applications, TCP has been found 
   inefficient as the network bandwidth delay product (BDP) increases. 
   Its AIMD (additive increase multiplicative decrease) algorithm 
   reduces the TCP congestion window drastically but fails to recover it 
   to the available bandwidth quickly. Theoretical flow level analysis 
   has shown that TCP becomes more vulnerable to packet loss as the BDP 
   increases higher [LM97]. 
    
   To overcome the TCP's inefficiency problem over high speed wide area 
   networks is the original motivation of UDT. Although there are new 
   TCP variants deployed today (for example, BiC TCP [XHR04] on Linux 
   and Compound TCP [TS06] on Windows), certain problems still exist. 
   For example, none of the new TCP variants address RTT unfairness, the 
   situation that connections with shorter RTT consume more bandwidth. 
    
   Moreover, as the Internet continues to evolve, new challenges and 
   requirements to the transport protocol will always emerge. 
   Researchers need a platform to rapidly develop and test new 
   algorithms and protocols. Network researchers and students can use 
   UDT to easily implement their ideas on transport protocols, in 
   particular congestion control algorithms, and conduct experiments 
   over real networks. 
    
   Finally, there are other situations when UDT can be found more 
   helpful than TCP. For example, UDP-based protocol is usually easier 
   for punching NAT firewall. For another example, TCP's congestion 
   control and reliability control is not desirable in certain 
   applications of VOIP, wireless communication, etc. Application 
   developers can modify UDT to suit their requirement. 
    
   Due to all those reasons and motivations described above, we believe 
   that it is necessary to design a well defined and developed UDP-based 
   data transfer protocol. 
    
   As its name suggest, UDT is built solely on the top of UDP [RFC768]. 
   Both data and control packets are transferred using UDP. UDT is 
   connection-oriented in order to easily maintain congestion control, 
   reliability, and security. It is a unicast protocol while multicast 
   is not considered here. Finally, data can be transferred over UDT in 
   duplex. 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 4] 

                                 UDT                     October 2007 
 
 
   UDT supports both reliable data streaming and partial reliable 
   messaging. The data streaming semantics is similar to that of TCP, 
   while the messaging semantics is similar to a subset of SCTP 
   [RFC2960]. 
    
   This document defines UDT's protocol specification. The detailed 
   description and performance analysis can be found in [GG07], and a 
   fully functional reference implementation can be found at [UDT]. 
    
2. Packet Structures 
    
   UDT has two kinds of packets: the data packets and the control 
   packets. They are distinguished by the 1st bit (flag bit) of the 
   packet header. 
    
   The data packet header structure is as following. 
    
   0                   1                   2                   3 
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |0|                     Packet Sequence Number                  | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |FF |O|                     Message Number                      | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                          Time Stamp                           | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                    Destination Socket ID                      | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   The data packet header starts with 0. Packet sequence number uses the 
   following 31 bits after the flag bit. UDT uses packet based 
   sequencing, i.e., the sequence number is increased by 1 for each sent 
   data packet in the order of packet sending. Sequence number is 
   wrapped after it is increased to the maximum number (2^31 - 1). 
    
   The next 32-bit field in the header is for the messaging. The first 
   two bits "FF" flags the position of the packet is a message. "10" is 
   the first packet, "01" is the last one, "11" is the only packet, and 
   "00" is any packets in the middle. The third bit "O" means if the 
   message should be delivered in order (1) or not (0). A message to be 
   delivered in order requires that all previous messages must be either 
   delivered or dropped. The rest 29 bits is the message number, similar 
   to packet sequence number (but independent). A UDT message may 
   contain multiple UDT packets. 
    
   Following are the 32-bit time stamp when the packet is sent and the 
   destination socket ID. The time stamp is a relative value starting 
   from the time when the connection is set up. The time stamp 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 5] 

                                 UDT                     October 2007 
 
 
   information is not required by UDT or its native control algorithm. 
   It is included only in case that a user defined control algorithm may 
   require the information (See Section 6). 
    
   The Destination ID is used for UDP multiplexer. Multiple UDT socket 
   can be bound on the same UDP port and this UDT ID is used to 
   differentiate the UDT connections. 
    
   If the flag bit of a UDT packet is 1, then it is a control packet and 
   parsed according to the following structure. 
    
   0                   1                   2                   3 
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |1| type  |      Reserved       |          ACK Seq. No.         | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |     |                     Message Number                      | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                            Time Stamp                         | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                    Destination Socket ID                      | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                                                               | 
   ~                 Control Information Field                     ~ 
   |                                                               | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   There are 8 types of control packets in UDT and the type information 
   is put in bit field 1 - 4 of the header. The contents of the 
   following fields depend on the packet type. The first 128 bits must 
   exist in the packet header, whereas there may be an empty control 
   information field, depending on the packet type. 
    
   Particularly, UDT uses sub-sequencing for ACK packet. Each ACK packet 
   is assigned a unique increasing 16-bit sequence number, which is 
   independent of the data packet sequence number. The ACK sequence 
   number uses bits 16 - 31 in the control packet header. The ACK 
   sequence number ranges from 0 to (2^16 - 1). Bits 16 - 31 are 
   undefined in other control packets. 
    
   TYPE 0000: Protocol Connection Handshake 
              Control Info: 
              1) 32 bits: UDT version 
              2) 32 bits: Socket Type (STREAM or DGRAM) 
              3) 32 bits: initial sequence number 
              4) 32 bits: packet size  
              5) 32 bits: maximum flow window size  
              6) 32 bits: connection type (regular or rendezvous)  
              7) 32 bits: socket ID 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 6] 

                                 UDT                     October 2007 
 
 
   TYPE 0001: Keep-alive 
              Control Info: None 
    
   TYPE 0010: Acknowledgement (ACK) 
              bits 16-31: ACK sequence number 
              Control Info: 
              1) 32 bits: The packet sequence number to which all the 
                 previous packets have been received (excluding) 
              2) 32 bits: RTT (in microseconds) 
              3) 32 bits: RTT variance 
              4) 32 bits: Available buffer size (in bytes) 
              5) 32 bits: Packets receiving rate (in number of packets  
                          per second) 
              6) 32 bits: Estimated link capacity (in number of packets  
                          per second) 
    
   TYPE 0011: Negative Acknowledgement (NAK) 
              Control Info: 
              1) 32 bits integer array of compressed loss information  
                 (see section 3.9). 
    
   TYPE 0100: Unused 
    
   TYPE 0101: Shutdown 
              Control Info: None 
    
   TYPE 0110: Acknowledgement of Acknowledgement (ACK2) 
              bits 16-31: ACK sequence number 
              Control Info: None 
    
   TYPE 0111: Message Drop Request: 
              Bits 35-64: Message number 
              Control Info: 
              1) 32 bits: First sequence number in the message 
              2) 32 bits: Last sequence number in the message 
    
   TYPE 1111: Explained by bits 4 - 15, reserved for user defined 
              Control Packet 
    
   Finally, Time Stamp and Destination Socket ID also exist in the 
   control packets. 
    
3. UDP Multiplexer 
 
   A UDP multiplexer is used to handle concurrent UDT connections 
   sharing the same UDP port. The multiplexer dispatch incoming UDT 
   packets to the corresponding UDT sockets according to the destination 
   socket ID in the packet header. 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 7] 

                                 UDT                     October 2007 
 
 
   One multiplexer is used for all UDT connections bound to the same UDP 
   port. That is, UDT sockets on different UDP port will be handled by 
   different multiplexer. 
    
   The multiplexer maintains two queues. The sending queue includes the 
   sockets with at least one packet scheduled for sending. The UDT 
   sockets in the sending queue are ordered by the next packet sending 
   time. A high performance timer is maintained by the sending queue and 
   when it is time for the first socket in the queue to send its packet, 
   the packet will be sent and the socket will be removed. If there is 
   new packet to be sent, the socket will be re-inserted to the queue. 
    
   The receiving queue reads incoming packets and dispatches them to the 
   corresponding sockets. If the destination ID is 0, the packet will be 
   sent to the listening socket (if there is any), or to a socket that 
   is in rendezvous connection phase. (See Section 5.) 
    
   Similar to the sending queue, the receiving queue also maintains a 
   list of sockets waiting for incoming packets. The receiving queue 
   scan the list to check if any timer expires for each socket every SYN 
   (SYN = 0.01 second, defined in Section 4). The list is ordered by the 
   checking time, which means if for one socket that was last checked 
   within an SYN time, there is no need to check the rest on the queue. 
    
   Note that the UDP multiplexer is only used so that multiple UDT 
   sockets can be bound to the same UDP port. It is not a must in order 
   to support a UDT data transfer session. 
    
4. Timers 
    
   UDT uses four timers to trigger different periodical events. Each 
   event has its own period and they are all independent. They use the 
   system time as origins and should process wrapping if the system time 
   wraps.  
    
   For a certain periodical event E in UDT, suppose the time variable is 
   ET and its period is p. If E is set or reset at system time t0 (ET = 
   t0), then at any time t1, (t1 - ET >= p) is the condition to check if 
   E should be triggered. 
    
   The four timers are ACK, NAK, EXP and SND. SND is used in the sender 
   only for rate-based packet sending (see Section 6.1), whereas the 
   other three are used in the receiver only. 
    
   ACK is used to trigger an acknowledgement (ACK). Its period is set by 
   the congestion control module. However, UDT will send an ACK no 
   longer than every 0.01 second, even though the congestion control 
   does not need timer-based ACK. Here, 0.01 second is defined as the 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 8] 

                                 UDT                     October 2007 
 
 
   SYN time, or synchronization time, and it affects many of the other 
   timers used in UDT. 
    
   NAK is used to trigger a negative acknowledgement (NAK). Its period 
   is dynamically updated to 4 * RTT_+ RTTVar + SYN, where RTTVar is the 
   variance of RTT samples. 
    
   EXP is used to trigger data packets retransmission and maintain 
   connection status. Its period is dynamically updated to 4 * RTT + 
   RTTVar + SYN. 
    
   The recommended granularity of their periods is microseconds. The 
   system time is queried after each time bounded UDP receiving (there 
   will be additional necessary data processing time if a UDP packet is 
   received) to check if any of the ACK, NAK, or EXP event should be 
   triggered. The timeout value of UDP receiving should be at least SYN. 
    
   In the rest of this document, a name of a time variable will be used 
   to represent the associated event, the variable itself, or the value 
   of its period, depending on the context. For example, ACK can mean 
   either the ACK event or the value of ACK period. 
    
5. Connection Setup and shutdown 
    
   UDT supports two different connection setup methods, the traditional 
   client/server mode and the rendezvous mode. In the latter mode, both 
   UDT sockets connect to each other at (approximately) the same time. 
    
   The UDT client (in rendezvous mode, both peer are clients) sends a 
   handshake request (type 0 control packet) to the server or the peer 
   side. The handshake packet has the following information: 
     1) UDT version: this value is for compatibility purpose. The 
        current version is 4. 
     2) Socket Type: STREAM (0) or DGRAM (1). 
     3) Initial Sequence Number: It is the initial data packet sequence 
        number that the UDT entity that sends this handshake will use to 
        send out data packets. This should be a random value. 
     4) Packet Size: the maximum size of a data packet (including all 
        headers). This is usually the value of MTU. 
     5) Maximum Flow Window Size: The maximum flow window size. This 
        value may not be necessary; however, it is needed in the current 
        reference implementation. 
     6) Connection Type. This information is used to differential the 
        connection setup modes and request/response. 
     7) Socket ID. The client UDT socket ID. 
    
5.1 Client/Server Connection Setup 
    
   One UDT entity starts first as the server (listener). The server 
 
 
Gu, Grossman           Expires - April 15, 2008               [Page 9] 

                                 UDT                     October 2007 
 
 
   accepts and processes incoming connection request, and creates new 
   UDT socket for each new connection. 
    
   A client that wants to connect to the server will send a handshake 
   packet first. The client should keep on sending the handshake packet 
   every constant interval (the implementation should decide this 
   interval according to the balance between response time and system 
   overhead) until it receives a response handshake from the server or a 
   timeout timer expires. 
    
   The server, when receiving a handshake packet, compares the packet 
   size and maximum window size with its own values and set its own 
   values as the smaller ones. The result values are also sent back to 
   the client by a response handshake packet, together with the server's 
   version and initial sequence number. The server is ready for 
   sending/receiving data right after this step is finished. However, it 
   must send back response packet as long as it receives any further 
   handshakes from the same client. 
    
   The client can start sending/receiving data once it gets a response 
   handshake packet from the server. Further response handshake 
   messages, if received any, should be omitted. 
    
   The connection type from the client should be set to 1 and the 
   response from the server should be set to -1. 
    
   The client should also check if the response is from the server that 
   the original request was sent to. 
    
5.2 Rendezvous Connection Setup 
    
   In this mode, both clients send a connect request to each other at 
   the same time. The initial connection type is set to 0. Once a peer 
   receives a connection request, it sends back a response. If the 
   connection type is 0, then the response sends back -1; if the 
   connection type is -1, then the response sends back -2; No response 
   will be sent for -2 request. 
    
   The rendezvous peer does the same check on the handshake messages 
   (version, packet size, window size, etc.) as described in Section 
   5.1. In addition, the peer only process the connection request from 
   the address it has sent a connection request to. Finally, rendezvous 
   connection will be rejected by a regular UDT server (listener). 
    
   A peer initializes the connection when it receives -1 response. 
    
   The rendezvous connection setup is useful when both peers are behind 
   firewall. 
    
 
Gu, Grossman           Expires - April 15, 2008              [Page 10] 

                                 UDT                     October 2007 
 
 
5.3  Shutdown 
    
   If one of the connected UDT entities is being closed, it will send a 
   shutdown message to the peer side. The peer side, after received this 
   message, will also be closed. This shutdown message, delivered using 
   UDP, is only sent once and not guaranteed to be received. If the 
   message is not received, the peer side will be closed after 16 
   continuous EXP timeout (see section 3.5). However, the total timeout 
   value should be between 3 seconds and 30 seconds. 
    
6. Data Sending and Receiving 
 
   Each UDT entity has two logical parts: the sender and the receiver. 
   The sender sends (and retransmits) application data according to flow 
   control and congestion control. The receiver receives both data 
   packets and control packets, and sends out control packets according 
   to the received packets and timers as well. 
    
   The receiver is responsible for triggering and processing all control 
   events, including congestion control and reliability control, and 
   their related mechanisms. 
    
   Note that when a UDT multiplexer is implemented, the sender and the 
   receiver will be called from the multiplexer. Otherwise, each UDT 
   entity maintains the running of its own sender and receiver. 
    
   UDT always tries to pack application data into fixed size packets 
   (the maximum packet size negotiated during connection setup), unless 
   there is not enough data to be sent. 
    
   We explained the rationale of some of the UDT data sending/receiving 
   schemes in [GHG04b]. 
 
6.1 The Sender's Algorithm 
    
   Data Structures and Variables: 
    
   1) Sender's Loss List: The sender's loss list is used to store the 
      sequence numbers of the lost packets fed back by the receiver 
      through NAK packets. The numbers are stored in increasing order. 
    
   Data Sending Algorithm: 
   1) If the sender's loss list is not empty, retransmit the first 
      packet in the list and remove it from the list. Go to 5). 
   2) In messaging mode, if the packets has been the loss list for a 
      time more than the application specified TTL, send a message drop 
      request and remove all related packets from the loss list. Go to 
      1). 
   3) Wait until there is application data to be sent. 
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 11] 

                                 UDT                     October 2007 
 
 
   4)  
        a. If the number of unacknowledged packets exceeds the 
           flow/congestion window size, wait until an ACK comes. Go to 
           1). 
        b. Pack a new data packet and send it out. 
   5) If the sequence number of the current packet is 16n, where n is an 
      integer, go to 2). 
   6) Wait (SND - t) time, where SND is the inter-packet interval 
      updated by congestion control and t is the total time used by step 
      1 to step 5. Go to 1).  
    
6.2 The Receiver's Algorithm 
    
   Data Structures and Variables: 
    
   1) Receiver's Loss List: It is a list of tuples whose values include: 
      the sequence numbers of detected lost data packets, the latest 
      feedback time of each tuple, and a parameter k that is the number 
      of times each one has been fed back in NAK. Values are stored in 
      the increasing order of packet sequence numbers. 
   2) ACK History Window: A circular array of each sent ACK and the time 
      it is sent out. The most recent value will overwrite the oldest 
      one if no more free space in the array. 
   3) PKT History Window: A circular array that records the arrival time 
      of each data packet. 
   4) Packet Pair Window: A circular array that records the time 
      interval between each probing packet pair. 
   5) LRSN: A variable to record the largest received data packet 
      sequence number. LRSN is initialized to the initial sequence 
      number minus 1. 
   6) ExpCount: A variable to record number of continuous EXP time-out 
      events. 
    
   Data Receiving Algorithm: 
   1) Query the system time to check if ACK, NAK, or EXP timer has 
      expired. If there is any, process the event (as described below 
      in this section) and reset the associated time variables. For 
      ACK, also check the ACK packet interval. 
   2) Start time bounded UDP receiving. If no packet arrives, go to 1). 
   1) Reset the ExpCount to 1. If there is no unacknowledged data 
      packet, or if this is an ACK or NAK control packet, reset the EXP 
      timer. 
   3) Check the flag bit of the packet header. If it is a control 
      packet, process it according to its type and go to 1). 
   4) If the sequence number of the current data packet is 16n + 1, 
      where n is an integer, record the time interval between this 
      packet and the last data packet in the Packet Pair Window. 
   5) Record the packet arrival time in PKT History Window. 
   6)  
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 12] 

                                 UDT                     October 2007 
 
 
        a. If the sequence number of the current data packet is greater 
           than LRSN + 1, put all the sequence numbers between (but 
           excluding) these two values into the receiver's loss list and 
           send them to the sender in an NAK packet. 
        b. If the sequence number is less than LRSN, remove it from the 
           receiver's loss list. 
   7) Update LRSN. Go to 1). 
    
   ACK Event Processing: 
   1) Find the sequence number prior to which all the packets have been 
      received by the receiver (ACK number) according to the following 
      rule: if the receiver's loss list is empty, the ACK number is LRSN 
      + 1; otherwise it is the smallest sequence number in the 
      receiver's loss list. 
   2) If (a) the ACK number equals to the largest ACK number ever 
      acknowledged by ACK2, or (b) it is equal to the ACK number in the 
      last ACK and the time interval between this two ACK packets is 
      less than 2 RTTs, stop (do not send this ACK). 
   3) Assign this ACK a unique increasing ACK sequence number. Pack the 
      ACK packet with RTT, RTT Variance, and flow window size (available 
      receiver buffer size). If this ACK is not triggered by ACK timers, 
      send out a Light ACK and stop. 
   4) Calculate the packet arrival speed according to the following 
      algorithm: 
         Calculate the median value of the last 16 packet arrival  
         intervals (AI) using the values stored in PKT History Window.  
         In these 16 values, remove those either greater than AI*8 or  
         less than AI/8. If more than 8 values are left, calculate the  
         average of the left values AI', and the packet arrival speed is  
         1/AI' (number of packets per second). Otherwise, return 0. 
   5) Calculate the estimated link capacity according to the following 
      algorithm: 
         Calculate the median value of the last 16 packet pair  
         intervals (PI) using the values in Packet Pair Window, and the  
         link capacity is 1/PI (number of packets per second).  
   6) Pack the packet arrival speed and estimated link capacity into the 
      ACK packet and send it out. 
   7) Record the ACK sequence number, ACK number and the departure time 
      of this ACK in the ACK History Window. 
    
   NAK Event Processing: 
   Search the receiver's loss list, find out all those sequence numbers 
   whose last feedback time is k*RTT before, where k is initialized as 2 
   and increased by 1 each time the number is fed back. Compress 
   (according to section 6.4) and send these numbers back to the sender 
   in an NAK packet. 
    
   EXP Event Processing: 
   1) Put all the unacknowledged packets into the sender's loss list. 
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 13] 

                                 UDT                     October 2007 
 
 
   2) If (ExpCount > 16) and at least 3 seconds has elapsed since last 
      time ExpCount is reset to 1, or, 3 minutes has elapsed, close the 
      UDT connection and exit. 
   3) If the sender's loss list is empty, send a keep-alive packet to 
      the peer side. 
   4) Increase ExpCount by 1. 
    
   On ACK packet received: 
   1) Update the largest acknowledged sequence number. 
   2) Send back an ACK2 with the same ACK sequence number in this ACK. 
   3) Update RTT and RTTVar. 
   4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN. 
   5) Update flow window size. 
   6) If this is a Light ACK, stop. 
   7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the 
      value carried in the ACK. 
   8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is 
      the value carried in the ACK. 
   9) Update sender's buffer (by releasing the buffer that has been 
      acknowledged). 
   10)Update sender's loss list (by removing all those that has been 
      acknowledged). 
    
   On NAK packet received: 
   1) Add all sequence numbers carried in the NAK into the sender's loss 
      list. 
   2) Update the SND period by rate control (see section 3.6). 
   3) Reset the EXP time variable. 
    
   On ACK2 packet received: 
   1) Locate the related ACK in the ACK History Window according to the 
      ACK sequence number in this ACK2. 
   2) Update the largest ACK number ever been acknowledged. 
   3) Calculate new rtt according to the ACK2 arrival time and the ACK 
      departure time, and update the RTT value as: RTT = (RTT * 7 + 
      rtt) / 8. 
   4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4. 
   5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN. 
    
   On message drop request received: 
   1) Tag all packets belong to the message in the receiver buffer so 
      that they will not be read. 
   2) Remove all corresponding packets in the receiver's loss list. 
    
   On Keep-alive packet received: 
   Do nothing. 
    
   On Handshake/Shutdown packet received: 
   See Section 5. 
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 14] 

                                 UDT                     October 2007 
 
 
6.3  Flow Control 
    
   The flow control window size is 16 initially. 
    
   On ACK packet received: 
   The flow window size is updated to the receiver's available buffer 
   size. 
 
6.4 Loss Information Compression Scheme 
    
   The loss information carried in an NAK packet is an array of 32-bit 
   integers. If an integer in the array is a normal sequence number (1st 
   bit is 0), it means that the packet with this sequence number is 
   lost; if the 1st bit is 1, it means all the packets starting from 
   (including) this number to (including) the next number in the array 
   (whose 1st bit must be 0) are lost. 
    
   For example, the following information carried in an NAK: 
      0x00000002, 0x80000006, 0x0000000B, 0x0000000E 
   means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are 
   lost. 
     
7. Configurable Congestion Control (CCC) 
    
   The congestion control in UDT is an open framework so that user-
   defined control algorithm can be easily implemented and switched. 
   Particularly, the native control algorithm is also implemented by 
   this framework. 
    
   The user-defined algorithm may redefine several control routines to 
   read and adjust several UDT parameters. The routines will be called 
   when certain event occurs. For example, when an ACK is received, the 
   control algorithm may increase the congestion window size. 
    
7.1 CCC Interface 
    
   UDT allow users to access two congestion control parameters: the 
   congestion window size and the inter-packet sending interval. Users 
   may adjust these two parameters to realize window-based control, 
   rate-based control, or a hybrid approach. 
    
   In addition, the following parameters should also be exposed. 
   1) RTT 
   2) Maximum Segment/Packet Size 
   3) Estimated Bandwidth 
   4) The maximum packet sequence number that has been sent so far 
   5) Packet arriving rate at the receiver side 
    
 
Gu, Grossman           Expires - April 15, 2008              [Page 15] 

                                 UDT                     October 2007 
 
 
   A UDT implementation may expose additional parameters as well. This 
   information can be used in user-defined congestion control algorithms 
   to adjust the packet sending rate. 
    
   The following control events can be redefined via CCC (e.g., by a 
   callback function). 
    
   1) init: when the UDT socket is connected. 
   2) close: when the UDT socket is closed. 
   3) onACK: when ACK is received. 
   4) onLOSS: when NACK is received. 
   5) onTimeout: when timeout occurs. 
   6) onPktSent: when a data packet is sent. 
   7) onPktRecv: when a data packet is received. 
    
   Users can also adjust the following parameters in the user-defined 
   control algorithms. 
    
   1) ACK interval: An ACK may be sent every fixed number of packets. 
      User may define this interval. If this value is -1, then it means 
      no ACK will be sent based on packet interval. 
   2) ACK Timer: An ACK will also be sent every fixed time interval. 
      This is mandatory in UDT. The maximum and default ACK time 
      interval is SYN. 
   3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine 
      this. 
    
   Detailed description and discussion of UDT/CCC can be found in 
   [GG05]. 
    
7.2 UDT's Native Control Algorithm 
    
   UDT has a native and default control algorithm, which will be used if 
   no user-defined algorithm is implemented and configured. The native 
   UDT algorithm should be implemented using CCC. 
    
   UDT's native algorithm is a hybrid congestion control algorithm, 
   hence it adjusts both the congestion window size and the inter-packet 
   interval. The native algorithm uses timer-based ACK and the ACK 
   interval is SYN. 
    
   The initial congestion window size is 16 packets and the initial 
   inter-packet interval is 0. The algorithm start with Slow Start phase 
   until the first ACK or NAK arrives. 
    
   On ACK packet received: 
   1) If it is in slow start phase, set the congestion window size to 
      the product of packet arrival rate and (RTT + SYN). Slow Start 
      ends. Stop. 
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 16] 

                                 UDT                     October 2007 
 
 
   2) Set the congestion window size (CWND) to: CWND = A + 16. 
   3) The number of sent packets to be increased in the next SYN period 
      (inc) is calculated as:  
         if (B <= C) 
            inc = 1/PS; 
         else 
            inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS); 
      where B is the estimated link capacity and C is the current 
      sending speed. All are counted as packets per second. PS is the 
      fixed size of UDT packet counted in bytes. Beta is a constant 
      value of 0.0000015. 
   4) The SND period is updated as: 
         SND = (SND * SYN) / (SND * inc + SYN). 
    
   These four parameters are used in rate decrease, and their initial 
   values are in the parentheses: AvgNAKNum (1), NAKCount (1), 
   DecCount(1), LastDecSeq (initial sequence number - 1). 
    
   We define a congestion epoch as the period between two NAKs in which 
   the first biggest lost packet sequence number is greater than the 
   LastDecSeq, which is the biggest sequence number when last time the 
   packet sending rate is decreased. 
    
   AvgNAKNum is the average number of NAKs in a congestion epoch. 
   NAKCount is the current number of NAKs in the current epoch. 
    
   On NAK packet received: 
   1) If it is in slow start phase, set inter-packet interval to 
      1/recvrate. Slow start ends. Stop. 
   2) If this NAK starts a new congestion epoch, increase inter-packet 
      interval (snd) to snd = snd * 1.125; Update AvgNAKNum, reset 
      NAKCount to 1, and compute DecRandom to a random (average 
      distribution) number between 1 and AvgNAKNum. Update LastDecSeq. 
      Stop. 
   3) If DecCount <= 5, and NAKCount == DecCount * DecRandom: 
        a. Update SND period: SND = SND * 1.125; 
        b. Increase DecCount by 1; 
        c. Record the current largest sent sequence number (LastDecSeq). 
    
   The native UDT control algorithm is designed for bulk data transfer 
   over high BDP networks. [GHG04a] 
    
Security Considerations 
    
   UDT does not have its specific security mechanism, whereas it depends 
   on the application to provide authentication and lower layer to 
   provide security mechanisms, if necessary. However, UDT 
   implementations should check each arrived packets are from the 
   expected source, since UDP is connectionless. 
 
 
Gu, Grossman           Expires - April 15, 2008              [Page 17] 

                                 UDT                     October 2007 
 
 
IANA Considerations 
 
   This document has no actions for IANA. 
 
Normative References 
 
   [RFC768] J. Postel, RFC 768: User Datagram Protocol, Aug. 1980. 
 
Informative References 
    
   [GG07] Yunhong Gu and Robert L. Grossman, UDT: UDP-based Data 
      Transfer for High-Speed Wide Area Networks, Computer Networks 
      (Elsevier). Volume 51, Issue 7. May 2007. 
    
   [GG05] Yunhong Gu and Robert L. Grossman, Supporting Configurable 
      Congestion Control in Data Transport Services, SC 2005, Nov 12 - 
      18, Seattle, WA, USA. 
    
   [GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences 
      in Design and Implementation of a High Performance Transport 
      Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA. 
    
   [GHG04a] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, An Analysis 
      of AIMD Algorithms with Decreasing Increases, First Workshop on 
      Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose, 
      CA, USA. 
    
   [LM97] T. V. Lakshman and U. Madhow, The Performance of TCP/IP for 
      Networks with High Bandwidth-Delay Products and Random Loss, 
      IEEE/ACM Trans. on Networking, vol. 5 no 3, July 1997, pp. 336-
      350.  
  
   [RFC2581] Allman, M., Paxson, V. and W. Stevens, TCP Congestion 
      Control, RFC 2581, April 1999. 
    
   [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Chwarzbauer, 
      T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V. Paxson. Stream 
      Control Transmission Protocol. RFC 2960, IETF, October 2000. 
    
   [TS06] K. Tan, Jingmin Song, Qian Zhang, Murari Sridharan, A Compound 
      TCP Approach for High-speed and Long Distance Networks, in IEEE 
      Infocom, April 2006, Barcelona, Spain. 
    
   [UDT] UDT: UDP-based Data Transfer, URL http://udt.sf.net. 
    
   [XHR04] Lisong Xu, Khaled Harfoush, and Injong Rhee, Binary Increase 
      Congestion Control for Fast Long-Distance Networks, INFOCOM 2004. 
    
 
Gu, Grossman           Expires - April 15, 2008              [Page 18] 

                                 UDT                     October 2007 
 
 
Author's Addresses 
    
   Yunhong Gu 
   National Center for Data Mining 
   University of Illinois at Chicago 
   713 SEO, M/C 249, 851 S Morgan St 
   Chicago, IL 60607, USA 
   Phone: +1 (312) 413-9576 
   Email: yunhong@lac.uic.edu 
    
   Robert Grossman 
   National Center for Data Mining 
   University of Illinois at Chicago 
   727 SEO, M/C 249, 851 S Morgan St 
   Chicago, IL 60607, USA 
   Phone: +1 (312) 413-2176 
   Email: grossman@uic.edu 
    

Gu, Grossman           Expires - April 15, 2008              [Page 19] 

                                 UDT                     October 2007 
 
         
Full Copyright Statement 
 
   Copyright (C) The IETF Trust (2007). 
    
   This document is subject to the rights, licenses and restrictions   
   contained in BCP 78, and except as set forth therein, the authors   
   retain all their rights. 
    
   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
 
Intellectual Property 
 
   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 
    
   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 
    
   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at  
   ietf-ipr@ietf.org. 
 
Acknowledgment 
 
   Funding for the RFC Editor function is provided by the IETF 
   Administrative Support Activity (IASA). 
    
    
Gu, Grossman           Expires - April 15, 2008              [Page 20]