INTERNET DRAFT                                         Yogesh Prem Swami
File: draft-swami-tcp-lmdr-03.txt                               Khiem Le
Expires: January 14, 2005                   Nokia Research Center Dallas
                                                           July 15, 2004

Lightweight Mobility Detection and Response (LMDR) Algorithm for TCP

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of [RFC2026].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

TCP congestion control is based on the assumption that the end-to-end path of a connection changes very infrequently (most likely due to router failure) after connection establishment. This assumption allows a TCP sender to compute (predict) a new congestion window (cwnd) based on the ACKs from the previous cwnd. With host mobility, however, the assumption of a "constant path" does not hold, and the present congestion control and avoidance mechanisms can lead to suboptimal system performance. In this document we describe a TCP option that allows a receiver to inform the sender about a subnet change, based on which the sender can react to optimize performance.

Expires: January 15, 2005 [Page 1] draft-swami-tcp-lmdr-03.txt July 15, 2004

1. Introduction

TCP congestion control [RFC2581] is based on the assumption that the end-to-end path of a connection does not change--or at best changes infrequently--once the connection is established.
Based on this assumption, TCP increases its data rate whenever it receives positive feedback in the form of new ACKs (i.e., ACKs for new data). However, unless the "constant path" assumption holds, the TCP sender cannot continue with the old data rate: ACKs received for packets sent on the old path only reflect the congestion state of that path, not of the new path.

When a TCP sender or receiver changes its point of attachment to the Internet (henceforth referred to as "changes subnets" or "changes path"), the entire end-to-end path between the sender and receiver can change. Therefore, relying on the rate of arrival of ACKs as the only criterion for congestion control can lead to suboptimal system performance.

In this document, we describe a network-layer-independent mechanism by which hosts can propagate their path-change information to their peers, based on which the peers can react to optimize performance. We assume that a mobile host always knows about its own subnet information (for example, by looking at its neighbor cache, destination cache, default router, or a combination of these [RFC2461]), but that it currently has no way to inform its peer of a change. Please note that some network layer mobility management techniques, such as Mobile-IPv6 [JPA03] with route optimization, may be used to indirectly derive a peer's mobility information (for example, by looking into the binding cache), but these schemes do not work in other cases such as Mobile-IPv6 with reverse tunneling, Mobile-IPv4 [RFC3344], or traditional cellular networks. Once a TCP sender has mobility information about itself or its peer, it can use the congestion response described in Section-5 to adjust its data rate.

Please also note that we are not trying to solve the link-up/link-down problem. Link-up/link-down issues are related to link layer mechanisms which may or may not take place due to subnet change.
For example, unplugging and replugging the Ethernet cable constitutes a link-up/link-down event, even though the host might remain in the same subnet after replugging the cable. LMDR, on the other hand, has been designed for just one purpose: to facilitate subnet change notification and to optimize performance if there is a subnet change.

Furthermore, we consider packet loss due to bit errors to be different from packet loss due to host mobility. LMDR MUST NOT be used as a general mechanism to recover from packet loss due to bit errors. Conceptually, loss due to bit errors is different from loss due to mis-routed packets.

2. Terminology

The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and "silently ignore" in this document are to be interpreted as described in [RFC2119].

Mobile Node (MN): A host (not a router) capable of changing its point of attachment to the Internet without breaking transport layer connectivity. Hosts that change their point of attachment to the Internet but use DHCP or another mechanism to get a new IP address are not considered mobile.

Old Subnet: MN's point of attachment (subnet prefix) to the Internet prior to movement. Old Subnet and Old Path are often used interchangeably in this document.

New Subnet: MN's point of attachment after movement. New Subnet and New Path are used interchangeably in this document.

INIT_WINDOW: The initial congestion window size at the start of a connection, as described in [RFC3390].

Stale ACK: ACKs corresponding to the data sent on the Old Path. These ACKs don't contain meaningful congestion information about the new path and should be ignored for congestion response on the new path.

3. Congestion Issues with Subnet Change

For concreteness, the description below assumes network mobility based on Mobile IP, but the same concepts are readily applicable to other types of networks. To illustrate the problem, consider Figure-1. At time=T, the MN is reachable on Subnet-1 through AR-1 and has the care-of address . While MN is "attached" to AR-1, packets between TCP-Sender and are routed using PATH-1. Let's assume that after some period of time, at T+1, MN moves (hands over) to Subnet-2 and is reachable through AR-2 with the care-of address . While MN is attached to AR-2, all packets between TCP-Sender and are routed using PATH-2.

                    <---------PATH-1---------->
                     /---------\   +---------+
                     |         |   |         |  Subnet-1
                 +---+ Cloud-1 +---+  AR-1   +-->>>>>MN
                 |   |         |   |         |  (Time=T)
+------------+   |   \----++---/   +---------+
|            |   |        ||            |
| TCP Sender +---+        ^V PATH-3    ^V^ PATH-4
|            |   |        ||            |
+------------+   |   /----++---\   +----+----+
                 |   |         |   |         |  Subnet-2
                 +---+ Cloud-2 +---+  AR-2   +-->>>>>MN
                     |         |   |         |  (Time=T+1)
                     \---------/   +---------+
                    <--------PATH-2----------->

                             Figure-1

During the transient period, when MN moves from Subnet-1 to Subnet-2, AR-1 may (or may not) buffer and forward packets destined to and from through PATH-3 or through PATH-4 [K03]. We make the distinction between PATH-3 and PATH-4 to emphasize the fact that PATH-4 may belong to a well provisioned network that has dynamic equilibrium for mobile users. Such networks are designed to accommodate extremely bursty traffic. PATH-3, on the other hand, may consist of arbitrary routers without proper provisioning.

Let's assume that a TCP connection was in progress between MN and TCP Sender when the user moves from Subnet-1 to Subnet-2. We now analyze the problem of congestion on the different paths shown above.
3.1 Congestion On PATH-1

Congestion on PATH-1 is governed by basic slow-start and congestion avoidance mechanisms [RFC2581]. As long as MN remains in Subnet-1, the standard congestion control algorithms are sufficient. But once it moves from Subnet-1 to Subnet-2, two different scenarios are possible depending on the network topology.

Scenario-1: Access Routers Don't Tunnel Packets to the new Subnet.

In this scenario (typical of Mobile-IPv4), all packets destined to are dropped by AR-1 once the mobile has moved (this happens if the access routers don't have enough packet forwarding information). Since the latency involved in establishing a new tunnel is of the order of an RTT (2*RTT in case of Mobile-IPv6), roughly an entire window worth of data will be dropped by AR-1. Because of this window loss, the sender will time out in most cases.

In this scenario, the TCP sender has to wait unnecessarily for an RTO before it can initiate its loss recovery algorithm. In addition, the sender's SS_THRESH value will be set to an arbitrary value that has no correlation with the BDP on the new path.

An arbitrary SS_THRESH severely impacts the throughput of the connection. If the BDPs of the two paths are substantially different, it forces the sender to spend a lot of time trying to reach a reasonable throughput on the new path. For example, consider the case where the BDP on the old path was 10 packets, while the BDP on the new path is 1000 packets. With a normal timeout-based loss recovery algorithm, the sender's SS_THRESH will be set to 10 packets, and reaching a reasonable throughput of at least 500 packets (i.e., half of the BDP) will require (log_2(10/2) + (500-5)) round trips (recall that the data rate increase during congestion avoidance is just one packet per RTT).
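As a rough check on the figures in this example, the following Python sketch evaluates the two round-trip expressions exactly as this section writes them: the timeout-based estimate above and the large-SS_THRESH estimate log_2(500/2) that the text compares it with. The function names are hypothetical, and the model (doubling per RTT in slow start, one packet per RTT in congestion avoidance) is the simplification the text itself uses, not a TCP simulation.

```python
import math


def rtts_after_timeout(old_bdp: int, target: int) -> float:
    """Draft's estimate: log_2(old_bdp/2) + (target - old_bdp/2) round trips."""
    half = old_bdp / 2
    return math.log2(half) + (target - half)


def rtts_with_large_ssthresh(target: int) -> float:
    """Draft's estimate when SS_THRESH is reset to a large value: log_2(target/2)."""
    return math.log2(target / 2)


print(round(rtts_after_timeout(10, 500), 1))    # roughly 497 round trips
print(round(rtts_with_large_ssthresh(500), 1))  # roughly 8 round trips
```

Under these assumptions the linear congestion avoidance term dominates: almost all of the round trips come from the (500-5) part of the expression.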
Contrast this timeout-based recovery with a scheme where the sender resets the SS_THRESH to a large value after subnet change and only spends log_2(500/2) RTTs to reach a reasonable throughput.

Scenario-2: Access Routers Tunnel Packets to the new Subnet

In this scenario, all packets destined to are forwarded to by AR-1 [K03]. In this case, AR-1 can forward packets to using PATH-3 or PATH-4. We consider these two paths separately in the following sections.

3.2 Congestion On PATH-3

If AR-1 starts forwarding packets to AR-2 using PATH-3, PATH-3 will experience a sudden burst of data. In addition, if multiple MNs move between AR-2 and AR-1, PATH-3 MAY get congested. But if sending packets on PATH-3 is bad for other connections, dropping them is bad for the connections that change subnets (Section-3.1).

3.3 Congestion On PATH-4

In many cases, it's reasonable to assume that wireless service providers will have a well provisioned network that can accommodate highly bursty traffic. Such networks may have a dynamic equilibrium where the average transit traffic from AR-1 to AR-2 is the same as the transit traffic from AR-2 to AR-1. Such well provisioned paths are, however, not possible Internet-wide.

3.4 Congestion On PATH-2

Since the MN is able to receive packets even after moving away from AR-1, it will continue to generate ACKs in an orderly fashion. These ACKs will traverse PATH-3 or PATH-4 and finally reach the TCP sender. But the segments sent by the TCP sender in response to these ACKs will travel on PATH-2 (assuming the TCP sender has received the binding update to send data on the new path). Unfortunately, the TCP sender has no congestion information about PATH-2, and using the old congestion window may cause congestion on PATH-2. This problem becomes worse as the number of mobile users or the rate of subnet change increases in the system.
Consider, for example, the case where a train moves across a subnet boundary due to wireless radio coverage limitations, and hundreds of mobile users on that train hand off to a new subnet. In these cases, the new subnet will see a burst of data that can cause unnecessary packet loss and timeouts.

Conversely, if PATH-2 is much more lightly loaded than PATH-1, and if the sender is in congestion avoidance, it will spend multiple RTTs before reaching a reasonable throughput.

To summarize:

a) If packets from the old subnet are tunneled to the new subnet, then the influx of TCP connections in the new subnet MAY add to network congestion and cause unnecessary packet loss and timeouts. Furthermore, if the new subnet is lightly loaded, the sender will spend a lot of time trying to reach a reasonable throughput.

b) If packets are not tunneled to the new subnet, then the sender may have to wait for an RTO before it can start loss recovery. In addition, the SS_THRESH update after a timeout may further degrade the performance if the BDPs of the two paths are very different.

4. Subnet Change Detection

Subnet change detection in itself is a two-step process. First, a mobile node needs to know if it has moved from one subnet to another; second, it needs to propagate this information to its peer. Detecting when a mobile node has moved is a neighbor discovery [RFC2461] problem and is beyond the scope of this document. In this document we assume that hosts can determine path-change information either from lower layers or through other out-of-band mechanisms. We now focus on how a mobile can propagate this information to its peer. To do so, we propose to use a TCP option.

4.1 LMDR TCP Option

The basic idea behind the LMDR option is to use a counter, which is decremented every time there is a subnet change.
At the start of the connection, both endpoints use this option in the SYN packet and agree on an initial counter value of 7 (each side has its own counter). After the SYN exchange is completed, the mobile hosts don't send this option until there is a subnet change. When there is a subnet change, the Initiator (the host that wants to inform its peer--the Responder--about the subnet change) decrements the counter and sends this option in every subsequent ACK or data packet. When the Responder sees an LMDR option, it echoes back the Initiator's counter. The Responder keeps echoing back the value until the Initiator stops sending the option. The Initiator, on the other hand, keeps sending this TCP option until it has received an echoed value. In short, the Initiator keeps sending the LMDR option until the Responder "acknowledges" that it has received the subnet change notification. The Responder acknowledges the value by echoing back the LMDR counter to the Initiator. Note that in case both the Initiator and the Responder move simultaneously, the host that has the larger Initial TCP sequence number should assume the role of Initiator.

Following is the LMDR TCP Option format:

+----------------+----------------+----+------+------+
|      TYPE      |     LENGTH     |RES | CNTR | ECNT |
+----------------+----------------+----+------+------+

TYPE: (8 Bits) TCP Option Type. Value set to 25 for experimental purposes.

LENGTH: (8 Bits) TCP Option Length. Value = 3.

RES: (2 Bits) Reserved bits. The sender should set the value to zero. The receiver should ignore these bits.

CNTR: (3 Bits) The subnet counter value of the host sending this option. This value is decremented once for every subnet change (i.e., if the mobile host moves from x1.y1.z1/24 to x2.y2.z2/24, and the counter value in x1.y1.z1/24 was C1, then the counter value in x2.y2.z2/24 will be C1-1).
As long as the mobile is in the same subnet, it should send the same counter value.

ECNT: (3 Bits) The echoed value of CNTR. When the Responder receives an LMDR option, it should copy the CNTR value to ECNT. Moreover, the Responder should use its own subnet counter to fill in the CNTR value.

Following is an example of how it works. Let's say MN-A has a subnet counter CNTR-A = 5 and MN-B has CNTR-B = 3 before the subnet change. Let's assume that node B moves to a new subnet. See Figure-2 for details of the message exchange.

                       [NO LMDR OPTION]
    MN-A <-----------------------------------> MN-B
 ( my_subnet_count  = 5 )            ( my_subnet_count  = 3)
 ( rem_subnet_count = 3)             ( rem_subnet_count = 5)

           Time = T (MN-B moves to a new subnet)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                 LMDR: CNTR=2 (3-1), ECNT=5
    MN-A <----------------------------------- MN-B
 ( rem_subnet_count = 3)             ( my_subnet_count  = 2)
 ( my_subnet_count  = 5 )            ( rem_subnet_count = 5)

                 LMDR: CNTR=5, ECNT=2
    MN-A -------------------------------------> MN-B
 (B Has Moved. Echo back ECNT=2)     (Stop sending LMDR)

                         Figure-2

Following are the details of the subnet change detection algorithm:

1. Each TCP implementation should keep three new variables--my_subnet_count, rem_subnet_count, and in_transition--to facilitate the mobility detection and response algorithm. my_subnet_count and rem_subnet_count hold the mobility count information about the local and remote hosts respectively. in_transition is set to one when the Responder receives the first LMDR option. The value is reset to zero when the Responder receives a packet without the LMDR option set.

2. At connection setup, both the client and the server willing to have mobility detection MUST send the LMDR option with CNTR=7 in the SYN packets. Only if both end points agree to use the LMDR option should the TCP sender process future LMDR options.

3. For each packet sent, each host should determine if it has moved to a new subnet. If either of the end-points determines that it has moved, it SHOULD update the value of my_subnet_count as follows:

      my_subnet_count = (my_subnet_count - 1);
      in_transition = 1;

The node that updates this value is referred to as the Initiator. The Initiator SHOULD send an LMDR option in every packet as long as in_transition == 1. If the Initiator is also a data sender, it MUST follow the congestion response algorithm described in Section-5. In addition, the Initiator MUST keep the in_transition value unaltered until it receives a packet with ECNT == my_subnet_count (i.e., until the most recent CNTR value is echoed back by the Responder).

4. When the Responder receives a valid TCP packet (i.e., a packet that meets the sequence number and ACK sequence number criteria of RFC 793), it should compare the value of 'CNTR' with the value of rem_subnet_count. If the two values are equal, the Responder should conclude that the Initiator has not moved and MUST NOT update its in_transition variable. (It MUST, however, keep echoing back the LMDR option. Note that in case of a simultaneous move this will result in sending the option in every subsequent packet. To break this infinite loop, the host with the largest Initial TCP sequence number should assume the role of Initiator.)

Finally, if the values of rem_subnet_count and CNTR in the LMDR option differ, the Responder should conclude that the Initiator has moved. In addition, the Responder MUST update the variables as follows:

      rem_subnet_count = CNTR;
      in_transition = 1;

After making these changes, the TCP sender MUST follow the congestion response algorithm as described in Section-5. Moreover, the value of in_transition SHOULD be reset when the Responder receives a packet from the Initiator without the LMDR option (in other words, this guarantees that the Initiator has received the echoed option).
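The option layout and detection steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It makes several assumptions that the draft does not state: RES occupies the two most significant bits of the third byte, the 3-bit counter wraps modulo 8 when decremented past zero, and the names LmdrEndpoint, initiator, echo_pending, pack_lmdr, and unpack_lmdr are hypothetical. The simultaneous-move tie-break and the SYN negotiation are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LMDR_KIND = 25        # experimental option type given in Section 4.1
LMDR_LENGTH = 3       # TYPE + LENGTH + one byte of RES/CNTR/ECNT
INITIAL_COUNT = 7     # counter value agreed on in the SYN exchange


def pack_lmdr(cntr: int, ecnt: int) -> bytes:
    """Encode the option; RES (assumed to be the top two bits) is sent as zero."""
    if not (0 <= cntr <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNTR and ECNT are 3-bit fields")
    return bytes([LMDR_KIND, LMDR_LENGTH, (cntr << 3) | ecnt])


def unpack_lmdr(option: bytes) -> Tuple[int, int]:
    """Return (CNTR, ECNT); the reserved bits are ignored on receipt."""
    if len(option) != LMDR_LENGTH or option[0] != LMDR_KIND or option[1] != LMDR_LENGTH:
        raise ValueError("not an LMDR option")
    return (option[2] >> 3) & 0x7, option[2] & 0x7


@dataclass
class LmdrEndpoint:
    my_subnet_count: int = INITIAL_COUNT
    rem_subnet_count: int = INITIAL_COUNT
    in_transition: bool = False
    initiator: bool = False      # hypothetical role flag, not a draft variable
    echo_pending: bool = False   # hypothetical: an echo is owed to the peer

    def local_move(self) -> None:
        """Step 3: this host has detected its own subnet change."""
        self.my_subnet_count = (self.my_subnet_count - 1) % 8  # wrap is assumed
        self.in_transition = True
        self.initiator = True

    def outgoing_option(self) -> Optional[bytes]:
        """Option to attach to the next outgoing packet, or None."""
        if (self.initiator and self.in_transition) or self.echo_pending:
            self.echo_pending = False
            return pack_lmdr(self.my_subnet_count, self.rem_subnet_count)
        return None

    def receive(self, option: Optional[bytes]) -> bool:
        """Step 4 for one valid packet; True means the peer has moved."""
        if option is None:
            if not self.initiator:
                self.in_transition = False  # the Initiator stopped sending
            return False
        cntr, ecnt = unpack_lmdr(option)
        if self.initiator:
            if ecnt == self.my_subnet_count:  # our latest CNTR was echoed
                self.in_transition = False
                self.initiator = False
            return False
        self.echo_pending = True  # the Responder echoes every option it sees
        if cntr == self.rem_subnet_count:
            return False
        self.rem_subnet_count = cntr
        self.in_transition = True
        return True


# Walk through the exchange of Figure-2.
a = LmdrEndpoint(my_subnet_count=5, rem_subnet_count=3)   # MN-A
b = LmdrEndpoint(my_subnet_count=3, rem_subnet_count=5)   # MN-B
b.local_move()
peer_moved = a.receive(b.outgoing_option())   # A learns that B has moved
b.receive(a.outgoing_option())                # B sees its CNTR echoed and stops
```

A caller would treat a True return from receive() as the trigger for the congestion response of Section-5.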
NOTE: In certain network architectures it's possible that a mobile (and the associated link technology) has sufficient congestion information about the new path. In these cases, if the congestion on the new path is low, one MAY choose not to indicate subnet change information to the sender, since there is no need to reduce the data rate. However, the mobility information MUST be indicated if no such information is available or if the congestion information does not cover the entire path (i.e., if the congestion information is only for a part of the new path, then the Initiator MUST inform its peer about the subnet change).

5. Congestion Response after Subnet Change

The goal of congestion response after subnet change is to minimize congestion on PATH-2. In principle, congestion response for PATH-2 has the same requirements as that of a new connection: the sender should have no more than INIT_WINDOW worth of data outstanding on the *new path*, and the SS_THRESH should be set to a large value. What makes the problem complex is the fact that connections after a subnet change have a non-zero number of packets in flight. ***The congestion response after subnet change MUST therefore ignore the Stale ACKs and MUST base its congestion response solely on the new ACKs (i.e., ACKs generated for data sent on the new path).***

The idea behind the congestion response is to send an INIT_WINDOW worth of new data packets at the time when the in_transition field is set to one, and not to send any more packets until the in_transition field is set to zero. Since the in_transition field will remain set for at least one RTT on the new path, this guarantees that the TCP sender behaves like a standard TCP connection.

Following are the details of the congestion response algorithm.

1. When the TCP sender concludes that there is a subnet change, its value of in_transition should be set to 1 (as described above in Section-4).
At this time, the data sender should increase its congestion window as:

      cwnd = cwnd + INIT_WINDOW;

and send INIT_WINDOW worth of data on the new path and restart the RTO timer as if this were a new connection [RFC2018].

2. For each subsequent ACK received, the sender should adjust the congestion window such that *no new data packet is sent* into the network. This behavior should continue until in_transition = 0 again or there is a timeout. Once in_transition is set to zero, the sender should mark the unsacked packets as lost, and update the packets in flight to INIT_WINDOW - 1. The sender MUST also set the congestion window to INIT_WINDOW + 1, and initiate loss recovery in slow start.

6. Architectural Considerations

Architecturally, the method described above does not add any new architectural features to the system. Although LMDR requires a TCP receiver to look into some parameters and data structures (local to that stack) that are specific to the IP layer, this should not be a problem either from an implementation point of view or from a theoretical point of view. In most cases, the TCP layer already consults the IP layer for MTU information, at the very least.

7. Security Considerations

Since the LMDR option is valid only for an acceptable ACK [RFC793], it's immune to passive attacks as long as the congestion window is not of the order of 2^31 bytes. However, LMDR is not safe against active DoS attacks (present TCP is not safe either). We will describe a security mechanism to protect against active attacks if there is a requirement from the working group.

8. Acknowledgments

We would like to thank Shashikant Maheshwari and Mark Allman for their comments and suggestions on a previous version of this draft.

9. REFERENCES

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control," Apr 1999.

[K03] R. Koodli, "Fast Handover for Mobile IPv6," Internet draft; work in progress, draft-ietf-mobileip-fast-mipv6-07.txt, Sept 2003.

[RFC2461] T. Narten, E. Nordmark, W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)," Dec 1998.

[JPA03] D. Johnson, C. Perkins, J. Arkko, "Mobility Support in IPv6," Internet draft; work in progress, draft-ietf-mobileip-ipv6-24.txt, June 2003.

[RFC3344] C. Perkins, "IP Mobility Support for IPv4," Aug 2002.

[RFC3390] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's Initial Window," Oct 2002.

[RFC3360] S. Floyd, "Inappropriate TCP Resets Considered Harmful," Aug 2002.

[RFC3517] E. Blanton, M. Allman, K. Fall, L. Wang, "A Conservative SACK-based Loss Recovery Algorithm for TCP," Internet draft; work in progress, draft-allman-tcp-sack-13.txt, Oct 2002.

[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options," RFC 2018, Nov 2000.

[RFC2988] V. Paxson, M. Allman, "Computing TCP's Retransmission Timer," Nov 2000.

[RFC793] "Transmission Control Protocol," RFC-793, Sept 1981.

10. IPR Statement

The IETF has been notified of intellectual property rights claimed in regard to some or all of the specification contained in this document. For more information consult the on-line list of claimed rights at http://www.ietf.org/ipr.

Author's Address:

Yogesh Prem Swami                    Khiem Le
Nokia Research Center, Dallas        Nokia Research Center, Dallas
6000 Connection Drive                6000 Connection Drive
Irving, TX-75063, USA.               Irving, TX-75063, USA.
E-Mail: yogesh.swami@nokia.com       E-Mail: khiem.le@nokia.com
Ph : +1 972 374 0669                 Ph : +1 972 894 4882