INTERNET-DRAFT                                              J. Kumar
Internet Engineering Task Force                University of Florida
Issued: February 2002                                       L. Coene
Expires: July 2002                                           Siemens

Multihomed Loadsharing

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

This document describes a way to share load across the different paths of a multihomed SCTP association at the same time, while keeping congestion control per path.

Table of Contents

Multirouting
Chapter 1: Introduction
Chapter 2: Loadsharing within a SCTP association
Chapter 4: Considerations
Chapter 5: Security considerations
Chapter 6: References and related work
Chapter 7: Acknowledgments
Chapter 8: Author's address

Jagdish & Coene                                             [Page 1]
Draft               Multihomed Loadsharing              October 2002

1 Introduction

Multihoming has the potential to solve some of the Quality-of-Service (QoS), resilience and reliability problems that exist nowadays in the Internet. In order to solve these problems, multihoming must be able to use all the paths present in a single association at the same time, in parallel. The SCTP specification [RFC2960] only allows a single (= primary) path to be active at any given moment. Only when this path experiences trouble (such as no transmission possible...) will another path be used for the transmission of the messages.
This draft is an attempt to improve this behaviour.

2 Loadsharing within a SCTP association on the host

A multihomed SCTP association on a host always has more than one path over which to send its traffic. The number of paths depends on the number of IP addresses exchanged during the setup of the association. As each path can have different transmission characteristics (such as delay, bandwidth, jitter, etc.), separate congestion control processing must be done for each path. (Note: in the future, IP addresses may be added and removed "on-the-fly" during the active lifetime of the association; this amounts to adding paths to and removing paths from the association [ADDIP].)

At present, the congestion control information is already kept per path, as required in [RFC2960]. The information is updated for the primary path by the flow of the traffic and for the alternative paths by exchanging heartbeat messages. However, the heartbeat timer can be very different from the timers used for per-path congestion control and retransmission, thus rendering the info from the heartbeat useless. Congestion control info concerning a single path decays if no traffic is sent over that path. To keep the congestion info up to date, heartbeats must be sent at intervals in the same range as the congestion control timings, which may place a burden of not-so-useful messages (they are NOT carrying data) on the alternate paths.

For each path within the association, a separate congestion control window is to be specified within the transport protocol, as the congestion control characteristics of every path may (and will) be different (for example, the RTT). This leads to separate congestion control per path. Each path should be seen (in TCP terms) as a separate TCP connection, with each TCP connection having a different path/route through the network.

If all paths are in use (assuming enough traffic is sent/received), then the congestion control info for every path will remain up to date.
This makes a changeover smoother, because traffic can be redistributed from the failed path to all the remaining active paths. The present SCTP changeover works as follows: one path is active, all others are in standby, and a changeover moves traffic from the single previously active path to a single standby path. The scheme also allows the endpoints to choose whether all paths will be active in parallel or whether there will be some standby paths in addition to the active paths.

When all paths are in use, it is up to some form of distributor function in SCTP to distribute the traffic across the different paths. The distributor function is an implementation-dependent function which can have different, sometimes conflicting, goals. For example, one distributor can try to obtain a certain message transfer rate across the complete association, while another kind of distributor can try to load all paths up to maximum capacity, with all paths doing SCTP/TCP-friendly congestion control. Other distributors may try to minimise the delay or jitter. For that they would need some feedback from the remote side on top of the already existing SCTP congestion control mechanism. If that is the case, then an SCTP extension may be needed.

An SCTP implementation which does NOT support parallel usage of its paths must be able to communicate with an implementation which does support it. As no new additions to the SCTP protocol are required, an SCTP full-path implementation (meaning all paths are used in parallel) would NOT break an SCTP single-path implementation. The single-path implementation will SACK the received messages to the source address of those messages. If a SACK is sent back spanning multiple paths, the congestion control info of each of those paths will be updated per RFC2960. At present, the application can do this by specifying the primary path before sending a message to SCTP.
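The distributor function described above can be illustrated with a minimal Python sketch. It is not part of the draft and not a definitive implementation: the names Path, distribute and on_sack are hypothetical, and the policy shown (send on the path with the most free congestion window) is only one of the possible distributor goals. Window growth and loss handling per [RFC2960] are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Per-path state; the field names are illustrative assumptions."""
    address: str          # destination transport address of this path
    cwnd: int             # congestion window in bytes, kept per path
    outstanding: int = 0  # bytes sent on this path and not yet acknowledged

    def room(self):
        """Bytes that may still be sent without exceeding cwnd."""
        return self.cwnd - self.outstanding


def distribute(paths, message_size):
    """Pick the path with the most free window that can take the message.

    Returns the chosen destination address, or None when every path is
    window-limited and the message has to wait.
    """
    candidates = [p for p in paths if p.room() >= message_size]
    if not candidates:
        return None
    best = max(candidates, key=Path.room)
    best.outstanding += message_size
    return best.address


def on_sack(path, acked_bytes):
    """Release window on the path whose data was acknowledged."""
    path.outstanding = max(0, path.outstanding - acked_bytes)
```

With two paths whose windows are 3000 and 1500 bytes, successive 1000-byte messages go first to the larger window and spill over to the second path as the first fills up. A distributor aiming at minimum delay or jitter would replace the selection rule in distribute() and would need the extra feedback mentioned above.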

In order to truly utilize the multihoming features of an SCTP association, the ACTIVE destination address chosen by the endpoints upon initialization of the SCTP association should not be the only path used for communication throughout the lifetime of that association. As remarked earlier, multihoming must be able to use all the DESTINATION paths present in a single association at the same time, in parallel. The change proposed here is that there is no fixed PRIMARY path throughout the lifetime of the association; a path may remain primary for some part of that lifetime, as determined by the network conditions. Network conditions and congestion change dynamically all the time, and hence choosing just one PRIMARY path for communication without actually evaluating the performance of the other IP destination addresses would not fully utilize the multihoming feature.

According to the base RFC 2960, by default an SCTP endpoint shall monitor the reachability of the idle destination transport address(es) of its peer by sending a HEARTBEAT chunk periodically to the destination transport address(es). The credibility of the PRIMARY path is, of course, checked by the regular traffic. HEARTBEAT information is used to measure the RTT of a particular path, since the timing information is embedded in the HEARTBEAT chunk. Let us assume the RTT of the primary path is x. If the HEARTBEAT chunks show that the RTT of another path is 0.8*x or 0.9*x, then it makes sense to use this path rather than the old primary path in order to improve performance.

This raises a further question: how can one say for sure whether the new path chosen for changeover is truly better than the old path?
In other words, how can one be sure that the lower RTT of the newly discovered path did not occur just once? Is it a short-term RTT, or is the value stable enough over the long term to justify changing the PRIMARY path? To confirm (we can definitely not confirm for eternity) that the lower RTT did not just happen randomly, the decision to change over is taken only after a number of successive confirmations, limited by a variable Path.Max.Confirm. Each time a particular path proves to be beneficial over the others, the corresponding counter is incremented, and each time a path does NOT prove to be beneficial over the others, the corresponding counter is cleared. If no path's counter reaches the value of Path.Max.Confirm, then communication continues over the old path. If the value of a counter equals or exceeds Path.Max.Confirm, then the changeover is confirmed and carried out. If two or more paths contend for the optimal path, then any one of the contending paths is chosen to break the tie. The same procedure continues, thus continuously testing all the paths against the current active path and yielding the best possible path for communication throughout the lifetime of the SCTP association.

Only the RTT, i.e. the delay, has been considered so far in deciding the best path. Other metrics such as bandwidth, number of hops, jitter, etc. can also be taken into consideration, and methods devised to measure them and make a decision. This could be a larger part of the future work in this area.

Once a path has been selected for changeover by applying the above criterion, the primary path is updated to the newly found path. However, as explained in [IYENGAR], there could be a spurious SCTP congestion window overgrowth during this changeover. The solutions described in [IYENGAR] are appropriate to prevent this problem.
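
The confirmation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class PathSelector and its method on_rtt_round are hypothetical names, the default of 3 for Path.Max.Confirm is an arbitrary choice (the draft gives no value), a path is taken to be "beneficial" when its latest RTT is lower than that of the current primary path, and ties are broken by lowest RTT although the text allows any contending path to be chosen.

```python
from dataclasses import dataclass, field

# Assumed value; the draft names Path.Max.Confirm but gives no default.
PATH_MAX_CONFIRM = 3


@dataclass
class PathSelector:
    primary: str                         # current primary destination address
    max_confirm: int = PATH_MAX_CONFIRM  # successive confirmations required
    counters: dict = field(default_factory=dict)

    def on_rtt_round(self, rtts):
        """Process one round of RTT samples, keyed by destination address.

        The primary's RTT comes from regular traffic, the others from
        HEARTBEATs. Returns the primary path after this round.
        """
        primary_rtt = rtts[self.primary]
        for addr, rtt in rtts.items():
            if addr == self.primary:
                continue
            if rtt < primary_rtt:
                # The path proved beneficial: count one more confirmation.
                self.counters[addr] = self.counters.get(addr, 0) + 1
            else:
                # Not beneficial this round: clear its counter.
                self.counters[addr] = 0
        confirmed = [a for a in rtts
                     if a != self.primary
                     and self.counters.get(a, 0) >= self.max_confirm]
        if confirmed:
            # Tie-break among contenders by lowest RTT (an assumption).
            self.primary = min(confirmed, key=lambda a: rtts[a])
            self.counters.clear()
        return self.primary
```

For example, with max_confirm set to 3, a path B that reports a lower RTT than primary A in three successive rounds becomes the new primary on the third round, while a single round in which B is slower resets its counter to zero. The safeguards of [IYENGAR] would still have to be applied at the moment of changeover.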

Parallel Paths:

The idea described above still supports only one active path, although the active path is reselected frequently by considering the different metrics of each path. However, all the possible paths in the SCTP association can be utilized simultaneously by a simple extension of the above idea. Each path's characteristics MUST be measured. Consider only the RTT as the path characteristic for now. As an example, if there are 3 paths A, B and C, and the RTTs of these 3 paths are in the ratio 1:2:3, then x/2, x/3 and x/6 could be the traffic sent along these 3 paths respectively at the same time, where x is the total traffic in bytes to be sent from one endpoint to the other. The path with the lowest RTT thus carries the largest share, which also helps reduce the amount of traffic sent through congested paths. As fresher RTTs are measured, the ratios keep changing dynamically, and so does the amount of traffic sent along those paths. Hence this fair distribution of traffic based on the ratio of the RTTs of each path could be used to simultaneously support communication along all existing paths of an SCTP association.

However, since the application may be the same or different for all these paths, care has to be taken to co-ordinate incoming traffic from the different paths and deliver a reliable ordered stream of data to the application(s) if required. This situation would become very complex if the number of streams in each SCTP association grows.

Just as an aside to corroborate the above idea, to summarize the research findings from the thesis report http://www.research.microsoft.com/~padmanab/phd-thesis.html (University of California, Berkeley): "First, the transmission of a Web page from a server to a client involves the transfer of multiple distinct components, each in itself of some value to the user. To minimize user-perceived latency, it is desirable to transfer the components concurrently.
TCP provides an ordered byte-stream abstraction with no mechanism to demarcate sub-streams. If a separate TCP connection is used for each component, as with HTTP/1.0, uncoordinated competition among the connections could exacerbate congestion, packet loss, unfairness, and latency. However the concurrent connections compete in an uncoordinated manner, which aggravates congestion and causes unpredictable performance for each connection.".

Hence this is an inspiration to develop SCTP as a transport layer for HTTP, as SCTP overcomes the limitation of having to use multiple TCP connections for concurrent transfers. (By concurrent transfers, we mean separate simultaneous connections, one carrying text data, one carrying image data, and one carrying advertisement data for a single website.) Moreover, SCTP demarcates different streams by assigning them their own data or control chunks. Thus developing HTTP over SCTP would further enhance the efficiency of the data transfer by having each SCTP chunk carry one logical component of a website within a single SCTP association.

Hence, by utilizing the multihoming feature of SCTP through the simultaneous parallel paths described above (and by considering the use of SCTP parallel paths in the HTTP example above), one can hope for maximum load sharing and optimality. Load sharing through true multihoming is thus achieved by proportionate distribution of traffic. This idea is not discussed further here and could again be a larger part of the future work in this area in further revisions of this draft.

4 Considerations

The following extreme cases may happen when this solution is put into operation:

Congestion control

   Congestion control has to be done on a per-path basis.

Addresses

   Some address classes may simply not be suited for this approach, as the last bits of the address are factory-assigned and thus may clash with addresses of other interfaces in the same host or router.

Routing protocols

   Routing protocols should try to keep the different paths (= a different prefix) of an association as separate from each other as possible. A link failure along one of the paths will be covered by SCTP itself; however, it is still possible for the routing protocols to find a new path around the failed link (or node). The other paths of the SCTP association can only hope that this route computation does NOT influence the traffic on the presently active paths.

Different path characteristics:

   If a stream with in-sequence delivery is required by SCTP, splitting the traffic up between 2 or more paths (with radically different transmission characteristics, such as short versus long delays) may lead to large SACKs, due to the large number of Gap reports...

Conclusions:

   Multihoming was originally intended to be used only as a fault tolerance technique for when one of the interfaces/destinations actually went down. But the multihoming feature can also be used to maximum advantage by making use of all the possible IP addresses/paths. By implementing the idea of Parallel Paths, one could also achieve load sharing.

5 Security considerations

To be completed.

6 References and related work

[RFC2960] Stewart, R. R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer, H. J., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and Paxson, V., "Stream Control Transmission Protocol", RFC2960, October 2000.

[ROUTER] Draves, R., "Default router preferences and more-specific routes", draft-ietf-ipngwg-router-selection-00.txt, work in progress

[INGRES] Draves, R., "Ingress filtering, Site multihoming and source address selection", draft-draves-ipngwg-ingress-filtering-00.txt, work in progress

[ADDRSEL] Draves, R., "Default Address selection for IPv6", draft-ietf-ipngwg-default-addr-select-00.txt, work in progress

[SCTPMULTI] Coene, L. (Ed.), "Multihoming issues in the Stream Control Transmission Protocol", draft-coene-sctp-multihome-03.txt, work in progress

[DRSCN2000] http://www.sctp.de/papers/drcn2000.pdf

[IYENGAR] Iyengar, J.R., Amer, P.D., Stewart, R., Arias-Rodriguez, I., "Preventing SCTP Congestion Window Overgrowth During Changeover", draft-iyengar-sctp-cacc-01.txt, work in progress

7 Acknowledgments

The authors wish to thank M. Tuexen, ... and many others for their invaluable comments.

8 Author's address

Jagdish Kumar                      Phone: +01-352-2192163
University of Florida              Email: gjagdish@ufl.edu,
Department of Computer Science,           jkumar@cise.ufl.edu
Gainesville, FL - 32601

Lode Coene                         Phone: +32-14-252081
Siemens Atea                       EMail: lode.coene@siemens.atea.be
Atealaan 34
B-2200 Herentals
Belgium

Expires: May 31, 2002

Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.