Multipath schedulers

Multipath schedulers UCLouvain

Olivier.Bonaventure@uclouvain.be

UCLouvain

Maxime.Piraux@uclouvain.be

UCLouvain

quentin.deconinck@uclouvain.be

Tessares

Matthieu.Baerts@tessares.net

Apple

cpaasch@apple.com

Deutsche Telekom

markus.amend@telekom.de

IRTF ICCRG Working Group Internet-Draft This document proposes a series of abstract packet schedulers for multipath transport protocols equipped with a congestion controller.

The Internet was designed under the implicit assumption that hosts are equipped with a single network interface while routers are equipped with several ones. Under this assumption, an Internet host is usually identified by the IP address of its network interface. This assumption does not hold anymore today for two reasons. First, a growing fraction of the Internet hosts are equipped with several network interfaces, usually through different datalink networks. These multihomed hosts are reachable via different IP addresses. Second, a growing fraction of the hosts that are attached through a single network interface are dual-stack and are thus reachable over both IPv4 and IPv6. Several Internet transport protocols have been extended to leverage the different paths that are exposed on such hosts: Multipath TCP , the load sharing extensions to SCTP , Multipath DCCP and Multipath QUIC . These multipath transport protocols differ in the way they are organized and exchange control information and user data. However, they all include algorithms to handle three problems that any multipath transport protocol needs to solve: Congestion controller Path manager Packet scheduler Packet re-assembly From a congestion control viewpoint, the main concern for a multipath transport protocol is that a multipath connection should not be unfair to single-path transport connections that share a common bottleneck. This problem can be solved by coupling the congestion windows of the different paths. The solution proposed in is applicable to any transport protocol. Beside providing fairness, congestion control can also be a valuable input for different kind of traffic distribution algorithm within a packet scheduler. Typically metrics like RTT and available capacity can be derived. A multipath transport protocol uses different flows during the lifetime of a connection. The Path Manager contains the logic that regulates the creation/deletion of these flows. This logic usually depends on the requirements of the application that uses the multipath transport. Some applications use multipath in failover situations. In this case, the connection can use one path and the path manager can create another path when the primary one fails. An application that wishes to share its load among different paths can request the path manager to establish different paths in order to simultaneously use them during the connection. Many path managers have been proposed in the literature , but these are outside the scope of this document. The packet scheduler is the generic term for the algorithm that selects the path that will be used to transmit each packet on a multipath connection. This logic is obviously only useful when there are at least two active paths for a given multipath transport connection. A variety of packet schedulers have been proposed in the literature and implemented in multipath transport protocols. Experience with multipath transport protocols shows that the packet scheduler can have a huge impact on the performance achieved by such protocols. Packet re-assembly or re-ordering in multipath transport has the functionality to equalize the effect of packet scheduling across paths with different characteristics and restore the original packet order to a certain extent. Obviously, packet re-assembly is the counterpart of packet scheduling and located at the far end of the multipath transport. However, packet scheduling schemes exists which render the re-assembly superfluous or lowering at least its effort. In this document, we document a series of multipath packet schedulers that are known to provide performance that matches well the requirements of specific applications. To describe these packet schedulers, we assume an abstract transport that is briefly presented in . In we describe the challenges and constraints around a multipath scheduler. Finally, we describe the different schedulers in . To keep the description as simple and intuitive as possible, we assume here multipath connections that are composed of two paths, a frequent deployment scenario for multipath transport. This does not restrict the proposed schedulers to using only two paths. Implementations are encouraged to support more than 2 paths. We leave the discussion on how to adapt these abstract schedulers to concrete multipath transport protocols in future drafts.

For simplicity, we assume a multipath transport protocol which can send packets over different paths. Some protocols such as Multipath TCP support active and backup paths. We do not assume this in this document and leave the impact of these active/backup paths in specific documents. Furthermore, we assume that there are exactly two active paths for the presentation of the packet schedulers. We consider that a path is active as long as it supports the transmission of packets. Meaning, A Multipath TCP subflow TCP segment with the FIN or RST flags set is not considered as an active path. Other constraints are possible on whether or not a path is active. These are specific to the scheduler and vary depending on the goal of the scheduler. An example of these is that when a path has experienced a certain number N of retransmission timeouts, the path can be considered inactive. We assume that the transport protocol maintains one congestion controller per path as in . We do not assume a specific congestion controller, but assume that it can be queried by the packet scheduler to verify whether a packet of length l would be blocked or not by the congestion control scheme. A window-based congestion controller such as can block a packet from being transmitted for some time when its congestion window is full. The same applies to a rate-based congestion controller although the latter could indicate when the packet could be accepted while the former cannot. We assume that the multipath transport protocol maintains some state at the connection level and at the path level. On both level, the multipath transport protocol will maintain send and receive windows, and a Maximum Segment Size that is negotiated at connection establishment. It may also contain some information that is specific to the application (e.g. total amount of data sent or received) and information about non-active flows. At the path level, we expect that the multipath transport protocol will maintain an accurate estimation of the round-trip-time over that path, possibly a send/receive window, per path MTU information, the state of the congestion controller, and optionally information that is specific to the application or the packet scheduler (e.g. priority for one path over another one).

Packet scheduling tries to balance different quality of service goals with different constraints of the paths. The balance depends on which of the goals or constraints is the primary factor for the experience the application is aiming for. In the following we list these goals and constraints and conclude by how they can influence each other. Each path can be subject to a different cost when transmitting data. For example, a path can introduce a per-byte monetary cost for the transmission (e.g., metered cellular link). Another cost can be the power consumption when transmitting or receiving data. These costs are imposing restrictions on when a path can be used compared to the lower-cost path. A goal for many applications is to reduce the latency of their transaction. With multiple paths, each path can have a significantly different latency compared to the other paths. It is thus crucial to schedule the traffic on a path such that the latency requirements of the application are satisfied. Achieving high throughput is another goal of many applications. Streaming applications often require a minimum bit rate to sustain playback. The scheduler should try to achieve this bit rate to allow for a flawless streaming experience. Beyond that, adaptive streaming requires also a more stable throughput experience to ensure that the bit rate of the video stream is consistent. When sending traffic over multiple paths the bit rate can experience more variance and thus the scheduler for such a streaming application needs to take precautions to ensure a smooth experience. Finally, transport protocols impose a receive-window that signals to the sender how much data the application is willing to receive. When the paths have a large latency difference, a multipath transport can quickly become receive-window limited. This limitation comes from the fact that a packet might have been sent on a high-latency path. If the transport imposes in-order delivery of the data, the receiver needs to wait to receive this packet over the high-latency path before providing it to the application. The sender will thus become receive-window limited and may end up under-utilizing the low-latency path. This can become a major challenge when trying to achieve high throughput. All of these quality of service goals and constraints need to be balanced against each other. A scheduler might decide to trade latency for higher throughput. Or reduce the throughput with the goal of reducing the cost.

The packet scheduler is executed every time a packet needs to be transmitted by the multipath transport protocol. A packet scheduler can consider three different types of packets: packets that carry new user data packets that carry previously transmitted user data packets that only carry control information (e.g., acknowledgements, address advertisements) In Multipath TCP, the packet scheduler is only used for packets that carry data. Multipath TCP will typically return acknowledgements on the same path as the one over which data packets were received. For Multipath QUIC, the situation is different since Multipath QUIC can acknowledge over one path data that was previously received over another path. In Multipath TCP, this is only partially possible. The subflow level acknowledgements must be sent on the subflow where the data was received while the data-level acknowledgements can be sent over any subflow. This document uses the Python language to represent multipath schedulers. A multipath scheduler is represented as a Python function. This function takes the length of the next packet to schedule as argument and returns the path on which it will be send. A path is represented as a Python class with the following attributes: srtt: The smoothed RTT of the path . cc_state: The state of the congestion controller, i.e. either slow_start, congestion_avoidance or recovery. blocked(l): A function indicating whether a packet of length l would be rejected by the congestion controller. The schedulers presented can be executed in a simulator implementing the abstract multipath protocol presented in . It can be used to simulate a file transfer between a client and a server over multiple paths.

We use the Round-Robin scheduler as a simple example to illustrate how a packet scheduler can be specified, but we do not recommend its usage. Experiments with Multipath TCP indicate that it does not provide good performance. This packet scheduler uses one additional state at the connection level: last_path. This stores the identifier of the last path that was used to send a packet. The scheduler is defined by the code shown in .

This scheduler does not distinguish between the different types of packets. It iterates over the available paths and sends over the ones whose congestion window is open.

The Weighted Round-Robin scheduler is a more advanced version of the Round-Robin scheduler. It allows specifying a particular distribution of paths. This can be used to non-uniformly spread packets over paths. This packet scheduler adds two states: distribution: A list containing the distribution of paths to consider. Paths to which more importance is given will be present several times in the list. The ordering of the list allows to choose whether interleaved or burst sending is preferred. last_idx: It stores the index in the distribution of the last path used to send a packet.

Optional[Path]: next_idx = (self.last_idx + 1) % len(self.distribution) sorted_paths = self.distribution[next_idx:] + self.distribution[:next_idx] for i, p in enumerate(sorted_paths): if not p.blocked(packet_len): self.last_idx = (self.last.idx + i) % len(self.distribution) return p ]]> This scheduler does not distinguish between the different types of packets. It iterates over the available paths following the given distribution and sends over the ones whose congestion window is open. A variant of this algorithm could maintain a deficit per path and consider the length of packets when distributing them.

The Strict Priority scheduler's aim is to select paths based on a priority list. Some paths might go through networks that are more expensive to use than others. Then the idea is to select the path with the highest priority if it is available before looking at others by priority. This scheduler is described by the code shown in .

This scheduler can face performance issues if, compared to others, paths with high priority accept a lot of data but delivered packets with a high latency. When the path is experiencing bufferbloat, the receiver has to store packets for a long time in its buffers to ensure an in-order delivery. It is then recommended to cover these cases in the scheduler implementation with the help of the congestion control algorithm.

The Round-Trip-Time Threshold scheduler selects the first available path with a smoothed round-trip-time below a certain threshold. The goal is to keep the RTT of the multipath connection to a small value and avoid having the whole connection impacted by "bad" paths. A prototype is shown in .

This kind of protection can of course be added to other existing schedulers.

The Lowest round-trip-time first scheduler's goal is to minimize latency for short flows while at the same time achieving high throughput for long flows . To handle the latency differences across the paths when being limited by the receive-window, this scheduler deploys a fast reinjection mechanism to quickly recover from the head-of-line blocking. At each round, the scheduler iterates over the list of paths that are eligible for transmission. To decide whether or not a path is eligible, a few conditions need to be satisfied: The congestion window needs to provide enough space for the segment The path is not in fast-recovery or experiencing retransmission timeouts Among all the eligible paths, the scheduler will choose the path with the lowest RTT and transmit the segment with the new data on that path. illustrates a simple lowest RTT scheduler which does not include fast reinjections.

To handle head-of-line blocking situations when the paths have a large delay difference the scheduler uses a strategy of opportunistic retransmission and path penalization as described in . Opportunistic retransmission kicks in whenever a path is eligible for transmission but the receive-window advertised by the receiver prevents the sender from transmitting new data. In that case the sender can transmit previously transmitted data over the eligible path. To overcome the head-of-line blocking the sender will thus transmit the packet at the head of the transmission queue over this faster path (if it hasn't been transmitted on this particular path yet). This packet has thus a chance to quickly reach the receiver and fill the hole created by the head-of-line blocking. Whenever the previously mentioned mechanism kicks in, it is and indication that the path's round-trip-time is too high to allow the path with the lower RTT to fully use its capacity. We thus should reduce the transmission rate on this path. This mechanism is called penalization and is achieved by dividing the congestion window by 2. [comment:] ## Out-of-order transmission for in-order arrival

Combining some types of schedulers can be a way to address some use cases. For example, a scheduler using the priority and the round-trip-time attributes can be used to give more priorities to some links having a lower cost (e.g. fixed vs. mobile accesses) while still being able to benefit from the advantages of the "Lowest RTT First" scheduler described in . A prototype of this "hybrid" scheduler is shown in .

Combining some properties can have new undesired effects. In the case presented here, paths with a higher priority but also a higher RTT can affect performances compared to a setup having a scheduler not looking at the priority but only the round-trip-time. If paths with a higher priority are used first whatever the network conditions are on these paths, it is normal to sacrifice the total bandwidth capacity but fully use the capacity of these links with a higher priority. If the paths with a lower priority are seen as extra capacity that can be used only when the other links are congested, it is fine if they are not fully used when the sender is limited by the global sending window of the multipath connection. For this kind of scheduler, it could be interesting to also associate the benefits associated to a "Round-Trip-Time Threshold" scheduler described in . This scheduler prevents being too impacted by links having a higher priority but a very high RTT while other paths, with a lower priority and a lower RTT, can be used. It is a matter of qualifying what is important: maximizing the use of paths over reducing the latency and probably the total bandwidth as well if the sender and/or the receiver are limited by congestion windows. It is also important to note that the penalization mechanism described in the "Lowest round-trip-time first" scheduler in also needs to take into account the priority. If the goal is to maximize the use of some links over others, links with a higher priority cannot be penalized over the ones with a lower priority. The consequence of this would be that links with higher priority are under used due to the penalization.

TCP Extensions for Multipath Operation with Multiple Addresses TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure.Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths. This document defines an Experimental Protocol for the Internet community. Coupled Congestion Control for Multipath Transport Protocols Often endpoints are connected by multiple paths, but communications are usually restricted to a single path per connection. Resource usage within the network would be more efficient were it possible for these multiple paths to be used concurrently. Multipath TCP is a proposal to achieve multipath transport in TCP.New congestion control algorithms are needed for multipath transport protocols such as Multipath TCP, as single path algorithms have a series of issues in the multipath context. One of the prominent problems is that running existing algorithms such as standard TCP independently on each path would give the multipath flow more than its fair share at a bottleneck link traversed by more than one of its subflows. Further, it is desirable that a source with multiple paths available will transfer more traffic using the least congested of the paths, achieving a property called "resource pooling" where a bundle of links effectively behaves like one shared link with bigger capacity. This would increase the overall efficiency of the network and also its robustness to failure.This document presents a congestion control algorithm that couples the congestion control algorithms running on different subflows by linking their increase functions, and dynamically controls the overall aggressiveness of the multipath flow. The result is a practical algorithm that is fair to TCP at bottlenecks while moving traffic away from congested links. This document defines an Experimental Protocol for the Internet community. Computing TCP's Retransmission Timer This document defines the standard algorithm that Transmission Control Protocol (TCP) senders are required to use to compute and manage their retransmission timer. It expands on the discussion in Section 4.2.3.1 of RFC 1122 and upgrades the requirement of supporting the algorithm from a SHOULD to a MUST. This document obsoletes RFC 2988. [STANDARDS-TRACK] Load Sharing for the Stream Control Transmission Protocol (SCTP) The Stream Control Transmission Protocol (SCTP) supports multi-homing for providing network fault tolerance. However, mainly one path is used for data transmission. Only timer-based retransmissions are carried over other paths as well. This document describes how multiple paths can be used simultaneously for transmitting user messages. Multipath Extensions for QUIC (MP-QUIC) This document specifies extensions to the QUIC protocol to enable the simultaneous usage of multiple paths for a single connection. These extensions are compliant with the single-path QUIC design and preserve QUIC privacy features. DCCP Extensions for Multipath Operation with Multiple Addresses DCCP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a DCCP session could improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure. Multipath DCCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional DCCP to support multipath operation. The protocol offers the same type of service to applications as DCCP and it provides the components necessary to establish and use multiple DCCP flows across potentially disjoint paths. Experimental Evaluation of Multipath TCP Schedulers How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP SMAPP : Towards Smart Multipath TCP-enabled APPlications Multipath simulator for the IETF draft Multipath schedulers

Renamed Delay Threshold to RTT Threshold Added the Priority And Lowest RTT First scheduler