Multipath schedulers

UCLouvain, Olivier.Bonaventure@uclouvain.be
UCLouvain, Maxime.Piraux@uclouvain.be
UCLouvain, quentin.deconinck@uclouvain.be
Tessares, Matthieu.Baerts@tessares.net
Apple, cpaasch@apple.com
Deutsche Telekom, markus.amend@telekom.de

IRTF
ICCRG Working Group
Internet-Draft

This document proposes a series of abstract packet schedulers for
multipath transport protocols equipped with a congestion controller.

The Internet was designed under the implicit assumption that hosts are
equipped with a single network interface while routers are equipped with
several. Under this assumption, an Internet host is usually
identified by the IP address of its network interface.

This assumption no longer holds today for two reasons. First,
a growing fraction of Internet hosts are equipped with several network
interfaces, usually attached to different datalink networks. These multihomed
hosts are reachable via different IP addresses. Second, a growing
fraction of the hosts that are attached through a single network interface are
dual-stack and are thus reachable over both IPv4 and IPv6.

Several Internet transport protocols have been extended to leverage the
different paths that are exposed on such hosts: Multipath TCP, the
load sharing extensions to SCTP, Multipath DCCP, and
Multipath QUIC. These multipath transport
protocols differ in the way they are organized and exchange control information
and user data. However, they all include algorithms to handle four problems
that any multipath transport protocol needs to solve:

- Congestion controller
- Path manager
- Packet scheduler
- Packet re-assembly

From a congestion control viewpoint, the main concern for a multipath transport
protocol is that a multipath connection should not be unfair to single-path
transport connections that share a common bottleneck. This problem can be solved
by coupling the congestion windows of the different paths. Such a coupled
solution is applicable to any transport protocol. Besides providing fairness,
congestion control can also be a valuable input for different kinds of traffic
distribution algorithms within a packet scheduler. Typically, metrics such as RTT
and available capacity can be derived from it.

A multipath transport protocol uses different flows during the lifetime of a
connection. The Path Manager contains the logic that regulates the
creation/deletion of these flows. This logic usually depends on the
requirements of the application that uses the multipath transport. Some
applications use multipath in failover situations. In this case, the connection
can use one path and the path manager can create another path when the primary
one fails. An application that wishes to share its load among different paths
can request the path manager to establish different paths in order to
simultaneously use them during the connection. Many path managers have been
proposed in the literature, but these are outside the scope of this
document.

The packet scheduler is the generic term for the algorithm that selects the
path that will be used to transmit each packet on a multipath connection. This
logic is obviously only useful when there are at least two active paths for
a given multipath transport connection. A variety of packet schedulers have
been proposed in the literature and implemented in multipath
transport protocols. Experience with multipath transport protocols shows that
the packet scheduler can have a huge impact on the performance achieved by such
protocols.

Packet re-assembly or re-ordering in multipath transport serves
to equalize the effect of packet scheduling across paths with different
characteristics and to restore the original packet order to a certain extent.
Packet re-assembly is the counterpart of packet scheduling and is located at the
far end of the multipath transport. However, some packet scheduling schemes
render re-assembly superfluous or at least reduce its effort.

In this document, we describe a series of multipath packet schedulers that
are known to provide performance that matches well the requirements of specific
applications. To describe these packet schedulers, we first briefly present
the abstract transport that we assume. We then describe the challenges and
constraints around a multipath scheduler. Finally, we describe the different
schedulers. To keep the
description as simple and intuitive as possible, we assume here multipath
connections that are composed of two paths, a frequent deployment scenario for
multipath transport.
This does not restrict the proposed schedulers to using only two paths.
Implementations are encouraged to support more than two paths. We leave the
discussion on how to adapt these abstract schedulers to concrete multipath
transport protocols to future drafts.

For simplicity, we assume a multipath transport protocol which can send packets
over different paths. Some protocols such as Multipath TCP support
active and backup paths. We do not assume this distinction in this
document and leave the impact of active/backup paths to specific
documents.

Furthermore, we assume that there are exactly two active paths for
the presentation of the packet schedulers. We consider that a path is active as
long as it supports the transmission of packets. For example, a Multipath TCP
subflow that has carried a TCP segment with the FIN or RST flag set is not
considered an active path.
Other constraints are possible on whether or not a path is active. These are
specific to the scheduler and vary depending on the goal of the scheduler. For
example, a path can be considered inactive once it has experienced a certain
number N of retransmission timeouts.

We assume that the transport protocol maintains one congestion controller per
path as in . We do not assume a specific congestion controller,
but assume that it can be queried by the packet scheduler to verify whether
a packet of length l would be blocked or not by the congestion control scheme.
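
As a minimal sketch of this query interface, assuming a window-based controller that counts bytes in flight, the check could look as follows. The class name WindowController and the attributes cwnd and in_flight are hypothetical and chosen only for illustration; they are not taken from any specific protocol.

```python
class WindowController:
    """Hypothetical window-based congestion controller (illustration only)."""

    def __init__(self, cwnd: int) -> None:
        self.cwnd = cwnd        # congestion window, in bytes
        self.in_flight = 0      # bytes sent but not yet acknowledged

    def blocked(self, length: int) -> bool:
        """Return True if a packet of this length does not fit in the window."""
        return self.in_flight + length > self.cwnd

    def on_sent(self, length: int) -> None:
        """Account for a packet that was handed to the network."""
        self.in_flight += length

    def on_acked(self, length: int) -> None:
        """Release window space once data is acknowledged."""
        self.in_flight = max(0, self.in_flight - length)
```

A scheduler would call blocked(length) before selecting the path and on_sent(length) after transmitting on it.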
A window-based congestion controller can block a packet
from being transmitted for some time when its congestion window is full. The
same applies to a rate-based congestion controller, although the latter could
indicate when the packet could be accepted while the former cannot.

We assume that the multipath transport protocol maintains some state at the
connection level and at the path level. On both levels, the multipath
transport protocol will maintain send and receive windows, and a Maximum
Segment Size that is negotiated at connection establishment.

This state may also contain some information that is specific to the
application (e.g. total amount of data sent or received) and information about
non-active flows. At the path level, we expect that the multipath transport
protocol will maintain an accurate estimation of the round-trip-time over that
path, possibly a send/receive window, per path MTU information, the state of the
congestion controller, and optionally information that is specific to the
application or the packet scheduler (e.g. priority for one path over another
one).

Packet scheduling tries to balance different quality of service goals with
different constraints of the paths. The balance depends on which of the goals
or constraints is the primary factor for the experience the application is
aiming for. In the following, we list these goals and constraints and conclude
with how they can influence each other.

Each path can be subject to a different cost when transmitting data. For example,
a path can introduce a per-byte monetary cost for the transmission (e.g., metered
cellular link). Another cost can be the power consumption when transmitting or
receiving data. These costs impose restrictions on when a path can be used
compared to a lower-cost path.

A goal for many applications is to reduce the latency of their transactions. With
multiple paths, each path can have a significantly different latency compared to
the other paths. It is thus crucial to schedule the traffic on a path such that
the latency requirements of the application are satisfied.

Achieving high throughput is another goal of many applications. Streaming applications
often require a minimum bit rate to sustain playback. The scheduler should try to
achieve this bit rate to allow for a flawless streaming experience. Beyond that,
adaptive streaming also requires a stable throughput to ensure
that the bit rate of the video stream is consistent. When sending traffic
over multiple paths, the bit rate can experience more variance and thus the
scheduler for such a streaming application needs to take precautions to ensure
a smooth experience.

Finally, transport protocols impose a receive-window that signals to the sender
how much data the application is willing to receive. When the paths have a large
latency difference, a multipath transport can quickly become receive-window limited.
This limitation comes from the fact that a packet might have been sent on a
high-latency path. If the transport imposes in-order delivery of the data, the
receiver needs to wait to receive this packet over the high-latency path before
providing it to the application. The sender will thus become receive-window limited
and may end up under-utilizing the low-latency path. This can become a major
challenge when trying to achieve high throughput.

All of these quality of service goals and constraints need to be balanced against
each other. A scheduler might decide to trade latency for higher throughput, or
to reduce the throughput with the goal of reducing the cost.

The packet scheduler is executed every time a packet needs to be transmitted
by the multipath transport protocol. A packet scheduler can consider three
different types of packets:

- packets that carry new user data
- packets that carry previously transmitted user data
- packets that only carry control information (e.g., acknowledgements, address advertisements)

In Multipath TCP, the packet scheduler is only used for packets that carry data.
Multipath TCP will typically return acknowledgements on the same path as the one
over which data packets were
received. For Multipath QUIC, the situation is different since Multipath QUIC
can acknowledge over one path data that was previously received over another path.
In Multipath TCP, this is only partially possible. The subflow level
acknowledgements must be sent on the subflow where the data was received while
the data-level acknowledgements can be sent over any subflow.

This document uses the Python language to represent multipath schedulers. A
multipath scheduler is represented as a Python function. This function takes the
length of the next packet to schedule as argument and returns the path on which
it will be sent. A path is represented as a Python class with the following
attributes:

- srtt: The smoothed RTT of the path.
- cc_state: The state of the congestion controller, i.e. either slow_start,
  congestion_avoidance or recovery.
- blocked(l): A function indicating whether a packet of length l would be
  rejected by the congestion controller.

The schedulers presented can be executed in a simulator
implementing the abstract multipath protocol presented above. It can
be used to simulate a file transfer between a client and a server over multiple
paths.

We use the Round-Robin scheduler as a simple example to illustrate how a packet
scheduler can be specified, but we do not recommend its usage. Experiments with
Multipath TCP indicate that it does not provide good performance.

This packet scheduler uses one additional state at the connection level:
last_path. This stores the identifier of the last path that was used to send a
packet. The scheduler is defined by the code shown in .

This scheduler does not distinguish between the different types of packets. It
iterates over the available paths and sends over the ones whose congestion
window is open.

The Weighted Round-Robin scheduler is a more advanced version of the Round-Robin
scheduler. It allows specifying a particular distribution of paths. This can be
used to non-uniformly spread packets over paths.

This packet scheduler adds two states:

- distribution: A list containing the distribution of paths to consider. Paths
  to which more importance is given will be present several times in the list.
  The ordering of the list determines whether interleaved or burst sending
  is preferred.
- last_idx: It stores the index in the distribution of the last path used to
  send a packet.

This scheduler does not distinguish between the different types of packets. It
iterates over the available paths following the given distribution and sends
over the ones whose congestion window is open. A variant of this algorithm
could maintain a deficit per path and consider the length of packets when
distributing them.

The Strict Priority scheduler's aim is to select paths based on a priority list.
Some paths might go through networks that are more expensive to use than others.
The idea is then to select the path with the highest priority if it is available
before looking at the others in priority order. This scheduler is described by the code
shown in .

This scheduler can face performance issues if, compared to others, paths with
high priority accept a lot of data but deliver packets with a high latency.
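
This behaviour can be seen in a minimal sketch of strict priority selection, which is an illustration and not the code referenced above. The Path fields priority, cwnd and in_flight, and the convention that a larger priority value is preferred, are assumptions made for this example. The highest-priority path is chosen whenever its congestion controller accepts the packet, regardless of its smoothed RTT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    priority: int        # assumption: larger value means preferred
    srtt: float          # smoothed RTT, in seconds
    cwnd: int            # congestion window, in bytes (illustrative)
    in_flight: int = 0   # unacknowledged bytes (illustrative)

    def blocked(self, length: int) -> bool:
        """True if the congestion controller would reject the packet."""
        return self.in_flight + length > self.cwnd


def strict_priority(paths: List[Path], length: int) -> Optional[Path]:
    """Return the highest-priority path that accepts the packet, else None."""
    for path in sorted(paths, key=lambda p: p.priority, reverse=True):
        if not path.blocked(length):
            return path
    return None


# The preferred path has a large window but a high RTT; it is still selected.
cellular = Path("cellular", priority=1, srtt=0.030, cwnd=30000)
wifi = Path("wifi", priority=2, srtt=0.400, cwnd=60000)
chosen = strict_priority([cellular, wifi], 1500)
```

In this example the lower-priority path is only used once the congestion window of the preferred path is full.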
When such a path is experiencing bufferbloat, the receiver has to store packets for
a long time in its buffers to ensure an in-order delivery. It
is therefore recommended to cover these cases in the scheduler implementation with
the help of the congestion control algorithm.

The Round-Trip-Time Threshold scheduler selects the first available path with a
smoothed round-trip-time below a certain threshold. The goal is to keep the RTT
of the multipath connection to a small value and avoid having the whole
connection impacted by "bad" paths. A prototype is shown in
.

This kind of protection can of course be added to other existing schedulers.

The Lowest round-trip-time first scheduler's goal is to minimize latency for
short flows while at the same time achieving high throughput for long flows.
To handle the latency differences across the paths when being limited by the
receive-window, this scheduler deploys a fast reinjection mechanism to quickly
recover from the head-of-line blocking.

At each round, the scheduler iterates over the list of paths that are eligible
for transmission. To decide whether or not a path is eligible, a few conditions
need to be satisfied:

- The congestion window needs to provide enough space for the segment
- The path is not in fast-recovery or experiencing retransmission timeouts

Among all the eligible paths, the scheduler will choose the path with the
lowest RTT and transmit the segment with the new data on that path.
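
The eligibility conditions and the selection rule above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the draft's own figure: srtt, cc_state and blocked() follow the path attributes used in this document, while cwnd, in_flight and the in_rto flag are hypothetical additions. It does not include fast reinjections.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    srtt: float            # smoothed RTT, in seconds
    cc_state: str          # "slow_start", "congestion_avoidance" or "recovery"
    cwnd: int              # congestion window, in bytes (illustrative)
    in_flight: int = 0     # unacknowledged bytes (illustrative)
    in_rto: bool = False   # hypothetical flag: path is in a retransmission timeout

    def blocked(self, length: int) -> bool:
        """True if the congestion controller would reject the packet."""
        return self.in_flight + length > self.cwnd


def eligible(path: Path, length: int) -> bool:
    """A path is eligible if its window has room and it is not recovering."""
    return (not path.blocked(length)
            and path.cc_state != "recovery"
            and not path.in_rto)


def lowest_rtt_first(paths: List[Path], length: int) -> Optional[Path]:
    """Return the eligible path with the lowest smoothed RTT, else None."""
    candidates = [p for p in paths if eligible(p, length)]
    return min(candidates, key=lambda p: p.srtt, default=None)
```

Returning None when no path is eligible leaves it to the caller to wait until a congestion window opens.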
illustrates a simple lowest RTT scheduler which does not
include fast reinjections.

To handle head-of-line blocking situations when the paths have a large delay
difference, the scheduler uses a strategy of opportunistic retransmission and
path penalization.

Opportunistic retransmission kicks in whenever a path is eligible for transmission
but the receive-window advertised by the receiver prevents the sender from transmitting
new data. In that case the sender can transmit previously transmitted data over the
eligible path. To overcome the head-of-line blocking the sender will thus transmit
the packet at the head of the transmission queue over this faster path (if it
hasn't been transmitted on this particular path yet). This packet thus has a
chance to quickly reach the receiver and fill the hole created by the head-of-line
blocking.

Whenever the previously mentioned mechanism kicks in, it is an indication that
the round-trip-time of the slower path is too high to allow the path with the
lower RTT to fully use its capacity. We thus should reduce the transmission rate
on the slower path. This mechanism is called penalization and is achieved by
dividing the congestion window of that path by 2.

Combining some types of schedulers can be a way to address some use cases. For
example, a scheduler using the priority and the round-trip-time attributes can
be used to give more priority to some links having a lower cost (e.g. fixed
vs. mobile accesses) while still being able to benefit from the advantages of
the "Lowest RTT First" scheduler described above. A prototype of
this "hybrid" scheduler is shown in .

Combining some properties can have new undesired effects. In the case presented
here, paths with a higher priority but also a higher RTT can degrade performance
compared to a setup with a scheduler that looks only at the round-trip-time
and not at the priority. If paths with a higher priority are used first whatever
the network conditions are on these paths, it is expected that some of the total
bandwidth capacity is sacrificed in order to fully use the capacity of the links
with a higher
priority. If the paths with a lower priority are seen as extra capacity that can
be used only when the other links are congested, it is fine if they are not
fully used when the sender is limited by the global sending window of the
multipath connection.

For this kind of scheduler, it could be interesting to also associate the
benefits of the "Round-Trip-Time Threshold" scheduler described
above. That scheduler limits the impact of links having a
higher priority but a very high RTT while other paths, with a lower priority and
a lower RTT, can be used. It is a matter of deciding what is important:
whether maximizing the use of some paths matters more than reducing the latency,
and probably more than the total
bandwidth as well, if the sender and/or the receiver are limited by congestion
windows.

It is also important to note that the penalization mechanism described in the
"Lowest round-trip-time first" scheduler also needs to take
into account the priority. If the goal is to maximize the use of some links over
others, links with a higher priority should not be penalized in favor of the ones
with a lower priority. The consequence of this would be that links with higher priority
are underused due to the penalization.

TCP Extensions for Multipath Operation with Multiple Addresses
   TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure. Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths. This document defines an Experimental Protocol for the Internet community.

Coupled Congestion Control for Multipath Transport Protocols
   Often endpoints are connected by multiple paths, but communications are usually restricted to a single path per connection. Resource usage within the network would be more efficient were it possible for these multiple paths to be used concurrently. Multipath TCP is a proposal to achieve multipath transport in TCP. New congestion control algorithms are needed for multipath transport protocols such as Multipath TCP, as single path algorithms have a series of issues in the multipath context. One of the prominent problems is that running existing algorithms such as standard TCP independently on each path would give the multipath flow more than its fair share at a bottleneck link traversed by more than one of its subflows. Further, it is desirable that a source with multiple paths available will transfer more traffic using the least congested of the paths, achieving a property called "resource pooling" where a bundle of links effectively behaves like one shared link with bigger capacity. This would increase the overall efficiency of the network and also its robustness to failure. This document presents a congestion control algorithm that couples the congestion control algorithms running on different subflows by linking their increase functions, and dynamically controls the overall aggressiveness of the multipath flow. The result is a practical algorithm that is fair to TCP at bottlenecks while moving traffic away from congested links. This document defines an Experimental Protocol for the Internet community.

Computing TCP's Retransmission Timer
   This document defines the standard algorithm that Transmission Control Protocol (TCP) senders are required to use to compute and manage their retransmission timer. It expands on the discussion in Section 4.2.3.1 of RFC 1122 and upgrades the requirement of supporting the algorithm from a SHOULD to a MUST. This document obsoletes RFC 2988. [STANDARDS-TRACK]

Load Sharing for the Stream Control Transmission Protocol (SCTP)
   The Stream Control Transmission Protocol (SCTP) supports multi-homing for providing network fault tolerance. However, mainly one path is used for data transmission. Only timer-based retransmissions are carried over other paths as well. This document describes how multiple paths can be used simultaneously for transmitting user messages.

Multipath Extensions for QUIC (MP-QUIC)
   This document specifies extensions to the QUIC protocol to enable the simultaneous usage of multiple paths for a single connection. These extensions are compliant with the single-path QUIC design and preserve QUIC privacy features.

DCCP Extensions for Multipath Operation with Multiple Addresses
   DCCP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a DCCP session could improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure. Multipath DCCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional DCCP to support multipath operation. The protocol offers the same type of service to applications as DCCP and it provides the components necessary to establish and use multiple DCCP flows across potentially disjoint paths.

Experimental Evaluation of Multipath TCP Schedulers

How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP

SMAPP: Towards Smart Multipath TCP-enabled APPlications

Multipath simulator for the IETF draft Multipath schedulers

- Renamed Delay Threshold to RTT Threshold
- Added the Priority And Lowest RTT First scheduler