Lightweight Fair Queueing
Kokkonranta 21
Pitkajarvi
31520
Finland
+358 44 927 2377
chromatix99@gmail.com
Redacted
Liberec 30
463 11
Czech Republic
pete@heistp.net
Internet Transport Working Group
Internet-Draft
Keywords: Fair Queueing, SCE
Abstract
This note presents Lightweight Fair Queueing (LFQ), a fair queueing algorithm
with a small code footprint, low memory requirements, no multiply operations,
only two physical queues, and only one set of AQM state. LFQ provides throughput
fairness, sparse flow prioritization and ordering guarantees, making it suitable
for a mixture of traffic flow types.
Introduction
Flow isolation is a powerful tool for congestion management in today's Internet.
Early implementations, such as SFQ, aimed simply to have inter-flow induced
latency dependent on the number of flows, rather than the total length of the
queue. Today, DRR++ explicitly shares throughput capacity between
flows, and prioritises "sparse" flows that use less than their fair share, on
the grounds that these are probably latency-sensitive traffic. This reduces the
inter-flow induced latency to near zero for sparse flows, regardless of the
number of saturating flows.
Unfortunately, the relatively complex algorithms and considerable dynamic state
of a DRR++ queue set with individual AQM (Active Queue Management)
instances have proved disheartening to hardware implementors, and thus to
deployment on high-capacity links. Ordinary CPE devices implementing DRR++ in
software work well up to about 100 Mbps. A scheme involving only a small number
of queues and AQM instances might be more suitable for the 1 Gbps and up
category.
This note therefore presents LFQ, a fair queueing algorithm suitable for
implementation in hardware, making fair queueing possible on high-throughput
routers and low-cost middleboxes.
Background
LFQ is inspired by DRR++'s facility for identifying "sparse" flows and giving them
strict priority over "saturating" flows. DRR++ does this by maintaining separate
lists of queues (each queue containing one flow) meeting "sparseness" criteria or not.
Queues are first placed into the sparse list when they become non-empty, then moved
to the saturating list when their deficit exceeds a threshold called "quantum".
Every queue's deficit is incremented by the packet size when packets are delivered
from it, and decremented by the quantum when they come up in the list rotation.
Queues are removed from the saturating list only when they are found empty for a
full rotation.
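The bookkeeping described above can be sketched in C. The quantum value and
the structure and function names below are illustrative assumptions for the
example, not part of DRR++ as specified:

```c
#include <stdbool.h>

#define QUANTUM 1514  /* illustrative quantum, in bytes */

/* Hypothetical per-queue state for the DRR++ bookkeeping sketched above. */
struct drr_queue {
    int  deficit;     /* bytes delivered, less accumulated quantum credit */
    bool saturating;  /* true once the flow exceeds its fair share */
};

/* Called when a packet of 'size' bytes is delivered from this queue. */
static void on_deliver(struct drr_queue *q, int size)
{
    q->deficit += size;
    if (q->deficit > QUANTUM)
        q->saturating = true;  /* move to the saturating list */
}

/* Called when the queue comes up in the list rotation. */
static void on_rotation(struct drr_queue *q)
{
    q->deficit -= QUANTUM;
}
```

A queue whose deliveries stay within one quantum per rotation never trips the
saturating flag, which is precisely the sparseness test described above.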
This "sparseness" heuristic, based on observed per-flow queue occupancy, is
relatively robust compared to relying on the correct behaviour of each
source's congestion control algorithm and/or explicit traffic marking. This is
especially relevant given the recent development of high-fidelity congestion
signalling schemes, such as DCTCP and SCE (Some Congestion Experienced), whose
expected congestion-signal response is markedly different from previous
standards.
In fq_codel and Cake, AQM is applied individually to each
DRR++ flow, thus avoiding unnecessary leakage of AQM action from flows requiring
it to well-behaved traffic which does not. This arrangement has been shown to
work well in practice, and is widely deployed as part of the Linux kernel,
including in many CPE devices. However, the per-queue AQM state dominates the
memory requirements of DRR++.
LFQ attempts to retain most of these characteristics while simplifying implementation
requirements considerably. This still requires identifying individual traffic flows
and keeping some per-flow state, but there is no longer an individual queue per state
nor any lists of such queues. Instead there are only two queues and only one set of
AQM state. The operations required are believed to be amenable to hardware implementation.
The Algorithm
Overview
Unlike conventional fair queueing, Lightweight Fair Queueing does not
distribute packets to queues by a flow mapping, but by a sparseness metric
associated with that mapping. Thus, the number of queues is reduced to two.
The number of flows which can be handled is far greater, however, being limited
by the number of flow buckets indexed by the flow hash. An implementation might
define a flow as traffic to one subscriber, and provide a perfect mapping between
subscribers and buckets. Alternatively it might provide a stochastic mapping based on
the traditional 5-tuple of addresses, port numbers, and protocol number.
The per-flow state is just two integers (one signed, one unsigned) and one binary flag,
in contrast to DRR++ which requires a whole queue and a set of AQM state per flow.
These integers are B, tracking the backlog of the flow in packets, and D, tracking a
deficit value analogous to that used in DRR++. The range of D is [-MTU..+MTU], so for
a typical 1500-byte MTU, a 12-bit register suffices. The binary flag K indicates whether
packets should be skipped for the rest of this pass through the queue. This small
per-flow state makes tracking a large number of flows practical.
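For illustration, this per-flow state could be packed into a single 32-bit
word. Only the 12-bit signed D follows directly from the text above; the
19-bit width for B below is an assumption for the example:

```c
#include <stdbool.h>

/* Illustrative packing of the per-flow state: B (backlog, in packets),
 * D (deficit, in bytes) and K (skip flag). A 12-bit signed D covers
 * [-MTU..+MTU] for a typical 1500-byte MTU. */
struct lfq_bucket {
    unsigned   B : 19;  /* backlog in packets; width is an assumption */
    signed int D : 12;  /* deficit in bytes, range [-MTU..+MTU] */
    bool       K : 1;   /* skip flag for the current scan pass */
};
```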
The two queues provided are SQ and BQ:
SQ is the "sparse queue" which handles flows classed as sparse, including the
first packets in newly active flows. This queue tends to remain short and
drain quickly, which are ideal characteristics for latency-sensitive traffic,
and young flows still establishing connections or probing for capacity. This
queue does not maintain AQM state nor apply AQM signals.
BQ is the "bulk queue" which handles all traffic not classed as sparse, including
at least the second and subsequent packets in a burst. BQ has not only the typical
"head" and "tail", but also a "scan" pointer which iterates over the packets in the
queue from head to tail. Packets are delivered from the "scan" position, not from
the "head"; this is key to the capacity-sharing mechanism. A full set of AQM
state is maintained on BQ, and applied to all traffic delivered from it.
In case of queue overflow, packets are removed from the "head" of BQ to make room
for the new arrivals; this head-dropping behaviour minimises the delay before the
lost packets can be retransmitted.
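One way to realise BQ's head/tail/scan structure is a singly-linked list that
tracks the node just before the scan position, so the scanned packet can be
unlinked in O(1). The following C sketch is illustrative only; names such as
bq_pull and scan_prev do not appear in this note:

```c
#include <stddef.h>

struct pkt {
    struct pkt *next;
    int size;
};

/* Illustrative bulk queue with head, tail and scan pointers. 'scan_prev'
 * is the node before the scan position, or NULL when scan == head. */
struct bulk_queue {
    struct pkt *head, *tail, *scan_prev;
    size_t size;  /* sum of packet sizes, maintained on push/pull */
};

/* Return the packet at the scan position without removing it. */
static struct pkt *bq_scan(struct bulk_queue *q)
{
    return q->scan_prev ? q->scan_prev->next : q->head;
}

/* Add a packet to the tail of the queue. */
static void bq_push(struct bulk_queue *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->size += p->size;
}

/* Remove and return the packet at the scan position. */
static struct pkt *bq_pull(struct bulk_queue *q)
{
    struct pkt *p = bq_scan(q);
    if (!p)
        return NULL;
    if (q->scan_prev)
        q->scan_prev->next = p->next;
    else
        q->head = p->next;
    if (q->tail == p)
        q->tail = q->scan_prev;
    q->size -= p->size;
    return p;
}

/* Advance the scan pointer past the current packet, leaving it queued. */
static void bq_skip(struct bulk_queue *q)
{
    struct pkt *p = bq_scan(q);
    if (p)
        q->scan_prev = p;
}
```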
This simplification of state and algorithm has some drawbacks in terms of
resultant behaviour. The sharing of link capacity between flows will not be as
smooth as with DRR++, and the relatively coarse provision of AQM may result in
a noticeable degradation of congestion signalling.
Declarations
The following queues are defined:
-------------------------------
--> | | | | -->
-------------------------------
SQ: the Sparse Queue, containing packets from flows with no more
than one packet in the queue at a time (no AQM for this queue).
/--[AQM]-->
|
---------------------|---------
--> | | | | | | | | | | | |
-------------------------------
BQ: the Bulk Queue, containing packets from flows that build up a
multi-packet backlog (AQM managed queue), showing scan pointer.
The following constants and variables are defined:
- B: the flow backlog, in packets
- D: the flow deficit, in bytes
- K: the flow scan skip flag
- N: the number of flow buckets (each bucket containing a value of B, D, and K)
- S: the size of the packet currently being processed, in bytes
- T: the packet's timestamp, for later use by AQM
- H: the packet's flow hash, cached
- MTU: the MTU of the link
- MAXSIZE: the maximum combined size of all packets in both queues
- NOW: the current timestamp
- FLOWS: all flow buckets
Finally, the hash function FH() maps a packet to a flow bucket:
+-------+
/--- | B D K |
/ +-------+
/
+------+ / +-------+
----- Packet -----> | FH() | ------- | B D K |
+------+ \ +-------+
\
\
\--- ... N
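As one possible stochastic mapping, FH() might hash the 5-tuple with any
well-distributed function and reduce the result modulo N. The C sketch below
uses FNV-1a purely for illustration; this note does not mandate a hash:

```c
#include <stdint.h>
#include <stddef.h>

#define N 1024  /* number of flow buckets; illustrative */

/* The traditional 5-tuple identifying a flow. */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* FNV-1a over the tuple fields (hashing fields rather than the raw
 * struct avoids including padding bytes). */
static uint32_t fh(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;  /* FNV offset basis */
    uint32_t words[3] = { t->src_ip, t->dst_ip,
                          ((uint32_t)t->src_port << 16) | t->dst_port };
    const uint8_t *b = (const uint8_t *)words;
    for (size_t i = 0; i < sizeof(words); i++)
        h = (h ^ b[i]) * 16777619u;  /* FNV prime */
    h = (h ^ t->proto) * 16777619u;
    return h % N;
}
```

With N a power of two, the final reduction can be h & (N - 1), avoiding a
divide; collisions between distinct flows remain possible, as with any
stochastic mapping.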
Pseudo-code
In the following pseudo-code:
- Lowercase is used for internal variables, and uppercase for constants,
  variables and queues defined in the Declarations section.
- The send() function transmits the packet given, first applying AQM logic if
  the packet was delivered from BQ (SQ traffic is exempt, per its definition);
  if the AQM drops the packet, dequeue() is immediately re-called.
The following functions and variables are defined for both the sparse and bulk
queues:
- The push() function adds a packet to the tail of the specified queue.
- The pop() function removes and returns the packet from the head of the
specified queue. BQ's scan pointer must point to the same packet afterwards,
if it is still present, otherwise to the head of the queue.
- The .size variable (BQ.size and SQ.size) refers to the sum of the sizes of all
packets in the queue, and may be maintained during push(), pop(), and pull().
- The .head variable is the current head pointer for the queue.
The following functions and variables are defined only for BQ:
- The pull() function removes and returns the packet at the scan pointer.
- The scan() function returns the packet at the scan pointer without removing
it.
- The head() function returns the packet at the head of BQ without removing
  it.
- The .scan variable is the current scan pointer for the queue.
The logic for the enqueue operation is as follows:
enqueue(packet p) {
while (SQ.size + BQ.size + S > MAXSIZE) {
; Queue overflow - drop from BQ head, then from SQ
dp := pop(BQ)
if (!dp)
dp := pop(SQ)
bkt := dp.H
bkt.B -= 1
}
bkt := FH(p)
p.T = NOW
p.H = bkt
if (bkt.B == 0 && bkt.D >= 0 && !bkt.K)
push(SQ, p)
else
push(BQ, p)
bkt.B += 1
}
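The classification test in enqueue() acts only on the per-flow state, so it
can be written as a single predicate. A minimal C sketch (the struct and
function names are illustrative):

```c
#include <stdbool.h>

/* Illustrative per-flow bucket: B (backlog), D (deficit), K (skip flag). */
struct bucket {
    int  B, D;
    bool K;
};

/* A packet goes to SQ only if its flow has no backlog, a non-negative
 * deficit, and is not flagged to be skipped for this scan pass. */
static bool classify_sparse(const struct bucket *bkt)
{
    return bkt->B == 0 && bkt->D >= 0 && !bkt->K;
}
```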
The logic for the dequeue operation is as follows:
dequeue() {
; SQ gets strict priority
p := pop(SQ)
if (p) {
send(p)
bkt := p.H
bkt.B -= 1
bkt.D -= S
if (bkt.D < 0) {
bkt.K = true
bkt.D += MTU
}
return
}
; Process BQ if SQ was empty
while (head(BQ)) {
p := scan(BQ)
if (!p) {
; Scan has reached tail of queue
forall(f in FLOWS where f.B == 0 && !f.K)
f.D = 0
forall(f in FLOWS where f.K)
f.K = false
BQ.scan = BQ.head
p := scan(BQ)
}
bkt := p.H
if (!bkt.K) {
; Packet eligible for immediate delivery
send(p)
pull(BQ)
bkt.B -= 1
bkt.D -= S
if (bkt.D < 0) {
bkt.K = true
bkt.D += MTU
}
return
} else {
; Packet to stay in queue
BQ.scan = BQ.scan.next
}
}
}
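Both delivery paths in dequeue() apply the same flow-state update after a
packet is sent. That shared step can be factored out, as in this illustrative
C sketch (charge_flow is not a name used in this note):

```c
#include <stdbool.h>

#define MTU 1500  /* link MTU, as in the Declarations */

/* Illustrative per-flow bucket: B (backlog), D (deficit), K (skip flag). */
struct bucket {
    int  B, D;
    bool K;
};

/* Account for delivering a packet of 'size' bytes from this flow:
 * decrement the backlog and deficit; when the deficit goes negative,
 * grant one MTU of credit and skip the flow for the rest of the
 * current scan pass. */
static void charge_flow(struct bucket *bkt, int size)
{
    bkt->B -= 1;
    bkt->D -= size;
    if (bkt->D < 0) {
        bkt->K = true;
        bkt->D += MTU;
    }
}
```

Note that D stays within [-MTU..+MTU] under this update, consistent with the
12-bit register sizing discussed earlier.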
Simulator
A discrete time simulator for LFQ has been implemented; it serves as a
supporting demonstration, a verification of the algorithm's effectiveness,
and a test bed for exploration.
Security Considerations
As with all FQ algorithms, an attacker may degrade service by flooding the queue
with traffic that hashes into random buckets, or obtain enhanced service by
using multiple flows where one would normally suffice. The latter may be
mitigated by a flow mapping for individual hosts, or subscribers, rather than
the 5-tuple.
IANA Considerations
There are no IANA considerations.
Informative References
- Stochastic Fairness Queueing
- Deficits for Bursty Latency-critical Flows: DRR++
- Piece of CAKE: A Comprehensive Queue Management Solution for Home Gateways
- Lightweight Fair Queueing Simulator GitHub Repository