INTERNET-DRAFT                                              J. L. Adams
Individual submission                                                BT
rap working group                                           A. J. Smith
Expires May 31, 2002                               Cranfield University
                                                          December 2001

          A New QoS Mechanism for Mass-Market Broadband
               draft-adams-qos-broadband-00.txt

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of
Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as 'work in progress'.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

This document describes a proposal that deals with congestion
conditions which may arise when a home or SME customer requests too
many simultaneous flows to be forwarded down a DSL link or other access
technology. It provides a way of guaranteeing certain flows while
making others (typically the latest, or another flow selected for
policy reasons) the target of focused packet discards. It has a number
of significant benefits over other possible solutions, such as
classical RSVP, and these are also listed in this document.

1 Introduction

Broadband services delivered over DSL to residential or SME consumers
have been the focus of much interest recently. The potential
opportunities include TV distribution for selected residential areas,
combined with voice and data services. Consumers may select lower value
packages to begin with and move progressively, through a process of
'upsell', towards higher value packages.
It is envisaged that an upsell is automatically configured after the
consumer selects it, e.g. using a browser. This creates a market
opportunity where lower value packages may be the normal offering in
early realisations, and higher value packages are added to the platform
in stages as vendor equipment develops. For example, service packages
may exclude TV in the early offering.

Among the higher value packages that could be added later to a service
platform is one which relies on a QoS function controlling the
aggregate mix of services forwarded to each consumer. This QoS function
would protect certain flows that could be pre-selected by the
consumers. Such flows would not be interrupted or subject to packet
discard.

This Internet-Draft proposes a new QoS function at the IP layer that
provides policy-based flow protection for consumers. We believe this
new function has advantages over classical RSVP, but it may be
accommodated within a more lightweight version of that protocol.

2 Background

2.1 Edge Nodes

Edge nodes exist in a network and channel all content from service
providers towards customers. While this could be achieved using a
separate ATM VC for each service type (TV, voice, and data), it becomes
very complex if extended so that, e.g., the data VC is no longer a
single VC but a separate VC for each of several types of data. In
particular, web streaming would need a separate VC if its QoS is to be
treated differently from other data types. It is therefore advantageous
to aggregate all flows onto a single ATM VC, because each flow can then
be given policy-controlled QoS treatment. This implies that the IP
layer has to handle the separate QoS requirements of each service type.
Several vendors have developed equipment (Edge Nodes) that channels
services using separate ATM VCs, and several vendors are now
considering how they can move to IP-based multiservice aggregation.
The device described in this document is an improved Edge Node. It
operates in conjunction with equipment that is functionally similar to
that currently deployed, except for a modification to the set-top box
so that it can recognise certain new alarm signals created by the
device described here.

The target requirements for Edge Nodes are that large numbers of
customers should be connected (ultimately this may be 100,000 and
upwards per Edge Node). The customer is connected to the Edge Node via
a DSL link. In the network a number of fibre interfaces may be used,
e.g. ATM, which is well known as a simple and effective technology for
picking out individual or aggregated content flows for a specific
customer and forwarding them down the correct DSL links. ATM currently
places some restrictions on the maximum link rate; most vendors
currently stop at 622 Mbit/s for ATM interfaces, because their products
do not include SAR chips that go faster than this rate. However, it is
expected that 2.5 Gbit/s ATM links will be commonly available within
the next two years; this would permit rates towards groups of customers
to be increased. A higher rate of a few megabit/s per customer would
permit the aggregation of TV and VoD signals into the mix. We may
anticipate patterns of demand such that there will be a mix of both
lower rate customers and higher rate (multimegabit per second)
customers on the same link, enabling that link to handle many hundreds
of customers in total.

2.2 Quality of Service

While much effort has been directed by vendors towards the development
of an Edge Node, there is one aspect where further improvements are
needed. An Edge Node must be able to control QoS when congestion
occurs, and this is the subject of the device described in this
document. As an example scenario, consider a customer connected to a
Virtual Private Network (VPN) which in turn is connected to various
content sites.
The customer has subscribed to a basic service package, which provides
a main content source that can include TV and data. This basic service
package can be extended, and the customer is able to select from extra
TV or data content sources. More generally, a customer can be connected
to multiple VPNs and receive additional content via the internet. All
these sources of traffic can combine to cause congestion. Both the
simpler case of a single VPN and the extended case of multiple VPNs
lead to the QoS issue.

An example of this occurs if a source of real-time video is demanded at
the same time as streamed media is being viewed by another person in
the same home. These two flows could both have low loss tolerance. If
the combined traffic load produced by these sources is larger than the
capacity of the link, then some information is lost (typically from
both flows) and the perceived QoS becomes unacceptable.

The discarding of packets is best handled at a single point in the
network for all downstream flows to a specific customer and, logically,
this best point is the Edge Node. We propose that a shaping function
should be located at the Edge Node, which controls an envelope of
traffic destined for any one customer at, or below, the customer's link
capacity. This removes the need for sophisticated traffic handling
functions in the DSLAM equipment.

3 The device

The device described in this document would modify and improve the
above proposed shaping function. It could also operate equally well in
other network locations, wherever packets are buffered and can be
examined in terms of their flow identities and class type. Currently,
when flows consist of different priority information, such as video and
data, shapers would first cause the discard of the lower priority flows
(typically the data flow) and protect the video flows.
However, our device addresses the problem of equal priority flows that
cause congestion and are unable to slow down through, e.g., TCP
congestion control.

3.1 Classical RSVP: some disadvantages

Classical RSVP can be used for congestion control of IP based flows.
However, there are disadvantages with the full heavyweight version of
the protocol. RSVP messages are separate from the higher-level call
request and acknowledgement messages that lead to, e.g., phone ringing
(i.e. the H.225 messages). To introduce RSVP into, e.g., the standard
voice signalling message sequence requires the suspension of this
sequence and then its resumption following the successful completion of
the RSVP message sequence. This kind of suspension-resumption
methodology would have to be added to the higher-level signalling
sequence of any kind of content, to prevent such content from starting
to flow before the reservations have been made.

If some flows are variable bit rate, RSVP is faced with difficult
choices, each of which presents disadvantages:

- To admit the latest reservation request based on some average rate,
  with the possibility that the flow will exceed this average rate for
  significantly long intervals and cause congestion and loss of packets
  to itself and other reserved-bandwidth flows.

- To admit the latest reservation request based on a peak rate, thereby
  wasting some of the available capacity through the condition that
  flows will only be admitted while the sum of their peak rates is less
  than the available capacity.

- To operate a function that tries to estimate the remaining available
  capacity on a link by estimating a percentile point of the current
  offered traffic load, and use this estimate as the condition for
  accepting or rejecting the latest reservation request.
Another disadvantage of RSVP (and indeed any other call admittance
procedure) is the need to keep state information on flow arrivals and
cessations so that guaranteed bandwidth can be returned to a notional
common pool of available capacity.

Yet another disadvantage is the need to suspend higher-level session
control protocols until RSVP has completed its reservations. This
requires certain timeouts to be implemented so that suspension does not
continue indefinitely, and various failure modes then need to be
catered for, requiring additional state information to be kept. For
example, the 'call state' reached at the point where suspension is
implemented needs to be kept, so that it can be torn down if necessary.

3.2 Device Advantages

All these disadvantages are overcome with the device described in this
document. With this device it is possible to:

- Admit variable bit rate flows without being constrained to accept
  only a set of flows whose peak rates sum to less than the available
  capacity.

- Admit such flows without knowing the remaining capacity of the link.

- Admit flows without being required to keep active/ceased state
  information on admitted flows.

- Admit flows without requiring a suspension of higher-level session
  control protocols.

- Provide guarantees to each of the admitted flows except under certain
  extreme traffic conditions, when selected flows will be targeted for
  packet loss, enabling other flows to continue without any loss or
  undesirable packet delays.

3.3 Device Operation Description

What follows is a detailed description of how the device operates to
achieve these advantages.

3.3.1 Start Packet

When a flow towards the customer commences, a new control packet must
be sent; we have called this a 'Start Packet'. There is no requirement
for the flow to wait for any processing or acknowledgement of its Start
Packet; it can start transmitting actual data packets immediately after
the Start Packet.
A Start Packet is an IP-layer control packet with an identifying field.
This field may be split into two parts, with one part carried in the
standard IP header. For example, setting bit 49 (not yet used for any
other purpose) would identify the packet as a control packet. The other
part of the field is the first element of the information field, which
further identifies it as a Start Packet or as an Alarm message packet.
The exact nature of this field needs agreement within the standards
community. In other respects, a Start Packet carries the same
information (destination address, source address, and
source/destination port numbers) as appears in the IP packet headers of
the stream of data packets which form the flow behind the Start Packet.

Because of its identifying field, the Start Packet would be
recognisable to a packet discard device located, for example, at the
edge of the network to the customer. The basic principle is that the
Start Packet contains information (such as the IP header fields of the
subsequent data packets) which is loaded into a register by the Edge
Node. Subsequent data packets are examined, and if their headers match
what is in the register, then such packets may be discarded when the
buffer is filled beyond a certain threshold value.

Note that although we describe the device as operating on flows towards
customers converging at Edge Nodes, there is no restriction that it
operate only in that direction, or at one particular buffer point.

3.3.2 Functionality

The device has a set of functions which are co-located with a buffer to
achieve the advantages listed above. The buffer is part of the proposed
shaping function specific to a single customer, and the output from
this buffer towards the customer is restricted in maximum rate to be
compatible with the capacity of the corresponding link. The function
that controls this rate limitation is a scheduling function.
The set of functions includes:

- The customer-specific buffer. The implementation of such a buffer
  need not be in the form of physically separate buffers per customer;
  it would normally be a single buffer shared by all customers, with
  flow accounting maintained on a per-customer basis.

- A packet discard function, which maintains a state machine specific
  to a customer (although it should be noted that the number of states
  maintained per customer is far more limited than the number required
  by RSVP). It also serves to detect newly arriving Start Packets that
  are routed to the customer-specific buffer. It is an assumption of
  this description that there is already a routing process set up that
  routes packets (including Start Packets) destined for a specific
  customer towards a customer-specific buffer, where a shaped output is
  enforced. The buffer is needed to absorb some degree of burstiness in
  the arrival rate.

- A main processor, which controls which flows specific to a given
  customer may be subject to focused discards, as discussed further
  below. As with the buffer, an actual implementation would normally
  run a virtual process per customer in a single processor capable of
  handling all customers on a link.

- A register, which maintains discard control information specific to a
  given customer. Again, an actual implementation would use a single
  register which is divided on a per-customer basis into a number of
  virtual registers.

3.3.3 Basic Operation

In its simplest operation, a succession of Start Packets (preceding a
succession of new flows) is sent towards a customer and loaded into the
(virtual) register such that each overwrites the previous one;
therefore the register always contains the latest flow. When a flow
identity is removed from the register (usually by being overwritten)
the corresponding flow becomes bandwidth guaranteed, except under
certain extreme traffic conditions to be discussed below.
This means that, normally, no packets are discarded from such a flow
when the buffer experiences congestion. We speak here of such flows as
having entered the guaranteed area. If, over an interval of time, a
sequence of flows starts with their corresponding Start Packets, then
the normal behaviour of the system described here allows the earlier
flows to move to the guaranteed area. The register will always retain
at least one flow identity, whose packets will be the subject of
focused discard if the buffer becomes too full.

A focused discard is triggered when the buffer has sent a control
signal to the main control logic indicating that a fill threshold level
has been exceeded. This subsequently instructs the discarding function
to commence packet discarding. Before beginning to discard packets, the
discarding function sends two control packets. The first is sent
forward towards the customer; this control packet is called the
'Congestion Notification' packet. It advises the application resident
in the customer's equipment that a network congestion condition has
occurred. An application may choose to continue receiving those data
packets that are not deleted by the discarding function, or it may
close down and indicate network busy to the user. The second is sent
backwards towards the source to indicate that this flow is about to
become the subject of focused packet discard. Again, the source may
choose to ignore this control packet, or it may terminate the flow.

The discard function then begins to discard all packets whose flow
identity matches the identity in the flow register. Packet discarding
continues until the buffer fill level is reduced to a lower threshold
value. The main control logic may also inform a network billing
function that flow discarding has commenced on a specific flow, if the
charging arrangements require this information.
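The basic operation above can be sketched in a few lines of Python. This is an illustrative model only: the class, thresholds, and control-packet names are assumptions for the sketch, not part of this draft's specification. The latest Start Packet overwrites the register (so the previous flow enters the guaranteed area), an upper fill threshold triggers the two control packets and focused discard, and a lower threshold ends it.

```python
# Illustrative sketch only: names, thresholds, and the packet model are
# assumptions, not specified by this draft.
UPPER, LOWER = 80, 60                  # buffer fill thresholds (percent)

class BasicDiscardDevice:
    def __init__(self):
        self.register = None           # latest flow identity only
        self.discarding = False
        self.sent_control = []         # control packets emitted so far

    def on_start_packet(self, flow_id):
        # Overwrite: the previously registered flow is now guaranteed.
        self.register = flow_id

    def on_buffer_fill(self, fill):
        if not self.discarding and fill > UPPER:
            # Before discarding, send the two control packets.
            self.sent_control.append(("Congestion Notification", "to customer"))
            self.sent_control.append(("Discard Warning", "to source"))
            self.discarding = True
        elif self.discarding and fill < LOWER:
            self.discarding = False    # fill fell below the lower threshold

    def accept(self, flow_id):
        """True if a data packet is forwarded, False if discarded."""
        return not (self.discarding and flow_id == self.register)

dev = BasicDiscardDevice()
dev.on_start_packet("flow-A")
dev.on_start_packet("flow-B")          # flow-A is now guaranteed
dev.on_buffer_fill(90)                 # congestion: focused discard begins
assert dev.accept("flow-A")            # guaranteed flow is untouched
assert not dev.accept("flow-B")        # latest flow is the discard target
```

The two thresholds give hysteresis, so discarding does not flap on and off around a single fill level.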
In some preferred arrangements the customer will be billed on a
flat-rate basis, and it may therefore be unnecessary to send any
indication to a billing function. If an application chooses to close
down on receipt of the Congestion Notification signal, then it is
responsible for sending the appropriate signals to the source end to
shut down the flow. These procedures are outside the scope of this
device and will vary from application to application.

3.3.4 More Refined Operations

In a refinement of the simplest way of operating such control functions
for packet discard, a field in the Start Packet is utilised. This field
is known as the 'Rate Advisory' field, and it conveys the peak bit-rate
of the flow. The register is now loaded so that it always retains a set
of flows whose rate advisories sum to N percent (e.g. 5 percent) of the
link bandwidth. The value N can be varied to suit certain known traffic
conditions. It caters for the degree of uncertainty that exists when
accepting variable bit-rate flows. Thus, if the combined set of flows
in the 'guaranteed area' bursts to a load level which is significantly
higher than the link capacity, then focused discard on the set of flows
in the register can be expected to reduce the load by up to N percent
of the link capacity. This provides sufficient flows to focus packet
discards on in order to get the buffer fill level back below the
threshold value.

Another refinement concerns how equal priority flows are subjected to
focused discards when several such flows are currently in the register.
It is possible to operate the discard function so that the latest flow
is the most vulnerable, and earlier flows (which are still retained
because they make up a combined set of flows whose rate advisories sum
to N percent) become less and less vulnerable, until they eventually
leave the window into the guaranteed area.
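The Rate Advisory window described above can be sketched as follows. The function name, flow identifiers, and rates are illustrative assumptions: a new flow enters at the vulnerable end, and the oldest flow is released to the guaranteed area only while the remaining advisories still sum to at least N percent of the link rate.

```python
# Illustrative sketch of the Rate Advisory window; all names and rates
# here are assumptions, not taken from the draft.
from collections import deque

def admit(window, flow_id, rate_advisory, link_rate, n_percent=5.0):
    """Add a flow to the window, releasing earlier flows to the
    guaranteed area while the rest still cover N percent of the link."""
    target = link_rate * n_percent / 100.0
    window.append((flow_id, rate_advisory))      # latest = most vulnerable
    while len(window) > 1:
        oldest_rate = window[0][1]
        if sum(r for _, r in window) - oldest_rate >= target:
            window.popleft()                     # oldest becomes guaranteed
        else:
            break                                # still needed in the window

w = deque()
admit(w, "tv-1", 4_000_000, 100_000_000)    # 4 Mbit/s on a 100 Mbit/s link
admit(w, "vod-2", 2_000_000, 100_000_000)
admit(w, "data-3", 1_500_000, 100_000_000)
# All three remain in the window: dropping "tv-1" would leave only
# 3.5 Mbit/s of advisories, below the 5 Mbit/s (N = 5 percent) target.
assert len(w) == 3
```

A single high-rate arrival can flush several earlier flows into the guaranteed area at once, since its advisory alone may cover the N percent target.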
The discarding function will try to control the forwarding rate towards
the buffer according to a leaky bucket principle, where only a limited
burst of packets above a defined rate is permitted to be forwarded to
the buffer. This defined rate is equal to the rate at which the buffer
can transmit packets towards the customer. The discarding function can
start by discarding every packet of the latest flow, and only select
additional packets from other flows in its register if it would
otherwise exceed its burst size restrictions.

There are other ways of operating the discard function, such as
policy-based controls where, instead of the latest flow being the one
chosen for total discard, another flow is chosen on the basis of policy
information stored in the discard function. A specific way of obtaining
such policy information is to make use of a second control field of the
Start Packet. This field is termed the sub-components field, and it
allows policy information to be captured within the register and read
by the discard function. When a flow consists of different media
components, such as video and data, this sub-components field stores
information relating to each component, including its priority in terms
of packet discard. The packet matching performed on the data packets
passing through the discard function then includes not only the
destination and source addresses but also other information that
uniquely identifies a sub-component. This may include source or
destination port numbers, or other information such as TOS QoS
settings.

The fraction of the link bandwidth that is used as a control for the
set of flows retained in the discard function effectively defines a
'window' size for flow retention. Thus a flow starts up, its identity
enters the discard function, and it exits when further additional flows
have arrived whose combined set of rate advisories makes it no longer
necessary to retain this earlier flow.
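The leaky bucket principle mentioned above can be sketched as follows. The parameter values are illustrative assumptions; the draft only requires that the defined rate equal the buffer's transmission rate towards the customer.

```python
# Illustrative leaky-bucket sketch; class name and parameters are
# assumptions, not drawn from the draft.
class LeakyBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0    # drain rate in bytes per second
        self.burst = burst_bytes      # maximum bucket depth (burst limit)
        self.level = 0.0
        self.last = 0.0

    def forward_ok(self, size_bytes, now):
        """True if the packet fits the permitted burst, else discard it."""
        # Drain the bucket for the elapsed time, then try to add the packet.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + size_bytes <= self.burst:
            self.level += size_bytes
            return True               # within the permitted burst: forward
        return False                  # would exceed the burst: discard

lb = LeakyBucket(rate_bps=8_000_000, burst_bytes=3000)  # 8 Mbit/s, 3 kB burst
assert lb.forward_ok(1500, now=0.0)
assert lb.forward_ok(1500, now=0.0)
assert not lb.forward_ok(1500, now=0.0)   # third back-to-back packet dropped
```

In the device, a packet refused by the bucket would be taken from the latest flow first, and from other registered flows only when that is not enough.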
The term 'within the window' is used here to describe flows that are
currently progressing towards guaranteed status.

3.3.5 Failure Conditions

In this section we describe the action taken by the device under
failure conditions.

- It is possible that a Start Packet may fail to arrive, having been
  lost in the network between the point of generation and the device
  buffer. The proposed solution is to make the source generate two
  Start Packets which are sent prior to any data packet and, regardless
  of the QoS setting of the subsequent data packets, to mark all Start
  Packets with a very high priority or class setting, thus making the
  loss of both packets very improbable.

- It is possible that too many flows are requested by the customer
  within a very short interval of time, so that their impact on the
  buffer cannot be assessed on a one-flow-at-a-time basis. The solution
  to this is to have a guard period: the minimum time that a flow
  identity can remain in the window. For example, a counter is reset
  for each such flow at the moment when it enters the window, and the
  flow identity must remain in the window until some number N of data
  packets has been sent and detected by the discard function,
  regardless of any other criteria governing exit from the window.
  This does not apply, however, to those flows which, for policy
  reasons, are not put in the window.

- It is possible that so many flows are requested by the customer
  within some very short interval of time that the control logic has
  insufficient space to handle all of their separate identities and
  guard periods within the window. The solution to this is for the main
  processor to have an alarm function that is triggered by such a
  condition. If triggered, the discard function is instructed to send a
  special Alarm signal towards the customer, indicating that the
  service is being abused outside of its expected parameters and that
  all flows are being discarded.
  The discard function now deletes all packets. The Alarm message will
  advise the customer to contact the network or service administrator,
  because the administrator will need to reset a discard flag and clear
  the existing window data after clearing down all flows.

- It is possible that a flow is maintained by the customer as active
  even though it has been silent for some time. It now starts up again
  and creates congestion. Normally, this would not cause a problem,
  since the flow is either still in the window, in which case its
  packets would start to be discarded, or it has moved to the
  guaranteed area, in which case other, newer flows will start to be
  discarded. There are, however, exceptional conditions when a high
  bit-rate real-time flow behaves in this way. Its rate could now
  exceed the protective window capacity (e.g. the N percent figure).
  This would also happen if some malicious flow delayed the onset of
  some very high rate until after it was likely to be guaranteed, and
  then overloaded the buffer.

  The solution to these abnormal conditions is to allow the discard
  function to randomly choose an additional flow (by selecting this
  information from any passing data packet), add such a random flow to
  the register window, and begin discarding on it. This flow could be
  distinguished from other flows by setting an additional parameter,
  called the aux flowid parameter, to 'emergency' (usually it is set
  to 'normal'). The discard function would also send an Alarm signal
  to the customer saying that the operation is outside of the expected
  service parameters. The discard function would be triggered into
  this mode of selecting one or more additional flows whenever the
  buffer fill level hits a second, higher threshold, generating an
  alarm signal. Once in this mode (emergency delete mode), the discard
  function can repeat the random selection of further flows any number
  of times until the buffer loading starts to reduce.
  If the buffer load starts to reduce, a buffer alarm off signal is
  generated, causing the discard function to perform a stability check
  before removing from the register any flowids which have their aux
  parameter set to the value emergency. The stability check is
  designed to prevent the discard function from removing emergency
  flowids from the register and then quickly needing to add a random
  new set, under conditions where the buffer is oscillating rapidly
  between alarm off and alarm on signals. It is preferable to keep the
  same set of emergency flowids under these conditions, which helps to
  limit the number of different flowids that become randomly selected.

  The stability check consists of the discard function inspecting
  flowids for emergency settings. If any are found, a timeout period
  is begun, and the function monitors buffer alarm on/off transitions
  during the timeout. If the buffer generates an alarm during this
  period, the emergency flows are not cleared from the register and
  remain the target of discard. This situation continues until an
  alarm off signal is generated by the buffer, which causes a further
  timeout period to commence. The discard function will always perform
  the stability check before removing aux=emergency flowids from the
  register. The final error trap used by the discard function protects
  against a timeout period beginning if there is already a timeout in
  progress.

- It is possible that a flow sends no Start Packet. This may cause
  existing flows in the window to be discarded if the additional flow
  (which has sent no Start Packet) causes congestion. In the extreme,
  it will trigger the same actions and alarm messages as described in
  the previous condition.

Under the circumstances of the abnormal conditions described in the
last two conditions, it is possible that some guaranteed flows are
subject to packet discard, but this should be an exceptional event that
is regarded as an alarm condition.
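The stability check described in the failure conditions above amounts to a small timer state machine, sketched here with illustrative names and an assumed timeout value: emergency flowids are cleared only after the buffer alarm has stayed off for a full timeout period, and a new timeout never starts while one is already in progress.

```python
# Illustrative sketch of the stability check; the class, method names,
# and timeout value are assumptions, not taken from the draft.
class StabilityCheck:
    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None          # end of the current timeout, if any

    def on_alarm_off(self, now):
        if self.deadline is None:     # error trap: no nested timeouts
            self.deadline = now + self.timeout

    def on_alarm_on(self):
        self.deadline = None          # alarm recurred: keep emergency flowids

    def may_clear_emergency(self, now):
        """True once the alarm has stayed off for a whole timeout period."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True               # stable: emergency flowids can be removed
        return False

sc = StabilityCheck(timeout=5.0)
sc.on_alarm_off(now=0.0)
assert not sc.may_clear_emergency(now=2.0)   # timeout still running
sc.on_alarm_on()                             # oscillation: check restarts
sc.on_alarm_off(now=3.0)
assert sc.may_clear_emergency(now=9.0)       # quiet for a full period
```

Keeping the same emergency flowids across oscillations limits how many randomly selected flows are ever punished.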
4 Conclusions

The device described in this document offers a bridge between two
worlds. The narrower the interval that we have termed the window, the
more the device emulates the classic connection-oriented paradigm. The
latest connection is the only one in the window; it is therefore either
accepted or subject at any moment to full packet discard. If accepted,
it is placed in the guaranteed area as soon as a further new flow
starts up. On the other hand, the wider the window, the more the device
resembles the classic connectionless world, in which most flows are
vulnerable to packet discard when the buffer is too full.

Notice also that this device fits the connectionless paradigm in that
sources are only required to transmit a Start Packet and then, without
waiting further, start to transmit their data. There is no negotiation
(unlike classical RSVP), yet there are still guaranteed flows in the
case of window sizes that are some small fraction of the link rate. So
we effectively have a new QoS procedure that is based only on Start
Packets; no subsequent response packets are triggered or used. In place
of the additional control messages that would have been expected in the
classic 'circuit world', only warning indications are triggered on
flows just prior to packet discard.

The device may be refined by the addition of policy controls governing
how a flow gets into the window. A family may decide that viewing the
main film on a Thursday is the most important thing that day; that is
why they subscribed to the service, and they want it guaranteed. So,
even though the film is the latest flow, it moves straight into the
guaranteed area because of a policy database that can be written to by
the customer using, for example, a browser. This database information
is readable by the main control logic.
When the main control logic is informed by the discard function that a
new Start Packet has arrived, it checks the policy database and
determines whether the flow is to be added to the register, or simply
ignored so that it effectively passes straight to the guaranteed area.
If it is moved straight to the guaranteed area, the possibility of
recovering from buffer overload at the time when the movie starts is
still preserved, by focusing on the previous flows which remain in the
window.

John L Adams
BT
pp MLB G 7 Orion Building (B62-MH)
Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
+44 1473 606321
john.l.adams@bt.com