Internet Draft Mick Seaman Expires January 1998 3Com draft-ietf-issll-is802-svc-mapping-00.txt Andrew Smith Extreme Networks Eric Crawley Gigapacket Networks July 1997 Integrated Service Mappings on IEEE 802 Networks Status of this Memo This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress." Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft. Abstract This document describes the support of IETF Integrated Services over LANs built from IEEE 802 network segments which may be interconnected by IEEE 802.1 MAC Bridges (switches) [1]. It describes the practical capabilities and limitations of this technology for supporting Controlled Load [8] and Guaranteed Service [9] using the inherent capabilities of the relevant 802 technologies [5],[6],[15],[16] etc. and the proposed 802.1p queuing features in switches. IEEE P802.1p [2] is a superset of the existing IEEE 802.1D bridging specification. This document provides a functional model for the layer 3 to layer 2 and user-to-network dialogue which supports admission control and defines requirements for interoperability between switches. The special case of such networks where the sender and receiver are located on the same segment is also discussed. This scheme expands on the ISSLL over 802 LANs framework described in Seaman, Smith, Crawley Expires January 1998 [Page 1] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 [7]. It makes reference to a signaling protocol for admission control developed by the ISSLL WG which is known as the "Subnet Bandwidth Manager". This is an extension to the IETF's RSVP protocol [4] and is described in a separate document [10]. 1. Introduction The IEEE 802.1 Interworking Task Group is currently enhancing the basic MAC Service provided in Bridged Local Area Networks (a.k.a. "switched LANs"). As a supplement to the original IEEE MAC Bridges standard [1], the update P802.1p [2] proposes differential traffic class queuing and access to media on the basis of a "user_priority" signaled in frames. In this document we * review the meaning and use of user_priority in LANs and the frame forwarding capabilities of a standard LAN switch. * examine alternatives for identifying layer 2 traffic flows for admission control. * review the options available for policing traffic flows. * derive requirements for consistent traffic class handling in a network of switches and use these requirements to discuss queue handling alternatives for 802.1p and the way in which these meet administrative and interoperability goals. * consider the benefits and limitations of this switched-based approach, contrasting it with full router based RSVP implementation in terms of complexity, utilisation of transmission resources and administrative controls. 
The model used is outlined in the "framework document" [7] which in summary: * partitions the admission control process into two separable operations: * an interaction between the user of the integrated service and the local network elements ("provision of the service" in the terms of 802.1D) to confirm the availability of transmission resources for traffic to be introduced. * selection of an appropriate user_priority for that traffic on the basis of the service and service parameters to be supported. * distinguishes between the user to network interface above and the mechanisms used by the switches ("support of the service"): these include communication between the switches (network to network signaling). * describes a simple architecture for the provision and support of these services, broken down into components with functional and interface descriptions: * "user" components: a layer-3 to layer-2 negotiation and translation Seaman, Smith, Crawley Expires January 1998 [Page 2] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 component for sending and receiving, with interfaces to other components residing in the station. * processes residing in a bridge/switch to handle admission control and mapping requests, including proposals for actual traffic mappings to user_priority values. * identifies the requirements of a signaling protocol to carry admission control requests between devices. It will be noted that this document is written from the pragmatic viewpoint that there will be a widely deployed network technology and we are evaluating it for its ability to support some or all of the defined IETF integrated services: this approach is intended to ensure development of a system which can provide useful new capabilities in existing (and soon to be deployed) network infrastructures. 2. Goals and Assumptions It is assumed that typical subnetworks that are concerned about quality-of-service will be "switch-rich": that is to say most communication between end stations using integrated services support will pass through at least one switch. The mechanisms and protocols described will be trivially extensible to communicating systems on the same shared media, but it is important not to allow problem generalisation to complicate the practical application that we target: the access characteristics of Ethernet and Token-Ring LANs are forcing a trend to switch-rich topologies. In addition, there have been developments in the area of MAC enhancements to ensure delay- deterministic access on network links e.g. IEEE 802.12 [15] and other proprietary schemes. Note that we illustrate most examples in this document using RSVP as an "upper-layer" QoS signaling protocol but there are actually no real dependencies on this protocol: RSVP could be replaced by some other dynamic protocol or else the requests could be made by network management or other policy entities. In particular, the SBM signaling protocol [10], which is based upon RSVP, is designed to work seamlessly in the service-mapping architecture described in this document and the "Integrated Services over IEEE 802" framework [7]. There may be a heterogeneous mixture of switches with different capabilities, all compliant with IEEE 802.1p, but implementing queuing and forwarding mechanisms in a range from simple 2-queue per port, strict priority, up to more complex multi-queue (maybe even one per- flow) WFQ or other algorithms. 
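As an illustration of the simplest end of this range only (it is not part of any IEEE or IETF specification), the following Python sketch shows a two-queue, strict-priority output port of the kind a minimal 802.1p switch might implement. The class and constant names, and the 0-3/4-7 split (which matches the default partition discussed later in section 11.5), are assumptions for the sketch.

    from collections import deque

    # Assumed split of the eight user_priority values onto two queues;
    # this matches the default 0-3 / 4-7 partition discussed in 11.5.
    HIGH_PRIORITY_CLASSES = {4, 5, 6, 7}

    class TwoQueuePort:
        """A minimal strict-priority output port: two FIFO queues."""

        def __init__(self):
            self.high = deque()
            self.low = deque()

        def enqueue(self, frame, user_priority):
            # Queue selection is based purely on the received (or
            # regenerated) user_priority value.
            if user_priority in HIGH_PRIORITY_CLASSES:
                self.high.append(frame)
            else:
                self.low.append(frame)

        def dequeue(self):
            # Strict priority: the low queue is served only when the
            # high queue is empty.
            if self.high:
                return self.high.popleft()
            if self.low:
                return self.low.popleft()
            return None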
The problem is broken down into smaller independent pieces: this may lead to sub-optimal usage of the network resources but we contend that Seaman, Smith, Crawley Expires January 1998 [Page 3] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 such benefits are often equivalent to very small improvements in network efficiency in a LAN environment. Therefore, it is a goal that the switches in the network operate using a much simpler set of information than the RSVP engine in a router. In particular, it is assumed that such switches do not need to implement per-flow queuing and policing (although they might do so). It is a fundamental assumption of the int-serv model that flows are isolated from each other throughout their transit across a network. Intermediate queueing nodes are expected to police the traffic to ensure that it conforms to the pre-agreed traffic flow specification. In the architecture proposed here for mapping to layer-2, we diverge from that assumption in the interests of simplicity: the policing function is assumed to be implemented in the transmit schedulers of the layer-3 devices (end stations, routers). In the LAN environments envisioned, it is reasonable to assume that end stations are "trusted" to adhere to their agreed contracts at the inputs to the network and that we can afford to over-allocate resources at admission -control time to compensate for the inevitable extra jitter/bunching introduced by the switched network itself. These divergences have some implications on the types of receiver heterogeneity that can be supported and the statistical multiplexing gains that might have been exploited, especially for Controlled Load flows: this is discussed in a later section of this document. 3. Non-Goals This document describes service mappings onto existing IEEE- and ANSI- defined standard MAC layers and uses standard MAC-layer services as in IEEE 802.1 bridging. It does not attempt to make use of or describe the capabilities of other proprietary or standard MAC-layer protocols although it should be noted that there exists published work regarding MAC layers suitable for QoS mappings: these are outside the scope of the IETF ISSLL working group charter. 4. User Priority and Frame Forwarding in IEEE 802 Networks 4.1 General IEEE 802 Service Model User_priority is a value associated with the transmission and reception of all frames in the IEEE 802 service model: it is supplied by the sender which is using the MAC service. It is provided along with the data to a receiver using the MAC service. It may or may not be actually carried over the network: Token- Ring/802.5 carries this value (encoded in its FC octet), basic Ethernet/802.3 does not, 802.12 may or may not depending on the frame format in use. 802.1p defines a consistent way to carry this value over the bridged network on Ethernet, Token Ring, Seaman, Smith, Crawley Expires January 1998 [Page 4] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 Demand- Priority, FDDI or other MAC-layer media using an extended frame format. The usage of user_priority is summarised below but is more fully described in section 2.5 of 802.1D [1] and 802.1p [2] "Support of the Internal Layer Service by Specific MAC Procedures" and readers are referred to these documents for further information. 
If the "user_priority" is carried explicitly in packets, its utility is as a simple label in the data stream enabling packets in different classes to be discriminated easily by downstream nodes without their having to parse the packet in more detail. Apart from making the job of desktop or wiring-closet switches easier, an explicit field means they do not have to change hardware or software as the rules for classifying packets evolve (e.g. based on new protocols or new policies). More sophisticated layer-3 switches, perhaps deployed towards the core of a network, can provide added value here by performing the classification more accurately and, hence, utilising network resources more efficiently or providing better protection of flows from one another: this appears to be a good economic choice since there are likely to be very many more desktop/wiring closet switches in a network than switches requiring layer-3 functionality. The IEEE 802 specifications make no assumptions about how user_priority is to be used by end stations or by the network. In particular it can only be considered a "priority" in a loose sense: although the current 802.1p draft defines static priority queuing as the default mode of operation of switches that implement multiple queues (user_priority is defined as a 3-bit quantity so strict priority queueing would give value 7 = high priority, 0 = low priority). The general switch algorithm is as follows: packets are placed onto a particular queue based on the received user_priority (from the packet if a 802.1p header or 802.5 network was used, invented according to some local policy if not). The selection of queue is based on a mapping from user_priority [0,1,2,3,4,5,6 or 7] onto the number of available queues. Note that switches may implement any number of queues from 1 upwards and it may not be visible externally, except through any advertised int- serv parameters and the switch's admission control behaviour, which user_priority values get mapped internally onto the same vs. different queues. Other algorithms that a switch might implement might include e.g. weighted fair queueuing, round robin. In particular, IEEE makes no recommendations about how a sender should select the value for user_priority: one of the main purposes of this current document is to propose such usage rules and how to communicate the semantics of the values between switches, end- stations and routers. In the remainder of this document we use the term "traffic class" synonymously with user_priority. Seaman, Smith, Crawley Expires January 1998 [Page 5] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 4.2 Ethernet/802.3 There is no explicit traffic class or user_priority field carried in Ethernet packets. This means that user_priority must be regenerated at a downstream receiver or switch according to some defaults or by parsing further into higher-layer protocol fields in the packet. Alternatively, the IEEE 802.1Q encapsulation [11] may be used which provides an explicit traffic class field on top of an basic MAC format. 
For the different IP packet encapsulations used over Ethernet/802.3, it will be necessary to adjust any admission-control calculations according to the framing and padding requirements:

   Encapsulation                          Framing Overhead   IP MTU
                                          bytes/pkt          bytes

   IP EtherType    (ip_len <= 46 bytes)   64 - ip_len        1500
                   (1500 >= ip_len >= 46) 18                 1500

   IP EtherType over 802.1p/Q
                   (ip_len <= 42 bytes)   64 - ip_len        1500*
                   (1500 >= ip_len >= 42) 22                 1500*

   IP EtherType over LLC/SNAP
                   (ip_len <= 40 bytes)   64 - ip_len        1492
                   (1500 >= ip_len >= 40) 24                 1492

   * Note that the draft IEEE 802.1Q specification exceeds the current
     IEEE 802.3 maximum packet length values by 4 bytes, although work
     is proceeding within IEEE to address this issue.

4.3 Token-Ring/802.5

The token ring standard [6] provides a priority mechanism that can be used to control both the queuing of packets for transmission and the access of packets to the shared media. The priority mechanisms are implemented using bits within the Access Control (AC) and the Frame Control (FC) fields of an LLC frame. The first three bits of the AC field, the Token Priority bits, together with the last three bits of the AC field, the Reservation bits, regulate which stations get access to the ring. The last three bits of the FC field of an LLC frame, the User Priority bits, are obtained from the higher layer in the user_priority parameter when it requests transmission of a packet. This parameter also establishes the Access Priority used by the MAC. The user_priority value is conveyed end-to-end by the User Priority bits in the FC field and is typically preserved through Token-Ring bridges of all types. In all cases, 0 is the lowest priority.

Token-Ring also uses a concept of Reserved Priority: this relates to the value of priority which a station uses to reserve the token for the next transmission on the ring. When a free token is circulating, only a station having an Access Priority greater than or equal to the Reserved Priority in the token will be allowed to seize the token for transmission. Readers are referred to [14] for further discussion of this topic.

A token ring station is theoretically capable of separately queuing each of the eight levels of requested user priority and then transmitting frames in order of priority. A station sets the Reservation bits according to the user priority of frames that are queued for transmission in the highest priority queue. This allows the access mechanism to ensure that the frame with the highest priority throughout the entire ring will be transmitted before any lower priority frame. Annex I to the IEEE 802.5 token ring standard recommends that stations send/relay frames as follows:

   Application                  user_priority

   non-time-critical data       0
   -                            1
   -                            2
   -                            3
   LAN management               4
   time-sensitive data          5
   real-time-critical data      6
   MAC frames                   7

To reduce frame jitter associated with high-priority traffic, the annex also recommends that only one frame be transmitted per token and that the maximum information field size be 4399 octets whenever delay-sensitive traffic is traversing the ring. Most existing implementations of token ring bridges forward all LLC frames with a default access priority of 4. Annex I recommends that bridges forward LLC frames that have a user priority greater than 4 with a reservation equal to that user priority (although the draft IEEE P802.1p [2] permits network management to override this behaviour).
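The Annex I usage and the bridge forwarding rule just described can be summarised in a short sketch (illustrative only; the function name, dictionary layout and the use of 0 to mean "no reservation" are assumptions, not part of 802.5 or 802.1p):

    # Annex I recommended usage of user_priority (from the table above).
    ANNEX_I_USAGE = {
        0: "non-time-critical data",
        1: "non-time-critical data",
        2: "non-time-critical data",
        3: "non-time-critical data",
        4: "LAN management",
        5: "time-sensitive data",
        6: "real-time-critical data",
        7: "MAC frames",
    }

    DEFAULT_BRIDGE_ACCESS_PRIORITY = 4   # default access priority in bridges

    def bridge_forwarding_priorities(user_priority: int):
        """Return (access_priority, reservation) that a bridge might use
        when relaying an LLC frame, per the Annex I recommendation above."""
        access_priority = DEFAULT_BRIDGE_ACCESS_PRIORITY
        # Frames with user_priority greater than 4 are forwarded with a
        # reservation equal to that user_priority; 0 here stands for
        # "no reservation" (an assumption for this sketch).
        reservation = user_priority if user_priority > 4 else 0
        return access_priority, reservation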
The capabilities provided by token ring's user and reservation priorities and by IEEE 802.1p can provide effective support for Integrated Services flows that request QoS using RSVP. These mechanisms can provide, with few or no additions to the token ring architecture, bandwidth guarantees with the network flow control necessary to support such guarantees.

For the different IP packet encapsulations used over Token Ring/802.5, it will be necessary to adjust any admission-control calculations according to the framing requirements:

   Encapsulation                       Framing Overhead   IP MTU
                                       bytes/pkt          bytes

   IP EtherType over 802.1p/802.1Q     29                 4370*
   IP EtherType over LLC/SNAP          25                 4370*

   * The suggested MTU from RFC 1042 [13] is 4464 bytes, but there are
     issues related to discovering the maximum supported MTU between any
     two points both within and between Token Ring subnets. We recommend
     here an MTU consistent with the 802.5 Annex I recommendation.

4.4 FDDI

The Fiber Distributed Data Interface standard [16] provides a priority mechanism that can be used to control both the queuing of packets for transmission and the access of packets to the shared media. The priority mechanisms are implemented using mechanisms similar to those of Token-Ring described above. The standard also makes provision for "Synchronous" data traffic with strict media access and delay guarantees. This mode of operation is not discussed further here: it is an area within the scope of the ISSLL WG that requires further work. In the remainder of this document we treat FDDI as a 100 Mbps Token Ring (which it is) using a service interface compatible with IEEE 802 networks.

4.5 Demand-Priority/802.12

IEEE 802.12 [15] is a standard for a shared 100 Mbps LAN. Data packets are transmitted using either the 802.3 or 802.5 frame format. The MAC protocol is called Demand Priority. Its main characteristics with respect to QoS are the support of two service priority levels (normal priority and high priority) and the service order: data packets from all network nodes (e.g. end hosts and bridges/switches) are served using a simple round robin algorithm.

If the 802.3 frame format is used for data transmission then user_priority is encoded in the starting delimiter of the 802.12 data packet. If the 802.5 frame format is used then the priority is additionally encoded in the YYY bits of the AC field in the 802.5 packet header (see also section 4.3). Furthermore, the 802.1p/Q encapsulation may also be applied in 802.12 networks with its own user_priority field. Thus, in all cases, switches are able to recover any user_priority supplied by a sender.

The same rules apply for 802.12 user_priority mapping through a bridge as with other media types: the only additional information is that "normal" priority is used by default for user_priority values 0 through 4 inclusive and "high" priority is used for user_priority levels 5 through 7. This ensures that the default Token-Ring user_priority level of 4 for 802.5 bridges is mapped to "normal" on 802.12 segments.

The medium access in 802.12 LANs is deterministic: the demand priority mechanism ensures that, once the normal priority service has been pre-empted, all high priority packets have strict priority over packets with normal priority.
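The default user_priority to demand-priority mapping just described is trivial to express; the following sketch is illustrative only and the function name is an assumption:

    def demand_priority_level(user_priority: int) -> str:
        """Map an 802.1 user_priority onto the two 802.12 service levels,
        using the default split described above (0-4 normal, 5-7 high)."""
        return "high" if user_priority >= 5 else "normal"

    # The default Token-Ring bridge priority of 4 therefore stays "normal":
    assert demand_priority_level(4) == "normal"
    assert demand_priority_level(5) == "high"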
In the abnormal situation that a normal-priority packet has been waiting at the front of a MAC transmit queue for a time period longer than PACKET_PROMOTION (200 - 300 ms [15]), its priority is automatically 'promoted' to high priority. Thus, even normal-priority packets have a maximum guaranteed access time to the medium.

Integrated Services can be built on top of the 802.12 medium access mechanism. When combined with admission control and bandwidth enforcement mechanisms, delay guarantees as required for a Guaranteed Service can be provided without any changes to the existing 802.12 MAC protocol. Since the 802.12 standard supports the 802.3 and 802.5 frame formats, the same framing overhead as reported in sections 4.2 and 4.3 must be considered in the admission control equations for 802.12 links.

5. Integrated services through layer-2 switches

5.1 Summary of switch characteristics

For the sake of illustration, we divide layer-2 bridges/switches into several categories, based on the level of sophistication of their QoS and software protocol capabilities. These categories are not intended to represent all possible implementation choices but, instead, to aid discussion of what QoS capabilities can be expected from a network made of these devices (the basic "Class 0" device is included for completeness but cannot really provide useful integrated service).

Class 0
   - 802.1D MAC bridging
   - single queue per output port, no separation of traffic classes
   - Spanning-Tree to remove topology loops (single active path)

Class I
   - 802.1p priority queueing between traffic classes
   - no multicast heterogeneity
   - 802.1p GARP/GMRP pruning of individual multicast addresses

Class II
   As (I) plus:
   - can map received user_priority on a per-input-port basis to some
     internal set of canonical values
   - can map internal canonical values onto transmitted user_priority on
     a per-output-port basis, giving some limited form of multicast
     heterogeneity
   - maybe implements IGMP snooping for pruning

Class III
   As (II) plus:
   - per-flow classification
   - maybe per-flow policing and/or reshaping
   - more complex transmit scheduling (probably not per-flow)

5.2 Queueing

Connectionless packet-based networks in general, and LAN-switched networks in particular, work today because of scaling choices in network provisioning. Consciously or (more usually) unconsciously, enough excess bandwidth and buffering is provisioned in the network to absorb the traffic sourced by higher-layer protocols, or to cause their transmission windows to run out, on a statistical basis, so that the network is only overloaded for a short duration and the average expected loading is less than 60% (usually much less).

With the advent of time-critical traffic such over-provisioning has become far less easy to achieve. Time-critical frames may find themselves queued for annoyingly long periods of time behind temporary bursts of file transfer traffic, particularly at network bottleneck points, e.g. at the 100 Mb/s to 10 Mb/s transition that might occur between the riser to the wiring closet and the final link to the user from a desktop switch. In this case, however, if it is known (guaranteed by application design, merely expected on the basis of statistics, or just that this is all that the network guarantees to support) that the time-critical traffic is a small fraction of the total bandwidth, it suffices to give it strict priority over the "normal" traffic.
The worst case delay experienced by the time critical traffic is roughly the maximum transmission time of a maximum length non-time-critical frame - less than a millisecond for 10 Mb/s Ethernet, and well below an end to end budget based on human perception times. When more than one "priority" service is to be offered by a network element e.g. it supports Controlled-Load as well as Guaranteed Service, the queuing discipline becomes more complex. In order to provide the required isolation between the service classes, it will probably be necessary to queue them separately. There is then an issue of how to service the queues - a combination of admission control and more intelligent queueing disciplines e.g. weighted fair queuing, may be required in such cases. As with the service specifications themselves, it is not the place for this document to specify queuing algorithms, merely to observe that the external behaviour meet the services' requirements. 5.3 Multicast Heterogeneity Seaman, Smith, Crawley Expires January 1998 [Page 10] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 At layer-3, the int-serv model allows heterogeneous multicast flows where different branches of a tree can have different types of reservations for a given multicast destination. It also supports the notion that trees may have some branches with reserved flows and some using best effort (default) service. If we were to treat a layer-2 subnet as a single "network element", as defined in [3], then all of the branches of the distribution tree that lie within the subnet could be assumed to require the same QoS treatment and be treated as an atomic unit as regards admission control etc. with this assumption, the model and protocols already defined by int- serv and RSVP already provide sufficient support for multicast heterogeneity. Note, though, that an admission control request may well be rejected because just one link in the subnet has reached its traffic limit and that this will lead to rejection of the request for the whole subnet. The above approach would, therefore, provide very sub-optimal utilisation of resources given the size and complexity of the layer-2 subnets envisioned by this document. Therefore, it is desirable to support the ability of layer-2 switches to apply QoS differently on different egress branches of a tree that divides at that switch: this is discussed in the following paragraphs. IEEE 802.1D and 802.1p specify a basic model for multicast whereby a switch performs multicast routing decisions based on the destination address: this would produce a list of output ports to which the packet should be forwarded. In its default mode, such a switch would use the user_priority value in received packets (or a value regenerated on a per-input-port basis in the absence of an explicit value) to enqueue the packets at each output port. All of the classes of switch identified above can support this operation. If a switch is selecting per-port output queues based only on the incoming user_priority, as described by 802.1p, it must treat all branches of all multicast sessions within that user_priority class with the same queuing mechanism: no heterogeneity is then possible and this could well lead to the failure of an admission control request for the whole multicast session due to a single link being at its maximum allocation, as described above. 
Note that, in the layer-2 case as distinct from the layer-3 case with RSVP/int-serv, the option of having some receivers getting the session with the requested QoS and some getting it best effort does not exist as the Class I switches are unable to re-map the user_priority on a per- link basis: this could well become an issue with heavy use of dynamic multicast sessions. If a switch were to implement a separate user_priority mapping at each output port, as described under "Class II switch" above, then some limited form of receiver heterogeneity can be supported e.g. forwarding of traffic as user_priority 4 on one branch where receivers have performed admission control reservations and as user_priority 0 on one where they have not. Seaman, Smith, Crawley Expires January 1998 [Page 11] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 We assume that per-user_priority queuing without taking account of input or output ports is the minimum standard functionality for switches in a LAN environment (Class I switch, as defined above) but that more functional layer-2 or even layer-3 switches (a.k.a. routers) can be used if even more flexible forms of heterogeneity are considered necessary to achieve more efficient resource utilisation: note that the behaviour of layer-3 switches in this context is already well standardised by IETF. 5.4 Override of incoming user_priority In some cases, a network administrator may not trust the user_priority values contained in packets from a source and may wish to map these into some more suitable set of values. Alternatively, due perhaps to equipment limitations or transition periods, values may need to be mapped to/from different regions of a network. Some switches may implement such a function on input that maps received user_priority into some internal set of values (this table is known in 802.1p as the "user_priority regeneration table"). These values can then be mapped using the output table described above onto outgoing user_priority values: these same mappings must also be used when applying admission control to requests that use the user_priority values (see e.g. [10]). More sophisticated approaches may also be envisioned where a device polices traffic flows and adjusts their onward user_priority based on their conformance to the admitted traffic flow specifications. 5.5 Remapping of non-conformant aggregated flows One other topic under discussion in the int-serv context is how to handle the traffic for data flows from sources that are exceeding their currently agreed traffic contract with the network. An approach that shows some promise is to treat such traffic with "somewhat less than best effort" service in order to protect traffic that is normally given "best effort" service from having to back off (such traffic is often "adaptive" using TCP or other congestion control algorithms and it would be unfair to penalise it due to badly behaved traffic from reserved flows which are often set up by non-adaptive applications). One solution here might be to assign normal best effort traffic to one user_priority and to label excess non-conformant traffic as a "lower" user_priority although the re-ordering problems that might arise from doing this may make this solution undesirable, particularly if the flows are using TCP: for this reason the controlled load service recommends dropping excess traffic, rather than re-mapping to a lower priority. This topic is further discussed below. 
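To make the per-port mapping tables of sections 5.3 and 5.4 concrete, the following sketch shows how a Class II switch port might hold them (illustrative only; the class and attribute names are assumptions, and the identity mapping is assumed as the default):

    class ClassIISwitchPort:
        """Per-port mapping tables of a Class II switch.

        regeneration : received user_priority -> internal canonical value
                       (the 802.1p "user_priority regeneration table")
        egress_map   : internal canonical value -> transmitted user_priority
        """

        def __init__(self):
            self.regeneration = list(range(8))
            self.egress_map = list(range(8))

        def regenerate(self, received_user_priority: int) -> int:
            return self.regeneration[received_user_priority]

        def outgoing(self, internal_value: int) -> int:
            return self.egress_map[internal_value]

    # Limited receiver heterogeneity: one branch of a multicast tree keeps
    # the reserved class (user_priority 4), another is demoted to best effort.
    reserved_branch = ClassIISwitchPort()
    best_effort_branch = ClassIISwitchPort()
    best_effort_branch.egress_map[4] = 0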
Seaman, Smith, Crawley Expires January 1998 [Page 12] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 6. Selecting traffic classes One fundamental question is "who gets to decide what the classes mean and who gets access to them?" One approach would be for the meanings of the classes to be "well-known": we would then need to standardise a set of classes e.g. 1 = best effort, 2 = controlled- load, 3 = guaranteed (loose delay bound, high bandwidth), 4 = guaranteed (slightly tighter delay) etc. The values to encode in such a table in end stations, in isolation from the network to which they are connected, is problematical: one approach could be to define one user_priority value per int-serv service and leave it at that (reserving the rest of the combinations for future traffic classes - there are sure to be plenty!). We propose here a more flexible mapping: clients ask "the network" which user_priority traffic class to use for a given traffic flow, as categorised by its flow-spec and layer-2 endpoints. The network provides a value back to the requester which is appropriate to the current network topology, load conditions, other admitted flows etc. The task of configuring switches with this mapping (e.g. through network management, a switch-switch protocol or via some network-wide QoS-mapping directory service) is an order of magnitude less complex than performing the same function in end stations. Also, when new services (or other network reconfigurations) are added to such a network, the network elements will typically be the ones to be upgraded with new queuing algorithms etc. and can be provided with new mappings at this time. Given the need for a new session or "flow" requiring some QoS support, a client then needs answers to the following questions: 1. which traffic class do I add this flow to? The client needs to know how to label the packets of the flow as it places them into the network. 2. who do I ask/tell? The proposed model is that a client ask "the network" which user_priority traffic class to use for a given traffic flow. This has several benefits as compared to a model which allows clients to select a class for themselves. 3. how do I ask/tell them? A request/response protocol is needed between client and network: in fact, the request can be piggy-backed onto an admission control request and the response can be piggy-backed onto an admission control acknowledgment: this "one pass" assignment has the benefit of completing the admission control in a timely way and reducing the exposure to changing conditions which could occur if clients cached the knowledge for extensive periods. Seaman, Smith, Crawley Expires January 1998 [Page 13] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 The network (i.e. the first network element encountered downstream from the client) must then answer the following questions: 1. which traffic class do I add this flow to? This is a packing problem, difficult to solve in general, but many simplifying assumptions can be made: presumably some simple form of allocation can be done without a more complex scheme able to dynamically shift flows around between classes. 2. which traffic class has worst-case parameters which meet the needs of this flow? This might be an ordering/comparison problem: which of two service classes is "better" than another? Again, we can make this tractable by observing that all of the current int-serv classes can be ranked (best effort <= Controlled Load <= Guaranteed Service) in a simple manner. 
If any classes are implemented in the future that cannot be simply ranked then the issue can be finessed by either a priori knowledge about what classes are supported or by configuration. and return the chosen user_priority value to the client. Note that the client may be either an end station, router or a first switch which may be acting as a proxy for a client which does not participate in these protocols for whatever reason. Note also that a device e.g. a server or router, may choose to implement both the "client" as well as the "network" portion of this model so that it can select its own user_priority values: such an implementation would, however, be discouraged unless the device really does have a close tie- in with the network topology and resource allocation policies but would work in some cases where there is known over- provisioning of resources. 7. Flow Identification Some models for int-serv over lower-layers treat layer-2 switches very much as a special case of routers: in particular, that switches along the data path will make packet handling decisions based on the RSVP flow and filter specifications and use them to classify the corresponding data packets. However, filtering to the per-flow level becomes difficult with increasing switch speed: devices with such filtering capabilities are unlikely to have a very different implementation complexity from IP routers and there already exist protocol specifications for those devices. This document argues that "aggregated flow" identification based on user_priority is a useful intermediate point between no QoS and full router-type integrated services and that this be the minimum flow classification capability required of switches. Seaman, Smith, Crawley Expires January 1998 [Page 14] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 8. Reserving Network Resources - Admission Control So far we have not discussed admission control. In fact, without admission control it is possible to assemble a layer-2 LAN of some size capable of supporting real-time services, providing that the traffic fits within certain scaling constraints (relative link speeds, numbers of ports etc. - see below). This is not surprising since it is possible to run a fair approximation to real time services on small LANs today with no admission control or help from encoded priority bits. As an example, imagine a campus network providing dedicated 10 Mbps connections to each user. Each floor of each building supports up to 96 users, organized into groups of 24, with each group being supported by a 100 Mbps downlink to a basement switch which concentrates 5 floors (20 x 100 Mbps) and a data center (4 x 100 Mbps) to a 1 Gbps link to an 8 Gbps central campus switch, which in turn hooks 6 buildings together (with 2 x 1 Gbps full duplex links to support a corporate server farm). Such a network could support 1.5 Mb/s of voice/video from every user to any other user or (for half the population) the server farm, provided the video ran high priority: this gives 3000 users, all with desktop video conferencing running along with file transfer/email etc. In such a network, a discussion as to the best service policy to apply to high and low priority queues may prove academic: while it is true that "normal" traffic may be delayed by bunches of high priority frames, queuing theory tells us that the average queue occupancy in the high priority queue at any switch port will be somewhat less than 1 (with real user behaviour, i.e. 
not all watching video conferences all the time) it should be far less. A cheaper alternative to buying equipment with a fancy queue service policy may be to buy equipment with more bandwidth to lower the average link utilisation by a few per cent. In practice a number of objections can be made to such a simple solution. There may be long established expensive equipment in the network which does not provide all the bandwidth required. There will be considerable concern over who is allowed to say what traffic is high priority. There may be a wish to give some form of "prioritised" service to crucial business applications, above that given to experimental video-conferencing: in this context, admission control needs to provide administrative control to some level, without making that control so elaborate to implement that it is not simply rejected in favor of providing yet more bandwidth instead. The proposed admission control mechanism requires a query-response interaction with the network returning a "YES/NO" answer and, if successful, a user_priority value with which to tag the data frames of this flow. Seaman, Smith, Crawley Expires January 1998 [Page 15] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 The relevant int-serv specifications describe the parameters which need to be considered when making an admission control decision at each node in the network path between sender and receiver. We discuss how to calculate these parameters for different network technologies below but we do not specify admission control algorithms or mechanisms as to how to progress the admission control process across the network. The proposed IETF protocol for this purpose "Subnet Bandwidth Manager" (SBM) is defined in [10]. Where there are multiple mechanisms in use for allocating resources e.g. some combination of SBM and network management, it will be necessary to ensure that network resources are partitioned amongst the different mechanisms in some way: this could be by configuration or maybe by having the mechanisms allocate from a common resource pool within any device. 9. Mapping of integrated services to layer-2 in layer-3 devices 9.1 Layer-3 Client Model We assume the same client model as int-serv and RSVP where we use the term "client" to mean the entity handling QoS in the layer-3 device at each end of a layer-2 hop (e.g. end-station, router). The sending client itself is responsible for local admission control and scheduling packets onto its link in accordance with the service agreed. As with the current int-serv model, this involves per-flow scheduling (a.k.a. traffic shaping) in every such originating source. The client is running an RSVP process which presents a session establishment interface to applications, signals over the network, programs a scheduler and classifier in the driver and interfaces to a policy control module. In particular, RSVP also interfaces to a local admission control module: it is this entity that we focus on here. 
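Anticipating the request format given in section 9.2 below, the query-response interaction of section 8 between this local admission control entity and the network might be represented as follows. This is a hypothetical sketch; the type names and field layouts are assumptions and are not part of the SBM protocol [10]:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LayerTwoReservationRequest:
        # "May I reserve for traffic with <Tspec> and <FlowSpec> from
        #  <source> to <destination>, and how should I label it?"
        tspec: dict      # e.g. {"rate_bps": 1_500_000, "burst_bytes": 8000, "mtu": 1500}
        flowspec: dict   # e.g. {"service": "controlled-load"} or GS delay bounds
        source_ip: str
        dest_ip: str     # may be a multicast address

    @dataclass
    class LayerTwoReservationResponse:
        admitted: bool                 # the YES/NO admission control answer
        user_priority: Optional[int]   # label for this flow's frames, if admitted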
Seaman, Smith, Crawley Expires January 1998 [Page 16] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 The following diagram is taken from the RSVP specification[4]: _____________________________ | _______ | | | | _______ | | |Appli- | | | | RSVP | | cation| | RSVP <--------------------> | | <--> | | | | | |process| _____ | | |_._____| | -->Polcy|| | | |__.__._| |Cntrl|| | |data | | |_____|| |===|===========|==|==========| | | --------| | _____ | | | | | ---->Admis|| | _V__V_ ___V____ |Cntrl|| | | | | | |_____|| | |Class-| | Packet | | | | ifier|==>Schedulr|====================> | |______| |________| | data | | |_____________________________| Figure 1 - RSVP in Sending Hosts Note that we illustrate examples in this document using RSVP as the "upper-layer" signaling protocol but there are no actual dependencies on this protocol: RSVP could be replaced by some other dynamic protocol or else the requests could be made by network management or other policy entities. 9.2 Requests to layer-2 ISSLL The local admission control entity within a client is responsible for mapping these layer-3 session-establishment requests into layer-2 language. The upper-layer entity makes a request, in generalised terms, to ISSLL of the form: "May I reserve for traffic with with from to and how should I label it?" where = Sender Tspec (e.g. bandwidth, burstiness, MTU) = FlowSpec Seaman, Smith, Crawley Expires January 1998 [Page 17] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 (e.g. latency, jitter bounds) = IP address(es) = IP address(es) - may be multicast 9.3 At the Layer-3 Sender The ISSLL functionality in the sender is illustrated below and the functions of the box labeled "SBM client" may be summarised as: * maps the endpoints of the conversation to layer-2 addresses in the LAN, so that the client can figure out what traffic is really going where (probably makes reference to the ARP protocol cache for unicast or an algorithmic mapping for multicast destinations). * applies local admission control on outgoing link and driver * formats a SBM request to the network with the mapped addresses and filter/flow specs * receives response from the network and reports the YES/NO admission control answer back to the upper layer entity, along with any negotiated modifications to the session parameters. * saves any returned user_priority to be associated with this session in a "802 header" table: this will be used when adding layer-2 header before sending any future data packet belonging to this session. This table might, for example, be indexed by the RSVP flow identifier. from IP from RSVP ____|____________|____________ | | | | | __V____ ___V___ | | | | | | | | | Addr |<->| | | SBM signaling | |mapping| | SBM |<------------------------> | |_______| |Client | | | ___|___ | | | | | |<->| | | | | 802 | |_______| | | | header| / | | | | |_______| / | | | | | / | | _____ | | | +-----/ | +->|Local| | | __V_V_ _____V__ |Admis| | | | | | | |Cntrl| | | |Class-| | Packet | |_____| | | | ifier|==>Schedulr|======================> | |______| |________| | data |______________________________| Figure 2 - ISSLL in End-station Sender Seaman, Smith, Crawley Expires January 1998 [Page 18] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 9.4 At the Layer-3 Receiver The ISSLL functionality in the receiver is a good deal simpler. It is summarised below and is illustrated by the following picture: * handles any received SBM protocol indications. 
* applies local admission control to see if a request can be supported with appropriate local receive resources. * passes indications up to RSVP if OK. * accepts confirmations from RSVP and relays them back via SBM signaling towards the requester. * may program a receive classifier and scheduler, if any is used, to identify traffic classes of received packets and accord them appropriate treatment e.g. reserve some buffers for particular traffic classes. * programs receiver to strip any 802 header information from received packets. to RSVP to IP ^ ^ ____|____________|___________ | | | | | __|____ | | | | | | | SBM signaling | | SBM | ___|___ | <-----------------> |Client | | Strip | | | |_______| |802 hdr| | | | \ |_______| | | __v___ \ ^ | | | Local |\ | | | | Admis | \ | | | | Cntrl | \ | | | |_______| \ | | | ______ v___|____ | | |Class-| | Packet | | ===================>| ifier|==>|Scheduler| | data | |______| |_________| | |_____________________________| Figure 3 - ISSLL in End-station Receiver 10. Layer-2 Switch Functions 10.1 Switch Model The model of layer-2 switch behaviour described here uses the terminology of the SBM protocol [10] as an example of an admission control protocol: the model is equally applicable when other mechanisms Seaman, Smith, Crawley Expires January 1998 [Page 19] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 e.g. static configuration, network management are in use for admission control. We define the following entities within the switch: * Local admission control - one of these on each port accounts for the available bandwidth on the link attached to that port. For half-duplex links, this involves taking account of the resources allocated to both transmit and receive flows. For full-duplex, the input port accountant's task is trivial. * Input SBM module: one instance on each port, performs the "network" side of the signaling protocol for peering with clients or other switches. Also holds knowledge of the mappings of int-serv classes to user_priority. * SBM propagation - relays requests that have passed admission control at the input port to the relevant output ports' SBM modules. This will require access to the switch's forwarding table (layer-2 "routing table" cf. RSVP model) and port spanning-tree states. * Output SBM module - forwards requests to the next layer-2 or -3 network hop. * Classifier, Queueing and Scheduler - these functions are basically as described by the Forwarding Process of IEEE 802.1p (see section 3.7 of [2]). The Classifier module identifies the relevant QoS information from incoming packets and uses this, together with the normal bridge forwarding database, to decide to which output queue of which output port to enqueue the packet. In Class I switches, this information is the "regenerated user_priority" parameter which has already been decoded by the receiving MAC service and potentially re-mapped by the 802.1p forwarding process (see description in section 3.7.3 of [2]). This does not preclude more sophisticated classification rules which may be applied in more complex Class III switches e.g. matching on individual int-serv flows. The Queueing and Scheduler module holds the output queues for ports and provides the algorithm for servicing the queues for transmission onto the output link in order to provide the promised int-serv service. Switches will implement one or more output queues per port and all will implement at least a basic strict priority dequeueing algorithm as their default, in accordance with 802.1p. 
* Ingress traffic class mapper and policing - as described in 802.1p section 3.7. This optional module may check on whether the data within traffic classes are conforming to the patterns currently agreed: switches may police this and discard or re-map packets. The default behaviour is to pass things through unchanged. Seaman, Smith, Crawley Expires January 1998 [Page 20] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 * Egress traffic class mapper - as described in 802.1p section 3.7. This optional module may apply re-mapping of traffic classes e.g. on a per- output port basis. The default behaviour is to pass things through unchanged. These are shown by the following diagram which is a superset of the IEEE 802.1D/802.1p bridge model: _______________________________ | _____ ______ ______ | SBM signaling | | | | | | | | SBM signaling <------------------>| IN |<->| SBM |<->| OUT |<----------------> | | SBM | | prop.| | SBM | | | |_____| |______| |______| | | / | ^ / | | ______________| / | | | | |_____________ | \ / __V__ | | __V__ / | | \ ____/ |Local| | | |Local| / | | \ / |Admis| | | |Admis| / | | \/ |Cntrl| | | |Cntrl| / | | _____V \ |_____| | | |_____| / _____ | | |traff | \ ___|__ V_______ / |egrss| | | |class | \ |Filter| |Queue & | / |traff| | | |map & |=====|==========>|Data- |=| Packet |=|===>|class| | | |police| | | base| |Schedule| | |map | | | |______| | |______| |________| | |_____| | |____^_________|_______________________________|______|______| data in | |data out ========+ +========> Figure 4 - ISSLL in Switches 10.2 Admission Control On reception of an admission control request, a switch performs the following actions, again using SBM as an example: the behaviour is different depending on whether the "Designated SBM" for this segment is within this switch or not - see [10] for a more detailed specification of the DSBM/SBM actions: * if the ingress SBM is the "Designated SBM" for this link/segment, it translates any received user_priority or else selects a layer-2 traffic class which appears compatible with the request and whose use does not violate any administrative policies in force. In effect, it matches up the requested service with those available in each of the user_priority classes and chooses the "best" one. It ensures that, if this reservation is successful, the selected value is passed back to the client. * ingress DSBM observes the current state of allocation of resources on the input port/link and then determines whether the new resource allocation from the mapped traffic class would be excessive. The request Seaman, Smith, Crawley Expires January 1998 [Page 21] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 is passed to the reservation propagator if accepted so far. * if the ingress SBM is not the "Designated SBM" for this link/segment then it passes the request on directly to the reservation propagator * reservation propagator relays the request to the bandwidth accountants on each of the switch's outbound links to which this reservation would apply (implied interface to routing/forwarding database). * egress bandwidth accountant observes the current state of allocation of queueing resources on its outbound port and bandwidth on the link itself and determines whether the new allocation would be excessive. Note that this is only the local decision of this switch hop: each further layer-2 hop through the network gets a chance to veto the request as it passes along. 
* the request, if accepted by this switch, is then passed on down the line on each output link selected. Any user_priority described in the forwarded request must be translated according to any egress mapping table. * if accepted, the switch must notify the client of the user_priority to use for packets belonging to this flow. Note that this is a "provisional YES" - we assume an optimistic approach here: later switches can still say "NO" later. * if this switch wishes to reject the request, it can do so by notifying the original client (by means of its layer-2 address). 11. Mappings from int-serv service models to IEEE 802 It is assumed that admission control will be applied when deciding whether or not to admit a new flow through a given network element and that a device sending onto a link will be proxying the parameters and admission control decisions on behalf of that link: this process will require the device to be able to determine (by estimation, measurement or calculation) several parameters. It is assumed that details of the potential flow are provided to the device by some means (e.g. a signaling protocol, network management). The service definition specifications themselves provide some implementation guidance as to how to calculate some of these quantities. The accuracy of calculation of these parameters may not be very critical: indeed it is an assumption of this model's being used with relatively simple Class I switches that they merely provide values to describe the device and admit flows conservatively. 11.1 General characterisation parameters There are some general parameters that a device will need to use and/or supply for all service types: * Ingress link Seaman, Smith, Crawley Expires January 1998 [Page 22] INTERNET DRAFT Int-Serv over IEEE 802.1D/p July 1997 * Egress links and their MTUs, framing overheads and minimum packet sizes (see media-specific information presented above). * available path bandwidth: updated hop-by-hop by any device along the path of the flow. * minimum latency 11.2 Parameters to implement Guaranteed Service A network element must be able to determine the following parameters: * Constant delay bound through this device (in addition to any value provided by "minimum latency" above) and up to the receiver at the next network element for the packets of this flow if it were to be admitted: this would include any access latency bound to the outgoing link as well as propagation delay across that link. * Rate-proportional delay bound through this device and up to the receiver at the next network element for the packets of this flow if it were to be admitted. * Receive resources that would need to be associated with this flow (e.g. buffering, bandwidth) if it were to be admitted and not suffer packet loss if it kept within its supplied Tspec/Rspec. * Transmit resources that would need to be associated with this flow (e.g. buffering, bandwidth, constant- and rate-proportional delay bounds) if it were to be admitted. 11.3 Parameters to implement Controlled Load A network element must be able to determine the following parameters which can be extracted from [8]: * Receive resources that would need to be associated with this flow (e.g. buffering) if it were to be admitted. * Transmit resources that would need to be associated with this flow (e.g. buffering) if it were to be admitted. 11.4 Parameters to implement Best Effort For a network element to implement best effort service there are no explicit parameters that need to be characterised. 
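As a rough illustration of how the framing overheads of sections 4.2 and 4.3 feed these characterisation parameters, the following sketch scales an IP-level Tspec rate up to a media-level rate (illustrative only; it uses the large-packet overhead figures, ignores the short-packet padding cases and any preamble or inter-frame gap, and the function and key names are assumptions):

    # Large-packet framing overheads in bytes per packet, taken from the
    # tables in sections 4.2 and 4.3.
    FRAMING_OVERHEAD = {
        ("ethernet", "ethertype"): 18,
        ("ethernet", "ethertype-802.1p/Q"): 22,
        ("ethernet", "llc/snap"): 24,
        ("token-ring", "ethertype-802.1p/Q"): 29,
        ("token-ring", "llc/snap"): 25,
    }

    def link_level_rate(ip_rate_bps: float, avg_ip_packet_bytes: float,
                        media: str, encapsulation: str) -> float:
        """Scale an IP-level Tspec rate up to the media-level bandwidth
        that admission control should account for on the egress link."""
        overhead = FRAMING_OVERHEAD[(media, encapsulation)]
        return ip_rate_bps * (avg_ip_packet_bytes + overhead) / avg_ip_packet_bytes

    # Example: a 1.5 Mb/s flow of 1500-byte packets over tagged Ethernet
    # needs roughly 1.52 Mb/s of link capacity.
    print(link_level_rate(1_500_000, 1500, "ethernet", "ethertype-802.1p/Q"))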
11.5 Mapping to IEEE 802 user_priority

There are many options available for mapping aggregations of flows described by int-serv service models (Best Effort, Controlled Load and Guaranteed Service are the services considered here) onto user_priority classes. There currently exists very little practical experience with particular mappings to help make a determination as to the "best" mapping. In that spirit, the following options are presented in order to stimulate experimentation in this area. Note that this does not dictate what mechanisms/algorithms a network element (e.g. an Ethernet switch) needs to perform to implement these mappings: this is an implementation choice and does not matter so long as the requirements for the particular service model are met. Having said that, we do explore below the ability of a switch implementing strict priority queueing to support some or all of the service types under discussion: this is worthwhile because it is likely to be the most widely deployed dequeueing algorithm in simple switches, as it is the default specified in 802.1p.

In order to reduce the administrative problems, such a mapping table is held by *switches* (and routers if desired) but generally not by end-station hosts, and is a read-write table. The values proposed below are defaults and can be overridden by management control so long as all switches agree to some extent (the required level of agreement requires further analysis). It is possible that some form of network-wide lookup service could be implemented that serviced requests from clients, e.g. traffic_class = getQoSbyName("H.323 video"), and notified switches of what sorts of traffic categories they were likely to encounter and how to allocate those requests into traffic classes: such mechanisms are for further study.

Example: A Simple Scheme

   user_priority   Service

   0               "less than" Best Effort
   1               Best Effort
   2               reserved
   3               reserved
   4               Controlled Load
   5               Guaranteed Service, 100ms bound
   6               Guaranteed Service, 10ms bound
   7               reserved

   Table 1 - Example user_priority to service mappings

In this proposal, all traffic that uses the controlled load service is mapped to a single 802.1p user_priority, whilst that for guaranteed service is placed into one of two user_priority classes with different delay bounds. Unreserved best effort traffic is mapped to another. The use of classes 4, 5 and 6 for Controlled Load and Guaranteed Service is somewhat arbitrary as long as they are increasing: any two classes greater than Best Effort can be used as long as GS is "greater" than CL, although those proposed here have the advantage that, for transit through 802.1p switches with only two-level strict priority queuing, they both get "high priority" treatment (the current 802.1p default split is 0-3 and 4-7 for a device with 2 queues). The choice of delay bound is also arbitrary but potentially very significant: this can lead to a much more efficient allocation of resources as well as greater (though still not very good) isolation between flows. The "less than best effort" class might be useful for devices that wish to tag packets that are exceeding a committed network capacity and can be optionally discarded by a downstream device. Note that this is not *required* by any current int-serv models but is under study.
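The Table 1 mapping and the default two-queue split can be written down directly (an illustrative sketch only; the names are assumptions):

    # The example mapping of Table 1.
    TABLE_1_SERVICE = {
        0: "less-than-best-effort",
        1: "best-effort",
        2: "reserved",
        3: "reserved",
        4: "controlled-load",
        5: "guaranteed-service-100ms",
        6: "guaranteed-service-10ms",
        7: "reserved",
    }

    def queue_in_two_queue_switch(user_priority: int) -> str:
        """Default 802.1p split for a device with 2 queues (0-3 low,
        4-7 high): with Table 1, both Controlled Load and Guaranteed
        Service traffic receive "high priority" treatment in transit."""
        return "high" if user_priority >= 4 else "low"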
The advantage of this approach is that it puts some real delay bounds on the Guaranteed Service without adding any additional complexity to the other services. It still ignores the amount of *bandwidth* available for each class; it should behave reasonably well as long as the traffic of all CL and GS flows does not exceed any resource capacities in the device. Some isolation between very delay-critical GS and less critical GS flows is provided, but there is still an overall assumption that flows will in general be well-behaved. In addition, this mapping still leaves room for future service models.

Expanding the number of classes for CL service is not as appealing, since there is no need to map to a particular delay bound. There may be cases where an administrator might map CL onto more classes for particular bandwidths or policy levels. It may also be desirable to further subdivide CL traffic in cases where it is frequently non-conformant for certain applications.

12. Network Topology Scenarios

12.1 Switched networks using priority scheduling algorithms

In general, the int-serv standards work has tried to avoid any specification of scheduling algorithms, instead relying on implementers to deduce appropriate algorithms from the service definitions and on users to apply measurable benchmarks to check for conformance. However, since one standards body has chosen to specify a single default scheduling algorithm for switches [2], it seems appropriate to examine, to some degree, how well this "implementation" might actually support some or all of the int-serv services.

If the example mappings of Table 1 above are applied in a switch implementing strict priority queueing between the 8 traffic classes (7 = highest), then the result will be that all Guaranteed Service packets will be transmitted in preference to any other service. Controlled Load packets will be transmitted next, with everything else waiting until both of these queues are empty. If the admission control algorithms in use on the switch ensure that the sum of the "promised" bandwidth of all of the GS and CL sessions is never allowed to exceed the available link bandwidth, then the promised service can be maintained.

12.2 Full-duplex switched networks

We have up to now ignored the MAC access protocol. On a full-duplex switched LAN (of either Ethernet or Token-Ring type - the MAC algorithm is, by definition, unimportant) it can be factored into the characterisation parameters advertised by the device, since the access latency is well controlled (jitter = one maximum-size packet time). Some example characteristics (approximate):

      Type              Speed      Max Pkt     Max Access
                                   Tx Time      Latency

      Ethernet          10Mbps      1.2ms        1.2ms
                       100Mbps      120us        120us
                         1Gbps       12us         12us
      Token-Ring         4Mbps        9ms          9ms
                        16Mbps        9ms          9ms
      FDDI             100Mbps      360us        8.4ms
      Demand-Priority  100Mbps      120us        253us

      Table 2 - Full-duplex switched media access latency

These delays should also be considered in the context of speed-of-light delays of e.g. ~400ns for typical 100m UTP links and ~7us for typical 2km multimode fibre links. Therefore we see full-duplex switched network topologies as offering good QoS capabilities for both Controlled Load and Guaranteed Service when supported by suitable queueing strategies in the switch nodes.
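As an illustration of the admission rule noted in section 12.1, applied to such a full-duplex link, the following sketch (C; the structure, names and the 75% reservable fraction are illustrative assumptions only) admits a new GS or CL flow on an egress link only while the sum of "promised" rates stays within the link bandwidth.

   /* Minimal sketch of a per-link admission check for a switch with
    * strict priority queues (section 12.1).  All names and values are
    * illustrative assumptions.
    */
   #include <stdbool.h>
   #include <stdio.h>

   struct egress_link {
       double capacity_bps;   /* raw link bandwidth                        */
       double reserved_bps;   /* sum of rates of admitted GS + CL flows    */
       double reservable;     /* fraction of capacity open to reservation  */
   };

   static bool admit_flow(struct egress_link *l, double tspec_rate_bps)
   {
       if (l->reserved_bps + tspec_rate_bps > l->capacity_bps * l->reservable)
           return false;              /* would break the promised service */
       l->reserved_bps += tspec_rate_bps;
       return true;
   }

   int main(void)
   {
       /* Full-duplex 100 Mbps link, at most 75% reservable. */
       struct egress_link fe = { 100e6, 0.0, 0.75 };

       printf("25 Mb/s flow: %s\n", admit_flow(&fe, 25e6) ? "admitted" : "refused");
       printf("60 Mb/s flow: %s\n", admit_flow(&fe, 60e6) ? "admitted" : "refused");
       return 0;
   }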
12.3 Shared-media Ethernet networks

We have so far avoided the difficulty of dealing with allocation on a single shared CSMA/CD segment: as soon as any CSMA/CD algorithm is introduced, the ability to provide any form of Guaranteed Service is seriously compromised in the absence of any tight coupling between the multiple senders on the link. There are a number of reasons for not offering a better solution for this issue.

Firstly, we do not believe this is a truly solvable problem: it would seem to require a new MAC protocol. There have been proposals for enhancements to the MAC layer protocols, e.g. BLAM and enhanced flow-control in IEEE 802.3; IEEE 802.1 has examined research showing disappointing simulation results for performance guarantees on shared CSMA/CD Ethernet without MAC enhancements. However, any solution involving a new "software MAC" running above the traditional 802.3 MAC, or other proprietary MAC protocols, is clearly outside the scope of the work of the ISSLL WG and this document.

Secondly, we are not convinced that it is really an interesting problem. While not everyone in the world is buying desktop switches today, and there will be end stations living on repeated segments for some time to come, the number of switches is going up and the number of stations on repeated segments is going down. This trend is proceeding to the point that we may be happy with a solution which assumes that any network conversation requiring resource reservations will take place through at least one switch (be it layer-2 or layer-3). Put another way, the easiest QoS upgrade to a layer-2 network is to install segment switching: only when this has been done is it worthwhile to investigate more complex solutions involving admission control.

Thirdly, in the core of the network (as opposed to at the edges), there does not seem to be wide deployment of repeated segments as opposed to switched solutions. There may be special circumstances in the future (e.g. Gigabit buffered repeaters) but these have differing characteristics to existing CSMA/CD repeaters anyway.

      Type        Speed      Max Pkt     Max Access
                             Tx Time      Latency

      Ethernet    10Mbps      1.2ms       unbounded
                 100Mbps      120us       unbounded
                   1Gbps       12us       unbounded

      Table 3 - Shared Ethernet media access latency

12.4 Half-duplex switched Ethernet networks

Many of the same arguments for sub-optimal support of Guaranteed Service apply to half-duplex switched Ethernet as to shared media: in essence, this topology is a medium that *is* shared between at least two senders contending for each packet transmission opportunity. Unless these are tightly coupled and cooperative, there is always the chance that the best-effort traffic of one will interfere with the important traffic of the other. Such coupling would seem to need some form of modification to the MAC protocol (see above).

Notwithstanding the above, half-duplex switched topologies do seem to offer the chance to provide Controlled Load service: with the knowledge that there are only a small number (e.g.
two) of potential senders, each using prioritisation of its CL traffic over best effort (with admission control for those CL flows based on knowledge of the number of potential senders), the media access characteristics, whilst not deterministic in the true mathematical sense, are somewhat predictable. This is probably a close enough approximation to CL to be useful.

      Type        Speed      Max Pkt     Max Access
                             Tx Time      Latency

      Ethernet    10Mbps      1.2ms       unbounded
                 100Mbps      120us       unbounded
                   1Gbps       12us       unbounded

      Table 4 - Half-duplex switched Ethernet media access latency

12.5 Half-duplex and shared Token Ring networks

In a shared Token Ring network, the network access time for high priority traffic at any station is bounded and is given by (N+1)*THTmax, where N is the number of stations sending high priority traffic and THTmax is the maximum token holding time [14]. This assumes that network adapters have priority queues, so that reservation of the token is done for the traffic with the highest priority currently queued in the adapter. It is easy to see that access times can be improved by reducing N or THTmax. The recommended default for THTmax is 10 ms [6]. N is an integer from 2 to 256 for a shared ring and 2 for a switched half-duplex topology. A similar analysis applies for FDDI. Using default values gives:

      Type          Speed                 Max Pkt     Max Access
                                          Tx Time      Latency

      Token-Ring    4/16Mbps shared         9ms         2570ms
                    4/16Mbps switched       9ms           30ms
      FDDI          100Mbps               360us            8ms

      Table 5 - Half-duplex and shared Token-Ring media access latency

Given that access time is bounded, it is possible to provide an upper bound for end-to-end delays, as required by Guaranteed Service, assuming that traffic of this class uses the highest priority allowable for user traffic. The actual number of stations that send traffic mapped into the same traffic class as GS may vary over time but, from an admission control standpoint, this value is needed a priori. The admission control entity must therefore use a fixed value for N, which may be the total number of stations on the ring or some lower value if it is desired to keep the offered delay guarantees lower. If the value of N used is lower than the total number of stations on the ring, admission control must ensure that the number of stations sending high priority traffic never exceeds this number. This approach allows admission control to estimate worst case access delays assuming that all of the N stations are sending high priority data even though, in most cases, this will mean that delays are significantly overestimated.

Assuming that Controlled Load flows use a traffic class lower than that used by GS, no upper bound on access latency can be provided for CL flows. However, CL flows will receive better service than best effort flows. Note that, on many existing shared token rings, bridges will transmit frames using an Access Priority (see section 4.3) value of 4 irrespective of the user_priority carried in the frame control field of the frame. Therefore, existing bridges would need to be reconfigured or modified before the above access time bounds can actually be used.
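As a worked example of the bound above, the following fragment (C; the function name and structure are illustrative only) computes (N+1)*THTmax for the two Token-Ring cases shown in Table 5.

   /* Minimal sketch of the Token-Ring access bound of section 12.5:
    * worst-case high-priority access time = (N + 1) * THTmax.
    * Defaults are those quoted from [6] and [14] in the text above.
    */
   #include <stdio.h>

   /* Worst-case access delay in seconds for N high-priority senders. */
   static double tr_access_bound(int n_senders, double tht_max_s)
   {
       return (n_senders + 1) * tht_max_s;
   }

   int main(void)
   {
       double tht_max = 0.010;    /* recommended default THTmax: 10 ms */

       /* Shared ring, admission control sized for all 256 stations. */
       printf("shared ring, N=256 : %.0f ms\n",
              tr_access_bound(256, tht_max) * 1e3);

       /* Half-duplex switched link: only two potential senders. */
       printf("switched link, N=2 : %.0f ms\n",
              tr_access_bound(2, tht_max) * 1e3);
       return 0;
   }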
12.6 Half-duplex and shared Demand-Priority networks

In 802.12 networks, communication between end-nodes and hubs, and between the hubs themselves, is based on the exchange of link control signals. These signals are used to control access to the shared medium. If a hub, for example, receives a high-priority request while another hub is in the process of serving normal-priority requests, then the service of the latter hub can effectively be pre-empted in order to serve the high-priority request first. After the network has processed all high-priority requests, it resumes the normal-priority service at the point in the network at which it was interrupted.

The time needed to pre-empt the normal-priority network service (the high-priority network access time) is bounded: the bound depends on the physical layer and on the topology of the shared network. The physical layer has a significant impact when operating in half-duplex mode, as used for example across unshielded twisted-pair (UTP) links, because link control signals cannot be exchanged while a packet is being transmitted over the link. The network topology also has to be considered since, in larger shared networks, the link control signals must potentially traverse several links (and hubs) before they can reach the hub which holds the network control. This may delay the pre-emption of the normal-priority service and hence increase the upper bound that may be guaranteed.

Upper bounds on the high-priority access time are given below for a UTP physical layer and a cable length of 100 m between all end-nodes and hubs, using a maximum propagation delay of 570ns as defined in [15]. These values consider the worst case signaling overhead and assume the transmission of maximum-sized normal-priority data packets while the normal-priority service is being pre-empted.

      Type              Speed                     Max Pkt     Max Access
                                                  Tx Time      Latency

      Demand Priority   100Mbps, 802.3pkt, UTP     120us        253us
                                 802.5pkt, UTP     360us        733us

      Table 6 - Half-duplex switched Demand-Priority UTP access latency

Shared 802.12 topologies can be classified using the hub cascading level "N". The simplest topology is the single-hub network (N = 1). For a UTP physical layer, a maximum cascading level of N = 5 is supported by the standard. Large shared networks with many hundreds of nodes can, however, already be built with a level-2 topology. The bandwidth manager could be informed of the actual cascading level by network management mechanisms and use this information in its admission control algorithms.

      Type              Speed               Max Pkt    Max Access   Topology
                                            Tx Time     Latency

      Demand Priority   100Mbps, 802.3pkt    120us       262us        N=1
                                             120us       554us        N=2
                                             120us       878us        N=3
                                             120us      1.24ms        N=4
                                             120us      1.63ms        N=5
      Demand Priority   100Mbps, 802.5pkt    360us       722us        N=1
                                             360us      1.41ms        N=2
                                             360us      2.32ms        N=3
                                             360us      3.16ms        N=4
                                             360us      4.03ms        N=5

      Table 7 - Shared Demand-Priority UTP access latency

In contrast to UTP, the fibre-optic physical layer operates in dual-simplex mode. Upper bounds for the high-priority access time are given below for 2 km multimode fibre links with a propagation delay of 10 us.

      Type              Speed                      Max Pkt     Max Access
                                                   Tx Time      Latency

      Demand Priority   100Mbps, 802.3pkt, Fibre    120us        139us
                                 802.5pkt, Fibre    360us        379us

      Table 8 - Half-duplex switched Demand-Priority Fibre access latency

For shared media with distances of 2km between all end-nodes and hubs, the 802.12 standard allows a maximum cascading level of 2. Higher levels of cascaded topologies are supported but require a reduction of the distances [15].

      Type              Speed               Max Pkt    Max Access   Topology
                                            Tx Time     Latency

      Demand Priority   100Mbps, 802.3pkt    120us       160us        N=1
                                             120us       202us        N=2
      Demand Priority   100Mbps, 802.5pkt    360us       400us        N=1
                                             360us       682us        N=2

      Table 9 - Shared Demand-Priority Fibre access latency
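To illustrate how a bandwidth manager that has learned the cascading level via network management might use these bounds, the following sketch (C; the lookup interface is an illustrative assumption, with the UTP values of Table 7 used as example data) returns the worst-case high-priority access latency for use in admission control.

   /* Minimal sketch: map an 802.12 cascading level and packet format to
    * the worst-case high-priority access latency of Table 7 (shared UTP
    * topologies).  The lookup interface is an illustrative assumption.
    */
   #include <stdio.h>

   /* Worst-case access latency in microseconds, indexed by cascade
    * level N = 1..5, for 802.3 and 802.5 packet formats over UTP.      */
   static const double utp_access_us_8023[] = {  262,  554,  878, 1240, 1630 };
   static const double utp_access_us_8025[] = {  722, 1410, 2320, 3160, 4030 };

   static double dp_access_bound_us(int cascade_level, int use_8025_frames)
   {
       const double *t = use_8025_frames ? utp_access_us_8025
                                         : utp_access_us_8023;
       if (cascade_level < 1 || cascade_level > 5)
           return -1.0;               /* unsupported topology */
       return t[cascade_level - 1];
   }

   int main(void)
   {
       printf("802.3 frames, N=2: %.0f us\n", dp_access_bound_us(2, 0));
       printf("802.5 frames, N=5: %.0f us\n", dp_access_bound_us(5, 1));
       return 0;
   }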
The bounded access delay and deterministic network access allow the support of service commitments required for Guaranteed Service and Controlled Load, even on shared-media topologies. The support of just two priority levels in 802.12, however, limits the number of services that can simultaneously be implemented across the network.

13. Signaling protocol

The mechanisms described in this document make use of a signaling protocol for devices to communicate their admission control requests across the network. The service definitions to be provided by such a protocol, e.g. [10], are described below: we illustrate the primitives and information that need to be exchanged with such a signaling protocol entity. In all these examples, appropriate delete/cleanup mechanisms will also have to be provided for when sessions are torn down.

13.1 Client service definitions

The following interfaces can be identified from Figures 2 and 3:

* SBM <-> Address mapping

  This is a simple lookup function which may cause ARP protocol interactions, may be just a lookup of an existing ARP cache entry, or may be an algorithmic mapping. The layer-2 addresses are needed by the SBM for inclusion in its signaling messages to/from switches, which avoids the switches having to perform the mapping and, hence, to hold layer-3 information for the complete subnet:

      l2_addr = map_address( ip_addr )

* SBM <-> Session/802 header

  This is for notifying the transmit path of how to add layer-2 header information, e.g. user_priority values, to the traffic of each outgoing flow: the transmit path will provide the user_priority value when it requests a MAC-layer transmit operation for each packet (user_priority is one of the parameters passed in the packet transmit primitive defined by the IEEE 802 service model):

      bind_l2_header( flow_id, user_priority )

* SBM <-> Classifier/Scheduler

  This is for notifying the transmit classifier/scheduler of any additional layer-2 information associated with scheduling the transmission of a flow's packets. This primitive may be unused in some implementations, or it may be used, for example, to provide information to a transmit scheduler that is performing per-traffic_class scheduling in addition to the per-flow scheduling required by int-serv; the l2_header may be a pattern (additional to the FilterSpec) to be used to identify the flow's traffic:

      bind_l2schedulerinfo( flow_id, l2_header, traffic_class )

* SBM <-> Local Admission Control

  For applying local admission control for a session, e.g. is there enough transmit bandwidth still uncommitted for this potential new session? Are there sufficient receive buffers? This should commit the necessary resources if the answer is yes; it will be necessary to release these resources at a later stage if the session setup process fails. This call would be made by a segment's Designated SBM, for example:

      status = admit_l2session( flow_id, Tspec, FlowSpec )

* SBM <-> RSVP

  This is outlined above in section 9.2 and fully described in [10].

* Management Interfaces

  Some or all of the modules described by this model will also require configuration management: it is expected that details of the manageable objects will be specified by future work in the ISSLL WG.
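To make the use of these primitives concrete, the following sketch (C; the primitive names are taken from the list above, but the argument types, stub bodies, numeric values and control flow are illustrative assumptions only, not a definition of any API) shows the order in which a sending host might invoke them for a single new flow.

   /* Minimal sketch of a sending host exercising the section 13.1
    * primitives for one new flow.  All stubs and values are
    * illustrative assumptions.
    */
   #include <stdio.h>
   #include <stdint.h>

   typedef int      flow_id_t;
   typedef uint32_t ip_addr_t;
   typedef uint64_t l2_addr_t;

   static double uncommitted_tx_bps = 10e6;    /* toy local resource state */

   static l2_addr_t map_address(ip_addr_t ip)  /* ARP lookup, stubbed */
   {
       return 0x00A0C9000000ULL | (ip & 0xff);
   }

   static int admit_l2session(flow_id_t f, double tspec_rate_bps)
   {
       (void)f;                     /* Tspec/FlowSpec collapsed to a rate */
       if (tspec_rate_bps > uncommitted_tx_bps)
           return 0;
       uncommitted_tx_bps -= tspec_rate_bps;
       return 1;
   }

   static void bind_l2_header(flow_id_t f, int user_priority)
   {
       printf("flow %d: tag frames with user_priority %d\n", f, user_priority);
   }

   static void bind_l2schedulerinfo(flow_id_t f, l2_addr_t hdr, int traffic_class)
   {
       printf("flow %d: schedule in class %d (dst %012llx)\n",
              f, traffic_class, (unsigned long long)hdr);
   }

   int main(void)
   {
       flow_id_t flow   = 1;
       ip_addr_t dst_ip = 42;

       l2_addr_t dst_mac = map_address(dst_ip);  /* layer-2 addr for SBM msgs */

       if (!admit_l2session(flow, 1e6)) {        /* local admission control   */
           printf("flow %d refused locally\n", flow);
           return 1;
       }
       bind_l2_header(flow, 4);                  /* 4 = Controlled Load in the
                                                    Table 1 example mapping   */
       bind_l2schedulerinfo(flow, dst_mac, 4);
       return 0;
   }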
13.2 Switch service definitions

The following interfaces are identified from Figure 4:

* SBM <-> Classifier

  This is for notifying the receive classifier of how to match up incoming layer-2 information with the associated traffic class: it may in some cases consist of a set of read-only default mappings:

      bind_l2classifierinfo( flow_id, l2_header, traffic_class )

* SBM <-> Queue and Packet Scheduler

  This is for notifying the transmit scheduler of additional layer-2 information associated with a given traffic class (it may be unused in some cases - see the discussion in the previous section):

      bind_l2schedulerinfo( flow_id, l2_header, traffic_class )

* SBM <-> Local Admission Control

  As for the host, above.

* SBM <-> Traffic Class Map and Police

  Optional configuration of any user_priority remapping that might be implemented on ingress to and egress from the ports of a switch (note that, for Class I switches, it is likely that these mappings will have to be consistent across all ports):

      bind_l2ingressprimap( inport, in_user_pri, internal_priority )
      bind_l2egressprimap( outport, internal_priority, out_user_pri )

  Optional configuration of any layer-2 policing function to be applied on a per-class basis to traffic matching the l2_header. If the switch is capable of per-flow policing then existing int-serv/RSVP models will provide a service definition for that configuration:

      bind_l2policing( flow_id, l2_header, Tspec, FlowSpec )

* SBM <-> Filtering Database

  SBM propagation rules need access to the layer-2 forwarding database to determine where to forward SBM messages (analogous to the RSRR interface in layer-3 RSVP):

      output_portlist = lookup_l2dest( l2_addr )

* Management Interfaces

  Some or all of the modules described by this model will also require configuration management: it is expected that details of the manageable objects will be specified by future work in the ISSLL WG.
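For illustration, the following sketch (C; the primitive names are those listed above, while the types, stub bodies, port numbers and rates are illustrative assumptions only) shows how a switch might exercise these primitives when an admission request arrives for a new flow.

   /* Minimal sketch of a switch exercising the section 13.2 primitives
    * for one admission request.  All stubs and values are illustrative
    * assumptions.
    */
   #include <stdio.h>
   #include <stdint.h>

   typedef int      flow_id_t;
   typedef uint64_t l2_addr_t;

   /* Stubbed primitives from section 13.2. */
   static int lookup_l2dest(l2_addr_t dst, int *ports, int max_ports)
   {
       (void)dst; (void)max_ports;
       ports[0] = 2;                 /* pretend the destination is on port 2 */
       return 1;                     /* number of output ports               */
   }
   static int admit_l2session(flow_id_t f, double rate_bps)
   {
       (void)f;
       return rate_bps <= 5e6;       /* toy local admission control check    */
   }
   static void bind_l2classifierinfo(flow_id_t f, l2_addr_t hdr, int traffic_class)
   {
       printf("flow %d: classify dst %012llx into class %d\n",
              f, (unsigned long long)hdr, traffic_class);
   }
   static void bind_l2egressprimap(int outport, int internal_pri, int out_user_pri)
   {
       printf("port %d: internal priority %d -> user_priority %d\n",
              outport, internal_pri, out_user_pri);
   }

   int main(void)
   {
       flow_id_t flow = 7;
       l2_addr_t dst  = 0x00A0C9000002ULL;
       int ports[8];

       int n = lookup_l2dest(dst, ports, 8);     /* where would this flow go? */
       for (int i = 0; i < n; i++) {
           if (!admit_l2session(flow, 2e6)) {    /* per-port admission check  */
               printf("flow %d refused on port %d\n", flow, ports[i]);
               return 1;
           }
           bind_l2classifierinfo(flow, dst, 4);  /* class 4 = CL in Table 1   */
           bind_l2egressprimap(ports[i], 4, 4);  /* identity egress remapping */
       }
       printf("flow %d admitted; forward request on %d port(s)\n", flow, n);
       return 0;
   }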
14. Compatibility and Interoperability with existing equipment

Switches using layer-2-only standards (e.g. 802.1p) will have to cooperate with routers and layer-3 switches. Such 802.1p switches will be widely deployed in a number of roles in the network: "desktop switches" provide dedicated 10/100 Mbps links to end stations, and high-speed core switches will act as central campus switching points for layer-3 devices. Layer-2 devices will have to operate in all of the following scenarios:

* every device along a network path is layer-3 capable and intrusive into the full data stream.

* only the edge devices are pure layer-2.

* every alternate device lacks layer-3 functionality.

* most devices lack layer-3 functionality, except for some key control points such as router firewalls.

Where int-serv flows pass through equipment which is ignorant of priority queuing and which places all packets through the same queuing/overload-dropping path, some of the characteristics of the flows obviously become more difficult to support.

Suitable courses of action in cases where sufficient bandwidth or buffering is not available are of the form:

* buy more (and bigger) routers.

* buy more capable switches.

* rearrange the network topology: 802.1Q VLANs [11] may help here.

* buy more bandwidth.

It would also be possible to pass more information between switches about the capabilities of their neighbours and to route around non-QoS-capable switches: such methods are for further study.

15. Justification

An obvious objection is that this is all too complex, that it is what RSVP is doing already, and that we are unlikely to do better by reinventing the solution to this problem at layer 2. The key point is that there are a number of simple layer-2 scenarios that cover a considerable proportion of the real QoS problems that will occur, and a solution that covers nearly all of those problems at significantly lower cost is beneficial. Full RSVP/int-serv with per-flow queueing in strategically positioned high-function switches or routers may be needed to completely solve all issues, but devices implementing the architecture described in this document will allow a significantly simpler network.

16. References

[1]  ISO/IEC 10038, ANSI/IEEE Std 802.1D-1993, "MAC Bridges".

[2]  "Supplement to MAC Bridges: Traffic Class Expediting and Dynamic Multicast Filtering", IEEE P802.1p/D6, May 1997.

[3]  "Integrated Services in the Internet Architecture: an Overview", RFC 1633, June 1994.

[4]  "Resource Reservation Protocol (RSVP) - Version 1 Functional Specification", Internet Draft, June 1997.

[5]  "Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications", ANSI/IEEE Std 802.3-1985.

[6]  "Token-Ring Access Method and Physical Layer Specifications", ANSI/IEEE Std 802.5-1995.

[7]  "A Framework for Providing Integrated Services Over Shared and Switched LAN Technologies", Internet Draft, May 1997.

[8]  "Specification of the Controlled-Load Network Element Service", Internet Draft, May 1997.

[9]  "Specification of Guaranteed Quality of Service", Internet Draft, February 1997.

[10] "SBM (Subnet Bandwidth Manager): A Proposal for Admission Control over Ethernet", Internet Draft, July 1997.

[11] "Draft Standard for Virtual Bridged Local Area Networks", IEEE P802.1Q/D6, May 1997.

[12] "General Characterization Parameters for Integrated Service Network Elements", Internet Draft, July 1997.

[13] "A Standard for the Transmission of IP Datagrams over IEEE 802 Networks", RFC 1042, February 1988.

[14] C. Bisdikian, B. V. Patel, F. Schaffa and M. Willebeek-LeMair, "The Use of Priorities on Token-Ring Networks for Multimedia Traffic", IEEE Network, Nov/Dec 1995.

[15] "Demand Priority Access Method, Physical Layer and Repeater Specification for 100Mbit/s", IEEE Std 802.12-1995.

[16] "Fiber Distributed Data Interface", ANSI Std X3.139-1987.

17. Security Considerations

Implementation of the model described in this memo creates no known new avenues for malicious attack on the network infrastructure, although readers are referred to section 2.8 of the RSVP specification [4] for a discussion of the impact of the use of admission control signaling protocols on network security.

18. Acknowledgments

This document draws heavily on the work of the ISSLL WG of the IETF and the IEEE P802.1 Interworking Task Group.
In particular, it relies on previous work on Token-Ring/802.5 from Anoop Ghanwani, Wayne Pace and Vijay Srinivasan and on Demand-Priority/802.12 from Peter Kim.

19. Authors' addresses

   Mick Seaman
   3Com Corp.
   5400 Bayfront Plaza
   Santa Clara, CA 95052-8145
   USA
   +1 (408) 764 5000
   mick_seaman@3com.com

   Andrew Smith
   Extreme Networks
   10460 Bandley Drive
   Cupertino, CA 95014
   USA
   +1 (408) 863 2821
   andrew@extremenetworks.com

   Eric Crawley
   Gigapacket Networks
   25 Porter Rd.
   Littleton, MA 01460
   USA
   +1 (508) 486 0665
   esc@gigapacket.com