Internet Draft                                          Mick Seaman
Expires May 1997                                        3Com Corp.
draft-ietf-issll-802-00.txt                             Andrew Smith
                                                        Extreme Networks
                                                        Eric Crawley
                                                        Bay Networks
                                                        November 1996

        Integrated Services over IEEE 802.1D/802.1p Networks

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Abstract

This document describes the support of IETF Integrated Services over LANs built from IEEE 802 network segments which are interconnected by standard IEEE 802.1D [1] switches. It describes the practical capabilities and limitations of this technology for supporting Controlled Load [8] and Guaranteed Service [9] using the inherent capabilities of the relevant 802 technologies [5], [6] etc. and the proposed 802.1p queuing features in switches. It provides a functional model for the layer 3 to layer 2 and user-to-network dialogue which supports admission control and defines requirements for interoperability between switches. This scheme is consistent with the ISSLL over LANs framework discussed at the October 1996 ISSLL interim meeting and described in [7].

Seaman, Smith, Crawley        Expires May 1997        [Page 1]

INTERNET DRAFT        Intserv over IEEE 802.1D/p        November 1996

1. Introduction

The IEEE 802.1 Interworking Task Group is currently enhancing the basic MAC Service provided in Bridged Local Area Networks (aka "switched LANs").
As a supplement to the IEEE MAC Bridges standard [1], P802.1p [2] proposes differential traffic class queuing ("priorities") and access to media on the basis of a "user_priority" signaled in frames.

In this document we

* review the meaning and use of user_priority in LANs and the frame forwarding capabilities of a standard LAN switch.
* examine alternatives for identifying layer 2 traffic flows for admission control.
* review the options available for policing traffic flows.
* derive requirements for consistent priority handling in a network of switches and use these requirements to discuss priority queue handling alternatives for 802.1p and the way in which these meet administrative and interoperability goals.
* consider the benefits and limitations of this switch-based approach, contrasting it with a full router-based RSVP implementation in terms of complexity, utilisation of transmission resources and administrative controls.

We then describe a model which:

* partitions the admission control process into two separable operations:
  * an interaction between the user of the integrated service and the local network elements ("provision of the service" in the terms of 802.1D) to confirm the availability of transmission resources for traffic to be introduced.
  * selection of an appropriate user_priority for that traffic on the basis of the service and service parameters to be supported.
* distinguishes between the user to network interface above and the mechanisms used by the switches ("support of the service"). These include communication between the switches (network to network signaling).
* describes a simple architecture for the provision and support of these services, broken down into components with functional and interface descriptions:
  * a single "user" component: a layer-3 to layer-2 negotiation and translation component.
  * bridge/switch processes to handle admission control and mapping requests, including proposals for actual traffic mappings to user_priority values.
* proposes a set of protocol exchange primitives based on the functions introduced.

This document contains much background material that is used as justification for the approach taken. It is anticipated that much of this material will not form a part of the final specification.

It will be noted that this document is written from the pragmatic viewpoint that there will be a widely deployed network technology and we are evaluating it for its ability to support some or all of the defined IETF integrated services: this approach is intended to ensure development of a system which can provide useful new capabilities in existing (and soon to be deployed) network infrastructure.

2. Goals and Assumptions

It is assumed that the network is "switch-rich": that is to say all communication between end stations using integrated services support will pass through at least one switch. Perhaps the mechanisms and protocols described will be trivially extensible to communicating systems on the same shared media, but it is important not to allow problem generalisation to complicate the practical application that we target: the access characteristics of Ethernet are forcing a trend to switch-rich topologies together with MAC enhancements to ensure access predictability on half-duplex switch to switch links.

It is assumed that layer-3 entities, including end-stations, are running the RSVP protocol in support of integrated services at that layer. No extra modifications to this protocol are assumed.
There may be a heterogeneous mixture of switches with different capabilities, all compliant with IEEE 802.1p, but implementing queuing and forwarding mechanisms in a range from simple 2-queue per port, strict priority, up to more complex multi-queue (maybe even one per-flow) WFQ or other algorithms.

The problem is broken down into smaller independent pieces: this may lead to sub-optimal usage of the network resources but we contend that the benefits of a more optimal scheme often amount to only very small improvements in network efficiency in a LAN environment. Therefore, it is a goal that the switches in the network operate using a much simpler set of information than the RSVP engine in a router. In particular, it is assumed that such switches do not need to implement per-flow queuing and policing.

One corollary is that no per-flow policing function takes place in the switches. However, it is a fundamental part of the intserv model that flows are isolated from each other throughout their transit across a network: intermediate queuing nodes are expected to police the traffic to ensure that it conforms to the pre-agreed traffic flow specification. In the architecture proposed here for mapping to layer-2, that policing function is assumed to be implemented in the transmit schedulers of the layer-3 devices (end stations, routers): it is reasonable to assume that end stations are "trusted" to adhere to their agreed contracts at the inputs to the network and that we can afford to over-allocate resources to compensate for the inevitable extra jitter/bunching introduced by the switched network itself.

3. User Priority and Frame Forwarding

User_priority is a value associated with the transmission and reception of all frames in the IEEE 802 service model: it is supplied by a sender using the MAC service and provided to a receiver using the MAC service.
It may or may not be actually carried over the network: Token-Ring/802.5 carries this value (encoded in its FC octet), basic Ethernet/802.3 does not. 802.1p defines a way to carry this value over the network in a similar way on Ethernet, Token Ring, FDDI or other MACs using an extended frame format.

The "user_priority" or "traffic class" (the latter term is to be preferred and it is the title of the 802.1p document) field in packets is a simple label in the data stream enabling packets in different classes to be discriminated by downstream nodes. Apart from making the job of desktop or wiring-closet switches easier, it means they do not have to change (hardware or software) as the rules for classifying packets evolve (based on new protocols or new policies). Layer-3 switches do provide added value here by performing the classification more accurately and, hence, utilising network resources more efficiently: this appears to be a good economic choice since there are likely to be very many more desktop/wiring closet switches in a network than switches requiring layer 3 functionality.

The IEEE 802 specifications make no assumptions about how user_priority is to be used by end stations or by the network, although the current 802.1p draft defines static priority queuing as the default mode of operation of all switches (user_priority is defined as a 3-bit quantity with value 7 = high priority, 0 = low priority). The switch algorithm in this case is as follows: packets are placed onto a particular queue based on the received user_priority (taken from the packet if an 802.1p header or 802.5 network was used, invented according to some local policy if not). The selection of queue is based on a mapping from user_priority [0,1,2,3,4,5,6 or 7] onto the number of available queues - switches may implement any number of queues from 1 upwards. On transmit, all frames in a higher priority queue are sent before any frame from a lower priority queue.
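The static priority queuing behaviour described above can be illustrated with a short sketch. It is not taken from the 802.1p draft: the class name StrictPriorityPort, the helper queue_for, and in particular the even division of the eight user_priority values across the available queues are assumptions made purely for illustration, since the text only says that some mapping onto the available queues exists.

```python
from collections import deque


def queue_for(user_priority, num_queues):
    """Map user_priority (0..7) onto a queue index (0..num_queues-1).

    Assumption: the eight values are divided evenly across the queues;
    the draft does not mandate any particular mapping.
    """
    if not 0 <= user_priority <= 7:
        raise ValueError("user_priority must be in the range 0..7")
    if num_queues < 1:
        raise ValueError("a port needs at least one queue")
    return user_priority * num_queues // 8


class StrictPriorityPort:
    """Hypothetical output port with static (strict) priority queuing."""

    def __init__(self, num_queues, default_priority=0):
        self.queues = [deque() for _ in range(num_queues)]
        # Local policy for frames that arrive without a user_priority.
        self.default_priority = default_priority

    def enqueue(self, frame, user_priority=None):
        if user_priority is None:
            user_priority = self.default_priority
        index = queue_for(user_priority, len(self.queues))
        self.queues[index].append(frame)

    def dequeue(self):
        # Serve the highest-numbered non-empty queue first.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None


port = StrictPriorityPort(num_queues=2)
port.enqueue("file-1", user_priority=0)
port.enqueue("video-1", user_priority=6)
port.enqueue("file-2", user_priority=1)
port.enqueue("video-2", user_priority=7)
sent = [port.dequeue() for _ in range(4)]
```

With two queues, the frames tagged 6 and 7 leave the port before the frames tagged 0 and 1, whatever their arrival order, which is the behaviour the default mode relies on.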
In particular, IEEE makes no recommendations about how a sender should select the value for user_priority: one of the main purposes of this draft is to propose such usage rules. Additionally, there are no IEEE 802-defined rules for switches to agree on how to treat frames with different user_priority values: later on in this draft we make some recommendations as to what information needs to be shared amongst switches.

4. Mapping of integrated services to layer-2 in layer-3 devices

The end-station or router itself is responsible for local admission control and scheduling packets onto its link in accordance with the service agreed. Just as in the intserv model, this involves per-flow schedulers somewhere in every such data source: it is an implementation issue whether there are separate schedulers for layer-3 and layer-2 or whether these are combined.

5. Mapping of integrated services through layer-2 switches

5.1 Queuing

Connectionless packet-based networks in general, and LAN switched networks in particular, work today because of scaling choices in network provisioning. Consciously or (more usually) unconsciously, enough excess bandwidth and buffering are provisioned in the network to absorb the traffic sourced by higher-layer protocols or cause their transmission windows to run out, on a statistical basis, so that the network is only overloaded for a short duration and the average expected loading is less than 60% (usually much less).

With the advent of time-critical traffic such overprovisioning has become far less easy to achieve. Time critical frames may find themselves queued for annoyingly long periods of time behind temporary bursts of file transfer traffic, particularly at network bottleneck points, e.g. at the 100 Mb/s to 10 Mb/s transition that might occur between the riser to the wiring closet and the final link to the user from a desktop switch.
In this case, however, if it is known (guaranteed by application design, merely expected on the basis of statistics, or just that this is all that the network guarantees to support) that the time critical traffic is a small fraction of the total bandwidth, it suffices to give it strict priority over the "normal" traffic. The worst case delay experienced by the time critical traffic is roughly the maximum transmission time of a maximum length non-time-critical frame - about 1.2 milliseconds for a 1518-octet frame on 10 Mb/s Ethernet, and well below an end to end budget based on human perception times.

When more than one "priority" service is to be offered by a network element, e.g. it supports Controlled Load as well as Guaranteed Service, the queuing discipline becomes more complex. In order to provide the required isolation between the service classes, it will probably be necessary to queue them separately. There is then an issue of how to service the queues - a combination of admission control and maybe weighted fair queuing may be required in such cases. As with the service specifications themselves, it is not the place for this document to specify queuing algorithms, merely to observe that the external behaviour must meet the services' requirements.

5.2 Multicast Heterogeneity

IEEE 802.1D and 802.1p use a model for multicast whereby a switch performs multicast routing decisions based on the destination address: this produces a list of output ports to which the packet should be forwarded. In its default mode, such a switch would use any user_priority value in received packets to enqueue the packets at each output port.
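The default multicast behaviour just described can be sketched as follows. This is an illustrative model only: the class MulticastSwitch, its method names, and the even mapping of user_priority onto queues are assumptions, not part of IEEE 802.1D or 802.1p.

```python
from collections import defaultdict, deque


class MulticastSwitch:
    """Hypothetical switch in default mode: the destination address selects
    the output ports, and the received user_priority alone selects the
    queue used at every one of those ports."""

    def __init__(self, num_queues=2):
        self.num_queues = num_queues
        # destination group address -> set of member output ports
        self.group_ports = defaultdict(set)
        # output port -> list of queues, one per traffic class
        self.port_queues = defaultdict(
            lambda: [deque() for _ in range(num_queues)]
        )

    def join(self, group, port):
        self.group_ports[group].add(port)

    def forward(self, group, user_priority, frame, in_port):
        # Assumed mapping: divide the eight priorities evenly over the queues.
        queue = user_priority * self.num_queues // 8
        out_ports = sorted(self.group_ports[group] - {in_port})
        for port in out_ports:
            self.port_queues[port][queue].append(frame)
        # Report where the frame was queued: (port, queue index) pairs.
        return [(port, queue) for port in out_ports]


switch = MulticastSwitch(num_queues=2)
for member_port in (1, 2, 3):
    switch.join("01:00:5e:00:00:01", member_port)
placement = switch.forward("01:00:5e:00:00:01", 6, "video-frame", in_port=1)
```

Every branch receives the frame in the same traffic class, because nothing other than the incoming user_priority influences the queue choice.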
At layer-3, the intserv model allows heterogeneous multicast flows where different branches of a tree can have different types of reservations for a given multicast destination, and even supports the notion that some trees will have some branches with reserved flows and some using best effort (default) service. If a switch is selecting per-port output queues based only on the incoming user_priority, it will have to treat all branches of all multicast sessions within that user_priority class with the same queuing mechanism: no heterogeneity is then possible (if it were to implement a separate mapping at each output port then some limited form of heterogeneity could be supported). It is proposed that per-user_priority queuing support is adequate as minimum standard functionality for systems *in a LAN environment*. Layer-3 switches (a.k.a. routers) can be used if more flexible forms of heterogeneity are considered necessary: their behaviour is well standardised.

6. Selecting User Priority classes

One fundamental question is "who gets to decide what the classes mean and who gets access to them?" One approach would be for the meanings of the classes to be "well-known": we would then need to standardise a set of classes e.g. 1 = best effort, 2 = controlled-load, 3 = guaranteed (loose delay bound, high bandwidth), 4 = guaranteed (slightly tighter delay) etc. Choosing the values to encode in such a table in end stations, in isolation from the network to which they are connected, is problematic: the best we could probably do would be to define one user_priority value per intserv service type and leave it at that (reserving the rest of the combinations for future traffic classes - there are sure to be plenty!).
We propose a more flexible mapping: clients ask "the network" which user_priority traffic class to use for a given traffic flow, as categorised by its flow-spec and layer-2 endpoints. The network provides a value back to the requester which is appropriate to the current network topology, load conditions, other admitted flows etc. The task of configuring switches with this mapping (e.g. through network management or some other switch-switch protocol) is an order of magnitude less complex than performing the same function in end stations. Also, when new services (or other network reconfigurations) are added to such a network, the network elements will typically be the ones to be upgraded with new queuing algorithms etc. and can be provided with new mappings at this time.

Given the need for a new session or "flow" requiring some QoS support, a client then needs answers to the following questions:

1. which traffic class do I add this flow to? The client needs to know how to label the packets of the flow as it places them into the network.

2. who do I ask/tell? The proposed model is that a client asks "the network" which user_priority traffic class to use for a given traffic flow. This has several benefits as compared to a model which allows clients to select a class for themselves.

3. how do I ask/tell them? A request/response protocol is needed between client and network: in fact, the request can be piggy-backed onto an admission control request and the response can be piggy-backed onto an admission control acknowledgment.

The network (i.e. the first network element encountered downstream from the client) must then answer the following questions:

1. which traffic class do I add this flow to?
This is a packing problem, difficult to solve in general, but many simplifying assumptions can be made: presumably some simple form of allocation can be done without a more complex scheme able to dynamically shift flows around between classes.

2. which traffic class has worst-case parameters which meet the needs of this flow? This might be an ordering/comparison problem: which of two service classes is "better" than another? Again, we can make this tractable by observing that all of the current intserv classes can be ranked (best effort <= Controlled Load <= Guaranteed Service) in a simple manner. If any classes are implemented in the future that cannot be simply ranked then the issue can be finessed by either a priori knowledge about what classes are supported or by configuration.

Having answered these questions, the network returns the chosen user_priority value to the client.

Note that the client may be an end station, a router, or a first switch which may be acting as a proxy for a client which does not participate in these protocols for whatever reason. Note also that a device, e.g. a server or router, may choose to implement both the "client" and the "network" portions of this model so that it can select its own user_priority values: such an implementation is, however, discouraged unless the device really does have a close tie-in with the network topology and resource allocation policies.

7. Flow Identification

Several previous proposals for intserv over lower-layers have treated switches very much as a special case of routers: in particular, that switches along the data path will make packet handling decisions based on the RSVP flow and filter specifications and use them to classify the corresponding data packets.
However, filtering to the per-flow level becomes cost-prohibitive with increasing switch speed: devices with such filtering capabilities are unlikely to have a very different implementation cost from that of IP routers, in which case we must question whether a specification oriented toward switched networks is of any benefit at all. This document proposes that "flow" identification based on user_priority be the minimum required of switches.

8. Reserving Network Resources - Admission Control

So far we have not discussed admission control. In fact, without admission control it is possible to build from scratch a LAN of some size capable of supporting real-time services, providing that the traffic fits within certain scaling constraints (relative link speeds, numbers of ports etc. - see below). This is not surprising since it is possible to run a fair approximation to real time services on small LANs today with no admission control or help from encoded priority bits.

Imagine a campus network providing dedicated 10 Mbps connections to each user. Each floor of each building supports up to 96 users, organized into groups of 24, with each group being supported by a 100 Mbps downlink to a basement switch which concentrates 5 floors (20 x 100 Mbps) and a data center (4 x 100 Mbps) to a 1 Gbps link to an 8 Gbps central campus switch, which in turn hooks 6 buildings together (with 2 x 1 Gbps full-duplex links to support a corporate server farm). Such a network could support 1.5 Mb/s of voice/video from every user to any other user or (for half the population) the server farm, provided the video ran at high priority: this gives roughly 3000 users (96 x 5 x 6 = 2880), all with desktop video conferencing running along with file transfer/email etc. In such a network RSVP's role would be limited to ensuring resource availability at the communicating end stations and for connection to the wide area.
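The figures in the campus example can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function and variable names are illustrative, and it deliberately covers only the group downlinks, the building uplinks and the central switch port count.

```python
USERS_PER_GROUP = 24
GROUPS_PER_FLOOR = 96 // USERS_PER_GROUP   # 96 users per floor
FLOORS_PER_BUILDING = 5
BUILDINGS = 6
REALTIME_MBPS_PER_USER = 1.5               # voice/video per user


def realtime_load(users, link_mbps):
    """Return (offered high priority load in Mb/s, fraction of the link)."""
    offered = users * REALTIME_MBPS_PER_USER
    return offered, offered / link_mbps


users_per_building = USERS_PER_GROUP * GROUPS_PER_FLOOR * FLOORS_PER_BUILDING
total_users = users_per_building * BUILDINGS
downlinks_per_building = GROUPS_PER_FLOOR * FLOORS_PER_BUILDING

# One group of 24 users shares a 100 Mbps downlink.
group_load = realtime_load(USERS_PER_GROUP, 100)

# One building shares a 1 Gbps uplink to the central campus switch.
building_load = realtime_load(users_per_building, 1000)

# Central switch: one 1 Gbps port per building plus two for the server farm.
central_switch_gbps = BUILDINGS * 1 + 2 * 1

print(total_users, downlinks_per_building, group_load, building_load,
      central_switch_gbps)
```

Under these numbers the high priority traffic occupies about a third of each group downlink and under three quarters of each building uplink, which is why simple strict priority is plausible in this topology.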
In such a network, a discussion as to the best service policy to apply to high and low priority queues may prove academic: while it is true that "normal" traffic may be delayed by bunches of high priority frames, queuing theory tells us that the average queue occupancy in the high priority queue at any switch port will be somewhat less than 1 (with real user behaviour, i.e. not all users watching video conferences all the time, it should be far less). A cheaper alternative to buying equipment with a fancy queue service policy may be to buy equipment with more bandwidth to lower the average link utilisation by a few per cent.

In practice a number of objections can be made to such a simple solution. There may be long established expensive equipment in the network which does not provide all the bandwidth required. There will be considerable concern over who is allowed to say what traffic is high priority. There may be a wish to give some form of "prioritised" service to crucial business applications, above that given to experimental video-conferencing. The task that faces us is to provide a degree of control without making that control so elaborate to implement that the control-oriented solution is simply rejected in favor of providing yet more bandwidth at a lower cost.

The proposed admission control mechanism requires a query-response interaction with the network returning a "YES/NO" answer and, if successful, the user_priority value with which to tag the data frames of this flow.

9. Client mapping to layer 2

We assume the same host model as intserv and RSVP: the client is running an RSVP process which presents a session establishment interface to applications, signals RSVP over the network, programs the scheduler and classifier in the driver and interfaces to a policy control module. In particular, RSVP also interfaces to a local admission control module: it is this entity that we focus on here.
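As a rough illustration of the query-response interaction proposed in section 8, as the local admission control module might see it, the sketch below models a network element that answers YES/NO and, on success, returns a user_priority. Everything in it is hypothetical: the class and field names, the single-link bandwidth check, the reservable fraction, and the service-to-priority table (which reuses the illustrative "well-known" values from section 6) are assumptions, not a protocol defined by this document.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative table only, reusing the example values from section 6.
SERVICE_PRIORITY = {"controlled_load": 2, "guaranteed": 3}


@dataclass
class AdmissionResponse:
    admitted: bool                       # the "YES/NO" answer
    user_priority: Optional[int] = None  # only set when admitted


class NetworkAdmissionControl:
    """Hypothetical first network element downstream from the client."""

    def __init__(self, link_mbps, reservable_fraction=0.5):
        # Assumption: only part of the link may carry reserved traffic.
        self.limit_mbps = link_mbps * reservable_fraction
        self.reserved_mbps = 0.0

    def request(self, service, rate_mbps):
        if service not in SERVICE_PRIORITY:
            raise ValueError("unknown service: " + service)
        if self.reserved_mbps + rate_mbps > self.limit_mbps:
            return AdmissionResponse(admitted=False)
        self.reserved_mbps += rate_mbps
        return AdmissionResponse(True, SERVICE_PRIORITY[service])


network = NetworkAdmissionControl(link_mbps=10)
first = network.request("controlled_load", 3.0)
second = network.request("guaranteed", 1.5)
third = network.request("guaranteed", 1.5)   # would exceed the 5 Mb/s limit
```

The client never chooses a class for itself here: it states the service and rate and tags frames with whatever value comes back, which is the division of responsibility argued for in section 6.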
The following diagram is taken from the RSVP spec:

      _____________________________
     |  _______                    |
     | |       |   _______         |
     | |Appli- |  |       |        |   RSVP
     | | cation|  | RSVP <-------------------->
     | |       <-->       |        |
     | |       |  |process|  _____ |
     | |_._____|  |       -->Polcy||
     |   |        |__.__._| |Cntrl||
     |   |data       |  |   |_____||
     |===|===========|==|==========|
     |   |   --------|  |    _____ |
     |   |  |        |  ---->Admis||
     |  _V__V_    ___V____  |Cntrl||
     | |      |  |        | |_____||
     | |Class-|  | Packet |        |
     | | ifier|==>Schedulr|====================>
     | |______|  |________|        |   data
     |                             |
     |_____________________________|

                 Figure 1 - RSVP in Hosts

The local admission control entity (known as "TUTU") within a client is responsible for mapping these layer-3 requests in TO layer TwO language. The upper-layer entity requests from TUTU: "May I reserve for traffic with