Internet Engineering Task Force
INTERNET-DRAFT                                                   P. Kim
draft-kim-llrmp-01.txt                          HP Laboratories Bristol
                                                          Expires: 6/97

            Link Level Resource Management Protocol (LLRMP)
                  Protocol Specification - Version 1

                            December, 1996

Status of Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This memo describes the LLRMP, a link level signalling protocol used to set up resources in shared-medium and bridged/switched LANs. Network layer resource management protocols like RSVP or STII, or a local network manager, may invoke the LLRMP to request a certain quality of service over a bridged LAN. The LLRMP protocol supports a distributed admission control over bridged LANs. Resources are reserved independently on each segment, on a hop-by-hop basis, as an LLRMP reservation message travels along the data path from the source to the receiver.

Changes from the last Version

1. Added a section describing the LLRMP operation in a full-duplex switched LAN and the consequent simplifications that can be made to the protocol.

2. Added a section describing the RSVP Snooping Module and how this enables full receiver heterogeneity to be supported without changes to LLRMP or RSVP.

3. Added an appendix A.2 containing top-level pseudo-code for the LLRMP host and switch algorithms.

4. Added an appendix A.5 containing detailed pseudo-code for the above algorithms.

Table of Contents

1. Introduction
2. Protocol Properties
   2.1 The LAN Model
   2.2 The Reservation Model
   2.3 The Receiver Model
   2.4 Summary
3. Protocol Operation and Mechanisms
   3.1 The LLRMP Message Types
   3.2 The Reservation Setup in a Full-Duplex Switched LAN
   3.3 The Optimistic Setup Strategy - Reservations without local ACK-Messages
   3.4 The Pessimistic Setup Strategy - Reservations with local ACK-Messages
   3.5 The Reservation Refresh Mechanism
   3.6 The Dynamic Segment Arbiter Election
   3.7 Request Modifications and Tear Down
   3.8 Control-Message-Forwarding within Bridges/Switches
   3.9 Dynamic Topology Changes
   3.10 Heterogeneous bridged LANs with non-LLRMP Clouds
   3.11 More Receiver Heterogeneity - The RSVP Snooping Module
4. LLRMP Relationship to RSVP
   4.1 LLRMP Invocation
   4.2 Interface
5. Functional Specification
6. References
A. Appendix
   A.1 Host and Switch Control Structures
   A.2 General Operation Overview
   A.3 Switch Functions
   A.4 Timer Values and Constants
   A.5 Example Pseudo-Code

1. Introduction

The Link Level Resource Management Protocol (LLRMP) was designed for setting up resources in local, shared-medium and bridged/switched networks. This document describes the protocol operation and the mechanisms used to achieve this.

The LLRMP can be used to dynamically reserve resources in heterogeneous bridged/switched networks composed of segments of different technologies, e.g. Ethernet, Token Ring, FDDI and 100VG-AnyLAN.
The reservation setup for those segments will only differ with regard to technology specific information which might need to be carried by the LLRMP and the actual admission control algorithm applied to control access to the link.

Whenever an application desires quality of service then it negotiates this service with the network layer management protocol e.g. RSVP [1]. RSVP then carries the request through the internetwork and initiates admission control on each of the intermediate links between source and receiver. When an intermediate link in the data path is a bridged LAN then RSVP passes the reservation request down to the LLRMP.

In general, the LLRMP performs similar operations for the link layer as carried out by RSVP or STII [2] at the network layer: it carries the request through the bridged network and initiates admission control for each of the segments within the data path between the local source and local receiver. Resources are reserved when necessary and available. Unicast and multicast flows are supported. However the LLRMP is much simpler than RSVP and can be deployed on existing switches.

The LLRMP is a new link level protocol that runs in end-hosts and on bridges. All LLRMP control messages are sent with a new link level protocol identifier (e.g. a new ethertype) and are addressed to a multicast MAC address which is to be exclusively allocated for the protocol.

The deployment of the LLRMP in bridges/switches can be performed gradually since the protocol can operate transparently through bridges and switches which do not support it. This would allow an administrator to initially control resources on bottleneck segments within the bridged LAN. Note that if the link technology supports e.g. priorities then only end-hosts which use the higher priority access mechanism need to be updated. Nodes which only send with normal priority are not affected.

The LLRMP is independent of the network layer resource management.
This allows the protocol to serve requests from different upper layer management entities e.g. RSVP, STII or a local network management system. Reservations are established and torn down dynamically. The protocol further dynamically adapts to topology changes.

Protocol state is only maintained in bridges along the data path and not in every bridge on the LAN. This relies on MAC address entries within bridges. We assume standard IEEE 802.1 learning bridges [5].

In order to forward control-messages efficiently, the LLRMP entity on a bridge needs access to the local MAC address table. However, the protocol does not perform any routing functionality. This is assumed to be carried out by other link layer mechanisms e.g. the standard learning process.

Throughout this draft, we expect the reader to be familiar with RSVP [1] and the current drafts for the Guaranteed- and the Controlled-Load service [3], [4] proposed in the Integrated Services IETF working group.

The remainder of this document is organized as follows. Section 2 discusses the reservation model and the receiver model and lists the protocol properties of the LLRMP. We then describe in section 3 the protocol operation for reservation setup, reservation refresh, request modification and tear down within full-duplex switched and shared-medium bridged LANs. Section 4 discusses the relationship to RSVP. Appendix A contains example pseudo-code for the protocol operation and the status information to be held within hosts and bridges.

2. Protocol Properties

2.1 The LAN Model

We assume a LAN architecture which may have a pure shared, pure switched, or shared and switched topology. Heterogeneous technologies are assumed to be used for different segments of the bridged/switched LAN. An example of a bridged LAN is illustrated in Figure 1.
It consists of a root-switch and several workgroup-switches connecting shared-medium workgroup-segments to the switched backbone. A router and special hosts e.g. servers are connected directly to the root-switch. Since a pure switched network topology is a special case of a shared one, we mainly discuss the operation of the LLRMP protocol in a shared environment as the more general case.

                  _____                                     RT - Router
                 | SW  |_____________________ (Root Switch) SW - Switch
                 |_____|                    |               H  - Host
                /   |   \                   |
               /    |    \                  |
              /     |     \                 |
             /    __|__    \_____         __|__
            /    | SW  |   | SW  |       | SW  |  (Workgroup Switches)
         RT      |_____|   |_____|       |_____|
                  /   \     /   \         /   \
                 /     \   /     \       /     \
               _/__   _\__ __/  _\__   __/    _\__   (Shared
               |  |   |  ||     |  |   |      |       Workgroup-Segments)
               H  H   H  HH     H  H   H      H

            Figure 1: Example bridged LAN Topology.

During the following discussion, we assume that shared-medium segments and bridges/switches support advanced packet-scheduling mechanisms e.g. priorities or have an equivalent mechanism which can enforce the desired quality of service on a per-packet basis. Bridges and hosts are further assumed to be able to classify data packets and process them according to the service assigned to the corresponding session. A simple and efficient classification at the link layer might be based on the IEEE 802.1p traffic-class and the destination address.

The actual algorithms used for packet classification and scheduling are independent of the LLRMP protocol and out of the scope of this paper. However the LLRMP needs to control the classification and scheduling process. Interactions occur e.g. when a new flow is admitted - if admission control is successful the LLRMP informs the local classifier and scheduler in order to enable the requested quality of service for the new flow.

2.2 The Reservation Model

The LLRMP performs a sender-based reservation setup. The current reservation model is one pass.
Resources are reserved independently on each segment in the data path, on a hop-by-hop basis, as the LLRMP reservation-message travels from the source to the receiver. Resources are requested at the node where the traffic enters the link.

We chose a sender-based reservation model because this ensures simplicity and good scaling. It further supports network-layer independence and preserves the ability of the LLRMP to serve different upper layer management protocols and to operate in different networking environments - not just IP. The protocol maintains a single interface to be used for requesting reservations, regardless of whether the request is for RSVP or non-RSVP flows.

Furthermore, the model can more easily support measurement based admission control schemes since the existing reservation-messages travelling from the source to the receiver(s) can also be used to carry measured traffic characteristics or particular link or node-specific parameters used at the source node. No additional signalling is required for this case.

2.3 The Receiver Model

The simplest LLRMP reservation setup implies no receiver heterogeneity at all within a multicast group. This is intended to be the minimum requirement to be supported in hosts and bridges/switches. The LLRMP protocol does not require a mechanism for receivers to signal a reservation request back to the data source since this is already provided by upper layer management protocols e.g. RSVP, and is therefore not required at link layer.

By exploiting such upper layer signalling the LLRMP can support full RSVP receiver heterogeneity for all members of the multicast group. This is performed by an optional RSVP Snooping Module on the bridge. This module observes all RSVP control traffic forwarded through the switch and collects per-flow information which is then used by the LLRMP during the reservation setup.
Details are discussed in section 3.11.

The separation between the LLRMP as a simple core protocol and the Snooping Module providing additional receiver heterogeneity allows the design of a wide range of switch solutions which can all operate with each other. Low cost switches would only support the LLRMP itself which would enable them to participate in the reservation setup. High-end switches could also support more sophisticated receiver-models up to full RSVP heterogeneity. All solutions can be incorporated in the same bridged/switched LAN. Note that the core LLRMP can be deployed on existing switches since it is simply a new link level protocol. This assumes that these switches have a processor.

2.4 Summary

The LLRMP uses the soft-state concept, as proposed in RSVP, for managing reservation state in hosts and bridges. Resources are reserved for simplex data streams. There is one important control message in the LLRMP protocol: the reservation-message. This message is used to create, modify, periodically refresh and tear down reservation state. In the absence of refreshes, the state automatically times out.

The LLRMP includes two different strategies for admitting and rejecting reservation requests. The first uses an optimistic approach which does not have an explicit local acknowledgment of reservation requests: all nodes running the protocol keep a local data base with information about existing reservations on the local network segment. After a node has successfully performed admission control for a flow, it sends an LLRMP reservation-message and immediately enables the requested service for data packets of the admitted flow. Other nodes are informed about the reservation when they receive the reservation message. Possible reservation state inconsistencies on different nodes on the segment are resolved using a distributed reject mechanism whereby a service request can be rejected by any node on the same segment.

The second strategy is pessimistic.
It uses an explicit acknowledgement-message sent by a single dedicated node on each segment in order to permit the request. The requested service at the source node is only enabled after the source node has received the acknowledgement-message. The dedicated node is called the arbiter. It performs the admission control for the local segment. The arbiter can be a bridge or a host and is automatically elected on each segment using a simple election mechanism. In contrast to the optimistic strategy, the pessimistic approach only requires the arbiter to perform the admission control. Other nodes do not have to manage a reservation data base.

Both reservation setup strategies represent a compromise between control packet overhead and probability of service violation. For shared-medium segments, we propose using the Optimistic Setup Strategy for services which do not provide hard guarantees e.g. the Controlled Load service, and the Pessimistic Strategy for a Guaranteed service. On full-duplex switched segments, the optimistic approach can be used for all types of service since the switch has complete control over the outgoing link and reservation state is not shared. A local acknowledgement-mechanism is not required in this case.

Note that the end-to-end reservation setup is optimistic. After resources are admitted on the first segment, the LLRMP will inform the upper layer e.g. RSVP that resources are available on the first hop. The admission control carried out on downstream segments within the data-path could still fail and the source might later receive a notification that resources on a particular segment or branch of the multicast distribution tree could not be allocated. There is no end-to-end acknowledgement mechanism.

The LLRMP protocol mechanisms are identical for hosts and bridges/switches.
However since bridges are connected to multiple network segments they also have to make forwarding decisions for reservation-messages. Within bridges, the LLRMP could be implemented in a similar way to other link level protocols e.g. the 802.1D Spanning Tree protocol. Within hosts, we expect the LLRMP to be implemented e.g. as part of the device driver of a LAN adapter card.

Summarizing, the protocol properties of the LLRMP are:

a) The allocation concept is distributed, there is no central allocation-manager for the entire bridged LAN. The reservation setup can follow two different setup strategies.

b) The LLRMP protocol is simple. Resources are reserved in one pass. Soft-state is installed in hosts and bridges. Reservation state is created, modified, torn-down and refreshed by the source using the LLRMP reservation-message. Synchronization mechanisms between senders and receivers are assumed to be at a higher level than the link level (e.g. performed by RSVP or ST-II).

c) The LLRMP protocol is independent of the network layer resource management mechanisms.

d) Resource management is supported for unicast and multicast data. Distinct or shared reservations can be created depending on the reservation request and the ability of bridges to do policing and rate regulation.

e) The LLRMP protocol on its own does not support heterogeneous reservation-requests for receivers of the same multicast group. This keeps the core protocol simple and ensures that the LLRMP can be implemented on existing bridges. However, if the LLRMP is used in conjunction with the RSVP Snooping Module then full RSVP receiver heterogeneity can be supported.

f) LLRMP protocol state is only maintained along the data path between source and receiver and not in every bridge in the bridged network. This relies on MAC address entries in bridges/switches and assumes standard learning bridges.
g) Reservation requests may be gathered into large control packets in bridges to reduce the overall control traffic load.

h) The LLRMP protocol mechanisms are identical for end-nodes and bridges. Bridges additionally have to make forwarding decisions since they are connected to multiple segments.

i) The LLRMP can operate in a heterogeneous bridged environment, where only a few bridges support the new protocol. Parts of the network which do not support the LLRMP are treated as a single logical segment by the LLRMP.

3. Protocol Operation and Mechanisms

3.1 The LLRMP Message Types

The LLRMP contains five control messages which are used to manage and maintain a consistent reservation state on the segment. Note that some of the message types listed below are only used for reservation state notification when admission control failed.

Resources at the link layer are reserved, refreshed, modified and torn-down using the reservation-message. If the setup uses the pessimistic strategy then any reservation request is acknowledged by the segment-arbiter. This uses an explicit acknowledgement-message.

If the optimistic setup strategy (reservations without local acknowledgements) is used then the setup is only based on the reservation-message. No acknowledgement is returned in response to a new reservation-request.

Any reservation-request that would exceed the resource limit on a local segment is rejected with a reject-message. Admission control failures from further down the data path (i.e. not on the local segment) are passed back to the source using notify-messages.

Query-messages are periodically multicast by the segment-arbiter to force all nodes on a shared-medium segment to refresh the resources they have allocated.

3.1.1 The Reservation-Message

LLRMP reservation-messages always travel downstream along the data path from the source to the receiver.
A single message may contain several reservation-requests (request). Each reservation-request contains the following elements:

request.ack_flag
   If this flag is set then the source of the request still expects an acknowledgement from the segment-arbiter for this request. This flag is only used when the reservation setup uses the Pessimistic Setup strategy (reservations with local acknowledgements).

request.flow_id
   The flow-identifier contains the session-identifier session_id and the flow's source-identifier flow_src_id. The session_id uniquely identifies the session, the flow_src_id identifies the source node and contains e.g. the MAC address of the source.

request.QoS_request
   The quality of service request QoS_request contains the reservation-request for this flow. A reservation-request is described by a set of QoS parameters e.g. bandwidth, burst-size, reservation style, minimum and maximum packet size used, etc.

request.filter_spec
   The filter specification filter_spec contains a filter e.g. a multicast MAC destination address. It is passed to the classifier at all switches along the data path. After installing the filter, switches will apply e.g. a particular output priority level to any packet that matches this filter.

3.1.2 The Acknowledgement-Message

Acknowledgements are only sent when the reservation setup uses the Pessimistic Setup strategy. They are always sent by the segment-arbiter. A single message may carry several acknowledgements (ack). Each ack acknowledges a reservation-request made for a flow on the local segment. An ack consists of the flow-originator identifier phop, the flow-identifier flow_id, the quality-of-service request QoS_request, and the filter specification filter_spec.

ack.phop
   The phop parameter identifies the flow-originator (the previous hop) on the local segment. It contains e.g. a MAC address of a bridge, host or router. The flow-originator is the node on the segment where the traffic enters the link.

ack.flow_id
   The flow-identifier flow_id is identical to the parameter carried within the reservation-request: request.flow_id.

ack.QoS_request
   The quality of service request QoS_request is identical to the parameter carried within the reservation-request: request.QoS_request.

ack.filter_spec
   The filter specification filter_spec is identical to the parameter carried within the reservation-request: request.filter_spec.

3.1.3 The Reject-Message

An LLRMP reject-message may carry several rejects (reject). Each reject rejects a reservation-request made for a flow on the local segment and causes QoS parameters to be reset at hosts and switches (in contrast to the notify message below). A reject contains the flow-originator identifier phop, the flow-identifier flow_id, a set of quality-of-service parameters QoS, and an information field info carrying information to be returned to the user. Note that the reject message is only required on shared segments (it is not required in full-duplex switched environments).

reject.phop
   The flow-originator (previous hop) phop is identical to the parameter carried within the acknowledgement-message: ack.phop.

reject.flow_id
   The flow-identifier flow_id is identical to the parameter carried within the reservation-request: request.flow_id.

reject.QoS
   The QoS field contains a set of quality-of-service parameters e.g. bandwidth, burst size, etc. It describes the resources which may be used by the rejected flow on the local segment.

reject.info
   The info field carries information to be returned to the user. It contains e.g. the reject-code, the address of the rejecting segment-arbiter, etc.

3.1.4 The Notify-Message

Notify-messages inform the data source that sufficient resources could not be reserved on a particular segment further down the data path (i.e. not on the local segment).
This applies in particular to the case where admission control for a branch of a multicast tree failed, but the rest of the tree passed. Therefore the QoS parameters set for the rest of the tree can be retained. A single message may carry several notify-entries (notify) e.g. if a switch could not reserve resources on several outgoing segments. Each notify consists of the flow-originator identifier phop, the flow-identifier flow_id, and an information field info containing information to be returned to the user.

notify.phop
   The flow-originator (previous hop) phop is identical to the parameter carried within the acknowledgement-message: ack.phop.

notify.flow_id
   The flow-identifier flow_id is identical to the parameter carried within the reservation-request: request.flow_id.

notify.info
   The info field carries information to be returned to the user. It contains e.g. the reject-code, the address of the rejecting segment-arbiter, etc.

3.1.5 The Query Message

The query-message has two functions. It first allows a new node (e.g. a bridge) to quickly learn the reservation state on a local segment because all nodes receiving a query will report their reservation state within a defined time interval TD. Secondly, query-messages are used to elect the segment-arbiter. Query-messages only carry the arbiter flag arbiter_flag.

query.arbiter_flag
   If set, the arbiter_flag indicates that the source of the query-message requests to become the arbiter on the local segment. For details about the arbiter-election process, see section 3.6.

3.2 The Reservation Setup in a Full-Duplex Switched LAN

In order to clarify the fundamental LLRMP operation, we first consider the reservation setup in a full-duplex switched LAN. An implementation that only runs in a full-duplex switched environment can be greatly simplified.
It does not have to support the segment-arbiter election process and the acknowledgement mechanism because the links between hosts and switches or between switches themselves are not shared. Each host/switch has complete control over the resources of the outgoing link. Note that this does not apply to half-duplex links.

Figure 2 illustrates the reservation setup from router RT to receiver R1. Router RT sends data packets to the multicast group g1 which was joined by receiver R1. The setup starts when the LLRMP on RT receives a reservation-request from the upper layer e.g. RSVP. After receiving the request, the LLRMP first initiates admission control for the outgoing link to switch SW1. This is illustrated in step (1) within Figure 2. If the test fails then the reservation request is immediately rejected without any signalling on the network.

If the admission control was successful, then the LLRMP sends a reservation-message to switch SW1 as shown in step (2). The LLRMP on RT also informs its local classifier about the existence of the new flow. After the filter for the new flow has been installed, all data packets matching this filter are transmitted using the specified medium access mechanism e.g. a certain priority level.

                  _(3)_        (4) -->                      RT - Router
                 | SW1 |_____________________ (Root Switch) SW - Switch
                 |_____| g1                 |               H  - Host
                /   |   \                   |
            /  /    |    \                  |
Request (2)   /     |     \                 |
      |      /    __|__    \_____         __|(5)
      | (1) /    | SW2 |   | SW3 |       | SW4 |  (Workgroup Switches)
     --> RT      |_____|   |_____|       |_____|
                  /   \     /   \     g1  /   \
                 /     \   /     \   (6) /     \
                H       H H       H     H(R1)   H   (Receiver)

    Figure 2: Example: Reservation Setup in a full-duplex switched LAN.

When switch SW1 receives the reservation-message from RT, it first creates reservation state for the new flow. All output-ports are then checked to see whether a reservation on the corresponding link must be made for this flow.
This decision is based on a MAC address entry within the forwarding table of the switch. Since group address g1 is only registered for the link to switch SW4, switch SW1 only reserves resources for that link. It initiates admission control and if this test was successful, SW1 forwards the reservation-message to switch SW4. This is shown in step (4). If however the admission control failed then SW1 only returns a notify-message to router RT reporting the admission control failure on the link to SW4. Note that if group g1 were registered for several output ports then SW1 would try to reserve resources for these ports despite the fact that the reservation-request for the link to SW4 was rejected.

If switch SW4 receives the reservation-message then it carries out the same actions as described for switch SW1. The setup finishes when receiver R1 receives the reservation-message reporting the QoS parameters e.g. the bandwidth, bridge-hop-count, etc. reserved along the data path: RT - SW1 - SW4 - R1. Note that the reservation setup in full-duplex switched LANs only uses reservation- and notify-messages.

Note further that joining an active multicast group does not necessarily trigger a new reservation-message at the source node. Instead, the reservation setup for the new multicast receiver may be initiated at the nearest branch point (switch) of the reservation tree.

3.3 The Optimistic Setup Strategy - Reservations without local ACK-Messages

This approach allows a reservation setup across shared medium segments. It is based on the same fundamental mechanisms as discussed for the setup in a full-duplex switched LAN. However, additional mechanisms are required because of the shared character of the link. These are the segment-arbiter election process and mechanisms for performing a distributed admission control for each shared-medium segment.
The latter are specific to the Optimistic Setup Strategy. The reservation setup on a single shared-medium segment is illustrated in Figure 3.

The Optimistic Setup Strategy implies that all nodes running the protocol keep a local data base with information about existing reservations on their local network segment. When the LLRMP on router RT receives the reservation request, it first performs admission control using its local reservation data base. If the test fails then the request is instantly rejected without any signalling on the network.

If the admission control was successful, then the LLRMP on router RT updates the local data base and sends a reservation-message onto the local segment. It also informs the local classifier to enable the quality of service for the new flow. This is illustrated in steps (1), (2) and (3) within Figure 3.

    Figure 3: Reservation Setup without Acknowledgement-Messages.

The reservation request message is multicast to all other LLRMP entities on the local segment using a high priority medium access mechanism, if available. The message is addressed to a well-known, LLRMP specific multicast address. After receiving the request, each node (e.g. the switches SW1, S and host H) itself does admission control for the request using its own view of the total reserved resources and, if successful, updates its local data base. This is shown in step (4).

If the admission control fails then a node assembles a reject-message and schedules it for transmission after a random delay greater than 0 and at most TD seconds. This process happens on all nodes and is illustrated in step (X). However there is one node on the segment which is allowed to send its reject immediately. This is the elected arbiter of the shared-medium segment. Whenever other nodes receive a reject from the segment-arbiter, they cancel the reject-message they have scheduled, if they have one.
This mechanism ensures that requests on a segment are usually rejected by just one node: the arbiter. However other nodes set up a reject-message in case the arbiter fails or has not received the request. After receiving a reject from the network, router RT updates its data base and reports the error to the upper layer e.g. RSVP. This finishes the reservation setup on the local segment.

The setup process is repeated if a bridge decides to forward the request onto an adjacent segment, as shown in step (5) within Figure 3. Note that bridges do not flood reservation-messages. They only forward request messages along the data path using the forwarding rules provided in section 3.8.

3.3.1 The Robustness of the Optimistic Setup Strategy

The optimistic approach can tolerate the loss of a single control message without any impact. Two or even more reservation-messages can go missing at several nodes (e.g. due to congestion) as long as there is at least one node on the segment which has received all the messages. This doesn't have to be the arbiter since all nodes are able to reject reservation requests.

However, the probability that reservation-messages go missing is very small since control messages are transmitted over a single segment using a high priority medium access mechanism, if available. In bridges and switches, control messages are passed to the processor queue and not to a (possibly congested) link output queue.

It should be emphasized that the quality of the link level service is only affected if reservation-messages go missing and over-link-limit reservations are made during inconsistent reservation state conditions. Further, it is very unlikely that network administrators will allow reservations up to the link capacity.

Inconsistencies of the reservation state are resolved by the refresh mechanism. Reservation-messages periodically update the status information on all nodes in the network.
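
The distributed admission control and reject mechanism of the Optimistic Setup Strategy can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the specification: the class SegmentNode, its method names, the use of a single bandwidth figure as the QoS request, and the value chosen for TD are all hypothetical. Each node checks a request against its own view of the segment; the arbiter rejects at once, while every other node schedules its reject after a random delay greater than 0 and at most TD, and cancels it when the arbiter's reject is seen.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional

TD = 1.0  # assumed value in seconds, chosen only for this example


@dataclass
class SegmentNode:
    link_limit: float        # reservable capacity of the segment (hypothetical unit: Mbit/s)
    is_arbiter: bool = False
    reserved: Dict[str, float] = field(default_factory=dict)           # flow_id -> bandwidth
    scheduled_rejects: Dict[str, float] = field(default_factory=dict)  # flow_id -> delay

    def on_reservation(self, flow_id: str, bandwidth: float,
                       rng: random.Random) -> Optional[float]:
        """Process a reservation-message heard on the segment.

        Returns None if the request fits this node's view of the segment,
        otherwise the delay (seconds) after which this node would send a reject.
        """
        in_use = sum(bw for fid, bw in self.reserved.items() if fid != flow_id)
        if in_use + bandwidth <= self.link_limit:
            self.reserved[flow_id] = bandwidth
            return None
        # The arbiter rejects immediately; other nodes wait a random time in (0, TD].
        delay = 0.0 if self.is_arbiter else TD * (1.0 - rng.random())
        self.scheduled_rejects[flow_id] = delay
        return delay

    def on_arbiter_reject(self, flow_id: str) -> None:
        """A reject from the arbiter cancels a pending reject and resets the flow's state."""
        self.scheduled_rejects.pop(flow_id, None)
        self.reserved.pop(flow_id, None)


rng = random.Random(0)
arbiter = SegmentNode(10.0, is_arbiter=True, reserved={"f1": 8.0})
peer = SegmentNode(10.0, reserved={"f1": 8.0})
stale = SegmentNode(10.0)  # this node missed the reservation-message for f1

delays = [node.on_reservation("f2", 5.0, rng) for node in (arbiter, peer, stale)]
for node in (peer, stale):
    node.on_arbiter_reject("f2")
```

In this run the node with the stale data base wrongly admits flow f2, and the arbiter's reject brings it back in line with the other nodes. A real implementation would drive the delays from timers and would also handle the case where the arbiter stays silent and a scheduled reject fires.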
Any crash of the arbiter is resolved by the arbiter election process discussed in section 3.6.

3.4 The Pessimistic Setup Strategy - Reservations with local ACK-Messages

The Pessimistic Setup Strategy uses an explicit acknowledgement of reservation requests to prevent any possible service violation due to inconsistent reservation state on different nodes on the shared-medium segment. After invocation, the LLRMP multicasts a reservation-message onto the local segment as illustrated in step (1) in Figure 4. Unlike the optimistic approach, the local classifier at router RT is not informed until the request has been acknowledged by the segment-arbiter.

The admission control for all reservation requests on the segment is carried out by the arbiter as shown in step (2). If the test fails then a reject message is returned to the originator of the request. After receiving this message, router RT informs the upper layer e.g. RSVP about the event.

Figure 4: Reservation Setup using local Acknowledgement-Messages.

The arbiter returns an acknowledgement message to router RT if the request passed the admission control test. Upon receipt, RT updates its local classifier which enables the quality of service requested for the new flow.

As in the previous approach, all bridges on the segment check whether the reservation request needs to be forwarded or not. Note that this test is always carried out after the request has been acknowledged on the input-segment. A bridge that is not the segment-arbiter performs this test after it has received the acknowledgement-message (or a subsequent acknowledged reservation-message).

Note that control messages are multicast. By observing control messages, other nodes can automatically learn the existing reservation state on the local segment.

3.5 The Reservation Refresh Mechanism

The reservation state on all nodes is aged out if not refreshed.
Each LLRMP node periodically refreshes the reservation information held about it at all other nodes on the segment by sending a reservation-message. All refreshes are synchronized using a simple query-mechanism. The arbiter periodically sends a query-message onto the local segment. Whenever a node receives a query, it replies with a reservation-message reporting the reservations made by itself. The reply is scheduled at a random time within an interval of length TD to avoid congestion. Each node only reports its own flows.

3.6 The Dynamic Segment Arbiter Election

The query-mechanism allows a node to quickly learn the total reservation state on the local segment. It is also used to elect the arbiter. By default, the arbiter is the node with the lowest MAC address on the segment. The election is carried out independently on each segment of the bridged LAN.

When an LLRMP node comes up, it first assumes that it is the segment-arbiter and thus queries the network by multicasting query-messages. After receiving a query with a lower MAC source address, a node will reschedule its own query-timer to be longer than the current value. If the new node is itself the new arbiter then its query-messages will update the query-timer on all other nodes; if not, then it will itself receive a query with a lower address. If the arbiter fails, the query-timer on other nodes runs out and finally another node becomes the arbiter.

Optionally a node can set the arbiter flag in its query-messages. This forces all nodes which do not have the arbiter flag set themselves to update their query-timer, even if they have a lower MAC address than the originator of the query-message. The arbiter flag allows a dedicated node e.g. a bridge or a network management station to become the arbiter of the segment regardless of its MAC address.
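The election rule above can be illustrated with a small sketch. It is not taken from the draft's appendix: the names Query, Node, yields_to and elect are hypothetical, timers are abstracted away, and the behaviour when two nodes both set the arbiter flag (lowest MAC address wins among them) is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """The fields of a query-message that matter for the election (hypothetical)."""
    src_mac: int
    arbiter_flag: bool = False


@dataclass
class Node:
    mac: int
    arbiter_flag: bool = False

    def yields_to(self, query):
        """True if this node must reschedule its query-timer on receiving `query`."""
        if query.arbiter_flag != self.arbiter_flag:
            # A flagged sender overrides an unflagged receiver, never the reverse.
            return query.arbiter_flag
        # Same flag state: the lower MAC source address wins (assumed for two flagged nodes).
        return query.src_mac < self.mac


def elect(nodes):
    """Return the node that yields to no other node's query, or None if ambiguous."""
    survivors = [
        n for n in nodes
        if not any(n.yields_to(Query(o.mac, o.arbiter_flag))
                   for o in nodes if o is not n)
    ]
    return survivors[0] if len(survivors) == 1 else None
```

For example, among nodes with MAC addresses 0x0A, 0x03 and 0x07 the node 0x03 is elected; adding a flagged management station with address 0x0F makes that station the arbiter instead.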
3.7 Request Modifications and Tear Down

LLRMP allows applications to dynamically change their reservation requests. As with new requests, the availability of new resources is first checked by admission control. The new state is distributed using the normal reservation mechanism. Applications always keep their existing reservations while the LLRMP attempts to increase them.

Reservation-messages are also used to release resources. In this case, the LLRMP simply sends a reservation-message with the new parameter set. If all resources for a flow are to be released then NULL values for all QoS-parameters are distributed with the reservation-message. This allows resources to be torn down quickly.

3.8 Control-Message-Forwarding within Bridges/Switches

The LLRMP protocol mechanisms are identical for end-nodes and bridges/switches. However, bridges are connected to several network segments and therefore have to manage reservation state for each of the connected segments. The forwarding decision for control-messages is based on matching address information found in the reservation-message against the MAC address table of the bridge. For unicast, the rules are similar to the data-packet forwarding rules used within bridges:

1. A request does not need to be forwarded to any other segment if the MAC destination address is registered for the same port from which the reservation-message was received (Leaf Rule).

2. If the bridge has a table entry registered on a port which is different from the one on which the reservation-message was received, then it only forwards the request through that port (Direct Path Rule).

3. If there is no address entry in the MAC address table then the request is forwarded to all segments, except the one from which the reservation-message was received (Flood Rule).

Similar rules apply for multicast.
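The three unicast rules listed above amount to a single lookup in the bridge's MAC address table. The following is a minimal sketch under simplifying assumptions: the function name forward_ports is hypothetical, and the address table is modelled as a plain mapping from destination MAC address to port number.

```python
def forward_ports(dest_mac, in_port, mac_table, all_ports):
    """Return the set of ports a unicast reservation-message is forwarded through.

    mac_table maps a MAC address to the port on which it was learned;
    all_ports is the collection of ports on the bridge.
    """
    out_port = mac_table.get(dest_mac)
    if out_port is None:
        # Flood Rule: unknown destination, use every port except the input port.
        return set(all_ports) - {in_port}
    if out_port == in_port:
        # Leaf Rule: the destination is on the segment the message came from.
        return set()
    # Direct Path Rule: forward only through the registered port.
    return {out_port}
```

With a table of {"aa": 1, "bb": 2} on a three-port bridge, a request for "bb" arriving on port 1 is forwarded through port 2 only, a request for "aa" arriving on port 1 is not forwarded, and a request for an unknown address arriving on port 1 goes out on ports 2 and 3.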
However, optional input can be provided by the RSVP Snooping Module, if such a module exists on the switch. We assume a multicast routing mechanism as currently being standardized in IEEE 802.1p:

1. A reservation request does not need to be forwarded through a particular output-port if a group entry was found in the address table but the group is not registered for that output-port.

2. If the group is registered for an output-port then the reservation request is forwarded. However, this decision can be overruled by information from the Snooping Module on the bridge reporting that no quality of service was requested for the flow on this output-port. Note that the reservation request forwarded through any output-port will always be equal to or lower than the request received from the input-port.

3. If there is no address entry in the address-table then the request is forwarded to all segments, except the one from which the reservation-message was received. As in the previous rule, the decision for each output-port can be overruled by information received from an existing Snooping Module.

The forwarding process ensures that reservation-messages are only forwarded along the data path between the sender and the receiver(s). This relies on MAC address entries within the address table of bridges/switches. For reservation requests from RSVP and ST-II, these entries are likely to be there because:

(1) network layer control messages are periodically exchanged. These are PATH and RESV messages in RSVP and HELLO messages in ST-II.

(2) applications with stringent service requirements are likely to be interactive, and will thus have a return control-channel and/or a return data-path.

The LLRMP still works when these address table entries are missing. In this case, reservation messages are forwarded to all segments and will reserve the requested resources on the entire LAN.
This is equivalent to a network layer based allocation policy, where the link level structure of the network remains hidden. Eventually, however, data packets will flow, causing entries to be made in the MAC address tables of bridges and old reservations to time out on unused segments.

3.9 Dynamic Topology Changes

The LLRMP automatically recovers from dynamic topology changes using the reservation refresh mechanism. Whenever the bridged topology changes, reservation refreshes are forwarded along the new data path and will immediately reserve any missing resources. Old reservation state in bridges will time out since it is not refreshed.

3.10 Heterogeneous bridged LANs with non-LLRMP Clouds

The LLRMP can operate in a heterogeneous environment where only parts of the bridged LAN support the new protocol. Existing bridges and switches will flood LLRMP reservation messages since control messages are sent with a new link level protocol identifier (e.g. a new ethertype) and are addressed to an exclusively allocated multicast address. Non-LLRMP clouds between two LLRMP nodes or bridges/switches are treated as one single logical segment for allocating resources.

3.11 More Receiver Heterogeneity - The RSVP Snooping Module

The RSVP Snooping Module is an optional optimization for supporting RSVP receiver heterogeneity. The module sits on the switch and observes all RSVP control traffic forwarded through the switch. It does not modify any RSVP messages during forwarding (although it might drop duplicate messages), and it also does not require any modifications to RSVP e.g. new frame formats, objects etc.

While observing RSVP control traffic, the Snooping Module collects per-session, per-flow, or even per-receiver information. This information is used by the LLRMP when it determines the quality of service to be reserved for a particular session, flow or receiver on a particular output-port.
The level of RSVP receiver heterogeneity supported depends on the amount of information collected. For each flow using e.g. a Controlled Load service, the module maintains a data base entry similar to:

   | flow_id | service_id = Controlled_Load | list_of_input_ports | timer |

where list_of_input_ports is a per-port bit-mask containing a flag for each switch-port. Each bit indicates whether an RSVP RESV message for flow flow_id was received from this port during the last timer interval. The parameter timer contains a simple timer which is used to let the entry time out if not refreshed. It is also a bit mask.

More heterogeneity can be supported if the module collects per-receiver information. A data base entry for a guaranteed service supporting full receiver heterogeneity could look like:

   | flow_id | service_id = Guaranteed | receiver_id | QoSrequest | input_port | timer |

It is clear that the amount of information to be maintained represents a compromise between costs and the level of receiver heterogeneity supported.

4. LLRMP Relationship to RSVP

RSVP is the reservation setup protocol designed for an Integrated Services Internet. Applications use the protocol to negotiate a specific quality of service with the network. After receiving a reservation request, RSVP carries the request through the internetwork and initiates resource-reservation on each of the intermediate links along the data path.

During the RSVP reservation setup, a PATH message travels downstream from the source towards the receiver(s). It contains the traffic characteristic (TSpec) describing the traffic generated at the data source. RSVP RESV messages travel upstream from the receiver(s) towards the source. They are used by receiver(s) for signalling their service request (FlowSpec).

4.1 LLRMP Invocation

LLRMP reservation requests are made at the node where the traffic enters the bridged LAN.
This node could be a host, a router or a gateway. RSVP makes the reservation request for a flow to the LLRMP after it has received a PATH and a RESV message for this flow, because these provide the information (TSpec, FlowSpec) required for the request. This is illustrated in Figure 5.

Figure 5: LLRMP Invocation from RSVP.

4.2 Interface

The interface between RSVP and LLRMP is for further study.

5. Functional Specification

The exact message formats will be defined in a future draft.

6. References

[1] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, Resource ReSerVation Protocol (RSVP) - Version 1 Functional Specification, Internet-Draft draft-ietf-rsvp-spec-13.ps, August 1996.

[2] L. Delgrossi, L. Berger, Internet Stream Protocol Version 2 (ST2), Protocol Specification - Version ST2+, RFC 1819, August 1995.

[3] S. Shenker, C. Partridge, R. Guerin, Specification of Guaranteed Quality of Service, Internet-Draft draft-ietf-intserv-guaranteed-svc-06.txt, August 1996.

[4] J. Wroclawski, Editor, Specification of the Controlled-Load Network Element Service, Internet-Draft draft-ietf-intserv-ctrl-load-svc-03.txt, August 1996.

[5] IEEE 802.1D, International Standard, Information technology - Telecommunications and information exchange between systems - Local area networks - Media access control (MAC) bridges, ISO/IEC 10038:1993.

Author's Address

Peter Kim
Hewlett Packard Laboratories Bristol
Filton Road, Stoke Gifford
Bristol BS12 6QZ. U.K.
pk@hplb.hpl.hp.com
+44 117 922 8357

A. Appendix

This section contains example pseudo-code for the LLRMP protocol operation and the status information to be held within hosts and bridges. It is intended to provide a detailed overview of the algorithm. However, real implementations may use additional or different data structures in order to e.g. optimize status-lookups and management.
The pseudo-code uses the packet-formats described in section 3.1. Protocol details like the format verification, some error-cases and the details of the hashing mechanisms were omitted. Several possible optimizations and optional mechanisms were also left out. For the sake of clarity, we further included a summarizing overview for each algorithm. NOTE that the appendix is only available within the postscript version of this draft (draft-kim-llrmp-01.ps).