INTERNET-DRAFT                                               Marc Greis
November 1998                                           Markus Albrecht
                                            University of Bonn, Germany


        Aggregation of Internet Integrated Services State using
                   Parameter-based Admission Control

               draft-greis-aggregation-with-pbac-00.txt


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Aggregation has been proposed as one possible solution to the
scalability problem of the Internet Integrated Services. The current
suggestions for aggregation are based on measurement-based admission
control, which allows for the omission of RSVP soft state in the
interior routers of an aggregating domain.

However, measurement-based admission control has certain flaws which
may lead to over-reservations on links in the network under certain
conditions. This can result in packet losses for reserved traffic.
Hence, we believe that it will be necessary to discuss the
possibility of using parameter-based admission control with
aggregation. In this document, we present a technique for using
parameter-based admission control with aggregation as a basis for
further discussions, and we evaluate possible advantages and
disadvantages.

Greis, Albrecht              Expires 5/99                      [Page 1]

INTERNET-DRAFT  draft-greis-aggregation-with-pbac-00.txt  November 1998

1. Introduction

It has been stated in the RSVP Applicability Statement [4] that RSVP
as defined in [3] has a scalability problem, since per-flow state has
to be maintained in each node of the network for each RSVP-supported
flow traversing the node. One of the most promising solutions for the
problem seems to be the aggregation of RSVP-supported flows.

The basic idea is to let the ingress nodes for a flow decide if a
flow can be admitted when a RESV message arrives. In [1], messages
are sent from the ingress to the egress to determine if the interior
nodes in the aggregating domain can admit the new flow. The interior
nodes do not keep per-flow state for admission control; they simply
measure the amount of reserved traffic to determine if a new flow can
be admitted.

However, this kind of admission control may fail under certain
circumstances. Admission control failure can lead to packet losses,
which makes the results of performing reservations with RSVP less
predictable. There are two kinds of traffic which may cause
measurement-based admission control (MBAC, [2]) to fail:

- Very bursty traffic, such as video or audio traffic (especially
  audio traffic from audio conferences which may be idle for long
  periods).

- Traffic from sessions where resources are reserved a long time
  before data is actually sent.

In both cases, the future behavior of the traffic sources cannot be
predicted from past measurements. The second case seems to be less
likely, as in a real environment it would be a waste of money for a
user to reserve resources for a session a long time in advance, and
possible solutions for performing advance reservations have been
proposed.
Still, a user cannot be kept from reserving resources a long time
before they are used, and in fact, it is possible that users who want
to disrupt a provider's RSVP services may exploit this possibility.

We believe that it is necessary to consider and discuss the
possibility of using parameter-based admission control (PBAC) with
aggregation. In this document, we present a scheme based on [1] for
using PBAC with aggregation, thus enhancing the reliability of flow
aggregation at the cost of a somewhat greater overhead. However, we
will evaluate the overhead based on comparisons with 'standard' RSVP
and with MBAC-based aggregation, and we will show that the additional
overhead may be acceptable, especially in smaller domains.

The rest of this document is structured as follows: In section 2, we
present our basic idea together with an example scenario, in section
3 we describe the necessary additions to the RSVP protocol, in
section 4 we describe our technique in more detail, and in section 5
we evaluate the additional overhead necessary for using our technique
with aggregation.

2. The Basic Technique

The main idea behind the scheme we propose is that each router in an
aggregating domain maintains a table with the amount of aggregate
bandwidth reserved on the path to each edge router which can be
reached from this router. The information in these tables will be
gathered from the ADREQ messages that are sent by the ingress routers
of an aggregating domain to request admission control information
from the interior routers (as proposed in [1]).

However, there are several possible problems with this approach:

- A reservation request from an ADREQ message that is accepted by an
  interior router may still be rejected by downstream routers.

- ADREQ messages would inform the interior routers only about
  reservation requests, but not about reservation teardown.

- ADREQ messages may be lost on their way from the ingress to the
  egress, in which case they will be resent later. This may cause
  over-reservations in interior routers which receive the same ADREQ
  message twice, and use it twice to update their admission control
  information.

One obvious solution to the second problem may be to let interior
routers process ResvTear messages. This would create several new
problems though, as it is possible that ResvTear messages may be
lost. It is also one of the advantages of the scheme proposed in [1]
that interior routers in an aggregating domain do not have to process
any RSVP messages except ADREQ messages.

To solve all three problems mentioned above, we propose a new message
type called ADSTAT (=ADmission control STATus) which would be sent at
certain intervals (e.g. every 30 seconds) from each router in the
aggregating domain to each adjacent router to update the admission
control information.

We also propose that admission control status information is sent
with ADREQ messages. This information would contain the amount of
aggregate bandwidth reserved on the path from the router which sent
the ADREQ message to the corresponding egress router for this
message. It would be updated by each interior router with the
aggregate admission control information for the egress router as the
message passes through the aggregating domain.

The admission control status information in the interior router can
be seen as a 'per-edge-router state' (as opposed to the 'per-flow
state' in RSVP), which grows with the number of edge routers in an
aggregating domain. This limits the scheme to 'small' domains;
possible values for the number of edge routers will be discussed in
section 5.

It should be noted that the admission control state in the routers
has to be a soft state.
If it is not refreshed by ADSTAT or ADREQ messages with status
information, it will expire. This is necessary to avoid the
'survival' of out-dated status information after routing changes. It
will be left to future research and discussions to determine how long
an admission control status should be kept before it expires.

It will be necessary to avoid sending redundant status information to
save bandwidth. Status information should only be sent when it
differs from the last status information that was sent with an ADSTAT
or ADREQ message. It should be kept in mind though that the admission
control status in the interior routers needs to be refreshed
periodically. But sending hundreds of ADREQ messages with exactly the
same status information within a few seconds should be avoided. This
could happen if there is no more free bandwidth on an interface of an
interior router, so that new reservations are rejected and the
admission control status does not change, but ADREQ messages may
still pass through.

It is also possible to only send status information with ADREQ
messages periodically (as with ADSTAT messages). It should be kept in
mind though that this makes the technique more 'conservative', as the
information in interior routers may be out-dated, and they may reject
new reservations based on this old information. Further research will
be necessary to determine useful values for this and other possible
parameters.

Other issues which will also have to be considered in the future are
multicast sessions where reservations from different branches of the
multicast tree can be merged, and shared reservation styles. It may
be necessary to use common RSVP in these cases or to use aggregation
with MBAC.

2.1. A Simple Example

The example in this section will be used to illustrate the technique
we propose. Figure 1 shows a sample domain with 5 routers. The
routers R1, R4 and R5 are edge routers, R2 and R3 are interior
routers.
An example for the admission control status information in the
routers is shown in figure 2. Integer numbers were chosen to
represent resources. For example, router R1 has reserved an amount of
11 'resource units' towards edge router R4 and 21 towards edge router
R5. No other entries are necessary for this router, as no other edge
routers are present in the domain, and all traffic from R1 going into
the domain will leave the domain either through R4 or R5.

       |                                   |
   --[R1]--------(R2)--------(R3)--------[R4]--
       |           |                       |
                   |
               --[R5]--
                   |

                     Figure 1: A Sample Domain

                  __________            _______
                 |    R2    |          |  R3   |
  ____           |-+--+--+--|          |-+--+--|           ____
 | R1 |          | | a| b| c|          | | a| b|          | R4 |
 |-+--|       (a)|-|--|--|--|(c)    (a)|-|--|--|(b)       |-+--|
 |4|11|----------|1| /|17|31|----------|1| /|31|----------|1|31|
 |5|21|          |4|11|14| /|          |4|25| /|          |5|22|
 |_|__|          |5|21| /|22|          |5| /|22|          |_|__|
                 |_|__|__|__|          |_|__|__|
                      |(b)
                     _|__
                    | R5 |
                    |-+--|
                    |1|17|
                    |4|14|
                    |_|__|

   Figure 2: The Admission Control Tables for the Example Topology

The admission control status for R2 is more complicated. It is not
only necessary to store the amount of reserved bandwidth on the path
towards an edge router, but also through which interface this
information was received. For example, R2 received the information
from edge router R1 about the reservations for edge routers R4 and R5
through interface (a).

To determine how much bandwidth is actually reserved on an outgoing
interface, the sum of all entries in the table for all edge routers
the interface routes to has to be calculated. For example, interface
(c) on R2 routes only to one edge router, to R4. Hence, the amount of
reserved traffic on interface (c) can be calculated as the sum of all
entries for R4, in this case 25 (11 from interface (a) and 14 from
interface (b)).
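
The calculation for R2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the names R2_TABLE, R2_ROUTES and reserved_on_interface are hypothetical, the routing of interfaces (a) and (b) is read off figure 1, and a plain integer sum stands in for the service-specific way of combining reservations.

```python
# Admission control table of interior router R2, taken from figure 2.
# Outer key: edge router; inner key: interface the status was received on.
R2_TABLE = {
    "R1": {"b": 17, "c": 31},
    "R4": {"a": 11, "b": 14},
    "R5": {"a": 21, "c": 22},
}

# Edge routers each interface routes to (assumed to come from routing
# lookups; the entries for (a) and (b) are read off figure 1).
R2_ROUTES = {"a": ["R1"], "b": ["R5"], "c": ["R4"]}


def reserved_on_interface(table, routes, interface):
    """Sum all table entries for every edge router the interface routes to."""
    return sum(
        sum(table.get(edge, {}).values())
        for edge in routes.get(interface, [])
    )


print(reserved_on_interface(R2_TABLE, R2_ROUTES, "c"))  # 25 (11 + 14)
```

Keeping one entry per edge router and incoming interface is what allows a later status update from one neighbor to replace only that neighbor's contribution to the sum.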

The method used to calculate the sum is usually service-specific
(e.g. for Controlled-Load or Guaranteed Service) and should be
described in the documents defining the service. It will be left to
the underlying routing protocol and to periodic routing lookups to
determine which interface routes to which edge router.

When R2 sends an ADSTAT message to R3, it will only include the
information for the edge routers which interface (c) (the outgoing
interface for the ADSTAT message) routes to. In this case it is only
the 25 for R4, while R3 sends a 31 for R1 and a 22 for R5 to R2.

In this example, only one service class was used. It is possible to
use the scheme proposed here with several service classes. It will
simply be necessary to keep separate admission control status
information for each service class and to send separate ADSTAT
messages.

3. RSVP Extensions

The most important question, which has to be answered before a format
for ADSTAT messages and for the necessary extensions to the ADREQ
messages can be specified, is how the bandwidth on the path to an
edge router can be represented. This information would usually be
service-specific, so it would be useful to use the FLOWSPEC object as
defined for Controlled-Load and Guaranteed Service (in [5]). It may
be argued that some of the information in the FLOWSPEC objects may
not be necessary for this purpose and that a new smaller object
should be defined to save bandwidth, but using the FLOWSPEC object
seems to allow for the highest flexibility.

The format for ADSTAT messages is as follows:

   ::= ::= |

Each RSVP_HOP/FLOWSPEC pair describes the bandwidth reserved on the
path to an edge router. The RSVP_HOP object contains the address of
the edge router which the pair corresponds to. An RSVP object
RSVP_EDGE which would only contain the address of an edge router may
be defined to replace the RSVP_HOP object here, but for now this
seems like an unnecessary redundancy.

An ADREQ message carrying admission control status information would
only contain one such RSVP_HOP/FLOWSPEC pair, or in fact, the
RSVP_HOP object can be omitted, since the edge router the pair would
correspond to is already known: it is the egress router the ADREQ
message is being sent to.

However, adding the status information to the ADREQ messages may
cause confusion, as they already contain a FLOWSPEC object. The
FLOWSPEC object containing the admission control status information
should always be the second FLOWSPEC object in an ADREQ message. This
means that interior routers can determine if an ADREQ message carries
admission control status information simply by checking if it
contains a second FLOWSPEC object.

4. Detailed Algorithms

There are three important events which can occur at an interior
router in an aggregating domain with PBAC:

- An ADSTAT message has to be sent
- An ADSTAT message is received from an adjacent router
- An ADREQ message is received from an adjacent router

In this section, we will describe the actions to be taken when these
events occur in more detail as a basis for possible implementations.

Send ADSTAT message:

- For each interface on the router sending the message:
  - Create a new ADSTAT message
  - For each edge router the interface routes to:
    - Add an RSVP_HOP object with the edge router's address to the
      ADSTAT message
    - Calculate the sum of all reservations to this edge router
    - Add a FLOWSPEC object containing the calculated reservation to
      the message
  - Send the ADSTAT message
- Set a timer to send a new ADSTAT message after a certain time

As will be discussed in the next section, it should be kept in mind
that an ADSTAT message does not have to contain admission control
information for all edge routers. It is possible to modify the above
algorithm so that information for a certain subset of the edge
routers is sent.
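
As a basis for discussion, the 'Send ADSTAT message' steps might look as follows in Python. This is a minimal sketch, not a wire format: the class AdstatMessage and the function build_adstat_messages are hypothetical names, plain (address, integer) tuples stand in for the RSVP_HOP/FLOWSPEC pairs, the routing information for R3 is assumed from figure 1, and the actual transmission and the resend timer are left out.

```python
from dataclasses import dataclass, field


@dataclass
class AdstatMessage:
    """Hypothetical in-memory form of an ADSTAT message (not a wire format)."""
    out_interface: str
    # (edge router address, aggregate reservation) tuples, standing in
    # for the RSVP_HOP/FLOWSPEC pairs.
    pairs: list = field(default_factory=list)


def build_adstat_messages(table, routes):
    """Create one ADSTAT message per interface of the sending router."""
    messages = []
    for interface, edge_routers in routes.items():
        msg = AdstatMessage(out_interface=interface)
        for edge in edge_routers:
            # Sum of all reservations towards this edge router; a plain
            # integer sum is an assumption, the real rule is service-specific.
            total = sum(table.get(edge, {}).values())
            msg.pairs.append((edge, total))
        messages.append(msg)
    return messages


# Table of interior router R3 from figure 2: edge router -> {interface: units}.
R3_TABLE = {"R1": {"b": 31}, "R4": {"a": 25}, "R5": {"b": 22}}
# Assumed routing: (a) leads towards R1 and R5 via R2, (b) leads to R4.
R3_ROUTES = {"a": ["R1", "R5"], "b": ["R4"]}

for message in build_adstat_messages(R3_TABLE, R3_ROUTES):
    print(message.out_interface, message.pairs)
```

For the R3 table of figure 2, the message built for interface (a) carries 31 for R1 and 22 for R5, which matches the values R3 sends to R2 in the example of section 2.1. Sending only a subset of the edge routers would amount to filtering edge_routers before the inner loop.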

ADSTAT message received on the incoming interface (i):

- For each RSVP_HOP/FLOWSPEC pair in the message:
  - Is there already an entry for the edge router with the address
    stored in the RSVP_HOP object?
    - No: Create a new entry for this edge router
  - Modify the entry for this edge router: Replace the old entry for
    (i) with the data from the FLOWSPEC object
  - Reset the expiration timer for the modified entry

It can be seen from this (and the next) algorithm how the admission
control status tables are built dynamically from the received ADSTAT
and ADREQ messages. It will not be necessary to configure the
interior routers with the addresses of all edge routers in advance.

ADREQ message received on an interior router on the incoming
interface (i):

- Determine which edge router the message is being sent to and store
  the address in e_addr
- Is there already an admission control status entry for e_addr?
  - No: Create a new entry for this edge router
- Reset the expiration timer for the admission control status entry
  for e_addr and (i)
- Does the ADREQ message contain admission control status
  information?
  - Yes: Replace the old admission control status information for
    e_addr and (i) with the information from the ADREQ message and
    remove the information from the message
- Calculate the sum of all reservations for e_addr and store it in
  the FLOWSPEC object old_status
- Determine the outgoing interface (o) for the ADREQ message
- Calculate the sum of all reservations for all edge routers (o)
  routes to and store it in the FLOWSPEC object res_sum
- Use res_sum to decide if the flow which corresponds to the ADREQ
  message can be admitted.
- Was admission control successful?
  - Yes: Modify the admission control status information for e_addr
    and (i) by adding the FLOWSPEC from the ADREQ message
- Determine if admission control status information should be sent
  with the forwarded ADREQ message
  - If yes: Add the FLOWSPEC object old_status to the message
- Forward the modified ADREQ message towards e_addr

In this algorithm the expiration timer for e_addr is always reset,
even if the ADREQ message does not contain admission control status
information, because the fact that an ADREQ message for e_addr was
received shows that the router is still on the route to e_addr, while
the purpose of the expiration timer is to let out-dated admission
control status expire after routing changes.

The decision whether admission control status information should be
sent with an ADREQ message is based on a set of rules as mentioned in
section 2. The only case when status information HAS to be sent with
an ADREQ message is when an edge router sends an ADREQ message for
the same session twice (i.e. when the first was probably lost, since
no corresponding ADREP message was received). Otherwise, the
bandwidth for the same reservation might be added to the same entry
twice. This also means that it will probably be necessary for
interior routers to send status information with forwarded ADREQ
messages when the ADREQ message that was received contained status
information, or else important information would be lost.

5. Evaluation

The overhead necessary for deploying Integrated Services in a network
can be split up in three categories:

- Classifier and scheduler state
- Setup protocol state
- Setup protocol messages

The state in the classifier and scheduler is the most important
potential problem, since each packet has to pass through these two
elements.
The first and foremost goal of all aggregation schemes would be to
reduce the size of the classifier and scheduler state.

The size of the setup protocol state is less important, but can still
consume a huge amount of memory with lists that have to be
maintained, and CPU time which is needed to maintain these lists.

The overhead created by setup protocol messages can be a problem in
two ways: The routers have to create and send outgoing messages and
they have to process and forward incoming messages, both of which
mean additional CPU load for the router. But the messages may also
consume a considerable amount of bandwidth if they are too big, or if
they are sent too often.

In table 1, we evaluate and compare the classifier/scheduler state
size, the setup protocol state size and the message overhead for
'classic' RSVP, for aggregation with measurement-based admission
control and for aggregation with parameter-based admission control.

It has to be kept in mind that one important factor does not appear
in table 1: The possible additional overhead for measuring the amount
of traffic for each service class when using aggregation with MBAC.
More experience with MBAC algorithms will be needed to determine the
importance of this problem, as it will depend to a large extent on
the router's capability to perform such measurements.

It can be seen from table 1 that for the classifier and scheduler
state, aggregation with PBAC has the same advantages over RSVP as
aggregation with MBAC, which means that RSVP's most central
scalability problem is still solved when aggregation with PBAC is
used.
The setup protocol state size for PBAC aggregation can be seen to be
'between' the protocol state size for RSVP (where the protocol state
can become very large in extreme cases) and for MBAC aggregation
(where no protocol state is kept at all on interior routers) for
'small' domains, that means in domains where the result of #I*#E*#S
is significantly smaller than the highest expected number of flows.
The entries in the admission control status table would not be too
big, so it may be possible to maintain admission control status for
1000 or more edge routers.

           | RSVP             | Aggr. with MBAC  | Aggr. with PBAC  |
-----------+------------------+------------------+------------------|
Classifier/|Per-Flow          |Fixed state. Size |Fixed state. Size |
Scheduler  |Limited only by   |based on the      |based on the      |
State Size |Si/r (*)          |number of service |number of service |
           |                  |classes.          |classes.          |
-----------+------------------+------------------+------------------|
           |Per-Flow          |Full RSVP state on|Full RSVP state on|
           |                  |edge routers, no  |edge routers, no  |
           |                  |state on interior |RSVP state on     |
Protocol   |                  |routers.          |interior routers. |
State Size |                  |                  |Admission control |
           |                  |                  |state on all      |
           |                  |                  |routers. Limited  |
           |                  |                  |by #I*#E*#S (*)   |
-----------+------------------+------------------+------------------|
           |RSVP messages have|New messages:     |New messages:     |
           |to be sent and    |ADREQ and ADREP.  |ADREQ, ADREP and  |
Message    |processed by all  |RSVP messages are |ADSTAT. RSVP      |
Overhead   |routers for all   |still sent, but   |messages are still|
           |flows.            |interior routers  |sent, but interior|
           |                  |only send and     |routers only send |
           |                  |process ADREQ     |and process ADREQ |
           |                  |messages.         |and ADSTAT.       |
---------------------------------------------------------------------

   Table 1: A Comparison of the Overhead for RSVP and Aggregation

(*) Explanation of the symbols used in table 1:

   Si - The sum of the reservable bandwidth on all interfaces
   r  - The smallest possible reservation
   #I - The number of interfaces on a router
   #E - The number of edge routers in a domain
   #S - The number of service classes

The biggest limitation for aggregation with PBAC is the message
overhead. The ADSTAT messages can grow fairly big. The FLOWSPEC
object for Controlled-Load Service is 72 bytes long, an RSVP_HOP
object for IPv4 is 12 bytes long. That means that the admission
control status information in an ADSTAT message for 100 edge routers
would be 8400 bytes long. However, ADSTAT messages could be split up.
A router does not have to send the information for all edge routers,
but it could send the information in small portions, which would
require only a small modification to the algorithm presented in the
last section.

We believe that the message overhead created by aggregation with
parameter-based admission control would be acceptable for domains
with a few hundred edge routers. Future research based on network
simulations will be necessary to make more exact statements.

6. Security Considerations

Security considerations have not been addressed in [1], and they will
also not be addressed in this draft. However, it is important to
understand that it will be necessary to develop security mechanisms
in the future to protect the network especially from corrupted or
spoofed ADSTAT messages.

7. Conclusion

We have described both the basic idea and several details of a
possible scheme for using parameter-based admission control with
aggregation for RSVP.
In our opinion this adds flexibility to aggregation, especially in
smaller domains where the additional overhead is smaller due to a
small number of edge routers. At the cost of a somewhat higher
overhead (depending on the number of edge routers) as compared to
aggregation with MBAC, our scheme gives reservations in aggregating
domains the reliability which they would otherwise only receive with
RSVP and full per-flow state on all routers.

Our technique is not meant to replace aggregation techniques with
MBAC, but to allow for greater flexibility. In fact, MBAC and PBAC
could coexist in aggregating regions. MBAC could be used for general
predictable traffic, while PBAC can be used for traffic with
characteristics which are likely to 'disturb' MBAC, like bursty audio
or video traffic.

Future research and discussions are needed to resolve open issues,
especially how multicast sessions and shared reservation styles may
fit into our scheme.

8. References

[1] Berson, S., Vincent, S., "Aggregation of Internet Integrated
    Services State", (draft-berson-rsvp-aggregation-00), Internet
    Draft (work in progress), August 1998

[2] Jamin, S., Shenker, S., Danzig, P., "Comparison of
    Measurement-based Admission Control Algorithms for Controlled
    Load Service", Infocom 1997

[3] Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.,
    "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional
    Specification", RFC (Request for Comments) 2205, September 1997

[4] Mankin, A. et al, "Resource ReSerVation Protocol (RSVP) Version 1
    Applicability Statement - Some Guidelines on Deployment", RFC
    (Request for Comments) 2208, September 1997

[5] Wroclawski, J., "The Use of RSVP with IETF Integrated Services",
    RFC (Request for Comments) 2210, September 1997

Authors' Addresses

Marc Greis
University of Bonn
Institute of Computer Science IV
Roemerstr. 164
53117 Bonn
Germany

Email: greis@cs.uni-bonn.de

Markus Albrecht
University of Bonn
Institute of Computer Science IV
Roemerstr. 164
53117 Bonn
Germany

Email: sukram@cs.uni-bonn.de