Internet Draft                                        Robert Hancock
                                                       Eleanor Hepworth
                                                        Andrew McDonald
                                            Siemens/Roke Manor Research
   Document: draft-hancock-nsis-overload-00.txt 
   Expires: December 2003                                     June 2003
    
    
          Handling Overload Conditions in the NSIS Protocol Suite 
                                        
Status of this Memo 
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 [1].  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that      
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html. 
 
Abstract 
 
   The NSIS working group is considering protocols for signaling for 
   resources for a traffic flow along its path in the network. The 
   requirements for such signaling are being developed in [2] and a 
   framework in [3]. The framework describes a 2-layer protocol 
   architecture, with a common lower NSIS 'transport' layer protocol 
   (NTLP) supporting a variety of upper layer NSIS signaling layer 
   protocols (NSLPs). 
    
   It is an open issue where within this architecture to place the 
   responsibility for handling overload conditions. These conditions 
   relate both to overload of the IP layer itself and to overload of 
   buffer/processing resources within the NTLP/NSLPs. This note 
   discusses the requirements and the implications of various 
   approaches, and proposes a way forward. 
    
 
Hancock et al.         Expires - December 2003                [Page 1] 
                       NSIS: Overload Handling               June 2003   
 
Table of Contents 
    
   1. Introduction, Scope and Terminology............................2 
     1.1 Terminology; Flow and Congestion Control ...................3 
   2. Requirements...................................................3 
   3. Implications of Doing Overload Handling within NSIS Protocols..5 
   4. RSVP and Other Protocol Work...................................5 
   5. Handling IP Overload ("Congestion Control")....................6 
   6. Handling NSIS Protocol Overload................................7 
   7. Security Considerations........................................9 
   8. Conclusions....................................................9 
   Acknowledgments..................................................10 
   Authors' Addresses...............................................11 
   Full Copyright Statement.........................................11 
    
 
1. Introduction, Scope and Terminology 

   The NSIS working group is considering protocols for signaling for 
   resources for a traffic flow along its path in the network. The 
   requirements for such signaling are being developed in [2] and a 
   framework in [3]. The framework describes a 2-layer protocol 
   architecture, with a common lower NSIS 'transport' layer protocol 
   (NTLP) supporting a variety of upper layer NSIS signaling layer 
   protocols (NSLPs). 
    
   It is an open issue where within this architecture to place the 
   responsibility for handling overload conditions; 'handling' includes 
   detection as well as prevention and recovery. These conditions relate 
   both to overload of the network (IP) layer itself and to overload of 
   buffer/processing resources within the NTLP/NSLPs. This note 
   discusses the requirements and the implications of various 
   approaches, and proposes a way forward. 
    
   These issues have been intermittently discussed on the NSIS mailing 
   list [4], and noted in some of the design-related drafts [5, 6, 7]. 
   [8] provides authoritative guidance specifically on how the problem 
   of congestion should be approached within Internet protocol 
   standards, and includes many important references. 
    
   Note that this draft is not about resource signaling to manage 
   congestion within the network when it actually occurs - for example, 
   traffic engineering to route data flows around congested network 
   areas. That is an important subject, but it concerns how resource 
   management should be done rather than how signaling protocols should 
   work. This draft does, however, discuss how to prevent signaling 
   protocols from adding to the network congestion problem. 
 
 
   After classifying the various types of signaling overload in section 
   1.1, section 2 describes the potential causes of overload and the 
   (proposed) requirements for how they should be dealt with. Section 3 
   describes the basic implications for protocol design and 
   implementation if they provide overload handling, and section 4 
   briefly mentions how some other protocols related to network 
   operation handle the problem. Section 5 discusses how to handle 
   network (IP layer) overload, and section 6 discusses overload within 
   the NSIS protocol suite itself. Security aspects are briefly 
   mentioned in section 7, and section 8 concludes. 
    
1.1 Terminology; Flow and Congestion Control 

   Unless otherwise stated, this document follows the terminology given 
   in the current NSIS framework [3]. 
    
   The overload problem is actually (at least) three problems: 
   a) Overload in the IP layer, i.e. buffer congestion which causes IP 
   packets to be dropped (affecting all flows, for signaling, data and 
   other applications). 
   b) Overload in the NTLP, meaning it cannot process incoming or 
   outgoing packets fast enough. This might be caused by processor 
   overload or by lower (IP) level congestion. It affects all NSIS 
   signaling applications, but not the rest of the network - assuming 
   (a) is already handled. 
   c) Overload in an NSLP, meaning it cannot process incoming or 
   outgoing packets fast enough. This might be caused by processor 
   overload or by lower (NTLP/IP) level congestion. It affects only this 
   signaling application - assuming that (a) and (b) are already 
   handled. 
    
   Traditionally, networking discussions draw a distinction between 
   congestion control - protecting the infrastructure - and flow control 
   - protecting the end systems. Making this distinction is somewhat 
   subtle in the NSIS case, since the infrastructure includes end 
   systems. For example, overload within the NTLP could be prevented by 
   NTLP-level flow control; however, it would still be seen as 
   equivalent to network congestion by NSLPs, and be invisible to the IP 
   layer (as congestion or anything else). Therefore we work in terms of 
   the more concrete concept of overload within particular protocol 
   layers. No doubt even finer distinctions could be drawn. 
    
2. Requirements 

   This section summarises the potential sources of overload, and just 
   how critical it is to deal with them as part of protocol design. 
    
 
   Load/overload could originate from the following causes: 
   NORMAL: 'Normal' operation, as user applications initiate signaling 
   for their flows. (If this actually causes problems, the network or 
   network elements probably just need re-engineering.) 
   RETRY: Aggressive retry behaviour, as end-systems attempt to re-
   signal for failed or failing sessions, i.e. even if the flow itself 
   is not active. (This sort of behaviour is felt to be a real problem 
   in traditional telephony networks, where the worst excesses of such 
   devices are curbed by regulation.) 
   REFRESH: Signaling refresh messages generated within the network may 
   cause overload, if the refresh period is not appropriately chosen. 
   RXMIT: Message retransmission (e.g. to achieve reliability in the 
   face of congestive loss) is itself a potential cause of overload, and 
   particularly worrying as a source of instability, since the 
   retransmissions themselves add to the overload. 
   REPAIR: If there is a path change within the network, local repair 
   actions could cause a flood of signaling traffic over the 
   neighbouring links. 
    
   While the sources of NORMAL and RETRY are end systems (or their 
   proxies), the sources of the others are not. Therefore, it is not 
   possible to rely only on end-to-end load control mechanisms, unless 
   the other sources can be discounted. 
    
   While NORMAL and REFRESH are roughly proportional to data traffic 
   (and should be a small proportion of it), and hence should not 
   usually be a source of IP-level overload, the others are not. Hence, 
   overload of both signaling elements and the general network should 
   be handled within the protocol design. 
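
   The two classifications above can be summarised as a small table. 
   The following Python sketch is purely illustrative; the names Cause, 
   end_system_sourced and tracks_data_traffic are introduced here for 
   the example and are not part of any NSIS specification. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    name: str
    end_system_sourced: bool   # originates at end systems (or proxies)
    tracks_data_traffic: bool  # load roughly proportional to data traffic


# Values transcribed from the two paragraphs above.
CAUSES = [
    Cause("NORMAL", True, True),
    Cause("RETRY", True, False),
    Cause("REFRESH", False, True),
    Cause("RXMIT", False, False),
    Cause("REPAIR", False, False),
]


def needs_in_network_control(causes):
    """Causes that end-to-end mechanisms alone cannot reach."""
    return [c.name for c in causes if not c.end_system_sourced]


def may_overload_ip(causes):
    """Causes whose load is not bounded by the data traffic itself."""
    return [c.name for c in causes if not c.tracks_data_traffic]
```

   Only NORMAL falls outside both lists, which is consistent with the 
   remark that overload from NORMAL operation is an engineering issue 
   rather than a protocol design issue. 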
    
   Any of these factors, especially RETRY and REPAIR, can lead to 
   overload within the signaling protocol processing. The consequences 
   of such overload would be reduced responsiveness within the network 
   control plane, dropped signaling state for user sessions, and so on. 
   Modified operation under these circumstances is mainly signaling-
   application specific; however, the signaling applications usually 
   need support at the protocol level to detect the overload condition 
   in the first place. 
    
   In the case where all nodes in the network are NSIS-aware, the IP 
   overload problem essentially becomes a node implementation issue 
   (allocation of forwarding resources on outgoing links). However, a 
   background assumption is that the NSIS protocols need to operate well 
   over large-diameter NSIS-unaware clouds. 
    
   A related issue is that causes REFRESH and REPAIR mainly concern 
   signaling generated in support of particular signaling applications, 
   rather than 'protocol maintenance' signaling. Such signaling is 
   therefore generated only at NSLP-aware nodes. (This is a consequence 
   of the design decision that the NTLP only handles message forwarding, 
   not state maintenance, and therefore cannot, for example, generate a 
   flood of signaling application messages on a rerouting event.) 
    
   While NSLP/NTLP overload failures are problems which are 'local' to 
   the NSIS activity, there is no point in even attempting to 
   standardise protocols which can contribute to network congestion (IP 
   overload) in an uncontrolled way (see the warnings in [9]). 
    
   The conclusion of this section is that overload both within the NSIS 
   protocols and in the IP layer needs to be handled within the NSIS 
   protocol designs, the latter with particular attention to robustness. 
    
3. Implications of Doing Overload Handling within NSIS Protocols 

   Overload handling generally implies having a feedback channel to 
   complement the forward channel which carries the 'overload 
   generating' traffic. The nodes at each end of the feedback channel 
   have to be sensitive to the presence of the overload and be able to 
   reduce it; generally, the closer to the location of the overload the 
   better (e.g. end-to-end mechanisms will be inefficient at dealing 
   with a local overload caused by a rerouting event). 
    
   The implication of this is that an NSIS protocol that purports to 
   deal with overloads has to be bi-directional, and have state 
   information at each end which tracks the current load situation. The 
   more direct the feedback in the reverse direction the better. 
    
   Overload protection mechanisms are often associated with reliability 
   mechanisms, but they don't have to be (e.g. DCCP [10]); they can be 
   considered independently. Indeed, there may be a case for 
   unreliability within the protocol (e.g. to delete aged messages), 
   even though overload control is still needed. 
    
   Avoidance of congestion (IP overload) generally has to be done by 
   tracking packet drops at NSIS-unaware nodes. The mechanisms can vary 
   from very simple to very complex. At one extreme, a simple stop-and-
   wait protocol will work; at the other end, the full (and growing) 
   sophistication of TCP can be used. More sophistication is needed as 
   the network length of the feedback channel and the desired throughput 
   performance increase. This may be a situation where there is a case 
   for different protocol options in different parts of the network. 
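
   To see why more sophistication is needed as the feedback channel 
   lengthens, consider the upper bound on the message rate of a 
   window-based sender: at most 'window' messages per round-trip time, 
   with stop-and-wait being the case window = 1. The sketch below is a 
   back-of-envelope calculation only; the function name and the 
   round-trip times are assumptions chosen for illustration. 

```python
def max_message_rate(window: int, rtt_seconds: float) -> float:
    """Upper bound on messages per second for a sender that may have at
    most `window` unacknowledged messages outstanding."""
    if window < 1 or rtt_seconds <= 0:
        raise ValueError("window must be >= 1 and rtt must be positive")
    return window / rtt_seconds


# Stop-and-wait (window = 1) between adjacent peers, and across a
# large-diameter NSIS-unaware cloud. Both round-trip times are assumed.
short_hop = max_message_rate(1, 0.002)    # 2 ms round trip
long_cloud = max_message_rate(1, 0.200)   # 200 ms round trip
```

   With these assumed figures, stop-and-wait allows roughly 500 messages 
   per second over the short hop but only about 5 per second across the 
   cloud; sustaining a higher rate there requires a larger window, and 
   hence a more elaborate mechanism for managing it. 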
    
4. RSVP and Other Protocol Work 

   The base RSVP protocol as defined in [11] includes very limited 
   overload detection and management capabilities. The main one is that 
   refresh intervals can be locally adjusted, but this just allows 
   management intervention rather than being an adaptive mechanism 
   within the protocol itself. RSVP extensions for reliability were 
   introduced in [12], accompanied by an exponential backoff procedure 
   to address overload cause RXMIT. 
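
   As an illustration of how exponential backoff limits the load from 
   cause RXMIT, the following sketch computes a retransmission 
   schedule in which each interval is a fixed multiple of the previous 
   one. It is a generic sketch rather than a transcription of [12]; the 
   parameter names and the values used are assumptions. 

```python
def retransmit_intervals(initial_interval: float, delta: float,
                         retry_limit: int):
    """Return the waiting time before each retransmission.

    Each interval is (1 + delta) times the previous one, so with
    delta = 1 the interval doubles after every unacknowledged attempt.
    """
    intervals = []
    interval = initial_interval
    for _ in range(retry_limit):
        intervals.append(interval)
        interval *= 1 + delta
    return intervals


# Illustrative values only: 0.5 s initial interval, doubling, 4 retries.
schedule = retransmit_intervals(initial_interval=0.5, delta=1.0,
                                retry_limit=4)
```

   Because the intervals grow geometrically, the rate of 
   retransmissions from any one sender falls off quickly while the 
   overload persists, instead of adding a constant extra load to it. 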
    
   Most end-to-end application protocols, subject to causes NORMAL and 
   RETRY, handle the overload control problem either by using TCP/SCTP 
   as transports, or with a variety of ad hoc application level 
   techniques applied over UDP. 
    
   Within the network, the protocols which could be victims of causes 
   REFRESH, RXMIT and REPAIR are non-trivial routing protocols. The most 
   serious potential overload cause is a flood of routing messages as a 
   new link is brought up. Here, OSPF uses a simple stop-and-wait 
   protocol, while BGP uses TCP. The situation for the NSIS protocols is 
   more severe, since the situation arises for any re-routing event 
   (even one caused by link changes in a remote part of the network), 
   and affects links which are already supposedly operational. 
    
   In the Diameter Base protocol, which uses TCP/SCTP as a transport, 
   higher layer overload is managed on a per-peer-connection basis by 
   the explicit signaling of "busy" indications to the originating peer 
   and the termination of the connection. The originating peer has the 
   option to switch to an alternative next hop (load sharing), which is 
   not possible within NSIS because the signaling has to be coupled to 
   the data path. 
    
5. Handling IP Overload ("Congestion Control") 

   If the NTLP can generate its own messages for any of causes REFRESH, 
   RXMIT or REPAIR, then it has to do so in a way which cannot cause IP 
   layer overload; there is no other option. If this is the case, it 
   would seem to make sense to rely on the same mechanism (whatever it 
   is) to protect the IP layer from all NSIS overload causes. 
    
   However, whether the NTLP generates such messages depends on other 
   aspects of NTLP design and other decisions about NTLP functionality. 
   One could imagine a situation where a very lightweight NTLP had no 
   intelligence to generate messages independently of NSLP operation, in 
   which case protection responsibility could be pushed up to the 
   individual NSLPs. We can't tell whether this argument applies or not 
   without more detail about the proposed NTLP design. 
    
   Therefore, the question remains of whether it is sensible to allocate 
   the problem to the NTLP in any case. The following arguments would 
   seem to apply: 

 
   *) There is no need for different sorts of congestion control for 
   different signaling applications. (There may be different detailed 
   reactions to congestion, i.e. how to generate fewer messages; 
   however, detecting that fewer messages need to be sent is universal 
   across all signaling applications.) Therefore, there is no need to 
   solve this in a signaling-application sensitive manner. 
   *) Detecting the problem may be easier with closer interaction with 
   the lower layers. The NTLP is best placed to do this. 
   *) Solving the problem is hard and important. Therefore, it is better 
   to do it once and for all, and make life less burdensome for future 
   NSLP developers. 
    
   The conclusion of this set of arguments appears to be that congestion 
   control, i.e. protection of the IP layer from overloads caused by 
   NSIS protocol operation, should be an NTLP function.  
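
   A minimal sketch of what 'congestion control as an NTLP function' 
   could look like is given below: one window per NTLP peer, shared by 
   every NSLP sending towards that peer. Additive-increase, 
   multiplicative-decrease (AIMD) is used only because it is a familiar 
   example; this draft does not select a mechanism, and the class and 
   method names are hypothetical. 

```python
class NtlpCongestionWindow:
    """Per-peer AIMD window kept by the NTLP and shared by all NSLPs."""

    def __init__(self, initial: float = 1.0, minimum: float = 1.0):
        self.window = initial      # messages allowed in flight
        self.minimum = minimum
        self.in_flight = 0

    def can_send(self) -> bool:
        return self.in_flight < int(self.window)

    def on_send(self) -> None:
        if not self.can_send():
            raise RuntimeError("congestion window is full")
        self.in_flight += 1

    def on_ack(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)
        # Additive increase: about one extra message per round trip.
        self.window += 1.0 / self.window

    def on_loss(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)
        # Multiplicative decrease on a detected drop.
        self.window = max(self.minimum, self.window / 2)
```

   The point of the sketch is that nothing in it depends on which 
   signaling application produced a message; only the reaction to a 
   closed window (which messages to defer or discard) would be 
   NSLP-specific. 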
    
6. Handling NSIS Protocol Overload 

   The other question is related to handling overloads within the NSIS 
   protocol layers themselves, i.e. when the internal resources of the 
   NEs are constrained. It is clear that the NSLP should be in charge of 
   adapting its own behaviour in response to overload situations, since 
   the response will be specific to the signaling application. However, 
   the method of detection and response depends on what overload 
   detection and control features the NTLP provides, and what 
   assumptions the NSLP can make about their presence (especially in 
   remote nodes). Therefore, this section aims to identify the different 
   options for how overload indications can be pushed up the protocol 
   stack and/or out to the edge of the network (where the adaptation can 
   take place) and how in particular the NTLP should support this. 
    
   If the conclusion of section 5 is correct (i.e. NTLP enforcing IP 
   layer congestion control), it is most likely that in any case there 
   should be a flow-controlling API between the NSIS protocol layers. 
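
   One possible shape for such a flow-controlling API is a bounded 
   queue between the layers: the NSLP offers a message, and the NTLP 
   either accepts it or refuses it, the refusal being the overload 
   indication. The sketch below assumes this shape; NtlpSendQueue, 
   try_send and transmit_one are hypothetical names, not a proposed 
   API. 

```python
from collections import deque


class NtlpSendQueue:
    """Bounded queue between an NSLP and the local NTLP."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._queue = deque()

    def try_send(self, message) -> bool:
        """Accept the message if there is room. A False return tells the
        NSLP to back off and adapt in an application-specific way."""
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(message)
        return True

    def transmit_one(self):
        """Called when NTLP congestion control permits a transmission;
        returns the next queued message, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

   An NSLP using this interface detects the condition purely locally: 
   it never needs to know whether the refusal was caused by IP-layer 
   congestion control or by a lack of NTLP buffer space. 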
    
   For providing overload indications towards the edge nodes, there seem 
   to be three cases to consider. The argument depends on whether there 
   are intermediate nodes which are unaware of the NSLPs in use (see 
   Figure 1). 
    
   1) The NTLP provides the equivalent of a highly granular flow 
   controlled delivery service up to the next NSLP-aware node, with no 
   assumed constraints on NSLP behaviour. The source is explicitly 
   forced to throttle back the transmission of messages for the 
   combination of source/destination/application. The NSLP only has to 
   detect the condition locally; in fact, it can only send messages 
   which the local NTLP is prepared to deliver. This makes life very 
   easy for the NSLP, but NTLP design (in particular, buffer allocation 
   and propagation of flow control information across nodes) is hard. 
    
                                                +------+ 
                                                |  NE3 | 
                                                |+----+| 
                                                ||NSLP|| 
                                                |+----+| 
               +------+    +------+             |  ||  | 
               |  NE1 |    |  NE2 |             |+----+| 
               |+----+|    |      |      |======||NTLP||=== 
               ||NSLP||    |      |      |      |+----+| 
               |+----+|    |      |      |      +------+ 
               |  ||  |    |      |      | 
               |+----+|    |+----+|  +------+   +------+ 
           ====||NTLP||====||NTLP||==|Router|   |  NE4 | 
               |+----+|    |+----+|  +------+   |+----+| 
               +------+    +------+      |      ||NSLP|| 
                                         |      |+----+| 
                                         |      |  ||  | 
                                         |      |+----+| 
                                         |======||NTLP||==== 
                                                |+----+| 
                                                +------+ 
    
                  Figure 1: Signaling with NTLP-only hops 
    
   2) The NTLP provides a flow controlled delivery service (as above), 
   but operates under assumptions about upper layer sending windows 
   which allow buffer management to be simplified. For example, if only 
   one message is allowed to be outstanding for a particular session at 
   any time, the buffer requirements can be precisely calculated. 
   3) The NTLP simply provides the service of delivery to the next NTLP 
   node, e.g. NE1->NE2, NE2->NE3 in the figure. Overload at an NSLP-
   unaware intermediate node (NE2) is handled by dropping packets there 
   (or by more sophisticated but still IP-like behaviour). The NSLPs in 
   NE1 and NE3 have to detect this condition and somehow adapt 
   accordingly (in particular, NE1 has to be able to detect that NE3 is 
   overloaded but that NE4 may not be). 
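
   The buffer calculation mentioned under option (2) can be made 
   concrete. Assuming a fixed bound on the number of outstanding 
   messages per session and on message size, the worst-case buffer 
   requirement at a node is simply their product with the number of 
   sessions. The function name and all the figures below are 
   illustrative assumptions. 

```python
def worst_case_buffer_bytes(sessions: int, max_message_bytes: int,
                            outstanding_per_session: int = 1) -> int:
    """Buffer needed if every session has its full allowance of
    messages queued at this node at the same time."""
    return sessions * outstanding_per_session * max_message_bytes


# Assumed figures: 10,000 sessions, 1,500-byte messages, one outstanding.
needed = worst_case_buffer_bytes(sessions=10_000, max_message_bytes=1_500)
```

   With these assumed figures the bound is 15,000,000 bytes. Under 
   option (1), where no such constraint on NSLP behaviour is assumed, 
   there is no equivalent bound to compute. 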
    
   Solutions (1) and (2) are both flow-control based, and require the 
   maintenance of per-source-destination information in order to support 
   flow control properly. For example, in Figure 1, the NTLP at NE2 
   would have to detect overload for the signaling application at NE3 
   and throttle signaling messages for it from NE1, while not affecting 
   NE1->NE2->NE4 communications. In addition, these solutions put 
   complexity into the NTLP, and might infect it with knowledge about 
   signaling flow topologies which it should really be ignorant of. 
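
   The per-source-destination state that solutions (1) and (2) imply 
   for a relay such as NE2 can be sketched as follows. This is a 
   deliberately minimal illustration: how overload and recovery would 
   actually be reported to the relay is left open, and the class and 
   method names, as well as the application label "qos", are 
   hypothetical. 

```python
class RelayFlowControl:
    """State an NSLP-unaware relay (NE2 in Figure 1) would need in order
    to throttle one signaling path without affecting the others."""

    def __init__(self):
        # Set of (source, destination, application) tuples currently
        # being throttled.
        self._blocked = set()

    def report_overload(self, source, destination, application) -> None:
        self._blocked.add((source, destination, application))

    def report_recovery(self, source, destination, application) -> None:
        self._blocked.discard((source, destination, application))

    def may_forward(self, source, destination, application) -> bool:
        return (source, destination, application) not in self._blocked


# NE3 is overloaded for one application; NE1->NE2->NE4 must be unaffected.
relay = RelayFlowControl()
relay.report_overload("NE1", "NE3", "qos")
```

   Even in this reduced form, the relay holds state keyed on signaling 
   application and on the downstream NSLP peer, which is exactly the 
   topology knowledge the NTLP would ideally not have. 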
 
 
   Solution (3) puts some complexity into the NSLP behaviour which could 
   be common to several applications; on the other hand, the flexibility 
   to do it differently between different applications could be 
   valuable. This option does not preclude the NTLP from doing flow 
   control, but it does place a requirement on the NSLP to cope with 
   lost messages at least as pathological events (although this would 
   have to be the case anyway, e.g. to cope with intermediate node 
   failure). 
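
   Under solution (3), one way an NSLP might infer remote overload is 
   by counting unanswered messages separately for each downstream NSLP 
   peer. The sketch below assumes a simple rule (a run of consecutive 
   timeouts marks a peer as overloaded); the rule, the threshold and 
   the names are assumptions for illustration, and a real signaling 
   application would choose its own. 

```python
class PeerLossMonitor:
    """Per-peer view kept by an NSLP that infers overload from losses."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold        # consecutive timeouts tolerated
        self._consecutive_losses = {}

    def on_timeout(self, peer) -> None:
        count = self._consecutive_losses.get(peer, 0)
        self._consecutive_losses[peer] = count + 1

    def on_response(self, peer) -> None:
        # Any response resets the run of losses for that peer.
        self._consecutive_losses[peer] = 0

    def is_overloaded(self, peer) -> bool:
        return self._consecutive_losses.get(peer, 0) >= self.threshold
```

   Keeping the counters per peer is what allows NE1 to conclude that 
   NE3 is overloaded while continuing to signal normally towards NE4. 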
    
   Note that these problems are mainly caused by the NSLP-unaware node, 
   NE2, and the fact that the NTLP cannot bypass it. In contrast, for 
   direct communication (e.g. NE3<->NE4) it would be very easy to 
   implement solution (1). Flow-controlling solutions are also 
   attractive because they can minimize the buffering taking place 
   within the network and hence improve responsiveness. 
    
   The conclusion of this argument appears to be that (3) is the 
   preferred approach. This conclusion is mainly driven by complexity 
   arguments about the NTLP, and the existence of NSLP-unaware nodes; if 
   both of these arguments could be dealt with, the conclusion might 
   well be the opposite way around. 
    
7. Security Considerations 

   Malicious nodes can attack congestion control mechanisms to force 
   nodes into a congestion avoidance state. The NTLP design should 
   protect against this type of attack where the network is open to it. 
   Also, both NSIS overload protection approaches have to make some 
   assumptions about fairness at the NTLP level; however, this seems to 
   be unavoidable. 
    
8. Conclusions 

   1. The NTLP needs to prevent network overload in the IP layer between 
   NTLP peers. 
   2. However, NSLPs need to detect and adapt to overload within the 
   NSIS protocols themselves. 
   3. Detection may take place by noting messages dropped by the NTLP, 
   as well as any flow control imposed by the NTLP. 
    
References 
                     
   1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 
      9, RFC 2026, October 1996. 
    

   2  Brunner, M., "Requirements for QoS Signaling Protocols", draft-
      ietf-nsis-req-07.txt (work in progress), March 2003. 
    
   3  Freytsis, I., R. E. Hancock, G. Karagiannis, J. Loughney, S. van 
      den Bosch, "Next Steps in Signaling: Framework", draft-ietf-nsis-
      fw-02.txt (work in progress), March 2003. 
    
   4  Archive at: www.ietf.org/mail-archive/working-groups/nsis/ 
    
   5  Braden, R. and B. Lindell, "A Two-Level Architecture for Internet 
      Signaling", draft-braden-2level-signal-arch-01.txt (work in 
      progress), November 2002. 
    
   6  Schulzrinne, H., H. Tschofenig, X. Fu, A. McDonald, "CASP - Cross-
      Application Signaling Protocol", draft-schulzrinne-nsis-casp-
      01.txt (work in progress), March 2003. 
    
   7  McDonald, A., R. Hancock, E. Hepworth, "Design Considerations for 
      an NSIS Transport Layer Protocol", draft-mcdonald-nsis-ntlp-
      considerations-00.txt (work in progress), January 2003. 
    
   8  Floyd, S., "Congestion Control Principles", RFC 2914, September 
      2000. 
    
   9  http://www.ietf.org/ID-nits.html 
    
   10 http://www.ietf.org/html.charters/dccp-charter.html 
    
   11 Braden, R. et al., "Resource ReSerVation Protocol (RSVP) --  
      Version 1 Functional Specification", RFC 2205, September 1997. 
    
   12 Berger, L., Gan, D., Swallow, G., Pan, P., Tommasi, F. and S. 
      Molendini, "RSVP Refresh Overhead Reduction Extensions", RFC 2961, 
      April 2001. 
 

Acknowledgments 

   The authors would like to thank all their colleagues and fellow 
   participants in the NSIS working group and internal protocol 
   discussions for exposing the complexities and subtleties in this 
   subject area. In particular, input was used from (in order of 
   CRC{name}) Henning Schulzrinne, Xiaoming Fu, John Loughney, Melinda 
   Shore, Hannes Tschofenig, Georgios Karagiannis, Ping Pan, Bob Braden, 
   Sven Van den Bosch, Lars Westberg, Marcus Brunner, and Ruediger Geib. 
   Henning in particular provided valuable education on flow control in 
   signaling protocols. Needless to say, the interpretation and 
   conclusions should be blamed only on the authors. 
 
Authors' Addresses 

   {Robert Hancock, Eleanor Hepworth, Andrew McDonald} 
   Roke Manor Research 
   Old Salisbury Lane 
   Romsey, Hampshire 
   SO51 0ZN 
   United Kingdom 
   email: {robert.hancock|eleanor.hepworth|andrew.mcdonald}@roke.co.uk 
    
Full Copyright Statement 
 
   Copyright (C) The Internet Society (2003). All Rights Reserved. This 
   document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph are 
   included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 
    
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. This 
   document and the information contained herein is provided on an "AS 
   IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 
   FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 
   LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL 
   NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY 
   OR FITNESS FOR A PARTICULAR PURPOSE. 

