Network Working Group                        (Editor) Srikanth Chavali
INTERNET DRAFT                                          Vasile Radoaca
Expiration Date: October 2004                    Nortel Networks, Inc.
                                                               Mo Miri
                                                             BellSouth
                                                           Luyuan Fang
                                                                  AT&T
                                                  (Editor) Susan Hares
                                                  NextHop Technologies
                                                            April 2004

                  Peer Prefix Limits Exchange in BGP
                 draft-chavali-bgp-prefixlimit-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document proposes a mechanism to allow BGP peers to coordinate the setting of a limit on the number of prefixes which one BGP speaker will send to its peer. Coordination can prevent disruption of the peering session or discarding of routes, which can occur when a maximum prefix limit is configured on the "receiving" peer, and the "sending" peer exceeds the limit.

1. Terms

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

In this document we use the term "BGP sender" to refer to a BGP speaker which is advertising prefixes to its peer. We use the term "BGP receiver" to refer to a BGP speaker which is receiving prefixes from its peer.
Although it is clear that in reality each peer is usually both a "BGP sender" and a "BGP receiver", we emphasize a unidirectional relationship in this document for clarity.

2. Introduction

There are many scenarios where BGP [BGP-4] peering may be established between two speakers in which there is an expectation that some limited number of prefixes will be announced by a given speaker. Section 6 describes these scenarios.

Several implementations of BGP offer a configuration option that allows a BGP receiver to provision a limit on the number of prefixes it will accept from a specific peer. When the limit is exceeded, there are generally two options: the prefixes exceeding the limit can be dropped by the BGP receiver, or the peering session may be terminated by the BGP receiver and restarted at a later time.

Neither of these options is desirable. Dropping prefixes leads to network unreliability, since the dropped prefixes will be unreachable through the BGP receiver. Terminating the BGP session is probably worse, since all traffic between the peers will typically be disrupted, even for those prefixes which were advertised before the limit was reached.

In many cases, the result of not limiting the number of received BGP prefixes can be much worse than either case just mentioned. If the BGP receiver becomes overloaded, it can fail and affect many or all of its peers. The effects of the disruptions caused by lost peering sessions and device failures propagate through the Internet, leading to instability as described in detail in [BGP-STUDY].

Other undesirable effects include resource utilization on the peers from restarting the peering session, and the processing load and bandwidth utilization from withdrawing and re-advertising the prefixes throughout the Internet. Other issues arising out of this are described in section 6.
The disruption may be due to network changes, misconfigurations, miscommunications, or other factors where the number of prefixes advertised from a BGP sender to the receiver exceeds the expected number, and the configurations must be revised. It may be due to a specific configuration which is functioning properly in order to prevent an overload condition, or it may occur when the receiving BGP speaker becomes overloaded and suffers various consequences.

Two newer sources of additional route overload are Virtual Private Network (VPN) services and denial of service (DoS) attacks. A denial of service attack that sends additional more-specific routes to a BGP speaker can overload the routing table. In VPNs, a sudden increase in routes may be a true addition of routes, a misconfiguration, or a denial of service attack.

A basic functionality is proposed here for BGP speakers to exchange three prefix limits per AFI/SAFI pair: warning, stop receiving, and disconnect limits.

BGP [BGP-4] peers coordinate several types of information sent via the CAPABILITIES listed in the OPEN message or the capabilities sent via the CAPABILITY message (Dynamic capabilities). The BGP peers negotiate routes that will be sent in Route Refreshes via the Outbound Route Filters (ORFs).

This draft proposes:

1. OPEN message with BGP Capability message [BGP-CAP] to carry the proposed parameter.

2. A new Route Filter type for the Outbound Route Filter community or Extended community [ORF, ASPATH ORF].

3. Definition of Prefix limit

Prefix limit is encoded as an optional capability parameter [BGP-CAP] in the BGP OPEN message [BGP-4]. In addition, for dynamic re-adjustment of these capabilities the Prefix limit TLV can be included in:

- Dynamic Capability negotiation (described in section 3.5),
- ORF of Type Prefix (described in section 3.6), and
- error messages related to Dynamic capabilities (section 5.3), or CEASE codes (section 5.4).
If more than one of these features specifies a maximum prefix, the order of precedence is: Dynamic Capability, ORF, Inform, and Soft Notify. That is, Dynamic Capability negotiation takes priority over the other mechanisms.

The required fields in the Maximum Prefix TLV are sub-code 1 through sub-code 3, which MUST be present in the Maximum Prefix TLV. All optional fields MAY be present in the Maximum Prefix TLV.

3.1 Layout of Bytes

    0         7        15
    +---------+---------+
    |code     |length   |
    |         |         |
    |1 octet  |1 octet  |
    +---------+---------+

    0         7        15         23
    +---------+---------+----------+
    |       AFI         |   SAFI   |
    |                   |          |
    |     2 octets      | 1 octet  |
    +---------+---------+----------+

    0         7        15         23         55
    +---------+---------+----------+----------+
    |sub      |         |warning   |warning   |
    |code 1   |length   |indicator |prefix    |
    |[Warn]   |         |          |limit     |
    |1 octet  |1 octet  |1 octet   |4 octets  |
    +---------+---------+----------+----------+

    0         7        15         23          55
    +---------+---------+----------+-----------+
    |sub      |         |stop      |stop adver-|
    |code 2   |length   |advertise-|tisement   |
    |[stop]   |         |ment      |prefix     |
    |         |         |action    |limit      |
    |1 octet  |1 octet  |1 octet   |4 octets   |
    +---------+---------+----------+-----------+

    0         7        15         23          55
    +---------+---------+----------+-----------+
    |sub      |         |reset     |reset peer-|
    |code 3   |length   |peering   |ing prefix |
    |         |         |action    |limit      |
    |1 octet  |1 octet  |1 octet   |4 octets   |
    +---------+---------+----------+-----------+

    0         7        15
    +---------+---------+
    |   option length   |
    |                   |
    |     2 octets      |
    +---------+---------+

    0         7         15          47
    +---------+----------+-----------+
    |sub      |          |current Rx |
    |code 4   |length    |routes     |
    |[CurRX]  |          |           |
    |1 octet  |1 octet   |4 octets   |
    +---------+----------+-----------+

    0         7         15          47
    +---------+----------+-----------+
    |sub      |          |current Tx |
    |code 5   |length    |routes     |
    |[CurTX]  |          |           |
    |1 octet  |1 octet   |4 octets   |
    +---------+----------+-----------+

    0         7      15
    +---------+-------+
    |sub      |       |
    |code 6   |length |
    |[pfxln]  |       |
    |1 octet  |1 octet|
    +---------+-------+

    0       7        15         47          79        111
    +-------+--------+----------+-----------+---------+
    |prefix |action  |warning   |stop adver-|reset    |
    |length |flags   |indicator |tisement   |peering  |
    |   1   |for     |limit for |limit for  |limit for|
    |       |limits  |prefix    |prefix     |prefix   |
    |       |for     |length-1  |length-1   |length-1 |
    |       |prefix  |          |           |         |
    |       |length-1|          |           |         |
    |1 octet|1 octet |4 octets  |4 octets   |4 octets |
    +-------+--------+----------+-----------+---------+
                          .  .  .  .
    +-------+--------+----------+-----------+---------+
    |prefix |action  |warning   |stop adver-|reset    |
    |length |flags   |indicator |tisement   |peering  |
    |   n   |for     |limit for |limit for  |limit for|
    |       |limits  |prefix    |prefix     |prefix   |
    |       |for     |length-n  |length-n   |length-n |
    |       |prefix  |          |           |         |
    |       |length-n|          |           |         |
    |1 octet|1 octet |4 octets  |4 octets   |4 octets |
    +-------+--------+----------+-----------+---------+

    0       7      15
    +-------+-------+
    |sub    |       |
    |code 7 |length |
    |[orfmx]|       |
    +-------+-------+

    0          7          39          71           103       111
    +----------+----------+-----------+------------+---------+------+
    |action    |warning   |stop adver-|reset peer- |ORF type |ORF   |
    |flags     |indicator |tisement   |ing         |         |Info  |
    |for ORF   |prefix    |prefix     |prefix      |         |      |
    |match     |limit for |limit for  |limit for   |         |      |
    |          |ORF match |ORF match  |ORF match   |         |      |
    |          |          |           |            |         |      |
    |(1 octet) |(4 octets)|(4 octets) |(4 octets)  |(1 octet)|      |
    +----------+----------+-----------+------------+---------+------+

3.2 Byte definitions

The meaning of each of the capability fields shown above is as follows:

Type-Code (1 octet): code identifying this capability (TBD)

Length (1 octet): The required portion of the Prefix limit TLV is 28 octets and includes the type-code, length, AFI, SAFI, sub-codes 1-3, and the optional length. The optional portion of the Prefix limit TLV is variable in length, depending on the information carried. If the length exceeds 254 octets, the length byte is set to 255 and the length is determined by adding 28 to the number of octets given in the optional length field.
Address Family Identifier AFI (2 octets): This field, along with the Subsequent Address Family Identifier field, identifies the Network Layer Protocol associated with the Network Address.

Subsequent Address Family Identifier SAFI (1 octet): This field, along with the Address Family Identifier field, identifies the Network Layer Protocol associated with the Network Address.

sub code 1 (1 octet): Identifies the number of routes sent before a warning is raised. The warning is raised by the BGP speaker that detects the condition.

Warning Indicator (1 octet): This octet can be assigned a value of 0, 1 or 2. A value of 0 means that the sender SHOULD NOT raise any warning. The warning mechanisms are described in the operation section of this draft. A value of 1 means the warning indication is necessary and SHOULD be used by the sender when the number of routes it has advertised reaches the warning prefix limit. If a BGP information message is supported (such as the BGP INFORM), a value of 2 indicates that such a BGP message will be transmitted to the remote peer if the route advertisement limit is hit.

Warning prefix limit (4 octets): Number of routes sent by the BGP sender. The value of this field depends on the maximum prefix limit and SHOULD always be less than it.

sub code 2 (1 octet): Identifies the number of routes sent before the sending BGP speaker needs to stop advertising routes to the receiving BGP speaker.

Stop Advertisement action (1 octet): This octet can be set to 0, 1 or 2. A value of 0 means the BGP speaker will ignore any routes sent after the stop advertisement limit. A value of 1 means that route advertisement MUST be stopped by the speaker when the route advertisement limit is hit. It is implicit that whichever speaker encounters the situation will stop advertisement to its peer.
If a BGP information message is supported (such as the BGP INFORM), a value of 2 indicates that such a BGP message will be transmitted to the remote peer if the route advertisement limit is hit.

maximum prefix limit (4 octets): Number of routes sent by the sender BGP speaker.

sub code 3 (1 octet): Identifies the number of routes received after which the BGP speaker will reset the peering session. Note that this situation should never be encountered if both speakers adhere to this draft. In other words, it happens only during error conditions. The error conditions are beyond the scope of this document.

reset peering action (1 octet): This field can be set to 0, 1 or 2. If the field is 0, the BGP speaker will reset the peering session if the number of routes sent to the peer exceeds the reset prefix limit. If the field is 1, the BGP peer will reset the peering session and hold it down until a manual restart occurs. If the field is 2, the BGP peer will reset the peering session via mechanisms such as soft-notify.

reset prefix limit (4 octets): Number of routes sent by the sender BGP speaker. The value of this field depends on the maximum prefix limit and SHOULD always be greater than it.

optional parameter length (2 octets): The value of this optional variable length is 13 octets plus the additional 29 bits of reserve field. This value can change when more sub codes are added.

sub code 4: The BGP speaker uses this sub-code to indicate to its peer the current count of the routes it has received from that peer.

current Rx routes: Number of routes received by the BGP speaker from its peer. The value of this field SHOULD always be less than or equal to the maximum prefix limit configured to receive from the peer.

sub code 5: The BGP speaker uses this sub-code to indicate to its peer the current count of the routes sent to it.

current Tx routes: Number of routes sent by the BGP speaker to its peer.
The value of this field SHOULD always be less than or equal to the maximum prefix limit it received from the peer in the capability.

sub code 6: The BGP speakers use this sub-code to indicate a prefix-length based set of limits (warning limit, stop advertisement limit, and reset limit). The field carries an action flag that indicates the actions that occur for all prefixes that hit limits, and the limits per prefix length. An example of a prefix length is length 19 for all /19 routes. All /19 routes will have a warning limit, a stop advertisement limit and a reset limit. Only one sub-code 6 parameter may appear in a Prefix limit TLV.

prefix length-1: The length (in bits) of the prefix group.

action-flags for prefix length-1: The action flag octet carries the set of action flags for all prefixes in the bit pattern 00WWSSRR, where the two high-order bits are zero. The WW bits can be set to the warning indicator values (0, 1, 2) defined for sub-code 1. The SS bits can be set to the stop advertisement action values (0, 1, 2) defined for sub-code 2. The RR bits can be set to the reset action values (0, 1, 2) defined for sub-code 3.

warning prefix limit for prefix length-1: The warning limit for prefix length-1.

stop advertisement limit for prefix length-1: The stop advertisement prefix limit for prefix length-1.

reset peering limit for prefix length-1: The reset peering route limit for prefix length-1.

sub code 7: Sub-code 7 carries the three basic prefix limits for the set of prefixes matching the ORFs. Multiple sub-code 7 TLVs may appear in a Prefix limit TLV.

Action flags for ORF: The action flag definitions are the same as for the action flags of sub-code 6 (prefix length).

warning indicator prefix limit for ORF match: The warning indicator prefix limit for any prefix that matches the ORF filter.

stop advertisement prefix limit for ORF match: The stop advertisement prefix limit for any prefix that matches the ORF filter.
reset peering prefix limit for ORF match: The reset peering prefix limit for any prefix that matches the ORF filter.

For ease of illustration, we refer to the warning prefix limit, the maximum prefix limit and the reset prefix limit collectively as prefix limits in this document.

3.3. Carrying Prefix limits in the Open Capabilities

The BGP OPEN capabilities field uses triples of the form <Capability Code, Capability Length, Capability Value>, where each triple is encoded as shown below:

    +------------------------------+
    | Capability Code (1 octet)    |
    +------------------------------+
    | Capability Length (1 octet)  |
    +------------------------------+
    | Capability Value (variable)  |
    +------------------------------+

The BGP Maximum Prefix Capability code is to be assigned by IANA.

3.4. Interaction between sub-codes 6-7 and sub-codes 1-3

Within the TLV, if sub-code 6 or sub-code 7 is specified, it cannot specify the 0/0 prefix length or an ORF match that matches all routes.

3.5. Carrying Maximum Prefix Limits in the Dynamic Open Capabilities

The BGP Dynamic Capability is carried in the CAPABILITY message (message type 6), and uses the following fields:

    +------------------------------+
    | Action (1 octet)             |
    +------------------------------+
    | Capability Code (1 octet)    |
    +------------------------------+
    | Capability Length (1 octet)  |
    +------------------------------+
    | Capability Value (variable)  |
    +------------------------------+

An Action code of "0" in a dynamic capability adds the maximum prefix limits specified in the TLV for the corresponding AFI/SAFI. An Action code of "1" removes the prefix limits for a particular AFI/SAFI. An Action code of "0" that follows an earlier Action code of "0" overwrites the required fields and applies an exclusive OR to the optional fields.

3.6. Carrying Maximum Prefix in ORF Match Field in BGP Route Refresh

    +--------------------------------------------------+
    | Address Family Identifier (2 octets)             |
    +--------------------------------------------------+
    | Reserved (1 octet)                               |
    +--------------------------------------------------+
    | Subsequent Address Family Identifier (1 octet)   |
    +--------------------------------------------------+
    | When-to-refresh (1 octet)                        |
    +--------------------------------------------------+
    | ORF Type = Maximum Prefix (08)                   |
    +--------------------------------------------------+
    | Length of ORFs (2 octets)                        |
    +--------------------------------------------------+
    | First Maximum Prefix ORF sub-code (TLV 1-7)      |
    +--------------------------------------------------+
    | Second Maximum Prefix ORF sub-code (TLV 1-7)     |
    +--------------------------------------------------+
                          . . .
    +--------------------------------------------------+
    | Nth Maximum Prefix ORF sub-code (TLV 1-7)        |
    +--------------------------------------------------+

ORF entries are carried in the BGP ROUTE-REFRESH message [BGP-RREFRESH]. A single ROUTE-REFRESH message could carry multiple ORF entries, as long as all these entries share the same AFI/SAFI.

From the encoding point of view, each ORF entry consists of a common part and a type-specific part. The common part is defined in [BGP-CRF].

The "When-to-refresh" field in the message can be one of IMMEDIATE (0x01) or DEFER (0x02), the semantics and operation of which are described in [BGP-CRF].

Following this field is a collection of one or more ORFs, grouped by ORF-Type. The Maximum Prefix ORF type field can be intermixed with other ORF fields. If the ORF field is specific to the Maximum Prefix field, the ORF (sub-code 7) should be used to specify the ORF field.

The ORF-Type component is encoded as a one-octet field. The value 0 is reserved. The values currently proposed to be assigned are:

1. reserved (00)
2. Community (02)
3. Extended Community (03)
4. AsPath (xx)
5. Prefix (64)
6. Maximum Prefix (08)

3.7. Carrying the Maximum Prefix in a Soft Notify [BGP-SOFT-NOTIFY]

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |              AFI              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             SAFI              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Type-code           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Sub-code            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            Length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       Variable Data TLV       |
    |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A type code of 3 indicates that a prefix maximum has been exceeded. The sub-code indicates which type of prefix maximum has been exceeded. A value of <1> indicates that a warning prefix maximum has been exceeded, a value of <2> indicates that a stop advertisement prefix maximum has been exceeded, and a value of <0> indicates that a reset peering prefix maximum has been exceeded.

The length specifies the length of the optional portion of the soft-notify. The variable portion of the soft-notify SHOULD contain the required fields of the Maximum Prefix TLV. The Variable Data TLV MAY also contain the optional fields.

4. Operation

4.1 Exchanging the configured prefix limits

BGP speakers exchange the prefix limits as an optional capability parameter [BGP-CAP] as described in section 3.

    +--------+                     +--------+
    |   A    | <-----------------> |   B    |
    +--------+                     +--------+

                     Figure 1

In figure 1 both BGP speakers A and B exchange the prefix limits to indicate support for this capability. Each of A and B sets the warning prefix limit, maximum prefix limit and reset prefix limit, along with the actions associated with each of them, in the capability message before exchanging them. The warning prefix limit and reset limit values are determined based on the configured maximum prefix limit. They are typically a percentage value of the maximum prefix limit. The exact percentage values are beyond the scope of this document.
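
The derivation and exchange described in section 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a normative encoding: the 80% and 120% ratios, the capability code 0xFF (the draft leaves the code TBD), the sub-code length value of 5, the reading of the Length octet as the size of the whole TLV, and all function names are hypothetical choices made for the example.

```python
import struct

# Assumed placeholder: the draft leaves the capability code "TBD".
PREFIX_LIMIT_CODE = 0xFF
SUBCODE_WARN, SUBCODE_STOP, SUBCODE_RESET = 1, 2, 3


def derive_limits(max_prefixes, warn_pct=80, reset_pct=120):
    """Derive (warning, stop, reset) limits from the configured maximum.

    The percentages are illustrative only; the draft leaves them open.
    """
    warning = max_prefixes * warn_pct // 100
    reset = max_prefixes * reset_pct // 100
    return warning, max_prefixes, reset


def pack_sub_tlv(sub_code, action, limit):
    """Pack one required sub-code: code, length, action, 4-octet limit."""
    # Assumption: the sub-code length counts the 5 octets that follow it.
    return struct.pack("!BBBI", sub_code, 5, action, limit)


def pack_required_tlv(afi, safi, limits, actions=(1, 1, 0)):
    """Pack the 28-octet required portion laid out in section 3.1."""
    warning, stop, reset = limits
    body = struct.pack("!HB", afi, safi)
    body += pack_sub_tlv(SUBCODE_WARN, actions[0], warning)
    body += pack_sub_tlv(SUBCODE_STOP, actions[1], stop)
    body += pack_sub_tlv(SUBCODE_RESET, actions[2], reset)
    body += struct.pack("!H", 0)  # option length: no optional sub-codes
    # Assumption: Length counts the whole TLV, including code and length.
    total = 2 + len(body)
    return struct.pack("!BB", PREFIX_LIMIT_CODE, total) + body


limits = derive_limits(2000)
tlv = pack_required_tlv(afi=1, safi=1, limits=limits)  # IPv4 unicast
print(limits, len(tlv))  # prints: (1600, 2000, 2400) 28
```

The sketch keeps the warning limit below and the reset limit above the maximum prefix limit, as section 3.2 recommends, and the packed result matches the 28 octets that section 3.2 gives for the required portion.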
The maximum prefix limit configured on A for the peer B is the maximum number of prefixes that A expects to receive from B. A informs B of this limit in the new capability described in section 3. The same interpretation applies to B too.

4.2 Dynamic Capability Reset of the Capability

Dynamic Capabilities can set the BGP speaker's maximum prefix values (warning indicator, stop advertisement, and reset peering values) to values different from those initially negotiated via the OPEN Capabilities. The exact mechanisms for deciding to reset the values are outside the scope of this specification.

Figure 2 indicates how the dynamic capability can be used when a prefix limit is detected by the BGP speaker.

    +--------+                     +--------+
    |   A    | <-----------------> |   B    |
    +--------+                     +--------+
                                   B detects warning prefix limit
                           <------ generates dynamic capability
                                   message to A

                     Figure 2

4.2.1 Dynamic Capability use of Sub-code 4 (Current Received Routes) and Sub-code 5 (Current Transmit Routes)

Sub-code 4 (Current Received Routes) and sub-code 5 (Current Transmit Routes) provide information to the BGP speakers which aids in preventing peer disruption. Figure 2 demonstrates the case where BGP speakers A and B maintain a count of the routes they receive from each other.

Route processing is illustrated using the case where B sends route advertisements to A. (The same operational procedures apply for the other case of A sending route advertisements to B.) B, as shown in figure 1, applies the outbound route policies to the Adjacent-Rib-Out and then checks the prefix limits before advertising routes. Upon hitting the warning indicator prefix limit, BGP speaker B sends a Dynamic Capability message to A with 5 sub-codes: warning indicator (sub-code 1), stop advertisement (sub-code 2), reset peering indicator (sub-code 3), Current Received Routes (sub-code 4), and Current Transmit Routes (sub-code 5).
The additional sub-codes, i.e., 4 and 5, provide information that assists network administrators in prioritizing the handling of the warning. For example, if the limits are 1000 routes for warning, 2000 for stop advertisement, and 3000 for reset peering, and the current route count is 1010, the network operator can deduce that the received routes are well within the tolerance limit, i.e., the limit of sub-code 2. If instead, for the same limits (1000, 2000, 3000), the current received routes (by speaker B) number 1900, the network operator may want to investigate the customer changes.

Figure 2 shows that, in the course of advertising routes to A, B generates a Dynamic Capability [BGP-DYN-CAP] message destined to A comprising sub-codes 1-5. B sends this message in this case because it detects the warning limit during route advertisement earlier than A does. In other words, either A or B, or both of them, could generate this message depending on the timing of warning limit detection. B and A MAY choose to raise an internal warning when this condition is detected. Following the warnings, both A and B continue advertising routes normally to each other. If B determines that the prefix limits can be increased, it MAY send the changed values in the Dynamic Capability along with sub-codes 1-5.

In figure 3, B detects during route advertisement that the maximum prefix limit for route advertisement has been reached. It SHOULD stop further route advertisements to A. In other words, B SHOULD treat this condition as implicitly setting the announce policy to A to stop/deny. B then SHOULD send a Dynamic Capability [BGP-DYN-CAP] to A indicating the current Receive and Transmit routes (sub-code 4 and sub-code 5). As in the case of the warning prefix limit condition, either A or B, or both, could send the Dynamic Capability [BGP-DYN-CAP].
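
The warning and stop checks described above can be sketched as a small classifier. This is an illustration under stated assumptions, not part of the proposal: the function and state names are hypothetical, and the sketch assumes each limit is triggered as soon as the route count reaches it (the text uses both "reaches" and "exceeds").

```python
from enum import Enum


class LimitState(Enum):
    NORMAL = "normal"
    WARNING = "warning"   # warning prefix limit reached
    STOP = "stop"         # maximum (stop advertisement) limit reached
    RESET = "reset"       # reset prefix limit reached


def classify(current_routes, warning, stop, reset):
    """Return the highest limit that the current route count has reached."""
    # Section 3.2 recommends warning < maximum < reset.
    if not warning < stop < reset:
        raise ValueError("expected warning < stop < reset")
    if current_routes >= reset:
        return LimitState.RESET
    if current_routes >= stop:
        return LimitState.STOP
    if current_routes >= warning:
        return LimitState.WARNING
    return LimitState.NORMAL


def headroom(current_routes, stop):
    """Routes that can still be advertised before the stop limit."""
    return max(stop - current_routes, 0)


# The example from the text: limits of 1000 / 2000 / 3000.
for count in (1010, 1900, 2000):
    state = classify(count, 1000, 2000, 3000)
    print(count, state.value, headroom(count, 2000))
# prints:
# 1010 warning 990
# 1900 warning 100
# 2000 stop 0
```

Both 1010 and 1900 routes fall in the warning band, but the remaining headroom (990 versus 100) is what lets an operator prioritize, which is the role the text assigns to sub-codes 4 and 5.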
Any route withdrawal to A is automatically recorded and SHOULD implicitly result in restoring the announce policy to the configured one (if any is configured). This helps to preserve the incremental nature of the protocol and avoids processing of routes by peers such as B that would be discarded by speakers such as A when the limit is reached. In addition, the network bandwidth consumed by route UPDATES can be avoided. It is expected that conformance to this document will not lead to any further route advertisements to A by B unless there is an unforeseen error. In such a situation A can reset the peering session, as indicated in the prefix limits sent to B during the capability negotiation.

    +--------+                     +--------+
    |   A    | <-----------------> |   B    |
    +--------+                     +--------+
                                   B detects stop advertisement
                                   maximum prefix limit and
                                   generates dynamic
                           <------ capability message to provide
                                   additional information.

                     Figure 3

4.2.2 Prefix limit changes Utilizing Dynamic Capabilities

If a need to change the prefix limits arises, each BGP speaker A whose configuration changes for its peer B SHOULD dynamically [BGP-DYN-CAP] inform the corresponding peer of this change. Such changes SHOULD be handled as described in the following sub-sections.

4.2.2.1 Processing when maximum prefix limit is increased

When the prefix limits are increased in the configuration of A, in figure 1, it SHOULD inform B about it as described in 4.2. If B had stopped advertising because it reached the maximum prefix limit, B SHOULD then restart the route advertisements; it MAY choose to do so either incrementally from the Adjacent-Rib-Out for A or by using the Route Refresh mechanism [BGP-RREFRESH]. The former methodology is similar to the approach taken prior to the introduction of Route Refresh.
In other words, it can be handled in the way policy changes were handled prior to the availability of the Route Refresh mechanism, with the minor change of sending only the routes that were rejected due to the prefix limit. In doing so, the restart of BGP peering, and the network traffic and service disruption associated with it, is avoided. If the maximum prefix limit has not been reached and increased prefix limits are received by the peer B, then peer B SHOULD note this and continue with its advertisements to A until these limits are reached.

4.2.2.2 Processing when the maximum prefix limit is decreased

When the prefix limits are decreased in the configuration of A (refer to figure 1), B SHOULD be informed about it as described in 4.2. B then SHOULD note this information and SHOULD stop route advertisement immediately if the number of route advertisements exceeds the new maximum prefix limit for A. By doing so B can avoid processing the routes which will be discarded by A when it detects the maximum prefix limit condition. A does this even before adding the routes to its Adjacent-Rib-In for the peer or, in some cases, restarting the peering session. Additionally, network bandwidth consumption by the routing UPDATES can be avoided this way. B at that point follows the process described in 4.2 for route processing.

4.3 ORF based processing

The ORF filters can be carried either in the dynamic capability or in the Route Refresh message. The processing of the Route Refresh and ORF is described in 3.6 and [BGP-CRF].

4.4 Soft Notify processing

Soft Notify processing is restricted at first to sub-codes 1-5. Use of sub-codes 6 and 7 in Soft Notify is left for future study.

4.5 Prefix Length based limits processing

All of the operational procedures described in sections 4.1 through 4.4 are applicable to the negotiated prefix length based limits.

5. Error Handling

The Maximum Prefix TLV can be sent in an OPEN (message 1), a Route Refresh (message 5), or a Capability (message 6).
The sections below define the error codes and sub-codes related to these messages for the maximum prefix draft.

5.1 Open Message responded to with Notification

OPEN messages can be rejected by BGP speakers because of listed capabilities that they do not support. The error sub-code for an unsupported capability in OPEN message negotiation is sub-code 7 [BGP-CAP]. The Maximum Prefix TLV will be included in the list of capabilities.

5.2 Route Refresh caused Notification Errors

[BGP-RREFRESH] does not specify error messages associated with the Route-Refresh processing.

5.3 Capability Message responded to with a Notification Errors

For errors in Dynamic Capabilities, a NOTIFICATION message may be sent with the Capability message error code (7) [BGP-DYN-CAP] set. The current sub-codes for this error message are:

    Subcode    Symbolic Name
       1       Invalid Action Value
       2       Invalid Capability Length
       3       Malformed Capability Value
       4       Unsupported Capability Code

Support for the Maximum Prefix value negotiations will require the addition of the following sub-code:

       5       Invalid Capability Value

If the Maximum Prefix code is not supported, the NOTIFICATION message will be returned with an error code of 7 and a sub-code of 4 (Unsupported Capability Code). If the Maximum Prefix Capability is supported, but the value is not acceptable to the receiving node, the NOTIFICATION can be sent with sub-code 5 (Invalid Capability Value) and the data field set to the Maximum Prefix TLVs that are not acceptable.

5.4 Cease message for peering reset

When the reset maximum prefix value is exceeded, the peering session SHOULD be dropped, in which case the CEASE code in the NOTIFICATION message will be used. The proposed BGP draft [CEASECODE] gives a subcode of 1 for a maximum prefix exceeded condition. The data field has a maximum prefix upper bound. This field should be followed by an optional 1-octet field that allows a maximum prefix sub-code to be encoded after it.

6. Usage in Current Service Providers

We provide an example to illustrate a typical Service Provider's (SP) practice with the maximum prefix limit. Providers can set one of three levels: Warning, Stop and Reset. This section compares setting two limits (warning, stop/reset) with setting three limits (warning, stop, reset).

6.1. Two limits (warning, stop/reset)

The provider may set two levels of threshold on the BGP receivers at the network edge: a low water mark as the warning threshold and a high water mark as the stop/reset level. The high water mark is intended to quickly detect and stop a misconfigured router sending a full blast of Internet routes. However, the high water mark may also be exceeded by VPN clients by only a few routes as the routing tables grow. Let's examine why this is problematic.

When the warning threshold is triggered, SNMP traps are transmitted by the SP's BGP receiver (router) to the SP's management system. The operator needs to contact the customer upon receiving the trap.

When the stop/reset threshold with the maximum prefix limit is reached, the BGP session may be dropped by the BGP receiver. Again, it would generate traps on the provider side. (Some implementations may not drop the session, but silently drop the customer's routes or prefixes.) Then the operator needs to work with the customer to correct the problem and restart the session.

There are several issues with using only two levels.

First, the provider has to prove to the customer that the session drop was due to the customer violating the agreed maximum prefix limit rather than to a condition in the operator's network. Keeping the warning traps may help, but session error codes specifying that the maximum prefix was exceeded will help in identifying the reason for the BGP session drop.

Second, the operator has to work with the customer to locate the root cause and, more likely than not, manually bring back the BGP peering session at an agreed time.
This is labor intensive for the operator and the customer. If the stop/reset limit catches an upswing in VPN traffic from a site, the operator and customer must work in crisis mode to resolve the growth. The customer may be more unhappy about session drops caused by growth than about those caused by misconfiguration.

For the above reasons, in today's common practice, a provider may choose not to use the maximum prefix limit feature for its Internet services, to avoid these complications. The same provider may nevertheless choose to use the maximum prefix limit feature for customer connections in its MPLS VPN services, because of edge device resource management needs that are particularly associated with VPN services. The question of where to use and where not to use the maximum prefix limit feature is beyond the scope of this draft.

6.2. Three levels (warning, stop, reset)

In this draft, we are promoting a proactive approach to dealing with maximum prefix limit issues. Referring to the example above of the relationship between the provider and the customer edge devices (BGP senders), we propose that both the customer and the provider participate in setting these three levels of thresholds (warning, stop, and reset) and in reacting to the resulting warnings, traps or error messages. Any time a threshold is set or changed on either side, it is communicated to the remote side via BGP signalling, and both sides communicate dynamically whenever an unexpected event triggers any of the threshold levels.

The warning level triggers the warning on both the provider and the customer edge devices, so the customer should act on it without waiting for the provider to call.

The second level triggers the customer edge device to stop sending routes, as it is reaching the agreed maximum prefix limit. This may also result in traps being issued on both the customer and the provider side. The idea is to have the customer take action to fix the problem without dropping the session, thereby requiring less human intervention from the provider side.
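As a rough illustration of the warning and stop levels described above, the sender-side check might look like the following Python sketch. The names SenderAction and sender_action are hypothetical, and the choice to trigger a level when the advertised prefix count reaches (rather than strictly exceeds) the threshold is an assumption made for this example; the draft does not define either.

```python
from enum import Enum


class SenderAction(Enum):
    """Hypothetical actions a BGP sender could take for the first two levels."""
    NONE = "none"   # below the warning level: keep advertising normally
    WARN = "warn"   # warning level reached: raise a trap, keep advertising
    STOP = "stop"   # stop level reached: raise a trap, stop sending routes


def sender_action(advertised: int, warning: int, stop: int) -> SenderAction:
    """Classify the sender's state from its advertised prefix count.

    Assumption: a level is triggered when the count reaches the threshold.
    """
    if not 0 < warning <= stop:
        raise ValueError("expected 0 < warning <= stop")
    if advertised >= stop:
        return SenderAction.STOP
    if advertised >= warning:
        return SenderAction.WARN
    return SenderAction.NONE
```

With a warning level of 80 and a stop level of 100, for instance, a customer edge device advertising 85 prefixes would raise a warning and keep advertising, while one that has reached 100 would stop sending further routes and leave the session up.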
The third level triggers the session drop action from the provider side. This is used as a safeguard for the provider's network in case the customer edge device does not behave as expected and continues to send routes after exceeding the second level threshold.

We believe this feature can help both providers and customers to proactively manage their BGP connections through dynamic signaling, monitoring and corrective action before any drastic action is necessary. In many cases, this can help avoid service interruption, avoid finger-pointing when sessions are dropped, lower operating cost, and increase customer satisfaction.

In general, this feature can be applied to provider-to-provider peering connections as well, with similar advantages.

7. Security Considerations

This document does not change the underlying security issues in the BGP protocol. It does, however, provide an additional mechanism to protect against denial-of-service attacks based on exceeding configured maximum prefix limits.

8. References

[BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", draft-ietf-idr-bgp4-20.txt. Work in progress.

[BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with BGP-4", RFC 3392, November 2002.

[BGP-RREFRESH] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[BGP-DYN-CAP] Chen, E., Sangli, S. R., "Dynamic Capability for BGP-4", draft-ietf-idr-dynamic-cap-03.txt. Work in progress.

[BGP-STUDY] Chang, D., Govindan, R., Heidemann, J., "An Empirical Study of Router Response to Large BGP Routing Table Load", ACM SIGCOMM Internet Measurement Workshop, pp. 203-208, Marseille, France, November 2002.

[BGP-CRF] Chen, E., Rekhter, Y., "Cooperative Route Filtering Capability for BGP-4", draft-ietf-idr-route-filter-08.txt. Work in progress.

[CEASECODE] Chen, E., "Subcodes for BGP Cease Notification Message", draft-ietf-idr-cease-subcode-05.txt. Work in progress.
[BGP-SOFT-NOTIFY] Gargi, N., Patel, K., Scudder, J., Ward, D., "BGPv4 Soft-Notification Message", draft-nalawade-bgp-soft-notify-00.txt. Work in progress.

9. IANA Considerations

This document uses a new capability type for the support of prefix limits and the corresponding NOTIFICATION code along with the sub-codes for non-support. These must be assigned by IANA.

10. Acknowledgements

The authors would like to thank George Matey, Marten Terpstra, Yakov Rekhter, Enke Chen, Rob Thomas, Manish Gupta, Dan Joyal, Rajesh Saluja and Elwyn Davies for their review and comments.

11. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

12. Authors' Addresses

   Srikanth Chavali
   Vasile Radoaca
   Paul Knight
   Nortel Networks
   600 Technology Park Drive
   Billerica, MA 01821
   USA
   Email: schavali@nortelnetworks.com
   Email: vasile@nortelnetworks.com
   Email: paul.knight@nortelnetworks.com

   Mo Miri
   BellSouth
   575 Morosgo Drive 4A62
   Atlanta, GA 3032
   home: +1 404-499-5526
   email: mohammad.miri@bellsouth.com

   Luyuan Fang
   ATT Labs
   200 Laurel Avenue, Room C2-3B35
   Middletown, NJ 07748
   Phone: +1 732 420 1921
   Email: luyuanfang@att.com

   Susan Hares
   NextHop Technologies
   825 Victors Way
   Suite 100
   Ann Arbor, MI 48108
   Phone: +1 734 222 1610
   Email: skh@nexthop.com