Network Working Group                     (Editor)Srikanth Chavali
INTERNET DRAFT                                      Vasile Radoaca
Expiration Date: October 2004                 Nortel Networks, Inc.
                                                           Mo Miri
                                                         BellSouth
                                                       Luyuan Fang
                                                              AT&T
                                               (Editor)Susan Hares
                                              NextHop Technologies
                                                        April 2004


                   Peer Prefix Limits Exchange in BGP
                 draft-chavali-bgp-prefixlimit-01.txt


Status of this Memo


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract


   This document proposes a mechanism to allow BGP peers to coordinate
   the setting of a limit on the number of prefixes which one BGP
   speaker will send to its peer.  Coordination can prevent disruption
   of the peering session or discarding of routes, which can occur when
   a maximum prefix limit is configured on the "receiving" peer, and the
   "sending" peer exceeds the limit.


1.  Terms


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


   In this document we use the term "BGP sender" to refer to a BGP
   speaker which is advertising prefixes to its peer.  We use the term
   "BGP receiver" to refer to a BGP speaker which is receiving prefixes
   from its peer.  Although it is clear that in reality each peer is
   usually both a "BGP sender" and a "BGP receiver", we emphasize a
   unidirectional relationship in this document for clarity.


2.  Introduction


   There are many scenarios where BGP [BGP-4] peering may be established
   between two speakers in which there is an expectation that some
   limited number of prefixes will be announced by a given speaker.
   Section 6 describes these scenarios. Several implementations of BGP
   offer a configuration option that allows a BGP receiver to provision
   a limit to the number of prefixes it will accept from a specific
   peer. When the limit is exceeded, then there are generally two
   options: the prefixes exceeding the limit can be dropped by the BGP
   receiver, or the peering session may be terminated by the BGP
   receiver and restarted at a later time. Neither of these options is
   desirable.


   Dropping prefixes leads to network unreliability, since the dropped
   prefixes will be unreachable through the BGP receiver.  Terminating
   the BGP session is probably worse, since all traffic between the
   peers will typically be disrupted, even for those prefixes which were
   advertised before the limit was reached. In many cases, the result of
   not limiting the number of received BGP prefixes can be much worse
   than either case just mentioned.  If the BGP receiver becomes
   overloaded, it can fail and affect many or all of its peers.  The
   effects of the disruptions caused by lost peering sessions and device
   failures propagate through the Internet, leading to instability as
   described in detail in [BGP-STUDY]. Other undesirable effects include
   resource utilization on the peers from restarting the peering
   session, and the processing load and bandwidth utilization from
   withdrawing and re-advertising the prefixes throughout the Internet.
   Other issues arising out of this are described in section 6.


   The disruption may be due to network changes, misconfigurations,
   miscommunications, or other factors where the number of prefixes
   advertised from a BGP sender to the receiver exceeds the expected
   number, and the configurations must be revised. It may be due to a
   specific configuration which is functioning properly in order to
   prevent an overload condition, or it may occur when the receiving BGP
   speaker becomes overloaded and suffers various consequences. Two
   newer sources of route overload are Virtual Private Network (VPN)
   services and denial of service (DoS) attacks. A denial of service
   attack that sends additional more-specific routes to a BGP speaker
   can overload the routing table.  In VPNs, a sudden increase in
   routes may be a true addition of routes, a misconfiguration, or a
   denial of service attack.


   A basic functionality is proposed here for BGP speakers to exchange
   three prefix limits per AFI/SAFI pair: warning, stop receiving, and
   disconnect limits. BGP [BGP-4] peers coordinate several types of
   information sent via the CAPABILITIES listed in the OPEN message or
   the capabilities sent via the CAPABILITY message (Dynamic
   capabilities). The BGP peers negotiate routes that will be sent in
   Route Refreshes via the Outbound Route Filters (ORFs).  This draft
   proposes:


   1. An OPEN message carrying a BGP capability [BGP-CAP] that holds the
   proposed parameter.


   2. A new Route Filter type for the OutBound Route Filter community or
   Extended community [ORF,ASPATH ORF].


3.  Definition of Prefix limit


   Prefix limit is encoded as an optional capability parameter [BGP-CAP]
   in the BGP OPEN message [BGP-4].  In addition, for dynamic re-
   adjustment of these capabilities the Prefix limit TLV can be included
   in:
       - Dynamic Capability negotiation (described in section 3.5),
       - ORF of Type Prefix (described in section 3.6), and
       - error messages related to Dynamic capabilities (section 5.3),
         or CEASE codes (section 5.4).


   If more than one of these features specifies a maximum prefix limit,
   the order of precedence is: Dynamic Capability, ORF, Inform, and Soft
   Notify. That is, Dynamic Capability negotiation takes priority over
   the other mechanisms.


   The required fields in the Maximum Prefix TLV are sub-code 1 through
   sub-code 3, which MUST be present in the Maximum Prefix TLV. The
   optional fields MAY be present in the Maximum Prefix TLV.


3.1 Layout of Bytes


     0      7       15
     +---------+---------+
     |code     |length   |
     |         |         |
     |1 octet  | 1 octet |
     +---------+---------+


     0      7       15         23
     +---------+---------+----------+
     |       AFI         | SAFI     |
     |                   |          |
     |2 octets           | 1 octet  |
     +---------+---------+----------+
     0      7          15         23          55
     +---------+---------+----------+----------+
     |sub      |         |warning   |warning   |
     |code 1   | length  |indicator |prefix    |
     |[Warn]   |         |          |limit     |
     | 1 octet | 1 octet | 1 octet  | 4 octets |
     +---------+---------+----------+----------+


     0      7          15         23          55
     +---------+---------+----------+-----------+
     |sub      |         |stop      |stop adver-|
     |code 2   | length  |advertise-|tisement   |
     |[stop]   |         |ment      |prefix     |
     |         |         |action    |limit      |
     | 1 octet | 1 octet |1 octet   | 4 octets  |
     +---------+---------+----------+-----------+


     0        7        15         23          55
     +---------+---------+----------+-----------+
     |sub      |         |reset     |reset peer-|
     |code 3   | length  |peering   |ing prefix |
     |         |         |action    |limit      |
     | 1 octet | 1 octet | 1 octet  | 4 octets  |
     +---------+---------+----------+-----------+


     0        7       15
     +---------+---------+
     |  option  length   |
     |                   |
     |  2 octets         |
     +---------+---------+


     0      7          15          47
     +---------+----------+-----------+
     |sub      |          |current Rx |
     |code 4   |length    |routes     |
     |[CurRX]  |          |           |
     | 1 octet | 1 octet  | 4 octets  |
     +---------+----------+-----------+


     0      7       15          47
     +---------+----------+-----------+
     |sub      |          |current Tx |
     |code 5   |length    |routes     |
     |[CurTX]  |          |           |
     | 1 octet | 1 octet  | 4 octets  |
     +---------+----------+-----------+


     0         7       15
     +---------+-------+
     |sub      |       |
     |code 6   |length |
     |[pfxln]  |       |
     |1 octet  |1 octet|
     +---------+-------+
     0       7       15         47          79       111
     +-------+--------+----------+-----------+---------+
     |prefix |action  |warning   |stop adver-|reset    |
     |length |flags   |indicator |tisement   |peering  |
     |  1    |for     |limit for |limit for  |limit for|
     |       |limits  |prefix    |prefix     |prefix   |
     |       |for     |length-1  |length-1   |length-1 |
     |       |prefix  |          |           |         |
     |       |length-1|          |           |         |
     |1 octet|1 octet |4 octets  |4 octets   |4 octets |
     +-------+--------+----------+-----------+---------+
                       .
                       .
                       .
                       .
     +-------+--------+----------+-----------+---------+
     |prefix |action  |warning   |stop adver-|reset    |
     |length |flags   |indicator |tisement   |peering  |
     |  n    |for     |limit for |limit for  |limit for|
     |       |limits  |prefix    |prefix     |prefix   |
     |       |for     |length-n  |length-n   |length-n |
     |       |prefix  |          |           |         |
     |       |length-n|          |           |         |
     |1 octet|1 octet |4 octets  |4 octets   |4 octets |
     +-------+--------+----------+-----------+---------+


     0      7      15
     +-------+-------+
     |sub    |       |
     |code 7 |length |
     |[orfmx]|       |
     +-------+-------+
     0          7         39          71          103       111
     +----------+----------+-----------+------------+---------+------+
     | action   |warning   |stop Adver-|reset peer- |ORF type |ORF   |
     | flags    |indicator |tisement   |ing         |         |Info  |
     |for ORF   |prefix    |prefix     |prefix      |         |      |
     |match     |limit for |limit for  |limit for   |         |      |
     |          |ORF match |ORF Match  |ORF match   |         |      |
     |          |          |           |            |         |      |
     |          |(4 octets)|(4 octets) |(4 octets)  |(1 octet)|      |
     +----------+----------+-----------+------------+---------+------+


3.2 Byte definitions


   Meaning for each of the bitwise indicated capability fields above is
   as follows:


   Type-Code (1 octet):


   code identifying this capability (TBD)


   Length (1 octet):


   The required portion of the Prefix limit TLV is 28 octets and
   includes the type-code, length, AFI, SAFI, sub-codes 1-3, and the
   optional length.  The optional portion of the Prefix limit TLV is
   variable in length, depending on the information carried.  If the
   length exceeds 254 octets, the length byte is set to 255 and the
   length is determined by adding 28 to the number of octets given in
   the optional length field.
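

   The 28-octet figure can be checked against the layout in section
   3.1: 2 octets of code and length, 3 octets of AFI/SAFI, three
   7-octet sub-codes, and the 2-octet option length.  The following
   Python sketch packs the required portion under several assumptions
   that the draft does not state: the capability code is a placeholder
   (it is TBD), each sub-code length counts the 5 value octets that
   follow it, and the option length counts the optional octets.

```python
import struct

PREFIX_LIMIT_CODE = 0xFF  # hypothetical placeholder; the real code is TBD


def sub_tlv(sub_code: int, action: int, limit: int) -> bytes:
    """Pack one of sub-codes 1-3: sub-code, length, action, 4-octet limit."""
    # Assumption: the length octet counts the action and limit octets (5).
    return struct.pack("!BBBI", sub_code, 5, action, limit)


def encode_prefix_limit(afi, safi, warn, stop, reset, optional=b""):
    """warn, stop and reset are (indicator_or_action, limit) pairs."""
    body = struct.pack("!HB", afi, safi)
    body += sub_tlv(1, *warn) + sub_tlv(2, *stop) + sub_tlv(3, *reset)
    body += struct.pack("!H", len(optional)) + optional
    total = 2 + len(body)  # code and length octets plus everything after
    # Above 254 octets the length byte is pinned to 255 and the receiver
    # uses 28 plus the option length instead.
    length = total if total <= 254 else 255
    return struct.pack("!BB", PREFIX_LIMIT_CODE, length) + body
```

   With no optional sub-codes the result is exactly 28 octets, and the
   option length field sits at octets 26-27 of the TLV.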


   Address Family Identifier AFI (2 octets):


   This along with the Subsequent Address Family Identifier field
   identifies the Network Layer Protocol associated with the Network
   Address.


   Subsequent Address Family Identifier SAFI (1 octet):


   This along with the Address Family Identifier field identifies the
   Network Layer Protocol associated with the Network Address.


   sub code 1 (1 octet):


   It is used to identify the number of routes sent before a warning is
   raised. The warning is raised by the BGP speaker that detects the
   condition.


   Warning Indicator (1 octet):


   This octet can be assigned a value of 0, 1 or 2.


   A value of 0 means that the sender SHOULD NOT raise any warning. The
   warning mechanisms are described in the operation section of this
   draft. A value of 1 means the warning indication is necessary and
   SHOULD be used by the sender when the number of routes it has
   advertised reaches the warning prefix limit. If a BGP informational
   message is supported (such as the BGP INFORM), a value of 2
   indicates that such a BGP message will be transmitted to the remote
   peer if the route advertisement limit is hit.


   Warning prefix limit (4 octets):


   Number of routes sent by the BGP sender. The value for this field is
   dependent on the maximum prefix limit and SHOULD always be less than
   it.


   sub code 2 (1 octet):


   It is used to identify the number of routes sent before the sender
   BGP speaker needs to stop advertising routes to its receiving BGP
   speaker.


   Stop Advertisement action (1 octet):


   This octet can be set to 0, 1 or 2.


   Setting the value to 0 means the BGP speaker will ignore any routes
   sent after the stop advertisement limit. Setting the value to 1 means
   that the route advertisement MUST be stopped by the speaker when the
   route advertisement limit is hit.  It is implicit that whichever
   speaker encounters the situation will stop advertisement to its peer.
   If a BGP informational message is supported (such as the BGP INFORM),
   a value of 2 indicates that such a BGP message will be transmitted to
   the remote peer if the route advertisement limit is hit.


   maximum prefix limit (4 octets):


   Number of routes sent by the sender BGP speaker.


   sub code 3 (1 octet):


   It is used to identify the number of routes received after which the
   BGP speaker will reset the peering session. Note that this situation
   will never be encountered if both peers adhere to this draft. In
   other words, it happens only during error conditions. The error
   conditions are beyond the scope of this document.


   reset peering action (1 octet):


   This field can be set to 0, 1 or 2.


   If the field is 0, the BGP speaker will reset the peering session if
   the number of routes exceeds the reset prefix limit. If the field is
   1, the BGP peer will reset the peering session and hold it down
   until a manual restart occurs. If the field is 2, the BGP peer will
   reset the peering session via mechanisms such as soft-notify.


   reset prefix limit (4 octets):


   Number of routes sent by the sender BGP speaker. The value for this
   field is dependent on the maximum prefix limit and SHOULD always be
   greater than it.


   optional parameter length (2 octets):


   The value of this optional variable length is 13 octets plus the
   additional 29 bits of reserve field. This value can change when more
   sub codes are added.


   sub code 4:


   The BGP speaker uses this sub-code to indicate to its peer the
   current count of the routes it received from it.


   current Rx routes:


   Number of routes received by the BGP speaker from its peer. The value
   of this field SHOULD always be less than or equal to the maximum
   prefix limit configured to receive from the peer.


   sub code 5:


   The BGP speaker uses this sub-code to indicate to its peer the
   current count of the routes sent to it.


   current Tx routes:


   Number of routes sent by the BGP speaker to its peer. The value of
   this field SHOULD always be less than or equal to the maximum prefix
   limit it received from the peer in the capability.


   sub code 6:


   The BGP speakers use this sub-code to indicate a prefix-length based
   set of limits: (warning limit, stop advertisement limit, and reset
   limit). The field carries an action flag that indicates actions that
   occur for all prefixes that hit limits, and the limits per length of
   the prefix.  An example of a length of a prefix is length 19 for all
   /19 routes.  All /19 routes will have a warning limit, a stop
   advertisement limit and a reset limit.  Only one sub-code 6
   parameter may be present in a Prefix limit TLV.


   prefix length-1:


   The length (in bits) of the prefix group.


   action flags for prefix length-1:


   The action flag octet carries the set of action flags for all
   prefixes of this length in the following bit pattern (one bit per
   position, with the two leading bits set to zero):


   00WWSSRR


   The WW bits can be set to the warning indicator values (0,1,2)
   indicated in sub-code 1. The SS bits can be set to the stop
   advertisement action values (0,1,2) indicated in sub-code 2. The RR
   bits can be set to the reset action values (0,1,2) indicated in
   sub-code 3.
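

   The bit pattern above can be packed and unpacked with shifts and
   masks.  The following Python sketch assumes the pattern is read most
   significant bit first, so that the two leading zero bits are the
   high-order bits of the octet; the function names are illustrative
   and not part of this draft.

```python
def pack_action_flags(warning: int, stop: int, reset: int) -> int:
    """Build the action flag octet from the WW, SS and RR values."""
    for value in (warning, stop, reset):
        if value not in (0, 1, 2):
            raise ValueError("each action value must be 0, 1 or 2")
    return (warning << 4) | (stop << 2) | reset


def unpack_action_flags(octet: int) -> tuple:
    """Split the action flag octet into (warning, stop, reset)."""
    if octet & 0xC0:
        raise ValueError("the two high-order bits must be zero")
    return (octet >> 4) & 0x3, (octet >> 2) & 0x3, octet & 0x3
```

   For example, a warning indicator of 1, a stop advertisement action
   of 2 and a reset action of 0 pack into the octet 0x18.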


   warning prefix limit for prefix length-1:


   The warning limit for the prefix length-1.


   stop advertisement limit for prefix length-1:


   The stop advertisement prefix limit for prefix length-1.


   reset peering limit for prefix length-1:


   The reset peering route limit for the prefix of length-1.
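

   Each per-prefix-length entry in sub-code 6 occupies 14 octets: one
   octet of prefix length, one octet of action flags, and three
   4-octet limits.  The following Python sketch packs a list of such
   entries.  It assumes, without support from the draft text, that the
   sub-code length octet counts the entry octets that follow it.

```python
import struct


def encode_sub_code_6(entries) -> bytes:
    """entries: iterable of (prefix_len, flags, warning, stop, reset)."""
    body = b"".join(struct.pack("!BBIII", *entry) for entry in entries)
    if len(body) > 255:
        # Assumption: a one-octet length cannot describe more than 255 octets.
        raise ValueError("too many entries for a one-octet length")
    return struct.pack("!BB", 6, len(body)) + body
```

   A single entry for /19 routes therefore encodes to 16 octets: the
   2-octet sub-code header followed by one 14-octet entry.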


   sub code 7:


   Sub-code 7 allows the three basic prefix limits to be specified for
   the set of prefixes matching an ORF. Multiple sub-code 7 TLVs may be
   present in a Prefix limit TLV.


   Action flags for ORF:


   The action flag definitions are the same as for the action-flag for
   sub-code 6 (prefix length).


   warning indicator prefix limit for ORF match:


   The warning indicator prefix limit for any prefix that matches the
   ORF filter.


   stop advertisement prefix limit for ORF match:


   The stop advertisement prefix limit for any prefix that matches the
   ORF filter.


   reset peering prefix limit for ORF Match:


   The reset peering prefix limit for any prefix that matches the ORF
   filter.


   We refer to the warning prefix limit, maximum prefix limit and the
   reset prefix limit as prefix limits in this document for the ease of
   illustration.


   3.3.  Carrying Prefix limits in the Open Capabilities


   The BGP OPEN capabilities field uses the following triples:
   <Capability Code, Capability Length, Capability Value>, where each
   triple is encoded as shown below:
          +------------------------------+
          | Capability Code (1 octet)    |
          +------------------------------+
          | Capability Length (1 octet)  |
          +------------------------------+
          | Capability Value (variable)  |
          +------------------------------+


   The BGP Maximum Prefix Capability value is to be assigned by IANA.
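

   The triple encoding shown above is a plain code/length/value
   layout.  The following Python sketch encodes one triple and walks a
   sequence of them; the function names are illustrative, and any
   capability code passed in is the caller's choice since the real
   value has not been assigned.

```python
import struct


def encode_capability(code: int, value: bytes) -> bytes:
    """Pack one <Capability Code, Capability Length, Capability Value>."""
    if len(value) > 255:
        raise ValueError("capability value does not fit a one-octet length")
    return struct.pack("!BB", code, len(value)) + value


def decode_capabilities(data: bytes) -> list:
    """Return the (code, value) pairs found in a run of triples."""
    capabilities = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated capability header")
        code, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated capability value")
        capabilities.append((code, value))
        offset += length
    return capabilities
```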


   3.4. Interaction between sub-codes 6-7 and sub-codes 1-3


   If sub-code 6 or sub-code 7 is specified within the TLV, it cannot
   specify the 0/0 prefix length or an ORF match that matches all
   routes.


   3.5. Carrying Maximum Prefix Limits in the Dynamic Capabilities


   The BGP Dynamic Capabilities are carried in the Capability message
   (Message type 6), and use the following fields:


                  +------------------------------+
                  | Action (1 octet)             |
                  +------------------------------+
                  | Capability Code (1 octet)    |
                  +------------------------------+
                  | Capability Length (1 octet)  |
                  +------------------------------+
                  | Capability Value (variable)  |
                  +------------------------------+


   An Action code of "0" in a dynamic capability adds the maximum prefix
   limits specified in the TLV for the corresponding AFI/SAFI. An
   Action code of "1" removes the prefix limits for a particular
   AFI/SAFI. An Action code of "0" followed by another Action code of
   "0" writes over the required fields, and provides an exclusive OR of
   the optional fields.
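

   The add and remove actions can be pictured as updates to a table of
   limits keyed by AFI/SAFI.  The following Python sketch covers only
   the required fields; it deliberately leaves out the merging of
   optional fields, and the table layout and names are assumptions
   made for illustration.

```python
ACTION_ADD = 0     # adds, or overwrites, the limits for an AFI/SAFI
ACTION_REMOVE = 1  # removes the limits for an AFI/SAFI


def apply_dynamic_capability(table, action, afi, safi, required=None):
    """Update a {(afi, safi): required-limits} table in place."""
    key = (afi, safi)
    if action == ACTION_ADD:
        if required is None:
            raise ValueError("an add action needs the required limits")
        # A second add for the same AFI/SAFI writes over the first.
        table[key] = dict(required)
    elif action == ACTION_REMOVE:
        table.pop(key, None)
    else:
        raise ValueError("unknown action code")
    return table
```

   Two consecutive add actions for the same AFI/SAFI leave only the
   second set of required limits in the table.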


   3.6. Carrying Maximum Prefix in ORF Match Field in BGP Route Refresh


            +--------------------------------------------------+
            | Address Family Identifier (2 octets)             |
            +--------------------------------------------------+
            | Reserved (1 octet)                               |
            +--------------------------------------------------+
            | Subsequent Address Family Identifier (1 octet)   |
            +--------------------------------------------------+
            | When-to-refresh (1 octet)                        |
            +--------------------------------------------------+
            | ORF Type  = Maximum Prefix (08)                  |
            +--------------------------------------------------+
            | Length of ORFs (2 octets)                        |
            +--------------------------------------------------+
            | First Maximum Prefix ORF  sub-code (TLV 1-7)     |
            +--------------------------------------------------+


            +--------------------------------------------------+
            | Second Maximum Prefix ORF sub-code (TLV 1-7)     |
            +--------------------------------------------------+
                ...
            +--------------------------------------------------+
            | Nth Maximum Prefix ORF sub-code (TLV 1-7)        |
            +--------------------------------------------------+


   ORF entries are carried in the BGP ROUTE-REFRESH message [BGP-RR]. A
   single ROUTE-REFRESH message could carry multiple ORF entries, as
   long as all these entries share the same AFI/SAFI.


   From the encoding point of view each ORF entry consists of a common
   part and type-specific part. The common part consists of <AFI/SAFI,
   ORF-Type, Action, Match>.


   The "When-to-refresh" field can be one of IMMEDIATE (0x01) or DEFER
   (0x02), the semantics and operation of which are described in
   [BGP-CRF]. Following this field is a collection of one or more ORFs,
   grouped by ORF-Type.  ORF entries of the Maximum Prefix ORF type can
   be intermixed with other ORF entries. If an ORF is specific to the
   Maximum Prefix limits, the ORF sub-code (sub-code 7) should be
   utilized to specify it.


   The ORF-Type component is encoded as a one-octet field. The value 0
   is reserved. The values currently proposed to be assigned are:


     1. reserved (00)
     2. Community (02)
     3. Extended Community (03)
     4. AsPath (xx)
     5. Prefix (64)
     6. Maximum Prefix (08)
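

   The fixed part of the layout at the start of this section is 8
   octets, followed by the Maximum Prefix sub-codes.  The following
   Python sketch packs it, assuming that the "Length of ORFs" field
   counts the octets of the sub-codes that follow it; the function name
   is illustrative only.

```python
import struct

IMMEDIATE = 0x01
DEFER = 0x02
ORF_TYPE_MAX_PREFIX = 0x08  # proposed value from the list above


def encode_max_prefix_orf(afi, safi, when_to_refresh, sub_tlvs) -> bytes:
    """sub_tlvs: already-encoded Maximum Prefix sub-codes, as bytes."""
    if when_to_refresh not in (IMMEDIATE, DEFER):
        raise ValueError("When-to-refresh must be IMMEDIATE or DEFER")
    orfs = b"".join(sub_tlvs)
    # AFI (2), Reserved (1), SAFI (1), When-to-refresh (1),
    # ORF Type (1), Length of ORFs (2)
    header = struct.pack("!HBBBBH", afi, 0, safi, when_to_refresh,
                         ORF_TYPE_MAX_PREFIX, len(orfs))
    return header + orfs
```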


3.7. Carrying the Maximum Prefix in a Soft Notify [BGP-SOFT-NOTIFY]


    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       AFI                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       SAFI    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Type-code               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Sub-code                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Length                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Variable Data TLV       |
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   The type code of 3 will indicate that a prefix maximum has been
   exceeded. The sub-code will indicate which type of prefix maximum has
   been exceeded. The value of <1> will indicate that a warning prefix
   maximum has been exceeded, the value of <2> will indicate that a
   stop advertisement prefix maximum has been exceeded, and the value
   of <0> will indicate that a reset peering prefix maximum has been
   exceeded.


   The length specifies the length of the optional portion of the soft-
   notify. The variable portion of the soft-notify SHOULD contain the
   required fields of the Maximum prefix field.  The variable Data TLV
   MAY also contain the optional fields.


4.  Operation


4.1 Exchanging the configured prefix limits


   BGP speakers exchange the prefix limits as an optional capability
   parameter [BGP-CAP] as described in section 3.


           +--------+                     +--------+
           |    A   | <-----------------> |    B   |
           +--------+                     +--------+


                           Figure 1


   In figure 1 both BGP speakers A and B exchange the prefix limits to
   indicate support for this capability. A and B each set the warning
   prefix limit, maximum prefix limit and reset prefix limit, along
   with the actions associated with each of them, in the capability
   before sending it. The warning prefix limit and reset prefix limit
   values are determined based on the configured maximum prefix limit.
   They are typically a percentage of the maximum prefix limit. The
   exact percentage values are beyond the scope of this document. The
   maximum prefix limit configured on A for the peer B is the maximum
   number of prefixes that A expects to receive from B. A informs B of
   this limit in the new capability described in section 3. The same
   interpretation applies to B.
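

   As an illustration of deriving the warning and reset limits from
   the configured maximum, the following Python sketch uses 80 percent
   and 120 percent.  These percentages are hypothetical defaults chosen
   for the example; this document does not specify them.

```python
def derive_limits(max_prefixes, warning_pct=80, reset_pct=120):
    """Return (warning, maximum, reset) limits for one peer."""
    if max_prefixes <= 0:
        raise ValueError("the maximum prefix limit must be positive")
    if not 0 < warning_pct < 100 < reset_pct:
        raise ValueError("need warning_pct < 100 < reset_pct")
    # Round the warning limit down and the reset limit up so that
    # warning < maximum < reset always holds.
    warning = max_prefixes * warning_pct // 100
    reset = -(-max_prefixes * reset_pct // 100)
    return warning, max_prefixes, reset
```

   With a configured maximum of 2000 prefixes, these defaults give a
   warning limit of 1600 and a reset limit of 2400.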


4.2 Dynamic Capability Reset of the Capability


   Dynamic Capabilities can set the BGP speaker's maximum prefix values
   (warning indicator, stop advertisement, and reset peering values) to
   values different from those initially negotiated via the OPEN
   Capabilities.
   The exact mechanisms for the decision to reset the values are outside
   the scope of this specification. Figure 2 indicates how the dynamic
   capability can be utilized when a prefix limit is detected by the BGP
   speaker.


           +--------+                     +--------+
           |    A   | <-----------------> |    B   |
           +--------+                     +--------+


                                         B detects warning
                                         prefix limit
                                <------  generates dynamic
                                         capability message
                                         to A
                          Figure 2


   4.2.1 Dynamic Capability use of Sub-code 4 (Current Received Route)
   and Sub-Code 5 (Current Transmit Routes)


   Sub-code 4 (Current Received Routes) and sub-code 5 (Current
   Transmit Routes) provide information to the BGP speakers which aids
   in preventing peer disruption. Figure 2 demonstrates the case where
   BGP speakers A and B maintain a count of the routes they receive
   from each other. Route processing operation is illustrated using the
   case where B sends route advertisements to A. (The same operational
   procedures apply for the other case of A sending route
   advertisements to B.)


   B, as shown in figure 1, applies the outbound route policies to the
   Adjacent-Rib-Out and then checks the prefix limits before
   advertising routes. Upon hitting the warning indicator prefix limit,
   BGP speaker B sends a Dynamic Capability message to A with 5
   sub-codes: warning indicator (sub-code 1), stop advertisement
   (sub-code 2), reset peering indicator (sub-code 3), Current Receive
   Routes (sub-code 4), and Current Transmit Routes (sub-code 5). The
   additional sub-codes, i.e. 4 and 5, provide information that assists
   the network administrators in prioritizing the handling of the
   warning. For example, if the limits are 1000 routes for warning,
   2000 for stop advertisement, and 3000 for reset peering, and the
   current routes are 1010, then the network operator can deduce that
   the received routes are well within the tolerance limit, i.e.
   sub-code 2. If instead, for the same limits (1000, 2000, 3000), the
   current received routes (by speaker B) number 1900, the network
   operator may want to investigate the customer changes.
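

   The comparison an operator makes in this example can be sketched in
   Python as follows.  The sketch assumes that reaching the warning or
   stop advertisement limit counts as hitting it, while the reset limit
   must be exceeded; the function names are illustrative only.

```python
def classify(current, warning, stop, reset):
    """Name the highest limit the current route count has crossed."""
    if not warning < stop < reset:
        raise ValueError("limits must satisfy warning < stop < reset")
    if current > reset:
        return "reset"
    if current >= stop:
        return "stop"
    if current >= warning:
        return "warning"
    return "ok"


def headroom(current, stop):
    """Routes that can still be sent before the stop limit is hit."""
    return max(stop - current, 0)
```

   With limits of 1000, 2000 and 3000, a count of 1010 is in the
   warning range with 990 routes of headroom, while a count of 1900 is
   in the same range with only 100 routes left.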


   In figure 2 it can be seen that, in the course of route
   advertisements to A, B generates a dynamic capability [BGP-DYN-CAP]
   destined to A comprising sub-codes 1-5.  B sends this message in
   this case because it detects the warning limit at the time of route
   advertisement earlier than A does. In other words, either A or B or
   both of them could generate this message, depending on the timing of
   warning limit detection. B and A MAY choose to raise an internal
   warning when this condition is detected. Following the warnings,
   both A and B continue advertising routes normally to each other.


   If B determines that the prefix limits can be increased, the BGP
   speaker MAY send these changed values in the Dynamic capability
   along with sub-codes 1-5.


   In figure 3, B detects during route advertisement that the maximum
   prefix limit for route advertisement is reached. It SHOULD stop
   further route advertisements to A. In other words, in this condition
   B SHOULD behave as if the announce policy to A is stop/deny. B then
   SHOULD send a Dynamic Capability [BGP-DYN-CAP] to A indicating the
   current Receive and Transmit routes (sub-code 4 and sub-code 5). As
   in the case of the warning prefix limit condition, either A or B or
   both could send the dynamic capability [BGP-DYN-CAP]. Any route
   withdrawal to A is automatically recorded and SHOULD implicitly
   result in restoring the announce policy to the configured one (if
   any is configured). This helps in preserving the incremental nature
   of the protocol and in avoiding processing of routes by peers such
   as B which would be discarded by speakers such as A when the limit
   is reached. In addition, network bandwidth consumption by the route
   UPDATEs can be avoided. It is expected that conformance to this
   document will not lead to any further route advertisements to A by B
   unless there exists an unforeseen error. In such a situation A can
   reset the peering session as indicated in the maximum prefix limit
   sent to B during the capability negotiation.


           +--------+                     +--------+
           |    A   | <-----------------> |    B   |
           +--------+                     +--------+


                                         B detects stop advertisement
                                         maximum prefix limit
                                         and generates dynamic
                                <------  capability message
                                         to provide additional
                                         information.
                          Figure 3
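

   The sender-side behavior around figure 3 can be sketched as a small
   state holder in Python.  The class and its bookkeeping are
   assumptions made for illustration: it counts advertised prefixes,
   refuses new ones once the maximum prefix limit is reached, and
   allows them again after a withdrawal frees room.

```python
class SenderState:
    """Tracks what B has advertised to A against A's maximum limit."""

    def __init__(self, max_prefixes):
        self.max_prefixes = max_prefixes
        self.advertised = set()
        self.suppressed = set()  # prefixes held back by the limit

    @property
    def stopped(self):
        return len(self.advertised) >= self.max_prefixes

    def advertise(self, prefix):
        """Return True if the prefix may be sent to the peer."""
        if prefix in self.advertised:
            return True
        if self.stopped:
            self.suppressed.add(prefix)
            return False
        self.suppressed.discard(prefix)
        self.advertised.add(prefix)
        return True

    def withdraw(self, prefix):
        """Record a withdrawal, which may lift the stopped condition."""
        self.advertised.discard(prefix)
        self.suppressed.discard(prefix)
```

   Keeping the suppressed prefixes in a separate set is one way to let
   the sender later advertise only the routes that were held back by
   the limit.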


   4.2.2 Prefix limit changes Utilizing Dynamic Capabilities


   If a need for prefix limits change arises, each BGP speaker A whose
   configuration changes for its peer B, SHOULD dynamically [BGP-DYN-
   CAP] inform the corresponding peer of this change. Such changes
   SHOULD be handled as described in the following sub-sections.


   4.2.2.1 Processing when maximum prefix limit is increased


   When the prefix limits are increased in the configuration of A, in
   figure 1, it SHOULD inform B about it as described in 4.2. B SHOULD
   then restart the route advertisements and it MAY either choose to do
   so from the Adjacent-Rib-Out for A incrementally or make use of Route
   Refresh mechanism [BGP-RREFRESH], if it has stopped because of
   reaching the maximum prefix limit. The former methodology is similar
   to the approach taken prior to the introduction of Route Refresh. In
   other words it can be handled in the way policy changes were handled
   prior to the availability of Route Refresh mechanism, with a minor
   change of just sending the routes that were rejected due to the
   prefix limit. In doing so, the restart of BGP peering and the
   network traffic and service disruption associated with it are
   avoided. If the maximum prefix limit is not reached and increased
   prefix limits are received by the peer B, then peer B SHOULD note
   this and continue with its advertisements to A until these limits are
   reached.
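
   The sender-side handling of an increased limit can be sketched as
   follows. This is a minimal illustration in Python and not part of the
   protocol specification. The class PeerAdvertiser, its fields, and the
   modelling of routes as plain strings are all assumptions made for the
   example.

```python
from dataclasses import dataclass, field


@dataclass
class PeerAdvertiser:
    """Hypothetical model of B's advertisement state toward peer A."""
    max_prefixes: int                                # limit learned from A
    advertised: list = field(default_factory=list)   # routes already sent to A
    held_back: list = field(default_factory=list)    # routes rejected due to the limit

    def advertise(self, route: str) -> bool:
        """Send a route if the limit allows it, otherwise hold it back."""
        if len(self.advertised) < self.max_prefixes:
            self.advertised.append(route)
            return True
        self.held_back.append(route)
        return False

    def on_limit_increased(self, new_limit: int) -> list:
        """Note the new limit and incrementally send held-back routes.

        Returns the routes that were sent as a result of the increase.
        If the old limit had not been reached, nothing is held back and
        the returned list is empty.
        """
        self.max_prefixes = new_limit
        room = max(0, new_limit - len(self.advertised))
        resumed = self.held_back[:room]
        self.held_back = self.held_back[room:]
        self.advertised.extend(resumed)
        return resumed
```

   Only the routes that were previously held back are sent, which
   corresponds to the incremental option above. An implementation that
   uses Route Refresh instead would re-advertise from its Adjacent-Rib-
   Out rather than keep a separate held-back list.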


   4.2.2.2 Processing when the maximum prefix limit is decreased


   When the prefix limits are decreased in the configuration of A (see
   Figure 1), A SHOULD inform B as described in 4.2. B SHOULD then note
   this information and SHOULD stop route advertisement immediately if
   the number of route advertisements exceeds the new maximum prefix
   limit for A. By doing so, B avoids processing routes that A would
   discard on detecting the maximum prefix limit condition. A discards
   such routes even before adding them to its Adjacent-Rib-In for the
   peer, or in some cases restarts the peering session. Additionally,
   the network bandwidth that the routing UPDATEs would consume is
   saved. From that point, B follows the process described in 4.2 for
   route processing.
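
   The decision B makes on receiving a decreased limit can be sketched
   as follows. This is a minimal Python illustration under stated
   assumptions: the names DecreaseDecision and on_limit_decreased are
   hypothetical, and the handling of a count that exactly equals the new
   limit (not flagged as exceeded, but with no room left) is an
   assumption, since the text above only covers the case where the count
   exceeds the limit.

```python
from typing import NamedTuple


class DecreaseDecision(NamedTuple):
    stop_advertising: bool   # True when B already exceeds the new limit
    remaining: int           # further routes B may still send to A


def on_limit_decreased(advertised_count: int, new_limit: int) -> DecreaseDecision:
    """Decide how B reacts when A lowers its maximum prefix limit."""
    if advertised_count < 0 or new_limit < 0:
        raise ValueError("counts and limits must be non-negative")
    exceeded = advertised_count > new_limit
    remaining = max(0, new_limit - advertised_count)
    return DecreaseDecision(exceeded, remaining)
```

   For example, with 120 routes advertised and a new limit of 100, the
   sketch reports that B should stop advertising and has no room left.
   With 80 routes advertised, B may continue for another 20 routes.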


4.3 ORF based processing


   The ORF filters can be carried either in the dynamic capability or in
   the Route Refresh message. The processing of the Route Refresh and
   ORF is described in 3.5 and [BGP-CRF].


4.4 Soft Notify processing


   Soft Notify processing is restricted at first to sub-codes 1-5.  Use
   of sub-codes 6 and 7 in Soft Notify is left for future study.


4.5 Prefix Length based limits processing


   All of the operational procedures described in sections 4.1 through
   4.4 are applicable to the negotiated prefix length based limits.


5. Error Handling


   The Maximum Prefix TLV can be sent in an OPEN (message 1), a Route
   Refresh (message 5), or a Capability (message 6). The sections below
   define the error codes and sub-codes that this draft associates with
   these messages.


   5.1  Open Message responded to with Notification


   A BGP speaker can reject an OPEN message because of the unsupported
   capabilities it lists. Such a rejection is signalled with the OPEN
   Message Error sub-code 7 (Unsupported Capability) [BGP-CAP].  The
   Maximum Prefix TLV will be included in the list of capabilities.


   5.2 Route Refresh caused Notification Errors


   [BGP-RREFRESH] does not specify error messages associated with the
   Route-Refresh processing.


   5.3 Capability Message responded to with Notification Errors


   For errors in Dynamic Capabilities, a NOTIFICATION message may be
   sent with the Capability Message Error code (7) [BGP-DYN-CAP] set.
   The current sub-codes for this error code are:


          Subcode        Symbolic Name


           1           Invalid Action Value
           2           Invalid Capability Length
           3           Malformed Capability Value
           4           Unsupported Capability Code


   Support for the Maximum Prefix value negotiations will require the
   addition of the following sub-code:


           5           Invalid Capability Value


   If the Maximum Prefix capability code is not supported, the
   NOTIFICATION message will be returned with an error code of 7 and a
   sub-code of 4 (Unsupported Capability Code).  If the Maximum Prefix
   capability is supported but the value is not acceptable to the
   receiving node, the NOTIFICATION can be sent with sub-code 5 (Invalid
   Capability Value) and the data field set to the Maximum Prefix TLVs
   that are not acceptable.
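
   The sub-code selection above can be sketched as follows. This is a
   minimal Python illustration, not a normative encoding. The function
   and constant names are hypothetical, the rejected Maximum Prefix TLVs
   are treated as opaque bytes, and leaving the data field empty in the
   unsupported case is an assumption. The message layout used is the
   standard BGP header (16-octet marker, 2-octet length, 1-octet type)
   followed by the error code, sub-code and data.

```python
import struct

BGP_MARKER = b"\xff" * 16           # 16-octet marker, all ones
MSG_NOTIFICATION = 3                # BGP message type for NOTIFICATION
ERR_CAPABILITY_MESSAGE = 7          # error code cited from [BGP-DYN-CAP]
SUB_UNSUPPORTED_CAPABILITY = 4      # Unsupported Capability Code
SUB_INVALID_CAPABILITY_VALUE = 5    # sub-code proposed by this document


def capability_error(supported: bool, rejected_tlvs: bytes):
    """Pick (error code, sub-code, data) for a Maximum Prefix capability.

    Returns None when the capability is supported and no TLV was rejected.
    """
    if not supported:
        # Assumption: no data is attached in the unsupported case.
        return (ERR_CAPABILITY_MESSAGE, SUB_UNSUPPORTED_CAPABILITY, b"")
    if rejected_tlvs:
        return (ERR_CAPABILITY_MESSAGE, SUB_INVALID_CAPABILITY_VALUE,
                rejected_tlvs)
    return None


def encode_notification(code: int, subcode: int, data: bytes = b"") -> bytes:
    """Encode a NOTIFICATION message with the given code, sub-code and data."""
    length = 19 + 2 + len(data)     # 19-octet header + code + sub-code + data
    header = BGP_MARKER + struct.pack("!HB", length, MSG_NOTIFICATION)
    return header + struct.pack("!BB", code, subcode) + data
```

   A receiver that supports the capability but rejects the offered
   values would call capability_error(True, tlvs) and pass the result to
   encode_notification.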


   5.4 Cease message for peering reset


   When the reset maximum prefix value is exceeded, the peering session
   SHOULD be dropped, in which case the Cease code in the NOTIFICATION
   message will be used.  The proposed BGP draft [CEASECODE] assigns
   subcode 1 to the case where the maximum prefix limit is exceeded.
   The data field carries the maximum prefix upper bound.  The data
   field should also allow an optional 1-octet field after the upper
   bound, so that a maximum prefix sub-code can be encoded.


6.  Usage in Current Service Providers


   We provide an example to illustrate a typical Service Provider's (SP)
   practice with maximum prefix limits.  Providers can set up to three
   levels: Warning, Stop and Reset.  This section compares setting two
   limits (warning, stop/reset) with setting three limits (warning,
   stop, reset).


   6.1. Two limits (warning, stop/reset)


   The provider may set two threshold levels on the BGP receivers at
   the network edge: a low water mark as the warning threshold and a
   high water mark as the stop/reset level. The high water mark is
   intended to quickly detect and stop a misconfigured router that is
   sending a full blast of Internet routes.  However, for VPN clients
   the high water mark may also be exceeded by only a few routes as the
   routing tables grow.  Let's examine why this is problematic.


   When the warning threshold is triggered, SNMP traps are transmitted
   by the SP's BGP receiver (router) to the SP's management system. The
   operator needs to contact the customer upon receiving the trap. When
   the stop/reset threshold is reached, the BGP session may be dropped
   by the BGP receiver, which again generates traps on the provider
   side. (Some implementations may not drop the session, but silently
   drop the customer's routes or prefixes.) The operator then needs to
   work with the customer to correct the problem and restart the
   session.


   There are several issues with using only two levels. First, the
   provider has to prove to the customer that the session drop was due
   to the customer violating the agreed maximum prefix limit rather than
   to a condition in the operator's network. Keeping the warning traps
   may help, but session error codes stating that the maximum prefix
   limit was exceeded identify the reason for the BGP session drop more
   directly.


   Second, the operator has to work with the customer to locate the root
   cause and, most likely, manually bring the BGP peering session back
   up at an agreed time. This is labor intensive for both the operator
   and the customer.  If the stop/reset limit catches an upswing in VPN
   traffic from a site, the operator and customer must work in crisis
   mode to resolve the growth.  The customer may be more unhappy about
   session drops caused by growth than about those caused by
   misconfiguration.


   For the above reasons, as is common practice today, a provider may
   choose not to use the maximum prefix limit feature for its Internet
   services, to avoid these complications. The same provider may still
   choose to use the feature for customer connections in its MPLS VPN
   services, because of the edge device resource management needs that
   are particularly associated with VPN services.


   The issues of where to use and not to use the maximum prefix limit
   feature are beyond the scope of this draft.


   6.2. Three limits (warning, stop, reset)


   In this draft, we are promoting a proactive approach to dealing with
   maximum prefix limit issues. With reference to the example above on
   the relation of provider and customer edge devices (BGP senders), we
   propose that both the customer and the provider participate in
   setting these three levels of thresholds: warning, stop, and reset,
   and reacting to the resulting warnings, traps or error messages.
   Anytime a threshold is set or changed on either side, it is
   communicated to the remote side via BGP signalling, and both sides
   communicate dynamically whenever an unexpected event triggers any of
   the threshold levels.


   The warning level triggers the warning on both provider and customer
   edge devices, so the customer should act on it without waiting for
   the provider to call.


   The second level triggers the customer edge device to stop sending
   routes, as it is reaching the agreed max prefix limit. This may also
   result in traps being issued on both the customer and provider sides.
   The idea is to have the customer take action to fix the problem
   without dropping the session, thereby requiring less human
   intervention from the provider side.


   The third level triggers the session drop action from the provider
   side. This is used as a safeguard for the provider's network in case
   the customer edge device does not behave as expected and continues to
   send routes after exceeding the second level threshold.
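
   The three levels described above can be sketched as a simple
   classification of the current prefix count. This is a minimal Python
   illustration. The names Action and classify are hypothetical, and the
   exact comparisons (warning and stop act on reaching the threshold,
   reset acts only once its threshold is exceeded) are an assumption
   drawn from the wording of the paragraphs above.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"      # below all thresholds
    WARN = "warn"      # warning raised on both edge devices
    STOP = "stop"      # customer edge stops sending routes
    RESET = "reset"    # provider drops the session


def classify(prefix_count: int, warning: int, stop: int, reset: int) -> Action:
    """Map a prefix count to the action for the three-level scheme."""
    if not (0 <= warning <= stop <= reset):
        raise ValueError("thresholds must satisfy 0 <= warning <= stop <= reset")
    if prefix_count > reset:
        return Action.RESET
    if prefix_count >= stop:
        return Action.STOP
    if prefix_count >= warning:
        return Action.WARN
    return Action.NONE
```

   With hypothetical thresholds of 80, 100 and 120, a count of 80 yields
   a warning, 100 yields a stop, and only a count above 120 yields a
   reset. A well-behaved customer edge device therefore never reaches
   the third level.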


   We believe this feature can help both providers and customers to
   proactively manage their BGP connections through dynamic signaling,
   monitoring, and corrective action before any drastic action is
   necessary. In many cases, this can help avoid service interruption,
   avoid finger-pointing when sessions are dropped, lower operational
   cost, and increase customer satisfaction. In general, this feature
   can be applied to provider-to-provider peering connections as well,
   with similar advantages.


7. Security Considerations


   This document does not change the underlying security issues in the
   BGP protocol. It does, however, provide an additional mechanism to
   protect against denial-of-service attacks based on exceeding
   configured maximum prefix limits.


8. References


   [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-
       4)", draft-ietf-idr-bgp4-20.txt. Work in progress.


   [BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with
       BGP-4", RFC 3392, November 2002.


   [BGP-RREFRESH] Chen, E., "Route Refresh Capability for BGP-4", RFC
       2918, September 2000.


   [BGP-DYN-CAP] Chen, E., Sangli, S. R., "Dynamic Capability for BGP-
       4", draft-ietf-idr-dynamic-cap-03.txt. Work in progress.


   [BGP-STUDY] Chang, D., Govindan, R., Heidemann, J., "An Empirical
       Study of Router Response to Large BGP Routing Table Load", ACM
       SIGCOMM Internet Measurement Workshop, pp. 203-208, Marseille,
       France, November 2002.


   [BGP-CRF] Chen, E., Rekhter, Y., "Cooperative Route Filtering
       Capability for BGP-4", draft-ietf-idr-route-filter-08.txt. Work
       in progress.


   [CEASECODE] Chen, E., "Subcodes for BGP Cease Notification Message",
       draft-ietf-idr-cease-subcode-05.txt. Work in progress.


   [BGP-SOFT-NOTIFY] Nalawade, G., Patel, K., Scudder, J., Ward, D.,
       "BGPv4 Soft-Notification Message",
       draft-nalawade-bgp-soft-notify-00.txt. Work in progress.


9. IANA Considerations


   This document uses a new capability type for the support of prefix
   limits and the corresponding NOTIFICATION code, along with the
   sub-codes for non-support. These must be assigned by IANA.


10. Acknowledgements


   The authors would like to thank George Matey, Marten Terpstra, Yakov
   Rekhter, Enke Chen, Rob Thomas, Manish Gupta, Dan Joyal, Rajesh
   Saluja and Elwyn Davies for their review and comments.


11. Full Copyright Statement


   Copyright (C) The Internet Society (2000).  All Rights Reserved.


   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.


   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.


   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


12. Authors' Addresses


   Srikanth Chavali
   Vasile Radoaca
   Paul Knight


   Nortel Networks
   600 Technology Park Drive
   Billerica, MA 01821 USA


   Email:  schavali@nortelnetworks.com
   Email:  vasile@nortelnetworks.com
   Email:  paul.knight@nortelnetworks.com


   Mo Miri
   BellSouth
   575 Morosgo Drive
   4A62
   Atlanta, GA 3032


   home: +1 404-499-5526
   email: mohammad.miri@bellsouth.com


   Luyuan Fang
   ATT Labs
   200 Laurel Avenue,
   Room C2-3B35,
   Middletown, NJ 07748


   Phone: +1 732 420 1921
   Email: luyuanfang@att.com


   Susan Hares
   NextHop Technologies
   825 Victors Way
   Suite 100
   Ann Arbor, MI 48108


   Phone: +1 734 222 1610
   Email: skh@nexthop.com