Network Working Group                                      Yakov Rekhter
Internet Draft                                       Cisco Systems, Inc.
                                                             Paul Traina
                                                  Juniper Networks, Inc.
Expiration Date: January 1997                                  June 1996

              Inter-Domain Routing Protocol, Version 2

                     draft-ietf-idr-idrp2-00.txt

1. Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

2. Introduction

The Inter-Domain Routing Protocol (IDRP) permits a routing domain to exchange information with other routing domains to facilitate the operation of the routing and relaying functions of the Network Layer. This protocol calculates path segments which consist of Boundary Intermediate systems (BIS aka border routers) and the links that interconnect them.

A packet destined for an end system in another routing domain will be routed via Intra-domain routing to a Boundary Intermediate system in the current routing domain. Then, the BIS, using the methods of this inter-domain routing protocol, will calculate a path to a Boundary Intermediate system in an adjacent routing domain lying on a path to the destination.
After arriving at the next routing domain, the packet may also travel within that domain on its way towards a BIS located in the next domain along its path. This process will continue on a hop-by-hop basis until the packet arrives at a BIS in the routing domain which contains the destination End system. The Boundary IS in this routing domain will hand the incoming NPDU over to the domain's intra-domain routing protocol, which will construct a path to the destination End system.

Inter-domain routing protocols place requirements on the type of information that a routing domain must provide and on the methods by which this information will be distributed to other routing domains. These requirements are intended to be minimal, addressing only the interactions between Boundary ISs; all other internal operations of each routing domain are outside the scope of this protocol. That is, this Inter-domain routing protocol does not mandate that a routing domain run a particular intra-domain routing protocol.

The methods of this protocol differ from those generally adopted for an intra-domain routing protocol because they emphasize the interdependencies between efficient route calculation and the preservation of legal, contractual, and administrative concerns. This protocol calculates routes which will be efficient, loop-free, and in compliance with the domain's local routing policies. IDRP may be used when routing domains do not fully trust each other; it imposes no upper limit on the number of routing domains that can participate in this protocol; and it provides isolation between its operations and the internal operations of each routing domain.

3. Scope

This document specifies a protocol to be used by Boundary Intermediate systems (defined in 6) to acquire and maintain information for the purpose of routing NPDUs between different routing domains.
Figure 1 illustrates the field of application of this protocol.

This document specifies:

- the procedures for the exchange of inter-domain reachability and path information between BISs
- the procedures for maintaining inter-domain routing information bases within a BIS
- the encoding of protocol data units used to distribute inter-domain routing information between BISs

[Figure 1: block diagram of End Routing Domains and Transit Routing Domains joined by inter-domain links]

Figure 1. Field of Application. The Inter-domain Routing Protocol operates between routing domains; intra-domain routing is not within its scope.
The procedures are defined in terms of:

- interactions between Boundary Intermediate systems through the exchange of protocol data units
- interactions between this protocol and the underlying Network Service
- constraints on policy feasibility and enforcement which must be observed by each Boundary Intermediate system in a routing domain

The boundaries of Administrative Domains are realized as artifacts of the placement of policy constraints and the aggregation of network layer reachability information; they are not manifested explicitly in the protocol. The protocol described in this document operates at the level of individual routing domains. The establishment of administrative domains is outside the scope of this standard.

4. Definitions

For the purposes of this document, the following definitions apply.

4.1. Intra-domain routing protocol

A routing protocol that is run between Intermediate systems in a single routing domain to determine routes that pass through only systems and links wholly contained within the domain.

4.2. Inter-domain link

A real (physical) or virtual (logical) link between two or more Boundary Intermediate systems. A link between two BISs in the same routing domain can carry both intra-domain traffic and inter-domain traffic; a link between two BISs located in adjacent routing domains can carry inter-domain traffic, but not intra-domain traffic.

4.3. Boundary Intermediate system

An intermediate system that runs the protocol specified in this standard, has at least one inter-domain link attached to it, and may optionally have intra-domain links attached to it.

4.4. End Routing Domain

A routing domain whose local policies permit its BISs to calculate inter-domain path segments only for PDUs whose source is located within that routing domain. There are two varieties of End routing domains: stub and multi-homed.
A stub ERD has inter-domain links to only one adjacent routing domain, while a multi-homed ERD has inter-domain links to several adjacent routing domains.

4.5. Transit Routing Domain

A routing domain whose policies permit its BISs to calculate inter-domain path segments for PDUs whose source is located either in the local routing domain or in a different routing domain. That is, it can provide a relaying service for such PDUs.

4.6. Adjacent RDs

Two RDs ("A" and "B") are adjacent to one another if there is at least one pair of BISs, one located in "A" and the other in "B", that are attached to each other by means of a real subnetwork.

4.7. RD Path

A list of the RDIs of the routing domains and routing domain confederations through which a given UPDATE PDU has travelled.

4.8. Routing Domain Confederation

A set of routing domains which have agreed to join together and to conform to the rules in 8.13 of this document. To the outside world, a confederation is indistinguishable from a routing domain.

4.9. Nested RDCs

A routing domain confederation "A" (RDC-A) is nested within RDC-B when all of the following conditions are satisfied simultaneously:

a) all members of RDC-A are also members of RDC-B
b) there are some members of RDC-B that are not members of RDC-A

4.10. Overlapping RDCs

A routing domain confederation (RDC-A) overlaps RDC-B when all the following conditions are satisfied simultaneously:

a) there are some members of RDC-A that are also members of RDC-B, and
b) there are some members of RDC-A that are not members of RDC-B, and
c) there are some members of RDC-B that are not members of RDC-A.

4.11. Disjoint RDCs

Two routing domain confederations, RDC-A and RDC-B, are disjoint from one another when there are no routing domains which are simultaneously members of both RDC-A and RDC-B.
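
The conditions in 4.9 through 4.11 are plain set relations, so they can be checked mechanically. The following is a minimal sketch, assuming each confederation is represented as a flat set of member identifiers; the function name classify_rdcs and the "other" label are illustrative and are not part of the protocol.

```python
def classify_rdcs(rdc_a: frozenset, rdc_b: frozenset) -> str:
    """Classify RDC-A relative to RDC-B using the conditions of 4.9-4.11.

    Assumption: each confederation is a flat set of member identifiers.
    """
    common = rdc_a & rdc_b   # members of both confederations
    only_a = rdc_a - rdc_b   # members of RDC-A that are not in RDC-B
    only_b = rdc_b - rdc_a   # members of RDC-B that are not in RDC-A

    if not common:
        return "disjoint"      # 4.11: no shared members
    if not only_a and only_b:
        return "nested"        # 4.9: RDC-A is nested within RDC-B
    if only_a and only_b:
        return "overlapping"   # 4.10: conditions a), b) and c) all hold
    # Identical membership, or RDC-B nested within RDC-A: not one of the
    # three relations defined for RDC-A relative to RDC-B.
    return "other"


inner = frozenset({"rd1", "rd2"})
outer = frozenset({"rd1", "rd2", "rd3"})
print(classify_rdcs(inner, outer))
```

Because nesting is directional, calling the function with the arguments swapped reports "other" for the same pair.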
4.12. Policy Information Base

The collection of routing policies that a BIS will apply to the routing information that it learns using the protocol described in this document. It is not required that all routing domains use the same syntax and semantics to express policy; that is, the format of the Policy Information Base is left as a local option.

4.13. Route Origin

Each route or component of an aggregated route has a single unique origin. This is the RD or RDC in which the route's destinations are located.

5. Symbols and abbreviations

The symbols, acronyms, and abbreviations listed in the following clauses are used in this document.

5.1. Data unit abbreviations

BISPDU  Boundary Intermediate System PDU (IDRP message)
NPDU    Network Protocol Data Unit (IPv4 or IPv6 packet)
PDU     Protocol Data Unit (a packet)

5.2. Addressing abbreviations

SNPA    Subnetwork Point of Attachment (data link address)

5.3. Other abbreviations

BIS     Boundary Intermediate System (border router)
CM      Confederation Member
ERD     End Routing Domain
ES      End System
FIB     Forwarding Information Base
FSM     Finite State Machine
IDRP    Inter-domain Routing Protocol (an acronym for the protocol described in this document)
IS      Intermediate System (router)
MIB     Management Information Base
NLRI    Network layer reachability information (set of reachable destinations)
PIB     Policy Information Base
RDC     Routing Domain Confederation
RDI     Routing Domain Identifier
RIB     Routing Information Base
TRD     Transit Routing Domain

6. General protocol information

IDRP uses IP (either v4 or v6) as its transport protocol. In particular, BISPDUs are encapsulated as the data portion of IP packets.

IDRP is a connection-oriented protocol which is implemented only in Intermediate systems.
Routing and control information is carried in BISPDUs (as specified in Section 7), which flow on connections between pairs of BISs. Each BISPDU is packaged within one or more NPDUs for transmission by the underlying Network service. IDRP relies on the underlying Network service to provide for fragmentation and reassembly of BISPDUs. IDRP queues outbound BISPDUs as input to the underlying Network Layer service, retaining a copy of each BISPDU until an acknowledgement is received. Similarly, inbound BISPDUs are queued as input to the BISPDU-Receive process.

IDRP exchanges BISPDUs in a reliable fashion. It provides mechanisms for the ordered delivery of BISPDUs and for the detection and retransmission of lost or corrupted BISPDUs. The mechanisms for achieving reliable delivery of BISPDUs are described in 8.7; methods for establishing BIS-BIS connections are described in 8.6.

To emphasize its policy-based nature, the IDRP routing model includes a Policy Information Base. IDRP can be described in terms of the following major components:

a) BISPDU-Receive Process: responsible for accepting and processing control and routing information from the local environment and from BISPDUs of other BISs. This information is used for a variety of purposes, such as receiving error reports and guaranteeing reliable reception of BISPDUs from neighboring BISs. (For example, the Update-Receive process (see 8.14) is the part of the BISPDU-Receive process that deals with the reception of routing information after a BIS-BIS connection has been established.)
b) BISPDU-Send Process: responsible for constructing BISPDUs which contain control and routing information. BISPDUs are used by the local BIS for a variety of purposes, such as advertising routing information to other BISs, initiating BIS-BIS communication, and validating BIS routing information bases.
c) Decision Process: responsible for calculating routes which will be consistent with local routing policies. It operates on information in both the PIB and the Adj-RIBs, using it to create the Local RIBs (Loc-RIBs) and the local Forwarding Information Bases (see 8.10).
d) Forwarding Process: responsible for supplying resources to accomplish relaying of NPDUs to their destinations. It uses the FIB(s) created by the Decision Process.

6.1. Inter-RD topology

This protocol views an internet as an arbitrary interconnection of Transit Routing Domains and End Routing Domains which are connected by real inter-domain links placed between BISs located in the respective routing domains. This standard provides for the direct exchange of routing information between BISs, which may be located either in the same routing domain or in adjacent routing domains.

6.2. Routing policy

The direct exchange of policy information is outside the scope of IDRP. Instead, IDRP communicates policy information indirectly in its UPDATE PDUs which reflect the effects of the local policies of RDs on the path to the destination.

Each routing domain chooses its routing policies independently, and ensures that all its BISs calculate inter-domain paths which satisfy those policies. Local routing policies are applied to information in the Routing Information Base (RIB) to determine a degree of preference for potential paths (see 8.15). From those paths which are not rejected by the routing policy, a BIS selects the paths which it will use locally; from the locally selected paths, the BIS will then select the paths that it will advertise externally.
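
The two-stage selection described above (reject, pick local paths, then pick the subset to advertise) can be sketched as follows. This is an illustration only: the route shape, the function names, the rule that a larger preference value wins, and the choice of one local path per destination are all assumptions made for the example; the actual preference calculation is the subject of 8.15.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Route = Dict[str, object]  # hypothetical shape: {"nlri": str, "rd_path": tuple}


def select_routes(
    candidates: Iterable[Route],
    preference: Callable[[Route], Optional[int]],
    may_advertise: Callable[[Route], bool],
) -> Tuple[Dict[str, Route], Dict[str, Route]]:
    """Return (locally selected routes, routes to advertise), keyed by NLRI."""
    best: Dict[str, Tuple[int, Route]] = {}
    for route in candidates:
        pref = preference(route)
        if pref is None:
            continue  # rejected by the local routing policy
        nlri = str(route["nlri"])
        # Assumption: a larger value means a more preferred path.
        if nlri not in best or pref > best[nlri][0]:
            best[nlri] = (pref, route)
    local = {nlri: route for nlri, (_, route) in best.items()}
    # Only locally selected paths are candidates for advertisement.
    advertised = {nlri: r for nlri, r in local.items() if may_advertise(r)}
    return local, advertised


def prefer_short_paths(route: Route) -> Optional[int]:
    """Example policy: refuse paths through "rd9", prefer shorter RD paths."""
    if "rd9" in route["rd_path"]:
        return None
    return -len(route["rd_path"])


candidates = [
    {"nlri": "10.0.0.0/8", "rd_path": ("rd2", "rd3")},
    {"nlri": "10.0.0.0/8", "rd_path": ("rd4",)},
    {"nlri": "172.16.0.0/12", "rd_path": ("rd9",)},
]
local, advertised = select_routes(candidates, prefer_short_paths, lambda r: False)
```

With this example policy, the path through "rd4" is selected for local use, the path through "rd9" is rejected, and nothing is advertised because the advertisement predicate refuses every route.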
To enforce routing policies and to ensure that policies are both feasible and consistent, this protocol:

- carries path information, expressed in terms of Routing Domain Identifiers (RDIs) and various path attributes, in its UPDATE PDUs
- permits a routing domain to selectively propagate its reachability information to a limited set of other routing domains
- provides a method to detect policy inconsistencies within the set of BISs located in a single routing domain
- permits each routing domain to set its policies individually: that is, global coordination of policy is not required.

The set of rules that comprises the routing policy enforced by a BIS is held in a Policy Information Base (PIB), which is separate from the RIB.

6.3. Types of systems

An Intermediate system that implements the protocol described in this document is called a Boundary Intermediate system (BIS). Each BIS resides in a single routing domain, and may optionally act simultaneously as a BIS and as an intra-domain IS within its own routing domain.

6.4. Types of routing domains

The protocol described in this document recognizes two types of routing domains, end routing domains and transit routing domains; each of them may contain both ISs and ESs.

6.5. Routing domain confederations

IDRP provides support for Routing Domain Confederations (RDCs); this optional function permits groups of routing domains to be organized in a hierarchical fashion. An RDC is formed by means outside the scope of this protocol, and is composed of a set of confederation members. Confederation members (CMs) are either individual routing domains or routing domain confederations. Thus, the definition of an RDC is recursive: a confederation member may be a single routing domain or another confederation.

6.6. Routes: advertisement and storage

For purposes of this protocol, a route is defined as a unit of information that pairs destinations with the attributes of a path to those destinations:

- Routes are advertised between a pair of BISs in UPDATE PDUs: the destinations are the systems whose addresses match the address prefixes reported in the NLRI field, and the path is the information reported in the path attributes fields of the same UPDATE PDU.
- Routes are stored in the Routing Information Bases: namely, the Adj-RIBs-In, the Loc-RIBs, and the Adj-RIBs-Out. Routes that will be advertised to other BISs must be present in the Adj-RIBs-Out; routes that will be used by the local BIS must be present in the Loc-RIBs, and the next hop for each of these routes must be present in the local BIS's Forwarding Information Bases; and routes that are received from other BISs are present in the Adj-RIBs-In.

A BIS can support multiple routes to the same destination by maintaining multiple RIBs and the corresponding multiple FIBs. Each Loc-RIB will be identified by a different RIB-Tag (see 6.7 and 6.8); an Adj-RIB-Out shall contain at most one route to a particular destination.

If the BIS chooses to advertise the route, it may add to or modify the path attributes of the route before advertising it to adjacent BISs.

IDRP also provides mechanisms by which a BIS can inform its neighbor that a previously advertised route is no longer available for use.
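
The storage rules in 6.6 can be pictured as three keyed maps. The sketch below is a simplified illustration, assuming routes are dictionaries of path attributes and that neighbors, RIB-Tags, and NLRI are hashable values; the class and method names are hypothetical. Keying each Adj-RIB-Out entry by neighbor, RIB-Tag, and NLRI is one simple way to guarantee at most one route per destination.

```python
class RouteStore:
    """Hypothetical in-memory model of the RIBs named in 6.6."""

    def __init__(self):
        self.adj_ribs_in = {}   # (neighbor, rib_tag, nlri) -> path attributes
        self.loc_ribs = {}      # (rib_tag, nlri) -> path attributes
        self.adj_ribs_out = {}  # (neighbor, rib_tag, nlri) -> path attributes

    def receive(self, neighbor, rib_tag, nlri, attrs):
        """Record a route learned from a neighbor BIS."""
        self.adj_ribs_in[(neighbor, rib_tag, nlri)] = dict(attrs)

    def install(self, rib_tag, nlri, attrs):
        """Record a route that the local BIS has chosen to use."""
        self.loc_ribs[(rib_tag, nlri)] = dict(attrs)

    def advertise(self, neighbor, rib_tag, nlri, attr_changes=None):
        """Copy a locally used route into the Adj-RIB-Out for one neighbor.

        Path attributes may be added or modified before advertisement.
        Writing to the same key again replaces the earlier route.
        """
        attrs = dict(self.loc_ribs[(rib_tag, nlri)])
        attrs.update(attr_changes or {})
        self.adj_ribs_out[(neighbor, rib_tag, nlri)] = attrs
        return attrs


store = RouteStore()
store.receive("bis-a", 0, "10.0.0.0/8", {"rd_path": ("rd2",)})
store.install(0, "10.0.0.0/8", {"rd_path": ("rd2",)})
sent = store.advertise("bis-b", 0, "10.0.0.0/8", {"rd_path": ("rd1", "rd2")})
```

In this example the local BIS prepends its own (made-up) identifier "rd1" to the RD path before advertising, while the copy held in the Loc-RIB is left unchanged.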
There are three methods by which a given BIS can indicate that a route has been withdrawn from service:

a) the NLRI for a previously advertised route can be advertised in the WITHDRAWN ROUTES field of an UPDATE PDU, thus marking the associated route as being no longer available for use
b) a replacement route (with the same RIB-Tag and NLRI) can be advertised, or
c) the BIS-BIS connection can be closed, which implicitly removes from service all routes which the pair of BISs had advertised to each other.

6.7. RIB-Tag

Each RIB-Tag identifies a particular information base which will be used to store information about the route. The RIB-Tag is a common identifier for the Adj-RIB-In, Loc-RIB, Adj-RIB-Out, and FIB with which the route information is associated. The number of RIB-Tags is limited by local decisions; a BIS may choose to support only a limited number of RIB-Tags.

6.8. Selecting the information bases

Each RIB is identified by a RIB-Tag, and the same RIB-Tag also uniquely identifies the associated FIB.

For an UPDATE PDU, the BIS determines the RIB-Tag and the LOCAL_PREF associated with each route that is advertised.

For an NPDU, the BIS unambiguously determines the FIB that should be used for forwarding this NPDU. It maps certain fields in the NPDU's header into a RIB-Tag, which then unambiguously identifies a particular FIB.

A summary of IDRP's information bases is presented in Table 1.

+----------------------------------------------------------------------+
| Table 1  The IDRP Information Bases. The indexing variables and      |
|          contents of the RIBs and FIBs are shown.                    |
+-----------------+-----------------+----------------------------------+
| Information     | Indexed by...   | Contains...                      |
| Base            |                 |                                  |
+-----------------+-----------------+----------------------------------+
| Adj-RIB-In      | - BIS-Ident     | - Path attributes                |
|                 |   of adjacent   | - NLRI                           |
|                 |   BIS           |                                  |
|                 | - RIB-Tag       |                                  |
|                 | - NLRI          |                                  |
+-----------------+-----------------+----------------------------------+
| Loc-RIB         | - RIB-Tag       | - Path attributes                |
|                 |                 | - NLRI                           |
+-----------------+-----------------+----------------------------------+
| Adj-RIB-Out     | - BIS-Ident     | - Path attributes                |
|                 |   of adjacent   | - NLRI                           |
|                 |   BIS           |                                  |
|                 | - RIB-Tag       |                                  |
|                 | - NLRI          |                                  |
+-----------------+-----------------+----------------------------------+
| FIB             | - RIB-Tag       | - IP addr of next hop BIS        |
|                 | - NLRI          | - Output SNPA of local BIS       |
|                 |                 | - Input SNPA of next hop BIS     |
+-----------------+-----------------+----------------------------------+

Notes:

a) As a local option, a BIS may elect to apply information reduction techniques to path attributes and NLRI information.
b) For each adjacent BIS, a given BIS maintains an Adj-RIB-In for each RIB-Tag (including the null RIB-Tag) that it supports.
c) A BIS maintains a separate Loc-RIB for each RIB-Tag (including the null RIB-Tag) that it supports.
d) For each adjacent BIS, a given BIS maintains an Adj-RIB-Out for each RIB-Tag (including the null RIB-Tag) that it advertises to that neighbor.
e) A given BIS maintains a separate FIB for each RIB-Tag (including the null RIB-Tag) that it supports - that is, each FIB corresponds to a Loc-RIB.

To facilitate the forwarding process, a BIS can organize each of its FIBs into two conceptual parts: one containing information for NLRI located within its own RD, and another for NLRI located in other RDs (as in clause 8). For external NLRI, a BIS can further organize the FIB information based on whether the next-hop BIS is located within its own RD or in another RD (see 8.4, items "a" and "b"). And finally, for those next-hop BISs located in its own RD, the local BIS can organize the information according to a specific forwarding mechanism (see 8.4, items "b1" and "b2").

6.9. Routing information exchange

This document provides several rules governing the distribution and exchange of routing information:

- rules for distributing routing information internally (to BISs within a routing domain)
- rules for distributing routing information externally (to BISs in adjacent routing domains)

Routing information is carried in the protocol's BISPDUs, which are generated on an event-driven basis whenever a BIS receives information which causes it to advertise new paths.

6.9.1. Internal neighbor BIS

Each BIS establishes and maintains communications with all other BISs located in its routing domain. The identity of all BISs within a routing domain is contained in managed object internalBISNeighbors described in 8.3.

6.9.2. External neighbor BIS

Each BIS may establish and maintain communications with other BISs in adjacent routing domains. A BIS has no direct communications link with any BIS in another routing domain unless that RD is adjacent to it, as defined in 6: that is, a BIS does not communicate directly with a BIS located in a different routing domain unless the pair of BISs are attached to at least one common subnetwork. The identity of neighbor BISs in adjacent routing domains is contained in managed object externalBISNeighbors described in 8.3.

6.10. Design objectives

The protocol described in this document was developed with a view towards satisfying certain design goals, while others specifically were not addressed by the mechanisms of this protocol.

6.10.1. Within the scope of the protocol

This document supports the following design requirements:

Control Transit through an RD: It provides mechanisms to permit a given routing domain to control the ability of NPDUs from other routing domains to transit through itself.

Autonomous Operation: It provides stable operation in an internet where significant sections of the interworking environment will be controlled by disjoint entities.

Distributed Information Bases: It does not require a centralized global repository for either routing information or policy information.

Deliverability: It accepts and delivers NPDUs addressed to reachable routing domains and rejects NPDUs addressed to routing domains known to be unreachable.

Adaptability: It adapts to topological changes between routing domains, but not to traffic changes.

Promptness: It provides a period of adaptation to topological changes between domains that is a reasonable function of the maximum logical distance between any pair of routing domains participating in an instance of this protocol.

Efficiency: It is efficient in the use of both processing resources and memory resources; it does not create excessive routing traffic overhead.

Robustness: It recovers from transient errors such as lost or temporarily incorrect routing PDUs, and it tolerates imprecise parameter settings.

Stability: It stabilizes in finite time to "good routes", as long as there are no continuous topological changes or corruptions of the routing and policy information bases.

Heterogeneity: It is designed to operate correctly over a set of routing domains that may employ diverse intra-domain routing protocols.
It is capable of running over a wide variety of subnetworks.

Availability: It will not result in inability to calculate acceptable inter-domain paths when a single point of failure happens for a pairing of topology and policy that has a cut set greater than one.

Fault isolation: It will provide fault isolation so that:

- Problems within one routing domain will not affect intra-domain routing in any other routing domain
- Problems within one routing domain will not affect inter-domain routing, unless they occur on internal inter-domain links
- Inter-domain routing will not adversely affect intra-domain routing.

Scaling: It imposes no upper limit on the number of routing domains that can participate in a single instance of this protocol.

Multiple Routing Administrations: It will accommodate inter-domain route calculations without regard to whether or not the participating routing domains are under control of one or several administrative authorities.

6.10.2. Outside the scope of the protocol

The following requirements are not within the design scope of this protocol:

Traffic Adaptation: It does not automatically modify routes based on the traffic load.

Guaranteed delivery: It does not guarantee delivery of all offered NPDUs.

Suppression of Transient Loops: Although it provides mechanisms to detect and suppress looping of routing information, it provides no mechanisms to detect or suppress transient looping of NPDUs.

7. Structure of BISPDUs

In this document, the term BISPDU (Boundary IS PDU) is used as a general term to refer to any of the PDUs defined in this clause.

Octets in a PDU are numbered starting from 1, in increasing order. Bits in an octet are numbered from 1 to 8, where bit 1 is the low-order bit and is pictured on the right. When consecutive octets are used to represent a number, the lower octet number has the most significant value.
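
The octet ordering rule above (the lowest-numbered octet carries the most significant value) is what is commonly called big-endian or network byte order. A minimal sketch, using illustrative helper names that are not defined by this document:

```python
def encode_uint(value: int, width: int) -> bytes:
    """Encode an unsigned integer in `width` octets, most significant first."""
    return value.to_bytes(width, "big")


def decode_uint(octets: bytes) -> int:
    """Decode an unsigned integer whose first octet is the most significant."""
    return int.from_bytes(octets, "big")


# A 2-octet field carrying the value 30: octet 1 is 0x00, octet 2 is 0x1E.
encoded = encode_uint(30, 2)
print(encoded.hex())
```

Passing a negative value, or one that does not fit in the given width, raises OverflowError, which matches the rule that numeric fields are unsigned.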
The more significant semi-octet of each pair of semi-octets in a given octet is encoded in the high order bit positions (8 through 5). Values are given in decimal, and all numeric fields are unsigned, unless otherwise noted.

The types of PDUs used by this protocol are:

- OPEN PDU
- UPDATE PDU
- IDRP ERROR PDU
- KEEPALIVE PDU
- CEASE PDU
- RIB REFRESH PDU

7.1. Header of BISPDU

Each BISPDU has a fixed size header. There may or may not be a data portion following the header, depending on the PDU type. The layout of the header fields and their meanings are shown below:

+-------------------------------------------------------------------+
| BISPDU Length (2 octets)                                          |
+-------------------------------------------------------------------+
| BISPDU Type (1 octet)                                             |
+-------------------------------------------------------------------+
| Sequence (4 octets)                                               |
+-------------------------------------------------------------------+
| Acknowledgement (4 octets)                                        |
+-------------------------------------------------------------------+
| Credit Offered (1 octet)                                          |
+-------------------------------------------------------------------+
| Credit Available (1 octet)                                        |
+-------------------------------------------------------------------+
| Validation Pattern (16 octets)                                    |
+-------------------------------------------------------------------+

The meaning and use of these fields are as follows:

BISPDU Length:

The BISPDU Length field is a 2 octet unsigned integer. It contains the total length in octets of this BISPDU, including both header and data portions. The value of the BISPDU Length field shall be at least MinBISPDULength octets, and no greater than the value carried in the Maximum_PDU_Size field of the OPEN PDU received from the remote BIS.
Further, depending on the PDU type, there may be other constraints on the value of the Length field; for example, a KEEPALIVE PDU must have a length of exactly 30 octets. No padding after the end of BISPDU is allowed, so the value of the Length field must be the smallest value required given the rest of the BISPDU.

BISPDU Type:

The BISPDU Type field contains a one octet type code which identifies the specific type of the PDU. The supported type codes are:

+----------------------------------------------------+------------+
| BISPDU Type                                        | Code       |
+----------------------------------------------------+------------+
| OPEN PDU                                           | 1          |
+----------------------------------------------------+------------+
| UPDATE PDU                                         | 2          |
+----------------------------------------------------+------------+
| IDRP ERROR PDU                                     | 3          |
+----------------------------------------------------+------------+
| KEEPALIVE PDU                                      | 4          |
+----------------------------------------------------+------------+
| CEASE PDU                                          | 5          |
+----------------------------------------------------+------------+
| RIB REFRESH PDU                                    | 6          |
+----------------------------------------------------+------------+

All other BISPDU type codes are reserved for future extensions.

Sequence:

The Sequence field contains a 4 octet unsigned integer that is the sequence number of this PDU. The procedures for generating sequence numbers for the various types of BISPDU are described in 8.7.4.

Acknowledgement:

The Acknowledgement field is a 4 octet unsigned integer that contains the sequence number of the PDU that the sender last received correctly and in sequence number order.

Credit Offered:

   The contents of this field indicate the number of additional BISPDUs
   that the sender is willing to accept from the remote BIS; it is used
   by the flow control process described in 8.7.5.

Credit Available:

   The contents of this field indicate the number of additional BISPDUs
   that the sender is able to send to the remote BIS; it is used by the
   flow control process described in 8.7.5.

Validation Pattern:

   This 16-octet field is used to provide a validation function for the
   BISPDU. Depending upon the contents of the field "Authentication
   Code" of the OPEN PDU, this field can provide:

   - data integrity for the contents of the BISPDU (see 8.7.1 and
     8.7.3), or

   - data integrity for the contents of the BISPDU plus authentication
     of the peer BIS (see 8.7.2).

7.2. OPEN PDU

The OPEN PDU is used by a BIS for starting a BIS-BIS connection. The
first PDU sent by either side is an OPEN PDU. The OPEN PDU contains a
fixed header and the additional fields shown below:

+----------------------------------------------------+
| Fixed Header                                       |
+----------------------------------------------------+
| Version (1 octet)                                  |
+----------------------------------------------------+
| Hold Time (2 octets)                               |
+----------------------------------------------------+
| Maximum PDU Size (2 octets)                        |
+----------------------------------------------------+
| BIS-Identifier Length Indicator (1 octet)          |
+----------------------------------------------------+
| BIS-Identifier (variable)                          |
+----------------------------------------------------+
| Source RDI Length Indicator (1 octet)              |
+----------------------------------------------------+
| Source RDI (variable)                              |
+----------------------------------------------------+
| RIB-TagsSet (variable)                             |
+----------------------------------------------------+
| Confed-IDs (variable)                              |
+----------------------------------------------------+
| Optional Parameters Length (2 octets)              |
+----------------------------------------------------+
| Optional Parameters (variable)                     |
+----------------------------------------------------+

The meaning and use of these fields are as follows:

Version:

   This one octet field contains the version number of the protocol.
   Its value is currently 1.

Hold Time:

   This 2-octet unsigned integer indicates the number of seconds that
   the sender proposes for the value of the Hold Timer. Upon receipt of
   an OPEN PDU, a BIS shall calculate the value of the Hold Timer by
   using the smaller of its configured Hold Time and the Hold Time
   received in the OPEN PDU. The Hold Time shall be either zero or at
   least three seconds. An implementation may reject connections on the
   basis of the Hold Time. The calculated value indicates the maximum
   number of seconds that may elapse between the receipt of successive
   KEEPALIVE and/or UPDATE PDUs by the sender (see 8.6.1.4 and
   8.18.5).

Maximum PDU Size:

   This field contains a 2 octet unsigned integer that is the maximum
   number of octets that this BIS will accept in an incoming UPDATE
   PDU, IDRP ERROR PDU, or RIB REFRESH PDU. Independent of this value,
   every BIS shall accept KEEPALIVE PDUs or CEASE PDUs of length 30
   octets; every BIS shall also accept any OPEN PDU with length less
   than or equal to 3000 octets.

   As a minimum, every BIS is required to support reception of all
   BISPDUs whose size is greater than or equal to MinBISPDULength
   octets and less than or equal to 1024 octets: that is, the minimum
   acceptable value for this field is 1024.

BIS-Identifier Length Indicator:

   This one octet field contains the length in octets of the
   BIS-Identifier field.
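
As an informal illustration of the Hold Time rule given above, the
following Python sketch computes the Hold Timer for one connection. It
is a minimal sketch and not part of the protocol: the function name
negotiate_hold_time is hypothetical, and raising ValueError for a value
of one or two seconds is an assumption about how an implementation might
surface the "zero or at least three seconds" requirement.

```python
def negotiate_hold_time(configured: int, received: int) -> int:
    """Return the Hold Timer value, in seconds, for one BIS-BIS connection."""
    for name, value in (("configured", configured), ("received", received)):
        # The Hold Time field is a 2-octet unsigned integer.
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{name} Hold Time out of range: {value}")
        # A Hold Time shall be either zero or at least three seconds.
        if value in (1, 2):
            raise ValueError(f"{name} Hold Time must be 0 or >= 3: {value}")
    # Use the smaller of the configured value and the received value.
    return min(configured, received)
```

For example, negotiate_hold_time(90, 30) yields 30, and a proposal of
zero from either side yields zero.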

BIS-Identifier:

   This field indicates the BIS Identifier of the sender. A given BIS
   sets the value of its BIS Identifier to an IP address assigned to
   that BIS. The value of the BIS Identifier is determined on startup
   and is the same for every local interface and every BIS peer.

Source RDI Length Indicator:

   This one octet field contains the length in octets of the Source RDI
   field.

Source RDI:

   This variable length field contains the RDI of the routing domain in
   which the BIS that is sending this BISPDU is located.

RIB-TagsSet:

   This variable length field contains a list of all RIB-Tags that the
   local BIS is willing to support when communicating with the neighbor
   BIS (that is, the BIS to which this OPEN PDU is being sent). It
   contains an encoding of all or part of the information contained in
   managed object RIBTagsSet (see clauses 8.3 and 8.10.1).

   A BIS is not required to list all of its supported RIB-Tags in an
   OPEN PDU that is sent to a neighbor BIS located in an adjacent
   routing domain. It must include only those RIB-Tags that correspond
   to Adj-RIBs-Out that the local BIS will use to communicate with the
   neighbor BIS, and those that correspond to the RIB-Tags of the
   Adj-RIBs-In that the local BIS supports for storing UPDATE PDUs
   received from that neighbor BIS.

   However, a BIS shall include all of the RIB-Tags listed in managed
   object RIBTagsSet in an OPEN PDU that is sent to another BIS located
   in its own routing domain. Failure to do so will result in an OPEN
   PDU error, as described in 8.18.2.

   The encoding of this field is as follows:

   +----------------------------------------------------+
   | Number of Non-null RIB-Tags (1 octet)              |
   +----------------------------------------------------+
   | First RIB-Tag (1 octet)                            |
   +----------------------------------------------------+
   | ....                                               |
   +----------------------------------------------------+
   | Last RIB-Tag (1 octet)                             |
   +----------------------------------------------------+

   The field Number of Non-null RIB-Tags is one octet long. It contains
   the total number of non-null RIB-Tags supported by this BIS. Since
   every BIS supports a null RIB-Tag (see clause 8.10.1), the null
   RIB-Tag shall not be listed in the OPEN PDU. If a BIS supports no
   RIB-Tags other than the null RIB-Tag, then the field Number of
   Non-null RIB-Tags shall contain 0. If the Number of Non-null
   RIB-Tags is non-zero, then the BIS supports all of the listed
   RIB-Tags plus the null RIB-Tag.

Confed-IDs:

   This is a variable length field which reports the RDIs of all RDCs
   that this BIS is a member of. The encoding of this field is as
   follows:

   +----------------------------------------------------+
   | Number of RDCs (1 octet)                           |
   +----------------------------------------------------+
   | Length of First RDI (1 octet)                      |
   +----------------------------------------------------+
   | RDI of first RDC                                   |
   +----------------------------------------------------+
   | ....                                               |
   +----------------------------------------------------+
   | Length of Last RDI (1 octet)                       |
   +----------------------------------------------------+
   | RDI of last RDC                                    |
   +----------------------------------------------------+

   The 1 octet field Number of RDCs gives the number of RDCs of which
   this BIS is a member. A value of zero indicates that this BIS
   participates in no RDCs. The fields that follow give the length and
   the RDI of each such confederation.

Optional Parameters Length:

   This 2-octet unsigned integer indicates the total length of the
   Optional Parameters following this field in octets. If the value of
   this field is zero, no Optional Parameters are present.
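
The RIB-TagsSet and Confed-IDs encodings described above are both
count-prefixed lists, and a decoder for them might look like the Python
sketch below. This is an informal illustration under stated assumptions:
the function names decode_rib_tags_set and decode_confed_ids are
hypothetical, RDIs are returned as uninterpreted octet strings, and
raising ValueError on truncated input is a local choice rather than
behavior defined by this document.

```python
from typing import List, Tuple


def decode_rib_tags_set(buf: bytes, offset: int = 0) -> Tuple[List[int], int]:
    """Decode a RIB-TagsSet; return (non-null RIB-Tags, next offset)."""
    if offset >= len(buf):
        raise ValueError("RIB-TagsSet count octet missing")
    count = buf[offset]              # Number of Non-null RIB-Tags (1 octet)
    end = offset + 1 + count         # each RIB-Tag occupies 1 octet
    if end > len(buf):
        raise ValueError("RIB-TagsSet truncated")
    return list(buf[offset + 1:end]), end


def decode_confed_ids(buf: bytes, offset: int = 0) -> Tuple[List[bytes], int]:
    """Decode a Confed-IDs field; return (RDIs, next offset)."""
    if offset >= len(buf):
        raise ValueError("Confed-IDs count octet missing")
    count = buf[offset]              # Number of RDCs (1 octet)
    offset += 1
    rdis: List[bytes] = []
    for _ in range(count):
        if offset >= len(buf):
            raise ValueError("RDI length octet missing")
        length = buf[offset]         # Length of this RDI (1 octet)
        offset += 1
        if offset + length > len(buf):
            raise ValueError("RDI truncated")
        rdis.append(buf[offset:offset + length])
        offset += length
    return rdis, offset
```

Because each function returns the offset of the octet following the
field, the two can be chained in the order the fields appear in the
OPEN PDU.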

Optional Parameters:

   This field may contain a list of optional parameters, where each
   parameter is encoded as a vector.

   +----------------------------------------------------+
   | Parameter Flags (1 octet)                          |
   +----------------------------------------------------+
   | Parameter Type (1 octet)                           |
   +----------------------------------------------------+
   | Parameter Length (1 or 2 octets)                   |
   +----------------------------------------------------+
   | Parameter Value (variable)                         |
   +----------------------------------------------------+

   Parameter Flags is a one octet field. The high-order bit (bit 8) of
   the Parameter Flags octet is the Optional bit. If it is set to 1,
   the parameter is optional; if set to 0, the parameter is well-known.

   The second high-order bit (bit 7) of the Parameter Flags is the
   Extended Length bit. It defines whether the Parameter Length is one
   octet (if set to 0), or two octets (if set to 1). Extended Length
   may be used only if the length of the parameter is greater than 255
   octets.

   Parameter Type is a one octet field that unambiguously identifies
   individual parameters.

   Parameter Length is a one or two octet field (depending on the value
   of the Extended Length bit in the Parameter Flags field) that
   contains the length of the Parameter Value field in octets.

   Parameter Value is a variable length field that is interpreted
   according to the value of the Parameter Type field.

   This document defines the following Optional Parameters:

   a) Authentication Information (Parameter Type 1):

      This well-known parameter may be used to authenticate a BIS peer.
      The Parameter Value field contains a 1-octet Authentication Code
      followed by a variable length Authentication Data.
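
The parameter vector encoding described above can be walked with a
short loop. The Python sketch below is informal and rests on several
assumptions: the names Parameter and iter_parameters are hypothetical,
the two-octet form of Parameter Length is assumed to be in network
(big-endian) byte order, which this clause does not state, and a
violation of the "greater than 255 octets" rule for Extended Length is
reported with ValueError as a local choice.

```python
from typing import Iterator, NamedTuple

OPTIONAL_BIT = 0x80          # bit 8 of Parameter Flags
EXTENDED_LENGTH_BIT = 0x40   # bit 7 of Parameter Flags


class Parameter(NamedTuple):
    optional: bool   # True if optional, False if well-known
    ptype: int       # Parameter Type
    value: bytes     # Parameter Value


def iter_parameters(buf: bytes) -> Iterator[Parameter]:
    """Yield each parameter found in an Optional Parameters field."""
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated parameter header")
        flags, ptype = buf[offset], buf[offset + 1]
        offset += 2
        # Parameter Length is 2 octets only when Extended Length is set.
        length_size = 2 if flags & EXTENDED_LENGTH_BIT else 1
        if offset + length_size > len(buf):
            raise ValueError("truncated Parameter Length")
        # Assumption: network byte order for the 2-octet form.
        length = int.from_bytes(buf[offset:offset + length_size], "big")
        offset += length_size
        if length_size == 2 and length <= 255:
            raise ValueError("Extended Length used for a short parameter")
        if offset + length > len(buf):
            raise ValueError("truncated Parameter Value")
        yield Parameter(bool(flags & OPTIONAL_BIT), ptype,
                        buf[offset:offset + length])
        offset += length
```

The caller is expected to pass exactly the octets counted by Optional
Parameters Length, so that the loop ends on a parameter boundary.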

      Authentication Code:

         This 1-octet unsigned integer indicates the authentication
         mechanism being used:

         a) Code 1 indicates that the Validation Pattern field in the
            header of each BISPDU contains an unencrypted checksum that
            provides data integrity for the contents of that BISPDU.
            Its use is described in 8.7.1.

         b) Code 2 indicates that the Validation Pattern field in the
            header of each BISPDU provides both peer-BIS authentication
            and data integrity for the contents of the BISPDU. The
            specific mechanism used to generate the validation pattern
            is mutually agreed to by the pair of BISs, but is not
            specified by this document. Its use is described in 8.7.2.

         c) Code 3 indicates that the Validation Pattern field in the
            header of each BISPDU contains an unencrypted checksum
            covering the concatenation of the contents of the BISPDU
            with untransmitted password string(s). Its use is defined
            in 8.7.3.

      Authentication Data:

         This variable-length field has a form and meaning that depend
         on the Authentication Code. The length of the Authentication
         Data field can be determined from the Length field of the
         BISPDU header.

   Absence of any Authentication Information in an OPEN PDU shall be
   treated as if the PDU carries Authentication Information with
   Authentication Code 1 (see 8.7.1).

7.3. UPDATE PDU

An UPDATE PDU is used to advertise feasible routes to a neighbor BIS,
or to withdraw multiple unfeasible routes from service (see 6.6). An
UPDATE PDU may simultaneously advertise multiple feasible routes and
withdraw multiple unfeasible routes from service.

The UPDATE PDU always includes the fixed header, the Unfeasible Route
Count field, and the Total Path Attributes Length field; it can
optionally contain the other fields:

- if routes are being explicitly withdrawn from service, then the
  UNFEASIBLE ROUTE COUNT field will be non-zero, and the WITHDRAWN
  ROUTES field will be present

- if feasible routes are being advertised, then the TOTAL PATH
  ATTRIBUTES LENGTH field will be non-zero, and the PATH ATTRIBUTES and
  NLRI fields will be present.

An UPDATE PDU can advertise multiple routes; a route is described by
several path attributes, each of which is encoded as a 4-tuple. All
path attributes contained in a given UPDATE PDU apply to the
destinations carried in the Network Layer Reachability Information
field of the UPDATE PDU.

An UPDATE PDU can list multiple routes to be withdrawn from service.
Each such route is identified by its NLRI, which unambiguously
identifies the route in the context of the BIS-BIS connection in which
it had previously been advertised.

An UPDATE PDU that is used only to withdraw routes from service (but
not to advertise any feasible routes) will not include Path Attributes
or NLRI. Conversely, if an UPDATE PDU does not withdraw any routes from
service, the UNFEASIBLE ROUTE COUNT field will contain the value 0, and
the WITHDRAWN ROUTES field will not be present.
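
The presence rules above reduce to two tests on the two fields that are
always present. The Python sketch below restates them informally; the
function name update_optional_fields and the dictionary keys are
hypothetical and chosen only for this illustration.

```python
from typing import Dict


def update_optional_fields(unfeasible_route_count: int,
                           total_path_attributes_length: int) -> Dict[str, bool]:
    """Report which optional UPDATE PDU fields are present."""
    # Routes are being withdrawn exactly when the count is non-zero.
    withdrawing = unfeasible_route_count > 0
    # Feasible routes are being advertised exactly when the total
    # length of the path attributes is non-zero.
    advertising = total_path_attributes_length > 0
    return {
        "withdrawn_routes": withdrawing,
        "path_attributes": advertising,
        "nlri": advertising,
    }
```

A PDU that only withdraws routes therefore reports withdrawn_routes as
present and both path_attributes and nlri as absent.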

The components of the UPDATE PDU are described below:

+----------------------------------------------------+
| Fixed Header                                       |
+----------------------------------------------------+
| FIB Tag (1 octet)                                  |
+----------------------------------------------------+
| Unfeasible Route Count (2 octets)                  |
+----------------------------------------------------+
| Withdrawn Routes (variable)                        |
+----------------------------------------------------+
| Total Path Attributes Length (2 octets)            |
+----------------------------------------------------+
| Path Attributes (variable)                         |
+----------------------------------------------------+
| Network Layer Reachability Information (variable)  |
+----------------------------------------------------+

The use of these fields is as follows:

FIB Tag:

   This is a 1-octet long field that contains the FIB Tag associated
   with the routes carried in this UPDATE PDU.

Unfeasible Route Count:

   This is a 2-octet long field that contains an unsigned integer whose
   value is equal to the number of routes that are included in the
   subsequent WITHDRAWN ROUTES field. A value of 0 indicates that no
   routes are being withdrawn from service, and that the WITHDRAWN
   ROUTES field is not present in this UPDATE PDU.

Withdrawn Routes:

   This is a variable length field that contains a list of NLRIs for
   the routes that are being withdrawn from service. Each NLRI is
   encoded as specified in 7.3.2. Each such route is identified by its
   NLRI, which unambiguously identifies the route in the context of the
   BIS-BIS connection in which it had previously been advertised.

Total Path Attributes Length:

   This is a 2-octet long field that contains an unsigned integer whose
   value is the total length of all Path Attributes in the UPDATE PDU
   in octets.

Path Attributes:

   A variable length sequence of path attributes is present in every
   UPDATE PDU that is used to advertise a feasible route.

Network Layer Reachability Information:

   A variable length field that lists the destinations for the feasible
   routes that are being advertised in this UPDATE PDU.

7.3.1. Path attribute encoding

Each path attribute is a 4-tuple of variable length:
<flag, type, length, value>. The elements are used as follows:

- Flag:

  The first element of each attribute is a one octet field:

  - The high-order bit (bit 8) of the attribute flags octet is the
    Optional bit. If it is set to 1, the attribute is optional; if set
    to 0, the attribute is well-known.

  - The second high-order bit (bit 7) of the attribute flags octet is
    the Transitive bit. It defines whether an optional attribute is
    transitive (if set to 1) or non-transitive (if set to 0). For
    well-known attributes, the Transitive bit shall be set to 1.

  - The third high-order bit (bit 6) of the attribute flags octet is
    the Partial bit. It defines whether the optional transitive
    attribute is partial (if set to 1) or complete (if set to 0). For
    well-known attributes and for optional non-transitive attributes
    the Partial bit shall be set to 0.

  - The lower order five bits (1 through 5) of the attribute flags
    octet are reserved. They shall be transmitted as 0 and shall be
    ignored when received.

- Type:

  The second element of each attribute is a one octet field which
  contains the type code for the attribute. Currently defined attribute
  type codes are discussed in clause 8.11.

  Note 4: It is not the intention of this document to define globally
  understood path attributes for type codes greater than value 128.
  Such codes are reserved for local use.

- Length:

  The third field of each path attribute is 2 octets in length; it
  contains the length in octets of the immediately following Value
  field.

- Value:

  The remaining octets of each path attribute field contain the value
  of the attribute, which is interpreted according to the attribute
  flags and the attribute type code. The supported attribute values and
  their encodings are defined below.

7.3.1.1. LOCAL_PREF (Type Code 1)

LOCAL_PREF is a well-known discretionary attribute that is a four octet
non-negative integer. It is used by a BIS to inform other BISs in its
own RD of the originating BIS's degree of preference for an advertised
route.

Usage of this attribute is described in 7.3.1.1.

7.3.1.2. INCOMPLETE_PATH (Type Code 2)

INCOMPLETE_PATH is a well-known discretionary attribute that has a
length of zero octets; its presence indicates that some (or all) of the
path attributes or Network Layer Reachability Information contained in
this UPDATE PDU have been obtained by methods not specified by IDRP.
Conversely, its absence indicates that all path attributes and NLRI
have been learned by methods defined within IDRP.

Its usage is defined in 7.3.1.2.

7.3.1.3. RD_PATH (Type Code 3)

The RD_PATH attribute is a well-known mandatory attribute composed of a
series of RD path segments. Each path segment is represented by a
triple <path segment type, path segment length, path segment value>.
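
Every attribute, RD_PATH included, is carried in the generic 4-tuple of
7.3.1, so a receiver has to split that tuple before it can interpret a
value. The Python sketch below illustrates this step informally. The
names PathAttribute and parse_path_attribute are hypothetical, the
2-octet Length is assumed to be in network (big-endian) byte order,
which 7.3.1 does not state, and flag combinations that 7.3.1 rules out
are reported with ValueError as a local choice.

```python
import struct
from typing import NamedTuple, Tuple

OPTIONAL_BIT = 0x80     # bit 8
TRANSITIVE_BIT = 0x40   # bit 7
PARTIAL_BIT = 0x20      # bit 6; bits 1 through 5 are reserved


class PathAttribute(NamedTuple):
    optional: bool
    transitive: bool
    partial: bool
    type_code: int
    value: bytes


def parse_path_attribute(buf: bytes, offset: int = 0) -> Tuple[PathAttribute, int]:
    """Parse one <flag, type, length, value> tuple; return it and the next offset."""
    if offset + 4 > len(buf):
        raise ValueError("truncated attribute header")
    # Assumption: Length is encoded in network byte order.
    flags, type_code, length = struct.unpack_from("!BBH", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated attribute value")
    optional = bool(flags & OPTIONAL_BIT)
    transitive = bool(flags & TRANSITIVE_BIT)
    partial = bool(flags & PARTIAL_BIT)
    # Well-known attributes shall have the Transitive bit set to 1.
    if not optional and not transitive:
        raise ValueError("well-known attribute without Transitive bit")
    # Partial may be 1 only for optional transitive attributes.
    if partial and not (optional and transitive):
        raise ValueError("Partial bit set on a non-optional-transitive attribute")
    # Reserved bits are ignored on receipt.
    attr = PathAttribute(optional, transitive, partial, type_code, buf[start:end])
    return attr, end
```

Calling the function repeatedly, starting each call at the offset
returned by the previous one, walks the whole Path Attributes field.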

The path segment type is a 1-octet long field, with the following
values defined:

+--------------------------------------+---------+
| Segment Type                         | Value   |
+--------------------------------------+---------+
| RD_SET                               | 1       |
+--------------------------------------+---------+
| RD_SEQ                               | 2       |
+--------------------------------------+---------+
| ENTRY_SEQ                            | 3       |
+--------------------------------------+---------+
| ENTRY_SET                            | 4       |
+--------------------------------------+---------+

An RD_SEQ and an ENTRY_SEQ provide a list of RDIs, for routing domains
or for confederations respectively, in the order that the routing
information has travelled through them. An RD_SET and an ENTRY_SET
provide an unordered list of RDIs, for routing domains or for
confederations respectively; the routing information has not
necessarily travelled through all of the listed domains or
confederations.

The path segment length is a two octet field containing the length in
octets of the path segment value field.

The path segment value field contains one or more 2-tuples
<length, RDI>. Length is a one octet long field that contains the
length of the RDI in octets.

Usage of this attribute is defined in clause 7.3.1.3.

7.3.1.4. NEXT_HOP (Type Code 4)

This is a well-known discretionary attribute that can be used for two
principal purposes:

a) to permit a BIS to advertise a different BIS's IP address in the
   "Network Address of Next Hop" field

b) to allow a given BIS to report some or all of the SNPAs that exist
   within the local system

It is encoded as shown below:

+----------------------------------------------------+
| Address Family (2 octets)                          |
+----------------------------------------------------+
| Length of Network Address (1 octet)                |
+----------------------------------------------------+
| Network Address of Next Hop (variable)             |
+----------------------------------------------------+
| Number of SNPAs (1 octet)                          |
+----------------------------------------------------+
| Length of First SNPA (1 octet)                     |
+----------------------------------------------------+
| First SNPA (variable)                              |
+----------------------------------------------------+
| Length of Second SNPA (1 octet)                    |
+----------------------------------------------------+
| Second SNPA (variable)                             |
+----------------------------------------------------+
| ...                                                |
+----------------------------------------------------+
| Length of Last SNPA (1 octet)                      |
+----------------------------------------------------+
| Last SNPA (variable)                               |
+----------------------------------------------------+

The use and meaning of these fields are as follows:

Address Family:

   This field carries the identity of the protocol associated with the
   address information that follows. Presently defined values for this
   field are specified in RFC1700.

   A conformant implementation of IDRP for IPv6 may ignore any address
   information indicating other than IPv6.
   A conformant implementation of IDRP for IPv4 may ignore any address
   information indicating other than IPv4.

Length of Network Address:

   A 1 octet field whose value expresses the length of the "Network
   Address of Next Hop" field as measured in octets.

Network Address of Next Hop:

   A variable length field that contains the Network Address of the
   next BIS on the path to the destination system.

   An implementation of IDRP for IPv4 or IPv6 shall have the following
   values in the Network Address field:

   IPv6:
      Length of Network Address: 16
      Network Address of Next Hop: an IPv6 unicast address

   IPv4:
      Length of Network Address: 4
      Network Address of Next Hop: an IPv4 unicast address

Number of SNPAs:

   A 1 octet field which contains the number of distinct SNPAs to be
   listed in the following fields. The value 0 may be used to indicate
   that no SNPAs are listed in this attribute.

Length of Nth SNPA:

   A 1 octet field whose value expresses the length of the "Nth SNPA of
   Next Hop" field as measured in semi-octets.

Nth SNPA of Next Hop:

   A variable length field that contains an SNPA of the BIS whose
   Network Address is contained in the "Network Address of Next Hop"
   field. The field length is an integral number of octets, namely the
   rounded-up integer value of one half the SNPA length expressed in
   semi-octets; if the SNPA contains an odd number of semi-octets, the
   value in this field will be padded with a trailing all-zero
   semi-octet.

Usage of this attribute is defined in clause 7.3.1.4.

7.3.1.5. AGGREGATOR (Type Code 5)

AGGREGATOR is an optional transitive attribute of length 32 octets. The
attribute contains the last RDI that formed the aggregate route
(encoded as 16 octets), followed by the IP address of the BIS that
formed the aggregate route (encoded as 16 octets; IPv4 addresses are
prefixed with 12 octets of zeros).
The BIS that formed the aggregate route may decline to encode its
address and instead insert a value of all zeros into that field.

Usage of this attribute is defined in clause 7.3.1.5.

7.3.1.6. ATOMIC_AGGREGATE (Type Code 6)

ATOMIC_AGGREGATE is a well-known discretionary attribute of length 0.
It is used by a BIS to inform other BISs that the local system selected
for advertisement a less specific route without selecting a more
specific route which is included in it.

Usage of this attribute is defined in clause 7.3.1.6.

7.3.1.7. MULTI-EXIT_DISC (Type Code 7)

MULTI-EXIT_DISC (Multi-exit Discriminator) is an optional
non-transitive attribute that is a 4-octet non-negative integer. The
value of this attribute may be used by a BIS's decision process to
discriminate between multiple exit points to an adjacent routing
domain.

Its usage is defined in 7.3.1.7.

7.3.1.8. RD_HOP_COUNT (Type Code 13)

The RD_HOP_COUNT is a well-known mandatory attribute that contains a 1
octet long field. It contains an unsigned integer that is the upper
bound on the number of routing domains through which this UPDATE PDU
has travelled.

Its usage is defined in 7.3.1.8.

7.3.1.9. CAPACITY (Type Code 15)

CAPACITY is a well-known mandatory attribute that has a length of 1
octet, and is used to denote the relative capacity of the RD_PATH for
handling traffic. High values indicate a lower traffic handling
capacity than do low values.

Its usage is defined in 7.3.1.9.

7.3.1.10. COMMUNITIES (Type Code 16)

COMMUNITIES is an optional transitive attribute of variable length. The
attribute is a tuple <community, scope>, where the first component
specifies a particular community, and the second component specifies
the scope within which the community is defined. All routes with this
attribute belong to the communities listed in the attribute.

Communities are treated as 32 bit values; however, for administrative
assignment, the following presumptions may be made: community values
ranging from 0x00000000 through 0x0000FFFF and 0xFFFF0000 through
0xFFFFFFFF are hereby reserved.

Scope of a community is either an RD or an RDC. Scope is specified by
the RDI of the associated RD or RDC, encoded as a tuple <length, RDI>.
Length is a one octet long field that contains the length of the RDI in
octets.

The following communities have global significance and their operations
shall be implemented in any community-attribute-aware BIS.

NO_EXPORT (0xFFFFFF01)

   All routes received carrying a communities attribute containing this
   value MUST NOT be advertised outside an RDC (as specified in the
   Scope component of the attribute) boundary. (A stand-alone RD that
   is not part of an RDC should be considered an RDC itself.)

NO_ADVERTISE (0xFFFFFF02)

   All routes received carrying a communities attribute containing this
   value MUST NOT be advertised to other BISs.

NO_EXPORT_SUBCONFED (0xFFFFFF03)

   All routes received carrying a communities attribute containing this
   value MUST NOT be advertised to external neighbors.

7.3.2. Network layer reachability information

The Network Layer Reachability Information is a variable length field
that contains a list of reachable destinations encoded as zero or more
triples of the form <Address Family, Addr_length, Addr_info>, whose
fields are described below:

+---------------------------+
| Address Family (2 octets) |
+---------------------------+
| Addr_length (2 octets)    |
+---------------------------+
| Addr_info (variable)      |
+---------------------------+

The use and meaning of these fields are as follows:

Address Family:

   This field carries the identity of the protocol associated with the
   address information that follows. Presently defined values for this
   field are specified in RFC1700.

   A conformant implementation of IDRP for IPv6 may ignore any address
   information indicating other than IPv6. A conformant implementation
   of IDRP for IPv4 may ignore any address information indicating other
   than IPv4.

Addr_length:

   This field specifies the total length in octets of the address
   information that follows.

Addr_info:

   This is a variable length field that contains a list of Network
   Layer address prefixes for the routes that are being advertised.
   Each address prefix is encoded as a 2-tuple of the form
   <length, prefix>, whose fields are described below:

   +---------------------------+
   | Length (1 octet)          |
   +---------------------------+
   | Prefix (variable)         |
   +---------------------------+

   The use and the meaning of these fields are as follows:

   a) Length:

      The Length field indicates the length in bits of the address
      prefix. A length of zero indicates a prefix that matches all
      addresses of the given address family (the prefix itself then
      occupies zero octets).

   b) Prefix:

      The Prefix field contains the address prefix followed by enough
      trailing bits to make the end of the field fall on an octet
      boundary. Note that the value of the trailing bits is irrelevant.

7.4. IDRP ERROR PDU

IDRP ERROR PDUs report error conditions which have been detected by the
local BIS.

In addition to its fixed header, the IDRP ERROR PDU contains the
following fields:

+----------------------------------------------------+
| Fixed Header                                       |
+----------------------------------------------------+
| Error Code (1 octet)                               |
+----------------------------------------------------+
| Error Subcode (1 octet)                            |
+----------------------------------------------------+
| Data (variable)                                    |
+----------------------------------------------------+

The use of these fields is as follows:

Error code:

   The Error code field is 1 octet long, and shall be present in every
   IDRP ERROR PDU. It describes the type of error. The following error
   codes are defined:

   +--------------------------------------+---------+
   | Error Code                           | Value   |
   +--------------------------------------+---------+
   | OPEN_PDU_Error                       | 1       |
   +--------------------------------------+---------+
   | UPDATE_PDU_Error                     | 2       |
   +--------------------------------------+---------+
   | Hold_Timer_Expired                   | 3       |
   +--------------------------------------+---------+
   | FSM_Error                            | 4       |
   +--------------------------------------+---------+
   | RIB_REFRESH_PDU_Error                | 5       |
   +--------------------------------------+---------+

   All errors are fatal to the BIS connection.

Error subcode:

   The Error subcode is one octet long, and shall be present in every
   IDRP ERROR PDU. The error subcode provides more specific information
   about the nature of the reported error. A given IDRP ERROR PDU may
   report only one error subcode for the indicated error code.
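
The part of the IDRP ERROR PDU that follows the fixed header is small
enough to show as a short Python sketch. This is an informal
illustration: the names ErrorCode, encode_error_body and
decode_error_body are hypothetical, the fixed header is deliberately
left out, and rejecting an undefined error code with ValueError is a
local choice.

```python
from enum import IntEnum
from typing import Tuple


class ErrorCode(IntEnum):
    """Error codes defined for the IDRP ERROR PDU."""
    OPEN_PDU_ERROR = 1
    UPDATE_PDU_ERROR = 2
    HOLD_TIMER_EXPIRED = 3
    FSM_ERROR = 4
    RIB_REFRESH_PDU_ERROR = 5


def encode_error_body(code: ErrorCode, subcode: int, data: bytes = b"") -> bytes:
    """Build Error Code, Error Subcode and Data (fixed header not included)."""
    if not 0 <= subcode <= 0xFF:
        raise ValueError("Error Subcode must fit in one octet")
    return bytes([code, subcode]) + data


def decode_error_body(body: bytes) -> Tuple[ErrorCode, int, bytes]:
    """Split the octets that follow the fixed header."""
    if len(body) < 2:
        raise ValueError("Error Code and Error Subcode are mandatory")
    # ErrorCode() raises ValueError for a code this sketch does not know.
    return ErrorCode(body[0]), body[1], body[2:]
```

Exactly one subcode accompanies each code, which is why the body begins
with two fixed octets and leaves everything else to the Data field.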

   The supported error subcodes are as follows:

   a) OPEN PDU Error subcodes:

      +--------------------------------------+---------+
      | Subcode                              | Value   |
      +--------------------------------------+---------+
      | Unsupported_Version_Number           | 1       |
      +--------------------------------------+---------+
      | Bad_Max_PDU_Size                     | 2       |
      +--------------------------------------+---------+
      | Bad_Peer_RD                          | 3       |
      +--------------------------------------+---------+
      | Unsupported_Authentication_code      | 4       |
      +--------------------------------------+---------+
      | Authentication_Failure               | 5       |
      +--------------------------------------+---------+
      | Bad_RIB-TagsSet                      | 6       |
      +--------------------------------------+---------+
      | RDC_Mismatch                         | 7       |
      +--------------------------------------+---------+
      | Unacceptable Hold Time               | 8       |
      +--------------------------------------+---------+
      | Unsupported well-known parameter     | 9       |
      +--------------------------------------+---------+

   b) UPDATE PDU Error subcodes:

      +--------------------------------------+---------+
      | Subcode                              | Value   |
      +--------------------------------------+---------+
      | Malformed_Attribute_List             | 1       |
      +--------------------------------------+---------+
      | Unrecognized_Well-known_Attribute    | 2       |
      +--------------------------------------+---------+
      | Missing_Well-known_Attribute         | 3       |
      +--------------------------------------+---------+
      | Attribute_Flags_Error                | 4       |
      +--------------------------------------+---------+
      | Attribute_Length_Error               | 5       |
      +--------------------------------------+---------+
      | RD_Routing_Loop                      | 6       |
      +--------------------------------------+---------+
      | Invalid_NEXT_HOP_Attribute           | 7       |
      +--------------------------------------+---------+
      | Optional_Attribute_error             | 8       |
      +--------------------------------------+---------+
      | Invalid_Reachability_Information     | 9       |
      +--------------------------------------+---------+
      | Misconfigured_RDCs                   | 10      |
      +--------------------------------------+---------+
      | Malformed_NLRI                       | 11      |
      +--------------------------------------+---------+
      | Duplicated_Attributes                | 12      |
      +--------------------------------------+---------+
      | Illegal_RD_Path_Segment              | 13      |
      +--------------------------------------+---------+

   c) Hold_Timer_Expired Error Subcodes:

      +--------------------------------------+---------+
      | Subcode                              | Value   |
      +--------------------------------------+---------+
      | NULL                                 | 0       |
      +--------------------------------------+---------+

   d) FSM_Error Error Subcodes:

      When an FSM Error (see 8.6.1) has occurred, the first semi-octet
      of the error subcode carries the type number of the BISPDU that
      should not have been received and the second semi-octet encodes
      the state that the FSM was in when the reception took place. The
      BISPDU type numbers are defined in 7.1; the FSM states are
      encoded as follows:

      +--------------------------------------+---------+
      | FSM State                            | Value   |
      +--------------------------------------+---------+
      | CLOSED                               | 1       |
      +--------------------------------------+---------+
      | OPEN-RCVD                            | 2       |
      +--------------------------------------+---------+
      | OPEN-SENT                            | 3       |
      +--------------------------------------+---------+
      | CLOSE-WAIT                           | 4       |
      +--------------------------------------+---------+
      | ESTABLISHED                          | 5       |
      +--------------------------------------+---------+

   e) RIB_REFRESH_PDU_Error Subcodes:

      +--------------------------------------+---------+
      | Subcode                              | Value   |
      +--------------------------------------+---------+
      | Invalid_OpCode                       | 1       |
      +--------------------------------------+---------+
      | Unsupported_RIB-Tags                 | 2       |
      +--------------------------------------+---------+

Data:

   This variable length field contains zero or more octets of data to
   be used in diagnosing the reason for the IDRP ERROR PDU. The
   contents of the Data field depend upon the error code and error
   subcode.

   Note that the length of the Data field can be determined from the
   Length field of the BISPDU header. The minimum length of the IDRP
   ERROR PDU is 32 octets, including the BISPDU header.

7.5. KEEPALIVE PDU

A KEEPALIVE PDU consists of only a PDU header and has a length of 30
octets.

A BIS can use the periodic exchange of KEEPALIVE PDUs with an adjacent
BIS to verify that the peer BIS is reachable and active. KEEPALIVE PDUs
are exchanged often enough not to cause the hold time advertised in the
OPEN PDU to expire. A reasonable maximum time between KEEPALIVE PDUs
would be one third of the Hold Time interval. An implementation may
adjust the rate at which it sends KEEPALIVE PDUs as a function of the
Hold Time interval. If the negotiated Hold Time interval is zero, then
periodic KEEPALIVE PDUs shall not be sent.

A KEEPALIVE PDU may be sent asynchronously to acknowledge the receipt
of other BISPDUs. Sending a KEEPALIVE PDU does not cause the sender's
sequence number to be incremented.

7.6. CEASE PDU

A CEASE PDU consists of only a PDU header and has a length of 30
octets.

A CEASE PDU is used by the originating BIS to instruct the peer BIS to
close the BIS-BIS connection. Receipt of a CEASE PDU will cause the BIS
to close down the connection with the BIS that issued it, as described
in 8.6.2.

7.7. RIB REFRESH PDU

The RIB REFRESH PDU is used to allow a BIS to send a refresh of the
routing information in an Adj-RIB-Out to a neighbor BIS, or to solicit
a neighbor BIS to send a refresh of its Adj-RIB-Out to the local BIS.
The RIB REFRESH PDU contains a fixed header and also the additional
fields shown below:

+----------------------------------------------------+
| Fixed Header                                       |
+----------------------------------------------------+
| OpCode (1 octet)                                   |
+----------------------------------------------------+
| RIB-Tags (variable)                                |
+----------------------------------------------------+

The use and meaning of these fields are as follows:

There are three OpCode values defined:

+---------+--------------------------------------+
| Code    | Operation                            |
+---------+--------------------------------------+
| 1       | RIB Refresh Request                  |
+---------+--------------------------------------+
| 2       | RIB Refresh Start                    |
+---------+--------------------------------------+
| 3       | RIB Refresh End                      |
+---------+--------------------------------------+

The RIB-Tags field contains the RIB-Tags of the Adj-RIB-In for which a
refresh is being requested. This field is encoded in the same way that
the RIB-TagsSet field of the OPEN PDU is encoded.

Its usage is defined in 8.10.2.

8. Elements of procedure

This clause explains the elements of procedure used by the protocol
specified in this document; it also describes the naming conventions
and system deployment practices assumed by this protocol.

8.1. Naming and addressing conventions

IDRP for IPv4 and IPv6 does not assume or require any particular
structure for IP addresses.
That is, as long as the domain administrator assigns addresses that
are consistent with the deployment constraints of 8.2 of this
document, the protocol will operate correctly.

IP address prefixes provide a compact way of identifying groups of
systems that reside in a given domain or confederation. A prefix may
be shorter than, or the same length as, an IP address (a full IPv4 or
IPv6 address is a special case of an address prefix). The length of an
encoded prefix is specified in bits.

Each routing domain and routing domain confederation whose BIS(s)
implement IDRP for IPv4 and IPv6 shall have an unambiguous routing
domain identifier (RDI), which is an IPv4 or IPv6 address prefix. In
the case of IPv4 address prefixes, the prefix value shall be prepended
with 12 octets of zeros. An RDI is assigned statically and does not
change based on the operational status of a routing domain. An RDI
identifies a routing domain or confederation uniquely, but does not
necessarily convey any information about policies or identities of its
members.

8.2. Deployment guidelines

Hosts and routers may use any IP unicast addresses, provided that
these addresses are globally unambiguous. However, correct and
efficient operation of this protocol can only be guaranteed if the
address assignment reflects the actual topology, that is, if
addresses are topologically significant.

8.3. Domain configuration information

Correct operation of IDRP assumes that a minimum amount of information
is available to both the inter-domain and intra-domain routing
protocols. This information is static in nature, and is not expected
to change frequently. This document assumes that this information is
supplied via the IDRP MIB. While the following is phrased in terms of
the MIB, this document allows alternative mechanisms (e.g.
configuration files) as well.
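As a non-normative illustration of the RDI convention in 8.1, the
following Python sketch expands an IPv4 or IPv6 prefix into a 16-octet
value, prepending 12 octets of zeros to an IPv4 prefix value. The
helper name rdi_value is hypothetical, and returning the prefix length
unchanged is an assumption; this document does not state here how the
length of an IPv4-derived RDI is expressed, nor whether trailing octets
are trimmed when an RDI is encoded in a BISPDU.

```python
import ipaddress


def rdi_value(prefix: str) -> tuple[bytes, int]:
    """Return (16-octet value, prefix length in bits) for an RDI prefix.

    Hypothetical helper, not part of the protocol. IPv4 prefix values
    are prepended with 12 octets of zeros (8.1); IPv6 prefix values are
    used as they are. The prefix length is passed through unchanged,
    which is an assumption of this sketch.
    """
    net = ipaddress.ip_network(prefix, strict=True)
    packed = net.network_address.packed
    if net.version == 4:
        packed = bytes(12) + packed  # 12 zero octets, then the 4 IPv4 octets
    return packed, net.prefixlen


value, bits = rdi_value("192.0.2.0/24")
print(value.hex(), bits)
```

The addresses used here come from documentation ranges and carry no
meaning beyond the example.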
The information required by a BIS that implements the IDRP for IPv4
and IPv6 protocol is:

a) Location and identity of adjacent Intra-Domain routers:

   The MIB table IntraIS lists the IP addresses of the routers to
   which the local BIS may deliver an inbound NPDU whose destination
   lies within the BIS's routing domain. The routers listed in the
   IntraIS table support the intra-domain routing protocol of this
   domain, and share at least one common subnet with the BIS.

   In particular, if the local BIS participates in both the
   inter-domain routing protocol (IDRP) and the intra-domain routing
   protocol, then the IP address of the local BIS will be listed in
   the IntraIS table.

b) Location and identity of BISs in the BIS's domain:

   This information permits a BIS to identify all other BISs located
   within its routing domain. This information is contained in the
   MIB table internalBISNeighbors, which contains a set of IP
   addresses which identify the BISs in the domain.

c) Location and identity of BISs in adjacent domains:

   Each BIS needs information to identify the IP address of each BIS
   located in an adjacent RD and reachable via a single subnetwork
   hop. This information is contained in the IDRP MIB table
   externalBISNeighbors, which is a table of IP addresses.

d) IP network address information for all systems in the routing
   domain:

   This information is used by the BIS to construct its network layer
   reachability information. This information is contained in the MIB
   table internalSystems, which lists NLRI (expressed as address
   prefixes) of the systems within the routing domain.

e) Local RDI:

   This information is contained in managed object localRDI; it is
   the RDI of the routing domain in which the BIS is located.

f) RDC-Config:

   This information identifies all the routing domain confederations
   (RDCs) to which the RD of the local BIS belongs, and it describes
   the nesting relationships that are in force between them. It is
   contained in the MIB table rdcConfig. Note that since a domain is
   not required to belong to a confederation, this information is
   optional and needs to be present only at BISs of the domains that
   are part of one or more RDCs.

g) RIBTagsSet:

   This managed object lists all of the RIB-Tags which are supported
   by the BISs located in this routing domain.

8.4. Advertising NLRI

The NLRI field in an UPDATE PDU contains information about the
addresses of systems that reside within a given routing domain or
whose Network Layer addresses are under control of the administrator
of that routing domain; it should not be used to convey information
about the operational status of these systems. The information in the
NLRI field is intended to convey static administrative information
rather than dynamic transient information: for example, it is not
necessary to report that a given system has changed its status from
online to offline.

The following guidelines for inclusion of Network Layer address
prefixes in the NLRI field of UPDATE PDUs originated within a given
routing domain will provide efficient operation of this protocol:

a) Network Layer addresses that are within the control of the
   administrator of a given routing domain may be reported in the
   NLRI field for that routing domain. The Network Layer address
   prefixes can provide information about systems that are online,
   systems that are offline, or unallocated Network Layer addresses.

   The ability to include unallocated Network Layer addresses and
   Network Layer addresses of offline systems in the NLRI allows a
   routing domain administrator to advertise compact prefixes, thus
   minimizing the amount of data carried in the BISPDUs.

b) Network Layer addresses that are known to correspond to systems
   that are not under control of the routing domain administrator
   should not be included in the NLRI field for that routing domain.

c) For efficient operation of this protocol, the WITHDRAWN ROUTES
   field should not be used to report the NLRI of systems in the
   local routing domain that are offline. This field should be used
   only to advertise Network Layer address prefixes that are no
   longer under control of the administrator of the local routing
   domain, regardless of whether such systems are online or offline.

   Note 9: Although the protocol in this document will operate
   correctly if each system is reported individually as a
   maximum-length Network Layer address prefix, this will result in a
   large amount of routing information, and hence can result in
   inefficient operation of this protocol.

This protocol provides no means to verify that the preceding
guidelines are followed. However, it is within the prerogative of the
administrator of any routing domain to implement policies that ignore
UPDATE PDUs that contain an excessive amount of NLRI or that can cause
inefficient operation of this protocol.

8.5. An interface to IP

IDRP information is carried between a pair of BISs in the form of
BISPDUs. For IDRP for IPv6 these BISPDUs are carried in the data field
of IP packets of protocol type 45.

IDRP relies on IP to perform the initial processing of incoming
BISPDUs. The IP protocol machine shall process inbound packets
according to the appropriate IP functions. If the fixed header of an
IP packet contains a protocol type that identifies IDRP, and the
packet's source address identifies any system listed in managed
objects internalBISNeighbors or externalBISNeighbors, then the packet
contains a BISPDU. The BISPDU shall be passed to the IDRP finite state
machine defined in 8.6.1.

8.6. BIS-BIS connection management

The protocol described in this document relies on the underlying
Network layer service to establish a full-duplex communications
channel between each pair of BISs.

8.6.1. BIS finite state machines

A BIS shall maintain one finite state machine (FSM) for each BIS-BIS
connection that it supports, and the FSMs in a given BIS shall run
independently of one another. A BIS-BIS connection will progress
through a series of states during its lifetime, which are summarized
in the state table shown in Table 2.

BISPDUs passed to this finite state machine are subject to the flow
control procedures of 8.7.5 if the FSM is in the ESTABLISHED state.
When the FSM is in the ESTABLISHED state, only BISPDUs that are not
discarded by the flow control process are processed by the FSM. In all
other states, all BISPDUs are processed directly by the finite state
machine without being subject to flow control procedures.

Notation Warning: To create a readable table within the bounds of a
flat ASCII file using a monospaced font at 10 characters/inch, the
following abbreviated notation is used within the table:

   start   activate
   stop    deactivate
   CLSD    CLOSED
   OPRC    OPEN-RCVD
   OPSN    OPEN-SENT
   CLWT    CLOSE-WAIT
   ESTB    ESTABLISHED
   KPALV   KEEPALIVE
   ClWtD   CloseWaitDelay
   LFO     ListenForOPEN

In describing the FSM transitions in response to receipt of BISPDUs,
the following shorthand notation is used:

a) Receive with no errors means that none of the error conditions
   defined in the appropriate subclause of 8.18 have been detected.

b) Receive with errors means that an error condition defined in the
   appropriate subclause of 8.18 has been detected.
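
An FSM Error (a properly formed BISPDU received in a state in which it
should not be received) is reported with an error subcode whose first
semi-octet carries the BISPDU type number and whose second semi-octet
carries the encoded FSM state. The following Python sketch illustrates
that packing. It is a minimal, non-normative example: the function
names are hypothetical, it assumes that the "first" semi-octet is the
high-order four bits of the octet, and it takes the BISPDU type number
as a plain integer because the type numbers are defined in 7.1 rather
than here.

```python
from enum import IntEnum


class FsmState(IntEnum):
    """Encoded FSM state values, as listed with the FSM_Error subcodes."""
    CLOSED = 1
    OPEN_RCVD = 2
    OPEN_SENT = 3
    CLOSE_WAIT = 4
    ESTABLISHED = 5


def encode_fsm_error_subcode(pdu_type: int, state: FsmState) -> int:
    """Pack the offending BISPDU type and the FSM state into one octet.

    Assumption of this sketch: the first semi-octet is the high-order
    four bits and the second semi-octet is the low-order four bits.
    """
    if not 0 <= pdu_type <= 0xF:
        raise ValueError("BISPDU type must fit in one semi-octet")
    return (pdu_type << 4) | int(state)


def decode_fsm_error_subcode(subcode: int) -> tuple[int, FsmState]:
    """Split an FSM Error subcode octet into (BISPDU type, FSM state)."""
    return (subcode >> 4) & 0x0F, FsmState(subcode & 0x0F)


# The type number 4 below is an arbitrary sample value, not taken from 7.1.
subcode = encode_fsm_error_subcode(4, FsmState.OPEN_SENT)
print(hex(subcode), decode_fsm_error_subcode(subcode))
```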

It is possible to receive a BISPDU which is properly formed, but which
normally should not be received while the FSM is in the given state.
Such an event constitutes an FSM Error. If an FSM Error can occur for
a given state, it is shown in the description of that state.

8.6.1.1. CLOSED State

Initially the BIS Finite State Machine is in the CLOSED state. The
CLOSED state exists when no BIS-BIS connection exists and there is no
connection record allocated.

While in the CLOSED state, the BIS shall take the following actions:

a) If the BIS receives a deactivate, no action shall be taken and the
   FSM shall remain in the CLOSED state.

b) If the FSM receives an activate, the local BIS shall generate an
   initial sequence number (see 8.7.4), and shall send an OPEN PDU to
   the remote BIS. The sequence field of the OPEN PDU shall contain
   the Initial Sequence Number (ISN); the Acknowledgement and Credit
   Available fields shall contain the value 0; and the Credit Offered
   field shall contain the initial flow control credit. The FSM shall
   enter the OPEN-SENT state.

c) If the managed object ListenForOPEN is TRUE, and the BIS receives
   an OPEN PDU with no errors, then the local BIS shall generate an
   initial sequence number (see 8.7.4) and shall send an OPEN PDU to
   the remote BIS. The sequence field of the OPEN PDU shall contain
   the Initial Sequence Number (ISN), the Acknowledgement field shall
   acknowledge the received OPEN PDU, the Credit Available field
   shall be set according to the procedures of 8.7.5 (b), and the
   Credit Offered field shall contain the initial flow control
   credit. The FSM shall then change its state to OPEN-RCVD.

d) If the managed object ListenForOPEN is TRUE and the BIS receives
   any BISPDU other than an OPEN PDU with no errors, or if the
   managed object ListenForOPEN is FALSE and the BIS receives any
   BISPDU, with or without errors, the BIS shall ignore the BISPDU
   and the FSM shall remain in the CLOSED state.

8.6.1.2. OPEN-SENT State

While in the OPEN-SENT state, the BIS shall take the following
actions:

a) If the FSM receives an activate, the BIS shall ignore it, and the
   FSM shall remain in the OPEN-SENT state.

b) If the FSM receives a deactivate, the BIS shall send a CEASE PDU
   to its peer, and the FSM shall enter the CLOSE-WAIT state.

c) If the BIS receives a BISPDU with header errors (see 8.18.1), it
   shall log the error event, and the FSM shall remain in the
   OPEN-SENT state.

d) If the BIS receives an OPEN PDU with errors (see 8.18.2), it shall
   send an IDRP ERROR PDU to the adjacent BIS, acknowledging the
   remote BIS's OPEN PDU. The FSM shall then enter the CLOSE-WAIT
   state.

e) If the BIS receives an OPEN PDU with no errors that does not
   acknowledge its own previously sent OPEN PDU, then the local BIS
   shall resend its own OPEN PDU with the same sequence number and
   with an acknowledgement of the remote BIS's OPEN PDU. The value of
   the Credit Available field shall be set according to the
   procedures of 8.7.5 (b). The FSM shall then change its state to
   OPEN-RCVD.

f) If the BIS receives an OPEN PDU with no errors that acknowledges
   its own previously sent OPEN PDU, the local BIS shall send a
   KEEPALIVE, RIB REFRESH, or UPDATE PDU that acknowledges the OPEN
   PDU received from the remote BIS. The FSM shall then enter the
   ESTABLISHED state.

g) If the BIS receives an IDRP ERROR PDU, either with or without
   errors, it shall send a CEASE PDU, and the FSM shall change its
   state to CLOSED.

h) If the BIS receives a RIB REFRESH PDU or UPDATE PDU, either with
   or without errors, it shall issue an IDRP ERROR PDU, indicating
   "FSM Error". The FSM shall then enter the CLOSE-WAIT state.

i) If the BIS receives a KEEPALIVE PDU, it shall issue an IDRP ERROR
   PDU, indicating "FSM Error". The FSM shall then enter the
   CLOSE-WAIT state.

j) If the BIS receives a CEASE PDU, it shall issue a CEASE PDU in
   return, and then the FSM shall enter the CLOSED state.

k) If the BIS does not receive an OPEN PDU within a period t[R] after
   sending an OPEN PDU, the BIS shall resend the OPEN PDU. If the
   OPEN PDU is retransmitted n times, the local BIS shall issue a
   deactivate to close the BIS-BIS connection.

   Note 10: The value t[R] should be chosen to be large enough so
   that attempting to establish a connection to an unresponsive peer
   BIS does not consume significant network resources. The values of
   t[R] and n must be chosen so that t[R] * n is greater than the
   architectural constant CloseWaitDelay.

8.6.1.3. OPEN-RCVD State

While in the OPEN-RCVD state, the BIS shall take the following
actions:

a) If the BIS receives an activate, it shall ignore it and the FSM
   shall remain in the OPEN-RCVD state.

b) If the BIS receives a deactivate, it shall send a CEASE PDU to the
   remote BIS, and the FSM shall enter the CLOSE-WAIT state.

c) If the BIS receives a BISPDU with a header error, it shall log the
   error event, and the FSM shall remain in the OPEN-RCVD state.

d) If the BIS receives a KEEPALIVE PDU that acknowledges its
   previously sent OPEN PDU, then the FSM shall enter the ESTABLISHED
   state.

e) If the BIS receives a KEEPALIVE PDU that does not acknowledge its
   previously sent OPEN PDU, the BIS shall issue an IDRP ERROR PDU
   indicating "FSM Error", and the FSM shall change its state to
   CLOSE-WAIT.

f) If the BIS receives a CEASE PDU, it shall issue a CEASE PDU in
   return, and then the FSM shall enter the CLOSED state.

g) If the BIS receives an OPEN PDU with no errors from the remote BIS
   that acknowledges the local BIS's previously sent OPEN PDU, the
   BIS shall send a KEEPALIVE, RIB REFRESH, or UPDATE PDU that
   acknowledges the OPEN PDU received from the remote BIS. The FSM
   shall then enter the ESTABLISHED state.

h) If the BIS receives an OPEN PDU with no errors that does not
   acknowledge the local BIS's previously sent OPEN PDU, then the
   local BIS shall resend its own OPEN PDU with the same sequence
   number, and shall also include an acknowledgement of the remote
   BIS's OPEN PDU. The FSM shall remain in the OPEN-RCVD state.

i) If the BIS receives an UPDATE PDU or RIB REFRESH PDU with errors,
   the BIS shall send an IDRP ERROR PDU to the remote BIS, and the
   FSM shall enter the CLOSE-WAIT state.

j) If the BIS receives an UPDATE PDU or RIB REFRESH PDU with no
   errors that acknowledges the OPEN PDU previously sent by the local
   BIS, the FSM shall enter the ESTABLISHED state.

k) If the BIS receives an UPDATE PDU or RIB REFRESH PDU with no
   errors that does not acknowledge the OPEN PDU previously sent by
   the local BIS, the BIS shall issue an IDRP ERROR PDU indicating
   "FSM Error", and the FSM shall change its state to CLOSE-WAIT.

l) If the BIS receives an IDRP ERROR PDU, either with or without
   errors, it shall send a CEASE PDU to the remote BIS, and the FSM
   shall enter the CLOSED state.

m) If the BIS does not exit the OPEN-RCVD state within a period t[R]
   after sending an OPEN PDU, the BIS shall resend the OPEN PDU. If
   the OPEN PDU is transmitted n times, the local BIS shall issue a
   deactivate.

   Note 11: The value t[R] should be chosen to be large enough so
   that attempting to establish a connection to an unresponsive peer
   BIS does not consume significant network resources.
   The values of t[R] and n must be chosen so that t[R] * n is
   greater than the architectural constant CloseWaitDelay.

8.6.1.4. ESTABLISHED State

The ESTABLISHED state is entered from either the OPEN-SENT or the
OPEN-RCVD states. It is entered when a connection has been established
by the successful exchange of state information between two sides of
the connection. Each side has exchanged and received such data as
initial sequence number, maximum PDU size, credit offered, protocol
version number, hold time, and RDI of the other side. In addition, the
remote BIS may also have been authenticated.

Table 2 (Part 1 of 7). BIS Finite State Machine. This table
summarizes the effects that its inputs will have on an IDRP FSM,
giving both state transitions and the actions to be taken.

+--------+-----------+-------------+-------------+----------+----------+
| STATE >| CLSD      | OPRC        | OPSN        | CLWT     | ESTB     |
| INPUT V|           |             |             |          |          |
+--------+-----------+-------------+-------------+----------+----------+
| start  | S=OPSN    | S=OPRC      | S=OPSN      | S=CLWT   | S=ESTB   |
|        | A=send    | A=none      | A=none      | A=none   | A=none   |
|        | OPEN PDU  |             |             |          |          |
+--------+-----------+-------------+-------------+----------+----------+
| stop   | S=CLSD    | S=CLWT      | S=CLWT      | S=CLWT   | S=CLWT   |
|        | A=none    | A=send      | A=send      | A=none   | A=send   |
|        |           | CEASE PDU   | CEASE PDU   |          | CEASE PDU|
+--------+-----------+-------------+-------------+----------+----------+
| Expiry | S=CLSD    | S=OPRC      | S=OPSN      | S=CLSD   | S=ESTB   |
| of     | A=none    | A=none      | A=none      | A=8.6.2  | A=none   |
| ClWtD  |           |             |             |          |          |
| Timer  |           |             |             |          |          |
+--------+-----------+-------------+-------------+----------+----------+
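
As a non-normative illustration, the handling of an OPEN PDU received
with no errors (8.6.1.1 through 8.6.1.3, and the corresponding row of
Table 2) can be sketched in Python as a function from the current state
to a next state and an action. The names State and on_open_no_errors,
and the use of plain strings for actions, are assumptions of this
sketch; sequence numbers, credit fields, and timers are deliberately
left out.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN_RCVD = "OPEN-RCVD"
    OPEN_SENT = "OPEN-SENT"
    CLOSE_WAIT = "CLOSE-WAIT"
    ESTABLISHED = "ESTABLISHED"


def on_open_no_errors(state, ack_correct=False, listen_for_open=False):
    """Return (next_state, action) for "Receive OPEN PDU with no errors".

    ack_correct: the received OPEN PDU acknowledges our own OPEN PDU.
    listen_for_open: value of the managed object ListenForOPEN.
    """
    if state is State.CLOSED:
        # A passive open is accepted only when ListenForOPEN is TRUE.
        if listen_for_open:
            return State.OPEN_RCVD, "send OPEN PDU"
        return State.CLOSED, "none"
    if state in (State.OPEN_SENT, State.OPEN_RCVD):
        if ack_correct:
            return State.ESTABLISHED, "send KEEPALIVE, UPDATE, or RIB REFRESH PDU"
        # Resend our OPEN PDU, now acknowledging the peer's OPEN PDU.
        return State.OPEN_RCVD, "resend OPEN PDU"
    if state is State.CLOSE_WAIT:
        return State.CLOSE_WAIT, "send CEASE PDU, restart CloseWaitDelay timer"
    # ESTABLISHED: an OPEN PDU is not expected here, so it is an FSM Error.
    return State.CLOSE_WAIT, "send IDRP ERROR PDU (FSM Error)"


print(on_open_no_errors(State.OPEN_SENT, ack_correct=True))
```

A table-driven implementation covering every input of Table 2 would
follow the same pattern, with one such function or dictionary entry per
input.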
In ESTABLISHED state, both BISs that are involved in the connection may exchange UPDATE PDUs, KEEPALIVE PDUs, IDRP ERROR PDUs, RIB REFRESH PDUs, and CEASE PDUs. While in the ESTABLISHED state, the local BIS shall take the following actions: a) If the FSM receives an activate, the FSM shall ignore it, and the FSM shall remain in the ESTABLISHED state. b) If the FSM receives a deactivate, the BIS shall send a CEASE PDU to the peer BIS. The FSM shall enter the CLOSE-WAIT state. c) If the Hold Timer expires, the BIS shall issue an IDRP ERROR PDU to the remote BIS, reporting a Hold_Timer error. The FSM shall enter the CLOSE-WAIT state. Yakov Rekhter, Paul Traina [Page 50] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 2 of 7). BIS Finite State Machine. This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. | +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Expiry | S=CLSD | S=OPRC | S=OPSN | S=CLWT | S=CLWT | | of | A=none | A=none | A=none | A=none | A=Send | | Hold | | | | | IDRP | | Timer | | | | | ERROR | | | | | | | PDU to | | | | | | | report | | | | | | | error | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=OPRC | S=OPSN | S=CLWT | S=ESTB | | BISPDU | A=none | A=log error | A=log error | A=none | A=log | | with | | event | event | | error | | Header | | | | | event | | Error | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | If ACK is | S=CLWT | S=CLWT | S=ESTB | | KPALV | A=none | correct, | A=Send IDRP | A=send | A=Restart| | PDU | | | ERROR PDU | CEASE, | Hold | 
| with | | S=ESTB | to report | restart | Timer | | no | | A=Restart | FSM Error | ClWtD | | | errors | | Hold Timer | | timer | | | | | | | | | | | | If ACK is | | | | | | | incorrect, | | | | | | | | | | | | | | S=CLWT | | | | | | | A=Send | | | | | | | IDRP ERROR | | | | | | | PDU to | | | | | | | peer BIS | | | | | | | to report | | | | | | | FSM Error | | | | +--------+-----------+-------------+-------------+----------+----------+ d) If the BIS receives a BISPDU with a header error, it shall log the error event, and the FSM shall remain in the ESTABLISHED state. Yakov Rekhter, Paul Traina [Page 51] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 3 of 7). BIS Finite State Machine. This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. | +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=CLSD | S=CLSD | S=CLSD | S=CLSD | | CEASE | A=none | A=send | A=send | A=7.6.2 | A=send | | PDU | | CEASE, | CEASE, | | CEASE, | | with | | 7.6.2 | 7.6.2 | | 7.6.2 | | no | | | | | | | errors | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLOSE | S=CLWT | S=CLWT | S=CLWT | S=CLWT | | OPEN | A=none | A=Send IDRP | A=Send IDRP | A=none | A=Send | | PDU | | ERROR PDU | ERROR PDU | | IDRP | | with | | to peer BIS | to peer BIS | | ERROR | | errors | | to report | to report | | PDU to | | | | OPEN PDU | OPEN PDU | | peer BIS | | | | error | error | | to | | | | | | | report | | | | | | | OPEN PDU | | | | | | | error | +--------+-----------+-------------+-------------+----------+----------+ e) If the BIS receives a 
KEEPALIVE PDU, it shall restart its Hold Timer, and the FSM shall remain in the ESTABLISHED state. f) If the BIS receives a CEASE PDU, it shall issue a CEASE PDU in return, and then the FSM shall enter the CLOSED state. g) If an OPEN PDU with no errors is received from the peer BIS, it shall issue an IDRP ERROR PDU, indicating FSM error. The FSM shall enter the CLOSE-WAIT state. h) If the BIS receives an UPDATE PDU with no errors, the BIS shall perform the actions provided in 8.14 , and shall restart its Hold Timer. The FSM shall remain in the ESTABLISHED state. i) If the BIS receives a RIB REFRESH PDU with no errors, the BIS shall perform the actions provided in 8.10.2 , and shall restart its Hold Timer. The FSM shall remain in the ESTABLISHED state. Yakov Rekhter, Paul Traina [Page 52] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 4 of 7). BIS Finite State Machine. This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. 
| +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| If LFO is | If ACK is | If ACK is | S=CLWT | S=CLWT | | OPEN | TRUE, | correct, | correct, | A=send | A=Send | | PDU | | | | CEASE, | IDRP | | with | S=OPRC | S=ESTB | S=ESTB | restart | ERROR | | no | A=send | A=send | A=send | ClWtD | PDU to | | errors | OPEN PDU | KPALV, | KPALV, | timer | peer BIS | | | | UPDATE, or | UPDATE, or | | to | | | If LFO is | RIB | RIB | | report | | | FALSE, | REFRESH | REFRESH | | FSM | | | | PDU | PDU | | error | | | S=CLSD | | | | | | | A=none | If ACK is | If ACK is | | | | | | incorrect, | incorrect, | | | | | | | | | | | | | S=OPRC, | S=OPRC | | | | | | A=send | A=send | | | | | | OPEN PDU | OPEN PDU | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=CLWT | S=CLWT | S=CLWT | S=CLWT | | UPDATE | A=none | A=Send IDRP | A=Send IDRP | A=none | A=Send | | PDU | | ERROR PDU | ERROR PDU | | IDRP | | with | | to peer BIS | to peer BIS | | ERROR | | errors | | to report | to report | | PDU to | | | | UPDATE PDU | FSM error | | peer BIS | | | | error | | | to | | | | | | | report | | | | | | | UPDATE | | | | | | | PDU | | | | | | | error | +--------+-----------+-------------+-------------+----------+----------+ j) If the BIS receives an UPDATE PDU with errors, an OPEN PDU with errors, or a RIB REFRESH PDU with errors, it shall send an IDRP ERROR PDU to the remote BIS to report the error, and the FSM shall enter the CLOSE-WAIT state. Yakov Rekhter, Paul Traina [Page 53] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 5 of 7). BIS Finite State Machine. 
This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. | +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | If ACK is | S=CLWT | S=CLWT | S=ESTB | | UPDATE | A=none | correct, | A=Send IDRP | A=send | A=7.14, | | PDU | | | ERROR PDU | CEASE, | restart | | with | | S=ESTB | to peer BIS | restart | Hold | | no | | A=7.14, | to report | ClWtD | Timer | | errors | | restart | FSM error | timer | | | | | Hold Timer | | | | | | | | | | | | | | If ACK is | | | | | | | incorrect, | | | | | | | | | | | | | | S=CLWT | | | | | | | A=Send | | | | | | | IDRP ERROR | | | | | | | PDU to | | | | | | | peer BIS | | | | | | | to report | | | | | | | FSM Error | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=CLSD | S=CLSD | S=CLSD | S=CLSD | | IDRP | A=none | A=Send | A=Send | A=Send | A=Send | | ERROR | | CEASE PDU, | CEASE PDU, | CEASE | CEASE | | PDU | | 7.6.2 | 7.6.2 | PDU, | PDU, | | with | | | | 7.6.2 | 7.6.2 | | errors | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=CLSD | S=CLSD | S=CLSD | S=CLSD | | IDRP | A=none | A=Send | A=Send | A=Send | A=Send | | ERROR | | CEASE PDU, | CEASE PDU, | CEASE | CEASE | | PDU | | 7.6.2 | 7.6.2 | PDU, | PDU, | | with | | | | 7.6.2 | 7.6.2 | | no | | | | | | | errors | | | | | | +--------+-----------+-------------+-------------+----------+----------+ Yakov Rekhter, Paul Traina [Page 54] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 6 of 7). BIS Finite State Machine. 
This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. | +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | S=CLWT | S=CLWT | S=CLWT | S=CLWT | | RIB | A=none | A=Send IDRP | A=Send IDRP | A=none | A=Send | | REFRESH| | ERROR PDU | ERROR PDU | | IDRP | | PDU | | to peer BIS | to peer BIS | | ERROR | | with | | to report | to report | | PDU to | | errors | | RIB REFRESH | FSM error | | peer BIS | | | | PDU error | | | to | | | | | | | report | | | | | | | RIB | | | | | | | REFRESH | | | | | | | PDU | | | | | | | error | +--------+-----------+-------------+-------------+----------+----------+ k) If the BIS receives an IDRP ERROR PDU, either with or without errors, it shall send a CEASE PDU to the remote BIS. The FSM shall enter the CLOSED state. CLOSE-WAIT State When an FSM enters the CLOSE-WAIT state, the local BIS is preparing to close the connection with the remote BIS. Upon entering this state, the local BIS shall mark all entries in the Adj-RIB-In associated with the adjacent BIS as unreachable, and shall then re- run its Decision Process. The CloseWaitDelay timer shall be started. While in the CLOSE-WAIT state, the BIS shall take the following actions: a) If the CloseWaitDelay timer expires, the connection ceases to exist. The FSM shall enter the CLOSED state. b) If the BIS receives a CEASE PDU, the FSM shall enter the CLOSED state. c) If the BIS receives an IDRP ERROR PDU, it shall send a CEASE Yakov Rekhter, Paul Traina [Page 55] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Table 2 (Page 7 of 7). BIS Finite State Machine. 
This table | | summarizes the effects that its inputs will | | have on an IDRP FSM, giving both state | | transitions and the actions to be taken. | +--------+-----------+-------------+-------------+----------+----------+ | STATE | CLSD | OPRC | OPSN | CLWT | ESTB | | > | | | | | | +--------+ | | | | | | INPUT | | | | | | | V | | | | | | +--------+-----------+-------------+-------------+----------+----------+ | Receive| S=CLSD | If ACK is | S=CLWT | S=CLWT | S=ESTB | | RIB | A=none | correct, | A=Send IDRP | A=send | A=7.10.3,| | REFRESH| | | ERROR PDU | CEASE, | restart | | PDU | | S=ESTB | to report | restart | Hold | | with | | A=7.10.3, | FSM Error | ClWtD | Timer | | no | | restart | | timer | | | errors | | Hold Timer | | | | | | | | | | | | | | If ACK is | | | | | | | incorrect, | | | | | | | | | | | | | | S=CLWT | | | | | | | A=Send | | | | | | | IDRP ERROR | | | | | | | PDU to | | | | | | | peer BIS | | | | | | | to report | | | | | | | FSM Error | | | | +--------+-----------+-------------+-------------+----------+----------+ PDU to the peer BIS. The FSM shall then enter the CLOSED state. d) If the BIS receives any other type of BISPDU, with or without errors, it shall issue a CEASE PDU. The FSM shall remain in the CLOSE-WAIT state, and the CloseWaitDelay timer shall be restarted. e) The BIS shall take no action for any of the following inputs, and the FSM shall remain in the CLOSE-WAIT state: - activate - deactivate - Expiration of Hold Timer Yakov Rekhter, Paul Traina [Page 56] Internet Draft draft-ietf-idr-idrp2-00.txt June 1996 +----------------------------------------------------------------------+ | Notes: | | | | a) "S" indicates the state into which the FSM will make a | | transition after performing the indicated action. | | | | b) "A" indicates the action to be taken. | | | | c) "X.Y.Z" is shorthand notation for "do as specified in clause | | X.Y.Z". 
| | | | d) The phrase "no errors" for a given BISPDU type means that no | | condition described in the appropriate subclause of 7.20 has | | been detected. | | | | e) The phrase "with errors" for a given BISPDU type means that a | | condition described in the appropriate subclause of 7.20 has | | been detected. | | | | f) Since the KPALV PDU and the CEASE PDU consist of only a fixed | | BISPDU header, errors in these BISPDUs are handled as Header | | Errors. Hence, there are no explicit entries in the table for | | "KPALV with errors" or "CEASE with errors". | +----------------------------------------------------------------------+

8.6.2. Closing a connection

The closing of a connection can be initiated by a deactivate generated by the local system, by receipt of an incorrect PDU, by receipt of an IDRP ERROR PDU, by expiration of the Hold Timer, or by receipt of a CEASE PDU. The actions taken in response to each of these stimuli are shown in Table 2.

When the connection enters the CLOSED state, the sequence number last used by the local BIS is recorded in managed object lastPriorSeqNo, and all routes that had been exchanged between the pair of BISs are implicitly withdrawn from service; hence, the local BIS should rerun its Decision Process.

8.7. Validation of BISPDUs

The protocol described in this document is a connection-oriented protocol in which the underlying Network Layer service is used to establish full-duplex communication channels between pairs of BISs, as described in 8.6. The protocol supports the use of any of the following three mechanisms for validating BISPDUs. Types 1, 2, and 3 provide data integrity for the contents of BISPDUs; in addition, types 2 and 3 provide peer BIS authentication. Each mechanism is described below.

8.7.1.
Authentication type 1

For all BISPDUs that flow on a connection that was established in response to an OPEN PDU whose authentication code field was equal to 1, the validation field shall contain a 16-octet unencrypted checksum:

   a) Generating a Validation Pattern:

      The contents of the Validation Pattern field that is included in an outbound BISPDU shall be generated by applying the MD5 Message-Digest algorithm (RFC1321) to the input data stream that consists of the contents of the entire BISPDU with all bits of the Validation Pattern field initially set to 0. The output of this step is an unencrypted 16-octet checksum, which shall be placed in the Validation Pattern field of the BISPDU.

   b) Checking the Validation Pattern of an Inbound BISPDU:

      The contents of the Validation Pattern field of an inbound BISPDU shall be checked by applying the MD5 Message-Digest algorithm (RFC1321) to the contents of the inbound BISPDU with its Validation Pattern set to all zeros. Call this quantity the "reference pattern". If the "reference pattern" matches the contents of the Validation Pattern field of the inbound BISPDU, then the BISPDU's checksum is correct; otherwise, it is incorrect.

8.7.2. Authentication type 2

When the authentication type code of the OPEN PDU is 2, the pattern carried in the 16-octet Validation Pattern field of the fixed header shall provide both peer-BIS authentication and data integrity for the contents of the BISPDU. The specific mechanisms used to provide these functions are not specified by this document. However, they must be agreed to by the pair of communicating BISs as part of their security association.

   Note 12: This document includes as an optional function a mechanism that can be used for authentication of the source of a BISPDU.
   Other security-related facilities (for example, protection against replay of BISPDUs or the ability to re-key during a BIS-BIS connection) are not intended to be provided by this protocol, and therefore are not specified in this document.

8.7.3. Authentication type 3

When the authentication type code of the OPEN PDU is 3, the Validation Pattern field shall contain a 16-octet checksum covering both the contents of the BISPDU and some additional Password Text, which is not transmitted to the peer BIS. The method for encoding this data is specified in MD5 HMAC (RFC XXXX). The checksum provides data integrity and the untransmitted Password Text provides peer BIS authentication. The mechanisms are as follows:

   a) Generating a Validation Pattern:

      The contents of the Validation Pattern field that is carried in the outbound BISPDU shall be generated by the following process:

      1) Password text shall be appended to the BISPDU immediately after the final octet of the BISPDU (as defined by the BISPDU Length field of the BISPDU header). Additional password text may also be prepended to the BISPDU immediately prior to the first octet of its header.

      2) A checksum that covers the contents of the BISPDU and the password text as specified by MD5 HMAC (RFC XXXX) shall be generated using the MD5 Message-Digest algorithm (RFC1321) with all bits of the Validation Pattern initially set to zero. The resultant checksum shall then be placed in the Validation Pattern field of the BISPDU.

      3) The password text shall not be transmitted along with the BISPDU.

   b) Checking the Validation Pattern of an Inbound BISPDU:

      The contents of the Validation Pattern field of an inbound BISPDU shall be checked by the following procedure:

      1) Append the Password Text to the BISPDU immediately after the final octet of the BISPDU (as defined by the BISPDU Length field of the BISPDU header).
      2) Apply the IDRP Checksum Algorithm to the data stream that consists of the concatenated contents of the BISPDU and the password text, with all bits of the BISPDU Validation Pattern set to zero. Call this value the "reference pattern".

      3) If the "reference pattern" is identical to the data carried in the Validation Pattern of the incoming BISPDU, then the peer BIS has been authenticated. If the "reference pattern" does not match the Validation Pattern, the receiving BIS shall inform system management that an authentication failure has occurred. The incoming BISPDU shall be ignored. The receiving BIS shall not send an IDRP ERROR PDU to the peer BIS because the identity of the peer has not been authenticated.

8.7.4. Sequence numbers

A sequence number is a 4-octet unsigned value. Sequence numbers shall increase linearly from 1 up to a maximum value of 2^32 - 1. The value 0 is not a valid sequence number. The rules for manipulating sequence numbers are:

   a) When a BIS initially establishes a connection with an adjacent BIS, the first sequence number shall be set to 1 and shall increase linearly to a value of 2^32 - 1. Before attempting to establish an initial BIS-BIS connection with an adjacent BIS, the local BIS must ensure that it has not sent a BISPDU to the adjacent BIS for at least CloseWaitDelay seconds.

   b) The sequence number shall not be incremented for the KEEPALIVE PDU, CEASE PDU, and the IDRP ERROR PDU.
   c) If the connection is subsequently closed under the conditions described in Table 2 and a subsequent connection is to be made to the same adjacent BIS, the local BIS shall, as a local matter, choose one of the following options:

      1) Maintain status of the sequence number space, and use any value greater than the value last used in the prior BIS-BIS connection (lastPriorSeqNo), or

      2) Ensure that at least CloseWaitDelay seconds have passed since the last BISPDU was sent to the adjacent BIS, and start with any sequence number.

      The choice of the initial value of the sequence number is a local matter.

   d) After a BIS sends a BISPDU with the maximum permissible sequence number (2^32 - 1), the BIS shall not send any further BISPDUs until the BISPDU with maximum sequence number and all outstanding BISPDUs have been acknowledged using the procedure of 8.7.5. The BIS shall then set its lower window edge (see 8.7.5) to one. When a BIS receives a BISPDU with a sequence number of one, after having acknowledged a BISPDU with the maximum permissible sequence number, it shall set the value of its next expected sequence number to one, prior to processing that BISPDU.

      Alternatively, after a BIS sends a BISPDU with the maximum permissible sequence number, the BIS may issue a CEASE BISPDU and restart the BIS-BIS connection.

8.7.5. Flow control

After an IDRP connection is established, the BIS Finite State Machine is in state ESTABLISHED (see section 8.6.1), and flow control and packet sequencing are in effect. The IDRP flow control process shall obey the following rules:

   a) A separate series of sequence numbers shall be maintained for each direction of a BIS-BIS connection, with the initial sequence number value chosen by the sender of a BISPDU and declared in the Sequence field of its OPEN PDU. The local BIS will maintain a window to manage transmission of BISPDUs to the remote BIS.
      The sender's lower window edge shall be set to the initial sequence number plus one; the sender's upper window edge shall be set to the lower window edge plus the value of credit offered contained in the peer BIS's OPEN PDU. A record is also kept of the next expected sequence number for an inbound UPDATE, RIB REFRESH, KEEPALIVE, or OPEN PDU to be received from the peer BIS; this is initially set to one plus the value of Sequence that is carried in the peer BIS's OPEN PDU.

   b) An UPDATE PDU or RIB REFRESH PDU shall not be sent if the upper window edge is less than or equal to the lower window edge. When a BISPDU is sent, the value of Sequence in the fixed header shall be set to the current value of the lower window edge.

      When an UPDATE or RIB REFRESH PDU is to be sent, the local BIS shall generate the contents of the BISPDU based on the current value of the lower window edge. The local BIS shall increment the lower window edge by one before it transmits the BISPDU to the peer BIS and before it generates any other BISPDUs or processes any received BISPDUs; when a BISPDU other than an UPDATE or RIB REFRESH PDU is to be sent, the lower window edge shall not be incremented.

      The value of Acknowledgement shall be set to the value of the next expected sequence number less one. The value of credit offered shall be set to the number of additional BISPDUs that the local BIS is currently able to accept from the peer BIS. Credit, once offered, cannot be revoked (that is, the remote BIS's upper window edge cannot be reduced). Therefore, the sum of Acknowledgement and credit offered must never decrease in successive BISPDUs. The value of credit available shall be set to the upper window edge less the lower window edge (after incrementing the lower window edge, if appropriate).

      The local BIS shall retain a copy of transmitted UPDATE and RIB REFRESH BISPDUs for possible retransmission.
   c) An incoming UPDATE PDU or RIB REFRESH PDU whose Sequence value corresponds to the next expected sequence number shall be accepted and passed to the Finite State Machine described in 8.6.1; the next expected sequence number shall be incremented by one. An incoming UPDATE PDU or RIB REFRESH PDU whose Sequence is less than the next expected sequence number shall be discarded. An incoming UPDATE PDU or RIB REFRESH PDU whose Sequence is greater than the next expected sequence number shall be discarded, unless re-ordering is supported as a local implementation option, and the sequence number is not greater than the peer's upper window edge.

      An incoming KEEPALIVE PDU or OPEN PDU whose Sequence value corresponds to the next expected sequence number shall be accepted and passed to the Finite State Machine described in 8.6.1. An incoming KEEPALIVE PDU or OPEN PDU whose Sequence does not correspond to the next expected sequence number shall be discarded.

      An incoming CEASE PDU or IDRP ERROR PDU shall be accepted and passed to the Finite State Machine described in 8.6.1 regardless of its Sequence value.

      Whenever a BIS receives an UPDATE PDU, RIB REFRESH PDU, or KEEPALIVE PDU, it shall inspect its Acknowledgement and credit offered fields. Any BISPDUs retained for retransmission whose sequence number is less than or equal to the value of the Acknowledgement field shall be discarded. If the sum of one plus the value of Acknowledgement plus the value of credit offered in the received BISPDU is greater than the local BIS's current upper window edge, then the BIS shall set its upper window edge to this sum.

   d) A BIS shall acknowledge receipt of incoming UPDATE PDUs and RIB REFRESH PDUs within a period t[A] of their receipt. The acknowledgement may be accomplished by means of an UPDATE PDU or a RIB REFRESH PDU sent as outlined in item b above.
      However, if no UPDATE PDU or RIB REFRESH PDU is available to be sent, then a KEEPALIVE PDU may be sent instead, with its Sequence set to the lower window edge and its Acknowledgement, credit offered, and credit available set as in step b above.

   e) If a retained BISPDU remains unacknowledged after a period t[R], then it shall again be transmitted and again retained for possible retransmission. If, for a retained BISPDU, t[R] expires after n retransmissions, the local BIS shall issue a deactivate to close the BIS-BIS connection.

      Note 14: The value t[R] should be chosen to be greater than the value t[A] + 2*L, where L is the transmission delay over the subnetwork or virtual link between the pair of communicating BISs.

   f) The local BIS shall provide its peer BIS with sufficient credit to send further BISPDUs as long as the local BIS has resources to receive them. Therefore, if the local BIS receives a BISPDU whose credit available is equal to zero (that is, the peer BIS believes itself unable to send additional BISPDUs), then as soon as resources are available locally, the local BIS shall send an UPDATE PDU or a RIB REFRESH PDU, if appropriate. If not, then a KEEPALIVE PDU shall be sent.

      Note 15: An UPDATE PDU of minimal size will contain the Unfeasible Route Count field with a value of zero, but will not contain any path attributes or NLRI. Thus, its size will be only 33 octets.

      A KEEPALIVE PDU that advertises a non-zero value of credit offered in response to a received BISPDU with a credit available of zero shall be retransmitted within a period t[R] until the local BIS receives any in-sequence BISPDU that reports a non-zero value of credit available. If t[R] expires after n retransmissions, then the local BIS shall issue a deactivate to close the connection.

   g) A BIS that has sent a BISPDU with zero credit available to its neighbor shall respond within a period t[A] to a BISPDU from that neighbor that causes its upper window edge to be increased.
      The response shall consist of an UPDATE PDU or a RIB REFRESH PDU, if available, or a KEEPALIVE PDU, if not.

   h) A BIS that has not sent any BISPDU for a period t[I] shall send a KEEPALIVE PDU, with Sequence equal to the lower window edge, and Acknowledgement, credit offered, and credit available set as in step b above.

      Note 16: The condition t[I] >> t[R] should be satisfied, where t[I] is one third of the Hold Timer value.

   i) A BIS that has sent a BISPDU containing a credit offered of zero shall, as soon as its local resources become available to process additional BISPDUs from its peer, send an UPDATE PDU or RIB REFRESH PDU, if appropriate, containing a non-zero value of credit offered. If neither of these BISPDU types is appropriate, then a KEEPALIVE PDU shall be sent.

   j) The BIS shall issue a deactivate to close the BIS-BIS connection if no BISPDUs are received for a period equal to the value of Hold Time that is carried in the OPEN PDU.

8.8. Version negotiation

BIS peers may negotiate the version number of IDRP by making successive attempts to open a BIS-BIS connection, starting with the highest supported version number (contained in managed object version) and decrementing the number each time a connection attempt fails. The lack of support for a particular IDRP version is indicated by an IDRP ERROR PDU with error code "OPEN_PDU_Error" and an error subcode of "Unsupported_Version_Number". One BIS may determine the highest version number supported by the other BIS (as advertised in its OPEN PDU) by examining the "Data" field of the received IDRP ERROR PDU. No further retries should be attempted if the version number reaches zero.

8.9. Checksum algorithm

The checksums used in this document for authentication types 1 and 3 shall be generated in accordance with the MD5 Message-Digest algorithm described in RFC1321 and MD5 HMAC described in RFCXXXX.
For an input data stream of any length, this algorithm will generate a checksum that is 16 octets long. This algorithm shall be used to generate the checksums for both the BISPDUs and the RIBs.

8.10. Routing information bases

The Routing Information Base (RIB) within a BIS consists of three distinct parts:

   a) Adj-RIBs-In: The Adj-RIBs-In store routing information that has been learned from inbound UPDATE PDUs. Their contents represent routes that are available as input to the Decision Process. A BIS must support at least one Adj-RIB-In for each of its neighbor BISs; it may optionally support several Adj-RIBs-In for a given neighbor BIS. Within the set of Adj-RIBs-In associated with a given neighbor BIS, no two shall have the same RIB-Tag (see 8.10.1).

   b) Loc-RIBs: The Loc-RIBs contain the local routing information that the BIS has selected by applying its local policies to the routing information contained in its Adj-RIBs-In. A BIS may support multiple Loc-RIBs. No two Loc-RIBs within a given BIS shall have the same RIB-Tag (see clause 8.10.1). Information in the Loc-RIB is used to build the Adj-RIBs-Out.

   c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the local BIS has selected for advertisement to its neighbors. A BIS must support at least one Adj-RIB-Out for each of its neighbor BISs; it may optionally support several Adj-RIBs-Out for a given neighbor BIS. Within the set of Adj-RIBs-Out associated with a given neighbor BIS, no two shall have the same RIB-Tag (see 8.10.1). The routing information stored in the Adj-RIBs-Out will be carried in the local BIS's UPDATE PDUs and advertised to its neighbor BISs.
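
The three-part model above can be illustrated with a small data structure. The following Python fragment is only a sketch and is not part of the protocol: the class name Rib, its method names, the NULL_TAG constant, and the "prefer" callback are all hypothetical. It assumes that routes are opaque values keyed by NLRI, and it copies the whole Loc-RIB into an Adj-RIB-Out, which ignores any export policy a real BIS would apply.

```python
from dataclasses import dataclass, field

NULL_TAG = ""  # hypothetical stand-in for the null (default) RIB-Tag


@dataclass
class Rib:
    """Conceptual RIB of one BIS (illustrative names, not from the draft)."""
    adj_ribs_in: dict = field(default_factory=dict)   # (neighbor, tag) -> {nlri: route}
    loc_ribs: dict = field(default_factory=dict)      # tag -> {nlri: route}
    adj_ribs_out: dict = field(default_factory=dict)  # (neighbor, tag) -> {nlri: route}

    def add_neighbor(self, neighbor, rib_tags=(NULL_TAG,)):
        # No two information bases for one neighbor may share a RIB-Tag.
        if len(set(rib_tags)) != len(rib_tags):
            raise ValueError("duplicate RIB-Tag for neighbor")
        for tag in rib_tags:
            self.adj_ribs_in.setdefault((neighbor, tag), {})
            self.adj_ribs_out.setdefault((neighbor, tag), {})
            self.loc_ribs.setdefault(tag, {})

    def learn(self, neighbor, tag, nlri, route):
        # Routing information from an inbound UPDATE PDU goes to Adj-RIB-In.
        self.adj_ribs_in[(neighbor, tag)][nlri] = route

    def run_decision(self, tag, prefer):
        # Rebuild the Loc-RIB for one RIB-Tag; 'prefer' is an assumed
        # policy function returning a comparable degree of preference.
        candidates = {}
        for (_, t), table in self.adj_ribs_in.items():
            if t == tag:
                for nlri, route in table.items():
                    candidates.setdefault(nlri, []).append(route)
        self.loc_ribs[tag] = {
            nlri: max(routes, key=prefer) for nlri, routes in candidates.items()
        }

    def advertise(self, neighbor, tag):
        # Simplification: export the whole Loc-RIB without export policy.
        self.adj_ribs_out[(neighbor, tag)] = dict(self.loc_ribs[tag])
```

Keying each table by the pair (neighbor, RIB-Tag) makes the uniqueness rule for RIB-Tags hold by construction, and a single copy of each route object can be shared between the three tables.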
In summary, the Adj-RIBs-In contain unprocessed routing information that has been advertised to the local BIS by its neighbors; the Loc-RIBs contain the routes that have been selected by the local BIS's Decision Process; and the Adj-RIBs-Out organize the selected routes for advertisement to specific neighbor BISs by means of the local BIS's UPDATE PDUs.

   Note 17: Although the conceptual model distinguishes between Adj-RIBs-In, Adj-RIBs-Out, and Loc-RIBs, this neither implies nor requires that an implementation must maintain three separate copies of the routing information. The choice of implementation (for example, 3 copies of the information vs. 1 copy with pointers) is not constrained by this standard.

8.10.1. Identifying an information base

Each information base (a single Adj-RIB-In, a single Loc-RIB, or a single Adj-RIB-Out) has one and only one RIB-Tag associated with it. The managed object RIBTagsSet explicitly enumerates all the RIB-Tags that a BIS supports. Managed object RIBTagsSet shall not contain any pairs of RIB-Tags that are identical, thus assuring that each RIB-Tag is unambiguous within the BIS.

All BISs located within a given routing domain shall support the same RIB-Tags: that is, the managed object RIBTagsSet of every BIS within an RD shall list the same RIB-Tags. When a BIS receives an OPEN PDU from another BIS located in its own routing domain, it shall compare the information in the field RIB-TagsSet with the information in its local managed object RIBTagsSet. If they do not match, then the appropriate error handling procedure in 8.18.2 shall be followed.

Each BIS shall support default information bases (Adj-RIBs-In, Adj-RIBs-Out, Loc-RIB, and FIB) that correspond to the null RIB-Tag.

8.10.2.
Use of the RIB REFRESH PDU

The RIB REFRESH PDU can be used by a BIS to solicit a refresh of its Adj-RIBs-In by a neighbor BIS, or to send an unsolicited refresh to a neighbor BIS:

   a) Solicited Refresh

      A BIS may request a neighbor BIS to refresh one or more of the local BIS's Adj-RIBs-In by sending a RIB REFRESH PDU that contains the OpCode for RIB-Refresh-Request and the RIB-Tags of the Adj-RIBs-In that it wants to be refreshed. When the neighbor BIS receives a RIB REFRESH PDU with OpCode RIB-Refresh-Request, it shall send back a RIB REFRESH PDU with OpCode RIB-Refresh-Start, followed by a sequence of UPDATE PDUs that contain the information in its Adj-RIBs-Out associated with the requesting BIS. The neighbor BIS shall indicate the completion of the refresh by sending a RIB REFRESH PDU with OpCode RIB-Refresh-End.

   b) Unsolicited Refresh

      A BIS may initiate an unsolicited refresh by sending a RIB REFRESH PDU with OpCode RIB-Refresh-Start, followed by a sequence of UPDATE PDUs that contain the information in its Adj-RIBs-Out that has been advertised to a given BIS. The completion of the refresh shall be indicated by sending a RIB REFRESH PDU with OpCode RIB-Refresh-End.

When a BIS receives a RIB REFRESH PDU with OpCode 2 (RIB Refresh Start), it shall not change any of the routing information currently stored in the Adj-RIB-In which is identified by the RIB-Tag of the RIB REFRESH PDU until the refresh cycle has been completed or has been aborted. The BIS shall accumulate the routing information contained in all the UPDATE PDUs that are received in a completed refresh cycle. Completion of a refresh cycle is indicated by receipt of a RIB REFRESH PDU with OpCode 3 (RIB Refresh End). Then the BIS shall replace the previous routing information in the associated Adj-RIB-In with the routing information that was learned during the refresh cycle.
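
The receiver side of a refresh cycle can be pictured as a small accumulate-then-replace state machine. The Python fragment below is a minimal sketch under stated assumptions, not a definitive implementation: the class AdjRibIn and its method names are hypothetical, routes are opaque values keyed by NLRI, withdrawals are not modelled, and a RIB Refresh End that arrives with no cycle in progress is simply ignored (this document does not say what to do in that case here). Only the OpCode values 2 and 3 are taken from the text above.

```python
REFRESH_START = 2  # OpCode 2 (RIB Refresh Start), as given in the text
REFRESH_END = 3    # OpCode 3 (RIB Refresh End), as given in the text


class AdjRibIn:
    """One Adj-RIB-In with refresh-cycle handling (illustrative only)."""

    def __init__(self):
        self.routes = {}      # routing information currently in service
        self._pending = None  # None means no refresh cycle is in progress

    def on_rib_refresh(self, opcode):
        if opcode == REFRESH_START:
            # Begin a cycle. A second Start before End discards whatever
            # was accumulated so far and starts a new cycle.
            self._pending = {}
        elif opcode == REFRESH_END and self._pending is not None:
            # Completed cycle: replace the old contents in one step.
            self.routes = self._pending
            self._pending = None
        # Assumption: an End with no cycle in progress is ignored.

    def on_update(self, nlri, route):
        # During a cycle the stored routes are left untouched and the
        # new information is accumulated separately.
        target = self.routes if self._pending is None else self._pending
        target[nlri] = route
```

Keeping the accumulated routes in a separate table until the End arrives means the Decision Process never sees a half-refreshed Adj-RIB-In.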
Abortion of a refresh cycle is indicated by receipt of another RIB REFRESH PDU with OpCode 2 (RIB Refresh Start) before receipt of a RIB REFRESH PDU with OpCode 3 (RIB Refresh End). In this case, any routing information learned in the time between receipt of the two successive RIB Refresh Starts shall be discarded, and a new refresh cycle (triggered by receipt of the second RIB Refresh Start) shall begin.

If the refreshing BIS receives a new RIB-Refresh-Request while it is in the middle of a refresh (after sending a RIB REFRESH PDU with OpCode RIB-Refresh-Start, but before sending a RIB REFRESH PDU with OpCode RIB-Refresh-End), then the current refresh shall be aborted and the new refresh shall be initiated.

8.11. Path attributes

An UPDATE PDU that carries an NLRI field also carries a set of path attributes. An UPDATE PDU that does not carry any NLRI field shall not carry any path attributes. Path attributes are summarized in Table 3; their encoding is described in 7.3.

+-------------------------------------------------+
| Table 3.
Path Attribute Characteristics |
+----------------+--------------+------+----------+
| Attribute      | Category     | Type | Length   |
|                |              | Code | (octets) |
+----------------+--------------+------+----------+
| LOCAL_PREF     | well-known   | 1    | 4        |
|                | discretionary|      |          |
+----------------+--------------+------+----------+
|INCOMPLETE_PATH | well-known   | 2    | 0        |
|                | discretionary|      |          |
+----------------+--------------+------+----------+
| RD_PATH        | well-known   | 3    | variable |
|                | mandatory    |      |          |
+----------------+--------------+------+----------+
| NEXT_HOP       | well-known   | 4    | variable |
|                | discretionary|      |          |
+----------------+--------------+------+----------+
| AGGREGATOR     | optional     | 5    | 32       |
|                | transitive   |      |          |
+----------------+--------------+------+----------+
| ATOMIC_AGGREG  | well-known   | 6    | 0        |
|                | discretionary|      |          |
+----------------+--------------+------+----------+
| MULTI-EXIT     | optional     | 7    | 4        |
| DISC           | non-transitiv|      |          |
+----------------+--------------+------+----------+
| RD_HOP_COUNT   | well-known   | 13   | 1        |
|                | mandatory    |      |          |
+----------------+--------------+------+----------+
| CAPACITY       | well-known   | 15   | 1        |
|                | discretionary|      |          |
+----------------+--------------+------+----------+
| COMMUNITIES    | well-known   | 16   | variable |
|                | discretionary|      |          |
+----------------+--------------+------+----------+

8.11.1. Categories of path attributes

Path attributes fall into four categories:

   a) Well-known mandatory: these attributes must be recognized upon receipt by all BISs, and must be present in every UPDATE PDU.

   b) Well-known discretionary: these attributes must be recognized upon receipt by all BISs, but are not necessarily present in an UPDATE PDU.

   c) Optional transitive: these attributes need not be recognized upon receipt by all BISs, and are not necessarily present in an UPDATE PDU.
      If a given BIS does not recognize an optional transitive attribute, it must pass it on to other BISs.

   d) Optional non-transitive: these attributes need not be recognized upon receipt by all BISs, and are not necessarily present in an UPDATE PDU. If it does not recognize an optional non-transitive attribute, a BIS shall ignore it and shall not include it in any of its own UPDATE PDUs.

A BIS shall handle optional attributes in the following manner:

   a) If a route with an unrecognized optional transitive attribute is received and the route is to be propagated to other BISs, the optional transitive attribute must be propagated with the route, and the PARTIAL bit in the Flag field of the attribute shall be set to 1.

   b) If a route with a recognized optional transitive attribute is received and the route is to be propagated to other BISs, the optional transitive attribute may or may not be propagated with the route, according to the definition of the attribute. If the attribute is propagated, then the local BIS shall not modify the value of the PARTIAL bit in the Flag field of the attribute.

   c) If a route with an unrecognized optional non-transitive attribute is received, the receiving BIS shall ignore the attribute and shall not propagate that attribute to any other BIS. However, it may propagate the remainder of the route: that is, the route without the unrecognized optional non-transitive attribute.

   d) If a route with a recognized optional non-transitive attribute is received and the route is to be propagated to other BISs, the optional non-transitive attribute may or may not be propagated with the route, according to the definition of the attribute. If the attribute is propagated, then the local BIS shall not modify the value of the PARTIAL bit in the Flag field of the attribute.
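
The four handling rules above amount to a filter over the attributes of a received route. The Python fragment below is a hedged sketch of that filter, not a definitive implementation: the PathAttribute class, the function attributes_to_propagate, and the keep_recognized callback (standing in for "according to the definition of the attribute") are hypothetical names. As a simplification, well-known attributes are passed through unchanged, although a real BIS applies per-attribute rules to them as well.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PathAttribute:
    """Illustrative view of one path attribute and its flag bits."""
    type_code: int
    optional: bool
    transitive: bool
    partial: bool = False
    value: bytes = b""


def attributes_to_propagate(received, recognized,
                            keep_recognized=lambda attr: True):
    """Return the attributes to send on when a route is propagated.

    'recognized' is the set of type codes the local BIS understands.
    'keep_recognized' is an assumed per-attribute policy hook.
    """
    out = []
    for attr in received:
        if not attr.optional:
            # Simplification: well-known attributes pass through as-is.
            out.append(attr)
        elif attr.type_code not in recognized:
            if attr.transitive:
                # Rule a): propagate and set the PARTIAL bit to 1.
                out.append(replace(attr, partial=True))
            # Rule c): unrecognized non-transitive attributes are dropped.
        elif keep_recognized(attr):
            # Rules b) and d): propagate without touching the PARTIAL bit.
            out.append(attr)
    return out
```

For example, with recognized={3, 5}, a route carrying RD_PATH (3), AGGREGATOR (5), and two unknown optional attributes keeps RD_PATH and AGGREGATOR unchanged, keeps the unknown transitive attribute with PARTIAL set, and loses the unknown non-transitive one.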
BISs shall observe the following rules for attaching and updating the values of optional attributes:

   - New optional transitive attributes may be attached to the path information by any BIS in the path, and that BIS shall then set the PARTIAL bit in the attribute's Flag field of its UPDATE PDU to 1.

   - The rules for attaching new non-transitive optional attributes depend on the nature of each specific attribute. The definition of each non-transitive optional attribute specifies such rules.

   - Any optional attribute may be updated by any BIS in its path.

8.12. Path attribute usage

The usage of each of IDRP's path attributes is described in the following clauses.

8.12.1. LOCAL_PREF

LOCAL_PREF is a well-known discretionary attribute that shall be included in all UPDATE PDUs that a given BIS sends to the other BISs located in its own RD. A BIS shall calculate the degree of preference for each external route and include the degree of preference when advertising a route to BISs that are located in the same RD. The higher degree of preference should be preferred. A BIS shall use the degree of preference learned via LOCAL_PREF in its decision process (see section 8.15.1).

A BIS shall not include this attribute in UPDATE PDUs that it sends to BISs located in adjacent RDs. If it is contained in an UPDATE PDU that is received from a BIS which is not located in the same RD as the receiving BIS, then this attribute shall be ignored by the receiving BIS.

8.12.2. INCOMPLETE_PATH

INCOMPLETE_PATH is a well-known discretionary attribute. It shall be recognized upon receipt by all BISs. It shall be included in each UPDATE PDU that reports either an RD_PATH attribute or Network Layer Reachability Information that has been learned by methods not described in this document.

The INCOMPLETE_PATH attribute shall be generated by the RD that originates the associated routing information.
If the INCOMPLETE_PATH attribute was present in a received UPDATE PDU, then it shall also be included in the UPDATE PDUs of all BISs that choose to propagate this information to other BISs.

   Note 24: Information obtained from the managed object internalSystems or obtained from UPDATE PDUs which do not contain the INCOMPLETE_PATH attribute has been learned by methods within IDRP's scope; however, manually configured reachability information for an RD which does not run IDRP is an example of information which is learned by means outside IDRP's scope.

If a BIS selects a route which has been advertised with the INCOMPLETE_PATH attribute, it is possible that there may be undetected looping of routing information. Therefore, it is recommended that distribution of information not learned by the methods of IDRP be tightly controlled. Furthermore, a given RD may also enforce policies which prohibit any of its BISs from selecting routes which have the INCOMPLETE_PATH attribute associated with them.

8.12.3. RD_PATH

RD_PATH is a well-known mandatory attribute. It shall be present in every UPDATE PDU, and shall be recognized on receipt by all BISs. This attribute consists of a concatenation of path segments that identifies the routing domains and routing domain confederations through which this route has passed. The path segments can be RD_SETs, RD_SEQs, ENTRY_SEQs, or ENTRY_SETs.

8.12.3.1. Generating an RD_PATH attribute

When a BIS originates a route to destinations contained within its own routing domain or to destinations learned by means outside the protocol (see 7.3.1.2), it shall examine the information contained in its managed object rdcConfig to determine the ordering relationships among all the confederations of which the local routing domain is a member.
The local BIS shall then construct an RD_PATH attribute as follows:

a) If the local routing domain is a member of one or more confederations, the RD_PATH shall consist of an ENTRY_SEQ segment followed immediately by an RD_SEQ segment. The ENTRY_SEQ shall list the confederations, ordered as follows:

   1) If a confederation, RDC-B, is nested within another confederation, RDC-A, then the RDI of RDC-A shall precede that of RDC-B.

   2) The RDIs of overlapping confederations shall be listed in increasing order of the RDIs, as long as the order implied by any nesting relationships is maintained. For purposes of ordering, two RDIs are compared octet-by-octet from the left until differing octet values are found. The RDI with the lesser octet value (when treated as an unsigned integer) is considered to have the lesser RDI value. If there are two RDIs of different lengths, and the leading octets of the longer RDI are exactly the same as the octets of the (complete) shorter RDI, then the shorter RDI is considered to have the lesser value.

   The RD_SEQ shall list the RDI of the BIS's routing domain.

b) If the local routing domain is not a member of any confederation, then the RD_PATH contains a single RD_SEQ segment that lists the RDI of the BIS's routing domain.

8.12.3.2. Updating a received RD_PATH attribute

The local BIS shall update the RD_PATH attribute of a route received from another BIS according to the following rules:

a) If the route was received from a BIS located in the same routing domain as the local BIS, then the RD_PATH attribute shall not be updated.

b) If the route was received from a BIS located in an adjacent routing domain, the local BIS shall determine whether the route has entered any confederations (see 8.13.3), and it shall examine the information contained in its managed object rdcConfig to determine the ordering relationships among all such confederations.
   The local BIS shall then amend the RD_PATH attribute as follows:

   1) If the route has entered any confederations, the BIS shall append a path segment of type ENTRY_SEQ that lists all the newly entered confederations, ordered as follows:

      i) If a confederation, RDC-B, is nested within another confederation, RDC-A, then the RDI of RDC-A shall precede that of RDC-B.

      ii) The RDIs of overlapping confederations shall be listed in increasing order of the RDIs, as long as the order implied by any nesting relationships is maintained. For purposes of ordering, two RDIs are compared octet-by-octet from the left until differing octet values are found. The RDI with the lesser octet value (when treated as an unsigned integer) is considered to have the lesser RDI value. If there are two RDIs of different lengths, and the leading octets of the longer RDI are exactly the same as the octets of the (complete) shorter RDI, then the shorter RDI is considered to have the lesser value.

      The ENTRY_SEQ segment shall be followed immediately by an RD_SEQ segment that lists the RDI of the BIS's routing domain.

   2) If the route has not entered any confederations, the local BIS shall append a path segment of type RD_SEQ that lists the RDI of the BIS's routing domain.

8.12.3.3.
Advertising a route received from another BIS

After receiving a route, a BIS will have modified its RD_PATH attribute in accordance with 8.12.3.2; and when a route is generated locally, the BIS will have created an RD_PATH attribute in accordance with 8.12.3.1. Prior to advertisement, the RD_PATH attribute of that route shall be amended as follows, based on the confederations which have been exited and on the nesting relationships among confederations of which the local BIS is a member (see managed object rdcConfig):

a) If the adjacent BIS to which the route will be advertised can be reached without exiting any confederations, then no modification to the RD_PATH attribute shall be made.

b) If the adjacent BIS to which the route will be advertised can only be reached by exiting one or more confederations, then the local BIS shall check the RD_PATH attribute for the presence of ENTRY_SEQ or ENTRY_SET path segments that contain the RDIs of the exited confederations.

   If there is any RDI of an exited confederation which is absent from all ENTRY_SEQ and ENTRY_SET segments, then the route is in error. The local BIS shall send an IDRP ERROR PDU to the BIS that advertised the route, reporting a Misconfigured_RDCs error.

   If two confederations, RDC-A and RDC-B, are listed in the same ENTRY_SEQ, and managed object rdcConfig indicates that RDC-B is nested within RDC-A, then the RDI of RDC-A shall precede that of RDC-B in the ENTRY_SEQ. If it does not, the local BIS shall send an IDRP ERROR PDU to the BIS that advertised the route, reporting a Misconfigured_RDCs error.

   Otherwise, the local BIS shall scan the RD_PATH attribute from the back (right to left, starting at the highest numbered octet) looking for an ENTRY_SEQ or ENTRY_SET path segment that lists an exited confederation.
   Within a given ENTRY_SET or ENTRY_SEQ segment, the RDI for a given confederation cannot be processed until the RDIs for all confederations nested within it have been processed. For each exited confederation (for example, the confederation whose RDI is "X"), the advertising BIS shall then update the RD_PATH of the route as follows:

   1) The entry for "X" shall be removed from the ENTRY_SEQ or ENTRY_SET segment.

   2) If "X" is the only RDI contained in an ENTRY_SEQ or ENTRY_SET segment of the RD_PATH, then create a path segment of type RD_SEQ that lists "X" and insert it in front of the previous entry for "X".

   3) If the local BIS's routing domain is a member of other confederations besides "X" that are listed in the ENTRY_SEQ or ENTRY_SET segments of the RD_PATH, then:

      i) If "X" occurs in an ENTRY_SEQ or ENTRY_SET segment, and "X" is nested within none of the other confederations, then create an RD_SET that lists "X" and insert it in front of the first ENTRY_SEQ or ENTRY_SET segment that occurs in the RD_PATH.

      ii) If "X" occurs in an ENTRY_SEQ and "X" is nested within all the other confederations, then create a path segment of type RD_SEQ that lists "X" and insert it immediately in front of the previous entry for "X".

      iii) If "X" occurs in an ENTRY_SEQ and "X" is nested within some but not all of the other confederations, then create a path segment of type RD_SET that lists "X", and insert it immediately after the closest prior entry for any confederation in which "X" is nested.
      iv) If "X" occurs in an ENTRY_SET and "X" is nested within all the other confederations, then create a path segment of type RD_SET that lists "X" and insert it immediately in front of the previous entry for "X".

      v) If "X" occurs in an ENTRY_SET and "X" is nested within some but not all of the other confederations, then create a path segment of type RD_SET that lists "X", and insert it immediately after the closest prior entry for any confederation in which "X" is nested.

   If the procedures call for the insertion of an RD_SET or an RD_SEQ between entries that are contained in a single ENTRY_SET or ENTRY_SEQ, then break the ENTRY_SET or ENTRY_SEQ into two segments of identical type and perform the insertion. For example, if it is necessary to insert RD_SET(X) between entries for "A" and "B", where "A" and "B" are contained in ENTRY_SEQ(H,J,A,B,C), the result would be: ENTRY_SEQ(H,J,A) RD_SET(X) ENTRY_SEQ(B,C).

   If, after applying these procedures, the ENTRY_SEQ or ENTRY_SET segment in which "X" originally occurred is empty, then that path segment shall be deleted, together with any subsequent path segments between itself and the next occurring ENTRY_SEQ or ENTRY_SET segment, or between itself and the end of the RD_PATH attribute if there is no subsequent ENTRY_SEQ or ENTRY_SET segment.

8.12.4. NEXT_HOP

NEXT_HOP is a well-known discretionary attribute. It shall be recognized upon receipt by all BISs.

For purposes of defining the usage rules for this attribute, a subnetwork is transitive with respect to system reachability if all of the following conditions are true:

a) Systems A, B, and C are all attached to the same subnetwork,

b) When A can reach B directly, and B can reach C directly, it follows that A can reach C directly.

Verification of the above conditions should be accomplished by means outside of IDRP.
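As an informal illustration only, the transitivity condition above can be written as a check over a table of direct reachability. The Python sketch below is not part of the protocol: the function name is_transitive and the representation of reachability as a set of ordered pairs are assumptions made for this example, and the way such a table would be obtained is, as stated above, outside of IDRP.

```python
from itertools import permutations


def is_transitive(systems, can_reach):
    """Return True if direct reachability is transitive over `systems`.

    `can_reach` is a set of ordered (x, y) pairs meaning "x can reach y
    directly". It is an assumed input; IDRP does not define how it is built.
    """
    for a, b, c in permutations(systems, 3):
        # Condition b): A reaches B and B reaches C must imply A reaches C.
        if (a, b) in can_reach and (b, c) in can_reach and (a, c) not in can_reach:
            return False
    return True


# Three systems on one subnetwork, each able to reach the others directly.
full_mesh = {(x, y) for x in "ABC" for y in "ABC" if x != y}

# Same subnetwork, but A and C cannot reach each other directly.
partial = full_mesh - {("A", "C"), ("C", "A")}

print(is_transitive("ABC", full_mesh))  # True
print(is_transitive("ABC", partial))    # False
```

The second case fails because A reaches B and B reaches C, yet A cannot reach C directly, so a NEXT_HOP that names C could not safely be handed to A.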
Consider three BISs attached to a fully connected transitive subnetwork, as shown in Figure 8: A and B share a BIS-BIS connection, B and C share a BIS-BIS connection, but A and C have no BIS-BIS connection between themselves. If C propagates an UPDATE PDU to B, then with respect to the UPDATE PDU advertised by B:

- C is defined to be the source BIS

- B is defined to be the first recipient BIS

- A is defined to be the subsequent recipient BIS.

+----------------------------------------------------------------------+
|                                                                      |
|                               +-----+                                |
|                               |  B  |                                |
|                               +-----+                                |
|                                = | =                                 |
|                    BIS-BIS    =  |  =   BIS-BIS                      |
|                 Connection   =   |   =  Connection                   |
|                            =     |     =                             |
|                           =      |      =                            |
|                          =       |       =                           |
|                        =         |         =                         |
|                     +---+        |        +---+                      |
|                     | A |--------+--------| C |                      |
|                     +---+                 +---+                      |
|                                                                      |
|                                nhop2                                 |
|                                                                      |
+----------------------------------------------------------------------+

            Figure 8. A Transitive Fully Connected Subnetwork

In terms of these definitions, the following rules apply to the usage of the NEXT_HOP attribute:

a) Generating the Attribute

   When a given BIS generates an UPDATE PDU:

   1) It may list its own Network Layer address and the SNPAs of subnetworks that connect itself to the remote BIS in the NEXT_HOP attribute of that UPDATE PDU.

   2) It may choose not to include a NEXT_HOP attribute in its UPDATE PDU. When the NEXT_HOP field is not present, it implies that the Network Layer address of the BIS that advertises the UPDATE PDU should be considered to be the Network Layer address of the next-hop BIS.
b) Advertising Routing Information

   When a BIS chooses to advertise routing information learned from an UPDATE PDU:

   1) The BIS may choose to list its own Network Layer address and the SNPAs of subnetworks that connect itself to the remote BIS in the NEXT_HOP attribute of an UPDATE PDU that propagates the routing information.

   2) The BIS may choose not to include a NEXT_HOP attribute in its UPDATE PDU. When the NEXT_HOP field is not present, it implies that the BIS that advertises the UPDATE PDU is also the next-hop BIS.

   3) If any condition listed below is not satisfied, then the recipient BIS shall not list the Network Layer address and SNPAs of the source BIS in its own UPDATE PDUs. If they are all satisfied, then instead of listing its own Network Layer address and SNPAs, the BIS may optionally list the Network Layer address and SNPAs of the source BIS (as contained in the UPDATE PDU received from the source BIS) when it propagates the information to a subsequent recipient BIS. The conditions are the following:

      ii) All three BISs (source, first recipient, and subsequent recipient) are located on a common subnetwork which is full-duplex and is transitive with respect to reachability of all three BISs.

      iii) The managed object routeServer is "true".

      iv) The first recipient and subsequent recipient are located in different routing domains.

      v) Advertisement of this route to the subsequent recipient BIS does not conflict with any of the path attributes that were contained in the UPDATE PDU from the source BIS.

Note 25: The following observations should be noted with regard to the rules stated above:

a) The rules do not remove the requirement that there must be a BIS-BIS connection between each pair of BISs located in the same routing domain.
b) The contents of the NEXT_HOP attribute have no effect upon the contents of the RD_PATH attribute: that is, the RD_PATH attribute will always be used in accordance with 7.3.1.3.

c) If the Network Layer address and SNPAs are not available in an UPDATE PDU, then a BIS that receives it must learn them by means outside of this document.

A BIS must never install a route with itself as the next hop.

When a BIS advertises the route to a BIS located in its own domain, the advertising BIS should not modify the NEXT_HOP attribute associated with the route. When a BIS receives the route from an internal neighbor BIS, it may use the NEXT_HOP address as the forwarding address, provided that the address is on a common subnet with the local BIS.

8.12.5. AGGREGATOR

AGGREGATOR is an optional transitive attribute which may be included in updates which are formed by aggregation (see 8.16.2). A BIS which performs route aggregation may add the AGGREGATOR attribute, which shall contain the BIS's own RDI and IP address.

8.12.6. ATOMIC_AGGREGATE

ATOMIC_AGGREGATE is a well-known discretionary attribute. If a BIS, when presented with a set of overlapping routes from one of its peers (see 8.15.4), selects the less specific route without selecting the more specific one, then the local system shall attach the ATOMIC_AGGREGATE attribute to the route when propagating it to other BISs (if that attribute is not already present in the received less specific route). A BIS that receives a route with the ATOMIC_AGGREGATE attribute shall not remove the attribute from the route when propagating it to other BISs. A BIS that receives a route with the ATOMIC_AGGREGATE attribute shall not make any NLRI of that route more specific (as defined in 8.15.4) when advertising this route to other BISs.
A BIS that receives a route with the ATOMIC_AGGREGATE attribute needs to be cognizant of the fact that the actual path to destinations, as specified in the NLRI of the route, while having the loop-free property, may traverse domains/confederations that are not listed in the RD_PATH attribute.

8.12.7. MULTI-EXIT_DISC

MULTI-EXIT_DISC is an optional non-transitive attribute. If the local BIS's managed object multiExit is "true", the BIS may use the attribute in its path selection algorithm. For example, a routing domain may choose to implement a policy which mandates that if all other path attributes are equal, the exit point with the lowest value of MULTI-EXIT_DISC should be preferred.

Each BIS that is connected to an adjacent RD by one or more common subnetworks may generate a MULTI-EXIT_DISC attribute for each link connecting itself to an adjacent RD. The value of this attribute is a local matter, which will be determined by the policies of the RD in which the originating BIS is located. A BIS that generates a value for this attribute may distribute it to all neighboring BISs which are located in adjacent RDs.

If a MULTI-EXIT_DISC attribute is received from a BIS located in an adjacent RD, then the receiving BIS may distribute this attribute to all other BISs located in its own RD. However, the receiving BIS shall not re-distribute the attribute to any BISs which are not located within its own RD.

8.12.8. RD_HOP_COUNT

This is a well-known mandatory attribute whose usage is as follows:

a) The initial value of this attribute is 0.

b) Before sending an UPDATE PDU to a BIS located in an adjacent routing domain, a BIS shall increment the value of this attribute by 1, and shall place the result in the RD_HOP_COUNT field of the outbound UPDATE PDU.

c) A BIS shall not increment the value of this attribute when it sends an UPDATE PDU to another BIS located in its own routing domain.
d) This attribute may be modified by administrative procedures.

8.12.9. CAPACITY

This is a well-known discretionary attribute that is used to denote the traffic handling capacity of the RD_PATH listed in the same UPDATE PDU. Different routing domains may use different values for this attribute: thus, the attribute shall deal in relative capacities.

Note 27: The semantics of this attribute must be agreed on a bilateral basis using mechanisms outside the scope of this document before a BIS-BIS connection is established.

The value of capacity that is associated with a given routing domain is contained in managed object capacity. If a BIS advertises a route whose destinations are located in its own routing domain, then the originating BIS shall include this attribute in its outbound UPDATE PDUs, and its value shall be equal to that of managed object capacity.

If a BIS redistributes a route and the route includes the CAPACITY attribute, the attribute shall reflect the lower of the following two quantities: the value of the CAPACITY attribute contained in the UPDATE PDU that advertised the route, or the value of local managed object capacity.

8.13. Routing domain confederations

Formation of an RDC is done via a private arrangement between its members without any need for global coordination; the methods for doing so are not within the scope of this document.

From the outside, an RDC looks similar to a single routing domain: for example, it has an identifier which is an RDI. Other RDs can develop policies with respect to the confederation as a whole, as opposed to the individual RDs that are members of the confederation. Confederations can be disjoint, nested, or overlapping (see 6).

8.13.1. RDC policies

Each RD within a confederation may have its own set of policies; that is, different RDs in the same confederation can have different policies.
Since a confederation appears to the external world as if it were an individual RD, IDRP's loop detection methods will detect routing information loops through a given confederation. In particular, a route which leaves the confederation and then later re-enters it will be detected as a loop: thus, a route between two RDs that are members of the same confederation will be constrained to remain within that confederation.

8.13.2. RDC configuration information

Each BIS that participates in one or more RDCs must be aware of the RDIs of all confederations of which it is a member, and it must know the partial order which prevails between these confederations: that is, it must know the nesting and overlap relationships between all confederations to which it belongs. This information shall be contained in managed object rdcConfig, which consists of a list of confederation RDIs and the partial order that prevails among those confederations.

Since RDCs are formed via private arrangement between their members, the partial order of a given confederation is a local matter for that confederation, and bears no relationship to the partial orders that may prevail in different confederations.

The RDI of the BIS's own routing domain is contained in managed object localRDI, as defined in 8.3.

8.13.3. Detecting confederation boundaries

A given BIS can tell which confederations are common to itself and an adjacent BIS by comparing information obtained from the Confed-IDs field of the adjacent BIS's OPEN PDU with the local BIS's rdcConfig managed object.
This knowledge determines when an outbound UPDATE PDU exits a given confederation and when an inbound UPDATE PDU enters a given confederation:

Exiting a Confederation: An UPDATE PDU sent by a given BIS to an adjacent BIS is defined to have exited all those confederations whose RDIs are present in the advertising BIS's rdcConfig managed object but were not reported in the Confed-IDs field of the adjacent BIS's OPEN PDU.

Entering a Confederation: An UPDATE PDU received from an adjacent BIS is defined to have entered all those confederations whose RDIs are present in the receiving BIS's rdcConfig managed object but were not reported in the Confed-IDs field of the sending BIS's OPEN PDU.

8.14. Update-Receive process

The Update-Receive process is initiated when an UPDATE PDU with no errors is received while the FSM is in the ESTABLISHED state. When this occurs, the BIS shall update the appropriate Adj-RIB-In. For each route (either feasible or unfeasible), the Adj-RIB-In is identified by the FIB-Tag carried in the UPDATE PDU. The actions to be taken for each route are:

a) If the UPDATE PDU contains a non-empty WITHDRAWN ROUTES field, the previously advertised feasible routes associated with the NLRIs contained in this field shall be removed from the Adj-RIB-In. The BIS shall run its Decision Process since the previously advertised route is no longer available for use.

b) If the UPDATE PDU contains feasible routes, they shall each be placed in the appropriate Adj-RIB-In, and the following additional actions shall be taken for each route:

   1) If its NLRI is identical to that of a route currently stored in the Adj-RIB-In, then the new route shall replace the older route in the Adj-RIB-In, thus implicitly withdrawing the older route from service. The BIS shall run its Decision Process since the older route is no longer available for use.
   2) If the new route is an overlapping route that is more specific (see 8.15.4) than an earlier route contained in the Adj-RIB-In, and the path attributes of the new route differ from those of the earlier route, the BIS shall run its Decision Process since the more specific route has implicitly made a portion of the less specific route unavailable for use.

   3) If the new route has identical path attributes to an earlier route contained in the Adj-RIB-In, and is more specific (see 8.15.4) than the earlier route, no further actions are necessary.

   4) If a new route has different NLRI from any of the routes currently in the Adj-RIB-In, it shall be placed in the Adj-RIB-In.

   5) If a new route is an overlapping route that is less specific (see 8.15.4) than an earlier route in an Adj-RIB-In, the BIS shall place the new route in the appropriate Adj-RIB-In. The earlier, more specific route remains unaffected.

8.15. Decision process

The Decision Process selects routes for subsequent advertisement by applying the policies in its Policy Information Base to the routes stored in its Adj-RIBs-In. The output of the Decision Process is the set of routes that will be advertised to adjacent BISs; the selected routes will be stored in the local BIS's Loc-RIBs and Adj-RIBs-Out.

The selection process is formalized by defining a function that takes the attributes of a given route as an argument and returns a non-negative integer denoting the degree of preference for the route. The function that calculates the degree of preference for a given route shall not use as its inputs any of the following: the existence of other routes, the non-existence of other routes, or the path attributes of other routes. Route selection then consists of individual application of the degree of preference function to each feasible route, followed by the choice of the one with the highest degree of preference.
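The formalization above can be illustrated with a short sketch. The Python below is an assumption-laden example, not a definition: the Route record, its fields, and the particular scoring policy in degree_of_preference are hypothetical, since the actual policy is a local matter held in the PIB. What the sketch does reflect from the text is that the function sees only one route's own attributes, returns a non-negative integer, and that selection picks the route with the highest result.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of a feasible route."""
    nlri: str            # destination set, treated as an opaque label
    rd_hop_count: int    # value of the RD_HOP_COUNT attribute
    capacity: int        # value of the CAPACITY attribute


def degree_of_preference(route: Route) -> int:
    """Example policy only: favour capacity first, then fewer RD hops.

    The function reads nothing but the attributes of `route` itself, so it
    cannot depend on the existence or attributes of any other route.
    """
    return max(0, route.capacity * 100 - route.rd_hop_count)


def select(feasible: Sequence[Route]) -> Route:
    """Apply the function to each feasible route and keep the highest."""
    return max(feasible, key=degree_of_preference)


candidates = [
    Route("dest-X", rd_hop_count=3, capacity=5),   # preference 497
    Route("dest-X", rd_hop_count=1, capacity=5),   # preference 499
    Route("dest-X", rd_hop_count=2, capacity=4),   # preference 398
]
best = select(candidates)
print(best)
```

Because the preference of each route is computed in isolation, adding or withdrawing one candidate never changes the value computed for another; only the outcome of the final comparison can change.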
Routes that could form routing loops must be ignored by the Decision Process. Therefore, any route that a) was received from a BIS located in an adjacent routing domain and b) contains in its RD_PATH attribute a path segment of type RD_SEQ or RD_SET that contains the RDI of the local routing domain or any RDC of which the local RD is a member is unfeasible, and shall be discarded by the Decision Process.

IDRP does not require a universally agreed-upon metric to exist between multiple RDs. Instead, IDRP allows each RD to apply its own set of criteria for route selection, as determined by its local PIB.

The Decision process operates on routes contained in each Adj-RIB-In, and is responsible for:

- selection of routes to be advertised to BISs located in the local BIS's routing domain

- selection of routes to be advertised to BISs located in adjacent routing domains

- route aggregation and route information reduction

The Decision process takes place in three distinct phases, each triggered by a different event:

a) Phase 1 is responsible for calculating the degree of preference for each route received from a BIS located in an adjacent routing domain, and for advertising to the other BISs in the local Routing Domain the routes that have the highest degree of preference for each distinct destination.

b) Phase 2 is invoked on completion of Phase 1. It is responsible for choosing the best route out of all those available for each distinct destination, and for installing each chosen route into the appropriate Loc-RIB.

c) Phase 3 is invoked after the Loc-RIB has been modified. It is responsible for disseminating routes in the Loc-RIB to each adjacent BIS located in an adjacent routing domain, according to the policies contained in the PIB. Route aggregation, information reduction and the modification of QOS path attributes can optionally be performed within this phase.

8.15.1.
Phase 1: calculation of degree of preference

The Phase 1 decision function shall be invoked whenever the local BIS receives an UPDATE PDU from a neighbor BIS that advertises a new route, a replacement route, or a withdrawn route.

The Phase 1 decision function is a separate process which completes when it has no further work to do.

The Phase 1 decision function shall be blocked from running while the Phase 2 decision function for the same RIB-Tag is in process.

The Phase 1 decision function shall lock an Adj-RIB-In prior to operating on any route contained within it, and shall unlock it after operating on all new or unfeasible routes contained within it.

For each newly received or replacement feasible route, the local BIS shall determine a degree of preference as follows. If the route is learned from a BIS in the local RD, the value of the LOCAL_PREF attribute shall be taken as the degree of preference. If the route is learned from a BIS in an adjacent RD, then the degree of preference shall be computed based on preconfigured policy information. The exact nature of this policy information and the computation involved is a local matter. After a degree of preference is determined, the local BIS shall run the internal update process of 8.15.7 to select and advertise the most preferable routes.

8.15.2. Phase 2: route selection

The Phase 2 decision function shall be invoked on completion of Phase 1. The Phase 2 function is a separate process which completes when it has no further work to do. The Phase 2 process shall consider all routes that are present in the Adj-RIBs-In, including those received from BISs located in its own routing domain and those received from BISs located in adjacent routing domains.

The Phase 2 decision function shall be blocked from running while the Phase 3 decision function is in process.
The Phase 2 function shall lock all Adj-RIBs-In with the RIB-Tag associated with this instance of the process prior to commencing its function, and shall unlock them on completion.

For each set of destinations for which a feasible route exists in the Adj-RIBs-In identified by the RIB-Tag on which this instance of the function operates, the local BIS shall identify the route that:

a) has the highest degree of preference of any route to the same set of destinations, or

b) is the only route to that destination, or

c) is selected as a result of the Phase 2 tie breaking rules specified in 8.15.2.1.

The local BIS shall then install that route in the Loc-RIB, replacing any route to the same destination that is currently held in the Loc-RIB. The local BIS shall determine the immediate next hop to the address depicted by the NEXT_HOP attribute of the selected route by performing a lookup in the intra-domain routing information and selecting one of the possible intra-domain paths. This immediate next hop shall be used when installing the selected route in the Loc-RIB. If the route to the address depicted by the NEXT_HOP attribute changes such that the immediate next hop changes, route selection should be recalculated as specified above.

If a route copied to a Loc-RIB does not have a NEXT_HOP path attribute, then the local BIS shall add that attribute to the entry in the Loc-RIB. The value of the attribute shall be the Network Layer address of the adjacent BIS from which the route was received.

Unfeasible routes shall be removed from the Loc-RIB, and corresponding unfeasible routes shall then be removed from the Adj-RIBs-In.

Note 28: The decision process should not select a route to destinations located within the local routing domain if that route would exit the local routing domain and later re-enter it. Such routes would be rejected by other RDs due to the existence of an RD-loop.
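The loop-avoidance rule stated in 8.15 (a route received from an adjacent routing domain is unfeasible if an RD_SEQ or RD_SET segment of its RD_PATH already names the local RD or one of its confederations) is what causes the rejection mentioned in Note 28. A minimal Python sketch of that test follows. The function name fails_loop_check and the representation of RD_PATH as a list of (segment type, RDI list) pairs are assumptions made for illustration; the encoded form of the attribute is not modeled.

```python
def fails_loop_check(rd_path, from_adjacent_rd, local_rdi, local_rdcs):
    """Return True if the route must be treated as unfeasible (RD-loop).

    rd_path          -- list of (segment_type, [rdi, ...]) pairs (assumed form)
    from_adjacent_rd -- True if the advertising BIS is in an adjacent RD
    local_rdi        -- RDI of the local routing domain
    local_rdcs       -- RDIs of the confederations the local RD belongs to
    """
    if not from_adjacent_rd:
        # The rule applies only to routes received from adjacent RDs.
        return False
    own = {local_rdi, *local_rdcs}
    return any(
        # Only RD_SEQ and RD_SET segments are examined by this rule.
        seg_type in ("RD_SEQ", "RD_SET") and bool(own.intersection(rdis))
        for seg_type, rdis in rd_path
    )


path = [
    ("RD_SEQ", ["rd1", "rd2"]),
    ("ENTRY_SEQ", ["rdcA"]),
    ("RD_SEQ", ["rd3"]),
]

print(fails_loop_check(path, True, "rd2", set()))      # True: rd2 already on path
print(fails_loop_check(path, True, "rd9", {"rdcA"}))   # False: rdcA only in ENTRY_SEQ
print(fails_loop_check(path, False, "rd2", set()))     # False: internal route
```

The second call shows why the segment type matters: a confederation listed in an ENTRY_SEQ has been entered but not yet exited, so its presence there is not by itself evidence of a loop.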
8.15.2.1. Breaking ties (phase 2)

In its Adj-RIBs-In a BIS may have several routes to the same destination that have the same degree of preference. The local BIS can select only one of these routes for inclusion in the associated Loc-RIB. The local BIS considers all equally preferable routes, both those received from BISs located in adjacent RDs, and those received from other BISs located in the local BIS's own RD.

The following tie-breaking procedure assumes that for each candidate route all the BISs within a routing domain can ascertain the cost of a path (interior distance) to the address depicted by the NEXT_HOP attribute of the route. Ties shall be broken according to the following algorithm:

a) If the local BIS is configured to take into account MULTI-EXIT_DISC, and the candidate routes differ in their MULTI-EXIT_DISC attribute, select the route that has the lowest value of the MULTI-EXIT_DISC attribute. If the local BIS is configured to take into account MULTI-EXIT_DISC, but that attribute is not present, a locally defined "default" MULTI-EXIT_DISC may be assumed as a basis for performing tie-breaking.

b) Otherwise, if the local BIS can ascertain the cost of a path to the entity depicted by the NEXT_HOP attribute of the candidate route, select the route with the lowest cost (interior distance) to the entity depicted by the NEXT_HOP attribute of the route. If there are several routes with the same cost, then the tie shall be broken as follows:

   - if at least one of the candidate routes was advertised by a BIS in an adjacent RD, select the route that was advertised by the BIS in an adjacent RD whose IDRP Identifier has the lowest value among all BISs in adjacent RDs;

   - otherwise, select the route that was advertised by the BIS whose IDRP Identifier has the lowest value.

8.15.3.
Phase 3: route dissemination

The Phase 3 decision function shall be invoked on completion of Phase 2, or when any of the following events occur:

a) when routes in a Loc-RIB to local destinations have changed

b) when locally generated routes with the INCOMPLETE_PATH attribute (that is, routes learned by means outside of IDRP) have changed

c) when a new BIS-BIS connection has been established

d) when directed to do so by system management.

The Phase 3 function is a separate process which completes when it has no further work to do. The Phase 3 Routing Decision function shall be blocked from running while the Phase 2 decision function is in process.

All routes in the Loc-RIB shall be processed into a corresponding entry in the associated Adj-RIBs-Out and FIBs (which are identified by the same RIB-Tag), replacing the current entries. The path attributes are updated in accordance with the appropriate subclause of 8.12. Route aggregation and information reduction techniques (see 8.16 - 96) may optionally be applied. Routes with identical NLRI extracted from the same Loc-RIB shall always be aggregated before being copied to an Adj-RIB-Out, and may be aggregated with other routes according to the local Routing Policy. Every FIB shall have an entry for every destination for which a route exists in a Loc-RIB.

A locking scheme should be implemented to prevent simultaneous access to an FIB by both the phase 3 function and the forwarding engine. The phase 3 function should first lock an FIB before entering, replacing or deleting an entry, and then unlock that FIB once the operation is complete.

When the updating of the Adj-RIBs-Out and the FIBs is complete, the local BIS shall run the external update process of 97.

8.15.4. Overlapping routes

A BIS may transmit routes with overlapping NLRI to another BIS. NLRI overlap occurs when a set of destinations is identified in multiple non-matching routes, all of which have the same set of FIB-Tags.
Since IDRP encodes NLRI using prefixes, overlaps will always exhibit subset relationships. A route describing a smaller set of destinations (a longer prefix) is said to be more specific than a route describing a larger set of destinations (a shorter prefix); similarly, a route describing a larger set of destinations (a shorter prefix) is said to be less specific than a route describing a smaller set of destinations (a longer prefix).

When overlapping routes are present in the same Adj-RIB-In, the more specific routes shall take precedence, in order from most specific to least specific. This precedence relationship effectively decomposes less specific routes into two parts:

- a set of destinations described only by the less specific route, and

- a set of destinations described by the overlap of the less specific and the more specific routes

The set of destinations described by the overlap represents a portion of the less specific route that is feasible, but is not currently in use. If a more specific route is later withdrawn, the set of destinations described by the overlap will still be reachable using the less specific route.

If a BIS receives overlapping routes from a given neighbor, the Decision Process shall not simultaneously reject the more specific route from neighbor BIS (A) and install A's less specific route unless the contents of the local BIS's Adj-RIBs-Out and FIBs ensure that NPDUs with destinations listed in the NLRI of A's more specific route cannot be forwarded to the neighbor BIS (A).
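The precedence of more specific routes can be illustrated with a minimal sketch. It assumes, purely for illustration, that NLRI prefixes are modeled with the networks of Python's ipaddress module; the function most_specific, the dictionary layout, and the route labels are hypothetical and are not part of this protocol.

```python
import ipaddress

def most_specific(routes, destination):
    """Return the label of the most specific route covering destination, or None.

    routes is a hypothetical stand-in for one Adj-RIB-In: it maps a prefix
    string to a route label.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, label in routes.items():
        net = ipaddress.ip_network(prefix)
        # A longer prefix describes a smaller set of destinations.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, label)
    return None if best is None else best[1]

adj_rib_in = {"10.0.0.0/8": "less-specific", "10.1.0.0/16": "more-specific"}
overlap_before = most_specific(adj_rib_in, "10.1.2.3")  # inside the overlap
only_less = most_specific(adj_rib_in, "10.2.0.1")       # outside the overlap
del adj_rib_in["10.1.0.0/16"]                           # more specific route withdrawn
overlap_after = most_specific(adj_rib_in, "10.1.2.3")   # falls back to the /8
```

In this sketch the overlap is served by the more specific route while it is present, and remains reachable through the less specific route once the more specific route has been withdrawn.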
Therefore, when presented with overlapping routes from a given neighbor BIS (A), the local BIS could select any of the following options, all of which satisfy the criterion stated above:

a) Install both the less specific and more specific routes received from the given neighbor (A)

b) Install the more specific route received from the given neighbor (A) and reject A's less specific route

c) Install the non-overlapping part of the less specific and more specific routes received from the given neighbor (A)

d) Install a route formed by the aggregation of the less specific and the more specific route received from the given neighbor (A)

e) Install the less specific route received from the given neighbor (A), and also install another route received from a different neighbor (B) that is simultaneously:

   1) more specific than A's less specific route, and

   2) less specific than A's more specific route.

f) Install neither of the routes received from A.

8.15.5. Interaction with update process

Since the Adj-RIBs-In are used both to receive inbound UPDATE PDUs and to provide input to the Decision Process, care must be taken that their contents are not modified while the Decision Process is running. That is, the input to the Decision Process shall remain stable while a computation is in progress.

Two examples of approaches that could be taken to accomplish this:

a) The Decision Process can signal when it is running. During this time, any incoming UPDATE PDUs will be queued and will not be written into the Adj-RIBs-In. If more UPDATE PDUs arrive than can fit into the allotted queue, they will be dropped and will not be acknowledged.

b) A BIS can maintain two copies of the Adj-RIBs-In - one used by the Decision Process for its computation (call this the Comp-Adj-RIB) and the other to receive inbound UPDATE PDUs (call this the Holding-Adj-RIB).
Each time the Decision Process begins a new computation, the contents of the Holding-Adj-RIB will be copied to the Comp-Adj-RIB: that is, a snapshot of the Holding-Adj-RIB is used as the input for the Decision Process. The contents of the Comp-Adj-RIB remain stable until a new computation is begun.

The advantage of the first approach is that it takes less memory; the advantage of the second is that inbound UPDATE PDUs will not be dropped. This document does not mandate the use of either of these methods. Any method that guarantees that the input data to the Decision Process will remain stable while a computation is in progress and that is consistent with the conformance requirements of this document may be used.

8.15.6. Update-Send process

The Update-Send process is responsible for advertising BISPDUs to adjacent BISs. For example, it distributes the routes chosen by the Decision Process to other BISs which may be located in either the same RD or an adjacent RD. Rules for information exchange between BISs located in different routing domains are given in 8.15.8; rules for information exchange between BISs located in the same domain are given in 8.15.7.

Distribution of reachability information between a set of BISs, all of which are located in the same routing domain, is referred to as internal distribution. All BISs located in a single RD must present consistent reachability information to adjacent RDs, thus requiring that they have consistent routing and policy information among them.

Note 29: This requirement on consistency does not preclude an RD from distributing different reachability information to each of its adjacent routing domains. It does mean that all of a domain's BISs which are attached to a given adjacent domain must provide identical reachability to that domain.
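Returning briefly to the two-copy approach of 8.15.5, item b): it amounts to taking a snapshot at the start of each computation. The following is a rough sketch only, assuming that a dictionary keyed by (neighbor, NLRI) stands in for the Adj-RIBs-In; the class and method names are hypothetical.

```python
import copy

class AdjRibIn:
    """Two-copy Adj-RIB-In: updates land in holding, decisions read comp."""

    def __init__(self):
        self.holding = {}  # Holding-Adj-RIB: receives inbound UPDATE PDUs
        self.comp = {}     # Comp-Adj-RIB: stable input to the Decision Process

    def receive_update(self, neighbor, nlri, route):
        # Inbound updates never touch the copy used by a running computation.
        self.holding[(neighbor, nlri)] = route

    def withdraw(self, neighbor, nlri):
        self.holding.pop((neighbor, nlri), None)

    def begin_computation(self):
        # Snapshot the holding copy; it stays fixed until the next call.
        self.comp = copy.deepcopy(self.holding)
        return self.comp

rib = AdjRibIn()
rib.receive_update("A", "10.0.0.0/8", {"pref": 100})
decision_input = rib.begin_computation()
rib.receive_update("B", "10.0.0.0/8", {"pref": 200})  # arrives mid-computation
```

The update from neighbor B is retained rather than dropped, but it becomes visible to the Decision Process only when the next computation begins. The cost is the memory for the second copy, as noted in 8.15.5.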
When this protocol is run between BISs located in different routing domains, the communicating BISs must be located in adjacent routing domains - that is, they must be attached to a common subnetwork.

8.15.7. Internal updates

The internal update process is concerned with the distribution of routing information to BISs located in the local BIS's own routing domain. Each BIS selects the most preferable route, if any, that it has received from a BIS in an adjacent routing domain, and distributes that route to every other BIS in its own routing domain. This process ensures that all BISs in a routing domain will select the same set of routes.

The following procedures shall be applied separately for each set of FIB-Tags supported by the BIS:

a) When a BIS receives an UPDATE PDU from another BIS located in its own routing domain, the receiving BIS shall not re-distribute the routing information contained in that UPDATE PDU to other BISs located in its own routing domain.

b) When a BIS receives a new feasible route from a BIS in an adjacent routing domain, it shall advertise that route to all other BISs in its routing domain by means of an UPDATE PDU if any of the following conditions occur:

   1) the degree of preference assigned to the newly received route by the local BIS is higher than the degrees of preference that the local BIS has assigned to other routes - with the same destinations and the same FIB-Tag - that have been received from BISs in adjacent routing domains.

   2) there are no other routes - with the same destinations and the same FIB-Tag - that have been received from BISs in adjacent routing domains.

   3) the newly received route is selected as a result of breaking a tie between several routes that were received from BISs in adjacent routing domains and that have the highest degree of preference, the same destinations, and the same FIB-Tag (see 8.15.7.1).
c) When a BIS receives an UPDATE PDU with a non-empty WITHDRAWN ROUTES field, it shall remove from its Adj-RIBs-In all routes whose NLRI was carried in this field. The BIS shall take the following additional steps:

   1) if the corresponding feasible route had not been previously advertised, then no further action is necessary

   2) if the corresponding feasible route had been previously advertised, then:

      i) if a new route is selected for advertisement that has the same FIB-Tag and NLRI as the unfeasible route, then the local BIS shall advertise the replacement route

      ii) if a replacement route is not available for advertisement, then the BIS shall include the NLRI of the unfeasible route in the WITHDRAWN ROUTES field of an UPDATE PDU, and shall send this PDU to each neighbor BIS to whom it had previously advertised the corresponding feasible route.

All feasible routes which are advertised shall be placed in the appropriate Adj-RIB-Out, and all unfeasible routes which are advertised shall be removed from the Adj-RIBs-Out.

8.15.7.1. Breaking ties (internal updates)

If a local BIS has connections to several BISs in adjacent domains, there will be multiple Adj-RIBs-In associated with these neighbors. These Adj-RIBs-In might contain several equally preferable routes to the same destination, all of which have the same FIB-Tag and all of which were advertised by BISs located in adjacent routing domains. The local BIS shall select one of these routes, according to the following rules:

a) If all candidate routes contain the MULTI-EXIT_DISC attribute, the candidate routes differ only in their NEXT_HOP and MULTI-EXIT_DISC attributes, and the local BIS's managed object Multiexit is TRUE, select the route that has the lowest value of the MULTI-EXIT_DISC attribute.
b) If the local system can ascertain the cost of a path to the entity depicted by the NEXT_HOP attribute of the candidate route, select the route with the lowest cost.

c) In all other cases, select the route that was advertised by the BIS whose IDRP Identifier has the lowest value.

8.15.8. External updates

The external update process is concerned with the distribution of routing information to BISs located in adjacent routing domains. As part of the Phase 3 route selection process, the BIS has updated its Adj-RIBs-Out and its FIBs. All newly installed routes and all newly unfeasible routes for which there is no replacement route shall be advertised to BISs located in adjacent routing domains by means of UPDATE PDUs.

Any routes in the Loc-RIB marked as unfeasible shall be removed. Changes to the reachable destinations within its own RD shall also be advertised in an UPDATE PDU. However, advertisement of a given UPDATE PDU shall not violate any distribution constraint imposed by the path attributes of the route contained therein.

A BIS shall not propagate an UPDATE PDU that contains routes with a FIB-Tag that was not listed in the RIB-TagsSet field of the neighbor BIS's OPEN PDU. If such routes are advertised, it will cause the BIS-BIS connection to be closed, as described in 8.18.3.

8.15.9. Controlling routing traffic overhead

The inter-domain routing protocol constrains the amount of routing traffic (that is, BISPDUs) in order to limit both the link bandwidth needed to advertise BISPDUs and the processing power needed by the Decision Process to digest the information contained in the BISPDUs.

8.15.9.1. Frequency of route advertisement

The managed object minRouteAdvertisementInterval determines the minimum amount of time that must elapse between advertisements of routes to a particular destination from a single BIS.
This rate limiting procedure applies on a per-destination basis, although the value of minRouteAdvertisementInterval is set on a per-BIS basis.

Two UPDATE PDUs sent from a single BIS that advertise feasible routes to some common set of destinations received from BISs in other routing domains must be separated in time by at least minRouteAdvertisementInterval. For example, any technique that ensures that the separation will be between one and two times the value of minRouteAdvertisementInterval is acceptable.

Since fast convergence is needed within an RD, this procedure does not apply to routes received from other BISs in the same routing domain. To avoid long-lived black holes, the procedure does not apply to the explicit withdrawal of unfeasible routes (that is, routes whose NLRI is listed in the Withdrawn Routes field of an UPDATE PDU).

This procedure does not limit the rate of route selection, but only the rate of route advertisement. If new routes are selected multiple times while awaiting the expiration of minRouteAdvertisementInterval, the last route selected shall be advertised at the end of minRouteAdvertisementInterval.

8.15.9.2. Frequency of route origination

The architectural constant MinRDOriginationInterval determines the minimum amount of time that must elapse between successive advertisements of UPDATE PDUs that report changes within the advertising BIS's own routing domain.

8.15.9.3. Jitter

To minimize the likelihood that the distribution of BISPDUs by a given BIS will contain peaks, jitter should be applied to the timers associated with minRouteAdvertisementInterval and MinRDOriginationInterval. A given BIS shall apply the same jitter to each of these quantities regardless of the destinations to which the updates are being sent: that is, jitter will not be applied on a "per peer" basis.
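A minimal sketch of such a jittered timer value follows. It uses the rule given in the next paragraph: a random factor uniformly distributed between 1-J and 1, with the result rounded up to a 100 millisecond increment. The function name, the use of milliseconds as the unit, and the values J = 0.25 and 30 seconds in the example are assumptions made for illustration only.

```python
import math
import random

def jittered_interval_ms(base_ms, jitter, rng=random):
    """Scale base_ms by a factor uniform in [1 - jitter, 1], rounded up to 100 ms.

    base_ms stands for the configured timer value (for example,
    minRouteAdvertisementInterval) expressed in milliseconds; jitter stands
    for the architectural constant Jitter (J).
    """
    factor = rng.uniform(1.0 - jitter, 1.0)
    return math.ceil(base_ms * factor / 100.0) * 100

# Illustrative values only: a 30 second base interval and J = 0.25.
rng = random.Random(1996)
samples = [jittered_interval_ms(30000, 0.25, rng) for _ in range(1000)]
```

With these illustrative values every sample is a multiple of 100 milliseconds lying between 22500 and 30000 milliseconds.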
The amount of jitter to be introduced shall be determined by multiplying the base value in the appropriate managed object by a random factor which is uniformly distributed in the range from 1-J to 1, where J is the value of the architectural constant Jitter. The result shall be rounded up to the nearest 100-millisecond increment.

8.16. Efficient organization of routing information

Having selected the routing information which it will advertise, a BIS may avail itself of several methods to organize this information in an efficient manner.

8.16.1. Information reduction

Information reduction may imply a reduction in granularity of policy control - after information is collapsed, the same policies will apply to all destinations and paths in the equivalence class.

The Decision Process may optionally reduce the amount of information that it will place in the Adj-RIBs-Out by any of the following methods:

a) Network Layer Reachability Information: Destination addresses can be represented as address prefixes. In cases where there is a correspondence between the address structure and the systems under control of a routing domain administrator, it will be possible to reduce the size of the network layer reachability information that is carried in the UPDATE PDUs.

b) RD_PATHS: RD path information can be represented as ordered RD_SEQUENCES or unordered RD_SETs. RD_SETs are used in the route aggregation algorithm described in 8.16.2. They reduce the size of the RD_PATH information by listing each RDI only once, regardless of how many times it may have appeared in the multiple RD_PATHS that were aggregated. An RD_SET implies that the destinations listed in the NLRI can be reached through paths that traverse at least some of its constituent RDs.
RD_SETs provide sufficient information to avoid routing loops; however, their use may prune potentially useful paths, since such paths are no longer listed individually as they are in the form of RD_SEQUENCES. In practice this is not likely to be a problem, since once an NPDU arrives at the edge of a group of RDs, the BIS at that point is likely to have more detailed path information and can distinguish individual paths to destinations.

8.16.2. Aggregating routing information

Aggregation is the process of combining the characteristics of several different routes (or components of a route such as an individual path attribute) in such a way that a single route can be advertised. Aggregation can occur as part of the decision process to reduce the amount of information that will be placed in the Adj-RIBs-Out. For example, at the boundary of a routing domain confederation an exit BIS can aggregate several intra-confederation routes into a single route that will be advertised externally.

Aggregation reduces the amount of information that BISs must store and exchange with each other. Routes can be aggregated by applying the following procedures separately to path attributes of like type and to the NLRI information.

8.16.2.1. Route aggregation

Several routes shall not be aggregated into a single route unless the FIB-Tags of each of these routes are the same.

Routes that have the following attributes shall not be aggregated unless the corresponding attributes of each route are identical: MULTI-EXIT_DISC and NEXT_HOP.

An aggregated route is constructed from one or more component routes. If a component of an aggregated route that has been advertised in an UPDATE PDU becomes unfeasible, then all component routes that comprise the aggregated route, except for the unfeasible component, shall be advertised again, either as separate routes or as a new aggregated route.
If the new aggregated route has the same NLRI as the previous aggregated route, then no further actions are necessary, since advertisement of the new aggregated route implicitly marks the old aggregated route as having been withdrawn from use. In all other cases, the original aggregated route must be withdrawn explicitly by means of the Withdrawn Routes field of an UPDATE PDU.

8.16.2.2. Path attribute aggregation

Path attributes that have different type codes cannot be aggregated together. Path attributes of the same type code may be aggregated, according to the following rules:

LOCAL_PREF attributes: When several routes are aggregated, the advertising BIS shall compute a degree of preference for the aggregated route, and shall carry this value in the LOCAL_PREF attribute of the aggregated route.

INCOMPLETE_PATH attributes: If at least one route among routes that are aggregated has the INCOMPLETE_PATH attribute, then the aggregated route must have the INCOMPLETE_PATH attribute as well.

RD_PATH attributes: The individual RD_PATH attributes from which the aggregated RD_PATH attribute will be constructed are called the component attributes, and the ENTRY_SEQ and ENTRY_SET path segments contain the RDIs of confederations that have been entered but not yet exited. If the RDIs of all such confederations appear in the same relative order of entry in every component route, then aggregation may be performed without pre-processing the component routes. If they appear in different orders of entry in the component routes, then the pre-processing step outlined below must be performed in order to create the same order of entry in every component route before applying the aggregation procedures.

If routes to be aggregated have identical RD_PATH attributes, then the aggregated route has the same RD_PATH attribute as each individual route, and no further processing is necessary.
Pre-processing to Attain Identical Order of Entry: Apply the following procedure to each component route individually. Replace all path segments, from the first ENTRY_SET or ENTRY_SEQ segment to the last path segment, inclusive, with a path segment of type ENTRY_SET followed by a path segment of type RD_SET:

- The path segment of type ENTRY_SET shall contain the union of all the RDIs listed in the individual ENTRY_SET and ENTRY_SEQ segments. The RDIs must be listed in the same order in each component route. The specific ordering algorithm is left as a local matter, but it shall guarantee that the RDI of a given confederation does not precede the RDI of any confederation within which it is nested.

- The path segment of type RD_SET shall contain the union of the RDIs contained in all RD_SETs and RD_SEQs that appear after the first ENTRY_SET or ENTRY_SEQ of the component route.

Aggregation Procedures: For purposes of this procedure, a path segment that lists multiple RDIs shall be treated as if it were multiple consecutive path segments, where each path segment lists a single RDI and the order of appearance of RDIs is maintained. For example, a path segment that lists RDIs X, Y, and Z (in that order) is treated as if it were a path segment listing X, followed by a path segment listing Y, followed by a path segment listing Z.

If all the RD_PATH attributes of all component routes are identical, the aggregated path attribute is equal to the original RD_PATH attribute.

The main procedure of 8.16.2.2.1 calls the subroutine of 8.16.2.2.2 for aggregating RD_PATH attributes that contain no ENTRY_SEQs or ENTRY_SETs (generically called an "Entry Marker"). In effect, the main procedure applies the subroutine to all segments that are located between Entry Markers, between an Entry Marker and the end of a component attribute, or between the start of a component attribute and its first Entry Marker.
The main procedure is described in 8.16.2.2.1, and the subroutine is described in 8.16.2.2.2.

RD_HOP_COUNT attribute: The value of the RD_HOP_COUNT of the aggregated route shall be set equal to the largest RD_HOP_COUNT that was contained in the routes being aggregated.

CAPACITY attributes: The value of the CAPACITY attribute of the aggregated route shall be equal to the smallest integer value contained in the CAPACITY fields of the routes being aggregated.

8.16.2.2.1. Main procedure for RD_PATH aggregation

This procedure is used to aggregate the RD_PATH attributes of component routes:

a) Set the aggregated RD_PATH to "empty".

b) Scanning from the back of each non-empty component attribute, locate the first Entry Marker. If the type of marker in any component route is ENTRY_SET, then change the type of the corresponding Entry Marker in all component attributes to ENTRY_SET.

c) If no Entry Marker is found, apply the subroutine for aggregating RD_PATHs with no Entry Markers (see 8.16.2.2.2), and prepend the result to the aggregated RD_PATH attribute.

d) If an Entry Marker is found, prepend the following to the aggregated RD_PATH attribute, in the order indicated: the located Entry Marker, followed immediately by the path segments obtained by applying the subroutine for aggregation of RD_PATHs with no Entry Markers (see 8.16.2.2.2) to the path segments that follow the located Entry Marker in each component attribute. If a component attribute has no path segments following the located Entry Marker, pass it to the subroutine as an empty set.

e) Delete from each component attribute all the path segments that were prepended to the aggregated attribute in steps c or d.

f) Repeat steps b through e until every component attribute is empty.

If there are consecutive path segments of the same type, they shall be combined into a single path segment of the same type.
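Two of the bookkeeping steps used by the aggregation procedures can be sketched in isolation: treating a path segment that lists several RDIs as consecutive single-RDI segments, and combining consecutive path segments of the same type. The sketch below assumes, for illustration only, that an RD_PATH is modeled as a list of (type, list of RDIs) pairs; the function names are hypothetical. Dropping a repeated RDI when set-type segments are combined is likewise an assumption, based on the statement in 8.16.1 that an RD_SET lists each RDI only once. The main procedure and its subroutine are not implemented here.

```python
def split_segments(path):
    """Expand each (type, [rdis]) segment into single-RDI segments, order kept."""
    return [(seg_type, [rdi]) for seg_type, rdis in path for rdi in rdis]

def combine_consecutive(path):
    """Merge runs of adjacent segments that share a type into one segment."""
    combined = []
    for seg_type, rdis in path:
        if combined and combined[-1][0] == seg_type:
            for rdi in rdis:
                # Assumption: a set-type segment lists each RDI only once.
                if seg_type.endswith("_SET") and rdi in combined[-1][1]:
                    continue
                combined[-1][1].append(rdi)
        else:
            combined.append((seg_type, list(rdis)))  # copy, input is not mutated
    return combined

path = [("RD_SEQ", ["X", "Y"]), ("RD_SEQ", ["Z"]),
        ("RD_SET", ["A", "B"]), ("RD_SET", ["B", "C"])]
singles = split_segments(path)
merged = combine_consecutive(path)
```

Here singles begins with three RD_SEQ segments listing X, Y, and Z individually, and merged consists of one RD_SEQ segment listing X, Y, Z followed by one RD_SET segment listing A, B, C.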
8.16.2.2.2. Aggregating routes with no entry markers

The subroutine for aggregating RD_PATH attributes with no entry markers is as follows:

a) Set the aggregated RD_PATH to "empty".

b) Scanning from the back of each component attribute, locate the first identical longest sequence of path segments that occurs in every component attribute, including any that are empty.

Note 30: It will not be possible to find an identical sequence in every component attribute if one or more of them are empty.

c) If there is no identical sequence, form a path segment of type RD_SET that contains every RDI in every non-empty component attribute. Prepend this list to the aggregated RD_PATH attribute.

d) If the identical sequence is the final sequence of every component attribute, prepend it to the aggregated RD_PATH attribute.

e) If the identical sequence is not the final sequence of every component attribute, form a path segment of type RD_SET that lists every RDI that occurs between the end of the identical sequence and the end of each non-empty component attribute. Prepend this list to the aggregated RD_PATH attribute.

f) Delete from each component attribute all path segments that were added to the aggregated RD_PATH attribute in step c, d, or e.

g) If, after the deletions in step f have been made, an RDI is present in both the aggregated RD_PATH attribute and in any of the component attributes, then the aggregated RD_PATH attribute shall be replaced by a single path segment of type RD_SET that lists every RDI that was present in the component routes that were the input to this subroutine (before any deletions were made), and the subroutine terminates. Otherwise, repeat steps b through f until every component attribute is empty.

8.17. Maintenance of the forwarding information bases

As summarized in Table 1, the Forwarding Information Bases contain the following information:

a) the Network Layer address of the next-hop BIS,

b) the local SNPA used by the local BIS to forward traffic to the next-hop BIS,

c) if available, the SNPA in the next-hop BIS to which NPDUs will be forwarded.

The RIB-Tag of the Loc-RIB which contains a route is also the RIB-Tag of the corresponding FIB; the NLRI for the associated FIB is the same as the NLRI of the corresponding route that is stored in the Loc-RIB. The forwarding information consists of three parts.

a) Network Layer address of Next-hop BIS: For each route in the Loc-RIB, the next-hop BIS has been determined, and is carried as a tag, as described in 8.15.2 entry in the FIB. This information is always present.

b) Output SNPA: The SNPA that will be used by the local BIS for forwarding traffic to the destinations identified in the NLRI field of the FIB is established locally, and is one of the SNPAs identified in managed object localSNPA.

c) Input SNPA: The SNPA that will be used by the remote BIS to receive traffic is taken from the NEXT_HOP attribute of the corresponding route stored in the Loc-RIB. If the NEXT_HOP attribute contains an empty SNPA list, or if the NEXT_HOP attribute itself is not included in the route, then the Input SNPA field in the FIB will be empty.

8.18. Error handling for BISPDUs

This section describes actions to be taken when errors are detected while processing BISPDUs. Error handling procedures apply individually to each FSM in the BIS.

8.18.1. BISPDU header error handling

If the BIS-BIS connection was established using authentication code 2 (checksum plus authentication) and the validation pattern in the BISPDU header does not match the locally computed pattern, then the BISPDU shall be discarded without any further actions.
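The Length and Type conditions in the list that follows (items a through f, and item h) lend themselves to a table-driven check. The sketch below is illustrative only. The PDU type names used as dictionary keys and the function name are hypothetical, only the PDU types named in that list are modeled, and the minimum lengths of the OPEN and UPDATE PDUs, which are not stated in this subclause, are left as parameters supplied by the caller.

```python
# Lengths taken from the list of header error conditions in this subclause.
HEADER_MIN = 30
FIXED_LENGTH = {"KEEPALIVE": 30}
MIN_LENGTH = {"IDRP_ERROR": 32, "CEASE": 30}

def header_length_ok(pdu_type, length, segment_size, open_min, update_min):
    """Return True if the Length and Type fields pass the header checks.

    segment_size is the Segment Size from the remote system's OPEN PDU;
    open_min and update_min are caller-supplied placeholders (assumption).
    """
    if length < HEADER_MIN or length > segment_size:
        return False
    if pdu_type in FIXED_LENGTH:
        return length == FIXED_LENGTH[pdu_type]
    minimums = dict(MIN_LENGTH, OPEN=open_min, UPDATE=update_min)
    if pdu_type not in minimums:
        return False  # Type field not recognized
    return length >= minimums[pdu_type]

keepalive_ok = header_length_ok("KEEPALIVE", 30, 1500, 40, 40)
keepalive_bad = header_length_ok("KEEPALIVE", 31, 1500, 40, 40)
```

A BISPDU that fails such a check would be discarded and the corresponding error event logged. The validation pattern checks are outside the scope of this sketch.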
If any of the following error conditions is detected, the BISPDU shall be discarded, and the appropriate error event shall be logged by the receiving BIS:

a) Length field of a PDU header less than 30 octets or greater than the Segment Size specified by the remote system's OPEN PDU,

b) Length field of an OPEN PDU less than the minimum length of an OPEN PDU

c) Length field of an UPDATE PDU less than the minimum length of an UPDATE PDU

d) Length field of a KEEPALIVE PDU not equal to 30

e) Length of an IDRP ERROR PDU less than the minimum length of 32

f) Length of a CEASE PDU less than the minimum length of 30

g) The BIS-BIS connection was established using authentication code 1 (checksum without authentication) and the validation pattern in the BISPDU header does not match the locally computed pattern

h) Type field in the BISPDU is not recognized

8.18.2. OPEN PDU error handling

The following errors detected while processing the OPEN PDU shall be indicated by sending an IDRP ERROR PDU with error code OPEN_PDU_Error. The error subcode of the IDRP ERROR PDU shall elaborate on the specific nature of the error.

a) If the version number of the received OPEN PDU is not supported, then the error subcode of the IDRP ERROR PDU shall be set to Unsupported_Version_Number. The Data field of the IDRP ERROR PDU is a 2-octet unsigned integer, which indicates the highest supported version number less than the version of the remote BIS peer's bid (as indicated in the received OPEN PDU).

b) If the Maximum PDU Size field of the OPEN PDU is less than MinBISPDULength octets, the error subcode of the IDRP ERROR PDU is set to Bad_Maximum_PDU_Size. The Data field of the IDRP ERROR PDU is a 2-octet unsigned integer which contains the erroneous Maximum PDU Size field.

c) If the Routing Domain Identifier field of the OPEN PDU is not the expected one, the error subcode of the IDRP ERROR PDU is set to Bad_Peer_RD.
The expected values of the Routing Domain Identifier may be obtained by means outside the scope of this protocol (usually it is a configuration parameter). The value of the erroneous RDI is returned in the Data field of the IDRP ERROR PDU, encoded as a <Length, RDI> pair. "Length" is a one octet field containing a positive integer that gives the number of octets used for the following "RDI" field.

d) If a BIS receives an OPEN PDU from a BIS located in the same RD, and the RIB-TagsSet field contained in that PDU is different from the receiving BIS's managed object RIBTagsSet, then the error subcode of the IDRP ERROR PDU shall be set to Bad-RIB-TagsSet.

e) If the value of the Authentication Code field of the OPEN PDU is any value other than 1 or 2, the error subcode of the IDRP ERROR PDU is set to Unsupported_Authentication_Code.

f) If a given BIS receives an OPEN PDU from another BIS located in the same routing domain, then the RDIs reported in the Confed-IDs field of the OPEN PDU (received from the remote BIS) should match the Confed-IDs of the local BIS. If they do not match exactly, then an IDRP ERROR PDU shall be issued, indicating an OPEN PDU error with an error subcode of RDC_Mismatch. The Data field of the IDRP ERROR PDU shall report the offending Confed-IDs field from the rejected OPEN PDU.

g) If the Hold Time field of the OPEN PDU is unacceptable, then the Error Subcode shall be set to Unacceptable Hold Time. An implementation shall reject Hold Time values of one or two seconds. An implementation may reject any proposed Hold Time. An implementation which accepts a Hold Time shall use the negotiated value for the Hold Time.

h) If the OPEN PDU carries one or more well-known optional parameters, and if any of these parameters is not recognized, then the error subcode of the IDRP ERROR PDU shall be set to Unsupported well-known parameter.
The Data field of the IDRP ERROR PDU shall report the unrecognized parameter (type, length and value).

8.18.3. UPDATE PDU error handling

All errors detected while processing the UPDATE PDU are indicated by sending an IDRP ERROR PDU with error code UPDATE_PDU_Error. The error subcode of the IDRP ERROR PDU elaborates on the specific nature of the error.

a) If the Total Attribute Length is inconsistent with the Length field of the PDU header, then the error subcode of the IDRP ERROR PDU shall be set to Malformed_Attribute_List. No further processing shall be done and all information in the UPDATE PDU shall be discarded.

b) If any recognized attribute has attribute flags that conflict with the attribute type code, then the error subcode of the IDRP ERROR PDU shall be set to Attribute_Flags_Error. The Data field of the IDRP ERROR PDU shall contain the incorrect attribute (type, length and value). No further processing shall be done, and all information in the UPDATE PDU shall be discarded.

c) If any recognized attribute has a length that conflicts with the expected length (based on the attribute type code), then the error subcode of the IDRP ERROR PDU shall be set to Attribute_Length_Error. The Data field of the IDRP ERROR PDU contains the incorrect attribute (type, length and value). No further processing shall be done, and all information in the UPDATE PDU shall be discarded.

d) If any of the mandatory well-known attributes are not present, then the error subcode of the IDRP ERROR PDU shall be set to Missing_Well-known_Attribute. The Data field of the IDRP ERROR PDU contains the attribute type code of the missing well-known attribute.

e) If any well-known attribute (so designated by the attribute flags) is not recognized, then the error subcode of the IDRP ERROR PDU shall be set to Unrecognized_Well-known_Attribute.
The Data field of the IDRP ERROR PDU shall report the unrecognized attribute (type, length and value). In both cases no further processing shall be done, and all information in the UPDATE PDU shall be discarded.

f) If the NEXT_HOP attribute field is invalid, then the error subcode of the IDRP ERROR PDU shall be set to Invalid_NEXT_HOP_Attribute. The Data field of the IDRP ERROR PDU contains the incorrect attribute (type, length and value). No further processing shall be done and all information in the UPDATE PDU shall be discarded.

g) The sequence of RD path segments shall be checked for RD loops. RD loop detection shall be done by scanning the complete list of RD path segments (as specified in the RD_PATH attribute) and checking that each RDI in this list occurs only once. If an RD loop is detected, then the error subcode of the IDRP ERROR PDU shall be set to RD_Routing_Loop. The Data field of the IDRP ERROR PDU shall report the first RDI that indicated a loop. This RDI shall be followed immediately by the complete RD_PATH attribute. The encoding shall be: <length, RDI, Offending RD_PATH attribute>, where:

- "length" is a one octet field that gives the length in octets of the immediately following RDI field

- "RDI" is the RDI that was detected as creating the loop

- "Offending RD_PATH attribute" is the octet string that encoded the value field of the offending RD_PATH attribute in the received UPDATE PDU (see 6.3).

No further processing shall be done, and all information in the UPDATE PDU shall be discarded.

h) If any non-null FIB-Tag advertised in an UPDATE PDU received from a BIS located in a different routing domain does not match any of the RIB-Tags that the local (receiving) BIS had advertised to that neighbor in the RIB-TagsSet field of its OPEN PDU, then the receiving BIS shall send an IDRP ERROR PDU that reports an error subcode of Malformed_Attribute_List.
All information in the UPDATE PDU shall be discarded, and no further processing shall be done.

l) If the length of the NLRI is inconsistent with the Length field of the PDU header, then the error subcode of the IDRP ERROR PDU shall be set to Malformed_NLRI. No further processing shall be done, and all information in the UPDATE PDU shall be discarded.

m) If an optional attribute is recognized, then the value of this attribute shall be checked. If an error is detected, the attribute shall be discarded, and the error subcode of the IDRP ERROR PDU shall be set to Optional_Attribute_Error. The Data field of the IDRP ERROR PDU shall report the attribute (type, length and value). No further processing shall be done, and all information in the UPDATE PDU shall be discarded.

n) If RDCs are supported and any of the error conditions noted in 8.12.3.3 occur, no further processing of the UPDATE PDU shall be done, all information in the UPDATE PDU shall be discarded, and the error subcode of the IDRP ERROR PDU shall be set to Misconfigured_RDCs.

p) If an UPDATE PDU contains more than one instance of a path attribute of the same type, the BIS shall send an IDRP ERROR PDU with error subcode Duplicated_Attributes. The Data field of the IDRP ERROR PDU shall list the type codes of all such duplicated attributes.

Note 32: This error condition refers to duplicated attributes within a single route.

q) If the RD_PATH attribute contains an illegal segment type, the BIS shall send an IDRP ERROR PDU with error subcode Illegal_RD_Path_Segment. The Data field of the IDRP ERROR PDU shall reproduce the encoding of the offending segment of the RD_PATH attribute, as it appeared in the received UPDATE PDU.

8.18.4. IDRP ERROR PDU error handling

If a BIS receives an IDRP ERROR PDU with a correct validation pattern but which contains an unrecognized error code or error subcode, the local BIS shall close the connection as described in clause 8.6.2.

Note 33: Any error (such as an unrecognized Error Code or Error Subcode, or an incorrect Length field in the PDU header) should be logged locally and brought to the attention of the administration of the peer. The means to do this are, however, outside the scope of this protocol.

8.18.5. Hold timer expired error handling

If the FSM for a given BIS-BIS connection is in the ESTABLISHED state and the local BIS does not receive successive PDUs of types KEEPALIVE, UPDATE, or RIB REFRESH within the period specified in the Hold Time field of the OPEN PDU previously sent to the remote BIS, then an IDRP ERROR PDU with error code Hold_Timer_Expired shall be sent to the remote BIS and the FSM for the associated BIS-BIS connection shall enter the CLOSE-WAIT state.

8.18.6. KEEPALIVE PDU error handling

The KEEPALIVE PDU consists of only the BISPDU Header. Error conditions are handled according to 8.18.1.

8.18.7. CEASE PDU error handling

The CEASE PDU consists of only the BISPDU Header. Error conditions are handled according to 8.18.1.

8.18.8. RIB REFRESH PDU error handling

If any of the following error conditions are detected, the BIS shall issue an IDRP ERROR PDU with the following error indications:

a) Invalid OpCode not in Range 1 to 3: indicate RIB REFRESH error with error subcode "Invalid OpCode"

b) Receipt of an OpCode 3 (RIB Refresh End) without prior receipt of OpCode 2 (RIB Refresh Start): indicate FSM Error

c) Receipt of an unsupported RIB-Tag in the RIB-Tags variable length field in the RIB REFRESH PDU for a RIB Refresh Start OpCode: indicate RIB REFRESH error with error subcode "Unsupported RIB-Tags"

9. Constants

The constants used by the protocol defined in this document are enumerated in Table 6.

+----------------------------------------------------------------------+
| Table 6. Architectural Constants of IDRP                             |
+---------------------------+--------------+---------------------------+
| Name of Constant          | Value        | Description               |
+---------------------------+--------------+---------------------------+
| Inter-domain Routing      | 45           | The protocol number of    |
| Protocol Number           |              | the protocol described in |
|                           |              | this document             |
+---------------------------+--------------+---------------------------+
| MinBISPDULength           | 30           | The size in octets of the |
|                           |              | smallest allowable        |
|                           |              | BISPDU.                   |
+---------------------------+--------------+---------------------------+
| MinRDOriginationInterval  | 15 min       | The minimum time between  |
|                           |              | successive UPDATE PDUs    |
|                           |              | advertising routing       |
|                           |              | information about the     |
|                           |              | local RD                  |
+---------------------------+--------------+---------------------------+
| Jitter                    | 0.25         | The factor used to        |
|                           |              | compute jitter according  |
|                           |              | to clause 7.17.3.3.       |
+---------------------------+--------------+---------------------------+
| MaxCPUOverloadPeriod      | 1 hr         | Maximum time in which a   |
|                           |              | BIS can remain            |
|                           |              | CPU-overloaded before     |
|                           |              | terminating its BIS-BIS   |
|                           |              | connections.              |
+---------------------------+--------------+---------------------------+
| CloseWaitDelay            | 150 s        | The time that an FSM      |
|                           |              | remains in CLOSE-WAIT     |
|                           |              | state before entering the |
|                           |              | CLOSED state.             |
+---------------------------+--------------+---------------------------+

10. Required set of supported routing policies

Policies are provided to IDRP in the form of configuration information. This information is not directly encoded in the protocol. Therefore, IDRP can provide support for very complex routing policies. However, it is not required that all IDRP implementations support such policies.

We are not attempting to standardize the routing policies that must be supported in every IDRP implementation; however, we strongly encourage all implementors to support the following set of routing policies:

IDRP implementations should allow a domain to control announcements of IDRP-learned routes to adjacent domains. Implementations should also support such control with at least the granularity of a single address prefix. Implementations should also support such control with the granularity of a domain, where the domain may be either the domain that originated the route, or the domain that advertised the route to the local system (adjacent domain). Care must be taken when a BIS selects a new route that cannot be announced to a particular external peer, while the previously selected route was announced to that peer. Specifically, the local system must explicitly indicate to the peer that the previous route is now infeasible.

IDRP implementations should allow a domain to prefer a particular path to a destination (when more than one path is available). At the minimum an implementation shall support this functionality by allowing a degree of preference to be administratively assigned to a route based solely on the IP address of the neighbor from which the route is received. The allowed range of the assigned degree of preference shall be between 0 and 2^(31) - 1.

IDRP implementations should allow a domain to ignore routes with certain domains in the RD_PATH path attribute. Such a function can be implemented by assigning "infinity" as the "weight" for such domains. The route selection process must ignore routes that have a "weight" equal to "infinity".

11. Operations over Switched Virtual Circuits

When using IDRP over Switched Virtual Circuit (SVC) subnetworks it may be desirable to minimize traffic generated by IDRP. Specifically, it may be desirable to eliminate traffic associated with periodic KEEPALIVE messages. IDRP includes a mechanism for operation over SVC services that avoids keeping SVCs permanently open and eliminates the periodic sending of KEEPALIVE messages.

This section describes how to operate without periodic KEEPALIVE messages to minimize SVC usage when using an intelligent SVC circuit manager. The proposed scheme may also be used on "permanent" circuits which support a feature like link quality monitoring or echo request to determine the status of link connectivity.

The mechanism described in this section is suitable only between BISs that are directly connected over a common virtual circuit.

11.1. Establishing an IDRP Connection

The feature is selected by specifying zero Hold Time in the OPEN BISPDU.

11.2. Circuit Manager Properties

The circuit manager must have sufficient functionality to be able to compensate for the lack of periodic KEEPALIVE BISPDUs:

It must be able to determine link layer unreachability within a predictable, finite period after a failure occurs.

On determining unreachability it should:

- start a configurable dead timer (comparable to a typical Hold timer value).

- attempt to re-establish the Link Layer connection.

If the dead timer expires it should:

- send a deactivate indication to the IDRP FSM.

If the connection is re-established it should:

- cancel the dead timer.

- transmit any queued BISPDUs.

11.3. Combined Properties

Some implementations may not be able to guarantee that the IDRP process and the circuit manager will operate as a single entity; i.e. they can have a separate existence when the other has been stopped or has crashed.
If this is the case, a periodic two-way poll between the IDRP process and the circuit manager should be implemented. If the IDRP process discovers that the circuit manager has gone away, it should close all relevant BIS-BIS connections. If the circuit manager discovers that the IDRP process has gone away, it should close all its BIS-BIS connections associated with the IDRP process and reject any further incoming BIS-BIS connections.

12. Security Considerations

Security issues are not discussed in this document.

13. Acknowledgements

This document is based on a combination of IDRP version 1 (ISO 10747) and BGP-4. As such, it borrows heavily from both of its ancestors. Note that during their development both of the ancestors (IDRP and BGP-4) borrowed heavily from each other. Thus we would like to acknowledge all the individuals who contributed to the design of both BGP and IDRP.

We would also like to acknowledge all the members of both the Inter-Domain Routing Working Group of the IETF and the X3S3.3 Working Group of ANSI, where BGP and IDRP were designed.

14. References

15. Editors' Addresses

Yakov Rekhter
cisco Systems, Inc.
170 Tasman Dr.
San Jose, CA 95134
Phone: (914) 528-0090
email: yakov@cisco.com

Paul Traina
Juniper Networks, Inc.
101 University Ave. Suite 240
Palo Alto, CA 94301
Phone: (415) 614-4140
email: pst@jnx.com
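
Illustrative sketch (non-normative). The circuit manager behaviour listed in 11.2 (start a dead timer on unreachability, send a deactivate indication on expiry, cancel the timer and flush queued BISPDUs on recovery) might be modelled roughly as below. This is only a sketch under stated assumptions: the class name, the method names, and the explicit "now" argument are hypothetical and are not defined by this document, and the attempt to re-establish the Link Layer connection is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CircuitManager:
    """Hypothetical circuit manager state for one BIS-BIS connection."""

    dead_interval: float                    # configurable dead timer, seconds
    link_up: bool = True
    dead_deadline: Optional[float] = None   # set only while the timer runs
    deactivated: bool = False               # deactivate indication sent to FSM
    queued: List[bytes] = field(default_factory=list)
    sent: List[bytes] = field(default_factory=list)

    def send(self, bispdu: bytes) -> None:
        # Transmit at once when the link is up; otherwise hold the BISPDU.
        (self.sent if self.link_up else self.queued).append(bispdu)

    def link_unreachable(self, now: float) -> None:
        # On determining unreachability: start the dead timer (only once).
        self.link_up = False
        if self.dead_deadline is None and not self.deactivated:
            self.dead_deadline = now + self.dead_interval

    def link_reestablished(self) -> None:
        # Cancel the dead timer and transmit any queued BISPDUs.
        self.link_up = True
        self.dead_deadline = None
        self.sent.extend(self.queued)
        self.queued.clear()

    def tick(self, now: float) -> None:
        # If the dead timer expires, send a deactivate indication to the FSM
        # (modelled here as a flag the IDRP process could inspect).
        if self.dead_deadline is not None and now >= self.dead_deadline:
            self.dead_deadline = None
            self.deactivated = True
```

Passing the current time in explicitly, rather than reading a clock, keeps the sketch deterministic; a real implementation would presumably drive tick() from its own timer facility.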