Network Working Group                                        P. Tsuchiya
INTERNET-DRAFT                                                  Bellcore
                                                               July 1992

Shortcut Routing: Discovery and Routing over Large Public Data Networks

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. (Note that other groups may also distribute working documents as Internet Drafts.) Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress." Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Changes in this Version

This version (July 1992) has two differences from the previous version (June 1992). It adds the shortcut header format and encapsulation scheme, and it adds a section discussing version negotiation.

Please send comments to the IP Over Large Public Data Networks working group, iplpdn@nri.reston.va.us.

The above paragraphs of course won't appear in the final RFC. The following paragraph will, but for now please ignore it.

This RFC defines a protocol for both intra- and inter-domain discovery and routing over the Switched Multi-megabit Data Service Network. This protocol is an enhancement to the techniques described in RFC 1209. This RFC specifies an IAB standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "IAB Official Protocol Standards" for the standardization state and status of this protocol. Distribution of this RFC is unlimited.

Abstract

Various RFCs and Internet Drafts specify how to run IP or CLNP over various public data network services [1,7,8,9,10].
These documents specify the encapsulation to be used, and sometimes specify protocol techniques to aid in discovery and routing functions. None of the specifications, however, solves the problem of discovery and routing in full, especially with respect to scaling. This RFC extends these specifications by describing a general and scalable technique, called shortcut routing, for discovery and routing of any connectionless internet protocol over any switched data network (connectionless or connection-oriented).

IPLPDN WG, Expires Jan. 1, 1993

Acknowledgements

This document was produced in the IP over Large Public Data Networks working group. The author would specifically like to acknowledge Joe Lawrence of Bellcore, Joel Halpern of Network Systems Corporation, and John Chang of USWest for their careful readings and comments.

1.0 INTRODUCTION

Different network services offer different switching capabilities. For instance, SMDS provides a limited multicast service, and RFC 1209 [1] describes how to use that service in the context of ARP [3]. Even though Frame Relay does not yet provide a multicast service, RFC 1294 [7] describes a way of emulating ARP by sending multiple (unicast) ARP packets. None of these capabilities, however, solves the general problem of routing over a large public data network (in this RFC called simply a Subnet), especially with respect to the scaling problem.

Any system (router or host) attached to a Subnet may be required to forward an internet packet (ip, spelled in lower case letters to imply any internet packet, as opposed to IP in capitals, which refers to the Internet Protocol of RFC 791 [11]) to any ip host. ip hosts may be reachable behind any Subnet exit point. In order for a Subnet-attached system to be able to forward an ip packet to the optimal Subnet exit point, it must either:

   1. know what ip destinations are best reachable behind every Subnet exit point at all times, or

   2. have some means of discovering which Subnet exit point to use.

When a Subnet is small (such as a LAN) it is possible for condition 1 to exist. Since a Public Data Network (PDN), however, may have many thousands of systems attached to it, it is at best undesirable and at worst impossible for a system to have full ip routing information about every other system attached to the Subnet.

The means of discovery of condition 2 does not yet exist (that is what this RFC provides), and so the only possible state of affairs for current large PDNs is for each system to know about a subset of other systems. This can be sufficient for routing ip packets across the Subnet, but does not result in optimal paths. For instance, consider four routers, A, B, C, and D, attached to a Subnet. Assume that D is the best Subnet exit point for an ip packet arriving at Subnet entry point A. Assume that A does not know about (have specific routing information for) D, but that A knows about B, that B knows about C, and that C knows about D. The path for the packet from A to D will therefore be A-B-C-D. It crosses the same Subnet three times, when theoretically it could have crossed the Subnet only once. (This is called a multi-hop Subnet path, or simply a multi-hop path.)

   Note: The extent to which this is a serious problem depends on many factors, such as how often it occurs, and how non-optimal the multi-hop path is. For instance, it is much worse if both hosts are on the east coast and the router is on the west coast than if the router is close to one of the hosts. In other words, these multi-hop paths may or may not be acceptable.

A common example of a multi-hop path is where a singly-homed stub network attached to the Subnet configures a single default router to which it sends all of its traffic. This default router forwards ip packets back over the Subnet to the appropriate exit router.
As is discussed later on, the default router itself may for scaling purposes not have full routing information about all Subnet-attached routers and may therefore send the ip packets to still other transit routers (routers that receive an ip packet from the Subnet and forward the packet back over the Subnet) on the way to the appropriate exit router. Therefore, an ip packet may enter and exit the Subnet multiple times.

These multi-hop ip paths are the inevitable result of not requiring every Subnet-attached router to know about every other Subnet-attached router at all times. The only way to get single-hop paths is to allow routers to dynamically discover and cache single-hop paths at the time that they need to forward ip packets. This is similar to the way hosts currently dynamically discover and cache ICMP Redirect [2] and ARP [3] information when they send ip packets to routers.

Except for limited cases (for instance, the Subnet single-LIS case described in RFC 1209), ICMP Redirect and ARP (as they are now used) are not appropriate mechanisms for dynamic discovery of paths across large Subnets. ICMP Redirect is not appropriate because routers cannot receive ICMP Redirects. ARP is not appropriate because ip systems cannot scalably multicast to all other Subnet systems.

This RFC describes a technique that allows entry Subnet-attached systems to dynamically discover the Subnet address of the appropriate exit Subnet-attached system, even though the routing algorithm may have discovered only a multi-hop Subnet path. This technique, called shortcut routing, works independently of the routing algorithm in use. Therefore, it can work with ISIS, IDRP, ESIS, EGP, BGP, OSPF, RIP, IGRP, or any other routing protocol including static tables.

Shortcut routing uses reverse-path learning to discover single-hop paths (called shortcuts) across the Subnet. Shortcut routing works by placing an extra header, called the shortcut header, between the Subnet header and the ip header. The primary information in the shortcut header is a single Subnet address. When a system transmits a packet onto the Subnet that was not received from the Subnet (that is, when it is the entry system), it puts its own Subnet address in the shortcut header. When a router transmits a packet onto the Subnet that was received from the Subnet (that is, when it is a transit router), it leaves the shortcut header unmodified. Therefore, when a system receives a packet from the Subnet that is not to be forwarded back onto the Subnet (that is, when it is the exit system), it can deduce that the reverse path (to the source ip address) is via the Subnet address given in the shortcut header (that of the entry system). Once this is known, packets can be exchanged directly between the entry and exit systems, although on a connection-oriented Subnet, a connection must first be established.

Variations of this basic technique exist for the case where one or more of the systems have not implemented shortcut routing, and for the case where a system wishes only to learn about shortcuts via "trusted" peer base-routing (called base-peer) routers. These cases are discussed further on.

   Note: The notion of trusted here is a rather soft one, and depends on a chain of trust relations (the base-peer router trusts its base-peer, and so on). Trust here does not necessarily imply that encryption of disseminated information or identification verification must be done.

2.0 SHORTCUT HEADER

The shortcut header is formatted as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|entVer |tranVer|    Length     |            Control            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               .                               |
|                               .                               |
|                 shortSAddr (Shortcut Address)                 |
|                               .                               |
|                               .                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Shortcut Header

The first two fields are the entry system version number (entVer) and transit system version number (tranVer) respectively. Each version field is 4 bits in length. When the values of the entVer and tranVer fields are both 1, the remainder of the shortcut header is as shown in the figure and described in the following. As of this writing, no other Version numbers are defined, but it is expected that others may follow. The entry system sets both entVer and tranVer to 1 upon transmission.

The next 8 bits of the first word are the Length field. This field gives the length, in bits, of the subnet address (in the shortSAddr field). By specifying the Length field in bits, we allow for any type of subnet address, up to a length of 255 bits (the largest value an 8-bit field can carry).

   Note: The author is not aware of any subnet address types that are variable to the bit level, but does not wish to exclude any that may already exist or may be invented in the future.

The least significant 16 bits of the first 32-bit word are the Control field. For entVer and tranVer = 1, only one bit of the Control field, the acceptShortcut bit, is defined. This is the least significant bit (bit 31 of the first 32-bit word). An acceptShortcut bit value of "1" means YES, and an acceptShortcut bit value of "0" means NO.

The remaining 32-bit words of the shortcut header contain the shortcut address (shortSAddr). This field is referred to as the shortSAddr in the protocol description. The shortSAddr is a subnet address. The length of the shortSAddr field depends on the length of the subnet address. For any given subnet address type, the shortSAddr field is fixed at a length long enough to hold the maximum size subnet address. In other words, while the length of the shortSAddr field may be different for different address families, it is a fixed-length field for a single address family.

The shortSAddr is transmitted most significant bit first.
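As an illustration only, the first 32-bit word of the shortcut header described above could be packed and unpacked as in the following C sketch. The function and macro names are hypothetical (they do not appear in this draft), and the sketch assumes the usual convention that bit 0 in the figure is the most significant bit of the word.

```c
#include <stdint.h>

/* Hypothetical names; only the field layout is taken from the text. */
#define SHORTCUT_VERSION     1u       /* the only version defined so far */
#define ACCEPT_SHORTCUT_BIT  0x0001u  /* least significant bit of Control */

/* Pack entVer (4 bits), tranVer (4), Length (8), Control (16) into one word. */
static uint32_t shortcut_first_word(uint8_t addr_len_bits, int accept_shortcut)
{
    uint32_t word = 0;
    word |= (uint32_t)SHORTCUT_VERSION << 28;   /* entVer  */
    word |= (uint32_t)SHORTCUT_VERSION << 24;   /* tranVer */
    word |= (uint32_t)addr_len_bits << 16;      /* Length, in bits */
    if (accept_shortcut)
        word |= ACCEPT_SHORTCUT_BIT;            /* acceptShortcut = YES */
    return word;
}

/* Write the word most significant octet first. */
static void shortcut_put_word(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Field accessors for a received first word. */
static unsigned shortcut_ent_ver(uint32_t word)  { return (word >> 28) & 0xFu; }
static unsigned shortcut_tran_ver(uint32_t word) { return (word >> 24) & 0xFu; }
static unsigned shortcut_length(uint32_t word)   { return (word >> 16) & 0xFFu; }
static int shortcut_accept(uint32_t word)
{
    return (word & ACCEPT_SHORTCUT_BIT) != 0;
}
```

For a 64-bit subnet address with acceptShortcut set to YES, shortcut_first_word(64, 1) yields the word 0x11400001.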
If the subnet address is described in 4-bit digits, then the most significant digit is transmitted first, and the most significant bit of each digit is transmitted first within that digit. Likewise, if the subnet address is described in 8-bit bytes, then the most significant byte is transmitted first, and the most significant bit of each byte is transmitted first within that byte. The least significant bits of the shortSAddr field are padded out to a 32-bit boundary. The padding bits are set to 0 upon transmission, and ignored upon receipt.

The following table gives the size of the shortSAddr field corresponding to several popular subnetwork address types:

                 shortSAddr length
Address Type     (in 32-bit words)     Length value
--------------   -------------------   --------------
SMDS                     2                   64
E.164                    2                variable
X.121                    2                variable
IEEE 802.2               2                   48
IP                       1                   32
NSAP                     6                variable

3.0 SHORTCUT HEADER ENCAPSULATION

The shortcut header can be preceded either by an LLC/SNAP or by an NLPID/SNAP encapsulation. In both cases, the shortcut header is identified by an EtherType value of XXX. Which encapsulation is used depends on the protocol below the shortcut header [1][7][8][9][10]. The shortcut header does not have an NLPID value assigned to it [12].

The shortcut header is always followed by an LLC/SNAP. This LLC/SNAP is used to identify the next higher layer protocol. The following text describing this scheme is taken directly from [13] (thank you, Juha Heinanen).

In LLC/SNAP based encapsulation the protocol of the routed PDU is identified by prefixing the PDU by an IEEE 802.2 LLC header, which is possibly followed by an IEEE 802.1a SNAP header. The LLC header consists of three one octet fields:

+------+------+------+
| DSAP | SSAP | Ctrl |
+------+------+------+

The LLC header value 0xFE-FE-03 identifies that a routed ISO PDU (see [13]) follows.
For routed ISO PDUs the format of the data following the shortcut header shall thus be as follows:

Payload Format for Routed ISO PDUs

+-------------------------------+
|        LLC  0xFE-FE-03        |
+-------------------------------+
|               .               |
|            ISO PDU            |
|    (up to 2^16 - 4 octets)    |
|               .               |
+-------------------------------+

The routed ISO protocol is identified by a one octet Network Layer Protocol ID (NLPID) field that is part of Protocol Data. NLPID values are administered by ISO and CCITT. They are defined in ISO/IEC TR 9577 [13].

An NLPID value of 0x00 is defined in ISO/IEC TR 9577 as the Null Network Layer or Inactive Set. Since it has no significance within the context of this encapsulation scheme, an NLPID value of 0x00 is invalid under the ATM encapsulation.

It would also be possible to use the above encapsulation for IP, since, although not an ISO protocol, IP has an NLPID value 0xCC defined for it. This format shall, however, not be used. Instead, IP is encapsulated like all other routed non-ISO protocols by identifying it in the EtherType of the SNAP header that immediately follows the LLC header.

The presence of a SNAP header is indicated by the LLC header value 0xAA-AA-03. A SNAP header is of the form

+------+------+------+------+------+
|         OUI        |     PID     |
+------+------+------+------+------+

The three-octet Organizationally Unique Identifier (OUI) identifies an organization which administers the meaning of the following two octet Protocol Identifier (PID). Together they identify a distinct routed or bridged protocol. The OUI value 0x00-00-00 specifies that the following PID is an EtherType.
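The LLC and SNAP values described above can be illustrated with a short C sketch. It is only an illustration under stated assumptions: the function and enum names are hypothetical, and the EtherType 0x0800 mentioned in the usage note is the well-known EtherType for IP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the data that follows the shortcut header. */
enum payload_kind { PAYLOAD_UNKNOWN, PAYLOAD_ISO, PAYLOAD_SNAP_ETHERTYPE };

/* Write LLC 0xAA-AA-03, OUI 0x00-00-00 and a 2-octet EtherType.
 * The caller must provide at least 8 octets; returns the octets written. */
static size_t llc_snap_put(uint8_t *out, uint16_t ethertype)
{
    out[0] = 0xAA; out[1] = 0xAA; out[2] = 0x03;   /* LLC: SNAP follows */
    out[3] = 0x00; out[4] = 0x00; out[5] = 0x00;   /* OUI: PID is an EtherType */
    out[6] = (uint8_t)(ethertype >> 8);            /* PID, high octet first */
    out[7] = (uint8_t)(ethertype & 0xFF);
    return 8;
}

/* Inspect the first octets of the payload. When the result is
 * PAYLOAD_SNAP_ETHERTYPE, *ethertype holds the 2-octet PID. */
static enum payload_kind llc_classify(const uint8_t *buf, size_t len,
                                      uint16_t *ethertype)
{
    if (len >= 3 && buf[0] == 0xFE && buf[1] == 0xFE && buf[2] == 0x03)
        return PAYLOAD_ISO;                        /* routed ISO PDU */
    if (len >= 8 && buf[0] == 0xAA && buf[1] == 0xAA && buf[2] == 0x03 &&
        buf[3] == 0x00 && buf[4] == 0x00 && buf[5] == 0x00) {
        *ethertype = (uint16_t)((buf[6] << 8) | buf[7]);
        return PAYLOAD_SNAP_ETHERTYPE;             /* routed non-ISO PDU */
    }
    return PAYLOAD_UNKNOWN;
}
```

A sender of an IP packet would call llc_snap_put(buf, 0x0800) and append the IP PDU after the eight octets written; a receiver would call llc_classify on the data following the shortcut header.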
The format of the data following the shortcut header for routed non-ISO PDUs shall thus be as follows:

Payload Format for Routed non-ISO PDUs

+-------------------------------+
|        LLC  0xAA-AA-03        |
+-------------------------------+
|        OUI  0x00-00-00        |
+-------------------------------+
|     EtherType (2 octets)      |
+-------------------------------+
|               .               |
|          Non-ISO PDU          |
|    (up to 2^16 - 9 octets)    |
|               .               |
+-------------------------------+

In the particular case of an Internet IP PDU, the EtherType value is 0x08-00:

Payload Format for Routed IP PDUs

+-------------------------------+
|        LLC  0xAA-AA-03        |
+-------------------------------+
|        OUI  0x00-00-00        |
+-------------------------------+
|       EtherType 0x08-00       |
+-------------------------------+
|               .               |
|            IP PDU             |
|    (up to 2^16 - 9 octets)    |
|               .               |
+-------------------------------+

4.0 SHORTCUT ROUTING ALGORITHM

The primary information passed between systems running the shortcut routing function is the shortcut address shortSAddr, which is the Subnet address of the entry system into the Subnet. This parameter is passed in a shortcut protocol header in certain ip packets transmitted onto the Subnet (at least by systems employing shortcut routing). It is manipulated by the hosts and routers attached to the Subnet, but does not have end-host-to-end-host significance like the ip address does (unless both hosts are connected to the Subnet). It is not manipulated by Subnet switches.

Table 1 lists the various tables used for shortcut routing. These tables do not need to actually exist in an implementation exactly as they are described below. They are used here for the purpose of describing the external behavior of the Subnet-attached system.
Table                Indexed by          Contents
-----                ----------          --------
forwardingTable      dest iAddr          iAddrSet
                                         outIf
                                         outIfSAddr
                                         outLNet
                                         baseSAddr
                                         shortcutStatus
shortcutTable        source iAddr, or    iAddr
                     dest iAddr          shortSAddr
                                         lastRxTime
                                         acceptShortcut
trustShortcutTable   source sAddr, and   sAddr
                     source iAddr        iAddr
peerTable            source sAddr        sAddr

          Table 1: Various Shortcut Routing Tables

Each system has a forwardingTable, each entry of which has parameters <iAddrSet, outIf, outIfSAddr, outLNet, baseSAddr, shortcutStatus>. The iAddrSet is a set of reachable ip addresses (encoded as an address/mask pair). The outIf denotes the interface that the ip packet should be forwarded over. The outLNet denotes the Logical Network (LN) over which the ip packet should be forwarded.

An LN is defined by this RFC for the purpose of partitioning private network resources from public network resources in the context of shortcut routing. For instance, an enterprise may have a collection of hosts and routers attached to the Subnet, but may wish for all interdomain traffic to go through one or a small number of "border" routers. These border routers may be significant in that they have security functions, such as ip address filtering. By defining LNs, we control the exit point of interdomain traffic while still allowing shortcuts to be found within each of the LNs (but not between LNs). Therefore, a system may have several interfaces to a single LN, or may have one interface to several LNs.

The outIfSAddr is the Subnet address of the interface over which the packet should be forwarded. The baseSAddr is the Subnet address of the next hop system that was learned via the ip routing algorithm(s) (called the "base" routing algorithm). The shortcutStatus can have values SHORTCUT, NONSHORTCUT, and UNKNOWN, and is used to determine whether the next hop system can parse shortcut headers. Any system not on the Subnet cannot parse a shortcut header and has shortcutStatus NONSHORTCUT.
If the shortcutStatus is UNKNOWN, then the system must determine the shortcutStatus of the next hop system. This is described in a later section. If the shortcutStatus of another system cannot be explicitly determined, then it is assumed to be NONSHORTCUT. In other words, at the time the shortcut routing algorithms specified below are executed, the entry in the forwardingTable is either known to be SHORTCUT or NONSHORTCUT, or is assumed to be NONSHORTCUT.

Each system has a shortcutTable, each entry of which has parameters <iAddr, shortSAddr, lastRxTime, acceptShortcut>. When a shortcut is usable (because it is known and the acceptShortcut bit is set to YES), then ip packets with destination address iAddr are sent to the Subnet address shortSAddr. When a shortcut is learned (based on receiving a shortcut header in an incoming packet from the Subnet), the source ip address in the incoming ip packet becomes iAddr and the shortcut address in the shortcut header becomes shortSAddr. The parameter lastRxTime indicates when a packet was last received with source ip address iAddr and source Subnet address shortSAddr. This is used for timing out old shortcut entries. Since the remote system may only accept shortcut information via a (trusted) base-peer router, the acceptShortcut bit determines whether shortcut information should be sent via base routing. acceptShortcut has two possible values, YES and NO.

Each system has a trustShortcutTable, each entry of which has parameters <sAddr, iAddr>. An entry in the trustShortcutTable means that a packet received with source Subnet address sAddr and source ip address iAddr without a shortcut header can be used to refresh or create a corresponding shortcutTable entry. That is, the shortcut information learned directly from the peer shortcut router (called a shortcut-peer) is treated as though it came from a trusted base-peer router.
Note that multiple trusted sAddr's can be associated with a single iAddr, and multiple iAddr's can be associated with a single sAddr. The former case would occur, for instance, if simultaneous multiple paths were being used to reach a given destination. Note, however, that even if there are multiple simultaneous trusted sAddr's for a given iAddr, there is only one shortcut active in the shortcutTable at any given time. The function checkShortcut(sAddr,iAddr) returns TRUE if there is an entry in trustShortcutTable with matching sAddr and iAddr, and returns FALSE otherwise.

Each system has a peerTable, each entry of which has parameter <sAddr>. This table lists the Subnet addresses of the routers from which the system is willing to learn shortcuts. This table might typically contain the complete list of base-peer routers, but could contain a subset of these, or could contain the set of all Subnet addresses (if a router is particularly trusting). The function checkPeer(sAddr) returns TRUE if the Subnet address sAddr is in the peerTable, and FALSE otherwise.

Internet packets are received with parameters <sIAddr, dIAddr, inLNet, subnetSAddr, dSAddr, shortSAddr, acceptShortcut>, where sIAddr and dIAddr are the source and destination ip addresses, inLNet is the LN over which the ip packet was received, subnetSAddr is the Subnet address of the system that sent the ip packet (the source Subnet address), dSAddr is the Subnet address of the interface over which the packet was received (the destination Subnet address), shortSAddr is the Subnet address in the shortcut header, and acceptShortcut is the value of the acceptShortcut bit in the shortcut header. shortSAddr and acceptShortcut are NULL for ip packets not received over the Subnet, and for ip packets received over the Subnet without a shortcut header.

Internet packets are transmitted with parameters .

Below is a precise description of the shortcut routing algorithm. This is done by listing state conditions that a router may find itself in upon receiving an ip packet.
This is followed by pseudo-code (in C) that describes the actions that should be taken upon various states described by combinations of the conditions.

   Note: This pseudo-code is not meant to suggest an implementation approach. Indeed, it would make for a very poor implementation. The pseudo-code is only meant to provide an unambiguous description of the shortcut routing state machine.

For each of the conditions listed below, let "packet" be a data structure containing the incoming packet parameters. Let "baseEntry" be the forwardingTable entry with the smallest (in the cardinality sense) iAddrSet such that packet.dIAddr is a member of iAddrSet (or NULL if there is no such entry). Let "destShortEntry" be the shortcutTable entry with iAddr = packet.dIAddr (or NULL if there is no such entry). Let "sourceShortEntry" be the shortcutTable entry with iAddr = packet.sIAddr (or NULL if there is no such entry). The routine compareSAddr (condition cI) compares the argument with the system's own Subnet addresses and returns YES if any match. The routine checkConnection(subnetAddr) returns YES if there is a Subnet connection established with address subnetAddr, and NO if there is not (this is for the connection-oriented Subnet case only).
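To make the table lookups above concrete, the following C sketch shows one possible representation. It is an illustration under stated assumptions, not part of the specification: the struct layouts, the fixed-width address types, the helper maskBits, and the name findBaseEntry are all hypothetical, and the tables are passed as plain arrays. Only checkPeer, checkShortcut, and the "smallest iAddrSet" rule for baseEntry come from the text.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SAddr;   /* assumption: a Subnet address fits in 64 bits */
typedef uint32_t IAddr;   /* assumption: a 32-bit ip address */

struct ForwardingEntry {
    IAddr addr, mask;     /* iAddrSet encoded as an address/mask pair */
    SAddr baseSAddr;      /* next hop learned by base routing */
};

struct TrustEntry { SAddr sAddr; IAddr iAddr; };

/* Number of 1 bits in a mask; more 1 bits means a smaller iAddrSet. */
static int maskBits(IAddr mask)
{
    int n = 0;
    while (mask) { n += (int)(mask & 1u); mask >>= 1; }
    return n;
}

/* baseEntry: the matching entry with the smallest iAddrSet, or NULL. */
static const struct ForwardingEntry *
findBaseEntry(const struct ForwardingEntry *table, size_t n, IAddr dIAddr)
{
    const struct ForwardingEntry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dIAddr & table[i].mask) != (table[i].addr & table[i].mask))
            continue;                       /* dIAddr not in this iAddrSet */
        if (best == NULL || maskBits(table[i].mask) > maskBits(best->mask))
            best = &table[i];
    }
    return best;
}

/* checkPeer(sAddr): 1 (TRUE) if sAddr is in the peerTable, else 0 (FALSE). */
static int checkPeer(const SAddr *peerTable, size_t n, SAddr sAddr)
{
    for (size_t i = 0; i < n; i++)
        if (peerTable[i] == sAddr)
            return 1;
    return 0;
}

/* checkShortcut(sAddr,iAddr): 1 if the trustShortcutTable has a matching entry. */
static int checkShortcut(const struct TrustEntry *table, size_t n,
                         SAddr sAddr, IAddr iAddr)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].sAddr == sAddr && table[i].iAddr == iAddr)
            return 1;
    return 0;
}
```

Linear scans are used only to keep the sketch short; as the note above says of the pseudo-code, this is a description of behavior rather than a suggested implementation.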
Condition cA: Incoming ip packet has a shortcut header
   TRUE if (packet.shortSAddr != NULL)
   FALSE if (packet.shortSAddr == NULL)

Condition cB: Incoming ip packet is from a base-peer router
   TRUE if (checkPeer(packet.subnetSAddr) == TRUE)
   FALSE if (checkPeer(packet.subnetSAddr) == FALSE)

Condition cC: Incoming ip packet is from a valid shortcut
   TRUE if (checkShortcut(packet.subnetSAddr,packet.sIAddr) == TRUE)
   FALSE if (checkShortcut(packet.subnetSAddr,packet.sIAddr) == FALSE)

Condition cD: ip packet is forwarded over the same LN it came in on
   TRUE if (packet.inLNet == baseEntry.outLNet)
   FALSE if (packet.inLNet != baseEntry.outLNet)

Condition cE: Next-hop base-routing peer is a shortcut system
   TRUE if (baseEntry.shortcutStatus == SHORTCUT)
   FALSE if (baseEntry.shortcutStatus != SHORTCUT)

Condition cF: A shortcut for the destination ip address is known
   TRUE if (destShortEntry != NULL)
   FALSE if (destShortEntry == NULL)

Condition cG: Router at other end of shortcut is prepared to accept packets sent via shortcuts
   TRUE if (destShortEntry.acceptShortcut == YES)
   FALSE if (destShortEntry.acceptShortcut == NO)

Condition cH: Shortcut address of incoming packet is not a multicast address
   TRUE if (packet.shortSAddr != GROUP_ADDRESS)
   FALSE if (packet.shortSAddr == GROUP_ADDRESS)

Condition cI: Shortcut address of incoming packet matches one of the system's own Subnet addresses
   TRUE if (compareSAddr(packet.shortSAddr) == YES)
   FALSE if (compareSAddr(packet.shortSAddr) == NO)

Condition cJ: Shortcut address matches a base-peer router
   TRUE if (checkPeer(packet.shortSAddr) == TRUE)
   FALSE if (checkPeer(packet.shortSAddr) == FALSE)

Condition cK: A shortcut for the source ip address is known
   TRUE if (sourceShortEntry != NULL)
   FALSE if (sourceShortEntry == NULL)

Condition cL: A connection is established (connection-oriented Subnets only)
   TRUE if (checkConnection(packet.subnetSAddr) == YES)
   FALSE if (checkConnection(packet.subnetSAddr) == NO)

Condition cM: The shortcut-peer can accept shortcuts
   TRUE if (packet.acceptShortcut == YES)
   FALSE if (packet.acceptShortcut == NO)

The pseudo-code of both Figures 1 and 2 is executed upon reception of an ip packet. Figure 1 is for the purpose of learning or refreshing shortcut information, and Figure 2 is for the purpose of forwarding a packet to the appropriate next-hop and setting the shortcut header appropriately. The logic of Figure 2 is predicated on Figure 1 having been executed first.

In the pseudo-code of Figure 1, the routine createShortcutEntry(iAddr) creates an entry in the shortcutTable, sets parameter iAddr, and returns a pointer to that entry. The routine createTrustEntry(iAddr,sAddr) creates an entry in the trustShortcutTable. The routine flushShortcut(iAddr) flushes the shortcutTable entry with matching iAddr, and optionally flushes the trustShortcutTable entries with matching iAddr.

Following is an explanation of the pseudo-code in Figure 1. A shortcut can only be learned if either a packet has come in with a shortcut header (the initial "if" statement), or has come in from a valid shortcut-peer (the "else if" statement).
A system will not initially learn a shortcut (the first "if" statement) unless it comes in with a shortcut header (the cA==T expression) and is being forwarded to another LN (the cD==F expression). Note that two different physical networks are considered to be different LNs. Note also that if the packet is originated by the system (at an upper layer protocol), then the packet is also considered to have come from a different LN. The remaining two expressions (cB==T && cH==T) are checks. The cH==T expression checks to make sure that the shortcut address in the shortcut header (packet.shortSAddr) is not a multicast address.

   shortEntry = sourceShortEntry;   /* NULL when cK==F */

   if (cA==T && cD==F && cB==T && cH==T) {
      if (cI==T) {
         flushShortcut(packet.dIAddr);
      }
      else if (cJ==F) {
         if (cK==F) {
            shortEntry = createShortcutEntry(packet.sIAddr);
            createTrustEntry(packet.sIAddr,packet.shortSAddr);
         }
         if (cL==F && cM==T) {
            establishConnection(packet.shortSAddr);
         }
         shortEntry.shortSAddr = packet.shortSAddr;
         shortEntry.acceptShortcut = packet.acceptShortcut;
         shortEntry.lastRxTime = currentSystemTime;
      }
   }
   else if (cA==F && cD==F && cC==T) {
      if (cK==F) {   /* trust entry outlived the shortcut entry */
         shortEntry = createShortcutEntry(packet.sIAddr);
      }
      shortEntry.shortSAddr = packet.subnetSAddr;
      shortEntry.acceptShortcut = YES;
      shortEntry.lastRxTime = currentSystemTime;
   }

   Figure 1: Logic for Learning or Refreshing Shortcut Information

The cB==T expression is a check to make sure that the packet has come from a base-peer router. If the packet has not come from a base-peer router, the shortcut won't be learned. This check protects against an unknown router sending a bogus shortcut header, thus causing packets to go to an unfriendly router that can then either 1) read the packets before sending them to the appropriate destination, or 2) throw the packets away. The former action would be an undetectable privacy violation, and the latter a denial-of-service.
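The learning logic of Figure 1 can also be sketched as compilable C. This is one hedged rendering, not a suggested implementation: the struct names, the Result type, and the idea of passing the conditions as precomputed flags are assumptions made for illustration. Side effects on other tables (flushing the destination shortcut, creating a trust entry, establishing a connection) are reported back to the caller rather than performed, and the sketch assumes that a packet arriving over a trusted shortcut may create a missing shortcutTable entry, as the trustShortcutTable description allows.

```c
#include <stdint.h>

typedef uint64_t SAddr;   /* assumption: 64-bit Subnet address */
typedef uint32_t IAddr;   /* assumption: 32-bit ip address */

struct Packet {
    IAddr sIAddr, dIAddr;
    SAddr subnetSAddr, shortSAddr;
    int   acceptShortcut;             /* 1 = YES, 0 = NO */
};

/* shortcutTable slot for packet.sIAddr; inUse == 0 corresponds to cK == F. */
struct ShortEntry {
    int   inUse;
    IAddr iAddr;
    SAddr shortSAddr;
    int   acceptShortcut;
    long  lastRxTime;
};

/* Conditions cA..cM, evaluated by the caller (1 = T, 0 = F). */
struct Cond { int cA, cB, cC, cD, cH, cI, cJ, cL, cM; };

enum Action { ACT_NONE, ACT_FLUSH_DEST, ACT_LEARN, ACT_REFRESH };
struct Result { enum Action action; int createTrust; int connect; };

static struct Result learnShortcut(const struct Cond *c, const struct Packet *p,
                                   struct ShortEntry *src, long now)
{
    struct Result r = { ACT_NONE, 0, 0 };

    if (c->cA && !c->cD && c->cB && c->cH) {
        if (c->cI) {
            r.action = ACT_FLUSH_DEST;    /* looping: flush entry for p->dIAddr */
        } else if (!c->cJ) {
            if (!src->inUse) {            /* cK == F: create both entries */
                src->inUse = 1;
                src->iAddr = p->sIAddr;
                r.createTrust = 1;
            }
            if (!c->cL && c->cM)
                r.connect = 1;            /* connection-oriented Subnets only */
            src->shortSAddr = p->shortSAddr;
            src->acceptShortcut = p->acceptShortcut;
            src->lastRxTime = now;
            r.action = ACT_LEARN;
        }
    } else if (!c->cA && !c->cD && c->cC) {
        src->inUse = 1;                   /* assumption: create if missing */
        src->iAddr = p->sIAddr;
        src->shortSAddr = p->subnetSAddr; /* deduced from source Subnet address */
        src->acceptShortcut = 1;
        src->lastRxTime = now;
        r.action = ACT_REFRESH;
    }
    return r;
}
```

A caller would evaluate the conditions for the incoming packet, invoke learnShortcut, and then act on the returned flags before running the forwarding logic.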
Within the body of the initial "if" statement, the shortcut address is checked against the system's own addresses to make sure the packet isn't looping (cI==T). If it is, the shortcut associated with the destination ip address (packet.dIAddr) is flushed, if it exists.

Next, the shortcut address in the shortcut header is checked to make sure that a shortcut to a base-peer is not being learned (cJ==F). It should almost never be necessary to learn shortcuts to base-peers, as the direct path to a base-peer should almost always be preferable to an indirect path. By not learning shortcuts to base-peer routers, shortcuts can be re-learned when base routing finds better paths across the Subnet (see description of code in Figure 2).

   if (cF==T && cG==T && cD==F) {
      packet.dSAddr = destShortEntry.shortSAddr;
      packet.shortSAddr = NULL;
      packet.acceptShortcut = NULL;
      transmit(packet);
   }
   else {
      packet.dSAddr = baseEntry.baseSAddr;
      if (cA==T && cB==T && cD==T && cE==T)
         noOperation();
      else if (cE==F || (cA==T && cB==F)) {
         packet.shortSAddr = NULL;
         packet.acceptShortcut = NULL;
      }
      else if (cD==F) {
         packet.shortSAddr = baseEntry.outIfSAddr;
         if (cF==T) packet.acceptShortcut = YES;
         else packet.acceptShortcut = NO;
      }
      else if (cC==T) {
         packet.shortSAddr = packet.subnetSAddr;
         packet.acceptShortcut = NO;
      }
      transmit(packet);
   }

   Figure 2: Logic for Transmitting a Shortcut Packet

If the shortcut is to a non-base-peer system, a shortcut is created if one doesn't already exist (in the body of the "if (cK==F)" statement). In addition, a trustShortcutTable entry is established for that shortcut (createTrustEntry). This allows packets received directly via the shortcut to refresh (or overwrite) the shortcut entry. Depending on the local policy for flushing trustShortcutTable entries, a trustShortcutTable entry may already exist even though the corresponding shortcutTable entry does not exist. This would be the case where trustShortcutTable entries are not automatically flushed when shortcutTable entries are.

Always flushing both entries at the same time could result in the following scenario: Assume two routers A and B have shortcuts to each other. Router A times out its shortcutTable and trustShortcutTable entry before B. A packet arrives at B and is sent via the shortcut to A. A would not relearn the shortcut at this time because it does not trust B. (Note that A will correctly forward the packet, however.) Therefore, a subsequent packet from A to B will take the long path. Furthermore, this packet would contain a shortcut header with the acceptShortcut bit set to NO (0). This would cause the subsequent packet from B to A to also take the long path. (The packets after these would take the shortcut.)

In some circumstances, this scenario would be considered desirable behavior. This would be the case where the purpose of flushing the shortcut was to retry the long path in case base routing has changed. In the case where the shortcut was flushed simply because no more packets using that shortcut were expected, then the above scenario would be undesirable.

After this, a check (cL==F && cM==T) is made to see if a connection should be established with the shortcut-peer. If there is no connection (cL==F) and the shortcut-peer is prepared to accept shortcuts (cM==T), then the connection is established. Naturally, this bit of code only applies to systems running shortcut routing over connection-oriented Subnets.

With this connection establishment logic, connections are only established after both ends of the shortcut are willing to accept shortcuts. This has the effect of causing the system that first transmitted a packet to first attempt establishment of the connection.
It is normally appropriate for the system that first has data to send to also establish the connection.

Note that this logic can result in both ends trying to set up the connection. This would occur in the case where, for instance, system A was still in the process of establishing a connection to system B when an ip packet was sent over the long path from A to B. In this case, the acceptShortcutBit would be set, and, assuming that A's connection was not yet established, B would start to establish a connection with A.

This behavior can be avoided in the following way. When the line of code indicating that a connection should be established is reached, the following check is made: If a shortcutTable entry for this shortcut exists, and the acceptShortcutBit in the entry is set to "NO", then delay establishing a connection for time period DELAYCONNECTION. The constant DELAYCONNECTION should be long enough for the opposite end to have completely established a connection with high probability.

Note that normally the above condition will exist only for the system that was initially the exit system for a flow of ip packets. When the initial entry system first reaches the "establishConnection(packet.shortSAddr)" line of code, it will only have just created the shortcutTable entry.

   Temporary Note: If people think that it is important, we can add this check to the main code description, thus making it mandatory for all shortcut systems. Also, by adding a bit to the shortcut header, we can have a kind of negotiation about which end should set up the connection. I think this is overkill, but it ought to be discussed.

A system may optionally accept connection-requests only from trusted shortcut-peer systems.

In the remainder of the body of the initial "if" statement, the shortSAddr and acceptShortcut parameters are filled in, and the lastRxTime is set to the current system time.
This is used to later flush the shortcut entry.

The body of the "else if" statement of Figure 1 is executed when a packet is received from a trusted shortcut-peer. Packets traveling via a shortcut never have a shortcut header, as is indicated by the first expression in the "else if" statement (cA==F). As with the initial "if" statement, the cD==F expression ensures that the packet is leaving the LN. The third expression (cC==T) ensures that the packet came in via a trusted shortcut. If it did not, the shortcut will not be learned (although the packet will still be forwarded, if possible). Note that it is not necessary to check whether the source Subnet address is a multicast address at this point, because the trustShortcutTable entry would not have been made with a multicast address.

Although there is no shortcut header on packets that travel via shortcuts, the information that would otherwise have been in the shortcut header is deduced, and the shortcutTable entry is updated accordingly. The shortcut address shortSAddr is the same as the source Subnet address subnetSAddr, and the acceptShortcut bit is assumed to be YES. In addition to setting these two values, the lastRxTime is refreshed.

It is possible that the shortSAddr associated with a given iAddr in the shortcutTable can be modified from its previous value in the body of either the "if" or the "else if" of Figure 1. An example of where this would occur is the case where a stream of packets from source host A to destination host B is being split between two Subnet-attached routers X and Y. Assume that the packets enter host B's domain via Subnet-attached router Z. Assume that the first packet Z receives is via X. This will come via the long path, and Z will set up a shortcutTable entry and a trustShortcutTable entry for X. Later, Z will receive a packet via a long path from Y. Z will overwrite the shortcutTable entry with Y, and will create a new trustShortcutTable entry, this time for Y. (Note that a trustShortcutTable entry is not flushed or overwritten when the shortcutTable entry is refreshed or overwritten.)

At this point, packets will arrive at Z from X and Y via shortcuts. Since Z has trustShortcutTable entries for both X and Y, it will accept packets from both shortcuts. Each time Z receives a packet, it will overwrite its shortcutTable entry with the latest return-path shortcut information. As a result, packets from B to A will sometimes go out via X and sometimes via Y. This is desirable behavior for the path splitting scenario.

Following is an explanation of the pseudo-code in Figure 2. The outgoing packet is understood to have the same packet parameters as the incoming packet (including those of the shortcut header), except where otherwise noted in the code, and except for the source Subnet address, which is understood to be that of the transmitting system.

The body of the first "if" statement is executed if the packet is to take a shortcut. This will occur if a shortcut is known (cF==T), and if the system on the receiving side of the shortcut is willing to accept packets arriving via the shortcut (cG==T). The shortcut will not be taken unless the packet is being forwarded over a different LN from the one it came in on (cD==F). This prevents loops from forming within an LN as a result of changes in base routing while a shortcut entry does not change.

Within the body of the first "if" statement, we see that the destination Subnet address is set to that of the shortcutTable entry (packet.dSAddr = shortEntry.shortSAddr). Note that the shortcut header is not included in the outgoing packet.

If the packet is not forwarded via a shortcut (the initial "if" statement), then it is forwarded via base routing (the "else" statement associated with the initial "if").
This can be seen from the first statement within the body of the "else" (packet.dSAddr = baseEntry.baseSAddr) where the destination Subnet address is set to that in the forwardingTable.

Within the body of the else, there are four possible actions: 1) the shortcut header of the incoming packet is passed untouched, 2) the packet is forwarded without a shortcut header, 3) the packet is forwarded with a shortcut header containing the transmitting system's Subnet address, and 4) the packet is forwarded with a shortcut header containing the last-hop system's Subnet address. These four actions correspond to the bodies of the four "if"/"else if" statements.

If the packet arrived with a shortcut header and from a base-peer router (cA==T && cB==T), and if the packet is being forwarded over the same LN and is going to another shortcut system (cD==T && cE==T), then the incoming shortcut header is passed untouched so that a subsequent system on the path can learn the appropriate shortcut. Note that nothing actually happens in the body of this first "if" statement (as signified by the "noOperation()" line of code). This is because the destination Subnet address has already been set, and nothing else in the headers (except the source Subnet address, whose setting is understood) needs to change.

The packet is transmitted without a shortcut header (second body) if the recipient is not a shortcut system (cE==F). This is simply because non-shortcut systems are not able to parse a shortcut header and would therefore discard the packet. Note that if a system is a non-shortcut system but still considered part of the same LN, then no shortcuts can be used by any system on a (base-routing) path involving the non-shortcut system.
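The transmit logic of Figure 2 can be sketched in Python as follows. This is only an illustration, not part of the protocol definition: the names Conditions, Packet, and prepare_transmit are hypothetical, the one-line meanings attached to cA through cG are paraphrased from the surrounding text, and the actual transmit(packet) call is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conditions:
    """Boolean conditions of Figure 2 (meanings paraphrased from the text)."""
    cA: bool  # packet arrived with a shortcut header
    cB: bool  # packet arrived from a base-peer
    cC: bool  # packet arrived from a trusted shortcut-peer
    cD: bool  # packet is forwarded over the same LN it came in on
    cE: bool  # next-hop system is a shortcut system
    cF: bool  # a shortcut for the destination is known
    cG: bool  # the shortcut-peer accepts packets via the shortcut


@dataclass
class Packet:
    subnet_saddr: str                       # source Subnet address as received
    d_saddr: Optional[str] = None           # destination Subnet address
    short_saddr: Optional[str] = None       # shortcut header address (None = no header)
    accept_shortcut: Optional[bool] = None  # acceptShortcut bit (None = no header)


def prepare_transmit(packet: Packet, c: Conditions, base_saddr: str,
                     out_if_saddr: str,
                     shortcut_saddr: Optional[str]) -> Packet:
    """Fill in the outgoing fields exactly as the branches of Figure 2 do."""
    if c.cF and c.cG and not c.cD:
        # Take the shortcut; no shortcut header is sent.
        packet.d_saddr = shortcut_saddr
        packet.short_saddr = None
        packet.accept_shortcut = None
        return packet

    packet.d_saddr = base_saddr             # forward via base routing
    if c.cA and c.cB and c.cD and c.cE:
        pass                                # 1) pass the header untouched
    elif (not c.cE) or (c.cA and not c.cB):
        packet.short_saddr = None           # 2) strip the header
        packet.accept_shortcut = None
    elif not c.cD:
        packet.short_saddr = out_if_saddr   # 3) entry system: own address
        packet.accept_shortcut = c.cF
    elif c.cC:
        packet.short_saddr = packet.subnet_saddr  # 4) on behalf of last hop
        packet.accept_shortcut = False
    return packet
```

For example, an entry system that does not yet know a shortcut (cD, cF, and cG false, cE true) forwards via base routing with its own Subnet address in the shortcut header and the acceptShortcut bit set to NO.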
The packet is also transmitted without a shortcut header (second body) if the packet is received from a non-base-peer system but has a shortcut header (cA==T && cB==F). This is an error condition, because shortcuts should only be learned via base routing paths (for the privacy reasons discussed above). If the shortcut header were allowed to be passed on untouched, then the chain of trust relationships inherent among base-peer routers would be broken, and the privacy mechanism would be defeated. A reasonable action to take as a result of this condition would be to send an error message to the system that sent the packet (the source Subnet address), but no such message is currently defined by this RFC. The system may optionally discard the packet as a result of this error condition.

The system's own shortcut header is attached to the packet (third body, cD==F) if the packet is being sent over a different LN from the one on which it was received. In other words, the system is an entry system into the LN, and includes the shortcut header so that the exit system can learn the return path. If the shortcut is known (cF==T), then the acceptShortcut bit is set to YES. Otherwise, the acceptShortcut bit is set to NO. This will cause the exit system to return packets via base routing, thus allowing the local system to learn the shortcut.

If the third body (or the first two) is not executed, this means that the packet is being forwarded over the same LN that it came in on. If the packet is from a shortcut-peer router (fourth body, cC==T), then a shortcut header is composed, but with the Subnet address of the last-hop system as the shortcut address (packet.shortSAddr = packet.subnetSAddr). The reason behind this action is that the packet should not be re-routed onto the same LN if it previously took a shortcut. Shortcuts should always be from entry to exit system.
If a shortcut was taken on the previous hop, then base routing must have changed since the shortcut was established. If so, it is necessary to find a new shortcut. However, the entry (last-hop) system does not know this, and so a shortcut header is added on behalf of the entry system. It is as though the entry system had attached its own shortcut header. The acceptShortcut bit is set to NO.

5.0 OTHER ASPECTS OF SHORTCUT ROUTING

The following sections describe various other aspects of shortcut routing.

5.1 Version Negotiation

The shortcut header has two version numbers, one for entry systems and one for transit systems. The reason for having two version numbers is to be able to manage enhancements to either entry system operation or transit system operation. For instance, a future enhancement to shortcut routing might change the behavior of entry and exit systems with respect to each other, but not change the behavior of transit systems. With version number negotiation, one expects a system to put its highest version number in the version number field. However, if both the entry and exit systems had a higher version number than one of the transit systems, and there were only one version number field, the transit system could lower the version number of a shortcut header even though both the entry and exit systems could handle the higher version. To prevent this occurrence, the shortcut header has two version number fields.

An entry system always sets the transit version number (tranVer) to "0" upon transmission. For this version of shortcut routing, the entry system sets the entVer field to value 1. Transit systems set the tranVer field to value 1.

In future versions, it is expected that a kind of version negotiation will take place where entry systems initially set the entVer to their highest version number.
It is expected that transit systems will set the tranVer version number to their highest version number if the tranVer field is 0 or if the tranVer field has a version number higher than the transit system's highest version number. It is expected that entry systems will monitor the entVer and potentially the tranVer in received shortcut headers, and choose the appropriate version to operate at. It is expected that transit systems will monitor the tranVer and potentially the entVer in received shortcut headers, and choose the appropriate version to operate at.

Since there is currently only one version of shortcut routing, it is not known exactly how future version negotiation will take place. Version 1 shortcut systems should ignore both the entVer and the tranVer fields upon reception.

5.2 Configuring LNs

In the context of shortcut routing, LNs behave like physically distinct networks in the sense that shortcuts cannot be used between two LNs. The purpose of this feature is to allow groups of systems to ensure that their traffic will go through certain routers, and not bypass those routers due to shortcuts being found.

The most common use of this is for a private enterprise, where a set of private hosts and routers use the Subnet to access each other, but wish to be well-partitioned from the rest of the world via one or a small number of border routers. If the Subnet offers Subnet-address level filtering capability, as SMDS does, these private hosts and routers might use Subnet address filtering to prevent receiving Subnet packets from any but their own systems. All interdomain traffic would go through the border routers. While these systems will not want to find shortcuts to systems outside the enterprise, they may want to find shortcuts to the border router, and the border router may want to find shortcuts across the Subnet to other systems.
To identify which LN a received packet came from, a system can use a list of Subnet addresses, a list of ip address/mask pairs, or separate interfaces (with separate Subnet addresses) to the Subnet. For instance, in the case of a private enterprise network that does not transit traffic, it will be possible to identify all packets coming from or going to the private enterprise system as having a source or destination ip address that comes from one or a small number of ip address prefixes.

From a configuration standpoint, a simpler way to make the determination is for a router to have multiple Subnet interfaces, and correspondingly multiple Subnet addresses, one for each LN that the router belongs to. Systems in each LN would use the appropriate Subnet address when sending packets to the router.

5.3 Flushing Old Shortcut Entries

Whenever a shortcut packet is received, the time of reception is recorded (lastRxTime). A shortcut entry should be flushed whenever the current system time exceeds the lastRxTime by more than a certain amount, SHORTCUT_LIFETIME. SHORTCUT_LIFETIME should be relatively small, approximately 15 seconds. This small setting allows many black holes or loops to be detected quickly.

While it is not absolutely necessary from the perspective of correct operation, systems should flush the trustShortcutTable entries corresponding to the flushed shortcutTable entry (that is, those with matching iAddr). The advantage of not flushing a trustShortcutTable entry when flushing a shortcutTable entry is small (see discussion in shortcut routing algorithm description above).

It is not necessary to set a timer each time a shortcut entry is refreshed. Either a periodic background function with period SHORTCUT_LIFETIME can flush out all old entries, or lastRxTime can be checked upon looking up a shortcut entry for transmission (Figure 1).
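The two timer-free flushing strategies just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class ShortcutTable and its method names are hypothetical, the table is modeled as a dictionary keyed by iAddr, and the current time is passed in explicitly rather than read from a system clock. The matching trustShortcutTable entries are not modeled.

```python
SHORTCUT_LIFETIME = 15.0  # seconds; the approximate value suggested in the text


class ShortcutTable:
    """Hypothetical shortcutTable: iAddr -> (shortSAddr, lastRxTime)."""

    def __init__(self, lifetime=SHORTCUT_LIFETIME):
        self.lifetime = lifetime
        self.entries = {}

    def refresh(self, i_addr, short_saddr, now):
        # Record the shortcut and the time of reception (lastRxTime).
        # No per-entry timer is started.
        self.entries[i_addr] = (short_saddr, now)

    def lookup(self, i_addr, now):
        # Strategy 1: check lastRxTime when the entry is looked up for
        # transmission, and flush the entry if it has expired.
        entry = self.entries.get(i_addr)
        if entry is None:
            return None
        short_saddr, last_rx_time = entry
        if now - last_rx_time > self.lifetime:
            del self.entries[i_addr]
            return None
        return short_saddr

    def sweep(self, now):
        # Strategy 2: a periodic background pass that flushes all old
        # entries. Returns the flushed iAddr values so that a caller could
        # also flush the matching trustShortcutTable entries.
        stale = [i_addr for i_addr, (_, last_rx_time) in self.entries.items()
                 if now - last_rx_time > self.lifetime]
        for i_addr in stale:
            del self.entries[i_addr]
        return stale
```

Either strategy alone is sufficient for correctness in this sketch; the lazy check costs one comparison per lookup, while the sweep keeps memory bounded for destinations that are never looked up again.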
A system may also optionally wish to flush a shortcut when base routing detects that the path to a destination that has a shortcut in effect has changed. Note that different routing algorithms have differing amounts of knowledge about the path to the destination. For instance, simple distance-vector routing algorithms know only the next hop to the destination, while link-state routing algorithms may know the entire path.

5.4 Looping in Shortcut Routing

Even though base routing may be loop-free, it is possible for loops to form when shortcut routing is used in conjunction with base routing. This is because part of a path from source to destination may be routed using information from base routing, and part from shortcut routing. If the base routing information from which the shortcut route was derived is no longer valid, a loop may form.

These loops, however, can only form if part of the looping path goes outside of a Subnet LN. In other words, purely intra-LN loops cannot form. This is because all ip packets received from an LN and forwarded back over the same LN are routed using base routing. To prove that intra-LN loops cannot form, assume that an intra-LN loop does exist, where R1 is the entry router. Only R1 will route the packet using shortcut routing information. The looping path must eventually come back to R1, where it will be routed using base routing information (rather than shortcut routing information, as it was upon entering the LN). The remainder of the path will follow base routing and will therefore not loop.

All non-intra-LN looping can be eliminated if the following two restrictions are placed on the logical (peer routers) topology:

   1. Assume two routers R1 and R2 connected to the same LN X. If there exists a base-routing path between R1 and R2 that exits R1(R2) via a non-LN X interface and enters R2(R1) via a non-LN X interface, then R1 and R2 must be peer routers.

   2. A system must never establish a shortcut to a routing peer.

The reason that these restrictions eliminate loops is as follows. Since we have eliminated the possibility of intra-LN looping, one segment of the path (say R1->R2) must directly cross the LN, and the R2->R1 segment must exit R2 on a non-LN link, and enter R1 on a non-LN link. (This segment may in fact enter and exit the LN multiple times. This does not affect the argument.) By the first restriction, however, R1 and R2 must be routing peers if such a path exists. And by the second restriction, if R1 and R2 are routing peers, then R1 would not ever establish a shortcut to R2. If there is a loop, R1 would not send the packet to R2 because of base routing. Therefore, a loop cannot form if the above restrictions are followed.

This restriction does not necessarily result in a large number of router peers for most routers. Most networks that connect to the Subnet are stub networks. These do not need to set up peer router relationships for the sake of preventing loops. Backbones need to establish peer router relationships with the routers of other backbones, but backbones would normally want to do this anyway, due to the volume of traffic that passes between them (and the desire not to incur the overhead of shortcut routing on this traffic). And the number of backbones is small relative to the number of stubs.

Some networks may be predominantly stub, but may transit traffic for a small number of other networks. Assuming that routing algorithms always prefer internal paths over external paths, two such stubs A and B only need to establish router peers if they both transit traffic for the same third network C.
Therefore, even most predominantly stub networks do not need to establish router peers with each other.

Of course, it will not always be possible to ensure that the above restriction holds, and therefore loops will occasionally occur. In addition, black holes may sometimes form, because a previously valid path used by a shortcut becomes invalid. In these cases, the shortcut will be flushed fairly quickly through the reverse path refresh mechanisms, thus eliminating the loop or black hole.

5.5 Interaction with Non-Shortcut Routing Systems

An ip system running shortcut routing may need to exchange ip packets with an ip system not running shortcut routing. A system may not run shortcut routing either because it predates this RFC, or because it simply chooses not to follow shortcuts. These systems are called non-shortcut systems.

A non-shortcut system will not be able to receive ip packets with a shortcut header because the non-shortcut system will not be able to parse the shortcut header and will therefore throw away the packet. A shortcut system must always be able to receive ip packets both with and without the shortcut header.

The forwardingTable states whether or not a system is a shortcut system (shortcutStatus). However, before a system can set the shortcutStatus for a base-peer system in the forwardingTable properly, it must know whether or not the base-peer system is a shortcut system. This section specifies how to determine the shortcutStatus of another system.

All systems not reachable over the Subnet are non-shortcut systems. The shortcutStatus of systems reachable over the Subnet can be determined in one of two ways: static configuration and dynamic configuration.

With dynamic configuration, a system has no a priori indication of whether or not the base-peer system is a shortcut system. The system does however have a Subnet address to which it can send ARP requests that the base-peer system will receive.
The Subnet address may be a multicast address that the base-peer system belongs to, or may be the private address of the base-peer system.

A system Sa can determine absolutely that another system Sb is a shortcut system by the reception of any packet from Sb with a shortcut header. Sa can trigger a packet from Sb by sending an ARP request to Sb. If a shortcut system Sb receives an ARP request with a shortcut header for itself (i.e., the ARP target protocol address ar$tpa), then Sb responds with an ARP reply containing a shortcut header. Likewise, if Sb receives an ARP request without a shortcut header for itself, then Sb responds with an ARP reply without a shortcut header.

By sending the appropriate ARP requests, Sa can dynamically determine the shortcutStatus of Sb. While any algorithm can be used by Sa to do this, the following algorithm is recommended.

The algorithm for dynamic configuration can either be executed at the time an ip packet arrives, or sometime before an ip packet arrives. In addition to the shortcutStatus variable in the forwardingTable, Sa maintains the following variables for Sb (and for every base-peer system for which Sa maintains shortcutStatus).

   shTimer: This is a short timer (roughly one second) that is used to clock out ARP requests.

   shCount: This counts the number of ARP requests.

Sa initially sets shortcutStatus to UNKNOWN, and sets shCount to 0. It sends an ARP request to Sb with the shortcut header attached, and starts shTimer. If Sa receives any packet from Sb with a shortcut header, it sets shortcutStatus to SHORTCUT. (This happens if the previous value of shortcutStatus was either UNKNOWN or NONSHORTCUT.) If the shTimer expires and shortcutStatus is equal to SHORTCUT, Sa does nothing. If the shTimer expires and shortcutStatus is equal to UNKNOWN, then Sa increments shCount. If shCount is equal to MAXSHCOUNT, then Sa sets shortcutStatus to NONSHORTCUT.
(MAXSHCOUNT should be a small integer, perhaps 3.) If shCount is not equal to MAXSHCOUNT, then Sa sends another ARP request with a shortcut header, and restarts the shTimer.

Note that if all of the MAXSHCOUNT ARP requests sent from Sa to Sb and those sent from Sb to Sa (if both sides try to establish each other's status) are dropped by the Subnet, then Sa and Sb will each conclude that the other is a non-shortcut system. From this time on, they will send each other packets without the shortcut header, and so will never learn that the other system is in fact a shortcut system. Therefore, each system should periodically (perhaps every 24 hours or so) set the shortcutStatus of NONSHORTCUT base-peer systems to UNKNOWN and execute the above algorithm again.

This algorithm makes less sense for connection-oriented Subnets, because the first packet will be delivered with very high probability, and so a retransmission of the ARP request is not necessary. Nonetheless, the algorithm works for both connectionless and connection-oriented Subnets.

This algorithm discovers the status of shortcut systems quickly, but takes longer to discover the status of non-shortcut systems (because several shortcut headers will be sent before the status is determined to be NONSHORTCUT). In addition, if the ARP packets are being sent to a multicast address, for instance because the discovering system is attempting to learn the Subnet address as well as the shortcutStatus of the remote system, then several multicasts will be sent before discovering the status of a non-shortcut system.

This latency and overhead (particularly in the multicast case) can be avoided by sending two ARP requests right away, one with a shortcut header and one without. The system will reply to both if it is a shortcut system, but only to the ARP request without a shortcut header if it is not.
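The recommended discovery algorithm of this section can be sketched as a small event-driven state machine in Python. This is a sketch under stated assumptions, not a definitive implementation: the class PeerProbe, its method names, and the send_arp callback are hypothetical, and the shTimer is modeled by the caller invoking on_timer_expired() rather than by a real timer. The periodic (roughly 24-hour) retry of NONSHORTCUT peers is modeled by calling start() again.

```python
from enum import Enum


class Status(Enum):
    UNKNOWN = "UNKNOWN"
    SHORTCUT = "SHORTCUT"
    NONSHORTCUT = "NONSHORTCUT"


MAXSHCOUNT = 3  # "a small integer, perhaps 3"


class PeerProbe:
    """Per-base-peer state kept by Sa: shortcutStatus and shCount."""

    def __init__(self, send_arp, max_count=MAXSHCOUNT):
        # send_arp is a hypothetical callback that transmits one ARP
        # request to Sb with a shortcut header attached.
        self.send_arp = send_arp
        self.max_count = max_count
        self.status = Status.UNKNOWN
        self.sh_count = 0

    def start(self):
        # Initial state, also used for the periodic retry of
        # NONSHORTCUT peers.
        self.status = Status.UNKNOWN
        self.sh_count = 0
        self.send_arp()  # the caller starts shTimer after this returns

    def on_packet(self, has_shortcut_header):
        # Any packet from Sb carrying a shortcut header proves that Sb is
        # a shortcut system, whatever the previous status was.
        if has_shortcut_header:
            self.status = Status.SHORTCUT

    def on_timer_expired(self):
        # shTimer expiry: only the UNKNOWN state requires any action.
        if self.status is not Status.UNKNOWN:
            return
        self.sh_count += 1
        if self.sh_count == self.max_count:
            self.status = Status.NONSHORTCUT
        else:
            self.send_arp()  # the caller restarts shTimer
```

With MAXSHCOUNT set to 3, a silent peer is declared NONSHORTCUT on the third timer expiry, after exactly three ARP requests have been sent.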
5.6 Interaction with Subnet Address Filtering

Subnet address filtering is a service whereby the Subnet does not deliver packets whose source Subnet address is one of a preconfigured list. SMDS, for instance, offers this service.

Because of the interaction between shortcut routing and Subnet address filtering, it is possible for shortcut routing to result in a black hole even though base routing can successfully find a path. For instance, consider three systems, Sa, St, and Sb. Sa and Sb are the entry and exit systems, and St is the transit router discovered by base routing. Sa is using the Subnet address filtering feature to filter out packets received from Sb. When Sa sends its first packet to Sb via St, Sb will learn the Subnet address of Sa using shortcut routing. When Sb sends return packets directly to Sa, however, those packets will be filtered and Sa will not receive them.

Before discussing ways to avoid this behavior, it is worth pointing out that this behavior may in some cases be desirable. If Sa's filtering policy is such that it does not want to receive packets from Sb, either directly or indirectly, then this behavior is appropriate from Sa's perspective. Except for the packets sent before Sb acquired the shortcut, Sa indeed does not receive packets from Sb.

If Sa's filtering policy is such that it does not want to receive packets directly from Sb, but is willing to receive packets indirectly from Sb, then this behavior is inappropriate. But under what circumstances would Sa have such a filtering policy? One possible reason is that Sa depends on a small number of transit systems to do ip level filtering for it, and therefore only accepts packets from those few transit systems. If this is the case, however, Sa should not be advertising itself in shortcut headers at all, thus avoiding the black hole.
On the other hand, if Sa wants to receive some of its traffic via transit systems, and other traffic directly, then Sa should be selectively transmitting shortcut headers based on destination ip address. In other words, Sa is generally responsible for transmitting or not transmitting shortcut headers in accordance with its filtering policy.

This having been said, there is a way that the problem can be avoided, although it is not attractive and therefore probably not worth doing. A system can always ping (using either ICMP Ping or ARP) another system to ensure that it can speak directly with that system. If the shortcut-peer system does not respond, then base routing can be used.

It is not efficient to do this every time a shortcut is discovered. Shortcut routing is designed to discover shortcuts very efficiently (it requires no control packets, for instance), so that shortcuts can be timed out quickly, thus minimizing black holes or loops. To send out a ping every time a shortcut is discovered would defeat this design feature.

Alternatively, a system could send out the ping only when it suspected the shortcut-peer system of not receiving its packets. This occurs when the system notices that the shortcut-peer system persists in using the base routing path even though it should have learned the shortcut. This approach, however, requires extra state and extra processing.

Given that the address filtering black hole problem can be avoided by proper configuration of filters, and proper coordination between Subnet-level filtering, ip-level filtering, and the transmission of shortcut headers, it is not necessary to specify an algorithm for avoiding the black hole problem in this RFC.

5.7 Overhead of Shortcut Routing

Shortcut routing incurs processing, memory, and link overhead.

The link overhead primarily comes from the shortcut header, which exists only in packets taking the base routing path across the Subnet.
The large majority of packets will not have a shortcut header attached.

The memory overhead cannot be absolutely predicted, because shortcut routing is based on caching information. The memory overhead will depend on how much information is cached, which depends on the traffic spread and the timeout period for shortcut entries (that is, how many destinations are being handled by the router during the timeout period).

The memory overhead is the sum of the number of entries in the shortcutTable, trustShortcutTable, and peerTable. Although the information in the peerTable already needs to be stored in the context of the base routing algorithm, an extra structure is needed to do lookups based on the Subnet addresses of base-peer routers, which is a function not normally required in routers. Nonetheless, the extra memory overhead for the peerTable is always just a percentage of the base routing information normally needed, and so can be disregarded here.

The number of entries in the shortcutTable is proportional to the number of distinct destinations reachable over each LN being handled at a given time. The number of entries in the trustShortcutTable for a router Y is proportional to the number of distinct systems not reachable over a given LN that all other LN-attached routers are using router Y to reach at a given time. The only time the trustShortcutTable is bigger than the shortcutTable is when a single system on one of the router's LNs is exchanging packets with multiple systems on another of the router's LNs.

Note that the memory required for a shortcut system can be bounded without preventing packet delivery. If there is not room in the shortcutTable or trustShortcutTable for a new entry, then the end result is that a base routing path is taken. In other words, communication takes place, albeit less efficiently than it otherwise might.
Therefore, it is not necessary to size the memory for the absolute worst case load (although a router should have enough memory to handle the large majority of shortcuts).

The processing overhead of concern with shortcut routing is the extra per-packet processing necessary to act on the shortcut information, either explicit in the shortcut header or implied. Of the several functions (such as header parsing) that contribute to processing overhead, the worst is generally any table lookup function. Shortcut routing requires table lookups based on the following parameters:

   1. The destination ip address (for the forwardingTable and shortcutTable, which can be combined into one lookup).

   2. The source ip address (for the shortcutTable lookup).

   3. The source ip address combined with the source Subnet address (for the checkShortcut operation).

   4. The source Subnet address (for the checkPeer operation).

Current routers already must do a lookup based on destination ip address (the first item above). There are two differences in this lookup, however, when executed by a shortcut system, one negative and one positive. On the negative side, there will be more entries to look up because of the cached shortcuts. In the context of a search tree, this means a deeper search. However, since the depth of a search tree grows as log N, where N is the number of entries, this is not terribly bad. On the positive side, all shortcut entries can be searched using a hash algorithm, since there is no masking associated with shortcut entries. (Hash lookups run less efficiently with masks, since each mask must be hashed in turn.) Hashing generally is faster than a search tree, and indeed many routers dynamically cache fully-masked ip addresses so that they can use a hash lookup, falling back on the search tree only in the case of misses.
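The hash-first lookup with fallback described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class ForwardingLookup and its method names are hypothetical, a dictionary stands in for the hash table of unmasked shortcut entries, and a linear longest-prefix scan stands in for the search tree that a real router would use for masked base-routing entries.

```python
import ipaddress


class ForwardingLookup:
    """Destination lookup: exact-match hash first, masked search on a miss."""

    def __init__(self):
        self.shortcuts = {}    # exact ip address -> shortcut Subnet address
        self.base_routes = []  # (ip network, base-peer Subnet address)

    def add_shortcut(self, addr, short_saddr):
        # Shortcut entries carry no mask, so a plain hash key suffices.
        self.shortcuts[ipaddress.ip_address(addr)] = short_saddr

    def add_base_route(self, prefix, base_saddr):
        self.base_routes.append((ipaddress.ip_network(prefix), base_saddr))

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        # Fast path: a single hash probe on the full address.
        hit = self.shortcuts.get(ip)
        if hit is not None:
            return ("shortcut", hit)
        # Miss: fall back to the masked entries, longest prefix wins.
        best = None
        for network, base_saddr in self.base_routes:
            if ip in network and (best is None
                                  or network.prefixlen > best[0].prefixlen):
                best = (network, base_saddr)
        if best is None:
            return None
        return ("base", best[1])
```

Only destinations without a cached shortcut pay the cost of the masked search in this sketch, which mirrors the caching behavior described for existing routers.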
Unless a router is doing source ip address filtering, the source ip address based lookup (second item above) is additional overhead beyond what a router normally does. A hash lookup can be used for this. Alternatively, one could make the source ip address search a sub-search under the destination ip address search. Often (if not usually) there will be a one-to-one correspondence between source and destination ip address. If, however, there are many sources talking to a single destination through the shortcut router (which will be the case for systems such as public ftp sites), then this lookup structure can be slow.

By establishing pointers from the shortcutTable to the trustShortcutTable, most, and usually all, of the work required for the combined source ip address/source Subnet address lookup (third item above) can be accomplished by the source ip address based lookup. This is because there will almost always be a one-to-one correspondence between a source ip address and the source Subnet address that packets with that source ip address come through. The exception is multi-path routing, in which case there will be only a small number (two or three) of source Subnet addresses for a given source ip address.

The source Subnet address lookup (checkPeer, fourth item above) is a function that non-shortcut routers do not have to do, so it is entirely extra overhead. In most but not all cases, the number of base-peer routers will be small. Fortunately, a hash lookup will work for this lookup as well.

Clearly the extra per-packet processing overhead of shortcut routing is significant, although not prohibitive.
We feel, however, that it is still better than the alternatives: 1) accepting extra hops across the Subnet, 2) maintaining routing information for all other routers on the Subnet, or 3) an alternative shortcut approach where some kind of "shortcut search" packet (similar to an ARP request) is used to discover a shortcut instead of embedding the shortcut address in every packet.

REFERENCES

[1] Piscitello, D., Lawrence, J., "The Transmission of IP Datagrams over the SMDS Service", RFC 1209, USC/Information Sciences Institute, March 1991.

[2] Postel, J.B., "Internet Control Message Protocol", RFC 792, September 1981.

[3] Plummer, D., "An Ethernet Address Resolution Protocol - or - Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware", RFC 826, November 1982.

[4] IEEE, "Draft Standard P802.1A--Overview and Architecture", 1989.

[5] Reynolds, J.K., Postel, J., "Assigned Numbers", RFC 1060, USC/Information Sciences Institute, March 1990.

[6] Institute of Electrical & Electronic Engineers, Inc., IEEE Standard 802.6, "Distributed Queue Dual Bus (DQDB) Subnetwork of a Metropolitan Area Network (MAN) Standard", December 1990.

[7] Bradley, T., Brown, C., "Multiprotocol Interconnect over Frame Relay", RFC 1294, January 1992.

[8] Kantor, B., "Internet Protocol Encapsulation of AX.25 Frames", RFC 1226, May 1991.

[9] Malis, A.G., Robinson, D., Ullmann, R.L., "Multiprotocol Interconnect on X.25 and ISDN in the Packet Mode", Internet Draft, April 6, 1992.

[10] Piscitello, D., Tsuchiya, P., "A Proposed Standard for the Transmission of OSI CLNP Datagrams over SMDS", Internet Draft, April 22, 1992.

[11] Postel, J.B., "Internet Protocol", RFC 791, September 1981.

[12] Information technology - Telecommunications and Information Exchange Between Systems, "Protocol Identification in the Network Layer", ISO/IEC TR 9577, October 1990.
[13] Heinanen, J., "Multiprotocol Interconnect over ATM", Internet-Draft.

Security Considerations

Limited security mechanisms have been incorporated into this document. In particular, the use of the acceptShortcut bit allows shortcut systems to learn of shortcuts only through a trusted chain of routers.

Author's Address

   Paul Tsuchiya
   Bellcore
   435 South St. 2L-281
   Morristown, NJ 07960

   Phone: (201) 829-4484
   email: tsuchiya@thumper.bellcore.com