zerouter BoF                                                  R. Perlman
Internet-Draft                                          Sun Microsystems
Expires: December 12, 2003                                   A. Williams
                                                                Motorola
                                                           June 13, 2003


                      Design for a Routing Bridge
                 draft-perlman-zerouter-rbridge-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 12, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This design provides the ability to have an entire campus, with
   multiple physical links, look to IP like a single subnet.  This
   capability is often provided today with bridges.  Bridges have the
   advantage of being plug-and-play.  However, they have disadvantages:
   routing is confined to a spanning tree, the header on which the
   spanning tree forwards has no hop count, spanning tree forwarding in
   the presence of loops spawns exponential copies of packets, nodes can
   have only a single point of attachment, and the spanning tree, in
   order to avoid temporary loops, is slow to start forwarding on new
   ports.  The design in this paper avoids those disadvantages of


Perlman & Williams      Expires December 12, 2003               [Page 1]

Internet-Draft               Routing Bridge                    June 2003


   bridges.  The basic design is layer 3-independent, and is a design
   for bridging with a shortest-path routing algorithm (instead of
   spanning tree paths), and with more robust forwarding.  Then the
   design is extended to provide IP-specific optimizations.

Table of Contents

   1.  Introduction
   2.  Details of the Rbridge Scheme
   2.1 Rbridge Addresses, parameters, and constants
   2.2 The routing algorithm
   2.3 The envelope
   2.4 Link Cache
   2.5 The Spanning Tree
   2.6 Data Packet handling
   3.  Optimization for IP
   3.1 Introduction
   3.2 IP Data Packet handling
   3.3 Handling ARPs
   3.4 Keeping the link cache up-to-date
   4.  Alternatives
   5.  Conclusions
   6.  Security Considerations
   7.  IANA Considerations
   8.  Intellectual Property Notice
       Normative References
       Informative References
       Authors' Addresses
       Full Copyright Statement


1. Introduction

   Bridges can transparently glue many physical links into what appears
   to IP to be a single LAN.  However, routing via the spanning tree
   concentrates traffic onto selected links, is slow to bring new
   connectivity on-line because temporary loops are a disaster (with no
   hop count in the header and exponential proliferation of packets
   during loops), and nodes must have a distinct layer 2 address for
   each point of attachment.

   There have been proposals for having routers within a campus
   automatically number links with distinct IP subnet numbers.  Although
   this makes a campus plug-and-play, it requires a large number of IP
   subnet numbers, a node must change its address if it moves to a
   different link, and addresses of nodes might fluctuate as the
   topology changes and links must be renumbered.

   The first concept is to use routing for bridging (where "bridging"
   means forwarding according to the layer 2 header, and making no
   assumption about endnode behavior beyond layer 2).

   Let us refer to the region that should appear to be a single LAN (and
   a single IP subnet) as "the campus".

   We'll call the devices that will implement what is in this paper
   "Rbridges" (routing bridges).  It is possible, within a campus, to
   mix bridges with Rbridges.  A true router terminates the campus
   (i.e., is on the boundary).  A bridge is internal to what Rbridges
   see as a link.  Two Rbridges are neighbors if they are connected via
   a bridged LAN.  Rbridges, like routers, terminate a LAN and do not
   participate in the bridge spanning tree or bridge forwarding.

   The basic idea behind this proposal is that the Rbridges within the
   campus run a link state routing protocol such as IS-IS among
   themselves, so that they at all times compute an optimal path from
   themselves to each other Rbridge.

   Rbridges also compute a spanning tree among themselves, on which
   packets for unknown destinations will be forwarded.  This is a
   different spanning tree, and differently computed, from the spanning
   tree that bridges compute.  Rbridges will not forward regular bridge
   spanning tree messages, or participate in any other way as a bridge.
   Instead, like routers, Rbridges terminate LANs.  But unlike routers,
   Rbridges will glue many links together into what would appear to
   layer 3 routing to be a single subnet.

   When data packets are travelling between Rbridges within the campus,
   they will be encapsulated in an additional header, which will specify
   the destination Rbridge and a hop count.  This header will be
   inserted by the source Rbridge, and removed by the destination
   Rbridge.  We call this extra header the "envelope".

   Since there might be bridges on the path between two Rbridges, there
   must be an additional layer 2 header on top of the envelope.  This
   layer 2 header will contain the transmitting and next hop Rbridge
   addresses (or when the packet is intended for all Rbridges on a LAN,
   a multicast address), and a new Ethertype that indicates that inside
   is an Rbridge envelope.  We'll call that Ethertype "Rbtype".

   Note that the extra layer 2 header is inserted and deleted on an
   Rbridge-hop basis, so there is no possibility of confusing bridge
   learning.  If the original source MAC address were seen by bridges in
   an outer layer 2 header, bridge learning would be confused, since
   this scheme allows packets to be routed along non-spanning tree
   paths.
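
   As a rough illustration of the layering described above (per-hop
   layer 2 header, then envelope, then the original packet), the
   following Python sketch adds and strips the outer header.  It is not
   part of the design.  The function names are hypothetical, and the
   Ethertype value is only a placeholder, since "Rbtype" has not been
   assigned.

```python
import struct
from typing import Tuple

# Placeholder only: "Rbtype" is not yet assigned.  0x88B5 is one of the
# IEEE 802 Local Experimental Ethertypes, used here for illustration.
RBTYPE = 0x88B5


def add_outer_header(next_hop_mac: bytes, own_mac: bytes,
                     enveloped: bytes) -> bytes:
    """Prepend the per-hop header: destination, source, Ethertype."""
    if len(next_hop_mac) != 6 or len(own_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return next_hop_mac + own_mac + struct.pack("!H", RBTYPE) + enveloped


def strip_outer_header(frame: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (destination MAC, source MAC, enveloped payload)."""
    if len(frame) < 14:
        raise ValueError("frame too short")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != RBTYPE:
        raise ValueError("not an Rbridge frame")
    return frame[0:6], frame[6:12], frame[14:]
```

   Because each Rbridge strips this header on receipt and writes a new
   one before forwarding, intermediate bridges only ever see Rbridge
   MAC addresses as source addresses in the outer header.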

   IS-IS selects, for each link, a "Designated Router" (DR).  This
   election must be per-port, so if a router R has two ports onto the
   same bridged LAN, at most one of the ports will be elected DR.  An
   alternate way of looking at it is that R must notice, because of the
   Designated Router election, that two of its ports, pa and pb, are on
   the same link, and R must never forward packets between ports pa and
   pb.  Also, since pa and pb are equivalent, the link cache should
   combine the learning seen on pa and pb.

   Only the DR on a link is allowed to learn the membership of the local
   link based on observing "naked packets" (packets without the extra
   envelope), and only the DR is allowed to delete an envelope and
   forward the resulting naked packet onto the local link.

   A DR, R, learns the endnode membership on its local link, and
   includes a list of MAC addresses that should be sent to R in its link
   state information.  This enables other Rbridges to know what
   destination Rbridge to address a packet to, for a given MAC address.

   The DR maintains a "link cache" of (link, node address) pairs for
   endnodes on links for which that Rbridge is DR.  For an Rbridge R1
   that is distant from destination D, it is only relevant that D must
   be sent to R2.  However, R2 must know which of its links D resides
   on.

   If the DR R sees a packet, without an envelope, it looks at the
   destination address D.

   a) If R's endnode cache indicates (D,R) (i.e., R itself owns D), then
      R forwards the packet as specified in the link cache (possibly not
      forwarding it at all if D resides on the link from which the
      packet was received).  R adds no envelope in this case since the
      packet is going directly from the source link to the destination
      link.

   b) Else, if R's endnode cache indicates (D, R1), then R attaches an
      envelope to the packet indicating R1 as destination Rbridge, and
      forwards the packet towards R1.  In addition to the envelope, R
      must attach an additional layer 2 header, putting its own MAC
      address on that link as source address and the MAC address of the
      neighbor Rbridge to which the packet is being forwarded, as the
      destination, and the Ethertype "Rbtype".

   c) Else, (destination is unknown), R attaches an envelope to the
      packet indicating the special value "0" as destination Rbridge.
      This indicates that the packet should be sent through the spanning
      tree.  Each Rbridge forwards such a packet along the spanning
      tree, and additionally, if the Rbridge is a DR, it removes the
      envelope and forwards the packet onto each link for which that
      Rbridge is DR.  The additional layer 2 header will contain R's MAC
      address as source, the Ethertype Rbtype, and, as destination, the
      (to be assigned) layer 2 multicast address "All-Rbs".


2. Details of the Rbridge Scheme

2.1 Rbridge Addresses, parameters, and constants

   Each Rbridge needs a unique ID within the campus.  The simplest such
   address is a unique 6-byte ID, since such an ID is easily obtainable
   as any of the EUI-48's owned by that Rbridge.  IS-IS already requires
   each router to have such an address.

   One configurable parameter is the value to which the hop count in the
   envelope is initially set.  The recommended default is 20.

   An Ethertype must be assigned as "Rbtype".

   A layer 2 multicast address must be assigned for All-Rbs.

2.2 The routing algorithm

   IS-IS, without modifications, will compute paths between all routers
   within the campus, using EUI-48's as the unique IDs.

   In addition, a TLV value needs to be added for reporting MAC
   addresses of local endnodes.
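
   One possible shape for such a TLV is sketched below in Python.  This
   is only an assumption about the encoding: the draft does not define
   the TLV, the code point used here is an arbitrary placeholder, and
   the function names are hypothetical.  The sketch relies only on the
   general IS-IS TLV layout (one octet of type, one octet of length, up
   to 255 octets of value), so at most 42 six-byte MAC addresses fit in
   one TLV.

```python
from typing import Iterable, List

# Arbitrary placeholder, not an assigned IS-IS code point.
ENDNODE_MAC_TLV_TYPE = 250
MACS_PER_TLV = 255 // 6  # a TLV value holds at most 255 octets


def encode_endnode_tlvs(macs: Iterable[bytes]) -> List[bytes]:
    """Pack endnode MAC addresses into as many TLVs as are needed."""
    unique = sorted(set(macs))
    for mac in unique:
        if len(mac) != 6:
            raise ValueError("MAC addresses must be 6 bytes")
    tlvs = []
    for i in range(0, len(unique), MACS_PER_TLV):
        value = b"".join(unique[i:i + MACS_PER_TLV])
        tlvs.append(bytes([ENDNODE_MAC_TLV_TYPE, len(value)]) + value)
    return tlvs


def decode_endnode_tlv(tlv: bytes) -> List[bytes]:
    """Recover the list of MAC addresses carried in one TLV."""
    if len(tlv) < 2:
        raise ValueError("malformed endnode TLV")
    tlv_type, length = tlv[0], tlv[1]
    value = tlv[2:2 + length]
    if (tlv_type != ENDNODE_MAC_TLV_TYPE or len(value) != length
            or length % 6 != 0):
        raise ValueError("malformed endnode TLV")
    return [value[i:i + 6] for i in range(0, length, 6)]
```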


2.3 The envelope

   The information in the envelope is:

       +--------------+-----------+
       | dest Rbridge | hop count |
       |  (6 bytes)   |  (2 bytes)|
       +--------------+-----------+

   The value "0" for "dest Rbridge" indicates the destination Rbridge is
   unknown, and the packet should travel via the spanning tree.  If dest
   Rbridge=0, then next Rbridge is also 0.  If dest Rbridge is not 0 (it
   is a specific Rbridge), then "next Rbridge" indicates the neighbor
   Rbridge to which the packet is being forwarded.
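
   The two fields shown in the diagram can be packed and parsed as in
   the following Python sketch.  Network byte order for the hop count
   and the behavior when the hop count runs out are assumptions (the
   draft specifies neither), and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

ENVELOPE_LEN = 8
DEFAULT_HOP_COUNT = 20   # recommended default from Section 2.1
UNKNOWN_DEST = bytes(6)  # the value "0": send via the spanning tree


def pack_envelope(dest_rbridge: bytes,
                  hop_count: int = DEFAULT_HOP_COUNT) -> bytes:
    """Build the 8-byte envelope: 6-byte dest Rbridge, 2-byte hop count."""
    if len(dest_rbridge) != 6:
        raise ValueError("dest Rbridge must be 6 bytes")
    return struct.pack("!6sH", dest_rbridge, hop_count)


def unpack_envelope(data: bytes) -> Tuple[bytes, int]:
    """Return (dest Rbridge, hop count) from the front of a packet."""
    if len(data) < ENVELOPE_LEN:
        raise ValueError("packet shorter than an envelope")
    dest, hops = struct.unpack("!6sH", data[:ENVELOPE_LEN])
    return dest, hops


def decrement_hop_count(data: bytes) -> Optional[bytes]:
    """Return the packet with hop count reduced by one, or None to drop.

    Assumption: a packet whose count would reach zero is dropped.
    """
    dest, hops = unpack_envelope(data)
    if hops <= 1:
        return None
    return pack_envelope(dest, hops - 1) + data[ENVELOPE_LEN:]
```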

2.4 Link Cache

   The link cache is kept by a DR, and is populated based on observing
   packets without envelopes.  It consists of the mapping between S and
   the specific link from which a packet from endnode S was received.

   These caches are refreshed based on seeing data, and timed out and
   entries deleted if some time has gone by without seeing data from
   that endnode.
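
   A minimal Python sketch of such a cache follows.  The class name,
   the timeout value, and the injectable clock are all assumptions made
   for illustration; the draft does not specify a timeout.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LinkCache:
    """Maps an endnode MAC address to the link it was last seen on."""

    def __init__(self, timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout  # hypothetical value, in seconds
        self._clock = clock
        self._entries: Dict[bytes, Tuple[str, float]] = {}

    def learn(self, src_mac: bytes, link: str) -> None:
        """Record or refresh an entry on seeing a naked packet."""
        self._entries[src_mac] = (link, self._clock())

    def lookup(self, mac: bytes) -> Optional[str]:
        """Return the link for mac, deleting the entry if it timed out."""
        entry = self._entries.get(mac)
        if entry is None:
            return None
        link, last_seen = entry
        if self._clock() - last_seen > self._timeout:
            del self._entries[mac]
            return None
        return link
```

   Learning the same address on a different link simply overwrites the
   old entry, which is how an endnode that moves between two links of
   the same DR would be tracked in this sketch.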

2.5 The Spanning Tree

   Packets for unknown destinations, or packets for link level
   multicast/broadcasts (such as ARP packets) are sent through the
   spanning tree, with an envelope indicating destination Rbridge=0.
   There is no need to run an additional protocol for computing the
   spanning tree.  Instead, the link state database is used.  The
   Rbridge R with the lowest EUI-48 is chosen as the root of the
   spanning tree, and shortest paths from R are computed through the
   normal IS-IS Dijkstra algorithm.  Links on that shortest path tree
   are in the spanning tree.  It is vital that all Rbridges calculate
   the same spanning tree.  Therefore there must be a well-defined tie-
   breaker in the case of equal cost paths.

   The tie-breaker is that, when attaching Rbridge R3 to the tree, if R3
   has equally minimal cost paths using parent R1 or R2, the parent with
   the smaller ID (say, R1) is chosen.

   If there are multiple links between R3 and R1, the choice among them
   matters only to R3 and R1.  Such parallel links can actually be
   considered to be part of a fatter pipe, and packets can be load split
   across those links, or any of those links can be chosen.
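
   The tree computation and its tie-breaker can be sketched in Python
   as below.  This is an illustration under stated assumptions rather
   than a definitive implementation: the graph representation and the
   function name are hypothetical, IDs are compared with Python's
   ordinary ordering (standing in for comparison of 6-byte IDs), and
   link costs are assumed to be positive.

```python
import heapq
from typing import Dict, Optional

# Rbridge ID -> {neighbor ID: link cost}, as derived from the link
# state database.
Graph = Dict[str, Dict[str, int]]


def spanning_tree_parents(graph: Graph) -> Dict[str, Optional[str]]:
    """Return each Rbridge's parent in the tree (None for the root)."""
    root = min(graph)  # lowest ID is the root
    dist = {root: 0}
    parent: Dict[str, Optional[str]] = {root: None}
    done = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in graph[node].items():
            if nbr in done:
                continue
            nd = d + cost
            old = dist.get(nbr)
            if old is None or nd < old:
                dist[nbr], parent[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
            elif nd == old and node < parent[nbr]:
                # Equal cost: prefer the parent with the smaller ID.
                parent[nbr] = node
    return parent
```

   With positive costs, every candidate parent of a node is finalized
   before the node itself, so each Rbridge running this on the same
   database arrives at the same parent for every node.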


2.6 Data Packet handling

   If a data packet without an envelope is received by R on link L with
   (layer 2) destination D:

   a) if D=R, then R should process the packet (R is the destination)

   b) else, if R is not DR on L, drop the packet

   c) else, (R is DR on L):

      c1) if D is in R's endnode cache on link L1, then forward the
         packet onto link L1 (unless L=L1, in which case drop the
         packet)

      c2) else (D is not local), if the link state database indicates D
         is local to R1, then add an envelope indicating dest
         Rbridge=R1, add an extra layer 2 header indicating R's MAC
         address as source and the next-hop Rbridge towards R1 as
         destination, and forward the packet

      c3) else (D is not local, and is unknown), add an envelope
         indicating dest Rbridge=0 and forward the packet on the
         spanning tree, as well as forwarding the naked packet onto all
         other links for which R is DR.  Add to the enveloped packet an
         additional layer 2 header with R's MAC address on the link to
         which the packet is being forwarded as source, the layer 2
         multicast address "All-Rbs" as destination, and Ethertype
         Rbtype.
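
   The cases above for a packet without an envelope can be written as a
   small decision function.  The Python sketch below only returns a
   description of what R should do; the action names, argument names,
   and data structures are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def handle_naked_packet(
    me: bytes,                       # R's own MAC address
    link: str,                       # link L the packet arrived on
    dest: bytes,                     # layer 2 destination D
    dr_links: Set[str],              # links for which R is DR
    link_cache: Dict[bytes, str],    # local endnode -> link
    remote_owner: Dict[bytes, bytes],  # endnode -> Rbridge, from LSDB
) -> List[tuple]:
    if dest == me:
        return [("process",)]                       # case a
    if link not in dr_links:
        return [("drop",)]                          # case b
    local_link = link_cache.get(dest)
    if local_link is not None:                      # case c1
        if local_link == link:
            return [("drop",)]
        return [("forward_naked", local_link)]
    owner = remote_owner.get(dest)
    if owner is not None:                           # case c2
        return [("envelope_unicast", owner)]
    # case c3: unknown destination
    actions: List[tuple] = [("envelope_spanning_tree",)]
    actions += [("forward_naked", out) for out in sorted(dr_links)
                if out != link]
    return actions
```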

   If a data packet with an envelope is received by R on link L, with
   layer 2 destination address All-Rbs or R's MAC address on that link
   (otherwise the packet will be discarded):

   a) If destination Rbridge in envelope=0:

      a1) if the packet was received on a non-spanning tree link, drop
         the packet

      a2) else, forward the packet onto all links in the spanning tree,
         decrementing the hop count in the envelope, and adding the
         extra layer 2 header with destination=All-Rbs.  Also, for each
         link on which R is DR, remove the envelope and forward the
         packet onto the link.

   b) If destination Rbridge in envelope=R, then remove the envelope,
      and if D is locally attached on link L1, forward the naked packet
      onto L1.  If D is not locally attached, drop the packet.


   c) If destination Rbridge in envelope=Ri, not equal to 0 or R:

      c1) forward the packet towards Ri, decrementing the hop count, and
         adding a new layer 2 header with source address=R's MAC address
         on the link to which R is forwarding, and destination address
         the MAC address of the Rbridge which is the next hop towards
         Ri.
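
   The handling of enveloped packets can be sketched the same way.  In
   this Python illustration the action names and data structures are
   hypothetical.  Two details are assumptions not stated in the text:
   a flooded packet is not sent back onto the link it arrived on, and
   an enveloped packet is not forwarded further once its hop count
   would reach zero.

```python
from typing import Dict, List, Set

UNKNOWN = bytes(6)  # envelope value "0": destination Rbridge unknown


def handle_enveloped_packet(
    me: bytes,                     # R's own Rbridge ID
    link: str,                     # link L the packet arrived on
    env_dest: bytes,               # destination Rbridge in envelope
    hop_count: int,                # hop count in envelope
    inner_dest: bytes,             # layer 2 destination D of inner packet
    tree_links: Set[str],          # R's links in the spanning tree
    dr_links: Set[str],            # links for which R is DR
    link_cache: Dict[bytes, str],  # local endnode -> link
) -> List[tuple]:
    remaining = hop_count - 1
    if env_dest == UNKNOWN:
        if link not in tree_links:
            return [("drop",)]                          # case a1
        actions: List[tuple] = []                       # case a2
        if remaining > 0:
            actions += [("forward_enveloped", out, remaining)
                        for out in sorted(tree_links) if out != link]
        actions += [("forward_naked", out) for out in sorted(dr_links)]
        return actions
    if env_dest == me:                                  # case b
        local = link_cache.get(inner_dest)
        if local is None:
            return [("drop",)]
        return [("forward_naked", local)]
    if remaining <= 0:                                  # case c1
        return [("drop",)]
    return [("forward_toward", env_dest, remaining)]
```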


3. Optimization for IP

3.1 Introduction

   With the design above, IP would work on top of such an Rbridged
   campus.  However, there are some optimizations possible.

   To make optimizations for IP, Rbridges look beyond the layer 2
   header.  For IP packets, the DR (in addition to learning the source
   layer 2 address) learns the source IP address.  This information (the
   IP address and MAC address of the packet's source) is distributed in
   link state information.

   This optimization allows:

   a) a local Rbridge to answer ARP queries for destination IP addresses
      that have been learned through the link state information.  This
      keeps ARP traffic from being flooded throughout the campus.

   b) more timely keep-alives of IP addresses on the local link, since
      IP provides some mechanisms such as ARP that the DR can use to
      verify that an IP address still resides on the link.


3.2 IP Data Packet handling

   If a naked packet is received by R and R is the DR, then in addition
   to learning the source MAC address, R checks whether the layer 2
   protocol type indicates "IP", and if so, also learns the location of
   the source IP address, provided that the source address is within the
   campus's IP prefix.
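
   A sketch of this learning step in Python is shown below.  It assumes
   an untagged Ethernet II frame carrying IPv4 and a campus prefix
   supplied as a string; the function name and its arguments are
   hypothetical.

```python
import ipaddress
import struct
from typing import Optional, Tuple

ETHERTYPE_IPV4 = 0x0800


def learn_ip_source(frame: bytes,
                    campus_prefix: str) -> Optional[Tuple[bytes, str]]:
    """Return (source MAC, source IP) if the frame is IPv4 and the
    source address lies within the campus prefix, else None."""
    if len(frame) < 14 + 20:  # Ethernet header plus minimal IPv4 header
        return None
    src_mac = frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    # The IPv4 source address occupies bytes 12..15 of the IP header.
    src_ip = ipaddress.IPv4Address(frame[26:30])
    if src_ip not in ipaddress.ip_network(campus_prefix):
        return None
    return src_mac, str(src_ip)
```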

3.3 Handling ARPs

   Only the DR (and the real destination, if it's on that link) will
   answer an ARP query.  If R is DR, and sees an ARP query for D:

   a) if D is unknown, R creates its own ARP query (indicating itself as
      the querying source), using an envelope indicating destination
      Rbridge=0, and forwards the ARP query along the spanning tree

   b) if D is known to reside on another link for which R is DR, R
      responds to the ARP query with the MAC address of D

   c) if D is known to reside on the same link, R drops the ARP query
      and lets D respond

   d) If D is known (through the link state database) as being attached
      to R1, with the mapping (D,d), then R responds to the ARP query
      with D's MAC address "d".

   If R receives a response to its ARP query from D, and D is not in the
   link state database, then R responds to the original ARP query with
   the MAC address indicated in the received ARP response.
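
   The DR's choices (a) through (d) above can be summarized in a short
   Python sketch.  The table layouts, action names, and function name
   are hypothetical; the sketch shows only the decision, not the
   construction of ARP packets.

```python
from typing import Dict, Optional, Tuple


def handle_arp_query(
    target_ip: str,                           # IP address D being queried
    query_link: str,                          # link the query was seen on
    local_ips: Dict[str, Tuple[str, bytes]],  # IP -> (link, MAC), R is DR
    remote_ips: Dict[str, bytes],             # IP -> MAC, from the LSDB
) -> Tuple[str, Optional[bytes]]:
    local = local_ips.get(target_ip)
    if local is not None:
        link, mac = local
        if link == query_link:
            return ("drop", None)            # case c: D answers itself
        return ("reply", mac)                # case b
    mac = remote_ips.get(target_ip)
    if mac is not None:
        return ("reply", mac)                # case d
    return ("query_via_spanning_tree", None)  # case a
```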

3.4 Keeping the link cache up-to-date

   To ensure that IP addresses remain in the link cache if the endnode
   is still attached, the DR, once it learns that S is on the link,
   periodically issues ARP queries to S on that link.  This cuts down on
   flooded ARP queries from the campus for S, since S will remain in the
   link state database as long as it is alive.  It also allows quick
   learning that S is down, so that it can be removed from the link
   state database (and be reachable at its new location, if it has
   moved).
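
   One way a DR might schedule these queries is sketched below in
   Python.  The two thresholds and the function name are assumptions
   made for illustration; the draft does not specify polling or
   expiry intervals.

```python
from typing import Dict, List, Tuple


def plan_refresh(
    last_heard: Dict[str, float],  # IP address -> time last heard from
    now: float,
    probe_after: float = 30.0,     # hypothetical: idle time before probing
    expire_after: float = 90.0,    # hypothetical: idle time before removal
) -> Tuple[List[str], List[str]]:
    """Return (addresses to probe with ARP, addresses to withdraw from
    the link state information)."""
    probe: List[str] = []
    withdraw: List[str] = []
    for ip, heard in sorted(last_heard.items()):
        idle = now - heard
        if idle >= expire_after:
            withdraw.append(ip)
        elif idle >= probe_after:
            probe.append(ip)
    return probe, withdraw
```

   An ARP reply, or any other packet from the endnode, would update
   its entry in last_heard, so a live endnode is never withdrawn.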

4. Alternatives

   Instead of passing around MAC addresses and IP addresses in link
   state information, this information could be learned by all Rbridges
   based on seeing data traffic.  This could be accomplished by adding
   an additional field to the envelope consisting of "source Rbridge".
   When any Rbridge R1 observed a data packet with source Rbridge=R, R1
   would copy the inside layer 2 source address and (if it's an IP
   packet) the inside IP source address, and make a mapping indicating
   that that layer 2 address (and that layer 3 address) should be routed
   to R.

   This alternative, although elegant, has the following disadvantages:

   a) it increases the size of the envelope for all packets (since the
      field "source Rbridge" must be included)

   b) it forces more processing on enveloped data packets by all
      Rbridges, since every such packet must be examined to find the
      inner source layer 2 and layer 3 (if it's an IP packet) addresses

   c) it does not allow the tighter mapping of link location possible by
      having the DR on the link explicitly poll the endnode to see if it
      is still alive.  Therefore, caches of routers would be slower to
      remove incorrect entries when an endnode moves.


5. Conclusions

   This design allows a plug-and-play campus to appear as a single IP
   subnet, with a stable routing protocol and robust forwarding header
   (as opposed to spanning tree, where routes are suboptimal, the header
   does not contain a hop count and packets can proliferate
   exponentially when being forwarded, and to avoid temporary loops a
   time must expire before new links can start being used for
   forwarding).

   There is a possibility of one-hop suboptimality, if the DR is not the
   optimal entrance point to the destination LAN.  However, given that
   most topologies are switched LANs, this would be rare.

   There is also the possibility of an additional one-hop suboptimality
   at the source LAN, since the DR might not be the optimal exit point
   from that LAN, and the DR might forward to R1, on the same LAN.  It
   is possible to eliminate this one-hop suboptimality by having R1
   know this, through the routing algorithm, or by being explicitly
   delegated to by the DR for this destination, and having R1 forward
   the packet directly.  This optimization is not trivial to implement,
   and given today's topologies of switched LANs, it's not necessarily
   worth it to implement this.

   This solution has all the advantages of a bridged LAN, and is
   considerably more stable, and allows optimal routing.

6. Security Considerations

   With this design, an endnode could transmit a packet with a forged
   source address and confuse the Rbridge learning, but this can be done
   with today's bridged LANs.  If instead the campus were implemented as
   separate IP subnets, with routers instead of bridges, endnodes would
   have addresses specific to their links, so an endnode on one link
   could not as easily subvert routing to another endnode.

   TBD.  Check rpsec for list of requirements

7. IANA Considerations

   No known IANA considerations arise from this document.


8. Intellectual Property Notice

   Sun Microsystems may claim intellectual property rights over portions
   of the design described in this document.

   Some of the design may be covered by intellectual property from
   Digital Equipment Corporation.

Normative References

Informative References


Authors' Addresses

   Radia Perlman
   Sun Microsystems
   One Network Drive
   Burlington, MA  01803
   USA

   Phone: +1 781 442 0086
   EMail: Radia.Perlman@sun.com


   Aidan Williams
   Motorola Australian Research Centre
   Locked Bag 5028
   Botany, NSW  1455
   Australia

   Phone: +61 2 9666 0500
   EMail: Aidan.Williams@motorola.com
   URI:   http://www.motorola.com.au/marc/


Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

