LSR Working Group                                           H. Smit, Ed.
Internet-Draft
Intended status: Standards Track                         G. Van de Velde
Expires: April 25, 2019                                            Nokia
                                                        October 22, 2018


                    IS-IS Sparse Link-State Flooding
                      draft-hsmit-lsr-isis-dnfm-00

Abstract

   This document describes a technology extension to reduce link-state
   flooding in highly resilient dense networks.  It does this by using
   simple and backwards-compatible extensions to reduce the number of
   adjacencies over which link-state flooding takes place.

   "IS-IS Sparse Link-State Flooding" is an extension to the IS-IS
   routing protocol.

   It is relatively easy to understand and implement.  It is backwards
   compatible.  It requires no per-node configuration.  It uses a
   distributed algorithm, so no centralized computations are required,
   and no complex computations are required on any individual node.
   The algorithm places no requirements on the network topology.  It
   can be deployed in a redundant way to improve robustness and
   convergence times.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


Smit & Van de Velde      Expires April 25, 2019                 [Page 1]

Internet-Draft      IS-IS Sparse Link-State Flooding        October 2018


   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  High level overview of Sparse Link-State Flooding
   3.  The Sparse Link-State Flooding algorithm in detail
     3.1.  Role of the Anchor
     3.2.  Bootstrapping the flooding
     3.3.  Determining which adjacency a router wants to flood over
     3.4.  Determining where flooding can be suppressed
   4.  Using multiple concurrent flooding topologies
   5.  Benefits of the Sparse Link-State Flooding algorithm
   6.  Extensions to IS-IS PDUs
     6.1.  Anchor TLV in LSPs
     6.2.  Flooding-Suppression TLV in IIHs
   7.  Operations of the new Sparse Link-State Flooding algorithm
     7.1.  Flooding at the anchor itself
     7.2.  New action after each SPF
     7.3.  When sending an IIH
     7.4.  When receiving an IIH
     7.5.  When installing a new LSP in the LSDB
     7.6.  Preventing loops in the flooding topology
     7.7.  Fall-back to classic full flooding
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   In dense network topologies, with massive ECMP or massive numbers of
   resilient links, the flooding algorithm of link-state protocols is
   highly redundant.  This results in unnecessary overhead, potentially
   overloading control planes, decreasing robustness and slowing down
   convergence.  Because of this perceived inefficiency, some operators
   have resorted to using BGP as the IGP in their data center networks.
   Draft-li-dynamic-flooding [3] describes this in more detail.
   However, using an Exterior Gateway Protocol as an IGP is clearly
   sub-optimal, if only because of the configuration overhead.

   This document proposes a technology extension to reduce the number of
   interfaces over which a link-state protocol floods its updates in
   highly resilient networks.  The result is a sparse flooding topology
   over a dense physical network topology.  This document describes in
   detail how to implement this algorithm for the IS-IS protocol [2].
   The algorithm can be extended to other link-state routing protocols,
   such as OSPF.  However, no details for protocols other than IS-IS are
   included in this document.

   This proposal uses simple and backwards-compatible extensions.  It is
   easy to understand and relatively easy for IS-IS implementers to
   implement.  These IS-IS extensions do not require additional
   configuration on every router.  However, it might be beneficial for
   the operation of the algorithm to manually configure one or more
   routers as "anchors" in the network.  The purpose of an "anchor" is
   explained in the next section of this document.  This extension uses
   a distributed algorithm: no centralized calculations need to be
   performed, and each pair of adjacent routers decides for itself
   where flooding can be suppressed.  After every regular SPF
   computation a router can adjust the interfaces over which it floods.
   This decision requires no computationally complex calculations.

2.  High level overview of Sparse Link-State Flooding

   The goal of the new Sparse Link-State Flooding algorithm is to create
   a tree of nodes and links over which updates will be flooded.  This
   tree is called "the flooding topology".  The flooding topology
   includes all the nodes in the network, but only a (small) subset of
   all available links in the physical network.

   The idea is that the flooding topology starts at a single router in
   the network.  This single router is called "the anchor".  Routers
   that are adjacent to the anchor "attach" or "clamp" themselves to
   the flooding topology, making the flooding topology bigger.  Their
   neighbors attach themselves as well, making the flooding topology
   spread out.  In the end, all routers will be part of the
   flooding topology.  The flooding topology resembles a tree, with the
   anchor as the root of the tree.

   The decision to flood or not flood over an adjacency is a local
   matter, which makes the algorithm distributed.  The flooding topology
   itself is not flooded through the network; only the location of the
   anchor(s) is announced in LSPs.  An anchor announces itself by
   including this information in its own LSP.  Two adjacent routers
   determine whether they need to exchange LSPs by means of a new TLV
   in their Hello PDUs (IIHs).

   This algorithm can be run once, or multiple times in parallel,
   creating one or more concurrent flooding topologies.  Multiple
   flooding topologies make the flooding process more robust and speed
   up convergence.  We envision that anchors are configured manually,
   like BGP Route Reflectors, or elected automatically.  For this
   purpose the Anchor TLV contains a priority field, which allows
   operators to influence the location of the anchor(s).

3.  The Sparse Link-State Flooding algorithm in detail

3.1.  Role of the Anchor

   Each flooding topology needs a root of its tree.  The router acting
   as root is called "the anchor" of a flooding topology.  An anchor
   router includes information in its LSP to announce that it wants to
   function as an anchor.  This information can be encoded as a new TLV,
   or as a new capability in the existing IS-IS capability TLV.  This
   choice is open for discussion.

   This new TLV carries a priority.  If multiple routers advertise
   their willingness to act as an anchor, the router with the highest
   priority is chosen as the anchor.  If multiple potential anchors have
   the same priority, the router with the highest system-id is chosen.
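   The election rule above can be sketched in Python as follows.  This
   is a minimal, non-normative sketch: the AnchorCandidate type and the
   elect_anchor() function are hypothetical names, and the sketch
   assumes that system-ids are compared as unsigned big-endian 6-octet
   values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AnchorCandidate:
    # Hypothetical in-memory view of one LSP that carries the Anchor TLV.
    system_id: bytes  # 6-octet IS-IS system-id
    priority: int     # anchor priority taken from the TLV


def elect_anchor(candidates: Iterable[AnchorCandidate]) -> Optional[AnchorCandidate]:
    """Highest priority wins; ties are broken by the highest system-id."""
    # Tuples compare element by element, so this ranks by priority first
    # and by system-id (equal-length octet strings compare big-endian) second.
    return max(candidates, key=lambda c: (c.priority, c.system_id), default=None)
```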

   Apart from announcing itself as an anchor in its LSP, the role of the
   anchor router is purely passive.  No extra actions are required of
   the anchor.

3.2.  Bootstrapping the flooding

   When a router boots, or when a new adjacency comes up, routers need
   to synchronize their LSDBs.  The reason is that the network could
   have been partitioned into two separate parts, and flooding over the
   new adjacency might be the only way to make the two parts of the
   network aware of each other.

   After the LSDBs are synchronized, and at least one SPF computation
   has been executed, the new algorithm can be used.  An implementation
   could wait for a longer grace period before using the new algorithm,
   to make it more likely that all or most of the LSPs in the network
   have been received.
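   A minimal sketch of this start-up gate, in Python, is shown below.
   The class, its method names, and the grace_period_s parameter are
   hypothetical; this document does not specify a grace-period value.

```python
from typing import Optional


class SparseFloodingGate:
    """Decides when a router may start suppressing flooding (hypothetical)."""

    def __init__(self, grace_period_s: float = 0.0) -> None:
        # Illustrative knob only; no value is defined by this document.
        self.grace_period_s = grace_period_s
        self.lsdb_synced_at: Optional[float] = None
        self.spf_done = False

    def on_lsdb_synchronized(self, now: float) -> None:
        # Remember when the initial LSDB synchronization finished.
        if self.lsdb_synced_at is None:
            self.lsdb_synced_at = now

    def on_spf_complete(self) -> None:
        self.spf_done = True

    def may_suppress(self, now: float) -> bool:
        # Keep classic full flooding until the LSDB is synchronized, at
        # least one SPF has run, and the optional grace period has elapsed.
        if self.lsdb_synced_at is None or not self.spf_done:
            return False
        return now - self.lsdb_synced_at >= self.grace_period_s
```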

3.3.  Determining which adjacency a router wants to flood over

   The decision to flood normally or to suppress flooding is made as
   follows.  After each SPF computation, a router looks at the newly
   computed route towards the anchor.  Each router wants to flood over
   the adjacency to a neighbor that is closer to the anchor than the
   router itself.  This guarantees that each router floods with a
   router that is already part of the flooding topology.

   If there are multiple (equal-cost) paths towards the anchor, one of
   the next-hop adjacencies of the route towards the anchor is chosen to
   flood over.  It does not matter which adjacency is chosen, as long
   as the adjacent router is closer to the anchor.
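   The following Python sketch simulates, in one place, the decisions
   that every router makes independently.  It assumes hop count as the
   SPF metric (a simplification) and uses a hypothetical function name.
   It shows that when every router picks one neighbor that is strictly
   closer to the anchor, the chosen adjacencies form a tree rooted at
   the anchor, with one upstream adjacency per non-anchor router.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flooding_tree(adj: Dict[str, List[str]], anchor: str) -> Set[Tuple[str, str]]:
    """Return the (router, upstream neighbor) pairs chosen by each router.

    Hop count stands in for the SPF metric to keep the sketch short.
    """
    # Breadth-first search from the anchor gives every router's distance,
    # i.e. what each router would learn from its own SPF computation.
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    edges: Set[Tuple[str, str]] = set()
    for node in dist:
        if node == anchor:
            continue
        # Any neighbor strictly closer to the anchor will do; take the first.
        upstream = next(n for n in adj[node] if dist[n] < dist[node])
        edges.add((node, upstream))
    return edges
```

   In a fabric of two spines and three leaves (six physical links), the
   sketch keeps four flooding adjacencies, one per non-anchor router.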

   When the flooding topology breaks, the two routers adjacent to the
   point of breakage notice it.  Each of them generates a new LSP and
   sends it out over the old flooding topology.  The LSP generated by
   the router that is still reachable through the old flooding topology
   is received by all routers on its side of the breakage.  This
   triggers new SPF computations on all those routers, which compute a
   new path towards the anchor.  Each router then adjusts its flooding
   adjacencies according to the new path it has just computed.  New
   LSPs are flooded over the new flooding topology, which might trigger
   follow-up SPF computations, which in turn might cause routers to
   adjust their flooding adjacencies again.  Eventually all routers
   have received all new LSPs, which guarantees that they all compute a
   new, correct flooding topology.

   When routers start using an adjacency as part of their flooding
   topology, they need to synchronize their LSDBs first.  This is done
   by exchanging CSNPs.  It can potentially be done faster and more
   reliably with IS-IS Flooding over TCP [4].

3.4.  Determining where flooding can be suppressed

   The decision whether to flood over an adjacency or not is a local
   matter.  Only the two routers of the adjacency are involved in this
   decision.  Both routers have a say in whether flooding will be
   suppressed or not.

   This document defines a new TLV, called the Flooding-Suppression TLV,
   to be included in Hello PDUs (IIHs).  This new TLV includes a field
   that indicates whether a router wants to do flooding over this
   interface, or wants to suppress flooding.  The content of this TLV is
   set according to the decision made after each SPF, as explained in
   the previous section of this document.

   As a result, a router keeps two new pieces of state for each
   adjacency.

   o  Does the router itself want to suppress flooding over this
      adjacency?  We'll call this the adjacency's
      "suppression-local-request-state".

   o  Does the neighbor want to suppress flooding over this adjacency?
      We'll call this the adjacency's
      "suppression-neighbor-request-state".

   The suppression-local-request-state is determined after each SPF
   computation.

   The suppression-neighbor-request-state is learned from examining the
   Flooding-Suppression TLV in each received IIH.  If the neighbor did
   not include the Flooding-Suppression TLV in its IIH, it is assumed
   that the neighbor does not want to suppress flooding over this
   adjacency.

   When both "suppression-local-request-state" and
   "suppression-neighbor-request-state" are true, the overall
   "suppression-state" of the interface is set to true.  In that case
   flooding over the interface is suppressed.  In the other three cases,
   where at least one of the two routers does not want to suppress
   flooding, flooding is done in the normal way.
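   The decision rule can be summarized by the following Python sketch
   (hypothetical function and parameter names).  A missing
   Flooding-Suppression TLV is modelled as None and treated as "does
   not want to suppress".

```python
from typing import Optional


def suppression_state(local_wants_suppress: bool,
                      neighbor_wants_suppress: Optional[bool]) -> bool:
    """Return True if flooding over the adjacency is to be suppressed.

    neighbor_wants_suppress is None when the neighbor's IIH carried no
    Flooding-Suppression TLV (for example, a router without this extension).
    """
    if neighbor_wants_suppress is None:
        # Missing TLV: the neighbor is treated as wanting to flood.
        neighbor_wants_suppress = False
    # Suppress only when both sides ask for it; the other three cases flood.
    return local_wants_suppress and neighbor_wants_suppress
```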

   So flooding over an adjacency is only suppressed when both neighbors
   have indicated that they want to suppress flooding over the
   adjacency.  This means that when one of the two routers does not
   support this new algorithm, and thus does not include the new TLV in
   its IIH, flooding is always done.  This makes the algorithm backwards
   compatible with routers that do not support this new extension of the
   protocol.

   A router always has one or more flooding adjacencies: one adjacency
   that the router itself needs to "clamp" onto the part of the
   flooding topology that is closer to the anchor than it is itself
   (this adjacency points towards the anchor), and zero or more
   adjacencies that its neighbors further away from the anchor use to
   clamp themselves onto the flooding topology (these adjacencies point
   away from the anchor).

4.  Using multiple concurrent flooding topologies

   It is possible to use more than one flooding topology in parallel.
   This requires more than one anchor.  For each anchor a new flooding
   topology is built.  These flooding topologies can co-exist without
   problems.

   All that is required is that after each SPF computation, the router
   examines the shortest path to each anchor and sets the local state of
   each adjacency accordingly.  This guarantees that the router will
   "clamp onto" each flooding topology.

   To ensure optimal use of parallel flooding topologies, all routers in
   an IS-IS flooding domain (area or level-2 backbone) should use the
   same number of parallel flooding topologies.  This can be done
   through configuration, or, more easily, by including the number of
   parallel flooding topologies to use inside the new Anchor TLV.  When
   looking for anchors, a router must first find all LSPs that contain
   the new Anchor TLV.  It then selects the router with the highest
   anchor priority as the main anchor.  If multiple routers use the same
   priority, the router with the highest system-id is selected.  Once
   the main anchor has been determined, a router looks inside that
   router's Anchor TLV to determine how many parallel flooding
   topologies it should use.  It then selects that number of anchors
   with the highest priorities, and sets the flooding state of the
   adjacencies pointing towards those anchors.
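   A Python sketch of this selection follows.  It is an illustration
   under assumptions: the Anchor type and the function names are
   hypothetical, ties among the additional anchors are assumed to be
   broken by system-id just as for the main anchor, and adjacencies are
   identified by arbitrary strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Anchor:
    # Hypothetical view of one Anchor TLV; num_topologies is the suggested
    # number of parallel flooding topologies carried in that TLV.
    system_id: bytes
    priority: int
    num_topologies: int


def select_anchors(anchors: List[Anchor]) -> List[Anchor]:
    # Rank by priority, then by system-id, highest first.
    ranked = sorted(anchors, key=lambda a: (a.priority, a.system_id), reverse=True)
    if not ranked:
        return []
    # The main anchor decides how many flooding topologies to build.
    return ranked[: max(1, ranked[0].num_topologies)]


def local_suppress_requests(adjacencies: List[str],
                            upstream_by_anchor: Dict[bytes, str]) -> Dict[str, bool]:
    """Request flooding on an adjacency if it is upstream for ANY anchor."""
    flooding: Set[str] = set(upstream_by_anchor.values())
    return {adj: adj not in flooding for adj in adjacencies}
```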

   Flooding suppression is a local matter.  Therefore an implementation
   can decide to flood over more adjacencies than the minimum needed to
   build the flooding topology.  It can signal this through the
   Flooding-Suppression TLV in its IIHs.  This can improve robustness
   and convergence times, at the cost of some extra flooding overhead.

5.  Benefits of the Sparse Link-State Flooding algorithm

   The algorithm described in this document has a number of advantages.

   o  The algorithm is a distributed algorithm.  Distributed algorithms
      are usually more robust than centralized algorithms.  The flooding
      topology itself does not need to be flooded, which simplifies
      recovery when the flooding topology breaks.

   o  The algorithm is backwards compatible.  No flag day is required to
      introduce this new sparse-flooding extension.  Older routers that
      do not support the new extension will obviously not include the
      Flooding-Suppression TLV in their IIHs.  As a result, regular
      flooding is done over all adjacencies of those older
      routers.  This guarantees that older routers will never break the
      flooding topology.

   o  No extra computations have to be done to compute the flooding
      topology.  Using the result of the regular SPF computation
      suffices to determine over which adjacencies a router wants to
      flood.

   o  The proposed algorithm is robust and guarantees that a flooding
      topology eventually heals so that all routers are included in the
      flooding again.

   o  Several instances of the algorithm can be run in parallel, which
      results in multiple parallel flooding topologies.  Although
      parallel flooding topologies are not required for correct
      operation of the algorithm, they help to speed up the healing of
      the flooding topology, and thus convergence in general.

6.  Extensions to IS-IS PDUs

   To implement this algorithm, two extensions to IS-IS PDUs are needed.

6.1.  Anchor TLV in LSPs

   This document defines a new Anchor TLV, carried in Link State PDUs
   (LSPs).  This TLV indicates that a router can be used as an anchor.
   The new TLV must include a priority field, and it should include a
   field that suggests how many parallel flooding topologies all routers
   should use.

6.2.  Flooding-Suppression TLV in IIHs

   This document defines a new Flooding-Suppression TLV, carried in IIH
   PDUs.  This TLV is used to indicate to the neighbor whether a router
   wants to suppress flooding over the adjacency.  The new TLV holds
   three fields:

   o  Flooding suppression suggestion field: this field indicates
      whether the sending router would like to suppress flooding over
      this interface or not.  The value of this field is set to the
      current "suppression-local-request-state".  Note that flooding is
      only suppressed when both routers indicate that they want to
      suppress it.

   o  Resulting actual suppression field: this field indicates whether
      the sending router will or will not do flooding.  The value of
      this field is set to the current "suppression-state" of the
      interface.  This field is included only for debugging purposes.
      The first field (the received suppression-local-request-state
      field) is used to make the flooding decision.  The result of that
      decision is announced in the second field.

   o  The number of currently active flooding adjacencies.  This field
      can be used by the receiving router to pick a flooding adjacency
      when there are multiple ECMP paths towards the anchor: a router
      can pick the upstream router with the fewest flooding adjacencies.
      In dense networks with many parallel paths, this can help spread
      the flooding load evenly over multiple routers.

   Backward compatibility: when a router does not include the
   Flooding-Suppression TLV in the IIHs it sends out, it is treated as
   if it had included the Flooding-Suppression TLV with the first field
   set to "I do not want to suppress flooding".
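   This document does not yet define a code point or field sizes for
   the Flooding-Suppression TLV.  Purely as an illustration, the Python
   sketch below assumes a placeholder type value, a 1-octet flags field
   carrying the first two fields as hypothetical bits, and a 2-octet
   count of flooding adjacencies.  It uses the standard IS-IS TLV
   framing of a 1-octet type followed by a 1-octet length.

```python
import struct
from typing import Optional, Tuple

# Placeholder only: no IS-IS TLV code point has been assigned for this TLV.
TLV_TYPE_TBD = 0xFF
WANT_SUPPRESS = 0x80  # hypothetical bit: first field ("I want to suppress")
IS_SUPPRESSED = 0x40  # hypothetical bit: second field (resulting state)


def encode_fs_tlv(want_suppress: bool, suppressed: bool, flooding_adjs: int) -> bytes:
    """Encode the three fields as type, length, flags (1), count (2)."""
    flags = (WANT_SUPPRESS if want_suppress else 0) | (IS_SUPPRESSED if suppressed else 0)
    value = struct.pack("!BH", flags, flooding_adjs)
    return struct.pack("!BB", TLV_TYPE_TBD, len(value)) + value


def decode_fs_tlv(tlvs: bytes) -> Optional[Tuple[bool, bool, int]]:
    """Scan the TLV area of a received IIH; None means the TLV is absent."""
    i = 0
    while i + 2 <= len(tlvs):
        tlv_type, length = tlvs[i], tlvs[i + 1]
        value = tlvs[i + 2 : i + 2 + length]
        if len(value) < length:
            break  # truncated TLV: stop parsing
        if tlv_type == TLV_TYPE_TBD and length >= 3:
            flags, count = struct.unpack("!BH", value[:3])
            return bool(flags & WANT_SUPPRESS), bool(flags & IS_SUPPRESSED), count
        i += 2 + length
    return None
```

   A return value of None corresponds to the backward-compatibility
   case: the neighbor is treated as not wanting to suppress flooding.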

7.  Operations of the new Sparse Link-State Flooding algorithm

7.1.  Flooding at the anchor itself

   When a router is acting as the anchor, it floods over all its
   interfaces.  It does include the Flooding-Suppression TLV in its
   IIHs, but it always sets the value inside the new TLV to "I do not
   want to suppress flooding".

7.2.  New action after each SPF

   At the end of each SPF computation, a router looks at the best path
   to reach the anchor router.  The router sets the
   "suppression-local-request-state" for the adjacency used by that path
   to false, and sets the "suppression-local-request-state" for all
   other adjacencies to true.

   If the best path to the anchor router is load-balanced over multiple
   adjacencies, the router picks one of those adjacencies as its own
   "upstream flooding adjacency".

   A router must make an effort to change its "upstream flooding
   adjacency" as rarely as possible, because switching the upstream
   flooding adjacency is not without cost: every time an adjacency
   changes from suppressed flooding to normal flooding, the LSDBs of the
   two routers must be synchronized.
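   One possible way to combine this stability goal with the "number of
   currently active flooding adjacencies" field of the
   Flooding-Suppression TLV is sketched below in Python.  The function
   name and the preference order are assumptions, not requirements of
   this document.

```python
from typing import Dict, List, Optional


def pick_upstream(ecmp_next_hops: List[str],
                  current: Optional[str],
                  neighbor_flooding_adjs: Dict[str, int]) -> Optional[str]:
    """Choose one upstream flooding adjacency among the ECMP next hops."""
    if not ecmp_next_hops:
        return None
    # Stability first: keep the current upstream while it is still a next
    # hop, since every switch costs an LSDB synchronization.
    if current in ecmp_next_hops:
        return current
    # Otherwise prefer the neighbor advertising the fewest flooding
    # adjacencies; neighbors that advertised no count rank last (an
    # assumption), and the name breaks ties so the choice is deterministic.
    return min(ecmp_next_hops,
               key=lambda a: (neighbor_flooding_adjs.get(a, float("inf")), a))
```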

   If the "suppression-local-request-state" changed for one or more
   adjacencies, compared to the state after the previous SPF
   computation, the router will re-compute the "suppression-state".  If
   the "suppression-state" of an adjacency changes, the router will
   start or stop flooding over that adjacency.

7.3.  When sending an IIH

   When a router sends an IIH, it includes the new Flooding-Suppression
   TLV.

   For adjacencies that were selected as "upstream flooding adjacency",
   the value of the Flooding-Suppression TLV must be set to: "I do not
   want to suppress flooding".  For all other adjacencies the value must
   be set to: "I do want to suppress flooding".

7.4.  When receiving an IIH

   When a router receives an IIH, it checks for the existence of the new
   Flooding-Suppression TLV.

   If there is none, the state of the neighbor is assumed to be: "I do
   not want to suppress flooding".

   If the "suppression-neighbor-request-state" changed for this
   adjacency, compared to the state after receiving the previous IIH,
   the router will re-compute the "suppression-state".  If the
   "suppression-state" of an adjacency changes, the router will start
   or stop flooding over that adjacency.
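   The per-adjacency state and its re-computation on IIH reception can
   be sketched in Python as follows.  The AdjState type, the handler
   name, and the "start"/"stop" return values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdjState:
    local_suppress: bool = False     # suppression-local-request-state
    neighbor_suppress: bool = False  # suppression-neighbor-request-state
    suppressed: bool = False         # resulting suppression-state


def on_iih_received(adj: AdjState, tlv_want_suppress: Optional[bool]) -> Optional[str]:
    """Update state from a received IIH; return 'start', 'stop' or None."""
    # No Flooding-Suppression TLV (None) means "do not want to suppress".
    new_neighbor = bool(tlv_want_suppress)
    if new_neighbor == adj.neighbor_suppress:
        return None  # neighbor request unchanged: nothing to re-compute
    adj.neighbor_suppress = new_neighbor
    new_state = adj.local_suppress and adj.neighbor_suppress
    if new_state == adj.suppressed:
        return None
    adj.suppressed = new_state
    return "stop" if new_state else "start"
```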

7.5.  When installing a new LSP in the LSDB

   When a router receives a new LSP, it installs it in the LSDB.
   Normally it then sets the IS-IS SRM (Send Routing Message) bits for
   all adjacencies in the UP state.  With the new algorithm, it sets
   SRM bits only for the adjacencies that are part of the reduced
   flooding topology.
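   A minimal Python sketch of the modified SRM selection follows.  The
   Adj type and the function name are hypothetical.  As in classic
   IS-IS flooding, the adjacency on which the LSP was received is
   excluded.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Adj:
    up: bool
    suppressed: bool  # resulting suppression-state of the adjacency


def srm_targets(adjs: Dict[str, Adj], received_on: str) -> Set[str]:
    """Adjacencies on which to set SRM after installing a newer LSP."""
    return {
        name for name, a in adjs.items()
        # Classic rule: every UP adjacency except the one the LSP arrived on.
        # Sparse flooding adds: skip adjacencies whose flooding is suppressed.
        if a.up and name != received_on and not a.suppressed
    }
```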

7.6.  Preventing loops in the flooding topology

   When the flooding topology changes, different routers can, for a
   short period of time, have different views of the flooding topology.
   This can make the flooding topology actually in use an arbitrary
   graph that may contain cycles, instead of an acyclic tree.  This is
   not a problem: the flooding algorithm in link-state protocols already
   handles this.  An LSP is only (scheduled to be) flooded the first
   time it is received and installed in the LSDB.
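   The following Python sketch (hypothetical names, simplified to a
   single LSP with one sequence number) floods an LSP over a flooding
   topology that contains a cycle.  Because a router re-floods only a
   copy that is newer than what it already has, flooding terminates and
   every router ends up with the LSP.

```python
from collections import deque
from typing import Dict, List, Tuple


def flood(graph: Dict[str, List[str]], origin: str, seqno: int) -> Tuple[Dict[str, int], int]:
    """Flood one LSP over a (possibly cyclic) flooding topology.

    Returns each router's stored sequence number and the total number
    of transmissions.
    """
    lsdb = {origin: seqno}
    queue = deque((origin, nbr) for nbr in graph[origin])
    sent = 0
    while queue:
        sender, receiver = queue.popleft()
        sent += 1
        if lsdb.get(receiver, -1) >= seqno:
            continue  # already installed: do not flood it again
        lsdb[receiver] = seqno
        # Re-flood to every neighbor except the one it came from.
        queue.extend((receiver, nbr) for nbr in graph[receiver] if nbr != sender)
    return lsdb, sent
```

   On a triangle of three routers, flooding an LSP from one of them
   takes four transmissions in total, after which all three routers
   hold the LSP and flooding stops.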

   The Sparse Link-State Flooding algorithm has some resemblance to the
   Spanning Tree Protocol (STP) used by transparent bridges.  Transient
   forwarding loops can be a serious problem in an STP network.
   However, while the flooding topology can contain loops for short
   periods of time, this is not a problem at all, because, as
   described in the previous paragraph, link-state flooding will take
   care of this by default.  This works because routers keep copies of
   the LSPs they forward in their LSDB, which allows them to determine
   whether they have received an LSP before.  In STP, bridges keep no
   record of the data frames that they have forwarded in the past, so
   looping frames cannot be recognized as looping.

   To improve convergence times during changes of the flooding topology
   it is recommended that when a router changes the state of an
   adjacency from flooding to non-flooding, both routers keep flooding
   over this adjacency for a short period of time.  A suggested value
   for this is 30 or 60 seconds.  By doing this, during changes of the
   flooding topology, both the old and the new topology will be in use.
   This guarantees that LSPs are flooded as quickly as possible.  This
   will also help in repairing the flooding topology itself.
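   A Python sketch of such a hold-down timer for one adjacency is shown
   below.  The class name and interface are hypothetical; the default
   of 60 seconds is one of the values suggested above.

```python
from typing import Optional


class FloodingHoldDown:
    """Keep flooding for a while after an adjacency becomes suppressed."""

    def __init__(self, hold_s: float = 60.0) -> None:
        self.hold_s = hold_s  # 30 or 60 seconds are the suggested values
        self.suppressed_since: Optional[float] = None

    def update(self, suppressed: bool, now: float) -> None:
        if suppressed and self.suppressed_since is None:
            self.suppressed_since = now  # flooding -> non-flooding starts now
        elif not suppressed:
            self.suppressed_since = None  # back to normal flooding

    def should_flood(self, now: float) -> bool:
        # Flood while not suppressed, or while the hold-down is still running.
        if self.suppressed_since is None:
            return True
        return now - self.suppressed_since < self.hold_s
```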

7.7.  Fall-back to classic full flooding

   When a router suspects that it might have fallen behind on flooding,
   it can always fall back to normal flooding behaviour by omitting the
   Flooding-Suppression TLV from its IIHs.  Classic flooding then
   guarantees synchronization of its IS-IS LSDB with all neighbors.
   This can be done on all adjacencies at once, or on a subset of them.

8.  Security Considerations

   This draft introduces no new security considerations.

9.  IANA Considerations

   This document requests a new TLV and sub-TLV for IS-IS.

10.  References

10.1.  Normative References

   [1]        Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997,
              <http://xml.resource.org/public/rfc/html/rfc2119.html>.

10.2.  Informative References

   [2]        International Standard 10589, "Intermediate System to
              Intermediate System intra-domain routeing information
              exchange protocol for use in conjunction with the protocol
              for providing the connectionless-mode network service (ISO
              8473), Second Edition.", 2002.

   [3]        Li, T. and P. Psenak, "Dynamic Flooding on Dense Graphs",
              June 2018.

   [4]        Smit, H. and G. Van De Velde, "IS-IS Flooding over TCP",
              October 2018.

Authors' Addresses

   Henk Smit (editor)
   NL

   Email: hhw.smit@xs4all.nl


   Gunter Van de Velde
   Nokia
   Copernicuslaan 50
   Antwerp
   BE

   Email: gunter.van_de_velde@nokia.com