LSR Working Group H. Smit, Ed. Internet-Draft Intended status: Standards Track G. Van de Velde Expires: April 25, 2019 Nokia October 22, 2018 IS-IS Sparse Link-State Flooding draft-hsmit-lsr-isis-dnfm-00 Abstract This document describes a technology extension to reduce link-state flooding in highly resilient dense networks. It does this by using simple and backwards-compatible extensions to reduce the number of adjacencies over which link-state flooding takes place. "IS-IS Sparse Link-State Flooding" is an extension to the IS-IS routing protocol. It is relatively easy to understand and implement. It is backwards compatible. It requires no per-node configuration. It uses a distributed algorithm, therefor no centralized computations are required. No complex computations are required on each node in the network. The algorithm has no requirements for the network topology. It can be deployed in a redundant way to improve robustness and convergence-times. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1]. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Smit & Van de Velde Expires April 25, 2019 [Page 1] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 This Internet-Draft will expire on April 25, 2019. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. High level overview of Sparse Link-State Flooding . . . . . . 3 3. The Sparse Link-State Flooding algorithm in detail . . . . . 4 3.1. Role of the Anchor . . . . . . . . . . . . . . . . . . . 4 3.2. Bootstrapping the flooding . . . . . . . . . . . . . . . 4 3.3. Determining which adjacency a router wants to flood over 5 3.4. Determining where flooding can be suppressed . . . . . . 5 4. Using multiple concurrent flooding topologies . . . . . . . . 7 5. Benefits of the Sparse Link-State Flooding algorithm . . . . 7 6. Extensions to IS-IS PDUs . . . . . . . . . . . . . . . . . . 8 6.1. Anchor TLV in LSPs . . . . . . . . . . . . . . . . . . . 8 6.2. Flooding-Suppression TLV in IIHs . . . . . . . . . . . . 8 7. Operations of the new Sparse Link-State Flooding algorithm . 9 7.1. Flooding at the anchor itself . . . . . . . . . . . . . . 9 7.2. New action after each SPF . . . . . . . . . . . . . . . . 9 7.3. When sending a IIH . . . . . . . . . . . . . . . . . . . 10 7.4. When receiving a IIH . . . . . . . . . . . . . . . . . . 10 7.5. When installing a new LSP in the LSDB . . . . . . . . . . 10 7.6. Preventing loops in the flooding topology . . . . . . . . 10 7.7. Fall-back to classic full flooding . . . . . . . . . . . 11 8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 10.1. Normative References . . . . . . . . . . . . . . . . . . 11 10.2. Informative References . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 Smit & Van de Velde Expires April 25, 2019 [Page 2] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 1. Introduction In dense network topologies, using massive ECMP or massive numbers of resilient links, the flooding algorithm of link-state protocols is highly redundant. This results in unnecessary overhead, potentially overloading control planes, decreasing robustness and slowing down convergence. Because of this percepted inefficiency, some operators have resorted to using BGP as the IGP in their data center networks. Draft-li-dynamic-flooding [3] describes this in more detail. However it is very clear that using an Exterior Gateway Protocol as an IGP is sub-optimal, if only due to the configuration overhead. This document proposes a technology extension to reduce the number of interfaces over which a link-state protocol floods its updates in highly resilient networks. The result is a sparse flooding topology over a dense physical network topology. We describe details how to implement this algorithm for the IS-IS protocol [2]. This algorithm can be extended to other link-state routing protocols, like OSPF. However, no details for protocols other IS-IS are included in this document. This proposal uses simple and backwards-compatible extensions. It is easy to understand and and relatively easy to implement for IS-IS coders. These proposed IS-IS extensions do not require additional configuration on every router. However, it might be beneficial for the operation of the algorithm to manually configure one or more routers as "anchors" in the network. The purpose of an "anchor" is explained in the next section of this document. This extension uses a distributed algorithm. No centralized calculations need to be performed. Each pair of routers decide for themselves where flooding can be suppressed. After ever regular SPF computation a router can adjust the interfaces over which it does flooding. This decision requires no computational-complex calculations. 2. High level overview of Sparse Link-State Flooding The goal of the new Sparse Link-State Flooding algorithm is to create a tree of nodes and links, over which updates will be flooded. This tree is called "the flooding topology". The flooding topology includes all the nodes in the network. But it includes only a (small) subset of all available links in the physical network. The idea is that the flooding topology starts at a single router in the network. This single router is called "the anchor". Routers that are adjacent to the anchor will "attach" or "clamp" themselves to the flooding topology. Making the flooding topology bigger. Their neighbors will "attach" themselves as well, making the flooding topology spread out. In the end all routers will be part of the Smit & Van de Velde Expires April 25, 2019 [Page 3] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 flooding topology. The flooding topology resembles a tree, with the anchor as the root of the tree. The decision to flood or not flood over an adjacency is a local matter. This makes the algorithm a distributed algorithm. The flooding topology itself is not flooded through the network. Only the location of the anchor(s) is announced in LSPs. An anchor announces itself by including this information in its LSP. Two adjacent routers determine whether they need to exchange LSPs or not via a mechanism using a new TLV in hello PDUs (IIHs). This algorithm can be run once, or multiple times in parallel. This creates one or more concurrent flooding topologies. This provides robustness and faster convergence to the flooding process. We envision that anchors are configured manually, like BGP's Route Reflectors. Or they can be elected automatically. For this the anchor-TLV contains a priority field, to allow operators to have influence on the location of the anchor(s). 3. The Sparse Link-State Flooding algorithm in detail 3.1. Role of the Anchor Each flooding topology needs a root of its tree. The router acting as root is called "the anchor" of a flooding topology. An anchor router includes information in its LSP to announce that it wants to function as an anchor. This information can be encoded as a new TLV, or as a new capability in the existing IS-IS capability TLV. This choice is open for discussion. The content of this new TLV includes a priority. If multiple routers advertise their willingness to act as an anchor, the anchor with the highest priority is chosen as the anchor. If multiple potential anchors have the same priority, then the router with the highest system-id is chosen as the anchor. Besides announcing itself as an anchor in its LSP, the role of the anchor-route is purely passive. No extra actions are required of the anchor. 3.2. Bootstrapping the flooding When a router boots, or when a new adjacency comes up, routers need to synchronize their LSDBs. The reason is that a network could have been partitioned in two separate parts. And flooding over the new adjacency might be the only way to make the two parts of the network aware of each other. Smit & Van de Velde Expires April 25, 2019 [Page 4] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 After the LSDBs are synchronized, and at least one SPF computation has been executed, the new algorithm can be used. An implementation could use a longer grace period to wait before using the new algorithm, to ensure all or most of the LSPs in a network have been received. 3.3. Determining which adjacency a router wants to flood over The decision to do regular flooding, or suppress flooding, is done as follows. After each SPF computation, a router looks at the newly computed route towards the anchor. Each router wants to do flooding over the adjacency to a router that is closer to the anchor than it is itself. This guarantees that each router will do flooding with a router that is already part of the flooding topology. If there are multiple (equal-cost) paths towards the anchor, one of the next-hop adjacencies of the route towards the anchor is chosen to flood over. It doesn't matter which adjacency that is, as long as the adjacent router is closer to the anchor. When the flooding topology breaks, the two routers next to the point of breakage will notice. They will each generate a new LSP. And they will send out that new LSP over the old flooding topology. The LSP generated by the router that is still reachable through the old flooding topology will be received by all routers on their side of the breakage. This will trigger new SPF computations on all those routers. This SPF computation will compute a new path towards the anchor. The routers will now adjust their flooding topology according to the new path they have just computed. All routers in the network do this. New LSPs will be flooded over the new flooding topology. Which might trigger a follow-up SPF computation. Which might cause routers to adjust their flooding topology again. After a while all routers will have received all new LSPs. Which will guarantee that they will all compute a new correct flooding topology. A requirement is that when routers start using an adjacency for their flooding topology, they need to synchronize LSDBs first. This is done by exchanging CSNPs. This can potentially be done more reliable and faster when doing IS-IS Flooding over TCP [4]. 3.4. Determining where flooding can be suppressed The decision whether to flood over an adjacency or not is a local matter. Only the two routers of the adjacency are involved in this decision. Both routers have a say in whether flooding will be suppressed or not. Smit & Van de Velde Expires April 25, 2019 [Page 5] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 This document defines a new TLV, called the Flooding-Suppression TLV, to be included in Hello PDUs (IIHs). This new TLV includes a field that indicates whether a router wants to do flooding over this interface, or wants to suppress flooding. The content of this TLV is set according to the decision made after each SPF, as explained in the previous section of this document. As a result, a router keeps two new pieces of state for each adjacency. o Does the router itself want to flood over this adjacency ? We'll call this the adjacency's "suppression-local-request-state". o Does the neighbor want to flood over this adjacency ? We'll call this the adjacency's "suppression-neighor-request-state". The suppression-local-request-state is determined after each SPF computation. The suppression-neighbor-request-state is learned from examining the Flooding-Suppression TLV in each received IIH. If a router did not include the new Flooding-Suppression TLV in its IIH, it is assumed that the neighbor does want to flood over this adjacency. When both "suppression-local-request-state" and "suppression- neighbor-request-state" are true, then the overall "suppression- state" of the interface is set to true. In that case flooding over the interface is to be suppressed. In all 3 other cases, where at least one of the two routers does not want to suppress flooding, flooding is done in the normal way. So flooding over an adjacency is only suppressed when both neighbors have indicated that they want to suppress flooding over the adjacency. This means that when one of the two routers does not support this new algorithm, and thus does not include the new TLV in its IIH, flooding is always done. This makes the algorithm backwards compatible with routers that do not support this new extension of the protocol. A router will always have one or more flooding adjacencies. One adjacency that the router itself needs, to "clamp" on to the part of the flooding topology that is closer to the anchor than it is itself. This adjacency points towards the anchor. And zero or more adjacencies that its neighbors, downstream of the anchor, use to clamp themselves onto the flooding topology. These adjacencies point away from the anchor. Smit & Van de Velde Expires April 25, 2019 [Page 6] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 4. Using multiple concurrent flooding topologies It is possible to use more than one flooding topology in parallel. This requires more than one anchor. For each anchor a new flooding topology is built. These flooding topologies can co-exist without problems. All that is required is that after each SPF computation, the router examines the shortest path to each anchor. And sets the local state of each adjacency according to this. This guarantees that the router will "clamp onto" each flooding topology. To ensure an optimal use of parallel flooding topologies, all routers in an IS-IS flooding domain (area or level-2 backbone) should use the same number of parallel flooding topologies. This can be done through configuration. Or an easier way would be to include the number of parallel flooding topologies to use, inside the new Anchor TLV. When looking for Anchors, a router must first find all LSPs with the new Anchor TLV. It then selects the router with the highest Anchor-priority as the main anchor. If multiple router use the same priority, the router with the highest system-id is selected as the anchor. Once the main anchor has been determined, a router looks inside the new anchor-TLV to determine how many parallel flooding topologies it should use. It then selects that amount of anchors with the highest priorities, to set the flooding-state of adjacencies pointing towards those anchors. Flooding suppression is a local matter. Therefore an implementation can decide to flood over more adjacencies than the minimum to build the minimal flooding topology. It can signal this through the Flooding-Supression TLV in its IIHs. This can improve robustness and convergence times, at the cost of some extra flooding overhead. 5. Benefits of the Sparse Link-State Flooding algorithm The algorithm described in this document has a number of advantages. o The algorithm is a distributed algorithm. Distributed algorithms are usually more robust than centralized algorithms. The flooding topology itself does not need to be flooded, which makes the algorithm easier when the flooding topology breaks. o The algorithm is backwards compatible. No flag-day is required to introduce this new sparse-flooding extension. Older routers that do not support the new extension will obviously not include the flooding-state TLV in their IIHs. The result of this is that regular flooding is done over all adjacencies of those older Smit & Van de Velde Expires April 25, 2019 [Page 7] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 routers. This guarantees that older routers will never break the flooding topology. o No extra computations have to be done to compute the flooding topology. Using the result of the regular SPF computation suffices to determine over which adjacencies a router wants to flood. o The proposed algorithm is robust and guarantees that a flooding topology eventually heals so that all routers are included in the flooding again. o Several instances of the algorithm can be run in parallel. This results in multiple parallel flooding topologies. Although parallel flooding topologies are not required for correct operation of the algorithm, it will help in speeding up the healing of the flooding topology. And thus convergence times in general. 6. Extensions to IS-IS PDUs To implement this algorithm, we need two extensions of IS-IS PDUs. 6.1. Anchor TLV in LSPs A new Anchor TLV in the LinkState PDUs. This TLV indicates that a router can be used as an anchor. This new TLV must include a priority field. And it should include a field that suggests how many parallel flooding topologies all routers should use. 6.2. Flooding-Suppression TLV in IIHs A new Flooding-Suppression TLV in the IIH PDUs. This TLV is used to indicate to the neighbor if a router wants to suppress flooding over the adjacency. This new TLV holds three fields: o Flooding suppression suggestion field: this field indicates whether the sending router would like to suppress flooding over this interface or not. The value of this field is set to the current "suppression-local-request-state". Note, only when two routers both indicate they want to suppress flooding, then flooding will indeed be suppressed. o Resulting actual suppression field: this field indicates whether the sending router will or will not do flooding. The value of this field is set to the current "suppression-state" of the interface. This field is included only for debugging purposes. The first field (the received suppression-local-request-state Smit & Van de Velde Expires April 25, 2019 [Page 8] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 field) is used to make the flooding decision. The result of that decision is announced in the second field. o The number of currently active flooding adjacencies. This field can be used by the receiving router to pick a flooding adjacency when there are multiple ECMP paths towards the anchor. A router can pick the upstream router with the least amount of flooding adjacencies. In dense networks with many parallel paths, this can help spreading out the load of flooding equally over multiple routers. Backward compatibility: when a router does not include the Flooding- State TLV in the IIHs it sends out, it can be treated as if that router included the Flooding-State TLV while setting the first field to: "I do not want to suppress flooding". 7. Operations of the new Sparse Link-State Flooding algorithm 7.1. Flooding at the anchor itself When a router is acting as the anchor, it floods over all its interfaces. It does include the Flooding-Suppression TLV in its IIHs, but it always sets the value inside the new TLV to "I do not want to suppress flooding". 7.2. New action after each SPF At the end of each SPF computation, a router looks at the best-path to reach the anchor-router. The router sets the "suppression-local- request-state" for that adjacency to false. The router sets the "suppression-local-request-state" for all other adjacencies to true. If the best-path to the anchor-router's is load-balanced over multiple adjacencies, the router picks one of those adjacencies as its own "upstream flooding adjacencies". A router must take effort to ensure it changes its "upstream flooding adjacency" as little as possible. Switching its upstream flooding adjacency is not without cost. Every time an adjacency changes from suppressed flooding to normal flooding, the LSDBs of the two routers must be synchronized. If the "suppression-local-request-state" changed for one or more adjacencies, compared to the state after the previous SPF computation, the router will re-compute the "suppression-state". If the "suppression-state" of an adjacency changes, the router will start or stop flooding over that adjacency. Smit & Van de Velde Expires April 25, 2019 [Page 9] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 7.3. When sending a IIH When a router sends an IIH, it includes the new Flooding-Suppression TLV. For adjacencies that were selected as "upstream flooding adjacency", the value of the Flooding-Suppression TLV must be set to: "I do not want to suppress flooding". For all other adjacencies the value must be set to: "I do want to suppress flooding". 7.4. When receiving a IIH When a router receives an IIH, it checks for the existence of the new Flooding-Suppression TLV. If it there is none, the state of the neighbor is assumed to be: "I do not want to suppress flooding". If the "suppression-remote-request-state" changed for this adjacency, compared to the state after receiving the previous IIH, the router will re-compute the "suppression-state". If the "suppression-state" of an adjacency changes, the router will start or stop flooding over that adjacency. 7.5. When installing a new LSP in the LSDB When a router receives a new LSP, it installs it in the LSDB. It will normally then set the IS-IS SRM (Send Routing Message) bits for all adjacencies (in UP state). Now, with the new algorithm, it will set SRM-bits for only the adjacencies that are part of the reduced flooding topology. 7.6. Preventing loops in the flooding topology When the flooding topology changes, during a short period of time different routers can have a different view of the flooding topology. This can make the actual flooding topology in use be a random cyclic graph, instead of a non-cyclic tree. This is not a problem. The flooding algorithm in link-state protocols deals with this by default. An LSP is only (scheduled to be) flooded the first time it is received and installed in the LSDB. The Sparse Link-State Flooding algorithm has some resemblance to the Spanning Tree Protocol used by transparent bridges. Transient forwarding loops can be a huge problem in the operation of a SPT network. However, while the flooding topology can be looping for short periods of time, this is not a problem at all. Because as described in the previous paragraph, link-state flooding will take Smit & Van de Velde Expires April 25, 2019 [Page 10] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 care of this by default. This works because routers keep copies of the LSPs they forward in their LSDB. This allows them to determine if they have received an LSP before or not. In STP routers have no recollection of data-frames that they have forwarded in the past. So in STP looping frames can not be recognized as looping. To improve convergence times during changes of the flooding topology it is recommended that when a router changes the state of an adjacency from flooding to non-flooding, both routers keep flooding over this adjacency for a short period of time. A suggested value for this is 30 or 60 seconds. By doing this, during changes of the flooding topology, both the old and the new topology will be in use. This guarantees that LSPs are flooded as quickly as possible. This will also help in repairing the flooding topology itself. 7.7. Fall-back to classic full flooding When a router thinks it might have got behind on flooding, it can always fall back to normal flooding behaviour. It omits including the Flooding-Suppression TLV from its IIHs. Consequently, classic flooding will allow guaranteed synchronization of its IS-IS LSDB with all neighbors. This can be done on all adjacencies at once, or on subset. 8. Security Considerations This draft introduces no new security considerations. 9. IANA Considerations This document requests a new TLV and sub-TLV for IS-IS. 10. References 10.1. Normative References [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, . 10.2. Informative References [2] International Standard 10589, "Intermediate System to Intermediate System intra- domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473), Second Edition.", 2002. Smit & Van de Velde Expires April 25, 2019 [Page 11] Internet-Draft IS-IS Sparse Link-State Flooding October 2018 [3] Li, T. and P. Psenak, "Dynamic Flooding on Dense Graphs", June 2018. [4] Smit, H. and G. Van De Velde, "IS-IS Flooding over TCP", October 2018. Authors' Addresses Henk Smit (editor) NL Email: hhw.smit@xs4all.nl Gunter Van de Velde Nokia Copernicuslaan 50 Antwerp BE Email: gunter.van_de_velde@nokia.com Smit & Van de Velde Expires April 25, 2019 [Page 12]