Network Working Group                                      A. Przygienda
Internet-Draft                                                   Juniper
Intended status: Standards Track                                  Y. Lee
Expires: March 12, 2020                                        A. Sharma
                                                                 Comcast
                                                                R. White
                                                                 Juniper
                                                       September 9, 2019


                            Flood Reflectors
                  draft-przygienda-flood-reflector-00

Abstract

   This document provides specification of an optional ISIS extension
   that allows to create l2 flood reflector topologies independent of
   resulting forwarding within L1 areas when they are used as 'transit'
   to guarantee L2 connectivity between L2 "islands".

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 12, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Przygienda, et al.       Expires March 12, 2020                 [Page 1]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Description . . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Further Details . . . . . . . . . . . . . . . . . . . . . . .   6
   3.  Flood Reflection TLV  . . . . . . . . . . . . . . . . . . . .   7
   4.  Non-Forwarding Adjacency Sub-TLV  . . . . . . . . . . . . . .   7
   5.  Procedures  . . . . . . . . . . . . . . . . . . . . . . . . .   8
   6.  Adjacency Forming Procedures  . . . . . . . . . . . . . . . .   9
   7.  Special Considerations  . . . . . . . . . . . . . . . . . . .   9
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  10
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  10
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     11.1.  Informative References . . . . . . . . . . . . . . . . .  10
     11.2.  Normative References . . . . . . . . . . . . . . . . . .  11
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Description

   Due to the inherent properties of link-state protocols the number of
   IS-IS routers within a flooding domain is limited by processing and
   flooding overhead on each node.  While that number can be maximized
   by well written implementations and techniques such as exponential
   back-offs, IS-IS will still reach a saturation point where no further
   routers can be added to a single flooding domain.  In certain
   deployment scenarios of L2 backbones, this limit presents an
   obstacle.

   While the standard solution to increase the scale of an IS-IS
   deployement is to break it up into multiple L1 flooding domains and a
   single L2 backbone, and alternative way is to think about "multiple"
   L2 flooding domains connected via L1 flooding domains.  In such a
   solution, the L2 flooding domains are connected by "L1/L2 lanes"
   through the L1 areas to form a single L2 backbone again.  However, in
   the simplest implementation, this requires the inclusion of most, or
   all, of the transit L1 routers as L1/L2 to allow traffic to flow
   along optimal paths through such transit areas and with that


Przygienda, et al.       Expires March 12, 2020                 [Page 2]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   ultimately does not help to reduce number of L2 routers and increase
   the scalability of L2 backbone.


  +----+  +-------+            +-------+               +-------+  +----+
  | R1 |  |  00   +------------+  10   +---------------+  20   |  | R4 |
  | L2 +--+ L1/L2 |            |  L1   |               | L1/L2 +--+ L2 |
  |    |  |       +--------+ +-+       |  +------------+       |  |    |
  +----+  ++-+--+-+        | | +---+---+----------+    +-+--+-++  +----+
           | |  |          | | |   |   |  |       |      |  | |
           | |  |          | | |   |   |  |  +-----------+  | |
           | |  +-------+  | | |   |   |  |  |    |         | |
           | |          |  | | |   |   |  |  |    |  +------+ |
           | +------+ +--------+   |   +-------+  |  |        |
           |        | | |  | |     |      |  | |  |  |        |
  +----+  ++------+---+ |  +---+---+---+--+  | +-------+------++  +----+
  | R2 |  |  01   | |   |    | |  11   |     |    |  | |  21   |  | R5 |
  | L2 +--+ L1/L2 +------------+  L1   +---------------+ L1/L2 +--+ L2 |
  |    |  |       | |   |    | |       |     |    |  | |       |  |    |
  +----+  ++------+---+ |    | +---+--++     | +-------+------++  +----+
           |        | | |    | |   |  |      | |  |  |        |
           | +---------------+ |   |  |      | |  |  |        |
           | |      | | |      |   |  |      | |  |  |        |
           | |  +--------------+   |  +-----------------+     |
           | |  |   | | |          |         | |  |  |  |     |
  +----+  ++-+--+-+ | | +------+---+---+-----+ |  |  | ++-----++  +----+
  | R3 |  |  02   | +----------|  12   |       |  +----+  22   |  | R6 |
  | L2 +--+ L1/L2 |   +--------|  L1   +-------+     | | L1/L2 +--+ L2 |
  |    |  |       +------------+       |---------------+       |  |    |
  +----+  +-------+            +-------+-------------+ +-------+  +----+


                                 Figure 1

   Figure 1 is an example of a network where a topologically rich L1
   area is used to provide transit between six different routers in L2
   "partitions" (R1-R6).  To take advantage of the cornucopia of paths
   in the L1 transit, all the intermediate systems could be placed into
   both L1 and L2, but this essentially combines the separate L2
   flooding domains into a single one, triggering maximum L2 scale
   limitations again.

   A more effective solution would allow to reduce the number of links
   and routers exposed in L2, while still utilizing the full L1 topology
   when forwarding through the network.

   The mechanism described in [RFC8099] could be used in ISIS to build a
   full mesh of tunnels over the L1 transit, but a full mesh of tunnels


Przygienda, et al.       Expires March 12, 2020                 [Page 3]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   can also quickly limit the scaling.  The network in Figure 2 would
   expose 6 L1/L2 nodes and (5 * 6)/2 = 15 L2 tunnels.  In a slightly
   larger network, however, in a comparable topology containing 15 L1/L2
   edge nodes the number grows very quickly to 105 tunnels.


  +----+  +-------+    +-------------------------------+-------+  +----+
  | R1 |  |  00   |    |                               |  20   |  | R4 |
  | L2 +--+ L1/L2 +------------------------------------+ L1/L2 +--+ L2 |
  |    |  |       |    |                               |       |  |    |
  +----+  ++-+-+--+-+  |                             +-+--+---++  +----+
           | | |    |  |                             |    |   |
           | +----------------------------------------------+ |
           |   |    |  |                             |    | | |
           |   +-----------------------------------+ |    | | |
           |        |  |                           | |    | | |
           |     +----------------------------------------+ | |
           |     |  |  |                           | |      | |
  +----+  ++-----+- |  |                           | | -----+-++  +----+
  | R2 |  |  01   | |  |                           | | |  21   |  | R5 |
  | L2 +--+ L1/L2 +------------------------------------+ L1/L2 +--+ L2 |
  |    |  |       | |  |                           | | |       |  |    |
  +----+  ++------+------------------------------+ | | +----+-++  +----+
           |        |  |                         | | |      | |
           |        |  |                         | | |      | |
           |    +-------------------------------------------+ |
           |    |   |  |                         | | |        |
           |    |   |  |                         +----------+ |
           |    |   |  |                           | |      | |
           |    |   |  |                           +-----+  | |
           |    |   |  |                             |   |  | |
  +----+  ++----+-+-+  |                             +-+-+--+-++  +----+
  | R3 |  |  02   |    |                               |  22   |  | R6 |
  | L2 +--+ L1/L2 +------------------------------------+ L1/L2 +--+ L2 |
  |    |  |       |    |                               |       |  |    |
  +----+  +-------+----+                               +-------+  +----+


                                 Figure 2

   BGP, described in [RFC4271], faced a similar scaling problem, which
   has been solved in many networks by deploying BGP route reflectors,
   as described in [RFC4456].  And, to offer another crucial
   observation, BGP route reflectors do not necessarily need to be in
   the forwarding path.

   We suggest here a similar solution for IS-IS.  A good approximation
   of what a "flood reflector" approach would look like is shown in


Przygienda, et al.       Expires March 12, 2020                 [Page 4]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   Figure 3, where router 11 is used as 'reflector.'  All L1/L2 routers
   build an L2 tunnel to such reflectors, so we end up with only 6 L2
   tunnels instead of 15 of a full mesh.  Multiple such reflectors can
   be used, of course, allowing the network operator to balance between
   resilience, path utilization, and state in the control plane.  The
   resulting L2 tunnel scale is roughly R * n where R is the redundancy
   factor or in other words, number of flood reflectors used.  This
   compares quite favorably with n^2 / 2 tunnels used in a fully meshed
   L2 solution.


  +----+  +-------+                                    +-------+  +----+
  | R1 |  |  00   |                                    |  20   |  | R4 |
  | L2 +--+ L1/L2 +--------------+   +-----------------+ L1/L2 +--+ L2 |
  |    |  |       |              |   |                 |       |  |    |
  +----+  +-------+              |   |                 +-------+  +----+
                                 |   |
  +----+  +--------            --+---+--               --------+  +----+
  | R2 |  |  01   |            |  11   |               |  21   |  | R5 |
  | L2 +--+ L1/L2 +------------+  L1/L2+---------------+ L1/L2 +--+ L2 |
  |    |  |       |            |  FR   |               |       |  |    |
  +----+  +-------+            +-+---+-+               +-------+  +----+
                                 |   |
  +----+  +-------+              |   |                 +-------+  +----+
  | R3 |  |  02   +--------------+   +-----------------+  22   |  | R6 |
  | L2 +--+ L1/L2 |                                    | L1/L2 +--+ L2 |
  |    |  |       |                                    |       |  |    |
  +----+  +-------+                                    +-------+  +----+


                                 Figure 3

   This proposal, however, without further qualification would
   concentrate forwarded traffic at router 11.  It would be hence
   desirable to decouple the forwarding plane from the control plane, so
   router 11 can reflood information without being placed in the
   forwarding path (hence router 11 would not end up being a forwarding
   plane bottleneck).  To achieve that goal, multiple pieces will be
   necessary, only one of which is a local protocol extension on the L1/
   L2 leafs and the 'flood reflectors'.  In first approximation these
   extensions include:

   o  A full mesh of L1 tunnels between the L1/L2 routers, ideally load-
      balancing across all available L1 links.  This harnesses all
      forwarding paths between the L1/L2 edge nodes without injecting
      unneeded state into the L2 flooding domain or creating 'choke
      points' at the 'flood reflectors.'


Przygienda, et al.       Expires March 12, 2020                 [Page 5]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   o  A 'non-forwarding adjacency' for all the adjacencies built for the
      purpose of reflecting flooding information.  This allows these
      'flood reflectors' to participate in the IS-IS control plane
      without being used in the forwarding plane.  This is a purely
      local operation on the L1/L2 ingress; it does not require
      replacing or modifying any routers not involved in the reflection
      process.

   o  Some system to support reflector redundancy, and potentially some
      way to auto-discover and advertise such adjacencies as non-
      forwarding.  This may allow L2 nodes outside the L1 to perform
      optimizations in the future based on this information.

2.  Further Details

   Several considerations should be noted in relation to such a flood
   reflection mechanism.

   First, this allows multi-area IS-IS deployments to scale without any
   major modifications in the IS-IS implementation on most of the nodes
   deployed in the network.  Unmodified (traditional) L2 routers will
   compute reachability across the transit L1 area using the non-
   forwarding adjacencies.

   Second, the flooding reflectors are not required to participate in
   forwarding traffic through the L1 transit area.  These flooding
   reflectors can be hosted on virtual devices outside the forwarding
   topology.

   Third, astute readers will realize that flooding reflection may cause
   the use of suboptimal paths.  This is similar to the BGP route
   reflection suboptimal routing problem described in
   [ID.draft-ietf-idr-bgp-optimal-route-reflection-19].  The L2
   computation determines the egress L1/L2 and with that can create
   illusions of ECMP where there is none.  And in certain scenarios lead
   to an L1/L2 egress which is not globally optimal.  This represents a
   straightforward instance of the trade-off between the amount of
   control plane state and the optimal use of paths through the network
   often encountered when aggregating routing information.

   One possible solution to this problem is to expose additional
   topology information into the L2 flooding domains.  In the example
   network given, links from router 01 to router 02 can be exposed into
   L2 even when 01 and 02 are participating in flood reflection.  This
   information would allow the L2 nodes to build 'shortcuts' when the L2
   flood reflected part of the topology looks more expensive to cross
   distance wise.


Przygienda, et al.       Expires March 12, 2020                 [Page 6]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   Another possible variation is for an implementation to approximate
   with the L1 tunnel cost the cost of the underlying topology.

   Redundancy in the solution is trivial to achieve by building multiple
   flood reflectors into the L1 area while all reflectors are still
   remaining completely stateless and do not need any kind of
   synchronized algorithms amongst themselves except standard ISIS
   flooding procedures and database.

3.  Flood Reflection TLV

   The Flood Reflection TLV is indicating the participation of a node as
   reflector and/or client.  It is included in L1 area scope flooded
   LSPs and on L1 and L2 IIH.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Priority    | FR Cluster ID |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Type:  TBD

   Length  The length, in octets, of the following fields.

   Reflector Priority  Priority of the router to act as flood reflector
      in the cluster.  A value of 0 indicates that the router is a
      client in the cluster.  Any value higher than 0 indicates
      preference to be a flood reflector.  Higher values are to be
      preferred by clients.

   FR Cluster ID  Flood Reflector Cluster Identifier to allow a node to
      participate in possibly multiple clusters.

4.  Non-Forwarding Adjacency Sub-TLV


Przygienda, et al.       Expires March 12, 2020                 [Page 7]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FR Cluster ID |
   +-+-+-+-+-+-+-+-+


   Type:  TBD

   Length  The length, in octets, of the following fields.

   FR Cluster ID  Flood Reflector Cluster Identifier to which this NFA
      belongs.

5.  Procedures

   There are a number of points to consider when implementing and
   deploying this solution, including:

      A router participating in flood reflection MUST be configured as
      L1L2 router.  It originates the Flood Reflection TLV with area
      flooding scope in L1 only.  Normally routers on the edge of the
      area, i.e. with non-FR L2 adjacencies, will advertise themselves
      as clients.  Any L1L2 non-client router in the area can act as FR.

      A flood reflector can participate in a single cluster only, the
      clients are free to participate in multiple clusters at the same
      time.

      Upon reception of a Flood Reflection TLV router acting as client
      (in case it doesn't have such L2 adjacencies already) MUST
      initialize tunnels towards all the FRs with highest priority and
      MAY initiate such tunnels to FRs with lower priority.  L2 over
      such tunnels MUST be marked as non-forwarding adjacencies.  If the
      client has a direct L2 adjacency with the flood reflector it
      SHOULD use it instead of instantiating a tunnel.

      Upon reception of a Flood Reflection TLV router acting as client
      in case it doesn't have such direct L1 adjacencies already SHOULD
      initialize tunnels towards all the other clients in the its
      clusters.  L1 *only* adjacencies SHOULD be built over such tunnels
      to ensure their liveliness, but other means can be used (since
      those adjacencies are used for L1 forwarding, it is prudent to
      advertise them into L1 as forwarding links).


Przygienda, et al.       Expires March 12, 2020                 [Page 8]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


      On the reflection client, after L2 and L1 computation, all non-
      forwarding adjacencies used as next-hops for L2 routes MUST be
      examined and replaced with the correct L1 tunnel next-hop to the
      egress.  Due to the rules in Section 6 the computation in the
      resulting topology is relatively simple, the L2 SPF from a flood
      reflector client is guaranteed to reach within a hop the FR and in
      the following hop the L2 egress to which it has a L1 forwarding
      tunnel.  However, if the topology has L2 paths which are not route
      reflected and look "shorter" than path through the FR then the
      computation will have to track the egress out of the L1 domain by
      a more advanced algorithm.

      A node, when advertising the L2 NFA SHOULD include the Non-
      Forwarding Adjacency Sub-TLV in Extended IS reachability TLV and
      MT-ISN TLV.

6.  Adjacency Forming Procedures

   To ensure loop-free routing the ingress routers MUST follow normal L2
   computation to generate L2 routes.  This is because nodes outside the
   L1 area may not be aware that flooding reflection is performed.  The
   resulting short cuts through the L1 area needs to be able to easily
   calculate the egress L1/L2 router where the tunnel tail-end is
   located.

   To prevent complex scenarios of flood reflectors building L2
   adjacencies within a cluster or across clusters or hierarchies of
   reflectors, a flood reflector MUST never form an L2 adjacency with a
   peer if the peer is not a client in the same Cluster ID.  This
   ensures a L2 computation on an ingress link or adjacency following a
   non-forwarding adjacency will always traverse a client of the flood
   reflector to exit the flooding domain.  This allows shortcuts through
   the L1 area to be used without any danger of forwarding loops.

   Depending on pseudo-node choice in case of a broadcast domain with
   multiple flood reflectors attached this can lead to a partitioned LAN
   and hence a router discovering such a condition MUST initiate an
   alarm and declare misconfiguration.

7.  Special Considerations

   In pathological cases setting the overload bit in L1 (but not in L2)
   can partition L1 forwarding, while allowing L2 reachability through
   non-forwarding adjacencies to exist.  In such a case a node cannot
   replace a route through non-forwarding adjacency with a L1 shortcut
   and the client can use the L2 tunnel to the flood reflector for
   forwarding while it MUST initiate an alarm and declare
   misconfiguration.


Przygienda, et al.       Expires March 12, 2020                 [Page 9]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   A flood reflector with directly L2 attached prefixes should advertise
   those in L1 as well since based on preference of L1 routes the
   clients will not try to use the L2 non-forwarding adjacency to route
   the packet towards them.  A very, very corner case is when the flood
   reflector is reachable via L2 non-forwarding adjacency (due to
   underlying L1 partition) only in which case the client can use the L2
   tunnel to the flood reflector for forwarding towards those prefixes
   while it MUST initiate an alarm and declare misconfiguration.

   Instead of modifying the computation procedures one could imagine a
   flood reflector solution where the FR would re-advertise the L2
   prefixes with a 'third-party' next-hop but that would have less
   desirable convergence properties than the solution proposed and force
   a fork-lift of all L2 routers to make sure they disregard such
   prefixes unless in the same L1 domain as the FR.

8.  IANA Considerations

   This document will request IANA to assign new TLV type value in the
   ISIS TLV Codepoints registry.

   This document will request IANA to assign new TLV type value in the
   'Sub-TLVs for TLVs 22, 23, 25, 141, 222, and 223 (Extended IS
   reachability, IS Neighbor Attribute, L2 Bundle Member Attributes,
   inter-AS reachability information, MT-ISN, and MT IS Neighbor
   Attribute TLVs)' registry.

9.  Security Considerations

   This document introduces no new security concerns to ISIS or other
   specifications referenced in this document.

10.  Acknowledgements

   Thanks to Shraddha and Chris Bowers for thorough review.

11.  References

11.1.  Informative References

   [ID.draft-ietf-idr-bgp-optimal-route-reflection-19]
              Raszuk et al., R., "BGP Optimal Route Reflection", July
              2019.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006,
              <https://www.rfc-editor.org/info/rfc4271>.


Przygienda, et al.       Expires March 12, 2020                [Page 10]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006,
              <https://www.rfc-editor.org/info/rfc4456>.

   [RFC8099]  Chen, H., Li, R., Retana, A., Yang, Y., and Z. Liu, "OSPF
              Topology-Transparent Zone", RFC 8099,
              DOI 10.17487/RFC8099, February 2017,
              <https://www.rfc-editor.org/info/rfc8099>.

11.2.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

Authors' Addresses

   Tony Przygienda
   Juniper
   1137 Innovation Way
   Sunnyvale, CA
   USA

   Email: prz _at_ juniper.net


   Yiu Lee
   Comcast
   1800 Bishops Gate Blvd
   Mount Laurel, NJ  08054
   US

   Email: Yiu_Lee _at_ comcast.com


   Alankar Sharma
   Comcast
   1800 Bishops Gate Blvd
   Mount Laurel, NJ  08054
   US

   Email: Alankar_Sharma _at_ comcast.com


Przygienda, et al.       Expires March 12, 2020                [Page 11]

Internet-Draft      draft-przygienda-flood-reflector      September 2019


   Russ White
   Juniper
   1137 Innovation Way
   Sunnyvale, CA
   USA

   Email: russw _at_ juniper.net


Przygienda, et al.       Expires March 12, 2020                [Page 12]