OSPF                                                          A. Smirnov
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                          April 10, 2015
Expires: October 12, 2015

         OSPF for large-scale networks with regular topologies
                       draft-smirnov-ospf-dive-01

Abstract

Many popular topologies for large-scale networks have a highly regular
structure with a distinctive design pattern. Examples of such
topologies include hub-and-spoke (also known as "star"), common in
enterprise WAN networks, and fat-tree and Clos topologies, common in
datacenters. For a number of reasons, distance-vector protocols perform
better than OSPF in such large-scale networks. On the other hand,
network backbones have no highly regular topology pattern, and there
OSPF outperforms distance-vector protocols. As a result, large-scale
networks frequently employ different routing protocols in different
regions of the network, complicating network operations. This document
proposes OSPF extensions to improve scalability of routing for
large-scale networks.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is at
http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on October 12, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
Smirnov                 Expires October 12, 2015                [Page 1]
Internet-Draft    OSPF Routing in large-scale networks        April 2015

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1.  Introduction
2.  Requirements Language
3.  Problem definition
  3.1.  Typical regular network topologies
    3.1.1.  Hub-and-spoke topology
    3.1.2.  Fat-tree topology
    3.1.3.  Clos topology
  3.2.  Problems with OSPF routing in large-scale networks
4.  Solution requirements
5.  Functional summary
6.  Protocol Details
  6.1.  The DIVE area
  6.2.  Hello packets and the database exchange on DIVE area interfaces
  6.3.  LSA generation into the DIVE area
    6.3.1.  Metric Sub-TLV
  6.4.  SPF calculation in the DIVE area
  6.5.  Translation of LSAs and route propagation
    6.5.1.  Hub routers: Propagation of routes from the core network
            into the DIVE area
    6.5.2.  Hub routers: Propagation of routes from the DIVE area into
            the backbone area
    6.5.3.  Hub routers: Propagation of routes from the DIVE area into
            the non-backbone area
    6.5.4.  Route propagation on Spoke routers
7.  Other considerations for the DIVE area
  7.1.  Routing considerations
  7.2.  LSDB size considerations
  7.3.  Optimal DIVE area design
8.  Backward Compatibility
9.  Security Considerations
10. IANA Considerations
11. Acknowledgements
12. References
  12.1.  Normative References
  12.2.  Informative References
Author's Address

1.  Introduction

OSPF is a link-state protocol which was designed to provide routing in
networks of arbitrary topology.

Big modern networks may have thousands of routers providing the same
type of service. To simplify network design and operations, as well as
to unify hardware and software configurations of routers, such networks
are frequently built by replicating a basic design element hundreds or
thousands of times. The resulting network has a highly regular topology
exhibiting a distinctive pattern. Such regular designs include the
hub-and-spoke topology common in enterprise networks, and the
"fat-tree" and Clos topologies common in data center networks.

Running routing protocols in such networks poses a number of problems
arising mostly from the very large number of routers in the network.
On the other hand, the regular pattern of the topology allows certain
simplifications. OSPF (and link-state protocols in general) can be used
to provide routing in networks with regular topologies, but it does not
make any use of the regularity. This makes OSPF especially vulnerable
to the effects of large scale.

Real-life networks combine regions of regular topologies with
(smaller-scale) regions of free topology where OSPF works best.
Continuing the examples above, these are the headquarters (HQ) network
of the enterprise or interconnections between datacenters. For
operational simplicity it is desirable to have the same routing
protocol running in both parts of the network.

The present document specifies extensions to OSPF to improve its
scalability in very large-scale networks with regular topologies.

Section 3 of this document details typical problems seen if OSPF is
used as the routing protocol in large-scale networks with regular
topologies. Section 4 lists requirements for the routing solution.
Section 5 gives a brief overview of the solution. Section 6 defines
details of the new protocol behavior.

2.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

3.  Problem definition

3.1.  Typical regular network topologies

3.1.1.  Hub-and-spoke topology

Hub-and-spoke topology (also frequently called a "star" topology) comes
in many variants, but they all possess a set of common properties that
allow them to be grouped into a single category:

o  A few (usually one or two) centrally located routers which on one
   side connect to each remote site by a point-to-point Layer-2
   connection and on the other side connect to the network backbone.
   These routers are usually referred to as 'hubs'.
o  A remote site may have a single router, possibly dual-homed to hub
   routers, or multiple routers each connecting to a hub router. These
   routers are frequently referred to as 'spokes'. Internally the
   site's network may consist of multiple devices, but typically it is
   relatively small and simple. To the outside world the site's
   internal reachability is described by just a few prefixes.

o  The connection between hubs and spokes is the only connection
   between a site and the rest of the network. All traffic going in or
   out of the site flows through the hubs. Inter-spoke traffic is
   minimal and may even be blocked, for example for security reasons.

o  A hub-and-spoke network usually provides some sort of redundancy.
   Redundant hub routers and redundant connections to sites are common;
   redundant spoke routers are not unusual.

o  The number of sites/spokes may be very large, requiring each hub
   router to handle thousands of spoke neighbors.

o  All sites share the same routing administrative policy, i.e. the
   same route filtering and summarization rules.

On hub routers, hub-and-spoke Layer-2 connections may be presented to
OSPF either as a single point-to-multipoint interface with a large
number of neighbors or as multiple point-to-point Layer-3 interfaces.

Examples of Layer-2 technologies commonly used to create hub-and-spoke
networks are (from historical to modern technologies): SMDS, ISDN,
Frame-Relay, IPsec VPN tunnels, VPLS.

Hub-and-spoke topology is the most common topology for building
enterprise WAN networks. It naturally maps onto the HQ/branch
enterprise structure and provides optimal link capacity for traffic
flowing between branches and centralized services located in the HQ.

3.1.2.  Fat-tree topology

Fat-tree topology is a hierarchical structure resembling a tree turned
upside down, with link capacity growing from the "leaves" up toward the
"roots". This topology is common in datacenter designs.
Since parts of the tree may run Layer-2 switching, IP routing protocols
usually 'see' 2 or 3 levels in the tree hierarchy. Typically lower-level
nodes have redundant connections to the upper level.

This topology may be seen as a hub-and-spoke topology where each spoke
site in turn has a tree-like organization.

3.1.3.  Clos topology

Clos topology is a multistage switching network where each node in a
stage connects to multiple nodes in the previous and subsequent stages.
This is another topology frequently used in datacenters. The key
advantage of this topology is that the number of elements in each stage
and the number of links connecting an element with the previous and
subsequent stages may be chosen in such a way that all upstream and
downstream links have the same capacity.

A Clos network always provides multiple same-cost paths between each
pair of leaf nodes. On a node, the number of ECMP paths for each
destination is equal to the number of connections to elements in the
next stage of the Clos topology. In real-life deployments there may be
as many as several dozen ECMP paths for each routing destination
prefix.

3.2.  Problems with OSPF routing in large-scale networks

Hub-and-spoke networks pose challenges to routing scalability:

o  All spoke sites placed into a single OSPF area will, by virtue of
   the link-state protocol, receive full topology information
   describing each spoke in the area and its connection to hubs. This
   information is at best excessive, as all links from the site go to
   hubs and knowledge of many other spoke links in the area cannot
   reveal alternative paths to destinations outside of the site. Worse
   yet, spoke routers are usually much smaller devices than hub
   routers and they are intended to serve a tiny site network with a
   few routes and light traffic; spoke routers do not have sufficient
   resources (either memory or CPU) to hold and process the same
   link-state database as hub routers.
   Distance-vector protocols in the same topology propagate only
   prefix reachability information and do not tax spoke routers with a
   topology view.

o  Inter-site visibility may be undesirable, either to decrease the
   size of the routing table on spoke devices or for security reasons.
   Link-state protocols do not allow routing information to be
   filtered within the area flooding scope. By comparison,
   distance-vector protocols allow route filtering and summarization
   on a per-neighbor basis.

o  Having full topology visibility within an area may also lead hub
   routers to compute suboptimal paths. Consider an example
   hub-and-spoke network with two hubs A and B and two spoke sites S1
   and S2. Each spoke site is dual-connected to both hubs. Hubs A and
   B are Area Border Routers (ABRs) between the hub-and-spoke WAN and
   the backbone. If the link between hub A and site S1 goes down, then
   A will choose (or at least consider) the intra-area WAN route
   A -> S2 -> B -> S1. But typically a spoke site (S2 in this example)
   must not be considered for any transit traffic.

o  The size and stability of the Router LSA of hub routers is also
   problematic due to the very large number of point-to-point links in
   it describing connections to all spokes in the area. Both the size
   of the Router LSA and the probability that it needs to be rebuilt
   grow in direct proportion to N, the number of spokes in the area.
   Thus the overall CPU resources consumed by flooding and processing
   a hub's Router LSA grow as O(N^2).

o  An alternative network design is to separate each spoke into an
   area of its own. This solves the problems of spoke routers but
   transfers the cumulative burden of supporting multiple areas to hub
   routers. This design requires a hub router to be able to support
   thousands of NSSA areas, originate as many Router LSAs, and
   translate multiple LSAs from/into each area. Managing a common
   route filtering and summarization policy is also difficult.

Some of the problems described above call for designing areas as small
as possible, while others, conversely, are resolved by designing big
areas. In a relatively small network it is possible to find a sensible
design compromise, but as the number of spokes grows to thousands,
finding a working compromise becomes more and more challenging and the
balance becomes more and more fragile.

As noted in Section 3.1.2, fat-tree topology can be viewed as a
particular case of the hub-and-spoke topology. For this reason many
problems described above for hub-and-spoke networks equally affect
fat-tree networks.

Clos networks add one more problem specific to this topology:

o  Section 3.1.3 underlined that Clos networks provide massive
   equal-cost multipath for most destinations. When a link goes down,
   this rarely means that the node has lost connectivity to any
   destination. OSPF in this situation rebuilds its Router LSA and
   floods it to all routers in the area. The ensuing SPF on all nodes
   in the area will result in no change in the routing on all routers
   not connected to the problematic link. For comparison,
   distance-vector routing protocols detect that there was no change
   in reachability of prefixes or their metrics, and hence no update
   is sent to neighbors.

As can be seen from these examples, full knowledge of topology within
an area, which is a key property of link-state protocols and works so
well in networks with arbitrary topology, becomes the biggest factor
limiting routing scalability in networks with regular topologies. For
this reason distance-vector protocols are the tool of choice for
network designers working with large hub-and-spoke networks.
Factors specific to networks with regular topologies, such as the link
between hub and spoke being the only connectivity to the rest of the
network and the low number of prefixes advertised in each direction,
negate the slow convergence which affects distance-vector protocols in
more complex topologies.

4.  Solution requirements

For an OSPF solution to be as scalable as a distance-vector protocol,
these design goals were taken into consideration:

o  Spoke routers MUST be protected from routing information sent
   from/to other spoke routers unless explicitly required by the
   network's policy.

o  The solution MUST require no modification to OSPF routers other
   than those connected to the hub-and-spoke network itself.

o  The number of LSAs which a hub router originates MUST NOT grow
   faster than O(N), where N is the number of spoke sites, and
   preferably should not depend on N. The size of LSAs MUST NOT depend
   on N.

o  The solution MUST protect against routing loops should a spoke site
   become connected to another site and/or to the rest of the network.
   Supporting such a configuration is not a goal, and such a
   configuration loses an important property defining a spoke site of
   hub-and-spoke networks. But the network must be protected from
   routing meltdown in case of accidental misconfiguration.

o  The solution SHOULD provide an easy and scalable way to apply a
   common administrative routing policy via centralized configuration.

5.  Functional summary

Area Border Routers in OSPF propagate inter-area routing information by
announcing reachability and routing metrics. Thus inter-area and
external routes are announced in OSPF as in distance-vector routing
protocols.

The solution satisfying the requirements laid out in Section 4 is to
create a new type of OSPF area which is devoid of LSAs carrying
topology information (i.e. Router and Network LSAs). The routing
information in this area is propagated only by prefix LSAs flooded with
link-local scope.
In many aspects such an area behaves similarly to distance-vector
routing protocols. For this reason the area type is called
DIstance-VEctor or DIVE area.

In a hub-and-spoke network the DIVE area would cover only the links
between hubs and spoke routers. Other interfaces on hub routers will be
placed into a regular OSPF area carrying link-state routing
information. This may or may not be the backbone area. Further in this
document this area and the network behind it are called "the core
network". Spoke routers are connected to an area covering the site
network. An example hub-and-spoke network using a DIVE area is depicted
in Figure 1.

                                 ................. .............
                                 .   DIVE        . .           .
                                 .             +-----+ Site 1  .
                                 .   area      |     |         .
                                 .       ______|Spoke|  NSSA   .
                                 .      /      |     |         .
 ............... ............... .     /       +-----+..........
 .             . .             . .    /          .
 .           +-----+         +-----+ /         +-----+..........
 . Backbone  |     | Regular |     |/          |     |         .
 .           | ABR |         | Hub |___________|Spoke|         .
 .  area 0   |     |   area  |     |\          |     |         .
 .           +-----+         +-----+ \         +-----+..........
 .             . .             . .    \          .
 ............... ............... .     \       +-----+..........
                                 .      \      |     | Site 2  .
                                 .       \_____|Spoke|  NSSA   .
                                 .             |     |         .
                                 .             +-----+..........
                                 .................

                               Figure 1

Note that a network using the DIVE area design resembles a [RFC4364]
BGP/MPLS VPN network where OSPF is used as the PE-CE protocol
[RFC4577]. The DIVE area is the analogue of the BGP 'superbackbone' in
an MPLS VPN network, and hub and spoke routers are analogues of MPLS
VPN PE routers.

The DIVE area does not have LSAs with area flooding scope. This solves
problems related to excessive visibility of routing information where
it is undesirable. There is also no visibility or reachability of
routers other than immediately connected neighbors. Each router in the
DIVE area is an ABR and there is no concept of a router internal to the
DIVE area. In this sense the DIVE area is always a one-hop area.

Since the DIVE area deviates from the traditional one-level hierarchy
of OSPF areas (backbone area with all other areas connected to it), it
must employ strict rules for accepting and propagating routing
information to prevent routing information from looping. These rules
are further discussed in Section 6.5. ABRs connected to a DIVE area
perform translation of LSAs from and into the DIVE area, similar to the
translation of NSSA LSAs into External LSAs on an ABR between NSSA and
backbone areas.

Routing information propagated throughout the DIVE area is encoded into
Prefix Attribute LSAs for OSPFv2 ([I-D.ietf-ospf-prefix-link-attr]) and
Extended Prefix LSAs for OSPFv3 ([I-D.ietf-ospf-ospfv3-lsa-extend]). In
both cases 'old'-style LSAs carrying either topological or routing
information are not originated or flooded into the DIVE area.

6.  Protocol Details

6.1.  The DIVE area

This specification defines a new type of OSPF area called the DIVE
area. There is no concept of a router internal to the DIVE area; all
DIVE area routers are ABRs.

When a DIVE area is configured, the role of the router in the area must
be provided as a configuration parameter. Currently supported DIVE area
router roles are Hub and Spoke. The router role has important
implications during translation of routes between the DIVE area and
other connected area(s). Details of LSA translation are covered in
Section 6.5.

In the subsequent text the terms 'Hub' and 'Spoke' (capitalized) denote
a router's configured role in the DIVE area, while the terms 'hub' and
'spoke' (lower case) describe the position of the router in the
hub-and-spoke topology.

A router of any role may be connected to multiple DIVE areas.
Hierarchical DIVE areas are not defined by the current specification.
In other words, a router's role in all connected DIVE areas SHOULD be
the same.

A Hub router must be connected to either a backbone or a non-backbone
regular area.
A Hub router cannot be connected to a stub or NSSA area.

The site area (or areas) connected to the Spoke router must be a
non-backbone regular area, an NSSA or a stub area. Note that route
filtering and summarization are best applied on the hub routers. This
both protects the high-scale DIVE area from flooding of unnecessary
information and provides a centralized location to manage the route
filtering/summarization policy on a few hub routers rather than on many
spokes. Stub and NSSA areas on spoke sites therefore provide limited
benefit compared to a regular non-backbone area and SHOULD be used only
if direct spoke-to-spoke neighborships exist between some sites.

Virtual links through the DIVE area are not supported.

Routers connected to the DIVE area MUST support Prefix Attribute LSAs
for OSPFv2 ([I-D.ietf-ospf-prefix-link-attr]) and Extended LSAs for
OSPFv3 ([I-D.ietf-ospf-ospfv3-lsa-extend]).

6.2.  Hello packets and the database exchange on DIVE area interfaces

All routers connected to the DIVE area must agree on the area's
configuration and learn the roles of their neighbors. The roles of the
local router and of the neighbor determine LSA translation and route
propagation. Section 6.5 details these rules.

A router's role is advertised in Hello packets sent on interfaces in
the DIVE area. Two new bits, called DV-bits, are used to encode the
router's role in the DIVE area.
DV-bits are allocated in the Extended Options and Flags LLS TLV for
OSPFv2 [RFC5613] and in the Options field of OSPFv3:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0  1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+
   | | | | | | | | | | | | | | | | | | | | | | | | | | |DV |F|I|RS|LR|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+

                Bits in Extended Options and Flags TLV

    0                   1                     2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5  6 7 8  9 0 1  2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+--+-+-+--+-+--+
   | | | | | | | | | | | | | DV|L|AF|*|*|DC|R|N|MC|E|V6|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+--+-+-+--+-+--+

                        OSPFv3 Options Field

The meaning of the DV bits is defined as:

   0 0 - Interface of sending router does not belong to DIVE area
   0 1 - Sending router has role Hub
   1 0 - Sending router has role Spoke
   1 1 - Reserved

When a Hello packet is received from a previously unknown neighbor, the
DV bits are checked to see if the neighbor's interface belongs to a
DIVE area. If the neighbor advertises any of the DV bits set and the
receiving interface does not belong to a DIVE area, OR if both DV bits
advertised by the neighbor are clear and the receiving interface
belongs to a DIVE area, then the received packet MUST be silently
discarded.

The DV bits advertised by the neighbor must be stored in the neighbor's
data structure and compared when receiving subsequent Hello packets
from the neighbor. A change in the advertised DV bits MUST generate the
BadLSReq Neighbor FSM event. Processing this event will cause the
adjacency with the neighbor to be reset and the LSDB exchange to
[re]start.

The LS database of a DIVE area may contain only Opaque LSAs for OSPFv2
and Extended LSAs for OSPFv3. LSA types defined in [RFC2328] and
[RFC5340] are not flooded into the DIVE area, including AS External
LSAs with the domain flooding scope. OSPFv2 opaque LSAs with domain
flooding scope are not flooded into DIVE areas.
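
As a rough illustration of the Hello checks described in this section,
the following Python sketch maps the DV-bit values to the three
outcomes the text defines. The names `DvBits`, `process_hello_dv_bits`
and the returned strings are hypothetical and used only for this
example; treatment of the Reserved value (1 1) is not specified in this
section, so the sketch simply treats it as "some DV bit is set".

```python
from enum import IntEnum
from typing import Optional


class DvBits(IntEnum):
    """Two-bit DV field values listed in Section 6.2."""
    NOT_DIVE = 0b00   # sending interface does not belong to a DIVE area
    HUB = 0b01        # sending router has role Hub
    SPOKE = 0b10      # sending router has role Spoke
    RESERVED = 0b11   # reserved; its handling is an open point in this sketch


def process_hello_dv_bits(local_in_dive: bool,
                          stored: Optional[DvBits],
                          received: DvBits) -> str:
    """Return the action for a received Hello (hypothetical labels).

    stored is None for a previously unknown neighbor, otherwise the DV
    bits remembered from that neighbor's earlier Hello packets.
    """
    if stored is None:
        neighbor_in_dive = received != DvBits.NOT_DIVE
        if neighbor_in_dive != local_in_dive:
            return "discard"        # silently discard the packet
        return "store"              # remember DV bits in the neighbor structure
    if received != stored:
        return "bad_ls_req"         # BadLSReq event: adjacency is reset
    return "ok"
```

For example, `process_hello_dv_bits(True, DvBits.SPOKE, DvBits.HUB)`
yields `"bad_ls_req"`, matching the rule that a change in advertised DV
bits resets the adjacency.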
OSPFv3 flooding of unknown LSA types is performed as described by
[RFC5340].

Choosing neighbors with which to establish full adjacency, or with
which to stop neighborship formation at the 2-Way Neighbor FSM state,
does NOT depend on the DIVE area roles of the local router and of the
neighbor and works as described in [RFC2328]. On broadcast and NBMA
interfaces of Spoke routers in the DIVE area, an implementation SHOULD
set Router Priority to 0 by default.

If during the LS database exchange with a neighbor in the DIVE area a
router receives a Database Description packet describing an LSA of a
type not allowed in the DIVE area, then the SeqNumberMismatch Neighbor
FSM event MUST be generated and the LSDB exchange must restart.

If the OSPF interface type is broadcast, then an implementation SHOULD
support Incremental Hellos as described by [RFC5820]. If Incremental
Hellos are supported, then they MUST be enabled by default on broadcast
interfaces in the DIVE area.

On point-to-multipoint interfaces Hub routers SHOULD default to sending
unicast Hellos to discovered neighbors rather than sending multicast
Hello packets listing all known neighbors.

6.3.  LSA generation into the DIVE area

The following types of LSAs containing the listed TLV types may be
originated into the DIVE area:

For OSPFv2 (see [I-D.ietf-ospf-prefix-link-attr]):

o  OSPFv2 Extended Prefix Opaque LSA

   *  OSPFv2 Extended Prefix TLV

The Extended Prefix Opaque LSA MUST have LSA Type 9. The Extended
Prefix TLV as defined by [I-D.ietf-ospf-prefix-link-attr] may advertise
attributes for several route types. Only the following route types may
be present in an Extended Prefix TLV in LSAs originated into the DIVE
area:

   1 - Intra-Area
   3 - Inter-Area
   5 - AS External

This specification defines one new sub-TLV of the OSPFv2 Extended
Prefix TLV, the Metric Sub-TLV; see Section 6.3.1.
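
To make the wire layout of the Metric Sub-TLV from Section 6.3.1 more
concrete, here is a minimal Python sketch of packing and unpacking it.
It assumes the layout given there: a 16-bit Type, a 16-bit Length equal
to 4, then a 1-bit E flag, a 7-bit MT-ID and a 24-bit Metric. The
function names are hypothetical, and the Type value 3 is only the
suggested value, since the actual assignment is still TBD.

```python
import struct

METRIC_SUBTLV_TYPE = 3  # suggested value only; the assignment is TBD
INTRA_AREA, INTER_AREA, AS_EXTERNAL = 1, 3, 5  # route types allowed in the DIVE area


def pack_metric_subtlv(route_type, metric, mt_id=0, e_bit=False):
    """Encode a Metric Sub-TLV (hypothetical helper for illustration)."""
    if route_type not in (INTRA_AREA, INTER_AREA, AS_EXTERNAL):
        raise ValueError("route type not allowed in the DIVE area")
    if e_bit and route_type != AS_EXTERNAL:
        raise ValueError("E bit is always 0 for intra- and inter-area metrics")
    if not 0 <= mt_id <= 0x7F:
        raise ValueError("MT-ID must fit in 7 bits")
    if route_type == INTRA_AREA:
        valid = 1 <= metric <= 0xFFFF      # upper 8 bits must be 0
    else:
        valid = 0 <= metric <= 0xFFFFFF    # all 24 bits may be used
    if not valid:
        raise ValueError("metric out of range for this route type")
    # First value octet: E flag in the top bit, MT-ID in the low 7 bits.
    value = bytes([(0x80 if e_bit else 0) | mt_id]) + metric.to_bytes(3, "big")
    return struct.pack("!HH", METRIC_SUBTLV_TYPE, len(value)) + value


def unpack_metric_subtlv(data):
    """Decode a Metric Sub-TLV into (e_bit, mt_id, metric)."""
    if len(data) < 8:
        raise ValueError("truncated Metric Sub-TLV")
    tlv_type, length = struct.unpack("!HH", data[:4])
    if tlv_type != METRIC_SUBTLV_TYPE or length != 4:
        raise ValueError("not a Metric Sub-TLV")
    flags = data[4]
    return bool(flags & 0x80), flags & 0x7F, int.from_bytes(data[5:8], "big")
```

A Type-2 external metric of 20 in the default topology would be encoded
with `pack_metric_subtlv(AS_EXTERNAL, 20, e_bit=True)`, mirroring the
E-bit semantics of the Type-5 LSA.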

For OSPFv3 (see [I-D.ietf-ospf-ospfv3-lsa-extend]):

o  E-Intra-Area-Prefix-LSA

o  E-Inter-Area-Prefix-LSA

o  E-AS-External-LSA

o  E-Link-LSA

Extended Prefix LSAs may contain the following TLV types:

o  6 - Intra-Area Prefix TLV

o  3 - Inter-Area Prefix TLV

o  5 - External Prefix TLV

All Extended Prefix LSAs originated into the DIVE area MUST have
link-local flooding scope. Thus their LSA types will be:

   LSA function code   LS Type   Description
   -----------------   -------   -----------------------
   35                  0x8023    E-Inter-Area-Prefix-LSA
   37                  0x8025    E-AS-External-LSA
   40                  0x8028    E-Link-LSA
   41                  0x8029    E-Intra-Area-Prefix-LSA

6.3.1.  Metric Sub-TLV

One new sub-TLV is defined for OSPFv2 to carry the metric of the route.
This is required because in the DIVE area Extended Prefix Opaque LSAs
do not accompany [RFC2328] LSAs and must carry all route information.

The Metric Sub-TLV is a Sub-TLV of the OSPF Extended Prefix TLV defined
in [I-D.ietf-ospf-prefix-link-attr]. It MAY appear more than once in
the top-level TLV and has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Type             |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|    MT-ID    |                    Metric                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

where:

   Type: TBD, suggested value is 3.

   Length: 4.

   E: one-bit field. For AS External routes this defines the type of
   external metric. Its function and meaning are fully analogous to
   the E-bit of the Type-5 LSA [RFC2328]. This bit is always 0 for
   intra- and inter-area route metrics.

   MT-ID: Multi-Topology ID (as defined in [RFC4915]).

   Metric: The cost of this route. For inter-area and external routes
   all 24 bits of the field may be used to encode the route metric.
   For intra-area routes the upper 8 bits must be 0, thus a valid
   metric for an intra-area route is in the range 1 to 2^16-1.

If more than one instance of the Metric Sub-TLV is present in the
Extended Prefix TLV, then each instance MUST describe the metric for a
different Multi-Topology ID.

6.4.  SPF calculation in the DIVE area

Intra-area SPF calculation within the DIVE area is reduced to walking
the list of neighbors in the area and adding neighbors which have
reached the FULL adjacency state to the table of routers' reachability.
Each reachable router is marked as both ABR and ASBR. The cost of the
routing table entry is equal to the cost of the local interface
associated with the neighbor. Unlike in other area types, routers in
the DIVE area which do not have a fully established adjacency between
them do not have a valid intra-area path to reach each other.

Calculation of inter-area and AS External routes follows the algorithms
described in [RFC2328] for OSPFv2 and [RFC5340] for OSPFv3, with the
following caveat. The OSPFv2 Extended Prefix LSA does not provide
ordering of prefixes by prefix type. Hence there are no separate phases
of computing inter-area and then AS external routes. Instead, all
Extended Prefix LSAs and all Extended Prefix TLVs in them are examined
in turn, and the type of the calculated route is determined by the
Route Type field of the Extended Prefix TLV being examined.

6.5.  Translation of LSAs and route propagation

Each router connected to a DIVE area is an Area Border Router and will
originate LSAs into connected non-DIVE areas to describe reachability
of prefixes received via the DIVE area. Conversely, it will originate
LSAs into the DIVE area to describe reachability of routes learned via
other connected areas.

Moreover, in the DIVE area all LSAs propagating routing information
have link-local scope.
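
The intra-area calculation of Section 6.4 is simple enough to sketch in
a few lines of Python. The `Neighbor` record and the function name
below are hypothetical. The text does not say what to do when the same
neighbor is adjacent over several interfaces; keeping the lowest
interface cost is an assumption made only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Neighbor:
    router_id: str        # neighbor's Router ID
    state: str            # Neighbor FSM state name, e.g. "Full" or "2-Way"
    interface_cost: int   # cost of the local interface toward this neighbor


def dive_router_reachability(neighbors: Iterable[Neighbor]) -> Dict[str, dict]:
    """Build the table of reachable routers for one DIVE area."""
    table: Dict[str, dict] = {}
    for nbr in neighbors:
        if nbr.state != "Full":
            continue  # no valid intra-area path without a full adjacency
        known = table.get(nbr.router_id)
        # Assumption: with parallel adjacencies, keep the cheapest one.
        if known is None or nbr.interface_cost < known["cost"]:
            table[nbr.router_id] = {
                "cost": nbr.interface_cost,
                "abr": True,    # every reachable DIVE router is marked ABR
                "asbr": True,   # ... and ASBR
            }
    return table
```

No graph traversal is needed, since there are no Router or Network LSAs
in the DIVE area to traverse.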
In those cases where routing information should propagate between
routers which do not have a direct adjacency, intermediate routers will
originate their own LSAs carrying the routing information one hop
further. In accordance with distance-vector routing principles, the
metric of such routes will be increased to reflect the cost of the path
to reach the destination from the router originating the LSA.

There are two cases when routing information has to be re-advertised
within the DIVE area:

o  If inter-spoke site traffic is not prohibited, then hub routers must
   advertise inter-spoke routing information to spokes. This may be
   either in the form of summarized routing information covering
   multiple spoke sites (including advertisement of a default route)
   or in the form of the non-summarized routing information the hub
   received from spoke routers. In the latter case the hub would
   re-advertise in the DIVE area routing information received from
   spoke neighbors in the area.

o  The DIVE area is attached to the core network via redundant hub
   routers, and the hubs advertise into the network summarized routing
   information covering multiple site prefixes. If the link between
   one of the hubs and a spoke site is lost, then the hub must know
   alternative paths to the spoke network via other hubs. A direct
   neighborship between hub routers in the DIVE area would provide
   such an alternate path. Thus in this scenario hub routers advertise
   summarized routing information into the core network and exchange
   non-summarized spoke prefix reachability via the DIVE area
   adjacency.

Note that in both scenarios above, re-advertisement of routing
information within the DIVE area is done by Hub routers, and the
information being re-advertised was received by the Hub from Spoke
routers.

Routers whose role in the DIVE area was configured as Spoke MUST NOT
re-advertise into the DIVE area routing information received via a DIVE
area.
Routers whose role in the DIVE area was configured as Hub MUST NOT re-advertise routing information received from other Hub routers in this DIVE area.

Route filtering and/or summarization is frequently configured on Hub routers. Summarization reduces the number of LSAs to originate, maintain and flood. Managing LSDB size is an important aspect of scalability in a large-scale network. Summarization may be performed in both directions: to summarize reachability of core networks advertised toward the spoke sites (in the ultimate summarization case Hubs may advertise only one route, the default route, toward spokes) and to summarize into the core the reachability of remote sites connected by the hub-and-spoke network. To improve stability of LSAs advertising summarized routing information, an implementation MUST allow the cost of the summary route to be statically provided via configuration and SHOULD have static assignment of the summary cost (as opposed to dynamically computing the cost of the summary route from the costs of component routes falling into the summary range) as the default cost selection mechanism.

A spoke site may, for redundancy reasons, be connected to the hub-and-spoke network by more than one spoke router. To prevent looping of routing information, routes propagated from the DIVE area into the spoke site network must not be re-advertised back into the DIVE area by another spoke router. This is achieved by Spoke routers setting the Down bit in LSAs advertised into the spoke site network. Unlike loop prevention for MPLS VPN PE routers [RFC4576], Spoke routers are allowed to install into their own routing table routes derived from LSAs with the Down bit set, but they MUST NOT re-advertise them into the DIVE area.

The following sections describe route propagation and re-advertisement rules.
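
The re-advertisement restrictions and the Down bit rule stated above can be summarized by the following non-normative Python sketch. All names (Role, Source, Route, may_advertise_into_dive) are hypothetical and chosen for illustration only. The sketch decides only whether a route is eligible for advertisement into the DIVE area; filtering and summarization policy is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    HUB = "hub"
    SPOKE = "spoke"


class Source(Enum):
    DIVE_FROM_HUB = "dive-hub"      # learned in the DIVE area from a Hub
    DIVE_FROM_SPOKE = "dive-spoke"  # learned in the DIVE area from a Spoke
    OTHER_AREA = "other-area"       # learned via a connected non-DIVE area


@dataclass
class Route:
    prefix: str
    source: Source
    down_bit: bool = False  # derived from an LSA with the Down bit set


def may_advertise_into_dive(role, route):
    """Return True if the route is eligible for advertisement into the DIVE area."""
    if role is Role.SPOKE:
        # A Spoke never re-advertises what it received via a DIVE area.
        if route.source is not Source.OTHER_AREA:
            return False
        # Routes marked with the Down bit may be installed locally but
        # are not advertised back into the DIVE area.
        return not route.down_bit
    # A Hub may re-advertise information received from Spokes, but not
    # information received from other Hubs. The text above states the
    # Down bit rule for Spoke routers only, so it is not checked here.
    return route.source is not Source.DIVE_FROM_HUB
```

For example, a Spoke holding a site route derived from an LSA with the Down bit set would get False from this function, while a Hub holding a route learned from a Spoke neighbor would get True.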
For Hub routers, LSA translation rules for routes learned from the DIVE area depend on whether the Hub is connected to the backbone area or to a non-backbone area.

6.5.1.  Hub routers: Propagation of routes from the core network into the DIVE area

After completing the calculation of routes during SPF, the DIVE area's Hub router will perform Area Border Router functions. This section lists the rules to propagate routing information from the core network into the DIVE area. Each prefix being propagated will be described by one Prefix TLV, see Section 6.3. The strategy of packing Prefix TLVs into LSAs (one or multiple Prefix TLVs per LSA, LS ID selection, etc.) is outside the scope of this document.

To advertise routing information into the DIVE area a Hub router MUST:

o  Examine each reachable prefix in its routing table. If the best path for the prefix lies through the DIVE area, then proceed to the next prefix.

o  Check if the route falls into any configured summary range. If it does, mark the summary as reachable. Compare the type of the route with the type of the summary using the usual OSPF route preference rules (an intra-area route is preferred over inter-area; external Type-1 is preferred over external Type-2, etc.). If the route's type is more preferable, store it as the new type of the summary route and proceed to the next route.

o  Otherwise add a route TLV of the appropriate type into the DIVE area's LSAs. The LSA MUST be added to the LSDB of all interfaces on which Spoke neighbor(s) exist in a state above Down:

   *  If the route is intra-area or inter-area, then originate an inter-area route TLV. Use the cost of the route as the cost advertised in the TLV.

   *  If the route is a Type-1 AS external route, then originate a Type-1 AS External route TLV. Use the cost of the route as the cost advertised in the TLV.

   *  If the route is a Type-2 AS external route, then originate a Type-2 AS External route TLV.
      Use the Type-2 metric of the LSA which contributed to the route, plus one, as the cost advertised in the TLV.

o  For routes which became unreachable, advertise the LSA without the TLV corresponding to the route, or flush the LSA if applicable.

Calculation of the summary route reachability and type, as well as flushing TLVs of unreachable routes, is the same for all router roles and route propagation scenarios, so for brevity they are omitted in the following sections.

6.5.2.  Hub routers: Propagation of routes from the DIVE area into the backbone area

If the Hub router is connected to (i.e. has interfaces in) the backbone area, then the route advertisement rules are:

o  For routes whose LSA was originated by a Spoke router, originate into the backbone area an LSA of the corresponding type:

   *  For intra- and inter-area routes, originate a Type-3 Summary LSA (OSPFv2) or Inter-Area-Prefix LSA (OSPFv3) using the cost of the route.

   *  For Type-1 AS external routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-1 external route and using the cost of the route.

   *  For Type-2 AS external routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-2 external route and using the metric received in the Prefix TLV plus one.

   *  When advertising AS external routes the Hub router MUST also announce itself as an ASBR.

o  If the LSA was not originated into the backbone because the route is subsumed by summarization, then instead add the TLV to the LSA in the LSDB of all interfaces on which Hub neighbor(s) exist in a state above Down. Otherwise, to provide inter-spoke connectivity, the TLV MAY be added to the LSA in the LSDB of all interfaces on which Spoke neighbor(s) exist in a state above Down. In either case the TLV MUST have the same route type as the route being advertised. For intra-area, inter-area and Type-1 external routes the advertised cost is taken as the cost of the route.
   For Type-2 external routes the cost is equal to the metric received in the Prefix TLV plus one.

o  Note that a Hub router MUST NOT advertise routes received from Hubs either into the backbone or to other Hubs.

6.5.3.  Hub routers: Propagation of routes from the DIVE area into the non-backbone area

If the Hub router is not connected to the backbone area, then it cannot advertise inter-area routing information. To provide a compromise between network design flexibility and compatibility with [RFC2328]/[RFC5340] implementations, the Hub router will advertise routing information as AS external routes. For routes whose LSA was originated by a Spoke, the Hub router MAY originate into the non-backbone area an LSA of the following type:

o  For intra- and inter-area routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-1 AS external route with metric equal to the cost of the route.

o  For Type-1 AS external routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-2 external route with a metric of one.

o  For Type-2 AS external routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-2 external route and using the metric received in the Prefix TLV plus one.

Propagation of the route to other Hub or Spoke routers in the same DIVE area is the same as described in the previous section.

6.5.4.  Route propagation on Spoke routers

To advertise routing information received from the DIVE area into areas of the site network, a Spoke router MUST:

o  For intra- and inter-area routes, originate into the site area a Type-3 Summary LSA (OSPFv2) or Inter-Area-Prefix LSA (OSPFv3).
   The metric advertised in the LSA is set equal to the cost of the route.

o  For Type-1 AS external routes, originate into the site area a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-1 external route with metric equal to the cost of the route.

o  For Type-2 AS external routes, originate a Type-5 External LSA (OSPFv2) or AS-External LSA (OSPFv3) advertising a Type-2 external route with metric equal to the metric in the TLV contributing to the route.

o  In all of the above cases the LSA MUST have the Down bit set.

To advertise into the DIVE area routing information learned from the attached site network area, a Spoke router:

o  MUST skip routes which were produced from LSAs with the Down bit set.

o  Since the site area is a non-backbone area, the Spoke router MUST NOT have inter-area routes learned via the site network.

o  For other route types, add into the DIVE area's LSAs a route TLV of the type listed below. The LSA MUST be added to the LSDB of all interfaces on which Hub neighbor(s) exist in a state above Down:

   *  If the route is intra-area, then originate an inter-area route TLV. Use the cost of the route as the cost advertised in the TLV.

   *  If the route is a Type-1 AS external or translatable NSSA route, then originate a Type-1 AS External route TLV. Use the cost of the route as the cost advertised in the TLV.

   *  If the route is a Type-2 AS external or translatable NSSA route, then originate a Type-2 AS External route TLV. Use the Type-2 metric of the LSA which contributed to the route as the cost advertised in the TLV.

7.  Other considerations for the DIVE area

7.1.  Routing considerations

Route propagation rules in the DIVE area make sure that information is advertised between Hubs and Spokes and into the respective connected areas. These rules prohibit multiple re-advertisement of the routing information within the DIVE area.
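
As a non-normative illustration of the Spoke rules in Section 6.5.4 for advertising site routes into the DIVE area, the mapping from route type to TLV type and cost might be sketched in Python as follows. The names (RouteType, TlvType, Route, spoke_tlv_for_site_route) are hypothetical. Translatable NSSA routes are assumed to be represented by the corresponding external route types, and an unexpected inter-area site route is simply not advertised, which is an assumption of the sketch rather than a rule of this document.

```python
from dataclasses import dataclass
from enum import Enum


class RouteType(Enum):
    INTRA_AREA = 1
    INTER_AREA = 2
    EXTERNAL_TYPE1 = 3  # also stands for a translatable Type-1 NSSA route
    EXTERNAL_TYPE2 = 4  # also stands for a translatable Type-2 NSSA route


class TlvType(Enum):
    INTER_AREA = "inter-area"
    EXTERNAL_TYPE1 = "type-1 external"
    EXTERNAL_TYPE2 = "type-2 external"


@dataclass
class Route:
    rtype: RouteType
    cost: int               # cost of the route in the routing table
    type2_metric: int = 0   # Type-2 metric of the contributing LSA
    down_bit: bool = False  # produced from an LSA with the Down bit set


def spoke_tlv_for_site_route(route):
    """Return (TLV type, advertised cost), or None if nothing is advertised."""
    if route.down_bit:
        return None  # routes from LSAs with the Down bit set are skipped
    if route.rtype is RouteType.INTER_AREA:
        # The site area is a non-backbone area, so no inter-area routes
        # are expected; the sketch does not advertise one if it appears.
        return None
    if route.rtype is RouteType.INTRA_AREA:
        return (TlvType.INTER_AREA, route.cost)
    if route.rtype is RouteType.EXTERNAL_TYPE1:
        return (TlvType.EXTERNAL_TYPE1, route.cost)
    # Type-2: the Spoke advertises the Type-2 metric unchanged.
    return (TlvType.EXTERNAL_TYPE2, route.type2_metric)
```

The Spoke passes the Type-2 metric through unchanged, whereas the Hub rules in Section 6.5.1 add one to it.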
Thus the DIVE area may only serve as a shim layer between traditional OSPF areas, and it is not possible to build a full OSPF network functioning on the principles of a distance-vector protocol.

Routing information traveling through the DIVE area loses track of its true originator. To prevent routing loops, routes delivered via the DIVE area are made worse. For routes carrying a metric comparable with the cost of an intra-domain path, this is done by adding the cost of the links to reach the route's origin. For routes carrying a cost external to the OSPF domain, this is done by incrementing the external cost.

This increment in the metric also solves the problem of an originator receiving back its own routing information. For example, if spokes are connected to a Hub by a point-to-multipoint interface and the Hub wants to advertise to spokes a prefix received from a Spoke router, then the Spoke router which originated the prefix will receive its own information back even though the LSA has link-local flooding scope.

Fast poisoning of routes which became unreachable is ensured by rules which prevent a Spoke router from re-advertising back to Hubs (directly or indirectly via other Spoke routers connected to the same spoke site) any routing information received on the DIVE area interface.

7.2.  LSDB size considerations

LSAs in the DIVE area have link-local flooding scope. This solves scalability problems of spoke routers because they do not have to deal with information originated for or from the other spokes (unless it is desired). This also relieves input-output constraints on hub routers by limiting the volume of information which has to be exchanged with each spoke.

On the other hand, this may have an adverse effect on the size of the link-state database a hub router has to maintain. This is the case when spoke routers are connected by point-to-point OSPF interfaces. In this case the database size of the hub router is multiplied by the number of interfaces to spoke sites.
This problem can be addressed by grouping spoke connections into a smaller number of point-to-multipoint interfaces.

7.3.  Optimal DIVE area design

Given these considerations, the recommended DIVE area design for Hub routers is:

o  Spoke routers are connected via a small number of point-to-multipoint interfaces.

o  Hub routers, if necessary, are interconnected within the DIVE area via interfaces separate from connections to Spokes.

o  Hub routers perform summarization of the routing information they advertise both into the core network and into the DIVE area toward Spoke routers.

8.  Backward Compatibility

Devices attached to the DIVE area MUST conform to this specification. Device support for this specification is checked via new options bits in Hello packets before the start of adjacency formation, thus devices not supporting this specification cannot join the DIVE area.

This specification is fully backward compatible with devices not immediately connected to the DIVE area. New information defined by this specification is not propagated to such devices.

The current specification includes measures to protect a network in case of basic misconfiguration or design problems.

9.  Security Considerations

This document does not introduce any new security implications. General security considerations described in [I-D.ietf-ospf-prefix-link-attr] and [I-D.ietf-ospf-ospfv3-lsa-extend] apply to LSAs in the DIVE area.

10.  IANA Considerations

This specification updates several IANA OSPF registries:

o  New bits (DV-bits) are reserved in the "LLS Type 1 Extended Options and Flags" registry of the Extended Options and Flags Link Local Signaling TLV.

o  New bits (DV-bits) are registered in the "OSPFv3 Options" registry.

o  One new value is added to the registry of OSPFv2 Extended Prefix TLV Sub-TLVs (Metric Sub-TLV).

11.  Acknowledgements

The author would like to thank Paul Wells and Alvaro Retana for early discussions.

12.  References

12.1.  Normative References

[I-D.ietf-ospf-ospfv3-lsa-extend]
   Lindem, A., Mirtorabi, S., Roy, A., and F. Baker, "OSPFv3 LSA Extendibility", draft-ietf-ospf-ospfv3-lsa-extend-04 (work in progress), September 2014.

[I-D.ietf-ospf-prefix-link-attr]
   Psenak, P., Gredler, H., Shakir, R., Henderickx, W., Tantsura, J., and A. Lindem, "OSPFv2 Prefix/Link Attribute Advertisement", draft-ietf-ospf-prefix-link-attr-01 (work in progress), September 2014.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[RFC5340]  Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF for IPv6", RFC 5340, July 2008.

[RFC5613]  Zinin, A., Roy, A., Nguyen, L., Friedman, B., and D. Yeung, "OSPF Link-Local Signaling", RFC 5613, August 2009.

[RFC5820]  Roy, A. and M. Chandra, "Extensions to OSPF to Support Mobile Ad Hoc Networking", RFC 5820, March 2010.

12.2.  Informative References

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4576]  Rosen, E., Psenak, P., and P. Pillay-Esnault, "Using a Link State Advertisement (LSA) Options Bit to Prevent Looping in BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4576, June 2006.

[RFC4577]  Rosen, E., Psenak, P., and P. Pillay-Esnault, "OSPF as the Provider/Customer Edge Protocol for BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4577, June 2006.

[RFC4915]  Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P. Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF", RFC 4915, June 2007.

Author's Address

Anton Smirnov
Cisco Systems, Inc.
De Kleetlaan 6a
Diegem  1831
Belgium

Email: as@cisco.com