Internet Engineering Task Force                            A. Ballardie
INTERNET DRAFT                                               Consultant
                                                            C. Fletcher
                                               London Internet Exchange
                                                        26 January 2002

                Multicast Router-Switch Protocol (MRSP)

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

MRSP is a layer 2 protocol that runs over Ethernet. Together with minor enhancements to PIM-SM [PIM] and PIM-SSM [SSM], it can be used to build uni-directional multicast forwarding state in Ethernet Switches that interconnect IP multicast routers. MRSP's forwarding state a) restricts (S,G) multicast traffic to only those routers that request it, and b) allows multiple ingress points for (S,G) multicast traffic by building multiple distribution trees per (S,G).

This allows network operators to restrict multicast traffic to a subset of routers on the switched Ethernet network, and provides the capability to implement AS-based multicast routing policies. These features are particularly relevant and important to switch-based multicast Internet Exchange points.

Worked examples accompanying this draft are available at: www.linx.net/chris/mrsp/

Ballardie/Fletcher                                              [Page 1]

INTERNET-DRAFT            Expires: July 2002                January 2002

1. Introduction

IGMP "snooping" Switches [SNOOP], layer 2 protocols such as GMRP [IEEE], and some proprietary protocols (e.g. Cisco's CGMP) constrain IP multicast traffic flow from Switches to group member hosts, but no IETF protocol exists to restrict IP multicast traffic between switches or from switches to routers. This draft addresses these issues, as well as the issue of inter-domain multicast policy routing.

This draft builds on the Multicast-Friendly Internet Exchange architecture draft [MIX] published in 1999. That draft was motivated by the fact that the multicast-enabled part of the Internet, the MBONE [MBONE], which comprised one flat, virtual, and largely tunnelled routing domain, had outgrown itself from a manageability point of view, and a set of conventions was needed to transition the MBONE to a network architecture similar to that of the unicast Internet. This architecture comprises multicast-capable Autonomous Systems (ASs) interconnecting at Internet Exchange points.

It is IETF recommended practice that multicast routing between ASs be source based and explicit join (e.g. PIM-SM [PIM]); flood & prune multicast routing is not recommended between ASs. MRSP ensures that multicast traffic flow across the Exchange's (layer 2) switched infrastructure reflects multicast routing policy. Multicast routing policy expression and enforcement are currently limited in the switched Internet Exchange environment, for the reasons explained in section 2.1. To these ends, this draft revisits the issues of running IP multicast over Internet Exchange points.

2. Review of MIX

The goal of MIX [MIX] was to define a set of protocols and operating procedures to enable native, scalable, policy-based multicast routing and traffic forwarding over an Internet Exchange point, without imposing any constraints on intra-domain multicast.
MIX recommended FDDI or ATM PVCs rather than switches as the layer 2 medium, due to "a number of unresolved issues" with switches at that time. Layer 2 switching technology has since advanced considerably, particularly with respect to its multicast "awareness" and multi-layer capabilities, and there appears to be a growing trend toward deploying high-speed switching technology at Internet Exchange points. MRSP, the protocol proposed here, depends entirely on the use of Ethernet-based Switches at the Exchange points.

In support of its objectives, MIX proposed that Exchange members use M-BGP [BGP-4+] to support policy-based multicast paths (reachability), PIM-SM [PIM] as the multicast routing protocol across the Exchange, and MSDP [MSDP] to announce active sources between domains.

2.1 Limitations of the Current (MIX) Architecture

All routers attached to an Exchange point switch typically belong to the same (Virtual) LAN / subnet. With multiple routers reachable via a single interface, an interface-based RPF check cannot guarantee that a multicast packet was forwarded by the M-BGP peer that is the RPF neighbour for the packet's source, and therefore multicast routing policy cannot be enforced. Consequently, Exchanges implicitly impose fully meshed M-BGP peerings between multicast Exchange members.

The problem is compounded by the use of PIM's ASSERT mechanism across the Exchange: on a multi-access LAN, when multiple routers would otherwise forward traffic from a source S onto the LAN, PIM's ASSERT mechanism elects a single forwarder for S. Consequently, PIM, rather than M-BGP (which was designed for that purpose), ends up dictating multicast policy.

3. MRSP

On a multi-access LAN, the issues of multicast traffic containment and implementation of multicast routing policy can only be addressed if the layer 2 infrastructure is capable of supporting multiple layer 2 multicast distribution trees per source.
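The limitation described in section 2.1, which motivates the layer 2 approach above, can be illustrated with a minimal Python sketch. It is not part of MRSP: the addresses, the single-source RPF tables, and the idea of also checking the forwarding neighbour's MAC address are illustrative assumptions. An interface-based check accepts any packet from S arriving on the shared Exchange interface, whichever member forwarded it; only a check on the forwarding neighbour's identity separates the RPF M-BGP peer from another router on the same LAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str        # IP source address S of the multicast packet
    in_interface: str  # router interface on which the packet arrived
    l2_sender: str     # MAC address of the router that forwarded it onto the LAN


# Hypothetical RPF information for one source S, as derived from an M-BGP route.
RPF_INTERFACE = {"192.0.2.1": "ix0"}                    # shared Exchange interface
RPF_NEIGHBOUR_MAC = {"192.0.2.1": "00:00:5e:00:53:01"}  # MAC of the RPF M-BGP peer


def interface_rpf_ok(pkt: Packet) -> bool:
    """Classic interface-based RPF check: only the arrival interface is tested."""
    return RPF_INTERFACE.get(pkt.source) == pkt.in_interface


def neighbour_rpf_ok(pkt: Packet) -> bool:
    """Stricter check: the packet must also have been forwarded by the RPF peer."""
    return interface_rpf_ok(pkt) and RPF_NEIGHBOUR_MAC.get(pkt.source) == pkt.l2_sender


# The RPF peer and a different Exchange member both forward S's traffic onto the LAN.
from_rpf_peer = Packet("192.0.2.1", "ix0", "00:00:5e:00:53:01")
from_other_member = Packet("192.0.2.1", "ix0", "00:00:5e:00:53:02")

print(interface_rpf_ok(from_rpf_peer), interface_rpf_ok(from_other_member))  # True True
print(neighbour_rpf_ok(from_rpf_peer), neighbour_rpf_ok(from_other_member))  # True False
```

MRSP places a comparable check (on the frame's L2 source address) in the Switches rather than in the routers, as described in section 3.4.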
MRSP is a layer 2 (Ethernet) protocol designed for building and maintaining layer 2 multicast distribution trees.

3.1 MRSP Requirements

MRSP must be administratively enabled on switch ports and multicast router interfaces. In an infrastructure containing multiple switches, all switches must enable MRSP on all ports that connect to routers, and all routers must enable MRSP on all interfaces that connect to switches.

Enabling MRSP on one or more switch ports changes the default multicast forwarding behaviour on those ports to the following: all received multicast traffic is filtered unless MRSP forwarding state exists for the particular (S,G) combination.

Router interfaces with MRSP enabled are required to run a layer 3 explicit join multicast routing protocol supporting (S,G) joins/prunes (e.g. PIM-SM, PIM-SSM).

MRSP implementations should not restrict multicast traffic for link-local groups (224.0.0.0/24). MRSP implementations should also be configurable so as NOT to restrict other groups or group ranges, if so desired.

MRSP is a layer 2 protocol encapsulated directly in Ethernet, and will require the assignment of a new Ethertype.

3.2 MRSP Functional Overview

MRSP is designed to build uni-directional layer 2 multicast distribution trees. Within the scope of a switched Ethernet network, these trees are source based. To satisfy multicast routing policy, it may be necessary to build and maintain multiple trees per source.

MRSP realises these layer 2 multicast distribution trees by adding, modifying, and deleting multicast forwarding state in the Switches that interconnect routers; MRSP Joins and Prunes serve this purpose. Routers maintain no MRSP forwarding state.

MRSP's actions (events) are initially driven by the layer 3 explicit join multicast routing protocol, e.g. PIM-SM or PIM-SSM. This implies that (most) MRSP messages are triggered by routers.

For MRSP to function as described, minor enhancements to PIM-SM (and PIM-SSM) are required.
These are as follows:

o  As already explained, PIM ASSERTs elect a single ingress domain per source, thereby limiting multicast routing policy. Multicast routing protocols, including PIM, were not designed as policy routing protocols; M-BGP was designed for this purpose. We therefore recommend that PIM's ASSERT mechanism be disabled when running PIM over a multicast Exchange point. MRSP Switch forwarding state allows multiple ingresses per multicast source without causing duplicates, and thereby puts multicast routing policy back under the control of M-BGP.

o  A multicast Exchange member router MAY (and indeed SHOULD) apply policy verification to a received L3 Join, and should be able to reject as well as accept the Join. To provide this capability, we recommend that PIM be enhanced with a JOIN_NACK for use in this type of environment.

o  Finally, since MRSP makes the shared LAN behave more like a series of point-to-point links, it is no longer necessary (or, at an Exchange point, desirable) to multicast L3 joins/prunes. We therefore suggest that PIM be modified to incorporate unicast joins/prunes, perhaps as a configurable option.

MRSP is a soft-state protocol: MRSP Joins must be refreshed periodically; otherwise, the Switch state they instantiated will expire.

3.3 MRSP Protocol

There is one MRSP protocol component for Routers, and another for Switches. MRSP devices use a HELLO protocol (to be defined) to monitor the "liveness" of adjacent MRSP devices.

Layer 2 MRSP Joins/Prunes are explicitly addressed to a layer 2 group address listened to by all Switches (assignment to be requested). These L2 Joins/Prunes are processed by every Switch on the L2 spanning tree path between the originating router and the intended recipient router; both routers are identified in the MRSP Join/Prune.
It follows that L2 Joins/Prunes are forwarded hop-by-hop over the spanning tree path leading to their intended L2 destination.

An MRSP (layer 2) Join is triggered when a router receives (and accepts) a layer 3 (S,G) Join on an MRSP interface. The MRSP Join travels in the reverse direction of the corresponding L3 Join, and establishes or augments MRSP forwarding state in the switches it traverses. A switch's MRSP forwarding entry has one upstream port (the port over which the MRSP Join arrives) and one or more downstream ports (the port(s) over which the MRSP Join is forwarded).

Since MRSP Joins travel downstream with respect to the source, MRSP Joins can be aggregated at their upstream point of origin, thus reducing MRSP message overhead. An aggregated MRSP Join carries a list of intended downstream recipient routers. As this Join travels towards the recipient routers, the spanning tree paths to different recipient routers diverge. At each point of divergence, the MRSP Join is duplicated and each copy is modified as necessary (e.g. so that it lists only the recipients reachable over its outgoing port).

An MRSP (layer 2) Prune is triggered by a router immediately after the router sends an L3 Prune. An MRSP Prune flows in the same direction as the corresponding L3 Prune, and removes the port over which it arrived from the corresponding MRSP forwarding entry. As with a layer 3 Prune, a switch does not forward an MRSP Prune upstream if, after processing the Prune, it still has downstream forwarding state for the same (S,G).

If an MRSP Prune is lost, the corresponding Switch state will eventually time out through lack of MRSP Join refresh (since the L3 Joins have ceased, so too have the L2 MRSP Joins).

When MRSP is disabled on a router's interface, the last MRSP message the router sends on that interface is an MRSP BYE message. This message may be unicast or multicast to the Switch, but it is not forwarded by the Switch.
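The switch-side Join and Prune processing described in sections 3.2 and 3.3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the draft defines no message formats or timer values, so the hold time, the `port_towards` table (a hypothetical map from a router's MAC address to the local port on the spanning tree path towards it), and keying state on (L2 source, S, G) so that multiple trees per (S,G) stay separate are all assumptions. An aggregated Join is split per outgoing port so that each copy lists only the recipients reachable over that port; a Prune removes its arrival port and is forwarded upstream only when no downstream ports remain; downstream ports that are not refreshed time out, which also cleans up after a lost Prune.

```python
from collections import defaultdict
from dataclasses import dataclass, field

HOLD_TIME = 210.0  # seconds; hypothetical, the draft does not define MRSP timers


@dataclass
class Entry:
    upstream_port: int
    # downstream port -> time at which it expires unless refreshed by another Join
    downstream: dict = field(default_factory=dict)


class MrspSwitch:
    def __init__(self, port_towards, hold_time=HOLD_TIME):
        self.port_towards = port_towards  # hypothetical: router MAC -> local port towards it
        self.hold_time = hold_time
        self.entries = {}                 # (l2_source, S, G) -> Entry

    def on_join(self, in_port, key, recipients, now):
        """Process an (aggregated) MRSP Join arriving on in_port.

        Returns {out_port: [recipients]}: one Join copy per diverging path,
        each listing only the recipients reachable over that port."""
        copies = defaultdict(list)
        for router in recipients:
            out_port = self.port_towards[router]
            if out_port != in_port:       # never send a Join back where it came from
                copies[out_port].append(router)
        if copies:
            entry = self.entries.setdefault(key, Entry(upstream_port=in_port))
            for out_port in copies:       # create or refresh soft state
                entry.downstream[out_port] = now + self.hold_time
        return dict(copies)

    def on_prune(self, in_port, key):
        """Process an MRSP Prune; return the port to forward it upstream on, or None."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.downstream.pop(in_port, None)
        if entry.downstream:              # other downstream state remains: absorb it
            return None
        del self.entries[key]
        return entry.upstream_port

    def expire(self, now):
        """Drop downstream ports whose Joins were not refreshed in time."""
        for key in list(self.entries):
            entry = self.entries[key]
            entry.downstream = {p: t for p, t in entry.downstream.items() if t > now}
            if not entry.downstream:
                del self.entries[key]


# Join arrives on port 0; r1 is reachable via port 1, r2 and r3 via port 2.
key = ("00:00:5e:00:53:01", "192.0.2.1", "232.1.1.1")
sw = MrspSwitch({"r1": 1, "r2": 2, "r3": 2}, hold_time=10.0)
copies = sw.on_join(0, key, ["r1", "r2", "r3"], now=0.0)
print(copies)                     # {1: ['r1'], 2: ['r2', 'r3']}
first_prune = sw.on_prune(1, key)
print(first_prune)                # None: port 2 still has downstream state
last_prune = sw.on_prune(2, key)
print(last_prune)                 # 0: last downstream port gone, forward upstream

# Soft state: only r1's path is refreshed, so port 2 times out.
sw2 = MrspSwitch({"r1": 1, "r2": 2}, hold_time=10.0)
sw2.on_join(0, key, ["r1", "r2"], now=0.0)
sw2.on_join(0, key, ["r1"], now=8.0)
sw2.expire(10.0)
print(sw2.entries[key].downstream)  # {1: 18.0}
```

The per-port Prune rule follows the draft literally: a Prune arriving on port 2 implies that the next switch towards r2 and r3 has already absorbed any Prune for which it still had other downstream state.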
On receipt of an MRSP BYE message, the Switch resumes its default (non-MRSP) multicast forwarding behaviour on the port over which the message arrived.

3.4 MRSP Multicast Forwarding State

An MRSP Switch multicast forwarding entry consists of: the layer 2 source address, the layer 3 source address, the layer 3 group address, and an out-port-list.

A Switch uses its MRSP forwarding state as follows. When a multicast frame arrives via an MRSP Switch port, the frame's L2 source address is used as the primary index into the forwarding table; several entries may exist per L2 source address. The switch RPF checks the L2 source address to ensure that the frame arrived on the correct port for that source. The layer 3 source and destination IP addresses in the datagram must then be matched against the L3 (IP) source and group fields in the forwarding table to uniquely identify the correct forwarding entry. If all these tests succeed, a copy of the frame is forwarded over each port listed in the out-port-list; otherwise, the frame is discarded.

3.5 MRSP Message Types

There are 5 types of MRSP message:

o  type 1: HELLO, used to establish the MRSP "liveness" of a neighbouring MRSP device.

o  type 2: JOIN, instigated by routers, used for establishing/refreshing Switch multicast forwarding state.

o  type 3: PRUNE, instigated by routers, used for modifying/deleting Switch multicast forwarding state.

o  type 4: ERROR, used for signalling MRSP error conditions to a neighbouring MRSP device.

o  type 5: BYE, used to inform an MRSP neighbour that MRSP is being disabled on the sender's interface.

4. Summary

This draft has described a new Router-Switch protocol for Ethernet, MRSP, which, when combined with our suggested enhancements to PIM (SM and SSM), can be used to build uni-directional multicast forwarding state in Ethernet Switches that interconnect IP multicast routers.
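The forwarding procedure of section 3.4, combined with the default filtering and link-local exemption of section 3.1 and the message types of section 3.5, can be sketched in Python as below. Only the type numbers come from the draft; everything else is a hypothetical illustration. The draft does not say how a Switch learns the expected port for an L2 source, so `rpf_port` stands in for, e.g., the Switch's learned address table, and `exempt_ranges` models the optional configuration that exempts further groups from MRSP filtering.

```python
import ipaddress
from dataclasses import dataclass
from enum import IntEnum


class MrspType(IntEnum):
    """MRSP message types as numbered in section 3.5."""
    HELLO = 1
    JOIN = 2
    PRUNE = 3
    ERROR = 4
    BYE = 5


LINK_LOCAL = ipaddress.ip_network("224.0.0.0/24")  # not restricted by MRSP


@dataclass(frozen=True)
class Frame:
    l2_src: str   # Ethernet source address
    in_port: int  # Switch port the frame arrived on
    ip_src: str   # IP source address S
    ip_dst: str   # IP destination (group) address G


@dataclass(frozen=True)
class FwdEntry:
    ip_src: str
    ip_group: str
    out_ports: frozenset


class MrspForwarder:
    def __init__(self, rpf_port, exempt_ranges=()):
        self.rpf_port = rpf_port  # hypothetical: L2 source -> expected arrival port
        self.exempt = [ipaddress.ip_network(r) for r in exempt_ranges]
        self.table = {}           # L2 source -> list of FwdEntry (several allowed)

    def add(self, l2_src, entry):
        self.table.setdefault(l2_src, []).append(entry)

    def is_exempt(self, group):
        g = ipaddress.ip_address(group)
        return g in LINK_LOCAL or any(g in net for net in self.exempt)

    def lookup(self, frame):
        """Return the ports to copy the frame to; an empty set means discard.

        Exempt groups return None: they bypass MRSP and receive the Switch's
        ordinary (non-MRSP) multicast handling."""
        if self.is_exempt(frame.ip_dst):
            return None
        entries = self.table.get(frame.l2_src, [])            # primary index: L2 source
        if self.rpf_port.get(frame.l2_src) != frame.in_port:  # L2 RPF check
            return frozenset()
        for e in entries:                                     # match L3 (S,G)
            if e.ip_src == frame.ip_src and e.ip_group == frame.ip_dst:
                return e.out_ports
        return frozenset()                                    # default MRSP behaviour: filter


fwd = MrspForwarder({"00:00:5e:00:53:01": 1})
fwd.add("00:00:5e:00:53:01", FwdEntry("192.0.2.1", "232.1.1.1", frozenset({2, 3})))
print(sorted(fwd.lookup(Frame("00:00:5e:00:53:01", 1, "192.0.2.1", "232.1.1.1"))))  # [2, 3]
print(fwd.lookup(Frame("00:00:5e:00:53:01", 4, "192.0.2.1", "232.1.1.1")))  # frozenset(): RPF failure
print(fwd.lookup(Frame("00:00:5e:00:53:01", 1, "192.0.2.1", "224.0.0.13")))  # None: link-local, e.g. PIM Hellos
```

Keying the table on the L2 source first mirrors the draft's description of the L2 source address as the primary index; the L3 (S,G) match then selects among the several entries that may share one L2 source.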
The advantages of MRSP (combined with the suggested enhancements to PIM) are that a) (S,G) multicast traffic is restricted to only those routers that request it, and b) multiple ingress points are allowed per multicast source. The end result is that multicast traffic flows only where it is wanted, saving network resources, and that multicast policy can be applied and enforced. No existing IETF protocol currently supports these features, either separately or together. Yet these features are particularly relevant and important to inter-domain multicast, given that multicast domains increasingly interconnect at switch-based Internet Exchange points.

References

[MIX]    H. LaMaster, S. Schulz, J. Meylor, D. Meyer. Multicast-Friendly Internet Exchange (MIX). Work in progress, June 1999.

[SNOOP]  M. Christensen, F. Solensky. IGMPv3 and IGMP Snooping Switches. Work in progress, February 2001. draft-ietf-idmr-snoop-00.txt

[IEEE]   IEEE 802.1D, see http://www.ieee802.org/1/pages/802.1D.html

[MBONE]

[PIM]    W. Fenner, M. Handley, H. Holbrook, I. Kouvelas. Protocol Independent Multicast - Sparse Mode Protocol Specification. Work in progress, March 2001. draft-ietf-pim-sm-v2-new-02.txt,ps.

[SSM]    S. Bhattacharyya et al. An Overview of Source-Specific Multicast (SSM) Deployment. Work in progress, July 2000. draft-bhattach-pim-ssm-00.txt

[BGP-4+] T. Bates, R. Chandra, D. Katz, Y. Rekhter. Multiprotocol Extensions for BGP-4. RFC 2283.

[MSDP]   D. Farinacci et al. Multicast Source Discovery Protocol. Work in progress, January 2000. draft-ietf-msdp-spec-02.txt

Author Information

Tony Ballardie
Consultant
ABallardie@acm.org

Chris Fletcher
London Internet Exchange
3 Park Road
Peterborough PE1 2UX
UK
chris@linx.net

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC editor function is currently provided by the Internet Society.