INTERNET DRAFT Ali Boudani, Bernard Cousin Expires: April 2005 Jean-Marie Bonnin IRISA-France October 2004 An Effective Solution for Multicast Scalability: The MPLS Multicast Tree (MMT) Status of this Memo By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668." Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsolete by other documents at anytime. It is inappropriate to use Internet Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract A multicast router should keep forwarding state for every multicast tree passing through it. The number of forwarding states grows with the number of groups. In this paper, we present a new approach, the MPLS multicast Tree (MMT), which utilizes MPLS LSPs between multicast tree branching node routers in order to reduce forwarding states and enhance scalability. In our approach only routers that are acting as multicast tree branching node routers for a group need to keep forwarding state for that group. All other non- branching node routers simply forward data packets by traffic engineered unicast routing using MPLS LSPs. We can deduce that our approach can be largely deployed because it uses for multicast traffic the same unicast MPLS forwarding scheme. We will briefly discuss the multicast scalability problem, related works and different techniques for forwarding state reduction. We discuss also the advantages of our approach, and conclude that it is feasible and promising. Finally, we analytically evaluate our approach. 1 Introduction Multicast has become increasingly important with the emergence of network-based applications such as IP telephony, video conferencing, Boudani et al. [Page 1] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 distributed interactive simulation (DIS) and software upgrading. Using the multicast services, a single transmission is needed for sending a packet to n destinations, while n independent transmissions would be required using the unicast services. A multicast routing protocol should be simple to implement, scalable, robust, use minimal network overhead, consume minimal memory resources, and inter-operate with other multicast routing protocols. Multicast suffers from scalability problem with the number of concurrently active multicast groups because it requires a router to keep forwarding state for every multicast tree passing through it and the number of forwarding states grows with the number of groups. Scalability can be evaluated not only in terms of the overhead growth in the presence of a large number of groups but also by the number of participants per group and by groups for which the set of participants changes often over time. Overhead can be measured in terms of memory resources (in routers) as they relate to routing states maintained per group and can be measured also by bandwidth resources in terms of control or signaling messages per group and also by processing power. Recently, significant research effort has focused on the multicast scalability problem. Some schemes attempt to reduce forwarding state by tunneling [1] or by forwarding state aggregation [2]. Both these works attempt to aggregate routing state after this has been allocated to groups. Other architectures aim to eliminate forwarding states at routers either completely by explicitly encoding the list of destinations in the data packets, instead of using a multicast address [3] or partially by using branching node routers in the multicast tree [4,5]. The Xcast proposal [3], used for small groups, eliminates the need for forwarding states. The source encodes the list of destinations in the Xcast header, and then sends the packet to a router. Each router along the way parses the header, partitions the destinations based on each destination's next hop, and forwards a packet with an appropriate Xcast header to each of the next hops. In SEM [4], we proposed that the source uses unicast encoding for multicast packets and sends them to its next hop branching node routers. Each branching node router acts as a source and packets travel from a branching node router to another. A special mechanism was introduced to inform each branching node router about its next hop branching node routers for a group. REUNITE [5], uses recursive unicast trees to implement multicast service. REUNITE does not use class D IP addresses. Instead, both group identification and data forwarding are based on unicast IP addresses. Only branching node routers for a group need to keep multicast forwarding state. All other non-branching node routers simply forward data packets by unicast routing. We think that using the branching node routers to forward multicast data packets in unicast mode is very efficient in order to reduce forwarding state and thus enhance scalability. The main problem is how to ensure that a branching node router has a complete knowledge about its next hop branching node routers. And another issue is how can we reduce the effect of encapsulated multicast packets in Boudani et al. [Page 2] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 unicast packets. We think that a network information manager system (NIMS) in each domain can be used to resolve the first problem while the multi protocol label switching (MPLS) [9] presents an efficient solution for the second problem. A framework for MPLS multicast traffic engineering that explain IP multicast deployment in MPLS environment was proposed by Ooms et al [6]. Another studies about MPLS and multicast proposed by Farinacci et al. [7] explain how to use PIM to distribute MPLS labels for multicast routes. Other expired Internet drafts studied the same subject [10]. The latest interesting proposal is aggregated multicast [8]. The key idea of aggregated mulicast is that, instead of constructing a tree for each individual multicast session in the core network, one can have multiple multicast sessions share a single aggregated tree to reduce multicast state and, correspondingly, tree maintenance overhead at network core. In this proposal there was two requirements: (1) original group addresses of data packets must be preserved somewhere and can be recovered by exit nodes to determine how to further forward these packets;(2) some kind of identification for the aggregated tree which the group is using must be carried and transit nodes must forward packets based on that. Instead of using IP encapsulation as in SEM, which of course, adds complexity and processing overhead, a potentially much better possibility is to use MPLS [9]. To handle aggregated tree management and matching between multicast groups and aggregated trees, a centralized management entity called tree manager is introduced. In group to aggregated tree matching, complication arises when there is no perfect match or no existing aggregated tree covers a group. There was a disadvantage in leaky matching because certain bandwidth should be wasted to deliver data to nodes that are not involved for the group. In our proposal we intend to resolve also this problem of leaky matching. Using MPLS with multicast has many benefits not only for reducing multicast forwarding states but also for traffic engineering and QoS issues. In this paper, we only focus on the scalability problem. We propose a novel approach to reduce multicast state, the MPLS multicast Tree. This approach uses MPLS LSPs between multicast tree branching node routers in order to reduce forwarding states and enhance scalability. The main difference with previous approaches is that we use only multicast states for branching routers in the multicast tree. Other routers between two branching nodes don't have multicast states. Unicast is used between two branching routers. This way the total number of multicast forwarding states may be significantly reduced. By using MPLS LSPs between branching routers, memory resource is reduced also and no need for encapsulation techniques. The MPLS forwarding process conserve the router resources also. In this paper, we examine several design and implementation issues of our scheme and present an evaluation for our protocol. The remainder of this paper is organized as follow. In section 2 the concept of MPLS multicast aggregation is described and some implementation related issues are discussed. Section 3 contains the approach analysis and an evaluation for its forwarding state and Boudani et al. [Page 3] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 messaging overhead. Section 4 is a summary followed by a list of references. 2 Observation and proposal: MPLS Multicast Tree The key idea of MPLS multicast tree is that, instead of having multicast forwarding states in all routers in a constructed tree for each individual multicast session in the core network (backbone), one can have only multicast forwarding states in branching node routers of the tree. By using the branching node routers, multicast forwarding states are reduced, correspondingly, the tree maintenance overhead at network core. The MPLS multicast tree is targeted basically for intra-domain multicast, but it is not limited for that only since it is based on MPLS, and can be used also for inter-domain multicast. 2.1 Observation In conventional multicast, routers on the multicast tree located between the source and a group member should maintain multicast states for the group. Let's consider the network illustrated in fig.1 (single multicast session with only one source and one group). Suppose that there are group members at router R5, then routers R1, R2 R3, R4 should all have multicast state for the group G even if they don't have directly connected members. S | R1 | R2 | R3 | R4 / \ / \ R6 R5 Fig.1 : Multicast tree (one source, one group) Let's take the same network but with multiple multicast sessions (one single source and two groups). We suppose that there are members for group G1 at R5 and members for group G2 at R6. Routers R1, R2, R3, R4 should maintain states for groups G1 and G2 even if they don't have directly connected group members. When the number of group increase in time multicast states became harder and harder. Fig.2 represents a multicast topology (constructed tree) resulting from a traceroute experiments [5]. In the entire multicast tree there are 8 branching node routers out of 97 routers. Fig.2: Traceroutre experiment from a source to 15 other sites Boudani et al. [Page 4] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 One can conclude that maintaining multicast forwarding states in non branching node routers is a waste of resources and that only branching node routers for multicast groups should maintain multicast forwarding states. In conventional multicast a router can discover that it is a branching node router for a group but it doesn't have any idea about its next hop branching node routers for that group. A second observation is that even when 2 multicast trees share multiple links, multicast forwarding states on routers for the shared links will not be aware of that and there is no possibility for aggregation. In REUNITE there was a completely unicast approach for delivering multicast packets. However, many of the LAN and WAN technologies have native support for multicast. Sending individual unicast messages to each of the receivers in a multicast-capable subnet as ethernet is very inefficient. Multicast packets should always contain the IP multicast header. In SEM we have introduced a tunneling process which can be implemented to conventional multicast too but messages between branching node routers in order to construct tunnels can be considered as expensive overheads. Our proposal will take all these observations into account. Overheads are the header processing time that take a router to determine the routing and also the size of multicast routing table. These two overheads are related. Our goal is to minimize the multicast overheads in each router. In our proposal, only branching node routers will contain the multicast routing table. All other routers do not need the routing states. In aggregated multicast[8], the solution was to have a single shared aggregated multicast tree but as mentioned in discussions complication arises when there is no perfect match or no existing tree covers a group. The disadvantage here is that certain bandwidth is wasted to deliver data to nodes that are not involved for the group. Bandwidth can crucial factors for provisioning QoS in multicast networks and even for best effort Internet. 2.2 proposal In order to inform a branching node router about its next hop branching node routers for a group, each domain should contain a network information manager system (NIMS) for each group, charged to collect join messages from all group members in that domain. After collecting all join and prune messages, the NIMS should compute the multicast tree for that group in the domain. The computation for a group means discovering all branching node routers and the next hop branching node routers for each group. We should note that scalable multicast protocols like PIM-SM (shared tree) and CBT use a core router similar to the one used in our approach. In our approach we are suggesting that the NIMS should collect join messages from all group members and have a complete overview about the multicast network. Boudani et al. [Page 5] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 The NIMS send then messages to all branching node routers to inform them about their next hop branching node routers. On receiving this message, a branching node router creates a multicast forwarding state for the multicast session. Once branching node routers and their next hop are identified, packets will be send from a branching node router to another until achieving their target. Tunneling between branching node routers was studied in [1], [4], and [5] but was judged very expensive and very complicated since multicast packets should be encapsulated as unicast packets and send after that over the tunnel. Based on observations, we propose a new approach, the MPLS multicast tree, which utilizes MPLS dynamically established LSPs between multicast tree branching node routers in order to reduce forwarding states and enhance scalability. When a multicast packet arrives to the ingress router of a MPLS domain, the packet is analyzed according to its multicast IP header. The router should determine who are the next hop branching node routers for that packet. Based on this information, multiple copies of the packets are generated and a MPLS label is pushed to the multicast packet corresponding to each next branching node router. Then a labeled multicast packet will not be different from a labeled unicast packet. When arriving to a next hop branching node router, the label is pulled and again the same process is repeated. This process should be repeated until the packet arrives to its destination. When arriving to a LAN, the packet unlabeled can be delivered by conventional multicast protocols according to IGMP informations. Note that labeled multicast packets will be examined at each branching router and that is why a multicast packet on non-branching router will be treated as a MPLS labeled unicast packet. So no extra charges are needed for multicast packets since the MPLS tables already exist in routers. The examination of a multicast packets in intermediate routers on the tree is considered as a disadvantage of this approach but we leave this issue for further studies. When a branching node router receive the message from the NIMS it should calculate the label corresponding to the next hop router and it should add a label with an interface to the (S, G) entry in the branching node router. In this way we can ensure that there is no extra overhead. 2.3 Protocol evaluation The efficiency of the MMT can be evaluated in terms of state information requirement, tree cost, data processing efficiency, and control overhead. In this analysis, we will focus on the state information requirement, and control overhead of the MMT. The state information requirement can be measured using the average multicast forwarding table size. State information analysis includes also the MPLS overhead. The control overhead can be measured using the total number of control packets sent to all the links that are needed to maintain the protocol states. Boudani et al. [Page 6] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 2.4 MPLS overhead In MMT, MPLS as unicast traffic engineering tool will be used also as multicast traffic engineering tool. IN MMT, multicast will take all the benefits of MPLS, that already been used for unicast, and will introduce no extra overheads. A multicast packet will be treated like a unicast packet and that is an advantage. 2.5 Control overhead The control overhead of MMT can be measured in terms of average number of control messages sent per link or the total percentage of bandwidth spent on control traffic. Boudani et al. [Page 7] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 In PIM-SSM, each distribution tree needs to be refreshed periodically and that to rapidly detect a router that belong to a tree, goes down. In our approach, we don't need the join and prune messages periodically. When a router goes down, the unicast tables will detect it and thus no extra information is needed. 3 related issues 3.1 The tree manager One can argue that using NIMS and sending join and leaves messages to it are weak points in our approach. But we have to mention that the most scalable approaches in conventional multicast like PIM-SM and CBT use similar techniques. In PIM-SM and CBT, a router should act as a Rendez-Vous point that collect join messages for each group and these join messages will be refreshed periodically. But in our approach, messages will not be sent periodically. Also a few messages will be sent from the NIMS to branching routers, thus there is no remarkable overhead. We think that our approach has an advantage over these conventional multicast protocols since we don't force multicast packets to be sent all the way to the Rendez-Vous point and next to receivers. By contrast packets follow always the constraints shortest path. Besides there is no switching between shared tree and source specific tree. NIMS could be unique for each group like the case in CBT and PIM-SM. We can also imagine that there is multiple NIMSs that can communicate with each other to update informations about topology. Finally, in OSPF networks the NIMS can easily collect informations. 3.2 MPLS issues 3.2.1 Multicast aggregation In conventional multicast, it is not possible to aggregate multicast IP addresses. Receivers can be located anywhere in the Internet, there is no other alternative than having one entry by multicast IP address in the multicast routing table. Some scheme tried to aggregate by using leaky trees. Since in MMT, we are using MPLS, aggregation of multicast IP addresses can be transformed to label aggregations. There is no wasted bandwidth to realize aggregation, the aggregation overhead is zero. We have an advantage here that the traffic engineer will control perfectly its network. Multicast groups may share some links in their multicast trees and aggregation introduce benefits. 3.2.2 Loop detection In MMT, multicast loop detection for multicast traffic is the same loop detection in unicast since packets will be send using MPLS labels between branching node routers. Boudani et al. [Page 8] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 3.2.3 LSP constructions One can say that LSP construction is very expensive if there is no multicast traffic, but these LSPs are already constructed and used by unicast traffic also and thus no extra LSP construction overheads. There are no MPLS extra states for multicast traffic needed. 3.2.4 Multicast LDP extensions There are no modifications to LDP. Labels used for multicast packets are the same used for unicast packets and that is the major difference with all other MPLS multicast approaches. We have just to be sure that at branching node routers, when a packet arrive, the label is pulled up and the new labels (corresponding to each next hop branching node router) are pushed in. 3.2.5 Multicast load balancing In MMT, multicast packets will be sent from a branching node to another, but we are not obliged to follow the same path constructed by conventional multicast protocols. By using different LSPs, load balancing is ensured. 3.3 Using MMT for inter-domain multicast Join messages in domains are sent to the NIMS. Each domain had its own NIMS for the group. To receive packets from sources belonging to other domains, the NIMS, after calculating which border router will transmit the packets, will sent a message to create forwarding state for the group in that border router. Due to the creation of this forwarding state, the border router will contact border routers in other domains with a normal (S, G) join message. These border routers will contact NIMS routers in their domains. Other domains don't need to be implementing MMT. Once a packet arrives in a border router for a particular domain, a label is pushed down and sent to all receivers. 3.4 Other ideas (difference between multicast and unicast traffic) In our approach, multicast packets will follow the same path as unicast packets. One can say that multicast packets should follow different paths from unicast packets, but this solution is too heavy because encoding technique described in [11] will be used. There is no meaning then to have the same labels for unicast and multicast and there are extra forwarding states in the routers. 3.5 Switching between L2 and L3 in branching node router One may say that switching between L2 and L3 in branching node router eliminates the gain obtained by using the MPLS. This is true Boudani et al. [Page 9] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 if there is a lot of branching node routers in the multicast tree in a domain. But, it is also true that it is better than having multicast states in all routers. But as a solution for this problem. We can imagine that every multicast session (S,G) is associated at the ingress router of a domain to a label (this is the NIMS mission to reserve labels for sessions). this label represents the group only in the domain, so there is no need for label allocation. The modified label distribution protocol can manage this lable distribution. One may argue that we can not attribute and associate labels to all multicast sessions since the multicast labels are limited (20 bits). This is true, but the traffic engineer should manage the demands and include the QoS and differtiated services to serve the sessions in the way that more important sessions passes first. This is very convenient with the Internet best-effort logic. 4 Conclusion In this paper, we present a new approach, the MPLS multicast tree, which utilizes MPLS LSPs between multicast tree branching node routers in order to reduce forwarding states and enhance scalability. In our approach only routers that are acting as multicast tree branching node routers for a group need to keep forwarding state for that group. All other non-branching node routers simply forward data packets by traffic engineered unicast routing using MPLS LSPs. In our approach, we do not need MPLS multicast labels different from those used for unicast. The MPLS labels used for unicast are used for multicast traffic also. Of course the disadvantage is that multicast packets need to be examined at the IP level in each branching node router on the multicast tree. We leave this issue for further studies. We briefly discussed the multicast scalability problem and related works for forwarding state reduction. We discussed also the advantages of our approach, and concluded that it is feasible and promising. Finally, we analytically evaluate our approach and we deduced that our approach could be largely deployed because it uses for multicast the same unicast MPLS forwarding scheme. References [1] J. Tian et al., Forwarding state reduction for sparse mode multicast Communication,Proceedings of IEEE INFOCOM, MArch 1998. [2] P. Radoslavov et al., Exploiting the bandwidth-memory tradeoff in Boudani et al. [Page 10] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 multicast state aggregation, Technical report, USC Dept. of CS Technical Report 99-697, july 1999. [3] R. Boivie et al., Explicit Multicast (Xcast) Basic Specification, Internet draft, 2000. [4] A. Boudani et al., Simple Explicit Multicast(SEM), Internet draft, 2001. [5] I. Stoica, et al., "REUNITE: A recursive Unicast Approach to Multicast", http://www.ieee-infocom.org/2000/papers/613.ps. [6] D. Ooms et al., Framework for IP multicast in MPLS, Internet draft, december 2000 [7] D. Farinacci et al., Using PIM to distribute MPLS labels for multicast routes, Internet draft, november 2000 [8] A. Fei and al., Aggregated multicast: an approach to reduce multicast state, UCLA CSD Technical Report #010012, June 2001 [9] Multiprotocol label switching architecture, RFC3031, January 2001. [10] http://ardnoc41.canet2.net/mpls/drafts/index.html. [11] E. Rosen et al., MPLS label stack encoding, RFC3032, January 2001. Authors Addresses Ali Boudani IRISA/INRIA Rennes Campus Universitaire de Beaulieu Avenue du General Leclerc 35042 Rennes France Phone : (33) 02 99 84 25 37 Fax : (33) 02 99 84 25 29 E-mail : aboudani@irisa.fr Bernard Cousin IRISA/INRIA Rennes Campus Universitaire de Beaulieu Avenue du General Leclerc 35042 Rennes France Phone : (33) 02 99 84 73 33 Fax : (33) 02 99 84 71 71 E-mail : bcousin@irisa.fr Jean Marie Bonnin IRISA/INRIA Rennes Campus Universitaire de Beaulieu Avenue du General Leclerc 35042 Rennes France Phone : (33) 02 99 12 70 07 Fax : (33) 02 99 12 70 30 Boudani et al. [Page 11] INTERNET-DRAFT The MPLS Multicast Tree (MMT) October 2004 E-mail : jm.bonnin@enst-bretagne.fr Copyright Statement Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights." "This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE." Expiration Date: April 2005