Network Working Group                                   Dino Farinacci
Internet Draft                                           Yakov Rekhter
Expiration Date: December 1999                           Eric C. Rosen
                                                   Cisco Systems, Inc.
                                                             June 1999


        Using PIM to Distribute MPLS Labels for Multicast Routes

                 draft-farinacci-mpls-multicast-00.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   This document specifies a method of distributing MPLS labels for
   multicast routes. The labels are distributed in the same PIM
   messages that are used to create the corresponding routes. The
   method is media-type independent, and therefore works for
   multi-access/multicast-capable LANs, point-to-point links, and NBMA
   networks.


Farinacci, Rekhter & Rosen                                      [Page 1]

Internet Draft     draft-farinacci-mpls-multicast-00.txt       June 1999


Table of Contents

   1    Overview
   2    Proposal
   2.1  Piggybacking
   2.2  Labels for LANs with Multiple Downstream Nodes
   2.3  Labels for Point-to-Point Links
   2.4  Labels for NBMA Networks
   2.5  Corner cases
   2.6  When NOT to Send a Labelled Multicast Packet
   2.7  No Conflict between Unicast and Multicast Labels
   3    Modifications to PIMv2
   4    Label Distribution for dense-mode groups
   5    Security Considerations
   6    Acknowledgments
   7    References


1. Overview

   PIM [2] is used to combine MPLS label distribution with the
   distribution of (*,G) join state, (S,G) join state, or (S,G)RPT-bit
   prune state. Labels and multicast routes are sent together in one
   message.

   The design of this method has been motivated by the following goals:

      o  If an interface attaches to a network with data-link broadcast
         capability, an LSR should never have to send more than one
         copy of a given multicast data packet out that interface.
         However, it is NOT a goal for that LSR to be able to send the
         same packet, with the same label, out multiple interfaces.

      o  When an interface supports data-link multicasting, it must be
         possible to have a single Label Information Base (LIB) for
         that interface. That is, the receiver of a labeled packet
         should be able to interpret the label without knowing who the
         transmitter is.

      o  When a LAN contains multiple label distribution peers, it
         should be possible to use data-link multicast to distribute
         the label distribution control packets themselves.

      o  Other aspects of label distribution methodology should remain
         as consistent with unicast label distribution as possible.

      o  Multicast label distribution procedures should not depend on
         the media type.

      o  Once the label for a particular multicast tree on a given LAN
         has been assigned, unicast routing changes should not cause
         redistribution or reassignment of the label for that group on
         that LAN.
      o  When a multicast routing table change requires a label
         distribution change, the latency between the two should be
         minimized, both to improve performance and to minimize the
         possibility of race conditions.

      o  The procedures should work with either dense-mode or
         sparse-mode operation.


2. Proposal

2.1. Piggybacking

   An LSR that supports multicast sends PIM Join/Prune messages on
   behalf of hosts that join groups. It sends Join/Prune messages to
   upstream neighboring LSRs toward the RP for the shared tree (*,G),
   or toward a source for a source tree (S,G). Labels are distributed
   by being associated with addresses in the join-list or the
   prune-list. In particular:

   1. If an LSR, Rd, joins the shared tree for a group, the Join/Prune
      message it sends upstream will contain the group address followed
      by a join-list. The join-list will contain an element which
      contains the address of the RP. This element will also contain a
      label, which the upstream LSR, Ru, can use when it sends
      multicast data down the shared tree. Intuitively, this label
      represents the route downstream from the current node along the
      shared tree.

   2. If an LSR, Rd, joins a source tree for a group, the Join/Prune
      message it sends upstream will contain the group address followed
      by a join-list. The join-list will contain an element which
      contains the address of the source. This element will also
      contain a label, which the upstream LSR, Ru, can use when it
      sends multicast data down the source tree. Intuitively, this
      label represents the route downstream from the current node along
      the specified source tree.

   3. Suppose an LSR, Rd, has (S,G)RPT-bit state with a null output
      interface list. This indicates that all of its downstream
      neighbors on the shared tree for G have pruned source S from the
      shared tree.
      Rd sends a Join/Prune message upstream (on the shared tree),
      containing the group address followed by a prune-list. The
      prune-list contains an element which contains the address of the
      source. In this case, no label is included in the element.

   4. Suppose an LSR, Rd, as the result of receiving from a downstream
      neighbor on the shared tree a Join/Prune message such as the one
      described in 3, creates (S,G)RPT-bit state with a non-null output
      interface list. In this case, it may send a Join/Prune message
      upstream on the shared tree, containing the group address
      followed by a prune-list. An element of the prune-list will
      contain the address S and a corresponding label. However, a
      special bit (the "don't prune" bit) in the element will be set,
      indicating to the upstream LSR that the source S is not really to
      be pruned from the shared tree. The result is that the upstream
      LSR, Ru, will still send packets from S to G to Rd, and will
      label those packets as specified. When Rd receives such packets,
      it forwards them according to the output interface list of the
      (S,G)RPT-bit entry. Intuitively, this label represents a route
      along the shared tree, but only for packets from the specified
      source.

   5. An LSR which receives a Join/Prune message as described in 4 may
      send a corresponding Join/Prune message (with the "don't prune"
      bit set) to its upstream LSR on the shared tree. Again, this
      label represents a route along the shared tree, but only for
      packets from the specified source.

   Rules 3-5 above ensure that if a source is pruned off the shared
   tree at some point, any packets from that source which are sent down
   the shared tree will carry a label that implicitly identifies the
   source. Thus if those packets encounter a node with (S,G)RPT-bit
   state, they will be forwarded according to the output interface list
   of the (S,G)RPT-bit entry, NOT according to the output interface
   list of the (*,G) entry.
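   The five rules above can be illustrated from the upstream LSR's
   point of view. The following Python sketch is a simplified,
   hypothetical model: the class and function names are illustrative
   assumptions, not part of PIM or of this proposal, wire encoding is
   ignored, and every prune-list element is assumed to be an
   (S,G)RPT-bit prune. It records what Ru learns from one downstream
   neighbor Rd and chooses the label Ru puts on a packet from a given
   source sent down the shared tree to Rd.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Element:
    """One join-list or prune-list element (hypothetical, simplified model)."""
    address: str                 # RP address for (*,G); source S otherwise
    label: Optional[int] = None  # None when no label is carried (rule 3)
    wildcard: bool = False       # True for the (*,G) join toward the RP
    dont_prune: bool = False     # the "don't prune" (D) bit


@dataclass
class JoinPrune:
    group: str
    joins: List[Element] = field(default_factory=list)
    prunes: List[Element] = field(default_factory=list)  # (S,G)RPT-bit prunes


@dataclass
class DownstreamState:
    """What upstream LSR Ru has learned from one downstream LSR Rd for G."""
    star_g_label: Optional[int] = None
    source_tree_labels: Dict[str, int] = field(default_factory=dict)
    rpt_source_labels: Dict[str, int] = field(default_factory=dict)
    pruned_sources: Set[str] = field(default_factory=set)


def apply_join_prune(st: DownstreamState, msg: JoinPrune) -> None:
    """Update Ru's view of Rd from one received Join/Prune message."""
    for e in msg.joins:
        if e.wildcard:                   # rule 1: shared-tree join
            st.star_g_label = e.label
        else:                            # rule 2: source-tree join
            st.source_tree_labels[e.address] = e.label
    for e in msg.prunes:
        if e.dont_prune:                 # rules 4-5: keep S, per-source label
            st.rpt_source_labels[e.address] = e.label
            st.pruned_sources.discard(e.address)
        else:                            # rule 3: S really pruned, no label
            st.pruned_sources.add(e.address)
            st.rpt_source_labels.pop(e.address, None)


def shared_tree_label(st: DownstreamState, source: str) -> Optional[int]:
    """Label Ru uses for a packet from `source` sent down the shared tree
    to Rd, or None if Rd has pruned that source from the shared tree."""
    if source in st.pruned_sources:
        return None
    return st.rpt_source_labels.get(source, st.star_g_label)
```

   In this model a label-less prune (rule 3) stops the source's traffic
   toward Rd on the shared tree, while a "don't prune" element (rules
   4-5) keeps that traffic flowing under a per-source label, which is
   what lets Rd select the (S,G)RPT-bit entry rather than the (*,G)
   entry.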
2.2. Labels for LANs with Multiple Downstream Nodes

   Since PIM Join/Prune messages are multicast on a LAN, other
   downstream LSRs that are interested in the group will hear the
   message. They must cache the binding of multicast routing table
   state and label state together. Since the upstream LSR is going to
   forward data packets using the advertised label, they must be ready
   to accept data packets carrying that label.

   The first downstream LSR that joins a group is the label assigner on
   that LAN for that multicast route. All other downstream LSRs that
   send PIM Join/Prune messages will use the same label that the
   assigner selected.

   An LSR that sends a PIM Join/Prune message with a label of 0 is
   indicating that it doesn't know the label for the associated
   multicast routing table entry. When this occurs, the assigner can
   trigger a PIM Join/Prune message to make the label known.

2.3. Labels for Point-to-Point Links

   The procedure of section 2.2 works on point-to-point links because
   there is only one downstream LSR on the link, and it always becomes
   the label assigner.

2.4. Labels for NBMA Networks

   On NBMA networks, all PIM routers are known to each other through
   pseudo-broadcast mechanisms provided by the data-link layer.
   However, PIM Join messages are unicast to the upstream LSR, so other
   downstream LSRs will not hear the label assigner's advertisement.

   Therefore we treat an NBMA network with one upstream and n
   downstream LSRs as n point-to-point links, from the upstream LSR to
   each of the downstream LSRs. Each downstream LSR then assigns its
   own label, and the upstream LSR must replicate the multicast data
   packets. The procedure of section 2.2 then applies to each of these
   links.

   Note that this is not incompatible with the use of native
   point-to-multipoint capabilities at the data-link layer.
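   The label-sharing rule of section 2.2 can be sketched from a
   downstream LSR's point of view. This Python model is hypothetical:
   the class, its methods, and the label ranges used with it are
   illustrative assumptions, and it ignores the tie-breaking and
   failure cases of section 2.5. Each LSR allocates only from its own
   slice of the label space, reuses any label it has overheard for a
   route, and, if it is the assigner, answers a Join carrying label 0
   with the label to re-advertise.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Route = Tuple[str, str]  # ("*", G) for a shared tree, (S, G) for a source tree
UNKNOWN = 0              # Section 2.2: label 0 means "I don't know the label"


class DownstreamLsr:
    """Hypothetical per-LAN label state kept by one downstream LSR."""

    def __init__(self, name: str, first_label: int, last_label: int) -> None:
        self.name = name
        # This LSR's own non-overlapping slice of the label space (see [4]).
        self._free: Iterator[int] = iter(range(first_label, last_label + 1))
        self.bindings: Dict[Route, int] = {}  # labels heard or assigned on this LAN
        self.assigner_of: Set[Route] = set()  # routes whose label this LSR chose

    def join_label(self, route: Route) -> int:
        """Label to carry in this LSR's own Join for `route`."""
        if route not in self.bindings:
            # No binding heard on the LAN yet, so this LSR is the first
            # joiner and becomes the label assigner for the route.
            self.bindings[route] = next(self._free)
            self.assigner_of.add(route)
        return self.bindings[route]

    def overhear(self, route: Route, label: int) -> Optional[int]:
        """Process a Join/Prune multicast on the LAN by another downstream LSR.

        Returns the label to re-advertise in a triggered Join, or None."""
        if label == UNKNOWN:
            # Only the assigner answers a sender that lacks the label.
            return self.bindings.get(route) if route in self.assigner_of else None
        # Cache the binding; keep an existing one rather than overwrite it.
        self.bindings.setdefault(route, label)
        return None
```

   On an NBMA network, downstream LSRs never overhear one another's
   unicast Joins, so each instance simply assigns from its own range,
   which is the per-link behavior described for point-to-point links.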
2.5. Corner cases

   Multiple downstream LSRs cannot assign the same label value for any
   multicast route, because they partition the label space into
   non-overlapping ranges according to [4]. When an LSR is enabled on
   an interface, it obtains a unique label range for the LAN.

   When the label assigner leaves the group, the label that it assigned
   still remains active. The remaining downstream LSR with the highest
   IP address becomes the owner of that label, and may change it if it
   sees fit. However, it is not required to change it. All downstream
   LSRs can continue to use the assignment in their Join messages.

   If two LSRs join for the first time (that is, neither has state) at
   the same time, and each chooses a different label value, the
   upstream LSR will use the label of the downstream LSR with the
   highest IP address. The lower-addressed LSR will also hear the
   higher-addressed LSR's Join, and will use its label.

   If the label assigner crashes, the downstream LSR with the highest
   IP address assigns a new label to the multicast routes whose labels
   were assigned by the crashed LSR, and triggers a Join message so
   that all other LSRs on the LAN use the new label.

   When a LAN partitions due to a layer-2 switch failure, the same
   logic applies as in the case where an LSR stops joining a group.
   When the partition heals, there may be an RPF neighbor change in one
   of the partitions. When there is an RPF neighbor change, and the
   downstream routers trigger Joins to their new RPF neighbor with a
   different label assignment than the other partition is using, one of
   two resolutions occurs:

   1) The LSR which is the allocator in the partition of the new RPF
      neighbor will trigger a Join if it has a higher IP address than
      the allocator in the other partition. The downstream routers in
      the other partition use the new label assignment immediately.
   2) If the LSR which is the allocator in the partition of the new RPF
      neighbor has a lower IP address, all downstream routers and the
      new RPF neighbor will switch to the label assigned by the
      allocator in the other partition.

   If an RPF change occurs (that is, the topology changes so that the
   upstream LSR is different), the PIM protocol specification indicates
   that a PIM Join may be triggered to get on the new distribution tree
   as soon as possible. In this case, if the label assigner becomes the
   upstream LSR, then the downstream LSR that now has the highest IP
   address may become the label assigner, and may change the label if
   it sees fit. Otherwise, the same label is used.

2.6. When NOT to Send a Labelled Multicast Packet

   PIM Hello messages, sent periodically by all PIM-capable routers,
   will indicate whether the router is MPLS-capable. An upstream router
   on a LAN will therefore know whether all routers on the same LAN are
   LSRs. If there are ANY MPLS-incapable routers which are interested
   in a particular group, the upstream router will transmit to the LAN
   only unlabelled multicast data packets for that group.

   If there are any group members on a LAN, only unlabelled multicast
   data for that group will be transmitted onto that LAN.

   Routers that support non-PIM multicast are assumed, for the purposes
   of this procedure, to be MPLS-incapable.

2.7. No Conflict between Unicast and Multicast Labels

   MPLS uses different data-link layer codepoints [5] to distinguish
   multicast labeled packets from unicast labeled packets. Therefore,
   the assignment of labels for unicast routes is completely
   independent of the assignment of labels for multicast routes. For
   example, the same label value could be allocated for a unicast route
   and for a multicast route without any possibility of ambiguity.


3. Modifications to PIMv2

   PIMv2 has a packet format for each address type it may support when
   encoding both multicast and unicast addresses. We define a new
   address type, called "Label Address", for unicast address encoding.
   The label accompanies the source address in the Encoded Source
   Address format as specified in [2]. The label value is carried in a
   32-bit quantity following the source address, which is in turn
   followed by a 32-bit Current Multicast Route Timer.

   We also take one bit from the PIMv2 reserved field to be the "don't
   prune" bit (shown below as the "D" bit). So, for example, an IPv4
   Label Address format would look like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   | Rsrvd |D|S|W|R|   Mask Len    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Label                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Current Multicast Route Timer                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Label

      If the high-order bit is clear, the low-order 20 bits are a label
      value (as described in [5]) assigned by the LSR sending the
      Join/Prune message. All other bits should be set to 0 by the
      sender and should be ignored by the receiver.

      If the high-order bit is set, the low-order 28 bits are a label
      value in the VPI/VCI format (as described in [7]) assigned by the
      LSR sending the Join/Prune message. All other bits should be set
      to 0 by the sender and should be ignored by the receiver.

   Current Multicast Route Timer

      The sender of a Join/Prune message inserts the current time left
      before expiration for the multicast routing table entry described
      by the Source Address (either the (S,G) or (*,G) entry).
      This allows all routers on a common multi-access subnet to time
      out the entry at approximately the same time, so that, when the
      source goes inactive, the routers do not keep re-creating the
      state for one another.

   Refer to [2] for descriptions of the fields not specified here.


4. Label Distribution for dense-mode groups

   In dense-mode PIM, there is no Join message traveling upstream from
   downstream routers that could perform the binding of multicast
   routes with labels. However, since we don't want a separate
   algorithm for dense-mode groups, we extend this basic design to
   dense-mode PIM.

   When a downstream LSR creates (S,G) state upon receipt of 1) data,
   or 2) Join/Prune or Graft messages, it starts a periodic timer to
   send Join messages carrying label assignment information. These
   messages look no different, and are treated no differently on
   receipt, than in the sparse-mode case.

   The periodic Join message is multicast on the LAN with an upstream
   target address of 0.0.0.0. All multicast LSRs on the LAN must know
   that the group operates in dense mode. This is accomplished using
   standard PIM mechanisms.


5. Security Considerations

   Security considerations are not discussed in this memo.


6. Acknowledgments

   The authors would like to thank Fred Baker for his comments. We also
   thank the authors of [6] for their critique of an earlier version.


Authors' Addresses

   Dino Farinacci
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: dino@cisco.com

   Yakov Rekhter
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: yakov@cisco.com

   Eric C. Rosen
   Cisco Systems, Inc.
   250 Apollo Drive
   Chelmsford, MA, 01824
   Email: erosen@cisco.com


7. References

   [1] "Multiprotocol Label Switching Architecture",
       draft-ietf-mpls-arch-05.txt, Rosen, Viswanathan, Callon,
       April 1999.
   [2] "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
       Specification", RFC 2362, Estrin, Farinacci, Helmy, Thaler,
       Deering, Handley, Jacobson, Liu, Sharma, Wei, June 1998.

   [3] "LDP Specification", , Andersson, Doolan, Feldman, Fredette,
       Thomas, June 1999.

   [4] "Partitioning Label Space among Multicast Routers on a Common
       Subnet", , Farinacci, October 1998.

   [5] "MPLS Label Stack Encoding", , Rosen, Rekhter, Farinacci,
       Tappan, Fedorkow, Li, Conta, April 1999.

   [6] "Framework for IP Multicast in MPLS", , Ooms, Livens, Sales,
       Ramalho, Acharya, Griffoul, Ansari, May 1999.

   [7] "MPLS using LDP and ATM VC Switching", , Davie, Lawrence,
       McCloghrie, Rekhter, Rosen, Swallow, Doolan, April 1999.