Internet Engineering Task Force                            Jon Crowcroft
INTERNET DRAFT                                            November, 2001
draft-crowcroft-mat-00.txt                          Expires: April, 2002


                     Multicast Address Translation


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Network Address Translation (NAT) is a technique for reducing the
   complexity of address re-assignment. It can also reduce the routing
   state in the core. This draft proposes using the same technique, for
   the same reasons, for multicast: Multicast Address Translation
   (MAT). The approach is complementary to IP-in-IP tunnels for
   multicast, but avoids the overhead of encapsulation. Like NAT, it
   requires translation state to be set up at the edges.

   For multicast sessions, there is often an aggregate, known as the
   "cross section" of traffic, that is of sufficient common interest to
   be carried across the core. In this case, MAT can be used to reduce
   the multicast outgoing interface list (olist) routing state in the
   access routers, and the full (S,G) state in the core. Some promising
   candidate algorithms for computing such aggregates are described in
   [NGC 2001]. Besides extracting the trees from the routers and
   computing their overlaps, a protocol to establish the mapping state
   would be required, in exactly the same way that a tunnel setup
   protocol is needed. This is left for further work.
   This technique is not necessary for Source Specific Multicast (SSM),
   although it could be used there. The interaction with IGMPv3 is to
   be studied.

1. Introduction

   In this document we propose the use of address translation as a
   means to aggregate state for multicast groups that have overlapping
   coverage in the core.

   Aggregation of this kind has been proposed before, but typically
   through tunnels [Infocom 1998], or implicitly by examining the
   possibility of aggregating state within a single router
   [Infocom 2000]. Both approaches have drawbacks. The first adds
   header overhead, with consequent problems for MTU discovery (or
   configuration). The second is difficult because many service
   providers wish to account for traffic on a per (S,G,iif/oif) basis,
   and the aggregation technique potentially masks this. We do not
   intend to solve this latter problem (if providers want fine-grained
   accounting, they have to provision routers for fine-grained state).
   We are proposing something that might avoid the former problem.

   Recently, aggregation of state amongst multiple trees was examined
   for realistic group numbers, membership distributions and network
   topologies [NGC 2001]. That work proposed no specific mechanism
   other than encapsulation, which, as noted above, carries additional
   header overhead. However, there is no particular reason why address
   translation could not be used to perform the aggregation instead of
   tunnels.

   If two trees are congruent in a region, then we can translate all
   the destination addresses, Gi, into one G (e.g. G0). The egress of
   the region must then be able to recover Gi from the (S,G0) pair
   alone. Usually this is possible, because sources are distinct for
   distinct groups. Where a source is sending to more than one group
   for which address translation is being performed in a region, we
   propose translating the source address(es) of the later created (or
   detected) group(s) at the ingress, and re-translating them at the
   egress from the core (an Si to Si' mapping).
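   The ingress and egress translations just described can be sketched
   as a pair of lookup tables for a single region. This is a minimal
   illustration under stated assumptions, not a protocol: the class and
   method names are hypothetical, the region has one base group G0, and
   Si' is picked from a caller-supplied prefix that must contain Si (in
   the spirit of the RPF rule in the next paragraph). A real MAT would
   also need the translation management protocol to reserve Si'
   addresses so that they never collide with live hosts.

```python
import ipaddress


class MatRegion:
    """Hypothetical translation state for one aggregated MAT region.

    Every aggregated group Gi is rewritten to one base group G0 at the
    ingress. The egress must recover (Si, Gi) from (source, G0) alone, so
    a source already sending into G0 gets a distinct Si' for each extra
    group it sends to.
    """

    def __init__(self, base_group):
        self.g0 = ipaddress.ip_address(base_group)
        self.ingress = {}  # (Si, Gi)  -> (Si', G0)
        self.egress = {}   # (Si', G0) -> (Si, Gi)

    def add_flow(self, src, grp, src_prefix):
        """Install entries for (src, grp) and return (Si', G0), or None if
        no Si' is free in src_prefix (fall back to no MAT or a tunnel)."""
        si, gi = ipaddress.ip_address(src), ipaddress.ip_address(grp)
        net = ipaddress.ip_network(src_prefix)
        if si not in net:
            raise ValueError("Si' must come from a prefix that covers Si")
        if (si, gi) in self.ingress:
            return self.ingress[(si, gi)]
        outer = si  # first aggregated group from this source: keep Si
        if (si, self.g0) in self.egress:
            # Assumption: any address in the prefix not already used as an
            # outer source may serve as Si' (a real MAT must reserve these).
            outer = next((h for h in net.hosts()
                          if (h, self.g0) not in self.egress), None)
            if outer is None:
                return None
        self.ingress[(si, gi)] = (outer, self.g0)
        self.egress[(outer, self.g0)] = (si, gi)
        return outer, self.g0

    def at_egress(self, src, grp):
        """Map a packet seen as (Si', G0) back to (Si, Gi), or None."""
        return self.egress.get((ipaddress.ip_address(src),
                                ipaddress.ip_address(grp)))


if __name__ == "__main__":
    region = MatRegion("239.1.0.1")
    print(region.add_flow("192.0.2.10", "239.2.0.1", "192.0.2.0/24"))
    # Second group from the same source: Si' becomes 192.0.2.1,
    # the first free host address in the prefix.
    print(region.add_flow("192.0.2.10", "239.2.0.2", "192.0.2.0/24"))
    print(region.at_egress("192.0.2.1", "239.1.0.1"))
```

   Keying the egress table on (Si', G0) makes the collision rule
   explicit: a new Si' is only needed when (Si, G0) is already taken.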
   A translation management protocol would piggyback on the routing
   protocol, and would carry, for each base group (G0), the list of Gi
   and the Si to Si' mappings. To preserve the RPF safety feature of
   multicast trees, we suggest that Si be mapped to an Si' drawn from a
   prefix equal to, or more specific than, the prefix that matches Si.

   Where the tree for G0 has near, but not perfect, congruence with the
   tree for Gi, we can add links to egress routers where there are
   receivers for Gi but not G0, provided the additional load is small.
   We can also add reverse re-translators back to Si for traffic
   arriving where there are receivers for G0 but not Gi, so that normal
   filtering removes that traffic (or add a filter; the same hardware
   and software support is often involved in many router operating
   systems anyway). The rules for "near" coverage could be the same as
   those used to configure the switch from the RP tree to the source
   tree in PIM-SM (or similar rules derived from traffic thresholds).

   An alternative to traffic-based tree merging would be to use actual
   topological similarity. Two approaches suggest themselves:

   1. Encode each tree topology as a data structure, such as a
      depth-then-breadth traversal, and do a pure length comparison: if
      the structures match up to a certain threshold, declare them
      "equivalent", with the threshold set by managers at each node in
      the tree.

   2. Use information from neighbouring routers in the tree; e.g.
      neighbours can be classified as "same versus different AS",
      "same versus different OSPF or IS-IS level", "same versus
      different prefix", or "same prefix up to some point". A Bloom
      filter could then be applied with its "leakiness" deliberately
      set high (how high is a TBD parameter), allowing approximate
      matching.

2. MAT and NAT

   One view of address translation [RFC2663] is that it provides a
   valuable way to conserve addresses. Another view is that it reduces
   the globally visible routing state, since only active hosts at sites
   use globally visible prefixes, and thus require forwarding entries.
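   Returning briefly to the "near" coverage rule of Section 1, one
   possible traffic-threshold test can be sketched as follows. It is
   illustrative only and rests on assumptions not made in this draft:
   each tree is summarised by its set of egress routers, the cost of
   imperfect congruence is counted as traffic delivered to egress
   routers that have no receivers for it, and hop counts are ignored.
   The function name, rate parameters and threshold are hypothetical.

```python
def near_coverage(g0_egress, gi_egress, rate_g0, rate_gi, max_waste):
    """Decide whether Gi may be folded into the aggregate tree for G0.

    Hypothetical traffic-threshold rule (illustration only):
      add_links: egress routers with receivers for Gi but not G0; the G0
                 tree must be extended to them, so they also receive
                 (and discard) the traffic already carried on G0.
      filters:   egress routers with receivers for G0 but not Gi; Gi
                 traffic reaches them on the G0 tree, is re-translated
                 and is then dropped by normal filtering.
    Rates may be in any consistent unit (e.g. kbit/s).
    """
    add_links = gi_egress - g0_egress
    filters = g0_egress - gi_egress
    waste = rate_g0 * len(add_links) + rate_gi * len(filters)
    return waste <= max_waste, add_links, filters


if __name__ == "__main__":
    ok, add_links, filters = near_coverage(
        {"R1", "R2", "R3"}, {"R1", "R2", "R4"},
        rate_g0=100, rate_gi=50, max_waste=200)
    print(ok, sorted(add_links), sorted(filters))  # True ['R4'] ['R3']
```

   Counting waste per egress router keeps the test cheap enough to
   re-run whenever membership changes; a deployment would more likely
   weight each router by the extra links it adds to the tree.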
   This reduction in routing state is, of course, not free: it incurs
   both the translation state at the edges and the protocol messages
   (e.g. via an ALG) needed to set up and maintain that state. In the
   case of Realm Specific IP [RFC3102], the idea is extended to
   multiple consenting peer realms, rather than applying only between a
   single globally visible address space and any of a set of private
   address spaces. There is potentially an analogy between Realm
   Specific IP and the use of MATs for administratively scoped
   multicast addresses, which should be explored in the interests of
   address conservation, as well as for the main goal of reducing core
   router multicast forwarding state.

   An important aspect of NATs that has to be understood for Multicast
   Address Translation is the relationship between packet flows and
   session flows. In unicast applications to date, most of the common
   cases are client-server, and the TU (TCP/UDP port) approach can be
   used to model how a session initiation relates to the subsequent
   packet flow; it can therefore be used to trigger NAT state
   instantiation (or to work out where to add that state explicitly).

   Multicast is somewhat different. We need to look at both the
   receiver-oriented nature of multicast (i.e. IGMP, and the prune and
   graft messages in PIM) and the data-driven aspects of some tree
   building (PIM-SM, and the flood and prune behaviour of DVMRP and
   PIM-DM).

   Finally, higher level protocols that carry IP addresses are also
   important. In fact, NATs must be aware of them, or else nearby
   firewalls and application layer gateways must coordinate with NATs
   to translate the application layer messages too, with all the
   security problems of being a "man in the middle", albeit with
   "permission". Note that there are already many examples of problems
   caused by the fact that multicast applications do not rely on the
   5-tuple that unicast applications employ, often because of the
   liberal use of multiple groups, and also because of other levels of
   multiplexing.
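   Application layer translation of this kind can be illustrated with
   session descriptions. SAP announcements and SIP messages carry SDP,
   whose connection ("c=") line names the group address. Where such a
   description crosses a translation boundary without being
   re-translated, an ALG co-located with the MAT would have to rewrite
   that address. The sketch below is a minimal illustration under that
   assumption: it handles only IPv4 "c=" lines of the form
   "c=IN IP4 <address>[/<ttl>[/<count>]]", the function name and
   mapping table are hypothetical, and a real ALG would also need to
   handle IPv6, other address-bearing fields and authenticated
   announcements.

```python
import re

# IPv4 SDP connection line: "c=IN IP4 <addr>" with optional "/ttl[/count]".
C_LINE = re.compile(r"^c=IN IP4 (\d{1,3}(?:\.\d{1,3}){3})((?:/\d+){0,2})$")


def rewrite_sdp_groups(sdp, group_map):
    """Rewrite multicast addresses in SDP "c=" lines (hypothetical ALG step).

    group_map maps an address string as it appears in the SDP (e.g. Gi)
    to the address that should appear after translation (e.g. G0).
    Lines that do not match, or whose address is unmapped, pass through.
    """
    out = []
    for line in sdp.splitlines():
        m = C_LINE.match(line)
        if m and m.group(1) in group_map:
            # Keep any "/ttl" or "/ttl/count" suffix unchanged.
            line = "c=IN IP4 " + group_map[m.group(1)] + m.group(2)
        out.append(line)
    return "\r\n".join(out) + "\r\n"  # SDP lines end in CRLF


if __name__ == "__main__":
    announcement = ("v=0\r\n"
                    "o=- 1 1 IN IP4 192.0.2.10\r\n"
                    "s=Example session\r\n"
                    "c=IN IP4 239.2.0.1/127\r\n"
                    "m=audio 5004 RTP/AVP 0\r\n")
    print(rewrite_sdp_groups(announcement, {"239.2.0.1": "239.1.0.1"}))
```

   Only the group address is touched; the TTL suffix and all other
   lines are passed through, which keeps the ALG's footprint small.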
   As an example of the other levels of multiplexing mentioned above,
   RTP uses the CSRC and SSRC fields to identify contributors, while
   some reliable multicast protocols use their own payload
   multiplexing. This means that simple port mapping is neither simple
   nor, indeed, effective (and in some cases not even feasible, as has
   already been discovered for port NATs with unicast RTP and RTCP on
   two ports).

   In multicast, this means that we must understand SIP and SAP
   messages, as well as potentially RTSP and several others. The
   problem here is that we do not currently have the luxury, as NATs
   do, of using the FQDN as a stable namespace from which to hang the
   translations. MATs have to contend with a variety of session end
   point naming schemes.

   Last but not least, current practice in inter-domain multicast is to
   use MSDP, which entails advertising sources. However, we envisage a
   set of MAT servers that coordinate around the edge of a core domain
   and can typically be co-located with the MSDP servers in any case,
   so that the appropriate translations can easily be applied to source
   advertisements.

3. Deployment

   A key problem with this proposal is deployment. However, NAT-capable
   routers are already deployed in many places, and I see this as a
   minor modification to a family of protocols.

   One interesting possibility is to multicast the mappings. Note,
   however, that this would have to be done reliably (e.g. using PGM,
   SRM or similar). The same idea has been discussed for scaling BGP
   (using a reliable multicast protocol instead of a mesh of TCP
   connections) and also for flooding link state information.

   If the supply of Si' addresses from the required prefix runs out,
   the MAT should fall back to no translation, or else to tunnels. This
   choice should be configurable, as should the thresholds used to
   decide MAT aggregation.

4. Discussion

   This document is a stake in the ground concerning the use of address
   translation for multicast. A number of obvious questions need
   clarifying before the work can be continued (or discontinued):

   0. Failure modes: the system should fail safe through appropriate
      use of soft state and timers.

   1. Protocols that mention multicast addresses: we have mentioned
      some, and we are sure there are many others.

   2. Feedback: many protocols use feedback, including multicast
      protocols. For example, reliable multicast transport protocols
      such as PGM, RLC and RMTP use feedback messages. Layered coded
      multicast protocols ("multi-rate congestion control") may have
      very interesting interactions with the scheme proposed here. Most
      notably, the aggregation may make the reverse path trees differ
      yet again. However, many multicast applications and transport
      protocols already have to deal with asymmetry (e.g. inter-domain
      unicast routing through BGP is asymmetric, and non-bidirectional
      PIM creates asymmetric routes). Note that some systems have
      several levels of feedback (e.g. RTP and RTCP, which use
      different ports). If a port MAT were required, this would need
      further consideration.

   3. Unicast: one can imagine a case of ultimate aggregation, where a
      unicast translation is done. This might enable, for example,
      multicast islands to communicate via unicast-only cores. Some
      researchers have commented that really fast (partially optical)
      future IP routers may be hard to build if they have to support
      packet replication for multicast fan-out; it is possible that
      this scheme could provide a workaround. Note also that a MAT could
      be used as an alternative to tunnels for edge access from dial-up
      or other edge networks (e.g. DSLAM or cable modem access networks
      that are not IGMP/multicast capable) into a multicast-capable
      core. Given that these are the same places where NATs get
      deployed, this could easily leverage them.

   4. SSM: Source Specific Multicast propagates IGMPv3 information. The
      interaction between that and this scheme needs close examination.

   5. The idea of Realm Specific Multicast IP (administratively scoped
      addresses and MATs) needs further exploration.

   etc.

5. Acknowledgements

   Thanks are due to Colin Perkins for a discussion at NGC in November
   in London, England.

6. References

   [RFC2663]      P. Srisuresh and M. Holdrege, "IP Network Address
                  Translator (NAT) Terminology and Considerations",
                  RFC 2663, August 1999.

   [RFC3102]      M. Borella, J. Lo, D. Grabelsky and G. Montenegro,
                  "Realm Specific IP: Framework", RFC 3102, October
                  2001.

   [RFC3022]      P. Srisuresh and K. Egevang, "Traditional IP Network
                  Address Translator (Traditional NAT)", RFC 3022,
                  January 2001.

   [Infocom 1998] J. Tian and G. Neufeld, "Forwarding State Reduction
                  for Sparse Mode Multicast Communications", in Proc.
                  of IEEE Infocom 1998, March 1998.

   [Infocom 2000] D. Thaler and M. Handley, "On the Aggregatability of
                  Multicast Forwarding State", in Proc. of IEEE Infocom
                  2000, March 2000.

   [NGC 2001]     A. Fei, J. Cui, M. Gerla and M. Faloutsos,
                  "Aggregated Multicast with Inter-Group Tree Sharing",
                  in Proc. of NGC 2001, November 2001, available online
                  via http://www.cs.ucla.edu/NRL/hpi/papers.html

7. Security Considerations

   The end-to-end (E2E) mantra is violated badly. Black-holing attacks
   are very possible, and DDoS attacks on the MATs are clearly quite a
   possibility. As usual, all the caveats that apply to intermediate
   devices requiring information about higher layers apply here too.

8. IANA Considerations

   There are no IANA considerations regarding this document yet. If
   this note sees any subsequent work, then I would expect at least one
   protocol to emerge, in which case code points will be needed.

AUTHORS' ADDRESSES

   Jon Crowcroft
   Marconi Professor of Communications Systems
   University of Cambridge
   Computer Laboratory
   William Gates Building
   J J Thomson Avenue
   Cambridge CB3 0FD
   UK

   E-Mail: Jon.Crowcroft@cl.cam.ac.uk
   Tel:    +44 (0)1223 763633
   Fax:    +44 (0)1223 334 678

This draft was created in November 2001. It expires April 2002.