TRILL Working Group Y. Li Internet Draft Huawei Technologies Intended status: Informational July 4, 2010 Expires: January 2011 Requirements for multicast to unicast optimization in Trill draft-liyz-trill-reqs-mlt2uni-opt-00.txt Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on January 5, 2009. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Y. Li Expires January 5, 2011 [Page 1] Internet-Draft Reqs for mlt2uni optimization July 2010 Abstract Instead of always blindly forwarding multicast/broadcast frames through a distribution tree in Trill, some optimizations can be done to convert multicast/broadcast to unicast for certain IP controlling packets. This draft shows the scenarios in which such optimizations are needed. Table of Contents 1. Introduction.................................................2 2. Conventions used in this document............................3 3. Problem Statement............................................3 3.1. DHCP....................................................3 3.2. ARP for address resolution..............................4 3.3. NS/NA for DAD...........................................5 4. Optimization requests in Trill...............................6 5. Security Considerations......................................7 6. IANA Considerations..........................................7 7. Conclusions..................................................7 8. References...................................................7 8.1. Normative References....................................7 8.2. Informative References..................................8 9. Acknowledgments..............................................8 1. Introduction In traditional IP networks, some controlling frames such as DHCP replies [RFC2131], NS/NA(neighbor solicitation/neighbor advertisement) for duplicate address detection [RFC4861] in IPv6 may be multicasted/broadcasted. The basic reason behind is no valid IP address for an end station available at the time the IP packet is delivered so that broadcast or multicast has to be used to reach the end station. In addition, packets like ARP and NS/NA for address resolution are also required to be flooded in the whole VLAN. Subsequently their L2 frame is also multicasted/broadcasted. Unlike traditional bridge, RBridge defined in Trill maintains the forwarding table for MAC address of the end stations and the corresponding egress RBridge. The end station is typically reachable by the egress RBridge. Such information is available at the ingress RBridge. Theoretically the ingress RB does not need to flood the frames to all other RBs just for the reason of undefined IP address. Ingress RB may choose to unicast the frame to the right egress RB with proper Trill encapsulation without changing the native Ethernet frame in order to save bandwidth resource. Y. Li Expires January 5, 2011 [Page 2] Internet-Draft Reqs for mlt2uni optimization July 2010 This draft tries to identify and analyze if unicasting instead of flooding in L2 by RBridges can be used for various controlling frames. Types of frames that may be appropriate for such optimization include o DHCP o ARP for address resolution o NS/NA for DAD (duplicate address detection) It is noted that ordinary multicast/broadcast data frames are not candidates for multicast-to-unicast optimization. 2. Conventions used in this document The same terminology and acronyms are used in this document as in [RFCtrill]. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119]. 3. Problem Statement Some frames are meant for single listener only though they are multi- destination in format. With the scaling of network and increased diversity of VLAN usage, flooding is expected to be avoided as much as possible to save the resource and minimize the potential security problem. The following sub sections will examine them in details. 3.1. DHCP DHCP client broadcasts DHCPDISCOVER message for address application and DHCP server replies with DHCPOFFER. The client may choose one from multiple offers and send DHCPREQUEST to DHCP server. If the selected DHCP server successfully allocates the IP address to the client, it replies with DHCPACK otherwise it replies with DHCPNAK. DHCP replies normally are broadcast frames since the client may not be able to receive unicast IP packet destined to the going-to-be- assigned IP address before whole DHCP process finishes. In Trill ingress RB of DHCP replies will deliver the broadcast frame to the proper distribution tree. Such procedures encounters flooding problem in large layer 2 network. There are some existing protocol in-band methods to eliminate such broadcast. Y. Li Expires January 5, 2011 [Page 3] Internet-Draft Reqs for mlt2uni optimization July 2010 1. DHCP L2 snooping and relay. In fixed broadband network, access node (e.g. DSLAM) may act as a DHCP L2 relay. It inserts DHCP relay agent option [RFC3046], normally known as option 82, to the DHCP message. Circuit-Id is one of the information of relay agent option which identifies the port connecting to the subscriber. When DHCP replies reach the access node, access node checks the circuit-id information and unicasts the frame to the subscriber through the destined port. Option 82 was originally designed for L3 DHCP relay agent though some implementations use it for L2 equipment too. Draft Layer 2 Relay Agent Information [I-D.ietf-dhc-l2ra] describes the detailed behaviors of DHCP L2 relay agent. If only egress RB of DHCP replies enabled with DHCP L2 relay agent function, it does not help to make the frame to be unicast from ingress to egress RB. Cascaded DHCP L2 relay agent may be useful but it is unclear how such relay agents work and what optimization to Trill encapsulation should be done to adapt their behavior. 2. Multicast at L3 but unicast at L2. Destination MAC address is replaced by the content of chaddr field of DHCP reply message [I-D. ietf-dhc-l2ra-extensions]. It prevents DHCP server or DHCP L3 relay agent from flooding the DHCP replies to L2 relay agent. The requirement for this approach is the correct deployment of L2 and/or L3 DHCP relay agent at appropriate RBs. Trill is used in a relatively large scale L2 network with mixed RBs and normal bridges. It may be preferred to have optimization on RBs to make the DHCP replies unicast from ingress to egress in Trill domain without touching the content of native frames. Other than DHCP replies, it may be desired to optimize DHCP requests (DHCPDISCOVER/DHCPREQUEST) too. If RB is configured to know the egress point of DHCP server or DHCP layer 3 relay agent, DHCP requests can be unicast to the proper edge RB. Such configuration is probably distributed via link state information. 3.2. ARP for address resolution ARP queries are broadcast frames. In the earlier version of RBridge base protocol, designated RB for an ARP reply may optionally send a link state update to inform other RBs about the binding information. Each RB caches the ARP bindings and may send local reply for a particular ARP request or send a unicast ARP request to target instead of flood. Y. Li Expires January 5, 2011 [Page 4] Internet-Draft Reqs for mlt2uni optimization July 2010 It is a practical way to convert ARP request from broadcast to unicast. Detailed optimization is expected to be described in a separate document in TRILL WG (according to the updated charter of TRILL WG). The following points should be covered, o Format and frequency of binding information distributed via link state update. The amount of changes (including creation, aging and refreshing) of ARP binding information is probably huge at peak time. The frequency for sending the update should be carefully considered. o Aging time. Typically ARP entry at switch and end host has aging time. The dealing of inconsistency of ARP entry at edge RBs should be described. Longer aging time will result in less flooding but longer detection of stale record and higher possibility of packet loss. If end host nomadism was within consideration, the problem would be more complicated. 3.3. NS/NA for DAD Though ND [RFC4861] in IPv6 performs similar functions as ARP in IPv4, there are differences which should be considered in optimization. o Individual address resolution: request is sent in broadcast/multicast and reply is sent in unicast. ARP and ND perform almost the same in this case. Section 3.2 has covered this scenario. o Duplicate address detection: gratuitous ARP is sent in broadcast and if there is duplication ARP reply is returned to the sender in unicast. While in IPv6 DAD-NS and DAD-NA are both sent in multicast. The requests (ARP request and DAD-NS) can be optimized as that in section 3.2. It may be wanted that requests for DAD should always be identified so that RB with hit entry of duplicated address in local cache at RBridge does not do the local reply. In addition, optimization of DAD-NA may be required and is described in the later part of this sub section. o Address advertisement: gratuitous ARP is sent in broadcast in IPv4 and unsolicited NA is sent in multicast in IPv6. Unlike ARP/ND for address resolution, such frames may be identified not to trigger the link state information to be distributed as end nodes and RBs may learn the binding information directly from the flooded frames. Y. Li Expires January 5, 2011 [Page 5] Internet-Draft Reqs for mlt2uni optimization July 2010 To explain the second bullet above in details, NS/NA messages in IPv6 basically are used for duplicate address detection(DAD), address resolution, and neighbor unreachability detection [RFC4861]. NS message with unspecified address as source address indicates a DAD request. A responding NA will be multicasted to all-nodes multicast address in case of address duplication. Target link-layer address field is set to the source link layer address of the solicited NS. If DAD-NA message was optimized to be delivered directly to the solicited sender instead of multicasting, it will not affect the correct function of DAD process. It is uncertain if multicasted DAD- NA message helps in distributing source IP and MAC binding information. If it is true, changing DAD-NA to unicast may have some minor side-effect. However, ordinarily binding information is learned from NS/NA for address resolution and probably unsolicited NA. DAD-NA is not a major learning source for information binding. There are similar ways to avoid flooding of DAD-NA as those in DHCP, e.g. add DHCP option 82 like information to NS/NA [I-D.li-6man-ns- mark] and multicast at L3 but unicast at L2 [I-D.gundavelli-v6ops-l2- unicast]. They are depending on either the proper deployment of agent-like node or the modification to the native frame. 4. Optimization requests in Trill ARP optimization is believed to be a requirement in Trill which needs more elaborated design. One of the functions of ARP optimization is to reduce the flooding among RBridges. In addition to ARP, it is expected that other frames such as DHCP and DAD-NA to be considered for optimization too. Trill may optionally provide the multicast to unicast optimization to the frames. Such optimization is within Trill domain only, i.e. between ingress to egress, and is compatible with current DHCP/ARP/ND implementation. Other nodes like ordinary bridges operate in their normal way which can be retaining the multicasting. Trill naturally tunnels native Ethernet frame from ingress RB to egress RB without the need of modifying the destination MAC address of native frame. Its working does not rely on the deployment of protocol in-band mechanism such as DHCP relay agent. Y. Li Expires January 5, 2011 [Page 6] Internet-Draft Reqs for mlt2uni optimization July 2010 5. Security Considerations Multicast to unicast optimization requires the snooping of specific frames at edge RB. However it does not introduce extra security vulnerability. The methods such as rate limiter and trusted port filter should be used normally. 6. IANA Considerations This document requires no IANA actions. 7. Conclusions Flooding has both pros and cons in Ethernet. It makes broadcast data frames naturally reaching every end station in a VLAN; on the other hand, it is a big burden for a large scaled network. Multicast to unicast optimization can be optionally used to decrease flooding. We suggest considering it for DHCP and DAD-NA messages together with ARP optimization. 8. References 8.1. Normative References [RFCtrill] R. Perlman, D. Eastlake, D. Dutt, S. Gai, and A. Ghanwani, "RBridges: Base Protocol Specification", draft-ietf-trill- rbridge-protocol-16.txt, work in progress. [RFC2131] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997. [RFC3046] Patrick, M., "DHCP Relay Agent Information Option", RFC 3046, January 2001. [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, September 2007. [I-D.ietf-dhc-l2ra] Joshi, B. and Kurapati, P., "Layer 2 Relay Agent Information", draft-ietf-dhc-l2ra-04, April 2009. [I-D.ietf-dhc-l2ra-extensions] Joshi, B., Kurapati, P., Kamath, M., De Cnodder, S., "Extensions to Layer 2 Relay Agent", draft- ietf-dhc-l2ra-extensions-01, April 2009. Y. Li Expires January 5, 2011 [Page 7] Internet-Draft Reqs for mlt2uni optimization July 2010 8.2. Informative References [I-D.gundavelli-v6ops-l2-unicast] Gundavelli, S., Townsley, M., Troan, O., Dec, W., "Unicast Transmission of IPv6 Multicast Messages on Link-layer", draft-gundavelli-v6ops-l2-unicast- 00, February 2010 [I-D.li-6man-ns-mark] Li, H. and Li, Y., "Line identification in IPv6 Neighbour Solicitation messages", draft-li-6man-ns-mark-00, July 2009 9. Acknowledgments This document was prepared using 2-Word-v2.0.template.dot. Y. Li Expires January 5, 2011 [Page 8] Internet-Draft Reqs for mlt2uni optimization July 2010 Authors' Addresses Li Yizhou Huawei Technologies 101 Software Avenue, Nanjing 210012 China Phone: +86-25-56622310 Email: liyizhou@huawei.com Y. Li Expires January 5, 2011 [Page 9]