Transport Area Working Group B. Briscoe Internet-Draft BT Intended status: Informational March 02, 2011 Expires: September 3, 2011 Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP draft-briscoe-tsvwg-ecn-encap-guidelines-00 Abstract The purpose of this document is to guide the design of congestion notification in any lower layer protocol that encapsulates IP. The aim is for explicit congestion signals to propagate consistently from lower layer protocols into IP. Then the IP internetwork layer can act as a portability layer to carry congestion notification from non- IP-aware congested nodes to the destination transport endpoint. Following these guidelines should assure interworking between new encapsulations of congestion notification, whether specified by the IETF or other standards bodies. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 3, 2011. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents Briscoe Expires September 3, 2011 [Page 1] Internet-Draft ECN Encapsulation Guidelines March 2011 carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Design Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP . . . . . . . . . . . . . . . . 7 3.1. Indication of ECN Support in the Wire Protocol . . . . . . 7 3.2. Encapsulation Guidelines . . . . . . . . . . . . . . . . . 8 3.3. Decapsulation Guidelines . . . . . . . . . . . . . . . . . 9 3.4. Reframing and Congestion Markings . . . . . . . . . . . . 10 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 11 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 8. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 11 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 9.2. Informative References . . . . . . . . . . . . . . . . . . 12 Briscoe Expires September 3, 2011 [Page 2] Internet-Draft ECN Encapsulation Guidelines March 2011 1. Introduction Explicit Congestion Notification (ECN [RFC3168]) was defined in the IP header to allow a congested resource to notify the onset of congestion without having to drop packets, by explicitly marking a proportion of packets with the congestion experienced (CE) codepoint. Some subnetwork technologies (e.g. Frame Relay) have always supported explicit notification of congestion and it is gradually being added to others. The IETF would like to encourage this trend. Of course, the IETF does not have standards authority over every link layer protocol. So this document gives guidelines for designing propagation of congestion notification across the interface between IP and protocols that may encapsulate IP (i.e. that can be layered beneath IP). Each lower layer technology will exhibit slightly different issues and compromises, so the IETF or the relevant standards body must be free to define the specifics of each lower layer congestion notification scheme. However, if the guidelines below are followed, congestion notification should interwork between different technologies, using IP in its role as a 'portability layer'. Often link and physical layer resources are 'non-blocking' by design. In these cases congestion notification does not need to be implemented at the lower layer; ECN in IP would be sufficient. A degenerate example is a point-to-point Ethernet link. Excess loading of the link merely causes the queue from the higher layer to back up, while the lower layer remains immune to congestion. Even a whole meshed subnetwork can be made immune to interior congestion by careful network design; interior links can be sufficiently provisioned relative to the edge capacities to absorb even worst-case patterns of load. Nonetheless, other subnetworks can involve a mesh of links where traffic can converge on interior nodes and cause congestion. One way to support ECN in such cases has been to use so called "layer 3 switches". These are Ethernet switches that can bury into the Ethernet payload to find an IP header and manipulate or act on certain IP fields (Diffserv, ECN etc). For instance, in Data Center TCP [DCTCP], layer 3 switches are used to mark the ECN field of the IP header within the Ethernet payload. Although marking the IP header on a layer 3 switch is a useful tactical approach in a controlled environment such as within a data centre, it presents problems in more general settings. For instance, many Ethernet switches are not layer 3 switches so they cannot read and modify an IP payload. Also it might be costly to find an IP Briscoe Expires September 3, 2011 [Page 3] Internet-Draft ECN Encapsulation Guidelines March 2011 header (v4 or v6) when it may be encapsulated by more than one Ethernet header (e.g. when using multiple encapsulations of MAC in MAC [IEEE802.1Qah]). In these more general cases, ECN may best be supported by standardising explicit notification of congestion into the specific link layer protocol. It will then also be necessary to define how the end-station of the lower layer subnet propagates this explicit signal up to the IP internetwork layer. Many logical link technologies are based on self-contained protocol data units (PDUs). In these typical cases, at each decapsulation of an outer (lower layer) header, any congestion marking will have to be arranged to propagate into the forwarded (upper layer) header. It can then continue forwards, possibly picking up further congestion signals from congested resources along the way until it finally reaches the destination transport. Then typically the destination will feed this congestion notification back to the source transport. The purpose of this document is to guide the design of congestion notification in any lower layer, so that explicit congestion signals in PDUs at a lower layer will propagate consistently into PDUs of the upper IP layer. These guidelines are consistent with the way in which propagation of ECN has already been defined for IP-in-IP [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129]. [RFC6040] brings the original ECN specification [RFC3168] and the specification of IPsec [RFC4301] into line. [RFC5129] defines how a label switched router can mark the outer MPLS shim header to signal congestion and, when the packets reach the decapsulator at the edge of the MPLS network, the marks can be propagated into the IP header (or into an inner MPLS header) and onward to the destination transport. 1.1. Scope This document only concerns wire protocol processing of explicit notification of congestion and makes no changes or recommendations concerning algorithms for congestion marking or congestion response. This document focuses on the congestion notification interface between IP (v4 or v6) and lower layer protocols that can encapsulate IP. However, it is likely that the guidelines will also be useful when a lower layer protocol encapsulates itself (e.g. Ethernet MAC in MAC [IEEE802.1Qah]) or when it encapsulates other protocols. In some layer 2 technologies, explicit congestion notification has been defined for use internally within the subnet, but the interface Briscoe Expires September 3, 2011 [Page 4] Internet-Draft ECN Encapsulation Guidelines March 2011 with ECN in IP has not been defined. If the lower layer has its own feedback and load regulation, there is no need to propagate congestion signalling up the layers. For instance, EFCI (explicit forward congestion indication) has been present in ATM [ITU-T.I.371] for a long time, but it has been used for internal control and management rather than being propagated to endpoint transports for them to control end-to-end congestion. FECN (forward ECN) was included in Frame Relay standards but Frame Relay has no internal rate control mechanisms. Therefore, as no interface to IP ECN has ever been defined, FECN is only used to detect where more capacity should be provisioned [Buck00]. In another example, quantised congestion notification (QCN - also known as backward congestion notification or BCN) is being defined for Ethernet [IEEE802.1Qau]. QCN avoids the need to define a congestion notification interface with IP by limiting itself to confined scenarios where all endpoints are directly attached by the same layer 2 technology, such as in server area networks. One aim of the guidelines below is to avoid an outcome where one congestion notification scheme has been defined for internal use within a subnetwork technology, but then another has to be defined for interfacing to IP. Whether some form of explicit congestion notification already exists in a lower layer protocol or whether it is being added, this document can be used to design the interface to propagate explicit congestion notification between the lower layer protocol and IP. S.9.3 of RFC3168 on adding ECN to IP [RFC3168] pointed out that the IETF might in future want to define how ECN should be added to IETF protocols that encapsulate IP, such as L2TP [RFC2661], GRE [RFC1701, RFC2784] or PPTP [RFC2637]. More recently, the IETF is working on adding ECN support as a TRILL header option [trill-rbridge-options]. This document is intended to provide design guidelines for any protocol that might encapsulate IP, whether maintained by the IETF or by other standards bodies. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Further terminology used within this document: Briscoe Expires September 3, 2011 [Page 5] Internet-Draft ECN Encapsulation Guidelines March 2011 Protocol data unit (PDU): Information that is delivered as a unit among peer entities of a layered network consisting of protocol control information (typically a header) and possibly user data of that layer. The scope of this document includes layer 2 and layer 3 networks, where the PDU is respectively termed a frame or a packet. PDU is a general term for either. It also includes a payload with a shim header lying somewhere between layer 2 & 3. Encapsulator: The link or tunnel endpoint function that adds an outer header to a PDU (also termed the 'link ingress', the 'ingress tunnel endpoint' or just the 'ingress' where the context is clear). Decapsulator: The link or tunnel endpoint function that removes an outer header from a PDU (also termed the 'link egress', the 'egress tunnel endpoint' or just the 'egress' where the context is clear). Incoming header: The header of an arriving PDU before encapsulation. Outer header: The header added to encapsulate a PDU. Inner header: The header encapsulated by the outer header. Outgoing header: The header forwarded by the decapsulator using logic that combines the fields in the outer and inner headers. ECN-PDU: A PDU destined for an ECN-capable transport (i.e. a transport that will understand explicit congestion notifications). This is intended to be a general term for a PDU at any layer, not just an IP PDU. An IP packet with a non-zero ECN field would be an ECN-PDU, but the term is intended to also be used to describe PDUs of protocols that encapsulate IP packets. Not-ECN-PDU: A PDU destined for a transport that is not ECN-capable. Load Regulator: For each flow of PDUs, the transport function that is capable of controlling the data rate. Typically located at the data source, but in-path nodes can regulate load in some congestion control arrangements (e.g. admission control or policing nodes). Note the term "a function capable of controlling the load" deliberately includes a transport that doesn't actually control the load but ought to (e.g. an application without congestion control that uses UDP). Briscoe Expires September 3, 2011 [Page 6] Internet-Draft ECN Encapsulation Guidelines March 2011 Congestion Baseline: The function that created (or most recently reset) the congestion notification field. 3. Design Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP These guidelines are consitent with the guidelines on the design of alternate schemes for IP tunnelling of the ECN field [RFC6040] and the more general best current practice for the design of alternate ECN schemes given in [RFC4774]. The capitalised term 'SHOULD (NOT)' has been used in preference to 'MUST (NOT)' because it is difficult to know the compromises that will be necessary in each protocol design. If a particular protocol design chooses to contradict a 'SHOULD (NOT)' given in the advice below, it MUST include a sound justification. 3.1. Indication of ECN Support in the Wire Protocol An active queue management (AQM) scheme SHOULD NOT apply explicit congestion notifications to PDUs that are destined for legacy transports that will not understand them. Therefore the lower layer needs to be able to distinguish whether PDUs are destined for an ECN- capable transport or not. We use the term ECN-PDUs for a PDU that is destined for an ECN-capable transport, and Not-ECN-PDU for one destined for a transport that will not understand ECN. In IP, if the ECN field in each PDU is cleared to the Not-ECT (not ECN-capable transport) codepoint, it indicates that the transport will not understand congestion markings. The mechanism a lower layer uses to distinguish the ECN-capability of PDUs need not mimic that of IP, but it should achieve the same outcome. For instance, ECN- capable transports might only be allowed to use PDUs identified by a particular set of labels or tags. Alternatively, logical link protocols that use flow state might determine whether a PDU should be congestion marked by checking for ECN-support in the flow state. Whatever mechanisms might be invented, it SHOULD be possible for the lower layer protocol not to forward congestion on Not-ECN-PDUs destined for transports that will not understand them. The per-domain checking of ECN support in MPLS [RFC5129] is a good example of a way to avoid sending congestion markings to transports that will not understand them. In MPLS, header space is extremely limited, therefore RFC5129 does not provide a field in the MPLS header to indicate that the PDU is destined for an ECN-capable transport. Instead, interior nodes in a domain are allowed to set explicit congestion indications without checking whether the PDU is destined for a transport that will understand them. Nonetheless, Briscoe Expires September 3, 2011 [Page 7] Internet-Draft ECN Encapsulation Guidelines March 2011 this is made safe by requiring that the network operator upgrades all decapsulating edges of a whole domain to support ECN at once. Therefore, on decapsulation there will always be a check that the higher layer transport is ECN-capable (by checking for the Not-ECT codepoint in the inner IP header). If the decapsulator discovers that the higher layer (inner header) indicates the transport will not understand ECN, it drops an ECN-marked packet on behalf of the earlier congested node (see Decapsulation Guideline Paragraph 1 in Section 3.3). Note that it was only appropriate to define such an incremental deployment strategy because MPLS is targeted solely at professional operators, who can be expected to ensure that a whole subnetwork is consistently configured. This strategy might not be appropriate for other link technologies targeted at zero-configuration deployment by the general public (e.g. Ethernet). For such 'plug-and-play' environments it will be necessary to invent a failsafe approach that ensures congestion markings will never fall into black holes however inconsistently a system is put together. Alternatively, congestion notification relying on correct system configuration could be confined to flavours of Ethernet intended only for professional network operators, such as IEEE 802.1ah Provider Backbone Bridges (PBB). 3.2. Encapsulation Guidelines 1. Even if an incoming PDU is capable of understanding ECN (an ECN- PDU), the ingress to a subnet SHOULD disable ECN when it forwards the PDU at the lower layer (e.g. by setting the outer header of the PDU to identify it as a Not-ECN-PDU) unless the ingress can guarantee that the corresponding egress of the link will correctly propagate any congestion markings added to the outer header across the subnet. This guarantee might be provided: * by inherent design of the protocol * by configuration * by the ingress explicitly checking that the egress propagates ECN. 2. Once the ingress to a subnet has established that ECN will be correctly propagated at the egress, on encapsulation it SHOULD encode the same level of congestion in outer headers as is arriving in incoming headers. For example it might copy any incoming congestion notification into the outer header of the lower layer protocol. Briscoe Expires September 3, 2011 [Page 8] Internet-Draft ECN Encapsulation Guidelines March 2011 This meets the requirement that all outer headers ought to reflect congestion accumulated along the whole upstream path, not just since the ingress of the subnet. More precisely, congestion notifications in outer headers SHOULD reflect congestion since the node that is regulating the load (the Load Regulator). This guideline is intended to ensure that any bulk congestion monitoring further downstream is meaningful (e.g. by a network management node monitoring ECN in passing frames). The level of ECN in passing frames is not meaningful unless the Congestion Baseline (where congestion notifications start at zero) is known. It is not useful to start a new Congestion Baseline at the start of each subnet, but it is useful to define the Congestion Baseline at the node that is regulating the load (the Load Regulator). In some circumstances (e.g. pseudowire emulations with link-local flow control), the whole path is divided into segments, each with its own congestion notification and feedback loop. In these cases, the function that regulates load at the start of each segment will need to reset congestion notification (i.e. clear any accumulated congestion notifications) at the start of its segment. 3.3. Decapsulation Guidelines Congestion notification SHOULD NOT simply be copied from outer headers to the forwarded header. The outgoing congestion notification field SHOULD be calculated from the inner and outer headers, using the following rules. If there is any conflict, rules earlier in the list take precedence over rules later in the list: 1. If the arriving inner header is a Not-ECN-PDU it implies the transport will not understand explicit congestion markings. Then: * If the outer header carries an explicit congestion marking, the packet SHOULD be dropped--the only indication of congestion the transport will understand. * If the outer is an ECN-PDU that carries no indication of congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but still only as a Not-ECN-PDU. 2. If the outer header does not support explicit congestion notification (a Not-ECN-PDU), but the inner header does (an ECN- PDU), the inner header SHOULD be forwarded unchanged. Briscoe Expires September 3, 2011 [Page 9] Internet-Draft ECN Encapsulation Guidelines March 2011 3. In some lower layer protocols congestion may be signalled as a numerical level, such as in quantised congestion notification [IEEE802.1Qau]. If such an encoding encapsulates an ECN-capable IP packet, a function will be needed to convert the quantised congestion level into the frequency of congestion markings in outgoing IP packets. 4. Congestion indications may be ranked by severity. For instance no congestion would be the weakest indication, with possibly increasing levels of congestion encoded by increasingly stronger indications, e.g. pre-congestion notification (PCN) can be encoded at two severity levels [pcn-3-in-1-encoding]. If the arriving inner header is an ECN-PDU, where the inner and outer headers carry indications of congestion of different severity, the more severe indication SHOULD be forwarded in preference to the less severe. Obviously, if the severities in both inner and outer are the same, the same severity should be forwarded. 5. The inner and outer headers might carry a combination of congestion notification fields that should not be possible given any currently used protocol transitions. For instance, if Encapsulation Guideline Paragraph 2 in Section 3.2 had been followed, it should not be possible to have a less severe indication of congestion in the outer than in the inner. It MAY be appropriate to log unexpected combiantions of headers and possibly raise an alarm. If a safe outgoing codepoint can be defined for such a PDU, the PDU SHOULD be forwarded rather than dropped. Some implementers discard PDUs with currently unused combinations of headers just in case they represent an attack. However, an approach using alarms is preferable so that operators can keep track of possible attacks but currently unused combinations are not precluded from use in future standards actions. 3.4. Reframing and Congestion Markings Where framing boundaries are different between two layers, congestion indications SHOULD be propagated on the basis that a congestion indication on a PDU applies to all the octets in the PDU. On average, an encapsulator or decapsulator SHOULD approximately preserve the number of marked octets arriving and leaving (counting the size of inner headers, but not added encapsulating headers). An algorithm for reframing congestion indications over different sized PDUs SHOULD NOT hold any marked octets back to be signalled in Briscoe Expires September 3, 2011 [Page 10] Internet-Draft ECN Encapsulation Guidelines March 2011 later frames. For instance, a reframing algorithm might maintain a count of marked octets arriving and departing. Such an algorithm SHOULD propagate a congestion indication in the next departing PDU if there have been more arriving marked octets than departing, even if after marking the next PDU the count of departing marked octets will be greater than those arriving. 4. IANA Considerations This memo includes no request to IANA. 5. Security Considerations {TBA} 6. Conclusions {TBA} 7. Acknowledgements Bob Briscoe is partly funded by Trilogy, a research project (ICT- 216372) supported by the European Community under its Seventh Framework Programme. The views expressed here are those of the author only. 8. Comments Solicited Comments and questions are encouraged and very welcome. They can be addressed to the IETF Transport Area working group mailing list , and/or to the authors. 9. References 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001. [RFC4774] Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, Briscoe Expires September 3, 2011 [Page 11] Internet-Draft ECN Encapsulation Guidelines March 2011 November 2006. 9.2. Informative References [Buck00] Buckwalter, J., "Frame Relay: Technology and Practice", Pub. Addison Wesley ISBN-13: 978- 0201485240, 2000. [DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data Center TCP (DCTCP)", ACM SIGCOMM CCR 40(4)63--74, October 2010, . [IEEE802.1Qah] IEEE, "IEEE Standard for Local and Metropolitan Area Networks--Virtual Bridged Local Area Networks--Amendment 6: Provider Backbone Bridges", IEEE Std 802.1Qah-2008, August 2008, . (Access Controlled link within page) [IEEE802.1Qau] IEEE, "IEEE Standard for Local and Metropolitan Area Networks--Virtual Bridged Local Area Networks - Amendment 10: Congestion Notification", 2009, . (Work in Progress; Access Controlled link within page) [ITU-T.I.371] ITU-T, "Traffic Control and Congestion Control in B-ISDN", ITU-T Rec. I.371 (03/04), March 2004. [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 1701, October 1994. [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W., and G. Zorn, "Point-to-Point Tunneling Protocol", RFC 2637, July 1999. [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, Briscoe Expires September 3, 2011 [Page 12] Internet-Draft ECN Encapsulation Guidelines March 2011 August 1999. [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000. [RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005. [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion Marking in MPLS", RFC 5129, January 2008. [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, November 2010. [pcn-3-in-1-encoding] Briscoe, B., Moncaster, T., and M. Menth, "Encoding 3 PCN-States in the IP header using a single DSCP", draft-ietf-pcn-3-in-1-encoding-04 (work in progress), January 2011. [trill-rbridge-options] 3rd, D., Ghanwani, A., Bestler, C., and V. Manral, "RBridges: TRILL Header Options", draft-ietf-trill-rbridge-options-03 (work in progress), October 2010. Author's Address Bob Briscoe BT B54/77, Adastral Park Martlesham Heath Ipswich IP5 3RE UK Phone: +44 1473 645196 EMail: bob.briscoe@bt.com URI: http://bobbriscoe.net/ Briscoe Expires September 3, 2011 [Page 13]