INTERNET-DRAFT
Balaji Venkat
HCL-Technologies India Pvt Limited
(HCL-Cisco software development center), Chennai, India
Expires June 1999
December 1998

MTU discovery using TCP MSS and Discussion on MSS value in SYN reply

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific rim), ftp.ietf.org (US East coast), or ftp.isi.edu (US West Coast).

Distribution of this memo is unlimited.

Abstract

Path MTU discovery as it exists now finds the least MTU of a given path. Traceroute through IP option [3] provides a method for finding the MTU on each hop using an ICMP message as a reply from the target host, with the output link MTU in a portion of the message. The method proposed in this document intends to find the MTU on each hop on an internet path without using the ICMP message for traceroute. It intends to achieve the same goal as the traceroute through IP option, but through a different mechanism. Discovery of the MTU of each router on an internet path would serve as a valuable network debugging tool. As proposed, the method has the advantage of being automatically supported by all routers that support the TCP layer.
It has two disadvantages: it generates quite a few TCP packets, and the time it takes to discover each MTU along the path is quite substantial.

This document specifies the MTU discovery mechanism using the existing IP and TCP options and the ICMP message types that exist on all routers that support the TCP layer in the internet. This method is suggested as an alternative to the Traceroute through IP option [3]. The intention is not to obsolete RFC 1393.

This document also suggests that by default a reply SYN packet from a target host should include an MSS value that is derived from the MTU of the connected network of the incoming interface.

Table of contents

1. Introduction
2. Path MTU discovery and MTU discovery today
3. MTU discovery (an alternative)
4. Leveraging from Traceroute
5. TCP Maximum segment size
6. Basic Algorithm
7. Exceptions (where this won't work)
8. References
9. Author's address

Acknowledgements

This proposal is a product of the author's idea. The mechanism proposed here is a further enhancement of RFC 1191 by Mogul & Deering [1]. It utilizes the TCP connection setup and traceroute mechanisms (prior to RFC 1393) for achieving its purpose.

1. Introduction

When an IP host transmits data to a destination, the data is transmitted as a series of IP datagrams. It is recommended that these datagrams be of the largest size that does not require fragmentation anywhere along the path from the source to the destination. (For a further analysis of this topic, see [1]).
This datagram size is referred to as the Path MTU (PMTU), and it is equal to the minimum of the MTUs of each hop in the path.

To discover the MTU of each hop on an internet path, there exists a traceroute with IP option mechanism, suggested by Malkin [3], that makes use of an ICMP message to get the output link MTU. This draft offers an alternative to the traceroute with IP option: a combination of the mechanism employed by traceroute prior to RFC 1393 [3] and the TCP connection setup.

2. Path MTU discovery and MTU discovery today

The technique as it exists today involves using the Don't Fragment (DF) bit in the IP header to dynamically discover the PMTU of a path. The basic idea is that a source host initially assumes that the PMTU of a path is the (known) MTU of its first hop, and sends all datagrams on that path with the DF bit set. If any of the datagrams are too large to be forwarded without fragmentation by some router along the path, that router will discard them and return an ICMP "Datagram too big" message as per RFC 1191. This is the ICMP Destination Unreachable message with the code meaning "Fragmentation needed and DF set" [2]; before RFC 1191 it did not carry the MTU of the constricting hop.

The PMTU discovery process ends when the host's estimate of the PMTU is low enough that its datagrams can be delivered without fragmentation. Alternatively, the host may elect to end the discovery process by ceasing to set the DF bit in the datagram headers; it may do so, for example, because it is willing to have datagrams fragmented in some circumstances. Normally, the host continues to set DF in all datagrams, so that if the route changes and the new PMTU is lower, it will be discovered.
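The discovery loop described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the path is modelled as a plain list of hop MTUs, the function names forward and discover_pmtu are hypothetical, and each router is assumed to report the MTU of the constricting hop in its "Datagram too big" message, as RFC 1191 specifies.

```python
from typing import List, Optional


def forward(size: int, hop_mtus: List[int]) -> Optional[int]:
    """Simulate sending a DF datagram of `size` bytes along the path.

    Returns None if the datagram reaches the destination, otherwise the
    MTU reported by the first router that would have had to fragment it.
    """
    for mtu in hop_mtus:
        if size > mtu:
            return mtu  # router discards the datagram and reports this MTU
    return None


def discover_pmtu(hop_mtus: List[int]) -> int:
    """Start from the first-hop MTU and shrink the estimate until it fits."""
    estimate = hop_mtus[0]  # initial assumption: PMTU equals first-hop MTU
    while True:
        reported = forward(estimate, hop_mtus)
        if reported is None:
            return estimate  # delivered without fragmentation
        estimate = reported  # always smaller than before, so the loop ends


path = [1500, 4352, 296, 1500]  # hypothetical hop MTUs
print(discover_pmtu(path))  # prints 296
```

The result is always the minimum of the hop MTUs, which is why this procedure alone says nothing about the MTU of the other hops.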
As per RFC 1191, if the next hop from an intermediate router has an MTU lower than the size of the datagram, so that fragmentation would be required, the router sends the ICMP "Datagram too big" message with a field in the ICMP header that reports the MTU of the constricting hop. This method provides the Path MTU and nothing more: it does not report the MTU of each intervening hop in the path.

MTU discovery today involves using the ICMP "Traceroute" message to discover the MTU of each intermediate hop in an internet path. Setting an appropriate IP option (section 2.2 of Malkin [3]) and sending the datagram to the target hop achieves this and prompts the target hop to send the ICMP "Traceroute" message with the output link MTU.

3. MTU discovery (an alternative)

The mechanism proposed in this draft intends to find the MTU of each intervening hop in a given path. This information would be provided using a technique that is a combination of traceroute prior to RFC 1393 and TCP connection setup. The MTU discovery mechanism would gather the information regarding each hop's MTU on an internet path and provide it to the user of this mechanism.

4. Leveraging from traceroute

This utility would leverage traceroute, as it existed prior to RFC 1393, to find the intermediate hops to a destination on a given internet path. Traceroute's algorithm would be required for that very purpose. This would be done as specified by RFC 792 using the TTL field in the IP header [2]. This method does not intend to use the traceroute using IP option mechanism suggested by Malkin [3]. In fact, it intends to provide an alternative mechanism for discovering the MTU on each hop on an internet path.

5. TCP Maximum Segment Size

The other mechanism in this alternative method, which follows up what is done by traceroute, is the initial packet exchange during the TCP connection setup.
The maximum segment size (MSS) is the largest chunk of data that TCP will send to the other end. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header.

When a connection is established, each end has the option of announcing the MSS it expects to receive. The SYN segment sent in the TCP connection setup contains the MSS option. If one end does not receive an MSS from the other end, a default of 536 bytes is assumed.

When TCP sends a SYN segment, either because a local application wants to initiate a connection or because a connection request is received from another host, it can send an MSS value up to the outgoing interface's MTU minus the size of the fixed TCP and IP headers. For an Ethernet this implies an MSS of up to 1460 bytes. The destination to which the connection is intended MAY then announce its MSS value in the reply to the SYN. This is a method discussed by Mogul & Deering [1].

Some implementations set the MSS value in the reply SYN segment to the minimum of (incoming interface MTU - 40 bytes) and the default MSS (536), which is derived from the conservative maximum datagram size of 576. Limiting the MSS in this way unnecessarily restricts segments to 536 bytes when the least MTU along the entire path is greater than 576. Why limit the segment size to a value lower than what can be transmitted without fragmentation? The suggestion is therefore to always return the MTU-derived value of the MSS to the connection-seeking host. The suggestion has its basis in section 3 of RFC 1191 [1], but differs slightly from it in the calculation of the MSS.
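The arithmetic above can be made concrete with a short sketch. The function names are hypothetical and the code only mirrors the two reply policies described in the text (the conservative clamp to 536 and the MTU-derived value suggested here); it assumes fixed 20-byte IP and TCP headers with no options.

```python
IP_HEADER = 20     # bytes, fixed IP header without options
TCP_HEADER = 20    # bytes, fixed TCP header without options
DEFAULT_MSS = 536  # assumed when the peer announces no MSS (576 - 40)


def mss_from_mtu(mtu: int) -> int:
    """MSS that fills one unfragmented datagram on a link with this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def conservative_reply_mss(incoming_mtu: int) -> int:
    """Policy of some implementations: never announce more than 536."""
    return min(mss_from_mtu(incoming_mtu), DEFAULT_MSS)


def suggested_reply_mss(incoming_mtu: int) -> int:
    """Policy suggested in this draft: always announce the MTU-derived MSS."""
    return mss_from_mtu(incoming_mtu)


print(mss_from_mtu(1500))            # Ethernet: prints 1460
print(conservative_reply_mss(1500))  # prints 536
print(suggested_reply_mss(1500))     # prints 1460
```

With the conservative policy, every interface whose MTU exceeds 576 yields the same answer, so the reply carries no information about the real MTU.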
Section 3 of Mogul & Deering states:

"Actually, many implementations always send an MSS option, but set the value to 536 if the destination is non-local. This behaviour was correct when the internet was full of hosts that did not follow the rule that datagrams larger than 576 octets should not be sent to non-local destinations. Now that most hosts do follow this rule, it is unnecessary to limit the value in the TCP MSS option to 536 for non-local peers. Moreover, doing this prevents PMTU discovery from discovering PMTUs larger than 576, so hosts SHOULD no longer lower the value they send in the MSS option. The MSS option should be 40 octets less than the size of the largest datagram the host is able to reassemble (MMS_R, as defined in [1]); in many cases, this will be the architectural limit of 65495 (65535 - 40) octets. A host MAY send an MSS value derived from the MTU of its connected network (the maximum MTU over its connected networks, for a multi-homed host); this should not cause problems for PMTU discovery, and may dissuade a broken peer from sending enormous datagrams."

RFC 1191 thus suggests that the MSS returned may be derived from the maximum of the MTUs over the connected networks. But it is questionable whether the maximum MTU of the connected networks is relevant for a TCP connection requested over a path that may not pass through the network with that maximum MTU.

Suppose that the maximum MTU of the connected networks in a host receiving a request for a TCP connection belongs to an FDDI interface. If this FDDI interface is not the incoming interface for the packets sent over the requested connection, then returning an MSS derived from the FDDI interface would misrepresent the maximum segment size receivable by that host on the true incoming interface.
In that sense, sending an MSS derived from the maximum MTU of connected networks seems to be flawed.

So there is one set of implementations at one end of the spectrum that always set the MSS for a non-local peer to a conservative maximum of 536, and another set at the other end that derive the MSS from the maximum of the MTUs of all of the connected networks. Between them is a median set of implementations that derive the MSS from the MTU of the incoming interface. There are arguments for each of these, but sadly a lack of uniformity. The argument for the maximum MTU approach is that, owing to the asymmetric nature of paths and rapidly changing routes, setting a hard limit on the datagram size other than the maximum MTU is a concern.

Thus a more median approach would be to calculate the MSS value for the MSS option in the SYN segment as the minimum of the MSS derived from the MTU of the incoming interface and the default MSS, where the default MSS corresponds to the largest datagram the host can reassemble. But there is a problem here: setting 65495 would quite possibly tickle some IP implementations that have sign-bit bugs (Mogul & Deering [1]).

Consider three hosts A, B and C connected in the manner shown in fig 1.0. Suppose host C wants to initiate a TCP connection with host A. The MTUs of the various networks are as shown. The SYN of host C is sent with an MSS value of 1460, which is the MTU of 1500 minus 40 bytes. In reply, the host A stack responds with an MSS of 256, which is the MTU of the outgoing interface on host A (296) minus 40 bytes for the TCP and IP headers.
+----------+                 +-----------+  MTU=1500    +-----------+
|  host A  |-----------------|  host B   |--------------|  host C   |
+----------+MTU=296   MTU=296+-----------+   MTU=1500   +-----------+

    SYN <------------------------------------------------------
    SYN ------------------------------------------------------>

                      Fig 1.0 SYN with MSS

This mechanism offers a way to obtain the MTU of the interface on each of the hops in an internet path. Utilizing this and traceroute's mechanism of identifying the intermediate hosts, it would be possible to discover the MTU of each hop in an internet path.

6. Basic Algorithm

The basic algorithm for identifying the MTU on each hop would be to traceroute the intermediate hops to a given destination, store these hops, and then iteratively initiate a connection to the finger/http port of each hop with an MSS value of 65535 in the outgoing SYN segment. When the connection is initiated, the SYN packet carries the source's MSS value, and the hop whose MTU is to be discovered replies with its own MSS value. The MTU of that hop's interface is then obtained by adding 40 to the value returned in the MSS option of the TCP header. Iterating through the list of hops, the MTU of each hop would be found. Once the MTU is computed, a FIN packet would be sent to the finger/http port of the target hop and the connection closed with the appropriate packets exchanged for connection closure.

For those hops that do not support the TCP layer as part of their stack implementation, there would be either a timeout at the source (if the hop does not return an ICMP Unreachable error) or the receipt of an ICMP Unreachable from the IP layer; in both cases a default value of 576 would be assumed as the MTU for that hop. Thus a value of 576 bytes would denote that the MTU discovery on that hop did not work. Some implementations set the MSS value to 536 if the MTU is more than 576 for non-local peers.
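The per-hop step of the basic algorithm can be sketched as follows. This is a minimal sketch, not an implementation: the actual SYN exchange is hidden behind a hypothetical probe callable (reading the MSS option of a SYN reply needs raw packet access, which is outside the scope of this illustration), and the hop addresses and canned MSS values are invented for the example.

```python
from typing import Callable, Dict, Iterable, Optional

HEADERS = 40        # 20 bytes IP header + 20 bytes TCP header
FALLBACK_MTU = 576  # assumed on timeout or ICMP Unreachable

# Hypothetical probe: sends a SYN to the hop and returns the MSS found in
# the SYN reply, or None on timeout or ICMP Unreachable.
Probe = Callable[[str], Optional[int]]


def discover_hop_mtus(hops: Iterable[str], probe: Probe) -> Dict[str, int]:
    """Map each hop found by traceroute to the MTU implied by its MSS."""
    result = {}
    for hop in hops:
        mss = probe(hop)
        # MTU = returned MSS + 40; fall back to 576 when there is no reply.
        result[hop] = FALLBACK_MTU if mss is None else mss + HEADERS
    return result


# Stand-in for real probing: canned SYN-reply MSS values per hop.
canned = {"10.0.0.1": 1460, "10.0.1.1": 256, "10.0.2.1": None}
print(discover_hop_mtus(canned, canned.get))
# prints {'10.0.0.1': 1500, '10.0.1.1': 296, '10.0.2.1': 576}
```

A result of 576 is ambiguous by construction: it is produced both by a hop that did not answer and by a hop that returned an MSS of 536.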
Where a hop clamps its MSS to 536 for non-local peers, its MTU would effectively be assumed to be 576.

If the finger/http port on a target hop is not available or not supported on that hop, the discovery could try alternate ports of the kind that are available by default on most routers and are kept open.

A certain amount of overhead is expected in terms of TCP packet exchanges every time a connection is set up and torn down for finding the MTU.

7. Exceptions (where this won't work)

This algorithm won't work in certain cases. If the implementation returns 536 as the MSS when the MTU of the interface through which the non-local connection seeker is reachable is greater than 576, then the MTU would have to be assumed to be 576 for that hop. If the implementation returns the MSS derived from the highest MTU of all its connected networks, then that highest MTU would be shown as the MTU for the incoming interface. This would be erroneous if that MTU did not belong to the actual incoming interface. But most implementations calculate the MSS from the MTU of the incoming interface. Thus in most cases this mechanism would succeed in giving the proper picture regarding the MTU.

8. References

[1] J. Mogul, S. Deering, Path MTU discovery, RFC 1191, DECWRL and Stanford University, November 1990.

[2] J. Postel, Internet Control Message Protocol, RFC 792, SRI Network Information Center, September 1981.

[3] G. Malkin, Traceroute using an IP option, RFC 1393, Xylogics, Inc, January 1993.

9. Author's address

V. Balaji Venkat
HCL-Technologies India Pvt Limited,
(HCL-Cisco software development center),
49/50 Nelson Manickam road,
Chennai - 600 029
Tamil Nadu, India.

Phone : 091-44-481 9938
Fax   : 091-44-481 9939
Email : bvenkat@cisco.com