Fragmentation Avoidance in DNS

Japan Registry Services Co., Ltd.

Chiyoda First Bldg. East 13F, 3-8-1 Nishi-Kanda Chiyoda-ku, Tokyo 101-0065 Japan +81 3 5215 8451 fujiwara@jprs.co.jp

none

11400 La Honda Road Woodside, CA 94062 United States of America +1 650 393 3994 paul@redbarn.org

operations Internet-Draft

EDNS0 enables a DNS server to send large responses using UDP and is widely deployed. Path MTU discovery remains widely undeployed due to security issues, and IP fragmentation has exposed weaknesses in application protocols. Currently, DNS is known to be the largest user of IP fragmentation. It is possible to avoid IP fragmentation in DNS by limiting response size where possible, and signaling the need to upgrade from UDP to TCP transport where necessary. This document proposes to avoid IP fragmentation in DNS.

DNS has the EDNS0 mechanism, which enables a DNS server to send large responses using UDP. EDNS0 is now widely deployed, and DNS (over UDP) is said to be the biggest user of IP fragmentation.

Fragmented DNS UDP responses have systemic weaknesses, which expose the requestor to DNS cache poisoning from off-path attackers. (References and details are given later in this document.)

“IP Fragmentation Considered Fragile” summarized that IP fragmentation introduces fragility to Internet communication. The transport of DNS messages over UDP should take account of the observations stated in that document.

TCP avoids fragmentation using its Maximum Segment Size (MSS) parameter. Each transmitted segment is header-size aware: the size of the IP and TCP headers is known, as are the far end’s MSS parameter and the interface or path MTU, so the segment size can be chosen to keep each IP datagram below a target size. This takes advantage of the elasticity of TCP’s packetizing process as to how much queued data will fit into the next segment. In contrast, DNS over UDP has little datagram size elasticity and lacks insight into IP header and option size, and so must make more conservative estimates about available UDP payload space.

This document proposes to set IP_DONTFRAG / IPV6_DONTFRAG in DNS/UDP messages in order to avoid IP fragmentation, and describes how to avoid packet losses due to IP_DONTFRAG / IPV6_DONTFRAG.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP14 when, and only when, they appear in all capitals, as shown here.

“Requestor” refers to the side that sends a request. “Responder” refers to an authoritative, recursive resolver or other DNS component that responds to questions. (Quoted from EDNS0.)

“Path MTU” is the minimum link MTU of all the links in a path between a source node and a destination node. (Quoted from “Path MTU Discovery for IP version 6”.)

“Path MTU discovery” is defined by “Path MTU discovery”, “Path MTU Discovery for IP version 6”, and “Packetization Layer Path MTU Discovery for Datagram Transports”.

The IP_DONTFRAG option is not defined by any RFC. It is similar to the IPV6_DONTFRAG option defined in “Advanced Sockets Application Program Interface (API) for IPv6”. The IP_DONTFRAG option is used on BSD systems to set the Don’t Fragment bit when sending IPv4 packets. On Linux systems this is done via IP_MTU_DISCOVER and IP_PMTUDISC_DO.

Many of the specialized terms used in this document are defined in DNS Terminology.

The methods to avoid IP fragmentation in DNS are described below:

* UDP responders SHOULD send DNS responses with the IP_DONTFRAG / IPV6_DONTFRAG options.
* If the UDP responder detects an immediate error indicating that the UDP packet cannot be sent because it exceeds the path MTU size (EMSGSIZE), the UDP responder MAY recreate the response so that it fits in the path MTU size, or send a response with the TC bit set.
* UDP responders MAY probe to discover the real MTU value per destination.
* UDP responders SHOULD compose UDP responses that result in IP packets that do not exceed the path MTU to the requestor. If path MTU discovery failed or is impossible, UDP responders SHOULD compose UDP responses that result in IP packets that do not exceed the default maximum DNS/UDP payload size described later in this document.
* The cause and effect of the TC bit is unchanged from EDNS0.
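The responder behaviour above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes a Linux host (the numeric fallbacks for IP_MTU_DISCOVER, IP_PMTUDISC_DO, and IPV6_DONTFRAG are the Linux values), an uncompressed QNAME in the question section, and a simplified truncated reply that carries no OPT record. The function names are hypothetical.

```python
import errno
import socket
import struct

# Assumption: Linux numeric values, used only if the socket module lacks the names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IPV6_DONTFRAG = getattr(socket, "IPV6_DONTFRAG", 62)

TC_BIT = 0x0200  # truncation flag in the DNS header flags word


def set_dont_fragment(sock):
    """Ask the kernel not to fragment datagrams sent on this UDP socket."""
    if sock.family == socket.AF_INET6:
        sock.setsockopt(socket.IPPROTO_IPV6, IPV6_DONTFRAG, 1)
    else:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)


def truncated_reply(response):
    """Return the header and question section of `response` with TC set."""
    ident, flags, qdcount = struct.unpack_from("!HHH", response, 0)
    pos = 12  # fixed DNS header length
    for _ in range(qdcount):
        while response[pos] != 0:   # walk the labels of an uncompressed QNAME
            pos += 1 + response[pos]
        pos += 1 + 4                # root label, then QTYPE and QCLASS
    # Keep the question, drop all records, and signal truncation.
    header = struct.pack("!HHHHHH", ident, flags | TC_BIT, qdcount, 0, 0, 0)
    return header + response[12:pos]


def send_response(sock, response, addr):
    """Send a response; on EMSGSIZE fall back to a TC=1 reply."""
    try:
        sock.sendto(response, addr)
    except OSError as exc:
        if exc.errno != errno.EMSGSIZE:
            raise
        sock.sendto(truncated_reply(response), addr)
```

A responder would call set_dont_fragment() once on its listening socket and then use send_response() for each reply. A fuller implementation might instead rebuild the response to fit the path MTU, which the text also permits.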

* UDP requestors SHOULD send DNS requests with the IP_DONTFRAG / IPV6_DONTFRAG options.
* UDP requestors MAY probe to discover the real MTU value per destination. They then calculate their maximum DNS/UDP payload size as the reported path MTU minus the IPv4/IPv6 header size (20 or 40) minus the UDP header size (8). If path MTU discovery failed or is impossible, they use the default maximum DNS/UDP payload size described later in this document.
* UDP requestors SHOULD set the requestor’s payload size to the calculated or the default maximum DNS/UDP payload size.
* UDP requestors MAY drop fragmented DNS/UDP responses without IP reassembly to avoid cache poisoning attacks.
* DNS responses may be dropped because of IP fragmentation. Upon a timeout, UDP requestors may retry using TCP or UDP, per local policy.
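The payload size calculation above, and how the result is carried in a query, can be sketched as follows. This is an illustrative sketch only. It assumes fixed-size IP headers (no IPv4 options or IPv6 extension headers) and uses the authors’ recommended value of 1400 as the fallback. The function names are hypothetical. The OPT RR layout (root owner name, TYPE 41, CLASS holding the requestor’s payload size) follows EDNS0.

```python
import secrets
import struct

UDP_HEADER = 8
IP_HEADER = {4: 20, 6: 40}  # assumption: no IPv4 options, no IPv6 extension headers
DEFAULT_PAYLOAD = 1400      # assumption: the authors' recommended default


def max_udp_payload(path_mtu, ip_version):
    """Path MTU minus IP header minus UDP header, or the default if unknown."""
    if path_mtu is None:
        return DEFAULT_PAYLOAD
    return path_mtu - IP_HEADER[ip_version] - UDP_HEADER


def build_query(name, qtype, payload_size, ident=None):
    """Encode a one-question query carrying an EDNS0 OPT RR."""
    if ident is None:
        ident = secrets.randbelow(1 << 16)  # unpredictable message ID
    header = struct.pack("!HHHHHH", ident, 0x0100, 1, 0, 0, 1)  # RD=1, ARCOUNT=1
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".") if label
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS IN
    # OPT RR: root name, TYPE=41, CLASS=requestor's payload size, TTL=0, RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, payload_size, 0, 0)
    return header + question + opt
```

For example, a requestor that has measured a path MTU of 1500 over IPv6 would call build_query("example.com", 1, max_udp_payload(1500, 6)) and so advertise 1452 octets.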

Fragmentation avoidance is achieved with the IP(V6)_DONTFRAG option. The purpose of packet size limitation is to decrease packet loss due to the effects of the IP(V6)_DONTFRAG option.

The default maximum DNS/UDP payload size depends on the connectivity of each node, so it cannot be determined unconditionally. However, there are good proposed values. Operators MAY select a good number from the table below. Details of the proposed values are described later in this document.

Source                                   IPv4               IPv6
---------------------------------------  -----------------  -----------------
Minimal: RFC 4035 MUST                   1220               1220
Software developers /                    1232               1232
  DNSFlagDay2020 propose (1280-40-8)
Authors’ recommendation                  1400               1400
  (1500 -40 -8 - some headers)
Maximum: Ethernet MTU 1500               1472 (1500-20-8)   1452 (1500-40-8)
Measured                                 MTU -20-8          MTU -40-8

However, operators of DNS servers SHOULD measure their path MTU to the Internet when setting up DNS servers (and when the network configuration changes). How to measure path MTU is described later in this document.

Operators of authoritative servers (that offer global DNS zones) and full-service resolvers (that access authoritative servers of the global DNS) SHOULD measure their path MTU to well-known locations on the Internet, such as [a-m].root-servers.net or [a-m].gtld-servers.net.

Operators of full-service resolvers would be well advised to measure their path MTU to several authority name servers and to a random sample of their expected stub resolver client networks, to find the upper boundary on IP/UDP packet size in the average case. Alternatively, ISP operators already know their customers’ connectivity and the MTU between customers and the ISP’s servers. This limit should not be exceeded by most messages received or transmitted by a full resolver, or else fallback to TCP will occur too often.

DNS clients (stub resolvers) need to specify an appropriate requestor’s payload size when supporting EDNS0. Network operators cannot control CPEs, embedded devices, and user devices, so developers of such devices may choose small values such as 1220 and 1232.

Other DNS servers (for example, forwarding-only resolvers or private DNS) are out of scope of this document.

The proposed method supports incremental deployment. When a full-service resolver implements the proposed method, its stub resolvers (clients) and the authority server network will no longer observe IP fragmentation or reassembly from that server, and will fall back to TCP when necessary. When an authoritative server implements the proposed method, its full service resolvers (clients) will no longer observe IP fragmentation or reassembly from that server, and will fall back to TCP when necessary.

Large DNS responses are the result of zone configuration. Zone operators SHOULD seek configurations resulting in small responses. For example:

* Use a smaller number of name servers (13 may be too large).
* Use a smaller number of A/AAAA RRs for a domain name.
* Use the ‘minimal-responses’ configuration: some implementations have a ‘minimal responses’ configuration that causes DNS servers to make response packets smaller, containing only mandatory and required data.
* Use a DNSSEC algorithm with a smaller signature / public key size. Notably, the signature size of ECDSA or EdDSA is smaller than that of RSA.
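To give a rough sense of the last point, the sketch below compares RRSIG RDATA sizes for a few algorithms. The signature lengths are the standard ones for a 2048-bit RSA key (256 octets), ECDSA P-256 (64 octets), and Ed25519 (64 octets). The signer name "example.com" and the function names are assumptions chosen for illustration, and the figures cover a single RRSIG only, not a whole response.

```python
# Fixed RRSIG RDATA fields: type covered (2), algorithm (1), labels (1),
# original TTL (4), expiration (4), inception (4), key tag (2).
RRSIG_FIXED = 18

# Signature lengths in octets for common DNSSEC algorithms.
SIGNATURE_OCTETS = {
    "RSASHA256 (2048-bit key)": 256,
    "ECDSAP256SHA256": 64,
    "ED25519": 64,
}


def wire_name_length(name):
    """Length of an uncompressed domain name in wire format."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1  # +1 for the root label


def rrsig_rdata_length(signer, signature_octets):
    """Octets of RRSIG RDATA for a given signer name and signature size."""
    return RRSIG_FIXED + wire_name_length(signer) + signature_octets


if __name__ == "__main__":
    for algorithm, octets in SIGNATURE_OCTETS.items():
        print(algorithm, rrsig_rdata_length("example.com", octets))
```

Under these assumptions each RSA signature costs 192 octets more than an ECDSA or EdDSA one, which adds up quickly when a response carries several RRSIGs.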

Prior research and dns-operations mailing list discussions report that some authoritative servers ignore the EDNS0 requestor’s UDP payload size and return large UDP responses. It is also well known that some authoritative servers do not support TCP transport. Such non-compliant behavior cannot become an implementation or configuration constraint for the rest of the DNS. If failure is the result, then that failure must be localized to the non-compliant servers.

This document has no IANA actions.

The author would like to specifically thank Paul Wouters, Mukund Sivaraman, Tony Finch, Hugo Salgado, Peter van Dijk, Brian Dickson, Puneet Sood and Jim Reid for extensive review and comments.

Internet Protocol.

Path MTU discovery. This memo describes a technique for dynamically discovering the maximum transmission unit (MTU) of an arbitrary internet path. It specifies a small change to the way routers generate one type of ICMP message. For a path that passes through a router that has not been so changed, this technique might not discover the correct Path MTU, but it will always choose a Path MTU as accurate as, and in many cases more accurate than, the Path MTU that would be chosen by current practice. [STANDARDS-TRACK]

Key words for use in RFCs to Indicate Requirement Levels. In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

Advanced Sockets Application Program Interface (API) for IPv6. This document provides sockets Application Program Interface (API) to support "advanced" IPv6 applications, as a supplement to a separate specification, RFC 3493. The expected applications include Ping, Traceroute, routing daemons and the like, which typically use raw sockets to access IPv6 or ICMPv6 header fields. This document proposes some portable interfaces for applications that use raw sockets under IPv6. There are other features of IPv6 that some applications will need to access: interface identification (specifying the outgoing interface and determining the incoming interface), IPv6 extension headers, and path Maximum Transmission Unit (MTU) information. This document provides API access to these features too. Additionally, some extended interfaces to libraries for the "r" commands are defined. The extension will provide better backward compatibility to existing implementations that are not IPv6-capable. This memo provides information for the Internet community.

Protocol Modifications for the DNS Security Extensions. This document is part of a family of documents that describe the DNS Security Extensions (DNSSEC). The DNS Security Extensions are a collection of new resource records and protocol modifications that add data origin authentication and data integrity to the DNS. This document describes the DNSSEC protocol modifications. This document defines the concept of a signed zone, along with the requirements for serving and resolving by using DNSSEC. These techniques allow a security-aware resolver to authenticate both DNS resource records and authoritative DNS error indications. This document obsoletes RFC 2535 and incorporates changes from all updates to RFC 2535. [STANDARDS-TRACK]

Path MTU Discovery for IP version 6. This document describes Path MTU Discovery (PMTUD) for IP version 6. It is largely derived from RFC 1191, which describes Path MTU Discovery for IP version 4. It obsoletes RFC 1981.

Extension Mechanisms for DNS (EDNS(0)). The Domain Name System's wire protocol includes a number of fixed fields whose range has been or soon will be exhausted and does not allow requestors to advertise their capabilities to responders. This document describes backward-compatible mechanisms for allowing the protocol to grow. This document updates the Extension Mechanisms for DNS (EDNS(0)) specification (and obsoletes RFC 2671) based on feedback from deployment experience in several implementations. It also obsoletes RFC 2673 ("Binary Labels in the Domain Name System") and adds considerations on the use of extended labels in the DNS.

Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.

DNS Terminology. The Domain Name System (DNS) is defined in literally dozens of different RFCs. The terminology used by implementers and developers of DNS protocols, and by operators of DNS systems, has sometimes changed in the decades since the DNS was first defined. This document gives current definitions for many of the terms used in the DNS in a single document. This document obsoletes RFC 7719 and updates RFC 2308.

Packetization Layer Path MTU Discovery for Datagram Transports. This document specifies Datagram Packetization Layer Path MTU Discovery (DPLPMTUD). This is a robust method for Path MTU Discovery (PMTUD) for datagram Packetization Layers (PLs). It allows a PL, or a datagram application that uses a PL, to discover whether a network path can support the current size of datagram. This can be used to detect and reduce the message size when a sender encounters a packet black hole. It can also probe a network path to discover whether the maximum packet size can be increased. This provides functionality for datagram transports that is equivalent to the PLPMTUD specification for TCP, specified in RFC 4821, which it updates. It also updates the UDP Usage Guidelines to refer to this method for use with UDP datagrams and updates SCTP. The document provides implementation notes for incorporating Datagram PMTUD into IETF datagram transports or applications that use datagram transports. This specification updates RFC 4960, RFC 4821, RFC 6951, RFC 8085, and RFC 8261.

IP Fragmentation Considered Fragile. This document describes IP fragmentation and explains how it introduces fragility to Internet communication. This document also proposes alternatives to IP fragmentation and provides recommendations for developers and network operators.

Requirements for Internet Hosts - Communication Layers. This RFC is an official specification for the Internet community. It incorporates by reference, amends, corrects, and supplements the primary protocol standards documents relating to hosts. [STANDARDS-TRACK]

DNS Security (DNSSEC) Hashed Authenticated Denial of Existence. The Domain Name System Security (DNSSEC) Extensions introduced the NSEC resource record (RR) for authenticated denial of existence. This document introduces an alternative resource record, NSEC3, which similarly provides authenticated denial of existence. However, it also provides measures against zone enumeration and permits gradual expansion of delegation-centric zones. [STANDARDS-TRACK]

Security Implications of Predictable Fragment Identification Values. IPv6 specifies the Fragment Header, which is employed for the fragmentation and reassembly mechanisms. The Fragment Header contains an "Identification" field that, together with the IPv6 Source Address and the IPv6 Destination Address of a packet, identifies fragments that correspond to the same original datagram, such that they can be reassembled together by the receiving host. The only requirement for setting the Identification field is that the corresponding value must be different than that employed for any other fragmented datagram sent recently with the same Source Address and Destination Address. Some implementations use a simple global counter for setting the Identification field, thus leading to predictable Identification values. This document analyzes the security implications of predictable Identification values, and provides implementation guidance for setting the Identification field of the Fragment Header, such that the aforementioned security implications are mitigated.

Internet Protocol, Version 6 (IPv6) Specification. This document specifies version 6 of the Internet Protocol (IPv6). It obsoletes RFC 2460.

UDP Usage Guidelines. The User Datagram Protocol (UDP) provides a minimal message-passing transport that has no inherent congestion control mechanisms. This document provides guidelines on the use of UDP for the designers of applications, tunnels, and other protocols that use UDP. Congestion control guidelines are a primary focus, but the document also provides guidance on other topics, including message sizes, reliability, checksums, middlebox traversal, the use of Explicit Congestion Notification (ECN), Differentiated Services Code Points (DSCPs), and ports. Because congestion control is critical to the stable operation of the Internet, applications and other protocols that choose to use UDP as an Internet transport must employ mechanisms to prevent congestion collapse and to establish some degree of fairness with concurrent traffic. They may also need to implement additional mechanisms, depending on how they use UDP. Some guidance is also applicable to the design of other protocols (e.g., protocols layered directly on IP or via IP-based tunnels), especially when these protocols do not themselves provide congestion control. This document obsoletes RFC 5405 and adds guidelines for multicast UDP usage.

Domain Validation++ For MitM-Resilient PKI. Fraunhofer Institute for Secure Information Technology SIT, Darmstadt, Germany.

Fragmentation Considered Poisonous.

IP fragmentation attack on DNS. cz.nic.

Measures against cache poisoning attacks using IP fragmentation in DNS. JPRS.

DNS flag day 2020.

Measuring DNS Flag Day 2020. APNIC Labs.

“Fragmentation Considered Poisonous” proposed effective off-path DNS cache poisoning attack vectors using IP fragmentation. “IP fragmentation attack on DNS” and “Domain Validation++ For MitM-Resilient PKI” proposed that off-path attackers can intervene in path MTU discovery to cause authoritative servers to send intentionally fragmented responses. “Security Implications of Predictable Fragment Identification Values” stated the security implications of predictable fragment identification values.

DNSSEC is a countermeasure against cache poisoning attacks that use IP fragmentation. However, DNS delegation responses are not signed with DNSSEC, and DNSSEC does not have a mechanism to get the correct response if an incorrect delegation is injected. This is a denial-of-service vulnerability that can yield failed name resolutions. If cache poisoning attacks can be avoided, DNSSEC validation failures will be avoided.

Section 3.2 (Message Size Guidelines) of UDP Usage Guidelines states that an application SHOULD NOT send UDP datagrams that result in IP packets that exceed the Maximum Transmission Unit (MTU) along the path to the destination.

A DNS message receiver cannot trust fragmented UDP datagrams, primarily because of the small amount of entropy provided by UDP port numbers and DNS message identifiers. Each is only 16 bits in size, and both are likely to be in the first fragment of a packet if fragmentation occurs. By comparison, the TCP protocol stack controls packet size and avoids IP fragmentation under ICMP NEEDFRAG attacks. In TCP, fragmentation should be avoided for performance reasons, whereas for UDP, fragmentation should be avoided for resiliency and authenticity reasons.

There have been many discussions about the default path MTU size and the maximum DNS/UDP payload size.

The minimum MTU for an IPv6 interface is 1280 octets (see Section 5 of the IPv6 specification), so it can be used as the default path MTU value for IPv6. The corresponding minimum MTU for an IPv4 interface is 68 (60 + 8).

Most of the Internet, and especially the inner core, has an MTU of at least 1500 octets. The maximum DNS/UDP payload size for IPv6 on MTU 1500 Ethernet is 1452 (1500 minus 40 (IPv6 header size) minus 8 (UDP header size)). To allow for possible IP options and distant tunnel overhead, the authors’ recommendation for the default maximum DNS/UDP payload size is 1400.

RFC 4035 defines that “A security-aware name server MUST support the EDNS0 message size extension, MUST support a message size of at least 1220 octets”. The smallest value of the maximum DNS/UDP payload size is therefore 1220.

In order to avoid IP fragmentation, “DNS flag day 2020” proposed that UDP requestors set the requestor’s payload size to 1232, and that UDP responders compose UDP responses that fit in 1232 octets. The size 1232 is based on an MTU of 1280, which is required by the IPv6 specification, minus 48 octets for the IPv6 and UDP headers.

“Measuring DNS Flag Day 2020” analyzed the result of DNS flag day 2020. It reported that the measurements suggest that in the interior of the Internet, between recursive resolvers and authoritative servers, the prevailing MTU is 1,500 and there is no measurable signal of use of smaller MTUs in this part of the Internet. It proposed, based on those measurements, setting the EDNS0 buffer size to 1472 octets for IPv4 and 1452 octets for IPv6.

Socket options:

“IP_MTU (since Linux 2.2) Retrieve the current known path MTU of the current socket. Valid only when the socket has been connected. Returns an integer. Only valid as a getsockopt(2).” (Quoted from Debian GNU Linux manual: ip(7))

“IPV6_MTU getsockopt(): Retrieve the current known path MTU of the current socket. Only valid when the socket has been connected. Returns an integer.” (Quoted from Debian GNU Linux manual: ipv6(7))

Section 3.4 of “Requirements for Internet Hosts - Communication Layers” specifies FIND_MAXSIZES() as one of the “INTERNET/TRANSPORT LAYER INTERFACEs”.
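A minimal sketch of reading these options from Python follows. It assumes a Linux host: the numeric fallbacks 14 (IP_MTU) and 24 (IPV6_MTU) are the Linux values and are used only if the socket module does not expose the names. The function name known_path_mtu is hypothetical, and the host argument is assumed to be an IP address literal. The value returned is the kernel’s current estimate, which may simply be the outgoing interface MTU until the kernel has learned otherwise.

```python
import socket

# Assumption: Linux numeric values, used only if the socket module lacks the names.
IP_MTU = getattr(socket, "IP_MTU", 14)
IPV6_MTU = getattr(socket, "IPV6_MTU", 24)


def known_path_mtu(host, port=53):
    """Return the kernel's current path MTU estimate toward `host`, or None."""
    if ":" in host:  # assumption: `host` is an IP literal, so ":" means IPv6
        family, level, option = socket.AF_INET6, socket.IPPROTO_IPV6, IPV6_MTU
    else:
        family, level, option = socket.AF_INET, socket.IPPROTO_IP, IP_MTU
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            # connect() on a UDP socket only fixes the destination;
            # it does not send a packet.
            sock.connect((host, port))
            return sock.getsockopt(level, option)
    except OSError:
        # Option unsupported on this platform, or no route to the host.
        return None


if __name__ == "__main__":
    print(known_path_mtu("127.0.0.1"))
```

Subtracting the IP and UDP header sizes from the returned value gives a candidate maximum DNS/UDP payload size for that destination.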

The Linux tool “tracepath” can be used to measure the path MTU to a destination. Alternatively, the “ping” / “ping6” commands can be used with the “-D” option (set the Don’t Fragment bit / disable IPv6 fragmentation).

Some implementations have a ‘minimal responses’ configuration that causes a DNS server to make response packets smaller, containing only mandatory and required data.

Under the minimal-responses configuration, DNS servers compose response messages using only the RRSets corresponding to queries. In case of delegation, DNS servers compose response packets with the delegation NS RRSet in the authority section and in-domain (in-zone and below-zone) glue in the additional data section. In case of a non-existent domain name or non-existent type, the start of authority (SOA RR) will be placed in the authority section.

In addition, if the zone is DNSSEC signed and a query has the DNSSEC OK bit, signatures are added in the answer section, or the corresponding DS RRSet and signatures are added in the authority section. Details are defined in “Protocol Modifications for the DNS Security Extensions” and “DNS Security (DNSSEC) Hashed Authenticated Denial of Existence”.