Internet Engineering Task Force L. Song Internet-Draft Beijing Internet Institute Intended status: Standards Track May 8, 2018 Expires: November 9, 2018 ATR: Additional Truncation Response for Large DNS Response draft-song-atr-large-resp-01 Abstract As the increasing use of DNSSEC and IPv6, there are more public evidence and concerns on IPv6 fragmentation issues due to larger DNS payloads over IPv6. This memo introduces an simple improvement on DNS server by replying an additional truncated response just after the normal fragmented response. It can be used to relieve users suffering on DNS latency and failures due to large DNS response. It also can be utilized as a measuring and troubleshooting tool to locate the issue and conquer. REMOVE BEFORE PUBLICATION: The source of the document with test script is currently placed at GitHub [ATR-Github]. Comments and pull request are welcome. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on November 9, 2018. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents Song Expires November 9, 2018 [Page 1] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. The ATR mechanism . . . . . . . . . . . . . . . . . . . . . . 4 3. Indicating a ATR response . . . . . . . . . . . . . . . . . . 5 4. Operational considerations . . . . . . . . . . . . . . . . . 6 4.1. ATR timer . . . . . . . . . . . . . . . . . . . . . . . . 6 4.2. ATR payload size . . . . . . . . . . . . . . . . . . . . 7 4.3. Less aggresiveness of ATR . . . . . . . . . . . . . . . . 8 5. Security Considerations . . . . . . . . . . . . . . . . . . . 9 6. IANA considerations . . . . . . . . . . . . . . . . . . . . . 9 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 9 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 Appendix A. How well does ATR actually work? . . . . . . . . . . 11 Appendix B. Considerations on Resolver awareness of ATR . . . . 12 Appendix C. Revision history of this document . . . . . . . . . 13 C.1. draft-song-atr-large-resp-01 . . . . . . . . . . . . . . 13 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 14 1. Introduction Large DNS response is identified as a issue for a long time. There is an inherent mechanism defined in [RFC1035] to handle large DNS response (larger than 512 octets) by indicating (set TrunCation bit) the resolver to fall back to query via TCP. Due to the fear of cost of TCP, EDNS(0) [RFC6891] was proposed which encourages server to response larger response instead of falling back to TCP. However, as the increasing use of DNSSEC and IPv6, there are more public evidence and concerns on user's suffering due to packets dropping caused by IPv6 fragmentation in DNS due to large DNS response. It is observed that some IPv6 network devices like firewalls intentionally choose to drop the IPv6 packets with fragmentation Headers[I-D.taylor-v6ops-fragdrop]. [RFC7872] reported more than 30% drop rates for sending fragmented packets. Regarding IPv6 fragmentation issue due to larger DNS payloads in response, one measurement [IPv6-frag-DNS] reported 35% of endpoints using IPv6-capable DNS resolver can not receive a fragmented IPv6 response over UDP. Moreover, most of the underlying issues with fragments are unrevealed due to good redundancy and resilience of DNS. It is hard Song Expires November 9, 2018 [Page 2] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 for DNS client and server operators to trace and locate the issue when fragments are blocked or dropped. The noticeable DNS failures and latency experienced by end users are just the tip of the iceberg. Depending on retry model, the resolver's failing to receive fragmented response may experience long latency or failure due to timeout and reties. One typical case is that the resolver finally got the answer after several retires and it falls back to TCP after deceasing the payload size in EDNS0. To avoid that issue, some authoritative servers may adopt a policy ignoring the UDP payload size in EDNS0 extension and always truncating the response when the response size is large than a expected one. However one study [Not-speak-TCP] shows that about 17% of resolvers in the samples can not ask a query in TCP when they receive truncated response. It seems a dilemma to choose hurting either the users who can not receive fragments or the users without TCP fallback capacity. There is also some voice of "moving all DNS over TCP". But It is generally desired that DNS can keep the efficiency and high performance by using DNS UDP in most of time and fallback as soon as possible to TCP if necessary for some corner case. To relieve the problem, this memo introduces an small improvement on DNS responding process by replying an Additional Truncated Response (ATR) just after a normal large response which is to be fragmented. Generally speaking ATR provides a way to decouple the EDNS0 and TCP fallback in which they can work independently according to the server operator's requirement. One goal of ATR is to relieve the hurt of users, both stub and recursive resolver, from the position of server, both authoritative and recursive server. It does not require any changes on resolver and has a deploy-and-gain feature to encourage operators to implement it to benefit their resolvers. Another goal of ATR is to help troubleshooting for DNS operators where ATR can be deployed as a measurement tool to identify vulnerable servers and users. A flag bit is required in EDNS0 OPT header to distinguish ATR response from a ordinary truncated response. A resolver (or troubleshooter) can tell if there is any fragment not received in a certain transaction with a name server by receiving only ATR response without ordinary UDP response. Another way of using ATR as measurement tool is to reply ATR response to specific group of resolvers and record the TCP connections it received during that period. It can help identify vulnerable users and adopt ATR to them selectively. [REMOVE BEFORE PUBLICATION] Note that in Appendix A of this memo there is a brief introduction of the test done by APNIC on how well does ATR actually work. And comments are also attached by the author Song Expires November 9, 2018 [Page 3] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 of this memo. It may help people to understand what the benefit and tradeoff that ATR brings. 2. The ATR mechanism The ATR mechanism is very simple that it involves a ATR module in the responding process of current DNS implementation . As show in the following diagram the ATR module is right after truncation loop if the packet is not going to be fragmented. A DNS +-------------+ +-------------+ Normal query | | No | | response +------> Truncation +--------> ATR +---------> | loop | | Module | | truncation? | | truncation? | +-------------+ +-------------+ yes| yes| +-----+ | +-----+timer+--> | +-----+ | Truncated Response +---------------> Truncated Response Figure 1: High-Level Testbed Components The ATR responding process goes as follows: o When an authoritative server receives a query and enters the responding process, it first go through the normal truncation loop to see whether the size of response surpasses the EDNS0 payload size. If yes, it ends up with responding a truncated packets. If no, it enters the ATR module. o In ATR module, similar like truncation loop, the size of response is compared with a value called ATR payload size. If the response of a query is larger than ATR payload size, the server firstly sends the normal response and then coin a truncated response with the same ID of the query. o The server can reply the coined truncated response in no time. But considering the possible impact of network reordering, it is suggested a timer to delay the second truncated response, for example 10~50 millisecond which can be configured by local operation. Song Expires November 9, 2018 [Page 4] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 Note that the choice of ATR payload size and timer SHOULD be configured locally. And the operational consideration and guidance is discussed in Section 4.2 and Section 4.1 respectively. There are three typical cases of ATR-unaware resolver behavior when a resolver send query to an ATR server in which the server will generate a large response with fragments: o Case 1: a resolver (or sub-resolver) will receive both the large response and a very small truncated response in sequence. It will happily accepts the first response and drop the second one because the transaction is over. o Case 2: In case a fragment is dropped in the middle, the resolver will end up with only receiving the small truncated response. It will retry using TCP in no time. o Case 3: For those (probably 30%*17% of them) who can not speak TCP and sitting behind a firewall stubbornly dropping fragments. Just say good luck to them! In the case authoritative server truncated all response surpass certain value , for example setting IPv6-edns-size to 1220 octets, ATR will helpful for resolver with TCP capacity, because the resolver still has a fair chance to receive the large response. 3. Indicating a ATR response As introduced in ATR it is necessary to distinguish ATR response in a special way from a ordinary truncated response. It enables resolver operators to log cases where ATR responses is received without a (reassembled) UDP response to a query. Without an indicator that distinguishes ATR response, there would be no way to avoid false alarms from authoritative servers that always and only return truncated responses when the message exceeds some size. It is actually the use case where Google RDNS is considering. Google RDNS would like to use such indications to flag problematic name servers where RDNS should restrict maximum EDNS to a lower value than the default 4096 that currently used. A simple way for that indicator of ATR response is to define a bit in the Z field on the EDNS0 OPT header in the response. This has the virtue of simplicity, and only a minimal risk of breaking existing implementations. This bit is referred to as the "ATR Response" (AT) bit. In the context of the EDNS0 OPT meta-RR, just following the DO bit, the AT bit is the second bit of the third and fourth bytes of the "extended RCODE and flags" portion (section 6.1.3 of RFC6891) of the EDNS0 OPT meta-RR, structured as follows: Song Expires November 9, 2018 [Page 5] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 +0 (MSB) +1 (LSB) +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 0: | EXTENDED-RCODE | VERSION | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 2: |DO|AT| Z | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ Figure 2: Wire format of extended RCODE and flags with AT bit The logic of AT bit is simple that setting the AT bit to one in a response indicates to the resolver that the response is an ATR response. The AT bit cleared(set to zero) indicates the response is a ordinary response. Note that AT bit and TC bit SHOULD be set and appear in the response as a pair. The response will be ignored if only AT bit is set. The indication of ATR defined in this memo is for measurement and logging purpose. But it is possible used by resolver operator in other aspects, because it signals the resolver the large response is fragmented and dropped on the path. The resolver could act more actively if it is able to recognized that bit. More is discussed in Appendix B. 4. Operational considerations In previous sections, only behavior of ATR server and AT bit are specified. There are lots of space for operational issues, such as the parameter of the ATR timer and ATR payload size, and policies on when ATR is triggered to avoid side-effect. 4.1. ATR timer As introduced in Section 2 ATR timer is a way to avoid the impact of network reordering(RO). The value of the timer is critical, because if the delay is too short, the ATR response may be received earlier than the fragmented response (the first piece), the resolver will fall back to TCP bearing the cost which should have been avoided. If the delay is too long, the client may timeout and retry which negates the incremental benefit of ATR. Generally speaking, the delay of the timer should be "long enough, but not too long". To the best knowledge of author, the nature of RO is characterized as follows hopefully helping ATR users understand RO and how to operate ATR appropriately in RO context. Song Expires November 9, 2018 [Page 6] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 o RO is mainly caused by the parallelism in Internet components and links other than network anomaly [Bennett]. It was observed that RO is highly related to the traffic load of Internet components. So RO will long exists as long as the traffic load continue increase and the parallelism is used to enhance network throughput. o The probability of RO varies largely depending on the different tests samples. Some work shown RO probability below 2% [Paxson] [Tinta] and another work was above 90% [Bennett]. But it is agreed that RO is site-dependent and path-dependent. It is observed in that when RO happens, it is mostly exhibited consistently in a small percentages of the paths. It is also observed that higher rates smaller packets were more prone to RO because the sending inter-spacing time was small. o It was reported that the inter-arrival time of RO varies from a few milliseconds to multiple tens of milliseconds [Tinta]. And the larger the packet the larger the inter-arrival time, since larger packets will take longer to be transmitted. Reasonably we can infer that firstly RO should be taken into account because it long exists due to middle Internet components which can not be avoided by end-to-end way. Secondly the mixture of larger and small packets in ATR case will increase the inter-arrival time of RO as well as the its probability. The good news is that the RO is highly site specific and path specific, and persistent which means the ATR operator is able to identify a few sites and paths, setup a tunable timer setting for them, or just put them into a blacklist without replying ATR response. Based on the above analysis it is hard to provide a perfect value of ATR timer for all ATR users due to the diversity of networks. It seems OK to set the timer with a range from ten to hundreds ms, just below the timeout setting of typical resolver. Is suggested that a decision should be made as operator-specific according to the statistic of the RTT of their users. Some measurement shown [Brownlee][Liang] the mean of response time is below 50 ms for the sites with lots of anycast instance like L-root, .com and .net name servers. For that sites, delay less than 50 ms is appropriate. 4.2. ATR payload size Regarding the operational choice for ATR payload size, there are some good input from APNIC study [scoring-dns-root]on how to react to large DNS payload for authoritative server. The difference in ATR is that ATR focuses on the second response after the ordinary response. Song Expires November 9, 2018 [Page 7] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 For IPv4 DNS server, it is suggested the study that do not truncate and fragment IPv4 UDP response with a payload up to 1472 octets which is Ethernet MTU(1500) minus the sum of IPv4 header(20) and UDP header(8). The reason is to avoid gratuitously fragmenting outbound packets and TCP fallback at the source. In the case of ATR, the first ordinary response is emitted without knowing it be to fragmented or not on the path. If a large value is set up to 1472 octets, payload size between 512 octets and the large value size will probably get fragmented by aggressive firewalls which leads losing the benefit of ATR. If ATR payload size set exactly 512 octets, in most of case ATR response and the single unfragmented packets are under a race at the risk of RO. Given IPv4 fragmentation issue is not so serious compared to IPv6, it is suggested in this memo to set ATR payload size 1472 octets which means ATR only fit large DNS response larger than 1500 octets in IPv4. For IPv6 DNS server, similar to IPv4, the APNIC study is suggested that do not truncate IPv6 UDP packets with a payload up to 1,452 octets which is Ethernet MTU(1500) minus the sum of IPv6 header(40) and UDP header(8). 1452 octets is chosen to avoid TCP fallback in the context that most TCP MSS in the root server is not set probably at that time. In the case of ATR considering the second truncated response, a smaller size: 1232 octets, which is IPv6 MTU for most network devices(1280) minus the sum of IPv6 header(40) and UDP header(8), should be chosen as ATR payload size to trigger necessary TCP fallback. As a complementary requirement with ATR, the TCP MSS should be set 1220 octets to avoid Packet Too Big ICMP message as suggested in the APNIC study. In short, it is recommended that in IPv4 ATR payload size SHOULD be 1472 octets, and in IPv6 the value SHOULD be 1232 octets. 4.3. Less aggresiveness of ATR There is a concern ATR sends TC=1 response too aggressively especially in the beginning of adoption. One of the idea to mitigate this aggressiveness, ATR may respond TC=1 responses at a low possibility, such as 10%. Another way to mitigating is to reply ATR response selectively. It is observed that RO and IPv6 fragmentation issues are path specific and persistent due to the Internet components and middle box. So it is reasonable to keep a ATR "whitelist" by counting the retries and Song Expires November 9, 2018 [Page 8] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 recording the IP destination address of that large response causing many retires. ATR only acts to those queries from the IP address in the white list. 5. Security Considerations There may be concerns on DDoS attack problem due to the fact that the ATR introduces multiple responses from authoritative server. DNS cookies [RFC7873] and RRL on authoritative may be possible solutions 6. IANA considerations EDNS(0) [RFC6891] defines 16 bits as extended flags in the OPT record, these bits are encoded into the TTL field of the OPT record. IETF Standards Action is required for assignments of new EDNS(0) flags. This document reserves one of these bits as the AT bit. It is requested that the second bit (left most) be allocated. Thus the USE of the OPT record TTL field would look like: +0 (MSB) +1 (LSB) +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 0: | EXTENDED-RCODE | VERSION | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 2: |DO|AT| Z | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ Figure 3: the USE of the OPT record TTL field for AT bit 7. Acknowledgments Many thanks to reviewers and their comments. Geoff Huston and Joao Damas did a testing on the question "How well does ATR actually work?". Alexander Dupuy proposed the idea to distinguish ATR responses from normal ones. Akira Kato contributed ideas on operational consideration. 8. References [ATR-Github] "XML source file and test script of DNS ATR", September 2017, . Song Expires November 9, 2018 [Page 9] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 [Bennett] "Packet Reordering is Not Pathological Network Behavior", December 1999, . [Brownlee] "Response time distributions for global name servers", 2002, . [How-ATR-Work] APNIC, "How well does ATR actually work?", April 2018, . [I-D.taylor-v6ops-fragdrop] Jaeggli, J., Colitti, L., Kumari, W., Vyncke, E., Kaeo, M., and T. Taylor, "Why Operators Filter Fragments and What It Implies", draft-taylor-v6ops-fragdrop-02 (work in progress), December 2013. [IPv6-frag-DNS] "Dealing with IPv6 fragmentation in the DNS", August 2017, . [Liang] Tsinghua University, "Measuring Query Latency of Top Level DNS Servers", February 2013, . [Not-speak-TCP] "A Question of DNS Protocols", August 2013, . [Paxson] "End-to-End Internet Packet Dynamics", August 1999, . [RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987, . [RFC6891] Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms for DNS (EDNS(0))", STD 75, RFC 6891, DOI 10.17487/RFC6891, April 2013, . Song Expires November 9, 2018 [Page 10] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 [RFC7872] Gont, F., Linkova, J., Chown, T., and W. Liu, "Observations on the Dropping of Packets with IPv6 Extension Headers in the Real World", RFC 7872, DOI 10.17487/RFC7872, June 2016, . [RFC7873] Eastlake 3rd, D. and M. Andrews, "Domain Name System (DNS) Cookies", RFC 7873, DOI 10.17487/RFC7873, May 2016, . [scoring-dns-root] APNIC, "Scoring the DNS Root Server System", November 2016, . [Tinta] "Characterizing End-to-End Packet Reordering with UDP Traffic", August 2009, . Appendix A. How well does ATR actually work? It is worth of mentioning APNIC report[How-ATR-Work] on "How well does ATR actually work?" done by Geoff Huston and Joao Damas after 00 version of ATR draft. It was reported firstly in IEPG meeting before IETF 101 and then posted in APNIC Blog later. It is said the test was performed over 55 million endpoints, using an online ad distribution network to deliver the test script across the Internet. The result is positive that ATR works! From the end users' perspective, in some 9% of IPv4 cases the use of ATR by the server will improve the speed of resolution of a fragmented UDP response by signaling to the client an immediate switch to TCP to perform a re-query. The IPv6 behavior would improve the resolution times in 15% of cases. It also analyzed the pros and cons of ATR. On one hand, It is said that ATR certainly looks attractive if the objective is to improve the speed of DNS resolution when passing large DNS responses. And ATR is incrementally deployable in favor of decision made by each server operator. On another hand, ATR also has some negative factors. One issue is adding another DNS DDoS attack vector due to the additional packet sent by ATR, (author's note : very small adding actually.) Another issue is risk of RO by the choice of the delay timer which is discussed fully in Section 4.1. As a conclusion, it is said that "ATR does not completely fix the large response issue. If a resolver cannot receive fragmented UDP responses and cannot use TCP to perform DNS queries, then ATR is not Song Expires November 9, 2018 [Page 11] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 going to help. But where there are issues with IP fragment filtering, ATR can make the inevitable shift of the query to TCP a lot faster than it is today. But it does so at a cost of additional packets and additional DNS functionality". "If a faster DNS service is your highest priority, then ATR is worth considering", said at the end of this report This test and report definitely made ATR more shiny attracting attention in the community. But it is found that there are still something unknown on "How well does ATR actually work". Firstly the test only counts the "success" case ("failure" otherwise) in which the "success" for large UDP case can be achieved by normal retries. The latency of that retries may reduced by ATR is not taken into account. Secondly more analysis could be done in future to compare ATR with TCP. From the failure rate of users, we see ATR and TCP have similar performance (same for IPv4, and only 2% difference in IPv6). But there are resolvers who are able to receive the fragments more sooner in ATR case , but they fall back to TCP in TCP case. If not misunderstood, the two unknown parts underestimates the benefit of ATR in terms of speed of response. The third unknown part is about the ATR timer and RO impact. In APNIC test, 10 ms was adopted as the delay of ATR timer according to 00 version of this draft. Different delay of ATR timer may not change the key result on gains of ATR (9% for IPv4 and 15% for IPv6). But the cost of RO is not measured. In the majority cases ATR is not needed, say 87% in IPv4, and 79% in IPv6. So it overestimate ATR in this regards if RO cost is not taken into account. Appendix B. Considerations on Resolver awareness of ATR ATR proposed in this memo is a server-side function which requires no change in resolver, so it is not required that resolver should recognized or use AT bit. But it may helpful for some special cases where a resolver is able to recognized or use AT bit. One case is that when receiving a ATR response a ATR-aware resolver can adopt a "happyeyeballs" strategy by opening a separate transaction sending the query via TCP instead of falling back to TCP and closing the original UDP transaction. Listen to port 53 on both TCP and UDP port 53 will enhance the availability and reduce the latency. It will add more tolerance to network reordering issue as well. However, it should be taken into account about the balance of resolver's resource. Less priority should be given to that function when the resolver is "busy". Song Expires November 9, 2018 [Page 12] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 Another case is that a ATR-aware resolver is able to indicate its support for ATR to prudent servers who do not reply ATR response to every query and resolver. If that requirement is valid, it is possible for resolver to re-use the AT bit defined in this memo as a indication asking ATR response from the server. However the two cases are currently outside of the scope of server- ATR specification. It needs further discussion. Appendix C. Revision history of this document C.1. draft-song-atr-large-resp-01 After receiving reviews and comments, changes of 01 version are shown as belows: o Rewrite introduction and add another goal of ATR as a measuring tool; o Add section 3 indicating a ATR response. An bit in the EDNS0 OPT header is defined as a indicator of ATR response. The flag bit is called "ATR Response" (AT) bit; o Add Section 4 Operation considerations, which discuss ATR timer , ATR payload size, and less aggressiveness of ATR; o Add IANA consideration to register the AT bit; o Add section 7 Acknowledgments; o Append a list of references regarding Network reordering, and APNIC's study on IPv6 and DNS; o Add Appendix A, An introduce of APNIC testing work and author's comments; o Appendix B. Considerations on Resolver awareness of ATR; o Change the category="std" . It is said in RFC6891 IETF Standards Action is required for assignments of new EDNS(0) flags. So the draft should be categorized as standard track if registering AT bit is desired in this document. Change history is also available in the public GitHub repository where this document is maintained: . Song Expires November 9, 2018 [Page 13] Internet-DrATR: Additional Truncation Response for Large DNS May 2018 Author's Address Linjian Song Beijing Internet Institute Email: songlinjian@gmail.com URI: http://www.biigroup.com/ Song Expires November 9, 2018 [Page 14]