Individual submission L-J. Liman
Internet-Draft Autonomica
Intended status: Informational October 26, 2009
Expires: April 27, 2010

Top Level Domain Name Specification
draft-liman-tld-names-01

Abstract

The precise syntax allowed in top-level domain name labels has been the subject of some debate. RFC 1123, for example, states that top-level domain names are "alphabetic". This document updates the definition of allowable top-level domain names in order to support internationalized domain names (IDNs), as encoded by the IDNA protocols. This document focuses narrowly on the issue of IDNs and does not make any other changes or clarifications to existing domain name syntax rules.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 27, 2010.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The precise syntax allowed in top-level domain (TLD) name labels has been the subject of some debate. RFC 1123 [RFC1123], for example, states that TLD names must be "alphabetic", which is interpreted as excluding the hyphen (or dash) character. This document updates the definition of allowable top-level domain names to support internationalized domain names that consist of Unicode letters, as encoded by the IDNA protocols [RFCXXX]. In particular, this document clarifies that ASCII TLDs beginning with the IDN A-label prefix (currently "xn--"), as encoded by IDNA, are permissible as DNS TLD names as long as they are made from Unicode letters. This document focuses narrowly on the issue of allowable ASCII labels encoded by the IDNA protocols and is not intended to make any other changes or clarifications to existing domain name syntax rules.

1.1. Terminology

The terminology used in this document is as defined in RFC 0952 [RFC0952] and RFC 1035 [RFC1035].

1.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Background

RFC 0952 [RFC0952] states (among other things) that a host name is:

RFC 1123 [RFC1123] reaffirms this definition, making two additional changes to the syntax:

and

The restrictions on host names and specifically TLD names have always been, at least in part, driven by human factors considerations. Underscores in host names are avoided because they are indistinguishable from hyphens when seen on a page or written in longhand, and to some extent because of early internationalization issues. The original "no leading digits" rule was driven by wanting to make sure that even imprecise programming or human thought errors didn't confuse addresses with names.
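
The reasoning about names and addresses can be illustrated with a minimal Python sketch. It assumes the standard `ipaddress` module as a stand-in for an address parser; the helper names are hypothetical and are not part of any specification. It shows that a name whose final label is purely alphabetic cannot be parsed as a dotted-decimal address, even when earlier labels start with digits.

```python
import ipaddress


def parses_as_ipv4(text: str) -> bool:
    """Return True if text is accepted as a dotted-decimal IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def has_alphabetic_tld(name: str) -> bool:
    """Return True if the last label consists only of ASCII letters."""
    last = name.rstrip(".").rsplit(".", 1)[-1]
    return last.isascii() and last.isalpha()


# Print, for each candidate, whether it parses as an address and
# whether its last label is alphabetic.
for candidate in ("192.0.2.1", "192.0.2.example", "3com.example"):
    print(candidate, parses_as_ipv4(candidate), has_alphabetic_tld(candidate))
```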

The wish to express TLD names in scripts other than Latin makes it necessary to relax the rules for TLD names. However, the old motivations for keeping TLD names alphabetic still hold, and furthermore, certain characteristics of some IDN names with digits in them make them unsuitable as DNS labels. The problem is referred to as "jumping digits", and is described in draft-ietf-idnabis-bidi.

To keep changes to existing specifications to a minimum while still allowing IDN TLD names, this document changes the existing specification to permit IDN TLD names in the "A-label form" as specified by the IDNA-2008 specifications. An A-label is an ASCII-compatible encoding, produced by reversible Punycode conversion from a valid IDN label and carrying the IDN A-label prefix (currently "xn--"). This document additionally requires that the native-character ("Unicode") form consist of letters only.

Hence, the ABNF expression that matches a valid TLD label is as follows:


          tldlabel = traditional-tld-label / idn-label
          traditional-tld-label = 1*63(ALPHA)
          idn-label = Restricted-A-label
          ALPHA    = %x41-5A / %x61-7A   ; A-Z / a-z

Restricted-A-label is an A-label as defined in draft-ietf-idna-defs converted from (and convertible to) a U-label that is consistent with the definition in draft-ietf-idna-defs and that is further restricted to contain only Unicode characters of General Category "L". Note that "L" contains several sub-categories. The list is:


          ; Letter
          L = Ll / Lm / Lo / Lt / Lu
          Ll = Lowercase-Letter
          Lm = Modifier-Letter
          Lo = Other-Letter
          Lt = Titlecase-Letter
          Lu = Uppercase-Letter

Note, however, that IDNA prohibits (categorizes as DISALLOWED) all characters in the last two categories (Lt and Lu) and several of the characters that fall into the other categories.
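
The label rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete implementation. The function names are hypothetical, and the standard-library "punycode" codec is used for the conversion. The sketch assumes that the remaining IDNA validity checks (DISALLOWED code points, contextual rules, bidi rules, normalization) are performed elsewhere. The requirement that the decoded label contain at least one non-ASCII character is taken from the IDNA definitions rather than from this document.

```python
import re
import unicodedata

ACE_PREFIX = "xn--"  # current IDN A-label prefix
TRADITIONAL_TLD = re.compile(r"[A-Za-z]{1,63}")  # 1*63(ALPHA)


def is_traditional_tld_label(label: str) -> bool:
    """Match traditional-tld-label: 1 to 63 ASCII letters."""
    return TRADITIONAL_TLD.fullmatch(label) is not None


def is_restricted_a_label(label: str) -> bool:
    """Approximate Restricted-A-label: a reversible Punycode label whose
    decoded form contains only General Category "L" characters.
    Assumption: full IDNA validation is NOT performed here."""
    lowered = label.lower()
    if len(lowered) > 63 or not lowered.startswith(ACE_PREFIX):
        return False
    encoded = lowered[len(ACE_PREFIX):]
    try:
        u_label = encoded.encode("ascii").decode("punycode")
        round_trip = u_label.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    if round_trip != encoded:
        return False  # conversion is not reversible
    if not u_label or u_label.isascii():
        return False  # assumption: a U-label has a non-ASCII character
    return all(unicodedata.category(ch).startswith("L") for ch in u_label)


def is_tld_label(label: str) -> bool:
    """Match tldlabel = traditional-tld-label / idn-label."""
    return is_traditional_tld_label(label) or is_restricted_a_label(label)


# "xn--p1ai" decodes to two Cyrillic lowercase letters.
for candidate in ("com", "c0m", "xn--p1ai"):
    print(candidate, is_tld_label(candidate))
```

Because the check uses the whole of General Category "L", it still admits Lt and Lu characters that IDNA itself prohibits; a production validator would need to layer the IDNA rules on top.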

This new specification reflects current practice in registration of TLD names by the IANA, and allows for IDNs.

3. Other Limitations on Top Level Domain Labels

Any change to the current restrictions on DNS labels, especially at the top level, raises many issues that must be considered. DNS software is widely deployed, and some of it contains embedded assumptions that may not hold if top-level names depart from the older rules. For example, when TLDs longer than three characters became available (e.g., .info and .museum), some deployed systems did not process such DNS names properly. This document does not claim that no problems will result when IDN TLDs are created, but it does recognize that relaxing the syntax of allowed TLDs is necessary for deployment of IDNs to proceed.
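
The kind of embedded assumption described above can be illustrated with a short Python sketch. The pattern below is hypothetical and is not taken from any particular piece of software. It encodes the outdated belief that a TLD is two or three ASCII letters, and the loop shows which labels such a check would accept; "xn--p1ai" is an A-label used purely for illustration.

```python
import re

# Hypothetical legacy assumption: a TLD is two or three ASCII letters.
LEGACY_TLD = re.compile(r"[A-Za-z]{2,3}")


def legacy_accepts(tld: str) -> bool:
    """Return True if the hypothetical legacy check accepts the label."""
    return LEGACY_TLD.fullmatch(tld) is not None


for tld in ("se", "com", "info", "museum", "xn--p1ai"):
    verdict = "accepted" if legacy_accepts(tld) else "rejected"
    print(f"{tld}: {verdict}")
```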

Note also that the above specification is not the only limiting factor on TLD labels. Entities other than the IETF may have influence over TLD names and may decide to restrict the names further. The above technical specification is just one limiting factor.

4. IANA Considerations

This memo changes the specifications for TLD names registered by the IANA, and the IANA is requested to change its registration process to use the above specification.

5. Security Considerations

This document is believed to have limited security consequences.

It may introduce stability issues where names registered under this new specification interoperate badly with old software written to enforce a strict interpretation of the old specification. This might also open up attack vectors (e.g., from names being truncated). However, such software is believed to be scarce on the Internet. Since TLD names that do not adhere to a strict interpretation of the old specification are already in use (including test IDNs) without apparent problems, this change of the specification is not expected to create major stability or security problems on the Internet.

6. References

6.1. Normative References

[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application and Support", STD 3, RFC 1123, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

[RFC0952] Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet host table specification", RFC 952, October 1985.

Appendix A. To Do

  1. Clean up references. Check situation with references to Internet Drafts. Are they/will they be published as RFCs before this draft?
  2. Verify quotations.
  3. Get rid of the term "jumping digits" and replace with appropriate wording. Also mention additional reasons not to have digits that relate to Input Method Editors and localization.

Appendix B. Change History

Appendix B.1. draft-liman-tld-names-01

Substantial comments and improvements supplied by Thomas Narten and John Klensin. Decided to go for a minimal change approach. Also noted that U-labels have to be letters due to jumping digit problem. Rewritten major parts.

Appendix B.2. draft-liman-tld-names-00

First cut. Prompted by Olafur Gudmundsson and Tina Dam.

Author's Address

Lars-Johan Liman
Autonomica AB
Franzengatan 5
SE-112 51 Stockholm, Sweden

EMail: liman@autonomica.se
URI: http://www.autonomica.se/