Domain Names, A Case for Clarifying

Which came first, the concept of Domain Names or the protocol called DNS? This question is at the heart of whether or how Domain Names are put to use in ways avoiding the DNS protocol. The discussion leading to "The '.onion' Special-Use Domain Name", a document designating "onion" as a top-level domain in the Special Use Domain Names registry (see "Special Use Domain Names"), opened the question of how to treat Domain Names that were designed to be used external to the DNS. Over time, the history of Domain Names and the DNS had become intertwined to the point that what is essentially a case of permission-less innovation led to a contentious discussion on the IETF's DNS Operations working group mail list. A portion of the discussion centered around a seeming conflict among processes to register Domain Names, such as the process launched from "Memorandum of Understanding Concerning the Technical Work of the Internet Assigned Numbers Authority" for registering a name in the global, public DNS and the process for registering a name in the Special Use Domain Names registry.

To help establish a way forward, a look backward is needed. A document search, sticking to RFC documents, reveals evidence of discussions on domains prior to the DNS, with the DNS protocol's base documents indicating that the DNS is based on some simplifying assumptions, implying there is a larger concept in play. To help bolster the idea that Domain Names came first, this document also looks at how other protocols have treated identifying names, how Domain Names are put to use, and how what counts as a name is further restricted for each protocol's needs. From this it has become apparent that the concept of Domain Names has drifted over time, which leads to some uncertainty when it comes to looking forward.

During reviews of this document, documented studies of other difficulties have surfaced. "IAB Thoughts on Encodings for Internationalized Domain Names" documents issues related to converting human-readable forms of Domain Names into forms useful to automated applications when there is no clear architecture or precise definition of how to handle Domain Names. "Issues in Identifier Comparison for Security Purposes" documents issues related to the same conversion as related to evaluating security policies. The presence of these studies suggests a need to examine the architecture of naming and identifiers.

The most glaring omission in the document survey is a definitive foundation for Domain Names. There are abstract descriptions of the concept that come close to being a definition. The descriptions, though, are too loose to be tested objectively, which frustrates discussions when it comes to innovations in the use of Domain Names. This document settles for being a literature search covering the RFC series and makes a case for clarifications. There are obvious continuations to this work, such as examining the earlier Internet Engineering Notes and other published works, and interviewing participants from the early days. Those will be conducted as follow-on work to this document.

To establish a solid foundation accommodating an installed base and permission-less innovation, a clear definition of Domain Names is desirable. This document, however, does not attempt to achieve a definition. This document's goal has settled into compiling a narrative on the history, within perhaps artificial bounds (the RFC series), and declaring that there is a need to clarify Domain Names. This document includes criteria for performing a clarification, recognizing from experience in preparing "The Role of Wildcards in the Domain Name System" and "DNS Zone Transfer Protocol (AXFR)" that clarifications may have adverse impacts on deployed software, so a clarification activity is not to be undertaken without careful consideration.

There is one deviation from the strict rule of relying on the RFC series: the section on determining the origin of the term "resolve" in the context of Domain Names and the DNS. This work is interesting and adds to the larger picture of what needs to be done. The experience of that side route illustrates the need to expand the literature search beyond the RFC series and to include other publications and recollections.

Two or three decades into the history of Domain Names, a popular notion has taken hold that Domain Names were defined and specified in the definition of the Domain Name System (DNS). There are two documents that form the basic definition of the DNS, "Domain names - concepts and facilities" and "Domain names - implementation and specification", referred to as RFC 1034 and RFC 1035, respectively. (Note that there is another pair of Request for Comments documents with the same titles that precede RFC 1034 and RFC 1035; those were declared obsolete in favor of the newer documents.) Together RFC 1034 and RFC 1035 form STD 13, a full standard cataloged by the RFC Editor. The definitions of DNS domain names within RFC 1034 and RFC 1035 have become the apparently-authoritative source for discussions on what is a Domain Name.

Throughout this document the term "Domain Names" is capitalized to emphasize the concept of the names, and DNS is used to describe the protocol and algorithms described in STD 13, including any applicable updates, related standards track documents and experimental track documents. The term "domain" is a generic term; there are many naming systems in existence. The use of the term Domain Names in this document refers to the roughly-defined naming structure that is prevalent amongst many protocols defined in IETF RFC documents and their applications.

The truth is that STD 13 does not define Domain Names; the documents define only how Domain Names are used and processed in the DNS. However, the way in which the RFC documents read seems to lend itself to the confusion. RFC 1034, section 2 begins with this text:

"This RFC introduces domain style names, their use for Internet mail and host address support, and the protocols and servers used to implement domain name facilities."

This seems to indicate that RFC 1034 is the origin of Domain Names. Immediately following is section 2.1, entitled "The history of domain names", which includes the following text. (The text differs from the original presentation only in wrapping of text to fit current formatting rules.)

"The result was several ideas about name spaces and their management [IEN-116, RFC-799, RFC-819, RFC-830]. The proposals varied, but a common thread was the idea of a hierarchical name space, with the hierarchy roughly corresponding to organizational structure, and names using "." as the character to mark the boundary between hierarchy levels. A design using a distributed database and generalized resources was described in [RFC-882, RFC-883]. Based on experience with several implementations, the system evolved into the scheme described in this memo."

The only reference included in that text not otherwise mentioned in this document is IEN-116. The reference for that is defined as:

"[IEN-116] J. Postel, "Internet Name Server", IEN-116, USC/Information Sciences Institute, August 1979. A name service obsoleted by the Domain Name System, but still in use."

The DNS as it is known today did not invent Domain Names. Work on the Simple Mail Transfer Protocol preceding the DNS mentions domain names, and even it was not the origin of the concept. The DNS is not even the first attempt at an Internet naming system; see "The Domain Naming Convention for Internet User Applications" and "A Distributed System for Internet Name Service". One important phrase to keep in mind is "To simplify implementations," which appears in both RFC 1034 and RFC 1035 as well as their predecessors RFC 882 and RFC 883. This gives credence to the notion that Domain Names exist beyond the DNS.

The first effort taken, in preparation for writing this document, was to scan for the earliest use of the term "domain name" or "name domain". This work is detailed in the following section but, as noted in private email by reviewers of early versions of the document, it gave the impression that Domain Names were somehow a by-product of the effort to develop electronic mail. To challenge the notion that email begat domain names, a search through RFC documents for the use of the term "resolve" as it applies to Domain Names was also done.

Domain Names emerged from the need to build a hierarchy around the growing number of identified hosts exchanging email. "SIMPLE MAIL TRANSFER PROTOCOL" explains, in its section 3.7:

"At some not too distant future time it might be necessary to expand the mailbox format to include a region or name domain identifier. There is quite a bit of discussion on this at present, and is likely that SMTP will be revised in the future to take into account naming domains."

Knowing the origins of a concept helps set the correct boundaries for discussion. The past isn't meant to restrict the future but to provide a context, include forgotten ideas, and help identify the rationale for scope creep. "Internet Name Domains" has (arguably) the first formulation of what a Domain Name is:

"In its most general form, a standard internet mailbox name has the syntax <user>.<host>@<domain> , where <user> is the name of a user known at the host <host> in the name domain <domain>."

Prior to this, "domain" referred principally to an administrative domain, such as the initial organizations involved in networks at the time. "NCP/TCP TRANSITION PLAN" contains this, indicating the passage from the host tables:

"It might be advantageous to do away with the host name table and use a Name Server instead, or to keep a relatively small table as a cache of recently used host names."

"Computer Mail Meeting Notes" contains this:

"The conclusion in this area was that the current "user@host" mailbox identifier should be extended to 'user@host.domain' where 'domain' could be a hierarchy of domains."

"The Domain Naming Convention for Internet User Applications" contains this:

"A decision has recently been reached to replace the simple name field, "<host>", by a composite name field, '<domain>' "

A domain name began to take on its current form:

"Internet Convention: Fred@F.ISI.ARPA"

In addition, "simple name" is defined as what we now call a label, and a "complete (fully qualified) name" is defined as "concatenation of the simple names of the domain structure tree nodes starting with its own name and ending with the top level node name". Noticeably absent is a terminating dot or any mention or representation of a root. "The Domain Naming Convention for Internet User Applications" (RFC 819) also defines ARPA as a top-level name (as opposed to top-level domain name). This is an early mention of the role of top-level names. Additionally, the use of "." as a separating character is mentioned.

This walk through history relies solely on the record left behind inside RFC documents. The precise chain of events is likely slightly different and more nuanced. The point of the exercise is to show that Domain Names are a concept that emerged over time, spawned the DNS with its domain names and a definition of host names derived from the host tables, and was heavily influenced by SMTP as the driving application. The definition of the FTP protocol, originally defined in "FILE TRANSFER PROTOCOL", never mentions hosts, domains or host names. Yet no formal definition of Domain Names has been written and recorded.

Note: Concurrent with the writing of this document, the Domain Name System Operations working group is documenting a definition for "Domain Names". The first edition of "DNS Terminology" has a recitation of the original definition from STD 13; the successor edition (still in preparation) has a new, further reaching definition.

As much as Domain Names were influenced by SMTP, electronic mail was not the origin of the Domain Names concept; this hypothesis came from a personal view of the early days of Internet work. To test this, a search for the use of the term "resolve" or "resolution" was conducted in early (arbitrarily defined as pre-1000) RFC documents.

The term "resolve" appears numerous times, but in many different contexts. "Resolve" has many meanings in a dictionary such as Merriam Webster's, none of which seems to match the use associated with domain names. For example, a committee can resolve to solve a certain question. This use of "resolve" occurred numerous times in early RFC documents unrelated to Domain Names.

In "Proposed Official Standard for the Format of ARPA Network Messages" the term "resolve" was used in the sense of mapping an identifier into an address or something actionable. A section on Semantics (C), Address fields (1.), General (a.), bullet 1 states:

"<path>s are used to refer to a location, on the ARPANET, containing a stored address list. The <phrase> should contain text which the referenced host can resolve to a file. This standard is not a protocol and so does not prescribe HOW data is to be retrieved from the file."

Private email to the (reachable) authors of the document pointed to the use of "resolve" stemming from work on programming languages and compiler theory. In that field of work, variables are associated with machine addresses when linking code. There are formal papers using the term, including "A Theory of Name Resolution", and the term "resolution" is used in the field of "Automated Reasoning".

The exercise of determining how the term "resolve" came to be part of Domain Names and DNS shows that there are influences, topics, terms and concepts from technologies preceding Domain Names and DNS that can be researched to help establish a foundation from which to build. There is more work to do here.

Subtypes of Domain Names have come to be defined for different protocols, evolving and sometimes building on previous definitions.

The DNS protocol places size restrictions on Domain Names and defines rules for matching domain names, treating sets of Domain Names as equivalent to each other. (This matching refers to treating upper case and lower case ASCII letters as equivalent.) The DNS defines the format used to transmit the names across the network as well as rules for displaying them inside text zone files. The DNS creates the notion that names are assigned by an authority per zone.

Placing size restrictions on Domain Names is significant in reducing the overall population of names that can be represented in the DNS.

The matching rules have the effect of creating (to use a term from graph theory) cliques, distorting the tree-nature of the Domain Name graph. A clique is a completely connected sub-graph, implying cyclic paths; a tree is a graph that is acyclic. In sum, the treatment of ASCII (and only ASCII) cases as equivalent is a distortion of the Domain Name hierarchy.

DNS defines two formats for domain names. One is the "on-the-wire" format used inside messages: for each label, a flags-and-length octet followed by that count of octets, with a final length of 0 representing the root. The other is a version that can be rendered in printable ASCII characters, complete with a means to represent other characters via an escape sequence. This does not alter the Domain Name concept but has implications when it comes to interoperating with other protocols' definitions of their domain name use.

DNS assumes that there is, in concept, a central authority creating names within the DNS management structure (called a zone). Although the DNS does not define how a central authority is implemented nor how it coins names, the names have to come from a single point to appear in a zone. There are other means for claiming names; an example will be mentioned later.

DNS domain names could appear to be the same as address literals, such as "192.0.2.1" or "0:0:0:0:0:FFFF::192.0.2.1". Such DNS domain names are not used for two reasons. First, applications expecting a Domain Name (as a command line parameter, for example) would opt to treat the string as an address literal and would therefore not look for the string in the DNS domain name space. Second, the management model of the DNS would prefer to aggregate (as in routing) addresses belonging together in the same zone, resulting in labels appearing in reverse order. E.g., the network address 192.0.2.1 would be represented by a DNS domain name as "1.2.0.192.in-addr.arpa." as described in RFC 1035. For IPv6, the convention used is documented in "DNS Extensions to Support IP Version 6", section 2.5. See also "Issues in Identifier Comparison for Security Purposes" section 3.1, "Host Names", in particular, section 3.1.1 and 3.1.2 on address literals, and section 4.1, "Conflation."

DNS domain names have become the dominant definition of domain names due to the success (scale) of the DNS on the public Internet. Many protocols interact with the DNS but, instead of supporting the complete definition of DNS domain names, rely on a subset more commonly called host names.
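
The DNS-specific treatment described above (length-prefixed labels on the wire, ASCII-only case folding when matching, and reversed labels for addresses) can be illustrated with a minimal sketch. The function names are hypothetical, the 63-octet label and 255-octet name limits are assumed from RFC 1035, and message compression and zone-file escape sequences are deliberately ignored. The reverse name relies on the standard Python ipaddress module.

```python
import ipaddress

MAX_LABEL = 63   # octets per label (assumed from RFC 1035)
MAX_NAME = 255   # octets per encoded name (assumed from RFC 1035)

def to_wire(name: str) -> bytes:
    """Encode a printable ASCII name as length-prefixed labels ending at the root."""
    text = name[:-1] if name.endswith(".") else name
    labels = text.split(".") if text else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")          # non-ASCII input raises an error
        if not 1 <= len(raw) <= MAX_LABEL:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))                 # length octet
        out += raw                           # label octets
    out.append(0)                            # zero length marks the root
    if len(out) > MAX_NAME:
        raise ValueError("name too long")
    return bytes(out)

def same_dns_name(a: str, b: str) -> bool:
    """Compare two names, treating ASCII upper and lower case as equivalent."""
    # bytes.lower() folds only ASCII letters; length octets are at most 63,
    # below the code for "A", so they are left unchanged.
    return to_wire(a).lower() == to_wire(b).lower()

def reverse_name(address: str) -> str:
    """Return the reversed-label name conventionally used for an address."""
    return ipaddress.ip_address(address).reverse_pointer + "."
```

Under these assumptions, to_wire("example.com") yields the octets 7, "example", 3, "com", 0; same_dns_name("Example.COM", "example.com.") is true; and reverse_name("192.0.2.1") gives "1.2.0.192.in-addr.arpa.".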

Work on the definition of a host name began well before the issuance of the STD 13 documents defining DNS. The rules for the Preferred Syntax in RFC 1034 conform to the host name rules outlined in "DoD Internet host table specification". The host name definition was presented again in "Requirements for Internet Hosts -- Application and Support" (which is part of STD 3). Section 2.1 of RFC 1123 presents a definition of host name (one of two mentions in that document), noting that the definition is a relaxation of what is in RFC 952.

Host names are subsets of DNS domain names in the sense that the character set is limited. In particular, only "let" (i.e., presumably letters a-z), "digits" and "hyphen" can be used, with hyphen only internal to a label. (This description is meant to be illustrative, not normative. See the grammar presented on page 5 of RFC 952 for specifics.)

"Hypertext Transfer Protocol -- HTTP/1.0", Section 3.2.2 "http URL", specifically references section 2.1 of RFC 1123. The reference is explicit.

"Simple Mail Transfer Protocol" refers to RFC 1035 for a definition of domain names but includes text close to the host name description above, noting that domain names as used in SMTP refer to both hosts and to other entities. RFC 5321 updates RFC 1123, but does not cite the latter for a definition of host names. RFC 5321 additionally requires brackets to surround address literals, referring to the use case as an "alternative to a domain name."

See also "IAB Thoughts on Encodings for Internationalized Domain Names", particularly section 3 entitled "Use of Non-ASCII in DNS", for more thoughts on host names.
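
The illustrative host name rule above (letters, digits and hyphen, with hyphen only internal to a label) can be written as a short check. This is a sketch, not the normative grammar: the function name is hypothetical, the leading digit is allowed per the RFC 1123 relaxation, and the 63-character label limit is an assumption borrowed from the DNS.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only inside.
# The 63-character ceiling is an assumption borrowed from the DNS limits.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_host_name(name: str) -> bool:
    """Return True if every label follows the letters-digits-hyphen rule."""
    text = name[:-1] if name.endswith(".") else name
    if not text:
        return False
    return all(_LABEL.fullmatch(label) for label in text.split("."))
```

With this check, "F.ISI.ARPA" and "3com.example" are accepted, while "-bad.example" and "under_score.example" are rejected even though both can be carried as DNS domain names.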

"Uniform Resource Identifier (URI): Generic Syntax", also known as STD 66, mentions in its section 3.2.2 (page 20) that the host subcomponent of the URI Authority (section 3.2) "should conform to the DNS syntax". This comes after discussion that the host subcomponent is not strongly tied to the DNS, i.e., names can be managed via a concept other than the DNS. There's no discussion on the rationale, but this enables the reuse of code parsing and marshalling the host subcomponent between different Domain Name environments. This reinforces the notion that there's a need to understand how Domain Names interoperate amongst protocols and applications. It also reinforces the need to derive or make explicit a way for client software to know how to resolve a name, that is, convert a name into a network address.

The above definition includes address literals such as 192.0.2.1 for IPv4 and even IPv6 literals such as ::ffff:192.0.2.1. Yes, these might qualify as Domain Names. The addresses might be encased in square brackets "[" and "]" (SMTP mentioned already). In the DNS, as previously described in section 3.1, they are represented per appropriate conventions.
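
How an application might tell an address literal from a name before deciding whether to consult the DNS can be sketched as follows. The function name and return strings are hypothetical; the bracket handling, including the "IPv6:" tag, follows the SMTP convention as understood here and should be treated as an assumption. Parsing uses the standard Python ipaddress module.

```python
import ipaddress

def classify(identifier: str) -> str:
    """Label an identifier as an address literal or as a name."""
    text = identifier
    # Literals may be encased in square brackets (the SMTP convention).
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
        # Assumption: SMTP tags bracketed IPv6 literals with "IPv6:".
        if text.lower().startswith("ipv6:"):
            text = text[5:]
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return "name"                      # not parseable as an address
    return f"IPv{address.version} literal"
```

Here classify("192.0.2.1") reports an IPv4 literal and classify("::ffff:192.0.2.1") an IPv6 literal, while "1.2.0.192.in-addr.arpa" is reported as a name, matching the DNS convention of representing addresses as reversed labels.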

The original uses of Domain Names (such as DNS domain names and host names) assumed the ASCII character set. Specifically, making the labels case insensitive prohibited a straightforward use of any method of representation of non-ASCII characters. "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", with associated other documents, defines IDNA2008 as a convention for handling non-ASCII characters in DNS domain names. Figure 1 of that document defines the sets of legal DNS domain name formats. As noted in the footnotes of the figure, applications unaware of IDNA2008 cannot distinguish the subsets defined by the document, meaning this definition is not an alteration of Domain Names but, like host names, yet another subset of DNS domain names.
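
The reason an IDNA-unaware application cannot distinguish these subsets is that the encoded form of a non-ASCII label is itself an ordinary ASCII label. A minimal sketch using Python's built-in punycode codec shows the shape of the conversion. The function name is hypothetical, and the validation and mapping steps that IDNA2008 requires are omitted, so this is not a conforming implementation.

```python
ACE_PREFIX = "xn--"  # prefix marking an encoded (ASCII-compatible) label

def to_ascii_label(label: str) -> str:
    """Convert one label to an ASCII form; ASCII labels pass through."""
    if label.isascii():
        return label
    # Sketch only: IDNA2008 validity checks are deliberately skipped.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")
```

For example, to_ascii_label("b\u00fccher") produces "xn--bcher-kva", which is a syntactically valid host name label and is handled by ASCII-assuming software like any other.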

"Suggested Practices for Registration of Internationalized Domain Names (IDN)" presents reasons why DNS domain name registration is restricted in the context of IDN. (That RFC refers to an older form than IDNA2008, but the concepts still apply.) This is yet another convention related to DNS domain names, excluding names that would lead to undesirable outcomes.

The Tor network is an activity organized by the Tor Project, Inc., described on its main web page "https://www.torproject.org/index.html.en". One component of the Tor network name space is the set of Domain Names ending in ".onion". (There are other suffixes in use, but it isn't very clear how they are used, defined or whether they are active.) The way in which Domain Names are used in Tor is described in two web documents, "Tor Rendezvous Specification" and "Special Hostnames in Tor", available from the project's website. Syntactically, a Tor domain name fits within the DNS domain name definition, but the manner of assignment is incompatible with that of the DNS. (Not better or worse, still significantly different.) Tor domain names are derived from cryptographic keys and organized by distributed hash tables, instead of assigned by a central authority per zone.
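
To make "derived from cryptographic keys" concrete, the following sketch computes a name from a 32-byte public key in the style of the version 3 onion service address format, as that format is understood here. The details (checksum string, version byte, SHA3-256, base32) are assumptions to be checked against the Tor specifications, the function name is hypothetical, and no key generation or validation is performed.

```python
import base64
import hashlib

def onion_v3_name(public_key: bytes) -> str:
    """Derive a ".onion" name from a 32-byte public key (illustrative only)."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    version = b"\x03"
    # Assumed checksum: first two bytes of SHA3-256 over a fixed string,
    # the key and the version byte.
    digest = hashlib.sha3_256(b".onion checksum" + public_key + version).digest()
    checksum = digest[:2]
    # 35 bytes encode to exactly 56 base32 characters, so no padding appears.
    label = base64.b32encode(public_key + checksum + version).decode("ascii")
    return label.lower() + ".onion"
```

The point of the sketch is that the name is computed by whoever holds the key, with no registry or zone authority involved, yet the result is a two-label name of letters and digits that fits the DNS domain name syntax.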

"Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile" , section 4.2.1.6 "Subject Alternative Name" a dNSName is defined to be a host name, with the further restriction that the name " " cannot be used. (The subtle irony is that a name consisting of just a blank would hardly qualify as a Domain Name.)

Multicast DNS uses a name space ending with ".local." as described in "Multicast DNS". The rules for Multicast DNS domain names differ from those for DNS domain names. Multicast DNS domain names are encoded as Net-Unicode as defined in "Unicode Format for Network Interchange", with the DNS domain name tradition of case folding the ASCII letters when matching names. Appendix F of RFC 6762 gives an explanation of why the punycode algorithm, defined in "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", is not used.

The precursor to DNS, host tables, still exists in remnants in many operating systems. There are library functions, used by applications to resolve DNS domain names, that can return names of arbitrary length (meaning, for example, longer than what DNS domain names are defined to be). "Basic Socket Interface Extensions for IPv6" addresses this in Section 6; further documentation can be found as part of "The Open Group Base Specifications Issue 7" and "Microsoft Winsock Functions".

This section lists (some) other protocols that use Domain Names but in general do not impose any restrictions other than what has been mentioned above.

SSH, documented in "The Secure Shell (SSH) Protocol Architecture", uses host names, using the name when storing public keys of hosts. SSH clients, not necessarily the protocol, illustrate how applications juggle the different forms of Domain Names. SSH can be invoked to open a secure shell with a host via its DNS domain name/host name or via its Multicast DNS domain name. Many other forms can be used as well, including a name of purely local, per-user scope. (Note that SSH does not distinguish between DNS names and Multicast DNS domain names in the protocol definition; the difference is handled in resolution libraries belonging to the computing platform.)

FTP, defined in "FILE TRANSFER PROTOCOL (FTP)", is silent on domain names, but client implementations of the protocol behave as SSH clients do, being unaware of the differences between definitions of Domain Names.

DHCP, defined in "Dynamic Host Configuration Protocol", includes domain names in its Domain Search Option as described in "Dynamic Host Configuration Protocol (DHCP) Domain Search Option". The encoding of Domain Names used is the on-the-wire format of the DNS, using DNS-defined message compression. DHCP handles Domain Names in other options, such as defined in "The DHCP Client FQDN Option", in the same format. The significance of this is that most other protocols represent DNS domain names or host names in a human readable form, whereas DHCP uses the machine-friendly format.

If there is a use of Domain Names not listed here it is merely an omission. The goal in this document is to provide a survey that is sufficient to avoid hand-waving arguments, recognizing the diminishing return in trying to build a complete roster of uses of Domain Names. If there are omissions that ought to be included, please send references for the use case to the author (while this is an Internet Draft, that is).

Any single protocol can define a format for a conceptual Domain Name. Examples given above show that many protocols have done so. From the examples, it is clear that the way in which protocols have interpreted Domain Names has varied, leading to, at least, user interfaces having to have built-in intelligence when handling names and, at worst, a growing confusion over how the Domain Name space is to be managed.

When protocols having different formats and rules for Domain Names interact, software implementing the protocols translates one protocol's domain name format to another's format. Even when the translation is straightforward, it is predictable that software will fail to handle this situation well.

Often the clash of definitions impacts the design of a new protocol and/or an extension of a protocol. For example, adding non-ASCII domain names has to be done with backwards compatibility with an installed base of ASCII-assuming code. This clash can inhibit new uses of Domain Names.

Search lists are a Domain Name mechanism studied in "SSAC Advisory on DNS 'Search List' Processing". One of the particular use cases related to this topic is the issuance of search lists via DHCP, which are then used by any user-client protocol implementation. This emphasizes an interoperability consideration for how Domain Names are treated in different protocols, not just among implementations of one protocol.

The detection and handling of Fully Qualified Domain Names is an interoperability issue as well. At issue is the significance of the terminating separation character in a printed version of a Domain Name. Many clients, when passed a Domain Name as an identifier, will add a dot at the end of the argument if the argument does not already end in a dot. Some do this only after applying the aforementioned search list. As mentioned in the SSAC document cited above, inconsistency leads to surprising results.
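
The interaction between search lists and the terminating dot can be made concrete with a small sketch of one possible client policy. The function name and the ordering (search list first, bare name last) are assumptions chosen for illustration; real clients differ, which is precisely the inconsistency at issue.

```python
def candidate_names(argument, search_list):
    """List the fully qualified names one hypothetical client would try, in order."""
    if argument.endswith("."):
        # A terminating dot marks the argument as already fully qualified.
        return [argument]
    candidates = []
    for suffix in search_list:
        # Append each search-list suffix, then the terminating dot.
        candidates.append(f"{argument}.{suffix.rstrip('.')}.")
    # Assumed policy: try the bare argument as a fully qualified name last.
    candidates.append(argument + ".")
    return candidates
```

Given the search list ["example.com", "corp.example."], the argument "www" expands to "www.example.com.", "www.corp.example." and finally "www.", whereas "www.example.com." is used unchanged. A client that tries the bare name first, or never applies the search list to multi-label names, would contact different names for the same input.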
The Special Use Domain Names registry lists Domain Names that are to be treated in a manner inconsistent with the DNS normal processing rules. This registry contains Domain Names regardless of whether the name is a DNS domain name and regardless of whether the name is a top-level (domain) name or is positioned elsewhere in the tree structure. These are reasons this document is needed.

The confusion over what is a legal domain name stems from application-defined restrictions. For example, using a one-label domain name ("dotless") for sending email is not a problem with the DNS nor the name in concept, but is a problem for mail implementations that expect more than one label. (One-label names may be assumed to be in ARPA host table format.) The "IAB Statement: Dotless Domains Considered Harmful" elaborates.

Comments or contributions from Andrew Sullivan, Paul Hoffman, George Michaelson, Kevin Darcy, Joe Abley, Jim Reid, Tony Finch, Robert Edmonds, hellekin, Stephane Bortzmeyer, Ray Bellis, Bob Harold, Alec Muffett, Stuart Cheshire, Dave Thaler, Niall O'Reilly, John Klensin, Dave Crocker, Ken Pogran, John Vittal and a growing list of others I am losing track of. Not to imply endorsement.

None.

Nothing direct. This document makes a case for clarifying the term "Domain Name" and surveys how it has been variously applied. In some sense, loosely defined terms give rise to security hazards. Beyond that, there is no impact on security.