INTERNET DRAFT                                              Ron Daniel
draft-daniel-naptr-00.txt               Los Alamos National Laboratory
                                                      Michael Mealling
                                       Georgia Institute of Technology
                                                         13 June, 1996

              Resolution of Uniform Resource Identifiers
                     using the Domain Name System

Status of this Memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This draft expires 18 Dec., 1996.

Abstract:
=========

Uniform Resource Locators (URLs) are the foundation of the World Wide
Web, and are a vital Internet technology. However, they have proven
to be brittle in practice. The basic problem is that URLs typically
identify a particular path to a file on a particular host. There is
no graceful way of changing the path or host once the URL has been
assigned. Neither is there a graceful way of replicating the resource
located by the URL to achieve better network utilization and/or fault
tolerance. Uniform Resource Names (URNs) have been hypothesized as a
replacement for URLs that would overcome such problems. URNs and URLs
are both instances of a broader class of identifiers known as Uniform
Resource Identifiers (URIs).

This document describes a new DNS Resource Record, NAPTR, that
provides rules for mapping parts of URIs to domain names.
By changing the mapping rules, we can change the host that is
contacted to resolve a URI. This will allow a more graceful handling
of URLs over long time periods, and forms the foundation for a new
proposal for Uniform Resource Names.

In addition to locating resolvers, the NAPTR provides for other
naming systems to be grandfathered into the URN world, provides
independence between the name assignment system and the resolution
protocol system, and allows multiple services (Name to Location, Name
to Description, Name to Resource, ...) to be offered. In conjunction
with the SRV RR proposal [3], the NAPTR record allows those services
to be replicated for the purposes of fault tolerance and load
balancing.

Introduction:
=============

Uniform Resource Locators have been a significant advance in locating
resources on the Internet. However, their brittle nature over time
has been recognized for several years. The Uniform Resource
Identifier working group proposed the development of Uniform Resource
Names to serve as persistent, location-independent identifiers for
Internet resources in order to overcome most of the problems with
URLs. RFC-1737 [1] sets forth requirements on URNs.

Even if URN systems were in place now, there would still be a
tremendous number of URLs. It should be possible to develop a URN
resolution system that can also provide location independence for
those URLs. This is related to the requirement in [1] to be able to
grandfather in names from other naming systems, such as ISO Formal
Public Identifiers, Library of Congress Call Numbers, ISBNs, ISSNs,
etc.

The developers of various URN proposals have held a series of
meetings, resulting in a compromise known as the Knoxville framework.
The major principle behind the Knoxville framework is that any name
assignment hierarchy should be separate from the resolution
hierarchy. This is in marked contrast to the Domain Name System,
where the two are identical.
Readers are referred to [2] for background on the Knoxville framework
and for additional information on the context and purpose of this
proposal.

For the reasons mentioned earlier, we want to be able to resolve URNs
and URLs within the same framework. For the short term, DNS is the
obvious candidate for the resolution framework, since it is widely
deployed and understood. However, it is not appropriate to use DNS to
maintain information on a per-resource basis. First of all, DNS was
never intended to handle that many records. Second, the limited
record size is inappropriate for catalog information. Third, we have
the requirement mentioned above about grandfathering of other name
systems. Therefore our approach is to use DNS to locate "resolvers"
that can provide information on individual resources, potentially
including the resource itself.

To locate a resolver, given a URI, we will go through a rewriting
procedure. This procedure may take multiple steps, but the beginning
is always the same. Every URI has a colon-delimited prefix. NAPTR
resolution begins by taking this prefix, appending the well-known
suffix "urn.net", and querying DNS for NAPTR records at that domain
name. Based on the results of this query, zero or more additional
queries may be needed to locate resolvers for the URI. Three brief
examples of this procedure are given in the next section.

The NAPTR RR provides the level of indirection needed to keep the
naming system independent of the resolution system, its protocols,
and services. Coupled with the new SRV resource record proposal [3],
there is also the potential for replicating the resolver on multiple
hosts, overcoming some of the most significant problems of URLs. This
is an important and subtle point. Not only do the NAPTR and SRV
records allow us to replicate the resource, we can replicate the
resolvers that know about the replicated resource.
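
The first step of the rewriting procedure described above can be
sketched as follows. This is a minimal illustration in Python, not
part of the proposal: the function name first_lookup_key and its
suffix parameter are hypothetical, and skipping an optional leading
"urn:" is an assumption based on the URN examples in this document,
where the namespace identifier that follows "urn:" is the part
prepended to urn.net.

```python
def first_lookup_key(uri: str, suffix: str = "urn.net") -> str:
    """Return the domain name to query for NAPTR records first.

    Hypothetical helper: the name and the `suffix` parameter are
    illustrative and do not come from the draft.
    """
    parts = uri.split(":")
    # Assumption: an optional leading "urn:" is skipped, so that
    # "urn:duns:..." and "duns:..." produce the same key.
    if len(parts) > 2 and parts[0].lower() == "urn":
        parts = parts[1:]
    prefix = parts[0].lower()
    if len(parts) < 2 or not prefix:
        raise ValueError(f"no colon-delimited prefix in {uri!r}")
    # Append the well-known suffix to form the first DNS key.
    return f"{prefix}.{suffix}"
```

For example, first_lookup_key("urn:duns:002372413:annual-report-1997")
gives "duns.urn.net", and an "http://" URL gives "http.urn.net". The
sketch only builds the name; it performs no DNS query.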

Brief overview and examples of the NAPTR RR:
============================================

A detailed description of the NAPTR RR will be given later, but to
give a flavor for the proposal we first give a simple description of
the record and three examples of its use.

The key fields in the NAPTR RR are order, service, flags, and
pattern.

* The order field specifies the order in which records should be
  processed when multiple NAPTR records are returned in response to
  a single query.

* The service field specifies the resolution protocol and resolution
  service(s) that will be available if the rewrite specified by the
  pattern field is applied. (Resolution services are operations such
  as N2R (URN to Resource), N2L (URN to URL), N2C (URN to URC), etc).

* The flags field contains modifiers that affect what happens in the
  next DNS lookup.

* Pattern provides the rules on how to rewrite the original URN into
  a new domain name for the next lookup. The pattern may be a simple
  replacement string (the usual case) or a regular expression
  substitution string. (It should be noted that the client applies
  all the substitutions and performs all lookups - this will not be
  handled in DNS itself).

Example 1
---------

Consider a URN that uses the DUNS namespace. DUNS numbers are
identifiers for approximately 30 million registered businesses around
the world, assigned and maintained by Dun & Bradstreet. The URN might
look like:

    urn:duns:002372413:annual-report-1997

The first step in the resolution process is to find out about the
DUNS namespace. The namespace identifier, duns, is extracted from the
URN, prepended to urn.net, and the NAPTRs for duns.urn.net looked up.
It might return records of the form:

;;                 order flags service            pattern
duns.urn.net NAPTR 10    "s"   "dunslink+N2L+N2C" "_dunslink._udp.isi.dandb.com"
                   10    "s"   "rcds+N2C"         "_rcds._udp.isi.dandb.com"
                   10    "s"   "http+N2L+N2C+N2R" "_http._tcp.isi.dandb.com"

This says that the provider offers the special dunslink protocol,
which will allow you to get either a URL or a URC (description) of
the resource. The Resource Cataloging and Distribution Service (RCDS)
could be used to get a URC for the resource, while HTTP could be used
to get a URL, URC, or the resource itself. Assuming we don't know
dunslink, but do know RCDS, the pattern field says that our next
lookup should be for _rcds._udp.isi.dandb.com. While in this example
the new domain name was provided, NAPTRs also allow the domain name
to be constructed from the URN using regular expressions, which are
denoted by an initial '/' in the pattern.

There are two important considerations to allow for delegation of
name resolution. First, NAPTR rewrites can potentially be iterative.
The last NAPTR is known as the "terminal NAPTR". Once we have the
terminal NAPTR, our next probe into the DNS will be for a SRV or A
record instead of another NAPTR. The flags field is used to indicate
a terminal lookup. If it has a value of "s", the next lookup should
be for SRV RRs; "a" denotes that A records should be sought. Second,
records MUST be processed in the order specified by the order field.
This allows administrators to say that "all records matching this
pattern go to server1, all others go to server2".

Since our example RR specified the "s" flag, it was terminal. Our
next action is to look up SRV RRs for _rcds._udp.isi.dandb.com, which
will tell us hosts that can provide the necessary resolution service.
That lookup might return:

;;                           Pref Weight Port Target
_rcds._udp.isi.dandb.com SRV 0    0      1000 defduns.isi.dandb.com
                         SRV 0    0      1000 dbmirror.com.au
                         SRV 0    0      1000 ukmirror.com.uk

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their RCDS server. (The
reader is referred to the SRV proposal [3] for the interpretation of
the fields above).

Since we know certain NAPTRs are terminal, we could include the
corresponding SRV RRs as additional info. Further, the SRV RRs could
include A records as additional info. While this recursive provision
of additional information is not explicitly blessed in the DNS
specifications, it is not forbidden, and BIND does take advantage of
it [4]. This is a significant optimization. In conjunction with a
long TTL for *.urn.net records, the average number of probes to DNS
for resolving DUNS URNs would approach one.

Example 2
---------

Consider a URN namespace based on the existing hierarchy in DNS. We
might use MIME Content-Ids for such names. The URN might look like
this:

    urn:cid:199606121851.1@gatech.edu

The first step in the resolution process is to find out about the CID
namespace. The namespace identifier, cid, is extracted from the URN,
prepended to urn.net, and the NAPTR for cid.urn.net looked up. It
might return records of the form:

;;                   order flags service pattern
cid.urn.net IN NAPTR 10    ""    ""      "/.+@([^@]+)/\1/i"
;;

We have only one NAPTR response, so ordering the responses is not a
problem. The pattern begins with "/", which signals that it is a
regular expression. We apply that regexp to the entire URN to see if
it matches, which it does. The \1 in the replacement refers to the
first parenthesized group, which yields the string "gatech.edu".
Since the flags field is empty, the rewrite is not terminal, so NAPTR
records for this string are looked up. That lookup could return
something like:

;;                  order flags service            pattern
gatech.edu IN NAPTR 10    "s"   "z39.50+N2L+N2C"   "_z3950._tcp.gatech.edu"
                    20    "s"   "rcds+N2C"         "_rcds._udp.gatech.edu"
                    30    "s"   "http+N2L+N2C+N2R" "_http._tcp.gatech.edu"
;;

Now the flags field tells us that these are the last NAPTR patterns
we should see, and after the rewrite (a simple substitution in this
case) we should look up SRV records to get information on the hosts
that can provide the necessary service.

Assuming we know the Z39.50 protocol, our lookup might return:

;;                            Pref Weight Port Target
_z3950._tcp.gatech.edu IN SRV 0    0      1000 z3950.gatech.edu
                       IN SRV 0    0      1000 z3950.cc.gatech.edu
                       IN SRV 0    0      1000 z3950.uga.edu
;;

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their Z39.50 server.

There is a significant caveat about the use of backslashes in DNS
zone files. DNS treats backslashes as the escape character so that
'.' can be escaped when necessary. This means that when a regular
expression is entered into the zone file, the backslashes must be
escaped by another backslash. For the case of the cid.urn.net record
above, the regular expression entered into the zone file should be
"/.+@([^@]+)/\\1/i". When the client code actually receives the
record, the pattern will have been converted to "/.+@([^@]+)/\1/i".

Example 3
---------

As mentioned above, the NAPTR RR can also be used for URLs that have
already been assigned. Assume we have the URL for a very popular
piece of software that the publisher wishes to mirror at multiple
sites around the world:

    http://www.foo.com/software/latest-beta.exe

We extract the prefix, "http", and look up NAPTR records for
http.urn.net.
This returns a record of the form

;;                    order flags service pattern
http.urn.net IN NAPTR 10    ""    ""      "/.*\/\/([^\/:]+)/\1/i"

This expression returns everything after the first double slash and
before the next slash or colon. Backslashes are needed to escape the
forward slash since the forward slash character is what separates the
components of the substitution pattern. (Recall from the previous
example that in the zone file, this pattern actually needs to be
entered as "/.*\\/\\/([^\\/:]+)/\\1/i".)

Applying this pattern to the URL extracts "www.foo.com". Looking up
NAPTR records for that might return:

;;                   order flags service    pattern
www.foo.com IN NAPTR 10    "s"   "http+L2R" "_http._tcp.foo.com"
                     10    "s"   "ftp+L2R"  "_ftp._tcp.foo.com"

Looking up SRV records for _http._tcp.foo.com would return
information on the hosts that foo.com has designated to be its mirror
sites. The client can then pick one for the user.

NAPTR RR Format
===============

The format of the NAPTR RR is given below. The DNS type code for
NAPTR is 104.

    Domain TTL Class Order Flags Service Pattern

where:

Domain   The domain name this resource record refers to.

TTL      Standard DNS Time To Live field

Class    Standard DNS meaning

Order    Specifies the order in which NAPTR records are to be
         processed. This is similar to the preference field in an MX
         record, and is used so domain administrators can delegate
         parts of their URI namespaces to different servers. When a
         client receives multiple NAPTR RRs, it first eliminates any
         that it cannot handle, such as those that require an unknown
         protocol or do not provide any known resolution service.
         After this, the remaining records should be sorted according
         to the "order" field. Once an order for the records has been
         established, they should be processed in that order.

Flags    A String giving flags to control aspects of the rewriting.
         Flags are single characters from the alphabet [A-Z0-9]. The
         case of the alphabetic characters is not significant.
         At this time only two flags, "S" and "A", are defined. "S"
         means that the next lookup should be for SRV records instead
         of NAPTR records. "A" means that the next lookup should be
         for A records. The S and A flags are mutually exclusive.

Service  Specifies the resolution service(s) available down this
         rewrite path. May also specify the particular protocol that
         is used to talk with a resolver. If a protocol is specified
         in the service field, the rewrite will almost certainly be
         terminal, although the flags field should still be
         considered the authoritative source of information on that
         subject.

         The service field may take any of the values below (using
         the Augmented BNF of RFC 822 [5]):

             service_field = [ [protocol] *("+" rs) ]
             protocol      = "RCDS" / "HTTP"
             rs            = "N2L" / "N2Ls" / "N2R" / "N2Rs" / "N2C"
                             / "L2R"

         i.e. an optional protocol specification followed by 0 or
         more resolution services. Each resolution service is
         indicated by an initial '+' character.

         Note that the empty string is also a valid service field.
         This will typically be seen at the top levels of a
         namespace, when it is impossible to know what services and
         protocols will be offered by a particular publisher within
         that name space.

         At this time the known protocols are RCDS and HTTP. More
         will be allowed later.

         At this time the allowed service requests are:

             N2L  - Given a URN, return a URL
             N2Ls - Given a URN, return a set of URLs
             N2R  - Given a URN, return an instance of the resource.
             N2Rs - Given a URN, return multiple instances of the
                    resource, typically encoded using
                    multipart/alternative.
             N2C  - Given a URN, return a collection of
                    meta-information on the named resource. The
                    format of this response is the subject of another
                    document.
             L2R  - Given a URL, return the resource.

         The actual format of the service request and response will
         be determined by the resolution protocol, and is the subject
         for other documents.

Pattern  Rules on how to obtain a new domain name to look up, along
         with modifiers.
         The pattern may be a simple replacement domain name, or it
         may be a sed(1)-style substitution expression that is
         applied to the URN. A couple of examples:

         "lanl.gov"
             Simple substitution pattern. The next DNS lookup will be
             for the domain name "lanl.gov".

         "/^.*inet:([^:]+):*/\\1/i"
             Regular expression replacement. The URN will be searched
             from the beginning for the pattern "inet:". All
             characters after that colon, up to but not including the
             next colon, will form the next domain name to be looked
             up. The /i flag says that the match will be
             case-insensitive, so IneT, Inet, INET, ... would all be
             matched. The new search key would also be normalized to
             lower case.

Advice to domain administrators:
================================

Beware of regular expressions. Not only are they a pain to get
correct on their own, but there is the previously mentioned
interaction with DNS. Any backslashes in a regexp must be entered
twice in a zone file in order to appear once in a query response.
More seriously, the need for double backslashes has probably not been
tested by all implementors of DNS servers.

The order field should be used to control the evaluation order a
client follows in order to properly determine how the namespace has
been delegated. For example, if you want all URIs beginning with the
prefix P to go to server S, and all other URIs to go to server S2,
you should have two NAPTR records, and the one specifying the prefix
should have the smaller value for order so that it is checked first.

Administrators are encouraged to provide SRV records and A records as
additional information in terminal NAPTR records.

Usage
=====

Pseudocode for a client using NAPTRs is given below:

    //
    // findResolver(URN)
    //   Given a URN, find a host that can resolve it.
    //
    findResolver(string URN) {
        sprintf(key, "%s.urn.net", extractNS(URN));
        do {
            rewrite_flag = false;
            if (key has been seen) {
                quit with a loop detected error
            }
            add key to list of "seens"
            records = lookup(type=NAPTR, key); // all NAPTR RRs for 'key'
            sort naptr records by order, deleting those requiring us
              to use unknown protocols or services. If multiple
              records have the same "order", arrange them randomly
              in the group.
            foreach ( naptr record ) {      // in order of preference
                newkey = rewrite(URN, naptr[j].pattern);
                if (newkey) {     // If there was a rewrite, go with it
                    key = newkey;
                    if (strcasecmp(naptr[j].flags, "S") == 0) {
                        // Rewrite was terminal: keep the services and
                        // stop looking for further NAPTR records.
                        // (An "A" flag is not handled in this sketch.)
                        services = naptr[j].services;
                        srvs = any SRV RRs returned as additional info
                    } else {
                        rewrite_flag = true;
                    }
                    break;        // ignore the remaining records
                }
            }
        } while (rewrite_flag);

        if (!srvs) {
            // No SRVs came in as additional info, look them up
            srvs = lookup(type=SRV, key);
        }

        sort SRV records by preference, weight, ...
        foreach (SRV record) {              // in order of preference
            try contacting srv[j].target using the protocol and one of
              the resolution service requests from the "services"
              field of the last NAPTR record.
            if (successful)
                return (target, protocol, service);
                // Actually we would probably return a result, but this
                // code was supposed to just tell us a good host to
                // talk to.
        }
        die with an "unable to find a host" error;
    }

Notes:
======

- The "urn:" prefix is a matter of religious controversy. Client code
  should handle the cases when it is and is not present. Similarly,
  if regular expressions are used in NAPTR records, they should be
  immune to the presence or absence of "urn:".

- A client must examine all the RRs in a reply, and must process them
  according to the value of the order field.

- If a record at a particular order matches the URI, but the client
  doesn't know the specified protocol and service, the client may
  continue to examine records that have the same order. The client
  shall not consider records with a higher value of order.
  This is necessary to make delegation of portions of the namespace
  work.

- When multiple RRs have the same "order", the client may use
  preferred resolution services or protocols to weight the random
  selection of a match. Some form of randomization MUST be performed
  to be considered a conforming implementation.

- If the lookup after a rewrite fails, clients are strongly
  encouraged to report a failure, rather than backing up to pursue
  other rewrite paths.

- Note that SRV RRs impose additional requirements on clients.

Acknowledgements:
=================

The authors would like to thank Keith Moore for all his consultations
during the development of this draft. We would also like to thank
Paul Vixie for his assistance in debugging our implementation.

References:
===========

[1] RFC-1737 "Functional Requirements for Uniform Resource Names",
    Karen Sollins and Larry Masinter, Dec. 1994.

[2] draft-daigle-urn-framework-01.txt "A Uniform Resource Naming
    Framework", Leslie Daigle and Patrik Faltstrom, June, 1996.

[3] draft-gulbrandsen-dns-rr-srvcs-03.txt "A DNS RR for specifying
    the location of services (DNS SRV)", Arnt Gulbrandsen and Paul
    Vixie, March 1996.

[4] Paul Vixie, personal communication.

[5] RFC-822 "Standard for the Format of ARPA Internet Text Messages",
    Dave H. Crocker, August 1982.

[6] Keith Moore, personal communication.

Security Considerations
=======================

The use of "urn.net" as the registry for URN namespaces is subject to
denial of service attacks, as well as other DNS spoofing attacks. The
rewrite rules make identifiers from other namespaces subject to the
same attacks as normal domain names. Since they have not been easily
resolvable before, this may or may not be considered a problem.

Author Contact Information:
===========================

Ron Daniel
Los Alamos National Laboratory
MS B287
Los Alamos, NM, USA, 87545
voice: +1 505 665 0597
fax:   +1 505 665 4939
email: rdaniel@lanl.gov

Michael Mealling
Office of Information Technology, Network Services
Georgia Institute of Technology
Atlanta, GA, USA 30332-0730
voice: (404) 894-1712
fax:   (404) 894 9548
email: michael.mealling@oit.gatech.edu

This draft expires 18 Dec., 1996.