INTERNET DRAFT                                                  Ron Daniel
draft-daniel-naptr-00.txt                   Los Alamos National Laboratory
                                                          Michael Mealling
                                           Georgia Institute of Technology
                                                             13 June, 1996


              Resolution of Uniform Resource Identifiers
                     using the Domain Name System


Status of this Memo
===================

    This document is an Internet-Draft.  Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its
    areas, and its working groups.  Note that other groups may also
    distribute working documents as Internet-Drafts.
  
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other
    documents at any time.  It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    ``work in progress.''
  
    To learn the current status of any Internet-Draft, please check
    the ``1id-abstracts.txt'' listing contained in the Internet-
    Drafts Shadow Directories on ftp.is.co.za (Africa),
    nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
    ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

    This draft expires 18 Dec., 1996.
  
  
Abstract:
=========

Uniform Resource Locators (URLs) are the foundation of the World Wide
Web, and are a vital Internet technology. However, they have proven to
be brittle in practice. The basic problem is that URLs typically
identify a particular path to a file on a particular host. There is no
graceful way of changing the path or host once the URL has been
assigned. Neither is there a graceful way of replicating the resource
located by the URL to achieve better network utilization and/or fault
tolerance. Uniform Resource Names (URNs) have been hypothesized as a
replacement for URLs that would overcome such problems. URNs and URLs
are both instances of a broader class of identifiers known as Uniform
Resource Identifiers (URIs).

This document describes a new DNS Resource Record, NAPTR, that provides
rules for mapping parts of URIs to domain names.  By changing the
mapping rules, we can change the host that is contacted to resolve a
URI. This will allow a more graceful handling of URLs over long time
periods, and forms the foundation for a new proposal for Uniform
Resource Names.

In addition to locating resolvers, the NAPTR provides for other naming
systems to be grandfathered into the URN world, provides independence
between the name assignment system and the resolution protocol system,
and allows multiple services (Name to Location, Name to Description,
Name to Resource, ...) to be offered.  In conjunction with the SRV RR
proposal [3], the NAPTR record allows those services to be replicated
for the purposes of fault tolerance and load balancing.


Introduction:
=============

Uniform Resource Locators have been a significant advance in locating
resources on the Internet. However, their brittle nature over time
has been recognized for several years. The Uniform Resource Identifier
working group proposed the development of Uniform Resource Names to serve
as persistent, location-independent identifiers for Internet resources
in order to overcome most of the problems with URLs. RFC-1737 [1] sets
forth requirements on URNs.

Even if URN systems were in place now, there would still be a
tremendous number of URLs.  It should be possible to develop a URN
resolution system that can also provide location independence for those
URLs.  This is related to the requirement in [1] to be able to
grandfather in names from other naming systems, such as ISO Formal
Public Identifiers, Library of Congress Call Numbers, ISBNs, ISSNs,
etc.

The developers of various URN proposals have held a series of meetings,
resulting in a compromise known as the Knoxville framework. The
major principle behind the Knoxville framework is that any name
assignment hierarchy should be separate from the resolution hierarchy.
This is in marked contrast to the Domain Name System, where the two are
identical.  Readers are referred to [2] for background on the Knoxville
framework and for additional information on the context and purpose of
this proposal.

For the reasons mentioned earlier, we want to be able to resolve URNs
and URLs within the same framework. For the short term, DNS is the
obvious candidate for the resolution framework, since it is widely
deployed and understood. However, it is not appropriate to use DNS to
maintain information on a per-resource basis. First of all, DNS was
never intended to handle that many records.  Second, the limited record
size is inappropriate for catalog information.  Third, we have the
requirement mentioned above about grandfathering of other name
systems.

Therefore our approach is to use DNS to locate "resolvers" that can
provide information on individual resources, potentially including the
resource itself. To locate a resolver, given a URI, we will go through
a rewriting procedure. This procedure may take multiple steps, but the
beginning is always the same. Every URI has a colon-delimited prefix.
NAPTR resolution begins by taking this prefix, appending the well-known
suffix "urn.net", and querying DNS for NAPTR records at that domain
name.  Based on the results of this query, zero or more additional
queries may be needed to locate resolvers for the URI. Three brief
examples of this procedure are given in the next section.
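
The first step described above can be sketched in Python as follows.
This is only an illustration, not part of the proposal: the function
name initial_key is hypothetical, and skipping an optional leading
"urn:" is an assumption drawn from the examples in the next section,
which use the namespace identifier rather than the literal string "urn".

```python
def initial_key(uri: str, suffix: str = "urn.net") -> str:
    """Build the domain name for the first NAPTR query (illustrative only)."""
    rest = uri
    # Assumption: a leading "urn:" is optional and is skipped, so that the
    # namespace identifier (for example "duns") becomes the prefix.
    if rest.lower().startswith("urn:"):
        rest = rest[4:]
    prefix, sep, _ = rest.partition(":")
    if not sep or not prefix:
        raise ValueError("URI has no colon-delimited prefix: %r" % uri)
    # DNS names are case-insensitive, so the prefix is lower-cased.
    return "%s.%s" % (prefix.lower(), suffix)
```

With this sketch, both "urn:duns:002372413:annual-report-1997" and
"duns:002372413:annual-report-1997" map to "duns.urn.net".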

The NAPTR RR provides the level of indirection needed to keep the naming
system independent of the resolution system, its protocols, and services.
Coupled with the new SRV resource record proposal [3], there is also the
potential for replicating the resolver on multiple hosts, overcoming some
of the most significant problems of URLs. This is an important and subtle
point. Not only do the NAPTR and SRV records allow us to replicate
the resource, they also allow us to replicate the resolvers that know
about the replicated resource.


Brief overview and examples of the NAPTR RR:
============================================

A detailed description of the NAPTR RR will be given later, but to give
a flavor for the proposal we first give a simple description of the
record and three examples of its use.

The key fields in the NAPTR RR are order, service, flags, and pattern.
* The order field specifies the order in which records should
  be processed when multiple NAPTR records are returned in response to
  a single query.
* The service field specifies the resolution protocol and resolution
  service(s) that will be available if the rewrite specified by the
  pattern field is applied. (Resolution services are operations such
  as N2R (URN to Resource), N2L (URN to URL), N2C (URN to URC), etc).
* The flags field contains modifiers that affect what happens in the
  next DNS lookup.
* Pattern provides the rules on how to rewrite the original URN into
  a new domain name for the next lookup. The pattern may be a simple
  replacement string (the usual case) or a regular expression substitution
  string. (It should be noted that the client applies all the substitutions
  and performs all lookups - this will not be handled in DNS itself).
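
As a rough illustration, a client might hold the four key fields in a
small record type such as the one below. The class name NaptrRecord and
the helper is_regexp are hypothetical; the initial '/' test reflects the
convention for regular expression patterns used in the examples of this
document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaptrRecord:
    """The four key NAPTR fields as a client might store them (a sketch)."""
    order: int      # processing order; lower values are processed first
    flags: str      # modifiers for the next DNS lookup, e.g. "s" or ""
    service: str    # protocol and resolution services, e.g. "rcds+N2C"
    pattern: str    # replacement domain name or substitution expression

    def is_regexp(self) -> bool:
        # A pattern with an initial '/' is a regular expression substitution.
        return self.pattern.startswith("/")
```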

Example 1
---------

Consider a URN that uses the DUNS namespace. DUNS numbers are
identifiers for approximately 30 million registered businesses around
the world, assigned and maintained by Dun and Bradstreet. The URN
might look like:

                 urn:duns:002372413:annual-report-1997

The first step in the resolution process is to find out about the DUNS
namespace. The namespace identifier, duns, is extracted from the URN,
prepended to urn.net, and the NAPTRs for duns.urn.net looked up. It might
return records of the form:

;;               order flags service            pattern
duns.urn.net NAPTR 10   "s"  "dunslink+N2L+N2C" "_dunslink._udp.isi.dandb.com"
                   10   "s"  "rcds+N2C"         "_rcds._udp.isi.dandb.com"
                   10   "s"  "http+N2L+N2C+N2R" "_http._tcp.isi.dandb.com"

This says that the provider offers the special dunslink
protocol, which will allow you to get either a URL or a URC (description)
of the resource. The Resource Cataloging and Distribution Service (RCDS)
could be used to get a URC for the resource, while HTTP could be used to
get a URL, URC, or the resource itself.

Assuming we don't know dunslink, but do know RCDS, the pattern field
says that our next lookup should be for _rcds._udp.isi.dandb.com. While
in this example the new domain name was provided, NAPTRs also allow
the domain name to be constructed from the URN using regular
expressions, which are denoted by an initial '/' in the pattern.

There are two important considerations to allow for delegation of
name resolution. First, NAPTR rewrites can potentially
be iterative. The last NAPTR is known as the "terminal NAPTR". Once we
have the terminal NAPTR, our next probe into the DNS will be for a SRV 
or A record instead of another NAPTR. The flags field is used to indicate
a terminal lookup. If it has a value of "s", the next lookup should
be for SRV RRs; a value of "a" denotes that A records should be sought.

Second, records MUST be processed in the order specified by
the order field. This allows administrators to say that "all
records matching this pattern go to server1, all others go to
server2".

Since our example RR specified the "s" flag, it was terminal. Our
next action is to look up SRV RRs for _rcds._udp.isi.dandb.com, which
will tell us hosts that can provide the necessary resolution service. 
That lookup might return:

;;                           Pref Weight Port Target
_rcds._udp.isi.dandb.com SRV 0    0      1000 defduns.isi.dandb.com
                         SRV 0    0      1000 dbmirror.com.au
                         SRV 0    0      1000 ukmirror.com.uk

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their RCDS server.
(The reader is referred to the SRV proposal [3] for the interpretation
of the fields above).

Since we know certain NAPTRs are terminal, we could include the
corresponding SRV RRs as additional info. Further, the SRV RRs could
include A records as additional info.  While this recursive provision
of additional information is not explicitly blessed in the DNS
specifications, it is not forbidden, and BIND does take advantage of it
[4].  This is a significant optimization. In conjunction with a long
TTL for *.urn.net records, the average number of probes to DNS for
resolving DUNS URNs would approach one.


Example 2 
---------

Consider a URN namespace based on the existing hierarchy
in DNS. We might use MIME Content-Ids for such names.
The URN might look like this:
  
        urn:cid:199606121851.1@gatech.edu
  
The first step in the resolution process is to find out about the CID
namespace. The namespace identifier, cid, is extracted from the URN,
prepended to urn.net, and the NAPTR for cid.urn.net looked up. It might
return records of the form:
 
;;                 order flags  service    pattern
cid.urn.net IN NAPTR 10   ""    ""         "/.+@([^@]+)/\1/i"
;;
 
We have only one NAPTR response, so ordering the responses is not a problem.
The pattern begins with "/", which signals that it is a regular expression.
We apply that regexp to the entire URN to see if it matches, which it does.
The \1 part of the regexp returns the string "gatech.edu". Since the
flags field is empty, NAPTR records for this string are looked up,
which could return something like:

;;                order flags service            pattern
gatech.edu IN NAPTR 10   "s"  "z39.50+N2L+N2C"   "_z3950._tcp.gatech.edu"
                    20   "s"  "rcds+N2C"         "_rcds._udp.gatech.edu"
                    30   "s"  "http+N2L+N2C+N2R" "_http._tcp.gatech.edu"
;;

Now the flags field tells us that these are the last NAPTR patterns we
should see, and after the rewrite (a simple substitution in this case) we
should look up SRV records to get information on the hosts 
that can provide the necessary service.  Assuming we know the Z39.50
protocol, our lookup might return:
 
;;                            Pref Weight Port Target
_z3950._tcp.gatech.edu IN SRV 0    0      1000 z3950.gatech.edu
                       IN SRV 0    0      1000 z3950.cc.gatech.edu
                       IN SRV 0    0      1000 z3950.uga.edu
;;

telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their Z39.50 server.

There is a significant caveat about the use of backslashes in DNS zone
files. DNS treats backslashes as the escape character so that '.' can
be escaped when necessary. This means that when a regular expression is
entered into the zone file, the backslashes must be escaped by another
backslash. For the case of the cid.urn.net record above, the regular
expression entered into the zone file should be "/.+@([^@]+)/\\1/i".
When the client code actually receives the record, the pattern will
have been converted to "/.+@([^@]+)/\1/i".
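
The doubling rule can be illustrated with a pair of small Python
helpers. Both names are hypothetical, and the reverse direction is a
simplification: it assumes a backslash only ever makes the following
character literal and does not handle decimal escapes.

```python
def to_zone_file(pattern: str) -> str:
    """Double every backslash so the pattern survives zone-file parsing."""
    return pattern.replace("\\", "\\\\")


def from_zone_file(text: str) -> str:
    """Undo the escaping: a backslash makes the next character literal.

    Simplified assumption: decimal escapes are not handled.
    """
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # keep the escaped character as-is
        out.append(ch)
    return "".join(out)


wire = r"/.+@([^@]+)/\1/i"        # what the client should receive
zone = to_zone_file(wire)         # what is entered in the zone file
```

Here zone holds the text with \\1 in place of \1, and
from_zone_file(zone) gives back the original pattern.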

Example 3
---------

As mentioned above, the NAPTR RR can also be used for URLs that have
already been assigned. Assume we have the URL for a very popular piece
of software that the publisher wishes to mirror at multiple sites around
the world:

     http://www.foo.com/software/latest-beta.exe

We extract the prefix, "http", and look up NAPTR records for
http.urn.net. This returns a record of the form:

;;                  order flags  service    pattern
http.urn.net IN NAPTR 10   ""    ""         "/.*\/\/([^\/:]+)/\1/i"

This expression returns everything after the first double slash and
before the next slash or colon. Backslashes are needed to escape the
forward slash since the forward slash character is what separates the
components of the substitution pattern. (Recall from the previous
example that in the zone file, this pattern actually needs to be
entered as "/.*\\/\\/([^\\/:]+)/\\1/i".)  Applying this pattern to the
URL extracts "www.foo.com". Looking up NAPTR records for that might
return:

;;                  order flags  service    pattern
www.foo.com IN NAPTR  10  "s"    "http+L2R" "_http._tcp.foo.com"
                      10  "s"    "ftp+L2R"  "_ftp._tcp.foo.com"

Looking up SRV records for _http._tcp.foo.com would return information
on the hosts that foo.com has designated to be its mirror sites. The
client can then pick one for the user. 


NAPTR RR Format
===============

The format of the NAPTR RR is given below. The DNS type code for
NAPTR is 104.

    Domain TTL Class Order Flags Service Pattern

where:

Domain
       The domain name this resource record refers to.
TTL
       Standard DNS Time To Live field
Class
       Standard DNS meaning
Order
       Specifies the order in which NAPTR records are to be processed.
       This is similar to the preference field in an MX record, and is
       used so domain administrators can delegate parts of their URI
       namespaces to different servers.

       When a client receives multiple NAPTR RRs, it first eliminates
       any which it cannot handle. They might require an unknown protocol
       or not provide any known resolution services. After this, the
       remaining records should be sorted according to the "order" field.
       Once an order for the records has been established, they should
       be processed in that order.
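
       The filtering and sorting step can be sketched in Python as
       shown below. The function name processing_order, the tuple
       layout, and the can_handle callback are assumptions made for
       illustration. Shuffling records that share the same order value
       follows the pseudocode and notes later in this document.

```python
import random
from itertools import groupby


def processing_order(records, can_handle, rng=random):
    """Return usable records sorted by order (illustrative sketch).

    records    -- iterable of (order, flags, service, pattern) tuples
    can_handle -- callback returning True if the client knows the
                  protocol and services a record requires
    """
    usable = [r for r in records if can_handle(r)]
    usable.sort(key=lambda r: r[0])          # sort by the order field
    result = []
    for _, group in groupby(usable, key=lambda r: r[0]):
        group = list(group)
        rng.shuffle(group)                   # randomize within equal order
        result.extend(group)
    return result
```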

Flags
       A string giving flags to control aspects of the rewriting. Flags
       are single characters from the alphabet [A-Z0-9]. The case of
       the alphabetic characters is not significant.

       At this time only two flags, "S" and "A", are defined. "S" means
       that the next lookup should be for SRV records instead of NAPTR
       records. "A" means that the next lookup should be for A records.
       The S and A flags are mutually exclusive.
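
       A client might map the flags field to the type of the next DNS
       query as in the following Python sketch. The function name
       next_lookup_type is hypothetical, and silently ignoring flags
       other than "S" and "A" is an assumption, since no other flags
       are defined at this time.

```python
def next_lookup_type(flags: str) -> str:
    """Return the RR type to query next: "SRV", "A", or "NAPTR"."""
    present = set(flags.upper())      # case is not significant
    if "S" in present and "A" in present:
        raise ValueError("the S and A flags are mutually exclusive")
    if "S" in present:
        return "SRV"
    if "A" in present:
        return "A"
    return "NAPTR"                    # non-terminal: keep rewriting
```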

Service 
       Specifies the resolution service(s) available down this rewrite
       path. May also specify the particular protocol that is used to
       talk with a resolver. If a protocol is specified in the service
       field, the rewrite will almost certainly be terminal, although the
       flags field should still be considered the authoritative source of
       information on that subject.

       The service field may take any of the values below (using the
       Augmented BNF of RFC 822 [5]):

           service_field = [ [protocol] *("+" rs) ]
           protocol      = "RCDS" / "HTTP"
           rs            = "N2L" / "N2Ls" / "N2R" / "N2Rs" / "N2C" / "L2R"

       i.e. an optional protocol specification followed by 0 or more
       resolution services. Each resolution service is indicated by
       an initial '+' character.

       Note that the empty string is also a valid service field. This
       will typically be seen at the top levels of a namespace, when it
       is impossible to know what services and protocols will be offered
       by a particular publisher within that name space.

       At this time the known protocols are RCDS and HTTP. More
       will be allowed later. At this time the allowed service requests
       are:
             N2L  - Given a URN, return a URL
             N2Ls - Given a URN, return a set of URLs
             N2R  - Given a URN, return an instance of the resource.
             N2Rs - Given a URN, return multiple instances of the resource,
                    typically encoded using multipart/alternative.
             N2C  - Given a URN, return a collection of meta-information on
                    the named resource. The format of this response is the
                    subject of another document.
             L2R  - Given a URL, return the resource.
       The actual format of the service request and response will be
       determined by the resolution protocol, and is the subject for other
       documents.
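
       Splitting a service field into its protocol and resolution
       services could look like the Python sketch below. The function
       name parse_service is hypothetical. The sketch deliberately does
       not check names against the lists above, because the examples in
       this document also use protocols such as dunslink and z39.50.

```python
def parse_service(field: str):
    """Split a service field into (protocol, [resolution services]).

    The protocol is None when the field starts with '+' or is empty.
    """
    if field == "":
        return None, []               # the empty string is valid
    protocol, *services = field.split("+")
    if any(s == "" for s in services):
        raise ValueError("empty resolution service in %r" % field)
    return (protocol or None), services
```

       For example, "http+N2L+N2C+N2R" yields the protocol "http" and
       the services N2L, N2C, and N2R.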

Pattern
       Rules on how to obtain a new domain name to lookup, along with
       modifiers. The pattern may be a simple replacement domain name,
       or it may be a sed(1)-style substitution expression that is applied
       to the URN. A couple of examples:

         "lanl.gov"
               Simple substitution pattern. The next DNS lookup will be
               for the domain name "lanl.gov".

         "/^.*inet:([^:]+):*/\\1/i"
               Regular expression replacement. The URN will be searched
               from the beginning for the pattern "inet:". All characters
               after that colon, up to but not including the next colon,
               will form the next domain name to be looked up.  The /i
               flag says that the match will be case-insensitive, so
               IneT, Inet, INET, ... would all be matched. The new
               search key would also be normalized to lower case.
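
       One possible client-side rewrite function is sketched below in
       Python. It is an approximation under stated assumptions: the
       function name rewrite mirrors the pseudocode later in this
       document, Python's re syntax stands in for whatever regular
       expression dialect an implementation adopts, and only the "i"
       modifier is interpreted. A delimiter that follows an escaped
       backslash is not handled.

```python
import re


def rewrite(uri: str, pattern: str):
    """Return the next domain name, or None if a regexp does not match."""
    if not pattern.startswith("/"):
        return pattern                        # simple replacement string
    # Split "/regexp/replacement/flags" on slashes not preceded by a backslash.
    parts = re.split(r"(?<!\\)/", pattern)
    if len(parts) != 4:
        raise ValueError("malformed substitution expression: %r" % pattern)
    _, regexp, replacement, flags = parts
    ignore_case = "i" in flags
    match = re.search(regexp, uri, re.IGNORECASE if ignore_case else 0)
    if match is None:
        return None
    result = match.expand(replacement)        # substitutes \1 and friends
    # With /i the new search key is normalized to lower case.
    return result.lower() if ignore_case else result
```

       Applied to the patterns as the client receives them (single
       backslashes), this gives "gatech.edu" for the cid example and
       "www.foo.com" for the http example earlier in this document.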

Advice to domain administrators:
================================

Beware of regular expressions. Not only are they a pain to get
correct on their own, but there is the previously mentioned interaction
with DNS. Any backslashes in a regexp must be entered twice in a zone
file in order to appear once in a query response. More seriously, the
need for double backslashes has probably not been tested by all
implementors of DNS servers.

The order field should be used to control the evaluation order a client
follows in order to properly determine how the namespace has been
delegated. For example, if you want all URIs beginning with the prefix
P to go to server S, and all other URIs to go to server S2, you should
have two NAPTR records, and the one specifying the prefix should have the
smaller value for order so that it is checked first.

Administrators are encouraged to provide SRV records and A records as
additional information in terminal NAPTR records.


Usage
=====

Pseudocode for a client using NAPTRs is given below:

    //
    // findResolver(URN)
    // Given a URN, find a host that can resolve it.
    //
    findResolver(string URN) {
      sprintf(key, "%s.urn.net", extractNS(URN));
      terminal = false;
      do {
        rewriteflag = false;
        if (key has been seen) {
          quit with a loop detected error
        }
        add key to list of "seens"
        records = lookup(type=NAPTR, key); // get all NAPTR RRs for 'key'

        sort naptr records by order, deleting those requiring us to
                use unknown protocols or services. If multiple records
                have the same "order", arrange them randomly in the group.
        foreach ( naptr record ) {  // in order of preference
          newkey = rewrite(URN, naptr[j].pattern);
          if (newkey) { // If there was a rewrite, go with it
            // Was rewrite terminal? (strcasecmp returns 0 on a match.
            // The "A" flag, which calls for an A lookup, is not shown.)
            if (strcasecmp(naptr[j].flags, "S") == 0) {
               terminal = true;
               services = naptr[j].service;
               srvs = any SRV RRs returned as additional info
            }
            key = newkey;
            rewriteflag = true;
            break;  // use the first record that produces a rewrite
          }
        }
      } while (rewriteflag && !terminal);


      if (!srvs) { // No SRVs came in as additional info, look them up
        srvs = lookup(type=SRV, key);
      }

      sort SRV records by preference, weight, ...
      foreach (SRV record) { // in order of preference
        try contacting srv[j].target using the protocol and one of the
            resolution service requests from the "services" field of the
            last NAPTR record.
        if (successful)
          return (target, protocol, service);
          // Actually we would probably return a result, but this
          // code was supposed to just tell us a good host to talk to.
      }
      die with an "unable to find a host" error;
    }
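
The NAPTR part of the loop above can also be written as a small runnable
Python sketch. It is a simplification under stated assumptions: DNS
lookups and pattern rewriting are passed in as callbacks, the names
find_terminal, demo_table, demo_lookup, and demo_rewrite are
hypothetical, the max_steps limit is an added safeguard, and the random
arrangement of records with equal order is omitted for brevity even
though a conforming client must perform it. The sample records are
adapted from Example 2, with a simple replacement string at cid.urn.net.

```python
def find_terminal(uri, first_key, lookup_naptr, rewrite, max_steps=16):
    """Follow NAPTR rewrites until a terminal record is reached.

    lookup_naptr(key) returns (order, flags, service, pattern) tuples,
    already filtered to records the client can use.
    rewrite(uri, pattern) returns the new key, or None if no match.
    Returns (key, flags, service) for the terminal rewrite.
    """
    key = first_key
    seen = set()
    for _ in range(max_steps):
        if key.lower() in seen:
            raise RuntimeError("loop detected at %r" % key)
        seen.add(key.lower())
        records = sorted(lookup_naptr(key), key=lambda r: r[0])
        for order, flags, service, pattern in records:
            new_key = rewrite(uri, pattern)
            if new_key is None:
                continue                      # pattern did not match
            if flags.upper() in ("S", "A"):   # terminal rewrite
                return new_key, flags.upper(), service
            key = new_key
            break                             # look up NAPTRs for new key
        else:
            raise LookupError("no usable NAPTR record at %r" % key)
    raise RuntimeError("too many rewrites")


# Hypothetical in-memory data standing in for real DNS answers.
demo_table = {
    "cid.urn.net": [(10, "", "", "gatech.edu")],
    "gatech.edu": [(20, "s", "rcds+N2C", "_rcds._udp.gatech.edu"),
                   (10, "s", "z39.50+N2L+N2C", "_z3950._tcp.gatech.edu")],
}


def demo_lookup(key):
    return demo_table.get(key.lower(), [])


def demo_rewrite(uri, pattern):
    # Simple replacement strings only; regexp patterns never match here.
    return None if pattern.startswith("/") else pattern


result = find_terminal("urn:cid:199606121851.1@gatech.edu", "cid.urn.net",
                       demo_lookup, demo_rewrite)
```

With the sample data, result names _z3950._tcp.gatech.edu as the domain
for the following SRV lookup, because that record has the lowest order.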


Notes:
======
  -  The "urn:" prefix is a matter of religious controversy. Client
     code should handle the cases when it is and is not present.
     Similarly, if regular expressions are used in NAPTR records, they
     should be immune to the presence or absence of "urn:".
  -  A client must examine all the RRs in a reply, and must process them
     according to the value of the order field.
  -  If a record at a particular order matches the URI, but the client
     doesn't know the specified protocol and service, the client may
     continue to examine records that have the same order. The client
     shall not consider records with a higher value of order. This is
     necessary to make delegation of portions of the namespace work.
  -  When multiple RRs have the same "order", the client may use
     preferred resolution services or protocols to weight the random
     selection of a match. Some form of randomization MUST be performed
     to be considered a conforming implementation.
  -  If the lookup after a rewrite fails, clients are strongly encouraged
     to report a failure, rather than backing up to pursue other rewrite
     paths.
  -  Note that SRV RRs impose additional requirements on clients.


Acknowledgements:
=================

The authors would like to thank Keith Moore for all his consultations
during the development of this draft. We would also like to thank Paul
Vixie for his assistance in debugging our implementation.


References:
===========

[1] RFC-1737 "Functional Requirements for Uniform Resource Names", Karen
    Sollins and Larry Masinter, Dec. 1994.

[2] draft-daigle-urn-framework-01.txt "A Uniform Resource Naming
    Framework", Leslie Daigle and Patrik Faltstrom, June, 1996.

[3] draft-gulbrandsen-dns-rr-srvcs-03.txt  " A DNS RR for specifying the
    location of services (DNS SRV)",  Arnt Gulbrandsen and Paul Vixie,
    March 1996.

[4] Paul Vixie, personal communication.

[5] RFC-822 "Standard for the Format of ARPA Internet Text Messages",
    Dave H. Crocker, August 1982.  

[6] Keith Moore, personal communication.


Security Considerations
=======================
  The use of "urn.net" as the registry for URN namespaces is subject to
  denial of service attacks, as well as other DNS spoofing attacks.

  The rewrite rules make identifiers from other namespaces subject to
  the same attacks as normal domain names. Since they have not been
  easily resolvable before, this may or may not be considered a problem.


Author Contact Information:
===========================

Ron Daniel
Los Alamos National Laboratory
MS B287
Los Alamos, NM, USA, 87545
voice:  +1 505 665 0597
fax:    +1 505 665 4939
email:  rdaniel@lanl.gov


Michael Mealling
Office of Information Technology, Network Services
Georgia Institute of Technology
Atlanta, GA, USA 30332-0730
voice:  (404) 894-1712
fax:  (404) 894 9548
michael.mealling@oit.gatech.edu

