INTERNET-DRAFT                                              Lewis Girod
Date: March 13, 1998                                        Benjie Chen
Expires: September 18, 1998         MIT Laboratory for Computer Science
draft-girod-w3-id-res-ext-00.txt                         Henrik Frystyk
                                              World Wide Web Consortium
                                                           John Mallery
                                 MIT Artificial Intelligence Laboratory

              WIRE - W3 Identifier Resolution Extensions

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Note: This is a very rough draft suitable only for experimental implementations. It is expected to change in the near future.

Abstract

WIRE extends HTTP with a new type of redirect response that permits a resolver to explicitly delegate a resolution to other resolvers and protocols. WIRE is an effort to make delegation more explicit, redirection more flexible, and resolution processes more efficient through the use of hints. This document defines WIRE and describes the expected behaviors of resolvers and clients using WIRE.

WIRE is an extension of the HyperText Transfer Protocol (HTTP), and is intended to be compatible with HTTP/1.0 and above [4][5]. WIRE encourages use of long-lived URIs and at the same time supports protocol evolution without having to change currently deployed URIs or URI schemes.
The extension is based on a simple URI resolution model that allows an application to dynamically request metadata describing where and how to access a resource. The model can use any generic metadata description language (e.g. RDF). Because the metadata itself is treated as a first-class resource, metadata resources are no different from any other resource on the Web.

1 Introduction

1.1 Terminology

A resolver is an application that translates a URI into another URI or, if it is the authoritative resolver, directly into the requested resource.

A resolution process is the sequenced, possibly nested, set of operations performed by one or more resolvers that eventually results in an entity being generated and returned to the requestor.

1.2 Resolution Model

URI resolution models have been a perennial source of confusion. In this section we present a new, clearer resolution model.

The WIRE model of URI resolution is fairly simple: to resolve a URI, ask the right server to resolve it, supplying any appropriate arguments or hints. The following diagram and discussion show how this works for the "GET" method (other methods apply the same resolution process, differing only in the processing at the origin server):

                              _____________________
 GET                         |                     |
 Standard URI[?argument] --->|   Right Resolver?   | Yes ---> HTTP
 [hint: "hint"]              |_____________________|          Response
                       ^            | No     | Error
                       |            |        |
                       |            v        v
                       +------- 350 Redirect 400 Bad Request

If the location of the "right" resolver is unknown, the client sends a request to a favored or local resolver, optionally including a hint.
This resolver will do some amount of work on the client's behalf, and will return one of the following responses:

 * A standard HTTP response, indicating that the resolver is authoritative for that URI and that resolution is complete

 * A 350 redirect, indicating that the resolution process is not done yet

 * A 400 response, indicating that a failure occurred in the resolution infrastructure, and no further authoritative data can be obtained

A 200 response returns a view of the URI, or in the case of a query containing an argument, a view resulting from a method call on the resource identified by the URI.

A 350 response returns hint information intended to help the client continue the resolution process. The hint information might describe resolvers that can further resolve the URI in question, or it might list alternate URIs that also resolve to the resource named by the original URI.

To understand this model it is important to have a broader understanding of URI resolution. First, completion of resolution is largely in the eye of the beholder; the data one client resolves might be interpreted by another client as metadata that can be used to continue resolving the URI. In one sense a 350 redirect from an initial attempt at resolution is a view of the URI; in another sense it is simply metadata containing a hint for continued resolution at another site. When thinking about URI resolution it is important to keep this contextual information in mind. Second, the final step of resolution is performed by the resource itself, although in current implementations this activity is coordinated by the "server" process.

A very important semantic difference between the 350 response and other HTTP responses has to do with the trust model for a 350 response. Unlike other HTTP responses, the content of metadata contained within a 350 response is the responsibility of the resolver that serves it, and is not in any way linked to the origin server for the resource it describes.
If the resolvers are not trusted, there is no guarantee that the metadata is accurate. The efforts to specify signed RDF metadata should provide methods for correcting these deficiencies in the future.

1.3 Problems this model does NOT solve

Resolution using WIRE does not guarantee that a resolution process converges to a resolved resource. It does not guarantee that the resource returned is the correct or authoritative resource. It does not guarantee that a resolver understands every URI, nor that the authoritative resolver for any given URI exists or can be located. It does not guarantee that URIs are maintained in a persistent fashion or that URIs consistently resolve to resources that are conceptually equivalent.

WIRE is designed to enable the construction of resolution systems that can provide some or all of these guarantees over specific namespaces.

1.4 Guide to this Document

This document is organized as follows: Section 2 describes the syntax of WIRE; Section 3 describes the semantics and behaviors of resolvers, clients and proxies; Section 4 gives a short example; Section 5 discusses the merit of caching; and Section 6 discusses security issues.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [6].

In grammar definitions, this document uses the same parsing constructs as RFC 1630 [1] and RFC 2068 [5]. In particular, the non-terminal uri is defined in [1][4][5].

2 Syntax of Extended Headers and Responses

This section introduces the headers and the response code defined by WIRE. This specification attempts to define minimal syntactical requirements for WIRE conformance.

2.1 Hints

URI resolution can be streamlined through the use of resolution hints. Resolution hints are encoded as URIs.
Apart from the URI encoding rules, the syntax and semantics of resolution hints are specific to the resolution system indicated by the hint. For the purpose of caching and loop avoidance, two hints are lexically equivalent if they are octet-by-octet equal after applying URI normalization rules.

Resolution hints are intended to be processed by both clients and resolvers during the resolution process. To the client, hints convey resolver locations and supported protocols and type models. To the resolver indicated by the hint, hints convey client preferences and tokens important to the resolution process. For example, it is essential that URN resolvers receive a hint that includes a token describing a starting point for resolution. This allows the resolver to skip delegation steps that would otherwise be necessary to obtain the hint.

2.2 Extended Request Headers

2.2.1 Optional Header

The Optional request-header allows a client to declare that it supports WIRE. This header is optional. The presence of this header indicates that the resolver may return WIRE-specific responses to the client. In the absence of this header the resolver must assume the client is not WIRE compliant. In this case the resolver may complete the resolution of the URI if it can do so without sending back WIRE-specific responses, or it may reject the request with a 400 response code. Proxy behaviors are discussed in more detail in section 3.3. See [10] for a description of the use of the Optional header.

The Optional header has the following grammar:

   Optional-Header        = "Optional" ":" WIRE-specification-URN
   WIRE-specification-URN = "urn:specs:WIRE/0.0"

2.2.2 Resolution-Hint header

The Resolution-Hint request-header can be used by a client to supply a resolution hint to the resolver. It has the following grammar:

   Resolution-Hint-Header = "Resolution-Hint" ":" hint
   hint                   = quoted-absolute-uri
   quoted-absolute-uri    = "\"" absolute-uri "\""

The Resolution-Hint header is optional.
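As an illustration only, the two request headers defined above might be assembled as in the following Python sketch. The function names build_request and quote_uri are hypothetical and not part of WIRE. The sketch assumes the Optional value is sent in double quotes, as in the example of section 4, and it assumes the hint is already a valid absolute URI; it does not validate URI syntax.

```python
from typing import Optional

WIRE_SPEC_URN = "urn:specs:WIRE/0.0"


def quote_uri(uri: str) -> str:
    """Wrap a URI in double quotes (the quoted-absolute-uri form)."""
    if '"' in uri:
        raise ValueError("a URI placed in a header must not contain a double quote")
    return f'"{uri}"'


def build_request(request_uri: str, host: str,
                  hint: Optional[str] = None,
                  wire_aware: bool = True) -> str:
    """Return an HTTP/1.0 GET request as text (hypothetical helper)."""
    lines = [f"GET {request_uri} HTTP/1.0", f"Host: {host}"]
    if wire_aware:
        # Declares WIRE support so the resolver may answer with a 350.
        lines.append(f"Optional: {quote_uri(WIRE_SPEC_URN)}")
    if hint is not None:
        # The Resolution-Hint header is optional; send it only when known.
        lines.append(f"Resolution-Hint: {quote_uri(hint)}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_request without a hint argument simply leaves out the Resolution-Hint header.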
If the Resolution-Hint header is absent from a request, the resolver may either attempt to resolve the URI without a hint (in many cases this may slow down the resolution process) or reject the request with a 400 response code.

2.3 Extended Response Code - 350

In order to support distribution of resolution authority, WIRE includes the 350 response, a new response code intended to express delegation and redirection in the resolution framework. To delegate or redirect the resolution of a URI, a WIRE resolver may return one or more alternate URIs, each bound to zero or more resolution hints. The Resolver-Location header in a 350 response encodes this comma-delimited set of bindings. Within each binding, the list of hints is delimited by semicolons.

The Resolver-Location header has the following grammar:

   Resolver-Location-Header = "Resolver-Location" ":" binding *("," binding)
   binding                  = alternate-uri *(";" hint)
   alternate-uri            = "\"\"" | quoted-relative-uri | quoted-absolute-uri
   hint                     = quoted-absolute-uri
   quoted-relative-uri      = "\"" relative-uri "\""

As shown above, the alternate URI specified in the binding may appear as an absolute URI or in two other forms. If the binding applies to the original request-URI, the empty string ("") may be used instead of repeating the original request-URI. If a binding applies to a URI that can be expressed relative to the original request-URI, a relative URI may be quoted in place of an absolute URI.

A 350 response code may also include an entity containing additional metadata relevant to the request-URI.

Note that the data in the Resolver-Location header and the optional entity in a 350 response follow the amended trust model described in section 1.2.

3 Semantics and Behavior of WIRE

3.1 Resolver Behaviors and Response Semantics

When a resolver receives a resolution request on a URI, the resolver should attempt to resolve the URI, making use of any supplied hint information.
If the resolver is able to resolve the request-URI, the resolver processes the request as a normal HTTP server would.

If the resolver fails to make progress towards the resolution of the URI, a 4xx error code may be returned. The 404 error code should only be returned if the resolver has authority over the request-URI and that URI is not bound to a resource. Other 4xx codes indicate a temporary or permanent failure in the resolution infrastructure.

If the resolver makes progress towards resolution of the request-URI, a 350 response may be used to redirect resolution to alternate URIs or to delegate the resolution of the URI to alternate resolvers. Redirection and delegation information is conveyed via the Resolver-Location header defined in section 2.3. The client may then attempt to use supplied hints to resolve any of the alternate URIs listed in the header. The encoding and semantics of the hints are defined by individual resolution frameworks.

The hints supplied in a 350 response can specify a collection of alternate protocols or resolution systems. New types of hints, referring to new or location-specific protocols, can be phased in along with more standard hints as fallback options. This can add scalability and flexibility to the resolution framework by providing information about alternate resolution mechanisms to clients that are aware of those alternate mechanisms.

A resolver may receive both standard HTTP requests and requests from WIRE-aware clients. Clients indicate awareness of WIRE by including an Optional request header with the URI of the WIRE specification. A WIRE-compliant resolver that receives a request without an appropriate Optional header may either reject the request with a 400 response code or proxy the resolution of the URI on behalf of the client. In the latter case, the resolver acts as a delegation proxy (section 3.3).
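The response choices described in this section can be sketched as a small decision procedure. The following Python is a minimal illustration under stated assumptions, not a definitive implementation: the class SketchResolver, its prefix-matching notion of authority and delegation, and the in-memory tables are all hypothetical. Of the two behaviors permitted for clients that omit the Optional header, the sketch picks rejection with a 400 rather than acting as a delegation proxy.

```python
from typing import Dict, List, Tuple

WIRE_SPEC_URN = "urn:specs:WIRE/0.0"
Response = Tuple[int, Dict[str, str], str]  # (status, headers, entity)


class SketchResolver:
    """Hypothetical resolver that chooses between 200, 404, 350 and 400."""

    def __init__(self, bound: Dict[str, str], authority_prefix: str,
                 delegations: Dict[str, List[str]]) -> None:
        self.bound = bound                        # request-URI -> entity
        self.authority_prefix = authority_prefix  # namespace this resolver owns
        self.delegations = delegations            # URI prefix -> hint URIs

    @staticmethod
    def is_wire_aware(headers: Dict[str, str]) -> bool:
        # Accept the value with or without surrounding double quotes.
        return headers.get("Optional", "").strip().strip('"') == WIRE_SPEC_URN

    def handle(self, request_uri: str, headers: Dict[str, str]) -> Response:
        if request_uri in self.bound:
            # Resolution is complete: answer like a normal HTTP server.
            return 200, {}, self.bound[request_uri]
        if request_uri.startswith(self.authority_prefix):
            # Authoritative for the URI, but it is not bound to a resource.
            return 404, {}, ""
        for prefix, hints in self.delegations.items():
            if request_uri.startswith(prefix):
                if not self.is_wire_aware(headers):
                    # Assumption: reject instead of proxying the resolution.
                    return 400, {}, ""
                # The empty string "" stands for the original request-URI.
                binding = ";".join(['""'] + [f'"{h}"' for h in hints])
                return 350, {"Resolver-Location": binding}, ""
        # No progress towards resolution could be made.
        return 400, {}, ""
```

With a delegation table that maps the prefix urn:cid: to the hint used in section 4, a WIRE-aware request for a urn:cid: URI yields a 350 whose Resolver-Location value has the same form as in that example.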
For caching purposes, a 350 response should also include information on the life-time of delegation/redirection responses. This information can be encoded in one of the HTTP caching headers ([4][5]).

3.2 Client Behaviors

When a client makes a resolution request to a resolver, the resolver cannot guarantee that the URI can be resolved. If the URI is within the authority of another resolver, a 350 response may be returned to the client. To continue the resolution process, the client should:

 1. Parse the Resolver-Location header in the 350 response.

 2. Select one of the alternate URIs, and select one of the resolvers specified by the hints bound to that URI.

 3. Initiate a new request to the specified resolver, using the protocol and address information specified in the hint.

 4. Request the selected URI, optionally including any appropriate hint information in the Resolution-Hint request header. Some resolvers require that the hint be forwarded while others do not.

Whenever new resolution hints are returned to a client, there is the possibility of a "delegation loop", in which a new hint is lexically equivalent to a hint previously applied in the resolution of the same request-URI. A "redirection loop" can occur if an alternate URI is lexically equivalent to a previous URI in a sequence of redirections.

To detect delegation loops, a client should keep for each request-URI a history of resolution hints previously applied, and should lexically compare new resolution hints to the history. Detection of redirection loops should already be performed by web clients.

These methods do not guarantee the detection of delegation or redirection loops. Malicious resolvers can cause delegation loops by returning resolution hints that are semantically equivalent to a prior hint but are not lexically equivalent to any previous hint. Loop detection is only guaranteed to work for resolvers that recognize only a small set of distinct resolution hints.
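Step 1 of the client procedure and the delegation-loop check can be sketched together in Python. This is an illustration under assumptions rather than a reference implementation: parse_resolver_location and HintHistory are hypothetical names, the parser assumes quoted URIs never contain a double quote, and the loop check compares hints as plain strings, so any URI normalization is assumed to have been applied beforehand.

```python
import re
from typing import Dict, List, Set, Tuple

# One quoted URI, with optional surrounding whitespace.
_QUOTED = re.compile(r'\s*"([^"]*)"\s*')


def parse_resolver_location(value: str) -> List[Tuple[str, List[str]]]:
    """Split a Resolver-Location value into (alternate-uri, [hints]) bindings."""
    bindings: List[Tuple[str, List[str]]] = []
    pos = 0
    separator = ","  # the start of the value begins the first binding
    while True:
        match = _QUOTED.match(value, pos)
        if match is None:
            raise ValueError(f"expected a quoted URI at offset {pos}")
        if separator == ",":
            bindings.append((match.group(1), []))   # a new alternate URI
        else:
            bindings[-1][1].append(match.group(1))  # a hint for that URI
        pos = match.end()
        if pos == len(value):
            return bindings
        separator = value[pos]
        if separator not in ",;":
            raise ValueError(f"unexpected {separator!r} at offset {pos}")
        pos += 1


class HintHistory:
    """Per-request-URI record of hints already applied (lexical comparison)."""

    def __init__(self) -> None:
        self._seen: Dict[str, Set[str]] = {}

    def is_loop(self, request_uri: str, hint: str) -> bool:
        """Record the hint; return True if it was applied to this URI before."""
        seen = self._seen.setdefault(request_uri, set())
        if hint in seen:
            return True
        seen.add(hint)
        return False
```

Because separators are only recognized outside the quoted strings, a hint that itself contains a semicolon or comma is kept intact. An empty alternate URI in a parsed binding stands for the original request-URI, so a client would substitute the request-URI before issuing the next request.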
Clients that wish to receive 350 responses must include the Optional header with the URN of the WIRE specification in each resolution request. Older clients and clients that do not want to receive delegation responses may omit the Optional header.

3.3 Proxying WIRE Functionalities

WIRE extends the normal HTTP proxy behavior and additionally defines a new behavior called a delegation proxy. In both of these cases normal HTTP proxy semantics should be followed in addition to the WIRE-specific semantics.

3.3.1 HTTP/WIRE Proxy Behavior

A WIRE resolver used as a proxy may proxy non-local resolution requests based on local policy decisions. If performing the resolution is contrary to policy, an appropriate error response should be returned. Otherwise, it should attempt to resolve the URI with or without a resolution hint, and return the results of that effort to the client.

If no resolution hint is supplied, the resolver should attempt to resolve the URI based on the text of the URI and global information. If the resolution hint supplied indicates a remote resolver, the proxy should initiate a request to that resolver.

3.3.2 Delegation Proxy Behavior

If the requesting client is not WIRE compliant, it cannot parse a 350 delegation/redirection response. In this case, a delegation proxy may choose to perform the entire resolution process on behalf of the client and pass the result of the resolution back to the original client. A delegation proxy should follow the guidelines for a WIRE client when performing the resolution. The first result entity that conforms to standard HTTP should be returned to the client.

A client requests delegation proxy behavior by omitting the Optional request header indicating WIRE. If a resolver's policy prohibits this behavior, or if at any time the resolver decides to abort the resolution, a 400 response should be returned to the client.

4 Example

A client wishes to resolve the URI urn:cid:9802032044@thebe.lcs.mit.edu.
It sends a resolution request to http://urn.org/

   GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.0
   Host: urn.org
   Optional: "urn:specs:WIRE/0.0"

The resolver at urn.org determines that the URI can be resolved using another resolver, and sends back

   HTTP/1.0 350 Resolution Delegated
   Resolver-Location: "";"res-hint:http://thebe.lcs.mit.edu/;scope=urn:cid:"

To continue the resolution process, the client makes another resolution request, this time to http://thebe.lcs.mit.edu/

   GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.0
   Host: thebe.lcs.mit.edu
   Optional: "urn:specs:WIRE/0.0"
   Resolution-Hint: "res-hint:http://thebe.lcs.mit.edu/;scope=urn:cid:"

The resolver at thebe.lcs.mit.edu is the authoritative resolver for the URI. It returns the resolution result

   HTTP/1.0 200 OK

5 Caching

The exact behaviors of URI resolution differ from resolution framework to resolution framework. For example, ignoring DNS lookups, URL resolutions are usually a one-step process, whereas URN resolutions may take a few delegations to complete. We hope to realize significant performance gains, through the caching of 350 responses, for URI resolution frameworks that require more than one delegation.

To cache a 350 response, both the request URI and the value of the Resolution-Hint header that was used in the request must be included in the key that maps to the cached entity. Standard HTTP headers that restrict or control caching must be heeded.

The need for caching 350 responses stems from the delegation nature of some URI resolution frameworks. Since the discovery process required to refresh a cached non-350 HTTP response may be significant, caching the 350 responses preceding the non-350 HTTP response may result in significant performance improvements, depending on cache expiration times.

One optimization when caching 350 responses might retain only the response that has resulted from the longest sequence of delegations, that is, discarding intermediate 350 responses.
Applying this optimization, subsequent requests for the same URI may skip directly to the most specific delegation. If this optimization is applied, the most restrictive caching requirements for any response in the sequence must be applied to the most specific sequence.

A similar optimization would be to cache all the intermediate 350 responses, and for each subsequent request, re-use a non-expired delegation response with the longest sequence of delegations. Applying this optimization, most subsequent requests can skip many delegation steps.

6 Security Considerations

WIRE inherits the HTTP security model at the transport level. That does not, however, imply the safety and authenticity of the resolution metadata or resolved data. One possible approach to guarantee safety and authenticity is to include signed metadata. Those problems require further study and are beyond the scope of this document.

Possible conflicts between the HTTP trust model and the 350 response raise security concerns, as mentioned in section 1.2. In short, 350 responses without security extensions are responses from untrusted resolvers. Measures such as loop-avoidance should be applied to detect and prevent denial-of-service attacks.

Implementations of WIRE should follow the security restrictions of the environment the resolver operates in. For example, resolvers on firewalls operating under both single-step and delegation proxy behaviors may be required to filter out resolution requests from outside the firewall that intend to use an internal resource. Such requests, in most cases, are not allowed. However, it is essential that such proxying resolvers forward resolution requests from internal clients to the outside world, unless an organization intends to mirror resolution services over all URI namespaces internally.

7 Acknowledgments

The motivation leading to this work stemmed from a few directions.
John Mallery's experience with implementing the PDI namespace for URNs indicated that the THTTP ([8]) spec did not adequately cover error messages and redirection. Discussions with John Mallery and Henrik Frystyk Nielsen led to the initial formulation of this protocol specification, in an effort to rework THTTP into an official HTTP extension. Henrik Frystyk Nielsen also provided invaluable ideas and feedback on modifying the original design to fit into the generic ideas of URI resolution. Karen Sollins and Dorothy Curtis have also provided many insightful ideas and feedback on the general resolution architecture and on this document.

8 Authors' Addresses

   Lewis Girod
   Benjie Chen
   MIT Laboratory for Computer Science
   545 Technology Square
   Cambridge, MA 02139, USA
   Email: {girod,benjie}@lcs.mit.edu

   Henrik Frystyk Nielsen
   Technical Staff, World Wide Web Consortium
   MIT Laboratory for Computer Science
   545 Technology Square
   Cambridge, MA 02139, USA
   Email: frystyk@w3.org

   John Mallery
   MIT Artificial Intelligence Laboratory
   545 Technology Square
   Cambridge, MA 02139, USA
   Email: jcma@ai.mit.edu

References

   1. RFC 1630 "Uniform Resource Identifiers in WWW", T. Berners-Lee, June 1994.

   2. RFC 1737 "Functional Requirements for Uniform Resource Names", K. Sollins, L. Masinter, December 1994.

   3. RFC 1738 "Uniform Resource Locators", T. Berners-Lee, L. Masinter, M. McCahill, December 1994.

   4. RFC 1945 "Hypertext Transfer Protocol -- HTTP/1.0", T. Berners-Lee, R. Fielding, H. Frystyk, May 1996.

   5. RFC 2068 "Hypertext Transfer Protocol -- HTTP/1.1", R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee, January 1997.

   6. RFC 2119 "Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997.

   7. RFC 2168 "Resolution of Uniform Resource Identifiers using the Domain Name System", R. Daniel, M. Mealling, June 1997.

   8. RFC 2169 "A Trivial Convention for using HTTP in URN Resolution", R. Daniel, June 1997.

   9. RFC 2276 "Architectural Principles of Uniform Resource Name Resolution", K. Sollins, September 1997.

   10. Internet Draft draft-ietf-http-ext-mandatory-00.txt "Mandatory Extensions in HTTP", H. Frystyk Nielsen, P. Leach, S. Lawrence, March 1998.