Internet Engineering Task Force                                L. Daigle
Internet-Draft                                                       TCE
Intended status: Informational                                March 2015
Expires: August 31, 2015


              Internet Application Identifier Architecture
                     draft-daigle-appidarch-00.txt

Abstract

   This document outlines a general architecture for Internet
   applications, from the perspective of application identifiers. It
   provides a survey of past approaches, drawing out common elements and
   highlighting common traps and roadblocks.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 31, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Basic components of Application Identifier Architecture
   3.  Application Identifier Architectures in More Detail
     3.1.  System

Daigle                   Expires August 31, 2015                [Page 1]

Internet-Draft                App ID Arch                     March 2015

     3.2.  Identifiers
     3.3.  Identified
   4.  Survey of existing work
     4.1.  Domain Name System
     4.2.  World Wide Web
     4.3.  Classic URIs
     4.4.  IP addresses
   5.  Common design choices and challenges
     5.1.  Identifiers
     5.2.  Identified
   6.  Issues in (mis)using identifiers
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
   Author's Address

1.  Introduction

   This document posits a high-level architecture of Internet
   application identifier systems, and surveys IETF efforts to
   standardize Internet applications and services that use those
   identifier systems.

   Status of this revision: this is a very drafty -00 document. The
   hope and expectation is that it is enough to stimulate some thought
   and discussion to flesh out future documents.

2.  Basic components of Application Identifier Architecture

   There are 3 basic components of an Application Identifier
   Architecture:

   o  System

   o  Identifiers

   o  Identified

   The System is the context in which the identifiers are created and
   used, and in which they are intended to make sense. This is usually
   transparent, except when identifiers are used outside of this
   context, causing greater or lesser problems over time.
   This is discussed in more detail below.

   Identifiers are typically strings of bits or characters. They may
   have multiple representations.

   The concept of what is being Identified is also dependent on the
   System -- whether it's Internet hosts, services, documents, parts of
   documents, people or other actors from the physical world, their
   representatives in the Internet, etc.

3.  Application Identifier Architectures in More Detail

3.1.  System

   As noted above, the System is the context in which the identifiers
   make sense.

   In the Domain Name System, for example, the system initially
   consisted of the set of hosts attached to the Internet, and it has
   generalized to the set of addressable Internet services. These are
   organized into "domains", each of which is operated under the control
   of a single entity, and individual domains are completely independent
   of each other.

3.2.  Identifiers

   Identifiers may identify a thing that exists -- content, service,
   location -- or is posited to exist.

   Typical actions on identifiers include:

   o  "Minting" -- creation of an identifier, usually including
      association with the identified thing

   o  Transformation -- changing the bits (characters) of the identifier
      by some set of rules and/or to conform to some structure; relative
      or absolute

   o  Comparison -- of identifiers. Are 2 different identifiers the
      same? Identifying the same thing? Expressing relationship
      between things?

   o  Resolution -- using the identifier to access what it identifies

   o  Validation -- confirmation that the identifier conforms to the
      system's rules (syntax)

   o  Status check -- has the identifier been minted? Is it active
      within the system?

   o  Authentication -- confirmation that the identifier association is
      valid (as minted)

   o  Lookup -- some systems support lookup: finding identifier entries
      based on partial fragments, typically leading characters (bits)

   o  Search -- some systems support searches for identifiers based on
      fragments of the related data

   o  Subscribe -- subscribing to an identifier allows you to get
      periodic updates as to the state of the identifier/identified.

3.3.  Identified

   Identifiers can be associated with just about any level of concept,
   construct, network or software element, or entity in the physical
   world. The range of possible identified things is generally scoped
   by the System.

   From the perspective of application architectures, there are 4 levels
   of things that may be identified, and may have individual
   identifiers:

   o  Entity/resource -- the thing itself. For example, the published
      work "Moby Dick"

   o  Instance -- a specific copy of the thing, e.g., a copy of "Moby
      Dick".

   o  Properties -- the characteristics of the thing. The set of
      properties discussed is generally constrained by the System.

   o  Relationship -- an identifier may identify something as "part-of"
      a larger entity.

   There are actions that may be performed on the things identified:

   o  Assign properties -- associate values with properties of
      identified item

   o  Get properties

      o  Intrinsic -- e.g., format, number of words

      o  Applied -- e.g., director's cut

   o  Publish -- put copy somewhere

      o  First instance

      o  Cache/replica

   o  Get (a copy)

      o  Any copy

      o  Specific service

      o  Closest

      o  Cheapest

   o  Authenticate -- confirm (cryptographically) that the resource is
      genuinely the one expected / related to identifier

   o  Comparison

      o  Equivalence

   o  Send -- a reflection of "get"?

   o  Subscribe

   o  Search

4.  Survey of existing work

4.1.  Domain Name System

   The Domain Name System (DNS) was designed to provide identifiers to
   allow storage and retrieval of (sets of) properties associated with
   Internet hosts and services -- real and virtual.
   DNS identifiers are hierarchical, dot-separated labels, typically
   expressed as characters. Host names are a subset of domain name
   identifiers, with some restrictions on the permissible characters.

   o  "Minting" -- the authority for a domain can create labels within
      that domain. So-called "synthetic" domain labels are created
      dynamically.

   o  Transformation -- relative domain names can be understood within
      the context of a "search domain"

   o  Comparison -- domain names are matched label by label on an
      octet-by-octet basis, with ASCII letters compared
      case-insensitively (IDNs are not considered here)

   o  Resolution -- DNS resolution means "DNS lookup" -- using the DNS
      infrastructure to retrieve resource records associated with the
      domain name. Resolution can be tailored to retrieve particular
      types of resource records (e.g., A records, or AAAA records, or MX
      records)

   o  Validation -- any string that conforms to DNS syntax may be
      considered valid.

   o  Status check -- DNS does not distinguish between "inactive" and
      "not minted". That is, either every label in the hierarchical
      domain name is accessible in an authoritative domain server (in
      which case the domain name is "minted" and "active") or the DNS
      will return the value that it does not exist. (Not true in
      DNSSEC?)

   o  Authentication -- domain names are not authenticated (see below
      for DNSSEC).

   o  Lookup -- DNS resolution is lookup of domain names

   o  Search -- DNS does not support search (wildcards?)

   o  Subscribe -- N/A

   DNS identifies resource records. The resource records are themselves
   descriptions of Internet hosts, services, and other information
   stored in the DNS, but fundamentally the DNS identifier is for
   resource records.
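
   Three of the identifier actions listed above for the DNS
   (validation, comparison, and transformation of relative names) can
   be sketched without touching the network. The following is a
   minimal illustration, not a description of any resolver: the helper
   names are hypothetical, names are assumed to be ASCII (IDNs are out
   of scope, as above), comparison assumes the ASCII case-insensitive
   matching of RFC 1035, and the search-domain handling is deliberately
   simplified to "append one suffix unless the name ends in a dot".

```python
# Illustrative sketch only. Function names are hypothetical; they are not
# taken from the draft or from any DNS library. ASCII names are assumed.

MAX_LABEL_OCTETS = 63   # per-label length limit (RFC 1035)
MAX_NAME_OCTETS = 253   # usual limit on the dotted text form of a name


def split_labels(name: str) -> list[str]:
    """Split a dotted name into labels, ignoring a single trailing dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".") if name else []


def is_valid(name: str) -> bool:
    """Validation: check length rules only; the bare root name is rejected."""
    labels = split_labels(name)
    if not labels:
        return False
    if len(".".join(labels)) > MAX_NAME_OCTETS:
        return False
    return all(1 <= len(label) <= MAX_LABEL_OCTETS for label in labels)


def same_name(a: str, b: str) -> bool:
    """Comparison: label by label, ASCII letters compared case-insensitively."""
    return ([label.lower() for label in split_labels(a)]
            == [label.lower() for label in split_labels(b)])


def absolutize(name: str, search_domain: str) -> str:
    """Transformation: read a relative name in the context of a search domain."""
    if name.endswith("."):  # a trailing dot marks an already-absolute name
        return name
    return f"{name}.{search_domain.rstrip('.')}."


if __name__ == "__main__":
    print(absolutize("www", "example.com"))
    print(same_name("WWW.Example.COM.", "www.example.com"))
    print(is_valid("a" * 64 + ".example.com"))
```

   Note that none of these functions says anything about whether a name
   has been minted or is active; as the status-check item above
   observes, that can only be learned by attempting resolution.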
   o  Entity/resource -- a set of resource records

   o  Instance -- copies of resource records may be stored in caching
      servers; there are no special identifiers to distinguish primary
      or cached results

   o  Properties -- N/A

   o  Relationship -- N/A

   There are actions that may be performed on the things identified:

   o  Assign properties -- resource records have time to live (TTL) and
      serial numbers included

   o  Get properties -- parsed as part of the response from the server.

   o  Publish -- publishing a DNS resource record amounts to updating a
      DNS zone file with the record.

   o  Get (a copy) -- resolve the domain name identifier

   o  Authenticate -- DNSSEC is used to authenticate that the resource
      records/response received for domain name resolution are as they
      were published.

   o  Comparison -- of RRs?

   o  Send -- N/A

   o  Search -- N/A

4.2.  World Wide Web

   The World Wide Web (WWW) is largely defined by the HTTP protocol.
   "Pages" defined in HTML are the primary design target, though these
   days much content of varying formats is delivered via the HTTP
   protocol.

   For the sake of discussion, we'll say that WWW identifiers are HTTP
   scheme URIs.

   o  "Minting" -- typically, HTTP URIs are not so much composed
      consciously as assembled practically from the domain name of the
      host running the web server and path components based on how the
      website is laid out hierarchically (which may or may not relate to
      an underlying file structure)

   o  Transformation -- HTTP URIs may be relative (to the current page
      in the "hierarchy", the current domain authority, etc.)

   o  Comparison -- HTTP URIs are not inherently comparable except by
      character-wise comparison or determining relative relationships

   o  Resolution -- HTTP URIs are resolved by parsing the authority
      component from the URI, connecting to the server, and requesting
      the resource associated with the path part of the URI.
   o  Validation -- any string that conforms to HTTP URI syntax may be
      considered valid.

   o  Status check -- HTTP does not distinguish between "inactive" and
      "not minted". That is, either the HTTP server is available and
      the resource is found on it (in which case the URI is "minted" and
      "active") or the server (or path) is not found.

   o  Authentication -- HTTP URIs are not authenticated (see below for
      authentication of servers).

   o  Lookup -- N/A

   o  Search -- HTTP does not support search (within server?)

   o  Subscribe -- N/A

   HTTP URIs identify "web pages" or "resources".

   o  Entity/resource -- web content (page)

   o  Instance -- copies of pages may be stored in caching servers;
      there are no special identifiers to distinguish primary or cached
      results

   o  Properties -- properties may be embedded within the HTML document,
      but there are no special identifiers to query/retrieve properties;
      as part of the HTTP protocol, capabilities may be negotiated

   o  Relationship -- relative URIs (?)

   There are actions that may be performed on the things identified:

   o  Assign properties -- the web server may assign properties to a web
      page.

   o  Get properties -- parsed as part of the response from the server.

   o  Publish -- publishing a web page is done on the backend, out of
      band of the WWW system

   o  Get (a copy) -- resolve the URI

   o  Authenticate -- certs are used, within HTTP, to authenticate that
      the server is empowered to operate for a given domain name.
      Individual pages are not authenticated.

   o  Comparison -- many web pages look alike -- but there is no
      inherent way to claim two web pages (documents) are "the same".

   o  Send -- N/A

   o  Search -- within the WWW there is no support for search (all
      search is achieved as an external system).

4.3.  Classic URIs

   The advent of the WWW heralded a burst of development of standards
   for applications and content on the Internet.
   Much work was done to elaborate a general system of identifiers,
   supporting a broad range of application needs -- Uniform Resource
   Identifiers in the large, encompassing Locators (identifiers of
   Internet "location"), Names (persistent, location-independent
   identifiers for resources), Characteristics (metadata about
   resources), and Agents (for composing actions).

   In the large, the classic perspective on URIs was that they would
   identify all resources (documents, services, media, components,
   classes, parameters, etc.) that would be referenced within Internet
   protocols.

   o  "Minting" -- dependent on the URI scheme. The HTTP URI scheme is
      outlined above as a dynamic URI creation example. Some
      (namespaces of) URNs require more explicit minting of identifiers
      to be used in URNs (e.g., ISBN URNs).

   o  Transformation -- dependent on the URI scheme. URNs were intended
      to be (authoritatively) transformed into URLs identifying the
      location of the desired resource at a given point in time.

   o  Comparison -- scheme-dependent -- there is no URI-wide support for
      comparing URIs (except byte-wise equality).

   o  Resolution -- scheme-dependent. Some URI schemes do not have
      Internet-based resolution capabilities.

   o  Validation -- any string that conforms to URI syntax may be
      considered valid. Individual schemes may include validation
      services (e.g., out of band lookup services, built-in checksums,
      etc.).

   o  Status check -- generally, URIs do not distinguish between
      "inactive" and "not minted". That is, successful resolution
      implies minted, unsuccessful resolution is ambiguous. Individual
      schemes may provide more refined methods of confirmation of status
      (SIP URIs?)

   o  Authentication -- there is no URI-wide support for URI
      authentication.

   o  Lookup -- N/A

   o  Search -- URIs are not inherently searchable

   o  Subscribe -- N/A

   Classically, URIs identify Internet "resources".
   However, URIs have been found in contexts that are disjoint from the
   Internet (e.g., XML component identification). "Resource" is
   deliberately general -- could be documents, services, etc. Each URI
   scheme refines its scope of intended resources.

   o  Entity/resource -- typically Internet content or service, but see
      comment above.

   o  Instance -- most URI schemes do not support identification of
      instances. However, URNs are intended to identify locations of
      multiple instances of a given resource.

   o  Properties -- URIs may identify URCs (Uniform Resource
      Characteristics), an articulation of the properties of a given
      resource. (URCs were never finally standardized.)

   o  Relationship -- relative URIs (?)

   There are actions that may be performed on the things identified:

   o  Assign properties -- publish a URC

   o  Get properties -- retrieve a URC associated with the URI.

   o  Publish -- scheme-dependent

   o  Get (a copy) -- resolve the URI

   o  Authenticate -- certs are used, within some URI schemes, to
      authenticate that the server is empowered to operate for a given
      domain name. Individual resources are not authenticated. (Unless
      you have a URC with a checksum?)

   o  Comparison -- many resources look alike -- but there is no
      inherent way to claim two resources are "the same".

   o  Send -- N/A

   o  Search -- within the URI space there is no support for search (all
      search is achieved as an external system).

4.4.  IP addresses

   To be added... the interesting thing about IP addresses is
   considering the context in which they are defined, and then
   contrasting that with the places they turn up.

5.  Common design choices and challenges

   In creating a system, there are important common design choices that
   need to be made. Sometimes the answer is implicit within the overall
   design constraints of the system. Other times, considerable effort
   is required to refine specifications and make appropriate choices.

   As noted, these are common design questions.
   This is an area where understanding of previous systems' design
   discussions can be particularly helpful (in order not to repeat them
   needlessly).

5.1.  Identifiers

   In defining identifiers for a system, is the intention that they:

   o  Name something -- the identifier will be associated with one
      entity (or instance of an entity), wherever that entity may be
      located.

   o  Locate something -- identify the location of an entity (at some
      point in time).

   o  Are smart or dumb identifiers -- "smart" identifiers have
      structures that can be parsed to determine something about the
      thing identified (e.g., domain in which it is stored); "dumb"
      identifiers are opaque and must be resolved within the system.

   o  Have uniqueness -- is the identifier/resource binding unique?

   o  Have scope -- is the uniqueness (or other properties) only
      maintained within some limited scope, or is it global?

   o  Are permanent -- what is the expected level of permanence of the
      identifier's relevance (the ability to use it, the binding between
      the identifier and the identified resource)? Or are they
      transient identifiers?

5.2.  Identified

   The specifics of the identified resource need to be defined, as well.

   o  Instances -- can there be multiple instances of a single resource?
      How can they be distinguished and/or how can two instances be
      equated? This is important if one needs to be able to cache
      instances or otherwise validate "local" copies.

   o  Scope of applicability -- what constitutes a "resource" in this
      system?

6.  Issues in (mis)using identifiers

   Things like using IP addresses out of the context of the routing
   system -- assumptions about uniqueness and volatility may be
   improper.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document is about considering application systems. Security is
   important to applications, but is not specifically called out here.

9.  References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Leslie Daigle
   ThinkingCat Enterprises
   Leesburg, VA 20176
   US

   Email: ldaigle@thinkingcat.com