Internet-Draft                                             H. Alvestrand
draft-alvestrand-directory-defs-01.txt                     Cisco Systems
Target Category: Informational                             December 2000
Expires: June 2001

               Definitions for talking about directories

Status of this Memo

The following text is food for the I-D machinery.

The file name of this memo is draft-alvestrand-directory-defs-01.txt

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

The intended place to discuss this memo is the open mailing list
directory@apps.ietf.org - subscribe by sending mail to
directory-request@apps.ietf.org.

Abstract

When discussing systems for making information accessible through the
Internet in standardized ways, it may be useful if the people
discussing have a common understanding of the terms they use. One group
of such systems is known under the term "directories".

This document is not intended to be either comprehensive or definitive,
but is intended to give some aid in mutual comprehension when
discussing information access methods to be incorporated into Internet
Standards-Track documents.
Reference to this document would, for instance, give one the power to
agree that the Domain Name System is a global lookup repository with
perimeter integrity and loose, converging consistency, while an LDAP
directory server is a local, centralized repository with both lookup
and search capability.

1. Introduction and basic terms

We suggest using the following terms for the remainder of this
document:

- Information: Something for which one can imagine multiple worlds
  where the item in question has different values. The fact of which
  particular value is true for this world is information. This
  definition is extremely abstract, and intentionally so.

     "Information is that which reduces uncertainty"
     - Charles T. Meadow, quoting Shannon

- Data: Structure of representation symbols (for instance electronic
  charge, characters on paper or marks in stone) that people use to
  represent, structure or make accessible information.
- Repository: Amount of data that is accessible through one or more
  access methods. Again, this term is deliberately not defined more
  strictly. (Note CIP discussion of indexing/bringing together...
  difference between 1, 2 and many is in the eye of the beholder)
- Requester: Entity that may (try to) access data in a repository.
  Note that no assumption is made that the requester is animal,
  vegetable or mineral.
- Maintainer: Entity that causes changes to the data in the
  repository. Usually, all maintainers are requesters, since they need
  to look at the data too, but the roles are distinct.
- Access method: Well-defined series of operations that will cause
  information known to a repository to also be known to the requester.
- Site: Entity that hosts all or part of a repository, and makes it
  available through one or more access methods. A site may in various
  contexts be a machine, a datacenter, a network of datacenters, or a
  single device.

2. Dimensions of classification

2.1 Uniqueness and scope

Some information systems are global, in the sense that only one can
sensibly exist in the world. Others are inherently local, in that each
locality, site or even box will run its own information store,
independent of all others.

The following terms are suggested:

- Global repository: A repository of which only one can exist in the
  world. The world itself is a prime example; the public telephone
  system's number assignments are another.
- Local repository: A class of repository of which multiple instances
  can exist, each with information relevant to that particular
  repository, with no need for coordination between them.
  ((( better term needed )))
- Centralized repository: A repository where all access to data has to
  pass through some single point of control (site).
- Distributed repository: A repository that is not centralized.
- Replicated repository: A distributed repository where all sites have
  the same information.
- Cooperative repository: A distributed repository where not all sites
  have all the information, but where mechanisms exist to get the
  information to the requester, even when it is not available to the
  site originally asked.

The term "global" is often a matter of social or legal context; for
instance, the telephone numbering system is global by international
treaty, while the debate about whether the Domain Name System is
global in fact or just a local repository with ambitions has proved
bait for too many discussions to enumerate.

Some claim that globality is in the eye of the beholder; "everything
is local to some context". When discussing technology, it may be wise
to use "very widely deployed" instead.

2.2 Search, Lookup, Query and Notify

A different consideration when describing repositories is the types of
method they offer to find information.
The chief classifications are:

- Lookup repositories require the user to know or guess some exact
  value, sometimes called a "lookup key" and sometimes called a
  "name", before asking for information. They usually return a single
  piece of information as a response.
- Search repositories require the user to know some approximate value
  of some information. They usually return zero, one or more responses
  that match the information supplied according to some algorithm.

An orthogonal dimension has to do with time:

- Query repositories will answer a request with a response, and once
  that is over with, will do nothing more.
- Notify repositories will get a request from a user to have
  information returned at some later time when it becomes available,
  current or whatever, and will respond at that time with a
  notification that information is available.
- Subscription repositories are like notify repositories, but will
  transfer the actual information when available.

2.3 Consistency models

Consistency (or the lack thereof) is a property of distributed
repositories. For this particular discussion, we ignore the subject of
semantically inconsistent data (such as an assertion that a man is
blind and has a valid driver's license), and focus on the problem of
consistency where inconsistency is defined as having the same request,
using the same credentials, be answered with different data at
different sites.

Distributed repositories may have:

- Strict consistency, where the problem above never arises. This is
  quite expensive.
- Strict internal consistency, where the replies always reflect a
  consistent picture of the total repository, but some sites may
  reflect an earlier version of the repository than others.
- Loose, converging consistency, where different parts of the
  repository may be updated at different times as seen from a single
  site, but the process is designed in such a way that if one stops
  making changes to the repository, all sites will sooner or later
  present the same information.
- Inconsistency, where no guarantee can be made whatsoever.

One interesting variant is subset consistency, where the system is
consistent (according to one of the definitions above), but not all
questions will be answered at all sites; possibly because different
sites have different policies on what they make available (NetNews),
or because different sites only need different subsets of the "whole
picture" (BGP).

2.4 Security models

It's harder to describe security models in a few sentences than other
properties of information systems. Some thoughts, though:

On trust in information: Why do we trust a piece of information to be
correct?

- Because it's in the repository (and therefore must have been
  authorized). This is perimeter (or eggshell) integrity.
- Because it contains internal integrity checks, usually involving
  digital signatures by verifiable identities. This is item integrity;
  the granularity of the integrity and the ability to do integrity
  checks on the relationships between objects is extremely important
  and extremely hard to get right, as is establishing the roots of the
  trust chains.
- Because it fits other available information, and causes the right
  things to happen when I use it. This is hopeful integrity.
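
The difference between perimeter integrity and item integrity can be
illustrated with a minimal sketch. It is not taken from any of the
systems discussed in this memo. It assumes a keyed hash (HMAC) as a
stand-in for the digital signatures mentioned above, and the names
seal, perimeter_accepts, item_accepts and KEY are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared key. This is an assumption made for brevity; a
# real system would more likely use public-key signatures tied to
# verifiable identities.
KEY = b"example-key"


def seal(value, key=KEY):
    """Return the value together with an integrity tag computed over it."""
    tag = hmac.new(key, value, hashlib.sha256).digest()
    return value, tag


def perimeter_accepts(repository, name):
    """Perimeter integrity: trust anything that is in the repository."""
    return name in repository


def item_accepts(repository, name, key=KEY):
    """Item integrity: trust an entry only if its own tag verifies."""
    if name not in repository:
        return False
    value, tag = repository[name]
    expected = hmac.new(key, value, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


repo = {"www.example.com": seal(b"192.0.2.1")}

# Simulate a change that bypassed the maintainer: the value is altered
# but the tag is left as it was.
_, old_tag = repo["www.example.com"]
repo["www.example.com"] = (b"198.51.100.7", old_tag)
```

In this sketch the altered entry still passes the perimeter check,
since it is in the repository, but fails the item check, since its tag
no longer matches its value. The sketch says nothing about the harder
problems named above, such as checks on relationships between objects
or the roots of the trust chains.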
Which integrity model to choose is a matter of evaluating the cost of
implementing the integrity, the cost of having the integrity break on
you, and the impact of cost on doing business.

On access to information, the usual categories apply:

- Open access: Anyone can get the information.
- Property-based access: Access because of what you are, or where you
  are. For example, access limited to "same network", "physically
  present" or "resolvable DNS name".
- Identity-based access: Access because of who you are (or
  successfully claim to be). Username/password, certificates... These
  are then backed up by a layer specifying what the identity you have
  proven yourself to be has access to.
- Token-based access: Access because of what you have. Hardware
  tokens, smartcards, certificates, capability keys... In this case,
  access is given to all who can present that credential, without
  caring about their identity.

The most common approaches are identity-based and open access;
however, token-based ("what you have") access is commonly used
informally in, for example, password-protected FTP or Web sites where
the password is shared between all members of a group.

2.5 Update models

A few examples:

- Read-only repositories have no standard means of changing the
  information in them. Changes are usually made through some interface
  other than the standard interface.
- Read-mostly repositories are designed based on a theory that reads
  will greatly outnumber updates; this may, for instance, be reflected
  in relatively slow consistency-updating protocols.
- Read-write repositories assume that the updates and the read
  operations are of the same order of magnitude.

2.6 The term "Directory"

The definitions above never used the term "Directory".
In most common usages, the properties that a repository must have in
order to be worthy of being called a directory are:

- Search
- Convergent consistency

All the other terms above may vary across the set of things that are
called "directories".

3. Classification of some real systems

3.1 The Domain Name System

The DNS is a global lookup repository with loose, converging
consistency and query capability only. It is either strictly read-only
or read-mostly (with Dynamic DNS), has an open access model, and
mainly perimeter integrity (some would say hopeful integrity). DNSSEC
aims to give it item integrity.

If one opens up the box and looks at the relationship between primary
and secondary nameservers, that can be seen as a limited form of
notify capability, but this is not available to end-users of the total
system.

3.2 The (imagined) X.500 Global Directory

X.500 was intended to be a global search repository with loose,
converging consistency. It was intended to be read-mostly, perimeter
secure and query-capable.

3.3 The Global BGP Routing Information Database

The Global or top-level BGP routing information database is a global
read-write repository with loose, converging subset consistency (not
all routes are carried everywhere) and very limited integrity control,
mostly intended to be perimeter integrity based on "access control
based on what you are" (property-based access).

3.4 The NetNews system

NetNews is a global read-write repository with loose (non-converging)
subset consistency (not all sites carry all articles, and article
retention times differ). Between sites it offers subscription
capability; to users it offers both search and lookup functionality.

3.5 SNMP MIBs

An SNMP agent can be thought of as a local, centralized repository
offering lookup functionality.
With SNMPv3, it offers all kinds of access models, but mostly "access
because of what you have" (token-based access) seems popular.

3.6 The MBONE

MBONE can be thought of as a highly transient, read-write repository
with subscription capability.

4. Security Considerations

Security is a very relevant question when considering information
access systems. Some issues to consider are:

- Controlled access to information
- Controlled rights to update information
- Protection of the information path from provider to consumer
- With personal information, privacy issues
- Interactions between multiple ways to access the same information

5. Character set considerations

.

6. Acknowledgements

7. Author's Address

Harald Tveit Alvestrand
Cisco Systems
Weidemanns vei 27
N-7043 Trondheim
NORWAY

EMail: Harald@alvestrand.no
Phone: +41 44 29 94 .

8. References

Appendix A: TODO

Check out relevance of DDDS documents.
Drag in referents for examples.