Internet-Draft                                              Thommy Eklof
Category: Informational                                         Ericsson
Expires: April 14, 2000                                 Leslie L. Daigle
                                                Thinking Cat Enterprises
                                                        October 14, 1999

               Wide Area Directory Deployment Experiences
                   draft-eklof-dag-experiences-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 14, 2000.

Abstract

The TISDAG (Technical Infrastructure for Swedish Directory Access Gateway) project provided valuable insight into the current reality of deploying a wide-scale directory service. This document catalogues some of the experiences gained in developing the necessary infrastructure for a national (i.e., multi-organizational) directory service and pilot deployment of the service in an environment with off-the-shelf directory service products. A perspective on the project's relationship to other directory deployment projects is provided, along with some proposals for future extensions of the work (larger scale deployment, other application areas).

These are our own observations, based on work done and general project discussions. No doubt, other project participants have their own list of project experiences; we don't claim this document is exhaustive!

1.0 Introduction

1.1 Overview of the TISDAG project

As described in more detail in [TISDAG], the original intention of the TISDAG project was to provide the infrastructure for a national whitepages directory service. To be effective, such an infrastructure needed to address the concrete realities of end-users' existing client software, as well as the needs of information providers ("Whitepages Directory Service Providers" -- WDSPs). These realities include the existence of multiple protocols (so-called directory service access protocols, as well as more general Internet application protocols such as HTTP and SMTP). The project was also sensitive to the fact that WDSPs have many good reasons for being reluctant to relinquish copies of their subscribers' personal data.

1.2 Organization of this document

In an effort to communicate the experiences with this project, from conception through implementation and pilot deployment, this document is divided into 3 major sections. The first section reviews specific lessons learned by the authors through the TISDAG project and implementation of one conformant system. Next, some perspectives are offered on the relationship of the TISDAG work to other large-scale directory projects that are currently on-going, to give a sense of how these efforts might possibly interact. Finally, some preliminary thoughts on applying the DAG system to other applications and deployment environments are outlined. More speculation on useful development of architectural principles is provided in a separate document ([DAG++]).

2.0 The TISDAG project itself

2.1 TISDAG overview

Briefly, the technical infrastructure proposed for the TISDAG project (see [TISDAG] for the complete overview and technical specification) provides end-user client software with connection points to perform basic whitepages queries.
Different connection points are provided for the various protocols end-users are likely to wish to use to access the information -- WWW (http), e-mail (SMTP), Whois++, LDAPv2 and LDAPv3. For each client, a transaction will be carried out within the bounds of the protocol's syntax and semantics.

However, since the TISDAG system does not maintain a replicated copy of all whitepages information, but rather an index over the data that allows redirection (referrals) to services that are likely to contain responses that match the client's query, a fair bit of background work must be done by the DAG system in order to fulfill the client's query.

The first, and most important, step is for the system to make a query against the DAG Referral Index -- a server containing index information (obtained by the Common Indexing Protocol (see [CIP1, CIP2, CIP3]) in the Tagged Index Object format (see [TIO])). This index contains sufficient information to indicate which of the many participating WDSPs should be contacted to complete the query.

Wherever possible, these referrals are passed back to the querying client so that it can contact relevant WDSPs directly. This minimizes the amount of work done by the DAG system itself, and allows WDSPs greater visibility (which is an incentive for participating in the system). Protocols which support referrals natively include Whois++ and LDAPv3 -- although these may only be referred to servers of the same protocol.

Since many protocols do not support referrals (e.g., LDAPv2), and in order to address referrals to servers using a protocol other than the calling client's own, a secondary step of "query chaining" is provided to pursue these extra referrals within the DAG system itself. For example, if an LDAPv2 client connects to the system, a query is made against the Referral Index to determine which WDSPs may have answers for the query, and then resources within the DAG system are used to pursue the query at the designated WDSPs' servers.
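
The choice between passing a referral back and chaining it, as described above, can be sketched roughly as follows. This is a minimal illustration only, not part of the TISDAG specification: the `Referral` record, the `plan_query` function and the protocol labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Protocols that can carry referrals back to the client, per the text above.
REFERRAL_CAPABLE = {"whois++", "ldapv3"}


@dataclass(frozen=True)
class Referral:
    """Hypothetical referral record: a WDSP server and the protocol it speaks."""
    server: str
    protocol: str


def plan_query(client_protocol, referrals):
    """Split Referral Index results into returned and chained referrals.

    A referral is handed back to the client only if the client's protocol
    supports referrals AND the target server speaks that same protocol.
    Everything else is chained, i.e. pursued inside the DAG system.
    """
    returned, chained = [], []
    for ref in referrals:
        if client_protocol in REFERRAL_CAPABLE and ref.protocol == client_protocol:
            returned.append(ref)
        else:
            chained.append(ref)
    return returned, chained
```

Under these assumptions an LDAPv2 client never receives a referral (everything is chained), while an LDAPv3 client receives only the referrals that point at LDAPv3 servers.
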
The results from these different services are packaged into a single response set for the client that made the query.

The architecture that was developed in order to support the required functionality separated the system into distinct components: components that handle incoming queries from client software ("Client Access Points", or CAPs); a referral index (RI) that maintains an index over the collected whitepages information and provides referrals, based on actual data queries, to WDSPs that might have relevant information; and components that mediate access to WDSP whitepages servers to perform queries and retrieve results for the client's query ("Service Access Points", or SAPs). Several CAPs and SAPs exist within the system -- at least one for every protocol supported for incoming queries and WDSP servers, respectively. Designed to be implementable as separate programs, these components interact with each other through the use of an internal protocol -- the DAG/IP. Pragmatically, the use of the protocol means that different components can reside on different machines, for reasons of load-balancing and performance enhancement. It also acts as a "common language" for the CAPs, SAPs and RI to express queries and receive results.

This outlines the planned or ideal behaviour of the system; once designed, a pilot phase was started for the project to compare reality against expectations. Two independent implementations of the software were created, and a test deployment was set up within the Swedish University Network (SUNET). More detail on the project and its current status can be found at http://tisdag.sunet.se/.

The rest of this section outlines some conclusions drawn from making a reality of the proposed architecture -- both successes and surprises.

2.2 Some successes

Implementation and pilot deployment of software meeting the TISDAG technical specification did demonstrate some important successes of the approach.
Most notably, the system works pretty much as expected (see exceptions below) to provide transparent middleware for whitepages directory services. That is, client software and WDSP servers were minimally affected -- from the point of view of behaviour and configuration, the DAG system looked like a server to clients, and a client to servers.

The goal of the TISDAG project, operationally, was to be able to provide responses to end-user queries in reasonable response times (although not "an addressbook replacement"). The prototype systems demonstrated some success in achieving responses within 10 seconds, at least with the limited testbed of a configuration with 10 WDSPs providing directory service information. More observations on system performance are provided below.

The DAG system does demonstrate that it is possible to build referral-level services at a national level (although the deployment has yet to prove conclusively that it can, in its current formulation, operate as a transparent query-fulfillment proxy service).

The success of the implementation demonstrated that it is possible, in some sense, to do (semantic) protocol mapping with N+M complexity instead of NxM mappings. That is, protocol translations had to be defined for "N" allowable end-user query access protocols to/from the DAG/IP, and "M" supported WDSP server protocols, instead of requiring each of the N input components to individually map to the M output protocols.

As a correlated issue, the prototype system demonstrated some successes with mapping between schema representations in the different protocol paradigms -- in large part because the system's schemas were kept simple and focused on the minimal needs to support the base service requirements.

2.3 Some surprises

Over the span of a dozen months from the first "final" draft of the specification through the implementation and first deployment of the software system, a few surprises did surface.
These fell into two categories: those that surfaced when the theoretical specification was put into practice, and others that became apparent when the resulting system was put into operation with commercial software clients and servers.

More detail is provided in the Appendix concerning specific software issues encountered, but some of the larger issues that surfaced during the implementation phase are described below.

2.3.1 LDAP objectclasses and the "o" attribute

It came as a considerable surprise, some months into the project, that none of the "standard" LDAP person objectclasses included organization ("o") as an attribute. The basic assumption seems to be that "o" will be part of the distinguished name for an entry, and therefore there is little (if any) cause to list it out separately. This does make it trickier to store information for people across multiple organizations (e.g., at an ISP's directory server) and use the organization name in query refinement.

(Roland Hedberg caught this issue, and has flagged it to the authors of the "inetorgperson" objectclass document).

2.3.2 The Tagged Index Object

The Tagged Index Object ("TIO"), used to carry indexes of WDSP information to the RI, is designed to have record (entry) tags to reduce the number of false positive referrals generated when doing a search in the RI.

One of the features of the first index object type, Whois++'s centroid (see [centroid]), was the fact that the index object size did not grow linearly with the size of data indexed -- i.e., at some point the growth of the index object slowed as compared to that of the underlying data set. At first glance, this also seems to be the case for the TIO. However, as the index grows in size the compression factor of the TIO may not achieve the same efficiency as the centroids. One reason for this is that the tagged lists can get quite long, depending on the ordering of the assignment of tags to the underlying data.
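
To make the dependence on tag ordering concrete, here is a minimal sketch. It assumes that a tag list may write a run of consecutive tags as a range (e.g., "1-500"); the comma-separated rendering and the helper names `compress_tags` and `expand_tags` are illustrative assumptions, not taken from the TIO specification.

```python
def compress_tags(tags):
    """Render integer tags as a range list, e.g. [1, 2, 3, 7] -> "1-3,7"."""
    ordered = sorted(set(tags))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next tag is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(str(ordered[i]))
        else:
            parts.append(f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def expand_tags(text):
    """Expand a range list back into a set of integer tags."""
    tags = set()
    if not text:
        return tags
    for part in text.split(","):
        low, _, high = part.partition("-")
        tags.update(range(int(low), int(high or low) + 1))
    return tags
```

A token that matches 500 records with consecutive tags compresses to the single range "1-500", whereas the same number of matches spread over every other tag needs 500 separate list entries. Expanding two lists into sets allows a join on matching tokens with an ordinary set intersection, for instance `expand_tags(a) & expand_tags(b)`.
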
That is, the tagging as defined allows for a compressed expression of tag "ranges" -- e.g., "1-500" instead of "1,2,3,[...]500". Thus, it might be interesting to explore an optimal "sorting" of underlying data, before applying tags, so that the most common tokens have consecutive tags (maximal compression of the tag lists). It's not clear if this can be done efficiently over the entire set of records, attributes, and tokens, but it would bear some investigation, to produce the most compressed TIO for transmission.

Additionally, in order to make (time) efficient use of the tags in the RI in practice, it is almost necessary to "reinflate" the index object to be able to do joins on tag lists associated with tokens that match. Alternatively, the compressed tag list can be stored, and there is an additional cost associated with comparing the tag lists for matching tokens -- i.e., list comparison operations done outside the scope of a base database management system. Choosing between the two was an unexpected tradeoff.

2.3.3 Handling Status Messages

Mapping of status messages from multiple sub-transactions into a single status communication for the end-user client software became something of a challenge. When chaining a query to multiple WDSPs (through the SAPs), it is not uncommon for at least one of the WDSP servers to return an error code or be unavailable. If one WDSP cannot be reached, out of several referrals, should the client software be given the impression that the query was completed successfully, or not? Most client protocol error handling models are not sophisticated enough to make this level of distinction clear.

2.3.4 Deployment with Commercial Software

When it was then time to test the resulting software with standard commercial client and server software, a few more surprises came to light (primarily in terms of the worldview these products expect and their occasional implementation shortcuts).
Again, more detail is provided in the Appendix, but highlights included client software that could only handle a very small subset of a protocol's defined status message lexicon (e.g., 2 system messages supported), and client software that automatically appended additional terms to a query specified by the user (e.g., adding "or email=").

2.4 Some observations

2.4.1 Participation of the WDSPs

One of the things that came to light was that the nature of the index object generated by the WDSPs has an important impact on performance -- both in terms of integrating the index object into the Referral Index, and in terms of efficiency of handling queries. A proposal might be either to define more clearly how the WDSPs should generate the CIP index object (currently left to their discretion), or to alert individual WDSPs when their index objects are considered substandard.

On another front, when chaining referrals to WDSP servers, some servers perform more efficiently than others, affecting the overall response time of the DAG system. From a service point of view, it should also be possible to suggest to WDSPs that are consistently slow (longer than some selected response time) that they are substandard.

2.4.2 Index Objects and Referral Index size

As described in more detail in [complex], there are many factors that can influence the growth factor of index objects (as more data is indexed). That work dealt specifically with tokenized data for Whois++ centroids, and is not immediately generalizable to all forms of the Tagged Index Object. However, the particular structure of the TIO used for the TISDAG project is similar enough in structure to a centroid that the same "order of magnitude" and growth characteristics are applicable.

Factors that affect the size of the data ("number of entries"):

. Number of generated tokens

  The number of tokens generated from the directory data depends on what is tokenized. If data is tokenized on names and addresses (i.e.
  not unique data like phone numbers) a rough estimation is that the number_of_tokens = 0.2 * number_of_data_records. The growth is linear in the span from a few thousand to at least 1.2 million records. The growth should then level off since the sets of names and addresses are finite, but the current tests have not shown a break point.

  If data is tokenized on something that is unique, e.g. phone numbers, then a rough estimation is that the number_of_tokens = number_of_data_records. Note that it is possible to tokenize in different ways, for example by dividing the phone numbers into parts. This would result in fewer tokens.

. Number of directories

  Since the tokens are generated individually for each directory, the data size depends on the number of directories. 10 directories with 100,000 records each will generate the same number of tokens as one directory with 1,000,000 records.

2.4.3 Index Object and Query Performance

Factors that affect the performance ("queries/second"):

. Type of query (exact, substring, etc.)

  A 'substring' query is slower than an 'exact' query due to:

  1) somewhat slower look-up in the internal DAG database than an exact query.

  2) Mostly, a larger amount of data is fetched from the internal DAG database due to more hits, which generates more index processing.

  3) Substring queries are sent to the directory servers which also results in more hits and more data fetched. The directory servers may also be more or less effective in handling substring queries.

. Number of search attributes

  A query with one or few attributes will most of the time result in many hits, which results in a lot of data, both internally in DAG and from the directory servers. On the other hand, a query with many attributes will result in a somewhat slower look-up in the internal DAG database.

. Number of directories

  A larger number of directories may result in many referrals, but it depends on the query.
  A simple query will generate a lot of referrals, which means a lot of data from the directories has to be fetched. It will also result in a somewhat slower look-up in the internal DAG database.

. Number of chained referrals

  Queries that are not chained are faster, since the result data does not have to be sent through the DAG system. Chained queries to several directories can be processed in parallel in the SAPs, but all data has to be processed in the CAP before being sent to the client.

. Response time in the directory servers

  The response time from the directory servers is of course critical. The total response time for DAG is never faster than the slowest involved directory server.

. Number of tokens (size of Tagged Index Objects)

  The number of tokens has little impact on the look-up time in the internal DAG database.

2.5 Some evolutions

To date, the TISDAG project has been "alive" for just over two years. During that time, there have been a number of evolutions -- in terms of technologies and ideas outside the project (e.g., user and service provider expectations, deployment of related software, etc) as well as goals and understanding within the scope of the project.

Chief among these last is the fact that the project set out to primarily fulfill the role of a national referral service, and gradually evolved towards becoming more of a transparent protocol proxy service, fulfilling client queries as completely as possible, within the client protocol's semantics. This evolution was probably provoked by a number of reasons: existing client and server software has a narrower range of accepted (expected) behaviour than the protocol specs may describe; once the technology was there for some proxying, going all the way seemed to be within reach; etc.

From the point of view of providing a national whitepages service, this is a very positive evolution.
However, it did place some strains on the original system architecture, for which some adjustments have been proposed (more detail below). What is less clear is the impact this evolution will have on the flexibility of the system architecture -- in terms of addressing other applications, different protocols (and protocol paradigms), etc. That is, the original intention of the system was to very simply fulfill an unsophisticated role -- "find things that sort of match the input query and let the client itself determine if the match is close enough". As the requirements become more sophisticated, the simplicity of the system is impacted, and the system perhaps becomes more brittle. (Some proposals for avoiding this are outlined in [DAG++], which attempts to return to the underlying principles and propose steps forward at that level).

In terms of impact within the TISDAG project, this evolution led to the following technical adjustments:

. The latest version of the technical specification makes a distinction (in the internal protocol grammar) between queries directed at the Referral Index, and those passed to SAPs to fulfill a query. This distinction keeps the query-routing queries simple, but allows more sophistication in expressing a query designed to fulfill the client's original semantic expression.

. The additional constraints in the SAP query language are still not enough to allow the internal protocol to express very sophisticated queries. Originally intended only for query-routing queries, the DAG/IP expects all queries to be token-based (whereas LDAP queries are phrase-oriented). This means that SAPs have to do a good deal of "post-pruning" of WDSP result sets to match the DAG/IP query sent by a CAP for query fulfillment. And, CAPs must in turn do more post-pruning to match the DAG/IP results (from the SAPs) to the original query semantics.

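
The gap between a token-based match and the client's phrase-oriented query can be illustrated with a small sketch of the post-pruning step. The tokenizer, the dictionary entry format and the function names below are assumptions made for the example; the DAG/IP and LDAP define their own matching rules, which this does not reproduce.

```python
import re


def tokenize(value):
    """Split a value into lowercase word tokens (an assumed, simplistic rule)."""
    return re.findall(r"\w+", value.lower())


def matches_tokens(entry, attr, tokens):
    """Token-based match: every query token occurs somewhere in the attribute."""
    present = set(tokenize(entry.get(attr, "")))
    return all(token in present for token in tokens)


def post_prune(entries, attr, phrase):
    """Keep only entries whose attribute contains the phrase's tokens
    contiguously and in order, as a phrase-oriented client would expect."""
    wanted = tokenize(phrase)
    n = len(wanted)
    kept = []
    for entry in entries:
        present = tokenize(entry.get(attr, ""))
        if any(present[i:i + n] == wanted for i in range(len(present) - n + 1)):
            kept.append(entry)
    return kept
```

With the entries "Anna Maria Berg" and "Maria Anna Berg", a token-based query for the tokens "anna" and "maria" matches both, while pruning against the phrase "Anna Maria" keeps only the first. That difference is the extra work the SAPs and CAPs take on.
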
The real strength of the TISDAG project was that it separated the technical framework needed to support the service from the configuration required in order to support a particular application or service -- query & schema mapping, configuration for protocols, etc. Future improvements should focus on evolving that framework, maintaining the separation from the specific applications, services, and protocols that may use it.

3.0 Related Projects

The TISDAG project is not alone in attempting to solve the problems of providing coordinated access to resources managed by multiple, disparate services.

3.1 The Norwegian Directory of Directories (NDD)

Described in [NDD], the Norwegian Directory of Directories project also aims to provide necessary infrastructure for a national directory service. It assumes LDAP (v2 or v3) accessibility of WDSP information (provided by the WDSP itself, or through other arrangements), and aims to resolve some of the trickier issues associated with hooking together already-operational LDAP servers into a coherent network: uniform distinguished naming scheme, and content-based referrals. It also addresses some of the pragmatic realities of being compatible with different versions of LDAP clients -- e.g., v2, which does not support referrals, and v3, which does.

At the heart of the system is the "Referral Index and Organizational information" (RIO) server, which provides a searchable catalogue over Norwegian organizations. This facilitates the location of whitepages servers for individual organizations (assuming the query includes information about which organization(s) is(are) interesting).

This work can be seen as being complementary to the TISDAG work, in that it provides a more focused service for integrating LDAP directory servers. However, there is still some requirement that one knows the organization to which a person belongs before doing a search for their e-mail address.
This may be reasonable for seeking mail addresses associated with a person's work organization, but is less often successful when it comes to finding a personal e-mail address -- in an age where ISPs abound, a priori knowledge of a user's ISP identification is unlikely.

3.2 DESIRE Directory Services

The EC-funded project DESIRE II (http://www.desire.org) is developing a distributed European indexing system for information on Research and Education. The Directory Services work undertaken by DANTE and SURFnet proposes an architecture applied to a server mesh structure to create a wide-area directory service infrastructure. This service is intended to support both whitepages information with LDAP servers at WDSPs, as well as Web-search meshes at various places using Whois++ for information about resources and routing of queries to other index-based services.

Like the TISDAG project, the DESIRE directory services project aims to act as a focal point for queries, allowing client software to access appropriate resources from a wide range of disparate services. There are architectural differences between the approach used in the TISDAG project and the DESIRE directory service project, but many of the driving needs are the same, and the approach of using content-based indexing and referrals was also selected.

4.0 Some Directions for TISDAG Next Steps

The fun thing with technology is that there are always more tweaks and changes that can be made. However, a service should evolve in response to specific customer needs, and there are 3 critical ways in which the TISDAG service itself should advance. These are outlined below, in terms of possibilities perceived at this time, rather than specific recommendations for underlying technology changes that would be necessary to fulfill them.

4.1 Integrating multiple DAG networks: mesh.

4.1.1 Overview of mesh possibilities

The Common Indexing Protocol is designed to facilitate the creation not only of query referral indexes, but also of meshes of (loosely) affiliated referral indexes. The purpose of such a mesh of servers is to implement some kind of distributed sharing of indexing and/or searching tasks across different servers. So far, the TISDAG project has focused on creating a single referral index; the obvious next step is to integrate that into a larger set of interoperating services.

Two different possibilities exist for extending the TISDAG service to a mesh model (or some combination of both). First, it should be possible to create a mesh of DAG-based services. Or, it might be interesting to use the mesh architecture to incorporate access to other types of services (e.g., the Norwegian Directory of Directories). In either case, the basic principle for establishing a mesh is that interoperating services should exchange index objects, according to the architecture of the mesh (e.g., hierarchical, or graph-like, preferably without loops!).

As is outlined in the CIP documentation ([CIP1]), many possibilities exist for mechanisms for creating indexes over multiple referral servers -- for example, WDSP index objects could be passed along untouched, or a referral index server's contents could be aggregated into a new index object, generating referrals back to that server.

The proposal is that the mesh should be constructed using index objects aggregated over participating services' servers. That is, referrals will be generated to other recognized services, not their individual participants. This can be done as a hierarchy or a level mesh one-layer deep, but the important reason for not simply passing forward index objects (unaggregated) is that individual services may support different ranges of access protocols, have particular security requirements, etc.
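
A minimal sketch of the aggregation idea follows. It models an index object as a bare set of tokens per server, which is a deliberate simplification (a real Tagged Index Object also carries attributes and tags), and the names `aggregate_index` and `referrals_for` are hypothetical.

```python
def aggregate_index(service_name, wdsp_indexes):
    """Merge per-WDSP token sets into one index object for the whole service.

    The result names only the service, so a hit produces a referral to the
    service (one of its CAPs), never to an individual WDSP behind it.
    """
    tokens = set()
    for wdsp_tokens in wdsp_indexes.values():
        tokens.update(wdsp_tokens)
    return service_name, frozenset(tokens)


def referrals_for(token, local_indexes, remote_aggregates):
    """List the referral targets whose index contains the token."""
    hits = [name for name, toks in sorted(local_indexes.items()) if token in toks]
    hits += [name for name, toks in sorted(remote_aggregates.items()) if token in toks]
    return hits
```

In this model a query hitting a token held by one of Country B's WDSPs yields a single referral to "Country-B"; which WDSP actually holds the data is resolved only once the query reaches that service.
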
Referrals should be directed to a CAP or CAPs -- either the standard ones used by the DAG system, or new ones established to support particular semantics of remote systems (e.g., other query types, etc). Within a given DAG system, referrals to these remote servers will look just like any other referral, although a particular SAP or SAPs may be established to provide query fulfillment (again, to enable translations between variations of service, to allow secure access if the relationship between the services is restricted, etc).

In the following scenarios of mesh traversal, the assumption is that the primary service in discussion (Country A in Scenario 1, Country B in Scenario 2) is a DAG-based service. The scenarios are presented in the light of interoperating DAG services, but in most cases it would be equally applicable if the remote service was provided by some other service architecture. Again, the key element for establishing a mesh of any sort is the exchange of the CIP index object, not internal system architecture.

4.1.2 Scenario 1: Top Down

Suppose 2 countries tie their services together. A user makes a query in Country A. A certain number of hits are made against the index objects of A's WDSPs. There is also a hit in the aggregate index of Country B. There are 3 possible cases under which this must be handled:

Case 1:

Country A and Country B are running services that are essentially the same -- in terms of protocols, queries, and schema that are supported. In this case, one referral should be generated per protocol supported by Country B's service. The referral can be passed back as far as the client, if its protocol supports referrals. Alternatively, the CAP may chain the referral through an appropriate SAP, in the usual fashion. In other words, the CAPs of Country B's service act as WDSPs to Country A's service.

Consider the following illustration (only relevant CAPs, SAPs, etc, are shown; others suppressed for lack of room):

          +-----------------+
     (1)  |-----+ Country A |    +-------+
   ------>|Prot1|    DAG    |    |A-WDSP1|
   <------| CAP |     +-----|    | Prot1 |
     (2)  |-----+     |Prot1|    +-------+
          |           | SAP |
----+     |           +-----|    +-------+
 (3)|     | +-------+       |    |A-WDSP2|
    |     | | RI-A  |       |    | Prot1 |
    |     +-----------------+    +-------+
    |
    |                            +-------+
    |                            |A-WDSP3|
    |                            | Prot2 |
    +----------------+           +-------+
                     |             [...]
                     |
                     |         +-----------------+
                     |         |-----+ Country B |    +-------+
                     +-------->|Prot1|    DAG    |    |B-WDSP1|
                               | CAP |     +-----|    | Prot2 |
                               |-----+     |Prot1|    +-------+
                               |           | SAP |
                               |           +-----|    +-------+
                               | +-------+       |    |B-WDSP2|
                               | | RI-B  |       |    | Prot1 |
                               +-----------------+    +-------+
                                                        [...]

where

   Prot[i]  is some particular query protocol
   RI-A     has an index over all A-WDSP[i] and RI-B
   RI-B     has an index over all B-WDSP[i]
   (1)      is the query to the Country A DAG system, which yields a referral based on the index object from RI-B
   (2)      is that referral
   (3)      is the resolution of that referral, which the client takes to the Country B DAG system directly (to find out which, if any, B-WDSP[i] have relevant information)

Case 2:

Country A and Country B are running services that address the same service type (e.g., whitepages), but are not using an identical collection of protocols, allowed queries, or schema.

The index object that Country B sent to Country A's DAG service must be constructed in terms of Country A's service, in order for appropriate hits to be generated against the index object (i.e. for referrals to Country B's service). However, to resolve the referral, it will be necessary to do some further protocol/schema/query mapping. This can be done by a special SAP established within Country A's service, that maps Country A's service into the published service of Country B. Country A may then elect to support only one of Country B's access protocols, and the designated SAP will always contact one type of CAP at Country B.

Alternatively, Country B can establish a particular CAP that does the mapping from Country A's service into something that is most appropriate against the internal structure of its service. In this case, Country A's referral will be to a special CAP in Country B's service (which, again, will look like a WDSP to the Country A service); in fact, the referral may be handled directly by the client software.

The difference between the two possible approaches lies in the responsibility of managing the relationship between the 2 service types. On the one hand, Country A could handle it if it knows its service as well as the published access to Country B. On the other, Country B could be responsible for establishing a CAP for every country that may want to connect to it. The latter can, in some cases, be justified by the amount of internal optimization that can be done, and because it reduces the overhead for Country A's service (which can pass the referral directly back to the client software).

Consider the following illustration (only relevant CAPs, SAPs, etc, are shown; others suppressed for lack of room):

          +-----------------+
     (1)  |-----+ Country A |    +-------+
   ------>|Prot1|    DAG    |    |A-WDSP1|
   <------| CAP |     +-----|    | Prot1 |
     (2)  |-----+     |Prot1|    +-------+
          |           | SAP |
----+     |           +-----|    +-------+
 (3)|     | +-------+       |    |A-WDSP2|
    |     | | RI-A  |       |    | Prot1 |
    |     +-----------------+    +-------+
    |
    |                            +-------+
    |                            |A-WDSP3|
    |                            | Prot2 |
    +----------------+           +-------+
                     |             [...]
                     |
                     |         +-----------------+
                     |         |-----+ Country B |    +-------+
                     |         |Prot3|    DAG    |    |B-WDSP1|
                     |         | CAP |     +-----|    | Prot3 |
                     |         |-----+     |Prot3|    +-------+
                     |         |---------+ | SAP |
                     |         |Country A| +-----|
                     +-------->|CAP:Prot1|       |
                               |---------+       |    +-------+
                               | +-------+       |    |B-WDSP2|
                               | | RI-B  |       |    | Prot3 |
                               +-----------------+    +-------+
                                                        [...]
where

   Prot[i] is some particular query protocol

   RI-A has an index over all A-WDSP[i] and RI-B

   RI-B has an index over all B-WDSP[i]

   (1) is the query to the Country A DAG system, which yields a referral based on the index object from RI-B

   (2) is that referral

   (3) is the resolution of that referral, which the client takes to the Country B DAG system directly, but to a CAP that is specifically designed to accommodate protocols from Country A's service, and to map them (and the schema) into Country B's service. Likely, all Country B referrals will be chained for the Country A client.

Case 3: The third possibility is, in fact, a refinement of the first. If Country A and Country B are running services that are in every way identical except for the data (WDSPs covered), then it may make sense NOT to aggregate Country B's WDSP index objects, but to copy them to Country A's server. Then, Country A's CAPs might be given access to the SAPs of Country B in order to carry out chaining directly at the remote service (instead of implicating Country A's SAPs and Country B's CAPs, as in the first example above).

The choice among these cases does not come from technology -- it depends entirely on the nature of the relationship that can be established between Country A's and Country B's services.

4.1.3 Scenario 2: Working Up

The above scenario implicitly assumes that Country A's server has received index objects from Country B's server. This will be the case if Country A's server is higher in the levels of a hierarchy of services (established by agreements between the service operators), or if the network is composed of servers that share their index objects with all others, for example. In the latter case, searching at any one of the servers in the service yields the full range of results -- referrals will be made to any other server that might have data that fulfills the user's query.
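As a rough illustration of the index-sharing behaviour just described, the following Python sketch shows a server matching a query against index objects received from its peers and producing a referral for each peer that might hold matching data. Everything in it is an assumption made for illustration: the `IndexObject` class, the token-set representation, the `referrals_for` function and the example host names are hypothetical, and the real index object format is the one defined by CIP (see [CIP1] and [TIO]).

```python
from dataclasses import dataclass, field


@dataclass
class IndexObject:
    """Hypothetical, much simplified stand-in for a CIP index object."""
    server: str                                # where a referral should point
    tokens: set = field(default_factory=set)   # lower-cased tokens present in the server's data

    def might_match(self, query_tokens):
        # An index can only answer "maybe": every query token must appear
        # somewhere in the data summarized by this index object.
        return all(token.lower() in self.tokens for token in query_tokens)


def referrals_for(query_tokens, index_objects):
    """Return the servers worth contacting for this query."""
    return [obj.server for obj in index_objects if obj.might_match(query_tokens)]


# Index objects received from two peer services (made-up data).
country_b = IndexObject("dag.country-b.example", {"anna", "svensson", "lund"})
country_c = IndexObject("dag.country-c.example", {"anna", "olsen"})

print(referrals_for(["Anna", "Svensson"], [country_b, country_c]))
```

A referral produced this way only says that the remote server may have relevant data; the query still has to be resolved at that server, as in steps (2) and (3) of the illustrations above.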
The sharing of the index objects is a mechanism to allow each server to manage local data, while enabling distributed load-sharing on the basic query handling.

However, if a hierarchical, or at least not-completely-connected model is used for the server network, queries carried out at a level other than the top of the hierarchy, or in one particular branch of the hierarchy, will not actually be matched against all index objects. Therefore, there may be other servers to which the query should be directed if the full space needs to be searched.

Suppose, for example, that in the above example Country B is in fact lower in the hierarchy than Country A. A user sending a query to Country B's service may be content to limit the scope of the query to that country's information (this is true in enough real-life situations that this hierarchical relationship becomes an effective mechanism for scoping queries and avoiding having to flood the entire network with every single query or keep full copies of all data in every server).

Still in theoretical stages, the DAG/IP provides control constructs to allow DAG components to act according to the topology of the mesh. A CAP might use the "polled-by" system command to establish what other servers in the mesh exist in higher levels (and therefore would be worth contacting if the scope of the search is to be increased). In the example above, a CAP in Country B's system could determine that Country A's service was polling Country B, and therefore make it a logical target for expanding the scope of the query.

More experience (primarily with server mesh topologies) is necessary before it will be clear how to best make use of these capabilities:

   . should the CAP always broaden the scope? only if there are no local referrals? under user direction?

   . should the CAP use a local SAP to contact the remote service's CAP?

   . is it better to completely connect the mesh of servers, or produce some kind of hierarchy?

   .
     etc

4.1.4 Other considerations

Depending on the context in which a mesh is established (e.g., between national white pages services, or different units of a corporate organization, etc.), it may be useful to allow individual WDSPs to indicate whether they are willing to have their data included in a DAG system's aggregated index object (i.e., allowing the DAG system to receive referrals from other systems in the mesh).

4.2 Security support

Security must be considered when a wide-scale directory system is used in application areas other than the public white-pages application of the TISDAG project. The issues arise whether the directory service is distributed across the Internet or functions completely within an internal, closed network.

In the medical area, where patient information is searched for across multiple hospitals and other institutions within the health-care community, the requirements for security are very high. A DAG-system applied to a medical application should be integrated only with the security model already implemented in the health-care community; the use of DAG in a medical application will therefore have an impact on the DAG-system's security architecture. Proposals for such a system are presented elsewhere, but some of the specific requirements are listed below.

In an internal, closed hospital network, it is possible to expect dedicated, application-specific interfaces and protocols. For such client software, the specific steps of achieving authorization and carrying out the transaction should be welded into a single service interaction.

The DAG-system needs support for a uniform authentication and authorization service interface, to facilitate access control decisions and to request access control information about users, roles and organisations.

For example, access control requirements of a medical application may include:

   - authentication of the user, e.g.
     doctor

   - authorization, classified by roles for individual users, roles and organizations

   - time availability, e.g. time of the day or day of the week

   - encryption of the information

   - required confidentiality/integrity information protection based on relation to users, roles and organisations

   - secure network communications, host properties

Security in updates and CIP index objects is provided by encryption and signature of objects from registered WDSPs. The use of CIP index objects inherits the security considerations of CIP; for more details see [CIP1].

4.3 WDSP attributes and schemas

Today the DAG system makes use of 2 information schemas -- the DAGPERSON schema for information about specific people, and the DAGORGROLE schema for organizational roles. The technical specification includes a definition of the schema, as well as an understood mapping to (and from) some standard schemas used in the supported protocols.

Nevertheless, to include new WDSPs that may not support all attributes in these schemas, or that may use different schemas and query attributes, it should be possible to create and use new customized or standardized schemas and to perform schema mapping where necessary. It might also be possible to constrain queries to desired query attributes, templates, or object classes. In practice, this means that different WDSPs may choose to use different subparts of one defined schema, or even implement local customizations.

5.0 Other applications of DAG

One of the tests of flexibility of an architecture is to see how well it stands up when tried in new environments. In that light, we present 2 completely different applications in which DAG-like systems can be considered.

5.1 DAG as applied to medical applications

As alluded to above, the need for accurate and complete information is very great within the health-care community. Information about patients cannot be centralized from all the different institutions that own and administer the data.
Patients do not always see the same doctor or even go to the same hospital or clinic within the community. Rather than requiring centralized mirroring of complete information, and to avoid rebuilding the whole information structure, the DAG can be used as a gateway that interconnects the different patient record databases within a hospital or within a geographical area.

Applying DAG to a medical application involves some architectural differences, such as the definition of schemas and security considerations.

5.2 Wireless access to DAG

The ability to provide differentiated forms of access to directory service information will be very important in the future, particularly in the context of wide-area directory deployment (such as white pages services for companies and operators, but also across multiple operators, etc.). Users will demand access to information from their PCs at work or at home, from PDAs, and from cellular (wireless) phones.

The Wireless Application Protocol (WAP), as specified in [WAP], and other forms of access are providing new ways to access the Internet. Cellular phones that can handle WAP (i.e., browser functionality in the phone) are already on the market, as are operators running WAP server functionality in their cellular phone networks. The architecture of DAG is very modular and facilitates the addition of new CAPs, such as a "WAP-CAP" or a CAP for another protocol supported by WAP servers.

Using WAP as an access mechanism would enable a person to have access to their personal address book, or to a company's or operator's directory service. This information could be used for several other services: voice calling, click to dial, click to buy, etc. The advantage here is that the directory service provides a context for the user. This context, or the search of the directory service, must be limited, for example to avoid an overgeneralized query that returns too many hits.
Providing this type of service will also be useful in the context of Unified Messaging Services. Unified Messaging Services are increasingly used today -- the user receives an email which is forwarded to their cellular phone. A wide area directory system, such as a DAG system, could be used to facilitate things like searching for, or looking up, the E.164 number for an email address. Of specific interest is that this would allow searches for information over multiple operators.

Like the TISDAG project's pilot deployment, this section only describes "read-only" access from different clients to the DAG-system. In the longer term, it would be useful to distinguish between reads, searches/lookups and writes. This suggests that proper integration with the Authentication, Authorisation and Accounting (AAA) policy is needed for the different accesses to DAG. This security issue will have to be addressed again; more about it is provided in a separate document ([DAG++]).

6.0 Some conclusions

Although fewer people now hold out the hope of a unified global directory service based on standardized protocols, it is interesting to see more projects providing infrastructure that permits unified access to what is otherwise an unforgivingly diverse and dislocated set of information servers. What cannot be dictated (in standardized protocols and schemas) may yet be accommodated through service infrastructure. The right approach seems to be to build better and better frameworks for supporting such diversified services, without making the framework architecture dependent on specific technologies.

7.0 Acknowledgements

This document outlines the perspectives and opinions of the authors, based on experience as well as many fruitful and enlightening discussions with others: Roland Hedberg, Torbjorn Granat, Patrik Granholm, Rikard Wessblad and Sandro Mazzucato.

8.0 Authors' Addresses

   Thommy Eklof
   Ericsson
   S-126 25 STOCKHOLM
   Sweden

   Email: thommy.eklof@ericsson.com

   Leslie L.
   Daigle
   Thinking Cat Enterprises

   Email: leslie@thinkingcat.com

9.0 References

Request For Comments (RFC) and Internet Draft documents are available from numerous mirror sites.

   [CIP1]      J. Allen, M. Mealling, "The Architecture of the Common Indexing Protocol (CIP)", RFC 2651, August 1999.

   [CIP2]      J. Allen, M. Mealling, "MIME Object Definitions for the Common Indexing Protocol (CIP)", RFC 2652, August 1999.

   [CIP3]      J. Allen, P. Leach, R. Hedberg, "CIP Transport Protocols", RFC 2653, August 1999.

   [DAG++]     L. Daigle, T. Eklof, "An Architecture for Integrated Directory Services", Internet Draft (work in progress), June 1999.

   [TISDAG]    L. Daigle, R. Hedberg, "Technical Infrastructure for Swedish Directory Access Gateways (TISDAG)", RFC XXXX, June 1999.

   [centroid]  Deutsch, et al., "Architecture of the WHOIS++ service", RFC 1835, August 1995.

   [NDD]       R. Hedberg, H. Alvestrand, "Technical Specification, The Norwegian Directory of Directories (NDD)", Internet Draft (work in progress), May 1999.

   [TIO]       R. Hedberg, B. Greenblatt, R. Moats, M. Wahl, "A Tagged Index Object for use in the Common Indexing Protocol", RFC 2654, August 1999.

   [complex]   P. Panotzki, "Complexity of the Common Indexing Protocol: Predicting Search Times in Index Server Meshes", Master's Thesis, KTH, September 1996.

   [WAP]       The Wireless Application Protocol, http://www.wapforum.org

Appendix -- Specific Software Issues and Deployment Experiences

The following paragraphs outline practical deployment experiences in an anecdotal fashion. This is not meant to be construed as an exhaustive, authoritative evaluation of existing client software, but rather as an indication of the types of challenges the average implementation team may expect to encounter in a development and deployment effort.

Character encoding
------------------

One client's addressbook uses iso-8859 encoding (depending on the font configuration in the browser) when querying a directory server, but the directory server responds with Unicode (UTF-8) encoding.
This means that the LDAP CAP would have to handle different character set encodings for request and response.

Referrals
---------

Today there appears to be only one commercial addressbook supporting LDAPv3. All the others support only LDAPv2. However, this LDAPv3 client software does not handle referrals correctly -- the client could not handle a server result containing "response code 10" (designated for referrals). From what was observed, there was no way for the client or the end-user to decide whether, or which, referrals to follow up.

It is therefore not clear how the LDAP clients handle a combination of both referrals and results -- but the supposition is that it doesn't work.

Objectclasses in LDAP
---------------------

No objectclass is defined in the query to the DAG-system from the LDAP clients. This means that the DAG-system doesn't see any difference between "inetOrgPerson" and "organisationalRole" when the attribute "cn" represents both "name" and "role".

This is not so much a problem as a source of interesting side effects. Namely, although most directory user interfaces (found in browsers and mail programs) claim only to support person-related queries, in practice a user of the client could use the interface to send a query with a role in the name entry.

Query with attribute Organisation
---------------------------------

It is possible to send a query with the attribute "organisation", but it results in no hits because the organisation attribute is not included in the objectclass "inetOrgPerson". Roland Hedberg has proposed a change for the latest release of the objectclass definition document.
To provide the desired ability to narrow search focus to some range of organization names (attribute values), there are three possible approaches with differing merits/detractions:

   - Recommend the use of the "locality" attribute -- although a more standard definition would be required (locality is currently used for everything from organization to county to map coordinates).

   - Recommend or require that the attribute organisation should be inherited in objectclass "inetOrgPerson".

   - Build the LDAP DAG-SAP to submit 2 queries to the WDSP. The second is the same as the first but with only the cn filters, and is sent if the entire query including "o" results in no hits (i.e., back off from the organization filtering if it doesn't seem to be supported).

Configuration
-------------

It is not possible to see what character set an LDAP client wants to use. The recommendation so far in the project has been to define a unique port for each character set. This requires extra end-user configuration of client software, and proper advertising of the port number-charset mapping provided in the service.

DN
--

When the user wants to look up more information about a person found in a preliminary search, the LDAP client sends the entry's DN, together with host and port, to the DAG system. Not only does that mean that the client submits a non-compliant query to the DAG system, as DNs are not part of any of the defined queries for the service, it also simply does not provide the desired effect of getting to the user's entry.

Response Codes
--------------

The LDAPv3 client that was used does not support more than 2 response codes -- "success" and "size limit exceeded". All the other response codes are translated to "size limit exceeded", although no results are returned. That is, if the error was in fact that the size limit was exceeded, the results up to the size limit are presented. If it was another response code mapped to that one, no results are presented.
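The third approach listed under "Query with attribute Organisation" (retrying without the organization filter) can be sketched in Python as follows. This is only an illustration under stated assumptions: `search_with_backoff`, the dictionary-based filter and the `fake_wdsp` stand-in are hypothetical names, and a real LDAP DAG-SAP would issue LDAP searches against the WDSP rather than call a Python function.

```python
def search_with_backoff(search, filters):
    """Run the full query; if it yields nothing and an "o" filter was
    present, retry once with the organization filter removed.

    `search` is any callable that takes a dict of attribute filters and
    returns a list of entries (a hypothetical interface).
    """
    results = search(filters)
    if results or "o" not in filters:
        return results
    # Back off: keep every filter except the organization attribute.
    relaxed = {attr: value for attr, value in filters.items() if attr != "o"}
    return search(relaxed)


def fake_wdsp(filters):
    """Stand-in WDSP whose person entries carry no "o" attribute."""
    entries = [{"cn": "Anna Svensson"}, {"cn": "Bo Ek"}]
    return [entry for entry in entries
            if all(entry.get(attr) == value for attr, value in filters.items())]


hits = search_with_backoff(fake_wdsp, {"cn": "Anna Svensson", "o": "Example AB"})
print(hits)
```

The sketch drops only the "o" filter, which coincides with the "only cn filters" wording when the query contains just cn and o. The cost of the approach is a second round trip to the WDSP whenever the first query comes back empty.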
Sending and loading CIP Index Objects
-------------------------------------

At least one server is quoting the CIP-object incorrectly for the Swedish characters A-Ring, A-Umlaut and O-Umlaut. Sending quoted-printable CIP-objects with PINE mail software works.

Source - Labeled URI
--------------------

The original plan for the use of the labeled-URI attribute was to use it to return a pointer to the WDSP that provided the user information. However, the standard use of the labeled-URI attribute, which may in fact be populated in the data returned by a WDSP, is to contain the URI of a more personal (private) homepage.
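For the quoting problem noted under "Sending and loading CIP Index Objects", a quick way to see what a correct quoted-printable encoding of the three Swedish characters looks like is a round trip through Python's standard `quopri` module. The choice of ISO-8859-1 as the underlying character set is an assumption made for this sketch; the text above does not say which charset the affected server used.

```python
import quopri

# A-Ring, A-Umlaut and O-Umlaut, written as escapes to keep this source ASCII.
swedish = "\u00c5\u00c4\u00d6"

# Assumption: the index object text is ISO-8859-1 before quoting.
raw = swedish.encode("iso-8859-1")

# Each non-ASCII byte becomes "=" followed by two hexadecimal digits.
encoded = quopri.encodestring(raw)
print(encoded)

# Decoding must give back exactly the original characters.
decoded = quopri.decodestring(encoded).decode("iso-8859-1")
print(decoded == swedish)
```

A receiving server could apply the same round-trip check to incoming index objects before loading them.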