Internet-Draft Leslie Daigle draft-daigle-ura-01.txt Peter Deutsch November 22, 1995 Bill Heelan Chris Alpaugh Mary Maclachlan Bunyip Information Systems, Inc Uniform Resource Agents (URAs) ---------------------------------------- Abstract ---------------------------------------- This paper presents an experimental architecture for an agent system that provides sophisticated Internet information access and management. Not a generalized architecture for active objects that roam the Internet, these agents are modeled as extensions of existing pieces of the Internet information infrastructure. The proposed agent technology focuses on the necessary information structures to encapsulate Internet activities into objects that can be activated, transformed, and combined into larger structured activities. ---------------------------------------- Status of this Memo ---------------------------------------- This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months. Internet-Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet-Drafts as reference material or to cite them other than as a "working draft" or "work in progrss." To learn the current status of any Internet-Draft, please check the 1id-abstracts.txt listing contained in the Internet-Drafts Shadow Directories on ds.internic.net, nic.nordu.net, venera.isi.edu, or munnari.oz.au. ---------------------------------------- Acknowledgements ---------------------------------------- Several people have shared throughts and viewpoints that have helped shape the thinking behind this work over the past few years. We'd like to thank, in particular, Chris Weider, Patrik Faltstrom, Michael Mealling, Alan Emtage, and the participants in the IETF URI Working Group for many thought-provoking discussions. ---------------------------------------- Introduction ---------------------------------------- The evolution of Internet information systems has been characterized by building upon successive layers of encapsulated technologies. Machine address numbers were devised, and then encapsulated in advertised machine names, which has allowed the evolution of the Domain Name System (DNS) [RFC1034, RFC1035]. Protocols were developed for accessing Internet resources of various descriptions, and then uniform mechanisms for specifying resource locations, standardized across protocol types, were developed (URLs) [RFC1738]. Each layer of Internet information primitives has served as the building blocks for the next level of abstraction and sophistication of information access, location, discovery and management. The work described in this paper is an experimental system designed to take another step in encapsulation. While TCP/IP protocols for routing, addressing, etc, have permitted the connection and accessibility of a plethora of information services on the Internet, these must yet be considered a diverse collection of heterogeneous resources. The World Wide Web effort is the most successful to date in attempting to knit these resources into a cohesive whole. However, the activity best-supported by this structure is (human) browsing of these resources as documents. The Uniform Resource Agent (URA) initiative explores the possibility of specifying an activity with the same kind of precision accorded to resource naming and identification. A fully-instantiated URA carries out a task delegated by an invoker (human or otherwise). The nature of the task is determined by the agent that the invoker instantiates; that originating object encapsulates some knowledge of existence of relevant Internet resources and information required in order to access them. In this way, URAs can be used to allow invokers to carry out high-level Internet resource activities while insulating them from the details of Internet protocols, etc. Also, by formally specifying a high-level Internet activity in an agent, the activity can be repeated by the same or another invoker at a later date, the activity can be incorporated into a still higher-level activity, the agent object can be modified to carry out a related task, etc. More detail describing the underlying philosophy of this particular approach can be found in [IIAW95]. ---------------------------------------- Motivation ---------------------------------------- As a very simple example, consider the client task of subscribing to a mailing list. There are many mechanisms for providing the user information necessary to complete a subscription. Currently, all applications which provide the ability to subscribe to mailing lists must contain net-aware code to carry out the task once the requisite personal data has been solicited from the user. Furthermore, any application program that embeds the ability to subscribe in its code necessarily limits the set of mailing lists to which a client can subscribe (i.e, to those types foreseen by the software's creators). If, instead, there is an agent to which this task can be delegated, all applications can make use of the agent, and that agent becomes responsible for carrying out the necessary interactions to complete the subscription. Furthermore, that agent may be a client to other agents which can supply particular information about how to subscribe to new types of mail servers, etc. ---------------------------------------- Relationship to Other Internet Agents ---------------------------------------- A number of Internet-aware agent and transportable code systems have been released recently -- Java [JAVA], TCL [TCL] and Safe-TCL, Telescript [TELE], and the TACOMA system [TACOMA], to name a few of them. Some of these systems, like Java, focus on providing mechanisms for creating and distributing (inter)active documents in the World Wide Web. Others, like TACOMA, have more general intentions of providing environments for mobile, interacting processes. While each of these systems makes its individual contribution to solving the transportation, communication, and security issues normally associated with agent systems, they yield more objects that exist within the Internet information space. That is, while they may permit individual users to have a more sophisticated interaction with a particular information resource, they do not address the more general Internet problems of naming, identifying, locating resources, and locating the same or similar resources again at a later date. It is this set of problems that URAs specifically set out to address. ---------------------------------------- The Experimental Architecture ---------------------------------------- At the centre of the URA architecture is the concept of a (persistent) specification of an activity. For purposes that should become clear as the expected usage of URAs is described in more detail, we choose to support this concept with the following requirements of the architecture: . there is a formalized environment in which these specifications are manipulated (examined, executed, etc). This is referred to as a URAgency. . the activity specifications are modular, and independent of a given URAgency environment. Thus, they exist as object constructs that can be shared amongst URAgencies. There is a standardized _virtual_ structure of these URA objects, although different types may exist, with different underlying implementations. Basic URAgency Requirements --------------------------- A URAgency is a software system that manipulates URA objects. In the terminology of objects, a URAgency identifies the types of URAs it handles, and is responsible for applying methods to objects of those types. For the purposes of this experimental work, the only methods it is required to support are those to get information about a given URA, and to execute a URA. The expected result of applying the "get information" method to a URA is a description of some or all of the URA following the standardized virtual structure of a URA object, outlined below. The appropriate way to "execute" a URA is to supply information for the individual URA data segments (in effect, to permit the creation of an instance of a virtual object), or to identify a URA instance. Again, the information is to be supplied in accordance with the virtual structure below. When a URAgency claims to handle a particular type of URA, it must have the ability to map that URA type's implementation structure into and out of the virtual, standard URA structure, it must know how to activate the URA, and it must satisfy any runtime dependencies for that type of URA. For example, a URA type may consist of a Pascal program binary which, when run with particular command line arguments, yields information in the standard URA object structure. Activating this type of URA might consist of executing the Pascal binary with an input file containing all the necessary data segments. A URAgency claiming to handle this sort of URA type must first be able to provide an environment to execute the Pascal binary (for whatever platform it was compiled), and also be able to interact with the Pascal binary according to these conventions to get information about the URA, or execute it. As an alternative example, a URA type may consist of a script in some interpreted language, with the URA object structure embedded as data structures within the script. A URAgency handling this type of URA might have to be able to parse the script to pull out the standard URA object structure, and provide the script language interpreter for the purposes of executing the URA. URA Object Structure -------------------- In order to capture the necessary information for carrying out the type of Internet activity described in the introductory paragraphs of this document, six basic (virtual) components of a URA object have been identified. Any implementation of a URA type is expected to be able to conform to this structure within the context of a URAgency. The six basic components of a URA object are: URA HEADER: Identification of the URA object, including a URA name, type and abstract, creator name, resources required by the URA, etc. ACTIVATION DATA: Specification of the data elements required to carry out the URA activity. For example, in the case of an Internet search for "people", this could include specification of fields for person name, organization, e-mail address, etc. TARGETS: Specification of the URL/URN's to be accessed to carry out the activity. Note that, until URN's are in common use, the ability to tweak URL's will be necessary. A key issue for URAs is the ability to transport them and activate them far from the creator's originating site. This may have implications in terms of accessibility of resource sites. For example, a software search created in Canada will likely access a Canadian Archie server, and North American ftp sites. However, an invoker in Australia should not be obliged to edit the URA object in order to render it relevant in Australia. The creator, then, can use this section to specify the expected type of service, with variables for the parts that can be modified in context (e.g., the host name for an Archie server, or a mirror ftp site). EXPERIENCE INFORMATION: Specification of data elements that are not strictly involved in conversing with the targets in order to carry out the agent's activity. This space can be used to store information from one invocation of a URA instance to the next. This kind of information could include date of last execution, or URLs of resources located on a previous invocation of the agent. ACTIVITY: If URAs were strictly data objects, specifying required data and URL/URN's would suffice to capture the essence of the composite net interaction. However, the variability of Internet resource accesses and the scope of what URAs could accomplish in the net environment seem to suggest the need to give the creator some means of organizing the instantiation of the component URL/URN's. Thus, the body of the URA should contain a scripting mechanism that minimally allows conditional instantiation of individual URL/URN's. These conditions could be based on which (content) data elements the user provided, or accessibility of one URL/URN, etc. It also provides a mechanism for suggesting scheduling of URL/URN instantiation. The activity is specified by a script or program in a language specified by the URA type, or by the URA header information. All the required activation data, targets, and experience information are referenced by their specification names. RESPONSE FILTER: The main purpose of the ACTIVITY module is to specify the steps necessary to take the ACTIVATION DATA, contact the TARGETS, and collect responses from those services. The purpose of the RESPONSE FILTER module is to transform those responses into the result of the URA invocation. This transformation may be along the lines of reformatting some text, or it may be a more elaborate interpretation (e.g., providing a relevance rating for a retrieved HTML page). The response filter is specified by a script or program in a language specified by the URA type, or by the URA header information. All the required activation data, targets, and experience information are referenced by their specification names. See Appendix 1 for a more detailed description of the components of a URA. Appendix 2 contains a sample virtual URA structure. The Architecture in Action -------------------------- Having introduced the required capabilities of the URAgency and virtual structure of URA objects, it is now time to elaborate on the tasks and interactions that are best supported by URAs. URAs are constructed by identifying net-based resources of interest (targets) to carry out a particular task. The activation data component of a URA is the author's mechanism for specifying (to the invoker) the elements of information that are required for successful execution . An invoker creates an instance of a URA object by providing data that is consistent with, or fills in, this template. Such an instance encapsulates everything that the agent "needs to know" in order to contact the specified target(s), make a requrest of the resource ("get", or "search", etc) and return a result to the invoker. This encapsulation is a sophisticated identification of the task results. For exmaple, in the case of a mailing list subscription URA, the creator will identify the target URL for a resource that handles list subscription (e.g., an HTML form), and specify the data required by that resource (user name, user mail address, mailing list identifier, etc). When an invoker provides that information and instantiates the URA, the resulting object completely encapsulates all that is needed in order to subscribe the user -- the subscription result is identified. URAs are manipulated through the application of methods. This, in turn , is governed by the URAgency with which the invoker is interacting. However, because the virtual structure of URAs is represented consistently across URA types and URAgencies, a URAgency can act as one of the targets of a URA. Since methods can be applied to URAs remotely, URAs can act as invokers of URAs. This can yield a complex structure of task modules. For example, a URA designed to carry out a generalized search of book-selling resources might make use of individual URAs tailored to each resource. Thus, the top-level URA becomes the orchastrating URA for access to a number of disparate resources, while being insulated from the minute details of accessing those resources. ---------------------------------------- A Prototype Implementation ---------------------------------------- The experimental work with URAs includes a prototype implementation of URA objects. These are written in the Tcl scripting language. A sample prototype Tcl URA can be found in Appendix 3. The URAgency that was created to handle these URAs is part of the Silk Desktop Internet Resource Discovery tool. Silk provides a graphical user interface environment that allows the user to access and search for Internet information without having to know where to look or how to look. Silk presents a list of the available URAs to carry out these activities (e.g., "search for tech reports", "hotlist", etc). For each activity, the user is prompted for the activation data, and Silk's URAgency executes the URA. The Silk software also supports the creation and maintenance of URA object instances. Users can add new URAs by creating new Tcl scripts (per the guidelines in the "URA Writer's Guide", available with the Silk software. See [SILK]). The Silk graphical interface hides some of the mechanics of the underlying URAgency. A more directly-accessible version of this URAgency will become available. ---------------------------------------- Conclusions ---------------------------------------- This work was originally conceived as an extension to the family of Uniform Resource Identifiers -- URLs, and the proposed Uniform Resource Names (URNs), and Uniform Resource Characteristics (URCs). The approach of formalizing the characteristics of an information task in a standardized object structure is seen as a means of identifying a class of resources, and contributes to the level of abstraction with which users can refer to Internet resources. Although still in experimental stages, this work has already evoked interest and shown promise in the area of providing mechanisms for building more advanced tools to interact with the Internet at a more sophisticated level than just browsing web pages. One of the major difficulties that has been faced in developing a collection of URAs is the brittleness induced by interacting with services that are primarily geared towards human-users. Small changes in output formats (easily discernable by the human eye) can be entirely disruptive to a software client that must apply a parsing and interpretation mechanism based on placement of cues in the text. This problem is certainly not unique to URAs -- any software acting upon results from such a service is affected. Perhaps there is the need for an evolution of "service entrances" to information servers on the Internet -- mechanisms for getting "just the facts" from an information server. Of course, one way to provide such access is for the service provider to develop and distribute a URA that interacts with the service. When the service's interface changes, the service provider will be moved to update the URA that was built to access it reliably. Work will continue to develop new types of URAs, as well as other URAgencies. This will necessitate the creation of URAgency interaction standards -- the "common virtual URA object structure" is the first step towards defining a lingua franca among URAs of disparate types and intention. ---------------------------------------- References ---------------------------------------- [IIAW95] Leslie L. Daigle, Peter Deutsch, "Agents for Internet Information Clients", CIKM'95 Intelligent Information Agents Workshop, December 1995. Available from [JAVA] "The Java Language: A White Paper" Available from [RFC1034] P.V. Mockapetris, "Domain Names - Concepts and Facilities", RFC 1034, November 1987. [RFC1035] P.V. Mockapetris, "Domain Names - Implementation and Specification", RFC 1035, November 1987. [RFC1738] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994. [SILK] Bunyip's Silk project homepage: [SILKURA] Silk URA information: [TACOMA] Johansen, D. van Renesse, R. Schneider, F. B., "An Introduction to the TACOMA Distributed System", Technical Report 95-23, Department of Computer Science, University of Tromso, Norway, June 1995. [TCL] Ousterhout, J. K. "Tcl and the Tk Toolkit", Addison Wesley, 1994. [TELE] White, J. E., "Telescript Technology: The Foundation for the Electronic Marketplace", General Magic White Paper, General Magic Inc., 1994. ---------------------------------------- Authors' Address ---------------------------------------- Leslie Daigle Peter Deutsch Bill Heelan Chris Alpaugh Mary Maclachlan Bunyip Information Systems, Inc. 310 St. Catherine St. West Suite 300 Montreal, Quebec, CANADA H2X 2A1 Phone: (514) 875-8611 E-mail: ura-bunyip@bunyip.com ---------------------------------------- Appendix 1 -- Virtual URA Structure ---------------------------------------- This appendix contains a BNF-style description of the expected virtual structure of a URA object. This "virtual structure" acts as the canonical representation of the information encapsulated in a given URA. It is expected that more information may optionally be contained in the elements of the components -- the elements listed here are offered as the "minimum" or "standard" set. N.B.: []-delimited items are optional %% denotes a comment \0 represents the empty string | is "or" {} are literal characters This form is used for convenience and clarity of expression -- whitespace and ordering of individual elements are not considered significant. := {} := { URAHDR } { ACTDATA } { TARG } { EXPINFO } { ACTSPEC } { RESPFILT } := { name } { author } { version } [ { lang } ] [ { parent } ] := | \0 := { { name } { response } { prompt } [ { required } ] [ { default } ] } := | \0 := { { name } { protocol } { url } [ { } ] } := | := %% a complete, valid URL string (e.g., http://www.bunyip.com/) := { { scheme } { host } [ { port } ] { selector } } := { { name } { response } { prompt } } := { { name } { response } { prompt } } := { { name } { response } { prompt } } := { { name } { response } { prompt } } := { { name } { response } } := := %% Without requiring more detail... := \n | \0 := 0 | 1 := := := := := := := := := := := := http-get | http-post | ... := := := := := := := := := := := := := ---------------------------------------- Appendix 2 -- Sample Virtual URA Representation ---------------------------------------- A valid virtual representation of a Silk Tcl URA is presented below. The actual URA from which it was drawn is given in Appendix 3. { {URAHDR {name {DejaNews Search}} {author {Leslie Daigle}} {version {1.0}} } {ACTDATA {name {Topic Keywords}} {prompt {Topic Keywords}} {response {}} } {EXPINFO {name {Comments}} {prompt {Comments}} {response {}} } {ACTSPEC {proc mapResponsesToDejanews {} { set resp "" if {[uraAreResponsesSet {Topic Keywords}]} { lappend resp [list query [uraGetSpecResponse {Topic Keywords}]] } return $resp } proc uraRun {} { global errorInfo foreach serv [uraListOfServices] { set u [uraGetServiceURL $serv] switch -- $serv { dejanews { if [catch { set query [mapResponsesToDejanews] if {$query != {}} { set result [uraHTTPPostSearch $u $query] if {$result != ""} { set list [dejanews_uraHTTPPostCanonicalize $result] puts $list } } }] { puts stderr $errorInfo } } default { # can't handle other searches, yet. } } } } } } {RESPFILT { proc dejanews_uraHTTPPostCanonicalize {htmlRes} { set result {} set lines {} set clause {} set garb1 "" set garb2 "" # Get the body of the result page -- throw away leading and # trailing URLs regexp {([^
]*)
(.*)
.*} $htmlRes garb1 garb2 mainres set lines [split $mainres "\n"] foreach clause $lines { if [regexp {
.*(..\/..).*([^<]*).*([^<]*).*} \ $clause garb1 dt relurl desc grp] { lappend r [list HEADLINE [format "%s (%s, %s)" [string trim $desc] \ [string trim $grp] $dt]] lappend r [list URL [format "http://www.dejanews.com/cgi-bin/%s" $relurl]] lappend r [list TYPE "text/plain"] lappend result $r } } return $result } } } } ---------------------------------------- Appendix 3 -- Sample Silk Tcl URA ---------------------------------------- The following is a valid Silk Tcl URA. For more information on the implementation and structure of Silk-specific URAs, see the "URA Writers Guide" that accompanies the distribution of the Silk software (available from ). # ----------------------------------------------------------------------------- # # URA initialization # # ----------------------------------------------------------------------------- # # Initialize the URA, its search specs and searchable services. # # URA init. set uraDebug 1 uraInit { {name {DejaNews Search}} {author {Leslie Daigle}} {version {1.0}} {description "This URA will search for UseNet News articles."} {help "This is help on UseNet News search script."} } # # bug: handling of choices/labels is kind of gross. # # Search spec. init. foreach item { { {name {Topic Keywords}} {field Topic} {tag STRING} {description {Keywords to search for in news articles}} {prompt {Topic Keywords}} {help {Symbols to look up, separated by spaces.}} {type STRING} {subtype {}} {allowed .*} {numvals 1} {required 0} {response {}} {respset 0} } } { uraSearchSpecInit $item } uraAnnotationInit { {help {Enter comments to store with an instance}} {numvals 1} {subtype {}} {response {}} {name Comments} {required 0} {class ANNOTATION} {type TEXT} {description {General comments about this URA.}} {respset 1} {prompt Comments} {field {}} {allowed .*} } uraResultInit { {name {Related Pages}} {contents { { {HEADLINE {The DejaNews UseNet search service}} {TYPE text/plain} {URL http://www.dejanews.com} } }} } foreach item { { {name dejanews} {protocol http-post} {url http://marge.dejanews.com/cgi-bin/nph-dnquery} } } { uraServicesInit $item } proc dejanews_uraHTTPPostCanonicalize {htmlRes} { set result {} set lines {} set clause {} set garb1 "" set garb2 "" # Get the body of the result page -- throw away leading and trailing URLs regexp {([^
]*)
(.*)
.*} $htmlRes garb1 garb2 mainres set lines [split $mainres "\n"] foreach clause $lines { uraDebugPuts stderr [format "Line: %s" $clause] if [regexp {
.*(..\/..).*([^<]*).*([^<]*).*} \ $clause garb1 dt relurl desc grp] { uraDebugPuts stderr [format "Date: %s Rel URL: %s Desc: %s Group: %s" \ $dt $relurl $desc $grp] lappend r [list HEADLINE [format "%s (%s, %s)" [string trim $desc] \ [string trim $grp] $dt]] lappend r [list URL [format "http://www.dejanews.com/cgi-bin/%s" $relurl]] lappend r [list TYPE "text/plain"] lappend result $r } } return $result } # ----------------------------------------------------------------------------- # # Mapping procedures # # ----------------------------------------------------------------------------- # # There is one procedure, for each searchable service, to map the search # spec responses to a form suitable for inclusion into a search URL (or # whatever form the particular query procedure accepts). # # # proc mapResponsesToDejanews {} { set resp "" if {[uraAreResponsesSet {Topic Keywords}]} { lappend resp [list query [uraGetSpecResponse {Topic Keywords}]] } return $resp } # # bug: need better error reporting (i.e. which searches didn't work and why, etc.) # proc uraRun {} { global errorInfo foreach serv [uraListOfServices] { set u [uraGetServiceURL $serv] switch -- $serv { dejanews { if [catch { set query [mapResponsesToDejanews] uraDebugPuts stderr [format "%s: query is `%s'." $serv $query] if {$query != {}} { set result [uraHTTPPostSearch $u $query] if {$result != ""} { uraDebugPuts stderr [format "%s: result is `%s'." $serv $result] set list [dejanews_uraHTTPPostCanonicalize $result] uraDebugPuts stderr [format "%s: list is `%s'." $serv $list] puts $list } } }] { puts stderr $errorInfo } } default { # can't handle other searches, yet. } } } } ------------------------------------------------------------------------------ "The Relentless Pursuit of Perfection Leslie Daigle is a depth-first search of the infinitely deep leslie@bunyip.com tree of life's experiences." Montreal, Canada -- ThinkingCat ------------------------------------------------------------------------------