Internet Draft                                              Andre Beck
                                                        Markus Hofmann
Expires: August 2001                               Lucent Technologies
Document: draft-beck-opes-irml-00.txt
Updates: draft-beck-opes-psrl-00.txt                     February 2001
Category: Informational

      IRML: A Rule Specification Language for Intermediary Services

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Intermediary services are a new class of applications running on network edge intermediaries like caching proxies or dedicated application servers. They are described in [2] and [3]. These intermediary services can be executed on behalf of clients, access providers, or content providers. In order to control the execution of intermediary services, these parties provide service-specific rules that trigger services if rule conditions are met for incoming or outgoing messages.

The Intermediary Rule Markup Language (IRML) is an XML-based language that can be used to describe service-specific execution rules. It allows clients, access providers, and content providers to specify when and how to execute intermediary services.

Table of Contents

   Status of this Memo
   Abstract
   1. Terminology
   2. Problem Description and Goals
   3. IRML Syntax and Grammar
   3.1. The "rulemodule" Element
   3.2. The "owner" Element
   3.2.1. The "name" Element
   3.2.2. The "id" Element
   3.3. The "protocol" Element
   3.4. Examples of the "owner", "name", "id", "protocol" Elements
   3.5. The "rule" Element
   3.5.1. The "property" Element
   3.5.2. The "action" Element
   3.5.3. Examples of the "rule", "property" and "action" Elements
   4. Order of Service Execution
   5. Security Considerations
   6. Acknowledgement
   7. Reference
   Author's Addresses
   Appendix - IRML DTD
   Appendix - Rule Module Examples
   Full Copyright Statement

1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [4].

intermediary
   Intermediaries are application-aware devices located in the communication path between client and origin server, for example (caching) proxies, gateways, switches etc.
rule module
   A rule module contains a set of rules and information about the rule module owner.

rule
   Rules contain conditions and actions that are to be executed if the conditions are met.

action
   The execution of a local/remote service module. Message properties MAY be modified as the result of the execution.

service module
   Service modules are executable code modules that can be executed in a local service execution environment on the intermediary or a remote service execution environment on a dedicated application server. They may run on behalf of content providers, access providers, and clients.

2. Problem Description and Goals

The three parties that may wish to run intermediary services as described in [2] and [3] are the same parties that are involved in a typical Web transaction:

   1. Client
   2. Access provider (ISP, CDN etc.)
   3. Content provider

Each party must be able to express the conditions under which it wishes to run a service. A content provider, for instance, may want to adapt its pages for users with small wireless devices. Providers of free Internet services may want to insert advertisements into all HTML pages served to their clients. Web users may wish to have certain Web pages translated into a different language.

These examples demonstrate the need for rules that tell the intermediary hosting these services when to run which service. The three parties for which services may be executed must provide these rules to the intermediary. A rule engine on the intermediary processes the rules that apply to incoming requests and outgoing responses in order to determine which service modules need to be executed, when, and in what order. Since the intermediary processing the rules is not necessarily maintained by the parties that may wish to author rules, a standard specification language is required.
This document defines the Intermediary Rule Markup Language (IRML) in an attempt to create a standard rule format that will be supported by vendors of service-enabled intermediaries and by third parties offering network edge service applications.

The Intermediary Rule Markup Language defined in this document also serves as a standard representation of rules for intermediary services. This facilitates the exchange and discussion of these kinds of rules between and within groups of rule authors.

It is beyond the scope of this document to define a secure and reliable mechanism for transferring rule files to intermediaries. Likewise, this document does not describe the specifics of how to (efficiently) process rules on the intermediaries.

3. IRML Syntax and Grammar

IRML is an application of XML. Thus, its syntax is governed by the rules of the XML syntax as defined in [5], and its grammar is specified by a DTD, or Document Type Definition. The IRML DTD can be found in Appendix A.

Valid and well-formed IRML documents consist of one or more rule modules. Each rule module contains a set of rules and information about the rule module provider. Rule modules can be provided by a content provider, an access provider, or by a client (although usually indirectly through an access provider).

The rules contained in rule modules each consist of a number of conditions and a number of consequent actions that must be executed if the conditions are met. The conditions within a rule refer to message properties in the request or response message of a given Web transaction. They are met if the property value matches the pattern(s) specified in the rule condition(s).

3.1. The "rulemodule" Element

The "rulemodule" element is the root element for all rule modules and MAY/MUST contain the following elements (see also IRML DTD in Appendix A).
3.2. The "owner" Element

The "owner" element specifies the owner of the rule module. Each rule module has exactly one owner.

   Attributes of "owner"

   Name     Values
   ----------------------------------------------------
   class    content provider|access provider|client

The "class" attribute assigns a rule module owner to one of the three types of rule module providers: content providers, access providers, and clients.

3.2.1. The "name" Element

The "name" element contains a descriptive name for the rule module owner. This could be the company name for content and access providers and a customer login for clients. The name does not have to be unique among rule module owners.

3.2.2. The "id" Element

The "id" element contains an identifier for the rule module owner. The identifier MUST be unique within a class of rule module providers. The "id" element determines whether a particular Web transaction is relevant to a rule module and thus whether the contained rules have to be processed for this Web request/response. For example, a rule module provided by a content provider should only be processed for Web requests referring to Web resources owned by this particular content provider.

Therefore, if the rule module owner is a content provider, the "id" element MUST contain the domain name(s) of the content provider. If a content provider owns more than one domain and the relevant rule module pertains to more than one of them, the "id" element MAY contain more than one domain name, separated by the "|" character (see "owner" example). The specified domain name(s) MAY also contain a port number. If no port number is specified, then the default port for the specified protocol is assumed, e.g. 80 for HTTP.

If the rule module owner is an access provider, then the "id" element is of less importance since a particular intermediary is usually associated with only one specific access provider.
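The content provider matching described above could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not part of IRML: the names DEFAULT_PORTS, parse_owner_id, and module_applies are hypothetical, only the HTTP default port is taken from the text, and comparing host names case-insensitively is an assumption this document does not state.

```python
# Default ports per protocol. Only the HTTP default (80) is stated in the
# text; further entries would be assumptions.
DEFAULT_PORTS = {"http": 80}


def parse_owner_id(id_value, protocol="http"):
    """Split an "id" value such as "www.yahoo.com|dir.yahoo.com:8000"
    into a set of (host, port) pairs, filling in the protocol default."""
    default_port = DEFAULT_PORTS[protocol.lower()]
    targets = set()
    for entry in id_value.split("|"):
        host, sep, port = entry.strip().partition(":")
        # Assumption: host names are compared case-insensitively.
        targets.add((host.lower(), int(port) if sep else default_port))
    return targets


def module_applies(id_value, request_host, request_port=None, protocol="http"):
    """Return True if a request for request_host[:request_port] refers to a
    resource of the content provider identified by id_value."""
    if request_port is None:
        request_port = DEFAULT_PORTS[protocol.lower()]
    return (request_host.lower(), request_port) in parse_owner_id(id_value, protocol)
```

With the "id" value from the "owner" example, module_applies("www.yahoo.com|dir.yahoo.com:8000", "dir.yahoo.com", 8000) holds, while the same host on the default port does not match.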
If the rule module owner is a client, then a unique client identifier, e.g. a customer id, MUST be chosen in order to associate client rule modules with client requests. These client identifiers are most likely access provider-specific. If an access provider assigns only static IP addresses to its customers, the "id" element can also contain the IP address of the module owner. Otherwise, the dynamic IP addresses of incoming client requests MUST be mapped to the unique client "id" element value in order to determine whether a specific rule module must be processed for a particular client request/server response.

3.3. The "protocol" Element

The "protocol" element contains the name (acronym) of the protocol the rule module pertains to. Although most services operate on HTTP, IRML is not limited to HTTP messages. Any other message-based protocol that fits into the IRML framework can be used.

3.4. Examples of the "owner", "name", "id", "protocol" Elements

   Yahoo Inc.
   www.yahoo.com|dir.yahoo.com:8000
   http

   abeck
   205.167.45.1
   http

3.5. The "rule" Element

The "rule" element contains one or more "property" and/or "action" elements.

   Attributes of "rule"

   Name               Values
   ----------------------------
   processing-point   1|2|3|4

The "processing-point" attribute specifies at which of the four points in figure 1 the rule engine on the intermediary must process a rule. The four "processing-points" are derived from the Extensible Proxy Services Framework as described in [2] and only apply to caching proxies. Implementation architectures for other intermediaries might define different or additional "processing-points".

Figure 1 shows the typical HTTP data flow between a client, a caching proxy, and an origin server. The four processing points (1-4) represent locations in the round trip message flow where rules can be processed and service modules can be executed.
Note that the message flow may skip points 2 and 3 after point 1 if the requested object can be served from cache.

   +--------+       +-----------+       +--------+
   |        |<------|4         3|<------|        |
   | Client |       |  Caching  |       | Origin |
   |        |       |   Proxy   |       | Server |
   |        |------>|1         2|------>|        |
   +--------+       +-----------+       +--------+

      Figure 1: Rule Processing/Service Execution Points

Point 1: Client Request
   An HTTP request from a client has been received. A possible cache lookup has not yet occurred.

Point 2: Proxy Request
   The requested Web object cannot be served from the cache and the origin server is about to be contacted for the HTTP resource.

Point 3: Origin Server Response
   The HTTP response from the origin server has been received. It has not yet been stored in the cache.

Point 4: Proxy Response
   The HTTP response from the cache or the origin server is about to be sent back to the client.

Depending on the service type, rules may be processed and services may be executed at any of the four points outlined in figure 1. A virus scanning service, for instance, could be executed at point 3 in figure 1 in order to scan all Web objects for viruses before they can be stored in the cache. A URL-based request filtering service, on the other hand, should be executed at point 1, and an ad insertion service will probably be executed at point 4.

3.5.1. The "property" Element

The "property" element contains one or more other "property" elements and one or more "action" elements. "property" elements are conditions that, if met, will lead to the execution of the service modules specified in the contained "action" elements. Nested "property" elements represent a hierarchical "AND" relationship. This means that an inner "property" condition can only be true if the outer "property" condition is true and so forth.
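The hierarchical "AND" relationship of nested "property" elements might be evaluated along the following lines. This is a minimal Python sketch under stated assumptions, not a definitive rule engine: the Action and Property classes and the collect_actions function are hypothetical, conditions are modeled as plain predicates rather than as IRML attributes, and the message is assumed to be a dictionary of property names and values. The sketch only collects action URIs in document order; it does not execute services, so it ignores the fact that an executed service may modify message properties.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Action:
    uri: str  # service module URI taken from an "action" element


@dataclass
class Property:
    # Hypothetical stand-in for a "property" condition: a predicate over
    # the message properties instead of IRML attributes.
    condition: Callable[[Dict[str, str]], bool]
    children: List[Union["Property", Action]] = field(default_factory=list)


def collect_actions(node, message):
    """Return the action URIs triggered by node, in document order."""
    if isinstance(node, Action):
        return [node.uri]
    if not node.condition(message):
        return []  # outer condition failed, so inner conditions cannot hold
    uris = []
    for child in node.children:
        uris.extend(collect_actions(child, message))
    return uris


# Example tree: scan every HTML page, translate it only if it is not German.
rule = Property(
    condition=lambda m: m.get("Content-Type", "").startswith("text/html"),
    children=[
        Action("icap://mcaffee.com/viruscheck"),
        Property(
            condition=lambda m: m.get("Content-Language") != "de",
            children=[Action("proxylet://localhost/translate?target=de")],
        ),
    ],
)
```

For an English HTML page both actions are collected, for a German HTML page only the virus check, and for a non-HTML object the outer condition fails and nothing is collected.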
   Attributes of "property"

   Name             Values     Default
   ----------------------------------------
   name             CDATA
   matches          CDATA
   case-sensitive   (yes|no)   "no"

The "name" attribute specifies the name of the message property that is to be matched. This can be either a request or a response message property. The specified property names usually refer to protocol-specific header names. For HTTP messages, for example, the list of protocol-specific header names is defined in [6]. IRML, however, is not limited to the message properties defined in protocol specifications. It also supports any user-defined message properties. Service modules, for instance, could add user-defined headers to request or response messages that would be processed by other service modules.

For HTTP messages, IRML also defines the following property names that cannot be directly mapped to HTTP headers:

   Property Name     Value
   --------------------------------------------------------------
   "request-line"    the first line of a HTTP request
   "response-line"   the first line of a HTTP response
   "request-path"    the relative path of the request URI
   "user-id"         a value to identify a user, assigned by the
                     access provider and unique for all customers
                     of the same access provider

In addition to these HTTP-specific properties, IRML defines environment properties that are independent of the protocol used:

   Property Name     Value
   --------------------------------------------------------------
   "system-date"     system date in the format "yyyymmdd"
   "system-time"     system time in the format "hhmmss"

The values of the environment properties listed above MUST be provided by the service platform.

The "matches" attribute specifies the pattern against which the property value MUST be matched by the rule engine on the intermediary. The "matches" pattern MUST be a regular expression compliant with the basic or extended regular expression syntax as defined in [7].
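The matching of a property value against a "matches" pattern, and the two environment properties, might look as follows in practice. This is a hedged Python sketch, not the normative matching procedure: the function names property_matches and environment_properties are hypothetical, Python's re module is used as a stand-in for the POSIX syntax of [7] (the dialects overlap for simple patterns but are not identical), and treating the pattern as an unanchored search is an assumption this document does not settle.

```python
import re
import time


def property_matches(value, pattern, case_sensitive=False):
    """Test a property value against a "matches" pattern.

    The default mirrors the "case-sensitive" attribute default of "no".
    Assumption: the pattern may match anywhere in the value.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, value, flags) is not None


def environment_properties(now=None):
    """Build the protocol-independent environment properties.

    now is an optional time.struct_time; the local system time is used
    when it is omitted.
    """
    t = time.localtime() if now is None else now
    return {
        "system-date": time.strftime("%Y%m%d", t),  # "yyyymmdd"
        "system-time": time.strftime("%H%M%S", t),  # "hhmmss"
    }
```

A rule engine built this way would merge the dictionary returned by environment_properties with the message headers before evaluating conditions, so that a pattern such as "^2001" could be applied to "system-date" like to any other property.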
If needed, the double-quote character (") MUST be represented in any attribute value as "&quot;" (as specified in [5]).

The "case-sensitive" attribute specifies whether the specified pattern must be matched case-sensitively or not. The default value for this attribute is "no", which means that pattern matching is case-insensitive unless otherwise specified.

If a "rule" element contains an "action" element that is not nested in one or more "property" elements, then the specified action must be performed for all messages that pass through the specified processing point. A user profiling service, for example, may have to be triggered for all user requests.

3.5.2. The "action" Element

The "action" element contains a URI specifying the name, location and optional parameters of the service module that is to be executed on the intermediary or a dedicated application server. For local service modules, the "proxylet" scheme as defined in [8] MUST be used. If the service module resides on a dedicated application server and ICAP [9] is used as the transport protocol, the "action" element MUST contain an ICAP-URI as defined in the current version of the ICAP specification [9]. In both cases, any arguments MAY be passed as part of the service module name using the standard "?"-encoding of attribute-value pairs used in HTTP [6].

Only one service URI MAY be specified per "action" element. A "property" element, however, MAY contain several "action" elements.

3.5.3. Examples of the "rule", "property" and "action" Elements

   requestlog

   proxylet://localhost/translate?target=de
   icap://mcaffee.com/viruscheck

4. Order of Service Execution

The order in which service modules on the intermediary are executed may change the final result of a Web transaction.
For example, an ad insertion service executed against the result of a Web page translation service may produce a different result than the reverse execution order.

Up to three rule modules may have to be processed by a service-enabled intermediary per Web transaction. The order in which these rule modules are processed MUST reflect the order in which request/response messages pass by the rule module authors. This means that for incoming requests at points 1 and 2 in figure 1, the order MUST be:

   1. Client rule module
   2. Access provider rule module
   3. Content provider rule module

For outgoing responses at points 3 and 4, the order MUST be:

   1. Content provider rule module
   2. Access provider rule module
   3. Client rule module

Within a single rule module, the intermediary MUST process and execute all rules and actions IN THE ORDER THEY ARE SPECIFIED in the rule module (both within "property" and "rule" elements).

If the rule processor determines that multiple actions must be executed for any given transaction, it MUST take into account that message property values may be modified by executed service modules. This may require waiting for the completion of a triggered service module before rule conditions of subsequent rules can be evaluated.

5. Security Considerations

Although beyond the scope of this document, it is clearly necessary to define a secure mechanism for transferring rule modules to intermediaries. This will include authenticating and authorizing rule module owners and service-enabled intermediaries. The integrity of rule modules must also be guaranteed.

Also, a security context must be established on the intermediary for each rule module to ensure that rule modules may not execute service modules or call library functions on the intermediary without being authorized to do so.

6. Acknowledgement

The authors would like to thank all the active participants in the OPES mailing list for their thought-provoking discussion; many of their ideas and suggestions have been incorporated into this document. We especially want to acknowledge the following people for their helpful contributions: Lily Yang, Christian Maciocco, Mark Nottingham, and Michael Condry.

7. Reference

   [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

   [2] Tomlinson, G., et al., "Extensible Proxy Services Framework", http://www.ietf.org/internet-drafts/draft-tomlinson-epsfw-00.txt, July 2000.

   [3] Hofmann, M., Beck, A., "Example Services for Network Edge Proxies", http://www.ietf.org/internet-drafts/draft-beck-opes-esfnep-01.txt.

   [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", Request for Comments 2119, Harvard University, March 1997.

   [5] Bray, T., et al., "Extensible Markup Language (XML) 1.0 (Second Edition)", http://www.w3.org/TR/2000/REC-xml-20001006, October 2000.

   [6] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1", Request for Comments 2616, June 1999.

   [7] ISO/IEC DIS 9945-2:1992, Information technology - Portable Operating System Interface (POSIX) - Part 2: Shell and Utilities (IEEE Std 1003.2-1992); X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.

   [8] Maciocco, C., Hofmann, M., "OMML: OPES Meta-data Markup Language", draft-maciocco-opes-omml-00.txt.

   [9] Elson, J., et al., "ICAP, the Internet Content Adaptation Protocol", http://www.ietf.org/internet-drafts/draft-elson-opes-icap-00.txt, December 2000.

Author's Addresses

   Andre Beck
   Markus Hofmann
   Bell Laboratories
   Lucent Technologies
   101 Crawfords Corner Rd.
   Holmdel, New Jersey 07733
   Phone: (732) 332-5983
   Email: {abeck, hofmann}@bell-labs.com

Appendix - IRML DTD

Appendix - Rule Module Examples

Content Provider Rule Module Example for Advertisement Insertion Service

   Lucent Technologies
   www.lucent.com
   http
   icap://adserver.net/insertad

Access Provider Rule Module Example for Advertisement Insertion Service for Free Internet Service

   Comcast Free Internet Service
   www.comcast.com
   http
   icap://adserver.com/insert_ad

Client Rule Module Example for Language Translation and Virus Scanning Service

   Markus Hofmann
   23242
   http
   icap://mcaffee.com/virus_scan?mode=respmod
   <!-- Document language is probably not German -> Page needs to be translated -->
   proxylet://localhost/translate?target=de

Content Provider Rule Module Example for Content Adaptation Service for Wireless Web Access Devices

   Yahoo Inc.
   www.yahoo.com
   http
   icap://wapgateway.nl/transcode

Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.