Internet Draft                                            Bert Culpepper
draft-culpepper-sipping-app-interact-reqs-02.txt
November 3, 2002                               Robert Fairlie-Cuninghame
Expires: May 2003                             Nuera Communications, Inc.

Session Initiation Protocol Based Application Interaction Requirements

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This particular draft is intended to be discussed in the SIPPING Working Group. Discussion of it therefore belongs on that list. The charter for the SIPPING working group may be found at http://www.ietf.org/html.charters/sipping-charter.html

Abstract

This document defines the high-level requirements for a framework and/or one or more mechanisms that support user interaction, via SIP-based user agents, with applications residing on remote network servers. The requirements in this document address the overall features of such a system, without regard to its architecture. SIP currently supports media-based application interactions using methods such as speech, video and end-to-end telephony-related tones; however, it is desired that more general application interaction models be defined, especially those that are not restricted to the media plane.
In addition, it is desired that an application be able to present the user with application-specific user interfaces and information. The user agent should also be able to generate activity indications back to an application to communicate actions on physical or logical user interfaces. The document also defines a number of topic-related terms to assist in disambiguating discussions of the issues.

1. Motivation

Telecommunications services in circuit-switched networks have utilized end-user indications as the means for users to interact with the services while users are engaged in a call. These end-user indications, such as those produced by a user pressing keys, are sent end-to-end through each of the network entities participating in the call.

As communications services move to IP networks, the ability for users to interact with their communications services in a real-time-like fashion must also follow. Unlike in legacy circuit-switched networks, nodes hosting services in IP networks infrequently reside along the path taken by the media.

Users of communications services have become accustomed to controlling services through interaction via the communications terminal. The traditional means by which users interact with their communications services in legacy networks is DTMF, generated as a result of the user pressing a key on the terminal's keypad. Because of this, there is a significant desire to duplicate the use of DTMF to support user interaction with services tightly associated with IP communications sessions.

The Internet network model for communications separates session control from the session media in that the devices involved in session control are not necessarily tightly coupled to the devices that process media.
As the transport of DTMF is provided for in IP networks as a media stream, access to these user indications by the network entities involved in session control is awkward. In addition, limiting user interaction with communications services to input devices that emulate the traditional telephone keypad constrains the user devices unnecessarily.

In addition to legacy application interaction methods such as DTMF, there is a desire for new interaction methods that support the use of web pages, keyboards and other user devices used to access the Internet. These new interaction methods should operate, from a user's perspective, in a consistent and seamless manner with legacy methods such as DTMF.

For these reasons, a mechanism different from that used in legacy networks is needed to transport user indications for application interaction in IP networks.

The Session Initiation Protocol (SIP) [2] has been chosen as the session control protocol for multimedia session establishment within the general Internet and in many other IP-based networks. Because of this choice, it is desirable to have a mechanism supporting user application interaction that works with SIP. As SIP deals with session control and not media transport, the mechanism should not be limited to the media plane.

2. Terminology

The following acronyms and terms are used in this document.

Requestor: The agent responsible for requesting user indications or application presentations from the Reporter. The Requestor is normally associated with the Application Entity.

Reporter: The agent responsible for detecting and reporting user activity indications or presenting a user application component to the user. This framework restricts the Reporter to being a SIP UA; the Reporter is normally associated with the User or User Device.

UA: SIP User Agent [2].

User Activity Indication (UAI): The message(s) containing the data associated with the reporting of discrete user indications, for instance, a mouse click or button press. The term refers to indications relating to discrete stimulus-based interactions rather than media stream-based interactions such as voice or video.

Physical User Interface: The collection of physical input and presentation devices possessed by a device, for instance, a display, speaker, microphone or dialpad.

Logical User Interface (LUI): The logical collection of user interface components (see definition below) used by a user to interact with a group of (explicitly) cooperating applications. A logical user interface is independent of all other application interactions occurring on the device.

User Interface Component (UIC): A component (physical or otherwise) used for application interaction. Examples of UICs include a web-page window, a media-based video window, a speaker, a microphone or a key-based input device. A UIC may only generate user activity indications when the user is interacting with the associated logical user interface.

Presentation-based Interaction: A presentation-based UIC presents an application-supplied user interface (or simply application-supplied information) to the user. A presentation-based component will also commonly allow a user to interact directly with the supplied interface through stimulus-based methods (UAIs). Examples are a web-page window with a pointing device, or simply a display screen with no associated input device.

Media-based Interaction: Media-based interaction refers to user input supplied via UICs that process media (e.g., audio). Media-based UI components allow bi-directional or unidirectional interaction through the media plane, for instance, a speaker or a microphone (unidirectional) or a speaker and microphone combination (bi-directional).
Media-based UICs may present application-supplied user interfaces or information to the user; however, these components do not generate discrete user activity indications and merely relay uninterpreted media streams to and from the application. The resulting framework does not alter the normal SIP session semantics but simply allows the media-based SIP session to be associated with a UIC within a logical user interface.

Input-based Interaction: Input-based interaction refers to user input supplied via UICs that do not present an application-supplied interface to the user but rather correspond to a (usually physical) interface possessed by the device, for instance, a dialpad or keyboard. Input-based UI components generate User Activity Indications in response to user actions.

3. End-to-end Versus Asynchronous User Activity Indications

The end-to-end user activity indications currently supported in IP networks require "workarounds" in SIP networks so that applications along the session signaling path have access to the indications. The current solution requires either that "DTMF forking" be supported by the endpoint, or that the receiving entity, when it is not the final destination for the session's media, regenerate the indication towards the destination. In many scenarios, the indications meant for the application are not used at the destination.

User activity indications needed for application interaction, on the other hand, are only needed between an endpoint/user and the application within the network. Using end-to-end mechanisms for application interaction, when the application is not itself an endpoint in the session, is problematic as indicated above.

4. General Requirements

R1: The framework must support the collection of device/user input generated in the context of a SIP dialog or conversation-space.
R2: The framework must transport user activity indications to network elements independently of the media plane.
R3: The transport mechanism must be sensitive to the bandwidth constraints of some signaling planes; for instance, reliability through blind retransmission is not acceptable.
R4: The framework must support multiple network entities or applications requesting and receiving user activity indications from a SIP UA independently of each other.
R5: The framework must provide a means for a network application/entity to indicate its desire to receive user activity indications and/or to present an application interface on the User's UA.
R6: The framework must support a means for a Requestor to determine the user interface components that are available at the UA for application use.
R7: The framework must provide a means for a SIP UA to indicate its capability/intent to fulfill a request for user activity indications.
R8: The framework must provide a means whereby the Requestor can indicate its desire to receive only a subset of the supported user activity indications for any non-trivial UI component.
R9: The framework must provide a means to prevent the transport of UAIs unless implicitly or explicitly requested by an entity.
R10: The framework should support devices with a wide range of user interfaces for both presentation-based and input-based interaction modes; for instance, it must support devices that possess a display UI component as well as those that do not, and devices ranging from those that only have physical buttons to those that only have display-based pointing devices.
R11: The framework must be extensible so that a variety of non-key-based user activity indications can be supported now or in the future, for instance, sliders, dials, switches, local voice commands, hyperlinks, biometrics, etc.
R12: The framework must support delivery of UAIs that is at least as reliable as the session control protocol.
R13: The framework must ensure that the receiver of user activity indications (i.e., the Requestor) can determine their original order of occurrence and detect any missing indications.
R14: The framework must allow the user to know which application is associated with each UIC.
R15: The framework must provide a mechanism that allows users to be assured that the input they provide is seen only by the application that created the user interface component.
R16: The framework must support the ability for each user interface component to be associated with a separate logical user interface. Each logical user interface may be associated with the same or different applications. For example, a user may want to interact with a voice-recording application and a prepaid calling application within the same call but allow each application to use a different logical user interface.
R17: The framework must allow user interface components created through this mechanism to be updated or removed as desired by the creating application entity.
R18: Unless authorized by the user, application interaction resources established through this framework should be terminated by the User Agent when they are no longer associated with a SIP dialog.

5. Key-Based Input Specific Requirements

K1: The framework must address the collection of DTMF-based user activity indications.
K2: The framework must address the collection of user activity indications for device- and/or user-specific buttons.
K3: For key-based indications, the framework must provide some form of indication of key press duration.
K4: For key-based indications, the framework must provide some form of indication of when a key press occurred in time relative to other key presses.

6. Desirables

D1: The framework should allow a device to indicate relative preferences amongst its various supported user interface components.
D2: To help manage feature interaction, the framework should also allow a means of prioritizing user interface component requests from multiple network entities within a single SIP dialog.

7. Acknowledgements

The authors would like to acknowledge the detailed comments and additions to this document by Jonathan Rosenberg of Dynamicsoft, Inc. and Eric Cheung of AT&T Labs.

8. Authors

Robert Fairlie-Cuninghame
Nuera Communications, Inc.
50 Victoria Rd
Farnborough, Hants GU14-7PG
United Kingdom
Phone: +44-1252-548200
Email: rfairlie@nuera.com

Bert Culpepper
Email: bertculpepper@netscape.net

9. References

[1] S. Bradner, "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session Initiation Protocol", RFC 3261, June 2002.
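
As an informal illustration only, the following sketch shows one way a Requestor might satisfy R13, K3 and K4 on the receiving side. This document does not define any message format, so everything here is an assumption: the KeyIndication record, its field names, the per-Reporter sequence number that increases by one, and the millisecond timestamps taken from the Reporter's clock are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyIndication:
    """Hypothetical UAI for one key press; field names are illustrative only."""
    seq: int          # per-Reporter sequence number, assumed to increase by 1
    key: str          # e.g. a DTMF digit such as "5" or "#"
    press_ms: int     # press time on the Reporter's clock, in milliseconds
    release_ms: int   # release time on the same clock

    @property
    def duration_ms(self) -> int:
        # K3: key press duration
        return self.release_ms - self.press_ms


def restore_order(received):
    """R13: return (indications in original order, missing sequence numbers).

    Only gaps between the lowest and highest received sequence numbers can
    be detected; losses before the first or after the last cannot.
    """
    ordered = sorted(received, key=lambda i: i.seq)
    if not ordered:
        return [], []
    seen = {i.seq for i in ordered}
    first, last = ordered[0].seq, ordered[-1].seq
    missing = [n for n in range(first, last + 1) if n not in seen]
    return ordered, missing


def gaps_between_presses(ordered):
    """K4: time from each key press to the next one, in milliseconds."""
    return [b.press_ms - a.press_ms for a, b in zip(ordered, ordered[1:])]


# Indications arrive out of order, and the one with seq=2 was lost.
received = [
    KeyIndication(seq=3, key="3", press_ms=900, release_ms=1000),
    KeyIndication(seq=1, key="1", press_ms=0, release_ms=120),
    KeyIndication(seq=4, key="#", press_ms=1500, release_ms=2300),
]
ordered, missing = restore_order(received)
```

Carrying a sequence number and timestamps in each indication is only one possible design; it lets the Requestor reorder and detect loss without depending on the ordering guarantees of the transport.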