Internet Draft                                           Bert Culpepper
draft-culpepper-sipping-app-interact-reqs-01.txt  InterVoice-Brite, Inc.
May 1, 2002                                     Robert Fairlie-Cuninghame
Expires: November, 2002                        Nuera Communications, Inc.

             Network Application Interaction Requirements

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This particular draft is intended to be discussed in the SIPPING
   Working Group, so discussion of it belongs on that group's mailing
   list.  The charter for the SIPPING working group may be found at
   http://www.ietf.org/html.charters/sipping-charter.html

Abstract

   This document defines the requirements for a mechanism that supports
   the interaction of SIP-based user agents with applications residing
   on remote network servers.  SIP currently supports media-based
   application interactions using methods such as speech, video and
   end-to-end telephony-related tones; however, it is desirable to
   support more general application interaction models, especially
   models that are not restricted to the media plane.  An application
   should be able to present the user with application-specific user
   interfaces and information, and the user must also be able to
   generate activity indications back to the application to communicate
   actions on physical or virtual user interfaces.
   The document also defines a number of topic-related terms to assist
   in disambiguating discussions of the issues.

Culpepper/Fairlie-Cuninghame                                    [Page 1]

Internet Draft     Application Interaction Requirements      May 1, 2002

1. Motivation

   Telecommunications services in circuit-switched networks have used
   end-user indications as the means for users to interact with
   services while engaged in a call.  These end-user indications, such
   as those produced by a user pressing keys, are sent end-to-end
   through each of the network entities participating in the call.  As
   communications services move to IP networks, the ability for users
   to interact with their communications services in near real time
   must follow.

   Users of communications services have become accustomed to
   controlling services through interaction via the communications
   terminal.  The traditional means by which users interact with their
   communications services in legacy networks is DTMF, generated when
   the user presses a key on the terminal's keypad.  Because of this,
   there is a significant desire to duplicate the use of DTMF to
   support user interaction with services tightly associated with IP
   communications sessions.

   The Internet network model for communications separates session
   control from the session media: the devices involved in session
   control are not necessarily tightly coupled to the devices that
   process media.  Because DTMF is transported in IP networks as a
   media stream, access to these user indications by the network
   entities involved in session control is awkward.  In addition,
   limiting user interaction with communications services to input
   devices that emulate the traditional telephone keypad constrains
   user devices unnecessarily.  Beyond legacy application interaction
   methods such as DTMF, there is a desire for new interaction methods,
   such as web pages, keyboards and other input devices, to be
   available.
   These new interaction methods should operate, from a user's
   perspective, in a consistent and seamless manner alongside legacy
   methods such as DTMF.  For these reasons, a mechanism different from
   that of legacy networks is needed to transport user indications for
   service (application) interaction in IP networks.

   The Session Initiation Protocol (SIP) [2] has been chosen as the
   session control protocol for multimedia session establishment in IP
   networks.  Because of this choice, it is desirable to have a
   mechanism supporting user service interaction that works with SIP.
   As SIP deals with session control and not media transport, the
   mechanism should not be limited to the media plane.  While other
   protocol approaches have been proposed, none is seen as supporting
   dynamic, real-time application interaction on many of the devices
   used for personal communications.

2. Terminology

   The following acronyms and terms are used in this document.

   Requestor: The agent responsible for requesting user indications or
   application presentations from the Reporter.  The Requestor is
   normally associated with the Application Entity.

   Reporter: The agent responsible for detecting and reporting user
   activity indications or presenting a user application component to
   the user.  This framework restricts the Reporter to being a SIP UA,
   normally associated with the User or User Device.

   UA: SIP User Agent [2].

   User Activity Indication (UAI): The message(s) containing the data
   associated with the reporting of discrete user indications, for
   instance, a mouse click or button press.  The term refers to
   indications relating to discrete stimulus-based interactions rather
   than media stream-based interactions such as voice or video.

   Physical User Interface: The collection of physical input and
   presentation devices possessed by a device, for instance, a display,
   speaker, microphone or dialpad.
   Virtual User Interface (VUI): The logical collection of user
   interface components used by a user to interact with a group of
   (explicitly) cooperating applications.  A virtual user interface is
   independent of all other application interactions occurring on the
   device.

   User Interface Component (UIC): A logical component of a VUI
   (physical or otherwise) used for network application interaction.
   Examples of UICs include: a web-page window, a media-based video
   window, a speaker, a microphone or a key-based input device.  A user
   interface component may only generate user activity indications when
   the user is interacting with the associated virtual user interface.

   Presentation-based Interaction: A presentation-based UIC presents an
   application-supplied user interface (or simply application-supplied
   information) to the user.  A presentation-based component will also
   commonly allow a user to interact directly with the supplied
   interface through stimulus-based methods (UAIs).  An example is a
   web-page window with a pointing device, or simply a display screen
   with no associated input device.

   Media-based Interaction: Media-based UI components allow
   bidirectional or unidirectional interaction through the media plane,
   for instance, a speaker or a microphone (unidirectional) or a
   speaker and microphone combination (bidirectional).  Media-based UI
   components may present application-supplied user interfaces or
   information to the user; however, these components do not generate
   discrete user activity indications and merely relay uninterpreted
   media streams to/from the application.  This framework does not
   alter the normal SIP session semantics but simply allows the media-
   based SIP session to be associated with a UI component within a
   virtual user interface.
   Input-based Interaction: Input-based interaction refers to user
   input supplied via UICs that do not present an application-supplied
   interface to the user but rather correspond to a (usually physical)
   interface possessed by the device, for instance, a dialpad or
   keyboard.  Input-based UI components generate User Activity
   Indications in response to user actions.

3. End-to-end Versus Asynchronous User Activity Indications

   The end-to-end user activity indications currently supported in IP
   networks require workarounds in SIP networks so that applications
   along the session signaling path have access to the indications.
   The current solution requires that "DTMF forking" be supported by
   the endpoint, or requires the receiving entity to re-generate the
   indication towards the destination.  In many scenarios, the
   indications meant for the service application are not used at the
   destination.  User activity indications needed for application
   interaction, on the other hand, are only needed between an
   endpoint/user and the application within the network.  Using
   end-to-end mechanisms for application interaction, when the
   application is not itself an endpoint in the session, is therefore
   problematic.

4. Application Interaction Models

4.1. Presentation-based Interaction

   Refer to the description of Presentation-based Interaction in
   Section 2.

4.2. Media-based Interaction

   Refer to the description of Media-based Interaction in Section 2.

4.3. Input-based Interaction

   As defined in Section 2, Input-based Interaction occurs through the
   generation of User Activity Indications from user actions.  Two
   models have been proposed for how these indications should represent
   the user actions.

4.3.1. High-level Interaction

   In this model an end device has embedded application-specific
   knowledge and configuration.
   Rather than interacting through a set of key presses, the
   interaction occurs through an application-specific set of
   operations, for instance, "Go left", "Go right", and "Jump".  The
   device or network must store a mapping from the actual device
   interface to the application-specific operations.  An alternative
   way of viewing this form of interaction is that the set of
   operations is simply a set of application-specific stimuli.

   Advantages:

   - Application interaction is independent of the device's actual
     interface.

   - Automatons can more easily be used to interact with supported
     applications.

   Disadvantages:

   - End devices are forced to incorporate application-specific
     knowledge or configuration in order to be able to use a service;
     this severely restricts the development and deployment of future
     applications.

   - Local devices are required to create and store the local mapping
     between the user interface and the application-specific stimuli
     (or the mapping must be stored somewhere in the network).

   The disadvantages of this approach are as great as the advantages.
   It is unacceptable to require that a device incorporate
   application-specific knowledge in order to use a service; such a
   requirement runs counter to the design principles of the IETF.  For
   this reason this method of interaction is NOT sufficient on its own.
   Another drawback of this approach is that it does not intrinsically
   encourage application interoperability.  This method of application
   interaction has been suggested by a number of people at IETF
   meetings; however, the working group needs to decide whether or not
   this method of interaction should be encouraged.

4.3.2. Low-level Interaction

   In this model an application uses the mechanism to determine the
   makeup of a device's user interface, and interactions are then
   driven through stimuli associated with that user interface.
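   As a non-normative illustration, the low-level model can be
   sketched as follows: the Reporter advertises its components, the
   Requestor (application) decides how to use them, and raw stimuli
   (UAIs) are delivered uninterpreted.  Every class, field and stimulus
   name below is an illustrative assumption, not proposed protocol
   syntax.

```python
# Sketch of low-level interaction: the device reports what it has,
# the application interprets the raw stimuli.  Names are illustrative.
from dataclasses import dataclass


@dataclass
class UIComponent:
    """A user interface component advertised by the Reporter (device)."""
    component_id: str
    kind: str     # e.g. "keypad", "display", "pointer"
    stimuli: list  # UAI types this component can generate


class Application:
    """A Requestor that adapts to whatever interface the device reports."""

    def __init__(self):
        self.bound = {}     # component_id -> component kind in use
        self.received = []  # raw stimuli (UAIs) delivered so far

    def discover(self, components):
        # The application, not the device, decides how to use the
        # available components -- e.g. it may fall back to a keypad
        # when no pointer component is present.
        for c in components:
            if c.kind in ("keypad", "pointer"):
                self.bound[c.component_id] = c.kind

    def report(self, component_id, stimulus):
        # Deliver a raw, application-agnostic stimulus; the application
        # alone decides what "key-down:5" means for its service.
        if component_id in self.bound:
            self.received.append(stimulus)


app = Application()
app.discover([UIComponent("kp1", "keypad", ["key-down", "key-up"])])
app.report("kp1", "key-down:5")
```

   Note that the device carries no application knowledge at all here;
   the same keypad stimuli could drive any application.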
   Advantages:

   - Devices do not need to incorporate application-specific
     information or configuration.

   - An application can adapt its operation to best suit the interface
     that a device possesses.  For instance, this may mean greater
     reliance on non-tactile methods such as voice recognition.  The
     application is the only entity qualified to make these sorts of
     decisions.

   - In most instances, some level of application interaction is
     always possible, albeit perhaps through a less graceful interface.

   Disadvantages:

   - User interface widgets cannot automatically be used with an
     application if the application does not recognize the widget
     (although this can be sidestepped by using local mapping
     configuration).

4.3.3. Input-based Interaction Summary

   These two schemes are not mutually exclusive, and the benefits of
   both can be obtained.  A device can utilize an enhanced level of
   interaction when interacting with an application that the device
   (or network) has knowledge and/or configuration for; likewise, an
   application can fall back to low-level interaction if the device
   (or network) does not possess the required application-specific
   knowledge.  In an extensible input-based interaction framework it
   is difficult to prevent either interaction model; however, authors
   may find the IETF reluctant to standardize application-specific
   interfaces (which is similar to the standardization of services).

5. Requirements

   R1: The mechanism must support collecting device/user input
       generated in the context of a SIP dialog or conversation-space.

   R2: The mechanism must transport user activity indications to
       network elements independently of the media plane.

   R3: The transport mechanism must be sensitive to the limited
       bandwidth constraints of some signaling planes; for instance,
       reliability through blind retransmission is not acceptable.
   R4: The mechanism must support multiple network entities requesting
       and receiving indications from a SIP UA independently of each
       other.

   R5: The protocol mechanism must provide a means for a network
       entity to indicate its desire to receive user activity
       indications and/or to present an application interface on the
       User's UA.  The protocol mechanism must also provide a means
       for a SIP UA receiving such a request to respond with its
       capability/intent to provide the requested services.

   R6: The mechanism must provide a means whereby the Requestor can
       indicate its desire to receive only a subset of the supported
       UAIs possible for any UI component.

   R7: User activity indications must not be generated unless
       implicitly or explicitly requested by an entity.

   R8: The mechanism should support devices with a wide range of user
       interfaces for both the presentation-based and input-based
       interaction modes.  For instance, it must support devices that
       possess a display UI component as well as those that do not,
       and devices that only have physical buttons as well as those
       that only have display-based pointing devices.

   R9: For key-based indications, the mechanism must accommodate
       devices with keypads/keyboards that range from very simple
       keypads to generic computer keyboards; the mechanism must also
       support the reporting of device- and/or user-specific buttons.

   R10: The mechanism must be extensible so that non-key-based user
       activity indications can be supported now or in the future, for
       instance, sliders, dials, switches, local voice-commands,
       hyperlinks, biometrics, etc.

   R11: A Requestor must be able to determine the makeup/contents of
       the user interface possessed by a user device: specifically, to
       determine the user interface components that are available for
       application use and the user activity indications that are
       supported by each component.
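   R11 implies that a device's interface makeup can be expressed as
   structured data that a Requestor can query.  The following is a
   minimal, hypothetical sketch of such a description; every field
   name is an assumption made for illustration, not proposed syntax.

```python
# Hypothetical description of a device's virtual user interface, of
# the kind a Requestor might obtain under R11.  Field names assumed.
device_ui = {
    "virtual-ui": "vui-1",
    "components": [
        {
            "id": "keypad-1",
            "type": "key-input",
            "keys": ["0", "1", "2", "3", "4", "5",
                     "6", "7", "8", "9", "*", "#"],
            # Separate press/release indications would also let the
            # Requestor derive key press duration (R13).
            "indications": ["press", "release"],
        },
        {
            "id": "display-1",
            "type": "presentation",
            # A presentation component renders application-supplied
            # content but generates no UAIs of its own here.
            "markup": ["text/plain"],
        },
    ],
}


def supported_indications(ui, component_id):
    """Return the UAIs a given component can generate (empty if none)."""
    for c in ui["components"]:
        if c["id"] == component_id:
            return c.get("indications", [])
    return []
```

   With such a description, an application could, for example, choose
   a keypad-driven dialog when no display component is advertised.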
   R12: The mechanism must support delivery of UAIs at least as
       reliable as that of the session control protocol.

   R13: For key-based indications, the mechanism must provide some
       form of indication of key press duration.

   R14: For key-based indications, the mechanism must provide some
       form of indication of a key press's occurrence in time relative
       to other key presses.

   R15: The mechanism must ensure that the receiver of UAIs (i.e., the
       Requestor) can determine their original order of occurrence and
       detect any missing indications.

   R16: The mechanism must allow for end-to-end security/privacy
       between the Requestor and Reporter.  Specifically, the
       mechanism must allow the Reporter (if desired) to ensure that
       transmitted user activity indications can only be viewed by the
       Requestor.

   R17: The Reporter must be able to identify and authenticate the
       Requestor for each user interface component.  Specifically, in
       the case where the Requestor is an Application Entity, the User
       must be able to identify the application name and instance.  An
       application identity consists of the application type [e.g.,
       application name, version and application designer name] and
       the application instance [e.g., instance identifier and service
       provider's identity].

   R18: The mechanism must support the ability for multiple virtual
       user interfaces to be associated with the same user session.
       Each virtual user interface may be associated with the same or
       different applications.  For example, a user may want to
       interact with a voice-recording application and a prepaid
       calling application within the same call but allow each
       application to use a different virtual user interface.

   R19: The mechanism must support the ability for multiple
       applications to explicitly cooperate within the same virtual
       user interface.  Specifically, each application may be
       associated with different UI components within the same virtual
       user interface.
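   To illustrate one of the requirements above: the ordering and
   loss-detection properties of R15 could be obtained by having the
   Reporter number each UAI per component.  The sketch below assumes a
   "seq" field; the field name and message shape are illustrative
   only, not proposed syntax.

```python
# Sketch of R15: sequence-numbered UAIs let the Requestor restore the
# original order of occurrence and detect any missing indications.

def check_sequence(received):
    """Sort UAIs by sequence number and report any missing ones."""
    if not received:
        return [], []
    ordered = sorted(received, key=lambda uai: uai["seq"])
    seqs = [uai["seq"] for uai in ordered]
    # Any gap between the lowest and highest observed sequence numbers
    # indicates a lost (or not-yet-arrived) indication.
    missing = [s for s in range(seqs[0], seqs[-1] + 1) if s not in seqs]
    return ordered, missing


# UAIs may arrive out of order; here seq 3 was lost in transit.
uais = [{"seq": 2, "key": "5"}, {"seq": 1, "key": "1"},
        {"seq": 4, "key": "#"}]
ordered, missing = check_sequence(uais)
# ordered restores presses 1, 5, # in order; missing == [3]
```

   Whether the gap triggers retransmission or merely an error report
   is a transport-design question left open by the requirement.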
   R20: The mechanism must allow user interface components created
       through this mechanism to be updated or removed as desired by
       the creating application entity.

   R21: The mechanism should not require the acknowledgement of a
       transmitted UAI before subsequent UAIs can be transmitted.

6. Desirables

   D1: The mechanism should be simple to implement and execute on
       devices with simple interfaces.

   D2: There should be a separation between the transport mechanism in
       the signaling plane and the message syntax.

   D3: The mechanism should allow a device to indicate relative
       preferences amongst the various user interface components.

7. Acknowledgements

   The authors would like to acknowledge the detailed comments on and
   additions to this document by Jonathan Rosenberg of Dynamicsoft,
   Inc.

8. Authors

   Robert Fairlie-Cuninghame
   Nuera Communications, Inc.
   50 Victoria Rd
   Farnborough, Hants GU14-7PG
   United Kingdom
   Phone: +44-1252-548200
   Email: rfairlie@nuera.com

   Bert Culpepper
   InterVoice-Brite, Inc.
   701 International Parkway
   Heathrow, FL 32746
   Phone: 407-357-1536
   Email: bert.culpepper@intervoice-brite.com

9. References

   [1] S. Bradner, "The Internet Standards Process -- Revision 3",
       BCP 9, RFC 2026, October 1996.

   [2] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session Initiation
       Protocol", draft-ietf-sip-rfc2543bis-09, work in progress,
       February 2002.