Internet Draft                                           Bert Culpepper
draft-culpepper-sipping-app-interact-reqs-01.txt  InterVoice-Brite, Inc.
May 1, 2002                                     Robert Fairlie-Cuninghame
Expires: November, 2002                        Nuera Communications, Inc.

             Network Application Interaction Requirements

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This particular draft is intended to be discussed in the SIPPING
   Working Group, so discussion of it belongs on that group's mailing
   list.  The charter for the SIPPING working group may be found at
   http://www.ietf.org/html.charters/sipping-charter.html

Abstract

   This document defines the requirements for a mechanism that supports
   the interaction of SIP-based user agents with applications residing
   on remote network servers.  SIP currently supports media-based
   application interactions using methods such as speech, video and
   end-to-end telephony-related tones; however, it is desirable to
   support more general application interaction models, especially
   models that are not restricted to the media plane.  An application
   should be able to present the user with application-specific user
   interfaces and information, and the user must also be able to
   generate activity indications back to the application to communicate
   actions on physical or virtual user interfaces.
   The document also defines a number of topic-related terms to assist
   in disambiguating discussions of the issues.

Culpepper/Fairlie-Cuninghame                                    [Page 1]

Internet Draft     Application Interaction Requirements      May 1, 2002

1. Motivation

   Telecommunications services in circuit-switched networks have used
   end-user indications as the means for users to interact with
   services while engaged in a call.  These end-user indications, such
   as those produced by a user pressing keys, are sent end-to-end
   through each of the network entities participating in the call.  As
   communications services move to IP networks, the ability for users
   to interact with their communications services in near real time
   must follow.

   Users of communications services have become accustomed to
   controlling services through interaction via the communications
   terminal.  The traditional means by which users interact with their
   communications services in legacy networks is DTMF, generated when
   the user presses a key on the terminal's keypad.  Because of this,
   there is a significant desire to duplicate the use of DTMF to
   support user interaction with services tightly associated with IP
   communications sessions.

   The Internet network model for communications separates session
   control from the session media: the devices involved in session
   control are not necessarily tightly coupled to the devices that
   process media.  Because DTMF is transported in IP networks as a
   media stream, access to these user indications by the network
   entities involved in session control is awkward.  In addition,
   limiting user interaction with communications services to input
   devices that emulate the traditional telephone keypad constrains
   user devices unnecessarily.  Beyond legacy application interaction
   methods such as DTMF, there is a desire for new interaction methods,
   such as web pages, keyboards and other input devices, to be
   available.
   These new interaction methods should operate, from a user's
   perspective, in a consistent and seamless manner alongside legacy
   methods such as DTMF.  For these reasons, a mechanism different from
   that of legacy networks is needed to transport user indications for
   service (application) interaction in IP networks.

   The Session Initiation Protocol (SIP) [2] has been chosen as the
   session control protocol for multimedia session establishment in IP
   networks.  Because of this choice, it is desirable to have a
   mechanism supporting user service interaction that works with SIP.
   As SIP deals with session control and not media transport, the
   mechanism should not be limited to the media plane.  While other
   protocol approaches have been proposed, none is seen as supporting
   dynamic, real-time application interaction on many of the devices
   used for personal communications.

2. Terminology

   The following acronyms and terms are used in this document.

   Requestor: The agent responsible for requesting user indications or
   application presentations from the Reporter.  The Requestor is
   normally associated with the Application Entity.

   Reporter: The agent responsible for detecting and reporting user
   activity indications or presenting a user application component to
   the user.  This framework restricts the Reporter to being a SIP UA,
   normally associated with the User or User Device.

   UA: SIP User Agent [2].

   User Activity Indication (UAI): The message(s) containing the data
   associated with the reporting of discrete user indications, for
   instance, a mouse click or button press.  The term refers to
   indications relating to discrete stimulus-based interactions rather
   than media stream-based interactions such as voice or video.

   Physical User Interface: The collection of physical input and
   presentation devices possessed by a device, for instance, a display,
   speaker, microphone or dialpad.
   Virtual User Interface (VUI): The logical collection of user
   interface components used by a user to interact with a group of
   (explicitly) cooperating applications.  A virtual user interface is
   independent of all other application interactions occurring on the
   device.

   User Interface Component (UIC): A logical component of a VUI
   (physical or otherwise) used for network application interaction.
   Examples of UICs include: a web-page window, a media-based video
   window, a speaker, a microphone or a key-based input device.  A user
   interface component may only generate user activity indications when
   the user is interacting with the associated virtual user interface.

   Presentation-based Interaction: A presentation-based UIC presents an
   application-supplied user interface (or simply application-supplied
   information) to the user.  A presentation-based component will also
   commonly allow a user to interact directly with the supplied
   interface through stimulus-based methods (UAIs).  An example is a
   web-page window with a pointing device, or simply a display screen
   with no associated input device.

   Media-based Interaction: Media-based UI components allow
   bidirectional or unidirectional interaction through the media plane,
   for instance, a speaker or a microphone (unidirectional) or a
   speaker and microphone combination (bidirectional).  Media-based UI
   components may present application-supplied user interfaces or
   information to the user; however, these components do not generate
   discrete user activity indications and merely relay uninterpreted
   media streams to/from the application.  This framework does not
   alter the normal SIP session semantics but simply allows the media-
   based SIP session to be associated with a UI component within a
   virtual user interface.
   Input-based Interaction: Input-based interaction refers to user
   input supplied via UICs that do not present an application-supplied
   interface to the user but rather correspond to a (usually physical)
   interface possessed by the device, for instance, a dialpad or
   keyboard.  Input-based UI components generate User Activity
   Indications in response to user actions.

3. End-to-end Versus Asynchronous User Activity Indications

   The end-to-end user activity indications currently supported in IP
   networks require workarounds in SIP networks so that applications
   along the session signaling path have access to the indications.
   The current solution requires that "DTMF forking" be supported by
   the endpoint, or requires the receiving entity to re-generate the
   indication towards the destination.  In many scenarios, the
   indications meant for the service application are not used at the
   destination.  User activity indications needed for application
   interaction, on the other hand, are only needed between an
   endpoint/user and the application within the network.  Using
   end-to-end mechanisms for application interaction, when the
   application is not itself an endpoint in the session, is therefore
   problematic.

4. Application Interaction Models

4.1. Presentation-based Interaction

   Refer to the description of Presentation-based Interaction in
   Section 2.

4.2. Media-based Interaction

   Refer to the description of Media-based Interaction in Section 2.

4.3. Input-based Interaction

   As defined in Section 2, Input-based Interaction occurs through the
   generation of User Activity Indications from user actions.  Two
   models have been proposed for how these indications should represent
   the user actions.

4.3.1. High-level Interaction

   In this model an end device has embedded application-specific
   knowledge and configuration.
   Rather than interacting through a set of key presses, the
   interaction occurs through an application-specific set of
   operations, for instance, "Go left", "Go right", and "Jump".  The
   device or network must store a mapping from the actual device
   interface to the application-specific operations.  An alternative
   way of viewing this form of interaction is that the set of
   operations is simply a set of application-specific stimuli.

   Advantages:

   - Application interaction is independent of the device's actual
     interface.

   - Automatons can more easily be used to interact with supported
     applications.

   Disadvantages:

   - End devices are forced to incorporate application-specific
     knowledge or configuration in order to be able to use a service;
     this severely restricts the development and deployment of future
     applications.

   - Local devices are required to create and store the local mapping
     between the user interface and the application-specific stimuli
     (or the mapping must be stored somewhere in the network).

   The disadvantages of this approach are as great as the advantages.
   It is unacceptable to require that a device incorporate
   application-specific knowledge in order to use a service; such a
   requirement runs counter to the design principles of the IETF.  For
   this reason this method of interaction is NOT sufficient on its own.
   Another drawback of this approach is that it does not intrinsically
   encourage application interoperability.  This method of application
   interaction has been suggested by a number of people at IETF
   meetings; however, the working group needs to decide whether or not
   this method of interaction should be encouraged.

4.3.2. Low-level Interaction

   In this model an application uses the mechanism to determine the
   makeup of a device's user interface, and interactions are then
   driven through stimuli associated with that user interface.
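   As a non-normative illustration, the low-level model can be
   sketched as follows: the Reporter advertises its components, the
   Requestor (application) decides how to use them, and raw stimuli
   (UAIs) are delivered uninterpreted.  Every class, field and stimulus
   name below is an illustrative assumption, not proposed protocol
   syntax.

```python
# Sketch of low-level interaction: the device reports what it has,
# the application interprets the raw stimuli.  Names are illustrative.
from dataclasses import dataclass


@dataclass
class UIComponent:
    """A user interface component advertised by the Reporter (device)."""
    component_id: str
    kind: str     # e.g. "keypad", "display", "pointer"
    stimuli: list  # UAI types this component can generate


class Application:
    """A Requestor that adapts to whatever interface the device reports."""

    def __init__(self):
        self.bound = {}     # component_id -> component kind in use
        self.received = []  # raw stimuli (UAIs) delivered so far

    def discover(self, components):
        # The application, not the device, decides how to use the
        # available components -- e.g. it may fall back to a keypad
        # when no pointer component is present.
        for c in components:
            if c.kind in ("keypad", "pointer"):
                self.bound[c.component_id] = c.kind

    def report(self, component_id, stimulus):
        # Deliver a raw, application-agnostic stimulus; the application
        # alone decides what "key-down:5" means for its service.
        if component_id in self.bound:
            self.received.append(stimulus)


app = Application()
app.discover([UIComponent("kp1", "keypad", ["key-down", "key-up"])])
app.report("kp1", "key-down:5")
```

   Note that the device carries no application knowledge at all here;
   the same keypad stimuli could drive any application.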
   Advantages:

   - Devices do not need to incorporate application-specific
     information or configuration.

   - An application can adapt its operation to best suit the interface
     that a device possesses.  For instance, this may mean greater
     reliance on non-tactile methods such as voice recognition.  The
     application is the only entity qualified to make these sorts of
     decisions.

   - In most instances, some level of application interaction is
     always possible, albeit perhaps through a less graceful interface.

   Disadvantages:

   - User interface widgets cannot automatically be used with an
     application if the application does not recognize the widget
     (although this can be sidestepped by using local mapping
     configuration).

4.3.3. Input-based Interaction Summary

   These two schemes are not mutually exclusive, and the benefits of
   both can be obtained.  A device can utilize an enhanced level of
   interaction when interacting with an application that the device
   (or network) has knowledge and/or configuration for; likewise, an
   application can fall back to low-level interaction if the device
   (or network) does not possess the required application-specific
   knowledge.  In an extensible input-based interaction framework it
   is difficult to prevent either interaction model; however, authors
   may find the IETF reluctant to standardize application-specific
   interfaces (which is similar to the standardization of services).

5. Requirements

   R1: The mechanism must support collecting device/user input
       generated in the context of a SIP dialog or conversation-space.

   R2: The mechanism must transport user activity indications to
       network elements independently of the media plane.

   R3: The transport mechanism must be sensitive to the limited
       bandwidth constraints of some signaling planes; for instance,
       reliability through blind retransmission is not acceptable.
   R4: The mechanism must support multiple network entities requesting
       and receiving indications from a SIP UA independently of each
       other.

   R5: The protocol mechanism must provide a means for a network
       entity to indicate its desire to receive user activity
       indications and/or to present an application interface on the
       User's UA.  The protocol mechanism must also provide a means
       for a SIP UA receiving such a request to respond with its
       capability/intent to provide the requested services.

   R6: The mechanism must provide a means whereby the Requestor can
       indicate its desire to receive only a subset of the supported
       UAIs possible for any UI component.

   R7: User activity indications must not be generated unless
       implicitly or explicitly requested by an entity.

   R8: The mechanism should support devices with a wide range of user
       interfaces for both the presentation-based and input-based
       interaction modes.  For instance, it must support devices that
       possess a display UI component as well as those that do not,
       and devices that only have physical buttons as well as those
       that only have display-based pointing devices.

   R9: For key-based indications, the mechanism must accommodate
       devices with keypads/keyboards that range from very simple
       keypads to generic computer keyboards; the mechanism must also
       support the reporting of device- and/or user-specific buttons.

   R10: The mechanism must be extensible so that non-key-based user
       activity indications can be supported now or in the future, for
       instance, sliders, dials, switches, local voice-commands,
       hyperlinks, biometrics, etc.

   R11: A Requestor must be able to determine the makeup/contents of
       the user interface possessed by a user device: specifically, to
       determine the user interface components that are available for
       application use and the user activity indications that are
       supported by each component.
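   R11 implies that a device's interface makeup can be expressed as
   structured data that a Requestor can query.  The following is a
   minimal, hypothetical sketch of such a description; every field
   name is an assumption made for illustration, not proposed syntax.

```python
# Hypothetical description of a device's virtual user interface, of
# the kind a Requestor might obtain under R11.  Field names assumed.
device_ui = {
    "virtual-ui": "vui-1",
    "components": [
        {
            "id": "keypad-1",
            "type": "key-input",
            "keys": ["0", "1", "2", "3", "4", "5",
                     "6", "7", "8", "9", "*", "#"],
            # Separate press/release indications would also let the
            # Requestor derive key press duration (R13).
            "indications": ["press", "release"],
        },
        {
            "id": "display-1",
            "type": "presentation",
            # A presentation component renders application-supplied
            # content but generates no UAIs of its own here.
            "markup": ["text/plain"],
        },
    ],
}


def supported_indications(ui, component_id):
    """Return the UAIs a given component can generate (empty if none)."""
    for c in ui["components"]:
        if c["id"] == component_id:
            return c.get("indications", [])
    return []
```

   With such a description, an application could, for example, choose
   a keypad-driven dialog when no display component is advertised.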
   R12: The mechanism must support delivery of UAIs at least as
       reliable as that of the session control protocol.

   R13: For key-based indications, the mechanism must provide some
       form of indication of key press duration.

   R14: For key-based indications, the mechanism must provide some
       form of indication of a key press's occurrence in time relative
       to other key presses.

   R15: The mechanism must ensure that the receiver of UAIs (i.e., the
       Requestor) can determine their original order of occurrence and
       detect any missing indications.

   R16: The mechanism must allow for end-to-end security/privacy
       between the Requestor and Reporter.  Specifically, the
       mechanism must allow the Reporter (if desired) to ensure that
       transmitted user activity indications can only be viewed by the
       Requestor.

   R17: The Reporter must be able to identify and authenticate the
       Requestor for each user interface component.  Specifically, in
       the case where the Requestor is an Application Entity, the User
       must be able to identify the application name and instance.  An
       application identity consists of the application type [e.g.,
       application name, version and application designer name] and
       the application instance [e.g., instance identifier and service
       provider's identity].

   R18: The mechanism must support the ability for multiple virtual
       user interfaces to be associated with the same user session.
       Each virtual user interface may be associated with the same or
       different applications.  For example, a user may want to
       interact with a voice-recording application and a prepaid
       calling application within the same call but allow each
       application to use a different virtual user interface.

   R19: The mechanism must support the ability for multiple
       applications to explicitly cooperate within the same virtual
       user interface.  Specifically, each application may be
       associated with different UI components within the same virtual
       user interface.
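   To illustrate one of the requirements above: the ordering and
   loss-detection properties of R15 could be obtained by having the
   Reporter number each UAI per component.  The sketch below assumes a
   "seq" field; the field name and message shape are illustrative
   only, not proposed syntax.

```python
# Sketch of R15: sequence-numbered UAIs let the Requestor restore the
# original order of occurrence and detect any missing indications.

def check_sequence(received):
    """Sort UAIs by sequence number and report any missing ones."""
    if not received:
        return [], []
    ordered = sorted(received, key=lambda uai: uai["seq"])
    seqs = [uai["seq"] for uai in ordered]
    # Any gap between the lowest and highest observed sequence numbers
    # indicates a lost (or not-yet-arrived) indication.
    missing = [s for s in range(seqs[0], seqs[-1] + 1) if s not in seqs]
    return ordered, missing


# UAIs may arrive out of order; here seq 3 was lost in transit.
uais = [{"seq": 2, "key": "5"}, {"seq": 1, "key": "1"},
        {"seq": 4, "key": "#"}]
ordered, missing = check_sequence(uais)
# ordered restores presses 1, 5, # in order; missing == [3]
```

   Whether the gap triggers retransmission or merely an error report
   is a transport-design question left open by the requirement.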
   R20: The mechanism must allow user interface components created
       through this mechanism to be updated or removed as desired by
       the creating application entity.

   R21: The mechanism should not require the acknowledgement of a
       transmitted UAI before subsequent UAIs can be transmitted.

6. Desirables

   D1: The mechanism should be simple to implement and execute on
       devices with simple interfaces.

   D2: There should be a separation between the transport mechanism in
       the signaling plane and the message syntax.

   D3: The mechanism should allow a device to indicate relative
       preferences amongst the various user interface components.

7. Acknowledgements

   The authors would like to acknowledge the detailed comments on and
   additions to this document by Jonathan Rosenberg of Dynamicsoft,
   Inc.

8. Authors

   Robert Fairlie-Cuninghame
   Nuera Communications, Inc.
   50 Victoria Rd
   Farnborough, Hants GU14-7PG
   United Kingdom
   Phone: +44-1252-548200
   Email: rfairlie@nuera.com

   Bert Culpepper
   InterVoice-Brite, Inc.
   701 International Parkway
   Heathrow, FL 32746
   Phone: 407-357-1536
   Email: bert.culpepper@intervoice-brite.com

9. References

   [1] S. Bradner, "The Internet Standards Process -- Revision 3",
       BCP 9, RFC 2026, October 1996.

   [2] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session Initiation
       Protocol", draft-ietf-sip-rfc2543bis-09, work in progress,
       February 2002.