Network Working Group                                         C. Boulton
Internet-Draft                             Ubiquity Software Corporation
Expires: September 9, 2006                                  T. Melanchuk
                                                              BlankSpace
                                                            S. McGlashan
                                                         Hewlett-Packard
                                                            A. Shiratzky
                                                               Radvision
                                                           March 8, 2006


   A VoiceXML Interactive Voice Response (IVR) Control Package for the
                   Session Initiation Protocol (SIP)
                draft-boulton-ivr-vxml-control-package-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 9, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document defines a Session Initiation Protocol (SIP) Control
   Package for Interactive Voice Response (IVR) interaction using
   VoiceXML.  This Control Package provides IVR functionality using the
   SIP Control Framework [9] and extends the Basic IVR control package
   [10] with support for VoiceXML dialogs.

Boulton, et al.         Expires September 9, 2006               [Page 1]

Internet-Draft        Media Server Control Package            March 2006

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Control Package Definition
     3.1.  Control Package Name
     3.2.  Common XML Support
     3.3.  Framework Message Usage
     3.4.  CONTROL Message Body
       3.4.1.  dialogprepare
       3.4.2.  dialogstart
       3.4.3.  dialoguser
       3.4.4.  dialogterminate
     3.5.  REPORT Message Body
       3.5.1.  dialogprepared
       3.5.2.  dialogstarted
       3.5.3.  dialogexit
       3.5.4.  dialoguser
       3.5.5.  Error Messages
   4.  Namelist
   5.  Formal Syntax
   6.  Security Considerations
   7.  IANA Considerations
     7.1.  Control Package Registration
     7.2.  URN Sub-Namespace Registration
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The SIP Control Framework [9] provides a generic approach for
   establishment and reporting capabilities of remotely initiated
   commands.  The Framework utilizes many functions provided by the
   Session Initiation Protocol [3] (SIP) for the rendezvous and
   establishment of a reliable channel for control interactions.  The
   Control Framework also introduces the concept of a Control Package.
   A Control Package is an explicit usage of the Control Framework for a
   particular interaction set.  This specification defines a package for
   IVR functions using VoiceXML dialogs [12].

   As a recognized international standard for IVR dialogs, VoiceXML is
   used extensively within media server control languages (cf. [14],
   [11], [13], [8], [7]).  To ensure interoperability, if a media server
   supports this package, then it MUST support VoiceXML 2.0 dialog
   scripts.  It MAY support other dialog script formats.

   The VoiceXML package extends the basic IVR control package ([10]).
   The extensions only affect the <dialogprepare> and <dialogstart>
   elements: in particular,

   o  dialog scripts may also be specified inline using a <src> element

   o  the default value for the type attribute is "application/
      voicexml+xml"

   o  HTTP fetching and caching of dialog scripts can be configured
      using attributes of a dedicated child element

   Otherwise, this package follows precisely the syntax and semantics of
   the basic IVR control package.

   Other control packages may be defined which extend the capabilities
   of the control package defined in this document.  Such control
   packages must respect the syntax and semantics of this control
   package.

2.  Conventions and Terminology

   In this document, BCP 14/RFC 2119 [1] defines the key words "MUST",
   "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
   "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL".  In
   addition, BCP 15 indicates requirement levels for compliant
   implementations.

   The following additional terms are defined for use in this document:

   Dialog:  A dialog performs media interaction with a user.  A dialog
      is identified by a URI and has an associated MIME type.  Dialogs
      typically feature basic capabilities such as playing audio
      prompts, collecting DTMF input and recording audio input from the
      user.  More advanced dialogs may also feature synthesized speech,
      recording and playback of video, recognition of spoken input, and
      mixed initiative conversations.

   Application server:  A SIP [3] application server (AS) hosts and
      executes services such as interactive media and conferencing in an
      operator's network.  An AS influences and impacts the SIP session,
      in particular by terminating SIP sessions on a media server, which
      is under its control.

   Media Server:  A media server (MS) processes media streams on behalf
      of an AS by offering functionality such as interactive media,
      conferencing, and transcoding to the end user.  Interactive media
      functionality is realized by way of dialogs, which are identified
      by a URI and initiated by the application server.

3.  Control Package Definition

   This section fulfills the mandatory requirement for information that
   MUST be specified during the definition of a Control Framework
   Package, as detailed in Section 8 of [9].

3.1.  Control Package Name

   The Control Framework requires a Control Package definition to
   specify and register a unique name.  The name of this Control Package
   is "msc-ivr-vxml" (Media Server Control - Interactive Voice Response
   - VoiceXML).  This value appears in the 'Control-Packages' SIP header
   that is present in the INVITE dialog request that creates the control
   channel, as specified in [9].

3.2.  Common XML Support

   The Control Framework requires a Control Package definition to
   specify whether the attributes for media dialog or conference
   references are required.  This package requires that the XML Schema
   in Section 12.1 of [9] MUST be supported.

3.3.  Framework Message Usage

   IVR functionality includes capabilities such as playing a prompt,
   recording user input, collecting DTMF, TTS, ASR and other media-based
   processing.  These functions are expressed in VoiceXML dialogs.

   The AS can send the following CONTROL messages to the MS:

   <dialogprepare>:  prepare a dialog for later execution

   <dialogstart>:  execute a dialog (as defined or previously prepared)

   <dialoguser>:  send a user-defined message to an active dialog

   <dialogterminate>:  terminate a dialog (prepared or started)

   The MS response is carried in responses and/or REPORT messages.  The
   precise response depends on the IVR dialog state and the contents of
   the control message.  If an XML message is not well-formed or is
   invalid according to the schema in Section 5, then a 4XX response is
   generated.

   For the <dialogprepare> command, the response is a (terminate) REPORT
   with a <dialogprepared> message (if the dialog was prepared
   successfully) or with a <dialognotprepared> message (if there was an
   error preparing the dialog).

   For the <dialogstart> command, the response is an (update) REPORT
   with a <dialogstarted> message (if the dialog was started
   successfully), then zero or more (update) REPORT messages (reporting
   information gathered during the dialog) and finally a (terminate)
   REPORT with a <dialogexit> message.  If the dialog does not start,
   the response is a (terminate) REPORT with a <dialognotstarted>
   message.

   For the <dialoguser> command, the response is 200 if the message is
   understood.

   For the <dialogterminate> command, the response is 200 if the command
   is understood.

   The MS can send the following CONTROL message to the AS:

   <dialoguser>:  send a user-defined message from an active dialog

   The AS responds with a 200 response if the message was understood.

3.4.  CONTROL Message Body

   A valid CONTROL message body MUST conform to the schema defined in
   Section 5.

3.4.1.  dialogprepare

   The <dialogprepare> request is sent from the AS to the MS to request
   preparation of an IVR dialog.  A prepared dialog is executed when the
   AS sends a <dialogstart> request referencing the prepared dialog (see
   Section 3.4.2).

   A <dialogprepare> element has the following attributes:

   src:  string identifying the URI of the dialog document to prepare.
      The attribute is optional.  The MS MUST support VoiceXML dialogs
      and MAY support other dialog types.

   type:  string identifying the MIME type of the document.  The default
      value is "application/voicexml+xml".  The attribute is optional.

   The <dialogprepare> element has the following child elements:

   <namelist>:  an XML data structure (see Section 4) to pass parameters
      into the dialog.  The element is optional.

   event subscription element:  contains a list of one or more <item>
      elements, where each <item> element has mandatory name and value
      attributes.  The element is optional.  The AS uses this element to
      subscribe to events generated by the MS.  Notifications of dialog
      events are delivered using a REPORT (see Section 3.4.3).  If the
      MS does not support a specific event notification to which the AS
      subscribes, then the MS MUST ignore the individual <item>.  This
      protocol does not require the MS to support any specific event
      notifications, but the MS MAY support notification events such as
      "dtmf" (a DTMF key has been pressed), "tone" (a tone has been
      detected), "audiostart" (audio playback has started), "bargein"
      (the user has barged in), "mark" (a mark has been encountered in
      the output stream), "goto" (the dialog has transitioned to another
      location), and so forth.

   <src>:  contains the dialog script itself; e.g., a VoiceXML document.
      The element is optional.

   HTTP parameters element:  contains attributes to configure HTTP 1.1
      [2] fetching and caching.

      maxage:  string defining a time interval according to the max-age
         parameter in HTTP.  The attribute is optional.

      maxstale:  string defining a time interval according to the
         max-stale parameter in HTTP.  The attribute is optional.

      enctype:  string identifying the encoding type of the submitted
         document.  The default value is
         "application/x-www-form-url-encoded".  The attribute is
         optional.

      method:  string indicating the HTTP method to use.  Permitted
         values are "post" or "get".  The default value is "get".  The
         attribute is optional.

      The element is optional.

   Exactly one of the src attribute or the <src> element MUST be
   specified; otherwise, it is an error.

   For example, in a request to prepare a dialog where the dialog script
   is indicated using the src attribute, a namelist parameter "audio"
   would be available in the VoiceXML script as
   "connection.ccxml.values.audio", so different prompts can be played
   using the same dialog script.

   In the following example, the VoiceXML dialog script is specified
   inline.
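
   A minimal sketch of how an AS might assemble such a request body,
   assuming Python on the AS side.  The helper name, the prompt text,
   the example.com URI and the ordering of child elements are all
   hypothetical and are not taken from this specification; the package
   namespace declaration is omitted because its URN is still marked
   TODO in Section 7.2.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical inline VoiceXML 2.0 script; the prompt text is made up.
INLINE_VXML = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>Please hold while your call is connected.</prompt>
    </block>
  </form>
</vxml>
"""


def build_dialogprepare(src_uri=None, inline_vxml=None, params=None):
    """Serialize a <dialogprepare> body (illustrative helper).

    Exactly one of src_uri (the src attribute) or inline_vxml (the <src>
    child element) must be supplied, as Section 3.4.1 requires.
    """
    if (src_uri is None) == (inline_vxml is None):
        raise ValueError("specify exactly one of src_uri or inline_vxml")

    attrs = "" if src_uri is None else " src=" + quoteattr(src_uri)
    lines = ["<dialogprepare%s>" % attrs]
    if params:
        # Assumed child order: <namelist> before <src>.
        lines.append("  <namelist>")
        for name, value in params.items():
            lines.append("    <item name=%s value=%s/>"
                         % (quoteattr(name), quoteattr(value)))
        lines.append("  </namelist>")
    if inline_vxml is not None:
        # The script is embedded verbatim, so it must be well-formed XML.
        lines.append("  <src>")
        lines.append(inline_vxml.strip())
        lines.append("  </src>")
    lines.append("</dialogprepare>")
    return "\n".join(lines)


body = build_dialogprepare(inline_vxml=INLINE_VXML,
                           params={"audio": "http://example.com/hello.wav"})
ET.fromstring(body)  # raises ParseError if the body is not well-formed
print(body)
```

   Passing src_uri instead of inline_vxml yields the src-attribute form
   of the same request; passing both, or neither, is rejected before
   anything is sent.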

   When an MS has received a <dialogprepare> request, it MUST reply with
   a <dialogprepared> or <dialognotprepared> REPORT message.

3.4.2.  dialogstart

   The <dialogstart> element is sent by the AS to request execution of a
   dialog.  The dialog may be defined in the dialogstart request itself,
   or the request may reference a previously prepared dialog.

   The <dialogstart> element has the following attributes:

   src:  string identifying the URI of the dialog document to start.
      The attribute is optional.  The MS MUST support VoiceXML dialogs
      and MAY support other dialog types.

   type:  string identifying the MIME type of the document.  The default
      value is "application/voicexml+xml".  The attribute is optional.

   prepareddialogid:  string identifying a dialog previously prepared
      using a dialogprepare request.  The attribute is optional.

   connection-id:  string identifying the SIP dialog connection on which
      this dialog is to be started (see Section 12.1 of [9]).  The
      attribute is optional.

   conf-id:  string identifying the conference on which this dialog is
      to be started (see Section 12.1 of [9]).  The attribute is
      optional.

   Exactly one of the connection-id or conf-id attributes MUST be
   specified.  It is an error to specify both connection-id and conf-id.

   The <dialogstart> element has the following child elements defined:

   <namelist>:  an XML data structure (see Section 4) to pass parameters
      into the dialog.  The element is optional.

   event subscription element:  contains a list of one or more <item>
      elements, where each <item> element has mandatory name and value
      attributes.  The element is optional.  The AS uses this element to
      subscribe to events generated by the MS.  Notifications of dialog
      events are delivered using a REPORT (see Section 3.4.3).  If the
      MS does not support a specific event notification to which the AS
      subscribes, then the MS MUST ignore the individual <item>.  This
      protocol does not require the MS to support any specific event
      notifications, but the MS MAY support notification events such as
      "dtmf" (a DTMF key has been pressed), "tone" (a tone has been
      detected), "audiostart" (audio playback has started), "bargein"
      (the user has barged in), "mark" (a mark has been encountered in
      the output stream), "goto" (the dialog has transitioned to another
      location), and so forth.

   <src>:  contains the dialog script itself; e.g., a VoiceXML document.
      The element is optional.

   HTTP parameters element:  contains attributes to configure HTTP 1.1
      [2] fetching and caching.

      maxage:  string defining a time interval according to the max-age
         parameter in HTTP.  The attribute is optional.

      maxstale:  string defining a time interval according to the
         max-stale parameter in HTTP.  The attribute is optional.

      enctype:  string identifying the encoding type of the submitted
         document.  The default value is
         "application/x-www-form-url-encoded".  The attribute is
         optional.

      method:  string indicating the HTTP method to use.  Permitted
         values are "post" or "get".  The default value is "get".  The
         attribute is optional.

      The element is optional.

   If the prepareddialogid is not specified, exactly one of the src
   attribute or the <src> element MUST be specified; otherwise, it is an
   error.  If the prepareddialogid is specified, it is an error to
   specify the src attribute, the <src> element or the type attribute.

   If the prepareddialogid is specified and the <dialogprepare>
   contained a element, it is an error to specify it in <dialogstart>.
   Likewise, if the prepareddialogid is specified and the
   <dialogprepare> contained a element, it is an error to specify it in
   <dialogstart>.
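
   The attribute and element co-occurrence rules above can be checked
   mechanically before a request is sent or processed.  The following
   is a sketch only, assuming Python and un-namespaced element names;
   the function name and the error strings are hypothetical.  It covers
   the connection-id/conf-id rule and the src/prepareddialogid rules,
   and nothing else.

```python
import xml.etree.ElementTree as ET


def check_dialogstart(body):
    """Return a list of rule violations for a <dialogstart> body.

    An empty list means the checked rules of Section 3.4.2 are met.
    Illustrative helper; namespaces are ignored for simplicity.
    """
    root = ET.fromstring(body)
    if root.tag != "dialogstart":
        return ["root element is not <dialogstart>"]

    errors = []

    # Exactly one of connection-id or conf-id MUST be specified.
    has_conn = "connection-id" in root.attrib
    has_conf = "conf-id" in root.attrib
    if has_conn == has_conf:
        errors.append("exactly one of connection-id or conf-id is required")

    has_src_attr = "src" in root.attrib
    has_src_elem = root.find("src") is not None
    if "prepareddialogid" in root.attrib:
        # A prepared dialog already has its script and type.
        if has_src_attr or has_src_elem or "type" in root.attrib:
            errors.append("prepareddialogid excludes src, <src> and type")
    elif has_src_attr == has_src_elem:
        # Without prepareddialogid: src attribute XOR <src> element.
        errors.append("exactly one of src attribute or <src> is required")

    return errors


ok = '<dialogstart prepareddialogid="vxi1" connection-id="conn1"/>'
bad = '<dialogstart prepareddialogid="vxi1" src="http://example.com/a.vxml"/>'
print(check_dialogstart(ok))
print(check_dialogstart(bad))
```

   The second request violates two rules at once: it names neither a
   connection nor a conference, and it combines prepareddialogid with a
   src attribute.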

   For example, in a request to start a dialog on a conference where the
   dialog script is indicated using the src attribute, a namelist
   parameter "media" would be available in the VoiceXML script as
   "connection.ccxml.values.media", so different prompts can be played
   using the same dialog script.

   In the following example, the VoiceXML dialog script is specified
   inline.
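
   As a hedged illustration, assuming Python on the AS side, the sketch
   below serializes two <dialogstart> bodies: one carrying an inline
   script for a conference, and one referencing a previously prepared
   dialog.  The helper name and the identifiers "conf1" and "conn1" are
   placeholders rather than values defined by this package; real
   identifiers follow Section 12.1 of [9].

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical inline VoiceXML 2.0 script; the prompt text is made up.
INLINE_VXML = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>A new participant has joined.</prompt>
    </block>
  </form>
</vxml>
"""


def build_dialogstart(attrs, inline_vxml=None):
    """Serialize a <dialogstart> body from an attribute dict.

    If inline_vxml is given it is embedded verbatim in a <src> child.
    Illustrative helper; it does not enforce the co-occurrence rules.
    """
    attr_text = "".join(" %s=%s" % (name, quoteattr(value))
                        for name, value in attrs.items())
    if inline_vxml is None:
        return "<dialogstart%s/>" % attr_text
    return ("<dialogstart%s>\n  <src>\n%s\n  </src>\n</dialogstart>"
            % (attr_text, inline_vxml.strip()))


# Inline script played into a conference (placeholder conference id).
inline_body = build_dialogstart({"conf-id": "conf1"}, INLINE_VXML)

# Previously prepared dialog "vxi1" started on a placeholder connection.
prepared_body = build_dialogstart(
    {"prepareddialogid": "vxi1", "connection-id": "conn1"})

for body in (inline_body, prepared_body):
    ET.fromstring(body)  # raises ParseError if not well-formed
    print(body)
```

   The second body carries no src attribute, <src> element or type
   attribute, since the script was already supplied when the dialog
   "vxi1" was prepared.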

   For example, a previously prepared dialog with the dialogid "vxi1" is
   started by specifying "vxi1" as the prepareddialogid of the
   <dialogstart> request.

   When an MS has received a <dialogstart> request, it MUST reply with a
   <dialogstarted> or <dialognotstarted> REPORT message.

3.4.3.  dialoguser

   During execution of a dialog, a <dialoguser> CONTROL can be used to
   pass asynchronous, user-defined events from the AS to the MS, or vice
   versa from the MS to the AS.

   The MS is not required to support receiving or sending asynchronous
   events.  If it does not support receiving asynchronous events, a 4XX
   response will be returned instead of 200.

   The <dialoguser> element has the following attributes:

   name:  string indicating the name of the event.  The string is
      restricted to a sequence of alphanumeric or "." characters.  The
      attribute is mandatory.

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   A <dialoguser> element has the following child element:

   <namelist>:  an XML data structure (see Section 4) to pass
      information from the AS to the dialog.  The element is optional.

   For example, the AS sends the MS information which may be announced
   to the user in the dialog identified as "vxi1":

3.4.4.  dialogterminate

   A dialog that has been prepared or has been started can be terminated
   by a <dialogterminate> request element from the AS.

   The <dialogterminate> element has the following attributes:

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   immediate:  string with the values "true" or "false" indicating
      whether the dialog is to be terminated immediately or not.  The
      default is "false".  The attribute is optional.

   For example, assuming a dialog with the dialogid "vxi1" has been
   started, it can be terminated immediately with a <dialogterminate>
   request whose dialogid attribute is "vxi1" and whose immediate
   attribute is "true".

   The request causes execution of the dialog to be terminated.  If the
   request is for immediate termination, then the MS sends a 200
   response.  If the request is for non-immediate termination, then the
   MS sends a REPORT (or a failure message).

3.5.  REPORT Message Body

   A valid REPORT body MUST conform to the schema defined in Section 5.

3.5.1.  dialogprepared

   The <dialogprepared> element has the following attributes:

   dialogid:  string identifying the dialog.  The MS assigns a globally
      unique identifier for this dialog and reuses it in subsequent
      references to the dialog; for example, as the prepareddialogid in
      <dialogstart> and in dialog notifications.  The attribute is
      mandatory.

   For example, a response when the dialog was prepared successfully:

3.5.2.  dialogstarted

   The <dialogstarted> element has the following attributes:

   dialogid:  string identifying the dialog.  If prepareddialogid is
      specified in the <dialogstart> request, then dialogid MUST have
      the same value.  If prepareddialogid is not specified, then the MS
      assigns a globally unique identifier for this dialog and reuses it
      in subsequent references to the dialog; for example, in dialog
      notifications.  The attribute is mandatory.

   [Editor's Note: do we want to allow dialog names to be defined by the
   AS?]

   For example, a response when the dialog was started successfully:

3.5.3.  dialogexit

   The <dialogexit> element has the following attributes:

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   The <dialogexit> element has the following child element:

   <namelist>:  an XML data structure (see Section 4) to pass
      information from the dialog to the AS.  The element is optional.

   For example, the dialog exits without data being returned:

   The dialog exits and data is returned to the AS:

3.5.4.  dialoguser

   The <dialoguser> element in a REPORT message can provide asynchronous
   user-defined information to the AS during execution of a dialog.

   The <dialoguser> element has the following attributes:

   name:  string indicating the name of the event.  The string is
      restricted to a sequence of alphanumeric or "." characters.  The
      attribute is mandatory.

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   A <dialoguser> element has the following child element:

   <namelist>:  an XML data structure (see Section 4) to pass
      information from the dialog to the AS.  The element is optional.

   For example, the MS sends the AS a midcall update on data collected
   so far:

   [Editor's Note: Since <dialoguser> is available as a CONTROL message,
   it may not be necessary as a REPORT message.]

3.5.5.  Error Messages

   [Editor's Note: These messages may be restructured as a general error
   element with a type attribute (e.g. type="dialognotprepared").]

   The <dialognotprepared> element has the following attributes:

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   reason:  string specifying the reason why dialog preparation failed.
      The attribute is optional.

   For example, a response when dialog preparation failed due to an HTTP
   404 error when fetching a VoiceXML script:

   The <dialognotstarted> element has the following attributes:

   dialogid:  string identifying the dialog.  The attribute is
      mandatory.

   reason:  string specifying the reason why the dialog failed to start.
      The attribute is optional.

   For example, a response when the dialog failed to start due to a
   resource error:

4.  Namelist

   The <namelist> element is a container for parameter data.

   Each parameter is specified using a top-level <item> element.  The
   name of the parameter is specified in a "name" attribute with a non-
   empty string value.

   A simple value for a parameter is specified using a "value" attribute
   with a string value.  For example:

   Multiple value parameters, such as a list of prompt URIs, can be
   specified using space separation.  For example:

   [Editor's Note: we may also want to investigate the use of <item>s
   nested within a top-level <item> to specify complex values.]

5.  Formal Syntax

   [Editor's Note: A later version of the XML schema may reference the
   basic IVR schema and specify the package extensions in terms of
   schema extensions.]

   [Editor's Note: A later version of the XML schema will provide more
   constraints as expressed in the textual definitions; for example,
   single occurrence of elements, co-occurrence of attributes, etc.]

   VoiceXML IVR 1.0 schema (20060308)

6.  Security Considerations

   Security Considerations to be included in later versions of this
   document.

7.  IANA Considerations

   This document registers a new SIP Control Framework Package and a new
   XML namespace.

7.1.  Control Package Registration

   Control Package name: msc-ivr-vxml

7.2.  URN Sub-Namespace Registration

   TODO: urn:ietf:params:xml:ns:msc-ivr-vxml

8.  Acknowledgments

   TODO

9.  References

9.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [2]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
         HTTP/1.1", RFC 2616, June 1999.

   [3]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [4]   Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
         Responses in Session Initiation Protocol (SIP)", RFC 3262,
         June 2002.

   [5]   Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol
         (SIP): Locating SIP Servers", RFC 3263, June 2002.

   [6]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         Session Description Protocol (SDP)", RFC 3264, June 2002.

   [7]   Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network Media
         Services with SIP", RFC 4240, December 2005.

   [8]   Van Dyke, J., Burger, E., and A. Spitzer, "Media Server Control
         Markup Language (MSCML) and Protocol", draft-vandyke-mscml-06
         (work in progress), December 2004.

   [9]   Boulton, C., Melanchuk, T., McGlashan, S., and A. Shiratzky, "A
         Control Framework for the Session Initiation Protocol (SIP)",
         draft-boulton-sip-control-framework-01 (work in progress),
         March 2006.

   [10]  Boulton, C., Melanchuk, T., McGlashan, S., and A. Shiratzky, "A
         Basic Interactive Voice Response (IVR) Control Package for the
         Session Initiation Protocol (SIP)",
         draft-boulton-ivr-control-package-01 (work in progress),
         March 2006.

   [11]  Auburn, R J., "Voice Browser Call Control: CCXML Version 1.0",
         W3C Working Draft (work in progress), June 2005.

   [12]  McGlashan, S., Burnett, D., Carter, J., Danielsen, P., Ferrans,
         J., Hunt, A., Lucas, B., Porter, B., Rehor, K., and S.
         Tryphonas, "Voice Extensible Markup Language (VoiceXML) Version
         2.0", W3C Recommendation, March 2004.

   [13]  Melanchuk, T., "Media Session Markup Language (MSML)",
         draft-melanchuk-sipping-msml-07 (work in progress),
         November 2005.

   [14]  McGlashan, S., Auburn, R., Burke, D., Candell, E., and R.
         Surapaneni, "Media Server Control Protocol (MSCP)",
         draft-mcglashan-mscp-01 (work in progress), January 2006.

Authors' Addresses

   Chris Boulton
   Ubiquity Software Corporation
   Building 3
   Wern Fawr Lane
   St Mellons
   Cardiff, South Wales  CF3 5EA

   Email: cboulton@ubiquitysoftware.com


   Tim Melanchuk
   BlankSpace

   Email: tim.melanchuk@gmail.com


   Scott McGlashan
   Hewlett-Packard
   Gustav III:s boulevard 36
   SE-16985 Stockholm, Sweden

   Email: scott.mcglashan@hp.com


   Asher Shiratzky
   Radvision
   24 Raoul Wallenberg st
   Tel-Aviv, Israel

   Email: ashers@radvision.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.