                                                                D. Burke
Internet-Draft                                                  Voxpilot
Expires: June 11, 2006                                          M. Scott
                                                              VoiceGenie
                                                               J. Haynie
                                                              Vocalocity
                                                               R. Auburn
                                                                   Voxeo
                                                            S. McGlashan
                                                         Hewlett-Packard
                                                        December 8, 2005


                SIP Interface to VoiceXML Media Services
                        draft-burke-vxml-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 11, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document describes a SIP interface to VoiceXML media services,
   which is commonly employed between application servers and media
   servers offering VoiceXML processing capabilities.

Comments

   Comments are solicited and should be addressed to the authors.

Table of Contents

   1.  Introduction
     1.1.  Use Cases
       1.1.1.  IVR Services with Application Servers
       1.1.2.  PSTN IVR Service Node
       1.1.3.  3GPP IMS Media Resource Function (MRF)
       1.1.4.  CCXML <-> VoiceXML Interaction
       1.1.5.  Other Use Cases
     1.2.  Terminology
   2.  VoiceXML Session Establishment and Termination
     2.1.  Service Identification
     2.2.  Initiating a VoiceXML Session
     2.3.  Preparing a VoiceXML Session
     2.4.  Terminating a VoiceXML Session
     2.5.  Session Variable Mappings
     2.6.  Examples
       2.6.1.  Basic Session Establishment
       2.6.2.  VoiceXML Session Preparation
       2.6.3.  MRCP Establishment
   3.  Media Support
     3.1.  Offer/Answer
     3.2.  Early Media
     3.3.  Modifying the Media Session
     3.4.  Audio and Video Codecs
     3.5.  DTMF
   4.  Returning Data to the Application Server
     4.1.  HTTP Mechanism
     4.2.  SIP Mechanism
   5.  Outbound Calling
     5.1.  Third Party Call Control Mechanism
     5.2.  REFER Mechanism
   6.  Call Transfer
     6.1.  Blind
     6.2.  Bridge
     6.3.  Consultation
   7.  Contributors
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   VoiceXML [VXML20], [VXML21] is a World Wide Web Consortium (W3C)
   standard for creating audio and video dialogs that feature
   synthesized speech, digitized audio, recognition of spoken and DTMF
   key input, recording of audio and video, telephony, and mixed
   initiative conversations. VoiceXML allows Web-based development and
   content delivery paradigms to be used with interactive video and
   voice response applications.

   This document describes a SIP [RFC3261] interface to VoiceXML media
   services, which is commonly employed between application servers and
   media servers offering VoiceXML processing capabilities. SIP is
   responsible for initiating a media session to the VoiceXML media
   server and simultaneously triggering the execution of a specified
   VoiceXML application.

   The interface described here owes its genesis to the draft [SIPVXML]
   and leverages a mechanism for identifying dialog media services
   described in [NETANN]. The interface has been updated and extended
   to support the W3C Recommendation for VoiceXML 2.0 [VXML20] and
   VoiceXML 2.1 [VXML21]. A set of commonly implemented functions and
   extensions have been specified including VoiceXML dialog
   preparation, outbound calling, video media support, and transfers.
   VoiceXML session variable mappings have been defined for SIP with an
   extensible mechanism for passing application-specific values into
   the VoiceXML application. Mechanisms for returning data to the
   Application Server have also been added.

1.1.  Use Cases

   The VoiceXML media service user in this document is generically
   referred to as an Application Server. In practice, it is intended
   that the interface defined by this document is applicable across a
   wide range of use cases. Several intended use cases are described
   below.

1.1.1.  IVR Services with Application Servers

   Application Servers provide services to users of the network.
   Typically, there may be several Application Servers in the same
   network, each specialised in providing a particular service.
   Throughout this specification and without loss of generality, we
   posit the presence of an Application Server specialised in providing
   IVR services.

   A typical configuration for this use case is illustrated below.

                        +--------------+
                        |              |
                        | Application  |\
                        |    Server    | \
                        |              |  \ HTTP
                    SIP +--------------+   \
                    /               \       \
   +-------------+ /            SIP  \ +--------------+
   |             |/                   \|              |
   |     SIP     |                     |   VoiceXML   |
   | User Agent  |         RTP         | Media Server |
   |             |=====================|              |
   +-------------+                     +--------------+

   Consistent with the Web model, the VoiceXML application may reside
   directly on the Application Server and is served up via HTTP
   [RFC2616]. Note, however, that the application is not required to
   reside on a single Application Server since the web model allows the
   VoiceXML application to be hosted on a separate (HTTP) Application
   Server from the (SIP) Application Server that interacts with the
   VoiceXML Media Server via this specification.
   It is also possible for a static VoiceXML application to be stored
   locally on the VoiceXML Media Server, leveraging the VoiceXML 2.1
   [VXML21] mechanism to interact with a Web/Application Server when
   dynamic behavior is required. The viability of static VoiceXML
   applications is further enhanced by the mechanisms defined in
   section 2.5, through which the Application Server can make
   session-specific information available within the VoiceXML session
   context.

1.1.2.  PSTN IVR Service Node

   While this document is intended to enable enhanced use of VoiceXML
   as a component of larger systems and services, it is intended that
   devices that are completely unaware of this specification but that
   support [NETANN] remain capable of invoking VoiceXML services
   offered by a VoiceXML Media Server compliant with this document.

   A typical configuration for this use case is as follows:

   +-------------+         SIP         +--------------+
   |             |---------------------|              |
   |   IP/PSTN   |                     |   VoiceXML   |
   |   Gateway   |         RTP         | Media Server |
   |             |=====================|              |
   +-------------+                     +--------------+

   Note also that beyond the invocation and termination of a VoiceXML
   dialog, the semantics defined for call transfers using REFER are
   intended to be compatible with standard, existing IP/PSTN gateways.

1.1.3.  3GPP IMS Media Resource Function (MRF)

   The 3GPP IP Multimedia Subsystem (IMS) [TS23002] defines a Media
   Resource Function (MRF) used to offer media processing services such
   as conferencing, transcoding, and prompt/collect. The capabilities
   offered by VoiceXML are ideal for offering richer media processing
   services in the context of the MRF. In this architecture, the
   interface defined here corresponds to the "Mr" interface to the
   MRFC; the implementation of this interface might use separated MRFC
   and MRFP elements (as per the IMS architecture), or might be an
   integrated MRF (as is common practice).

            +----------+
            |   App    |
            |  Server  |
            +----------+
                 |
                 | SIP (ISC)
                 |
            +----------+   SIP (Mr)    +--------------+
            |  S-CSCF  |---------------|   VoiceXML   |
            |          |               |     MRF      |
            +----------+               +--------------+
                                              ||
                                              || RTP (Mb)
                                              ||

   The above diagram is highly simplified and shows a subset of nodes
   typically involved in MRF interactions. It should be noted that
   while the MRF will primarily be used by the Application Server via
   the S-CSCF, filter criteria on the S-CSCF could route calls directly
   to the MRF independently of Application Servers.

   Although the above is described in terms of the 3GPP IMS
   architecture, it is intended that it is also applicable to 3GPP2,
   NGN, and PacketCable architectures that are converging with 3GPP IMS
   standards.

1.1.4.  CCXML <-> VoiceXML Interaction

   CCXML 1.0 [CCXML10] applications provide services mainly through
   controlling the interaction between Connections, Conferences, and
   Dialogs. Although CCXML is capable of supporting arbitrary dialog
   environments, VoiceXML is commonly used as a dialog environment in
   conjunction with CCXML applications; CCXML is specifically designed
   to effectively support the use of VoiceXML. CCXML 1.0 defines
   language elements that allow for Dialogs to be prepared, started,
   and terminated; it further allows for data to be returned by the
   dialog environment, for call transfers to be requested (by the
   dialog) and responded to by the CCXML application, and for arbitrary
   eventing between the CCXML application and running dialog
   application.

   The interface described in this document can be used by CCXML 1.0
   implementations to control VoiceXML Media Servers. Note, however,
   that some CCXML language features require eventing facilities
   between CCXML and VoiceXML sessions that go beyond what is defined
   in this specification.
   For example, VoiceXML-controlled call transfers and mid-dialog
   application-defined events cannot be fully realized using this
   specification alone. A SIP event package [RFC3265] MAY be used in
   addition to this specification to provide extended eventing.

1.1.5.  Other Use Cases

   In addition to the use cases described in some detail above, there
   are a number of other intended use cases that are not described in
   detail, such as:

   1.  Use of a VoiceXML Media Server as an adjunct to an IP-based
       PBX/ACD, possibly to provide voicemail/messaging, automated
       attendant, or other capabilities.

   2.  Invocation and control of a VoiceXML session that provides the
       voice modality component in a multimodal system.

1.2.  Terminology

   Application Server:  A SIP Application Server hosts and executes
      services, in particular by terminating SIP sessions on a media
      server. The Application Server MAY also act as an HTTP server
      [RFC2616] in interactions with media servers.

   VoiceXML Media Server:  A VoiceXML interpreter including a SIP-based
      interpreter context and the requisite media processing
      capabilities to support VoiceXML functionality.

   VoiceXML Session:  A VoiceXML Session is a multimedia session
      comprising at least a SIP user agent, a VoiceXML Media Server,
      the data streams between them, and an executing VoiceXML
      application.

   VoiceXML Dialog:  Equivalent to VoiceXML Session.

2.  VoiceXML Session Establishment and Termination

   This section describes how to establish a VoiceXML Session, with or
   without preparation, and how to terminate a session. This section
   also addresses how session information is made available to VoiceXML
   applications.

2.1.  Service Identification

   The SIP Request-URI is used to identify the VoiceXML media service
   as defined in [NETANN]. The user part of the SIP Request-URI is
   fixed to "dialog".
   The initial VoiceXML document is specified with the "voicexml"
   parameter. In addition, parameters are defined that control how the
   VoiceXML Media Server fetches the specified VoiceXML document. The
   list of parameters defined by this specification is as follows:

   voicexml:  URL of the initial VoiceXML document to fetch. This will
      typically contain an HTTP URI, but may use other URI schemes, for
      example to refer to local, static VoiceXML documents. If the
      "voicexml" parameter is omitted, the VoiceXML Media Server may
      select the initial VoiceXML document by other means, such as by
      applying a default, or may reject the request.

   maxage:  Used to set the max-age value of the Cache-Control header
      in conjunction with VoiceXML documents fetched using HTTP, as per
      [RFC2616]. If omitted, the VoiceXML Media Server will use a
      default value.

   maxstale:  Used to set the max-stale value of the Cache-Control
      header in conjunction with VoiceXML documents fetched using HTTP,
      as per [RFC2616]. If omitted, the VoiceXML Media Server will use
      a default value.

   method:  Used to set the HTTP method applied in the fetch of the
      initial VoiceXML document. Allowed values are "get" or "post"
      (case-insensitive). Default is "get".

   postbody:  Used to set the application/x-www-form-urlencoded encoded
      [HTML4] HTTP body for "post" requests (or is otherwise ignored).
      The postbody value is the prepared
      application/x-www-form-urlencoded content, subsequently
      URL-encoded (see note below).

   Other application-specific parameters may be added to the
   Request-URI and are exposed in VoiceXML session variables (see
   section 2.5).

   Parameters of the Request-URI in subsequent re-INVITEs are ignored.
   One consequence of this is that the VoiceXML Media Server cannot be
   instructed by the Application Server to change the executing
   VoiceXML Application after a VoiceXML Session has been started.
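
   As an informal illustration (not part of this specification), the
   sketch below assembles a "dialog" Request-URI from the parameters
   listed in this section. The function name build_request_uri and its
   keyword arguments are hypothetical. The set of characters left
   unescaped is an assumption taken from the SIP URI parameter grammar
   in [RFC3261]; everything else, including '?', '=' and ';', is
   percent-encoded.

```python
from urllib.parse import quote, urlencode

# Characters allowed unescaped in SIP URI parameter values in addition to
# letters, digits and "_.-~" (assumption: param-unreserved plus mark
# characters from the RFC 3261 grammar).
SIP_PARAM_SAFE = "[]/:&+$!*'()"


def build_request_uri(host, voicexml=None, maxage=None, maxstale=None,
                      method=None, post_fields=None, extra=None):
    """Build a 'dialog' Request-URI (hypothetical helper, not a real API)."""
    params = []
    if voicexml is not None:
        params.append(("voicexml", voicexml))
    if maxage is not None:
        params.append(("maxage", maxage))
    if maxstale is not None:
        params.append(("maxstale", maxstale))
    if method is not None:
        # Allowed values are "get" or "post", compared case-insensitively.
        if method.lower() not in ("get", "post"):
            raise ValueError("method must be 'get' or 'post'")
        params.append(("method", method))
    if post_fields is not None:
        # First prepare the form-urlencoded body; it is URL-encoded again
        # below, like every other parameter value.
        params.append(("postbody", urlencode(post_fields)))
    # Application-specific parameters are appended after the defined ones.
    params.extend((extra or {}).items())

    encoded = ";".join(
        f"{name}={quote(str(value), safe=SIP_PARAM_SAFE)}"
        for name, value in params
    )
    uri = f"sip:dialog@{host}"
    return f"{uri};{encoded}" if encoded else uri


if __name__ == "__main__":
    print(build_request_uri(
        "mediaserver.example.com",
        voicexml="http://appserver.example.com/promptcollect.vxml",
        maxage=3600,
        maxstale=0,
    ))
```

   The user part is always "dialog", so only the host and the
   parameters vary between calls.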

   Note: Special characters in Request-URI parameter values need to be
   URL-encoded as required by the SIP URI syntax, for example '?'
   (%3f), '=' (%3d), and ';' (%3b). The VoiceXML Media Server MUST
   therefore unencode Request-URI parameter values before making use of
   them or exposing them to running VoiceXML applications.

   As an example, the following SIP Request-URI identifies the use of
   VoiceXML media services, with
   'http://appserver.example.com/promptcollect.vxml' as the initial
   VoiceXML document, to be fetched with max-age/max-stale values of
   3600s/0s respectively:

      sip:dialog@mediaserver.example.com; \
          voicexml=http://appserver.example.com/promptcollect.vxml; \
          maxage=3600;maxstale=0

2.2.  Initiating a VoiceXML Session

   A VoiceXML Session is initiated via the Application Server using a
   SIP INVITE or REFER (see section 5.2). Typically, the Application
   Server will be specialized in providing VoiceXML services. At a
   minimum, the Application Server may behave as a simple proxy by
   rewriting the Request-URI received from the User Agent to a
   Request-URI suitable for consumption by the VoiceXML Media Server
   (as specified in section 2.1). For example, a User Agent might
   present a dialed number:

      tel:8965

   which the Application Server maps to a directory assistance
   application on the VoiceXML Media Server with a Request-URI of:

      sip:dialog@ms1.example.com; \
          voicexml=http://as1.example.com/da.vxml

   The Application Server SHOULD insert its own URI in the Record-Route
   header so that it remains in the signaling path for subsequent
   signaling related to the session. This is of particular importance
   for call transfers so that upstream Application Servers or proxy
   servers see signaling originating from the Application Server and
   not the VoiceXML Media Server itself.

   Certain header values in the INVITE message to the VoiceXML Media
   Server are mapped into VoiceXML session variables and are specified
   in section 2.5.

   On receipt of the INVITE, the VoiceXML Media Server issues a
   provisional response, 100 Trying, and commences the fetch of the
   initial VoiceXML document. The 200 OK response indicates that the
   VoiceXML document has been fetched and parsed correctly and is ready
   for execution. Application execution commences on receipt of the ACK
   (except if the dialog is being prepared as specified in section
   2.3). Note that the 100 Trying response MUST be sent on receipt of
   the INVITE in accordance with [RFC3261] since the VoiceXML Media
   Server cannot in general guarantee that the initial fetch will
   complete in less than 200 ms.

   As an optimization, prior to the 200 OK response, the VoiceXML Media
   Server MAY execute the application up to the point of the first
   VoiceXML waiting state or prompt flush.

   A VoiceXML Media Server, like any SIP User Agent, may be unable to
   accept the INVITE request for a variety of reasons. For instance, an
   SDP offer contained in the INVITE might require the use of codecs
   that are not supported by the Media Server. In such cases, the Media
   Server should respond as defined by [RFC3261]. However, there are
   error conditions specific to VoiceXML, as follows:

   1.  If the Request-URI does not conform to this specification, a 400
       Bad Request MUST be returned (unless it is used to select other
       services not defined by this specification).

   2.  If the Request-URI does not include a "voicexml" parameter, and
       the VoiceXML Media Server does not elect to use a default page,
       the VoiceXML Media Server MUST return a final response of 400
       Bad Request, and SHOULD include a Warning header with a 3-digit
       code of 399 and a human readable error message.

   3.  If the VoiceXML document cannot be fetched or parsed, the
       VoiceXML Media Server MUST return a final response of 500 Server
       Internal Error and SHOULD include a Warning header with a
       3-digit code of 399 and a human readable error message.

   Informational note: Certain applications may pass a significant
   amount of data to the VoiceXML dialog in the form of Request-URI
   parameters. This may cause the total size of the INVITE request to
   exceed the MTU of the underlying network. In such cases,
   applications/implementations must take care either to use a
   transport appropriate to these larger messages (such as TCP), or to
   use alternative means of passing the required information to the
   VoiceXML dialog (such as the use of an HTTP redirector). This note
   also applies if the dialog is started using a REFER request as
   described in section 5.2.

2.3.  Preparing a VoiceXML Session

   In certain scenarios, it is beneficial to prepare a VoiceXML Session
   for execution prior to running it. A previously prepared VoiceXML
   Session is expected to execute with minimal delay when instructed to
   do so.

   If a media-less SIP dialog is established with the initial INVITE to
   the VoiceXML Media Server, the VoiceXML Application will not execute
   after receipt of the ACK. To run the VoiceXML Application, the
   Application Server must issue a re-INVITE to establish a media
   session.

   A media-less SIP dialog can be established by sending SDP containing
   no media lines in the initial INVITE. Alternatively, if no SDP is
   sent in the initial INVITE, the VoiceXML Media Server will include
   an offer in the 200 OK message, which can be responded to with an
   answer in the ACK with the media port(s) set to 0.

   Once a VoiceXML Application is running, a re-INVITE which disables
   the media streams (i.e. sets the ports to 0) will not otherwise
   affect the executing application.

2.4.  Terminating a VoiceXML Session

   The Application Server can terminate a VoiceXML Session by issuing a
   BYE to the VoiceXML Media Server. Upon receipt of a BYE in the
   context of an existing VoiceXML Session, the VoiceXML Media Server
   MUST send a 200 OK response, and MUST throw a
   'connection.disconnect.hangup' event to the VoiceXML application.

   The VoiceXML Media Server may also initiate termination of the
   session by issuing a BYE request. This will typically occur as a
   result of encountering a <disconnect> or <exit> in the VoiceXML
   application, due to the VoiceXML application running to completion,
   or due to unhandled errors within the VoiceXML application.

2.5.  Session Variable Mappings

   The standard VoiceXML session variables are assigned values
   according to:

   session.connection.local.uri:  Evaluates to the SIP URI specified in
      the To: header of the initial INVITE (or REFER).

   session.connection.remote.uri:  Evaluates to the SIP URI specified
      in the From: header of the initial INVITE (or REFER).

   session.connection.redirect:  This array is populated by information
      contained in the Diversion [DIV] header in the initial INVITE or
      is otherwise undefined. Each URI entry in the Diversion header is
      mapped, in reverse order, into an element of the
      session.connection.redirect array. Properties of each element of
      the array are mapped according to: the name-addr value is mapped
      to the uri property, the privacy parameter is mapped to the pi
      property, and the screen parameter is mapped to the si property.
      The reason parameter in the Diversion header is mapped to the
      reason property according to:

      *  unknown - "unknown",

      *  user-busy - "user busy",

      *  no-answer - "no reply",

      *  deflection - "deflection immediate response",

      *  unavailable - "mobile subscriber not reachable".

      Other values of the reason parameter in the Diversion header are
      mapped verbatim to the reason property.
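
   The Diversion mapping above can be sketched as follows. This is an
   illustration only: the input is assumed to be already parsed into
   one dictionary per Diversion entry, in header order, and the key
   names name_addr, privacy, screen, and reason, as well as the
   function name map_diversion_entries, are hypothetical. Reason values
   are matched exactly as written, which is also an assumption.

```python
# Reason values with a defined mapping; any other value passes through
# verbatim.
DIVERSION_REASON_MAP = {
    "unknown": "unknown",
    "user-busy": "user busy",
    "no-answer": "no reply",
    "deflection": "deflection immediate response",
    "unavailable": "mobile subscriber not reachable",
}


def map_diversion_entries(entries):
    """Map parsed Diversion entries to session.connection.redirect elements.

    `entries` is a list of dicts in header order (hypothetical pre-parsed
    form). The result is in reverse order, as the mapping requires.
    """
    redirect = []
    for entry in reversed(entries):
        reason = entry.get("reason")
        redirect.append({
            "uri": entry.get("name_addr"),   # name-addr value -> uri
            "pi": entry.get("privacy"),      # privacy parameter -> pi
            "si": entry.get("screen"),       # screen parameter -> si
            "reason": DIVERSION_REASON_MAP.get(reason, reason),
        })
    return redirect


if __name__ == "__main__":
    parsed = [
        {"name_addr": "<sip:first@example.com>", "reason": "user-busy",
         "privacy": "off", "screen": "no"},
        {"name_addr": "<sip:second@example.com>", "reason": "time-of-day"},
    ]
    for element in map_diversion_entries(parsed):
        print(element)
```

   Parameters that are absent from an entry come out as None here; how
   an implementation represents a missing property is left open by this
   sketch.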

   session.connection.protocol.name:  Evaluates to "sip".

   session.connection.protocol.version:  Evaluates to "2.0".

   session.connection.protocol.sip.headers:  This is an associative
      array where each key in the array is the case-sensitive,
      non-compact name of a SIP header in the initial INVITE. If
      multiple header fields of the same field name are present, the
      values are combined into a single comma-separated value.
      Implementations MUST at a minimum include the Call-ID header and
      MAY include other headers. For example,
      session.connection.protocol.sip.headers["Call-ID"] evaluates to
      the Call-ID of the SIP dialog.

   session.connection.protocol.sip.requesturi:  This is an associative
      array where the array keys and values are formed from the URI
      parameters on the SIP Request-URI of the initial INVITE (or
      REFER) according to the following rules:

      *  If the URI parameter name includes no periods, the key (of
         type string) is formed from the entire parameter name and its
         corresponding value is of type string and evaluates to the URI
         parameter value.

      *  If the URI parameter name includes a period, the array key (of
         type string) is formed from the characters to the left of the
         period and the corresponding value is of type object. A
         property is added to that object whose name is formed from the
         characters to the right of the period (up to the next period).
         If the URI parameter name contains no further periods, the
         property is of type string and evaluates to the URI parameter
         value. Otherwise it is of type object and the process of
         adding properties repeats.

      In addition, the array's toString function returns the full SIP
      Request-URI.
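
      A minimal sketch of the requesturi rules, for illustration only:
      the class name RequestUriArray and the function name
      map_request_uri are hypothetical, nested Python dictionaries
      stand in for objects, and __str__ stands in for toString. The
      rules do not say what happens when a plain name and a dotted name
      share a prefix (for example "obj" and "obj.x"); this sketch lets
      the later parameter win.

```python
from urllib.parse import unquote


class RequestUriArray(dict):
    """Dict whose string form is the full Request-URI (stand-in for toString)."""

    def __init__(self, request_uri):
        super().__init__()
        self.request_uri = request_uri

    def __str__(self):
        return self.request_uri


def map_request_uri(request_uri):
    """Map Request-URI parameters to nested keys, splitting names on '.'."""
    result = RequestUriArray(request_uri)
    # Drop any "?headers" part; '?' inside values is percent-encoded.
    without_headers = request_uri.split("?", 1)[0]
    # The first ';'-separated segment is "sip:user@host", the rest are
    # parameters.
    for segment in without_headers.split(";")[1:]:
        if not segment:
            continue
        name, _, raw_value = segment.partition("=")
        value = unquote(raw_value)  # values are unencoded before exposure
        parts = name.split(".")
        node = result
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                # Assumption: a dotted name replaces an earlier plain value.
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return result


if __name__ == "__main__":
    uri = "sip:dialog@example.com;voicexml=http://ajax.com;obj.x=1;obj.z.a=3"
    mapped = map_request_uri(uri)
    print(mapped["voicexml"], mapped["obj"]["x"], mapped["obj"]["z"]["a"])
    print(str(mapped))
```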
      For example, assuming a Request-URI of

      sip:dialog@example.com;voicexml=http://ajax.com;obj.x=1;obj.y=2;obj.z.a=3

      then session.connection.protocol.sip.requesturi["voicexml"]
      evaluates to "http://ajax.com",
      session.connection.protocol.sip.requesturi["obj"].x evaluates to
      "1", session.connection.protocol.sip.requesturi["obj"].y
      evaluates to "2",
      session.connection.protocol.sip.requesturi["obj"].z.a evaluates
      to "3", and session.connection.protocol.sip.requesturi evaluates
      to the complete Request-URI.

   session.connection.aai:  Evaluates to
      session.connection.protocol.sip.requesturi['aai']

   session.connection.ccxml:  Evaluates to
      session.connection.protocol.sip.requesturi['ccxml']

   session.connection.protocol.sip.codecs:  This is an array where each
      array element corresponds to a codec currently in use by the
      VoiceXML Session. Each element in the array is an object with at
      least one property called name. The name property evaluates to
      the MIME type [RFC3555] of the codec in use. Required parameters
      for a codec (and any optional parameters present) are mapped to
      corresponding named string properties. For example, for a media
      session employing G.711 mu-law audio sampled at 8kHz,
      session.connection.protocol.sip.codecs[0].name evaluates to
      "audio/PCMU" and session.connection.protocol.sip.codecs[0].rate
      evaluates to "8000". Note that this session variable is updated
      if the codecs for the VoiceXML Session change (due to a
      re-INVITE).

2.6.  Examples

2.6.1.  Basic Session Establishment

   This example illustrates an Application Server setting up a VoiceXML
   Session on behalf of a User Agent.

   User Agent          Application Server      VoiceXML Media Server
       |                        |                        |
       |(1) INVITE [offer]      |                        |
       |----------------------->|(2) INVITE [offer]      |
       |(3) 100 Trying          |----------------------->|
       |<-----------------------|(4) 100 Trying          |
       |                        |<-----------------------|
       |                        |                        |
       |                        |(5) HTTP GET            | (fetch
       |                        |<-----------------------|  initial
       |                        |(6) HTTP 200 OK [VXML]  |  VoiceXML
       |                        |----------------------->|  document)
       |                        |                        |
       |                        |(7) 200 OK [answer]     |
       |(8) 200 OK [answer]     |<-----------------------|
       |<-----------------------|                        |
       |(9) ACK                 |                        |
       |----------------------->|(10) ACK                |
       |                        |----------------------->| (execute
       |(11) RTP                |                        |  VoiceXML
       |.................................................|  application)
       |                        |                        |

2.6.2.  VoiceXML Session Preparation

   This example demonstrates the preparation of a VoiceXML Session. In
   this example, the VoiceXML session is prepared prior to placing an
   outbound call to a User Agent, and is started as soon as the User
   Agent answers.

   The [answer1:0] notation is used to indicate an SDP answer with the
   media ports set to 0.

   User Agent          Application Server      VoiceXML Media Server
       |                        |                        |
       |                        |(1) INVITE              |
       |                        |----------------------->|
       |                        |(2) 100 Trying          |
       |                        |<-----------------------|
       |                        |                        |
       |                        |(3) HTTP GET            | (fetch
       |                        |<-----------------------|  initial
       |                        |(4) HTTP 200 OK [VXML]  |  VoiceXML
       |                        |----------------------->|  document)
       |                        |                        |
       |                        |(5) 200 OK [offer1]     |
       |                        |<-----------------------|
       |                        |(6) ACK [answer1:0]     |
       |(7) INVITE              |----------------------->|
       |<-----------------------|                        |
       |(8) 200 OK [offer2]     |                        |
       |----------------------->|(9) INVITE [offer2]     |
       |                        |----------------------->|
       |                        |(10) 100 Trying         |
       |                        |<-----------------------|
       |                        |(11) 200 OK [answer2]   |
       |(12) ACK [answer2]      |<-----------------------|
       |<-----------------------|(13) ACK                |
       |                        |----------------------->| (execute
       |(14) RTP                |                        |  VoiceXML
       |.................................................|  application)
       |                        |                        |

2.6.3.  MRCP Establishment

   The VoiceXML Media Server SHOULD use the [MRCPv2] protocol to handle
   media processing resources for speech recognition, speech synthesis,
   speaker verification and speaker identification. The example
   illustrates a VoiceXML Media Server establishing an MRCP Session.

 Application          VoiceXML MS           MRCPv2 Server     Web Server
   Server
      |                    |                      |                  |
      |(1) INVITE [offer1] |                      |                  |
      |------------------->|                      |                  |
      |(2) 100 Trying      |                      |                  |
      |<-------------------|(3) HTTP GET          |                  |
      |                    |---------------------------------------->|
      |                    |                      |                  |
      |                    |(4) HTTP 200 OK [VXML]|                  |
      |                    |<----------------------------------------|
      |                    |                      |                  |
      |                    |(5) INVITE [offer2]   |                  |
      |                    |--------------------->|                  |
      |                    |                      |                  |
      |                    |(6) 200 OK [answer2]  |                  |
      |                    |<---------------------|                  |
      |                    |                      |                  |
      |                    |(7) ACK               |                  |
      |                    |--------------------->|                  |
      |                    |                      |                  |
      |                    |(8) MRCP connection   |                  |
      |                    |<-------------------->|                  |
      |(9) 200 OK [answer1]|                      |                  |
      |<-------------------|                      |                  |
      |                    |                      |                  |
      |(10) ACK            |                      |                  |
      |------------------->|                      |                  |
      |                    |                      |                  |
      |(11) RTP            |                      |                  |
   ...............................................|                  |
      |                    |                      |                  |

   The VoiceXML Media Server is responsible for establishing a session
   with the MRCPv2 Media Resource Server prior to sending the 200 OK
   response to the initial INVITE. The VoiceXML Media Server will
   perform the appropriate offer/answer with the MRCPv2 Media Resource
   Server based on the SDP capabilities of the Application Server and
   the MRCPv2 Media Resource Server. The VoiceXML Media Server will
   change the offer received in step (1) to establish an MRCPv2 session
   in step (5) and will re-write the SDP to include an m-line for each
   MRCPv2 resource to be used and other required SDP modifications as
   specified by MRCPv2. Once the VoiceXML Media Server performs the
   offer/answer with the MRCPv2 Media Resource Server, it will
   establish an MRCPv2 control channel in step (8).

   If a media-less SIP dialog is established with the initial INVITE to
   the VoiceXML Media Server, an MRCP session MUST NOT be established
   until the Application Server issues a re-INVITE to the VoiceXML
   Media Server.

   If an MRCP resource is required and an MRCP session cannot be
   established, the VoiceXML Media Server MUST decline service by
   returning a 503 Service Unavailable final response. The VoiceXML
   Media Server SHOULD include a Warning header with a 3-digit code of
   399 and a human readable error message such as "recognizer resource
   unavailable".

3.  Media Support

   This section describes the mandatory and optional media support
   required by this interface.

3.1.  Offer/Answer

   The VoiceXML Media Server MUST support the standard offer/answer
   mechanism of [RFC3264]. In particular, if an SDP offer is not
   present in the INVITE, the VoiceXML Media Server will make an offer
   in the 200 OK response listing its supported codecs.

3.2.  Early Media

   The VoiceXML Media Server MAY support early establishment of media
   streams by sending a 183 Session Progress provisional response to
   the initial INVITE. This allows the Application Server to establish
   media streams between a user agent and the VoiceXML Media Server
   while the initial VoiceXML document is being processed. This is
   useful primarily for minimizing the delay in starting a VoiceXML
   Session, since media stream establishment and initial VoiceXML
   document processing can occur in parallel. This can be particularly
   important in cases where the session with the user agent has already
   been established, since the user agent is already "connected".
The following flow demonstrates the use of early media:

User Agent           Application Server      VoiceXML Media Server
     |                        |                        |
     |...(Existing session)...|                        |
     |                        |(1) INVITE              |
     |                        |----------------------->|
     |                        |(2) 183 [offer]         |
     |(3) re-INVITE [offer]   |<-----------------------|
     |<-----------------------|                        |
     |(4) 200 OK [answer]     |                        |
     |----------------------->|                        |
     |(5) ACK                 |                        |
     |<-----------------------|                        |
     |                        |(6) HTTP GET            | (fetch
     |                        |<-----------------------| initial
     |                        |(7) HTTP 200 OK [VXML]  | VoiceXML
     |                        |----------------------->| document)
     |                        |                        |
     |                        |(8) 200 OK [offer]      |
     |                        |<-----------------------|
     |                        |(9) ACK [answer]        |
     |                        |----------------------->| (execute
     |(10) RTP                |                        | VoiceXML
     |.................................................| application)

The use of early media is substantially complicated if the SDP supplied in the 183 Session Progress differs from that supplied in the 200 OK. Therefore, if a VoiceXML Media Server generates a 183 Session Progress provisional response containing SDP, it MUST return identical SDP when generating the 200 OK final response (i.e. the "gateway model" in [RFC3960]).

Early media is not optimal in all circumstances; for instance, when handling an incoming call, a 183 Session Progress propagated by the Application Server to the user agent will typically stop the "ringback tone" a user would otherwise hear. Furthermore, a 183 Session Progress provisional response does not guarantee that the VoiceXML application will be executed successfully - the subsequent fetching of the VoiceXML document could fail. As such, Application Servers may choose to ignore any early media SDP from the VoiceXML Media Server.

3.3. Modifying the Media Session

The VoiceXML Media Server MUST allow the media session to be modified via a re-INVITE and SHOULD support the UPDATE method [RFC3311] for the same purpose.
In particular, it MUST be possible to change streams between sendrecv, sendonly, and recvonly as specified in [RFC3264]. Unidirectional streams are useful for announcement-only or listening-only (hotword) applications. The preferred mechanism for putting the media session on hold is specified in [RFC3264], i.e. the UA modifies the stream to be sendonly and mutes its own stream. Modification of the media session does not affect VoiceXML application execution.

3.4. Audio and Video Codecs

For the purposes of achieving a basic level of interoperability, this section specifies a minimal subset of codecs and RTP payload formats that MUST be supported by the VoiceXML Media Server.

For audio-only applications, G.711 mu-law and A-law MUST be supported using RTP payload types 0 and 8, respectively [RFC3551]. Other codecs and payload formats MAY be supported.

Video telephony applications, which employ a video stream in addition to the audio stream, are possible in VoiceXML 2.0/2.1 through the use of multimedia file container formats such as the .3gp [TS26244] and .mp4 formats [IEC14496-14]. Video support is optional for this specification. If video is supported then:

1. H.263 Baseline [RFC2429] MUST be supported. For legacy reasons, the 1996 version of H.263 MAY be supported using the RTP payload format defined in [RFC2190] (payload type 34 [RFC3551]).

2. AMR-NB audio [RFC3267] SHOULD be supported.

3. MPEG-4 video [RFC3016] SHOULD be supported.

4. MPEG-4 AAC audio [RFC3016] SHOULD be supported.

5. Other codecs and payload formats MAY be supported.

3.5. DTMF

DTMF events [RFC2833] MUST be supported. The VoiceXML Media Server MAY perform DTMF detection using other means such as detecting DTMF within the audio stream. If the SDP from the user agent indicates support for [RFC2833] telephone-event, then only that mechanism SHOULD be used.
4. Returning Data to the Application Server

This section discusses the mechanisms for returning data (e.g. collected utterance or digit information) from the VoiceXML Media Server to the Application Server.

4.1. HTTP Mechanism

At any time during the execution of the VoiceXML application, data can be returned to the Application Server via an HTTP POST using standard VoiceXML elements such as <submit> or <subdialog>. Notably, the <data> element in VoiceXML 2.1 [VXML21] allows data to be sent to the Application Server efficiently without requiring a VoiceXML page transition and is ideal for short VoiceXML applications such as "prompt and collect".

For most applications, it is necessary to correlate the information being passed over HTTP with a particular VoiceXML Session. One way this can be achieved is to include the SIP Call-ID (accessible in VoiceXML via the session.connection.protocol.sip.headers array) within the HTTP POST fields. Alternatively, a unique "POST-back URI" can be specified as an application-specific URI parameter in the Request-URI of the initial INVITE (accessible in VoiceXML via the session.connection.protocol.sip.requesturi array).

4.2. SIP Mechanism

Data can be returned to the Application Server via the namelist attribute on <exit> or <disconnect>. Namelist variables are first converted to a string (via the ECMAScript toString() operation) and encoded in the message body of the BYE message using the application/x-www-form-urlencoded content type [HTML4]. The behavior of including a recording variable in the namelist is not defined.

Note: This mechanism relies on a BYE being issued from the VoiceXML Media Server and hence is only available when the VoiceXML Application terminates. While this feature is useful for many applications (e.g.
prompt and collect), the HTTP mechanism, which allows for mid-call information to be sent to the Application Server, may be preferable for some applications.

If a VoiceXML Application returns data using the namelist attribute on <disconnect> [VXML21] and subsequently employs a namelist attribute on <exit>, the latter namelist information is discarded.

This specification extends the application/x-www-form-urlencoded format by replacing each non-ASCII character with the one or more octets of its UTF-8 representation, with each octet in turn replaced by %HH, where HH represents the uppercase hexadecimal notation for the octet value and % is a literal character. The content type MUST include the charset parameter to indicate UTF-8 encoding.

For example, consider the VoiceXML snippet:

   ...
   <exit namelist="id pin"/>
   ...

If id equals 1234 and pin equals 0000, say, the BYE message would look similar to:

   BYE sip:user@pc33.example.com SIP/2.0
   Via: SIP/2.0/UDP 192.0.2.4;branch=z9hG4bKnashds10
   Max-Forwards: 70
   From: sip:dialog@example.com;tag=a6c85cf
   To: sip:user@example.com;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 231 BYE
   Content-Type: application/x-www-form-urlencoded;charset=utf-8
   Content-Length: 16

   id=1234&pin=0000

5. Outbound Calling

Outbound calls can be triggered via the Application Server using either third party call control [RFC3725] or the SIP REFER mechanism [RFC3515].

5.1. Third Party Call Control Mechanism

Flow IV from [RFC3725] is recommended in conjunction with the VoiceXML Session preparation mechanism. This flow has several advantages over others, namely:

1. Selection of a VoiceXML Media Server and preparation of the VoiceXML Application can occur before the call is placed to avoid the callee experiencing delays.

2.
Avoids timing difficulties that could occur with other flows due to the time taken to fetch and parse the initial VoiceXML document.

3. The flow is IPv6 compatible.

An example flow for an Application Server initiated outbound call is provided in section 2.6.2.

5.2. REFER Mechanism

The Application Server can place a REFER request to the VoiceXML Media Server outside of a SIP dialog to initiate an outbound call. The Request-URI in the REFER is constructed identically to that of an INVITE to the VoiceXML Media Server and carries the same semantics. The Refer-To header contains the URI for the VoiceXML Media Server to place the call to.

On receipt of the REFER request, the VoiceXML Media Server MUST issue a provisional response, 100 Trying. The 202 Accepted response indicates that the VoiceXML document has been fetched and parsed correctly. The VoiceXML Media Server proceeds to place the outbound INVITE and will execute the application after the ACK is sent. If the VoiceXML Session cannot be started, then the VoiceXML Media Server MUST respond to the REFER request using the procedure defined in section 2.2 above.

An example of a REFER-initiated outbound call is given below. The NOTIFY messages, which contain message/sipfrag bodies [RFC3515], allow the Application Server to monitor the progress of the outbound call attempt.

Note: An in-dialog REFER will result in a 403 Forbidden response.
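
As a rough illustration of how an Application Server might interpret the message/sipfrag bodies carried in these NOTIFY messages, the following Python sketch reads the status code from the first line of a sipfrag body and classifies the outbound call attempt. The function name classify_sipfrag and the labels it returns are assumptions made for illustration; they are not defined by this specification or by [RFC3515].

```python
import re

# Matches a SIP status line such as "SIP/2.0 200 OK".
_STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3})(?: (.*))?$")


def classify_sipfrag(body: str) -> str:
    """Classify a message/sipfrag body by the status code on its first line.

    The returned labels ("trying", "progress", "answered", "failed",
    "unknown") are hypothetical names chosen for this sketch.
    """
    lines = body.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    match = _STATUS_LINE.match(first_line)
    if match is None:
        return "unknown"
    code = int(match.group(1))
    if code == 100:
        return "trying"
    if code < 200:
        return "progress"
    if code < 300:
        return "answered"
    # Any 3xx-6xx final response is treated as a failed attempt here.
    return "failed"
```

In this sketch, a NOTIFY body of "SIP/2.0 200 OK" would be reported as "answered", which corresponds to the NOTIFY sent after the outbound INVITE is answered.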
VoiceXML Media Server     Application Server           User Agent
          |                        |                        |
          |(1) REFER               |                        |
          |<-----------------------|                        |
          |(2) 100 Trying          |                        |
          |----------------------->|                        |
          |(3) NOTIFY              |                        |
          |----------------------->|                        |
          |(4) 200 OK              |                        |
          |<-----------------------|                        |
          |(5) HTTP GET            |                        |
          |----------------------->|                        |
          |(6) HTTP 200 OK [VXML]  |                        |
          |<-----------------------|                        |
          |(7) 202 Accepted        |                        |
          |----------------------->|                        |
          |(8) INVITE [offer]                               |
          |------------------------------------------------>|
          |(9) 200 OK [answer]                              |
          |<------------------------------------------------|
          |(10) NOTIFY             |                        |
          |----------------------->|                        |
          |(11) 200 OK             |                        |
          |<-----------------------|                        |
          |(12) ACK                                         |
          |------------------------------------------------>|
          |(13) RTP                                         |
          |.................................................|
          |                        |                        |

6. Call Transfer

Transfer capability is optional in VoiceXML [VXML20], [VXML21]. If transfer is supported, it MUST use the mechanism described in this section.

The flows specified here assume that the Application Server takes a proxy server role. More complex behaviors are possible, for example the Application Server could act as a B2BUA to ensure it remains in the signaling path during transfers where the VoiceXML Media Server has dropped out. Note that the mechanisms used by the VoiceXML Media Server are still valid in the absence of an Application Server.

In what follows, the provisional responses have been omitted for clarity.

Note: The transfer flows specified here are selected on the basis that they provide the best interworking across a wide range of SIP devices. CCXML<->VoiceXML implementations, which require tight coupling in the form of bi-directional eventing to support all transfer types defined in VoiceXML, may benefit from other approaches, such as the use of SIP event packages [RFC3265].

6.1.
Blind

The blind transfer sequence is initiated by the VoiceXML Media Server via a REFER message [RFC3515] on the original SIP dialog. The Refer-To header contains the URI for the called party, as specified via the 'dest' or 'destexpr' attributes on the VoiceXML <transfer> tag.

If the REFER request succeeds, in which case the VoiceXML Media Server will receive a 202 Accepted, the VoiceXML Media Server throws the connection.disconnect.transfer event and will terminate the VoiceXML Session with a BYE message.

If the REFER request results in a non-2xx response, the <transfer>'s form item variable (or event raised) depends on the SIP response and is specified in the following table. Note that this indicates that the transfer request was rejected, and does not indicate the outcome of actually performing the transfer (e.g. busy, no answer, etc).

+-------------------------+-----------------------------------+
| SIP Response            | <transfer> variable / event       |
+-------------------------+-----------------------------------+
| 404 Not Found           | error.connection.baddestination   |
| 405 Method Not Allowed  | error.unsupported.transfer.blind  |
| 503 Service Unavailable | error.connection.noresource       |
| (No response)           | network_busy                      |
| (Other 3xx/4xx/5xx/6xx) | unknown                           |
+-------------------------+-----------------------------------+

An example is illustrated below (NOTIFY messages and provisional responses have been omitted for clarity). In this example, the Application Server behaves as a proxy and is not in the signaling path of the transferred call.
User Agent 1       Application        VoiceXML        User Agent 2
  (Caller)           Server         Media Server        (Callee)
      |                 |                 |                 |
      |(0) RTP          |                 |                 |
      |...................................|                 |
      |                 |                 |                 |
      |                 |(1) REFER        |                 |
      |(2) REFER        |<----------------|                 |
      |<----------------|                 |                 |
      |(3) 202 Accepted |                 |                 |
      |---------------->|(4) 202 Accepted |                 |
      |                 |---------------->|                 |
      |                 |(5) BYE          |                 |
      |(6) BYE          |<----------------|                 |
      |<----------------|                 |                 |
      |(7) 200 OK       |                 |                 |
      |---------------->|(8) 200 OK       |                 |
      |                 |---------------->| Stop RTP (0)    |
      |(9) INVITE                                           |
      |---------------------------------------------------->|
      |(10) 200 OK                                          |
      |<----------------------------------------------------|
      |(11) ACK                                             |
      |---------------------------------------------------->|
      |(12) RTP                                             |
      |.....................................................|

If the "aai" or "aaiexpr" attribute is present on <transfer>, it is appended to the Refer-To URI as a parameter named "aai" in the REFER method.

6.2. Bridge

The bridge transfer function results in the creation of a small multi-party session involving the Caller, the VoiceXML Media Server, and the Callee. The VoiceXML Media Server invites the Callee to the session and will eject the Callee if the transfer is terminated.

Issue Note: There is currently no standard mechanism to allow the Application Server to indicate dynamically to the VoiceXML Media Server that it wishes to be included in the signaling path for the new outbound SIP dialog. One possible mechanism is to add a specific parameter to the Application Server's URI in the Record-Route header in the initial inbound INVITE. When present, this could be used to indicate that the VoiceXML Media Server must add that URI to its pre-existing route set for outbound INVITEs. In the flows illustrated below, it is assumed that the Application Server is in the pre-existing route set configured in the local policy.
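
The handling of the "aai" value described for blind transfer above (a URI parameter named "aai" appended to the Refer-To URI) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper name append_aai is hypothetical, and percent-escaping the value is an assumption made so that the result remains a syntactically valid SIP URI; this document does not specify an escaping rule.

```python
from urllib.parse import quote


def append_aai(uri: str, aai: str) -> str:
    """Return `uri` with an ";aai=<value>" URI parameter appended.

    Assumption: characters outside the unreserved set are
    percent-escaped; the specification does not say how to encode them.
    """
    # URI parameters must precede any "?headers" part of a SIP URI,
    # so split the headers off first and re-attach them afterwards.
    base, sep, headers = uri.partition("?")
    return f"{base};aai={quote(aai, safe='')}{sep}{headers}"
```

For instance, under these assumptions a destination of sip:bob@example.com with an aai value of "account 42" would yield sip:bob@example.com;aai=account%2042.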
If the "aai" or "aaiexpr" attribute is present on <transfer>, it is appended to the Request-URI in the INVITE as a URI parameter named "aai".

During the transfer attempt, audio specified in the transferaudio attribute of <transfer> is streamed to User Agent 1. A VoiceXML Media Server MAY play early media received from the Callee to the Caller if the transferaudio attribute is omitted.

The bridge transfer sequence is illustrated below. The VoiceXML Media Server (acting as a UAC) makes a call to User Agent 2 with the same codecs used by User Agent 1 via the Application Server. When the call setup is complete, RTP flows between User Agent 2 and the VoiceXML Media Server. This stream is mixed with User Agent 1's.

User Agent 1         Application          VoiceXML          User Agent 2
  (Caller)             Server           Media Server          (Callee)
      |                   |                   |                   |
      |(0) RTP            |                   |                   |
      |.......................................|                   |
      |                   |                   |                   |
      |                   |(1) INVITE [offer] |                   |
      |                   |<------------------|                   |
      |                   |(2) INVITE [offer] |                   |
      |                   |-------------------------------------->|
      |                   |(4) 200 OK [answer]|                   |
      |                   |<--------------------------------------|
      |                   |(5) 200 OK [answer]|                   |
      |                   |------------------>|                   |
      |                   |(6) ACK            |                   |
      |                   |<------------------|                   |
      |                   |(7) ACK            |                   |
      |                   |-------------------------------------->|
      |                   |                   |(8) RTP            |
      |                   |                   |...................| mix
      |                   |                   |                   | 0+8

If a final response to the INVITE is not received from User Agent 2 and the connecttimeout expires (specified as an attribute of <transfer>), the VoiceXML Media Server will issue a CANCEL to terminate the transaction and the <transfer>'s form item variable is set to noanswer.

If the INVITE results in a non-2xx response, the <transfer>'s form item variable (or event raised) depends on the SIP response and is specified in the following table.
+-------------------------+-----------------------------------+
| SIP Response            | <transfer> variable / event       |
+-------------------------+-----------------------------------+
| 404 Not Found           | error.connection.baddestination   |
| 405 Method Not Allowed  | error.unsupported.transfer.bridge |
| 408 Request Timeout     | noanswer                          |
| 486 Busy Here           | busy                              |
| 503 Service Unavailable | error.connection.noresource       |
| (No response)           | network_busy                      |
| (Other 3xx/4xx/5xx/6xx) | unknown                           |
+-------------------------+-----------------------------------+

The 405 Method Not Allowed response can be used by the AS to gracefully decline bridge transfers.

Once the transfer is established, the VoiceXML Media Server can "listen" to the media stream from User Agent 1 to perform speech or DTMF hotword recognition, which when matched, results in a near-end disconnect, i.e. the VoiceXML Media Server issues a BYE to User Agent 2 and the VoiceXML application continues with User Agent 1. A BYE will also be issued to User Agent 2 if the call duration exceeds the maximum duration specified in the maxtime attribute on <transfer>.

If User Agent 2 issues a BYE during the transfer, the transfer terminates and the VoiceXML <transfer>'s form item variable receives the value far_end_disconnect. If User Agent 1 issues a BYE during the transfer, the transfer terminates and the VoiceXML event connection.disconnect.transfer is thrown.

6.3. Consultation

The consultation transfer (also called attended transfer [SIPEX]) is similar to a blind transfer except that the outcome of the transfer call setup is known and the Caller is not dropped as a result of an unsuccessful transfer attempt.

Consultation transfer commences with the same flow as for bridge transfer except that the RTP streams are not mixed at step (8) and error.unsupported.transfer.consultation supplants error.unsupported.transfer.bridge.
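
The response mappings in the blind and bridge transfer tables, together with the substitution of error.unsupported.transfer.consultation for consultation transfer, can be summarized in a small lookup. The following Python sketch is illustrative only; the function name transfer_outcome, the use of None to represent "no response", and the (kind, name) return shape are assumptions and not part of this specification.

```python
from typing import Optional, Tuple

# Rows shared by the blind and bridge tables. 405 is handled separately
# because the event name depends on the transfer type.
_COMMON = {
    404: ("event", "error.connection.baddestination"),
    503: ("event", "error.connection.noresource"),
}

# Rows that appear only in the bridge table (also used for consultation,
# which commences with the same flow as bridge transfer).
_BRIDGE_ONLY = {
    408: ("variable", "noanswer"),
    486: ("variable", "busy"),
}


def transfer_outcome(transfer_type: str, status: Optional[int]) -> Tuple[str, str]:
    """Map a non-2xx final response (or None for no response) to an outcome.

    Returns ("event", name) or ("variable", value); this shape is a
    hypothetical convention for the sketch.
    """
    if transfer_type not in ("blind", "bridge", "consultation"):
        raise ValueError(f"unknown transfer type: {transfer_type}")
    if status is None:
        return ("variable", "network_busy")
    if status < 300:
        raise ValueError("the tables only cover non-2xx final responses")
    if status == 405:
        return ("event", f"error.unsupported.transfer.{transfer_type}")
    if status in _COMMON:
        return _COMMON[status]
    if transfer_type != "blind" and status in _BRIDGE_ONLY:
        return _BRIDGE_ONLY[status]
    return ("variable", "unknown")
```

Note that, following the tables as written, a 486 Busy Here maps to busy for bridge transfer but falls into the "Other" row (unknown) for blind transfer, since the blind table reflects only the fate of the REFER request itself.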
Assuming a new SIP dialog with User Agent 2 is created, the remainder of the sequence follows as illustrated below. Consultation transfer makes use of the Replaces: header such that User Agent 1 calls User Agent 2 and replaces the latter's SIP dialog with the VoiceXML Media Server with a new SIP dialog between the Caller and Callee.

User Agent 1       Application        VoiceXML        User Agent 2
  (Caller)           Server         Media Server        (Callee)
      |                 |                 |                 |
      |(0) RTP          |                 |                 |
      |...................................|(8) RTP          |
      |                 |                 |.................|
      |                 |(9) REFER        |                 |
      |(10) REFER       |<----------------|                 |
      |<----------------|                 |                 |
      |(11) 202 Accepted|                 |                 |
      |---------------->|(12) 202 Accepted|                 |
      |                 |---------------->|                 |
      |(13) INVITE Replaces: ms1.example.com                |
      |---------------------------------------------------->|
      |(14) 200 OK                                          |
      |<----------------------------------------------------|
      |(15) ACK                                             |
      |---------------------------------------------------->|
      |(16) RTP                                             |
      |.....................................................|
      |                 |                 |(17) BYE         |
      |                 |                 |<----------------|
      |                 |                 |(18) 200 OK      |
      |                 |                 |---------------->| Stop
      |(19) NOTIFY      |                 |                 | RTP (8)
      |---------------->|(20) NOTIFY      |                 |
      |                 |---------------->|                 |
      |                 |(21) 200 OK      |                 |
      |(22) 200 OK      |<----------------|                 |
      |<----------------|                 |                 |
      |                 |(23) BYE         |                 |
      |(24) BYE         |<----------------|                 |
      |<----------------|                 |                 |
      |(25) 200 OK      |                 |                 |
      |---------------->|(26) 200 OK      |                 | Stop
      |                 |---------------->|                 | RTP (0)
      |                 |                 |                 |

7. Contributors

The editors gratefully acknowledge the following individuals and their companies who contributed to this specification:

R. J.
Auburn (Voxeo)
Hans Bjurstrom (Hewlett-Packard)
Dave Burke (Voxpilot)
Emily Candell (Comverse)
Brian Frasca (Tellme)
Jeff Haynie (Vocalocity)
Scott McGlashan (Hewlett-Packard)
Mark Scott (VoiceGenie)
Rao Surapaneni (Tellme)

8. Security Considerations

Exposing network services with well-known addresses may not be desirable. The VoiceXML Media Server SHOULD authenticate and authorize requesting endpoints per local policy. This is particularly important for REFER-initiated outbound calls.

Some applications may choose to transfer confidential information to or from the VoiceXML Media Server. The VoiceXML Media Server SHOULD implement the sips: and https: schemes to provide data integrity.

The VoiceXML Media Server SHOULD use authentication and TLS when establishing MRCP control sessions with an MRCPv2 Media Resource Server.

To mitigate the possibility of denial-of-service attacks, the VoiceXML Media Server SHOULD have local policies such as time-limiting VoiceXML application execution.

9. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

10. References

10.1. Normative References

[DIV] Levy, S. and B. Byerly, "Diversion Indication in SIP", draft-levy-sip-diversion-08 (work in progress), August 2004.

[HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 Specification", W3C Recommendation, Dec 1999.

[MRCPv2] Shanmugham, S. and D. Burnett, "Media Resource Control Protocol Version 2", draft-ietf-speechsc-mrcpv2-09 (work in progress), Oct 2005.

[NETANN] Burger, E., Van Dyke, J., and A.
Spitzer, "Basic Network Media Services with SIP", draft-burger-sipping-netann-11 (work in progress), February 2005.

[RFC1890] Schulzrinne, H., "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 1890, January 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2190] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190, September 1997.

[RFC2429] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C., Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)", RFC 2429, October 1998.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2833] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000.

[RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. Kimata, "RTP Payload Format for MPEG-4 Audio/Visual Streams", RFC 3016, November 2000.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC3265] Roach, A., "Session Initiation Protocol (SIP)-Specific Event Notification", RFC 3265, June 2002.

[RFC3267] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.
[RFC3311] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.

[RFC3515] Sparks, R., "The Session Initiation Protocol (SIP) Refer Method", RFC 3515, April 2003.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[RFC3555] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload Formats", RFC 3555, July 2003.

[RFC3725] Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo, "Best Current Practices for Third Party Call Control (3pcc) in the Session Initiation Protocol (SIP)", BCP 85, RFC 3725, April 2004.

[RFC3960] Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)", RFC 3960, December 2004.

[VXML20] McGlashan, S., Burnett, D., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K., and S. Tryphonas, "Voice Extensible Markup Language (VoiceXML) Version 2.0", W3C Recommendation, March 2004.

[VXML21] Oshry, M., Auburn, R J., Baggia, P., Bodell, M., Burke, D., Burnett, D., Candell, E., Kilic, H., McGlashan, S., Lee, A., Porter, B., and K. Rehor, "Voice Extensible Markup Language (VoiceXML) Version 2.1", W3C Candidate Recommendation, June 2005.

10.2. Informative References

[CCXML10] Auburn, R J., "Voice Browser Call Control: CCXML Version 1.0", W3C Working Draft (work in progress), June 2005.

[IEC14496-14] "Information technology. Coding of audio-visual objects. MP4 file format", ISO/IEC 14496-14:2003, October 2003.

[SIPEX] Johnston, A., Sparks, R., Cunningham, C., Donovan, S., and K. Summers, "Session Initiation Protocol Examples", draft-ietf-sipping-service-examples (work in progress), July 2005.

[SIPVXML] Rosenberg, J., Mataga, P., and D.
Ladd, "A SIP Interface to VoiceXML Dialog Servers", draft-rosenberg-sip-vxml-00 (work in progress), July 2001.

[TS23002] "3rd Generation Partnership Project: Network architecture (Release 6)", 3GPP TS 23.002 v6.6.0, December 2004.

[TS26244] "Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244 v6.4.0, December 2004.

Authors' Addresses

Dave Burke
Voxpilot
6 - 9 Trinity Street
Dublin 2
Ireland
Email: david.burke@voxpilot.com

Mark Scott
VoiceGenie
1120 Finch Avenue West, 8th floor
Toronto, Ontario M3J 3H7
Canada
Email: mscott@voicegenie.com

Jeff Haynie
Vocalocity
730 Peachtree Street, Suite 1100
Atlanta, GA 30308
USA
Email: jhaynie@vocalocity.com

R.J. Auburn
Voxeo
100 East Pine Street #600
Orlando, FL 32801
USA
Email: rj@voxeo.com

Scott McGlashan
Hewlett-Packard
Gustav III:s boulevard 36
SE-16985 Stockholm
Sweden
Email: Scott.McGlashan@hp.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.