Network Working Group                                          E. Burger
Internet Draft                                  SnowShore Networks, Inc.
Document: draft-burger-sipping-em-rqt-00.txt            October 12, 2001
Category: Informational                          Expires: April 12, 2002


                         Why Early Media in SIP


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Discussions of requirements for SIP occur in the SIPPING workgroup.
   The SIPPING workgroup homepage is at . The SIPPING discussion list
   is at sipping@ietf.org.

1. Abstract

   This document describes the requirements for SIP Early Media. Early
   Media is the ability of a SIP network to deliver real-time media
   traffic from the called party to the calling party after the calling
   party issues an INVITE but before the called party accepts the
   INVITE with a 200 response.

2. Conventions used in this document

   This document refers to a media server for playing announcements. A
   media server is a general-purpose media resource processor that is
   capable of tone detection and generation, conferencing, interactive
   voice response, and announcements. We use the term media server in
   this document for simplicity.
   However, any SIP endpoint, such as an intelligent SIP Phone or a dumb
   announcement server, can play the role described, as appropriate to
   the situation.

   This document refers to the calling party (the SIP User Agent Client
   or UAC) in the masculine (he, him, his) and the called party (the SIP
   User Agent Server or UAS) in the feminine (she, her, hers). This
   convention is purely for convenience and makes no assumption about
   the gender of the parties involved.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [2].

TABLE OF CONTENTS

   1. Abstract
   2. Conventions used in this document
   3. Informative Text
   4. Introduction
   5. PSTN Early Media
   6. SIP Early Media
   6.1. PSTN Interworking
   6.1.1. Proxy Announcement
   6.1.2. Early Media with SIP Announcements
   6.2. SIP Endpoints
   6.2.1. Intelligent SIP Endpoint
   6.2.2. Intelligent SIP Endpoint with External Media Server
   7. Security Considerations
   8. References
   9. Acknowledgments
   10. Author's Addresses
   11. Full Copyright Statement

3. Informative Text

   Note well that while there are call flows in this document, they are
   purely informative. Implementers MUST NOT depend on mechanisms or
   proposals in this document as an agreed-upon standard of any type.
   Call flows that prove sound will be published in their own,
   normative document. This is an INFORMATIONAL document, not a
   STANDARDS TRACK document.

4. Introduction

   People have written at length about how to do early media in SIP
   [3]. However, no one has published why there is a need for early
   media. Some people state there is no need for early media except to
   provide an early talk path in case the media arrives at the caller's
   user agent before the signaling does. Others state there is a need
   for early media to replicate the billing mechanisms of the legacy
   PSTN. Neither position is entirely correct.

   This document describes a number of scenarios where early media may
   be appropriate. Note that this document is a true Request for
   Comments: if you know of other scenarios where early media is
   appropriate, or if you can implement the scenarios without resorting
   to early media, please contact the author and participate on the
   SIPPING list.

5. PSTN Early Media

   The PSTN uses early media for two purposes.

   The first is to deliver the far-end talk path to the caller. If the
   far-end talk path does not open until the answer signaling arrives,
   the network may clip the initial utterance spoken by the called
   party. For example, if the called party answers the phone and says
   "Hello", the network might clip the initial utterance to "Lo". With
   early media, the calling party hears the entire utterance.

   The second purpose for early media in the PSTN is to deliver
   signaling information to the caller in the absence of a digital
   signaling path.

           Digital               Analog
          Signaling             Signaling
   +--------+   +----+       +----+   +----+   +--------------+
   | Caller |---| CO |  ...  | CO |<--| EO |---| Called Party |
   +--------+   +----+       +----+   +----+   +--------------+

   The caller's terminal device may have digital signaling to the PSTN,
   such as through ISDN BRI. In the ISDN, the terminal device can
   display call progress and generate local signaling tones. Call
   progress is the state of the call attempt: Alerting (ringing), Busy,
   No Answer, Redirected, etc. Signaling tones are the country-specific
   tones you hear when you place a call. Examples of such tones are dial
   tone, ring tone, busy tone, reorder tone, etc. ITU-T E.180
   Supplement 2 [4] describes the actual tones in use in different
   countries.

   Even though the caller's device can generate and display call
   progress, call progress information may not be available. The figure
   above shows a typical situation where call progress information is
   not available in digital form. In the depicted situation, the
   terminating End Office (EO) does not have digital signaling
   connectivity to its adjacent Central Office (CO). The only way for
   the EO to signal the progress of the caller's call attempt is with
   in-band tones.

   In the PSTN this is not a major problem, in that the caller expects
   to hear ring tone, busy tone, and so on. For the most part, callers
   cannot tell the difference between locally generated and remotely
   generated call progress tones. One situation in which the caller can
   tell that the call progress tones come from the far end is an
   international call. As E.180 Supplement 2 shows, there are many
   different tone plans for call progress. Said differently, when
   placing an international call, the call progress tones sound
   "foreign." Currently, almost all international calls use
   far-end-generated signaling tones in early media for caller
   signaling.

6. SIP Early Media

   This section describes various scenarios that people have proposed
   for early media in SIP.
   For some of the scenarios, there are alternative ways of achieving
   the same results without resorting to early media. For others, early
   media appears to be the appropriate solution.

6.1. PSTN Interworking

   Consider the following network topology.

                                       SIP  +-------+
                                     -------| Proxy |
   +--------+   +----+       +----+ /       +-------+
   | Caller |===| CO |  ...  | GW |<            | SIP
   +--------+   +----+       +----+ \  RTP  +--------+
                                     =======| Media  |
                                            | Server |
                                            +--------+

   In this figure, the caller places a call to a SIP endpoint. Before
   terminating the call to the SIP endpoint or rejecting the call, the
   proxy wishes to play an announcement to the caller. The announcement
   could be verbal, but often consists of call progress tones.

   Some drafts propose using early media between the media server and
   the gateway. Proposed call flows (for example [5] and [6]) follow
   this general theme. The gateway invites the SIP endpoint. A proxy
   intercepts the call and routes it to a media server, which plays the
   appropriate call signaling tones.

   In the example portrayed in the figure below, the media server plays
   an announcement and then returns a 486 Busy Here response. A scenario
   for this would be a Do Not Disturb service that plays an "I'm sorry,
   but Chris is not available. Please try her again later." message and
   then returns busy.

6.1.1. Proxy Announcement

   GW                    Proxy                 Media Server
   |                     |                     |
   |       INVITE        |                     |
   |-------------------->|       INVITE        |
   |                     |-------------------->|
   |                     |     180 Ringing     |
   |     180 Ringing     |<--------------------|
   |<--------------------| 183 Session Progress|
   | 183 Session Progress|<--------------------|
   |<--------------------|                     |
   |                    RTP                    |
   |<==========================================|
   |                     |    486 Busy Here    |
   |    486 Busy Here    |<--------------------|
   |<--------------------|                     |
   |                     |                     |

   The reason for using 183 and 486 on the SIP side of the network is
   that mappings from SIP to ISUP, for example, map 200 OK to answered,
   180 to ringing, 486 to busy, etc.
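   The mapping argument above can be made concrete with a short sketch.
   It is illustrative only: the CallProgress states and the
   map_response helper are hypothetical names, and a real gateway
   applies its own SIP-ISUP mapping tables rather than this simplified
   rule set. It shows why the announcement flow uses 183 and 486: a
   provisional response carrying SDP opens a talk path without
   signaling an answer, whereas a 200 OK would report the call as
   answered.

```python
from enum import Enum


class CallProgress(Enum):
    """Hypothetical call-progress states a SIP-PSTN gateway reports to the PSTN."""
    PROCEEDING = "proceeding"
    RINGING = "ringing"
    EARLY_MEDIA = "talk path open, call not answered"
    ANSWERED = "answered"
    BUSY = "busy"
    FAILED = "failed"


def map_response(status: int, has_sdp: bool) -> CallProgress:
    """Map a SIP response to a PSTN call-progress state (simplified sketch)."""
    if 100 <= status <= 199:
        if has_sdp:
            # Provisional response with SDP: cut through the far-end media
            # (early media) without signaling answer to the PSTN.
            return CallProgress.EARLY_MEDIA
        return CallProgress.RINGING if status == 180 else CallProgress.PROCEEDING
    if 200 <= status <= 299:
        # Only a 2xx maps to "answered", which is what PSTN billing keys on.
        return CallProgress.ANSWERED
    if status in (486, 600):  # 486 Busy Here, 600 Busy Everywhere
        return CallProgress.BUSY
    if 300 <= status <= 699:
        # Other final responses; a real gateway would act on 3xx redirects
        # rather than treating them as failures.
        return CallProgress.FAILED
    raise ValueError(f"not a SIP response code: {status}")


# The Proxy Announcement flow: 180, then 183 with SDP, then 486.
flow = [(180, False), (183, True), (486, False)]
print([map_response(code, sdp).name for code, sdp in flow])
# prints ['RINGING', 'EARLY_MEDIA', 'BUSY']
```

   Replacing the final 486 with a 200 OK in this sketch would yield
   ANSWERED, which is exactly the outcome the announcement service must
   avoid.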
   It may be appropriate for the gateway and proxy to use early media
   signaling, because early media signaling accurately indicates the
   state of the call. In the scenario described above, the media server
   generates the call progress state (e.g., 180, 183, and 486). The
   proxy, as defined in Section 17 of [3], passes all responses to the
   gateway unmodified. The proxy could tell the media server, in the
   INVITE, which interim call progress state and final result code to
   generate, possibly as parameters to the Request-URI, in a new SIP
   header, or in the message body.

   One point in favor of such a configuration is that it is trivially
   easy to correlate the gateway-proxy leg and the proxy-media server
   leg: they are both part of the same proxied call.

   Although this scenario is workable, it does not truly drive a
   requirement for early media between the proxy and the media server.
   Other configurations can satisfy the need for early media signaling
   to the gateway without passing early media throughout the SIP
   network.

6.1.2. Early Media with SIP Announcements

   As described above, it is possible for the media server to drive
   call progress. However, by definition [7], media servers do not have
   the application logic to determine the appropriate interim and final
   result codes. It is more appropriate for the proxy to hand off the
   call to an application server (or to have the application server
   functionality built in), which terminates the call signaling and
   sends the appropriate events to the gateway.
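   The division of labor described above can be sketched as a small
   event handler. This is a hypothetical illustration, not an existing
   API: the AnnouncementB2BUA class, its on_* methods, and the
   placeholder SDP strings are invented for the example, and messages
   are recorded as (leg, message, SDP) tuples rather than sent. Leg "A"
   faces the gateway through the proxy; leg "B" faces the media server.
   The sketch also assumes the application server offers the gateway's
   SDP to the media server so that announcement RTP flows directly to
   the gateway. The message order follows the call flow of this
   section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnouncementB2BUA:
    """Hypothetical application-server logic for an announcement service.

    Leg "A" faces the gateway (via the proxy); leg "B" faces the media server.
    Outgoing messages are recorded as (leg, message, sdp) instead of being sent.
    """
    final_response: str = "486 Busy Here"  # result the service returns to the caller
    sent: list[tuple[str, str, Optional[str]]] = field(default_factory=list)

    def _send(self, leg: str, message: str, sdp: Optional[str] = None) -> None:
        self.sent.append((leg, message, sdp))

    def on_gateway_invite(self, gateway_sdp: str) -> None:
        # Start an ordinary call to the media server, offering the gateway's
        # SDP (an assumption of this sketch), then report ringing upstream.
        self._send("B", "INVITE", gateway_sdp)
        self._send("A", "180 Ringing")

    def on_media_server_ok(self, media_server_sdp: str) -> None:
        # Relay the media server's SDP in a provisional response: the gateway
        # opens an early-media path but never sees an answer.
        self._send("A", "183 Session Progress", media_server_sdp)
        self._send("B", "ACK")

    def on_media_server_bye(self) -> None:
        # Announcement finished: give the gateway the final result chosen by
        # the service logic and complete the media-server call normally.
        self._send("A", self.final_response)
        self._send("B", "200 OK (BYE)")


app = AnnouncementB2BUA()
app.on_gateway_invite("gateway-sdp")
app.on_media_server_ok("media-server-sdp")
app.on_media_server_bye()
for leg, message, sdp in app.sent:
    print(leg, message, sdp or "")
```

   Every message on leg A before the final response is provisional, so
   the gateway never reports an answer, while leg B is an ordinary
   completed call: the leg a billing system may need to correlate with
   leg A.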
   GW          Proxy       App Server     Media Server
   |           |           |              |
   |  INVITE   |           |              |
   |---------->|  INVITE   |              |
   |           |---------->|    INVITE    |
   |           |    180    |------------->|
   |    180    |<----------|    200 OK    |
   |<----------|    183    |<-------------|
   |    183    |<----------|     ACK      |
   |<----------|           |------------->|
   |                 RTP                  |
   |<=====================================|
   |           |           |              |
   |           |           |     BYE      |
   |           |    486    |<-------------|
   |    486    |<----------|   OK (BYE)   |
   |<----------|           |------------->|
   |           |           |              |

   In this configuration, the interaction between the application
   server and the gateway is standard SIP and SIP-PSTN interworking.
   The status codes and the use of early media make sense within the
   SIP framework. Likewise, the interaction between the application
   server and the media server makes sense. While the call from the
   gateway to the intended SIP endpoint may or may not succeed, the
   call from the application server to the media server does succeed.
   The application server contains all of the state machine and call
   logic for generating the proper result codes in the direction of the
   gateway.

   One may expect proxies that implement SIP-PSTN interworking to have
   the application server functionality built in. This makes the proxy
   behave more like a back-to-back user agent (B2BUA).

   Such a configuration might impact billing systems that use SIP
   signaling for billing. Namely, there are now completed calls in the
   network, between the application server and the media server. The
   billing system may need to correlate the gateway-application server
   leg with the application server-media server leg. This is not an
   insurmountable problem, however.

   In addition, such a configuration has an added benefit: service
   providers can now outsource announcement services. Consider the
   following administrative mapping.
                                         :
             Administrative              :    Administrative
                Domain 1                 :       Domain 2
                                         :
   +----+   +-------+   +------------+   :   +--------------+
   | GW |   | Proxy |   | App Server |   :   | Media Server |
   +----+   +-------+   +------------+   :   +--------------+
                                         :

   Here, announcements are a service of administrative domain 2, while
   PSTN termination and routing are a service of administrative
   domain 1. By using normal signaling between the application server
   and the media server, SIP-based billing between the two domains
   works as usual. There are completed "calls" to the announcement
   service, as opposed to incomplete "early media sessions".

6.2. SIP Endpoints

6.2.1. Intelligent SIP Endpoint

   Is early media purely a PSTN-SIP interworking problem? We propose
   that it is not. Here is an example of a service that is a pure SIP
   user agent to SIP user agent interaction that requires early media.

   Consider an intelligent SIP phone with a Do Not Disturb feature. The
   user of the phone can record or select an announcement to play when
   a caller calls. Once the announcement plays, the phone rejects the
   call ("hangs up").

   In this scenario, a 200 OK response from the phone to the INVITE
   would be incorrect. The user agent is not accepting the call; in
   fact, it will ultimately reject the call. Here is a sample call flow.

   UAC                   UAS
   |                     |
   |       INVITE        |
   |-------------------->|
   |     180 Ringing     |
   |<--------------------|
   | 183 Session Progress|
   |<--------------------|
   |         RTP         |
   |<====================|
   |                     |
   |    486 Busy Here    |
   |<--------------------|
   |                     |

   One might ask why the UAS does not simply return a 486 result code
   with Error-Info filled in with a URI for the announcement. In certain
   circumstances that may be appropriate. However, a SIP phone acting
   as a UAS is unlikely to have the capacity to handle arbitrary
   announcement requests simultaneously with arbitrary inbound calls.
   Playing the announcement as early media allows the UAS to control
   the serving of the announcement.
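   The phone's side of the sample call flow can be sketched in Python.
   This is a minimal, hypothetical illustration: build_response,
   do_not_disturb, the example INVITE headers, and the SDP (with
   addresses from the 192.0.2.0/24 documentation range) are all
   assumptions, and real responses carry additional headers such as
   Contact. What it shows is structural: the 183 carries the SDP that
   opens the announcement path, and the only final response is 486,
   never 200 OK.

```python
from __future__ import annotations

# Hypothetical announcement SDP; 192.0.2.0/24 is a documentation-only range.
ANNOUNCEMENT_SDP = (
    "v=0\n"
    "o=phone 2890844526 2890844526 IN IP4 192.0.2.20\n"
    "s=-\n"
    "c=IN IP4 192.0.2.20\n"
    "t=0 0\n"
    "m=audio 49170 RTP/AVP 0\n"
)


def build_response(invite: dict[str, str], code: int, reason: str,
                   to_tag: str, sdp: str | None = None) -> str:
    """Build a minimal SIP response to an INVITE (sketch only).

    Copies Via, From, Call-ID, and CSeq from the request and adds a To tag.
    A real UAS would also include Contact and other headers.
    """
    lines = [f"SIP/2.0 {code} {reason}"]
    for name in ("Via", "From", "Call-ID", "CSeq"):
        lines.append(f"{name}: {invite[name]}")
    lines.append(f"To: {invite['To']};tag={to_tag}")
    body = ""
    if sdp is not None:
        body = sdp.replace("\n", "\r\n")  # SIP and SDP lines end in CRLF
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def do_not_disturb(invite: dict[str, str], to_tag: str) -> list[str]:
    """Responses a Do Not Disturb phone sends, in order. No 2xx is ever sent."""
    return [
        build_response(invite, 180, "Ringing", to_tag),
        build_response(invite, 183, "Session Progress", to_tag, ANNOUNCEMENT_SDP),
        # In a real call, the announcement RTP plays between the 183 and the 486.
        build_response(invite, 486, "Busy Here", to_tag),
    ]


invite = {
    "Via": "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK74bf9",
    "From": "<sip:caller@example.net>;tag=9fxced76sl",
    "To": "<sip:chris@example.com>",
    "Call-ID": "3848276298220188511@192.0.2.10",
    "CSeq": "1 INVITE",
}
for msg in do_not_disturb(invite, to_tag="dnd-1"):
    print(msg.split("\r\n", 1)[0])
```

   All three responses share the same To tag, so the 183 and the final
   486 belong to the same early dialog that the announcement plays in.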
6.2.2. Intelligent SIP Endpoint with External Media Server

   What if the UAS does not have the capability of retrieving and
   playing announcements? It can call in the services of a media
   server. The following figure describes such a call flow.

   UAC         UAS            Media Server
   |           |              |
   |  INVITE   |              |
   |---------->|              |
   |           |    INVITE    |
   |           |------------->|
   |    180    |    200 OK    |
   |<----------|<-------------|
   |    183    |     ACK      |
   |<----------|------------->|
   |            RTP           |
   |<=========================|
   |           |              |
   |           |     BYE      |
   |           |<-------------|
   |    486    |   OK (BYE)   |
   |<----------|------------->|
   |           |              |

   Note that, as described in Section 6.1.1, it would not be
   appropriate for the media server to use early media signaling.
   However, it is quite appropriate for the UAS to use early media
   signaling to the UAC. A 200 OK is inappropriate between the UAS and
   the UAC, as the UAS will not accept the call. Said differently, even
   though the UAS-media server interaction does not require early
   media, the UAC-UAS interaction does.

7. Security Considerations

   A network that allows early media may treat it differently from
   session media. For example, one or both of the parties may pay for
   session media, while one or both might not have to pay for early
   media. If there is a billing difference between early media and
   session media, users may have an incentive to abuse the early media
   mechanisms to get free service.

   Section 3 cautions the reader not to implement the call flows in
   this document directly. We have not analyzed these call flows for
   any security issues they may present.

8. References

   [1]  Bradner, S., "The Internet Standards Process -- Revision 3",
        BCP 9, RFC 2026, October 1996.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [3]  Handley, M., Schulzrinne, H., Schooler, E., and Rosenberg, J.,
        "SIP: Session Initiation Protocol",
        draft-ietf-sip-rfc2543bis-04.txt, July 2001, work in progress.
   [4]  International Telecommunication Union, Telecommunication
        Standardization Sector (ITU-T), "Various Tones Used in National
        Networks", E.180 Supplement 2, January 1994. An informative
        reference for this document.

   [5]  O'Connor, W., Burger, E., and Van Dyke, J., "Network
        Announcements with SIP", draft-burger-sipping-netann-00.txt,
        July 2001, work in progress.

   [6]  Sen, S., Bharatia, J., Hogg, C., and Audet, F., "Early Media
        Issues and Scenarios", draft-sen-sip-earlymedia-00.txt,
        July 2001, work in progress.

   [7]  Hoffpauir, S., and Maxon, L., "Enhanced Services Framework",
        International Softswitch Consortium, June 2001, work in
        progress.

9. Acknowledgments

10. Author's Addresses

   Eric Burger
   SnowShore Networks, Inc.
   285 Billerica Rd.
   Chelmsford, MA 01824-4120
   USA

   Phone: +1 978/367-8403
   Email: eburger@snowshore.com

11. Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.
   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   The Internet Society currently provides funding for the RFC Editor
   function.