Network Working Group S. Proust, Ed.
Internet-Draft Orange
Intended status: Informational February 10, 2016
Expires: August 13, 2016

Additional WebRTC audio codecs for interoperability.
draft-ietf-rtcweb-audio-codecs-for-interop-05

Abstract

To ensure a baseline level of interoperability between WebRTC endpoints, a minimum set of required codecs is specified. However, to maximize the likelihood of establishing a session without the need for audio transcoding, it is also recommended to include in the offer other suitable audio codecs that are available to the browser.

This document provides some guidelines on the suitable codecs to be considered for WebRTC endpoints to address the most relevant interoperability use cases.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 13, 2016.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

As indicated in [I-D.ietf-rtcweb-overview], it has been anticipated that WebRTC will not remain an isolated island and that some WebRTC endpoints will need to communicate with devices used in other existing networks with the help of a gateway. Therefore, in order to maximize the likelihood of establishing a session without the need for audio transcoding, [I-D.ietf-rtcweb-audio] recommends including in the offer other suitable audio codecs beyond those that are mandatory to implement. This document provides some guidelines on the suitable codecs to be considered for WebRTC endpoints to address the most relevant interoperability use cases.

The codecs considered in this document are recommended to be supported and included in the offer only by WebRTC endpoints for which interoperability with non-WebRTC endpoints and non-WebRTC based services is relevant, as described in Section 4.1.2, Section 4.2.2, and Section 4.3.2. Other use cases may justify offering other additional codecs to avoid transcoding.

2. Definition and abbreviations

3. Rationale for additional WebRTC codecs

The mandatory implementation of OPUS [RFC6716] in WebRTC endpoints can guarantee codec interoperability (without transcoding) at state-of-the-art voice quality (better than narrow band "PSTN" quality) between WebRTC endpoints. WebRTC technology is also expected to be used to communicate with other types of endpoints using other technologies. It can be used, for instance, as an access technology to VoLTE services (Voice over LTE as specified in [IR.92]) or to interoperate with fixed or mobile Circuit Switched or VoIP services, such as mobile Circuit Switched voice over 3GPP 2G/3G mobile networks [TS23.002] or DECT based VoIP telephony [EN300175-1]. Consequently, a significant number of calls are likely to occur between terminals supporting WebRTC endpoints and other terminals, such as mobile handsets, fixed VoIP terminals, and DECT terminals, that neither support WebRTC endpoints nor implement OPUS. As a consequence, these calls are likely to be either of low, narrow band PSTN quality using G.711 [G.711] at both ends or affected by transcoding operations. The drawbacks of such transcoding operations are listed below:

4. Additional suitable codecs for WebRTC

The following codecs are considered relevant with respect to the general purpose described in Section 3. This list reflects the current status of foreseen WebRTC use cases. It is not exhaustive and remains open to the inclusion of other codecs for which relevant use cases can be identified. These additional codecs are recommended to be included in the offer in addition to OPUS and G.711, according to the foreseen interoperability cases to be addressed.
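As an illustration of such an offer, the following Python sketch assembles the audio media description lines for an endpoint that offers OPUS and G.711 together with the additional codecs discussed in this section. It is a minimal sketch under stated assumptions, not a complete SDP offer: the helper names (AudioCodec, build_audio_offer), the dynamic payload type numbers 111, 96 and 97, the placeholder port 9 and the preference order are all hypothetical choices made for the example. The static payload types 0, 8 and 9 for PCMU, PCMA and G722 are those assigned in [RFC3551]. Codec-specific "a=fmtp" parameters are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AudioCodec:
    """One codec entry of an SDP audio media description (hypothetical helper)."""
    payload_type: int
    encoding_name: str
    clock_rate: int
    channels: Optional[int] = None  # left out of the rtpmap line when None

    def rtpmap(self) -> str:
        suffix = f"/{self.channels}" if self.channels else ""
        return (f"a=rtpmap:{self.payload_type} "
                f"{self.encoding_name}/{self.clock_rate}{suffix}")


def build_audio_offer(codecs: List[AudioCodec], port: int = 9) -> List[str]:
    """Return the m=audio line followed by one a=rtpmap line per codec."""
    payload_types = " ".join(str(c.payload_type) for c in codecs)
    lines = [f"m=audio {port} UDP/TLS/RTP/SAVPF {payload_types}"]
    lines.extend(c.rtpmap() for c in codecs)
    return lines


# Preference order and dynamic payload types are assumptions for this example.
OFFER_CODECS = [
    AudioCodec(111, "opus", 48000, 2),  # mandatory to implement, dynamic PT
    AudioCodec(9, "G722", 8000),        # static PT 9, RTP clock rate 8000 Hz
    AudioCodec(96, "AMR-WB", 16000),    # dynamic PT
    AudioCodec(97, "AMR", 8000),        # dynamic PT
    AudioCodec(0, "PCMU", 8000),        # G.711 mu-law, static PT 0
    AudioCodec(8, "PCMA", 8000),        # G.711 A-law, static PT 8
]

if __name__ == "__main__":
    print("\n".join(build_audio_offer(OFFER_CODECS)))
```

An endpoint that only needs interoperability with fixed VoIP terminals could pass a shorter list, for example leaving out the AMR-WB and AMR entries.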

4.1. AMR-WB

4.1.1. AMR-WB General description

The Adaptive Multi-Rate WideBand (AMR-WB) is a 3GPP defined speech codec that is mandatory to implement in any 3GPP terminal that supports wideband speech communication. It is being used in circuit switched mobile telephony services and new multimedia telephony services over IP/IMS. It is especially used for voice over LTE as specified by GSMA in [IR.92]. More detailed information on AMR-WB can be found in [IR.36]. References for AMR-WB related specifications including detailed codec description and source code are in [TS26.171], [TS26.173], [TS26.190], [TS26.204].

4.1.2. WebRTC relevant use case for AMR-WB

The market of personal voice communication is driven by mobile terminals. AMR-WB is now very widely implemented in devices and networks offering "HD Voice". A high number of calls are consequently likely to occur between WebRTC endpoints and mobile 3GPP terminals offering AMR-WB. The use of AMR-WB by WebRTC endpoints would consequently allow transcoding free interoperation with all mobile 3GPP wideband terminals. Besides, WebRTC endpoints running on mobile terminals (smartphones) may reuse the AMR-WB codec already implemented on these devices.

4.1.3. Guidelines for AMR-WB usage and implementation with WebRTC

The payload format to be used for AMR-WB is described in [RFC4867], using the bandwidth efficient format with one speech frame encapsulated in each RTP packet. Further guidelines for implementing and using AMR-WB and ensuring interoperability with 3GPP mobile services can be found in [TS26.114]. In order to ensure interoperability with 4G/VoLTE as specified by GSMA, the more specific IMS profile for voice, derived from [TS26.114] and specified in [IR.92], should be considered. In order to maximize the possibility of successful call establishment for WebRTC endpoints offering AMR-WB, it is important that the WebRTC endpoints:

4.2. AMR

4.2.1. AMR General description

Adaptive Multi-Rate (AMR) is a 3GPP defined speech codec that is mandatory to implement in any 3GPP terminal that supports voice communication. This includes both mobile phone calls using GSM and 3G cellular systems and multimedia telephony services over IP/IMS and 4G/VoLTE, such as the GSMA voice IMS profile for VoLTE in [IR.92]. In addition to impacts listed above, support of AMR can avoid degrading the high efficiency over mobile radio access. References for AMR related specifications including detailed codec description and source code are in [TS26.071], [TS26.073], [TS26.090], [TS26.104].

4.2.2. WebRTC relevant use case for AMR

A user of a WebRTC endpoint on a device integrating an AMR module wants to communicate with another user that can only be reached on a mobile device that only supports AMR. Although more and more terminal devices are now "HD voice" and support AMR-WB, there is still a high number of legacy terminals in use that support only AMR (terminals with no wideband / HD Voice capabilities). The use of AMR by WebRTC endpoints would consequently allow transcoding free interoperation with all mobile 3GPP terminals. Besides, WebRTC endpoints running on mobile terminals (smartphones) may reuse the AMR codec already implemented on these devices.

4.2.3. Guidelines for AMR usage and implementation with WebRTC

The payload format to be used for AMR is described in [RFC4867], using the bandwidth efficient format with one speech frame encapsulated in each RTP packet. Further guidelines for implementing and using AMR in order to ensure interoperability with 3GPP mobile services can be found in [TS26.114]. In order to ensure interoperability with 4G/VoLTE as specified by GSMA, the more specific IMS profile for voice, derived from [TS26.114] and specified in [IR.92], should be considered. In order to maximize the possibility of successful call establishment for WebRTC endpoints offering AMR, it is important that the WebRTC endpoints:

4.3. G.722

4.3.1. G.722 General description

G.722 [G.722] is an ITU-T defined wideband speech codec that was approved by ITU-T in 1988. It is a royalty free codec that is common in a wide range of terminals and endpoints supporting wideband speech and requiring low complexity. The complexity of G.722 is estimated at 10 MIPS [EN300175-8], which is 2.5 to 3 times lower than that of AMR-WB. In particular, G.722 has been chosen by ETSI DECT as the mandatory wideband codec for New Generation DECT in order to greatly increase voice quality by extending the bandwidth from narrow band to wideband. G.722 is the wideband codec required for CAT-iq DECT certified terminals, and V2.0 of the CAT-iq specifications has been approved by GSMA as the minimum requirement for HD voice logo usage on "fixed" devices, i.e., broadband connections using the G.722 codec.

4.3.2. WebRTC relevant use case for G.722

G.722 is the wideband codec required for DECT CAT-iq terminals. DECT cordless phones are still widely used to offer short range wireless connection to PSTN or VoIP services. G.722 has also been specified by ETSI in [TS181005] as the mandatory wideband codec for IMS multimedia telephony communication service and supplementary services using fixed broadband access. The support of G.722 would consequently allow transcoding free IP interoperation between WebRTC endpoints and fixed VoIP terminals, including DECT / CAT-iq terminals supporting G.722. Besides, WebRTC endpoints running on fixed terminals implementing G.722 may reuse the G.722 codec already implemented on these devices.

4.3.3. Guidelines for G.722 usage and implementation

The payload format to be used for G.722 is defined in [RFC3551], with each octet of the stream of octets produced by the codec octet-aligned in an RTP packet. The sampling frequency for G.722 is 16 kHz, but the RTP clock rate is set to 8000 Hz in SDP to stay backward compatible with an erroneous definition in the original version of the RTP A/V profile. Further guidelines for implementing and using G.722 in order to ensure interoperability with multimedia telephony services over IMS can be found in Section 7 of [TS26.114]. Additional information on G.722 implementation in DECT can be found in [EN300175-8], and the full codec description and C source code in [G.722].
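The mismatch between the 16 kHz sampling frequency and the 8000 Hz RTP clock rate is easy to get wrong when computing RTP timestamps. The following Python sketch works through the arithmetic for one packet. It assumes the 64 kbit/s rate named in the title of [G.722] and a 20 ms packetization interval for the usage example. The function name and the returned field names are hypothetical.

```python
G722_SAMPLING_RATE_HZ = 16000   # actual audio sampling frequency
G722_RTP_CLOCK_RATE_HZ = 8000   # clock rate signalled in SDP (see text above)
G722_BIT_RATE_BPS = 64000       # assumption: 64 kbit/s G.722 stream


def g722_packet_figures(ptime_ms: int) -> dict:
    """Per-packet figures for a G.722 stream with the given packet duration."""
    return {
        # Number of 16 kHz audio samples covered by one packet.
        "samples": G722_SAMPLING_RATE_HZ * ptime_ms // 1000,
        # Number of codec octets carried in the RTP payload.
        "payload_octets": G722_BIT_RATE_BPS * ptime_ms // (8 * 1000),
        # RTP timestamp advance between consecutive packets: this follows
        # the 8000 Hz RTP clock, not the 16 kHz sampling frequency.
        "timestamp_increment": G722_RTP_CLOCK_RATE_HZ * ptime_ms // 1000,
    }


if __name__ == "__main__":
    print(g722_packet_figures(20))
```

With a 20 ms packet, the sketch yields 320 audio samples, 160 payload octets and a timestamp increment of 160, so the timestamp advances by half the number of samples in the packet.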

5. Security Considerations

Security considerations for WebRTC Audio Codec and Processing Requirements can be found in [I-D.ietf-rtcweb-audio]. Implementors making use of the additional codecs considered in this document are advised to also refer more specifically to the "Security Considerations" sections of [RFC4867] (for AMR and AMR-WB) and [RFC3551].

6. IANA Considerations

None.

7. Acknowledgements

The authors of this document are

though only the editor is listed on the front page.

The authors would like to thank Magnus Westerlund, Barry Dingle and Sanjay Mishra who carefully reviewed the document and helped to improve it.

8. References

8.1. Normative references

[G.722] ITU, "Recommendation ITU-T G.722 (2012): 7 kHz audio-coding within 64 kbit/s", 2012-09.
[I-D.ietf-rtcweb-audio] Valin, J. and C. Bran, "WebRTC Audio Codec and Processing Requirements", Internet-Draft draft-ietf-rtcweb-audio-10, February 2016.
[IR.92] GSMA, "IMS Profile for Voice and SMS V9.0", April 2015.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, DOI 10.17487/RFC3551, July 2003.
[RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A. and Q. Xie, "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 4867, DOI 10.17487/RFC4867, April 2007.
[TS26.071] 3GPP, "3GPP TS 26.071 v12.0.0: Mandatory Speech Codec speech processing functions; AMR Speech CODEC; General description", 2014-09.
[TS26.073] 3GPP, "3GPP TS 26.073 v12.0.0: ANSI C code for the Adaptive Multi Rate (AMR) speech codec", 2014-09.
[TS26.090] 3GPP, "3GPP TS 26.090 v12.0.0: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions", 2014-09.
[TS26.104] 3GPP, "3GPP TS 26.104 v12.0.0: ANSI C code for the floating-point Adaptive Multi Rate (AMR) speech codec", 2014-09.
[TS26.114] 3GPP, "IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction V13.0.0", June 2015.
[TS26.171] 3GPP, "3GPP TS 26.171 v12.0.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; General description", 2014-09.
[TS26.173] 3GPP, "3GPP TS 26.173 v12.1.0: ANSI-C code for the Adaptive Multi-Rate - Wideband (AMR-WB) speech codec", 2015-03.
[TS26.190] 3GPP, "3GPP TS 26.190 v12.0.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", 2014-09.
[TS26.204] 3GPP, "3GPP TS 26.204 v12.1.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; ANSI-C code", 2015-03.

8.2. Informative references

[EN300175-1] ETSI, "ETSI EN 300 175-1, v2.5.1: Digital Enhanced Cordless Telecommunications (DECT); Common Interface (CI); Part 1: Overview", 2009.
[EN300175-8] ETSI, "ETSI EN 300 175-8, v2.5.1: Digital Enhanced Cordless Telecommunications (DECT); Common Interface (CI); Part 8: Speech and audio coding and transmission", 2009.
[G.711] ITU, "Recommendation ITU-T G.711 (1988): Pulse code modulation (PCM) of voice frequencies", 1988-11.
[I-D.ietf-rtcweb-overview] Alvestrand, H., "Overview: Real Time Protocols for Browser-based Applications", Internet-Draft draft-ietf-rtcweb-overview-15, January 2016.
[IR.36] GSMA, "Adaptive Multirate Wide Band V3.0", September 2014.
[RFC6716] Valin, JM., Vos, K. and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012.
[TS181005] ETSI, "Telecommunications and Internet converged Services and Protocols for Advanced Networking (TISPAN); Service and Capability Requirements V3.3.1 (2009-12)", 2009.
[TS23.002] 3GPP, "3GPP TS 23.002 v13.3.0: Network architecture", 2015-09.

Author's Address

Stephane Proust (editor)
Orange
2, avenue Pierre Marzin
Lannion, 22307
France

EMail: stephane.proust@orange.com