Internet Draft                                            Juin-Hwey Chen
draft-chen-rtp-bv-00.txt                                 Cheng-Chieh Lee
June 18, 2003                                                 Winnie Lee
Expires: December 18, 2003                                   Jes Thyssen
                                                    Broadcom Corporation


            RTP Payload Format for BroadVoice Speech Codecs


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes the RTP payload format for the
   BroadVoice(TM) narrowband and wideband speech codecs developed by
   Broadcom Corporation. The document also provides specifications for
   the use of BroadVoice with MIME and SDP.

Table of Contents

   1. Introduction
   2. Background
   3. RTP Payload Format for BroadVoice16 Narrowband Codec
      3.1 BroadVoice16 Bit Stream Definition
      3.2 Multiple BroadVoice16 Frames in An RTP Packet
   4. RTP Payload Format for BroadVoice32 Wideband Codec
      4.1 BroadVoice32 Bit Stream Definition
      4.2 Multiple BroadVoice32 Frames in An RTP Packet
   5. Storage Format
   6. IANA Considerations
      6.1 MIME registration of BroadVoice16
      6.2 MIME registration of BroadVoice32

Chen et al.                                                     [Page 1]

INTERNET DRAFT        RTP Payload format for BroadVoice        June 2003

   7. Mapping To SDP Parameters
   8. Security Considerations
   9. References
   10. Authors' Addresses

1. Introduction

   This document specifies the payload format for sending
   BroadVoice-encoded speech or audio signals using the Real-time
   Transport Protocol (RTP) [1]. The sender may send one or more
   BroadVoice codec data frames per packet, depending on the
   application scenario and on network conditions, bandwidth
   availability, delay requirements, and packet-loss tolerance.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

2. Background

   BroadVoice [3] is a speech codec family developed by Broadcom for
   VoIP applications, including Voice over Cable, Voice over DSL, and
   IP phone applications. BroadVoice achieves high speech quality with
   a low coding delay and relatively low codec complexity.

   The BroadVoice codec family contains two codec versions. The
   narrowband version of BroadVoice, called BroadVoice16, or BV16 for
   short, encodes 8 kHz-sampled narrowband speech at a bit rate of 16
   kilobits per second (16 kbit/s). The wideband version of
   BroadVoice, called BroadVoice32, or BV32, encodes 16 kHz-sampled
   wideband speech at a bit rate of 32 kbit/s. The BV16 and BV32 use
   very similar (but not identical) coding algorithms; they share most
   of their algorithm modules.

   To minimize the delay in real-time two-way communications, both the
   BV16 and BV32 encode speech with a very small frame size of 5 ms
   without using any lookahead. This allows VoIP systems based on
   BroadVoice to have a very low end-to-end system delay, by using a
   packet size as small as 5 ms if necessary.

   BroadVoice also has relatively low codec complexity when compared
   with ITU-T standard speech codecs based on CELP (Code-Excited
   Linear Prediction), such as G.728, G.729, G.723.1, and G.722.2.
   Full-duplex implementations of the BV16 and BV32 take around 12 and
   17 MIPS, respectively, on general-purpose 16-bit fixed-point DSP
   chips. The total memory footprints of the BV16 and BV32, including
   program size, data tables, and data RAM, are around 12 kwords, or 24
   kbytes.

   Cable Television Laboratories (CableLabs(R)) intends to adopt
   BroadVoice16 as a PacketCable(TM) audio codec standard for VoIP over
   Cable applications.

3. RTP Payload Format for BroadVoice16 Narrowband Codec

   The BroadVoice16 uses 5 ms frames and a sampling frequency of 8 kHz,
   so the RTP timestamp MUST be in units of 1/8000 of a second. The
   RTP payload for the BroadVoice16 has the format shown in the figure
   below. No additional header specific to this payload format is
   required.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        RTP Header [1]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |              one or more frames of BroadVoice16               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   When more than one codec data frame is present in a single RTP
   packet, the timestamp is, as always, that of the oldest data frame
   represented in the RTP packet.

   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile for a particular class of
   applications will assign a payload type for this encoding, or, if
   that is not done, then a payload type in the dynamic range shall be
   chosen.

3.1 BroadVoice16 Bit Stream Definition

   The BroadVoice16 encoder operates on speech frames of 5 ms
   corresponding to 40 samples at a sampling rate of 8000 samples per
   second. For every 5 ms frame, the encoder encodes the 40
   consecutive audio samples into 80 bits, or 10 octets. Thus, the
   80-bit bit stream produced by the BroadVoice16 for each 5 ms frame
   is octet-aligned, and no padding bits are required. The bit
   allocation for the encoded parameters of the BroadVoice16 codec is
   listed in the following table.

   Encoded Parameter        Codeword       Number of bits per frame
   ------------------------------------------------------------
   Line Spectrum Pairs      L0,L1          7+7=14
   Pitch Lag                PL             7
   Pitch Gain               PG             5
   Log-Gain                 LG             4
   Excitation Vectors       V0,...,V9      5*10=50
   ------------------------------------------------------------
   Total:                                  80 bits

   The mapping of the encoded parameters in an 80-bit BroadVoice16 data
   frame is defined in the following figure. This figure shows the bit
   packing in "network byte order", also known as big-endian order.
   The bits of each 32-bit word are numbered 0 to 31, with the most
   significant bit on the left and numbered 0. The octets (bytes) of
   each word are transmitted most significant octet first. The bits of
   the data field for each encoded parameter are numbered in the same
   order, with the most significant bit on the left.
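
   As an informal illustration (not part of this specification), the
   bit-order rules above can be applied to the field widths in the
   table to unpack one 10-octet frame. The sketch below is Python; the
   names BV16_FIELDS and unpack_bv16_frame are hypothetical, and the
   field order L0, L1, PL, PG, LG, V0..V9 is taken from Figure 1.

```python
# Hypothetical helper for illustration only; not defined by this document.
# (codeword, width in bits), in transmission order per Figure 1.
BV16_FIELDS = [("L0", 7), ("L1", 7), ("PL", 7), ("PG", 5), ("LG", 4)] + [
    (f"V{i}", 5) for i in range(10)
]


def unpack_bv16_frame(frame: bytes) -> dict:
    """Split one 80-bit BroadVoice16 frame into its codeword indices."""
    if len(frame) != 10:
        raise ValueError("a BroadVoice16 frame is exactly 10 octets")
    # Network byte order: the first octet holds the most significant bits.
    bits = int.from_bytes(frame, "big")
    remaining = 80
    fields = {}
    for name, width in BV16_FIELDS:
        remaining -= width
        # Each field is stored most significant bit first.
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    return fields
```

   Reading the frame as a single 80-bit big-endian integer means that
   codewords straddling an octet boundary, such as V0 and V6, need no
   special handling.
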

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     L0      |     L1      |     PL      |   PG    |  LG   | V0|
   |             |             |             |         |       |   |
   |0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V0  |   V1    |   V2    |   V3    |   V4    |   V5    |  V6   |
   |     |         |         |         |         |         |       |
   |2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|   V7    |   V8    |   V9    |
   |6|         |         |         |
   |4|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 1: BroadVoice16 bit packing

3.2 Multiple BroadVoice16 Frames in An RTP Packet

   More than one BroadVoice16 frame may be included in a single RTP
   packet by a sender. Senders have the following additional
   restrictions:

   o SHOULD NOT include more BroadVoice16 frames in a single RTP packet
     than will fit in the MTU of the RTP transport protocol.

   o MUST NOT split a BroadVoice16 frame between RTP packets.

   It is RECOMMENDED that the number of frames contained within an RTP
   packet be consistent with the application. For example, in a
   telephony application where delay is important, the fewer frames per
   packet the lower the delay, whereas for a delay-insensitive
   streaming or messaging application, many frames per packet would be
   acceptable.

   Information describing the number of frames contained in an RTP
   packet is not transmitted as part of the RTP payload. The only way
   to determine the number of BroadVoice16 frames is to count the total
   number of octets within the RTP packet, and divide the octet count
   by 10.

4. RTP Payload Format for BroadVoice32 Wideband Codec

   The BroadVoice32 uses 5 ms frames and a sampling frequency of 16
   kHz, so the RTP timestamp MUST be in units of 1/16000 of a second.
   The RTP payload for the BroadVoice32 has the format shown in the
   figure below.
   No additional header specific to this payload format is required.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        RTP Header [1]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |              one or more frames of BroadVoice32               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   When more than one codec data frame is present in a single RTP
   packet, the timestamp is, as always, that of the oldest data frame
   represented in the RTP packet.

   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile for a particular class of
   applications will assign a payload type for this encoding, or if
   that is not done then a payload type in the dynamic range shall be
   chosen.

4.1 BroadVoice32 Bit Stream Definition

   The BroadVoice32 encoder operates on speech frames of 5 ms
   corresponding to 80 samples at a sampling rate of 16000 samples per
   second. For every 5 ms frame, the encoder encodes the 80
   consecutive audio samples into 160 bits, or 20 octets. Thus, the
   160-bit bit stream produced by the BroadVoice32 for each 5 ms frame
   is octet-aligned, and no padding bits are required. The bit
   allocation for the encoded parameters of the BroadVoice32 codec is
   listed in the following table.

                                                     Number of bits
   Encoded Parameter                   Codeword      per frame
   ---------------------------------------------------------------
   Line Spectrum Pairs                 L0,L1,L2      7+5+5=17
   Pitch Lag                           PL            8
   Pitch Gain                          PG            5
   Log-Gains (1st & 2nd subframes)     LG0,LG1       5+5=10
   Excitation Vectors (1st subframe)   VA0,...,VA9   6*10=60
   Excitation Vectors (2nd subframe)   VB0,...,VB9   6*10=60
   ---------------------------------------------------------------
   Total:                                            160 bits

   The mapping of the encoded parameters in a 160-bit BroadVoice32 data
   frame is defined in the following figure. This figure shows the bit
   packing in "network byte order", also known as big-endian order.
   The bits of each 32-bit word are numbered 0 to 31, with the most
   significant bit on the left and numbered 0. The octets (bytes) of
   each word are transmitted most significant octet first. The bits of
   the data field for each encoded parameter are numbered in the same
   order, with the most significant bit on the left.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     L0      |   L1    |   L2    |      PL       |   PG    |LG0|
   |             |         |         |               |         |   |
   |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | LG0 |   LG1   |    VA0    |    VA1    |    VA2    |    VA3    |
   |     |         |           |           |           |           |
   |2 3 4|0 1 2 3 4|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    VA4    |    VA5    |    VA6    |    VA7    |    VA8    |VA9|
   |           |           |           |           |           |   |
   |0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  VA9  |    VB0    |    VB1    |    VB2    |    VB3    |  VB4  |
   |       |           |           |           |           |       |
   |2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |VB4|    VB5    |    VB6    |    VB7    |    VB8    |    VB9    |
   |   |           |           |           |           |           |
   |4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|0 1 2 3 4 5|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2: BroadVoice32 bit packing

4.2 Multiple BroadVoice32 Frames in An RTP Packet

   More than one BroadVoice32 frame may be included in a single RTP
   packet by a sender. Senders have the following additional
   restrictions:

   o SHOULD NOT include more BroadVoice32 frames in a single RTP packet
     than will fit in the MTU of the RTP transport protocol.

   o MUST NOT split a BroadVoice32 frame between RTP packets.

   It is RECOMMENDED that the number of frames contained within an RTP
   packet be consistent with the application. For example, in a
   telephony application where delay is important, the fewer frames per
   packet the lower the delay, whereas for a delay-insensitive
   streaming or messaging application, many frames per packet would be
   acceptable.

   Information describing the number of frames contained in an RTP
   packet is not transmitted as part of the RTP payload. The only way
   to determine the number of BroadVoice32 frames is to count the total
   number of octets within the RTP packet, and divide the octet count
   by 20.

5. Storage Format

   The storage format is used for storing speech frames, e.g., as a
   file or e-mail attachment. The file begins with a header that
   includes only a magic number to identify the codec that is used.

   The magic number for the BroadVoice16 narrowband codec MUST
   correspond to the ASCII character string "#!BV16\n", or "0x23 0x21
   0x42 0x56 0x31 0x36 0x0A" in hexadecimal format.

   The magic number for the BroadVoice32 wideband codec MUST correspond
   to the ASCII character string "#!BV32\n", or "0x23 0x21 0x42 0x56
   0x33 0x32 0x0A".

   A file contains the encoded bit stream of either BroadVoice16 or
   BroadVoice32, but not both. After the header that contains the
   magic number identifying the codec used, the encoded codec data
   frames are stored in sequential order, as shown below.

   +--------+---------------+---------------+-----+---------------+
   | Header | Codec frame 1 | Codec frame 2 | ... | Codec frame N |
   +--------+---------------+---------------+-----+---------------+

6. IANA Considerations

   Two new MIME sub-types as described in this section are to be
   registered.

   The MIME names for the BV16 and BV32 codecs are to be allocated from
   the IETF tree since these two codecs are expected to be widely used
   for Voice-over-IP applications, especially in Voice over Cable
   applications.

6.1 MIME registration of BroadVoice16

   MIME media type name: audio

   MIME media subtype name: BV16

   Required parameters: none

   Optional parameters:
      The following parameters apply to RTP transfer only.

      ptime: Defined as usual for RTP audio (see RFC 2327).

      maxptime: The maximum amount of media which can be encapsulated
         in each packet, expressed as time in milliseconds. The time
         SHALL be calculated as the sum of the time the media present
         in the packet represents. The time SHOULD be a multiple of
         the duration of a single codec data frame (5 ms). If not
         signaled, the default maxptime value SHALL be 200
         milliseconds.

   Encoding considerations:
      This type is defined for transfer of BV16-encoded data via RTP
      using the payload format specified in Section 3 of RFC xxxx. It
      is also defined for other transfer methods using the storage
      format specified in Section 5 of RFC xxxx. Audio data is binary
      data, and must be encoded for non-binary transport; the Base64
      encoding is suitable for Email.

   Security considerations:
      See Section 8 "Security Considerations" of RFC xxxx.

   Public specification:
      The BroadVoice16 codec will be specified in a CableLabs
      PacketCable standard document.

   Additional information:
      The following information applies to storage format only.

      Magic number: ASCII character string "#!BV16\n"
         (or "0x23 0x21 0x42 0x56 0x31 0x36 0x0A" in hexadecimal)
      File extensions: bvn, BVN (stands for "BroadVoice, Narrowband")
      Macintosh file type code: none
      Object identifier or OID: none

   Intended usage:
      COMMON. It is expected that many VoIP applications, especially
      Voice over Cable applications, will use this type.

   Person & email address to contact for further information:
      Juin-Hwey (Raymond) Chen
      rchen@broadcom.com

   Author/Change controller:
      Author: Juin-Hwey (Raymond) Chen, rchen@broadcom.com
      Change Controller: IETF Audio/Video Transport Working Group

6.2 MIME registration of BroadVoice32

   MIME media type name: audio

   MIME media subtype name: BV32

   Required parameters: none

   Optional parameters:
      The following parameters apply to RTP transfer only.

      ptime: Defined as usual for RTP audio (see RFC 2327).

      maxptime: The maximum amount of media which can be encapsulated
         in each packet, expressed as time in milliseconds. The time
         SHALL be calculated as the sum of the time the media present
         in the packet represents. The time SHOULD be a multiple of
         the duration of a single codec data frame (5 ms). If not
         signaled, the default maxptime value SHALL be 200
         milliseconds.

   Encoding considerations:
      This type is defined for transfer of BV32-encoded data via RTP
      using the payload format specified in Section 4 of RFC xxxx. It
      is also defined for other transfer methods using the storage
      format specified in Section 5 of RFC xxxx. Audio data is binary
      data, and must be encoded for non-binary transport; the Base64
      encoding is suitable for Email.

   Security considerations:
      See Section 8 "Security Considerations" of RFC xxxx.

   Additional information:
      The following information applies to storage format only.

      Magic number: ASCII character string "#!BV32\n"
         (or "0x23 0x21 0x42 0x56 0x33 0x32 0x0A" in hexadecimal)
      File extensions: bvw, BVW (stands for "BroadVoice, Wideband")
      Macintosh file type code: none
      Object identifier or OID: none

   Intended usage:
      COMMON. It is expected that many VoIP applications, especially
      Voice over Cable applications, will use this type.

   Person & email address to contact for further information:
      Juin-Hwey (Raymond) Chen
      rchen@broadcom.com

   Author/Change controller:
      Author: Juin-Hwey (Raymond) Chen, rchen@broadcom.com
      Change Controller: IETF Audio/Video Transport Working Group

7. Mapping To SDP Parameters

   Parameters are mapped to SDP [4] in a standard way. When conveying
   information by SDP, the encoding name SHALL be "BV16" for the
   BroadVoice16 narrowband codec and "BV32" for the BroadVoice32
   wideband codec (the same as the MIME media subtype names).

   An example of the media representation in SDP for describing BV16
   might be:

      m=audio 49120 RTP/AVP 97
      a=rtpmap:97 BV16/8000

   An example of the media representation in SDP for describing BV32
   might be:

      m=audio 49122 RTP/AVP 99
      a=rtpmap:99 BV32/16000

8. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1] and any appropriate profile (for example, [5]).
   This implies that confidentiality of the media streams is achieved
   by encryption. Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed after
   compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encoding using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver
   to become overloaded.
   However, the encodings covered in this document do not exhibit any
   significant non-uniformity.

9. References

   [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
       A Transport Protocol for Real-Time Applications", IETF RFC 1889,
       January 1996.

   [2] S. Bradner, "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [3] PacketCable(TM) Audio/Video Codecs Specification, Cable
       Television Laboratories, Inc.

   [4] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
       IETF RFC 2327, April 1998.

   [5] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
       with Minimal Control", IETF RFC 1890, January 1996.

10. Authors' Addresses

   Juin-Hwey (Raymond) Chen
   Broadcom Corporation
   Room A3032
   16215 Alton Parkway
   Irvine, CA 92618
   USA
   Phone: +1 949-585-6288
   Email: rchen@broadcom.com

   Cheng-Chieh Lee
   Broadcom Corporation
   Room A3086
   16215 Alton Parkway
   Irvine, CA 92618
   USA
   Phone: +1 949-585-6467
   Email: cclee@broadcom.com

   Winnie Lee
   Broadcom Corporation
   Room A2012E
   200-13711 International Place
   Richmond, British Columbia V6V 2Z8
   Canada
   Phone: +1 604-233-8605
   Email: wlee@broadcom.com

   Jes Thyssen
   Broadcom Corporation
   Room A3053
   16215 Alton Parkway
   Irvine, CA 92618
   USA
   Phone: +1 949-585-5768
   Email: jthyssen@broadcom.com
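
   Informative note (not part of this specification): the
   frame-counting rule of Sections 3.2 and 4.2 and the storage layout
   of Section 5 can be sketched in a few lines of Python. The names
   FRAME_OCTETS, MAGIC, split_frames, and read_storage are hypothetical.
   Rejecting a payload whose length is not a whole number of frames is
   an assumption of this sketch; this document does not say how a
   receiver should treat such a payload.

```python
# Hypothetical helpers for illustration only; not defined by this document.
FRAME_OCTETS = {"BV16": 10, "BV32": 20}             # Sections 3.1 and 4.1
MAGIC = {b"#!BV16\n": "BV16", b"#!BV32\n": "BV32"}  # Section 5


def split_frames(codec: str, payload: bytes) -> list:
    """Split an RTP payload (or file body) into fixed-size codec frames."""
    size = FRAME_OCTETS[codec]
    # The frame count is not signaled; it is the octet count divided by
    # the frame size (10 for BV16, 20 for BV32).
    count, leftover = divmod(len(payload), size)
    if leftover:
        # Assumption: a trailing partial frame is treated as an error.
        raise ValueError("payload is not a whole number of frames")
    return [payload[i * size:(i + 1) * size] for i in range(count)]


def read_storage(data: bytes) -> tuple:
    """Identify the codec from the magic number, then split the frames."""
    for magic, codec in MAGIC.items():
        if data.startswith(magic):
            return codec, split_frames(codec, data[len(magic):])
    raise ValueError("unknown magic number")
```

   With 5 ms per frame, a 30-octet BV16 payload splits into three
   frames and therefore carries 15 ms of speech.
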