Internet Engineering Task Force                                    Sassan Ahmadi
Audio Video Transport WG                                              Nokia Inc. 
INTERNET-DRAFT                                                  November 1, 2003                                      
Expires: June 1, 2004


  Real-Time Transport Protocol (RTP) Payload and File Storage Formats for the 
            Variable-Rate Multimode Wideband (VMR-WB) Audio Codec
                        <draft-ahmadi-avt-rtp-vmr-wb-00.txt>


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   This document specifies a Real-time Transport Protocol (RTP)
   payload format for the Variable-Rate Multimode Wideband (VMR-WB)
   speech codec. The payload format is designed to interoperate with
   existing VMR-WB transport formats on non-IP networks. In addition,
   a file format is specified for transport of VMR-WB speech data in
   storage mode applications such as email. A MIME type registration
   for VMR-WB is included, covering both the RTP payload format and
   the storage format.

   VMR-WB is a variable-rate multimode wideband speech codec that has a
   number of operating modes, one of which is fully interoperable with
   the AMR-WB (G.722.2) audio codec. Provisions have therefore been made
   in this draft to simplify data packet exchange between VMR-WB and
   AMR-WB (i.e., RFC 3267) in the interoperable mode, using a minimal
   logic interworking function in the transport layer (i.e., an RTP
   translator).

Sassan Ahmadi                                                        [page 1]
INTERNET-DRAFT    VMR-WB RTP Payload and File Storage Formats   November 2003

Table of Contents

   1. Introduction.................................................3
   2. Conventions and Acronyms.....................................3
   3. Background on the Adaptive Multi-Rate Wideband (AMR-WB) Speech  
      Codec........................................................4
   4. The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec...4
     4.1. Narrowband Speech Processing.............................6
     4.2. Continuous vs. Discontinuous Transmission................6
   5. Support for Multi-Channel Session............................7
   6. Robustness against Packet Loss...............................7
     6.1. Forward Error Correction (FEC)...........................8
     6.2. Frame Interleaving and Multi-Frame Encapsulation.........8
   7. Bandwidth Efficient or Octet-aligned Mode....................9
   8. VMR-WB Voice over IP scenarios..............................11
   9. VMR-WB RTP Payload Format...................................11
     9.1. RTP Header Usage........................................12
     9.2. Payload Structure.......................................12
     9.3. Bandwidth-Efficient Mode................................12
       9.3.1. The Payload Header..................................14
       9.3.2. The Payload Table of Contents.......................16
       9.3.3. Speech Data.........................................16
       9.3.4. Algorithm for Forming the Payload...................16
       9.3.5. Payload Examples....................................17
        9.3.5.1. Single Channel Payload Carrying a Single Frame...17
        9.3.5.2. Single Channel Payload Carrying Multiple Frames..18
        9.3.5.3. Multi-Channel Payload Carrying Multiple Frames...19
     9.4. Octet-aligned Mode......................................19
       9.4.1. The Payload Header..................................20
       9.4.2. The Payload Table of Contents.......................22
       9.4.3. Speech Data.........................................22
       9.4.4. Methods for Forming the Payload.....................22
       9.4.5. Payload Example.....................................22
        9.4.5.1. Basic Single Channel Payload Carrying 
                 Multiple Frames..................................22
     9.5. Implementation Considerations...........................23
   10. VMR-WB Storage Format......................................23
     10.1. Single Channel Header..................................24
     10.2. Multi-channel Header...................................25
     10.3. Speech Frames..........................................26
   11. Congestion Control (Network-Controlled Mode Switching).....26
   12. Security Considerations....................................26
     12.1. Confidentiality........................................27
     12.2. Authentication.........................................27
     12.3. Decoding Validation and Provision for Lost or Late  
           Packets................................................28
   13. Payload Format Parameters..................................28
     13.1. VMR-WB MIME Registration...............................28
     13.2. Mapping MIME Parameters into SDP.......................31
   14. IANA Considerations........................................32
   15. Acknowledgements...........................................32
   Appendix A  VMR-WB Frame Structure.............................33
   Appendix B  Interworking Function (IWF) for Interoperable AMR-WB 
               <-> VMR-WB Interconnections........................35
   References.....................................................38
   Normative References...........................................38
   Informative References.........................................39
   Author's Address...............................................39
   Full Copyright Statement.......................................40


1. Introduction

   This document specifies the payload format for packetization of 
   VMR-WB encoded speech signals into the Real-time Transport
   Protocol (RTP) [5]. The payload format supports transmission of
   single and multiple channels, multiple frames per payload, the use of 
   seamless mode switching, and interoperation with existing VMR-WB transport 
   formats on non-IP networks, as described in Section 4.

   The payload format itself is specified in Section 9. A related file
   format is specified in Section 10 for transport of VMR-WB speech data in 
   storage mode applications such as email. In Section 13, a MIME type 
   registration for VMR-WB is provided.

   Since VMR-WB is interoperable with AMR-WB, and since IP-based
   interconnection is in practice the most efficient way to connect the
   two codecs, this draft stays as close as possible to RFC 3267 while
   optimizing the payload for the VMR-WB codec itself.
   

2. Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [3].

   The following acronyms are used in this document:

    3GPP   - the Third Generation Partnership Project
    3GPP2  - the Third Generation Partnership Project 2
    CDMA   - Code Division Multiple Access
    WCDMA  - Wideband Code Division Multiple Access
    GSM    - Global System for Mobile Communications
    AMR-WB - Adaptive Multi-Rate Wideband Codec
    VMR-WB - Variable-Rate Multimode Wideband Codec
    CMR    - Codec Mode Request
    CN     - Comfort Noise
    DTX    - Discontinuous Transmission
    FEC    - Forward Error Correction
    SID    - Silence Indicator (the frames containing only CN parameters) 
    VAD    - Voice Activity Detection
    IWF    - Interworking Function
    TrFO   - Transcoder-Free Operation
    UDP    - User Datagram Protocol
    RTP    - Real-time Transport Protocol
    MIME   - Multipurpose Internet Mail Extensions
    IF2    - Interface Format 2 (an AMR-WB frame structure type)
    SDP    - Session Description Protocol
    SIP    - Session Initiation Protocol

   The term "frame-block" is used in this document to describe the
   time-synchronized set of speech frames in a multi-channel VMR-WB session. 
   In particular, in an N-channel session, a frame-block will contain N 
   speech frames, one from each of the channels, and all N speech frames 
   represent exactly the same time period.


3. Background on the Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec

   The Adaptive Multi-Rate Wideband (AMR-WB) speech codec was developed by
   the 3rd Generation Partnership Project (3GPP) for multimedia services in
   3G GSM/WCDMA cellular systems [1,2,4,6]. It was later selected by ITU-T
   as Recommendation G.722.2.

   The AMR-WB codec is a multi-mode speech codec with Voice Activity 
   Detection and Discontinuous Transmission (VAD/DTX) capability. AMR-WB 
   supports 9 wideband speech coding modes with respective bit rates ranging    
   from 6.6 to 23.85 kbps. The input/output sampling frequency used in AMR-WB 
   is 16000 Hz and the speech processing is performed on 20 ms frames. This 
   means that each AMR-WB encoded frame represents 320 speech samples. 

   The multi-rate encoding (i.e., multi-mode) capability of AMR-WB is
   designed for preserving high speech quality under a wide range of
   transmission conditions. That is, the AMR-WB codec mode is adapted to
   the prevailing channel conditions by trading off the number of
   source-coding bits against the number of channel-coding bits.

   With AMR-WB, GSM mobile radio systems are able to use the available
   bandwidth as effectively as possible. For example, in GSM it is
   possible to dynamically adjust the speech encoding rate during a
   session so as to continuously adapt to varying transmission
   conditions. The fixed overall bandwidth is divided between speech
   data and error-protective coding to obtain the best possible
   trade-off between speech compression rate and error tolerance. To
   perform mode adaptation, the decoder (speech receiver) needs to
   signal to the encoder (speech sender) the new mode it prefers. This
   mode change signal is called the Codec Mode Request or CMR [4].
   
   Since in most sessions speech is sent in both directions between
   the two ends, the mode requests from the decoder at one end to the
   encoder at the other end are piggy-backed over the speech frames in
   the reverse direction. In other words, there is no out-of-band
   signaling needed for sending CMRs. 


4. The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec

   VMR-WB is the wideband speech-coding standard developed by the Third
   Generation Partnership Project 2 (3GPP2) for multimedia services in 3G
   CDMA cellular systems. Unlike AMR-WB, VMR-WB is a source-controlled 
   variable-rate multimode wideband speech codec. It has a number of 
   operating modes, where each mode is a tradeoff between voice quality and
   system capacity. Each mode therefore has an associated quality level and
   average data rate (ADR). Note that the concept of mode in VMR-WB is
   different from that of AMR-WB. The operating mode in VMR-WB is chosen 
   based on the traffic condition of the network and the desired quality of 
   service [9,10,11]. The desired ADR in each mode is obtained by encoding 
   speech frames at different rates available in CDMA Rate-Set II 
   depending on the characteristics of input speech and the maximum and minimum 
   rate constraints imposed by the network operator.
 
   While VMR-WB is a native CDMA codec complying with all CDMA system
   requirements, it is also interoperable with AMR-WB in one of its
   operating modes, because VMR-WB and AMR-WB share the same core
   technology. This feature enables Transcoder-Free Operation (TrFO)
   interconnections between VMR-WB and AMR-WB across different
   wireless/wireline systems (e.g., GSM/WCDMA and CDMA2000) without
   unnecessarily complex media format conversion. Because the GSM/WCDMA
   and CDMA2000 signaling protocols are incompatible, a complete
   interoperable interconnection between VMR-WB and AMR-WB is accomplished
   through a minimal logic Interworking Function (IWF) that resides in the
   transport layer of one of the gateways between the two incompatible
   terminals (see Appendix B for more details).

   The current implementation of VMR-WB is compliant with CDMA Rate-Set II 
   operation (i.e., Multiplex Option 2 [12,13]) and supports interoperability    
   with AMR-WB at 12.65 kbps (i.e., AMR-WB mode 2). However, the current 
   document has been drafted to accommodate future design extensions to VMR-WB 
   including other AMR-WB codec modes (i.e., AMR-WB modes 0 and 1).

   VMR-WB is able to transition between various modes with no degradation in  
   voice quality that is attributable to the mode switching itself; i.e., 
   seamless mode switching. The operation mode of the VMR-WB encoder may be  
   switched seamlessly without prior knowledge of the decoder. All modes (i.e., 
   mode 0, 1, 2, and 3) can be chosen depending on the traffic conditions (i.e., 
   congestion) and the desired quality of service. 

   While in the interoperable mode, mode switching is not allowed. There is
   only one AMR-WB interoperable mode in VMR-WB. However, the AMR-WB codec
   may request a mode change depending on channel conditions. During an
   interoperable interconnection, the in-band data included in the VMR-WB
   frame structure, as shown in Appendix A, is therefore used to switch
   between AMR-WB codec modes 0, 1, or 2.

   As mentioned earlier, VMR-WB is compliant with CDMA Multiplex Option 2 [13] 
   with the permissible encoding rates shown in Table 1.

   The CDMA system requires codecs to generate speech frames compliant with
   the encoding rates in Table 1 while operating in Rate-Set II [12]. Also,
   under certain conditions in a CDMA system, such as blank-and-burst or
   dim-and-burst signaling, all or part of the primary traffic is used by
   the system for signaling, so the codecs are forced to use lower bit
   rates for encoding the speech data [12].

+-----------------+---------------------------------+-------------------------+
|    Frame Type   |   Bits per Packet (Frame Size)  |   Encoding Rate (kbps)  |
+-----------------+---------------------------------+-------------------------+
|   Full-Rate     |             266                 |          13.3           |   
|   Half-Rate     |             124                 |           6.2           |
|   Quarter-Rate  |              54                 |           2.7           |
|   Eighth-Rate   |              20                 |           1.0           |
|   Blank         |               0                 |            -            |
|   Erasure       |               0                 |            -            |
+-----------------+---------------------------------+-------------------------+
Table 1: CDMA Rate-Set II frame types and their associated encoding rates
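
The encoding rate of each frame type follows from its frame size and the 20 ms frame duration: bits per frame divided by the frame duration in milliseconds gives the rate in kbps (e.g., 266 bits / 20 ms = 13.3 kbps). The following Python sketch illustrates that arithmetic only; the names RATE_SET_II_BITS and encoding_rate_kbps are hypothetical and are not defined by this document.

```python
# Illustrative sketch only; all names here are assumptions, not part of VMR-WB.
FRAME_DURATION_MS = 20  # each VMR-WB speech frame covers 20 ms

# Bits per packet (frame size) for each CDMA Rate-Set II frame type.
RATE_SET_II_BITS = {
    "Full-Rate": 266,
    "Half-Rate": 124,
    "Quarter-Rate": 54,
    "Eighth-Rate": 20,
    "Blank": 0,
    "Erasure": 0,
}


def encoding_rate_kbps(frame_type: str) -> float:
    """Return the encoding rate in kbps (bits per frame / frame duration in ms)."""
    return RATE_SET_II_BITS[frame_type] / FRAME_DURATION_MS
```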

VMR-WB is robust to a high percentage of packet loss and to packets with corrupted rate information. The reception of Blank or Erasure frame types at the decoder invokes the built-in frame error concealment mechanism, which conceals the effect of lost packets by exploiting in-band data and the data in the previous frames. The built-in noise pre-processing module in VMR-WB considerably improves the performance under severe background noise conditions.

The VMR-WB codec further has the capability to detect and conceal frames with corrupted rate information. The frames with erroneous rate information MAY be passed to the decoder by the CDMA Multiplex sublayer in the receiving side.

4.1. Narrowband Speech Processing

VMR-WB has the capability to operate with 8000 Hz sampled input/output speech signals in all modes of operation [9,10]. Mode switching MAY be utilized to change the mode of operation while processing narrowband speech signals. However, during a session, transition between narrowband and wideband processing is not allowed due to different timestamps and other likely synchronization problems.

4.2. Continuous vs. Discontinuous Transmission

The circuit-switched operation of VMR-WB within a CDMA network requires continuous transmission of speech data throughout a conversation once a voice service option is initiated [12,13]. In addition, the intrinsic source-controlled variable-rate feature of the CDMA speech codecs is REQUIRED for optimal operation of the CDMA system and for interference control. However, VMR-WB can also operate in a discontinuous transmission mode for some packet-switched applications over IP networks, where the number of transmitted bits and packets during silence periods is reduced to a minimum. The VMR-WB DTX operation is similar to that of AMR-WB [4].

5. Support for Multi-Channel Session

Both the RTP payload format and the storage format defined in this document support multi-channel audio content (e.g., a stereophonic speech session).
      
Although the VMR-WB codec itself does not support encoding of multi-channel audio content into a single bit stream, it can be used to separately encode and decode each of the individual channels.

To transport (or store) the separately encoded multi-channel content, the speech frames for all channels that are framed and encoded for the same 20 ms periods are logically collected in a frame-block.
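
The grouping of per-channel frames into frame-blocks can be sketched as follows. This is a minimal illustration in Python, assuming each channel has already been encoded into a list of frames, one per 20 ms period; the function name build_frame_blocks and the list-of-lists representation are hypothetical and are not defined by this document.

```python
from typing import List


def build_frame_blocks(channels: List[List[bytes]]) -> List[List[bytes]]:
    """Group per-channel frame sequences into frame-blocks.

    channels[c][t] is the encoded frame of channel c for 20 ms period t.
    The result blocks[t] holds one frame per channel, in channel order,
    all covering the same 20 ms period.
    """
    if not channels:
        return []
    n_periods = len(channels[0])
    if any(len(ch) != n_periods for ch in channels):
        raise ValueError("all channels must cover the same 20 ms periods")
    return [[ch[t] for ch in channels] for t in range(n_periods)]
```

For a two-channel session, each frame-block therefore contains two frames, one per channel.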

At the session setup, out-of-band signaling must be used to indicate the number of channels in the session and the order of the speech frames from different channels in each frame-block. When using SDP for signaling, the number of channels is specified in the rtpmap attribute and the order of channels carried in each frame-block is implied by the number of channels as specified in Section 4.1 in [19].


6. Robustness against Packet Loss

The payload format supports several features, including forward error correction (FEC) and frame interleaving, in order to increase robustness against lost packets.


6.1. Forward Error Correction (FEC)

The simple scheme of repeating previously sent data is one way of achieving FEC. Another possible scheme, which is more bandwidth efficient, is to use payload-external FEC, e.g., RFC 2733 [19], which generates extra packets containing repair data. The FEC feature is included for further compatibility with the AMR-WB payload.

The repetition method involves the simple retransmission of previously transmitted frame-blocks together with the current frame-block(s). This is done by using a sliding window to group the speech frame-blocks to send in each payload. Figure 1 illustrates an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

              Figure 1: An example of redundant transmission.

In this example each frame-block is retransmitted one time in the following RTP payload packet.  Here, f(n-2)..f(n+4) denotes a sequence of speech frame-blocks and p(n-1)..p(n+4) a sequence of payload packets.
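
The sliding-window grouping of Figure 1 can be sketched as follows. This Python fragment is an illustration under stated assumptions, not a normative procedure: the function name redundant_payloads and the parameter copies (the number of earlier frame-blocks repeated in each payload) are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def redundant_payloads(frame_blocks: Sequence[T], copies: int = 1) -> List[List[T]]:
    """Build one payload per new frame-block using a sliding window.

    Each payload carries the new frame-block preceded by up to `copies`
    previously transmitted frame-blocks, oldest first.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    payloads = []
    for i in range(len(frame_blocks)):
        start = max(0, i - copies)  # the first payloads have fewer predecessors
        payloads.append(list(frame_blocks[start:i + 1]))
    return payloads
```

With copies set to 1, every payload after the first carries the previous frame-block followed by the new one, which matches the pattern of p(n) carrying f(n-1) and f(n).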

The use of this approach does not require signaling at the session setup. In other words, the speech sender can choose to use this scheme without consulting the receiver. This is because a packet containing redundant frames will not look different from a packet with only new frames. The receiver may receive multiple copies or versions of a frame for a certain timestamp if no packet is lost. If multiple versions of the same speech frame are received, it is RECOMMENDED that the version encoded at the highest rate be used by the speech decoder.
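
A receiver-side selection among duplicate frames might look like the following Python sketch. It assumes, for illustration only, that each received frame is represented as a (timestamp, bits_in_frame, data) tuple and that the frame size in bits is used as the measure of encoding rate; the function name select_frames and this representation are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def select_frames(received: Iterable[Tuple[int, int, bytes]]) -> Dict[int, bytes]:
    """Keep, for each timestamp, the version of the frame with the most bits.

    `received` yields (timestamp, bits_in_frame, data) tuples; a larger
    frame size is taken here as a proxy for a higher encoding rate.
    """
    best: Dict[int, Tuple[int, bytes]] = {}
    for timestamp, bits, data in received:
        if timestamp not in best or bits > best[timestamp][0]:
            best[timestamp] = (bits, data)
    return {timestamp: data for timestamp, (_bits, data) in best.items()}
```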

This redundancy scheme provides the same functionality as the one described in RFC 2198 "RTP Payload for Redundant Audio Data" [19]. In most cases the mechanism in this payload format is more efficient and simpler than requiring both endpoints to support RFC 2198 in addition. If the spread in time required between the primary and redundant encodings is larger than 5 frame times, the bandwidth overhead of RFC 2198 will be lower.

The sender is responsible for selecting an appropriate amount of redundancy based on feedback about the channel (e.g., in RTCP receiver reports) or on network traffic. A sender should not base selection of FEC on the CMR, as this parameter was most probably set based on non-IP information. The sender is also responsible for avoiding congestion, which may be aggravated by redundant transmission.


6.2. Frame Interleaving and Multi-Frame Encapsulation

To decrease protocol overhead, the payload design allows several speech frame-blocks to be encapsulated into a single RTP packet. One drawback of this approach is that the loss of a packet means the loss of several consecutive speech frame-blocks, which usually causes clearly audible distortion in the reconstructed speech. Interleaving of frame-blocks can improve the speech quality in such cases by distributing the consecutive losses into a series of single frame-block losses. However, interleaving and bundling several frame-blocks per payload will also increase end-to-end delay and is therefore not appropriate for all types of applications. Streaming applications will most likely be able to exploit interleaving to improve speech quality in lossy transmission conditions.
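
The effect of interleaving on loss patterns can be illustrated with a generic round-robin interleaver. The Python sketch below is not the interleaving rule of this payload format (which is defined in Section 9.4.1); it is a simplified example, and the names interleave and longest_loss_run are hypothetical.

```python
from typing import List


def interleave(num_frame_blocks: int, num_packets: int) -> List[List[int]]:
    """Assign frame-block indices 0..num_frame_blocks-1 round-robin to packets."""
    if num_packets < 1:
        raise ValueError("num_packets must be at least 1")
    return [list(range(k, num_frame_blocks, num_packets))
            for k in range(num_packets)]


def longest_loss_run(lost: List[int]) -> int:
    """Return the longest run of consecutive frame-block indices in `lost`."""
    longest = run = 0
    previous = None
    for index in sorted(lost):
        run = run + 1 if previous is not None and index == previous + 1 else 1
        longest = max(longest, run)
        previous = index
    return longest
```

In this sketch, losing any single interleaved packet removes only isolated frame-blocks, whereas losing a packet that carries two consecutive frame-blocks produces a loss run of length two.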

This payload design supports the use of frame interleaving as an option. For the encoder (speech sender) to use frame interleaving in its outbound RTP packets for a given session, the decoder (speech receiver) needs to indicate its support via out-of-band means (see Section 13). 


7. Bandwidth Efficient or Octet-aligned Mode

For a given session, the payload format can be either bandwidth efficient or octet aligned, depending on the mode of operation that is established for the session via out-of-band means.

In the octet-aligned format, all the fields in a payload, including payload header, table of contents entries, and speech frames themselves, are individually aligned to octet boundaries to make implementations efficient. In the bandwidth efficient format only the full payload is octet aligned, so fewer padding bits are added.

Note, octet alignment of a field or payload means that the last octet is padded with zeroes in the least significant bits to fill the octet. Also note that this padding is separate from padding indicated by the P bit in the RTP header.
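
The padding rule can be sketched in Python as follows. The sketch assumes, for illustration, that a field is given as a string of '0' and '1' characters with the most significant bit first; the names padding_bits and pad_to_octets are hypothetical.

```python
def padding_bits(num_bits: int) -> int:
    """Return the number of zero bits needed to reach the next octet boundary."""
    return (-num_bits) % 8


def pad_to_octets(bits: str) -> bytes:
    """Pack a '0'/'1' string (MSB first) into octets.

    The last octet is padded with zeroes in its least significant bits.
    """
    padded = bits + "0" * padding_bits(len(bits))
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

For example, a 266-bit field needs 6 padding bits and occupies 34 octets once aligned.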

Of the two payload formats, only the octet-aligned mode can use interleaving to make the speech transport robust to packet loss.


8. VMR-WB Voice over IP Scenarios

The primary scenario for this payload format is IP end-to-end between two terminals incorporating the VMR-WB codec, as shown in Figure 2. This payload format is expected to be useful for both conversational and streaming services.
   
       +----------+                         +----------+
       |          |                         |          |
       | TERMINAL |<----------------------->| TERMINAL |
       |          |    RTP/UDP/IP/VMR-WB    |          |
       +----------+                         +----------+

   Figure 2: IP terminal to IP terminal scenario

A conversational service puts requirements on the payload format. Low delay is a very important factor, which means fewer speech frame-blocks per payload packet. Low overhead is also required when the payload format traverses low bandwidth links, especially when the packet rate is high.

Streaming service has less strict real-time requirements and therefore can use a larger number of frame-blocks per packet than conversational service. This reduces the overhead from IP, UDP, and RTP headers. However, including several frame-blocks per packet makes the transmission more vulnerable to packet loss, so interleaving may be used to reduce the effect of packet loss on speech quality. A streaming server handling a large number of clients also needs a payload format that requires as few resources as possible when doing packetization. The octet-aligned and interleaving modes require the least amount of resources, while bandwidth efficient mode is more demanding.
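
The reduction in header overhead from bundling can be estimated with the following Python sketch. It assumes an IPv4 header without options (20 octets), a UDP header (8 octets), and the fixed RTP header without CSRC entries (12 octets), and it ignores the payload header and table of contents for simplicity; the function name header_overhead is hypothetical.

```python
import math

IPV4_HEADER = 20  # octets, assuming no IP options
UDP_HEADER = 8    # octets
RTP_HEADER = 12   # octets, fixed header without CSRC entries


def header_overhead(frames_per_packet: int, bits_per_frame: int = 266) -> float:
    """Return the fraction of a packet taken by IP/UDP/RTP headers.

    Speech frames are counted as whole octets; the payload header and
    table of contents are deliberately ignored in this estimate.
    """
    if frames_per_packet < 1:
        raise ValueError("frames_per_packet must be at least 1")
    speech_octets = frames_per_packet * math.ceil(bits_per_frame / 8)
    headers = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return headers / (headers + speech_octets)
```

Under these assumptions, a packet carrying one Full-Rate frame spends 40 of 74 octets on headers, while a packet carrying five such frames spends 40 of 210.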

Another scenario occurs when VMR-WB encoded speech will be transmitted from a non-IP system (e.g., 3GPP2/CDMA2000 network) to an RTP/UDP/IP VoIP terminal, and/or vice versa, as depicted in Figure 3.

    VMR-WB over 
3GPP2/CDMA2000 network
                         +------+                        +----------+
                         |      |                        |          |
    <------------------->|  GW  |<---------------------->| TERMINAL |
                         |      |   RTP/UDP/IP/VMR-WB    |          |
                         +------+                        +----------+
                             |
                             |           IP network
                             |

   Figure 3: GW to VoIP terminal scenario

VMR-WB's capability to seamlessly switch between modes is exploited in CDMA (non-IP) networks to optimize speech quality for a given traffic condition. To preserve this functionality in scenarios including a gateway to an IP network, a codec mode request (CMR) field is needed. The gateway will be responsible for forwarding the CMR between the non-IP and IP parts in both directions. The IP terminal should follow the CMR forwarded by the gateway to optimize speech quality going to the non-IP decoder. The mode control algorithm in the gateway SHOULD accommodate the delay imposed by the IP network on the response to CMR by the IP terminal.

The IP terminal should not set the CMR (see Section 9.3.1), but the gateway can set the CMR value on frames going toward the encoder in the non-IP part to optimize speech quality from that encoder to the gateway.  The gateway can alternatively set a lower CMR value, if desired, as one means to control congestion on the IP network.

A third likely scenario is that RTP/UDP/IP is used as transport between two non-IP systems, i.e., IP is originated and terminated in gateways on both sides of the IP transport, as illustrated in Figure 4. This is the most likely scenario for an interoperable interconnection between 3GPP/(GSM, WCDMA)/AMR-WB and 3GPP2/CDMA2000/VMR-WB.

    VMR-WB over                                              AMR-WB over
3GPP2/CDMA2000 network                                 3GPP/(GSM, WCDMA) network

                      +------+                     +------+
                      |  GW  |  RTP/UDP/IP/AMR-WB  |      |
<-------------------->|------|<------------------->|  GW  |<------------------->
                      | IWF  |                     |      |
                      +------+                     +------+
                          |                           |
                          |         IP network        |
                          |                           |

   Figure 4: GW to GW scenario (AMR-WB <-> VMR-WB interoperable interconnection)

The use of an Interworking Function (IWF) in the gateway immediately interfacing with the 3GPP2/CDMA2000 network is REQUIRED in this scenario. The IWF entity resides in the transport layer and is used for interoperable interconnections only. In addition, the CMR value may be set in packets received by the gateways on the IP network side. The gateway should forward to the non-IP side a CMR value that is the minimum of two values: (1) the CMR value it receives on the IP side; and (2) a CMR value it may choose for congestion control of transmission on the IP side.
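
The selection of the forwarded CMR value can be sketched as below. This Python fragment assumes, for illustration only, that CMR values are represented as integers that can be compared directly and that the gateway's own congestion-control value is optional; the function name forwarded_cmr is hypothetical.

```python
from typing import Optional


def forwarded_cmr(cmr_from_ip: int, congestion_cmr: Optional[int] = None) -> int:
    """Return the CMR value a gateway forwards to the non-IP side.

    The result is the minimum of the CMR received on the IP side and the
    CMR the gateway may choose for congestion control, if it chose one.
    """
    if congestion_cmr is None:
        return cmr_from_ip
    return min(cmr_from_ip, congestion_cmr)
```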

The details of the traffic control algorithm are left to the implementation.

The fourth example VoIP scenario comprises an RTP/UDP/IP transport between two non-IP systems, i.e., IP is originated and terminated in gateways on both sides of the IP transport, as illustrated in Figure 5. This is the most likely scenario for Mobile Station-to-Mobile Station (MS-to-MS) Transcoder-Free (TrFO) interconnection between two 3GPP2/CDMA2000 terminals that both use the VMR-WB codec.


    VMR-WB over                                              VMR-WB over
3GPP2/CDMA2000 network                                 3GPP2/CDMA2000 network

                      +------+                     +------+
                      |      |                     |      |
<-------------------->|  GW  |<------------------->|  GW  |<------------------->
                      |      |  RTP/UDP/IP/VMR-WB  |      |
                      +------+                     +------+
                          |                           |
                          |         IP network        |
                          |                           |

   Figure 5: GW to GW scenario (a CDMA2000 MS-to-MS voice over IP scenario) 

9. VMR-WB RTP Payload Format

The VMR-WB payload format is very similar to that of AMR-WB (i.e., RFC 3267). The payloads of the two codecs have nearly identical structures, which simplifies interoperable interconnections and consequently the interworking function. The only differences are in the non-interoperable interconnections, where no IWF is required. The payload format consists of the RTP header, payload header, and payload data.


9.1. RTP Header Usage

The format of the RTP header is specified in [5]. This payload format uses the fields of the header in a manner consistent with that specification.

The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first frame-block in the packet. The timestamp clock frequency is the same as the sampling frequency, so the timestamp unit is in samples.

The duration of one speech frame-block is 20 ms for VMR-WB. For normal wideband operation of VMR-WB, the input/output sampling frequency is 16 kHz, corresponding to 320 samples per frame from each channel. Thus, the timestamp is increased by 320 for VMR-WB for each consecutive frame-block.

For narrowband operation of VMR-WB, the input/output sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per frame from each channel. Thus, the timestamp is increased by 160 for VMR-WB for each consecutive frame-block while processing narrowband input/output speech signals. The choice of sampling frequency MUST be indicated at the beginning of a session (see Section 13). The default input/output sampling rate is 16 kHz. Note that the sampling rate SHALL NOT be changed during a session.
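
As an informal illustration (not part of the format definition), the timestamp arithmetic above can be sketched as follows. The function names are hypothetical, and the wrap-around assumes the 32-bit RTP timestamp field of [5].

```python
FRAME_BLOCK_MS = 20  # duration of one VMR-WB frame-block


def timestamp_increment(sampling_hz: int) -> int:
    """Samples per 20 ms frame-block for the session sampling frequency."""
    if sampling_hz not in (8000, 16000):
        raise ValueError("VMR-WB operates on 8 kHz or 16 kHz input/output")
    return sampling_hz * FRAME_BLOCK_MS // 1000


def next_timestamp(ts: int, sampling_hz: int, frame_blocks: int = 1) -> int:
    """Advance an RTP timestamp by a number of consecutive frame-blocks."""
    step = frame_blocks * timestamp_increment(sampling_hz)
    return (ts + step) % 2**32  # assumption: 32-bit timestamp wraps
```

Because the sampling rate is fixed for the whole session, the increment can be computed once at session setup and reused for every packet.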

A packet may contain multiple frame-blocks of encoded speech or comfort noise parameters. If interleaving is employed, the frame-blocks encapsulated into a payload are picked according to the interleaving rules as defined in Section 9.4.1. Otherwise, each packet covers a period of one or more contiguous 20 ms frame-block intervals. In case the data from all the channels for a particular frame-block in the period is missing, for example at a gateway from some other transport format, it is possible to indicate that no data is present for that frame-block rather than breaking a multi-frame-block packet into two, as explained in Section 9.3.2.

The payload is always made an integral number of octets long by padding with zero bits if necessary.  If additional padding is required to bring the payload length to a larger multiple of octets or for some other purpose, then the P bit in the RTP header MAY be set and padding appended as specified in [5].

The RTP header marker bit (M) SHALL be set to 1 if the first frame-block carried in the packet contains a speech frame that is the first in a talk spurt. For all other packets, the marker bit SHALL be set to zero (M=0).

The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile under which this payload format is being used will assign a payload type for this encoding or specify that the payload type is to be bound dynamically.


9.2. Payload Structure

The complete payload consists of a payload header, a payload table of contents, and speech data representing one or more speech frame-blocks. The following diagram shows the general payload format layout:

   +----------------+-------------------+----------------
   | payload header | table of contents | speech data ...
   +----------------+-------------------+----------------
  
Payloads containing more than one speech frame-block are called compound payloads.

The following sections describe the variations taken by the payload format depending on whether the VMR-WB session is set up to use the bandwidth-efficient mode or octet-aligned mode and any of the OPTIONAL functions such as interleaving. Implementations SHOULD support both bandwidth-efficient and octet-aligned operation to increase interoperability.


9.3. Bandwidth-Efficient Mode

9.3.1. The Payload Header

In bandwidth-efficient mode, the payload header simply consists of a 4 bit codec mode request:

    0 1 2 3
   +-+-+-+-+
   |  CMR  |
   +-+-+-+-+

CMR (4 bits): Indicates a codec mode request sent to the speech encoder at the site of the receiver of this payload, provided that the network allows the use of the requested mode. Therefore, the network MAY overwrite the mode request depending on the network conditions. Also, during a VMR-WB <-> AMR-WB interoperable interconnection, the operational mode of VMR-WB is set to "Rate-Set II mode 3".

The value of the CMR field is set according to Table 3:

+-----------+------------------------------------------------------------------+ 
|   CMR     |                      VMR-WB Codec Mode                           |
+-----------+------------------------------------------------------------------+
|     0     | Rate-Set II  mode 3 (AMR-WB interoperable mode at 6.6 kbps)      |
|     1     | Rate-Set II  mode 3 (AMR-WB interoperable mode at 8.85 kbps)     |
|     2     | Rate-Set II  mode 3 (AMR-WB interoperable mode at 12.65 kbps)    |                                           
|     3     | Rate-Set II  mode 2                                              |
|     4     | Rate-Set II  mode 1                                              | 
|     5     | Rate-Set II  mode 0                                              |                                         
|     6     | (reserved)                                                       |
|   10-14   | (reserved)                                                       |
|     15    | No Preference (Codec mode SHOULD be set by the network)          |
+-----------+------------------------------------------------------------------+
Table 3: List of valid CMR values and their associated VMR-WB operating modes.

The choice of values for each operating mode of VMR-WB was to ensure similarity with the compatible modes of AMR-WB in order to facilitate interoperability. The reserved values have not been implemented yet. The mode request received in the CMR field is valid until the next CMR is received, i.e. a newly received CMR value overrides the previous one. Therefore, if a terminal continuously wishes to receive frames in the same mode x, it needs to set CMR=x for all its outbound payloads, and if a terminal has no preference in which mode to receive, it SHOULD set CMR=15 in all its outbound payloads.

If a payload is received with a CMR value that is not valid, the receiver MUST ignore the CMR.
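
The CMR persistence and validity rules above can be sketched as follows. This is only an illustration: the names are hypothetical, and the sketch assumes that the valid values are exactly those with a defined meaning in Table 3 (0 to 5 and 15).

```python
NO_PREFERENCE = 15

# Assumption of this sketch: only values with a defined meaning in
# Table 3 are valid; every other value is treated as invalid.
VALID_CMR = {0, 1, 2, 3, 4, 5, NO_PREFERENCE}


def update_mode_request(current: int, received: int) -> int:
    """Return the mode request in force after a payload is received."""
    if received not in VALID_CMR:
        return current   # an invalid CMR is ignored
    return received      # a newly received CMR overrides the previous one
```

A terminal that wants to keep receiving a given mode would therefore repeat the same CMR value in every outbound payload.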

The CMR values 0 and 1 are used to maintain similarity with AMR-WB codec modes 0 and 1. Note that there is only one interoperable mode in VMR-WB (i.e., Rate-Set II mode 3). In-band signaling is used in VMR-WB as described in Appendix A to select between AMR-WB codec modes 0, 1, or 2. The Interworking Function will ensure correct codec mode setting on both sides of an interoperable interconnection.

The default input/output sampling rate of VMR-WB is 16 kHz. The narrowband operation of VMR-WB on 8 kHz input/output narrowband speech requires that both the encoder and decoder of VMR-WB be informed of the desired sampling rate. This MUST be signaled to the encoder and decoder at the beginning of a real-time or non-real-time VoIP session through the MIME parameter "sampling-frequency" (see Section 13.1 for MIME registration parameters).

With a given sampling rate (i.e., 8/16 kHz), the encoder can switch between wideband or narrowband operation modes without prior knowledge of the decoder.

In a multi-channel session, CMR SHOULD be interpreted by the receiver of the payload as the desired encoding mode for all the channels in the session, if the network allows.



An IP end-point SHOULD NOT set the CMR based on packet losses or other congestion indications, for several reasons:

     - The other end of the IP path may be a gateway to a non-IP
       network (such as a radio link) that needs to set the CMR field
       to optimize performance on that network.

     - Congestion on the IP network is managed by the IP sender, in
       this case at the other end of the IP path.  Feedback about
       congestion SHOULD be provided to that IP sender through RTCP or
       other means, and then the sender can choose to avoid congestion
       using the most appropriate mechanism.  That may include
       adjusting the codec mode, but also includes adjusting the level
       of redundancy or number of frames per packet.

The encoder SHOULD follow a received mode request, but MAY change to a different mode if the network necessitates it, for example to control congestion.

The CMR field MUST be set to 15 for packets sent to a multicast group. The encoder in the speech sender SHOULD ignore mode requests when sending speech to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed.

The codec mode selection MAY be restricted by a session parameter to a subset of the available modes. If so, the requested mode MUST be among the signaled subset (see Section 13).


9.3.2. The Payload Table of Contents

The table of contents (ToC) consists of a list of ToC entries, each representing a speech frame. 
   
In bandwidth-efficient mode, a ToC entry takes the following format: 

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|  FT   |Q|
   +-+-+-+-+-+-+

F (1 bit): If set to 1, indicates that this frame is followed by another speech frame in this payload; if set to 0, indicates that this frame is the last frame in this payload.

FT (4 bits): Frame type index whose value is chosen according to Table 4. Note that Rate-Set II (RS-II) contains four frame types that are the allowed encoding rates compatible with CDMA Multiplex option 2 [12,13].



+----+-------------------------------------+----------------------------------+
| FT |                Encoding Rate        |          Frame Size (Bits)       |
+----+-------------------------------------+----------------------------------+
| 0  | RS-II Full-Rate (AMR-WB 6.6 kbps)   | 13 Preamble+132 Data+121 Padding | 
| 1  | RS-II Full-Rate (AMR-WB 8.85 kbps)  | 13 Preamble+177 Data+76  Padding |
| 2  | RS-II Full-Rate (AMR-WB 12.65 kbps) |        13 Preamble+253 Data      |
| 3  | RS-II Full-Rate  13.3 kbps          |               266                |
| 4  | RS-II Half-Rate  6.2 kbps           |       124 (Preamble + Data)      |
| 5  | RS-II Quarter-Rate 2.7 kbps         |                54                |
| 6  | RS-II Eighth-Rate 1.0 kbps          |                20                |
| 7  | (reserved)                          |                                  |
| 8  | (reserved)                          |                                  |
| 9  | RS-II CNG (AMR-WB SID+ padding)     |   5 Preamble+35 Data+14 Padding  |
| 10 | (reserved)                          |                                  |
| 11 | (reserved)                          |                                  |
| 12 | (reserved)                          |                                  |                               
| 13 | (reserved)                          |                                  |
| 14 | RS-II Erasure (AMR-WB SPEECH_LOST)  |                 0                |
| 15 | RS-II Blank (AMR-WB NO_DATA)        |                 0                |
+----+-------------------------------------+----------------------------------+
Table 4: VMR-WB payload frame types for real-time transport.
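
For illustration, the frame sizes of Table 4 can be held in a lookup table. The sketch below sums the preamble, data, and padding bits where the table lists them separately; the names FRAME_BITS, frame_bits, and frame_octets are hypothetical.

```python
# Total frame size in bits per FT value, taken from Table 4.
FRAME_BITS = {
    0: 13 + 132 + 121,  # RS-II Full-Rate (AMR-WB 6.6 kbps)
    1: 13 + 177 + 76,   # RS-II Full-Rate (AMR-WB 8.85 kbps)
    2: 13 + 253,        # RS-II Full-Rate (AMR-WB 12.65 kbps)
    3: 266,             # RS-II Full-Rate 13.3 kbps
    4: 124,             # RS-II Half-Rate
    5: 54,              # RS-II Quarter-Rate
    6: 20,              # RS-II Eighth-Rate
    9: 5 + 35 + 14,     # RS-II CNG (AMR-WB SID + padding)
    14: 0,              # Erasure (SPEECH_LOST)
    15: 0,              # Blank (NO_DATA)
}


def frame_bits(ft: int) -> int:
    """Frame size in bits; raises KeyError for reserved FT values."""
    return FRAME_BITS[ft]


def frame_octets(ft: int) -> int:
    """Octets needed to hold one frame when rounded up to whole octets."""
    return (frame_bits(ft) + 7) // 8
```

Note that all three interoperable Full-Rate frame types come to the same 266 bits as the non-interoperable Full-Rate frame.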

The Preamble bits in the interoperable Frame Types are used for in-band signaling to distinguish between different encoding rates. For example, the first 13 preamble bits in the Rate-Set II Interoperable Full-Rate are used to decode AMR-WB codec modes 0, 1, or 2.

During the interoperable mode, FT=14 (SPEECH_LOST) and FT=15 (NO_DATA) are used to indicate frames that are either lost or not being transmitted in this payload, respectively. FT=14 or 15 MAY be used in the non-interoperable modes to indicate frame erasure or blank frame, respectively.

Q (1 bit): Frame quality indicator. If set to 0, indicates the corresponding frame is corrupted. During the interoperable mode, the receiver side (with AMR-WB codec) should set the RX_TYPE to either SPEECH_BAD or SID_BAD depending on the frame type (FT), if Q=0.

For multi-channel sessions, the ToC entries of all frames from a frame-block are placed in the ToC consecutively. When multiple frame-blocks are present in a packet in bandwidth-efficient mode, they will be placed in the packet in order of their creation time.

Therefore, with N channels and K speech frame-blocks in a packet, there MUST be N*K entries in the ToC, and the first N entries will be from the first frame-block, the second N entries will be from the second frame-block, and so on.

The following figure shows an example of a ToC of three entries in a single-channel session using bandwidth-efficient mode.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT   |Q|1|  FT   |Q|0|  FT   |Q|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Below is an example of how the ToC entries will appear in the ToC of a packet carrying 3 consecutive frame-blocks in a session with two channels (L and R).

   +----+----+----+----+----+----+
   | 1L | 1R | 2L | 2R | 3L | 3R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 2   Block 3
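
The ordering rule (N*K entries, grouped by frame-block, with channels in session order) can be sketched as follows. The function names and the label format are assumptions made for this illustration only.

```python
def toc_order(num_blocks: int, channels: list[str]) -> list[str]:
    """Labels of the ToC entries for K frame-blocks of N channels each."""
    return [
        f"{block}{channel}"
        for block in range(1, num_blocks + 1)   # frame-blocks in time order
        for channel in channels                 # channels in session order
    ]


def f_bits(num_entries: int) -> list[int]:
    """F is 1 for every ToC entry except the last one in the payload."""
    if num_entries < 1:
        raise ValueError("a payload carries at least one ToC entry")
    return [1] * (num_entries - 1) + [0]
```

For the two-channel example above, toc_order(3, ["L", "R"]) gives the six entries 1L, 1R, 2L, 2R, 3L, 3R.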


9.3.3. Speech Data

The speech data of a payload contains one or more speech frames, as described in the ToC of the payload.

Each speech frame represents 20 ms of speech encoded at one of the available encoding rates, depending on the operation mode. The length of the speech frame is defined by the frame type in the FT field. The order and numbering notation of the bits are as specified in the VMR-WB standard specification [10]. To facilitate the VMR-WB and AMR-WB interconnection and to simplify the interworking function during the interoperable mode, the order and the numbering notation of the speech codec bits closely follow those of AMR-WB RTP payload (i.e., RFC 3267).


9.3.4. Algorithm for Forming the Payload

The complete RTP payload in bandwidth-efficient mode is formed by packing bits from the payload header, table of contents, and speech frames, in order as defined by their corresponding ToC entries in the ToC list, contiguously into octets beginning with the most significant bits of the fields and the octets.

To be precise, the four-bit payload header is packed into the first octet of the payload with bit 0 of the payload header in the most significant bit of the octet. The four most significant bits (numbered 0-3) of the first ToC entry are packed into the least significant bits of the octet, ending with bit 3 in the least significant bit.  Packing continues in the second octet with bit 4 of the first ToC entry in the most significant bit of the octet. If more than one frame is contained in the payload, then packing continues with the second and successive ToC entries.  Bit 0 of the first data frame follows immediately after the last ToC bit, proceeding through all the bits of the frame in numerical order. Bits from any successive frames follow contiguously in numerical order for each frame and in consecutive order of the frames.

If speech data is missing for one or more speech frames within the sequence because of, for example, Blank and Burst or DTX, a ToC entry with FT set to NO_DATA/Blank SHALL be included in the ToC for each of the missing frames, but no data bits are included in the payload for the missing frame (see Section 9.3.5.2 for an example).
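
A minimal sketch of the packing procedure described above is given below, assuming each frame is supplied as a list of 0/1 values already in the codec's bit order. The function name and the (ft, q, bits) tuple layout are hypothetical and not defined by this document.

```python
def pack_bandwidth_efficient(cmr, frames):
    """Pack a bandwidth-efficient payload.

    frames is a list of (ft, q, bits) tuples; bits is a list of 0/1
    values and is empty for FT=14 or FT=15 (no data in the payload).
    """
    bits = [(cmr >> i) & 1 for i in (3, 2, 1, 0)]          # payload header
    for index, (ft, q, _) in enumerate(frames):            # table of contents
        bits.append(1 if index < len(frames) - 1 else 0)   # F
        bits.extend((ft >> i) & 1 for i in (3, 2, 1, 0))   # FT
        bits.append(q)                                     # Q
    for _, _, data in frames:                              # speech data
        bits.extend(data)
    bits.extend([0] * (-len(bits) % 8))                    # zero padding
    # Most significant bit of each octet is filled first.
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )
```

Working at the level of individual bits keeps the sketch close to the prose; a production packetizer would more likely use shifts on whole octets.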


9.3.5. Payload Examples


9.3.5.1. Single Channel Payload Carrying a Single Frame

The following diagram shows a bandwidth-efficient VMR-WB payload from a single channel session carrying a single speech data block. 

In the payload, no specific mode is requested (CMR=15), the speech frame is not damaged at the IP origin (Q=1), and the encoding rate is VMR-WB Rate-Set II Half-Rate (FT=4). The encoded speech bits, d(0) to d(123), are arranged according to [2]. Finally, two zero bits are added to the end as padding to make the payload octet aligned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|0| FT=4  |1|d(0)                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     d(123)|P|P|
   +-+-+-+-+-+-+-+-+
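
For the receiving side, the following sketch shows how such a payload might be parsed back into its CMR and frames. It is illustrative only; the function name is hypothetical, and the frame sizes are the totals from Table 4.

```python
# Frame sizes in bits per FT value (totals from Table 4).
FRAME_BITS = {0: 266, 1: 266, 2: 266, 3: 266, 4: 124, 5: 54, 6: 20,
              9: 54, 14: 0, 15: 0}


def parse_bandwidth_efficient(payload: bytes):
    """Return (cmr, [(ft, q, bits), ...]) for a bandwidth-efficient payload."""
    bits = [(octet >> (7 - i)) & 1 for octet in payload for i in range(8)]

    def take(pos, n):
        value = 0
        for bit in bits[pos:pos + n]:
            value = (value << 1) | bit
        return value

    cmr, pos = take(0, 4), 4
    toc = []
    while True:                                   # read ToC until F == 0
        if pos + 6 > len(bits):
            raise ValueError("truncated table of contents")
        f, ft, q = take(pos, 1), take(pos + 1, 4), take(pos + 5, 1)
        toc.append((ft, q))
        pos += 6
        if f == 0:
            break
    frames = []
    for ft, q in toc:                             # frames follow in ToC order
        size = FRAME_BITS[ft]                     # KeyError for reserved FT
        if pos + size > len(bits):
            raise ValueError("truncated speech data")
        frames.append((ft, q, bits[pos:pos + size]))
        pos += size
    return cmr, frames
```

Any bits left after the last frame are the zero padding and are simply ignored by this sketch.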

9.3.5.2. Single Channel Payload Carrying Multiple Frames

The following diagram shows a single-channel, bandwidth-efficient compound VMR-WB payload that contains four frames, of which one has no speech data. VMR-WB is hypothetically operating in Rate-Set II mode 2. The first frame is a speech frame at Rate-Set II Full-Rate (FT=3) that is composed of speech bits d(0) to d(265). The second frame is a Rate-Set II Quarter-Rate frame (FT=5), consisting of bits g(0) to g(53). The third frame is a Rate-Set II Blank/NO_DATA frame that does not carry any speech information; it is represented in the payload only by its ToC entry (FT=15). The fourth frame in the payload is a speech frame encoded at Rate-Set II Half-Rate (FT=4); it consists of speech bits h(0) to h(123).

None of the frames is damaged at the IP origin (Q=1). The encoded speech bits d(0) to d(265), g(0) to g(53), and h(0) to h(123) are arranged sequentially in the payload. (Note that no speech bits are present for the third frame.) Finally, no padding bits are required to make the payload octet aligned, since the 4 header bits, 24 ToC bits, and 444 speech bits total 472 bits, i.e., exactly 59 octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=3 |1| FT=3  |1|1| FT=5  |1|1| FT=15 |1|0| FT=4  |1|d(0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     d(265)|g(0)                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                  g(53)|h(0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                         h(123)|              
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
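
The length arithmetic behind these examples can be checked with a short sketch. The function names are hypothetical; the frame sizes passed in are the per-frame bit counts from Table 4.

```python
CMR_BITS = 4        # payload header in bandwidth-efficient mode
TOC_ENTRY_BITS = 6  # F (1) + FT (4) + Q (1)


def payload_bits(frame_sizes: list[int]) -> int:
    """Bits used by header, ToC, and speech data, before padding."""
    return CMR_BITS + TOC_ENTRY_BITS * len(frame_sizes) + sum(frame_sizes)


def padding_bits(frame_sizes: list[int]) -> int:
    """Zero bits appended to reach an integral number of octets."""
    return -payload_bits(frame_sizes) % 8


def payload_octets(frame_sizes: list[int]) -> int:
    """Length of the padded payload in octets."""
    return (payload_bits(frame_sizes) + padding_bits(frame_sizes)) // 8
```

With the four frames of this example, payload_bits([266, 54, 0, 124]) is 472, so no padding is needed; the single Half-Rate frame of Section 9.3.5.1 gives 134 bits and two padding bits.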
   

9.3.5.3. Multi-Channel Payload Carrying Multiple Frames

The following diagram shows a two-channel payload carrying 3 frame-blocks, i.e. the payload will contain 6 speech frames. 

In the payload, all speech frames are encoded at the same rate, Rate-Set II Full-Rate (FT=3), and are not damaged at the IP origin. The CMR is set to 15, i.e., the operating mode SHOULD be set by the network. The two channels are defined as left (L) and right (R), in that order. The encoded speech bits are designated dXY(0)... dXY(K-1), where X = block number, Y = channel, and K is the number of speech bits for the corresponding encoding rate. For example, for frame-block 1 of the left channel, the encoded bits are designated d1L(0) to d1L(265).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|1|1L FT=3|1|1|1R FT=3|1|1|2L FT=3|1|1|2R FT=3|1|1|3L FT|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |3|1|0|3R FT=3|1|d1L(0)                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           d1L(265)|d1R(0)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
   |                                               d1R(265)|d2L(0) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   d2L(265)|d2R(0)                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       d2R(265)|d3L(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
   |                                           d3L(265)|d3R(0)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
   |d3R(265)P|P|P|P|
   +-+-+-+-+-+-+-+-+
   
  
9.4. Octet-aligned Mode

9.4.1. The Payload Header

In octet-aligned mode, the payload header consists of a 4 bit CMR, 4 reserved bits, and optionally, an 8 bit interleaving header, as shown below:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+- - - - - - - - 
   |  CMR  |R|R|R|R|  ILL  |  ILP  |
   +-+-+-+-+-+-+-+-+- - - - - - - - 

CMR (4 bits): same as defined in section 9.3.1.

R: is a reserved bit that MUST be set to zero. All R bits MUST be ignored by the receiver.

ILL (4 bits, unsigned integer): This is an OPTIONAL field that is
     present only if interleaving is signaled out-of-band for the
     session. ILL=L indicates to the receiver that the interleaving
     length is L+1, in number of frame-blocks. 

ILP (4 bits, unsigned integer): This is an OPTIONAL field that is
     present only if interleaving is signaled. ILP MUST take a value
     between 0 and ILL, inclusive, indicating the interleaving index
     for frame-blocks in this payload in the interleave group. If the
     value of ILP is found greater than ILL, the payload SHOULD be
     discarded.

ILL and ILP fields MUST be present in each packet in a session if interleaving is signaled for the session.

If the interleaving option is used in a multi-channel session, it MUST be performed on a frame-block basis, as opposed to a frame basis.

The following example illustrates the arrangement of speech frame-blocks in an interleave group during an interleaved session. Here we assume ILL=L for the interleave group that starts at speech frame-block n. We also assume that the first payload packet of the interleave group is s and that the number of speech frame-blocks carried in each payload is N. Then we will have:

    Payload s (the first packet of this interleave group):
      ILL=L, ILP=0,
      Carry frame-blocks: n, n+(L+1), n+2*(L+1),..., n+(N-1)*(L+1)

    Payload s+1 (the second packet of this interleave group):
      ILL=L, ILP=1,
      Carry frame-blocks: n+1, n+1+(L+1), n+1+2*(L+1),..., n+1+(N-1)*(L+1)

        ...

    Payload s+L (the last packet of this interleave group):
      ILL=L, ILP=L,
      Carry frame-blocks: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)

The next interleave group will start at frame-block n+N*(L+1).
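
The pattern above can be generated mechanically. The following sketch is illustrative only and uses a hypothetical function name; its parameters correspond to n, L, and N as used in the text.

```python
def interleave_group(n: int, ill: int, frames_per_payload: int) -> list[list[int]]:
    """Frame-block numbers carried by each payload of one interleave group.

    n is the first frame-block of the group, ill is the value L of the
    ILL field, and frames_per_payload is N. The result is indexed by ILP.
    """
    stride = ill + 1  # interleaving length in frame-blocks
    return [
        [n + ilp + k * stride for k in range(frames_per_payload)]
        for ilp in range(stride)
    ]


def next_group_start(n: int, ill: int, frames_per_payload: int) -> int:
    """First frame-block of the following interleave group."""
    return n + frames_per_payload * (ill + 1)
```

For n=1, L=2, and N=3, the three payloads carry frame-blocks (1, 4, 7), (2, 5, 8), and (3, 6, 9), and the next group starts at frame-block 10.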

There will be no interleaving effect unless the number of frame-blocks per packet (N) is at least 2. Moreover, the number of frame-blocks per payload (N) and the value of ILL MUST NOT be changed inside an interleave group. In other words, all payloads in an interleave group MUST have the same ILL and MUST contain the same number of speech frame-blocks.

The sender of the payload MUST only apply interleaving if the receiver has signaled its use through out-of-band means. Since interleaving will increase buffering requirements at the receiver, the receiver uses MIME parameter "interleaving=I" to set the maximum number of frame-blocks allowed in an interleaving group to I.

When performing interleaving the sender MUST use a proper number of frame-blocks per payload (N) and ILL so that the resulting size of an interleave group is less than or equal to I, i.e., N*(L+1)<=I.
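
A sender could check this constraint before choosing N and ILL, as in the sketch below. The function names are hypothetical; max_group stands for the value I of the "interleaving" MIME parameter.

```python
def interleave_group_size(frames_per_payload: int, ill: int) -> int:
    """Number of frame-blocks in one interleave group, N*(L+1)."""
    return frames_per_payload * (ill + 1)


def sender_may_use(frames_per_payload: int, ill: int, max_group: int) -> bool:
    """True if N and ILL respect the receiver's limit I (max_group)."""
    if not 0 <= ill <= 15:          # ILL is a 4-bit unsigned field
        return False
    if frames_per_payload < 1:
        return False
    return interleave_group_size(frames_per_payload, ill) <= max_group
```

For example, N=3 with ILL=2 gives a group of 9 frame-blocks, which is acceptable only if the receiver signaled I of at least 9.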

9.4.2. The Payload Table of Contents

The table of contents (ToC) in octet-aligned mode consists of a list of ToC entries where each entry corresponds to a speech frame carried in the payload, i.e., 

   +---------------------+
   | list of ToC entries |
   +---------------------+
   
Note, for ToC entries with FT=14 or 15, there will be no corresponding speech frame in the payload. 


The list of ToC entries is organized in the same way as described for bandwidth-efficient mode in Section 9.3.2, with the following exception: when interleaving is used, the frame-blocks in the ToC will almost never be consecutive in time. Instead, the presence and order of the frame-blocks in a packet will follow the pattern described in Section 9.4.1.

The following example shows the ToC of three consecutive packets, each carrying 3 frame-blocks, in an interleaved two-channel session. Here, the two channels are left (L) and right (R) with L coming before R, and the interleaving length is 3 (i.e., ILL=2). This makes the interleave group 9 frame-blocks large.
   
   Packet #1
   ---------

   ILL=2, ILP=0:
   +----+----+----+----+----+----+
   | 1L | 1R | 4L | 4R | 7L | 7R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 4   Block 7
  
   Packet #2
   ---------

   ILL=2, ILP=1:
   +----+----+----+----+----+----+
   | 2L | 2R | 5L | 5R | 8L | 8R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 2   Block 5   Block 8
    
   Packet #3
   ---------

   ILL=2, ILP=2:
   +----+----+----+----+----+----+
   | 3L | 3R | 6L | 6R | 9L | 9R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 3   Block 6   Block 9
  

A ToC entry takes the following format in octet-aligned mode:  

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

F (1 bit): see definition in Section 9.3.2.
FT (4 bits unsigned integer): see definition in Section 9.3.2.
Q (1 bit): see definition in Section 9.3.2.
P bits: padding bits, which MUST be set to zero.
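
As an illustration, a one-octet ToC entry can be packed and unpacked with simple shifts. The function names below are hypothetical.

```python
def pack_toc_entry(f: int, ft: int, q: int) -> int:
    """Build an octet-aligned ToC entry: F | FT(4) | Q | P | P."""
    if f not in (0, 1) or q not in (0, 1) or not 0 <= ft <= 15:
        raise ValueError("field value out of range")
    return (f << 7) | (ft << 3) | (q << 2)   # the two P bits stay zero


def unpack_toc_entry(octet: int) -> tuple[int, int, int]:
    """Return (F, FT, Q); the two padding bits are not examined."""
    return (octet >> 7) & 1, (octet >> 3) & 0xF, (octet >> 2) & 1
```

A Full-Rate frame (FT=3) with Q=1 that is followed by another frame is therefore encoded as 0x9C, and as 0x1C when it is the last frame in the payload.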


9.4.3. Speech Data

In octet-aligned mode, speech data is carried in a similar way to that in the bandwidth-efficient mode as discussed in Section 9.3.3, with the following exceptions: 

    - The last octet of each speech frame MUST be padded with zeroes
      at the end if not all bits in the octet are used. In other
      words, each speech frame MUST be octet-aligned. 

    - When multiple speech frames are present in the speech data
      (i.e., compound payload), the speech frames MUST be arranged
      one whole frame after another.
    
  
9.4.4. Methods for Forming the Payload

The payload begins with the payload header, which is one octet long, or two octets long if frame interleaving is selected. The payload header is followed by the table of contents, consisting of a list of one-octet ToC entries.

The speech data follows the table of contents. For packetization in the normal order, all of the octets comprising a speech frame are appended to the payload as a unit. The speech frames are packed in the same order as their corresponding ToC entries are arranged in the ToC list, with the exception that if a given frame has a ToC entry with FT=14 or 15, there will be no data octets present for that frame.
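
The assembly steps for octet-aligned mode can be sketched as follows. The sketch assumes each frame is supplied already padded to whole octets; the function name and the (ft, q, data) tuple layout are hypothetical.

```python
def pack_octet_aligned(cmr, frames, ill=None, ilp=None):
    """Assemble an octet-aligned payload.

    frames is a list of (ft, q, data) tuples, where data holds the frame
    padded to whole octets (empty for FT=14 or FT=15). Pass ill and ilp
    only when interleaving has been signaled for the session.
    """
    out = bytearray([cmr << 4])                  # CMR + four reserved zero bits
    if ill is not None:                          # optional interleaving header
        if ilp is None or not 0 <= ilp <= ill <= 15:
            raise ValueError("ILP must lie between 0 and ILL")
        out.append((ill << 4) | ilp)
    for index, (ft, q, _) in enumerate(frames):  # one-octet ToC entries
        f = 1 if index < len(frames) - 1 else 0
        out.append((f << 7) | (ft << 3) | (q << 2))
    for _, _, data in frames:                    # whole frames, in ToC order
        out += data
    return bytes(out)
```

Two Full-Rate frames of 34 octets each, sent with CMR=4 and no interleaving, give a payload of 1 + 2 + 68 = 71 octets.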
   

9.4.5. Payload Example

9.4.5.1. Basic Single Channel Payload Carrying Multiple Frames

The following diagram shows an octet-aligned payload from a single-channel session that carries two VMR-WB Rate-Set II Full-Rate frames (FT=3). In the payload, a codec mode request is sent (e.g., CMR=4), requesting the encoder at the receiver's side to use VMR-WB mode 1. No interleaving is used.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=4 |R|R|R|R|1|FT#1=3 |Q|P|P|0|FT#2=3 |Q|P|P|   f1(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(8..15)   |  f1(16..23)   |  ...                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | r |P|P|P|P|P|P|  f2(0..7)     |   f2(8..15)   |  f2(16..23)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : ...                                                           :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ...    | l |P|P|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
r= f1(264,265)
l= f2(264,265)

Note that in the above example, the last octet of each speech frame is padded with zeros to make the frame octet-aligned.


9.5. Implementation Considerations

An application implementing this payload format MUST understand all the payload parameters in the out-of-band signaling used. For example, if an application uses SDP, all the SDP and MIME parameters in this document MUST be understood. This requirement ensures that an implementation can always decide whether or not it is capable of communicating.

No operation mode of the payload format is mandatory to implement. The requirements of the application using the payload format should be used to determine what to implement. To achieve basic interoperability with other applications implementing this payload, each implementation SHOULD at least implement both bandwidth-efficient and octet-aligned mode for single channel. The support of interleaving is OPTIONAL.


10. VMR-WB Storage Format

The storage format is used for storing VMR-WB encoded speech frames in a file or as an e-mail attachment. Multiple channel content is also supported.

The storage format for VMR-WB is identical to that of AMR-WB to ensure full compatibility in the interoperable mode. In general, a VMR-WB file has the following structure:

   +------------------+
   | Header           |  
   +------------------+
   | Speech frame 1   |  
   +------------------+
   : ...              :
   +------------------+
   | Speech frame n   |
   +------------------+
   

10.1. Single channel Header

A single channel VMR-WB file header contains only a magic number.

The magic number for single channel VMR-WB files in the non-interoperable modes MUST consist of the ASCII character string:
   
     "#!VMR-WB\n" 
     (or 0x2321564d522d57420a in hexadecimal). 
   
Note, the "\n" is an important part of the magic numbers and MUST be included in the comparison; otherwise, the single channel magic number above will become indistinguishable from those of the multi-channel files defined in the next section.

The magic number for single channel VMR-WB files in the interoperable mode MUST consist of the ASCII character string:
   
     "#!AMR-WB\n" 
     (or 0x2321414d522d57420a in hexadecimal).

Note that VMR-WB uses the same magic number as AMR-WB (see RFC 3267) when saving the encoded speech in the interoperable mode. Therefore, a file generated by VMR-WB is directly decodable with AMR-WB. However, since VMR-WB can only decode AMR-WB modes 0, 1, or 2, the AMR-WB codec MUST be instructed not to generate the modes that are not in common, so that files generated by AMR-WB can be decoded directly by VMR-WB. (Note that the expansion of AMR-WB interoperable modes in the VMR-WB decoder will ultimately ease this requirement.)

10.2. Multi-channel Header

The multi-channel header consists of a magic number followed by a 32-bit channel description field, giving the multi-channel header the following structure:
   
   +----------------------------+
   |        magic number        | 
   +----------------------------+
   | channel description field  |
   +----------------------------+
   
The magic number for multi-channel VMR-WB files in the non-interoperable modes MUST consist of the ASCII character string:
   
     "#!VMR-WB_MC1.0\n" 
     (or 0x2321564d522d57425F4D43312E300a in hexadecimal). 
     
The version number in the magic numbers refers to the version of the file format. 

The magic number for multi-channel VMR-WB files in the interoperable mode MUST consist of the ASCII character string (see RFC 3267):
   
     "#!AMR-WB_MC1.0\n" 
     (or 0x2321414d522d57425F4D43312E300a in hexadecimal). 

The 32-bit channel description field is defined as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Reserved bits                                    | CHAN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved bits: MUST be set to 0 when written, and a reader MUST ignore them.
   
CHAN (4 bit unsigned integer): Indicates the number of audio channels contained in this storage file. The valid values and the order of the channels within a frame block are specified in Section 4.1 in [19].
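
As a rough sketch, a reader could recognize the four magic numbers and extract the channel count as shown below. The function name parse_file_header and the mode labels are hypothetical. The sketch also assumes that the 32-bit channel description field is stored in network byte order, so that CHAN occupies the four least significant bits; this document's diagram suggests that layout but does not state the byte order explicitly.

```python
import struct

# Magic numbers from Sections 10.1 and 10.2, mapped to
# (mode label, multi-channel flag). The labels are illustrative only.
MAGICS = {
    b"#!VMR-WB\n": ("non-interoperable", False),
    b"#!AMR-WB\n": ("interoperable", False),
    b"#!VMR-WB_MC1.0\n": ("non-interoperable", True),
    b"#!AMR-WB_MC1.0\n": ("interoperable", True),
}


def parse_file_header(buf):
    """Return (mode, channels, header_length) for a storage file."""
    for magic, (mode, multi) in MAGICS.items():
        # The trailing "\n" is part of the comparison, which keeps the
        # single channel and multi-channel magic numbers distinct.
        if buf.startswith(magic):
            break
    else:
        raise ValueError("unknown magic number")
    if not multi:
        return mode, 1, len(magic)
    # Assumption: network byte order, so CHAN is the low 4 bits.
    # The 28 reserved bits are ignored on read.
    (field,) = struct.unpack_from(">I", buf, len(magic))
    return mode, field & 0x0F, len(magic) + 4
```

The returned header length gives the offset of the first speech frame in the file.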


10.3. Speech Frames

After the file header, speech frame-blocks consecutive in time are stored in the file. Each frame-block contains a number of octet-aligned speech frames equal to the number of channels, and stored in increasing order, starting with channel 1.
      
Each stored speech frame starts with a one octet frame header with the following format:

    0 1 2 3 4 5 6 7 
   +-+-+-+-+-+-+-+-+
   |P|  FT   |Q|P|P|                                               
   +-+-+-+-+-+-+-+-+

The FT field and the Q bit are defined as follows. The P bits are padding and MUST be set to 0.
   
   +----+---------------------------------------------+------------------------+
   | FT |                Encoding Rate                |  Frame Size (Bits)     |
   +----+---------------------------------------------+------------------------+
   | 0  | RS-II Full-Rate (AMR-WB 6.6 kbps)           |          132           |     
   | 1  | RS-II Full-Rate (AMR-WB 8.85 kbps)          |          177           |
   | 2  | RS-II Full-Rate (AMR-WB 12.65 kbps)         |          253           |
   | 3  | RS-II Full-Rate 13.3 kbps                   |          266           |
   | 4  | RS-II Half-Rate 6.2 kbps                    |          124           |
   | 5  | RS-II Quarter-Rate 2.7 kbps                 |           54           |
   | 6  | RS-II Eighth-Rate 1.0 kbps                  |           20           |
   | 7  | (reserved)                                  |           -            |
   | 8  | (reserved)                                  |           -            |
   | 9  | RS-II CNG (AMR-WB SID)                      |           35           |
   | 10 | (reserved)                                  |           -            |
   | 11 | (reserved)                                  |           -            |
   | 12 | (reserved)                                  |           -            |                               
   | 13 | (reserved)                                  |           -            |
   | 14 | RS-II Erasure (AMR-WB SPEECH_LOST)          |           0            |
   | 15 | RS-II Blank (AMR-WB NO_DATA)                |           0            |
   +----+---------------------------------------------+------------------------+
Table 5: VMR-WB frame types for non-real-time transport and storage.

Q (1 bit): Frame quality indicator. If set to 0, indicates the corresponding frame is corrupted. 

Note that Table 5 includes no padding for the AMR-WB compatible frame types. No frame-size adjustment is needed for those frames (to make them compatible with CDMA Multiplex Option 2), since file storage involves no real-time over-the-air transmission. Following this one-octet header, the speech bits are placed as defined in Section 9.3.3. The last octet of each frame is padded with zeroes, if needed, to achieve octet alignment. 

The following example shows a VMR-WB speech frame encoded at Rate-Set II Half-Rate (with 124 speech bits) in the storage format.  

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |P| FT=4  |Q|P|P|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +          Speech bits for frame-block n, channel k             +
   |                                                               |
   +                                                               +
   |                                                               |
   +       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       | 
   +-+-+-+-+

Frame-blocks or speech frames that are lost in transmission and thereby not received MUST be stored as Blank/NO_DATA frames (FT=15) or Erasure/SPEECH_LOST (FT=14) in complete frame-blocks to keep synchronization with the original media.
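
The following Python sketch shows one way a reader might step through stored speech frames using the frame sizes of Table 5. The names read_stored_frame and STORAGE_FRAME_BITS are hypothetical, and the sketch handles one frame at a time; iterating over the channels of a frame-block is left to the caller.

```python
# Frame sizes in bits from Table 5; reserved frame types are omitted.
STORAGE_FRAME_BITS = {
    0: 132, 1: 177, 2: 253, 3: 266, 4: 124,
    5: 54, 6: 20, 9: 35, 14: 0, 15: 0,
}


def read_stored_frame(buf, offset=0):
    """Parse one stored frame; return (ft, q, data, next_offset)."""
    header = buf[offset]                 # layout: P | FT(4) | Q | P | P
    ft = (header >> 3) & 0x0F
    q = (header >> 2) & 0x01
    if ft not in STORAGE_FRAME_BITS:
        raise ValueError(f"reserved frame type {ft}")
    # Speech bits are padded with zeroes up to the next octet boundary.
    n_octets = (STORAGE_FRAME_BITS[ft] + 7) // 8
    start = offset + 1
    data = bytes(buf[start:start + n_octets])
    if len(data) != n_octets:
        raise ValueError("truncated frame")
    return ft, q, data, start + n_octets
```

For the Half-Rate example above (FT=4, 124 speech bits), the sketch reads 16 data octets after the one-octet frame header, so the stored frame occupies 17 octets in total.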


11. Congestion Control (Network-Controlled Mode Switching)

The general congestion control considerations for transporting RTP data apply to VMR-WB speech over RTP as well. However, the multimode capability of VMR-WB speech coding may provide an advantage over other payload formats for controlling congestion since the bandwidth demand can be adjusted by selecting a different operating mode (i.e., mode switching).

Another parameter that may impact the bandwidth demand for VMR-WB is the number of frame-blocks that are encapsulated in each RTP payload. Packing more frame-blocks in each RTP payload can reduce the number of packets sent and hence the overhead from RTP/UDP/IP headers, at the expense of increased delay.
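
To make the trade-off concrete, the short Python sketch below estimates the gross bit rate for a single channel, octet-aligned session without interleaving that carries only Rate-Set II Full-Rate frames (FT=3, 34 octets each). The 40-octet header figure assumes IPv4 without options, UDP, and an RTP header without CSRC entries or extensions; the function name gross_bit_rate is hypothetical.

```python
HEADER_OCTETS = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers, no options
FRAME_MS = 20                 # VMR-WB frame size in milliseconds
FT3_OCTETS = 34               # one 266-bit frame padded to whole octets


def gross_bit_rate(frames_per_packet):
    """Gross bit rate in bits per second, including packet headers."""
    # One payload header octet, plus one ToC octet and 34 data octets
    # per frame (octet-aligned mode, no interleaving).
    payload = 1 + frames_per_packet * (1 + FT3_OCTETS)
    packet = HEADER_OCTETS + payload
    return packet * 8 * 1000 / (frames_per_packet * FRAME_MS)
```

Under these assumptions, one frame per packet costs 30400 bit/s, whereas five frames per packet cost 17280 bit/s at the price of 80 ms of additional packetization delay.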

If forward error correction (FEC) is used to alleviate the packet loss, the amount of redundancy added by FEC will need to be regulated so that the use of FEC itself does not cause a congestion problem. 

It is RECOMMENDED that VMR-WB applications using this payload format employ congestion control. The actual mechanism for congestion control is not specified but should be suitable for real-time transport of datagrams.


12. Security Considerations

RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in [5].

As this format transports encoded speech, the main security issues include confidentiality and authentication of the speech itself. The payload format itself does not have any built-in security mechanisms. External mechanisms, such as SRTP [17], MAY be used.

This payload format does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing and thus is unlikely to pose a denial-of-service threat due to the receipt of pathological/corrupted data.

Note that robust built-in bad rate detection and concealment as well as frame erasure concealment mechanisms have been implemented in VMR-WB to alleviate the impacts of the reception of corrupted rate information, packet loss, and corrupted speech data [10].


12.1. Confidentiality

To achieve confidentiality of the encoded VMR-WB speech, all speech data bits MAY be encrypted. There is no need to encrypt the payload header or the table of contents, for the following reasons:

1) They only carry information about the requested speech mode, frame type, and frame quality.

2) This information could be useful to a third party, e.g., for quality monitoring. 

As long as the VMR-WB payload is only packed and unpacked at either end, encryption may be performed after packet encapsulation so that there is no conflict between the two operations.

Interleaving may affect encryption. Depending on the encryption scheme used, there may be restrictions on, for example, the time when keys can be changed. Specifically, the key change may need to occur at the boundary between interleave groups. 

The type of encryption method used may impact the error robustness of the payload data. The error robustness may be severely reduced when the data is encrypted unless an encryption method without error-propagation is used, e.g. a stream cipher. 


12.2. Authentication

To authenticate the sender of the speech, an external mechanism MUST be used. It is RECOMMENDED that such a mechanism protect all the speech data bits. 

Data tampering by a man-in-the-middle attacker could result in erroneous depacketization/decoding that could lower the speech quality. Tampering with the CMR field may result in speech of a different quality than desired.

To prevent a man-in-the-middle attacker from tampering with the payload packets, some additional information besides the speech bits SHOULD be protected. This may include the payload header, ToC, RTP timestamp, RTP sequence number, and the RTP marker bit. 


12.3. Decoding Validation and Provision for Lost or Late Packets

When processing a received payload packet, if the receiver finds that the calculated payload length, based on the information of the session and the values found in the payload header fields, does not match the size of the received packet, the receiver SHOULD discard the packet to avoid potential degradation of speech quality and to invoke the built-in frame error concealment mechanism. Therefore, invalid packets SHALL be treated as lost packets. 
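
A minimal Python sketch of such a length check is given below for the octet-aligned mode, single channel, without interleaving. The function names are hypothetical, and the frame size table is passed in by the caller because this section confirms only some of the RTP frame sizes (FT=3 with 266 bits; FT=14 and FT=15 with no data).

```python
def expected_payload_length(payload, frame_bits):
    """Return the length implied by the ToC, or None if it is malformed.

    Assumes octet-aligned mode, one channel, and no interleaving:
    one payload header octet followed by one-octet ToC entries.
    """
    pos = 1                               # skip the payload header octet
    total_data = 0
    while True:
        if pos >= len(payload):
            return None                   # ToC runs past the packet end
        entry = payload[pos]
        pos += 1
        ft = (entry >> 3) & 0x0F
        if ft not in frame_bits:
            return None                   # unknown or reserved frame type
        total_data += (frame_bits[ft] + 7) // 8
        if not entry & 0x80:              # F=0: last ToC entry
            break
    return pos + total_data


def is_valid_payload(payload, frame_bits):
    """True if the received size matches the size implied by the ToC."""
    return expected_payload_length(payload, frame_bits) == len(payload)
```

A packet for which is_valid_payload returns False would be discarded and handled as a lost packet, leaving concealment to the decoder.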

Late packets (i.e., unavailability of a packet when needed for decoding at the receiver) SHALL be treated as lost packets. Furthermore, if the late packet is part of an interleave group, depending upon the availability of the other packets in that interleave group, decoding MUST be resumed from the next (sequential order) available packet. In other words, the unavailability of a packet in an interleave group at a certain time SHOULD NOT invalidate the other packets within that interleave group that MAY arrive later.


13. Payload Format Parameters

This section defines the parameters that may be used to select optional features in the VMR-WB payload.  The parameters are defined here as part of the MIME subtype registration for the VMR-WB speech codec.  A mapping of the parameters into the Session Description Protocol (SDP) [8] is also provided for those applications that use SDP.  Equivalent parameters could be defined elsewhere for use with control protocols that do not use MIME or SDP.

The data format and parameters are specified for both real-time transport in RTP and for storage type applications such as e-mail attachments.


13.1. VMR-WB MIME Registration

The MIME subtype for the Variable-Rate Multimode Wideband (VMR-WB) audio codec is allocated from the IETF tree since VMR-WB is expected to be a widely used speech codec in general MMS, IMS, and VoIP applications. This MIME registration covers both real-time transfer via RTP and non-real-time transfers via stored files. 

Note, the receiver MUST ignore any unspecified parameter and use the default values instead.

   Media Type name:     audio

   Media subtype name:  VMR-WB

   Required parameters: none

Note that if no input parameters are defined, the default values will be used.

Also note that "crc" and "robust-sorting" parameters from RFC 3267 are not applicable to VMR-WB RTP payload and storage file formats. To ensure compatibility between VMR-WB and AMR-WB in the interoperable sessions, one SHOULD make sure that AMR-WB does not utilize crc and robust-sorting (i.e., these options are deactivated in the session initiation).

OPTIONAL parameters:
These parameters apply to RTP transfer only.

  octet-align: Permissible values are 0 and 1. If 1, octet-aligned
               operation SHALL be used. If 0 or if not present,
               bandwidth-efficient operation is employed (default).

    mode-set:  Requested VMR-WB mode set. Restricts the active codec
               mode set to a subset of all modes. Possible values are
               a comma separated list of modes from the set: 0, 1, 2,
               or 3 [10]. If such a mode set is specified by the
               decoder, the encoder MUST abide by the request and MUST
               NOT use modes outside of the subset. If not present,
               all codec modes are allowed for the session.

               During and upon initiation of an interoperable
               interconnection between VMR-WB and AMR-WB, only
               Rate-Set II mode 3 SHALL be used. There are three Frame
               Types (i.e., FT=0, 1, or 2) within this mode that match
               AMR-WB modes 0, 1, and 2, respectively.

               If the AMR-WB codec is engaged in an interoperable
               interconnection with VMR-WB, the active AMR-WB codec
               mode set SHOULD be limited to 0, 1, or 2.

    mode-change-period: Specifies a number of frame-blocks, N, that is
               the interval at which codec mode changes are allowed.
               The initial phase of the interval is arbitrary, but
               changes must be separated by multiples of N
               frame-blocks. If this parameter is not present, mode
               changes are allowed at any time during the session.

               Note that this consideration is only made for maximum 
               compatibility with AMR-WB; otherwise, VMR-WB modes can be 
               switched at any time as long as the mode switching interval is an 
               integer multiple of the frame size (i.e., 20 ms).

    mode-change-neighbor: Permissible values are 0 and 1.  If 1, mode
               changes SHALL only be made to the neighboring modes in
               the active codec mode set. Neighboring modes are the
               ones closest in bit rate to the current mode, either
               the next higher or next lower rate. If 0 or if not
               present, change between any two modes in the active
               codec mode set is allowed.

    maxptime:  The maximum amount of media, which can be encapsulated
               in a payload packet, expressed as time in
               milliseconds. The time is calculated as the sum of the
               time the media present in the packet represents. The
               time SHALL be an integer multiple of the frame size. If
               this parameter is not present, the sender MAY
               encapsulate any number of speech frames into one RTP
               packet.

    interleaving: Indicates that frame-block level interleaving SHALL
               be used for the session and its value defines the
               maximum number of frame-blocks allowed in an
               interleaving group (see Section 9.4.1). If this
               parameter is not present, interleaving SHALL NOT be
               used. The presence of this parameter also implies
               automatically that octet-aligned operation SHALL be
               used.

    ptime:     see RFC2327 [8]. It SHALL be at least one frame size for VMR-WB.

    channels:  The number of audio channels. The possible values and
               their respective channel order are specified in Section
               4.1 in [19]. If omitted, it has the default value of 1.

    These parameters apply to both real-time and non-real-time transfers:

    dtx:       Permissible values are 0 and 1. The default is 0 (i.e., No DTX) 
               where VMR-WB normally operates as a continuous variable-rate 
               codec. If dtx=1, this MUST be signaled to both encoder and 
               decoder of VMR-WB to operate in Discontinuous Transmission (DTX) 
               mode.

    sampling-frequency: Permissible values are 0 and 1. The default
               value is 0 (i.e., 16000 Hz sampling frequency for
               input/output and normal wideband operation). If the
               value is set to 1, the input/output sampling frequency
               is 8000 Hz (i.e., narrowband operation). If the sampling
               frequency is signaled only to the encoder or the
               decoder, different combinations of input and output
               speech sampling frequencies are obtained (e.g., input at
               8000 Hz and output at 16000 Hz). Nevertheless, different
               input and output sampling rates are NOT RECOMMENDED. The
               sampling frequency SHALL NOT be changed during a
               session. Also note that the time stamp increment per
               frame is 320 for 16000 Hz sampling frequency and 160 for
               8000 Hz sampling frequency.

Encoding considerations:
           This type is defined for transfer via both RTP (RFC 3550)
           and stored-file methods as described in Sections 9 and 10,
           respectively, of RFC XXXX. Audio data is binary data, and
           must be encoded for non-binary transport; the Base64
           encoding is suitable for Email.

Security considerations:
           See Section 12 of RFC XXXX.

Public specification:
           The VMR-WB speech codec is specified in the following 3GPP2
           specifications: C.P0052-0 and S.R0080-0.
           Transfer methods are specified in RFC XXXX.

Additional information:
           The following applies to stored-file transfer methods:

Magic numbers: 
           Single channel for the non-interoperable modes:
                 ASCII character string "#!VMR-WB\n"
                 (or 0x2321564d522d57420a in hexadecimal)

           Single channel for the interoperable mode (see RFC 3267):
                 ASCII character string "#!AMR-WB\n"
                 (or 0x2321414d522d57420a in hexadecimal)

           Multi-channel for the non-interoperable modes:
                 ASCII character string "#!VMR-WB_MC1.0\n"
                 (or 0x2321564d522d57425F4D43312E300a in hexadecimal)

           Multi-channel for the interoperable mode (see RFC 3267):
                 ASCII character string "#!AMR-WB_MC1.0\n"
                 (or 0x2321414d522d57425F4D43312E300a in hexadecimal)

File extensions for the non-interoperable modes: vmr, VMR
                 Macintosh file type code: none
                 Object identifier or OID: none

File extensions for the interoperable mode (see RFC 3267): awb, AWB
                 Macintosh file type code: none
                 Object identifier or OID: none

Person & email address to contact for further information:
                  Sassan Ahmadi, Ph.D.   Nokia Inc. USA 
                  sassan.ahmadi@nokia.com
	       
Intended usage: COMMON.
           It is expected that many VoIP applications (as well as
           mobile applications) will use this type.

Author/Change controller:
               Sassan Ahmadi, Ph.D.   Nokia Inc. USA
               sassan.ahmadi@nokia.com
               IETF Audio/Video Transport Working Group
	    

13.2. Mapping MIME Parameters into SDP

The information carried in the MIME media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [8], which is commonly used to describe RTP sessions.  When SDP is used to specify sessions employing the VMR-WB codec, the mapping is as follows:

    - The MIME type ("audio") goes in SDP "m=" as the media name.

    - The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
      16000 for VMR-WB (although 8000 is also supported by VMR-WB for
      narrowband I/O processing), and the encoding parameters
      (number of channels) MUST either be explicitly set to N or
      omitted, implying a default value of 1. The values of N that are
      allowed are specified in Section 4.1 in [19].

    - The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
      and "a=maxptime" attributes, respectively.

    - Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the MIME media type string as a
      semicolon separated list of parameter=value pairs.

Some example SDP session descriptions utilizing VMR-WB encodings follow.  In these examples, long a=fmtp lines are folded to meet the column width constraints of this document; the backslash ("\") at the end of a line and the carriage return that follows it should be ignored.

Example of usage of VMR-WB in a possible VoIP scenario:

    m=audio 49120 RTP/AVP 98
    a=rtpmap:98 VMR-WB/16000
    a=fmtp:98 octet-align=1

Example of usage of VMR-WB in a possible streaming scenario (two channel stereo): 

    m=audio 49120 RTP/AVP 99
    a=rtpmap:99 VMR-WB/16000/2
    a=fmtp:99 interleaving=30
    a=maxptime:100
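
For applications that generate SDP programmatically, the mapping rules of this section can be sketched in Python as follows. The function sdp_media_lines and its keyword-argument convention (underscores standing in for hyphens in parameter names) are hypothetical and serve only to illustrate where each MIME parameter ends up; the "; " separator between a=fmtp parameters is likewise an assumption about formatting.

```python
def sdp_media_lines(port, pt, channels=1, ptime=None, maxptime=None, **fmtp):
    """Return SDP lines for one VMR-WB media stream.

    Extra keyword arguments become a=fmtp parameters; underscores in
    their names are written as hyphens (e.g. octet_align -> octet-align).
    """
    rtpmap = f"a=rtpmap:{pt} VMR-WB/16000"     # RTP clock rate is 16000
    if channels != 1:
        rtpmap += f"/{channels}"               # omitted means one channel
    lines = [f"m=audio {port} RTP/AVP {pt}", rtpmap]
    if fmtp:
        params = "; ".join(
            f"{name.replace('_', '-')}={value}" for name, value in fmtp.items()
        )
        lines.append(f"a=fmtp:{pt} {params}")
    # ptime and maxptime map to their own attributes, not to a=fmtp.
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_media_lines(49120, 99, channels=2, maxptime=100, interleaving=30) reproduces the lines of the streaming example above.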

Note that the payload format (encoding) names are commonly shown in upper case.  MIME subtypes are commonly shown in lower case.  These names are case-insensitive in both places.  Similarly, parameter names are case-insensitive both in MIME types and in the default mapping to the SDP a=fmtp attribute.


14. IANA Considerations

One new MIME subtype must be registered; see Section 13. A new SDP attribute "maxptime", also defined in Section 13, needs to be registered. The "maxptime" attribute is expected to be defined in the revision of RFC 2327 [11] and is added here with a consistent definition. 


15. Acknowledgements
 
The author would like to thank Dr. Redwan Salami of VoiceAge Corporation and Ari Lakaniemi of Nokia Inc. for their technical support throughout the drafting and review of this document.



Appendix A- VMR-WB Frame Structure

The VMR-WB encoder frame structure has been designed to minimize the complexity of the IWF during interoperable interconnections between VMR-WB and AMR-WB. Note that NO IWF is required for interconnections that utilize VMR-WB at both ends.

The VMR-WB frame structure has been designed to be similar to the AMR-WB Interface Format 2 (IF2) bit-packing scheme [2]. Detailed information on the VMR-WB frame structure can be found in [10]. For convenience, the frame structure of the interoperable mode is reviewed in this section to facilitate the description of the IWF procedures in Appendix B.
 
The VMR-WB 12.65 kbps Interoperable Full-Rate has the following structure:

<-----------------------------------266 Bits---------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+
|1|1|1|1|1|0|0|0|0|0|1|0|1|       AMR-WB mode 2 Class A, B, C Bits            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+

The first octet "11111000" denotes the Interoperable Full-Rate Type, the next 4 bits "0010" indicate the AMR-WB Frame Type; the 13th bit is the Frame Quality Bit Q of AMR-WB. Therefore, by removing the first octet, AMR-WB Frame Type, Quality bit, and 253 data bits identical to that of AMR-WB mode 2 are obtained, which is compliant with Interface Format 2 [2].

The VMR-WB 8.85 kbps Interoperable Full-Rate has the following structure:
  
<-----------------------------------266 Bits---------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+
|1|1|1|1|1|0|0|0|0|0|0|1|1|  AMR-WB mode 1 Class A, B, C Bits     |76 Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+
             		        
The first octet "11111000" denotes the Interoperable Full-Rate Type, the next 4 bits "0001" indicate the AMR-WB Frame Type; the 13th bit is the Frame Quality Bit Q of AMR-WB. The padding bits are used to adjust the frame size to 266 bits. Therefore, by removing the first octet and the last 76 padding bits, AMR-WB Frame Type, Quality bit, and 177 data bits identical to that of AMR-WB mode 1 are obtained, which is compliant with Interface Format 2 [2].

The VMR-WB 6.6 kbps Interoperable Full-Rate has the following structure:
  
<-----------------------------------266 Bits---------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+
|1|1|1|1|1|0|0|0|0|0|0|0|1|    AMR-WB mode 0 Class A, B, C Bits   |121 Padding|
+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------------------------------------------+
             		        
The first octet "11111000" denotes the Interoperable Full-Rate Type, the next 4 bits "0000" indicate the AMR-WB Frame Type; the 13th bit is the Frame Quality Bit Q of AMR-WB. The padding bits are used to adjust the frame size to 266 bits. Therefore, by removing the first octet and the last 121 padding bits, AMR-WB Frame Type, Quality bit, and 132 data bits identical to that of AMR-WB mode 0 are obtained, which is compliant with Interface Format 2 [2].
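
The removal of the preamble octet and the padding bits from the three interoperable full-rate frames can be illustrated with the Python sketch below. For readability, frames are represented as strings of "0" and "1" characters rather than packed octets; this representation and the name full_rate_to_amr_wb are illustrative choices and are not part of the specification.

```python
FULL_RATE_TYPE = "11111000"                  # first octet of the preamble
AMR_WB_DATA_BITS = {0: 132, 1: 177, 2: 253}  # AMR-WB modes 0, 1, and 2


def full_rate_to_amr_wb(bits):
    """Split a 266-bit interoperable full-rate frame.

    bits is a string of '0'/'1' characters. Returns (ft, q, data), where
    data holds the AMR-WB compatible data bits without any padding.
    """
    if len(bits) != 266 or bits[:8] != FULL_RATE_TYPE:
        raise ValueError("not an interoperable full-rate frame")
    ft = int(bits[8:12], 2)                  # AMR-WB Frame Type
    if ft not in AMR_WB_DATA_BITS:
        raise ValueError(f"unexpected AMR-WB frame type {ft}")
    q = int(bits[12])                        # Frame Quality Bit
    data = bits[13:13 + AMR_WB_DATA_BITS[ft]]
    return ft, q, data                       # trailing padding is dropped
```

For an 8.85 kbps frame, for example, the sketch keeps 177 data bits and drops the last 76 padding bits, since 8 + 4 + 1 + 177 + 76 = 266.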

To satisfy the dim-and-burst requirement of the CDMA system, where the speech codec data rate is reduced to accommodate signaling traffic, three interoperable half-rate frame types may be generated depending on the AMR-WB codec mode in an interoperable interconnection. The VMR-WB Interoperable Half-Rate has the following structures depending on the AMR-WB codec mode [10]. Note that the type of the interoperable half-rate is determined by examining the preamble bits of a Rate-Set II Half-Rate frame.

Note that the use of interoperable Half-Rate Frames in VMR-WB is to comply with the dim-and-burst signaling requirement of the CDMA system; however, since there is no corresponding half-rate mode within AMR-WB, the Interworking Function SHALL convert these frames into the corresponding AMR-WB frames. There is NO transcoding involved; the conversion is accomplished in the transport layer by adding bits to, or removing bits from, the beginning and the end of the packets, depending on the direction. For non-interoperable modes NO Interworking Function is required.

The VMR-WB 12.65 kbps Interoperable Half-Rate has the following structure:
  
<-----------------------------------124 Bits----------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits-------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+------------------------------------------------+-+-+
|1|1|0|1|1|1|1|1|0|0|1|0|1|   AMR-WB mode2 Class A, B, C Bits w/o FCB Bits |0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+------------------------------------------------+-+-+
             		        
Note that the Fixed-Codebook (FCB) indices (i.e., the last 144 bits [1]) are removed from the end of the AMR-WB mode 2 bits so that they fit the Rate-Set II Half-Rate frame size.

The first octet "11011111" denotes the Interoperable Half-Rate Type, the next 4 bits "0010" indicate the AMR-WB Frame Type, and the 13th bit is the Frame Quality Bit Q of AMR-WB. The IWF removes the FCB indices in an AMR-WB to VMR-WB interoperable interconnection and upon dim-and-burst signaling. In a VMR-WB to AMR-WB interoperable interconnection, the IWF replaces the removed bits with random bits when an interoperable half-rate frame is received.

The VMR-WB 8.85 kbps Interoperable Half-Rate has the following structure:
  
<-----------------------------------124 Bits----------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits-------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+----------------------------------------------------+
|1|1|0|1|1|1|1|1|0|0|0|1|1|    AMR-WB mode 1 Class A, B, C Bits w/o FCB Bits   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+----------------------------------------------------+
             		        
Note that some of the Fixed-Codebook (FCB) indices (i.e., the last 66 bits [1]) are removed from the end of the AMR-WB mode 1 bits so that they fit the Rate-Set II Half-Rate frame size.

The first octet "11011111" denotes the Interoperable Half-Rate Type, the next 4 bits "0001" indicate the AMR-WB Frame Type, and the 13th bit is the Frame Quality Bit Q of AMR-WB. The IWF removes the FCB indices in an AMR-WB to VMR-WB interoperable interconnection and upon dim-and-burst signaling. In a VMR-WB to AMR-WB interoperable interconnection, the IWF replaces the removed bits with random bits when an interoperable half-rate frame is received.

The VMR-WB 6.6 kbps Interoperable Half-Rate has the following structure:
  
<-----------------------------------124 Bits----------------------------------->
<------Preamble-----------><----------AMR-WB Compatible Data Bits-------------->
0              7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+----------------------------------------------------+
|1|1|0|1|1|1|1|1|0|0|0|0|1|    AMR-WB mode 0 Class A, B, C Bits w/o FCB Bits   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+----------------------------------------------------+
		        
Note that some of the Fixed-Codebook (FCB) indices (i.e., the last 21 bits [1]) are removed from the end of the AMR-WB mode 0 bits so that they fit the Rate-Set II Half-Rate frame size.

The first octet "11011111" denotes the Interoperable Half-Rate Type, the next 4 bits "0000" indicate the AMR-WB Frame Type, and the 13th bit is the Frame Quality Bit Q of AMR-WB. The IWF removes the FCB indices in an AMR-WB to VMR-WB interoperable interconnection and upon dim-and-burst signaling. In a VMR-WB to AMR-WB interoperable interconnection, the IWF replaces the removed bits with random bits when an interoperable half-rate frame is received.
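
The three interoperable Half-Rate layouts above can be told apart from the 13-bit preamble alone. The following Python sketch illustrates one way to do so. It is not part of the format definition: the function name, the LAYOUT table, and the choice to model a frame as a string of '0'/'1' characters are assumptions made for illustration only. The bit counts are the ones quoted in this appendix.

```python
# Sketch only: a frame is modelled as a str of '0'/'1' characters.
HALF_RATE_BITS = 124         # Rate-Set II Half-Rate frame size
HALF_RATE_TYPE = "11011111"  # first octet: Interoperable Half-Rate Type

# AMR-WB Frame Type -> (full-frame speech bits, FCB bits removed, trailing zero bits)
LAYOUT = {
    0: (132, 21, 0),
    1: (177, 66, 0),
    2: (253, 144, 2),
}


def parse_interoperable_half_rate(frame):
    """Return (frame_type, q_bit, kept_speech_bits) for one 124-bit frame."""
    if len(frame) != HALF_RATE_BITS or set(frame) - {"0", "1"}:
        raise ValueError("expected 124 characters of '0'/'1'")
    if frame[:8] != HALF_RATE_TYPE:
        raise ValueError("first octet is not the Interoperable Half-Rate Type")
    frame_type = int(frame[8:12], 2)   # next 4 bits: AMR-WB Frame Type
    if frame_type not in LAYOUT:
        raise ValueError("no interoperable half-rate layout for this Frame Type")
    q_bit = int(frame[12])             # 13th bit: Frame Quality Bit Q
    total, removed, pad = LAYOUT[frame_type]
    kept = total - removed             # speech bits that survive FCB removal
    # Sanity check: preamble + kept speech bits + padding fill the frame.
    assert 13 + kept + pad == HALF_RATE_BITS
    return frame_type, q_bit, frame[13:13 + kept]
```

In each case the preamble, the retained speech bits, and any trailing zero bits add up to 124 bits (13 + 111 for modes 0 and 1, 13 + 109 + 2 for mode 2).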

The comfort noise (CN) and Silence Descriptor update (SID_UPDATE) data bits during silence intervals are transmitted through Rate-Set II CNG Quarter-Rate, which has the following frame structure:
 
<-------------------------------------54 Bits---------------------------------->
<-Preamble-><---AMR-WB Compatible Data Bits-->
 0 1 2 3 4             
+-+-+-+-+-+----------------------------------+---------------------------------+
|1|0|0|1|1|     AMR-WB SID Bits (35 bits)    |         14 Padding Bits         |
+-+-+-+-+-+----------------------------------+---------------------------------+

The IWF in an AMR-WB to VMR-WB interoperable interconnection SHALL add the preamble and the padding bits to the beginning and the end of AMR-WB SID_UPDATE bits, respectively, to form a Rate-Set II CNG Quarter-Rate frame. These bits SHALL be removed from the VMR-WB incoming CNG Quarter-Rate packet to form the outgoing SID_UPDATEs for AMR-WB in an interoperable interconnection.
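
As an illustration of the CNG Quarter-Rate mapping just described, the sketch below wraps and unwraps a 35-bit AMR-WB SID_UPDATE. The function names and the string-of-bits representation are hypothetical and chosen only for readability; the padding is written as zeros, following the IWF procedure in Appendix B.

```python
# Sketch only: bits are modelled as a str of '0'/'1' characters.
CNG_PREAMBLE = "10011"   # 5-bit preamble of the CNG Quarter-Rate frame
SID_BITS = 35            # AMR-WB SID_UPDATE bits
CNG_PADDING = 14         # padding bits at the end of the frame
CNG_FRAME_BITS = 54      # 5 + 35 + 14


def sid_to_cng_quarter_rate(sid):
    """AMR-WB -> VMR-WB: add the preamble and the (zero) padding bits."""
    if len(sid) != SID_BITS:
        raise ValueError("expected 35 SID_UPDATE bits")
    return CNG_PREAMBLE + sid + "0" * CNG_PADDING


def cng_quarter_rate_to_sid(frame):
    """VMR-WB -> AMR-WB: strip the preamble and the padding bits."""
    if len(frame) != CNG_FRAME_BITS or not frame.startswith(CNG_PREAMBLE):
        raise ValueError("not a 54-bit CNG Quarter-Rate frame")
    return frame[len(CNG_PREAMBLE):len(CNG_PREAMBLE) + SID_BITS]
```

The two functions are inverses of each other, which reflects the fact that no transcoding takes place in either direction.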

Appendix B- Interworking Function (IWF) for Interoperable AMR-WB <-> VMR-WB Interconnections

The output bit stream of the VMR-WB codec has been arranged to closely follow the AMR-WB Interface Format 2 frame structure [2,10]. However, to comply with the constraints of CDMA Link-Layer Assisted Service Options and the CDMA2000 Multiplex Sublayer [12,13], the output frame size of the CDMA speech codec at any encoding rate SHALL conform to the sizes allowed by CDMA Multiplex Option 2 [12,13] as shown in Table 1. To ensure a transparent speech data flow between 3GPP/AMR-WB and 3GPP2/VMR-WB, an interworking function operating at the transport layer is required. This is merely an RTP translator as envisioned in RFC 3550. The following describes the functions performed by the IWF in an interoperable interconnection. Note that NO interworking function is needed if VMR-WB codecs are incorporated at both ends. Also note that if the path between AMR-WB and VMR-WB does not include the CDMA2000 air interface, the IWF MAY be eliminated depending on the implementation. An example of such a case is the use of AMR-WB and VMR-WB for Internet-based multimedia applications that do not involve any cellular network.

+------------+                                                    +------------+
|   VMR-WB   |                                                    |   AMR-WB   |
+------------+<-----------Session Initiation Using SIP----------->+------------+
|Intermediate|                                                    |Intermediate|
|  Protocol  |                                                    |  Protocol  |
|   Layers   |                        Gateway                     |   Layers   |
+------------+              +-----------------------+             +------------+
|     RTP    |              | Interworking Function |             |     RTP    |
+------------+              +-----------+-----------+             +------------+
|     UDP    |              |    UDP    |    UDP    |             |     UDP    |               
+------------+              +-----------+-----------+             +------------+
|     IP     |              |     IP    |     IP    |             |     IP     |
+------------+              +-----------+-----------+             +------------+
|  Data Link |              | Data Link | Data Link |             |  Data Link |
|    Layer   |              |   Layer   |   Layer   |             |    Layer   |
+------------+              +-----------+-----------+             +------------+
|  Physical  |              |  Physical |  Physical |             |  Physical  |
|    Layer   |<------------>|    Layer  |    Layer  |<----------->|    Layer   |
+------------+              +-----------+-----------+             +------------+

Figure B-1: The data flow in an interoperable interconnection between VMR-WB and AMR-WB.

When receiving an RTP payload from VMR-WB destined for AMR-WB, the IWF SHALL perform the following procedure for every speech frame within the payload (the payload may contain more than one speech frame).

- The preamble bits of the incoming speech data block SHALL be examined to determine the AMR-WB codec mode as well as to appropriately set the FT field of the outgoing payload ToC.

- In the VMR-WB to AMR-WB path, the CMR of the outgoing payload SHALL be set to 2. This means that the VMR-WB codec always requests the AMR-WB encoder to operate in codec mode 2.

- For incoming FT values 0, 1, or 2, remove the 13-bit preamble from the beginning of the incoming speech data block and form the AMR-WB compatible speech data block as follows:

* If FT field of the incoming payload ToC is 0, the extra 121 padding bits SHALL be removed from the end of the speech data block.
* If FT field of the incoming payload ToC is 1, the extra 76 padding bits SHALL be removed from the end of the speech data block.
* If FT field of the incoming payload ToC is 2, the speech data block must be used as is.

- If FT field of the incoming payload ToC is 9, the 5-bit preamble in the beginning and the extra 14 padding bits at the end SHALL be removed from the incoming speech data block to form an AMR-WB SID_UPDATE frame.

- If FT field of the incoming payload ToC is 4 and the content of the incoming preamble is 11011111, the FT field of the outgoing payload ToC SHALL be set to 0, 1, or 2 depending on the contents of the four most significant bits of the second octet in the incoming preamble. If the aforementioned bits are 0010, 0001, or 0000, the FT field of the outgoing payload is set to 2, 1, or 0, respectively, and 144 (FT=2), 66 (FT=1), or 21 (FT=0) randomly generated bits must be added to the end of the speech data to form a regular AMR-WB speech frame with 253, 177, or 132 bits of speech data corresponding to codec modes 2, 1, or 0, respectively. Also the 8-bit preamble at the beginning of the incoming speech block SHALL be removed.

Note that the interoperable half-rate is used by the VMR-WB codec to comply with CDMA dim-and-burst signaling requirements when using CDMA2000 Link Layer Assisted Protocols for real-time VoIP services [13,14]. This is an unlikely situation if the path between the source and destination codecs does not include the CDMA2000 air interface; in such cases, the IWF MAY be eliminated depending on the implementation of the terminal codecs.
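
The VMR-WB to AMR-WB procedure above can be sketched per frame as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name and the string-of-bits representation are hypothetical. For FT=4 the sketch strips all 13 leading bits (type octet, Frame Type, and Q) and, for mode 2, ignores the two trailing zero bits shown in Appendix A; both choices are assumptions made so that the result has exactly 132, 177, or 253 speech bits.

```python
import random

# Sketch only: bits are modelled as a str of '0'/'1' characters.
SPEECH_BITS = {0: 132, 1: 177, 2: 253}   # AMR-WB codec modes 0..2
FCB_REMOVED = {0: 21, 1: 66, 2: 144}     # bits dropped in half-rate frames
OUTGOING_CMR = 2                         # CMR set by the IWF in this direction


def vmrwb_to_amrwb(ft_in, block):
    """Return (outgoing FT, AMR-WB speech bits) for one incoming frame."""
    if ft_in in (0, 1, 2):
        if len(block) != 266:
            raise ValueError("expected a 266-bit Full-Rate block")
        # Drop the 13-bit preamble; slicing also drops 121/76/0 padding bits.
        return ft_in, block[13:13 + SPEECH_BITS[ft_in]]
    if ft_in == 9:
        if len(block) != 54:
            raise ValueError("expected a 54-bit CNG Quarter-Rate block")
        return 9, block[5:40]            # drop 5-bit preamble, 14 padding bits
    if ft_in == 4 and block[:8] == "11011111" and len(block) == 124:
        ft_out = int(block[8:12], 2)     # Frame Type carried in the preamble
        if ft_out not in SPEECH_BITS:
            raise ValueError("unexpected Frame Type in half-rate preamble")
        n = FCB_REMOVED[ft_out]
        kept = SPEECH_BITS[ft_out] - n
        # Assumption: strip 13 leading bits, then append n random bits.
        filler = format(random.getrandbits(n), "0{}b".format(n))
        return ft_out, block[13:13 + kept] + filler
    raise ValueError("frame type not handled by this sketch")
```

A payload carrying several frames would call the function once per ToC entry and set the CMR of the outgoing payload to OUTGOING_CMR.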
 
When receiving an RTP payload from AMR-WB destined for VMR-WB, the IWF SHALL perform the following procedure for every speech frame within the payload (the payload may contain more than one speech frame).

- The valid CMR values in the AMR-WB to VMR-WB path are 0, 1, or 2.

- If FT field of the incoming payload ToC is 0, 121 padding bits (zeros) SHALL be added to the end of the outgoing speech data block and special bit pattern "1111100000001" SHALL be added to the beginning of the outgoing speech data block.

- If FT field of the incoming payload ToC is 1, 76 padding bits (zeros) SHALL be added to the end of the outgoing speech data block and special bit pattern "1111100000011" SHALL be added to the beginning of the outgoing speech data block.

- If FT field of the incoming payload ToC is 2, NO padding bits (zeros) SHALL be added to the end of the outgoing speech data block and special bit pattern "1111100000101" SHALL be added to the beginning of the outgoing speech data block.

- If FT field of the incoming payload ToC is 9, 14 padding bits (zeros) SHALL be added to the end of the speech data block. The special bit pattern "10011" SHALL be added to the beginning of the outgoing speech block.
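
The AMR-WB to VMR-WB rules above amount to a table lookup followed by concatenation. The sketch below shows this under the assumption that bits are held as a string of '0'/'1' characters; the function and table names are hypothetical. The input sizes (132, 177, 253, and 35 bits) are the AMR-WB frame sizes quoted elsewhere in this document.

```python
# Sketch only: bits are modelled as a str of '0'/'1' characters.
# Incoming FT -> special bit pattern placed in front of the speech data.
PREAMBLE = {
    0: "1111100000001",
    1: "1111100000011",
    2: "1111100000101",
    9: "10011",
}
PADDING = {0: 121, 1: 76, 2: 0, 9: 14}           # zero bits appended
PAYLOAD_BITS = {0: 132, 1: 177, 2: 253, 9: 35}   # expected AMR-WB input size


def amrwb_to_vmrwb(ft, speech):
    """Return the outgoing VMR-WB speech data block for one AMR-WB frame."""
    if ft not in PREAMBLE:
        raise ValueError("FT value not valid in the AMR-WB to VMR-WB path")
    if len(speech) != PAYLOAD_BITS[ft]:
        raise ValueError("unexpected number of bits for this FT")
    return PREAMBLE[ft] + speech + "0" * PADDING[ft]
```

FT values 0, 1, and 2 all yield 266-bit blocks, and FT 9 yields a 54-bit block, which matches the Rate-Set II Full-Rate and CNG Quarter-Rate frame sizes.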

During an AMR-WB to VMR-WB interoperable interconnection, the CDMA Multiplex Sublayer in the 3GPP2/CDMA2000 link MAY request a half-rate speech frame. In that case, the IWF SHOULD convert the interoperable Full-Rate frames (i.e., FT values 0, 1, or 2) to interoperable Half-Rate frames in the outgoing payload using the following procedure. (This is an unlikely situation if the path between the source and destination codecs does not include the CDMA2000 air interface; in such cases the IWF MAY be eliminated depending on the implementation of the terminal codecs.)

- The size of the outgoing speech data must be adjusted to 124 bits; therefore extra bits SHALL be removed from the end of the incoming speech data. If the FT field of the incoming payload is 2, 1, or 0, then 144, 66, or 21 bits SHALL be removed from the end of the incoming speech data block, respectively, to form a Rate-Set II interoperable Half-Rate speech frame for VMR-WB.

- The corresponding FT field of the outgoing payload SHALL be set to 4.

- The first 13 bits of the outgoing speech data block SHALL be set to "1101111100101" (for FT=2), "1101111100011" (for FT=1), or "1101111100001" (for FT=0), depending on the value of the FT field of the incoming payload (to be interpreted as interoperable half-rate frame by the VMR-WB decoder at the receiving side).
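
The Full-Rate to Half-Rate conversion can be sketched as below, using the per-mode counts given in Appendix A (21, 66, and 144 FCB bits removed for AMR-WB modes 0, 1, and 2). The function name and the string-of-bits representation are hypothetical. Filling the remaining two bits of a mode 2 frame with zeros follows the frame diagram in Appendix A and is otherwise an assumption of this sketch.

```python
# Sketch only: bits are modelled as a str of '0'/'1' characters.
HALF_RATE_FT = 4       # FT value of the outgoing payload
HALF_RATE_BITS = 124   # Rate-Set II Half-Rate frame size
HALF_PREAMBLE = {
    0: "1101111100001",
    1: "1101111100011",
    2: "1101111100101",
}
SPEECH_BITS = {0: 132, 1: 177, 2: 253}   # incoming AMR-WB speech bits
FCB_REMOVED = {0: 21, 1: 66, 2: 144}     # bits removed from the end


def full_to_half_rate(ft_in, speech):
    """Return (outgoing FT, 124-bit interoperable Half-Rate frame)."""
    if ft_in not in HALF_PREAMBLE:
        raise ValueError("only FT 0, 1, or 2 can be converted")
    if len(speech) != SPEECH_BITS[ft_in]:
        raise ValueError("unexpected number of speech bits for this FT")
    kept = speech[:len(speech) - FCB_REMOVED[ft_in]]
    frame = HALF_PREAMBLE[ft_in] + kept
    # Mode 2 leaves two spare bits; they are filled with zeros here.
    frame += "0" * (HALF_RATE_BITS - len(frame))
    return HALF_RATE_FT, frame
```

For modes 0 and 1 the preamble and the retained bits already total 124 bits (13 + 111), so no fill bits are appended.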

References
Normative References
  
   [1]  3GPP TS 26.190 "AMR Wideband speech codec; Transcoding
        functions", version 5.1.0 (2001-12), 3rd Generation
        Partnership Project (3GPP).

   [2]  3GPP TS 26.201 "AMR Wideband speech codec; Frame Structure",
        version 5.0.0 (2001-03), 3rd Generation Partnership Project
        (3GPP).

   [3]  S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", IETF RFC 2119, March 1997.

   [4]  3GPP TS 26.193 "AMR Wideband Speech Codec; Source Controlled
        Rate operation", version 5.0.0 (2001-03), 3rd Generation
        Partnership Project (3GPP).

   [5]  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications",
        IETF RFC 3550, July 2003.

   [6]  J. Sjoberg, et al., "Real-Time Transport Protocol (RTP) Payload Format  
        and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive 
        Multi-Rate Wideband (AMR-WB) Audio Codecs", IETF RFC 3267, June 2002.

   [7]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
        aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).

   [8]  M. Handley and V. Jacobson, "SDP: Session Description
        Protocol", IETF RFC 2327, April 1998.

   [9]  3GPP2 S.R.0080-0 "CDMA2000 Wideband Speech Codec Stage 1 Requirements", 
        3GPP2 Technical Specification, February 2003.

   [10] 3GPP2 C.P.0052-0 "Source-Controlled Variable-Rate Multimode Wideband 
        Speech Codec Service Option for Wideband Spread Spectrum Communication 
        Systems", 3GPP2 Technical Specification, to be published in April 2004.

   [11] 3GPP2 C11-20030915-003R2 "CDMA2000 Wideband Speech Codec 
        Characterization Test Plan", 3GPP2 Technical Specification, September 
        2003.

   [12] 3GPP2 C.P.9021 "Link-Layer Assisted Robust Header Compression, Service 
        Options for Voice-over-IP Operation", 3GPP2 Technical Specification, 
        September 2002.

   [13] 3GPP2 C.S.0003A-2 "Medium Access Control (MAC) Standard for CDMA2000  
        Spread Spectrum Systems", Release A, 3GPP2 Technical Specification, 
        February 2002.

   [14] 3GPP2 C.S0005-0 "Upper Layer (Layer 3) Signaling Standard for
        cdma2000 Spread Spectrum Systems", Release 0, 3GPP2 Technical    
        Specification, June 2002. 

   
Informative References

   
   [15] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based
        Congestion Control for Unicast Applications", ACM SIGCOMM
        2000, Stockholm, Sweden.

   [16] J. Rosenberg, and H. Schulzrinne, "An RTP Payload Format
        for Generic Forward Error Correction", IETF RFC 2733,
        December 1999.

   [17] Baugher, et al., "The Secure Real Time Transport Protocol",
        IETF Draft (Work in Progress), November 2001.

   [18] C. Perkins, et al., "RTP Payload for Redundant Audio Data",
        IETF RFC 2198, September 1997.

   [19] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
        with Minimal Control", IETF RFC 3551, July 2003.

   Any 3GPP document can be downloaded from
   the 3GPP web server, "http://www.3gpp.org/", see specifications.
   Any 3GPP2 document can be downloaded from
   the 3GPP2 web server, "http://www.3gpp2.org/", see specifications.

Author's Address

The editor will serve as the point of contact for all technical matters related to this document.

   Dr. Sassan Ahmadi                     Phone: 1 (858) 831-5916
                                         Fax: 1 (858) 831-6513
   Nokia Inc.                            EMail: sassan.ahmadi@nokia.com
   12278 Scripps Summit Dr.
   San Diego, CA 92131 USA
   
   This Internet-Draft expires in six months from November 2003.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

