Robust Header Compression                               Peter J. McCann
  INTERNET DRAFT                                               Tom Hiller
  Document: draft-mccann-rohc-gehcoarch-01.txt        Lucent Technologies
                                                           February, 2001


      Requirements and Architecture for Zero-Byte Header Compression


  Status of this Memo

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026 [Bradner96].

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups. Note that
     other groups may also distribute working documents as Internet-
     Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


  1. Abstract

     Efficient transmission of voice over wireless links requires
     significant engineering effort.  Because of the high cost of
     bandwidth on such links, special techniques for compression of
     voice data and its transmission over the air have been developed.
     The compression techniques and the wireless physical layers have
     been co-designed for maximum spectral efficiency and human
     perceptual euphony.

     Voice over IP (VOIP) applications should be able to leverage this
     engineering effort when used over wireless links.  We advocate a
     "zero-byte header compression" approach to this problem in order
     to enable the end-to-end service model while achieving maximum
     spectral efficiency.  This document outlines an architectural
     framework for a wireless VOIP application, including the wireless
     link layer and its interface to typical IP stack implementations,
     and discusses the protocol elements that should be standardized
     between the various components.


  McCann, Hiller             Expires 08/2001                           1
                                GEHCOARCH                 February, 2001


  2. Introduction

     Voice over IP (VOIP) promises to change radically the way that
     telephony services are built and delivered.  Integration of voice
     with the Internet will not just be a change in the way traffic is
     carried; rather, new types of services will be made possible by
     the integration of voice with existing Internet applications such
     as the World Wide Web and e-mail.  The key to these new services
     will be a platform that offers open programmability while offering
     a transport for VOIP in an integrated, robust, and efficient way.

     Wireless links offer great challenges to the transport of voice
     traffic, and significant engineering effort has gone into making
     them efficient for circuit voice applications.  New voice
     compression algorithms ("codecs"), such as EVRC [TIA-IS127], SMV
     [TIA-SMV], or AMR [ETSI-AMR], have been developed to minimize the
     amount of data that must be carried, and special over-the-air
     channels have been implemented to carry these codecs with a
     minimum of overhead bits and minimal latency.

     VOIP flows will be carried inside the Real-time Transport Protocol
     (RTP) [Shulzrinne96] on wired links.  However, for wireless links,
     the situation is less clear.  The limited bandwidth of wireless
     links makes it impractical to transmit the entire IP/UDP/RTP
     header with every packet, as the overhead would be prohibitive.
     It is
     possible to compress these headers by transmitting only updates to
     the fields that change rather than the entire header [Bormann00],
     but these compression schemes are complex and can never entirely
     eliminate the overhead due to RTP.  Even when the header is
     compressed down to one byte per frame on average, the impact on
     spectral capacity is significant.  Also, the variable-sized frames
     produced by these compression protocols are unsuitable for typical
     wireless links that support only a limited number of frame sizes.

     The fundamental reason why these schemes cannot achieve the same
     efficiency as circuit data is that they discard information that
     is available at the physical channel layer, including the real-
     time nature of the traffic, which can assist in reconstructing the
     RTP header.  This document describes an architectural framework
     that allows such real-time information to be used while not
     restricting the choice of call control protocol, placement of call
     feature servers, or mobile station architecture.  We show that
     such a scheme can achieve complete transparency in the
     reconstruction of IP headers given a few reasonable assumptions
     about the behavior of the real-time physical link and the RTP
     packet stream.  A companion document [Hiller01] gives a concrete
     realization of part of the architecture by extending the base ROHC
     protocol with a zero-byte compression profile.


  2.1 Wireless Technology Considerations

     Cellular wireless technologies will support distinct bearer
     channels for real-time audio flows versus non-real-time data.
     Data for TCP, such as web or e-mail traffic, will suffer from the
     lossy nature of the wireless link unless a link-layer
     retransmission protocol is used to improve its reliability.  Such
     a retransmission protocol (called the Radio Link Protocol or RLP
     in the emerging cellular data networks) does improve reliability
     but only at the expense of additional buffering and latency.
     Real-time audio streams cannot tolerate the additional latency,
     which could be on the order of 1 second under adverse radio
     conditions.  For this reason, a separate bearer channel will be
     used for voice that does no retransmission.  This bearer will be
     very similar to the existing circuit voice channels.  The
     architecture outlined below allows the mobile station to make
     effective use of this channel for VOIP.

     The architecture of a mobile station should allow maximum
     flexibility in its hardware and software choices.  Two basic
     mobile station models have been identified in the wireless data
     community.  A "network model" station is one that is completely
     integrated, such as a phone plus browser or a palmtop with
     integrated radio hardware.  Such a device usually has a real-time
     operating system, a DSP chip for processing the audio codec, and
     an embedded IP stack implementation.  In contrast, a "relay model"
     station is one that is split in two: it consists of a piece of
     terminal equipment (such as a laptop computer) connected to a
     piece of radio equipment, usually by a serial connection.  The
     idea is to make use of the mostly stock operating system on the
     terminal equipment, while "relaying" the data to and from the
     wireless network via the radio equipment.  We take the point of
     view that VOIP applications must be supported for both kinds of
     mobile stations.  While network model phones will offer a tightly
     integrated set of services, relay model stations are likely to
     offer a much more open and programmable environment on the
     terminal equipment.  As these devices evolve we expect the
     distinction between network and relay models will blur as the
     wireless device moves closer to the UNIX notion of a "network
     interface" to a stock operating system, and the operating system
     evolves to take on more real-time functionality.


  3. Requirements

     A zero-byte header compression scheme makes use of physical
     channel timing to accurately reconstruct the RTP header fields.
     The basic requirement of such a scheme is to accurately
     reconstruct headers while adding a minimum of overhead bits when
     compared to ordinary circuit voice.


     Approximate voice activity factors (probability distribution of
     frame sizes) for the Selectable Mode Vocoder (SMV) are given in
     Figure 1.  These reflect one party's activity during a typical
     two-way interactive voice call.

               Rate            Activity %     Payload (bits)

               Full               20              171
               Half               20               80
               Quarter            10               40
               Eighth             50               16

          Figure 1: Activity of the 3GPP2 Selectable Mode Vocoder

     This vocoder is designed to operate synchronously with the
     underlying physical channel: it outputs one of the above frame
     sizes every 20 milliseconds.  Which frame size is output depends
     on the characteristics of the speech being compressed; typically,
     full-rate (171 bit) frames are used during active talk spurts,
     interspersed with half- and quarter-rate frames as needed.
     Eighth-rate frames are used mainly during silence periods, but
     they also contain information about the noise components present
     in the silence, which is referred to as "comfort noise
     generation".  Also, the physical link typically requires that some
     frame be transmitted during every 20ms interval so that power
     control can be maintained, and the eighth-rate frames play this
     role.

     The cdma2000 air interface has been designed with these frame
     sizes in mind, to support optimal transport of circuit voice.  It
     is not possible to perform a marginal adjustment to the frame
     sizes to accommodate header overhead.  This makes application of
     the basic ROHC RTP profile problematic at best: if one byte of
     LSB-encoded sequence number is added to a frame, it must be
     carried in the next-higher frame format.  For a full-rate frame,
     there is no next-higher frame format and so those frames could not
     be transported without breaking the synchronization with the
     underlying physical link and introducing additional framing, for
     example with the use of PPP HDLC flags or the ROHC segmentation
     mechanism.  This would introduce another 1 or 2 bytes of overhead
     per frame, and would also have a multiplier effect on the frame
     error rate since most vocoder frames would now span two physical
     frames.  Finally, this lack of synchronization would introduce an
     occasional lag between the vocoded frame time and real time that
     could add to the end-to-end latency and jitter of the RTP flow.

     Even a very conservative calculation, assuming these problems can
     be overcome and ignoring the contribution from eighth-rate 16 bit
     frames, yields an additional 400 bits per second from the header
     and segmentation overheads.  Compared to the average 3720 bps
     circuit voice rate, this overhead (greater than 10%) would
     significantly diminish the number of calls that can be handled in
     a given amount of spectrum.  We conclude that because the codec
     and physical link have been co-engineered to such tight
     tolerances, we should endeavor to use the vocoder/physical link
     largely unchanged from its existing implementation for circuit
     voice.  By not imposing any new format requirements on the vocoded
     frames, we allow development of future codecs to proceed with
     maximum flexibility.
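
     The arithmetic behind the 400 bits per second estimate is not
     spelled out above.  The following Python sketch shows one set of
     assumptions that reproduces it.  The frame rate, the activity
     factors of Figure 1, and the 3720 bps average come from the text;
     the per-frame overhead of two bytes on every non-eighth-rate frame
     (one byte of header plus one byte of framing or segmentation) is
     an illustrative assumption, and other accountings are possible.

```python
FRAMES_PER_SECOND = 50      # one vocoder frame every 20 ms
CIRCUIT_VOICE_BPS = 3720    # average circuit voice rate quoted in the text

# Activity factors from Figure 1, in percent of frames at each rate.
ACTIVITY_PERCENT = {"full": 20, "half": 20, "quarter": 10, "eighth": 50}

# Assumption (not from the draft): every non-eighth-rate frame pays one
# byte of compressed header plus one byte of framing or segmentation.
ASSUMED_OVERHEAD_BITS = {"full": 16, "half": 16, "quarter": 16, "eighth": 0}


def overhead_bps():
    """Average added bits per second under the assumptions above."""
    weighted = sum(ACTIVITY_PERCENT[rate] * ASSUMED_OVERHEAD_BITS[rate]
                   for rate in ACTIVITY_PERCENT)
    return weighted * FRAMES_PER_SECOND / 100


def overhead_fraction():
    """Overhead relative to the quoted average circuit voice rate."""
    return overhead_bps() / CIRCUIT_VOICE_BPS
```

     Under these assumptions the overhead comes to 400 bps, a little
     under 11% of the quoted circuit voice rate.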

     By making use of the real-time nature of the physical link, it is
     possible to eliminate header overhead while retaining the
     transparency offered by the basic ROHC RTP profile.  This is
     because the time at which a frame arrives can be used as an
     indication of the proper sequence number that should be assigned
     to the corresponding RTP packet.  In the opposite direction,
     packets can be scheduled for transmission during the correct
     physical layer interval so that all end-to-end semantics are
     preserved.

     In order for real-time to serve as a proxy for the RTP sequence
     number, it must be the case that the sequence number increments by
     one for every physical layer epoch.  This would be satisfied if
     the transmitter sends a vocoded frame for every epoch, as is done
     by the existing cdma2000 vocoders even during silence intervals.
     Under this assumption, synchronization for play-back or for
     cryptosync could then be based on the accurate sequence number or
     timestamp included with each packet. Note that in 3G systems the
     mobile node transmits continuously even during silence so that the
     network may monitor power.
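
     As a minimal sketch of how real time can stand in for the sequence
     number, the following Python fragment derives the RTP sequence
     number and timestamp from a count of elapsed physical layer
     epochs.  The names TimingContext and reconstruct are hypothetical,
     and the timestamp step of 160 assumes an 8 kHz RTP clock with
     20 ms frames; neither is specified by this document.

```python
from dataclasses import dataclass

SAMPLES_PER_FRAME = 160  # assumption: 8 kHz RTP clock x 20 ms frames


@dataclass
class TimingContext:
    first_seq: int        # RTP sequence number of the frame in epoch 0
    first_timestamp: int  # RTP timestamp of the frame in epoch 0


def reconstruct(ctx, epoch):
    """Return (sequence number, timestamp) for the frame in `epoch`.

    `epoch` counts 20 ms physical layer intervals since the channel was
    established; both peers are assumed to agree on this count.
    """
    seq = (ctx.first_seq + epoch) % (1 << 16)  # 16-bit RTP field
    ts = (ctx.first_timestamp + epoch * SAMPLES_PER_FRAME) % (1 << 32)
    return seq, ts
```

     Because both fields are computed from the epoch alone, a frame
     lost on the air does not disturb the numbering of later frames.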

     We assume that IPv4 Identifiers (used in fragmentation and re-
     assembly) can be taken from a contiguous range and do not need to
     be encoded with every packet.  Only when a new range of
     Identifiers is chosen does an update need to be sent.  We also
     assume that other fields of an RTP header such as CSRCs are
     updated rarely, if at all, and so these updates can be carried
     over the sister reliable data link to the peer without imposing
     much additional overhead.  Finally, we assume that other events,
     such as a reset of the physical link due to hard handoff or a
     sequence number slippage due to clock drift, are similarly rare.
     All of these updates can be transported over the sister reliable
     data link when they occur.  These updates can make use of the same
     mechanism used to initialize the flow parameters, in order to
     minimize the complexity of the decompressor, or special optimized
     update methods may be developed.
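
     The rare updates described above can be pictured as re-anchoring
     the context at a given epoch.  The Python sketch below is one
     possible shape for this, not a proposed message format: the
     FlowContext fields and the apply_update function are hypothetical,
     and it assumes the IPv4 Identifier advances by one per packet
     within its contiguous range.

```python
from dataclasses import dataclass


@dataclass
class FlowContext:
    seq_base: int    # RTP sequence number assigned to epoch `epoch_base`
    epoch_base: int  # physical layer epoch at which the bases apply
    ip_id_base: int  # IPv4 Identifier assigned to epoch `epoch_base`


def seq_for(ctx, epoch):
    """Sequence number implied by the context for a given epoch."""
    return (ctx.seq_base + (epoch - ctx.epoch_base)) % (1 << 16)


def ip_id_for(ctx, epoch):
    """IPv4 Identifier implied by the context for a given epoch."""
    return (ctx.ip_id_base + (epoch - ctx.epoch_base)) % (1 << 16)


def apply_update(ctx, *, epoch, seq=None, ip_id=None):
    """Re-anchor the context at `epoch`.

    Fields carried in the update replace the predicted values; fields
    left as None keep whatever the old context predicted for `epoch`.
    """
    new_seq = seq if seq is not None else seq_for(ctx, epoch)
    new_id = ip_id if ip_id is not None else ip_id_for(ctx, epoch)
    return FlowContext(seq_base=new_seq, epoch_base=epoch, ip_id_base=new_id)
```

     For example, a sequence number slippage of two frames detected at
     epoch 10 would be repaired by an update carrying only the new
     sequence number; the Identifier range continues undisturbed.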


     In summary, we pose the following set of requirements for a zero-
     byte compression scheme:

            - Little or no overhead compared to circuit voice

            - No change to the format of voice payloads

            - Transparent decompression of all IP/UDP/RTP fields

     To meet these requirements, we make the following assumptions
     about the operating environment:

            - The underlying link is synchronous: it provides precise
              indication of when a given frame arrived.

            - The RTP stream is well behaved: the sequence number is
              incremented every 20 milliseconds, and IPv4 Identifiers
              are taken from a contiguous range.


  4. Reference Architecture

     Our reference architecture is shown in Figure 2.


       Other NRT       VOIP            Zero-Byte
         Apps        Control------------Control
            \              \               |
             \              \              |
              +-------------IP Protocol    |
                              Stack       /
                                |        /
                                |       /
       Header Comp/  ------Data Link--+                   Peer
         Decomp      \        Layer                       System
                      \___      |                            |
                          \     |                            |
        Audio       Codec  \    +-------------->Physical<----+
       Hardware<--->Impl <--+------------------>Channel(s)<--+

      Figure 2.  Reference architecture for a system implementing
                 zero-byte header compression.


     The architecture diagram consists of nine components connected to
     a peer system.  Note that we expect zero-byte header compression
     to be somewhat asymmetric in that it will usually be implemented
     between a mobile station, where the VOIP and other applications
     reside, and a peer network entity that is just a data link
     termination point and a first-hop Internet router.  As such, the
     peer system in the network will likely be missing the audio
     hardware and codec implementation, and may not participate in the
     VOIP control.  Also, the mobile station may not need to actually
     perform header compression and decompression if its codec
     implementation is connected directly to the physical channel,
     which will likely be required to achieve the desired latency
     guarantees.

     The component named "Zero-Byte Control" would consist of the
     protocol logic used to set up and maintain the zero-byte header
     compression context.  We assume this will be realized as a profile
     extension to the ROHC framework, and give a concrete realization
     of the needed protocols in a companion document [Hiller01].

     In the following subsections we discuss each of the architectural
     elements in turn.  The next section will discuss the interfaces
     between them.


  3.1 Non Real-time Components

     It is important to distinguish between the real-time and non real-
     time components of Figure 2.  This is especially important for a
     relay model mobile station, as it impacts which elements of stock
     operating systems can be reused and which must be implemented as
     new real-time extensions.  In this subsection we examine the non
     real-time components.


  3.1.1 VOIP Control

     The VOIP control component is the implementation of the call
     signaling protocol, such as SIP [Handley00] or H.323 [ITU-H323].
     We make no assumptions on which protocol is used, and we do not
     require the network-side peer system to contain this element.  The
     mobile station will use one of the VOIP signaling protocols to
     interact with call feature servers that could be anywhere on the
     Internet.

     We assume that this component will open network-layer connections
     and will have access to the transport endpoint identifiers for the
     IP/UDP/RTP flow.  However, we do not require this element to
     actually process audio data; it will probably be implemented in
     user-space and would therefore add unpredictable latency to such
     flows.


  3.1.3 IP Protocol Stack Implementation

     We assume that the mobile station implements an IP protocol stack
     in conformance with RFC 1122 [Braden89].  Note that such an
     implementation is usually not capable of supporting hard real-time
     tasks.


  3.1.4 Data Link Layer

     The data link layer is the interface between the IP protocol stack
     and the wireless network device.  For cdma2000, this will be PPP
     [TIA-IS835].  For GPRS, this will be LLC [ETSI-LLC], and for UMTS,
     this will be PDCP [ETSI-PDCP].

     For cdma2000, we assume a mostly stock PPP implementation for
     interaction with the physical channels that support data and
     perform retransmission.  However, because the data link layer is
     not a hard real-time component, we would not place it on the audio
     traffic path inside the mobile station.


  3.1.5 Zero-Byte Control

     The Zero-Byte Control component is responsible for negotiating the
     use of ROHC parameters with the peer system and for setting up
     context information such as the fixed portion of the IP/UDP/RTP
     header.  It will interact with the VOIP control component to
     acquire these parameters, and will send them across the data link
     layer to the peer system.  It will also interact with the wireless
     device (possibly through the data link layer) to establish the
     physical audio channels and will identify the channel to be used
     when sending context information to the peer system.  It will also
     need to receive indications from the physical layer when the
     channel is reset, such as during hard handoff, so that the context
     can be re-synchronized.  Finally, it should get indications from
     the audio hardware and codec (or the header compression component
     if no codec implementation is present) about sequence number
     slippage due to clock drift so that re-synchronization updates can
     be sent to the peer.


  3.1.6 Other Non-Real-Time Applications

     We expect the terminal equipment to be a general-purpose computer
     that, as such, will have other applications running.  These
     applications may interact with other components such as the IP
     protocol stack, but in general will not be hard real-time tasks.
     These applications must co-exist with all the other components.


  3.2 Real-time Components

     Because we make use of the real-time nature of the physical
     channel, several components must be implemented as real-time
     tasks.  For a network model phone, this is similar to existing
     practice: a tightly integrated, real-time operating system on an
     embedded device schedules the audio sampling and playback to
     coincide with the physical frame rate of the wireless link.  For a
     relay model terminal, we wish to make use of the audio hardware on
     the connected terminal equipment.  This may require that the
     components be implemented using special real-time extensions to
     existing stock operating systems.


  3.2.1 Audio Hardware

     The audio hardware consists of the analog-to-digital (A/D) and
     digital-to-analog (D/A) converters used for sampling and playing
     back sound, along with the analog microphones and speakers.  In a
     network model phone this consists of the integrated equipment that
     is part of the phone.  In a relay model terminal it would be the
     "sound card" or other audio peripheral.  To achieve the required
     hard real-time performance we assume that special software drivers
     may be required in such relay model terminals.


  3.2.2 Codec Implementation

     The codec implementation converts the sampled audio to and from
     the special wireless-specific encoding format.  For a network
     model phone, this encoding is carried out on dedicated Digital
     Signal Processing (DSP) hardware.  In a relay model terminal, we
     assume this is performed on the general purpose CPU of the
     terminal equipment.


  3.2.3 Physical Channel

     As mentioned before, there will be two physical channels
     supporting the mobile station: one that runs RLP retransmission,
     supporting the latency tolerant data applications; and another
     that resembles a voice circuit.  VOIP control signaling will
     traverse the data-oriented RLP channel, while the voice bearer
     traffic will traverse the real-time circuit-like channel.

     Both channels must be available to the upper layers regardless of
     whether a relay model or network model terminal is used.  The
     voice channel supports real-time traffic and performs no
     buffering.  It will send a frame at precise, periodic intervals,
     such as 20 milliseconds for cdma2000.  The codec implementation
     must be able to supply frames for the physical channel at exactly
     this rate.

     We assume that a physical channel is established at a given point
     in time and that both peers can count the number of frame
     intervals that have elapsed so far.  The channel may be "reset" by
     handoff events (such as hard handoffs in cdma2000) that do not
     necessarily result in a change of peer system, but which may
     require re-synchronization of compression/decompression state with
     the physical channel.


  3.2.4 Header Compression/Decompression

     We expect the codec implementation to be directly connected to the
     physical channel on the mobile terminal side, and so concrete
     IP/UDP/RTP headers may not necessarily appear inside the mobile
     terminal.  Therefore, the header compression/decompression
     component is only really necessary on the network side of the
     zero-byte header compression protocol.  This component is drawn
     next to the data link layer in the diagram, and may in fact be
     integrated into the data link layer implementation.  It is
     responsible for classifying each packet coming down from the IP
     protocol stack against the fixed IP/UDP/RTP header fields we are
     attempting to compress.  The value of these fields is established
     by the Zero-Byte control component and installed into the header
     compression component, possibly via the data link layer.  Once the
     header has been stripped this component must schedule the payload
     for transmission on the physical layer at the appropriate frame
     interval, according to the sequence number and timestamp received
     in the header.
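
     The classification and scheduling step just described can be
     sketched in Python as follows.  This is an illustration under
     stated assumptions rather than a definitive design: the choice of
     addresses, ports, and SSRC as the fixed fields, and the names
     FlowKey, CompressorContext, and frame_interval, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    # Assumed set of fixed fields; the draft does not enumerate them.
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    ssrc: int


@dataclass
class CompressorContext:
    key: FlowKey    # fixed fields installed by Zero-Byte Control
    first_seq: int  # sequence number mapped to frame interval 0


def frame_interval(ctx, key, seq):
    """Return the frame interval for a packet, or None if no match.

    The interval is relative to `first_seq`, modulo the 16-bit sequence
    space; tracking wraparound on long calls is left out of this sketch.
    """
    if key != ctx.key:
        return None  # not the compressed flow; use the data channel
    return (seq - ctx.first_seq) % (1 << 16)
```

     A packet that matches the installed fields is stripped of its
     header and queued for the returned interval; any other packet
     follows the ordinary data path.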

     In the opposite direction, when packets arrive on the network side
     from the physical channel, this component is responsible for
     regenerating the proper IP/UDP/RTP header and passing the packet
     on to the IP protocol stack.  It makes use of the physical arrival
     time to generate the proper timestamp and sequence number in the
     RTP header.

     Because the header compression/decompression component is sending
     and receiving packets from the IP protocol stack, it is at best a
     soft real-time component.  However, it must interact with the
     physical voice channel, which is a hard real-time component, both
     to properly record the frame arrival time and to schedule outgoing
     packets for transmission.  If the header compression/decompression
     is implemented in a separate network element from the physical
     channel, as is likely to be the case in the emerging cellular
     architectures [TIA-IS835], then this interaction could be
     accomplished with the proper use of sequence numbers on the
     interfaces between them so that each physical frame carries the
     information about precisely when it arrived or when it is to be
     transmitted.


  4. Interfaces

     In this section we examine the interfaces between the above
     components.  We distinguish between those interfaces that should
     be implemented as protocols, suitable for standardization in the
     IETF or elsewhere, and those that should remain Application
     Programming Interfaces (APIs) that may or may not need to be
     standardized.


  4.1 Protocol Reference Points

     In terms of new protocols, the interfaces that need to be
     standardized are listed below.  Some of these interfaces are
     opportunities for IETF protocols, while others should be carried
     out by other standards-setting organizations.


  4.1.1 Zero-Byte Control to Data Link Layer

     The Zero-Byte control component needs to negotiate the use of ROHC
     with its peer and convey the static portion of the IP/UDP/RTP
     header to the peer.  This should be done in such a way that the
     network side is not required to participate in the VOIP control
     protocol.  This means the network side depends on the mobile
     station to tell it which RTP flows should be classified by the
     header stripping component as appropriate for sending over the
     physical voice channel.  Rather than create a new
     network-layer protocol, we advocate using new data link messages
     between the two systems to convey this information.

     We advocate extending ROHC with a new zero-byte profile.  The
     GEHCO proposal [Hiller01] is an attempt at this, and this work
     should be carried out in the IETF.  The particular mapping of ROHC
     onto a data link layer such as PPP should also be performed in the
     IETF.  The mapping of ROHC onto other link layers is a job for
     other standards bodies.


  4.1.2 Data Link Layer to Physical Channel

     Mobile terminals running PPP will typically generate an octet
     stream that is appropriate for an underlying physical channel
     running RLP.  However, prior to running PPP the mobile terminal
     must take steps to establish the channel.  Also, we require that
     the terminal be able to dynamically establish and release the
     voice channels used for real-time audio.  For a network model
     phone this may be supported by APIs within the phone, but for a
     relay model terminal this signaling needs to be carried out across
     a serial port.  Such signaling is usually the province of a
     modem control protocol ("AT commands") and standardization is
     probably best carried out in the International Telecommunication
     Union (ITU).  Note that in addition to the usual signaling to
     establish and release channels, we also need to obtain identifying
     information for each channel, along with a precise reference for
     the time at which the first physical voice frame will be
     transported across the wireless link.  This information will be
     used by the Zero-Byte control component to communicate the initial
     timestamp and sequence number offsets to the peer.  It must be
     possible to signal this information during a running PPP session.
     Also, the precise timing of handoff events in the network must be
     communicated to the Zero-Byte control implementation so that it
     can properly re-synchronize the compression state or carry out
     negotiation with the new peer system, if the handoff resulted in a
     change of attachment point.  Note that some handoff events
     resulting in a reset of the physical channel will not result in a
     change of peer attachment point, depending on the architecture of
     the underlying access network.  If Zero-Byte control state is
     proactively transferred from a source peer system to a target peer
     system, the relationship between RTP timestamps and physical layer
     frames must be preserved.


  4.1.3 Physical Channel to Codec or Header Compression/Decompression

     As stated above, the physical channel will interface directly to
     the codec implementation on the mobile station side and to a
     header compression/decompression process on the network side.  For
     a network model phone, the codec interface may be a proprietary
     API.  However, for a relay model terminal, we must standardize a
     new way to transport the frames across a serial connection in
     real-time.  This will require that we multiplex the real-time
     frames with the non-real-time data for PPP.  This multiplexing
     could be carried out with the use of escape characters on the
     serial interface; again, this work is probably best carried out
     within the ITU.  Any new special characters would need to be
     properly inserted into the ACCM of the PPP implementation.
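
     To illustrate why a new special character must appear in the
     ACCM, the Python sketch below applies the octet stuffing rule of
     PPP in HDLC-like framing: the flag and escape octets are always
     escaped, as is any control character whose bit is set in the
     ACCM.  The marker value 0x10 is a hypothetical choice made only
     for this example; no such character has been standardized.

```python
PPP_FLAG = 0x7E
PPP_ESCAPE = 0x7D
VOICE_MARKER = 0x10  # hypothetical escape character for voice frames


def ppp_escape(data, accm):
    """Octet-stuff `data` as in PPP HDLC-like framing.

    Bit n of `accm` set means control character n (0x00-0x1F) must be
    escaped.  Escaped octets are sent as 0x7D followed by the octet
    XORed with 0x20.
    """
    out = bytearray()
    for b in data:
        in_accm = b < 0x20 and (accm >> b) & 1
        if b in (PPP_FLAG, PPP_ESCAPE) or in_accm:
            out.append(PPP_ESCAPE)
            out.append(b ^ 0x20)
        else:
            out.append(b)
    return bytes(out)
```

     With the marker's bit set in the ACCM, the marker never appears
     unescaped in the PPP octet stream, so the serial interface can
     treat a bare marker as the start of a real-time voice frame.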

     On the network side, the physical voice channel may be separated
     from the header compression/decompression process by an IP
     network.  If this is the case, then each physical frame must
     carry a sequence number that indicates the exact frame time at
     which it was received or is to be transmitted over the air.
     Standardization of such interfaces is best carried out within the
     3rd Generation Partnership Projects (3GPPs).
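
     A minimal sketch of such an encapsulation follows, assuming a
     hypothetical five-octet header (a 32-bit frame number and an
     8-bit payload length) in front of each physical frame.  The
     header layout and function names are illustrative only; the
     actual format would be defined by the relevant standards body.

```python
import struct

# Hypothetical header: 32-bit frame number, 8-bit payload length.
HDR = struct.Struct("!IB")


def encapsulate(frame_no: int, payload: bytes) -> bytes:
    """Prefix a physical frame with the frame time at which it was received
    or is to be transmitted over the air."""
    return HDR.pack(frame_no & 0xFFFFFFFF, len(payload)) + payload


def decapsulate(datagram: bytes):
    """Return (frame number, payload) from an encapsulated frame."""
    frame_no, length = HDR.unpack_from(datagram)
    return frame_no, datagram[HDR.size:HDR.size + length]


def order_by_frame_time(datagrams):
    """Restore air-interface order after reordering in the IP network.
    Wrap-around of the frame number is ignored in this sketch."""
    return [payload for _, payload in sorted(map(decapsulate, datagrams))]
```

     With the frame time carried explicitly, the header
     compression/decompression process can tolerate jitter and
     reordering on the IP network without losing track of which
     physical frame it is handling.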


  4.2 API Reference Points

     Other interfaces between the components are best realized as
     Application Programming Interfaces (APIs) and may or may not need

  McCann, Hiller             Expires 08/2001                          12
                                GEHCOARCH                 February, 2001


     to be standardized.  In any case we do not advocate the
     standardization of APIs within the IETF and we discuss these
     interfaces for illustration purposes only.


  4.2.1 VOIP Control to Zero-Byte Control

     The VOIP control component is responsible for end-to-end VOIP
     signaling such as SIP [Handley00] or H.323 [ITU-H323].  We expect
     these applications to be implemented by many different people and
     to use standard operating system interfaces.  Also, these
     applications should work the same way when used in wireless or
     wireline settings, except that the codecs should be tailored for
     the specific link layer currently in use.

     When used over wireless links, we expect that applications will
     want to make use of the optimized real-time path outlined above
     (audio hardware to codec to physical channel) rather than taking
     audio data into user space, performing a user space codec
     transformation, constructing RTP packets, and writing them to a
     standard UDP socket.  Such user space manipulation of audio
     traffic would introduce unpredictable latency to the flow.

     To enable the optimized real-time path, the VOIP control protocol
     should signal to the Zero-Byte control component that it has
     completed VOIP signaling and is ready to begin audio bearer flow.
     This signal might be a system call containing the IP/UDP/RTP
     parameters that have been negotiated and the codec to be used.
     This system call would be a one-line addition to existing VOIP
     client implementations.
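
     For illustration purposes only, such a signal might look like the
     following sketch.  The FlowParams fields, the ZeroByteControl
     class, and the start_flow() call are all hypothetical stand-ins
     for whatever interface an operating system might offer; none of
     them is defined by this draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowParams:
    """Parameters negotiated by VOIP signaling (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    ssrc: int
    payload_type: int
    codec: str


class ZeroByteControl:
    """Stand-in for the Zero-Byte control component; not a real OS interface."""

    def __init__(self):
        self.flows = []

    def start_flow(self, params: FlowParams) -> int:
        """Hypothetical 'system call': register the flow and return a handle."""
        if not 0 <= params.payload_type <= 127:
            raise ValueError("RTP payload type is a 7-bit field")
        self.flows.append(params)
        return len(self.flows) - 1


zero_byte = ZeroByteControl()


def on_signaling_complete(params: FlowParams) -> int:
    # The one-line addition to an existing VOIP client:
    return zero_byte.start_flow(params)
```

     The rest of the client is unchanged: signaling still runs over
     ordinary sockets, and only the bearer path is handed to the
     Zero-Byte control component.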


  4.2.2 Zero-Byte Control to Real-time Path

     When the Zero-Byte control component receives a signal from the
     VOIP control component that the VOIP signaling has been completed,
     it must take the following steps:

       1) Open the new physical voice bearer channel;

       2) Send the peer system information about the flow, including
          the static header fields and identification of the physical
          bearer channel; and, finally,

       3) Trigger the audio hardware to begin sampling, and the codec
          implementation to begin encoding/decoding.

     The first step could be accomplished via an interface to the data
     link layer, or may be accomplished directly.  In any case we need
     to acquire precise timing information about when the first
     physical voice frame will be sent, and this timing needs to be
     related to the internal time reference that will be used for RTP
     timestamps.
     In the second step, this timing information is used to inform the
     peer what header fields should be placed on the first physical
     frame.  The third step requires interaction with the real-time
     components such as the audio hardware and codec implementation, to
     enable the real-time data to start flowing.
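
     The relationship between physical frames and RTP header fields
     can be sketched as follows, assuming that every physical frame
     carries exactly one codec frame and therefore advances the RTP
     timestamp by a fixed amount (for example, 160 for 20 ms frames
     sampled at 8 kHz).  The class and field names are illustrative.
     The RTP sequence number is a 16-bit field and the RTP timestamp a
     32-bit field, so both wrap around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTiming:
    """Initial offsets communicated to the peer in step 2 (names assumed)."""
    first_seq: int      # RTP sequence number of the first physical frame
    first_ts: int       # RTP timestamp of the first physical frame
    ts_per_frame: int   # timestamp units per physical frame

    def header_fields(self, frame_index: int):
        """Return (sequence number, timestamp) for the frame_index-th
        physical frame, counted from 0 at channel establishment."""
        seq = (self.first_seq + frame_index) % 2**16
        ts = (self.first_ts + frame_index * self.ts_per_frame) % 2**32
        return seq, ts
```

     Once the peer holds these three values, it can regenerate the
     changing RTP fields purely from the position of each frame on the
     physical channel, which is why nothing needs to be sent with the
     frame itself.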

     Note that whenever an event takes place that requires re-
     synchronization of the compression state, such as a physical layer
     reset or sequence number slippage due to clock drift, the Zero-
     Byte control component must update its peer with the appropriate
     state.  This update should include an offset, calculated from the
     time the channel was established or reset, indicating to which
     physical layer frame the update applies.  Such offset-indicating
     updates should also be sent when any of the normally static header
     fields, such as TTL, TOS, or CSRCs, change.  This will enable
     completely transparent decompression of RTP header fields.
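
     A minimal sketch of how a decompressor might apply such
     offset-indicating updates is given below.  It assumes that each
     update carries a frame offset together with the sequence number
     and timestamp that apply to that frame, and that frame indices
     are non-negative.  The class and method names are hypothetical.

```python
import bisect


class Decompressor:
    """Regenerates RTP sequence numbers and timestamps from frame positions."""

    def __init__(self, seq0: int, ts0: int, ts_per_frame: int):
        self.ts_per_frame = ts_per_frame
        # Each entry is (frame offset, seq, ts); kept sorted by offset.
        self.updates = [(0, seq0, ts0)]

    def apply_update(self, offset: int, seq: int, ts: int) -> None:
        """Record that, from physical frame 'offset' onward, numbering
        restarts at the given sequence number and timestamp."""
        bisect.insort(self.updates, (offset, seq, ts))

    def header_fields(self, frame_index: int):
        """Return (seq, ts) using the latest update at or before frame_index."""
        key = (frame_index, 2**16, 2**32)  # sorts after any real entry
        offset, seq, ts = self.updates[bisect.bisect_right(self.updates, key) - 1]
        n = frame_index - offset
        return (seq + n) % 2**16, (ts + n * self.ts_per_frame) % 2**32
```

     Because the update names the frame to which it applies, it may
     arrive slightly before or after that frame without causing any
     header to be reconstructed incorrectly.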


  4.2.3 Header Compression/Decompression to Data Link Layer

     The header compression component must classify all traffic from
     the IP protocol stack as to whether it is part of the RTP flow
     that needs to be sent on the voice physical channel.  Because it
     must examine each packet, it will probably be fairly tightly
     integrated with the data link layer.
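
     As a sketch of the classification step, the following function
     tests whether an IPv4 packet belongs to the negotiated RTP flow
     by comparing addresses, UDP ports, and the RTP SSRC.  The layout
     of the flow tuple and the choice of fields to match are
     assumptions; a real implementation would also verify the static
     fields that were signaled to the peer.

```python
import socket
import struct


def is_voice_flow(packet: bytes, flow: tuple) -> bool:
    """Return True if an IPv4 packet belongs to the negotiated RTP flow.

    flow = (src_ip, dst_ip, src_port, dst_port, ssrc), with dotted-quad
    strings for the addresses.  This tuple layout is an assumption.
    """
    src_ip, dst_ip, src_port, dst_port, ssrc = flow
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False                       # not IPv4
    ihl = (packet[0] & 0x0F) * 4           # IP header length in octets
    if ihl < 20 or packet[9] != 17 or len(packet) < ihl + 8 + 12:
        return False                       # not UDP, or too short for RTP
    if struct.unpack_from("!H", packet, 6)[0] & 0x3FFF:
        return False                       # fragmented: leave on the PPP path
    if packet[12:16] != socket.inet_aton(src_ip):
        return False
    if packet[16:20] != socket.inet_aton(dst_ip):
        return False
    if struct.unpack_from("!HH", packet, ihl) != (src_port, dst_port):
        return False
    rtp = ihl + 8                          # RTP header follows the UDP header
    if packet[rtp] >> 6 != 2:
        return False                       # RTP version must be 2
    return struct.unpack_from("!I", packet, rtp + 8)[0] == ssrc
```

     Packets for which the function returns False continue to travel
     over the ordinary data link layer path.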

     The header decompression component produces IP packets from the
     physical voice frames and sends them up the IP protocol stack.
     Getting packets to the IP protocol stack may be implemented by
     passing the packets through the data link layer.


  4.2.4 Other Interfaces

     The mobile terminal will potentially be executing many
     simultaneous applications and we expect all of the standard
     interfaces (network sockets, GUI) to be present.  Note that
     ordinary applications may want to use the audio hardware at the
     same time as a voice call is in progress.  This could be
     disallowed, or a special "audio mixer" process could be introduced
     between the audio hardware and the codec implementation to allow
     such simultaneous access.  For example, a system beep noise might
     be mixed into the telephone call in such a way that only the
     mobile terminal user would hear it.

     Much has been made of the proper reconstruction of the IP
     Identification field for each RTP packet.  We note that RTP
     payloads are required to stay within the path MTU [Handley99] and
     should never experience fragmentation.  However, in order to avoid
     any possibility of Identification field collision with other
     packets that may be fragmented, a new interface could be
     implemented between the Zero-Byte control and the IP protocol
     stack to "reserve" a range of Identification values for use by the
     RTP flow.  If the header decompression component always increments
     the Identification field by one for each reconstructed header, and
     wraps around to the beginning when the range is about to overflow,
     then no additional work is necessary to ensure uniqueness of IP
     Identification fields.
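
     The reserved-range behavior described above can be sketched as
     follows.  The IdRange class and its parameters are hypothetical;
     the only fixed fact is that the IP Identification field is 16
     bits wide.

```python
class IdRange:
    """A block of IP Identification values reserved for one RTP flow."""

    def __init__(self, start: int, size: int):
        if size < 1 or start < 0 or start + size > 2**16:
            raise ValueError("range must fit in the 16-bit Identification field")
        self.start = start
        self.size = size
        self.offset = 0

    def next_id(self) -> int:
        """Return the Identification value for the next reconstructed
        header, wrapping to the start of the range when it is exhausted."""
        value = self.start + self.offset
        self.offset = (self.offset + 1) % self.size
        return value
```

     For example, IdRange(100, 4) yields 100, 101, 102, 103, and then
     100 again, while the IP protocol stack allocates values for all
     other packets from outside that range.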


  5. Conclusions

     This draft has presented an architecture for zero-byte header
     compression and its implications for both a mobile station and the
     supporting network.  On the network side, with this architecture
     the peer in the network does not need to be aware of the VOIP
     control between the mobile and a SIP/H.323 server that could be
     anywhere in the network.  When the header compression/
     decompression is performed in a network element that is physically
     separated from the physical channel (e.g. a PDSN from 3GPP2 [TIA-
     IS835]), the hard real-time requirements on this element can be
     alleviated through the proper use of sequence numbers on its
     interface to the radio channel elements.

     On the mobile side, this draft provides high-level requirements
     for support of zero-byte header compression in the form of
     protocol interfaces and APIs.  Both monolithic network model
     mobiles and relay model mobiles with laptops are discussed.
     Proper architecture of the mobile station allows the segregation
     of hard real-time processing from the non-real-time IP stack and
     applications.  Furthermore, convergence of wireline and wireless
     applications is a long-standing goal in the wireless industry.
     This architecture allows VOIP-based applications developed for
     wireline access to run on mobile end systems in the wireless
     environment (although with wireless-specific codecs).  The impact
     on VOIP applications could be as little as one line of code in
     the VOIP client itself.

     Finally, the draft has outlined protocol work items suitable for
     the IETF as well as external standards bodies, including the ITU
     and 3rd Generation Partnership Projects.  Any necessary APIs could
     be standardized by a collaboration between operating system
     vendors (open source or otherwise) and third party application
     developers, driven by wireless service providers.


  6. References

     [Bormann01]    Bormann, C. (ed.), "RObust Header Compression
                    (ROHC)," draft-ietf-rohc-rtp-09.txt, March 2001.
                    Work In Progress.

     [Braden89]     Braden, R. (ed.), "Requirements for Internet Hosts
                    -- Communication Layers," RFC 1122, October 1989.

     [Bradner96]    Bradner, S., "The Internet Standards Process,
                    Revision 3," RFC 2026, October 1996.

     [ETSI-AMR]     European Telecommunications Standards Institute,
                    "Adaptive Multi-Rate (AMR) Speech Transcoding," 3G
                    TS 26.090, February 2000.

     [ETSI-LLC]     European Telecommunications Standards Institute,
                    GSM 04.64.

     [ETSI-PDCP]    European Telecommunications Standards Institute, 3G
                    TS 25.323.

     [Handley99]    Handley, M., and Perkins, C., "Guidelines for
                    Writers of RTP Payload Format Specifications," RFC
                    2736, December 1999.

     [Handley00]    Handley, Schulzrinne, Schooler, Rosenberg, "SIP:
                    Session Initiation Protocol," draft-ietf-sip-
                    rfc2543bis-01.txt, August 2000.  Work In Progress.

     [Hiller01]     Hiller, T., and McCann, P., "Good Enough Header
                    COmpression (GEHCO)," draft-hiller-rohc-gehco-
                    01.txt, February 2001.  Work In Progress.

     [ITU-H323]     International Telecommunications Union, "Packet
                    Based Multimedia Communications Systems," ITU-T
                    Rec. H.323, September 1999.

     [Shulzrinne96] Schulzrinne, H., Casner, S., Frederick, R., and
                    Jacobson, V., "RTP: A Transport Protocol for Real-
                    Time Applications," RFC 1889, January 1996.

     [TIA-IS127]    Telecommunications Industry Association, "Enhanced
                    Variable Rate Codec, Speech Service 3 for Wideband
                    Spread Spectrum Digital Systems," TIA/EIA/IS-127,
                    February 1997.

     [TIA-IS835]    Telecommunications Industry Association, "Wireless
                    IP Network Standard," TIA/EIA/IS-835, June 2000.

     [TIA-SMV]      Telecommunications Industry Association,
                    "Selectable Mode Vocoder Service Option for
                    Wideband Spread Spectrum Communication Systems,"
                    TIA PN4575, 3GPP2 C.P9001, 1997.


  7. Authors' Addresses

     Peter J. McCann
     Lucent Technologies
     Rm 2Z-305
     263 Shuman Blvd
     Naperville, IL  60566-7050
     USA

     Phone: +1 630 713 9359
     FAX:   +1 630 713 4982
     EMail: mccap@lucent.com

     Tom Hiller
     Lucent Technologies
     Rm 2F-218
     263 Shuman Blvd
     Naperville, IL  60566-7050
     USA

     Phone: +1 630 979 7673
     FAX:   +1 630 979 7673
     EMail: tom.hiller@lucent.com


  Intellectual Property Statement

     The IETF takes no position regarding the validity or scope of any
     intellectual property or other rights that might be claimed to
     pertain to the implementation or use of the technology described
     in this document or the extent to which any license under such
     rights might or might not be available; neither does it represent
     that it has made any effort to identify any such rights.
     Information on the IETF's procedures with respect to rights in
     standards-track and standards-related documentation can be found
     in BCP-11.  Copies of claims of rights made available for
     publication and any assurances of licenses to be made available,
     or the result of an attempt made to obtain a general license or
     permission for the use of such proprietary rights by implementers
     or users of this specification can be obtained from the IETF
     Secretariat.

     The IETF invites any interested party to bring to its attention
     any copyrights, patents or patent applications, or other
     proprietary rights that may cover technology that may be required
     to practice this standard.  Please address the information to the
     IETF Executive Director.


  Full Copyright Statement

     Copyright (C) The Internet Society (2001). All Rights Reserved.
     This document and translations of it may be copied and furnished
     to others, and derivative works that comment on or otherwise
     explain it or assist in its implementation may be prepared,
     copied, published and distributed, in whole or in part, without
     restriction of any kind, provided that the above copyright notice
     and this paragraph are included on all such copies and derivative
     works. However, this document itself may not be modified in any
     way, such as by removing the copyright notice or references to the
     Internet Society or other Internet organizations, except as needed
     for the purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards
     process must be followed, or as required to translate it into
     languages other than English.

     The limited permissions granted above are perpetual and will not
     be revoked by the Internet Society or its successors or assigns.

     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

