Internet Engineering Task Force                         M. Reha Civanlar
  INTERNET-DRAFT                                             Glenn L. Cash
  File: draft-civanlar-bmpeg-00.txt                       Barry G. Haskell

                                                        AT&T Labs-Research

                                                              August, 1996


                   RTP Payload Format for Bundled MPEG


                           Status of this Memo

  This document is an Internet-Draft.  Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups.  Note that other groups may also distribute
  working documents as Internet-Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  To learn the current status of any Internet-Draft, please check the
  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
  ftp.isi.edu (US West Coast).

  Distribution of this memo is unlimited.


                                 Abstract

  This document describes a payload type for bundled, MPEG-2 encoded
  video and audio data to be used with RTP, version 2. Bundling has
  several advantages for this payload type, particularly when it is
  used for video-on-demand applications.  A scheme for pre-transmission
  of vital information to improve error resilience is also described.


  1. Introduction

  This document describes a bundled packetization scheme for MPEG-2
  encoded audio and video streams using the Real-time Transport Protocol
  (RTP), version 2 [1].


  The MPEG-2 International standard consists of three layers: audio,
  video and systems [2]. The audio and the video layers define the
  syntax and semantics of the corresponding "elementary streams." The
  systems layer supports synchronization and interleaving of multiple
  compressed streams, buffer initialization and management, and time
  identification. A previous Internet-Draft [3] describes packetization
  techniques to transport individual audio and video elementary streams,
  as well as the transport stream defined at the systems layer, using
  RTP.

  The bundled packetization scheme is needed because it offers several
  advantages over other schemes in some important applications,
  including video-on-demand (VOD). Its advantages over independent
  packetization of audio and video are:

    1. Uses a single port per "program" (i.e. bundled A/V). This may increase
    the number of streams that can be served from e.g. a VOD server.

    2. Reduces the header overhead. Because large packets amplify the effects
    of loss and delay, audio-only packets must be kept small, which increases
    their relative header overhead. An A/V bundled format can reduce this
    overhead by an order of magnitude.

    3. Provides implicit synchronization. Audio segments are carried together with
    corresponding video segments. No other mechanism is needed for synchronization.

    4. Reduces overall receiver buffer size. Audio and video streams, when
    transmitted separately, may experience different delays and buffers need to be
    designed for the longest delay.

  Its advantages over packetization of the transport stream are:

    1. Reduces overhead. The bundled format omits systems layer information,
    which should be redundant with RTP because both address essentially the
    same issues.

    2. Eases error recovery. Because the structured packetization is consistent
    with the Application Level Framing (ALF) principle, loss concealment and
    error recovery can be made simpler and more effective.

2. Encapsulation of Bundled MPEG Video and Audio

Video encapsulation follows the rules described in [3] with the addition
of the following:

  Each packet must contain an integral number of video slices.

The video data is followed by a sufficient number of integral audio
frames to cover the duration of the video segment included in the
packet. For example, if the first packet contains three slices of video,
each 1/900 seconds long, and Layer I audio coding is used at a 44.1 kHz
sampling rate, only one audio frame, covering 384/44100 seconds of
audio, need be included in this packet. Since this audio frame (8.71
msec) is longer than the video segment contained in the packet (3.33
msec), the next few packets may contain no audio frames, until a packet
whose video extends beyond the time covered by the previously
transmitted audio frames. In this proposal, the latest audio frame may
be repeated in such no-audio packets for packet loss resilience.
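
The covering rule above can be illustrated with a short Python sketch.
It is not part of the payload format; the function name and the
per-packet bookkeeping are assumptions made for illustration, and the
numbers are those of the example (three 1/900 second slices per packet,
Layer I audio at 44.1 kHz).

```python
from fractions import Fraction
import math

def audio_frames_needed(video_end, audio_covered, frame_duration):
    """Return how many new audio frames are needed so that the audio sent
    so far covers the video up to video_end (all times in seconds)."""
    shortfall = video_end - audio_covered
    if shortfall <= 0:
        return 0                      # earlier audio frames already cover it
    return math.ceil(shortfall / frame_duration)

LAYER1_FRAME = Fraction(384, 44100)   # one Layer I frame at 44.1 kHz
SLICE = Fraction(1, 900)              # slice duration from the example

video_end = Fraction(0)               # video time covered so far
audio_covered = Fraction(0)           # audio time covered so far
frames_per_packet = []
for _ in range(4):                    # four consecutive packets
    video_end += 3 * SLICE            # three slices per packet (assumed)
    n = audio_frames_needed(video_end, audio_covered, LAYER1_FRAME)
    audio_covered += n * LAYER1_FRAME
    frames_per_packet.append(n)

print(frames_per_packet)              # [1, 0, 1, 0]
```

Under these assumptions the first packet carries one audio frame, the
second carries none, and the third needs a new frame because its video
ends at 10 msec, past the 8.71 msec covered by the first frame.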

2.1. RTP Fixed Header for BMPEG Encapsulation

The following RTP header fields are used:

  Payload Type: A distinct payload type number should be assigned to
  BMPEG.

  M Bit: Set for packets containing the end of a picture.

  timestamp: 32-bit 90 kHz timestamp representing the transmission time
  of the MPEG picture. It is monotonically increasing and is the same
  for all packets belonging to the same picture. For packets that
  contain only a sequence, extension and/or GOP header, the timestamp is
  that of the subsequent picture.
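
As an illustration only, the following Python sketch derives a 90 kHz
timestamp for each picture and compares two timestamps across the 32-bit
wraparound. It assumes that pictures are transmitted at a constant
picture rate in index order; the function names, the initial_ts
parameter and that assumption are not taken from this document.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000        # timestamp clock rate given above
TS_MODULUS = 2**32           # the timestamp field is 32 bits wide

def rtp_timestamp(picture_index, picture_rate, initial_ts=0):
    """Timestamp shared by every packet of the given picture.
    Assumes a constant picture rate (hypothetical simplification)."""
    elapsed_ticks = Fraction(picture_index) * RTP_CLOCK_HZ / Fraction(picture_rate)
    return (initial_ts + int(elapsed_ticks)) % TS_MODULUS

def timestamp_is_newer(ts, reference):
    """True if ts is later than reference, allowing for 32-bit wraparound."""
    return 0 < (ts - reference) % TS_MODULUS < TS_MODULUS // 2

# 29.97 pictures per second gives 3003 ticks per picture.
print(rtp_timestamp(1, Fraction(30000, 1001)))   # 3003
```

Because the field wraps modulo 2**32, "monotonically increasing" has to
be checked with modular arithmetic, as timestamp_is_newer does.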

2.2. BMPEG Specific Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|O|R|U| P |    Audio Length   |         Audio Offset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  M: Reserved for future use (1 bit).

  O: Audio offset present (1 bit). Set if audio offset information is
  included.

  R: Redundant audio (1 bit). Set if the audio frame contained in the
  packet is a repetition of the last audio frame.

  U: Header data changed (1 bit). Set if any part of the video sequence,
  extension, GOP and picture header data differs from that of the
  previously sent headers. It is reset when all the header data is
  repeated.


  P: Picture type (2 bits). I (0), P (1), B (2) and D (3).

  Audio Length: Length of the audio data in bytes (10 bits).

  Audio Offset: (Optional, 16 bits) If the O bit is set, contains the
  offset between the audio frame and the start of the video segment in
  this packet, expressed as a number of audio samples.
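
The bit layout above can be exercised with a minimal Python sketch. Two
points are assumptions rather than statements of this document: the
16-bit Audio Offset field is treated as absent from the packet when the
O bit is clear, and it is treated as an unsigned value. The class and
function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

class BmpegHeader(NamedTuple):
    redundant_audio: bool               # R bit
    header_changed: bool                # U bit
    picture_type: int                   # P: 0=I, 1=P, 2=B, 3=D
    audio_length: int                   # audio bytes, 0..1023 (10 bits)
    audio_offset: Optional[int] = None  # audio samples; None clears O

def pack_header(h):
    """Serialize the header in network byte order."""
    if not 0 <= h.picture_type <= 3:
        raise ValueError("picture type needs 2 bits")
    if not 0 <= h.audio_length <= 0x3FF:
        raise ValueError("audio length needs 10 bits")
    has_offset = h.audio_offset is not None
    word = ((has_offset << 14)          # O; the reserved M bit (bit 15) stays 0
            | (h.redundant_audio << 13)
            | (h.header_changed << 12)
            | (h.picture_type << 10)
            | h.audio_length)
    out = struct.pack("!H", word)
    if has_offset:                      # assumption: field omitted when O=0
        out += struct.pack("!H", h.audio_offset)
    return out

def unpack_header(data):
    """Return (header, header size in bytes)."""
    (word,) = struct.unpack_from("!H", data, 0)
    offset, size = None, 2
    if word & 0x4000:                   # O bit set: offset field follows
        (offset,) = struct.unpack_from("!H", data, 2)
        size = 4
    header = BmpegHeader(
        redundant_audio=bool(word & 0x2000),
        header_changed=bool(word & 0x1000),
        picture_type=(word >> 10) & 0x3,
        audio_length=word & 0x3FF,
        audio_offset=offset,
    )
    return header, size

example = BmpegHeader(False, True, 2, 418, 96)   # illustrative values only
print(pack_header(example).hex())                # 59a20060
```

Note that the 10-bit Audio Length field limits the audio data described
by one header to 1023 bytes.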

3. Prior Transmission of the "High Priority" Information

In MPEG encoded video, loss of the header information, which includes
sequence, extension, GOP, and picture headers, causes severe degradation
of the decoded video. When possible, dependable transmission of the
header information to the receivers prior to the start of the real-time
transmission can improve the loss resilience of MPEG video significantly
[4].

The "data partitioning" method in MPEG-2 defines the syntax and
semantics for partitioning an MPEG-2 encoded video bitstream into "high
priority" and "low priority" parts. If the "high priority" (HP) part is
selected to contain only the header information, it is less than two
percent of the video data and can be used for pre-transmission. In order
to synchronize the HP data with the BMPEG stream, the initial value of
the timestamp for the BMPEG stream should be inserted at the beginning
of the HP data. The HP data may be transmitted as an RTP header
extension.

If the HP part is too long for pre-transmission, it may be transmitted
along with the A/V data using layered multimedia transmission techniques
for RTP [5].

Appendix 1. Error Recovery

Packet losses can be detected from a combination of the sequence number
and the timestamp fields of the RTP fixed header. The extent of the loss
can be determined from the timestamp, the slice number and the
horizontal location of the first slice in the packet. The slice number
and the horizontal location can be determined from the slice header and
the first macroblock address increment, which are located at fixed bit
positions.
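
A minimal Python sketch of the detection step follows. It assumes that
packets are not reordered in the network, and it uses the 16-bit
sequence number of the RTP fixed header [1]; the function names are
hypothetical. Determining the extent of the loss within a picture from
the slice header is not shown.

```python
SEQ_MODULUS = 2**16   # the RTP sequence number is 16 bits wide

def packets_lost(prev_seq, seq):
    """Number of packets missing between two consecutively received
    packets, allowing for sequence number wraparound. Assumes no
    reordering, so a late packet would be misread as a large loss."""
    return (seq - prev_seq - 1) % SEQ_MODULUS

def classify_gap(prev_seq, prev_ts, seq, ts):
    """Return (packets lost, whether both packets carry the same
    timestamp and therefore belong to the same picture)."""
    return packets_lost(prev_seq, seq), ts == prev_ts

print(packets_lost(65534, 1))               # 2 (packets 65535 and 0)
print(classify_gap(10, 9000, 13, 9000))     # (2, True)
```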

If lost data consists of slices all from the same picture, new data
following the loss can simply be given to the video decoder which will
normally repeat missing pixels from a previous picture. The next audio
frame must be delayed by the duration of the lost video segment.

If the new data received after a loss is from the next picture and the U
bit is not set, previously received headers for the particular picture
type (determined from the P bits) can be given to the video decoder,
followed by the new data.  If U is set, discarding data until a new
picture start code is advisable unless headers are available from
previously received HP data. In both cases the audio needs to be delayed
appropriately.

If data for more than one picture is lost and HP data is not available,
resynchronization to a new video sequence header is advisable.

In all cases of large packet losses, if the HP data is available,
appropriate portions of it can be given to the video decoder and the
received data can be used irrespective of the U bit value or the number
of lost pictures.
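
The recovery cases in this appendix can be summarized as a small
decision function. This Python sketch is one reading of the text, not a
normative procedure. The picture_distance argument (0 when the data
after the loss belongs to the same picture as the data before it, 1 when
it belongs to the next picture, larger when whole pictures are missing)
and the action names are assumptions introduced for illustration. Audio
delay handling is not shown.

```python
from enum import Enum, auto

class Action(Enum):
    PASS_TO_DECODER = auto()            # decoder conceals the missing pixels
    REPLAY_STORED_HEADERS = auto()      # headers kept for this picture type
    USE_HP_HEADERS = auto()             # headers from pre-transmitted HP data
    SKIP_TO_PICTURE_START = auto()      # drop data until a picture start code
    RESYNC_TO_SEQUENCE_HEADER = auto()  # wait for a video sequence header

def recovery_action(picture_distance, u_bit, hp_available):
    """Choose a recovery step for the first packet received after a loss."""
    if picture_distance == 0:
        return Action.PASS_TO_DECODER
    if hp_available:
        # HP data can be used irrespective of U or the number of lost pictures.
        return Action.USE_HP_HEADERS
    if picture_distance == 1:
        if u_bit:
            return Action.SKIP_TO_PICTURE_START
        return Action.REPLAY_STORED_HEADERS
    return Action.RESYNC_TO_SEQUENCE_HEADER

print(recovery_action(1, u_bit=False, hp_available=False).name)
# REPLAY_STORED_HEADERS
```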

Appendix 2. Resynchronization

As described in [3], frequent video sequence headers make it possible to
join a program at arbitrary times. They also reduce the
resynchronization time after severe losses.

References:

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
    "RTP: A Transport Protocol for Real-Time Applications,"
    RFC 1889, January 1996.

[2] ISO/IEC International Standard 13818, "Generic coding of moving
    pictures and associated audio information," November 1994.

[3] D. Hoffman, G. Fernando, S. Kleiman, V. Goyal, "RTP Payload Format for
    MPEG1/MPEG2 Video," Internet Draft, draft-ietf-avt-mpeg-01.txt,
    November 1995.

[4] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based
    video-on-demand over ATM packet networks and the WWW," Signal
    Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996.

[5] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia Streams,"
    Internet Draft, draft-speer-avt-layered-video-01.txt, June 1996.

Authors' Address:

   M. Reha Civanlar
   Glenn L. Cash
   Barry G. Haskell

   AT&T Labs-Research
   101 Crawfords Corner Road
   Holmdel, NJ 07733
   USA

   e-mail: civanlar|glenn|bgh@research.att.com

