Internet Engineering Task Force                Audio Visual Transport WG
Internet-Draft                       D. Curet, E. Gouleau, S. Relier,
                                     C. Roux / P. Clement / G. Cherry
                                           FT R&D / Thales BM / nCube
Document: draft-curet-avt-rtp-mpeg4-flexmux-03.txt
                                                         July 1, 2002
Expires: January 1, 2003


          RTP Payload Format for MPEG-4 FlexMultiplexed Streams
               draft-curet-avt-rtp-mpeg4-flexmux-03.txt


STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This specification is a product of the Audio/Video Transport working group within the Internet Engineering Task Force and of the ISO/IEC MPEG-4 ad hoc group on MPEG-4 over Internet. Comments are solicited and should be addressed to the working group's mailing list at rem-conf@es.net and/or the authors.

Section 9 of this document is intended for registering SDP names with IANA as in RFC 2048.

Abstract

MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic audio-visual data. This document describes a payload format for transporting MPEG-4 synchronised and multiplexed data using RTP. Several services provided by RTP are beneficial for transporting MPEG-4 encoded and multiplexed data over the Internet. Additionally, the use of RTP makes it possible to synchronize MPEG-4 data with other real-time data types.

curet et al.                 expires May 2002                  [Page 1]

Internet Draft     RTP payload for MPEG-4 FlexMux streams       July 02

1. Introduction

The MPEG-4 standard (ISO/IEC 14496) can be represented as a layered architecture in which three layers can be identified:

                   +---------------------------------------+
   media aware,    | COMPRESSION LAYER:                    |
                   | Elementary Streams (ES) encoding      |
   delivery unaware| MPEG-4 part 2 Visual                  |
   layer           | MPEG-4 part 3 Audio                   |
                   | MPEG-4 part 1 BIFS, OD, IPMP, OCI     |
                   +---------------------------------------+
   ===================================================== ESI Interface
                   +---------------------------------------+
   media and       | SYNC LAYER (SL)                       |
   delivery unaware| Elementary streams management         |
   layer           | and synchronisation                   |
                   +---------------------------------------+
   ===================================================== DAI Interface
                   +---------------------------------------+
   delivery aware, | DELIVERY LAYER (DMIF)                 |
   media unaware   |provides FLEXMULTIPLEXING of SL streams|
   layer           | and transparent access                |
                   | to the delivery technology            |
                   +---------------------------------------+

Although the Delivery Layer mostly focuses on the control plane, it also encompasses multiplexing tools, called the FlexMux tools, to multiplex MPEG-4 SL streams.

MPEG makes a "constant delivery delay" assumption: each MPEG-4 packet transmitted over the network should experience a nearly constant transmission delay. Under this assumption, the correct timing of MPEG-4 bitstreams can be reconstructed, and both the MPEG-4 SL stream syntax and the MPEG-4 FlexMux stream syntax support this reconstruction in a similar way.

Different payload formats have been defined, or are being defined, to carry some (or all) types of MPEG-4 elementary streams or to carry MPEG-4 SL streams; see [7], [14] and [15]. This document specifies an RTP payload format for carrying FlexMux streams.

2. MPEG-4 overview

2.1. Compression layer:

The compressed content produced by this layer consists of Elementary Streams (ESs), which are organised in Access Units (AUs).
An AU is the smallest element to which timestamps can be assigned. AUs are passed to the Synchronisation Layer (SL), together with timestamps, random access information and other information, through the ESI interface.

The Compression Layer processes the traditional individual audio/visual elementary streams (ESs) and some associated 'systems' elementary streams, such as BIFS, OD, IPMP and OCI elementary streams. The MPEG-4 audio/visual ES syntaxes are defined in [3] and [2]. The 'systems' ES syntaxes are described in [1]: the BIFS ES syntax allows a dynamic scene description, and the OD ES syntax allows the description of the hierarchical relations, location and properties of the different ESs through a dynamic set of Object Descriptors. The 'systems' ESs may need to be carried with better protection than the traditional audio/visual ESs.

The compression layer is unaware of any specific delivery technology, but it can react to the characteristics of a particular delivery layer, such as the path-MTU or the packet loss or bit error characteristics.

2.2. Synchronisation Layer:

The MPEG-4 SL stream syntax is defined in [1]. It provides a uniform and homogeneous encapsulation of any ES that is organised in AUs. This is the case for all MPEG-4 ESs, but it can also be the case for non-MPEG-4 ESs. The primary function of this layer is the synchronisation between ESs. Integer or fractional AUs from the same ES are encapsulated in SL packets that form an SL stream. SL packets are passed to the Delivery Layer (DMIF) through the DMIF Application Interface (DAI), which allows QoS requirements to be assigned to the delivery of SL streams.

The synchronisation layer is unaware of any specific delivery technology, but it can react to the characteristics of a particular delivery layer, such as the path-MTU or packet loss.

2.3. The Delivery Layer and FlexMultiplexing:

The MPEG-4 Delivery Layer consists of the Delivery Multimedia Integration Framework (DMIF) defined in [4]. This layer is media unaware but delivery technology aware. It provides transparent access to, and delivery of, content irrespective of the technologies used. Its interface supports content-location-independent protocols, first for establishing the MPEG-4 session and second for accessing transport channels. DMIF monitors the transport channels against the QoS requirements assigned to the SL streams, and supports the FlexMultiplexing of the SL streams by means of the MPEG-4 FlexMux tools. Several FlexMux tools exist. FlexMux stream delivery is defined in [4], while the FlexMux stream syntax is defined in [1].

This draft specifies an RTP [5] payload format for transporting multiplexed MPEG-4 encoded data streams. It can be seen as an instance of the MPEG-4 Delivery Layer.

The SL streams can be FlexMultiplexed into FlexMux streams. A FlexMux stream is a succession of FlexMux packets. Each FlexMux packet is built from a FlexMux packet header followed by a FlexMux packet payload. The FlexMux packet payload consists of one or several complete SL packets.

The FlexMux packet header is composed of a one-byte index followed by a length field. The index gives the FlexMux channel number. A particular FlexMux channel number (index = 238) is reserved to identify a "signalling" FlexMux channel. The "238" packets are dedicated to carrying a FlexMux Clock Reference (FCR) time stamp, indicating its arrival time, and the bit rate of the FlexMux stream; FlexMux streams have a piecewise constant bit rate. A "238" packet can also carry, in-band, the different FlexMux descriptors: the FlexMuxChannel, FlexMuxBufferSize, FlexMuxTiming, FlexMuxCodeTable and FlexMuxIdent descriptors.
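The header structure described above can be illustrated with a minimal Python sketch that splits a buffer of concatenated FlexMux packets and flags the "238" signalling packets. It assumes the "first" FlexMux tool, i.e. a one-byte index followed by a one-byte length counting the payload bytes; the names `FlexMuxPacket` and `parse_flexmux_stream` are hypothetical, and the payload (one or more SL packets) is left uninterpreted.

```python
from dataclasses import dataclass
from typing import List

# Reserved "signalling" channel that carries FCRs and in-band descriptors.
FCR_CHANNEL_INDEX = 238


@dataclass
class FlexMuxPacket:
    index: int      # FlexMux channel number (first header byte)
    payload: bytes  # one or more complete SL packets, left uninterpreted

    @property
    def is_signalling(self) -> bool:
        return self.index == FCR_CHANNEL_INDEX


def parse_flexmux_stream(data: bytes) -> List[FlexMuxPacket]:
    """Split concatenated FlexMux packets, assuming one-byte length fields.

    Parsing must begin on a packet boundary: the FlexMux header has no
    synchronization pattern that would allow resynchronizing mid-stream.
    """
    packets: List[FlexMuxPacket] = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError(f"truncated FlexMux header at offset {pos}")
        index, length = data[pos], data[pos + 1]
        start, end = pos + 2, pos + 2 + length
        if end > len(data):
            raise ValueError(f"truncated FlexMux payload at offset {pos}")
        packets.append(FlexMuxPacket(index, data[start:end]))
        pos = end
    return packets


if __name__ == "__main__":
    # A "238" packet with 2 payload bytes, then a channel-5 packet with 3.
    stream = bytes([238, 2, 0xAA, 0xBB, 5, 3, 0x01, 0x02, 0x03])
    for pkt in parse_flexmux_stream(stream):
        print(pkt.index, pkt.is_signalling, pkt.payload.hex())
```

Because the parser relies only on the length fields, a single corrupted length byte desynchronizes everything after it, which is why the packet boundaries must be preserved by the transport.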
The FlexMux descriptors can also be provided by out-of-band means (e.g. SDP).

2.3.1. Some of the advantages of FlexMultiplexing:

   1. Since a typical MPEG-4 session may involve a large number of objects, possibly as many as a few hundred, transporting each ES as an individual RTP session may not always be practical. Using one session per elementary stream is not cost effective, in terms of performance on both the server side and the client side, as the number of elementary streams in a scene increases.

   2. Using a single session for a multiplexed bitstream makes it possible to send a group of tightly synchronized ESs together. Some of these ESs can be BIFS and OD ESs, when a scene description is used with audio-visual ESs; others can be OCI ESs, or even IPMP (or DRM) ESs when such systems are involved.

   3. FlexMultiplexing supports the concatenation of multiple SL packets into one FlexMux packet by means of FlexMuxCodeTable entries.

   4. If the multiplexing policy applied smooths the multiplexed SL streams as much as possible, mutual synchronization between these SL streams can easily be preserved when packet losses occur.

   5. The use of the FlexMux technology enables interconnection between the Internet and digital television networks, since MPEG normatively defines the use of the MPEG-4 FlexMux syntax to carry MPEG-4 over MPEG-2 transport channels [9].

   6. The correct timing of the FlexMux stream can be reconstructed, using the timing samples (FCRs) carried within the FlexMux signalling channel, if the "constant delivery delay" assumption holds.

   7. The overall MPEG-4 receiver buffer size is reduced, because MPEG-4 compliant FlexMultiplexed streams, through the use of the MPEG-4 timestamps, respect the MPEG-4 system decoder model.

   8. The FlexMux in-band signalling mechanism allows the bit rate of the stream to be signalled dynamically, at any time, within the stream itself (FlexMux streams have a piecewise constant bit rate), which simplifies bandwidth management. FlexMux descriptors and ad hoc descriptors are carried within this in-band signalling mechanism.

   9. Protection can be enhanced by repeating vital SL packets.

   10. Content providers can bundle streams into a single stream with the assurance that associated streams will be kept together and synchronized.

2.3.2. Some of the disadvantages of FlexMultiplexing:

One disadvantage of FlexMultiplexing is the FlexMultiplexing policy itself, which adds complexity on the server side.

The major disadvantage of packetizing MPEG-4 FlexMultiplexed streams is the added packet header overhead. To minimize this overhead, two FlexMux tools, with two different FlexMux packet length fields, are supported. MPEG-4 does not support a mechanism for reducing the packet headers of the carried FlexMultiplexed streams. This issue will certainly need to be resolved, using a mechanism similar to the one proposed in [8].

3. Applicability statement

3.1. Environment of the payload format:

This payload format is dedicated to applications relying on an MPEG-4 scene. The contents of a scene may vary completely from one application to another. A scene may be quite simple (e.g. the description of a 2D layout, in the streaming case) or more complex (with 3D objects). A scene may be considered as a monolithic, static declaration, or it may change dynamically over time.

3.2. Restrictions REQUIRED on the contents:

Which elementary streams should be FlexMultiplexed together? According to the DMIF principle of MPEG-4, elementary streams are FlexMultiplexed together when they require the same Quality of Service over the network. Since System elementary streams generally require better protection (see Section 3.3 on the handling of System elementary streams), an application may benefit from FlexMultiplexing the system streams into a first FlexMux stream provided with good protection, while the other, "non-system" streams are FlexMultiplexed into a second FlexMux stream provided with less protection.

Which time base? When all the elementary streams that build a scene are associated with a common time base, the FlexMux clock reference stream constitutes this common time base. When the network jitter can be handled (i.e. the "constant delivery delay" assumption holds), this common clock can be reconstructed on the receiving side.

3.3. Handling of System Elementary streams:

Senders SHOULD ensure that packet loss does not cause severe problems in application execution when the elementary streams carry System elementary streams, such as programmatic content, DRM (or IPMP), metadata information and "scene description" streams (OD commands, BIFS commands).

MPEG-4 has introduced the "scene carousel mechanism", which supports dynamically changing the "scene description" by sending animation information (changes in parameters) and structural change information (updates). This mechanism provides "scene description" ESs with Random Access Points (RAPs). As with key frames for video, the transmission periodicity of these RAPs should be adjusted to suit the application and the network on which it is deployed.

Reliability can be improved by retransmission, by SL packet duplication or by using the "scene description carousel" mechanism, while observing the general congestion control principles.
The application has to take into account the delays introduced by these methods: content has to be sent sufficiently in advance. The application can achieve synchronisation on the receiver side by means of the different timestamps (FCRs, DTSs and CTSs).

When such measures are deemed insufficient, applications SHOULD use, instead of this payload format, more reliable means to transport the information, for example by applying an FEC scheme for RTP (such as in RFC 2733) or by using RTP over TCP (such as in RFC 2326), while still giving due consideration to congestion control. For a general description of methods to repair streaming media, see RFC 2354.

4. Benefits of using RTP for transport:

   i.   Ability to synchronize MPEG-4 streams with other RTP payloads.
   ii.  Monitoring of MPEG-4 delivery performance through RTCP.
   iii. Combining MPEG-4 and other real-time data streams received from multiple end-systems into a set of consolidated streams through RTP mixers.
   iv.  Converting data types, etc., through the use of RTP translators.

5. Conventions used in this document

5.1. General:

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [6].

5.2. MPEG-4 glossary:

   AU:      Access Unit
   BIFS:    Binary Format for Scenes
   CTS:     Composition Timestamp
   DAI:     DMIF Application Interface
   DMIF:    Delivery Multimedia Integration Framework
   DTS:     Decoding Timestamp
   DRM:     Digital Rights Management
   ES:      Elementary Stream
   ESI:     Elementary Stream Interface
   FlexMux: Flexible Multiplex
   FCR:     FlexMux Clock Reference
   IPMP:    Intellectual Property Management and Protection
   OCI:     Object Content Information
   OCR:     Object Clock Reference
   OD:      Object Descriptor
   QoS:     Quality of Service
   SL:      Synchronization Layer

6. The RTP packet

6.1. The RTP packet header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   :                      RTP Packet Payload                       :
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 1 - An RTP packet for an MPEG-4 FlexMux stream

6.2. RTP header fields usage:

Payload Type (PT): The assignment of an RTP payload type to this new packet format is outside the scope of this document and is not specified here. If dynamic payload type assignment is used, out-of-band means (e.g. SDP, with the syntax proposed in Section 9) can specify that the MPEG-4 FlexMux payload format is used for the corresponding RTP packets.

Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet payload includes the end of each Access Unit whose data is contained in this RTP packet. The M bit is set to 0 when the RTP packet contains one or more Access Unit fragments that are not Access Unit ends.

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: Incremented by one for each RTP data packet sent. It starts with a random initial value for security reasons.

Timestamp (32 bits): Reflects the sampling instant of the FlexMux packet corresponding to the RTP packet. The sampling instant is given by the same monotonically and linearly incrementing clock as the one used to validate the FCR field in the "238" FlexMux channel.
Unless specified by out-of-band means (e.g. SDP), the timestamp resolution is set to its default value (90 kHz).

SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

RTCP SHOULD be used as defined in RFC 1889 [5].

Timestamps in RTCP SR packets: The RTP timestamp value is the RTP timestamp that would be applied to an RTP packet for data sent at the instant the SR packet is generated and sent. The NTP timestamp value is the NTP time at which that SR packet is sent.

6.3. The RTP packet payload

The RTP packet payload is built from an integer number of complete FlexMux packets, as defined in [1]. Each FlexMux packet is built from a FlexMux packet header followed by a FlexMux packet payload. The FlexMux packet header is composed of a one-byte index followed by a length field. The length field is one byte long when the "first" FlexMux tool is used, and several bytes long when the "second" FlexMux tool is used. The index identifies the FlexMux channel to which the FlexMux packet belongs.

7. Fragmentation rules

MPEG-4 Access Units (AUs) are the smallest entities to which MPEG-4 assigns temporal references. AUs are embedded within SL packets, and SL packets are embedded within FlexMux packets.

Some MPEG-4 codecs define optional syntax for Access Unit sub-entities (AU parts) that are independently decodable, for error resilience purposes. In that mode, the maximum size of the AU parts depends on the path-MTU size, taking into account the overhead of the SL and FlexMux packet headers.

As no fragmentation occurs at the FlexMux level, this section on fragmentation rules concerns only the Compression Layer (when producing the AUs, or the independently decodable AU sub-entities, i.e. the AU parts) and the SL (when fragmenting Access Units, or AU parts, into SL packets), in order to prevent media decoding difficulties on the receiver side.
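To make the path-MTU constraint concrete, the following Python sketch computes the largest AU part that fits, as a single SL packet inside a single FlexMux packet, in one RTP packet. The header sizes are assumptions (IPv4 without options, UDP, a 12-byte RTP header with no CSRC list or extension, and the two-byte header of the "first" FlexMux tool); the SL header size is passed in because it depends on the SL configuration. The function name `max_au_part_size` is hypothetical.

```python
# Assumed per-packet overheads in bytes; adjust for IPv6, IP options,
# CSRC lists or RTP header extensions.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_FIXED_HEADER = 12
FLEXMUX_HEADER = 2         # one-byte index + one-byte length ("first" tool)
MAX_ONE_BYTE_LENGTH = 255  # largest value a one-byte length field can hold


def max_au_part_size(path_mtu: int, sl_header_size: int) -> int:
    """Largest AU part carried as one SL packet in one FlexMux packet,
    with that FlexMux packet alone in an RTP packet no larger than path_mtu."""
    room = path_mtu - IPV4_HEADER - UDP_HEADER - RTP_FIXED_HEADER - FLEXMUX_HEADER
    # The SL packet (SL header + AU part) must also fit the length field.
    sl_packet_limit = min(room, MAX_ONE_BYTE_LENGTH)
    au_part = sl_packet_limit - sl_header_size
    if au_part <= 0:
        raise ValueError("path-MTU too small for the assumed headers")
    return au_part


if __name__ == "__main__":
    print(max_au_part_size(1500, sl_header_size=5))  # 250: length field binds
    print(max_au_part_size(200, sl_header_size=5))   # 153: path-MTU binds
```

Under these assumptions, with a 1500-byte path-MTU it is the one-byte length field, not the MTU, that limits the AU part size.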
A FlexMux packet contains one or several SL packets. If the first FlexMux tool is used, SL packets MUST be at most 255 bytes long, the largest value of its one-byte length field. If the second FlexMux tool is used, this length is limited to 268 Mbytes. The size of the FlexMux packets should be adjusted so that the resulting RTP packet (embedding one or several FlexMux packets) is not larger than the path-MTU.

8. Transport of MPEG-4 FlexMux streams

An MPEG-4 FlexMux packet is mapped directly onto the RTP payload, without the addition of any extra header fields and without the removal of any FlexMux packet header syntactic elements.

Because the FlexMux packet header contains no synchronization pattern, the following packetization restrictions apply:

   - An RTP packet should contain an integer number of complete FlexMux packets.
   - An RTP packet payload should start with the start of a FlexMux packet.
   - An RTP packet payload should end with the end of a FlexMux packet.

Each RTP packet carries a timestamp derived from the sender's clock reference, synchronized to the FlexMux clock. That timestamp represents the sampling instant of the first byte of the RTP packet. The sampling instant is given by the same clock as the one used to validate the FCR field in the FlexMux stream.

Special consideration has to be given to the transport of FlexMux packets that carry the FCR samples. Such FlexMux packets have index 238. When such a "238" FlexMux packet is transported within an RTP packet, it is always the first FlexMux packet of that RTP packet. The direct comparison between the FCR sample and the RTP timestamp then allows the receiver to know the exact relationship between these two timestamps, even though the initial RTP timestamp value is chosen randomly.

The FCRs, as well as the CTSs and DTSs embedded within a FlexMux stream, are samples of the same common clock.
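The packetization restrictions above can be sketched in Python as a greedy packer that takes complete, already-serialized FlexMux packets (header included) and groups them into RTP payloads, starting a new payload whenever the next packet would exceed a byte budget or is a "238" packet. The greedy policy, the `packetize` name and the `max_payload` budget parameter are illustrative assumptions, not requirements of this document; in practice the budget would be derived from the path-MTU minus the lower-layer and RTP header sizes.

```python
from typing import Iterable, List

FCR_CHANNEL_INDEX = 238  # FlexMux channel carrying the FCR samples


def packetize(flexmux_packets: Iterable[bytes], max_payload: int) -> List[bytes]:
    """Group complete FlexMux packets into RTP payloads of at most
    max_payload bytes. A "238" packet always opens a new payload, and
    FlexMux packets are never split, so every payload starts and ends
    on a FlexMux packet boundary."""
    payloads: List[bytes] = []
    current = bytearray()
    for pkt in flexmux_packets:
        if len(pkt) < 2:
            raise ValueError("a FlexMux packet has at least two header bytes")
        if len(pkt) > max_payload:
            raise ValueError("FlexMux packet does not fit in one RTP payload")
        must_lead = pkt[0] == FCR_CHANNEL_INDEX
        would_overflow = len(current) + len(pkt) > max_payload
        if current and (must_lead or would_overflow):
            payloads.append(bytes(current))
            current = bytearray()
        current += pkt
    if current:
        payloads.append(bytes(current))
    return payloads


if __name__ == "__main__":
    a = bytes([1, 2, 0x0A, 0x0B])  # channel 1, 2-byte payload
    b = bytes([2, 1, 0x0C])        # channel 2, 1-byte payload
    fcr = bytes([238, 1, 0x0F])    # signalling packet (placeholder content)
    for payload in packetize([a, b, fcr, a], max_payload=8):
        print(payload.hex())
```

This sketch does not group packets by 'degradationPriority', which this section requires later when IP packet marking is used.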
The relationship between the CTSs and DTSs and the FCRs is defined by the constraints of the MPEG-4 system decoder model. As the DTSs and CTSs embedded within a FlexMux stream are available at the application level (through the ESI interface, above the SL), synchronisation with other media can be achieved at the application level.

When the "constant delivery delay" assumption holds (i.e. the network jitter has been estimated and compensated), the FCR samples may be used on the receiver side to accurately reconstruct the original sender's FlexMux clock. On the receiving side, the RTP packet timestamp is not passed to the MPEG-4 FlexDemultiplexer.

The FlexMux descriptors (declaration descriptor, timing descriptor, channel table descriptor, code table entry descriptor, buffer size descriptor, etc.) describing the characteristics of the FlexMux stream may be provided by out-of-band means (e.g. SDP) and/or by the in-band signalling mechanism supported by the FlexMux stream syntax [1]. Ad hoc (non-MPEG) descriptors are also supported.

When the IP packet marking facility is needed, since it is based on the 'degradationPriority' field present in each SL packet, all the FlexMux packets grouped in the same RTP packet should contain SL packets whose 'degradationPriority' fields are filled with the same value.

Protection mechanisms for FlexMux streams within RTP packets are outside the scope of this specification.

9. SDP syntax

It is assumed that one typical way to signal the characteristics of the FlexMux streams carried with this payload format is an SDP message, which may be transported to the client in reply to an RTSP [11] DESCRIBE command, or via SAP [12]. SDP is described in [10].

9.1. Types and Names

This section describes the MIME types and names associated with this payload format.
This section is intended for registration with IANA [13].

9.1.1. MIME type registration

MIME media type name: "video", "audio" or "application"

"video" SHOULD be used for MPEG-4 Visual streams (i.e. video as defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC 14496-1 [1]) or for MPEG-4 Systems streams that convey information needed for an audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or for MPEG-4 Systems streams that convey information needed for an audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC 14496-1), such as MPEG-4 FlexMux streams, that serve purposes other than only an audio/visual presentation.

The payload names used in an rtpmap attribute within SDP, to specify the mapping of a payload type number to its definition, also come from the MIME namespace. Each of the RTP payload mappings defined above has a distinct name. It is recommended that visual streams be identified under 'video', that audio streams be identified under 'audio', and that 'application' be used otherwise.

When a FlexMux stream is served (e.g. over HTTP) or otherwise must be identified by a MIME type, the type 'application/mpeg4-flexmux' SHALL be used. Such files consist of FlexMux packets concatenated in transmission order.

   MIME media type name: application
   MIME subtype name: mpeg4-flexmux
   Required parameters: none
   Optional parameters: none
   Encoding considerations: base64 generally preferred; files are binary and should be transmitted without CR/LF conversion, 7-bit stripping, etc.

9.1.2. Attributes

A new encoding name is defined for the a=rtpmap attribute: the newly registered mpeg4-flexmux MIME subtype.

a = rtpmap: mpeg4-flexmux/