Network Working Group P. Thatcher
Internet-Draft Google
Intended status: Standards Track M. Zanaty
Expires: April 4, 2016 S. Nandakumar
Cisco Systems
A. Roach
B. Burman
B. Campen
October 02, 2015

RTP Payload Format Constraints


In this specification, we define a framework for identifying Source RTP Streams, together with constraints on their payload formats, in the Session Description Protocol (SDP). This framework uses the “rid” SDP attribute to: a) effectively identify the Source RTP Streams within an RTP Session, b) constrain their payload format parameters in a codec-agnostic way beyond what is provided by the regular Payload Types, and c) enable unambiguous mapping between the Source RTP Streams and their media format specifications in the SDP.

Note-1: The name ‘rid’ is not yet finalized. Please refer to Section “Open Issues” for more details on the naming.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 4, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents ( in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

The Payload Type (PT) in RTP provides a mapping between the format of the RTP payload and the media format description specified in the signaling. For applications that use SDP for signaling, the rtpmap and/or fmtp attributes describe the characteristics of the media carried in the RTP payload that is mapped to a given PT.

Recent advances in standards such as RTCWEB and NETVC have given rise to rich multimedia applications that need to support multiple RTP Streams within an RTP session [I-D.ietf-mmusic-sdp-bundle-negotiation], [I-D.ietf-mmusic-sdp-simulcast], multiple codecs, or both. These demands have unearthed challenges inherent with:

This specification defines a new SDP framework, called “RTP Source Stream Identifier (rid)”, for configuring and identifying Source RTP Streams (Section 2.1.10 of [I-D.ietf-avtext-rtp-grouping-taxonomy]), along with SDP attributes that constrain their payload formats in a codec-agnostic way. The “rid” framework can be thought of as a complementary extension to the way media format parameters are specified in SDP today via the “a=fmtp” attribute. This specification also proposes a new RTP header extension that carries the “rid” value, to correlate RTP packets with their format specification in the SDP.

Note that the “rid” parameters only serve to further constrain the parameters that are established on a PT format. They do not relax any existing constraints.

2. Key Words for Requirements

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].

3. Terminology

The terms Source RTP Stream, Endpoint, RTP Session, and RTP Stream are used as defined in [I-D.ietf-avtext-rtp-grouping-taxonomy].

[RFC4566] and [RFC3264] terminology is also used where appropriate.

4. Motivation

This section summarizes several motivations for proposing the “rid” framework.

  1. RTP PT Space Exhaustion: [RFC3550] defines the payload type (PT), which identifies the format of the RTP payload and determines its interpretation by the application. [RFC3550] assigns 7 bits to the PT in the RTP header. However, static assignments of payload type numbers to payload formats, together with the multiplexing of RTP with other protocols (such as RTCP), can leave a limited number of payload type numbers available for application use. In scenarios where the number of possible RTP payload configurations exceeds the available PT space within an RTP Session, there is a need for a way to represent the additional payload configurations and to effectively map a Source RTP Stream to its configuration in the signaling.
  2. Codec-Specific Media Format Specification in SDP: RTP payload configuration is typically specified using the rtpmap and fmtp SDP attributes. The rtpmap attribute provides the mapping from RTP PT to media format, and the fmtp attribute describes the media-format-specific parameters. The syntax of the fmtp attribute is tightly coupled to a specific media format (such as H.264, H.265, or VP8). This has resulted in a myriad of ways of defining attributes that are common across different media formats. Additionally, with new standards efforts such as NETVC, more media formats can be expected to be standardized in the future. Thus, there is a need to define common media characteristics in a codec-agnostic way, in order to reduce duplicated effort and simplify the syntactic representation across different codec standards.
  3. Multi-source and Multi-stream Use Cases: There is a rising trend of real-time multimedia applications that support multiple sources per endpoint, with various temporal resolutions (Scalable Video Coding) and spatial resolutions (Simulcast) per source. These applications are constrained by the limited RTP PT space and/or by underspecified SDP constructs for exercising granular control over the configuration of individual Source RTP Streams.

5. SDP ‘rid’ Media Level Attribute

This section defines a new SDP media-level attribute [RFC4566], “a=rid”.

a=rid:<rid-identifier> <direction> pt=<fmt-list> <rid-attribute>:<value> ...

A given “a=rid” SDP media attribute specifies constraints defining a unique RTP payload configuration identified by the “rid-identifier”. A set of codec-agnostic “rid-level” attributes (Section 6) describes the media format specification applicable to the one or more Payload Types specified in the “a=rid” line.

The ‘rid’ framework MAY be used in combination with the ‘a=fmtp’ SDP attribute to describe the media format parameters for a given RTP Payload Type. In such scenarios, however, the ‘rid-level’ attributes (Section 6) further constrain the equivalent ‘fmtp’ attributes.

The ‘direction’ identifies the directionality of the Source RTP Stream: either ‘send’ or ‘recv’.

A given SDP media description MAY have zero or more “a=rid” lines describing various possible RTP payload configurations. A given ‘rid-identifier’ MUST NOT be repeated within a media description.
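A minimal Python sketch of the "further constrain, never relax" rule follows. The mapping FMTP_EQUIVALENT and the function effective_limits are hypothetical: which fmtp parameter is equivalent to a given ‘rid-level’ attribute is codec specific, and the VP8-style names max-fs and max-fr are borrowed from Example 1 (Section 10.1) only for illustration.

```python
# Hypothetical mapping from rid-level attribute names to the fmtp parameter
# that expresses the same limit (assumption: VP8-style names as in Example 1).
FMTP_EQUIVALENT = {"max-fs": "max-fs", "max-fps": "max-fr"}


def effective_limits(fmtp: dict[str, int], rid: dict[str, int]) -> dict[str, int]:
    """Return the limit in force for each rid-level attribute.

    A rid-level value may only further constrain its fmtp equivalent,
    so the stricter (smaller) of the two values wins.
    """
    limits = {}
    for name, rid_value in rid.items():
        fmtp_name = FMTP_EQUIVALENT.get(name)
        if fmtp_name is not None and fmtp_name in fmtp:
            limits[name] = min(rid_value, fmtp[fmtp_name])
        else:
            # No fmtp equivalent: the rid-level value stands on its own.
            limits[name] = rid_value
    return limits


# The rid asks for 60 fps, but fmtp already caps the frame rate at 30.
print(effective_limits({"max-fs": 3600, "max-fr": 30}, {"max-fs": 900, "max-fps": 60}))
# prints {'max-fs': 900, 'max-fps': 30}
```

Taking the minimum means a rid-level value above its fmtp equivalent simply has no effect; an answerer following step 5 of Section 7.2.2 may instead choose to reject such a line.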

The ‘rid’ media attribute MAY be used for any RTP-based media transport. It is not defined for other transports.

Though the ‘rid-level’ attributes specified by the ‘rid’ property follow a syntax similar to that of session-level and media-level attributes, they are defined independently. All ‘rid-level’ attributes MUST be registered with IANA, using the registry defined in Section 12.3.

Section 9 gives a formal Augmented Backus-Naur Form (ABNF) [RFC5234] grammar for the “rid” attribute.

The “a=rid” media attribute is not dependent on charset.

6. ‘rid-level’ Attributes

This section defines the ‘rid-level’ attributes that can be used to constrain the RTP payload encoding format in a codec-agnostic way.

The following new SDP parameters are defined to represent characteristics common across video codecs: max-width, max-height, max-fps, max-fs, max-br, max-pps and depend.

All of these attributes are optional and are subject to negotiation based on the SDP Offer/Answer rules described in Section 7.

Section 9 provides a formal Augmented Backus-Naur Form (ABNF) [RFC5234] grammar for each of the “rid-level” attributes defined in this section.

7. SDP Offer/Answer Procedures

This section describes the SDP Offer/Answer [RFC3264] procedures when using the ‘rid’ framework.

7.1. Generating the Initial SDP Offer

For each media description in the offer, the offerer MAY choose to include one or more “a=rid” lines to specify a configuration profile for the given set of RTP Payload Types.

In order to construct a given “a=rid” line, the offerer must follow the steps below:

  1. It MUST generate a ‘rid-identifier’ that is unique within the media description.
  2. It MUST set the direction for the ‘rid-identifier’ to either ‘send’ or ‘recv’.
  3. It MUST include a list of SDP format tokens (usually corresponding to RTP payload types) to which the constraints expressed by the ‘rid-level’ attributes apply. The chosen Payload Types MUST be defined by “a=rtpmap” or “a=fmtp” attributes.
  4. The offerer then chooses the ‘rid-level’ attributes (Section 6) to apply to the listed SDP format tokens.
  5. If an ‘a=fmtp’ attribute is also used to provide media-format-specific parameters, the ‘rid-level’ attributes further constrain the equivalent ‘fmtp’ parameters for the given Payload Type for the streams associated with the ‘rid’.
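The steps above can be sketched in Python as follows. build_rid_line is a hypothetical helper, not an API defined by this document; it treats ‘*’ as the wildcard used in the Section 10 examples, formats the line in the style of those examples, and leaves the fmtp interaction of step 5 to the caller.

```python
def build_rid_line(rid_id: str, direction: str, pts: list[str],
                   defined_pts: set[str], attrs: dict[str, int],
                   used_ids: set[str]) -> str:
    """Construct an "a=rid" line following steps 1-4 of Section 7.1 (sketch)."""
    # Step 1: the rid-identifier must be unique within the media description.
    if rid_id in used_ids:
        raise ValueError(f"rid-identifier {rid_id!r} already used")
    # Step 2: the direction is either 'send' or 'recv'.
    if direction not in ("send", "recv"):
        raise ValueError(f"invalid direction {direction!r}")
    # Step 3: every listed format must be defined by a=rtpmap or a=fmtp
    # ('*' is accepted as the wildcard used in the Section 10 examples).
    for pt in pts:
        if pt != "*" and pt not in defined_pts:
            raise ValueError(f"payload type {pt} is not defined")
    # Step 4: append the chosen rid-level attributes.
    attr_text = "".join(f"; {name}={value}" for name, value in attrs.items())
    used_ids.add(rid_id)
    return f"a=rid:{rid_id} {direction} pt={';'.join(pts)}{attr_text}"


ids_in_media_description: set[str] = set()
print(build_rid_line("1", "send", ["*"], {"98", "99"},
                     {"max-width": 1280, "max-height": 720, "max-fps": 30},
                     ids_in_media_description))
# prints a=rid:1 send pt=*; max-width=1280; max-height=720; max-fps=30
```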

7.2. Answerer processing the SDP Offer

For each media description in the offer, and for each “a=rid” attribute in the media description, the receiver of the offer will perform the following steps:

7.2.1. ‘rid’ unaware Answerer

If the receiver doesn’t support the ‘rid’ framework proposed in this specification, the entire “a=rid” line is ignored, following the standard [RFC3264] Offer/Answer rules. If a given codec requires an ‘a=fmtp’ line when used without “a=rid”, the offer still needs to include that ‘a=fmtp’ line when “a=rid” is used.

7.2.2. ‘rid’ aware Answerer

If the answerer supports the ‘rid’ framework, the following steps are executed, in order, for each “a=rid” line in a given media description:

  1. Extract the rid-identifier from the “a=rid” line and verify its uniqueness. In the case of a duplicate, the entire “a=rid” line is rejected and MUST NOT be included in the SDP Answer.
  2. Next, the list of payload types is verified against the list obtained from the “a=rtpmap” and/or “a=fmtp” attributes. If there is no match for the Payload Type(s) listed in the “a=rid” line, the “a=rid” line is removed. The exception is when ‘*’ is used to identify the media format, in which case the “a=rid” line applies to all the formats in the media description.
  3. Once the Payload Type(s) are verified to match, the answerer shall ensure that the “rid-level” attributes listed are supported and syntactically well formed. In the case of a syntax error or an unsupported parameter, the “a=rid” line is removed.
  4. If the ‘depend’ rid-level attribute is included, the answerer MUST make sure that the rid-identifiers listed unambiguously match rid-identifiers in the SDP offer.
  5. If the media description contains an “a=fmtp” attribute, the answerer verifies that the values provided in the “rid-level” attributes are within the scope of their fmtp equivalents for the given media format.
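A sketch of these five checks, assuming the “a=rid” lines of one media description have already been parsed into Python dictionaries. The dictionary layout, verify_offer_rids and the fmtp_limits argument are hypothetical. Two readings are also assumptions: step 1 is taken to reject every line that carries a repeated rid-identifier, and step 2 is read strictly, so that any undefined Payload Type removes the line.

```python
from collections import Counter

SUPPORTED = {"max-width", "max-height", "max-fps", "max-fs",
             "max-br", "max-pps", "depend"}


def attrs_ok(attrs: dict[str, str]) -> bool:
    """Step 3: supported names only; values assumed numeric except 'depend'."""
    return all(name in SUPPORTED and (name == "depend" or value.isdigit())
               for name, value in attrs.items())


def verify_offer_rids(rids: list, defined_pts: set, fmtp_limits: dict = None) -> list:
    """Return the offered rid lines that survive steps 1-5 of Section 7.2.2.

    rids: [{"id", "dir", "pts", "attrs"}] from one media description.
    fmtp_limits: optional {pt: {rid-level name: max value}} (hypothetical form).
    """
    fmtp_limits = fmtp_limits or {}
    counts = Counter(r["id"] for r in rids)
    accepted = []
    for r in rids:
        # Step 1: reject lines whose rid-identifier is repeated.
        if counts[r["id"]] > 1:
            continue
        # Step 2: '*' applies to all formats; otherwise each PT must be defined.
        pts = sorted(defined_pts) if "*" in r["pts"] else r["pts"]
        if any(pt not in defined_pts for pt in pts):
            continue
        # Step 3: supported, syntactically well-formed attributes only.
        if not attrs_ok(r["attrs"]):
            continue
        # Step 4: 'depend' must reference rid-identifiers present in the offer.
        deps = r["attrs"].get("depend", "")
        if any(d not in counts for d in deps.split(";") if d):
            continue
        # Step 5: values must stay within their fmtp equivalents (if known).
        if any(int(value) > fmtp_limits.get(pt, {}).get(name, int(value))
               for pt in pts
               for name, value in r["attrs"].items() if name != "depend"):
            continue
        accepted.append(r)
    return accepted
```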

7.3. Generating the SDP Answer

Having performed the verification of the SDP offer as described, the answerer shall perform the following steps to generate the SDP answer.

For each “a=rid” line:

  1. The answerer MAY choose to modify specific ‘rid-level’ attribute values in the answer SDP. In such a case, the modified values MUST be lower (more constrained) than those specified in the offer.
  2. The answerer MUST NOT modify the ‘rid-identifier’ present in the offer.
  3. The answerer is allowed to remove one or more media formats from a given ‘a=rid’ line. If the answerer chooses to remove all the media format tokens from an “a=rid” line, the answerer MUST remove the entire “a=rid” line.
  4. In cases where the answerer is unable to support the payload configuration specified in a given “a=rid” line in the offer, the answerer MUST remove the corresponding “a=rid” line.
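The answer-generation rules can be sketched as below, assuming the same hypothetical dictionary layout and a local_caps table of the answerer’s own limits. Swapping ‘send’ and ‘recv’ mirrors the “swap send/recv” answers in Section 10; lowering a value with min() satisfies step 1, and returning None corresponds to removing the “a=rid” line when no media format survives (step 3). answer_rid is an illustrative name only.

```python
def answer_rid(offer: dict, supported_pts: set, local_caps: dict):
    """Build the answer's version of one offered rid line, or None to drop it."""
    # Step 3: remove unsupported formats; '*' is kept as the wildcard.
    if "*" in offer["pts"]:
        pts = ["*"]
    else:
        pts = [pt for pt in offer["pts"] if pt in supported_pts]
    if not pts:
        return None  # every format removed: remove the whole "a=rid" line
    attrs = {}
    for name, value in offer["attrs"].items():
        cap = local_caps.get(name)
        if name == "depend" or cap is None:
            attrs[name] = value  # nothing to tighten
        else:
            # Step 1: a modified value may only be lower (more constrained).
            attrs[name] = str(min(int(value), cap))
    # Step 2: the rid-identifier is copied unchanged; the direction is mirrored.
    return {"id": offer["id"],
            "dir": "recv" if offer["dir"] == "send" else "send",
            "pts": pts, "attrs": attrs}


offered = {"id": "1", "dir": "send", "pts": ["98", "200"],
           "attrs": {"max-width": "1280", "max-height": "720", "max-fps": "30"}}
print(answer_rid(offered, {"98", "99"}, {"max-fps": 15}))
```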

7.4. Offerer Processing of the SDP Answer

The offerer follows steps similar to the answerer’s processing of the offer, with the following exceptions:

  1. The offerer MUST ensure that the ‘rid-identifiers’ are not changed between the offer and the answer. If a ‘rid-identifier’ has changed, the offerer MUST consider the corresponding ‘a=rid’ line as rejected.
  2. If the ‘rid-level’ attribute values have been changed, the offerer MUST ensure that the modifications can be supported, or else consider the “a=rid” line as rejected.
  3. If the SDP answer contains any “rid-identifier” that does not match one in the offer, the offerer MUST ignore the corresponding “a=rid” line.
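A Python sketch of the offerer-side checks, under the assumption that each side’s “a=rid” lines are represented as a mapping from rid-identifier to numeric ‘rid-level’ values. process_answer and the can_support policy callback are hypothetical. Treating an attribute that appears only in the answer as a rejection is also an assumption, since this document only describes modified values.

```python
def process_answer(offer_rids: dict, answer_rids: dict,
                   can_support=lambda rid_id, changed: True) -> list:
    """Return the rid-identifiers the offerer treats as accepted (sketch)."""
    accepted = []
    for rid_id, answered in answer_rids.items():
        offered = offer_rids.get(rid_id)
        # Steps 1 and 3: an identifier absent from the offer is ignored, so an
        # offered line whose identifier was changed is never accepted.
        if offered is None:
            continue
        changed = {k: v for k, v in answered.items() if offered.get(k) != v}
        # Step 2: changed values must be lower than offered (assumption: an
        # attribute the offer did not contain counts as unsupported) ...
        if any(k not in offered or v > offered[k] for k, v in changed.items()):
            continue
        # ... and acceptable to the offerer's local policy.
        if changed and not can_support(rid_id, changed):
            continue
        accepted.append(rid_id)
    return accepted


offer = {"1": {"max-width": 1280, "max-fps": 30}, "5": {"max-width": 640, "max-fps": 15}}
answer = {"1": {"max-width": 1280, "max-fps": 15},   # lowered: acceptable
          "5": {"max-width": 640, "max-fps": 30},    # raised: rejected
          "9": {"max-width": 320}}                   # unknown id: ignored
print(process_answer(offer, answer))  # prints ['1']
```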

7.5. Modifying the Session


8. Usage of ‘rid’ in RTP

The RTP fixed header includes the payload type number and the SSRC value of the RTP stream. RTP defines how to de-multiplex streams within an RTP session, but in some use cases applications need further identifiers in order to effectively map individual RTP Streams to their equivalent payload configurations in the SDP.

This specification defines a new RTP header extension to carry the ‘rid-identifier’. This makes it possible for a receiver to associate received RTP packets (identifying the Source RTP Stream) with the format constraints specified in a media description.
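As an illustration of that association, the sketch below keeps a table from rid-identifier to the negotiated constraints and remembers which rid each SSRC has carried. RidDemux is a hypothetical class, and the rule that an SSRC keeps its rid binding on later packets that omit the header extension is an assumption, not something this document specifies.

```python
from typing import Dict, Optional


class RidDemux:
    """Associate received RTP packets with negotiated rid constraints (sketch)."""

    def __init__(self, constraints_by_rid: Dict[str, Dict[str, int]]):
        self.constraints_by_rid = constraints_by_rid
        self.ssrc_to_rid: Dict[int, str] = {}

    def on_packet(self, ssrc: int, rid: Optional[str] = None) -> Optional[Dict[str, int]]:
        # Assumption: a rid seen once on an SSRC stays bound to that SSRC,
        # so later packets without the header extension are still mapped.
        if rid is not None:
            self.ssrc_to_rid[ssrc] = rid
        bound = self.ssrc_to_rid.get(ssrc)
        if bound is None:
            return None  # no rid seen yet for this SSRC
        return self.constraints_by_rid.get(bound)


demux = RidDemux({"1": {"max-width": 1280}, "5": {"max-width": 640}})
print(demux.on_packet(0x1234, rid="5"))  # {'max-width': 640}
print(demux.on_packet(0x1234))           # still {'max-width': 640}
```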

8.1. RTP ‘rid’ Header Extension

The payload of the RTP ‘rid-identifier’ header extension element, which contains the identification-tag, can be encoded using either the one-byte or the two-byte header [RFC5285]. The identification-tag payload is UTF-8 encoded, as in SDP.

As the identification-tag is included in an RTP header extension, the packet expansion it causes should be considered. To avoid Maximum Transmission Unit (MTU) issues for the RTP packets, the header extension’s size needs to be taken into account when encoding media. Note that the set of header extensions included in a packet needs to be padded to the next 32-bit boundary using zero bytes [RFC5285].

It is recommended that the identification-tag be kept short. Due to the properties of the RTP header extension mechanism, when the one-byte header is used and no other header extensions are included, a tag of 1-3 bytes results in the minimal number of 32-bit words being used for the RTP header extension. In many cases, a one-byte tag will be sufficient; it is RECOMMENDED that implementations use the shortest tag that fits their purposes.
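The size arithmetic can be checked with the following sketch of a one-byte header extension block [RFC5285] carrying only the rid element. The element ID 1 is a hypothetical value that would in practice be negotiated with an “a=extmap” line; one_byte_rid_extension is an illustrative helper, not a library API.

```python
import struct


def one_byte_rid_extension(ext_id: int, rid: str) -> bytes:
    """Encode a rid as the only element of a one-byte header extension block."""
    data = rid.encode("utf-8")
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header element IDs range from 1 to 14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte header elements carry 1 to 16 bytes")
    # 4-bit ID, 4-bit (length - 1), then the UTF-8 identification-tag.
    element = bytes([(ext_id << 4) | (len(data) - 1)]) + data
    # Pad with zero bytes to the next 32-bit boundary.
    body = element + b"\x00" * (-len(element) % 4)
    # 0xBEDE marks the one-byte form; the length counts 32-bit words of body.
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body


for tag in ("1", "abc", "abcd"):
    print(tag, len(one_byte_rid_extension(1, tag)))  # 8, 8 and 12 bytes
```

A 1-3 byte tag plus its one-byte element header fits in a single 32-bit word, so the whole block is 8 bytes; a fourth byte spills into a second word.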

9. Formal Grammar

This section gives a formal Augmented Backus-Naur Form (ABNF) [RFC5234] grammar for each of the new media and rid-level attributes defined in this document.

rid-syntax = "a=rid:" rid-identifier SP rid-dir SP rid-fmt-list
             [ ";" [SP] rid-attr-list ]

rid-identifier = 1*(alpha-numeric / "-" / "_")

alpha-numeric  = ALPHA / DIGIT ; ALPHA and DIGIT defined in [RFC5234]

rid-dir              = "send" / "recv"

rid-fmt-list   = "pt=" rid-fmt *( ";" rid-fmt )

rid-fmt        = "*" ; wildcard: applies to all formats
               / fmt

rid-attr-list  = rid-param *( ";" [SP] rid-param )

rid-param      = rid-width-param
               / rid-height-param
               / rid-fps-param
               / rid-fs-param
               / rid-br-param
               / rid-pps-param
               / rid-depend-param

rid-width-param = "max-width=" param-val

rid-height-param = "max-height=" param-val

rid-fps-param   = "max-fps=" param-val

rid-fs-param    = "max-fs=" param-val

rid-br-param    = "max-br=" param-val

rid-pps-param    = "max-pps=" param-val

rid-depend-param = "depend=" rid-list

rid-list = rid-identifier *( ";" rid-identifier )

param-val  = byte-string

; SP defined in [RFC5234]
; fmt defined in [RFC4566]
; byte-string defined in [RFC4566]
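For illustration, a small Python parser for “a=rid” lines written the way the Section 10 examples write them (‘;’ between format tokens and between attributes) is sketched below. parse_rid and RidLine are hypothetical names. The parser keeps attribute values as strings, as the grammar does via param-val, and resolves the ‘;’ shared by the format list, the attribute list and the ‘depend’ list by position, which is an assumption about the intended reading.

```python
import re
from dataclasses import dataclass, field

# rid-level attribute names from the grammar above / Section 12.3.
KNOWN_ATTRS = {"max-width", "max-height", "max-fps", "max-fs",
               "max-br", "max-pps", "depend"}
# rid-identifier = 1*(ALPHA / DIGIT / "-" / "_")
RID_ID = re.compile(r"[A-Za-z0-9_-]+")


@dataclass
class RidLine:
    rid_id: str
    direction: str
    pts: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)
    depend: list = field(default_factory=list)


def parse_rid(line: str) -> RidLine:
    """Parse one "a=rid" line in the style of the Section 10 examples."""
    prefix = "a=rid:"
    if not line.startswith(prefix):
        raise ValueError("not an a=rid line")
    parts = line[len(prefix):].split(None, 2)
    if len(parts) != 3:
        raise ValueError("expected <rid-identifier> <direction> pt=<fmt-list>")
    rid_id, direction, rest = parts
    if not RID_ID.fullmatch(rid_id):
        raise ValueError(f"invalid rid-identifier {rid_id!r}")
    if direction not in ("send", "recv"):
        raise ValueError(f"invalid direction {direction!r}")
    items = [p.strip() for p in rest.split(";") if p.strip()]
    if not items or not items[0].startswith("pt=") or items[0] == "pt=":
        raise ValueError("missing pt= format list")
    rid = RidLine(rid_id, direction, pts=[items[0][3:]])
    last_attr = None
    for item in items[1:]:
        if "=" not in item:
            # A bare token is another fmt before any attribute, or another
            # rid-identifier of a preceding 'depend' list.
            if last_attr is None:
                rid.pts.append(item)
            elif last_attr == "depend":
                rid.depend.append(item)
            else:
                raise ValueError(f"unexpected token {item!r}")
            continue
        name, value = item.split("=", 1)
        if name not in KNOWN_ATTRS:
            raise ValueError(f"unknown rid-level attribute {name!r}")
        if not value:
            raise ValueError(f"missing value for {name!r}")
        if name == "depend":
            rid.depend.append(value)
        else:
            rid.attrs[name] = value
        last_attr = name
    return rid


print(parse_rid("a=rid:1 send pt=*; max-width=1280; max-height=720; max-fps=30; depend=0"))
```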

10. SDP Examples

10.1. Many Bundled Streams using Many Codecs

In this scenario, the offerer supports the Opus, G.722, G.711 and DTMF audio codecs, and VP8, VP9, H.264 (CBP/CHP, mode 0/1), H.264-SVC (SCBP/SCHP) and H.265 (MP/M10P) for video. An 8-way video call (to a mixer) is supported (send 1 and receive 7 video streams) by offering 7 video media sections (1 sendrecv at max resolution and 6 recvonly at smaller resolutions), all bundled on the same port, using 3 different resolutions. As used in the example below, the resolutions are: 1 max (1280x720, up to 30 fps), 2 medium (640x360, up to 15 fps) and 4 small (320x180, up to 15 fps).

Expressing all these codecs and resolutions using 32 dynamic PTs (2 audio + 10x3 video) would exhaust the primary dynamic space (96-127). RIDs are used to avoid PT exhaustion and express the resolution constraints.
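The PT count works out as follows, assuming each of the 10 video payload configurations (PTs 98-107 in Example 1) would need a separate PT for each of the 3 resolutions, and that the 2 dynamic audio PTs (96 and 123) come from the same range:

```latex
\underbrace{2}_{\text{audio}} + \underbrace{10 \times 3}_{\text{video}} = 32 = 127 - 96 + 1
```

Every value in 96-127 would be consumed, leaving no room for further configurations.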

                                    Example 1

m=audio 10000 RTP/SAVPF 96 9 8 0 123
a=rtpmap:96 OPUS/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:123 telephone-event/8000
m=video 10000 RTP/SAVPF 98 99 100 101 102 103 104 105 106 107
a=rtpmap:98 VP8/90000
a=fmtp:98 max-fs=3600; max-fr=30
a=rtpmap:99 VP9/90000
a=fmtp:99 max-fs=3600; max-fr=30
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42401f; packetization-mode=0
a=rtpmap:101 H264/90000
a=fmtp:101 profile-level-id=42401f; packetization-mode=1
a=rtpmap:102 H264/90000
a=fmtp:102 profile-level-id=640c1f; packetization-mode=0
a=rtpmap:103 H264/90000
a=fmtp:103 profile-level-id=640c1f; packetization-mode=1
a=rtpmap:104 H264-SVC/90000
a=fmtp:104 profile-level-id=530c1f
a=rtpmap:105 H264-SVC/90000
a=fmtp:105 profile-level-id=560c1f
a=rtpmap:106 H265/90000
a=fmtp:106 profile-id=1; level-id=93
a=rtpmap:107 H265/90000
a=fmtp:107 profile-id=2; level-id=93
a=mid:v1 (max resolution)
a=rid:1 send pt=*; max-width=1280; max-height=720; max-fps=30
a=rid:2 recv pt=*; max-width=1280; max-height=720; max-fps=30
m=video 10000 RTP/SAVPF 98 99 100 101 102 103 104 105 106 107
...same rtpmap/fmtp as above...
a=mid:v2 (medium resolution)
a=rid:3 recv pt=*; max-width=640; max-height=360; max-fps=15
m=video 10000 RTP/SAVPF 98 99 100 101 102 103 104 105 106 107
...same rtpmap/fmtp as above...
a=mid:v3 (medium resolution)
a=rid:3 recv pt=*; max-width=640; max-height=360; max-fps=15
m=video 10000 RTP/SAVPF 98 99 100 101 102 103 104 105 106 107
...same rtpmap/fmtp as above...
a=mid:v4 (small resolution)
a=rid:4 recv pt=*; max-width=320; max-height=180; max-fps=15
m=video 10000 RTP/SAVPF 98 99 100 101 102 103 104 105 106 107
...same rtpmap/fmtp as above...
...same rid:4 as above for mid:v5,v6,v7 (small resolution)...

...same as offer but swap send/recv...

10.2. Simulcast

Adding simulcast to the above example allows the mixer to selectively forward streams like an SFU rather than transcode high resolutions to lower ones. Simulcast encodings can be expressed using PTs or RIDs. Using PTs can exhaust the primary dynamic space even faster in simulcast scenarios, so RIDs are used to avoid PT exhaustion and to express the encoding constraints. In the example below, 3 resolutions are offered to be sent as simulcast to a mixer/SFU.

                                    Example 2

m=audio ...same as from Example 1...
m=video ...same as from Example 1 ...
...same rtpmap/fmtp as above...
a=mid:v1 (max resolution)
a=rid:1 send pt=*; max-width=1280; max-height=720; max-fps=30
a=rid:2 recv pt=*; max-width=1280; max-height=720; max-fps=30
a=rid:5 send pt=*; max-width=640; max-height=360; max-fps=15
a=rid:6 send pt=*; max-width=320; max-height=180; max-fps=15
a=simulcast: send rid=1;5;6 recv rid=2
...same m=video sections as Example 1 for mid:v2-v7...

...same as offer but swap send/recv...

10.3. Scalable Layers

Adding scalable layers to the above simulcast example gives the SFU further flexibility to selectively forward packets from a source that best match the bandwidth and capabilities of diverse receivers. Scalable encodings have dependencies between layers, unlike independent simulcast streams. RIDs can be used to express these dependencies using the “depend” parameter. In the example below, the highest resolution is offered to be sent as 2 scalable temporal layers (using MRST).

                                    Example 3

m=audio ...same as Example 1 ...
m=video ...same as Example 1 ...
...same rtpmap/fmtp as Example 1...
a=mid:v1 (max resolution)
a=rid:0 send pt=*; max-width=1280; max-height=720; max-fps=15
a=rid:1 send pt=*; max-width=1280; max-height=720; max-fps=30; depend=0
a=rid:2 recv pt=*; max-width=1280; max-height=720; max-fps=30
a=rid:5 send pt=*; max-width=640; max-height=360; max-fps=15
a=rid:6 send pt=*; max-width=320; max-height=180; max-fps=15
a=simulcast: send rid=0;1;5;6 recv rid=2
...same m=video sections as Example 1 for mid:v2-v7...

...same as offer but swap send/recv...
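Because ‘depend’ can chain, a receiver or SFU that wants a given layer needs that layer plus everything it transitively depends on. A minimal sketch, assuming ‘depend’ lists have been parsed into a dictionary; required_rids is a hypothetical helper that returns dependencies first and rejects cycles.

```python
def required_rids(rid_id: str, depends: dict) -> list:
    """Return rid_id and everything it transitively depends on, dependencies first."""
    order, done = [], set()

    def visit(r: str, path: frozenset) -> None:
        if r in path:
            raise ValueError(f"dependency cycle through {r!r}")
        if r in done:
            return
        for dep in depends.get(r, []):
            visit(dep, path | {r})
        done.add(r)
        order.append(r)

    visit(rid_id, frozenset())
    return order


# Example 3: rid 1 (30 fps) depends on rid 0 (15 fps); rid 5 is independent.
deps = {"1": ["0"]}
print(required_rids("1", deps))  # prints ['0', '1']
print(required_rids("5", deps))  # prints ['5']
```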

10.4. Simulcast with Payload Types

This example shows a simulcast offer SDP that uses the rid framework to identify:

and includes 2 “a=simulcast” lines to identify the simulcast streams by Payload Type and by rid-identifier, respectively.

                                    Example 4

m=video 10000 RTP/AVP 97 98
a=rtpmap:97 VP8/90000
a=rtpmap:98 VP8/90000
a=fmtp:97 max-fs=3600
a=fmtp:98 max-fs=3600
a=rid:1 send pt=97; max-br=; max-height=720;
a=rid:2 recv pt=97; max-width=1280; max-height=720
a=rid:3 recv pt=98; max-width=320; max-height=180
a=simulcast: send pt=97 recv pt=*
a=simulcast: send rid=1 recv rid=2;3

11. Open Issues

11.1. Name of the identifier

The name ‘rid’ is provisionally used and is open for further discussion.

Here are a few of the options that were considered while writing this draft:

12. IANA Considerations

12.1. New RTP Header Extension URI

This document defines a new extension URI in the RTP Compact Header Extensions subregistry of the Real-Time Transport Protocol (RTP) Parameters registry, according to the following data:

    Extension URI: urn:ietf:params:rtp-hdrext:rid
    Description:   RTP Stream Identifier
    Contact:       <>
    Reference:     RFCXXXX

12.2. New SDP Media-Level attribute

This document defines “rid” as SDP media-level attribute. This attribute must be registered by IANA under “Session Description Protocol (SDP) Parameters” under “att-field (media level only)”.

The “rid” attribute is used to identify characteristics of an RTP stream within an RTP Session. Its format is defined in Section XXXX.

12.3. Registry for RID-Level Attributes

This specification creates a new IANA registry named “att-field (rid level)” within the SDP parameters registry. The rid-level attributes MUST be registered with IANA and documented under the same rules as for SDP session-level and media-level attributes as specified in [RFC4566].

New attribute registrations are accepted according to the “Specification Required” policy of [RFC5226], provided that the specification includes the following information:

The initial set of rid-level attribute names, with definitions in Section XXXX of this document, is given below:

   Type            SDP Name                     Reference
   ----            ------------------           ---------
   att-field       (rid level)
                   max-width                     [RFCXXXX]
                   max-height                    [RFCXXXX]
                   max-fps                       [RFCXXXX]
                   max-fs                        [RFCXXXX]
                   max-br                        [RFCXXXX]
                   max-pps                       [RFCXXXX]
                   depend                        [RFCXXXX]

13. Security Considerations


14. Acknowledgements

Many thanks to Cullen Jennings and Magnus Westerlund for their reviews.

15. References

15.1. Normative References

[I-D.ietf-avtext-rtp-grouping-taxonomy] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G. and B. Burman, "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", Internet-Draft draft-ietf-avtext-rtp-grouping-taxonomy-08, July 2015.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003.
[RFC4566] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July 2006.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008.
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July 2008.

15.2. Informative References

[I-D.ietf-mmusic-sdp-bundle-negotiation] Holmberg, C., Alvestrand, H. and C. Jennings, "Negotiating Media Multiplexing Using the Session Description Protocol (SDP)", Internet-Draft draft-ietf-mmusic-sdp-bundle-negotiation-23, July 2015.
[I-D.ietf-mmusic-sdp-simulcast] Burman, B., Westerlund, M., Nandakumar, S. and M. Zanaty, "Using Simulcast in SDP and RTP Sessions", Internet-Draft draft-ietf-mmusic-sdp-simulcast-01, July 2015.
[RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, DOI 10.17487/RFC5888, June 2010.

Authors' Addresses

Peter Thatcher Google EMail:
Mo Zanaty Cisco Systems EMail:
Suhas Nandakumar Cisco Systems EMail:
Adam Roach Mozilla EMail:
Bo Burman Ericsson EMail:
Byron Campen Mozilla EMail: