Using Simulcast in SDP and RTP Sessions

Ericsson
Gronlandsgatan 31
SE-164 60 Stockholm
Sweden
bo.burman@ericsson.com

Ericsson
Farogatan 2
SE-164 80 Stockholm
Sweden
+46 10 714 82 87
magnus.westerlund@ericsson.com

Cisco
170 West Tasman Drive
San Jose, CA 95134
USA
snandaku@cisco.com

Cisco
170 West Tasman Drive
San Jose, CA 95134
USA
mzanaty@cisco.com

In some application scenarios it may be desirable to send multiple
differently encoded versions of the same media source in different RTP
streams. This is called simulcast. This document describes how to
accomplish simulcast in RTP and how to signal it in SDP. The described
solution uses an RTP/RTCP identification method to identify RTP streams
belonging to the same media source, and makes an extension to SDP to
relate those RTP streams as being different simulcast formats of that
media source. The SDP extension consists of a new media level SDP
attribute that expresses capability to send and/or receive simulcast RTP
streams.

Most of today's multiparty video conference solutions make use of
centralized servers to reduce the bandwidth and CPU consumption in the
endpoints. Those servers receive RTP streams from each participant and
send some suitable set of possibly modified RTP streams to the rest of
the participants, which usually have heterogeneous capabilities (screen
size, CPU, bandwidth, codec, etc.). One of the biggest issues is how to
perform RTP stream adaptation to different participants' constraints
with the minimum possible impact on both video quality and server
performance.

Simulcast is defined in this memo as the act of simultaneously
sending multiple different encoded streams of the same media source,
e.g. the same video source encoded with different video encoder types or
image resolutions. This can be done in several ways and for different
purposes. This document focuses on the case where it is desirable to
provide a media source as multiple encoded streams over RTP towards an intermediary so that the
intermediary can provide the wanted functionality by selecting which RTP
stream(s) to forward to other participants in the session, and more
specifically how the identification and grouping of the involved RTP
streams are done.

This document describes a few scenarios that motivate the use of
simulcast, and also defines the needed RTP/RTCP and SDP signaling for
it.

This document makes use of the terminology defined in RTP Taxonomy, and RTP
Topologies. The following terms are especially noted or here
defined:

RTP Mixer:  An RTP middle node, defined in RTP Topologies (Sections 3.6
   to 3.9).

RTP Switch:  A common short term for the terms "switching RTP mixer",
   "source projecting middlebox", and "video switching MCU" as
   discussed in RTP Topologies.

Simulcast Stream:  One encoded stream or dependent stream from a set
   of concurrently transmitted encoded streams and optional dependent
   streams, all sharing a common media source, as defined in RTP
   Taxonomy. For example, HD and thumbnail video simulcast versions of
   a single media source sent concurrently as separate RTP Streams.

Simulcast Format:  Different formats of a simulcast stream serve the
   same purpose as alternative RTP payload types in non-simulcast SDP:
   to allow multiple alternative media formats for a given RTP stream.
   As for multiple RTP payload types on the m-line in offer/answer,
   any one of the negotiated alternative formats can be used in a
   single RTP stream at a given point in time, but not more than one
   (based on RTP timestamp). What format is used can change
   dynamically from one RTP packet to another.

Simulcast Stream Identification (SCID):  The identification value used
   to refer to an individual simulcast format, identical to the
   "rid-id" identification value for an RTP Payload Format Restriction
   and the corresponding content of "RtpStreamId" RTCP SDES Item.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

Many use cases of simulcast as described in this document relate to a
multi-party communication session where one or more central nodes are
used to adapt the view of the communication session towards individual
participants, and facilitate the media transport between participants.
Thus, these cases target the RTP Mixer type of topology.

There are two principal approaches for an RTP Mixer to provide this
adapted view of the communication session to each receiving
participant:

*  Transcoding (decoding and re-encoding) received RTP streams with
   characteristics adapted to each receiving participant. This often
   includes mixing or composition of media sources from multiple
   participants into a mixed media source originated by the RTP Mixer.
   The main advantage of this approach is that it achieves close to
   optimal adaptation to individual receiving participants. The main
   disadvantages are that it can be very computationally expensive to
   the RTP Mixer, typically degrades media Quality of Experience (QoE)
   such as end-to-end delay for the receiving participants, and
   requires RTP Mixer access to media content.

*  Switching a subset of all received RTP streams or sub-streams to
   each receiving participant, where the used subset is typically
   specific to each receiving participant. The main advantages of this
   approach are that it is computationally cheap to the RTP Mixer, has
   very limited impact on media QoE, and does not require RTP Mixer
   (full) access to media content. The main disadvantage is that it can
   be difficult to combine a subset of received RTP streams into a
   perfect fit to the resource situation of a receiving
   participant.

The use of simulcast relates to the latter approach, where it is more
important to reduce the load on the RTP Mixer and/or minimize QoE impact
than to achieve an optimal adaptation of resource usage.

The media sources provided by a sending participant potentially
need to reach several receiving participants that differ in terms of
available resources. The receiver resources that typically differ
include, but are not limited to:

Codec:  This includes codec type (such as SDP MIME type) and can
   include codec configuration options (e.g. SDP fmtp parameters). Two
   codec resources that differ only in codec configuration are
   "different" if they are not "compatible", for example if they
   differ in video codec profile or in the transport packetization
   configuration.

Sampling:  This relates to how the media source is sampled, in the
   spatial as well as in the temporal domain. For video streams,
   spatial sampling affects image resolution and temporal sampling
   affects video frame rate. For audio, spatial sampling relates to
   the number of audio channels and temporal sampling affects audio
   bandwidth. This may be used to suit different rendering
   capabilities or needs at the receiving endpoints, as well as a
   method to achieve different transport capabilities, bitrates and
   eventually QoE by controlling the amount of source data.

Bitrate:  This relates to the number of bits spent per second to
   transmit the media source as an RTP stream, which typically also
   affects the Quality of Experience (QoE) for the receiving user.

Letting the sending participant create a simulcast of a few
differently configured RTP streams per media source can be a good
tradeoff when using an RTP switch as middlebox, instead of sending a
single RTP stream and using an RTP mixer to create individual
transcodings to each receiving participant.

This requires that the receiving participants can be categorized in
terms of available resources and that the sending participant can
choose a matching configuration for a single RTP stream per category
and media source.

For example, assume for simplicity a set of receiving participants
that differ only in that some have support to receive Codec A, and the
others have support to receive Codec B. Further assume that the
sending participant can send both Codec A and B. It can then reach all
receivers by creating two simulcasted RTP streams from each media
source; one for Codec A and one for Codec B.

In another simple example, a set of receiving participants differ
only in screen resolution; some are able to display video with at most
360p resolution and some support 720p resolution. A sending
participant can then reach all receivers with best possible resolution
by creating a simulcast of RTP streams with 360p and 720p resolution
for each sent video media source.

In more elaborate cases, the receiving participants differ both in
available sampling and bitrate, and maybe also codec, and it is up to
the RTP switch to find a good trade-off in which simulcasted stream to
choose for each intended receiver. It is also the responsibility of
the RTP switch to negotiate a good fit of simulcast streams with the
sending participant.

The maximum number of simulcasted RTP streams that can be sent is
mainly limited by the amount of processing and uplink network
resources available to the sending participant.

The application logic that controls the communication session may
include special handling of some media sources. It is, for example,
commonly the case that the media from a sending participant is not
sent back to itself.

It is also common that a currently active speaker participant is
shown in larger size or higher quality than other participants (the
sampling or bitrate aspects of ). Not sending the active speaker
media back to itself means there is some other participant's media
that instead has to receive special handling towards the active
speaker; typically the previous active speaker. This way, the
previously active speaker is needed both in larger size (to current
active speaker) and in small size (to the rest of the participants),
which can be solved with a simulcast from the previously active
speaker to the RTP switch.

The application logic that controls the communication session may
allow receiving participants to apply preferences to the
characteristics of the RTP stream they receive, for example in terms
of the aspects listed in .
Sending a simulcast of RTP streams is one way of accommodating
receivers with conflicting or otherwise incompatible preferences.

The following requirements need to be met to support the use cases in
previous sections:

Identification. It must be possible to identify a set of simulcasted
RTP streams as originating from the same media source:

   *  In SDP signaling.

   *  On RTP/RTCP level, at least with prior knowledge of SDP (or
      similar) signaling.

Transport usage. The solution must work when using:

   *  Legacy SDP with separate media transports per SDP media
      description.

   *  Bundled SDP media descriptions.

Capability negotiation. It must be possible that:

   *  Sender can express capability of sending simulcast.

   *  Receiver can express capability of receiving simulcast.

   *  Sender can express maximum number of simulcast streams that can
      be provided.

   *  Receiver can express maximum number of simulcast streams that
      can be received.

   *  Sender can detail the characteristics of the simulcast streams
      that can be provided.

   *  Receiver can detail the characteristics of the simulcast streams
      that it prefers to receive.

Distinguishing features. It must be possible to have different
simulcast streams use different codec parameters, as can be expressed
by SDP format values and RTP payload types.

Compatibility. It must be possible to use simulcast in combination
with other RTP mechanisms that generate additional RTP streams:

   *  RTP Retransmission.

   *  RTP Forward Error Correction.

   *  Related payload types such as audio Comfort Noise and/or DTMF.

   *  A single simulcast stream can consist of multiple RTP streams,
      to support codecs where a dependent stream is dependent on a set
      of encoded and dependent streams, each potentially carried in
      their own RTP stream.

Interoperability. The solution must be possible to use in:

   *  Interworking with non-simulcast legacy clients using a single
      media source per media type.

   *  WebRTC environment with a single media source per SDP media
      description.

As an overview, the above requirements are met by signaling simulcast
capability and configurations in SDP:

*  An offer or answer can contain a number of simulcast streams,
   separate for send and receive directions.

*  An offer or answer can contain multiple, alternative simulcast
   stream formats in the same fashion as multiple, alternative formats
   can be offered in a media description.

*  A single media source per SDP media description is assumed, which
   is aligned with the concepts defined in RTP Taxonomy and
   will specifically work in a WebRTC context, both with and without
   BUNDLE grouping.

*  The codec configuration for a simulcast stream is expressed
   through use of separately specified RTP payload format restrictions
   with an associated RTP-level
   identification mechanism to identify which RTP payload format
   restrictions an RTP stream adheres to. This complements and
   effectively extends simulcast stream identification and
   configuration possibilities that could be provided by using only SDP
   formats as identifier. Use of multiple RTP streams with the same
   (non-redundancy) media type in the context of a single media source,
   where those RTP streams are using different RtpStreamId, is a strong
   but not totally unambiguous indication of those RTP streams being
   part of a simulcast.

*  It is possible, but not required, to use source-specific signaling
   with the proposed solution.

This section further details the overview above. First, formal syntax is
provided, followed by the rest of the SDP
attribute definition in . Relating Simulcast Streams provides the
definition of the RTP/RTCP mechanisms used. The section is concluded
with a number of examples.

This document defines a new SDP media-level "a=simulcast" attribute
with the following ABNF syntax:

The "a=simulcast" attribute has a parameter in the form of one or
two simulcast stream descriptions, each consisting of a direction
("send" or "recv"), followed by a list of one or more simulcast
streams. Each simulcast stream consists of one or more alternative
simulcast formats. Each simulcast format is identified by a simulcast
stream identification (SCID). The SCID MUST have the form of an RTP
stream identifier, as described by RTP Payload Format
Restrictions.

In the list of simulcast streams, each simulcast stream is
separated by a semicolon (";"). Each simulcast stream can in turn be
offered in one or more alternative formats, represented by SCIDs,
separated by a comma (","). Each SCID can also be specified as
initially paused, indicated by
prepending a "~" to the SCID. The reason to allow separate initial
pause states for each SCID is that pause capability can be specified
individually for each RTP payload type referenced by an SCID. Since
pause capability specified via the "a=rtcp-fb" attribute and SCID
specified by "a=rid" can refer to common payload types, it is
infeasible to pause streams with SCID where any of the related RTP
payload type(s) do not have pause capability.

Examples:

Above are two examples of different "a=simulcast" lines.

The first line is an example offer to send two simulcast streams
and to receive two simulcast streams. The first simulcast stream in
send direction can be sent in three different alternative formats
(SCID 1, 2, 3), and the second simulcast stream in send direction can
be sent in two different alternative formats (SCID 4, 5). Both of the
second simulcast stream alternative formats in send direction are
offered as initially paused. The first simulcast stream in receive
direction has no alternative formats (SCID 6). The second simulcast
stream in receive direction has two alternative formats (SCID 7, 8)
that are both offered as initially paused.

The second line is an example answer to the first line, accepting
to send and receive the two offered simulcast streams. Send
and receive directions are, however, specified in opposite order
compared to the first line, which for convenience lets the answer keep
the same order of simulcast streams in the SDP as in the offer, even
though directionality is reversed. This example answer has removed all
offered alternative formats for the first simulcast stream (keeping
only SCID 1), but kept alternative formats for the second simulcast
stream in receive direction (4, 5). The answer thus accepts to send
two simulcast streams, without alternatives. The answer does not
accept initial pause of any simulcast streams, in either direction.
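
As an illustration of the list structure described above, the following
Python sketch splits an "a=simulcast" line into directions, simulcast
streams, alternative formats, and initial pause flags. It is not part of
the attribute definition. The names SimulcastFormat, parse_simulcast, and
mirror_directions are hypothetical, SCID characters are not validated
against the "rid-id" syntax, and the sample line is assembled from the
prose description of the first example (send 1,2,3 and paused 4,5;
receive 6 and paused 7,8).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SimulcastFormat:
    """One alternative format of a simulcast stream (hypothetical helper type)."""
    scid: str
    paused: bool = False


# A simulcast stream is an ordered list of alternatives, most preferred first.
SimulcastStream = List[SimulcastFormat]


def parse_simulcast(line: str) -> Dict[str, List[SimulcastStream]]:
    """Parse 'a=simulcast:<dir> <streams> [<dir> <streams>]' into a dict."""
    prefix = "a=simulcast:"
    if not line.startswith(prefix):
        raise ValueError("not an a=simulcast line")
    tokens = line[len(prefix):].split()
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two '<direction> <stream-list>' pairs")

    result: Dict[str, List[SimulcastStream]] = {}
    for direction, stream_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv"):
            raise ValueError(f"unknown direction: {direction!r}")
        if direction in result:
            # Each direction may occur at most once on the same line.
            raise ValueError(f"direction listed twice: {direction!r}")
        streams: List[SimulcastStream] = []
        for stream in stream_list.split(";"):      # ';' separates streams
            alternatives: SimulcastStream = []
            for item in stream.split(","):         # ',' separates alternatives
                paused = item.startswith("~")      # '~' marks initially paused
                scid = item[1:] if paused else item
                if not scid:
                    raise ValueError("empty SCID")
                alternatives.append(SimulcastFormat(scid, paused))
            streams.append(alternatives)
        result[direction] = streams
    return result


def mirror_directions(
    parsed: Dict[str, List[SimulcastStream]]
) -> Dict[str, List[SimulcastStream]]:
    """Swap send/recv while keeping stream order, as seen from the other side."""
    swap = {"send": "recv", "recv": "send"}
    return {swap[d]: streams for d, streams in parsed.items()}


# Usage: parse the offer-like line, then view it from the answerer's side.
offer = parse_simulcast("a=simulcast:send 1,2,3;~4,~5 recv 6;~7,~8")
answerer_view = mirror_directions(offer)
```

The mirrored view only swaps directionality; removing alternatives,
reducing the number of streams, or changing pause marks is left to the
application logic that builds the actual answer.
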
More examples can be found in .

Simulcast capability is expressed through a new media level SDP attribute, "a=simulcast". The meaning of
the attribute on SDP session level is undefined, MUST NOT be used by
implementations of this specification and MUST be ignored if received
on session level. Extensions to this specification MAY define such
session level usage. The meaning of including multiple "a=simulcast"
lines in a single SDP media description is undefined, MUST NOT be used
by implementations of this specification and any additional
"a=simulcast" lines beyond the first under an "m=" line MUST be
ignored if received.

There are separate and independent sets of simulcast streams in
send and receive directions. When listing multiple directions, each
direction MUST NOT occur more than once on the same line.

Simulcast streams using undefined SCID MUST NOT be used as valid
simulcast streams by an RTP stream receiver. The direction for an SCID
MUST be aligned with the direction specified for the corresponding RTP
stream identifier on the "a=rid" line.

The listed number of simulcast streams for a direction sets a limit
to the number of supported simulcast streams in that direction. The
order of the listed simulcast streams in the "send" direction suggests
a proposed order of preference, in decreasing order: the SCID listed
first is the most preferred and subsequent streams have progressively
lower preference. The order of the listed SCID in the "recv" direction
expresses which simulcast streams are preferred,
with the leftmost being most preferred. This can be of importance if
the number of actually sent simulcast streams has to be reduced for
some reason.

SCID that have explicit dependencies to other SCID (even in the same
media description) MAY be used.

Use of more than a single, alternative simulcast format for a
simulcast stream MAY be specified as part of the attribute parameters
by expressing the simulcast stream as a comma-separated list of
alternative SCID. In this case, it is not possible to align which
alternative SCID are used across different simulcast streams,
like requiring all simulcast streams to use SCID alternatives
referring to the same codec format. The order of the SCID alternatives
within a simulcast stream is significant; the SCID alternatives are
listed from (left) most preferred to (right) least preferred. For the
use of simulcast, this overrides the normal codec preference as
expressed by format type ordering on the "m=" line, using regular SDP
rules. This is to enable a separation of general codec preferences and
simulcast stream configuration preferences.

A simulcast stream can use a codec defined such that the same RTP
SSRC can change RTP payload type multiple times during a session,
possibly even on a per-packet basis. A typical example can be a speech
codec that makes use of Comfort Noise
and/or DTMF formats. In those cases,
such "related" formats MUST NOT be defined as having their own SCID
listed explicitly in the attribute parameters, since they are not
strictly simulcast streams of the media source, but rather a specific
way of generating the RTP stream of a single simulcast stream with
varying RTP payload type.

If RTP stream pause/resume is
supported, any SCID MAY be prefixed by a "~" character to indicate
that the corresponding simulcast stream is initially paused already
from start of the RTP session. In this case, support for RTP stream
pause/resume MUST also be included under the same "m=" line where
"a=simulcast" is included. All RTP payload types related to such
initially paused simulcast stream MUST be listed in the SDP as
pause/resume capable as specified by , e.g. by
using the "*" wildcard format for "a=rtcp-fb".

An initially paused simulcast stream in "send" direction MUST be
considered equivalent to an unsolicited locally paused stream, and be
handled accordingly. Initially paused simulcast streams are resumed as
described by the RTP pause/resume specification. An RTP stream
receiver that wishes to resume an unsolicited locally paused stream
needs to know the SSRC of that stream. The SSRC of an initially paused
simulcast stream can be obtained from an RTP stream sender RTCP Sender
Report (SR) including both the desired SSRC as "SSRC of sender", and
the SCID value in an RtpStreamId
RTCP SDES item.

Including an initially paused simulcast stream in "recv" direction
in an SDP towards an RTP sender SHOULD cause the remote RTP sender to
put the stream as unsolicited locally paused, unless there are other
RTP stream receivers that do not mark the simulcast stream as
initially paused. The reason to require an initially paused "recv"
stream to be considered locally paused by the remote RTP sender,
instead of making it equivalent to implicitly sending a pause request,
is that the pausing RTP sender cannot know which receiving SSRC
owns the restriction when TMMBR/TMMBN are used for pause/resume
signaling, since the RTP receiver's SSRC in send direction is sometimes
not yet known.

Use of the redundant audio data
format could be seen as a form of simulcast for loss protection
purposes, but is not considered conflicting with the mechanisms
described in this memo and MAY therefore be used as any other format.
In this case the "red" format, rather than the carried formats, SHOULD
be the one to list as a simulcast stream on the "a=simulcast"
line.

The media formats and corresponding characteristics of simulcast
streams SHOULD be chosen such that they are different, either as
different SDP formats with differing "a=rtpmap" and/or "a=fmtp" lines,
as differently defined RTP payload format restrictions, or both. If
this difference is not required, RTP
duplication procedures SHOULD be considered instead of
simulcast.

Note: The inclusion of "a=simulcast" or the use of simulcast
does not change any of the interpretation or Offer/Answer
procedures for other SDP attributes, like "a=fmtp" or "a=rid".

An offerer wanting to use simulcast SHALL include the
"a=simulcast" attribute in the offer. An offerer listing a set of
receive simulcast streams and/or alternative formats as SCID in the
offer MUST be prepared to receive RTP streams for any of those
simulcast streams and/or alternative formats from the answerer.

An answerer that does not understand the concept of simulcast
will also not know the attribute and will remove it in the SDP
answer, as defined in existing SDP
Offer/Answer procedures. Similarly, an answerer that receives
an offer with the "a=simulcast" attribute on session level SHALL
remove it in the answer. An answerer that understands the attribute
but receives multiple "a=simulcast" attributes under the same "m="
line SHALL ignore and remove all but the first in the answer. An answerer that does understand the attribute and that wants to
support simulcast in an indicated direction SHALL reverse
directionality of the unidirectional direction parameters; "send"
becomes "recv" and vice versa, and include it in the answer.

An answerer that receives an offer with simulcast containing an
"a=simulcast" attribute listing alternative SCID MAY keep all the
alternative SCID in the answer, but it MAY also choose to remove any
non-desirable alternative SCID in the answer. The answerer MUST NOT
add any alternative SCID in send direction in the answer that were
not present in the offer receive direction. The answerer MUST be
prepared to receive any of the receive direction SCID alternatives,
and MAY send any of the send direction alternatives that are kept in
the answer.

An answerer that receives an offer with simulcast that lists a
number of simulcast streams MAY reduce the number of simulcast
streams in the answer, but MUST NOT add simulcast streams.

An answerer that receives an offer without RTP stream
pause/resume capability MUST NOT mark any simulcast streams as
initially paused in the answer.

An RTP stream pause/resume capable answerer that receives an
offer with RTP stream pause/resume capability MAY mark any SCID that
refer to pause/resume capable formats as initially paused in the
answer.

An answerer that receives indication in an offer of an SCID being
initially paused SHOULD mark that SCID as initially paused also in
the answer, regardless of direction, unless it has good reason for
the SCID not being initially paused. One such reason could, for
example, be that the answerer would otherwise initially not receive
any media of that type at all.

An offerer that receives an answer without "a=simulcast" MUST NOT
use simulcast towards the answerer. An offerer that receives an
answer with "a=simulcast" without any SCID in a specified direction
MUST NOT use simulcast in that direction.

An offerer that receives an answer where some SCID alternatives
are kept MUST be prepared to receive any of the kept send direction
SCID alternatives, and MAY send any of the kept receive direction
SCID alternatives.

An offerer that receives an answer where some of the SCID are
removed compared to the offer MAY release the corresponding
resources (codec, transport, etc.) in its receive direction and MUST
NOT send any RTP packets corresponding to the removed SCID.

An offerer that offered some of its SCID as initially paused and
that receives an answer that does not indicate RTP stream
pause/resume capability MUST NOT initially pause any simulcast
streams.

An offerer with RTP stream pause/resume capability that receives
an answer where some SCID are marked as initially paused SHOULD
initially pause those RTP streams regardless of whether they were marked as
initially paused also in the offer, unless it has good reason for
those RTP streams not being initially paused. One such reason could,
for example, be that the answerer would otherwise initially not
receive any media of that type at all.

Offers and answers inside an existing session follow the rules
for initial session negotiation, with the additional restriction
that any SCID marked as initially paused in such offer or answer
MUST already be paused, thus a new offer/answer MUST NOT replace use
of RTP stream pause/resume in the
session. Session modification restrictions in section 6.5 of RTP payload format restrictions
also apply.

This document does not define the use of "a=simulcast" in
declarative SDP, partly motivated by use of the simulcast format identification
not being defined for use in declarative SDP. If concrete use cases
for simulcast in declarative SDP are identified in the future, we
expect that additional specifications will address such use.

Note: It may not be beneficial for declarative use to be
limited to a single media source per "m=" line, as elaborated
further in .

Simulcast RTP streams MUST be related on RTP level through RtpStreamId, as specified in the
SDP "a=simulcast" attribute parameters.
This is sufficient as long as there is only a single media source per
SDP media description. When using BUNDLE, where
multiple SDP media descriptions jointly specify a single RTP session,
the SDES MID identification mechanism in BUNDLE allows relating RTP
streams back to individual media descriptions, after which the above
described RtpStreamId relations can be used. Use of the RTP header extension for both MID and
RtpStreamId identifications can be important to ensure rapid initial
reception, required to correctly interpret and process the RTP
streams. Implementers of this specification MUST support the RTCP
source description (SDES) item method and SHOULD support RTP header
extension method to signal RtpStreamId on RTP level.

RTP streams MUST only use a single alternative SCID at a time
(based on RTP timestamps), but MAY change format on a per-RTP packet
basis. This corresponds to the existing (non-simulcast) SDP
offer/answer case when multiple formats are included on the "m=" line
in the SDP answer.

These examples describe a client to video conference service, using
a centralized media topology with an RTP mixer.

Alice is calling in to the mixer with a simulcast-enabled client
capable of a single media source per media type. The client can send
a simulcast of 2 video resolutions and frame rates: HD 1280x720p
30fps and thumbnail 320x180p 15fps. This is defined below using the
"imageattr". In this example, only the
"pt" "a=rid" parameter is used, effectively achieving a 1:1 mapping
between RtpStreamId and media formats (RTP payload types), to
describe simulcast stream formats.

Alice's Offer:

The only thing in the SDP that indicates simulcast capability is
the line in the video media description containing the "simulcast"
attribute. The included "a=fmtp" and "a=imageattr" parameters
indicate that sent simulcast streams can differ in video
resolution. The RTP header extension for RtpStreamId is offered to
avoid issues with the initial binding between RTP streams (SSRCs)
and the RtpStreamId identifying the simulcast stream and its
format.

The Answer from the server indicates that it too is simulcast
capable. Should it not have been simulcast capable, the
"a=simulcast" line would not have been present and communication
would have started with the media negotiated in the SDP. Also the
usage of the RtpStreamId RTP header extension is accepted.

Since the server is the simulcast media receiver, it reverses the
direction of the "simulcast" and "rid" attribute parameters.

Fred is calling in to the same conference as in the example above
with a two-camera, two-display system, thus capable of handling two
separate media sources in each direction, where each media source is
simulcast-enabled in the send direction. Fred's client is restricted
to a single media source per media description.

The first two simulcast streams for the first media source use
different codecs, H264-SVC and H264. These two simulcast streams also have
a temporal dependency. Two different video codecs, VP8 and H264, are offered as alternatives
for the third simulcast stream for the first media source. Only the
highest fidelity simulcast stream is sent from start, the lower
fidelity streams being initially paused.

The second media source is offered with three different simulcast
streams. All video streams of this second media source are loss
protected by RTP retransmission. Also
here, all but the highest fidelity simulcast stream are initially
paused.

Fred's client is also using BUNDLE to send all RTP streams from
all media descriptions in the same RTP session on a single media
transport. Although using many different simulcast streams in this
example, the use of RtpStreamId as simulcast stream identification
enables use of a low number of RTP payload types. Note that both
BUNDLE and "a=rid" recommend using
the RTP header extension for carrying
these RTP stream identification fields, which is consequently also
included in the SDP. Note also that for "a=rid", the corresponding
SDES attribute is named RtpStreamId.

Note: Empty lines in the SDP above are added only for
readability and would not be present in an actual SDP.

This section discusses what the different entities in a simulcast
media path can expect to happen on RTP level. This is explored from
source to sink by starting in an endpoint with a media source that is
simulcasted to an RTP middlebox. That RTP middlebox both sends media
sources to other RTP middleboxes (cascaded middleboxes) and
selects some simulcast format of the media source to send to
receiving endpoints. Different types of RTP middleboxes and their usage
of the different simulcast formats result in several different
behaviors.

The most straightforward simulcast case is the RTP streams being
emitted from the endpoint that originates a media source. When
simulcast has been negotiated in the sending direction, the endpoint
can transmit up to the number of RTP streams needed for the negotiated
simulcast streams for that media source. Each RTP stream (SSRC) is
identified by associating it with
an RtpStreamId SDES item, transmitted in RTCP and possibly also as an
RTP header extension. In cases where multiple media sources have been
negotiated for the same RTP session and thus BUNDLE is used,
the MID SDES item will also be sent, similarly to the RtpStreamId.

An RTP stream may not be continuously transmitted, for any of
the following reasons: it is temporarily paused using Pause/Resume, sender-side application logic
temporarily pauses it, or there is a lack of network resources to transmit this
simulcast stream. However, all simulcast streams that have been
negotiated have an active and maintained SSRC (at least in regular RTCP
reports), even if no RTP packets are currently transmitted. The
relation between an RTP Stream (SSRC) and a particular simulcast
stream is not expected to change, except in exceptional situations
such as SSRC collisions. The usage of MID and
RtpStreamId should enable the receiver to correctly identify the RTP
streams even after an SSRC change.

RTP streams in a multi-party RTP session can be used in multiple
different ways, when the session utilizes simulcast at least on the
media source to middlebox legs. This is to a large degree due to the
different RTP middlebox behaviors, but also the needs of the
application. This text assumes that the RTP middlebox will select a
media source and choose which simulcast stream for that media source
to deliver to a specific receiver. In many cases, at most one
simulcast stream per media source will be forwarded to a particular
receiver at any instant in time, even if the selected simulcast stream
may vary. For cases where this does not hold due to application needs,
the RTP stream aspects will fall under the middlebox to middlebox
case.

The selection of which simulcast streams to forward towards the
receiver is application specific. However, in conferencing
applications, active speaker selection is common. In case the number
of media sources possible to forward, N, is less than the total number
of media sources available in a multimedia session, the current and
previous speakers (up to N in total) are often the ones forwarded. To
avoid the need for media specific processing to determine the current
speaker(s) in the RTP middlebox, the endpoint providing a media source
may include metadata, such as the RTP Header
Extension for Client-to-Mixer Audio Level Indication.

The possibilities for stream switching are media type specific, but
for media types with significant interframe dependencies in the
encoding, like most video coding, the switching needs to be made at
suitable switching points in the media stream that break or otherwise
deal with the dependency structure. Even if switching points can be
included periodically, it is common to use mechanisms like Full Intra Requests to request switching
points from the endpoint performing the encoding of the media
source.

Inclusion of the RtpStreamId SDES item for an SSRC in the middlebox
to receiver direction should only occur when use of RtpStreamId has
been negotiated in that direction. It is worth noting that one can
signal multiple RtpStreamIds when simulcast signalling indicates only
a single simulcast stream, allowing one to use all of the RtpStreamIds
as alternatives for that simulcast stream. One reason for including
the RtpStreamId in the middlebox to receiver direction for an RTP
stream is to let the receiver know which restrictions apply to the
currently delivered RTP stream. In case the RtpStreamId is negotiated
to be used, it is important to remember that the used identifiers will
be specific to each signalling session. Even if the central entity can
attempt to coordinate, it is likely that the RtpStreamIds need to be
translated to the leg specific values. The cases below assume as a
baseline that RtpStreamId is not used in the mixer to receiver
direction.

This section discusses the behavior in cases where the RTP
middlebox behaves like the Media-Switching Mixer (Section 3.6.2) in
RTP Topologies. The fundamental aspect
here is that the media sources delivered from the middlebox will be
the mixer's conceptual or functional ones. For example, one media
source may be the main speaker in high resolution video, while a
number of other media sources are thumbnails of each
participant.

As a result, the RTP stream produced by the mixer is
one that switches between a number of received incoming RTP streams
for different media sources and in different simulcast versions. The
mixer selects the media source to be sent as one of the RTP streams,
and then selects among the available simulcast streams for the most
appropriate one. The selection criteria include available bandwidth
on the mixer to receiver path and restrictions based on the
functional usage of the RTP stream delivered to the receiver. An
example of the latter is that it is unnecessary to forward a full
HD video to a receiver if the display area is just a thumbnail.
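
As a rough illustration of the selection criteria just described, the
following Python sketch picks a simulcast stream for one of the mixer's
media sources. The class SimulcastStream, the function
select_simulcast_stream, the field names, and the bitrate and resolution
figures are all hypothetical and not defined by this memo; an actual
mixer would apply its own policy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastStream:
    """One received simulcast stream of a media source (illustrative fields)."""
    rid: str           # RtpStreamId used on the incoming leg (assumed label)
    height: int        # vertical resolution in pixels
    bitrate_bps: int   # current encoded bitrate in bits per second
    paused: bool = False


def select_simulcast_stream(
    streams: Sequence[SimulcastStream],
    available_bps: int,
    max_height: int,
) -> Optional[SimulcastStream]:
    """Return the highest-bitrate stream that is not paused, fits the
    bandwidth of the mixer to receiver path, and respects the display
    restriction. Return None if no stream qualifies."""
    candidates = [
        s for s in streams
        if not s.paused
        and s.bitrate_bps <= available_bps
        and s.height <= max_height
    ]
    if not candidates:
        return None  # nothing suitable to forward right now
    return max(candidates, key=lambda s: (s.bitrate_bps, s.height))


# Hypothetical set of simulcast streams received for one media source.
streams = [
    SimulcastStream("hd", 1080, 2_500_000),
    SimulcastStream("sd", 360, 500_000),
    SimulcastStream("thumb", 180, 150_000),
]

# Same bandwidth, different functional usage of the outgoing RTP stream.
main_view = select_simulcast_stream(streams, available_bps=3_000_000, max_height=1080)
thumbnail = select_simulcast_stream(streams, available_bps=3_000_000, max_height=180)
```

With these assumed values, the full-size view is served from the "hd"
stream, while the thumbnail is served from "thumb" even though the
available bandwidth would permit a larger stream.
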
Thus, restrictions may exist to not allow some simulcast streams to
be forwarded for some of the mixer's media sources.

This will result in a single RTP stream being used for a
particular one of the RTP mixer's media sources. This RTP stream is at
any point in time a selection of one particular RTP stream arriving
to the mixer, where the RTP header field values are rewritten to
provide a consistent, single RTP stream. If the RTP mixer doesn't
receive any incoming stream matched to this media source, the SSRC
will not transmit, but be kept alive using RTCP. The SSRC and thus
RTP stream for the mixer's media source is expected to be long term
stable. It will only be changed by signalling or other disruptive
events. Note that although the above talks about a single RTP
stream, there can in some cases be multiple RTP streams carrying the
selected simulcast stream for the originating media source,
including repair or other auxiliary RTP streams.

The mixer may communicate the identity of the originating media
source to the receiver by including the CSRC field with the
originating media source's SSRC value. Note that due to the
possibility that the RTP mixer switches between simulcast versions
of the media source, the CSRC value may change, even if the media
source is kept the same.

It is important to note that any MID SDES item from the
originating media source needs to be removed and not be associated
with the RTP stream's SSRC. This is because there is nothing in the
signalling between the mixer and the receiver that is structured
around the originating media sources, only the mixer's media
sources. If such MID values were associated with the SSRC, the receiver
would likely believe that there has been an SSRC collision, and that
the RTP stream is spurious as it doesn't carry the identifiers used
to relate it to the correct context. However, this is not true for
CSRC values, as long as they are never used as SSRC. In these cases
one could provide CNAME and MID as SDES items. A receiver could use
this to determine which CSRC values are associated with the
same originating media source.

If RtpStreamIds are used in this scenario, it should be noted
that the RtpStreamId on a particular SSRC will change based on the
actual simulcast stream selected for switching. These RtpStreamId
identifiers will be local to this leg's signalling context. In
addition, the defined RtpStreamIds and their parameters need to
cover all the media sources and simulcast streams that can be
switched into this media source.

This section discusses the behavior in cases where the RTP
middlebox behaves like the Selective Forwarding Middlebox (Section
3.7) in RTP Topologies. Applications
of this type of RTP middlebox result in each originating
media source having a corresponding media source on the leg
between the middlebox and the receiver. An SFM could go as far as
exposing all the simulcast streams for a media source; however, this
section will focus on having a single simulcast stream that can
contain any of the simulcast formats. This section will assume that
the SFM projection mechanism works on media source level, and maps
one of the media source's simulcast streams onto one RTP stream from
the SFM to the receiver.

This usage will result in the individual RTP stream(s) for
one media source switching between being active and paused, based on
the subset of media sources the SFM wants to provide the receiver
for the moment. With SFMs there is no reason to use CSRC to
indicate the originating stream, as there is a one to one media
source mapping. If the application requires knowing the simulcast
version received to function well, then RtpStreamId should be
negotiated on the SFM to receiver leg. Which simulcast stream
is being forwarded is not made explicit unless RtpStreamId is used
on the leg.

Any MID SDES items being sent by the SFM to the receiver are only
those agreed between the SFM and the receiver, and no MID values
from the originating side of the SFM are to be forwarded.

An SFM could expose corresponding RTP streams for all the media
sources and their simulcast streams, and then, for any media source
that is to be provided, forward one selected simulcast stream.
However, this is not recommended as it would unnecessarily increase
the number of RTP streams and require the receiver to detect
switching between simulcast streams in a timely manner. The above usage requires the
same SFM functionality for switching, while avoiding the
uncertainties of timely detecting that an RTP stream ends. The
benefit of exposing all streams would be that the received simulcast stream is
implicitly indicated by which RTP stream is active for a media
source. However, using RtpStreamId to make this explicit also
exposes which alternative format is used. The conclusion is that
using one RTP stream per simulcast stream is unnecessary. The issue
with timely detecting end of streams, regardless of whether they are
stopped temporarily or long term, is that there is no explicit
indication that the transmission has intentionally been stopped. The
RTCP based Pause and Resume mechanism
includes a PAUSED indication that provides the last RTP sequence
number transmitted prior to the pause. The timeliness
of this solution depends on when delivery using RTCP can occur in
relation to the transmission of the last RTP packet. If no explicit
information is provided at all, then detection based on
non-increasing RTCP SR field values and timers needs to be used to
determine pause in RTP packet delivery. As a result, when the last
RTP packet arrives (if it arrives), one can usually not determine
that it is the last; that only becomes apparent later.

This relates to the transmission of simulcast streams between RTP
middleboxes or other usages where one wants to enable the delivery of
multiple simultaneous simulcast streams per media source, but the
transmitting entity is not the originating endpoint. For a particular
direction between middlebox A and B, this looks very similar to the
originating to middlebox case on a media source basis. However, in
this case there are usually multiple media sources, originating from
multiple endpoints. This can create situations where limitations in
the number of simultaneously received media streams can arise, for
example due to limitation in network bandwidth. In this case, a subset
of not only the simulcast streams, but also media sources can be
selected. As a result, individual RTP streams can become
paused at any point and later be resumed based on various
criteria.

The MIDs used between A and B are the ones agreed between these two
entities in signalling. The RtpStreamId values will also be provided
to ensure explicit information about which simulcast stream they are.
The RTP stream to MID and RtpStreamId associations should here be long
term stable.

Simulcast is in this memo defined as the act of sending multiple
alternative encoded streams of the same underlying media source. When
transmitting multiple independent streams that originate from the same
source, it could potentially be done in several different ways using
RTP. A general discussion on considerations for use of the different RTP
multiplexing alternatives can be found in Guidelines for
Multiplexing in RTP. Discussion and clarification on how to
handle multiple streams in an RTP session can be found in .

The network aspects that are relevant for simulcast are:

When using simulcast it might be
of interest to prioritize a particular simulcast stream, rather than
applying equal treatment to all streams. For example, lower bit-rate
streams may be prioritized over higher bit-rate streams to minimize
congestion or packet losses in the low bit-rate streams. Thus, there
is a benefit in using a simulcast solution with good QoS support.

Using multiple RTP sessions incurs
more cost for NAT/FW traversal unless they can re-use the same
transport flow, which can be achieved by Multiplexing
Negotiation Using SDP Port Numbers.

Use of multiple simulcast streams can require a significant amount
of network resources. If the amount of available network resources
varies during an RTP session such that it does not match what is
negotiated in SDP, the bitrate used by the different simulcast streams
may have to be reduced dynamically. Which simulcast streams to
prioritize when allocating available bitrate among the simulcast
streams in such adaptation SHOULD be taken from the simulcast stream
order on the "a=simulcast" line. Simulcast streams that have
pause/resume capability and that would be given such a low bitrate by
the adaptation process that they are no longer considered useful can
be temporarily paused until the limiting condition clears.

The chosen approach has a limitation that relates to the use of a
single RTP session for all simulcast formats of a media source, which
comes from sending all simulcast streams related to a media source under
the same SDP media description.

It is not possible to use different simulcast streams on different
media transports, limiting the possibilities to apply different QoS to
different simulcast streams. When using unicast, QoS mechanisms based on
individual packet marking are feasible, since they do not require
separation of simulcast streams into different RTP sessions to apply
different QoS.

It is also not possible to separate different simulcast streams into
different multicast groups to allow a multicast receiver to pick the
stream it wants, rather than receive all of them. In this case, the only
reasonable implementation is to use different RTP sessions for each
multicast group so that reporting and other RTCP functions operate as
intended. Such simulcast usage in multicast context is out of scope for
the current document and would require additional specification.

This document requests to register a new media-level SDP attribute,
"simulcast", in the "att-field (media level only)" registry within the
SDP parameters registry, according to the procedures of and .

Contact name, email: IETF, contacted via
mmusic@ietf.org, or a successor address designated by IESG

Attribute name: simulcast

Long-form attribute name: Simulcast stream
description

Charset dependent: No

Attribute value: See of RFC
XXXX.

Purpose: Signals simulcast capability for a set of RTP
streams

MUX category: NORMAL

Note to RFC Editor: Please replace "RFC XXXX" with the assigned
number of this RFC.

The simulcast capability, configuration attributes, and parameters
are vulnerable to attacks in signaling.

A false inclusion of the "a=simulcast" attribute may result in
simultaneous transmission of multiple RTP streams that would otherwise
not be generated. The impact is limited by the media description joint
bandwidth, shared by all simulcast streams irrespective of their number.
There may however be a large number of unwanted RTP streams that will
impact the share of bandwidth allocated for the originally wanted RTP
stream.

A hostile removal of the "a=simulcast" attribute will result in
simulcast not being used.

Neither of the above is likely to have any major consequences, and both can
be mitigated by signaling that is at least integrity protected and source
authenticated, to prevent an attacker from changing it.

Security considerations related to the use of "a=rid" and the
RtpStreamId SDES item are covered in
and . There are no additional
security concerns related to their use in this specification.

Morgan Lindqvist and Fredrik Jansson, both from Ericsson,
contributed important material to the first versions of this
document. Robert Hansen and Cullen Jennings, from Cisco, Peter Thatcher,
from Google, and Adam Roach, from Mozilla, contributed significantly to
subsequent versions.

The authors would like to thank Bernard Aboba, Thomas Belling, Roni
Even, and Adam Roach for the feedback they provided during the
development of this document.

NOTE TO RFC EDITOR: Please remove this section prior to
publication.

- Added section on RTP Aspects.
- Added a requirement (5-4) that capability exchange must be
  capable of handling multi RTP stream cases.
- Added extmap attribute also on first signalling example as it
  is a recommended mechanism.
- Clarified the definition of the simulcast attribute and how
  simulcast streams relate to simulcast formats and SCIDs.
- Updated References list and moved around some references
  between informative and normative categories.
- Editorial improvements and corrections.
- Aligned with recent changes in draft-ietf-mmusic-rid and
  draft-ietf-avtext-rid.
- Modified the SDP offer/answer section to follow the generally
  accepted structure, also adding a brief text on modifying the
  session that is aligned with draft-ietf-mmusic-rid.
- Improved text around simulcast stream identification (as
  opposed to the simulcast stream itself) to consistently use the
  acronym SCID and defined that in the Terminology section.
- Changed references for RTP-level pause/resume and VP8 payload
  format that are now published as RFC.
- Improved IANA registration text.
- Removed unused reference to
  draft-ietf-payload-flexible-fec-scheme.
- Editorial improvements and corrections.
- Changed to only use RID identification, as was consensus during
  IETF 94.
- ABNF improvements.
- Clarified offer-answer rules for initially paused streams.
- Changed references for RTP topologies and RTP taxonomy
  documents that are now published as RFC.
- Added reference to the new RID draft in AVTEXT.
- Re-structured section 6 to provide an easy reference by the
  updated IANA section.
- Added a sub-section 7.1 with a discussion of bitrate
  adaptation.
- Editorial improvements.
- Removed text on multicast / broadcast from use cases, since it
  is not supported by the solution.
- Removed explicit references to unified plan draft.
- Added possibility to initiate simulcast streams in paused
  mode.
- Enabled an offerer to offer multiple stream identification (pt
  or rid) methods and have the answerer choose which to use.
- Added a preference indication also in send direction
  offers.
- Added a section on limitations of the current proposal,
  including identification method specific limitations.
- Relying on the new RID solution for codec constraints and
  configuration identification. This has resulted in changes in
  syntax to identify if pt or RID is used to describe the simulcast
  stream.
- Renamed simulcast version and simulcast version alternative to
  simulcast stream and simulcast format respectively, and improved
  definitions for them.
- Clarification that it is possible to switch between simulcast
  version alternatives, but that only a single one is used at any
  point in time.
- Changed the definition so that ordering of simulcast formats
  for a specific simulcast stream does have a preference order.
- No changes. Only preventing expiry.
- Added this appendix.