Javascript Session Establishment Protocol

Google, 747 6th Ave S, Kirkland, WA 98033, USA, justin@uberti.name
Cisco, 170 West Tasman Drive, San Jose, CA 95134, USA, fluffy@iii.ca
Mozilla, 331 Evelyn Ave, Mountain View, CA 94041, USA, ekr@rtfm.com
RAI
This document describes the mechanisms for allowing a Javascript
application to control the signaling plane of a multimedia session via
the interface specified in the W3C RTCPeerConnection API, and discusses
how this relates to existing signaling protocols.

This document describes how the W3C WEBRTC RTCPeerConnection
interface is used to
control the setup, management and teardown of a multimedia session.

The thinking behind WebRTC call setup has been to fully specify and
control the media plane, but to leave the signaling plane up to the
application as much as possible. The rationale is that different
applications may prefer to use different protocols, such as the
existing SIP or Jingle call signaling protocols, or something custom
to the particular application, perhaps for a novel use case. In this
approach, the key information that needs to be exchanged is the
multimedia session description, which specifies the
transport and media configuration information necessary to establish
the media plane.

With these considerations in mind, this document describes the
Javascript Session Establishment Protocol (JSEP) that allows for full
control of the signaling state machine from Javascript.
JSEP removes the browser almost entirely from the core signaling flow, which
is instead handled by the Javascript making use of two interfaces:
(1) passing in local and remote session descriptions and (2) interacting
with the ICE state machine.
In this document, the use of JSEP is described as if it always
occurs between two browsers. Note, though, that in many cases it will
actually be between a browser and some kind of server, such as a
gateway or MCU. This distinction is invisible to the browser; it just
follows the instructions it is given via the API.

JSEP's handling of session descriptions is simple and
straightforward. Whenever an offer/answer exchange is needed, the
initiating side creates an offer by calling a createOffer() API. The
application optionally modifies that offer, and then uses it to set up
its local config via the setLocalDescription() API. The offer is then
sent off to the remote side over its preferred signaling mechanism
(e.g., WebSockets); upon receipt of that offer, the remote party
installs it using the setRemoteDescription() API.

To complete the offer/answer exchange, the remote party uses the createAnswer() API
to generate an appropriate answer, applies it using the
setLocalDescription() API, and sends the answer back to the initiator over
the signaling channel. When the initiator gets that answer, it installs
it using the setRemoteDescription() API, and initial setup is complete. This
process can be repeated for additional offer/answer exchanges.

Regarding ICE, JSEP decouples the
ICE state machine from the overall signaling state machine, as the ICE
state machine must remain in the browser, because only the browser has
the necessary knowledge of candidates and other transport info.
Performing this separation also provides additional flexibility; in
protocols that decouple session descriptions from transport, such as
Jingle, the session description can be sent immediately and the
transport information can be sent when available. In protocols
that don't, such as SIP, the information can be used in the aggregated
form. Sending transport information separately can allow for faster
ICE and DTLS startup, since ICE checks can start as soon as any
transport information is available rather than waiting for all of
it.

Through its abstraction of signaling, the JSEP approach does
require the application to be aware of the signaling process. While
the application does not need to understand the contents of session
descriptions to set up a call, the application must call the right
APIs at the right times, convert the session descriptions and ICE
information into the defined messages of its chosen signaling
protocol, and perform the reverse conversion on the messages it
receives from the other side.

One way to mitigate this is to provide a Javascript library that
hides this complexity from the developer; said library would implement
a given signaling protocol along with its state machine and
serialization code, presenting a higher level call-oriented interface
to the application developer. For example, libraries exist to
adapt the JSEP API into an API suitable for SIP or XMPP.
Thus, JSEP provides greater control
for the experienced developer without forcing any additional
complexity on the novice developer.

One approach that was considered instead of JSEP was to include a
lightweight signaling protocol. Instead of providing session
descriptions to the API, the API would produce and consume messages
from this protocol. While providing a more high-level API, this put
more control of signaling within the browser, forcing the browser to
have to understand and handle concepts like signaling glare. In
addition, it prevented the application from driving the state machine
to a desired state, as is needed in the page reload case.

A second approach that was considered but not chosen was to
decouple the management of the media control objects from session
descriptions, instead offering APIs that would control each component
directly. This was rejected based on a feeling that requiring exposure
of this level of complexity to the application programmer would not be
beneficial; it would result in an API where even a simple example
would require a significant amount of code to orchestrate all the
needed interactions, as well as creating a large API surface that
needed to be agreed upon and documented. In addition, these API points
could be called in any order, resulting in a more complex set of
interactions with the media subsystem than the JSEP approach, which
specifies how session descriptions are to be evaluated and
applied.

One variation on JSEP that was considered was to keep the basic
session description-oriented API, but to move the mechanism for
generating offers and answers out of the browser. Instead of providing
createOffer/createAnswer methods within the browser, this approach
would instead expose a getCapabilities API which would provide the
application with the information it needed in order to generate its
own session descriptions. This increases the amount of work that the
application needs to do; it needs to know how to generate session
descriptions from capabilities, and especially how to generate the
correct answer from an arbitrary offer and the supported capabilities.
While this could certainly be addressed by using a library like the
one mentioned above, it basically forces the use of said library even
for a simple example. Providing createOffer/createAnswer avoids this
problem, but still allows applications to generate their own
offers/answers (to a large extent) if they choose, using the
description generated by createOffer as an indication of the browser's
capabilities.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in .

JSEP does not specify a particular signaling model or state
machine, other than the generic need to exchange session
descriptions in the fashion described by (offer/answer) in order for both sides of the
session to know how to conduct the session. JSEP provides mechanisms
to create offers and answers, as well as to apply them to a session.
However, the browser is totally decoupled from the actual mechanism by
which these offers and answers are communicated to the remote side,
including addressing, retransmission, forking, and glare handling.
These issues are left entirely up to the application; the application
has complete control over which offers and answers get handed to the
browser, and when.

In order to establish the media plane, the user agent needs
specific parameters to indicate what to transmit to the remote side,
as well as how to handle the media that is received. These parameters
are determined by the exchange of session descriptions in offers and
answers, and there are certain details to this process that must be
handled in the JSEP APIs.

Whether a session description applies to the local side or the
remote side affects the meaning of that description. For example, the
list of codecs sent to a remote party indicates what the local side is
willing to receive, which, when intersected with the set of codecs the
remote side supports, specifies what the remote side should send.
However, not all parameters follow this rule; for example, the DTLS-SRTP
parameters sent to a remote party
indicate what certificate the local side will use in DTLS setup, and
thereby what the remote party should expect to receive; the remote party
will have to accept these parameters, with no option to choose different
values.
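
The codec rule above can be illustrated with a minimal sketch. The function name and the codec names are hypothetical and chosen only for illustration; a real session description carries many more parameters per codec, and this sketch assumes the local list is in preference order.

```typescript
// Sketch only: codec names are illustrative, not taken from a real offer.
// The local description lists what the local side is willing to receive;
// intersecting it with what the remote side supports gives what the remote
// side should send.
function codecsRemoteMaySend(
  localReceivable: string[],
  remoteSupported: string[]
): string[] {
  // Keep the local ordering, assumed here to be the local preference order.
  return localReceivable.filter(
    (codec) => remoteSupported.indexOf(codec) !== -1
  );
}

const sendableCodecs = codecsRemoteMaySend(
  ["opus", "G722", "PCMU"], // what the local side is willing to receive
  ["PCMU", "opus"]          // what the remote side supports
);
```

The DTLS-SRTP parameters mentioned above are deliberately not modeled this way, since they are declared rather than negotiated by intersection.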
In addition, various RFCs put different conditions on the format of
offers versus answers. For example, an offer may propose an arbitrary
number of media streams (i.e. m= sections), but an answer must contain
the exact same number as the offer.

Lastly, while the exact media parameters are known only after
an offer and an answer have been exchanged, it is possible for the
offerer to receive media after they have sent an offer and before they
have received an answer. To properly process incoming media in this
case, the offerer's media handler must be aware of the details of the
offer before the answer arrives.

Therefore, in order to handle session descriptions properly, the
user agent needs:

o  To know if a session description pertains to the local or
   remote side.

o  To know if a session description is an offer or an answer.

o  To allow the offer to be specified independently of the
   answer.

JSEP addresses this by adding both setLocalDescription and
setRemoteDescription methods and having session description objects
contain a type field indicating the type of session description being
supplied. This satisfies the requirements listed above for both the
offerer, who first calls setLocalDescription(sdp [offer]) and then
later setRemoteDescription(sdp [answer]), as well as for the answerer,
who first calls setRemoteDescription(sdp [offer]) and then later
setLocalDescription(sdp [answer]).

JSEP also allows for an answer to be treated as provisional by the
application. Provisional answers provide a way for an answerer to
communicate initial session parameters back to the offerer, in order
to allow the session to begin, while allowing a final answer to be
specified later. This concept of a final answer is important to the
offer/answer model; when such an answer is received, any extra
resources allocated by the caller can be released, now that the exact
session configuration is known. These "resources" can include things
like extra ICE components, TURN candidates, or video decoders.
Provisional answers, on the other hand, do not result in any such
deallocation; as a result, multiple dissimilar provisional answers can be
received and applied during call setup.

In , the constraint at the signaling
level is that only one offer can be outstanding for a given session,
but at the media stack level, a new offer can be generated at any
point. For example, when using SIP for signaling, if one offer is
sent, then cancelled using a SIP CANCEL, another offer can be
generated even though no answer was received for the first offer. To
support this, the JSEP media layer can provide an offer via the
createOffer() method whenever the
Javascript application needs one for the signaling.
The answerer can send back zero or more provisional answers, and finally end the
offer-answer exchange by sending a final answer. The state machine for
this is as follows:

Aside from these state transitions there is no other difference
between the handling of provisional ("pranswer") and final ("answer")
answers.

In the WebRTC specification, session descriptions are formatted as
SDP messages. While this format is not optimal for manipulation from
Javascript, it is widely accepted, and frequently updated with new
features. Any alternate encoding of session descriptions would have to
keep pace with the changes to SDP, at least until the time that this
new encoding eclipsed SDP in popularity. As a result, JSEP currently
uses SDP as the internal representation for its session
descriptions.

However, to simplify Javascript processing, and provide for future
flexibility, the SDP syntax is encapsulated within a
SessionDescription object, which can be constructed from SDP, and be
serialized out to SDP. If future specifications agree on a JSON format
for session descriptions, we could easily enable this object to
generate and consume that JSON.

Other methods may be added to SessionDescription in the future to
simplify handling of SessionDescriptions from Javascript. In the
meantime, Javascript libraries can be used to perform these
manipulations.

Note that most applications should be able to treat the
SessionDescriptions produced and consumed by these various API calls
as opaque blobs; that is, the application will not need to read or
change them. The W3C WebRTC API specification will provide appropriate
APIs to allow the application to control various session parameters,
which will provide the necessary information to the browser about what
sort of SessionDescription to produce.

JSEP gathers ICE candidates as needed by the application.
Collection of ICE candidates is referred to as a gathering phase,
and this is triggered either by the addition of a new or recycled m=
line to the local session description, or new ICE credentials in the
description, indicating an ICE restart. Use of new ICE credentials can
be triggered explicitly by the application, or implicitly by the
browser in response to changes in the ICE configuration.
When the ICE configuration changes in a way that requires a new
gathering phase, a 'needs-ice-restart' bit is set. When this bit is
set, calls to the createOffer API will generate new ICE credentials.
This bit is cleared by a call to the setLocalDescription API with
new ICE credentials from either an offer or an answer, i.e., from either
a local- or remote-initiated ICE restart.

When a new
gathering phase starts, the ICE Agent will notify the application that
gathering is occurring through an event. Then, when each new ICE
candidate becomes available, the ICE Agent will supply it to the
application via an additional event; these candidates will also
automatically be added to the current and/or pending local session
description. Finally, when all candidates have been gathered, an
event will be dispatched to signal that the gathering process is
complete.

Note that gathering phases only gather the candidates needed by
new/recycled/restarting m= lines; other m= lines continue to use their
existing candidates. Also, when bundling is active, candidates are
only gathered (and exchanged) for the m= lines referenced in BUNDLE-tags,
as described in .
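
The behavior of the 'needs-ice-restart' bit described above can be sketched as a small model. The class, method names, and credential strings are hypothetical and exist only to show when the bit is set, consulted, and cleared; they are not part of any browser API.

```typescript
// Sketch only: a toy model of the 'needs-ice-restart' bit.
interface IceCredsSketch { ufrag: string; pwd: string; }

class IceRestartSketch {
  private needsIceRestart = false;
  private generation = 0;
  private current: IceCredsSketch;

  constructor() {
    this.current = this.freshCreds();
  }

  private freshCreds(): IceCredsSketch {
    this.generation += 1;
    return { ufrag: "ufrag" + this.generation, pwd: "pwd" + this.generation };
  }

  // A configuration change that requires a new gathering phase sets the bit.
  configurationChanged(): void {
    this.needsIceRestart = true;
  }

  // While the bit is set, offers carry newly generated credentials.
  createOfferCreds(): IceCredsSketch {
    return this.needsIceRestart ? this.freshCreds() : this.current;
  }

  // Applying a local description with new credentials clears the bit.
  // Returns true when this starts a new gathering phase.
  setLocalCreds(creds: IceCredsSketch): boolean {
    const isNew =
      creds.ufrag !== this.current.ufrag || creds.pwd !== this.current.pwd;
    if (isNew) {
      this.current = creds;
      this.needsIceRestart = false;
    }
    return isNew;
  }
}

const restartAgent = new IceRestartSketch();
const credsBefore = restartAgent.createOfferCreds();      // bit clear: existing credentials
restartAgent.configurationChanged();                      // bit set
const credsRestart = restartAgent.createOfferCreds();     // new credentials
const startedGathering = restartAgent.setLocalCreds(credsRestart); // bit cleared
const credsAfter = restartAgent.createOfferCreds();       // same as credsRestart
```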
Candidate trickling is a technique through which a caller may
incrementally provide candidates to the callee after the initial
offer has been dispatched; the semantics of "Trickle ICE" are
defined in . This
process allows the callee to begin acting upon the call and setting
up the ICE (and perhaps DTLS) connections immediately, without
having to wait for the caller to gather all possible candidates.
This results in faster media setup in cases where gathering is not
performed prior to initiating the call.

JSEP supports optional candidate trickling by providing APIs, as
described above, that
provide control and feedback on the ICE candidate gathering process.
Applications that support candidate trickling can send the initial
offer immediately and send individual candidates when they are
notified of a new candidate; applications that do not support this
feature can simply wait for the indication that gathering is
complete, and then create and send their offer, with all the
candidates, at this time.

Upon receipt of trickled candidates, the receiving application
will supply them to its ICE Agent. This triggers the ICE Agent to
start using the new remote candidates for connectivity checks.
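
The two application strategies described above (trickling versus waiting for gathering to complete) can be contrasted in a short sketch. The event and message shapes, the function name, and the candidate strings (which use documentation addresses) are all hypothetical; only the ordering of messages is the point.

```typescript
// Sketch only: hypothetical gathering events and signaling messages.
type GatherEventSketch =
  | { kind: "candidate"; line: string }
  | { kind: "complete" };

type SignalMessageSketch =
  | { kind: "offer"; candidates: string[] }
  | { kind: "candidate"; line: string };

// Returns the messages an application would put on its signaling channel.
function planSignaling(
  events: GatherEventSketch[],
  trickle: boolean
): SignalMessageSketch[] {
  const outgoing: SignalMessageSketch[] = [];
  const gathered: string[] = [];
  // A trickling application sends its offer immediately, without candidates.
  if (trickle) outgoing.push({ kind: "offer", candidates: [] });
  for (const ev of events) {
    if (ev.kind === "candidate") {
      gathered.push(ev.line);
      // Each new candidate is forwarded as soon as the application hears of it.
      if (trickle) outgoing.push({ kind: "candidate", line: ev.line });
    } else if (!trickle) {
      // A non-trickling application waits for completion, then sends one offer.
      outgoing.push({ kind: "offer", candidates: gathered.slice() });
    }
  }
  return outgoing;
}

const gatherEvents: GatherEventSketch[] = [
  { kind: "candidate", line: "candidate:1 1 UDP 2122260223 192.0.2.10 50000 typ host" },
  { kind: "candidate", line: "candidate:2 1 UDP 1686052607 198.51.100.7 50001 typ srflx raddr 192.0.2.10 rport 50000" },
  { kind: "complete" },
];
const trickledMessages = planSignaling(gatherEvents, true);
const bundledMessages = planSignaling(gatherEvents, false);
```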
As with session descriptions, the syntax of the IceCandidate
object provides some abstraction, but can be easily converted to
and from the SDP candidate lines.

The candidate lines are the only SDP information that is
contained within IceCandidate, as they represent the only
information needed that is not present in the initial offer (i.e.,
for trickle candidates). This information is carried with the same
syntax as the "candidate-attribute" field defined for ICE. For
example:

The IceCandidate object also contains fields to
indicate which m= line it should be associated with. The m= line can be
identified in one of two ways; either by a m= line index, or a MID.
The m= line index is a zero-based index, with index N referring to the
N+1th m= line in the SDP sent by the entity which sent the IceCandidate.
The MID uses the "media stream identification" attribute, as defined in
, Section 4, to identify the m= line.
JSEP implementations creating an ICE Candidate object MUST populate
both of these fields. Implementations receiving an ICE Candidate
object MUST use the MID if present, or the m= line index, if not
(as it could have come from a non-JSEP endpoint).

Typically, when gathering ICE candidates, the browser will gather
all possible forms of initial candidates - host, server reflexive, and
relay. However, in certain cases, applications may want to have more
specific control over the gathering process, due to privacy or related
concerns. For example, one may want to suppress the use of host
candidates, to avoid exposing information about the local network, or
go as far as only using relay candidates, to leak as little
location information as possible (note that these choices come with
corresponding operational costs). To accomplish this, the browser
MUST allow the application to restrict which ICE candidates are used
in a session. Note that this filtering is applied on top of any
restrictions the browser chooses to enforce regarding which IP
addresses are permitted for the application, as discussed in
.
There may also be cases where the application wants to change which
types of candidates are used while the session is active. A prime
example is where a callee may initially want to use only relay
candidates, to avoid leaking location information to an arbitrary
caller, but then change to use all candidates (for lower operational
cost) once the user has indicated they want to take the call. For this
scenario, the browser MUST allow the candidate policy to be changed
in mid-session, subject to the aforementioned interactions with local
policy.

To administer the ICE candidate policy, the browser will
determine the current setting at the start of each gathering phase.
Then, during the gathering phase, the browser MUST NOT
expose candidates disallowed by the current policy to the
application, use them as the source of connectivity checks, or
indirectly expose them via other fields, such as the raddr/rport
attributes for other ICE candidates.
Later, if a different policy is specified by the application, the
application can apply it by kicking off a new gathering phase via an
ICE restart.
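
A minimal sketch of the filtering requirement above follows. The type and function names are hypothetical. The policy name "relay" is borrowed from the W3C RTCIceTransportPolicy enum and is an assumption here, and omitting the related address and port entirely is just one possible way to avoid indirect exposure.

```typescript
// Sketch only: candidate shape and names are illustrative.
type CandidateKindSketch = "host" | "srflx" | "relay";
type CandidatePolicySketch = "all" | "relay";

interface GatheredCandidateSketch {
  type: CandidateKindSketch;
  address: string;
  port: number;
  relatedAddress?: string; // raddr
  relatedPort?: number;    // rport
}

// Returns the candidates that may be surfaced to the application (and used
// for connectivity checks) under the policy in force for this gathering phase.
function surfaceCandidates(
  gathered: GatheredCandidateSketch[],
  policy: CandidatePolicySketch
): GatheredCandidateSketch[] {
  if (policy === "all") return gathered.slice();
  return gathered
    .filter((c) => c.type === "relay")
    // Drop raddr/rport so a disallowed candidate is not exposed indirectly.
    .map((c) => ({ type: c.type, address: c.address, port: c.port }));
}

const gatheredNow: GatheredCandidateSketch[] = [
  { type: "host", address: "192.0.2.10", port: 50000 },
  { type: "srflx", address: "198.51.100.7", port: 50001, relatedAddress: "192.0.2.10", relatedPort: 50000 },
  { type: "relay", address: "203.0.113.5", port: 60000, relatedAddress: "198.51.100.7", relatedPort: 50001 },
];
const relayOnly = surfaceCandidates(gatheredNow, "relay");
```

Because the policy is read at the start of each gathering phase, a later policy change would only take effect in this sketch when the function is run again for a new phase.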
JSEP applications typically inform the browser to begin ICE
gathering via the information supplied to setLocalDescription, as this
is where the app specifies the number of media streams, and thereby
ICE components, for which to gather candidates. However, to accelerate
cases where the application knows the number of ICE components to use
ahead of time, it may ask the browser to gather a pool of potential
ICE candidates to help ensure rapid media setup.
When setLocalDescription is eventually called, and the
browser goes to gather the needed ICE candidates, it SHOULD start by
checking if any candidates are available in the pool. If there are
candidates in the pool, they SHOULD be handed to the application
immediately via the ICE candidate event. If the pool becomes
depleted, either because a larger-than-expected number of ICE
components is used, or because the pool has not had enough time to
gather candidates, the remaining candidates are gathered as usual.
One example of where this concept is useful is an application that
expects an incoming call at some point in the future, and wants to
minimize the time it takes to establish connectivity, to avoid
clipping of initial media. By pre-gathering candidates into the pool,
it can exchange and start sending connectivity checks from these
candidates almost immediately upon receipt of a call. Note though that
by holding on to these pre-gathered candidates, which will be kept
alive as long as they may be needed, the application will consume
resources on the STUN/TURN servers it is using.
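
The pool behavior described above can be modeled as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, gathering is reduced to a synchronous callback, and the candidate strings are placeholders using a documentation address.

```typescript
// Sketch only: a toy candidate pool keyed by ICE component.
interface ComponentCandidatesSketch {
  candidates: string[];
  fromPool: boolean;
}

class CandidatePoolSketch {
  private pooled: string[][] = [];

  // Pre-gathers candidates for poolSize components (zero means no pre-gathering).
  constructor(poolSize: number, gather: () => string[]) {
    for (let i = 0; i < poolSize; i++) this.pooled.push(gather());
  }

  // Called when the number of components is finally known: pooled candidates
  // are handed out first, and any shortfall is gathered as usual.
  supply(componentsNeeded: number, gather: () => string[]): ComponentCandidatesSketch[] {
    const supplied: ComponentCandidatesSketch[] = [];
    for (let i = 0; i < componentsNeeded; i++) {
      const ready = this.pooled.shift();
      supplied.push(
        ready !== undefined
          ? { candidates: ready, fromPool: true }
          : { candidates: gather(), fromPool: false }
      );
    }
    return supplied;
  }
}

let gatherCalls = 0;
const fakeGather = (): string[] => {
  gatherCalls += 1;
  return ["candidate:" + gatherCalls + " 1 UDP 2122260223 192.0.2.1 " + (50000 + gatherCalls) + " typ host"];
};

const candidatePool = new CandidatePoolSketch(1, fakeGather); // pool sized for one component
const suppliedComponents = candidatePool.supply(2, fakeGather); // two components turn out to be needed
```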
Video size negotiation is the process through which a receiver can
use the "a=imageattr" SDP attribute to indicate
what video frame sizes it is capable of receiving. A receiver may have
hard limits on what its video decoder can process, or it may wish to
constrain what it receives due to application preferences, e.g. a
specific size for the window in which the video will be displayed.
In order to determine the limits on what video resolution a
receiver wants to receive, it will intersect its decoder hard
limits with any mandatory constraints that have been applied to
the associated MediaStreamTrack. If the decoder limits
are unknown, e.g. when using a software decoder, the mandatory
constraints are used directly. For the answerer, these mandatory
constraints can be applied to the remote MediaStreamTracks that
are created by a setRemoteDescription call, and will affect the
output of the ensuing createAnswer call. Any constraints set
after setLocalDescription is used to set the answer will result
in a new offer-answer exchange. For the offerer, because it
does not know about any remote MediaStreamTracks until it receives
the answer, the offer can only reflect decoder hard limits.
If the offerer wishes to set mandatory constraints on video
resolution, it must do so after receiving the answer, and the result
will be a new offer-answer to communicate them.

If there are no known decoder limits
or mandatory constraints, the "a=imageattr" attribute SHOULD be
omitted.

Otherwise, an "a=imageattr" attribute is created with "recv"
direction, and the resulting resolution space formed by intersecting
the decoder limits and constraints is used to specify its minimum
and maximum x= and y= values. If the intersection is the null
set, i.e., there are no resolutions that
are permitted by both the decoder and the mandatory constraints,
this SHOULD be represented by x=0 and y=0 values.

The rules here express a single set of preferences, and therefore,
the "a=imageattr" q= value is not important. It SHOULD be set to 1.0.

The "a=imageattr" field is payload type specific.
When all video codecs supported have the same capabilities, use of a
single attribute, with the wildcard payload type (*), is RECOMMENDED.
However, when the supported video codecs have differing capabilities,
specific "a=imageattr" attributes MUST be inserted for each payload
type.
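
The generation rules above can be sketched for the single-attribute, wildcard payload type case. The type and function names, and the decoder limits used in the usage lines, are assumptions for illustration only.

```typescript
// Sketch only: builds a wildcard "a=imageattr" recv attribute.
interface SizeRangeSketch { minX: number; maxX: number; minY: number; maxY: number; }

// Intersects two resolution spaces; null means the intersection is empty.
function intersectRanges(a: SizeRangeSketch, b: SizeRangeSketch): SizeRangeSketch | null {
  const r: SizeRangeSketch = {
    minX: Math.max(a.minX, b.minX),
    maxX: Math.min(a.maxX, b.maxX),
    minY: Math.max(a.minY, b.minY),
    maxY: Math.min(a.maxY, b.maxY),
  };
  return r.minX > r.maxX || r.minY > r.maxY ? null : r;
}

// Returns the attribute line, or undefined when the attribute is omitted.
function imageattrLine(
  decoderLimits?: SizeRangeSketch,
  mandatoryConstraints?: SizeRangeSketch
): string | undefined {
  let range: SizeRangeSketch | null;
  if (decoderLimits && mandatoryConstraints) {
    range = intersectRanges(decoderLimits, mandatoryConstraints);
  } else if (decoderLimits) {
    range = decoderLimits;
  } else if (mandatoryConstraints) {
    range = mandatoryConstraints;
  } else {
    return undefined; // nothing known: omit the attribute
  }
  // An empty intersection is represented by x=0 and y=0.
  if (range === null) return "a=imageattr:* recv [x=0,y=0,q=1.0]";
  return `a=imageattr:* recv [x=[${range.minX}:${range.maxX}],y=[${range.minY}:${range.maxY}],q=1.0]`;
}

// Assumed HD decoder limits, with the track constrained to at most 640x360.
const hdDecoder: SizeRangeSketch = { minX: 16, maxX: 1920, minY: 16, maxY: 1080 };
const cap360: SizeRangeSketch = { minX: 0, maxX: 640, minY: 0, maxY: 360 };
const attributeLine = imageattrLine(hdDecoder, cap360);
```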
As an example, consider a system with an HD-capable, multiformat
video decoder, where the application has constrained the received
track to at most 360p. In this case, the implementation would generate this
attribute:

a=imageattr:* recv [x=[16:640],y=[16:360],q=1.0]

This declaration indicates that the receiver is capable of decoding
any image resolution from 16x16 up to 640x360 pixels.

The attribute's defining specification treats "a=imageattr" as an advisory
field. This means that it does not absolutely constrain the video
formats that the sender can use, but gives an indication of the
preferred values.

This specification prescribes more specific behavior.
When a sender of a given MediaStreamTrack, which is producing video
of a certain resolution, receives an "a=imageattr recv"
attribute, it MUST check to see if the original resolution
meets the size criteria specified in the attribute, and adapt the
resolution accordingly by scaling (if appropriate).
Note that when considering a MediaStreamTrack that
is producing rotated video, the unrotated resolution MUST be used.
This is required regardless of whether the receiver supports
performing receive-side rotation (e.g., through CVO), as it
significantly simplifies the matching logic.

o  For an "a=imageattr recv" attribute, only size limits are
   considered. Any other values, e.g. aspect ratio, MUST be ignored.

o  When communicating with a non-JSEP endpoint, multiple relevant
   "a=imageattr recv" attributes may be received. If this occurs,
   attributes other than the one with the highest "q=" value MUST
   be ignored.

o  If an "a=imageattr recv" attribute references a different
   video codec than what has been selected for the MediaStreamTrack,
   it MUST be ignored.

o  If the original resolution matches the size limits in the
   attribute, the track MUST be transmitted untouched.

o  If the original resolution exceeds the size limits in the
   attribute, the sender SHOULD apply downscaling to the
   output of the MediaStreamTrack in order to satisfy the limits.
   Downscaling MUST NOT change the track aspect ratio.

o  If the original resolution is less than the size limits in the
   attribute, upscaling is needed, but this may
   not be appropriate in all cases. To address this concern, the
   application can set an upscaling policy for each sent track.
   For this case, if upscaling is permitted by policy, the sender
   SHOULD apply upscaling in order to provide the desired resolution.
   Otherwise, the sender MUST NOT apply upscaling.
   The sender SHOULD NOT upscale in other cases, even if the policy
   permits it.
   Upscaling MUST NOT change the track aspect ratio.

o  If there is no appropriate and permitted scaling mechanism
   that allows the received size limits to be satisfied,
   the sender MUST NOT transmit the track.

o  In the special case of receiving a maximum resolution of [0, 0],
   as described above, the sender MUST NOT transmit the track.
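
One possible reading of the sender-side rules above is sketched below as a decision function. The names are hypothetical, the caller is assumed to pass the unrotated resolution, and the sketch always applies downscaling where the text only says SHOULD. Rounding to whole pixels can shift the aspect ratio slightly, which a real implementation would need to handle.

```typescript
// Sketch only: decides how a sender treats a track given received size limits.
interface RecvSizeLimitsSketch { minX: number; maxX: number; minY: number; maxY: number; }

type SendPlanSketch =
  | { action: "untouched" }
  | { action: "scale"; width: number; height: number }
  | { action: "do-not-transmit" };

function planTrackSend(
  width: number,
  height: number,
  limits: RecvSizeLimitsSketch,
  upscalingPermitted: boolean
): SendPlanSketch {
  // A maximum resolution of [0, 0] signals an empty resolution space.
  if (limits.maxX === 0 && limits.maxY === 0) return { action: "do-not-transmit" };

  const fits =
    width >= limits.minX && width <= limits.maxX &&
    height >= limits.minY && height <= limits.maxY;
  if (fits) return { action: "untouched" };

  // A single uniform factor preserves the aspect ratio; it must satisfy
  // the limits on both axes at once.
  const smallestFactor = Math.max(limits.minX / width, limits.minY / height);
  const largestFactor = Math.min(limits.maxX / width, limits.maxY / height);
  if (smallestFactor > largestFactor) return { action: "do-not-transmit" };

  const exceeds = width > limits.maxX || height > limits.maxY;
  // Below the limits, scaling up is only allowed when policy permits it.
  if (!exceeds && !upscalingPermitted) return { action: "do-not-transmit" };

  const factor = exceeds ? largestFactor : smallestFactor;
  return {
    action: "scale",
    width: Math.round(width * factor),
    height: Math.round(height * factor),
  };
}

const limits360: RecvSizeLimitsSketch = { minX: 16, maxX: 640, minY: 16, maxY: 360 };
const planFor720p = planTrackSend(1280, 720, limits360, false); // downscaled to 640x360
const planFor360p = planTrackSend(640, 360, limits360, false);  // sent untouched
```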
Some call signaling systems allow various types of forking where an
SDP Offer may be provided to more than one device. For example, SIP
defines both a "Parallel Search" and
"Sequential Search". Although these are primarily signaling level
issues that are outside the scope of JSEP, they do have some impact on
the configuration of the media plane that is relevant. When forking
happens at the signaling layer, the Javascript application responsible
for the signaling needs to make the decisions about what media should
be sent or received at any point of time, as well as which remote
endpoint it should communicate with; JSEP is used to make sure the
media engine can make the RTP and media perform as required by the
application. The basic operations that the applications can have the
media engine do are:

o  Start exchanging media with a given remote peer, but keep all the
   resources reserved in the offer.

o  Start exchanging media with a given remote peer, and free any
   resources in the offer that are not being used.

Sequential forking involves a call being dispatched to multiple
remote callees, where each callee can accept the call, but only one
active session ever exists at a time; no mixing of received media is
performed.

JSEP handles sequential forking well, allowing the application to
easily control the policy for selecting the desired remote endpoint.
When an answer arrives from one of the callees, the application can
choose to apply it either as a provisional answer, leaving open the
possibility of using a different answer in the future, or apply it
as a final answer, ending the setup flow.

In a "first-one-wins" situation, the first answer will be applied
as a final answer, and the application will reject any subsequent
answers. In SIP parlance, this would be ACK + BYE.

In a "last-one-wins" situation, all answers would be applied as
provisional answers, and any previous call leg will be terminated.
At some point, the application will end the setup process, perhaps
with a timer; at this point, the application could reapply the
pending remote description as a final answer.

Parallel forking involves a call being dispatched to multiple
remote callees, where each callee can accept the call, and multiple
simultaneous active signaling sessions can be established as a
result. If multiple callees send media at the same time, the
possibilities for handling this are described in Section 3.1 of
. Most SIP devices today only support
exchanging media with a single device at a time, and do not try to
mix multiple early media audio sources, as that could result in a
confusing situation. For example, consider having a European
ringback tone mixed together with the North American ringback tone -
the resulting sound would not be like either tone, and would confuse
the user. If the signaling application wishes to only exchange media
with one of the remote endpoints at a time, then from a media engine
point of view, this is exactly like the sequential forking case.

In the parallel forking case where the Javascript application
wishes to simultaneously exchange media with multiple peers, the
flow is slightly more complex, but the Javascript application can
follow the strategy that describes
using UPDATE. The UPDATE approach allows the
signaling to set up a separate media flow for each peer that it
wishes to exchange media with. In JSEP, this offer used in the
UPDATE would be formed by simply creating a new PeerConnection and
making sure that the same local media streams have been added into
this new PeerConnection. Then the new PeerConnection object would
produce a SDP offer that could be used by the signaling to perform
the UPDATE strategy discussed in .

As a result of sharing the media streams, the application will
end up with N parallel PeerConnection sessions, each with a local
and remote description and their own local and remote addresses. The
media flow from these sessions can be managed by specifying SDP
direction attributes in the descriptions, or the application can
choose to play out the media from all sessions mixed together. Of
course, if the application wants to only keep a single session, it
can simply terminate the sessions that it no longer needs.

This section details the basic operations that must be present to
implement JSEP functionality. The actual API exposed in the W3C API may
have somewhat different syntax, but should map easily to these
concepts.

The PeerConnection constructor allows the application to specify
global parameters for the media session, such as the STUN/TURN servers
and credentials to use when gathering candidates, as well as the
initial ICE candidate policy and pool size, and also the bundle
policy to use.

If an ICE candidate policy is specified, it functions as
described in , causing the
browser to only surface the permitted candidates (including
any internal browser filtering) to the application,
and only use those candidates for connectivity checks. The set of
available policies is as follows:

o  All candidates permitted by browser policy
   will be gathered and used.

o  All candidates except relay candidates will
   be filtered out. This obfuscates the location information that
   might be ascertained by the remote peer from the received
   candidates. Depending on how the application deploys its relay
   servers, this could obfuscate location to a metro or possibly
   even global level.

The default ICE candidate policy MUST be set to "all"
as this is generally the desired policy, and also
typically reduces use of application TURN server
resources significantly.

If a size is specified for the ICE candidate pool, this indicates
the number of ICE components to pre-gather candidates for. Because
pre-gathering results in utilizing STUN/TURN server resources for
potentially long periods of time, this must only
occur upon application request, and therefore the default
candidate pool size MUST be zero.
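
As a rough illustration of the candidate policy described above, the following minimal sketch filters a list of gathered candidates by policy. The `Candidate` type, the `filter_candidates` helper, and the use of "relay" as the name of the relay-only policy are assumptions made for this example; they are not part of the API described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a gathered ICE candidate."""
    kind: str      # ICE candidate type: "host", "srflx", "prflx", or "relay"
    address: str


def filter_candidates(candidates, policy="all"):
    """Return the candidates that may be surfaced and used under `policy`."""
    if policy == "all":
        # Default policy: everything permitted by browser policy is kept.
        return list(candidates)
    if policy == "relay":
        # Relay-only policy: drop everything except relay candidates.
        return [c for c in candidates if c.kind == "relay"]
    raise ValueError(f"unknown ICE candidate policy: {policy!r}")
```

The same filter would apply both to what is surfaced to the application and to what is used for connectivity checks.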
The
application can specify its preferred policy regarding use of bundle,
the multiplexing mechanism defined in .
Regardless of policy, the application will always try to negotiate bundle
onto a single transport,
and will offer a single bundle group across all media sections; use of
this single transport is contingent upon the answerer accepting bundle.
However, by specifying a policy from the list below, the application can
control exactly how aggressively it will try to bundle media streams together,
which affects how it will interoperate with a non-bundle-aware endpoint.
When negotiating with a non-bundle-aware endpoint, only the streams not
marked as bundle-only streams will be established.
The set of available policies is as follows:

balanced: The first media section of each type
(audio, video, or application) will contain transport parameters,
which will allow an answerer to unbundle that section.
The second and any subsequent media section of each type
will be marked bundle-only. The result is that if there are N
distinct media types, then candidates will be gathered for N media
streams. This policy balances the
desire to multiplex with the need to ensure basic audio and video can
still be negotiated in legacy cases.

max-compat: All media sections will contain transport
parameters; none will be marked as bundle-only. This policy will allow all streams
to be received by non-bundle-aware endpoints, but require separate
candidates to be gathered for each media stream.

max-bundle: Only the first media section will contain
transport parameters; all streams other than the first
will be marked as bundle-only. This policy aims to minimize
candidate gathering and maximize multiplexing, at the cost of less
compatibility with legacy endpoints.

As it provides the best tradeoff between performance and
compatibility with legacy endpoints, the default bundle policy
MUST be set to "balanced".
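
The effect of each bundle policy on a list of m= sections can be sketched as follows. This is an illustrative model only: the function name `bundle_only_flags` is hypothetical, and the policy names are assumed to follow the W3C API's bundle policy values ("balanced", "max-compat", "max-bundle").

```python
def bundle_only_flags(media_types, policy="balanced"):
    """For each m= section (given by media type, in order), return True
    if the section would be marked bundle-only under `policy`."""
    flags = []
    seen_types = set()
    for index, media in enumerate(media_types):
        if policy == "max-compat":
            flag = False                  # every section carries transport parameters
        elif policy == "max-bundle":
            flag = index > 0              # only the first section carries them
        elif policy == "balanced":
            flag = media in seen_types    # first section of each type carries them
        else:
            raise ValueError(f"unknown bundle policy: {policy!r}")
        seen_types.add(media)
        flags.append(flag)
    return flags
```

For sections of type audio, video, audio, application, the number of sections that are not bundle-only (and so gather their own candidates) is 3 under "balanced", 4 under "max-compat", and 1 under "max-bundle".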
The
application can specify its preferred policy regarding use of RTP/RTCP
multiplexing using one of the following policies:

negotiate: The browser will gather both RTP and
RTCP candidates but also will offer "a=rtcp-mux", thus allowing
for compatibility with either multiplexing or non-multiplexing endpoints.

require: The browser will only gather RTP candidates.
This halves the number of candidates
that the offerer needs to gather.
When acting as answerer, the browser will reject any m= section that does
not provide an "a=rtcp-mux" attribute.

The default multiplexing policy MUST be set to
"require". Implementations MAY choose to reject attempts by the
application to set the multiplexing policy to "negotiate".
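
The constructor parameters and their required defaults can be summarized in a small configuration object. The sketch below is a model for illustration; the class name `PeerConnectionConfig` and its field names are hypothetical and do not reflect the syntax of the W3C API, and the policy value names are assumed to follow that API.

```python
from dataclasses import dataclass, field

ICE_POLICIES = ("all", "relay")
BUNDLE_POLICIES = ("balanced", "max-compat", "max-bundle")
RTCP_MUX_POLICIES = ("negotiate", "require")


@dataclass
class PeerConnectionConfig:
    """Hypothetical model of the global PeerConnection parameters."""
    ice_servers: list = field(default_factory=list)
    ice_candidate_policy: str = "all"      # default MUST be "all"
    ice_candidate_pool_size: int = 0       # default MUST be zero
    bundle_policy: str = "balanced"        # default MUST be "balanced"
    rtcp_mux_policy: str = "require"       # default MUST be "require"

    def __post_init__(self):
        if self.ice_candidate_policy not in ICE_POLICIES:
            raise ValueError("unknown ICE candidate policy")
        if self.bundle_policy not in BUNDLE_POLICIES:
            raise ValueError("unknown bundle policy")
        if self.rtcp_mux_policy not in RTCP_MUX_POLICIES:
            raise ValueError("unknown RTCP multiplexing policy")
        if self.ice_candidate_pool_size < 0:
            raise ValueError("pool size cannot be negative")

    def components_per_transport(self):
        """RTP only when mux is required; RTP and RTCP when negotiating."""
        return 1 if self.rtcp_mux_policy == "require" else 2
```

An implementation that rejects the "negotiate" policy, as permitted above, would simply remove it from the accepted values.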
The createOffer method generates a blob of SDP that contains an
offer with the supported
configurations for the session, including descriptions of the local
MediaStreams attached to this PeerConnection, the codec/RTP/RTCP
options supported by this implementation, and any candidates that
have been gathered by the ICE Agent. An options parameter may be
supplied to provide additional control over the generated offer.
This options parameter should allow for the following
manipulations to be performed:

To indicate support for a media type even if no
MediaStreamTracks of that type have been added to the session
(e.g., an audio call that wants to receive video).

To trigger an ICE restart, for the purpose of reestablishing
connectivity.

In the initial offer, the generated SDP will contain all desired
functionality for the session (functionality that is supported but
not desired by default may be omitted); for each SDP line, the
generation of the SDP will follow the process defined for generating
an initial offer from the document that specifies the given SDP
line. The exact handling of initial offer generation is detailed in
below.

In the event createOffer is called after the session is
established, createOffer will generate an offer to modify the
current session based on any changes that have been made to the
session, e.g. adding or removing MediaStreams, or requesting an ICE
restart. For each existing stream, the generation of each SDP line
must follow the process defined for generating an updated offer from
the RFC that specifies the given SDP line. For each new stream,
the generation of the SDP must follow the process of generating an
initial offer, as mentioned above. If no changes have been made, or
for SDP lines that are unaffected by the requested changes, the
offer will only contain the parameters negotiated by the last
offer-answer exchange. The exact handling of subsequent offer
generation is detailed in . below.

Session descriptions generated by createOffer must be immediately
usable by setLocalDescription; if a system has limited resources
(e.g. a finite number of decoders), createOffer should return an
offer that reflects the current state of the system, so that
setLocalDescription will succeed when it attempts to acquire those
resources. Because this method may need to inspect the system state
to determine the currently available resources, it may be
implemented as an async operation.

Calling this method may do things such as generate new ICE
credentials, but does not result in candidate gathering, or cause
media to start or stop flowing.

The createAnswer method generates a blob of SDP that contains an
SDP answer with the supported
configuration for the session that is compatible with the parameters
supplied in the most recent call to setRemoteDescription, which MUST
have been called prior to calling createAnswer.
Like createOffer, the returned blob contains
descriptions of the local MediaStreams attached to this
PeerConnection, the codec/RTP/RTCP options negotiated for this
session, and any candidates that have been gathered by the ICE
Agent. An options parameter may be supplied to provide additional
control over the generated answer.

As an answer, the generated SDP will contain a specific
configuration that specifies how the media plane should be
established; for each SDP line, the generation of the SDP must
follow the process defined for generating an answer from the
document that specifies the given SDP line. The exact handling of
answer generation is detailed in . below.

Session descriptions generated by createAnswer must be
immediately usable by setLocalDescription; like createOffer, the
returned description should reflect the current state of the system.
Because this method may need to inspect the system state to
determine the currently available resources, it may need to be
implemented as an async operation.

Calling this method may do things such as generate new ICE
credentials, but does not trigger candidate gathering or change
media state.

Session description objects (RTCSessionDescription) may be of
type "offer", "pranswer", "answer" or "rollback". These types provide
information as to how the description parameter should be parsed,
and how the media state should be changed.

"offer" indicates that a description should be parsed as an
offer; said description may include many possible media
configurations. A description used as an "offer" may be applied
anytime the PeerConnection is in a stable state, or as an update to
a previously supplied but unanswered "offer".

"pranswer" indicates that a description should be parsed as an
answer, but not a final answer, and so should not result in the
freeing of allocated resources. It may result in the start of media
transmission, if the answer does not specify an inactive media
direction. A description used as a "pranswer" may be applied as a
response to an "offer", or an update to a previously sent
"pranswer".

"answer" indicates that a description should be parsed as an
answer, the offer-answer exchange should be considered complete, and
any resources (decoders, candidates) that are no longer needed can
be released. A description used as an "answer" may be applied as a
response to an "offer", or an update to a previously sent
"pranswer".

The only difference between a provisional and final answer is
that the final answer results in the freeing of any unused resources
that were allocated as a result of the offer. As such, the
application can use some discretion on whether an answer should be
applied as provisional or final, and can change the type of the
session description as needed. For example, in a serial forking
scenario, an application may receive multiple "final" answers, one
from each remote endpoint. The application could choose to accept
the initial answers as provisional answers, and only apply an answer
as final when it receives one that meets its criteria (e.g. a live
user instead of voicemail).

"rollback" is a special session description type implying
that the state machine should be rolled back to the previous
state, as described in .
The contents MUST be empty.

Most web applications will not need to create answers using the
"pranswer" type. While it is good practice to send an immediate
response to an "offer", in order to warm up the session transport
and prevent media clipping, the preferred handling for a web
application would be to create and send an "inactive" final answer
immediately after receiving the offer. Later, when the called
user actually accepts the call, the application can create a new
"sendrecv" offer to update the previous offer/answer pair and start
the media flow. While this could also be done with an inactive
"pranswer", followed by a sendrecv "answer", the initial "pranswer"
leaves the offer-answer exchange open, which means that
neither side can send an updated offer during this time.

As an example, consider a typical web application that will
set up a data channel, an audio channel, and a video channel. When
an endpoint receives an offer with these channels, it could send an
answer accepting the data channel for two-way data, and accepting
the audio and video tracks as inactive or receive-only. It could
then ask the user to accept the call, acquire the local media
streams, and send a new offer to the remote side moving the audio
and video to be two-way media. By the time the human has accepted
the call and triggered the new offer, it is likely that the ICE and
DTLS handshaking for all the channels will already have finished.

Of course, some applications may not be able to perform this
double offer-answer exchange, particularly ones that are attempting
to gateway to legacy signaling protocols. In these cases, "pranswer"
can still provide the application with a mechanism to warm up the
transport.

In certain situations it may be desirable to "undo" a change
made to setLocalDescription or setRemoteDescription. Consider a
case where a call is ongoing, and one side wants to change some of
the session parameters; that side generates an updated offer and
then calls setLocalDescription. However, the remote side, either
before or after setRemoteDescription, decides it does not want to
accept the new parameters, and sends a reject message back to the
offerer. Now, the offerer, and possibly the answerer as well, need
to return to a stable state and the previous local/remote
description. To support this, we introduce the concept of
"rollback".

A rollback discards any proposed changes to the session,
returning the state machine to the stable state, and setting the
pending local and/or remote description back to null.
Any resources or candidates that were allocated by the
abandoned local description are discarded; any media that is
received will be processed according to the previous local and
remote descriptions. Rollback can only be used to cancel proposed
changes; there is no support for rolling back from a stable state to
a previous stable state. Note that this implies that once the answerer
has performed setLocalDescription with its answer, this cannot
be rolled back.
A rollback is performed by supplying a session description of
type "rollback" with empty contents to either setLocalDescription or
setRemoteDescription, depending on which was most recently used
(i.e. if the new offer was supplied to setLocalDescription, the
rollback should be done using setLocalDescription as well).
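
The way description types drive the signaling state, including rollback, can be sketched as a transition table. This is a simplified model under stated assumptions: the state names "have-local-pranswer" and "have-remote-pranswer", the exact set of transitions, and the helper `next_state` are illustrative choices following the usual offer/answer model, not a normative definition.

```python
# (current state, which side's description is being set, description type) -> next state
TRANSITIONS = {
    ("stable", "local", "offer"): "have-local-offer",
    ("stable", "remote", "offer"): "have-remote-offer",
    # An unanswered offer may be updated, answered, or rolled back.
    ("have-local-offer", "local", "offer"): "have-local-offer",
    ("have-local-offer", "remote", "pranswer"): "have-remote-pranswer",
    ("have-local-offer", "remote", "answer"): "stable",
    ("have-local-offer", "local", "rollback"): "stable",
    ("have-remote-offer", "remote", "offer"): "have-remote-offer",
    ("have-remote-offer", "local", "pranswer"): "have-local-pranswer",
    ("have-remote-offer", "local", "answer"): "stable",
    ("have-remote-offer", "remote", "rollback"): "stable",
    # A provisional answer may be updated or replaced by a final answer.
    ("have-local-pranswer", "local", "pranswer"): "have-local-pranswer",
    ("have-local-pranswer", "local", "answer"): "stable",
    ("have-remote-pranswer", "remote", "pranswer"): "have-remote-pranswer",
    ("have-remote-pranswer", "remote", "answer"): "stable",
}


def next_state(state, side, desc_type):
    """Return the state after applying a description, or raise ValueError."""
    try:
        return TRANSITIONS[(state, side, desc_type)]
    except KeyError:
        raise ValueError(
            f"cannot apply {side} {desc_type!r} in state {state!r}"
        ) from None
```

In this model a rollback is only accepted on the side that most recently supplied the offer, and there is no entry for rolling back from "stable", which matches the restriction that a completed exchange cannot be undone.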
The setLocalDescription method instructs the PeerConnection to
apply the supplied session description as its local configuration. The type
field indicates whether the description should be processed as an offer,
provisional answer, or final answer; offers and answers are checked
differently, using the various rules that exist for each SDP
line.

This API changes the local media state; among other things, it
sets up local resources for receiving and decoding media. In order
to successfully handle scenarios where the application wants to
offer to change from one media format to a different, incompatible
format, the PeerConnection must be able to simultaneously support
use of both the current and pending local descriptions (e.g. support
codecs that exist in both descriptions) until a final answer is
received, at which point the PeerConnection can fully adopt the
pending local description, or roll back to the current description
if the remote side denied the change.

This API indirectly controls the candidate gathering process.
When a local description is supplied, and the number of transports
currently in use does not match the number of transports needed by
the local description, the PeerConnection will create transports as
needed and begin gathering candidates for them.

If setRemoteDescription was previously called with an offer, and
setLocalDescription is called with an answer (provisional or final),
and the media directions are compatible, and media are available to
send, this will result in the starting of media transmission.

The setRemoteDescription method instructs the PeerConnection to
apply the supplied session description as the desired remote configuration. As
in setLocalDescription, the type field of the description indicates how it
should be processed.

This API changes the local media state; among other things, it
sets up local resources for sending and encoding media.

If setLocalDescription was previously called with an offer, and
setRemoteDescription is called with an answer (provisional or final),
and the media directions are compatible, and media are available to
send, this will result in the starting of media transmission.

The currentLocalDescription method returns a copy of the current
negotiated local description - i.e., the local description from the
last successful offer/answer exchange - in addition to any local
candidates that have been generated by the ICE Agent since the
local description was set.

A null object will be returned if an offer/answer exchange has
not yet been completed.

The pendingLocalDescription method returns a copy of the local
description currently in negotiation - i.e., a local offer set
without any corresponding remote answer - in addition to any local
candidates that have been generated by the ICE Agent since the
local description was set.

A null object will be returned if the state of the PeerConnection
is "stable" or "have-remote-offer".

The currentRemoteDescription method returns a copy of the current
negotiated remote description - i.e., the remote description from the
last successful offer/answer exchange - in addition to any remote
candidates that have been supplied via addIceCandidate since the
remote description was set.

A null object will be returned if an offer/answer exchange has
not yet been completed.

The pendingRemoteDescription method returns a copy of the remote
description currently in negotiation - i.e., a remote offer set
without any corresponding local answer - in addition to any remote
candidates that have been supplied via addIceCandidate since the
remote description was set.

A null object will be returned if the state of the PeerConnection
is "stable" or "have-local-offer".
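
The relationship between the current and pending descriptions can be sketched with a small bookkeeping class. This is an illustration only: the class `Descriptions` and its attribute names are hypothetical, descriptions are represented as plain strings, candidates are ignored, and no state validation is performed.

```python
class Descriptions:
    """Tracks current and pending local/remote descriptions (simplified)."""

    def __init__(self):
        self.current_local = None
        self.pending_local = None
        self.current_remote = None
        self.pending_remote = None

    def set_local(self, desc_type, sdp=""):
        self._set("local", desc_type, sdp)

    def set_remote(self, desc_type, sdp=""):
        self._set("remote", desc_type, sdp)

    def _set(self, side, desc_type, sdp):
        if desc_type == "rollback":
            # Discard the proposed change; current descriptions are untouched.
            self.pending_local = self.pending_remote = None
        elif desc_type == "answer":
            # A final answer completes the exchange: pending becomes current.
            setattr(self, f"pending_{side}", sdp)
            self.current_local = self.pending_local
            self.current_remote = self.pending_remote
            self.pending_local = self.pending_remote = None
        else:
            # "offer" or "pranswer": negotiation is still in progress.
            setattr(self, f"pending_{side}", sdp)
```

Before any exchange completes, both current descriptions are `None`, mirroring the null object returned by the accessors described above.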
The canTrickleIceCandidates property indicates whether the remote
side supports receiving trickled candidates. There are three
potential values:

null: No SDP has been received from the other side,
so it is not known if it can handle trickle. This is the initial
value before setRemoteDescription() is called.

true: SDP has been received from the other side
indicating that it can support trickle.

false: SDP has been received from the other side
indicating that it cannot support trickle.

As described in , JSEP
implementations always provide candidates to the application
individually, consistent with what is needed for Trickle ICE.
However, applications can use the canTrickleIceCandidates property to
determine whether their peer can actually do Trickle ICE, i.e.,
whether it is safe to send an initial offer or answer followed later
by candidates as they are gathered. As "true" is the only value that
definitively indicates remote Trickle ICE support, an application
which compares canTrickleIceCandidates against "true" will by default
attempt Half Trickle on initial offers and Full Trickle on subsequent
interactions with a Trickle ICE-compatible agent.
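
A minimal sketch of how the three values might be derived and used follows. It assumes that remote trickle support is signaled by a "trickle" token in an "a=ice-options" line of the remote SDP; both helper names are hypothetical.

```python
def can_trickle_ice_candidates(remote_sdp):
    """Return None before any remote SDP is seen, else True or False."""
    if remote_sdp is None:
        return None
    for line in remote_sdp.splitlines():
        if line.startswith("a=ice-options:"):
            options = line.split(":", 1)[1].split()
            if "trickle" in options:
                return True
    return False


def may_send_before_gathering_completes(can_trickle):
    """Only an explicit True indicates that trickling to the peer is safe."""
    return can_trickle is True
```

Because the check is against `True` specifically, the unknown (`None`) case is treated like `False`: the application waits for gathering to finish before sending its initial offer, which corresponds to the Half Trickle behavior described above.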
The setConfiguration method allows the global configuration of the
PeerConnection, which was initially set by constructor parameters, to
be changed during the session. The effects of this method call depend
on when it is invoked, and differ depending on which specific
parameters are changed:

Any changes to the STUN/TURN servers to use affect
the next gathering phase. If an ICE gathering phase has already
started or completed, the 'needs-ice-restart'
bit mentioned in
will be set. This will cause the next call to createOffer to
generate new ICE credentials, for the purpose of forcing an ICE
restart and kicking off a new gathering phase, in which the new
servers will be used. If the ICE candidate pool has a nonzero size,
any existing candidates will be discarded, and new candidates will
be gathered from the new servers.

Any change to the ICE candidate policy affects the next
gathering phase. If an ICE gathering phase has already
started or completed, the 'needs-ice-restart' bit will be set.
Either way, changes to the policy have
no effect on the candidate pool, because pooled candidates are not
surfaced to the application until a gathering phase occurs, and so
any necessary filtering can still be done on any pooled candidates.

Any changes to the ICE candidate pool size take effect
immediately; if increased, additional candidates are pre-gathered;
if decreased, the now-superfluous candidates are discarded.

The bundle and RTCP-multiplexing policies MUST NOT be
changed after the construction of the PeerConnection.

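
The rules above can be condensed into a function that compares the old and new configuration and reports what should happen. This is a sketch under assumptions: configurations are plain dictionaries with hypothetical key names, and the effect labels other than 'needs-ice-restart' are invented for the example.

```python
IMMUTABLE_KEYS = ("bundle_policy", "rtcp_mux_policy")


def configuration_effects(old, new, gathering_started):
    """Return the set of effects of replacing configuration `old` with `new`."""
    for key in IMMUTABLE_KEYS:
        if old[key] != new[key]:
            raise ValueError(f"{key} MUST NOT change after construction")

    effects = set()
    servers_changed = old["ice_servers"] != new["ice_servers"]
    policy_changed = old["ice_candidate_policy"] != new["ice_candidate_policy"]

    # Server or policy changes only force a restart once gathering has begun.
    if gathering_started and (servers_changed or policy_changed):
        effects.add("needs-ice-restart")
    # Pooled candidates from the old servers are discarded and regathered.
    if servers_changed and old["ice_candidate_pool_size"] > 0:
        effects.add("regather-pool")
    # Pool size changes take effect immediately.
    delta = new["ice_candidate_pool_size"] - old["ice_candidate_pool_size"]
    if delta > 0:
        effects.add("grow-pool")
    elif delta < 0:
        effects.add("shrink-pool")
    return effects
```

Note that a policy change alone never touches the pool in this model, since filtering can be applied when pooled candidates are eventually surfaced.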
This call may result in a change to the state of the ICE Agent,
and may result in a change to media state if it results in
connectivity being established.

The addIceCandidate method provides a remote candidate to the ICE
Agent, which, if parsed successfully, will be added to the current
and/or pending remote description according to the rules defined for
Trickle ICE. Connectivity checks will be sent to the new candidate.

This call will result in a change to the state of the ICE Agent,
and may result in a change to media state if it results in
connectivity being established.

This section describes the specific procedures to be followed when
creating and parsing SDP objects.

JSEP implementations must comply with the specifications listed
below that govern the creation and processing of offers and answers.
The first set of specifications is the "mandatory-to-implement" set.
All implementations must support these behaviors, but may not use all
of them if the remote side, which may not be a JSEP endpoint, does not
support them.

The second set of specifications is the "mandatory-to-use" set.
The local JSEP endpoint and any remote endpoint must indicate support
for these specifications in their session descriptions.

This list of mandatory-to-implement specifications is derived
from the requirements outlined in
.

is the base SDP specification
and MUST be implemented.

MUST be supported for signaling
the UDP/TLS/RTP/SAVPF ,
TCP/DTLS/RTP/SAVPF
,
"UDP/DTLS/SCTP" ,
and "TCP/DTLS/SCTP"
RTP profiles.

MUST be implemented for
signaling the ICE credentials and candidate lines corresponding
to each media stream. The ICE implementation MUST be a Full
implementation, not a Lite implementation.

MUST be implemented to signal
DTLS certificate fingerprints.

MUST NOT be implemented to
signal SDES SRTP keying information.

The grouping framework MUST
be implemented for signaling grouping information, and MUST be
used to identify m= lines via the a=mid attribute.

MUST be
supported, in order to signal associations between RTP objects
and W3C MediaStreams and MediaStreamTracks in a standard way.

The bundle mechanism in MUST be
supported to signal the ability to multiplex RTP streams on a
single UDP port, in order to avoid excessive use of port number
resources.

The SDP attributes of "sendonly", "recvonly", "inactive", and
"sendrecv" from MUST be
implemented to signal information about media direction.

MUST be implemented to signal
RTP SSRC values and grouping semantics.

MUST be implemented to signal
RTCP based feedback.

MUST be implemented to signal
multiplexing of RTP and RTCP.

MUST be implemented to signal
reduced-size RTCP messages.

MUST be implemented to signal
RTX payload type associations.

with bandwidth modifiers MAY
be supported for specifying RTCP bandwidth as a fraction of the
media bandwidth, RTCP fraction allocated to the senders and
setting maximum media bit-rate boundaries.

TODO: any others?

As required by , Section 5.13, JSEP
implementations MUST ignore unknown attribute (a=) lines.

All session descriptions handled by JSEP endpoints,
both local and remote, MUST indicate support for the following
specifications. If any of these are absent, this omission MUST be
treated as an error.

ICE, as specified in ,
MUST be used. Note that the remote endpoint may use a Lite
implementation; implementations MUST properly handle remote
endpoints which do ICE-Lite.

DTLS or DTLS-SRTP ,
MUST be used, as appropriate for the media type, as
specified in

For media m= sections, JSEP endpoints MUST support both the
"UDP/TLS/RTP/SAVPF" and "TCP/DTLS/RTP/SAVPF" profiles and MUST indicate one of
these two profiles for each media m= line they produce in an offer.
For data m= sections, JSEP endpoints must support both the "UDP/DTLS/SCTP"
and "TCP/DTLS/SCTP" profiles and MUST indicate one of these two
profiles for each data m= line they produce in an offer.
Because ICE can select either TCP or UDP transport depending on
network conditions, both advertisements are consistent with
ICE eventually selecting either UDP or TCP.
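
The choice of profile string for an offered m= line can be expressed as a small lookup keyed on the kind of section and the transport of the default candidate. The helper `offer_proto` and its argument values are hypothetical names used only for this sketch.

```python
# (kind of m= section, transport of the default candidate) -> <proto> value
OFFER_PROFILES = {
    ("media", "udp"): "UDP/TLS/RTP/SAVPF",
    ("media", "tcp"): "TCP/DTLS/RTP/SAVPF",
    ("data", "udp"): "UDP/DTLS/SCTP",
    ("data", "tcp"): "TCP/DTLS/SCTP",
}


def offer_proto(section_kind, default_transport):
    """Return the profile to place in the m= line of an offer."""
    try:
        return OFFER_PROFILES[(section_kind, default_transport.lower())]
    except KeyError:
        raise ValueError(
            f"no profile for {section_kind!r} over {default_transport!r}"
        ) from None
```

Whichever string is offered, ICE may still end up selecting the other transport, so the value only reflects the default candidate at the time the offer is generated.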
Unfortunately, in an attempt at compatibility, some endpoints
generate other profile strings even when they mean to support one of
these profiles. For instance, an endpoint might generate "RTP/AVP"
but supply "a=fingerprint" and "a=rtcp-fb" attributes, indicating
its willingness to support "(UDP,TCP)/TLS/RTP/SAVPF". In order to
simplify compatibility with such endpoints, JSEP endpoints MUST
follow the following rules when processing the media m= sections in
an offer:

The profile in any "m=" line in any answer MUST exactly match the profile
provided in the offer.

Any profile matching the following patterns MUST be accepted:
"RTP/[S]AVP[F]" and "(UDP/TCP)/TLS/RTP/SAVP[F]"

Because DTLS-SRTP is REQUIRED, the choice of SAVP or
AVP has no effect; support for DTLS-SRTP is determined by the presence
of one or more "a=fingerprint" attributes. Note that lack of an "a=fingerprint"
attribute will lead to negotiation failure.

The use of AVPF or AVP simply controls the timing
rules used for RTCP feedback. If AVPF is provided, or an "a=rtcp-fb"
attribute is present, assume AVPF timing, i.e. a default value of
"trr-int=0". Otherwise, assume that AVPF is being used in
an AVP compatible mode and use AVP timing, i.e., "trr-int=4".

For data m= sections, JSEP endpoints MUST support receiving the
"UDP/DTLS/SCTP", "TCP/DTLS/SCTP", or "DTLS/SCTP" (for backwards
compatibility) profiles.

Note that re-offers by JSEP endpoints MUST use the correct profile
strings even if the initial offer/answer exchange used an (incorrect)
older profile string.

When createOffer is called, a new SDP description must be created
that includes the functionality specified in . The exact details of this
process are explained below.

When createOffer is called for the first time, the result is
known as the initial offer.

The first step in generating an initial offer is to generate
session-level attributes, as specified in , Section 5. Specifically:

The first SDP line MUST be "v=0", as specified in , Section 5.1.

The second SDP line MUST be an "o=" line, as specified in
, Section 5.2. The value of the
<username> field SHOULD be "-". The value of the
<sess-id> field SHOULD be a cryptographically random
number. To ensure uniqueness, this number SHOULD be at least 64
bits long. The value of the <sess-version> field SHOULD be
zero. The value of the <nettype> <addrtype>
<unicast-address> tuple SHOULD be set to a non-meaningful
address, such as IN IP4 0.0.0.0, to prevent leaking the local
address in this field. As mentioned in , the entire o= line needs to be unique,
but selecting a random number for <sess-id> is sufficient
to accomplish this.

The third SDP line MUST be a "s=" line, as specified in , Section 5.3; to match the "o=" line, a
single dash SHOULD be used as the session name, e.g. "s=-".
Note that this differs from the advice in
which proposes a single space, but as both "o=" and "s="
are meaningless, having the same meaningless value seems clearer.

Session Information ("i="), URI ("u="), Email Address ("e="),
Phone Number ("p="), Bandwidth ("b="), Repeat Times ("r="), and
Time Zones ("z=") lines are not useful in this context and
SHOULD NOT be included.

Encryption Keys ("k=") lines do not provide sufficient
security and MUST NOT be included.

A "t=" line MUST be added, as specified in , Section 5.9; both <start-time>
and <stop-time> SHOULD be set to zero, e.g. "t=0 0".

An "a=ice-options" line with the "trickle" option MUST be
added, as specified in ,
Section 4.

The next step is to generate m= sections, as specified in
Section 5.14, for each
MediaStreamTrack that has been added to the PeerConnection via the
addStream method. (Note that this method takes a MediaStream, which
can contain multiple MediaStreamTracks, and therefore multiple m=
sections can be generated even if addStream is only called once.)
m= sections MUST be sorted first by the order in which the
MediaStreams were added to the PeerConnection, and
then by the alphabetical ordering of the media type for the MediaStreamTrack.
For example, if a MediaStream containing both an audio and a video
MediaStreamTrack is added to a PeerConnection, the resultant m=audio
section will precede the m=video section. If a second MediaStream
containing an audio MediaStreamTrack was added, it would follow
the m=video section.

Each m= section, provided it is not marked as
bundle-only, MUST generate a unique set of ICE credentials and gather
its own unique set of ICE candidates. Bundle-only m= sections MUST NOT
contain any ICE credentials and MUST NOT gather any candidates.
For DTLS, all m= sections MUST use the certificate for the
identity that has been specified for the PeerConnection; as a result,
they MUST all have the same fingerprint
value, or this value MUST be a session-level attribute.

Each m= section should be generated as specified in , Section 5.14. For the m= line itself, the
following rules MUST be followed:

The port value is set to the port of the default ICE candidate
for this m= section, but given that no candidates have yet been
gathered, the "dummy" port value of 9 (Discard) MUST be used, as
indicated in ,
Section 5.1.

To properly indicate use of DTLS, the <proto> field MUST
be set to "UDP/TLS/RTP/SAVPF", as specified in
, Section 8,
if the default candidate uses UDP transport, or "TCP/DTLS/RTP/SAVPF",
as specified in
if the default candidate uses TCP transport.

The m= line MUST be followed immediately by a "c=" line, as specified
in , Section 5.7. Again, as no candidates
have yet been gathered, the "c=" line must contain the "dummy" value
"IN IP4 0.0.0.0", as defined in ,
Section 5.1.

Each m= section MUST include the following attribute lines:

An "a=mid" line, as specified in , Section 4. When generating mid values,
it is RECOMMENDED that the values be 3 bytes or less, to allow
them to efficiently fit into the RTP header extension defined in
,
Section 11.

An "a=rtcp" line, as specified in ,
Section 2.1, containing the dummy value "9 IN IP4 0.0.0.0", because
no candidates have yet been gathered.

An "a=msid" line, as specified in , Section 2.

An "a=sendrecv" line, as specified in , Section 5.1.

For each supported codec, "a=rtpmap" and "a=fmtp" lines, as
specified in , Section 6.
The audio and video codecs that MUST be supported are
specified in (see Section 3) and (see Section 5).

If this m= section is for media with configurable frame sizes,
e.g. audio, an "a=maxptime" line, indicating the smallest of the
maximum supported frame sizes out of all codecs included above, as
specified in , Section 6.

If this m= section is for video media, and there are known
limitations on the size of images which can be decoded, an "a=imageattr"
line, as specified in .

For each primary codec where RTP retransmission should be
used, a corresponding "a=rtpmap" line indicating "rtx" with the
clock rate of the primary codec and an "a=fmtp" line that
references the payload type of the primary codec, as specified
in , Section 8.1.

For each supported FEC mechanism, "a=rtpmap" and "a=fmtp" lines,
as specified in , Section 6. The
FEC mechanisms that MUST be supported are specified in
, Section 6, and
specific usage for each media type is outlined in Sections 4
and 5.

"a=ice-ufrag" and "a=ice-pwd" lines, as specified in , Section 15.4.

An "a=fingerprint" line for each of the endpoint's
certificates, as specified in ,
Section 5; the digest algorithm used for the fingerprint MUST
match that used in the certificate signature.

An "a=setup" line, as specified in , Section 4, and clarified for use in
DTLS-SRTP scenarios in , Section
5. The role value in the offer MUST be "actpass".

An "a=rtcp-mux" line, as specified in , Section 5.1.1.

An "a=rtcp-rsize" line, as specified in , Section 5.

For each supported RTP header extension, an "a=extmap" line,
as specified in , Section 5. The
list of header extensions that SHOULD/MUST be supported is
specified in ,
Section 5.2. Any header extensions that require encryption MUST
be specified as indicated in ,
Section 4.

For each supported RTCP feedback mechanism, an "a=rtcp-fb"
line, as specified in ,
Section 4.2. The list of RTCP feedback mechanisms that
SHOULD/MUST be supported is specified in , Section 5.1.

An "a=ssrc" line, as specified in , Section 4.1, indicating the SSRC to be
used for sending media, along with the mandatory "cname" source
attribute, as specified in Section 6.1, indicating the CNAME for
the source. The CNAME MUST be generated in accordance with
Section 4.9 of .

If RTX is supported for this media type, another "a=ssrc"
line with the RTX SSRC, and an "a=ssrc-group" line, as specified
in , Section 4.2, with semantics
set to "FID" and including the primary and RTX SSRCs.

If FEC is supported for this media type, another "a=ssrc"
line with the FEC SSRC, and an "a=ssrc-group" line with semantics
set to "FEC-FR" and including the primary and FEC SSRCs, as
specified in , Section 4.3. For
simplicity, if both RTX and FEC are supported, the FEC SSRC
MUST be the same as the RTX SSRC.

If the bundle policy for this PeerConnection is set to
"max-bundle", and this is not the first m= section, or the bundle
policy is set to "balanced", and this is not the first m= section
for this media type, an "a=bundle-only" line.

Lastly, if a data channel has been created, an m= section MUST be
generated for data. The <media> field MUST be set to
"application" and the <proto> field MUST be set to
"UDP/DTLS/SCTP" if the default candidate uses UDP transport,
or "TCP/DTLS/SCTP" if the default candidate uses TCP transport . The "fmt" value
MUST be set to "webrtc-datachannel" as specified in
, Section 4.1.
Within the data m= section, the "a=mid",
"a=ice-ufrag", "a=ice-pwd", "a=fingerprint", and "a=setup" lines MUST be included as mentioned
above, along with an "a=fmtp:webrtc-datachannel" line
and an "a=sctp-port" line referencing the SCTP port number
as defined in , Section 4.1.

Once all m= sections have been generated, a session-level
"a=group" attribute MUST be added as specified in . This attribute MUST have semantics
"BUNDLE", and MUST include the mid identifiers of each m= section.
The effect of this is that the browser offers all m= sections as one
bundle group. However, whether the m= sections are bundle-only
or not depends on the bundle policy.
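As a rough illustration of the two bundle rules above, the following Python sketch builds the session-level group line and decides which m= sections carry "a=bundle-only". The function name and the (mid, media type) input shape are assumptions made for this example; they are not part of the API. Any policy value other than "max-bundle" or "balanced" is assumed to produce no bundle-only sections.

```python
def offer_bundle(sections, policy):
    """Sketch of bundle handling in an initial offer.

    sections: list of (mid, media_type) tuples in m= section order
              (hypothetical input shape).
    policy:   "max-bundle", "balanced", or any other value (assumed to
              mean that no section is bundle-only).
    Returns the "a=group" line and the mids marked "a=bundle-only".
    """
    # All m= sections are offered as one bundle group.
    group_line = "a=group:BUNDLE " + " ".join(mid for mid, _ in sections)

    bundle_only = []
    seen_types = set()
    for index, (mid, media_type) in enumerate(sections):
        if policy == "max-bundle" and index > 0:
            # Every section after the first is bundle-only.
            bundle_only.append(mid)
        elif policy == "balanced" and media_type in seen_types:
            # Only the first section of each media type stands alone.
            bundle_only.append(mid)
        seen_types.add(media_type)
    return group_line, bundle_only
```

With sections for two audio tracks and one video track, "max-bundle" marks everything after the first section as bundle-only, while "balanced" marks only the second audio section.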
The next step is to generate session-level lip sync groups as
defined in , Section 7. For each MediaStream
with more than one MediaStreamTrack, a group of type "LS" MUST be
added that contains the mid values for each MediaStreamTrack in that
MediaStream. Attributes which SDP permits to either be at the session
level or the media level SHOULD generally be at the media
level even if they are identical. This promotes readability,
especially if one of a set of initially identical attributes
is subsequently changed.
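The lip sync grouping rule described above can be sketched as follows. The mapping from a MediaStream identifier to the mids of its tracks is a hypothetical data shape chosen for the example, as is the function name.

```python
def lip_sync_groups(streams):
    """Sketch: one "LS" group per MediaStream with more than one track.

    streams: dict mapping a stream id to the list of mid values of the
             m= sections carrying that stream's tracks (assumed shape).
    """
    lines = []
    for mids in streams.values():
        if len(mids) > 1:
            lines.append("a=group:LS " + " ".join(mids))
    return lines
```

A stream with a single track produces no group, so a connection with one audio-plus-video stream and one audio-only stream yields exactly one "a=group:LS" line.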
Attributes other than the ones specified above MAY be included,
except for the following attributes which are specifically
incompatible with the requirements of , and MUST NOT be
included:
"a=crypto"
"a=key-mgmt"
"a=ice-lite"

Note that when bundle is used, any additional attributes that are
added MUST follow the advice in on how
those attributes interact with bundle.

Note that these requirements are in some cases stricter than those
of SDP. Implementations MUST be prepared to accept compliant SDP
even if it would not conform to the requirements for generating
SDP in this specification.

When createOffer is called a second (or later) time, or is called
after a local description has already been installed, the processing
is somewhat different than for an initial offer.

If the initial offer was not applied using setLocalDescription,
meaning the PeerConnection is still in the "stable" state, the steps
for generating an initial offer should be followed, subject to the
following restriction:

The fields of the "o=" line MUST stay the same except for the
<session-version> field, which MUST increment if the session
description changes in any way, including the addition of
ICE candidates.

If the initial offer was applied using setLocalDescription, but
an answer from the remote side has not yet been applied, meaning the
PeerConnection is still in the "have-local-offer" state, an offer is
generated by following the steps in the "stable" state above, along
with these exceptions:

The "s=" and "t=" lines MUST stay the same.

Each "m=" and "c=" line MUST be filled in with the port, protocol,
and address of the default candidate for the m= section, as described
in , Section 4.3. If ICE checking
has already completed for one or more candidate pairs and a candidate
pair is in active use, then that pair MUST be used, even if ICE
has not yet completed. Note that this differs from the guidance in
, Section 9.1.2.2, which only refers to
offers created when ICE has completed. Each
"a=rtcp" attribute line MUST also be filled in with the port and
address of the appropriate default candidate, either the
default RTP or RTCP candidate, depending on whether RTCP multiplexing
is currently active or not. Note that if RTCP
multiplexing is being offered, but not yet active, the default RTCP
candidate MUST be used, as indicated in ,
section 5.1.3. In each case, if no candidates of the desired
type have yet been gathered, dummy values MUST be used, as
described above.

Each "a=mid" line MUST stay the same.

Each "a=ice-ufrag" and "a=ice-pwd" line MUST stay the
same, unless the ICE configuration has changed (either changes to
the supported STUN/TURN servers, or the ICE candidate policy), or
the "IceRestart" option
was specified. If the m= section is bundled into another
m= section, it still MUST NOT contain any ICE credentials.
If the m= section is not bundled into another m= section,
for each candidate that has
been gathered during the most recent gathering phase
(see ),
an "a=candidate" line MUST be added, as defined in , Section 4.3, paragraph 3.
If candidate gathering for the section has completed, an
"a=end-of-candidates" attribute MUST be added, as described in
, Section 9.3.
If the m= section is bundled into another m= section, both
"a=candidate" and "a=end-of-candidates" MUST be omitted.
For MediaStreamTracks that are still present, the "a=msid",
"a=ssrc", and "a=ssrc-group" lines MUST stay the same.

If any MediaStreamTracks have been removed, either through
the removeStream method or by removing them from an added
MediaStream, their m= sections MUST be marked as recvonly by
changing the value of the
directional attribute to "a=recvonly". The "a=msid", "a=ssrc",
and "a=ssrc-group" lines MUST be removed from the associated m=
sections.

If any MediaStreamTracks have been added, and there exist
m= sections of the appropriate media type with no associated
MediaStreamTracks (i.e. as described in the preceding paragraph),
those m= sections MUST be recycled by adding the new
MediaStreamTrack to the m= section. This is done by adding the
necessary "a=msid", "a=ssrc", and "a=ssrc-group" lines to the
recycled m= section, and removing the "a=recvonly" attribute.
If the initial offer was applied using setLocalDescription, and
an answer from the remote side has been applied using
setRemoteDescription, meaning the PeerConnection is in the
"have-remote-pranswer" or "stable" states, an offer is generated based on
the negotiated session descriptions by following the steps mentioned
for the "have-local-offer" state above, along with these exceptions:
If a m= section exists in the current local description, but
does not have an associated local MediaStreamTrack (possibly
because said MediaStreamTrack was removed since the last
exchange), a m= section MUST still be generated in the new offer,
as indicated in , Section 8. The
disposition of this section will depend on the state of the
remote MediaStreamTrack associated with this m= section. If one
exists, and it is still in the "live" state, the new m= section
MUST be marked as "a=recvonly", with no "a=msid" or related
attributes present. If no remote MediaStreamTrack
exists, or it is in the "ended" state, the m= section MUST be
marked as rejected, by setting the port to zero, as indicated in
, Section 8.2.

If any MediaStreamTracks have been added, and there exist
recvonly m= sections of the appropriate media type with no
associated MediaStreamTracks, or rejected m= sections of any
media type, those m= sections MUST be recycled, with a
local MediaStreamTrack being associated with each recycled m=
section until all such existing m= sections have been
used. This includes any recvonly or rejected m= sections
created by the preceding paragraph.

In addition, for each non-recycled, non-rejected m=
section in the new offer, the following adjustments are made based
on the contents of the corresponding m= section in the current
remote description:

The m= line and corresponding "a=rtpmap" and "a=fmtp" lines
MUST only include codecs present in the remote description.

The RTP header extensions MUST only include those that are
present in the remote description.

The RTCP feedback extensions MUST only include those that are
present in the remote description.

The "a=rtcp-mux" line MUST only be added if present in the
remote description.

The "a=rtcp-rsize" line MUST only be added if present in the
remote description.

The "a=group:BUNDLE" attribute MUST include the mid identifiers
specified in the bundle group in the most recent answer, minus any
m= sections that have been marked as rejected, plus any newly added
or re-enabled m= sections. In other words, the bundle attribute must
contain all m= sections that were previously bundled, as long as
they are still alive, as well as any new m= sections.

The "LS" groups are generated in the same way as with initial offers.

The createOffer method takes as a parameter an RTCOfferOptions
object. Special processing is performed when generating an SDP
description if the following options are present.

If the "OfferToReceiveAudio" option is specified, with an integer
value of N, and M audio MediaStreamTracks have been added to the
PeerConnection, the offer MUST include N non-rejected m= sections
with media type "audio", even if N is greater than M.
This allows the offerer to receive audio, including multiple independent
streams, even when not sending it; accordingly, the directional
attribute on the N-M audio m= sections without associated
MediaStreamTracks MUST be set to recvonly.

If N is set to a value less than M, the offer MUST mark the
m= sections associated with the M-N most recently added
(since the last setLocalDescription) MediaStreamTracks as sendonly.
This allows the offerer to indicate that it does not want to receive
audio on some or all of its newly created streams.
For m= sections that have previously
been negotiated, this setting has no effect. [TODO: refer to
RTCRtpSender in the future]
For backwards compatibility with pre-standard
versions of this specification, a value of "true"
is interpreted as equivalent to N=1, and "false" as N=0.
If the "OfferToReceiveVideo" option is specified, with an integer
value of N, and M video MediaStreamTracks have been added to the
PeerConnection, the offer MUST include N non-rejected m= sections
with media type "video", even if N is greater than M.
This allows the offerer to receive video, including multiple independent
streams, even when not sending it; accordingly, the directional
attribute on the N-M video m= sections without associated
MediaStreamTracks MUST be set to recvonly.

If N is set to a value less than M, the offer MUST mark the
m= sections associated with the M-N most recently added
(since the last setLocalDescription) MediaStreamTracks as sendonly.
This allows the offerer to indicate that it does not want to receive
video on some or all of its newly created streams.
For m= sections that have previously
been negotiated, this setting has no effect. [TODO: refer to
RTCRtpSender in the future]
For backwards compatibility with pre-standard
versions of this specification, a value of "true"
is interpreted as equivalent to N=1, and "false" as N=0.
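The interaction between N and M for "OfferToReceiveAudio" and "OfferToReceiveVideo" can be illustrated with a small Python sketch. It covers only newly created m= sections of one media type (previously negotiated sections are unaffected, as noted above). The function name is hypothetical, and labelling the remaining sections "sendrecv" is an assumption, since the text only states the recvonly and sendonly cases.

```python
def offer_directions(n, m):
    """Sketch: directions for new m= sections of one media type.

    n: the OfferToReceive value (an integer, or the legacy true/false).
    m: number of local MediaStreamTracks of this type, oldest first.
    Returns one direction per m= section, in track order.
    """
    n = int(n)  # legacy "true" is treated as N=1, "false" as N=0
    directions = []
    for index in range(max(n, m)):
        if index >= m:
            # One of the N-M extra sections with no local track.
            directions.append("recvonly")
        elif index >= n:
            # One of the M-N most recently added tracks.
            directions.append("sendonly")
        else:
            # Assumption: a track we also want to receive on.
            directions.append("sendrecv")
    return directions
```

For example, N=3 with one local track yields one sendrecv section followed by two recvonly sections, and N=1 with three local tracks yields one sendrecv section followed by two sendonly sections.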
If the "IceRestart" option is specified, with a value of
"true", the offer MUST indicate an ICE restart by generating new
ICE ufrag and pwd attributes, as specified in , Section
9.1.1.1. If this option is specified on an initial offer, it
has no effect (since a new ICE ufrag and pwd are already
generated). Similarly, if the ICE configuration has changed, this
option has no effect, since new ufrag and pwd attributes will be
generated automatically. This option is primarily useful for
reestablishing connectivity in cases where failures are detected by
the application.
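The cases in which new ICE credentials appear in an offer can be sketched as below. The helper names are hypothetical. The lengths of 4 and 22 characters are an assumption taken from the minimum sizes that ICE requires for the ufrag and password; a real implementation may well use longer values.

```python
import secrets
import string

# ice-char alphabet: letters, digits, "+" and "/".
ICE_CHARS = string.ascii_letters + string.digits + "+/"


def new_ice_credentials(ufrag_len=4, pwd_len=22):
    """Return a fresh (ufrag, pwd) pair of random ice-chars."""
    ufrag = "".join(secrets.choice(ICE_CHARS) for _ in range(ufrag_len))
    pwd = "".join(secrets.choice(ICE_CHARS) for _ in range(pwd_len))
    return ufrag, pwd


def credentials_for_offer(current, config_changed, ice_restart):
    """Sketch: pick the ICE credentials for the next offer.

    current: None for an initial offer, else the (ufrag, pwd) in use.
    """
    # An initial offer and a changed ICE configuration already force new
    # credentials, so IceRestart only matters in the remaining case.
    if current is None or config_changed or ice_restart:
        return new_ice_credentials()
    return current
```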
If the "VoiceActivityDetection" option is specified, with a
value of "true", the offer MUST indicate support for silence
suppression in the audio it receives by including comfort noise
("CN") codecs for each offered audio codec, as specified in , Section 5.1, except for codecs that have their
own internal silence suppression support. For codecs that have their own
internal silence suppression support, the appropriate fmtp parameters
for that codec MUST be specified to indicate that silence suppression for
received audio is desired. For example, when using the Opus codec, the
"usedtx=1" parameter would be specified in the offer.
This option allows the endpoint to significantly reduce the amount of
audio bandwidth it receives, at the cost of some fidelity, depending on
the quality of the remote VAD algorithm.

If the "VoiceActivityDetection" option is specified, with a value
of "false", the browser MUST NOT emit "CN" codecs.
For codecs that have their own internal silence suppression support,
the appropriate fmtp parameters for that codec MUST be
specified to indicate that silence suppression for received audio is
not desired. For example, when using the Opus codec, the
"usedtx=0" parameter would be specified in the offer.

Note that setting the "VoiceActivityDetection" parameter when
generating an offer is a request to receive audio with silence
suppression. It has no impact on whether the local endpoint does
silence suppression for the audio it sends.

The "VoiceActivityDetection" option does not have any impact on
the setting of the "vad" value in the signaling of the client to mixer
audio level header extension described in , Section 4.

When createAnswer is called, a new SDP description must be created
that is compatible with the supplied remote description as well as the
requirements specified in . The exact details of this
process are explained below.

When createAnswer is called for the first time after a remote
description has been provided, the result is known as the initial
answer. If no remote description has been installed, an answer
cannot be generated, and an error MUST be returned.

Note that the remote description SDP may not have been created by
a JSEP endpoint and may not conform to all the requirements listed
in . For many cases, this is
not a problem. However, if any mandatory SDP attributes are missing,
or functionality listed as mandatory-to-use above is not present,
this MUST be treated as an error, and MUST cause the affected
m= sections to be marked as rejected.

The first step in generating an initial answer is to generate
session-level attributes. The process here is identical to that
indicated in the Initial Offers section above, except
that the "a=ice-options" line, with the "trickle" option as specified in
, Section
4, is only included if such an option was present in the offer.

The next step is to generate lip sync groups as defined
in , Section 7. For each
MediaStream with more than one MediaStreamTrack, a group
of type "LS" MUST be added that contains the mid values for
each MediaStreamTrack in that MediaStream. In some cases
this may result in adding a mid to a given LS group that
was not in that LS group in the associated offer. Although
this is not allowed by , it is
allowed when implementing this specification.
[[OPEN ISSUE: This is still under discussion. See:
https://github.com/rtcweb-wg/jsep/issues/162.]]
The next step is to generate m= sections for each m= section that
is present in the remote offer, as specified in , Section 6. For the purposes of this
discussion, any session-level attributes in the offer that are also
valid as media-level attributes SHALL be considered to be present in
each m= section.

The next step is to go through each offered m= section.
If there is a local MediaStreamTrack of the same type which has been
added to the PeerConnection via addStream and not yet associated
with a m= section, and the specific m= section is either sendrecv or
recvonly, the MediaStreamTrack will be associated with the m=
section at this time. MediaStreamTracks are assigned to m=
sections using the canonical order described in
. If there are more m= sections of a certain
type than MediaStreamTracks, some m= sections will not have an
associated MediaStreamTrack. If there are more MediaStreamTracks of
a certain type than compatible m= sections, only the first N
MediaStreamTracks will be able to be associated in the constructed
answer. The remainder will need to be associated in a subsequent
offer.

For each offered m= section, if the associated remote
MediaStreamTrack has been stopped, and is therefore in state "ended",
and no local MediaStreamTrack has been associated, the corresponding
m= section in the answer MUST be marked as rejected by setting the
port in the m= line to zero, as indicated in
, Section 6, and further processing for
this m= section can be skipped.

Provided that is not the case, each m= section in the answer should
then be generated as specified in , Section 6.1. For the m= line itself, the following
rules must be followed:

The port value would normally be set to the port of the default ICE candidate
for this m= section, but given that no candidates have yet been gathered,
the "dummy" port value of 9 (Discard) MUST be used, as
indicated in ,
Section 5.1.

The <proto> field MUST be set to exactly match the <proto>
field for the corresponding m= line in the offer.

The m= line MUST be followed immediately by a "c=" line, as specified
in , Section 5.7. Again, as no candidates
have yet been gathered, the "c=" line must contain the "dummy" value
"IN IP4 0.0.0.0", as defined in ,
Section 5.1.
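A minimal sketch of the m= and "c=" rules for an initial answer follows. The function name and arguments are hypothetical, and the sketch assumes the offered m= line is well-formed with single-space separators.

```python
def initial_answer_media_lines(offer_m_line, answer_fmts):
    """Sketch: build the m= and c= lines of an initial answer.

    offer_m_line: the offered line, e.g.
                  "m=audio 49170 UDP/TLS/RTP/SAVPF 111 0".
    answer_fmts:  list of fmt values the answerer accepts.
    """
    # Split off the media type and <proto>; the offered port is ignored.
    media, _offer_port, proto = offer_m_line[2:].split(" ")[:3]
    # No candidates have been gathered yet, so use dummy port 9 (Discard)
    # and copy <proto> from the offer unchanged.
    m_line = "m={} 9 {} {}".format(media, proto, " ".join(answer_fmts))
    # The c= line carries the dummy address for the same reason.
    return [m_line, "c=IN IP4 0.0.0.0"]
```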
If the offer supports bundle, all m= sections to be bundled must use
the same ICE credentials and candidates; all m= sections not being
bundled must use unique ICE credentials and candidates. Each
m= section MUST include the following:
If and only if present in the offer, an "a=mid" line, as specified in
, Section 9.1. The "mid" value
MUST match that specified in the offer.

An "a=rtcp" line, as specified in ,
Section 2.1, containing the dummy value "9 IN IP4 0.0.0.0", because
no candidates have yet been gathered.

If a local MediaStreamTrack has been associated, an "a=msid"
line, as specified in , Section 2.

Depending on the directionality of the offer, the disposition
of any associated remote MediaStreamTrack, and the presence of an
associated local MediaStreamTrack, the appropriate directionality
attribute, as specified in ,
Section 6.1. If the offer was sendrecv, and the remote
MediaStreamTrack is still "live", and there is a local
MediaStreamTrack that has been associated, the directionality MUST be
set as sendrecv. If the offer was sendonly, and the remote
MediaStreamTrack is still "live", the directionality MUST be set
as recvonly. If the offer was recvonly, and a local
MediaStreamTrack has been associated, the directionality MUST be
set as sendonly. If the offer was inactive, the directionality
MUST be set as inactive.

For each supported codec that is present in the offer,
"a=rtpmap" and "a=fmtp" lines, as specified in , Section 6, and , Section 6.1.
The audio and video codecs that MUST be supported are
specified in (see Section 3) and (see Section 5).

If this m= section is for media with configurable frame sizes,
e.g. audio, an "a=maxptime" line, indicating the smallest of the
maximum supported frame sizes out of all codecs included above, as
specified in , Section 6.

If this m= section is for video media, and there are known
limitations on the size of images which can be decoded, an "a=imageattr"
line, as specified in .

If "rtx" is present in the offer, for each primary codec
where RTP retransmission should be used, a corresponding
"a=rtpmap" line indicating "rtx" with the clock rate of the
primary codec and an "a=fmtp" line that references the payload
type of the primary codec, as specified in , Section 8.1.

For each supported FEC mechanism, "a=rtpmap" and "a=fmtp" lines,
as specified in , Section 6. The
FEC mechanisms that MUST be supported are specified in
, Section 6, and
specific usage for each media type is outlined in Sections 4
and 5.

"a=ice-ufrag" and "a=ice-pwd" lines, as specified in , Section 15.4.

An "a=fingerprint" line for each of the endpoint's
certificates, as specified in ,
Section 5; the digest algorithm used for the fingerprint MUST
match that used in the certificate signature.

An "a=setup" line, as specified in , Section 4, and clarified for use in
DTLS-SRTP scenarios in , Section
5. The role value in the answer MUST be "active" or "passive";
the "active" role is RECOMMENDED.

If present in the offer, an "a=rtcp-mux" line, as specified
in , Section 5.1.1. If the
"require" RTCP multiplexing policy is set and no "a=rtcp-mux"
line is present in the offer, then the m= section MUST be
marked as rejected by setting the port in the m= line to zero, as
indicated in , Section 6.

If present in the offer, an "a=rtcp-rsize" line, as specified
in , Section 5.

For each supported RTP header extension that is present in
the offer, an "a=extmap" line, as specified in , Section 5. The list of header
extensions that SHOULD/MUST be supported is specified in , Section 5.2. Any
header extensions that require encryption MUST be specified as
indicated in , Section 4.

For each supported RTCP feedback mechanism that is present in
the offer, an "a=rtcp-fb" line, as specified in , Section 4.2. The list of RTCP feedback
mechanisms that SHOULD/MUST be supported is specified in , Section 5.1.

If a local MediaStreamTrack has been associated, an "a=ssrc"
line, as specified in , Section
4.1, indicating the SSRC to be used for sending media, along with
the mandatory "cname" source attribute, as specified in Section
6.1, indicating the CNAME for the source.
The CNAME MUST be generated in accordance with
Section 4.9 of .

If a local MediaStreamTrack has been associated, and RTX has
been negotiated for this m= section, another "a=ssrc" line with
the RTX SSRC, and an "a=ssrc-group" line, as specified in , Section 4.2, with semantics set to
"FID" and including the primary and RTX SSRCs.

If a local MediaStreamTrack has been associated, and FEC has
been negotiated for this m= section, another "a=ssrc" line with
the FEC SSRC, and an "a=ssrc-group" line with semantics set to
"FEC-FR" and including the primary and FEC SSRCs, as specified in
, Section 4.3. For
simplicity, if both RTX and FEC are supported, the FEC SSRC
MUST be the same as the RTX SSRC.

If a data channel m= section has been offered, a m= section MUST
also be generated for data. The <media> field MUST be set to
"application" and the <proto> and "fmt" fields MUST be set to
exactly match the fields in the offer.

Within the data m= section, the "a=mid",
"a=ice-ufrag", "a=ice-pwd", "a=candidate",
"a=fingerprint", and "a=setup" lines MUST be included as mentioned
above, along with an "a=fmtp:webrtc-datachannel" line
and an "a=sctp-port" line referencing the SCTP port number
as defined in , Section 4.1.
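The directionality rules given earlier for media m= sections in an answer can be condensed into a small decision function. This is a sketch with a hypothetical function name. It reproduces the four cases stated in the text; for combinations the text does not spell out (for instance a sendrecv offer with no local track), it assumes the usual offer/answer behaviour of dropping whichever direction cannot be served.

```python
def answer_direction(offer_dir, remote_live, has_local_track):
    """Sketch: direction attribute for an answer m= section.

    offer_dir:       "sendrecv", "sendonly", "recvonly" or "inactive".
    remote_live:     True if the remote MediaStreamTrack is "live".
    has_local_track: True if a local MediaStreamTrack was associated.
    """
    # We can receive only if the offerer sends and the remote track lives.
    will_recv = offer_dir in ("sendrecv", "sendonly") and remote_live
    # We can send only if the offerer receives and we have a local track.
    will_send = offer_dir in ("sendrecv", "recvonly") and has_local_track
    if will_send and will_recv:
        return "sendrecv"
    if will_send:
        return "sendonly"
    if will_recv:
        return "recvonly"
    return "inactive"
```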
If "a=group" attributes with semantics of "BUNDLE" are offered,
corresponding session-level "a=group" attributes MUST be added as
specified in . These attributes MUST
have semantics "BUNDLE", and MUST include all the mid identifiers from
the offered bundle groups that have not been rejected.
Note that regardless of the presence of "a=bundle-only" in the offer,
no m= sections in the answer should have an "a=bundle-only" line.
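The answer-side bundle rule can be sketched as follows. The function name and input shapes are hypothetical, and omitting a group whose mids were all rejected is an assumption of this sketch; the text above does not cover that case.

```python
def answer_bundle_lines(offered_groups, rejected_mids):
    """Sketch: session-level BUNDLE groups for an answer.

    offered_groups: list of mid lists, one per offered BUNDLE group.
    rejected_mids:  set of mids whose m= sections were rejected.
    """
    lines = []
    for mids in offered_groups:
        kept = [mid for mid in mids if mid not in rejected_mids]
        if kept:  # assumption: drop a group with no surviving mids
            lines.append("a=group:BUNDLE " + " ".join(kept))
    return lines
```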
Attributes that are common between all m= sections MAY be moved
to session-level, if explicitly defined to be valid at
session-level.

The attributes prohibited in the creation of offers are also
prohibited in the creation of answers.

When createAnswer is called a second (or later) time, or is called
after a local description has already been installed, the processing
is somewhat different than for an initial answer.

If the initial answer was not applied using setLocalDescription,
meaning the PeerConnection is still in the "have-remote-offer" state,
the steps for generating an initial answer should be followed, subject
to the following restriction:

The fields of the "o=" line MUST stay the same except for the
<session-version> field, which MUST increment if the session
description changes in any way from the previously generated answer.
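The "o=" line rule can be illustrated with a short sketch: every field is carried over, and only <session-version> is bumped when the description differs. The function name is hypothetical, and the sketch assumes a well-formed six-field "o=" line.

```python
def next_origin_line(previous_o_line, description_changed):
    """Sketch: derive the o= line for a regenerated description.

    previous_o_line:     e.g. "o=- 4962303333179871722 1 IN IP4 0.0.0.0".
    description_changed: True if anything in the description differs.
    """
    # Fields: username, sess-id, sess-version, nettype, addrtype, address.
    fields = previous_o_line[2:].split(" ")
    if description_changed:
        fields[2] = str(int(fields[2]) + 1)
    return "o=" + " ".join(fields)
```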
If any session description was previously supplied to
setLocalDescription, an answer is
generated by following the steps in the "have-remote-offer" state above,
along with these exceptions:

The "s=" and "t=" lines MUST stay the same.

Each "m=" and "c=" line MUST be filled in with the port and
address of the default candidate for the m= section, as described
in , Section 4.3. Note, however,
that the m= line protocol need not match the default candidate,
because this protocol value must instead match what was supplied
in the offer, as described above. Each
"a=rtcp" attribute line MUST also be filled in with the port and
address of the appropriate default candidate, either the
default RTP or RTCP candidate, depending on whether RTCP multiplexing
is enabled in the answer. In each case, if no candidates of the desired
type have yet been gathered, dummy values MUST be used, as
described in the initial answer section above.

Each "a=ice-ufrag" and "a=ice-pwd" line MUST stay the same,
unless the m= section is restarting, in which case new ICE
credentials must be created as specified in
, Section 9.2.1.1.
If the m= section is bundled into another m= section,
it still MUST NOT contain any ICE credentials.
If the m= section is not bundled into another m= section,
for each candidate that has
been gathered during the most recent gathering phase
(see ),
an "a=candidate" line MUST be added, as defined in , Section 4.3, paragraph 3.
If candidate gathering for the section has completed, an
"a=end-of-candidates" attribute MUST be added, as described in
, Section 9.3.
If the m= section is bundled into another m= section, both
"a=candidate" and "a=end-of-candidates" MUST be omitted.
For MediaStreamTracks that are still present, the "a=msid",
"a=ssrc", and "a=ssrc-group" lines MUST stay the same.

The createAnswer method takes as a parameter an RTCAnswerOptions
object. The set of parameters for RTCAnswerOptions is different than
those supported in RTCOfferOptions;
the OfferToReceiveAudio, OfferToReceiveVideo, and IceRestart
options mentioned in are
meaningless in the context of generating an answer, as there is no need
to generate extra m= lines in an answer, and ICE credentials
will automatically be changed for all m= lines where the offerer chose
to perform ICE restart.

The following options are supported in RTCAnswerOptions.

Silence suppression in the answer is handled as described in
, with one exception:
if support for silence suppression was not indicated in the offer,
the VoiceActivityDetection parameter has no effect, and the answer
should be generated as if VoiceActivityDetection was set to
false. This is done on a per-codec basis (e.g., if the offerer somehow
offered support for CN but set "usedtx=0" for Opus, setting
VoiceActivityDetection to true would result in an answer with CN codecs
and "usedtx=0").
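The per-codec behaviour described above can be sketched as a small function. The name and the reduction of the offer to two inputs (whether CN was offered, and the offered Opus "usedtx" value) are simplifications made for this example only.

```python
def answer_silence_suppression(vad, offer_has_cn, offer_opus_usedtx):
    """Sketch: silence suppression settings for an answer.

    vad:               the VoiceActivityDetection option value.
    offer_has_cn:      True if the offer included "CN" codecs.
    offer_opus_usedtx: the Opus "usedtx" value in the offer (0 or 1).
    Returns (include_cn, opus_usedtx) for the answer.
    """
    # Each mechanism is enabled only if requested locally AND offered.
    include_cn = bool(vad and offer_has_cn)
    opus_usedtx = 1 if (vad and offer_opus_usedtx == 1) else 0
    return include_cn, opus_usedtx
```

The example in the text (CN offered, Opus offered with "usedtx=0", VoiceActivityDetection set to true) corresponds to the call `answer_silence_suppression(True, True, 0)`.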
When a SessionDescription is supplied to setLocalDescription, the
following steps MUST be performed:
First, the type of the SessionDescription is checked against the
current state of the PeerConnection:
If the type is "offer", the PeerConnection state MUST be
either "stable" or "have-local-offer".

If the type is "pranswer" or "answer", the PeerConnection
state MUST be either "have-remote-offer" or
"have-local-pranswer".

If the type is not correct for the current state, processing MUST
stop and an error MUST be returned.

Next, the SessionDescription is parsed into a data structure,
as described in the
section below. If parsing
fails for any reason, processing MUST stop and an error MUST be
returned.
Finally, the parsed SessionDescription is applied as described in
the section below.

When a SessionDescription is supplied to setRemoteDescription, the
following steps MUST be performed:
First, the type of the SessionDescription is checked against the
current state of the PeerConnection:
If the type is "offer", the PeerConnection state MUST be
either "stable" or "have-remote-offer".

If the type is "pranswer" or "answer", the PeerConnection
state MUST be either "have-local-offer" or
"have-remote-pranswer".

If the type is not correct for the current state, processing MUST
stop and an error MUST be returned.

Next, the SessionDescription is parsed into a data structure,
as described in the
section below. If parsing
fails for any reason, processing MUST stop and an error MUST be
returned.
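The type check that opens both setLocalDescription and setRemoteDescription amounts to a table lookup, as the following sketch shows. The table and function names are hypothetical; the state and type strings are those used in the text.

```python
# Allowed PeerConnection states per (side, description type).
ALLOWED_STATES = {
    ("local", "offer"): {"stable", "have-local-offer"},
    ("local", "pranswer"): {"have-remote-offer", "have-local-pranswer"},
    ("local", "answer"): {"have-remote-offer", "have-local-pranswer"},
    ("remote", "offer"): {"stable", "have-remote-offer"},
    ("remote", "pranswer"): {"have-local-offer", "have-remote-pranswer"},
    ("remote", "answer"): {"have-local-offer", "have-remote-pranswer"},
}


def check_description_type(side, desc_type, state):
    """Sketch: stop with an error if the type does not fit the state.

    side: "local" for setLocalDescription, "remote" for
          setRemoteDescription.
    """
    allowed = ALLOWED_STATES.get((side, desc_type))
    if allowed is None or state not in allowed:
        raise ValueError(
            "{} description of type {!r} not allowed in state {!r}".format(
                side, desc_type, state))
```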
Finally, the parsed SessionDescription is applied as described in
the section below.

When a SessionDescription of any type is supplied to
setLocal/RemoteDescription, the implementation must parse it and
reject it if it is invalid. The exact details of this
process are explained below.

The SDP contained in the session description object consists of a
sequence of text lines, each containing a key-value expression,
as described in , Section 5.
The SDP is read,
line-by-line, and converted to a data structure that contains the
deserialized information. However, SDP allows many types of lines, not
all of which are relevant to JSEP applications.
For each line, the implementation will first ensure it is syntactically
correct according to its defining ABNF, check
that it conforms to and
semantics, and then either parse and store or discard the provided value,
as described below. A partial list of ABNF definitions for SDP attributes can be found in:

Attribute                 Reference
ptime                     Section 9
maxptime                  Section 9
rtpmap                    Section 9
recvonly                  Section 9
sendrecv                  Section 9
sendonly                  Section 9
inactive                  Section 9
framerate                 Section 9
fmtp                      Section 9
quality                   Section 9
msid                      Section 2
rtcp                      Section 2.1
setup                     Section 3, 4, and 5
connection                Section 3, 4, and 5
fingerprint               Section 5
rtcp-fb                   Section 4.2
candidate                 Section 15
extmap                    Section 7
mid                       Section 4 and 5
group                     Section 4 and 5
imageattr                 Section 3.1
extmap (encrypt option)   Section 4

[TODO: ensure that every line is listed below.]

If the line is not well-formed, or cannot be parsed as described, the parser MUST stop
with an error and reject the session description. This ensures that
implementations do not accidentally misinterpret ambiguous SDP.

First, the session-level lines are checked and parsed. These
lines MUST occur in a specific order, and with a specific syntax, as
defined in , Section 5. Note that while
the specific line types (e.g. "v=", "c=") MUST occur in the defined
order, lines of the same type (typically "a=") can occur in any order,
and their ordering is not meaningful.

For non-attribute (non-"a=") lines, their sequencing, syntax,
and semantics are checked, as mentioned above. The following
lines are not meaningful in the JSEP context and
MAY be discarded once they have been checked.
The "c=" line MUST be checked for syntax but its
value is not used. This supersedes the guidance in , Section 6.1, to use "ice-mismatch" to indicate
mismatches between "c=" and the candidate lines;
because JSEP always uses ICE, "ice-mismatch" is not useful in this context.
The "i=", "u=", "e=", "p=", "t=", "r=", "z=", and "k="
lines are not used by this specification; they MUST be checked
for syntax but their values are not used.

The remaining lines are processed as follows:
The "v=" line MUST have a version of 0, as specified in
, Section 5.1.
The "o=" line MUST be parsed as specified in
, Section 5.2.
The "b=" line, if present, MUST be parsed as specified in
, Section 5.8, and the bwtype and
bandwidth values stored.
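As a rough sketch of the session-level checks above, the following Python function verifies line ordering and extracts the "v=", "o=" and "b=" values. It is deliberately simplified and all names are hypothetical: line types whose values are not used are only checked for position rather than for full syntax, and "t=" and "r=" share one rank so that repeated time descriptions pass.

```python
# Rank of each session-level line type in the order SDP defines.
ORDER = {"v": 0, "o": 1, "s": 2, "i": 3, "u": 4, "e": 5, "p": 6,
         "c": 7, "b": 8, "t": 9, "r": 9, "z": 10, "k": 11, "a": 12}
ORIGIN_FIELDS = ("username", "sess_id", "sess_version",
                 "nettype", "addrtype", "unicast_address")


def parse_session_level(lines):
    """Sketch: check ordering and store the v=, o= and b= values."""
    parsed = {"bandwidth": []}
    last_rank = -1
    for line in lines:
        if len(line) < 2 or line[1] != "=" or line[0] not in ORDER:
            raise ValueError("malformed line: " + line)
        kind, value = line[0], line[2:]
        if ORDER[kind] < last_rank:
            raise ValueError("line out of order: " + line)
        last_rank = ORDER[kind]
        if kind == "v":
            if value != "0":
                raise ValueError("unsupported version: " + value)
            parsed["version"] = 0
        elif kind == "o":
            fields = value.split(" ")
            if len(fields) != len(ORIGIN_FIELDS):
                raise ValueError("bad o= line: " + line)
            parsed["origin"] = dict(zip(ORIGIN_FIELDS, fields))
        elif kind == "b":
            bwtype, sep, bandwidth = value.partition(":")
            if not sep or not bandwidth.isdigit():
                raise ValueError("bad b= line: " + line)
            parsed["bandwidth"].append((bwtype, int(bandwidth)))
        # Other line types: position-checked only; values are not used.
    return parsed
```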
Specific processing MUST be applied for the following session-level
attribute ("a=") lines:
Any "a=group" lines are parsed as specified in
, Section 5, and the group's semantics
and mids are stored.

If present, a single "a=ice-lite" line is parsed as specified in
, Section 15.3, and a value indicating
the presence of ice-lite is stored.

If present, a single "a=ice-ufrag" line is parsed as specified in
, Section 15.4, and the ufrag value is
stored.

If present, a single "a=ice-pwd" line is parsed as specified in
, Section 15.4, and the password
value is stored.

If present, a single "a=ice-options" line is parsed as specified in
, Section 15.5, and the set of specified
options is stored.

Any "a=fingerprint" lines are parsed as specified in
, Section 5, and the set of fingerprint
and algorithm values is stored.

If present, a single "a=setup" line is parsed as specified in
, Section 4, and the setup value
is stored.

Any "a=extmap" lines are parsed as specified in
, Section 5, and their values
are stored.

TODO: identity, rtcp-rsize, rtcp-mux,
and any other attribs valid at session level.
Once all the session-level lines have been parsed, processing
continues with the lines in media sections.
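The session-level attribute handling described above can be sketched as a single pass over the "a=" lines. The function name and the result layout are hypothetical. Treating a repeated single-valued attribute as an error is an assumption of this sketch, and value syntax is not validated beyond simple splitting.

```python
SINGLE_VALUED = {"ice-ufrag", "ice-pwd", "ice-options", "setup"}


def parse_session_attributes(attr_lines):
    """Sketch: store the session-level attributes named in the text."""
    parsed = {"groups": [], "fingerprints": [], "extmaps": [],
              "ice_lite": False}
    for line in attr_lines:
        if not line.startswith("a="):
            raise ValueError("not an attribute line: " + line)
        name, _, value = line[2:].partition(":")
        if name == "group":
            semantics, *mids = value.split(" ")
            parsed["groups"].append((semantics, mids))
        elif name == "fingerprint":
            algorithm, _, digest = value.partition(" ")
            parsed["fingerprints"].append((algorithm, digest))
        elif name == "extmap":
            parsed["extmaps"].append(value)
        elif name == "ice-lite":
            parsed["ice_lite"] = True
        elif name in SINGLE_VALUED:
            if name in parsed:  # assumption: duplicates are rejected
                raise ValueError("duplicate attribute: " + name)
            parsed[name] = value.split(" ") if name == "ice-options" else value
        # Attributes not listed above are ignored by this sketch.
    return parsed
```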
Like the session-level lines, the media section lines MUST occur
in the specific order and with the specific syntax defined in
, Section 5.

The "m=" line itself MUST be parsed as described in ,
Section 5.14, and the media, port, proto, and fmt values stored.

Following the "m=" line, specific processing MUST be applied for the
following non-attribute lines:
As with the "c=" line at the session level,
the "c=" line MUST be parsed according to
, Section 5.7, but its value is not
used.The "b=" line, if present, MUST be parsed as specified in
, Section 5.8, and the bwtype and
bandwidth values stored.Specific processing MUST also be applied for the following attribute lines:
If present, a single "a=ice-ufrag" line is parsed as specified in
, Section 15.4, and the ufrag value is
stored.
If present, a single "a=ice-pwd" line is parsed as specified in
, Section 15.4, and the password
value is stored.
If present, a single "a=ice-options" line is parsed as specified in
, Section 15.5, and the set of specified
options is stored.
Any "a=fingerprint" lines are parsed as specified in
, Section 5, and the set of fingerprint
and algorithm values is stored.
If present, a single "a=setup" line is parsed as specified in
, Section 4, and the setup value
is stored.
If the "m=" proto value indicates use of RTP, as described in
the section above, the following
attribute lines MUST be processed:
The "m=" fmt value MUST be parsed as specified in ,
Section 5.14, and the individual values stored.
Any "a=rtpmap" or "a=fmtp" lines
MUST be parsed as specified in , Section 6,
and their values stored.
If present, a single "a=ptime" line MUST be parsed as described in
, Section 6, and its value stored.
If present, a single "a=maxptime" line MUST be parsed as described in
, Section 6, and its value stored.
If present, a single direction attribute line (e.g. "a=sendrecv")
MUST be parsed as described in
, Section 6, and its value stored.
Any "a=ssrc" or "a=ssrc-group" attributes MUST be parsed as
specified in , Sections 4.1-4.2, and their
values stored.
Any "a=extmap" attributes MUST be parsed as specified in
, Section 5, and their values stored.
Any "a=rtcp-fb" attributes MUST be parsed as specified in
, Section 4.2, and their values stored.
If present, a single "a=rtcp-mux" attribute MUST be parsed as specified in
, Section 5.1.1, and its presence or
absence flagged and stored.
If present, a single "a=rtcp-rsize" attribute MUST be parsed as specified in
, Section 5, and its presence or
absence flagged and stored.
If present, a single "a=rtcp" attribute MUST be parsed as specified in
, Section 2.1, but its value is ignored.
If present, a single "a=msid" attribute MUST be parsed as specified in
, Section 3.2, and its value stored.
Any "a=candidate" attributes MUST be parsed as specified in
, Section 4.3, and their values stored.
Any "a=remote-candidates" attributes MUST be parsed as specified in
, Section 4.3, but their values are ignored.
If present, a single "a=end-of-candidates" attribute MUST be parsed as specified in
, Section 8.2,
and its presence or absence flagged and stored.
Any "a=imageattr" attributes MUST be parsed as specified in
, Section 3, and their values stored.
Otherwise, if the "m=" proto value indicates use of SCTP, the following
attribute lines MUST be processed:
The "m=" fmt value MUST be parsed as specified in
, Section 4.3, and
the application protocol value stored.
An "a=sctp-port" attribute MUST be present, and it MUST be
parsed as specified in ,
Section 5.2, and the value stored.
If present, a single "a=max-message-size" attribute MUST
be parsed as specified in ,
Section 6, and the value stored. Otherwise, use the specified default.
Assuming parsing completes successfully, the parsed description is
then evaluated to ensure internal consistency as well as proper support
for mandatory features. Specifically, the following checks are
performed:
For each m= section, valid values for each of the
mandatory-to-use features enumerated in
MUST be present. These values MAY either be present at the
media level, or inherited from the session level.
ICE ufrag and password values, which MUST comply with the
size limits specified in , Section 15.4.
DTLS setup value, which MUST be set according to the rules
specified in , Section 5, and MUST
be consistent with the selected role of the current DTLS
connection, if one exists. [TODO: may need revision, i.e., use of actpass]
DTLS fingerprint values, where at least one fingerprint
MUST be present.
Each m= section is also checked to ensure prohibited features
are not used. If this is a local description, the "ice-lite"
attribute MUST NOT be specified.
If this session description is of type "pranswer" or "answer", the
following additional checks are applied:
The session description must follow the rules defined in
, Section 6, including the requirement
that the number of m= sections MUST exactly match the number
of m= sections in the associated offer.
For each m= section, the media type and protocol values MUST
exactly match the media type and protocol values in the
corresponding m= section in the associated offer.
The following steps are performed at the media engine level to apply a local
description.
First, the parsed parameters are checked to ensure that any modifications
performed fall within those explicitly permitted by
; otherwise, processing
MUST stop and an error MUST be returned.
Next, media sections are processed. For each media section,
the following steps MUST be performed; if any parameters are out of
bounds, or cannot be applied, processing MUST stop and an error MUST be
returned.
If this media section is new, begin gathering candidates for it,
as defined in , Section 4.1.1,
unless it has been marked as bundle-only.
Or, if the ICE ufrag and password values have changed, trigger the ICE Agent
to start an ICE restart and begin gathering new candidates for the
media section, as defined in , Section 9.1.1.1,
unless it has been marked as bundle-only.
If the media section proto value indicates use of RTP:
If RTCP mux is indicated, prepare to demux RTP and RTCP from the RTP
ICE component, as specified in , Section 5.1.1.
If RTCP mux is not indicated, but was indicated in a previous
description, this MUST result in an error.
For each specified RTP header extension, establish a mapping between
the extension ID and URI, as described in Section 6 of .
If any indicated RTP header extension is unknown, this
MUST result in an error.
If the MID header extension is supported, prepare to demux
RTP data intended for this media section based on
the MID header extension, as described in , Section 3.2.
For each specified payload type, establish a mapping between the
payload type ID and the actual media format, as described in
. If any indicated payload
type is unknown, this MUST result in an error.
For each specified "rtx" media format, establish a mapping between
the RTX payload type and its associated primary
payload type, as described in , Sections 8.6 and 8.7. If any
referenced primary payload types are not present, this MUST result in an error.
If the directional attribute is of type "sendrecv" or
"recvonly", enable receipt and decoding of media.
Finally, if this description is of type "pranswer" or "answer", follow
the processing defined in the
section below.
If the answer contains any "a=ice-options" attributes where "trickle"
is listed as an attribute, update the PeerConnection
canTrickle property to be true. Otherwise, set this property to false.
The following steps are performed at the media engine level to apply a remote
description.
The following steps MUST be performed for attributes at the
session level; if any parameters are out of
bounds, or cannot be applied, processing MUST stop and an error MUST be
returned.
For any specified "CT" bandwidth value, set
this as the limit for the maximum total bitrate for all m= sections,
as specified in Section 5.8 of . The implementation can decide how to
allocate the available bandwidth between m= sections to
simultaneously meet any limits on individual m= sections,
as well as this overall session limit.
For any specified "RR" or "RS" bandwidth values, handle as specified in , Section 2.
Any "AS" bandwidth value MUST be ignored, as the meaning of this
construct at the session level is not well defined.
For each media section,
the following steps MUST be performed; if any parameters are out of
bounds, or cannot be applied, processing MUST stop and an error MUST be
returned.
If the description is of type "offer", and the ICE ufrag or password changed
from the previous remote description, as described in Section
9.1.1.1 of , mark that an ICE restart is needed.
Configure the ICE components associated with this media section to use
the supplied ICE remote ufrag and password for their connectivity checks.
Pair any supplied ICE candidates with any gathered local candidates, as described
in Section 5.7 of
and start connectivity checks with the appropriate credentials.
If the media section proto value indicates use of RTP:
[TODO: header extensions]
For each specified payload type that is also supported by the local implementation,
establish a mapping between the
payload type ID and the actual media format.
[TODO - Justin to add more to explain mapping.]
If any indicated payload
type is unknown, it MUST be ignored. [TODO: should fail on answers]
For each specified "rtx" media format, establish a mapping between
the RTX payload type and its associated primary payload type,
as described in . If any
referenced primary payload types are not present, this MUST result in an error.
For each specified fmtp parameter that is supported by the local implementation,
enable it on the associated payload types.
For each specified RTCP feedback mechanism that is supported by the local implementation,
enable it on the associated payload types.
For any specified "TIAS" bandwidth value, set this
value as a constraint on the maximum RTP bitrate to be
used when sending media, as specified in . If a "TIAS" value is not
present, but an "AS" value is specified, generate a
"TIAS" value using this formula, where "AS" is expressed in kilobits
per second and "TIAS" in bits per second:
TIAS = AS * 1000 * 0.95 - 50 * 40 * 8
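As a rough check of this arithmetic, the conversion can be written out as below. The sketch assumes that "AS" is given in kilobits per second and "TIAS" in bits per second, so the "AS" value is scaled by 1000 before the 5% RTCP share and the header overhead are subtracted; the function name `tias_from_as` is hypothetical.

```python
def tias_from_as(as_kbps: int) -> int:
    """Derive a TIAS value (bits per second) from an AS value (kbps)."""
    # 95% of the AS bitrate, in integer arithmetic to avoid float rounding.
    rtp_share_bps = as_kbps * 1000 * 95 // 100
    # 50 packets per second * 40 header bytes * 8 bits per byte.
    header_overhead_bps = 50 * 40 * 8
    return rtp_share_bps - header_overhead_bps


# b=AS:64  gives 64000 * 0.95 - 16000 = 44800
# b=AS:512 gives 512000 * 0.95 - 16000 = 470400
```

For very small "AS" values the result becomes negative; how an implementation should treat that case is not covered here.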
The 50 is based on 50 packets per second, the 40 is based on an estimate of
total header size in bytes, and the 0.95 is to allocate 5% to RTCP. If more
accurate control of bandwidth is needed,
"TIAS" should be used instead of "AS".
For any "RR" or "RS" bandwidth values, handle as specified in , Section 2.
Any specified "CT" bandwidth value MUST be
ignored, as the meaning of this construct at the media level is not well
defined.
[TODO: handling of CN, telephone-event, "red"]
If the media section is of type audio:
For any specified "ptime" value, configure the available payload types to
use the specified packet size. If the specified size is
not supported for a payload type, use the next closest
value instead.
Finally, if this description is of type "pranswer" or "answer", follow
the processing defined in the
section below.
In addition to the steps mentioned above for processing a local or
remote description, the following steps are performed when processing
a description of type "pranswer" or "answer".
For each media section, the following steps MUST be performed:
If the media section has been rejected (i.e. port is set to zero in
the answer), stop any reception or transmission of media for this
section, and discard any associated ICE components, as described
in Section 9.2.1.3 of .
If the remote DTLS fingerprint has been changed, tear down the existing
DTLS connection.
If no valid DTLS connection exists, prepare to start a DTLS connection,
using the specified roles and fingerprints, on any underlying ICE components,
once they are active.
If the media section proto value indicates use of RTP:
If the media section has RTCP mux enabled, discard any RTCP component,
and begin or continue muxing RTCP over the RTP component, as specified in
, Section 5.1.3. Otherwise, transmit RTCP over
the RTCP component; if no RTCP component exists, because RTCP mux
was previously enabled, this MUST result in an error.
If the media section has reduced-size RTCP enabled, configure the
RTCP transmission for this media section to use reduced-size RTCP,
as specified in .
If the directional attribute in the answer is of type "sendrecv" or
"sendonly", prepare to start transmitting media using the specified primary SSRC
and one of the selected payload types, once the underlying transport
layers have been established. Otherwise, stop transmitting RTP media,
although RTCP should still be sent, as described in
, Section 5.1.
If the media section proto value indicates use of SCTP:
If no SCTP association yet exists, prepare to initiate an SCTP association
over the associated ICE component and DTLS connection, using the local
SCTP port value from the local description, and the remote SCTP port value
from the remote description, as described in , Section 10.2.
If the answer contains valid bundle groups, discard any ICE components
for the m= sections that will be bundled onto the primary ICE components
in each bundle, and begin muxing these m= sections accordingly,
as described in , Section 8.2.
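The first of the answer-processing steps, detecting rejected media sections, might be sketched as follows. The sketch treats a port of zero as the only sign of rejection, as the text above does, and does not model bundle handling; `rejected_sections` is a hypothetical helper name.

```python
def rejected_sections(answer_sdp: str) -> list:
    """Return the zero-based indexes of m= sections whose port is 0."""
    rejected = []
    index = -1
    for line in answer_sdp.splitlines():
        if not line.startswith("m="):
            continue
        index += 1
        # m=<media> <port> <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) < 4:
            raise ValueError(f"malformed m= line: {line!r}")
        port = fields[1].split("/")[0]  # drop optional "/<number of ports>"
        if int(port) == 0:
            rejected.append(index)
    return rejected
```

An implementation would then stop media and discard the ICE components for each returned index before continuing with the DTLS, RTP, and SCTP steps.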
It is possible to change elements in the SDP returned from createOffer
before passing it to setLocalDescription. When an implementation
receives modified SDP it MUST either:
Accept the changes and adjust its behavior to match the SDP.
Reject the changes and return an error via the error callback.
Changes MUST NOT be silently ignored.
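One way to picture the accept-or-reject rule is a check that compares the browser-controlled parts of the generated and modified descriptions and fails loudly on any difference. The sketch below is only illustrative: the helper names are hypothetical, and the attribute set covers only some of the transport attributes that this section goes on to list as unchangeable.

```python
# Assumed subset of the attributes that are solely under browser control.
PROTECTED_ATTRIBUTES = ("ice-ufrag", "ice-pwd", "fingerprint", "candidate")


def _protected_view(sdp: str) -> list:
    """Extract m= media types and ports plus protected a= lines, in order."""
    view = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            view.append(("m", media, port))
        elif line.startswith("a="):
            name = line[2:].partition(":")[0]
            if name in PROTECTED_ATTRIBUTES:
                view.append(("a", line))
    return view


def check_local_modifications(generated: str, modified: str) -> None:
    """Raise instead of silently ignoring a prohibited change."""
    if _protected_view(generated) != _protected_view(modified):
        raise ValueError("modified SDP changes browser-controlled elements")
```

Because only the media type and port of each "m=" line are compared, reordering the codecs on an "m=" line passes this check, while a change to an ICE credential or a fingerprint is rejected.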
The following elements of the session description MUST NOT be
changed between the createOffer and the setLocalDescription (or between
the createAnswer and the setLocalDescription), since they
reflect transport attributes that are solely under browser control, and
the browser MUST NOT honor an attempt to change them:
The number, type and port number of m= lines.
The generated ICE credentials (a=ice-ufrag and a=ice-pwd).
The set of ICE candidates and their parameters (a=candidate).
The DTLS fingerprint(s) (a=fingerprint).
The contents of bundle groups, bundle-only parameters, or
"a=rtcp-mux" parameters.
The following modifications, if done by the application to a description
between createOffer/createAnswer and the setLocalDescription, MUST be
honored by the browser:
Remove or reorder codecs (m=)
The following parameters may be controlled by options passed into
createOffer/createAnswer. As an open issue, these changes may also be
performed by manipulating the SDP returned from
createOffer/createAnswer, as indicated above, as long as the
capabilities of the endpoint are not exceeded (e.g. asking for a
resolution greater than what the endpoint can encode):
[[OPEN ISSUE: This is a placeholder for other modifications, which
we may continue adding as use cases appear.]]
Implementations MAY choose to either honor or reject any elements not
listed in the above two categories, but must do so explicitly as
described at the beginning of this section. Note that future standards
may add new SDP elements to the list of elements which must be accepted
or rejected, but due to version skew, applications must be
prepared for implementations to accept changes which must be
rejected and vice versa.
The application can also modify the SDP to reduce the capabilities in
the offer it sends to the far side or the offer that it installs from the
far side in any way the application sees fit, as long as it is a
valid SDP offer and specifies a subset of what was in the original offer.
This is safe because the answer is not permitted to expand capabilities
and therefore will just respond to what is actually in the offer.
As always, the application is solely responsible for what it sends to
the other party, and all incoming SDP will be processed by the browser
to the extent of its capabilities. It is an error to assume that all SDP
is well-formed; however, one should be able to assume that any
implementation of this specification will be able to process, as a
remote offer or answer, unmodified SDP coming from any other
implementation of this specification.
Note that this example section shows several SDP fragments. To
format in 72 columns, some of the lines in SDP have been split into
multiple lines, where leading whitespace indicates that a line
is a continuation of the previous line. In addition, some blank lines
have been added to improve readability but are not valid in SDP.
More examples of SDP for WebRTC call flows can be found in .
This section shows a very simple example that sets up a
minimal audio / video call between two browsers and does not use
trickle ICE. The example in the following section provides a
more realistic example of what would happen in a normal browser
to browser connection.
The flow shows Alice's browser initiating the session to
Bob's browser. The messages from Alice's JS to Bob's JS are
assumed to flow over some signaling protocol via a web
server. The JS on both Alice's side and Bob's side waits
for all candidates before sending the offer or answer,
so the offers and answers are complete. Trickle ICE is
not used. Both Alice and Bob are using the default policy of balanced.
The SDP for |offer-A1| looks like:
The SDP for |answer-A1| looks like:
This section shows a typical example of a session between two
browsers setting up an audio channel and a data channel. Trickle
ICE is used in full trickle mode with a bundle policy of max-bundle,
an RTCP mux policy of require, and a single TURN server.
Later, two video flows, one for the
presenter and one for screen sharing, are added to the session.
This example shows Alice's browser initiating the
session to Bob's browser. The messages from Alice's JS to Bob's
JS are assumed to flow over some signaling protocol via a web
server.
The SDP for |offer-B1| looks like:
The SDP for |candidate-B1| looks like:
The SDP for |candidate-B2| looks like:
The SDP for |answer-B1| looks like:
The SDP for |candidate-B3| looks like:
The SDP for |candidate-B4| looks like:
The SDP for |offer-B2| looks like:
(note the increment of the version number in the o= line,
and the c= and a=rtcp lines, which indicate the local candidate
that was selected)
The SDP for |answer-B2| looks like:
(note the use of setup:passive to maintain
the existing DTLS roles, and the use of a=recvonly to
indicate that the video streams are one-way)
The IETF has published separate documents
describing the security architecture for WebRTC as a whole.
The remainder of this section describes security considerations
for this document.
While formally the JSEP interface is an API, it is better to
think of it as an Internet protocol, with the JS being untrustworthy
from the perspective of the browser. Thus, the threat model of applies. In particular, JS can call the API in any
order and with any inputs, including malicious ones. This is
particularly relevant when we consider the SDP which is passed to
setLocalDescription(). While correct API usage requires that the
application pass in SDP which was derived from createOffer() or
createAnswer() (perhaps suitably modified as described in ), there is no guarantee that
applications do so. The browser MUST be prepared for the JS to pass in
bogus data instead.
Conversely, the application programmer MUST recognize that the
JS does not have complete control of browser behavior. One case that
bears particular mention is that editing ICE candidates out of the SDP
or suppressing trickled candidates does not have the expected
behavior: implementations will still perform checks from those
candidates even if they are not sent to the other side. Thus, for
instance, it is not possible to prevent the remote peer from
learning your public IP address by removing server reflexive
candidates. Applications which wish to conceal their public IP address
should instead configure the ICE agent to use only relay candidates.
This document requires no actions from IANA.
Significant text incorporated in the draft, as well as review, was
provided by Peter Thatcher, Taylor Brandstetter, Harald Alvestrand and Suhas Nandakumar. Dan
Burnett, Neil Stratford, Anant Narayanan, Andrew Hutton,
Richard Ejzak, Adam
Bergkvist and Matthew Kaufman all provided valuable feedback on this proposal.
Interactive Connectivity Establishment (ICE): A Protocol for
Network Address Translator (NAT) Traversal for Offer/Answer
Protocols
The Session Description Protocol (SDP) Grouping
Framework
In this specification, we define a framework to group "m" lines
in the Session Description Protocol (SDP) for different purposes.
This framework uses the "group" and "mid" SDP attributes, both of
which are defined in this specification. Additionally, we specify
how to use the framework for two different purposes: for lip
synchronization and for receiving a media flow consisting of
several media streams on different transport addresses. This
document obsoletes RFC 3388. [STANDARDS-TRACK]
Multiplexing RTP Data and Control Packets on a Single
Port
This memo discusses issues that arise when multiplexing RTP
data packets and RTP Control Protocol (RTCP) packets on a single
UDP port. It updates RFC 3550 and RFC 3551 to describe when such
multiplexing is and is not appropriate, and it explains how the
Session Description Protocol (SDP) can be used to signal
multiplexed sessions. [STANDARDS-TRACK]
Extended RTP Profile for Real-time Transport Control Protocol
(RTCP)-Based Feedback (RTP/AVPF)
Real-time media streams that use RTP are, to some degree,
resilient against packet losses. Receivers may use the base
mechanisms of the Real-time Transport Control Protocol (RTCP) to
report packet reception statistics and thus allow a sender to
adapt its transmission behavior in the mid-term. This is the sole
means for feedback and feedback-based error repair (besides a few
codec-specific mechanisms). This document defines an extension to
the Audio-visual Profile (AVP) that enables receivers to provide,
statistically, more immediate feedback to the senders and thus
allows for short-term adaptation and efficient feedback-based
repair mechanisms to be implemented. This early feedback profile
(AVPF) maintains the AVP bandwidth constraints for RTCP and
preserves scalability to large groups. [STANDARDS-TRACK]
SIP: Session Initiation Protocol
This document describes Session Initiation Protocol (SIP), an
application-layer control (signaling) protocol for creating,
modifying, and terminating sessions with one or more participants.
These sessions include Internet telephone calls, multimedia
distribution, and multimedia conferences. [STANDARDS-TRACK]
WebRTC Video Processing and Codec Requirements
A Transport Independent Bandwidth Modifier for the Session Description Protocol (SDP)
WebRTC Audio Codec and Processing Requirements
This document outlines the audio codec and processing
requirements for WebRTC client application and endpoint
devices.
Stream Control Transmission Protocol (SCTP)-Based Media
Transport in the Session Description Protocol (SDP)
SCTP (Stream Control Transmission Protocol) is a transport
protocol used to establish associations between two endpoints.
This document describes how to express media transport over SCTP
in SDP (Session Description Protocol). This document defines the
'SCTP', 'SCTP/DTLS' and 'DTLS/SCTP' protocol identifiers for
SDP.
Connection-Oriented Media Transport over the Transport Layer
Security (TLS) Protocol in the Session Description Protocol
(SDP)
This document specifies how to establish secure
connection-oriented media transport sessions over the Transport
Layer Security (TLS) protocol using the Session Description
Protocol (SDP). It defines a new SDP protocol identifier,
'TCP/TLS'. It also defines the syntax and semantics for an SDP
'fingerprint' attribute that identifies the certificate that will
be presented for the TLS session. This mechanism allows media
transport over TLS connections to be established securely, so long
as the integrity of session descriptions is
assured. This document extends and updates RFC
4145. [STANDARDS-TRACK]
TCP-Based Media Transport in the Session Description Protocol
(SDP)
This document describes how to express media transport over TCP
using the Session Description Protocol (SDP). It defines the SDP
'TCP' protocol identifier, the SDP 'setup' attribute, which
describes the connection setup procedure, and the SDP 'connection'
attribute, which handles connection reestablishment.
[STANDARDS-TRACK]
IANA registration of SDP 'proto' attribute
for transporting RTP Media over TCP under various RTP profiles.
RTP provides end-to-end network transport functions suitable for
applications transmitting real-time data, such as audio, video or
simulation data, over multicast or unicast network services. The data
transport is augmented by a control protocol (RTCP) to allow monitoring
of the data delivery in a manner scalable to large multicast networks,
and to provide minimal control and identification functionality.
This document describes how to express RTP media transport over TCP
in SDP (Session Description Protocol) under various configurations.
This document defines 'TCP/RTP/AVPF', 'TCP/RTP/SAVP', 'TCP/RTP/SAVPF',
'TCP/TLS/RTP/SAVP', 'TCP/TLS/RTP/SAVPF' protocol identifiers for SDP.
A Framework for SDP Attributes when Multiplexing
A General Mechanism for RTP Header Extensions
This document provides a general mechanism to use the header
extension feature of RTP (the Real-Time Transport Protocol). It
provides the option to use a small number of small extensions in
each RTP packet, where the universe of possible extensions is
large and registration is de-centralized. The actual extensions in
use in a session are signaled in the setup information for that
session. [STANDARDS-TRACK]
Encryption of Header Extensions in the Secure Real-time
Transport Protocol (SRTP)
The Secure Real-time Transport Protocol (SRTP) provides
authentication, but not encryption, of the headers of Real-time
Transport Protocol (RTP) packets. However, RTP header extensions
may carry sensitive information for which participants in
multimedia sessions want confidentiality. This document provides a
mechanism, extending the mechanisms of SRTP, to selectively
encrypt RTP header extensions in SRTP. This
document updates RFC 3711, the Secure Real-time Transport Protocol
specification, to require that all future SRTP encryption
transforms specify how RTP header extensions are to be
encrypted.
Web Real-Time Communication (WebRTC): Media Transport and Use
of RTP
The Web Real-Time Communication (WebRTC) framework provides
support for direct interactive rich communication using audio,
video, text, collaboration, games, etc. between two peers'
web-browsers. This memo describes the media transport aspects of
the WebRTC framework. It specifies how the Real-time Transport
Protocol (RTP) is used in the WebRTC context, and gives
requirements for which RTP features, profiles, and extensions need
to be supported.
Multiplexing Negotiation Using Session Description Protocol
(SDP) Port Numbers
This specification defines a new SDP Grouping Framework
extension, "BUNDLE", that can be used with the Session Description
Protocol (SDP) Offer/Answer mechanism to negotiate the usage of
bundled media, which refers to the usage of a single 5-tuple for
media associated with multiple SDP media descriptions ("m="
lines).
Cross Session Stream Identification in the Session
Description Protocol
This document specifies a grouping mechanism for RTP media
streams that can be used to specify relations between media
streams. This mechanism is used to signal the association between
the SDP concept of "m-line" and the WebRTC concept of
"MediaStream" / "MediaStreamTrack" using SDP signaling. This
document is a work item of the MMUSIC WG, whose discussion list is
mmusic@ietf.org.
Security Considerations for WebRTC
The Real-Time Communications on the Web (RTCWEB) working group is tasked with standardizing protocols for real-time communications between Web browsers, generally called "WebRTC". The major use cases for WebRTC technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems (e.g., SIP-based soft phones) WebRTC communications are directly controlled by a Web server, which poses new security challenges. For instance, a Web browser might expose a JavaScript API which allows a server to place a video call. Unrestricted access to such an API would allow any site which a user visited to "bug" a user's computer, capturing any activity which passed in front of their camera. This document defines the WebRTC threat model and analyzes the security threats of WebRTC in that model.
WebRTC Security Architecture
The Real-Time Communications on the Web (RTCWEB) working group is tasked with standardizing protocols for enabling real-time communications within user-agents using web technologies (commonly called "WebRTC"). This document defines the security architecture for WebRTC.
Guidelines for Writing RFC Text on Security Considerations
All RFCs are required to have a Security Considerations section. Historically, such sections have been relatively weak. This document provides guidelines to RFC authors on how to write a good Security Considerations section. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
Key words for use in RFCs to Indicate Requirement Levels
An Offer/Answer Model with Session Description Protocol (SDP)
SDP: Session Description Protocol
Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)
The Session Description Protocol (SDP) is used to describe the parameters of media streams used in multimedia sessions. When a session requires multiple ports, SDP assumes that these ports have consecutive numbers. However, when the session crosses a network address translation device that also uses port mapping, the ordering of ports can be destroyed by the translation. To handle this, we propose an extension attribute to SDP.
WebRTC Forward Error Correction Requirements
This document makes recommendations for how Forward Error
Correction (FEC) should be used by WebRTC applications.
Datagram Transport Layer Security Version 1.2
This document specifies version 1.2 of the Datagram Transport Layer Security (DTLS) protocol. The DTLS protocol provides communications privacy for datagram protocols. The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The DTLS protocol is based on the Transport Layer Security (TLS) protocol and provides equivalent security guarantees. Datagram semantics of the underlying transport are preserved by the DTLS protocol. This document updates DTLS 1.0 to work with TLS version 1.2. [STANDARDS-TRACK]
Trickle ICE: Incremental Provisioning of Candidates for the
Interactive Connectivity Establishment (ICE) Protocol
This document describes an extension to the Interactive
Connectivity Establishment (ICE) protocol that allows ICE agents
to send and receive candidates incrementally rather than
exchanging complete lists. With such incremental provisioning,
ICE agents can begin connectivity checks while they are still
gathering candidates and considerably shorten the time necessary
for ICE processing to complete. This mechanism is called "trickle
ICE".
Negotiation of Generic Image Attributes in the Session Description Protocol (SDP)
This document proposes a new generic session setup
attribute to make it possible to negotiate different image
attributes such as image size. A possible use case is to make
it possible for a low-end hand-held terminal to display
video without the need to rescale the image, something that
may consume large amounts of memory and processing power. The
document also helps to maintain an optimal bitrate for video
as only the image size that is desired by the receiver is
transmitted. [STANDARDS-TRACK]
Session Description Protocol (SDP) Bandwidth Modifiers for
RTP Control Protocol (RTCP) Bandwidth
This document defines an extension to the Session Description
Protocol (SDP) to specify two additional modifiers for the
bandwidth attribute. These modifiers may be used to specify the
bandwidth allowed for RTP Control Protocol (RTCP) packets in a
Real-time Transport Protocol (RTP) session. [STANDARDS-TRACK]
Source-Specific Media Attributes in the Session Description
Protocol (SDP)
The Session Description Protocol (SDP) provides mechanisms to
describe attributes of multimedia sessions and of individual media
streams (e.g., Real-time Transport Protocol (RTP) sessions) within
a multimedia session, but does not provide any mechanism to
describe individual media sources within a media stream. This
document defines a mechanism to describe RTP media sources, which
are identified by their synchronization source (SSRC) identifiers,
in SDP, to associate attributes with these sources, and to express
relationships among sources. It also defines several source-level
attributes that can be used to describe properties of media
sources. [STANDARDS-TRACK]
Forward Error Correction Grouping Semantics in the Session
Description Protocol
This document defines the semantics for grouping the associated source
and FEC-based (Forward Error Correction) repair flows in the Session
Description Protocol (SDP). The semantics defined in this document
are to be used with the SDP Grouping Framework (RFC 5888). These
semantics allow the description of grouping relationships between the
source and repair flows when one or more source and/or repair flows
are associated in the same group, and they provide support for
additive repair flows. SSRC-level (Synchronization Source) grouping
semantics are also defined in this document for Real-time Transport
Protocol (RTP) streams using SSRC multiplexing. [STANDARDS-TRACK]
Support for Reduced-Size Real-Time Transport Control Protocol
(RTCP): Opportunities and Consequences
This memo discusses benefits and issues that arise when
allowing Real-time Transport Protocol (RTCP) packets to be
transmitted with reduced size. The size can be reduced if the
rules on how to create compound packets outlined in RFC 3550 are
removed or changed. Based on that analysis, this memo defines
certain changes to the rules to allow feedback messages to be sent
as Reduced-Size RTCP packets under certain conditions when using
the RTP/AVPF (Real-time Transport Protocol / Audio-Visual Profile
with Feedback) profile (RFC 4585). This document updates RFC 3550,
RFC 3711, and RFC 4585. [STANDARDS-TRACK]
A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication
Early Media and Ringing Tone Generation in the Session
Initiation Protocol (SIP)
This document describes how to manage early media in the
Session Initiation Protocol (SIP) using two models: the gateway
model and the application server model. It also describes the
inputs one needs to consider in defining local policies for
ringing tone generation. This memo provides information for the
Internet community.
RTP Retransmission Payload Format
RTP retransmission is an effective packet loss recovery
technique for real-time applications with relaxed delay bounds.
This document describes an RTP payload format for performing
retransmissions. Retransmitted RTP packets are sent in a separate
stream from the original RTP stream. It is assumed that feedback
from receivers to senders is available. In particular, it is
assumed that Real-time Transport Control Protocol (RTCP) feedback
as defined in the extended RTP profile for RTCP-based feedback
(denoted RTP/AVPF) is available in this memo.
[STANDARDS-TRACK]
Real-time Transport Protocol (RTP) Payload for Comfort Noise
(CN)
SDP for the WebRTC
The Web Real-Time Communication (WebRTC) [WEBRTC] working group
is charged to provide protocol support for direct interactive rich
communication using audio, video and data between two peers' web
browsers. Within the WebRTC framework, Session Description
Protocol (SDP) [RFC4566] is used for negotiating session
capabilities between the peers. Such a negotiation happens based
on the SDP Offer/Answer exchange mechanism described in RFC
3264 [RFC3264]. This document serves an introductory purpose in
describing the role of SDP for the most common WebRTC use-cases.
The SDP examples provided in this document are still a work in
progress, but aim to align as closely as possible with the evolving standards.
Framework for Establishing a Secure Real-time Transport
Protocol (SRTP) Security Context Using Datagram Transport Layer
Security (DTLS)
This document specifies how to use the Session Initiation
Protocol (SIP) to establish a Secure Real-time Transport Protocol
(SRTP) security context using the Datagram Transport Layer
Security (DTLS) protocol. It describes a mechanism of transporting
a fingerprint attribute in the Session Description Protocol (SDP)
that identifies the key that will be presented during the DTLS
handshake. The key exchange travels along the media path as
opposed to the signaling path. The SIP Identity mechanism can be
used to protect the integrity of the fingerprint attribute from
modification by intermediate proxies. [STANDARDS-TRACK]
Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)
This document describes a Datagram Transport Layer Security (DTLS) extension to establish keys for Secure RTP (SRTP) and Secure RTP Control Protocol (SRTCP) flows. DTLS keying happens on the media path, independent of any out-of-band signalling channel present. [STANDARDS-TRACK]
Session Description Protocol (SDP) Security Descriptions for
Media Streams
WebRTC 1.0: Real-time Communication Between Browsers
WebRTC IP Address Handling Recommendations
Note: This section will be removed by RFC Editor before publication.
Changes in draft-13:
Clarified which SDP lines can be ignored.
Clarified how to handle various received attributes.
Revised how attributes should be generated for bundled m= lines.
Remove unused references.
Remove text advocating use of unilateral PTs.
Trigger an ICE restart even if the ICE candidate policy is being made more strict.
Remove the 'public' ICE candidate policy.
Move open issues/TODOs into GitHub issues.
Split local/remote description accessors into current/pending.
Clarify a=imageattr handling.
Add more detail on VoiceActivityDetection handling.
Reference draft-shieh-rtcweb-ip-handling.
Make it clear when an ICE restart should occur.
Resolve reference TODOs.
Remove MSID semantics.
ice-options are now at session level.
Default RTCP mux policy is now 'require'.
Changes in draft-12:
Filled in sections on applying local and remote descriptions.
Discussed downscaling and upscaling to fulfill imageattr requirements.
Updated what SDP can be modified by the application.
Updated to latest datachannel SDP.
Allowed multiple fingerprint lines.
Switched back to IPv4 for dummy candidates.
Added additional clarity on ICE default candidates.
Changes in draft-11:
Clarified handling of RTP CNAMEs.
Updated what SDP lines should be processed or ignored.
Specified how a=imageattr should be used.
Changes in draft-10:
TODO
Changes in draft-09:
Don't return null for {local,remote}Description after close().
Changed TCP/TLS to UDP/DTLS in RTP profile names.
Separate out bundle and mux policy.
Added specific references to FEC mechanisms.
Added canTrickle mechanism.
Added section on subsequent answers and answer options.
Added text defining set{Local,Remote}Description behavior.
Changes in draft-08:
Added new example section and removed old examples in appendix.
Fixed <proto> field handling.
Added text describing a=rtcp attribute.
Reworked handling of OfferToReceiveAudio and OfferToReceiveVideo per discussion at IETF 90.
Reworked trickle ICE handling and its impact on m= and c= lines per discussion at interim.
Added max-bundle-and-rtcp-mux policy.
Added description of maxptime handling.
Updated ICE candidate pool default to 0.
Resolved open issues around AppID/receiver-ID.
Reworked and expanded how changes to the ICE configuration are handled.
Some reference updates.
Editorial clarification.
Changes in draft-07:
Expanded discussion of VAD and Opus DTX.
Added a security considerations section.
Rewrote the section on modifying SDP to require implementations to clearly indicate whether any given modification is allowed.
Clarified impact of IceRestart on CreateOffer in local-offer state.
Guidance on whether attributes should be defined at the media level or the session level.
Renamed "default" bundle policy to "balanced".
Removed default ICE candidate pool size and clarified how it works.
Defined a canonical order for assignment of MSTs to m= lines.
Removed discussion of rehydration.
Added Eric Rescorla as a draft editor.
Cleaned up references.
Editorial cleanup.
Changes in draft-06:
Reworked handling of m= line recycling.
Added handling of BUNDLE and bundle-only.
Clarified handling of rollback.
Added text describing the ICE Candidate Pool and its behavior.
Allowed OfferToReceiveX to create multiple recvonly m= sections.
Changes in draft-05:
Fixed several issues identified in the createOffer/Answer sections during document review.
Updated references.
Changes in draft-04:
Filled in sections on createOffer and createAnswer.
Added SDP examples.
Fixed references.
Changes in draft-03:
Added text describing relationship to W3C specification.
Changes in draft-02:
Converted from nroff.
Removed comparisons to old approaches abandoned by the working group.
Removed stuff that has moved to W3C specification.
Align SDP handling with W3C draft.
Clarified section on forking.
Changes in draft-01:
Added diagrams for architecture and state machine.
Added sections on forking and rehydration.
Clarified meaning of "pranswer" and "answer".
Reworked how ICE restarts and media directions are controlled.
Added list of parameters that can be changed in a description.
Updated suggested API and examples to match latest thinking.
Suggested API and examples have been moved to an appendix.
Changes in draft-00:
Migrated from draft-uberti-rtcweb-jsep-02.