Javascript Session Establishment Protocol

Google, 747 6th Ave S, Kirkland, WA 98033, USA, justin@uberti.name
Cisco, 170 West Tasman Drive, San Jose, CA 95134, USA, fluffy@iii.ca
Mozilla, 331 Evelyn Ave, Mountain View, CA 94041, USA, ekr@rtfm.com
RAI
This document describes the mechanisms for allowing a Javascript
application to control the signaling plane of a multimedia session via
the interface specified in the W3C RTCPeerConnection API, and discusses
how this relates to existing signaling protocols.

This document describes how the W3C WebRTC RTCPeerConnection
interface is used to control the setup, management, and teardown
of a multimedia session.

The thinking behind WebRTC call setup has been to fully specify and
control the media plane, but to leave the signaling plane up to the
application as much as possible. The rationale is that different
applications may prefer to use different protocols, such as the
existing SIP or Jingle call signaling protocols, or something custom
to the particular application, perhaps for a novel use case. In this
approach, the key information that needs to be exchanged is the
multimedia session description, which specifies the transport and
media configuration information necessary to establish the media
plane.

With these considerations in mind, this document describes the
Javascript Session Establishment Protocol (JSEP) that allows for full
control of the signaling state machine from Javascript.
JSEP removes the browser almost entirely from the core signaling flow, which
is instead handled by the Javascript making use of two interfaces:
(1) passing in local and remote session descriptions and (2) interacting
with the ICE state machine.
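As a rough illustration of the first of these interfaces, the following sketch walks through the offer/answer sequence described in the next paragraphs. It does not use the real W3C RTCPeerConnection, whose methods are asynchronous and produce real SDP; `MockEndpoint`, its placeholder SDP strings, and the `signal` function (standing in for the application's own channel, such as WebSockets) are hypothetical and kept synchronous for brevity.

```typescript
// Hypothetical, synchronous stand-in for a JSEP endpoint; not the W3C API.
type SdpType = "offer" | "pranswer" | "answer";

interface SessionDescription {
  type: SdpType;
  sdp: string;
}

class MockEndpoint {
  localDescription: SessionDescription | null = null;
  remoteDescription: SessionDescription | null = null;

  constructor(readonly name: string) {}

  createOffer(): SessionDescription {
    // Placeholder SDP; a browser would describe its media and transport here.
    return { type: "offer", sdp: `v=0 (offer from ${this.name})` };
  }

  createAnswer(): SessionDescription {
    // An answer can only be generated once a remote offer has been installed.
    if (this.remoteDescription?.type !== "offer") {
      throw new Error("createAnswer requires a remote offer");
    }
    return { type: "answer", sdp: `v=0 (answer from ${this.name})` };
  }

  setLocalDescription(desc: SessionDescription): void {
    this.localDescription = desc;
  }

  setRemoteDescription(desc: SessionDescription): void {
    this.remoteDescription = desc;
  }
}

// The application, not the browser, carries descriptions between peers.
// A JSON round-trip stands in for the application's signaling channel.
function signal(desc: SessionDescription): SessionDescription {
  return JSON.parse(JSON.stringify(desc)) as SessionDescription;
}

const caller = new MockEndpoint("caller");
const callee = new MockEndpoint("callee");

// Initiator: create an offer, apply it locally, send it.
const offer = caller.createOffer();
caller.setLocalDescription(offer);
callee.setRemoteDescription(signal(offer));

// Remote party: create an answer, apply it locally, send it back.
const answer = callee.createAnswer();
callee.setLocalDescription(answer);
caller.setRemoteDescription(signal(answer));
```

The point of the sketch is the division of labor: every transfer between the two endpoints goes through application code (`signal`), which is where a real application would serialize into SIP, Jingle, or a custom protocol.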
In this document, the use of JSEP is described as if it always
occurs between two browsers. Note, though, that in many cases it will
actually be between a browser and some kind of server, such as a
gateway or MCU. This distinction is invisible to the browser; it just
follows the instructions it is given via the API.

JSEP's handling of session descriptions is simple and
straightforward. Whenever an offer/answer exchange is needed, the
initiating side creates an offer by calling a createOffer() API. The
application optionally modifies that offer, and then uses it to set up
its local config via the setLocalDescription() API. The offer is then
sent off to the remote side over the application's preferred signaling
mechanism (e.g., WebSockets); upon receipt of that offer, the remote
party installs it using the setRemoteDescription() API.

To complete the offer/answer exchange, the remote party uses the createAnswer() API
to generate an appropriate answer, applies it using the
setLocalDescription() API, and sends the answer back to the initiator over
the signaling channel. When the initiator gets that answer, it installs
it using the setRemoteDescription() API, and initial setup is complete. This
process can be repeated for additional offer/answer exchanges.

Regarding ICE , JSEP decouples the
ICE state machine from the overall signaling state machine. The ICE
state machine must remain in the browser, because only the browser has
the necessary knowledge of candidates and other transport information.
Performing this separation also provides additional flexibility; in
protocols that decouple session descriptions from transport, such as
Jingle, the session description can be sent immediately and the
transport information can be sent when available. In protocols
that don't, such as SIP, the information can be used in the aggregated
form. Sending transport information separately can allow for faster
ICE and DTLS startup, since ICE checks can start as soon as any
transport information is available rather than waiting for all of
it.

Through its abstraction of signaling, the JSEP approach does
require the application to be aware of the signaling process. While
the application does not need to understand the contents of session
descriptions to set up a call, the application must call the right
APIs at the right times, convert the session descriptions and ICE
information into the defined messages of its chosen signaling
protocol, and perform the reverse conversion on the messages it
receives from the other side.

One way to mitigate this is to provide a Javascript library that
hides this complexity from the developer; said library would implement
a given signaling protocol along with its state machine and
serialization code, presenting a higher level call-oriented interface
to the application developer. For example, libraries exist to
adapt the JSEP API into an API suitable for SIP or XMPP.
Thus, JSEP provides greater control
for the experienced developer without forcing any additional
complexity on the novice developer.

One approach that was considered instead of JSEP was to include a
lightweight signaling protocol. Instead of providing session
descriptions to the API, the API would produce and consume messages
from this protocol. While this would have provided a higher-level API,
it put more control of signaling within the browser, forcing the
browser to understand and handle concepts like signaling glare. In
addition, it prevented the application from driving the state machine
to a desired state, as is needed in the page reload case.

A second approach that was considered but not chosen was to
decouple the management of the media control objects from session
descriptions, instead offering APIs that would control each component
directly. This was rejected based on a feeling that requiring exposure
of this level of complexity to the application programmer would not be
beneficial; it would result in an API where even a simple example
would require a significant amount of code to orchestrate all the
needed interactions, as well as creating a large API surface that
needed to be agreed upon and documented. In addition, these API points
could be called in any order, resulting in a more complex set of
interactions with the media subsystem than the JSEP approach, which
specifies how session descriptions are to be evaluated and
applied.

One variation on JSEP that was considered was to keep the basic
session description-oriented API, but to move the mechanism for
generating offers and answers out of the browser. Instead of providing
createOffer/createAnswer methods within the browser, this approach
would instead expose a getCapabilities API which would provide the
application with the information it needed in order to generate its
own session descriptions. This increases the amount of work that the
application needs to do; it needs to know how to generate session
descriptions from capabilities, and especially how to generate the
correct answer from an arbitrary offer and the supported capabilities.
While this could certainly be addressed by using a library like the
one mentioned above, it basically forces the use of said library even
for a simple example. Providing createOffer/createAnswer avoids this
problem, but still allows applications to generate their own
offers/answers (to a large extent) if they choose, using the
description generated by createOffer as an indication of the browser's
capabilities.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in .

JSEP does not specify a particular signaling model or state
machine, other than the generic need to exchange SDP media
descriptions in the fashion described by (offer/answer) in order for both sides of the
session to know how to conduct the session. JSEP provides mechanisms
to create offers and answers, as well as to apply them to a session.
However, the browser is totally decoupled from the actual mechanism by
which these offers and answers are communicated to the remote side,
including addressing, retransmission, forking, and glare handling.
These issues are left entirely up to the application; the application
has complete control over which offers and answers get handed to the
browser, and when.

In order to establish the media plane, the user agent needs
specific parameters to indicate what to transmit to the remote side,
as well as how to handle the media that is received. These parameters
are determined by the exchange of session descriptions in offers and
answers, and there are certain details to this process that must be
handled in the JSEP APIs.

Whether a session description applies to the local side or the
remote side affects the meaning of that description. For example, the
list of codecs sent to a remote party indicates what the local side is
willing to receive, which, when intersected with the set of codecs the
remote side supports, specifies what the remote side should send.
However, not all parameters follow this rule; for example, the DTLS-SRTP
parameters sent to a remote party
indicate what certificate the local side will use in DTLS setup, and
thereby what the remote party should expect to receive; the remote party
will have to accept these parameters, with no option to choose different
values.
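The asymmetry between codecs and DTLS parameters can be made concrete with a small sketch. It assumes codecs are plain name strings (real SDP uses payload types with rtpmap/fmtp lines) and reduces a description to two fields; `MediaDescription`, `codecsToSend`, and the fingerprint value are illustrative placeholders, not part of any API.

```typescript
// Illustrative only: a description reduced to the two fields this example needs.
interface MediaDescription {
  codecs: string[];    // codecs the sender of this description can receive
  fingerprint: string; // certificate fingerprint the sender will use in DTLS
}

// Codecs we may send = what the remote side said it can receive,
// intersected with what we support. Names are compared case-insensitively
// and the remote side's preference order is kept.
function codecsToSend(remote: MediaDescription, localSupported: string[]): string[] {
  const supported = new Set(localSupported.map((c) => c.toLowerCase()));
  return remote.codecs.filter((c) => supported.has(c.toLowerCase()));
}

// DTLS parameters are not negotiated: the remote fingerprint is taken as-is
// and only used to check the certificate presented during DTLS setup.
function expectedRemoteFingerprint(remote: MediaDescription): string {
  return remote.fingerprint;
}

const remoteAudio: MediaDescription = {
  codecs: ["opus", "G722", "PCMU"],
  fingerprint: "sha-256 AB:CD:EF", // placeholder, not a real fingerprint
};
const sendable = codecsToSend(remoteAudio, ["PCMU", "opus"]);
// sendable is ["opus", "PCMU"]
```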
In addition, various RFCs put different conditions on the format of
offers versus answers. For example, an offer may propose an arbitrary
number of media streams (i.e., m= sections), but an answer must contain
exactly the same number as the offer.

Lastly, while the exact media parameters are known only after
an offer and an answer have been exchanged, it is possible for the
offerer to receive media after they have sent an offer and before they
have received an answer. To properly process incoming media in this
case, the offerer's media handler must be aware of the details of the
offer before the answer arrives.

Therefore, in order to handle session descriptions properly, the
user agent needs:

o  To know if a session description pertains to the local or
   remote side.

o  To know if a session description is an offer or an answer.

o  To allow the offer to be specified independently of the
   answer.

JSEP addresses this by adding both setLocalDescription and
setRemoteDescription methods and having session description objects
contain a type field indicating the type of session description being
supplied. This satisfies the requirements listed above for both the
offerer, who first calls setLocalDescription(sdp [offer]) and then
later setRemoteDescription(sdp [answer]), as well as for the answerer,
who first calls setRemoteDescription(sdp [offer]) and then later
setLocalDescription(sdp [answer]).

JSEP also allows for an answer to be treated as provisional by the
application. Provisional answers provide a way for an answerer to
communicate initial session parameters back to the offerer, in order
to allow the session to begin, while allowing a final answer to be
specified later. This concept of a final answer is important to the
offer/answer model; when such an answer is received, any extra
resources allocated by the caller can be released, now that the exact
session configuration is known. These "resources" can include things
like extra ICE components, TURN candidates, or video decoders.
Provisional answers, on the other hand, trigger no such deallocation;
as a result, multiple dissimilar provisional answers can be
received and applied during call setup.

In , the constraint at the signaling
level is that only one offer can be outstanding for a given session,
but at the media stack level, a new offer can be generated at any
point. For example, when using SIP for signaling, if one offer is
sent, then cancelled using a SIP CANCEL, another offer can be
generated even though no answer was received for the first offer. To
support this, the JSEP media layer can provide an offer via the
createOffer() method whenever the
Javascript application needs one for the signaling.
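The rules for offers, provisional answers, and final answers in this section, together with the rules for applying each session description type, can be captured as a small state machine. The sketch below is a model rather than browser code; the state names are borrowed from the W3C RTCSignalingState values, and `SignalingModel` is a hypothetical class.

```typescript
// A hypothetical model of the offer/answer state machine, not browser code.
// State names follow the W3C RTCSignalingState values.
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer";
type SdpType = "offer" | "pranswer" | "answer";
type Side = "local" | "remote";

// transitions[state]["side:type"] gives the next state; a missing entry
// means that description may not be applied in that state.
const transitions: Record<SignalingState, Record<string, SignalingState | undefined>> = {
  stable: {
    "local:offer": "have-local-offer",
    "remote:offer": "have-remote-offer",
  },
  "have-local-offer": {
    "local:offer": "have-local-offer", // update to an unanswered offer
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
  },
  "have-remote-pranswer": {
    "remote:pranswer": "have-remote-pranswer", // another provisional answer
    "remote:answer": "stable", // final answer: unused resources can be freed
  },
  "have-remote-offer": {
    "remote:offer": "have-remote-offer",
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
  "have-local-pranswer": {
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
};

class SignalingModel {
  state: SignalingState = "stable";

  // "local" models setLocalDescription, "remote" models setRemoteDescription.
  apply(side: Side, type: SdpType): SignalingState {
    const next = transitions[this.state][`${side}:${type}`];
    if (next === undefined) {
      throw new Error(`cannot apply ${side} ${type} in state ${this.state}`);
    }
    this.state = next;
    return next;
  }
}

// Offerer receiving two provisional answers and then a final one.
const offerer = new SignalingModel();
offerer.apply("local", "offer");
offerer.apply("remote", "pranswer");
offerer.apply("remote", "pranswer");
offerer.apply("remote", "answer"); // back to "stable"
```

Encoding the transitions as data keeps the "zero or more provisional answers, then one final answer" rule visible in one place, and any description applied out of order is rejected rather than silently accepted.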
The answerer can send back zero or more provisional answers, and finally end the
offer-answer exchange by sending a final answer. The state machine for
this is as follows:

Aside from these state transitions there is no other difference
between the handling of provisional ("pranswer") and final ("answer")
answers.

In the WebRTC specification, session descriptions are formatted as
SDP messages. While this format is not optimal for manipulation from
Javascript, it is widely accepted, and frequently updated with new
features. Any alternate encoding of session descriptions would have to
keep pace with the changes to SDP, at least until the time that this
new encoding eclipsed SDP in popularity. As a result, JSEP currently
uses SDP as the internal representation for its session
descriptions.

However, to simplify Javascript processing, and provide for future
flexibility, the SDP syntax is encapsulated within a
SessionDescription object, which can be constructed from SDP, and be
serialized out to SDP. If future specifications agree on a JSON format
for session descriptions, we could easily enable this object to
generate and consume that JSON.

Other methods may be added to SessionDescription in the future to
simplify handling of SessionDescriptions from Javascript. In the
meantime, Javascript libraries can be used to perform these
manipulations.

Note that most applications should be able to treat the
SessionDescriptions produced and consumed by these various API calls
as opaque blobs; that is, the application will not need to read or
change them. The W3C WebRTC API specification will provide appropriate
APIs to allow the application to control various session parameters,
which will provide the necessary information to the browser about what
sort of SessionDescription to produce.

JSEP gathers ICE candidates as needed by the application.
Collection of ICE candidates is referred to as a gathering phase,
and this is triggered either by the addition of a new or recycled m=
line to the local session description, or new ICE credentials in the
description, indicating an ICE restart. Use of new ICE credentials can
be triggered explicitly by the application, or implicitly by the
browser in response to changes in the ICE configuration.
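The triggers for a gathering phase can be sketched as a comparison between the previous and new local descriptions. In this hypothetical model each m= section is reduced to its MID and ICE ufrag/password, and two simplifications are assumed: a recycled m= line shows up as a different MID at the same index, and an ICE restart shows up as changed credentials. Sections that trigger nothing keep their existing candidates.

```typescript
// Hypothetical reduced view of one m= section in a local description.
interface MSection {
  mid: string;   // a=mid value
  ufrag: string; // a=ice-ufrag value
  pwd: string;   // a=ice-pwd value
}

// Returns the indices of m= sections that need a new gathering phase.
function sectionsNeedingGathering(prev: MSection[], next: MSection[]): number[] {
  const result: number[] = [];
  next.forEach((section, i) => {
    const old = prev[i];
    const isNew = old === undefined;
    // Assumption: a recycled m= line carries a different MID at the same index.
    const isRecycled = !isNew && old.mid !== section.mid;
    // New ICE credentials indicate an ICE restart.
    const isRestart = !isNew && (old.ufrag !== section.ufrag || old.pwd !== section.pwd);
    if (isNew || isRecycled || isRestart) result.push(i);
  });
  return result;
}

const before: MSection[] = [
  { mid: "0", ufrag: "u1", pwd: "p1" },
  { mid: "1", ufrag: "u1", pwd: "p1" },
];
const after: MSection[] = [
  { mid: "0", ufrag: "u1", pwd: "p1" }, // unchanged: keeps its candidates
  { mid: "1", ufrag: "u2", pwd: "p2" }, // new credentials: ICE restart
  { mid: "2", ufrag: "u1", pwd: "p1" }, // new m= line
];
const toGather = sectionsNeedingGathering(before, after); // [1, 2]
```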
When a new
gathering phase starts, the ICE Agent will notify the application that
gathering is occurring through a callback. Then, when each new ICE
candidate becomes available, the ICE Agent will supply it to the
application via an additional callback; these candidates will also
automatically be added to the local session description. Finally, when
all candidates have been gathered, a callback will be dispatched
to signal that the gathering process is complete.

Note that gathering phases only gather the candidates needed by
new/recycled/restarting m= lines; other m= lines continue to use their
existing candidates.

Candidate trickling is a technique through which a caller may
incrementally provide candidates to the callee after the initial
offer has been dispatched; the semantics of "Trickle ICE" are
defined in . This
process allows the callee to begin acting upon the call and setting
up the ICE (and perhaps DTLS) connections immediately, without
having to wait for the caller to gather all possible candidates.
This results in faster media setup in cases where gathering is not
performed prior to initiating the call.

JSEP supports optional candidate trickling by providing APIs, as
described above, that
provide control and feedback on the ICE candidate gathering process.
Applications that support candidate trickling can send the initial
offer immediately and send individual candidates when they are
notified of a new candidate; applications that do not support this
feature can simply wait for the indication that gathering is
complete, and then create and send their offer, with all the
candidates, at that time.

Upon receipt of trickled candidates, the receiving application
will supply them to its ICE Agent. This triggers the ICE Agent to
start using the new remote candidates for connectivity checks.

As with session descriptions, the syntax of the IceCandidate
object provides some abstraction, but can be easily converted to
and from the SDP candidate lines.

The candidate lines are the only SDP information that is
contained within IceCandidate, as they represent the only
information needed that is not present in the initial offer (i.e.,
for trickle candidates). This information is carried with the same
syntax as the "candidate-attribute" field defined for ICE. For
example:

The IceCandidate object also contains fields to
indicate which m= line it should be associated with. The m= line can be
identified in one of two ways: either by an m= line index or by a MID.
The m= line index is a zero-based index, with index N referring to the
(N+1)th m= line in the SDP sent by the entity which sent the IceCandidate.
The MID uses the "media stream identification" attribute, as defined in
, Section 4, to identify the m= line.
JSEP implementations creating an ICE Candidate object MUST populate
both of these fields. Implementations receiving an ICE Candidate
object MUST use the MID if present, or the m= line index, if not
(as it could have come from a non-JSEP endpoint).

Typically, when gathering ICE candidates, the browser will gather
all possible forms of initial candidates: host, server reflexive, and
relay. However, in certain cases, applications may want to have more
specific control over the gathering process, due to privacy or related
concerns. For example, one may want to suppress the use of host
candidates, to avoid exposing information about the local network, or
go as far as only using relay candidates, to leak as little
location information as possible (note that these choices come with
corresponding operational costs). To accomplish this, the browser
MUST allow the application to restrict which ICE candidates are used
in a session. In addition, administrators may also wish to control
the set of ICE candidates, and so the browser SHOULD also allow
control via local policy, with the most restrictive policy prevailing.
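Returning to the IceCandidate object described earlier, the sketch below parses the fixed leading fields of a "candidate-attribute" value, including the candidate type (host, srflx, relay) that gathering restrictions act on, and resolves the m= line a candidate belongs to, preferring the MID and falling back to the m= line index. The `IceCandidateInit` shape and helper names are assumptions for illustration, and the sample line uses documentation and private addresses rather than real ones.

```typescript
// Hypothetical shape of an IceCandidate as exchanged between applications.
interface IceCandidateInit {
  candidate: string;      // candidate-attribute value ("candidate:...")
  sdpMid?: string;        // MID of the associated m= line, if known
  sdpMLineIndex?: number; // zero-based m= line index
}

interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string; // "host", "srflx", "prflx" or "relay"
}

// Layout: foundation component transport priority address port "typ" type ...
function parseCandidate(line: string): ParsedCandidate {
  const fields = line.replace(/^(a=)?candidate:/, "").trim().split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") {
    throw new Error(`malformed candidate: ${line}`);
  }
  return {
    foundation: fields[0],
    component: Number(fields[1]),
    transport: fields[2],
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type: fields[7],
  };
}

// Receivers use the MID if present, else the m= line index, since the
// candidate may have come from an endpoint that only sets one of them.
function resolveMLine(c: IceCandidateInit, mids: string[]): number {
  if (c.sdpMid !== undefined) {
    const index = mids.indexOf(c.sdpMid);
    if (index < 0) throw new Error(`unknown MID ${c.sdpMid}`);
    return index;
  }
  if (c.sdpMLineIndex !== undefined && c.sdpMLineIndex < mids.length) {
    return c.sdpMLineIndex;
  }
  throw new Error("candidate cannot be associated with an m= line");
}

const cand: IceCandidateInit = {
  candidate: "candidate:1 1 UDP 1694498815 192.0.2.33 10000 typ srflx raddr 10.0.1.1 rport 8998",
  sdpMLineIndex: 1,
};
const parsed = parseCandidate(cand.candidate);
```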
There may also be cases where the application wants to change which
types of candidates are used while the session is active. A prime
example is where a callee may initially want to use only relay
candidates, to avoid leaking location information to an arbitrary
caller, but then change to use all candidates (for lower operational
cost) once the user has indicated they want to take the call. For this
scenario, the browser MUST allow the candidate policy to be changed
in mid-session, subject to the aforementioned interactions with local
policy.

To administer the ICE candidate policy, the browser will
determine the current setting at the start of each gathering phase.
Then, during the gathering phase, the browser MUST NOT
expose candidates disallowed by the current policy to the
application, use them as the source of connectivity checks, or
indirectly expose them via other fields, such as the raddr/rport
attributes for other ICE candidates.
Later, if a different policy is specified by the application, the
application can apply it by kicking off a new gathering phase via an
ICE restart.
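The policy rules above (the most restrictive of application and local policy wins, the policy is read at the start of each gathering phase, and a new policy only takes effect through an ICE restart) can be modeled as follows. Policies are represented here as sets of allowed candidate types, and `IceAgentModel` is a hypothetical class; it only models what is exposed to the application, not which candidates are used for connectivity checks.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface Candidate {
  type: CandidateType;
  address: string;
  relatedAddress?: string; // raddr, which can reveal another candidate
}

// The most restrictive policy prevails: a type survives only if both the
// application policy and the local (administrator) policy allow it.
function effectivePolicy(app: Set<CandidateType>, local: Set<CandidateType>): Set<CandidateType> {
  return new Set(Array.from(app).filter((t) => local.has(t)));
}

class IceAgentModel {
  private phasePolicy: Set<CandidateType>;

  constructor(
    private appPolicy: Set<CandidateType>,
    private readonly localPolicy: Set<CandidateType>,
  ) {
    // The policy in force is determined at the start of a gathering phase.
    this.phasePolicy = effectivePolicy(appPolicy, localPolicy);
  }

  // Recorded for the next gathering phase; the current phase is unaffected.
  setPolicy(policy: Set<CandidateType>): void {
    this.appPolicy = policy;
  }

  // An ICE restart starts a new gathering phase under the latest policy.
  restartIce(): void {
    this.phasePolicy = effectivePolicy(this.appPolicy, this.localPolicy);
  }

  // Candidates exposed to the application in the current phase.
  surface(gathered: Candidate[]): Candidate[] {
    // A srflx raddr is a host address and a relay raddr is a server-reflexive
    // address, so related addresses are kept only if both types are allowed.
    const keepRelated = this.phasePolicy.has("host") && this.phasePolicy.has("srflx");
    return gathered
      .filter((c) => this.phasePolicy.has(c.type))
      .map((c) => (keepRelated ? c : { type: c.type, address: c.address }));
  }
}

const agent = new IceAgentModel(
  new Set<CandidateType>(["host", "srflx", "relay"]),
  new Set<CandidateType>(["srflx", "relay"]), // local policy forbids host candidates
);
const gathered: Candidate[] = [
  { type: "host", address: "10.0.0.5" },
  { type: "srflx", address: "198.51.100.7", relatedAddress: "10.0.0.5" },
  { type: "relay", address: "203.0.113.9", relatedAddress: "198.51.100.7" },
];
```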
JSEP applications typically inform the browser to begin ICE
gathering via the information supplied to setLocalDescription, as this
is where the app specifies the number of media streams, and thereby
ICE components, for which to gather candidates. However, to accelerate
cases where the application knows the number of ICE components to use
ahead of time, it may ask the browser to gather a pool of potential
ICE candidates to help ensure rapid media setup.
When setLocalDescription is eventually called, and the
browser goes to gather the needed ICE candidates, it SHOULD start by
checking if any candidates are available in the pool. If there are
candidates in the pool, they SHOULD be handed to the application
immediately via the ICE candidate callback. If the pool becomes
depleted, either because a larger-than-expected number of ICE
components is used, or because the pool has not had enough time to
gather candidates, the remaining candidates are gathered as usual.
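The pool behavior described above amounts to "take from the pool first, gather the remainder". A minimal sketch, assuming candidates are pre-gathered per ICE component and modeled as opaque strings; `CandidatePool` and the callback parameters are hypothetical names, and the case where pooled gathering is still in progress is not modeled.

```typescript
// Hypothetical model: each pool entry holds the candidates pre-gathered for
// one ICE component; strings stand in for real candidate objects.
type ComponentCandidates = string[];

class CandidatePool {
  private entries: ComponentCandidates[] = [];

  // Pre-gather for `size` components; a size of 0 gathers nothing.
  constructor(size: number, gather: (component: number) => ComponentCandidates) {
    for (let i = 0; i < size; i++) this.entries.push(gather(i));
  }

  // Called once setLocalDescription determines how many components are needed.
  // Pooled candidates are handed out first; any shortfall is gathered as usual.
  take(
    needed: number,
    gatherFresh: (component: number) => ComponentCandidates,
    onCandidate: (component: number, candidate: string) => void,
  ): void {
    for (let component = 0; component < needed; component++) {
      const pooled = this.entries.shift();
      const candidates = pooled ?? gatherFresh(component);
      candidates.forEach((c) => onCandidate(component, c));
    }
  }

  get remaining(): number {
    return this.entries.length;
  }
}

// Two components pre-gathered, three needed: the third is gathered on demand.
const freshlyGathered: number[] = [];
const seen: string[] = [];
const pool = new CandidatePool(2, (i) => [`pooled-${i}`]);
pool.take(
  3,
  (i) => {
    freshlyGathered.push(i);
    return [`fresh-${i}`];
  },
  (_component, c) => seen.push(c),
);
// seen is ["pooled-0", "pooled-1", "fresh-2"]
```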
One example of where this concept is useful is an application that
expects an incoming call at some point in the future, and wants to
minimize the time it takes to establish connectivity, to avoid
clipping of initial media. By pre-gathering candidates into the pool,
it can exchange and start sending connectivity checks from these
candidates almost immediately upon receipt of a call. Note though that
by holding on to these pre-gathered candidates, which will be kept
alive as long as they may be needed, the application will consume
resources on the STUN/TURN servers it is using.
Some call signaling systems allow various types of forking where an
SDP Offer may be provided to more than one device. For example, SIP
defines both a "Parallel Search" and
"Sequential Search". Although these are primarily signaling level
issues that are outside the scope of JSEP, they do have some impact on
the configuration of the media plane that is relevant. When forking
happens at the signaling layer, the Javascript application responsible
for the signaling needs to make the decisions about what media should
be sent or received at any point of time, as well as which remote
endpoint it should communicate with; JSEP is used to make sure the
media engine handles RTP and media as required by the
application. The basic operations that the application can have the
media engine perform are:

o  Start exchanging media with a given remote peer, but keep all the
   resources reserved in the offer.

o  Start exchanging media with a given remote peer, and free any
   resources in the offer that are not being used.

Sequential forking involves a call being dispatched to multiple
remote callees, where each callee can accept the call, but only one
active session ever exists at a time; no mixing of received media is
performed.

JSEP handles sequential forking well, allowing the application to
easily control the policy for selecting the desired remote endpoint.
When an answer arrives from one of the callees, the application can
choose to apply it either as a provisional answer, leaving open the
possibility of using a different answer in the future, or apply it
as a final answer, ending the setup flow.

In a "first-one-wins" situation, the first answer will be applied
as a final answer, and the application will reject any subsequent
answers. In SIP parlance, this would be ACK + BYE.

In a "last-one-wins" situation, all answers would be applied as
provisional answers, and any previous call leg will be terminated.
At some point, the application will end the setup process, perhaps
with a timer; at this point, the application could reapply the
existing remote description as a final answer.

Parallel forking involves a call being dispatched to multiple
remote callees, where each callee can accept the call, and multiple
simultaneous active signaling sessions can be established as a
result. If multiple callees send media at the same time, the
possibilities for handling this are described in Section 3.1 of
. Most SIP devices today only support
exchanging media with a single device at a time, and do not try to
mix multiple early media audio sources, as that could result in a
confusing situation. For example, consider having a European
ringback tone mixed together with the North American ringback tone;
the resulting sound would not be like either tone, and would confuse
the user. If the signaling application wishes to exchange media with
only one of the remote endpoints at a time, then from a media engine
point of view, this is exactly like the sequential forking case.

In the parallel forking case where the Javascript application
wishes to simultaneously exchange media with multiple peers, the
flow is slightly more complex, but the Javascript application can
follow the strategy that describes
using UPDATE. The UPDATE approach allows the
signaling to set up a separate media flow for each peer that it
wishes to exchange media with. In JSEP, the offer used in the
UPDATE would be formed by simply creating a new PeerConnection and
making sure that the same local media streams have been added into
this new PeerConnection. Then the new PeerConnection object would
produce an SDP offer that could be used by the signaling to perform
the UPDATE strategy discussed in .

As a result of sharing the media streams, the application will
end up with N parallel PeerConnection sessions, each with a local
and remote description and their own local and remote addresses. The
media flow from these sessions can be managed by specifying SDP
direction attributes in the descriptions, or the application can
choose to play out the media from all sessions mixed together. Of
course, if the application wants to keep only a single session, it
can simply terminate the sessions that it no longer needs.

This section details the basic operations that must be present to
implement JSEP functionality. The actual API exposed in the W3C API may
have somewhat different syntax, but should map easily to these
concepts.

The PeerConnection constructor allows the application to specify
global parameters for the media session, such as the STUN/TURN servers
and credentials to use when gathering candidates, as well as the
initial ICE candidate policy and pool size, and also the BUNDLE
policy to use.

If an ICE candidate policy is specified, it functions as
described in , causing the
browser to only surface the permitted candidates to the application,
and only use those candidates for connectivity checks. The set of
available policies is as follows:
o  All candidates will be gathered and used.

o  Candidates with private IP addresses
   [RFC1918] will be filtered out. This prevents exposure of
   internal network details, at the cost of requiring
   relay usage even for intranet calls, if the NAT does not allow
   hairpinning as described in [RFC4787], Section 6.

o  All candidates except relay candidates will
   be filtered out. This obfuscates the location information that
   might be ascertained by the remote peer from the received
   candidates. Depending on how the application deploys its relay
   servers, this could obfuscate location to a metro or possibly
   even global level.
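The three policies can be sketched as a single filter. The labels "all", "no-private", and "relay-only" are placeholders for this sketch, not names used by any API, and the private-address test covers only the three IPv4 ranges reserved by RFC 1918.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";
// Placeholder labels for the three policies; not API names.
type IcePolicy = "all" | "no-private" | "relay-only";

interface Candidate {
  type: CandidateType;
  address: string; // IPv4 dotted quad in this sketch
}

// True for the RFC 1918 ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
function isRfc1918(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    return false;
  }
  const a = Number(parts[0]);
  const b = Number(parts[1]);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

function allowedByPolicy(c: Candidate, policy: IcePolicy): boolean {
  if (policy === "relay-only") return c.type === "relay";
  if (policy === "no-private") return !isRfc1918(c.address);
  return true; // "all"
}

const candidates: Candidate[] = [
  { type: "host", address: "192.168.1.20" },
  { type: "host", address: "172.20.0.3" },
  { type: "srflx", address: "198.51.100.7" },
  { type: "relay", address: "203.0.113.9" },
];
const publicOnly = candidates.filter((c) => allowedByPolicy(c, "no-private"));
```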
Although it can be overridden by local policy, the default ICE
candidate policy MUST be set to allow all candidates, as this
minimizes use of application STUN/TURN server resources.

If a size is specified for the ICE candidate pool, this indicates
the number of ICE components to pre-gather candidates for. Because
pre-gathering results in utilizing STUN/TURN server resources for
potentially long periods of time, this must only
occur upon application request, and therefore the default
candidate pool size MUST be zero.
Lastly, the
application can specify its preferred policy regarding use of BUNDLE,
the multiplexing mechanism defined in .
By specifying a policy from the list below, the application can
control how aggressively it will try to BUNDLE media streams together.
The set of available policies is as follows:
o  The application will BUNDLE all media streams of
   the same type together. That is, if there are multiple audio and
   multiple video MediaStreamTracks attached to a PeerConnection, all
   but the first audio and video tracks will be marked as
   bundle-only, and candidates will only be gathered for N media
   streams, where N is the number of distinct media types. When
   talking to a non-BUNDLE-aware endpoint, only the non-bundle-only
   streams will be negotiated. This policy balances the desire to
   multiplex with the need to ensure basic audio and video still
   work in legacy cases. Data channels will be in a separate bundle
   group.

o  The application will offer BUNDLE, but mark none
   of its streams as bundle-only. This policy will allow all streams
   to be received by non-BUNDLE-aware endpoints, but require separate
   candidates to be gathered for each media stream.

o  The application will BUNDLE all of its media
   streams, including data channels,
   on a single transport. All streams other than the first
   will be marked as bundle-only. This policy aims to minimize
   candidate gathering and maximize multiplexing, at the cost of
   reduced compatibility with legacy endpoints.

o  Similar to max-bundle,
   but RTCP candidates are not gathered. This policy reduces the
   candidates that must be gathered to the absolute minimum,
   but will not be compatible with legacy endpoints that do not
   support RTCP mux.
As it provides the best tradeoff between performance and
compatibility with legacy endpoints, the default BUNDLE policy
MUST be set to "balanced".
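The bundle-only marking implied by each policy, and the resulting number of transports that need candidates, can be sketched as follows. "balanced" (the default) and "max-bundle" are the names used above; "max-compat" is the name the W3C API uses for the policy that marks nothing bundle-only, and is an assumption here. The RTCP variant is omitted because it differs only in RTCP candidate gathering, and the separate data channel group under "balanced" is approximated by treating data as its own media kind.

```typescript
type MediaKind = "audio" | "video" | "data";
// "max-compat" is assumed from the W3C API; the other two names appear above.
type BundlePolicy = "balanced" | "max-compat" | "max-bundle";

// Returns, per m= section in offer order, whether it is marked bundle-only.
function bundleOnlyFlags(kinds: MediaKind[], policy: BundlePolicy = "balanced"): boolean[] {
  const seen = new Set<MediaKind>();
  return kinds.map((kind, i) => {
    if (policy === "max-compat") return false; // offer BUNDLE, mark nothing bundle-only
    if (policy === "max-bundle") return i > 0; // only the first section stands alone
    // "balanced": the first section of each kind stays negotiable on its own;
    // data is treated as its own kind, approximating its separate group.
    const first = !seen.has(kind);
    seen.add(kind);
    return !first;
  });
}

// Candidates are gathered only for sections that are not bundle-only.
function transportsToGather(kinds: MediaKind[], policy: BundlePolicy = "balanced"): number {
  return bundleOnlyFlags(kinds, policy).filter((bundleOnly) => !bundleOnly).length;
}

const sections: MediaKind[] = ["audio", "video", "video", "audio", "data"];
const balanced = bundleOnlyFlags(sections); // [false, false, true, true, false]
```

For this example, "balanced" gathers for three transports (one per kind), "max-compat" for all five, and "max-bundle" for one, which is the tradeoff between compatibility and gathering cost that the policy list describes.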
The createOffer method generates a blob of SDP that contains an
offer with the supported
configurations for the session, including descriptions of the local
MediaStreams attached to this PeerConnection, the codec/RTP/RTCP
options supported by this implementation, and any candidates that
have been gathered by the ICE Agent. An options parameter may be
supplied to provide additional control over the generated offer.
This options parameter should allow for the following
manipulations to be performed:
o  To indicate support for a media type even if no
   MediaStreamTracks of that type have been added to the session
   (e.g., an audio call that wants to receive video).

o  To trigger an ICE restart, for the purpose of reestablishing
   connectivity.

In the initial offer, the generated SDP will contain all desired
functionality for the session (functionality that is supported but
not desired by default may be omitted); for each SDP line, the
generation of the SDP will follow the process defined for generating
an initial offer from the document that specifies the given SDP
line. The exact handling of initial offer generation is detailed
below.

In the event createOffer is called after the session is
established, createOffer will generate an offer to modify the
current session based on any changes that have been made to the
session, e.g. adding or removing MediaStreams, or requesting an ICE
restart. For each existing stream, the generation of each SDP line
must follow the process defined for generating an updated offer from
the RFC that specifies the given SDP line. For each new stream,
the generation of the SDP must follow the process of generating an
initial offer, as mentioned above. If no changes have been made, or
for SDP lines that are unaffected by the requested changes, the
offer will only contain the parameters negotiated by the last
offer-answer exchange. The exact handling of subsequent offer
generation is detailed below.

Session descriptions generated by createOffer must be immediately
usable by setLocalDescription; if a system has limited resources
(e.g. a finite number of decoders), createOffer should return an
offer that reflects the current state of the system, so that
setLocalDescription will succeed when it attempts to acquire those
resources. Because this method may need to inspect the system state
to determine the currently available resources, it may be
implemented as an async operation.

Calling this method may do things such as generate new ICE
credentials, but does not result in candidate gathering, or cause
media to start or stop flowing.

The createAnswer method generates a blob of SDP that contains an
SDP answer with the supported
configuration for the session that is compatible with the parameters
supplied in the most recent call to setRemoteDescription, which MUST
have been called prior to calling createAnswer.
Like createOffer, the returned blob contains
descriptions of the local MediaStreams attached to this
PeerConnection, the codec/RTP/RTCP options negotiated for this
session, and any candidates that have been gathered by the ICE
Agent. An options parameter may be supplied to provide additional
control over the generated answer.

As an answer, the generated SDP will contain a specific
configuration that specifies how the media plane should be
established; for each SDP line, the generation of the SDP must
follow the process defined for generating an answer from the
document that specifies the given SDP line. The exact handling of
answer generation is detailed below.

Session descriptions generated by createAnswer must be
immediately usable by setLocalDescription; like createOffer, the
returned description should reflect the current state of the system.
Because this method may need to inspect the system state to
determine the currently available resources, it may need to be
implemented as an async operation.

Calling this method may do things such as generate new ICE
credentials, but does not trigger candidate gathering or change
media state.

Session description objects (RTCSessionDescription) may be of
type "offer", "pranswer", "answer", or "rollback". These types provide
information as to how the description parameter should be parsed,
and how the media state should be changed.

"offer" indicates that a description should be parsed as an
offer; said description may include many possible media
configurations. A description used as an "offer" may be applied
anytime the PeerConnection is in a stable state, or as an update to
a previously supplied but unanswered "offer".

"pranswer" indicates that a description should be parsed as an
answer, but not a final answer, and so should not result in the
freeing of allocated resources. It may result in the start of media
transmission, if the answer does not specify an inactive media
direction. A description used as a "pranswer" may be applied as a
response to an "offer", or an update to a previously sent
"pranswer".

"answer" indicates that a description should be parsed as an
answer, the offer-answer exchange should be considered complete, and
any resources (decoders, candidates) that are no longer needed can
be released. A description used as an "answer" may be applied as a
response to an "offer", or an update to a previously sent
"pranswer".

The only difference between a provisional and final answer is
that the final answer results in the freeing of any unused resources
that were allocated as a result of the offer. As such, the
application can use some discretion on whether an answer should be
applied as provisional or final, and can change the type of the
session description as needed. For example, in a serial forking
scenario, an application may receive multiple "final" answers, one
from each remote endpoint. The application could choose to accept
the initial answers as provisional answers, and only apply an answer
as final when it receives one that meets its criteria (e.g. a live
user instead of voicemail).

"rollback" is a special session description type indicating
that the state machine should be rolled back to the previous
state, as described below. The contents MUST be empty.

Most web applications will not need to create answers using the
"pranswer" type. While it is good practice to send an immediate
response to an "offer", in order to warm up the session transport
and prevent media clipping, the preferred handling for a web
application would be to create and send an "inactive" final answer
immediately after receiving the offer. Later, when the called
user actually accepts the call, the application can create a new
"sendrecv" offer to update the previous offer/answer pair and start
the media flow. While this could also be done with an inactive
"pranswer", followed by a sendrecv "answer", the initial "pranswer"
leaves the offer-answer exchange open, which means that
neither side can send an updated offer during this time.

As an example, consider a typical web application that will
set up a data channel, an audio channel, and a video channel. When
an endpoint receives an offer with these channels, it could send an
answer accepting the data channel for two-way data, and accepting
the audio and video tracks as inactive or receive-only. It could
then ask the user to accept the call, acquire the local media
streams, and send a new offer to the remote side moving the audio
and video to be two-way media. By the time the human has accepted
the call and triggered the new offer, it is likely that the ICE and
DTLS handshaking for all the channels will already have finished.

Of course, some applications may not be able to perform this
double offer-answer exchange, particularly ones that are attempting
to gateway to legacy signaling protocols. In these cases, "pranswer"
can still provide the application with a mechanism to warm up the
transport.

In certain situations it may be desirable to "undo" a change
made via setLocalDescription or setRemoteDescription. Consider a
case where a call is ongoing, and one side wants to change some of
the session parameters; that side generates an updated offer and
then calls setLocalDescription. However, the remote side, either
before or after setRemoteDescription, decides it does not want to
accept the new parameters, and sends a reject message back to the
offerer. Now, the offerer, and possibly the answerer as well, need
to return to a stable state and the previous local/remote
description. To support this, we introduce the concept of
"rollback".

A rollback discards any proposed changes to the session,
returning the state machine to the stable state, and setting the
modified local and/or remote description back to their previous
values. Any resources or candidates that were allocated by the
abandoned local description are discarded; any media that is
received will be processed according to the previous local and
remote descriptions. Rollback can only be used to cancel proposed
changes; there is no support for rolling back from a stable state to
a previous stable state. Note that this implies that once the answerer
has performed setLocalDescription with its answer, this cannot
be rolled back.
A rollback is performed by supplying a session description of
type "rollback" with empty contents to either setLocalDescription or
setRemoteDescription, depending on which was most recently used
(i.e. if the new offer was supplied to setLocalDescription, the
rollback should be done using setLocalDescription as well).
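The description types and the rollback rules above can be illustrated with a small state model. The following Python sketch is not the RTCPeerConnection API. `SignalingModel`, `set_description`, and the transition table are hypothetical names, and the state names follow the ones used later in this document ("local-offer", "remote-pranswer", and so on). As a simplification, it allows a rollback only for an unanswered offer, and only through the same method that applied that offer. It also does not model the release of resources.

```python
from typing import Dict, Optional, Tuple

# Hypothetical transition table: (state, side, type) -> next state.
TRANSITIONS: Dict[Tuple[str, str, str], str] = {
    ("stable", "local", "offer"): "local-offer",
    ("local-offer", "local", "offer"): "local-offer",  # update an unanswered offer
    ("local-offer", "remote", "pranswer"): "remote-pranswer",
    ("remote-pranswer", "remote", "pranswer"): "remote-pranswer",
    ("local-offer", "remote", "answer"): "stable",
    ("remote-pranswer", "remote", "answer"): "stable",
    ("stable", "remote", "offer"): "remote-offer",
    ("remote-offer", "remote", "offer"): "remote-offer",
    ("remote-offer", "local", "pranswer"): "local-pranswer",
    ("local-pranswer", "local", "pranswer"): "local-pranswer",
    ("remote-offer", "local", "answer"): "stable",
    ("local-pranswer", "local", "answer"): "stable",
}


class SignalingModel:
    """Toy model of how description types move the signaling state."""

    def __init__(self) -> None:
        self.state = "stable"
        self.desc: Dict[str, Optional[str]] = {"local": None, "remote": None}
        self._stable: Optional[Dict[str, Optional[str]]] = None

    def set_description(self, side: str, dtype: str, sdp: str = "") -> None:
        """side is "local" (setLocalDescription) or "remote" (setRemoteDescription)."""
        if dtype == "rollback":
            self._rollback(side, sdp)
            return
        key = (self.state, side, dtype)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot apply {side} {dtype!r} in state {self.state!r}")
        if self.state == "stable":
            # Remember the last stable descriptions so a rollback can restore them.
            self._stable = dict(self.desc)
        self.desc[side] = sdp
        self.state = TRANSITIONS[key]
        if self.state == "stable":
            # Exchange complete: a stable state cannot be rolled back.
            self._stable = None

    def _rollback(self, side: str, sdp: str) -> None:
        if sdp:
            raise ValueError("rollback contents MUST be empty")
        # Simplification: only "local-offer" (via local) or "remote-offer"
        # (via remote) can be rolled back in this model.
        if self.state != f"{side}-offer" or self._stable is None:
            raise ValueError(f"nothing to roll back via {side} in state {self.state!r}")
        self.desc = self._stable
        self._stable = None
        self.state = "stable"
```

After the answerer applies its final answer, the model returns to "stable" and refuses any further rollback. This matches the rule that a rollback cannot move from one stable state to an earlier one.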
The setLocalDescription method instructs the PeerConnection to
apply the supplied SDP blob as its local configuration. The type
field indicates whether the blob should be processed as an offer,
provisional answer, or final answer; offers and answers are checked
differently, using the various rules that exist for each SDP
line.

This API changes the local media state; among other things, it
sets up local resources for receiving and decoding media. In order
to successfully handle scenarios where the application wants to
offer to change from one media format to a different, incompatible
format, the PeerConnection must be able to simultaneously support
use of both the old and new local descriptions (e.g. support codecs
that exist in both descriptions) until a final answer is received,
at which point the PeerConnection can fully adopt the new local
description, or roll back to the old description if the remote side
denied the change.

This API indirectly controls the candidate gathering process.
When a local description is supplied, and the number of transports
currently in use does not match the number of transports needed by
the local description, the PeerConnection will create transports as
needed and begin gathering candidates for them.

If setRemoteDescription was previously called with an offer, and
setLocalDescription is called with an answer (provisional or final),
and the media directions are compatible, and media are available to
send, this will result in the starting of media transmission.

The setRemoteDescription method instructs the PeerConnection to
apply the supplied SDP blob as the desired remote configuration. As
in setLocalDescription, the type field of the description indicates how the blob
should be processed.

This API changes the local media state; among other things, it
sets up local resources for sending and encoding media.

If setLocalDescription was previously called with an offer, and
setRemoteDescription is called with an answer (provisional or final),
and the media directions are compatible, and media are available to
send, this will result in the starting of media transmission.

The localDescription method returns a copy of the current local
configuration, i.e. what was most recently passed to
setLocalDescription, plus any local candidates that have been
generated by the ICE Agent.

[[OPEN ISSUE: Do we need to expose accessors for both the current and
proposed local description?
https://github.com/rtcweb-wg/jsep/issues/16]]

A null object will be returned if the local description has not
yet been established, or if the PeerConnection has been closed.

The remoteDescription method returns a copy of the current remote
configuration, i.e. what was most recently passed to
setRemoteDescription, plus any remote candidates that have been
supplied via addIceCandidate.

[[OPEN ISSUE: Do we need to expose accessors for both the current and
proposed remote description?
https://github.com/rtcweb-wg/jsep/issues/16]]

A null object will be returned if the remote description has not
yet been established, or if the PeerConnection has been closed.
[[TODO: Revise if the W3C API uses different stuff here.]]
The canTrickle property indicates whether the remote side supports
receiving trickled candidates. There are three potential values:
null: No SDP has been received from the other side,
so it is not known if it can handle trickle. This is the initial
value before setRemoteDescription() is called.

true: SDP has been received from the other side
indicating that it can support trickle.

false: SDP has been received from the other side
indicating that it cannot support trickle.

As described elsewhere in this document,
JSEP implementations always provide candidates to the application
individually, consistent with what is needed for Trickle ICE.
However, applications can use the canTrickle property to determine
whether they can actually do Trickle ICE, i.e. safely send an initial
offer or answer followed later by candidates as they are gathered.
As "true" is the only value that definitively indicates remote Trickle
ICE support, an application which compares canTrickle against "true"
will by default attempt Half Trickle on initial offers and Full Trickle
on subsequent interactions with a Trickle ICE-compatible agent.
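The following is a minimal Python sketch of the comparison described above. None stands in for the JavaScript null value, and `trickle_plan` and its return strings are hypothetical. Because canTrickle is still null when the initial offer is created, a strict comparison against true produces Half Trickle for that offer without any special casing.

```python
from typing import Optional


def trickle_plan(can_trickle: Optional[bool]) -> str:
    """Decide how to send a description, given the canTrickle value.

    None stands for JavaScript null (no remote SDP applied yet).
    """
    if can_trickle is True:
        # Peer is known to support Trickle ICE: send now, candidates follow.
        return "send-now-and-trickle"
    # null or false: wait until gathering completes so that the
    # description already carries every candidate.
    return "wait-for-gathering"
```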
The setConfiguration method allows the global configuration of the
PeerConnection, which was initially set by constructor parameters, to
be changed during the session. The effects of this method call depend
on when it is invoked, and differ depending on which specific
parameters are changed:

Any changes to the STUN/TURN servers to use affect
the next gathering phase. If gathering has already occurred, this
will cause the next call to createOffer to generate new ICE
credentials, for the purpose of forcing an ICE restart and kicking
off a new gathering phase, in which the new servers will be used.
If the ICE candidate pool has a nonzero size, any existing
candidates will be discarded, and new candidates will be gathered
from the new servers.
Any changes to the ICE candidate policy also affect the next
gathering phase, in similar fashion to the server changes described
above. Note though that changes to the policy have no effect on
the candidate pool, because pooled candidates are not surfaced to
the application until a gathering phase occurs, and so any
necessary filtering can still be done on any pooled candidates.
Any changes to the ICE candidate pool size take effect
immediately; if increased, additional candidates are pre-gathered;
if decreased, the now-superfluous candidates are discarded.
Any changes to the BUNDLE policy take effect immediately, i.e.
any future tracks added to the PeerConnection will have their
bundle-only state marked accordingly.
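The setConfiguration effects listed above can be modeled as follows. This Python sketch is illustrative only. `Config`, `ConfigModel`, the placeholder candidate strings, and the `restart_on_next_offer` flag are assumptions, not part of any real API. The flag stands for "the next createOffer generates new ICE credentials".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Config:
    ice_servers: Tuple[str, ...] = ()
    candidate_policy: str = "all"
    pool_size: int = 0
    bundle_policy: str = "balanced"


@dataclass
class ConfigModel:
    """Hypothetical model of the setConfiguration effects."""
    config: Config
    gathered: bool = False               # has a gathering phase already run?
    restart_on_next_offer: bool = False  # next createOffer makes new ICE credentials
    pool: List[str] = field(default_factory=list)
    _next_id: int = 0

    def __post_init__(self) -> None:
        self._resize_pool()

    def set_configuration(self, new: Config) -> None:
        old, self.config = self.config, new
        if new.ice_servers != old.ice_servers:
            if self.gathered:
                self.restart_on_next_offer = True
            self.pool.clear()  # pooled candidates came from the old servers
        if new.candidate_policy != old.candidate_policy and self.gathered:
            # The pool is kept: filtering can still happen when candidates surface.
            self.restart_on_next_offer = True
        # Pool size (and refilling after a server change) takes effect now;
        # bundle_policy simply applies to tracks added from now on.
        self._resize_pool()

    def _resize_pool(self) -> None:
        while len(self.pool) < self.config.pool_size:
            self.pool.append(f"cand-{self._next_id}")  # stand-in for real gathering
            self._next_id += 1
        del self.pool[self.config.pool_size:]
```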
This call may result in a change to the state of the ICE Agent,
and may result in a change to media state if it results in
connectivity being established.

The addIceCandidate method provides a remote candidate to the ICE
Agent, which, if parsed successfully, will be added to the remote
description according to the rules defined for Trickle ICE.
Connectivity checks will be sent to the new candidate.

This call will result in a change to the state of the ICE Agent,
and may result in a change to media state if it results in
connectivity being established.

This section describes the specific procedures to be followed when
creating and parsing SDP objects.

JSEP implementations must comply with the specifications listed
below that govern the creation and processing of offers and answers.
The first set of specifications is the "mandatory-to-implement" set.
All implementations must support these behaviors, but may not use all
of them if the remote side, which may not be a JSEP endpoint, does not
support them.

The second set of specifications is the "mandatory-to-use" set.
The local JSEP endpoint and any remote endpoint must indicate support
for these specifications in their session descriptions.

This list of mandatory-to-implement specifications is derived
from the requirements outlined in .
is the base SDP specification
and MUST be implemented. MUST be supported for signaling
the UDP/TLS/RTP/SAVPF and TCP/TLS/RTP/SAVPF
RTP profiles.
MUST be implemented for
signaling the ICE credentials and candidate lines corresponding
to each media stream. The ICE implementation MUST be a Full
implementation, not a Lite implementation. MUST be implemented to signal
DTLS certificate fingerprints. MUST NOT be implemented to
signal SDES SRTP keying information.

The grouping framework MUST
be implemented for signaling grouping information, and MUST be
used to identify m= lines via the a=mid attribute. MUST be
supported, in order to signal associations between RTP objects
and W3C MediaStreams and MediaStreamTracks in a standard way.
The bundle mechanism in MUST be
supported to signal the ability to multiplex RTP streams on a
single UDP port, in order to avoid excessive use of port number
resources.

The SDP attributes of "sendonly", "recvonly", "inactive", and
"sendrecv" from MUST be
implemented to signal information about media direction. MUST be implemented to signal
RTP SSRC values. MUST be implemented to signal
RTCP based feedback. MUST be implemented to signal
multiplexing of RTP and RTCP. MUST be implemented to signal
reduced-size RTCP messages. with bandwidth modifiers MAY
be supported for specifying RTCP bandwidth as a fraction of the
media bandwidth, RTCP fraction allocated to the senders and
setting maximum media bit-rate boundaries.

As required by , Section 5.13, JSEP
implementations MUST ignore unknown attribute (a=) lines.

All session descriptions handled by JSEP endpoints,
both local and remote, MUST indicate support for the following
specifications. If any of these are absent, this omission MUST be
treated as an error. ICE, as specified in ,
MUST be used. Note that the remote endpoint may use a Lite
implementation; implementations MUST properly handle remote
endpoints which do ICE-Lite.

DTLS-SRTP, as specified in ,
MUST be used.
For media m= sections, JSEP endpoints MUST support both the "UDP/TLS/
RTP/SAVPF" and "TCP/TLS/RTP/SAVPF" profiles and MUST indicate one of
these two profiles for each media m= line they produce in an offer.
For data m= sections, JSEP endpoints must support both the "UDP/TLS/SCTP"
and "TCP/TLS/SCTP" profiles and MUST indicate one of these two
profiles for each data m= line they produce in an offer.
Because ICE can select either TCP or UDP transport depending on
network conditions, both advertisements are consistent with
ICE eventually selecting either UDP or TCP.
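The Python helpers below sketch two of the profile rules. The first picks the profile to offer from the media kind and the transport of the default candidate. The second applies the relaxed acceptance rules that the following paragraphs define for media m= sections in a received offer. The function names are hypothetical. The caller is assumed to have already parsed the m= line and noted whether "a=fingerprint" and "a=rtcp-fb" are present.

```python
import re


def offered_profile(kind: str, default_transport: str) -> str:
    """Profile for an offered m= line, from the default candidate's transport."""
    transport = default_transport.upper()  # "UDP" or "TCP"
    if kind == "application":              # data channel m= section
        return f"{transport}/TLS/SCTP"
    return f"{transport}/TLS/RTP/SAVPF"


# Media profiles that MUST be accepted in an offer:
# "RTP/[S]AVP[F]" and "(UDP|TCP)/TLS/RTP/SAVP[F]".
ACCEPTED_MEDIA = re.compile(r"(?:RTP/S?AVPF?|(?:UDP|TCP)/TLS/RTP/SAVPF?)")


def media_trr_int(profile: str, has_fingerprint: bool, has_rtcp_fb: bool) -> int:
    """Return the RTCP feedback timing (trr-int) to assume for an offered m= line."""
    if not ACCEPTED_MEDIA.fullmatch(profile):
        raise ValueError(f"unsupported media profile {profile!r}")
    if not has_fingerprint:
        # SAVP vs. AVP is irrelevant (DTLS-SRTP is always used),
        # but a missing fingerprint makes negotiation fail.
        raise ValueError("no a=fingerprint: negotiation fails")
    if profile.endswith("AVPF") or has_rtcp_fb:
        return 0  # AVPF timing
    return 4      # AVPF in an AVP compatible mode, i.e. AVP timing
```

An answer would then echo the offered profile string unchanged, as required below.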
Unfortunately, in an attempt at compatibility, some endpoints
generate other profile strings even when they mean to support one of
these profiles. For instance, an endpoint might generate "RTP/AVP"
but supply "a=fingerprint" and "a=rtcp-fb" attributes, indicating
its willingness to support "(UDP,TCP)/TLS/RTP/SAVPF". In order to
simplify compatibility with such endpoints, JSEP endpoints MUST
follow the following rules when processing the media m= sections in
an offer:
The profile in any "m=" line in any answer MUST exactly match the profile
provided in the offer.

Any profile matching the following patterns MUST be accepted:
"RTP/[S]AVP[F]" and "(UDP/TCP)/TLS/RTP/SAVP[F]".

Because DTLS-SRTP is REQUIRED, the choice of SAVP or
AVP has no effect; support for DTLS-SRTP is determined by the presence
of the "a=fingerprint" attribute. Note that lack of an "a=fingerprint"
attribute will lead to negotiation failure.
The use of AVPF or AVP simply controls the timing
rules used for RTCP feedback. If AVPF is provided, or an "a=rtcp-fb"
attribute is present, assume AVPF timing, i.e. a default value of
"trr-int=0". Otherwise, assume that AVPF is being used in
an AVP compatible mode and use AVP timing, i.e., "trr-int=4".

For data m= sections, JSEP endpoints MUST support receiving the
"UDP/TLS/SCTP", "TCP/TLS/SCTP", or "DTLS/SCTP" (for backwards
compatibility) profiles. Note that re-offers by JSEP endpoints MUST use the correct profile
strings even if the initial offer/answer exchange used an (incorrect)
older profile string.

When createOffer is called, a new SDP description must be created
that includes the functionality specified in . The exact details of this
process are explained below.

When createOffer is called for the first time, the result is
known as the initial offer.

The first step in generating an initial offer is to generate
session-level attributes, as specified in , Section 5. Specifically:

The first SDP line MUST be "v=0", as specified in , Section 5.1.

The second SDP line MUST be an "o=" line, as specified in
, Section 5.2. The value of the
<username> field SHOULD be "-". The value of the
<sess-id> field SHOULD be a cryptographically random
number. To ensure uniqueness, this number SHOULD be at least 64
bits long. The value of the <sess-version> field SHOULD be
zero. The value of the <nettype> <addrtype>
<unicast-address> tuple SHOULD be set to a non-meaningful
address, such as IN IP4 0.0.0.0, to prevent leaking the local
address in this field. As mentioned in , the entire o= line needs to be unique,
but selecting a random number for <sess-id> is sufficient
to accomplish this.

The third SDP line MUST be an "s=" line, as specified in , Section 5.3; to match the "o=" line, a
single dash SHOULD be used as the session name, e.g. "s=-".
Note that this differs from the advice in
which proposes a single space, but as both "o=" and "s="
are meaningless, having the same meaningless value seems clearer.
Session Information ("i="), URI ("u="), Email Address ("e="),
Phone Number ("p="), Bandwidth ("b="), Repeat Times ("r="), and
Time Zones ("z=") lines are not useful in this context and
SHOULD NOT be included.

Encryption Keys ("k=") lines do not provide sufficient
security and MUST NOT be included.

A "t=" line MUST be added, as specified in , Section 5.9; both <start-time>
and <stop-time> SHOULD be set to zero, e.g. "t=0 0".

An "a=msid-semantic:WMS" line MUST be added, as specified in
, Section 4.

The next step is to generate m= sections, as specified in
Section 5.14, for each
MediaStreamTrack that has been added to the PeerConnection via the
addStream method. (Note that this method takes a MediaStream, which
can contain multiple MediaStreamTracks, and therefore multiple m=
sections can be generated even if addStream is only called once.)
m= sections MUST be sorted first by the order in which the
MediaStreams were added to the PeerConnection, and
then by the alphabetical ordering of the media type for the MediaStreamTrack.
For example, if a MediaStream containing both an audio and a video
MediaStreamTrack is added to a PeerConnection, the resultant m=audio
section will precede the m=video section. If a second MediaStream
containing an audio MediaStreamTrack was added, it would follow
the m=video section.

Each m= section, provided it is not being bundled into
another m= section, MUST generate a unique set of ICE credentials
and gather its own unique set of ICE candidates. Otherwise, it MUST
use the same ICE credentials and candidates as the m= section
into which it is being bundled. Note that this means
that for offers, any m= sections which are not bundle-only
MUST have unique ICE credentials and candidates, since it is
possible that the answerer will accept them without bundling
them.
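The ordering rule and the ICE credential rule for m= sections can be applied in a single pass. In this Python sketch, `Section`, `order_sections`, and `new_credentials` are hypothetical. The sketch also assumes that bundle-only sections are bundled into the first m= section, so they reuse that section's credentials.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Section:
    stream_order: int        # position of the owning MediaStream in addStream order
    kind: str                # media type, e.g. "audio" or "video"
    bundle_only: bool = False
    ice: Optional[Tuple[str, str]] = None  # (ice-ufrag, ice-pwd)


def new_credentials() -> Tuple[str, str]:
    # 8 and 24 hex characters; ICE requires at least 4 and 22 characters.
    return secrets.token_hex(4), secrets.token_hex(12)


def order_sections(sections: List[Section]) -> List[Section]:
    # Sort by stream order, then alphabetically by media type. The sort is
    # stable, so tracks of the same type keep their relative order.
    ordered = sorted(sections, key=lambda s: (s.stream_order, s.kind))
    bundle_creds: Optional[Tuple[str, str]] = None
    for s in ordered:
        if s.bundle_only and bundle_creds is not None:
            s.ice = bundle_creds  # share with the section it is bundled into
        else:
            s.ice = new_credentials()
            if bundle_creds is None:
                bundle_creds = s.ice  # assumption: the first section carries the bundle
    return ordered
```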
For DTLS, all m= sections MUST use the certificate for the
identity that has been specified for the PeerConnection; as a result,
they MUST all have the same fingerprint
value, or this value MUST be a session-level attribute.

Each m= section should be generated as specified in , Section 5.14. For the m= line itself, the
following rules MUST be followed:

The port value is set to the port of the default ICE candidate
for this m= section, but given that no candidates have yet been
gathered, the "dummy" port value of 9 (Discard) MUST be used, as
indicated in ,
Section 5.1.

To properly indicate use of DTLS, the <proto> field MUST
be set to "UDP/TLS/RTP/SAVPF", as specified in
, Section 8,
if the default candidate uses UDP transport, or "TCP/TLS/RTP/SAVPF",
as specified in
if the default candidate uses TCP transport.

The m= line MUST be followed immediately by a "c=" line, as specified
in , Section 5.7. Again, as no candidates
have yet been gathered, the "c=" line must contain the "dummy" value
"IN IP6 ::", as defined in ,
Section 5.1.

Each m= section MUST include the following attribute lines:

An "a=mid" line, as specified in , Section 4. When generating mid values,
it is RECOMMENDED that the values be 3 bytes or less, to allow
them to efficiently fit into the RTP header extension defined in
, Section 11.

An "a=rtcp" line, as specified in ,
Section 2.1, containing the dummy value "9 IN IP6 ::", because
no candidates have yet been gathered.

An "a=msid" line, as specified in , Section 2.

An "a=sendrecv" line, as specified in , Section 5.1.

For each supported codec, "a=rtpmap" and "a=fmtp" lines, as
specified in , Section 6. For
audio, the codecs specified in , Section 3, MUST be
supported.

If this m= section is for media with configurable frame sizes,
e.g. audio, an "a=maxptime" line, indicating the smallest of the
maximum supported frame sizes out of all codecs included above, as
specified in , Section 6.

For each primary codec where RTP retransmission should be
used, a corresponding "a=rtpmap" line indicating "rtx" with the
clock rate of the primary codec and an "a=fmtp" line that
references the payload type of the primary codec, as specified
in , Section 8.1.

For each supported FEC mechanism, a corresponding "a=rtpmap"
line indicating the desired FEC codec.

"a=ice-ufrag" and "a=ice-pwd" lines, as specified in , Section 15.4.

An "a=ice-options" line, with the "trickle" option, as
specified in , Section 4.

An "a=fingerprint" line, as specified in , Section 5; the algorithm used for the
fingerprint MUST match that used in the certificate signature.
An "a=setup" line, as specified in , Section 4, and clarified for use in
DTLS-SRTP scenarios in , Section
5. The role value in the offer MUST be "actpass".

An "a=rtcp-mux" line, as specified in , Section 5.1.1.

An "a=rtcp-rsize" line, as specified in , Section 5.

For each supported RTP header extension, an "a=extmap" line,
as specified in , Section 5. The
list of header extensions that SHOULD/MUST be supported is
specified in ,
Section 5.2. [TODO: ensure that urn:ietf:params:rtp-hdrext:sdes:mid
appears either there or here]
Any header extensions that require encryption MUST
be specified as indicated in ,
Section 4.

For each supported RTCP feedback mechanism, an "a=rtcp-fb"
line, as specified in ,
Section 4.2. The list of RTCP feedback mechanisms that
SHOULD/MUST be supported is specified in , Section 5.1.

An "a=ssrc" line, as specified in , Section 4.1, indicating the SSRC to be
used for sending media, along with the mandatory "cname" source
attribute, as specified in Section 6.1, indicating the CNAME for
the source. The CNAME must be generated in accordance with
. [OPEN ISSUE: How are CNAMEs
specified for MSTs? Are they randomly generated for each
MediaStream? If so, can two MediaStreams be synced?
See: https://github.com/rtcweb-wg/jsep/issues/4]

If RTX is supported for this media type, another "a=ssrc"
line with the RTX SSRC, and an "a=ssrc-group" line, as specified
in , section 4.2, with semantics
set to "FID" and including the primary and RTX SSRCs.

If FEC is supported for this media type, another "a=ssrc"
line with the FEC SSRC, and an "a=ssrc-group" line, as specified
in , section 4.2, with semantics
set to "FEC" and including the primary and FEC SSRCs.

[OPEN ISSUE: Handling of a=imageattr]

If the BUNDLE policy for this PeerConnection is set to
"max-bundle", and this is not the first m= section, or the BUNDLE
policy is set to "balanced", and this is not the first m= section
for this media type, an "a=bundle-only" line.

Lastly, if a data channel has been created, an m= section MUST be
generated for data. The <media> field MUST be set to
"application" and the <proto> field MUST be set to
"UDP/TLS/SCTP" if the default candidate uses UDP transport,
or "TCP/TLS/SCTP" if the default candidate uses TCP transport. The "fmt" value
MUST be set to the SCTP port number, as specified in Section 4.1.
[TODO: update this to use a=sctp-port, as indicated in the latest
data channel docs]

Within the data m= section, the "a=mid",
"a=ice-ufrag", "a=ice-pwd", "a=ice-options", "a=candidate",
"a=fingerprint", and "a=setup" lines MUST be included as mentioned
above, along with an "a=sctpmap" line referencing the SCTP port number
and specifying the application protocol indicated in . [OPEN ISSUE: the -01
of this document is missing this information.]

Once all m= sections have been generated, a session-level
"a=group" attribute MUST be added as specified in . This attribute MUST have semantics
"BUNDLE", and MUST include the mid identifiers of each m= section.
The effect of this is that the browser offers all m= sections as one
BUNDLE group. However, whether the m= sections are bundle-only
or not depends on the BUNDLE policy.
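The Python sketch below combines the session-level rules with the BUNDLE group to produce the session part of an initial offer. `session_lines`, `new_session_id`, and `bump_version` are hypothetical helpers, and the mid values are assumed to have been chosen already. `bump_version` covers the rule for subsequent offers, under which only <sess-version> changes.

```python
import secrets
from typing import List


def new_session_id() -> int:
    # Cryptographically random, 64 bits.
    return secrets.randbits(64)


def session_lines(sess_id: int, mids: List[str], sess_version: int = 0) -> List[str]:
    """Session-level lines for an initial offer, ending with the BUNDLE group."""
    return [
        "v=0",
        # <username> "-" and a non-meaningful address, so no local address leaks.
        f"o=- {sess_id} {sess_version} IN IP4 0.0.0.0",
        "s=-",
        "t=0 0",
        "a=msid-semantic:WMS",
        # Added once all m= sections exist: every mid, as one BUNDLE group.
        "a=group:BUNDLE " + " ".join(mids),
    ]


def bump_version(o_line: str) -> str:
    """Increment <sess-version> in an "o=" line, keeping the other fields."""
    fields = o_line.split(" ")
    fields[2] = str(int(fields[2]) + 1)
    return " ".join(fields)
```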
Attributes which SDP permits to either be at the session
level or the media level SHOULD generally be at the media
level even if they are identical. This promotes readability,
especially if one of a set of initially identical attributes
is subsequently changed.
Attributes other than the ones specified above MAY be included,
except for the following attributes which are specifically
incompatible with the requirements of , and MUST NOT be
included: "a=crypto", "a=key-mgmt", and "a=ice-lite".

Note that when BUNDLE is used, any additional attributes that are
added MUST follow the advice in on how
those attributes interact with BUNDLE.

Note that these requirements are in some cases stricter than those
of SDP. Implementations MUST be prepared to accept compliant SDP
even if it would not conform to the requirements for generating
SDP in this specification.

When createOffer is called a second (or later) time, or is called
after a local description has already been installed, the processing
is somewhat different than for an initial offer.

If the initial offer was not applied using setLocalDescription,
meaning the PeerConnection is still in the "stable" state, the steps
for generating an initial offer should be followed, subject to the
following restriction:

The fields of the "o=" line MUST stay the same except for the
<sess-version> field, which MUST increment if the session
description changes in any way, including the addition of
ICE candidates.

If the initial offer was applied using setLocalDescription, but
an answer from the remote side has not yet been applied, meaning the
PeerConnection is still in the "local-offer" state, an offer is
generated by following the steps in the "stable" state above, along
with these exceptions:

The "s=" and "t=" lines MUST stay the same.

Each "m=" and "c=" line MUST be filled in with the port and
address of the default candidate for the m= section, as described
in , Section 4.3. Each
"a=rtcp" attribute line MUST also be filled in with the port and
address of the appropriate default candidate, either the
default RTP or RTCP candidate, depending on whether RTCP multiplexing
is currently active or not. Note that if RTCP
multiplexing is being offered, but not yet active, the default RTCP
candidate MUST be used, as indicated in ,
section 5.1.3. In each case, if no candidates of the desired
type have yet been gathered, dummy values MUST be used, as
described above.
[TODO: update profile UDP/TCP per default candidate]

Each "a=mid" line MUST stay the same.

Each "a=ice-ufrag" and "a=ice-pwd" line MUST stay the
same, unless the ICE configuration has changed (either changes to
the supported STUN/TURN servers, or the ICE candidate policy), or
the "IceRestart" option was specified.
Within each m= section, for each candidate that has
been gathered during the most recent gathering phase
(see ),
an "a=candidate" line MUST be added, as specified in , Section 4.3., paragraph 3. If
candidate gathering for the section has completed, an
"a=end-of-candidates" attribute MUST be added, as described in
, Section 9.3.
For MediaStreamTracks that are still present, the "a=msid",
"a=ssrc", and "a=ssrc-group" lines MUST stay the same.

If any MediaStreamTracks have been removed, either through
the removeStream method or by removing them from an added
MediaStream, their m= sections MUST be marked as recvonly by
changing the value of the
directional attribute to "a=recvonly". The "a=msid", "a=ssrc",
and "a=ssrc-group" lines MUST be removed from the associated m=
sections.

If any MediaStreamTracks have been added, and there exist
m= sections of the appropriate media type with no associated
MediaStreamTracks (i.e. as described in the preceding paragraph),
those m= sections MUST be recycled by adding the new
MediaStreamTrack to the m= section. This is done by adding the
necessary "a=msid", "a=ssrc", and "a=ssrc-group" lines to the
recycled m= section, and removing the "a=recvonly" attribute.
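The "local-offer" bookkeeping for removed and added tracks can be sketched as below. `MSection` and `update_tracks` are hypothetical, and setting `track` to None stands for removing the "a=msid", "a=ssrc", and "a=ssrc-group" lines. The text above does not say what happens to an added track when no free m= section exists. This sketch simply returns such tracks, on the assumption that they would need new m= sections as in an initial offer.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class MSection:
    mid: str
    kind: str                    # media type, e.g. "audio"
    track: Optional[str] = None  # local MediaStreamTrack id, if any
    direction: str = "sendrecv"


def update_tracks(sections: List[MSection], removed: Set[str],
                  added: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply removals and additions; return (track_id, kind) pairs left unplaced."""
    for s in sections:
        if s.track in removed:
            s.track = None           # drop a=msid, a=ssrc, a=ssrc-group
            s.direction = "recvonly"
    unplaced = []
    for track_id, kind in added:
        free = next((s for s in sections if s.kind == kind and s.track is None), None)
        if free is None:
            unplaced.append((track_id, kind))
            continue
        free.track = track_id        # add a=msid, a=ssrc, a=ssrc-group
        free.direction = "sendrecv"  # remove a=recvonly
    return unplaced
```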
If the initial offer was applied using setLocalDescription, and
an answer from the remote side has been applied using
setRemoteDescription, meaning the PeerConnection is in the
"remote-pranswer" or "stable" states, an offer is generated based on
the negotiated session descriptions by following the steps mentioned
for the "local-offer" state above, along with these exceptions:
[OPEN ISSUE: should this be permitted in the remote-pranswer state?]
If an m= section exists in the current local description, but
does not have an associated local MediaStreamTrack (possibly
because said MediaStreamTrack was removed since the last
exchange), an m= section MUST still be generated in the new offer,
as indicated in , Section 8. The
disposition of this section will depend on the state of the
remote MediaStreamTrack associated with this m= section. If one
exists, and it is still in the "live" state, the new m= section
MUST be marked as "a=recvonly", with no "a=msid" or related
attributes present. If no remote MediaStreamTrack
exists, or it is in the "ended" state, the m= section MUST be
marked as rejected, by setting the port to zero, as indicated in
, Section 8.2.

If any MediaStreamTracks have been added, and there exist
recvonly m= sections of the appropriate media type with no
associated MediaStreamTracks, or rejected m= sections of any
media type, those m= sections MUST be recycled, with a new
local MediaStreamTrack associated with each recycled m=
section, until all such existing m= sections have been
used. This includes any recvonly or rejected m= sections
created by the preceding paragraph.

In addition, for each non-recycled, non-rejected m=
section in the new offer, the following adjustments are made based
on the contents of the corresponding m= section in the current
remote description:

The m= line and corresponding "a=rtpmap" and "a=fmtp" lines
MUST only include codecs present in the remote description.

The RTP header extensions MUST only include those that are
present in the remote description.

The RTCP feedback extensions MUST only include those that are
present in the remote description.

The "a=rtcp-mux" line MUST only be added if present in the
remote description.

The "a=rtcp-rsize" line MUST only be added if present in the
remote description.

The "a=group:BUNDLE" attribute MUST include the mid identifiers
specified in the BUNDLE group in the most recent answer, minus any
m= sections that have been marked as rejected, plus any newly added
or re-enabled m= sections. In other words, the BUNDLE attribute must
contain all m= sections that were previously bundled, as long as
they are still alive, as well as any new m= sections.

The createOffer method takes as a parameter an RTCOfferOptions
object. Special processing is performed when generating an SDP
description if the following options are present.

If the "OfferToReceiveAudio" option is specified, with an integer
value of N, and M audio MediaStreamTracks have been added to the
PeerConnection, the offer MUST include N non-rejected m= sections
with media type "audio", even if N is greater than M.
This allows the offerer to receive audio, including multiple independent
streams, even when not sending it; accordingly, the directional
attribute on the N-M audio m= sections without associated
MediaStreamTracks MUST be set to recvonly.

If N is set to a value less than M, the offer MUST mark the
m= sections associated with the M-N most recently added
(since the last setLocalDescription) MediaStreamTracks as sendonly.
This allows the offerer to indicate that it does not want to receive
audio on some or all of its newly created streams.
For m= sections that have previously
been negotiated, this setting has no effect. [TODO: refer to
RTCRtpSender in the future]
For backwards compatibility with pre-standard
versions of this specification, a value of "true"
is interpreted as equivalent to N=1, and "false" as N=0.
If the "OfferToReceiveVideo" option is specified, with an integer
value of N, and M video MediaStreamTracks have been added to the
PeerConnection, the offer MUST include N non-rejected m= sections
with media type "video", even if N is greater than M.
This allows the offerer to receive video, including multiple independent
streams, even when not sending it; accordingly, the directional
attribute on the N-M video m= sections without associated
MediaStreamTracks MUST be set to recvonly.

If N is set to a value less than M, the offer MUST mark the
m= sections associated with the M-N most recently added
(since the last setLocalDescription) MediaStreamTracks as sendonly.
This allows the offerer to indicate that it does not want to receive
video on some or all of its newly created streams.
For m= sections that have previously
been negotiated, this setting has no effect. [TODO: refer to
RTCRtpSender in the future]
For backwards compatibility with pre-standard
versions of this specification, a value of "true"
is interpreted as equivalent to N=1, and "false" as N=0.
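The OfferToReceiveAudio and OfferToReceiveVideo rules above apply identically to each media type. The following minimal Python sketch illustrates them for a single type. The function names are hypothetical. The sketch assumes that all M tracks were added since the last setLocalDescription and that none of the resulting m= sections has been negotiated before, because the option has no effect on previously negotiated sections.

```python
from typing import List, Union


def offer_to_receive_count(value: Union[bool, int]) -> int:
    """Map an OfferToReceiveAudio/OfferToReceiveVideo value to N."""
    # Pre-standard boolean values: true is treated as N=1, false as N=0.
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return 1 if value else 0
    if value < 0:
        raise ValueError("OfferToReceive value must be non-negative")
    return value


def plan_new_sections(num_tracks: int, value: Union[bool, int]) -> List[str]:
    """Return the direction of each new m= section of one media type.

    num_tracks is M, the number of tracks of this type, ordered from
    oldest to most recently added. The first M entries correspond to
    those tracks; any further entries are receive-only m= sections.
    """
    n = offer_to_receive_count(value)
    m = num_tracks
    directions = []
    for index in range(m):
        # When N < M, the M-N most recently added tracks are sendonly.
        directions.append("sendonly" if index >= n else "sendrecv")
    # When N > M, N-M extra m= sections are added so that the offer
    # contains N non-rejected m= sections of this type; they are recvonly.
    directions.extend(["recvonly"] * max(0, n - m))
    return directions
```

For example, plan_new_sections(1, 3) yields one sendrecv section for the existing track, followed by two recvonly sections. In contrast, plan_new_sections(3, 1) marks the two most recently added tracks as sendonly.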
If the "IceRestart" option is specified, with a value of
"true", the offer MUST indicate an ICE restart by generating new
ICE ufrag and pwd attributes, as specified in , Section
9.1.1.1. If this option is specified on an initial offer, it
has no effect (since a new ICE ufrag and pwd are already
generated). Similarly, if the ICE configuration has changed, this
option has no effect, since new ufrag and pwd attributes will be
generated automatically. This option is primarily useful for
reestablishing connectivity in cases where failures are detected by
the application.
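The credential handling described for the IceRestart option can be sketched in Python as follows. The helper names and the chosen lengths are hypothetical. Only the ice-char alphabet and the minimum randomness requirements (24 bits for the ufrag, 128 bits for the password) come from the ICE specification referenced above. The sketch shows that IceRestart changes the outcome only when the existing credentials would otherwise be reused.

```python
import secrets
import string
from typing import Optional, Tuple

# ice-char as defined in RFC 5245, Section 15.4: ALPHA / DIGIT / "+" / "/".
ICE_CHARS = string.ascii_letters + string.digits + "+/"

# Each ice-char carries 6 bits of randomness (64 possible values).
# These hypothetical lengths give 48 and 144 bits, exceeding the
# required minimums of 24 bits (ufrag) and 128 bits (pwd).
UFRAG_LENGTH = 8
PWD_LENGTH = 24


def random_ice_string(length: int) -> str:
    """Generate a random string of ice-chars using a CSPRNG."""
    return "".join(secrets.choice(ICE_CHARS) for _ in range(length))


def next_offer_credentials(
    current: Optional[Tuple[str, str]],
    ice_restart: bool,
    ice_config_changed: bool,
) -> Tuple[str, str]:
    """Return the (ufrag, pwd) pair to place in the next offer."""
    if current is None or ice_restart or ice_config_changed:
        # Initial offer, explicit IceRestart, or a changed ICE
        # configuration: new credentials are generated.
        return random_ice_string(UFRAG_LENGTH), random_ice_string(PWD_LENGTH)
    # Otherwise the existing credentials are reused unchanged.
    return current
```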
If the "VoiceActivityDetection" option is specified, with a
value of "true", the offer MUST indicate support for silence
suppression in the audio it receives by including comfort noise
("CN") codecs for each offered audio codec, as specified in , Section 5.1, except for codecs that have their
own internal silence suppression support. For codecs that have their own
internal silence suppression support, the appropriate fmtp parameters
for that codec MUST be specified to indicate that silence suppression for
received audio is desired. For example, when using the Opus codec, the
"usedtx=1" parameter would be specified in the offer.
This option allows the endpoint to significantly reduce the amount of
audio bandwidth it receives, at the cost of some fidelity, depending on
the quality of the remote VAD algorithm.

When createAnswer is called, a new SDP description must be created
that is compatible with the supplied remote description as well as the
requirements specified in . The exact details of this
process are explained below.

When createAnswer is called for the first time after a remote
description has been provided, the result is known as the initial
answer. If no remote description has been installed, an answer
cannot be generated, and an error MUST be returned.

Note that the remote description SDP may not have been created by
a JSEP endpoint and may not conform to all the requirements listed
in . For many cases, this is
not a problem. However, if any mandatory SDP attributes are missing,
or functionality listed as mandatory-to-use above is not present,
this MUST be treated as an error, and MUST cause the affected
m= sections to be marked as rejected.

The first step in generating an initial answer is to generate
session-level attributes. The process here is identical to that
indicated in the Initial Offers section above.

The next step is to generate m= sections for each m= section that
is present in the remote offer, as specified in , Section 6. For the purposes of this
discussion, any session-level attributes in the offer that are also
valid as media-level attributes SHALL be considered to be present in
each m= section.

The next step is to go through each offered m= section.
If there is a local MediaStreamTrack of the same type which has been
added to the PeerConnection via addStream and not yet associated
with a m= section, and the specific m= section is either sendrecv or
recvonly, the MediaStreamTrack will be associated with the m=
section at this time. MediaStreamTracks are assigned to m=
sections using the canonical order described in
. If there are more m= sections of a certain
type than MediaStreamTracks, some m= sections will not have an
associated MediaStreamTrack. If there are more MediaStreamTracks of
a certain type than compatible m= sections, only as many
MediaStreamTracks as there are compatible m= sections can be
associated in the constructed answer. The remainder will need to be
associated in a subsequent offer.

For each offered m= section, if the associated remote
MediaStreamTrack has been stopped, and is therefore in state "ended",
and no local MediaStreamTrack has been associated, the corresponding
m= section in the answer MUST be marked as rejected by setting the
port in the m= line to zero, as indicated in
, Section 6, and further processing for
this m= section can be skipped.

Provided that is not the case, each m= section in the answer should
then be generated as specified in , Section 6.1. For the m= line itself, the following
rules must be followed:

The port value would normally be set to the port of the default ICE candidate
for this m= section, but given that no candidates have yet been gathered,
the "dummy" port value of 9 (Discard) MUST be used, as
indicated in ,
Section 5.1.

The <proto> field MUST be set to exactly match the <proto>
field for the corresponding m= line in the offer.

The m= line MUST be followed immediately by a "c=" line, as specified
in , Section 5.7. Again, as no candidates
have yet been gathered, the "c=" line must contain the "dummy" value
"IN IP6 ::", as defined in ,
Section 5.1.
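The following minimal Python sketch illustrates the m= line and c= line rules above. The function name and the answer_formats parameter (the formats the answerer has chosen) are hypothetical. The dummy port 9, the copied <proto> field, and the "IN IP6 ::" connection value are taken from the text.

```python
from typing import List

# No candidates have been gathered yet, so dummy values are used:
# port 9 (Discard) on the m= line and "IN IP6 ::" on the c= line.
DUMMY_PORT = 9
DUMMY_CONNECTION_LINE = "c=IN IP6 ::"


def initial_answer_media_lines(offer_m_line: str, answer_formats: List[str]) -> List[str]:
    """Build the m= line and the c= line that must immediately follow it."""
    if not offer_m_line.startswith("m="):
        raise ValueError("not an m= line: " + offer_m_line)
    if not answer_formats:
        raise ValueError("at least one format is required")
    # m=<media> <port> <proto> <fmt> ...
    fields = offer_m_line[len("m="):].split()
    if len(fields) < 4:
        raise ValueError("malformed m= line: " + offer_m_line)
    media, proto = fields[0], fields[2]
    # <proto> is copied exactly from the offer; the offered port is not used.
    m_line = "m={} {} {} {}".format(media, DUMMY_PORT, proto, " ".join(answer_formats))
    return [m_line, DUMMY_CONNECTION_LINE]
```

For example, an offered "m=audio 49170 UDP/TLS/RTP/SAVPF 111 0" answered with format 111 produces "m=audio 9 UDP/TLS/RTP/SAVPF 111", followed by "c=IN IP6 ::".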
If the offer supports BUNDLE, all m= sections to be BUNDLEd must use
the same ICE credentials and candidates; all m= sections not being
BUNDLEd must use unique ICE credentials and candidates. Each
m= section MUST include the following:
If present in the offer, an "a=mid" line, as specified in
, Section 9.1. The "mid" value
MUST match that specified in the offer.

An "a=rtcp" line, as specified in ,
Section 2.1, containing the dummy value "9 IN IP6 ::", because
no candidates have yet been gathered.

If a local MediaStreamTrack has been associated, an "a=msid"
line, as specified in , Section 2.

Depending on the directionality of the offer, the disposition
of any associated remote MediaStreamTrack, and the presence of an
associated local MediaStreamTrack, the appropriate directionality
attribute, as specified in ,
Section 6.1. If the offer was sendrecv, the remote
MediaStreamTrack is still "live", and a local
MediaStreamTrack has been associated, the directionality MUST be
set to sendrecv. If the offer was sendonly, and the remote
MediaStreamTrack is still "live", the directionality MUST be set
to recvonly. If the offer was recvonly, and a local
MediaStreamTrack has been associated, the directionality MUST be
set to sendonly. If the offer was inactive, the directionality
MUST be set to inactive.

For each supported codec that is present in the offer,
"a=rtpmap" and "a=fmtp" lines, as specified in , Section 6, and , Section 6.1. For audio, the codecs
specified in ,
Section 3, MUST be supported. Note that for simplicity, the
answerer MAY use different payload types for codecs than the
offerer, as this is not prohibited by Section 6.1.

If this m= section is for media with configurable frame sizes,
e.g., audio, an "a=maxptime" line, indicating the smallest of the
maximum supported frame sizes out of all codecs included above, as
specified in , Section 6.

If "rtx" is present in the offer, for each primary codec
where RTP retransmission should be used, a corresponding
"a=rtpmap" line indicating "rtx" with the clock rate of the
primary codec and an "a=fmtp" line that references the payload
type of the primary codec, as specified in , Section 8.1.

For each supported FEC mechanism that is present in the
offer, a corresponding "a=rtpmap" line indicating the desired
FEC codec.

"a=ice-ufrag" and "a=ice-pwd" lines, as specified in , Section 15.4.

If the "trickle" ICE option is present in the offer, an
"a=ice-options" line, with the "trickle" option, as specified in
, Section
4.

An "a=fingerprint" line, as specified in , Section 5; the algorithm used for the
fingerprint MUST match that used in the certificate signature.
An "a=setup" line, as specified in , Section 4, and clarified for use in
DTLS-SRTP scenarios in , Section
5. The role value in the answer MUST be "active" or "passive";
the "active" role is RECOMMENDED.

If present in the offer, an "a=rtcp-mux" line, as specified
in , Section 5.1.1.

If present in the offer, an "a=rtcp-rsize" line, as specified
in , Section 5.

For each supported RTP header extension that is present in
the offer, an "a=extmap" line, as specified in , Section 5. The list of header
extensions that SHOULD/MUST be supported is specified in , Section 5.2.
[TODO: Ensure this contains MID header] Any
header extensions that require encryption MUST be specified as
indicated in , Section 4.

For each supported RTCP feedback mechanism that is present in
the offer, an "a=rtcp-fb" line, as specified in , Section 4.2. The list of RTCP feedback
mechanisms that SHOULD/MUST be supported is specified in , Section 5.1.

If a local MediaStreamTrack has been associated, an "a=ssrc"
line, as specified in , Section
4.1, indicating the SSRC to be used for sending media.

If a local MediaStreamTrack has been associated, and RTX has
been negotiated for this m= section, another "a=ssrc" line with
the RTX SSRC, and an "a=ssrc-group" line, as specified in , Section 4.2, with semantics set to
"FID" and including the primary and RTX SSRCs.

If a local MediaStreamTrack has been associated, and FEC has
been negotiated for this m= section, another "a=ssrc" line with
the FEC SSRC, and an "a=ssrc-group" line, as specified in , Section 4.2, with semantics set to
"FEC" and including the primary and FEC SSRCs.

[OPEN ISSUE: Handling of a=imageattr]

If a data channel m= section has been offered, a m= section MUST
also be generated for data. The <media> field MUST be set to
"application" and the <proto> field MUST be set to
exactly match the field in the offer; the "fmt" value
MUST be set to the SCTP port number, as specified in Section 4.1.
[TODO: update this to use a=sctp-port, as indicated in the latest
data channel docs]

Within the data m= section, the "a=mid",
"a=ice-ufrag", "a=ice-pwd", "a=ice-options", "a=candidate",
"a=fingerprint", and "a=setup" lines MUST be included as mentioned
above, along with an "a=sctpmap" line referencing the SCTP port number
and specifying the application protocol indicated in . [OPEN ISSUE: the -01
version of this document is missing this information.]

If "a=group" attributes with semantics of "BUNDLE" are offered,
corresponding session-level "a=group" attributes MUST be added as
specified in . These attributes MUST
have semantics "BUNDLE", and MUST include all the mid identifiers from
the offered BUNDLE groups that have not been rejected.
Note that regardless of the presence of "a=bundle-only" in the offer,
no m= sections in the answer should have an "a=bundle-only" line.
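The answer's BUNDLE group can be derived from the offered group, as described above. The following minimal Python sketch illustrates this. The function name is hypothetical, and preserving the offered order of the mid identifiers is an assumption of the sketch. The sketch only builds the group line; it does not emit any "a=bundle-only" line.

```python
from typing import Set

GROUP_PREFIX = "a=group:"


def answer_bundle_group(offered_group: str, rejected_mids: Set[str]) -> str:
    """Build the answer's session-level BUNDLE group from an offered one.

    Keeps every mid from the offered BUNDLE group whose m= section has
    not been rejected, in the order in which it was offered.
    """
    if not offered_group.startswith(GROUP_PREFIX):
        raise ValueError("not an a=group line: " + offered_group)
    tokens = offered_group[len(GROUP_PREFIX):].split()
    if not tokens or tokens[0] != "BUNDLE":
        raise ValueError("group semantics are not BUNDLE: " + offered_group)
    kept = [mid for mid in tokens[1:] if mid not in rejected_mids]
    return " ".join([GROUP_PREFIX + "BUNDLE"] + kept)
```

For example, if the offered group "a=group:BUNDLE a1 d1 v1" is answered with the d1 section rejected, the answer group is "a=group:BUNDLE a1 v1".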
Attributes that are common between all m= sections MAY be moved
to session-level, if explicitly defined to be valid at
session-level.

The attributes prohibited in the creation of offers are also
prohibited in the creation of answers.

The createAnswer method takes as a parameter an RTCAnswerOptions
object. Special processing is performed when generating an SDP
description if the following options are present.

Handling of the "VoiceActivityDetection" option in answers is the same as is indicated for offers in .
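The following minimal Python sketch shows the VoiceActivityDetection handling described for offers, which also applies to answers. It makes several assumptions:

- The Codec type and the function name are hypothetical.
- Opus is treated as the only codec with internal silence suppression, requested through "usedtx=1".
- One CN entry is added per distinct clock rate. This is one reading of "CN codecs for each offered audio codec", since CN payload types are defined per clock rate.
- next_dynamic_pt is a hypothetical payload type allocator. A real implementation might instead use the static payload type 13 for CN at 8000 Hz.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Codec:
    name: str
    clock_rate: int
    payload_type: int
    fmtp: List[str] = field(default_factory=list)


# Assumption for this sketch: the only codec with internal silence
# suppression is Opus, requested with the "usedtx=1" fmtp parameter.
INTERNAL_DTX_PARAMS: Dict[str, str] = {"opus": "usedtx=1"}


def apply_voice_activity_detection(codecs: List[Codec], next_dynamic_pt: int) -> List[Codec]:
    """Return a codec list that requests silence suppression for received audio."""
    result: List[Codec] = []
    cn_clock_rates = set()
    for codec in codecs:
        dtx_param = INTERNAL_DTX_PARAMS.get(codec.name.lower())
        if dtx_param is not None:
            # Internal silence suppression: signal it through fmtp.
            result.append(replace(codec, fmtp=codec.fmtp + [dtx_param]))
        else:
            # Otherwise a comfort noise (CN) codec is needed at this clock rate.
            result.append(codec)
            cn_clock_rates.add(codec.clock_rate)
    for rate in sorted(cn_clock_rates):
        result.append(Codec("CN", rate, next_dynamic_pt))
        next_dynamic_pt += 1
    return result
```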
It is possible to change elements in the SDP returned from createOffer
before passing it to setLocalDescription. When an implementation
receives modified SDP it MUST either:
Accept the changes and adjust its behavior to match the SDP.

Reject the changes and return an error via the error callback.
Changes MUST NOT be silently ignored.
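The following minimal Python sketch shows how an implementation might reject, rather than silently ignore, changes to the elements that are solely under browser control: the number, type and port of m= lines, the ICE credentials, and the ICE candidates. The helper names are hypothetical. Comparing these elements as an ordered list is a simplification, and it also rejects a mere reordering of candidates.

```python
from typing import List


def locked_elements(sdp: str) -> List[str]:
    """Collect the SDP elements that are solely under browser control."""
    locked = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # Media type and port are locked; the format list is not,
            # since removing or reordering codecs must be honored.
            fields = line[len("m="):].split()
            locked.append("m=" + " ".join(fields[:2]))
        elif line.startswith(("a=ice-ufrag:", "a=ice-pwd:", "a=candidate:")):
            locked.append(line)
    return locked


def check_modified_local_sdp(generated: str, supplied: str) -> None:
    """Reject, rather than silently ignore, changes to locked elements."""
    if locked_elements(generated) != locked_elements(supplied):
        raise ValueError("SDP modifications to browser-controlled transport attributes are not allowed")
```

In this design, reordering codecs on an m= line passes the check. Changing the ufrag, dropping an m= line, or altering a candidate raises an error, which is then reported through the error callback.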
The following elements of the SDP media description MUST NOT be
changed between the createOffer and the setLocalDescription, since they
reflect transport attributes that are solely under browser control, and
the browser MUST NOT honor an attempt to change them:

The number, type and port number of m= lines.

The generated ICE credentials (a=ice-ufrag and a=ice-pwd).

The set of ICE candidates and their parameters (a=candidate).

The following modifications, if done by the application to a description
between createOffer/createAnswer and the setLocalDescription, MUST be
honored by the browser:

Remove or reorder codecs (m=).

The following parameters may be controlled by options passed into
createOffer/createAnswer. As an open issue, these changes may also be
performed by manipulating the SDP returned from
createOffer/createAnswer, as indicated above, as long as the
capabilities of the endpoint are not exceeded (e.g., asking for a
resolution greater than what the endpoint can encode):

[[OPEN ISSUE: This is a placeholder for other modifications, which
we may continue adding as use cases appear.]]

Implementations MAY choose to either honor or reject any elements not
listed in the above two categories, but must do so explicitly as
described at the beginning of this section. Note that future standards
may add new SDP elements to the list of elements which must be accepted
or rejected, but due to version skew, applications must be
prepared for implementations to accept changes which must be
rejected and vice versa.

The application can also modify the SDP to reduce the capabilities in
the offer it sends to the far side or the offer that it installs from the
far side in any way the application sees fit, as long as it is a
valid SDP offer and specifies a subset of what was in the original offer.
This is safe because the answer is not permitted to expand capabilities
and therefore will just respond to what is actually in the offer.

As always, the application is solely responsible for what it sends to
the other party, and all incoming SDP will be processed by the browser
to the extent of its capabilities. It is an error to assume that all SDP
is well-formed; however, one should be able to assume that any
implementation of this specification will be able to process, as a
remote offer or answer, unmodified SDP coming from any other
implementation of this specification.

Note that this example section shows several SDP fragments. To
fit within 72 columns, some of the lines in SDP have been split into
multiple lines, where leading whitespace indicates that a line
is a continuation of the previous line. In addition, some blank lines
have been added to improve readability but are not valid in SDP.

More examples of SDP for WebRTC call flows can be found in .

This section shows a very simple example that sets up a
minimal audio / video call between two browsers and does not use
trickle ICE. The example in the following section provides a
more realistic example of what would happen in a normal browser
to browser connection.

The flow shows Alice's browser initiating the session to
Bob's browser. The messages from Alice's JS to Bob's JS are
assumed to flow over some signaling protocol via a web
server. The JS on both Alice's side and Bob's side waits
for all candidates before sending the offer or answer,
so the offers and answers are complete. Trickle ICE is
not used. Both Alice and Bob are using the default policy of balanced.
The SDP for |offer-A1| looks like:
The SDP for |answer-A1| looks like:

This section shows a typical example of a session between two
browsers setting up an audio channel and a data channel. Trickle
ICE is used in full trickle mode with a policy of max-bundle-and-rtcp-mux
and a single TURN server.
Later, two video flows, one for the
presenter and one for screen sharing, are added to the session.
This example shows Alice's browser initiating the
session to Bob's browser. The messages from Alice's JS to Bob's
JS are assumed to flow over some signaling protocol via a web
server.
The SDP for |offer-B1| looks like:
The SDP for |candidate-B1| looks like:
The SDP for |candidate-B2| looks like:
The SDP for |candidate-B3| looks like:
The SDP for |answer-B1| looks like:
The SDP for |candidate-B4| looks like:
The SDP for |candidate-B5| looks like:
The SDP for |candidate-B6| looks like:
The SDP for |offer-B2| looks like:
(note the increment of the version number in the o= line,
and the c= and a=rtcp lines, which indicate the local candidate
that was selected)
The SDP for |answer-B2| looks like:
(note the use of setup:passive to maintain
the existing DTLS roles, and the use of a=recvonly to
indicate that the video streams are one-way)
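The note on |answer-B2| about using setup:passive to maintain the existing DTLS roles can be expressed as a small decision function. The following Python sketch is an illustration only; the function name and the role strings are hypothetical. The mapping relies on the rule that the "active" endpoint initiates the DTLS handshake and therefore acts as the DTLS client. For an initial answer to an "actpass" offer, the sketch returns "active", which is the RECOMMENDED role.

```python
from typing import Optional


def answer_setup_value(offer_setup: str, current_dtls_role: Optional[str]) -> str:
    """Choose the a=setup value for an answer.

    current_dtls_role is None before any DTLS association exists, or
    "client"/"server" once roles have been established.
    """
    # An "active" or "passive" offer leaves only the complementary role.
    if offer_setup == "active":
        return "passive"
    if offer_setup == "passive":
        return "active"
    if offer_setup != "actpass":
        raise ValueError("unexpected a=setup value: " + offer_setup)
    if current_dtls_role is None:
        # Initial answer: the "active" role is RECOMMENDED.
        return "active"
    # Subsequent answer: keep the existing roles. The active endpoint
    # initiates the DTLS handshake, i.e. it is the DTLS client.
    return "active" if current_dtls_role == "client" else "passive"
```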
The IETF has published separate documents
describing the security architecture for WebRTC as a whole.
The remainder of this section describes security considerations
for this document.
While formally the JSEP interface is an API, it is better to
think of it as an Internet protocol, with the JS being untrustworthy
from the perspective of the browser. Thus, the threat model of applies. In particular, JS can call the API in any
order and with any inputs, including malicious ones. This is
particularly relevant when we consider the SDP which is passed to
setLocalDescription(). While correct API usage requires that the
application pass in SDP which was derived from createOffer() or
createAnswer() (perhaps suitably modified as described in ), there is no guarantee that
applications do so. The browser MUST be prepared for the JS to pass in
bogus data instead.
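Because the SDP passed to setLocalDescription() may be arbitrary data, the browser has to parse it defensively. The following minimal Python sketch shows only the first, syntactic layer of such parsing. The function name and the line limit are hypothetical. Every line must have the <type>=<value> form from the SDP specification, and any violation is reported as an error rather than guessed around. Semantic checks, such as those described for modified SDP earlier, would follow.

```python
import re
from typing import List, Tuple

# Every SDP line has the form <type>=<value>, where <type> is a single
# character. This sketch accepts only lowercase letters as types and
# rejects NUL, CR, or LF characters inside a value.
SDP_LINE = re.compile(r"[a-z]=[^\x00\r\n]*")

# Hypothetical resource limit to bound the work done on hostile input.
MAX_LINES = 10000


def parse_untrusted_sdp(sdp: str) -> List[Tuple[str, str]]:
    """Split untrusted SDP into (type, value) pairs, raising ValueError
    instead of guessing at malformed input."""
    lines = re.split(r"\r?\n", sdp)
    if lines and lines[-1] == "":
        lines.pop()  # tolerate the terminating line break
    if len(lines) > MAX_LINES:
        raise ValueError("SDP has too many lines")
    if not lines or lines[0] != "v=0":
        raise ValueError("SDP must begin with v=0")
    parsed = []
    for number, line in enumerate(lines, start=1):
        if SDP_LINE.fullmatch(line) is None:
            raise ValueError("malformed SDP line %d" % number)
        parsed.append((line[0], line[2:]))
    return parsed
```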
Conversely, the application programmer MUST recognize that the
JS does not have complete control of browser behavior. One case that
bears particular mention is that editing ICE candidates out of the SDP
or suppressing trickled candidates does not have the expected
behavior: implementations will still perform checks from those
candidates even if they are not sent to the other side. Thus, for
instance, it is not possible to prevent the remote peer from
learning your public IP address by removing server reflexive
candidates. Applications which wish to conceal their public IP address
should instead configure the ICE agent to use only relay candidates.
This document requires no actions from IANA.

Significant text was incorporated into the draft, and review was
provided, by Harald Alvestrand and Suhas Nandakumar. Dan
Burnett, Neil Stratford, Eric Rescorla, Anant Narayanan, Andrew Hutton,
Richard Ejzak, Adam
Bergkvist and Matthew Kaufman all provided valuable feedback on this proposal.
Interactive Connectivity Establishment (ICE): A Protocol for
Network Address Translator (NAT) Traversal for Offer/Answer
Protocols

The Session Description Protocol (SDP) Grouping Framework

In this specification, we define a framework to group "m" lines
in the Session Description Protocol (SDP) for different purposes.
This framework uses the "group" and "mid" SDP attributes, both of
which are defined in this specification. Additionally, we specify
how to use the framework for two different purposes: for lip
synchronization and for receiving a media flow consisting of
several media streams on different transport addresses. This
document obsoletes RFC 3388. [STANDARDS-TRACK]

Multiplexing RTP Data and Control Packets on a Single Port

This memo discusses issues that arise when multiplexing RTP
data packets and RTP Control Protocol (RTCP) packets on a single
UDP port. It updates RFC 3550 and RFC 3551 to describe when such
multiplexing is and is not appropriate, and it explains how the
Session Description Protocol (SDP) can be used to signal
multiplexed sessions. [STANDARDS-TRACK]

Extended RTP Profile for Real-time Transport Control Protocol
(RTCP)-Based Feedback (RTP/AVPF)

Real-time media streams that use RTP are, to some degree,
resilient against packet losses. Receivers may use the base
mechanisms of the Real-time Transport Control Protocol (RTCP) to
report packet reception statistics and thus allow a sender to
adapt its transmission behavior in the mid-term. This is the sole
means for feedback and feedback-based error repair (besides a few
codec-specific mechanisms). This document defines an extension to
the Audio-visual Profile (AVP) that enables receivers to provide,
statistically, more immediate feedback to the senders and thus
allows for short-term adaptation and efficient feedback-based
repair mechanisms to be implemented. This early feedback profile
(AVPF) maintains the AVP bandwidth constraints for RTCP and
preserves scalability to large groups. [STANDARDS-TRACK]

Extended Secure RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/SAVPF)

An RTP profile (SAVP) for secure real-time communications and
another profile (AVPF) to provide timely feedback from the
receivers to a sender are defined in RFC 3711 and RFC 4585,
respectively. This memo specifies the combination of both profiles
to enable secure RTP communications with feedback.
[STANDARDS-TRACK]

SIP: Session Initiation Protocol

This document describes Session Initiation Protocol (SIP), an
application-layer control (signaling) protocol for creating,
modifying, and terminating sessions with one or more participants.
These sessions include Internet telephone calls, multimedia
distribution, and multimedia conferences. [STANDARDS-TRACK]

WebRTC Audio Codec and Processing Requirements

This document outlines the audio codec and processing
requirements for WebRTC client application and endpoint
devices.

Stream Control Transmission Protocol (SCTP)-Based Media
Transport in the Session Description Protocol (SDP)

SCTP (Stream Control Transmission Protocol) is a transport
protocol used to establish associations between two endpoints.
This document describes how to express media transport over SCTP
in SDP (Session Description Protocol). This document defines the
'SCTP', 'SCTP/DTLS' and 'DTLS/SCTP' protocol identifiers for
SDP.

Connection-Oriented Media Transport over the Transport Layer
Security (TLS) Protocol in the Session Description Protocol
(SDP)

This document specifies how to establish secure
connection-oriented media transport sessions over the Transport
Layer Security (TLS) protocol using the Session Description
Protocol (SDP). It defines a new SDP protocol identifier,
'TCP/TLS'. It also defines the syntax and semantics for an SDP
'fingerprint' attribute that identifies the certificate that will
be presented for the TLS session. This mechanism allows media
transport over TLS connections to be established securely, so long
as the integrity of session descriptions is
assured. This document extends and updates RFC
4145. [STANDARDS-TRACK]

TCP-Based Media Transport in the Session Description Protocol
(SDP)

This document describes how to express media transport over TCP
using the Session Description Protocol (SDP). It defines the SDP
'TCP' protocol identifier, the SDP 'setup' attribute, which
describes the connection setup procedure, and the SDP 'connection'
attribute, which handles connection reestablishment.
[STANDARDS-TRACK]

IANA registration of SDP 'proto' attribute
for transporting RTP Media over TCP under various RTP profiles.
Cisco Systems Inc, 707 Tasman Drive, San Jose, CA 95134, USA,
snandaku@cisco.com
RTP provides end-to-end network transport functions suitable for
applications transmitting real-time data, such as audio, video or
simulation data, over multicast or unicast network services. The data
transport is augmented by a control protocol (RTCP) to allow monitoring
of the data delivery in a manner scalable to large multicast networks,
and to provide minimal control and identification functionality.
This document describes how to express RTP media transport over TCP
in SDP (Session Description Protocol) under various configurations.
This document defines 'TCP/RTP/AVPF', 'TCP/RTP/SAVP', 'TCP/RTP/SAVPF',
'TCP/TLS/RTP/SAVP', 'TCP/TLS/RTP/SAVPF' protocol identifiers for SDP.

A Framework for SDP Attributes when Multiplexing

A General Mechanism for RTP Header Extensions

This document provides a general mechanism to use the header
extension feature of RTP (the Real-Time Transport Protocol). It
provides the option to use a small number of small extensions in
each RTP packet, where the universe of possible extensions is
large and registration is de-centralized. The actual extensions in
use in a session are signaled in the setup information for that
session. [STANDARDS-TRACK]

Encryption of Header Extensions in the Secure Real-time
Transport Protocol (SRTP)

The Secure Real-time Transport Protocol (SRTP) provides
authentication, but not encryption, of the headers of Real-time
Transport Protocol (RTP) packets. However, RTP header extensions
may carry sensitive information for which participants in
multimedia sessions want confidentiality. This document provides a
mechanism, extending the mechanisms of SRTP, to selectively
encrypt RTP header extensions in SRTP. This
document updates RFC 3711, the Secure Real-time Transport Protocol
specification, to require that all future SRTP encryption
transforms specify how RTP header extensions are to be
encrypted.

Web Real-Time Communication (WebRTC): Media Transport and Use
of RTP

The Web Real-Time Communication (WebRTC) framework provides
support for direct interactive rich communication using audio,
video, text, collaboration, games, etc. between two peers'
web-browsers. This memo describes the media transport aspects of
the WebRTC framework. It specifies how the Real-time Transport
Protocol (RTP) is used in the WebRTC context, and gives
requirements for which RTP features, profiles, and extensions need
to be supported.

Multiplexing Negotiation Using Session Description Protocol
(SDP) Port Numbers

This specification defines a new SDP Grouping Framework
extension, "BUNDLE", that can be used with the Session Description
Protocol (SDP) Offer/Answer mechanism to negotiate the usage of
bundled media, which refers to the usage of a single 5-tuple for
media associated with multiple SDP media descriptions ("m="
lines).

Cross Session Stream Identification in the Session
Description Protocol

This document specifies a grouping mechanism for RTP media
streams that can be used to specify relations between media
streams. This mechanism is used to signal the association between
the SDP concept of "m-line" and the WebRTC concept of
"MediaStream" / "MediaStreamTrack" using SDP signaling. This
document is a work item of the MMUSIC WG, whose discussion list is
mmusic@ietf.org.

Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)

Security Considerations for WebRTC

The Real-Time Communications on the Web (RTCWEB) working group is tasked with standardizing protocols for real-time communications between Web browsers, generally called "WebRTC". The major use cases for WebRTC technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems (e.g., SIP-based soft phones) WebRTC communications are directly controlled by a Web server, which poses new security challenges. For instance, a Web browser might expose a JavaScript API which allows a server to place a video call. Unrestricted access to such an API would allow any site which a user visited to "bug" a user's computer, capturing any activity which passed in front of their camera. This document defines the WebRTC threat model and analyzes the security threats of WebRTC in that model.

WebRTC Security Architecture

The Real-Time Communications on the Web (RTCWEB) working group is tasked with standardizing protocols for enabling real-time communications within user-agents using web technologies (commonly called "WebRTC"). This document defines the security architecture for WebRTC.

Guidelines for Writing RFC Text on Security Considerations

All RFCs are required to have a Security Considerations section. Historically, such sections have been relatively weak. This document provides guidelines to RFC authors on how to write a good Security Considerations section. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

Key words for use in RFCs to Indicate
Requirement Levels
Harvard University, 1350 Mass. Ave., Cambridge, MA 02138, +1 617 495 3864, sob@harvard.edu

An Offer/Answer Model with Session Description Protocol
(SDP)

SDP: Session Description Protocol

WebRTC Data Channel Protocol

The Web Real-Time Communication (WebRTC) working group is
charged to provide protocols to support for direct interactive
rich communication using audio, video, and data between two peers'
web-browsers. This document specifies an actual (minor) protocol
for how the JS-layer DataChannel objects provide the data channels
between the peers.
Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)
The Session Description Protocol (SDP) is used to describe the parameters of media streams used in multimedia sessions. When a session requires multiple ports, SDP assumes that these ports have consecutive numbers. However, when the session crosses a network address translation device that also uses port mapping, the ordering of ports can be destroyed by the translation. To handle this, we propose an extension attribute to SDP.
Trickle ICE: Incremental Provisioning of Candidates for the
Interactive Connectivity Establishment (ICE) Protocol

This document describes an extension to the Interactive
Connectivity Establishment (ICE) protocol that allows ICE agents
to send and receive candidates incrementally rather than
exchanging complete lists. With such incremental provisioning, ICE
agents can begin connectivity checks while they are still
gathering candidates and considerably shorten the time necessary
for ICE processing to complete. The above mechanism is also
referred to as "trickle ICE".

Session Description Protocol (SDP) Bandwidth Modifiers for
RTP Control Protocol (RTCP) Bandwidth

This document defines an extension to the Session Description
Protocol (SDP) to specify two additional modifiers for the
bandwidth attribute. These modifiers may be used to specify the
bandwidth allowed for RTP Control Protocol (RTCP) packets in a
Real-time Transport Protocol (RTP) session. [STANDARDS-TRACK]

Source-Specific Media Attributes in the Session Description
Protocol (SDP)

The Session Description Protocol (SDP) provides mechanisms to
describe attributes of multimedia sessions and of individual media
streams (e.g., Real-time Transport Protocol (RTP) sessions) within
a multimedia session, but does not provide any mechanism to
describe individual media sources within a media stream. This
document defines a mechanism to describe RTP media sources, which
are identified by their synchronization source (SSRC) identifiers,
in SDP, to associate attributes with these sources, and to express
relationships among sources. It also defines several source-level
attributes that can be used to describe properties of media
sources. [STANDARDS-TRACK]
Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences
This memo discusses benefits and issues that arise when
allowing Real-time Transport Control Protocol (RTCP) packets to be
transmitted with reduced size. The size can be reduced if the
rules on how to create compound packets outlined in RFC 3550 are
removed or changed. Based on that analysis, this memo defines
certain changes to the rules to allow feedback messages to be sent
as Reduced-Size RTCP packets under certain conditions when using
the RTP/AVPF (Real-time Transport Protocol / Audio-Visual Profile
with Feedback) profile (RFC 4585). This document updates RFC 3550,
RFC 3711, and RFC 4585. [STANDARDS-TRACK]
Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)
This document describes how to manage early media in the
Session Initiation Protocol (SIP) using two models: the gateway
model and the application server model. It also describes the
inputs one needs to consider in defining local policies for
ringing tone generation. This memo provides information for the
Internet community.
RTP Retransmission Payload Format
RTP retransmission is an effective packet loss recovery
technique for real-time applications with relaxed delay bounds.
This document describes an RTP payload format for performing
retransmissions. Retransmitted RTP packets are sent in a separate
stream from the original RTP stream. It is assumed that feedback
from receivers to senders is available. In particular, it is
assumed that Real-time Transport Control Protocol (RTCP) feedback
as defined in the extended RTP profile for RTCP-based feedback
(denoted RTP/AVPF) is available in this memo.
[STANDARDS-TRACK]
Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)
SDP for the WebRTC
The Web Real-Time Communication (WebRTC) [WEBRTC] working group
is charged to provide protocol support for direct interactive rich
communication using audio, video, and data between two peers' web
browsers. Within the WebRTC framework, the Session Description
Protocol (SDP) [RFC4566] is used for negotiating session
capabilities between the peers. Such a negotiation happens based
on the SDP Offer/Answer exchange mechanism described in
RFC 3264 [RFC3264]. This document serves an introductory purpose in
describing the role of SDP for the most common WebRTC use-cases.
The SDP examples provided in this document are still a work in
progress, but aim to align as closely as possible with the evolving standards.
Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer Security (DTLS)
This document specifies how to use the Session Initiation
Protocol (SIP) to establish a Secure Real-time Transport Protocol
(SRTP) security context using the Datagram Transport Layer
Security (DTLS) protocol. It describes a mechanism of transporting
a fingerprint attribute in the Session Description Protocol (SDP)
that identifies the key that will be presented during the DTLS
handshake. The key exchange travels along the media path as
opposed to the signaling path. The SIP Identity mechanism can be
used to protect the integrity of the fingerprint attribute from
modification by intermediate proxies. [STANDARDS-TRACK]
Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)
This document describes a Datagram Transport Layer Security (DTLS) extension to establish keys for Secure RTP (SRTP) and Secure RTP Control Protocol (SRTCP) flows. DTLS keying happens on the media path, independent of any out-of-band signalling channel present. [STANDARDS-TRACK]
Session Description Protocol (SDP) Security Descriptions for Media Streams
WebRTC 1.0: Real-time Communication Between Browsers

Note: This section will be removed by the RFC Editor before publication.

Changes in draft-08:
- Added new example section and removed old examples in appendix.
- Fixed <proto> field handling.
- Added text describing a=rtcp attribute.
- Reworked handling of OfferToReceiveAudio and OfferToReceiveVideo per discussion at IETF 90.
- Reworked trickle ICE handling and its impact on m= and c= lines per discussion at interim.
- Added max-bundle-and-rtcp-mux policy.
- Added description of maxptime handling.
- Updated ICE candidate pool default to 0.
- Resolved open issues around AppID/receiver-ID.
- Reworked and expanded how changes to the ICE configuration are handled.
- Updated some references.
- Editorial clarifications.

Changes in draft-07:
- Expanded discussion of VAD and Opus DTX.
- Added a security considerations section.
- Rewrote the section on modifying SDP to require implementations to clearly indicate whether any given modification is allowed.
- Clarified impact of IceRestart on CreateOffer in local-offer state.
- Added guidance on whether attributes should be defined at the media level or the session level.
- Renamed "default" bundle policy to "balanced".
- Removed default ICE candidate pool size and clarified how it works.
- Defined a canonical order for assignment of MSTs to m= lines.
- Removed discussion of rehydration.
- Added Eric Rescorla as a draft editor.
- Cleaned up references.
- Editorial cleanup.

Changes in draft-06:
- Reworked handling of m= line recycling.
- Added handling of BUNDLE and bundle-only.
- Clarified handling of rollback.
- Added text describing the ICE Candidate Pool and its behavior.
- Allowed OfferToReceiveX to create multiple recvonly m= sections.

Changes in draft-05:
- Fixed several issues identified in the createOffer/Answer sections during document review.
- Updated references.

Changes in draft-04:
- Filled in sections on createOffer and createAnswer.
- Added SDP examples.
- Fixed references.

Changes in draft-03:
- Added text describing relationship to W3C specification.

Changes in draft-02:
- Converted from nroff.
- Removed comparisons to old approaches abandoned by the working group.
- Removed material that has moved to the W3C specification.
- Aligned SDP handling with the W3C draft.
- Clarified section on forking.

Changes in draft-01:
- Added diagrams for architecture and state machine.
- Added sections on forking and rehydration.
- Clarified meaning of "pranswer" and "answer".
- Reworked how ICE restarts and media directions are controlled.
- Added list of parameters that can be changed in a description.
- Updated suggested API and examples to match latest thinking.
- Moved suggested API and examples to an appendix.

Changes in draft-00:
- Migrated from draft-uberti-rtcweb-jsep-02.