Network Working Group                                        C. Jennings
Internet-Draft                                                     Cisco
Intended status: Standards Track                          March 18, 2018
Expires: September 19, 2018


                          Modular Media Stack
                  draft-jennings-dispatch-new-media-01

Abstract

   A sketch of a proposal for a modular media stack for interactive
   communications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 19, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Jennings               Expires September 19, 2018               [Page 1]

Internet-Draft                  new-media                     March 2018


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Overview  . . . . . . . . . . . . . . . . . . . . . . . . . .   4
   4.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
   5.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .   5
   6.  Connectivity Layer  . . . . . . . . . . . . . . . . . . . . .   5
     6.1.  Snowflake - New ICE . . . . . . . . . . . . . . . . . . .   6
     6.2.  STUN2 . . . . . . . . . . . . . . . . . . . . . . . . . .   6
       6.2.1.  STUN2 Request . . . . . . . . . . . . . . . . . . . .   6
       6.2.2.  STUN2 Response  . . . . . . . . . . . . . . . . . . .   6
     6.3.  TURN2 . . . . . . . . . . . . . . . . . . . . . . . . . .   7
   7.  Transport Layer . . . . . . . . . . . . . . . . . . . . . . .   8
   8.  Media Layer - RTP3  . . . . . . . . . . . . . . . . . . . . .   9
     8.1.  RTP Meta Data . . . . . . . . . . . . . . . . . . . . . .  12
     8.2.  Securing the messages . . . . . . . . . . . . . . . . . .  12
     8.3.  Sender requests . . . . . . . . . . . . . . . . . . . . .  12
     8.4.  Data Codecs . . . . . . . . . . . . . . . . . . . . . . .  13
     8.5.  Media Keep Alive  . . . . . . . . . . . . . . . . . . . .  13
     8.6.  Forward Error Correction  . . . . . . . . . . . . . . . .  13
     8.7.  MTI Codecs  . . . . . . . . . . . . . . . . . . . . . . .  13
       8.7.1.  Audio . . . . . . . . . . . . . . . . . . . . . . . .  13
       8.7.2.  Video . . . . . . . . . . . . . . . . . . . . . . . .  13
       8.7.3.  Annotation  . . . . . . . . . . . . . . . . . . . . .  14
       8.7.4.  Application Data Channels . . . . . . . . . . . . . .  14
       8.7.5.  Reverse Requests & Stats  . . . . . . . . . . . . . .  14
     8.8.  Message Key Agreement . . . . . . . . . . . . . . . . . .  15
   9.  Control Layer . . . . . . . . . . . . . . . . . . . . . . . .  15
     9.1.  Transport Capabilities API  . . . . . . . . . . . . . . .  15
     9.2.  Media Capabilities API  . . . . . . . . . . . . . . . . .  15
     9.3.  Transport Configuration API . . . . . . . . . . . . . . .  16
     9.4.  Media Configuration API . . . . . . . . . . . . . . . . .  16
     9.5.  Transport Metrics . . . . . . . . . . . . . . . . . . . .  18
     9.6.  Flow Metrics API  . . . . . . . . . . . . . . . . . . . .  18
     9.7.  Stream Metrics API  . . . . . . . . . . . . . . . . . . .  19
   10. Call Signalling - JABBER2 . . . . . . . . . . . . . . . . . .  19
   11. Signalling Examples . . . . . . . . . . . . . . . . . . . . .  20
     11.1.  Simple Audio Example . . . . . . . . . . . . . . . . . .  20
       11.1.1.  simple audio advertisement . . . . . . . . . . . . .  20
       11.1.2.  simple audio proposal  . . . . . . . . . . . . . . .  21
     11.2.  Simple Video Example . . . . . . . . . . . . . . . . . .  22
       11.2.1.  Proposal sent to camera  . . . . . . . . . . . . . .  23
     11.3.  Simulcast Video Example  . . . . . . . . . . . . . . . .  24
     11.4.  FEC Example  . . . . . . . . . . . . . . . . . . . . . .  24
       11.4.1.  Advertisement includes a FEC codec.  . . . . . . . .  24
       11.4.2.  Proposal sent to camera  . . . . . . . . . . . . . .  25
   12. Switched Forwarding Unit (SFU)  . . . . . . . . . . . . . . .  26
     12.1.  Software Defined Networking  . . . . . . . . . . . . . .  26
     12.2.  Vector Packet Processors . . . . . . . . . . . . . . . .  27
     12.3.  Information Centric Networking . . . . . . . . . . . . .  27
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  27
   14. Other Work  . . . . . . . . . . . . . . . . . . . . . . . . .  27
   15. Style of specification  . . . . . . . . . . . . . . . . . . .  27
   16. Informative References  . . . . . . . . . . . . . . . . . . .  28
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  28

1.  Introduction

   This draft is an accumulation of various ideas some people are
   thinking about.  Most of them are fairly separable and could be
   morphed into existing protocols, though this draft takes a blank
   sheet of paper approach to considering what would be the best thing
   if we were starting from scratch.  With that in place, it is possible
   to ask which of these ideas make sense to back patch into existing
   protocols.

2.  Goals

   o  Better connectivity by enabling situations where asymmetric media
      is possible.

   o  Design for SFUs (Switched Forwarding Units).  Design for
      multiparty calls first, then consider two party calls as a
      specialized subcase of that.

   o  Designed for client-server deployments with server based control
      of clients

   o  Faster setup

   o  Pluggable congestion control

   o  Much, much simpler

   o  End to end security

   o  Remove the ability to use STUN / TURN in DDoS reflection attacks

   o  Ability for the receiver of video to tell the sender about size
      changes of the display window such that the sender can match

   o  Eliminate the problems with ROC in SRTP

   o  Address the reasons people have not moved from SDES to DTLS-SRTP

   o  Separation of call setup and ongoing call / conference control

   o  Make codec negotiation more generic so that it works for future
      codecs

   o  Remove ICE's need for global pacing, which is more or less
      impossible on general purpose devices like PCs

3.  Overview

   This draft proposes a new media stack to replace the existing stack
   of RTP, DTLS-SRTP, and SDP Offer/Answer.  The key parts of this stack
   are the connectivity layer, the transport layer, the media layer, a
   control API, and the signalling layer.

   The connectivity layer uses a simplified version of ICE, called
   snowflake [I-D.jennings-dispatch-snowflake], to find connectivity
   between endpoints and change the connectivity from one address to
   another as different networks become available or disappear.  It is
   based on ideas from [I-D.jennings-mmusic-ice-fix].

   The transport layer uses QUIC to provide a hop by hop encrypted,
   congestion controlled transport of media.  Although QUIC does not
   currently have all of the partial reliability mechanisms to make this
   work, this draft assumes that they will be added to QUIC.

   The media layer uses existing codecs and packages them along with
   extra header information to provide information about when the
   sequence needs to be played back, which camera it came from, and
   which media streams are to be synchronized.

   The control API is an abstract API that provides a way for the media
   stack to report its capabilities and features and a way for an
   application to tell the media stack how it should be configured.
   Configuration includes what codec to use, size and frame rate of
   video, and where to send the media.

   The signalling layer is based on an advertisement and proposal model.
   Each endpoint can create an advertisement that describes what it
   supports including things like supported codecs and maximum bitrates.
   A proposal can be sent to an endpoint that tells the endpoint exactly
   what media to send and receive and where to send it.  The endpoint
   can accept or reject this proposal in total but cannot change any
   part of it.
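
   As a non-normative illustration of the accept-or-reject rule, the
   following Python sketch checks a proposal against an advertisement.
   The class names, the field names, and the idea of checking only the
   codec and bitrate are assumptions made for this example; this draft
   does not define a concrete data model at this point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLimit:
    """One advertised codec and its maximum bitrate (hypothetical fields)."""
    name: str
    max_bitrate: int  # bits per second


@dataclass(frozen=True)
class StreamRequest:
    """One stream that a proposal asks the endpoint to send or receive."""
    codec: str
    bitrate: int  # bits per second


def evaluate_proposal(advertisement, proposal):
    """Return True to accept the whole proposal, False to reject it.

    There is deliberately no way to return a modified proposal: the
    endpoint accepts or rejects the proposal in total.
    """
    limits = {c.name: c.max_bitrate for c in advertisement}
    for stream in proposal:
        if stream.codec not in limits:
            return False
        if stream.bitrate > limits[stream.codec]:
            return False
    return True


advertisement = [CodecLimit("opus", 128_000), CodecLimit("g711", 64_000)]
ok = evaluate_proposal(advertisement, [StreamRequest("opus", 48_000)])
too_fast = evaluate_proposal(advertisement, [StreamRequest("opus", 256_000)])
unknown = evaluate_proposal(advertisement, [StreamRequest("av1", 1_000_000)])
```

   A single unsupported stream causes the whole proposal to be
   rejected, which keeps the endpoint from negotiating partial changes.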

4.  Terminology

   o  media stream: Stream of information from a single sensor.  For
      example, a video stream from a single camera.  A stream may have
      multiple encodings, for example video at different resolutions.

   o  encoding: An encoded version of a stream.  A given stream may have
      several encodings at different resolutions.  One encoding may
      depend on other encodings, such as with forward error correction
      or in the case of scalable video codecs.

   o  flow: A logical transport between two computers.  Many media
      streams can be transported over a single flow.  The actual IP
      addresses and ports used to transport data in the flow may change
      over time as connectivity changes.

   o  message: some data or media to be sent across the network along
      with metadata about it.  Similar to an RTP packet.

   o  media source: a camera, microphone or other source of data on an
      endpoint

   o  media sink: a speaker, screen, or other destination for data on an
      endpoint

   o  TLV: Tag Length Value.  When used in the draft, the Tag, Length,
      and any integer values are coded as variable length integers
      similar to how this is done in CBOR.
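
   The following Python sketch shows one way such a TLV could be
   coded.  It borrows the unsigned integer layout that CBOR uses
   (values below 24 in one byte, otherwise a marker byte of 24 to 27
   followed by 1, 2, 4, or 8 bytes).  Whether this draft intends
   exactly that layout is an assumption, and the function names are
   hypothetical.

```python
import struct


def encode_uint(value):
    """Encode a non-negative integer using CBOR's unsigned-integer layout."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    if value < 24:
        return bytes([value])
    if value < 1 << 8:
        return bytes([24, value])
    if value < 1 << 16:
        return bytes([25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([27]) + struct.pack(">Q", value)
    raise ValueError("value does not fit in 64 bits")


def decode_uint(data, offset=0):
    """Decode one integer; return (value, offset of the next byte)."""
    first = data[offset]
    if first < 24:
        return first, offset + 1
    sizes = {24: (1, ">B"), 25: (2, ">H"), 26: (4, ">I"), 27: (8, ">Q")}
    if first not in sizes:
        raise ValueError("unsupported initial byte")
    size, fmt = sizes[first]
    start = offset + 1
    (value,) = struct.unpack(fmt, data[start:start + size])
    return value, start + size


def encode_tlv(tag, value):
    """Encode Tag, Length, Value with tag and length as variable-length ints."""
    return encode_uint(tag) + encode_uint(len(value)) + value


def decode_tlv(data, offset=0):
    """Decode one TLV; return (tag, value bytes, offset of the next TLV)."""
    tag, offset = decode_uint(data, offset)
    length, offset = decode_uint(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated TLV")
    return tag, data[offset:end], end
```

   With this layout a small tag and a short value cost two bytes of
   overhead, which matters when headers are attached to every message.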

5.  Architecture

   Much of the deployment architecture of IETF media designs is based
   on a distributed controller for the media stack that is running peer
   to peer in each client.  Nearly all deployments, be they cloud based
   conferencing systems or an enterprise PBX, use a central controller
   that acts as an SBC to try and control each client.  The goal here
   would be a deployment architecture that:

   o  supports a single controller that controls all the devices in a
      given conference or call.  The controller could be in the cloud or
      running on one of the endpoints.

   o  is designed for multiparty conference calls first and treats two
      party calls as a specialized subcase of that

   o  is designed with the assumption that a lightweight SFU (Switched
      Forwarding Unit) is used to distribute media for conference
      calls.

6.  Connectivity Layer

6.1.  Snowflake - New ICE

   All that is needed to discover the connectivity is:

   o  A way to gather some IP/ports that may work using TURN2 relay,
      STUN2, and local addresses.

   o  A controller, which might be running in the cloud, to inform a
      client to send a STUN2 packet from a given source IP/port to a
      given destination IP/port.

   o  A way for the receiver to notify the controller about information
      on received STUN2 packets.

   o  A way for the controller to tell the sender the secret that was in
      the packet, proving the consent of the receiver to receive data,
      so that the sending client can allow media to flow over that
      connection.

   The actual algorithm used to decide what pairs of addresses are
   tested and in what order does not need to be agreed on by both
   sides of the call - only the controller needs to know it.  This
   allows the controller to use machine learning, past history, and
   heuristics to find an optimal connection much faster than something
   like ICE.
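
   Because the ordering is private to the controller, it can be as
   simple or as elaborate as the deployment needs.  The Python sketch
   below orders candidate pairs by the success rate seen in earlier
   calls.  The dictionary layout, the address type labels, and the
   neutral score of 0.5 for unseen pairs are all assumptions made for
   illustration; they are not part of snowflake.

```python
def rank_candidate_pairs(pairs, history):
    """Order (local, remote) address pairs so likely winners are tried first.

    `history` maps a pair of address types, such as ("host", "relay"),
    to a (successes, attempts) tuple from earlier calls.  Pairs with no
    history get a neutral score of 0.5.
    """
    def score(pair):
        local, remote = pair
        key = (local["type"], remote["type"])
        successes, attempts = history.get(key, (0, 0))
        if attempts == 0:
            return 0.5
        return successes / attempts

    # sorted() is stable, so pairs with equal scores keep their input order.
    return sorted(pairs, key=score, reverse=True)


host = {"type": "host", "addr": ("192.0.2.10", 5000)}
relay = {"type": "relay", "addr": ("198.51.100.7", 6000)}
reflexive = {"type": "reflexive", "addr": ("203.0.113.5", 7000)}
remote = {"type": "host", "addr": ("192.0.2.20", 5000)}

history = {("host", "host"): (1, 10), ("relay", "host"): (9, 10)}
ordered = rank_candidate_pairs(
    [(host, remote), (relay, remote), (reflexive, remote)], history
)
order_of_types = [local["type"] for local, _ in ordered]
```

   Neither endpoint needs to know this function exists; each one only
   sends the STUN2 packets the controller asks for.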

   The details of this approach are described in
   [I-D.jennings-dispatch-snowflake].  Many of the ideas in this can be
   traced back to [I-D.kaufman-rtcweb-traversal].

6.2.  STUN2

   The speed of setting up a new media flow is often determined by how
   many STUN2 checks need to be done.  If the STUN2 packets are smaller,
   then the STUN2 checks can be done faster without risk of causing
   congestion.

6.2.1.  STUN2 Request

   A STUN2 request consists of, well, really nothing.  The STUN2 client
   just opens a QUIC connection to the STUN2 server.

6.2.2.  STUN2 Response

   When the STUN2 server receives a new QUIC connection, it responds
   with the IP address and port that the connection came from.

   The client can check it is talking to the correct STUN2 server by
   checking the fingerprint of the certificate.  Protocols like ICE
   would need to exchange these fingerprints instead of all the crazy
   STUN attributes.

   Thanks to Peter Thatcher for proposing STUN over QUIC.

6.3.  TURN2

   TODO: make TURN2 run over QUIC

   Out of band, the client tells the TURN2 server the fingerprint of the
   cert it uses to authenticate with.  The TURN2 server gives the client
   two public IP:port address pairs.  One is called inbound and the
   other is called outbound.  The client connects to the outbound port
   and authenticates the TURN2 server using the TLS domain name of the
   server.  The TURN2 server authenticates the client using mutual TLS
   with the fingerprint of the cert provided by the client.  Any time a
   message or STUN2 packet is received on the matched inbound port, the
   TURN2 server forwards it to the client(s) connected to the outbound
   port.

   A single TURN2 connection can be used for multiple different calls or
   sessions at the same time, and a client could choose to allocate the
   TURN2 connection at the time that it starts up.  It does not need to
   be done on a per session basis.

   The client cannot send from the TURN2 server.

               Client A     TURN2 Server     Client B
           (Media Receiver)                (Media Sender)
                 |              |              |
                 |              |              |
                 |              |              |
                 |(1) OnInit Register (A's fingerprint)
                 |------------->|              |
                 |              |              |
                 |              |              |
                 |(2) Register Response (Port Pair (L,R))
                 |<-------------|              |
                 |              |              |
                 |              |              |
                 | L(left of Server), R(Right of Server)
                 |              |              |
                 |              |              |
                 |              |              |
                 |(3) Setup TLS Connection (L port)
                 |..............|              |
                 |              |              |
                 |              |              |
                 |              |              | B sends media to A
                 |              |              |
                 |              |              |
                 |              |              |
                 |              |(4) Media Tx (Received on Port R)
                 |              |<-------------|
                 |              |              |
                 |              |              |
                 |(5) Media Tx (Sent from Port L)
                 |<-------------|              |
                 |              |              |
                 |              |              |
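
   The forwarding rule can be modelled in a few lines.  The Python
   sketch below is a toy in-memory model, not a server: ports are plain
   integers, the fingerprint check stands in for mutual TLS, and the
   class and method names are hypothetical.  It only illustrates that
   packets arriving on the inbound port are forwarded to clients
   connected to the matching outbound port.

```python
import itertools


class Turn2Relay:
    """Toy in-memory model of the TURN2 forwarding rule (not a real server)."""

    def __init__(self, first_port=50000):
        self._ports = itertools.count(first_port)
        self._allocations = {}   # outbound port -> allocation record
        self._by_inbound = {}    # inbound port -> outbound port

    def register(self, fingerprint):
        """Record the client's cert fingerprint; return (inbound, outbound)."""
        inbound, outbound = next(self._ports), next(self._ports)
        self._allocations[outbound] = {"fingerprint": fingerprint,
                                       "clients": []}
        self._by_inbound[inbound] = outbound
        return inbound, outbound

    def connect(self, outbound, presented_fingerprint, deliver):
        """Attach a client to the outbound port if its fingerprint matches."""
        allocation = self._allocations.get(outbound)
        if allocation is None:
            return False
        if allocation["fingerprint"] != presented_fingerprint:
            return False
        allocation["clients"].append(deliver)
        return True

    def receive(self, inbound, packet):
        """Forward a packet that arrived on an inbound port.

        Returns the number of clients the packet was forwarded to.
        """
        outbound = self._by_inbound.get(inbound)
        if outbound is None:
            return 0
        clients = self._allocations[outbound]["clients"]
        for deliver in clients:
            deliver(packet)
        return len(clients)


relay = Turn2Relay()
inbound, outbound = relay.register("sha-256:AA:BB")
received = []
accepted = relay.connect(outbound, "sha-256:AA:BB", received.append)
rejected = relay.connect(outbound, "sha-256:00:00", received.append)
forwarded = relay.receive(inbound, b"media")
```

   The model has no method for the client to send through the relay,
   matching the rule that the client cannot send from the TURN2 server.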

7.  Transport Layer

   The responsibility of the transport layer is to provide an end to end
   crypto layer equivalent to DTLS, and it must ensure adequate
   congestion control.  The transport layer brings up a flow between two
   computers.  This flow can be used by multiple media streams.

   The MTI transport layer is QUIC with packets.  It assumes that QUIC
   has a way to deliver the packets in an efficient unreliable mode as
   well as an optional way to deliver important metadata packets in a
   reliable mode.  It assumes that QUIC can report up to the rate
   adaptation layer a current max target bandwidth that QUIC can
   transmit at.  It's possible these are all unrealistic characteristics
   of QUIC, in which case a new transport protocol should be developed
   that provides these and is layered on top of DTLS for security.

   This is secured by checking that the fingerprints of the DTLS
   connection match the fingerprints provided at the control layer or by
   checking that the names of the certificates match what was provided
   at the control layer.

   The transport layer needs to be able to set the DSCP values in
   transmitting packets as specified by the control layer.

   The transport MAY provide a compression mode to remove the redundancy
   of the non-encrypted portion of the media messages such as
   GlobalEncodingID.  For example, a GlobalEncodingID could be mapped to
   a QUIC channel and then it could be removed before sending the
   message and added back on the receiving side.
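
   A minimal Python sketch of that compression mode follows.  The
   channel numbering, the dictionary representation of headers, and
   the assumption that the receiver learns the channel mapping out of
   band are all hypothetical; this draft does not specify how the
   mapping is established.

```python
def strip_encoding_id(headers, channel_for_id):
    """Sender side: remove GlobalEncodingID and return (channel, headers)."""
    remaining = dict(headers)
    encoding_id = remaining.pop("GlobalEncodingID")
    # Allocate the next free channel number the first time an ID is seen.
    channel = channel_for_id.setdefault(encoding_id, len(channel_for_id))
    return channel, remaining


def restore_encoding_id(channel, headers, id_for_channel):
    """Receiver side: add GlobalEncodingID back based on the channel."""
    restored = dict(headers)
    restored["GlobalEncodingID"] = id_for_channel[channel]
    return restored


channel_for_id = {}
original = {"GlobalEncodingID": 0x1122334455667788, "sequence ID": 7}
channel, on_wire = strip_encoding_id(original, channel_for_id)

# The receiver is assumed to have learned this mapping out of band.
id_for_channel = {ch: eid for eid, ch in channel_for_id.items()}
round_trip = restore_encoding_id(channel, on_wire, id_for_channel)
```

   The saving is the full GlobalEncodingID header on every message,
   traded against state that both ends of the flow must keep in sync.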

   The transport needs to be able to ensure that it has a very small
   chance of being confused with the STUN2 traffic it will be
   multiplexed with.  (Open issue - if STUN2 runs on top of the same
   transport, this becomes less of an issue.)

   The transport crypto needs to be able to export server state that can
   be passed out of band to the client to enable the client to make a
   zero RTT connection to the server.

8.  Media Layer - RTP3

   Each message consists of a set of TLV headers with metadata about the
   packet, followed by payload data such as the output of an audio or
   video codec.

   There are several message headers that help the receiver understand
   what to do with the media.  The TLV headers are the following:

   o  Conference ID: Integer that will be a globally unique identifier
      for all applications using a common call signalling system.  This
      is set by the proposal.

   o  Endpoint ID: Integer to uniquely identify the endpoint within the
      scope of the conference ID.  This is set by the proposal.

   o  Source ID: integer to uniquely identify the input source within
      the scope of an endpoint ID.  A source could be a specific camera
      or a microphone.  This is set by the endpoint and included in the
      advertisement.

   o  Sink ID: integer to uniquely identify the sink within the scope of
      an endpoint ID.  A sink could be a speaker or screen.  This is set
      by the endpoint and included in the advertisement.  An endpoint
      sending media can have this set.  If it is set, it should transmit
      it for 3 frames any time it changes and once every 5 seconds.  An
      SFU can add, modify, or delete this from any media packet.  TODO -
      How to use this for SFU controlled layout - for example, if there
      are 100 users in a conference and we want to put the 10 most
      recent speakers in thumbnails.  Do we need this at all?

   o  Encoding ID: integer to uniquely identify the encoding of the
      stream within the scope of the source ID.  Note there may be
      multiple encodings of data from the same source.  This is set by
      the proposal.

   o  Salt : salt to use for forming the initialization vector for AEAD.
      The salt shall be sent as part of the packet and need not be sent
      in all the packets.  This is created by the endpoint sending the
      message.

   o  GlobalEncodingID: 64 bit hash of concatenation of conference ID,
      endpoint ID, source ID, encoding ID

   o  Capture time: Time when the first sample in the message was
      captured.  It is an NTP time in ms with the high order bits
      discarded.  The number of bits in the capture time needs to be
      large enough that it does not wrap within the lifetime of this
      stream.  This is set by the endpoint sending the message.

   o  Sequence ID: When the data captured for a single point in time is
      too large to fit in a single message, it can be split into
      multiple chunks which are sequentially numbered starting at 0
      corresponding to the first chunk of the message.  This is set by
      the endpoint sending the message.

   o  GlobalMessageID: 64 bit hash of concatenation of conference ID,
      endpoint ID, encoding ID, sequence ID

   o  Active level: a number from 0 to 100 indicating the probability
      that the sender of this media wishes it to be considered active
      media.  For example, if it was voice, it would be 100 if the
      person was clearly speaking, 0 if not, and perhaps a value in the
      middle if it was uncertain.  This allows a media switch to select
      the active speaker in a conference call.

   o  Location: relative or absolute location, direction of view, and
      field of view.  With video coming from drones, 360 cameras, VR
      light field cameras, and complex video conferencing rooms, this
      provides the information about the camera or microphone that the
      receiver can use to render the correct view.  This is end to end
      encrypted.

   o  Reference Frame : bool to indicate if this message is part of a
      reference frame.  Typically, a SFU will switch to the new video
      stream at the start of a reference frame.

   o  DSCP : DSCP to use on transmissions of this message and future
      messages on this GlobalEncodingID

   o  Layer ID : Integer indicating which layer this is for scalable
      video codecs.  An SFU may use this to selectively drop a frame.
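
   The two hashed headers in the list above, GlobalEncodingID and
   GlobalMessageID, can be sketched in Python as follows.  This draft
   does not name a hash function or say how the IDs are serialized
   before hashing, so the use of SHA-256 truncated to 64 bits and of
   8-byte big-endian fields are assumptions made for this example.

```python
import hashlib
import struct


def hash64(*fields):
    """Hash the concatenation of 64-bit big-endian integers down to 64 bits."""
    data = b"".join(struct.pack(">Q", field) for field in fields)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def global_encoding_id(conference_id, endpoint_id, source_id, encoding_id):
    """64-bit hash of conference ID, endpoint ID, source ID, encoding ID."""
    return hash64(conference_id, endpoint_id, source_id, encoding_id)


def global_message_id(conference_id, endpoint_id, encoding_id, sequence_id):
    """64-bit hash of conference ID, endpoint ID, encoding ID, sequence ID."""
    return hash64(conference_id, endpoint_id, encoding_id, sequence_id)


geid = global_encoding_id(1001, 7, 2, 1)
gmid_first = global_message_id(1001, 7, 1, 0)
gmid_second = global_message_id(1001, 7, 1, 1)
```

   Since any party that knows the four inputs can recompute the hash,
   an SFU can index forwarding state on the 64-bit value alone.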

   The keys used for the AEAD are unique to a given conference ID and
   endpoint ID.

   If the message has any of the following headers, they must occur in
   the following order followed by all other headers:

   1.  GlobalEncodingID,

   2.  GlobalMessageID,

   3.  conference ID,

   4.  endpoint ID,

   5.  encoding ID,

   6.  sequence ID,

   7.  active level,

   8.  DSCP
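
   The ordering rule can be expressed as a short Python sketch.  The
   header names are taken from the list above; representing headers as
   a list of name strings and keeping the remaining headers in their
   original relative order are assumptions made for this example.

```python
FIXED_ORDER = [
    "GlobalEncodingID",
    "GlobalMessageID",
    "conference ID",
    "endpoint ID",
    "encoding ID",
    "sequence ID",
    "active level",
    "DSCP",
]


def order_headers(names):
    """Return header names with the fixed-order headers first.

    Headers from FIXED_ORDER that are present come first, in that
    order.  All other headers follow in the order they were given.
    """
    present = set(names)
    first = [name for name in FIXED_ORDER if name in present]
    rest = [name for name in names if name not in FIXED_ORDER]
    return first + rest


ordered = order_headers(["salt", "DSCP", "capture time", "conference ID"])
```

   Putting these headers first at fixed positions lets a switch read
   the fields it forwards on without parsing the rest of the message.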

   Every second there must be at least one message in each encoding that
   contains:

   o  conference ID,

   o  endpoint ID,

   o  encoding ID,

   o  salt,

   o  and sequence ID headers

   but they are not needed in every packet.

   The sequence ID or GlobalMessageID is required in every message and
   periodically there should be a message with the capture time.

8.1.  RTP Meta Data

   We tend to end up with a few categories of data associated with the
   media:

   o  Stuff you need at the same time you get the media.  For example,
      whether this is a reference frame.

   o  Stuff you need soon but not instantly.  For example the name of
      the speaker in a given rectangle of a video stream

   And it tends to change at different rates:

   o  Stuff that you need to process the media and may change but does
      not change quickly and you don't need it with every frame.  For
      example, salt for encryption

   o  Stuff that you need to join the media but may never change.  For
      example, the resolution of the video.

   TODO - think about how to optimize design for each type of meta data

8.2.  Securing the messages

   The whole message is end to end secured with AEAD.  The headers are
   authenticated while the payload data is authenticated and encrypted.
   Similar to how the IV for AES-GCM is calculated in SRTP, in this case
   the IV is computed by XORing the salt with the concatenation of the
   GlobalEncodingID and the low 64 bits of the sequence ID.  The message
   consists of the authenticated data, followed by the encrypted data,
   then the authentication tag.
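
   The IV computation can be sketched in Python as shown below.  This
   draft does not state the length of the salt or of the IV, so the
   16-byte salt (chosen to match the 128-bit concatenation of the two
   64-bit fields) and the big-endian byte order are assumptions made
   for this example.

```python
def compute_iv(salt, global_encoding_id, sequence_id):
    """XOR the salt with GlobalEncodingID || low 64 bits of the sequence ID."""
    if len(salt) != 16:
        raise ValueError("this sketch assumes a 16-byte salt")
    mask = (1 << 64) - 1
    # GlobalEncodingID fills the high 64 bits, the sequence ID the low 64.
    combined = ((global_encoding_id & mask) << 64) | (sequence_id & mask)
    iv = int.from_bytes(salt, "big") ^ combined
    return iv.to_bytes(16, "big")


zero_salt_iv = compute_iv(bytes(16), 1, 2)
ones_salt_iv = compute_iv(b"\xff" * 16, 0, 0)
first_iv = compute_iv(b"\x5a" * 16, 0xABCDEF, 0)
second_iv = compute_iv(b"\x5a" * 16, 0xABCDEF, 1)
```

   For a fixed salt and GlobalEncodingID, each distinct low 64 bits of
   the sequence ID yields a distinct IV.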

8.3.  Sender requests

   The control layer supports requesting retransmission of a particular
   media message identified by the IDs and capture time it would
   contain.

   The control layer supports requesting a maximum rate for each given
   encoding ID.

8.4.  Data Codecs

   Data messages including raw bytes, XML, and SenML can all be sent
   just like media by selecting an appropriate codec and a software
   based source or sink.  An additional parameter to the codec can
   indicate if reliable delivery is needed and if in order delivery is
   needed.

8.5.  Media Keep Alive

   Provided by transport.

8.6.  Forward Error Correction

   A new Reed-Solomon based FEC scheme based on
   [I-D.ietf-payload-flexible-fec-scheme] that provides FEC over
   messages needs to be defined.

8.7.  MTI Codecs

8.7.1.  Audio

   Implementations MUST support at least G.711 and Opus.

8.7.2.  Video

   Implementations MUST support at least H.264 and AV1.

   Video codecs use square pixels.

   Video codecs MUST support any aspect ratio within the limits of their
   max width and height.

   Video codecs can specify a maximum pixel rate, maximum frame rate,
   and maximum image size.  They can also specify a list of binary flags
   of supported features which are defined by the codec and may be
   supported by the codec for encode, decode, or neither, where each
   feature can be independently controlled.  They cannot impose
   constraints beyond that.  Some existing codecs like VP8 may easily
   fit into that, while some codecs like H.264 may need some subsets
   defined as new codecs to meet the requirements for this.  It is not
   expected that all the nuances that could be negotiated with SDP for
   H.264 would be supported in this new media.
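
   Because a codec may only constrain size, frame rate, and pixel
   rate, checking a requested video format reduces to a few
   comparisons.  The Python sketch below illustrates this; the class
   and field names are hypothetical, and modelling the maximum image
   size as a separate max width and max height is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoCodecLimits:
    """The only limits a video codec is allowed to declare in this sketch."""
    max_width: int
    max_height: int
    max_frame_rate: float
    max_pixel_rate: int  # pixels per second


def fits(limits, width, height, frame_rate):
    """Return True if the requested format is within the codec's limits."""
    if width < 1 or height < 1:
        return False
    if width > limits.max_width or height > limits.max_height:
        return False
    if frame_rate > limits.max_frame_rate:
        return False
    return width * height * frame_rate <= limits.max_pixel_rate


limits = VideoCodecLimits(
    max_width=1920,
    max_height=1080,
    max_frame_rate=60,
    max_pixel_rate=1920 * 1080 * 30,
)
full_hd_30 = fits(limits, 1920, 1080, 30)
full_hd_60 = fits(limits, 1920, 1080, 60)
tall_strip = fits(limits, 1, 1080, 60)
too_wide = fits(limits, 3840, 1080, 15)
```

   Note that the one pixel wide format passes: with no aspect ratio
   constraint, any width and height inside the limits is acceptable.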

   Video codecs MUST support a min width and min height of 1.

   All video on the wire is oriented such that the first scan line in
   the frame is up and first pixel in the scan line is on the left.

   T.38 fax and DTMF are not supported.  Fax can be sent as a TIFF
   image over a data channel and DTMF can be done as application
   specific information over a data channel.

   TODO: Capture the list of what metadata video encoders produce:

   o  if it is a reference frame or not

   o  resolution

   o  frame-rate ?

   o  capture time of frame

   TODO: Capture the list of what metadata video encoders need:

   o  capture timestamp

   o  source and target resolution

   o  source and target frame-rate

   o  target bitrate

   o  max bitrate

   o  max pixel rate

8.7.3.  Annotation

   Optional support for annotation based overlay using vector graphics
   such as a subset of SVG.

8.7.4.  Application Data Channels

   Need support for application defined data in both a reliable and
   unreliable datagram mode.

8.7.5.  Reverse Requests & Stats

   The hope is that this is not needed.

   Much of what goes in the reverse direction of the media in RTCP is
   used for congestion control, diagnostics, or control of the codec,
   such as requesting that a frame be resent or that a new intra codec
   frame be sent for video.  The design reduces the need for this.

   The congestion control information which is needed quickly is all
   handled at the QUIC layer.

   The diagnostic type information can be reported from the endpoint to
   the controller and does not need to flow at the media level.

   Information that needs to be delivered reliably can be sent that way
   at the QUIC level, removing the need for retransmit type requests.
   Systems that use selective retransmission to recover from packet loss
   of media do not tend to work as well for interactive media as forward
   error correction schemes because of the large latency they introduce.

   Information like requesting a new intra codec frame for video often
   needs to come from the controller and can be sent over the signalling
   and control layer.

8.8.  Message Key Agreement

   The secret for encrypting messages can be provided in the proposal by
   value or by reference.  The reference approach allows the client to
   get it from a messaging system where the server creating the proposal
   may not have access to the secret.  For example, it might come from a
   system like [I-D.barnes-mls-protocol].

9.  Control Layer

   The control layer needs an API to find out what the capabilities of
   the device are, and then a way to set up sending and receiving
   streams.  All media flows are in one direction only.  The control is
   broken into control of connectivity and transports, and control of
   media streams.

9.1.  Transport Capabilities API

   An API to get information for remote connectivity including:

   o  set the IP, port, and credential for each TURN2 server

   o  can return the IP, port tuple for the remote side to send to TURN2
      server

   o  gather local IP, port, protocol tuples for receiving media

   o  report SHA256 fingerprint of local TLS certificate

   o  encryption algorithms supported

   o  report an error for a bad TURN2 credential

9.2.  Media Capabilities API

   Send and receive codecs are considered separate codecs and can have
   separate capabilities, though they default to the same if not
   specified separately.

   For each send or receive audio codec, an API to learn:

   o  codec name

   o  the max sample rate

   o  the max sample size

   o  the max bitrate

   For each send or receive video codec, an API to learn:

   o  codec name

   o  the max width

   o  the max height

   o  the max frame rate

   o  the max pixel depth

   o  the max bitrate

   o  the max pixel rate ( pixels / second )
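
   One possible shape for the audio part of this API is sketched below
   in Python.  The class names, field names, and units are hypothetical
   and are not defined by this draft.  The sketch treats the send
   capability as the one that is always given and falls back to it when
   no separate receive capability is specified, which is one reading of
   the defaulting rule above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioCodecCapability:
    """Limits reported for one audio codec in one direction."""
    name: str
    max_sample_rate: int  # Hz
    max_sample_size: int  # bits per sample
    max_bitrate: int      # bits per second


@dataclass(frozen=True)
class AudioCodecEntry:
    """Send and receive capabilities for one codec."""
    send: AudioCodecCapability
    receive: Optional[AudioCodecCapability] = None

    def receive_capability(self):
        """Receive limits default to the send limits when not given."""
        return self.receive if self.receive is not None else self.send


opus = AudioCodecCapability("opus", 48000, 16, 510000)
narrow = AudioCodecCapability("opus", 16000, 16, 64000)

symmetric = AudioCodecEntry(send=opus)
asymmetric = AudioCodecEntry(send=opus, receive=narrow)
```

   A video entry could follow the same pattern with the width, height,
   frame rate, pixel depth, bitrate, and pixel rate limits listed
   above.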

9.3.  Transport Configuration API

   To create a new flow, the information that can be configured is:

   o  TURN2 server to use

   o  list of IP, Port, Protocol tuples to try connecting to

   o  encryption algorithm to use

   o  TLS fingerprint of far side

   An API to allow modification of the following attributes of a flow:

   o  total max bandwidth for flow

   o  forward error correction scheme for flow

   o  FEC time window

   o  retransmission scheme for flow

   o  additional IP, Port, Protocol tuples to send to that may improve
      connectivity
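
   One way to picture the split between values fixed at flow creation
   and attributes that may be modified later is sketched below.  The
   FlowSetup and Flow classes are hypothetical, the units (bits per
   second, milliseconds) are assumptions, and this document does not
   define a concrete API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Candidate = Tuple[str, int, str]  # (IP, port, protocol)


@dataclass(frozen=True)
class FlowSetup:
    """Values configured when the flow is created."""
    turn_server: str
    candidates: Tuple[Candidate, ...]
    encryption_algorithm: str
    far_side_tls_fingerprint: str


class Flow:
    def __init__(self, setup: FlowSetup) -> None:
        self.setup = setup
        # Attributes that may be modified while the flow exists.
        self.max_bandwidth_bps: Optional[int] = None
        self.fec_scheme: Optional[str] = None
        self.fec_window_ms: Optional[int] = None
        self.retransmission_scheme: Optional[str] = None
        self.extra_candidates: List[Candidate] = []

    def add_candidate(self, candidate: Candidate) -> None:
        # Additional tuples to send to that may improve connectivity;
        # tuples that are already known are ignored.
        known = list(self.setup.candidates) + self.extra_candidates
        if candidate not in known:
            self.extra_candidates.append(candidate)
```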

9.4.  Media Configuration API

   For all streams:

   o  set conference ID

   o  set endpoint ID

   o  set encoding ID

   o  salt and secret for AEAD

   o  flag to pause transmission

   For each transmitted audio stream, a way to set the:

   o  audio codec to use

   o  media source to connect

   o  max encoded bitrate

   o  sample rate

   o  sample size

   o  number of channels to encode

   o  packetization time

   o  process as one of : automatically set, raw, speech, music

   o  DSCP value to use

   o  flag indicating whether to use constant bit rate

   o  optionally set a sinkID to periodically include in the media

   For each transmitted video stream, a way to set the:

   o  video codec to use

   o  media source to connect to

   o  max width and max height

   o  max encoded bitrate

   o  max pixel rate

   o  sample rate

   o  sample size

   o  process as one of : automatically set, rapidly changing video,
      fine detail video

   o  DSCP value to use

   o  for a layered codec, a layer ID and the set of layer IDs this
      layer depends on

   o  optionally set a sinkID to periodically include in the media

   For each transmitted video stream, a way to tell it to:

   o  encode the next frame as an intra frame

   For each transmitted data stream:

   o  a way to send a data message and indicate reliable or unreliable
      transmission

   For each received audio stream:

   o  audio codec to use

   o  media sink to connect to

   o  lip sync flag

   For each received video stream:

   o  video codec to use

   o  media sink to connect to

   o  lip sync flag

   For each received data stream:

   o  notification of received data messages

   Note on lip sync: For any streams that have the lip sync flag set to
   true, the renderer attempts to synchronize their playback.
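
   The lip sync note can be read as a simple selection rule over the
   received streams.  A minimal sketch, assuming a hypothetical
   ReceivedStream record with the fields listed above:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReceivedStream:
    kind: str        # "audio" or "video"
    codec: str
    sink_id: int
    lip_sync: bool = False


def lip_sync_group(streams: List[ReceivedStream]) -> List[ReceivedStream]:
    """Return the streams whose playback the renderer should attempt
    to synchronize, in their original order."""
    return [s for s in streams if s.lip_sync]
```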

9.5.  Transport Metrics

   o  report gathering state and completion

9.6.  Flow Metrics API

   For each flow, report:

   o  connectivity state

   o  bits sent

   o  packets lost

   o  estimated RTT

   o  SHA256 fingerprint for certificate of far side

   o  current 5-tuple in use

9.7.  Stream Metrics API

   For sending streams:

   o  bits sent

   o  packets lost

   For receiving streams:

   o  capture time of most recently received packet

   o  endpoint ID of most recently received packet

   o  bits received

   o  packets lost

   For video streams (send & receive):

   o  current encoded width and height

   o  current encoded frame rate
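
   An application could derive an average bitrate from two readings of
   the bit counters above.  The sketch below assumes the counters are
   cumulative, which this document does not state; StreamSample and
   bitrate_bps are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSample:
    timestamp_s: float   # when the counters were read, in seconds
    bits: int            # cumulative bits sent (or received)
    packets_lost: int    # cumulative packets lost


def bitrate_bps(earlier: StreamSample, later: StreamSample) -> float:
    """Average bitrate in bits per second between two readings."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later.bits - earlier.bits) / elapsed
```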

10.  Call Signalling - JABBER2

   Call signalling is out of scope for usages like WebRTC but other
   usages may want a common REST API they can use.

   Call signalling works by having the client connect to a server when
   it starts up, send its current advertisement, and open a WebSocket
   to receive proposals from the server.  A client can make a REST call
   indicating the party or parties it wishes to connect to, and the
   server will then send proposals to all of those clients that connect
   them.  Each proposal tells a client exactly how to configure its
   media stack and MUST be either completely accepted or completely
   rejected.
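
   The all-or-nothing handling of a proposal might look like the
   following sketch.  The apply_proposal function and its can_apply
   callback are hypothetical; how a client decides whether one stream
   setting can be applied is outside what this example covers.

```python
import copy
from typing import Callable, Dict, List

Setting = Dict[str, object]


def apply_proposal(current: List[Setting],
                   proposed: List[Setting],
                   can_apply: Callable[[Setting], bool]) -> List[Setting]:
    """Return the new stream configuration if every proposed stream
    can be applied; otherwise return the current one unchanged."""
    if all(can_apply(stream) for stream in proposed):
        return copy.deepcopy(proposed)   # completely accepted
    return current                       # completely rejected
```

   Checking every stream before changing anything keeps a partly
   applied proposal from ever being visible to the media stack.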

   The signalling is based on the advertisement/proposal ideas from
   [I-D.peterson-sipcore-advprop].

   We define one round trip of signalling to be a message going from a
   client up to a server in the cloud, then down to another client which
   returns a response along the reverse path.  With this definition, SIP
   takes 1.5 round trips, or more if TURN is needed, to set up a call,
   while this approach takes 0.5 round trips.

11.  Signalling Examples

11.1.  Simple Audio Example

11.1.1.  simple audio advertisement

                  {
                    "receiveAt":[
                      {
                        "relay":"2001:db8::10:443",
                        "stunSecret":"s8i739dk8",
                        "tlsFingerprintSHA256":"1283938"
                      },
                      {
                        "stun":"203.0.113.10:43210",
                        "stunSecret":"s8i739dk8",
                        "tlsFingerprintSHA256":"1283938"
                      },
                      {
                        "local":"192.168.0.2:443",
                        "stunSecret":"s8i739dk8",
                        "tlsFingerprintSHA256":"1283938"
                      }
                    ],
                    "sources":[
                      {
                        "sourceID":1,
                        "sourceType":"audio",
                        "codecs":[
                          {
                            "codecName":"opus",
                            "maxBitrate":128000
                          },
                          {
                            "codecName":"g711"
                          }
                        ]
                      }
                    ],
                    "sinks":[
                      {
                        "sinkID":1,
                        "sourceType":"audio",
                        "codecs":[
                          {
                            "codecName":"opus",
                            "maxBitrate":256000
                          },
                          {
                            "codecName":"g711"
                          }
                        ]
                      }
                    ]
                  }

11.1.2.  simple audio proposal

                  {
                    "receiveAt":[
                      {
                        "relay":"2001:db8::10:443",
                        "stunSecret":"s8i739dk8"
                      },
                      {
                        "stun":"203.0.113.10:43210",
                        "stunSecret":"s8i739dk8"
                      },
                      {
                        "local":"192.168.0.10:443",
                        "stunSecret":"s8i739dk8"
                      }
                    ],
                    "sendTo":[
                      {
                        "relay":"2001:db8::20:443",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      },
                      {
                        "stun":"203.0.113.20:43210",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      },
                      {
                        "local":"192.168.0.20:443",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      }
                    ],
                    "sendStreams":[
                      {
                        "conferenceID":4638572387,
                        "endpointID":23,
                        "sourceID":1,
                        "encodingID":1,
                        "codecName":"opus",
                        "AEAD":"AES128-GCM",
                        "secret":"xy34",
                        "maxBitrate":24000,
                        "packetTime":20
                      }
                    ],
                    "receiveStreams":[
                      {
                        "conferenceID":4638572387,
                        "endpointID":23,
                        "sinkID":1,
                        "encodingID":1,
                        "codecName":"opus",
                        "AEAD":"AES128-GCM",
                        "secret":"xy34"
                      }
                    ]
                  }
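
   A server building this proposal would presumably check each send
   stream against the client's advertisement.  The sketch below shows
   one such check in Python, using values from the example above; the
   stream_fits helper is hypothetical and the exact matching rules are
   an assumption, not something this document specifies.

```python
import json

ADVERTISEMENT = json.loads("""
{"sources": [{"sourceID": 1, "sourceType": "audio",
              "codecs": [{"codecName": "opus", "maxBitrate": 128000},
                         {"codecName": "g711"}]}]}
""")

SEND_STREAM = {"sourceID": 1, "codecName": "opus", "maxBitrate": 24000}


def stream_fits(advertisement: dict, stream: dict) -> bool:
    """True if the stream names an advertised source and codec and
    stays within that codec's maxBitrate when one is advertised."""
    for source in advertisement.get("sources", []):
        if source["sourceID"] != stream["sourceID"]:
            continue
        for codec in source["codecs"]:
            if codec["codecName"] != stream["codecName"]:
                continue
            limit = codec.get("maxBitrate")
            wanted = stream.get("maxBitrate")
            # Assumption: a missing limit or request means "no bound".
            return limit is None or wanted is None or wanted <= limit
    return False
```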

11.2.  Simple Video Example

   Advertisement for a simple send-only camera with no audio:

                    {
                      "sources":[
                        {
                          "sourceID":1,
                          "sourceType":"video",
                          "codecs":[
                            {
                              "codecName":"av1",
                              "maxBitrate":20000000,
                              "maxWidth":3840,
                              "maxHeight":2160,
                              "maxFrameRate":120,
                              "maxPixelRate":248832000,
                              "maxPixelDepth":8
                            }
                          ]
                        }
                      ]
                    }

11.2.1.  Proposal sent to camera

                  {
                    "sendTo":[
                      {
                        "relay":"2001:db8::20:443",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      }
                    ],
                    "sendStreams":[
                      {
                        "conferenceID":0,
                        "endpointID":0,
                        "sourceID":0,
                        "encodingID":0,
                        "codecName":"av1",
                        "AEAD":"NULL",
                        "width":640,
                        "height":480,
                        "frameRate":30
                      }
                    ]
                  }
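
   The advertised limits interact: 3840 x 2160 at 30 frames per second
   is exactly the advertised maxPixelRate of 248832000, so the camera
   cannot be asked for that resolution at its maxFrameRate of 120.  A
   minimal sketch of the check, assuming pixel rate is computed as
   width times height times frame rate (video_fits is a hypothetical
   helper):

```python
CAPS = {"codecName": "av1", "maxBitrate": 20000000, "maxWidth": 3840,
        "maxHeight": 2160, "maxFrameRate": 120,
        "maxPixelRate": 248832000, "maxPixelDepth": 8}


def video_fits(caps: dict, width: int, height: int,
               frame_rate: int) -> bool:
    """Check a requested resolution and frame rate against the
    advertised limits, including the pixel rate."""
    pixel_rate = width * height * frame_rate  # pixels per second
    return (width <= caps["maxWidth"]
            and height <= caps["maxHeight"]
            and frame_rate <= caps["maxFrameRate"]
            and pixel_rate <= caps["maxPixelRate"])
```

   The 640 x 480 at 30 frames per second stream in the proposal above
   needs 9216000 pixels per second and passes this check.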

11.3.  Simulcast Video Example

   The advertisement is the same as for the simple camera above, but the
   proposal has two streams with different encodingID values.

                  {
                    "sendTo":[
                      {
                        "relay":"2001:db8::20:443",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      }
                    ],
                    "sendStreams":[
                      {
                        "conferenceID":0,
                        "endpointID":0,
                        "sourceID":0,
                        "encodingID":1,
                        "codecName":"av1",
                        "AEAD":"NULL",
                        "width":1920,
                        "height":1080,
                        "frameRate":30
                      },
                      {
                        "conferenceID":0,
                        "endpointID":0,
                        "sourceID":0,
                        "encodingID":2,
                        "codecName":"av1",
                        "AEAD":"NULL",
                        "width":240,
                        "height":240,
                        "frameRate":15
                      }
                    ]
                  }
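
   Since the two simulcast streams differ only in encodingID, a
   receiver of the proposal may want to confirm that no two streams
   share all four identifiers.  The sketch below assumes that the
   combination of conferenceID, endpointID, sourceID, and encodingID
   is what must be unique; duplicate_encodings is a hypothetical
   helper.

```python
from collections import Counter
from typing import List


def duplicate_encodings(send_streams: List[dict]) -> List[tuple]:
    """Return the (conferenceID, endpointID, sourceID, encodingID)
    keys that appear more than once in a list of send streams."""
    keys = Counter((s["conferenceID"], s["endpointID"],
                    s["sourceID"], s["encodingID"])
                   for s in send_streams)
    return [key for key, count in keys.items() if count > 1]
```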

11.4.  FEC Example

11.4.1.  Advertisement includes a FEC codec.

                    {
                      "sources":[
                        {
                          "sourceID":1,
                          "sourceType":"video",
                          "codecs":[
                            {
                              "codecName":"av1",
                              "maxBitrate":20000000,
                              "maxWidth":3840,
                              "maxHeight":2160,
                              "maxFrameRate":120,
                              "maxPixelRate":248832000,
                              "maxPixelDepth":8
                            },
                            {
                              "codecName":"flex-fec-rs"
                            }
                          ]
                        }
                      ]
                    }

11.4.2.  Proposal sent to camera

                  {
                    "sendTo":[
                      {
                        "relay":"2001:db8::20:443",
                        "stunSecret":"20kdiu83kd8",
                        "tlsFingerprintSHA256":"9389739"
                      }
                    ],
                    "sendStreams":[
                      {
                        "conferenceID":0,
                        "endpointID":0,
                        "sourceID":0,
                        "encodingID":1,
                        "codecName":"av1",
                        "AEAD":"NULL",
                        "width":640,
                        "height":480,
                        "frameRate":30
                      },
                      {
                        "conferenceID":0,
                        "endpointID":0,
                        "sourceID":0,
                        "encodingID":2,
                        "AEAD":"NULL",
                        "codecName":"flex-fec-rs",
                        "fecRepairWindow":200,
                        "fecRepairEncodingIDs":[
                          1
                        ]
                      }
                    ]
                  }
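
   In this proposal the FEC stream (encodingID 2) repairs the video
   stream (encodingID 1).  A client could check that every entry in
   fecRepairEncodingIDs refers to a stream present in the same
   proposal.  This is a sketch under that assumption; the
   dangling_fec_refs helper is hypothetical.

```python
from typing import List


def dangling_fec_refs(send_streams: List[dict]) -> List[int]:
    """Return encodingIDs named in fecRepairEncodingIDs that no
    stream in the same list carries."""
    known = {s["encodingID"] for s in send_streams}
    missing: List[int] = []
    for stream in send_streams:
        for ref in stream.get("fecRepairEncodingIDs", []):
            if ref not in known and ref not in missing:
                missing.append(ref)
    return missing
```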

12.  Switched Forwarding Unit (SFU)

   When several clients are in a conference call, the SFU can forward
   packets by looking at which clients need a given GlobalEncodingID.
   By looking at the "active level", the SFU can figure out which
   endpoints are the active speakers and forward only those.  The SFU
   never changes anything in the message.
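
   The forwarding decision can be sketched as a lookup that never
   touches the packet bytes.  In the Python below, the representation
   of GlobalEncodingID as an integer, the per-client sets of wanted
   IDs, and the rule that a higher active level means a more active
   speaker are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def forward(packet: bytes, global_encoding_id: int,
            wanted: Dict[str, Set[int]]) -> List[Tuple[str, bytes]]:
    """Return (client, packet) pairs for every client that needs this
    GlobalEncodingID.  The packet is passed through unmodified."""
    return [(client, packet)
            for client, ids in sorted(wanted.items())
            if global_encoding_id in ids]


def active_speakers(levels: Dict[int, int], count: int = 1) -> List[int]:
    """Pick the endpoint IDs with the highest reported active level."""
    ranked = sorted(levels, key=lambda endpoint: levels[endpoint],
                    reverse=True)
    return ranked[:count]
```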

12.1.  Software Defined Networking

   Is it possible to use the packet recycling concepts in SDN to forward
   a single packet to multiple endpoints?  Can the way SDN forwarding
   works be adapted to use an SDN router as an SFU?

12.2.  Vector Packet Processors

   Can we use fast VPP systems like fd.io to create an SFU?

12.3.  Information Centric Networking

   What changes would be needed to map RTP2 into the prefix and suffix
   of hICN?

13.  Acknowledgements

   Thank you for input from: Harald Alvestrand, Espen Berger, Matthew
   Kaufman, Patrick Linskey, Eric Rescorla, Peter Thatcher, Malcolm
   Walters, and Martin Thomson.

14.  Other Work

   rfc7016

   draft-kaufman-rtcweb-traversal

   Consider using terminology from rfc7656

   docs.google.com/presentation/
   d/1Sg_1TVCcKJvZ8Egz5oa0CP01TC2rNdv9HVu7W38Y4zA/
   edit#slide=id.g29a8672e18_22_120

   docs.google.com/presentation/d/1o-
   o5jZBLw3Py1OuenzWDkxDG6NigSmLHvGw5KemKWLw/
   edit#slide=id.g2f8f4acff1_1_249

   cs.chromium.org/chromium/src/third_party/webrtc/common_video/include/
   video_frame.h

15.  Style of specification

   Fundamentally driven by experiments.  The proposal is to have a
   high-level overview document where we document some of the design;
   this document could be a start of that.  Then write a spec for each
   one of the separable protocol parts such as STUN2, TURN2, etc.

   The protocol specs would contain a high-level overview like you might
   find on a Wikipedia page, and the details of the protocol encoding
   would be provided in an open source reference implementation.  The
   test code for the reference implementation helps test the spec.  The
   implementation is not optimized for performance but instead is simply
   trying to clearly illustrate the protocol.  A particular version of
   the draft would be bound to a tagged version of the source code.  All
   the source code would be under normal IETF IPR rules just as if it
   were included directly in the draft.

16.  Informative References

   [I-D.barnes-mls-protocol]
              Barnes, R., Millican, J., Omara, E., Cohn-Gordon, K., and
              R. Robert, "The Messaging Layer Security (MLS) Protocol",
              draft-barnes-mls-protocol-00 (work in progress), February
              2018.

   [I-D.ietf-payload-flexible-fec-scheme]
              Zanaty, M., Singh, V., Begen, A., and G. Mandyam, "RTP
              Payload Format for Flexible Forward Error Correction
              (FEC)", draft-ietf-payload-flexible-fec-scheme-06 (work in
              progress), March 2018.

   [I-D.jennings-dispatch-snowflake]
              Jennings, C. and S. Nandakumar, "Snowflake - A Lighweight,
              Asymmetric, Flexible, Receiver Driven Connectivity
              Establishment", draft-jennings-dispatch-snowflake-01 (work
              in progress), March 2018.

   [I-D.jennings-mmusic-ice-fix]
              Jennings, C., "Proposal for Fixing ICE", draft-jennings-
              mmusic-ice-fix-00 (work in progress), July 2015.

   [I-D.kaufman-rtcweb-traversal]
              Kaufman, M. and J. Rosenberg, "NAT Traversal Requirements
              for RTCWEB", draft-kaufman-rtcweb-traversal-00 (work in
              progress), June 2011.

   [I-D.peterson-sipcore-advprop]
              Peterson, J. and C. Jennings, "The Advertisement/Proposal
              Model of Session Description", draft-peterson-sipcore-
              advprop-01 (work in progress), March 2011.

Author's Address

   Cullen Jennings
   Cisco

   Email: fluffy@iii.ca