Internet Engineering Task Force                               R. Mekuria
Internet-Draft                                    Unified Streaming B.V.
Intended status: Best Current Practice                       May 7, 2018
Expires: November 7, 2018


              Live Media and Metadata Ingest Protocol
                  draft-mekuria-mmediaingest-00.txt

Abstract

This Internet-Draft presents a protocol specification for ingesting live media and metadata content from a live media source, such as a live encoder, towards a media processing entity or content delivery network. It defines the media format usage, the preferred transmission methods and the handling of failover and redundancy. The live media considered includes high quality encoded audio-visual content. The timed metadata supported includes timed graphics, captions, subtitles, and metadata markers and information. This protocol can, for example, be used in advanced live streaming workflows that combine high quality live encoders and advanced media processing entities. The specification follows best current industry practice.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions and Terminology
   3. Media Ingest Protocol Behavior
   4. Formatting Requirements for Timed Text, Captions and Subtitles
   5. Formatting Requirements for Timed Metadata Markers
   6. Guidelines for Handling of Media Processing Entity Failover
   7. Guidelines for Handling of Live Media Source Failover
   8. Security Considerations
   9. IANA Considerations
   10. Contributors
   11. References
       11.1. Normative References
       11.2. Informative References
       11.3. URL References
   Author's Address

1. Introduction

This specification describes a protocol for media ingest from a live source (e.g. a live encoder) towards media processing entities. Examples of media processing entities include media packagers, publishing points, streaming origins, content delivery networks and others. In particular, we distinguish active media processing entities and passive media processing entities. Active media processing entities perform media processing such as encryption, packaging, changing (parts of) the media content and deriving additional information. Passive media processing entities provide pass-through functionality and/or delivery and caching functions that do not alter the media content itself.
An example of a passive media processing entity could be a content delivery network (CDN) that provides functionality for the delivery of the content. An example of an active media processing entity could be a just-in-time packager or a just-in-time transcoder.

Diagram 1: Example workflow with media ingest

   Live Media Source -> Media processing entity -> CDN -> End User

Diagram 1 shows a workflow with live media ingest from a live media source towards a media processing entity. The media processing entity provides additional processing such as content stitching, encryption, packaging, manifest generation, transcoding etc. Such setups are beneficial for advanced media delivery. The ingest described in this draft includes the latest technologies and standards used in the industry, such as timed metadata, captions, timed text and encoding standards such as HEVC [HEVC].

The media ingest protocol specification and associated requirements were discussed with stakeholders, including broadcasters, live encoder vendors, content delivery networks, telecommunications companies and cloud service providers. This draft specification has been extensively discussed and reviewed by these stakeholders and reflects current best practice. Nevertheless, the current draft solely represents the point of view of the authors, taking the feedback received from these stakeholders into account. Some insights on the discussions leading to this draft can be found in [fmp4git].

2. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

This specification uses the following additional terminology.

ISOBMFF: the ISO Base Media File Format specified in [ISOBMFF].

ftyp: the filetype and compatibility box "ftyp" as described in the ISOBMFF [ISOBMFF] that describes the "brand" of the file.

moov: the container box for all metadata "moov" described in the ISO base media file format [ISOBMFF].

moof: the movie fragment box "moof" as described in the ISO base media file format [ISOBMFF] that describes the metadata of a fragment of media.

mdat: the media data container box "mdat" contained in an ISOBMFF file [ISOBMFF]; this box contains the compressed media samples.

kind: the track kind box defined in the ISOBMFF [ISOBMFF] to label a track with its usage.

mfra: the movie fragment random access box "mfra" defined in the ISOBMFF [ISOBMFF] to signal random access samples (these are samples that require no prior or other samples for decoding) [ISOBMFF].

tfdt: the TrackFragmentDecodeTimeBox "tfdt" defined in the ISO base media file format [ISOBMFF], used to signal the decode time of the media fragment signalled in the moof box.
mdhd: the media header box "mdhd" as defined in [ISOBMFF]; this box contains information about the media such as timescale, duration and language, using ISO 639-2/T codes [ISO639-2].

pssh: the protection specific system header box "pssh" defined in [CENC] that can be used to signal content protection information according to MPEG Common Encryption (CENC).

sinf: the protection scheme information box "sinf" defined in [ISOBMFF] that provides information on the encryption scheme used in the file.

elng: the extended language box "elng" defined in [ISOBMFF] that can override the language information.

nmhd: the null media header box "nmhd" as defined in [ISOBMFF] to signal a track for which no specific media header is defined, often used for metadata tracks.

HTTP: Hypertext Transfer Protocol, version 1.1, as specified by [RFC2616].

HTTP POST: command used in the Hypertext Transfer Protocol for sending data from a source to a destination [RFC2616].

fragmentedMP4stream: stream of [ISOBMFF] fragments (moof and mdat); see Section 3 for the definition.

POST_URL: target URL of a POST command in the HTTP protocol for pushing data from a source to a destination.

TCP: Transmission Control Protocol (TCP) as defined in [RFC793].

URI_SAFE_IDENTIFIER: identifier/string formatted according to [RFC3986].

Connection: connection setup between a host and a source.

Live stream event: the total media broadcast stream of the ingest.

(Live) encoder: entity performing live encoding and producing a high quality encoded stream; can serve as media ingest source.

(Media) ingest source: a media source ingesting media content, typically a live encoder, but not restricted to this; the media ingest source could be any type of media ingest source, such as a stored file that is sent in partial chunks.

Publishing point: entity used to publish the media content; consumes/receives the incoming media ingest stream.

Media processing entity: entity used to process media content; receives/consumes a media ingest stream.

Media processing function: see media processing entity.

3. Media Ingest Protocol Behavior

The specification uses multiple HTTP POST and/or PUT requests to transmit an optional manifest followed by encoded media data packaged in fragmented [ISOBMFF]. The subsequently posted segments correspond to those described in the manifest. Each HTTP POST sends a complete manifest or media segment towards the processing entity. The sequence of POST commands starts with the manifest and the init segment that includes the header boxes (ftyp and moov boxes). It continues with the sequence of segments (combinations of moof and mdat boxes).

An example of a POST URL targeting the publishing point is:

   http://HostName/presentationPath/manifestPath/rsegmentPath/Identifier

The PostURL syntax is defined as follows, using IETF RFC 5234 ABNF [RFC5234] to specify the structure:

   PostURL          = Protocol "://" BroadcastURL Identifier
   Protocol         = "http" / "https"
   BroadcastURL     = HostName "/" PresentationPath
   HostName         = URI_SAFE_IDENTIFIER
   PresentationPath = URI_SAFE_IDENTIFIER
   ManifestPath     = URI_SAFE_IDENTIFIER
   RsegmentPath     = URI_SAFE_IDENTIFIER
   Identifier       = segment_file_name

In this PostURL, the HostName is typically the hostname of the media processing entity or publishing point. The presentation path is the path to the specific presentation at the publishing point. The manifest path can be used to signal the specific manifest of the presentation. The rsegmentpath can be a different optional extended path based on the relative paths in the manifest file. The identifier describes the filename of the segment as described in the manifest.
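As an informal illustration, a live media source might compose such a POST URL from its components as sketched below. The host and path values are hypothetical placeholders, not part of this specification.

   # Sketch: composing a PostURL from its ABNF components.
   # Host and path values below are hypothetical placeholders.
   def build_post_url(protocol, host_name, presentation_path,
                      manifest_path="", rsegment_path="", identifier=""):
       parts = [p for p in (presentation_path, manifest_path,
                            rsegment_path, identifier) if p]
       return "%s://%s/%s" % (protocol, host_name, "/".join(parts))

   # e.g. https://ingest.example.com/channel1/manifest/video/seg1.cmfv
   url = build_post_url("https", "ingest.example.com", "channel1",
                        "manifest", "video", "seg1.cmfv")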
The live source sender first sends the manifest to the path http://hostname/presentationpath/ allowing the receiving entity to set up reception paths for the following segments and manifests. In case no manifest is used, any POST_URL set up for media ingest, such as http://hostname/presentationpath/, can be used.

The fragmentedMP4stream can be defined using IETF RFC 5234 ABNF [RFC5234] as follows:

   fragmentedMP4stream = headerboxes fragments
   headerboxes         = ftyp moov
   fragments           = *fragment
   fragment            = moof mdat
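As an illustration, a receiving entity might validate this structure by walking the top-level ISOBMFF boxes of an incoming stream. The sketch below assumes 32-bit box sizes (the 64-bit "largesize" variant is not handled) and only shows the expected box order.

   import struct

   def iter_boxes(stream):
       # Each top-level ISOBMFF box starts with a 32-bit big-endian
       # size (which includes the 8-byte header) and a 4-char type.
       while True:
           header = stream.read(8)
           if len(header) < 8:
               return
           size, box_type = struct.unpack(">I4s", header)
           body = stream.read(size - 8)  # 64-bit sizes not handled
           yield box_type.decode("ascii"), body

   def check_fragmented_mp4_stream(stream):
       # headerboxes = ftyp moov ; fragments = *(moof mdat)
       types = [t for t, _ in iter_boxes(stream)]
       assert types[0:2] == ["ftyp", "moov"], "missing init segment"
       assert all(t == "moof" for t in types[2::2]), "expected moof"
       assert all(t == "mdat" for t in types[3::2]), "expected mdat"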
The communication between the live encoder/media ingest source and the receiving media processing entity follows the requirements below. A connection sketch illustrating several of these requirements follows the list.

1. The live encoder or ingest source communicates with the publishing point/processing entity using the HTTP POST method as defined in the HTTP protocol [RFC2616] or, in the case of manifest updates, the HTTP PUT method.

2. The live encoder or ingest source SHOULD start by sending an HTTP POST request with an empty "body" (zero content length) using the same POST_URL. This can help the live encoder or media ingest source to quickly detect whether the live ingest publishing point is valid, and whether there are any authentication or other conditions required.

3. The live encoder/media source SHOULD use secure transmission via the HTTPS protocol as specified in [RFC2818] for connecting to the receiving media processing entity or publishing point.

4. In case the HTTPS protocol is used, basic authentication HTTP AUTH [RFC7617] or better methods, such as TLS client certificates, SHOULD be used to secure the connection.

5. As compatibility profile for the TLS encryption we recommend the Mozilla intermediate compatibility profile, which is supported in many available implementations [MozillaTLS].

6. Before sending the segments based on the fragmentedMP4stream, the live encoder/source MAY send a manifest with the following limitations/constraints:

   6a. Only relative URL paths are used for each segment.

   6b. Only unique paths are used for each new presentation.

   6c. In case the manifest contains these relative paths, these paths MAY be used in combination with the POST_URL + relative URL to POST each of the different segments from the live encoder or ingest source to the processing entity.

   6d. In case the manifest contains no relative paths, or no manifest is used, the segments SHOULD be posted to the original POST_URL specified by the service.

   6e. In this case the "tfdt" and track ids MAY be used by the processing entity to distinguish incoming segments instead of the target POST_URL.

7. The live encoder MAY send an updated version of the manifest; this manifest cannot override current settings and relative paths or break currently running and incoming POST requests. The updated manifest can only be slightly different from the one that was sent previously, e.g. introduce newly available segments or event messages. The updated manifest SHOULD be sent using a PUT request instead of a POST request. Note: this manifest will be useful mostly for passive media processing entities; for ingest towards active media processing entities this manifest could be avoided and the information signalled through the boxes available in the ISOBMFF.

8. The encoder or ingest source MUST handle any error or failed authentication responses received from the media processing entity, such as 403 (Forbidden), 400 (Bad Request), 415 (Unsupported Media Type) and 412 (Precondition Failed).

9. In case of a 412 (Precondition Failed) or 415 (Unsupported Media Type) response, the live source/encoder MUST resend the init segment consisting of a "moov" and "ftyp" box.

10. The live encoder or ingest source SHOULD start a new HTTP POST segment request sequence with the init segment including the header boxes "ftyp" and "moov".

11. Subsequent media segment requests SHOULD correspond to the segments listed in the manifest, if a manifest was sent.

12. The payload of each request MAY start with the header boxes "ftyp" and "moov", followed by segments which consist of a combination of "moof" and "mdat" boxes. Note that the "ftyp" and "moov" boxes (in this order) MAY be transmitted with each request, especially if the encoder must reconnect because the previous POST request was terminated prior to the end of the stream with a 412 or 415 message. Resending the "moov" and "ftyp" boxes allows the receiving entity to recover the init segment and the track information needed for interpreting the content.

13. The encoder or ingest source MAY use the chunked transfer encoding option of the HTTP POST command [RFC2616] for uploading, as it might be difficult to predict the entire content length of the segment. This can be used, for example, to support use cases that require low latency.

14. The encoder or ingest source SHOULD use individual HTTP POST commands [RFC2616] for uploading media segments when ready.

15. If the HTTP POST request terminates or times out with a TCP error prior to the end of the stream, the encoder MUST issue a new POST request on a new connection, following the preceding requirements. Additionally, the encoder MAY resend the previous two segments that were already sent.

16. In case fixed length POST commands are used, the live source entity MUST, upon HTTP 400, 412 or 415 responses, resend the posted segment described in the manifest entirely, together with the init segment consisting of "moov" and "ftyp" boxes.

17. In case the live stream event is over, the live media source/encoder SHOULD signal the stop by transmitting an empty "mfra" box towards the publishing point/processing entity.

18. The TrackFragmentDecodeTimeBox "tfdt" MUST be present for each segment posted.

19. The ISOBMFF media fragment duration SHOULD be constant, to reduce the size of the client manifests. A constant MPEG-4 fragment duration also improves client download heuristics through the use of repeat tags. The duration MAY fluctuate to compensate for non-integer frame rates; by choosing an appropriate timescale (a multiple of the frame rate is recommended) this issue can be avoided.

20. The MPEG-4 fragment duration SHOULD be between approximately 2 and 6 seconds.

21. The fragment decode timestamps "tfdt" of fragments in the fragmentedMP4stream and the base_media_decode_time indexes SHOULD arrive in increasing order for each of the different tracks/streams that are ingested.

22. The segments formatted as a fragmented MP4 stream SHOULD use a timescale based on the framerate for video streams, 44.1 kHz or 48 kHz for audio streams, or any other timescale that enables integer increments of the decode times of fragments signalled in the "tfdt" box based on this scale.
23. The manifest MAY be used to signal the language of the stream, which SHOULD also be signalled in the "mdhd" box or "elng" box in the init segment and/or the moof headers ("mdhd").

24. The manifest SHOULD be used to signal encryption specific information, which SHOULD also be signalled in the "pssh", "schm" and "sinf" boxes in the init segment and media segments.

25. The manifest SHOULD be used to signal information about the different tracks, such as their durations, media encoding types and content types, which SHOULD also be signalled in the "moov" box in the init segment or the "moof" box in the media segments.

26. The manifest SHOULD be used to signal information about the timed text, images and subtitles in adaptation sets, and this information SHOULD also be signalled in the "moov" box in the init segment; for more information see the next section.

27. Segments posted towards the media processing entity MUST contain the bitrate box "btrt" specifying the target bitrate of the segments, the "tfdt" box specifying the fragment's decode time and the "tfhd" box specifying the track id.

28. The live encoder/media source SHOULD repeatedly resolve the hostname to adapt to changes in the IP to hostname mapping, for example by using the Domain Name System (DNS) [RFC1035] or any other system that is in place.

29. The live encoder/media source MUST update the IP to hostname resolution respecting the TTL (time to live) from DNS query responses. This will enable better resilience to changes of the IP address in large scale deployments, where the IP address of the publishing point or media processing nodes may change frequently.

30. To support the ingest of live events with low latency, shorter segment and fragment durations MAY be used, such as segments with a duration of 1 second.

31. The live encoder/media source SHOULD use a separate TCP connection for the ingest of each different bitrate track ingested.
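The connection sketch below illustrates requirements 2, 3, 9 and 13 from the list above: an empty-body probe POST over HTTPS, chunked upload of a media segment, and resending the init segment on a 412 or 415 response. The host name and path are illustrative assumptions, not part of this specification, and error handling is abbreviated.

   import http.client

   HOST = "ingest.example.com"         # hypothetical publishing point
   PATH = "/channel1/seg1.cmfv"        # hypothetical POST_URL path

   def probe(conn):
       # Requirement 2: empty-body POST to validate the publishing
       # point and surface authentication problems early.
       conn.request("POST", PATH, body=b"")
       resp = conn.getresponse()
       resp.read()
       return resp.status

   def post_segment(conn, init_segment, media_segment):
       # Requirement 13: chunked transfer encoding, since the full
       # segment length may be unknown in low latency setups.
       conn.request("POST", PATH, body=iter([media_segment]),
                    encode_chunked=True)
       resp = conn.getresponse()
       resp.read()
       if resp.status in (400, 412, 415):
           # Requirements 9/16: resend the init segment ("ftyp" +
           # "moov") together with the media segment.
           conn.request("POST", PATH, body=init_segment + media_segment)
           resp = conn.getresponse()
           resp.read()
       return resp.status

   conn = http.client.HTTPSConnection(HOST)  # HTTPS per requirement 3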
4. Formatting Requirements for Timed Text, Captions and Subtitles

The specification supports ingest of timed text, images, captions and subtitles. In this section we follow the normative reference [MPEG-4-30]. A summary of the signalling choices follows the list.

1. The tracks containing timed text, images, captions or subtitles MAY be signalled in the manifest by an adaptation set, with the different segments containing the data of the track.

2. The segment data MAY be posted to the URL corresponding to the path in the manifest for the segment; otherwise it MUST be posted towards the original POST_URL.

3. The track will be a sparse track, signalled by a null media header "nmhd", containing the timed text, images or captions, corresponding to the recommendation for storing such tracks in fragmented MPEG-4 [CMAF].

4. Based on this recommendation, the track handler "hdlr" shall be set to "text" for WebVTT and "subt" for TTML.

5. In case TTML is used, the track must use the XMLSampleEntry to signal the sample description of the subtitle stream.

6. In case WebVTT is used, the track must use the WVTTSampleEntry to signal the sample description of the text stream.

7. These boxes SHOULD signal the MIME type and specifics as described in [CMAF] sections 11.3, 11.4 and 11.5.

8. The boxes described in 3-7 must be present in the init segment ("ftyp" + "moov") for the given track.

9. Subtitles in CTA-608 and CTA-708 can be transmitted following the recommendation in section 11.5 of [CMAF], i.e. via SEI messages in the video track.

10. The "ftyp" box in the init segment for the track containing timed text, images, captions and subtitles can use signalling based on CMAF profiles [CMAF]:

   10a. WebVTT: specified in 11.2 of ISO/IEC 14496-30 [MPEG-4-30]; brand 'cwvt'.

   10b. TTML IMSC1 text: specified in 11.3.3 of [MPEG-4-30], IMSC1 Text Profile; brand 'im1t'.

   10c. TTML IMSC1 image: specified in 11.3.4 of [MPEG-4-30], IMSC1 Image Profile; brand 'im1i'.

   10d. CTA-608 and CTA-708: specified in 11.4 of [MPEG-4-30]; caption data is embedded in SEI messages in the video track; brand 'ccea'.

11. The segments of the tracks containing timed text, images, captions and subtitles SHOULD use the bitrate box "btrt" to signal the bitrate of the track in each segment.
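The signalling choices from items 4-6 and 10 can be summarized in a small lookup table. The values below are taken from the list above; the table itself is only an illustrative summary, not a normative structure.

   # Sketch: subtitle/caption track signalling per items 4-6 and 10.
   SUBTITLE_SIGNALLING = {
       "WebVTT":           {"hdlr": "text",
                            "sample_entry": "WVTTSampleEntry",
                            "ftyp_brand": "cwvt"},
       "TTML IMSC1 text":  {"hdlr": "subt",
                            "sample_entry": "XMLSampleEntry",
                            "ftyp_brand": "im1t"},
       "TTML IMSC1 image": {"hdlr": "subt",
                            "sample_entry": "XMLSampleEntry",
                            "ftyp_brand": "im1i"},
       # CTA-608/708 captions travel as SEI messages in the video track.
       "CTA-608/708":      {"hdlr": None,
                            "sample_entry": None,
                            "ftyp_brand": "ccea"},
   }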
5. Formatting Requirements for Timed Metadata Markers

This section discusses the specific formatting requirements for the ingest of timed metadata related to events and markers for ad insertion, or other timed metadata relating to the media content, such as information about the content.

When delivering a live streaming presentation with a rich client experience, it is often necessary to transmit time-synced events, metadata or other signals in-band with the main media data. An example of these are opportunities for dynamic live ad insertion signalled by SCTE-35 markers. This type of event signalling is different from regular audio/video streaming because of its sparse nature. In other words, the signalling data usually does not occur continuously, and the interval can be hard to predict. Examples of timed metadata are ID3 tags [ID3v2], SCTE-35 markers [SCTE-35] and DASH emsg messages defined in section 5.10.3.3 of [DASH]. For example, DASH event messages contain a schemeIdUri that defines the payload of the message. Table 1 provides some example schemes in DASH event messages and Table 2 illustrates an example of a SCTE-35 marker stored in a DASH emsg. The presented approach allows ingest of timed metadata from different sources, possibly at different locations, by embedding them in sparse metadata tracks.

Table 1: Example DASH emsg schemes

   Scheme URI               | Reference
   -------------------------|------------------------------
   urn:mpeg:dash:event:2012 | [DASH], 5.10.4
   urn:dvb:iptv:cpm:2014    | [DVB-DASH], 9.1.2.1
   urn:scte:scte35:2013:bin | [SCTE-35] 14-3 (2015), 7.3.2
   www.nielsen.com:id3:v1   | Nielsen ID3 in MPEG-DASH

Table 2: Example of a SCTE-35 marker embedded in a DASH emsg

   Tag                     | Value
   ------------------------|---------------------------------------
   scheme_uri_id           | "urn:scte:scte35:2013:bin"
   value                   | the value of the SCTE-35 PID
   timescale               | positive number
   presentation_time_delta | non-negative number expressing the
                           | splice time relative to tfdt
   event_duration          | duration of the event;
                           | "0xFFFFFFFF" indicates unknown duration
   id                      | unique identifier for the message
   message_data            | splice info section including CRC

The following steps are recommended for timed metadata ingest related to events, tags, ad markers and program information; a sketch of constructing such an event message follows the list.

1. Create a fragmentedMP4stream that contains only a sparse metadata track, i.e. a track without audio/video samples.

2. Metadata tracks MAY be signalled in a manifest using an adaptation set with a sparse track; the actual data is in the sparse media track in the segments.

3. For a metadata track the media handler type is "meta" and the track's media header box is a null media header box "nmhd".

4. The URIMetaSampleEntry entry contains, in a URIBox, the URI following the URI syntax in [RFC3986] defining the form of the metadata (see the ISO base media file format specification [ISOBMFF]). For example, for ID3 tags [ID3v2] the URIBox could contain the URL http://www.id3.org.

5. In the case of ID3, a sample contains a single ID3 tag. The ID3 tag may contain one or more ID3 frames.

6. In the case of DASH emsg, a sample may contain one or more event message ("emsg") boxes. Version 0 event messages SHOULD be used. The presentation_time_delta field is relative to the absolute timestamp specified in the TrackFragmentBaseMediaDecodeTimeBox ("tfdt"). The timescale field should match the value specified in the media header box "mdhd".

7. In the case of DASH emsg, the kind box (contained in the udta box) MUST be used to signal the scheme URI of the type of metadata.

8. A BitRateBox ("btrt") SHOULD be present at the end of the MetaDataSampleEntry to signal the bitrate information of the stream.

9. If the specific format uses internal timing values, then the timescale must match the timescale field set in the media header box "mdhd".

10. All timed metadata samples are sync samples [ISOBMFF], defining the entire set of metadata for the time interval they cover. Hence, the sync sample table box is not present.

11. When timed metadata is stored in a TrackRunBox ("trun"), a single sample is present with the duration set to the duration for that run.

Given the sparse nature of the signalling events, the following is recommended:

12. At the beginning of the live event, the encoder or media ingest source sends the initial header boxes to the processing entity/publishing point, which allows the service to register the sparse track.

13. When sending segments, the encoder SHOULD start sending from the header boxes, followed by the new fragments.

14. The sparse track segment becomes available to the publishing point/processing entity when the corresponding parent track fragment that has an equal or larger timestamp value is made available. For example, if the sparse fragment has a timestamp of t=1000, it is expected that after the publishing point/processing entity sees "video" (assuming the parent track name is "video") fragment timestamp 1000 or beyond, it can retrieve the sparse fragment t=1000. Note that the actual signal could be used for a different position in the presentation timeline for its designated purpose. In this example, it is possible that the sparse fragment of t=1000 has an XML payload for inserting an ad at a position that is a few seconds later.

15. The payload of sparse track fragments can be in different formats (such as XML, text, or binary), depending on the scenario.
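As an illustration of step 6 and Table 2, the sketch below serializes a version 0 "emsg" box carrying a placeholder SCTE-35 splice_info_section. All field values are hypothetical examples.

   import struct

   def make_emsg_v0(scheme_id_uri, value, timescale,
                    presentation_time_delta, event_duration,
                    event_id, message_data):
       # Version 0 "emsg" per [DASH] 5.10.3.3; the field order
       # matches Table 2 above.
       payload = (scheme_id_uri.encode("ascii") + b"\x00" +
                  value.encode("ascii") + b"\x00" +
                  struct.pack(">IIII", timescale,
                              presentation_time_delta,
                              event_duration, event_id) +
                  message_data)
       header = struct.pack(">I", 12 + len(payload)) + b"emsg"
       return header + b"\x00\x00\x00\x00" + payload  # version 0, flags 0

   splice = b"\xfc"  # placeholder for a splice_info_section with CRC
   emsg = make_emsg_v0("urn:scte:scte35:2013:bin", "1234",
                       90000, 450000, 0xFFFFFFFF, 1, splice)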
6. Guidelines for Handling of Media Processing Entity Failover

Given the nature of live streaming, good failover support is critical for ensuring the availability of the service. Typically, media services are designed to handle various types of failures, including network errors, server errors and storage issues. When used in conjunction with proper failover logic on the live encoder side, customers can achieve a highly reliable live streaming service from the cloud. In this section we discuss service failover scenarios, in which the failure happens somewhere within the service and manifests itself as a network error. Here are some recommendations for the encoder implementation for handling service failover; a retry sketch follows the list.

1. Use a 10-second timeout for establishing the TCP connection. If an attempt to establish the connection takes longer than 10 seconds, abort the operation and try again.

2. Use a short timeout for sending the HTTP requests. If the target segment duration is N seconds, use a send timeout between N and 2N seconds; for example, if the segment duration is 6 seconds, use a timeout of 6 to 12 seconds. If a timeout occurs, reset the connection, open a new connection, and resume stream ingest on the new connection. This is needed to avoid latency introduced by failing connectivity in the workflow.

3. Completely resend, from the ingest source, segments for which the connection was terminated early.

4. We recommend that the encoder or ingest source does NOT limit the number of retries to establish a connection or to resume streaming after a TCP error occurs.

5. After a TCP error:

   a. The current connection MUST be closed, and a new connection MUST be created for a new HTTP POST request.

   b. The new HTTP POST URL MUST be the same as the initial POST URL for the segment to be ingested.

   c. The new HTTP POST MUST include stream headers ("ftyp" and "moov" boxes) that are identical to the stream headers in the initial POST request for fragmented media ingest.

   d. The last two fragments sent for each segment MAY be retransmitted. Other ISOBMFF fragment timestamps MUST increase continuously, even across HTTP POST requests.

6. The encoder or ingest source SHOULD terminate the HTTP POST request if data is not being sent at a rate commensurate with the MP4 segment duration. An HTTP POST request that does not send data can prevent publishing points or media processing entities from quickly disconnecting from the live encoder or media ingest source in the event of a service update. For this reason, the HTTP POST for sparse (ad signal) tracks SHOULD be short-lived, terminating as soon as the sparse fragment is sent.

In addition, this draft defines responses to the POST requests in order to signal their status to the live media source:

7. In case the media processing entity cannot process the manifest or segment POST request due to authentication or permission problems, it returns a permission denied HTTP 403 (Forbidden).

8. In case the media processing entity can process the manifest or segment posted to the POST_URL, it returns HTTP 200 (OK) or 202 (Accepted).

9. In case the media processing entity can process the manifest or segment POST request but finds that the media type cannot be supported, it returns HTTP 415 (Unsupported Media Type).

10. In case an unknown error happened during the processing of the HTTP POST request, an HTTP 400 (Bad Request) is returned.

11. In case the media processing entity cannot process a posted segment due to a missing init segment, an HTTP 412 (Precondition Failed) is returned.

12. In case a media source receives an HTTP 412 response, it SHOULD resend the manifest and the "ftyp" and "moov" boxes for the track.
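A minimal retry sketch for recommendations 1, 2, 4 and 5 above; the timeout arithmetic follows the stated values, while the host, path and Python API are illustrative assumptions.

   import http.client

   SEGMENT_DURATION = 6                 # hypothetical target, seconds
   CONNECT_TIMEOUT = 10                 # recommendation 1
   SEND_TIMEOUT = 2 * SEGMENT_DURATION  # recommendation 2: N..2N

   def post_with_failover(host, path, init_segment, segment):
       # Recommendation 4: do not limit the number of retries.
       while True:
           conn = http.client.HTTPSConnection(host,
                                              timeout=CONNECT_TIMEOUT)
           try:
               conn.connect()
               conn.sock.settimeout(SEND_TIMEOUT)
               # Recommendations 5b/5c: same URL, stream headers resent.
               conn.request("POST", path, body=init_segment + segment)
               resp = conn.getresponse()
               resp.read()
               return resp.status
           except OSError:  # includes connect/send timeouts, TCP errors
               # Recommendation 5a: close and retry on a new connection.
               conn.close()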
An example of media ingest with failures and HTTP responses is shown in the following figure:

   live media source                        media processing entity
   =================                        ========================

   ================= Initial Manifest Sending ====================

     |--- POST /prefix/media.mpd ---------------------------->>|
     |<<-- 200 OK (success) ------------------------------------|
     |<<-- 403 Forbidden (permission denied) --------------------|
     |<<-- 400 Bad Request --------------------------------------|
     |<<-- 415 Unsupported Media Type ----------------------------|

   ==================== Segment Sending ==========================

     |--- POST /prefix/chunk.cmaf ---------------------------->>|
     |<<-- 200 OK (success) -------------------------------------|
     |<<-- 202 Accepted (success) --------------------------------|
     |<<-- 403 Forbidden (permission denied) ---------------------|
     |<<-- 400 Bad Request ---------------------------------------|
     |<<-- 415 Unsupported Media Type -----------------------------|
     |<<-- 412 Precondition Failed (missing init segment) ---------|

7. Guidelines for Handling of Live Media Source Failover

Encoder or media ingest source failover is the second type of failover scenario that needs to be addressed for end-to-end live streaming delivery. In this scenario, the error condition occurs on the encoder side. The following expectations apply to the live ingestion endpoint when encoder failover happens; a timestamp alignment sketch follows the list.

1. A new encoder or media ingest source instance SHOULD be created to continue streaming.

2. The new encoder or media ingest source MUST use the same URL for HTTP POST requests as the failed instance.

3. The new encoder or media ingest source POST request MUST include the same header boxes "moov" and "ftyp" as the failed instance.

4. The new encoder or media ingest source MUST be properly synced with all other running encoders for the same live presentation to generate synced audio/video samples with aligned fragment boundaries. This implies that UTC timestamps for fragments in the "tfdt" match between encoders, and that encoders start running at an appropriate segment boundary.

5. The new stream MUST be semantically equivalent to the previous stream, and interchangeable at the header and media fragment levels.

6. The new encoder or media ingest source SHOULD try to minimize data loss. The basemediadecodetime "tfdt" of media fragments SHOULD increase from the point where the encoder last stopped. The basemediadecodetime in the "tfdt" box SHOULD increase in a continuous manner, but it is permissible to introduce a discontinuity, if necessary. Media processing entities or publishing points can ignore fragments that they have already received and processed, so it is better to err on the side of resending fragments than to introduce discontinuities in the media timeline.
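As an illustration of expectation 4, a replacement encoder might derive its first fragment decode time from UTC so that fragment boundaries align with the other running encoders. The timescale, epoch and fragment duration below are hypothetical.

   import time

   TIMESCALE = 90000       # hypothetical timescale (multiple of frame rate)
   EPOCH = 0               # hypothetical UTC origin of the presentation
   FRAGMENT_TICKS = 2 * TIMESCALE  # e.g. 2-second fragments

   def utc_aligned_tfdt():
       # Expectation 4: sync to UTC and start on a fragment boundary
       # so the new instance produces fragments aligned with the
       # encoders that are still running.
       now_ticks = int((time.time() - EPOCH) * TIMESCALE)
       return (now_ticks // FRAGMENT_TICKS) * FRAGMENT_TICKS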
8. Security Considerations

No security considerations apply except the ones mentioned in the preceding text. Further security considerations will be added when they become known.

9. IANA Considerations

This memo includes no request to IANA.

10. Contributors

Arjen Wagenaar, Dirk Griffioen, Unified Streaming B.V.

We thank all of the individual contributors to the discussions in [fmp4git], representing major content delivery networks, broadcasters, commercial encoders and cloud service providers.

11. References

11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[DASH] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats", ISO/IEC 23009-1:2014, 2014.

[SCTE-35] Society of Cable Telecommunications Engineers, "Digital Program Insertion Cueing Message for Cable", ANSI/SCTE 35 2013.

[ISOBMFF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology -- Coding of audio-visual objects -- Part 12: ISO base media file format", ISO/IEC 14496-12:2012.

[HEVC] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 2: High efficiency video coding", ISO/IEC 23008-2:2015, 2015.

[RFC793] Postel, J., "Transmission Control Protocol", IETF RFC 793, 1981.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", IETF RFC 3986, 2005.

[RFC1035] Mockapetris, P., "Domain Names - Implementation and Specification", IETF RFC 1035, 1987.

[CMAF] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology (MPEG-A) -- Part 19: Common media application format (CMAF) for segmented media", ISO/IEC International Standard.

[RFC5234] Crocker, D., "Augmented BNF for Syntax Specifications: ABNF", IETF RFC 5234, 2008.

[CENC] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology -- MPEG systems technologies -- Part 7: Common encryption in ISO base media file format files", ISO/IEC 23001-7:2016.

[MPEG-4-30] MPEG ISO/IEC JTC1/SC29 WG11, "Information technology -- Coding of audio-visual objects -- Part 30: Timed text and other visual overlays in ISO base media file format", ISO/IEC 14496-30:2014.

[ISO639-2] ISO, "Codes for the Representation of Names of Languages -- Part 2", ISO 639-2:1998.

[DVB-DASH] ETSI Digital Video Broadcasting, "MPEG-DASH Profile for Transport of ISOBMFF Based DVB Services over IP Based Networks", ETSI TS 103 285.

[RFC7617] Reschke, J., "The 'Basic' HTTP Authentication Scheme", IETF RFC 7617, September 2015.

11.2. Informative References

[RFC2616] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1", IETF RFC 2616, June 1999.

[RFC2818] Rescorla, E., "HTTP Over TLS", IETF RFC 2818, May 2000.

11.3. URL References

[fmp4git] Unified Streaming GitHub fmp4 ingest, https://github.com/unifiedstreaming/fmp4-ingest

[MozillaTLS] Mozilla Wiki, Security/Server Side TLS, https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28default.29 (last accessed 30th of March 2018)

[ID3v2] Nilsson, M., "ID3 tag version 2.4.0 - Main Structure", http://id3.org/id3v2.4.0-structure, November 2000 (last accessed 2nd of May 2018)

Author's Address

Rufael Mekuria (editor)
Unified Streaming
Overtoom 60
1054HK

Phone: +31 (0)202338801
E-Mail: rufael@unified-streaming.com