1. Introduction

This memo describes a Real-time Transport Protocol (RTP) payload format for Society of Motion Picture and Television Engineers (SMPTE) Ancillary data (ANC), as defined by SMPTE ST 291-1 [ST291]. ANC data is transmitted in the ancillary space of serial digital video interfaces, that is, the space outside of the active video region of images intended for users to view. Ancillary space roughly corresponds to the vertical and horizontal blanking periods required by cathode-ray-tube displays. ANC can carry a range of data types, including time code, Closed Captioning, and the Active Format Description (AFD).

ANC is generally associated with the carriage of metadata within the bit stream of a Serial Digital Interface (SDI) such as SMPTE ST 259 [ST259], the standard definition (SD) Serial Digital Interface (with ANC data inserted as per SMPTE ST 125 [ST125]), or SMPTE ST 292-1 [ST292], the 1.5 Gb/s Serial Digital Interface for high definition (HD) television applications.

ANC data packet payload definitions for a specific application are specified by a SMPTE Standard, Recommended Practice, Registered Disclosure Document, or by a document generated by another organization, a company, or an individual (an Entity). When a payload format is registered with SMPTE, an application document describing the payload format is required, and the registered ancillary data packet is identified by a registered data identification word.

This memo describes an RTP payload that supports carriage of ANC data packets originating from any location within any SMPTE-defined SDI signal, as well as ANC data packets that did not originate in an SDI signal. Sufficient information is provided to enable the ANC data packets at the output of the decoder to be restored to their original locations in the serial digital video signal raster (if that is desired). An optional Media Type parameter allows for signaling of carriage of one or more types of ANC data as specified by Data Identification (DID) or Secondary Data Identification (SDID) words. Another optional Media Type parameter allows for the identification of a link number, stream number, or image number of multi-link, multi-stream, or multi-image SDI interfaces.

It should be noted that the ancillary data flag (ADF) word is not specifically carried in this RTP payload. The ADF may be specified in a document defining an interconnecting digital video interface; otherwise, a default ADF is specified by SMPTE ST 291-1 [ST291].

This ANC payload can be used by itself, or used along with a range of RTP video formats. In particular, it has been designed so that it could be used along with RFC 4175 [RFC4175] "RTP Payload Format for Uncompressed Video" or RFC 5371 [RFC5371] "RTP Payload Format for JPEG 2000 Video Streams."

The data model in this document for the ANC data RTP payload is based on the data model of SMPTE ST 2038 [ST2038], which standardizes the carriage of ANC data packets in an MPEG-2 Transport Stream.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. RTP Payload Format for SMPTE ST 291 Ancillary Data

An example of the format of an RTP packet containing SMPTE ST 291 Ancillary Data is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    |V=2|P|X| CC    |M|    PT       |        sequence number        |
    |                           timestamp                           |
    |           synchronization source (SSRC) identifier            |
    |   Extended Sequence Number    |            Length             |
    | ANC_Count     | F |                reserved                   |
    |C|   Line_Number       |   Horizontal_Offset   |    reserved   |
    |         DID       |        SDID       |   Data_Count      |
                |   Checksum_Word   |         word_align            |
    |C|   Line_Number       |   Horizontal_Offset   |    reserved   |
    |         DID       |        SDID       |   Data_Count      |
                                    |   Checksum_Word   |word_align |

Figure 1: SMPTE Ancillary Data RTP Packet Format

In this example, two ANC data packets are present. The first has four 10-bit User Data Words, and the second has five 10-bit User Data Words.

Use of the term "network byte order" in the payload format shall be as defined in RFC 791 [RFC0791].

RTP packet header fields SHALL be interpreted as per RFC 3550 [RFC3550], with the following specifics:

Timestamp: 32 bits

The timestamp field is interpreted in a similar fashion to RFC 4175 [RFC4175]:
For progressive scan video, the timestamp denotes the sampling instant of the frame to which the ancillary data in the RTP packet belongs. RTP packets MUST NOT include ANC data from multiple frames, and all RTP packets with ANC data belonging to the same frame MUST have the same timestamp.
For interlaced video, the timestamp denotes the sampling instant of the field to which the ancillary data in the RTP packet belongs. RTP packets MUST NOT include ANC data from multiple fields, and all RTP packets belonging to the same field MUST have the same timestamp.
If the sampling instant does not correspond to an integer value of the clock, the value SHALL be truncated to the next lowest integer, with no ambiguity. Section 3.1 describes recommended timestamp clock rates.
Marker bit (M): 1 bit

The marker bit set to "1" indicates the last ANC RTP packet for a frame (for progressive scan video) or the last ANC RTP packet for a field (for interlaced video).
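As an informal illustration of the truncation rule for the timestamp, the following Python sketch derives a timestamp from a frame (or field) index. The function name, the use of a frame index, and the assumption that the timestamp starts at zero are illustrative only; real RTP senders normally start from a random initial timestamp as described in RFC 3550 [RFC3550].

```python
from fractions import Fraction


def rtp_timestamp(frame_index: int, frame_rate: Fraction,
                  clock_rate: int = 90000) -> int:
    """Timestamp of the sampling instant of frame/field `frame_index`.

    Assumption (not from this memo): the stream starts at timestamp 0.
    """
    # Exact sampling instant in clock ticks, kept as a rational number.
    ticks = Fraction(frame_index) * clock_rate / frame_rate
    # Truncate to the next lowest integer, then wrap to the 32-bit field.
    return (ticks.numerator // ticks.denominator) % (1 << 32)


# 24000/1001 frames per second at 90 kHz gives 3753.75 ticks per frame,
# so odd multiples do not land on an integer tick and are truncated.
print(rtp_timestamp(1, Fraction(24000, 1001)))  # 3753
```

Using exact rational arithmetic avoids floating-point rounding deciding which side of an integer boundary a sampling instant falls on.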

2.1. Payload Header Definitions

The ANC RTP payload header fields are defined as:

Extended Sequence Number: 16 bits

The high order bits of the extended 32-bit sequence number, in network byte order. This is the same as the Extended Sequence Number field in RFC 4175 [RFC4175].
Length: 16 bits

Number of octets of the ANC RTP payload, beginning with the "C" bit of the first ANC packet data, as an unsigned integer in network byte order. Note that all word_align fields contribute to the calculation of the length field.
ANC_Count: 8 bits

This field is the count of the total number of ANC data packets carried in the RTP payload, as an unsigned integer. A single ANC RTP packet payload cannot carry more than 255 ANC data packets.
If more than 255 ANC data packets need to be carried in a field or frame, additional RTP packets carrying ANC data MAY be sent with the same RTP timestamp but with different sequence numbers. An ANC_Count of 0 indicates that there are no ANC data packets in the payload (for example, an RTP packet with the marker bit set to indicate the last ANC RTP packet in a field/frame may carry no actual ANC data packets). If ANC_Count is 0, Length will also be 0.
F: 2 bits

These two bits relate to signaling the field specified by the RTP timestamp in an interlaced SDI raster. A value of 0b00 indicates that either the video format is progressive or that no field is specified. A value of 0b10 indicates that the timestamp refers to the first field of an interlaced video signal. A value of 0b11 indicates that the timestamp refers to the second field of an interlaced video signal. The value 0b01 is not valid.
reserved: 22 bits

The 22 reserved bits of value "0" follow the F field to ensure that the first ANC data packet header field in the payload begins 32-bit word-aligned to ease implementation.
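The payload header layout above can be decoded with a few shifts and masks. The following Python sketch is one possible reading of these field definitions, not a normative parser; the function name, the dictionary keys, and the choice to raise ValueError on inconsistent input are assumptions made for illustration.

```python
import struct


def parse_anc_payload_header(payload: bytes, rtp_seq: int) -> dict:
    """Decode the 8-octet ANC payload header that follows the RTP header."""
    if len(payload) < 8:
        raise ValueError("payload shorter than the 8-octet ANC payload header")
    # Network byte order: 16-bit Extended Sequence Number, 16-bit Length,
    # then one 32-bit word holding ANC_Count (8), F (2), and reserved (22).
    ext_seq, length, word = struct.unpack("!HHI", payload[:8])
    anc_count = word >> 24
    f = (word >> 22) & 0b11
    if f == 0b01:
        raise ValueError("F value 0b01 is not valid")
    if anc_count == 0 and length != 0:
        raise ValueError("Length must be 0 when ANC_Count is 0")
    # Length counts octets starting at the first ANC packet's C bit,
    # so it excludes the 8 header octets decoded here.
    if length > len(payload) - 8:
        raise ValueError("Length exceeds the octets actually received")
    field_names = {0b00: "progressive or unspecified",
                   0b10: "first field",
                   0b11: "second field"}
    return {
        # The high-order 16 bits extend the 16-bit RTP sequence number.
        "extended_seq": (ext_seq << 16) | (rtp_seq & 0xFFFF),
        "length": length,
        "anc_count": anc_count,
        "field": field_names[f],
    }
```

Here `rtp_seq` is the 16-bit sequence number taken from the RTP header, which the caller is assumed to have parsed already.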

For each ANC data packet in the payload, the following ANC data packet header fields MUST be present:

C: 1 bit

For HD signals, this flag, when set to "1", indicates that the ANC data corresponds to the color-difference channel (C). When set to "0", this flag indicates that the ANC data corresponds to the luma (Y) channel. For SD signals, this flag SHALL be set to "0". For ANC data packets that do not originate from SDI sources, if the ANC data type definition requires the use of the C or Y channel, the C flag SHALL reflect that requirement, otherwise the C flag SHALL be set to "0".
Line_Number: 11 bits

This field contains the line number (as defined in ITU-R BT.1700 [BT1700] for SD video or ITU-R BT.1120 [BT1120] for HD video) that corresponds to the location of the ANC data packet in an SDI raster as an unsigned integer in network byte order. A value of 0x7FF (all bits in the field are '1') indicates that the ANC data is carried without a specific line location within the field or frame. A value of 0x7FE indicates that the ANC data may be placed into any legal area of VANC, specifically.
Note that the lines that are available to convey ANC data are as defined in the applicable sample structure specification (e.g., SMPTE 274M [ST274], SMPTE ST 296 [ST296], ITU-R BT.656 [BT656]) and may be further restricted per SMPTE RP 168 [RP168].
Horizontal_Offset: 12 bits

This field defines the location of the ANC data packet in an SDI raster relative to the start of active video (SAV) as an unsigned integer in network byte order. A value of 0 means that the Ancillary Data Flag (ADF) of the ANC data packet begins immediately following SAV. For HD, this is in units of luma sample numbers as specified by the defining document of the particular image (e.g., SMPTE 274M [ST274] for 1920 x 1080 active images, or SMPTE ST 296 [ST296] for 1280 x 720 progressive active images). For SD, this is in units of (27 MHz) multiplexed word numbers, as specified in SMPTE ST 125 [ST125]. A value of 0xFFF (all bits in the field are '1') indicates that the ANC data is carried without any specific location within the line. A value of 0xFFE indicates that the ANC data may be placed into any legal area of HANC, specifically.
Note that HANC space in the digital blanking area will generally have higher luma sample numbers than any samples in the active digital line.
reserved: 8 bits

One octet is reserved between Horizontal_Offset and DID fields, and contains "0" bits. These reserved bits ensure that the ANC data packet begins on a 32-bit word-aligned boundary for ease of implementation.

An ANC data packet with the header fields Line_Number of 0x7FF and Horizontal_Offset of 0xFFF SHALL be considered to be carried without any specific location within the field or frame.
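The per-packet header fields occupy exactly one 32-bit word, so they can be packed and unpacked as shown in the following Python sketch. The constant and function names are hypothetical, chosen only for this illustration.

```python
NO_SPECIFIC_LINE = 0x7FF    # no specific line within the field or frame
ANY_VANC_LINE = 0x7FE       # any legal area of VANC
NO_SPECIFIC_OFFSET = 0xFFF  # no specific location within the line
ANY_HANC_OFFSET = 0xFFE     # any legal area of HANC


def pack_anc_packet_header(c: int, line_number: int,
                           horizontal_offset: int) -> bytes:
    """Build the 32-bit word: C (1), Line_Number (11),
    Horizontal_Offset (12), reserved (8, all zero)."""
    if not (0 <= c <= 1 and 0 <= line_number <= 0x7FF
            and 0 <= horizontal_offset <= 0xFFF):
        raise ValueError("field value out of range")
    word = (c << 31) | (line_number << 20) | (horizontal_offset << 8)
    return word.to_bytes(4, "big")  # network byte order


def parse_anc_packet_header(data: bytes) -> dict:
    """Split the first four octets of an ANC data packet into its fields."""
    if len(data) < 4:
        raise ValueError("need at least 4 octets")
    word = int.from_bytes(data[:4], "big")
    line_number = (word >> 20) & 0x7FF
    horizontal_offset = (word >> 8) & 0xFFF
    return {
        "c": (word >> 31) & 0x1,
        "line_number": line_number,
        "horizontal_offset": horizontal_offset,
        # Both fields all ones: no specific location in the field or frame.
        "unlocated": (line_number == NO_SPECIFIC_LINE
                      and horizontal_offset == NO_SPECIFIC_OFFSET),
    }
```

For example, a luma-channel packet on line 9 that starts immediately after SAV packs to the octets 00 90 00 00.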

For each ANC data packet in the payload, immediately after the ANC data packet header fields, the following data fields MUST be present, with the fields DID, SDID, Data_Count, User_Data_Words, and Checksum_Word representing the 10-bit words carried in the ANC data packet, as per SMPTE ST 291-1 [ST291]:

DID: 10 bits

Data Identification Word
SDID: 10 bits

Secondary Data Identification Word. Used only for a "Type 2" ANC data packet. Note that in a "Type 1" ANC data packet, this word will actually carry the Data Block Number (DBN).
Data_Count: 10 bits

The lower 8 bits of Data_Count, corresponding to bits b7 (MSB) through b0 (LSB) of the 10-bit Data_Count word, contain the actual count of 10-bit words in User_Data_Words. Bit b8 is the even parity for bits b7 through b0, and bit b9 is the inverse (logical NOT) of bit b8.
User_Data_Words: integer number of 10 bit words

User_Data_Words (UDW) are used to convey information of a type as identified by the DID word or the DID and SDID words. The number of 10-bit words in the UDW is defined by the Data_Count field. The 10-bit words are carried in order starting with the most significant bit and ending with the least significant bit.
Checksum_Word: 10 bits

The Checksum_Word can be used to determine the validity of the ANC data packet from the DID word through the UDW. It consists of 10 bits, where bits b8 (MSB) through b0 (LSB) define the checksum value and bit b9 is the inverse (logical NOT) of bit b8. The checksum value is equal to the nine least significant bits of the sum of the nine least significant bits of the DID word, the SDID word, the Data_Count word, and all User_Data_Words in the ANC data packet. The checksum is initialized to zero before calculation, and any end carry resulting from the checksum calculation is ignored.
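The parity bits of the Data_Count word and the Checksum_Word calculation described above can be expressed compactly. The following Python sketch follows the wording of the field definitions in this section; the function names are illustrative and are not taken from SMPTE ST 291-1.

```python
def with_parity(value8: int) -> int:
    """Form a 10-bit word from an 8-bit value: b7..b0 carry the value,
    b8 is even parity over b7..b0, and b9 is the inverse of b8."""
    value8 &= 0xFF
    b8 = bin(value8).count("1") & 1  # 1 when the count of set bits is odd
    b9 = b8 ^ 1
    return (b9 << 9) | (b8 << 8) | value8


def checksum_word(did: int, sdid: int, data_count: int, udw: list) -> int:
    """10-bit Checksum_Word over the DID, SDID, Data_Count and UDW words."""
    # Sum the nine least significant bits of every word, starting from
    # zero, and keep only nine bits so that any end carry is ignored.
    total = sum(w & 0x1FF for w in (did, sdid, data_count, *udw)) & 0x1FF
    b9 = ((total >> 8) & 1) ^ 1  # b9 is the inverse of b8
    return (b9 << 9) | total


# A Data_Count of 4 has a single bit set, so b8 = 1 and b9 = 0.
print(hex(with_parity(4)))  # 0x104
```

A receiver can recompute `checksum_word` over the received words and compare the result with the received Checksum_Word.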

At the end of each ANC data packet in the payload:

word_align: bits as needed to complete 32-bit word

Word align contains as many "0" bits as are needed to complete the last 32-bit word of an ANC data packet's data in the RTP payload. If an ANC data packet in the RTP payload ends aligned with a word boundary, there is no need to add any word alignment bits. Word align should be used even for the last ANC data packet in an RTP packet. Word align should not be used if there are zero ANC data packets being carried in the RTP packet.
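To make the alignment arithmetic concrete, the following Python sketch computes the size of one ANC data packet in the payload and packs its 10-bit words most significant bit first with trailing word_align bits. The helper names are hypothetical. Each packet carries a 32-bit header followed by the DID, SDID, Data_Count, and Checksum_Word words plus the User_Data_Words, all 10 bits wide.

```python
def anc_packet_layout(udw_count: int):
    """Return (octets, word_align_bits) for one ANC data packet."""
    data_bits = 10 * (udw_count + 4)  # DID, SDID, Data_Count, UDW, Checksum
    align_bits = -data_bits % 32      # zero bits up to the next 32-bit word
    octets = 4 + (data_bits + align_bits) // 8  # 4 octets of packet header
    return octets, align_bits


def pack_words10(words: list) -> bytes:
    """Pack 10-bit words MSB first and append the word_align zero bits."""
    bits = 0
    for w in words:
        bits = (bits << 10) | (w & 0x3FF)
    data_bits = 10 * len(words)
    align_bits = -data_bits % 32
    return (bits << align_bits).to_bytes((data_bits + align_bits) // 8, "big")


def payload_length(udw_counts: list) -> int:
    """Value of the Length field for packets with the given UDW counts."""
    return sum(anc_packet_layout(n)[0] for n in udw_counts)


# Packets with four and five User Data Words need 16 and 6 alignment
# bits respectively, and each occupies 16 octets of payload.
print(anc_packet_layout(4), anc_packet_layout(5))  # (16, 16) (16, 6)
```

With these two packets the Length field would be 32, since every word_align field contributes to the length calculation.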

When reconstructing an SDI signal based on this payload, it is important to place ANC data packets into the locations indicated by the ANC payload header fields C, Line_Number and Horizontal_Offset, and also to follow the requirements of SMPTE ST 291-1 [ST291] Section 7 "Ancillary Data Space Formatting (Component or Composite Interface)", which include rules on the placement of initial ANC data into allowed spaces as well as the contiguity of ANC data packet sequences within those spaces in order to assure that the resulting ANC data packets in the SDI signal are valid.

Senders of this payload SHOULD transmit available ANC data packets as soon as practical to reduce end-to-end latency, especially if receivers will be embedding the received ANC data packet into an SDI signal emission. One millisecond is a reasonable upper bound for the amount of time between when an ANC data packet becomes available to a sender and the emission of an RTP payload containing that ANC data packet.

ANC data packets with headers that specify specific location within a field or frame SHOULD be sent in raster scan order, both in terms of packing position within an RTP packet and in terms of transmission time of RTP packets.

3. Payload Format Parameters

This RTP payload format is identified using the video/smpte291 media type, which is registered in accordance with RFC 4855 [RFC4855], and using the template of RFC 6838 [RFC6838].

Note that the Media Type Definition is in the "video" tree due to the expected use of SMPTE ST 291 Ancillary Data along with video formats.

3.1. Media Type Definition

Type name: video

Subtype name: smpte291

Required parameters:


Rate: RTP timestamp clock rate.
When an ANC RTP stream is to be associated with an RTP video stream, the RTP timestamp rates SHOULD be the same to ensure that ANC data packets can be associated with the appropriate frame or field. Otherwise, a 90 kHz rate SHOULD be used.
Note that techniques described in RFC 7273 [RFC7273] can provide a common reference clock for multiple RTP streams intended for synchronized presentation.

Optional parameters:


DID_SDID: Data identification and Secondary data identification words.
The presence of the DID_SDID parameters signals that all ancillary data packets of this stream are of a particular type or types, i.e., labeled with particular DIDs and SDIDs. DID and SDID values of SMPTE Registered ANC packet types can be found at the SMPTE Registry for Data Identification Word Assignments [SMPTE-RA] web site.
"Type 1" ANC packets (which do not have SDIDs defined) SHALL be labeled with SDID=0x00.
DID and SDID values can be registered with SMPTE as per SMPTE ST 291-1 [ST291].
The absence of the DID_SDID parameter signals that in order to determine the DID and SDID of ANC packets in the payload, the DID and SDID fields of each ANC packet must be inspected.

LinkNumber: This integer parameter specifies that ANC data in the stream is associated with a specific link number, stream number, or image number of multi-link, multi-stream, or multi-image SDI interfaces.

Encoding considerations: This media type is framed and binary; see Section 4.8 of RFC 6838 [RFC6838].

Security considerations: See Section 7 of [this RFC]

Interoperability considerations: Data items in smpte291 can be very diverse. Receivers might only be capable of interpreting a subset of the possible data items. Some implementations may care about the location of the ANC data packets in the SDI raster, but other implementations may not care.

Published specification: [this RFC]

Applications that use this media type: Devices that stream real-time professional video, especially those that must interoperate with legacy serial digital interfaces (SDI).

Additional Information:

Person & email address to contact for further information: T. Edwards <thomas.edwards@fox.com>, IETF Payload Working Group <payload@ietf.org>

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP (RFC 3550 [RFC3550]). Transport within other framing protocols is not defined at this time.

Author: T. Edwards <thomas.edwards@fox.com>

Change controller: The IETF PAYLOAD working group, or other party as designated by the IESG.

4. SDP Considerations

The mapping of the above defined payload format media type and its parameters SHALL be done according to Section 3 of RFC 4855 [RFC4855].

DID and SDID values SHALL be specified in hexadecimal with a "0x" prefix (such as "0x61"). The ABNF as per RFC 5234 [RFC5234] of the DID_SDID optional parameter SHALL be:

        TwoHex = "0x" 1*2(HEXDIG)
        DidSdid = "DID_SDID={" TwoHex "," TwoHex "}"

For example, EIA 608 Closed Caption data would be signaled with the parameter DID_SDID={0x61,0x02}. If a DID_SDID parameter is not specified, then the ancillary data stream may potentially contain ancillary data packets of any type.

Multiple DID_SDID parameters may be specified (separated by semicolons) to signal the presence of multiple types of ANC data in the stream. DID_SDID={0x61,0x02};DID_SDID={0x41,0x05}, for example, signals the presence of EIA 608 Closed Captions as well as AFD/Bar Data. Multiple DID_SDID parameters do not imply any particular ordering of the different types of ANC packets in the stream.

If the optional parameter LinkNumber is present, it SHALL be present only once in the semicolon-separated list, taking a single integer value.

A sample SDP mapping for ancillary data is as follows:

        m=video 30000 RTP/AVP 112
        a=rtpmap:112 smpte291/90000
        a=fmtp:112 DID_SDID={0x61,0x02};DID_SDID={0x41,0x05};LinkNumber=3

In this example, a dynamic payload type 112 is used for ancillary data. The 90 kHz RTP timestamp rate is specified in the "a=rtpmap" line after the subtype. In the "a=fmtp:" line, DID 0x61 and SDID 0x02 are specified (registered to EIA 608 Closed Caption Data by SMPTE), and also DID 0x41 and SDID 0x05 (registered to AFD/Bar Data). The LinkNumber is 3.
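A receiver might extract the optional parameters from the "a=fmtp:" value along the following lines. This Python sketch is illustrative only: the function name and the returned dictionary are assumptions, unknown parameters are simply ignored, and the pattern is stricter than the ABNF in that it treats the parameter name and the "0x" prefix as case-sensitive.

```python
import re

# "0x" followed by one or two hexadecimal digits, twice, inside braces.
_DID_SDID_RE = re.compile(
    r"DID_SDID=\{0x([0-9A-Fa-f]{1,2}),0x([0-9A-Fa-f]{1,2})\}")


def parse_smpte291_fmtp(params: str) -> dict:
    """Parse the semicolon-separated parameter list of an a=fmtp line."""
    did_sdid = []
    link_number = None
    for item in (p.strip() for p in params.split(";")):
        if not item:
            continue
        match = _DID_SDID_RE.fullmatch(item)
        if match:
            did_sdid.append((int(match.group(1), 16),
                             int(match.group(2), 16)))
        elif item.startswith("DID_SDID="):
            raise ValueError("malformed DID_SDID parameter: " + item)
        elif item.startswith("LinkNumber="):
            if link_number is not None:
                raise ValueError("LinkNumber may appear only once")
            link_number = int(item[len("LinkNumber="):])
        # Any other parameter is ignored in this sketch.
    return {"did_sdid": did_sdid, "link_number": link_number}


print(parse_smpte291_fmtp(
    "DID_SDID={0x61,0x02};DID_SDID={0x41,0x05};LinkNumber=3"))
```

An empty `did_sdid` list corresponds to the case where no DID_SDID parameter is given, meaning the stream may contain ANC data packets of any type.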

4.1. Grouping ANC Streams with other Media Streams

To associate an ANC RTP stream with other media streams, implementers may wish to use the Lip Synchronization (LS) grouping defined in RFC 5888 [RFC5888], which requires that "m" lines that are grouped together using LS semantics MUST synchronize the playout of the corresponding media streams.

A sample SDP mapping for grouping ANC data with RFC 4175 video using LS semantics is as follows:

        v=0
        o=Al 123456 11 IN IP4 host.example.com
        s=Professional Networked Media Test
        i=A test of synchronized video and ANC data
        t=0 0
        a=group:LS V1 M1
        m=video 50000 RTP/AVP 96
        c=IN IP4
        a=rtpmap:96 raw/90000
        a=fmtp:96 sampling=YCbCr-4:2:2; width=1280; height=720; depth=10
        a=mid:V1
        m=video 50010 RTP/AVP 97
        c=IN IP4
        a=rtpmap:97 smpte291/90000
        a=fmtp:97 DID_SDID={0x61,0x02};DID_SDID={0x41,0x05}
        a=mid:M1

5. Offer/Answer Model and Declarative Considerations

Receivers may wish to receive ANC data streams with specific DID_SDID parameters. Thus, when offering ANC data streams using the Session Description Protocol (SDP) in an Offer/Answer model [RFC3264] or in a declarative manner (e.g., SDP in the Real-Time Streaming Protocol (RTSP) [RFC2326] or the Session Announcement Protocol (SAP) [RFC2974]), the offerer may provide a list of ANC streams available with specific DID_SDID parameters in the fmtp line. The answerer may respond with all or a subset of the streams offered, along with fmtp lines with all or a subset of the DID_SDID parameters offered, or the answerer may reject the offer. There are no restrictions on updating DID_SDID parameters in a subsequent offer.

6. IANA Considerations

One media type (video/smpte291) has been defined and needs registration in the media types registry. See Section 3.1.

7. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in "Options for Securing RTP Sessions" [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this security considerations section discusses the security-impacting properties of the payload format itself.

To avoid potential buffer overflow attacks, receivers should take care to validate that the ANC data packets in the RTP payload are of the appropriate length (using the Data_Count field) for the ANC data type specified by DID & SDID. Also the Checksum_Word should be checked against the ANC data packet to ensure that its data has not been damaged in transit, but the Checksum_Word is unlikely to provide a payload integrity check in case of a directed attack.
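The structural part of these checks can be sketched in Python as shown below. The function names are hypothetical, and the sketch only verifies that a packet is internally consistent and fits within the received octets. Checking Data_Count against the length expected for a given DID and SDID requires a table from the relevant application documents, which is not reproduced here.

```python
def read_words10(buf: bytes, bit_offset: int, count: int) -> list:
    """Read `count` 10-bit words, MSB first, starting at `bit_offset`."""
    total_bits = 8 * len(buf)
    if bit_offset + 10 * count > total_bits:
        raise ValueError("10-bit words run past the end of the buffer")
    value = int.from_bytes(buf, "big")
    return [(value >> (total_bits - bit_offset - 10 * (i + 1))) & 0x3FF
            for i in range(count)]


def check_anc_packet(payload: bytes, offset: int) -> int:
    """Validate the ANC data packet starting at octet `offset` and return
    the offset of the next packet. Raises ValueError on any failure."""
    body_bit = 8 * (offset + 4)  # skip the 32-bit packet header word
    did, sdid, dc = read_words10(payload, body_bit, 3)
    n = dc & 0xFF                # number of User_Data_Words
    parity = bin(n).count("1") & 1
    if (dc >> 8) & 1 != parity or (dc >> 9) & 1 == parity:
        raise ValueError("Data_Count parity bits are inconsistent")
    # DID, SDID, Data_Count, n UDW, Checksum_Word; bounds-checked read.
    words = read_words10(payload, body_bit, n + 4)
    total = sum(w & 0x1FF for w in words[:-1]) & 0x1FF
    expected = ((((total >> 8) & 1) ^ 1) << 9) | total
    if words[-1] != expected:
        raise ValueError("Checksum_Word mismatch")
    data_bits = 10 * (n + 4)
    return offset + 4 + (data_bits + (-data_bits % 32)) // 8
```

A receiver would call `check_anc_packet` once per packet announced by ANC_Count, starting at the first octet after the payload header, and could additionally confirm that the final offset equals the Length field.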

Some receivers will simply move the ANC data packet bits from the RTP payload into a serial digital interface (SDI). It may still be a good idea for these "re-embedders" to perform the above-mentioned validity tests to prevent downstream SDI systems from becoming confused by bad ANC data packets, which could be used for a denial-of-service attack.

"Re-embedders" into SDI should also double check that the Line_Number and Horizontal_Offset lead to the ANC data packet being inserted into a legal area to carry ancillary data in the SDI video bit stream of the output video format.

8. References

