INTERNET-DRAFT                                              Jack Brassil
AVT Working Group                                                HP Labs
                                                     Henning Schulzrinne
November 15, 2000                                    Columbia University
Expires: April 15, 2001


                  RTP Payload Format for Program Cues


STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This memo defines a payload format for carrying program timing and identification cues in RTP [2] packets.

1. Introduction

This memo describes how to carry program timing information in RTP packets; such packets are known as program cues. A cue is a signaling message inserted in, or associated with, an RTP stream. Each cue conveys information about the structure or semantics of a program. Cues typically indicate events whose precise timing is significant to receivers, such as the start or stop time of a program segment.

Embedding cues in RTP streams facilitates the creation of applications which receive and process one or more RTP streams. These stream processing applications might exist at a receiver or at a network intermediary (e.g., gateway or proxy). Examples of applications whose implementations are eased by the presence of cues include program recording, insertion, switching, or adaptation; see [3] for a discussion of these applications.
Such applications typically require relatively tight time synchronization with arriving media packets to operate correctly. Failure to maintain precise time synchronization, say when switching between two source streams, could result in undesired perceptible artifacts when the resulting stream is rendered. Tight time synchronization between applications and media streams is also required in implementations where relatively little media packet buffering is available at a stream processing point.

Brassil and Schulzrinne        Expires April 2001               [Page 1]
INTERNET-DRAFT     RTP Payload Format for Program Cues     November 2000

Cues are optional within an RTP stream. Each cue corresponds to a distinct RTP packet; no attempt is made to piggyback cues on RTP packets containing media data, nor to incorporate multiple cues within a single RTP packet. Cues can, but need not, travel end-to-end (i.e., sender to receiver); network intermediaries which receive an RTP stream with embedded cues may add or remove cues prior to forwarding. For example, an audio stream transmitted from a source internet radio station to affiliate radio stations (for the purpose of rebroadcast) might include certain cues which contain locally significant or private information which need not be forwarded to a receiver of an affiliate's retransmitted stream.

It is anticipated that cues typically will be carried in-band; program cues and media packets will form a single RTP stream on a common connection, with cues distinguished by a separate payload type. However, this document also discusses the possible use of out-of-band cues. In this case, cues and media packets form a single RTP stream but are transmitted on separate connections. Out-of-band delivery of cues might be desirable if privacy is sought, or if the out-of-band communication uses an underlying reliable protocol (e.g., TCP) while the media packets are carried by an unreliable protocol (e.g., UDP).
Out-of-band cues might also be desired in certain rights management or monitoring applications, where receipt of program cues is required but receipt of the media packets themselves might not be necessary.

RTP cues are intended for the limited purpose of conveying program timing information, and other related program information whose precise timing is significant to receivers. Other out-of-band communication mechanisms (e.g., RTCP, SAP, HTTP) should be used to carry program information which is relatively time-insensitive. An example of such program information would be an internet television station's weekly programming schedule announcement or an internet radio station's future playlist.

Transport layer cues provide a media-independent mechanism to convey program structure and facilitate application creation. Program structure, often at a more detailed level, might be conveyed at other protocol layers in addition to the transport layer. Cues enable network intermediaries to perform stream processing without incurring latencies associated with examining RTP payloads and identifying program structure as encoded at the application level. In many cases this will free applications from needing to know about a multiplicity of media encoding formats.

Cues can be related to other types of signaling messages and protocol exchanges. RTCP [2] conveys information about associated RTP streams on a relatively slow time scale. Since cues typically delimit and identify events, they can be related to 'named events' as proposed for telephone signaling over RTP [4]. Cues can also be viewed as protocol messages flowing downstream with content for the purpose of content modification or enhancement at 'edge' servers, somewhat analogous to the role played by the proposed Internet Content Adaptation Protocol (I-CAP) [5] extensions to HTTP.

1.1.
Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 [6] and indicate requirement levels for compliant implementations.

1.2. Cues in Conventional Broadcasting

Various types of program cues are used extensively to facilitate program insertion, switching and recording applications in traditional broadcast media (i.e., radio and television). Examples include:

Radio Data System (RDS) for VHF/FM Broadcasting [7]
   The RDS system provides traffic, station and song information to RDS-capable radio receivers, primarily in cars. Cues, known as flags, are sent out-of-band on unused, alternate frequencies. Receivers can view program information on a small alphanumeric character display, and operate radios in automatic switch-over mode to receive a travel alert or a preferred program type (e.g., news program).

Program Delivery Control (PDC) [8]
   PDC is a system for encoding television program identity and start and stop times in teletext to facilitate recording on PDC-capable VCRs. Program identity labels (PILs) are transmitted at the start of each broadcast, and at one-second intervals throughout. Corresponding published program information (e.g., Gemstar's ShowView and VideoPlus+) can be obtained in printed TV guides and entered into a PDC-capable VCR.

Pass through with Local Program Insertion in Cable Headends
   Insertion of local programming content at conventional analog cable television headends has been aided by DTMF 'cue tones' transmitted along with program content from a signal source. The demodulated tones have been used to automatically trigger remote ad inserters and channel multiplexors [9-10]. Examples of cue functions include an 'Entry' (start, pre-roll) tone delivered 8 seconds prior to a local insertion to provide adequate setup time for insertion equipment initialization.
   A corresponding 'Exit' (stop, switch to network) tone indicates the end of an insertion period.

Cues placed in RTP streams are intended to facilitate the creation of internet services comparable to RDS, PDC, and local ad insertion services in conventional radio and television broadcasting. In addition, the cues described in this document are intended to be extensible, facilitating the creation of new services which enhance the value of media streams.

2. RTP Payload Format

2.1. Introduction

The proposed payload format for cues described below facilitates stream processing at both network 'intermediaries' (i.e., not destinations) and receivers (i.e., destinations). In the former scenario, an intermediary receives one or more RTP streams, processes these streams, and retransmits one or more possibly modified streams to other network intermediaries or receivers. Intermediaries might forward or remove cues that are primarily useful for processing by intermediaries. End systems also process RTP streams with embedded cues, but the streams are terminated (i.e., not forwarded).

2.2. Cue Types

Each program cue represents one of four different types of signals:

Event Notification
   An Event Notification (EN) cue notifies the recipient of the initiation of an event.

Event Termination
   An Event Termination (ET) cue notifies the recipient of the completion of an event.

Event Pending
   An Event Pending (EP) cue notifies the recipient of an upcoming event. Depending on application requirements, a sender may issue multiple (redundant) EPs associated with each event at various times prior to the event.

Event Continuing
   An Event Continuing (EC) cue notifies the recipient that an event is in progress. Depending on application requirements, a sender may issue multiple ECs associated with an ongoing event at various times during the event.
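
The four cue types listed above map naturally onto an enumeration. The following Python sketch is illustrative only: the names CueType and cue_type_from_flags are hypothetical, and the bit positions assigned to the four type flags are an assumption made for this example rather than an encoding taken from this memo. It shows a receiver accepting a cue only when exactly one cue type is indicated.

```python
from enum import Enum
from typing import Optional


class CueType(Enum):
    """The four cue types of Section 2.2."""
    EVENT_NOTIFICATION = "EN"
    EVENT_TERMINATION = "ET"
    EVENT_PENDING = "EP"
    EVENT_CONTINUING = "EC"


# ASSUMPTION: the four type flags are packed into a 4-bit value in the
# order N, T, P, C from most to least significant bit. This memo does
# not state that layout; it is chosen here only for illustration.
_FLAGS_TO_TYPE = {
    0b1000: CueType.EVENT_NOTIFICATION,
    0b0100: CueType.EVENT_TERMINATION,
    0b0010: CueType.EVENT_PENDING,
    0b0001: CueType.EVENT_CONTINUING,
}


def cue_type_from_flags(flags: int) -> Optional[CueType]:
    """Return the cue type, or None when the cue should be ignored.

    A cue is accepted only if exactly one flag is set; zero flags or
    several flags make the type ambiguous.
    """
    return _FLAGS_TO_TYPE.get(flags)
```

A receiver built this way would simply drop any packet for which cue_type_from_flags() returns None.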

A compliant receiving implementation should support the cue types listed above. Each cue must be unambiguously set to one of the above cue types; otherwise the receiver should ignore it.

2.3. Use of RTP Header Fields

Figure 1 depicts the standard RTP version 2 header as used by cues.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   |                               .                               |

   Figure 1: RTP packet header

RTP header fields in cue packets are used as follows:

Timestamp
   The RTP timestamp reflects the measurement point for the event indicated by the current packet. The event duration, as described in Section 2.4, extends forward from that time. The timestamp rate of cues is identical to the timestamp rate of the associated media.

Marker
   The RTP marker bit set to 1 indicates the beginning of an event.

In accordance with current practice, the cue payload format does not have a static Payload Type (PT) number, but uses an RTP payload type number established dynamically and out-of-band.

2.4. Payload Format

The proposed payload format is shown in Fig. 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | event type |N|T|P|C| ver |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            number                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           duration                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             date                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             time                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | time (cont.) | reserved | label bytecount |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             label                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Payload Format for Cues

event type
   The event type specifies the nature of the indicated event. This field is encoded as shown in Section 2.8.

N
   If set to a value of one, the "notification" bit indicates an EN packet.

T
   If set to a value of one, the "termination" bit indicates an ET packet.

P
   If set to a value of one, the "pending" bit indicates an EP packet.

C
   If set to a value of one, the "continuation" bit indicates an EC packet.

The sender must set exactly one of the N, T, P, or C bits to 1 in each packet; otherwise the receiver must ignore the packet.

ver
   This field identifies the cue command protocol version. The sender must set it to 0x0.

number
   The number uniquely identifies an event of specified type. That is, the {event type, number} tuple uniquely describes a distinct event. Depending on application requirements, event numbers may increase sequentially or be associated with a well known, reserved program identifier. If no ID is available, the value 0x00000000 is used.

duration
   The (remaining) duration of an event is the estimated time remaining before completion of the specified event, in timestamp units. The duration of an event for different cue types is interpreted as follows:

   1. An EP packet's duration specifies the time before the expected occurrence of the associated pending event.

   2. An EN packet indicating the start-of-event has a duration set to the expected time until the corresponding end-of-event.

   3. An ET indicating the end-of-event has a duration set to zero. An exception exists if multiple, redundant ETs are issued for the purpose of increased reliability. In that case, the duration of an ET message sent in advance of the event's completion specifies the estimated time before completion of the event.

   4. An EC packet has a duration set to the expected time until the completion of the currently continuing event.

date
   The Society of Motion Picture and Television Engineers (SMPTE) date encoding.

time
   Network Time Protocol (NTP) or SMPTE time encoding (to be determined).

reserved
   This field is reserved for future use. The sender must set it to zero, and the receiver must ignore it.

label bytecount
   The length (in bytes) of the variable-length text field.

label
   A variable-length text field.

2.5. Markers at other Protocol Layers

Information about the structure or semantics of program content might exist at protocol layers other than the Transport Layer. This would be the case, for example, for MPEG-2 transport streams encapsulated in RTP [11]. Hence, it is possible that program cues might contain either redundant or incomplete information relating to a specific event for certain media types.

2.6. Reliable Signaling and use of Redundant Cues

Cues placed in an RTP stream might fail to arrive at their destination.
To obtain reliable signaling while using an unreliable protocol (e.g., UDP), a sending application can issue multiple, redundant cues to signal the same event. A typical acceptable use of redundant cues is issuing multiple EPs followed by an EN packet. An alternate example would be issuing redundant ET packets at the end of an event. Implementations should be capable of properly handling redundant cues, as well as cues that are erroneously duplicated in transit.

Mechanisms to increase the reliability of RTP packet arrival at a destination can be applied to streams containing cues. For example, the forward error correction mechanism described in RFC 2733 [12] may be used to recover from packet loss in streams containing cues.

2.7. Timing of Event Packets

Cues placed in an RTP stream can arrive at a destination out of order. Hence, the precise placement of a cue in an RTP stream is not required. In general, application requirements dictate the appropriate time of transmission of cues in an RTP stream. The timestamp and duration fields of a cue convey the precise time of an event, not the cue's position within an RTP stream nor its sequence number. It is the responsibility of a processing application to buffer enough packets to handle lost or out-of-order cues.

However, placing a cue near in time to its associated event can reduce the need for receiver buffering. Applications should strive to place certain cues in the transmitted stream at approximately the time of the associated event. In particular, an EN cue marking the beginning of an event should be placed in the stream within 50ms of the transmission time of the first media packet associated with the event. An ET cue (with duration set to zero) marking the end of an event should be placed in the stream within 50ms of the transmission time of the last media packet associated with the event.
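
The handling of redundant cues (Section 2.6) and the 50ms placement guideline can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the payload format: the Cue record, the CueFilter class and the helper functions are hypothetical names, the cue fields are assumed to have been decoded from the packet already, and treating the first cue of each type for an event as the one to act on is only one possible receiver policy. RTP timestamps are 32 bits wide, so the arithmetic is done modulo 2^32.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Cue:
    """Already-decoded cue fields (hypothetical record for illustration)."""
    kind: str         # "EN", "ET", "EP" or "EC"
    event_type: int   # event type field
    number: int       # event number field; 0 means no ID is available
    timestamp: int    # RTP timestamp of the cue packet
    duration: int     # remaining duration, in timestamp units


def event_time(cue: Cue) -> int:
    """RTP time the cue points at: timestamp plus duration, modulo 2**32.

    For an EP cue this is the expected start of the pending event; for
    EN, EC and early ET cues it is the expected end of the event.
    """
    return (cue.timestamp + cue.duration) & 0xFFFFFFFF


def sent_within_window(cue_send_ms: int, media_send_ms: int,
                       window_ms: int = 50) -> bool:
    """True if a cue was sent within window_ms of a media packet.

    Both arguments are sender wall-clock transmission times in
    milliseconds (an assumed unit), not RTP timestamps.
    """
    return abs(cue_send_ms - media_send_ms) <= window_ms


class CueFilter:
    """Reports whether a cue is the first of its kind for an event."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, int, int]] = set()

    def is_new(self, cue: Cue) -> bool:
        # Number 0 means no ID is available, so such cues cannot be
        # matched to one another and are always passed through.
        if cue.number == 0:
            return True
        key = (cue.kind, cue.event_type, cue.number)
        if key in self._seen:
            return False  # redundant cue, or a duplicate made in transit
        self._seen.add(key)
        return True
```

For instance, with an assumed 8000 Hz timestamp rate, an EP cue sent 8 seconds before an event carries a duration of 64000. A second EP for the same event is reported as redundant by CueFilter.is_new(), although its timestamp and duration can still be passed to event_time() to refine the estimate of the event's start.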

In general a source should also avoid sending cues to mark events which have occurred at much earlier times, since these cues are unlikely to be useful to the receiver.

2.8. Event numbering

Table 1 summarizes the encoding of the event type field in the cue payload format.

   encoding (decimal)    Event type
   _________________________________________
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   >22                   to be determined

   Table 1: Encoding of event type field.

2.9. Application Example

Consider how program cues might ease coordination between cooperating applications in the following commercial video advertisement insertion application. Suppose a broadcaster transmits an RTP stream to a network affiliate for the purpose of modification and retransmission to receivers. The broadcaster seeks to allow the affiliate a time slot for an out-of-network video program insertion.

The broadcaster issues an EP cue (with event type 13) 8 seconds prior to an interstice suitable for a program insertion. The network affiliate receives the EP cue, and initiates setup of video insertion equipment. A second, redundant EP cue is sent 0.5 seconds prior to the final RTP packet of the program segment preceding the interstice, providing the affiliate with an improved estimate of the upcoming interstice's start time.

Subsequent to the final media packet transmission for the terminating program segment, the broadcaster issues an EN cue (event type 13). Upon receipt of the EN cue the downstream affiliate begins transmitting a new, inserted program to the receivers. EC cues are transmitted by the broadcaster to the affiliate at 1 second intervals during the program interstice period.
Immediately prior to the broadcaster transmitting the first media packet of the subsequent program segment to the affiliate, the broadcaster issues an ET packet to indicate the termination of the interstice.

In the above example, no cues were forwarded to receivers by the affiliate; all cues transmitted by the broadcaster were deemed private and not included in the retransmitted stream.

3. Indicating Cue Usage in SDP and RTSP

Cues can be sent either with media packets or as a separate stream. For the latter case, cues can be sent on separate multicast groups or separate ports from the media. In either case, these configuration options must be indicated out of band. This section describes how this can be accomplished using the Session Description Protocol (SDP), specified in RFC 2327 [13], and the Real Time Streaming Protocol (RTSP), specified in RFC 2326 [14].

3.1. Cues sent separately from a Media Stream

Cues can be sent on a separate connection from media packets. This can mean they are sent on a different port and/or multicast group from the media. When this is done, several pieces of information must be conveyed:

      The address and port to which the cue stream is sent

      The payload type number for the cues

      Which media stream the cues are describing

The payload type number for the cue stream is conveyed in the m line of the Session Description of the associated media, listed as if it were another valid encoding for the stream. There is no static payload type assignment for cues, so dynamic payload type numbers must be used. The binding to the number is indicated by an rtpmap attribute. The name used in this binding is "cues".

The presence of the payload type number in the m line of the associated media does not mean the cues are sent to the same address and port as the media. Instead, this information is conveyed through an fmtp attribute line.
The presence of the cue payload type on the m line of the media serves only to indicate the stream associated with the cues.

Recall that the format for the fmtp line is:

   a=fmtp:<number> <port> <network type> <address type> <connection address>

where 'number' is the payload type number present in the m line. Port is the port number to which the cue stream is sent. The remaining three items - network type, address type, and connection address - have the same syntax and semantics as the c line from SDP. This allows the fmtp line to be partially parsed by the same parser used on the c lines.

The following is an example SDP for cues sent separately from their associated media stream:

   v=0
   o=brassil 2890844526 2890842807 IN IP4 126.16.64.4
   s=Cueing Seminar
   c=IN IP4 224.2.17.12/127
   t=0 0
   m=audio 49170 RTP/AVP 0 78
   a=rtpmap:78 cues/8000
   a=fmtp:78 49172 IN IP4 224.2.17.12/127

The presence of one m line in this SDP indicates that there is a single media stream - audio. The default media format of 0 indicates that the audio is PCM encoded, and an additional associated format for cues has payload type number 78. The cues are sent to the same multicast group and TTL as the audio, but on a port number two higher (49172).

3.2. Usage with RTSP

RTSP [14] can be used to request cues to be sent as a separate stream. When SDP is used with RTSP, the Session Description does not include a connection address and port number for each stream. Instead, RTSP uses the concept of a "Control URL". Control URLs are used in SDP in two distinct ways.

   1. There is a single control URL for all streams. This is referred to as "aggregate control". In this case, the fmtp line for the cue stream is omitted.

   2. There is a Control URL assigned to each stream. This is referred to as "non-aggregate control". In this case, the fmtp line specifies the Control URL for the cue stream. The URL may be used in a SETUP command by an RTSP client.

The format for the fmtp line for cues with RTSP and non-aggregate control is:

   a=fmtp:<number> <control URL>

where 'number' is the payload type number present in the m line.
Control URL is the URL used to control the cue stream. Note that the Control URL does not need to be an absolute URL. The rules for converting a relative Control URL to an absolute URL are given in RFC 2326, Section C.1.1.

4. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification (RFC 1889 [2]), and any appropriate RTP profile (for example RFC 1890 [15]). This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

This payload type does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing that would cause a potential denial-of-service threat.

Additional security considerations are described in RFC 2198.

5. References

[1] S. Bradner, "The Internet Standards Process -- Revision 3," BCP 9, Request for Comments 2026, Internet Engineering Task Force, Oct. 1996.

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a transport protocol for real-time applications," Request for Comments (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996.

[3] J. Brassil and H. Schulzrinne, "Enhancing Internet Streaming Media with Cueing Protocols," to appear in IEEE INFOCOM'01, April 2001.

[4] H. Schulzrinne and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals," Internet Draft, AVT Working Group, Oct. 1999.

[5] http://www.i-cap.org/

[6] S. Bradner, "Key words for use in RFCs to indicate requirement levels," Request for Comments (Best Current Practice) 2119, Internet Engineering Task Force, Mar. 1997.

[7] http://www.rds.org/rds98

[8] http://www.gemstar.co.uk/en/showview/pdc.html

[9] DVS-075, "Cue Commands in Digital Systems," Digital Video Subcommittee, Society of Cable Telecommunications Engineers, Inc., March 25, 1997.

[10] DVS-253, "Digital Program Insertion Cueing Message for Cable," Digital Video Subcommittee, Society of Cable Telecommunications Engineers, Inc., September 27, 1999.

[11] D. Hoffman, et al, "RTP Payload for MPEG1/MPEG2 Video," Request for Comments 2250, Internet Engineering Task Force, Jan. 1998.

[12] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction," Request for Comments 2733, Internet Engineering Task Force, Dec. 1999.

[13] M. Handley and V. Jacobson, "SDP: session description protocol," Request for Comments (Proposed Standard) 2327, Internet Engineering Task Force, Apr. 1998.

[14] H. Schulzrinne, A. Rao, and R. Lanphier, "Real Time Streaming Protocol (RTSP)," Request for Comments (Proposed Standard) 2326, Internet Engineering Task Force, April 1998.

[15] H. Schulzrinne, "RTP profile for audio and video conferences with minimal control," Request for Comments (Proposed Standard) 1890, Internet Engineering Task Force, Jan. 1996.

6. Authors' Addresses

Jack Brassil
HP Laboratories
1501 Page Mill Road M/S 1U-17
Palo Alto, CA 94304
USA
Phone: (650) 236-8064
Fax: (650) 857-5100
EMail: jtb@hpl.hp.com

Henning Schulzrinne
Dept. of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027
USA
EMail: schulzrinne@cs.columbia.edu