INTERNET-DRAFT                                          Eric Fleischman
draft-fleischman-asf-rtp-record-00                       Anders Klemets
                                                  Microsoft Corporation
                                                      November 14, 1997
Expires: May 14, 1998

               Recording MBone Sessions to ASF Files

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document specifies two approaches by which multimedia data (e.g., MBone conferences), transmitted using the Real-time Transport Protocol (RTP), may be recorded to Advanced Streaming Format (ASF) files. The first approach requires a minimum amount of buffering at the recording station but results in recordings which identically preserve the received content, including out-of-order packets, network "jitter", etc. The second approach requires buffering at the recording station but results in enhanced recordings (i.e., a higher percentage of correctly ordered packets, elimination of a percentage of received jitter, and potential recovery of a percentage of lost packets). Both approaches record all received RTP content and the relevant subset of RTCP information. This recording occurs transparently to the MBone conference or RTP session, and does not involve any alterations to normal RTP, RTCP, or ASF use.

1. Introduction

The MBone is the part of the Internet that supports IP multicast, and thus permits efficient many-to-many communication. It is used extensively for multimedia conferencing. Such conferences usually have the property that tight coordination of conference membership is not necessary; to receive a conference, a user at an MBone site only has to know the conference's multicast group address and the UDP ports for the conference data streams. The specific MBone conferences addressed by this document are those which use the Real-time Transport Protocol (RTP, see [1]). In addition, the mechanisms described within this document also support unicast RTP uses.

This document describes two methods for recording multimedia data that is transmitted using RTP into Advanced Streaming Format (ASF; see [2]) files. The approach is independent of the network protocol used to transmit RTP packets and supports the recording of both unicast and multicast sessions. Data thus recorded may subsequently be played back by recreating the original RTP packets and transmitting them using either unicast or multicast techniques. A recording can also be played back locally, using a suitable playback tool. Playback can be controlled using RTSP [4] or other comparable stream control mechanisms.

RTP is a protocol for carrying arbitrary real-time data. Each RTP packet contains a sequence number and timestamp, which can be used by a receiver to detect losses and present the data at the right time. RTP uses a control protocol, RTCP, which can be used to synchronize different real-time streams. For synchronization to be possible, the streams must be transmitted such that each stream has a distinct RTP synchronization source (SSRC) identifier. RTP is most commonly used over UDP.
However, it may be used with any transport protocol that detects bit errors and that conveys the length of an RTP packet. RTP does not specify a mechanism for the reliable transfer of data. The protocol also does not address the encapsulation of specific media types, but instead defers it to various profile specifications.

ASF is an extensible file format for recording optionally synchronized multimedia streams. The format is not tied to any particular media type or compression scheme. Similarly, the file format was designed to be operating system and data communications protocol independent.

2. ASF Overview

The Advanced Streaming Format is defined in [2]. An ASF file consists of three top-level objects: the Header Object, the Data Object and, optionally, the Index Object.

The Header Object provides global information about the file as a whole as well as specific information about the multimedia data stored within the Data Object. This latter content provides the information necessary to correctly interpret each of the media streams found within the Data Object. The Header Object is a container for other objects that provide the following specific functions:

* File Properties Object -- describes the global file attributes.
* Stream Properties Object -- defines a media stream, its characteristics, and the information needed to decode that stream.
* Content Description Object -- contains all bibliographic information, which may be either general for the file as a whole or stream specific.
* Component Download Object -- provides information on playback components.
* Stream Group Object -- logically groups media streams together into specific rendering contexts.
* Scaleable Object -- defines scalability relationships among (scaleable) media streams containing bands.
* Prioritization Object -- defines the relative prioritization between media streams.
* Mutual Exclusion Object -- defines exclusion relationships between media streams (e.g., language selection).
* Inter-Media Dependency Object -- defines dependency relationships among mixed media streams.
* Rating Object -- provides the W3C PICS ([5], [6]) rating of the file.
* Index Parameters Object -- supplies the information necessary to regenerate the index of an ASF file.
* Language List Object -- supplies Language Identifier information that is used by several other ASF objects.

The Data Object contains all the data for each of the recorded media streams. This data is stored in the form of ASF Data Units. In the general case, ASF Data Units are designed to be directly insertable into the payloads of data communications transport protocols in order to be streamed across the network. Each ASF Data Unit is of variable length, and contains data for only one media stream. Data Units are sorted within the Data Object based on the time at which they should be delivered (send time). Due to the way Data Units are sorted, consecutive Data Units may contain data from different media streams.

ASF media streams logically (in the general case) consist of sub-elements that are referred to as objects. What an object happens to be in a given media stream is entirely media stream dependent (e.g., it is a specific image within an image media stream, a frame within a (non-scalable) video stream, etc).

The Index Object contains a time-based index into the multimedia data of an ASF file. The time interval that each index entry represents is set at authoring time and stored in the Index Object. Since it is not required to index into every media stream in a file, a list of the media streams that are indexed follows the time interval value. Each index entry consists of one data unit offset per media stream being indexed.
This information allows stream-specific index operations to occur.

A minimal ASF implementation consists of a Header Object containing solely a File Properties Object, one Stream Properties Object, and one Language List Object, as well as a Data Object containing only a single ASF Data Unit.

3. Recording MBone Sessions

The process of recording MBone sessions may be viewed as consisting of up to four steps, the last two of which are optional:

Step 1 -- Create the ASF Header Object, which will provide the context for correctly interpreting the data that may subsequently be recorded.

Step 2 -- Record one or more RTP streams into the ASF Data Object.

Step 3 -- Optionally post-process the ASF Header Object to ensure that it is as complete and as efficiently stored as possible.

Step 4 -- Optionally create an ASF Index Object.

3.1. Preparing ASF Header Information

The ASF Header Object contains various other objects that contain information about the media streams in the Data Object. It is often desirable to create an ASF Header Object before the transmission that is to be recorded has begun. This would be appropriate if information is already available that describes the RTP sources that are to be recorded. Such information might be obtained through SDP [7], RTSP [4], or some other non-RTP means. It is also possible to add information to the ASF Header Object as new information is learned during the recording of the RTP traffic.

ASF requires that an instance of the Stream Properties Object (SPO) be defined to describe each media stream recorded within the Data Object. A media stream generally corresponds to an RTP source in an RTP session. RTP sources, in turn, are identified by the value of the SSRC field in the RTP header. RTP sessions are identified by the IP address and port number to which the data is sent.
On the MBone, most applications send audio and video on separate RTP sessions, and thus audio and video would be recorded as two separate media streams. However, all RTP packets that belong to a media stream are expected to have identical RTP Payload Type fields. If an RTP source changes the value it is using for the RTP Payload Type field "mid-session", then RTP packets with the new (i.e., different) Payload Type fields should be stored as a different media stream within ASF with its own unique SPO. It is recommended that the streams that compose the traffic from a single RTP source be associated by grouping them via the ASF Header Object's Stream Group Object.

While the session announcement will generally provide enough information to construct an initial File Properties Object (FPO) and some of the necessary SPOs before the session begins, loosely controlled (MBone) conferences can permit additional participants to join the conference. Therefore, provision should be made for additional speakers joining the session. A recommended way to satisfy this provision is to reserve space within the ASF Header Object via the ASF Placeholder Object (see Appendix A) where additional ASF objects (e.g., additional SPOs) may be written as the MBone session dynamically progresses.

Static RTP Payload Types may be handled in one of two ways:

1. Static RTP Payload Types should be translated into the equivalent ASF standard media type (see Section 8 of [2]) using the equivalent ASF codec (e.g., see Reference [10]), if known.

2. Alternatively, they can be recorded as RTP Media Types as defined in Appendix B.

Dynamic RTP Payload Types may be handled in one of three ways:

1. The dynamic RTP payload type should be translated into the equivalent ASF standard media type (see Section 8 of [2]) using the equivalent ASF codec, if known. This means that the recorder will need to identify the actual codec used by that dynamic RTP Payload Type instance based upon the available information. The identity of this codec will then need to be expressed as a specific ASF UUID identifier (e.g., see Reference [10]) within the SPO's Codec ID field.

2. Alternatively, the recorder can translate the dynamic RTP payload type to the appropriate static RTP payload type, if any, and record it as an RTP Media Type as defined in Appendix B.

3. Alternatively, the recorder can record it as a dynamic RTP payload type as defined in Appendix B.

RTP payload types that cannot be deciphered by any of the above approaches should be ignored (i.e., that media stream cannot be recorded).

Note that if the RTP payload is translated into the equivalent ASF standard media type, an inverse transformation will need to be applied by a playback device if the recording is retransmitted as RTP packets.

3.2. Two Recording Approaches

The capabilities of local systems vary. For this reason, this document suggests that systems with limited capabilities record data using the Packet Capture Mode, which is described in Section 3.2.1. More capable systems are recommended to use the Record Structure Mode, described in Section 3.2.2.

3.2.1. Packet Capture Mode (Limited Buffering)

The Packet Capture Mode recording alternative seeks to write RTP data to the ASF Data Object on disk as it is received. The clock of the recording computer is used to determine the ASF Data Unit's Send Time value. The Send Time value is calculated by subtracting the multimedia session's start time (as recorded by the recording computer) from the recording computer's current time and converting the result into millisecond units.
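The Send Time calculation just described can be sketched as follows. This is a minimal illustration only, not part of the ASF or RTP definitions; the function name `send_time_ms` and the use of seconds expressed as floating-point numbers for the recorder's clock are assumptions made for the example.

```python
def send_time_ms(session_start_s: float, now_s: float) -> int:
    """Return an ASF Send Time in milliseconds (hypothetical helper).

    Both arguments are readings of the recording computer's clock, in
    seconds. The session start time maps to a Send Time of zero.
    """
    elapsed_s = now_s - session_start_s
    if elapsed_s < 0:
        raise ValueError("current time precedes the session start time")
    # Convert the elapsed seconds into millisecond units.
    return int(round(elapsed_s * 1000))


# A packet received 2.5 seconds after the session started.
print(send_time_ms(100.0, 102.5))
```

A recorder would typically take both readings from the same monotonic clock so that adjustments to the system time during the session do not disturb the recorded Send Time values.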
The RTP timestamp is written directly as the ASF Data Unit's Presentation Time value, again making the necessary conversions to account for the fact that the initial RTP timestamp value is random while the initial ASF Send Time and Presentation Time values are zero. The granularity of the Presentation Time units (i.e., the Presentation Time Numerator and Presentation Time Denominator fields within the SPO) should be set to the clock granularity for that RTP source. ASF's default presentation time granularity (i.e., a millisecond) should initially be used for those cases in which the actual clock granularity is not known. The value of the Presentation Time Flags within the SPO for this media stream shall thus be configured to be "11" (i.e., Full Data Unit Presentation Time).

RTCP Sender Reports (SR) for the RTP source being recorded can be used to calculate the clock granularity of the source. This is useful if the clock granularity is otherwise unknown. It is also possible to use Sender Reports to detect skews between the clock granularity used by the source and the granularity that is given by the RTP Payload Type specification or profile. If such a skew is detected, the Rational Time Values (i.e., Presentation Time Numerator and Presentation Time Denominator fields) of the SPO should be altered accordingly.

This approach has the advantage of being simple and direct to implement. It has the following disadvantages:

* Jitter is preserved, and repeated re-recordings of the same content in this manner may exacerbate the jitter on each subsequent recording.
* Out-of-order packets remain out of order.

3.2.2. Record Structure Mode (Buffering)

The Record Structure Mode requires that packets be buffered for a finite amount of time (e.g., 5 seconds) before being written to disk. Packets within the buffer should be correctly ordered.
Packet holes occurring within the buffer interval should be filled by retransmitted packets (if any).

Within this approach, the value of the RTP Timestamp field is used to compute the send time. Since the RTP timestamp starts at a random value, while the ASF Send Time and Presentation Time start at zero, a conversion into appropriate ASF Send Time values must be made. The send time is stored with a 1-millisecond granularity. The appropriate RTP Payload Type specification or profile gives the granularity of the RTP Timestamp. RTCP Sender Reports (SR) may be used to calculate the granularity of the RTP Timestamp if it is otherwise unknown. Sender Reports can also be used to detect skews between the RTP Timestamp granularity and the granularity specified in the RTP Payload Type specification or profile. If such a skew is detected, the send time values for currently buffered packets of that media type have to be altered (retaining their millisecond granularity) to correctly reflect the skew.

The following values should be recorded within the Stream Properties Object for the media streams recorded by this approach: The clock frequency of the RTP payload type should be appropriately recorded into the Presentation Time Numerator and Presentation Time Denominator fields. The Presentation Time Flag value should have the value of "01" and the Presentation Time Delta field should have a value of zero. This means that both the ASF send time and presentation time have the same value and that subsequent RTP retransmissions of this data will contain only one timestamp (i.e., RTP's timestamp).

This approach has the advantage of correcting some of the received jitter, correctly sorting some of the out-of-order packets, and potentially filling in some lost packets (assuming a retransmission scheme is used).
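The two buffer-side calculations of the Record Structure Mode, converting an RTP timestamp into a millisecond send time and restoring packet order, might look roughly like the sketch below. It is illustrative only: the function names, the `next_expected_seq` parameter, and the assumption that the clock rate is already known (from the payload type's profile or from Sender Reports) are not taken from this document. The 32-bit timestamp and 16-bit sequence number widths come from the RTP header definition [1].

```python
RTP_TS_MODULUS = 1 << 32   # RTP timestamps are 32 bits wide
RTP_SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def rtp_to_send_time_ms(rtp_ts: int, initial_rtp_ts: int, clock_rate_hz: int) -> int:
    """Convert an RTP timestamp into an ASF send time in milliseconds.

    The initial (random) RTP timestamp maps to send time zero. The modulo
    keeps the result correct across one 32-bit timestamp wraparound.
    """
    ticks = (rtp_ts - initial_rtp_ts) % RTP_TS_MODULUS
    return ticks * 1000 // clock_rate_hz


def order_buffer(buffered_seqs, next_expected_seq: int):
    """Sort buffered sequence numbers by distance from the next expected one.

    Assumes the buffer never holds packets older than next_expected_seq
    and spans fewer than 2^16 packets.
    """
    return sorted(buffered_seqs,
                  key=lambda seq: (seq - next_expected_seq) % RTP_SEQ_MODULUS)


# 8000 Hz audio clock: 8000 ticks after the initial timestamp is 1000 ms.
print(rtp_to_send_time_ms(9000, 1000, 8000))
# Sequence numbers that wrap past 65535 are still ordered correctly.
print(order_buffer([65535, 1, 0, 65534], next_expected_seq=65534))
```

If a clock skew is detected through Sender Reports, a recorder following this sketch would recompute the send times of the buffered packets with the corrected clock rate before writing them to disk.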
The disadvantage of this approach is that it is more complex to implement. This is particularly the case if the RTP payload type's clock frequency is not known ahead of time and has to be subsequently learned via RTCP transmissions. In addition, it requires additional buffering on the recording computer.

3.3. Recording MBone Sessions

The following translations from RTP packet fields to ASF data fields are identical for both recording approaches.

3.3.1. RTP Mixers and Translators

The combined streams resulting from Mixers and Translators need to be demultiplexed back into their original component streams when being recorded into ASF, if possible. If this is not possible, then copies of the RTP packet containing data that is attributed to multiple sources need to be stored into each of these sources' media streams (i.e., ASF Data Units). In either case, these streams may be optionally re-mixed when they are subsequently replayed from the ASF files, depending upon local implementation considerations.

3.3.2. RTP Packet Information

The RTP Header's Payload Type field combined with the SSRC is used to determine the ASF Stream Number value for that media stream. This Stream Number value identifies which SPO instance should be used to define this media stream. This value is recorded into the Stream Number field of the ASF Data Unit.

The Version field in the RTP header is not recorded into the ASF file unless it is a version other than 2. If the Version field in the RTP header is other than 2, the RTP version number should be recorded into the ASF Header Object's Content Description Object (CDO; see Section 5.4 of [2]) using a value of 73 for the Field Type field.

The Padding bit, and the Padding field that is present if the bit is set, are not recorded. If an RTP packet is received in which the Padding bit is set, the padding field should be removed from the RTP payload.
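Removing the padding before the payload is stored could be sketched as follows. The sketch relies on the RTP header definition in [1], under which the last octet of the padding holds the number of padding octets, including itself. The function name and error handling are assumptions made for illustration.

```python
def strip_rtp_padding(payload: bytes, padding_bit: bool) -> bytes:
    """Return the RTP payload with any trailing padding removed.

    When the Padding bit is set, the final octet of the packet gives the
    count of padding octets (that final octet included).
    """
    if not padding_bit:
        return payload
    if not payload:
        raise ValueError("Padding bit set on an empty payload")
    pad_len = payload[-1]
    if pad_len == 0 or pad_len > len(payload):
        raise ValueError("invalid RTP padding length")
    return payload[:-pad_len]


# Three octets of media data followed by three octets of padding.
print(strip_rtp_padding(b"abc\x00\x00\x03", padding_bit=True))
```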
Padding may be regenerated when retransmitting the recording, if necessary.

SSRC information should be written into the CDO as an aid for remembering the association between an SSRC and a Media Stream. This will also permit the original sequence number to be optionally recreated once the recorded data is retransmitted. The 32-bit SSRC value will need to be converted into a string when it is stored into the Value field of the CDO. When storing the SSRC as a Unicode string, the SSRC is treated as an unsigned 32-bit integer, and it must be converted to the local byte order (i.e., host byte order). The value of the Field Type field is 70.

Because the initial RTP timestamp value is a random value, the initial RTP timestamp value should also be recorded into the CDO. This will permit the original timestamp sequence to be optionally recreated once the recorded data is retransmitted. The 32-bit timestamp value will need to be converted into a Unicode string when it is recorded into the Value field of the CDO. The value of the CDO's Field Type field is 71.

The initial RTP Sequence Number value should be recorded into the CDO. This will permit the original number to be optionally recreated once the recorded data is retransmitted. The 16-bit Sequence Number value will need to be converted into a Unicode string when it is stored within the Value field of the CDO. When storing the Sequence Number as a string, the Sequence Number is treated as an unsigned 16-bit integer, and it must be converted to the local byte order (i.e., host byte order). The value of the Field Type field is 72.

It should be noted that ASF's concept of Object Number differs from RTP's concept of Sequence Number, although they are both used to identify out-of-order and missing information. [Note: earlier versions of the ASF spec used the term "ObjectID" instead of "Object Number".] The former identifies specific media stream "objects" as a part of a fragmentation and grouping schema.
What an object happens to be in a given media stream is entirely media stream dependent (e.g., it is a specific image within an image media stream, a frame within a (non-scalable) video stream, etc). Since object fragmentation occurs within a specific RTP Payload Type instance and RTP headers do not indicate this type of information, an identical translation of the original Object Number semantics would require a decoding of the media stream. The value of incurring this type of overhead is highly questionable, especially when the ultimate goal of identifying missing or out-of-order information is common to the two approaches.

Therefore, the RTP sequence number should be directly mapped into the Object Number field of the ASF Data Unit. Since the 16-bit Sequence Number starts at a random value while the 8-bit Object Number starts at zero, the mapping between the Sequence Number and Object Number needs to reflect this difference (e.g., Current-Sequence-Number value minus Original-Sequence-Number value = Object Number) and account for the fact that Object Numbers "wrap around" to zero every 2^8 packets and Sequence Numbers "wrap around" when their value reaches 2^16.

If the streams identified by the CSRC fields within the RTP header are demultiplexed into their original component streams when being recorded, then the CSRC fields are not recorded. If, however, this is not possible, then the CSRC information should be written into the ASF Data Unit's extension field as described below.

If the RTP payload has been converted into an "equivalent ASF standard media type" (see Section 3.1), then the RTP Extension Object described by the next paragraph is optional.
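The Sequence Number to Object Number mapping described earlier in this section reduces to modular arithmetic. The following is a minimal sketch; the function name is hypothetical, and the two-step masking is written out only to make both wraparounds visible (because 2^16 is a multiple of 2^8, a single reduction modulo 2^8 would give the same result).

```python
def object_number(current_seq: int, initial_seq: int) -> int:
    """Map a 16-bit RTP Sequence Number to an 8-bit ASF Object Number."""
    # Distance from the initial (random) sequence number, allowing for
    # the 16-bit Sequence Number wrapping around at 2^16.
    distance = (current_seq - initial_seq) & 0xFFFF
    # Object Numbers start at zero and wrap around every 2^8 packets.
    return distance & 0xFF


print(object_number(1000, 1000))   # first packet of the recording
print(object_number(1256, 1000))   # 256 packets later
print(object_number(4, 65530))     # Sequence Number has wrapped
```

With these inputs the first two calls yield 0 and the third yields 10, since ten packets separate sequence numbers 65530 and 4 across the wraparound.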
However, if the RTP Media Type described in Appendix B has been used to record the data, then the RTP Extension Object must be used if either the RTP Header's M-bit or the RTP Header's eXtension (X) bit is ever set within that stream, or if CSRC information ever needs to be recorded within that media stream. The RTP Extension Object permits exact copies of the original RTP packets to be regenerated, if desired.

The RTP Extension Object is an instance of the Extension Object that is described within Section 5.3.1 of [2]. Extension Objects are associated with a specific media stream's SPO and indicate the semantics and format of specific data (i.e., in this case RTP Packet Header data) that is stored on a per-packet basis within the Extension Data field of the ASF Data Unit (see Section 6.1 of [2]).

The RTP Extension Object is defined as follows:

* The value of the Extension Data Size field is 0xFFFF.
* The UUID value of the Extension System field is {96800c63-4c94-11d1-837b-0080c7a37f95}.

These definitions indicate that this recording shall follow the "variable length" extension data encoding format (i.e., a one-byte length field followed by the extension data) within the Extension Data field of the ASF Data Unit. In the case of the RTP Extension Object, the Extension Data field of the ASF Data Unit has the following syntax:

Field Name:        Size:           Description:
Extension Length   8 bits          Size in bytes of the Extension Data and Flag fields (i.e., sizeof (Extension Data) + 1)
Flag               8 bits
   X-bit           1 bit (LSB)     Contains the RTP Header's X-bit value
   CSRC Count      4 bits          Contains the RTP Header's CC value
   M-bit           1 bit           Contains the RTP Header's M-bit value
   Reserved        2 bits (MSB)
Extension Data     variable        RTP Header CSRC list, if any, followed by Extension Data, if any

The "variable length" encoding means that if either the X-bit is set or the CSRC Count has a non-zero value, then the Extension Length, Flag, and RTP header extension data are written into the Extension Data field of the ASF Data Unit. If the X-bit is cleared and the CSRC Count has a zero value, then only the Extension Length and Flag fields are written to the Extension Data field of the ASF Data Unit. If the X-bit is set and the CSRC Count field has a non-zero value, then the CSRC list of the RTP Header appears first, immediately followed by the RTP Header Extension data, within the Extension Data field. These fields are arranged in big-endian order (also known as network byte order).

3.3.3. RTCP Packet Information

RR and BYE packets are not recorded into ASF files. Clock skew information obtained from SR packets is used for the timestamp calculations described in Sections 3.2.1 and 3.2.2. Other information contained in SR packets, except for APP and SDES information, is not recorded.

SDES information is stored in the ASF Header Object's Content Description Object (CDO). Appropriate SDES items (i.e., "CNAME", "NAME", "EMAIL", "PHONE", "LOC", "TOOL", "NOTE", and "PRIV") shall be written into the CDO as described by Appendix C. Synchronization relationships between media streams containing the same CNAME value should be retained by associating them via ASF's Inter-Media Dependency Object (Section 5.12 of [2]).

APP information should be handled in one of two ways:

1. If the recorder understands (through out-of-band mechanisms outside of the scope of both ASF and RTP) that the APP information contains script commands or invocations, which correspond to either the ASF Header Object's Script Command Object (see Section 5.5 of [2]) or to a Command Media stream type (see Section 8.7 of [2]), then the recorder can convert the APP information into the appropriate ASF constructs.

2. If the recorder does not understand the APP information, then that information should be appropriately recorded "as is" into the ASF Header Object's Script Command Object.

If the values of the SDES fields from a particular RTP source change during the recording, it is recommended that the CDO contain the initial value for the SDES field. Subsequent values of the SDES fields should then be recorded as a separate media stream, via the mechanisms described in Appendix D.

3.4. Optional Post-Processing of the ASF Header

Whenever live recordings are made, the Live Bit must be set in ASF's File Properties Object. This signifies that certain fields in the ASF File Properties Object and the Stream Properties Object(s) are invalid and should be ignored. In addition, these same files are likely to also contain the ASF Placeholder Object (see Appendix A). It is highly recommended, but not required, that post-processing be done to ASF files to clear the Live Bit, remove the ASF Placeholder Object, and write valid data into the fields which are invalid when the Live Bit is set.

3.5. Optional Creation of the ASF Index Object

ASF uses the Index Parameters Object in the ASF Header to identify the parameters and media streams whose data will be indexed. This object is described in Section 5.14 of [2]. If the Index Parameters Object does not yet exist for this file, then it needs to be constructed before the Index Object is built.

Using the information contained within the Index Parameters Object, the Index Object is constructed as defined in Section 7 of [2].

3.6. Playback of the Recorded RTP Data

Recorded media streams are stored into the ASF Data Object as ASF Data Units (see Section 6.1 of [2]). Each ASF Data Unit contains a "header field" together with the media data which is being stored. The payload of each RTP packet comprises the media data stored within the ASF Data Unit. The RTP header itself is not stored, but its content is mapped into the SPO, CDO, and the header field of the ASF Data Unit.

The ASF file contains sufficient information to play back the recorded data, either locally or via a remote playback device.

When RTP packets are recorded into the ASF file using the RTP Media Type (see Appendix B), sufficient information exists to regenerate RTP packets with the same SSRC and sequence numbers as the original packets, if desired. Additionally, it is possible to regenerate RTCP SDES and APP packets with the same content as those sent by the original RTP source. This permits recorded data to be retransmitted into an existing MBone conference, for example, in such a manner that it may appear that the data originates from the original RTP source.

This specification does not define a required feature set for playback devices. For example, even though it is possible to retransmit the recorded data using RTP, playback devices are not required to do so.

Appendix A. ASF Placeholder Object Definition

"Loosely controlled" sessions permit participants to enter and leave without membership control or parameter negotiation. Since one cannot always predict how many participants will speak, nor what media types they will use, a mechanism is needed to reserve space within the Header Object so that new header objects (e.g., Stream Properties Objects) may be readily added to the header when needed without requiring the header to be re-written.
The purpose of the ASF Placeholder Object is to fulfill this "place holder" function. New header objects are added into the space reserved by the ASF Placeholder Object. The ASF Placeholder Object will then reduce the amount of space it is reserving by the amount taken by the new object(s). ASF Placeholder Objects are ignored (skipped over) when ASF Header Information is conveyed to remote nodes. Even so, it is recommended that they be removed by post processing (see section 3.4) to make more compact files. E. Fleischman and A. Klemets [Page 13] Internet Draft draft-fleischman-asf-rtp-record-00 November 14, 1997 The ASF Placeholder Object is defined as follows: Field Name: Size: Value: Object ID 128 bits This field contains the following UUID value: {D6E22A0F-35DA-11d1-9034- 00A0C90349BE} Object Size 64 bits The size of this object in bytes (i.e., Reserved field value + 24) Reserved (Object Size - 24) * 8 bits Reserved space Appendix B. RTP Media Type ASF has defined standard media types for Audio, Video, Image, Timecode, Text, MIDI, Command, and Media-Objects (Hotspots) in Section 8 of [2]. Implementations, which support these types of media streams, are expected to implement them in the manner defined within the ASF standard. MBone content, which is stored within ASF, is therefore expected to be mapped into the standard ASF media streams format whenever possible. However, occasions will exist when it will not be possible to conform to this requirement. Possible reasons include the following: * The recorder may not be aware of which media type is associated with an RTP Payload Type (i.e., whether the RTP Payload Type is referring to Audio, Video, or some other media type). * The recorder may not know which ASF-defined codec corresponds to the codec assumed by the RTP Payload Type and therefore it would be unable to complete the mapping into a standard ASF media type. 
* The RTP Payload Type may indicate an interleaved data stream (e.g., video and audio combined into a single stream). No standard ASF media type has yet been defined for such interleaved data.
* The RTP Payload Type may indicate a media type which is not among the standard ASF Media Types.

For these reasons and others, a provision must exist to record MBone data as a distinct RTP Media Type. This appendix defines the format of the RTP Media Type.

The RTP Media Type is defined within the Stream Properties Object (SPO) by placing the UUID value {96800c65-4c94-11d1-837b-0080c7a37f95} into the Stream Type field. The following information is then stored as the Type-Specific Data field within the SPO:

Field Name:            Field Type:  Size (bits):  Description:

Payload Type           UINT         8             The Payload Type value indicated by the
                                                  RTP header.

Profile Size           UINT         16            Size in bytes of the Profile field.

Profile                UINT8        ?             ASCII string identifying the Profile
                                                  which has defined the Payload Type.
                                                  (E.g., "AVP" for the profile defined by
                                                  [3] and [9].) An empty string is used if
                                                  the profile is not known.

Announcement ID Size   UINT         16            Size in bytes of the Announcement ID
                                                  field.

Announcement ID        UINT8        ?             MIME Type of the session announcement
                                                  mechanism used. (E.g.,
                                                  "application/x-sdp" for SDP [7]
                                                  announcements.)

Announcement Size      UINT         16            Size in bytes of the Announcement field.

Announcement           UINT8        ?             ASCII string containing the definition
                                                  for this media stream. (E.g., for SDP [7]
                                                  announcements, this would contain the
                                                  entire rtpmap entry for this media
                                                  stream.)

All ASCII strings in the RTP Media Type are terminated by a NULL character. These fields should be stored in little-endian byte order (i.e., the orientation used in the ASF Header Object).

The final four fields (i.e., Announcement ID Size, Announcement ID, Announcement Size, and Announcement) are used to convey information about the dynamic RTP payload type.
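The Type-Specific Data layout tabulated above can be sketched in Python as follows. This is a minimal illustration, not a normative encoder. The function names are hypothetical, and two details are assumptions that this document does not settle: each size field is taken to count the terminating NULL character, and an argument of None is written as a size of zero with no field bytes at all.

```python
import struct
from typing import Optional


def _sized_ascii(text: Optional[str]) -> bytes:
    """Encode an optional ASCII string as a UINT16 size followed by the bytes."""
    if text is None:
        return struct.pack("<H", 0)  # assumption: size 0 means "field omitted"
    data = text.encode("ascii") + b"\x00"  # strings are NULL-terminated
    # Assumption: the size field includes the terminating NULL.
    return struct.pack("<H", len(data)) + data


def pack_rtp_type_specific(payload_type: int,
                           profile: str = "",
                           announcement_id: Optional[str] = None,
                           announcement: Optional[str] = None) -> bytes:
    """Build the Type-Specific Data of an RTP Media Type SPO (little-endian)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("RTP payload types are 7-bit values")
    return (struct.pack("<B", payload_type)
            + _sized_ascii(profile)          # "" when the profile is unknown
            + _sized_ascii(announcement_id)  # e.g. "application/x-sdp"
            + _sized_ascii(announcement))    # e.g. the rtpmap entry
```

For a static payload type, a call such as pack_rtp_type_specific(0, "AVP") produces the Payload Type octet, the sized "AVP" string, and two zero-valued size fields. For a dynamic payload type, the caller would also pass the announcement MIME type and the text of the announcement entry obtained from the session description.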
This information might have been available to the recording device through non-RTP means. Examples of possible sources of such information include session descriptions, such as SDP [7], and presentation descriptions [4]. However, if a static RTP Payload Type is being specified, both the Announcement ID Size and the Announcement Size fields may have a value of zero, indicating that the Announcement ID and Announcement fields have not been specified.

The rest of the SPO should be specified as indicated in Section 3.2 above. The received RTP data of this media stream is stored into the ASF Data Object as described in Section 3.2 and Section 3.3 above.

Appendix C. Recording SDES Information

Section 5.4 of [2] describes the syntax and semantics of the Content Description Object (CDO) within the ASF Header Object. This object consists of an array of Description Records containing four logical entries:

1. A Field Type value which identifies the semantics of the entry. Each SDES packet may be recorded to the CDO using the following predefined Field Type (unsigned integer) values:

   SDES entry:    Field Type Value:

   CNAME          61
   NAME           62
   EMAIL          63
   PHONE          64
   LOC            65
   TOOL           66
   NOTE           67
   PRIV           68

2. Stream Number, identifying the media stream to which this CDO entry refers.

3. Name - Name of the entry. This field is redundant with the Field Type value and is therefore frequently not used. However, applications may optionally use this field for language "localization" reasons (e.g., to translate the entry into a specific target language).

4. Value - the information conveyed by the specific SDES message (e.g., user and domain name in a CNAME packet).

Appendix D.
SDES Media Streams

Section 3.3.3 stated that the first occurrence of a specific SDES RTCP instance (i.e., a specific SDES item associated with a specific RTP source identifier; e.g., a CNAME value for a specific SSRC) should be recorded into the Content Description Object (CDO). The Stream Number field within the CDO should refer to the media stream associated with the RTP source identifier (i.e., SSRC/CSRC field of Section 6.4 of [1]) of that SDES packet chunk.

The CDO has provisions for storing only one SDES type instance (e.g., only one instance of a CNAME) for any given media stream. Therefore, subsequent instances of the same SDES type for that media stream will need to be recorded as a distinct "media stream" if that information is to be preserved. This appendix defines how to create such an SDES media stream.

An SDES media stream consists of SDES information written into the ASF Data Object via the mechanisms described in Section 3.2. Each SDES media stream records SDES information from only one RTP source identifier. A Stream Properties Object (SPO) is constructed for each SDES media stream. That SDES media stream should also be associated with (i.e., synchronized with) the media stream containing the RTP data of that same RTP source identifier via the ASF Header Object's Inter-Media Dependency Object.

The SPO for an SDES media stream should be constructed as follows:

* The UUID of the SDES Media Stream is {96800c62-4c94-11d1-837b-0080c7a37f95}. This value should be written into the Stream Type field of ASF's Stream Properties Object (SPO) to identify SDES Media Streams.
* The value of the Type-Specific Data Length field within the SPO is zero (i.e., no Type-Specific Data).
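The division of labour between the CDO and SDES media streams described above can be sketched as follows. The SdesRouter class and its in-memory dictionaries are hypothetical stand-ins for a real ASF header and data writer; the Field Type values are those listed in Appendix C. The sketch sends every later instance to the per-source SDES media stream. Whether unchanged repeats should be recorded at all is governed by Section 3.3.3 and is not decided here.

```python
# Field Type values for SDES items, taken from Appendix C.
SDES_FIELD_TYPE = {
    "CNAME": 61, "NAME": 62, "EMAIL": 63, "PHONE": 64,
    "LOC": 65, "TOOL": 66, "NOTE": 67, "PRIV": 68,
}


class SdesRouter:
    """Decide where each received SDES item is recorded (hypothetical helper)."""

    def __init__(self):
        # (stream_number, field_type) -> value; models the single CDO slot
        # available per SDES type and media stream.
        self.cdo = {}
        # ssrc -> list of (item, value); one SDES media stream per RTP source.
        self.sdes_streams = {}

    def record(self, ssrc: int, stream_number: int, item: str, value: str) -> str:
        """Store one SDES item and report where it went."""
        key = (stream_number, SDES_FIELD_TYPE[item])
        if key not in self.cdo:
            # First instance of this SDES type for this media stream.
            self.cdo[key] = value
            return "cdo"
        # The CDO slot is already taken: keep the item in the SDES media
        # stream that belongs to this RTP source identifier.
        self.sdes_streams.setdefault(ssrc, []).append((item, value))
        return "sdes_stream"
```

A recorder built along these lines would create the SPO for an SDES media stream lazily, the first time record() returns "sdes_stream" for a given source.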
The format of an SDES Media Stream consists of one or more instances (per ASF Data Unit) of the following structure:

Field Name:         Field Type:  Size (bits):  Description:

Type Array Size     UINT         16            Size in bytes of the Type Array

Value Array Size    UINT         16            Size in bytes of the Value Array

Type Array          UINT8        ?             UTF-2 string [8] identifying the specific
                                               SDES type instance (e.g., "CNAME")

Value Array         UINT8        ?             UTF-2 string [8] containing the SDES value
                                               (e.g., "user and domain name" for a CNAME)

Authors' Addresses

Eric Fleischman
E-mail: ericfl@microsoft.com

Anders Klemets
E-mail: anderskl@microsoft.com

Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052-8300
USA

References:

1 H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", IETF RFC 1889, January 1996.

2 Microsoft Corporation, "Advanced Streaming Format (ASF) Specification", http://www.microsoft.com/asf/specs.htm, September 1997.

3 H. Schulzrinne, "RTP Profile for Audio and Video Conferences with Minimal Control", IETF RFC 1890, January 1996.

4 H. Schulzrinne, A. Rao, and R. Lanphier, "Real Time Streaming Protocol (RTSP)", work in progress.

5 J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating Systems (and Their Machine Readable Descriptions)", World Wide Web Consortium, http://www.w3.org/PICS/services.html, May 5 1996.

6 T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax and Communication Protocols", World Wide Web Consortium, http://www.w3.org/PICS/labels.html, May 5 1996.

7 M. Handley, V. Jacobson, "SDP: Session Description Protocol", work in progress.

8 International Standards Organization, "ISO/IEC DIS 10646-1:1993 information technology - universal multiple-octet coded character set (UCS) - part I: Architecture and basic multilingual plane", 1993.
9 "RTP Payload types (PT) for standard audio and video encodings", ftp://ftp.isi.edu/in-notes/iana/assignments/rtp-av-payload-types

10 "ASF Codec GUIDs", http://www.microsoft.com/asf/guids.htm