Internet Engineering Task Force                               P. Christ
INTERNET-DRAFT                                            Ch. Guillemot
draft-christ-rtsp-mpeg4-00.txt                                S. Wesner
Expires: May 15, 1999                        Univ. Stuttgart - RUS/INRIA
                                                       November 16, 1998

                  RTSP-based Stream Control in MPEG-4

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months.  Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress."

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.ietf.org, nic.nordu.net, ftp.nisc.sri.com, or
   munnari.oz.au.

ABSTRACT

   In order to support the advanced interactivity envisaged for MPEG-4
   applications, this document proposes a simple RTSP-based [1] stream
   control framework, including the necessary extensions to RTSP
   methods syntax and semantics.  Reflecting the syntax and semantics
   of the MPEG-4 BIFS scene description [2], VRML nodes [4] and the
   MPEG-4 media delivery framework (DMIF) [3], and in the spirit of
   HTML/SMIL [5], Random Access Point information, Range and Time
   parameters are introduced into the relevant URL(s) and the related
   signaling methods accordingly.  Two additional optional methods,
   R-MUTE (Remote-MUTE) and RESUME, are also proposed.

Christ/Guillemot/Wesner      November 16, 1998                 [Page 1]

1. Motivations and Rationale

   The motivation of this document is the derivation of a simple
   stream control framework for MPEG-4 - reflecting the syntax and
   semantics of the MPEG-4 scene nodes - that could be introduced as
   extended methods syntax and semantics, in line with the RTSP
   extension guidelines [1, section 1.5].  The attribute 'simple'
   alludes to the aim of providing the minimum support for an MPEG-4
   client-to-server application, equivalent to HTML/CGI or SMIL.  It
   is expected that, as MPEG-4 evolves towards Java-enhanced
   multi-user environments, other and possibly more elaborate
   application signaling frameworks will emerge in parallel [7].

   With respect to RTSP, the first main technical issue is a possible
   support for dissociating connection management from stream control
   via the optional provision of a channel identifier, as an
   alternative to the URL in the methods.  The second main issue is
   the provision of syntactical means for extended timing, random
   access point and range parameters.  In order to support
   user-navigation-based interactivity, two additional optional
   methods are also proposed: R-MUTE (Remote-MUTE), inspired by the
   local MUTE specified in PREMO [6], and RESUME.

   With respect to MPEG-4 Systems and DMIF, the whole proposal will be
   fed into the ongoing Version 2 procedure.  It should also be
   mentioned that this proposal is orthogonal to [8], which positions
   RTSP-based signaling within the DMIF environment.

2. Preliminary Remarks

   The MPEG-4 'BIFS' scene description framework is inspired by VRML.
   In VRML, application-specific procedural logic and state management
   can be implemented via Script nodes, which will be provided -
   together with Prototypes - only in MPEG-4 Version 2.

   An MPEG-4 client-server application scenario of the type this draft
   is aiming at can be characterized as follows: on the terminal side,
   driven by events from user interaction with the scene, a generic
   MPEG-4 browser requests from the server, via application signaling,
   MPEG-4-compliant streams of scene descriptions - constituting the
   application itself - and their companion streams, e.g. audio-video
   streams.

   MPEG-4 Version 2 will introduce an advanced interactivity model
   (MPEG-J).  This should allow application-specific procedural code
   at the terminal side, e.g. for the local construction/encoding
   (/decoding) of BIFS updates.  As the script code would probably be
   read as part of the scene description, the browser could remain
   generic, i.e. independent of any specific application.

   The signaling syntax and semantics discussed below are independent
   of the mechanisms supporting the signaling, i.e. of the procedural
   logic mechanisms.  These mechanisms are out of the scope of this
   document.

3. Relating Application Signaling to MPEG-4 Scene Description

   In VRML/MPEG-4, the syntax and semantics of the nodes of a scene
   determine and confine the possible interactivity.  This is true for
   the parameters available both in 'media content playing' nodes such
   as MovieTexture, and in 'structure related' nodes such as Inline.
   Even with the future Proto and Script nodes of MPEG-4 Version 2,
   the expressivity of signaling, e.g. with respect to media streams,
   will be confined by that of the corresponding nodes in the scene.
   Hence, in MPEG-4, as indicated in the introduction of this
   document, all interactivity and in turn all application signaling
   has to be constructed in accordance with the syntax and semantics
   of the relevant nodes.

3.1. Media Content Playing Nodes

   An object may be completely described within the scene description
   (BIFS) information, or may also require elementary stream data from
   one or more audio-visual objects, via the 'media content playing'
   nodes.  Therefore, interactivity and the corresponding signaling
   with respect to media objects has to be derived from the 'media
   content playing' nodes such as VideoObject2D, MovieTexture, etc.
   An application signaling message in that context would typically
   carry a PLAY, a PAUSE or a TEARDOWN method.

3.2. Structure Related Nodes

   Interactivity and application signaling concerning the structure of
   the scene, e.g. changes of a scene, will be derived from 'structure
   related' nodes such as Inline2D and Inline.

3.3. Usage of URLs

   A URL is a parameter of type MFString, indicating the location of
   the media stream or including a reference to an ObjectDescriptor
   (OD).  An OD is a level of indirection that can point either to
   another object descriptor or to ES_descriptors that in turn provide
   the references to the locations of the raw elementary streams
   associated with the node, via URL fields defined as strings of
   8-bit characters (type bit(8)).  Interpreted by the browser, these
   URL(s) lead to the issuing of the signaling commands.

3.4. MPEG-4 Timing Model

   A point in time at which an event occurs (change of a parameter
   value, defining the start or stop of a media stream, etc.) is
   identified by the SFTime fields of the media content playing nodes.
   The SFTime fields in general indicate a time relative to the BIFS
   time base that applies to the BIFS Elementary Stream that has
   conveyed the scene description.
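   Such a relative SFTime value plays essentially the role of RTSP's
   Normal Play Time (NPT) [1].  As a rough illustration - the function
   name and formatting choices below are ours, drawn from neither
   specification - a relative time in seconds can be rendered as an
   NPT hours:minutes:seconds string as follows:

```python
def sftime_to_npt(sftime: float) -> str:
    """Format a relative SFTime value (seconds since the beginning of
    the presentation) as an RTSP npt-hhmmss string.  Illustrative
    sketch only, not normative MPEG-4 or RTSP behavior."""
    whole = int(sftime)
    hh, mm, ss = whole // 3600, (whole % 3600) // 60, whole % 60
    frac = sftime - whole
    if frac:
        # carry the fractional part into the seconds field
        return "%d:%02d:%06.3f" % (hh, mm, ss + frac)
    return "%d:%02d:%02d" % (hh, mm, ss)
```

   For instance, a scene event occurring 3723.5 seconds into the
   presentation would correspond to the NPT value 1:02:03.500.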
   The format of an SFTime field is a 64-bit double-precision floating
   point number (in ISO C floating point format) indicating a duration
   in seconds with respect to a reference point in time.  This
   corresponds to an NPT - Normal Play Time - in RTSP terms, except
   that, here, the reference point in time (the beginning of the
   presentation) is not expressed in GMT but is provided by the
   StartCompositionTimeStamp of the scene description stream.

   SFTime fields of some nodes may require 'absolute' time values,
   given by a "wall clock" time.  The relation of the BIFS time base
   ticks, i.e. the CTS - composition time stamp - of the BIFS Access
   Unit that has conveyed the respective scene description (BIFS)
   node, to the wall clock can be resolved if the wall clock time is
   known to the receiver.  This is achieved by an optional
   wallClockTimeStamp.

4. Application Signaling and the MPEG-4 Media Delivery Framework DMIF

   An MPEG-4 application identifies a particular elementary stream
   through its Elementary Stream Id (ESid), scoped by the service
   session it belongs to.  When using DMIF (the MPEG-4 Delivery
   Multimedia Integration Framework), a 1-to-1 correspondence between
   each ESid and a channelHandle (chId) is realized by the DMIF layer.
   The stream identified by the ESid is further referred to through
   its channelHandle.

   Dissociating connection management from stream control implies
   syntactic extensions to the methods, namely possible stream
   identification by syntactical means other than the URL (see section
   5.3).  The URL will then be used only for connection management.

5. Extended RTSP Methods Syntax and Semantics

5.1. NPT Extension

   The RTSP NPT format, consisting of a decimal fraction expressed in
   either seconds or hours, minutes, and seconds, can then be used:

      npt-time   = "now" | npt-sec | npt-hhmmss
      npt-sec    = 1*DIGIT [ "." *DIGIT ]
      npt-hhmmss = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
      npt-hh     = 1*DIGIT   ; any positive number
      npt-mm     = 1*2DIGIT  ; 0-59
      npt-ss     = 1*2DIGIT  ; 0-59

   However, it is necessary to provide the possibility of setting the
   reference point in time to the value of the
   StartCompositionTimeStamp of the corresponding BIFS scene
   description stream instead of 0.0 seconds.  This would mean that
   the beginning of the presentation is at time
   StartCompositionTimeStamp of the corresponding BIFS scene
   description stream.  Hence, the NPT syntax can be complemented by
   an optional field, the wall clock time base.  If this field is not
   present, the default value for the reference point in time is 0.0
   seconds.

      npt-ref = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
                ; npt-hh, npt-mm, npt-ss as defined above

5.2. Random Access Point (RAP) and Range Extensions

   Method and RAP or range information could be stored as parameters
   in the MFUrl class, defined below in a preliminary syntax:

      class MFUrl {
          if (isMethod)
              SFString GlobalMethod=method;
          else {
              if (isOD) {
                  bit(10) ODId;
                  MFString ESId=esid ESMethod=method
                           ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
              }
              else {
                  SFString urlValue ODId=odid;
                  MFString ESId=esid ESMethod=method
                           ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
              }
          }
      }

   The GlobalMethod is introduced to allow for dealing with the
   streams of all BIFS nodes belonging to the group or to the whole
   scene.  Note that the MPEG-4 system does not, so far, provide
   semantic means for random access point and range information.  The
   above class is a proposal that the authors are submitting in
   parallel to MPEG-4.

   It is proposed here to complement the relative time and range
   syntax in the RTSP methods by possibly other range specifiers, also
   including the case of degenerate ranges specifying just a single
   Random Access Point.  Except for the degenerate NodeID case, the
   'other' ranges are still under consideration.
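   The npt-time grammar quoted in section 5.1 is small enough that a
   direct parser may help make the productions concrete.  The
   following is a minimal sketch in Python; the function name, return
   conventions and error handling are ours, not part of RTSP:

```python
import re

def parse_npt(npt: str):
    """Parse an npt-time string; return seconds, or the string 'now'.
    Illustrative sketch of the grammar in section 5.1."""
    if npt == "now":
        return "now"
    # npt-hhmmss = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
    m = re.fullmatch(r"(\d+):([0-5]?\d):([0-5]?\d)(\.\d*)?", npt)
    if m:
        hh, mm, ss = int(m.group(1)), int(m.group(2)), int(m.group(3))
        frac = float("0" + m.group(4)) if m.group(4) else 0.0
        return hh * 3600 + mm * 60 + ss + frac
    # npt-sec = 1*DIGIT [ "." *DIGIT ]
    m = re.fullmatch(r"(\d+)(\.\d*)?", npt)
    if m:
        frac = float("0" + m.group(2)) if m.group(2) else 0.0
        return int(m.group(1)) + frac
    raise ValueError("not a valid npt-time: %r" % npt)
```

   For instance, "1:02:03.5" parses to 3723.5 seconds, and "30.25"
   parses to 30.25 seconds.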
   In any case, the syntax would be:

      other_range     = other_RAP_info#1 - other_RAP_info#2
      other_RAP_info  = NodeId | ...
      range-specifier = npt-range | other_range

5.3. Methods Extended Syntax

   In addition to the Random Access Point, Range and Time parameters,
   and in order to allow for the dissociation of connection management
   and stream control, an additional syntactical means - other than
   the URL mechanism - for identifying a stream must be supported.
   This syntactical mechanism could be:

      Loop(ch_identifier)

   e.g.

      PLAY Loop(23,56,32)

5.4. Additional Optional Methods

   The two additional optional methods proposed here are of particular
   interest in environments where interactivity is triggered by user
   navigation in the presented scenes (e.g. Virtual Reality in the
   VRML/MPEG-4 spirit).  In a scene with several synchronized
   audio-visual streams, moving away from one audio-visual stream
   could suspend its delivery, and coming back closer to it could
   resume the delivery at a point synchronized with all the other
   streams that have been maintained in the scene.

5.4.1. R-MUTE (Remote-MUTE)

   The R-MUTE method is inspired by the MUTE method specified in PREMO
   [6].  However, in PREMO, the MUTE command suspends the presentation
   of the streams on the terminal but does not suspend the delivery of
   the streams.  The R-MUTE method would cause the stream delivery to
   be suspended temporarily, while a 'local' progression through the
   streams - on the server side, with synchronization actions
   maintained - occurs without delivery of the streams.  The server
   will hence maintain the current reading points of the ongoing
   streams, and will then be able to resume the delivery at the
   corresponding random access point when triggered by the RESUME
   method.

5.4.2. RESUME

   The RESUME method causes the restart of the delivery of a stream
   that has previously been suspended by the R-MUTE method.  The
   delivery will be resumed at the random access point given by the
   server state machine, which is also dependent on the stream time
   base and on the time interval between the R-MUTE and the RESUME
   commands.

   Remark: The above functionality presupposes that the scene
   description syntax and semantics provide mechanisms for routing the
   full semantics of the user-navigation-triggered action to the media
   content playing nodes.

6. Authors' Addresses

   Paul Christ
   Computer Center - RUS
   University of Stuttgart
   Allmandring 30
   D-70550 Stuttgart, Germany
   email: Paul.Christ@rus.uni-stuttgart.de

   Christine Guillemot
   INRIA
   Campus Universitaire de Beaulieu
   35042 Rennes Cedex, France
   email: Christine.Guillemot@irisa.fr

   Stefan Wesner
   Computer Center - RUS
   University of Stuttgart
   Allmandring 3a
   D-70550 Stuttgart, Germany
   email: Stefan.Wesner@rus.uni-stuttgart.de

7. References

   [1] H. Schulzrinne, A. Rao, R. Lanphier, 'RTSP: Real Time Streaming
       Protocol', RFC 2326, April 1998.

   [2] 'Information Technology - Coding of Audiovisual Objects -
       Part 1: Systems', ISO/IEC FCD 14496-1 [DRAFT], May 15, 1998.

   [3] 'Information Technology - Generic Coding of Moving Pictures and
       Associated Audio Information - Part 6: Delivery Multimedia
       Integration Framework', ISO/IEC 14496-6, May 15, 1998.

   [4] 'VRML 97: The Virtual Reality Modeling Language', International
       Standard ISO/IEC 14772-1:1997.

   [5] 'Synchronized Multimedia Integration Language (SMIL) 1.0
       Specification', W3C Proposed Recommendation, April 9, 1998.
   [6] 'Information Processing Systems - Computer Graphics and Image
       Processing - Presentation Environments for Multimedia Objects
       (PREMO), Part 3: Multimedia Systems Services', ISO/IEC 14478-3.

   [7] ISO/IEC JTC1/SC29/WG11, N2359 subpart 2, 'Verification Model of
       Advanced BIFS (Systems VM subpart 2)', July 1998.

   [8] ISO/IEC JTC1/SC29/WG11, MPEG98/M4102, October 1998; containing
       draft-balabanian-rtsp-mpeg4-dmif-00.txt, Balabanian, 'The Role
       of DMIF with RTSP and MPEG-4', September 22, 1998.