Internet Engineering Task Force                              D. Cromwell
INTERNET DRAFT                                                M. Durling
File: draft-cromwell-navdec-media-req-00.txt             Nortel Networks
Date: November 1998


         Requirements For Control Of A Media Services Function


Status of this Document

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document describes the functional requirements for a protocol
used by a call processing agent in a packet network to control a
media services function located in the same packet network or at the
interface of the packet network and the traditional telephone
network. The primary focus of the protocol is on audio; however, the
protocol could be extended in the future to support other media
streams such as video.

The protocol provides the standard audio operations of play audio,
collect DTMF, and record speech. It supports direct references to
simple audio as well as indirect references to simple and complex
audio. It provides multi-language audio variables, interruptibility
of audio, digit buffer control, special key sequences, and support
for reprompting during data collection.
Cromwell, Durling            expires May 1999                   [Page 1]

INTERNET DRAFT            Media Services Control           November 1998

The approach used in specifying protocol functionality was to look at
several existing protocols currently in use in the telephone network,
taking the best concepts from each and attempting to avoid any
limitations. The following protocols were examined: ITU CS1-R [2],
Nortel Extended CS1-R [3], Bellcore GR-1129 [4], and Bellcore SR-3511
[5]. The protocol described in this document provides at a minimum a
superset of the functionality of these protocols.

Table of Contents

   1. Notation
   2. Segments
      2.1. Terminology
      2.2. Segment Types
   3. Variables
      3.1. Specification
      3.2. Inflection
      3.3. Variable Types
      3.4. Date
      3.5. Digits
      3.6. Duration
      3.7. Money
      3.8. Month
      3.9. Number
      3.10. Silence
      3.11. String
      3.12. Text
      3.13. Time
      3.14. Tone
      3.15. Weekday
   4. Operations
      4.1. Play Operation
         4.1.1. Announcements
         4.1.2. Iterations
         4.1.3. Duration
         4.1.4. Speed
         4.1.5. Volume
         4.1.6. Optional Parameters
         4.1.7. Return Values
         4.1.8. Examples
      4.2. Play Collect Operation
         4.2.1. Announcements
         4.2.2. Speed
         4.2.3. Volume
         4.2.4. Interruptibility
         4.2.5. Digit Buffer Control
         4.2.6. Pattern Matching
         4.2.7. Timers
         4.2.8. Key Definitions
         4.2.9. Number Of Attempts
         4.2.10. Optional Parameters
         4.2.11. Return Values
         4.2.12. Examples
      4.3. Play Record Operation
         4.3.1. Announcements
         4.3.2. Speed
         4.3.3. Volume
         4.3.4. Interruptibility
         4.3.5. Digit Buffer Control
         4.3.6. Timers
         4.3.7. Key Definitions
         4.3.8. Optional Parameters
         4.3.9. Return Values
         4.3.10. Examples
   5. Other Requirements
      5.1. Invoke Application
      5.2. Audio Management
   6. Open Issues
   7. Implementation
   8. References
   9. Author's Address

Notation

Protocol operations are represented in this document by pseudo code.
These representations are not intended to imply an actual
implementation syntax and are for purposes of illustration only.

1. Segments

1.1. Terminology

A discrete unit of playable speech can be classified as a fragment, a
segment, or an announcement. A fragment is the smallest unit and
typically consists of one or more phonemes, e.g. "\w\", the first
sound in "welcome." A segment can be either composed of a series of
fragments or defined atomically and typically consists of one or more
words, e.g. "Welcome" or "Welcome to." An announcement is composed of
one or more segments and typically embodies a complete logical
expression, e.g. "Welcome to Bell South's Automated Directory
Assistance Service." It is possible for an announcement to be defined
as a single segment. In this document "announcement" is a logical
concept while "segment" refers to actual audio.

Media operations supported by the protocol should reference
announcements. Announcements should be specifiable either as a
sequence of segment ids given as a parameter to a media output
operation or as a sequence of segment ids provisioned in data and
referenced by a single identifier. This identifier can be used as a
parameter to a media output operation. Allowing both parameter-driven
and data-driven specification of announcements gives application
designers a great deal of flexibility when choosing an application
and provisioning model.

In practice, however, the majority of references made by a call
processing agent will likely be to a single segment which is a
logically complete pre-recorded announcement, e.g. play(27), where
segment 27 points to a recording of "Please enter your card number
after the tone..."

1.2. Segment Types

The protocol should support the following segment types:

   RECORDING: A reference by unique id to a single piece of
   provisioned audio.

   TEXT: A reference to a block of text to be converted to speech or
   to be displayed on a device. Reference may be by unique id to a
   block of provisioned text or by direct specification of text in a
   parameter.

   SILENCE: A specification of a length of silence to be played, in
   units of 100 milliseconds.

   TONE: The specification of a tone to be played by algorithmic
   generation. Most tones, however, will probably be recorded, not
   generated. Exact specification of this segment type is TBD.

   VARIABLE: The specification of a multilanguage voice variable by
   type, subtype, language, and value. Specification of variables is
   considered in more detail in a subsequent section of this
   document.

   COMPOSITE: A reference by unique id to a provisioned sequence of
   mixed recording, text, silence, tone, variable, or composite
   segments.

Recursive definition of composite segments should be allowed. For
example, composite A could have as one of its elements composite B,
which has as one of its elements composite C. However, this feature
should be used with caution given the additional complexity it
introduces.

Direct or transitive definition of a composite segment in terms of
itself must not be permitted, e.g. composite A having as one of its
elements composite B, which has as one of its elements composite A.

2. Variables

2.1. Specification

Variables should be specified by type, subtype, language, and value.
Subtype is a refinement of type. For example, the variable type Money
might have an associated range of subtypes such as Dollar, Rupee,
Dinar, etc. Not all variables require a subtype, and for these
variables the subtype parameter should be set to null.

ISO standard 639, Code For The Representation Of Names Of Languages
[6], lists the names of many languages and should be used as a
starting point in defining the range of available languages.
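The four-part variable specification described above can be pictured as a small record. The following is a minimal sketch only, assuming a Python representation; the class name VoiceVariable, its field names, and the choice to carry the language as a two-letter lowercase code are illustrative assumptions and are not part of the protocol requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceVariable:
    """Hypothetical record for one variable: type, subtype, language, value."""
    type: str                # e.g. "MONEY" or "DIGITS"
    subtype: Optional[str]   # None (null) when the type needs no subtype
    language: str            # assumed: two-letter lowercase language code
    value: Optional[str]

    def __post_init__(self) -> None:
        # Assumption: languages are carried as two lowercase ASCII letters.
        code = self.language
        if not (len(code) == 2 and code.isascii()
                and code.isalpha() and code.islower()):
            raise ValueError(f"not a two-letter language code: {code!r}")


# A variable that uses a subtype, and one that leaves it null.
money = VoiceVariable("MONEY", "USDOLLARS", "en", "1153")
digits = VoiceVariable("DIGITS", None, "hi", "1234")
```

Keeping subtype as an explicit null rather than omitting the field mirrors the requirement that the subtype parameter be set to null for variables that do not need one.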
A small excerpt from ISO 639 follows:

    _________________
   |Code | Language |
   |_____|__________|
   | cs  | Czech    |
   | cy  | Welsh    |
   | da  | Danish   |
   |_____|__________|

Note that ISO 639 is not a complete list. For example, the standard
includes Chinese but does not mention the Mandarin or Cantonese
dialects.

In some cases it may be desirable to play an announcement with an
embedded variable without playing the variable itself. If the value
for a variable is NULL, the variable must not be played.

2.2. Inflection

Specification of inflection is beyond the scope of this protocol;
however, a media services function should support rising, flat, and
falling inflections as appropriate.

2.3. Variable Types

The protocol should support the following multilanguage voice
variables and should be extensible to support additional variable
types. A list of supported variables follows:

    _____________________________
   |Type     | Subtype           |
   |_________|___________________|
   |DATE     | none              |
   |_________|___________________|
   |DIGITS   | GENERIC           |
   |         | NORTH AMERICAN DN |
   |_________|___________________|
   |DURATION | none              |
   |_________|___________________|
   |MONEY    | currency_type     |
   |_________|___________________|
   |MONTH    | none              |
   |_________|___________________|
   |NUMBER   | CARDINAL          |
   |         | ORDINAL           |
   |_________|___________________|
   |SILENCE  | none              |
   |_________|___________________|
   |STRING   | none              |
   |_________|___________________|
   |TEXT     | DISPLAY           |
   |         | SPEECH            |
   |_________|___________________|
   |TIME     | TWELVEHOUR        |
   |         | TWENTYFOURHOUR    |
   |_________|___________________|
   |TONE     | none              |
   |_________|___________________|
   |WEEKDAY  | none              |
   |_________|___________________|

2.4. Date

Speaks a date. For example, "101598" is spoken as "October fifteenth
nineteen ninety eight."

2.5. Digits

Speaks a string of digits one at a time. If the subtype is North
American DN, the format of which is NPA-NXX-XXXX, the digits are
spoken with appropriate pauses between the NPA and NXX and between
the NXX and XXXX. If the subtype is generic, the digits are spoken
with no pauses.

2.6. Duration

Duration is specified in seconds and is spoken in one or more units
of time as appropriate, e.g. "3661" is spoken as "One hour, one
minute, and one second."

2.7. Money

Money is specified in the smallest units of a given currency and is
spoken in one or more units of currency as appropriate, e.g. "110" in
U.S. Dollars would be spoken "one dollar and ten cents." The list of
currencies specified in ISO 4217, Currency And Funds Code List [7],
should be used as a starting point in defining the currency subtype.
A small excerpt from ISO 4217 follows:

    __________________________________________________________
   |Alpha-code | Numeric-code | Currency | Entity            |
   |___________|______________|__________|___________________|
   |GQE        | 226          | Ekwele   | Equatorial Guinea |
   |GRD        | 300          | Drachma  | Greece            |
   |GTQ        | 320          | Quetzal  | Guatemala         |
   |___________|______________|__________|___________________|

2.8. Month

Speaks the specified month, e.g. "October."

2.9. Number

Speaks a number in cardinal form or in ordinal form. For example,
"100" is spoken as "one hundred" in cardinal form and "one hundredth"
in ordinal form.

2.10. Silence

Plays a specified period of silence. Specification is in 100
millisecond units.

2.11. String

Speaks each character of a string, e.g. "a34bc" is spoken "A, three,
four, b, c."

2.12. Text

Produces the specified text as speech or displays it on a device.

2.13. Time

Speaks a time (specified in twenty four hour format) in either twelve
hour format or twenty four hour format.
For example, "1700" is spoken as "Five pm" in twelve hour format or
as "Seventeen hundred hours" in twenty four hour format.

2.14. Tone

Plays an algorithmically generated tone, specification of which is
TBD. Most applications will probably use prerecorded tones.

2.15. Weekday

Speaks the day of the week, e.g. "Monday."

3. Operations

This section describes the functional requirements for a set of media
control operations. Three operations are defined: play, play collect,
and play record. Specification of endpoint, port, or channel on a per
operation basis is not a protocol requirement; however, it may be
required in particular implementations.

3.1. Play Operation

The play operation should play an announcement in situations where
there is no need for interaction with the user. Because there is no
need to monitor the incoming media stream, this operation is an
efficient mechanism for treatments, informational announcements, etc.
The play operation should be specified as follows:

3.1.1. Announcements

The play operation should play an ordered sequence of one or more
segments of the following types: recording, text, silence, tone,
variable, and composite.

3.1.2. Iterations

The protocol should support specification of the maximum number of
times an announcement is to be played. It should be possible to
specify that an announcement be repeated forever, and it should also
be possible to specify an interval of silence (in 100 millisecond
units) to be inserted between announcement plays. If the number of
iterations is not specified, it should be assumed to be one (i.e. a
single play). If the inter-announcement interval is not specified, it
should be assumed to be one second.

3.1.3. Duration

The protocol should support specification of the maximum amount of
time (in 100 millisecond units) allowed to play and possibly replay
an announcement.
It should be possible to specify that an announcement be played
forever. If duration is specified, it should take precedence over
iteration and interval. For example, if a 10 second announcement is
to be played 5 times with 2 seconds of silence between plays, the
total playing time would be 58 seconds (5 plays of 10 seconds plus 4
intervals of 2 seconds). However, if duration is set to 290 (29
seconds), the announcement will only be played for 29 seconds, i.e.
the entire announcement will be played twice but the third play will
be terminated after 5 seconds.

3.1.4. Speed

The relative playback speed of the announcement should be specifiable
as a percentage variation from the normal playback speed. This
initial setting should apply to the entire playing of the
announcement and should not be changeable. The normal playback speed
and the range of change allowed are implementation dependent.

3.1.5. Volume

The relative playback level of an announcement should be specifiable
as a percentage variation from the normal playback level. This
initial setting should apply to the entire playing of the
announcement and should not be changeable. The normal volume and the
amount of change allowed are implementation dependent.

3.1.6. Optional Parameters

All parameters to the play operation except the announcement
parameter are optional. Certain parameters default to reasonable
values. This allows the call agent to specify only the minimum set of
parameters it needs in a given situation. If an announcement is not
specified, an error will be returned to the call agent. The defaults
are:

    ______________________
   |Parameter  | Default  |
   |___________|__________|
   |Iterations | 1        |
   |Interval   | 1 second |
   |___________|__________|

3.1.7. Return Values

In addition to a return code that describes the outcome of a play
operation, the following information is returned:

   The interrupting key sequence, if any.

   If an announcement was interrupted, the length of the portion of
   the announcement that was played before the interrupt.

3.1.8. Examples

Assume the following syntax:

   play(announcement,iterations,interval,duration,speed,volume)

Play a single recording, text, or composite segment:

   play(segment(5))

Play a sequence of three segments:

   play(segment(5),segment(6),segment(7))

Play three seconds of silence:

   play(silence(30))

Play text as speech:

   play(speech("hello"))

Display text on a device:

   play(display("hello"))

Play "Eleven dollars and fifty three cents" in English:

   play(variable(MONEY,USDOLLARS,ENGLISH,1153))

Specification of a variable without a subtype:

   play(variable(DIGITS,,HINDI,1234))

Play a segment followed by 1 second of silence, followed by "one,
two, three, four" in Hindi, followed by another segment:

   play(segment(45),silence(10),variable(DIGITS,,HINDI,1234),
        segment(543))

The same operation as above.
The sequence of segment, silence, variable, and segment is defined in
data as segment 37:

   play(segment(37),DIGITS,,HINDI,1234)

Play an announcement 10% faster than normal speed and 5% softer than
normal volume:

   play(segment(7),speed(+10),volume(-5))

Play an announcement three times with two seconds of silence between
plays:

   play(segment(98),iterations(3),interval(20))

The same operation as above, except that the operation is terminated
after twenty seconds:

   play(segment(98),iterations(3),interval(20),duration(200))

3.2. Play Collect Operation

The play collect operation should play a prompt and collect DTMF
digits. If no digits are entered or an invalid digit pattern is
entered, the user may be reprompted and given another chance to enter
the correct digits. The play collect operation should be specified as
follows:

3.2.1. Announcements

The play collect operation should optionally play one or more
announcements, each consisting of an ordered sequence of one or more
segments of the following types: recording, text, silence, tone,
variable, and composite.

All play collect announcements are optional and some default to other
announcements if they are not specified. For example, if the user
fails to enter any digits, the no digits reprompt is played. If the
no digits reprompt is undefined, then the reprompt is played. If the
reprompt is undefined, then the initial prompt is played, and if the
initial prompt is not defined, then no announcement is played.
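The fallback order just described (no digits reprompt, then reprompt, then initial prompt, then nothing) can be sketched as a first-defined lookup. This is an illustration only, written in Python; the function name select_no_digits_announcement and the use of None for an undefined announcement are assumptions, not part of the requirements.

```python
from typing import Optional


def select_no_digits_announcement(
    initial_prompt: Optional[int] = None,
    reprompt: Optional[int] = None,
    no_digits_reprompt: Optional[int] = None,
) -> Optional[int]:
    """Return the announcement id to play when the user entered no digits.

    The most specific announcement is tried first. None means the
    announcement was not defined; a None result means nothing is played.
    """
    for candidate in (no_digits_reprompt, reprompt, initial_prompt):
        if candidate is not None:
            return candidate
    return None


# Only an initial prompt is defined, so it doubles as the reprompt.
fallback = select_no_digits_announcement(initial_prompt=87)

# All three are defined, so the most specific one is chosen.
specific = select_no_digits_announcement(
    initial_prompt=87, reprompt=5, no_digits_reprompt=419)
```

An application that defines only an initial prompt therefore still gets sensible reprompting without supplying any further announcements.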
This concept of cascading defaults allows the level of audio
customization to decay gracefully all the way back to a single
announcement for all errors and means that applications are not
forced to specify any more announcement functionality than they need.

The following announcements should be supported for the play collect
command. Default relationships are indicated by indentation.

   INITIAL PROMPT - If the initial prompt is not specified, digit
   collection should begin immediately.

      REPROMPT - Played after the user has made an error; asks the
      user to try again. Should default to Initial prompt if not set.

         NO DIGITS REPROMPT - Played when the user has not entered
         any digits. Should default to Reprompt if not set.

   FAILURE ANNOUNCEMENT - Played when all data entry attempts have
   failed.

   SUCCESS ANNOUNCEMENT - Played when the operation has succeeded.

3.2.2. Speed

The relative playback speed of the announcement should be specifiable
as a percentage variation from the normal playback speed. This
initial setting should apply to the playing of all announcements
associated with a particular play collect operation. The normal
playback speed and the range of change allowed are implementation
dependent.

3.2.3. Volume

The relative playback level of an announcement should be specifiable
as a percentage variation from the normal playback level. This
initial setting should apply to the playing of all announcements
associated with a particular play collect operation. The normal
volume and the amount of change allowed are implementation dependent.

3.2.4. Interruptibility

The play collect operation should support interruptibility by DTMF. A
prompt is interruptible if it stops playing when the user presses a
DTMF key; if it is non-interruptible it continues to play.
Interruptibility should be specifiable in a protocol command on a per
segment basis.

3.2.5. Digit Buffer Control

The protocol should support the ability to clear the digit buffer
prior to playing the initial prompt. The default should be to not
clear the buffer. By default the buffer should always be cleared
following the playing of an uninterruptible segment and before
playing a reprompt in response to invalid input.

3.2.6. Pattern Matching

The protocol should support specification of the maximum and minimum
number of digits to collect. It should support digit pattern matching
using extended regular expressions as supported by the Rogue Wave
Class Library [8], which supports a subset of the POSIX.2 standard
[9] for regular expressions.

3.2.7. Timers

The protocol should support the following event timers for the play
collect operation:

   FIRST DIGIT - The amount of time allowed for the user to enter the
   first digit. Specified in units of 100 milliseconds.

   INTER DIGIT - The amount of time allowed for the user to enter
   each subsequent digit. Specified in units of 100 milliseconds.

   EXTRA DIGIT - The amount of time to wait for a user to enter a
   digit once the maximum expected number of digits has been entered.
   Specified in units of 100 milliseconds. Typically this timer is
   used to wait for a terminating key in applications where a
   specific key has been defined to terminate input.

   This timer addresses the "# key ambiguity problem." Suppose the
   application is expecting 5 digits terminated by the # key, but the
   digits are valid even if not terminated by the # key. If the
   digits are sent to the call processing agent as soon as the fifth
   key is entered, the # key, when and if it is received, is
   ambiguous, since it could be interpreted as a terminating key for
   the digits entered previously or as something else.

3.2.8. Key Definitions

The protocol should support the following keys: 0-9, #, *, A, B, C,
and D, and should provide the ability to specify the semantics of
keys received during the play collect operation as defined below.
Defined keys are processed in the following order of precedence from
highest to lowest: command keys, playcontrol keys, startinput keys,
and endinput keys. Any keys not defined should be collected.

   COMMAND KEY - A key followed by a sequence of zero or more keys
   that has one of the following meanings:

      RESTART - Discard any digits collected, replay the prompt, and
      resume collection.

      REINPUT - Discard any digits collected and resume collection.

      RETURN - Terminate the current operation and any queued
      operations and return the terminating key sequence to the call
      processing agent.

   PLAYCONTROL KEY - A key that is valid only while an announcement
   is playing and has one of the following meanings. Play control
   keys are never collected.

      POSITION - Stop playing the current announcement and resume
      playing at another position within the announcement. A play
      control key can be defined to resume playing at one of the
      following positions: the beginning of the first, last,
      previous, next, or the current segment of the announcement. If
      the announcement consists of a single segment, the first and
      previous positions are equivalent to the beginning of the
      announcement. The last and next positions are equivalent to the
      end of the announcement.

      STOP - Terminate playback of the announcement.

   STARTINPUT KEYS - A set of one or more keys that are acceptable as
   the first digit collected. It should be possible to specify for
   each key whether it interrupts a playing announcement or is
   ignored while an announcement is playing.

   ENDINPUT KEY - A key that signals the end of user input. It should
   be possible to specify whether or not this key is included in the
   collected digits.
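The precedence order for defined keys given at the start of this section can be sketched as a simple classifier. This is an illustration in Python under several loudly stated assumptions: the function name classify_key and its parameters are hypothetical, each key definition is treated as a single key (multi-key command sequences are not modelled), the position of the key within the input is ignored, and the default start input and end input sets are taken to be 0-9 and #.

```python
from typing import AbstractSet


def classify_key(
    key: str,
    announcement_playing: bool,
    command_keys: AbstractSet[str] = frozenset(),
    playcontrol_keys: AbstractSet[str] = frozenset(),
    startinput_keys: AbstractSet[str] = frozenset("0123456789"),  # assumed default
    endinput_keys: AbstractSet[str] = frozenset("#"),             # assumed default
) -> str:
    """Classify one DTMF key, checking definitions from highest to lowest precedence."""
    if key in command_keys:
        return "command"
    # Play control keys are only valid while an announcement is playing.
    if announcement_playing and key in playcontrol_keys:
        return "playcontrol"
    if key in startinput_keys:
        return "startinput"
    if key in endinput_keys:
        return "endinput"
    # Keys with no definition are simply collected.
    return "collect"


# "5" is defined as a play control key and is also a valid start input key.
during_prompt = classify_key("5", True, playcontrol_keys={"5"})
after_prompt = classify_key("5", False, playcontrol_keys={"5"})
```

In this sketch the same key is treated as play control while an announcement is playing and as ordinary input once it has finished, which is one plausible reading of the precedence rule.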
The protocol should support specification of the maximum number of
times a user may use a restart key to restart the operation or use a
reinput key to re-attempt DTMF entry.

3.2.9. Number Of Attempts

The protocol should support specification of the number of times the
user can attempt to make a valid entry.

3.2.10. Optional Parameters

All parameters to the play collect operation are optional. Certain
parameters default to reasonable values. This allows the call agent
to specify only the minimum set of parameters it needs in a given
situation. The defaults are:

    _______________________________
   |Parameter          | Default   |
   |___________________|___________|
   |Iterations         | 1         |
   |Interval           | 1 second  |
   |Clear DTMF         | false     |
   |First digit timer  | 5 seconds |
   |Interdigit timer   | 3 seconds |
   |Start input key    | 0-9       |
   |End input key      | #         |
   |Number of attempts | 1         |
   |___________________|___________|

3.2.11. Return Values

In addition to a return code that describes the outcome of a play
collect operation, the following information is returned:

   The interrupting key sequence, if any.

   If an announcement was interrupted, the length of the portion of
   the announcement that was played before the interrupt.

   The number of attempts it took the user to enter a valid sequence
   of DTMF keys.

   The digits that were collected.

3.2.12. Examples

Assume the following syntax:

   play_collect(prompt_block,timer_block,key_block,pattern_block,speed,
                volume,cleardigits,attempts)

   prompt_block = (initial_prompt,reprompt,no_digits_reprompt,
                   success_announcement,failure_announcement)

   timer_block = (first_digit,inter_digit,extra_digit)

   key_block = (command_block,playcontrol_block,startinput,endinput)

   pattern_block = (min_digits,max_digits,pattern)

Clear the digit buffer before the initial prompt, play 5% faster than
normal speed and 2% softer than normal volume, and give the user
three attempts to enter some valid data:

   play_collect(prompt_block,timer_block,key_block,speed(+5),volume(-2),
                cleardigits(TRUE),attempts(3))

Prompt block with only an initial prompt defined:

   prompt_block = (initial_prompt(87))

Prompt block with all prompts defined:

   prompt_block = (initial_prompt(87),reprompt(5),
                   no_digits_reprompt(419),failure_announcement(9),
                   success_announcement(18))

Timer block with the first_digit timer set to 3 seconds and the
inter_digit timer set to 2 seconds (timers are specified in units of
100 milliseconds):

   timer_block = (first_digit(30),inter_digit(20))

Pattern block specifying collection of 1 to 4 digits:

   pattern_block = (min_digits(1),max_digits(4))

Pattern block specifying collection of 2 digits where the first digit
is 3, 4, or 5 and the second digit is any digit except 5, 6, or 7:

   pattern_block = (min_digits(2),max_digits(2),pattern([3-5][^567]))

Key block specifying a set of digits that are valid as the first
digit of input and also specifying that these keys will interrupt the
current announcement. It also specifies a key to end input; this key
is not included in any digits collected.

   key_block = (command_block,playcontrol_block,
                startinput(0-9,INTERRUPT),endinput(#,EXCLUDE))

Command block specifying a restart key sequence and a return key
sequence:

   command_block = ((*,76,RESTART),(*,83,RETURN))

3.3. Play Record Operation

The play record operation should play a prompt and record user
speech. If the user does not speak, the user may be reprompted and
given another chance to record. The play record operation is
specified as follows:

3.3.1. Announcements

The play record operation should optionally play one or more
announcements, each consisting of an ordered sequence of one or more
segments of the following types: recording, text, silence, tone,
variable, and composite.

All play record announcements are optional and some default to other
announcements if they are not specified. For example, if the user
does not speak, the no speech reprompt is played.
If the no speech reprompt is undefined, then the reprompt is played.
If the reprompt is undefined, then the initial prompt is played, and
if the initial prompt is not defined, then no announcement is played.

This concept of cascading defaults allows the level of audio
customization to decay gracefully all the way back to a single
announcement for all errors and means that applications are not
forced to specify any more announcement functionality than they need.

The following announcements should be supported for the play record
command. Default relationships are indicated by indentation.

   INITIAL PROMPT - If the initial prompt is not specified, recording
   should begin immediately.

      REPROMPT - Played after the user has made an error; asks the
      user to try again. Should default to Initial prompt if not set.

         NO SPEECH REPROMPT - Played when the user has not spoken.
         Should default to Reprompt if not set.

   FAILURE ANNOUNCEMENT - Played when all data entry attempts have
   failed.

   SUCCESS ANNOUNCEMENT - Played when the operation has succeeded.

3.3.2. Speed

The relative playback speed of the announcement should be specifiable
as a percentage variation from the normal playback speed. This
initial setting should apply to all announcements associated with a
particular play record operation. The normal playback speed and the
range of change allowed are implementation dependent.

3.3.3. Volume

The relative playback level of an announcement should be specifiable
as a percentage variation from the normal playback level. This
initial setting should apply to all announcements associated with a
particular play record operation. The normal volume and the amount of
change allowed are implementation dependent.

3.3.4. Interruptibility

The play record operation should support interruptibility by DTMF.
   A prompt is interruptible if it stops playing when the user presses a
   DTMF key; if it is non-interruptible it continues to play.
   Interruptibility is specifiable in a protocol command on a per
   segment basis.

3.3.5. Digit Buffer Control

   The protocol should support the ability to clear the digit buffer
   prior to playing the initial prompt. The default should be to not
   clear the buffer. By default the digit buffer should always be
   cleared following the playing of an uninterruptible segment, before
   playing a reprompt in response to invalid input, and before beginning
   a recording.

3.3.6. Timers

   The protocol should support the following event timers for the play
   record operation:

   PRE-SPEECH - The amount of time to wait for the user to initially
      speak. Specified in units of 100 milliseconds.

   POST-SPEECH - The amount of silence necessary after the end of the
      last speech segment for the recording to be considered complete.
      Specified in units of 100 milliseconds.

   TOTAL RECORDING LENGTH - The maximum allowable length of the
      recording not including pre or post speech silence. Specified in
      units of 100 milliseconds.

3.3.7. Key Definitions

   The protocol should support the following keys: 0-9, #, *, A, B, C,
   and D and should provide the ability to specify the semantics of keys
   received during the play record operation as defined below. Defined
   keys are processed in the following order of precedence from highest
   to lowest: command keys, playcontrol keys, and endinput key.

   COMMAND KEY - A key followed by a sequence of zero or more keys that
      has one of the following meanings:

      RESTART - Discard any recording in progress, replay the prompt,
         and resume collection.

      REINPUT - Discard any recording in progress and resume collection.

      RETURN - Terminate the current operation and any queued operations
         and return the terminating key sequence to the call processing
         agent.
   PLAYCONTROL KEY - A key that is valid only while an announcement is
      playing and has one of the following meanings. Play control keys
      are never collected.

      POSITION - Stop playing the current announcement and resume
         playing at another position within the announcement. A play
         control key can be defined to resume playing at one of the
         following positions: the beginning of the first, last,
         previous, next, or current segment of the announcement. If the
         announcement consists of a single segment, the first and
         previous positions are equivalent to the beginning of the
         announcement, and the last and next positions are equivalent to
         the end of the announcement.

      STOP - Terminate playback of the announcement.

   ENDINPUT KEY - A key that signals the end of user input. It should be
      possible to specify whether or not this key is included in the
      collected digits.

   The protocol should support specification of the maximum number of
   times a user may use a restart key to restart the operation or use a
   reinput key to re-attempt recording.

3.3.8. Optional Parameters

   All parameters to the play record operation are optional. Certain
   parameters should default to reasonable values. This allows the call
   agent to specify only the minimum set of parameters it needs in a
   given situation. The defaults are:

    ________________________________
   | Parameter          | Default   |
   |____________________|___________|
   | Iterations         | 1         |
   | Interval           | 1 second  |
   | Clear DTMF         | false     |
   | Pre speech timer   | 3 seconds |
   | Post speech timer  | 2 seconds |
   | Start input key    | 0-9       |
   | End input key      | #         |
   | Number of attempts | 1         |
   |____________________|___________|

3.3.9. Return Values

   In addition to a return code that describes the outcome of a play
   record operation, the following information is returned:

      The interrupting key sequence, if any.
      If an announcement was interrupted, the length of the portion of
      the announcement that was played before the interrupt.

      The number of attempts it took the user to make a recording.

      A reference to any recording that was made.

3.3.10. Examples

   Assume the following syntax:

    _________________________________________________________________
   |                                                                 |
   |  play_record(prompt_block,timer_block,key_block,speed,volume,   |
   |              cleardigits,attempts)                              |
   |                                                                 |
   |  prompt_block = (initial_prompt,reprompt,no_speech_reprompt,    |
   |                  success_announcement,failure_announcement)     |
   |                                                                 |
   |  timer_block = (pre_speech,post_speech,total_recording_length)  |
   |                                                                 |
   |  key_block = (command_block,playcontrol_block,endinput)         |
   |_________________________________________________________________|

   Clear the digit buffer before the initial prompt, play all
   announcements 5 percent faster than normal speed and at 2 percent
   less than normal volume, and give the user only one attempt to make a
   recording:

    ________________________________________________________________________
   |                                                                        |
   |  play_record(prompt_block,timer_block,key_block,speed(+5),volume(-2),  |
   |              cleardigits(TRUE),attempts(1))                            |
   |________________________________________________________________________|

   Specify an initial prompt and a reprompt:

    ____________________________________________
   |                                            |
   |  prompt_block = (initial(3),reprompt(45))  |
   |____________________________________________|

   Specify a pre-speech timer of 5 seconds and a post-speech timer of 2
   seconds (timer values are given in units of 100 milliseconds):

    ________________________________________________
   |                                                |
   |  timer_block = (prespeech(50),postspeech(20))  |
   |________________________________________________|

4. Other Requirements

   This section describes other functional requirements related to media
   control. These operations do not necessarily map directly to actual
   commands in a protocol implementation.

4.1.
Invoke Application

   The protocol should support invocation of a custom application
   residing on the media services function, identified by application id
   and accompanied by an unstructured data block.

4.2. Audio Management

   Audio recordings are temporary by default and exist only for the life
   of the call. The protocol should provide the capability to change, at
   call time, the status of a piece of audio from temporary to permanent
   or from permanent to temporary.

5. Open Issues

   The following issues are unresolved:

   1. Support for voice recognition.

   2. Specification of a dynamically generated TONE segment.

6. Implementation

   Some systems may not be capable of supporting the entire protocol.
   Implementations of a subset of the protocol should make every attempt
   to remain logically consistent.

7. References

   [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [2] ITU Recommendation Q.1218, "INAP Protocol For Support Of
       Capability Set 1", April-May, 1995.

   [3] Nortel CS1-R Extensions Specification, internal Nortel document.

   [4] Bellcore GR-1129-CORE, "AINGR: Switch-Intelligent Peripheral
       Interface (IPI)", Issue 3, September 1997.

   [5] Bellcore SR-3511, "ISCP-IP Interface Specification", Issue 2,
       Version 5.0, January 1997.

   [6] ISO 639, "Code For The Representation Of Names Of Languages",
       1998.

   [7] ISO 4217, "Currency And Funds Code List", 1981.

   [8] Tools.h++ Class Reference Version 7, Rogue Wave Software Inc.,
       1996.

   [9] ANSI/IEEE Standard 1003.2 (Portable Operating System Interface),
       Version D11.2, September 1991.

8. Author's Address

   David Cromwell
   Nortel Networks
   Box 13010
   Research Triangle Park, NC 27709
   Phone: (919) 992-1373
   email: cromwell@nortel.ca

   Michael Durling
   Nortel Networks
   Box 13010
   Research Triangle Park, NC 27709
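
   The cascading announcement defaults described for the play record
   operation (Section 3.3.1) can be illustrated with a short,
   non-normative sketch. It assumes a prompt block is held as a mapping
   from announcement type to an announcement id; the names FALLBACK and
   resolve_announcement and the dictionary keys are hypothetical and are
   not part of the protocol.

```python
from typing import Dict, Optional

# Hypothetical fallback table: each announcement type maps to the type it
# defaults to when unset. The chain follows the text of Section 3.3.1:
# no speech reprompt -> reprompt -> initial prompt -> nothing.
FALLBACK: Dict[str, Optional[str]] = {
    "no_speech_reprompt": "reprompt",
    "reprompt": "initial_prompt",
    "initial_prompt": None,
    "failure_announcement": None,  # no default relationship is described
    "success_announcement": None,  # no default relationship is described
}


def resolve_announcement(kind: str, prompt_block: Dict[str, int]) -> Optional[int]:
    """Return the announcement id to play for `kind`, or None to play nothing."""
    current: Optional[str] = kind
    while current is not None:
        if current in prompt_block:
            return prompt_block[current]
        # Not specified: fall back to the next announcement in the chain.
        current = FALLBACK[current]
    return None


# Mirrors the example prompt_block = (initial(3),reprompt(45)).
block = {"initial_prompt": 3, "reprompt": 45}

# The no speech reprompt is unset, so the reprompt is selected instead.
print(resolve_announcement("no_speech_reprompt", block))
```

   With only an initial prompt specified, every reprompt resolves to
   that single announcement, which is the graceful decay the cascading
   defaults are intended to provide.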