SLIM G. Hellstrom
Internet-Draft Omnitor
Intended status: Informational June 7, 2017
Expires: December 9, 2017

Negotiating Simultaneous Modalities in Real-Time Communications


In a real-time communication session, there may be a need or preference for receiving the same content in two or more simultaneous modalities. This specification extends a mechanism for human language negotiation with a mechanism for indicating a preference for, or the availability of, simultaneously provided transformations of the original content. This indication enables negotiation of simultaneous modalities in real-time sessions. Applications include, for example, the provision of both spoken and written content in a real-time call (captioning), or the provision of both the spoken original and a sign language interpretation of it in a real-time session.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 9, 2017.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents ( in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

In certain applications it is of interest to indicate a need for, or the availability of, a transformed version of the contents of a media stream in another medium, while still also providing the original. This indication may, for example, be applied to rapid subtitling of speech, either manual or automatic. It may also be applied to sign language interpretation of speech, or spoken interpretation of sign language. A mechanism for language negotiation in real-time communications is introduced in [I-D.ietf-slim-negotiating-human-language]. This specification extends that mechanism with an indication that a transformation of the same language contents is desired, or available, in a different modality. Negotiation of multiple transformations of the same language contents can be accomplished by using this indication in the context of language negotiation in real-time communications [I-D.ietf-slim-negotiating-human-language]. The indication is based on the "t" extension of [RFC5646], specified in RFC 6497 [RFC6497].

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Indication of Simultaneous Modalities

The mechanism specified here extends the language negotiation mechanism specified in [I-D.ietf-slim-negotiating-human-language] with a mechanism for indicating a request for, or the availability of, a transformed form of the original language content in another modality in the same transmission direction. The indication should be provided in language tags of the 'hlang-send' or 'hlang-recv' SDP attribute values specified in [I-D.ietf-slim-negotiating-human-language].

When the transformed language is provided or requested simultaneously with the original, this condition should be indicated by using the "t" extension to BCP 47 [RFC5646] as specified in RFC 6497 [RFC6497], that is, by attaching a "t" subtag to the language tag for the language that is expected to be provided in a transformed modality.

Briefly, the 't' extension consists of the string "-t" followed by the source language subtag.

Example: "en-t-en" is an English transform of an English source (in another modality).
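As an illustration only, the tag form described above can be built and taken apart with a few lines of Python. The function names are hypothetical, and the parsing is deliberately simplified: it splits on the first "-t-" and treats everything after it as the source tag, whereas RFC 6497 also allows further fields after the source that this sketch does not handle.

```python
def make_transformed_tag(target: str, source: str) -> str:
    """Return a tag such as 'en-t-en': target language, '-t-', source language."""
    return f"{target}-t-{source}"


def split_transformed_tag(tag: str):
    """Split a tag into (target, source).

    source is None when the tag carries no 't' extension.
    Simplification (assumption): only the first '-t-' is considered and
    the remainder is taken to be the source language tag.
    """
    target, sep, source = tag.lower().partition("-t-")
    return (target, source if sep else None)


print(make_transformed_tag("en", "en"))
print(split_transformed_tag("en-t-en"))
print(split_transformed_tag("en"))
```

A tag without the extension comes back with a source of None, which a caller can use to tell an original modality from a transformed one.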

On reception of an indication including a language with the 't' extension for the receive direction, the answering party should interpret this as a request to send both the original and the transformed content (e.g., both spoken and written), provided that both are included in the indications.

On reception of an indication including an offer of a language with the 't' extension for the send direction, the answering party should interpret this as a request to arrange for reception of both the original and the transformed content simultaneously.

The media that the 't' extension is attached to should only be interpreted as an expectation for how the transformation will be made. Conditions in the established session MAY cause the original and transformation to swap roles from what the subtags indicated.

4. Negotiation

A request for reception of multiple simultaneous modalities, indicated by the "t" extension in 'hlang-recv' attributes in an offer, should be interpreted as a request to receive these modalities simultaneously. The answering party MAY satisfy this request by providing the requested simultaneous modalities. This should be indicated in the answer by the "t" extension in the 'hlang-send' SDP attributes. If the answering party has no possibility to provide the simultaneous modalities, then no "t" extensions should be indicated in the 'hlang-send' attribute values with the same original language.
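The answering side of this rule can be sketched as follows. This is a minimal illustration, not part of the specification: the function name and the dictionary representation (media type mapped to a single language tag) are assumptions, and the sketch takes one possible reading of the rule by leaving out the transformed modality altogether when it cannot be provided.

```python
def answer_hlang_send(offered_recv: dict, can_provide_simultaneous: bool) -> dict:
    """Derive 'hlang-send' values for the answer from offered 'hlang-recv' values.

    offered_recv maps a media type (e.g. 'audio', 'text') to the language
    tag the offerer wants to receive in that media (hypothetical layout).
    """
    answer = {}
    for media, tag in offered_recv.items():
        is_transformed = "-t-" in tag
        if not is_transformed:
            # The original modality is answered as usual.
            answer[media] = tag
        elif can_provide_simultaneous:
            # Echo the "t" extension to confirm the simultaneous modality.
            answer[media] = tag
        # Otherwise: no "t" extension appears in the answer.
    return answer


offer = {"audio": "en", "text": "en-t-en"}
print(answer_hlang_send(offer, True))
print(answer_hlang_send(offer, False))
```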

Availability of simultaneous modalities of an original language should be indicated by the "t" extension in the 'hlang-send' attributes in the offer. The answering party SHOULD indicate its interest in receiving the offered simultaneous modalities by including the "t" extension in 'hlang-recv' attributes in the appropriate media specifications in the answer. If the answering party is only interested in receiving one of the offered modalities, then the language tag should only be provided in the corresponding 'hlang-recv' attribute in the answer.
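A sketch of the answerer's choice when simultaneous modalities are offered is shown below. The function name and data layout are hypothetical. One detail is an assumption rather than something this section states: when only a single modality is accepted, the sketch strips the "t" extension from the answered tag, on the reasoning that Section 5 reserves the extension for simultaneous use.

```python
def answer_hlang_recv(offered_send: dict, wanted_media: set) -> dict:
    """Derive 'hlang-recv' values for the answer from offered 'hlang-send' values.

    offered_send maps a media type to the language tag offered in it.
    wanted_media is the set of media types the answerer wants to receive.
    """
    selected = {m: t for m, t in offered_send.items() if m in wanted_media}
    if len(selected) < 2:
        # Only one modality accepted: answer with the plain language tag
        # (assumption: the "t" extension is dropped in this case).
        selected = {m: t.partition("-t-")[0] for m, t in selected.items()}
    return selected


offer = {"audio": "en", "text": "en-t-en"}
print(answer_hlang_recv(offer, {"audio", "text"}))
print(answer_hlang_recv(offer, {"text"}))
```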

If an answering party prefers to receive simultaneous modalities of an original language content that was not offered in the 'hlang-send' attribute in the offer, then the answering party MAY nevertheless include the preferred language and modality with the "t" extension in the answer. The answering user may then observe the language exchange at the beginning of the session to assess whether the request for simultaneous modalities could be satisfied. For cases when a more formal indication of the satisfaction of the request is needed, the answering party SHOULD request an update of the session and include the request for reception of multiple simultaneous modalities in the 'hlang-recv' attributes.

The indications of multiple simultaneous modalities MAY be combined with other preference indications defined for the application of the 'hlang-' attributes.

5. Limitations

It is not possible to use the "t" extension to indicate an alternative language for selection in a different modality than the original language that is also included in a 'hlang-' attribute. Implementations SHOULD always interpret such indications as indications of simultaneous modalities. If interpretation as alternative languages to select from is desired, the "t" extension SHOULD be omitted.

6. Examples

A request for a written English subtitling to be received by the caller in the text stream created from a spoken English source in the audio stream. The caller also indicates a preference to speak English:

An acknowledgement of the request:

In the session, the caller will receive both spoken English and written English. The caller will send English speech.

An alternative response from a party that cannot satisfy the request, but only provide spoken English:
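The three exchanges described in this section can be sketched as SDP fragments built in Python. This is an illustration under stated assumptions, not the normative example text: the helper name, the port numbers and the payload types in the "m=" lines are hypothetical, and only the 'hlang-send' and 'hlang-recv' values follow from the descriptions in this section.

```python
def media_section(m_line, send=None, recv=None):
    """Return the SDP lines for one media description with optional hlang attributes."""
    lines = [m_line]
    if send:
        lines.append(f"a=hlang-send:{send}")
    if recv:
        lines.append(f"a=hlang-recv:{recv}")
    return lines


# Hypothetical media lines; ports and payload types are placeholders.
AUDIO = "m=audio 49170 RTP/AVP 0"
TEXT = "m=text 45020 RTP/AVP 103"

# Offer: the caller speaks English, and wants spoken English plus written
# English transformed from the spoken English source.
offer = media_section(AUDIO, send="en", recv="en") + media_section(TEXT, recv="en-t-en")

# Answer acknowledging the request.
acknowledgement = media_section(AUDIO, send="en", recv="en") + media_section(TEXT, send="en-t-en")

# Answer from a party that can only provide spoken English.
spoken_only = media_section(AUDIO, send="en", recv="en") + media_section(TEXT)

for name, sdp in [("offer", offer), ("acknowledgement", acknowledgement), ("spoken only", spoken_only)]:
    print(f"--- {name} ---")
    print("\n".join(sdp))
```

In the last fragment the text media description carries no 'hlang-' attribute at all, which is one way of expressing that no "t" extension is indicated for the original language.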

7. Acknowledgements

8. IANA Considerations

No IANA considerations. This specification reuses already registered entities.

9. Security Considerations

Some users may regard their language and modality preference details to be sensitive and requiring privacy and security measures. This fact should be considered when implementing the mechanism specified in this document. The security considerations are common with [I-D.ietf-slim-negotiating-human-language].

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646, September 2009.
[RFC6497] Davis, M., Phillips, A., Umaoka, Y. and C. Falk, "BCP 47 Extension T - Transformed Content", RFC 6497, DOI 10.17487/RFC6497, February 2012.

10.2. Informative References

[I-D.ietf-slim-negotiating-human-language] Gellens, R., "Negotiating Human Language in Real-Time Communications", Internet-Draft draft-ietf-slim-negotiating-human-language-10, May 2017.

Author's Address

Gunnar Hellstrom
Omnitor
Hammarby Fabriksvag 23
Stockholm, 120 30
Sweden

Phone: +46 708 204 288
EMail: