RTP Payload Format for the TETRA Audio Codec

This document specifies the payload format for packetization of TErrestrial Trunked Radio (TETRA) encoded speech signals into the Real-time Transport Protocol (RTP). The payload format supports transmission of multiple channels, multiple frames per payload, robustness against packet loss, and interoperation with existing TETRA transport formats on non-IP networks, as described in Section . The payload format itself is specified in Section .

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in when they appear in ALL CAPS. These words may also appear in this document in lower case as plain English words, absent their normative meanings.

The following acronyms are used in this document:

   ETSI:   European Telecommunications Standards Institute
   TETRA:  TErrestrial Trunked Radio

The byte order used in this document is network byte order, i.e., the most significant byte first. The bit order is also the most significant bit first. This is presented in all figures as having the most significant bit leftmost on a line and with the lowest number. Some bit fields may wrap over multiple lines, in which case the bits on the first line are more significant than the bits on the next line.

Best current practices for writing an RTP payload format specification were followed updated with .

The TETRA codec is used as the vocoder in TETRA systems. It compresses 30 ms of speech into 137 bits. On the air interface, two of these 30 ms frames are transported together (sub-block 1 and sub-block 2). The codec allows the data of the first 30 ms voice frame to be stolen and used for other purposes, e.g., for the exchange of dynamically updated key material in end-to-end encrypted voice sessions.

Codec payload serialisation within the traditional circuit-mode TETRA system is specified for TDM lines with 2048 kbit/s. For this purpose two optional formats are defined: the first is called FSTE (First Speech Transport Encoding Format), the other is called OSTE (Optimized Speech Transport Encoding Format). The two formats differ mainly in that the OSTE format transports an additional 5-bit frame number. This frame number provides timing information from the air interface to the receiving side, which avoids the buffering otherwise needed because of the different transport speeds on the air interface and in 64 kbit/s circuit-switched networks. The RTP payload format is defined such that the value of this frame number can be transported.

The RTP payload format is designed so that it can carry the information needed to map the FSTE and OSTE formats from . The two independent sub-blocks can be transferred separately or together within one RTP packet. Both contain the same control bits, so this information is propagated redundantly. The redundancy serves two purposes: it simplifies the encoding process in the direction from E1 to RTP, and it allows a packet size of either 30 ms or 60 ms. The redundant information SHALL be identical in both sub-blocks; otherwise the behavior of the receiver is unspecified. The payload format is chosen such that the TETRA data bits are octet aligned.

The format of the RTP header is specified in . The use of the fields of the RTP header by the TETRA payload format is consistent with that specification. The payload length of TETRA is an integer number of octets; therefore, no padding is necessary. The timestamp, sequence number, and marker bit (M) of the RTP header are used in accordance with Section 4.1 of . The RTP payload type for TETRA is to be assigned dynamically.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|I|F|  CTRL   |C|FRAME_NR |  R  |D(1)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                           D(137)|      S      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

I:
   1: The following frame contains a first block of two sub-blocks.
   0: The following frame contains a separated sub-block. A sub-block marked as such can be either a second sub-block or an independent block that has no relation to any first block. To distinguish between the two cases, the control bits have to be evaluated.

F:
   Value   Frame contains
   0       FSTE encoded data
   1       OSTE encoded data

CTRL:
   Ctrl 1..3 according to table 2 of .

   Value   Sub-block 1   Sub-block 2
   000     normal        normal
   001     C stolen      normal
   010     U stolen      normal
   011     C stolen      C stolen
   100     C stolen      U stolen
   101     U stolen      C stolen
   110     U stolen      U stolen
   111     O&M ISI block

   Ctrl 4..5 according to table 3 of .

   Value   Sub-block 1          Sub-block 2
   00      BFI no errors        BFI no errors
   01      BFI no errors        BFI with error(s)
   10      BFI with error(s)    BFI no errors
   11      BFI with error(s)    BFI with error(s)

   NOTE: The meaning of C4 and C5 is outside the scope of the present
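As an illustration of how a receiver might evaluate the control bits, the two tables above can be expressed as lookup tables. This is a minimal sketch and not part of the payload format: the function name decode_ctrl and the keys of the returned dictionary are hypothetical. It assumes that Ctrl 1 is the most significant bit of the 5-bit CTRL field and that the value 111 (O&M ISI block) applies to both sub-blocks.

```python
# Stealing status per sub-block, indexed by Ctrl 1..3 (transcribed from the
# first table above). Assumption: "O&M ISI block" covers both sub-blocks.
STEALING = {
    0b000: ("normal", "normal"),
    0b001: ("C stolen", "normal"),
    0b010: ("U stolen", "normal"),
    0b011: ("C stolen", "C stolen"),
    0b100: ("C stolen", "U stolen"),
    0b101: ("U stolen", "C stolen"),
    0b110: ("U stolen", "U stolen"),
    0b111: ("O&M ISI block", "O&M ISI block"),
}


def decode_ctrl(ctrl: int) -> dict:
    """Split a 5-bit CTRL value into stealing status and BFI flags.

    Assumption: Ctrl 1 is the most significant bit of the field, so
    Ctrl 1..3 are the upper three bits and Ctrl 4..5 the lower two.
    """
    if not 0 <= ctrl <= 0b11111:
        raise ValueError("CTRL is a 5-bit field")
    sub1, sub2 = STEALING[ctrl >> 2]
    bfi = ctrl & 0b11
    return {
        "sub_block_1": sub1,
        "sub_block_2": sub2,
        # True means "BFI with error(s)" in the second table above.
        "bfi_error_1": bool(bfi & 0b10),
        "bfi_error_2": bool(bfi & 0b01),
    }
```

For example, decode_ctrl(0b00101) reports a C-stolen first sub-block, a normal second sub-block, and a bad frame indication with errors for the second sub-block only.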

C:
   This bit may be set to "1" if a decryption operation could not be performed successfully for the specific half-block. This concerns audio that is encrypted along the circuit-switched mobile network and decrypted at the RTP sender forwarding it. Consequently, the encryption status of the half-block audio data is unknown. An RTP receiver implementation has to take the C bit into account when forwarding such TETRA audio data (either directly to a decoder or via TETRA infrastructure to a TETRA mobile unit): the contained audio might be scrambled, depending on whether it was originally generated as a plain-override half-block or as an encrypted half-block.

FRAME_NR:
   These bits contain an uplink frame number as defined in table 8 of . If no frame number is available, the FRAME_NR value SHALL be set to 00000.

R:
   The Audio Signal Relevance bits contain information about the relevance of the voice packet contained here.

   R 1:
      0: no audio signal relevance propagated (R2 and R3 do not contain any valid information)
      1: audio signal relevance propagated in R2 and R3

   R 2..3 according to table 1 of .

   Value   Relevance
   00      no audio signal relevance (level <= -72 dBm0)
   01      low audio signal relevance (-52 dBm0 >= level > -72 dBm0)
   10      medium audio signal relevance (-32 dBm0 >= level > -52 dBm0)
   11      high audio signal relevance (0 dBm0 >= level > -32 dBm0)
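The mapping from a measured level to the three R bits can be sketched as follows. This is an illustrative sketch under stated assumptions, not a normative encoder: the function name encode_relevance is hypothetical, R 1 is assumed to be the most significant bit of the field, and each range is read as having an inclusive upper bound and an exclusive lower bound. Levels above 0 dBm0 are not covered by the table and are rejected here.

```python
from typing import Optional


def encode_relevance(level_dbm0: Optional[float]) -> int:
    """Return the 3-bit R field (R1 R2 R3) for a measured level in dBm0.

    Assumptions: R1 is the most significant bit; None means that no
    relevance information is available, so R1 = 0 and R2, R3 carry nothing.
    """
    if level_dbm0 is None:
        return 0b000
    if level_dbm0 > 0:
        raise ValueError("levels above 0 dBm0 are not covered by the table")
    if level_dbm0 <= -72:
        r23 = 0b00  # no audio signal relevance
    elif level_dbm0 <= -52:
        r23 = 0b01  # low
    elif level_dbm0 <= -32:
        r23 = 0b10  # medium
    else:
        r23 = 0b11  # high
    return 0b100 | r23  # R1 = 1: relevance propagated in R2 and R3
```

A level of -60 dBm0, for instance, falls into the "low" range and yields the bit pattern 101.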

S:
   These bits are reserved for future use and are currently set to "0".

D:
   Reference contains the definition for the generation of the codec data. Data bits D1..D137 in chapter 8 correspond to the "Bit number in speech frame" row of table 4 of . The payload itself contains TETRA ACELP coded speech information encoded according to table 4 of .
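To make the field layout concrete, the following sketch extracts all fields from one sub-block. It is an illustration only: the names SubBlock and unpack_subblock are hypothetical. The field widths are read from the figure (I: 1, F: 1, CTRL: 5, C: 1, FRAME_NR: 5, R: 3, D: 137 bits); the 7-bit width of S is an assumption derived from the figure showing five 32-bit words (20 octets) per sub-block.

```python
from typing import List, NamedTuple

SUBBLOCK_OCTETS = 20  # assumption: 5 x 32-bit words, as drawn in the figure
SUBBLOCK_BITS = SUBBLOCK_OCTETS * 8


class SubBlock(NamedTuple):
    i: int
    f: int
    ctrl: int
    c: int
    frame_nr: int
    r: int
    data_bits: List[int]  # data_bits[0] is D(1), data_bits[136] is D(137)
    s: int


def unpack_subblock(payload: bytes) -> SubBlock:
    """Split one 20-octet sub-block into its fields (MSB-first bit order)."""
    if len(payload) != SUBBLOCK_OCTETS:
        raise ValueError("a sub-block is expected to be 20 octets long")
    value = int.from_bytes(payload, "big")

    def field(offset: int, width: int) -> int:
        # 'offset' counts bits from the left, like the numbering in the figure.
        shift = SUBBLOCK_BITS - offset - width
        return (value >> shift) & ((1 << width) - 1)

    data = field(16, 137)
    bits = [(data >> (136 - n)) & 1 for n in range(137)]
    return SubBlock(
        i=field(0, 1),
        f=field(1, 1),
        ctrl=field(2, 5),
        c=field(7, 1),
        frame_nr=field(8, 5),
        r=field(13, 3),
        data_bits=bits,
        s=field(153, 7),
    )
```

Treating the whole sub-block as one big-endian integer keeps the bit numbering identical to the figure, at the cost of some speed compared with byte-wise masking.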

The following example shows how a first and a consecutive 30 ms frame are combined into a single 60 ms RTP packet. Note: This example shows the use of OSTE mapping.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|  CTRL   |C|0|0|0|0|0|0|0|0|D(1)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                           D(137)|      S      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|1|  CTRL   |C|0|0|0|0|0|0|0|0|D(1)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                           D(137)|      S      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Both halves contain exactly the same CTRL bits.
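The construction of such a 60 ms payload can be sketched as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names pack_subblock and pack_60ms are hypothetical, each sub-block is assumed to occupy 20 octets with 7 trailing reserved bits set to zero, and the C bit is passed per half because it is described per half-block. The same CTRL value is written into both halves, as required for the redundant control bits.

```python
from typing import Sequence, Tuple


def pack_subblock(i: int, f: int, ctrl: int, c: int, frame_nr: int,
                  r: int, data_bits: Sequence[int]) -> bytes:
    """Serialise one sub-block, most significant bit first."""
    if len(data_bits) != 137:
        raise ValueError("a TETRA speech frame has 137 data bits")
    value = 0
    # Header fields in the order drawn in the figure, with their widths.
    for field, width in ((i, 1), (f, 1), (ctrl, 5), (c, 1),
                         (frame_nr, 5), (r, 3)):
        if not 0 <= field < (1 << width):
            raise ValueError("header field out of range")
        value = (value << width) | field
    for bit in data_bits:  # D(1) first
        value = (value << 1) | (bit & 1)
    value <<= 7  # assumption: 7 reserved S bits, all zero
    return value.to_bytes(20, "big")


def pack_60ms(ctrl: int, first_bits: Sequence[int],
              second_bits: Sequence[int], f: int = 1, frame_nr: int = 0,
              r: int = 0, c: Tuple[int, int] = (0, 0)) -> bytes:
    """Concatenate a first sub-block (I=1) and its second sub-block (I=0)."""
    first = pack_subblock(1, f, ctrl, c[0], frame_nr, r, first_bits)
    second = pack_subblock(0, f, ctrl, c[1], frame_nr, r, second_bits)
    return first + second
```

With the defaults (F = 1, FRAME_NR and R all zero) the result matches the layout of the example above: 40 octets, the first starting with the bits 1 1 and the second half starting with 0 1.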

TETRA uses a fixed bit rate that cannot be adjusted. Since UDP does not provide congestion control, applications that use RTP over UDP SHOULD implement their own congestion control above the UDP layer (RFC 8085) and MAY also implement a transport circuit breaker (RFC 8083). Work in the RMCAT working group describes the interactions and conceptual interfaces necessary between the application components that relate to congestion control, including the RTP layer, the higher-level media codec control layer, and the lower-level transport interface, as well as components dedicated to congestion control functions.

Congestion control for RTP SHALL be used in accordance with RFC 3550 and with any applicable RTP profile, e.g., RFC 3551. An additional requirement applies if best-effort service is being used: users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within acceptable parameters.
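One simple way to monitor packet loss is to compare the number of packets received with the number expected from the RTP sequence numbers. The sketch below is an illustration only: the function name loss_fraction is hypothetical, it assumes that the first packet in the list is the earliest of the interval and that the interval spans fewer than 65536 packets, and it does not define what loss rate counts as acceptable.

```python
from typing import Sequence


def loss_fraction(received_seqs: Sequence[int]) -> float:
    """Estimate the fraction of lost packets in one observation interval.

    'received_seqs' holds the 16-bit RTP sequence numbers of the packets
    that arrived. Duplicates are counted once; wrap-around from 65535 to 0
    is handled by measuring distances modulo 2**16 from the first packet.
    """
    if not received_seqs:
        raise ValueError("no packets received in this interval")
    base = received_seqs[0]
    offsets = {(seq - base) % 65536 for seq in received_seqs}
    expected = max(offsets) + 1
    return (expected - len(offsets)) / expected
```

For the sequence numbers 65534, 65535, 1, 2 the packet with sequence number 0 is missing, so one of five expected packets was lost.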

This RTP payload format is identified using one media subtype (audio/TETRA), which is registered in accordance with RFC 4855 and per the media type registration template from RFC 6838.

The media type for the TETRA codec is expected to be allocated from the IETF tree once this draft turns into an RFC. This media type registration covers both real-time transfer via RTP and non-real-time transfers via stored files.

Type name: audio

Subtype name: TETRA

Required parameters: none

Optional parameters: These parameters apply to RTP transfer only.

   maxptime: The maximum amount of media which can be encapsulated in a payload packet, expressed as time in milliseconds. The time is calculated as the sum of the time that the media present in the packet represents. The time SHOULD be an integer multiple of the frame size. If this parameter is not present, the sender MAY encapsulate any number of speech frames into one RTP packet.

   ptime: see RFC 4566.

See Section of RFC XXXX. [RFC Editor: Upon publication as an RFC, please replace "XXXX" with the number assigned to this document and remove this note.]

Interoperability considerations:

Published specification:

Applications that use this media type: This media type is used in applications needing transport or storage of encoded voice. Some examples include Voice over IP, streaming media, voice messaging, and voice recording on recording systems.

Intended usage: COMMON

The information carried in the media type specification has a specific mapping to fields in the Session Description Protocol (SDP), which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the TETRA codec, the mapping is as follows:

*  The media type ("audio") goes in SDP "m=" as the media name.

*  The media subtype (payload format name) goes in SDP "a=rtpmap" as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 8000.

*  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively.

*  Any remaining parameters go in the SDP "a=fmtp" attribute by copying them directly from the media type parameter string as a semicolon-separated list of parameter=value pairs.

The following is an example SDP media description using TETRA:

m=audio 49120 RTP/AVP 99
a=rtpmap:99 TETRA/8000
a=maxptime:60
a=ptime:60

The following considerations apply when using SDP offer/answer procedures to negotiate the use of the TETRA payload in RTP:

*  In most cases, the parameters "maxptime" and "ptime" will not affect interoperability; however, their setting can affect the performance of the application. The SDP offer/answer handling of the "ptime" and "maxptime" parameters is described in RFC 3264.

*  Integer multiples of 30 ms SHALL be used for ptime. A packet size of 60 ms is recommended.

*  The ptime and maxptime parameters need not be negotiated symmetrically, although there is no good reason not to do so.

*  Any unknown parameter in an offer SHALL be removed in the answer.
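The relationship between ptime, the number of frames per packet, and the RTP timestamp can be sketched as follows. This is an illustrative sketch: the function name packet_parameters and the dictionary keys are hypothetical. The 20 octets per 30 ms frame are an assumption derived from the payload figure (five 32-bit words per sub-block), and the timestamp increment follows from the clock rate of 8000 required in "a=rtpmap".

```python
FRAME_MS = 30          # duration of one TETRA speech frame
CLOCK_RATE = 8000      # RTP clock rate required in "a=rtpmap"
SUBBLOCK_OCTETS = 20   # assumption: octets per sub-block, from the figure


def packet_parameters(ptime_ms: int) -> dict:
    """Derive per-packet values from a negotiated ptime in milliseconds."""
    if ptime_ms <= 0 or ptime_ms % FRAME_MS != 0:
        raise ValueError("ptime must be a positive integer multiple of 30 ms")
    frames = ptime_ms // FRAME_MS
    return {
        "frames": frames,
        "payload_octets": frames * SUBBLOCK_OCTETS,
        # Number of clock ticks covered by one packet.
        "timestamp_increment": CLOCK_RATE * ptime_ms // 1000,
    }
```

For the recommended packet size of 60 ms this gives two frames, 40 payload octets, and a timestamp increment of 480 per packet.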

For declarative media, the "ptime" and "maxptime" parameters specify the possible variants used by the sender.

This memo requests that IANA register [audio/TETRA] from section . The media type is also requested to be added to the IANA registry for "RTP Payload Format MIME types" ().

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification , and in any applicable RTP profile. The main security considerations for the RTP packet carrying the RTP payload format defined within this memo are confidentiality, integrity, and source authenticity. Confidentiality is achieved by encryption of the RTP payload. Integrity of the RTP packets is achieved through a suitable cryptographic integrity protection mechanism. Cryptographic systems may also allow the authentication of the source of the payload. A suitable security mechanism for this RTP payload format should provide confidentiality, integrity protection, and at least source authentication capable of determining whether or not an RTP packet is from a member of the RTP session.

Note that the appropriate mechanism to provide security to RTP and payloads following this memo may vary. It depends on the application, the transport, and the signaling protocol employed. Therefore a single mechanism is not sufficient, although the usage of SRTP is recommended where suitable. Other mechanisms that may be used are IPsec and TLS (RTP over TCP), but other alternatives may also exist.