Network Working Group                                           E. Omara
Internet-Draft                                                     Apple
Intended status: Standards Track                               J. Uberti
Expires: 11 January 2024                                          Google
                                                              S. Murillo
                                                          CoSMo Software
                                                       R. L. Barnes, Ed.
                                                                   Cisco
                                                               Y. Fablet
                                                                   Apple
                                                            10 July 2023


                          Secure Frame (SFrame)
                       draft-ietf-sframe-enc-02

Abstract

   This document describes the Secure Frame (SFrame) end-to-end encryption and authentication mechanism for media frames in a multiparty conference call, in which central media servers (selective forwarding units or SFUs) can access the media metadata needed to make forwarding decisions without having access to the actual media.

   The proposed mechanism differs from the Secure Real-time Transport Protocol (SRTP) in that it is independent of RTP (thus compatible with non-RTP media transport) and can be applied to whole media frames in order to be more bandwidth efficient.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at https://sframe-wg.github.io/sframe/draft-ietf-sframe-enc.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-sframe-enc/.

   Discussion of this document takes place on the Secure Media Frames Working Group mailing list (mailto:sframe@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/sframe/. Subscribe at https://www.ietf.org/mailman/listinfo/sframe/.

   Source for this draft and an issue tracker can be found at https://github.com/sframe-wg/sframe.

Omara, et al.            Expires 11 January 2024                [Page 1]

Internet-Draft                   SFrame                        July 2023

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Goals
   4.  SFrame
     4.1.  Application Context
     4.2.  SFrame Ciphertext
     4.3.  SFrame Header
     4.4.  Encryption Schema
       4.4.1.  Key Selection
       4.4.2.  Key Derivation
       4.4.3.  Encryption
       4.4.4.  Decryption
     4.5.  Cipher Suites
       4.5.1.  AES-CTR with SHA2
   5.  Key Management
     5.1.  Sender Keys
     5.2.  MLS
   6.  Media Considerations
     6.1.  Selective Forwarding Units
       6.1.1.  LastN and RTP stream reuse
       6.1.2.  Simulcast
       6.1.3.  SVC
     6.2.  Video Key Frames
     6.3.  Partial Decoding
   7.  Security Considerations
     7.1.  No Per-Sender Authentication
     7.2.  Key Management
     7.3.  Authentication tag length
     7.4.  Replay
   8.  IANA Considerations
     8.1.  SFrame Cipher Suites
   9.  Application Responsibilities
     9.1.  Header Value Uniqueness
     9.2.  Key Management Framework
     9.3.  Anti-Replay
     9.4.  Metadata
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Acknowledgements
   Appendix B.  Example API
   Appendix C.  Overhead Analysis
     C.1.  Assumptions
     C.2.  Audio
     C.3.  Video
     C.4.  Conferences
     C.5.  SFrame over RTP
   Appendix D.  Test Vectors
     D.1.  AES_CTR_128_HMAC_SHA256_4
     D.2.  AES_CTR_128_HMAC_SHA256_8
     D.3.  AES_GCM_128_SHA256
     D.4.  AES_GCM_256_SHA512
   Authors' Addresses

1.  Introduction

   Modern multi-party video call systems use Selective Forwarding Unit (SFU) servers to efficiently route media streams to call endpoints based on factors such as available bandwidth, desired video size, and codec support. An SFU typically does not need access to the media content of the conference, which allows the media to be "end-to-end" encrypted so that the SFU cannot decrypt it. In order for the SFU to work properly, though, it usually needs to be able to access RTP metadata and RTCP feedback messages, which is not possible if all RTP/RTCP traffic is end-to-end encrypted.

   As such, two layers of encryption and authentication are required:

   1.  Hop-by-hop (HBH) encryption of media, metadata, and feedback messages between the endpoints and the SFU

   2.  End-to-end (E2E) encryption of media between the endpoints

   The Secure Real-time Transport Protocol (SRTP) is already widely used for HBH encryption [RFC3711]. The SRTP "double encryption" scheme defines a way to do E2E encryption in SRTP [RFC8723]. Unfortunately, this scheme has poor efficiency and high complexity, and its entanglement with RTP makes it unworkable in several realistic SFU scenarios.

   This document proposes a new end-to-end encryption mechanism known as SFrame, specifically designed to work in group conference calls with SFUs.
   SFrame is a general encryption framing that can be used to protect media payloads, agnostic of transport.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

   IV:  Initialization Vector

   MAC:  Message Authentication Code

   E2EE:  End-to-End Encryption

   HBH:  Hop-by-Hop

   We use "Selective Forwarding Unit (SFU)" and "media stream" in a less formal sense than in [RFC7656]. An SFU is a selective switching function for media payloads, and a media stream is a sequence of media payloads, in both cases regardless of whether those media payloads are transported over RTP or some other protocol.

3.  Goals

   SFrame is designed to be a suitable E2EE protection scheme for conference call media in a broad range of scenarios, as outlined by the following goals:

   1.  Provide a secure E2EE mechanism for audio and video in conference calls that can be used with arbitrary SFU servers.

   2.  Decouple media encryption from key management to allow SFrame to be used with an arbitrary key management system.

   3.  Minimize packet expansion to allow successful conferencing in as many network conditions as possible.

   4.  Be independent of the underlying transport, including use in non-RTP transports, e.g., WebTransport [I-D.ietf-webtrans-overview].

   5.  When used with RTP and its associated error resilience mechanisms, i.e., RTX and FEC, require no special handling for RTX and FEC packets.

   6.  Minimize the changes needed in SFU servers.

   7.  Minimize the changes needed in endpoints.

   8.  Work with the most popular audio and video codecs used in conferencing scenarios.

4.  SFrame

   This document defines an encryption mechanism that provides effective end-to-end encryption, is simple to implement, has no dependencies on RTP, and minimizes encryption bandwidth overhead. Because SFrame can encrypt a full frame, rather than individual packets, bandwidth overhead can be reduced by adding encryption overhead only once per media frame, instead of once per packet.

4.1.  Application Context

   SFrame is a general encryption framing, intended to be used as an E2E encryption layer over an underlying HBH-encrypted transport such as SRTP or QUIC [RFC3711][I-D.ietf-moq-transport].

   The scale at which SFrame encryption is applied to media determines the overall amount of overhead that SFrame adds to the media stream, as well as the engineering complexity involved in integrating SFrame into a particular environment. Two patterns are common: using SFrame to encrypt either whole media frames (per-frame) or individual transport-level media payloads (per-packet).

   For example, Figure 1 shows a typical media sender stack that takes media in from some source, encodes it into frames, divides those frames into media packets, and then sends those payloads in SRTP packets. The receiver stack performs the reverse operations, reassembling frames from SRTP packets and decoding them. Arrows indicate two different ways that SFrame protection could be integrated into this media stack, to encrypt whole frames or individual media packets.

   Applying SFrame per-frame in this system offers higher efficiency, but may require a more complex integration in environments where depacketization relies on the content of media packets. Applying SFrame per-packet avoids this complexity, at the cost of higher bandwidth consumption. Some quantitative discussion of these trade-offs is provided in Appendix C.

   As noted above, however, SFrame is a general media encapsulation, and can be applied in other scenarios.
   The important thing is that the sender and receivers of an SFrame-encrypted object agree on that object's semantics. SFrame does not provide this agreement; it must be arranged by the application.

   [Figure 1: A typical media stack. Alice's sender stack runs Encode -> Packetize -> HBH Protect and sends to a Media Server, which forwards to Bob's receiver stack: HBH Unprotect -> Depacketize -> Decode. SFrame Protect/Unprotect can be inserted either per-frame (between Encode and Packetize, and between Depacketize and Decode) or per-packet (between Packetize and HBH Protect, and between HBH Unprotect and Depacketize). The SFrame steps are keyed by E2E Key Management, and the HBH steps by HBH Key Management.]

                                 Figure 1

   Like SRTP, SFrame does not define how the keys used for SFrame are exchanged by the parties in the conference. Keys for SFrame might be distributed over an existing E2E-secure channel (see Section 5.1), or derived from an E2E-secure shared secret (see Section 5.2). The key management system MUST ensure that each key used for encrypting media is used by exactly one media sender, in order to avoid reuse of IVs.

4.2.  SFrame Ciphertext

   An SFrame ciphertext comprises an SFrame header followed by the output of an AEAD encryption of the plaintext [RFC5116], with the header provided as additional authenticated data (AAD). The SFrame header is a variable-length structure described in detail in Section 4.3. The structure of the encrypted data and authentication tag is determined by the AEAD algorithm in use.

     +-+---+-+----+--------------------+--------------------+<-+
     |R|LEN|X|KLEN|       Key ID       |      Counter       |  |
  +->+-+---+-+----+--------------------+--------------------+  |
  |  |                                                      |  |
  |  |                                                      |  |
  |  |                   Encrypted Data                     |  |
  |  |                                                      |  |
  |  |                                                      |  |
  +->+------------------------------------------------------+<-+
  |  |                 Authentication Tag                   |  |
  |  +------------------------------------------------------+  |
  |                                                            |
  +--- Encrypted Portion              Authenticated Portion ---+

   When SFrame is applied per-packet, the payload of each packet will be an SFrame ciphertext. When SFrame is applied per-frame, the SFrame ciphertext representing an encrypted frame will span several packets, with the header appearing in the first packet and the authentication tag in the last packet.

4.3.  SFrame Header

   The SFrame header specifies two values from which encryption parameters are derived:

   *  A Key ID (KID) that determines which encryption key should be used

   *  A counter (CTR) that is used to construct the IV for the encryption

   Applications MUST ensure that each (KID, CTR) combination is used for exactly one encryption operation. A typical approach to achieving this guarantee is outlined in Section 9.1.

   Both the counter and the key ID are encoded as integers in network (big-endian) byte order, in a variable-length format to decrease the overhead. The length of each field is between 1 and 8 bytes and is represented in 3 bits in the SFrame header: 000 represents a length of 1, 001 a length of 2, etc.
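   As an illustration only, the following Python sketch computes the variable-length encoding described above for a KID or CTR value. It returns both the big-endian bytes and the 3-bit code (byte length minus one) that the header carries. It assumes that encoders always choose the shortest encoding, which this section motivates but does not mandate. The helper names are hypothetical.

```python
from __future__ import annotations


def encoded_len(value: int) -> int:
    """Number of bytes (1-8) in the shortest big-endian encoding of value."""
    if not 0 <= value < 2**64:
        raise ValueError("KID and CTR values must fit in 64 bits")
    return max(1, (value.bit_length() + 7) // 8)


def encode_field(value: int) -> tuple[int, bytes]:
    """Return (3-bit length code, big-endian bytes) for a KID or CTR value."""
    n = encoded_len(value)
    # The header stores the byte length minus one: 0b000 means 1 byte.
    return n - 1, value.to_bytes(n, "big")


# A counter value of 0x1234 needs two bytes, so its length code is 0b001.
code, raw = encode_field(0x1234)
print(code, raw.hex())  # 1 1234
```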
   The first byte in the SFrame header has a fixed format and contains the header metadata:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |R| LEN |X|  K  |
   +-+-+-+-+-+-+-+-+

                    Figure 2: SFrame header metadata

   Reserved (R, 1 bit):  This field MUST be set to zero on sending, and MUST be ignored by receivers.

   Counter Length (LEN, 3 bits):  This field indicates the length of the CTR field in bytes, minus one (so the CTR field is 1-8 bytes long).

   Extended Key Id Flag (X, 1 bit):  Indicates whether the K field contains the key ID itself (X = 0) or the length of the key ID field (X = 1).

   Key or Key Length (K, 3 bits):  This field contains the key ID (KID) if the X flag is set to 0, or the key ID length (KLEN) if it is set to 1.

   If the X flag is 0, then the KID is in the range 0-7 and the counter (CTR) occupies the next LEN + 1 bytes:

    0 1 2 3 4 5 6 7
   +-+-----+-+-----+---------------------------------+
   |R| LEN |0| KID |     CTR... (LEN + 1 bytes)      |
   +-+-----+-+-----+---------------------------------+

                Figure 3: SFrame header with short KID

   If the X flag is 1, then KLEN is the length of the KID field in bytes, minus one (the range of possible lengths is thus 1-8). The KID is encoded in the KLEN + 1 bytes following the metadata byte, and the counter (CTR) is encoded in the next LEN + 1 bytes:

    0 1 2 3 4 5 6 7
   +-+-----+-+-----+---------------------------+---------------------------+
   |R| LEN |1|KLEN |  KID... (KLEN + 1 bytes)  |  CTR... (LEN + 1 bytes)   |
   +-+-----+-+-----+---------------------------+---------------------------+

4.4.  Encryption Schema

   SFrame encryption uses an AEAD encryption algorithm and hash function defined by the cipher suite in use (see Section 4.5). We will refer to the following aspects of the AEAD algorithm below:

   *  AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption functions for the AEAD.
      We follow the convention of RFC 5116 [RFC5116] and consider the authentication tag part of the ciphertext produced by AEAD.Encrypt (as opposed to a separate field as in SRTP [RFC3711]).

   *  AEAD.Nk - The size in bytes of a key for the encryption algorithm

   *  AEAD.Nn - The size in bytes of a nonce for the encryption algorithm

   *  AEAD.Nt - The overhead in bytes of the encryption algorithm (typically the size of a "tag" that is added to the plaintext)

4.4.1.  Key Selection

   Each SFrame encryption or decryption operation is premised on a single secret base_key, which is labeled with an integer KID value signaled in the SFrame header.

   The sender and receivers need to agree on which key should be used for a given KID. The process for provisioning keys and their KID values is beyond the scope of this specification, but its security properties will bound the assurances that SFrame provides. For example, if SFrame is used to provide E2E security against intermediary media nodes, then SFrame keys need to be negotiated in a way that does not make them accessible to these intermediaries.

   For each known KID value, the client stores the corresponding symmetric key base_key. For keys that can be used for encryption, the client also stores the next counter value CTR to be used when encrypting (initially 0).

   When encrypting a plaintext, the application specifies which KID is to be used, and the counter is incremented after successful encryption. When decrypting, the base_key for decryption is selected from the available keys using the KID value in the SFrame header.

   A given key MUST NOT be used for encryption by multiple senders. Such reuse would result in multiple encrypted frames being generated with the same (key, nonce) pair, which harms the protections provided by many AEAD algorithms. Implementations SHOULD mark each key as usable for encryption or decryption, never both.
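   The bookkeeping described in this section can be sketched as a small key store. This is a non-normative Python illustration. The class and method names (KeyStore, KeyUsage, next_counter, commit_counter, decryption_key) are hypothetical. The counter is read and advanced in two separate calls, which reflects the rule that it is incremented only after a successful encryption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class KeyUsage(Enum):
    ENCRYPT = "encrypt"
    DECRYPT = "decrypt"


@dataclass
class KeyEntry:
    base_key: bytes
    usage: KeyUsage
    next_ctr: int = 0  # next CTR to use; only meaningful for ENCRYPT keys


class KeyStore:
    """Hypothetical per-client store mapping KID -> base_key (illustrative)."""

    def __init__(self) -> None:
        self._keys: dict[int, KeyEntry] = {}

    def add_key(self, kid: int, base_key: bytes, usage: KeyUsage) -> None:
        # Each key is marked for exactly one usage, never both.
        self._keys[kid] = KeyEntry(base_key, usage)

    def next_counter(self, kid: int) -> int:
        """Return the CTR for the next encryption under kid without consuming it."""
        entry = self._keys[kid]
        if entry.usage is not KeyUsage.ENCRYPT:
            raise PermissionError(f"KID {kid} is not an encryption key")
        if entry.next_ctr >= 2**64:  # CTR must fit in the 8-byte header field
            raise OverflowError("counter space exhausted; provision a new key")
        return entry.next_ctr

    def commit_counter(self, kid: int) -> None:
        """Advance the counter after an encryption under kid has succeeded."""
        self._keys[kid].next_ctr += 1

    def decryption_key(self, kid: int) -> bytes | None:
        """Return the base_key for a received KID, or None if none is usable."""
        entry = self._keys.get(kid)
        if entry is None or entry.usage is not KeyUsage.DECRYPT:
            return None  # caller MAY buffer the ciphertext and retry later
        return entry.base_key
```

   A caller would read the CTR with next_counter(), encrypt, and call commit_counter() only on success. When decryption_key() returns None, the caller decides whether to buffer the ciphertext until a key for that KID arrives.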
   Note that the set of available keys might change over the lifetime of a real-time session. In such cases, the client will need to manage key usage to avoid media loss due to a key being used to encrypt before all receivers are able to use it to decrypt. For example, an application may make decryption-only keys available immediately, but delay the use of keys for encryption until (a) all receivers have acknowledged receipt of the new key or (b) a timeout expires.

4.4.2.  Key Derivation

   SFrame encryption and decryption use a key and salt derived from the base_key associated with a KID. Given a base_key value, the key and salt are derived using HKDF [RFC5869] as follows:

   def derive_key_salt(KID, base_key):
     sframe_secret = HKDF-Extract("SFrame 1.0 Secret " + KID, base_key)
     sframe_key = HKDF-Expand(sframe_secret, "key", AEAD.Nk)
     sframe_salt = HKDF-Expand(sframe_secret, "salt", AEAD.Nn)
     return sframe_key, sframe_salt

   In the derivation of sframe_secret, the + operator represents concatenation of octet strings, and the KID value is encoded as an 8-byte big-endian integer (not the compressed form used in the SFrame header).

   The hash function used for HKDF is determined by the cipher suite in use.

4.4.3.  Encryption

   SFrame encryption uses the AEAD encryption algorithm for the cipher suite in use. The key for the encryption is the sframe_key, and the nonce is formed by XORing the sframe_salt with the current counter, encoded as a big-endian integer of length AEAD.Nn.

   The encryptor forms an SFrame header using the CTR and KID values provided. The encoded header is provided as AAD to the AEAD encryption operation, together with application-provided metadata about the encrypted media (see Section 9.4).
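   The key derivation of Section 4.4.2 and the nonce construction just described can be sketched with Python's standard library (hmac, hashlib). This is a non-normative illustration, and the function names are hypothetical. HKDF is written out from RFC 5869. The defaults nk=16, nn=12, and SHA-256 correspond to the AES_128 suites in Table 1. The output has not been checked against the test vectors in Appendix D.

```python
from __future__ import annotations

import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    # RFC 5869: PRK = HMAC-Hash(salt, IKM)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha256") -> bytes:
    # RFC 5869: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), truncated to length
    okm, block, i = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([i]), hash_name).digest()
        okm += block
        i += 1
    return okm[:length]


def derive_key_salt(kid: int, base_key: bytes, nk: int = 16, nn: int = 12,
                    hash_name: str = "sha256") -> tuple[bytes, bytes]:
    # The KID is encoded as an 8-byte big-endian integer inside the label.
    label = b"SFrame 1.0 Secret " + kid.to_bytes(8, "big")
    sframe_secret = hkdf_extract(label, base_key, hash_name)
    sframe_key = hkdf_expand(sframe_secret, b"key", nk, hash_name)
    sframe_salt = hkdf_expand(sframe_secret, b"salt", nn, hash_name)
    return sframe_key, sframe_salt


def form_nonce(sframe_salt: bytes, ctr: int) -> bytes:
    # Encode CTR big-endian to AEAD.Nn bytes (the salt length), then XOR.
    ctr_bytes = ctr.to_bytes(len(sframe_salt), "big")
    return bytes(s ^ c for s, c in zip(sframe_salt, ctr_bytes))
```

   For AES_256_GCM_SHA512_128, a caller would pass nk=32 and hash_name="sha512", following Table 1.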
   def encrypt(CTR, KID, metadata, plaintext):
     sframe_key, sframe_salt = key_store[KID]
     ctr = encode_big_endian(CTR, AEAD.Nn)
     nonce = xor(sframe_salt, ctr)
     header = encode_sframe_header(CTR, KID)
     aad = header + metadata
     ciphertext = AEAD.Encrypt(sframe_key, nonce, aad, plaintext)
     return header + ciphertext

   For example, the metadata input to encryption allows frame metadata to be authenticated when SFrame is applied per-frame. After encoding the frame and before packetizing it, the necessary media metadata will be moved out of the encoded frame buffer, to be sent in some channel visible to the SFU (e.g., an RTP header extension).

   [Figure 4: The KID selects the sframe_key and sframe_salt. The sframe_salt is combined with the CTR to form the nonce. The SFrame header (carrying KID and CTR) and the metadata together form the AAD. AEAD.Encrypt takes the key, nonce, AAD, and plaintext, and its ciphertext output is appended to the SFrame header.]

                       Figure 4: Encryption flow

4.4.4.  Decryption

   Before decrypting, a client needs to assemble a full SFrame ciphertext. When an SFrame ciphertext may be fragmented into multiple parts for transport (e.g., a whole encrypted frame sent in multiple SRTP packets), the receiving client collects all the fragments of the ciphertext, using appropriate sequencing and start/end markers in the transport. Once all of the required fragments are available, the client reassembles them into the SFrame ciphertext, then passes the ciphertext to SFrame for decryption.

   The KID field in the SFrame header is used to find the right key and salt for the encrypted frame, and the CTR field is used to construct the nonce.
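   The header handling assumed by the pseudocode might look as follows in Python. This covers encode_sframe_header when encrypting and the header parsing inside parse_ciphertext when decrypting. The sketch assumes that KID values 0-7 always use the short form and that both fields use their shortest encoding. parse_sframe_header is a hypothetical name. It returns the header length so that the caller can split the header (used as AAD) from the AEAD ciphertext.

```python
from __future__ import annotations


def _shortest_len(value: int) -> int:
    # Shortest big-endian encoding in bytes (1-8); values must fit in 64 bits.
    if not 0 <= value < 2**64:
        raise ValueError("KID and CTR values must fit in 64 bits")
    return max(1, (value.bit_length() + 7) // 8)


def encode_sframe_header(ctr: int, kid: int) -> bytes:
    ctr_len = _shortest_len(ctr)
    kid_len = _shortest_len(kid)           # also validates the KID range
    first = (ctr_len - 1) << 4             # R = 0; LEN occupies bits 1-3
    if kid < 8:                            # X = 0; KID fits in the K bits
        return bytes([first | kid]) + ctr.to_bytes(ctr_len, "big")
    first |= 0x08 | (kid_len - 1)          # X = 1; KLEN in the K bits
    return bytes([first]) + kid.to_bytes(kid_len, "big") + ctr.to_bytes(ctr_len, "big")


def parse_sframe_header(data: bytes) -> tuple[int, int, int]:
    """Return (kid, ctr, header_length). The R bit is ignored, as required."""
    if not data:
        raise ValueError("empty SFrame ciphertext")
    first = data[0]
    ctr_len = ((first >> 4) & 0x07) + 1
    pos = 1
    if first & 0x08:                       # X = 1: extended KID follows
        kid_len = (first & 0x07) + 1
        kid = int.from_bytes(data[pos:pos + kid_len], "big")
        pos += kid_len
    else:                                  # X = 0: KID is the K bits
        kid = first & 0x07
    if len(data) < pos + ctr_len:
        raise ValueError("truncated SFrame header")
    ctr = int.from_bytes(data[pos:pos + ctr_len], "big")
    return kid, ctr, pos + ctr_len
```

   For example, encode_sframe_header(0x1234, 0x100) yields the bytes 19 01 00 12 34: LEN = 1 and KLEN = 1 (two bytes each), with X set.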
   def decrypt(metadata, sframe_ciphertext):
     header, KID, CTR, ciphertext = parse_ciphertext(sframe_ciphertext)
     sframe_key, sframe_salt = key_store[KID]
     ctr = encode_big_endian(CTR, AEAD.Nn)
     nonce = xor(sframe_salt, ctr)
     aad = header + metadata
     return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext)

   If a ciphertext fails to decrypt because there is no key available for the KID in the SFrame header, the client MAY buffer the ciphertext and retry decryption once a key with that KID is received.

4.5.  Cipher Suites

   Each SFrame session uses a single cipher suite that specifies the following primitives:

   *  A hash function used for key derivation

   *  An AEAD encryption algorithm [RFC5116] used for frame encryption, optionally with a truncated authentication tag

   This document defines the following cipher suites, with the constants defined in Section 4.4 and with Nh denoting the output size in bytes of the hash function:

   +============================+====+====+====+====+
   | Name                       | Nh | Nk | Nn | Nt |
   +============================+====+====+====+====+
   | AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 12 | 10 |
   +----------------------------+----+----+----+----+
   | AES_128_CTR_HMAC_SHA256_64 | 32 | 16 | 12 | 8  |
   +----------------------------+----+----+----+----+
   | AES_128_CTR_HMAC_SHA256_32 | 32 | 16 | 12 | 4  |
   +----------------------------+----+----+----+----+
   | AES_128_GCM_SHA256_128     | 32 | 16 | 12 | 16 |
   +----------------------------+----+----+----+----+
   | AES_256_GCM_SHA512_128     | 64 | 32 | 12 | 16 |
   +----------------------------+----+----+----+----+

                Table 1: SFrame cipher suite constants

   Numeric identifiers for these cipher suites are defined in the IANA registry created in Section 8.1.
   In the suite names, the length of the authentication tag is indicated by the last value: "_128" indicates a 128-bit tag, "_80" an 80-bit tag, "_64" a 64-bit tag, and "_32" a 32-bit tag.

   In a session that uses multiple media streams, different cipher suites might be configured for different media streams. For example, in order to conserve bandwidth, a session might use a cipher suite with 80-bit tags for video frames and another cipher suite with 32-bit tags for audio frames.

4.5.1.  AES-CTR with SHA2

   In order to allow very short tag sizes, we define a synthetic AEAD function using the counter mode of AES together with HMAC for authentication. We use an encrypt-then-MAC approach, as in SRTP [RFC3711].

   Before encryption or decryption, encryption and authentication subkeys are derived from the single AEAD key using HKDF. The subkeys are derived as follows, where Nk represents the key size for the AES block cipher in use, Nh represents the output size of the hash function, and Nt represents the size of a tag for the cipher in bytes (as in Table 1):

   def derive_subkeys(sframe_key):
     tag_len = encode_big_endian(Nt, 8)
     aead_label = "SFrame 1.0 AES CTR AEAD " + tag_len
     aead_secret = HKDF-Extract(aead_label, sframe_key)
     enc_key = HKDF-Expand(aead_secret, "enc", Nk)
     auth_key = HKDF-Expand(aead_secret, "auth", Nh)
     return enc_key, auth_key

   The AEAD encryption and decryption functions are then composed of individual calls to the CTR encrypt function and HMAC. The resulting MAC value is truncated to a number of bytes Nt fixed by the cipher suite.
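   The subkey derivation above and the tag computation that follows can be sketched with Python's standard library for the SHA-256 suites (Nk = 16, Nh = 32, from Table 1). This is a non-normative illustration with hypothetical function names, not a complete AEAD. AES-CTR itself is omitted because the Python standard library has no AES implementation.

```python
from __future__ import annotations

import hashlib
import hmac

NK, NH = 16, 32  # AES-128 key size and SHA-256 output size, per Table 1


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, block, i = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha256).digest()
        okm += block
        i += 1
    return okm[:length]


def derive_subkeys(sframe_key: bytes, nt: int) -> tuple[bytes, bytes]:
    # The tag length is bound into the label, so suites differing only in Nt
    # derive different subkeys from the same sframe_key.
    aead_label = b"SFrame 1.0 AES CTR AEAD " + nt.to_bytes(8, "big")
    aead_secret = hkdf_extract(aead_label, sframe_key)
    enc_key = hkdf_expand(aead_secret, b"enc", NK)
    auth_key = hkdf_expand(aead_secret, b"auth", NH)
    return enc_key, auth_key


def compute_tag(auth_key: bytes, nonce: bytes, aad: bytes, ct: bytes,
                nt: int) -> bytes:
    # Fixed-width length fields keep the encoding of (aad, ct) unambiguous.
    auth_data = (len(aad).to_bytes(8, "big") + len(ct).to_bytes(8, "big")
                 + nt.to_bytes(8, "big") + nonce + aad + ct)
    return hmac.new(auth_key, auth_data, hashlib.sha256).digest()[:nt]


def tags_equal(expected: bytes, received: bytes) -> bool:
    # Constant-time comparison, playing the role of constant_time_equal.
    return hmac.compare_digest(expected, received)
```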
   def compute_tag(auth_key, nonce, aad, ct):
     aad_len = encode_big_endian(len(aad), 8)
     ct_len = encode_big_endian(len(ct), 8)
     tag_len = encode_big_endian(Nt, 8)
     auth_data = aad_len + ct_len + tag_len + nonce + aad + ct
     tag = HMAC(auth_key, auth_data)
     return truncate(tag, Nt)

   def AEAD.Encrypt(key, nonce, aad, pt):
     enc_key, auth_key = derive_subkeys(key)
     ct = AES-CTR.Encrypt(enc_key, nonce, pt)
     tag = compute_tag(auth_key, nonce, aad, ct)
     return ct + tag

   def AEAD.Decrypt(key, nonce, aad, ct):
     inner_ct, tag = split_ct(ct, Nt)
     enc_key, auth_key = derive_subkeys(key)
     candidate_tag = compute_tag(auth_key, nonce, aad, inner_ct)
     if !constant_time_equal(tag, candidate_tag):
       raise Exception("Authentication Failure")
     return AES-CTR.Decrypt(enc_key, nonce, inner_ct)

5.  Key Management

   SFrame must be integrated with an E2E key management framework to exchange and rotate the keys used for SFrame encryption. The key management framework provides the following functions:

   *  Provisioning KID / base_key mappings to participating clients

   *  Updating the above data as clients join or leave

   It is the responsibility of the application to provide the key management framework, as described in Section 9.2.

5.1.  Sender Keys

   If the participants in a call have a pre-existing E2E-secure channel, they can use it to distribute SFrame keys. Each client participating in a call generates a fresh encryption key. The client then uses the E2E-secure channel to send its encryption key to the other participants.

   In this scheme, it is assumed that receivers have a signal outside of SFrame indicating which client sent a given frame (e.g., an RTP SSRC). SFrame KID values are then used to distinguish between versions of the sender's key.

   Key IDs in this scheme have two parts, a "key generation" and a "ratchet step". Both are unsigned integers that begin at zero.
   The key generation increments each time the sender distributes a new key to receivers. The "ratchet step" is incremented each time the sender ratchets its key forward for forward secrecy:

   sender_base_key[i+1] = HKDF-Expand(
                            HKDF-Extract("SFrame 1.0 Ratchet",
                                         sender_base_key[i]),
                            "", CipherSuite.Nh)

   For compactness, we do not send the whole ratchet step. Instead, we send only its low-order R bits, where R is a value set by the application. Different senders may use different values of R, but each receiver of a given sender needs to know what value of R is used by the sender so that it can recognize when it needs to ratchet (vs. expecting a new key). R effectively defines a reordering window, since no more than 2^R ratchet steps can be active at a given time. The key generation is sent in the remaining 64 - R bits of the key ID.

   KID = (key_generation << R) + (ratchet_step % (1 << R))

       64-R bits          R bits
   <---------------> <------------>
   +-----------------+--------------+
   | Key Generation  | Ratchet Step |
   +-----------------+--------------+

        Figure 5: Structure of a KID in the Sender Keys scheme

   The sender signals such a ratchet step update by sending frames with a KID value in which the ratchet step has been incremented. A receiver that receives a KID with an incremented ratchet step from a sender computes the new key as above. The old key may be kept for some time to allow for out-of-order delivery, but should be deleted promptly.

   If a new participant joins mid-call, they will need to receive from each sender (a) the current sender key for that sender and (b) the current KID value for the sender. Evicting a participant requires each sender to send a fresh sender key to all receivers.

5.2.  MLS

   The Messaging Layer Security (MLS) protocol provides group authenticated key exchange [I-D.ietf-mls-architecture] [I-D.ietf-mls-protocol].
   In principle, it could be used to instantiate the sender key scheme above, but it can also be used more efficiently by deriving SFrame keys directly from MLS, as follows.

   MLS creates a linear sequence of keys, each of which is shared among the members of a group at a given point in time. When a member joins or leaves the group, a new key is produced that is known only to the augmented or reduced group. Each step in the lifetime of the group is known as an "epoch", and each member of the group is assigned an "index" that is constant for the time they are in the group.

   To generate keys and nonces for SFrame, we use the MLS exporter function to generate a base_key value for each MLS epoch. Each member of the group is assigned a set of KID values, so that each member has a unique sframe_key and sframe_salt that it uses for encryption. Senders may choose any KID value within their assigned set of KID values, e.g., to allow a single sender to send multiple uncoordinated outbound media streams.

   base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk)

   For compactness, we do not send the whole epoch number. Instead, we send only its low-order E bits, where E is a value set by the application. E effectively defines a reordering window, since no more than 2^E epochs can be active at a given time. Receivers MUST be prepared for the epoch counter to roll over, removing an old epoch when a new epoch with the same lower E bits is introduced.

   Let S be the number of bits required to encode a member index in the group, i.e., the smallest value such that group_size <= (1 << S). The sender index is encoded in the S bits above the epoch. The remaining 64 - S - E bits of the KID value are a context value chosen by the sender (context value 0 will produce the shortest encoded KID).
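   The KID layout described above can be sketched in Python as follows. The function names are hypothetical. index_bits assumes that S is chosen just large enough to represent every member index from 0 through group_size - 1. With S = 6 and E = 4, the values consistent with Figure 7, mls_kid(3, 2, 16, 6, 4) returns 0xc20, which is the Epoch 16, context 3 entry of that figure.

```python
from __future__ import annotations


def index_bits(group_size: int) -> int:
    """Bits S needed to encode every member index 0 .. group_size - 1."""
    if group_size < 1:
        raise ValueError("group must have at least one member")
    return (group_size - 1).bit_length()


def mls_kid(context: int, sender_index: int, epoch: int, s: int, e: int) -> int:
    """KID = (context << (S + E)) + (sender_index << E) + (epoch % 2**E)."""
    if not 0 <= sender_index < (1 << s):
        raise ValueError("sender index does not fit in S bits")
    if not 0 <= context < (1 << (64 - s - e)):
        raise ValueError("context does not fit in 64 - S - E bits")
    return (context << (s + e)) + (sender_index << e) + (epoch % (1 << e))


def split_kid(kid: int, s: int, e: int) -> tuple[int, int, int]:
    """Inverse mapping: (context, sender_index, low E bits of the epoch)."""
    return kid >> (s + e), (kid >> e) & ((1 << s) - 1), kid & ((1 << e) - 1)
```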

   KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E))

     64-S-E bits   S bits   E bits
    <-----------> <------> <------>
   +-------------+--------+--------+
   | Context ID  | Index  | Epoch  |
   +-------------+--------+--------+

          Figure 6: Structure of a KID for an MLS Sender

Once an SFrame stack has been provisioned with the sframe_epoch_secret for an epoch, it can compute the required KIDs and sender_base_key values on demand, as it needs to encrypt/decrypt for a given member.

           ...
            |
            |
   Epoch 14 +--+-- index=3 ---> KID = 0x3e
            |  |
            |  +-- index=7 ---> KID = 0x7e
            |  |
            |  +-- index=20 --> KID = 0x14e
            |
            |
   Epoch 15 +--+-- index=3 ---> KID = 0x3f
            |  |
            |  +-- index=5 ---> KID = 0x5f
            |
            |
   Epoch 16 +----- index=2 --+--> context = 2 --> KID = 0x820
            |                |
            |                +--> context = 3 --> KID = 0xc20
            |
            |
   Epoch 17 +--+-- index=33 --> KID = 0x211
            |  |
            |  +-- index=51 --> KID = 0x331
            |
            |
           ...

   Figure 7: An example sequence of KIDs for an MLS-based SFrame
   session. We assume that the group has 64 members (S=6) and E=4.

6. Media Considerations

6.1. Selective Forwarding Units

Selective Forwarding Units (SFUs) (e.g., those described in Section 3.7 of [RFC7667]) receive the media streams from each participant and select which ones should be forwarded to each of the other participants. There are several approaches to this stream selection, but in general, the SFU needs to access metadata associated with each frame and modify the RTP information of the incoming packets when they are transmitted to the receiving participants.

This section describes how these normal SFU modes of operation interact with the E2EE provided by SFrame.

6.1.1. LastN and RTP stream reuse

The SFU may choose to send only a certain number of streams based on the voice activity of the participants.
To avoid the overhead involved in establishing new transport streams, the SFU may decide to reuse previously existing streams or even pre-allocate a predefined number of streams and choose at each moment which participant's media will be sent through it.

This means that the same transport-level stream (e.g., an RTP stream defined by either SSRC or MID) may carry media from different streams of different participants. Because each participant uses different keys to encrypt their media, the receiver will be able to verify which participant sent the media carried in the RTP stream at any given point in time, preventing the SFU from impersonating any of the participants with another participant's media. Note that in order to prevent impersonation by a malicious participant (not the SFU), a mechanism based on digital signatures would be required. SFrame does not protect against such attacks.

6.1.2. Simulcast

When using simulcast, the same input image will produce N different encoded frames (one per simulcast layer), which would be processed independently by the frame encryptor and each assigned a unique counter.

6.1.3. SVC

In both temporal and spatial scalability, the SFU may choose to drop layers in order to match a certain bitrate or to forward specific media sizes or frame rates. In order to support this, the sender MUST encode each spatial layer of a given picture in a different frame. That is, an RTP frame may contain more than one SFrame-encrypted frame, each with an incrementing frame counter.

6.2. Video Key Frames

Forward and Post-Compromise Security require that the E2EE keys be updated any time a participant joins or leaves the call.

The key exchange happens asynchronously and on a different path than the SFU signaling and media. So it may happen that when a new participant joins the call and the SFU side requests a key frame, the
sender generates the E2EE-encrypted frame with a key not known by the receiver, so it will be discarded. When the sender updates their sending key with the new key, it will send it in a non-key frame, so the receiver will be able to decrypt it, but not decode it. The receiver will then re-request a key frame, but due to sender and SFU policies, that new key frame could take some time to be generated.

If the sender sends a key frame when the new E2EE key is in use, the time required for the new participant to display the video is minimized.

6.3. Partial Decoding

Some codecs support partial decoding, where individual packets can be decoded without waiting for the full frame to arrive. With SFrame this is not possible, because the decoder will not have access to the packets until the entire frame has arrived and been decrypted.

7. Security Considerations

7.1. No Per-Sender Authentication

SFrame does not provide per-sender authentication of media data. Any sender in a session can send media that will be associated with any other sender. This is because SFrame uses symmetric encryption to protect media data, so that any receiver also has the keys required to encrypt packets for the sender.

7.2. Key Management

The key exchange mechanism is out of scope of this document; however, every client SHOULD change their keys when new clients join or leave the call, for forward secrecy and post-compromise security.

7.3. Authentication tag length

Several of the cipher suites defined in this draft use short authentication tags for encryption; however, SFrame can easily support other cipher suites with full-length authentication tags if the short ones are proved insecure.

7.4. Replay

The handling of replay is out of the scope of this document. However, senders MUST reject requests to encrypt multiple times with the same key and nonce, since several AEAD algorithms fail badly in such cases (see, e.g., Section 5.1.1 of [RFC5116]).
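The following is a sketch of the sender-side check that Section 7.4 requires. The hypothetical SendGuard below hands out CTR values per KID from a monotonic counter (the approach Section 9.1 describes) and refuses any request that would reuse a (KID, CTR) pair, and therefore a key and nonce. It assumes that distinct KIDs map to distinct keys and that the nonce is derived from the salt and CTR. The type and method names are illustrative, not a defined API.

```rust
use std::collections::HashMap;

/// Returned when an encryption request would reuse a (KID, CTR) pair,
/// i.e., the same key and nonce.
#[derive(Debug, PartialEq, Eq)]
pub struct NonceReuse {
    pub kid: u64,
    pub ctr: u64,
}

/// Hypothetical sender-side guard. For each KID it remembers the lowest CTR
/// that has not yet been used; anything below that mark is refused. The
/// value u64::MAX is never handed out, which keeps the bookkeeping simple.
#[derive(Default)]
pub struct SendGuard {
    next_ctr: HashMap<u64, u64>,
}

impl SendGuard {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the next unused CTR for `kid` (a monotonic counter).
    pub fn allocate(&mut self, kid: u64) -> Result<u64, NonceReuse> {
        let slot = self.next_ctr.entry(kid).or_insert(0);
        let ctr = *slot;
        *slot = ctr.checked_add(1).ok_or(NonceReuse { kid, ctr })?;
        Ok(ctr)
    }

    /// Accepts a caller-chosen CTR only if it lies at or above the mark.
    /// Skipped values are also refused later, which is conservative but cheap.
    pub fn claim(&mut self, kid: u64, ctr: u64) -> Result<(), NonceReuse> {
        let slot = self.next_ctr.entry(kid).or_insert(0);
        if ctr < *slot {
            return Err(NonceReuse { kid, ctr });
        }
        *slot = ctr.checked_add(1).ok_or(NonceReuse { kid, ctr })?;
        Ok(())
    }
}
```

Tracking a single high-water mark per KID, rather than a set of every CTR used, keeps the state constant-size per key. The cost is that CTR values skipped over can no longer be used.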
8. IANA Considerations

This document requests the creation of the following new IANA registries:

* SFrame Cipher Suites (Section 8.1)

These registries should be under a heading of "SFrame", and assignments are made via the Specification Required policy [RFC8126].

RFC EDITOR: Please replace XXXX throughout with the RFC number assigned to this document.

8.1. SFrame Cipher Suites

This registry lists identifiers for SFrame cipher suites, as defined in Section 4.5. The cipher suite field is two bytes wide, so the valid cipher suites are in the range 0x0000 to 0xFFFF.

Template:

* Value: The numeric value of the cipher suite
* Name: The name of the cipher suite
* Reference: The document where this wire format is defined

Initial contents:

   +========+============================+===========+
   | Value  | Name                       | Reference |
   +========+============================+===========+
   | 0x0001 | AES_128_CTR_HMAC_SHA256_80 | RFC XXXX  |
   +--------+----------------------------+-----------+
   | 0x0002 | AES_128_CTR_HMAC_SHA256_64 | RFC XXXX  |
   +--------+----------------------------+-----------+
   | 0x0003 | AES_128_CTR_HMAC_SHA256_32 | RFC XXXX  |
   +--------+----------------------------+-----------+
   | 0x0004 | AES_128_GCM_SHA256_128     | RFC XXXX  |
   +--------+----------------------------+-----------+
   | 0x0005 | AES_256_GCM_SHA512_128     | RFC XXXX  |
   +--------+----------------------------+-----------+

                Table 2: SFrame cipher suites

9. Application Responsibilities

To use SFrame, an application needs to define the inputs to the SFrame encryption and decryption operations, and how SFrame ciphertexts are delivered from sender to receiver (including any fragmentation and reassembly). In this section, we lay out additional requirements that an integration must meet in order for SFrame to operate securely.

9.1. Header Value Uniqueness

Applications MUST ensure that each (KID, CTR) combination is used for exactly one encryption operation. Typically this is done by assigning each sender a KID or set of KIDs, then having each sender use the CTR field as a monotonic counter, incrementing for each plaintext that is encrypted. Note that in addition to its simplicity, this scheme minimizes overhead by keeping CTR values as small as possible.

9.2. Key Management Framework

It is up to the application to provision SFrame with a mapping of KID values to base_key values and the resulting keys and salts. More importantly, the application specifies which KID values are used for which purposes (e.g., by which senders). An application's KID assignment strategy MUST be structured to assure the non-reuse properties discussed above.

It is also up to the application to define a rotation schedule for keys. For example, one application might have an ephemeral group for every call and keep rotating keys when endpoints join or leave the call, while another application could have a persistent group that can be used for multiple calls and simply derives ephemeral symmetric keys for a specific call.

It should be noted that KID values are not encrypted by SFrame, and are thus visible to any application-layer intermediaries that might handle an SFrame ciphertext. If there are application semantics included in KID values, then this information would be exposed to intermediaries. For example, in the scheme of Section 5.1, the number of ratchet steps per sender is exposed, and in the scheme of Section 5.2, the number of epochs and the MLS sender ID of the SFrame sender are exposed.

9.3. Anti-Replay

It is the responsibility of the application to handle anti-replay. Replay by network attackers is assumed to be prevented by network-layer facilities (e.g., TLS, SRTP).
As mentioned in Section 7.4, senders MUST reject requests to encrypt multiple times with the same key and nonce.

It is not mandatory to implement anti-replay on the receiver side. Receivers MAY apply time- or counter-based anti-replay mitigations.

9.4. Metadata

The metadata input to SFrame operations is purely application-specified data. As such, it is up to the application to define what information should go in the metadata input and ensure that it is provided to the encryption and decryption functions at the appropriate points. A receiver SHOULD NOT use SFrame-authenticated metadata until after the SFrame decrypt function has authenticated it.

Note: The metadata input is a feature at risk, and needs more confirmation that it is useful and/or needed.

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008.

[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand Key Derivation Function (HKDF)", RFC 5869, DOI 10.17487/RFC5869, May 2010.

[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

10.2. Informative References

[I-D.codec-agnostic-rtp-payload-format] Murillo, S. G. and A. Gouaillard, "Codec agnostic RTP payload format for video", Work in Progress, Internet-Draft, draft-codec-agnostic-rtp-payload-format-00, 19 February 2021.

[I-D.ietf-mls-architecture] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and A. Duric, "The Messaging Layer Security (MLS) Architecture", Work in Progress, Internet-Draft, draft-ietf-mls-architecture-10, 16 December 2022.

[I-D.ietf-mls-protocol] Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", Work in Progress, Internet-Draft, draft-ietf-mls-protocol-20, 27 March 2023.

[I-D.ietf-moq-transport] Curley, L., Pugin, K., Nandakumar, S., and V. Vasiliev, "Media over QUIC Transport", Work in Progress, Internet-Draft, draft-ietf-moq-transport-00, 5 July 2023.

[I-D.ietf-webtrans-overview] Vasiliev, V., "The WebTransport Protocol Framework", Work in Progress, Internet-Draft, draft-ietf-webtrans-overview-05, 24 January 2023.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July 2006.

[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012.

[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, November 2015.

[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, DOI 10.17487/RFC7667, November 2015.

[RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, "Double Encryption Procedures for the Secure Real-Time Transport Protocol (SRTP)", RFC 8723, DOI 10.17487/RFC8723, April 2020.

[TestVectors] "SFrame Test Vectors", 2021.

Appendix A. Acknowledgements

The authors wish to especially thank Dr. Alex Gouaillard, one of the early contributors to the document.
His passion and energy were key to the design and development of SFrame.

Appendix B. Example API

*This section is not normative.*

This section describes a notional API that an SFrame implementation might expose. The core concept is an "SFrame context", within which KID values are meaningful. In the key management scheme described in Section 5.1, each sender has a different context; in the scheme described in Section 5.2, all senders share the same context.

An SFrame context stores mappings from KID values to "key contexts", which are different depending on whether the KID is to be used for sending or receiving (an SFrame key should never be used for both operations). A key context tracks the key and salt associated with the KID. A key context to be used for sending also tracks the next CTR value to be used.

The primary operations on an SFrame context are as follows:

* *Create an SFrame context:* The context is initialized with a cipher suite and no KID mappings.

* *Add a key for sending:* The key and salt are derived from the base key, and used to initialize a send context, together with a zero counter value.

* *Add a key for receiving:* The key and salt are derived from the base key, and used to initialize a receive context.

* *Encrypt a plaintext:* Encrypt a given plaintext using the key for a given KID, including the specified metadata.

* *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with the KID and CTR values specified in the SFrame Header, and the provided metadata.

Figure 8 shows an example of the types of structures and methods that could be used to create an SFrame API in Rust.
   use std::collections::HashMap;

   type KeyId = u64;
   type Counter = u64;
   type CipherSuite = u16;

   // State for a key that is used only for sending
   struct SendKeyContext {
       key: Vec<u8>,
       salt: Vec<u8>,
       next_counter: Counter,
   }

   // State for a key that is used only for receiving
   struct RecvKeyContext {
       key: Vec<u8>,
       salt: Vec<u8>,
   }

   struct SFrameContext {
       cipher_suite: CipherSuite,
       send_keys: HashMap<KeyId, SendKeyContext>,
       recv_keys: HashMap<KeyId, RecvKeyContext>,
   }

   trait SFrameContextMethods {
       fn create(cipher_suite: CipherSuite) -> Self;
       // Adding a key changes the KID mappings, so these take &mut self
       fn add_send_key(&mut self, kid: KeyId, base_key: &[u8]);
       fn add_recv_key(&mut self, kid: KeyId, base_key: &[u8]);
       // Encryption advances the send counter for `kid`
       fn encrypt(&mut self, kid: KeyId, metadata: &[u8],
                  plaintext: &[u8]) -> Vec<u8>;
       // KID and CTR are read from the SFrame header in `ciphertext`
       fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>;
   }

                 Figure 8: An example SFrame API

Appendix C. Overhead Analysis

Any use of SFrame will impose overhead in terms of the amount of bandwidth necessary to transmit a given media stream. Exactly how much overhead will be added depends on several factors:

* How many senders are involved in a conference (length of KID)
* How long the conference has been going on (length of CTR)
* The cipher suite in use (length of authentication tag)
* Whether SFrame is used to encrypt packets, whole frames, or some other unit

Overall, the overhead rate in kilobits per second can be estimated as:

   OverheadKbps = (1 + |CTR| + |KID| + |TAG|) * 8 * CTPerSecond / 1024

Here the constant value 1 reflects the fixed SFrame header; |CTR| and |KID| reflect the lengths of those fields in bytes; |TAG| reflects the cipher overhead in bytes; and CTPerSecond reflects the number of SFrame ciphertexts sent per second (e.g., packets or frames per second).

In the remainder of this section, we compute overhead estimates for a collection of common scenarios.

C.1. Assumptions

In the below calculations, we make conservative assumptions about SFrame overhead, so that the overhead amounts we compute here are likely to be an upper bound on those seen in practice.
   +==============+=======+============================+
   | Field        | Bytes | Explanation                |
   +==============+=======+============================+
   | Fixed header | 1     | Fixed                      |
   +--------------+-------+----------------------------+
   | Key ID (KID) | 2     | >255 senders; or MLS epoch |
   |              |       | (E=4) and >16 senders      |
   +--------------+-------+----------------------------+
   | Counter      | 3     | More than 24 hours of      |
   | (CTR)        |       | media in common cases      |
   +--------------+-------+----------------------------+
   | Cipher       | 16    | Full GCM tag (longest      |
   | overhead     |       | defined here)              |
   +--------------+-------+----------------------------+

                               Table 3

In total, then, we assume that each SFrame encryption will add 22 bytes of overhead.

We consider two scenarios, applying SFrame per-frame and per-packet. In each scenario, we compute the SFrame overhead in absolute terms (Kbps) and as a percentage of the base bandwidth.

C.2. Audio

In audio streams, there is typically a one-to-one relationship between frames and packets, so the overhead is the same whether one uses SFrame at a per-packet or per-frame level.

The below table considers three scenarios, based on recommended configurations of the Opus codec [RFC6716]:
* Narrow-band speech: 120ms packets, 8Kbps
* Full-band speech: 20ms packets, 32Kbps
* Full-band stereo music: 10ms packets, 128Kbps

   +===============+=====+===========+===============+============+
   | Scenario      | fps | Base Kbps | Overhead Kbps | Overhead % |
   +===============+=====+===========+===============+============+
   | NB speech,    | 8.3 | 8         | 1.4           | 17.9%      |
   | 120ms packets |     |           |               |            |
   +---------------+-----+-----------+---------------+------------+
   | FB speech,    | 50  | 32        | 8.6           | 26.9%      |
   | 20ms packets  |     |           |               |            |
   +---------------+-----+-----------+---------------+------------+
   | FB stereo,    | 100 | 128       | 17.2          | 13.4%      |
   | 10ms packets  |     |           |               |            |
   +---------------+-----+-----------+---------------+------------+

             Table 4: SFrame overhead for audio streams

C.3. Video

Video frames can be larger than an MTU and thus are commonly split across multiple packets. Table 5 and Table 6 show the estimated overhead of encrypting a video stream, where SFrame is applied per-frame and per-packet, respectively. The choices of resolution, frames per second, and bandwidth are chosen to roughly reflect the capabilities of modern video codecs across a range from very low to very high quality.
   +=============+=====+===========+===============+============+
   | Scenario    | fps | Base Kbps | Overhead Kbps | Overhead % |
   +=============+=====+===========+===============+============+
   | 426 x 240   | 7.5 | 45        | 1.3           | 2.9%       |
   +-------------+-----+-----------+---------------+------------+
   | 640 x 360   | 15  | 200       | 2.6           | 1.3%       |
   +-------------+-----+-----------+---------------+------------+
   | 640 x 360   | 30  | 400       | 5.2           | 1.3%       |
   +-------------+-----+-----------+---------------+------------+
   | 1280 x 720  | 30  | 1500      | 5.2           | 0.3%       |
   +-------------+-----+-----------+---------------+------------+
   | 1920 x 1080 | 60  | 7200      | 10.3          | 0.1%       |
   +-------------+-----+-----------+---------------+------------+

     Table 5: SFrame overhead for a video stream encrypted per-frame

   +=============+=====+=====+===========+===============+============+
   | Scenario    | fps | pps | Base Kbps | Overhead Kbps | Overhead % |
   +=============+=====+=====+===========+===============+============+
   | 426 x 240   | 7.5 | 7.5 | 45        | 1.3           | 2.9%       |
   +-------------+-----+-----+-----------+---------------+------------+
   | 640 x 360   | 15  | 30  | 200       | 5.2           | 2.6%       |
   +-------------+-----+-----+-----------+---------------+------------+
   | 640 x 360   | 30  | 60  | 400       | 10.3          | 2.6%       |
   +-------------+-----+-----+-----------+---------------+------------+
   | 1280 x 720  | 30  | 180 | 1500      | 30.9          | 2.1%       |
   +-------------+-----+-----+-----------+---------------+------------+
   | 1920 x 1080 | 60  | 780 | 7200      | 134.1         | 1.9%       |
   +-------------+-----+-----+-----------+---------------+------------+

    Table 6: SFrame overhead for a video stream encrypted per-packet

In the per-frame case, the SFrame percentage overhead approaches zero as the quality of the video goes up: the SFrame overhead grows only with the frame rate, while the base bandwidth is driven more by picture size than by frame rate.
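The per-frame and per-packet estimates in Table 5 and Table 6 follow directly from the OverheadKbps formula at the start of this appendix. Below is a small Rust sketch of that arithmetic using the Table 3 field sizes. The Overhead type and the TABLE3 constant are illustrative names that SFrame does not define.

```rust
/// Per-ciphertext SFrame overhead components, in bytes (field names follow
/// the OverheadKbps formula; the struct itself is illustrative).
pub struct Overhead {
    pub ctr: u32,
    pub kid: u32,
    pub tag: u32,
}

impl Overhead {
    /// 1 fixed header byte + |CTR| + |KID| + |TAG|.
    pub fn bytes_per_ciphertext(&self) -> u32 {
        1 + self.ctr + self.kid + self.tag
    }

    /// OverheadKbps = bytes * 8 * CTPerSecond / 1024.
    pub fn kbps(&self, ct_per_second: f64) -> f64 {
        f64::from(self.bytes_per_ciphertext()) * 8.0 * ct_per_second / 1024.0
    }

    /// Overhead as a percentage of the base media bandwidth.
    pub fn percent(&self, ct_per_second: f64, base_kbps: f64) -> f64 {
        100.0 * self.kbps(ct_per_second) / base_kbps
    }
}

/// The Table 3 assumptions: 3-byte CTR, 2-byte KID, 16-byte tag.
pub const TABLE3: Overhead = Overhead { ctr: 3, kid: 2, tag: 16 };
```

For instance, TABLE3.kbps(780.0) is 134.0625 Kbps (the 1920 x 1080 per-packet entry), and TABLE3.percent(180.0, 1500.0) is 2.0625%, which the tables round to one decimal place.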
In the per-packet case, the SFrame percentage overhead approaches the ratio between the SFrame overhead per packet and the MTU (here 22 bytes of SFrame overhead divided by an assumed 1200-byte MTU, or about 1.8%).

C.4. Conferences

Real conferences usually involve several audio and video streams. The overhead of SFrame in such a conference is the aggregate of the overhead over all the individual streams. Thus, while SFrame incurs a large percentage overhead on an audio stream, if the conference also involves a video stream, then the audio overhead is likely negligible relative to the overall bandwidth of the conference.

For example, Table 7 shows the overhead estimates for a two-person conference where one person is sending low-quality media and the other is sending high-quality media, assuming that SFrame is applied per-frame. The video streams dominate the bandwidth at the SFU, so the total bandwidth overhead is only around 1%.

   +=====================+===========+===============+============+
   | Stream              | Base Kbps | Overhead Kbps | Overhead % |
   +=====================+===========+===============+============+
   | Participant 1 audio | 8         | 1.4           | 17.9%      |
   +---------------------+-----------+---------------+------------+
   | Participant 1 video | 45        | 1.3           | 2.9%       |
   +---------------------+-----------+---------------+------------+
   | Participant 2 audio | 32        | 8.6           | 26.9%      |
   +---------------------+-----------+---------------+------------+
   | Participant 2 video | 1500      | 5.2           | 0.3%       |
   +---------------------+-----------+---------------+------------+
   | Total at SFU        | 1585      | 16.5          | 1.0%       |
   +---------------------+-----------+---------------+------------+

         Table 7: SFrame overhead for a two-person conference

C.5. SFrame over RTP

SFrame is a generic encapsulation format, but many of the applications in which it is likely to be integrated are based on RTP.
This section discusses how an integration between SFrame and RTP could be done, and some of the challenges that would need to be overcome.

As discussed in Section 4.1, there are two natural patterns for integrating SFrame into an application: applying SFrame per-frame or per-packet. In RTP-based applications, applying SFrame per-packet means that the payload of each RTP packet will be an SFrame ciphertext, starting with an SFrame Header, as shown in Figure 9. Applying SFrame per-frame means that different RTP payloads will have different formats: The first payload of a frame will contain the SFrame header, and subsequent payloads will contain further chunks of the ciphertext, as shown in Figure 10.

In order for these media payloads to be properly interpreted by receivers, receivers will need to be configured to know which of the above schemes the sender has applied to a given sequence of RTP packets. SFrame does not provide a mechanism for distributing this configuration information. In applications that use SDP for negotiating RTP media streams [RFC4566], an appropriate extension to SDP could provide this function.

Applying SFrame per-frame also requires that packetization and depacketization be done in a generic manner that does not depend on the media content of the packets, since the content being packetized / depacketized will be opaque ciphertext (except for the SFrame header). In order for such a generic packetization scheme to work interoperably, one would have to be defined, e.g., as proposed in [I-D.codec-agnostic-rtp-payload-format].
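Below is a minimal sketch of the kind of content-agnostic packetization described above. It assumes a fixed maximum payload size and leaves out everything a real payload format such as [I-D.codec-agnostic-rtp-payload-format] would add (payload headers, frame boundary signaling, loss handling). The function names are hypothetical. Because the sketch only slices bytes, the SFrame header ends up at the start of payload 1/N, as in Figure 10, provided max_payload is at least the header length.

```rust
/// Hypothetical generic packetizer: splits one SFrame ciphertext (header
/// followed by the encrypted and authenticated payload) into RTP payloads of
/// at most `max_payload` bytes, without inspecting the media content.
pub fn packetize(ciphertext: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    assert!(max_payload > 0, "payload size must be positive");
    ciphertext.chunks(max_payload).map(|c| c.to_vec()).collect()
}

/// Reassembly is plain concatenation, assuming the receiver already knows
/// (by some payload-format mechanism not shown here) which payloads belong
/// to the same frame and in what order.
pub fn depacketize(payloads: &[Vec<u8>]) -> Vec<u8> {
    payloads.concat()
}
```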
   +---+-+-+-------+-+-------------+-------------------------------+<-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |  |
   +---+-+-+-------+-+-------------+-------------------------------+  |
   |                           timestamp                           |  |
   +---------------------------------------------------------------+  |
   |           synchronization source (SSRC) identifier            |  |
   +===============================================================+  |
   |            contributing source (CSRC) identifiers             |  |
   |                               ....                            |  |
   +---------------------------------------------------------------+  |
   |                   RTP extension(s) (OPTIONAL)                 |  |
+->+--------------------+------------------------------------------+  |
|  |   SFrame header    |                                          |  |
|  +--------------------+                                          |  |
|  |                                                               |  |
|  |          SFrame encrypted and authenticated payload           |  |
|  |                                                               |  |
+->+---------------------------------------------------------------+<-+
|  |                    SRTP authentication tag                    |  |
|  +---------------------------------------------------------------+  |
|                                                                     |
+--- SRTP Encrypted Portion             SRTP Authenticated Portion ---+

          Figure 9: SRTP packet with SFrame-protected payload

   +----------------+  +---------------+
   | frame metadata |  |               |
   +-------+--------+  |     frame     |
           |           |               |
           |           +-------+-------+
           |                   |
           V                   V
   +--------------------------------------+
   |            SFrame Encrypt            |
   +--------------------------------------+
           |                   |
           |                   V
           |           +-------+-------+
           |           |               |
           |           |   encrypted   |
           |           |     frame     |
           |           |               |
           |           +-------+-------+
           |                   |
           |         generic RTP packetize
           |                   |
           |                   +-----------+-------.....-------+
           |                   |           |                   |
           V                   V           V                   V
   +-----------------------------+  +-------------+     +-------------+
   |        SFrame header        |  |             |     |             |
   +-----------------------------+  | payload 2/N | ... | payload N/N |
   |         payload 1/N         |  |             |     |             |
   +-----------------------------+  +-------------+     +-------------+

        Figure 10: Encryption flow with per-frame encryption for RTP

Appendix D. Test Vectors

This section provides a set of test vectors that implementations can use to verify that they correctly implement SFrame encryption and decryption. For each cipher suite, we provide:

* [in] The base_key value (hex encoded)
* [out] The secret, key, and salt values derived from the base_key (hex encoded)
* A plaintext value that is encrypted in the following encryption cases
* A sequence of encryption cases, including:
  - [in] The KID and CTR values to be included in the header
  - [out] The resulting encoded header (hex encoded)
  - [out] The nonce computed from the salt and CTR values
  - The ciphertext resulting from encrypting the plaintext with these parameters (hex encoded)

An implementation should reproduce the output values given the input values:

* An implementation should be able to encrypt with the input values and the plaintext to produce the ciphertext.
* An implementation must be able to decrypt with the input values and the ciphertext to generate the plaintext.

Line breaks and whitespace within values are inserted to conform to the width requirements of the RFC format. They should be removed before use. These test vectors are also available in JSON format at [TestVectors].

D.1. AES_CTR_128_HMAC_SHA256_4

CipherSuite: 0x01
Base Key:    101112131415161718191a1b1c1d1e1f
Key:         343d3290f5c0b936415bea9a43c6f5a2
Salt:        42d662fbad5cd81eb3aad79a
Plaintext:   46726f6d2068656176656e6c79206861
             726d6f6e79202f2f205468697320756e
             6976657273616c206672616d65206265
             67616e
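The Header and Nonce values in the cases below can be reproduced with the following Rust sketch. The header layout it uses is inferred here from these vectors rather than copied from Section 4.3, which remains normative. That layout is a config byte |R|LEN|X|K| followed by the KID (only when extended) and then the CTR. The nonce is the salt XORed with the CTR, left-padded to the salt length, which matches every Nonce listed. All function names are hypothetical.

```rust
/// Number of big-endian bytes needed to represent `v` (at least one).
fn min_len(v: u64) -> usize {
    let bits = 64 - v.leading_zeros() as usize;
    ((bits + 7) / 8).max(1)
}

/// The low `len` bytes of `v`, big-endian.
fn be_bytes(v: u64, len: usize) -> Vec<u8> {
    v.to_be_bytes()[8 - len..].to_vec()
}

/// Assumed layout: |R=0|LEN(3)|X(1)|K(3)|, then KID bytes (only if X = 1),
/// then CTR bytes. LEN = CTR length - 1. If X = 0, K holds the KID itself
/// (KID < 8); if X = 1, K = KID length - 1.
pub fn encode_header(kid: u64, ctr: u64) -> Vec<u8> {
    let ctr_len = min_len(ctr);
    let len_bits = ((ctr_len - 1) as u8) << 4;
    let mut out = Vec::with_capacity(17);
    if kid < 8 {
        out.push(len_bits | kid as u8); // X = 0: KID stored in K
    } else {
        let kid_len = min_len(kid);
        out.push(len_bits | 0x08 | (kid_len - 1) as u8); // X = 1
        out.extend(be_bytes(kid, kid_len));
    }
    out.extend(be_bytes(ctr, ctr_len));
    out
}

/// Nonce = salt XOR CTR, with CTR left-padded with zero bytes to the salt length.
pub fn compute_nonce(salt: &[u8], ctr: u64) -> Vec<u8> {
    let mut nonce = salt.to_vec();
    let n = nonce.len();
    for (i, b) in ctr.to_be_bytes().iter().rev().enumerate().take(n) {
        nonce[n - 1 - i] ^= *b;
    }
    nonce
}

/// Lowercase hex, matching the formatting of the vectors.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

For example, encode_header(0x1ff, 0xaaaa) produces 1901ffaaaa. The config byte 0x19 carries LEN=1 (a 2-byte CTR) and X=1 with K=1 (a 2-byte KID).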
KID:        0x7
CTR:        0x0
Header:     0700
Nonce:      42d662fbad5cd81eb3aad79a
Ciphertext: 0700c5095af9dbbbed6a952de114ea7b
            42768509f1ffc9749abb1e95bf4514d8
            d82a0eef4b5ecac16fa193977fa1aa1c
            9fa5c7e730b934669c

KID:        0x7
CTR:        0x1
Header:     0701
Nonce:      42d662fbad5cd81eb3aad79b
Ciphertext: 0701559e262525382885c6c93be8f61a
            9064db2dd1e1e96ab1dbd829ca4af4f4
            5f2b97a4889217a3f8a2159fb8201b7d
            71db01702b9caf8df6

KID:        0x7
CTR:        0x2
Header:     0702
Nonce:      42d662fbad5cd81eb3aad798
Ciphertext: 07020a8f21e052eaa09e50da0a909d15
            6cc55b9ef2f2abbcca765f7af3cfb1af
            234e3eac1dbc376631c83cf1ff1f8ab3
            39dbc41044742c668d

KID:        0xf
CTR:        0xaa
Header:     080faa
Nonce:      42d662fbad5cd81eb3aad730
Ciphertext: 080faa9c65aa5b167873f25827f17bc3
            4879a4aaa6b38dd9584472e1849d5da5
            1555f288d08f03166a5f26af01794006
            255c88b589861e2f8e3e

KID:        0x1ff
CTR:        0xaa
Header:     0901ffaa
Nonce:      42d662fbad5cd81eb3aad730
Ciphertext: 0901ffaa9c65aa5b167873f25827f17b
            c34879a4aaa6b38dd9584472e1849d5d
            a51555f288d08f03166a5f26af017940
            06255c88b58986ca1ead10

KID:        0x1ff
CTR:        0xaaaa
Header:     1901ffaaaa
Nonce:      42d662fbad5cd81eb3aa7d30
Ciphertext: 1901ffaaaa990cbeb4ae2e3a76be8bb9
            54b62591e791d0fa53c0553bc1d1e021
            d270b1a10688cd89195203b019789253
            73b04f9c08c3a4e563e2f6b9

KID:        0xffffffffffffff
CTR:        0xffffffffffffff
Header:     6effffffffffffffffffffffffffff
Nonce:      42d662fbada327e14c552865
Ciphertext: 6effffffffffffffffffffffffffff41
            2c43c8077c286f7df3dd9988d1bd033f
            1067493e09421e5bfc363e50a3c803b4
            da9239514cb924dbcb5f33e33112083e
            99108de2ecd6

D.2. AES_CTR_128_HMAC_SHA256_8

CipherSuite: 0x02
Base Key:    202122232425262728292a2b2c2d2e2f
Key:         3fce747d505e46ec9b92d9f58ee7a5d4
Salt:        77fbf5f1d82c73f6d2b353c9
Plaintext:   46726f6d2068656176656e6c79206861
             726d6f6e79202f2f205468697320756e
             6976657273616c206672616d65206265
             67616e

KID:        0x7
CTR:        0x0
Header:     0700
Nonce:      77fbf5f1d82c73f6d2b353c9
Ciphertext: 07009d89e5753e06edf3025f1ccd70b0
            95ebaf10c250e11da740f50f57b6ce86
            0d7321dfa49688a2cd6c6d9a71ae9d5c
            14ad0978efdd719a7f18c48f07

KID:        0x7
CTR:        0x1
Header:     0701
Nonce:      77fbf5f1d82c73f6d2b353c8
Ciphertext: 0701becd2e9d10e3eed586491b3e0ece
            dba89407ae2151787c5117b55707d6b8
            a0754f4dc937e30ebdf7cafbd3769d65
            85d7991b1a6bd31e8bddb1adec

KID:        0x7
CTR:        0x2
Header:     0702
Nonce:      77fbf5f1d82c73f6d2b353cb
Ciphertext: 070298508be6b16d034f15b504ced45a
            86d1bb43ed7cd3a62bf25557d1b082b0
            4e8e6ba6fe76160835dd8953e1be9640
            c988627ea447127ae4c103eabd

KID:        0xf
CTR:        0xaa
Header:     080faa
Nonce:      77fbf5f1d82c73f6d2b35363
Ciphertext: 080faae7eec4b0556ddfb8068998351c
            d670ce95f0ce9cd4c6dca2eeee73fb14
            d20a0d0fd487337ed43fa7f98dad0995
            b8b870325aa35a105af9b1004b22

KID:        0x1ff
CTR:        0xaa
Header:     0901ffaa
Nonce:      77fbf5f1d82c73f6d2b35363
Ciphertext: 0901ffaae7eec4b0556ddfb806899835
            1cd670ce95f0ce9cd4c6dca2eeee73fb
            14d20a0d0fd487337ed43fa7f98dad09
            95b8b870325aa3437cce05a6e67ee8

KID:        0x1ff
CTR:        0xaaaa
Header:     1901ffaaaa
Nonce:      77fbf5f1d82c73f6d2b3f963
Ciphertext: 1901ffaaaa8c1789aa0abcd6abc27006
            aae4df5cba4ba07f8113080e9726baac
            d16c18539974a6204a36b9dc3dcd36ed
            9ab48e590d95d4adfb4290f4cb1ba184

KID:        0xffffffffffffff
CTR:        0xffffffffffffff
Header:     6effffffffffffffffffffffffffff
Nonce:      77fbf5f1d8d38c092d4cac36
Ciphertext: 6effffffffffffffffffffffffffffa9
            bc6c7edde0fdfd13255a5b145c5ce84d
            b8f8960858eb998b8ea8f3e770160150
            813c5806441b64251bdd2be9e8cec138
            6b6f8e3b1982bcd16c84

D.3. AES_GCM_128_SHA256

   CipherSuite: 0x03
   Base Key:    303132333435363738393a3b3c3d3e3f
   Key:         2ea2e8163ff56c0613e6fa9f20a213da
   Salt:        a80478b3f6fba19983d540d5
   Plaintext:   46726f6d2068656176656e6c79206861
                726d6f6e79202f2f205468697320756e
                6976657273616c206672616d65206265
                67616e

   KID:         0x7
   CTR:         0x0
   Header:      0700
   Nonce:       a80478b3f6fba19983d540d5
   Ciphertext:  07000e426255e47ed70dd7d15d69d759
                bf459032ca15f5e8b2a91e7d348aa7c1
                86d403f620801c495b1717a35097411a
                a97cbb1406afd9f4e5215b46e4a39dc4
                0c27fd6bc7

   KID:         0x7
   CTR:         0x1
   Header:      0701
   Nonce:       a80478b3f6fba19983d540d4
   Ciphertext:  070103bbafa34ada8a6b9f2066bc34a1
                959d87384c9f4b1ce34fed58e938bde1
                43393910b1aeb55b48d91d5b0db3ea67
                e3d0e02b84e4cf8ecf81f8386f86cda4
                8fcd754191

   KID:         0x7
   CTR:         0x2
   Header:      0702
   Nonce:       a80478b3f6fba19983d540d7
   Ciphertext:  070258d58adebd8bf6f3cc0c1fcacf34
                ba4d7a763b2683fe302a57f1be7f2a27
                4bf81b2236995fec1203cadb146cd402
                e1c52d5e6aceaa5252822d25acd0ce4b
                a14e31fa24

   KID:         0xf
   CTR:         0xaa
   Header:      080faa
   Nonce:       a80478b3f6fba19983d5407f
   Ciphertext:  080faad0b1743bf5248f90869c945636
                6d55724d16bbe08060875815565e90b1
                14f9ccbdba192422b33848a1ae1e3bd2
                66a001b2f5bb64c0f1216bba82ab24b1
                ebd677c2ca29

   KID:         0x1ff
   CTR:         0xaa
   Header:      0901ffaa
   Nonce:       a80478b3f6fba19983d5407f
   Ciphertext:  0901ffaad0b1743bf5248f90869c9456
                366d55724d16bbe08060875815565e90
                b114f9ccbdba192422b33848a1ae1e3b
                d266a001b2f5bb8c718170432b6f922c
                1f0fb307514a0e

   KID:         0x1ff
   CTR:         0xaaaa
   Header:      1901ffaaaa
   Nonce:       a80478b3f6fba19983d5ea7f
   Ciphertext:  1901ffaaaa9de65e21e4f1ca2247b879
                43c03c5cb7b182090e93d508dcfb76e0
                8174c6397356e682d2eaddabc0b3c101
                8d2c13c3570f61c185789dff3cb4469c
                f471ca71ceb025a5

   KID:         0xffffffffffffff
   CTR:         0xffffffffffffff
   Header:      6effffffffffffffffffffffffffff
   Nonce:       a80478b3f6045e667c2abf2a
   Ciphertext:  6effffffffffffffffffffffffffff09
                981bdcdad80e380b6f74cf6afdbce946
                839bedadd57578bfcd809dbcea535546
                cc24660613d2761adea852155785011e
                633522450f95fd9f8ccc96fa3de9a247
                cfd3

D.4.  AES_GCM_256_SHA512

   CipherSuite: 0x04
   Base Key:    404142434445464748494a4b4c4d4e4f
                505152535455565758595a5b5c5d5e5f
   Key:         436774b0b5ae45633d96547f8f3cb06c
                8e6628eff2e4255b5c4d77e721aa3355
   Salt:        31ed26f90a072e6aee646298
   Plaintext:   46726f6d2068656176656e6c79206861
                726d6f6e79202f2f205468697320756e
                6976657273616c206672616d65206265
                67616e
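
   The Header and Nonce values in these vectors can be reproduced
   without any cryptography, which makes them a convenient first check
   for an implementation.  The Python sketch below is illustrative and
   non-normative.  The header layout it assumes (a reserved bit R, a
   3-bit LEN holding the CTR length minus one, an extended-KID flag X,
   and a 3-bit K field holding either a KID below 8 or the KID length
   minus one) is inferred from the headers listed in this appendix, and
   the nonce is taken to be the Salt XORed with the CTR encoded as a
   big-endian integer of the Salt's length; Section 4.3 and Section 4.4
   remain authoritative.  The function names min_len, encode_header, and
   derive_nonce are hypothetical.

```python
# Illustrative, non-normative sketch. The header layout is inferred from
# the test vectors in this appendix; Sections 4.3 and 4.4 are normative.


def min_len(value: int) -> int:
    """Smallest number of bytes (at least one) that holds value."""
    return max(1, (value.bit_length() + 7) // 8)


def encode_header(kid: int, ctr: int) -> bytes:
    """Build the header bytes: config byte, optional KID, then CTR."""
    ctr_len = min_len(ctr)
    if ctr_len > 8:
        raise ValueError("CTR does not fit in 8 bytes")
    # Config byte: R (1 bit, 0) | LEN (3 bits, CTR length - 1) | X | K.
    config = (ctr_len - 1) << 4
    if kid < 8:
        # X = 0: a small KID is carried directly in the K bits.
        config |= kid
        kid_bytes = b""
    else:
        # X = 1: K holds the KID length minus one; the KID bytes follow.
        kid_len = min_len(kid)
        if kid_len > 8:
            raise ValueError("KID does not fit in 8 bytes")
        config |= 0x08 | (kid_len - 1)
        kid_bytes = kid.to_bytes(kid_len, "big")
    return bytes([config]) + kid_bytes + ctr.to_bytes(ctr_len, "big")


def derive_nonce(salt: bytes, ctr: int) -> bytes:
    """XOR the CTR, left-padded to the salt length, into the salt."""
    padded_ctr = ctr.to_bytes(len(salt), "big")
    return bytes(s ^ c for s, c in zip(salt, padded_ctr))


if __name__ == "__main__":
    salt = bytes.fromhex("31ed26f90a072e6aee646298")  # D.4 Salt
    for kid, ctr in [(0x7, 0x0), (0xF, 0xAA), (0x1FF, 0xAAAA)]:
        print(encode_header(kid, ctr).hex(), derive_nonce(salt, ctr).hex())
    # Prints:
    # 0700 31ed26f90a072e6aee646298
    # 080faa 31ed26f90a072e6aee646232
    # 1901ffaaaa 31ed26f90a072e6aee64c832
```

   With the D.4 Salt, the printed pairs match the Header and Nonce
   values listed for the corresponding KID and CTR in this suite.  The
   same functions reproduce the headers of the other suites, since the
   header encoding does not depend on the cipher suite.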

   KID:         0x7
   CTR:         0x0
   Header:      0700
   Nonce:       31ed26f90a072e6aee646298
   Ciphertext:  0700f3e297c1e95207710bd31ccc4ba3
                96fbef7b257440bde638ff0f3c891154
                0136df61b26220249d6c432c245ae8d5
                5ef45bfccf3afe18dd36d64d8e341653
                e1a0f10be2

   KID:         0x7
   CTR:         0x1
   Header:      0701
   Nonce:       31ed26f90a072e6aee646299
   Ciphertext:  070193268b0bf030071bff443bb6b447
                1bdfb1cc81bc9625f4697b0336ff4665
                d15f152f02169448d8a967fb06359a87
                d2145398de044ee92acfcc27b7a98f38
                712b60c28c

   KID:         0x7
   CTR:         0x2
   Header:      0702
   Nonce:       31ed26f90a072e6aee64629a
   Ciphertext:  0702649691ba27c4c01a41280fba4657
                c03fa7fe21c8f5c862e9094227c3ca3e
                c0d9468b1a2cb060ff0978f25a24e6b1
                06f5a6e10534b69d975605f31534caea
                88b33b455a

   KID:         0xf
   CTR:         0xaa
   Header:      080faa
   Nonce:       31ed26f90a072e6aee646232
   Ciphertext:  080faa2858c10b5ddd231c1f26819490
                521678603a050448d563c503b1fd890d
                02ead01d754f074ecb6f32da9b2f3859
                f380b4f47d4ed539d6103e61580a82c0
                14b28eb48b4a

   KID:         0x1ff
   CTR:         0xaa
   Header:      0901ffaa
   Nonce:       31ed26f90a072e6aee646232
   Ciphertext:  0901ffaa2858c10b5ddd231c1f268194
                90521678603a050448d563c503b1fd89
                0d02ead01d754f074ecb6f32da9b2f38
                59f380b4f47d4e32c565b3b3fa20fc7e
                cff21a1cee3eec

   KID:         0x1ff
   CTR:         0xaaaa
   Header:      1901ffaaaa
   Nonce:       31ed26f90a072e6aee64c832
   Ciphertext:  1901ffaaaad9bc6a258a07d210a814d5
                45eca70321c0e87498ada6e5c708b7ea
                d162ffcf4fbaba1eb82650590a87122b
                4d95fe36bd88b278994922fe5c09f14c
                728521333297f84f

   KID:         0xffffffffffffff
   CTR:         0xffffffffffffff
   Header:      6effffffffffffffffffffffffffff
   Nonce:       31ed26f90af8d195119b9d67
   Ciphertext:  6effffffffffffffffffffffffffffaf
                480d4779ce0c02b5137ee6a61e026c04
                ac999cb0c97319feceeb258d58df23bc
                e14979e5c67a431777b34498062e72f9
                39ca4acb471bad80259bb44f78a15248
                7e67

Authors' Addresses

   Emad Omara
   Apple
   Email: eomara@apple.com


   Justin Uberti
   Google
   Email: juberti@google.com


   Sergio Garcia Murillo
   CoSMo Software
   Email: sergio.garcia.murillo@cosmosoftware.io


   Richard L. Barnes (editor)
   Cisco
   Email: rlb@ipv.sx


   Youenn Fablet
   Apple
   Email: youenn@apple.com