Binary Structured HTTP Headers

Fastly
mnot@mnot.net
https://www.mnot.net/
General
Internet-Draft

This specification defines a binary serialisation of Structured Headers for HTTP, along with a negotiation mechanism for its use in HTTP/2. It also defines how to use Structured Headers for many existing headers – thereby “backporting” them – when supported by two peers.

RFC EDITOR: please remove this section before publication

The issues list for this draft can be found at https://github.com/mnot/I-D/labels/binary-structured-headers.

The most recent (often, unpublished) draft is at https://mnot.github.io/I-D/binary-structured-headers/.

Recent changes are listed at https://github.com/mnot/I-D/commits/gh-pages/binary-structured-headers.

See also the draft’s current status in the IETF datatracker, at https://datatracker.ietf.org/doc/draft-nottingham-binary-structured-headers/.

HTTP messages often pass through several systems – clients, intermediaries, servers, and subsystems of each – that parse and process their header and trailer fields. This repeated parsing (and often re-serialisation) adds latency and consumes CPU, energy, and other resources.

Structured Headers for HTTP offers a set of data types that new headers can combine to express their semantics. This specification defines a binary serialisation of those structures in , and specifies its use in HTTP/2 – specifically, as part of HPACK Literal Header Field Representations () – in . defines how to use Structured Headers for many existing headers when supported by two peers.

The primary goal of this specification is to reduce parsing overhead and associated costs, as compared to the textual representation of Structured Headers. A secondary goal is a more compact wire format in common situations. An additional goal is to enable future work on more granular header compression mechanisms.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
“RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as
described in BCP 14 when, and only when, they appear in all capitals, as
shown here.

This section defines a binary serialisation for the Structured Header Types defined in .

The types permissible as the top-level of Structured Header field values – Dictionary, List, and Item – are defined in terms of a Binary Literal Representation (), which is a replacement for the String Literal Representation in .

Binary representations of the remaining types are defined in .

The Binary Literal Representation is a replacement for the String Literal Representation defined in , Section 5.2, for use in BINHEADERS frames ().

A binary literal representation contains the following fields:

Type: Four bits indicating the type of the payload.
PLength: The number of octets used to represent the payload, encoded as per , Section 5.1, with a 4-bit prefix.
Payload Data: The payload, as per below.

The following payload types are defined:

List values (type=0x1) have a payload consisting of a stream of Binary Structured Types representing the members of the list. Members that are Items are represented as per ; members that are inner-lists are represented as per .

If any member cannot be represented, the entire field value MUST be serialised as a String Literal ().

Dictionary values (type=0x2) have a payload consisting of a stream of members.

Each member is represented by a key length, followed by that many bytes of the member-name, followed by Binary Structured Types representing the member-value.

A member’s fields are:

KL: The number of octets used to represent the member-name, encoded as per , Section 5.1, with an 8-bit prefix.
member-name: KL octets of the member-name.
member-value: One or more Binary Structured Types.

Member-values that are Items are represented as per ; member-values that are inner-lists are represented as per .

If any member cannot be represented, the entire field value MUST be serialised as a String Literal ().

Item values (type=0x3) have a payload consisting of Binary Structured Types, as described in .

String Literals (type=0x4) are the string value of a header
field; they are used to carry header field values that are not Binary Structured Headers, and may not be Structured Headers at all. As such, their semantics are that of String Literal Representations in , Section 5.2.

Their payload is the octets of the field value.

ISSUE: use Huffman coding? https://github.com/mnot/I-D/issues/305

Every Binary Structured Type starts with a 5-bit type field that identifies the format of its payload:

Some Binary Structured Types contain padding bits; senders MUST set padding bits to 0; recipients MUST ignore their values.

The Inner List data type (type=0x1) has a payload in the format:

Its fields are:

L: The number of octets used to represent the members, encoded as per , Section 5.1, with a 3-bit prefix.
Members: L octets.

Each member of the list will be represented as an Item (); if any member cannot, the entire field value will be serialised as a String Literal ().

The inner list’s parameters, if present, are serialised in a following Parameter type (); they do not form part of the payload of the inner list.

The Parameters data type (type=0x2) has a payload in the format:

Its fields are:

L: The number of octets used to represent the parameters, encoded as per , Section 5.1, with a 3-bit prefix.
Parameters: L octets.

Each parameter is represented by a key length, followed by that many bytes of the parameter-name, followed by a Binary Structured Type representing the parameter-value.

A parameter’s fields are:

KL: The number of octets used to represent the parameter-name, encoded as per , Section 5.1, with an 8-bit prefix.
parameter-name: KL octets of the parameter-name.
parameter-value: A Binary Structured Type representing a bare item ().

Parameter-values are bare items; that is, they MUST NOT have parameters themselves.

If the parameters cannot be represented, the entire field value will be serialised as a String Literal ().

Parameters are always associated with the Binary Structured Type that immediately preceded them.
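The length fields described so far (PLength, KL, and L) are all prefixed integers. As an illustration only, the following Python sketch assumes that the "Section 5.1" encoding referred to is the N-bit prefix integer representation of HPACK (RFC 7541, Section 5.1). The function names and the `high_bits` argument are hypothetical and are not defined by this specification.

```python
from typing import Tuple


def encode_prefix_int(value: int, prefix_bits: int, high_bits: int = 0) -> bytes:
    """Encode a non-negative integer with an N-bit prefix.

    high_bits holds whatever occupies the remaining upper bits of the
    first octet (for example a type code); it is an illustrative helper
    argument, not a field named by the specification.
    """
    if value < 0:
        raise ValueError("prefixed integers are non-negative")
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix must be between 1 and 8 bits")
    max_prefix = (1 << prefix_bits) - 1
    first = (high_bits << prefix_bits) & 0xFF
    if value < max_prefix:
        # The value fits in the prefix itself.
        return bytes([first | value])
    # Otherwise the prefix is all ones and the remainder follows in
    # 7-bit groups, least significant group first.
    out = bytearray([first | max_prefix])
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def decode_prefix_int(data: bytes, prefix_bits: int, offset: int = 0) -> Tuple[int, int]:
    """Decode a prefixed integer; returns (value, offset of the next octet)."""
    max_prefix = (1 << prefix_bits) - 1
    value = data[offset] & max_prefix
    offset += 1
    if value < max_prefix:
        return value, offset
    shift = 0
    while True:
        octet = data[offset]
        offset += 1
        value += (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:  # continuation bit clear: last octet
            return value, offset
```

With an 8-bit prefix, as used for KL, any key length below 255 occupies a single octet; with a 3-bit prefix, as used for L, lengths of 7 or more spill into continuation octets.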
If parameters are not explicitly allowed on the preceding type, or there is no preceding type, it is an error.

ISSUE: use Huffman coding for parameter-name? https://github.com/mnot/I-D/issues/305

Individual Structured Header Items can be represented using the Binary Payload Types defined below.

The item’s parameters, if present, are serialised in a following Parameter type (); they do not form part of the payload of the item.

The Integer data type (type=0x3) has a payload in the format:

Its fields are:

S: sign bit; 0 is negative, 1 is positive.
Integer: The integer, encoded as per , Section 5.1, with a 2-bit prefix.

The Float data type (type=0x4) has a payload in the format:

Its fields are:

S: sign bit; 0 is negative, 1 is positive.
Integer: The integer component, encoded as per , Section 5.1, with a 2-bit prefix.
Fractional: The fractional component, encoded as per , Section 5.1, with an 8-bit prefix.

The String data type (type=0x5) has a payload in the format:

Its fields are:

L: The number of octets used to represent the string, encoded as per , Section 5.1, with a 3-bit prefix.
String: L octets.

ISSUE: use Huffman coding? https://github.com/mnot/I-D/issues/305

The Token data type (type=0x6) has a payload in the format:

Its fields are:

L: The number of octets used to represent the token, encoded as per , Section 5.1, with a 3-bit prefix.
Token: L octets.

ISSUE: use Huffman coding? https://github.com/mnot/I-D/issues/305

The Byte Sequence data type (type=0x7) has a payload in the format:

Its fields are:

L: The number of octets used to represent the byte sequence, encoded as per , Section 5.1, with a 3-bit prefix.
Byte Sequence: L octets.

The Boolean data type (type=0x8) has a payload of two bits:

If B is 0, the value is False; if B is 1, the value is True.
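As a rough illustration of the Item types above, the following Python sketch serialises an Integer and a String. It assumes that fields are packed most-significant-bit first in the order they are listed (5-bit type, then the sign bit where present, then the prefixed integer), and that the prefixed integers follow HPACK (RFC 7541, Section 5.1). Both are assumptions of this sketch, and all function names are hypothetical.

```python
def _prefix_int(value: int, prefix_bits: int, high_bits: int) -> bytes:
    """HPACK-style N-bit prefix integer; high_bits fills the rest of octet 0."""
    max_prefix = (1 << prefix_bits) - 1
    first = (high_bits << prefix_bits) & 0xFF
    if value < max_prefix:
        return bytes([first | value])
    out = bytearray([first | max_prefix])
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


TYPE_INTEGER = 0x3  # type codes as given in the text
TYPE_STRING = 0x5


def encode_integer(n: int) -> bytes:
    """Assumed layout: type (5 bits), S (1 bit), magnitude (2-bit prefix)."""
    sign = 0 if n < 0 else 1  # per the text: 0 is negative, 1 is positive
    return _prefix_int(abs(n), 2, (TYPE_INTEGER << 1) | sign)


def encode_string(s: str) -> bytes:
    """Assumed layout: type (5 bits), L (3-bit prefix), then L octets."""
    raw = s.encode("ascii")  # Structured Headers strings are ASCII
    return _prefix_int(len(raw), 3, TYPE_STRING) + raw


if __name__ == "__main__":
    print(encode_integer(42).hex())
    print(encode_string("hi").hex())
```

Under these assumptions a small integer such as 2 fits in a single octet, while anything of magnitude 3 or more needs at least one continuation octet because of the 2-bit prefix.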
X is padding.

When both peers on a connection support this specification, they can take advantage of that knowledge to serialise headers that they know to be Structured Headers (or compatible with them; see ).

Peers advertise and discover this support using an HTTP/2 setting defined in , and convey Binary Structured Headers in a frame type defined in .

Advertising support for Binary Structured Headers is accomplished using an HTTP/2 setting, SETTINGS_BINARY_STRUCTURED_HEADERS (0xTODO).

Receiving SETTINGS_BINARY_STRUCTURED_HEADERS from a peer indicates that:

The peer supports the Binary Structured Types defined in .
The peer will process the BINHEADERS frames as defined in .
When a downstream consumer does not likewise support that encoding, the peer will transform them into HEADERS frames (if the peer is HTTP/2) or a form it will understand (e.g., the textual representation of Structured Headers data types defined in ).
The peer will likewise transform all fields defined as Aliased Fields () into their non-aliased forms as necessary.

The default value of SETTINGS_BINARY_STRUCTURED_HEADERS is 0.
Future extensions to Structured Headers might use it to indicate support for new types.

When a peer has indicated that it supports this specification, a sender can send the BINHEADERS Frame Type (0xTODO).

The BINHEADERS Frame Type behaves and is represented exactly as a HEADERS Frame type (, Section 6.2), with one exception: instead of using the String Literal Representation defined in , Section 5.2, it uses the Binary Literal Representation defined in .

Fields that are Structured Headers can have their values represented using the Binary Literal Representation corresponding to that header’s top-level type – List, Dictionary, or Item; their values will then be serialised as a stream of Binary Structured Types.

Additionally, any field (including those defined as Structured Headers) can be serialised as a String Literal (), which accommodates headers that are not defined as Structured Headers, not valid Structured Headers, or that the sending implementation does not wish to send as Binary Structured Types for some other reason.

Note that Field Names are always serialised as String Literals ().

This means that a BINHEADERS frame can be converted to a HEADERS frame by converting the field values to the string representations of the various Structured Headers Types, and String Literals () to their string counterparts.

Conversely, a HEADERS frame can be converted to a BINHEADERS frame by encoding all of the Literal field values as Binary Structured Types. In this case, the header types used are informed by the implementation’s knowledge of the individual header field semantics; see .
Those which it cannot (due to either lack of knowledge or an error) or does not wish to convert into Structured Headers are conveyed in BINHEADERS as String Literals ().

Field values are stored in the HPACK dynamic table without Huffman encoding, although specific Binary Structured Types might specify the use of such encodings.

Note that BINHEADERS and HEADERS frames MAY be mixed on the same connection, depending on the requirements of the sender. Also, note that only the field values are encoded as Binary Structured Types; field names are encoded as they are in HPACK.

Any header field can potentially be parsed as a Structured Header according to the algorithms in and serialised as a Binary Structured Header. However, many cannot, so optimistically parsing them can be expensive.

This section identifies fields that will usually succeed in , and those that can be mapped into Structured Headers by using an alias field name in .

The following HTTP field names can have their values parsed as Structured Headers according to the algorithms in , and thus can usually be serialised using the corresponding Binary Structured Types.

When one of these fields’ values cannot be represented using Structured Types, its value can instead be represented as a String Literal ().

Accept - List
Accept-Encoding - List
Accept-Language - List
Accept-Patch - List
Accept-Ranges - List
Access-Control-Allow-Credentials - Item
Access-Control-Allow-Headers - List
Access-Control-Allow-Methods - List
Access-Control-Allow-Origin - Item
Access-Control-Max-Age - Item
Access-Control-Request-Headers - List
Access-Control-Request-Method - Item
Age - Item
Allow - List
ALPN - List
Alt-Svc - Dictionary
Alt-Used - Item
Cache-Control - Dictionary
Connection - List
Content-Encoding - List
Content-Language - List
Content-Length - Item
Content-Type - Item
Expect - Item
Expect-CT - Dictionary
Forwarded - Dictionary
Host - Item
Keep-Alive - Dictionary
Origin - Item
Pragma - Dictionary
Prefer - Dictionary
Preference-Applied - Dictionary
Retry-After - Item (see caveat below)
Surrogate-Control - Dictionary
TE - List
Trailer - List
Transfer-Encoding - List
Vary - List
X-Content-Type-Options - Item
X-XSS-Protection - List

Note that only the delta-seconds form of Retry-After is supported; a Retry-After value containing an http-date will need to be either converted into delta-seconds or serialised as a String Literal ().

The following HTTP field names can have their values represented in Structured Headers by mapping them into its data types and then serialising the resulting Structured Header using an alternative field name.

For example, the Date HTTP header field carries an http-date, which is a string representing a date:

Its value is more efficiently represented as an integer number of delta seconds from the Unix epoch (00:00:00 UTC on 1 January 1970, minus leap seconds). Thus, the example above would be represented in (non-binary) Structured Headers as:

As with directly represented fields, if the intended value of an aliased field cannot be represented using Structured Types successfully, its value can instead be represented as a String Literal ().

Note that senders MUST know that the next-hop recipient understands these fields (typically, using the negotiation mechanism defined in ) before using them. Likewise, recipients MUST transform them back to their unaliased form before forwarding the message to a peer or other consuming components that do not have this capability.

Each field name listed below indicates a replacement field name and a way to map its value to Structured Headers.

ISSUE: using separate names assures that the different syntax doesn’t “leak” into normal headers, but it isn’t strictly necessary if implementations always convert back to the correct form when giving it to peers or consuming software that doesn’t understand this.
https://github.com/mnot/I-D/issues/307

The following field names (paired with their replacement field names) have values that can be represented in Binary Structured Headers by considering their payload a string.

Content-Location - SH-Content-Location
Location - SH-Location
Referer - SH-Referer

For example, a (non-binary) Location:

TODO: list of strings, one for each path segment, to allow better compression in the future?

The following field names (paired with their replacement field names) have values that can be represented in Binary Structured Headers by parsing their payload according to , Section 7.1.1.1, and representing the result as an integer number of seconds delta from the Unix Epoch (00:00:00 UTC on 1 January 1970, minus leap seconds).

Date - SH-Date
Expires - SH-Expires
If-Modified-Since - SH-IMS
If-Unmodified-Since - SH-IUS
Last-Modified - SH-LM

For example, a (non-binary) Expires:

The following field names (paired with their replacement field names) have values that can be represented in Binary Structured Headers by representing the entity-tag as a string, and the weakness flag as a boolean “w” parameter on it, where true indicates that the entity-tag is weak; if 0 or unset, the entity-tag is strong.

ETag - SH-ETag

For example, a (non-binary) ETag:

If-None-Match is a list of the structure described above.

If-None-Match - SH-INM

For example, a (non-binary) If-None-Match:

The field-value of the Link header field can be represented in Binary Structured Headers by representing the URI-Reference as a string, and link-param as parameters.

Link: SH-Link

For example, a (non-binary) Link:

The field-value of the Cookie and Set-Cookie fields can be represented in Binary Structured Headers as a List with parameters and a Dictionary, respectively. The serialisation is almost identical, except that the Expires parameter is always a string (as it can contain a comma), multiple cookie-strings can appear in Set-Cookie, and cookie-pairs are delimited in Cookie by a comma, rather than a semicolon.

Set-Cookie: SH-Set-Cookie
Cookie: SH-Cookie

ISSUE: explicitly convert Expires to an integer? https://github.com/mnot/I-D/issues/308
ISSUE: dictionary keys cannot contain UC alpha. https://github.com/mnot/I-D/issues/312
ISSUE: explicitly allow non-string content. https://github.com/mnot/I-D/issues/313

ISSUE: todo

As is so often the case, having alternative representations of data brings the potential for security weaknesses, when attackers exploit the differences between those representations and their handling.

One mitigation to this risk is the strictness of parsing for both non-binary and binary Structured Headers data types, along with the “escape valve” of String Literals (). Therefore, implementation divergence from this strictness can have security impact.

Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

Structured Headers for HTTP
This document describes a set of data types and associated algorithms that are intended to make it easier and safer to define and handle HTTP header fields. It is intended for use by specifications of new HTTP header fields that wish to use a common syntax that is more restrictive than traditional HTTP field values.

HPACK: Header Compression for HTTP/2
This specification defines HPACK, a compression format for efficiently representing HTTP header fields, to be used in HTTP/2.

Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.

Hypertext Transfer Protocol Version 2 (HTTP/2)
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2). HTTP/2 enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients. This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.

Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document provides an overview of HTTP architecture and its associated terminology, defines the "http" and "https" Uniform Resource Identifier (URI) schemes, defines the HTTP/1.1 message syntax and parsing requirements, and describes related security concerns for implementations.

Web Linking
This specification defines a model for the relationships between resources on the Web ("links") and the type of those relationships ("link relation types"). It also defines the serialisation of such links in HTTP headers with the Link header field.

HTTP State Management Mechanism
This document defines the HTTP Cookie and Set-Cookie header fields. These header fields can be used by HTTP servers to store state (called cookies) at HTTP user agents, letting the servers maintain a stateful session over the mostly stateless HTTP protocol. Although cookies have many historical infelicities that degrade their security and privacy, the Cookie and Set-Cookie header fields are widely used on the Internet. This document obsoletes RFC 2965.
[STANDARDS-TRACK]

RFC EDITOR: please remove this section before publication

To help guide decisions about Directly Represented Fields, the HTTP response headers captured by the HTTP Archive https://httparchive.org in February 2020, representing more than 350,000,000 HTTP exchanges, were parsed as Structured Headers using the types listed in , with the indicated number of successful header instances, failures, and the resulting failure rate:

accept: 9,201 / 10 = 0.109%
accept-encoding: 34,158 / 74 = 0.216%
accept-language: 381,037 / 512 = 0.134%
accept-patch: 5 / 0 = 0.000%
accept-ranges: 197,759,320 / 3,960 = 0.002%
access-control-allow-credentials: 16,687,349 / 7,357 = 0.044%
access-control-allow-headers: 12,979,869 / 14,960 = 0.115%
access-control-allow-methods: 15,469,948 / 28,203 = 0.182%
access-control-allow-origin: 105,326,437 / 264,278 = 0.250%
access-control-max-age: 5,287,263 / 7,749 = 0.146%
access-control-request-headers: 39,340 / 624 = 1.561%
access-control-request-method: 146,566 / 13,822 = 8.618%
age: 71,292,543 / 168,572 = 0.236%
allow: 351,707 / 1,886 = 0.533%
alt-svc: 19,777,530 / 15,682,026 = 44.225%
cache-control: 264,666,876 / 946,434 = 0.356%
connection: 105,884,722 / 2,915 = 0.003%
content-encoding: 139,812,089 / 379 = 0.000%
content-language: 2,368,912 / 728 = 0.031%
content-length: 296,649,810 / 787,897 = 0.265%
content-type: 341,948,525 / 794,864 = 0.232%
expect: 0 / 47 = 100.000%
expect-ct: 26,573,905 / 29,117 = 0.109%
forwarded: 119 / 35 = 22.727%
host: 25,335 / 1,441 = 5.382%
keep-alive: 43,063,257 / 796 = 0.002%
origin: 24,336 / 1,539 = 5.948%
pragma: 46,826,446 / 81,707 = 0.174%
preference-applied: 57 / 0 = 0.000%
retry-after: 605,926 / 6,194 = 1.012%
strict-transport-security: 26,826,043 / 35,266,676 = 56.797%
surrogate-control: 121,124 / 861 = 0.706%
te: 1 / 0 = 0.000%
trailer: 282 / 0 = 0.000%
transfer-encoding: 13,953,547 / 0 = 0.000%
vary: 150,802,211 / 41,317 = 0.027%
x-content-type-options: 99,982,040 / 203,824 = 0.203%
x-xss-protection: 79,878,780 / 362,984 = 0.452%

This data set focuses on response headers, although some request headers are present (because, the Web).

alt-svc has a high failure rate because some currently-used ALPN tokens (e.g., h3-Q43) do not conform to key’s syntax. Since the final version of HTTP/3 will use the h3 token, this shouldn’t be a long-term issue, although future tokens may again violate this assumption.

forwarded has a high failure rate because many senders use the unquoted form for IP addresses, which makes integer parsing fail; e.g., for=192.168.1.1.

strict-transport-security has a high failure rate because the includeSubDomains flag does not conform to the key syntax.

The most common problem causing failure for many other headers is duplicated values; e.g., a Content-Length with 2, 2.

The top ten header fields in that data set that were not parsed as Directly Represented Fields are:

date: 354,682,923
server: 311,299,092
last-modified: 263,851,521
expires: 199,985,746
status: 192,439,616
etag: 172,071,630
timing-allow-origin: 64,413,748
x-cache: 41,743,978
p3p: 39,495,307
x-frame-options: 34,041,316
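The failure rates in the per-field list above appear to be computed as failures divided by the total number of instances (successes plus failures), rather than failures divided by successes. The following Python sketch checks that reading against three of the listed rows; the function name is hypothetical and the formula is an inference from the published figures, not something the text states.

```python
def failure_rate(successes: int, failures: int) -> float:
    """Percentage of parse attempts that failed, out of all attempts."""
    total = successes + failures
    if total == 0:
        raise ValueError("no header instances observed")
    return 100.0 * failures / total


# (successes, failures) copied from the list above.
rows = {
    "accept": (9_201, 10),
    "alt-svc": (19_777_530, 15_682_026),
    "forwarded": (119, 35),
}

if __name__ == "__main__":
    for name, (ok, bad) in rows.items():
        print(f"{name}: {failure_rate(ok, bad):.3f}%")
```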