Binary Representation of HTTP Messages

Mozilla

mt@lowentropy.net

Cloudflare

caw@heapingbits.net

ART

HTTPBIS

This document defines a binary format for representing HTTP messages.

Discussion Venues

Discussion of this document takes place on the HTTP Working Group mailing list (http@ietf.org), which is archived at . Source for this draft and an issue tracker can be found at .

Introduction

This document defines a simple format for representing an HTTP message (as defined in HTTP Semantics), either request or response. This allows for the encoding of HTTP messages that can be conveyed outside of an HTTP protocol. This enables the transformation of entire messages, including the application of authenticated encryption.

This format is informed by the framing structure of HTTP/2 and HTTP/3. In comparison, this format is simpler by virtue of not including either header compression (HPACK, QPACK) or a generic framing layer.

This format provides an alternative to the message/http content type defined in HTTP/1.1. A binary format permits more efficient encoding and processing of messages. A binary format also reduces exposure to security problems related to processing of HTTP messages.

Two modes for encoding are described:

a known-length encoding includes length prefixes for all major message components; and
an indefinite-length encoding enables efficient generation of messages where lengths are not known when encoding starts.

Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

This document uses terminology from HTTP Semantics and notation from QUIC.

Format

An HTTP message is split into the following sections, based on the structure defined in Section 6 of HTTP Semantics:

Framing indicator. This format uses a single integer to describe framing, indicating whether the message is a request or response and how subsequent sections are formatted; see the Framing Indicator section.
Informational responses. For a response, any number of interim responses, each consisting of an informational status code and a header section.
Control data. For a request, this contains the request method and target. For a response, this contains the status code.
Header section. This contains zero or more header fields.
Content. This is a sequence of zero or more bytes.
Trailer section. This contains zero or more trailer fields.

All lengths and numeric values are encoded using the variable-length integer encoding from QUIC.
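The following Python sketch, offered as an illustration rather than a reference implementation, shows the QUIC variable-length integer encoding: the two most significant bits of the first byte select a 1-, 2-, 4-, or 8-byte form carrying 6, 14, 30, or 62 bits of value. The names `encode_varint` and `decode_varint` are hypothetical helpers, not part of this specification.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using the shortest QUIC varint form."""
    if not 0 <= value < 1 << 62:
        raise ValueError("value does not fit in a 62-bit varint")
    if value < 1 << 6:
        return value.to_bytes(1, "big")  # prefix 0b00: 1 byte, 6 bits
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")  # prefix 0b01: 2 bytes
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")  # prefix 0b10: 4 bytes
    return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")  # prefix 0b11


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode the varint starting at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated: no varint at this position")
    length = 1 << (buf[pos] >> 6)  # 1, 2, 4 or 8 bytes
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:end], "big")
    return raw & ((1 << (8 * length - 2)) - 1), end  # clear the 2-bit prefix


# A status code of 200 needs the two-byte form.
print(encode_varint(200).hex())  # 40c8
```

The encoder always picks the shortest form, while the decoder accepts any form; for example, 37 written in two bytes (0x4025) still decodes to 37.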

Known Length Messages

A message that has a known length at the time of construction uses the format shown in the Known-Length Message figure.

Known-Length Message

That is, a known-length message consists of a framing indicator, a block of control data that is formatted according to the value of the framing indicator, a header section with a length prefix, binary content with a length prefix, and a trailer section with a length prefix. Response messages that contain informational status codes result in a different structure; see the Informational Status Codes section.

Fields in the header and trailer sections consist of a length-prefixed name and a length-prefixed value. Both name and value are sequences of bytes that cannot be zero length.

The format allows for the message to be truncated before any of the length prefixes that precede the field sections or content. This reduces the overall message size when the trailing sections are empty. A message that is truncated at any other point is invalid; see the Invalid Messages section.

The variable-length integer encoding means that there is a limit of 2^62-1 bytes for each field section and for the message content.
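A minimal Python sketch of the known-length layout that follows the framing indicator and control data. It assumes fields are lists of `(name, value)` byte-string pairs with names already lowercased, and it applies the truncation rule by dropping trailing sections whose encoding is a lone zero length. All function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Shortest QUIC variable-length integer encoding of value."""
    for prefix, size in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | prefix << (8 * size - 8)).to_bytes(size, "big")
    raise ValueError("value out of range for a QUIC varint")


def encode_known_field_section(fields: list[tuple[bytes, bytes]]) -> bytes:
    """Length prefix, then field lines (name length, name, value length, value)."""
    body = bytearray()
    for name, value in fields:
        if not name or not value:
            raise ValueError("field names and values must not be zero length")
        body += encode_varint(len(name)) + name + encode_varint(len(value)) + value
    return encode_varint(len(body)) + bytes(body)


def encode_known_length_tail(headers, content: bytes, trailers, truncate: bool = True) -> bytes:
    """Header section, content and trailer section of a known-length message."""
    parts = [
        encode_known_field_section(headers),
        encode_varint(len(content)) + content,
        encode_known_field_section(trailers),
    ]
    # Truncation is only allowed right before a length prefix, so only
    # trailing parts that are nothing but a zero length may be dropped.
    while truncate and parts and parts[-1] == b"\x00":
        parts.pop()
    return b"".join(parts)
```

For example, `encode_known_length_tail([(b"accept", b"*/*")], b"", [])` returns the bytes `0b06616363657074032a2f2a`: the header section alone, with the zero-length content and trailer section truncated away.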

Indeterminate Length Messages

A message that is constructed without encoding a known length for each section uses the format shown in the Indeterminate-Length Message figure.

Indeterminate-Length Message

That is, an indeterminate-length message consists of a framing indicator, a block of control data that is formatted according to the value of the framing indicator, a header section that is terminated by a zero value, any number of non-zero-length chunks of binary content, a zero value, and a trailer section that is terminated by a zero value. Response messages that contain informational status codes result in a different structure; see the Informational Status Codes section.

Indeterminate-length messages can be truncated in a similar way to known-length messages. Truncation can occur after the control data, or after the Content Terminator field that ends a field section or sequence of content chunks. A message that is truncated at any other point is invalid; see the Invalid Messages section.

Indeterminate-length messages use the same encoding for field lines as known-length messages; see the Header and Trailer Field Lines section.
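The indeterminate-length variant can be sketched the same way in Python (all helper names hypothetical). Each field section and the content end with a zero value; empty chunks are skipped because a zero-length chunk would be indistinguishable from the terminator. Trailing sections that consist only of a terminator are dropped, which corresponds to truncating after the preceding terminator.

```python
def encode_varint(value: int) -> bytes:
    """Shortest QUIC variable-length integer encoding of value."""
    for prefix, size in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | prefix << (8 * size - 8)).to_bytes(size, "big")
    raise ValueError("value out of range for a QUIC varint")


def encode_indeterminate_field_section(fields: list[tuple[bytes, bytes]]) -> bytes:
    """Field lines followed by a zero value.

    Names are never empty, so a zero name length unambiguously ends the section.
    """
    out = bytearray()
    for name, value in fields:
        if not name or not value:
            raise ValueError("field names and values must not be zero length")
        out += encode_varint(len(name)) + name + encode_varint(len(value)) + value
    out += b"\x00"
    return bytes(out)


def encode_indeterminate_content(chunks: list[bytes]) -> bytes:
    """Length-prefixed non-empty chunks followed by a zero value."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # a zero-length chunk would be read as the terminator
            out += encode_varint(len(chunk)) + chunk
    out += b"\x00"
    return bytes(out)


def encode_indeterminate_tail(headers, chunks, trailers, truncate: bool = True) -> bytes:
    """Header section, content and trailer section of an indeterminate-length message."""
    parts = [
        encode_indeterminate_field_section(headers),
        encode_indeterminate_content(chunks),
        encode_indeterminate_field_section(trailers),
    ]
    # A part equal to a lone terminator is empty; trailing empty parts can go.
    while truncate and parts and parts[-1] == b"\x00":
        parts.pop()
    return b"".join(parts)
```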

Framing Indicator

The start of each message is a framing indicator, a single integer that describes the structure of the subsequent sections. The framing indicator can take only four values:

A value of 0 describes a request of known length.
A value of 1 describes a response of known length.
A value of 2 describes a request of indeterminate length.
A value of 3 describes a response of indeterminate length.

Other values cause the message to be invalid; see the Invalid Messages section.
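As a sketch, a parser might decode and check the framing indicator as follows. The `Framing` type and `read_framing_indicator` function are hypothetical; the two checks simply restate the four values listed above (even values are requests, values below 2 are known-length).

```python
from typing import NamedTuple


class Framing(NamedTuple):
    is_request: bool
    known_length: bool


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC varint at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated message")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), pos + size


def read_framing_indicator(buf: bytes) -> tuple[Framing, int]:
    """Interpret the leading framing indicator; any other value is invalid."""
    value, pos = decode_varint(buf, 0)
    if value > 3:
        raise ValueError(f"invalid message: framing indicator {value}")
    # 0 and 2 are requests, 1 and 3 are responses; 0 and 1 are known-length.
    return Framing(is_request=value % 2 == 0, known_length=value < 2), pos
```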

Request Control Data

The control data for a request message includes four values that correspond to the values of the :method, :scheme, :authority, and :path pseudo-header fields described in HTTP/2 (Section 8.1.2.3 of HTTP/2). These fields are encoded, each with a length prefix, in the order listed. The rules in Section 8.1.2.3 of HTTP/2 for constructing pseudo-header fields apply to the construction of these values. However, where the :authority pseudo-header field might be omitted in HTTP/2, a zero-length value is encoded instead. The format of request control data is shown in the Format of Request Control Data figure.

Format of Request Control Data
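A hypothetical Python sketch of request control data: four length-prefixed values in the order :method, :scheme, :authority, :path, with a missing authority carried as a zero-length value. The function names and the example values below are illustrative only and are not taken from this document's figures.

```python
def encode_varint(value: int) -> bytes:
    """Shortest QUIC variable-length integer encoding of value."""
    for prefix, size in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | prefix << (8 * size - 8)).to_bytes(size, "big")
    raise ValueError("value out of range for a QUIC varint")


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC varint at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated message")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), pos + size


def encode_request_control_data(method: bytes, scheme: bytes, authority: bytes, path: bytes) -> bytes:
    """Length-prefixed :method, :scheme, :authority and :path, in that order."""
    # An authority that HTTP/2 would omit is passed as b"" (zero length).
    return b"".join(encode_varint(len(v)) + v for v in (method, scheme, authority, path))


def decode_request_control_data(buf: bytes, pos: int) -> tuple[tuple[bytes, ...], int]:
    """Read the four length-prefixed values starting at buf[pos]."""
    values = []
    for _ in range(4):
        length, pos = decode_varint(buf, pos)
        if pos + length > len(buf):
            raise ValueError("truncated control data")
        values.append(buf[pos:pos + length])
        pos += length
    return tuple(values), pos
```

For instance, `encode_request_control_data(b"GET", b"https", b"", b"/hello.txt")` produces `03 "GET" 05 "https" 00 0a "/hello.txt"`, where the `00` is the empty authority.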

Response Control Data

The control data for a response message includes a single field that corresponds to the :status pseudo-header field in HTTP/2. This field is encoded as a single variable-length integer, not a decimal string. The format of final response control data is shown in the Format of Final Response Control Data figure.

Format of Final Response Control Data
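Because the status code is carried as a variable-length integer, every valid code (100 to 599) occupies exactly two bytes. For example, 200 becomes `40 c8`, which appears in the Binary Response including Interim Responses example. A minimal Python sketch, with hypothetical function names:

```python
def encode_varint(value: int) -> bytes:
    """Shortest QUIC variable-length integer encoding of value."""
    for prefix, size in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | prefix << (8 * size - 8)).to_bytes(size, "big")
    raise ValueError("value out of range for a QUIC varint")


def encode_response_control_data(status: int) -> bytes:
    """Status code as a varint instead of a three-digit decimal string."""
    # HTTP Semantics only defines status codes in the range 100..599.
    if not 100 <= status <= 599:
        raise ValueError(f"invalid status code {status}")
    return encode_varint(status)


def is_informational(status: int) -> bool:
    """1xx codes are followed by a header section and more control data."""
    return 100 <= status <= 199


print(encode_response_control_data(200).hex())  # 40c8
print(encode_response_control_data(103).hex())  # 4067
```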

Informational Status Codes

This format supports informational status codes (see Section 15.2 of HTTP Semantics). Responses that include informational status codes are encoded by repeating the response control data and associated header section until the final status code is encoded. The format of the informational response control data is shown in the Format of Informational Response Control Data figure.

Format of Informational Response Control Data

A response message can include any number of informational responses. If the response control data includes an informational status code (that is, a value between 100 and 199 inclusive), the control data is followed by a header section, encoded as known-length or indeterminate-length according to the framing indicator. After the header section, another response control data block follows.
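The loop described above, reading response control data and header sections until a non-1xx status appears, might look like this in Python. It is a sketch that assumes the buffer position is just after the framing indicator; the function names are hypothetical.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC varint at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated message")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), pos + size


def read_prefixed(buf: bytes, pos: int, limit: int) -> tuple[bytes, int]:
    """Read a length-prefixed byte string that must end at or before limit."""
    length, pos = decode_varint(buf, pos)
    if pos + length > limit:
        raise ValueError("field line overruns its section")
    return buf[pos:pos + length], pos + length


def read_field_section(buf: bytes, pos: int, known_length: bool):
    """Return (fields, next position) for either field-section encoding."""
    fields = []
    if known_length:
        size, pos = decode_varint(buf, pos)
        end = pos + size
        if end > len(buf):
            raise ValueError("truncated field section")
        while pos < end:
            name, pos = read_prefixed(buf, pos, end)
            value, pos = read_prefixed(buf, pos, end)
            if not name or not value:
                raise ValueError("zero-length field name or value")
            fields.append((name, value))
        return fields, pos
    while True:
        name, pos = read_prefixed(buf, pos, len(buf))
        if not name:  # zero value: end of the section
            return fields, pos
        value, pos = read_prefixed(buf, pos, len(buf))
        if not value:
            raise ValueError("zero-length field value")
        fields.append((name, value))


def read_response_statuses(buf: bytes, pos: int, known_length: bool):
    """Collect (status, fields) for each 1xx response, then the final status."""
    interim = []
    while True:
        status, pos = decode_varint(buf, pos)
        if not 100 <= status <= 199:
            return interim, status, pos
        fields, pos = read_field_section(buf, pos, known_length)
        interim.append((status, fields))
```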

Header and Trailer Field Lines

Header and trailer sections consist of zero or more field lines; see Section 5 of HTTP Semantics. The format of a field section depends on whether the message is known-length or indeterminate-length. Each field line includes a name and a value. Both the name and value are non-zero-length sequences of bytes. The format of a field line is shown in the Format of a Field Line figure.

Format of a Field Line

For field names, byte values that are not permitted in an HTTP field name cause the message to be invalid; see Section 5.1 of HTTP Semantics for a definition of what is valid and for handling of invalid messages. In addition, values from the ASCII uppercase range (0x41-0x5a inclusive) MUST be translated to lowercase values (0x61-0x7a) when generating or forwarding messages. A recipient MUST treat a message containing field names with bytes in the range 0x41-0x5a as invalid; see the Invalid Messages section.

For field values, byte values that are not permitted in an HTTP field value cause the message to be invalid; see Section 5.5 of HTTP Semantics for a definition of valid values.

The same field name can be repeated in multiple field lines; see Section 5.2 of HTTP Semantics for the semantics of repeated field names and rules for combining values. Like HTTP/2, this format has an exception for the combination of multiple instances of the Cookie field. Instances of fields with the ASCII-encoded name "cookie" are combined using a semicolon octet (0x3b) rather than a comma; see Section 8.1.2.5 of HTTP/2.

This format provides fixed locations for content that would be carried in HTTP/2 pseudo-fields. Therefore, there is no need to include field lines containing a name of :method, :scheme, :authority, :path, or :status. Fields that contain one of these names cause the message to be invalid; see the Invalid Messages section. Pseudo-fields that are defined by protocol extensions MAY be included; however, field lines containing pseudo-fields MUST precede other field lines.
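A sketch of the field line checks above, in Python. It assumes the field-name (`tchar`) and field-value grammar of HTTP Semantics, restricted to lowercase names, and treats any pseudo-field other than the five listed ones as an extension pseudo-field (for example `:protocol`). The combining helper joins Cookie values with the two-octet `"; "` delimiter that HTTP/2 specifies and other repeated fields with `", "`. All names here are hypothetical.

```python
import re
from typing import Optional

# Names replaced by fixed positions in the control data.
FORBIDDEN_PSEUDO = {b":method", b":scheme", b":authority", b":path", b":status"}
# tchar from HTTP Semantics, with uppercase letters excluded.
NAME = re.compile(rb"[!#$%&'*+\-.^_`|~0-9a-z]+")
# Visible characters (or obs-text), with SP or HTAB allowed only between them.
VALUE = re.compile(rb"[\x21-\x7e\x80-\xff](?:[\x20\x09\x21-\x7e\x80-\xff]*[\x21-\x7e\x80-\xff])?")


def validate_field_lines(fields: list[tuple[bytes, bytes]]) -> None:
    """Raise ValueError if any field line would make the message invalid."""
    seen_regular_field = False
    for name, value in fields:
        if name.startswith(b":"):
            # Extension pseudo-fields are allowed, but only before other fields.
            if name in FORBIDDEN_PSEUDO or seen_regular_field or not NAME.fullmatch(name[1:]):
                raise ValueError(f"invalid pseudo-field {name!r}")
        elif NAME.fullmatch(name):
            seen_regular_field = True
        else:
            raise ValueError(f"invalid field name {name!r}")
        if not VALUE.fullmatch(value):
            raise ValueError(f"invalid value for {name!r}")


def combined_value(fields: list[tuple[bytes, bytes]], name: bytes) -> Optional[bytes]:
    """Combine repeated field lines; Cookie uses "; " as in HTTP/2."""
    values = [v for n, v in fields if n == name]
    if not values:
        return None
    return (b"; " if name == b"cookie" else b", ").join(values)
```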

Content

The content of messages is a sequence of bytes of any length. Though a known-length message has a limit (2^62-1 bytes), this limit is large enough that it is unlikely to be a practical limitation. There is no limit to the content of an indeterminate-length message. Omitting content by truncating a message is only possible if the content is zero-length.

Invalid Messages

This document describes a number of ways that a message can be invalid. Invalid messages MUST NOT be processed except to log an error and produce an error response. The format is designed to allow incremental processing. Implementations need to be aware of the possibility that an error might be detected after performing incremental processing.

Examples

This section includes example requests and responses encoded in both known-length and indefinite-length forms.

Request Example

The example HTTP/1.1 message in the Sample HTTP Request figure shows the content of a message/http body. Valid HTTP/1.1 messages require lines terminated with CRLF (the two bytes 0x0d and 0x0a). For simplicity and consistency, the content of these examples is limited to text, which also uses CRLF for line endings.

Sample HTTP Request

This can be expressed as a binary message (type message/bhttp) using a known-length encoding, as shown in hexadecimal in the Known-Length Binary Encoding of Request figure. That view includes some of the text alongside the hexadecimal to show that most of the content is not modified.

Known-Length Binary Encoding of Request

This example shows that the Host header field is not replicated in the :authority field, as is required for ensuring that the request is reproduced accurately; see Section 8.1.2.3 of HTTP/2.

The same message can be truncated with no effect on interpretation. In this case, the last two bytes, corresponding to the content and the trailer section, can each be removed without altering the semantics of the message.

The same message, encoded using an indefinite-length encoding, is shown in the Indefinite-Length Binary Encoding of Request figure. As the content of this message is empty, the difference in formats is negligible.

Indefinite-Length Binary Encoding of Request

This indefinite-length encoding can be truncated by two bytes in the same way.

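Response Example

Response messages can contain interim (1xx) status codes, as the message in the Sample HTTP Response figure shows. That figure includes examples of informational status codes defined in "HTTP Extensions for Distributed Authoring -- WEBDAV" and "An HTTP Status Code for Indicating Hints".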

Sample HTTP Response

   ; rel=preload; as=style
   Link: ; rel=preload; as=script

   HTTP/1.1 200 OK
   Date: Mon, 27 Jul 2009 12:28:53 GMT
   Server: Apache
   Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
   ETag: "34aa387-d-1568eb00"
   Accept-Ranges: bytes
   Content-Length: 51
   Vary: Accept-Encoding
   Content-Type: text/plain

   Hello World! My content includes a trailing CRLF.

As this is a longer example, only the indefinite-length encoding is shown, in the Binary Response including Interim Responses figure. Note that the specific text used in the reason phrase is not retained by this encoding.

Binary Response including Interim Responses

   ; rel
   3d707265 6c6f6164 3b206173 3d737479  =preload; as=sty
   6c65046c 696e6b24 3c2f7363 72697074  le.link$; rel=preloa
   643b2061 733d7363 72697074 0040c804  d; as=script.@..
   64617465 1d4d6f6e 2c203237 204a756c  date.Mon, 27 Jul
   20323030 39203132 3a32383a 35332047   2009 12:28:53 G
   4d540673 65727665 72064170 61636865  MT.server.Apache
   0d6c6173 742d6d6f 64696669 65641d57  .last-modified.W
   65642c20 3232204a 756c2032 30303920  ed, 22 Jul 2009
   31393a31 353a3536 20474d54 04657461  19:15:56 GMT.eta
   67142233 34616133 38372d64 2d313536  g."34aa387-d-156
   38656230 30220d61 63636570 742d7261  8eb00".accept-ra
   6e676573 05627974 65730e63 6f6e7465  nges.bytes.conte
   6e742d6c 656e6774 68023531 04766172  nt-length.51.var
   790f4163 63657074 2d456e63 6f64696e  y.Accept-Encodin
   670c636f 6e74656e 742d7479 70650a74  g.content-type.t
   6578742f 706c6169 6e003348 656c6c6f  ext/plain.3Hello
   20576f72 6c642120 4d792063 6f6e7465   World! My conte
   6e742069 6e636c75 64657320 61207472  nt includes a tr
   61696c69 6e672043 524c462e 0d0a0000  ailing CRLF.....

A response that uses the chunked transfer coding (Section 7.1 of HTTP/1.1), as shown in the Chunked Encoding Example figure, can be encoded by preserving chunk boundaries using indefinite-length encoding, which minimizes the buffering needed to translate into the binary format. However, these boundaries do not need to be retained, and any chunk extensions cannot be conveyed using the binary format.
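As a check on the dump above, the following Python sketch decodes its final-response portion, copied byte for byte from `40c804` onward. It assumes the indeterminate-length layout: a status varint, a zero-terminated header section, content chunks ending in a zero value, and a zero-terminated trailer section, with truncation allowed only at section boundaries. The helper names are hypothetical.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC varint at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated message")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), pos + size


def read_prefixed(buf: bytes, pos: int) -> tuple[bytes, int]:
    length, pos = decode_varint(buf, pos)
    if pos + length > len(buf):
        raise ValueError("truncated value")
    return buf[pos:pos + length], pos + length


def read_fields(buf: bytes, pos: int):
    """Zero-terminated field section; a message truncated here has no fields."""
    fields = []
    if pos == len(buf):
        return fields, pos
    while True:
        name, pos = read_prefixed(buf, pos)
        if not name:
            return fields, pos
        value, pos = read_prefixed(buf, pos)
        fields.append((name, value))


def read_content(buf: bytes, pos: int):
    """Non-empty chunks until a zero value; truncation here means no content."""
    chunks = []
    if pos == len(buf):
        return b"", pos
    while True:
        chunk, pos = read_prefixed(buf, pos)
        if not chunk:
            return b"".join(chunks), pos
        chunks.append(chunk)


def decode_final_response(buf: bytes):
    status, pos = decode_varint(buf, 0)
    headers, pos = read_fields(buf, pos)
    content, pos = read_content(buf, pos)
    trailers, pos = read_fields(buf, pos)
    if pos != len(buf):
        raise ValueError("unexpected bytes after the trailer section")
    return status, headers, content, trailers


# Final-response bytes from the dump, starting at 40 c8 (status 200).
FINAL = bytes.fromhex("""
    40c804
    64617465 1d4d6f6e 2c203237 204a756c 20323030 39203132 3a32383a 35332047
    4d540673 65727665 72064170 61636865 0d6c6173 742d6d6f 64696669 65641d57
    65642c20 3232204a 756c2032 30303920 31393a31 353a3536 20474d54 04657461
    67142233 34616133 38372d64 2d313536 38656230 30220d61 63636570 742d7261
    6e676573 05627974 65730e63 6f6e7465 6e742d6c 656e6774 68023531 04766172
    790f4163 63657074 2d456e63 6f64696e 670c636f 6e74656e 742d7479 70650a74
    6578742f 706c6169 6e003348 656c6c6f 20576f72 6c642120 4d792063 6f6e7465
    6e742069 6e636c75 64657320 61207472 61696c69 6e672043 524c462e 0d0a0000
""")
status, headers, content, trailers = decode_final_response(FINAL)
print(status, len(headers), len(content))  # 200 8 51
```

The 51 decoded content bytes match the Content-Length field, and the final two zero bytes are the content terminator and an empty trailer section.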

Chunked Encoding Example

The Known-Length Encoding of Response figure shows this message using the known-length encoding. Note that the transfer-encoding header field is removed.

Known-Length Encoding of Response
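A hypothetical Python sketch of that known-length conversion. It assumes the chunked framing has already been parsed into a list of chunk payloads (so chunk sizes and extensions are gone) and that field names are lowercase; it drops the transfer-encoding field and length-prefixes the concatenated content. Preserving the chunk boundaries instead would mean emitting each payload as its own indefinite-length chunk.

```python
def encode_varint(value: int) -> bytes:
    """Shortest QUIC variable-length integer encoding of value."""
    for prefix, size in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return (value | prefix << (8 * size - 8)).to_bytes(size, "big")
    raise ValueError("value out of range for a QUIC varint")


def encode_known_field_section(fields: list[tuple[bytes, bytes]]) -> bytes:
    """Length prefix, then length-prefixed names and values."""
    body = b"".join(encode_varint(len(n)) + n + encode_varint(len(v)) + v for n, v in fields)
    return encode_varint(len(body)) + body


def known_length_from_chunked(headers, chunks: list[bytes], trailers) -> bytes:
    """Known-length header section, content and trailer section for a
    response that arrived with chunked transfer coding.

    `chunks` holds chunk payloads only; chunk extensions are assumed to have
    been discarded already, since the binary format cannot carry them.
    """
    kept = [(n, v) for n, v in headers if n != b"transfer-encoding"]
    content = b"".join(chunks)  # chunk boundaries are not retained
    return (
        encode_known_field_section(kept)
        + encode_varint(len(content)) + content
        + encode_known_field_section(trailers)
    )
```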

"message/bhttp" Media Type

The message/bhttp media type can be used to enclose a single HTTP request or response message, provided that it obeys the MIME restrictions for all "message" types regarding line length and encodings.

Type name:

message

Subtype name:

bhttp

Required parameters:

N/A

Optional parameters:

None

Encoding considerations:

only "8bit" or "binary" is permitted

Security considerations:

see the Security Considerations section of this document

Interoperability considerations:

N/A

Published specification:

this specification

Applications that use this media type:

N/A

Fragment identifier considerations:

N/A

Additional information:

Magic number(s):: N/A
Deprecated alias names for this type:: N/A
File extension(s):: N/A
Macintosh file type code(s):: N/A

Person and email address to contact for further information:

see Authors' Addresses section

Intended usage:

COMMON

Restrictions on usage:

N/A

Author:

see Authors' Addresses section

Change controller:

IESG

Security Considerations

Many of the considerations that apply to HTTP message handling apply to this format; see Section 17 of HTTP Semantics and Section 11 of HTTP/1.1 for common issues in handling HTTP messages.

Strict parsing of the format with no tolerance for errors can help avoid a number of attacks. However, implementations still need to be aware of the possibility of resource exhaustion attacks that might arise from receiving large messages, particularly those with large numbers of fields.

The format is designed to allow for minimal state when translating for use with HTTP proper. However, producing a combined value for fields, which might be necessary for the Cookie field when translating this format to one like HTTP/1.1, can require the commitment of resources. Implementations need to ensure that they aren't subject to resource exhaustion attacks from a maliciously crafted message.

IANA Considerations

Please add the media type "message/bhttp" to the "Media Types" registry at https://www.iana.org/assignments/media-types, using the registration information in the "message/bhttp" Media Type section.

References

Normative References

HTTP Semantics

Hypertext Transfer Protocol Version 2 (HTTP/2)

HTTP/1.1

Key words for use in RFCs to Indicate Requirement Levels

Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words

QUIC: A UDP-Based Multiplexed and Secure Transport

Informative References

Hypertext Transfer Protocol Version 3 (HTTP/3)

HPACK: Header Compression for HTTP/2

QPACK: Header Compression for HTTP/3

HTTP Extensions for Distributed Authoring -- WEBDAV

An HTTP Status Code for Indicating Hints

Acknowledgments

TODO: credit where credit is due.