Binary Representation of HTTP Messages

Mozilla
mt@lowentropy.net

Cloudflare
caw@heapingbits.net

ART
HTTPBIS

This document defines a binary format for representing HTTP messages.

Discussion Venues

Discussion of this document takes place on the
HTTP Working Group mailing list (http@ietf.org),
which is archived at .

Source for this draft and an issue tracker can be found at
.

Introduction

This document defines a simple format for representing an HTTP message,
either request or response. This allows
for the encoding of HTTP messages that can be conveyed outside of an HTTP
protocol. This enables the transformation of entire messages,
including the application of authenticated encryption.

This format is informed by the framing structure of HTTP/2
and HTTP/3. In comparison, this format is simpler by
virtue of not including either header compression (HPACK,
QPACK) or a generic framing layer.

This format provides an alternative to the message/http content type defined
in HTTP/1.1. A binary format permits more
efficient encoding and processing of messages. A binary format also reduces
exposure to security problems related to processing of HTTP messages.

Two modes for encoding are described:
a known-length encoding includes length prefixes for all major message
components; and
an indeterminate-length encoding enables efficient generation of messages where
lengths are not known when encoding starts.
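
To make the difference between the two modes concrete, the following Python sketch encodes message content both ways. It is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes details specified later in this document, namely that lengths use the QUIC variable-length integer encoding, that known-length content carries one length prefix, and that content of unknown length is sent as non-zero-length chunks (each assumed to carry its own length prefix) followed by a zero value.

```python
from typing import Iterable


def encode_varint(value: int) -> bytes:
    """Encode an integer using the QUIC variable-length integer encoding."""
    if value < 0 or value > (1 << 62) - 1:
        raise ValueError("value out of range for a variable-length integer")
    # The two most significant bits of the first byte select 1, 2, 4, or 8 bytes.
    for prefix, length in ((0, 1), (1, 2), (2, 4), (3, 8)):
        bits = 8 * length - 2
        if value < (1 << bits):
            return ((prefix << bits) | value).to_bytes(length, "big")
    raise AssertionError("unreachable")


def encode_content_known_length(content: bytes) -> bytes:
    """Hypothetical helper: one length prefix, then all of the content."""
    return encode_varint(len(content)) + content


def encode_content_unknown_length(chunks: Iterable[bytes]) -> bytes:
    """Hypothetical helper: length-prefixed chunks ended by a zero value."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would be read as the terminator
        out += encode_varint(len(chunk)) + chunk
    out += encode_varint(0)
    return bytes(out)
```

The first helper needs the whole content before it can write anything, while the second can emit each chunk as soon as it is available, which is the trade-off the two modes offer.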
Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 when, and only when, they
appear in all capitals, as shown here.

This document uses terminology from HTTP and notation from QUIC.

Format

An HTTP message is split into five sections, following the structure defined in
Section 6 of HTTP:
Framing indicator. This format uses a single integer to describe framing, which indicates
whether the message is a request or response and how subsequent sections are
formatted; see the "Framing Indicator" section.
For a response, any number of interim responses, each consisting of an
informational status code and header section.
Control data. For a request, this contains the request method and target.
For a response, this contains the status code.
Header section. This contains zero or more header fields.
Content. This is a sequence of zero or more bytes.
Trailer section. This contains zero or more trailer fields.
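
One way to picture the sections listed above is as a simple in-memory record. The Python sketch below is purely illustrative: the class and attribute names are hypothetical and are not defined by this document, and control data is left as an opaque tuple because its layout depends on whether the message is a request or a response.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A field line is a (name, value) pair of byte strings.
FieldLine = Tuple[bytes, bytes]


@dataclass
class InformationalResponse:
    """An interim response: an informational status code and a header section."""
    status: int
    headers: List[FieldLine] = field(default_factory=list)


@dataclass
class BinaryMessage:
    """Hypothetical container with one attribute per section of a message."""
    framing_indicator: int
    # Request: method and target. Response: status code.
    control_data: tuple
    # Only meaningful for responses; empty for requests.
    informational: List[InformationalResponse] = field(default_factory=list)
    headers: List[FieldLine] = field(default_factory=list)
    content: bytes = b""
    trailers: List[FieldLine] = field(default_factory=list)
```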
All lengths and numeric values are encoded using the variable-length integer
encoding from QUIC.

Known Length Messages

A message that has a known length at the time of construction uses the
format shown in .

That is, a known-length message consists of a framing indicator, a block of
control data that is formatted according to the value of the framing indicator,
a header section with a length prefix, binary content with a length prefix, and
a trailer section with a length prefix.

Response messages that contain informational status codes result in a different
structure; see the "Informational Status Codes" section.

Fields in the header and trailer sections consist of a length-prefixed name and
length-prefixed value. Both name and value are sequences of bytes that cannot
be zero length.

The format allows for the message to be truncated before any of the length
prefixes that precede the field sections or content. This reduces the overall
message size. A message that is truncated at any other point is invalid; see
the "Invalid Messages" section.

The variable-length integer encoding means that there is a limit of 2^62-1
bytes for each field section and the message content.

Indeterminate Length Messages

A message that is constructed without encoding a known length for each section
uses the format shown in :

That is, an indeterminate-length message consists of a framing indicator, a block of
control data that is formatted according to the value of the framing indicator,
a header section that is terminated by a zero value, any number of
non-zero-length chunks of binary content, a zero value, and a trailer section
that is terminated by a zero value.

Response messages that contain informational status codes result in a different
structure; see the "Informational Status Codes" section.

Indeterminate-length messages can be truncated in a similar way to known-length
messages. Truncation occurs after the control data, or after the Content
Terminator field that ends a field section or sequence of content chunks. A
message that is truncated at any other point is invalid; see the "Invalid Messages" section.

Indeterminate-length messages use the same encoding for field lines as
known-length messages; see the "Header and Trailer Field Lines" section.

Framing Indicator

The start of each message is a framing indicator that is a single integer that
describes the structure of the subsequent sections. The framing indicator can
take just four values:
A value of 0 describes a request of known length.
A value of 1 describes a response of known length.
A value of 2 describes a request of indeterminate length.
A value of 3 describes a response of indeterminate length.
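
As a rough sketch of how a parser might read this value, the Python below decodes a QUIC variable-length integer from the start of a message and maps it onto the four values listed above. The names FramingIndicator, decode_varint, and read_framing_indicator are hypothetical and chosen only for this illustration.

```python
from enum import IntEnum
from typing import Tuple


class FramingIndicator(IntEnum):
    KNOWN_LENGTH_REQUEST = 0
    KNOWN_LENGTH_RESPONSE = 1
    INDETERMINATE_LENGTH_REQUEST = 2
    INDETERMINATE_LENGTH_RESPONSE = 3

    @property
    def is_response(self) -> bool:
        return bool(self.value & 1)

    @property
    def is_known_length(self) -> bool:
        return self.value < 2


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a QUIC variable-length integer; return (value, next offset)."""
    if offset >= len(data):
        raise ValueError("truncated variable-length integer")
    first = data[offset]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4, or 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated variable-length integer")
    value = first & 0x3F
    for byte in data[offset + 1 : offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def read_framing_indicator(message: bytes) -> Tuple[FramingIndicator, int]:
    """Read the framing indicator; any value other than 0 to 3 is invalid."""
    value, offset = decode_varint(message)
    try:
        return FramingIndicator(value), offset
    except ValueError:
        raise ValueError(f"invalid framing indicator: {value}") from None
```

The returned offset marks where the control data begins, so a caller can continue parsing from there.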
Other values cause the message to be invalid; see the "Invalid Messages" section.

Request Control Data

The control data for a request message includes four values that correspond to
the values of the :method, :scheme, :authority, and :path pseudo-header
fields described in Section 8.1.2.3 of HTTP/2. These fields are
encoded, each with a length prefix, in the order listed.

The rules in Section 8.1.2.3 of HTTP/2 for constructing pseudo-header fields
apply to the construction of these values. However, where the :authority
pseudo-header field might be omitted in HTTP/2, a zero-length value is encoded
instead.

The format of request control data is shown in .

Response Control Data

The control data for a response message includes a single field that corresponds
to the :status pseudo-header field in HTTP/2. This field is encoded
as a single variable-length integer, not a decimal string.

The format of final response control data is shown in
.

Informational Status Codes

This format supports informational status codes (see Section 15.2 of
HTTP). Responses that include informational status codes are encoded by
repeating the response control data and associated header section until the
final status code is encoded.

The format of the informational response control data is shown in
.

A response message can include any number of informational responses. If the
response control data includes an informational status code (that is, a value
between 100 and 199 inclusive), the control data is followed by a header
section (encoded with known- or indeterminate-length according to the framing
indicator). After the header section, another response control data block
follows.

Header and Trailer Field Lines

Header and trailer sections consist of zero or more field lines; see Section 5
of HTTP. The format of a field section depends on whether the message is
known- or indeterminate-length.

Each field line includes a name and a value. Both the name and value are
non-zero length sequences of bytes. The format of a field line is shown in
.

For field names, byte values that are not permitted in an HTTP field name cause
the message to be invalid; see Section 5.1 of HTTP for a definition of
what is valid and the "Invalid Messages" section for handling of invalid messages.

In addition, values from the ASCII uppercase range (0x41-0x5a inclusive) MUST
be translated to lowercase values (0x61-0x7a) when generating or forwarding
messages. A recipient MUST treat a message containing field names with bytes in
the range 0x41-0x5a as invalid; see the "Invalid Messages" section.

For field values, byte values that are not permitted in an HTTP field value
cause the message to be invalid; see Section 5.5 of HTTP for a definition
of valid values.

The same field name can be repeated in multiple field lines; see Section 5.2 of
HTTP for the semantics of repeated field names and rules for combining
values.

Like HTTP/2, this format has an exception for the combination of multiple
instances of the Cookie field. Instances of fields with the ASCII-encoded
name of cookie are combined using a semicolon octet (0x3b) rather than a
comma; see Section 8.1.2.5 of HTTP/2.

This format provides fixed locations for content that would be carried in
HTTP/2 pseudo-fields. Therefore, there is no need to include field lines
containing a name of :method, :scheme, :authority, :path, or :status.
Fields that contain one of these names cause the message to be invalid; see
the "Invalid Messages" section. Pseudo-fields that are defined by protocol extensions MAY be
included; however, field lines containing pseudo-fields MUST precede other field
lines.

Content

The content of messages is a sequence of bytes of any length. Though a
known-length message has a limit, this limit is large enough that it is
unlikely to be a practical limitation. There is no limit to the size of content in an
indeterminate-length message.

Omitting content by truncating a message is only possible if the content is
zero-length.

Invalid Messages

This document describes a number of ways that a message can be invalid. Invalid
messages MUST NOT be processed except to log an error and produce an error
response.

The format is designed to allow incremental processing. Implementations need to
be aware of the possibility that an error might be detected after performing
incremental processing.

Examples

This section includes example requests and responses encoded in both
known-length and indeterminate-length forms.

Request Example

The example HTTP/1.1 message in shows the content of a
message/http.

Valid HTTP/1.1 messages require lines terminated with CRLF (the two bytes 0x0d
and 0x0a). For simplicity and consistency, the content of these examples is
limited to text, which also uses CRLF for line endings.

This can be expressed as a binary message (type message/bhttp) using a
known-length encoding as shown in hexadecimal in .
This view includes some of the text alongside to show that most
of the content is not modified.

This example shows that the Host header field is not replicated in the
:authority field, as is required for ensuring that the request is reproduced
accurately; see Section 8.1.2.3 of HTTP/2.

The same message can be truncated with no effect on interpretation. In this
case, the last two bytes - corresponding to content and a trailer section - can
each be removed without altering the semantics of the message.

The same message, encoded using an indeterminate-length encoding, is shown in
. As the content of this message is empty, the difference in
formats is negligible.

This indeterminate-length encoding can be truncated by two bytes in the same way.

Response Example

Response messages can contain interim (1xx) status codes as the message in
shows. This message includes examples of informational
status codes defined in "HTTP Extensions for Distributed Authoring -- WEBDAV" and "An HTTP Status Code for Indicating Hints".

As this is a longer example, only the indeterminate-length encoding is shown in
. Note here that the specific text used in the reason
phrase is not retained by this encoding.

A response that uses the chunked encoding (Section 7.1 of HTTP/1.1) as
shown for this message can be encoded by preserving chunk boundaries using
indeterminate-length encoding, which minimizes buffering needed to translate into
the binary format. However, these boundaries do not need to be retained and any
chunk extensions cannot be conveyed using the binary format.

The same message can also be expressed using the known-length encoding. Note that
the transfer-encoding header field is removed.

"message/bhttp" Media Type

The message/bhttp media type can be used to enclose a single HTTP request or
response message, provided that it obeys the MIME restrictions for all
"message" types regarding line length and encodings.
Type name: message
Subtype name: bhttp
Required parameters: N/A
Optional parameters: None
Encoding considerations: only "8bit" or "binary" is permitted
Security considerations: see the "Security Considerations" section
Interoperability considerations: N/A
Published specification: this specification
Applications that use this media type: N/A
Fragment identifier considerations: N/A
Additional information:
  Magic number(s): N/A
  Deprecated alias names for this type: N/A
  File extension(s): N/A
  Macintosh file type code(s): N/A
Person and email address to contact for further information: see Authors' Addresses section
Intended usage: COMMON
Restrictions on usage: N/A
Author: see Authors' Addresses section
Change controller: IESG

Security Considerations

Many of the considerations that apply to HTTP message handling apply to this
format; see Section 17 of HTTP and Section 11 of HTTP/1.1 for common
issues in handling HTTP messages.

Strict parsing of the format with no tolerance for errors can help avoid a
number of attacks. However, implementations still need to be aware of the
possibility of resource exhaustion attacks that might arise from receiving
large messages, particularly those with large numbers of fields.

The format is designed to allow for minimal state when translating for use with
HTTP proper. However, producing a combined value for fields, which might be
necessary for the Cookie field when translating this format (for example, to HTTP/1.1),
can require the commitment of resources. Implementations need
to ensure that they aren't subject to resource exhaustion attacks from a
maliciously crafted message.

IANA Considerations

Please update the "Media Types" registry at
https://www.iana.org/assignments/media-types with the registration
information in the "message/bhttp" Media Type section for the media type "message/bhttp".

References

Normative References

HTTP Semantics
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document describes the overall architecture of HTTP, establishes common terminology, and defines aspects of the protocol that are shared by all versions. In this definition are core protocol elements, extensibility mechanisms, and the "http" and "https" Uniform Resource Identifier (URI) schemes. This document obsoletes RFC 2818, RFC 7231, RFC 7232, RFC 7233, RFC 7235, RFC 7538, RFC 7615, RFC 7694, and portions of RFC 7230.

Hypertext Transfer Protocol Version 2 (HTTP/2)
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2). HTTP/2 enables a more efficient use of network resources and a reduced perception of latency by introducing header field compression and allowing multiple concurrent exchanges on the same connection. It also introduces unsolicited push of representations from servers to clients. This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.

HTTP/1.1
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document specifies the HTTP/1.1 message syntax, message parsing, connection management, and related security concerns. This document obsoletes portions of RFC 7230.

Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.

QUIC: A UDP-Based Multiplexed and Secure Transport
This document defines the core of the QUIC transport protocol. QUIC provides applications with flow-controlled streams for structured communication, low-latency connection establishment, and network path migration. QUIC includes security measures that ensure confidentiality, integrity, and availability in a range of deployment circumstances. Accompanying documents describe the integration of TLS for key negotiation, loss detection, and an exemplary congestion control algorithm.

Informative References

Hypertext Transfer Protocol Version 3 (HTTP/3)
The QUIC transport protocol has several features that are desirable in a transport for HTTP, such as stream multiplexing, per-stream flow control, and low-latency connection establishment. This document describes a mapping of HTTP semantics over QUIC. This document also identifies HTTP/2 features that are subsumed by QUIC, and describes how HTTP/2 extensions can be ported to HTTP/3.

HPACK: Header Compression for HTTP/2
This specification defines HPACK, a compression format for efficiently representing HTTP header fields, to be used in HTTP/2.

QPACK: Header Compression for HTTP/3
This specification defines QPACK, a compression format for efficiently representing HTTP fields, to be used in HTTP/3. This is a variation of HPACK compression that seeks to reduce head-of-line blocking.

HTTP Extensions for Distributed Authoring -- WEBDAV
This document specifies a set of methods, headers, and content-types ancillary to HTTP/1.1 for the management of resource properties, creation and management of resource collections, namespace manipulation, and resource locking (collision avoidance). [STANDARDS-TRACK]

An HTTP Status Code for Indicating Hints
This memo introduces an informational HTTP status code that can be used to convey hints that help a client make preparations for processing the final response.

Acknowledgments

TODO: credit where credit is due.