A JSON Encoding for HTTP Header Field Values
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155
Germany
julian.reschke@greenbytes.de
http://greenbytes.de/tech/webdav/
Applications and Real-Time
HTTP, JSON, Header Field Value
This document establishes a convention for use of JSON-encoded field
values in HTTP header fields.
Distribution of this document is unlimited. Although this is not a work
item of the HTTPbis Working Group, comments should be sent to the
Hypertext Transfer Protocol (HTTP) mailing list at ietf-http-wg@w3.org,
which may be joined by sending a message with subject
"subscribe" to ietf-http-wg-request@w3.org.
Discussions of the HTTPbis Working Group are archived at
.
XML versions and latest edits for this document
are available from .
The changes in this draft are summarized in .
Defining syntax for new HTTP header fields (, Section 3.2) is non-trivial. Among the commonly encountered
problems are:
There is no common syntax for complex field values. Several well-known
header fields do use similar-looking syntax, but it is hard to write
generic parsing code that will both correctly handle valid field values
and reject invalid ones.
The HTTP message format allows header fields to repeat, so field syntax
needs to be designed in a way that these cases are either meaningful,
or can be unambiguously detected and rejected.
HTTP/1.1 does not define a character encoding scheme (, Section 2), so header fields are either stuck with US-ASCII
(), or need out-of-band information
to decide what encoding scheme is used. Furthermore, APIs
usually assume a default encoding scheme in order to map from
octet sequences to strings (for instance,
uses the IDL type "ByteString", effectively resulting in the
ISO-8859-1 character encoding scheme being used).
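The effect of an assumed default encoding can be illustrated with a short Python sketch. It assumes a sender that places UTF-8 octets into a field value and a recipient API that maps each octet to one character using ISO-8859-1; the variable names are illustrative only and not taken from any specification.

```python
# A sender places the UTF-8 encoding of U+20AC (EURO SIGN) plus " rates"
# into a field value (an assumption made for this illustration).
wire_octets = "\u20ac rates".encode("utf-8")

# An API that maps every octet to one code point (ISO-8859-1) turns the
# three octets of the EURO SIGN into three unrelated characters.
as_latin1 = wire_octets.decode("iso-8859-1")

# Only a recipient that knows the encoding out of band recovers the text.
as_utf8 = wire_octets.decode("utf-8")

print(len(as_latin1), len(as_utf8))
```

The two decoded strings differ in length and content, although both were derived from the same octets.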
(See Section 8.3.1 of
for a summary of considerations for new header fields.)
This specification addresses the issues listed above by defining both a generic
JSON-based () data model and a concrete
wire format that can be used in definitions of new header fields, where the
goals were:
to be compatible with header field recombination when fields occur multiple
times in a single message (Section 3.2.2 of ), and
not to use any problematic characters in the field value (non-ASCII characters and certain whitespace characters).
Note: , a work item of the
IETF HTTP Working Group, is a different attempt to address this set
of problems: it tries to identify and formalize common field
structures in existing header fields; the syntax defined over there would usually lead to a more
compact notation.
In HTTP, header fields with the same field name can occur multiple times
within a single message (Section 3.2.2 of ).
When this happens, recipients are allowed to combine the field values using
commas as delimiters. This rule matches JSON's array format nicely
(Section 5 of ). Thus, the basic data model
used here is the JSON array.
Header field definitions that need only a single value can restrict
themselves to arrays of length 1, and are encouraged to define error
handling in case more values are received (such as "first wins", "last wins",
or "abort with fatal error message").
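The error handling strategies named above could be sketched as follows. This is a minimal illustration: the function name `select_single` and the policy labels are assumptions made for this sketch, and the input is assumed to be an already decoded JSON array.

```python
def select_single(values, policy="abort"):
    """Reduce a decoded JSON array to one value under a stated policy.

    The policy names "first", "last", and "abort" are hypothetical labels
    for "first wins", "last wins", and "abort with fatal error message".
    """
    if policy not in ("first", "last", "abort"):
        raise ValueError("unknown policy: %r" % (policy,))
    if not values:
        raise ValueError("field value is empty")
    if len(values) == 1:
        return values[0]
    if policy == "first":
        return values[0]
    if policy == "last":
        return values[-1]
    # policy == "abort": more than one value is treated as a fatal error
    raise ValueError("expected exactly one value, got %d" % len(values))
```

Whichever policy a header field definition picks, stating it explicitly keeps recipients from diverging when a message carries more than one value.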
JSON arrays are mapped to field values by creating a sequence of
serialized member elements, separated by commas and optionally whitespace. This
is equivalent to using the full JSON array format, while leaving out
the "begin-array" ('[') and "end-array" (']') delimiters.
Characters in JSON strings that are not allowed or discouraged in HTTP
header field values — that is, not in the "VCHAR" definition —
need to be represented using JSON's "backslash" escaping mechanism
(, Section 7).
The control characters CR, LF, and HTAB do not appear inside JSON
strings, but can be used outside (line breaks, indentation etc.). These characters
need to be either stripped or replaced by space characters (ABNF "SP").
Formally, using the HTTP specification's ABNF extensions defined in
Section 7 of :
To map a JSON array to an HTTP header field value, process each array
element separately by:
generating the JSON representation,
stripping all JSON control characters (CR, HTAB, LF), or replacing
them by space ("SP") characters,
replacing all remaining non-VSPACE characters by the equivalent
backslash-escape sequence (, Section 7).
The resulting list of strings is transformed into an HTTP field value
by combining them using comma (%x2C) plus optional SP as delimiter,
and encoding the resulting string into an octet sequence using the
US-ASCII character encoding scheme ().
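The sender-side mapping can be sketched in Python using the standard `json` module. This is a minimal sketch, not a reference implementation: the function name `encode_field_value` is an assumption, and the sketch relies on `json.dumps` producing no line breaks or tabs when no `indent` is given.

```python
import json
import re


def encode_field_value(items):
    """Serialize a list of JSON-compatible values as field value octets."""
    parts = []
    for item in items:
        # ensure_ascii=True emits non-ASCII characters as \uXXXX escapes.
        # Without "indent", no CR, LF, or HTAB appears outside of strings.
        text = json.dumps(item, ensure_ascii=True)
        # Defensive step: escape anything still outside SP and the visible
        # ASCII range. Such a character could only occur inside a string,
        # where a \uXXXX escape is valid JSON.
        text = re.sub(
            r"[^\x20-\x7e]",
            lambda m: "\\u%04x" % ord(m.group()),
            text,
        )
        parts.append(text)
    # Comma plus one optional SP as delimiter, then US-ASCII octets.
    return ", ".join(parts).encode("ascii")


print(encode_field_value([{"a": 1}, "\u20ac rates"]))
```

Each array element is serialized on its own, so the enclosing "[" and "]" never appear in the output.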
To map a set of HTTP header field instances to a JSON array:
combine all header field instances into a single field as per
Section 3.2.2 of ,
add a leading begin-array ("[") octet and a trailing end-array ("]") octet, then
run the resulting octet sequence through a JSON parser.
The result of the parsing operation is either an error (in which case
the header field value needs to be considered invalid), or a JSON array.
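The recipient-side steps can be sketched in Python as follows. The function name `decode_field_values` is an assumption made for this sketch. Python's `json.loads` accepts the non-JSON constants `NaN` and `Infinity` by default, so the sketch rejects them explicitly.

```python
import json


def _reject_constant(name):
    # Called by json.loads for "NaN", "Infinity", and "-Infinity",
    # none of which are valid JSON.
    raise ValueError("invalid JSON constant: " + name)


def decode_field_values(instances):
    """Map a list of field value octet strings to a JSON array."""
    # Step 1: combine all field instances using comma as delimiter.
    combined = b", ".join(instances)
    try:
        text = combined.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("field value contains non-ASCII octets")
    # Steps 2 and 3: add the array delimiters and run a JSON parser.
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads("[" + text + "]", parse_constant=_reject_constant)


print(decode_field_values([b'"a"', b'{"b": 1}, 2']))
```

A `ValueError` from this function corresponds to the "invalid field value" outcome described above.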
Specifications defining new HTTP header fields need to take the
considerations listed in Section 8.3.1 of
into account. Many of these will already be accounted for by using the
format defined in this specification.
Readers of HTTP-related specifications frequently expect an ABNF definition
of the field value syntax. This is not really needed here, as the actual
syntax is JSON text, as defined in Section 2 of .
Thus, a very simple way to use this JSON encoding is to
cite this specification (specifically the "json-field-value" ABNF production
defined in ) and otherwise not to talk about the details of the field
syntax at all.
An alternative approach is just to repeat the ABNF-related parts from .
This frees the specification from defining the concrete on-the-wire syntax.
What's left is defining the field value in terms of a JSON array. An
important aspect is the question of extensibility, e.g. how recipients
ought to treat unknown field names. In general, a "must ignore" approach
will allow protocols to evolve without versioning or even using entirely new
field names.
This JSON-based syntax will only apply to newly introduced
header fields, thus backwards compatibility is not a problem. That being
said, it is conceivable that there is existing code that might trip over
double quotes not being used for HTTP's quoted-string syntax (Section 3.2.6 of ).
The "I-JSON Message Format" specification () addresses
known JSON interoperability pain points. This specification borrows from
the requirements made over there:
This specification requires that field values use only US-ASCII characters,
and thus by definition use a subset of UTF-8 (Section 2.1 of ).
Be aware of the issues around number precision, as discussed in Section 2.2 of .
As described in Section 4 of , JSON parser implementations
differ in the handling of duplicate object names. Therefore, senders MUST NOT
use duplicate object names, and recipients SHOULD either treat
field values with duplicate names as invalid (consistent with , Section 2.3)
or use the lexically last value (consistent with , Section 24.3.1.1).
Furthermore, ordering of object members is not significant and cannot be relied upon.
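The first recipient option for duplicate object names, treating the field value as invalid, can be sketched with the `object_pairs_hook` parameter of Python's `json.loads`. The function names below are assumptions for this sketch. Without such a hook, Python's parser silently keeps the last value for a repeated name.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from (name, value) pairs, refusing repeated names."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate object name in field value")
    return dict(pairs)


def parse_strict(field_value):
    """Parse a combined field value, treating duplicate names as invalid."""
    return json.loads("[" + field_value + "]",
                      object_pairs_hook=reject_duplicates)


# Default behavior of json.loads: the last value for a name is kept.
print(json.loads('[{"a": 1, "a": 2}]'))
```

The hook is invoked for every object at every nesting level, so nested duplicates are caught as well.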
In HTTP/1.1, header field values are represented by octet sequences, usually used to
transmit ASCII characters, with restrictions on the use of certain
control characters, and no associated default character encoding, nor
a way to describe it (, Section 3.2).
HTTP/2 does not change this.
This specification maps all characters which can cause problems to JSON
escape sequences, thereby solving the HTTP header field
internationalization problem.
Future specifications of HTTP might change to allow non-ASCII characters
natively. In that case, header fields using the syntax defined by this
specification would have a simple migration path (by just stopping
to require escaping of non-ASCII characters).
Using JSON-shaped field values is believed not to introduce any new threats
beyond those described in Section 12 of , namely
the risk of recipients using the wrong tools to parse them.
Other than that, any syntax that makes extensions easy can be used to
smuggle information through field values; however, this concern is shared
with other widely used formats, such as those using parameters in the
form of name/value pairs.
ASCII format for network interchange
Augmented BNF for Syntax Specifications: ABNF
The JavaScript Object Notation (JSON) Data Interchange Format (Textuality)
Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing (Adobe Systems Incorporated; greenbytes GmbH)
Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content (Adobe Systems Incorporated; greenbytes GmbH)
The I-JSON Message Format
Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 (International Organization for Standardization)
Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)
Terminology Used in Internationalization in the IETF
Hypertext Transfer Protocol (HTTP/1.1): Authentication (Adobe Systems Incorporated; greenbytes GmbH)
Indicating Character Encoding and Language for HTTP Header Field Parameters
ECMA-262 6th Edition, The ECMAScript 2015 Language Specification (Ecma International)
Structured Headers for HTTP
The Key HTTP Response Header Field
XMLHttpRequest (WhatWG)
Reporting API 1. Latest version available at .
Clear Site Data. Latest version available at .
Feature Policy
This section shows how some of the existing HTTP header fields would look
if they used the format defined by this specification.
"Content-Length" is defined in Section 3.3.2 of , with the field value's ABNF being:
So the field value is similar to a JSON number (, Section 6).
Content-Length is restricted to a single field instance, as it doesn't use
the list production (as per Section 3.2.2 of ).
However, in practice multiple instances do occur, and the definition of
the header field does indeed discuss how to handle these cases.
If Content-Length were defined using the JSON format discussed here, the
ABNF would be something like:
...and the prose definition would:
restrict all numbers to be non-negative integers without fractions, and
require that the array of values is of length 1
(but allow the case where the array is longer, but all members represent
the same value)
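The two prose requirements above could be checked along the following lines. This is a sketch of a hypothetical JSON-based Content-Length, not of the field as actually defined; the function name `parse_content_length` is an assumption.

```python
import json


def parse_content_length(field_value):
    """Validate a hypothetical JSON-encoded Content-Length field value."""
    values = json.loads("[" + field_value + "]")
    if not values:
        raise ValueError("no value present")
    for value in values:
        # json.loads yields int only for numbers without fraction or
        # exponent; bool is excluded because it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("not a non-negative integer: %r" % (value,))
    # Longer arrays are tolerated only if all members are the same value.
    if any(value != values[0] for value in values):
        raise ValueError("conflicting values")
    return values[0]


print(parse_content_length("42, 42"))
```

Values such as `42.0` or `1e2` are rejected here because the parser returns them as floating point numbers, which matches the "without fractions" restriction.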
Content-Disposition field values, defined in , consist of
a "disposition type" (a string), plus multiple parameters, of which at least
one ("filename") sometimes needs to carry non-ASCII characters.
For instance, the first example in Section 5 of :
has a disposition type of "Attachment", with filename parameter value
"example.html". A JSON representation of this information might be:
which would translate to a header field value of:
The third example in Section 5 of
uses a filename parameter containing non-US-ASCII characters:
Note that in this case, the "filename*" parameter uses the encoding defined in
, representing a filename starting with the Unicode
character U+20AC (EURO SIGN), followed by " rates". If the definition
of Content-Disposition had used the format proposed here, the
workaround involving the "parameter*" syntax would not have been needed at
all.
The JSON representation of this value could then be:
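As a rough illustration of how the EURO SIGN would travel in such a field value, the following Python sketch serializes a filename of U+20AC followed by " rates". The object layout (disposition type as the single member name, parameters as a nested object) is an assumption made for this sketch only.

```python
import json

# Hypothetical layout: one object whose single member name is the
# disposition type and whose value holds the parameters.
disposition = [{"attachment": {"filename": "\u20ac rates"}}]

# ensure_ascii=True turns U+20AC into the six ASCII characters \u20ac.
field_value = ", ".join(
    json.dumps(item, ensure_ascii=True) for item in disposition
)
print(field_value)

# A recipient recovers the original filename with a plain JSON parser.
round_trip = json.loads("[" + field_value + "]")
```

The field value contains only ASCII characters, and no separate "filename*" style parameter is involved.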
The WWW-Authenticate header field value is defined in Section 4.1 of as a list of "challenges":
...where a challenge consists of a scheme with optional parameters:
An example for a complex header field value given in the definition of
the header field is:
A possible JSON representation of this field value would be the array below:
...which would translate to a header field value of:
The Accept-Encoding header field value is defined in Section 5.3.4 of as a list of codings, each of which
allows a weight parameter 'q':
An example for a complex header field value given in the definition of
the header field is:
Due to the defaulting rules for the quality value (, Section 5.3.1),
this could also be written as:
A JSON representation could be:
...which would translate to a header field value of:
In this example, the part about "gzip" appears unnecessarily verbose, as
the value is just an empty object. A simpler notation would collapse
members like these to string literals:
If this is desirable, the header field definition could allow both
string literals and objects, and define that a mere string literal
would be mapped to a member whose name is given by the string literal,
and the value is an empty object.
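A recipient supporting both forms could normalize them as sketched below. The function name `normalize_codings` and the sample member names are assumptions made for this illustration.

```python
def normalize_codings(items):
    """Expand bare string literals into objects with an empty value."""
    normalized = []
    for item in items:
        if isinstance(item, str):
            # "gzip" is shorthand for {"gzip": {}}
            normalized.append({item: {}})
        elif isinstance(item, dict):
            normalized.append(item)
        else:
            raise ValueError("unexpected array member: %r" % (item,))
    return normalized


print(normalize_codings(["gzip", {"identity": {"q": 0.5}}]))
```

After normalization, the rest of the processing only has to deal with objects, so the shorthand does not leak into the field's data model.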
Since work started on this document, various specifications
have adopted this format. At least one of these moved away after
the HTTP Working Group decided to focus on
(see thread starting at ).
The sections below summarize the current usage of this format.
Defined in W3C Note "Reporting API 1" (Section 3.1 of ).
Still in use in latest editor copy as of June 2017.
Used in earlier versions of "Clear Site Data". The current version replaces
the use of JSON with a custom syntax that happens to be somewhat compatible with an array of JSON strings
(see Section 3.1 of and for feedback).
Originally defined in W3C Draft Community Group Report "Feature Policy" (),
but now replaced with a custom syntax (see ).
aims to improve the cacheability of responses that
vary based on certain request header fields, addressing lack of granularity in
the existing "Vary" response header field (, Section 7.1.4).
If the JSON-based format described by this document gains popularity, it
might be useful to add a JSON-aware "Key Parameter" (see Section 2.3 of ).
This approach uses a default of "JSON array", using implicit array markers.
An alternative would be a default of "JSON object". This would simplify
the syntax for non-list-typed header fields, but all the benefits of having the
same data model for both types of header fields would be gone.
A hybrid approach might make sense, as long as it doesn't require any
heuristics on the recipient's side.
Note:
a concrete proposal was made by Kazuho Oku in
.
Use of generic libraries vs. compactness of field values.
Editorial fixes + working on the TODOs.
Mention slightly increased risk of smuggling information in header field values.
Mention Kazuho Oku's proposal for abbreviated forms.
Added a bit of text about the motivation for a concrete JSON subset (ack Cory Benfield).
Expand I18N section.
Mention relation to KEY header field.
Between June and December 2016, this was a work item of the HTTP
working group (see ).
Work (if any) continues now on .
Changes made while this was a work item of the HTTP Working Group:
Added example for "Accept-Encoding" (inspired by Kazuho's feedback),
showing a potential way to optimize the format when default values apply.
Add interop discussion, building on I-JSON and ECMA-262 (see
).
Move non-essential parts into appendix.
Updated XHR reference.
Add meat to "Using this Format in Header Field Definitions".
Add a few lines on the relation to "Key".
Summarize current use of the format.
RFC 5987 is obsoleted by RFC 8187.
Update CLEARSITE comment.
Update JSON and HSTRUCT references.
FEATUREPOL doesn't use JSON syntax anymore.
Update HSTRUCT reference.
Update notes about CLEARSITE and FEATUREPOL.
Thanks go to the Hypertext Transfer Protocol Working Group participants.