HTTP Representation Variants
Fastly
mnot@mnot.net
https://www.mnot.net/
General
Internet-Draft

This specification introduces an alternative way to communicate a secondary cache key for an HTTP resource, using the HTTP “Variants” and “Variant-Key” response header fields. Its aim is to make HTTP proactive content negotiation more cache-friendly.

RFC EDITOR: please remove this section before publication

The issues list for this draft can be found at https://github.com/mnot/I-D/labels/variants.

The most recent (often, unpublished) draft is at https://mnot.github.io/I-D/variants/.

Recent changes are listed at https://github.com/mnot/I-D/commits/gh-pages/variants.

See also the draft’s current status in the IETF datatracker, at https://datatracker.ietf.org/doc/draft-nottingham-variants/.

There is a prototype implementation of the algorithms herein at https://github.com/mnot/variants-toy.

HTTP proactive content negotiation ([RFC7231], Section 3.4.1) is seeing renewed interest, both for existing request headers like Accept-Language and for newer ones (for example, see [I-D.ietf-httpbis-client-hints]).

Successfully reusing negotiated responses that have been stored in an HTTP cache requires establishment of a secondary cache key ([RFC7234], Section 4.1). Currently, the Vary header ([RFC7231], Section 7.1.4) does this by nominating a set of request headers.

HTTP’s caching model allows a certain amount of latitude in normalising those request header field values, so as to increase the chances of a cache hit while still respecting the semantics of that header. However, normalisation is not formally defined, leading to divergence in cache behaviours.

Even when the headers’ semantics are understood, a cache does not know enough about the possible alternative representations available on the origin server to make an appropriate decision.

For example, if a cache has stored the following request/response pair:

Provided that the cache has full knowledge of the semantics of Accept-Language and Content-Language, it will know that a French representation is available and might be able to infer that an English representation is not available.
But it does not know (for example) whether a Japanese representation is available without making another request, incurring possibly unnecessary latency.

This specification introduces the HTTP Variants response header field to enumerate the available variant representations on the origin server. This gives clients and caches enough information to properly satisfy requests (either by selecting a response from cache or by forwarding the request towards the origin) by following the selection algorithm defined later in this document.

Its companion, the Variant-Key response header field, indicates which representation was selected, so that it can be reliably reused in the future. When this specification is in use, the example above might become:

Proactive content negotiation mechanisms that wish to be used with Variants need to define how to do so explicitly; see the requirements for such mechanisms later in this document. As a result, Variants is best suited for negotiation over request headers that are well-understood.

Variants also works best when content negotiation takes place over a constrained set of representations; since each variant needs to be listed in the header field, it is ill-suited for open-ended sets of representations.

Variants can be seen as a simpler version of the Alternates header field introduced by [RFC2295]; unlike that mechanism, Variants does not require specification of each combination of attributes, and does not assume that each combination has a unique URL.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This specification uses the Augmented Backus-Naur Form (ABNF) notation of [RFC5234] with a list extension, defined in Section 7 of [RFC7230], that allows for compact definition of comma-separated lists using a ‘#’ operator (similar to how the ‘*’ operator indicates repetition).

Additionally, it uses the “field-name”, “OWS” and “token” rules from [RFC7230].

The Variants HTTP response header field indicates what representations are available for a given resource at the time that the response is produced, by enumerating the request header fields that it varies on, along with the values that are available for each.

Each “variant-item” indicates a request header field that carries a value that clients might proactively negotiate for; each parameter on it indicates a value for which there is an available representation on the origin server.

So, given this example header field:

a recipient can infer that the only content-coding available for that resource is “gzip” (along with the “identity” non-encoding; see [RFC7231], Section 5.3.4).

Given:

a recipient can infer that no content-codings (beyond identity) are supported. Note that, as always, field-name is case-insensitive.

A more complex example:

Here, recipients can infer that two content-codings in addition to “identity” are available, as well as two content languages.
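The relationship between variant-items and available-values described above can be sketched in Python. This is a minimal illustration rather than a normative parser: it assumes that variant-items are comma-separated and that each available-value follows its field-name after a semicolon, and the function name `parse_variants` and the header values in the usage note are hypothetical.

```python
from typing import Dict, List


def parse_variants(field_values: List[str]) -> Dict[str, List[str]]:
    """Parse Variants field-values into {field-name: [available-value, ...]}.

    Assumed syntax (for illustration only): variant-items are separated by
    commas, and each one is a field-name followed by zero or more
    ";"-separated available-values, with optional whitespace around both
    delimiters.
    """
    variants: Dict[str, List[str]] = {}
    # Several header lines are equivalent to one comma-joined field-value.
    combined = ",".join(field_values)
    for item in combined.split(","):
        parts = [part.strip() for part in item.split(";")]
        name = parts[0].lower()  # field-name is case-insensitive
        if not name:
            continue  # tolerate empty list members
        values = [value for value in parts[1:] if value]
        # dict preserves insertion order, so the order of variant-items
        # (which is significant) is kept.
        variants.setdefault(name, []).extend(values)
    return variants
```

Under these assumptions, `parse_variants(["Accept-Encoding;gzip;br, Accept-Language;en ;fr"])` yields `{"accept-encoding": ["gzip", "br"], "accept-language": ["en", "fr"]}`, and a field-name with no parameters maps to an empty list, meaning that no values beyond the mechanism's default are advertised.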
Note that, as with all HTTP header fields that use the “#” list rule (see [RFC7230], Section 7), variant-items might occur in the same header field or separately, like this:

The ordering of available-values after the field-name is significant, as it might be used by the header’s algorithm for selecting a response (in this example, the first language is the default; see the handling of Accept-Language variants in the appendix).

The ordering of the request header fields themselves indicates descending application of preferences; in the example above, a cache that has all of the possible permutations stored will honour the client’s preferences for Accept-Encoding before honouring Accept-Language.

Origin servers SHOULD consistently send Variants header fields on all cacheable (as per [RFC7234], Section 3) responses for a resource, since the header's absence will trigger caches to fall back to Vary processing.

Likewise, servers MUST send the Variant-Key response header field when sending Variants, since its absence means that the stored response will not be reused when this specification is implemented.

Caches that implement this specification SHOULD ignore request header fields in the Vary header for the purposes of secondary cache key calculation ([RFC7234], Section 4.1) when their semantics are implemented as per this specification and their corresponding response header field is listed in Variants.

If any member of the Vary header does not have a corresponding variant that is understood by the implementation, it is still subject to the requirements of [RFC7234], Section 4.1.

See the example algorithm for checking Vary later in this document.

In practice, implementation of Vary varies considerably.
As a result, cache efficiency might drop considerably when Variants does not contain all of the headers referenced by Vary, because some implementations might choose to disable Variants processing when this is the case.

The Variant-Key HTTP response header field is used to indicate the value(s) from the Variants header field that identify the representation it occurs within.

Each value indicates the selected available-value, in the same order as the variants listed in the Variants header field.

Therefore, Variant-Key MUST be the same length (in comma-separated members) as Variants, and each member MUST correspond in position to its companion in Variants.

For example:

This header pair indicates that the representation has a “gzip” content-coding and “fr” content-language.

Note that Variant-Key is only used to indicate what request attributes are associated with the response containing it; this is different from headers like Content-Encoding, which indicate attributes of the response itself. In the example above, it might be that a gzip’d version of the French content is not available, in which case the response will include:

even though Content-Encoding does not contain “gzip”.

This algorithm generates a normalised string for Variant-Key, suitable for comparison with values generated by the selection algorithm below.

Given stored-headers, a set of headers from a stored response, a normalised variant-key for that message can be generated by:

1. Let variant-key-header be a string, the result of selecting all field-values of stored-headers whose field-name is “Variant-Key” and joining them with a comma (“,”).
2. Remove all whitespace from variant-key-header.
3. Return variant-key-header.

Caches that implement the Variants header field and the relevant semantics of the field-name it contains can use that knowledge to either select an appropriate stored representation, or forward the request if no appropriate representation is stored.

They do so by running this algorithm (or its functional equivalent) upon receiving a request:

Given incoming-request, a mapping of field-names to lists of field values, and stored-responses, a list of stored responses suitable for reuse as defined in [RFC7234], Section 4, excepting the requirement to calculate a secondary cache key:

1. If stored-responses is empty, return an empty list.
2. Order stored-responses by the “Date” header field, most recent to least recent.
3. Let sorted-variants be an empty list.
4. If the freshest member of stored-responses (as per [RFC7234], Section 4.2) has one or more “Variants” header field(s):
   1. Select one member of stored-responses and let its “Variants” header field-value(s) be variants-header. This SHOULD be the most recent response, but MAY be from an older one as long as it is still fresh.
   2. For each variant in variants-header:
      1. If variant’s field-name corresponds to the request header field identified by a content negotiation mechanism that the implementation supports:
         1. Let request-value be the field-value(s) associated with field-name in incoming-request.
         2. Let available-values be a list containing all available-value for variant.
         3. Let sorted-values be the result of running the algorithm defined by the content negotiation mechanism with request-value and available-values.
         4. Append sorted-values to sorted-variants.

      At this point, sorted-variants will be a list of lists, each member of the top-level list corresponding to a variant-item in the Variants header field-value, containing zero or more items indicating available-values that are acceptable to the client, in order of preference, greatest to least.
   3. Return the result of running Find Available Keys on sorted-variants, an empty string and an empty list.

This returns a list of strings suitable for comparing to normalised Variant-Keys that represent possible responses on the server that can be used to satisfy the request, in preference order, provided that their secondary cache key (after removing the headers covered by Variants) matches. The example algorithm for checking Vary, below, illustrates one way to do this.

Find Available Keys: given sorted-variants, a list of lists, and key-stub, a string representing a partial key, and possible-keys, a list:

1. Let sorted-values be the first member of sorted-variants.
2. For each sorted-value in sorted-values:
   1. If key-stub is an empty string, let this-key be a copy of sorted-value.
   2. Otherwise:
      1. Let this-key be a copy of key-stub.
      2. Append a comma (“,”) to this-key.
      3. Append sorted-value to this-key.
   3. Let remaining-variants be a copy of all of the members of sorted-variants except the first.
   4. If remaining-variants is empty, append this-key to possible-keys.
   5. Otherwise, run Find Available Keys on remaining-variants, this-key and possible-keys.
3. Return possible-keys.

This algorithm is an example of how an implementation can meet the requirement to apply the members of the Vary header field that are not covered by Variants.

Given a stored response, stored-response:

1. Let filtered-vary be the field-value(s) of stored-response’s “Vary” header field.
2. Let processed-variants be a list containing the request header fields that identify the content negotiation mechanisms supported by the implementation.
3. Remove any member of filtered-vary that is a case-insensitive match for a member of processed-variants.
4. If the secondary cache key (as calculated in [RFC7234], Section 4.1) for stored-response matches incoming-request, using filtered-vary for the value of the “Vary” response header, return True.
5. Return False.

This returns a Boolean that indicates whether stored-response can be used to satisfy the request.

Note that implementation of the Vary header field varies in practice, and the algorithm above illustrates only one way to apply it. It is equally viable to forward the request if there is a request header listed in Vary but not Variants.

For example, if the selected variants-header was:

and the request contained the headers:

Then the sorted-variants would be:

Which means that the possible-keys would be:

Representing a first preference of a French, gzip’d response. Thus, if a cache has a response with:

it could be used to satisfy the first preference.
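The key normalisation and Find Available Keys steps above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names, the representation of a stored response as a mapping of header names to lists of field-values, and the values in the usage note are assumptions made here, and the check of any remaining Vary members is deliberately left out.

```python
from typing import Dict, List, Optional

Headers = Dict[str, List[str]]  # hypothetical shape: field-name -> field-values


def normalise_variant_key(stored_headers: Headers) -> str:
    """Join all Variant-Key field-values with commas, then drop whitespace."""
    values = [
        value
        for name, field_values in stored_headers.items()
        if name.lower() == "variant-key"
        for value in field_values
    ]
    return "".join(",".join(values).split())


def find_available_keys(
    sorted_variants: List[List[str]],
    key_stub: str = "",
    possible_keys: Optional[List[str]] = None,
) -> List[str]:
    """Recursively expand sorted-variants into keys, most preferred first."""
    if possible_keys is None:
        possible_keys = []
    if not sorted_variants:
        return possible_keys
    sorted_values = sorted_variants[0]
    remaining_variants = sorted_variants[1:]
    for sorted_value in sorted_values:
        this_key = sorted_value if key_stub == "" else f"{key_stub},{sorted_value}"
        if not remaining_variants:
            possible_keys.append(this_key)
        else:
            find_available_keys(remaining_variants, this_key, possible_keys)
    return possible_keys


def select_stored_response(
    possible_keys: List[str], stored_responses: List[Headers]
) -> Optional[Headers]:
    """Return the stored response for the most preferred matching key.

    Returns None when nothing matches, in which case a cache would forward
    the request. Assumption: Vary members not covered by Variants have
    already been checked elsewhere.
    """
    by_key: Dict[str, Headers] = {}
    for headers in stored_responses:
        by_key.setdefault(normalise_variant_key(headers), headers)
    for key in possible_keys:
        if key in by_key:
            return by_key[key]
    return None
```

With the illustrative input `[["gzip", "identity"], ["fr", "en"]]`, `find_available_keys` returns `["gzip,fr", "gzip,en", "identity,fr", "identity,en"]`, so a stored response whose Variant-Key is "gzip, fr" normalises to the first key and satisfies the first preference.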
If no such response is stored, responses corresponding to the other keys could be returned, or the request could be forwarded towards the origin.

Origin servers that wish to take advantage of Variants will need to generate both the Variants and Variant-Key header fields in all cacheable responses for a given resource. If either is omitted and the response is stored, it will have the effect of disabling caching for that resource until it is no longer stored (e.g., it expires, or is evicted).

Likewise, origin servers will need to ensure that the members of both header field values are in the same order and have the same length, since discrepancies will cause caches to avoid using the responses they occur in.

The value of the Variants header should be relatively stable for a given resource over time; when it changes, it can have the effect of invalidating previously stored responses.

As noted in the discussion of how Variants relates to Vary, the Vary header is required to be set appropriately when Variants is in use, so that caches that do not implement this specification still operate correctly.

Origin servers are advised to carefully consider which content negotiation mechanisms to enumerate in Variants; if a mechanism is not supported by a receiving cache, it will “downgrade” to Vary handling, which can negatively impact cache efficiency.

The operation of Variants is illustrated by the examples below.

Given a request/response pair:

Upon receipt of this response, the cache knows that two representations of this resource are available, one with a Content-Language of “en”, and another whose Content-Language is “de”.

Subsequent requests (while this response is fresh) will cause the cache to either reuse this response or forward the request, depending on what the selection algorithm determines.

So, if a request with “en” in Accept-Language is received and its q-value indicates that it is acceptable, the stored response is used. A request that indicates that “de” is acceptable will be forwarded to the origin, thereby populating the cache.
A cache receiving a request that indicates both languages are acceptable will use the q-value to make a determination of what response to return.

A cache receiving a request that does not list either language as acceptable (or does not contain an Accept-Language at all) will return the “en” representation (possibly fetching it from the origin), since it is listed first in the Variants list.

Note that Accept-Language is listed in Vary, to ensure backwards-compatibility with caches that do not support Variants.

A more complicated request/response pair:

Here, the cache knows that there are two axes that the response varies upon: Content-Language and Content-Encoding. Thus, there are a total of nine possible representations for the resource (including the identity encoding), and the cache needs to consider the selection algorithms for both axes.

Upon a subsequent request, if both selection algorithms return a stored representation, it can be served from cache; otherwise, the request will need to be forwarded to origin.

Now, consider the previous example, but where only one of the Vary’d axes is listed in Variants:

Here, the cache will need to calculate a secondary cache key as per [RFC7234], Section 4.1 (but considering only Accept-Language to be in its field-value) and then continue processing Variants for the set of stored responses that the algorithm described there selects.

To be usable with Variants, proactive content negotiation mechanisms need to be specified to take advantage of it. Specifically, they:

* MUST define a request header field that advertises the client’s preferences or capabilities, whose field-name SHOULD begin with “Accept-”.
* MUST define the syntax of available-values that will occur in Variants and Variant-Key.
* MUST define an algorithm for selecting a result. It MUST return a list of available-values that are suitable for the request, in order of preference, given the value of the request header nominated above and an available-values list from the Variants header.
If the result is an empty list, it implies that the cache cannot satisfy the request.

The appendix fulfils these requirements for some existing proactive content negotiation mechanisms in HTTP.

This specification registers two values in the Permanent Message Header Field Names registry established by [RFC3864]:

* Header field name: Variants
* Applicable protocol: http
* Status: standard
* Author/Change Controller: IETF
* Specification document(s): [this document]
* Related information:

* Header field name: Variant-Key
* Applicable protocol: http
* Status: standard
* Author/Change Controller: IETF
* Specification document(s): [this document]
* Related information:

If the number or advertised characteristics of the representations available for a resource are considered sensitive, the Variants header by its nature will leak them.

Note that the Variants header is not a commitment to make representations of a certain nature available; the runtime behaviour of the server always overrides hints like Variants.

This protocol is conceptually similar to, but simpler than, Transparent Content Negotiation [RFC2295]. Thanks to its authors for their inspiration.

It is also a generalisation of a Fastly VCL feature designed by Rogier ‘DocWilco’ Mulhuijzen.

Thanks to Hooman Beheshti for his review and input.

[RFC2119] Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

[RFC7231] Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines the semantics of HTTP/1.1 messages, as expressed by request methods, request header fields, response status codes, and response header fields, along with the payload of messages (metadata and body content) and mechanisms for content negotiation.

[RFC7234] Hypertext Transfer Protocol (HTTP/1.1): Caching
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document defines HTTP caches and the associated header fields that control cache behavior or indicate cacheable response messages.

[RFC8174] Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.

[RFC5234] Augmented BNF for Syntax Specifications: ABNF
Internet technical specifications often need to define a formal syntax. Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power. The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges. This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications. [STANDARDS-TRACK]

[RFC7230] Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document provides an overview of HTTP architecture and its associated terminology, defines the "http" and "https" Uniform Resource Identifier (URI) schemes, defines the HTTP/1.1 message syntax and parsing requirements, and describes related security concerns for implementations.

[RFC4647] Matching of Language Tags
This document describes a syntax, called a "language-range", for specifying items in a user's list of language preferences. It also describes different mechanisms for comparing and matching these to language tags. Two kinds of matching mechanisms, filtering and lookup, are defined. Filtering produces a (potentially empty) set of language tags, whereas lookup produces a single language tag. Possible applications include language negotiation or content selection. This document, in combination with RFC 4646, replaces RFC 3066, which replaced RFC 1766. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

[I-D.ietf-httpbis-client-hints] HTTP Client Hints
An increasing diversity of Web-connected devices and software capabilities has created a need to deliver optimized content for each device. This specification defines an extensible and configurable set of HTTP request header fields, colloquially known as Client Hints, to address this. They are intended to be used as input to proactive content negotiation; just as the Accept header field allows user agents to indicate what formats they prefer, Client Hints allow user agents to indicate device and agent specific preferences.

[RFC2295] Transparent Content Negotiation in HTTP
HTTP allows web site authors to put multiple versions of the same information under a single URL. Transparent content negotiation is an extensible negotiation mechanism, layered on top of HTTP, for automatically selecting the best version when the URL is accessed. This enables the smooth deployment of new web data formats and markup tags.
This memo defines an Experimental Protocol for the Internet community. It does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested.

[RFC3864] Registration Procedures for Message Header Fields
This specification defines registration procedures for the message header fields used by Internet mail, HTTP, Netnews and other applications. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

This appendix defines the required information to use existing proactive content negotiation mechanisms (as defined in [RFC7231], Section 5.3) with the Variants header field.

This section defines handling for Accept variants, as per [RFC7231], Section 5.3.2.

To perform content negotiation for Accept given a request-value and available-values:

1. Let preferred-available be an empty list.
2. Let preferred-types be a list of the types in the request-value, ordered by their weight, highest to lowest, as per [RFC7231], Section 5.3.2 (omitting any type with a weight of 0). If “Accept” is not present or empty, preferred-types will be empty. If a type lacks an explicit weight, an implementation MAY assign one.
3. If the first member of available-values is not a member of preferred-types, append it to preferred-types (thus making it the default).
4. For each preferred-type in preferred-types:
   1. If any member of available-values matches preferred-type, using the media-range matching mechanism specified in [RFC7231], Section 5.3.2 (which is case-insensitive), append those members of available-values to preferred-available (preserving the precedence order implied by the media ranges’ specificity).
5. Return preferred-available.

Note that this algorithm explicitly ignores extension parameters on media types (e.g., “charset”).

This section defines handling for Accept-Encoding variants, as per [RFC7231], Section 5.3.4.

To perform content negotiation for Accept-Encoding given a request-value and available-values:

1. Let preferred-available be an empty list.
2. Let preferred-codings be a list of the codings in the request-value, ordered by their weight, highest to lowest, as per [RFC7231], Section 5.3.1 (omitting any coding with a weight of 0). If “Accept-Encoding” is not present or empty, preferred-codings will be empty. If a coding lacks an explicit weight, an implementation MAY assign one.
3. If “identity” is not a member of preferred-codings, append “identity”.
4. Append “identity” to available-values.
5. For each preferred-coding in preferred-codings:
   1. If there is a case-insensitive, character-for-character match for preferred-coding in available-values, append that member of available-values to preferred-available.
6. Return preferred-available.

Note that the unencoded variant needs to have a Variant-Key header field with a value of “identity”.

This section defines handling for Accept-Language variants, as per [RFC7231], Section 5.3.5.

To perform content negotiation for Accept-Language given a request-value and available-values:

1. Let preferred-available be an empty list.
2. Let preferred-langs be a list of the language-ranges in the request-value, ordered by their weight, highest to lowest, as per [RFC7231], Section 5.3.1 (omitting any language-range with a weight of 0). If a language-range lacks a weight, an implementation MAY assign one.
3. If the first member of available-values is not a member of preferred-langs, append it to preferred-langs (thus making it the default).
4. For each preferred-lang in preferred-langs:
   1. If any member of available-values matches preferred-lang, using either the Basic or Extended Filtering scheme defined in [RFC4647], Section 3.3, append those members of available-values to preferred-available (preserving their order).
5. Return preferred-available.
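The Accept-Language steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses Basic Filtering only, assigns a weight of 1.0 to language-ranges without an explicit weight, treats an unparseable weight as 0, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_weighted(request_value: str) -> List[str]:
    """Return the members of an Accept-* value by descending weight, omitting q=0."""
    weighted: List[Tuple[float, str]] = []
    for member in request_value.split(","):
        parts = [part.strip() for part in member.split(";")]
        if not parts[0]:
            continue  # empty header or empty list member
        weight = 1.0  # assumed default when no explicit weight is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(value)
                except ValueError:
                    weight = 0.0  # assumption: unparseable means unacceptable
        if weight > 0:
            weighted.append((weight, parts[0]))
    # sorted() is stable, so members with equal weight keep header order.
    return [item for _, item in sorted(weighted, key=lambda pair: -pair[0])]


def basic_filter_match(language_range: str, tag: str) -> bool:
    """Basic Filtering: exact match, prefix match at a "-" boundary, or "*"."""
    language_range, tag = language_range.lower(), tag.lower()
    return (
        language_range == "*"
        or tag == language_range
        or tag.startswith(language_range + "-")
    )


def negotiate_language(request_value: str, available_values: List[str]) -> List[str]:
    """Return the available-values acceptable to the client, most preferred first."""
    preferred_available: List[str] = []
    preferred_langs = parse_weighted(request_value)
    known = [lang.lower() for lang in preferred_langs]
    # The first available-value acts as the default.
    if available_values and available_values[0].lower() not in known:
        preferred_langs.append(available_values[0])
    for preferred_lang in preferred_langs:
        # May add a value twice when ranges overlap; the steps above do not
        # call for de-duplication, so none is done here.
        preferred_available.extend(
            value
            for value in available_values
            if basic_filter_match(preferred_lang, value)
        )
    return preferred_available
```

For instance, `negotiate_language("fr;q=1.0, en;q=0.1", ["en", "fr"])` returns `["fr", "en"]`, while a request with no Accept-Language value falls back to the first available-value, `["en"]`.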