Network Working Group C. Drechsler, Ed.
Internet-Draft Technische Universitaet Chemnitz
Intended status: Standards Track May 16, 2016
Expires: November 17, 2016

Hypertext Transfer Protocol: Improved HTTP Caching
draft-drechsler-httpbis-improved-caching-05

Abstract

This document describes an improved HTTP caching method which can be applied in addition to the standard caching behavior for HTTP. It defines the associated header field that controls this improved caching mechanism and a modified caching operation which is slightly different to standard caching operation for HTTP.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 17, 2016.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.


Table of Contents

1. Introduction

HTTP caching has a significant potential for reducing Interdomain traffic, especially when shared caches are used within operator networks. Recent studies have shown very promising results regarding the cacheability of HTTP traffic (see [Ager], [Erman]).

Unfortunately this potential can not be fully used by the standard caching behavior described in [RFC7234]. The following two reasons mainly limit the benefit of caching today:

  1. Different URLs for one specific resource:

  2. Personalization of HTTP messages in the header:

The goal of this proposal is to address these challenges and come up with caching, varying URLs and personalization.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Specification

The approach for an improved HTTP caching in this proposal is twofold.

Section 2.1 introduces a new header field with a hash value. This is used for precisely identifying the transfered content in the body of HTTP messages and to signal the permission for caching and reusing of the body in intermediate cache systems.

The modified caching operation described in Section 2.2 uses the above-mentioned header field and ensures that all headers (of HTTP request and response messages) are exchanged between client and server even if the body of a response message is coming from an intermediate cache systems.

2.1. HTTP header field extension

For precisely identifying the transfered content independent of the used URL and independent of additional header fields in the context of content negotiation the following header field is used:

    Cache-NT: sha-256 "=" <base64 encoded sha256 output>
            

The new header field carries an SHA-256 value (algorithm as in [SHS]) which is computed and encoded the following way:

  1. When a client wants to retrieve a specific content it uses a HTTP GET request with a URL to address the resource. Additionally the client can use further header fields to negotiate that representation of the resource which fits best for the client (this mechanisms is called content negotiation in [RFC7231]). The SHA-256 value MUST be computed over that representation of the resource which would be send by the server to the client in case of a successful response with status code 200 OK.
  2. The SHA-256 hash value MUST be computed before the modifications of the possibly present header fields Content-Encoding, Content-Range and Transfer-Encoding are applied.
  3. The SHA-256 hash value MUST always be computed over the full representation even if only parts of it are transfered to the client (e. g. partial content, delta encoding). The hash value serves as an unique identifier for intermediate cache systems to identify also parts of the full representation.
  4. The SHA-256 hash value MUST be computed by the origin server. It SHOULD be computed only once (when the resource is made available on the server or when the resource has changed). It SHOULD NOT be computed in the moment when the server receives the request due to not delaying the response.
  5. After computing the SHA-256 hash value the output of it MUST be base64 encoded without line wrapping.

The Cache-NT header field is send by the server in successful responses with status codes 200 or 206. If the header field is present then the server signals that the body of the response can be used for caching by intermediate cache systems for subsequent requests in compliance with the cache operation described in Section 2.2.

In the following some examples are given:

Example header field:

    Cache-NT: sha-256=ZDJhODRmNGI4YjY1M ... DgyMjlkYTgwNGEyNiAgLQo=

Example for computation of the hash value under UNIX:

    sha256sum PopularVideo.mp4 | base64 -w0

Several examples of request-response pairs:

  a)
    +---------------------------------------------+
    | GET /videos/PopularVideo.webm HTTP/1.1      |
    | Host: example.com                           |
    +---------------------------------------------+

        +---------------------------------------------+
        | HTTP/1.1 200 OK                             |
        | Content-Type: video/webm                    |
        | Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
        | ...                                         |
        +---------------------------------------------+
  b)
    +---------------------------------------------+
    | GET /videos/PopularVideo.webm HTTP/1.1      |
    | Host: example.com                           |
    | Range: bytes=0-499                          |
    +---------------------------------------------+
    
        +---------------------------------------------+
        | HTTP/1.1 206 Partial Content                |
        | Content-Type: video/webm                    |
        | Content-Range: bytes 0-499/1000             |
        | Cache-NT: sha-256=AAAAAAAAAA...AAAAAAAAAA   |
        | ...                                         |
        +---------------------------------------------+
        
        => same hash value as in a) because only a part of the
           representation is requested
           
  c)
    +---------------------------------------------+
    | GET /videos/PopularVideo HTTP/1.1           |
    | Host: example.com                           |
    | Accept: video/mp4                           |
    +---------------------------------------------+
    
        +---------------------------------------------+
        | HTTP/1.1 200 OK                             |
        | Content-Type: video/mp4                     |
        | Cache-NT: sha-256=BBBBBBBBBB...BBBBBBBBBB   |
        | ...                                         |
        +---------------------------------------------+
        
        => different representation as in a) and b) results in a
           different hash value
            

2.2. Modified cache operation

The modified cache operation is slightly different to the one in [RFC7234]. It uses the header field described in Section 2.1 and ensures that all headers (of HTTP request and response messages) are exchanged between client and server even if the body of a response message is coming from an intermediate cache systems. Client requests will never terminate at intermediate cache systems as in [RFC7234].

2.2.1. Incoming Request Messages

Incoming request messages MUST always be forwarded to the origin server by the intermediate cache system.

For HTTP/1.0 or HTTP/1.1 requests the cache system SHOULD keep track of the desired connection state by evaluating the Connection header field.

For HTTP/1.1 requests the cache system MUST keep track of all pipelined requests.

2.2.2. Incoming Response Messages

The cache system analyzes the header of incoming response messages. If the status code IS NOT 200 or 206 then the response is forwarded to the client without modifications. If the status code IS 200 or 206 then the cache system looks for the Cache-NT header field (described in Section 2.1). Two situations can arise:

  1. The Cache-NT header field IS NOT present:

  2. The Cache-NT header field IS present:

+-----------------+                             +-----------------+
| HEADER (Client) | <-------------------------- | HEADER (Client) |
|-----------------|     request is forwarded    |-----------------|
|  BODY (Client)  | <-------------------------- |  BODY (Client)  |
+-----------------+                             +-----------------+

                                                
############               ############                ############
#          # <------------ #          # <------------- #          #
#  Server  #               #  Cache   #                #  Client  #
#          # ------------> #          # -------------> #          #
############               ############                ############


+-----------------+                             +-----------------+
| HEADER (Server) | --------------------------> | HEADER (Server) |
|-----------------|   HEADER (Server) + BODY    |-----------------|
|  BODY (Server)  |   (Cache) is forwarded      |                 |
|                 |             --------------> |  BODY (Cache)   |
        ...                     |               |                 |
                                |               +-----------------+
                                |
                                | local stored copy of the body is
                                | used and concatenated with the 
                                | header from the server
                                |
                                |
                   ============ | ============
                   ||           |           ||                   
                   ||           |           ||
                   ||  +-----------------+  ||     
                   ||  |                 |  ||
                   ||  |  BODY (Cache)   |  ||
                   ||  |                 |  ||
                   ||  +-----------------+  ||
                   ||                       ||
                   ||                       ||
                   ||                       ||
                   ||     cache storage     ||
                   ||                       ||
                   ===========================  
                

Cache operation in case of cache hit.

Figure 1

+-----------------+                             +-----------------+
| HEADER (Client) | <-------------------------- | HEADER (Client) |
|-----------------|     request is forwarded    |-----------------|
|  BODY (Client)  | <-------------------------- |  BODY (Client)  |
+-----------------+                             +-----------------+

                                                
############               ############                ############
#          # <------------ #          # <------------- #          #
#  Server  #               #  Cache   #                #  Client  #
#          # ------------> #          # -------------> #          #
############               ############                ############


+-----------------+                             +-----------------+
| HEADER (Server) | --------------------------> | HEADER (Server) |
|-----------------|  response (HEADER + BODY)   |-----------------|
|                 |  is forwarded               |                 |
|  BODY (Server)  | --------------------------> |  BODY (Server)  |
|                 |             |               |                 |
+-----------------+             |               +-----------------+
                                |
                                | copy of body is stored in cache
                                |
                                |
                   ============ | ============
                   ||           |           ||                   
                   ||           V           ||
                   ||  +-----------------+  ||     
                   ||  |                 |  ||
                   ||  |  BODY (Server)  |  ||
                   ||  |                 |  ||
                   ||  +-----------------+  ||
                   ||                       ||
                   ||                       ||
                   ||                       ||
                   ||     cache storage     ||
                   ||                       ||
                   ===========================  
                

Cache operation in case of cache miss.

Figure 2

2.3. Suggestions

In case of a cache hit the cache system aborts the transfer of the response body from the server after the whole header has been received (see Section 2.2). As the transfer of the body cannot be aborted immediately the server will still send some parts of the body. How many Kilobytes are transfered depends mainly on the congestion window of the underlying TCP connection. If the congestion window is small then only a few Kilobytes of the response will go over the wire.

Evaluations at Technische Universitaet Chemnitz have shown that at least around 20 Kilobytes are transfered between origin server and cache system in case of a cache hit (this is for a HTTP/1.0 or HTTP/1.1 request right after opening a TCP connection). Therefore including the Cache-NT header field for small resources does not make much sense from the point of caching as the whole body is being transfered before the cache system can abort it.

3. IANA Considerations

3.1. Header Field Registration

HTTP header fields are registered within the Message Header Field Registry maintained at <http://www.iana.org/assignments/message-headers/>.

This document defines the following HTTP header fields, so their associated registry entries shall be updated according to the permanent registrations below (see [BCP90]):

Header Field Name Protocol Status Reference
Cache-NT http proposed standard Section 2.1

The change controller is: "IETF (iesg@ietf.org) - Internet Engineering Task Force".

3.2. Cache Directive Registration

This document defines the following HTTP header field directives:

Cache Directive Reference
sha-256 Section 2.1

4. Security Considerations

This section is meant to inform developers, information providers, and users of known security concerns specific to the caching mechanism described in this proposal. In addition more general security considerations of HTTP caching are discussed in Section 8 of [RFC7234].

The cache operation in Section 2.2 uses the Cache-NT header field (see Section 2.1) in incoming response messages. If the hash value in the Cache-NT header field of the (server) response does not correspond to the representation in the body of that response then a wrong body is maybe concatenated to the header of the server and send to the client (this occurs when the cache system has an cache entry which fits to the hash value in the response of the server). Origin server SHOULD always include the correct hash value in the Cache-NT header field which fits to the representation in the body. Intermediaries MUST NOT change the hash value in the Cache-NT. In addition the client can compute the hash value over the full representation (in case of responses with 200 OK) itself and can re-validate it with the value in the Cache-NT header field.

If a cache system does not have a cache entry which fits to the hash value in the Cache-NT header field then it forwards the response to the client and stores a local copy of the body (see Section 2.2). To prevent cache poisoning the cache system SHOULD compute the hash value over the full representation in the body (in case of responses with 200 OK) itself and SHOULD re-validate it with the value in the Cache-NT header field.

Another security concern will arise if significant security flaws in the used hash algorithm (currently SHA-256) are detected. Then the cache can easily be poisoned. In this case origin servers and intermediate cache systems MUST switch to another hash algorithm (e. g. SHA-512 or the upcoming SHA-3 family).

5. References

5.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC7234] Fielding, R., Nottingham, M. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Caching", RFC 7234, June 2014.

5.2. Informative References

[Ager] Ager, B., Schneider, F., Juhoon, K. and A. Feldmann, "Revisiting Cacheability in Times of User Generated Content", IEEE Conference on Computer Communications, Workshops pp. 1-6, March 2010.
[BCP90] Klyne, G., Nottingham, M. and J. Mogul, "Registration Procedures for Message Header Fields", BCP 90, RFC 3864, September 2004.
[Erman] Erman, J., Gerber, A., Hajiaghayi, M., Pei, D. and O. Spatscheck, "Network-aware forward caching", Proceedings of the 18th international conference on World wide web pp. 291-300, 2009.
[RFC7231] Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content", RFC 7231, June 2014.
[SHS] National Institute of Standards and Technology, "Secure Hash Standard (SHS)", FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION 180-4, U.S. Department of Commerce , March 2012.

Author's Address

Chris Drechsler (editor) Technische Universitaet Chemnitz 09107 Chemnitz, Germany EMail: chris.drechsler@etit.tu-chemnitz.de