D. Li
                                                                Cisco
                                                               P. Cao
                                                                Cisco
                                                            M. Dahlin
                                                        Univ of Texas
Internet Draft
Document: draft-danli-wrec-wcip-01.txt                     March 2001
Category: Experimental


                 WCIP: Web Cache Invalidation Protocol


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Cache consistency is a major impediment to scalable content
   delivery, because periodically revalidating objects one by one is
   unacceptable in terms of performance and/or cache consistency. This
   document describes the Web Cache Invalidation Protocol (WCIP). WCIP
   uses invalidations and updates to keep changing objects up to date
   in web caches. It thus enables proxy caching and content
   distribution of large amounts of frequently changing web objects.

   WCIP runs between the invalidation server, the participating web
   caches, and channel relay points (if any). An invalidation server
   maintains one or more invalidation channels, each of which covers a
   class of related objects, called an "object volume". E.g., the CNNfn
   channel may cover an object volume with the day's top financial news
   and stock quotes. Web caches subscribe to channel(s) they are
   interested in, while the invalidation server(s) send out
   invalidations and/or up-to-date objects to the channel(s). Besides
   server-driven invalidation, WCIP also supports client-driven
   validation of object volumes.


Li & Cao & Dahlin   Experimental - September 2001                   1

                     Draft-danli-wrec-wcip-01.txt           March 2001


   WCIP employs heartbeats to guarantee the freshness of the cached
   objects even under network or server failure. Moreover, WCIP can set
   up channel relay points via a cache hierarchy or a CDN (content
   delivery network). A channel relay point performs channel relay
   (one-to-many) and connection aggregation (many-to-one).

Revision Log

   1. Introduce the concept of "object volume" to clear up the common
      confusion between the transport (invalidation channel) and the
      unit of consistency (object volume).
   2. Remove "targeted service" (for now) as it significantly
      complicates the protocol without obvious benefit, given that
      "channel" is already a form of coarse filtering.
   3. Describe "client-driven volume validation" as an operation mode
      as opposed to a special case of channel registration with
      infinite heartbeat interval.
   4. Specify the "channel abstraction". Describe an HTTP-based channel
      implementation. Other implementations, e.g., Beep, IP multicast,
      can be future work.
   5. Add the protocol state machine.

Table of Contents

   1. Introduction
   2. Terminology
   3. Design Issues
        3.1 Freshness Guarantee
        3.2 Object Volume
        3.3 Channel Abstraction
   4. Deployment Issues
        4.1 Channel Relay
        4.2 Detect Changes
        4.3 Discover Channels
        4.4 Join Channels
   5. Protocol Specification
        5.1 Object Volume DTD
        5.2 Client-initiated Volume Synchronization
        5.3 Server-initiated Volume Synchronization
        5.4 Serving Content
   6. Protocol State Machine
        6.1 Client State Machine
        6.2 Server State Machine
   7. Security Concerns
   8. References
   9. Acknowledgments
   10. Authors' Addresses


1. Introduction

   In web proxy caching, a document is downloaded once from a web
   server to a caching proxy, which then serves the document to
   end-users repeatedly out of the cache. This reduces the load on the
   web server, improves the response time to the users, and reduces
   bandwidth consumption. When the document seldom changes, everything
   works out wonderfully. The hard case is a document that is popular
   but also frequently changing.

   Frequently changing content is quickly becoming a significant
   percentage of Web traffic, e.g., news and stock quotes, shopping
   catalogs and prices, product inventory and orders, etc. Because the
   content is changing, the caching proxy has to poll the web server
   frequently for a fresh copy and still tends to return stale data to
   end-users. Specifically, a proxy using "adaptive TTL" is unable to
   ensure strong cache consistency, and yet "poll every time" is costly
   [1]. So a content provider usually sets a very short expiration time
   or marks frequently changing documents as non-cacheable altogether.
   This defeats the benefit of caching, even though those objects could
   be cached if the proxy knew when each document becomes obsolete [2].
   Moreover, if the proxy can be informed of changes to the underlying
   data from which a web object is generated, the proxy can regenerate
   the web object on its own, making it possible to distribute
   "dynamically computed content".

   Addressing this problem, WCIP (Web Cache Invalidation Protocol)
   provides freshness guarantees to content providers while keeping the
   cost of doing so low. Using WCIP, a web server can advertise to
   caching proxies an "object volume" and the corresponding
   invalidation channel, identified by a URI.

   To provide freshness guarantees to objects in the object volume, a
   caching proxy subscribes to the invalidation channel and obtains an
   up-to-date view of the object volume -- a process referred to as
   "volume synchronization". After the initial volume synchronization,
   to stay synchronized, the invalidation channel operates in either
   the server-driven mode or the client-driven mode (or a mix of both).

   In the server-driven mode, the invalidation server sends
   invalidations to the channel whenever changes happen to the volume,
   while the proxy listens passively. The server also generates
   heartbeats so that the freshness guarantees can be met even upon
   network partition or server crash. The heartbeat interval is
   determined by the freshness guarantees required for the object
   volume.

   In the client-driven mode, the invalidation server doesn't
   proactively send updates to the channel. The caching proxy
   periodically initiates "volume synchronization" to revalidate the
   volume, at which time the invalidation server returns all the
   updates made to the volume since the last time the proxy validated
   the volume. The revalidation interval is determined by the freshness
   guarantees required for the object volume.

   The two modes are merely the two extremes of a continuum,
   characterized by how soon the server proactively sends
   updates/heartbeats and how soon the proxy revalidates the volume.
   The sooner the revalidation, the quicker objects are invalidated;
   this results in better consistency but also more load on the server
   and proxy. Regardless of the mode, the same messages are exchanged
   between the invalidation server and the caching proxies; their
   format is defined by the "ObjectVolume" XML DTD in Section 5.1. Each
   round of message exchange, whether initiated by the server or the
   client, is a process of "volume synchronization" and results in an
   up-to-date view of the object volume. Based on the up-to-date view,
   the proxy can provide freshness guarantees to all the objects in the
   volume.


2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [3].
   Since WCIP makes some extensions to HTTP, please refer to RFC-2616
   [4] for HTTP related terminology. Following are WCIP-related terms.

   Cache Consistency

        A property by which a replica data item reflects its master copy
   to a certain degree. There are at least 3 degrees. (1) Strong
   consistency -- the replica must always be the same as the master.
   (2) Delta consistency -- the replica must become the same as the
   master at most "delta" seconds after the master is updated. (3)
   Eventual consistency -- the replica must become the same as the
   master at some unknown point in the future. WCIP provides "delta
   consistency" where "delta" is the freshness guarantee.

   Freshness Guarantee

        A promise that the invalidation client will not serve content
   (belonging to the object volume) from the cache after X seconds of
   known or presumed update at the origin server, where X is the
   freshness guarantee and is specified by the content provider. In
   other words, the invalidation client never delivers cached content
   that is more than X seconds stale, regardless of network partition,
   proxy failure, or server failure. A freshness guarantee provides
   "delta consistency" and also allows "eventual consistency" (i.e.,
   when X is infinite).

   Object Volume

        A set of correlated web objects, their consistency state, and
   their freshness guarantee. The object volume is employed as the unit
   of consistency as well as the unit of filtering. A consistent view
   of the object volume implies the consistent view of every object in
   the object volume. Also called "volume".

   Volume Synchronization

        The act of updating the object volume at the invalidation
   client with that at the invalidation server. Either the server or
   the client can initiate volume synchronization. After the volume
   synchronization, the two mutually agree on the consistency states of
   objects in the volume, for up to the duration of the freshness
   guarantee.

   Last Synchronization Time

        The time of the last volume synchronization.

   Invalidation Channel

        A transport abstraction that carries messages between the
   invalidation server and the invalidation client(s) for the purpose
   of volume synchronization. Also called "channel".

   Invalidation Server

        An application program that provides WCIP services to caching
   proxies. It maintains the master copy of the object volume and
   disseminates the volume and changes to volume to caching proxies.
   (The invalidation server logically differs from the origin server
   because a cache may fill a request from a CDN content server or a
   replica origin server. The cache may not be able to tell these
   various sources from the origin server. The WCIP service may not
   reside on each or any of them. "Invalidation server" uniquely
   identifies the source of the WCIP service.)

   Invalidation Client

        A web cache, usually a caching proxy, which subscribes to the
   invalidation channel and maintains a consistent view of the object
   volume. Also referred to as the "proxy".

   Server-driven mode

        An operation mode, where the invalidation server proactively
   sends changes made to the object volume as well as heartbeats to
   invalidation clients via the invalidation channel, for volume
   synchronization.

   Client-driven mode

        An operation mode, where the invalidation client periodically
   queries the invalidation server via the invalidation channel, for
   volume synchronization. The server replies with the changes made to
   the object volume since the last time the client asked.

   Channel Address

        Information that a caching proxy needs in order to access the
   channel, e.g., the name of the channel, the address (domain name) of
   the invalidation server, the security mode, etc.

   Channel Relay

        An intermediary program that subscribes to one or multiple
   invalidation channels on behalf of its clients (e.g., downstream
   proxies) and relays the channel messages to its clients. It MUST
   implement both the invalidation server and the invalidation client.

   Heartbeat

        A periodic message sent by the invalidation server to keep the
   channel from being silent for too long. It allows the invalidation
   client to verify the channel connectivity and source liveness, so
   as to confirm that the volume remains synchronized.

   Heartbeat Interval

        A property of the server-driven mode. The invalidation server
   sends a heartbeat to the invalidation channel if the channel has
   been silent for the last heartbeat interval. The interval MUST be
   smaller than the freshness guarantees of the objects in the object
   volume, or the object volume may lose synchronization.

   Revalidation Interval

        A property of the client-driven mode. The invalidation client
   initiates volume synchronization with the invalidation server when
   one "revalidation interval" has elapsed since the "last
   synchronization time". The
   interval SHOULD be smaller than the freshness guarantees of all the
   objects in the object volume, to avoid unnecessary cache misses.

   Invalidation Latency

        The time from when an object is updated at the origin server
   to when the old copy is treated as stale at all the participating
   proxies. The goal of a freshness guarantee of X seconds is to
   guarantee that the invalidation latency is within X seconds at all
   times.

   Content Delivery Network (CDN)

        A self-organizing network of geographically distributed content
   delivery nodes (reverse proxies) for contracted content providers,
   capable of directing requests to the best delivery node for global
   load balancing and best client response time.


3. Design Issues

   Before the specifics, here are some design principles this protocol
   tries to follow:

   (1)  Simple and effective: try to design a lightweight client and
        leave complexity to the server, then use multicast and channel
        relay points to address the server scalability. Also, try to
        leverage off-the-shelf components as much as possible, e.g.,
        HTTP, SSL, XML, and Beep.

   (2)  Logical separation of the invalidation server and the origin
        server: this is because WCIP needs to work with CDNs and
        distributed data centers. There may be multiple authoritative
        sources of an object. "Invalidation server" uniquely identifies
        where the invalidation source is, not where the content is
        initially fetched from. It also allows for delegation of
        invalidation service to a 3rd party, possibly a CDN provider.

   (3)  Clear separation of the notification transport and the
        notification semantics: WCIP includes a transport abstraction
        (invalidation channel) and, on top of it, the cache consistency
        semantics (object volume). This separation makes the protocol
        clearly layered, easier to understand, and extensible. Moreover,
        the message body is specified in XML, making the protocol
        extensible to other types of notifications.

3.1 Freshness Guarantee

   WCIP provides reliable invalidations and consistency guarantees so
   that content providers can make their frequently changing content
   cacheable. It's important that WCIP guarantees that, in the worst
   case, a proxy subscribed to an invalidation channel will not service
   stale content X seconds after the content is updated at the origin
   server, regardless of network partition or server failure. The
   content provider can specify the value of X, e.g., 5 minutes.

   In the normal case, this is not hard. Using WCIP, the proxy stops
   delivering a stale object as soon as the corresponding invalidation
   arrives from the server. The invalidation latency then depends only
   on network propagation
   and queuing delay, which are typically within a second. In other
   cases, however, when the network or the invalidation server is down,
   invalidations cannot reach the proxy in a timely fashion. To ensure
   an upper bound on the invalidation latency, the proxy MUST
   invalidate content automatically if it hasn't been able to
   synchronize the object volume for a certain period of time, assuming
   the server or network may be down and the volume may have changed.

   Therefore, to control the freshness, the content provider specifies
   a "freshness guarantee" for each object in the volume, while the
   caching proxy keeps track of the "last synchronization time". Then,
   upon serving a client HTTP request, the proxy MAY use the cached
   object only if the time elapsed since the last synchronization time
   is less than the object's freshness guarantee. Otherwise, the cached
   object is marked as stale and MUST NOT be served from the cache
   without HTTP revalidation. It is RECOMMENDED that the proxy not
   remove the object right away, as HTTP revalidation may indicate that
   the object is "Not Modified".
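
   The serving rule above reduces to one comparison per request. The
   following is a minimal Python sketch, assuming all times are seconds
   on the proxy's own clock; the function and parameter names are
   hypothetical and not defined by WCIP. An infinite guarantee models
   the "eventual consistency" case discussed later in this section.

```python
import math


def may_serve_from_cache(now: float, last_sync_time: float,
                         freshness_guarantee: float) -> bool:
    """Hypothetical helper: may a cached object in a synchronized volume
    be served without HTTP revalidation?

    Times are seconds on the proxy's local clock. A freshness guarantee of
    math.inf models eventual consistency.
    """
    elapsed = now - last_sync_time
    # Usable only while the time since the last volume synchronization is
    # strictly less than the object's freshness guarantee.
    return elapsed < freshness_guarantee


# A 5-minute (300 s) guarantee, last synchronized at t = 1000 s.
print(may_serve_from_cache(1200.0, 1000.0, 300.0))   # True: 200 s elapsed
print(may_serve_from_cache(1300.0, 1000.0, 300.0))   # False: 300 s elapsed
print(may_serve_from_cache(1e9, 1000.0, math.inf))   # True
```

   When the check fails, the object is only marked stale rather than
   evicted, so a later HTTP revalidation that returns "Not Modified" can
   still reuse the cached body.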

   To prevent unnecessary cache misses during normal operation, the
   "last synchronization time" needs to be kept within the freshness
   guarantees. Hence, in the server-driven mode, the invalidation
   server sends heartbeats whenever the channel has been silent for the
   last "heartbeat interval", so as to confirm to the proxies that the
   volume hasn't changed. Similarly, in the client-driven mode, for
   every "revalidation interval", the proxy queries the invalidation
   server to make sure it holds the up-to-date copy of the volume.

   The invalidation server picks the heartbeat interval while the
   invalidation client picks the revalidation interval. Both of them
   SHOULD be smaller than any of the freshness guarantees of the
   objects in the volume, to avoid unnecessary cache misses. Moreover,
   the invalidation server SHOULD send invalidations "reasonably" soon
   after it learns of an object change, but it MAY delay the
   synchronization until some time before the subsequent heartbeat.
   Such a strategy allows the server to batch multiple changes into one
   update, without inducing unnecessary cache misses.
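
   The interval rule can be checked when a channel is configured. The
   Python sketch below is a hypothetical helper that reports any
   heartbeat or revalidation interval that is not smaller than the
   smallest freshness guarantee in the volume; None means the interval
   is not in use (e.g., no heartbeat in pure client-driven mode). Note
   that the Terminology section states the heartbeat case as a MUST.

```python
from typing import Iterable, List, Optional


def interval_warnings(freshness_guarantees: Iterable[float],
                      heartbeat_interval: Optional[float] = None,
                      revalidation_interval: Optional[float] = None) -> List[str]:
    """Hypothetical check: each interval in use should be smaller than
    every freshness guarantee in the volume. All values are seconds."""
    tightest = min(freshness_guarantees)   # the strictest guarantee governs
    problems: List[str] = []
    for name, value in (("heartbeat interval", heartbeat_interval),
                        ("revalidation interval", revalidation_interval)):
        if value is not None and value >= tightest:
            problems.append(f"{name} ({value:g} s) is not smaller than the "
                            f"smallest freshness guarantee ({tightest:g} s)")
    return problems


print(interval_warnings([300, 600], heartbeat_interval=60))   # []
print(interval_warnings([300, 600], revalidation_interval=300))
```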

   In essence, there are two consistency concepts: the average-case
   staleness and the worst-case staleness:

   (1)  The worst-case staleness is bounded by the freshness guarantee
        and enforced by the proxy not using a cached object if the time
        elapsed since the last volume synchronization time is more than
        the object's freshness guarantee.

   (2)  The average-case staleness is controlled by the heartbeat
        interval and the revalidation interval. The more aggressively
        that the server sends invalidations or the proxy revalidates
        the volume, the better average-case staleness that can be
        achieved.

   At one extreme, the server sends invalidations immediately when an
   object is modified. Then average-case staleness (for clients that
   are reachable) is on the order of server queue delays plus network
   delays (typically only a few seconds). At the other extreme, the
   server doesn't bother with pushing invalidations. Then the
   average-case staleness equals the worst-case staleness. Middle
   ground is feasible. For example, a server can batch invalidations:
   every 30 seconds, it sends the clients a list of the invalidations
   that have happened in the last 30 seconds.
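
   A variant of the 30-second batching example can be sketched as a
   server-side scheduler that holds each batch for at most 30 seconds
   after its first invalidation and otherwise emits heartbeats when the
   channel has been silent. This Python model is hypothetical: the class
   name, the tick-driven design, and the dictionary messages are
   illustrative assumptions and do not follow the ObjectVolume message
   format.

```python
from typing import Dict, List, Optional


class BatchingChannelSender:
    """Hypothetical server-side sender for one invalidation channel.

    Invalidations are held for at most `batch_period` seconds. A heartbeat
    is sent when the channel has been silent for `heartbeat_interval`.
    """

    def __init__(self, batch_period: float, heartbeat_interval: float,
                 start: float = 0.0) -> None:
        if not batch_period < heartbeat_interval:
            raise ValueError("batch period must be shorter than the heartbeat interval")
        self.batch_period = batch_period
        self.heartbeat_interval = heartbeat_interval
        self.pending: List[str] = []     # invalidated URLs not yet sent
        self.batch_started = start       # arrival time of the oldest pending URL
        self.last_sent = start           # time of the last message on the channel

    def invalidate(self, url: str, now: float) -> None:
        if not self.pending:
            self.batch_started = now
        self.pending.append(url)

    def tick(self, now: float) -> Optional[Dict[str, object]]:
        """Return the message to send at time `now`, or None."""
        heartbeat_due = now - self.last_sent >= self.heartbeat_interval
        batch_due = bool(self.pending) and now - self.batch_started >= self.batch_period
        msg: Dict[str, object]
        if self.pending and (batch_due or heartbeat_due):
            # Never send a bare heartbeat while changes are queued: a
            # heartbeat tells clients that the volume has not changed.
            msg = {"type": "invalidate", "urls": self.pending}
            self.pending = []
        elif heartbeat_due:
            msg = {"type": "heartbeat"}
        else:
            return None
        self.last_sent = now
        return msg


sender = BatchingChannelSender(batch_period=30, heartbeat_interval=60)
sender.invalidate("/quotes/CSCO", now=5)
sender.invalidate("/quotes/IBM", now=12)
print(sender.tick(20))   # None
print(sender.tick(35))   # {'type': 'invalidate', 'urls': ['/quotes/CSCO', '/quotes/IBM']}
print(sender.tick(95))   # {'type': 'heartbeat'}
```

   Flushing pending invalidations in place of a heartbeat keeps the
   batching delay from ever pushing the channel past its heartbeat
   interval.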

   While WCIP uses the freshness guarantee to provide "delta
   consistency", it also supports a more relaxed form of consistency,
   "eventual consistency", in which the freshness guarantee is set much
   larger than the typical object modification interval, or even set to
   infinity. WCIP then behaves like best-effort invalidation delivery
   and is subject to network and server failures. As long as the
   heartbeat interval or revalidation interval is not infinite, the
   caching proxies achieve eventual consistency.


3.2 Object Volume

   An "object volume" is a set of correlated objects that are updated
   by the same invalidation channel. E.g., an "eBay auction" channel
   contains the most active auction pages. An "SFO flight schedule"
   channel contains pages describing various airline flights that are
   departing from or arriving at SFO. "Object volume" serves two
   purposes: as the unit of filtering and as the unit of consistency.

   Using volume as the unit of filtering, caching proxies may subscribe
   to updates for certain object volumes but not others, based on the
   interests of the population they are serving. The strategy for
   forming volumes is to group "correlated" objects into one "volume".
   This way, if a caching proxy is interested in some objects in the
   volume, it's highly likely that it is or will be interested in the
   other objects in the volume as well. Examples include CNNfn, ESPN,
   NBA, etc., similar to TV programming. An object volume of reasonable
   size and correlation helps to reduce unwanted invalidations and
   amortize the channel cost [5][6][7].

   Object volume also serves as the unit of consistency. If the caching
   proxy obtains the up-to-date view of the volume, it follows that the
   caching proxy has the up-to-date view of every object in the volume
   (in terms of Last-Modified time and ETag). This allows one "volume
   synchronization" exchange to (in)validate all the objects in the
   volume, greatly improving efficiency compared to per-object HTTP
   validation. Moreover, when a web site updates its content, often it
   would like to preserve a consistent view of the site. I.e., it would
   like the end-users to see either entirely the new content or
   entirely the old content, not a bit of the new at some web pages and
   a bit of the old at other pages. By grouping these correlated web
   pages into one object volume, one can atomically invalidate the
   entire volume and thus preserve the coherent view.
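
   To make the "unit of consistency" concrete, the sketch below models
   a proxy's view of a volume as a version number plus per-object
   metadata (Last-Modified and ETag), and applies each synchronization
   atomically so that readers see either the whole old view or the
   whole new view. It is a hypothetical in-memory Python model; the
   class and field names are assumptions and do not follow the
   ObjectVolume DTD of Section 5.1.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ObjectMeta:
    last_modified: str            # HTTP-date of the current version
    etag: Optional[str] = None    # entity tag, if the server provides one


class ObjectVolume:
    """Hypothetical proxy-side view of one object volume."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.version = 0
        self.objects: Dict[str, ObjectMeta] = {}

    def apply(self, new_version: int,
              changes: Dict[str, Optional[ObjectMeta]]) -> None:
        """Apply one synchronization atomically; a None value removes a URL."""
        with self._lock:
            if new_version <= self.version:
                return                          # older or duplicate update
            updated = dict(self.objects)        # build the new view first
            for url, meta in changes.items():
                if meta is None:
                    updated.pop(url, None)
                else:
                    updated[url] = meta
            # Readers see either the old view or the new one, never a mix.
            self.objects, self.version = updated, new_version

    def is_current(self, url: str, cached: ObjectMeta) -> bool:
        """True if a cached copy matches the volume's view of the object."""
        with self._lock:
            return self.objects.get(url) == cached


vol = ObjectVolume()
old = ObjectMeta("Tue, 06 Mar 2001 10:00:00 GMT", '"v1"')
new = ObjectMeta("Tue, 06 Mar 2001 10:05:00 GMT", '"v2"')
vol.apply(1, {"/news/a.html": old, "/news/b.html": old})
vol.apply(2, {"/news/a.html": new, "/news/b.html": new})   # one atomic step
print(vol.is_current("/news/a.html", old))   # False
```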

   An object volume is described in XML (see Section 5.1 for the DTD). In
   essence, it is a collection of object meta-data and can be retrieved
   incrementally based on its version.

   Whenever the caching proxy subscribes to an invalidation channel,
   the first thing it does is to synchronize the object volume with the
   server. Before synchronization, the proxy knows nothing about the
   volume and cannot cache objects that are non-cacheable according to
   HTTP Cache-Control directives. After synchronization, the proxy
   knows what objects are covered by the volume, whether its local
   cache copies are stale or not, and each object's freshness
   guarantee. The proxy SHOULD ignore the normal HTTP cacheability and
   expiration controls for these objects, such as the no-store and
   max-age Cache-Control directives and the Expires header. But it
   SHOULD still honor directives such as "private". [Note: a thorough
   specification of HTTP Cache-Control handling is needed.]

   Once synchronized, re-synchronizing the volume does not require
   returning the
   entire object volume again. Depending on the last version the proxy
   synchronized, the server sends the list of changes made to the
   volume since the last version, which can be substantially smaller
   than the entire volume. This facilitates quick re-synchronization.
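
   On the server side, incremental re-synchronization needs a journal
   of changes indexed by volume version. The Python sketch below shows
   one hypothetical way to keep a bounded journal; the class and method
   names are illustrative, and returning the full volume when a client's
   version predates the retained journal is an assumption, not something
   this section specifies.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Change = Tuple[str, Optional[str]]   # (URL, new ETag); None means removed


class VolumeJournal:
    """Hypothetical server-side master copy of a volume plus a bounded journal."""

    def __init__(self, max_entries: int = 1000) -> None:
        self.version = 0
        self.state: Dict[str, str] = {}                   # URL -> current ETag
        self.journal: Deque[Tuple[int, Change]] = deque(maxlen=max_entries)

    def record(self, url: str, etag: Optional[str]) -> int:
        """Record one change to the volume and return the new version."""
        self.version += 1
        if etag is None:
            self.state.pop(url, None)
        else:
            self.state[url] = etag
        self.journal.append((self.version, (url, etag)))  # oldest entry drops off
        return self.version

    def changes_since(self, client_version: int) -> Tuple[str, int, object]:
        """Return ("delta", version, changes) or ("full", version, snapshot)."""
        oldest = self.journal[0][0] if self.journal else self.version + 1
        if client_version + 1 >= oldest:
            changes: List[Change] = [c for v, c in self.journal if v > client_version]
            return "delta", self.version, changes
        # The client is older than the retained journal: send the whole volume.
        return "full", self.version, dict(self.state)


j = VolumeJournal(max_entries=2)
j.record("/a", '"1"')
j.record("/b", '"1"')
j.record("/a", '"2"')
print(j.changes_since(1))      # ('delta', 3, [('/b', '"1"'), ('/a', '"2"')])
print(j.changes_since(0)[0])   # full
```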

3.3 Channel Abstraction

   An invalidation channel may be implemented in many different ways,
   e.g., using HTTP, Beep, or IP multicast. It's out of the scope of
   WCIP to design this transport layer. However, specified here is the
   channel abstraction that a specific implementation ought to provide:

   (1)  Naming: channels are named as URIs. For interoperability, the
        channel name MUST indicate the transport implementation. E.g.,
        "wcip://my.net:80/channel/name?proto=http" denotes a channel
        carried on top of HTTP. A channel implementation MUST be able
        to translate the channel URI into the addressing information
        that the implementation is using.

   (2)  Subscription: provide an interface for channel subscription
        based on the channel URI as well as notifying the upper layer
        whenever the subscription terminates unexpectedly. Once
        subscribed, the caching proxy can start to send to and receive
        from the channel.

   (3)  Framing: channel messages are self-describing and well-formed
        XML text. Each "send" or "recv" by the invalidation server or
        client transfers exactly one complete XML message.

   (4)  Delivery: delivery SHOULD be real-time in that the average
        latency should be comparable to the network round-trip time
        from the sender to the receiver. It's RECOMMENDED that the
        delivery be reliable, full duplex, and in sequence (with
        respect to each sender) to achieve good performance, although
        it's not required.

   (5)  Security: a channel can be configured as clear text, signed for
        integrity, or encrypted for secrecy. Channel subscription can
        be either open or authenticated.

   (6)  Scalability: help to ensure that the invalidation server
        doesn't become overwhelmed by excessive load, by providing
        either IP multicast (later in this section) or channel relay
        (see section 4.1).

   (7)  Environment: be able to operate across wide-area networks and
        across administrative domains (i.e., firewalls), some of which
        may be multicast-capable and some of which may not.
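
   Items (1) to (4) above can be read as a small interface that every
   transport implements. The Python typing.Protocol below is a
   hypothetical rendering of that abstraction: the method names, the
   on_lost callback, and the "<heartbeat/>" message are assumptions, and
   a toy in-memory loopback stands in for a real transport so that the
   sketch runs.

```python
from typing import Callable, List, Protocol


class InvalidationChannel(Protocol):
    """Hypothetical transport-independent channel interface."""

    def subscribe(self, channel_uri: str, on_lost: Callable[[str], None]) -> None:
        """Join the channel; on_lost(uri) is called if the subscription drops."""

    def send(self, xml_message: str) -> None:
        """Send exactly one complete, well-formed XML message."""

    def recv(self) -> str:
        """Return exactly one complete XML message."""


class LoopbackChannel:
    """Toy in-memory implementation, used only to exercise the interface."""

    def __init__(self) -> None:
        self.uri = ""
        self.queue: List[str] = []
        self.on_lost: Callable[[str], None] = lambda uri: None

    def subscribe(self, channel_uri: str, on_lost: Callable[[str], None]) -> None:
        self.uri, self.on_lost = channel_uri, on_lost

    def send(self, xml_message: str) -> None:
        self.queue.append(xml_message)      # framing: one message per send

    def recv(self) -> str:
        return self.queue.pop(0)            # in-sequence delivery

    def drop(self) -> None:
        self.on_lost(self.uri)              # simulate an unexpected termination


def round_trip(channel: InvalidationChannel) -> str:
    channel.send("<heartbeat/>")
    return channel.recv()


lost: List[str] = []
ch = LoopbackChannel()
ch.subscribe("wcip://my.net:80/channel/name?proto=http", lost.append)
print(round_trip(ch))   # <heartbeat/>
ch.drop()
print(lost)             # ['wcip://my.net:80/channel/name?proto=http']
```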

   An implementation on top of HTTP (RFC-2616) is as follows:

   (1)  Naming: an HTTP-based channel is denoted as
        "wcip://<host>:<port>/a/hierarchical/name?proto=http".

   (2)  Framing: messages are sent as HTTP POST requests with the
        request body being the message. The request URI is the channel
        URI. The HTTP response carries the message response (if
        any).

   (3)  Subscription: a persistent connection is established to the
        host and port as specified in the channel URI. If multiple
        channels have the same server address and port, they can share
        the same persistent connection. Tear-down of a persistent
        connection and re-establishment of a new connection represents
        a possible loss of synchronization and MUST trigger a volume
        synchronization.

   (4)  Reliability: HTTP runs on top of TCP and is therefore reliable.
        All server-driven messages are sent on the persistent connection
        in first-come-first-served order. Pipelining may be used to
        improve latency and throughput.

   (5)  Security: use HTTPS when the channel needs to be secure. The
        URI is "wcip://<host>:<port>/a/hierarchical/name?proto=https".

   (6)  Scalability: when there are too many HTTP connections to the
        server, the server can instruct the channel implementation to
        use the HTTP Location header to redirect new connections to a
        multicast channel or a channel relay point, which can be chosen
        by static configuration or CDN routing.

   (7)  Environment: to cross administrative domains, the channel must
        be on a port allowed by the firewalls. If port 80 is used, the
        traffic may be intercepted by a transparent proxy that doesn't
        understand WCIP. Depending on its configuration, the
        transparent proxy may or may not pass on the traffic without
        interference. If it does not, either the transparent proxy must
        be reconfigured or a port other than 80 must be used.
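
   Items (3) and (4) above imply some client-side bookkeeping: channels
   whose URIs name the same host and port share one persistent
   connection, and re-establishing that connection must trigger volume
   synchronization for every channel carried on it. The Python sketch
   below shows only that bookkeeping, with no networking; the class and
   method names are hypothetical, and defaulting a missing port to 80 is
   an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple
from urllib.parse import urlsplit

Endpoint = Tuple[str, int]


class HttpChannelConnections:
    """Hypothetical client-side table of shared persistent connections."""

    def __init__(self) -> None:
        self.channels: DefaultDict[Endpoint, Set[str]] = defaultdict(set)
        self.needs_sync: Set[str] = set()

    @staticmethod
    def endpoint(channel_uri: str) -> Endpoint:
        parts = urlsplit(channel_uri)
        return parts.hostname, parts.port or 80   # assumption: default port 80

    def subscribe(self, channel_uri: str) -> Endpoint:
        """Attach a channel to the (possibly shared) connection for its endpoint."""
        ep = self.endpoint(channel_uri)
        self.channels[ep].add(channel_uri)
        self.needs_sync.add(channel_uri)          # initial volume synchronization
        return ep

    def reconnected(self, ep: Endpoint) -> List[str]:
        """A torn-down connection was re-established: resync all its channels."""
        affected = sorted(self.channels.get(ep, ()))
        self.needs_sync.update(affected)
        return affected

    def synced(self, channel_uri: str) -> None:
        self.needs_sync.discard(channel_uri)


conns = HttpChannelConnections()
a = conns.subscribe("wcip://my.net:80/news?proto=http")
b = conns.subscribe("wcip://my.net:80/quotes?proto=http")
print(a == b)   # True: both channels share one connection
for uri in list(conns.needs_sync):
    conns.synced(uri)
print(conns.reconnected(a))
```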

   Besides HTTP, channels may also be implemented using, e.g., Beep [8]
   or PGM [9]. Their channel URIs may be:

   wcip://my.net:80/channel-name?proto=beep&security=tls
   wcip://my.net:80/channel-name?proto=pgm&group=239.1.1.1

   [Note: additional work is needed to fully specify them.]
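
   Because the transport is named in the channel URI, a client can
   dispatch on it before choosing an implementation. A minimal Python
   sketch using urllib.parse follows. Rejecting a missing or unknown
   proto parameter, and passing the remaining query parameters (such as
   security or group) through as transport options, are assumptions,
   since the non-HTTP URIs are not yet fully specified; the
   ChannelAddress type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical list, taken from the example URIs in this section.
KNOWN_TRANSPORTS = {"http", "https", "beep", "pgm"}


@dataclass
class ChannelAddress:
    host: Optional[str]
    port: Optional[int]
    name: str
    proto: str
    options: Dict[str, str] = field(default_factory=dict)


def parse_channel_uri(uri: str) -> ChannelAddress:
    parts = urlsplit(uri)
    if parts.scheme != "wcip":
        raise ValueError(f"not a WCIP channel URI: {uri}")
    # Keep the first value of each query parameter.
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    proto = query.pop("proto", None)
    if proto not in KNOWN_TRANSPORTS:
        raise ValueError(f"missing or unknown transport in {uri}")
    return ChannelAddress(parts.hostname, parts.port, parts.path, proto, query)


for u in ("wcip://my.net:80/channel-name?proto=beep&security=tls",
          "wcip://my.net:80/channel-name?proto=pgm&group=239.1.1.1"):
    addr = parse_channel_uri(u)
    print(addr.proto, addr.options)
# beep {'security': 'tls'}
# pgm {'group': '239.1.1.1'}
```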

   Both HTTP and Beep are unicast, which has scalability limitations.
   E.g., suppose a machine is capable of 20000 concurrent persistent
   connections. Then that machine being an invalidation server can
   support at most 20000 simultaneously active invalidation clients.
   Moreover, if it takes 1 ms to send out a message to one client, then
   sending one message to all 20000 clients takes 20 seconds, so the
   invalidation latency is at least 20 seconds, even under the best
   network conditions.
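
   The arithmetic is serial unicast fan-out: with N clients and a
   per-message send time t, the last client hears of a change only after
   about N times t. The hypothetical Python helpers below reproduce the
   20-second figure and, for comparison, estimate one tier of relays
   under the assumption that relays forward in parallel and network
   delay is ignored.

```python
def unicast_fanout_latency(clients: int, send_time_ms: int) -> float:
    """Seconds until the last of `clients` receives a message that one
    server sends serially over unicast (network delay ignored)."""
    return clients * send_time_ms / 1000


def relayed_fanout_latency(server_fanout: int, relay_fanout: int,
                           send_time_ms: int) -> float:
    """Hypothetical one-tier relay estimate: the server sends to each relay
    in turn, then the last relay sends to each of its own clients."""
    return (server_fanout + relay_fanout) * send_time_ms / 1000


print(unicast_fanout_latency(20000, 1))      # 20.0
print(relayed_fanout_latency(100, 200, 1))   # 0.3 (100 relays x 200 clients)
```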

   An IP-multicast-based channel implementation avoids this scalability
   problem. An IP multicast group is allocated for the invalidation
   channel and its address is advertised as part of the channel
   information. For any update to the object volume, the invalidation
   server only needs to send one copy, to the multicast group. The
   invalidation client subscribes to this multicast group to receive
   the updates. Because the object volume has version numbers, a client
   that misses a message can detect the gap and re-synchronize the
   volume, so WCIP may not have to run on top of a reliable multicast
   protocol.
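
   The gap detection can be sketched as follows, assuming (hypothetically)
   that every multicast update advances the volume version by exactly
   one. Duplicates and old reordered messages are ignored, and a jump in
   the version marks the volume for re-synchronization. The class name
   and return values are illustrative only.

```python
class MulticastReceiver:
    """Hypothetical client state for a volume received over IP multicast.

    Assumes each update advances the volume version by exactly one.
    """

    def __init__(self, version: int) -> None:
        self.version = version       # version reached by the last synchronization
        self.resync_needed = False

    def on_message(self, version: int) -> str:
        """Process an update that moves the volume to `version`."""
        if version <= self.version:
            return "ignored"                 # duplicate or reordered old message
        if version != self.version + 1:
            self.resync_needed = True        # gap: at least one update was lost
            return "resync"
        self.version = version
        return "applied"


r = MulticastReceiver(version=41)
print([r.on_message(v) for v in (42, 42, 44)])   # ['applied', 'ignored', 'resync']
```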

   In the absence of IP multicast, a unicast-based channel
   implementation may employ channel relays to improve scalability,
   which is the topic of the next section.


4. Deployment Issues

4.1 Channel Relay

   An invalidation channel may have tens of thousands of invalidation
   clients. Channel relay points can improve the scalability of a
   unicast-based channel. Instead of subscribing directly to the origin
   invalidation server, some invalidation clients are redirected to a
   channel relay point. A channel relay point can perform one-to-many
   channel relay and many-to-one connection aggregation.

   (1)  Channel Relay

   The channel relay point may have multiple clients subscribed to the
   same invalidation channel. It in turn only subscribes once to the
   original invalidation server. By hierarchically relaying channel
   messages, it reduces the load on the invalidation server and helps
   to scale the invalidation channel end-to-end.

                        Invalidation Server
                                |
                                | conn0
                                |
                                |
                        Channel Relay Point
                            /    |   \
                           /     |    \
                   conn1  / conn2|     \ conn3
                         /       |      \
                        /        |       \
                    Client1  Client2  Client3

   A "dumb" relay point copies all messages from connection "conn0" to
   "conn1", "conn2" and "conn3", and vice versa. A "smart" relay point
   also constructs the up-to-date view of the volume as well as the
   journal of changes to the volume, based on the messages it receives
   from the invalidation server. Then, it can respond to client-driven
   volume synchronization requests, instead of forwarding the requests
   all the way to the invalidation server.
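   A sketch of how a smart relay might decide whether it can answer a
   client-driven synchronization request itself. The function and
   parameter names are hypothetical, and forwarding the request
   upstream when the client reports a version newer than the relay's
   is an assumption the draft does not spell out.

```python
def relay_reply(relay_version: int, journal_floor: int, client_version: int) -> str:
    """How a hypothetical smart relay answers a client's synchronization
    request for one channel.

    relay_version:  latest volume version received from upstream
                    (0 = the relay has no view of the volume yet)
    journal_floor:  the relay's journal holds every change made after
                    this version
    client_version: the version number in the client's request
    """
    if relay_version == 0 or client_version > relay_version:
        # The relay cannot vouch for this state: pass the request upstream.
        return "forward"
    if client_version == relay_version:
        return "echo"      # client is already current
    if 0 < client_version and client_version >= journal_floor:
        return "journal"   # send the changes since client_version
    return "full"          # send the complete ObjectVolume with base="0"
```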

   (2) Connection Aggregation

   A relay point supports not only multiple clients but also multiple
   channels. Connection aggregation reduces the number of TCP
   connections the invalidation client and the relay point have to
   maintain. See the example below.

                    Server1  Server2  Server3
                        \       |        /
                   conn1 \      |conn2  / conn3
                          \     |      /
                           \    |     /
                        Channel Relay Point
                            /    |   \
                           /     |    \
                   conn4  / conn5|     \ conn6
                         /       |      \
                        /        |       \
                    Client1  Client2  Client3

   Without the relay point, each client would have to establish 3
   connections, one per invalidation server. Now that all 3 channels
   are redirected to the
   same relay point, the client only needs to establish 1 TCP
   connection, to the relay point, which in turn subscribes to the 3
   invalidation servers. This reduces the client's TCP overhead and
   allows the client to support more channels, as well as reducing the
   overhead on the invalidation servers and the relay point.
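   Counting connections for the figure above makes the saving
   concrete. A small sketch, assuming every client subscribes to one
   channel on each of the servers and that the relay aggregates all of
   them (connection counts only; per-connection cost is not modeled,
   and the function name is hypothetical):

```python
def tcp_connections(num_clients: int, num_servers: int, via_relay: bool) -> int:
    """Total connections when every client subscribes to one channel on
    each invalidation server."""
    if via_relay:
        # One connection per client to the relay, plus one upstream
        # connection from the relay to each invalidation server.
        return num_clients + num_servers
    # Every client connects to every invalidation server directly.
    return num_clients * num_servers


print(tcp_connections(3, 3, via_relay=False))  # 9
print(tcp_connections(3, 3, via_relay=True))   # 6
```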

   A channel relay point can be set up via a cache hierarchy or a CDN.
   Specifically, an invalidation client can discover and then connect
   to the relay point in one of the following ways.

   (1)  The origin web server or replica origin web server, being part
        of a CDN, returns a channel URI with the relay point as the
        hostname.

   (2)  The relay point, being a configured outgoing proxy to a
        potential invalidation client, intercepts and replaces the
        channel URI in the HTTP response with its own information.

   (3)  When the invalidation client does DNS name lookup of the
        invalidation server hostname, the DNS server of a CDN returns
        the IP address of a local channel relay point.

   (4)  When the invalidation client connects to the invalidation
        server, the invalidation server replies with a redirect message
        pointing to a channel offered by the relay point.

4.2 Detect Changes

   Detecting changes is the job of the origin server and/or the
   invalidation server. Web content may change because of updates from
   the content owner or updates from content viewers. E.g., the
   content owner CNN.com updates its front page every 15 minutes, while
   Ebay updates its content whenever its customers post new auction
   items or bids. Changes may be detected in four ways:

   (1)  When a script that generates content updates the web source
        file (e.g., a news article is updated with the latest financial
        information), the script notifies the invalidation server,
        which then sends out invalidations or delta-encoded [10]
        updates to all participating caches.
   (2)  When a piece of data in the database is modified via the
        database interface (e.g., an addition to the inventory of
        books), a database trigger notifies the invalidation server of
        the event.
   (3)  When an HTTP request comes in (e.g., a POST request to add a
        new auction item), the origin server or its surrogate (reverse
        proxy) notifies the invalidation server of the event.
   (4)  The last, and simplest, way is for the invalidation server to
        poll the origin server periodically to find out whether the
        object has changed. Because only the invalidation server polls,
        rather than every cache, the polling frequency can be very
        high, e.g., once every minute, which also offers decent cache
        consistency.
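   Method (4) amounts to remembering each object's validators between
   polls. A minimal sketch, assuming the origin server returns ETag
   and/or Last-Modified; fetch_validators is a hypothetical callable
   (for example, one that issues an HTTP HEAD request), injected so the
   logic can be shown without network access:

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (ETag, Last-Modified) as returned by the origin server; either may be None.
Validators = Tuple[Optional[str], Optional[str]]


def poll_once(urls: Iterable[str],
              fetch_validators: Callable[[str], Validators],
              last_seen: Dict[str, Validators]) -> List[str]:
    """One polling pass by the invalidation server. Returns the URLs whose
    validators changed since the previous pass and updates last_seen."""
    changed = []
    for url in urls:
        current = fetch_validators(url)
        # The first observation of a URL only records a baseline.
        if url in last_seen and last_seen[url] != current:
            changed.append(url)
        last_seen[url] = current
    return changed
```

   The URLs returned by each pass would then drive the invalidation
   server's update of the ObjectVolume.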

   In some cases, an event described above may invalidate multiple
   URLs. E.g., a database event may trigger the invalidation of
   hundreds of objects. Instead of listing all those objects and
   sending the list to proxies, the server may describe the event
   itself to the proxies, provided that the proxies know how to
   interpret the event and determine which objects have become stale.
   Integrating such functionality is left for future work.

   Existing software already provides user-level notification of
   changes to web content, e.g., the AIDE system [11]. WCIP could
   potentially be used to let agents subscribe to change
   notifications, not for the purpose of cache invalidation, but to
   notify users. E.g., a web crawler could subscribe to WCIP channels
   instead of crawling web sites periodically for object updates.

4.3 Discover Channels

   A caching proxy learns about an invalidation channel in three ways:
   (1) configured by the proxy's administrator, (2) configured by the
   CDN that's controlling the proxy, or (3) obtained from the HTTP
   response when fetching a web object. Method 3 is specified here:

   In a normal HTTP request-&-response exchange, the caching proxy
   obtains the channel address from the HTTP entity headers
   "Invalidated-By" and "Channel-Object".

        Invalidated-By = "Invalidated-By" ":" Channel-URI
        Channel-URI =
                "wcip:" "//" host ":" port "/" channel-name "?" query
        channel-name = token

   Example:

        Invalidated-By: wcip://www.cnn.com:777/allpolitics?proto=http
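   A proxy might split the header into the Channel-URI components with
   Python's standard urllib.parse, as sketched below. The function
   name and the returned dictionary layout are hypothetical; the
   checks enforce only the "wcip" scheme and the explicit port that
   the grammar above requires.

```python
from urllib.parse import parse_qs, urlsplit


def parse_invalidated_by(header_line: str) -> dict:
    """Split an 'Invalidated-By' header line into Channel-URI parts."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "invalidated-by":  # header names ignore case
        raise ValueError("not an Invalidated-By header")
    parts = urlsplit(value.strip())
    if parts.scheme != "wcip" or parts.port is None:
        raise ValueError("Channel-URI must look like wcip://host:port/...")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "channel": parts.path.lstrip("/"),   # channel-name
        "query": parse_qs(parts.query),      # e.g. {"proto": ["http"]}
    }


print(parse_invalidated_by(
    "Invalidated-By: wcip://www.cnn.com:777/allpolitics?proto=http"))
```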

4.4 Join Channels

   The decision to join a channel can be either (1) instructed by the
   proxy's administrator, (2) instructed by the CDN that the proxy is
   part of, or (3) dynamically decided.

   It is not the job of this protocol to specify the decision
   algorithms, but some common-sense ones exist. E.g., join a channel
   when the proxy has cached M objects belonging to that channel, or
   when the proxy has received N requests for objects belonging to that
   channel. The proxy's administrator can configure M and N.

   Moreover, the proxy can employ a heuristic [12]: consider an object
   for WCIP service only if (1) it is cached and (2) a subsequent
   request actually uses the cached copy without finding it expired or
   modified. This heuristic filters out objects that either are not
   very popular or are modified more frequently than they are
   accessed, even though they happen to be cached. This guideline can
   be applied when calculating M and N.
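   One possible bookkeeping for the M/N thresholds combined with the
   heuristic above, as a sketch: the class and field names are
   hypothetical, and counting only requests that were actually served
   from a still-valid cached copy is one reading of the heuristic [12],
   not a rule stated by this protocol.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class JoinPolicy:
    """Per-channel join decision. m_objects and n_requests play the
    roles of the administrator-configured M and N."""
    m_objects: int
    n_requests: int
    qualified: Set[str] = field(default_factory=set)  # objects passing the heuristic
    useful_hits: int = 0  # requests served from a still-valid cached copy

    def record_request(self, uri: str, was_cached: bool, copy_still_valid: bool) -> None:
        # Only count an object once a later request actually used its
        # cached copy without finding it expired or modified.
        if was_cached and copy_still_valid:
            self.qualified.add(uri)
            self.useful_hits += 1

    def should_join(self) -> bool:
        return (len(self.qualified) >= self.m_objects
                or self.useful_hits >= self.n_requests)
```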


5. Protocol Specification

   This section lays out the message syntax and sequences. Section 6
   gives the complete rule set (state machine) governing the server's
   and the client's behavior.

   Following is a brief description of the WCIP protocol in the most
   common and simple case:

   1)  In a normal HTTP request-&-response exchange, a caching proxy
       obtains invalidation channel information from the HTTP response
       header "Invalidated-By", returned by the origin server or its
       surrogate.

   2)  To join the channel, the caching proxy establishes a persistent
       HTTP connection with the invalidation server, assuming the
       channel implementation is based on HTTP.

   3)  Immediately following connection set-up, the proxy MUST initiate
       one round of volume synchronization (see section 5.2) to obtain
       an up-to-date view of the ObjectVolume, and hence the up-to-date
       view of all the objects in it.

   4)  After the initial round, the invalidation server MAY initiate
       volume synchronization when updates are made to the volume or
       when the channel is silent for "heartbeat interval" time (see
       section 5.3).

   5)  Whenever the proxy notices that the "last synchronization time"
       is more than "revalidation interval" ago, the proxy MUST initiate
       a round of volume synchronization.

   6)  When serving content, the proxy MUST NOT use a cached object if
       the cached object is marked as stale or the "last synchronization
       time" is more than "freshness guarantee" time ago for the object.
       Instead, the proxy MUST perform HTTP revalidation with the origin
       server before serving the object (see section 5.4).

   7)  The proxy or the invalidation server MAY terminate communication
       at any time by closing the connection. The proxy then reverts to
       HTTP Cache-Control.

5.1 Object Volume DTD

   A description of the object volume contains (1) the volume's own
   information, e.g., its version, date, invalidation channel,
   Last-Modified time, Etag, etc.; and (2) the volume composition,
   which enumerates the objects that belong to the volume and their
   consistency state.

   An ObjectVolume MAY list not only objects but also directories. For
   example, an ObjectVolume entry with
   uri="http://www.cnn.com/allpolitics/" represents all the web objects
   that share this URI prefix. Given a web object, longest prefix match
   is used to identify an applicable entry in the ObjectVolume.
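   A sketch of the longest-prefix lookup, assuming an entry URI ending
   in "/" denotes a directory path (the draft does not say how
   directories are recognized) and that the local ObjectVolume is kept
   as a dictionary keyed by entry URI; match_entry is a hypothetical
   name.

```python
from typing import Dict, Optional


def match_entry(volume: Dict[str, dict], url: str) -> Optional[dict]:
    """Return the ObjectVolume entry that applies to url, or None."""
    exact = volume.get(url)
    if exact is not None:
        return exact  # a filename entry equal to the URL is the longest match
    best: Optional[str] = None
    for uri in volume:
        # Directory entries (assumed: URI ends in "/") match by prefix.
        if uri.endswith("/") and url.startswith(uri):
            if best is None or len(uri) > len(best):
                best = uri
    return volume[best] if best is not None else None
```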

   If the matching entry's URI is a filename, the Etag and Last-Modified
   time (if available) SHOULD be used to determine object freshness. If
   the Etag and Last-Modified time are not available, or if the
   matching entry's URI is a directory path, the attribute
   state="stale" indicates that all cached objects with that URI prefix
   ought to be revalidated.

   An ObjectVolume MAY also consist of objects from different origin
   servers, as long as the same invalidation server is being used for
   all the objects. This may be typical in a CDN environment.

   Thus, an object volume is described using the following XML DTD:

   <!ELEMENT ObjectVolume (member*)>
   <!ATTLIST ObjectVolume date CDATA #REQUIRED>
        ;# the time this xml message is sent by the origin
        ;# invalidation server
   <!ATTLIST ObjectVolume channel CDATA #REQUIRED>
        ;# the invalidation channel URI that carries this object
        ;# volume
   <!ATTLIST ObjectVolume version CDATA #REQUIRED>
        ;# the version number of this object volume; It's incremented
        ;# whenever a change is made to the volume.
   <!ATTLIST ObjectVolume base CDATA #REQUIRED>
        ;# the base version number that the following volume info is
        ;# based on; a base of 0 means the following volume info solely
        ;# defines the volume composition; a positive base number means
        ;# that the following info should apply to an existing volume
        ;# of that version number.
   <!ATTLIST ObjectVolume last-modified CDATA #IMPLIED>
        ;# the volume's current Last-Modified time; may be used in
        ;# conjunction with the version number to identify the version.
   <!ATTLIST ObjectVolume etag CDATA #IMPLIED>
        ;# the volume's current Etag; may be used in conjunction
        ;# with the version number to identify a volume version.

   <!ELEMENT member (object+)>
   <!ATTLIST member op (exclude|include|prefetch) "include">
        ;# whether the following objects are to be included or excluded
        ;# from the volume composition. Also, if the cache doesn't have
        ;# the object, whether it should be prefetched into the cache.
        ;# "prefetch" implies "include".
   <!ATTLIST member state (stale|unknown) "unknown">
        ;# whether the enclosed objects have become stale or are still
        ;# fresh relative to the base version, or unknown.
   <!ATTLIST member redirect-to CDATA #IMPLIED>
        ;# redirect the receiver to receive updates for the following
        ;# objects from another invalidation channel.
   <!ATTLIST member redirect-from CDATA #IMPLIED>
        ;# the following objects are carried by the current channel
        ;# because they are redirected from another channel.

   <!ELEMENT object EMPTY>
   <!ATTLIST object name CDATA #REQUIRED>
        ;# a name (or ID) for the object, unique within the channel.
   <!ATTLIST object fresh CDATA #REQUIRED>
        ;# the freshness guarantee of the object in seconds.
   <!ATTLIST object update (yes|no) "no">
        ;# whether content of the new object will be sent.
   <!ATTLIST object uri CDATA #REQUIRED>
        ;# the object URI; if the URI is a directory path instead of
        ;# a filename, it can potentially match any object with that
        ;# URI prefix.
   <!ATTLIST object last-modified CDATA #IMPLIED>
        ;# the object's current Last-Modified time
   <!ATTLIST object etag CDATA #IMPLIED>
        ;# the object's current Etag

   For example,

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="1"
    base="0" date="Fri, 17 Nov 2000 08:22:17 GMT">
     <member op="include">
       <object name="amazon" fresh="120"
         uri="http://www.amazon.com/index.html"
         last-modified="Wed, 15 Nov 2000 04:52:01 GMT"
       />
       <object name="ebay" fresh="240"
         uri="http://www.ebay.com/index.html"
         last-modified="Thu, 16 Nov 2000 03:18:07 GMT"
         etag="yzxzyx"
       />
       <object name="cnn/allpolitics/" fresh="360"
         uri="http://www.cnn.com/allpolitics/"
         last-modified="Fri, 17 Nov 2000 08:22:17 GMT"
       />
     </member>
   </ObjectVolume>
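   One way a receiver might load such a message, sketched with
   Python's standard ElementTree parser. ElementTree does not apply DTD
   defaults, so the sketch fills in the declared defaults for op, state
   and update by hand; treating a missing base as 0 is an assumption
   (base is #REQUIRED in the DTD, but the first synchronization request
   in Section 5.2 omits it). The sample is abridged from the example
   above, without the DOCTYPE line.

```python
import xml.etree.ElementTree as ET


def parse_object_volume(text: str) -> dict:
    """Turn an ObjectVolume message into plain Python data."""
    root = ET.fromstring(text)
    if root.tag != "ObjectVolume":
        raise ValueError("expected an ObjectVolume root element")
    members = []
    for member in root.findall("member"):
        objects = []
        for obj in member.findall("object"):
            entry = dict(obj.attrib)
            entry["fresh"] = int(entry["fresh"])  # freshness guarantee, seconds
            entry.setdefault("update", "no")      # DTD default
            objects.append(entry)
        members.append({
            "op": member.get("op", "include"),        # DTD default
            "state": member.get("state", "unknown"),  # DTD default
            "objects": objects,
        })
    return {
        "channel": root.get("channel"),
        "version": int(root.get("version")),
        "base": int(root.get("base", "0")),  # assumption when absent
        "date": root.get("date"),
        "members": members,
    }


SAMPLE = """<ObjectVolume channel="http://cdn.net:88/ch1" version="1"
 base="0" date="Fri, 17 Nov 2000 08:22:17 GMT">
  <member op="include">
    <object name="ebay" fresh="240" uri="http://www.ebay.com/index.html"
      last-modified="Thu, 16 Nov 2000 03:18:07 GMT" etag="yzxzyx"/>
    <object name="cnn/allpolitics/" fresh="360"
      uri="http://www.cnn.com/allpolitics/"
      last-modified="Fri, 17 Nov 2000 08:22:17 GMT"/>
  </member>
</ObjectVolume>"""

volume = parse_object_volume(SAMPLE)
print(volume["version"], [o["name"] for o in volume["members"][0]["objects"]])
```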

   [Note: specification is needed for sending small objects in full in
   the ObjectVolume and for sending the delta encoding [10] of a
   slightly changed object.]

5.2 Client-Initiated Volume Synchronization

   Immediately following channel subscription is always one round of
   client-initiated volume synchronization. Then, subsequent rounds of
   volume synchronization can be either client-initiated or server-
   initiated.

   Client-initiated volume synchronization is also performed whenever
   the proxy notices that the current time has passed the "last
   synchronization time" plus "revalidation interval". The proxy MAY
   notice this through a timer, or when it cannot use a cached object
   because the "last synchronization time" is more than the object's
   "freshness guarantee" in the past.

   Four steps take place for client-initiated volume synchronization:

   (1)  Synchronization request: the caching proxy sends an ObjectVolume
        message to the invalidation server, describing its own view of
        the volume, especially the version number "A". If the proxy has
        never subscribed to the channel before, the version number is
        0.

   (2)  ObjectVolume update: the invalidation server replies with the
        journal of changes to the volume since version "A" up until the
        latest version "B", if the journal of changes since version "A"
        is still available. The server SHOULD aggregate multiple
        updates to the same object; it only needs to report the latest
        one. If the journal of changes is not available, it replies
         with a full copy of the latest ObjectVolume. If "A" is equal to
        "B", the server simply echoes back the synchronization request.

   (3)  ObjectVolume processing: the caching proxy examines each object
        entry in the update, records its freshness guarantee, and
        compares the cached object (if any) with the entry. If the
        cached object's Etag is not equal to that in the entry and the
        cached object's Last-Modified time is earlier than that in the
        entry, the proxy marks the cached object as stale. If the entry
        URI is a directory path instead of a filename, all cached
        objects with that directory prefix are marked as stale.

   (4)  Update "last synchronization time": set it to the time the
        caching proxy sent the synchronization request.
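   A sketch of the comparison in step (3), written against a
   hypothetical in-memory cache that maps URLs to dictionaries with
   optional "etag" and "last-modified" values and a "stale" flag. It
   makes two assumptions the text leaves open: a validator missing on
   either side never proves freshness, and, following Section 5.1,
   directory entries and entries without validators mark objects stale
   only when their member carries state="stale" (rather than on every
   update, as a literal reading of step (3) would suggest). Recording
   the freshness guarantee and step (4) are left out.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, List


def _http_date(value):
    # "Wed, 15 Nov 2000 04:52:01 GMT" -> timezone-aware datetime
    return parsedate_to_datetime(value) if value else None


def process_entry(cache: Dict[str, dict], entry: dict,
                  member_state: str = "unknown") -> List[str]:
    """Apply one ObjectVolume entry to the cache; return URLs marked stale."""
    uri = entry["uri"]
    if uri.endswith("/"):
        # Directory entry (assumed: URI ends in "/"): no per-object
        # validators apply, so the member's state attribute decides.
        hits = [u for u in cache if u.startswith(uri)] if member_state == "stale" else []
    elif uri not in cache:
        hits = []  # nothing cached, nothing to invalidate
    elif entry.get("etag") is None and entry.get("last-modified") is None:
        hits = [uri] if member_state == "stale" else []
    else:
        cached = cache[uri]
        etag_differs = entry.get("etag") is None or cached.get("etag") != entry["etag"]
        cached_lm = _http_date(cached.get("last-modified"))
        entry_lm = _http_date(entry.get("last-modified"))
        # A missing Last-Modified on either side cannot prove freshness.
        lm_earlier = cached_lm is None or entry_lm is None or cached_lm < entry_lm
        hits = [uri] if etag_differs and lm_earlier else []
    for url in hits:
        cache[url]["stale"] = True
    return hits
```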

   Here are some examples. Suppose the proxy has never subscribed to
   the channel before. The first synchronization request looks like:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="0">
   </ObjectVolume>

   The invalidation server replies with the complete ObjectVolume:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="7"
    base="0" date="Fri, 17 Nov 2000 08:22:17 GMT">
     <member op="include">
       <object name="amazon" fresh="120"
         uri="http://www.amazon.com/index.html"
         last-modified="Wed, 15 Nov 2000 04:52:01 GMT"
       />
       <object name="ebay" fresh="240"
         uri="http://www.ebay.com/index.html"
         last-modified="Thu, 16 Nov 2000 03:18:07 GMT"
         etag="yzxzyx"
       />
       <object name="cnn/allpolitics/" fresh="360"
         uri="http://www.cnn.com/allpolitics/"
         last-modified="Fri, 17 Nov 2000 08:22:17 GMT"
       />
     </member>
   </ObjectVolume>

   Suppose later the proxy disconnects from the channel and rejoins
   after 10 minutes. Assuming it still keeps the volume description of
   version 7, it sends a synchronization request like this:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="7"
   base="7" date="Fri, 17 Nov 2000 08:22:17 GMT">
   </ObjectVolume>

   Suppose the volume has not changed in those 10 minutes. The
   invalidation server replies:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="7"
   base="7" date="Fri, 17 Nov 2000 08:42:17 GMT">
   </ObjectVolume>

   However, if the volume indeed has changed, the invalidation server
   sends back the journal of changes since version 7. The reply MUST
   have a base version equal to or smaller than the version in the
   synchronization request. Here is an example:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="9"
    base="7" date="Fri, 17 Nov 2000 08:32:17 GMT">
     <member op="exclude"> ;# exclude object(s) from the volume.
       <object name="amazon" fresh="120"
         uri="http://www.amazon.com/index.html"
         last-modified="Wed, 15 Nov 2000 04:52:01 GMT"
       />
     </member>
     <member state="stale">
       <object name="ebay" fresh="240"
         uri="http://www.ebay.com/index.html"
         last-modified="Fri, 17 Nov 2000 08:20:07 GMT"
         etag="yzkzyx"
       />
     </member>
   </ObjectVolume>

   But if the server is now at version 20 and no longer keeps records
   of changes before version 10, while the client is at version 7, the
   invalidation server sends back the complete ObjectVolume
   information with base="0".
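   The server's choice among an echo, a journal, and a full copy can be
   sketched as below. The journal representation (a list of
   (version, entry) pairs, oldest first, complete for every change
   after journal_floor) and the function name are assumptions for
   illustration; aggregation keys on the object's name attribute, which
   the DTD declares unique within the channel.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, str]


def build_reply(current_version: int,
                journal: List[Tuple[int, Entry]],
                journal_floor: int,
                volume: List[Entry],
                client_version: int) -> Tuple[int, int, List[Entry]]:
    """Return (base, version, entries) for the reply ObjectVolume."""
    if client_version == current_version:
        # Nothing changed: echo the request.
        return client_version, current_version, []
    if 0 < client_version and client_version >= journal_floor:
        latest: Dict[str, Entry] = {}
        for version, entry in journal:
            if version > client_version:
                latest[entry["name"]] = entry  # later change to an object wins
        return client_version, current_version, list(latest.values())
    # The journal no longer reaches back far enough: send everything.
    return 0, current_version, list(volume)
```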

5.3 Server-Initiated Volume Synchronization

   While the caching proxy is required to initiate volume
   synchronization whenever necessary, the invalidation server is not
   required to initiate volume synchronization if it doesn't choose to
   operate in server-driven mode. The invalidation server may be
   configured to operate in either mode. It MAY switch between server-
   driven mode and client-driven mode after any volume synchronization.

   In server-driven mode, the invalidation server initiates volume
   synchronization when changes are made to the object volume, or when
   the channel has been silent for more than "heartbeat interval" time.

   The invalidation server SHOULD initiate volume synchronization
   "reasonably" soon after it learns of an object change, but it MAY
   delay the synchronization until some time before the subsequent
   heartbeat. It MUST NOT delay further. Such a strategy allows the
   server to batch multiple changes into one update. It minimizes the
   number of volume synchronization rounds without inducing unnecessary
   cache misses.
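   The batching rule (send reasonably soon, but never later than the
   next heartbeat) can be sketched as a deadline fixed by the first
   unsent change. batch_delay is a hypothetical local policy value, the
   class name is hypothetical, and times are plain seconds:

```python
from typing import List, Optional


class PendingChanges:
    """Changes waiting to be pushed in the next server-initiated update."""

    def __init__(self, batch_delay: float):
        self.batch_delay = batch_delay
        self.changes: List[str] = []
        self.deadline: Optional[float] = None

    def add(self, change: str, now: float, next_heartbeat: float) -> None:
        self.changes.append(change)
        # Aim for now + batch_delay, but never past the next heartbeat.
        candidate = min(now + self.batch_delay, next_heartbeat)
        # Later changes ride along; they never postpone an earlier deadline.
        if self.deadline is None or candidate < self.deadline:
            self.deadline = candidate

    def due(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline

    def flush(self) -> List[str]:
        batch, self.changes, self.deadline = self.changes, [], None
        return batch
```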

   Three steps take place for server-initiated volume synchronization:

   (1)  ObjectVolume update: if there are changes to the object volume,
        the invalidation server updates its view of the object volume,
        increments the version number, and organizes the changes into
        an ObjectVolume update message. The server then sends this
        update out as well as storing it into the volume's journal of
        changes. The server SHOULD aggregate multiple updates to the
        same object into one. If it's time to generate a heartbeat and
        there has been no change to the volume since the last update,
        the server simply sends out an update that reiterates the
        current volume's version number.

   (2)  ObjectVolume processing: the caching proxy examines each object
        entry in the update, records its freshness guarantee, and
        compares the cached object (if any) with the entry. If the
        cached object's Etag is not equal to that in the entry and the
        cached object's Last-Modified time is earlier than that in the
        entry, the proxy marks the cached object as stale. If the entry
        URI is a directory path instead of a filename, all cached
        objects with that directory prefix are marked as stale.

   (3)  Update "last synchronization time": in this case, there is no
        synchronization request, just the server's update. To account
        for possible clock skew, the proxy MUST convert the "date" in
        the server's update into the proxy's local time. Suppose t1 is
        the time the proxy sent out its initial synchronization request
        (when it established the channel subscription), while t2 is the
        "date" in the corresponding ObjectVolume update at that time.
        Now, t3 is the "date" of the current update from the server,
        then the "last synchronization time" is set to "t1 + (t3 -
        t2)".
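   The clock-skew correction in step (3) is a single subtraction once
   the HTTP-style dates are parsed. A minimal sketch using Python's
   standard library; the function name and the example values (a proxy
   clock assumed to run 2 minutes 43 seconds ahead of the server) are
   illustrative only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_sync_time(t1: datetime, t2_date: str, t3_date: str) -> datetime:
    """t1: proxy-local time the initial synchronization request was sent.
    t2_date: server "date" of the ObjectVolume reply to that request.
    t3_date: server "date" of the current server-initiated update.
    Only a server-clock difference is added, so a constant skew cancels."""
    t2 = parsedate_to_datetime(t2_date)
    t3 = parsedate_to_datetime(t3_date)
    return t1 + (t3 - t2)


t1 = datetime(2000, 11, 17, 8, 25, 0, tzinfo=timezone.utc)
print(last_sync_time(t1, "Fri, 17 Nov 2000 08:22:17 GMT",
                     "Fri, 17 Nov 2000 08:32:17 GMT"))
```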

   For example, an object modification moves the volume from version 9
   to version 10. The corresponding ObjectVolume update looks like
   this:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="10"
    base="9" date="Fri, 17 Nov 2000 08:32:17 GMT">
     <member state="stale">
       <object name="ebay" fresh="240"
         uri="http://www.ebay.com/index.html"
         last-modified="Sun, 24 Dec 2000 16:38:20 GMT"
         etag="37bb01a2-7ec-39f5bafc"
       />
     </member>
   </ObjectVolume>

   To generate a heartbeat when there has been no change to the volume,
   the server simply restates the current volume's version number. For
   example:

   <?xml version="1.0"?>
   <!DOCTYPE ObjectVolume SYSTEM "ObjectVolume.dtd">
   <ObjectVolume channel="http://cdn.net:88/ch1" version="10"
   base="10" date="Fri, 17 Nov 2000 08:22:17 GMT">
   </ObjectVolume>

   [Note: specification is needed for sending small objects in full in
   the ObjectVolume and for sending the delta encoding [10] of a
   slightly changed object.]

5.4 Serving Content

   When an HTTP request comes in with a URI, the proxy searches its
   ObjectVolume data structure for a matching entry. If an ObjectVolume
   entry is a directory path instead of a filename, the entry is
   applicable to the URI if the URI has that directory path as prefix.
   If multiple such directory entries match, the entry with the longest
   match is used.

   The caching proxy MUST NOT use a cached object if the cached object
   is marked as stale or the current time has passed the "last
   synchronization time" plus the freshness guarantee of the matching
   entry. Instead, the proxy MUST either perform HTTP revalidation with
   the origin server before serving the object or initiate volume
   synchronization with the invalidation server.
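   The serving rule reduces to one comparison per request. A sketch
   with hypothetical names and times in seconds; returning
   "http-cache-control" when no ObjectVolume entry matches is an
   assumption (ordinary HTTP caching, as when the channel is closed),
   not something this section specifies.

```python
from typing import Optional


def cache_decision(entry_fresh_s: Optional[float], stale: bool,
                   last_sync: float, now: float) -> str:
    """Decide whether a cached object may be served.
    entry_fresh_s is the matching entry's freshness guarantee, or None
    when no ObjectVolume entry matches the URI."""
    if entry_fresh_s is None:
        return "http-cache-control"  # not covered by WCIP (assumption)
    if stale or now > last_sync + entry_fresh_s:
        return "revalidate-or-sync"  # HTTP revalidation or volume sync
    return "serve"
```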

   After the proxy fetches the new object into its cache (or
   revalidates the existing one), the proxy MUST compare the cached
   object with the corresponding entry in the ObjectVolume. If the
   matching entry is a directory path or if the entry contains neither
   a Last-Modified time nor an Etag, the cached object MUST be marked
   as not stale. Otherwise, the proxy checks whether the cached
   object's Etag is equal to that in the entry or the cached object's
   Last-Modified time is later than that in the entry. If so, the
   cached object MUST be marked as not stale; otherwise, it remains
   marked as stale.
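   A sketch of the post-fetch check, with hypothetical names and the
   same "URI ends in /" convention for directory entries. One deliberate
   deviation is labeled in the code: an equal Last-Modified time is
   treated as up to date, since under a strict "later than" a copy
   fetched at exactly the entry's Last-Modified time, without a matching
   Etag, would stay stale.

```python
from email.utils import parsedate_to_datetime


def _http_date(value):
    return parsedate_to_datetime(value) if value else None


def stale_after_fetch(fetched: dict, entry: dict) -> bool:
    """fetched: validators of the newly fetched or revalidated copy;
    entry: the matching ObjectVolume entry. Returns the new stale flag."""
    if entry["uri"].endswith("/"):
        return False  # directory entry
    if entry.get("etag") is None and entry.get("last-modified") is None:
        return False  # entry carries no validators
    if entry.get("etag") is not None and fetched.get("etag") == entry["etag"]:
        return False
    fetched_lm = _http_date(fetched.get("last-modified"))
    entry_lm = _http_date(entry.get("last-modified"))
    # Assumption: an equal Last-Modified time also counts as up to date.
    if fetched_lm is not None and entry_lm is not None and fetched_lm >= entry_lm:
        return False
    return True
```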

6. Protocol State Machine

6.1 Client State Machine

   The initial state is "INIT". An action may run a procedure; the
   procedures are defined at the end of this section.

   STATE: INIT
   INPUT: subscription is established to the invalidation server.
   ACTION: set the "current version number" to 0; create a local
        ObjectVolume data structure with 0 objects in it; initialize
        "revalidation interval" to an arbitrary or pre-configured
        value.
   NEXT-STATE: TO-SYNC

   STATE: TO-SYNC
   INPUT: none
   ACTION: send a synchronization request with the "current version
        number"; record the current time as "sync request time"; reset
        the REVALIDATION-TIMER to a value equal to or smaller than
        "revalidation interval".
   NEXT-STATE: SYNC-INITIATED

   STATE: SYNC-INITIATED or INTERIM
   INPUT: receive ObjectVolume update with version number X, base Y.
   CONDITION: Y is equal to or smaller than the "current version
        number" and X is equal to or larger than the "current version
        number".
   ACTION: run procedure "process the ObjectVolume update"; set the
        "current version number" to X; run procedure "set last
        synchronization time"; run procedure "set revalidation
        interval"; reset the REVALIDATION-TIMER to "revalidation
        interval".
   NEXT-STATE: INTERIM

   STATE: SYNC-INITIATED or INTERIM
   INPUT: receive ObjectVolume update with version number X, base Y.
   CONDITION: Y is larger than the "current version number" or X is
        smaller than the "current version number".
   ACTION: discard the message; reset the REVALIDATION-TIMER to a value
        equal to or smaller than "revalidation interval".
   NEXT-STATE: INTERIM

   STATE: SYNC-INITIATED or INTERIM
   INPUT: REVALIDATION-TIMER times out
   NEXT-STATE: TO-SYNC

   Procedure "process the ObjectVolume update": for each object entry
        in the update message: add or update the entry in the internal
        ObjectVolume data structure; compare the cached object (if any)
        with the entry. If the cached object's Etag is not equal to
        that in the entry and the cached object's Last-Modified time is
        earlier than that in the entry, mark the cached object as
        stale. If the entry URI is a directory path instead of a
        filename, all cached objects with that directory prefix are
        marked as stale.

   Procedure "set last synchronization time": set it to "sync request
        time" if STATE==SYNC-INITIATED; otherwise, set it to "sync
        request time" + current time - "sync response time".

   Procedure "set revalidation interval": this is where the proxy has
       some liberty and can implement some policy. Picking a large
       value means less aggressive synchronization, and thus higher
       invalidation latency. To avoid unnecessary cache misses, the
       proxy SHOULD pick a value smaller than any of the freshness
       guarantees of objects in the volume. On the other hand, to limit
       the load, it's RECOMMENDED that the proxy only revalidate if the
       volume has recently seen active use.
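   The client transitions above can be sketched as a small class. The
   callable send_request, the explicit "now" arguments, and the
   timer-as-deadline representation are assumptions for illustration;
   the procedures "process the ObjectVolume update", "set last
   synchronization time", and "set revalidation interval" are reduced
   to a comment and a fixed interval.

```python
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    INIT = "INIT"
    TO_SYNC = "TO-SYNC"
    SYNC_INITIATED = "SYNC-INITIATED"
    INTERIM = "INTERIM"


class ClientStateMachine:
    """Client side of Section 6.1 for one channel. Times are seconds."""

    def __init__(self, revalidation_interval: float,
                 send_request: Callable[[int], None]):
        self.state = State.INIT
        self.revalidation_interval = revalidation_interval
        self.send_request = send_request
        self.current_version = 0
        self.sync_request_time: Optional[float] = None
        self.timer_deadline: Optional[float] = None

    def on_subscribed(self, now: float) -> None:
        # INIT: empty local ObjectVolume at version 0, then TO-SYNC.
        self.current_version = 0
        self.state = State.TO_SYNC
        self._to_sync(now)

    def _to_sync(self, now: float) -> None:
        # TO-SYNC has no input: send the request and arm the timer.
        self.send_request(self.current_version)
        self.sync_request_time = now
        self.timer_deadline = now + self.revalidation_interval
        self.state = State.SYNC_INITIATED

    def on_update(self, version: int, base: int, now: float) -> None:
        if self.state not in (State.SYNC_INITIATED, State.INTERIM):
            return
        if base <= self.current_version <= version:
            # "process the ObjectVolume update" and "set last
            # synchronization time" would run here.
            self.current_version = version
            self.timer_deadline = now + self.revalidation_interval
        # Otherwise the message is discarded; keeping the existing
        # deadline leaves the timer no longer than the interval.
        self.state = State.INTERIM

    def on_timer(self, now: float) -> None:
        if (self.state in (State.SYNC_INITIATED, State.INTERIM)
                and self.timer_deadline is not None
                and now >= self.timer_deadline):
            self.state = State.TO_SYNC
            self._to_sync(now)
```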


6.2 Server State Machine

   The server has two types of tasks. One is the Volume Monitor, which
   keeps track of the up-to-date view of the ObjectVolume and its
   journal of changes. The other is the per-client Volume Synchronizer,
   which is in charge of volume synchronization with each client.

   Here is the Volume Monitor state machine. The initial state is
   "INIT".

   STATE: INIT
   INPUT: the initial ObjectVolume.
   ACTION: create an up-to-date ObjectVolume data structure; create an
        empty journal of changes; set the current version number to 1.
   NEXT-STATE: INTERIM

   STATE: INTERIM
   INPUT: a change to an object in the volume is detected.
   ACTION: generate the up-to-date ObjectVolume entry for the object;
        update the ObjectVolume data structure; increment the current
        version number; if an entry for the same object exists in the
        journal of changes, remove it; append the entry to the journal of
        changes. Also, send a NEED-SYNC signal to every Volume
        Synchronizer.
   NEXT-STATE: INTERIM
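   A sketch of the Volume Monitor's bookkeeping, assuming entries are
   dictionaries keyed by the object's name attribute and that NEED-SYNC
   is delivered through plain callbacks (the class, field, and callback
   names are hypothetical); exclusions from the volume are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, str]


class VolumeMonitor:
    """Up-to-date ObjectVolume plus a journal holding at most one entry
    per object (its most recent change)."""

    def __init__(self, initial_entries: List[Entry]):
        self.volume: Dict[str, Entry] = {e["name"]: e for e in initial_entries}
        self.version = 1
        self.journal: List[Tuple[int, Entry]] = []            # oldest first
        self.synchronizers: List[Callable[[int], None]] = []  # NEED-SYNC

    def on_change(self, entry: Entry) -> None:
        self.volume[entry["name"]] = entry
        self.version += 1
        # Drop any older journal entry for the same object, then append.
        self.journal = [(v, e) for v, e in self.journal
                        if e["name"] != entry["name"]]
        self.journal.append((self.version, entry))
        for need_sync in self.synchronizers:
            need_sync(self.version)
```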

   Here is the state machine for the per-client Volume Synchronizer.
   The initial state is "INIT". The procedure run by an action is
   defined at the end of this section.

   STATE: INIT
   INPUT: a client subscribed.
   ACTION: set "last update version" to 0.
   NEXT-STATE: INTERIM

   STATE: INTERIM
   INPUT: receive client synchronization request with base version X.
   ACTION: set "last update version" to X.
   NEXT-STATE: TO-SYNC

   STATE: TO-SYNC
   INPUT: none
   ACTION: send the journal of changes since the "last update version".
        If the journal since then is not available, send the full
        ObjectVolume. If the journal since then is empty, simply echo
        the synchronization request. Set the "last update version" to
        the "current version number"; run procedure "set heartbeat
        interval"; set the SYNC-TIMER to the "heartbeat interval".
   NEXT-STATE: INTERIM

   STATE: INTERIM
   INPUT: receive NEED-SYNC signal from the Volume Monitor.
   CONDITION: the server elects to initiate immediate synchronization.
   NEXT-STATE: TO-SYNC

   STATE: INTERIM
   INPUT: receive NEED-SYNC signal from the Volume Monitor.
   CONDITION: the server elects to delay the synchronization.
   ACTION: set the SYNC-TIMER to a value equal to or smaller than the
        time left on the timer. Picking a larger value means less
        aggressive synchronization, and thus higher invalidation
        latency; however, the timer MUST NOT be set to a value larger
        than the time left.
   NEXT-STATE: INTERIM

   STATE: INTERIM
   INPUT: SYNC-TIMER times out
   NEXT-STATE: TO-SYNC
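
   A minimal sketch of the per-client Volume Synchronizer follows. It
   assumes a monitor object exposing current_version, changes_since()
   (returning None when the journal cannot cover the requested base
   version) and full_volume(); the message tuples ("FULL", "JOURNAL",
   "ECHO"), the absolute-deadline timer, and the _StubMonitor used in
   the example are all hypothetical. A delayed NEED-SYNC only ever moves
   the deadline earlier, which is one way to honor the rule that the
   timer not be set beyond the time left.

```python
import math
from typing import Callable, Optional, Tuple


class VolumeSynchronizer:
    """Per-client Volume Synchronizer (non-normative sketch)."""

    def __init__(self, monitor, send: Callable[[Tuple], None],
                 heartbeat: float) -> None:
        self.monitor = monitor          # assumed monitor interface
        self.send = send                # delivers a message to the client
        self.heartbeat = heartbeat      # from "set heartbeat interval"
        self.last_update_version = 0    # INIT: a client subscribes
        self.deadline = math.inf        # absolute SYNC-TIMER expiry

    def on_sync_request(self, base_version: int, now: float) -> None:
        # INTERIM -> TO-SYNC: client synchronization request with base X.
        self.last_update_version = base_version
        self._to_sync(now)

    def on_need_sync(self, now: float, delay: Optional[float] = None) -> None:
        # NEED-SYNC from the Volume Monitor.
        if delay is None:
            self._to_sync(now)          # immediate synchronization
        else:
            # Delayed synchronization: never later than the time left.
            self.deadline = min(self.deadline, now + delay)

    def on_tick(self, now: float) -> None:
        # INTERIM -> TO-SYNC when the SYNC-TIMER times out.
        if now >= self.deadline:
            self._to_sync(now)

    def _to_sync(self, now: float) -> None:
        changes = self.monitor.changes_since(self.last_update_version)
        if changes is None:             # journal not available
            self.send(("FULL", self.monitor.full_volume()))
        elif changes:                   # non-empty journal since base
            self.send(("JOURNAL", changes))
        else:                           # nothing new: echo the request
            self.send(("ECHO", self.last_update_version))
        self.last_update_version = self.monitor.current_version
        self.deadline = now + self.heartbeat  # math.inf never fires


class _StubMonitor:
    """Hypothetical stand-in: version 3, journal covers bases >= 1."""
    current_version = 3
    journal = [("/a.html", 2), ("/b.html", 3)]   # (url, version)

    def changes_since(self, version: int):
        if version < 1:
            return None
        return [url for url, v in self.journal if v > version]

    def full_volume(self):
        return ["/a.html", "/b.html", "/c.html"]


sent = []
sync = VolumeSynchronizer(_StubMonitor(), sent.append, heartbeat=60.0)
sync.on_sync_request(0, now=0.0)          # base 0: full ObjectVolume
sync.on_sync_request(2, now=1.0)          # journal entries after version 2
sync.on_sync_request(3, now=2.0)          # up to date: echo
sync.on_need_sync(now=10.0, delay=5.0)    # deadline moves from 62 to 15
sync.on_tick(now=15.0)                    # timer expires: sync again
sync.on_need_sync(now=20.0, delay=100.0)  # later than time left: no change
```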

   Procedure "set heartbeat interval": If the server elects to perform
       no proactive invalidation, set the "heartbeat interval" to
       infinity. Otherwise, the server SHOULD pick a value smaller than
       the shortest freshness guarantee among the objects in the
       volume. This is where the server has some liberty to implement
       its own policy. Picking a larger value means less aggressive
       synchronization, and thus higher invalidation latency. The
       server MAY set the heartbeat interval very high, or to infinity,
       in order to reduce load.
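
   The procedure leaves the policy to the server. One possible policy,
   sketched below under the assumption that freshness guarantees are
   given in seconds, takes a fixed fraction (the hypothetical margin
   parameter) of the shortest guarantee in the volume, which is always
   smaller than every positive guarantee. Returning infinity for a
   volume with no guarantees is also an assumption.

```python
import math
from typing import Iterable


def set_heartbeat_interval(freshness_guarantees: Iterable[float],
                           proactive: bool = True,
                           margin: float = 0.5) -> float:
    """One possible "set heartbeat interval" policy (sketch).

    freshness_guarantees: per-object guarantees in seconds (assumed unit).
    margin: hypothetical fraction of the shortest guarantee, in (0, 1).
    """
    if not proactive:
        return math.inf                 # no proactive invalidation
    guarantees = list(freshness_guarantees)
    if not guarantees:
        return math.inf                 # assumption: nothing to keep fresh
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must lie strictly between 0 and 1")
    # A fraction of the smallest guarantee is below every guarantee.
    return margin * min(guarantees)
```

   With guarantees of 30, 60 and 120 seconds and the default margin,
   the sketch yields a 15-second heartbeat; a margin closer to 1 trades
   invalidation latency for lower load.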


7. Security Considerations

   Web caches generally trust the network infrastructure: anyone who
   can spoof IP addresses or poison DNS caches can also poison web
   caches. Content providers, by contrast, tend to be concerned about
   content integrity as well as freshness. With WCIP, web caches should
   also be concerned about denial-of-service attacks in which a
   malicious party keeps invalidating objects in a cache, preventing
   the cache from doing real work.

   To accommodate the various security needs of the invalidation
   servers and clients, WCIP provides three channel security modes:

   (1)  IP-based weak security, i.e., the invalidation client accepts a
        channel message if the source IP address of the invalidation
        message matches an address to which the invalidation server
        name resolves.

   This mode is intended for invalidation servers and clients that do
   not need strong security.

   (2)  Public-key-based strong security with mandatory verification,
        i.e., the invalidation client obtains the public key of the
        channel during channel subscription (e.g., using SSL). The
        invalidation server signs or encrypts the channel messages with
        the channel's private key. The invalidation client MUST verify
        the signature and discard the message if the signature doesn't
        match.

   This mode applies when the invalidation server requires strong
   security for the channel; the invalidation clients have to comply.
   For unicast, the channel can simply be an SSL connection, as in
   HTTPS.

   To prevent an intermediate node from tampering with the channel
   information in the first place, the domain name of the channel MUST
   be identical to that of the object's origin server. Upon channel
   setup, the origin server MAY then redirect the invalidation client
   to the true invalidation server via HTTPS.

   (3)  Public-key-based strong security with optional verification,
        i.e., the invalidation client obtains the public key of the
        channel during channel subscription. The invalidation server
        signs all the channel messages with the channel's private key.
        However, the invalidation client can choose to verify either
        the signature (strong) or the source IP address (weak).

   This mode applies when the invalidation server does not itself need
   strong security but wants to accommodate both clients that need
   strong security and clients that do not. The authors have not
   determined whether this third option is necessary. Options 1 and 2
   may be easier to support because they fit the HTTP and HTTPS models
   well. Option 3 may be easier to support using a BEEP [8]
   implementation.
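
   The acceptance decision a client makes under each of the three modes
   can be sketched as below. This is illustrative only: resolving the
   invalidation server name to addresses is assumed to happen
   elsewhere, the Mode, source_matches() and accept() names are
   hypothetical, and signature_ok is a hypothetical callable standing in
   for verification with the channel's public key, which would need a
   public-key cryptography library not shown here.

```python
import ipaddress
from enum import Enum
from typing import Callable, Iterable


class Mode(Enum):
    IP_WEAK = 1        # (1) source-address check only
    PK_MANDATORY = 2   # (2) signature must verify
    PK_OPTIONAL = 3    # (3) messages signed; client picks the check


def source_matches(source_ip: str, server_addresses: Iterable[str]) -> bool:
    # Weak check: the source address must be one of the addresses that the
    # invalidation server name resolved to (resolution happens elsewhere).
    src = ipaddress.ip_address(source_ip)
    return any(src == ipaddress.ip_address(a) for a in server_addresses)


def accept(mode: Mode, source_ip: str, server_addresses: Iterable[str],
           signature_ok: Callable[[], bool],
           client_wants_strong: bool = False) -> bool:
    """Decide whether to act on a channel message (sketch)."""
    if mode is Mode.IP_WEAK:
        return source_matches(source_ip, server_addresses)
    if mode is Mode.PK_MANDATORY:
        return signature_ok()           # discard on a bad signature
    # Mode 3: every message is signed; the client chooses strong or weak.
    if client_wants_strong:
        return signature_ok()
    return source_matches(source_ip, server_addresses)
```

   Passing the signature check as a callable keeps the sketch free of
   any particular cryptographic API; in mode 3 it is only invoked when
   the client opts for strong verification.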

   The public-key solutions above ensure message integrity. To guard
   against message replay attacks, the ETag or Last-Modified value of
   the updated object has to be part of the invalidation material.
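
   A cache might apply the Last-Modified part of this rule as sketched
   below: it remembers, per object, the newest Last-Modified value it
   has acted on and ignores any invalidation that is not strictly newer.
   The ReplayGuard name is hypothetical. ETags are opaque strings with
   no defined ordering, so the sketch does not use them, and because
   HTTP dates have one-second resolution, two genuine updates within the
   same second would be indistinguishable.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict


class ReplayGuard:
    """Tracks the newest Last-Modified acted on for each object (sketch)."""

    def __init__(self) -> None:
        self.newest: Dict[str, datetime] = {}

    def should_apply(self, url: str, last_modified: str) -> bool:
        # HTTP dates in GMT parse to timezone-aware UTC datetimes.
        stamp = parsedate_to_datetime(last_modified)
        seen = self.newest.get(url)
        if seen is not None and stamp <= seen:
            return False    # replayed or out-of-date invalidation
        self.newest[url] = stamp
        return True


guard = ReplayGuard()
first = guard.should_apply("/a.html", "Mon, 05 Mar 2001 11:00:00 GMT")
replay = guard.should_apply("/a.html", "Mon, 05 Mar 2001 11:00:00 GMT")
```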


8. References


   1  James Gwertzman and Margo Seltzer, "World-Wide Web cache
      consistency", In Proceedings of 1996 USENIX Technical Conference,
      pages 141-151, San Diego, CA, January 1996.

   2  Cao, P.; Liu, C.; "Maintaining strong cache consistency in the
      World Wide Web", 17th International Conference on Distributed
      Computing Systems, 27-30 May 1997. Also in IEEE Transactions on
      Computers (April 1998) vol.47, no.4 p. 445-57

   3  Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

   4  R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
      Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
      RFC 2616, June 1999.

   5  Edith Cohen and Balachander Krishnamurthy and Jennifer Rexford,
      "Improving End-to-End Performance of the Web Using Server Volumes
      and Proxy Filters", Proceedings of the ACM SIGCOMM conference,
      September 1998.

   6  D. Li and D. R. Cheriton. "Scalable Web Caching of Frequently
      Updated Objects using Reliable Multicast", 2nd USENIX Symposium
      on Internet Technologies and Systems (USITS'99). October 1999.
      ftp://ftp.dsg.stanford.edu/pub/papers/mmo.htm

   7  Yin, J.; Alvisi, L.; Dahlin, M.; Lin, C.; "Using leases to
      support server-driven consistency in large-scale systems"
      Proceedings of 18th International Conference on Distributed
      Computing Systems. 26-29 May 1998. p. 285-94

   8  M. T. Rose, "The Blocks Extensible Exchange Protocol Framework",
      IETF Internet Draft draft-ietf-beep-framework-08.

   9  Tony Speakman, et al., "PGM Reliable Transport Protocol", IETF
      Internet Draft draft-speakman-pgm-spec-06.

   10 Mogul, J.C.; Douglis, F.; Feldmann, A.; Krishnamurthy, B.,
      "Potential benefits of delta encoding and data compression for
      HTTP", ACM SIGCOMM 97 Conference.

   11 Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios
      Koutsofios, "The AT&T Internet Difference Engine: Tracking and
      Viewing Changes on the Web", World Wide Web, January 1998, pp.
      27-44. Also appears as AT&T Labs--Research TR 97.23.1, April,
      1997.

   12 Dilley, John; Arlitt, Martin; Perret, Stephane; Jin, Tai. "The
      Distributed Object Consistency Protocol", HP Labs Technical
      Report, http://www.hpl.hp.com/techreports/1999/HPL-1999-109.html,
      September 1999.


9. Acknowledgments

   This draft greatly benefited from the valuable comments from Carl
   Sutton, Ian Cooper, Mark Nottingham, Brad Cain, Hilarie Orman, Fred
   Douglis and Alex Rousskov.

10. Authors' Addresses

   Dan Li
   Cisco Systems, Inc.
   Email: lidan@cisco.com

   Pei Cao
   Cisco Systems, Inc.
   Email: cao@cisco.com

   Mike Dahlin
   University of Texas
   Email: dahlin@cs.utexas.edu


Full Copyright Statement

   Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.