privsec                                                      I. Goldberg
Internet-Draft                                    University of Waterloo
Intended status: Experimental                                    T. Wang
Expires: November 16, 2019       HK University of Science and Technology
                                                                 C. Wood
                                                             Apple, Inc.
                                                            May 15, 2019


                  Network-Based Website Fingerprinting
                    draft-wood-privsec-wfattacks-00

Abstract

   The IETF is well on its way to protecting connection metadata with
   protocols such as DNS-over-TLS and DNS-over-HTTPS, and work-in-
   progress towards encrypting the TLS SNI.  However, more work is
   needed to protect traffic metadata, especially in the context of web
   traffic.  In this document, we survey Website Fingerprinting attacks,
   which are a class of attacks that use machine learning techniques to
   attack web privacy, and highlight metadata leaks used by said
   attacks.  We also survey proposed mitigations for such leakage and
   discuss their applicability to IETF protocols such as TLS, QUIC, and
   HTTP.  We endeavor to show that Website Fingerprinting attacks are a
   serious problem that affect all Internet users, and we pose open
   problems and directions for future research in this area.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 16, 2019.


Goldberg, et al.        Expires November 16, 2019               [Page 1]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Website Fingerprinting  . . . . . . . . . . . . . . . . . . .   4
   4.  Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . .   5
   5.  Defenses  . . . . . . . . . . . . . . . . . . . . . . . . . .   8
   6.  Open Problems and Directions  . . . . . . . . . . . . . . . .  11
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  12
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  12
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  13
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  13
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  13
   Appendix A.  Acknowledgements . . . . . . . . . . . . . . . . . .  18
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  18

1.  Introduction

   Internet protocols such as TLS 1.3 [RFC8446] and QUIC
   [I-D.ietf-quic-transport] bring substantial improvements to end-
   users.  The IETF engineered these with security and privacy in mind
   by encrypting more protocol messages using modern cryptographic
   primitives and algorithms, and engineering against flaws found in
   previous protocols, yielding several desirable security properties,
   including: forward-secure session key secrecy, downgrade protection,
   key compromise impersonation resistance, and protection of endpoint
   identities.  Combined, these two protocols are set to protect a
   significant amount of Internet data.  However, significant metadata
   leaks still exist for users of these protocols.  Examples include
   plaintext TLS SNI and application-specific extensions (ALPN), as well
   as DNS queries.  This information can be used by a passive attacker
   to learn information about the contents of an otherwise encrypted
   network connection.  Recently, such information has also been studied


Goldberg, et al.        Expires November 16, 2019               [Page 2]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   as a means of building unique user profiles [li2018can].  It has also
   been used to build flow classifiers that aid network management
   [foremski2014dns].

   In the context of Tor, a popular low-latency anonymity network, a
   common class of attacks that use metadata for such inference is
   called Website Fingerprinting (WF).  These attacks use machine
   learning techniques built with features extracted from metadata such
   as traffic patterns to attack web (browsing) privacy.  Miller et al.
   [miller2014know] show how these attacks can be applied to web
   browsing traffic protected with HTTPS to reveal private information
   about users.  Pironti et al. [pironti2012identifying] use similar
   attacks based on data sizes to identify individual social media
   clients using encrypted connections.  Fingerprinting attacks using
   encrypted traffic analysis are also applicable to encrypted media
   streams, such as Netflix videos.  (See work from Reed et al.
   [reed2017identifying] and Schuster et al. [schuster2017beauty] for
   examples of these attacks.)  WF attacks have also been applied to
   other IETF protocols such as encrypted DNS, including dnscrypt, DNS-
   over-TLS, and DNS-over-HTTPS [siby2018dns][shulman2014pretty].  In
   the past, they have also been conducted remotely
   [gong2010fingerprinting], using buffer-based side channels in a
   victim's home router.

   Protocols such as DNS-over-TLS and DNS-over-HTTPS [RFC8484], and
   work-in-progress towards encrypting the TLS SNI extension
   [I-D.ietf-tls-esni], help minimize metadata sent in cleartext on the
   wire.  However, regardless of protocol and even network-layer
   fingerprinting mitigations, application layer specifics, e.g., web
   page sizes and client request patterns, reveal a noticeable amount of
   information to attackers.  We argue that much more work is needed to
   protect encrypted connection metadata, especially in the context of
   web traffic.

   In this document, we describe WF attacks in the context of IETF
   protocols such as TLS and QUIC.  We survey WF attacks and highlight
   metadata features and classification techniques used to conduct said
   attacks.  We also describe proposed mitigations for these attacks and
   discuss their applicability to IETF protocols.  We conclude with a
   discussion of open problems and directions for future research and
   advocate for more work in this area.

2.  Background

   In this section we review how most secure Internet connections are
   made today.  We omit custom configurations such as those using VPNs
   and proxies since they do not represent the common case for most
   Internet users.  The following steps briefly describe the sequence of


Goldberg, et al.        Expires November 16, 2019               [Page 3]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   events that normally occur when a web client, e.g., browser, curl,
   etc., connects to a website and obtains some resource.  First an
   unencrypted DNS query is sent to an untrusted DNS recursive resolver
   to resolve a name to an IP address.  Upon receipt, clients then open
   a TCP and TLS connection to the destination address.  During this
   stage, metadata such as the TLS SNI and ALPN values are sent in
   cleartext.  The SNI is used to denote the destination application or
   endpoint to which clients want to connect.  Servers use this for
   several purposes, including selecting an appropriate certificate (one
   with the SNI name in the SubjectAlternativeName list) or routing to a
   different backend terminator.  ALPN values are used to negotiate
   which application-layer protocol will be used on top of the TLS
   connection.  Common values include "http/1.1", "h2", and (soon) "h3".
   Upon connection, clients then send HTTP messages to obtain the
   desired resource.

   Connections look different (on the wire) with TLS 1.3, encrypted DNS
   via DNS-over-TLS or DNS-over-HTTPS, and encrypted SNI.  DNS queries
   are encrypted to a (trusted) recursive resolver and TLS metadata such
   as SNI are encrypted in transit to the terminator.  Despite the
   reduction in cleartext metadata sent over the wire, there still
   remains several sources of information that an adversary may use for
   malicious purposes, including: size and timing of DNS queries and
   responses, size and timing or application traffic, and connection
   attempts induced while loading a web resource, e.g., Javascript
   files.  So while technologies such as Encrypted SNI, DoT, and DoH
   help protect some metadata, they are not complete solutions to the
   larger problem.  In the following section, we discuss this
   overarching problem in detail.

3.  Website Fingerprinting

   Website Fingerprinting (WF) is a class of attacks that exploit
   metadata leakage to attack end-user privacy on the Internet.  In the
   WF threat model, Adv is assumed to be a passive and local attacker.
   Local means that Adv can associate traffic with a given client.
   Examples include proxies to which clients directly connect.  Passive
   means that Adv can only view traffic in transit.  It cannot add,
   drop, or otherwise modify packets between the victim client and
   server(s).  Use of reliable and encrypted transport protocols such as
   TLS limit on-path attackers to eavesdropping on encrypted packets.
   (In QUIC, however, reordering packets is possible.)

   Traffic features used for classification include properties such as
   packet size, timing, direction, interarrival times, and burstiness,
   among many others [wang2016website].  Normally, features are
   restricted to those which are extractable as a passive eavesdropper,
   and not those which are viewable by modifying client or server


Goldberg, et al.        Expires November 16, 2019               [Page 4]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   behavior.  Specifically, this means that attacks such as CRIME {{}
   and TIME {{}, which rely on an attacker abusing TLS-layer compression
   to leak contents of an encrypted connection, are out of scope.

   Website Fingerprinting attacks have evolved over the years through
   three phases: (1) Closed-world WF on SSL/TLS, (2) Closed-world WF on
   Tor, and (3) Open-world WF on Tor.

   1.  In the closed-world model, clients are assumed to only visit a
       small set of pages monitored by Adv.  This is less realistic but
       easier to analyze than the open-world model discussed below, and
       so the earliest results achieved success on SSL/TLS in this
       model.  (For a realistic attack, Adv would need to monitor every
       possible page of interest to each client, which is impractical.)
       Attacks against proxy-based privacy technologies such as VPNs and
       SSH tunneling, which has almost no effect on the network, falls
       under this category as well.

   2.  Tor, an anonymity network built on onion routing, is harder to
       attack than SSL for several reasons; successful results on Tor
       thus came later.  First, Tor pads all cells (Tor's application-
       layer datagrams) to the same constant size, removing unique
       packet lengths as a powerful feature for the attacker.  Second,
       Tor imposes random network conditions upon the client due to
       random selection of proxies, so packet sequences are less likely
       to be consistent.

   3.  In the open-world model, Adv wishes to learn whenever a victim
       client visits one of a select number of monitored pages
       [wang2016website].  Adversaries train classifiers in this model
       using monitored and non-monitored websites of their choosing.  By
       definition, Adv cannot train using client-chosen pages.  Clients
       then visit pages at will and Adv attempts to learn whenever a
       monitored page is visited, if any are at all.  This is a
       realistic model capturing the fact that the set of pages any
       attacker would be interested in must necessarily be a small
       subset of the set of all pages.  As this is a harder model to
       attack, successful results on this model came later.

4.  Attacks

   1.  Closed-world WF on TLS: WF attacks date back to applications on
       SSL first inspired by Wagner and Schneier [wagner1996analysis],
       in which the authors observed that packet lengths reveal
       information about the underlying data.  Subsequent attacks
       carried out by Cheng et al. [cheng1998traffic], Sun et al.
       [sun2002statistical], and Hintz [hintz2002fingerprinting]
       continued to show access.  These attacks assume Adv has knowledge


Goldberg, et al.        Expires November 16, 2019               [Page 5]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


       of the target resource length(s), which is not always possible
       with techniques such as padding.

   Bissias et al. [bissias2005privacy] use cross correlation of inter-
   packet times in one second time windows as an WF attack.  Liberatore
   and Levine [liberatore2006inferring] proposed two WF attacks using
   the Jaccard coefficient and the Naive Bayes classifier.  Herrmann et
   al. [herrmann2009website] extended the work of Liberatore and Levine
   with a multinomial Naive Bayes classifier computed using three input
   frequency transformations.  Results yielded higher accuracy than that
   of Liberatore and Levine.  Herrmann's attack is the best in this
   category, but the authors assume packets which do not fill a MTU
   represent packet trailers.  Therefore, uniqueness is only accurate
   modulo the MTU.  Efficacy is limited if endpoints pad packets to the
   MTU or another fixed length.  Modern protocols such as HTTP/2, QUIC,
   and TLS 1.3 all provide some form of application-controlled padding.
   (Note: These attacks are not successful on Tor.)

   1.  Closed-world WF on Tor: Shmatikov and Wang [shmatikov2006timing]
       presented a WF attack that exploits cross correlation of arrival
       packet counts in one second time windows.  Lu et al.
       [lu2010website] developed a classifier based on the Levenshtein
       distance between ingress and egress packet lengths extracted from
       packet sequences.  Distance is computed between strings of
       ingress and egress packet lengths.  The training packet sequence
       with the closest distance to the testing packet sequence is
       deemed the match.  Dyer et al. [dyer2012peek] used a Naive Bayes
       classifier trained with a reduced set of features, including
       total response transmission time, length of packets (in each
       direction), and burst lengths.  (Wang [wang2016website] notes
       that measuring burst lengths in Tor is difficult given the
       presence of SENDME cells for flow control.)  This approach did
       not yield any measurable improvements over the SVM classifier
       from Panchenko et al.  Cai et al. [cai2012touching] extend the
       work of Lu et al.  by adding transpositions to the Levenshtein
       distance computation and normalizing the result, yielding what
       the authors refer to as the Optimal String Alignment Distance
       (OSAD).  Before feature extraction, the authors round TCP packet
       lengths to the nearest multiple of 600B as an estimate of the
       number of Tor cells.

   Wang et al. [wang2013improved] tuned the OSAD-based attack to improve
   its accuracy.  Specific changes include use of Tor cells instead of
   TCP packets for packet and burst lengths, as well as heuristics to
   remove SENDME cells (those not carrying application data) from flows
   to recover true burst lengths.  The authors also modified the
   distance computation by removing substitutions, increasing the weight
   for egress packets, and varying the transposition cost across the


Goldberg, et al.        Expires November 16, 2019               [Page 6]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   packet sequence (large weights at the beginning of a trace, and
   smaller weights near the end, where variations are expected across
   repeated page loads.)  Wang et al. also developed an alternate
   classifier with lower accuracy yet superior performance (quadratic to
   linear time complexity).  It works by minimizing the sum of two
   costs: sequence transpositions and sequence deletions or insertions.
   These two costs are computed separately, in contrast to the first
   approach which computes them simultaneously.

   1.  Open-world WF on Tor and TLS: Panchenko et al.
       [panchenko2011website] were the first to use a support vector
       machine (SVM) classifier trained with web domain-specific
       features, such as HTML document sizes, as well as packet lengths.
       Wang et al. [wang2014effective] also developed an attack using a
       k-Nearest Neighbors (k-NN) classifier, which is a supervised
       machine learning algorithm, targeting the open world setting.
       The classifier extracts a large number of features from packet
       sequences, including raw (ingress and egress) packet counts,
       unique packet lengths, direction, burst lengths, and inter-packet
       times, among others.  (There are 4226 features in total.)  The
       k-NN distance metric is computed as the sum of weighted feature
       differences.

   Kota et al. [abe2016fingerprinting] were the first to use Deep
   Learning (DL) methods based on Stacked Denoising Autoencoders for WF
   attacks.  (Autoencoders reduce feature input dimensions when
   stacked.)  Kota et al. form input vectors from Tor cell directions
   (+1 or -1).  They use no other features.  Using a (small) data set
   from Wang [wang2016website], the classifier achieves a 86% true
   positive rate and 2% false positive rate in the open world model.
   Rimmer et al. [rimmer2018automated] applied DL for automated feature
   generation and classifier construction.  Trained with 2,500 traces
   per website, their system achieves 96.3% accuracy in the open world
   model.  Recently, Bhat et al. [bhat2018var], Oh et al. [oh2017pfp],
   and Sirinam et al. [sirinam2018deep] used Convolutional Neural
   Networks (CNNs) and Deep Neural Networks (DNNs) for WF attacks.
   Results from Sirinam et al. show the best results - 98% on Tor
   without recent defenses (in Section {{defenses}) - while performing
   favorably when select defenses are used for both open and closed
   world models.

   Yan et al. [yan2018feature] studied manual high-information feature
   extraction from packet traces.  They "exhaustively" examined
   different levels of features, including packet, burst, TCP, port, and
   IP address, summing to 35,683 in total, and distilled them into a
   diverse set of uncorrelated features for eight different
   communication scenarios.  Rahman [rahman2018using] studied the
   utility of features derived from packet interarrival times,


Goldberg, et al.        Expires November 16, 2019               [Page 7]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   including: median interarrival time (per burst), burst packet arrival
   time variance, cross-burst interarrival median differences, and
   others.  Using a CNN, results show that these features yield a non-
   negligible increase in WF attack accuracy.

   For all WF attacks, one limitation worth highlighting is the base
   rate fallacy.  This can be summarized as follows: highly accurate
   classifiers with a reliable false positive rate (FPR) decrease in
   efficacy as the world size increases.  Juarez et al.
   [juarez2014critical] studied its impact by measuring the Bayesian
   detection rate (BDR) in comparison to the FPR as a function of world
   size.  As the world size increases, the BDR approaches 0 while the
   FPR remains stable, meaning that the probability of incorrect
   classifier results increase as well.  Juarez et al. partially address
   the base rate fallacy problem by adding a confirmation step to their
   classifier.  Another problem is that web content is (increasingly)
   dynamic.  Most WF attacks, especially those in closed world models,
   assume that traces are static.  However, Juarez et al.
   [juarez2014critical] show this is not the case even for "simple"
   pages such as google.com.  Thus, due to the base fallacy rate and
   dynamic nature of content, classifiers require continual retraining
   in order to ensure accuracy.

5.  Defenses

   WF defenses are deterministic or randomized algorithms that take as
   input application data or packet sequences and return modified
   application data or packet sequences.  Viable defenses seek to
   minimize the transformation cost and maximum (theoretical and
   perfect) attacker accuracy.  Naive defenses such as sending a
   constant stream of (possibly random) bytes between client and server
   may be effective though clearly not viable from a cost perspective.
   Relevant cost metrics include bandwidth overhead, added time or
   latency (and its impact on related metrics such as page load time),
   and even CPU cost, though the latter is often ignored in favor of the
   former two.  Wang [wang2016website] describe defenses as either
   limited or general.  A limited defense is one which only helps
   mitigate specific WF attacks by transforming packets in a way to
   obviate a particular (set of) feature(s) used by said attacks.  In
   contrast, general defenses help mitigate a variety of attacks.

   Several general defenses have been proposed, including BuFLO
   [dyer2012peek], which pads packets to a fixed length of 1500B (the
   normal MTU) and schedules packets for transmission at fixed period
   intervals (and sends fake data if nothing is yet available).  Tamaraw
   [wang2014comparing] is an improvement over BuFLO that uses two
   different fixed lengths for packet transmission, rather than one, to
   save on bandwidth overhead.  Tamaraw also uses two different


Goldberg, et al.        Expires November 16, 2019               [Page 8]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   scheduling rates for ingress and egress packets.  The authors chose
   to make the ingress packet period smaller than the egress packet
   period since HTTP responses are often larger in size and count - if
   HTTP Push is used - than requests.  While provably correct, both
   BuFLO and Tamaraw limit the rate at which clients send traffic, and
   requires all clients to send at a uniform rate.  Both requirements
   therefore make it difficult to apply as a generic defense in IETF
   protocols.

   Wang et al. also developed Supersequence [wang2016website], which
   attempts to approximate a bandwidth-optimal deterministic defense.
   This is done by casting the padding and flow control problem as the
   shortest common subsequence (SCS) of the transformed packet trace.
   Supersequence approximates the solution by learning the optimal
   packet scheduling rate; it uses the same padding scheme as Tamaraw.

   Walkie-Talkie [wang2015walkie] is a collection of mechanisms for WF
   defense.  It includes running the client (browser) in half-duplex
   mode to batch requests and responses together, as well as randomly
   padding traffic so as to mimic traffic of benign websites.  It
   assumes knowledge of traffic patterns for benign websites, which can
   be information learned over time or provided by a cooperating peer.
   Goldberg and Wang also propose a "randomized" variant that pads real
   bursts of requests and generates random request bursts according to a
   uniform distribution.  The half-duplex mode could be implemented as
   an extension to a protocol such as HTTP/2, QUIC, or TLS.

   Many limited defenses have also been proposed.  We list prominent
   works below.

   o  Shmatikov and Wang [shmatikov2006timing] developed adaptive
      padding which adds packets to mask inter-packet times.  (This
      mechanism does not ever delay application data being sent, in
      contrast to other padding mechanisms such as BuFLO; see below.)
      Juarez et al. [juarez2015wtf]}[juarez2016toward] also created a WF
      defense based on adaptive padding called WTF-PAD.  This variant
      uses application data and "gap" distribution to generate padding
      for delays.  Specifically, when not sending application data,
      senders use the gap distribution to drive fake packet
      transmission.  WTF-PAD can be run by a single endpoint, though it
      is assumed that both client and server participate.  As mentioned
      above, protocols such as HTTP/2, QUIC, and TLS 1.3 offer a
      mechanism by which applications can send padding.  WTF-PAD could
      therefore be implemented as an extension to any of these
      protocols, either by applications supplying padding distributions
      or the system learning them over time.


Goldberg, et al.        Expires November 16, 2019               [Page 9]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   o  Wright et al. [wright2009traffic] developed traffic morphing,
      which pads packets in such a way so as to make the sequence from
      one page have characteristics of another (non-monitored or benign)
      page.  This technique requires application-specific knowledge
      about benign pages and is therefore best implemented outside of
      the transport layer.

   o  Nithyanand et al. [nithyanand2014glove] developed a mechanism
      called Glove, wherein traces were first clustered and then morphed
      (via dummy insertion, packet merging, splitting, and delaying) to
      look indistinguishable within clusters.  When used to protect the
      Alexa top 500 domains, Glove performs well with respect to
      bandwidth overhead when compared to BuFLO and CS-BuFLO.  Varying
      the cluster size can tune Glove's bandwidth overhead.

   o  Pironti et al. [pironti2012identifying] developed a TLS-based
      fragmentation and padding scheme designed to hide the length of
      application data within a certain range with record padding.  The
      mechanism works by iteratively splitting application data into
      variable sized segments.  Applications can guide the range of
      viable lengths provided such information is available.

   o  Luo et al. [luo2011httpos] created HTTPS with Obfuscation
      (HTTPOS), which is a client-side mechanism for obfuscating HTTP
      traffic.  It uses the HTTP Range method to receive resources in
      chunks, TCP MSS to limit the size of individual chunks, and
      advertised window size to control the flow of chunks in
      transmission.

   o  Panchenko et al. [panchenko2011website] developed Decoy, which is
      a simple mechanism that loads a benign page alongside a real page.
      This seeks to mask the real page load by properties of the "decoy"
      page.  As with morphing, this defense requires application-
      specific knowledge about benign pages and is best implemented
      outside of the transport layer.

   o  The Tor project implemented HTTP pipelining
      [perry2011experimental], which bundles egress HTTP/1.1 requests
      into batches of varying sizes with random orders.  Batching
      requests to mask request and response sizes could be made easier
      with HTTP/2 [RFC7540], HTTP/3, and QUIC, since these protocol
      naturally support multiplexing.  However, pipelining and batching
      may necessarily introduce latency delays that negatively impact
      the user experience.

   o  Cherubin et al. [cherubin2017website] design two application-layer
      defenses called Application Layer Padding Concerns Adversaries
      (ALPaCA) and Lightweight application-Layer Masquerading Add-on


Goldberg, et al.        Expires November 16, 2019              [Page 10]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


      (LLaMA).  ALPaCA is a server-side defense that pads first-party
      content (deterministically or probabilistically) according to a
      known distribution.  (Deterministic padding similar to Tamaraw
      performs worse than probabilistic padding.)  LLaMA is similar to
      randomized pipelining, yet differs in that requests are also
      delayed (if necessary) and spurious requests are generated
      according to some probability distribution.  Comparatively, ALPaCA
      yields a greater reduction in WF attack accuracy than LLaMA.

   o  Lu et al. [lu2018dynaflow] designed DynaFlow, which is a defense
      that dynamically adjusts traffic flows using a combination of
      burst pattern morphing, constant traffic flow with flexible
      intervals, and burst padding.  DynaFlow overhead is 40% less than
      that of Tamaraw and was shown to have similar benefits.

6.  Open Problems and Directions

   To date, WF attacks target clients running over Tor or some other
   anonymizing service, meaning that WF attacks are likely more accurate
   on normal TLS-protected connections.  Moreover, attacks normally
   assume clients use HTTP/1.1 with parallel connections for parallel
   resource fetches.  In recent years, however, protocols such as SPDY,
   HTTP/2, and QUIC with built-in padding support and multiplexed
   stream-based connections should make existing attacks more difficult
   to carry out.  That said, it is unclear how exactly these protocol
   design trends will impact WF attacks.  A non-exhaustive list of
   questions that warrant further research are below:

   1.  How does connection coalescing and consolidation affect WF
       attacks?  Technologies such as DNS-over-HTTPS and ESNI favor
       architectures wherein a single network or connection can serve
       multiple origins or resources.  With connection coalescing,
       traffic for multiple resources is sent on the same connection,
       thereby adding effects similar to that of the Decoy defense
       mechanism described in Section 5

   2.  To what extent does protocol multiplexing increase WF attack
       difficulty?  Using a single connection with multiple streams to
       avoid HoL blocking saves on connection startup and bandwidth
       costs while simultaneously mixing information from multiple
       requests and resources on the same connection.

   3.  How can protocol features such as HTTP Push be used to improve WF
       defense efficacy?  Defenses without cooperative peer support
       often induce suboptimal bandwidth or latency costs.  If both
       endpoints of a connection participate in the defense, even
       proactively with Push, perhaps this could be improved.


Goldberg, et al.        Expires November 16, 2019              [Page 11]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   4.  Can connection bootstrapping techniques such as those used by
       ESNI be used to distribute WF defense information?  One possible
       approach is to distribute client padding profiles derived from
       CDN knowledge of serviced resources.

   5.  How can clients build, use, and possibly share WF defense
       information to benefit others?

   6.  How can applications package websites and subresources in such a
       way that limits unique information?  For example, websites link
       to third party resources in an ad-hoc fashion, causing the
       subsequent trace of browser fetches to possibly uniquely identify
       the website.

   Research into the above questions will help the IETF community better
   understand the extent to which WF attacks are a problem for Internet
   users in general.

   It is worth mentioning that traffic-based WF attacks may not be
   required to achieve the desired goal of learning a connection's
   destination.  Network connections by nature reveal information about
   endpoint behavior.  For example, a connection to 8.8.8.8 indicates
   usage of Google's DNS service.  Likewise, a connection to any address
   in a Cloudflare IP address block indicates use of a service hosted by
   Cloudflare.  The relationship between network address and domains,
   especially when stable and unique, are a strong signal for website
   fingerprinting.  Trevisan et al. [trevisan2016towards] explored use
   of this signal as a reliable mechanism for website fingerprinting.
   They find that most major services (domains) have clearly associated
   IP address(es), though these addresses may change over time.  Jiang
   et al. [jiang2007lightweight] and Tammaro et al.
   [tammaro2012exploiting] also previously came to the same conclusion.
   Thus, classifiers that rely solely on network addresses may be used
   to aid website fingerprinting attacks.

7.  Security Considerations

   This document surveys security and privacy attacks and defenses on
   encrypted TLS connections.  It does not introduce, specify, or
   recommend any particular mitigation to the aforementioned attacks.

8.  IANA Considerations

   This document makes no IANA requests.


Goldberg, et al.        Expires November 16, 2019              [Page 12]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


9.  References

9.1.  Normative References

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987, <https://www.rfc-editor.org/info/rfc1035>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
              editor.org/info/rfc2119>.

   [RFC6234]  Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
              (SHA and SHA-based HMAC and HKDF)", RFC 6234,
              DOI 10.17487/RFC6234, May 2011, <https://www.rfc-
              editor.org/info/rfc6234>.

9.2.  Informative References

   [abe2016fingerprinting]
              "Fingerprinting attack on tor anonymity using deep
              learning", Asia-Pacific Advanced Network, 2016 , n.d..

   [backes2013preventing]
              "Preventing Side-Channel Leaks in Web Traffic -- A Formal
              Approach", NDSS, 2013 , n.d..

   [bhat2018var]
              "Var-CNN and DynaFlow -- Improved Attacks and Defenses for
              Website Fingerprinting", arXiv preprint arXiv:1802.10215 ,
              n.d..

   [bissias2005privacy]
              "Privacy vulnerabilities in encrypted HTTP streams",
              International Workshop on Privacy Enhancing Technologies,
              2005 , n.d..

   [cai2012touching]
              "Touching from a distance -- Website fingerprinting
              attacks and defenses", ACM conference on Computer and
              communications security, 2012 , n.d..

   [cheng1998traffic]
              "Traffic analysis of SSL encrypted web browsing", n.d..


Goldberg, et al.        Expires November 16, 2019              [Page 13]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   [cherubin2017website]
              "Website fingerprinting defenses at the application
              layer", Privacy Enhancing Technologies, 2017 , n.d..

   [coull2007web]
              "On Web Browsing Privacy in Anonymized NetFlows", USENIX
              Security Symposium , n.d..

   [dyer2012peek]
              "Peek-a-boo, i still see you -- Why efficient traffic
              analysis countermeasures fail", IEEE Symposium on Security
              and Privacy, 2012 , n.d..

   [foremski2014dns]
              "DNS-Class -- immediate classification of IP flows using
              DNS", International Journal of Network Management , n.d..

   [gong2010fingerprinting]
              "Fingerprinting websites using remote traffic analysis",
              Proceedings of the 17th ACM conference on Computer and
              communications security , n.d..

   [hayes2016k]
              "k-fingerprinting -- A Robust Scalable Website
              Fingerprinting Technique", USENIX Security Symposium,
              2016 , n.d..

   [herrmann2009website]
              "Website fingerprinting -- attacking popular privacy
              enhancing technologies with the multinomial naive-bayes
              classifier", ACM workshop on Cloud computing security,
              2009 , n.d..

   [hintz2002fingerprinting]
              "Fingerprinting websites using traffic analysis",
              International Workshop on Privacy Enhancing Technologies,
              2002 , n.d..

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-20 (work
              in progress), April 2019.

   [I-D.ietf-tls-esni]
              Rescorla, E., Oku, K., Sullivan, N., and C. Wood,
              "Encrypted Server Name Indication for TLS 1.3", draft-
              ietf-tls-esni-03 (work in progress), March 2019.


Goldberg, et al.        Expires November 16, 2019              [Page 14]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   [jiang2007lightweight]
              "Lightweight application classification for network
              management", SIGCOMM workshop on Internet network
              management, 2007 , n.d..

   [juarez2014critical]
              "A critical evaluation of website fingerprinting attacks",
              ACM SIGSAC Conference on Computer and Communications
              Security, 2014 , n.d..

   [juarez2015wtf]
              "WTF-PAD -- toward an efficient website fingerprinting
              defense for tor", CoRR, abs/1512.00524 , n.d., <https://pd
              fs.semanticscholar.org/0f54/4d0845cb9f317722759dc49e1493ef
              30d83d.pdf>.

   [juarez2016toward]
              "Toward an efficient website fingerprinting defense",
              European Symposium on Research in Computer Security,
              2016 , n.d..

   [li2018can]
              "Can We Learn What People Are Doing from Raw DNS
              Queries?", IEEE INFOCOM 2018-IEEE Conference on Computer
              Communications , n.d..

   [liberatore2006inferring]
              "Inferring the source of encrypted HTTP connections", ACM
              Conference on Computer and Communications Security, 2006 ,
              n.d..

   [lu2010website]
              "Website fingerprinting and identification using ordered
              feature sequences", European Symposium on Research in
              Computer Security, 2010 , n.d..

   [lu2018dynaflow]
              "DynaFlow -- An Efficient Website Fingerprinting Defense
              Based on Dynamically-Adjusting Flows", Workshop on Privacy
              in the Electronic Society, 2018 , n.d..

   [luo2011httpos]
              "HTTPOS -- Sealing Information Leaks with Browser-side
              Obfuscation of Encrypted Flows", NDSS, 2011 , n.d..


Goldberg, et al.        Expires November 16, 2019              [Page 15]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   [miller2014know]
              "I know why you went to the clinic -- Risks and
              realization of https traffic analysis", International
              Symposium on Privacy Enhancing Technologies Symposium,
              2014 , n.d..

   [nithyanand2014glove]
              "Glove -- A bespoke website fingerprinting defense",
              Proceedings of the 13th Workshop on Privacy in the
              Electronic Society , n.d..

   [oh2017pfp]
              "p-FP -- Extraction, Classification, and Prediction of
              Website Fingerprints with Deep Learning", n.d..

   [panchenko2011website]
              "Website fingerprinting in onion routing based
              anonymization networks", ACM workshop on Privacy in the
              electronic society, 2011 , n.d..

   [perry2011experimental]
              "Experimental defense for website traffic fingerprinting",
              n.d., <https://blog.torproject.org/experimental-defense-
              website-traffic-fingerprinting>.

   [pironti2012identifying]
              "Identifying website users by TLS traffic analysis -- New
              attacks and effective countermeasures", n.d..

   [rahman2018using]
              "Using Packet Timing Information in Website
              Fingerprinting", n.d..

   [reed2017identifying]
              "Identifying https-protected netflix videos in real-time",
              ACM on Conference on Data and Application Security and
              Privacy, 2017 , n.d..

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015, <https://www.rfc-
              editor.org/info/rfc7540>.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
              <https://www.rfc-editor.org/info/rfc8446>.


Goldberg, et al.        Expires November 16, 2019              [Page 16]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   [RFC8484]  Hoffman, P. and P. McManus, "DNS Queries over HTTPS
              (DoH)", RFC 8484, DOI 10.17487/RFC8484, October 2018,
              <https://www.rfc-editor.org/info/rfc8484>.

   [rimmer2018automated]
              "Automated website fingerprinting through deep learning",
              Network & Distributed System Security Symposium (NDSS),
              2018 , n.d..

   [schuster2017beauty]
              "Beauty and the burst -- Remote identification of
              encrypted video streams", USENIX Security, 2017 , n.d..

   [shmatikov2006timing]
              "Timing analysis in low-latency mix networks -- Attacks
              and defenses", European Symposium on Research in Computer
              Security, 2006 , n.d..

   [shulman2014pretty]
              "Pretty bad privacy -- Pitfalls of DNS encryption",
              Workshop on Privacy in the Electronic Society, 2014 ,
              n.d..

   [siby2018dns]
              "DNS Privacy not so private -- the traffic analysis
              perspective", n.d..

   [sirinam2018deep]
              "Deep fingerprinting -- Undermining website fingerprinting
              defenses with deep learning", arXiv preprint
              arXiv:1801.02265 , n.d..

   [sun2002statistical]
              "Statistical identification of encrypted web browsing
              traffic", IEEE, 2002 , n.d..

   [tammaro2012exploiting]
              "Exploiting packet-sampling measurements for traffic
              characterization and classification", International
              Journal of Network Management, 2012 , n.d..

   [trevisan2016towards]
              "Towards web service classification using addresses and
              DNS", Wireless Communications and Mobile Computing
              Conference (IWCMC), 2016 International. IEEE, 2016 , n.d..


Goldberg, et al.        Expires November 16, 2019              [Page 17]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   [wagner1996analysis]
              "Analysis of the SSL 3.0 protocol", USENIX Workshop on
              Electronic Commerce Proceedings, 1996 , n.d..

   [wang2013improved]
              "Improved website fingerprinting on tor", Workshop on
              privacy in the electronic society, 2013 , n.d..

   [wang2014comparing]
              "Comparing website fingerprinting attacks and defenses",
              Technical Report 2013-30, CACR, 2013. , n.d..

   [wang2014effective]
              "Effective Attacks and Provable Defenses for Website
              Fingerprinting", USENIX Security Symposium, 2014 , n.d..

   [wang2015walkie]
              "Walkie-talkie -- An effective and efficient defense
              against website fingerprinting", n.d..

   [wang2016website]
              "Website fingerprinting -- Attacks and defenses",
              University of Waterloo , n.d..

   [wright2009traffic]
              "Traffic Morphing -- An Efficient Defense Against
              Statistical Traffic Analysis", NDSS, 2009 , n.d..

   [yan2018feature]
              "Feature selection for website fingerprinting",
              Proceedings on Privacy Enhancing Technologies, 2018 ,
              n.d..

Appendix A.  Acknowledgements

   The authors would like to thank Frederic Jacobs and Tim Taubert for
   feedback on earlier versions of this document.

Authors' Addresses

   Ian Goldberg
   University of Waterloo

   Email: (iang@uwaterloo.ca


Goldberg, et al.        Expires November 16, 2019              [Page 18]

Internet-Draft    Network-Based Website Fingerprinting          May 2019


   Tao Wang
   HK University of Science and Technology

   Email: taow@cse.ust.hk


   Christopher A. Wood
   Apple, Inc.

   Email: cawood@apple.com


Goldberg, et al.        Expires November 16, 2019              [Page 19]