There are many situations in which it is desirable to take measurements of data which people consider sensitive. In these cases, the entity taking the measurement is usually not interested in people's individual responses but rather in aggregated data. Conventional methods require collecting individual responses and then aggregating them, thus representing a threat to user privacy and rendering many such measurements difficult and impractical. This document describes a multi-party distributed aggregation protocol (DAP) for privacy preserving measurement (PPM) which can be used to collect aggregate data without revealing any individual user's data.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://ietf-wg-ppm.github.io/draft-ietf-ppm-dap/draft-ietf-ppm-dap.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-ppm-dap/.¶
Discussion of this document takes place on the Privacy Preserving Measurement Working Group mailing list (mailto:ppm@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/ppm/. Subscribe at https://www.ietf.org/mailman/listinfo/ppm/.¶
Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 14 September 2023.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This document describes the Distributed Aggregation Protocol (DAP) for privacy preserving measurement. The protocol is executed by a large set of clients and a small set of servers. The servers' goal is to compute some aggregate statistic over the clients' inputs without learning the inputs themselves. This is made possible by distributing the computation among the servers in such a way that, as long as at least one of them executes the protocol honestly, no input is ever seen in the clear by any server.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Aggregate result: The output of the aggregation function over a given set of measurements and aggregation parameter. As defined in [VDAF].¶
Aggregate share: A share of the aggregate result emitted by an Aggregator. Aggregate shares are reassembled by the Collector into the aggregate result, which is the final output of the aggregation function. As defined in [VDAF].¶
Aggregation function: The function computed over the Clients' measurements. As defined in [VDAF].¶
Aggregation parameter: Parameter used to prepare a set of measurements for aggregation (e.g., the candidate prefixes for Poplar1 from Section 8 of [VDAF]). As defined in [VDAF].¶
Aggregator: An endpoint that receives input shares from Clients and validates and aggregates them with the help of the other Aggregators.¶
Batch: A set of reports that are aggregated into an aggregate result.¶
Batch duration: The time difference between the oldest and newest report in a batch.¶
Batch interval: A parameter of a query issued by the Collector that specifies the time range of the reports in the batch.¶
Client: A party that uploads a report.¶
Collector: The endpoint that selects the aggregation parameter and receives the aggregate result.¶
Helper: An Aggregator that executes the aggregation and collection sub-protocols as instructed by the Leader.¶
Input share: An Aggregator's share of a measurement. The input shares are output by the VDAF sharding algorithm. As defined in [VDAF].¶
Output share: An Aggregator's share of the prepared measurement resulting from successful execution of the VDAF preparation phase. Many output shares are combined into an aggregate share during the VDAF aggregation phase. As defined in [VDAF].¶
Leader: A distinguished Aggregator that coordinates aggregation and collection amongst the Aggregators.¶
Measurement: A plaintext input emitted by a Client (e.g., a count, summand, or string), before any encryption or secret sharing is applied. Depending on the VDAF in use, multiple values may be grouped into a single measurement. As defined in [VDAF].¶
Minimum batch size: The minimum number of reports in a batch.¶
Public share: The output of the VDAF sharding algorithm broadcast to each of the Aggregators. As defined in [VDAF].¶
Report: A cryptographically protected measurement uploaded to the Leader by a Client. Comprised of a set of report shares.¶
Report share: An encrypted input share comprising a piece of a report.¶
This document uses the presentation language of [RFC8446] to define messages in the DAP protocol. Encoding and decoding of these messages as byte strings also follows [RFC8446].¶
The protocol is executed by a large set of Clients and a small set of servers. Servers are referred to as "Aggregators". Each Client's input to the protocol is its measurement (or set of measurements, e.g., counts of some user behavior). Given the input set of measurements x_1, ..., x_n held by n users, the goal of DAP is to compute y = F(p, x_1, ..., x_n) for some function F while revealing nothing else about the measurements. We call F the "aggregation function".¶
This protocol is extensible and allows for the addition of new cryptographic schemes that implement the VDAF interface specified in [VDAF]. Candidates include Prio3 and Poplar1, both specified in [VDAF].¶
VDAFs rely on secret sharing to protect the privacy of the measurements. Rather than sending its input in the clear, each Client shards its measurements into a sequence of "input shares" and sends an input share to each of the Aggregators. This provides two important properties: no Aggregator can recover the plaintext measurement from its input share alone, and the Aggregators can aggregate the shares without ever reassembling the plaintext measurements.¶
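The following is a minimal Python sketch of the idea behind additive secret sharing. It is illustrative only and is not the VDAF sharding algorithm; the field modulus is an assumption chosen for the example:¶
import secrets

# Split a measurement into two additive shares modulo a prime field.
# Neither share alone reveals anything about the measurement, but the
# shares sum to it; sums of shares likewise sum to the aggregate.
PRIME = 2**31 - 1  # example modulus, assumed for illustration

def shard(measurement: int) -> tuple[int, int]:
    leader_share = secrets.randbelow(PRIME)
    helper_share = (measurement - leader_share) % PRIME
    return leader_share, helper_share

def unshard(leader_share: int, helper_share: int) -> int:
    return (leader_share + helper_share) % PRIME¶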
The overall system architecture is shown in Figure 1.¶
[[OPEN ISSUE: This shows two helpers, but the document only allows one for now. https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/117]]¶
The main participants in the protocol are as follows:¶
Collector: The entity which wants to obtain the aggregate of the measurements generated by the Clients. Any given measurement task will have a single Collector.¶
Client: The endpoints which directly take the measurement(s) and report them to the DAP protocol. In order to provide reasonable levels of privacy, there must be a large number of Clients.¶
Aggregator: An endpoint which receives report shares. Each Aggregator works with the other Aggregators to compute the aggregate result. This protocol defines two types of Aggregators: Leaders and Helpers. For each measurement task, there is a single Leader and Helper.¶
Leader: The Aggregator responsible for coordinating the protocol. It receives the reports, splits them into report shares, distributes the report shares to the Helpers, and orchestrates the process of computing the aggregate result as requested by the Collector.¶
Helper: Helpers are responsible for executing the protocol as instructed by the Leader. The protocol is designed so that Helpers can be relatively lightweight, with most of the state held at the Leader.¶
The basic unit of DAP is the "task" which represents a single measurement process (though potentially taken over multiple time windows). The definition of a task includes a number of parameters; these are enumerated in Section 4.2.¶
These parameters are distributed out of band to the Clients and to the Aggregators. They are distributed by the Collector in some authenticated form. Each task is identified by a unique 32-byte ID which is used to refer to it in protocol messages.¶
During the duration of the task, each Client records its own measurement value(s), packages them up into a report, and sends them to the Leader. Each share is separately encrypted for each Aggregator so that even though the shares pass through the Leader, the Leader is unable to see or modify them. Depending on the task, the Client may send only one report or may send many reports over time.¶
The Leader distributes the shares to the Helpers and orchestrates the process of verifying them (see Section 2.2) and assembling them into a final aggregate result for the Collector. Depending on the VDAF, it may be possible to incrementally process each report as it comes in, or it may be necessary to wait until the entire batch of reports is received.¶
An essential task of any data collection pipeline is ensuring that the data being aggregated is "valid". In DAP, input validation is complicated by the fact that none of the entities other than the Client ever sees that Client's plaintext measurement.¶
In order to address this problem, the Aggregators engage in a secure, multi-party computation specified by the chosen VDAF [VDAF] in order to prepare a report for aggregation. At the beginning of this computation, each Aggregator is in possession of an input share uploaded by the Client. At the end of the computation, each Aggregator is in possession of either an "output share" that is ready to be aggregated or an indication that a valid output share could not be computed.¶
To facilitate this computation, the input shares generated by the Client include information used by the Aggregators during aggregation in order to validate their corresponding output shares. For example, Prio3 includes a distributed zero-knowledge proof of the input's validity [BBCGGI19] which the Aggregators can jointly verify; if verification fails, they reject the report. The Aggregators learn nothing about an individual report other than that it is valid.¶
The specific properties attested to in the proof vary depending on the measurement being taken. For instance, to measure the time the user took performing a given task the proof might demonstrate that the value reported was within a certain range (e.g., 0-60 seconds). By contrast, to report which of a set of N options the user selected, the report might contain N integers and the proof would demonstrate that N-1 were 0 and the other was 1.¶
It is important to recognize that "validity" is distinct from "correctness". For instance, the user might have spent 30s on a task but the Client might report 60s. This is a problem with any measurement system and DAP does not attempt to address it; it merely ensures that the data is within acceptable limits, so the Client could not report 10^6s or -20s.¶
Communications between DAP participants are carried over HTTPS [RFC9110]. HTTPS provides server authentication and confidentiality. Use of HTTPS is REQUIRED.¶
DAP is made up of several sub-protocols in which different subsets of the protocol's participants interact with each other.¶
In those cases where a channel between two participants is tunneled through another protocol participant, DAP mandates the use of public-key encryption using [HPKE] to ensure that only the intended recipient can see a message in the clear.¶
In other cases, DAP requires HTTPS client authentication. Any authentication scheme that is composable with HTTP is allowed; for example, a bearer token or mutual TLS could be used.¶
This flexibility allows organizations deploying DAP to use existing well-known HTTP authentication mechanisms that they already support. Discovering what authentication mechanisms are supported by a DAP participant is outside of this document's scope.¶
Errors can be reported in DAP both at the HTTP layer and within problem documents whose error types are defined in Section 8. DAP servers can return responses with an HTTP error response code (4XX or 5XX). For example, if the Client submits a request using a method not allowed in this document, then the server MAY return HTTP status code 405 Method Not Allowed.¶
When the server responds with an error status, it SHOULD provide additional information using a problem document [RFC7807]. To facilitate automatic response to errors, this document defines the following standard tokens for use in the "type" field (within the DAP URN namespace "urn:ietf:params:ppm:dap:error:"):¶
Type | Description |
---|---|
unrecognizedMessage | The message type for a response was incorrect or the payload was malformed. |
unrecognizedTask | An endpoint received a message with an unknown task ID. |
unrecognizedAggregationJob | An endpoint received a message with an unknown aggregation job ID. |
outdatedConfig | The message was generated using an outdated configuration. |
reportRejected | Report could not be processed for an unspecified reason. |
reportTooEarly | Report could not be processed because its timestamp is too far in the future. |
batchInvalid | The batch boundary check for the Collector's query failed. |
invalidBatchSize | The number of reports in the batch is invalid. |
batchQueriedTooManyTimes | The maximum number of batch queries has been exceeded for one or more reports included in the batch. |
batchMismatch | Aggregators disagree on the report shares that were aggregated in a batch. |
unauthorizedRequest | Authentication of an HTTP request failed (see Section 3.1). |
missingTaskID | HPKE configuration was requested without specifying a task ID. |
queryMismatch | Query type indicated by a message does not match the task's query type. |
roundMismatch | The Aggregators disagree on the current round of the VDAF preparation protocol. |
This list is not exhaustive. The server MAY return errors set to a URI other than those defined above. Servers MUST NOT use the DAP URN namespace for errors not listed in the appropriate IANA registry (see Section 8.4). The "detail" member of the Problem Details document includes additional diagnostic information.¶
When the task ID is known (see Section 4.2), the problem document SHOULD include an additional "taskid" member containing the ID encoded in Base 64 using the URL and filename safe alphabet with no padding defined in Sections 5 and 3.2 of [RFC4648].¶
In the remainder of this document, the tokens in the table above are used to refer to error types, rather than the full URNs. For example, an "error of type 'unrecognizedMessage'" refers to an error document with "type" value "urn:ietf:params:ppm:dap:error:unrecognizedMessage".¶
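As a hypothetical illustration (the values shown are placeholders, not normative), a problem document for an unknown task ID might look like the following, shown here as a Python literal:¶
# Hypothetical problem document [RFC7807] for an unrecognizedTask error.
problem_document = {
    "type": "urn:ietf:params:ppm:dap:error:unrecognizedTask",
    "title": "An endpoint received a message with an unknown task ID.",
    "taskid": "<unpadded base64url task ID>",  # placeholder value
}¶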
This document uses the verbs "abort" and "alert with [some error message]" to describe how protocol participants react to various error conditions. This implies HTTP status code 400 Bad Request unless explicitly specified otherwise.¶
DAP has three major interactions which need to be defined: uploading reports from the Client to the Aggregators (Section 4.3), running aggregation jobs to verify and aggregate reports (Section 4.4), and collecting aggregate results (Section 4.5).¶
Each of these interactions is defined in terms of "resources". In this section we define these resources and the messages used to act on them.¶
A resource's path is resolved relative to a server's endpoint to construct a resource URI. Resource paths are specified as templates like:¶
{role}/resource_type/{resource-id}¶
{role} is one of the API endpoints in the task's aggregator_endpoints (see Section 4.2). The remainder of the path is resolved relative to the endpoint.¶
DAP resource identifiers are opaque byte strings, so any occurrence of {resource-id} in a URL template (e.g., {task-id} or {report-id}) MUST be expanded to the URL-safe, unpadded Base 64 representation of the corresponding resource identifier, as specified in Sections 5 and 3.2 of [RFC4648].¶
The following are some basic type definitions used in other messages:¶
/* ASCII encoded URL. e.g., "https://example.com" */
opaque Url<1..2^16-1>;

uint64 Duration; /* Number of seconds elapsed between two instants */

uint64 Time; /* seconds elapsed since start of UNIX epoch */

/* An interval of time of length duration, where start is included
   and (start + duration) is excluded. */
struct {
  Time start;
  Duration duration;
} Interval;

/* An ID used to uniquely identify a report in the context of a DAP task. */
opaque ReportID[16];

/* The various roles in the DAP protocol. */
enum {
  collector(0),
  client(1),
  leader(2),
  helper(3),
  (255)
} Role;

/* Identifier for a server's HPKE configuration */
uint8 HpkeConfigId;

/* An HPKE ciphertext. */
struct {
  HpkeConfigId config_id;    /* config ID */
  opaque enc<1..2^16-1>;     /* encapsulated HPKE key */
  opaque payload<1..2^32-1>; /* ciphertext */
} HpkeCiphertext;

/* Represent a zero-length byte string. */
struct {} Empty;¶
DAP uses the 16-byte ReportID as the nonce parameter for the VDAF measurement_to_input_shares and prep_init methods (see [VDAF], Section 5). Thus for a VDAF to be compatible with DAP, it MUST specify a NONCE_SIZE of 16 bytes.¶
Aggregated results are computed based on sets of reports, called "batches". The Collector influences which reports are used in a batch via a "query." The Aggregators use this query to carry out the aggregation flow and produce aggregate shares encrypted to the Collector.¶
This document defines the following query types:¶
enum {
  reserved(0), /* Reserved for testing purposes */
  time_interval(1),
  fixed_size(2),
  (255)
} QueryType;¶
The time_interval query type is described in Section 4.1.1; the fixed_size query type is described in Section 4.1.2. Future specifications may introduce new query types as needed (see Section 8.2). A query includes parameters used by the Aggregators to select a batch of reports specific to the given query type. A query is defined as follows:¶
opaque BatchID[32];

enum {
  by_batch_id(0),
  current_batch(1),
} FixedSizeQueryType;

struct {
  FixedSizeQueryType query_type;
  select (query_type) {
    case by_batch_id: BatchID batch_id;
    case current_batch: Empty;
  }
} FixedSizeQuery;

struct {
  QueryType query_type;
  select (Query.query_type) {
    case time_interval: Interval batch_interval;
    case fixed_size: FixedSizeQuery fixed_size_query;
  }
} Query;¶
The parameters pertaining to each query type are described in one of the subsections below. The query is issued in-band as part of the collect sub-protocol (Section 4.5). Its content is determined by the "query type", which in turn is encoded by the "query configuration" configured out-of-band. All query types have the following configuration parameters in common:¶
min_batch_size - The smallest number of reports the batch is allowed to include. In a sense, this parameter controls the degree of privacy that will be obtained: the larger the minimum batch size, the higher the degree of privacy. However, this ultimately depends on the application and the nature of the measurements and aggregation function.¶
time_precision - Clients use this value to truncate their report timestamps; see Section 4.3. Additional semantics may apply, depending on the query type. (See Section 4.5.6 for details.)¶
The parameters pertaining to specific query types are described in the relevant subsection below.¶
The first query type, time_interval, is designed to support applications in which reports are collected over a long period of time. The Collector specifies a "batch interval" that determines the time range for reports included in the batch. For each report in the batch, the time at which that report was generated (see Section 4.3) MUST fall within the batch interval specified by the Collector.¶
Typically the Collector issues queries for which the batch intervals are continuous, monotonically increasing, and have the same duration. For example, the sequence of batch intervals (1659544000, 1000), (1659545000, 1000), (1659546000, 1000), (1659547000, 1000) satisfies these conditions. (The first element of the pair denotes the start of the batch interval and the second denotes the duration.) Of course, there are cases in which the Collector may need to issue queries out-of-order. For example, a previous batch might need to be queried again with a different aggregation parameter (e.g., for Poplar1). In addition, the Collector may need to vary the duration to adjust to changing report upload rates.¶
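The interval semantics defined above (start included, start plus duration excluded) make the batch-membership check straightforward; a small Python sketch:¶
def in_batch_interval(report_time: int, start: int, duration: int) -> bool:
    # Interval semantics: start is included, start + duration is excluded.
    return start <= report_time < start + duration

# A report timestamped 1659544500 falls in batch interval
# (1659544000, 1000) but not in (1659545000, 1000).¶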
The fixed_size query type is used to support applications in which the Collector needs the ability to strictly control the sample size. This is particularly important for controlling the amount of noise added to reports by Clients (or added to aggregate shares by Aggregators) in order to achieve differential privacy.¶
For this query type, the Aggregators group reports into arbitrary batches such that each batch has roughly the same number of reports. These batches are identified by opaque "batch IDs", allocated in an arbitrary fashion by the Leader.¶
To get the aggregate of a batch, the Collector issues a query specifying the batch ID of interest (see Section 4.1). The Collector may not know which batch ID it is interested in; in this case, it can also issue a query of type current_batch, which allows the Leader to select a recent batch to aggregate. The Leader SHOULD select a batch which has not yet begun collection.¶
In addition to the minimum batch size common to all query types, the configuration includes a parameter max_batch_size that determines the maximum number of reports per batch.¶
Implementation note: The goal for the Aggregators is to aggregate precisely min_batch_size reports per batch. Doing so, however, may be challenging for Leader deployments in which multiple, independent nodes running the aggregate sub-protocol (see Section 4.4) need to be coordinated. The maximum batch size is intended to allow room for error. Typically the difference between the minimum and maximum batch size will be a small fraction of the target batch size for each batch.¶
[OPEN ISSUE: It may be feasible to require a fixed batch size, i.e., min_batch_size == max_batch_size. We should know better once we've had some implementation/deployment experience.]¶
Prior to the start of execution of the protocol, each participant must agree on the configuration for each task. A task is uniquely identified by its task ID:¶
opaque TaskID[32];¶
A TaskID is a globally unique sequence of bytes. It is RECOMMENDED that this be set to a random string output by a cryptographically secure pseudorandom number generator. Each task has the following parameters associated with it:¶
aggregator_endpoints: A list of URLs relative to which each Aggregator's API endpoints can be found. Each endpoint's list MUST be in the same order. The Leader's endpoint MUST be the first in the list. The order of the encrypted_input_shares in a Report (see Section 4.3) MUST be the same as the order in which Aggregators appear in this list.¶
max_batch_query_count: The maximum number of times a batch of reports may be queried by the Collector.¶
task_expiration: The time up to which Clients are expected to upload to this task. The task is considered completed after this time. Aggregators MAY reject reports that have timestamps later than task_expiration.¶
In addition, in order to facilitate the aggregation and collect protocols, each of the Aggregators is configured with the following parameters:¶
collector_hpke_config: The [HPKE] configuration of the Collector (described in Section 4.3.1); see Section 6 for information about the HPKE configuration algorithms.¶
vdaf_verify_key: The VDAF verification key shared by the Aggregators. This key is used in the aggregation sub-protocol (Section 4.4). The security requirements are described in Section 7.7.¶
Finally, the Collector is configured with the HPKE secret key corresponding to collector_hpke_config.¶
Clients periodically upload reports to the Leader, which then distributes the individual report shares to each Helper.¶
Before the Client can upload its report to the Leader, it must know the HPKE configuration of each Aggregator. See Section 6 for information on HPKE algorithm choices.¶
Clients retrieve the HPKE configuration from each Aggregator by sending an HTTP GET request to {aggregator}/hpke_config. Clients MAY specify a query parameter task_id whose value is the ID of the task for which they want the HPKE configuration. If the Aggregator does not recognize the task ID, then it MUST abort with error unrecognizedTask.¶
An Aggregator is free to use different HPKE configurations for each task with which it is configured. If the task ID is missing from the Client's request, the Aggregator MAY abort with an error of type missingTaskID, in which case the Client SHOULD retry the request with a well-formed task ID included.¶
An Aggregator responds to well-formed requests with HTTP status code 200 OK and an HpkeConfigList value. The HpkeConfigList structure contains one or more HpkeConfig structures in decreasing order of preference. This allows an Aggregator to support multiple HPKE configurations simultaneously.¶
[TODO: Allow aggregators to return HTTP status code 403 Forbidden in deployments that use authentication to avoid leaking information about which tasks exist.]¶
HpkeConfig HpkeConfigList<1..2^16-1>;

struct {
  HpkeConfigId id;
  HpkeKemId kem_id;
  HpkeKdfId kdf_id;
  HpkeAeadId aead_id;
  HpkePublicKey public_key;
} HpkeConfig;

opaque HpkePublicKey<1..2^16-1>;
uint16 HpkeAeadId; /* Defined in [HPKE] */
uint16 HpkeKemId;  /* Defined in [HPKE] */
uint16 HpkeKdfId;  /* Defined in [HPKE] */¶
[OPEN ISSUE: Decide whether to expand the width of the id.]¶
Aggregators SHOULD allocate distinct id values for each HpkeConfig in a HpkeConfigList. The RECOMMENDED strategy for generating these values is via rejection sampling, i.e., to randomly select an id value repeatedly until it does not match any known HpkeConfig.¶
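A short Python sketch of this rejection-sampling strategy:¶
import secrets

def allocate_config_id(known_ids: set[int]) -> int:
    """Draw a random one-byte id until it collides with no known id."""
    while True:
        candidate = secrets.randbelow(256)  # HpkeConfigId is a uint8
        if candidate not in known_ids:
            return candidate¶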
The Client MUST abort if any of the following happen for any HPKE config request:¶
Aggregators SHOULD use HTTP caching to permit client-side caching of this resource [RFC5861]. Aggregators SHOULD favor long cache lifetimes to avoid frequent cache revalidation, e.g., on the order of days. Aggregators can control this cached lifetime with the Cache-Control header, as follows:¶
Cache-Control: max-age=86400¶
Clients SHOULD follow the usual HTTP caching [RFC9111] semantics for HPKE configurations.¶
Note: Long cache lifetimes may result in clients using stale HPKE configurations; Aggregators SHOULD continue to accept reports with old keys for at least twice the cache lifetime in order to avoid rejecting reports.¶
Clients upload reports by using an HTTP PUT to {leader}/tasks/{task-id}/reports, where {leader} is the first entry in the task's Aggregator endpoints.¶
The payload is structured as follows:¶
struct {
  ReportID report_id;
  Time time;
} ReportMetadata;

struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext encrypted_input_shares<1..2^32-1>;
} Report;¶
report_metadata is public metadata describing the report.¶
report_id is used by the Aggregators to ensure the report appears in at most one batch (see Section 4.4.1.4). The Client MUST generate this by generating 16 random bytes using a cryptographically secure random number generator.¶
time is the time at which the report was generated. The Client SHOULD round this value down to the nearest multiple of the task's time_precision in order to ensure that the timestamp cannot be used to link a report back to the Client that generated it.¶
public_share is the public share output by the VDAF sharding algorithm. Note that the public share might be empty, depending on the VDAF.¶
encrypted_input_shares is the sequence of input shares encrypted to each of the Aggregators.¶
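A sketch of how a Client might populate ReportMetadata; the time_precision value is an assumed example task parameter:¶
import secrets
import time

time_precision = 3600  # assumed example: one-hour precision

report_id = secrets.token_bytes(16)  # random 16-byte ReportID
now = int(time.time())
report_time = now - (now % time_precision)  # round down to time_precision¶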
To generate a report, the Client begins by sharding its measurement into input shares and the public share using the VDAF's sharding algorithm (Section 5.1 of [VDAF]), using the report ID as the nonce:¶
(public_share, input_shares) = VDAF.measurement_to_input_shares(
    measurement, /* plaintext measurement */
    report_id,   /* nonce */
    rand,        /* randomness for sharding algorithm */
)¶
The last input comprises the randomness consumed by the sharding algorithm. The sharding randomness is a random byte string of length specified by the VDAF. The Client MUST generate this using a cryptographically secure random number generator.¶
The Client then wraps each input share in the following structure:¶
struct {
  Extension extensions<0..2^16-1>;
  opaque payload<0..2^32-1>;
} PlaintextInputShare;¶
Field extensions is set to the list of extensions intended to be consumed by the given Aggregator. (See Section 4.3.3.) Field payload is set to the Aggregator's input share output by the VDAF sharding algorithm.¶
Next, the Client encrypts each PlaintextInputShare plaintext_input_share as follows:¶
enc, payload = SealBase(pk,
    "dap-04 input share" || 0x01 || server_role,
    input_share_aad, plaintext_input_share)¶
where pk is the Aggregator's public key; server_role is the Role of the intended recipient (0x02 for the Leader and 0x03 for the Helper), plaintext_input_share is the Aggregator's PlaintextInputShare, and input_share_aad is an encoded message of type InputShareAad defined below, constructed from the same values as the corresponding fields in the report. The SealBase() function is as specified in [HPKE], Section 6.1 for the ciphersuite indicated by the HPKE configuration.¶
struct {
  TaskID task_id;
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
} InputShareAad;¶
The order in which the encrypted input shares appear MUST match the order of the task's aggregator_endpoints. That is, the first share should be the Leader's, the second share should be for the first Helper, and so on.¶
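For concreteness, a small Python sketch of the HPKE application info string used above. The 0x01 byte matches the client(1) value of the Role enum defined earlier, which we read here as the sender's role; this reading is an assumption:¶
def input_share_info(server_role: int) -> bytes:
    # "dap-04 input share" || 0x01 (client role, assumed) || server_role
    return b"dap-04 input share" + bytes([0x01, server_role])

leader_info = input_share_info(0x02)  # Leader
helper_info = input_share_info(0x03)  # Helper¶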
The Leader responds to well-formed requests with HTTP status code 201 Created. Malformed requests are handled as described in Section 3.2. Clients SHOULD NOT upload the same measurement value in more than one report if the Leader responds with HTTP status code 201 Created.¶
If the Leader does not recognize the task ID, then it MUST abort with error unrecognizedTask.¶
The Leader responds to requests whose Leader encrypted input share uses an out-of-date or unknown HpkeConfig.id value, indicated by HpkeCiphertext.config_id, with an error of type 'outdatedConfig'. If the Leader supports multiple HPKE configurations, it can use trial decryption with each configuration to determine whether a request matches a known HPKE configuration. When the Client receives an 'outdatedConfig' error, it SHOULD invalidate any cached HpkeConfigList and retry with a freshly generated Report. If this retried upload does not succeed, the Client SHOULD abort and discontinue retrying.¶
If a report's ID matches that of a previously uploaded report, the Leader MUST ignore it. In addition, it MAY alert the Client with error reportRejected. See the implementation note in Section 4.4.1.4.¶
The Leader MUST ignore any report pertaining to a batch that has already been collected (see Section 4.4.1.4 for details); otherwise, comparing the aggregate result to the previous aggregate result may result in a privacy violation. Note that this is enforced by all Aggregators, not just the Leader. The Leader MAY also abort the upload protocol and alert the Client with error reportRejected.¶
The Leader MAY ignore any report whose timestamp is past the task's task_expiration. When it does so, it SHOULD also abort the upload protocol and alert the Client with error reportRejected. The Client MAY choose to opt out of the task if its own clock has passed task_expiration.¶
The Leader may need to buffer reports while waiting to aggregate them (e.g., while waiting for an aggregation parameter from the Collector; see Section 4.5). The Leader SHOULD NOT accept reports whose timestamps are too far in the future. Implementors MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. If the Leader rejects a report for this reason, it SHOULD abort the upload protocol and alert the Client with error reportTooEarly. In this situation, the Client MAY re-upload the report later on.¶
If the Leader's ReportShare contains an unrecognized extension, or if two extensions have the same ExtensionType, then the Leader MAY abort the upload request with error "unrecognizedMessage". Note that this behavior is not mandatory because it requires the Leader to decrypt its ReportShare.¶
Each PlaintextInputShare carries a list of extensions that Clients use to convey additional information to the Aggregator. Some extensions might be intended for all Aggregators; others may only be intended for a specific Aggregator. (For example, a DAP deployment might use some out-of-band mechanism for an Aggregator to verify that Reports come from authenticated Clients. It will likely be useful to bind the extension to the input share via HPKE encryption.)¶
Each extension is a tag-length encoded value of the following form:¶
struct {
  ExtensionType extension_type;
  opaque extension_data<0..2^16-1>;
} Extension;

enum {
  TBD(0),
  (65535)
} ExtensionType;¶
Field "extension_type" indicates the type of extension, and "extension_data" contains information specific to the extension.¶
Extensions are mandatory-to-implement: If an Aggregator receives a Report containing an extension it does not recognize, then it MUST reject the Report. (See Section 4.4.1.4 for details.)¶
The contents of each input share must be kept confidential from everyone but the Client and the Aggregator it is being sent to. In addition, Clients must be able to authenticate the Aggregator they upload to.¶
HTTPS provides confidentiality between the DAP Client and the Leader, but this is not sufficient since the Helper's report shares are relayed through the Leader. Confidentiality of report shares is achieved by encrypting each report share to a public key held by the respective Aggregator using [HPKE]. Clients fetch the public keys from each Aggregator over HTTPS, allowing them to authenticate the server.¶
Aggregators MAY require clients to authenticate when uploading reports. This is an effective mitigation against Sybil [Dou02] attacks in deployments where it is practical for each Client to have an identity provisioned (e.g., a user logged into an online service or a hardware device programmed with an identity). If it is used, Client authentication MUST use a scheme that meets the requirements in Section 3.1.¶
In some deployments, it will not be practical to require Clients to authenticate (e.g., a widely distributed application that does not require its users to login to any service), so Client authentication is not mandatory in DAP.¶
[[OPEN ISSUE: deployments that don't have client auth will need to do something about Sybil attacks. Is there any useful guidance or SHOULD we can provide? Sort of relevant: issue #89]]¶
Once a set of Clients have uploaded their reports to the Leader, the Leader can begin the process of verifying and aggregating them with the Helpers. To enable the system to handle very large batches of reports, this process can be parallelized across smaller sets of reports. Verification of a set of reports is referred to as an "aggregation job". Each aggregation job is associated with exactly one DAP task, and a DAP task can have many aggregation jobs. Each job is associated with an ID that is unique within the context of a DAP task in order to distinguish different jobs from one another. Each Aggregator uses this ID as an index into per-job storage, e.g., to keep track of report shares that belong to a given aggregation job.¶
To run an aggregation job, the Leader sends a request to each Helper containing the report shares in the job. Each Helper then processes them (verifying the proofs and incorporating their values into the ongoing aggregate) and responds to the Leader.¶
The exact structure of the aggregation job flow depends on the VDAF; in particular, the VDAF determines how many rounds of interaction are required during preparation.¶
Note that it is possible to aggregate reports from one batch while reports from the next batch are coming in. This is because each report is validated independently.¶
This process is illustrated below in Figure 2. In this example, the batch size is 20, but the Leader opts to process the reports in sub-batches of 10. Each sub-batch takes two round-trips to process.¶
The aggregation flow can be thought of as having three phases for transforming each valid input report share into an output share: initialization (Section 4.4.1), continuation (Section 4.4.2), and completion.¶
The Leader begins an aggregation job by choosing a set of candidate reports that pertain to the same DAP task and a unique job ID. The job ID is a 16-byte value, structured as follows:¶
opaque AggregationJobID[16];¶
The Leader can run this process for many sets of candidate reports in parallel as needed. After choosing a set of candidates, the Leader begins aggregation by splitting each report into report shares, one for each Aggregator. The Leader and Helpers then run the aggregate initialization flow to accomplish two tasks: recovering and validating each report share, and initializing the VDAF preparation process for each valid report share.¶
An invalid report share is marked with one of the following errors:¶
enum {
  batch_collected(0),
  report_replayed(1),
  report_dropped(2),
  hpke_unknown_config_id(3),
  hpke_decrypt_error(4),
  vdaf_prep_error(5),
  batch_saturated(6),
  task_expired(7),
  unrecognized_message(8),
  report_too_early(9),
  (255)
} ReportShareError;¶
The Leader and Helper initialization behavior is detailed below.¶
The Leader begins the aggregate initialization phase with the set of candidate report shares as follows:¶
If any step invalidates the report share, the Leader removes the report share from the set of candidate reports. Once the Leader has initialized this state for all valid candidate report shares, it creates an AggregationJobInitReq message for each Helper to initialize the preparation of this candidate set. The AggregationJobInitReq message is structured as follows:¶
struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext encrypted_input_share;
} ReportShare;

struct {
  QueryType query_type;
  select (PartialBatchSelector.query_type) {
    case time_interval: Empty;
    case fixed_size: BatchID batch_id;
  };
} PartialBatchSelector;

struct {
  opaque agg_param<0..2^32-1>;
  PartialBatchSelector part_batch_selector;
  ReportShare report_shares<1..2^32-1>;
} AggregationJobInitReq;¶
[[OPEN ISSUE: Consider sending report shares separately (in parallel) to the aggregate instructions. Right now, aggregation parameters and the corresponding report shares are sent at the same time, but this may not be strictly necessary.]]¶
This message consists of:¶
agg_param: The opaque, VDAF-specific aggregation parameter provided during the collection flow (Section 4.5).¶
part_batch_selector: The "partial batch selector" used by the Aggregators to determine how to aggregate each report:¶
For fixed_size tasks, the Leader specifies a "batch ID" that determines the batch to which each report for this aggregation job belongs.¶
[OPEN ISSUE: For fixed_size tasks, the Leader is in complete control over which batch a report is included in. For time_interval tasks, the Client has some control, since the timestamp determines which batch window it falls in. Is this desirable from a privacy perspective? If not, it might be simpler to drop the timestamp altogether and have the agg init request specify the batch window instead.]¶
The indicated query type MUST match the task's query type. Otherwise, the Helper MUST abort with error "queryMismatch".¶
This field is called the "partial" batch selector because, depending on the query type, it may not determine a batch. In particular, if the query type is time_interval, the batch is not determined until the Collector's query is issued (see Section 4.1).¶
report_shares: The sequence of report shares to aggregate. The encrypted_input_share field of the report share is the HpkeCiphertext whose index in Report.encrypted_input_shares is equal to the index of the Aggregator in the task's aggregator_endpoints to which the AggregationJobInitReq is being sent.¶
Let {aggregator} denote the Helper's API endpoint. The Leader sends a PUT request to {aggregator}/tasks/{task-id}/aggregation_jobs/{aggregation-job-id} with its AggregationJobInitReq message as the payload. The media type is "application/dap-aggregation-job-init-req". The Leader's aggregation job is now in round 0.¶
The Helper's response will be an AggregationJobResp message (see Section 4.4.1.2), which the Leader validates according to the criteria in Section 4.4.1.6. If the message is valid, the Leader moves to the aggregation job continuation phase with the enclosed prepare steps, as described in Section 4.4.2. Otherwise, the Leader should abandon the aggregation job entirely.¶
Each Helper begins their portion of the aggregate initialization when they receive an AggregationJobInitReq message from the Leader. For each ReportShare conveyed by this message, the Helper attempts to initialize VDAF preparation (see Section 5.1 of [VDAF]) just as the Leader does. If successful, it includes its prepare message in its response that the Leader will use to continue the process.¶
To begin this process, the Helper first checks if it recognizes the task ID. If not, then it MUST abort with error unrecognizedTask. Then the Helper checks that the report IDs in AggregationJobInitReq.report_shares are all distinct. If two ReportShare values have the same report ID, then the Helper MUST abort with error unrecognizedMessage. If this check succeeds, the Helper then attempts to recover each input share in AggregationJobInitReq.report_shares as follows:¶
Once the Helper has processed each report share in AggregationJobInitReq.report_shares, the Helper creates an AggregationJobResp message to complete its initialization. This message is structured as follows:¶
enum {
  continued(0),
  finished(1),
  failed(2),
  (255)
} PrepareStepState;

struct {
  ReportID report_id;
  PrepareStepState prepare_step_state;
  select (PrepareStep.prepare_step_state) {
    case continued: opaque prep_msg<0..2^32-1>; /* VDAF preparation message */
    case finished: Empty;
    case failed: ReportShareError;
  };
} PrepareStep;

struct {
  PrepareStep prepare_steps<1..2^32-1>;
} AggregationJobResp;¶
The message is a sequence of PrepareStep values, the order of which matches that of the ReportShare values in AggregationJobInitReq.report_shares. Each report that was marked as invalid is assigned the PrepareStepState failed. Otherwise, the PrepareStep is either marked as continued with the output prep_msg, or is marked as finished if the VDAF preparation process is finished for the report share. The Helper's aggregation job is now in round 0.¶
On success, the Helper responds to the Leader with HTTP status code 201 Created and a body consisting of the AggregationJobResp, with media type "application/dap-aggregation-job-resp".¶
During the aggregation job initialization (Section 4.4.1.1) or continuation (Section 4.4.2) phases, the Leader will receive an AggregationJobResp message from the Helper, which needs to be validated before the Leader can move to the next phase of the aggregation protocol.¶
An AggregationJobResp is valid only if it satisfies the following requirement: prepare_steps MUST include exactly the same report IDs in the same order as either the report_shares in the Leader's AggregationJobInitReq (if this is the first round of continuation) or the prepare_steps in the Leader's AggregationJobContinueReq (if this is a subsequent round).¶
[[OPEN ISSUE: consider relaxing this ordering constraint. See issue#217.]]¶
In the continuation phase, the Leader drives the VDAF preparation of each share in the candidate report set until the underlying VDAF moves into a terminal state, yielding an output share for all Aggregators or an error. This phase may involve multiple rounds of interaction depending on the underlying VDAF. Each round trip is initiated by the Leader.¶
The Leader begins each round of continuation for a report share based on its locally computed prepare message and the previous PrepareStep from the Helper. If the PrepareStep is of type failed, then the Leader acts based on the value of the ReportShareError: if the error is report_too_early, then the Leader MAY try to re-send the report in a later AggregationJobInitReq.¶
If the type is finished and the Leader's preparation of this report share is also finished, then the report share is aggregated and can now be collected (see Section 4.5). If the Leader is not finished, then the report cannot be processed further and MUST be removed from the candidate set.¶
If the Helper's PrepareStep is of type continued, then the Leader proceeds as follows.¶
Let leader_prep_share denote the Leader's prepare message-share and helper_prep_share denote the Helper's. The Leader computes the next state transition as follows:¶
prep_msg = VDAF.prep_shares_to_prep(agg_param,
    [leader_prep_share, helper_prep_share])
out = VDAF.prep_next(prep_state, prep_msg)¶
where [leader_prep_share, helper_prep_share] is a vector of two elements. If either of these operations fails, then the Leader marks the report as invalid with error vdaf_prep_error. Otherwise it interprets out as follows: If this is the last round of the VDAF, then out is the Leader's output share, in which case it stores the output share for further processing as described in Section 4.5. Otherwise, out is the pair (next_prep_state, next_prep_share), where next_prep_state is its updated state and next_prep_share is its next preparation message-share (which will be leader_prep_share in the next round of continuation). In the latter case, the Leader sets prep_state to next_prep_state.¶
The Leader now advances its aggregation job to the next round (round 1 if this is the first continuation after initialization) and then instructs the Helper to advance the aggregation job to the round the Leader has just reached by sending the new prep_msg message to the Helper in a POST request to the aggregation job URI used during initialization (see Section 4.4.1.1). The body of the request is an AggregationJobContinueReq:¶
struct {
  uint16 round;
  PrepareStep prepare_steps<1..2^32-1>;
} AggregationJobContinueReq;¶
The round field is the round of VDAF preparation that the Leader just reached and wants the Helper to advance to.¶
The prepare_steps field MUST be a sequence of PrepareSteps in the continued state containing the corresponding inbound prepare message. The media type is set to "application/dap-aggregation-job-continue-req".¶
The Helper's response will be an AggregationJobResp message (see Section 4.4.1.2), which the Leader validates according to the criteria in Section 4.4.1.6. If the message is valid, the Leader moves to the next round of continuation with the enclosed prepare steps. Otherwise, the Leader should abandon the aggregation job entirely.¶
If the Helper does not recognize the task ID, then it MUST abort with error unrecognizedTask.¶
Otherwise, the Helper continues with preparation for a report share by combining the previous round's prepare message (carried by the AggregationJobContinueReq) and its current preparation state (prep_state). This step yields one of three outputs: an error, the Helper's output share, or the pair (next_prep_state, next_prep_share).¶
To carry out this step, for each PrepareStep in AggregationJobContinueReq.prepare_steps received from the Leader, the Helper performs the following check to determine if it should continue preparing the report share:¶
If the report was marked as failed, then mark the report as failed and reply with a failed PrepareStep to the Leader.¶
If the report was marked as finished, then mark the report as finished and reply with a finished PrepareStep to the Leader. The Helper then stores the output share and awaits collection; see Section 4.5.¶
Otherwise, preparation continues. The Helper MUST check its current round against the Leader's AggregationJobContinueReq.round value. If the Leader is one round ahead of the Helper, then the Helper combines the Leader's prepare message and the Helper's current preparation state as follows:¶
out = VDAF.prep_next(prep_state, prep_msg)¶
where prep_msg is the previous VDAF prepare message sent by the Leader and prep_state is the Helper's current preparation state. This step yields one of three outputs:¶
An error, in which case the Helper marks the report as failed and replies to the Leader with a PrepareStep in the failed state.¶
The Helper's output share, in which case the Helper stores the output share and replies to the Leader with a PrepareStep in the finished state.¶
The pair (next_prep_state, next_prep_share), in which case the Helper sets prep_state to next_prep_state and replies to the Leader with a PrepareStep in the continued state containing next_prep_share.¶
After stepping each state, the Helper advances its aggregation job to the Leader's AggregationJobContinueReq.round.¶
If the round in the Leader's request is 0, then the Helper MUST abort with an error of type unrecognizedMessage.¶
If the round in the Leader's request is equal to the Helper's current round (i.e., this is not the first time the Leader has sent this request), then the Helper responds with the current round's prepare message-shares. The Helper SHOULD verify that the contents of the AggregationJobContinueReq are identical to the previous message (see Section 4.4.2.3).¶
If the Leader's round is behind or more than one round ahead of the Helper's current round, then the Helper MUST abort with an error of type roundMismatch.¶
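Putting these round rules together, a Python sketch of the Helper's round check (the function and variable names here are assumptions, not part of the protocol):¶
def check_round(helper_round: int, req_round: int) -> str:
    # Round rules from this section.
    if req_round == 0:
        return "abort: unrecognizedMessage"
    if req_round == helper_round:
        return "replay: respond with the current round's message-shares"
    if req_round == helper_round + 1:
        return "advance: run VDAF.prep_next and respond"
    return "abort: roundMismatch"¶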
If successful, the Helper responds to the Leader with HTTP status 200 OK, media type "application/dap-aggregation-job-resp", and a body consisting of an AggregationJobResp (see Section 4.4.1.2) compiled from the stepped PrepareSteps.¶
AggregationJobContinueReq messages contain a round field, allowing Aggregators to ensure that their peer is on an expected round of the VDAF preparation algorithm. In particular, the intent is to allow recovery from a scenario where the Helper successfully advances from round n to n+1, but its AggregationJobResp response to the Leader gets dropped due to something like a transient network failure. The Leader could then resend the request to have the Helper advance to round n+1, and the Helper should be able to retransmit the AggregationJobResp that was previously dropped. To make that kind of recovery possible, Aggregator implementations SHOULD checkpoint the most recent round's preparation state and messages to durable storage such that Leaders can re-construct continuation requests and Helpers can re-construct continuation responses as needed.¶
When implementing a round skew recovery strategy, Helpers SHOULD ensure that the Leader's AggregationJobContinueReq message did not change when it was re-sent (i.e., the two messages must contain the same set of report IDs and prepare messages). This prevents the Leader from re-winding an aggregation job and re-running a round with different parameters.¶
[[OPEN ISSUE: Allowing the Leader to "rewind" aggregation job state of the Helper may allow an attack on privacy. For instance, if the VDAF verification key changes, the preparation shares in the Helper's response would change even if the consistency check is made. Security analysis is required. See #401.]]¶
One way a Helper could address this would be to store a digest of the Leader's request, indexed by aggregation job ID and round, and refuse to service a request for a given aggregation job round unless it matches the previously seen request (if any).¶
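A Python sketch of that digest-based check (the storage layout and names are assumptions):¶
import hashlib

# Digest of each AggregationJobContinueReq, keyed by (job ID, round).
seen: dict[tuple[bytes, int], bytes] = {}

def accept_continue_req(job_id: bytes, round_num: int, body: bytes) -> bool:
    digest = hashlib.sha256(body).digest()
    previous = seen.setdefault((job_id, round_num), digest)
    return previous == digest  # False: the re-sent request changed¶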
Aggregate sub-protocol messages must be confidential and mutually authenticated.¶
The aggregate sub-protocol is driven by the Leader acting as an HTTPS client, making requests to the Helper's HTTPS server. HTTPS provides confidentiality and authenticates the Helper to the Leader.¶
Leaders MUST authenticate their requests to Helpers using a scheme that meets the requirements in Section 3.1.¶
In this phase, the Collector requests aggregate shares from each Aggregator and then locally combines them to yield a single aggregate result. In particular, the Collector issues a query to the Leader (Section 4.1), which the Aggregators use to select a batch of reports to aggregate. Each emits an aggregate share encrypted to the Collector so that it can decrypt and combine them to yield the aggregate result. This entire process is composed of two interactions: the collection flow between the Collector and the Leader (Section 4.5.1), and the aggregate-share request flow between the Leader and the Helper (Section 4.5.2).¶
Once complete, the Collector computes the final aggregate result as specified in Section 4.5.3.¶
This overall process is referred to as a "collection job".¶
First, the Collector chooses a collection job ID:¶
opaque CollectionJobID[16];¶
This ID MUST be unique within the context of the corresponding DAP task. It is RECOMMENDED that this be set to a random string output by a cryptographically secure pseudorandom number generator.¶
To initiate the collection job, the Collector issues a PUT request to {leader}/tasks/{task-id}/collection_jobs/{collection-job-id}. The body of the request is structured as follows:¶
[OPEN ISSUE: Decide if and how the Collector's request is authenticated. If not, then we need to ensure that collection job URIs are resistant to enumeration attacks.]¶
struct {
  Query query;
  opaque agg_param<0..2^32-1>; /* VDAF aggregation parameter */
} CollectionReq;¶
The named parameters are:¶
query: The Collector's query. The indicated query type MUST match the task's query type. Otherwise, the Leader MUST abort with error "queryMismatch".¶
agg_param: An aggregation parameter for the VDAF being executed. This is the same value as in AggregationJobInitReq (see Section 4.4.1.1).¶
Depending on the VDAF scheme and how the Leader is configured, the Leader and Helper may already have prepared a sufficient number of reports satisfying the query and be ready to return the aggregate shares right away. However, this is not always the case. In fact, for some VDAFs, it is not possible to begin running aggregation jobs (Section 4.4) until the Collector initiates a collection job. This is because, in general, the aggregation parameter is not known until this point. In certain situations it is possible to predict the aggregation parameter in advance. For example, for Prio3 the only valid aggregation parameter is the empty string. For these reasons, the collection job is handled asynchronously.¶
Upon receipt of a CollectionReq, the Leader begins by checking that it recognizes the task ID in the request path. If not, it MUST abort with error unrecognizedTask. Then, the Leader verifies that the request meets the requirements of the batch parameters using the procedure in Section 4.5.6. If so, it immediately responds with HTTP status 201.¶
The Leader then begins working with the Helper to aggregate the reports satisfying the query (or continues this process, depending on the VDAF) as described in Section 4.4.¶
After receiving the response to its CollectionReq, the Collector makes an HTTP POST request to the collection job URI to check on the status of the collection job and eventually obtain the result. If the collection job is not finished yet, the Leader responds with HTTP status 202 Accepted. The response MAY include a Retry-After header field to suggest a polling interval to the Collector.¶
The Leader obtains each Helper's aggregate share following the aggregate-share request flow described in Section 4.5.2. When all aggregate shares are successfully obtained, the Leader responds to subsequent HTTP POST requests to the collection job with HTTP status code 200 OK and a body consisting of a Collection:¶
struct {
  PartialBatchSelector part_batch_selector;
  uint64 report_count;
  Interval interval;
  HpkeCiphertext encrypted_agg_shares<1..2^32-1>;
} Collection;¶
This structure includes the following:¶
part_batch_selector: Information used to bind the aggregate result to the query. For fixed_size tasks, this includes the batch ID assigned to the batch by the Leader. The indicated query type MUST match the task's query type.¶
[OPEN ISSUE: What should the Collector do if the query type doesn't match?]¶
interval: The time interval covered by the batch, whose start and duration are each a multiple of the task's time_precision parameter. Note that in the case of a time_interval type query (see Section 4.1), this interval can be smaller than the one in the corresponding CollectionReq.query.¶
If obtaining aggregate shares fails, then the Leader responds to subsequent HTTP POST requests to the collection job with an HTTP error status and a problem document as described in Section 3.2.¶
The Leader MAY respond with HTTP status 204 No Content to requests to a collection job if the results have been deleted.¶
The Collector can send an HTTP DELETE request to the collection job, which indicates to the Leader that it can abandon the collection job and discard all state related to it.¶
The reason we use a POST instead of a GET to poll the state of a collection job is the fixed_size query type (see Section 4.1.2). Collectors may make a query against the current batch, and it is the Leader's responsibility to keep track of what batch is current for some task. Polling a collection job is the only point at which it is safe for the Leader to change its current batch, since it constitutes acknowledgement on the Collector's part that it received the response to some previous PUT request to the collection jobs resource.¶
This means that polling a collection job can have the side effect of changing the current batch in the Leader, and thus using a GET is inappropriate.¶
The Leader obtains each Helper's encrypted aggregate share before it completes a collection job. To do this, the Leader first computes a checksum over the set of output shares included in the batch. The checksum is computed by taking the SHA256 [SHS] hash of each report ID from the client reports included in the aggregation, then combining the hash values with a bitwise-XOR operation.¶
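For illustration, the checksum computation might be implemented as follows. This Python sketch assumes report IDs are provided as byte strings and is not normative; the all-zero initial value is simply the XOR identity.¶

import hashlib

def batch_checksum(report_ids: list[bytes]) -> bytes:
    """SHA256 each report ID, then combine the digests with bitwise XOR."""
    checksum = bytes(32)  # all-zero initial value (XOR identity)
    for report_id in report_ids:
        digest = hashlib.sha256(report_id).digest()
        checksum = bytes(a ^ b for a, b in zip(checksum, digest))
    return checksum¶

A Helper performs the same computation over the reports satisfying the query and compares the result, along with the report count, against the values in the AggregateShareReq, aborting with "batchMismatch" on disagreement (see below).¶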
Then, for each Aggregator endpoint {aggregator} in the parameters associated with CollectionReq.task_id (see Section 4.5) except its own, the Leader sends a POST request to {aggregator}/tasks/{task-id}/aggregate_shares with the following message:¶
struct {
  QueryType query_type;
  select (BatchSelector.query_type) {
    case time_interval: Interval batch_interval;
    case fixed_size: BatchID batch_id;
  };
} BatchSelector;

struct {
  BatchSelector batch_selector;
  opaque agg_param<0..2^32-1>;
  uint64 report_count;
  opaque checksum[32];
} AggregateShareReq;¶
The message contains the following parameters:¶
batch_selector: The "batch selector", which encodes parameters used to determine the batch being aggregated. The value depends on the query type for the task: for time_interval tasks, the batch interval; for fixed_size tasks, the batch ID.¶
The indicated query type MUST match the task's query type. Otherwise, the Helper MUST abort with "queryMismatch".¶
agg_param: The opaque aggregation parameter for the VDAF being executed. This value MUST match the AggregationJobInitReq message for each aggregation job used to compute the aggregate shares (see Section 4.4.1.1) and the aggregation parameter indicated by the Collector in the CollectionReq message (see Section 4.5.1).¶
report_count: The number of reports included in the batch.¶
checksum: The batch checksum.¶
To handle the Leader's request, the Helper first ensures that it recognizes the task ID in the request path. If not, it MUST abort with error unrecognizedTask. The Helper then verifies that the request meets the requirements for batch parameters following the procedure in Section 4.5.6.¶
Next, it computes a checksum based on the reports that satisfy the query, and checks that the report_count and checksum included in the request match its computed values. If not, then it MUST abort with an error of type "batchMismatch".¶
Next, it computes the aggregate share agg_share corresponding to the set of output shares, denoted out_shares, for the batch interval, as follows:¶
agg_share = VDAF.out_shares_to_agg_share(agg_param, out_shares)¶
Implementation note: For most VDAFs, it is possible to aggregate output shares as they arrive rather than wait until the batch is collected. To do so, however, it is necessary to enforce the batch parameters as described in Section 4.5.6 so that the Aggregator knows which aggregate share to update.¶
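A minimal sketch of this incremental strategy for an additive VDAF like Prio3, where an aggregate share is the element-wise sum of output shares. Both the batch bucketing (e.g., by a time_precision-aligned interval start) and the plain-integer arithmetic here are simplifications; real shares live in a finite field.¶

# bucket -> (running aggregate share, report count); bucketing scheme
# and integer arithmetic are illustrative simplifications.
running: dict[int, tuple[list[int], int]] = {}

def accumulate(bucket: int, out_share: list[int]) -> None:
    share, count = running.get(bucket, ([0] * len(out_share), 0))
    running[bucket] = ([a + b for a, b in zip(share, out_share)], count + 1)¶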
The Helper then encrypts agg_share under the Collector's HPKE public key as described in Section 4.5.4, yielding encrypted_agg_share. Encryption prevents the Leader from learning the actual result, as it only has its own aggregate share and cannot compute the Helper's.¶
The Helper responds to the Leader with HTTP status code 200 OK and a body consisting of an AggregateShare:¶
struct {
  HpkeCiphertext encrypted_aggregate_share;
} AggregateShare;¶
encrypted_aggregate_share.config_id is set to the Collector's HPKE config ID. encrypted_aggregate_share.enc is set to the encapsulated HPKE context enc computed above, and encrypted_aggregate_share.ciphertext is the ciphertext encrypted_agg_share computed above.¶
After receiving the Helper's response, the Leader uses the HpkeCiphertext to finalize a collection job (see Section 4.5.3).¶
Once an AggregateShareReq has been issued for the batch determined by a given query, it is an error for the Leader to issue any more aggregation jobs for additional reports that satisfy the query. These reports will be rejected by Helpers as described in Section 4.4.1.4.¶
Before completing the collection job, the Leader also computes its own aggregate share agg_share by aggregating all of the prepared output shares that fall within the batch interval. Finally, it encrypts its aggregate share under the Collector's HPKE public key as described in Section 4.5.4.¶
Once the Collector has received a collection job from the Leader, it can decrypt the aggregate shares and produce an aggregate result. The Collector decrypts each aggregate share as described in Section 4.5.4. Once the Collector successfully decrypts all aggregate shares, it unshards the aggregate shares into an aggregate result using the VDAF's agg_shares_to_result algorithm. In particular, let agg_shares denote the sequence of aggregate shares, ordered by Aggregator index, let report_count denote the report count sent by the Leader, and let agg_param be the opaque aggregation parameter. The final aggregate result is computed as follows:¶
agg_result = VDAF.agg_shares_to_result(agg_param, agg_shares, report_count)¶
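To make the unsharding step concrete, the toy sketch below shows what agg_shares_to_result amounts to for an additive VDAF over a prime field. The field modulus, the vector encoding, and the unused report_count parameter (which some VDAFs use, e.g. to compute an average) are all illustrative; real VDAFs define this algorithm in [VDAF].¶

FIELD_MODULUS = 2**31 - 1  # example prime, not a parameter of any real VDAF

def agg_shares_to_result(agg_shares: list[list[int]],
                         report_count: int) -> list[int]:
    """Element-wise sum of aggregate shares over the field (toy version).
    report_count is unused here but needed by some VDAFs."""
    result = [0] * len(agg_shares[0])
    for share in agg_shares:
        result = [(a + b) % FIELD_MODULUS for a, b in zip(result, share)]
    return result¶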
Collect sub-protocol messages must be confidential and mutually authenticated.¶
HTTPS provides confidentiality and authenticates the Leader to the Collector. Additionally, the Leader encrypts its aggregate share to a public key held by the Collector using [HPKE].¶
Collectors MUST authenticate their requests to Leaders using a scheme that meets the requirements in Section 3.1.¶
[[OPEN ISSUE: collector public key is currently in the task parameters, but this will have to change #102]]¶
The collector and helper never directly communicate with each other, but the helper does transmit an aggregate share to the collector through the leader, as detailed in Section 4.5.2. The aggregate share must be confidential from everyone but the helper and the collector.¶
Confidentiality is achieved by having the helper encrypt its aggregate share to a public key held by the collector using [HPKE].¶
There is no authentication between the collector and the helper. This allows the leader to:¶
These are attacks on robustness, which we already assume to hold only if both aggregators are honest, which puts these malicious-leader attacks out of scope (see Section 7).¶
[[OPEN ISSUE: Should we have authentication in either direction between the helper and the collector? #155]]¶
Before an Aggregator responds to a CollectionReq or AggregateShareReq, it must first check that the request does not violate the parameters associated with the DAP task. It does so as described here.¶
First the Aggregator checks that the batch respects any "boundaries" determined by the query type. These are described in the subsections below. If the boundary check fails, then the Aggregator MUST abort with an error of type "batchInvalid".¶
Next, the Aggregator checks that the batch contains a valid number of reports, as determined by the query type. If the size check fails, then the Aggregator MUST abort with an error of type "invalidBatchSize".¶
Next, the Aggregator checks that the batch has not been aggregated too many times. This is determined by the maximum number of times a batch can be queried, max_batch_query_count. If the query has been issued max_batch_query_count or more times, the Aggregator MUST abort with an error of type "batchQueriedTooManyTimes".¶
Finally, the Aggregator checks that the batch does not contain a report that was included in any previous batch. If this batch overlap check fails, then the Aggregator MUST abort with an error of type "batchOverlap". For time_interval tasks, it is sufficient (but not necessary) to check that the batch interval does not overlap with the batch interval of any previous query. If this batch interval check fails, then the Aggregator MAY abort with an error of type "batchOverlap".¶
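The sequence of checks can be sketched as follows, reusing the illustrative DapAbort exception from the earlier sketch. The attributes and helpers on task and batch are hypothetical stand-ins for the query-type-specific rules in the following subsections.¶

def validate_batch(task, batch, times_queried: int,
                   prev_report_ids: set) -> None:
    """Applies the Section 4.5.6 checks in order; raises DapAbort on failure."""
    if not batch.respects_boundaries(task):   # query-type boundaries (below)
        raise DapAbort("batchInvalid")
    if not batch.valid_size(task):            # min/max report count (below)
        raise DapAbort("invalidBatchSize")
    if times_queried >= task.max_batch_query_count:
        raise DapAbort("batchQueriedTooManyTimes")
    if any(rid in prev_report_ids for rid in batch.report_ids):
        raise DapAbort("batchOverlap")¶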
[[OPEN ISSUE: #195 tracks how we might relax this constraint to allow for more collect query flexibility. As of now, this is quite rigid and doesn't give the collector much room for mistakes.]]¶
The batch boundaries are determined by the time_precision field of the query configuration. For the batch_interval included with the query, the Aggregator checks that:¶
batch_interval.duration >= time_precision (this field determines, effectively, the minimum batch duration)¶
batch_interval.start and batch_interval.duration are both divisible by time_precision¶
These measures ensure that Aggregators can efficiently "pre-aggregate" output shares recovered during the aggregation sub-protocol.¶
The query configuration specifies the minimum batch size, min_batch_size. The Aggregator checks that len(X) >= min_batch_size, where X is the set of reports in the batch.¶
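Concretely, the boundary and size checks above for time_interval tasks amount to a few integer comparisons. This sketch treats timestamps and durations as integer seconds and is illustrative only.¶

def check_time_interval_batch(batch_interval_start: int,
                              batch_interval_duration: int,
                              time_precision: int,
                              report_count: int,
                              min_batch_size: int) -> bool:
    """Boundary and size checks for a time_interval batch (Section 4.5.6)."""
    aligned = (batch_interval_start % time_precision == 0
               and batch_interval_duration % time_precision == 0)
    long_enough = batch_interval_duration >= time_precision
    big_enough = report_count >= min_batch_size
    return aligned and long_enough and big_enough¶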
For fixed_size tasks, the batch boundaries are defined by opaque batch IDs. Thus the Aggregator needs to check that the query is associated with a known batch ID:¶
For a CollectionReq with a query of type by_batch_id, the Leader checks that the provided batch ID corresponds to a batch ID it returned in a previous Collection for the task.¶
For an AggregateShareReq, the Helper checks that the provided batch ID corresponds to a batch ID used in a previous AggregationJobInitReq for the task.¶
The query configuration specifies the minimum batch size, min_batch_size, and maximum batch size, max_batch_size. The Aggregator checks that len(X) >= min_batch_size and len(X) <= max_batch_size, where X is the set of reports in the batch.¶
The DAP protocol has inherent constraints derived from the tradeoff between privacy guarantees and computational complexity. These tradeoffs influence how applications may choose to utilize services implementing the specification.¶
The design in this document has different assumptions and requirements for different protocol participants, including clients, aggregators, and collectors. This section describes these capabilities in more detail.¶
Clients have limited capabilities and requirements. Their only inputs to the protocol are (1) the parameters configured out of band and (2) a measurement. Clients are not expected to store any state across any upload flows, nor are they required to implement any sort of report upload retry mechanism. By design, the protocol in this document is robust against individual client upload failures since the protocol output is an aggregate over all inputs.¶
Helpers and leaders have different operational requirements. The design in this document assumes an operationally competent leader, i.e., one that has no storage or computation limitations or constraints, but only a modestly provisioned helper, i.e., one that has computation, bandwidth, and storage constraints. By design, leaders must be at least as capable as helpers, where helpers are generally required to:¶
In addition, for each DAP task, helpers are required to:¶
Beyond the minimal capabilities required of helpers, leaders are generally required to:¶
In addition, for each DAP task, leaders are required to:¶
Collectors statefully interact with aggregators to produce an aggregate output. Their input to the protocol is the task parameters, configured out of band, which include the corresponding batch window and size. For each collect invocation, collectors are required to keep state from the start of the protocol to the end as needed to produce the final aggregate output.¶
Collectors must also maintain state for the lifetime of each task, which includes key material associated with the HPKE key configuration.¶
Privacy comes at the cost of computational complexity. While affine-aggregatable encodings (AFEs) can compute many useful statistics, they require more bandwidth and CPU cycles to account for finite-field arithmetic during input validation. The increased work from verifying inputs decreases the throughput of the system, i.e., the number of inputs processed per unit time. Throughput is related to the complexity of the verification circuit and the compute time available to each aggregator.¶
Applications that utilize proofs with a large number of multiplication gates or a high frequency of inputs may need to limit inputs into the system to meet bandwidth or compute constraints. Some methods of overcoming these limitations include choosing a better representation for the data or introducing sampling into the data collection methodology.¶
[[TODO: Discuss explicit key performance indicators, here or elsewhere.]]¶
A soft real-time system should produce a response within a deadline to be useful. This constraint may be relevant when the value of an aggregate decreases over time. A missed deadline can reduce an aggregate's utility but not necessarily cause failure in the system.¶
An example of a soft real-time constraint is the expectation that input data can be verified and aggregated in a period equal to data collection, given some computational budget. Meeting these deadlines will require efficient implementations of the input-validation protocol. Applications might batch requests or utilize more efficient serialization to improve throughput.¶
Some applications may be constrained by the time that it takes to reach a privacy threshold defined by a minimum number of reports. One possible solution is to increase the reporting period so more samples can be collected, balanced against the urgency of responding to a soft deadline.¶
Not all DAP tasks have the same operational requirements, so the protocol is designed to allow implementations to reduce operational costs in certain cases.¶
In general, the aggregators are required to keep state for tasks and all valid reports for as long as collect requests can be made for them. In particular, aggregators must store a batch for as long as the batch has not been queried more than max_batch_query_count times. However, it is not always necessary to store the reports themselves. For schemes like Prio3 [VDAF] in which reports are verified only once, each aggregator only needs to store its aggregate share for each possible batch interval, along with the number of times the aggregate share was used in a batch. This is due to the requirement that the batch interval respect the boundaries defined by the DAP parameters. (See Section 4.5.6.)¶
However, Aggregators are also required to implement several per-report checks that require retaining a number of data artifacts. For example, to detect replay attacks, it is necessary for each Aggregator to retain the set of report IDs of reports that have been aggregated for the task so far. Depending on the task lifetime and report upload rate, this can result in high storage costs. To alleviate this burden, DAP allows Aggregators to drop this state as needed, so long as reports are dropped properly as described in Section 4.4.1.4. Aggregators SHOULD take steps to mitigate the risk of dropping reports (e.g., by evicting the oldest data first).¶
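For example, an Aggregator might bound its replay-detection state with a simple FIFO cache that evicts the oldest report IDs first. The sketch below shows one possible strategy, not a requirement; reports affected by dropped state are handled as described in Section 4.4.1.4.¶

from collections import OrderedDict

class ReplayCache:
    """Tracks seen report IDs up to a capacity, evicting oldest first."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.seen: OrderedDict[bytes, None] = OrderedDict()

    def check_and_insert(self, report_id: bytes) -> bool:
        """Returns False if the report ID was already seen (replay)."""
        if report_id in self.seen:
            return False
        self.seen[report_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # drop the oldest entry
        return True¶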
Furthermore, the aggregators must store data related to a task as long as the current time has not passed the task's task_expiration. Aggregators MAY delete the task and all data pertaining to it after task_expiration. Implementors SHOULD provide for some leeway so the Collector can collect the batch after some delay.¶
In the absence of an application or deployment-specific profile specifying otherwise, a compliant DAP application MUST implement the following HPKE cipher suite:¶
KEM: DHKEM(X25519, HKDF-SHA256) (see Section 7.1 of [HPKE])¶
KDF: HKDF-SHA256 (see Section 7.2 of [HPKE])¶
AEAD: AES-128-GCM (see Section 7.3 of [HPKE])¶
DAP assumes an active attacker that controls the network and has the ability to statically corrupt any number of clients, aggregators, and collectors. That is, the attacker can learn the secret state of any party prior to the start of its attack. For example, it may coerce a client into providing malicious input shares for aggregation or coerce an aggregator into diverting from the protocol specified (e.g., by divulging its input shares to the attacker).¶
In the presence of this adversary, DAP aims to achieve the privacy and robustness security goals described in [VDAF]'s Security Considerations section.¶
Currently, the specification does not achieve these goals. In particular, there are several open issues that need to be addressed before these goals are met. Details for each issue are below.¶
[OPEN ISSUE: This subsection is a bit out-of-date.]¶
In this section, we enumerate the actors participating in the Prio system and describe their assets (secrets that are either inherently valuable or which confer some capability that enables further attack on the system), the capabilities that a malicious or compromised actor has, and potential mitigations for attacks enabled by those capabilities.¶
This model assumes that all participants have previously agreed upon and exchanged all shared parameters over some unspecified secure channel.¶
Clients may affect the quality of aggregations by reporting false input.¶
If clients reveal identifying information to aggregators (such as a trusted identity during client authentication), aggregators can learn which clients are contributing input.¶
Bogus inputs can be generated that encode "null" shares that do not affect the aggregate output, but mask the total number of true inputs.¶
[OPEN ISSUE: Define what "null" shares are. They should be defined such that inserting null shares into an aggregation is effectively a no-op. See issue#98.]¶
The leader is also an aggregator, and so all the assets, capabilities and mitigations available to aggregators also apply to the leader.¶
Input validity proof verification. The leader can forge proofs and collude with a malicious client to trick aggregators into aggregating invalid inputs.¶
Relaying messages between aggregators. The leader can compromise availability by dropping messages.¶
If all aggregators collude (e.g. by promiscuously sharing unencrypted input shares), then none of the properties of the system hold. Accordingly, such scenarios are outside of the threat model.¶
We assume the existence of attackers on the network links between participants.¶
Observation of network traffic. Attackers may observe messages exchanged between participants at the IP layer.¶
The time of transmission of input shares by clients could reveal information about user activity.¶
Observation of message size could allow the attacker to learn how much input is being submitted by a client.¶
[[OPEN ISSUE: The threat model for Prio --- as it's described in the original paper and [BBCGGI19] --- considers either a malicious client (attacking robustness) or a malicious subset of aggregators (attacking privacy). In particular, robustness isn't guaranteed if any one of the aggregators is malicious; in theory it may be possible for a malicious client and aggregator to collude and break robustness. Is this a contingency we need to address? There are techniques in [BBCGGI19] that account for this; we need to figure out if they're practical.]]¶
[TODO: Solve issue#89]¶
Client reports can contain auxiliary information such as source IP, HTTP user agent, or, in deployments which use it, client authentication information, which could be used by aggregators to identify participating clients or permit some attacks on robustness. This auxiliary information could be removed by having clients submit reports to an anonymizing proxy server, which would then use Oblivious HTTP [I-D.draft-ietf-ohai-ohttp-07] to forward inputs to the DAP Leader, without requiring any server participating in DAP to be aware of whatever client authentication or attestation scheme is in use.¶
An important parameter of a DAP deployment is the minimum batch size. If an aggregation includes too few inputs, then the outputs can reveal information about individual participants. Aggregators use the batch size field of the shared task parameters to enforce minimum batch size during the collect protocol, but server implementations may also opt out of participating in a DAP task if the minimum batch size is too small. This document does not specify how to choose minimum batch sizes.¶
The DAP parameters also specify the maximum number of times a report can be used. Some protocols, such as Poplar [BBCGGI21], require reports to be used in multiple batches spanning multiple collect requests.¶
Optionally, DAP deployments can choose to ensure their output F achieves differential privacy [Vad16]. A simple approach would require the aggregators to add two-sided noise (e.g. sampled from a two-sided geometric distribution) to outputs. Since each aggregator is adding noise independently, privacy can be guaranteed even if all but one of the aggregators is malicious. Differential privacy is a strong privacy definition, and protects users in extreme circumstances: Even if an adversary has prior knowledge of every input in a batch except for one, that one record is still formally protected.¶
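As a sketch of the simple approach, each aggregator could independently add two-sided geometric noise to its aggregate share before encryption. The noise parameter below is arbitrary and would in practice be derived from the desired differential-privacy budget; the plain-integer arithmetic also glosses over the finite-field encoding of real aggregate shares.¶

import math
import random

def geometric(p: float) -> int:
    """Geometric sample on {1, 2, ...} via inverse-CDF."""
    u = random.random()  # u in [0, 1)
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p)) + 1

def two_sided_geometric(p: float) -> int:
    """Difference of two iid geometric samples: symmetric, zero-mean noise."""
    return geometric(p) - geometric(p)

agg_share = [12, 7, 42]  # toy aggregate share (field arithmetic omitted)
noisy_agg_share = [x + two_sided_geometric(0.05) for x in agg_share]¶

Because each aggregator perturbs its own share independently, the noise survives in the aggregate result even if all but one aggregator omits it.¶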
[OPEN ISSUE: While parameters configuring the differential privacy noise (like specific distributions / variance) can be agreed upon out of band by the aggregators and collector, there may be benefits to adding explicit protocol support by encoding them into task parameters.]¶
Most DAP protocols, including Prio and Poplar, are robust against malicious clients, but are not robust against malicious servers. Any aggregator can simply emit bogus aggregate shares and undetectably spoil aggregates. If enough aggregators were available, this could be mitigated by running the protocol multiple times with distinct subsets of aggregators chosen so that no aggregator appears in all subsets and checking all the outputs against each other. If all the protocol runs do not agree, then participants know that at least one aggregator is defective, and it may be possible to identify the defector (i.e., if a majority of runs agree, and a single aggregator appears in every run that disagrees). See #22 for discussion.¶
The verification key for a task SHOULD be chosen before any reports are generated. It SHOULD be fixed for the lifetime of the task and not be rotated. One way to ensure this is to include the verification key in a derivation of the task ID.¶
This consideration comes from current security analysis for existing VDAFs. For example, to ensure that the security proofs for Prio3 hold, the verification key MUST be chosen independently of the generated reports. This can be achieved as recommended above.¶
Prio deployments should ensure that aggregators do not have common dependencies that would enable a single vendor to reassemble inputs. For example, if all participating aggregators stored unencrypted input shares on the same cloud object storage service, then that cloud vendor would be able to reassemble all the input shares and defeat privacy.¶
This specification defines the following protocol messages, along with their corresponding media types:¶
The definition for each media type is in the following subsections.¶
Protocol message format evolution is supported through the definition of new formats that are identified by new media types.¶
IANA [shall update / has updated] the "Media Types" registry at https://www.iana.org/assignments/media-types with the registration information in this section for all media types listed above.¶
[OPEN ISSUE: Solicit review of these allocations from domain experts.]¶
Type name: application¶
Subtype name: dap-hpke-config-list¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.2¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-report¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.3.2¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-aggregation-job-init-req¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.5¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-aggregation-job-resp¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.5¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-aggregation-job-continue-req¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.5¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-collect-req¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.5¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
Type name: application¶
Subtype name: dap-collection¶
Required parameters: N/A¶
Optional parameters: None¶
Encoding considerations: only "8bit" or "binary" is permitted¶
Security considerations: see Section 4.5¶
Interoperability considerations: N/A¶
Published specification: this specification¶
Applications that use this media type: N/A¶
Fragment identifier considerations: N/A¶
Person and email address to contact for further information: see Authors' Addresses section¶
Intended usage: COMMON¶
Restrictions on usage: N/A¶
Author: see Authors' Addresses section¶
Change controller: IESG¶
This document requests creation of a new registry for Query Types. This registry should contain the following columns:¶
[TODO: define how we want to structure this registry when the time comes]¶
This document requests creation of a new registry for extensions to the Upload protocol. This registry should contain the following columns:¶
[TODO: define how we want to structure this registry when the time comes]¶
The following value [will be/has been] registered in the "IETF URN Sub-namespace for Registered Protocol Parameter Identifiers" registry, following the template in [RFC3553]:¶
Registry name: dap¶
Specification: [[THIS DOCUMENT]]¶
Repository: http://www.iana.org/assignments/dap¶
Index value: No transformation needed.¶
Initial contents: The types and descriptions in the table in Section 3.2 above, with the Reference field set to point to this specification.¶