Internet-Draft | Binomal DP in MPC | July 2024 |
Case & Thomson | Expires 9 January 2025 | [Page] |
TODO¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://private-attribution.github.io/i-d/draft-case-ppm-binomial-dp.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-case-ppm-binomial-dp/.¶
Discussion of this document takes place on the Privacy Preserving Measurement Working Group mailing list (mailto:ppm@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/ppm/. Subscribe at https://www.ietf.org/mailman/listinfo/ppm/.¶
Source for this draft and an issue tracker can be found at https://github.com/private-attribution/i-d.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 9 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Using multiparty computation (MPC) to compute aggregate statistics has some very promising privacy characteristics. MPC provides strong assurances about the confidentiality of input values, relying only on the assumption that the parties performing the computation do not collude. Depending on the MPC system in use, the cryptographic assumptions involved can be conservative. For instance, MPC is the basis of the Verifiable, Distributed Aggregation Functions (VDAFs) [VDAF] used in DAP [DAP].¶
Depending on how the system is used, particularly for systems where the MPC system offers some flexibility in how it can be queried, concrete privacy guarantees are harder to provide. Multiple aggregations over similar input data might be computed, leading to aggregates that can be compared to reveal aggregates over a small set of inputs or even the value of specific inputs.¶
Differential privacy (DP) [DWORK]) offers a framework for both analyzing and protecting privacy that can be applied to this problem to great effect. By adding some amount of noise to aggregates, strong guarantees can be made about the amount of privacy loss that applies to any given input.¶
There are multiple methods for applying noise to aggregates, but the one that offers the lowest amount of noise — and therefore the most useful outputs — is one where a single entity samples and adds noise, known as central DP. Alternatives include local DP, where each noise is added to each input to the aggregation, or shuffle DP, which reduces noise requirements for local DP by shuffling inputs.¶
Applying noise in a single place ensures that the amount of noise is directly proportional to the sensitivity (that is, the maximum amount that any input might contribute to the output) rather than being in some way proportional to the number of inputs. The amount of noise relative to aggregates decreases as the number of inputs increases, meaning that central DP effectively provides an optimal amount of noise.¶
There are several approaches to adding noise in MPC.¶
Use of local or shuffle DP is possible. As noted, these methods can add more noise than is ideal.¶
Noise can be added by each party independently. Each party adds noise in a fraction that is based on its understanding of the number of honest parties present. In two-party MPC, each party has to assume the other is dishonest, so each adds the entire noise quantity, ultimately doubling the overall noise that is added. In a three-party honest majority MPC, each party can add half of the required noise on the assumption that one other party is honest, resulting in a 50% increase in the amount of noise.¶
Finally, an MPC protocol can be executed to add noise. The primary drawback of this approach is that there is an increased cost to generating the noise in MPC. However, MPC protocols can avoid having to include additional noise in order to compensate for the risk of information leakage from a dishonest participant. Adding noise using MPC provides strong assurances that noise is not known to any party, including the parties that perform the computation, up to the limits of the MPC scheme in use. Finally, the costs of computation in MPC scale only with the privacy parameters for the differential privacy, not the number of inputs. Amortizing this cost over large sets of inputs can make the additional cost small.¶
The Bernoulli distribution provides approximate differential privacy (DP) [DWORK]. This is sometimes named (epsilon, delta)-differential privacy or (ε, δ)-differential privacy. The epsilon value in approximate DP bounds privacy loss for most contributions to the output, however the delta value is a non-zero bound on the probability that a higher privacy loss occurs.¶
A binomial, Bin(N, p)
, distribution is the number of successes out of N
Bernoulli trials, where each Bernoulli trial is a coin flip with success
probability p
.¶
Due to the central limit theorem, a binomial distribution with large N
is a
close approximation of a Normal or Gaussian distribution, which has a number of
useful properties.¶
This document describes a simple MPC protocol, with several instantiations, for efficiently computing binomial noise.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [BCP14] when, and only when, they appear in all capitals, as shown here.¶
The binomial mechanism for DP generates binomial noise in MPC and adds it to outputs before they are released.¶
Our parameter choices rely on an analysis from [CPSGD], which provides more comprehensive formulae for a range of parameters.¶
To sample from a Bin(N, p)
distribution in MPC, two things are needed:¶
A protocol for Bernoulli trials, or coin-flipping protocol, that produces a
value of 1 with probability p
and 0 otherwise.¶
A means to sum the value of N
trials.¶
This protocol sets p
to 0.5. This value of p
provides both an optimal
privacy/utility trade off and good efficiency for computation in MPC. Each
Bernoulli sample requires a single, uniformly distributed bit, which can be done
very efficiently. Using p = 0.5
also requires the fewest samples for any set
of parameters, except for cases with extremely low variance requirements, which
we consider to be out of scope; see Section 2 of [CPSGD].¶
There are several ways to instantiate a coin flipping protocol in MPC depending what MPC protocol is being used. Section 4.1 describes some basic protocol instantiations.¶
For any given set of privacy parameters (epsilon, delta) and for a known sensitivity, Section 3.1 describes how to determine the number of Bernoulli samples needed.¶
To count the number of successes across these N
trials, the MPC helpers simply
run an aggregation circuit over the secret shared results of the N
Bernoulli
trials, each or which is either 0 or 1. The result of this sum is a sample from
a Bin(N, p)
distribution. This binomial noise value is then added to the
output inside the MPC and then the final noised result revealed to the
appropriate output parties. That is, if the MPC computes f(D)
, it outputs
shares of the result f(D) + Bin(N,p)
.¶
The party receiving the output can then postprocess this output to get an
unbiased estimate for f(D)
by subtracting the mean of the Bin(N,p)
distribution, which is N·p
.¶
In the remainder of this document is organized as follows:¶
Section 3 introduces an additional quantization scaling parameter that can be used to optimize the privacy/utility tradeoff.¶
Section 3.1 details the process of determining for a given function f()
and privacy parameters how to determine the optimal number of trials, N
.¶
Section 4.1 describes some instantiations of the coin flipping protocol and the aggregation protocol.¶
Section 5.1 includes a cost analysis of different instantiations in both computation and communication costs.¶
Section 5.2 compares the binomial mechanism to other DP approaches that might be used in MPC.¶
[CPSGD] provides an additional "quantization scale" parameter, s
, for the
binomial mechanism that can be tuned to make it more closely approximate the
Gaussian mechanism and get an improved privacy/utility tradeoff.¶
The paper defines the application of the binomial mechanism as:¶
f(D) + (X - Np) * s¶
where f(D)
is the value that is protected and X
is a sample from a Bin(N,
p)
distribution. This produces a scaled and unbiased output.¶
The value of s
is typically smaller than one, meaning that the sample noise is
effectively able to produce non-integer values. However, operating on
non-integer values in MPC is more complex, so this documents uses a modified
version where the MPC computes:¶
o = f(D) / s + X¶
For an MPC system, the output of the system is shares of this scaled and biased value. The recipient can reconstruct the an unbiased, unscaled, noised value by:¶
Adding the shares it receives: o = sum(oᵢ, o₂, …)
¶
Correcting for bias: o - N·p
¶
Scaling the value: f′(D) = s * (o - N·p)
¶
The resulting value is always within N·p·s
of the computed aggregate, but it
could be negative if that aggregate is smaller than N·p·s
.¶
Applying noise for differential privacy requires understanding the function
being computed, f()
, and the private dataset, D
. For an f
that is a
d
-dimensional query with integer outputs, the output vector is in ℤ<sup>d</sup>
. That
is, the output is a d
-dimensional vector of natural numbers.¶
The binomial mechanism requires understanding the sensitivity of the result under three separate norms.¶
Differential privacy describes sensitivity in terms of databases. In this, databases are considering "neighboring" if the additional, removal, or sometimes the substitution of inputs related to a single subject turns one database into the other.¶
For two neighboring databases D₁
, D₂
, the sensitivity of f
is:¶
||f(D₁) - f(D₂)||ₚ¶
For f(D)
that produces output that is a d
-dimensional vector of integer
values, the p
-norms of interest for use with the binomial mechanism is the L1,
L2, and L∞ (or Linfty) norms.¶
The L1 norm of x∊ℤ<sup>d</sup>
is:¶
sensitivity_1 = ||x||₁ = sum(i=1..d, |xᵢ|)¶
The L2 norm is:¶
sensitivity_2 = ||x||₂ = sqrt(sum(i=1..d,xᵢ²))¶
Finally, the L∞ norm is:¶
sensitivity_infty = ||x||∞ = maxᵢ(|xᵢ|)¶
These properties of the function f()
are all specific to the use case and need
to be known.¶
The privacy parameters for approximate DP are epsilon, ε, and delta, δ. These parameters determine the bounds on privacy loss.¶
Epsilon may vary considerably, though is typically in the range [0.01,
10]
. Often, multiple queries spend proportions of a larger epsilon privacy
budget. For example, a privacy budget of epsilon=3
might be spent in three
separate queries with epsilon 1 or as four queries with epsilon of 2, 0.1, 0.3,
and 0.6.¶
Delta is often be fixed in the range
(2^{-29}..2^{-24}). Typically, the only constraint on delta is
to ensure that 1/delta > population
; that is, expected number of contributions
that will suffer privacy loss greater than epsilon is kept less than one. For
MPC functions that include very large numbers of inputs, delta might need to be
reduced.¶
Theorem 1 of [CPSGD] gives a way to determine the fewest Bernoulli trials,
N
, needed to achieve approximate DP. There are two main constraints that need
to be satisfied which we call the delta_constraint
and
epsilon_delta_constraint
.¶
As the number Bernoulli trials increases, each constraint monotonically allows
for smaller epsilon and delta values to be achieved. To find the smallest
number of Bernoullis that simultaneously satisfies both constraints, find the
minimum N
determined by the delta_constraint
and the minimum N
determined
by the epsilon_delta_contraint
and then take the maximum of these two values.¶
A possible approach to satisfying both constraints is to perform a binary search
over N
to find the smallest one satisfying both constraints simultaneously. A
search might be acceptable as the computation only needs to be performed once
for each set of parameters. An alternative and more direct approach is
described in the following sections.¶
N
by delta_constraint
The delta_constraint
is a function of delta, the dimension, d
, the
sensitivityᵢnfty
, the quantization scale, s
, and p
(which is fixed to 0.5
in this document). This produces a simple formula for determining the minimum
value of N
:¶
N >= 4·max(23·ln(10·d/delta), 2·sensitivityᵢnfty/s)¶
N
by epsilon_delta_constraint
The epsilon_delta_constraint
is a function of epsilon, delta, s
, d
,
sensitivity_1
, sensitivity_2
, sensitivity_infty
, and p
(0.5). It is a
more complicated formula.¶
For the epsilon_delta constraint
, [CPSGD] defines some intermediate
functions of the success probability, p
. For p = 0.5
, these become fixed
constants:¶
bₚ = 1/3 cₚ = sqrt(2)·7/4 dₚ = 2/3¶
The epsilon_delta_constraint
, as written in formula (7) of [CPSGD],
determines what epsilon is currently attained by the provided N
and other
parameters:¶
epsilon = sensitivity_2·sqrt(2·ln(1.25/delta)) / (s/2·sqrt(N)) + (sensitivity_2·cₚ·sqrt(ln(10/delta)) + sensitivity_1·bₚ) / ((s/4)·(1-delta/10) · N) + (2/3·sensitivity_infty·ln(1.25/delta) + sensitivity_infty·dₚ·ln(20·d/delta)·ln(10/delta)) / ((s/4)·N)¶
The value of N
for a fixed set of values for epsilon, delta, sensitivity, and
s
, is a quadratic equation in N
.¶
To see this first write equation (7) as with other variables gathered into
constants c₁
and c₂
:¶
epsilon = c₁ / sqrt(N) + c₂ / N c₁ = 2·sensitivity_2·sqrt(2·ln(1.25/delta)) c₂ = 4 / s·( (sensitivity_2·cₚ·sqrt(ln(10/delta)) + sensitivity_1·bₚ) / (1-delta/10) + 2·sensitivity_infty·ln(1.25/delta) / 3 + sensitivity_infty·dₚ·ln(20·d/delta)·ln(10/delta) )¶
The formula for epsilon can then be written as a quadratic equation in N
:¶
0 = (epsilon / c₁)^2·N² + (2·epsilon·c₂ / c₁² - 1)·N + (c₂ / c₁)^2¶
Once the values of all the other parameters are fixed, this can be solved with the quadratic formula.¶
Setting the quantization scale correctly can help get the best privacy/utility trade offs for the mechanism. An additional equation to note is the error of the mechanism which we would like to minimize subject to the privacy constraints¶
error = d·s²·N·p·(1-p) = 4·d·s²·N¶
[CPSGD] discusses more about why 0.5 is the optimal choice for p
. When it
comes to setting the quantization scale s, making it smaller will decrease the
error directly but also require a larger N
.¶
It is generally the case that making s
smaller will continually decrease the
error, but at some point there is necessarily a performance constraint from the
MPC cost of how large an N
is practical.¶
One approach to setting the scale parameter would be to first determine an upper
bound allowed for N
and then set s
as small as possible within that
constraint. Another approach would be to look for a point at which decreasing
s
and increasing N
leads to diminishing returns in reducing the error of the
mechanism.¶
Once the optimal number of Bernoulli trials has been determined, there are two phases to the algorithm:¶
Perform a distributed coin flipping protocol so that all helpers hold secret
shares of 0 or 1 with probability, p
.¶
Sum up these secret shared samples into a sample from a Bin(N, p)
.¶
This document uses p = 0.5
, so the coin flipping protocol can use a
uniformly-distributed source of entropy.¶
The use of the binomial mechanism for p = 0.5
in a concrete MPC requires a
protocol for jointly computing a number of random bits. Different systems will
have different requirements. This section describes three basic protocols that
can be used to compute the binomial distribution.¶
A three party honest majority system is appealing because there are very efficient protocols for performing multiplication; see [MPC-MUL].¶
Two protocols are described:¶
A binary circuit allows the coin flip to be performed without any communication cost using PRSS [PRSS]. Aggregation requires the use of an addition circuit, which requires one binary multiplication per bit.¶
A circuit using prime fields allows the aggregation to be performed without communication, but the coin flip protocol, which also uses PRSS, requires a modulus conversion operation.¶
Overall, the binary circuit is more efficient in terms of communication costs, but it might be easier to integrate the prime field circuit into a system that uses prime fields.¶
A coin flip protocol in a three party honest majority system simply samples a random share using PRSS. The result is a three-way, replicated sharing of a random binary value.¶
Aggregating these values can be performed using a binary circuit in a tree. Two
bits, a
and b
, are added to form a binary value, {a∧b, a⊕b}
.¶
This process is continued pairwise. The resulting pairs, {a1, a2}
and {b1,
b2}
, are also added pairwise to produce a three-bit value, {a1∧b1,
a1⊕b1⊕(a2∧b2), a2⊕b2}
.¶
Each successive iteration involves one more bit and half as many values, until a
single value with log₂(N)
bits is produced.¶
This aggregation process requires at most 4N
binary multiplications.¶
Addition of values in a prime field with a modulus greater than the number of samples (N) can be performed trivially. However, producing a replicated secret sharing across three parties using a single bit sample from PRSS results in a value that is uniformly distributed between 0 and 2 inclusive.¶
A modulus conversion operation can be used to convert that into a sharing in the prime field. This requires two multiplications, though some parts of those multiplications can be avoided; see [KIMHC].¶
Three bits are sampled by each pair of parties. These are turned into three
shared values, where two of the shared values are filled with zeros. The
exclusive OR of these three values is computed using two multiplications in the
form: x⊕y = x + y - 2xy
. This produces a three-way replicated sharing of a
bit in the prime field.¶
Shares can then be aggregated through simple addition.¶
Obtaining multiple random bits in a two party protocol might involve the use of an oblivious transfer protocol. Ideally, these are obtained in a large prime field so that addition is free.¶
Details of OT protocol TBD.¶
A binomial function is relatively inexpensive to compute in MPC.¶
With large epsilon and delta values (that is, low privacy) the use of the binomial mechanism can be very efficient. However, smaller values for epsilon or delta can require significant numbers of Bernoulli trials.¶
The following table shows some typical values and the resulting number of
trials, along with approximate values for the quantization scaling factor (s
)
and error.¶
epsilon | delta | N | s | error |
---|---|---|---|---|
3 | 10e-6 | TODO | TODO | TODO |
1 | 10e-6 | TODO | TODO | TODO |
0.1 | 10e-6 | TODO | TODO | TODO |
Two other approaches that should be compared with are:¶
simply having each helper party add noise independently [DWORK]¶
amplification by shuffling [SHUFLDP] where local DP is added by clients and used to get a central DP guarantee¶
A binomial will alway give better privacy/utility trade offs compared to
independent noise. An MPC system has to ensure that t
out of P
parties can
reveal their shares without degrading the privacy of outputs. Consequently, the
noise that each party adds needs to be proportional to P/(P-t)
times the
target amount, assuming that noise can be simply added. For a three party
honest majority system, P
is 3 and t
is 1, producing 50% more noise than is
ideal. For a two party system, the amount of noise needs to be doubled.¶
Shuffling and any scheme that makes use of noised inputs results in noise that increases in magnitude as the number of inputs increases, which degrades utility. The binomial mechanism does not result in any additional noise.¶
This document has no IANA considerations.¶