Simple and Efficient Binomial Protocols for Differential Privacy in MPC

Internet-Draft	Binomal DP in MPC	July 2024
Case & Thomson	Expires 9 January 2025	[Page]

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 9 January 2025.¶

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶

1. Introduction

Using multiparty computation (MPC) to compute aggregate statistics has some very promising privacy characteristics. MPC provides strong assurances about the confidentiality of input values, relying only on the assumption that the parties performing the computation do not collude. Depending on the MPC system in use, the cryptographic assumptions involved can be conservative. For instance, MPC is the basis of the Verifiable, Distributed Aggregation Functions (VDAFs) [VDAF] used in DAP [DAP].¶

Depending on how the system is used, particularly for systems where the MPC system offers some flexibility in how it can be queried, concrete privacy guarantees are harder to provide. Multiple aggregations over similar input data might be computed, leading to aggregates that can be compared to reveal aggregates over a small set of inputs or even the value of specific inputs.¶

Differential privacy (DP) [DWORK]) offers a framework for both analyzing and protecting privacy that can be applied to this problem to great effect. By adding some amount of noise to aggregates, strong guarantees can be made about the amount of privacy loss that applies to any given input.¶

There are multiple methods for applying noise to aggregates, but the one that offers the lowest amount of noise — and therefore the most useful outputs — is one where a single entity samples and adds noise, known as central DP. Alternatives include local DP, where each noise is added to each input to the aggregation, or shuffle DP, which reduces noise requirements for local DP by shuffling inputs.¶

Applying noise in a single place ensures that the amount of noise is directly proportional to the sensitivity (that is, the maximum amount that any input might contribute to the output) rather than being in some way proportional to the number of inputs. The amount of noise relative to aggregates decreases as the number of inputs increases, meaning that central DP effectively provides an optimal amount of noise.¶

1.1. DP Noise in MPC

There are several approaches to adding noise in MPC.¶

Use of local or shuffle DP is possible. As noted, these methods can add more noise than is ideal.¶

Noise can be added by each party independently. Each party adds noise in a fraction that is based on its understanding of the number of honest parties present. In two-party MPC, each party has to assume the other is dishonest, so each adds the entire noise quantity, ultimately doubling the overall noise that is added. In a three-party honest majority MPC, each party can add half of the required noise on the assumption that one other party is honest, resulting in a 50% increase in the amount of noise.¶

Finally, an MPC protocol can be executed to add noise. The primary drawback of this approach is that there is an increased cost to generating the noise in MPC. However, MPC protocols can avoid having to include additional noise in order to compensate for the risk of information leakage from a dishonest participant. Adding noise using MPC provides strong assurances that noise is not known to any party, including the parties that perform the computation, up to the limits of the MPC scheme in use. Finally, the costs of computation in MPC scale only with the privacy parameters for the differential privacy, not the number of inputs. Amortizing this cost over large sets of inputs can make the additional cost small.¶

1.2. Binomal Noise

The Bernoulli distribution provides approximate differential privacy (DP) [DWORK]. This is sometimes named (epsilon, delta)-differential privacy or (ε, δ)-differential privacy. The epsilon value in approximate DP bounds privacy loss for most contributions to the output, however the delta value is a non-zero bound on the probability that a higher privacy loss occurs.¶

A binomial, Bin(N, p), distribution is the number of successes out of N Bernoulli trials, where each Bernoulli trial is a coin flip with success probability p.¶

Due to the central limit theorem, a binomial distribution with large N is a close approximation of a Normal or Gaussian distribution, which has a number of useful properties.¶

This document describes a simple MPC protocol, with several instantiations, for efficiently computing binomial noise.¶

1.3. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [BCP14] when, and only when, they appear in all capitals, as shown here.¶

2. The Binomial Mechanism for MPC

The binomial mechanism for DP generates binomial noise in MPC and adds it to outputs before they are released.¶

Our parameter choices rely on an analysis from [CPSGD], which provides more comprehensive formulae for a range of parameters.¶

To sample from a Bin(N, p) distribution in MPC, two things are needed:¶

A protocol for Bernoulli trials, or coin-flipping protocol, that produces a value of 1 with probability p and 0 otherwise.¶
A means to sum the value of N trials.¶

This protocol sets p to 0.5. This value of p provides both an optimal privacy/utility trade off and good efficiency for computation in MPC. Each Bernoulli sample requires a single, uniformly distributed bit, which can be done very efficiently. Using p = 0.5 also requires the fewest samples for any set of parameters, except for cases with extremely low variance requirements, which we consider to be out of scope; see Section 2 of [CPSGD].¶

There are several ways to instantiate a coin flipping protocol in MPC depending what MPC protocol is being used. Section 4.1 describes some basic protocol instantiations.¶

For any given set of privacy parameters (epsilon, delta) and for a known sensitivity, Section 3.1 describes how to determine the number of Bernoulli samples needed.¶

To count the number of successes across these N trials, the MPC helpers simply run an aggregation circuit over the secret shared results of the N Bernoulli trials, each or which is either 0 or 1. The result of this sum is a sample from a Bin(N, p) distribution. This binomial noise value is then added to the output inside the MPC and then the final noised result revealed to the appropriate output parties. That is, if the MPC computes f(D), it outputs shares of the result f(D) + Bin(N,p).¶

The party receiving the output can then postprocess this output to get an unbiased estimate for f(D) by subtracting the mean of the Bin(N,p) distribution, which is N·p.¶

2.1. Document Organization

In the remainder of this document is organized as follows:¶

Section 3 introduces an additional quantization scaling parameter that can be used to optimize the privacy/utility tradeoff.¶
Section 3.1 details the process of determining for a given function f() and privacy parameters how to determine the optimal number of trials, N.¶
Section 4.1 describes some instantiations of the coin flipping protocol and the aggregation protocol.¶
Section 5.1 includes a cost analysis of different instantiations in both computation and communication costs.¶
Section 5.2 compares the binomial mechanism to other DP approaches that might be used in MPC.¶

3. Quantization Scale

[CPSGD] provides an additional "quantization scale" parameter, s, for the binomial mechanism that can be tuned to make it more closely approximate the Gaussian mechanism and get an improved privacy/utility tradeoff.¶

The paper defines the application of the binomial mechanism as:¶

f(D) + (X - Np) * s

where f(D) is the value that is protected and X is a sample from a Bin(N, p) distribution. This produces a scaled and unbiased output.¶

The value of s is typically smaller than one, meaning that the sample noise is effectively able to produce non-integer values. However, operating on non-integer values in MPC is more complex, so this documents uses a modified version where the MPC computes:¶

o = f(D) / s + X

For an MPC system, the output of the system is shares of this scaled and biased value. The recipient can reconstruct the an unbiased, unscaled, noised value by:¶

Adding the shares it receives: o = sum(oᵢ, o₂, …)¶
Correcting for bias: o - N·p¶
Scaling the value: f′(D) = s * (o - N·p)¶

The resulting value is always within N·p·s of the computed aggregate, but it could be negative if that aggregate is smaller than N·p·s.¶

3.1. Determining number of Bernoulli trials

Applying noise for differential privacy requires understanding the function being computed, f(), and the private dataset, D. For an f that is a d-dimensional query with integer outputs, the output vector is in ℤ<sup>d</sup>. That is, the output is a d-dimensional vector of natural numbers.¶

The binomial mechanism requires understanding the sensitivity of the result under three separate norms.¶

3.1.1. Sensitivity

Differential privacy describes sensitivity in terms of databases. In this, databases are considering "neighboring" if the additional, removal, or sometimes the substitution of inputs related to a single subject turns one database into the other.¶

For two neighboring databases D₁, D₂, the sensitivity of f is:¶

||f(D₁) - f(D₂)||ₚ

For f(D) that produces output that is a d-dimensional vector of integer values, the p-norms of interest for use with the binomial mechanism is the L1, L2, and L∞ (or Linfty) norms.¶

The L1 norm of x∊ℤ<sup>d</sup> is:¶

sensitivity_1 = ||x||₁ = sum(i=1..d, |xᵢ|)

The L2 norm is:¶

sensitivity_2 = ||x||₂ = sqrt(sum(i=1..d,xᵢ²))

Finally, the L∞ norm is:¶

sensitivity_infty = ||x||∞ = maxᵢ(|xᵢ|)

These properties of the function f() are all specific to the use case and need to be known.¶

3.2. Epsilon and Delta Constraints

The privacy parameters for approximate DP are epsilon, ε, and delta, δ. These parameters determine the bounds on privacy loss.¶

Epsilon may vary considerably, though is typically in the range [0.01, 10]. Often, multiple queries spend proportions of a larger epsilon privacy budget. For example, a privacy budget of epsilon=3 might be spent in three separate queries with epsilon 1 or as four queries with epsilon of 2, 0.1, 0.3, and 0.6.¶

Delta is often be fixed in the range (2^-29..2^-24). Typically, the only constraint on delta is to ensure that 1/delta > population; that is, expected number of contributions that will suffer privacy loss greater than epsilon is kept less than one. For MPC functions that include very large numbers of inputs, delta might need to be reduced.¶

Theorem 1 of [CPSGD] gives a way to determine the fewest Bernoulli trials, N, needed to achieve approximate DP. There are two main constraints that need to be satisfied which we call the delta_constraint and epsilon_delta_constraint.¶

As the number Bernoulli trials increases, each constraint monotonically allows for smaller epsilon and delta values to be achieved. To find the smallest number of Bernoullis that simultaneously satisfies both constraints, find the minimum N determined by the delta_constraint and the minimum N determined by the epsilon_delta_contraint and then take the maximum of these two values.¶

A possible approach to satisfying both constraints is to perform a binary search over N to find the smallest one satisfying both constraints simultaneously. A search might be acceptable as the computation only needs to be performed once for each set of parameters. An alternative and more direct approach is described in the following sections.¶

3.2.1. Bounding `N` by `delta_constraint`

The delta_constraint is a function of delta, the dimension, d, the sensitivityᵢnfty, the quantization scale, s, and p (which is fixed to 0.5 in this document). This produces a simple formula for determining the minimum value of N:¶

N >= 4·max(23·ln(10·d/delta), 2·sensitivityᵢnfty/s)

3.2.2. Bounding `N` by `epsilon_delta_constraint`

The epsilon_delta_constraint is a function of epsilon, delta, s, d, sensitivity_1, sensitivity_2, sensitivity_infty, and p (0.5). It is a more complicated formula.¶

For the epsilon_delta constraint, [CPSGD] defines some intermediate functions of the success probability, p. For p = 0.5, these become fixed constants:¶

bₚ = 1/3
cₚ = sqrt(2)·7/4
dₚ = 2/3

The epsilon_delta_constraint, as written in formula (7) of [CPSGD], determines what epsilon is currently attained by the provided N and other parameters:¶

epsilon =
  sensitivity_2·sqrt(2·ln(1.25/delta))
    / (s/2·sqrt(N))
  + (sensitivity_2·cₚ·sqrt(ln(10/delta)) + sensitivity_1·bₚ)
    / ((s/4)·(1-delta/10) · N)
  + (2/3·sensitivity_infty·ln(1.25/delta)
      + sensitivity_infty·dₚ·ln(20·d/delta)·ln(10/delta))
    / ((s/4)·N)

The value of N for a fixed set of values for epsilon, delta, sensitivity, and s, is a quadratic equation in N.¶

To see this first write equation (7) as with other variables gathered into constants c₁ and c₂:¶

epsilon = c₁ / sqrt(N) + c₂ / N
c₁ = 2·sensitivity_2·sqrt(2·ln(1.25/delta))
c₂ = 4 / s·(
    (sensitivity_2·cₚ·sqrt(ln(10/delta)) + sensitivity_1·bₚ)
      / (1-delta/10)
    + 2·sensitivity_infty·ln(1.25/delta) / 3
    + sensitivity_infty·dₚ·ln(20·d/delta)·ln(10/delta)
  )

The formula for epsilon can then be written as a quadratic equation in N:¶

0 = (epsilon / c₁)^2·N² + (2·epsilon·c₂ / c₁² - 1)·N + (c₂ / c₁)^2

Once the values of all the other parameters are fixed, this can be solved with the quadratic formula.¶

3.2.3. Setting the Quantization Scale

Setting the quantization scale correctly can help get the best privacy/utility trade offs for the mechanism. An additional equation to note is the error of the mechanism which we would like to minimize subject to the privacy constraints¶

error = d·s²·N·p·(1-p) = 4·d·s²·N

[CPSGD] discusses more about why 0.5 is the optimal choice for p. When it comes to setting the quantization scale s, making it smaller will decrease the error directly but also require a larger N.¶

It is generally the case that making s smaller will continually decrease the error, but at some point there is necessarily a performance constraint from the MPC cost of how large an N is practical.¶

One approach to setting the scale parameter would be to first determine an upper bound allowed for N and then set s as small as possible within that constraint. Another approach would be to look for a point at which decreasing s and increasing N leads to diminishing returns in reducing the error of the mechanism.¶

4. Noise Generation Algorithm

Once the optimal number of Bernoulli trials has been determined, there are two phases to the algorithm:¶

Perform a distributed coin flipping protocol so that all helpers hold secret shares of 0 or 1 with probability, p.¶
Sum up these secret shared samples into a sample from a Bin(N, p).¶

This document uses p = 0.5, so the coin flipping protocol can use a uniformly-distributed source of entropy.¶

4.1. Coin Flipping and Aggregation Protocols

The use of the binomial mechanism for p = 0.5 in a concrete MPC requires a protocol for jointly computing a number of random bits. Different systems will have different requirements. This section describes three basic protocols that can be used to compute the binomial distribution.¶

4.1.1. Three Party Honest Majority

A three party honest majority system is appealing because there are very efficient protocols for performing multiplication; see [MPC-MUL].¶

Two protocols are described:¶

A binary circuit allows the coin flip to be performed without any communication cost using PRSS [PRSS]. Aggregation requires the use of an addition circuit, which requires one binary multiplication per bit.¶
A circuit using prime fields allows the aggregation to be performed without communication, but the coin flip protocol, which also uses PRSS, requires a modulus conversion operation.¶

Overall, the binary circuit is more efficient in terms of communication costs, but it might be easier to integrate the prime field circuit into a system that uses prime fields.¶

4.1.2. Three Party Binary Field Protocol

A coin flip protocol in a three party honest majority system simply samples a random share using PRSS. The result is a three-way, replicated sharing of a random binary value.¶

Aggregating these values can be performed using a binary circuit in a tree. Two bits, a and b, are added to form a binary value, {a∧b, a⊕b}.¶

This process is continued pairwise. The resulting pairs, {a1, a2} and {b1, b2}, are also added pairwise to produce a three-bit value, {a1∧b1, a1⊕b1⊕(a2∧b2), a2⊕b2}.¶

Each successive iteration involves one more bit and half as many values, until a single value with log₂(N) bits is produced.¶

This aggregation process requires at most 4N binary multiplications.¶

4.1.3. Three Party Large Prime Field Protocol

Addition of values in a prime field with a modulus greater than the number of samples (N) can be performed trivially. However, producing a replicated secret sharing across three parties using a single bit sample from PRSS results in a value that is uniformly distributed between 0 and 2 inclusive.¶

A modulus conversion operation can be used to convert that into a sharing in the prime field. This requires two multiplications, though some parts of those multiplications can be avoided; see [KIMHC].¶

Three bits are sampled by each pair of parties. These are turned into three shared values, where two of the shared values are filled with zeros. The exclusive OR of these three values is computed using two multiplications in the form: x⊕y = x + y - 2xy. This produces a three-way replicated sharing of a bit in the prime field.¶

Shares can then be aggregated through simple addition.¶

4.1.4. Two Party Protocols

Obtaining multiple random bits in a two party protocol might involve the use of an oblivious transfer protocol. Ideally, these are obtained in a large prime field so that addition is free.¶

Details of OT protocol TBD.¶

5. Performance Characteristics

A binomial function is relatively inexpensive to compute in MPC.¶

5.1. Cost Analysis

With large epsilon and delta values (that is, low privacy) the use of the binomial mechanism can be very efficient. However, smaller values for epsilon or delta can require significant numbers of Bernoulli trials.¶

The following table shows some typical values and the resulting number of trials, along with approximate values for the quantization scaling factor (s) and error.¶

Table 1
epsilon	delta	N	s	error
3	10e-6	TODO	TODO	TODO
1	10e-6	TODO	TODO	TODO
0.1	10e-6	TODO	TODO	TODO

5.2. Comparison with Alternative Approaches

Two other approaches that should be compared with are:¶

simply having each helper party add noise independently [DWORK]¶
amplification by shuffling [SHUFLDP] where local DP is added by clients and used to get a central DP guarantee¶

A binomial will alway give better privacy/utility trade offs compared to independent noise. An MPC system has to ensure that t out of P parties can reveal their shares without degrading the privacy of outputs. Consequently, the noise that each party adds needs to be proportional to P/(P-t) times the target amount, assuming that noise can be simply added. For a three party honest majority system, P is 3 and t is 1, producing 50% more noise than is ideal. For a two party system, the amount of noise needs to be doubled.¶

Shuffling and any scheme that makes use of noised inputs results in noise that increases in magnitude as the number of inputs increases, which degrades utility. The binomial mechanism does not result in any additional noise.¶

7. IANA Considerations

This document has no IANA considerations.¶

8.1. Normative References

[BCP14]: Best Current Practice 14, <https://www.rfc-editor.org/info/bcp14>.
At the time of writing, this BCP comprises the following:

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[MPC-MUL]: Savage, B. and M. Thomson, "Efficient Protocols for Binary Fields in the 3-Party Honest Majority MPC Setting", Work in Progress, Internet-Draft, draft-savage-ppm-3phm-mpc-00, July 2024, <https://datatracker.ietf.org/doc/html/draft-savage-ppm-3phm-mpc-00>.
[PRSS]: Thomson, M. and B. Savage, "High Performance Pseudorandom Secret Sharing (PRSS)", Work in Progress, Internet-Draft, draft-thomson-ppm-prss-00, July 2024, <https://datatracker.ietf.org/doc/html/draft-thomson-ppm-prss-00>.

8.2. Informative References

[CPSGD]: Agarwal, N., Suresh, A. T., Yu, F., Kumar, S., and H. B. Mcmahan, "cpSGD: Communication-efficient and differentially-private distributed SGD", May 2018.
[DAP]: Geoghegan, T., Patton, C., Pitman, B., Rescorla, E., and C. A. Wood, "Distributed Aggregation Protocol for Privacy Preserving Measurement", Work in Progress, Internet-Draft, draft-ietf-ppm-dap-11, 21 May 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-ppm-dap-11>.
[DWORK]: Dwork, C. and A. Roth, "The Algorithmic Foundations of Differential Privacy", Now Publishers, Foundations and Trends® in Theoretical Computer Science vol. 9, no. 3-4, pp. 211-407, DOI 10.1561/0400000042, 2013, <https://doi.org/10.1561/0400000042>.
[KIMHC]: Kikuchi, R., Ikarashi, D., Matsuda, T., Hamada, K., and K. Chida, "Efficient Bit-Decomposition and Modulus-Conversion Protocols with an Honest Majority", Springer International Publishing, Information Security and Privacy pp. 64-82, DOI 10.1007/978-3-319-93638-3_5, ISBN ["9783319936376", "9783319936383"], 2018, <https://doi.org/10.1007/978-3-319-93638-3_5>.
[SHUFLDP]: Cheu, A., Smith, A., Ullman, J., Zeber, D., and M. Zhilyaev, "Distributed Differential Privacy via Shuffling", Springer International Publishing, Advances in Cryptology – EUROCRYPT 2019 pp. 375-403, DOI 10.1007/978-3-030-17653-2_13, ISBN ["9783030176525", "9783030176532"], 2019, <https://doi.org/10.1007/978-3-030-17653-2_13>.
[VDAF]: Barnes, R., Cook, D., Patton, C., and P. Schoppmann, "Verifiable Distributed Aggregation Functions", Work in Progress, Internet-Draft, draft-irtf-cfrg-vdaf-10, 8 July 2024, <https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-vdaf-10>.

Authors' Addresses

Ben Case