Privacy can be extraordinarily hard to achieve in practice. Part of the challenge is that quantifying how much information leaks from the results a dataset reveals is tricky. Part of the genius of differential privacy is that it provides a sensible, quantitative definition of this privacy loss. In addition, differential privacy satisfies a composition theorem, which makes it easy to compute the total privacy lost across multiple queries.
Let’s start by introducing terminology. Here, \epsilon and \delta are technical parameters, adapted from the differential privacy literature, which measure the information loss tied to a particular query. Mathematically, if the privacy loss from the i th query is (\epsilon_i, \delta_i), the total loss from a sequence of n such queries is at most

\left(\sum_{i=1}^{n} \epsilon_i, \sum_{i=1}^{n} \delta_i\right)
(This bound isn’t tight, but it is simple and useful for our purposes.) Often in differential privacy applications, a dataset will have a fixed “privacy budget” of, say, (\epsilon, \delta). Queries are allowed until the composed privacy loss (computed by the equation above, or perhaps a more refined bound) exceeds this limit. At that point, no further computations are allowed against the dataset.
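The budget-tracking scheme just described can be sketched as a small privacy accountant that applies basic composition (summing the \epsilon_i and \delta_i of each query) and refuses queries once the budget would be exceeded. The class and method names below are illustrative, not from any particular library.

```python
# A minimal sketch of a privacy accountant using basic composition.
# Names (PrivacyAccountant, spend, can_spend) are hypothetical.

class PrivacyAccountant:
    def __init__(self, epsilon_budget, delta_budget):
        self.epsilon_budget = epsilon_budget
        self.delta_budget = delta_budget
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def can_spend(self, epsilon, delta):
        """Check whether a query's (epsilon, delta) cost fits the remaining budget."""
        return (self.epsilon_spent + epsilon <= self.epsilon_budget
                and self.delta_spent + delta <= self.delta_budget)

    def spend(self, epsilon, delta):
        """Record a query's privacy cost; reject it if the budget would be exceeded."""
        if not self.can_spend(epsilon, delta):
            raise RuntimeError("privacy budget exhausted")
        self.epsilon_spent += epsilon
        self.delta_spent += delta


# With a budget of (1.0, 1e-5), three queries of cost (0.3, 1e-6) fit,
# but a fourth query of cost (0.5, 1e-6) would push epsilon past 1.0:
accountant = PrivacyAccountant(epsilon_budget=1.0, delta_budget=1e-5)
accountant.spend(0.3, 1e-6)
accountant.spend(0.3, 1e-6)
print(accountant.can_spend(0.3, 1e-6))  # a third small query still fits
print(accountant.can_spend(0.5, 1e-6))  # a larger query would exceed the budget
```

Under basic composition the accountant only ever needs two running sums; a more refined bound (e.g. advanced composition) would track the sequence of per-query losses instead.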
This scheme creates challenges in practice, since a fixed privacy budget effectively caps the number of queries that can be made against the data. I’d argue that a “privacy cost curve” might be a more useful abstraction.
The \epsilon price curve illustrated above is a tool for pricing the privacy lost in a given query. Each query has an associated \epsilon, and the composition theorem quantifies how successive queries cumulatively increase the total privacy loss.
The diagram above has a suggestive structure: as \epsilon increases, so does the cost associated with queries. This allows for an adaptive pricing scheme, in which queries are always allowed, but the market charges more for queries that incur larger losses of privacy.
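One way to sketch such a cost curve: charge for a query’s marginal \epsilon at a price that grows with the \epsilon already spent on the dataset. The exponential form and its parameters below are illustrative assumptions, not a curve proposed in the text.

```python
import math

# A hypothetical privacy cost curve: the price of a query grows with the
# cumulative epsilon already consumed. The marginal price of privacy loss
# is modeled as base_price * exp(rate * eps), an assumed functional form.

def query_price(epsilon_spent, epsilon_query, base_price=1.0, rate=2.0):
    """Price a query of cost epsilon_query, given epsilon_spent so far.

    Integrates the marginal price base_price * exp(rate * eps) over
    [epsilon_spent, epsilon_spent + epsilon_query], so identical queries
    become progressively more expensive as privacy loss accumulates.
    """
    lo, hi = epsilon_spent, epsilon_spent + epsilon_query
    return base_price * (math.exp(rate * hi) - math.exp(rate * lo)) / rate


# The same epsilon = 0.1 query costs more the later it arrives:
print(query_price(0.0, 0.1))  # early query: cheap
print(query_price(2.0, 0.1))  # identical query after much spending: expensive
```

Any increasing marginal-price function would serve; the key design property is that queries are never refused outright, only priced according to the cumulative privacy loss they impose.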