Using BLS Signatures to Acknowledge Listing Upload in multi-datatrust markets

In the current version of the Computable contracts, there is a simple back-and-forth game that is played when a new listing candidate is proposed by a maker. The steps are as follows:

  1. Maker calls Listing.list with the proposed listing candidate
  2. The datatrust calls Datatrust.setDataHash to acknowledge receipt of the listing
  3. Stakeholders in the market vote upon the listing candidate by calling Voting.vote
  4. Once the poll closes, some participant calls Listing.resolveApplication which checks that the data hash has been set by the datatrust, and then checks if Parameterizer.plurality has been reached

As part of our longer term roadmap, we’d like to work towards decentralizing the datatrust. This means that in the future, there won’t be only one datatrust attached to a data market. The addition of these additional datatrusts complicates the flow for getting a new listing approved however. In particular, step 2 above requires acknowledgement from a datatrust. In the current design, there is only 1 datatrust, so this authorization serves as ironclad confirmation (assuming the datatrust is semi-trusted) that the listing has been received.

This situation gets more complicated with multiple datatrusts. Let’s suppose that we have a datamarket with 8 datatrusts. How do we handle step #2? The simplest might be to require every datatrust operator to independently call Datatrust.setDataHash (or an equivalent method) to acknowledge that their datatrust has taken receipt of the listing. This is simple enough, but there are a couple of complications here. For one, we’re now blowing up the number of Ethereum transactions required to get anything done in the data market. Rather than just 1 acknowledgement transaction, each listing will require 8 transactions. Even worse, if any of these datatrusts are malicious, they can not provide the acknowledgement and stop the listing flow from proceeding further. It’s one thing to find 1 trusted datatrust, but much harder to find 8! And even worse, even if all the datatrust are trusted, one might just be having a network issue. A system shouldn’t fail if a single one of its components is having difficulty.

There are a couple simplifications we can make here. The first is that rather than requiring all 8 datatrusts to acknowledge, perhaps we only require 5-of-8 datatrusts to acknowledge receipt of the listing data. That way, a few malicious datatrusts (or a few with network issues) can’t take down the broader network. However, we still have a problem here that we’d require 5 transactions to confirm a listing. How can we reduce this requirement?

The key idea is that we want to somehow aggregate the acknowledgements from multiple datatrusts into an aggregate acknowledgement. To accomplish this, we’ll make use of some new cryptographic technology that’s been making waves in the blockchain community. In particular, we’ll make use of BLS signatures. These signatures are popping up in many blockchain projects, since there are many situations where consensus needs to be reached amongst a set of participants in a blockchain ecosystem. The math behind BLS signatures is a little novel if you’re not familiar with common cryptographic conventions, so I’ll try to review it here.

First, let’s pick a prime number, r. Consider two groups of prime order r called G, G_T. (You can think of these groups roughly as \mathbb{Z}_r, the set of numbers 0,\dotsc,r-1). In keeping with cryptographic notation, we will write the groups “multiplicatively.” What does this mean? Well you can pretend that we represent the numbers 0 to r-1 by the powers 2^0,\dotsc,2^{r-1}. Or if we want to be more general, we can use a general g instead of 2. Then the elements of the group are g^0,\dotsc,g^{r-1}.

BLS signatures depend on the existence of a special function e: G \times G \to G_T. That is, the function e takes in a tuple (x, y) where x, y are numbers between g^0 and g^{r-1} and returns another number from g^0 to g^{r-1}. We require a couple things from the function e. In particular, we ask that the function be bilinear. This means that e satisfies the following property,

e(g^x, g^y) = e(g, g)^{xy}

where x and y are any two integers. We also require that e(g, g) \neq 1. Note that this has the following neat consequence

e(g^x, g^y) = e(g, g)^{xy} = e(g, g^{xy})

Why does this matter? Well, suppose that we have some integer z. We can check if z = xy by checking whether e(g, g^z) == e(g^x, g^y). This turns out to be very useful for doing cryptography.

Alright, now let’s actually get down to defining what a BLS signature actually is. To start, we need to define what the secret key and public keys are in this system. The secret key is simple. Just pick a random integer x \in 0,\dotsc,r-1 as your secret key (Note that r is likely very very big, so this x is quite secure if you pick it truly at random.) To get the public key, we simply compute g^x as our public key. You might be wondering now, can’t I just take a logarithm and get back x from g^x? Isn’t this insecure? Well it turns out that the so-called “discrete logarithm” problem is very hard to perform (until quantum computers become practical that is.) So there’s no realistic way at present to recover x from g^x so you’re safe even if the whole world knows g^x today. To summarize

\begin{align*} \textrm{sk} &:= x \\ \textrm{pk} &:= g^x\\ \end{align*}

Ok, we now have secret keys and public keys, how do we get signatures? Well, let’s suppose that we have access to a good hash function h. (You can think of this as SHA256 or Keccak or some other good hash function.) We will require that h gives us back a number from 0,\dotsc,r-1 that is in our group. Let’s say m is our message. Let h = H(m) be our hashed message. Note that h is a group element. Then we say the signature of m with our secret key x is \sigma = h^x.

\begin{align*} \sigma &:= H(m)^x = h^x \end{align*}

How can we verify that this signature makes sense? Well, let’s say that someone is trying to prove to us that they’ve signed m with x. We don’t know x, but we know the public key g^x. We also know the hash function H, so we can compute h = H(m). We also have the base element g. Now we want to make use of the special properties of our function e. In particular note the following

e(\sigma, g) = e(h^x, g) = e(h, g)^x = e(h, g^x)

Luckily for us, we can compute both e(\sigma, g) and e(h, g^x). Note that if someone tries to pass off a fraudulent signature \sigma’, we can detect it by using the function e and checking that e(\sigma’, g) \neq e(h, g^x).

Alright, this is pretty cool so far, but you might still be wondering what the point of all this is. You can already sign message with regular old Ethereum private keys. Why bother with this fancy new BLS signature? The answer is that there’s one really cool property of BLS signatures that regular Ethereum signatures don’t have. Let’s suppose that \sigma_1,\dotsc,\sigma_5 are BLS signatures by our 5 datatrusts. Then I claim that the single number

\sigma = \sigma_1 \sigma_2 \sigma_3 \sigma_4 \sigma_5

is sufficient to prove that that all of our 5 datatrusts have signed! How does this work? Well it all has to do with the magical properties of our function e once again. Let’s put some terminology down quickly. Let’s suppose the secret keys for our 5 datatrusts are x_1,\dotsc,x_5. Then their public keys are g^{x_1},\dotsc,g^{x_5}. Let’s assume that all the datatrusts signed the same message m. Let h = H(m) be the hash of the message. Then the signatures that each datatrust generated can be written as h^{x_1}, \dotsc, h^{x_5}. The following equations then hold

\begin{align*} e(\sigma, g) &= e(\sigma_1\dotsc\sigma_5, g) \\ &= e(h^{x_1}\dotsc h^{x_5}, g) \\ &= e(h^{x_1+\dotsc +x_5}, g) \\ &= e(h, g)^{x_1+\dotsc x_5} \\ &= e(h, g^{x_1 + \dotsc + x^5}) \\ &= e(h, g^{x_1} \dotsc g^{x^5}) \\ &= e(h, \textrm{pk}_1 \dotsc \textrm{pk}_5) \\ \end{align*}

Note that we can again compute the e(\sigma, g) and e(h,\textrm{pk}_1 \dotsc \textrm{pk}_5)! (Since we know the public keys for a datatrust operator).

Ok, let’s take a step back from the mathematics. What does all of this mean exactly? Our original problem is that we want to verify that 5 datatrusts have all acknowledged receipt of a listing. And we’d like to do this without having to make 5 separate Ethereum transactions. We now do the following. All the datatrusts agree on a common message m (most likely, m is just the listing hash for the listing). Then each datatrust signs m using its BLS secret key x_i to get signature \sigma_i. (It’s important to note that the BLS secret key for a datatrust operator is not the same as the Ethereum secret key for the datatrust operator!) Each datatrust operator broadcasts \sigma_i to the other datatrusts.

Let’s pause for a minute here. How do we write the code to compute \sigma_i for each datatrust? Well, luckily for us, Ethereum 2.0 (and other projects) all use BLS signatures heavily, so we can piggy-back on BLS codebases that these projects have developed for BLS signature generation. For example, https://github.com/phoreproject/bls is a go BLS signature library used by Prysmatic labs. As Ethereum 2.0 reaches mainnet, this means we will have access to solid, likely audited code, we can use on the datatrust end to generate these signatures. (We will also reuse the precise BLS “curve” BLS12-381 used by Ethereum 2.0 and ZCash to keep things simple.)

Now, let’s get back to our listing acknowledgement flow. As signatures come in, an honest datarust operator will just multiply them. Let’s say that I’m datatrust operator 5, and I’ve received signatures from operators 1 through 4. Then I can compute \sigma = \sigma_1 \dotsc \sigma_5 (using a BLS signature library). What do I do with \sigma? Well, the basic idea is that I can submit a tuple ((1,...,5), \sigma) on-chain. (We can’t actually submit a list (1,\dotsc,5) on-chain, so this will likely be a serialized bytestring). We can then perform verification of the BLS signature on-chain as in this solidity gist that verifies an aggregate BLS signature. It’s important to note here though that some technical spikes will be needed to make the signature verification is robust. In particular, we will need to instrument the gas costs to make sure that make sure that the gas usage isn’t exorbitant.

Assuming that we can handle our gas issues, we’re now in a really powerful spot. We can call one on-chain function to create an acknowledgement that 5 datatrusts have received our listing. This means that we essentially recover today’s listing flow! The only difference is that Datatrust.setDataHash would now accept extra arguments (\textrm{bytes}, \sigma) with the bytestring representation of datatrusts who signed and their BLS signature. Then Datatrust.setDataHash would verify this signature before setting the data hash.

The research outlined in this post closes off one of the major open questions in the design of multi-datatrust data markets. However, a number of questions still remain. In particular, we need to specify precisely how deliveries are handled (see discussion) and we need to specify details about how data uploads are handled (is the maker responsible for uploading to all datatrusts or should datatrusts propagate listings received to other datatrusts?)