Implementing Multiple Datatrust Support with a BFT Distributed Key-Value Store

One of the major limitations of Computable data markets today is the fact that we currently require the datatrust to be a trusted party. This works fine in a data market where there is a trusted coordinating party (perhaps the creator of the data market) who can be trusted by market participants. At the same time, we would like to enable the construction of data markets where there don’t necessarily exist any trusted parties. For example, in an open source consortium data market, there might be a revolving cast of contributors, but no one stable entity that can take on continued responsibility for running the datatrust. What can we do in this situation?

Let’s assume that datatrusts are not trusted entities. In this case, each data market should be able to be backed by multiple datatrusts for robustness. The design of this system should be robust to a subset of the datatrust operators behaving maliciously so long as a majority of datatrust operators are honest. In addition, it should be possible for new datatrusts to join the data market at any time, and it should be possible for datatrusts to leave the data market when they wish. Given this abstract description, an interesting similarity that jumps out is that desired datatrust behavior seems to map pretty closely to desired behavior for validators in Byzantine fault tolerant (BFT) systems such as Tendermint.

Let’s take a step back. What’s a validator? Roughly speaking, a validator is analogous to a miner in Bitcoin or Ethereum (1.0). If it is currently the leader, a validator is responsible for proposing new blocks of transaction to add onto the blockchain and otherwise responsible for validating blocks of transactions proposed by other nodes. The leader changes at set intervals according to either a random choice or a deterministic rotation rule depending on the blockchain in question. The set of validators is responsible for reaching consensus on whether to add the new block proposed by the leader. Tendermint is one algorithm for reaching this consensus that has a number of very nice properties. First, it is robust to up to ⅓ of the validators behaving maliciously. Second it is considerably simpler than alternative consensus algorithms and has been battle tested live in the Cosmos network.

At first glance, this description doesn’t seem to match the responsibilities of datatrusts that closely. Datatrusts are responsible for storing and delivering data, not for reaching consensus on blocks. But, there’s actually a deeper connection here. First, let’s note that our current implementations of datatrusts use a key-value store to store listing data. The ffa-datatrust uses DynamoDB for example. There’s a good argument that a datatrust should be a key-value store in general since that maps closely to the on-protocol listing design keyed by the listing hash. With this algorithmic framework in hand, a multiple datatrust system becomes considerably less mystical. It is simply a distributed key value store run by multiple independent nodes. Distributed Hash tables (a type of distributed key value store) have been used widely already so in some sense this isn’t terribly exotic. However, one difference is that a multiple datatrust has to be Byzantine fault tolerant, while most distributed hash tables are not. That is, some of the nodes implementing the distributed key value store may be malicious, and the system has to be robust to this malicious behavior. There has been some research on BFT distributed hash tables, but these systems are not widely used yet. This seems to leave us in a bit of a pickle, since we wouldn’t have any solid codebases we could lean on to implement a BFT distributed key value store for a multiple datatrust system.

Luckily for us though, we can leverage research on building BFT validator blockchains using Tendermint to constructed a BFT distributed key value store. The key insight is that a “transaction” can correspond to a key-value store update (in our case, this could be “add candidate,” “remove candidate,” “add listing,” “remove listing”). If consensus is reached using Tendermint on the set of transaction updates amongst the set of datastores backing a data market, then each datatrust can implement these updates on its local key value store. Even more promisingly, the tendermint team has already built a prototype exploring this idea of using Tendermint to construct a data store.

This suggests an incremental path towards building a multiple datatrust system. Layer a Tendermint consensus chain on top of existing datatrust infrastructure to reach consensus on the updates to make. Once consensus is reached, simply use the existing code to implement the key value store update. If the consensus algorithm is configured correctly, this provides us the features of the multiple datatrust system we were looking for! That said, I suspect considerable complexity remains in prototyping out this system and stress testing whether this basic idea holds muster.

Ah yes, Tendermint, a project close to my heart. I would also suggest checking out for ideas on how to evaluate and reward data quality/integrity–its highly specific for a type of data, but I’m guessing you’ll be well positioned to evaluate it better than I.

1 Like