Decentralizing the DataTrust

The Computable contracts at present dictate that a single DataTrust holds all data associated with a market. This DataTrust holds considerable power in the market: it must sign off on listings before they can be approved (by submitting a dataHash), and it can see raw listing data. This combination of powers means that the DataTrust has to be run by a trusted party. In most markets, this will likely be either the creator of the market or a trusted vendor. It’s important to note that this doesn’t mean the entire ecosystem is centralized. Each market can choose to run a different DataTrust. Furthermore, the council retains the ability to replace a DataTrust (albeit at considerable inconvenience).

We’ve chosen these design tradeoffs due to realistic engineering constraints. We believe it’s critically important to ship a functioning system with reasonable design tradeoffs. Our current design has considerable flexibility and allows for a reasonable amount of decentralization from the start. However, this is not to say that we haven’t thought seriously about the decentralized future. In the remainder of this blog post, I’ll lay out our thinking on the pathway to constructing data markets where there cannot exist any trusted third party.

The next iteration of the core Computable contracts will feature multiple DataTrusts per data market. This iteration should likely feature full mirroring, where each DataTrust associated with the market maintains a complete copy of all listings in the market. The economic acknowledgment mechanism for listings (acknowledgment via the dataHash) will remain mostly unchanged, the one change being that it suffices for any one DataTrust to acknowledge receipt of the listing. The delivery mechanism for data will have to be altered, since only one of the N DataTrusts will actually deliver the data. The delivery requester will in this case select the DataTrust they want to receive delivery from, and that DataTrust will then follow the existing delivery flow.
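To make the mirrored design concrete, here is a minimal in-memory sketch in Python. It is not the Computable contract code: the class names (DataTrust, Market), the method names, and the use of SHA-256 for the dataHash are all assumptions made for illustration. It only shows the two changes described above, namely that any one DataTrust can acknowledge a listing and that the requester picks which DataTrust delivers.

```python
import hashlib
from dataclasses import dataclass, field


def data_hash(raw: bytes) -> str:
    # Assumption: the dataHash is modeled as a SHA-256 hex digest.
    return hashlib.sha256(raw).hexdigest()


@dataclass
class DataTrust:
    # Hypothetical stand-in for one DataTrust node.
    name: str
    store: dict = field(default_factory=dict)  # listing id -> raw bytes

    def receive(self, listing_id: str, raw: bytes) -> str:
        # Store the raw data and return the hash used as acknowledgment.
        self.store[listing_id] = raw
        return data_hash(raw)

    def deliver(self, listing_id: str) -> bytes:
        return self.store[listing_id]


@dataclass
class Market:
    # Hypothetical stand-in for the on-chain market state.
    trusts: dict = field(default_factory=dict)        # name -> DataTrust
    acknowledged: dict = field(default_factory=dict)  # listing id -> dataHash

    def register(self, trust: DataTrust) -> None:
        self.trusts[trust.name] = trust

    def submit_listing(self, listing_id: str, raw: bytes, via: str) -> None:
        # One acknowledgment from any registered DataTrust is sufficient.
        self.acknowledged[listing_id] = self.trusts[via].receive(listing_id, raw)
        # Full mirroring: every other DataTrust also keeps a complete copy.
        for name, trust in self.trusts.items():
            if name != via:
                trust.receive(listing_id, raw)

    def request_delivery(self, listing_id: str, chosen: str) -> bytes:
        # The requester selects the DataTrust that performs the delivery.
        if listing_id not in self.acknowledged:
            raise KeyError(f"listing {listing_id!r} was never acknowledged")
        raw = self.trusts[chosen].deliver(listing_id)
        # Check the delivered bytes against the acknowledged dataHash.
        if data_hash(raw) != self.acknowledged[listing_id]:
            raise ValueError("delivered data does not match the dataHash")
        return raw
```

In this sketch a listing submitted via one DataTrust can be delivered by any other, and the hash check gives the requester a way to detect a mirror that returns different bytes. How mirrors would actually synchronize, and how delivery is paid for, is left out.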

A second and related upgrade will be to allow for the data to be sharded across DataTrusts. The motivation for this is that we may want to allow each individual to store their own data within their own DataTrust. However, it’s not clear that it wouldn’t just be better to shard the data market entirely in this case; that is, each individual would create their own data market rather than retaining a sharded fraction of a larger multi-individual DataTrust. I’m tentatively leaning towards having many small data markets rather than allowing for complex sharding of one giant data market, since it preserves the atomicity of a single data market and simplifies the failure modes.

Note, however, that in both of the solutions proposed above, a data contributor has to have access to a DataTrust that they trust to view their data. Alternatively, they must have the resources to create their own DataTrust to store their own data. What about cases where these assumptions don’t hold? That is, where the data is so sensitive that the individual can’t trust a third party, and the individual isn’t able to set up their own DataTrust? In this case, we might seek to use more complex cryptographic schemes. For example, let’s suppose that N DataTrusts serve a full data market. In this data market, we could use Shamir’s secret sharing to split the raw data N ways so that no single DataTrust has the ability to view the raw data. Instead, some K of the N DataTrusts must cooperate to answer queries on the data market. If N and K grow large, this scheme might provide considerable extra privacy for individuals. However, the software for efficient Shamir’s secret sharing at the scale of large datasets doesn’t yet exist. Considerable research and development work will be needed to enable these cryptographically secured markets.
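For readers unfamiliar with the K-of-N idea, here is a minimal sketch of Shamir’s secret sharing over a prime field, in Python. The function names and the choice of the Mersenne prime 2^127 - 1 as the field modulus are assumptions for illustration. The sketch splits a single integer; real listing data would first have to be encoded as many such field elements, which is part of why doing this at the scale of large datasets is hard. It also only covers splitting and reconstruction. Answering queries without ever reconstructing the data would need additional multi-party computation machinery that is not shown here.

```python
import secrets

# Assumption: field modulus for this sketch. 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret: int, k: int, n: int) -> list:
    """Split `secret` into n shares so that any k of them can recover it."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        # Evaluate the polynomial at x with Horner's rule, mod PRIME.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list) -> int:
    """Recover the secret from k shares via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, each of five DataTrusts would hold one share, and any three of them could jointly reconstruct the value, while any two alone learn nothing about it.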
