The current Computable system requires that each data market be able to identify a trusted party who can operate the datatrust associated with that data market. One of our major research goals is to reduce the reliance on this single trusted datatrust operator. For example, in a previous post, we outlined how listing upload could work with multiple datatrusts.
In this post, we’ll explore some more of the basic economics of multiple datatrust systems. We discuss how inbound deliveries are fairly routed to different datatrust operators, how negligent datatrust operators are punished, and who can register to be a datatrust operator in a multi-datatrust system.
Stake Based Delivery Routing
At present, to minimize spam and attacks, we require callers of Datatrust.register(), which proposes new datatrust operator candidates, to put up a stake. This stake has to remain in place until voting on the datatrust candidate completes, after which it can be removed. The stake exists primarily to prevent spam attacks and doesn’t serve any subsequent purpose.
Things become a little different with multiple datatrusts in place. For one, deliveries become trickier. Which datatrust is responsible for making a delivery if there are multiple present? Datatrusts are paid to perform deliveries, so datatrust operators should be interested in being allocated requested deliveries. Ideally, we’d like to pick a datatrust in a fair fashion. The simplest way to do this would be to pick uniformly at random, but a better way might be to weight by stake. That is, we require each datatrust to lock in a stake for the period that it serves as a datatrust. Datatrusts that lock in more stake would then serve a greater proportion of the deliveries from this market and see greater returns, which nicely incentivizes datatrusts to participate fully in the data market economy. (There are also some nice security benefits, as we will see shortly.) To be precise, we propose a new function:
def selectRandomDatatrustByStake() -> address
This function is responsible for making a random choice from amongst the available datatrusts. It should be called at the end of Datatrust.requestDelivery() and should assign a datatrust to the delivery. To store this assignment, a new field holding the assigned datatrust should be added to the Datatrust.Delivery struct. Then Datatrust.delivered() will check that msg.sender is the assigned datatrust, so the final payout in Datatrust.delivered() will be made to the chosen datatrust and not to an arbitrary datatrust that didn’t do the work.
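To make the routing concrete, here is a minimal Python sketch of stake-weighted selection and the msg.sender check. It is a simulation rather than contract code, and it rests on some assumptions: the stakes dictionary, the assigned_datatrust field name, and the rng argument are all hypothetical. In particular, the source of randomness is taken as an input here, since obtaining good randomness on-chain is its own problem. Under this scheme a datatrust holding stake s out of a total staked amount S is picked with probability s / S.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Delivery:
    requester: str
    # Hypothetical name for the new field on the Datatrust.Delivery struct.
    assigned_datatrust: Optional[str] = None
    delivered: bool = False


def select_random_datatrust_by_stake(stakes: Dict[str, int], rng: random.Random) -> str:
    """Pick one datatrust with probability proportional to its stake."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("no staked datatrusts available")
    ticket = rng.randrange(total)  # uniform integer in [0, total)
    running = 0
    for address, stake in stakes.items():
        running += stake
        if ticket < running:
            return address
    raise AssertionError("unreachable: ticket is always below the total stake")


def request_delivery(stakes: Dict[str, int], requester: str, rng: random.Random) -> Delivery:
    """Create a delivery and, as the last step, assign it to a datatrust."""
    delivery = Delivery(requester=requester)
    delivery.assigned_datatrust = select_random_datatrust_by_stake(stakes, rng)
    return delivery


def delivered(delivery: Delivery, sender: str) -> str:
    """Mark the delivery complete; only the assigned datatrust may do so.

    Returns the address that should receive the payout.
    """
    if sender != delivery.assigned_datatrust:
        raise PermissionError("only the assigned datatrust can complete this delivery")
    delivery.delivered = True
    return sender


if __name__ == "__main__":
    stakes = {"0xA": 300, "0xB": 100}  # 0xA should win roughly 3 in 4 deliveries
    d = request_delivery(stakes, requester="0xBuyer", rng=random.Random())
    print("assigned to", d.assigned_datatrust)
    print("payout goes to", delivered(d, d.assigned_datatrust))
```

A datatrust with zero stake is never selected in this sketch, which matches the intent that stake buys a share of deliveries.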
Timeouts and Slashing for Negligent Datatrust Operators
This mechanism is relatively straightforward, but raises some additional questions. One of the major challenges of the current data market design is that it has no mechanism for enforcing a datatrust operator’s honest behavior. If a datatrust goes offline or fails to respond to delivery requests in a reasonable timeframe, nothing punishes such negligent behavior. This situation is made worse in a multi-datatrust world, since as the number of datatrusts increases, the possibility of malicious or negligent behavior increases as well. For this reason, we need to add a mechanism for punishing negligent datatrusts.
I propose we add a new parameter, DATATRUST_TIMEOUT, which provides the maximum time a datatrust operator can take to respond to a delivery request. This parameter creates an implicit contract between the data purchaser and a datatrust for how the data should be delivered. Let’s now game out what happens in a failure case. A data purchaser calls Datatrust.requestDelivery(). Let’s assume we have 8 registered datatrusts in this market. The call to Datatrust.selectRandomDatatrustByStake() selects one of these datatrusts in a random fashion and records this datatrust in the Datatrust.Delivery struct instance associated with this delivery.
Time passes. The selected datatrust is either negligent or malicious, and does not call Datatrust.delivered() to acknowledge completion of the delivery before DATATRUST_TIMEOUT passes. In this case, I propose that the data purchaser should call a new function:
def requestDifferentDatatrust(delivery: bytes32)
This function takes a few different actions. First, it slashes the datatrust that was previously assigned. That is, we introduce a new parameter setting the slash amount, and the slash burns that amount of CMT from the datatrust operator’s staked amount. It then checks that the datatrust operator’s remaining stake hasn’t fallen below the minimum STAKE amount. If it has, the datatrust operator is removed as a datatrust and must re-register. Finally, it calls Datatrust.selectRandomDatatrustByStake() to select a new datatrust and assigns it to the delivery.
The addition of this stake creates a powerful incentive for a datatrust operator to complete deliveries in a timely manner, since funds can be lost for negligence. Slashing is a key part of proof-of-stake systems such as Cosmos or Ethereum 2.0, so it doesn’t feel unreasonable to require that datatrust operators be willing to take on the risk of slashing for bad behavior.
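The timeout-and-slash flow can be sketched as a small Python simulation. This is an illustration under stated assumptions, not the contract implementation: SLASH_AMOUNT is a hypothetical name for the slash parameter, all numeric values are made up, timestamps are plain integers, and the sketch simply drops a removed operator's entry because the fate of its leftover stake is not specified here.

```python
import random
from dataclasses import dataclass
from typing import Dict

DATATRUST_TIMEOUT = 3600  # seconds a datatrust has to respond (illustrative value)
STAKE = 100               # minimum stake in CMT to remain a datatrust (illustrative value)
SLASH_AMOUNT = 30         # hypothetical parameter name and value for the burned amount


@dataclass
class Delivery:
    assigned_datatrust: str
    requested_at: int
    delivered: bool = False


def pick_by_stake(stakes: Dict[str, int], rng: random.Random) -> str:
    """Stake-weighted random choice over the remaining datatrusts."""
    ticket = rng.randrange(sum(stakes.values()))
    for address, stake in stakes.items():
        if ticket < stake:
            return address
        ticket -= stake
    raise AssertionError("unreachable")


def request_different_datatrust(delivery: Delivery, stakes: Dict[str, int],
                                now: int, rng: random.Random) -> int:
    """Slash the unresponsive datatrust and reassign; returns the CMT burned."""
    if delivery.delivered:
        raise ValueError("delivery already completed")
    if now - delivery.requested_at <= DATATRUST_TIMEOUT:
        raise ValueError("DATATRUST_TIMEOUT has not passed yet")

    offender = delivery.assigned_datatrust
    burned = min(SLASH_AMOUNT, stakes[offender])
    stakes[offender] -= burned            # burn: the amount goes to no one

    if stakes[offender] < STAKE:
        del stakes[offender]              # removed; operator must re-register

    if not stakes:
        raise RuntimeError("no datatrusts left to serve this delivery")

    delivery.assigned_datatrust = pick_by_stake(stakes, rng)
    delivery.requested_at = now           # restart the clock for the new datatrust
    return burned


if __name__ == "__main__":
    stakes = {"0xA": 120, "0xB": 200}
    d = Delivery(assigned_datatrust="0xA", requested_at=0)
    burned = request_different_datatrust(d, stakes, now=DATATRUST_TIMEOUT + 1,
                                         rng=random.Random())
    print("burned", burned, "CMT; reassigned to", d.assigned_datatrust)
```

One design question the sketch surfaces: if the slashed datatrust still holds at least STAKE, it stays in the pool and can be selected again for the same delivery. Whether to exclude it from the re-draw is left open.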
Who Can Register to Be a Datatrust?
A datatrust has a powerful superpower in a data market: it can view all data currently in the data market. (There does not yet exist a mechanism for encrypting the data in a data market, although I have some early thoughts in this vein.) This suggests that datatrust operators should be socially vetted by the community around a data market. We already require that callers of Datatrust.register() post a stake, which ensures that the datatrust operator is already invested in the community, but security might require a more thorough social vetting, where a would-be datatrust operator publishes blog posts or discusses their technical setup in more detail to build confidence among data market stakeholders. The community around a given data market will have to believe that their data is secure with the given datatrust operator, so the more details the operator shares publicly, the more confidence the community will have.
Another point to consider here is that datatrust operators might take on some degree of legal liability. For example, if copyrighted data finds its way into a data market, a copyright holder might serve a DMCA takedown notice to a datatrust operator. While automated tooling will exist to help the datatrust operator manage their data holdings, responding to such notices will require additional effort.
This proposed combination of stake+social-vetting is similar to that governing Cosmos validator selection, which has attracted a thriving community of validators. However, Cosmos validators have a powerful additional incentive, which is the “block reward.” Data markets only have delivery payments, and not block rewards. This suggests that datatrust operators will need to believe that sufficient demand exists for deliveries from a data market before they will commit to act as a datatrust operator. Bootstrapping belief in this demand is still a major open challenge for the Computable community.
One of the current design standards of the Computable codebase is that we seek to minimize on-chain state. In particular, we don’t maintain any on-chain state that is iterable. We only maintain mappings, which are not iterable. This suggests that the set of authorized datatrusts would be stored in a map:
datatrusts: map(bytes32, address) # datatrust_hash -> datatrust_address
Here datatrust_hash is a unique identifier. The set of currently authorized datatrusts can be maintained within this map. However, this poses a challenge for Datatrust.selectRandomDatatrustByStake(). How do we pick a random datatrust by stake if we can’t iterate over this map?
One possibility is that we might want to use a fixed size list to store the datatrusts. This would place an upper bound on the number of possible datatrusts for a market, but this can be chosen to be large, say 100:
datatrusts: address[100] # datatrust_addresses
Then removing a datatrust operator is simply zeroing out its entry. The random sampling can be applied by looping over the 100 addresses here. We would need to watch out for the gas cost of accessing storage, though.
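Here is a rough Python model of the fixed-size list approach, to show how removal by zeroing and sampling by looping fit together. The companion stakes mapping (address to staked amount), the helper names, and the random_word input are assumptions made for illustration; the on-chain version would read these from contract storage.

```python
from typing import Dict, List

MAX_DATATRUSTS = 100
ZERO_ADDRESS = "0x" + "00" * 20  # stands in for an empty slot


def new_registry() -> List[str]:
    """A fixed-size list of 100 slots, all initially empty."""
    return [ZERO_ADDRESS] * MAX_DATATRUSTS


def add_datatrust(slots: List[str], stakes: Dict[str, int], address: str, stake: int) -> int:
    """Place a datatrust in the first empty slot and return its index."""
    for index, slot in enumerate(slots):
        if slot == ZERO_ADDRESS:
            slots[index] = address
            stakes[address] = stake
            return index
    raise RuntimeError("datatrust list is full")


def remove_datatrust(slots: List[str], stakes: Dict[str, int], index: int) -> None:
    """Removal is just zeroing out the slot (and dropping the stake record)."""
    stakes.pop(slots[index], None)
    slots[index] = ZERO_ADDRESS


def select_by_stake(slots: List[str], stakes: Dict[str, int], random_word: int) -> str:
    """Stake-weighted pick using two bounded passes over the fixed-size list.

    random_word is a non-negative integer supplied by some randomness source.
    """
    total = sum(stakes.get(a, 0) for a in slots if a != ZERO_ADDRESS)
    if total == 0:
        raise ValueError("no staked datatrusts registered")
    ticket = random_word % total
    for address in slots:
        if address == ZERO_ADDRESS:
            continue  # skip empty or removed slots
        stake = stakes.get(address, 0)
        if ticket < stake:
            return address
        ticket -= stake
    raise AssertionError("unreachable")


if __name__ == "__main__":
    slots, stakes = new_registry(), {}
    add_datatrust(slots, stakes, "0xA", 300)
    add_datatrust(slots, stakes, "0xB", 100)
    print(select_by_stake(slots, stakes, random_word=123456789))
```

Both loops are bounded by the list size, so the worst-case cost is fixed regardless of how many slots are occupied. Keeping a running total of stake in a separate storage variable would remove the first pass, at the cost of updating it on every registration, removal, and slash.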