At present, each data market has one associated datatrust. This datatrust is assumed to be a trusted party (likely run by the creator of the market), who is incentivized to fairly serve the users of the data market. This assumption can hold true for a wide variety of market types in practice. However, there are also markets for which the datatrust operator is not trustworthy. In particular, if the datatrust operator is a third party who isn’t known to the market creator, having additional safeguards on good behavior would be very valuable.
In particular, it would be very useful for market participants to be able to periodically verify that listings are stored correctly by the datatrust and remain retrievability. Let’s call such a demonstration a proof-of-listing (PoL). How can we implement a PoL system in practice? The simplest way is for market participants to directly download data for listings and verify its correctness. This scheme runs into issues with both privacy and bandwidth however. Arbitrary participants can’t access the data for a given listing without making a purchase. In addition, listings might contain very large files that may not be feasible to download easily.
Luckily, the question of verifying retrievability of files has already received considerable attention from the cryptographic community. The paper, PORs: Proofs of Retrievability for Large Files is an important early work that provides a protocol for a user to verify that a file can be retrieved without having to store the file themselves. There are a few basic ideas that underpin this construction. First, the user encrypts the file before uploading. They then introduce a few randomly “sentinel points” into the file. Once the user uploads the file, the server can’t tell which points are the “sentinel points” since the file encryption obfuscates the file structure. To test that the file has been stored correctly, the user simply needs to query the server for a sentinel point. If the server fails to respond correctly, then the user can detect a retrievability failure. To make the scheme more robust, the paper suggests layering error correction on top of this basic scheme.
How can this idea be adapted to provide a proof-of-listing? If a maker uses the POR protocol when uploading their data, they can query a datatrust for the sentinel points from their files. This would allow a maker to be able to verify that the datatrust is correctly storing its listing. However, this basic scheme has some drawbacks. It isn’t possible for an external party to verify the correctness of storage. (Since if the datatrust operator knew the location of the sentinel points, they could simply store only the sentinel points and not the original file). It might be possible to use more recent cryptographic work to improve on the idea in this post. In particular, it might be possible to simplify the approach from the POR protocol by constructing a Merkle tree over the file and requesting merkle proofs for randomly chosen chunks of the file. (H/T Reid Williams for coming up with this simplified approach).
The filecoin team and associated researchers have also done a lot of interesting work on proofs-of-space and proofs-of-replication (see this work) which might prove useful here as well.