Towards AI Smart Contracts

In the preceding few posts, I’ve discussed how current smart contract systems lack floating point and how smart contracts don’t have a systematic method of accessing off-chain state. I’ve also discussed a simple programming model for datatrusts, which would allow users to run computations against them. This programming model would allow for floating point computation (necessary for handling interesting float datasets) and would allow users to access the off-chain data in the datatrust. This combination of primitives suggests an intriguing future.

Let’s suppose that we’ve bootstrapped a healthy data market ecosystem. There are a variety of data markets supporting different types of data and gathered by different communities. Each such market is supported by one or more datatrusts. Let’s assume that all datatrusts have implemented a programmable architecture on top. As currently sketched out, these datatrusts wouldn’t be able to pool or coordinate their computational effort. If we could figure out a way to allow datatrusts to combine their computational capabilities, we could unlock a number of interesting new use cases.

For example, consider federated learning. A user might be interested in training a deep learning model across multiple data markets. She would need each data market to compute the gradient of the model on its own data and send that gradient to a central coordinator, which would pool the gradients and update the model. As a concrete example, suppose that a group of pharmaceutical companies want to construct a joint deep model from data in their individual data markets. A federated learning system would be able to achieve this goal without any company handing its raw data to the others.
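To make the gradient pooling step concrete, here is a minimal sketch in Python. It assumes a toy linear model with a squared-error loss, and every name in it (local_gradient, pool_gradients, federated_step) is hypothetical rather than part of any datatrust API. Each data market computes a gradient on its own data, and the coordinator only ever sees gradients, weighted by how many examples each market holds.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def local_gradient(weights: Vector, data: Sequence[Example]) -> Vector:
    """Gradient of mean squared error for a linear model, on one market's data.

    In the scenario above this would run inside a datatrust, next to the data.
    """
    grad = [0.0] * len(weights)
    for features, target in data:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(data)
    return grad


def pool_gradients(grads: Sequence[Vector], sizes: Sequence[int]) -> Vector:
    """Coordinator step: average the gradients, weighted by dataset size."""
    total = sum(sizes)
    dim = len(grads[0])
    return [sum(g[i] * n for g, n in zip(grads, sizes)) / total for i in range(dim)]


def federated_step(weights: Vector, markets: Sequence[Sequence[Example]],
                   learning_rate: float = 0.05) -> Vector:
    """One round: each market reports a gradient, the coordinator updates the model."""
    grads = [local_gradient(weights, data) for data in markets]
    sizes = [len(data) for data in markets]
    pooled = pool_gradients(grads, sizes)
    return [w - learning_rate * g for w, g in zip(weights, pooled)]


if __name__ == "__main__":
    # Two hypothetical markets whose data both follow y = 2x.
    market_a = [([1.0], 2.0), ([2.0], 4.0)]
    market_b = [([3.0], 6.0)]
    weights = [0.0]
    for _ in range(100):
        weights = federated_step(weights, [market_a, market_b])
    print(weights)
```

A real deployment would replace the linear model with a deep network and would need to decide who is trusted to play the coordinator, which is exactly the coordination question raised above.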

Let’s consider another example. Suppose that one data market has crawled the web and assembled an indexed representation of it. It would be really interesting to run a “search engine” on top of this data market. This search engine could run on top of the datatrust programming model. Users who wish to query the search engine could pay a small transaction fee. A portion of this fee would be used to pay for access to the underlying data market, with another portion going as profit to the developer of the search engine. As the search engine grows in usability, the engine might choose to harness multiple data markets in tandem. In this case, as in the federated learning example, we would need to coordinate computation across multiple datatrusts.

Let’s consider yet another example. Recently, a number of asset management DeFi protocols have launched. Set Protocol, for example, maintains a basket of assets using a simple moving average rule. The challenges of coding more sophisticated strategies on-chain in smart contracts mean that it wouldn’t be straightforward to extend these protocols to create automated crypto hedge funds. However, if it were possible to use something like the datatrust programming model, it would be feasible to maintain a data market with the needed historical data and use a datatrust program to perform more sophisticated computations to decide trading strategy. Note, however, that there’s a crucial difference between this example and the previous ones. The previous examples could have been executed off-chain by running a centralized service that communicated with multiple datatrusts. A crypto hedge fund that manages external user funds would need to be able to run on-chain. This necessitates a structured bridge between the off-chain datatrust computation and the on-chain assets under management.
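For readers who haven’t seen one, a simple moving average rule can be sketched in a few lines of Python. This is an illustration of the general idea only and is not Set Protocol’s actual logic. The function names, the window length, and the "ETH" and "USDC" labels are all assumptions made for the example.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> Optional[float]:
    """Mean of the most recent `window` prices, or None if there is too little history."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window


def target_asset(prices: List[float], window: int,
                 risk_asset: str = "ETH", safe_asset: str = "USDC") -> str:
    """Hold the risk asset while the latest price is above its moving average.

    Falls back to the safe asset when there is not enough price history.
    """
    average = simple_moving_average(prices, window)
    if average is None:
        return safe_asset
    return risk_asset if prices[-1] > average else safe_asset


if __name__ == "__main__":
    rising = [100.0, 110.0, 120.0]
    falling = [120.0, 110.0, 100.0]
    print(target_asset(rising, window=3))
    print(target_asset(falling, window=3))
```

A rule this small fits comfortably in a smart contract. The strategies discussed above are the ones that don’t: they need long price histories and floating point arithmetic, which is where a datatrust program would come in.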

Generalizing, it seems that linking together datatrusts from disparate markets into a unified computing platform with a shared programming model could create a powerful new system that enables a host of new applications. If this new construct has a native bridge to Ethereum, it would become possible to create rich new DeFi protocols that can intelligently manage on-chain funds using off-chain data and computation. We can call this new class of programming constructs “AI smart contracts.” It’s not yet clear what such AI smart contracts could do, but as Ethereum’s example teaches us, the construction of a powerful new programming paradigm often opens up new applications that weren’t obvious up front. For example, it might be possible to create improved stablecoins that use more intelligent asset management to avoid the large overcollateralization of MakerDAO.

The construct that we’re describing here is essentially a new smart contract platform. There are a host of such smart contract platforms currently under development, including Ethereum 2.0, Dfinity, Polkadot, Tezos, Libra and more. It isn’t a stretch to say that the construction of new smart contract platforms is rapidly becoming commoditized. Why should anyone bother constructing a new smart contract system? I think there has to be a critical differentiator that justifies the effort undertaken to build a new platform. (Although as the number of smart contract platforms increases, the engineering will become more standardized, which will lower the barrier for new experiments.) In our case, the primary differentiator is native access to the data within data markets. The ability to make intelligent, data-driven AI smart contracts is something that won’t be easy with other smart contract platforms. In fact, building the type of platform I’m describing here could actually lower the barrier to integration of data markets with other smart contract platforms by serving as a test bed for AI smart contract development, in much the same way as Zcash has driven the adoption of privacy-preserving techniques in other ecosystems.

It’s also important to notice that there are a number of critical technological challenges that we would need to overcome to construct an AI smart contract platform. For one, AI computations are often very expensive, and it wouldn’t be feasible to duplicate smart contract execution across the entire network. And of course, data governance by the data market would prevent wide dissemination of data across the network. Well then, we might say that each data market could function as a shard, with the datatrusts within that shard needing to reach consensus. This still poses some verification challenges. With a single datatrust per shard, this model is efficient, but requires trusting the datatrust to behave correctly. With multiple datatrusts, computation would have to be duplicated. This is overhead, but perhaps not unreasonable. Another alternative might be to use verified computation techniques such as SNARKs or perhaps STARKs to provide certificates of datatrust fidelity. At present these questions remain open research challenges.
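The duplicated-computation option can be sketched as follows. This is a toy illustration in Python, not a consensus protocol: it assumes each datatrust in a shard runs the same program and reports a JSON-serializable result, and that the shard accepts a result once a quorum of datatrusts report the same digest. The names result_digest and agreed_result and the quorum parameter are hypothetical.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Optional, Sequence


def result_digest(result: Any) -> str:
    """SHA-256 digest of a canonical JSON encoding of one datatrust's result."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def agreed_result(results: Sequence[Any], quorum: int) -> Optional[Any]:
    """Return the result reported by at least `quorum` datatrusts, else None."""
    if not results:
        return None
    digests = [result_digest(r) for r in results]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes < quorum:
        return None
    # Any result carrying the winning digest will do; take the first one.
    return results[digests.index(digest)]


if __name__ == "__main__":
    # Three hypothetical datatrusts in one shard; the third one misbehaves.
    reports = [{"mean": 0.5}, {"mean": 0.5}, {"mean": 0.7}]
    print(agreed_result(reports, quorum=2))
```

One caveat with comparing digests is that floating point results are not guaranteed to match bit for bit across different hardware or libraries, so a real design would need deterministic arithmetic or a tolerance-based comparison. That is part of why the verification question remains open.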