Why are AI Smart Contract Systems Hard to Build?

In a previous post I sketched out how the addition of an “AI-smart contract layer” could enable powerful new uses of data markets. However, in that post, I only briefly discussed a core issue: Why hasn’t anyone done this already?

If you’re familiar with smart contract platforms, you’ll already know that contracts tend to be very compute-limited. Ethereum, for example, has a “gas limit” that caps the amount of computation that can be performed in one “block.” This means that complicated functions are, in practice, forbidden on Ethereum. There are very good reasons for having gas limits in place. The biggest one is that a smart contract imposes large externalities: if you store one byte in a smart contract function, every full node on Ethereum has to store that same byte, so the effective storage consumed across the network can be enormous. The complexity of the transactions executed on Ethereum thus far already means that syncing a full node trustlessly is a major hassle.
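To make that externality concrete, here’s a back-of-the-envelope sketch in Python. The node count is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope sketch of the storage externality on a fully
# replicated chain. The node count is an illustrative assumption.

FULL_NODES = 10_000   # assumed number of full nodes on the network
BYTES_STORED = 1      # one byte written by a smart contract function

# Every full node must store the byte, so the network-wide cost is the
# per-contract cost multiplied by the number of replicas.
effective_storage = BYTES_STORED * FULL_NODES

print(f"1 byte stored on-chain -> {effective_storage} bytes network-wide")
```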

Let’s suppose now that we enable “AI smart contracts,” in which contracts can perform large amounts of computation. If we continued to use the current Ethereum-style consensus, the load on miners would explode. AI computations tend to require large amounts of compute, and multiplied across the mining pool, that load could become very hard to bear. If more complex AI smart contracts were allowed, it seems the entire network could grind to a halt. How might we improve this situation?
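As a rough sketch of the problem (both numbers below are illustrative assumptions, chosen only to convey scale):

```python
# Sketch of why naive replication of AI workloads doesn't scale.
# Network size and inference cost are illustrative assumptions.

MINERS = 10_000          # assumed number of miners replicating each contract
INFERENCE_FLOPS = 1e9    # assumed cost of one model forward pass (~1 GFLOP)

# Under full replication, every miner re-executes the same inference,
# so the network pays the cost once per miner.
network_flops = MINERS * INFERENCE_FLOPS

print(f"One inference costs the network {network_flops:.0e} FLOPs "
      f"instead of {INFERENCE_FLOPS:.0e}")
```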

Ethereum 2.0 and Polkadot are taking similar approaches. The basic notion is to split “validators” (the proof-of-stake analogue of “miners” in proof-of-work systems) across separate shards. (Validators aren’t permanently assigned to a shard, but rather rotate according to some scheme.) Then, only the validators on a given shard have to duplicate the computation of a smart contract in order to verify it. This allows for a linear scaling effect. Let’s suppose we have 1000 shards, each with 10 validators. Where previously all 10,000 validators would have had to replicate a smart contract’s computation, now only the 10 validators on its shard need to. This will likely lead to a large boost in the complexity of smart contracts that can be executed in practice. Ethereum 2.0 also has a new system of “execution engines” in development which will allow contracts to be written in WebAssembly (WASM), and Polkadot has a similar framework. This will likely facilitate the use of AI/ML primitives, especially since frameworks like TensorFlow.js are making deep learning on the web more common.
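Here’s the same arithmetic as a small sketch, using the figures from the example above:

```python
# Sketch of the linear scaling that sharding buys, using the figures
# from the example above (1000 shards, 10 validators each).

SHARDS = 1000
VALIDATORS_PER_SHARD = 10
TOTAL_VALIDATORS = SHARDS * VALIDATORS_PER_SHARD   # 10,000

# Without sharding, every validator replicates every contract call.
replication_before = TOTAL_VALIDATORS
# With sharding, only the validators on one shard replicate it.
replication_after = VALIDATORS_PER_SHARD

print(f"replication factor: {replication_before} -> {replication_after} "
      f"({replication_before // replication_after}x reduction)")
```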

At the same time, I don’t think sharding is a sufficient long-term solution. To see why, note that the computational requirements of a modern deep learning or reinforcement learning system are tremendously larger than those of an average smart contract. So even if sharding gives a 1000x boost, that will likely still be insufficient for complex AI applications. (To be more precise, I think inference will become feasible on a sharded smart contract system, but training a model will still be beyond reach for most use cases.) Where does this leave us then?
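As a rough illustration, here’s a sketch comparing an assumed per-contract compute budget against inference and training workloads. All of the FLOP counts are made-up orders of magnitude, chosen only to convey the gap in scale:

```python
# Sketch of why a ~1000x sharding boost helps inference but not training.
# All FLOP counts are assumed orders of magnitude, not measurements.

SHARDING_BOOST = 1_000     # from the 1000-shard example above
CONTRACT_BUDGET = 1e6      # assumed compute budget of a typical contract
INFERENCE_FLOPS = 1e9      # assumed: one forward pass, ~1 GFLOP
TRAINING_FLOPS = 1e18      # assumed: a full training run, ~1 exaFLOP

# What the boost makes affordable, relative to the contract budget.
affordable = CONTRACT_BUDGET * SHARDING_BOOST   # ~1e9 FLOPs per contract

print(f"inference feasible: {INFERENCE_FLOPS <= affordable}")   # True
print(f"training feasible:  {TRAINING_FLOPS <= affordable}")    # False
```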

I think that in order for us to have feasible AI smart contracts, we need to bet on a verified computation technique. There are a number out there: STARKs, SONICs, PLONKs, DARKs, etc. However, it remains tremendously inefficient to use verified computing for general computations. Although the Pepper project has made a lot of progress, years of effort will be required before verified computation techniques can be applied to AI algorithms.

You might be curious, as you’re reading along, what the overhead of a verified computation technique is. It depends heavily on the choice of technique, but if you’re using something like STARKs, the overhead is logarithmic. That is, if the run-time of an algorithm is O(f(n)), then the cost of generating the STARK is O(f(n) log f(n)). This is considerable overhead, but I suspect that it might be possible to tighten these bounds in practice. You might also ask: why not use a trusted execution environment (TEE) to run large AI computations instead? The main challenge is that TEEs aren’t entirely trustless. You have to trust a certificate issuer such as Intel, or be confident that a side-channel attack can’t be run against the physical hardware. These assumptions are onerous, and likely don’t hold true in the real world, where sophisticated adversaries are capable of getting physical access to cloud hardware when necessary. It’s better to trust in the provable guarantees of software.
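As a quick illustration of what a logarithmic multiplicative factor means in practice, here’s a sketch that assumes the prover’s overhead is roughly log2 of the execution length. The constant factors, which matter a great deal in real proof systems, are deliberately ignored:

```python
import math

# Sketch of the multiplicative overhead of STARK proof generation,
# assuming prover cost O(f(n) * log(f(n))) as described above.
# Constant factors are ignored; this is purely illustrative.

def prover_overhead(runtime_steps: float) -> float:
    """Ratio of proving cost to native execution cost: ~log2(f(n))."""
    return math.log2(runtime_steps)

for steps in (1e6, 1e9, 1e12):
    print(f"f(n) = {steps:.0e} steps -> ~{prover_overhead(steps):.0f}x overhead")
```

Even at a trillion execution steps, the asymptotic multiplier is only ~40x, which is why the constants hidden in the big-O, not the logarithm itself, are the real battleground.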

For these reasons, I think the path towards AI smart contracts will require solidifying the software foundations for verified computation. We’ve spent a good chunk of time fooling around with a STARKs library ourselves, and we hope to see many more efforts from the community take root in the coming years.