This week we’re revising the Tech Tree to reflect some major new milestones for Ethereum 1.x R&D that are not quite a complete realization of Stateless Ethereum, but are much more reasonably attainable in the mid-term. The most significant addition to the tech tree is Alexey’s reGenesis proposal. This is far from a well-specified upgrade, but the general sentiment from R&D is that reGenesis offers a less dramatic yet much more attainable step towards the ultimate goal of the “fully stateless” vision. In many ways complementary to reGenesis is a static state network that would help distribute state snapshots and historical chain data in a bittorrent-style DHT-based network. At the same time, more near-term improvements like code merkleization and a binary trie representation of state are getting closer and closer to being EIP-ready. Below, I’ll explain and clarify the changes that have been made, and link to the relevant discussions if you’d like to dive deeper on any particular feature.
Binary Trie
While Ethereum currently uses a hexary Merkle-Patricia Trie to encode state, there are substantial efficiency gains to be had by switching to a binary format, particularly in the anticipated size of witnesses. A complete re-encoding of Ethereum’s state requires both a specification of the new format and a clear strategy for the transition. Finally, it needs to be decided whether or not smart contract code will also be merkleized, and whether that should be incorporated into the binary trie transition or shipped as a standalone change.
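To see roughly where the witness savings come from, here is a back-of-the-envelope sketch. It assumes an idealized, perfectly balanced trie with 32-byte hashes, and it ignores extension nodes, node encoding overhead, and the sharing of nodes between proofs in a block witness. The function names are made up for illustration. A Merkle proof has to supply the sibling hashes at every level: up to 15 per level in a hexary trie, but only 1 per level in a binary trie.

```python
HASH_BYTES = 32  # size of one hash in the proof (assumption: 32-byte hashes)


def trie_depth(num_leaves: int, branching: int) -> int:
    """Depth of a perfectly balanced trie, computed with integers only."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= branching
        depth += 1
    return depth


def sibling_hashes_per_proof(num_leaves: int, branching: int) -> int:
    """Sibling hashes needed to prove one leaf: (branching - 1) per level."""
    return trie_depth(num_leaves, branching) * (branching - 1)


def proof_bytes(num_leaves: int, branching: int) -> int:
    """Approximate proof size in bytes, counting sibling hashes only."""
    return sibling_hashes_per_proof(num_leaves, branching) * HASH_BYTES


if __name__ == "__main__":
    leaves = 2 ** 24  # hypothetical number of accounts
    print("hexary:", proof_bytes(leaves, 16), "bytes")
    print("binary:", proof_bytes(leaves, 2), "bytes")
```

Under these simplifying assumptions, a single proof over 2^24 leaves needs 90 sibling hashes in the hexary layout and 24 in the binary layout, a factor of 3.75. Real witnesses will differ, since the actual trie is not perfectly balanced and block witnesses share many nodes.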
Binary Trie Format
The general idea of a binary trie is a bit simpler (pun intended :)) than Ethereum’s current hexary trie structure. Instead of having one of 16 possible paths to walk from the root of the trie down towards child nodes, a binary trie has 2. With a complete re-specification of the state trie comes additional opportunity to improve upon well-established inefficiencies that have made themselves known now that Ethereum has been in operation for more than 5 years. In particular, it might be an opportunity to make the state much more amenable to the real-world performance challenges of database encoding (outlined in a previous article on state growth).
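As a small illustration of the 16-way versus 2-way walk described above, the sketch below turns the same key into a hexary path (one nibble per step) and a binary path (one bit per step). It only covers path derivation, not the node format or hashing rules, which are still being specified. The function names are hypothetical.

```python
from typing import List


def hexary_path(key: bytes) -> List[int]:
    """Split a key into nibbles: each step picks one of 16 children."""
    path = []
    for byte in key:
        path.append(byte >> 4)    # high nibble
        path.append(byte & 0x0F)  # low nibble
    return path


def binary_path(key: bytes) -> List[int]:
    """Split a key into bits (most significant first): each step picks one of 2 children."""
    return [(byte >> shift) & 1 for byte in key for shift in range(7, -1, -1)]


if __name__ == "__main__":
    key = bytes([0xA5])
    print(hexary_path(key))  # two steps through a 16-way trie
    print(binary_path(key))  # eight steps through a 2-way trie
```

A 32-byte key therefore gives a path of 64 steps in the hexary trie and 256 steps in the binary trie. The binary path is longer, but each step needs far fewer sibling hashes in a proof.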
The discussion on a formal binary trie specification and merkleization rules can be found on ethresearch.
Binary Trie Transition
It’s not just the destination (binary trie format) that’s important, but the journey itself! In an ideal transition there would be no interruption to transaction processing across the network, which means that clients will need to build the new binary trie at the same time as handling new blocks rolling in every 15 seconds. The transition strategy that continues to look the most promising is dubbed the overlay method, which is based partially on geth’s new snapshotting sync protocol. In short, new state changes will be added to the existing (hexary) trie in a binary format, making a sort of binary/hexary hybrid during the transition. The untouched state is converted as a background process. Once the conversion is complete, the two layers get flattened into a single binary trie.
It’s important to note that the binary transition is one context in which client diversity is very important. Every client will need to either implement its own version of the transition or rely on other clients to convert and wait for the new trie on the other side of conversion. This will definitely be a ‘measure twice, cut once’ sort of situation, with all client teams working together to implement, test, and coordinate the switchover. It is possible that in the interest of safety and security, the network will need to briefly suspend service (e.g. mine a few empty blocks) over the course of the transition, but any specific plan is too far out to predict at this time.
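The overlay idea can be sketched with a toy model. This is not how any client implements it: plain dictionaries stand in for the hexary and binary tries, deletions are not modeled, and the class and method names are invented for illustration. It only shows the three moving parts from the paragraph above: writes land in the new layer, reads fall through to the old layer, and a background process migrates whatever has not been touched.

```python
class OverlayState:
    """Toy model of a binary overlay on top of an old hexary base layer."""

    def __init__(self, hexary_base: dict):
        self.base = dict(hexary_base)  # old-format state, never written to again
        self.overlay = {}              # new-format state, receives all new writes

    def get(self, key):
        # Reads check the new layer first, then fall through to the old one.
        if key in self.overlay:
            return self.overlay[key]
        return self.base.get(key)

    def put(self, key, value):
        # Every state change during the transition goes to the new layer.
        self.overlay[key] = value

    def convert_step(self, batch_size: int = 1) -> int:
        """Background conversion: migrate up to batch_size untouched entries."""
        moved = 0
        for key in list(self.base):
            if moved >= batch_size:
                break
            value = self.base.pop(key)
            # Never clobber a value written after the transition started.
            self.overlay.setdefault(key, value)
            moved += 1
        return moved

    def is_converted(self) -> bool:
        # Once the base is empty, the overlay alone is the full state.
        return not self.base


if __name__ == "__main__":
    state = OverlayState({"alice": 1, "bob": 2})
    state.put("alice", 10)            # a new block touches alice
    while state.convert_step(batch_size=1):
        pass                          # background conversion runs to completion
    print(state.is_converted(), state.get("alice"), state.get("bob"))
```

In this model the final "flattening" is trivial because the base layer simply empties out. In a real client the flattening step and the handling of deleted accounts would need considerably more care.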
Code Merkleization
Smart contract code makes up a significant portion of the Ethereum state trie (around 1 GB of the ~50 GB of state). A witness for any smart contract interaction will necessarily have to provide the code it’s interacting with to calculate a codeHash, and that could be quite a lot of extra data. Code merkleization is a means of splitting up contract code into smaller chunks, and replacing codeHash with the root of another Merkle trie. Doing so would allow a witness to replace potentially large portions of smart contract code with reference hashes, shaving off crucial kilobytes of witness data.
There are a few approaches to code merkleization schemes, which range from chunking universally (for example, into 64-byte pieces) on the simple side to more complex methods like static analysis based on Solidity’s functionId or JUMPDEST instructions. The optimal strategy for code merkleization will ultimately rely on what seems to work best with real data collected from mainnet.
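Here is a minimal sketch of the simplest option, fixed-size chunking, with a Merkle root computed over the chunks. Several things here are assumptions made for illustration rather than part of any proposal: SHA-256 from the standard library stands in for Ethereum’s Keccak-256, an odd node is paired with a copy of itself, and the function names are invented. Real schemes also attach metadata to chunks (for example, to mark valid jump destinations), which is omitted.

```python
import hashlib
from typing import List

CHUNK_SIZE = 64  # bytes per chunk, the example size mentioned above


def h(data: bytes) -> bytes:
    # Stand-in hash: SHA-256, NOT the Keccak-256 that Ethereum actually uses.
    return hashlib.sha256(data).digest()


def chunk_code(code: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split bytecode into fixed-size chunks; the last chunk may be shorter."""
    return [code[i:i + size] for i in range(0, len(code), size)]


def merkle_root(chunks: List[bytes]) -> bytes:
    """Binary Merkle root over chunk hashes (odd nodes are duplicated)."""
    if not chunks:
        return h(b"")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: pair an odd node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    code = bytes(range(200))           # placeholder "bytecode"
    chunks = chunk_code(code)
    print(len(chunks), merkle_root(chunks).hex())
```

With a root like this in place of codeHash, a witness would only need to include the chunks that execution actually touches, plus the sibling hashes that connect them to the root.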
reGenesis
The best place to get a handle on the reGenesis proposal is this explanation by @mandrigin or the full proposal by @realLedgerwatch, but the TL;DR is that reGenesis is essentially “spring cleaning for the blockchain”. The full state would be conceptually divided into an ‘active’ and an ‘inactive’ state. Periodically, the entire ‘active’ state would be de-activated and new transactions would start to build an active state again from almost nothing (hence the name “reGenesis”). If a transaction needed an old part of state, it would provide a witness very similar to what would be required for Stateless Ethereum: a Merkle proof proving that the state change is consistent with some piece of inactive state. If a transaction touches an ‘inactive’ portion of the state, it automatically elevates it to ‘active’ (whether or not the transaction is successful) where it remains until the next reGenesis event. This has the nice property of creating some of the economic bounds on state usage that state rent had without actually deleting any state, and it allows a transaction sender who is unable to generate a witness to just blindly keep trying a transaction until everything it touches is ‘active’ again.
The fun part about reGenesis is that it gets Ethereum much closer to the ultimate goal of Stateless, but sidesteps some of the largest challenges with Statelessness, i.e. how witness gas accounting works during EVM execution. It also gets some version of transaction witnesses moving around the network, allowing for leaner, lighter clients and more opportunity for dapp developers to get used to the stateless paradigm and witness production. “True” Statelessness after reGenesis would then be a matter of degree: Stateless Ethereum is really just reGenesis after each and every block.
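The active/inactive bookkeeping can be sketched with a toy model. This is an interpretation for illustration only, since the proposal is not yet well specified. Dictionaries stand in for state, a witness is modeled as a plain key-to-value mapping checked by equality (standing in for Merkle proof verification), and the class and method names are invented.

```python
class ReGenesisState:
    """Toy model of reGenesis: state is split into active and inactive parts."""

    def __init__(self, state: dict):
        self.inactive = dict(state)  # everything starts out inactive
        self.active = {}

    def regenesis(self):
        """Periodic event: deactivate everything, the active set restarts empty."""
        self.inactive.update(self.active)
        self.active.clear()

    def apply(self, touched_keys, witness=None) -> bool:
        """Return True if every touched key is available to the transaction.

        Inactive keys covered by a valid witness are activated even when the
        transaction as a whole fails because other keys are still missing.
        """
        witness = witness or {}
        missing = []
        for key in touched_keys:
            if key in self.active:
                continue
            proven = key in witness and self.inactive.get(key) == witness[key]
            if key in self.inactive and proven:
                self.active[key] = self.inactive.pop(key)
            else:
                missing.append(key)
        return not missing


if __name__ == "__main__":
    state = ReGenesisState({"alice": 5, "bob": 7})
    print(state.apply(["alice", "bob"], {"alice": 5}))  # fails, but activates alice
    print(state.apply(["alice", "bob"], {"bob": 7}))    # succeeds on retry
    state.regenesis()
    print(state.active)                                 # empty again
```

In this model, calling regenesis() after every block would leave nothing active between blocks, so every transaction would need a full witness. That corresponds to the “matter of degree” relationship to Stateless Ethereum described above.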
State Network
A better network protocol has been a ‘side-quest’ on the tech tree from the beginning, but with the addition of reGenesis to the scope of Stateless Ethereum, finding alternative network primitives for sharing Ethereum chain data (including state) now seems to fit a lot better into the main quest. Ethereum’s current network protocol is a monolith, when in fact there are several distinct types of data that could be shared using different ‘sub-networks’ optimized for different things.
This was discussed on earlier Stateless calls as the “Three Networks”, with a DHT-based network able to more effectively serve some of the data that doesn’t change from moment to moment. With the introduction of reGenesis, the ‘inactive’ state would fit into this category of unchanging data, and could in theory be served by a bittorrent-style swarming network instead of piece-by-piece from a fully synced client as is currently done.
A network passing around the un-changing state since the last reGenesis event would be a static state network, and could be built by extending the new Discovery v5.1 spec in the devp2p library (Ethereum’s networking protocol). Previous proposals such as Merry-go-Round sync and the (more mature) SNAP protocol for syncing active state would still be valuable steps toward a fully distributed dynamic state network for clients trying to rapidly sync the full state.
Wrapping up
A more condensed and technical version of every leaf in the Stateless Tech Tree (not just the updated ones) is available on the Stateless Ethereum specs repo, and active discussions on all of the topics covered here are in the Eth1x/2 R&D Discord – please ask for an invite on ethresear.ch if you’d like to join. As always, tweet @gichiba or @JHancock for feedback, questions, and suggestions for new topics.