Dodging a bullet: Ethereum State Problems

This blog post officially discloses a severe threat against the Ethereum platform, which was a clear and present danger up until the Berlin hardfork.

State

Let’s begin with some background on Ethereum and State.

The Ethereum state consists of a Patricia-Merkle trie, a prefix tree. This post won’t go into it in too much detail; suffice it to say that as the state grows, the branches in this tree become denser. Each added account is another leaf. Between the root of the tree and the leaf itself, there are a number of “intermediate” nodes.

In order to look up a given account, or “leaf”, in this huge tree, somewhere on the order of 6-9 hashes need to be resolved: from the root, via intermediate nodes, to finally the last hash, which leads to the data we were looking for.

In plain terms: whenever a trie lookup is performed to find an account, 6-9 resolve operations are performed. Each resolve operation is one database lookup, and each database lookup may be any number of actual disk operations. The number of disk operations is difficult to estimate, but since the trie keys are cryptographic hashes (collision resistant), the keys are “random”, hitting the exact worst case for any database.
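To get a feel for where a figure like 6-9 comes from, here is a rough Python sketch (our own illustration, not how any client stores its trie). It hashes a set of synthetic addresses and estimates how many node reads it takes to reach each leaf of a simplified hexary trie. Two assumptions: `hashlib.sha3_256` stands in for Ethereum's Keccak-256 (they differ only in padding, which does not affect how random the paths look), and extension nodes are ignored, so every level on the path counts as one read.

```python
import hashlib


def hashed_path(address: bytes) -> str:
    # State-trie paths are hashes of the key, so they look uniformly random.
    # hashlib.sha3_256 is a stand-in for Ethereum's Keccak-256.
    return hashlib.sha3_256(address).hexdigest()  # 64 hex nibbles


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading nibbles two paths share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reads_per_lookup(num_accounts: int) -> list[int]:
    """Estimate node reads needed to reach each leaf of a toy hexary trie.

    A leaf whose path shares at most k leading nibbles with any other path
    sits below up to k + 1 interior nodes (root included); reading the leaf
    node itself adds one more. Extension nodes are ignored.
    """
    paths = sorted(hashed_path(i.to_bytes(20, "big")) for i in range(num_accounts))
    reads = []
    for i, path in enumerate(paths):
        # In sorted order, the longest shared prefix is always with a neighbour.
        k = 0
        if i > 0:
            k = max(k, common_prefix_len(path, paths[i - 1]))
        if i + 1 < len(paths):
            k = max(k, common_prefix_len(path, paths[i + 1]))
        reads.append(k + 2)
    return reads


for n in (1_000, 100_000):
    r = reads_per_lookup(n)
    print(f"{n:>7} accounts: {sum(r) / len(r):.2f} reads per lookup on average")
```

Because the depth grows roughly with log16 of the number of keys, each 16x growth in state adds about one more read to every lookup, and with random keys none of those reads benefit from locality.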

As Ethereum has grown, it has been necessary to increase the gas costs of operations that access the trie. This was done in Tangerine Whistle at block 2,463,000 in October 2016, which included EIP 150. EIP 150 aggressively raised certain gas costs and introduced a whole slew of changes to protect against DoS attacks, in the wake of the so-called “Shanghai attacks”.

Another such raise was performed in the Istanbul upgrade, at block 9,069,000 in December 2019. In this upgrade, EIP 1884 was activated.

EIP-1884 introduced the following changes:

  • SLOAD went from 200 to 800 gas.
  • BALANCE went from 400 to 700 gas, and a cheaper SELFBALANCE opcode was added.
  • EXTCODEHASH went from 400 to 700 gas.

The problem(s)

In March 2019, Martin Swende was measuring EVM opcode performance, an investigation that later led to the creation of EIP-1884. A few months before EIP-1884 went live, the paper “Broken Metre” was published (September 2019).

Two Ethereum security researchers, Hubert Ritzdorf and Matthias Egli, teamed up with one of the authors of the paper, Daniel Perez, and ‘weaponized’ an exploit, which they submitted to the Ethereum bug bounty on October 4, 2019.

We recommend reading the submission in full; it’s a well-written report.

That same day, developers from Geth, Parity and Aleth were informed about the submission on a channel dedicated to cross-client security.

The essence of the exploit is to trigger random trie lookups. A very simple variant would be:

	jumpdest     ; jump label, start of loop
	gas          ; get a 'random' value on the stack
	extcodesize  ; trigger trie lookup
	pop          ; ignore the extcodesize result
	push1 0x00   ; jump label dest
	jump         ; jump back to start
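For readers who want to see the loop as bytecode, here is a small Python sketch that assembles the mnemonics above and estimates how many trie lookups a single 10M gas call buys. The opcode bytes and gas costs follow the fee schedule as we understand it (EXTCODESIZE at 700 gas after EIP-150, and 2600 gas for a cold account access after EIP-2929); the helper names are our own, and call overhead, memory costs and intrinsic transaction gas are ignored.

```python
# Opcode bytes for the loop shown above.
OPCODES = {
    "jumpdest": 0x5B,
    "gas": 0x5A,
    "extcodesize": 0x3B,
    "pop": 0x50,
    "push1": 0x60,
    "jump": 0x56,
}

# Static gas of everything in the loop except the state access itself.
LOOP_OVERHEAD_GAS = {"jumpdest": 1, "gas": 2, "pop": 2, "push1": 3, "jump": 8}

LOOP = [
    ("jumpdest",),      # offset 0: loop start
    ("gas",),           # remaining gas differs every iteration -> new "address"
    ("extcodesize",),   # forces a trie lookup for that address
    ("pop",),
    ("push1", 0x00),    # jump target: offset 0
    ("jump",),
]


def assemble(program) -> bytes:
    """Turn (mnemonic, *immediate_bytes) tuples into EVM bytecode."""
    code = bytearray()
    for op, *immediates in program:
        code.append(OPCODES[op])
        code.extend(immediates)
    return bytes(code)


def lookups_per_call(gas_limit: int, access_cost: int) -> int:
    """Upper bound on loop iterations (= state lookups) within gas_limit."""
    per_iteration = sum(LOOP_OVERHEAD_GAS.values()) + access_cost
    return gas_limit // per_iteration


print(assemble(LOOP).hex())  # 5b5a3b50600056
for label, cost in [("EXTCODEHASH @ 400", 400),
                    ("EXTCODESIZE @ 700", 700),
                    ("cold access @ 2600", 2600)]:
    print(f"{label}: {lookups_per_call(10_000_000, cost):,} lookups per 10M gas")
```

Since `gas` returns a smaller value on every pass, each iteration queries a different account, which is exactly the uncached, random-key access pattern described earlier.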

In their report, the researchers executed this payload via eth_call against nodes synced to mainnet; these were their results when executed with 10M gas:

  • 10M gas exploit using EXTCODEHASH (at 400 gas)

  • 10M gas exploit using EXTCODESIZE (at 700 gas)

The changes in EIP 1884 were clearly reducing the effects of the attack, but they were nowhere near sufficient.

This was right before Devcon in Osaka. During Devcon, knowledge of the problem was shared among the mainnet client developers. We also met up with Hubert and Matthias, as well as Greg Markou (from Chainsafe, who were working on ETC). ETC developers had also received the report.

As 2019 was drawing to a close, we knew that we had larger problems than we had previously anticipated: malicious transactions could lead to block times in the minute range. To add to the woes, the developer community was already unhappy about EIP-1884, which had broken certain contract flows, and users and miners alike were sorely itching for higher block gas limits.

Furthermore, a mere two months later, in December 2019, Parity Ethereum announced its departure from the scene, and OpenEthereum took over maintenance of the codebase.

A new client coordination channel was created, where Geth, Nethermind, OpenEthereum and Besu developers continued to coordinate.

The solution(s)

We realised that we would have to take a two-pronged approach to these problems. One approach would be to work on the Ethereum protocol and somehow solve the problem at the protocol layer, preferably without breaking contracts and without penalizing ‘good’ behaviour, while still managing to prevent attacks.

The second approach would be through software engineering, by changing the data models and structures within the clients.

Protocol work

The first iteration of how to handle these types of attacks at the protocol layer was officially published as EIP 2583 in February 2020. The idea behind it is simply to add a penalty every time a trie lookup causes a miss.

However, Peter found a workaround for this idea, the ‘shielded relay’ attack, which places an upper bound (around 800 gas) on how large such a penalty can effectively be.

The issue with penalties for misses is that the lookup needs to happen first, to determine that a penalty must be applied. But if there is not enough gas left for the penalty, an unpaid consumption has already been performed. Even though that does result in a throw, these state reads can be wrapped in nested calls, allowing the outer caller to keep repeating the attack without paying the (full) penalty.
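The following toy model in Python illustrates why a miss penalty cannot be enforced beyond the cost of the wrapping call. It is our own simplification, not EIP-2583's actual rules: the function names, the assumption that a failed sub-call burns exactly the gas it was forwarded, and the example numbers (700 gas per lookup, 800 gas of call overhead, picked to echo the ~800 bound) are all hypothetical.

```python
class OutOfGas(Exception):
    """Raised when a (sub-)call frame runs out of gas; the frame throws."""


def read_with_miss_penalty(gas: int, lookup_cost: int, penalty: int, is_miss: bool) -> int:
    """One state read in a frame holding `gas`; returns the gas left over."""
    if gas < lookup_cost:
        raise OutOfGas
    gas -= lookup_cost
    # The trie lookup has now been performed: only its result tells us
    # whether this was a miss, so the disk work is already done here.
    if is_miss:
        if gas < penalty:
            raise OutOfGas  # penalty unpaid, yet the expensive read happened
        gas -= penalty
    return gas


def shielded_cost(lookup_cost: int, penalty: int, call_overhead: int) -> int:
    """Gas the outer frame spends per miss when each read runs in a sub-call."""
    forwarded = lookup_cost  # enough for the read, never for the penalty
    try:
        left = read_with_miss_penalty(forwarded, lookup_cost, penalty, is_miss=True)
    except OutOfGas:
        left = 0  # a failed sub-call consumes all gas it was given
    # The outer frame did not throw, so it can simply loop and repeat.
    return call_overhead + forwarded - left


def effective_penalty(lookup_cost: int, penalty: int, call_overhead: int) -> int:
    """Extra gas per miss the attacker actually pays, choosing the cheaper route."""
    direct = lookup_cost + penalty
    return min(direct, shielded_cost(lookup_cost, penalty, call_overhead)) - lookup_cost


for p in (200, 800, 5_000, 50_000):
    print(p, effective_penalty(lookup_cost=700, penalty=p, call_overhead=800))
```

However large the nominal penalty, the effective penalty in this model saturates at `call_overhead`; in practice the bound depends on the call costs in force at the time.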

Because of that, the EIP was abandoned while we searched for a better alternative.

  • Alexey Akhunov explored the idea of Oil: a secondary source of “gas” that would be intrinsically different from gas, in that it would be invisible to the execution layer and could cause transaction-global reverts.
  • Martin wrote up a similar proposal, Karma, in May 2020.

While we were iterating on these various schemes, Vitalik Buterin proposed simply increasing the gas costs and maintaining access lists. In August 2020, Martin and Vitalik started iterating on what was to become EIP-2929 and its companion EIP, EIP-2930.

EIP-2929 effectively solved a lot of the former issues.

  • Unlike EIP-1884, which raised costs unconditionally, it raises costs only for state that has not already been accessed. This leads to a mere sub-percent increase in net costs.
  • Together with EIP-2930, it does not break any contract flows.
  • It can be tuned further with higher gas costs, without breaking things.
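A minimal Python sketch of the EIP-2929 idea: the transaction keeps a set of accessed addresses and storage slots, and only the first (“cold”) access pays the higher price. The constants are the EIP-2929 and EIP-2930 values as we understand them; the class and function names are hypothetical, and the sketch leaves out details such as rolling the sets back when a sub-call reverts and the addresses that are pre-warmed by default.

```python
COLD_ACCOUNT_ACCESS_COST = 2600      # EIP-2929
COLD_SLOAD_COST = 2100               # EIP-2929
WARM_STORAGE_READ_COST = 100         # EIP-2929
ACCESS_LIST_ADDRESS_COST = 2400      # EIP-2930, paid up front per listed address
ACCESS_LIST_STORAGE_KEY_COST = 1900  # EIP-2930, paid up front per listed slot


class AccessTracker:
    """Per-transaction sets of touched accounts and storage slots."""

    def __init__(self, access_list=None):
        # access_list: {address: [slot, ...]} declared by the transaction
        access_list = access_list or {}
        self.addresses = set(access_list)
        self.slots = {(a, s) for a, slots in access_list.items() for s in slots}

    def account_access(self, address: bytes) -> int:
        """Gas for an account read such as BALANCE or EXTCODESIZE."""
        if address in self.addresses:
            return WARM_STORAGE_READ_COST
        self.addresses.add(address)
        return COLD_ACCOUNT_ACCESS_COST

    def sload(self, address: bytes, slot: int) -> int:
        """Gas for reading one storage slot."""
        key = (address, slot)
        if key in self.slots:
            return WARM_STORAGE_READ_COST
        self.slots.add(key)
        return COLD_SLOAD_COST


def access_list_cost(access_list) -> int:
    """Up-front gas for declaring an access list."""
    n_slots = sum(len(slots) for slots in access_list.values())
    return len(access_list) * ACCESS_LIST_ADDRESS_COST + n_slots * ACCESS_LIST_STORAGE_KEY_COST


token = bytes.fromhex("11" * 20)
tracker = AccessTracker()
# A contract reading the same slot ten times: one cold read, nine warm ones.
print(sum(tracker.sload(token, 0) for _ in range(10)))  # 3000
```

An ordinary contract that revisits the same state mostly pays the warm price, whereas the attack loop touches a new account on every iteration and so pays the cold price every time.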

On the 15th of April 2021, they both went live with the Berlin upgrade.

Development work

On the client side, Peter’s attempt to solve this problem, in October 2019, was dynamic state snapshots.

A snapshot is a secondary data structure for storing the Ethereum state in a flat format, which can be built fully online, during the live operation of a Geth node. The benefit of the snapshot is that it acts as an acceleration structure for state accesses:

  • Instead of doing O(log N) disk reads (× LevelDB overhead) to access an account or storage slot, the snapshot can provide direct O(1) access time (× LevelDB overhead).
  • The snapshot supports account and storage iteration at O(1) complexity per entry, which lets remote nodes retrieve sequential state data far more cheaply than before.
  • The presence of the snapshot also enables more exotic use cases such as offline-pruning the state trie, or migrating to other data formats.
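To illustrate the difference in access pattern, here is a toy Python model of a flat snapshot: account data is stored directly under the hashed address, so a lookup is one key-value read regardless of state size, and a sorted key index lets a range of accounts be read sequentially. This is a deliberately simplified sketch, not Geth's snapshot design; the class and method names are hypothetical, and `sha3_256` again stands in for Keccak-256.

```python
import bisect
import hashlib


class FlatSnapshot:
    """Toy flat state snapshot: account hash -> account data."""

    def __init__(self):
        self._data = {}   # simulates the flat key-value table
        self._keys = []   # sorted account hashes, for range iteration
        self.reads = 0    # key-value reads performed

    @staticmethod
    def _key(address: bytes) -> bytes:
        return hashlib.sha3_256(address).digest()  # stand-in for Keccak-256

    def put(self, address: bytes, account: bytes) -> None:
        key = self._key(address)
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = account

    def get(self, address: bytes):
        """Direct lookup: one read, no walk through intermediate trie nodes."""
        self.reads += 1
        return self._data.get(self._key(address))

    def iterate(self, start: bytes, limit: int):
        """Yield up to `limit` (hash, account) pairs with hash >= start, in order."""
        i = bisect.bisect_left(self._keys, start)
        for key in self._keys[i:i + limit]:
            self.reads += 1  # one read per entry
            yield key, self._data[key]


snap = FlatSnapshot()
for i in range(10_000):
    snap.put(i.to_bytes(20, "big"), b"account-%d" % i)

snap.reads = 0
snap.get((42).to_bytes(20, "big"))
print(snap.reads)  # 1
```

The trade-off mirrors the one described in the text: the account data now exists both in the trie and in this flat table.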

The downside of the snapshot is that the raw account and storage data is essentially duplicated. In the case of mainnet, this means an extra 25GB of SSD space used.

The dynamic snapshot idea had already been started in mid-2019, aiming primarily to be an enabler for snap sync. At the time, there were a number of “big projects” that the geth team was working on:

  • Offline state pruning
  • Dynamic snapshots + snap sync
  • LES state distribution via sharded state

However, we decided to fully prioritize snapshots, postponing the other projects for the time being. This work laid the groundwork for what would later become the snap/1 sync algorithm. It was merged in March 2020.

With the “dynamic snapshot” functionality released into the wild, we had a bit of breathing room. Had the Ethereum network been hit by an attack, it would have been painful, yes, but it would at least have been possible to tell users to enable the snapshot. Generating the full snapshot would take a lot of time, and there was no way to sync snapshots yet, but the network could at least continue to operate.

Tying up the threads

In March-April 2021, the snap/1 protocol was rolled out in geth, making it possible to sync using the new snapshot-based algorithm. While it is still not the default sync mode, it is one (important) step towards making snapshots not only useful as attack protection, but also a major improvement for users.

On the protocol side, the Berlin upgrade went live in April 2021.

Below are some benchmarks from our AWS monitoring environment:

  • Pre-Berlin, no snapshots, 25M gas: 14.3s
  • Pre-Berlin, with snapshots, 25M gas: 1.5s
  • Post-Berlin, no snapshots, 25M gas: ~3.1s
  • Post-Berlin, with snapshots, 25M gas: ~0.3s

These (rough) numbers indicate that Berlin reduced the efficiency of the attack by about 5x and snapshots reduce it by about 10x, for a total reduction in impact of roughly 50x.
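The factors can be checked directly against the four benchmark timings; a few lines of Python make the arithmetic explicit (the dictionary simply restates the numbers listed above).

```python
# Execution time of the 25M gas attack block, in seconds (from the benchmarks).
timings = {
    ("pre-berlin", "no-snapshot"): 14.3,
    ("pre-berlin", "snapshot"): 1.5,
    ("post-berlin", "no-snapshot"): 3.1,
    ("post-berlin", "snapshot"): 0.3,
}

baseline = timings[("pre-berlin", "no-snapshot")]
berlin_only = baseline / timings[("post-berlin", "no-snapshot")]  # ~4.6x
snapshot_only = baseline / timings[("pre-berlin", "snapshot")]    # ~9.5x
combined = baseline / timings[("post-berlin", "snapshot")]        # ~47.7x

print(f"Berlin: {berlin_only:.1f}x, snapshots: {snapshot_only:.1f}x, both: {combined:.1f}x")
```

The combined factor (about 48x) is close to the product of the two individual ones (about 44x), so the two mitigations compound roughly multiplicatively.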

We estimate that currently, on Mainnet (15M gas), it would be possible to create blocks that take 2.5-3s to execute on a geth node without snapshots. This number will continue to deteriorate for non-snapshot nodes as the state grows.

If refunds are used to increase the effective gas usage within a block, this can be further exacerbated by a factor of (at most) 2x. With EIP 1559, the block gas limit will have a higher elasticity, allowing a further 2x (the ELASTICITY_MULTIPLIER) in temporary bursts.

As for the feasibility of executing this attack: the cost for an attacker to buy a full block would be on the order of a few ether (15M gas at 100 Gwei is 1.5 ether).
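The cost estimate is plain unit conversion (1 Gwei = 10^9 wei, 1 ether = 10^18 wei). This short Python sketch reproduces it using integer wei and `Decimal` to avoid floating-point rounding; the 30M gas case corresponds to a full 2x EIP-1559 burst block at the same 100 Gwei price.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def block_cost_ether(gas: int, gas_price_gwei: int) -> Decimal:
    """Cost of buying `gas` units at a flat price, in ether."""
    cost_wei = gas * gas_price_gwei * WEI_PER_GWEI
    return Decimal(cost_wei) / Decimal(WEI_PER_ETHER)


print(block_cost_ether(15_000_000, 100))  # 1.5
print(block_cost_ether(30_000_000, 100))  # 3
```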

Why disclose now

This threat has been an “open secret” for a long time: it has actually been publicly disclosed by mistake at least once, and it has been referenced in ACD calls several times without explicit details.

Since the Berlin upgrade is now behind us, and since geth nodes use snapshots by default, we estimate that the threat is low enough that transparency wins out, and that it is time to make a full disclosure about the work behind the scenes.

It’s important that the community is given a chance to understand the reasoning behind changes that negatively affect the user experience, such as raising gas costs and limiting refunds.


This post was written by Martin Holst Swende and Peter Szilagyi on 2021-04-23.
It was shared with other Ethereum-based projects on 2021-04-26, and publicly disclosed on 2021-05-18.
