Parallel Execution of Transactions

Boba Network

March 10, 2023

The research in this article is provided by Enya Labs as part of their contributions to Boba Network.

As developers working with Ethereum, we all know that the Ethereum Layer 1 (L1) isn’t very scalable, and that’s what Layer 2 (L2) projects like Boba Network are committed to address and resolve.

Of course, there are several approaches to scaling — on-chain approaches like Sharding parallelizes execution by splitting the network into multiple shards. And the L2 approach takes transaction execution off-chain, using the L1 as a settlement layer through rollups or other techniques. Boba currently uses optimistic rollups, but its mission is to use the right techniques that suit the needs of developers as the technology landscape matures.

But here’s the thing about scaling. Over time, there’s always more demand for scale than any one solution to the problem can provide. Such is the tyranny of distributed systems!

That means Boba’s job is never done. The Boba community is always looking for better ways to increase throughput and reduce latency to meet the ever-increasing demand of projects spanning the internet that need transaction finality across many organizations without sacrificing performance or reliability.

In this article we are going to get into some approaches that could push the boundaries of scalability even further, including some new methods that we at Enya are considering — specifically addressed around a single bottleneck. We will also try and weigh the pros and cons of each, to identify the most promising path forward.

Breaking the Bottlenecks

Scaling one part of a system without scaling the bottlenecks around it doesn’t get you far. While some of the more prominent reasons that limit scalability are consensus and propagation of messages (in turn the block size and time), the Ethereum Virtual Machine (EVM) is itself, in its core, a crucial limiter of scale. And, since several L2s like Boba would still want to utilize and maintain EVM compatibility/equivalence to replicate L1 user experience — they also inherently need to, and are more suitable for finding ways to get around this bottleneck.
So the natural question is — what exactly is this bottleneck and how do we get around this?

Amping Up the EVM

Y(S, T)= S’

Ethereum is a state machine, and the EVM executes transactions sequentially, one by one. It doesn’t execute other transactions until the execution of the current transaction is completed and the resulting state is computed. This is to avoid state/storage collisions and preserve atomicity of transactions. In other words, this ensures that each transaction is a self-contained, atomic unit of operation that changes the system’s state without interference from or interference to any other concurrent transactions. The execution, however, is still sequential, even with a set of completely independent transactions, because the EVM isn’t designed to be able to execute them at the same time.

Despite this being a fundamental feature, it remains a bottleneck for the performance of the network and can limit the throughput of Ethereum or any EVM chain.
Especially for an L2 where other limitations to scalability like consensus(and message propagation) might not exist, serial execution can turn out to be a more pronounced limitation. Additionally, the L2 as an execution layer for the L1, could be an easier and more suitable ground for implementing approaches to go around the bottleneck and scale maximally.

This brings in the need for parallel execution of transactions, and more so, in an L2 context.

We can think of parallel execution to be possible at two levels:

Parallel at the transaction level — treat each transaction as the smallest unit for parallelization and execute them on multiple threads.
Parallel at the opcode level — treat each operation as the smallest unit and parallelize opcodes, memory access, etc for multiple transactions.

While parallelizing at the transaction level is relatively straightforward and could produce considerable throughput improvements, parallelizing at the opcode level will attain the maximum scalability. However, the latter approach is inherently more complex. We will focus on the first approach to begin with, and discuss it throughout the rest of the article.

Long Run Solutions

If we further narrow it down, storage access is where the EVM spends most time while executing a transaction. And when executed sequentially, it represents a significant portion of the overall execution time. Optimizing storage access, hence, could be an important step in enabling parallel execution.
At the same time, much of Ethereum’s ongoing research efforts have been on statelessness, witnesses, and light clients, which could enable pre-fetching or optimizing storage accesses. Furthermore, Sharding would split the network into multiple smaller networks (shards), and would theoretically parallelize execution with the design. Things would be easier for the L1, with all of these implemented in the future.

What We Can Do Today

Until these future enhancements arrive, there are ways to address parallel execution that can be made practical today.

Here are a few, of the numerous approaches people have thought about achieving concurrency:

UTXO based blockchain — This is much easier because a UTXO spending transaction has the unspent outputs (which are also signed) as inputs for every transaction, and that makes it easy to identify the storage slots that are going to be affected.
Speculative concurrency — Attempt executing every transaction in parallel, transactions with conflicts are scheduled at the end.
Speculative concurrency with role-based separation of nodes — Some nodes participate in ordering (e.g., grouping transactions together) and some execute.
Storage access lists provided as an input for transactions — To help identify what storage (or which contracts) the transaction will access.
Predicting access lists or speculatively generating and caching (e.g., estimateGas).
Static analysis of contracts or bytecode analysis to determine access lists ( e.g., commutative operations, even with collisions, can produce a correct state).

At a high level, all of the following (except UTXO) fall into one of these two groups*:

Speculative concurrency
Access Lists

* there are certain methods (in addition) that may not fall into either group

Discussing Different Approaches to L2 Parallelization

Let’s continue to assess each of these groups, and find out what each of these approaches could mean for a Layer2 like Boba, but before that, I bet you are thinking — what about the UTXO model?

Discussion:1 Give away EVM compatibility/equivalence

A chain/L2 based on the UTXO model is easier to parallelize than an account based model. In the former, a transaction is generally a spending transaction and includes the unspent outputs that it is willing to spend (along with proper signatures of the inputs) for the transaction to be valid. This provides for a perfect opportunity to group together transactions that do not spend the same inputs, and execute them in parallel.

However, a chain based on UTXO model isn’t EVM equivalent, and loses out on the advantages of the existing Ethereum toolset, code and developers.

Fuel, Findora are some of the popular examples today, which enforce parallelization and utilize a UTXO model.

Additionally:

The UTXO model can be extended to have parallel validation with utreexo- https://blog.bitmex.com/faster-blockchain-validation-with-utreexo-accumulators/

And, there are some chains that use both UTXO and account model like Findora https://wiki.findora.org/docs/modules/prism/Overview/

Smart contracts with UTXO account model and parallelization is also possible**https://forum.celestia.org/t/accounts-strict-access-lists-and-utxos/37

Going back to the two groups we previously agreed upon, let’s discuss each of them in more detail-

Discussion 2: Speculative Concurrency

Speculative Concurrency appears to be a very popular strategy for parallel execution. A lot of research has been done, lots of ideas shared and several optimizations to speculative concurrency have been proposed.

The general idea of speculative concurrency is to execute transactions in parallel threads speculatively. If there were collisions (common state access) between these transactions, they are discarded and rerun sequentially later.

Adding on to this general idea, there have been several proposals towards improving Speculative concurrency such as — ‘Separation of nodes with speculative concurrency’ — which involves distributing the network into nodes, who have separate roles — mainly a distinction between those that participate in ordering and some that execute
For ex — ParBlockchain, Rainblock (Validator nodes, IO nodes), Flow from Dapper Labs (separation of consensus and compute)

‘BlockSTM’ is another proposal that is ‘built around principles of Software Transactional Memory’. It increases efficiency through dynamic dependency estimation and a smart scheduler.

Some other ideas further add to this with dependency graphs, locks and static/bytecode analysis. Forerunner produces constraints based on multi future predictors and speeds execution through the constraints. But, while there are various proposals, it is challenging to accurately evaluate its effectiveness in practice.

The current concerns with Speculative concurrency have been:

It involves a lot of aborting tx and re-executing — which brings in the need for pricing, otherwise this could open up to DoS attacks
Several of these methods might include changes on a protocol level
Speculative concurrency can suffer from penalties if a lot of tx have conflicts and have to be scheduled again.
If the execution time of one tx in a batch is significantly more than the others, speculative concurrency will yield only minor benefits
It also depends on the activity of the network, types of transactions and the size of the chain among other things, which are hard to generalize, so the efficiency might not be deterministic

‘An Empirical Study of Speculative Concurrency in Ethereum Smart Contracts’ by Seraph et al (https://arxiv.org/pdf/1901.01376.pdf) is a great work of analysis of general speculative concurrency (without additional improvements) and its performance.

Some of their results have been quoted here:

When txs are optimistically executed, conflict grows as blockchain is more crowded, generally 35% clash rate
“Accurate static conflict analysis may yield a modest benefit”
“In high-contention periods, most contention resulted from a very small number of popular contracts”
“Today, contract writers have no motivation for avoiding such conflicts. It could be productive to devise incentives, perhaps in the form of reduced gas prices, for contracts that produce fewer data conflicts.” (we will discuss about this under method access boundaries)
“Speculative techniques typically work well when conflicts are rare, but perform poorly when conflicts are common”

Since speculative concurrency can be subjective, in order to estimate the effects general speculative concurrency (without constraints) could have on Boba — we can probably look back.

Approximating efficiency of general speculative concurrency from Boba’s history:

We extracted information about storage access(read/write) info from previous transactions on Boba in order to estimate the efficiency of speculative concurrency on the network based on its transaction patterns, with the understanding that there may be unexpected deviations. These results, that will follow, are based on around 10,000 transactions from Boba’s history around block ranges (365k-370k) and (795k-800k).

Boba does not have a mempool currently that could clearly reveal the collisions based on the execution time by the sequencer. For approximation, transactions with close proximity have been grouped together (in groups of 3 and 5) as an estimate of what mempool/ concurrent transactions could look like (in the future).

Furthermore, collisions at two different levels have been represented distinctly:

i) Collisions at a contract access level (i.e. two or more transactions that access the same addresses)

ii) Collisions at the storage level to check exact parallelizability (i.e. two or more transactions that access the same storage slots of these addresses)

The storage access information from these historical transactions also include operations on the storage slots that each transaction performs.

Between two transactions (tx1, tx2) and their operations on a specific storage slot-

(read, read) — works for concurrency

(read, write) and (write, read) — does not

(write, write) — mostly does not, unless operations are commutative

Address Access Collisions

Total groups: number of groups of txs, (approximate grouping around proximity of tx — think blocks)

Collision addresses: Groups that have collisions based on the addresses they access
If two or more txs in a group access a common addresses, we treat the group for having address collisions

Collision Storage slots: Groups that have collisions based on the storage slots of addresses they access. If two or more txs in a group access the same slots in the same addresses, we treat the group for having storage slot collisions

Results from both of these block durations show that conflicts could be expected to be quite common (for Boba), and general speculative concurrency (without improvements), though absolutely worth trying, could yield uncertain results.

Discussion 3: Parallel execution with Access Lists

Access lists for parallel execution, proposed with ‘Easy parallelizability’ (https://github.com/ethereum/EIPs/issues/648) is one of the other lines of thinking about the problem. This involves creating a new tx type that allows users to specify the addresses and storage slots a transaction will access (read/write) as a part of the transaction object. The transactions which have disjoint access lists can be executed in parallel.

Solana utilizes a system similar to access lists to enable parallel execution — SeaLevel — where storage access info is embedded into the transaction as instructions (address and operation) to prefetch them for execution, Sui has a similar approach to Solana.

The main problem with access lists is the difficulty of knowing which states and slots a transaction would affect without executing the transaction. There exists rpc endpoints like eth_createAccessLists (https://geth.ethereum.org/docs/rpc/ns-eth#eth_createaccesslist) which simulates to estimate access lists based on the state of the prior block. Static analysis of bytecode could also be effective in predicting access lists for a tx (ref — https://github.com/alexchenzl/predict-al)

Based on current discussions and strategies, access lists appear to be the most promising and effective approach for enabling concurrency in the EVM. The use of Access lists would typically not involve modifying the EVM or adding an additional layer to achieve the goal. Furthermore, it is closer to and can gain from several ongoing research efforts like Ethereum’s Statelessness/Witnesses generation, which can be very close to obtaining strict access lists. Optional Access lists (https://eips.ethereum.org/EIPS/eip-2930) (https://eips.ethereum.org/EIPS/eip-2929) is already enforced on L1 allowing to prefetch states on the basis of optional access lists (not for parallelization).

But in order to implement parallel execution, access lists have to be made strict — they should be maximally complete, so as to know exactly which transactions can be executed concurrently in parallel and avoid penalties for clashes while execution.

Currently Ethereum has ‘Optional Access Lists’ implemented through a new Tx type. But is it possible to make this a compulsion and enforce concurrency on the network? The assumptions are — a) toolset/libraries/rpc endpoints for the new tx type exists, but several of the most popular user wallet do not support the tx type yet, and at least some wallet will remain without support b) Users do not know or do not want to generate access lists or use scripts for sending txs. Furthermore, it is very difficult to know which states and slots a tx would affect without executing the transaction.

Broadly, the questions that must be answered for enforcing access lists are-

How to generate an almost correct and complete access list?
Who generates access lists for the users?

What if wallets add support for the tx-type? How do we still go from ‘optional’ to ‘strict’ access lists and enable parallelization on L2?

If wallets were to enable the new transaction type, they might use `eth_createAccessList` to estimate the access list for the transaction. However, there is a possibility that this access list will not be accurate when the transaction is actually executed. And as is the demand, for transactions with strict access lists, any transaction with incomplete or incorrect (optional) access lists may be either reverted or queued to be executed sequentially. We could still utilize Ethereum’s existing transaction types without having to create a new one. Although, It is important to maintain the distinction between the original “optional” element for prefetching states and the “strict” element where a transaction can be reverted if it does provide a complete access list.

An opt-in mechanism with a precompile can be made use of to switch between optional (default) to strict (parallelization) for each user. (This could be very similar to the opt-in mechanism for the fee token switch that Boba already has implemented). Any access outside the specified list should be unsuccessful.

Two Proposals:

Irrespective of whether tools exist for the generation of access lists, the question remains: can we do more?

Here are two proposals.

Proposal 1: Method Access Boundaries

It is common for most transactions on a blockchain to primarily involve only a few types of smart contracts. If each transaction includes additional data in the form of access lists, it is likely that there will be repeated patterns of similar access lists with similar calls to contracts.

For a new line of thought, contracts themselves could be the ones to specify access lists, instead of the transaction sender in some cases. In other words, transactions are directed to accounts, and these accounts may have internal calls to other accounts. (we are interested in contract accounts only, since transactions from EOAs to EOAs already have all the information to parallelize), If each contract, on this chain of calls — were to have their own method access boundaries (or) “could bound the addresses that each of the methods in the contract are able to access” (indefinite is a valid option) — we could be closer to finding out intersections and storage collisions between transactions before they are executed.

In addition to being a potential way to execute transactions concurrently, pre-specified method access boundaries could also be a way to enforce higher security standards for a contract by dictating what external methods/addresses a contract is allowed/supposed to call.

Several contracts do have complicated logic, there are contracts (or methods) that allow arbitrary calls, for instance contract wallets, there are proxy contracts which would not really benefit from method access boundaries (these will have ‘indefinite’ boundaries), but on the other hand a lot of general transactions could. Method Access Boundaries when used in combination with other parallelization techniques could result in an increased efficiency.

The goal is to encourage the creation of contracts that can be maximally parallelizable or have clearly defined method access boundaries, in order to improve scalability at the layer.

One such model including access boundaries is envisioned below. (the actual efficiency of this model is currently untested)

Specifications:

Have an optional extra storage layer for storing the access boundaries of each contract account (specifically definite for each of the methods of the contract account)
Storage can be optional for nodes. This could also work with multiple sequencers, where some sequencer could support method access boundaries for parallel execution and in case of transaction reversions a general sequencer can be used
The method access boundaries could be centrally supplied/generated by the deployer.
Requirements for a contract to comply can be optional and can be similar to verifying contracts — plus could be provable.
Contract fuzzing can be potentially utilized to find/prove access boundarie
In case the method access boundaries are found to be incomplete/wrong while execution, the sequencer would revert the transaction (or) queue them to be executed sequentially. To avoid reversions, Transactions to the contract or touching those specific contracts would have to be sent through the normal route
The maintained list would not be enough if the addresses it interacts with, cannot be determined prior execution (delegate call, arbitrary call etc)
This works for transactions that have information about all the contract accounts involved in the transaction call chain. This is easiest if the transaction does not involve calls to multiple unknown addresses. Even one contract without information on the transaction call stack could fail potential parallelizing
The proportion of contracts and transaction to these contracts needs to be found out to figure out the exact benefit in terms of parallelization capability
Some access boundaries can have an additional spec — i.e interactions with them could allow updation to the existing boundaries of other methods of the contract. — for example, a method that adds an ‘address’ value to the storage of the account. The node that keeps track of the storage then, updates the boundaries when calls are made to such methods
Some methods will not have access boundaries that can be specified — in which case the external contract addresses in the boundaries can be specified to be undetermined.

For example, each method could have encoded access boundaries in the form

Address Access Collisions

Where, Type determines if these boundaries can be written to, and R/W determines read or write access to external contract

For a further simplified model, only the external contract accesses can be stored for the whole contract — (similar to how comparisons are in Easy parallelizability “v1” [EIP 648] )
for ex with-

tx1 = A -> B -> C

tx2 = E -> F -> D

can both be compared simply by comparing the set
(S{A} U S{B} U S{C}) ∩ (S{E} U S{F} U S{D}) ≠ Φ

Where S{X} is the method access boundaries of the method X

Proposal 2: Meta AL Transactions and Account Abstraction

To reiterate the problem — people may have access to eth_CreateAccessLists which generates an estimated access list for a transaction (for the use in optional access list)- and conversion from ‘optional’ to ‘strict’ access list has been discussed before. Even with access to an rpc, most of the wallets today do not support an ethereum tx-type with access lists, and furthermore users might not know or want to write scripts for populating transactions with the respective access lists.

Furthermore, since existing rpc methods can only give an estimation, users might not be well suited to provide access lists when the requirements are ‘strict’ — specialized actors might be able to provide smarter access lists. Meta transactions can be a way to relieve the users from involvement with generation and supply of access lists. We will start the discussion with meta-transactions (the way we know it) which are suitable for specific contract addresses, and later will move towards generalizing this for all contracts with ‘account abstraction’ methods.

So, to start with, access lists are a part of the tx object — and a relayer can be assigned the duty to add the access lists to a transaction upon forwarding a user signed message. This meta-transaction can then further be sent to the actual contract. Transactions to these contracts can be made cheaper — as incentivization, if they were to develop and have support for Meta AL transactions. This is similar to the gasless meta-transactions that some contracts are designed to support. The approach isn’t very generalized though and works only for contracts that opt-in and want to support parallelizability.

How do we generalize this? i.e allow this for all transactions throughout all contracts

Enter — Account Abstraction

Account abstraction on L2 can enable forming transactions consisting of multiple signers, and custom logic for validating the tx, without the need for the users to worry about providing access lists.

Account abstraction enables building custom transaction objects, logic, signature schemes and essentially revamping the kind of transaction/operation a network supports and identifies, easily.

With a new userOp structure and wallet logic, any address would be able to send transactions with the duty of providing access lists to a second signer of their choice.

Specifications:

Abstract out signature verification and requirements with account abstraction ERC-4337(modified to include provider address/signature for each user operation). A smart contract wallet level abstraction with relayers could also work, but does not necessarily generalize it.
There exists a market of access-list providers whose job is to provide access lists for transactions. (we can call them providers in short). Even with access list estimation methods available, these external providers could be a step more efficient in providing appropriate access lists. For instance, dapps that also provide their own provider, might be a preferred choice for interactions with them.
They are rated/priced at the efficiency with which they can estimate access lists for transactions (smart providers may include additional parameters, such that estimation is accurate most of the times).
Based on the customized signing conditions — each user operation could specify the provider for access lists (or) each account could set rotating signers for supplying access lists.
A user would sign and submit a transaction to the relayer (or) an operation to their wallet specifying the provider, who in turn is either specified by the user, or is selected for a specific epoch
The relayer/provider then wraps the user tx along with the relevant access list for the transaction and calls the User Account. The User Account can then validate the transaction, signatures and send it forward for execution.

Address Access Collisions

Account abstraction is easier to implement for a L2 than the base Layer, and several L2s would move on to account abstraction at some point. If this can be utilized to enable third party providers/relayers to provide ‘stricter’ access lists, a new market of providers could form. This probably can also be combined with ideas from ‘method access boundaries’, where projects could run their own relayers/providers that could do the job of providing access lists for transactions to their contracts in the most efficient way.

And, That wraps up the discussion on ‘Parallel Execution of Transactions’ for today.

So long and thanks for all the fish

But we are not finished exploring the potential of the topic.

How many transactions can we expect to be executed in parallel with each of these methods? (range)
How efficient are these methods?
How can we make parallel execution deterministic?
Several of these methods depend on existing ecosystem toolsets and their compatibility. When and what kind of support will be introduced in the future?
How big a change to the UX does this involve?

As we wrap up for today, we wanted to leave you with some questions that we are currently considering. We will delve into these topics in more detail in the next installment, so be sure to stay tuned! Until then, Adios!