The acceleration of zkEVM prover

5 min readSep 10, 2024

Introduction

The zkEVM is a Layer 2(L2) network of the Ethereum virtual machine (EVM), a zero-knowledge (ZK) rollup scaling solution developed by the Polygon team. Like other L2 zero-knowledge zkEVM such as zksync-era and zk_evm .zkEVM uses STARK for batch proof and recursive proof to obtain faster speeds, meanwhile, uses SNARK for final proof to reduce the size of proof sent to Layer 1(L1) verifier contract, thereby saving the costs on L1 verification. Recursive proofs are also used to reduce the size of the proofs.

The zkEVM prover is a key component of zkEVM. It implements the whole proof process which includes batch proof, recursive proof, final proof. The following figure illustrates the flow of proof generation :

We accelerate the zkEVM prover. Include the modules on generating batch proof and final proof phases.

Below is our contribution to the acceleration of generating batch proof and final proof:

The GPU acceleration of generating batch proof and final proof.
We tested the BATCH-ECDSA-Verifier algorithm using the Plonky2 Circuit and zkASM. The results show that, within the same trace matrix size, the number of signatures accommodated increased by 1.38x with Plonky2 Circuit and 1.71x with zkASM.
Reduction of the row size of zkEVM trace commitment from 2²³ to 2²².
- Reduction of the row size can shorten the whole proof time. Thereby reducing transaction finalization time.
- When transactions are infrequent on zkEVM L2, reducing the row size can shorten the time of finalizing the batch or increase the percentage of transactions filling in the batch.
Lower the hardware configuration requirements by reducing the row size of zkEVM commitment. The requirement of server memory from 490G decreases to 287G.

Overview of the zkEVM prover acceleration

Acceleration of generating batch proof

We cut the time of building a Merkle tree in half by GPU acceleration with one RTX4090. The build time of a Merkle tree with 2²³ leaves from 20.1s decreases to 9.78s by GPU acceleration.
We reduce the row size of zkEVM trace from 2²³ to 2²². The time of generating batch proof from 133.3s decreases to 70.5s. When we use GPU acceleration, the generating time further reduces to 58s.
The requirement of server memory from 490G decreases to 287G by reducing the row size of zkEVM trace from 2²³ to 2²².

GPU acceleration for final proof

FINAL PROOF employs the FFLONK zero-knowledge proof system.

FFLONK or Groth16 is used to convert STARK proofs into SNARK proofs. According to the analysis presented in this article, FFLONK demonstrates lower gas costs compared to Groth16.

We utilized ICICLE, a GPU acceleration library for zero-knowledge proof systems, to accelerate the computationally intensive NTT and MSM operations within the FFLONK zero-knowledge proof system.

Testing the BATCH-ECDSA-Verifier Algorithm with Plonky2 Circuit and zkASM

In a scenario where there are multiple ECDSA signatures requiring verification, we can use the BATCH-ECDSA-Verifier algorithm to reduce circuit size. This allows us to accommodate more ECDSA signatures within the same trace matrix size. We implemented and tested the approach using the Plonky2 Circuit and zkASM, and the test results indicate that the method significantly increases the number of ECDSA signatures that can be accommodated within the same trace matrix size. Specifically, the capacity for ECDSA signatures increased by 1.38x and 1.71x for Plonky2 Circuit and zkASM respectively. The test data is summarized in the table below:

Benchmarking and analysis

The hardware configuration

CPU: AMD EPYC 7763 64-Core Processor * 2
RAM: 1 TB (64G * 8)
GPU: NVIDIA GeForce RTX 4090 24GB * 1

Benchmarking design

To bench the accelerations we have done, we benchmark in two factors: one is GPU acceleration and the other is the row size of zkEVM trace commitment. The number of transactions to batch is 3000.

case1: the row size is 2²², The max number of transactions contained in one batch is 260. So there are 12 batches. And the zkEVM prover is accelerated by GPU.

case2: the row size is 2²², The max number of transactions contained in one batch is 260. So there are 12 batches. And the zkEVM prover runs without GPU.

case3: the row size is 2²³, The max number of transactions contained in one batch is 480. So there are 7 batches. And the zkEVM prover is accelerated by GPU.

case4: the row size is 2²³, The max number of transactions contained in one batch is 480. So there are 7 batches. And the zkEVM prover runs without GPU.

Benchmarking data

The table below is the benchmarking cases table. The unit of proof time is second. The table only lists the time of generating single batch proof and log(n) time of single recursive proof. The phase of generating batch proof can parallel computation by distributed cluster servers. The phase of generating recursive proof can parallel computation with a computational complexity of log(n). n is the number of batch proofs as the recursive proving input. The original Benchmarking logs are stored here.

Table 1: row size 2²² and row size 2²³ batch filled with transfer transactions

Benchmarking conclusion

Based on benchmarking, our GPU accelerations with one RTX4090 reduce about 20% proof generating time. Our reduction of the row size of zkEVM trace commitment from 2²³ to 2²² reduces about 30% proof generating time.

The combination of the two accelerations reduces about 50% proof generating time(220.19s decreases to 118.14s, the whole proof time reduces 45%).

There is a discussion about the trade-off between fast finality and cost minimization in Section 4.5 of this paper. From this perspective, our work on reducing the row size of zkEVM trace indeed accelerates finality, meanwhile, minimizes cost per transaction in the low TPS L2 scene.

Future Work

We will analyze the factors affecting the average cost of transactions verified on L1, such as

the transaction capacity of one batch.
The factors affecting the cost of sequenceBatches transaction and verifyBatches transaction.

We will also compare the gas change before and after the sequenceBatches transaction and verifyBatches transaction support the EIP-4844.