Plonky2 GPU Acceleration Solution Overview

Orbiter_Finance
Dec 29, 2023


Orbiter Finance is a ZK-tech-based infrastructure layer for the Ethereum ecosystem. It aims to improve capital efficiency and interoperability between Layer 2s, paving the way for billions of people to communicate and interact effortlessly in the web3 world.

We recently achieved a 59.23% speedup on Plonky2 with GPU acceleration: proving a batch of 20 secp256k1 signature verifications dropped from about 161s to 65s, and more speedups are on the way. Our next target is around 10s, so that every transaction can be proven fast and verified cheaply from the source side to the destination side.

This article provides an overview of the Plonky2 GPU acceleration solution and highlights the progress made by Orbiter Finance.

Background

In Orbiter's business scenario, validating a transaction from the source side to the destination side requires a large-scale computing circuit involving keccak256, RLP, Merkle tree computation, and more. This circuit is divided into multiple sub-circuits, whose proofs are then aggregated recursively for verification. Because the sub-circuits can be proven in parallel, the bottleneck of proof generation is the proving time of each sub-circuit plus the aggregation/recursion time.
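The split-prove-aggregate pattern can be sketched as follows. This is a minimal sketch under stated assumptions, not Orbiter's pipeline: `prove_sub_circuit` and `aggregate` are hypothetical placeholders that hash their inputs instead of running Plonky2, and pairwise (binary-tree) aggregation is assumed. It illustrates why wall-clock time is roughly the slowest sub-circuit proof plus one recursive step per tree layer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical stand-in for a proof: just a 64-bit digest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Proof(u64);

fn digest<T: Hash + ?Sized>(x: &T) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

/// Placeholder for proving one sub-circuit (keccak256, RLP, Merkle, ...).
fn prove_sub_circuit(witness: &[u8]) -> Proof {
    Proof(digest(witness))
}

/// Placeholder for a recursive step that verifies two proofs and emits one.
fn aggregate(left: Proof, right: Proof) -> Proof {
    Proof(digest(&(left, right)))
}

/// Proves all sub-circuits in parallel, then folds the proofs pairwise.
/// Returns the final proof and the number of recursion layers.
fn prove_batch(witnesses: Vec<Vec<u8>>) -> (Proof, usize) {
    assert!(!witnesses.is_empty());
    // One thread per sub-circuit: these proofs are independent.
    let handles: Vec<_> = witnesses
        .into_iter()
        .map(|w| thread::spawn(move || prove_sub_circuit(&w)))
        .collect();
    let mut layer: Vec<Proof> = handles.into_iter().map(|h| h.join().unwrap()).collect();

    // Each loop iteration is one layer of recursive aggregation.
    let mut depth = 0;
    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => aggregate(*l, *r),
                [single] => *single, // odd proof is carried up unchanged
                _ => unreachable!(),
            })
            .collect();
        depth += 1;
    }
    (layer[0], depth)
}
```

With 8 sub-circuits the tree has 3 recursion layers, so both the per-sub-circuit proving time and the recursive step time matter for end-to-end latency.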

Plonky2 is one of the most popular proving frameworks among ZK developers. According to Polygon, its recursive SNARK is 100x faster than existing alternatives and natively compatible with Ethereum. It combines PLONK and FRI to get the best of STARKs (fast proofs and a trustless setup) and the best of SNARKs (support for recursion and low verification cost on Ethereum)¹.

However, in practical scenarios the circuits to be processed are exceptionally large, and the CPU version of Plonky2² still takes several minutes or longer to generate a proof. It is therefore crucial to explore new acceleration approaches.

GPUs are known for their massive parallel computing capabilities. Algorithms in Plonky2 such as the Number Theoretic Transform (NTT), matrix transposition, and Merkle tree construction are computationally intensive and highly parallelizable, which makes them a good fit for GPUs.
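To make the parallelism concrete, below is a minimal CPU reference of an iterative radix-2 NTT over the Goldilocks field (p = 2^64 - 2^32 + 1) that Plonky2 uses. It is an illustrative sketch, not the CUDA kernel described in this post: it assumes 7 as a multiplicative generator of the field for deriving roots of unity and uses simple but slow `u128` modular arithmetic. Within each stage, every butterfly touches a disjoint pair of elements, which is what lets a GPU assign one thread per butterfly.

```rust
/// Goldilocks prime used by Plonky2: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Primitive n-th root of unity; n must be a power of two <= 2^32.
/// Assumption: 7 generates the multiplicative group.
fn root_of_unity(n: usize) -> u64 {
    pow(7, (P - 1) / n as u64)
}

/// In-place Cooley-Tukey NTT: a[k] <- sum_j a[j] * w^(j*k).
fn ntt(a: &mut [u64]) {
    let n = a.len();
    assert!(n.is_power_of_two());
    if n <= 1 {
        return;
    }
    // Bit-reversal permutation so the output comes out in natural order.
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            a.swap(i, j);
        }
    }
    // log2(n) stages; butterflies within a stage are independent.
    let mut len = 2;
    while len <= n {
        let w_len = root_of_unity(len);
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = mul(a[start + k + len / 2], w);
                a[start + k] = add(u, v);
                a[start + k + len / 2] = sub(u, v);
                w = mul(w, w_len);
            }
        }
        len <<= 1;
    }
}

/// O(n^2) reference evaluation used to check the fast transform.
fn naive_dft(a: &[u64]) -> Vec<u64> {
    let w = root_of_unity(a.len());
    (0..a.len())
        .map(|k| {
            let wk = pow(w, k as u64);
            let (mut acc, mut x) = (0, 1);
            for &c in a {
                acc = add(acc, mul(c, x));
                x = mul(x, wk);
            }
            acc
        })
        .collect()
}
```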

Several projects and teams, including sppark³, era-bellman-cuda⁴, and icicle⁵, have already implemented CUDA-accelerated versions of these algorithms, such as NTT.

Overview

Architecture

The solution architecture consists of three components: the Rust component, the FFI component (which facilitates Rust-C++ interaction), and the C++/CUDA component.

  • The C++/CUDA component implements or interfaces with highly parallelizable algorithms in Plonky2, such as NTT, Matrix Transposition, and Merkle Tree.
  • The FFI component provides Rust interfaces, enabling the use of Rust to invoke the functionality of the underlying algorithms.
  • The Rust component encompasses GPU management, memory management, and the implementation of the Plonky2 algorithmic workflow.
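The Rust/FFI/C++ split can be illustrated with a small sketch. The function name `gpu_transpose_u64`, its signature, and its status codes are hypothetical. In the real stack such a function would be exported by the C++/CUDA library and declared in an `extern "C"` block; here it is a CPU stand-in written in Rust so the example runs on its own. The safe `transpose` wrapper plays the role of the FFI component: it validates sizes, owns the output buffer, and converts the C-style status code into a `Result`.

```rust
/// Hypothetical C-ABI entry point (CPU stand-in for a CUDA kernel launcher).
/// Transposes a row-major `rows x cols` matrix into `cols x rows`.
/// Returns 0 on success, -1 for null pointers, -2 for a size overflow.
///
/// # Safety
/// `src` and `dst` must each point to `rows * cols` valid, non-overlapping u64s.
pub unsafe extern "C" fn gpu_transpose_u64(
    src: *const u64,
    dst: *mut u64,
    rows: usize,
    cols: usize,
) -> i32 {
    if src.is_null() || dst.is_null() {
        return -1;
    }
    let len = match rows.checked_mul(cols) {
        Some(n) => n,
        None => return -2,
    };
    // SAFETY: guaranteed by the caller per the contract above.
    let (src, dst) = unsafe {
        (
            std::slice::from_raw_parts(src, len),
            std::slice::from_raw_parts_mut(dst, len),
        )
    };
    for r in 0..rows {
        for c in 0..cols {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    0
}

/// Safe Rust wrapper: checks the shape, allocates the output, maps errors.
pub fn transpose(src: &[u64], rows: usize, cols: usize) -> Result<Vec<u64>, i32> {
    if rows.checked_mul(cols) != Some(src.len()) {
        return Err(-2);
    }
    let mut dst = vec![0u64; src.len()];
    // SAFETY: both buffers hold exactly rows * cols elements and do not overlap.
    let status = unsafe { gpu_transpose_u64(src.as_ptr(), dst.as_mut_ptr(), rows, cols) };
    if status == 0 { Ok(dst) } else { Err(status) }
}
```

Keeping all `unsafe` behind one thin wrapper means the Rust component (GPU management and the Plonky2 workflow) only ever calls safe functions.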

Data Transfer

Reduce Redundant Data Transfers: By combining computations, for example running a coset NTT immediately followed by a matrix transpose, the output of the coset NTT can be used directly as the input of the transpose while it is still in GPU memory. This avoids a round trip between the GPU and the CPU, reducing the complexity and cost of data transfer.
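The effect of fusing the two steps can be sketched by counting host/device copies. Everything here is a hypothetical simulation: `Device` only counts transfers, `coset_shift` implements just the coset-scaling step of a coset NTT (coefficient i of each row is multiplied by shift^i modulo the Goldilocks prime) with the NTT itself omitted for brevity, and `transpose` stands in for the GPU matrix transpose.

```rust
/// Goldilocks prime used by Plonky2.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical device handle that only counts host<->device copies.
#[derive(Default)]
struct Device {
    h2d: usize,
    d2h: usize,
}

/// Data resident on the (simulated) device.
struct DevBuf(Vec<u64>);

impl Device {
    fn upload(&mut self, host: &[u64]) -> DevBuf {
        self.h2d += 1;
        DevBuf(host.to_vec())
    }
    fn download(&mut self, buf: DevBuf) -> Vec<u64> {
        self.d2h += 1;
        buf.0
    }
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Coset-scaling step only: each row is one polynomial's coefficients.
fn coset_shift(buf: &mut DevBuf, rows: usize, cols: usize, shift: u64) {
    for r in 0..rows {
        let mut s = 1;
        for c in 0..cols {
            let i = r * cols + c;
            buf.0[i] = mul(buf.0[i], s);
            s = mul(s, shift);
        }
    }
}

/// Row-major (rows x cols) -> (cols x rows) transpose "on the device".
fn transpose(buf: &DevBuf, rows: usize, cols: usize) -> DevBuf {
    let mut out = vec![0; buf.0.len()];
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = buf.0[r * cols + c];
        }
    }
    DevBuf(out)
}

/// Unfused: the intermediate result makes a round trip through host memory.
fn unfused(dev: &mut Device, host: &[u64], rows: usize, cols: usize, shift: u64) -> Vec<u64> {
    let mut buf = dev.upload(host);
    coset_shift(&mut buf, rows, cols, shift);
    let tmp = dev.download(buf);
    let buf = dev.upload(&tmp);
    let out = transpose(&buf, rows, cols);
    dev.download(out)
}

/// Fused: the transpose consumes the device-resident output directly.
fn fused(dev: &mut Device, host: &[u64], rows: usize, cols: usize, shift: u64) -> Vec<u64> {
    let mut buf = dev.upload(host);
    coset_shift(&mut buf, rows, cols, shift);
    let out = transpose(&buf, rows, cols);
    dev.download(out)
}
```

Both paths return the same data, but the fused path needs one upload and one download instead of two of each. On real hardware, each avoided copy is a transfer across the PCIe bus.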

Memory Optimization: This involves refining data structures and improving memory management.
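The post does not detail which memory changes were made. As one common technique, shown purely as an assumption, the sketch below recycles buffers of a given length instead of allocating new ones for every proof. The `BufferPool` type is hypothetical; a GPU version would pool device or pinned host allocations rather than `Vec`s.

```rust
use std::collections::HashMap;

/// Hypothetical pool that reuses buffers keyed by length.
#[derive(Default)]
struct BufferPool {
    free: HashMap<usize, Vec<Vec<u64>>>,
    /// Number of fresh allocations performed (for illustration).
    allocations: usize,
}

impl BufferPool {
    /// Returns a buffer of `len` elements, reusing a freed one if available.
    /// Note: a recycled buffer keeps its old contents; callers must overwrite it.
    fn take(&mut self, len: usize) -> Vec<u64> {
        if let Some(buf) = self.free.get_mut(&len).and_then(|v| v.pop()) {
            return buf;
        }
        self.allocations += 1;
        vec![0; len]
    }

    /// Hands a buffer back so later calls to `take` can reuse it.
    fn give_back(&mut self, buf: Vec<u64>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

Across repeated proofs with the same circuit shape, the buffer sizes repeat, so after the first proof most requests are served without a new allocation.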

Performance Improvement Benchmark

The data provided reflects the execution time for specific program segments in the context of proof generation for 20 ECDSA signatures. A comparison between the original Plonky2 version and our GPU-enhanced version is presented below:

Specifications:

  • CPU: 2 × AMD EPYC 7763 64-Core Processor
  • Memory: 256GB RAM
  • GPU: NVIDIA GeForce RTX 4090 with 24GB of VRAM

Execution Time:

The CPU figures correspond to the original Plonky2 test branch, and the GPU figures to our enhanced version. The comparison shows the reduction in processing time achieved by our GPU enhancements across the program segments involved in proof generation.

Please note that this data is for reference only; actual test results may vary.

Zero-knowledge proofs are magical, inspiring, and, most importantly, breathtaking, and we are passionate about them. Performance is still being optimized, and we expect it to ultimately meet the demands of real-world applications. Our goal is to make meaningful contributions to the Ethereum ecosystem and to share our work on the practical implementation of zero-knowledge proofs (ZKPs).


References

[1] https://polygon.technology/blog/introducing-plonky2

[2] https://github.com/0xPolygonZero/plonky2

[3] https://github.com/supranational/sppark.git

[4] https://github.com/matter-labs/era-bellman-cuda

[5] https://github.com/ingonyama-zk/icicle


Orbiter Finance is a decentralized cross-rollup Layer 2 bridge with a contract only on the destination side.