Bulletproofs code improvements

Introduction

For Incognito transactions, Bulletproofs verification takes up around 80% of total verification time. This can be understood as a worthwhile trade-off for the proofs’ succinctness. However, a thorough inspection indicates that several updates & optimizations can (and, this document argues, should) be made to the code to quickly bring that 80% figure down.

Bulletproofs computation speedup methods

  1. Import external Bulletproofs library (dalek-bulletproofs)

Method

Dalek-bulletproofs is a popular Bulletproofs implementation and among the fastest available. Although it is written in Rust, we can import it through cgo by linking a static library that exposes “bulletproofs-prove” and “bulletproofs-verify” functions.
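As a rough idea of what the binding could look like (a sketch only; the exported symbol name, its signature, and the library file name below are illustrative assumptions, not part of dalek-bulletproofs):

```go
package bulletproofs

// The Rust side is assumed to be built as a C-ABI static library
// (libbulletproofs.a) exposing a verification entry point; the name and
// signature here are placeholders for whatever the real binding would expose.

/*
#cgo LDFLAGS: -L${SRCDIR}/lib -lbulletproofs
#include <stdint.h>
#include <stddef.h>

int32_t bulletproofs_verify(const uint8_t *proof, size_t proof_len,
                            const uint8_t *commitments, size_t commitments_len);
*/
import "C"

import "unsafe"

// VerifyExternal forwards a serialized proof and its commitments to the
// Rust verifier and interprets a zero return code as success.
func VerifyExternal(proof, commitments []byte) bool {
	if len(proof) == 0 || len(commitments) == 0 {
		return false
	}
	ret := C.bulletproofs_verify(
		(*C.uint8_t)(unsafe.Pointer(&proof[0])), C.size_t(len(proof)),
		(*C.uint8_t)(unsafe.Pointer(&commitments[0])), C.size_t(len(commitments)),
	)
	return ret == 0
}
```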

Why?

  • Instant 75% speedup
  • The same implementation that many crypto projects are using

Why not?

  • It uses the ristretto255 group, which differs from ours and breaks backwards compatibility
  • cgo does not cross-compile to our WASM & gomobile ports, so we would also need a separate pure-Go ristretto255-based implementation, which is a lot of work

Conclusion: not chosen for development

  2. Combine expensive Point Multiplications

Method

We analyze the code, identify consecutive independent point multiplications, and combine them into a single “multi-scalar-mult” call.
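For illustration, a minimal before/after sketch of this change, written here against filippo.io/edwards25519 (the library discussed in item 4) so the snippet is self-contained; the production code uses our own wrapper types, but the shape of the transformation is the same:

```go
package bulletproofs

import "filippo.io/edwards25519"

// Before: n independent single-point multiplications chained with additions.
func commitSlow(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	acc := edwards25519.NewIdentityPoint()
	for i := range scalars {
		term := new(edwards25519.Point).ScalarMult(scalars[i], points[i])
		acc.Add(acc, term)
	}
	return acc
}

// After: one multi-scalar multiplication. Sharing the doublings across all
// terms is what yields the ~17% gain quoted below for a 20-point array.
func commitFast(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).MultiScalarMult(scalars, points)
}
```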

Why?

  • This is a straightforward optimization
  • Easy speedup: applying it to a 20-point array can give +17%

Why not?

  • Introduces a (low but non-zero) risk of new bugs

Conclusion: will do.

  3. Precomputation for global Bulletproofs base parameters

Method

At least half of the point multiplications operate on globally fixed points (elliptic curve generators, etc.). We can speed up the algorithm by precomputing tables for these points, which makes multiplying by them much faster.
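A simplified sketch of the idea (using filippo.io/edwards25519 types so it is self-contained; the table layout below is purely illustrative, and a production precompute would use signed fixed windows like the library does for its own generator):

```go
package bulletproofs

import "filippo.io/edwards25519"

// PrecomputedPoint stores T[i] = 2^i * P for a globally fixed point P, so that
// later multiplications by P need only additions and no runtime doublings.
type PrecomputedPoint struct {
	table [256]*edwards25519.Point
}

// NewPrecomputedPoint builds the table once, e.g. at package init time.
func NewPrecomputedPoint(p *edwards25519.Point) *PrecomputedPoint {
	pre := &PrecomputedPoint{}
	cur := new(edwards25519.Point).Set(p)
	for i := 0; i < 256; i++ {
		pre.table[i] = new(edwards25519.Point).Set(cur)
		cur.Add(cur, cur) // cur = 2^(i+1) * P
	}
	return pre
}

// ScalarMult computes s*P by adding T[i] for every set bit of s. Note that
// this variant branches on the scalar bits, so it is variable-time and only
// suitable for verifier-side (public) scalars.
func (pre *PrecomputedPoint) ScalarMult(s *edwards25519.Scalar) *edwards25519.Point {
	acc := edwards25519.NewIdentityPoint()
	for i, b := range s.Bytes() { // canonical little-endian encoding of s
		for j := 0; j < 8; j++ {
			if b>>uint(j)&1 == 1 {
				acc.Add(acc, pre.table[8*i+j])
			}
		}
	}
	return acc
}
```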

Why?

  • This is a valid & well-documented practice
  • The performance gap between “scalar-mult” & “scalar-base-mult” is a good predictor of the speedup; we could be looking at a 3x improvement in specific parts of the protocol

Why not?

  • Adapting a curve library to new precomputed points is not trivial; by default, the library only ships a precomputed table for its single generator
  • It requires copying the curve library into our repository (instead of importing the public package), which could hinder upgrades to future versions of that library

Conclusion: will do.

  4. Switch to a new curve library

Method

We currently use the Dero project’s ed25519 implementation. We can switch to filippo.io/edwards25519, which is a mirror of the code from golang/x/crypto.
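A small round-trip sketch of the candidate API (the library calls below exist in filippo.io/edwards25519; the wrapper function itself is only an example):

```go
package bulletproofs

import (
	"fmt"

	"filippo.io/edwards25519"
)

// ScalarMultBytes parses standard 32-byte encodings, multiplies, and
// re-encodes, showing the (almost) drop-in shape of the new API.
func ScalarMultBytes(scalarBytes, pointBytes []byte) ([]byte, error) {
	p, err := new(edwards25519.Point).SetBytes(pointBytes)
	if err != nil {
		return nil, fmt.Errorf("invalid point encoding: %w", err)
	}
	s, err := new(edwards25519.Scalar).SetCanonicalBytes(scalarBytes)
	if err != nil {
		return nil, fmt.Errorf("invalid scalar encoding: %w", err)
	}
	return new(edwards25519.Point).ScalarMult(s, p).Bytes(), nil
}
```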

Why?

  • (almost) Drop-in replacement with full compatibility
  • Instant 3-5% gain on every operation; some cases go further, e.g. 4-point multiplication is 13% faster
  • Easier future maintenance. Our currently imported code has readability issues (leftover TODOs, poor comments, a misleading package name: curve25519 when the code is for ed25519), is quite old, and has an unclear maintenance future.

Why not?

  • It still takes effort to rewrite wrappers, double-check encodings, etc.
  • We currently have some functions with custom, non-standard behavior, such as “HashToPoint”. Migrating those will be harder (worst case: we keep those specific functions in our own code, which is not that bad)

Conclusion: will do.

  5. Update batch-verification logic

Method

Theoretically, batch verification is fully prover-agnostic and curve-generator-agnostic: the verifier should be able to check one block’s worth of Bulletproofs with a single big multiplication, even for Confidential Asset transactions. A recent code & paper review indicates that this one-multiplication-per-block target is achievable with a substantial logic update in the batch-verify function.
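A rough sketch of the target shape, using the standard random-weighting trick; the verificationEquation struct and the reduction of each proof to a single equation are assumptions about the refactored code, not the current API (a real implementation would also merge the terms that share the fixed generators):

```go
package bulletproofs

import (
	"crypto/rand"

	"filippo.io/edwards25519"
)

// Each Bulletproof in the batch is assumed to be reduced beforehand to one
// equation of the form sum_j scalars[j]*points[j] == identity.
type verificationEquation struct {
	scalars []*edwards25519.Scalar
	points  []*edwards25519.Point
}

// batchVerify folds every equation in the block into one multi-scalar
// multiplication. Each equation is weighted by a fresh random scalar r so
// that an invalid proof cannot cancel out against another one.
func batchVerify(eqs []*verificationEquation) (bool, error) {
	var scalars []*edwards25519.Scalar
	var points []*edwards25519.Point
	for _, eq := range eqs {
		seed := make([]byte, 64)
		if _, err := rand.Read(seed); err != nil {
			return false, err
		}
		r, err := edwards25519.NewScalar().SetUniformBytes(seed)
		if err != nil {
			return false, err
		}
		for j := range eq.scalars {
			scalars = append(scalars, edwards25519.NewScalar().Multiply(r, eq.scalars[j]))
			points = append(points, eq.points[j])
		}
	}
	// One big multiplication for the whole block; the verifier holds no
	// secrets, so the variable-time version is safe here.
	sum := new(edwards25519.Point).VarTimeMultiScalarMult(scalars, points)
	return sum.Equal(edwards25519.NewIdentityPoint()) == 1, nil
}
```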

Why?

  • Vital speed improvement for our Confidential Assets’ range proof, which was previously believed to be incompatible with verifier-side batching

Why not?

  • Since this involves reworking the logic, we cannot reuse much of the current batch-verify code

Conclusion: will do.

  6. Implement Amortization for Bulletproofs

Method

Many Bulletproofs documents recommend a form of amortization that uses a “helper” to do some of the verifier’s computation in a trustless manner.

Why?

  • It promises a good speedup for the verifier with full compatibility

Why not?

  • It remains to be studied whether the newly introduced “helper” model is adequately trustless to be integrated seamlessly into the current code

Conclusion: TBD.

  7. Protocol design changes

Method

To achieve even better performance, a design change might be worth considering. The underlying problem remains the same: we need to prove that transaction outputs are in the uint64 range, avoiding a trusted setup if possible. Such an update would inevitably break compatibility, so its actual deployment would be relegated to a “privacy-v3”-style upgrade.

Conclusion: research opportunity for Q2 & Q3, 2022.

  8. Use variable-time computation

Method

The verifier in the Bulletproofs protocol uses no secret information, so it can call the variable-time versions of operations such as point multiplication.
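For instance, with filippo.io/edwards25519 the split would look like the sketch below (illustrative call sites, not the exact functions in our code):

```go
package bulletproofs

import "filippo.io/edwards25519"

// Prover side keeps the constant-time primitive: its scalars are secrets.
func proverCombine(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).MultiScalarMult(scalars, points)
}

// Verifier side only handles public proof data, so the variable-time variant
// (roughly 20-25% faster on multiplications) is safe to use.
func verifierCombine(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).VarTimeMultiScalarMult(scalars, points)
}
```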

Why?

  • 20-25% speed gain for multiplications
  • No change to Prove(); it keeps its constant-time logic

Why not?

  • Introduces a (low but non-zero) risk of new bugs

Conclusion: will do.

We developed an initial implementation and obtained rough speed benchmarks for these optimizations:

| #outputs | old | old + optimize | new | new + combine |
|---|---|---|---|---|
| 1 | 15,193,736 | 9,704,923 | 7,006,252 | 3,117,194 |
| 2 | 29,053,290 | 18,240,756 | 12,983,874 | 5,649,800 |
| 4 | 57,237,426 | 35,764,555 | 24,900,331 | 10,783,954 |
| 8 | 112,818,286 | 70,638,589 | 50,676,403 | 21,503,311 |
| 16 | 228,455,563 | 142,324,259 | 99,016,018 | 46,478,391 |
| 32 | 449,076,116 | 291,605,587 | 198,030,725 | 98,690,168 |


Durations are in nanoseconds (ns).

Implementation Remarks

Key takeaways

  • Changing the curve library provides a small, consistent 3-8% improvement across the board
  • VarTime-MultiScalarMult provides an effective speedup (25%) for verification
  • The verifier-time reduction from point precomputation is noticeable, though it is only applied to variable-time functions, since benchmarks showed no benefit for the constant-time ones
  • The old code performs many single-point multiplications during verification, so combining them cuts verifier time considerably
  • The old batch-verify, however, already calls MultiScalarMult for the most part, so that optimization’s effectiveness there is more limited
  • Applying batching to CA proofs provides the most speedup, as predicted
  • The protocol implementation could use a lot of refactoring & explanation

Verifier Benchmarks

To measure practical CPU time improvement to the software, we performed 3 benchmarks:

  • Single Verify time (ms): the time to verify one Bulletproof, parameterized by the output count; improvement leads to better mempool performance.
  • Batch Verify time for non-confidential-asset batches: the time to verify one Bulletproofs batch, parameterized by the batch length; improvement leads to faster block production & validation. The non-CA scenario represents the speedup in the “worst case”: a block containing exactly 2 PRV transfers.
  • Batch Verify time for half-confidential-asset batches: the time to verify one Bulletproofs batch, parameterized by the batch length; improvement leads to faster block production & validation. The half-CA scenario represents the speedup in the (near) average and best case: a block containing all token transactions (so half of all Bulletproofs are of the “CA” variant).

Benchmark results are below (durations are in milliseconds, ms).

  • Verify Single
| output count | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| old | 15.04 | 29.04 | 56.67 | 112.36 | 230.70 | 448.16 |
| new | 3.12 | 5.65 | 10.78 | 21.50 | 46.48 | 98.69 |
| gain (x ratio) | 4.83 | 5.14 | 5.25 | 5.23 | 4.96 | 4.54 |


  • Bulletproofs batch verify time (ms) with no CA

| batch length | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| old | 15.48 | 28.32 | 55.84 | 112.54 | 242.55 |
| new | 6.53 | 10.16 | 16.78 | 30.39 | 57.63 |
| gain (x ratio) | 2.37 | 2.79 | 3.33 | 3.70 | 4.21 |


  • Bulletproofs batch verify time (ms) with 1/2 CA rate

| batch length | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| old | 41.88 | 50.12 | 123.73 | 264.50 | 520.87 |
| new | 6.81 | 7.03 | 15.77 | 29.09 | 51.80 |
| gain (x ratio) | 6.15 | 7.13 | 7.84 | 9.09 | 10.06 |


The sharp difference in speed compared to the non-CA scenario is explained by the old implementation having no Bulletproofs batching support for confidential assets.

We can see that for typical chain usage, blocks enjoy 6x-10x faster Bulletproofs verification by updating to the new code; since Bulletproofs account for around 80% of verification time, this roughly translates to 3.0x-3.57x faster overall verification for those blocks.
