Bulletproofs code improvements

Introduction

For Incognito transactions, Bulletproofs verification takes up around 80% of total verification time. This can be understood as a worthwhile trade-off for the proofs’ succinctness. However, a thorough inspection indicates that several updates & optimizations can (and, this document argues, should) be made to the code to quickly bring that 80% figure down.

Bulletproofs computation speedup methods

  1. Import external Bulletproofs library (dalek-bulletproofs)

Method

Dalek-bulletproofs is a popular Bulletproofs implementation and among the fastest available. Although it is written in Rust, we can import it through cgo by linking a static library that exposes “bulletproofs-prove” and “bulletproofs-verify” functions.
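As a rough idea of what the binding could look like (a sketch only; the exported symbol name, its signature, and the library file name below are illustrative assumptions, not part of dalek-bulletproofs):

```go
package bulletproofs

// The Rust side is assumed to be built as a C-ABI static library
// (libbulletproofs.a) exposing a verification entry point; the name and
// signature here are placeholders for whatever the real binding would expose.

/*
#cgo LDFLAGS: -L${SRCDIR}/lib -lbulletproofs
#include <stdint.h>
#include <stddef.h>

int32_t bulletproofs_verify(const uint8_t *proof, size_t proof_len,
                            const uint8_t *commitments, size_t commitments_len);
*/
import "C"

import "unsafe"

// VerifyExternal forwards a serialized proof and its commitments to the
// Rust verifier and interprets a zero return code as success.
func VerifyExternal(proof, commitments []byte) bool {
	if len(proof) == 0 || len(commitments) == 0 {
		return false
	}
	ret := C.bulletproofs_verify(
		(*C.uint8_t)(unsafe.Pointer(&proof[0])), C.size_t(len(proof)),
		(*C.uint8_t)(unsafe.Pointer(&commitments[0])), C.size_t(len(commitments)),
	)
	return ret == 0
}
```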

Why?

  • Instant 75% speedup
  • The same implementation that many crypto projects are using

Why not?

  • It uses the ristretto255 group, which differs from ours and breaks backwards compatibility
  • cgo does not cross-compile to our WASM & gomobile ports, so we would also need a separate pure-Go ristretto255-based implementation, which is a lot of work

Conclusion: not chosen for development

  2. Combine expensive Point Multiplications

Method

We analyze the code, identify consecutive independent point multiplications, and combine them into a single “multi-scalar-mult” call.
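For illustration, a minimal before/after sketch of this change, written here against filippo.io/edwards25519 (the library discussed in item 4) so the snippet is self-contained; the production code uses our own wrapper types, but the shape of the transformation is the same:

```go
package bulletproofs

import "filippo.io/edwards25519"

// Before: n independent single-point multiplications chained with additions.
func commitSlow(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	acc := edwards25519.NewIdentityPoint()
	for i := range scalars {
		term := new(edwards25519.Point).ScalarMult(scalars[i], points[i])
		acc.Add(acc, term)
	}
	return acc
}

// After: one multi-scalar multiplication. Sharing the doublings across all
// terms is what yields the ~17% gain quoted below for a 20-point array.
func commitFast(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).MultiScalarMult(scalars, points)
}
```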

Why?

  • This is a straightforward optimization
  • Easy speedup: applying it to a 20-point array can give +17%

Why not?

  • Introduces a (low but non-zero) risk of new bugs

Conclusion: will do.

  3. Precomputation for global Bulletproofs base parameters

Method

At least half of the point multiplications operate on globally fixed points (elliptic curve generators, etc.). We can speed up the algorithm by precomputing tables for these points, which makes multiplying by them much faster.
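A simplified sketch of the idea (using filippo.io/edwards25519 types so it is self-contained; the table layout below is purely illustrative, and a production precompute would use signed fixed windows like the library does for its own generator):

```go
package bulletproofs

import "filippo.io/edwards25519"

// PrecomputedPoint stores T[i] = 2^i * P for a globally fixed point P, so that
// later multiplications by P need only additions and no runtime doublings.
type PrecomputedPoint struct {
	table [256]*edwards25519.Point
}

// NewPrecomputedPoint builds the table once, e.g. at package init time.
func NewPrecomputedPoint(p *edwards25519.Point) *PrecomputedPoint {
	pre := &PrecomputedPoint{}
	cur := new(edwards25519.Point).Set(p)
	for i := 0; i < 256; i++ {
		pre.table[i] = new(edwards25519.Point).Set(cur)
		cur.Add(cur, cur) // cur = 2^(i+1) * P
	}
	return pre
}

// ScalarMult computes s*P by adding T[i] for every set bit of s. Note that
// this variant branches on the scalar bits, so it is variable-time and only
// suitable for verifier-side (public) scalars.
func (pre *PrecomputedPoint) ScalarMult(s *edwards25519.Scalar) *edwards25519.Point {
	acc := edwards25519.NewIdentityPoint()
	for i, b := range s.Bytes() { // canonical little-endian encoding of s
		for j := 0; j < 8; j++ {
			if b>>uint(j)&1 == 1 {
				acc.Add(acc, pre.table[8*i+j])
			}
		}
	}
	return acc
}
```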

Why?

  • This is a valid & well-documented practice
  • The performance gap between “scalar-mult” & “scalar-base-mult” is a good predictor of the speedup; we could be looking at a 3x improvement in specific parts of the protocol

Why not?

  • Adapting a curve library to new precomputed points is not trivial; by default, the library only ships a precomputed table for its single generator
  • It requires copying the curve library into our repository (instead of importing the public package), which could hinder upgrades to future versions of that library

Conclusion: will do.

  4. Switch to a new curve library

Method

We currently use the Dero project’s ed25519 implementation. We can switch to filippo.io/edwards25519, which is a mirror of the code from golang/x/crypto.
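A small round-trip sketch of the candidate API (the library calls below exist in filippo.io/edwards25519; the wrapper function itself is only an example):

```go
package bulletproofs

import (
	"fmt"

	"filippo.io/edwards25519"
)

// ScalarMultBytes parses standard 32-byte encodings, multiplies, and
// re-encodes, showing the (almost) drop-in shape of the new API.
func ScalarMultBytes(scalarBytes, pointBytes []byte) ([]byte, error) {
	p, err := new(edwards25519.Point).SetBytes(pointBytes)
	if err != nil {
		return nil, fmt.Errorf("invalid point encoding: %w", err)
	}
	s, err := new(edwards25519.Scalar).SetCanonicalBytes(scalarBytes)
	if err != nil {
		return nil, fmt.Errorf("invalid scalar encoding: %w", err)
	}
	return new(edwards25519.Point).ScalarMult(s, p).Bytes(), nil
}
```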

Why?

  • (almost) Drop-in replacement with full compatibility
  • Instant 3-5% gain on every operation; some cases go further, e.g. 4-point multiplication is 13% faster
  • Easier future maintenance. Our currently imported code has readability issues (leftover TODOs, poor comments, a misleading package name: curve25519 when the code is for ed25519), is quite old, and has an unclear maintenance future.

Why not?

  • It still takes effort to rewrite wrappers, double-check encodings, etc.
  • We currently have some functions with custom, non-standard behavior, such as “HashToPoint”. Migrating those will be harder (worst case: we keep those specific functions in our own code, which is not that bad)

Conclusion: will do.

  5. Update batch-verification logic

Method

Theoretically, batch verification is fully prover-agnostic and curve-generator-agnostic: the verifier should be able to check one block’s worth of Bulletproofs with a single big multiplication, even for Confidential Asset transactions. A recent code & paper review indicates that this one-multiplication-per-block target is achievable with a substantial logic update in the batch-verify function.
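A rough sketch of the target shape, using the standard random-weighting trick; the verificationEquation struct and the reduction of each proof to a single equation are assumptions about the refactored code, not the current API (a real implementation would also merge the terms that share the fixed generators):

```go
package bulletproofs

import (
	"crypto/rand"

	"filippo.io/edwards25519"
)

// Each Bulletproof in the batch is assumed to be reduced beforehand to one
// equation of the form sum_j scalars[j]*points[j] == identity.
type verificationEquation struct {
	scalars []*edwards25519.Scalar
	points  []*edwards25519.Point
}

// batchVerify folds every equation in the block into one multi-scalar
// multiplication. Each equation is weighted by a fresh random scalar r so
// that an invalid proof cannot cancel out against another one.
func batchVerify(eqs []*verificationEquation) (bool, error) {
	var scalars []*edwards25519.Scalar
	var points []*edwards25519.Point
	for _, eq := range eqs {
		seed := make([]byte, 64)
		if _, err := rand.Read(seed); err != nil {
			return false, err
		}
		r, err := edwards25519.NewScalar().SetUniformBytes(seed)
		if err != nil {
			return false, err
		}
		for j := range eq.scalars {
			scalars = append(scalars, edwards25519.NewScalar().Multiply(r, eq.scalars[j]))
			points = append(points, eq.points[j])
		}
	}
	// One big multiplication for the whole block; the verifier holds no
	// secrets, so the variable-time version is safe here.
	sum := new(edwards25519.Point).VarTimeMultiScalarMult(scalars, points)
	return sum.Equal(edwards25519.NewIdentityPoint()) == 1, nil
}
```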

Why?

  • Vital speed improvement for our Confidential Assets’ range proof, which was previously believed to be incompatible with verifier-side batching

Why not?

  • Since this involves reworking the logic, we cannot reuse much of the current batch-verify code

Conclusion: will do.

  6. Implement Amortization for Bulletproofs

Method

Many Bulletproofs documents recommend a form of amortization that uses a “helper” to do some of the verifier’s computation in a trustless manner.

Why?

  • It promises a good speedup for the verifier with full compatibility

Why not?

  • It remains to be studied whether the newly introduced “helper” model is adequately trustless to be integrated seamlessly into the current code

Conclusion: TBD.

  7. Protocol design changes

Method

To achieve even better performance, a design change might be worth considering. The underlying problem remains the same: we need to prove that transaction outputs are in the uint64 range, avoiding a trusted setup if possible. Such an update would inevitably break compatibility, so its actual deployment would be relegated to a “privacy-v3”-style upgrade.

Conclusion: research opportunity for Q2 & Q3, 2022.

  8. Use variable-time computation

Method

The verifier in the Bulletproofs protocol uses no secret information, so it can call the variable-time versions of operations such as point multiplication.
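For instance, with filippo.io/edwards25519 the split would look like the sketch below (illustrative call sites, not the exact functions in our code):

```go
package bulletproofs

import "filippo.io/edwards25519"

// Prover side keeps the constant-time primitive: its scalars are secrets.
func proverCombine(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).MultiScalarMult(scalars, points)
}

// Verifier side only handles public proof data, so the variable-time variant
// (roughly 20-25% faster on multiplications) is safe to use.
func verifierCombine(scalars []*edwards25519.Scalar, points []*edwards25519.Point) *edwards25519.Point {
	return new(edwards25519.Point).VarTimeMultiScalarMult(scalars, points)
}
```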

Why?

  • 20-25% speed gain for multiplications
  • No change to Prove(); it keeps its constant-time logic

Why not?

  • Introduces a (low but non-zero) risk of new bugs

Conclusion: will do.

We developed an initial implementation and obtained rough speed benchmarks for these optimizations:

| #outputs | old | old + optimize | new | new + combine |
|---|---|---|---|---|
| 1 | 15,193,736 | 9,704,923 | 7,006,252 | 3,117,194 |
| 2 | 29,053,290 | 18,240,756 | 12,983,874 | 5,649,800 |
| 4 | 57,237,426 | 35,764,555 | 24,900,331 | 10,783,954 |
| 8 | 112,818,286 | 70,638,589 | 50,676,403 | 21,503,311 |
| 16 | 228,455,563 | 142,324,259 | 99,016,018 | 46,478,391 |
| 32 | 449,076,116 | 291,605,587 | 198,030,725 | 98,690,168 |


Durations are in nanoseconds (ns).

Implementation Remarks

Key takeaways

  • Changing the curve library provides a small, consistent 3-8% improvement across the board
  • VarTime-MultiScalarMult provides an effective speedup (25%) for verification
  • The verifier-time reduction from point precomputation is noticeable, though it is only applied to variable-time functions, since benchmarks showed no benefit for the constant-time ones
  • The old code performs many single-point multiplications during verification, so combining them cuts verifier time considerably
  • The old batch-verify, however, already calls MultiScalarMult for the most part, so that optimization’s effectiveness there is more limited
  • Applying batching to CA proofs provides the most speedup, as predicted
  • The protocol implementation could use a lot of refactoring & explanation

Verifier Benchmarks

To measure practical CPU time improvement to the software, we performed 3 benchmarks:

  • Single Verify time (ms): the time to verify one Bulletproof, parameterized by the output count; improvement leads to better mempool performance.
  • Batch Verify time for non-confidential-asset batches: the time to verify one Bulletproofs batch, parameterized by the batch length; improvement leads to faster block production & validation. The non-CA scenario represents the speedup in the “worst case”: a block containing exactly 2 PRV transfers.
  • Batch Verify time for half-confidential-asset batches: the time to verify one Bulletproofs batch, parameterized by the batch length; improvement leads to faster block production & validation. The half-CA scenario represents the speedup in the (near) average and best case: a block containing all token transactions (so half of all Bulletproofs are of the “CA” variant).

Benchmark results are below (durations are in milliseconds, ms).

  • Verify Single
| output count | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| old | 15.04 | 29.04 | 56.67 | 112.36 | 230.70 | 448.16 |
| new | 3.12 | 5.65 | 10.78 | 21.50 | 46.48 | 98.69 |
| gain (x ratio) | 4.83 | 5.14 | 5.25 | 5.23 | 4.96 | 4.54 |


  • Bulletproofs batch verify time (ms) with no CA

| batch length | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| old | 15.48 | 28.32 | 55.84 | 112.54 | 242.55 |
| new | 6.53 | 10.16 | 16.78 | 30.39 | 57.63 |
| gain (x ratio) | 2.37 | 2.79 | 3.33 | 3.70 | 4.21 |


  • Bulletproofs batch verify time (ms) with 1/2 CA rate

| batch length | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| old | 41.88 | 50.12 | 123.73 | 264.50 | 520.87 |
| new | 6.81 | 7.03 | 15.77 | 29.09 | 51.80 |
| gain (x ratio) | 6.15 | 7.13 | 7.84 | 9.09 | 10.06 |


The sharp difference in speed compared to the non-CA scenario is explained by the old implementation having no Bulletproofs batching support for confidential assets.

We can see that for typical chain usage, blocks enjoy 6x-10x faster Bulletproofs verification by updating to the new code; since Bulletproofs account for around 80% of verification time, this roughly translates to 3.0x-3.57x faster overall verification for those blocks.
