[Shipped] A Multi-View Solution for PBFT Protocol

What problem are you solving?

Describe the pain point. What are the shortcomings of current solutions?

Proof of Stake reaches consensus through a voting system implemented by a BFT protocol. Well-known BFT protocols such as Tendermint, HotStuff, and PBFT are bi-modal: the protocol typically consists of a simple normal path in which a leader makes proposals and everyone votes. When the normal path fails, the protocol switches to a much more complicated fallback mode, typically called a “view change”. The view change protocol must essentially embed a full-fledged consensus that offers both consistency and liveness. As a result, most of the complexity of classical protocols comes from the view change.

In this work, we propose a new approach that aims to overcome the limitations of the view change approach.

What is our solution?

Why is it a good idea? What’s new about what you’re doing? The more details, the better. Sketches, mockups, demos, prototypes, videos, pictures - it all helps community members get as excited as you are.

When network traffic peaks, some nodes may commit a block while other nodes fail to reach commitment. The idea is that nodes on the network maintain a multi-chain view, in which the committee chooses the longest branch as the finality chain. For block agreement, we use a modified PBFT protocol consisting of 3 phases.

modified PBFT

After the commit phase, the block is inserted into the multi-view graph, where an algorithm selects which block is final and which branch is the longest chain for the next propose phase. The detailed spec will be announced later in the comment section.
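A minimal sketch of the longest-branch idea described above, written in Go. It is not the team's implementation: the `Block` and `MultiView` types, their fields, and the tie-breaking rule (keep the current tip when two branches have equal length) are assumptions made for illustration, and the finality rule is left out because the spec had not been published at this point.

```go
package main

import "fmt"

// Block is a hypothetical, simplified block: only the fields needed
// to illustrate branch selection are included.
type Block struct {
	Hash     string
	PrevHash string
	Height   uint64
}

// MultiView is an illustrative store that keeps every committed block,
// including blocks on competing branches.
type MultiView struct {
	blocks map[string]Block
	best   string // hash of the tip of the longest known branch
}

// NewMultiView starts the graph from a genesis block.
func NewMultiView(genesis Block) *MultiView {
	return &MultiView{
		blocks: map[string]Block{genesis.Hash: genesis},
		best:   genesis.Hash,
	}
}

// AddBlock inserts a block whose parent is already known and updates
// the best tip when the new block makes its branch strictly longer.
// Ties keep the current tip (an assumption; the proposal does not say).
func (mv *MultiView) AddBlock(b Block) error {
	parent, ok := mv.blocks[b.PrevHash]
	if !ok {
		return fmt.Errorf("unknown parent %q", b.PrevHash)
	}
	if b.Height != parent.Height+1 {
		return fmt.Errorf("block %q has height %d, want %d", b.Hash, b.Height, parent.Height+1)
	}
	mv.blocks[b.Hash] = b
	if b.Height > mv.blocks[mv.best].Height {
		mv.best = b.Hash
	}
	return nil
}

// BestTip returns the block the next proposer would build on.
func (mv *MultiView) BestTip() Block { return mv.blocks[mv.best] }

func main() {
	mv := NewMultiView(Block{Hash: "g"})
	// Two competing blocks at height 1, then one branch grows.
	_ = mv.AddBlock(Block{Hash: "a1", PrevHash: "g", Height: 1})
	_ = mv.AddBlock(Block{Hash: "b1", PrevHash: "g", Height: 1})
	_ = mv.AddBlock(Block{Hash: "b2", PrevHash: "b1", Height: 2})
	fmt.Println(mv.BestTip().Hash)
}
```

Keeping both `a1` and `b1` in the map is the point of the sketch: a fork is stored rather than reverted, and the proposer simply follows whichever branch becomes longer.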

Who are you?

Introduce the project team members (schools, jobs, projects, github, twitter, blogs, etc.) and any similar work you’ve done.

We are 1 researcher and 5 engineers on the Incognito Scalability team (@0xkumi, @hungngo, @jason, @lam, @hyng, @duybao). Our research interests lie in building an efficient blockchain that can scale to thousands of nodes.

Why do you care?

Tell people why you’re passionate about your privacy project and committed to making it happen.

An efficient BFT protocol that takes the following conditions into account is still a challenge for the blockchain community:

  • The adversary can arbitrarily delay messages sent by honest processes.
  • The proposer can be Byzantine.
  • Network delay is unpredictable.

Designing a new consensus that handles these issues more efficiently than the view change approach is an interesting problem.

What’s your plan? What’s your schedule?

There are 4 parts to this plan:

  1. Build the consensus v2 (new consensus) module
  2. Refactor all other packages to use multiview
  3. Run multiview with consensus v1 (old consensus)
  4. Run multiview with consensus v2

Schedule (total 9 months), starting from Feb 15:

  • Write the detailed protocol spec
  • Implement the consensus module with the multi-view chain
  • Refactor other packages to use the multi-view chain
  • Test and launch on testnet/mainnet
  • Work on the Byzantine problem
  • Test and launch on testnet/mainnet

Milestone | Deadline
Development Consensus v2 | 15 March
Development MultiView Protocol | 30 March
Integration with other core features | 15 April
Deploy multiview for consensus bft-v1 on testnet | 30 April
Deploy multiview for consensus bft-v2 on testnet | 30 June
Deploy multiview for consensus bft-v1 on mainnet | 30 July
Deploy multiview for consensus bft-v2 on mainnet | 30 Oct

How do you validate your work?

In the current network, when a fork occurs, the forked block is reverted and the network can be stuck for a long time. With the multi-view protocol, we allow forked blocks to be inserted into the chain, and the block proposer selects the longest branch to continue from. Hence, a fork will not leave the network stuck. We expect to see no delay during fork situations.

What’s your budget?

A simple breakdown lets community members know you’ve thought things through and have a workable plan, so they can trust you to use funds wisely.

The project will be undertaken by 6 engineers for 10 months:

Feb-2020

Resource | Cost | Quantity | Monthly Cost
Incognito Protocol Researcher - @dungtran | 2000 PRV | 1 | 2000 PRV
Incognito Protocol Platform Engineer - @0xkumi | 2000 PRV | 1 | 2000 PRV
Incognito Engineer - @lam | 1000 PRV | 1 | 1000 PRV
Incognito Engineer - @hungngo (1/2 resource) | 1000 PRV | 1 | 500 PRV
Incognito Engineer - @duybao (1/4 resource) | 2000 PRV | 1 | 500 PRV
Incognito Engineer - @hyng (1/4 resource) | 1000 PRV | 1 | 250 PRV
Subtotal | | | 6250 PRV
Total (x 1 month) | | | 6250 PRV

March-April 2020

Resource | Cost | Quantity | Monthly Cost
Incognito Protocol Researcher - @dungtran | 2000 PRV | 1 | 2000 PRV
Incognito Protocol Platform Engineer - @0xkumi | 2000 PRV | 1 | 2000 PRV
Incognito Engineer - @lam, @hungngo, @hyng | 1000 PRV | 3 | 3000 PRV
Incognito Engineer - @duybao | 2000 PRV | 1 | 2000 PRV
Subtotal | | | 9000 PRV
Total (x 2 months) | | | 18000 PRV

May-July 2020

Resource | Cost | Quantity | Monthly Cost
Incognito Protocol Platform Engineer - @0xkumi | 2000 PRV | 1 | 2000 PRV
Incognito Engineer - @hungngo, @hyng | 1000 PRV | 2 | 2000 PRV
Incognito Engineer - @duybao | 2000 PRV | 1 | 2000 PRV
Subtotal | | | 6000 PRV
Total (x 3 months) | | | 18000 PRV

Is there an existing conversation around this idea?

To the best of our knowledge, the Streamlet protocol is one of the approaches attempting to solve this problem.

17 Likes

This project is fully funded, and moved to the category ‘work in progress’.

Funds have been sent for February, and will continue to be disbursed every subsequent week dependent on progress.

Excited to see this develop. Please share your February update with the community!

3 Likes

Progress update:
Until now:

  • we have implemented the MultiView Consensus Protocol
  • and refactored the consensus and sync modules

Update for 2-7 March:

  • we benchmarked the performance of the new database. The results are not good and we are investigating.
  • we started to restructure our blockchain package so that it uses multiview instead of a single state.
1 Like

Weekly update 8-13 March:

  • We finished restructuring the blockchain package, so that it can now run with the multiview protocol (branch /dev/multiview)

  • This week, we also fixed a database design error and the result is really good (a 10x reduction in space). Now, we are merging the new database implementation into the multiview code base (branch /dev/multiview-newdb). We expect to finish by the end of next week

4 Likes

Weekly update 14-20 March:

  • This week, we merged the new database implementation into the multiview code base and started to test the network on a local machine. We found some bugs in consensus v2 and are looking at them.
  • We also started to define a new code flow so that it is easier for new developers to comprehend our system quickly.

Next week goals:

  • Fix bugs
  • For now, validators can create blocks smoothly in the normal situation; however, on an unstable network, they can produce forked blocks. We will add logic to the sync process to support fork situations.
2 Likes

Please use the official branch for the newest dev code. To make sure you develop this proposal on the newest dev code, please merge this branch into yours and continue development once it merges with no conflicts.

Weekly update 21-27 March:

  • This week, we are gradually deploying the new database implementation to mainnet. There are some bugs and we are looking at them.
  • Regarding consensus v2, we continue to review and refactor the code so that it can run without errors in fork cases.
  • Sync block by hash has just been implemented, and we need more time to test it.

Next week will involve consensus v2 tests and deploying mainnet nodes.

1 Like

@0xkumi

We will not deploy this to mainnet next week. Next week, we can only try it on localhost, with some testers joining in, to make sure that forking issues do not happen and all of the old features still work well. My suggestion is that the team contact @khanhj and ask him for an automated test workaround on local for all systems before using dev-net for official testing.

1 Like

@thaibao
I’ll be happy to help!

3 Likes

Weekly update 28 March - 4 April (in milestone “Integration multiview with other core features”)

  • This week, much of our time went into fixing a mainnet bug related to dbv2.
  • Besides that, we reviewed the integration logic for other features, including slashing, pdex, and bridge. We added logic for feature block synchronization in some special cases. We also added an RPC to view the synchronization process.

The branch mutliview-newdb is ready to test. We hope the QC team can test all flows soon.

3 Likes

Weekly update 5 - 11 April (in milestone “Integration multiview with other core features”)

  • This week we continued to review the integration with other core features and are waiting for the QC team to test all flows.

  • In the meantime, we are preparing for the next milestone (“run multiview with consensus v1 on testnet”) on 30 April. We are running a fullnode and 8 shard nodes on testnet. These nodes are syncing old blocks and we are following their progress to capture any issues. So far, we have found and solved issues related to the database and a race condition.

Next week, we will watch for any issues related to the consensus protocol once these nodes have synced.

4 Likes

Hi @0xkumi,

Next week, the Portal v2 team needs a second review for the custodian reward update, so your team can use this time to review and test locally with the tester to make sure everything is ready for testnet. Your team does not need to worry about that: we can create a new branch from master-temp-B-deploy-db-v2 and merge your code into it. This way, we will have a good branch for testing all your features.

Sorry for this inconvenience

5 Likes

Weekly update 11 - 17 April (in milestone “Integration multiview with other core features”)

  • This week the QC team started to test our code with several flows (transaction, stake, bridge, pdex, fault tolerance). There is an issue with the bridge feature and we are looking into it (it may be caused by handling the wrong chain view). In addition, we found a loophole in the cross-chain interaction features (bridge, portal) when a chain revert happens.

  • We also added a time metric utility so that we can report how much time a feature's processing consumes (thanks to @hungngo).

  • We are implementing a feature on Highway that deliberately creates fork cases based on a predefined scenario (@trungtin2qn1 and @hyng will work on this for 2 weeks)
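For readers curious what a time metric utility like the one mentioned above might look like, here is a small Go sketch. It is only an illustration: the `Timings` type and its `Track` method are hypothetical names, not the utility the team actually wrote, and the step name `insertBlock` is made up.

```go
package main

import (
	"fmt"
	"time"
)

// Timings accumulates the total wall-clock time spent per named step.
// It is a hypothetical helper, not the team's utility.
type Timings map[string]time.Duration

// Track runs fn and adds its duration to the running total for name.
func (t Timings) Track(name string, fn func()) {
	start := time.Now()
	fn()
	t[name] += time.Since(start)
}

func main() {
	t := Timings{}
	// Simulate the same step being processed twice.
	t.Track("insertBlock", func() { time.Sleep(5 * time.Millisecond) })
	t.Track("insertBlock", func() { time.Sleep(5 * time.Millisecond) })
	for name, d := range t {
		fmt.Printf("%s took %v in total\n", name, d)
	}
}
```

A plain map like this is not safe for concurrent use; a real node would likely need a mutex or per-goroutine collection.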

Next week, we will focus on working with the interoperability team to solve the revert issue.

7 Likes

Hi team,

Sorry that the review of Portal v2 (a second review for a custodian reward issue) was late last week, which meant your team also merged Portal v2 late. After reviewing the Portal v2 code, I see that the Portal team relies heavily on looking up blocks by height
image

That means some of their business logic will conflict with the multi-view in this solution, because it will use the wrong final state of the view. I think you will need to work together on how to make it work here. As @0xkumi said, when a fork happens, Portal v2 needs to either wait until the final state is chosen or handle the revert without waiting for the final state. Please discuss this and hold a meeting to reach a decision.
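To make the concern above concrete, here is a small Go sketch of why a lookup by height alone becomes ambiguous once forks are kept. The `Index` type and its methods are hypothetical and are not taken from the Portal or multiview code; the sketch only assumes that each block records its parent's hash.

```go
package main

import "fmt"

// Block is a hypothetical, simplified block.
type Block struct {
	Hash     string
	PrevHash string
	Height   uint64
}

// Index keeps blocks by hash and lists every hash seen at each height.
type Index struct {
	byHash   map[string]Block
	byHeight map[uint64][]string
}

func NewIndex() *Index {
	return &Index{byHash: map[string]Block{}, byHeight: map[uint64][]string{}}
}

// Add stores a block, including blocks on competing branches.
func (ix *Index) Add(b Block) {
	ix.byHash[b.Hash] = b
	ix.byHeight[b.Height] = append(ix.byHeight[b.Height], b.Hash)
}

// AtHeight returns every known block hash at height h.
// With forks, this can return more than one candidate.
func (ix *Index) AtHeight(h uint64) []string { return ix.byHeight[h] }

// OnBranch walks parent links from tip and returns the single block
// at height h on that branch, if there is one.
func (ix *Index) OnBranch(tip string, h uint64) (Block, bool) {
	cur, ok := ix.byHash[tip]
	for ok && cur.Height >= h {
		if cur.Height == h {
			return cur, true
		}
		cur, ok = ix.byHash[cur.PrevHash]
	}
	return Block{}, false
}

func main() {
	ix := NewIndex()
	ix.Add(Block{Hash: "g"})
	ix.Add(Block{Hash: "a1", PrevHash: "g", Height: 1})
	ix.Add(Block{Hash: "b1", PrevHash: "g", Height: 1})
	ix.Add(Block{Hash: "b2", PrevHash: "b1", Height: 2})

	fmt.Println(len(ix.AtHeight(1))) // two candidates at height 1
	b, _ := ix.OnBranch("b2", 1)
	fmt.Println(b.Hash) // the height-1 block on b2's branch
}
```

In this sketch, logic keyed only on height 1 cannot tell `a1` from `b1`; it has to name the branch (a tip or a block hash) it is reasoning about, or wait until one branch is final.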

This week, I will continue reviewing the multi-view solution code (without the Portal v2 merge). I will give feedback later. Sorry for the inconvenience.

Thanks

5 Likes

thanks @thaibao for the notice. Other features may encounter this problem as well, so plz double-check this whenever you merge code. tks

2 Likes

Hi @team

I’m doing a pre-review based on this pull request. With 158 files and 188 commits, I think I will need to spend quite a few days on it :smile:. We know that this is a pre-review of the multi-view solution code, not the final code, because you still need to merge Portal v2 for the finalized pull request. Here are some review comments for a first round of fixes:

1/ We can remove case data := <-cm.data here, right?
image
because this function is no longer used
image

2/ shardID comes from RPC in some cases, so I think we need to check the slice index and guard against a nil pointer exception
image
because if the RPC input is invalid, the node will crash

3/ This function always returns 0 -> remove it?
image
and review where we use it
image

4/ Singleton in beacon pool
image
We can use blockchainObj as a field in the BeaconPool singleton, right? Because every node also needs to get the beacon, the singleton beacon pool will hold this blockchainObj and shardpool can reuse it

5/ Many interfaces of the server object are no longer used
image
image
image
can we remove all of them?

6/ I think these 3 functions are not used, right?
image

7/ Remove this condition
image
image
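As an illustration of the kind of check item 2/ asks for, here is a hedged Go sketch. The names `ShardView`, `ErrInvalidShard`, and `bestHeight` are hypothetical and do not come from the pull request; the sketch only assumes that per-shard state is held in a slice indexed by a `byte` shard ID received from a caller.

```go
package main

import (
	"errors"
	"fmt"
)

// ShardView is a hypothetical stand-in for per-shard chain state.
type ShardView struct{ BestHeight uint64 }

// ErrInvalidShard is returned instead of letting the node panic.
var ErrInvalidShard = errors.New("invalid shard ID")

// bestHeight validates a caller-supplied shard ID before indexing.
// A byte cannot be nil, but it can exceed the number of shards,
// and the slot it points at may not be initialised yet.
func bestHeight(views []*ShardView, shardID byte) (uint64, error) {
	if int(shardID) >= len(views) {
		return 0, fmt.Errorf("%w: %d (have %d shards)", ErrInvalidShard, shardID, len(views))
	}
	v := views[shardID]
	if v == nil {
		return 0, fmt.Errorf("%w: shard %d has no view yet", ErrInvalidShard, shardID)
	}
	return v.BestHeight, nil
}

func main() {
	views := []*ShardView{{BestHeight: 42}, nil}
	for _, id := range []byte{0, 1, 9} {
		h, err := bestHeight(views, id)
		fmt.Println(id, h, err)
	}
}
```

Whether such a check belongs in the RPC handler or in the function being called is a design choice for the team; the sketch shows the second option so the function stays safe regardless of the caller.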

9 Likes
  • The RPC must check this parameter; not doing so would be very careless
  • Because the parameter is a byte, a value type ranging from 0 to 255, it can’t be nil in this case. There is also a condition checking whether BestView exists before getting it. So I think this function is safe
3 Likes

As I recall, beaconpool/shardpool/crossshardpool are outdated and no longer in use. @0xkumi plz verify my answer

3 Likes

We will not use singleton pool in this version.

2 Likes

Weekly update 18 - 24 April (in milestone “Run multiview for consensus v1 on testnet”)

  • After discussing with the pdex bridge team, we decided that the current version is safe to deploy. As the fixed proposer mechanism will prevent forks, at the current stage we will not focus much on revert cases.
  • @hungngo worked on partitioning the database per chain (shard, beacon), so that we can easily back up a specific chain.
  • We continued to implement a feature on Highway that deliberately creates fork cases based on a predefined scenario.
  • This week, we reviewed and merged the new portal features into our code. We also fixed some minor bugs related to concurrency and sinker.

Next week, we want to test the code again with other features before deploying to testnet.

5 Likes