[Shipped] Reduce Blockchain size by 50%

What privacy problem are you solving?

Incognito Chain has consumed a lot of storage so far (over 100GB in 4 months). At this rate, the typical physical node device (500GB SSD) would run out of storage in about a year and a half, and virtual node expenses would increase substantially. It is thus essential to reduce the current size of Incognito Chain by at least 50% (to about 50GB at the time of writing).
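To make the estimate above concrete, here is a minimal sketch of the arithmetic, assuming storage grows linearly at the rate observed so far (100GB over 4 months). The function name and the "half size, half growth" scenario are illustrative assumptions, not measurements.

```go
package main

import "fmt"

// monthsUntilFull estimates how many months remain before a disk of
// capacityGB fills up, given current usage and a constant monthly growth.
// It returns -1 when there is no growth (the disk never fills in this model).
func monthsUntilFull(capacityGB, usedGB, growthPerMonthGB float64) float64 {
	if growthPerMonthGB <= 0 {
		return -1
	}
	remaining := capacityGB - usedGB
	if remaining <= 0 {
		return 0
	}
	return remaining / growthPerMonthGB
}

func main() {
	// Figures from the post: about 100GB after 4 months, 500GB SSD.
	growth := 100.0 / 4.0 // 25GB per month, assuming linear growth

	fmt.Printf("current format: %.0f months left\n", monthsUntilFull(500, 100, growth))

	// Hypothetical scenario: the same history stored at half the size
	// and growing half as fast after the change.
	fmt.Printf("after 50%% cut: %.0f months left\n", monthsUntilFull(500, 50, growth/2))
}
```

Under these assumptions, halving the footprint roughly doubles the time a 500GB device can keep up, which is the motivation for the 50% target.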

The Incognito Chain has grown extremely fast. After diving into the source code and architecture design, we identified a lot of duplicated and outdated data.

At the moment, Incognito Chain stores all information in a simple key-value store, an approach taken by blockchains such as Bitcoin and Litecoin. To process a new block, Incognito Chain loads all state data (the data produced by processing blocks) into RAM. As state data gets bigger, this will eventually become impossible. Incognito is a UTXO-based blockchain, but its privacy features make its data much bigger than Bitcoin’s or Litecoin’s. A new database design is essential. Here is a summary of problems with the current database design:

  • Lacks support for Incognito’s consensus.
  • Consumes a lot of storage.
  • Unable to handle state data in a forked situation.
  • No atomicity or rollback.

What is the solution?

We found Ethereum’s approach to database design handy. It allows us to:

  • Integrate with Incognito’s consensus
  • Easily handle a forked situation
  • Provide rollback ability and atomicity
  • Reduce storage consumption

This approach has been battle-tested and proven effective over almost 6 years, and will save us a great deal of R&D time at this critical juncture.
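As a rough illustration of why this style of design helps with forks and rollback, here is a minimal sketch of a content-addressed, copy-on-write state store. It is not Incognito’s or Ethereum’s actual code: Ethereum uses a Merkle Patricia Trie with Keccak-256, while this sketch swaps in a simple binary trie over SHA-256 to stay short. All type and function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash [32]byte

var empty hash // the zero hash stands for an empty subtree

// node is either a leaf (key and value set) or a branch (children set).
type node struct {
	leaf        bool
	key, value  string
	left, right hash
}

// store maps a node's hash to the node, so identical subtrees are stored once.
type store map[hash]node

func (s store) put(n node) hash {
	h := hash(sha256.Sum256([]byte(fmt.Sprintf("%v|%q|%q|%x|%x", n.leaf, n.key, n.value, n.left, n.right))))
	s[h] = n
	return h
}

// bit returns the bit of h at the given depth, most significant bit first.
func bit(h hash, depth int) byte { return (h[depth/8] >> (7 - uint(depth%8))) & 1 }

// set returns the root of a NEW state version; the old root stays readable.
func (s store) set(root hash, key, value string) hash {
	return s.insert(root, hash(sha256.Sum256([]byte(key))), key, value, 0)
}

func (s store) insert(root, kh hash, key, value string, depth int) hash {
	if root == empty {
		return s.put(node{leaf: true, key: key, value: value})
	}
	n := s[root]
	if n.leaf {
		if n.key == key {
			return s.put(node{leaf: true, key: key, value: value})
		}
		// Split: hang the existing leaf under a fresh branch, then descend.
		branch := node{}
		if bit(hash(sha256.Sum256([]byte(n.key))), depth) == 0 {
			branch.left = root
		} else {
			branch.right = root
		}
		n = branch
	}
	if bit(kh, depth) == 0 {
		n.left = s.insert(n.left, kh, key, value, depth+1)
	} else {
		n.right = s.insert(n.right, kh, key, value, depth+1)
	}
	return s.put(n)
}

// get reads a key from the state version identified by root.
func (s store) get(root hash, key string) (string, bool) {
	kh := hash(sha256.Sum256([]byte(key)))
	for depth := 0; root != empty; depth++ {
		n := s[root]
		if n.leaf {
			if n.key == key {
				return n.value, true
			}
			return "", false
		}
		if bit(kh, depth) == 0 {
			root = n.left
		} else {
			root = n.right
		}
	}
	return "", false
}

func main() {
	s := store{}
	base := s.set(s.set(empty, "alice", "10"), "bob", "5")
	forkA := s.set(base, "alice", "7") // one candidate block
	forkB := s.set(base, "alice", "9") // a competing block at the same height
	for _, r := range []hash{base, forkA, forkB} {
		v, _ := s.get(r, "alice")
		fmt.Printf("root %x: alice=%s\n", r[:4], v)
	}
}
```

Each block only needs to remember one root hash. Competing forks are just different roots over mostly shared nodes, and rolling back means pointing at an earlier root again.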

Which solutions do people resort to because this doesn’t exist yet?

Here are a few other solutions we considered and found wanting:

  • Process state data after blocks are finalized. This would harm UX and cause large delays.
  • Organize the key-value storage schema to handle forked situations. This seems like a good solution, but handling forks and getting rid of outdated data at such a low level is perhaps overly sophisticated.

Who are you?

  • I’m @hungngo from the Incognito Core team. I’ve been a researcher in the blockchain space for some time. I find many new emerging blockchains interesting, but the bulk of my work so far (apart from Incognito!) is in Ethereum and Bitcoin. I’ve been with Incognito for the last 15 months building out its consensus. I spend most of my time working on the database.

Why do you care?

  • Reducing the size of the Incognito Chain is essential to growing our validator community and achieving decentralization. The lower the costs, the lower the requirements, the better.
  • A good database design is integral to the continued development of Incognito’s consensus.

What’s your plan? What’s your schedule?

Development has been ongoing for a while (since 15 Nov 2019). Here is what I’ve achieved so far, and my plan going forward:

| Step | Task | ETA (days) | Actual Begin Date | Actual End Date |
|---|---|---|---|---|
| 1 | Research and comprehend Ethereum database design | 15 | 15 Nov 2019 | 30 Nov 2019 |
| 2 | Build prototype according to Incognito Chain database schema | 7 | 1 Dec 2019 | 7 Dec 2019 |
| 3 | Build Incognito class diagram, link. | 15 | 8 Dec 2019 | 24 Dec 2019 |
| 4 | Implementation and unit testing | 30 | 25 Dec 2019 | 21 Jan 2020 |
| 5 | Integration with consensus v1 | 30 | 3 Feb 2020 | 8 Mar 2020 |
| 6.1 | Review and testing with consensus v1 | 60 | 10 Feb 2020 | 27 Mar 2020 |
| 6.2 | Deploy new database with consensus v1 | 21 | 21 Mar 2020 | estimated 15 Apr 2020 |
| 7 | Integration with new code base | 7 | 21 Feb 2020 | 26 Feb 2020 |

Blockchain size will be reduced once Step 6 is completed. Steps 7 and 8 can run at the same time as the previous six steps (developing for both versions simultaneously).

What’s your budget?

| Resource | Cost | Quantity | Monthly Cost |
|---|---|---|---|
| Incognito Protocol Engineer @hungngo (1/2 resource of this job) | 1,000 PRV | 1 | 500 PRV |
| TOTAL (x 5 months) | | | 2500 PRV |

Is there an existing conversation around this idea?

Our validators have been especially concerned about this recently – both physical node owners and virtual node operators. These proposed actions will improve the validator experience.

Is there anything else you would like the community to know?

If you have experience in database design for blockchain products, I would love to hear your thoughts.


In the schedule, for steps 7 and 8:
Can you give a more detailed timeline estimate for them?

Thanks

We are working on the consensus v2 design. I will evaluate the time and update later. Steps 7 and 8 may change in the future.


It’s great to see this is being worked on. Any ETA on when the size will be reduced? What happens to the existing data already stored on the pNode: does it get deleted off the pNode’s HD, or is the HD space used based on how many GB the network is?

Thank you

Development is going as planned, and the feature is already on testnet. We are testing it and expect to release it in late March 2020. We haven’t decided the exact date yet.

Great question. As you know, the team separates data into two types: state data and block data.

  • Block data: everything contained in a block. This data is immutable, and it is the most important thing that makes a blockchain what it is.
  • State data: what we get after processing block data. This data is extracted and inferred from block data.

Clearly, block data is immutable, and we have no way to reduce it. But state data, the heaviest part, will be cut by roughly 50%, as we expect.
We will help users delete all their current data and re-insert every block from the beginning. In effect, pNodes will rejoin the network with state data in the new, much lighter format. Meanwhile, block data does not change. In conclusion, nothing happens to the block data, and your disk usage will be cut down.
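The idea that state data can always be thrown away and rebuilt from block data can be sketched as follows. This is a simplified illustration only: the `Tx`, `Block`, and `State` types are hypothetical, and it uses plain account balances rather than Incognito’s privacy-preserving UTXO model.

```go
package main

import "fmt"

// Tx and Block are hypothetical, simplified stand-ins for real block data.
type Tx struct {
	From, To string // an empty From means newly minted coins
	Amount   uint64
}

type Block struct {
	Height uint64
	Txs    []Tx
}

// State is derived data: balances obtained by processing blocks in order.
type State map[string]uint64

// apply processes one block against the state.
func (s State) apply(b Block) error {
	for _, tx := range b.Txs {
		if tx.From != "" {
			if s[tx.From] < tx.Amount {
				return fmt.Errorf("block %d: insufficient balance for %s", b.Height, tx.From)
			}
			s[tx.From] -= tx.Amount
		}
		s[tx.To] += tx.Amount
	}
	return nil
}

// rebuild starts from an empty state and replays every block from the
// beginning. The blocks themselves are never modified.
func rebuild(blocks []Block) (State, error) {
	s := State{}
	for _, b := range blocks {
		if err := s.apply(b); err != nil {
			return nil, err
		}
	}
	return s, nil
}

func main() {
	blocks := []Block{
		{Height: 1, Txs: []Tx{{To: "alice", Amount: 10}}},
		{Height: 2, Txs: []Tx{{From: "alice", To: "bob", Amount: 4}}},
	}
	state, err := rebuild(blocks)
	if err != nil {
		panic(err)
	}
	fmt.Println("alice:", state["alice"], "bob:", state["bob"])
}
```

Because replaying the same blocks always yields the same state, a node can delete its old state database and re-derive it in a new on-disk format without touching block data.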

Could you be more specific about this question? Thank you.

That’s great news to hear. Thanks for answering my questions. What I meant by my last question is the space on the pNode. Right now, 100GB of the SSD is being used. When the network size gets cut down, does the amount used on the pNode’s SSD get reduced as well?

Yes, absolutely. The whole purpose is to cut down SSD storage on every pNode and virtual node.

REPORT 22 Feb 2020 - 28 Feb 2020
Current Step:

Finish:

  • Merge code from batching tx feature
  • Local testing

In progress:

  • Testing on devnet, estimated end date 2 March 2020

To do next:

  • Waiting to merge Incognito Mode for Smart Contract feature
  • Add more unit tests

Great result, Hung!

As this project was funded last year and is now a work in progress, I’ve moved it to the right category.

Funds have been sent from November to February, and will continue to be disbursed every subsequent week dependent on progress.

REPORT 2 Mar - 7 Mar 2020
Current Step:

Finish:

  • Merge code from Incognito Mode for Smart Contract feature
  • Merge code from highway-v2
  • Deploy into testnet and measure blockchain size

In Progress:

  • This week, we deployed the code to testnet and measured the blockchain size. Unfortunately, something went wrong: the blockchain size seems to grow faster. In detail, as new blocks are added, a block at a greater height consumes more space than a block at a lower height. We are analyzing and reviewing our implementation to find the cause.
  • Analysis and review: estimated end date 13 March 2020

To do next:

  • Once the analysis is done, we will decide what to do next

Note: In my opinion, the new technique is fine. I think we misused something and made a mistake in the implementation phase.


Good luck solving this. Hopefully it is an obvious error that can be easily found and fixed.

The good news is that we have detected the problem and are moving into the debug phase.


REPORT 8 Mar - 13 Mar 2020
Current Step

Finish:

  • Fix last week’s data consumption problem
  • Upgrade key-value schema
  • Sync Testnet and Mainnet

In Progress:

  • Sync Mainnet and continue review and evaluation
  • Prepare Mainnet deployment strategy

Result:
This week the team got great results in reducing the database size:
a. Testnet:

  • Fullnode: From 70GB to 15GB
  • Shard node (one shard): From 63GB to 4.7GB

b. Mainnet:

  • Beacon node: From 140GB to 2GB
  • Shard node: in progress
  • Fullnode: in progress

In general, we have reduced the blockchain size by over 10 times.
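For readers who want the per-node numbers, here is a small sketch that recomputes the reduction factor and percentage saved from the figures reported above. Only the three completed measurements are included; the type and function names are illustrative.

```go
package main

import "fmt"

// measurement holds one before/after database size from the report.
type measurement struct {
	node              string
	beforeGB, afterGB float64
}

// reductionFactor says how many times smaller the new database is.
func reductionFactor(m measurement) float64 { return m.beforeGB / m.afterGB }

// savedPercent says what share of the original size was removed.
func savedPercent(m measurement) float64 { return (1 - m.afterGB/m.beforeGB) * 100 }

func main() {
	report := []measurement{
		{"testnet fullnode", 70, 15},
		{"testnet shard node (one shard)", 63, 4.7},
		{"mainnet beacon node", 140, 2},
	}
	for _, m := range report {
		fmt.Printf("%-32s %5.1fx smaller, %4.1f%% saved\n",
			m.node, reductionFactor(m), savedPercent(m))
	}
}
```

The factor varies a lot by node type, so the mainnet shard node and fullnode results, still in progress at the time of this report, may land anywhere in that range.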


Wow, this looks fantastic. Thanks for your hard work, and genius brains to make it work.

Hi @hungngo! Any updates for the past week ?)

Hello @andrey, I made a typing mistake. The previous week’s update is already above. Please check it out, thanks.

Once this is complete, I vote that @hungngo deserves a unique incognito badge. “db Crusher” comes to mind.


I love that badge josh lol


Update 18 March 2020:
Over the last two days we encountered some minor issues. These issues only affect nodes at run time under some rare circumstances. The issues are:

  • When we insert a new block, we have to make sure the procedure is atomic. So if a node fails to insert a new block, it must revert its current state to a stable snapshot (which is its previous state).
  • Our system has multiple shard chains and a beacon chain, which is very complicated. So we check and make sure each shard chain gets the right beacon data at the right time.

We have already solved these problems. Now we are continuing the testing phase and syncing the mainnet fullnode. By the end of this week, we may achieve:

  • Pass all QC test cases on devnet and testnet
  • Sync mainnet fullnode with new database version
  • Update new database version to shard5 testnet

Notice
At the current time, the choice of whether or not to update to the new database version DOESN’T affect our network.



@hungngo @Josh_Hamon @ning Database Crusher badge?
