Introducing new database modes

0xkumi · 14 April 2022 05:36

In the next release, we will support several modes that aim to reduce an Incognito node’s disk usage.

1/ FFStorage mode (--ffstorage)

In previous versions, blocks and transactions were not optimally encoded, which significantly increased storage usage. We used LevelDB to store them, as well as all other blockchain data, which caused heavy CPU and RAM consumption every time LevelDB compacted stale data. In this version, we encode blocks and transactions with protobuf, reducing the storage they require. We also introduced a flat-file system to replace LevelDB for storing block data, so that other data types do not affect the block saving process. Unfortunately, to use this version, you need to delete the old data and synchronize again. Synchronization in this version is also faster than in the old version.
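As a rough illustration of the flat-file idea, here is a minimal Python sketch of an append-only block file with length-prefixed records. It is not the Incognito implementation: the names FlatFileStore and demo_flat_file, the record layout, and the in-memory offset index are assumptions made for this example, and the byte strings stand in for protobuf-encoded blocks.

```python
import os
import struct
import tempfile


class FlatFileStore:
    """Hypothetical append-only store: 4-byte length prefix, then the payload."""

    def __init__(self, path):
        self.path = path
        self.offsets = []  # offsets[i] = byte position of record i (memory only)
        open(path, "ab").close()  # create the file if it does not exist yet

    def append(self, payload: bytes) -> int:
        """Write one record at the end of the file and return its index."""
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">I", len(payload)))
            f.write(payload)
        self.offsets.append(offset)
        return len(self.offsets) - 1

    def read(self, index: int) -> bytes:
        """Seek straight to the record; no other record is read."""
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index])
            (length,) = struct.unpack(">I", f.read(4))
            return f.read(length)


def demo_flat_file():
    """Store two stand-in block payloads and read them back."""
    with tempfile.TemporaryDirectory() as d:
        store = FlatFileStore(os.path.join(d, "blocks.dat"))
        first = store.append(b"encoded block 1")
        second = store.append(b"encoded block 2, slightly longer")
        return store.read(first), store.read(second)
```

Because records are only ever appended, writing a block never rewrites older data, which is the property that keeps block storage independent of key-value compaction in this sketch.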

2/ Batch-Commit mode (--sync-mode batch commit)

We use Ethereum’s StateDB to maintain our feature databases (transaction, consensus). It allows Incognito to maintain state at any checkpoint, which helps us debug, change views, and look up past states. Ethereum’s StateDB uses a Modified Merkle Patricia Trie (MPT) to organize data objects as a tree, which has some limitations:

When the number of records reaches a certain threshold, LevelDB compaction is triggered frequently. The MPT generates a large number of branch nodes, which are used to construct the tree. These nodes hold only routing data, not actual data. Creating a record for each node not only consumes database space but also slows operations down.
Deleting these records requires an extensive offline process.

To reduce the number of records persisted to disk, we implemented a batch commit mechanism. Once a block is inserted, the node does not commit trie data to disk immediately. Instead, it commits trie data to disk once per batch of blocks. By committing to disk at the last insertion, we can build only one branch node for two or more leaf nodes with the same prefix. To put it another way, we wait until a large number of nodes have been inserted before committing everything to disk. Until then, nodes are held only in memory. This mode is similar to Ethereum’s full-sync mode.
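To make the saving concrete, here is a toy Python model that counts how many node records get persisted under per-block commits versus one batched commit. It is a simplification and not the real MPT: every key prefix is treated as one node, every commit is assumed to write a new record for each touched node, and the function names and sample keys are made up for the example.

```python
def dirty_nodes(keys):
    """Return every path prefix touched by the given keys (the root is '')."""
    touched = set()
    for key in keys:
        for depth in range(len(key) + 1):
            touched.add(key[:depth])
    return touched


def records_written(blocks, batch_size):
    """Count node records persisted when committing once per `batch_size` blocks."""
    total = 0
    for start in range(0, len(blocks), batch_size):
        batch = blocks[start:start + batch_size]
        batch_keys = [key for block in batch for key in block]
        # Within one commit, a node shared by several keys is written once.
        total += len(dirty_nodes(batch_keys))
    return total


# Four blocks, each changing one key; three keys share the prefix "ab".
blocks = [["ab12"], ["ab34"], ["ab56"], ["cd78"]]
per_block = records_written(blocks, batch_size=1)
batched = records_written(blocks, batch_size=4)
```

In this toy run, committing after every block writes 20 records, while a single batched commit writes 13, because the root and the shared "a" and "ab" nodes are persisted once instead of once per block.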

3/ Lite mode (--sync-mode lite)

The Ethereum-based StateDB implementation allows clients to revert state to specific points in the past. However, some databases, such as the transaction DB, do not need this functionality for node operation. With an MPT, the overhead of the internal data structure makes disk size grow non-linearly. In addition, retrieving a value from an MPT requires several disk I/O operations.

To support users running low-cost nodes, we implemented a hybrid data structure called lite-statedb. In this hybrid mechanism, finalized states are stored in a key-value DB, and unfinalized states (multiview) are kept in a linked-list data structure.

By persisting only the final value, this method adds no such overhead to the database, but the tradeoff is that the node cannot retrieve past state.
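The hybrid layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lite-statedb code: the View and LiteStateDB names are hypothetical, a dict stands in for the on-disk key-value DB, and each unfinalized view simply stores its changes plus a link to its parent.

```python
class View:
    """One unfinalized block's state changes, linked to its parent view."""

    def __init__(self, changes, parent=None):
        self.changes = dict(changes)
        self.parent = parent


class LiteStateDB:
    """Hypothetical hybrid store: dict for finalized state, linked views on top."""

    def __init__(self):
        self.finalized = {}  # stands in for the on-disk key-value DB

    def get(self, view, key):
        """Walk the linked list from newest to oldest, then fall back to the DB."""
        node = view
        while node is not None:
            if key in node.changes:
                return node.changes[key]
            node = node.parent
        return self.finalized.get(key)

    def finalize(self, view):
        """Persist `view` and its ancestors, keeping only the final values."""
        chain = []
        node = view
        while node is not None:
            chain.append(node)
            node = node.parent
        for node in reversed(chain):  # oldest first, so newer values win
            self.finalized.update(node.changes)
        # The finalized views are no longer needed in memory.
        view.changes = {}
        view.parent = None


db = LiteStateDB()
v1 = View({"alice": 10})
v2 = View({"alice": 7, "bob": 3}, parent=v1)
v3 = View({"bob": 5}, parent=v2)
db.finalize(v2)
```

After `db.finalize(v2)`, the key-value store holds only alice = 7 and bob = 3. The intermediate value alice = 10 from `v1` is gone, which mirrors the tradeoff described above, while `v3` still overrides bob with 5 until it is finalized too.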

Result

Here are the results of our benchmark.

Old: the current version of the code
Archive: the default mode (commit every block), but with ffstorage enabled
Batch: batch-commit mode with ffstorage enabled
Lite: lite mode with ffstorage enabled

From the results, we see that the beacon DB size is affected only by ffstorage mode (a 15% reduction), as it does not have any stateDB that can run in batch-commit or lite mode. On the other hand, the shard DB size is significantly reduced, as its transaction DB can run in batch-commit or lite mode.

For the full node database, all shard chains shrink by a factor of 3 with batch-commit mode and by a factor of 5 with lite mode. Overall, a full node can reduce its data size to 50 - 70%.

Note on release

Although these features have been tested in several environments, they are not battle-tested yet. To reduce risk to the whole blockchain, we will not apply them to the production branch yet, but to the beta branch instead. The release has 3 phases:

Core & community volunteers (2 weeks): the core team will run several nodes with the new modes, and we welcome any volunteers from the community to run the new modes with us
Partial foundation nodes (2 weeks): a portion of the fixed nodes will be upgraded to the new modes
Full upgrade: these features will be merged and applied in the production code.

We recommend that node operators run at least 1 node on the production branch. In case of a bug or DB corruption on the beta branch, you can clone and revert to the production branch.

The branch and docker tags for this feature will be announced later.

Ducky · 14 April 2022 05:11

@SPAddict25 since you previously asked for more info about blockchain storage reduction, please check out this post.

SPAddict25 · 14 April 2022 06:12

Omg TLDR. Went straight to the charts. I like the green line. Green = good right?

Happy to help out as long as instructions are ELI5 compliant

khanhj · 19 April 2022 08:13

Hey guys,
We have pushed out instructions for running the new DB modes

fredlee · 19 April 2022 08:33

Can you explain the practical difference between lite, batch, and archive? Why should you not only run lite? Can you run a fullnode? Retrieve account balance? Create transactions?

abduraman · 19 April 2022 08:42

Here is my understanding.

lite is suitable only for validators or nodes that need only the current data, since you cannot access past data.

batch is a delayed database node. It is suitable for both validators and fullnodes that do not need the “recent” data. Maybe for the owners of analytics fullnodes or validators wanting to query their past earnings etc.

archive is the current mode, with all restrictions unlocked. The only difference is that it uses the new ffstorage feature.

Josh_Hamon · 21 April 2022 17:10

What split of modes is desired by the dev team? Are they expecting most to switch to Lite mode? With the other modes being for developers?

0xkumi · 24 April 2022 21:15

Batch is the normal mode for fullnodes and validators
Archive is for backup and debugging
Lite is for low-end, validator-only nodes. Although Lite could be used for a fullnode, we don’t recommend this kind of usage.

Linnovations · 26 April 2022 19:17

Hi @0xkumi, thanks for putting together this post.

I’m not a technical person and in fact my very 1st encounter with Linux was to install Ubuntu in order to run a Virtual Node to support this important project, I shared my experience here - Step-By-Step: How I built a Virtual Node on World's Smallest Mini PC

So, when you mention database modes, sorry to say, I’m totally lost. But from the charts I can tell the changes are necessary and good.

Do you recommend Noobs (like me) who are not so technically savvy sit on the sidelines and just wait until the Core team completes testing?

fredlee · 26 April 2022 21:20

I’m currently testing all database modes. I’m not done with my tests yet, but I’ll publish the results when I am. What I can say so far is that all modes have different requirements and strengths. I’m done testing batch mode: it requires less disk space than the old archive mode but a lot more memory. We’re talking 6 times more and up. The same seems to go for all ffstorage modes, which cut disk space but require about twice the memory.

radonm · 8 August 2022 17:49

So what would you say is the best mode for validators? How do I shrink disk usage, and how much more RAM would that need?

Jared · 8 August 2022 17:58

We are looking into discontinuing these database modes in favor of state pruning. Right now the devs are working on state pruning the bootstrap data. Once complete, node operators will only need to bootstrap their data to take advantage.

The best way to shrink disk usage is hard linking; there is a community-made script to assist with this:

duc · 8 August 2022 23:24

Hey @radonm, please look into the “How to prune data” section of the topic to shrink disk usage via an RPC call.
Note that pruning is a CPU-intensive process, so if you are running multiple nodes on the same server, be careful with CPU usage; a pruning process commonly takes about 30-40% CPU.

radonm · 9 August 2022 14:58

I have that, but some time after hard linking, storage grows quite a bit again, so it has to be redone, which requires downtime. It seems state pruning will be a live maintenance operation.