(Kind of solved) Problem starting multiple nodes at the same time

The new node database mode being worked on should help drastically lower resource usage.

I already spoke to the dev in charge of this and we will try to offer a bootstrap version so node operators can quickly change over.

3 Likes

A bootstrap would be greatly appreciated

2 Likes

It’s unfortunately a hardware limitation on the 2011 model. Like you were hinting when we talked, it looks like the memory is the biggest issue, but it also goes hand in hand with storage reaching its limit. So basically, with the beacon and shard growth over the past year, I finally hit an upper limit of what the hardware could handle.

Kind of, I use ZFS and snapshot clones. I also experimented with dedupe between nodes, but that does not work because each node’s database files end up with unique data.
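For reference, the basic snapshot-clone pattern looks roughly like this (the dataset names are just examples):

    # Snapshot the data of an already-synced node (example dataset names).
    zfs snapshot tank/incognito/node01@synced
    # Clone it as the starting data for a new node. The clone shares blocks
    # copy-on-write with the snapshot, so it uses almost no extra space at first.
    zfs clone tank/incognito/node01@synced tank/incognito/node02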

Looking forward to the new improvements. What I have noticed with the current database design is that if I bootstrap from one node to another, when I then boot the new node, it rewrites about 30GB of data during the integrity check at the start. I imagine it updates some data and therefore rewrites the database pages? (just guessing here). Do you know if this is different in the new storage design? I imagine that if it can avoid updating or rewriting the big immutable block data, cloning or hard linking should work even better.
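As a rough sketch of the hard-linking idea (the paths are made up), cloning a node’s data directory without duplicating file contents could be as simple as:

    # -a preserves attributes, -l makes hard links instead of copying data,
    # so files that never change again cost no extra disk space.
    cp -al /data/node01/mainnet /data/node02/mainnet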

Yes, that is correct. During normal operation the CPU load is very light because of the relatively slow rate of blocks. You really only strain the CPU during shard resyncing or node restarts.

1 Like

Have you tried using this method for hard links? It was designed and coded by a community member:

I use it to manage a large range of nodes and it works well for me.

Nope, but I saw that one, checked the source and it looked really good.

Unfortunately I would not be able to use it because it would kill my nodes doing this. :wink:

    // Excerpt from the tool: starts every inc_mainnet_<n> container in turn
    // (`docker` and `allNodesIndex` are defined elsewhere in the tool).
    console.group("\nStarting containers.");
    for (const nodeIndex of allNodesIndex)
      console.log(await docker(["container", "start", `inc_mainnet_${nodeIndex}`], (v) => v.split("\n")[0]));

I wonder, though, are the ldb files immutable? Is it 100% sure that old ldb files are never written to? Or is it more of a “seems to work fine for me”? :slightly_smiling_face:

I believe the old files are not written to. Just checked and confirmed they are there.
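One quick way to sanity-check that (the path is just an example) is to look for recently modified .ldb files:

    # List any .ldb files touched in the last 24 hours.
    find /data/node01/mainnet -name '*.ldb' -mtime -1 -ls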

Can you try this, @fredlee?

For Validator, you can choose between Lite or Batch-commit
For Fullnode, you can choose Batch-commit or Archive

1 Like

Testing this now on a new machine.

If I understand correctly, on a fullnode with the lite database I will not be able to pull balances or create transactions in the end? But is it still ok to sync and use that as a bootstrap for validators?

While syncing in lite_ff mode I noticed it slowed down a lot around 1700000 blocks in, and it was eating a lot of memory (>4GB). Being a completist, I decided to restart and test all modes in parallel.

One hour into my new test, I noticed that the new version seems to be faster than the 20220408_1 version in all database modes. But it’s quite heavy on memory, especially with ffstorage enabled.

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   397.9MiB    40453    1.1G    27kB
inc_arc        615.8MiB    63492    1.7G    27kB
inc_arc_ff     2.145GiB    61619    1.5G    24kB
inc_batch      1.252GiB    59191    917M    15kB
inc_batch_ff   2.692GiB    56996    734M    13kB
inc_lite       596.3MiB    66788    1.2G    18kB
inc_lite_ff    2.140GiB    63964    881M    14kB
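(For anyone who wants to reproduce this kind of snapshot: the memory numbers can be pulled with docker stats, and SPACE/BLOCK is just SPACE divided by HEIGHT. Something along these lines:)

    # One-shot memory readout for all node containers (names as in the table).
    docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}" \
      $(docker ps --format "{{.Names}}" | grep '^inc_')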

I know you’ve always recommended 2GB per node before, but in reality they have used less than 500MiB when running as validators. I hope this is not going to increase 4x or more going forward.

Update, after 5 hours syncing:

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   868MiB      122697   3.5G    28kB
inc_arc        742MiB      204041   7.7G    38kB
inc_arc_ff     2.463GiB    196196   6.8G    35kB
inc_batch      2.318GiB    178799   4.0G    23kB
inc_batch_ff   4.17GiB     178611   3.3G    18kB
inc_lite       709MiB      213209   4.5G    21kB
inc_lite_ff    2.435GiB    203428   3.6G    18kB

1 Like

The RAM usage should settle down quite a bit once they reach Sync State Latest. I’ve been discussing with the devs about providing bootstrap data for these new node methods, so hopefully, we can provide that soon.

3 Likes

Thanks, we’ll see how it looks in a day or two when it has synced everything.

This is after 22 hours:

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   1.043GiB    380780   12G     32kB
inc_arc        955.8MiB    684379   32G     47kB
inc_arc_ff     2.85GiB     655377   27G     41kB
inc_batch      5.775GiB    611572   16G     26kB
inc_batch_ff   7.447GiB    596533   13G     22kB
inc_lite       913.9MiB    729802   18G     25kB
inc_lite_ff    2.736GiB    678622   14G     21kB

Over 7GB, are you sure it’s not leaking? :smirk:

I’m firing up a new server now and going to run a beta node. I’ll catch up with you afterwards and we can compare numbers and data. :nerd_face:

1 Like

If I understand correctly on a fullnode with lite database, I will not be able to pull balances or create transactions in the end?

Check my comment here.

I synced a fullnode too and it took 65GB to reach the latest block.

1 Like

My test node has been running for a while now, roughly 65% of the way done. I’m seeing high cache usage in RAM and occasional disk I/O bursts. I’m curious how much RAM usage will drop when Sync State reaches Latest.

Side note: The devs are still working on providing bootstrap files for these new database modes, but we should have them ready before these go live.

MODE      MEM USAGE   MEM CACHED   HEIGHT      SPACE
lite_ff   1.71 GiB    14.3 GiB     1,269,169   21.9 GiB
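(For context, the MEM CACHED figure is presumably kernel page cache rather than memory held by the node process itself; it can be checked with something like:)

    # Show used vs. buff/cache memory on the host.
    free -h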

After this node finishes syncing I’m going to test out a node with batch_ff to see if I get the same results as you @fredlee.

Update: My high cache could have been due to low disk I/O. This was a new server and I had to call my host to have them bring the disks up to SSD speeds.

2 Likes

Are you also noticing a substantial slowdown at higher block heights? Today my nodes only did about 100k blocks in 14 hours.

CONTAINER      MEM USAGE   HEIGHT    SPACE   SPACE/BLOCK
inc_220408_1   1.245GiB    825386    35G     42kB
inc_arc        1.625GiB    1191382   90G     76kB
inc_arc_ff     3.236GiB    1202844   84G     70kB
inc_batch      6.466GiB    1172576   39G     33kB
inc_batch_ff   8.569GiB    1172674   33G     28kB
inc_lite       1.866GiB    1283860   43G     33kB
inc_lite_ff    3.611GiB    1291952   36G     28kB

Our used space differs quite a bit on lite_ff. You’re only syncing a single shard, right?

Ah, I should have mentioned: this node is not yet staked, so it does not have a shard assigned. I bring new nodes online in anticipation of staking a new one.

Right, I’m a bit special: I only do fullnodes, either to keep running as full nodes or to use for bootstrapping validators. :relaxed:

I also figured out why it’s slowing down: I’ve started to run out of memory and it’s swapping (the batch_ff node is using over 15GB).

Update: The batch_ff node filled the swap file, ran out of memory and restarted the process.
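(If it helps, whether the kernel actually OOM-killed it can be confirmed with something like this; the container name is as in my setup:)

    # Did docker record an OOM kill for this container?
    docker inspect --format '{{.State.OOMKilled}}' inc_batch_ff
    # Kernel log entries from the OOM killer.
    dmesg -T | grep -i 'out of memory'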

To see if I could free up some memory, I did a docker restart on all of my nodes, but that broke inc_batch. It cannot start anymore; it just keeps looping on the same error:

panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).GetShardBlockByHashWithShardID(0xc000cde160, {0xc0, 0xbf, 0x24, 0x97, 0x22, 0xac, 0xb6, 0x3, 0x17, ...}, ...)
	/Users/autonomous/projects/incognito-chain/blockchain/accessor_blockchain.go:209 +0x49c
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).GetShardBlockByHash(0xc000cde160, {0xc0, 0xbf, 0x24, 0x97, 0x22, 0xac, 0xb6, 0x3, 0x17, ...})
	/Users/autonomous/projects/incognito-chain/blockchain/accessor_blockchain.go:252 +0x11e
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).countMissingSignatureV1(0xc000100800, 0xc000cde160, 0x7, {{0xc000f44840, 0x15d}, {0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
	/Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:774 +0x70
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).countMissingSignature(0xc000100800, 0x255fc3c2663522b8, 0xf0074495a9f7183f)
	/Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:711 +0x1b6
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).initMissingSignatureCounter(0xc000100800, 0x1c40b98, 0xc006742c88)
	/Users/autonomous/projects/incognito-chain/blockchain/beaconbeststate.go:926 +0x23c
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).RestoreBeaconViews(0xc000cde160)
	/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:714 +0x54b
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).InitChainState(0xc000cde160)
	/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:185 +0x1e5
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).Init(0xc000cde160, 0xc006743768)
	/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:128 +0x2fd
main.(*Server).NewServer(0xc0002f54a0, {0x7ffcc9f51cd7, 0xc}, 0xc00100bb00, {0x1c436b8, 0xc000134068}, 0xc0026c0db0, 0xa, {0x7ffcc9f51c32, 0x40}, ...)
	/Users/autonomous/projects/incognito-chain/server.go:319 +0x147b
main.mainMaster(0x0)
	/Users/autonomous/projects/incognito-chain/incognito.go:194 +0xd26
main.main()
	/Users/autonomous/projects/incognito-chain/incognito.go:269 +0x2b6

So that one is out of the race. I decided to stop all the other nodes for now to see if I can let inc_batch_ff finish. After the restart it is back to 14GB of memory at around block height 1220000. I take it running (or at least syncing) the batch_ff mode will require a lot of memory. What are the memory recommendations for running a batchcommit --ffstorage fullnode?

Can you stop all nodes and run a disk I/O test?

If dd is installed, you can run this in a terminal:

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

It will take a few moments to run.

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.80874 s, 382 MB/s

Well that doesn’t seem to be the issue then. Disk I/O was ruled out. :slight_smile:

Haha, ok. This is another incognito server btw, not the Mac mini. So this thread is going off-topic into a personal test of all the new database modes.

But I forgot that /tmp is actually on the wrong drive (the system drive). Doing it on the SSD raid where the Incognito data lives, I get 959MB/s writes and 1.8GB/s reads (with bonnie++).
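For completeness, the same dd test pointed at the data array instead of /tmp would look something like this (the path is just an example):

    dd if=/dev/zero of=/mnt/ssdraid/incognito/test1.img bs=1G count=1 oflag=dsync
    rm /mnt/ssdraid/incognito/test1.img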

I’m letting the batch_ff node run alone now. After reloading the chain it’s back at 12GB, but after that it kind of eats up all the memory it can get (a lot faster than when it had to fight the other nodes for resources). Is it possible to limit the memory? Maybe I can throttle the CPU to make it process blocks more slowly? :stuck_out_tongue:
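I suppose I could cap it from the docker side, something like this (the values are arbitrary examples), though I’m not sure how well the node behaves when it hits the limit:

    # Cap memory (and swap) and CPU for a running container.
    docker update --memory 6g --memory-swap 6g --cpus 2 inc_batch_ff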