The new node database mode being worked on should drastically help lower resource usage.
I already spoke to the dev in charge of this and we will try to offer a bootstrap version so node operators can quickly change over.
A bootstrap would be greatly appreciated.
It’s unfortunately a hardware limitation on the 2011 model. As you hinted when we talked, memory is the biggest issue, but it goes hand in hand with storage reaching its limit. So basically, with the beacon and shard growth over the past year, I finally hit the upper limit of what the hardware could handle.
Kind of. I use ZFS and snapshot clones. I also experimented with deduplication between nodes, but that does not work because each node’s database files end up with unique data.
Looking forward to the new improvements. One thing I have noticed with the current database design: if I bootstrap from one node to another, then boot the new node, it rewrites about 30GB of data during the integrity check at the start. I imagine it updates some data and therefore rewrites the database pages? (Just guessing here.) Do you know if this is different in the new storage design? I imagine that if it can avoid updating or rewriting the big immutable block data, cloning or hard linking should work even better.
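If the block files do turn out to be immutable, a hard-link clone along these lines should work. This is a toy sketch, not the real node layout: the directories and `.ldb` filenames here are stand-ins created in a temp dir for illustration.

```shell
# Sketch: cloning a node data directory with hard links instead of a full
# copy. Toy directories stand in for two nodes' block-data dirs; the real
# paths would be whatever your node layout uses.
set -e
BASE=$(mktemp -d)
SRC="$BASE/node_a"; DST="$BASE/node_b"
mkdir -p "$SRC" "$DST"

# Stand-ins for immutable .ldb block files.
echo "block data 1" > "$SRC/000001.ldb"
echo "block data 2" > "$SRC/000002.ldb"

# cp -al hard-links every file: near-instant, and no extra disk space used.
cp -al "$SRC/." "$DST/"

# Each file now has two directory entries pointing at the same inode, so
# this is only safe if the node never rewrites old files in place.
stat -c '%n links=%h' "$DST"/*.ldb
```

The caveat in the last comment is the whole question here: a hard link shares the inode, so an in-place rewrite by one node would silently change the other node's copy too.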
Yes, that is correct: during normal operation it’s very light on CPU load because of the relatively slow block rate. You really only strain the CPU during shard resyncing or node restarts.
Have you tried using this method for hard links? It was designed and coded by a community member:
I use it to manage a large range of nodes and it works well for me.
Nope, but I saw that one, checked the source, and it looked really good.
Unfortunately I would not be able to use it, because it would kill my nodes doing this:
console.group("\nStarting containers.");
for (const nodeIndex of allNodesIndex) {
  console.log(await docker(["container", "start", `inc_mainnet_${nodeIndex}`], (v) => v.split("\n")[0]));
}
I wonder, though: are the ldb files immutable? Is it 100% certain that old ldb files are never written to? Or is it more of a “seems to work fine for me”?
I believe the old files are not written to. Just checked and confirmed they are there.
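One way to check rather than guess: drop a timestamp marker in the data directory, let the node run for a while, then list any ldb files modified after the marker. A toy sketch, with a temp dir and a simulated write standing in for the real data dir and a running node:

```shell
# Sketch: detect whether any existing .ldb files get rewritten.
set -e
DATADIR=$(mktemp -d)
echo "old" > "$DATADIR/000001.ldb"

# Marker capturing "now"; in practice you would create this, then let the
# node run for a few hours before the find below.
touch "$DATADIR/.marker"
sleep 1

# Simulate the node appending a brand-new file but leaving old ones alone.
echo "new" > "$DATADIR/000002.ldb"

# Any .ldb file newer than the marker was written after we started watching;
# if an old file shows up here, the files are not immutable after all.
find "$DATADIR" -name '*.ldb' -newer "$DATADIR/.marker"
```

If only freshly created files ever appear in the output over a long run, that is decent evidence the old files really are left alone.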
Can you try this, @fredlee?
For Validator, you can choose between Lite and Batch-commit
For Fullnode, you can choose Batch-commit or Archive
Testing this now on a new machine.
If I understand correctly, on a fullnode with a lite database I will not be able to pull balances or create transactions in the end? But is it still OK to sync, and to use that as a bootstrap for validators?
While syncing in lite_ff mode I noticed it slowed down a lot around 1,700,000 blocks in, and it was eating a lot of memory (>4GB). Being a completist, I decided to restart and test all modes in parallel.
One hour into my new test I noticed that the new version seems to be faster than the 20220408_1 version in all database modes. But it’s quite heavy on memory, especially with ffstorage enabled.
CONTAINER | MEM USAGE | HEIGHT | SPACE | SPACE/BLOCK |
---|---|---|---|---|
inc_220408_1 | 397.9MiB | 40453 | 1.1G | 27kB |
inc_arc | 615.8MiB | 63492 | 1.7G | 27kB |
inc_arc_ff | 2.145GiB | 61619 | 1.5G | 24kB |
inc_batch | 1.252GiB | 59191 | 917M | 15kB |
inc_batch_ff | 2.692GiB | 56996 | 734M | 13kB |
inc_lite | 596.3MiB | 66788 | 1.2G | 18kB |
inc_lite_ff | 2.140GiB | 63964 | 881M | 14kB |
I know you’ve always recommended 2GB per node, but in reality they have used less than 500MiB when running as validators. I hope this is not going to 4x or more going forward.
Update, after 5 hours syncing:
CONTAINER | MEM USAGE | HEIGHT | SPACE | SPACE/BLOCK |
---|---|---|---|---|
inc_220408_1 | 868MiB | 122697 | 3.5G | 28kB |
inc_arc | 742MiB | 204041 | 7.7G | 38kB |
inc_arc_ff | 2.463GiB | 196196 | 6.8G | 35kB |
inc_batch | 2.318GiB | 178799 | 4.0G | 23kB |
inc_batch_ff | 4.17GiB | 178611 | 3.3G | 18kB |
inc_lite | 709MiB | 213209 | 4.5G | 21kB |
inc_lite_ff | 2.435GiB | 203428 | 3.6G | 18kB |
The RAM usage should settle down quite a bit once they reach Sync State Latest. I’ve been discussing providing bootstrap data for these new node modes with the devs, so hopefully we can provide that soon.
Thanks, we’ll see how it looks in a day or two once it has synced everything.
This is after 22 hours:
CONTAINER | MEM USAGE | HEIGHT | SPACE | SPACE/BLOCK |
---|---|---|---|---|
inc_220408_1 | 1.043GiB | 380780 | 12G | 32kB |
inc_arc | 955.8MiB | 684379 | 32G | 47kB |
inc_arc_ff | 2.85GiB | 655377 | 27G | 41kB |
inc_batch | 5.775GiB | 611572 | 16G | 26kB |
inc_batch_ff | 7.447GiB | 596533 | 13G | 22kB |
inc_lite | 913.9MiB | 729802 | 18G | 25kB |
inc_lite_ff | 2.736GiB | 678622 | 14G | 21kB |
>7GB, are you sure it’s not leaking?
I’m firing up a new server now and am going to run a beta node. I’ll compare numbers and data with you afterwards.
If I understand correctly on a fullnode with lite database, I will not be able to pull balances or create transactions in the end?
Check my comment here.
I synced a fullnode too, and it took 65GB to reach the latest block.
My test node has been running for a while now, roughly 65% of the way done. I’m seeing high cache usage in RAM and occasional disk I/O bursts. I’m curious how much RAM usage will drop once Sync State reaches Latest.
Side note:
The devs are still working on providing bootstrap files for these new database modes, but we should have them ready before they go live.
MODE | MEM USAGE | MEM CACHED | HEIGHT | SPACE |
---|---|---|---|---|
lite_ff | 1.71 GiB | 14.3 GiB | 1,269,169 | 21.9 GiB |
After this node finishes syncing I’m going to test out a node with batch_ff to see if I get the same results as you, @fredlee.
Update:
My high cache could have been due to low disk I/O. This was a new server, and I had to call my host to have them bump the storage up to SSD speeds.
Are you also noticing a substantial slowdown at higher block heights? Today my nodes only managed about 100k blocks in 14 hours.
CONTAINER | MEM USAGE | HEIGHT | SPACE | SPACE/BLOCK |
---|---|---|---|---|
inc_220408_1 | 1.245GiB | 825386 | 35G | 42kB |
inc_arc | 1.625GiB | 1191382 | 90G | 76kB |
inc_arc_ff | 3.236GiB | 1202844 | 84G | 70kB |
inc_batch | 6.466GiB | 1172576 | 39G | 33kB |
inc_batch_ff | 8.569GiB | 1172674 | 33G | 28kB |
inc_lite | 1.866GiB | 1283860 | 43G | 33kB |
inc_lite_ff | 3.611GiB | 1291952 | 36G | 28kB |
Our used space differs quite a bit on lite_ff. You’re only syncing a single shard, right?
Ah, I should have mentioned: this node is not yet staked, so it does not have a shard assigned. I bring new nodes online in anticipation of staking a new one.
Right, I’m a bit special: I only run fullnodes, either to keep as full nodes or to bootstrap validators.
I also figured out why it’s slowing down: I’ve started to run out of memory and it’s swapping (the batch_ff node is using over 15GB).
Update: the batch_ff node filled the swap file, ran out of memory, and the process restarted.
To see if I could free up some memory, I did a docker restart on all of my nodes, but that broke inc_batch. It cannot start anymore; it just keeps hitting the same error:
panic: runtime error: index out of range [0] with length 0
goroutine 1 [running]:
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).GetShardBlockByHashWithShardID(0xc000cde160, {0xc0, 0xbf, 0x24, 0x97, 0x22, 0xac, 0xb6, 0x3, 0x17, ...}, ...)
/Users/autonomous/projects/incognito-chain/blockchain/accessor_blockchain.go:209 +0x49c
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).GetShardBlockByHash(0xc000cde160, {0xc0, 0xbf, 0x24, 0x97, 0x22, 0xac, 0xb6, 0x3, 0x17, ...})
/Users/autonomous/projects/incognito-chain/blockchain/accessor_blockchain.go:252 +0x11e
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).countMissingSignatureV1(0xc000100800, 0xc000cde160, 0x7, {{0xc000f44840, 0x15d}, {0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
/Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:774 +0x70
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).countMissingSignature(0xc000100800, 0x255fc3c2663522b8, 0xf0074495a9f7183f)
/Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:711 +0x1b6
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).initMissingSignatureCounter(0xc000100800, 0x1c40b98, 0xc006742c88)
/Users/autonomous/projects/incognito-chain/blockchain/beaconbeststate.go:926 +0x23c
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).RestoreBeaconViews(0xc000cde160)
/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:714 +0x54b
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).InitChainState(0xc000cde160)
/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:185 +0x1e5
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).Init(0xc000cde160, 0xc006743768)
/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:128 +0x2fd
main.(*Server).NewServer(0xc0002f54a0, {0x7ffcc9f51cd7, 0xc}, 0xc00100bb00, {0x1c436b8, 0xc000134068}, 0xc0026c0db0, 0xa, {0x7ffcc9f51c32, 0x40}, ...)
/Users/autonomous/projects/incognito-chain/server.go:319 +0x147b
main.mainMaster(0x0)
/Users/autonomous/projects/incognito-chain/incognito.go:194 +0xd26
main.main()
/Users/autonomous/projects/incognito-chain/incognito.go:269 +0x2b6
So that one is out of the race. I decided to stop all other nodes for now to see if I can let inc_batch_ff finish. After the restart it is back to 14GB memory at around block height 1,220,000. I take it running (or at least syncing) batch_ff mode will require a lot of memory. What are the memory recommendations for running a batchcommit --ffstorage fullnode?
Can you stop all nodes and run a disk I/O test?
If dd is installed, you can run this from a terminal:
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
It will take a few moments to run.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.80874 s, 382 MB/s
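As a sanity check, the rate dd reports is just bytes divided by elapsed time (in decimal megabytes, i.e. 10^6 bytes):

```shell
# 1073741824 bytes copied in 2.80874 s; dd reports the rate as MB/s.
awk 'BEGIN { printf "%.0f MB/s\n", 1073741824 / 2.80874 / 1000000 }'
# prints: 382 MB/s
```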
Well, that doesn’t seem to be the issue then; disk I/O is ruled out.
Haha, OK. This is another Incognito server, by the way, not the Mac mini. So this thread is going off-topic into a personal test of all the new database modes.
But I forgot that /tmp is actually on the wrong drive (the system drive). Running the test on the SSD RAID where the Incognito data lives, I get 959MB/s writes and 1.8GB/s reads (with bonnie++).
I’m letting the batch_ff node run alone now. After reloading the chain it’s back at 12GB, but after that it kind of eats up all the memory it can get (a lot faster than when it had to fight the other nodes for resources). Is it possible to limit the memory? Maybe I can throttle the CPU so it processes blocks more slowly?
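Since the nodes appear to run in Docker, one option is to cap the container’s resources directly with `docker update`. The container name and the limit values below are examples, not recommendations:

```shell
# Hard RAM cap at 6GB with no extra swap allowance (example values).
docker update --memory 6g --memory-swap 6g inc_batch_ff
# Throttle the container to 1.5 CPU cores to slow block processing.
docker update --cpus 1.5 inc_batch_ff
```

One caveat: a hard memory cap does not make the process need less memory. If it exceeds the cap, the kernel OOM-kills it, so this mostly turns slow swapping into a clean restart; the CPU throttle is the gentler lever for slowing the sync itself.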