(Kind of solved) Problem starting multiple nodes at the same time

Then he may have some enemies :smiling_imp: Unless he has his own exclusive ISP, the network is the only parameter he cannot control. I think I have found the problem: his network has gone bad :rofl:

Well, that’s the thing. Up until recently, I was hosting 33 nodes on a single 2011 Mac mini, 16GB, 1TB SSD. But now with the new version and committee changes, I am facing performance issues and my nodes are getting bulk slashed.

I’ve been trying to find a stable setup, but I have problems starting even 2 nodes at the same time. If I stagger the starts by 15 minutes, I manage to run 4 nodes, but when the 5th starts, everything stalls and they all go offline.

I’m giving up on my Mac mini Incognito project for now. It had a good run: for over a year it ran both a full node and validators on all shards.

So now my 33 nodes have no home. I can retire them, move them to another server, or build a new glorious multi-node project. I haven’t decided what to do with them yet. :grin:

2 Likes

Go for glory my friend. An epic multi node project. 33 nodes on a single 2011 Mac mini is pretty cool :smiley:.

You want to allocate 2GB RAM per node. Could you upgrade the RAM?

When you start them up, check the node monitor (monitor.incognito.org) first and make sure it says Sync Status: Latest before you start the next node.
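
For anyone scripting this, here is a minimal sketch of a staggered startup, assuming containers named inc_mainnet_<n> (the naming used in the snippet quoted further down) and the 15-minute interval mentioned above; ideally you would also wait for the monitor to show Sync Status: Latest before moving on:

    // Staggered startup sketch: start one container, wait, then start the next.
    // The container names and the 15-minute delay are assumptions, adjust as needed.
    const { execFileSync } = require("child_process");
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    (async () => {
      for (const nodeIndex of [0, 1, 2, 3]) {
        const name = `inc_mainnet_${nodeIndex}`;
        console.log(execFileSync("docker", ["container", "start", name]).toString().trim());
        // Better: poll the node monitor here and only continue once this node
        // reports Sync Status: Latest, instead of relying on a fixed delay.
        await sleep(15 * 60 * 1000);
      }
    })();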

How do you manage so many on such a small storage space? Do you use hard links?

With the resource improvements the devs recently released, there is still hope for your setup. Let’s wait and give it a shot again. :slightly_smiling_face:

I know it may depend on how many are in committee, but the part I’m always uncertain about is the number of cores needed for X nodes on a system. The fact that Fredlee was previously running 33 nodes on a 2011 Mac mini suggests to me that CPU/cores aren’t really the issue, and that the SSD and RAM are the things to focus on.

The new node database mode being worked on should drastically help lower resource usage.

I already spoke to the dev in charge of this and we will try to offer a bootstrap version so node operators can quickly change over.

3 Likes

A bootstrap would be greatly appreciated

2 Likes

It’s unfortunately a hardware limitation on the 2011 model. Like you were hinting when we talked, memory looks like the biggest issue, but it goes hand in hand with storage reaching its limit. So basically, with the beacon and shard growth over the past year, I finally hit the upper limit of what the hardware could handle.

Kind of. I use ZFS and snapshot clones. I also experimented with dedupe between nodes, but that does not work because each node’s database files end up with unique data.
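
(For anyone curious, the snapshot-clone step looks roughly like the sketch below; the dataset names are made up for illustration, and it just shells out to the zfs CLI.)

    // ZFS snapshot-clone sketch: provision a new node's data from a synced one.
    // The dataset names below are hypothetical placeholders.
    const { execSync } = require("child_process");
    const run = (cmd) => execSync(cmd, { stdio: "inherit" });

    const source = "tank/incognito/node1";            // already-synced node's data
    const target = "tank/incognito/node5";            // dataset for the new node
    const snap = `${source}@bootstrap-${Date.now()}`; // point-in-time snapshot

    run(`zfs snapshot ${snap}`);
    run(`zfs clone ${snap} ${target}`);
    // The clone is copy-on-write: it shares all blocks with the snapshot at first,
    // and only the new node's own writes consume extra space. That is also why
    // dedupe gains nothing once each node's database diverges.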

Looking forward to the new improvements. One thing I have noticed with the current database design: if I bootstrap from one node to another and then boot the new node, it rewrites about 30GB of data during the integrity check at startup. I imagine it updates some data and therefore rewrites the database pages? (Just guessing here.) Do you know if this is different in the new storage design? If it can avoid updating or rewriting the big immutable block data, cloning or hard linking should work even better.

Yes, that is correct. During normal operation the CPU load is very light because of the relatively slow block rate. You really only strain the CPU during shard resyncing or node restarts.

1 Like

Have you tried using this method for hard links? It was designed and coded by a community member:

I use it to manage a large range of nodes and it works well for me.

Nope, but I saw that one. I checked the source and it looked really good.

Unfortunately I would not be able to use it, because it would kill my nodes by doing this: :wink:

    console.group("\nStarting containers.");
    for (const nodeIndex of allNodesIndex)
      console.log(await docker(["container", "start", `inc_mainnet_${nodeIndex}`], (v) => v.split("\n")[0]));

I wonder though, are the .ldb files immutable? Is it 100% certain that old .ldb files are never written to? Or is it more of a “seems to work fine for me”? :slightly_smiling_face:

I believe the old files are not written to. I just checked and confirmed they are still there.
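
(If the old .ldb files really are never touched, the hard-link trick boils down to something like the sketch below; the paths are hypothetical, and the community tool linked above is the proper implementation of the idea.)

    // Hard-link sketch: share the (assumed) immutable *.ldb table files between
    // a synced node's data dir and a new node's data dir. Paths are hypothetical.
    const fs = require("fs");
    const path = require("path");

    const sourceDir = "/data/node1/mainnet/block";
    const targetDir = "/data/node5/mainnet/block";

    fs.mkdirSync(targetDir, { recursive: true });
    for (const file of fs.readdirSync(sourceDir)) {
      if (!file.endsWith(".ldb")) continue; // only the immutable table files
      fs.linkSync(path.join(sourceDir, file), path.join(targetDir, file));
    }
    // The LevelDB bookkeeping files (MANIFEST, CURRENT, LOG, *.log) do get
    // rewritten, so those should be copied per node rather than hard-linked.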

Can you try this, @fredlee?

For a validator, you can choose between Lite and Batch-commit.
For a fullnode, you can choose Batch-commit or Archive.

1 Like

Testing this now on a new machine.

If I understand correctly, on a fullnode with the lite database I will not be able to pull balances or create transactions in the end? But is it still OK to sync it and use that as a bootstrap for validators?

While syncing in lite_ff mode, I noticed it slowed down a lot around 1,700,000 blocks in, and it was eating a lot of memory (>4GB). Being a completist, I decided to start over and test all modes in parallel.

One hour into my new test, I noticed that the new version seems to be faster than the 20220408_1 version in all database modes. But it’s quite heavy on memory, especially with ffstorage enabled.

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   397.9MiB    40453    1.1G	27kb
inc_arc        615.8MiB    63492    1.7G	27kb
inc_arc_ff     2.145GiB    61619    1.5G	24kb
inc_batch      1.252GiB    59191    917M	15kb
inc_batch_ff   2.692GiB    56996    734M	13kb
inc_lite       596.3MiB    66788    1.2G	18kb
inc_lite_ff    2.140GiB    63964    881M	14kb
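
(For reference, the SPACE/BLOCK column is simply disk usage divided by block height, in decimal units. A sketch of one way to pull these numbers, with made-up directories and heights, not necessarily exactly what I ran:)

    // Sketch: memory from `docker stats`, space from `du`,
    // and SPACE/BLOCK = bytes on disk / block height.
    // Container names, data directories and heights are example inputs.
    const { execSync } = require("child_process");

    console.log(execSync('docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}"').toString());

    const nodes = { inc_lite: { height: 66788, dir: "/data/inc_lite" } };
    for (const [name, { height, dir }] of Object.entries(nodes)) {
      const bytes = parseInt(execSync(`du -sb ${dir}`).toString(), 10); // GNU du, size in bytes
      console.log(`${name}  ${(bytes / 1e9).toFixed(1)}G  ${Math.round(bytes / height / 1e3)}kb/block`);
    }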

I know you’ve always recommended 2GB per node before, but in reality they have used less than 500MiB when running as validators. I hope this is not going to 4x or more going forward.

Update, after 5 hours of syncing:

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   868MiB      122697   3.5G	28kb
inc_arc        742MiB      204041   7.7G	38kb
inc_arc_ff     2.463GiB    196196   6.8G	35kb
inc_batch      2.318GiB    178799   4.0G	23kb
inc_batch_ff   4.17GiB     178611   3.3G	18kb
inc_lite       709MiB      213209   4.5G	21kb
inc_lite_ff    2.435GiB    203428   3.6G	18kb

1 Like

The RAM usage should settle down quite a bit once they reach Sync State Latest. I’ve been talking with the devs about providing bootstrap data for these new database modes, so hopefully we can offer that soon.

3 Likes

Thanks, we’ll see how it looks in a day or two when it has synced everything.

This is after 22 hours:

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   1.043GiB    380780    12G	32kb
inc_arc        955.8MiB    684379    32G	47kb
inc_arc_ff     2.85GiB     655377    27G	41kb
inc_batch      5.775GiB    611572    16G	26kb
inc_batch_ff   7.447GiB    596533    13G	22kb
inc_lite       913.9MiB    729802    18G	25kb
inc_lite_ff    2.736GiB    678622    14G	21kb

More than 7GB, you sure it’s not leaking? :smirk:

I’m firing up a new server now and am going to run a beta node. I’ll follow up with you afterwards and we can compare numbers and data. :nerd_face:

1 Like

If I understand correctly on a fullnode with lite database, I will not be able to pull balances or create transactions in the end?

check my comment here

I synced a fullnode too, and it took 65GB to reach the latest block.

1 Like

My test node has been running for a while now, roughly 65% of the way done. I’m seeing high cache usage in RAM and occasional disk I/O bursts. I’m curious how much RAM usage will drop once Sync State reaches Latest.

Side note: the devs are still working on providing bootstrap files for these new database modes, but we should have them ready before the modes go live.

MODE      MEM USAGE   MEM CACHED   HEIGHT      SPACE
lite_ff   1.71 GiB    14.3 GiB     1,269,169   21.9 GiB

After this node finishes syncing I’m going to test out a node with batch_ff to see if I get the same results as you @fredlee.

Update: my high cache usage could have been due to slow disk I/O. This was a new server, and I had to call my host to have them bring the disk speed up to SSD levels.

2 Likes

Are you also noticing a substantial slowdown at higher block heights? Today my nodes only did about 100k blocks in 14 hours.

CONTAINER      MEM USAGE   HEIGHT   SPACE   SPACE/BLOCK
inc_220408_1   1.245GiB    825386   35G     42kb
inc_arc        1.625GiB    1191382  90G     76kb
inc_arc_ff     3.236GiB    1202844  84G     70kb
inc_batch      6.466GiB    1172576  39G     33kb
inc_batch_ff   8.569GiB    1172674  33G     28kb
inc_lite       1.866GiB    1283860  43G     33kb
inc_lite_ff    3.611GiB    1291952  36G     28kb

Our used space differs quite a bit on lite_ff. You’re only syncing a single shard, right?

Ah, I should have mentioned: this node is not yet staked, so it does not have a shard assigned. I bring new nodes online in anticipation of staking a new one.