Shard 2 Stalling at 433077 on Multiple vNodes

Right, I might add that running 20210302_1 up to block 1M and then switching over to the latest release works fine. The shard synced all the way up to the current block.

I am now running a new test with a full node to check all shards. I did a fresh install and am running the latest recommended script (How to setup your own node in a blink of an eye). It’s not done yet, but I can tell you it’s not looking good so far. I have blocks with errors on multiple shards. I’ll make a post when all shards are done or stalled.

For those keeping score at home, I am still unable to get shards 0, 2 or 6 to sync fully on a validator. I see there’s a new tag, 20210622_1, which I’m trying now.

Well … at least whatever has been updated fixed the slow sync issue I’ve been (casually) observing for about a week. After an update last week (20210617_1?), one of my pNodes slowed to a crawl on beacon/shard syncing. The pNode was literally in the middle of a sync and saw its sync speed instantly drop by ~75%. It was only syncing ~250,000 blocks per day, if that.

Then whatever change was pushed yesterday broke all my other pNodes, similar to what Devenus observed.

The update today (20210622_1) has restored syncing at a reasonable rate. The pNode that suddenly couldn’t sync more than ~250,000 blocks in a day is already up to blockheight ~450,000 after a few hours. Last week that took nearly two full days.

Hopefully the beacon chain syncs will be caught up by tomorrow and I’ll be syncing assigned shard chains thereafter.


Or not.

So far today – one pNode has started resyncing from scratch … again. Another one has been stalled near the current blockheight for nearly an hour, and is now reporting offline in the Node Monitor. I expect it too will start resyncing from scratch – again – shortly. <SIGH>

update: Yep, the stalled one started over AGAIN.

So at least two nodes started a resync from 0 yesterday, synced up to the current blockheight, then inexplicably stalled near the current blockheight and have now started yet another resync from 0 in a ~24-hour period. RIP monthly ISP bandwidth cap.

On the new image and shard 0, I stalled earlier than normal, at block 63902.

VERY glad that they didn’t implement slashing yet. Any thoughts @support?

You are not the only one :) My two vNodes also hung at block 63902.

Hey all,

I want to share my experience here. I stopped all of my Incognito Docker containers and followed the 3rd (Infura account) and 4th (script) steps here (How to host a Virtual Node). My vNodes have been running flawlessly (no stalls, no offline periods) for at least 3 days.

Btw, this may be wrong. Please fix it as written here: How to host a Virtual Node

@Josh_Hamon @zes333 My pNodes finally resynced the beacon chain (third time’s the charm, I guess) and have started syncing Shard 0. The Shard 0 blockheight for each is currently above 900,000.

These two are each on 20210622_1.

@abduraman Didn’t need to make changes to scripts or config parameters (not that I could even if I wanted to – these are pNodes).



@Mike_Wagner I agree with you. My experience sharing was not an answer to the concerns about pNodes above. I wrote here since the topic title says “… Multiple vNodes”.


Unfortunately, I don’t use such a script. I’ve created a separate script for each node.

If only I knew why.

You have been able to fully sync shard 0, 2 & 6?

I am using an Infura account, but only 3 calls have been made to it.

Per @fredlee that’s not required, but to confirm I’ve asked @rocky in his setup post.

I have no node syncing shard 0 but 2 and 6 are ok.

Yeah, I run multiple nodes on the same Infura account. I recently switched to a Node.js script to manage my validators, but I still have an old shell script that runs two nodes on the same machine. It looks a lot like your script, but yours is cleaner, because I didn’t think of looping through array keys instead of values. :blush:

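  # validators, first_rpc_port, first_node_port, bootnode, geth_name,
  # data_dir and latest_tag are assumed to be set earlier in the script.
  i=0
  for validator_key in "${validators[@]}"; do
    rpc_port=$((first_rpc_port + i))
    node_port=$((first_node_port + i))

    echo "Starting inc_validator_$i container on $node_port (RPC $rpc_port)"

    # Echo the docker command as it runs, then turn tracing back off.
    # One data directory per node: two nodes sharing one database directory would conflict.
    set -x
    docker run --restart=always --net inc_net -p "$node_port:$node_port" -p "$rpc_port:$rpc_port" \
        -e NODE_PORT="$node_port" -e RPC_PORT="$rpc_port" -e BOOTNODE_IP="$bootnode" \
        -e GETH_NAME="$geth_name" -e GETH_PROTOCOL= -e GETH_PORT= -e FULLNODE= \
        -e MININGKEY="$validator_key" -e TESTNET=false -e LIMIT_FEE=1 \
        -v "${data_dir}_$i:/data" -d --name "inc_validator_$i" "incognitochain/incognito-mainnet:${latest_tag}"
    set +x

    i=$((i + 1))
  done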
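For comparison, looping over the array’s keys (indices) instead of its values hands each node its counter directly, so there is no separate `i` to initialise and increment. A minimal bash sketch; the keys and port bases are hypothetical placeholders, and the `echo` stands in for the `docker run` command:

```shell
#!/usr/bin/env bash
# Hypothetical placeholder keys and port bases, not real values.
validators=("key_a" "key_b" "key_c")
first_rpc_port=9334
first_node_port=9433

# "${!validators[@]}" expands to the array's indices (0 1 2), not its values.
for i in "${!validators[@]}"; do
  validator_key="${validators[$i]}"
  rpc_port=$((first_rpc_port + i))
  node_port=$((first_node_port + i))
  # Stand-in for docker run: just report which ports each node would get.
  echo "inc_validator_$i key=$validator_key node=$node_port rpc=$rpc_port"
done
```

This needs bash (or another shell with indexed arrays); plain `sh` has no `${!array[@]}` expansion.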

Don’t forget to pass the empty -e GETH_PROTOCOL= -e GETH_PORT=, because if they are not set at all, the default values get appended and you end up with http:// in the URL. It’s quite an ugly piece of code with no checks. :face_with_hand_over_mouth:
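Why an empty value behaves differently from no value at all: in POSIX shell parameter expansion, `${VAR-default}` substitutes the default only when `VAR` is unset, while `${VAR:-default}` also substitutes it when `VAR` is empty. Below is a minimal sketch of how a container startup script *might* build the geth endpoint. The function `build_geth_url`, the `http`/`8545` defaults and the example hosts are all hypothetical; this is not the Incognito image’s actual code:

```shell
#!/bin/sh
# Hypothetical endpoint builder (not the real Incognito entrypoint).
# ${VAR-default} falls back only when VAR is unset; an empty VAR stays empty.
build_geth_url() {
  protocol="${GETH_PROTOCOL-http}"
  port="${GETH_PORT-8545}"
  url="$GETH_NAME"
  # Add the protocol prefix and port suffix only when they are non-empty.
  if [ -n "$protocol" ]; then url="$protocol://$url"; fi
  if [ -n "$port" ]; then url="$url:$port"; fi
  echo "$url"
}

# Unset: both defaults are wrapped around the bare host name.
unset GETH_PROTOCOL GETH_PORT
GETH_NAME="mainnet.example.io"
url_unset=$(build_geth_url)

# Set but empty (-e GETH_PROTOCOL= -e GETH_PORT=): GETH_NAME is used as-is.
GETH_PROTOCOL=
GETH_PORT=
GETH_NAME="https://mainnet.example.io/v3/PROJECT_ID"
url_empty=$(build_geth_url)

echo "$url_unset"
echo "$url_empty"
```

With both variables unset and a full URL in `GETH_NAME`, the same function would produce `http://https://…:8545`, the kind of malformed endpoint described above.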

(What’s your -itd for?)

I forked this from @mesquka, so I’m not 100% sure, but I think it might have been intended as -it -d? Per a quick search:


docker run -it -d --name container_name image_name bash

The above command will create a new container with the specified name from the specified docker image. The container name is optional.

  • The -i option means that it will be interactive mode (you can enter commands to it)
  • The -t option gives you a terminal (so that you can use it as if you used ssh to enter the container).
  • The -d option (daemon mode) keeps the container running in the background.
  • bash is the command it runs.

Though I think combining all three options into one flag (-itd) isn’t an issue here.

I don’t have -e GETH_PROTOCOL= -e GETH_PORT=, but I’ll give that a try.

UPDATE: Still seeing whether shard syncing will make it past the roadblock, but with @fredlee’s change I’m already seeing calls to Infura, so I’m hopeful.

UPDATE: On Shard 0 I’m past the roadblock after adding the settings suggested above. This is using the forked script I mentioned above and the image from 06/26. Later today I’ll try it with other nodes.

Nah, -itd is probably not an issue at all. I just wondered what it was for. :relaxed:

I found the cause of getting stuck at block 63902 on shard 0 with your script :) If you are interested, send me a PM :)

Now I’m stalling in different places:

  • shard0 - 63902
  • shard2 - 293816

But not every time.

Updating my own advice here: do not run the latest version with -e GETH_PROTOCOL= and a full URL in GETH_NAME.

