[Solved] vNode sync errors since version 20210313_3

At the moment I have left the node stalled on shards 0, 2 and 6 in case the team needs more logs or testing, or comes up with a workaround or update. My mainnet data is 74GiB right now. Adding up my block heights across all shards, I’m at roughly 80%. That would put a full node at ~92GiB. But that is just an estimate, since the shards do seem to differ quite a lot in size.

       KiB    folder     block
18 141 863    beacon  [1240539]
   855 509    shard0  [ 169584]
10 956 895    shard1  [1242694]
 1 537 298    shard2  [ 433077]
11 827 457    shard3  [1239751]
 9 240 702    shard4  [1240383]
 7 978 078    shard5  [1240423]
 4 147 658    shard6  [ 909062]
12 331 384    shard7  [1242307]
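
For anyone who wants to redo the estimate, here is a minimal sketch of the arithmetic. It rests on a few assumptions that are not stated in the post: the first column is read as KiB (the numbers only add up to the quoted 74GiB that way), the highest height in the table (1242694) stands in for the tip of every chain, and size is taken to grow linearly with block height.

```go
package main

import "fmt"

// chain is one row of the table above.
type chain struct {
	name    string
	sizeKiB int64 // first column, read as KiB (assumption)
	height  int64 // last synced block
}

// tableRows copies the figures from the table.
var tableRows = []chain{
	{"beacon", 18141863, 1240539},
	{"shard0", 855509, 169584},
	{"shard1", 10956895, 1242694},
	{"shard2", 1537298, 433077},
	{"shard3", 11827457, 1239751},
	{"shard4", 9240702, 1240383},
	{"shard5", 7978078, 1240423},
	{"shard6", 4147658, 909062},
	{"shard7", 12331384, 1242307},
}

// estimate returns the synced fraction and the projected full size in GiB.
// It assumes every chain ends at tipHeight and that size grows linearly
// with block height, which is only roughly true.
func estimate(chains []chain, tipHeight int64) (progress, fullGiB float64) {
	var size, height int64
	for _, c := range chains {
		size += c.sizeKiB
		height += c.height
	}
	progress = float64(height) / float64(tipHeight*int64(len(chains)))
	fullGiB = float64(size) / (1024 * 1024) / progress
	return progress, fullGiB
}

func main() {
	// 1242694 is the highest height in the table, used as a stand-in for the tip.
	progress, fullGiB := estimate(tableRows, 1242694)
	fmt.Printf("synced %.0f%%, projected full node %.0f GiB\n", progress*100, fullGiB)
}
```

Under those assumptions the result lands close to the 80% and ~92GiB figures quoted above. The linear-growth assumption is the weak point, since the stalled shards may not have the same bytes-per-block as the synced ones.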

Tried the latest release, 20210531_1. Unfortunately, no change on this issue with the current data. I have once again restarted syncing from scratch to see if it makes any difference.

This time I was getting “Sync too fast” errors all the time, and syncing was abysmally slow.

2021-05-31 11:22:39.866 connmanager.go:398 [INF] Peerv2 log: [SyncBeacon] from 7707 to 1242967 
2021-05-31 11:22:39.866 connmanager.go:470 [INF] Peerv2 log: [stream] Request Block type BlkBc from peer  from cID 255, [7707 1242967] 
2021-05-31 11:22:39.866 blockrequester.go:235 [INF] Peerv2 log: [stream] Requesting stream block type BlkBc, spec false, height [7707..1242967] len 2, from 255 to 255, uuid = 5acadffb-cc8e-4009-a54b-6a264d72b7b7
2021-05-31 11:22:39.866 connmanager.go:490 [ERR] Peerv2 log: [stream] rpc error: code = Canceled desc = context canceled
2021-05-31 11:22:42.222 syncker.go:164 [INF] Syncker log : syncker: receive beacon block 1242969 

2021-05-31 11:22:42.264 connmanager.go:490 [ERR] Peerv2 log: [stream] rpc error: code = Unknown desc = Sync too fast, last time sync blocks of CID 255 from height 7707 is 2021-05-31 11:22:35.550777654 +0000 UTC
2021-05-31 11:22:42.264 connmanager.go:398 [INF] Peerv2 log: [SyncBeacon] from 7707 to 1242967 
2021-05-31 11:22:42.264 connmanager.go:470 [INF] Peerv2 log: [stream] Request Block type BlkBc from peer  from cID 255, [7707 1242967] 
2021-05-31 11:22:42.265 blockrequester.go:235 [INF] Peerv2 log: [stream] Requesting stream block type BlkBc, spec false, height [7707..1242967] len 2, from 255 to 255, uuid = 7d06bd33-c5f0-4916-b6c4-c93e635e56e3
2021-05-31 11:22:42.265 syncker.go:164 [INF] Syncker log : syncker: receive beacon block 1242969 
2021-05-31 11:22:42.266 syncker.go:164 [INF] Syncker log : syncker: receive beacon block 1242969 

Why does this happen?

I stopped everything, restarted, and then the problem went away. Now syncing at normal speed. I’ll update the post once shards start to sync up to the infamous blocks.


Hi @fredlee ,

It looks like the problem you’re having is that Geth is not set up correctly. Would you mind checking your Geth settings again? If you’re running the node in a Docker environment, you can run docker inspect and send us the output.

Thanks


Thank you for the reply. What is the Geth setting supposed to be, then? Why does this only affect a handful of blocks in 3 of the 8 shards and none in the beacon chain?

I am using the inc_node_installer.sh script and I only changed my infura api value and validator key.

Is the port supposed to be 80 even tho the protocol is https?

Taken from docker inspect:

   "GETH_NAME=mainnet.infura.io/v3/26e65277...",
   "GETH_PROTOCOL=https",
   "GETH_PORT=80",

I’ll send the full text to you as well (with the correct infura hash) =)


Thanks to @trungtin2qn1 I got it to continue syncing shard 0, 2 and 6 again!!!

If I understand correctly, when syncing, there are only certain blocks that need to be looked up with GETH? If it is not configured correctly or if the call fails, then we get the error:

-1007: Verify proof and parse receipt%!!(MISSING)(EXTRA []interface {}=[]) invalid character ‘i’ looking for beginning of value

Which falls through into things like

-1051: Instruction Hash Error
Expect instruction hash to be 711455acaa89a9f8d3baa2e586dfd29b8… but get 000000000000000000000000000… at block 909063
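
A side note on the “invalid character ‘i’ looking for beginning of value” part of the -1007 message: that wording matches what Go’s encoding/json package returns when it is handed a body that is not JSON. The sketch below only demonstrates that behaviour. The plain-text body is a hypothetical stand-in (the actual response from the endpoint is not shown in this thread), and decodeRPC is an illustrative helper, not a function from the Incognito code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRPC tries to decode a response body as a JSON object,
// roughly the way a JSON-RPC client would.
func decodeRPC(body []byte) error {
	var v map[string]interface{}
	return json.Unmarshal(body, &v)
}

func main() {
	// Hypothetical plain-text body starting with the letter 'i'.
	fmt.Println(decodeRPC([]byte("invalid request")))

	// A well-formed JSON body decodes without error.
	fmt.Println(decodeRPC([]byte(`{"jsonrpc":"2.0","id":1,"result":"0x1"}`)))
}
```

So the message is consistent with the node receiving some non-JSON text from whatever address the GETH_ variables resolve to, instead of a JSON-RPC reply.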

In my original setup, I am not sure what was set wrong, but after contacting support they told me to follow the official instructions and set GETH_PORT to 80.

Turns out the setup instructions are wrong? Accessing infura.io with https on port 80 seems to be a bad idea. @trungtin2qn1 told me to go back to -e GETH_PORT=, and with the latest release, it works just fine!

For anyone else troubleshooting: docker inspect inc_mainnet is a good way to make sure that the “Env” variables have been set correctly and do not contain any extra characters or incorrect strings.

Mine looks like this now.

            "Env": [
                "MININGKEY=12NJX93MjCm8Z1wrKDNPLsS3zLN8...",
                "LIMIT_FEE=1",
                "NODE_PORT=9400",
                "RPC_PORT=8400",
                "BOOTNODE_IP=mainnet-bootnode.incognito.org:9330",
                "GETH_PROTOCOL=https",
                "GETH_PORT=",
                "FULLNODE=1",
                "GETH_NAME=mainnet.infura.io/v3/26e7865275720d452...",
                "TESTNET=false",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
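
If you would rather not eyeball the output, the check can be scripted. This is a minimal sketch, assuming the JSON layout that docker inspect prints (an array with the variables under Config.Env). The helper names gethEnv and warnings are made up for illustration, and the sample input uses a placeholder instead of a real key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// inspectEntry mirrors only the part of the `docker inspect` output needed here.
type inspectEntry struct {
	Config struct {
		Env []string
	}
}

// gethEnv extracts the GETH_* variables from `docker inspect` JSON output.
func gethEnv(raw []byte) (map[string]string, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	if len(entries) == 0 {
		return nil, fmt.Errorf("no container in inspect output")
	}
	vars := map[string]string{}
	for _, kv := range entries[0].Config.Env {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 && strings.HasPrefix(parts[0], "GETH_") {
			vars[parts[0]] = parts[1]
		}
	}
	return vars, nil
}

// warnings lists settings that look suspicious: missing variables, stray
// whitespace or quotes, and the https-on-port-80 combination from this thread.
func warnings(vars map[string]string) []string {
	var out []string
	for _, name := range []string{"GETH_NAME", "GETH_PROTOCOL", "GETH_PORT"} {
		v, ok := vars[name]
		if !ok {
			out = append(out, name+" is not set")
			continue
		}
		if v != strings.TrimSpace(v) || strings.ContainsAny(v, "\"'") {
			out = append(out, name+" contains stray whitespace or quotes")
		}
	}
	if vars["GETH_PROTOCOL"] == "https" && vars["GETH_PORT"] == "80" {
		out = append(out, "GETH_PORT=80 combined with GETH_PROTOCOL=https")
	}
	return out
}

func main() {
	// Sample input with a placeholder key; pipe in real `docker inspect` output instead.
	sample := `[{"Config":{"Env":["GETH_PROTOCOL=https","GETH_PORT=80","GETH_NAME=mainnet.infura.io/v3/YOUR_PROJECT_ID"]}}]`
	vars, err := gethEnv([]byte(sample))
	if err != nil {
		fmt.Println("could not parse inspect output:", err)
		return
	}
	for _, w := range warnings(vars) {
		fmt.Println("warning:", w)
	}
}
```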

@consensus:
May I, first of all, suggest better error handling when GetETHHeader or GetMostRecentETHBlockHeight fails. As it is right now, it is quite a stretch to work out that an error on just a handful of blocks in the entire chain has anything to do with the Infura setup.

You may also want to update this post if port 80 is wrong, as well as the source file on GitHub.

I’ll let it all run now and see if it syncs up to latest block on all shards.


Thanks for your feedback; we will consider updating things based on your advice. The real problem is that port 80 is the default port for HTTP, but Infura uses HTTPS, whose default port is 443. We can bypass that by replacing the port with an empty string “”. Also, you should not publish your mining key and Infura API key like that. It is not safe; someone could misuse them.
One last thing: would you mind closing the issue you opened on GitHub?

Yeah, I thought that was kinda strange. But @Rocky told me in his PM that I should set it to 80 instead of “”. That in combination with sync working in release 20210302_01 (without GETH at all apparently), threw me off the problem, and I kept on using 80 for the rest of my tests.

Oh, and that is not my real mining key and infura API key. They were just for testing, but yes, it might set a bad example for others. Do not post your keys! :upside_down_face:

@Josh_Hamon Thanks for carrying the ball over to github issues. Double-check and see if all your GETH_ parameters are correct. If this solves your issue with shard 2 as well, I think we can consider this coming down to just lack of good error messages and documentation. :blush:


@fredlee, I suggest setting the GETH env vars as follows; that should work for you:

GETH_PROTOCOL=
GETH_PORT=
GETH_NAME=https://mainnet.infura.io/v3/26e7865275720d452...

By setting GETH_NAME to a full URL, you no longer need to care about GETH_PROTOCOL and GETH_PORT. Thanks!


Oh! Yeah, you’re right, they’re only concatenated together.

func BuildRPCServerAddress(protocol string, host string, port string) string {
	url := host
	if protocol != "" {
		url = protocol + "://" + url
	}
	if port != "" {
		url = url + ":" + port
	}
	return url
}

You still have to define all three tho, because the defaults are HTTP and 8545.
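
To make the concatenation concrete, here is a small sketch that feeds the three configurations from this thread through the function above and then checks what a URL parser makes of the result. The project ID is a placeholder, and portAndPath is a helper added only for this illustration. Whether this is exactly what broke the lookups is an assumption; it only shows what the function produces.

```go
package main

import (
	"fmt"
	"net/url"
)

// BuildRPCServerAddress is copied from the snippet above.
func BuildRPCServerAddress(protocol string, host string, port string) string {
	url := host
	if protocol != "" {
		url = protocol + "://" + url
	}
	if port != "" {
		url = url + ":" + port
	}
	return url
}

// portAndPath reports which port and path a URL parser sees in addr.
func portAndPath(addr string) (port, path string, err error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", "", err
	}
	return u.Port(), u.Path, nil
}

func main() {
	const name = "mainnet.infura.io/v3/YOUR_PROJECT_ID" // placeholder, not a real key
	configs := [][3]string{
		{"https", name, "80"},       // the setup that stalled
		{"https", name, ""},         // empty GETH_PORT
		{"", "https://" + name, ""}, // full URL in GETH_NAME
	}
	for _, c := range configs {
		addr := BuildRPCServerAddress(c[0], c[1], c[2])
		port, path, err := portAndPath(addr)
		fmt.Printf("%s port=%q path=%q err=%v\n", addr, port, path, err)
	}
}
```

Because GETH_NAME already contains a path, the port is appended after the project ID instead of after the host name, so the first configuration ends in “YOUR_PROJECT_ID:80” and the parser sees no port at all. The second and third configurations produce the same clean address.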

:relaxed:

Yes, I hope this helps solve your issues. I also asked @Rocky to update his post a bit to make it clearer. Sorry for the inconvenience!

I think it will, it’s still syncing shard 0, 2 and 6. But it looks good so far. Once it’s done, I’ll confirm and change the topic to [Solved]. I’d also like to know if this might solve the issue for:

@Josh_Hamon Shard 2 Stalling at 433077 on Multiple vNodes
@pfrp Shard6 with sync problem

Let me know guys.


I’m starting fresh and syncing the beacon

Hi @fredlee, I have set GETH_NAME to infura.io as instructed by duc, and shard6 is syncing again. Does a local light Ethereum node not work anymore? I have checked the connection to the parity container on port 8545 and it is working fine.

Thank you,

Beacon stalled. Restarting

Beacon, that’s odd, never had problems with it. Check for errors if it keeps stalling.

I’m about 200k blocks from a fully synced node. Knock on wood.


Full node synced!

That resulted in around 280 requests to infura.io in total, coming from shards 0, 2 and 6. Still not sure how this works and why shards 1, 3, 4, 5, 7 and the beacon require no GETH. But I guess that’s outside of the scope of this topic. :blush:

@Mike_Wagner your guess was close. Full node is 112GiB of shard and beacon block data at this time.

Once again, thank you @trungtin2qn1 for helping me figure out what was wrong in the setup.


I would like the answer to this question. I’m running my nodes through jservers, and they are not running this part of the code. They said it is not necessary.

hey @fredlee, I am not getting any requests to infura.io even though I’ve wiped and re-synced multiple vnodes. I’ve updated the GETH vars per the info from @duc. Any ideas what I’m doing wrong? Can the same Infura API be used for all nodes?

Are you syncing any shard blocks? Getting any stalls or errors?

Try running FULLNODE=1 on one and check the logs to see if it gets past block 169584 on shard0, that’s the lowest block I found that seemed to require GETH.

I would guess that it works, yes, but I have not tested it yet. I’m only running one node so far. It’s in committee in 8 epochs. I want to see how that goes before deciding on running more than one. :relaxed:

I think this is fixed now and it does appear you can run one infura api with multiple nodes. Thanks again for the help.
