[Resolved] Shard stalls are back in 20210630_1

Hey @support,

I’m pretty sure you broke something. All my validator and full nodes were running fine until the 20210630_1 update. Now I see that my nodes stalled on shard 0 at 1308064 and shard 6 at 1307476. Looking at my infura history, it stopped receiving requests on the same date. I have not changed anything in my setup.

After rolling back to docker image 20210626_1, I can see infura calls again and the shards start syncing as they should.

Don’t you run any regression tests before pushing releases?


We actually did. All nodes run by the Incognito team are fine now (that’s over 500 nodes, you know). I also talked to a few node owners in the community, and their nodes are working well too. Please send us your log files so that we can investigate the issue. Thanks.


Hey @duc. I’m experiencing the exact same issues listed above with many of my nodes, including zero infura calls.

Yikes, sorry for the issue @fredlee, @JG20. Can you give me the log, or a part of it, so that I can understand where the issue comes from?

Then can you please share how you run these nodes? Do they use infura? What are the exact parameters? What Linux distribution do they run under? It feels like we have to resort to trial and error with our nodes because we lack proper instructions.

I found the error with the latest version in my logs:

2021-07-03 23:59:02.975 shardproducer.go:901 [ERR] BlockChain log: Build Request Action Error -1006: Build request action error%!!(MISSING)(EXTRA []interface {}=[]) -1007: Verify proof and parse receipt%!!(MISSING)(EXTRA []interface {}=[]) Post https://https//mainnet.infura.io/v3/26e652...: dial tcp: lookup https on 127.0.0.11:53: server misbehaving
Verify proof and parse receipt

As you can see, it’s trying to access https://https//mainnet.infura.io. My docker variables are:

"Env": [
                "BOOTNODE_IP=mainnet-bootnode.incognito.org:9330",
                "GETH_NAME=https://mainnet.infura.io/v3/26e65277...",
                "GETH_PROTOCOL=",
                "NODE_PORT=9500",
                "TESTNET=false",
                "LIMIT_FEE=1",
                "GETH_PORT=",
                "FULLNODE=1",
                "MININGKEY=",
                "RPC_PORT=8500",
            ],

I’m running GETH_NAME with the full URL and blank GETH_PROTOCOL and GETH_PORT, as per the instructions in the official inc_node_installer script: https://github.com/incognitochain/incognito-chain/blob/e8f10e6ccff44c9a9c8dc73ee927a66119c2eea4/bin/inc_node_installer.sh


Looking through the commit history, I can see that you have changed how ENV variables are read and that empty strings provided for the geth parameters are now ignored.

This means that a full URL in GETH_NAME is now a bad thing. The correct method is:

GETH_PROTOCOL="https"
GETH_NAME="mainnet.infura.io/v3/26e65277..."
GETH_PORT=""

Seems to work. Shards are back syncing. Can you confirm this change and change the official instructions please?

I highly doubt that GETH_PORT is needed anymore, since empty strings are ignored. Should they be ignored?


Oh I see, the problem came from the default params.
If you set only GETH_NAME and leave GETH_PROTOCOL and GETH_PORT empty, the node takes the default values from the file above (GETH_PROTOCOL in this case), which caused the redundant https.
I’ll update the params file and build a new docker image so that it works for nodes having the issue (and node owners won’t need to take any action like you did).
Thanks again for reporting the issue.

Cool. I will keep my change. I think removing the protocol from GETH_NAME and setting it with GETH_PROTOCOL is cleaner, as long as that variable exists.

If we’re being picky…

geth_param:
  host: "kovan.infura.io/v3/1138a1e99b154b10bae5c382ad894361"
  protocol: "https"
  port: ""

That’s not a host, that’s a host with path. :wink:

My vote would have been to remove them all and define a GETH_URL instead. That’s basically what it all ends up as inside the BuildRPCServerAddress function, which just slaps them all together into a URL (protocol + host + port, which ends up as an invalid URL if you define a port and the host contains a path, btw).

protocol://host:port/path, something that everyone understands, leaves no room for conflicting parameters, and the net/http package already handles it correctly (defaults to port 443 for https etc.)

Your default would then be

geth_url: "https://kovan.infura.io/v3/1138a1e99b154b10bae5c382ad894361"

and it would cause no problems for someone who wants to run “http://localhost:1234” or whatever.
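
A rough Go sketch of what reading such a single setting could look like. `GETH_URL` is only the proposal from this post, not an existing variable, and `parseGethURL` is a hypothetical helper; the only library used is the standard `net/url` package, and `KEY` is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseGethURL validates a single GETH_URL-style value (a hypothetical
// setting) and returns the parsed URL.
func parseGethURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("GETH_URL must start with http:// or https://")
	}
	if u.Hostname() == "" {
		return nil, errors.New("GETH_URL has no host")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"https://kovan.infura.io/v3/KEY", // full URL, no explicit port
		"http://localhost:1234",          // local node with explicit port
		"kovan.infura.io/v3/KEY",         // missing scheme, rejected
	} {
		u, err := parseGethURL(raw)
		if err != nil {
			fmt.Printf("%s: rejected (%v)\n", raw, err)
			continue
		}
		fmt.Printf("%s: host=%s port=%q path=%q\n", raw, u.Hostname(), u.Port(), u.Path)
	}
}
```

With one value there is nothing to combine, so a doubled scheme or a port glued onto a path cannot arise from defaults.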


I just pushed another tag (20210630_2).
For those who are having the issue with GETH: if your nodes pull the latest tag automatically, they should be syncing fine by now. (cc @JG20 )

Yes! Seems to be syncing and pulling the infura calls now. Thanks!

With this new tag, should we have GETH_NAME and GETH_PROTOCOL or not?

Both should work with the new tag.

They removed “https” as the default, so it will not be added anymore. As it is now, you can use only GETH_NAME with a full URL and not specify protocol or port at all. No need for empty strings. In fact, if you specify a port, you might even end up with an invalid URL.
