[Solved] Block sync is broken (mainnet-bootnode.incognito.org down)

Incognito vNodes have been failing to sync blocks since this morning because the mainnet bootnode is down and the highway endpoints cannot be discovered.

2021-03-15 09:50:36.493 addrkeeper.go:164 [INF] Peerv2 log: Full RPC address list: [{ mainnet-bootnode.incognito.org:9330}]
2021-03-15 09:50:36.493 addrkeeper.go:172 [INF] Peerv2 log: RPCing addr { mainnet-bootnode.incognito.org:9330} from list [{ mainnet-bootnode.incognito.org:9330}]
2021-03-15 09:50:36.570 addrkeeper.go:181 [INF] Peerv2 log: Ignoring RPC of address { mainnet-bootnode.incognito.org:9330} until 2021-03-15T10:50:36Z
2021-03-15 09:50:36.570 connmanager.go:226 [ERR] Peerv2 log: Failed refreshing highway: Connect to discover peer mainnet-bootnode.incognito.org:9330 return error dial tcp 161.35.112.31:9330: connect: connection refused:
2021-03-15 09:50:41.493 blockrequester.go:84 [ERR] Peerv2 log: Could not dial to highway grpc server: context deadline exceeded 
2021-03-15 09:50:46.492 blockrequester.go:71 [WRN] Peerv2 log: BlockRequester is not ready, dialing

Connections are also refused from devices outside my network, so it’s not a firewall issue. It seems to be affecting vNodes globally.

Hi @adrian, it seems the network is running fine.

Please double-check your local network and try connecting to other endpoints to check its stability.

If nothing looks abnormal, simply restart your node(s) and let us know how it goes.

sudo bash run.sh

Hi Peter, how are you determining that it’s running?

I ran netcat and nmap against mainnet-bootnode.incognito.org: port 9330 is closed, while ports 80 and 8080 respond fine.

Could this be a config problem? I’m using the mainnet_20210106_1 Docker image.

From a brand new VM in AWS:

root@incognito-test:~$ nc -v mainnet-bootnode.incognito.org 9330
nc: connect to mainnet-bootnode.incognito.org port 9330 (tcp) failed: Connection refused

Also checked with https://networkappers.com/tools/open-port-checker:
[Screenshot 2021-03-15 at 11.34.41 am]

Restarting does nothing. The vNode contacts the mainnet bootnode to get the highway endpoints, and because that port is closed it is stuck on "Failed refreshing highway" and "Could not dial to highway grpc server". Even after deleting the disk and starting from scratch, the node is stuck at epoch 1.

@adrian, are you using the latest version, 20210313_3?

I think the problem is in the run.sh script, since the bootnode address is hardcoded:

#!/bin/sh bash

run()
{
  validator_key=xxx
  bootnode="mainnet-bootnode.incognito.org:9330"

If you try to connect to it, port 9330 is closed:

# dig mainnet-bootnode.incognito.org. +short
161.35.112.31

So the problem shows up as soon as you restart the container.

# nc -v 161.35.112.31 9330
nc: connect to 161.35.112.31 port 9330 (tcp) failed: Connection refused
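If you maintain your own copy of run.sh, one possible stopgap is to keep the official bootnode as the default but let it be overridden from the environment, so a restart does not depend on a single hardcoded address. This is only a sketch: BOOTNODE and resolve_bootnode are hypothetical names that the official script does not use, and the excerpt above does not show how run.sh passes bootnode on to the node.

```shell
#!/usr/bin/env bash
# Sketch only: BOOTNODE is a hypothetical override, not read by the official run.sh.

# Print the bootnode to use: $BOOTNODE if set and non-empty, else the default
# value currently hardcoded in run.sh.
resolve_bootnode() {
  echo "${BOOTNODE:-mainnet-bootnode.incognito.org:9330}"
}

bootnode="$(resolve_bootnode)"
echo "using bootnode: $bootnode"
```

With BOOTNODE unset the behaviour is unchanged; setting it only helps if another bootnode is actually reachable.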

OK, I reproduced it on my side after a manual restart.

mainnet-bootnode is running fine now:

$ nc -v mainnet-bootnode.incognito.org 9330 
Connection to mainnet-bootnode.incognito.org port 9330 [tcp/*] succeeded!

It’s back.

Got an answer from @duc about this issue:

[Screenshot 2021-03-15 at 13.40.46]

Thanks @inccry and @duc.

Can we set up port monitoring on that endpoint, or could we run the bootnode ourselves? Everyone on the network seems to rely on it for node syncing and highway access, so it looks like a single point of failure.
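On the monitoring question, a minimal external check could look like the sketch below: it probes a TCP port on a fixed interval and logs only when the state changes. It assumes bash (for the /dev/tcp redirection) and coreutils timeout; the function names, the 3-second timeout and the 60-second default interval are arbitrary choices, not anything Incognito provides.

```shell
# Source this file, then run: monitor HOST PORT [INTERVAL_SECONDS]

# probe HOST PORT: exit 0 if a TCP connection succeeds within 3 seconds.
probe() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# next_state PREVIOUS_STATE HOST PORT: print "up" or "down" on stdout and,
# only if that differs from PREVIOUS_STATE, a timestamped log line on stderr.
next_state() {
  local prev="$1" host="$2" port="$3" cur
  if probe "$host" "$port"; then cur=up; else cur=down; fi
  if [[ "$cur" != "$prev" ]]; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $host:$port is $cur" >&2
  fi
  echo "$cur"
}

# monitor HOST PORT [INTERVAL_SECONDS]: loop forever, logging state changes.
monitor() {
  local state=unknown
  while true; do
    state="$(next_state "$state" "$1" "$2")"
    sleep "${3:-60}"
  done
}
```

For example, monitor mainnet-bootnode.incognito.org 9330 runs until interrupted; the stderr log line is the natural place to hook in an alert.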

Yeah, having multiple bootnodes would be great.

I volunteer to run one of them :wink:

I agree that a list of bootnodes should be made available.
Right now, mainnet-bootnode.incognito.org:9330 is not responding again.
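To illustrate what a bootnode list could look like from the client side, the sketch below tries each HOST:PORT entry in order and uses the first one that accepts a TCP connection. Only mainnet-bootnode.incognito.org:9330 is a real entry; BOOTNODES and pick_bootnode are hypothetical names, and the node itself does not read such a list, since it is hardcoded to a single bootstrap node. It assumes bash and coreutils timeout.

```shell
#!/usr/bin/env bash
# Sketch of bootnode fallback. Only the first entry exists today; further
# (hypothetical) HOST:PORT entries would be appended to this array.
BOOTNODES=("mainnet-bootnode.incognito.org:9330")

# pick_bootnode ENTRY...: print the first HOST:PORT entry that accepts a TCP
# connection within 3 seconds; return 1 if none does.
pick_bootnode() {
  local entry host port
  for entry in "$@"; do
    host="${entry%:*}"   # text before the last colon
    port="${entry##*:}"  # text after the last colon
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$entry"
      return 0
    fi
  done
  return 1
}

if chosen="$(pick_bootnode "${BOOTNODES[@]}")"; then
  echo "bootnode: $chosen"
else
  echo "no bootnode reachable" >&2
fi
```

Trying entries in a fixed order keeps the behaviour predictable; shuffling the list would spread load across bootnodes instead.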

Can an already running fullnode also be used as a bootnode?

It’s been quite frustrating that this issue (bootnode down) has come up repeatedly, preventing people from running their nodes. This is a single point of failure that should never exist in a p2p network project.

At least for nodes that are already bootstrapped (fully synced and working), why can’t they establish connections with known peers upon restart? I can’t see why they need to go through the bootstrap process again.

I’m having this problem again. I deleted everything and restarted the fullnode sync from scratch. It’s stuck at epoch 1999 with "Failed refreshing highway".

Could you check whether the problem is resolved?
Currently, the node is hardcoded to a single bootstrap IP, so if that node goes down you cannot connect to the Incognito chain. We will provide multiple bootstrap nodes in the future.

Yes, it’s resolved, and my fullnode is syncing again.