Please help troubleshoot Nodes that switch Online/Offline several times a day.

Hello Incognito,

I’d like to report some weirdness with my vNodes, as shown on the Incognito Node Monitor website: https://monitor.incognito.org/node-monitor

Below is a screenshot of the Docker “stats” output on my vNode server, which has 4 nodes running:

But when I run the command sudo docker ps:


What worries me is that the status is “Up 25 mins”. I’m not a super technical person, but this tells me something caused my node to go offline.

If this is correct, how can I troubleshoot the issue?

On the Node Monitor website, my 4 nodes appear as offline:

This issue of my node going online/offline has been happening for several weeks now.

I have not made any changes to my PC (nor installed any other software), but I did apply the latest updates to Ubuntu 20.04.4 LTS and restarted my node yesterday.

Does anyone have any ideas what may be going wrong with my vNodes, which keep going online/offline on a frequent basis (a few times a day) as reported by the Incognito Node Monitor?

Thanks for any help provided.

When’s the last time you restarted your server? If you have not done that, go ahead and do it now:

sudo reboot

After that, if docker ps still shows your containers restarting randomly, I’d stop them:

sudo docker stop inc_mainnet_0
sudo docker stop inc_mainnet_1
(and so on for each container)
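
If you have several containers, a small loop saves typing (a sketch, assuming they are named inc_mainnet_0 through inc_mainnet_3):

for i in 0 1 2 3; do sudo docker stop inc_mainnet_$i; done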

Then

sudo docker prune --all (confirm the prompt with y)

After that, make sure you’re running the latest blink script and re-run it to set the nodes up again.


Thanks for replying @Jared.

Can you please confirm where I can download the latest build script?

https://raw.githubusercontent.com/incognitochain/incognito-chain/production/bin/blink.sh
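
A minimal way to fetch and run it (a sketch; I’m assuming you run it as root and have your validator keys handy):

wget https://raw.githubusercontent.com/incognitochain/incognito-chain/production/bin/blink.sh
sudo bash blink.sh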


I’d also check dmesg output for out-of-memory events, and server uptime for unexpected reboots.
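
Something like this should surface any OOM-killer messages (a sketch; -T prints human-readable timestamps):

sudo dmesg -T | grep -i -E "out of memory|oom"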


My PC is not shutting down and restarting. I don’t believe it’s running out of memory; I am only using ~15% of my 16GB RAM.
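
For reference, I checked memory use with the standard free command (it reports totals in human-readable units):

free -h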

I’m not a very technical person, but I ran “dmesg” and below is what I see a tonne of, repeated over and over. Is this normal?

[124182.941817] br-e315c44a533a: port 1(veth2862a96) entered blocking state
[124182.941822] br-e315c44a533a: port 1(veth2862a96) entered forwarding state
[127797.462107] br-e315c44a533a: port 2(veth4ca00d6) entered disabled state
[127797.462181] veth6a4aa32: renamed from eth0
[127797.550035] br-e315c44a533a: port 2(veth4ca00d6) entered disabled state
[127797.550610] device veth4ca00d6 left promiscuous mode
[127797.550614] br-e315c44a533a: port 2(veth4ca00d6) entered disabled state
[127797.869392] br-e315c44a533a: port 2(veth456e236) entered blocking state
[127797.869400] br-e315c44a533a: port 2(veth456e236) entered disabled state
[127797.872338] device veth456e236 entered promiscuous mode
[127797.872447] br-e315c44a533a: port 2(veth456e236) entered blocking state
[127797.872451] br-e315c44a533a: port 2(veth456e236) entered forwarding state
[127798.436656] eth0: renamed from vethbf174a0
[127798.472816] IPv6: ADDRCONF(NETDEV_CHANGE): veth456e236: link becomes ready
[127808.775820] br-e315c44a533a: port 3(veth37e38bf) entered disabled state
[127808.775911] vethf76c20a: renamed from eth0
[127808.864568] br-e315c44a533a: port 3(veth37e38bf) entered disabled state
[127808.864990] device veth37e38bf left promiscuous mode
[127808.864994] br-e315c44a533a: port 3(veth37e38bf) entered disabled state
[127809.460710] br-e315c44a533a: port 3(vetha7ccf2e) entered blocking state
[127809.460714] br-e315c44a533a: port 3(vetha7ccf2e) entered disabled state
[127809.460771] device vetha7ccf2e entered promiscuous mode
[127809.460870] br-e315c44a533a: port 3(vetha7ccf2e) entered blocking state
[127809.460874] br-e315c44a533a: port 3(vetha7ccf2e) entered forwarding state
[127809.807732] br-e315c44a533a: port 3(vetha7ccf2e) entered disabled state
[127809.964161] eth0: renamed from veth9bc456d
[127809.988109] IPv6: ADDRCONF(NETDEV_CHANGE): vetha7ccf2e: link becomes ready
[127809.988210] br-e315c44a533a: port 3(vetha7ccf2e) entered blocking state
[127809.988215] br-e315c44a533a: port 3(vetha7ccf2e) entered forwarding state
[127820.321816] br-e315c44a533a: port 4(vethece61bc) entered disabled state
[127820.321879] vethe3d73c1: renamed from eth0
[127820.388957] br-e315c44a533a: port 4(vethece61bc) entered disabled state
[127820.389376] device vethece61bc left promiscuous mode
[127820.389381] br-e315c44a533a: port 4(vethece61bc) entered disabled state
[127821.063945] br-e315c44a533a: port 4(veth1a18f58) entered blocking state
[127821.063951] br-e315c44a533a: port 4(veth1a18f58) entered disabled state
[127821.064042] device veth1a18f58 entered promiscuous mode
[127821.064120] br-e315c44a533a: port 4(veth1a18f58) entered blocking state
[127821.064123] br-e315c44a533a: port 4(veth1a18f58) entered forwarding state
[127821.327646] br-e315c44a533a: port 4(veth1a18f58) entered disabled state
[127821.459974] eth0: renamed from veth701b32f
[127821.488039] IPv6: ADDRCONF(NETDEV_CHANGE): veth1a18f58: link becomes ready
[127821.488140] br-e315c44a533a: port 4(veth1a18f58) entered blocking state
[127821.488144] br-e315c44a533a: port 4(veth1a18f58) entered forwarding state
[127831.864032] br-e315c44a533a: port 1(veth2862a96) entered disabled state
[127831.864104] veth4eee75f: renamed from eth0
[127831.956071] br-e315c44a533a: port 1(veth2862a96) entered disabled state
[127831.956658] device veth2862a96 left promiscuous mode
[127831.956664] br-e315c44a533a: port 1(veth2862a96) entered disabled state
[127832.837517] br-e315c44a533a: port 1(vethab70eba) entered blocking state
[127832.837522] br-e315c44a533a: port 1(vethab70eba) entered disabled state
[127832.837613] device vethab70eba entered promiscuous mode
[127832.837736] br-e315c44a533a: port 1(vethab70eba) entered blocking state
[127832.837740] br-e315c44a533a: port 1(vethab70eba) entered forwarding state
[127832.867544] br-e315c44a533a: port 1(vethab70eba) entered disabled state
[127833.359771] eth0: renamed from veth67582c0
[127833.387778] IPv6: ADDRCONF(NETDEV_CHANGE): vethab70eba: link becomes ready
[127833.387860] br-e315c44a533a: port 1(vethab70eba) entered blocking state
[127833.387866] br-e315c44a533a: port 1(vethab70eba) entered forwarding state

Thanks for your help.

There would be a line saying “…out of memory…”; since it’s not there, this is good. Now check/watch for unexpected reboots.

uptime

08:47:37 up 6 days, 14:30, 1 user, load average: 0.34, 0.44, 0.85
Check whether the days/hours value unexpectedly decreases.
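
You can also list recent reboots directly from the login records (a quick sketch):

last reboot | head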

Hi, I ran uptime and this is what I got:

 04:45:37 up 2 days,  4:17,  0 users,  load average: 0.49, 0.70, 1.10

But my Incognito Node containers show “Created 35 minutes ago”…

CONTAINER ID   IMAGE                                         COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
8a5d16241d11   incognitochain/incognito-mainnet:20220830_1   "/bin/bash run_incog…"   36 minutes ago   Up 36 minutes   0.0.0.0:8884->8884/tcp, :::8884->8884/tcp, 0.0.0.0:8894->8894/tcp, :::8894->8894/tcp   inc_mainnet_3

Nothing unusual here, as I was forced to restart my PC due to a power outage 2 days ago.

My Incognito Nodes have been playing up for about 1 month. Every ~2-3 hours the “Created” date resets to 0 minutes.
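
In case it helps, these are commands I can run the next time one of them flips, to capture the container’s last exit state and recent log output (a sketch using one of my container names):

# last exit code and finish time of the container
sudo docker inspect -f '{{.State.ExitCode}} {{.State.FinishedAt}}' inc_mainnet_0
# recent container log output, in case it logs an error before exiting
sudo docker logs --tail 20 inc_mainnet_0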

Any other thoughts?

I wanted to exhaust other avenues before I commit to rebuilding my nodes using the blink script.

Hi, at this point it seems it’s time to try Jared’s procedure. I can’t think of anything else to check. The dmesg output looks normal; those br-/veth messages are just Docker attaching and detaching container network interfaces, which you’d expect when containers restart.


Thanks for your ideas and help @radonm, much appreciated. Thanks for being part of this super helpful community.

Hey @Jared,

I was able to stop all my Docker containers. The only command that didn’t work in your set of instructions was the “prune” command. That option didn’t exist for me.

Question:
Just curious: now that I’ve configured the new script and am about to run it, will all my nodes need to re-download the entire blockchain data, or will they reuse the data already saved?

Sorry about that, I left out the “system” part of the command.
You’ll want to stop all of your containers.

Copy and paste the following commands one at a time:

sudo systemctl stop IncognitoUpdater.service

sudo docker container stop $(sudo docker container ls -q --filter "name=inc_mainnet_*")

That will stop all of your containers. Next, issue:

sudo docker system prune --all (confirm the prompt with y)

And now you want to run the newest script.

This will ensure your nodes are running the latest script version and updater service.

This. The volumes where the data is stored will not be touched: docker system prune removes stopped containers, unused networks, and unused images, but it only removes volumes if you pass --volumes.


Hey @Jared,
After completing the above steps, my vNodes are back up and running and the uptime of the Docker containers is much more stable. But I have seen the following error on the Incognito Node Monitor website (see below):

Here is my version number:

I thought the script automatically updates my nodes to the latest version?

Please advise what I need to do to correct this.

Thanks.

Run the following to start the updater service again:
sudo systemctl start IncognitoUpdater.service
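
To verify it is active afterwards (and see its recent log lines):

sudo systemctl status IncognitoUpdater.service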


Hi @Jared,
After starting the Updater service, I noticed the images for all my nodes were updated to incognitochain/incognito-mainnet:20220921_1.
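
For reference, I listed the image per container with a standard docker ps format string:

sudo docker ps --format "table {{.Names}}\t{{.Image}}"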

However, the Incognito Node Monitor website still reports all my nodes as “Not Latest Version”. Any ideas why?

This is a known issue. The Node Monitor site needs to update the code to the newest version and is currently showing a test version.


Hey @Jared, do you know who maintains the Node Monitor site?

Can you please share with them my post where I provided some simple ideas to improve the User Interface for the site:
https://we.incognito.org/t/idea-to-improve-node-monitor-site/15448

This post has quite a few Likes and some great comments from the community.

Perhaps the maintainer can incorporate some of these changes along with the other updates.

Thanks.

@0xkumi maintains the Node Monitor site.

I’ve pinged him and linked to this thread as well. 🙂
