[Shipped] Network Monitor

Duc, is the accuracy of the node status fixed? Does Offline always mean Offline?

Here is my experience and my questions up to now after I updated my nodes to the latest version.

1- One of my nodes went offline. I restarted it and it started running again. How can I find out why it went offline? If that is not possible, should validators constantly monitor their nodes to check that they are alive?

2- I earned my first reward after the update. Here are screenshots:

[Screenshots: Region capture 123, Region capture 124]

I think the left one is OK. However, according to the previous explanation, the right one is not, since its vote count for epoch 3325 is 0 (epoch 3326 is the ongoing epoch, and the earlier ones belong to the outdated version). In that case, my node would be slashed. How can I find out the reason?

Note to the devs: As I explained here, I redirected the rewards of all my nodes to one node. I don’t think it’s related, but I want you to know since my case is exceptional (I mean there is no interface for this in the app).

@0xkumi @duc

Edit (25th April):
Here is the latest state after my node finished its committee.
[Screenshot: Region capture 125]

In that case, the “Note to the devs” part no longer applies. However, my questions are still valid. I have a new question too:
3- At which vote count will our nodes be slashed? Is the percentage (79%) in the screenshot above enough?

Edit (28th April): Is there anyone reading this post? @Support

Has anyone been getting anything after their pNode was completely down for 2 to 4 days?
I hardly get anything even when it’s up right now.
I’m just wondering if it’s just me.

Yeah the monitor says my pNode is offline but I got a push notification about a reward being earned. The reward doesn’t show in the history on the monitor. I trust that the team will fix the monitor so it displays an accurate status. Right now it only adds confusion to my experience.


Hi!
I have a few vnodes. The VPS hardware was selected based on the hardware requirements suggested by the tutorials in the community, and the software was installed as those tutorials describe. For all practical purposes, all software updates should have been automatic. If I have followed all the instructions for setting up a node, and the node has been performing smoothly so far, I would like to think I have done everything I was told to do.

In this situation, if there has been no major change in the platform, correctly set up nodes should not fail. If there has been a change in the platform that necessitates upgrading the hardware of the VPS or homemade devices, it should be clearly communicated. Anyone with the technical expertise to set up a node can easily upgrade the hardware if needed. Can there be an official announcement about the hardware and software requirements to maintain a vnode? I do not think one exists.

More importantly, it is very difficult to find the important announcements in the community. I have to go through several threads to understand what’s going on. Is it possible to create an official channel, maybe within the community? Also, how about a Discord channel for official announcements only? In any case, there should be a go-to channel for official announcements.


Hi guys,

A quick way to solve your problem: reset your node and set it up as a new validator. Follow this topic or try the quick setup script.

If you want to keep your current setup and data, send a DM to @support if you need assistance. We will ask for run logs, server IP/port, and access credentials if needed.

For further reading, we recommend this topic:


Hello,

  1. To debug why a node went offline, you need to get the error log. We will write instructions on how to retrieve these files. In addition, we will also consider a notification service.

  2. As in the previous case, we need the log file to find out the reason. But diagnosing a past problem is difficult, as log files are removed after several days.

  3. There is a new post about the slashing mechanism. A node will be slashed if its total vote is below 50% for that epoch.
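
To make the rule in point 3 concrete, here is a minimal sketch in Python. It assumes the "total vote" is the number of votes a node cast divided by the number it was expected to cast in that epoch; how the network actually counts votes is not specified in this thread, and the function names and the `threshold_pct` parameter are hypothetical.

```python
def vote_percentage(votes_cast: int, votes_expected: int) -> float:
    """Share of expected votes the node actually cast in one epoch, in percent."""
    if votes_expected <= 0:
        raise ValueError("votes_expected must be positive")
    return 100.0 * votes_cast / votes_expected


def would_be_slashed(votes_cast: int, votes_expected: int,
                     threshold_pct: float = 50.0) -> bool:
    """True if the epoch's vote percentage is strictly below the threshold.

    The 50% default mirrors the rule stated above; the vote-counting
    details are an assumption made for illustration.
    """
    return vote_percentage(votes_cast, votes_expected) < threshold_pct


print(would_be_slashed(79, 100))  # 79% is not below 50%
print(would_be_slashed(0, 100))   # 0% is below 50%
```

Under these assumptions, the 79% from the earlier question would not trigger slashing, while an epoch with 0 votes would.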


On point 3, 50% is a very high threshold for slashing and might have unintended consequences for most operators. I propose we only slash offline nodes to begin with (0% votes) and raise the threshold over time.

For example, a lot of my vnodes are still syncing shard blocks during the first committee, as ~4h is not enough time to sync everything from pending->earning, so sometimes the vote is under 50%. In the 2nd/3rd committee the vote goes back to 100% because the shard is fully synced.


There is also an ongoing bug that the last earning epoch incorrectly reports 0% in the monitor, and in the next earning cycle the previously reported 0 turns into 100, so it might just be a UI issue or not sending/counting the metric correctly.


Tonight is an example of why I don’t feel ready for slashing. I had one node in the monitor showing as Latest, but when I looked into the details, the Beacon was synced while all shards had stalled. I ssh’d into the node, stopped the containers and ran sudo bash again. I was two images behind. That seemed to fix things, so I went through and updated all the others. Now about half of my nodes show offline in the monitor.

While I was typing the above, I went back to get a more exact count and now more of the nodes seem to be reporting as online.

I’m not upset, but I am confused. Am I remediating things correctly? Am I not reading the Monitor correctly?


Is this down?


Hey @Josh_Hamon…yea it seems the system is down…the network monitor thingie…cause both in the app and the web version…no data is populating the system…hmmm…hey @Support…you guys minding the store?.. :sunglasses:


Hey @Support,

I think there is a display error in the sync state. The “Sync state” of one of my nodes is displayed as “Shard stall” on the grid. However, when I open the pop-up, the “Last Insert” column shows “a few seconds ago (syncing)” for the corresponding shard. Is this a display error or is my node really stalled?

Thanks.

This error persists. The exact opposite also occurs, i.e. the shard shows as syncing on the grid but stalled in the pop-up. FYI. @Support

P.S.: My nodes are on the latest version.


In the app’s node “Monitor Detail” and on the website’s node monitor, I see “Code Version” followed by 9 alphanumeric characters. What does this code version refer to? How can I use it to assess the health of my node?

When it is green, your node is fine. It is the short SHA commit ID from the Incognitochain GitHub repo, and it should match the latest release.
If you run a node that is not on the latest version, there will be a warning message beside the version.
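
If you want to check this yourself, the comparison can be sketched as below. This is only an illustration in Python, assuming you have looked up the full commit SHA of the latest release; the function name and both SHA values are made-up placeholders, not real Incognito commits.

```python
def matches_latest_release(code_version: str, latest_commit_sha: str) -> bool:
    """True if the short SHA shown as "Code Version" is a prefix of the
    full commit SHA of the latest release (case-insensitive)."""
    short = code_version.strip().lower()
    full = latest_commit_sha.strip().lower()
    return bool(short) and full.startswith(short)


# Placeholder values for illustration only.
latest = "a1b2c3d4e5f607182930a4b5c6d7e8f901234567"
print(matches_latest_release("a1b2c3d4e", latest))  # prefix of the full SHA
print(matches_latest_release("0ffee1234", latest))  # different commit
```

The first call reports a match and the second does not, which corresponds to the green version and the warning message in the monitor.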


Do you still have the problem? You can send me your Mining Public Key so I can check.

I don’t know :slight_smile: since I cannot see them. The validator table is empty. I thought you had updated the site again and they were lost. When I try re-adding my nodes, I get a “node name or validator public key is exist” error. That is the problem.

Now I have tried another browser. It does not add the validator public key to the table. When I hit Check, a progress circle just appears and then disappears. I think the site has a problem.

Can you refresh? The empty page is temporary and is caused by our scheduled service restart.
