[Ongoing] Network Monitor

In relation to my previous post, I’ve managed to get the monitor working now. I was mistakenly using the keychain public key instead of the mining public key.

curl -Ss --header "Content-Type: application/json" \
    --request POST \
    --data '{"jsonrpc":"1.0","method":"getmininginfo","params":[],"id":1}' \
    http://IP:RPC_PORT | jq .Result.MiningPublickey

The MiningPublickey field only seems to be populated on vNodes running 20210406_1 or later.
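A small wrapper around the command above can flag nodes that do not report a mining key yet, for example vNodes older than 20210406_1. This is a minimal sketch, assuming `curl` and `jq` are installed and that the response has the same `.Result.MiningPublickey` path used above; the function names `fetch_mining_key` and `mining_key_status` are hypothetical.

```bash
# Sketch: check whether a vNode reports its mining public key via getmininginfo.
# Assumes curl and jq are installed; pass the same http://IP:RPC_PORT URL as above.

# Print the raw MiningPublickey value (jq -r prints "null" if the field is absent).
fetch_mining_key() {
  curl -Ss --max-time 10 --header "Content-Type: application/json" \
    --request POST \
    --data '{"jsonrpc":"1.0","method":"getmininginfo","params":[],"id":1}' \
    "$1" | jq -r '.Result.MiningPublickey'
}

# Classify a fetched value: empty output or "null" means no key was reported.
mining_key_status() {
  case "$1" in
    ""|null) echo "missing" ;;
    *) echo "set" ;;
  esac
}
```

Usage: `mining_key_status "$(fetch_mining_key http://IP:RPC_PORT)"` prints `set` or `missing`. An unreachable node also yields `missing`, since curl then produces no output for jq to read.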

My suggestion of DNS and TLS for the monitoring endpoint still stands; it would be much appreciated by the community :slight_smile:

My friend has a pNode that routinely shuts off every 1-2 days, specifically right before earning, for no apparent reason. He has to keep turning it back on. His last two selection/earning cycles still produced rewards, but once slashing happens, lmao.

Oh oh…slashing soon to be…Ouch… :face_with_symbols_over_mouth: :face_with_symbols_over_mouth: :sunglasses:

Mine does this too. All three. But I haven’t earned in over 45 days on two of them so :man_shrugging:t2:

If others are having this issue too, @Support needs to take care of this asap.

Do you know if the blue light on your friend’s pNode stays on?

The in-app status can show an inaccurate “offline” reading when the node is actually available; the blue light stays on in those circumstances. That’s my direct experience, anyway.

Hello all,

As you know, the “Slashing” feature is on the roadmap to make Incognito more robust and decentralized. We won’t activate it until most node owners are aware of it and have resolved their problems.

The first step is helping the community check their node status and take action; that’s why we developed the Monitor Page.

We will look at all the problems and help you solve them. But please be patient, as our support resources are limited at the moment. In the meantime, you can post any problem here.

It seems we have problems with the AutoUpdate script and with pNode operation. Once again, please be patient; we will check and get back to you soon.

Thanks for your understanding

Hey guys, here is a tutorial on how to use the Node Monitor; the post is especially helpful for non-technical users.
Again, like @0xkumi said, please don’t worry too much about the issues your nodes encounter; we will try our best to get them fixed (ideally before slashing). Remember, our goal is to release the fixed nodes, so the more healthy nodes the community operates, the more confident we are of achieving it. In other words, it’s our responsibility to help you with these issues.

Duc, has the accuracy of the node status been fixed? Does Offline always mean Offline?

Here are my experiences and questions so far, after updating my nodes to the latest version.

1- One of my nodes went offline. I restarted it and it started running again. How can I find out why it went offline? If there is no way, should validators constantly track their nodes to check that they are alive?

2- I earned my first reward after the update. Here are screenshots:

[Screenshots: Region capture 123 (left), Region capture 124 (right)]

I think the left one is OK. However, according to the previous explanation, the right one is not, since its vote count for the 3325th epoch is 0 (the 3326th epoch is the ongoing one, and earlier ones belong to the outdated version). In that case, my node would be slashed. How can I find out the reason?

Note to the devs: As I explained here, I redirected the rewards of all my nodes to one node. I don’t think it’s related, but I want you to know since my case is unusual (there is no interface for this in the app).

@0xkumi @duc

Edit (25th April):
Here is the latest state after my node finished its committee term.
[Screenshot: Region capture 125]

So the “Note to the devs” part no longer applies. However, my questions still stand, and I have a new one:
3- At what vote count will our nodes be slashed? Is the percentage (79%) in the screenshot above enough?

Edit (28th April): Is there anyone reading this post? @Support

Has anyone been getting rewards after their pNode was completely down for 2 to 4 days?
I hardly get anything even when it’s up right now.
I’m just wondering if it’s just me.

Yeah, the monitor says my pNode is offline, but I got a push notification about a reward being earned. The reward doesn’t show in the history on the monitor. I trust the team will fix the monitor so it displays an accurate status; right now it only adds to my confusion.

Hi!
I have a few vNodes. The VPS hardware was chosen based on the requirements suggested in the community tutorials, and the software was installed as those tutorials describe. For all practical purposes, all software updates should have been automatic. If I have followed every instruction for setting up a node, and the node has been running smoothly so far, I would like to think I have done everything I was told to do.

In this situation, if there has been no major change in the platform, correctly set-up nodes should not fail. If there has been a change in the platform that requires upgrading the hardware of VPSs/home-made devices, it should be clearly communicated. Someone with the technical expertise to set up a node can easily upgrade the hardware if needed. Can there be an official announcement of the hardware and software requirements for maintaining a vNode? I do not think one exists.

A more important point: it is very difficult to find the important announcements in the community. I have to go through several threads to understand what’s going on. Could there be an official channel, maybe within the community? Or how about a Discord channel for official announcements only? Either way, there should be a go-to channel for official announcements.

Hi guys,

Quick solution to your problem: reset and set up your node as a new validator. Follow this topic or try the quick setup script.

For those who want to keep their current setup and data, send a DM to @support if you need assistance; we will ask for run logs, server IP/port and access credentials if needed.

For further reading, we recommend this topic:

Hello,

  1. To debug why a node went offline, you need its error log. We will write instructions on how to retrieve these files. We will also consider a notification service.

  2. As in the previous case, we need the log file to find the reason. But diagnosing past problems is difficult, as log files are removed after several days.

  3. There is a new post about the slashing mechanism. A node will be slashed if its total vote is below 50% for that epoch.
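Until a notification service like the one mentioned in item 1 exists, an operator could run a simple watchdog, ideally from a different machine than the node. This is a minimal sketch, assuming `curl` is installed and treating any successful, non-empty reply to `getmininginfo` within 10 seconds as “alive”; `node_alive` and `watch_node` are hypothetical names. It only tells you *that* the node stopped answering, not why; for that you still need the logs.

```bash
# Sketch of a do-it-yourself watchdog until an official notification service exists.
# Assumes curl is installed; "alive" here just means a successful (HTTP 2xx),
# non-empty reply to getmininginfo within 10 seconds.

# Return 0 if the node at URL $1 answers, non-zero otherwise.
node_alive() {
  local body
  body=$(curl -sf --max-time 10 --header "Content-Type: application/json" \
    --request POST \
    --data '{"jsonrpc":"1.0","method":"getmininginfo","params":[],"id":1}' \
    "$1") || return 1
  [ -n "$body" ]
}

# Check URL $1 every $2 seconds (default 300) and log an alert to stderr on failure.
# Replace the echo with your own notification (mail, chat webhook, ...).
watch_node() {
  local url=$1 interval=${2:-300}
  while true; do
    if ! node_alive "$url"; then
      echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') ALERT: $url is not responding" >&2
    fi
    sleep "$interval"
  done
}
```

For example, run `watch_node http://IP:RPC_PORT 300` in a `tmux` session, or call `node_alive` from a cron job every few minutes. Note that a reachable RPC port does not prove the node is syncing or voting.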

On point 3, 50% is a very high bar for slashing and might have unintended consequences for most operators. I propose we start by slashing only offline nodes (0% votes) and raise the threshold over time.

For example, many of my vNodes are still syncing shard blocks during their first committee, as ~4h is not enough time to sync everything between pending and earning, so the vote is sometimes under 50%. By the 2nd/3rd committee it goes back to 100% votes, as the shard is fully synced.
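The “below 50%” rule and the lower starting threshold proposed here can be compared with a small helper. This is a sketch under stated assumptions: `VOTES` and `EXPECTED` are hypothetical inputs (votes counted vs. votes the node was expected to cast), the official vote accounting may differ, and `slash_check` is an illustrative name. A threshold of 1 only approximates “slash only offline nodes”, since it also catches nodes with a tiny fraction of votes.

```bash
# Sketch: compare an epoch's votes against a slashing threshold.
# VOTES and EXPECTED are hypothetical inputs; the official accounting may differ.
# The default threshold of 50 mirrors the "below 50%" rule; a threshold of 1
# roughly models "slash only nodes that are (almost) completely offline".

# Usage: slash_check VOTES EXPECTED [THRESHOLD_PCT]
# Prints "slash" if VOTES/EXPECTED is strictly below THRESHOLD_PCT percent, else "ok".
slash_check() {
  local votes=$1 expected=$2 threshold=${3:-50}
  if [ "$expected" -le 0 ]; then
    echo "error: EXPECTED must be a positive integer" >&2
    return 2
  fi
  # Integer-only test: votes/expected < threshold/100  <=>  100*votes < threshold*expected
  if [ $((100 * votes)) -lt $((threshold * expected)) ]; then
    echo "slash"
  else
    echo "ok"
  fi
}
```

With these assumptions, `slash_check 79 100` prints `ok` (the 79% asked about earlier clears 50%), while a first committee still syncing at 40 of 100 votes prints `slash` by default but `ok` with `slash_check 40 100 1`.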

There is also an ongoing bug where the last earning epoch is incorrectly reported as 0% in the monitor, and in the next earning cycle the previously reported 0 turns into 100, so it might just be a UI issue, or the metric isn’t being sent or counted correctly.

Tonight is an example of why I don’t feel ready for slashing. One of my nodes showed as Latest in the monitor, but when I looked at the details, the Beacon was synced while all the shards had stalled. I ssh’d into the nodes, stopped the containers and ran sudo bash again. I was two images behind, and that seemed to fix things, so I went through and updated all the others. Now about half of my nodes show as offline in the monitor.

While I was typing the above, I went back to get a more exact count, and now more of the nodes seem to be reporting as online.

I’m not upset, but I am confused. Am I remediating things correctly? Am I not reading the Monitor correctly?
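Stalled shards like the ones described above can hide behind a “Latest” status, so it may help to check directly whether the node’s height is still moving. This is a sketch under a loud assumption: that `getmininginfo` returns the height as `.Result.ShardHeight` (verify against your node’s actual response first). It also needs `curl` and `jq`, and `fetch_shard_height` / `height_status` are hypothetical names.

```bash
# Sketch: tell whether a node's shard height is still advancing.
# ASSUMPTION: getmininginfo returns the height as .Result.ShardHeight; check
# your node's actual response before relying on this. Needs curl and jq.

# Print the current ShardHeight reported by the node at URL $1.
fetch_shard_height() {
  curl -sf --max-time 10 --header "Content-Type: application/json" \
    --request POST \
    --data '{"jsonrpc":"1.0","method":"getmininginfo","params":[],"id":1}' \
    "$1" | jq -r '.Result.ShardHeight'
}

# Compare two samples taken some minutes apart.
# Prints "advancing", "stalled", or "unknown" if either sample is not a number.
height_status() {
  case "$1" in ""|*[!0-9]*) echo "unknown"; return ;; esac
  case "$2" in ""|*[!0-9]*) echo "unknown"; return ;; esac
  if [ "$2" -gt "$1" ]; then
    echo "advancing"
  else
    echo "stalled"
  fi
}
```

For example, take one sample with `fetch_shard_height http://IP:RPC_PORT`, wait ten minutes (an arbitrary gap), take another, and pass both to `height_status`. It only checks the single height this RPC reports, so it complements rather than replaces the per-shard view in the monitor.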

Is this down?

Hey @Josh_Hamon…yeah, it seems the system is down…the network monitor thingie…because in both the app and the web version…no data is populating the system…hmm…hey @Support…you guys minding the store?.. :sunglasses:
