Shard stall

@duc I wiped the pNode again, and after re-syncing it's stuck at the same block. Same as @fiend138 above. Running 1.0.6 firmware.

[Screenshot of the stalled sync, 2021-05-24]

To @duc @Mike_Wagner @doc @Thriftinkid @abduraman and @Jared… I seem to have run into the same issue that many have hit with both vNodes and pNodes; as you can see, I am having the same problem on one of my pNodes. I have tried two things: I powered it off, waited for the system to acknowledge the pNode was offline, and then restarted it. The system saw it back online, but the same stall persisted on the same shard. I have also waited a day or two to see if it would resolve on its own, but it has not. My question is: what step or steps do I take to resolve this issue, this time around and any other time it might happen again?.. :thinking: :sunglasses:

Stalled again on shard 0

Adjust the picture dimensions to 500x900 and the photo will fit the screen.

There’s not much we can do here; this is a question for @duc. Are you seeing the same problem we are? I have reset multiple vNodes at this point, and I’m still experiencing stalls. It’s not old code that is the problem here. I just haven’t heard any news on this front from @Support. I hope they are working on it right now, and that’s why they aren’t answering lol.


Hello Community,

I see there is a lot of discussion regarding your nodes failing to sync.
We need more information, namely the validator process's running log, in order to debug. Here is how:

  • For pNode:

    • go to a web browser, enter http://<pnode-ip>:5000/browser, and save the 2 most recent log files plus error.log
    • we also need your pNode ID
    • more helpful commands can be found here
  • For vNode:

    • we need your run.sh script (remove sensitive info: validator key, Infura API key)
    • go to the data folder to get the log files:
khanhlh@staking-khanhle:~/incognito-mainnet-data/data1$ ls
68.162-2021-03-02.log  68.162-2021-03-03.log  68.162-2021-04-02.log  68.162-2021-05-01.log  error.log  mainnet

Please send Direct Message to @consensus with the requested info and your log attached.
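
For anyone who wants to script the vNode collection step, here is a minimal sketch: it grabs the two most recent date-stamped log files plus error.log from the data folder shown above and bundles them into a zip ready to attach. The data-folder path is just the example from the listing; adjust it for your own setup.

```python
#!/usr/bin/env python3
# Sketch: bundle the two most recent vNode log files plus error.log
# into a zip for the @consensus DM. DATA_DIR is an example path taken
# from the listing above; change it to match your own setup.
import glob
import os
import zipfile

DATA_DIR = os.path.expanduser("~/incognito-mainnet-data/data1")

# Date-stamped logs look like 68.162-2021-05-01.log; sorting by mtime
# puts the two newest files at the tail of the list.
logs = sorted(glob.glob(os.path.join(DATA_DIR, "*-20??-??-??.log")),
              key=os.path.getmtime)
to_send = logs[-2:] + [os.path.join(DATA_DIR, "error.log")]

with zipfile.ZipFile("node-logs.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in to_send:
        if os.path.exists(path):
            zf.write(path, arcname=os.path.basename(path))
            print("added", os.path.basename(path))
```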


Don’t think we can send more than 10 MB in DMs; the pNode log files are 3 GB+.

Zip 'em up and share a Google Drive link?

Yeah, that’s what I did; just pointing out the issue so it’s clear how to properly send the logs over.
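
If it helps anyone else: plain-text node logs compress very well, so gzipping before uploading to Drive can shrink a multi-GB file dramatically. A minimal sketch, where the filename is just a placeholder copied from the data-folder listing earlier in the thread:

```python
#!/usr/bin/env python3
# Sketch: gzip one large log file before uploading it to Google Drive.
# SRC is a placeholder name from the data-folder listing above.
import gzip
import shutil

SRC = "68.162-2021-05-01.log"

with open(SRC, "rb") as f_in, gzip.open(SRC + ".gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)   # streams the file, so RAM use stays small
print("wrote", SRC + ".gz")
```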


Both my pNodes have gone from stalled to syncing.


How long did you have to wait? For some reason my nodes randomly did a delete and restart and I’m stalled again on the exact same spot. This is a very frustrating issue. Currently stalled for 5 hours.

It was stalled for a few days' time. All green now, and the nodes have earned since I started the post.

They haven’t enabled slashing yet; they are still assessing the situation. So even if you are stalled, you will still earn rewards.


I can thankfully say that the stalls with my pNodes seem to have resolved themselves on their own. But I do check them daily, at least once if not a couple of times, to make sure they are up and running and connected to the network. So for now I guess I have been blessed that no serious issue has developed, though I am always on guard, that is for sure. Whatever it was that corrected itself, I hope the dev team had something to do with it in a good way… :sunglasses: :innocent:


Looks like after a little over 24 hours of stalling the shard began syncing again. Just in time too since I’m going to be in committee soon.


@Support - All my pNodes continue to stall at the same spots on Shards 0, 2, and 6. Seems like it’s just the Infura API issue, but how does that get resolved on pNodes?

@Support… I am having the same issue with my pNodes: they are reporting as being in a stalled state. I have 3 pNodes, and they do not always report being stalled at the same time; sometimes it will be 2 of them, sometimes just one, and rarely all 3. The interesting part is that most of the time the stall resolves itself on its own. So I have been lucky so far, but my concern is that once slashing is implemented, a pNode with a stalled sync status will be dropped from the network.

One additional thing: recently one of my pNodes went through an earning cycle, but when I checked its status in the Network Monitor, the pNode had earned yet shows 0% under the vote count for that cycle (epochs 3613 and 3612). I am wondering what that was all about, for it seems that according to the slashing protocol my pNode would be dropped due to a 0% vote count during that earning cycle…

Farrah

Validator public key: 1SZh55…tCdYYi
Status: Online
Role: Pending (Shard 1)
Next event: 81 epochs to be committee
Sync state: Latest

Shard     Block Height   Last Insert
Beacon    1267254        a few seconds ago (syncing)
Shard 0   1              not syncing
Shard 1   1269388        a few seconds ago (syncing)
Shard 2   132302         not syncing
Shard 3   69302          not syncing
Shard 4   77402          not syncing
Shard 5   144902         not syncing
Shard 6   1              not syncing
Shard 7   1              not syncing

Epoch   Chain Id   Reward        Vote Count (%)
3613    4          9.937195258   0
3612    4          9.937195267   0
3597    3          9.93719555    100
3596    3          9.937195569   99

Any enlightenment would be appreciated… :sunglasses:

Hi, please send us the log files from the day you had this problem, along with the result of the chain-info command:

go to a web browser and enter:

http://<pnode-ip-address>:5000/browser
http://<pnode-ip-address>:5000/chain-info

More useful commands can be found here: Update physical node firmware
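
In case it is useful while debugging, here is a small sketch that samples the chain-info endpoint mentioned above a few times and prints whatever comes back, so you can see whether the reported heights are still advancing. The IP is a placeholder, and I'm assuming the endpoint returns printable text; check the actual response on your own node.

```python
#!/usr/bin/env python3
# Sketch: sample a pNode's /chain-info endpoint a few times, one minute
# apart, to see whether reported block heights are still moving.
# PNODE_IP is a placeholder for your node's LAN address.
import time
import urllib.request

PNODE_IP = "192.168.1.10"
URL = f"http://{PNODE_IP}:5000/chain-info"

for i in range(3):
    with urllib.request.urlopen(URL, timeout=10) as resp:
        print(f"--- sample {i + 1} ---")
        print(resp.read().decode(errors="replace"))
    if i < 2:
        time.sleep(60)
```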

Hi @hyng… apologies for the delayed response to your post, and thank you for your reply. As to the issue I ran into with the one pNode: first of all, all 3 pNodes had shard stall issues at some point, but the stalls seem to resolve themselves given enough time. All 3 pNodes are showing proper syncing, and none of them are reporting any stall issues at this time, so that is a good thing.

As to the vote % count on epochs 3612 and 3613 for the one pNode, it still shows 0 as the vote count, but I do remember that the pNode still earned. I just can't recall on what date/day epochs 3612 and 3613 ran, so I was unable to get the logs. If this issue arises again, I will make a point to grab the logs and any relevant data you might need, now that I know what you would be requesting.

So for now I guess we count our blessings and can consider the matter closed, being that all 3 of my pNodes are running just fine at this time. I do check on them daily, so if anything arises I will reach out to you or Support. Hope all is well with you and the dev team. So far so good as to how the dev team has been handling the project; please give them all my regards, and a special shout-out to @anho. Tell him the Network Monitor has been working correctly for me ever since he assisted me with it a few weeks back… :sunglasses:

As of this week, I can no longer access my pNodes via those commands
(i.e. http://192.168.1.XX:5000/browser).

Has something changed?