@duc I wiped the pNode again and after re-syncing it’s stuck again at the same block. Same as @fiend138 above. Running 1.0.6 firmware.
Shard stall
To @duc @Mike_Wagner @doc @Thriftinkid @abduraman and @Jared: I seem to have run into the same issue that many have hit with both vNodes and pNodes, and as you can see it is now affecting one of my pNodes. I have tried two things. First, I powered it off, waited for the system to acknowledge the pNode was offline, then restarted it; the system saw it come back online, but the stall on the same shard remained. Second, I waited a day or two to see if it would resolve on its own, but it has not. My question is: what step or steps do I take to resolve this issue, this time and any other time it might happen again?
Adjust the picture parameters to 500x900 and the photo will fit screen.
There’s not much we can do here. This is a question for @duc. Are you seeing the same problem we are? I have reset multiple vNodes at this point, and I’m still experiencing stalls. It’s not old code that is the problem here. I just haven’t heard any news on this front from @Support. I hope they are working on it right now, and that’s why they aren’t answering lol.
Hello Community,
I see there is a lot of discussion regarding your nodes failing to sync.
We need more information, the validator process’s running log, in order to debug. Here is how:
For pNode:
- go to web browser, enter http://<pnode-ip>:5000/browser, and save the 2 most recent log files plus error.log
- we also need your pNode ID
- more helpful commands can be found here
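The log-fetching step above can also be scripted instead of clicking through the browser. This is a sketch only: the `/browser/<filename>` URL layout and the example IP are assumptions, so open the `/browser` page first and use the file names and paths it actually lists.

```shell
# Sketch: download log files from a pNode's port-5000 file browser.
# The /browser/<filename> path layout is an assumption; verify it against
# what your node's /browser page actually serves.
fetch_logs() {
  local base="$1"; shift     # e.g. http://192.168.1.50:5000/browser
  mkdir -p pnode-logs
  local f
  for f in "$@"; do
    # -f: fail on HTTP errors; -sS: quiet, but still print errors
    curl -fsS -o "pnode-logs/${f}" "${base}/${f}"
  done
}

# Usage (filenames come from your node's /browser listing):
# fetch_logs "http://192.168.1.50:5000/browser" error.log <recent-log-file>
```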
For vNode:
- we need your run.sh script (remove sensitive info: validator key, Infura API key)
- go to the data folder to get the log files, e.g.:
```
khanhlh@staking-khanhle:~/incognito-mainnet-data/data1$ ls
68.162-2021-03-02.log  68.162-2021-03-03.log  68.162-2021-04-02.log  68.162-2021-05-01.log  error.log  mainnet
```
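The two vNode steps above (scrubbing run.sh and collecting the latest logs) can be sketched as a small shell helper. The variable names being redacted (VALIDATOR_KEY, PRIVATE_KEY, INFURA_*) are assumptions; check your own run.sh for what actually needs scrubbing before sharing it.

```shell
# Sketch: bundle what support asks for from a vNode, with secrets redacted.
# The redacted variable names are assumptions; adjust to your run.sh.
bundle_vnode_logs() {
  local data_dir="$1" run_sh="$2" out="$3"
  # Scrub anything that looks like a key assignment before sharing run.sh:
  sed -E 's/(VALIDATOR_KEY|PRIVATE_KEY|INFURA[A-Za-z_]*)=[^ ]*/\1=REDACTED/g' \
      "$run_sh" > run.redacted.sh
  # Two most recent dated daily logs, plus error.log:
  local recent
  recent=$(ls -1t "$data_dir"/*-*.log | head -n 2)
  tar -czf "$out" run.redacted.sh $recent "$data_dir/error.log"
}

# Usage:
# bundle_vnode_logs ~/incognito-mainnet-data/data1 ~/run.sh vnode-logs.tar.gz
```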
Please send Direct Message to @consensus with the requested info and your log attached.
I don’t think we can send more than 10 MB in DMs; the pNode log files are 3 GB+.
Zip em up into a google link?
Yeah, that’s what I did; just pointing out the issue so it’s clear how to properly send the logs over.
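For logs that blow past the DM size limit, one way to package them is to compress and split into chunks; this sketch assumes GNU gzip and split are available, and uses 9 MB chunks to stay under the ~10 MB limit mentioned above.

```shell
# Sketch: compress a large log and split it into chunks small enough to
# attach or upload piecewise (9 MB stays under a ~10 MB limit).
split_log() {
  local log="$1"
  gzip -k "$log"                           # -k keeps the original file
  split -b 9m -d "${log}.gz" "${log}.gz.part-"
}

# Reassemble on the receiving side:
# cat mylog.log.gz.part-* > mylog.log.gz && gunzip mylog.log.gz
```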
Both my pNodes have gone from stalled to syncing.
How long did you have to wait? For some reason my nodes randomly did a delete and restart and I’m stalled again on the exact same spot. This is a very frustrating issue. Currently stalled for 5 hours.
It was stalled for a few days. All green now, and the nodes have earned since I started the post.
They haven’t enabled slashing yet. They are still assessing the situation. So, even if you are stalled, you will still earn rewards
Thankfully, the stalls with my pNodes seem to have resolved on their own. I still check them at least once a day, sometimes more, to make sure they are up, running, and connected to the network. So for now I guess I have been lucky that no serious issue has developed, but I am always on guard, that is for sure. Whatever it was that corrected itself, I hope the dev team had something to do with it in a good way.
Looks like after a little over 24 hours of stalling the shard began syncing again. Just in time too since I’m going to be in committee soon.
@Support - All my pnodes continue to stall at the same spots on Shards 0, 2, and 6. Seems like it’s just the infura API issue, but how does that get resolved on pnodes?
@Support, I am having the same issue with my pNodes: they are reporting as being in a stalled state. I have 3 pNodes, and they do not always report a stalled state at the same time; sometimes it is 2 of them, sometimes just one, and rarely all 3. The interesting part is that most of the time the stall resolves itself on its own, so I have been lucky so far. My concern is that once slashing is implemented, a pNode with a stalled sync status will be dropped from the network. One additional thing: recently one of my pNodes went through an earning cycle, but when I checked its status in the Network Monitor, the pNode did earn yet shows 0% under the vote count for that cycle (epochs 3613 and 3612). I am wondering what that was about, since according to the slashing protocol my pNode would be dropped for a 0% vote count during an earning cycle.
Farrah
- Validator public key: 1SZh55…tCdYYi
- Status: Online
- Role: Pending (Shard 1)
- Next event: 81 epochs to committee
- Sync state: Latest
| Shard | Block Height | Last Insert |
|---|---|---|
| Beacon | 1267254 | a few seconds ago (syncing) |
| Shard 0 | 1 | not syncing |
| Shard 1 | 1269388 | a few seconds ago (syncing) |
| Shard 2 | 132302 | not syncing |
| Shard 3 | 69302 | not syncing |
| Shard 4 | 77402 | not syncing |
| Shard 5 | 144902 | not syncing |
| Shard 6 | 1 | not syncing |
| Shard 7 | 1 | not syncing |
| Epoch | Chain Id | Reward | Vote Count (%) |
|---|---|---|---|
| 3613 | 4 | 9.937195258 | 0 |
| 3612 | 4 | 9.937195267 | 0 |
| 3597 | 3 | 9.93719555 | 100 |
| 3596 | 3 | 9.937195569 | 99 |
Any enlightenment would be appreciated…
Hi, please send us the log files from the day you had this problem and the result of the chain-info command:
- go to web browser, enter http://<pnode-ip-address>:5000/browser
- for chain-info, enter http://<pnode-ip-address>:5000/chain-info
- more useful commands can be found here: Update physical node firmware
Hi @hyng, apologies for the delayed response to your post, and thank you for your reply. As to the issue with the one pNode: all 3 pNodes had shard stall issues at some point, but the stalls seem to resolve themselves given enough time. All 3 are now syncing properly and none are reporting a stall at this time, so that is a good thing.

As to the vote count on epochs 3612 and 3613 for the one pNode, it still shows 0%, but I do remember the pNode earned anyway. I just can’t recall what date those two epochs ran on, so I was unable to get the logs. If this issue arises again, I will make a point of grabbing the logs and any relevant data, now that I know what you will be requesting. For now, since all 3 pNodes are running fine, I guess we can count our blessings and consider the matter closed. I do check on them daily, so if anything arises I will reach out to you or support.

Hope all is well with you and the dev team; so far so good on how the team has been handling the project. Please give them all my regards, and a special shout-out to @anho: tell him the Network Monitor has been working correctly for me ever since he assisted me with it a few weeks back.
As of this week, I can no longer access my pnodes via those commands.
(i.e. http://192.168.1.XX:5000/browser)
Has something changed?