This is the second time my pNode has been picked, and the status is shard stall.
Why is this happening?
One of my pNodes recently stalled in the SAME shard on the SAME block.
It may be a bad block. I found this in the error log using the /browser endpoint when searching for the stalled block number.
2021-05-17 00:00:38.010 utils.go:111 [ERR] Syncker log : Insert block 909063 hash 128819200c8c56557aaa1fc13d6af3111871ef9a6e044f01db27734233a009c8 got error -1051: Instruction Hash Error Expect instruction hash to be 711455acaa89a9f8d3baa2e586dfd29b824baaa5197c26843c43338c36e62a44 but get 0000000000000000000000000000000000000000000000000000000000000000 at block 909063 hash 128819200c8c56557aaa1fc13d6af3111871ef9a6e044f01db27734233a009c8 Instruction Hash Error
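For anyone comparing logs across Nodes, here is a minimal sketch that pulls the block number, error code, and the two instruction hashes out of a line like the one above. The regular expression is written against that single quoted line only; the function and field names are made up for illustration, and other error types may well be formatted differently.

```python
import re

# Pattern derived from the one log line quoted in this thread (an assumption:
# other "Syncker log" errors may not follow the same layout).
LOG_PATTERN = re.compile(
    r"Insert block (?P<block>\d+) hash (?P<block_hash>[0-9a-f]+) "
    r"got error (?P<code>-?\d+): .*?"
    r"Expect instruction hash to be (?P<expected>[0-9a-f]+) "
    r"but get (?P<actual>[0-9a-f]+)"
)


def parse_instruction_hash_error(line):
    """Return the fields of an 'Instruction Hash Error' line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["block"] = int(fields["block"])
    fields["code"] = int(fields["code"])
    # Flag the case seen above, where the received hash is all zeros.
    fields["actual_is_zero"] = set(fields["actual"]) == {"0"}
    return fields


# The log line from the post above, split only to keep the source readable.
SAMPLE = (
    "2021-05-17 00:00:38.010 utils.go:111 [ERR] Syncker log : "
    "Insert block 909063 hash "
    "128819200c8c56557aaa1fc13d6af3111871ef9a6e044f01db27734233a009c8 "
    "got error -1051: Instruction Hash Error "
    "Expect instruction hash to be "
    "711455acaa89a9f8d3baa2e586dfd29b824baaa5197c26843c43338c36e62a44 "
    "but get " + "0" * 64
)

if __name__ == "__main__":
    print(parse_instruction_hash_error(SAMPLE))
```

If two unrelated Nodes return the same block number and the same expected hash, that supports the idea that the problem is in the data rather than the hardware.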
This happened after the same pNode successfully synced Shard 4 for 100% Vote Percentage in committee. I think the cause is unlikely to be a hardware issue with a pNode and unlikely to be an issue with a pNode’s internet link. The common factor now appears to be an issue with the circulating blockchain data, since it’s now reproducible on another unrelated pNode.
That same pNode has moved on to stalling on Shard 0 now.
Good times. /s
Ugh, the status now shows four hours of stalling. It would be nice if someone could explain this to us. @Support?
In about twelve minutes, you’ll only have 8 more epochs of stalling to wait. Almost another day and a half of no progress before your pNode finally begins voting in committee 6.
@Support, Shard 2 now shows stalling as well.
Heh – same block my Node stalled on when it was in Shard 2 recently.
So let’s see –
Shards 1, 4 and 5 have synced successfully (at least for me); Shards 4 and 5 achieving 100% Vote Stats per the Node Monitor tool.
- Shard 2 appears to get stuck syncing block 433077, as seen on at least two separate Nodes. (Mike_Wagner, Thriftinkid)
- Shard 6 appears to get stuck syncing block 909062, as seen on at least two separate Nodes. (Mike_Wagner, Silvercap718)
Anyone have a Node in Shard 0 also stuck on block 169583? Asking for a friend.
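To keep track as more reports come in, here is a rough sketch that groups the stalls listed above by shard and block, then picks out the blocks seen on more than one operator's Node. The data is just what has been posted in this thread; the Shard 0 entry is attributed to Mike_Wagner as an assumption, since the post does not say so outright, and the function names are made up for illustration.

```python
from collections import defaultdict

# (shard, block, node owner) as reported in this thread so far.
REPORTS = [
    (2, 433077, "Mike_Wagner"),
    (2, 433077, "Thriftinkid"),
    (6, 909062, "Mike_Wagner"),
    (6, 909062, "Silvercap718"),
    (0, 169583, "Mike_Wagner"),  # assumed owner; only one report so far
]


def stalls_by_block(reports):
    """Map each (shard, block) pair to the set of owners who reported it."""
    grouped = defaultdict(set)
    for shard, block, owner in reports:
        grouped[(shard, block)].add(owner)
    return grouped


def reproduced_stalls(reports, min_owners=2):
    """Return (shard, block) pairs reported by at least `min_owners` owners."""
    grouped = stalls_by_block(reports)
    return sorted(key for key, owners in grouped.items() if len(owners) >= min_owners)


if __name__ == "__main__":
    for shard, block in reproduced_stalls(REPORTS):
        print(f"Shard {shard}: block {block} reported by multiple owners")
```

A stall that shows up under two different owners is harder to blame on one Node's hardware or internet link, which is why the Shard 0 block still needs a second report.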
Just popped pending on Shard 5 @Mike_Wagner. We will see if it gets stuck
I fully synced and subsequently voted 100% on Shard 5.
May the winds of all that blockchain data ever be in your favor.
The winds blew, but not in my favor. I just have the stall on Shard 6 message now. I have had the pending notification since 2 AM. I guess I get screwed on earning anything again.
Not so. The team is working with the community to resolve these sorts of issues. So no slashing (yet). The “old” rules are still in effect, and your Node earns in committee regardless of voting, as has been the case since the genesis block.
Thus that Node will earn in Epoch 3476. Epoch 3476 will be active in just under 1 day. In fact, it will be the active Epoch this time tomorrow.
My pNode was stalling as well, and it’s very likely because it picked up the new docker tag which doesn’t like the old beacon/shard data (pre staking v2). Visiting:
http://[pNode IP]:5000/restart-node?delete-data=1&qrcode=[code from bottom of pNode]
cleared the old data and it’s now re-syncing with the new v2 flow version.
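If you have more than one pNode, the link above can be assembled with a few lines of Python instead of being edited by hand. This is only a sketch: the port, path, and query parameters are copied from the URL in this post, while the function names and the example IP and code are made up. It also assumes a plain GET request is enough, since opening the link in a browser worked.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_restart_url(pnode_ip, qrcode):
    """Build the restart-and-delete-data URL quoted in the post above."""
    # Parameter names and port 5000 are taken from that URL as-is.
    query = urlencode({"delete-data": "1", "qrcode": qrcode})
    return f"http://{pnode_ip}:5000/restart-node?{query}"


def restart_and_wipe(pnode_ip, qrcode, timeout=30):
    """Request the URL. WARNING: this clears the pNode's synced chain data."""
    # Assumption: a plain GET is sufficient, as when visiting in a browser.
    with urlopen(build_restart_url(pnode_ip, qrcode), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values; substitute your pNode's IP and the code printed
    # on the bottom of the device.
    print(build_restart_url("192.168.1.50", "ABC123"))
```

Printing the URL first and checking it before calling `restart_and_wipe` is the safer order, because the re-sync afterwards takes hours.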
The same thing happened yesterday for vNodes (see “[Solved] Shards sync stalling for multiple nodes”): updating to the new code broke shard syncing, so I had to clear everything and start the beacon/shard sync again. All good now.
Thanks for answering this, @adrian. I guess all the problems may be solved by your instructions.
For those who don’t know yet, please have a look at the topic, especially @Devenus’s comment. The new firmware for pNode supports some functions that may help node operators manage their nodes more easily. Thanks!
@duc, my vNodes are run by jservers. They updated their code this weekend, so I’m not sure where the shard stall is coming from.
If you are on a vNode, it is even easier. Whether you run it with Docker or build from source, please make sure your node is running the latest code as described in the topic (I guess you’ve looked into it already).
Secondly, also make sure you’ve cleared the node’s data and re-synced from scratch.
The shard stall happens because the node ran the old code and accidentally produced data that is not compatible with the Incognito chain’s data. That’s why I recommend that node owners who have sync issues clear their data and then re-sync with the latest code.
This was totally my fault for not informing the community in a timely manner about such a big upgrade. That’s a lesson learned for me, sorry.
I’m optimistic. Even though my node with the shard 0 stall just finished a fresh sync this weekend, I went ahead and deleted the shard data (again) and it’s currently syncing from scratch (again). Not quite halfway through the beacon block data, after ~4.5 hours. So still have some time before I’ll know if it can sync past block 169,583 on shard 0.
In the meantime, this did give me the opportunity to do a fun write-up about the sync behavior of Nodes.
Just read your topic, and it is really awesome. To be honest, the core dev team has been lacking such a neat explanation for the community. Our posts are probably quite technical, aren’t they?
As a pNode owner I have no ability to check code; I plug and play. What are my options for dealing with the stalled status? It has stalled the last two times it made committee. I am running the latest version on both pNodes, so I don’t know why I am stalled.
So, in viewing thriftinkid’s nodes, we do not see any external issue from the docker commands. If we don’t have the public validator key (BLS), then we are not able to see this stall issue. How can the stall issue be identified without going into the monitoring tool?
The next question: if the node is pending but stalled, will it fix the stall and go into committee, or just stay stalled?
Which then brings up another question: if it’s stalled in the pending state, and the vNode is stopped, its data cleared, and restarted, will that cause it to fall out of pending?