Shard stall

The Committees page can be hard to understand.

Let’s break it down.

COMMITTEE LISTS

At the top of the page are lists for the Beacon Committee and each of the 8 Shards (0-7).

The Beacon Committee is currently run by the Incognito Team. The seven beacon nodes are fixed and do not change.

Next are the Committee Lists: Shards 0-7, with 32 validators per shard.

Indexes 1-22 of each shard are validators currently run by the Incognito Team: 22 × 8 = 176 fixed validators. These 176 validators (the first 22 per shard) do not change.

Indexes 23-32 of each shard are the 10 community validators. These are swapped out 5 at a time each epoch. The nodes in index slots 23-27 exit the committee at the end of the epoch (the beginning of the next epoch). The nodes in index slots 28-32 then move up to slots 23-27. Five new nodes from the top of the corresponding shard’s pending list move into the committee list at index slots 28-32, in the same order as they appear in the pending list.

The previous process swapped 4 nodes each epoch; otherwise it was the same as the current process.
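A minimal Python sketch of the per-epoch swap described above, for a single shard. The slot counts (22 fixed, 10 community, 8 shards) come from this post; the function name `swap_epoch` and the list/deque representation are hypothetical illustrations, not Incognito’s actual code. Passing `swap_size=4` models the previous process.

```python
from collections import deque

FIXED_SLOTS = 22      # indexes 1-22: team-run validators, never swapped
COMMUNITY_SLOTS = 10  # indexes 23-32: community validators
SHARDS = 8


def swap_epoch(committee, pending, swap_size=5):
    """Apply one epoch's swap to one shard's committee list (hypothetical model).

    committee: list of 32 validator IDs, index 0 == slot 1.
    pending:   deque of validator IDs, leftmost == top of the shard's pending list.
               Assumed to hold at least `swap_size` validators.
    Returns (new_committee, exited_validators).
    """
    fixed = committee[:FIXED_SLOTS]
    community = committee[FIXED_SLOTS:]
    exited = community[:swap_size]      # slots 23-27 leave the committee
    remaining = community[swap_size:]   # slots 28-32 move up to slots 23-27
    # The top `swap_size` pending validators fill the last slots, keeping their order.
    incoming = [pending.popleft() for _ in range(swap_size)]
    return fixed + remaining + incoming, exited


committee = [f"team-{i}" for i in range(1, 23)] + [f"c-{i}" for i in range(23, 33)]
pending = deque(f"p-{i}" for i in range(1, 8))
new_committee, exited = swap_epoch(committee, pending)

print(exited)               # ['c-23', 'c-24', 'c-25', 'c-26', 'c-27']
print(new_committee[22:])   # ['c-28', 'c-29', 'c-30', 'c-31', 'c-32', 'p-1', 'p-2', 'p-3', 'p-4', 'p-5']
print(FIXED_SLOTS * SHARDS)  # 176
```

Using a `deque` for the pending list reflects that validators only ever leave from the top and join at the bottom.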

THE WAITING LIST

This is the big list of validators that have not been selected for committee. As the header reads: TO BE SELECTED AT RANDOM. Nodes were indeed previously selected at random, but they no longer are.

Nodes are moved to a shard pending list directly from the top of the list.
This is different from the previous process.

The order of this list is not changing.
This is different from the previous process.

Nodes exiting the committee should join this list, but this is no longer happening.
This is different from the previous process.
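A minimal Python sketch of the waiting-list behaviour reported above, under stated assumptions: the list is never shuffled, candidates always come off the top, the shard is the only random choice, and the assigned validator joins the bottom of that shard’s pending list. The function name and the per-epoch count of 50 are hypothetical (50 is inferred from the position 51 to position 1 example later in this post, not a documented constant).

```python
import random
from collections import deque

SHARDS = 8
PER_EPOCH = 50  # hypothetical: inferred from the 51 -> 1 example, not a documented value


def assign_from_waiting(waiting, pending_by_shard, count, rng):
    """Move up to `count` validators from the top of the waiting list to shard pending lists."""
    assigned = []
    for _ in range(min(count, len(waiting))):
        validator = waiting.popleft()              # always the top: the list is never shuffled
        shard = rng.randrange(SHARDS)              # the shard is the only random choice
        pending_by_shard[shard].append(validator)  # joins the bottom of that shard's pending list
        assigned.append((validator, shard))
    return assigned


waiting = deque(f"w-{i}" for i in range(1, 101))
pending_by_shard = [deque() for _ in range(SHARDS)]
assigned = assign_from_waiting(waiting, pending_by_shard, PER_EPOCH, random.Random(7))

# Whatever the seed, the same 50 validators (w-1 to w-50) are chosen; only their shards vary.
print(assigned[0][0], waiting[0])  # w-1 w-51
```

Changing the seed changes only the shard column of `assigned`, which is the point made above: *who* is selected next is fully predictable from the list order.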

THE PENDING LIST

This is the list of all the validators plucked from the waiting list above, in preparation to join the committee lists at the top of the page. While in this list, a node syncs the assigned shard’s chain. This period used to be much shorter. It is now growing.

The order of this list does not change from epoch to epoch. Validators are pulled from the top of each shard’s list to join committee. Newly assigned validators from the waiting list join the assigned shard’s pending list at the bottom.

Other than the (still) increasing size of each shard’s list, this is the same as the previous process.


Previously, it was not possible to know which nodes would become validators, except for the short notice given when a validator’s role changed to “pending”. That short window has since grown to 6, 14, 23, and more epochs.

It is now even possible to know when a validator will be assigned “pending”, n epochs ahead of that growing window. This opens the network and validators up to potential exploitation.

This validator, currently at position 51 in the Waiting List for Epoch 3480, will be assigned to a shard pending list in Epoch 3482, not in Epoch 3481. In Epoch 3481 (the next epoch), it will move up to position 1. (This will happen around block 175 of Epoch 3481, ±25 blocks.)
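The forecast above is simple arithmetic. A sketch, assuming a fixed number k of validators leaves the top of the Waiting List each epoch and nobody ahead of the validator unstakes: a validator at position p in epoch e is assigned in epoch e + ⌈p/k⌉. With p = 51 and k = 50 (a hypothetical value inferred from the move from position 51 to position 1 described above), that gives 3480 + 2 = 3482. The function names are illustrative only.

```python
def forecast_assignment_epoch(epoch, position, per_epoch):
    """Epoch in which a waiting-list validator at `position` (1 = top) is moved to a shard.

    Hypothetical model: exactly `per_epoch` validators leave the top each epoch,
    the current epoch's round has already happened, and nobody ahead unstakes.
    """
    rounds = (position + per_epoch - 1) // per_epoch  # ceil(position / per_epoch), integer-only
    return epoch + rounds


def position_next_epoch(position, per_epoch):
    """Waiting-list position after one more assignment round (<= 0 means already assigned)."""
    return position - per_epoch


print(forecast_assignment_epoch(3480, 51, 50))  # 3482
print(position_next_epoch(51, 50))              # 1
```

If validators ahead of it unstake, the real assignment can only come sooner, so under these assumptions the result is a latest-case estimate.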

That kind of forecasting for “pending” was not previously possible. It is now, and it can potentially be exploited to attack the network. The only remaining randomness is which shard a validator will be assigned to. The “when” of candidate selection is no longer a random unknown, at least not while the Waiting List is fixed and unshuffled.

Thank you @Mike_Wagner…once again…WOW!!!.. :astonished: :open_mouth:

My node’s status is pending, and it is stalling on the 2nd shard.

I don’t have the ability to access the pNode, as it is just plug and play. I appreciate everyone’s suggestions, but they are useless to me when all I have is a pNode on my Internet connection.

My pNode is stalling. I updated it to the latest version.

Similarly, my vNode has been stalled on Shard 0. My vNodes in the other shards are OK. I deleted everything for the stalled node. I think there is another problem with Shard 0.

Well, now we have a new problem, @Support. My node earned rewards in the last two epochs, and I even got notifications from the app. But my node shows a zero balance. The node also went back to pending instead of waiting.

I made an RPC call to double-check that it wasn’t just the app, but my balance shows zero there too.

Screenshot_20210520-103058_Incognito Wallet
Screenshot_20210520-103304_Incognito Wallet

My pNode has now stalled on Shard 6. It is running the latest firmware, and I already wiped the data 2 days ago.

Screenshot 2021-05-20 at 10.17.06 pm

Hey @Thriftinkid

Once we have more slots in a committee by releasing fixed nodes, community nodes will benefit by being able to earn more rewards in general. But I think we still need to keep the number of swapped nodes as small as it is now. The rationale is mostly network stability: if we swap in a large number of unhealthy nodes at once, the probability of insufficient votes for a newly proposed block increases, which might take shards (and therefore the network) down until these bad nodes get slashed. That’s not even considering network security.
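To make the stability argument concrete, here is a toy binomial model, offered as an illustration rather than Incognito’s actual consensus logic. Assumptions (all hypothetical): a block needs strictly more than 2/3 of the committee’s votes (a common BFT-style rule), every incumbent member is healthy, and each newly swapped-in node is unhealthy independently with probability p. Under these assumptions, a 32-seat committee only loses its supermajority once at least 11 unhealthy nodes arrive together, so a 5-node swap cannot cause it while a 16-node swap can.

```python
from math import comb


def insufficient_vote_probability(committee_size, swap_in, p_unhealthy):
    """P(new nodes alone block a vote) in a toy model (hypothetical assumptions:
    > 2/3 votes required, incumbents healthy, new nodes independently unhealthy)."""
    # Votes fail when healthy <= 2n/3, i.e. when unhealthy >= ceil(n / 3).
    threshold = -(-committee_size // 3)  # ceil(n / 3) in integer arithmetic
    return float(sum(
        comb(swap_in, u) * p_unhealthy**u * (1 - p_unhealthy) ** (swap_in - u)
        for u in range(threshold, swap_in + 1)
    ))


for swap_in in (5, 10, 16):
    # p_unhealthy = 0.5 is an arbitrary illustrative value, not a measured rate.
    print(swap_in, round(insufficient_vote_probability(32, swap_in, 0.5), 4))
# 5 0.0
# 10 0.0
# 16 0.1051
```

The numbers themselves mean little; what the model shows is that the risk is zero below a threshold and grows with the swap size above it.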

In my opinion, staking to the network and earning rewards from it is a long-term investment. At the end of the day, the ROI should be the same for either a big or a small swapping number. That is, with a small swapping number, a node has to wait longer to get into a committee but will also stay there longer and earn more rewards. Conversely, with a big swapping number, the node may get into a committee sooner but will leave it sooner and earn less in an earning cycle. Please note that in Incognito, when running long enough on a stable set of nodes, the total rewards of each node should be approximately the same regardless of the swapping number, due to the normal distribution of the randomness.
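A small simulation can illustrate why the swap size should not change long-run earnings. It is a sketch under hypothetical assumptions: a fixed set of community nodes rotates in a strict FIFO order, exiting members rejoin the back of the queue, and rewards are proportional to epochs spent in committee (randomness, slashing, and shard assignment are ignored). The 10 community slots come from this thread; 40 nodes and 400 epochs are arbitrary. Each node’s share works out to slots ÷ nodes, whatever the swap size.

```python
from collections import Counter, deque
from itertools import islice


def committee_share(num_nodes, community_slots, swap_size, epochs):
    """Fraction of epochs each node spends in committee under a FIFO rotation (hypothetical).

    The first `community_slots` nodes in the queue form the committee. Each epoch the
    `swap_size` longest-serving members move to the back, and the next nodes join.
    """
    queue = deque(range(num_nodes))
    served = Counter()
    for _ in range(epochs):
        for node in islice(queue, community_slots):
            served[node] += 1
        queue.rotate(-swap_size)  # the first swap_size nodes move to the back of the queue
    return {node: served[node] / epochs for node in range(num_nodes)}


for swap in (1, 2, 5, 10):
    shares = committee_share(num_nodes=40, community_slots=10, swap_size=swap, epochs=400)
    stint = 10 // swap  # consecutive epochs per committee stint
    print(swap, stint, min(shares.values()), max(shares.values()))
# 1 10 0.25 0.25
# 2 5 0.25 0.25
# 5 2 0.25 0.25
# 10 1 0.25 0.25
```

Smaller swaps mean longer stints and longer waits; larger swaps mean shorter stints and shorter waits. The ratio between them stays at 10/40 in this model.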

I guess that after slashing has been enabled for a period of time, the number of nodes in the network will decrease significantly, but the remaining nodes will contribute meaningfully to the network. So the operational nodes may earn more then.

Also, @khanhj, please read through the problems posted in this topic and prioritize answering and supporting these cases. Thanks.

I just hit the same problem on the same shard and at the same block. I’m going to attempt to delete all data and resync. I’m hoping these issues get resolved soon; this is my second resync in a week. I’ll reply if it fixes the issue.

I believe my nodes fixed their own shard stall. I don’t believe it’s needed to delete all data.

How long did you have to wait? Mine was stalled for ~14 hours before I did the delete.

I’m not sure. I just know I had a bunch stalled and then checked later and they had fixed themselves.

My pNode is syncing Beacon but stalling on Shard 3…

I can “delete the data” again if you want, but from what I am seeing in the comments, that may not be the fix.

It would be beneficial to have an in-app announcement whenever there is a firmware update for pNode users, or to tag everyone in this post who has reported this issue (IMHO).

I was able to move past the bad block this time. I figured a delete/restart was the best way, since I was ~40 epochs out from committee again, so there was plenty of time.

I wanted to give an update: I have marked this as partially resolved. I talked with @khanhj, and we tracked down the missing funds. They never appeared in my account, but they showed up through an RPC. I also had to use an RPC to get my rewards. But they were, in fact, there; I just couldn’t see them. I ran into the same issue with a second node this morning and resolved it the same way. However, this is still a problem if rewards aren’t showing up in my account within the app on their own, without my having to write code to get them.

On a different note, has anyone heard any update on nodes going back to pending straight from committee? Pretty much all my nodes are doing that now, and I see the pending list has grown. All shards have between 250 and 310 pending nodes.

@duc I wiped the pNode again, and after re-syncing it’s stuck at the same block again, same as @fiend138 above. I’m running firmware 1.0.6.

Screenshot 2021-05-24 at 7.42.18 pm

To @duc @Mike_Wagner @doc @Thriftinkid @abduraman and to @Jared… I seem to have run into the same issue that many others have hit with both vNodes and pNodes: as you can see, one of my pNodes is stalled. I have done two things. First, I turned it off, waited for the system to acknowledge that the pNode was offline, and restarted it; the system saw it as back online, but it was still stalled on the same shard. Second, I waited a day or two to see if it would resolve on its own, but it has not. My question is: what step or steps do I take to resolve this issue this time around, and any other time it might happen again? :thinking: :sunglasses:

Stalled again on shard 0

Adjust the picture parameters to 500x900 and the photo will fit the screen.