Updates pushed to the network are breaking synced data more and more often. Some arrive as Docker updates; others come from other changes, such as the one in the past 12 hours. Each one forces a validator to resync the beacon data first and then its assigned shard chain.
Neither of these forced resyncs can be completed in a single epoch. An update in the last 12 hours has again forced all validators to resync their Nodes. While the resync happens automatically, it has a very real negative effect: community Nodes that are next in line for committee at the time of the update, or during the following 6-12 epochs, get slashed. With their beacon and shard block heights reset to 0, these Nodes are unable to achieve voting consensus and, per the network slashing protocol, are slashed at the end of committee.
With slashing active and the release of fixed nodes (decentralization) finally nearing, the network runs a very real risk of consensus failure. A breaking update (forced resync) will no longer affect only 10 Nodes per shard; it will affect all 32 Nodes per shard. With fewer than 22 Nodes per shard at the latest beacon and shard block heights (1,530,000+), vote consensus will not be reached. Furthermore, the next 6-12 epochs' worth of incoming Nodes will all be in various states of resync and therefore unable to achieve consensus. Those of us who were around in early 2020 can recall what happens when not enough Nodes are available to reach vote consensus: the network stops.
While the network outage then was caused by a hosting failure that took some of the fixed nodes offline, the effect would be the same if every Node were forced to resync from 0.
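To make the numbers above concrete, here is a minimal sketch of the threshold check as I understand it. It is not the network's actual consensus code: the function name and the `required_votes` parameter are hypothetical, and the value 22 is simply the figure quoted above for a 32-Node shard committee.

```python
LATEST_HEIGHT = 1_530_000  # approximate latest block height quoted above


def can_reach_consensus(committee_heights, latest_height, required_votes=22):
    """Return True if enough committee Nodes are fully synced to vote.

    Assumption: a Node can only cast a useful vote once its block height
    has caught up with the latest height. required_votes=22 is taken from
    the figure in this post, not from the protocol source.
    """
    synced = sum(1 for height in committee_heights if height >= latest_height)
    return synced >= required_votes


# Today: 22 fixed nodes stay synced, 10 community Nodes are reset to 0.
today = [LATEST_HEIGHT] * 22 + [0] * 10

# After fixed nodes are released: a breaking update resets all 32 Nodes.
after_release = [0] * 32

print(can_reach_consensus(today, LATEST_HEIGHT))          # True
print(can_reach_consensus(after_release, LATEST_HEIGHT))  # False
```

In the first case the committee still clears the threshold, which is why forced resyncs have so far cost community Nodes a slashing rather than stopping the chain. In the second case no Node can vote, so the shard stalls until enough Nodes finish resyncing.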
Unless there are mitigations of which I’m unaware, we appear to be headed for a situation where a pushed update unintentionally shuts down the network. It would stay down until the ever-increasing average resync time for Validator Nodes has passed and a committee of fully synced Nodes can form again.
Nodes that are due to enter committee within the next 6-12 epochs at the time of a breaking update should perhaps be insulated from these forced resyncs until they have left committee. A window of 6-12 epochs (continually adjusted for the increasing blockchain size) gives the Nodes that are not immediately next enough time to resync from 0 and thus enter committee with fully synced blockchains.
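The insulation idea could be sketched roughly as follows. This is only an illustration of the proposal, not an existing feature: both function names are hypothetical, and the sync rate of 150,000 blocks per epoch is a made-up number chosen so that the window lands inside the 6-12 epoch range mentioned above.

```python
import math


def protection_window(chain_height, blocks_synced_per_epoch):
    """Estimate how many epochs a resync from 0 takes (hypothetical helper).

    The window grows with the chain, which is the "continually adjusted"
    part of the proposal.
    """
    return math.ceil(chain_height / blocks_synced_per_epoch)


def should_defer_resync(epochs_until_committee, window, in_committee=False):
    """Decide whether a Node should postpone a forced resync.

    Defer if the Node is currently in committee, or if it would enter
    committee before a resync from 0 could finish.
    """
    return in_committee or epochs_until_committee <= window


# Illustrative numbers only: 1,530,000 blocks at an assumed 150,000 per epoch.
window = protection_window(1_530_000, 150_000)

print(window)                           # 11
print(should_defer_resync(3, window))   # True: too close to committee
print(should_defer_resync(20, window))  # False: enough time to resync
```

A Node that defers would apply the breaking update and resync only after it has left committee, so it votes with its existing synced data instead of entering committee at block height 0.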
If it’s not already a priority, I suggest this issue receive immediate attention and resources. If it isn’t addressed, then besides the potential for network stoppage, the “newly” slashed Nodes will cause a spike in support requests after every one of these updates.