I’ve been running a full node (mainnet_20210106_1 branch) for the past couple of weeks and have noticed some problems. Sometimes the node, or some part of it, appears to get stuck (perhaps a shard failed to sync?). Some of the symptoms are listed below (they did not all happen at the same time, but each happened at one point or another):
- Querying the balance of an account that used to return the correct amount suddenly started to return 0.
- Querying the balance of an account returned an old value, while in the Incognito app I could see that the amount had already been updated. This was not just a temporary out-of-sync issue; it stayed that way for hours.
- Querying the pDEX exchange rate returned old rates compared to what the Incognito app showed, and they did not change for hours.
Whenever something like the above happened, I had to restart the node manually. Usually a restart eventually fixed the problem, but sometimes I had to restart several times. It has been frustrating to constantly watch out for stuck nodes. Has anybody else noticed similar problems? What can I do to debug what’s going on? Is there anything like a health-check UI (or API) that I could use?
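In the meantime, here is a minimal sketch of how the manual watching could be automated. It assumes you can periodically obtain each shard’s best block height from the local node, and optionally from a reference source such as a second node. How to fetch those heights is deliberately left out, because the exact RPC call would need to be checked against the node’s documentation. The names `ShardWatch` and `lagging_shards`, and the thresholds, are hypothetical. The idea is to flag a shard as stuck when its height has not advanced for a set time, or when it trails the reference by more than a set number of blocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShardWatch:
    """Hypothetical stall detector: remembers each shard's highest observed
    height and the time it last increased."""
    stall_seconds: float
    last_height: Dict[int, int] = field(default_factory=dict)
    last_change: Dict[int, float] = field(default_factory=dict)

    def observe(self, now: float, heights: Dict[int, int]) -> List[int]:
        """Record one poll (shard ID -> best height) taken at time `now`.
        Return the shard IDs whose height has not increased for at least
        `stall_seconds`. A height that stays the same or goes down counts
        as no progress."""
        stuck = []
        for shard, height in heights.items():
            if shard not in self.last_height or height > self.last_height[shard]:
                # First sighting or real progress: reset this shard's stall timer.
                self.last_height[shard] = height
                self.last_change[shard] = now
            elif now - self.last_change[shard] >= self.stall_seconds:
                stuck.append(shard)
        return sorted(stuck)


def lagging_shards(local: Dict[int, int], reference: Dict[int, int],
                   max_lag: int) -> List[int]:
    """Return the shard IDs where the local height trails the reference height
    by more than `max_lag` blocks. A shard missing locally is treated as height 0."""
    return sorted(s for s, ref_h in reference.items()
                  if ref_h - local.get(s, 0) > max_lag)


if __name__ == "__main__":
    watch = ShardWatch(stall_seconds=600)
    print(watch.observe(0, {0: 100, 1: 200}))    # []
    print(watch.observe(300, {0: 105, 1: 200}))  # []
    print(watch.observe(700, {0: 110, 1: 200}))  # [1]: shard 1 unchanged for 700 s
    print(lagging_shards({0: 100, 1: 50}, {0: 102, 1: 90}, max_lag=5))  # [1]
```

Either check on its own would have caught the "stale for hours" cases above. The first only needs the local node. The second also catches a shard that is still advancing but has fallen far behind.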