[Solved] vNode very high CPU/network usage post 20201008_1 version

I’ve been running a few vNodes for a while and started upgrading them from 20201008_1 to 20210106_1. I noticed that CPU and network usage increased 15-20x starting with version 20201028_1 (I upgraded one version at a time to find where it began).

I’ve compared env variables, configs, mounts, open files, TCP connections and all is the same, except obviously the updated code AND a new TCP connection which only exists on nodes starting 20201028_1:
TCP node4-878f457cb-gvl6r:9433->ns3157884.ip-51-83-237.eu:7337 (ESTABLISHED)

Not entirely sure what this is, but it looks like a highway connection, and it’s taking up most of the CPU and network bandwidth.

Usage on 20201008_1: 0.1-0.5 CPU, ~50 kbit/s up/down
Usage from 20201028_1 to 20210106_1: 1.5-2 CPU, ~3-5 Mbit/s up/down

Please see the metrics I’ve captured during yesterday’s upgrade for 2 nodes:

[Screenshot 2021-01-14 at 7.24.43 pm: CPU and network metrics for the two upgraded nodes]

Is there anything I can do to debug this further and reduce the CPU/network usage? The logs are the same as before, and nothing else changed in terms of env, config and mounts.


Hi @adrian, I'm not exactly sure what the issue is, but this might be related to when the tag update occurred. When your node updated to the new tag, it may have started syncing blocks that were missing, thus raising CPU and bandwidth use.

One of our devs also mentioned that there could be a problem with data sync when using the old tag. Could you check the latest block heights on your node and see if they are up to date? If they are, could you monitor cpu and bandwidth use and see if they are still high?

-Thanks!
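For readers who want to script the block-height check suggested above, here is a minimal Go sketch. Only the method name getblockchaininfo comes from this thread; the JSON-RPC envelope (version "1.0", empty params) and the idea of passing the node's RPC URL as the first argument are assumptions, so check them against your node's RPC documentation. The program prints the raw response rather than guessing at its structure.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// rpcRequest is a generic JSON-RPC envelope. The exact shape the node
// expects is an assumption, not taken from the thread.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// buildRequest encodes a parameterless JSON-RPC call for the given method.
func buildRequest(method string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "1.0",
		Method:  method,
		Params:  []interface{}{},
		ID:      1,
	})
}

// call posts the request body to url and returns the raw response text.
func call(url string, body []byte) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	return string(raw), err
}

func main() {
	body, err := buildRequest("getblockchaininfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode:", err)
		os.Exit(1)
	}
	if len(os.Args) < 2 {
		// No endpoint given: only show the request that would be sent.
		fmt.Println(string(body))
		return
	}
	out, err := call(os.Args[1], body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "rpc:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Running it without arguments only prints the request body; running it with the node's RPC URL as the first argument sends the request, and the heights can then be read from the printed response.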


Hey @Chucky, I’ve checked the beacon height with getblockchaininfo and it’s currently stuck on 175 for both new nodes, and I’m getting the following error:

2021-01-15 20:57:46.923 utils.go:55 [ERR] Syncker log : Insert block 176 hash [239 81 123 135 47 138 91 148 158 147 81 125 6 187 215 60 180 176 28 250 87 138 24 141 41 47 81 231 71 144 104 162] got error -1099: Process Random Instruction Error 
 strconv.Atoi: parsing "3845508393": value out of range
Process Random Instruction Error
github.com/incognitochain/incognito-chain/blockchain.NewBlockChainError
        <redacted>/incognito-chain/blockchain/error.go:365
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).processInstruction
        <redacted>/incognito-chain/blockchain/beaconprocess.go:927
github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).updateBeaconBestState
        <redacted>/incognito-chain/blockchain/beaconprocess.go:730
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).InsertBeaconBlock
        <redacted>/incognito-chain/blockchain/beaconprocess.go:168
github.com/incognitochain/incognito-chain/blockchain.(*BeaconChain).InsertBlk
        <redacted>/incognito-chain/blockchain/beaconchain.go:188
github.com/incognitochain/incognito-chain/syncker.InsertBatchBlock
        <redacted>/incognito-chain/syncker/utils.go:51
github.com/incognitochain/incognito-chain/syncker.(*BeaconSyncProcess).streamFromPeer
        <redacted>/incognito-chain/syncker/beaconsyncprocess.go:328
github.com/incognitochain/incognito-chain/syncker.(*BeaconSyncProcess).syncBeacon
        <redacted>/incognito-chain/syncker/beaconsyncprocess.go:265
runtime.goexit

I am also running Incognito on ARM chips with a custom Docker image that I built (with go1.13.15) as follows:

git clone https://github.com/incognitochain/incognito-chain.git
cd incognito-chain
git checkout tags/mainnet_20210106_1
cd bin/
sed -i 's/incognitochain\/incognito-mainnet-arm/<my own registry>/g' build_mainnet_arm.sh
tag=mainnet_20210106_1 ./build_mainnet_arm.sh

This might just be a code incompatibility with ARM; please let me know if I’m doing something silly here.


Indeed, I have read somewhere that it is not compatible with ARM. Some have tried, though…

Indeed, it was an issue specific to ARM. With the latest 20210320_2 tag, CPU/network usage is only high while syncing blocks, and afterwards it goes back to normal.

I’ve finished moving all of my ARM vNodes to amd64 to ensure they stay healthy and contribute to the network.

The thread can be marked as solved/locked.
