Incognito chain’s code upgrade (tag: 20211228_1)

duc · 28 December 2021 12:42

Hi Incognito validators,

We’ve just released a protocol code update for the first version of the new Incognito Exchange. It is expected to be enabled at about 10:45 PM ET on Dec 28, and it is a mandatory update. The latest code is published in 2 forms:

The source code’s tag can be found at mainnet_20211228_1
The Docker image can be pulled from 20211228_1

If you run pNodes, or set up your vNodes by following our instructions, they should pull the latest Docker image automatically. If you encounter any issues with the code update, feel free to contact @support for assistance.
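
For operators who do not rely on auto-update, here is a minimal sketch of building the image reference by hand. The repository name `incognitochain/incognito-mainnet` is taken from the run script posted later in this thread, not from this announcement, so treat it as an assumption. The snippet only prints the command; run the printed line yourself to actually pull.

```bash
# Repository name as used in the community run script in this thread (assumption).
repo="incognitochain/incognito-mainnet"
# Release tag from this announcement.
tag="20211228_1"

# Full image reference in the usual repository:tag form.
image="${repo}:${tag}"

# Print the command instead of running it, so nothing is pulled by accident.
echo "docker pull ${image}"
```

Pinning an explicit tag like this is the manual counterpart of what the auto-update scripts do when they detect a new tag.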

Thanks!

duc · 28 December 2021 12:47

abduraman · 28 December 2021 14:33

I use the script below. It has worked well for months, until now. This time, it started syncing the beacon from scratch (height 0). Why? @Support

#!/bin/bash

run()
{
  # ATTENTION: CHANGE THESE
  validator_key=VKEY
  ####################################
  bootnode="mainnet-bootnode.incognito.org:9330"
  latest_tag=$1
  current_tag=$2

  node_index=1
  data_dir="data${node_index}"
  node_port=$((9432 + ${node_index}))
  rpc_port=$((9333 + ${node_index}))
  container_name="inc_mainnet${node_index}"
  script_name="[r]un${node_index}.sh"
  infura_api_key="KEY"
  backup_log=0

  docker -v || bash -c "wget -qO- https://get.docker.com/ | sh"

  # Remove old container 
  docker rm -f ${container_name}
  if [ "$current_tag" != "" ]
  then
    docker image rm -f incognitochain/incognito-mainnet:${current_tag}
  fi

  docker pull incognitochain/incognito-mainnet:${latest_tag}
  docker network create --driver bridge inc_net || true

  docker run --restart=always --net inc_net -p $node_port:$node_port -p $rpc_port:$rpc_port -e NODE_PORT=$node_port -e RPC_PORT=$rpc_port -e BOOTNODE_IP=$bootnode -e GETH_NAME=mainnet.infura.io/v3/${infura_api_key} -e GETH_PROTOCOL=https$

  if [ $backup_log -eq 1 ]; then
    mv $data_dir/log.txt $data_dir/log_$(date "+%Y%m%d_%H%M%S").txt
    mv $data_dir/error_log.txt $data_dir/error_log_$(date "+%Y%m%d_%H%M%S").txt
  fi
}

# kill existing run.sh processes
ps aux | grep $(basename $0) | awk '{ print $2}' | grep -v "^$$\$" | xargs kill -9

current_latest_tag=""
while [ 1 = 1 ]
do
  tags=`curl -X GET https://hub.docker.com/v1/repositories/incognitochain/incognito-mainnet/tags  | sed -e 's/[][]//g' -e 's/"//g' -e 's/ //g' | tr '}' '\n'  | awk -F: '{print $3}' | sed -e 's/\n/;/g'`

  sorted_tags=($(echo ${tags[*]}| tr " " "\n" | sort -rn))
  latest_tag=${sorted_tags[0]}

  if [ "$current_latest_tag" != "$latest_tag" ]
  then
    run $latest_tag $current_latest_tag
    current_latest_tag=$latest_tag
  fi

  sleep 3600s

done &
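
The loop above picks the newest image by sorting the tag names and taking the first one. A minimal sketch of that selection step on its own, assuming tags follow the `YYYYMMDD_N` pattern seen in this thread. The function name `latest_tag` and every sample tag other than `20211228_1` are made up for illustration. This variant sorts on both parts of the tag explicitly instead of the plain `sort -rn` used in the script, and it says nothing about why the node resynced.

```bash
latest_tag() {
  # Read one tag per line on stdin.
  # Sort by the date part, then by the numeric suffix after "_", newest first,
  # and print only the first line.
  sort -t_ -k1,1nr -k2,2nr | head -n 1
}

# Hypothetical tag list; only 20211228_1 comes from this thread.
printf '%s\n' 20211130_1 20211228_1 20211215_2 | latest_tag
```

Sorting the suffix numerically means a tag ending in `_10` ranks above one ending in `_2` on the same date.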

kabamaru · 29 December 2021 10:43

I’m getting the following when testing the vanilla Docker image pushed to your Docker Hub:

	root@node:/# cat /data/error.log 
	2021/12/29 10:35:37 Using network param file for mainnet
	panic: runtime error: invalid memory address or nil pointer dereference
	[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcc27a9]

	goroutine 1 [running]:
	github.com/incognitochain/incognito-chain/blockchain/pdex.NewParamsWithValue(0x30)
		/Users/autonomous/projects/incognito-chain/blockchain/pdex/params.go:40 +0x29
	github.com/incognitochain/incognito-chain/blockchain/pdex.initStateV2FromDB(0x40d187)
		/Users/autonomous/projects/incognito-chain/blockchain/pdex/statedb.go:90 +0x34
	github.com/incognitochain/incognito-chain/blockchain/pdex.InitStateFromDB(0xc00207ecf0, 0x1539e20, 0xc007959230)
		/Users/autonomous/projects/incognito-chain/blockchain/pdex/statedb.go:52 +0x1d6
	github.com/incognitochain/incognito-chain/blockchain.(*BeaconBestState).RestoreBeaconViewStateFromHash(0xc001100400, 0xc001d32160, 0x1, 0x1)
		/Users/autonomous/projects/incognito-chain/blockchain/restorebeststate.go:46 +0x3a5
	github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).RestoreBeaconViews(0xc001d32160)
		/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:632 +0x228
	github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).InitChainState(0xc001d32160)
		/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:173 +0x1c6
	github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).Init(0xc001d32160, 0xc003449790)
		/Users/autonomous/projects/incognito-chain/blockchain/blockchain.go:116 +0x13f
	main.(*Server).NewServer(0xc002476f00, {0x7ffe3d5c6c20, 0xc}, 0xc00011cf60, {0x1ba5808, 0xc00011b320}, 0xc0033d2450, 0x0, {0x7ffe3d5c6ba8, 0x0}, ...)
		/Users/autonomous/projects/incognito-chain/server.go:310 +0x13d8
	main.mainMaster(0x0)
		/Users/autonomous/projects/incognito-chain/incognito.go:184 +0xc66
	main.main()
		/Users/autonomous/projects/incognito-chain/incognito.go:255 +0x2b6

And nodes are crashlooping.
Any ideas?
I just upgraded the tag used for these. Anything else new we need to define here?
I’m running these on kubernetes with a helm chart, so if we need new env vars defining something I might have missed these (?).

And keep in mind this is after starting fresh with a new /data (no existing beacon data in there).

kabamaru · 29 December 2021 10:55

Ah, actually, I saw that my storage backend didn’t actually delete the beacon data. After deleting it, the image runs fine.
It would be good if you added operational steps like that to the changelog in the GitHub releases. It’ll make operating nodes smoother.
Thanks!

kabamaru · 29 December 2021 11:43

Seems like I spoke too soon: one node syncing seems to be hogging all 12 CPUs on my vNode. Something seems wrong here. And syncing the beacon chain is VERY slow compared to previous images.

Edit: fixed by removing “FULLNODE=1” which I added during testing of this.

trungtin2qn1 · 29 December 2021 12:23

Did you pull the latest image after your node passed the pdexv3 breakpoint height?

kabamaru · 29 December 2021 12:25

I believe so, yes. I found all my nodes were stalling this morning and the image was outdated. I updated it and they were crashing, so I had to resync from scratch.

adrian · 29 December 2021 14:24

Mine were also stalling before the upgrade, at beacon block 1700524 and shard 6 block 1693874. After the upgrade, they crash with:

2021-12-29 14:15:35.722 beaconsyncprocess.go:254 [ERR] Syncker log : Insert beacon block from pool fail 1700525 [170 36 167 55 91 142 76 26 74 239 37 96 214 215 130 208 141 94 227 23 243 160 119 166 135 207 179 147 213 51 58 90] -1052: Flatten And Convert String Instruction Error 
 Expect Instruction Merkle Root in Beacon Block Header to be �B+T��P퓟��(Kw�
f�r�nB>�ḅ~����Ky� get `[
Flatten And Convert String Instruction Error
github.com/incognitochain/incognito-chain/blockchain.NewBlockChainError
        /Users/autonomous/projects/incognito-chain/blockchain/error.go:395
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).verifyPreProcessingBeaconBlock
        /Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:275
github.com/incognitochain/incognito-chain/blockchain.(*BlockChain).InsertBeaconBlock
        /Users/autonomous/projects/incognito-chain/blockchain/beaconprocess.go:141
github.com/incognitochain/incognito-chain/blockchain.(*BeaconChain).InsertBlock
        /Users/autonomous/projects/incognito-chain/blockchain/beaconchain.go:220
github.com/incognitochain/incognito-chain/syncker.(*BeaconSyncProcess).insertBeaconBlockFromPool
        /Users/autonomous/projects/incognito-chain/syncker/beaconsyncprocess.go:253
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1581
2021-12-29 14:15:36.387 connmanager.go:520 [ERR] Peerv2 log: [stream] rpc error: code = Unknown desc = Sync too fast, last time sync blocks of CID 6 from height 1693874 is 2021-12-29 14:15:25.198725809 +0000 UTC
2021-12-29 14:15:36.920 tx_base.go:204 [ERR] Transaction log: Could not parse metadata with type: 291
2021-12-29 14:15:36.920 connmanager.go:541 [ERR] Peerv2 log: [stream] unmarshall Json Shard Block Is Failed. Error Could not parse metadata with type: 291

After wiping the data it works OK, but it is really not ideal to break all validators during the holidays, forcing them to upgrade or get slashed, only to then crash and need debugging.
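
A cautious variant of "wiping the data" is to rename the data directory instead of deleting it, so the old state can still be inspected or removed later. A minimal sketch, assuming the node's data lives in a single directory such as `data1` (the name used in the run script earlier in this thread). The function name `set_aside_data` is hypothetical, and the container should be stopped before running it.

```bash
set_aside_data() {
  # Rename the given data directory with a timestamp suffix, then recreate it
  # empty so the node starts syncing from scratch. Prints the backup path.
  data_dir="$1"
  if [ ! -d "$data_dir" ]; then
    echo "no such directory: $data_dir" >&2
    return 1
  fi
  backup_dir="${data_dir}.bak_$(date +%Y%m%d_%H%M%S)"
  mv -- "$data_dir" "$backup_dir" && mkdir -p -- "$data_dir" && echo "$backup_dir"
}

# Example, not executed here (stop the node first):
#   set_aside_data data1
```

The old copy keeps using disk space until it is deleted, so this only helps when the disk has room for a second sync.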

JG20 · 29 December 2021 14:57

fwiw, it appears all of my pnodes and vnodes auto-updated without issue. No beacon or shard stalls over the last 26 hours…yet.

adrian · 29 December 2021 14:59

Everything that auto-updated likely got the data directly wiped. I got that on my pnode, but on the managed vnodes I don’t have auto-updates enabled for security reasons.

trungtin2qn1 · 29 December 2021 15:10

pdex_v3_break_point_height: 1699680. We announced this nearly 2 days ago.
Also, “Everything that auto-updated likely got the data directly wiped” is not true for vnodes.

adrian · 29 December 2021 15:32

“pdex_v3_break_point_height: 1699680 We have announced for nearly 2 days”

I appreciate that, and forgive my critical feedback but it’s necessary:

I know it’s too late now, but the Incognito team has to take into consideration that it’s Christmas and New Year’s, and a lot of people won’t be watching the forum or be available for emergency vnode upgrades, so even 2 days’ notice over this period is really not enough.

I haven’t been following the forum for the past week or so and got alerted this morning that all my vnodes were stalling. Luckily I had monitoring in place, was near a computer, and managed to upgrade them; otherwise I would have had 30+ vnodes slashed, which would have made me sell everything out of frustration and loss of confidence.

trungtin2qn1 · 29 December 2021 15:46

Also, my bad.
In our culture and tradition, the Christmas and New Year holiday usually does not last very long, which is why we decided to deploy pdexv3 at this time.
But thanks to your feedback, we now know that many validators elsewhere are not available during this period of the year. We hope validators who have issues with this release tag can read this thread, understand where the issue is, and see how to fix it. We will remember this for future releases.

abduraman · 29 December 2021 15:48

Hey @trungtin2qn1,

Could you look through my script above? As far as I can see, there is nothing wrong with the script (I have used it for many months), but this time all of my nodes started syncing from scratch. Luckily, thanks to bootstrapping, only one node was slashed.

Thanks.

duc · 29 December 2021 17:18

Hello guys, first of all, sorry for the rather last-minute announcement. This is something we should have avoided, especially during the holiday period.

We were also afraid that a 2-day notice was not enough for node operators who, for whatever reason, had not set up auto-update (this was my mistake, as I assumed that most nodes had). This is a big update that people have waited a long time for, so to be honest, we rushed to push it out.

Lastly, as @trungtin2qn1 said, the issue happened on nodes that had not updated to the new code before the breakpoint (beacon height 1,699,680). In this case, the data was broken, and the only way to fix it is to sync the data from scratch (or, to save time, copy working data from a source that you trust). Sorry again for the inconvenience.
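
To make the breakpoint rule concrete, here is a minimal sketch that compares a beacon height against 1,699,680. The function name `passed_breakpoint` is hypothetical, and treating a height exactly equal to the breakpoint as "passed" is an assumption. The two sample heights are ones mentioned elsewhere in this thread.

```bash
# pdex_v3_break_point_height as quoted in this thread.
PDEX_V3_BREAKPOINT=1699680

passed_breakpoint() {
  # Succeeds (exit status 0) when the given beacon height is at or past the
  # breakpoint. Counting the breakpoint height itself is an assumption.
  [ "$1" -ge "$PDEX_V3_BREAKPOINT" ]
}

# Sample heights reported by other posters in this thread.
for height in 1698400 1700524; do
  if passed_breakpoint "$height"; then
    echo "$height: at or past the breakpoint"
  else
    echo "$height: before the breakpoint"
  fi
done
```

A node still running the old image at a height that passes this check is in the situation described above and would need fresh data.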

PS: @adrian, we really appreciate your support in operating many nodes to secure the Incognito network, and I personally understand how painful and frustrating it is to have all those nodes go down just because of our mistake. This is a lesson we’ve learned, and we will handle future releases more responsibly.

abduraman · 29 December 2021 18:07

I swear I started the update before that point. It was probably at ~1,698,400. Anyway, you owe me ~$30 because of my slashed node. Just kidding.

Happy holidays to all.