vNode storage tuning

Hello,

I’m trying to see if I can optimize my vNode setup to run a bit better as a validator.
I’m using Ubuntu on a 512 GB SSD formatted as ZFS, and this seems to be working well, other than lots of CPU iowait warnings for the incognito process when I watch the performance chart. CPU usage itself stays pretty low/minimal.

I have the exact same setup on another machine that runs as a Plex server and see no such warnings, so I think it’s the different access patterns of the incognito miner, which I assume is more database-like?

The default recordsize for ZFS is 128 KB, so I’m wondering if lowering that would help the inc process run more smoothly?
One of the nice things about ZFS is that recordsize can be set per dataset, without having to reformat the whole drive, so I can create a dedicated mount just for incognito.

Some suggestions I’ve read for databases say that the record size should be dropped all the way down to the tens of KB, probably along the lines of what a row entry might be.
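For reference, a per-dataset recordsize can be set without touching the rest of the pool. A sketch, assuming a pool named `tank` and a dataset mounted at `/home/incognito` (both names are hypothetical):

```shell
# Create a dedicated dataset for the node data ("tank" is an assumed pool name)
zfs create -o mountpoint=/home/incognito tank/incognito

# Lower the recordsize for this dataset only; the rest of the pool keeps 128K
zfs set recordsize=16K tank/incognito

# Verify the setting
zfs get recordsize tank/incognito
```

One caveat: recordsize only applies to blocks written after the change, so existing database files would need to be copied or rewritten into the dataset to pick it up.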

For the incognito mainnet would the equivalent be the average size of a block on the chain?
Do we know what that value might be?

Hi,
We don’t have much experience with tuning ZFS. The disk I/O of an Incognito node is high; however, we don’t have any issues with SSDs and a normal file system (XFS, or ext4 with default settings).
Incognito uses the same database architecture as Ethereum (StateDB, with LevelDB as the key-value store). So if you want to optimize disk I/O, you can search for optimizations for Ethereum nodes.

One report (https://gist.github.com/pryce-turner/bc14b70ff36ec11e417ef341361b2c5f) says a 16K recordsize is a good config to try.


I run with the default (128K) recordsize. There are some other things you can tune tho.

I have lz4 compression on, but the ratio is only 1.13x so you can turn it off.

Disable atime. I always disable atime on everything. =)

Disable dedup. Dedup takes a lot of IO and memory to no gain for incognito. Even if you run two nodes on the same computer, each node creates unique database files. There are better methods.

Disable sync. Normally it’s never recommended to disable sync on ZFS, as you may corrupt the data/volume if your computer loses power. But since you can always download the data again, it’s not critical. Just take a snapshot every now and then as a backup.
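Put together, the properties above can be applied in one go. A sketch, assuming the node’s dataset is named `tank/incognito` (a hypothetical name):

```shell
# Apply the suggested properties to the node's dataset
zfs set compression=off tank/incognito   # lz4 only achieved ~1.13x here, little gain
zfs set atime=off tank/incognito         # skip access-time writes on every read
zfs set dedup=off tank/incognito         # dedup costs RAM and IO for no benefit here
zfs set sync=disabled tank/incognito     # risky on power loss, but chain data is re-downloadable

# Take an occasional snapshot as a fallback
zfs snapshot tank/incognito@$(date +%Y%m%d)
```

Note that `dedup=off` is already the ZFS default, so that line only matters if it was enabled earlier.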

Also remember that syncing or starting up an incognito node is very I/O heavy. Once it has completed the sync or the startup check, the I/O settles down to reasonable levels.


Oh, one more thing. I hope you already set the zpool ashift correctly (>=13), because it is something you cannot change on the fly; you’d need to destroy the pool and create a new one.

You also get better performance with autotrim turned off, for obvious reasons, but it does require you to trim eventually.
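To check what ashift a pool was created with, and to run the occasional manual trim when autotrim is off (pool name `tank` is an assumption):

```shell
# Show the ashift the pool was created with (2^12 = 4K sectors, 2^13 = 8K)
zpool get ashift tank

# With autotrim off, issue a manual TRIM every now and then, e.g. from cron
zpool trim tank

# Check trim progress and status
zpool status -t tank
```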

Thanks for the tips.
I have all that set, except atime - I hadn’t thought of turning that off.

So far, disabling sync has made the biggest difference in terms of the I/O load warnings and response times.

I have an external drive that I use to backup snapshots, so hopefully that’s enough to get the node(s) back up and running quickly if(when) it craps out.
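For backing snapshots up to an external drive, `zfs send`/`zfs receive` is the usual route. A sketch, assuming the node dataset is `tank/incognito` and the external drive holds a pool named `backup` (both names are hypothetical):

```shell
# One-off full copy of a snapshot to the external pool
zfs snapshot tank/incognito@base
zfs send tank/incognito@base | zfs receive backup/incognito

# Later, send only the changes since the last snapshot (incremental)
zfs snapshot tank/incognito@today
zfs send -i @base tank/incognito@today | zfs receive backup/incognito
```

The incremental send keeps backup time short once the initial full copy is done.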

Is 13 the new 12?
I created my pools with ashift=12, because the advice I had last read was to align it with 4K sector sizes.

It depends on your drive. I did some tests, and 13 was the best for my Samsung EVO drive model. I’m not sure, but I’m guessing that your 12 works pretty well too. If I understand correctly, the root of the problem is drives reporting a 512-byte block size, which makes ZFS default down to ashift=9.