Hello community. I have been running node validators for years now. Over the last few months I have started getting weird errors on some of my nodes. I run the hardLinks script and the node controller script, so my nodes stop and start a few times a day. If you never stop your containers, this error would likely never come up. I am posting here in case anyone else is having this issue, so we can hopefully come up with a solution.
This has now affected 3 of my nodes in the last month. I have tried the following to resolve the issue properly, to no avail…
1) Check whether the port is in use with lsof. It is not in use; lsof shows nothing.
2) Remove the container, remove the data folder, prune, restart the Docker service, and re-add the container. Same error.
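For anyone who wants to cross-check the lsof result from step 1, here is a minimal Python sketch that simply tries to bind the port on 0.0.0.0. The function name `port_is_free` is my own, not part of any of the node scripts. If it returns True while Docker still reports the port as allocated, that would suggest the conflict is in Docker's own bookkeeping rather than in a process holding the socket, though I have not confirmed that this is the cause here.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now.

    This only tests what the operating system thinks; it knows nothing
    about ports that Docker has reserved internally.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: something else is bound to the port.
            return False
        return True
```

Call it with the port from the error message, for example `port_is_free(9334)`, where 9334 is only a placeholder for your own port number.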
In the end, as a workaround, I was able to get these nodes back up and running by changing the port used in the add process to a port outside the increment of the blink script. This works… but is a workaround, not a solution, as this issue keeps affecting new nodes of mine.
It would be great if I could get to the root cause.
Has anyone seen this kind of error before?
The specific wording is…
“Error response from daemon: driver failed programming external connectivity on endpoint inc_mainnet_0 (99ysadef90324nffsad): Bind for 0.0.0.0:XXXX failed: port is already allocated”.
Where XXXX is the port in question.
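If you want a script to detect this error automatically (for example, to log which node and port were hit), a rough sketch for pulling the endpoint name and port out of the message could look like this. The pattern is based only on the single message quoted above, and `parse_bind_error` is a hypothetical helper name, so treat it as a starting point rather than something that covers every Docker error variant.

```python
import re

# Pattern written against the one error message quoted in this post.
BIND_ERROR = re.compile(
    r"endpoint (?P<endpoint>\S+) \((?P<endpoint_id>[^)]+)\): "
    r"Bind for (?P<host>[\d.]+):(?P<port>\d+) failed: port is already allocated"
)


def parse_bind_error(message: str):
    """Return (endpoint_name, port) if the message matches, else None."""
    match = BIND_ERROR.search(message)
    if match is None:
        return None
    return match["endpoint"], int(match["port"])
```

Feeding it the output of a failed container start gives you the container name and port to record, so you can see over time whether the same ports keep getting stuck.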