VIKTOR workers on server are unreliable

We use workers on our server for multiple apps.
Over the past months we've been adjusting settings in response to several scenarios in which the workers don't start (properly) after a server restart.
It has now reached the point where we can't trust the workers to run properly, so one of us checks the server daily.

We reinstalled the workers and let them set up new tasks (after first deleting the existing workers and tasks).
We have used both the "at system startup" and the "at user logon" triggers, and we've tried starting the workers with and without a prompt, but every time (after weeks of working fine) a server restart suddenly breaks it again.

Daniel Sommers has been working with us for a while, and when I mentioned this to him, he seemed to recall that more companies have trouble with this very issue. I would like to find a robust and reliable solution.
One solution I see is using a third-party application, which might offer more options.

One thing to keep in mind is that our ICT department currently doesn't allow a server account to be logged on automatically after a server restart. Luckily, Task Scheduler has an option for this ("Run whether user is logged on or not", shown in our Dutch UI as “uitvoeren ongeacht of gebruiker wel of niet is aangemeld”); the worker is then started without a prompt. The VIKTOR UI shows no icons for “Status van integraties” (integration status), but when the workers are actually used they don't respond at all.
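For anyone who wants to script the Task Scheduler setup described above rather than click through the UI, here is a minimal sketch using Windows' built-in `schtasks` command. The task name, executable path, and service account are hypothetical placeholders, not values from the VIKTOR installer. As far as I know, a task created with a stored account and password (`/RU` plus `/RP`, without `/IT`) runs whether or not that user is logged on, but please verify this on your own server.

```python
import subprocess


def build_schtasks_command(task_name, worker_exe, run_as_user, password):
    """Build a schtasks command that starts a program at system startup.

    All argument values are supplied by the caller; nothing here is
    specific to VIKTOR. Paths containing spaces need extra quoting
    inside the /TR value.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,      # name of the scheduled task
        "/TR", worker_exe,     # program to run
        "/SC", "ONSTART",      # trigger: at system startup
        "/RU", run_as_user,    # account the task runs under
        "/RP", password,       # stored password for that account
        "/F",                  # overwrite an existing task with this name
    ]


def create_startup_task(task_name, worker_exe, run_as_user, password):
    """Create the task (Windows only); raises CalledProcessError on failure."""
    command = build_schtasks_command(task_name, worker_exe, run_as_user, password)
    subprocess.run(command, check=True)
```

Usage would look like `create_startup_task("ExampleWorker", r"C:\workers\example-worker.exe", r"DOMAIN\svc-worker", "...")`, run from an elevated prompt. Passing the password on the command line is a shortcut for the sketch; your ICT department may prefer another way of storing credentials.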

I'd like to hear what solutions you know of for this.

Johan and I are in contact about internal actions to take at his organization in order to solve these issues. I will keep this thread updated if any findings are potentially applicable to other users.

Could it also be possible that the connection between a worker and the platform is lost without either of them noticing?
We noticed some strange behavior with local workers after a short internet outage. The platform still thought the worker was connected, but when a job was sent to it, nothing showed up in the worker console. We didn't investigate any further at the time, as a restart of the worker quickly solved the issue.
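To illustrate the kind of silent disconnect described above: a connection that dies without either side noticing is usually only detectable with an application-level heartbeat and a timeout. The sketch below is a generic illustration of that idea, not how the VIKTOR worker or platform actually work; the class name and the timeout value are made up for the example.

```python
import time


class HeartbeatMonitor:
    """Track when a peer was last heard from and flag it as stale on timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock            # injectable so the logic can be tested
        self._last_seen = clock()      # treat creation time as the first sign of life

    def record_heartbeat(self):
        """Call this whenever any message arrives from the peer."""
        self._last_seen = self._clock()

    def is_stale(self):
        """True if nothing has been received for longer than the timeout."""
        return self._clock() - self._last_seen > self.timeout_seconds
```

A side that sees `is_stale()` return true would drop and re-establish the connection instead of assuming the other end is still there.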

This has started happening on our servers as well:

[image]

This happens while all workers are online on the server. Sometimes a single worker just turns off.
I wonder what makes this unreliable, because it is a big issue in production.

Restarting the workers restores the connection.
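As a stopgap for the case where a worker process has stopped entirely, a small watchdog could check for the process and re-run its scheduled task. This is a rough sketch using the standard Windows `tasklist` and `schtasks` commands; the image name and task name are hypothetical and depend on your installation. Note that it only detects a process that is gone, not a worker that is still running but has lost its connection.

```python
import subprocess


def is_process_running(image_name, run=subprocess.run):
    """Return True if tasklist reports a process with this image name."""
    result = run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    # When nothing matches, tasklist prints an INFO message without the name.
    return image_name.lower() in result.stdout.lower()


def ensure_worker_running(image_name, task_name, run=subprocess.run):
    """Start the scheduled task if the process is missing.

    Returns True if a restart was triggered, False if the process was found.
    """
    if is_process_running(image_name, run=run):
        return False
    run(["schtasks", "/Run", "/TN", task_name], check=False)
    return True
```

Running `ensure_worker_running("example-worker.exe", "ExampleWorker")` every few minutes from a separate scheduled task would at least replace the daily manual check for stopped workers.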

We are currently looking into the cause of this problem. We will update you as soon as there is progress on this matter.