Viktor workers on server are unreliable

For multiple apps we run workers on our server.
Over the past months we've been adjusting settings to address several scenarios in which the workers don't start (properly) after a server restart.
It has come to the point where we can't trust the workers to run properly, so one of us checks the server daily.

We reinstalled the workers and let them set up new tasks (after first deleting the existing workers and tasks).
We have tried triggering on system startup and on user logon, and starting the workers with and without a prompt, but every time (after a few weeks) a server restart suddenly breaks things again.

Daniel Sommers has worked with us for a while, and when I mentioned this to him, he recalled that more companies have had trouble with this very issue. I would like to find a robust and reliable solution.
One solution I see is using a third-party application, which might offer more options, or

One thing to keep in mind is that our ICT department currently doesn't allow a server account to be logged on automatically after a server restart. Fortunately, Task Scheduler has an option ("Run whether user is logged on or not") with which the worker is started without a prompt. The VIKTOR UI then shows no warning icons under "Integration status", but when the workers are actually used, they don't respond at all.

I'd like to hear what solutions you know of for this.


Johan and I are in contact about internal actions to take at his organization to solve these issues. I will keep this thread updated if any findings are potentially applicable to other users.

Could it also be that the connection between a worker and the platform is lost without either of them knowing?
We noticed some strange behavior with local workers after a short internet outage. The platform still thought the worker was connected, but when a job was sent to it, nothing showed up in the worker console. We didn't investigate further at the time, as restarting the worker quickly solved the issue.
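One common way to catch a link that is silently dead is to stop trusting the connection state and instead require periodic heartbeats from the other side. Below is a minimal, hypothetical sketch in Python; the `HeartbeatMonitor` class and its `timeout` parameter are illustrative assumptions, not part of the VIKTOR worker or its API:

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Hypothetical sketch: consider the peer alive only if a heartbeat was
    received within `timeout` seconds, rather than trusting that an open
    connection is still healthy."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Optional[float] = None

    def record_heartbeat(self) -> None:
        # Call this whenever a ping/pong or any other message arrives from the peer.
        self.last_seen = self.clock()

    def is_alive(self) -> bool:
        # No heartbeat yet, or the last one is too old: treat the link as dead,
        # even if the underlying socket still looks "connected".
        if self.last_seen is None:
            return False
        return self.clock() - self.last_seen <= self.timeout


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout=30.0, clock=lambda: fake_now[0])
    monitor.record_heartbeat()  # heartbeat at t=0
    fake_now[0] = 10.0
    print(monitor.is_alive())   # True: last heartbeat 10 s ago
    fake_now[0] = 45.0
    print(monitor.is_alive())   # False: silent for 45 s, e.g. after an outage
```

The point of this design is that silence, not an error, marks the link as dead, which matches the failure mode described above where neither side reports a problem.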

This has started happening on our servers as well.


All workers are running on the server, yet sometimes a single worker just goes offline.
I wonder what makes this unreliable, because it is a big issue in production.

Restarting the workers restores the connection.

We are currently looking into the cause of this problem. We will update you as soon as there is progress on this matter.

I would like to be kept in the loop on this issue as well, as we're encountering the same problem with the integration status. The workers aren't showing any errors, but their status on the platform is red.

Restarting the workers usually resolves the issue for a few days. I've set up an automatic reboot of the server each morning, which has improved the situation a bit, but the problem still occurs from time to time.
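Instead of rebooting the whole server, a small supervisor could restart only the worker process, both when it exits and proactively after a maximum uptime. The sketch below assumes the worker can be launched as a plain command; the `supervise` function, its parameters and the stand-in command are hypothetical and not part of the VIKTOR tooling:

```python
import subprocess
import sys
import time
from typing import Sequence


def supervise(cmd: Sequence[str], max_uptime: float, max_starts: int,
              poll_interval: float = 1.0, restart_delay: float = 5.0) -> int:
    """Hypothetical sketch: run `cmd` and start it again whenever it exits,
    or terminate and restart it once it has run longer than `max_uptime`
    seconds. Stops after `max_starts` starts and returns that count."""
    starts = 0
    while starts < max_starts:
        proc = subprocess.Popen(cmd)
        starts += 1
        started_at = time.monotonic()
        while proc.poll() is None:  # worker still running
            if time.monotonic() - started_at > max_uptime:
                # Proactive restart: like the daily reboot, but for one process.
                proc.terminate()
                try:
                    proc.wait(timeout=10)
                except subprocess.TimeoutExpired:
                    proc.kill()  # did not stop in time; force it
                    proc.wait()
                break
            time.sleep(poll_interval)
        if starts < max_starts:
            time.sleep(restart_delay)  # avoid a tight crash/restart loop
    return starts


if __name__ == "__main__":
    # Stand-in for a worker: a child process that just sleeps for a minute.
    fake_worker = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(supervise(fake_worker, max_uptime=2.0, max_starts=2, restart_delay=0.5))  # prints 2
```

In a real deployment `max_starts` would be large and `max_uptime` something like 24 hours. This only helps as long as a restart is what restores the connection, as reported in this thread.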

The server and its workers performed very stably for more than a year, but since December 2022 (or so) we've encountered this issue.

We are currently in the process of rolling out a solution that should solve this issue. Once everything is up and running, I will post an update here.

We have rolled out a solution that should solve the problems related to the workers seemingly losing their connection to the platform. If problems persist, please let us know (either here or by email).

Regards,

Raoul

Just to add that I am facing the same reliability issues with workers (generic). I cannot reliably connect to them, and sometimes it takes two presses of 'send' to connect.