Viktor workers on server are unreliable

Johan · 9 January 2023 13:18

For multiple apps we use workers on our server.
Over the past months we’ve been adjusting settings in response to multiple scenarios in which the workers don’t start (properly) after a server restart.
It has now reached the point where we can’t trust the workers to run properly, so one of us checks the server daily.

We reinstalled the workers, allowing them to set up new tasks (after first deleting the existing workers and tasks).
We have used the trigger on system start and on user logon, and we’ve tried starting the workers with and without a prompt, but every time (after weeks) a server restart suddenly messes this up.

Daniel Sommers has been working with us for a while, and when I mentioned this to him, he recalled that more companies have had trouble with this very issue. I would like to find a robust and reliable solution.
One solution I see is using a third-party application, which might offer more options, or;

One thing to keep in mind is that our ICT department currently doesn’t allow a server account to be logged on automatically after a server restart. Luckily, Task Scheduler has an option (“Run whether user is logged on or not”, in Dutch “uitvoeren ongeacht of gebruiker wel of niet is aangemeld”), with which the worker is started without a prompt. The Viktor UI shows green icons for “Status of integrations” (“Status van integraties”), but when the workers are actually used they do not respond at all.
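For readers who want to script the Task Scheduler setup described above, here is a minimal sketch that only builds a `schtasks /Create` command line for a task triggered at system start. The task name, the worker path, and the choice of the `SYSTEM` account are assumptions for illustration; whether the worker behaves correctly under a given account is not something this thread confirms.

```python
from typing import List, Optional


def build_schtasks_command(
    task_name: str,
    worker_path: str,
    run_as: str = "SYSTEM",
    password: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a schtasks command that starts a worker at boot.

    task_name and worker_path are hypothetical placeholders. Paths that
    contain spaces may need extra quoting in the /TR value.
    """
    cmd = [
        "schtasks", "/Create",
        "/TN", task_name,    # name of the scheduled task
        "/TR", worker_path,  # program to run
        "/SC", "ONSTART",    # trigger: at system start, no user logon needed
        "/RU", run_as,       # account the task runs under
        "/F",                # overwrite an existing task with the same name
    ]
    if password is not None:
        # Only relevant for a regular (non-SYSTEM) account.
        cmd += ["/RP", password]
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from an elevated prompt on the server. Passing a password on the command line exposes it to anyone who can read the process list, so supplying it interactively is usually the safer choice.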

I would like to hear what solutions you know of for this.

Daniel · 11 January 2023 09:56

Johan and I are in contact about internal actions to take at his organization to solve these issues. I will keep this thread updated if any findings are potentially applicable to other users.

rkg · 18 January 2023 11:54

Could it also be possible that the connection between a worker and the platform is lost without either of them knowing?
We noticed some strange behavior with local workers after a short internet outage. The platform still thought the worker was connected, but when a job was sent to it, nothing showed up in the worker console. We didn’t investigate further at the time, as restarting the worker quickly solved the issue.
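A connection that both sides still believe is open, while no traffic actually gets through, is commonly detected with a heartbeat timeout. The sketch below illustrates that general idea only. `HeartbeatMonitor` is a hypothetical helper, not part of the worker or the platform, and it assumes something calls `record_heartbeat()` whenever a message arrives from the other side.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Flags a connection as stale when no heartbeat arrived recently."""

    def __init__(
        self,
        timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # treat creation time as the first heartbeat

    def record_heartbeat(self) -> None:
        """Call this whenever any message is received from the peer."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True if the peer has been silent for longer than the timeout."""
        return self._clock() - self._last_seen > self._timeout_s
```

A process using such a monitor would poll `is_stale()` periodically and reconnect or restart when it returns `True`, instead of waiting for a job that never arrives.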

Johan_Tuls · 23 January 2023 06:28

This has started happening on our servers as well:

While all workers are online on the server, sometimes a single worker just turns off.
I wonder what makes this unreliable, because it is a big issue in production.

Restarting the workers restores the connection.

rdejonge · 23 January 2023 08:33

We are currently looking into the cause of this problem. We will update you as soon as there is progress on this matter.

Tom_Nillesen · 17 February 2023 10:42

I would like to be kept in the loop on this issue as well, as we’re encountering the same regarding status of integrations. The workers aren’t showing any errors, but the status on the platform is red.

Restarting the workers resolves the issue for a few days (normally). I’ve set up an automatic reboot of the server each morning, which has improved the situation a bit, but the problem still occurs from time to time.
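As a lighter-weight alternative to rebooting the whole server, the restart workaround could in principle be applied to the worker process alone. The sketch below is a generic supervisor loop, not something provided by the platform: it starts a command, lets it run for at most `max_uptime_s` seconds, then terminates and relaunches it. The function name and parameters are assumptions for illustration, and the demo uses a dummy Python child process rather than a real worker.

```python
import subprocess
import sys
from typing import List, Sequence


def run_with_periodic_restart(
    cmd: Sequence[str], max_uptime_s: float, cycles: int
) -> List[str]:
    """Run cmd `cycles` times, recycling it once max_uptime_s has elapsed.

    Returns one entry per cycle: "exited" if the process stopped on its
    own, "recycled" if it was terminated after reaching the uptime limit.
    """
    reasons: List[str] = []
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=max_uptime_s)
            reasons.append("exited")
        except subprocess.TimeoutExpired:
            proc.terminate()  # ask the process to stop
            proc.wait()       # reap it before starting the next one
            reasons.append("recycled")
    return reasons


if __name__ == "__main__":
    # Dummy long-running child standing in for a worker executable.
    dummy = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_periodic_restart(dummy, max_uptime_s=0.5, cycles=2))
```

In a real setup the uptime limit would be hours rather than fractions of a second, and the loop would typically run indefinitely. This only masks the symptom and does not address why the connection is lost.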

The server and its workers were very stable for more than a year, but since December 2022 (or so) we’ve encountered this issue.

rdejonge · 17 February 2023 13:16

We are currently in the process of rolling out a solution that should solve this issue. Once everything is up and running, I will post an update here again.

rdejonge · 21 February 2023 12:55

We have rolled out a solution that should solve the problems related to the workers seemingly losing connection to the platform. If problems persist, please let us know (either here or through email).

Regards,

Raoul

topologic · 5 May 2023 12:26

Just to add that I am facing the same reliability issues with workers (generic). I cannot reliably connect to it, and sometimes it takes two presses of ‘send’ to connect.