
Opened Jun 13, 2025 by Przemyslaw Kaminski@cgenie

Worker: implement subtasks

This is a continuation of #238.

We want to implement jobs that can spawn subtasks.

This is already doable directly: a job can call sendJob from within its own execution. When the currently running job finishes, the spawned jobs will execute (provided they are spawned on the same queue).
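To illustrate the ordering this implies, here is a minimal self-contained sketch. The broker is mocked by an in-memory FIFO; sendJob and runWorker are stand-ins for the real haskell-bee machinery, not its actual API:

```haskell
{-# LANGUAGE LambdaCase #-}
import Data.IORef

-- In-memory FIFO standing in for the broker queue; 'sendJob' here is
-- only a mock of the real haskell-bee call.
type Queue = IORef [IO ()]

sendJob :: Queue -> IO () -> IO ()
sendJob q job = modifyIORef' q (++ [job])

-- Pop and run jobs one at a time until the queue is empty.
runWorker :: Queue -> IO ()
runWorker q = readIORef q >>= \case
  []       -> pure ()
  (j : js) -> writeIORef q js >> j >> runWorker q

main :: IO ()
main = do
  q <- newIORef []
  let subtask n = putStrLn ("subtask " ++ show n)
      parent = do
        putStrLn "parent: spawning subtasks"
        mapM_ (sendJob q . subtask) [1 .. 3 :: Int]
        putStrLn "parent: done"  -- parent finishes before subtasks run
  sendJob q parent
  runWorker q
```

The parent prints "parent: done" before any subtask output appears, which is exactly the behaviour described above: spawned jobs only run once the current job has finished.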

That said, I think flow requires that we perform some aggregation once all the data from the API is fetched. Hence just spawning "small" fetch jobs isn't enough: we would also have to know when they finished, in order to perform the aggregation at the end. Also, notifications to the user would be broken with such small, isolated jobs.

Thus we need some persistence.

Our current haskell-bee design assumes that the Broker is simple: it can accept new messages and return the next message upon request. It is not a database: you cannot update existing messages, etc.

The idea is this:

  • create two queues, one for "slow" and one for "fast" jobs
  • "slow" jobs spawn subtasks and create some DB table/row so that the subtasks can report their progress there
  • the "slow" job then waits, polling the DB for changes
  • when all subtasks finish, the "slow" job continues with the aggregation etc.
  • the "slow" job is responsible for notifying the user
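The wait-and-poll step of this design can be sketched as follows. This is a self-contained mock: the DB table is replaced by a TVar counter, and the names (Progress, markDone, waitForSubtasks) are illustrative assumptions, not existing code:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (replicateM_, when)

-- The progress table is mocked by a TVar counter; in the real design
-- this would be a DB row that each subtask updates when it finishes.
data Progress = Progress { done :: TVar Int, total :: Int }

markDone :: Progress -> IO ()
markDone p = atomically $ modifyTVar' (done p) (+ 1)

-- The "slow" job blocks here until every subtask has reported in,
-- standing in for "polling the DB for changes".
waitForSubtasks :: Progress -> IO ()
waitForSubtasks p = atomically $ do
  n <- readTVar (done p)
  when (n < total p) retry

main :: IO ()
main = do
  p <- Progress <$> newTVarIO 0 <*> pure 3
  -- the "fast" fetch jobs, here running on their own threads
  replicateM_ 3 $ forkIO $ threadDelay 10000 >> markDone p
  waitForSubtasks p
  putStrLn "all subtasks finished; continuing with aggregation"
```

Against a real DB the `retry` would become a polling loop with a sleep interval, which is where the worker 'timeout' setting mentioned below becomes the safety net.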

There is also an alternative design:

  • the "slow" job works as before, but instead of sitting there waiting, it quits and reinserts itself at the end of the "slow" queue
  • the next "slow" job comes in and does the same
  • the "slow" job's responsibility is just to check whether all subjobs are finished; if they are, it continues with the rest of the execution
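A self-contained sketch of this respawning variant, with the broker mocked by an in-memory FIFO and the DB by an IORef counter (all names here are illustrative assumptions):

```haskell
import Data.IORef
import Control.Monad (replicateM_)

type Queue = IORef [IO ()]

sendJob :: Queue -> IO () -> IO ()
sendJob q job = modifyIORef' q (++ [job])

runWorker :: Queue -> IO ()
runWorker q = do
  jobs <- readIORef q
  case jobs of
    []       -> pure ()
    (j : js) -> writeIORef q js >> j >> runWorker q

-- The "slow" job checks the (mocked) DB; if subtasks remain, it
-- re-enqueues itself at the back instead of blocking the worker.
slowJob :: Queue -> IORef Int -> Int -> IO ()
slowJob q doneRef total = do
  n <- readIORef doneRef
  if n < total
    then sendJob q (slowJob q doneRef total)  -- respawn, freeing the worker
    else putStrLn "all subjobs finished; aggregating"

main :: IO ()
main = do
  q <- newIORef []
  doneRef <- newIORef (0 :: Int)
  let subtask = modifyIORef' doneRef (+ 1)
  sendJob q (slowJob q doneRef 3)  -- first check happens before any subtask
  replicateM_ 3 (sendJob q subtask)
  runWorker q
```

Note that each respawn creates a fresh job, which is the source of the messageId problem discussed below: the job's identity does not survive the re-enqueue.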

The first approach is simpler in that we have a 'timeout' setting for workers, so we could more easily kill such "slow" jobs when they hang. It is also somewhat easier to implement notifications with it (notifications carry a 'messageId' field taken from the job's id; with respawning, the "slow" job's id could change).

Reference: gargantext/haskell-gargantext#479