
Opened Jun 13, 2025 by Przemyslaw Kaminski@cgenie

Worker: implement subtasks

This is a continuation of #238.

We want to implement jobs that can spawn subtasks.

This is already doable directly: a job can call sendJob from within its own execution. When the currently running job finishes, the spawned jobs will execute (provided they are spawned on the same queue).
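To illustrate the ordering this implies, here is a minimal self-contained sketch. The broker is mocked by an in-memory FIFO; sendJob and runWorker are stand-ins for the real haskell-bee machinery, not its actual API:

```haskell
{-# LANGUAGE LambdaCase #-}
import Data.IORef

-- In-memory FIFO standing in for the broker queue; 'sendJob' here is
-- only a mock of the real haskell-bee call.
type Queue = IORef [IO ()]

sendJob :: Queue -> IO () -> IO ()
sendJob q job = modifyIORef' q (++ [job])

-- Pop and run jobs one at a time until the queue is empty.
runWorker :: Queue -> IO ()
runWorker q = readIORef q >>= \case
  []       -> pure ()
  (j : js) -> writeIORef q js >> j >> runWorker q

main :: IO ()
main = do
  q <- newIORef []
  let subtask n = putStrLn ("subtask " ++ show n)
      parent = do
        putStrLn "parent: spawning subtasks"
        mapM_ (sendJob q . subtask) [1 .. 3 :: Int]
        putStrLn "parent: done"  -- parent finishes before subtasks run
  sendJob q parent
  runWorker q
```

The parent prints "parent: done" before any subtask output appears, which is exactly the behaviour described above: spawned jobs only run once the current job has finished.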

That said, I think flow requires that we perform some aggregation once all the data from the API is fetched. Hence just spawning "small" fetch jobs isn't enough: we would also have to know when they finished, in order to perform the aggregation at the end. Also, notifications to the user would be broken with such small, isolated jobs.

Thus we need some persistence.

Our current haskell-bee design assumes that the Broker is simple: it can accept new messages and return the next message upon request. It is not a database: you cannot update existing messages, etc.

The idea is this:

  • create two queues, one for "slow" and one for "fast" jobs
  • "slow" jobs spawn subtasks and create some DB table/row so that the subtasks can report their progress there
  • the "slow" job then waits, polling the DB for changes
  • when all subtasks finish, the "slow" job continues with the aggregation etc.
  • the "slow" job is responsible for notifying the user
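The wait-and-poll step of this design can be sketched as follows. This is a self-contained mock: the DB table is replaced by a TVar counter, and the names (Progress, markDone, waitForSubtasks) are illustrative assumptions, not existing code:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (replicateM_, when)

-- The progress table is mocked by a TVar counter; in the real design
-- this would be a DB row that each subtask updates when it finishes.
data Progress = Progress { done :: TVar Int, total :: Int }

markDone :: Progress -> IO ()
markDone p = atomically $ modifyTVar' (done p) (+ 1)

-- The "slow" job blocks here until every subtask has reported in,
-- standing in for "polling the DB for changes".
waitForSubtasks :: Progress -> IO ()
waitForSubtasks p = atomically $ do
  n <- readTVar (done p)
  when (n < total p) retry

main :: IO ()
main = do
  p <- Progress <$> newTVarIO 0 <*> pure 3
  -- the "fast" fetch jobs, here running on their own threads
  replicateM_ 3 $ forkIO $ threadDelay 10000 >> markDone p
  waitForSubtasks p
  putStrLn "all subtasks finished; continuing with aggregation"
```

Against a real DB the `retry` would become a polling loop with a sleep interval, which is where the worker 'timeout' setting mentioned below becomes the safety net.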

There is also an alternative design:

  • the "slow" job works as before, but instead of sitting there waiting, it quits and reinserts itself at the end of the "slow" queue
  • the next "slow" job comes in and does the same
  • the "slow" job's responsibility is just to check whether all subjobs are finished; if they are, it continues with the rest of the execution
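A self-contained sketch of this respawning variant, with the broker mocked by an in-memory FIFO and the DB by an IORef counter (all names here are illustrative assumptions):

```haskell
import Data.IORef
import Control.Monad (replicateM_)

type Queue = IORef [IO ()]

sendJob :: Queue -> IO () -> IO ()
sendJob q job = modifyIORef' q (++ [job])

runWorker :: Queue -> IO ()
runWorker q = do
  jobs <- readIORef q
  case jobs of
    []       -> pure ()
    (j : js) -> writeIORef q js >> j >> runWorker q

-- The "slow" job checks the (mocked) DB; if subtasks remain, it
-- re-enqueues itself at the back instead of blocking the worker.
slowJob :: Queue -> IORef Int -> Int -> IO ()
slowJob q doneRef total = do
  n <- readIORef doneRef
  if n < total
    then sendJob q (slowJob q doneRef total)  -- respawn, freeing the worker
    else putStrLn "all subjobs finished; aggregating"

main :: IO ()
main = do
  q <- newIORef []
  doneRef <- newIORef (0 :: Int)
  let subtask = modifyIORef' doneRef (+ 1)
  sendJob q (slowJob q doneRef 3)  -- first check happens before any subtask
  replicateM_ 3 (sendJob q subtask)
  runWorker q
```

Note that each respawn creates a fresh job, which is the source of the messageId problem discussed below: the job's identity does not survive the re-enqueue.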

The first approach is simpler in that we have a 'timeout' setting for workers, so we could more easily kill such "slow" jobs when they hang. It is also somewhat easier to implement notifications with it (notifications carry a 'messageId' field taken from the job's id; with respawning, the "slow" job's id could change).

Reference: gargantext/haskell-gargantext#479