[worker] fix an unfortunate coincidence of various async issues
This is described in this comment: #477 (comment 14458)

I repaste it here, for history:
- the job timeout was only 30 seconds and this is a big zip file, so the job timed out in the worker
- however, this was recently added https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blame/dev/src/Gargantext/Database/Action/Flow.hs#L490 and the timeout wasn't caught and the worker continued happily
- the job finished normally (most probably)
- the job was restarted, because the default strategy for timeouts is to restart a job
- for sending files, we use postgres large objects because it keeps our JSONs small
- when the job finishes, it permanently deletes the large object so that we don't leave large, unused blob data behind
- however, that job was restarted and there was no longer a large object to work on
- you got some SQL error, but that wasn't the root cause

Solution is:
- don't catch every exception blindly; be careful to handle `Timeout` and `KillWorkerSafely`
- increase the job timeout for file upload
- change the timeout strategy for file upload to `TSDelete`, i.e. don't retry that job anymore
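
The first point of the solution can be sketched as follows. This is only an illustration of one reading of it, not the code of this commit: `WorkerControl` and `runGuarded` are hypothetical names, and the real `Timeout` and `KillWorkerSafely` exceptions come from the worker library and may be defined differently. The idea is that a handler may still turn ordinary failures into a value, as long as it re-throws the worker's control exceptions instead of swallowing them.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Exception
  ( Exception, SomeException, fromException, throwIO, try )

-- Hypothetical stand-in for the worker's control exceptions.
-- The real types are defined by the worker library, not here.
data WorkerControl = Timeout | KillWorkerSafely
  deriving (Show, Eq)

instance Exception WorkerControl

-- Run an action and report ordinary failures as 'Left', but re-throw
-- worker control exceptions so the worker still sees the timeout.
runGuarded :: IO a -> IO (Either String a)
runGuarded action = do
  result <- try action
  case result of
    Right x -> pure (Right x)
    Left (e :: SomeException) ->
      case fromException e of
        Just (ctl :: WorkerControl) -> throwIO ctl      -- never swallow these
        Nothing                     -> pure (Left (show e))

main :: IO ()
main = do
  -- An ordinary failure is captured and reported as a value.
  r1 <- runGuarded (throwIO (userError "bad row") :: IO ())
  print r1
  -- A control exception escapes 'runGuarded' and reaches the caller.
  r2 <- try (runGuarded (throwIO Timeout :: IO ()))
  case r2 of
    Left (ctl :: WorkerControl) -> putStrLn ("propagated: " ++ show ctl)
    Right _                     -> putStrLn "swallowed (bug)"
```

With a handler shaped like this, a timed-out job stops where the worker expects it to, so the timeout strategy (`TSDelete` for file upload) decides what happens next instead of the job running to completion and being restarted afterwards.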