    [worker] fix an unfortunate coincidence of various async issues · 406cd89e
    Przemyslaw Kaminski authored
This is described in this comment:
    #477 (comment 14458)
    
I repaste it here, for the record:
    
- the job timeout was only 30 seconds and this was a big zip file, so the job timed out in the worker
- however, this was recently added https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blame/dev/src/Gargantext/Database/Action/Flow.hs#L490 and the timeout wasn't caught there, so the worker happily continued
- the job (most probably) finished normally
- the job was then restarted, because the default strategy for timeouts is to restart the job
- for sending files, we use PostgreSQL large objects, because that keeps our JSONs small
- when the job finishes, it permanently deletes the large object, so that we don't leave large, unused blob data behind
- however, that job was restarted, and there was no longer a large object to work on
- you got an SQL error, but that wasn't the root cause
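
The failure mode in the middle of that timeline can be sketched in Haskell (hypothetical names, not the actual Flow.hs code): wrapping the whole job body in a catch-all handler also traps the asynchronous timeout exception, so the worker reports success for a job that was in fact killed.

```haskell
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical stand-in for the worker's job body; the real code
-- lives in Gargantext.Database.Action.Flow.
runJobSwallowingEverything :: IO () -> IO ()
runJobSwallowingEverything job = do
  result <- try job :: IO (Either SomeException ())
  case result of
    -- A catch-all like this also traps the Timeout thrown by the
    -- job runner, so the worker "continues happily".
    Left _err -> putStrLn "exception swallowed, worker continues"
    Right ()  -> putStrLn "job done"

main :: IO ()
main = runJobSwallowingEverything (throwIO (userError "simulated timeout"))
```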
    
The solution is:
- don't catch every exception blindly; instead, carefully handle `Timeout` and `KillWorkerSafely`
- increase the job timeout for file upload
- change the timeout strategy for file upload to `TSDelete`, i.e. don't retry that job anymore
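
The first bullet can be sketched as follows, with hypothetical exception types standing in for the worker library's real `Timeout` and `KillWorkerSafely`: catch ordinary job failures, but rethrow the control exceptions so the job runner can see the timeout and apply its strategy.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
  (Exception, SomeException, fromException, throwIO, try)

-- Hypothetical stand-ins for the worker library's control exceptions.
data Timeout = Timeout deriving Show
instance Exception Timeout

data KillWorkerSafely = KillWorkerSafely deriving Show
instance Exception KillWorkerSafely

-- Catch ordinary failures, but rethrow Timeout and KillWorkerSafely
-- so the runner can act on them instead of seeing a "successful" job.
runJobCarefully :: IO a -> IO (Either SomeException a)
runJobCarefully job = do
  result <- try job
  case result of
    Right x -> pure (Right x)
    Left e
      | Just (t :: Timeout)          <- fromException e -> throwIO t
      | Just (k :: KillWorkerSafely) <- fromException e -> throwIO k
      | otherwise -> pure (Left e)  -- ordinary failure: report it, don't crash
```

With the longer timeout and `TSDelete`, a job that still exceeds its budget is dropped rather than restarted, so it can never run a second time against a large object that the first run already deleted.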