    [worker] fix an unfortunate coincidence of various async issues · 406cd89e
    Przemyslaw Kaminski authored
    This is described in this comment:
    #477 (comment 14458)
    
    I repaste it here, for history:
    
    - the job timeout was only 30 seconds and this was a big zip file, so the job timed out in the worker
    - however, this was added recently: https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blame/dev/src/Gargantext/Database/Action/Flow.hs#L490; because of it, the timeout wasn't caught and the worker happily continued
    - the job (most probably) finished normally
    - the job was then restarted, because the default strategy for timeouts is to restart the job
    - for sending files, we use Postgres large objects, because that keeps our JSONs small (see the sketch after this list)
    - when the job finishes, it deletes the large object for good, so that we don't leave large, unused blob data behind
    - however, that job was restarted, and there was no longer a large object to work on
    - you got an SQL error, but that wasn't the root cause
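
    For background, a minimal sketch of the large-object round trip described above, using postgresql-simple's `Database.PostgreSQL.Simple.LargeObjects`. This is not the actual Flow.hs code; `storeFile` and `cleanupFile` are hypothetical names and the payload handling is simplified.

    ```haskell
    import qualified Data.ByteString as BS
    import Database.PostgreSQL.Simple (Connection, withTransaction)
    import Database.PostgreSQL.Simple.LargeObjects
    import System.IO (IOMode (WriteMode))

    -- Store the file as a large object and return its Oid: the job's JSON
    -- then only carries the small Oid instead of the whole payload.
    storeFile :: Connection -> BS.ByteString -> IO Oid
    storeFile conn payload = withTransaction conn $ do
      oid <- loCreat conn
      fd  <- loOpen conn oid WriteMode
      _   <- loWrite conn fd payload
      loClose conn fd
      pure oid

    -- When the job finishes, delete the large object so no unused blob data
    -- is left behind.  If the same job is restarted afterwards, the Oid no
    -- longer points at anything, which is the failure mode described above.
    cleanupFile :: Connection -> Oid -> IO ()
    cleanupFile conn oid = withTransaction conn (loUnlink conn oid)
    ```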
    
    The solution is:
    - don't catch every exception; instead, carefully handle `Timeout` or `KillWorkerSafely` (see the sketch below)
    - increase the job timeout for file upload
    - change the timeout strategy for file upload to `TSDelete`, i.e. don't retry that job anymore
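
    To illustrate the first point, a minimal sketch of handling only specific exceptions with base's `Control.Exception.catches`; `Timeout` and `KillWorkerSafely` here are stand-ins for the worker's real exception types, and `runJob` is a hypothetical wrapper, not the actual worker code.

    ```haskell
    import Control.Exception (Exception, Handler (..), catches)

    -- Stand-ins for the worker's exception types; the real Timeout and
    -- KillWorkerSafely are defined in the worker code.
    data Timeout = Timeout deriving (Show)
    instance Exception Timeout

    data KillWorkerSafely = KillWorkerSafely deriving (Show)
    instance Exception KillWorkerSafely

    -- Run a job, handling only the exceptions we know how to deal with.
    -- Anything else (e.g. the SQL error from the missing large object)
    -- propagates instead of being silently swallowed.
    runJob :: IO () -> IO ()
    runJob job =
      job `catches`
        [ Handler $ \Timeout          -> putStrLn "job timed out: clean up and report"
        , Handler $ \KillWorkerSafely -> putStrLn "worker shutting down: requeue gracefully"
        ]
    ```

    With `TSDelete`, a timed-out upload job is dropped rather than requeued, so the large-object cleanup can only ever run once per upload.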