Implement temporary file storage
Let's base this on PostgreSQL's large objects:
- https://www.postgresql.org/docs/17/largeobjects.html
- https://hackage.haskell.org/package/postgresql-simple-0.7.0.0/docs/Database-PostgreSQL-Simple-LargeObjects.html
- https://www.percona.com/blog/how-to-remove-an-orphan-large-object-in-postgresql-with-vacuumlo/ (the `pg_largeobject` system table contains all stored files, keyed by their OIDs)
It's better to upload the file first and then create the processing task referencing only that file's OID; the resulting JSON payload for workers is much smaller (see the sketch below).
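
A minimal sketch of that flow with postgresql-simple, assuming pgmq's `pgmq.send(queue, msg)` function for enqueueing; the queue name `process_file` and the payload shape are placeholders, not anything we have today. Note that large-object calls only work inside a transaction:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import           Data.Int (Int64)
import           Database.PostgreSQL.Simple
import           Database.PostgreSQL.Simple.LargeObjects

-- Write the raw bytes as a new large object and return its OID.
-- Large-object operations must run inside a transaction.
storeFile :: Connection -> BS.ByteString -> IO Oid
storeFile conn bytes = withTransaction conn $ do
  oid <- loCreat conn
  fd  <- loOpen conn oid WriteMode
  _   <- loWrite conn fd bytes
  loClose conn fd
  pure oid

-- Enqueue a processing task whose JSON payload carries only the OID,
-- not the file contents. 'process_file' is a placeholder queue name.
enqueueTask :: Connection -> Oid -> IO ()
enqueueTask conn (Oid oid) = do
  _ <- query conn
         "SELECT pgmq.send('process_file', jsonb_build_object('file_oid', ?::int8))"
         (Only (fromIntegral oid :: Int64)) :: IO [Only Int64]
  pure ()
```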
NOTE that this is mostly for internal use: I expect to use this instead of tempfile (which is no longer reliable once workers run on different machines) and instead of serializing the whole payload into the pgmq job queue.
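
On the worker side, the handler would open the large object by the OID from the message, read it back, and unlink it once the task completes, so it doesn't linger as an orphan that vacuumlo would otherwise have to reap. A sketch (the 64 KiB chunk size is an arbitrary choice):

```haskell
import qualified Data.ByteString as BS
import           Database.PostgreSQL.Simple
import           Database.PostgreSQL.Simple.LargeObjects

-- Read a large object back in 64 KiB chunks.
fetchFile :: Connection -> Oid -> IO BS.ByteString
fetchFile conn oid = withTransaction conn $ do
  fd <- loOpen conn oid ReadMode
  let go acc = do
        chunk <- loRead conn fd 65536
        if BS.null chunk
          then pure (BS.concat (reverse acc))
          else go (chunk : acc)
  contents <- go []
  loClose conn fd
  pure contents

-- Unlink the object once the task is done, so it doesn't become
-- an orphan row in pg_largeobject.
deleteFile :: Connection -> Oid -> IO ()
deleteFile conn oid = withTransaction conn (loUnlink conn oid)
```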
Also, we could save some bandwidth by reworking the upload flow: upload the file first as raw binary, then create the task from the uploaded file. (Currently, for zip files, we base64-encode them on the frontend and decode them on the backend, which inflates the transfer by roughly a third.) A possible shape of the binary endpoint is sketched below.
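
A hypothetical shape of that binary-first endpoint, sketched with WAI (our actual HTTP layer may differ; `storeFile` is the helper from the first sketch above, and the response format is made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import           Database.PostgreSQL.Simple (Connection)
import           Database.PostgreSQL.Simple.LargeObjects (Oid (Oid))
import           Network.HTTP.Types (status200)
import           Network.Wai (Application, responseLBS, strictRequestBody)

-- Accept the raw request body (application/octet-stream) and store it
-- directly as a large object: no base64 round-trip on either side.
-- 'storeFile' is the helper defined in the first sketch.
uploadApp :: Connection -> Application
uploadApp conn req respond = do
  body    <- strictRequestBody req   -- raw zip bytes from the client
  Oid oid <- storeFile conn (BL8.toStrict body)
  respond $ responseLBS status200
    [("Content-Type", "application/json")]
    (BL8.pack ("{\"file_oid\": " ++ show oid ++ "}"))
```

The frontend would then POST the file as the raw request body instead of embedding a base64 string in JSON.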