Implement temporary file storage
Let's base this on PostgreSQL's large objects:
- https://www.postgresql.org/docs/17/largeobjects.html
- https://hackage.haskell.org/package/postgresql-simple-0.7.0.0/docs/Database-PostgreSQL-Simple-LargeObjects.html
- https://www.percona.com/blog/how-to-remove-an-orphan-large-object-in-postgresql-with-vacuumlo/ (the `pg_largeobject` system table contains all stored files, keyed by their OIDs)
It's better to upload the file first and then create the processing task referencing only that file's OID; the resulting JSON payload for workers is much smaller (see the sketch below).
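
A minimal sketch of that flow with postgresql-simple, assuming pgmq's `pgmq.send(queue, msg)` function for enqueueing; the queue name `process_file` and the payload shape are placeholders, not anything we have today. Note that large-object calls only work inside a transaction:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import           Data.Int (Int64)
import           Database.PostgreSQL.Simple
import           Database.PostgreSQL.Simple.LargeObjects

-- Write the raw bytes as a new large object and return its OID.
-- Large-object operations must run inside a transaction.
storeFile :: Connection -> BS.ByteString -> IO Oid
storeFile conn bytes = withTransaction conn $ do
  oid <- loCreat conn
  fd  <- loOpen conn oid WriteMode
  _   <- loWrite conn fd bytes
  loClose conn fd
  pure oid

-- Enqueue a processing task whose JSON payload carries only the OID,
-- not the file contents. 'process_file' is a placeholder queue name.
enqueueTask :: Connection -> Oid -> IO ()
enqueueTask conn (Oid oid) = do
  _ <- query conn
         "SELECT pgmq.send('process_file', jsonb_build_object('file_oid', ?::int8))"
         (Only (fromIntegral oid :: Int64)) :: IO [Only Int64]
  pure ()
```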
NOTE that this is mostly for internal use: I expect to use this instead of tempfile (which is no longer reliable once workers run on different machines) and instead of serializing the whole payload into the pgmq job queue.
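
On the worker side, the handler would open the large object by the OID from the message, read it back, and unlink it once the task completes, so it doesn't linger as an orphan that vacuumlo would otherwise have to reap. A sketch (the 64 KiB chunk size is an arbitrary choice):

```haskell
import qualified Data.ByteString as BS
import           Database.PostgreSQL.Simple
import           Database.PostgreSQL.Simple.LargeObjects

-- Read a large object back in 64 KiB chunks.
fetchFile :: Connection -> Oid -> IO BS.ByteString
fetchFile conn oid = withTransaction conn $ do
  fd <- loOpen conn oid ReadMode
  let go acc = do
        chunk <- loRead conn fd 65536
        if BS.null chunk
          then pure (BS.concat (reverse acc))
          else go (chunk : acc)
  contents <- go []
  loClose conn fd
  pure contents

-- Unlink the object once the task is done, so it doesn't become
-- an orphan row in pg_largeobject.
deleteFile :: Connection -> Oid -> IO ()
deleteFile conn oid = withTransaction conn (loUnlink conn oid)
```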
Also, we could save some bandwidth by reworking the upload flow: upload the file first as raw binary, then create the task from the uploaded file. (Currently, for zip files, we base64-encode them on the frontend and decode them on the backend, which inflates the transfer by roughly a third.) A possible shape of the binary endpoint is sketched below.
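
A hypothetical shape of that binary-first endpoint, sketched with WAI (our actual HTTP layer may differ; `storeFile` is the helper from the first sketch above, and the response format is made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import           Database.PostgreSQL.Simple (Connection)
import           Database.PostgreSQL.Simple.LargeObjects (Oid (Oid))
import           Network.HTTP.Types (status200)
import           Network.Wai (Application, responseLBS, strictRequestBody)

-- Accept the raw request body (application/octet-stream) and store it
-- directly as a large object: no base64 round-trip on either side.
-- 'storeFile' is the helper defined in the first sketch.
uploadApp :: Connection -> Application
uploadApp conn req respond = do
  body    <- strictRequestBody req   -- raw zip bytes from the client
  Oid oid <- storeFile conn (BL8.toStrict body)
  respond $ responseLBS status200
    [("Content-Type", "application/json")]
    (BL8.pack ("{\"file_oid\": " ++ show oid ++ "}"))
```

The frontend would then POST the file as the raw request body instead of embedding a base64 string in JSON.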