[idea] Multi-tenant setup for the postgres DB (#218) · Issues · gargantext / haskell-gargantext

[idea] Multi-tenant setup for the postgres DB

#199 (comment 7479) expanded into a separate issue.

The idea is to separate users in the DB by using separate schemas per user. Most projects just use the public postgres schema, forgetting that they can be created and used with ease.

https://www.postgresql.org/docs/current/ddl-schemas.html

Thus, each user (or, in general terms, a tenant) has his own schema in postgres. There are global tables (in the public schema) and there are tenant tables (all of which should have the same structure). So, for example, we could have a global list of documents but a tenant-only list of ngrams or ngram<->document, or document<->list tables. This allows to split user tables and gives better performance (queries are for separate tables, we can refresh materialized views without blocking others). The tables are also smaller. At the same time, they have the same schema so the code for handling them is not changed much.

I have used such a setup with django-tenants https://django-tenants.readthedocs.io/en/latest/ with much success, in a commercial project from the past. The advantage of django-tenants was that it was possible to "narrow down" specific piece of code to a specific tenant. This was done by setting a connection's search path:

https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH

More or less the code looks like this: withTenant conn tenantX $ do ... and withTenant is something like Haskell's bracket, i.e. setup is setting search path to match tenant schema first, then the public schema, and bracket exit is setting it back to public only. This way the code sees tenant tables, and the public one, but doesn't see other tenants (unless someone forces to do so in the code by using select * from other_tenant.table.

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/199#note_7479 expanded into a separate issue.

The idea is to separate users in the DB by using separate schemas per user. Most projects just use the `public` postgres schema, forgetting that they can be created and used with ease.

https://www.postgresql.org/docs/current/ddl-schemas.html

Thus, each user (or, in general terms, a tenant) has his own schema in postgres. There are global tables (in the `public` schema) and there are tenant tables (all of which should have the same structure). So, for example, we could have a global list of documents but a tenant-only list of ngrams or ngram<->document, or document<->list tables. This allows to split user tables and gives better performance (queries are for separate tables, we can refresh materialized views without blocking others). The tables are also smaller. At the same time, they have the same schema so the code for handling them is not changed much.

https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH

More or less the code looks like this: `withTenant conn tenantX $ do ...` and `withTenant` is something like Haskell's `bracket`, i.e. setup is setting search path to match tenant schema first, then the public schema, and bracket exit is setting it back to public only. This way the code sees tenant tables, and the public one, but doesn't see other tenants (unless someone forces to do so in the code by using `select * from other_tenant.table`.