Transaction API for DB Transactions (!406) · Merge Requests · gargantext / haskell-gargantext

Transaction API for DB Transactions

This MR is a work in progress towards #411, to model our DB operations in a transactional way. @cgenie I'm trying my best as part of this MR description to state the problem and the solution, hopefully it will ease the review of this.

Problem description

Before this MR, our queries and updates were not "composable" when it came to ACID, let me give you an example. Let's consider:

pureQueryExample :: HasNodeError err
                => NodeId
                -> User
                -> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = do
  nodeToCheck <- getNode n
  if (fancyField $ nodeToCheck == 42) then updateUser u ... else getUserId u

Our GGTX queries were not running DB operations like getNode and updateUser in a single PG transaction, meaning that in a highly concurrent environment we could be reading Node n and by the time we get past the successful check, into the then branch, some other actor might have modified fancyField of Node n, meaning we would be running a destructive operation (an update) or a precondition that has now been violated.

Not only that, but our "runners" had the wrong granularity: we had functions like runPGSQuery , execPGSQuery & co which would all run in the same monad DBCmd, meaning we couldn't have the type system enforce that they all compose together in a single transaction, unless we deliberately tried to do so, which is very error prone as it's enough to forget putting a withTransaction at the top or, worse, using too many withTransaction leading to deadlock (AFAIK, in PG you can't nest transactions).

Solution

The solution I came up with is an API that does a few things:

It distinguishes DB reads from DB writes, so that we can make decisions based on that (see later);
It allows DB operations to be composed;
It actually "fires" the DB operations only when we call runDBQuery or runDBTx, and these evaluators also ensures that we call withTransaction only in once place, at the right time.

Regarding 1. distinguishing reads from writes is nice for a couple of reasons:

We can't accidentally mix updates and reads

If we try to write:

pureQueryExample :: HasNodeError err
                => NodeId
                -> User
                -> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = runDBQuery $ do
  nodeToCheck <- getNode n
  userIdCheck <- getUserId u
  shareNode (SourceId $ _node_id nodeToCheck) (TargetId $ _node_id nodeToCheck)
  pure (nodeToCheck, userIdCheck)

We would get:

src/Gargantext/Database/Transactional/Example.hs:35:3: error: [GHC-83865]
    • Couldn't match type ‘Gargantext.Database.Transactional.DBWrite’
                     with ‘Gargantext.Database.Transactional.DBRead’
      Expected: DBTx err Gargantext.Database.Transactional.DBRead Int
        Actual: DBUpdate err Int

This is because we are trying to run a query but we have sneaked in a DBUpdate, so the type system rejects that

We can bundle queries into `read-only` PG transactions

Postgres supports "Read-only" transaction, which IIUC are a sort of "lighter" transactions, and we can exploit this, thanks to the fact that the type system will ensure we are calling runDBQuery on read-only operations.

Where do we go from here?

Assuming the approach makes sense, this is a WIP MR because this just specs the API, but it doesn't actually use it. In order to benefit from this, we should:

Convert existing DB operations using this API. I have sketched out a few examples in the Transactional.Example module;
We should add tests, to ensure that the DB operation still works and that we get genuine DB rollbacks if an exception in thrown;
Deprecate and remove old runners like runPGSQuery & co.

NOTA BENE: In order for us to benefit from this API we need to be diligent on two fronts:

We need to be careful when refactoring and/or adding new DB Operations -- they need to match exactly what they say on the tin (i.e. a PG sql quasi quoter doing updates needs to be marked as a DBUpdate etc);
We need to compose as much as possible and delay running DB ops as much as possible, ideally only at the handler level. What I mean by that is that if we start littering our code with runDBQuery and runDBTx after each DB operation we will be back at square one, we will be losing transactional safety. What we need to try hard to do is to compose operations as much as possible running in the DBTx monad, and run them "at the very end" in the Servant handler that is serving the request. That might be tricky to do sometimes, and if that's the case the solution is adding more operations to the DBTransactionOp and make our monad more powerful or do a case-by-case analysis where it's actually fine to partially run operations.

@cgenie I would be curious to know what you think, I hope that what I'm proposing makes sense!

## Problem description

Before this MR, our queries and updates were not "composable" when it came to ACID, let me give you an example. Let's consider:

```hs
pureQueryExample :: HasNodeError err
                => NodeId
                -> User
                -> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = do
  nodeToCheck <- getNode n
  if (fancyField $ nodeToCheck == 42) then updateUser u ... else getUserId u
```

Our GGTX queries were not running DB operations like `getNode` and `updateUser` in a single PG transaction, meaning that in a highly concurrent environment we could be reading `Node` `n` and by the time we get _past_ the successful check, into the `then` branch, _some other actor_ might have modified `fancyField` of `Node` `n`, meaning we would be running a destructive operation (an update) or a precondition that has now _been violated_.

Not only that, but our "runners" had the wrong granularity: we had functions like `runPGSQuery` , `execPGSQuery` & co which would all run in the same monad `DBCmd`, meaning we couldn't have the type system enforce that they all compose together in a single transaction, unless we deliberately tried to do so, which is very error prone as it's enough to forget putting a `withTransaction` at the top or, worse, using too many `withTransaction` leading to deadlock (AFAIK, in PG you can't nest transactions).

## Solution

The solution I came up with is an API that does a few things:

1. It distinguishes DB reads from DB writes, so that we can make decisions based on that (see later);
2. It allows DB operations to be composed;
3. It actually "fires" the DB operations only when we call `runDBQuery` or `runDBTx`, and these evaluators also ensures that we call `withTransaction` only in once place, at the right time.

Regarding `1.` distinguishing reads from writes is nice for a couple of reasons:

### We can't accidentally mix updates and reads

If we try to write:

```hs
pureQueryExample :: HasNodeError err
                => NodeId
                -> User
                -> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = runDBQuery $ do
  nodeToCheck <- getNode n
  userIdCheck <- getUserId u
  shareNode (SourceId $ _node_id nodeToCheck) (TargetId $ _node_id nodeToCheck)
  pure (nodeToCheck, userIdCheck)
```

We would get:

```
src/Gargantext/Database/Transactional/Example.hs:35:3: error: [GHC-83865]
    • Couldn't match type ‘Gargantext.Database.Transactional.DBWrite’
                     with ‘Gargantext.Database.Transactional.DBRead’
      Expected: DBTx err Gargantext.Database.Transactional.DBRead Int
        Actual: DBUpdate err Int
```

This is because we are trying to run a query but we have sneaked in a `DBUpdate`, so the type system rejects that

### We can bundle queries into `read-only` PG transactions

## Where do we go from here?

Assuming the approach makes sense, this is a WIP MR because this just specs the API, but it doesn't actually use it. In order to benefit from this, we should:

- [ ] Convert existing DB operations using this API. I have sketched out a few examples in the `Transactional.Example` module;
- [x] We should add tests, to ensure that the DB operation still works and that we get genuine DB rollbacks if an exception in thrown;
- [ ] Deprecate and remove old runners like `runPGSQuery` & co.

**NOTA BENE:** In order for us to benefit from this API we need to be diligent on two fronts:

1. We need to be careful when refactoring and/or adding new DB Operations -- they need to match exactly what they say on the tin (i.e. a PG `sql` quasi quoter doing updates needs to be marked as a `DBUpdate` etc);
2. We need to compose as much as possible and delay running DB ops as much as possible, ideally only at the handler level. What I mean by that is that if we start littering our code with `runDBQuery` and `runDBTx` after each DB operation we will be back at square one, we will be losing transactional safety. What we need to try hard to do is to compose operations as much as possible running in the `DBTx` monad, and run them "at the very end" in the Servant handler that is serving the request. That might be _tricky to do sometimes_, and if that's the case the solution is adding more operations to the `DBTransactionOp` and make our monad more powerful or do a case-by-case analysis where it's actually fine to partially run operations.

@cgenie I would be curious to know what you think, I hope that what I'm proposing makes sense!

Edited Apr 28, 2025 by Alfredo Di Napoli

Transaction API for DB Transactions

Problem description

Solution

We can't accidentally mix updates and reads

We can bundle queries into read-only PG transactions

Where do we go from here?

Check out, review, and merge locally

Revert this merge request

Cherry-pick this merge request

We can bundle queries into `read-only` PG transactions