Transaction API for DB Transactions
This MR is a work in progress towards #411, to model our DB operations in a transactional way. @cgenie I'm trying my best as part of this MR description to state the problem and the solution, hopefully it will ease the review of this.
Problem description
Before this MR, our queries and updates were not "composable" when it came to ACID, let me give you an example. Let's consider:
pureQueryExample :: HasNodeError err
=> NodeId
-> User
-> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = do
nodeToCheck <- getNode n
if (fancyField $ nodeToCheck == 42) then updateUser u ... else getUserId u
Our GGTX queries were not running DB operations like getNode
and updateUser
in a single PG transaction, meaning that in a highly concurrent environment we could be reading Node
n
and by the time we get past the successful check, into the then
branch, some other actor might have modified fancyField
of Node
n
, meaning we would be running a destructive operation (an update) or a precondition that has now been violated.
Not only that, but our "runners" had the wrong granularity: we had functions like runPGSQuery
, execPGSQuery
& co which would all run in the same monad DBCmd
, meaning we couldn't have the type system enforce that they all compose together in a single transaction, unless we deliberately tried to do so, which is very error prone as it's enough to forget putting a withTransaction
at the top or, worse, using too many withTransaction
leading to deadlock (AFAIK, in PG you can't nest transactions).
Solution
The solution I came up with is an API that does a few things:
- It distinguishes DB reads from DB writes, so that we can make decisions based on that (see later);
- It allows DB operations to be composed;
- It actually "fires" the DB operations only when we call
runDBQuery
orrunDBTx
, and these evaluators also ensures that we callwithTransaction
only in once place, at the right time.
Regarding 1.
distinguishing reads from writes is nice for a couple of reasons:
We can't accidentally mix updates and reads
If we try to write:
pureQueryExample :: HasNodeError err
=> NodeId
-> User
-> DBCmd err (Node JSON.Value, UserId)
pureQueryExample n u = runDBQuery $ do
nodeToCheck <- getNode n
userIdCheck <- getUserId u
shareNode (SourceId $ _node_id nodeToCheck) (TargetId $ _node_id nodeToCheck)
pure (nodeToCheck, userIdCheck)
We would get:
src/Gargantext/Database/Transactional/Example.hs:35:3: error: [GHC-83865]
• Couldn't match type ‘Gargantext.Database.Transactional.DBWrite’
with ‘Gargantext.Database.Transactional.DBRead’
Expected: DBTx err Gargantext.Database.Transactional.DBRead Int
Actual: DBUpdate err Int
This is because we are trying to run a query but we have sneaked in a DBUpdate
, so the type system rejects that
read-only
PG transactions
We can bundle queries into Postgres supports "Read-only" transaction, which IIUC are a sort of "lighter" transactions, and we can exploit this, thanks to the fact that the type system will ensure we are calling runDBQuery
on read-only operations.
Where do we go from here?
Assuming the approach makes sense, this is a WIP MR because this just specs the API, but it doesn't actually use it. In order to benefit from this, we should:
-
Convert existing DB operations using this API. I have sketched out a few examples in the
Transactional.Example
module; - We should add tests, to ensure that the DB operation still works and that we get genuine DB rollbacks if an exception in thrown;
-
Deprecate and remove old runners like
runPGSQuery
& co.
NOTA BENE: In order for us to benefit from this API we need to be diligent on two fronts:
- We need to be careful when refactoring and/or adding new DB Operations -- they need to match exactly what they say on the tin (i.e. a PG
sql
quasi quoter doing updates needs to be marked as aDBUpdate
etc); - We need to compose as much as possible and delay running DB ops as much as possible, ideally only at the handler level. What I mean by that is that if we start littering our code with
runDBQuery
andrunDBTx
after each DB operation we will be back at square one, we will be losing transactional safety. What we need to try hard to do is to compose operations as much as possible running in theDBTx
monad, and run them "at the very end" in the Servant handler that is serving the request. That might be tricky to do sometimes, and if that's the case the solution is adding more operations to theDBTransactionOp
and make our monad more powerful or do a case-by-case analysis where it's actually fine to partially run operations.
@cgenie I would be curious to know what you think, I hope that what I'm proposing makes sense!