Weird behaviour when rolling back Postgres transactions
When working on #476, I had some code written like this:
```haskell
insertMasterDoc :: ( HasNodeError err
                   , UniqParameters doc
                   , FlowCorpus doc
                   , MkCorpus c
                   )
                => GargConfig
                -> UncommittedNgrams doc
                -- ^ The ngrams extracted for /all/ the documents
                -- and indexed by the hash of the given document.
                -- We can use this map to associate the document
                -- with the node being created.
                -> Maybe c
                -> doc
                -> DBUpdate err DocId
insertMasterDoc cfg uncommittedNgrams c h = do
  (masterUserId, _, masterCorpusId) <- getOrMkRootWithCorpus cfg MkCorpusUserMaster c
  documentWithId <- insertDoc masterUserId masterCorpusId (toNode masterUserId Nothing h)
  _ <- Doc.add masterCorpusId [_index documentWithId]
  -- TODO
  -- create a corpus with database name (CSV or PubMed)
  -- add documents to the corpus (create node_node link)
  -- this will enable global database monitoring
  case commitNgramsForDocument uncommittedNgrams documentWithId of
    Left failed ->
      -- this allows rolling back the whole tx, and we won't leave dangling nodes behind.
      nodeError $ NodeCreationFailed (InsertDocFailed failed)
    Right ngramsDocsMap -> do
      lId <- getOrMkList masterCorpusId masterUserId
      _ <- saveDocNgramsWith lId ngramsDocsMap
      pure $ contextId2NodeId $ _index documentWithId
```
The call to `nodeError` in the middle of that `DBUpdate` transaction should have the effect of rolling back the entire transaction, meaning that we shouldn't create a document node if we cannot extract ngrams out of it, as that would leave dangling state behind (that's an arguable design decision which I think I have changed my mind about, but that's orthogonal to this issue).
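For context, the behaviour I was relying on is the one postgresql-simple's `withTransaction` gives you: if the inner action throws, the transaction is rolled back before the exception is rethrown. A minimal sketch of that mechanism (the `nodes` table and the use of `fail` are stand-ins for illustration, not Gargantext code):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (SomeException, try)
import Database.PostgreSQL.Simple

demoRollback :: Connection -> IO ()
demoRollback conn = do
  r <- try $ withTransaction conn $ do
         _ <- execute_ conn "INSERT INTO nodes (name) VALUES ('dangling?')"
         fail "ngrams extraction failed"  -- plays the role of 'nodeError'
  case (r :: Either SomeException ()) of
    Left _  -> pure ()  -- the INSERT was rolled back together with the failure
    Right _ -> pure ()  -- not reached: the inner action always throws
```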
Then, I had a test like this:
```haskell
corpusAddDocBatchSurvival :: TestEnv -> Assertion
corpusAddDocBatchSurvival env = runTestMonad env $ do
  cfg      <- view hasConfig
  parentId <- runDBQuery $ getRootId (UserName userMaster)
  [corpus] <- runDBQuery $ getCorporaWithParentId parentId
  let la = Mono EN
  nlp <- view (nlpServerGet (_tt_lang la))
  oldCount <- docsInCorpusCount
  let docs = [exampleDocument_06, exampleDocument_07, exampleDocument_08]
  uNgrams1 <- extractNgramsFromDocument nlp la exampleDocument_06
  uNgrams2 <- extractNgramsFromDocument nlp la exampleDocument_08
  (failures, savedDocs) <- insertMasterDocs cfg (uNgrams1 <> uNgrams2) (Just $ _node_hyperdata corpus) docs
  c' <- docsInCorpusCount
  liftIO $ do
    failures `shouldBe` [DocumentInsertionError "InternalNodeError Cannot make node due to: Couldn't find the associated ngrams for document with hash \\x666b8e7bfd7c0af37d630e1097791f7ba438a669ecb6d1cb38014edd0b7a2977, therefore the added document won't have any ngrams."]
    c' `shouldBe` oldCount + 2
    savedDocs `shouldBe` [ UnsafeMkNodeId 5, UnsafeMkNodeId 6 ]
```
To my surprise, this test fails with:
```
test/Test/Database/Operations/DocumentSearch.hs:235:15:
1) Database, With test user, With test corpus, With test documents, Can survive failures in a batch
       expected: [nodeId-5,nodeId-6]
        but got: [nodeId-5,nodeId-7]
```
This test simulates adding 3 documents, with the middle one having no ngrams, so the system should persist only the first and the last. Instead, it actually persisted all three of them, despite the call to `nodeError` that was supposed to roll back the transaction.
While looking at the implementation of `evalOp` in the `.Transactional` module I thought I had discovered the bug: in the `DBFail` case we were using `throwError`, but really we should throw an exception in `IO`, so that the exception handler installed by `withTransaction` would kick in and roll back the transaction.
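In other words, my hypothesis was that the runner was shaped roughly like this sketch (hypothetical shape and names, not the actual `.Transactional` code), where `throwError` merely returns a `Left` and `withTransaction` happily commits:

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Database.PostgreSQL.Simple (Connection, withTransaction)

-- If the runner looks like this, 'throwError' never raises an IO exception:
-- the failure travels inside the returned 'Either', so 'withTransaction'
-- sees a normally-completing action and issues a COMMIT anyway.
runTxSketch :: Connection -> ExceptT String IO a -> IO (Either String a)
runTxSketch conn body = withTransaction conn (runExceptT body)

failing :: ExceptT String IO ()
failing = throwError "DBFail"  -- just a 'Left'; no rollback is triggered
```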
However, after I made those changes the bug was still there -- I even changed the type of `evalOp` to make it monomorphic over `IO`, but without luck.
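The change was along these lines (illustrative names, not the actual diff): wrap the failure in an `Exception` and `throwIO` it, so that the handler installed by `withTransaction` can catch it and issue a ROLLBACK before rethrowing:

```haskell
import Control.Exception (Exception, throwIO)

-- Hypothetical wrapper for the transactional failure.
newtype DBTxFailure = DBTxFailure String
  deriving Show

instance Exception DBTxFailure

-- Raising the failure in IO (instead of 'throwError') means it propagates
-- as a real exception through 'withTransaction', triggering its rollback.
raiseDBFail :: String -> IO a
raiseDBFail = throwIO . DBTxFailure
```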
There must be something glaringly obvious (or very subtle) that I'm missing, as we have tests asserting that a `DBTx` can roll back, and they pass.