Squashed commit of the following:

commit be879b1e
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Thu Jan 30 18:22:44 2025 +0100

    [ngrams] code fixes according to review

    Related MR:
    !378

commit bf89561b
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 21:11:47 2025 +0100

    [test] notification on node move

    Also, some small refactorings.

commit 3d5d74ab
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 20:13:44 2025 +0100

    [tests] add notifications func comment, fix core/notifications indent

commit b8ea3af2
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 19:13:35 2025 +0100

    [update-project-dependencies]

commit 1217baf4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 19:09:17 2025 +0100

    [tests] notifications: test async notifications for update tree

    Related to
    #418

commit 874785e9
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 21 06:06:27 2025 +0100

    [refactor] unify Database & ExternalIDs

    These types are the same, except for Database.Empty

    I managed to have backwards compatibility with the frontend format,
    hence the frontend doesn't need any changes.

commit e7b16520
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 21 06:05:57 2025 +0100

    [cabal] upgrade haskell-bee to fix TSRetry and ESRepeat issues

commit ad045ae0
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 20 06:32:49 2025 +0100

    [cabal] upgrade haskell-bee tag

commit b3910bb4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 10:56:12 2025 +0100

    [test] move some Arbitrary instances to Test/Instances.hs

commit bb282d02
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 09:17:23 2025 +0100

    [test] WithQuery offline test (with EPO constructor)

commit c0fe2e51
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 06:59:45 2025 +0100

    [query] move EPO user/token into the datafield

    This simplifies the WithQuery structure even more

commit 93586adc
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 13 17:45:42 2025 +0100

    [tests] fix WithQuery frontend serialization test

    Also, add WithQuery pubmed test (with api_key)

commit bc29319c
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 13 10:13:15 2025 +0100

    [ngrams] simplify WithQuery json structure

    There is now only a 'datafield' field, no need for duplicated
    'database'.

    Related to #441

commit e6fdbee4
Merge: 95dc32b3 13457ca8
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Fri Jan 10 12:03:59 2025 +0100

    Merge branch 'dev' into 224-dev-understanding-ngrams

commit 95dc32b3
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 20:01:11 2025 +0100

    [ngrams] refactor PubMed DB type (to include Maybe APIKey)

commit baa2491f
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 18:09:04 2025 +0100

    [refactor] searx search refactoring

commit fcf83bf7
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 11:14:03 2025 +0100

    [ngrams] more types annotations

commit 0d8a77c4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 16:15:07 2024 +0100

    [ngrams, test] refactor: Count -> Terms

commit 85f1dffe
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 14:35:05 2024 +0100

    [ngrams] refactor opaque Int into TermsWeight newtype

commit a81ea049
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 14:34:39 2024 +0100

    [CLI] fix limit removal

    It wasn't used anyways.

commit d1dfbf79
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:35:41 2024 +0100

    [ngrams] one more simplification in ngramsByDoc'

commit fcb48b8f
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:33:33 2024 +0100

    [ngrams] some more simplification of ngramsByDoc'

commit ab7c1766
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:00:19 2024 +0100

    [ngrams, tests] understanding ngramsByDoc better

commit 35c2d0b0
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 23 21:20:29 2024 +0100

    [ngrams] small simplification to docNgrams function

commit 161ac077
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 23 18:35:59 2024 +0100

    [ngrams] annotate types of ngrams algorithms

commit 08c7c91c
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Sat Dec 21 09:45:00 2024 +0100

    [ngrams] improve function documentation, add types, add unit tests

    I want to understand ngrams algorithms better.
parent 942e663f
Pipeline #7289 passed with stages in 51 minutes and 14 seconds
@@ -371,7 +371,7 @@ https://haskell-language-server.readthedocs.io/en/latest/installation.html
 Running the tests can be done via the following command:
-```hs
+```shell
 cabal v2-test --test-show-details=streaming --flags 'test-crypto no-phylo-debug-logs'
 ```
@@ -383,10 +383,24 @@ The flags have the following meaning:
 In order for some tests to run (like the phylo ones) it is **required** to install the `gargantext-cli` via:
-```hs
+```shell
 cabal v2-install gargantext:exe:gargantext
 ```
+For tasty, if you want to run a specific test (via patterns), use:
+```shell
+cabal v2-run garg-test-tasty -- -p '/Ngrams/'
+```
+For integration tests, do:
+```shell
+cabal v2-test garg-test-hspec --test-show-details=streaming --test-option=--match='/some pattern/'
+```
+You could also use [ghciwatch](https://mercurytechnologies.github.io/ghciwatch/) for testing:
+```shell
+ghciwatch --command "cabal v2-repl garg-test-tasty" --after-startup-ghci ':set args "--pattern" "/Ngrams/"' --after-startup-ghci "Main.main" --after-reload-ghci "Main.main" --watch src --watch test
+```
 ### Modifying a golden test to accept a new (expected) output
 Some tests, like the Phylo one, use golden testing to ensure that the JSON Phylo we generate is
@@ -394,7 +408,7 @@ the same as an expected one. This allows us to catch regressions in the serialis
 Sometimes, however, we genuinely want to modify the output so that it's the new reference (i.e. the new
 golden reference). To do so, it's enough to run the testsuite passing the `--accept` flag, for example:
-```hs
+```shell
 cabal v2-test garg-test-tasty --test-show-details=streaming --flags 'test-crypto no-phylo-debug-logs' --test-option=--pattern='/Phylo/' --test-option=--accept
 ```
...
@@ -26,7 +26,6 @@ import Gargantext.API.Node () -- instances
 import Gargantext.Core (Lang(..))
 import Gargantext.Core.Text.Corpus.Parsers (FileFormat(..), FileType(..))
 import Gargantext.Core.Types.Individu (User(..))
-import Gargantext.Core.Types.Query
 import Gargantext.Database.Action.Flow (flowCorpusFile, flowAnnuaire, TermType(..))
 import Gargantext.Database.Action.Flow.Types (FlowCmdM)
 import Gargantext.Database.Admin.Types.Node (CorpusId)
@@ -39,16 +38,16 @@ import qualified Data.Text as T
 importCLI :: ImportArgs -> IO ()
-importCLI (ImportArgs fun user name settingsPath limit corpusPath) = do
+importCLI (ImportArgs fun user name settingsPath corpusPath) = do
   let
     tt = Multi EN
    format = TsvGargV3
     corpus :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
     mkCorpusUser = MkCorpusUserNormalCorpusName (UserName $ cs user) (cs name :: Text)
-    corpus = flowCorpusFile mkCorpusUser limit tt format Plain corpusPath Nothing DevJobHandle
+    corpus = flowCorpusFile mkCorpusUser tt format Plain corpusPath Nothing DevJobHandle
     corpusTsvHal :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
-    corpusTsvHal = flowCorpusFile mkCorpusUser limit tt TsvHal Plain corpusPath Nothing DevJobHandle
+    corpusTsvHal = flowCorpusFile mkCorpusUser tt TsvHal Plain corpusPath Nothing DevJobHandle
     annuaire :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
     annuaire = flowAnnuaire (MkCorpusUserNormalCorpusName (UserName $ cs user) "Annuaire") (Multi EN) corpusPath DevJobHandle
@@ -76,7 +75,6 @@ import_p = fmap CCMD_import $ ImportArgs
   <*> ( option str ( long "user") )
   <*> ( option str ( long "name") )
   <*> settings_p
-  <*> (fmap Limit ( option auto ( long "limit" <> metavar "INT" <> help "The limit for the query") ))
   <*> ( option str ( long "corpus-path" <> help "Path to corpus file") )
 function_p :: String -> Either String ImportFunction
...
@@ -19,7 +19,6 @@ import Data.Text (Text)
 import Gargantext.API.Admin.EnvTypes (Mode)
 import Gargantext.Core.Config.Types (SettingsFile(..))
 import Gargantext.Core.Types (NodeId)
-import Gargantext.Core.Types.Query
 import Prelude
 newtype CorpusFile = CorpusFile { _CorpusFile :: FilePath }
@@ -55,7 +54,6 @@ data ImportArgs = ImportArgs
   , imp_user        :: !Text
   , imp_name        :: !Text
   , imp_settings    :: !SettingsFile
-  , imp_limit       :: !Limit
   , imp_corpus_path :: !FilePath
   } deriving (Show, Eq)
...
@@ -18,7 +18,7 @@ fi
 # with the `sha256sum` result calculated on the `cabal.project` and
 # `cabal.project.freeze`. This ensures the files stay deterministic so that CI
 # cache can kick in.
-expected_cabal_project_hash="1abcdd99d5d50660e640be8a340c90331a84ef266d174c7ca6099c1c04ef65ea"
+expected_cabal_project_hash="ac293a4c66092996bc85fbf14ef34b7cce3ed5b0612ceb9e1a5f395059631e0b"
 expected_cabal_project_freeze_hash="32310c4d4e7b4679dcb90dcfcd0d6d1b175dbf885a77ffddca16d422998a521c"
...
@@ -180,7 +180,7 @@ source-repository-package
 source-repository-package
   type: git
   location: https://gitlab.iscpif.fr/gargantext/haskell-bee
-  tag: 69b7388a62f2afb5cb5609beac96e8cb35e94478
+  tag: 4a9c709613554eed0189b486de2126c18797088c
   subdir: haskell-bee/
           haskell-bee-pgmq/
           haskell-bee-tests/
...
@@ -266,6 +266,7 @@ library
 Gargantext.Core.Worker.Types
 Gargantext.Database.Action.Flow
 Gargantext.Database.Action.Flow.Types
+Gargantext.Database.Action.Flow.Utils
 Gargantext.Database.Action.Metrics.TFICF
 Gargantext.Database.Action.Search
 Gargantext.Database.Action.User
@@ -288,6 +289,7 @@ library
 Gargantext.Database.Query.Table.Node.User
 Gargantext.Database.Query.Table.User
 Gargantext.Database.Query.Tree.Root
+Gargantext.Database.Schema.Context
 Gargantext.Database.Schema.Ngrams
 Gargantext.Database.Schema.Node
 Gargantext.Database.Schema.User
@@ -416,7 +418,6 @@ library
 Gargantext.Database.Action.Flow.Extract
 Gargantext.Database.Action.Flow.List
 Gargantext.Database.Action.Flow.Pairing
-Gargantext.Database.Action.Flow.Utils
 Gargantext.Database.Action.Index
 Gargantext.Database.Action.Learn
 Gargantext.Database.Action.Mail
@@ -461,7 +462,6 @@ library
 Gargantext.Database.Query.Table.NodeNgrams
 Gargantext.Database.Query.Tree
 Gargantext.Database.Query.Tree.Error
-Gargantext.Database.Schema.Context
 Gargantext.Database.Schema.ContextNodeNgrams
 Gargantext.Database.Schema.ContextNodeNgrams2
 Gargantext.Database.Schema.NodeContext
@@ -796,6 +796,7 @@ test-suite garg-test-tasty
 Test.API.Prelude
 Test.API.UpdateList
 Test.Core.Notifications
+Test.Core.Orchestrator
 Test.Core.Similarity
 Test.Core.Text
 Test.Core.Text.Corpus.Query
@@ -821,6 +822,7 @@ test-suite garg-test-tasty
 Test.Ngrams.NLP
 Test.Ngrams.Query
 Test.Ngrams.Query.PaginationCorpus
+Test.Ngrams.Terms
 Test.Offline.Errors
 Test.Offline.JSON
 Test.Offline.Phylo
...
@@ -17,39 +17,68 @@ Portability : POSIX
 module Gargantext.API.Admin.Orchestrator.Types
   where
-import Data.Aeson (genericParseJSON, genericToJSON)
+import Data.Aeson (genericParseJSON, genericToJSON, object, withObject, (.=), (.:), (.:?), Value(String))
+import Data.Aeson.Types (unexpected)
 import Data.Morpheus.Types ( GQLType(..), DropNamespace(..), typeDirective )
 import Data.Swagger (ToSchema, URL, declareNamedSchema, defaultSchemaOptions, genericDeclareNamedSchemaUnrestricted)
 -- import Gargantext.API.GraphQL.UnPrefix qualified as GQLU
 import Gargantext.Core.Types (TODO(..))
 import Gargantext.Core.Utils.Aeson (jsonOptions)
 import Gargantext.Prelude
-import Test.QuickCheck (elements)
-import Test.QuickCheck.Arbitrary
+import PUBMED.Types qualified as PUBMED
+type EPOAPIToken = Text
+type EPOAPIUser = Text
 ------------------------------------------------------------------------
 -- | Main Types
 -- TODO IsidoreAuth
 data ExternalAPIs = OpenAlex
-                  | PubMed
+                  | PubMed (Maybe PUBMED.APIKey)
                   | Arxiv
                   | HAL
                   | IsTex
                   | Isidore
-                  | EPO
-  deriving (Show, Eq, Generic, Enum, Bounded)
+                  | EPO (Maybe EPOAPIUser) (Maybe EPOAPIToken)
+  deriving (Show, Eq, Generic)
 -- | Main Instances
-instance FromJSON ExternalAPIs
-instance ToJSON ExternalAPIs
+instance FromJSON ExternalAPIs where
+  parseJSON = withObject "ExternalAPIs" $ \o -> do
+    db <- o .: "db"
+    case db of
+      "OpenAlex" -> pure OpenAlex
+      "PubMed" -> do
+        mAPIKey <- o .:? "api_key"
+        pure $ PubMed mAPIKey
+      "Arxiv" -> pure Arxiv
+      "HAL" -> pure HAL
+      "IsTex" -> pure IsTex
+      "Isidore" -> pure Isidore
+      "EPO" -> do
+        mAPIUser <- o .:? "api_user"
+        mAPIToken <- o .:? "api_token"
+        pure $ EPO mAPIUser mAPIToken
+      s -> unexpected (String s)
+instance ToJSON ExternalAPIs where
+  toJSON (PubMed mAPIKey) = object [ "db" .= toJSON ("PubMed" :: Text)
+                                   , "api_key" .= toJSON mAPIKey ]
+  toJSON (EPO mAPIUser mAPIToken) = object [ "db" .= toJSON ("EPO" :: Text)
+                                           , "api_user" .= toJSON mAPIUser
+                                           , "api_token" .= toJSON mAPIToken ]
+  toJSON t = object [ "db" .= toJSON (show t :: Text) ]
 externalAPIs :: [ExternalAPIs]
-externalAPIs = [minBound .. maxBound]
+externalAPIs =
+  [ OpenAlex
+  , PubMed Nothing
+  , Arxiv
+  , HAL
+  , IsTex
+  , Isidore
+  , EPO Nothing Nothing ]
-instance Arbitrary ExternalAPIs
-  where
-    arbitrary = arbitraryBoundedEnum
 instance ToSchema ExternalAPIs where
   declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions
@@ -64,11 +93,6 @@ data ScraperEvent = ScraperEvent
   , _scev_date :: !(Maybe Text)
   }
   deriving (Show, Generic, Eq)
-instance Arbitrary ScraperEvent where
-  arbitrary = ScraperEvent <$> elements [Nothing, Just "test message"]
-                           <*> elements [Nothing, Just "INFO", Just "WARN"]
-                           <*> elements [Nothing, Just "2018-04-18"]
 instance ToJSON ScraperEvent where
   toJSON = genericToJSON $ jsonOptions "_scev_"
 instance FromJSON ScraperEvent where
@@ -91,12 +115,6 @@ makeLenses ''JobLog
 noJobLog :: JobLog
 noJobLog = JobLog Nothing Nothing Nothing Nothing
-instance Arbitrary JobLog where
-  arbitrary = JobLog
-    <$> arbitrary
-    <*> arbitrary
-    <*> arbitrary
-    <*> arbitrary
 instance ToJSON JobLog where
   toJSON = genericToJSON $ jsonOptions "_scst_"
 instance FromJSON JobLog where
...
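The hand-written `FromJSON`/`ToJSON ExternalAPIs` instances in the diff above encode a `"db"` discriminator plus optional credential fields. A minimal, hypothetical Python analogue (not the project's code; the real implementation uses aeson) makes the round-trip shape easy to check in isolation:

```python
import json

# Hypothetical analogue of the ExternalAPIs instances above: a "db"
# discriminator, with optional credentials for PubMed and EPO only.

def encode_external_api(db: str, **creds) -> str:
    """Encode as {"db": ..., plus api_key / api_user / api_token when relevant}."""
    obj = {"db": db}
    if db == "PubMed":
        obj["api_key"] = creds.get("api_key")
    elif db == "EPO":
        obj["api_user"] = creds.get("api_user")
        obj["api_token"] = creds.get("api_token")
    return json.dumps(obj)

def decode_external_api(s: str):
    """Decode, rejecting unknown tags (mirrors `unexpected (String s)`)."""
    obj = json.loads(s)
    db = obj["db"]
    if db not in {"OpenAlex", "PubMed", "Arxiv", "HAL", "IsTex", "Isidore", "EPO"}:
        raise ValueError(f"unexpected db: {db}")
    if db == "PubMed":
        return (db, obj.get("api_key"))
    if db == "EPO":
        return (db, obj.get("api_user"), obj.get("api_token"))
    return (db,)
```

Note that a variant without credentials, such as `OpenAlex`, still serializes to a `{"db": "OpenAlex"}` object, which is what keeps the wire format backwards compatible with the frontend.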
@@ -9,6 +9,7 @@ Portability : POSIX
 -}
+{-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE TypeFamilies #-}
 {-# LANGUAGE TypeOperators #-}
 {-# OPTIONS_GHC -Wno-unrecognised-pragmas #-}
@@ -108,7 +109,9 @@ getListNgrams nodeIds ngramsType = listNgramsFromRepo nodeIds ngramsType
   <$> getRepo nodeIds
-getTermsWith :: (HasNodeStory env err m, Eq a, Hashable a)
+-- | Fetch terms from repo, gathering terms under the same root (parent).
+getTermsWith :: forall a env err m.
+                (HasNodeStory env err m, Eq a, Hashable a)
              => (NgramsTerm -> a) -> [ListId]
              -> NgramsType -> Set ListType
              -> m (HashMap a [a])
@@ -119,6 +122,7 @@ getTermsWith f ls ngt lts = HM.fromListWith (<>)
   <$> mapTermListRoot ls ngt
   <$> getRepo ls
   where
+    toTreeWith :: (NgramsTerm, (b, Maybe NgramsTerm)) -> (a, [a])
    toTreeWith (t, (_lt, maybeRoot)) = case maybeRoot of
       Nothing -> (f t, [])
       Just r  -> (f r, [f t])
...
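The `toTreeWith` helper plus `HM.fromListWith (<>)` in the diff above gathers terms under their root term. A hypothetical Python sketch of that grouping (names are mine, not the project's):

```python
from collections import defaultdict

# Hypothetical sketch of the toTreeWith / fromListWith grouping above:
# a term with no root becomes a key; a term with a root is appended
# to its root's list of children.

def group_under_roots(term_roots):
    """term_roots: list of (term, maybe_root). Returns {root: [children]}."""
    grouped = defaultdict(list)
    for term, root in term_roots:
        if root is None:
            grouped.setdefault(term, [])   # a root itself: (f t, [])
        else:
            grouped[root].append(term)     # a child: (f r, [f t])
    return dict(grouped)
```

For example, `[("cat", None), ("cats", "cat"), ("dog", None)]` groups `"cats"` under `"cat"` and leaves `"dog"` with no children.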
@@ -27,12 +27,11 @@ import Data.Conduit.Internal (zipSources)
 import Data.Swagger ( ToSchema(..) )
 import Data.Text.Encoding qualified as TE
 import Data.Text qualified as T
-import EPO.API.Client.Types qualified as EPO
 import Gargantext.API.Admin.Orchestrator.Types qualified as API
 import Gargantext.API.Ngrams (commitStatePatch, Versioned(..))
 import Gargantext.API.Node.Corpus.New.Types ( FileFormat(..), FileType(..) )
 import Gargantext.API.Node.Corpus.Searx ( triggerSearxSearch )
-import Gargantext.API.Node.Corpus.Types ( Datafield(Web), database2origin )
+import Gargantext.API.Node.Corpus.Types ( Datafield(Web), datafield2origin )
 import Gargantext.API.Node.Corpus.Update (addLanguageToCorpus)
 import Gargantext.API.Node.Types
 import Gargantext.Core.Config (gc_jobs, hasConfig)
@@ -157,26 +156,20 @@ addToCorpusWithQuery :: ( FlowCmdM env err m
                      -> JobHandle m
                      -> m ()
 addToCorpusWithQuery user cid (WithQuery { _wq_query = q
-                                         , _wq_databases = dbs
                                          , _wq_datafield = datafield
                                          , _wq_lang = l
-                                         , _wq_flowListWith = flw
-                                         , _wq_pubmedAPIKey = mPubmedAPIKey
-                                         , .. }) maybeLimit jobHandle = do
+                                         , _wq_flowListWith = flw }) maybeLimit jobHandle = do
   -- TODO ...
-  $(logLocM) DEBUG $ "[addToCorpusWithQuery] (cid, dbs) " <> show (cid, dbs)
+  $(logLocM) DEBUG $ "[addToCorpusWithQuery] cid " <> show cid
   $(logLocM) DEBUG $ "[addToCorpusWithQuery] datafield " <> show datafield
   $(logLocM) DEBUG $ "[addToCorpusWithQuery] flowListWith " <> show flw
-  let mEPOAuthKey = EPO.AuthKey <$> (EPO.User <$> _wq_epoAPIUser)
-                                <*> (EPO.Token <$> _wq_epoAPIToken)
   $(logLocM) DEBUG $ "[addToCorpusWithQuery] addLanguageToCorpus " <> show cid <> ", " <> show l
   addLanguageToCorpus cid l
   $(logLocM) DEBUG "[addToCorpusWithQuery] after addLanguageToCorpus"
   case datafield of
-    Just Web -> do
+    Web -> do
       $(logLocM) DEBUG $ "[addToCorpusWithQuery] processing web request " <> show datafield
       markStarted 1 jobHandle
@@ -193,10 +186,10 @@ addToCorpusWithQuery user cid (WithQuery { _wq_query = q
       -- if cid is corpus -> add to corpus
       -- if cid is root -> create corpus in Private
       $(logLocM) DEBUG $ "[addToCorpusWithQuery] getDataText with query: " <> show q
-      let db = database2origin dbs
+      let db = datafield2origin datafield
       -- mPubmedAPIKey <- getUserPubmedAPIKey user
       -- printDebug "[addToCorpusWithQuery] mPubmedAPIKey" mPubmedAPIKey
-      eTxt <- getDataText db (Multi l) q mPubmedAPIKey mEPOAuthKey maybeLimit
+      eTxt <- getDataText db (Multi l) q maybeLimit
       -- printDebug "[G.A.N.C.New] lTxts" lTxts
       case eTxt of
...
@@ -15,7 +15,6 @@ module Gargantext.API.Node.Corpus.Searx where
 import Control.Lens (view)
 import Data.Aeson qualified as Aeson
-import Data.HashMap.Strict qualified as HashMap
 import Data.Text qualified as T
 import Data.Text qualified as Text
 import Data.Time.Calendar (Day, toGregorian)
@@ -27,14 +26,11 @@ import Gargantext.Core.Config.Types (FramesConfig(..))
 import Gargantext.Core.NLP (HasNLPServer, nlpServerGet)
 import Gargantext.Core.NodeStory.Types ( HasNodeStory )
 import Gargantext.Core.Text.Corpus.Query qualified as Query
-import Gargantext.Core.Text.List (buildNgramsLists)
-import Gargantext.Core.Text.List.Group.WithStem ({-StopSize(..),-} GroupParams(..))
 import Gargantext.Core.Text.Terms (TermType(..))
 import Gargantext.Core.Types (HasValidationError)
 import Gargantext.Core.Types.Individu (User(..))
 import Gargantext.Core.Utils.Prefix (unPrefix)
-import Gargantext.Database.Action.Flow (addDocumentsToHyperCorpus) --, DataText(..))
+import Gargantext.Database.Action.Flow (addDocumentsToHyperCorpus, buildSocialList) --, DataText(..))
-import Gargantext.Database.Action.Flow.List (flowList_DbRepo)
 import Gargantext.Database.Action.User (getUserId)
 import Gargantext.Database.Admin.Types.Hyperdata.Corpus (HyperdataCorpus)
 import Gargantext.Database.Admin.Types.Hyperdata.Document (HyperdataDocument(..))
@@ -42,7 +38,6 @@ import Gargantext.Database.Admin.Types.Node (CorpusId, ListId, NodeType(NodeText
 import Gargantext.Database.Query.Table.Node (getOrMkList, insertDefaultNodeIfNotExists)
 import Gargantext.Database.Query.Table.Node.Error (HasNodeError)
 import Gargantext.Database.Query.Tree.Error (HasTreeError)
-import Gargantext.Database.Query.Tree.Root (getOrMkRootWithCorpus, MkCorpusUser (MkCorpusUserMaster))
 import Gargantext.Prelude
 import Gargantext.Utils.Jobs.Monad (JobHandle, MonadJobStatus(..))
 import Network.HTTP.Client
@@ -147,17 +142,12 @@ insertSearxResponse user cId listId l (Right (SearxResponse { _srs_results })) =
   --_ <- flowDataText user (DataNew [docs']) (Multi l) cId Nothing logStatus
   let mCorpus = Nothing :: Maybe HyperdataCorpus
   void $ addDocumentsToHyperCorpus server mCorpus (Multi l) cId docs'
-  (_masterUserId, _masterRootId, masterCorpusId)
-    <- getOrMkRootWithCorpus MkCorpusUserMaster mCorpus
-  let gp = GroupWithPosTag l server HashMap.empty
-  -- gp = case l of
-  --   FR -> GroupWithPosTag l Spacy HashMap.empty
-  --   _ -> GroupWithPosTag l CoreNLP HashMap.empty
-  ngs <- buildNgramsLists user cId masterCorpusId Nothing gp
-  _userListId <- flowList_DbRepo listId ngs
+  _ <- buildSocialList l user cId listId mCorpus Nothing
   pure ()
 -- TODO Make an async task out of this?
 triggerSearxSearch :: ( MonadBase IO m
                       , HasNodeStory env err m
...
@@ -9,43 +9,45 @@ Portability : POSIX
 -}

-{-# LANGUAGE TemplateHaskell #-}
-
 module Gargantext.API.Node.Corpus.Types where

 import Control.Lens ( (?~) )
 import Control.Monad.Fail (fail)
-import Data.Aeson ( Value(..), (.:), withText, object )
+import Data.Aeson ( Value(..), (.:), (.=), withText, object, withObject )
+import Data.Aeson.Types ( Parser )
 import Data.Swagger
 import Data.Text qualified as T
 import Gargantext.API.Admin.Orchestrator.Types qualified as Types
-import Gargantext.Core.Utils.Prefix (unPrefix)
 import Gargantext.Database.Action.Flow.Types (DataOrigin(..))
 import Gargantext.Prelude

-data Database = Empty
-              | OpenAlex
-              | PubMed
-              | Arxiv
-              | HAL
-              | IsTex
-              | Isidore
-              | EPO
-  deriving (Eq, Show, Generic, Enum, Bounded)
-
-deriveJSON (unPrefix "") ''Database
+type EPOAPIToken = Text
+type EPOAPIUser = Text
+
+data Database =
+    Empty
+  | DB Types.ExternalAPIs
+  deriving (Eq, Show, Generic)
+
+instance FromJSON Database where
+  parseJSON = withObject "Database" $ \o -> do
+    db <- o .: "db" :: Parser Text
+    case db of
+      "Empty" -> pure Empty
+      _ -> do
+        eapi <- parseJSON (Object o) :: Parser Types.ExternalAPIs
+        pure $ DB eapi
+
+instance ToJSON Database where
+  toJSON Empty   = object [ "db" .= (show Empty :: Text) ]
+  toJSON (DB db) = toJSON db

 instance ToSchema Database where
   declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions

-database2origin :: Database -> DataOrigin
-database2origin Empty = InternalOrigin Types.IsTex
-database2origin OpenAlex = ExternalOrigin Types.OpenAlex
-database2origin PubMed = ExternalOrigin Types.PubMed
-database2origin Arxiv = ExternalOrigin Types.Arxiv
-database2origin HAL = ExternalOrigin Types.HAL
-database2origin IsTex = ExternalOrigin Types.IsTex
-database2origin Isidore = ExternalOrigin Types.Isidore
-database2origin EPO = ExternalOrigin Types.EPO
+datafield2origin :: Datafield -> DataOrigin
+datafield2origin (External Empty) = InternalOrigin Types.IsTex
+datafield2origin (External (DB db)) = ExternalOrigin db
+-- | This isn't really used
+datafield2origin _ = InternalOrigin Types.IsTex

 ------------------------------------------------------------------------
 data Datafield = Gargantext

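The refactored `Database` keeps the old wire format: `Empty` is still a bare tag, and any other tag falls through to the `ExternalAPIs` parser, which is why the frontend needed no changes. A dependency-free sketch of that dispatch (the types and `encodeDb`/`decodeDb` names here are stand-ins for the aeson instances above, not the real API):

```haskell
{-# LANGUAGE LambdaCase #-}
import Text.Read (readMaybe)

-- Stand-in for Types.ExternalAPIs; Read gives us a tag parser for free.
data ExternalAPIs = OpenAlex | PubMed | Arxiv | HAL
  deriving (Eq, Show, Read)

data Database = Empty | DB ExternalAPIs
  deriving (Eq, Show)

-- Encoding: 'Empty' keeps its historical tag; everything else is
-- whatever the ExternalAPIs encoder produces.
encodeDb :: Database -> String
encodeDb Empty   = "Empty"
encodeDb (DB db) = show db

-- Decoding mirrors the FromJSON instance: check for "Empty" first,
-- otherwise delegate to the ExternalAPIs decoder.
decodeDb :: String -> Maybe Database
decodeDb = \case
  "Empty" -> Just Empty
  tag     -> DB <$> readMaybe tag
```

Old payloads that carried a concrete provider tag decode as `DB …`, while `"Empty"` still round-trips to `Empty`.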
@@ -9,8 +9,6 @@ Portability : POSIX
 -}

-{-# LANGUAGE TemplateHaskell #-}
-{-# LANGUAGE TypeOperators #-}
-
 module Gargantext.API.Node.Share
   where

@@ -57,25 +55,21 @@ api userInviting nId (ShareTeamParams user') = do
           pure u
         Left _err -> do
           username' <- getUsername userInviting
-          _ <- case username' `List.elem` arbitraryUsername of
-            True -> do
-              -- printDebug "[G.A.N.Share.api]" ("Demo users are not allowed to invite" :: Text)
-              pure ()
-            False -> do
-              -- TODO better analysis of the composition of what is shared
-              children <- findNodesWithType nId [NodeList] [ NodeFolderShared
-                                                           , NodeTeam
-                                                           , NodeFolder
-                                                           , NodeCorpus
-                                                           ]
-              _ <- case List.null children of
-                True -> do
-                  -- printDebug "[G.A.N.Share.api]" ("Invitation is enabled if you share a corpus at least" :: Text)
-                  pure $ UnsafeMkUserId 0
-                False -> do
-                  -- printDebug "[G.A.N.Share.api]" ("Your invitation is sent to: " <> user'')
-                  newUser user''
-              pure ()
+          unless (username' `List.elem` arbitraryUsername) $ do
+            -- TODO better analysis of the composition of what is shared
+            children <- findNodesWithType nId [NodeList] [ NodeFolderShared
+                                                         , NodeTeam
+                                                         , NodeFolder
+                                                         , NodeCorpus
+                                                         ]
+            _ <- if List.null children
+              then do
+                -- printDebug "[G.A.N.Share.api]" ("Invitation is enabled if you share a corpus at least" :: Text)
+                pure $ UnsafeMkUserId 0
+              else do
+                -- printDebug "[G.A.N.Share.api]" ("Your invitation is sent to: " <> user'')
+                newUser user''
+            pure ()
           pure u
   fromIntegral <$> DB.shareNodeWith (ShareNodeWith_User NodeFolderShared (UserName user)) nId

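The refactor above replaces a `case bool of True -> pure (); False -> …` ladder with `unless`, which guards the whole invite flow in one expression. A minimal sketch of the same pattern, with a hypothetical `invite` action standing in for the real handler:

```haskell
import Control.Monad (unless)
import Data.IORef

-- Stand-in for the demo accounts that are not allowed to invite.
arbitraryUsername :: [String]
arbitraryUsername = ["gargantua", "demo"]

-- Record an invitation unless the inviter is a demo account;
-- 'unless' skips the whole block when the predicate holds.
invite :: IORef [String] -> String -> IO ()
invite sent user =
  unless (user `elem` arbitraryUsername) $
    modifyIORef sent (user :)
```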
@@ -83,14 +83,10 @@ instance GargDB.SaveFile NewWithFile where
 data WithQuery = WithQuery
   { _wq_query        :: !API.RawQuery
-  , _wq_databases    :: !Database
-  , _wq_datafield    :: !(Maybe Datafield)
+  , _wq_datafield    :: !Datafield
   , _wq_lang         :: !Lang
   , _wq_node_id      :: !Int
   , _wq_flowListWith :: !FlowSocialListWith
-  , _wq_pubmedAPIKey :: !(Maybe Text)
-  , _wq_epoAPIUser   :: !(Maybe Text)
-  , _wq_epoAPIToken  :: !(Maybe Text)
   }
   deriving (Show, Eq, Generic)

...@@ -36,7 +36,6 @@ import Gargantext.Core.Text.Corpus.Query qualified as Corpus ...@@ -36,7 +36,6 @@ import Gargantext.Core.Text.Corpus.Query qualified as Corpus
import Gargantext.Database.Admin.Types.Hyperdata.Document (HyperdataDocument(..)) import Gargantext.Database.Admin.Types.Hyperdata.Document (HyperdataDocument(..))
import Gargantext.Prelude hiding (get) import Gargantext.Prelude hiding (get)
import Gargantext.Utils.Jobs.Error import Gargantext.Utils.Jobs.Error
import PUBMED.Types qualified as PUBMED
import Servant.Client (ClientError) import Servant.Client (ClientError)
data GetCorpusError data GetCorpusError
...@@ -60,17 +59,15 @@ get :: ExternalAPIs ...@@ -60,17 +59,15 @@ get :: ExternalAPIs
-- If the provider doesn't support the search filtered by language, or if the language -- If the provider doesn't support the search filtered by language, or if the language
-- is not important, the frontend will simply send 'EN' to the backend. -- is not important, the frontend will simply send 'EN' to the backend.
-> Corpus.RawQuery -> Corpus.RawQuery
-> Maybe PUBMED.APIKey
-> Maybe EPO.AuthKey
-> Text -> Text
-> Maybe Corpus.Limit -> Maybe Corpus.Limit
-- -> IO [HyperdataDocument] -- -> IO [HyperdataDocument]
-> IO (Either GetCorpusError (Maybe Integer, ConduitT () HyperdataDocument IO ())) -> IO (Either GetCorpusError (Maybe Integer, ConduitT () HyperdataDocument IO ()))
get externalAPI lang q mPubmedAPIKey epoAuthKey epoAPIUrl limit = do get externalAPI lang q epoAPIUrl limit = do
-- For PUBMED, HAL, IsTex, Isidore and OpenAlex, we want to send the query as-it. -- For PUBMED, HAL, IsTex, Isidore and OpenAlex, we want to send the query as-it.
-- For Arxiv we parse the query into a structured boolean query we submit over. -- For Arxiv we parse the query into a structured boolean query we submit over.
case externalAPI of case externalAPI of
PubMed -> PubMed mPubmedAPIKey ->
first (ExternalAPIError externalAPI) <$> PUBMED.get (fromMaybe "" mPubmedAPIKey) q limit first (ExternalAPIError externalAPI) <$> PUBMED.get (fromMaybe "" mPubmedAPIKey) q limit
OpenAlex -> OpenAlex ->
first (ExternalAPIError externalAPI) <$> OpenAlex.get (fromMaybe "" Nothing {- email -}) q (Just $ toISO639 lang) limit first (ExternalAPIError externalAPI) <$> OpenAlex.get (fromMaybe "" Nothing {- email -}) q (Just $ toISO639 lang) limit
...@@ -85,8 +82,10 @@ get externalAPI lang q mPubmedAPIKey epoAuthKey epoAPIUrl limit = do ...@@ -85,8 +82,10 @@ get externalAPI lang q mPubmedAPIKey epoAuthKey epoAPIUrl limit = do
Isidore -> do Isidore -> do
docs <- ISIDORE.get lang (Corpus.getLimit <$> limit) (Just $ Corpus.getRawQuery q) Nothing docs <- ISIDORE.get lang (Corpus.getLimit <$> limit) (Just $ Corpus.getRawQuery q) Nothing
pure $ Right (Just $ fromIntegral $ length docs, yieldMany docs) pure $ Right (Just $ fromIntegral $ length docs, yieldMany docs)
EPO -> do EPO mAPIUser mAPIToken -> do
first (ExternalAPIError externalAPI) <$> EPO.get epoAuthKey epoAPIUrl q (toISO639 lang) limit let mEPOAuthKey = EPO.AuthKey <$> (EPO.User <$> mAPIUser)
<*> (EPO.Token <$> mAPIToken)
first (ExternalAPIError externalAPI) <$> EPO.get mEPOAuthKey epoAPIUrl q (toISO639 lang) limit
where where
parse_query = first (InvalidInputQuery q . T.pack) $ Corpus.parseQuery q parse_query = first (InvalidInputQuery q . T.pack) $ Corpus.parseQuery q
......
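The `EPO` branch builds the auth key applicatively: the `Maybe` combination yields `Nothing` whenever either the user or the token is missing, so a half-configured credential never reaches `EPO.get`. A self-contained sketch of that combination (these newtypes are stand-ins for `EPO.User`, `EPO.Token` and `EPO.AuthKey`):

```haskell
-- Stand-ins for EPO.User / EPO.Token / EPO.AuthKey
newtype User  = User  String deriving (Eq, Show)
newtype Token = Token String deriving (Eq, Show)
data AuthKey  = AuthKey User Token deriving (Eq, Show)

-- Applicative combination over Maybe: the key exists only if
-- *both* credentials are present.
mkAuthKey :: Maybe String -> Maybe String -> Maybe AuthKey
mkAuthKey mUser mToken =
  AuthKey <$> (User <$> mUser)
          <*> (Token <$> mToken)
```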
...@@ -52,7 +52,7 @@ import Gargantext.Core.Text.Terms.Mono (monoTerms) ...@@ -52,7 +52,7 @@ import Gargantext.Core.Text.Terms.Mono (monoTerms)
import Gargantext.Core.Text.Terms.Mono.Stem (stem, StemmingAlgorithm(..)) import Gargantext.Core.Text.Terms.Mono.Stem (stem, StemmingAlgorithm(..))
import Gargantext.Core.Text.Terms.Mono.Token.En (tokenize) import Gargantext.Core.Text.Terms.Mono.Token.En (tokenize)
import Gargantext.Core.Text.Terms.Multi (multiterms) import Gargantext.Core.Text.Terms.Multi (multiterms)
import Gargantext.Core.Types ( TermsCount, POS, Terms(..), TermsWithCount ) import Gargantext.Core.Types ( TermsCount, TermsWeight, POS, Terms(..), TermsWithCount )
import Gargantext.Core.Utils (groupWithCounts) import Gargantext.Core.Utils (groupWithCounts)
import Gargantext.Database.Prelude (DBCmd) import Gargantext.Database.Prelude (DBCmd)
import Gargantext.Database.Query.Table.Ngrams (insertNgrams) import Gargantext.Database.Query.Table.Ngrams (insertNgrams)
...@@ -122,7 +122,7 @@ class ExtractNgramsT h ...@@ -122,7 +122,7 @@ class ExtractNgramsT h
=> NLPServerConfig => NLPServerConfig
-> TermType Lang -> TermType Lang
-> h -> h
-> DBCmd err (HashMap ExtractedNgrams (Map NgramsType Int, TermsCount)) -> DBCmd err (HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
------------------------------------------------------------------------ ------------------------------------------------------------------------
enrichedTerms :: Lang -> PosTagAlgo -> POS -> Terms -> NgramsPostag enrichedTerms :: Lang -> PosTagAlgo -> POS -> Terms -> NgramsPostag
enrichedTerms l pa po (Terms { .. }) = enrichedTerms l pa po (Terms { .. }) =
...@@ -176,7 +176,7 @@ terms :: NLPServerConfig -> TermType Lang -> Text -> IO [TermsWithCount] ...@@ -176,7 +176,7 @@ terms :: NLPServerConfig -> TermType Lang -> Text -> IO [TermsWithCount]
terms _ (Mono lang) txt = pure $ monoTerms lang txt terms _ (Mono lang) txt = pure $ monoTerms lang txt
terms ncs (Multi lang) txt = multiterms ncs lang txt terms ncs (Multi lang) txt = multiterms ncs lang txt
terms ncs (MonoMulti lang) txt = terms ncs (Multi lang) txt terms ncs (MonoMulti lang) txt = terms ncs (Multi lang) txt
terms _ (Unsupervised { .. }) txt = pure $ termsUnsupervised (Unsupervised { _tt_model = Just m', .. }) txt terms _ (Unsupervised { .. }) txt = pure $ termsUnsupervised _tt_lang m' _tt_windowSize _tt_ngramsSize txt
where where
m' = maybe (newTries _tt_ngramsSize txt) identity _tt_model m' = maybe (newTries _tt_ngramsSize txt) identity _tt_model
-- terms (WithList list) txt = pure . concat $ extractTermsWithList list txt -- terms (WithList list) txt = pure . concat $ extractTermsWithList list txt
...@@ -189,17 +189,15 @@ type MinNgramSize = Int ...@@ -189,17 +189,15 @@ type MinNgramSize = Int
-- | Unsupervised ngrams extraction -- | Unsupervised ngrams extraction
-- language agnostic extraction -- language agnostic extraction
-- TODO: newtype BlockText -- TODO: newtype BlockText
termsUnsupervised :: TermType Lang -> Text -> [TermsWithCount] termsUnsupervised :: Lang -> Tries Token () -> Int -> Int -> Text -> [TermsWithCount]
termsUnsupervised (Unsupervised { _tt_model = Nothing }) = panicTrace "[termsUnsupervised] no model" termsUnsupervised lang model windowSize ngramsSize =
termsUnsupervised (Unsupervised { _tt_model = Just _tt_model, .. }) = map (first (text2term lang))
map (first (text2term _tt_lang))
. groupWithCounts . groupWithCounts
-- . List.nub -- . List.nub
. List.filter (\l' -> List.length l' >= _tt_windowSize) . List.filter (\l' -> List.length l' >= windowSize)
. List.concat . List.concat
. mainEleveWith _tt_model _tt_ngramsSize . mainEleveWith model ngramsSize
. uniText . uniText
termsUnsupervised _ = undefined
newTries :: Int -> Text -> Tries Token () newTries :: Int -> Text -> Tries Token ()
......
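The new `termsUnsupervised` takes its inputs as plain arguments instead of pattern matching on the record, which removes the `panicTrace`/`undefined` fallthroughs. Its body is a filter-then-count pipeline; a runnable sketch under the assumption that the Eleve segmentation step (`mainEleveWith . uniText`) has already produced candidate token lists, with a local stand-in for Gargantext's `groupWithCounts`:

```haskell
import Data.Bifunctor (first)
import Data.List (group, sort)

-- Local stand-in for Gargantext.Core.Utils.groupWithCounts
groupWithCounts :: Ord a => [a] -> [(a, Int)]
groupWithCounts = map (\xs -> (head xs, length xs)) . group . sort

-- Shape of the pipeline: drop candidates shorter than the window,
-- then collapse duplicates into (term, count) pairs.
extractCandidates :: Int -> [[String]] -> [(String, Int)]
extractCandidates windowSize =
  map (first unwords)
  . groupWithCounts
  . filter (\l' -> length l' >= windowSize)
```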
...@@ -96,18 +96,14 @@ termsInText lang pats (manipulateText lang -> txt) = ...@@ -96,18 +96,14 @@ termsInText lang pats (manipulateText lang -> txt) =
-- | Manipulates the input 'Text' before passing it to 'termsInText'. -- | Manipulates the input 'Text' before passing it to 'termsInText'.
-- In particular, if the language is Chinese (ZH), we add spaces. -- In particular, if the language is Chinese (ZH), we add spaces.
manipulateText :: Lang -> Text -> Text manipulateText :: Lang -> Text -> Text
manipulateText lang txt = case lang of manipulateText ZH txt = addSpaces txt
ZH -> addSpaces txt manipulateText _ txt = txt
_ -> txt
-------------------------------------------------------------------------- --------------------------------------------------------------------------
extractTermsWithList :: Patterns -> Text -> Corpus [Text] extractTermsWithList :: Patterns -> Text -> Corpus [Text]
extractTermsWithList pats = map (replaceTerms KeepAll pats) . monoTextsBySentence extractTermsWithList pats = map (replaceTerms KeepAll pats) . monoTextsBySentence
-- | Extract terms -- | Extract terms
-- >>> let termList = [(["chat blanc"], [["chat","blanc"]])] :: TermList
-- extractTermsWithList' (buildPatterns termList) "Le chat blanc"["chat blanc"]
-- ["chat blanc"]
extractTermsWithList' :: Patterns -> Text -> [Text] extractTermsWithList' :: Patterns -> Text -> [Text]
extractTermsWithList' pats = map (concat . map concat . replaceTerms KeepAll pats) extractTermsWithList' pats = map (concat . map concat . replaceTerms KeepAll pats)
. monoTextsBySentence . monoTextsBySentence
......
...@@ -20,7 +20,7 @@ commentary with @some markup@. ...@@ -20,7 +20,7 @@ commentary with @some markup@.
module Gargantext.Core.Types ( module Gargantext.Core.Types.Main module Gargantext.Core.Types ( module Gargantext.Core.Types.Main
, module Gargantext.Database.Admin.Types.Node , module Gargantext.Database.Admin.Types.Node
, DebugMode(..), withDebugMode , DebugMode(..), withDebugMode
, Term(..), Terms(..), TermsCount, TermsWithCount , Term(..), Terms(..), TermsCount, TermsWeight(..), TermsWithCount
, TokenTag(..), POS(..), NER(..) , TokenTag(..), POS(..), NER(..)
, Label, Stems , Label, Stems
, HasValidationError(..), assertValid , HasValidationError(..), assertValid
...@@ -74,6 +74,9 @@ type TermsCount = Int ...@@ -74,6 +74,9 @@ type TermsCount = Int
type TermsWithCount = (Terms, TermsCount) type TermsWithCount = (Terms, TermsCount)
newtype TermsWeight = TermsWeight { unTermsWeight :: Int }
deriving newtype (Eq, Ord, Num, Show)
------------------------------------------------------------------------ ------------------------------------------------------------------------
data Tag = POS | NER data Tag = POS | NER
deriving (Show, Eq) deriving (Show, Eq)
......
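The `TermsWeight` newtype replaces the bare `Int` weights threaded through the flow code. Because the instances are derived with the `newtype` strategy, weights reuse `Int`'s arithmetic and `Show`, while the type checker keeps weights from being confused with counts. The declaration itself, plus a small demonstration:

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Same declaration as in Gargantext.Core.Types: the newtype strategy
-- reuses Int's instances, so weights add like numbers and 'show'
-- prints the bare integer.
newtype TermsWeight = TermsWeight { unTermsWeight :: Int }
  deriving newtype (Eq, Ord, Num, Show)

-- Num literals and (+) come straight from the underlying Int.
total :: TermsWeight
total = 1 + 2
```

This is why `insertDocNgrams` can keep calling `fromIntegral` after just unwrapping with `unTermsWeight`.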
...@@ -157,11 +157,6 @@ notifyJobKilled env (W.State { name }) (Just bm) = do ...@@ -157,11 +157,6 @@ notifyJobKilled env (W.State { name }) (Just bm) = do
-- | Spawn a worker with PGMQ broker -- | Spawn a worker with PGMQ broker
-- TODO:
-- - reduce size of DB pool
-- - progress report via notifications
-- - I think there is no point to save job result, as usually there is none (we have side-effects only)
-- - replace Servant.Job to use workers instead of garg API threads
withPGMQWorker :: HasWorkerBroker withPGMQWorker :: HasWorkerBroker
=> WorkerEnv => WorkerEnv
-> WorkerDefinition -> WorkerDefinition
......
...@@ -12,6 +12,7 @@ Portability : POSIX ...@@ -12,6 +12,7 @@ Portability : POSIX
{-# OPTIONS_GHC -fno-warn-orphans #-} {-# OPTIONS_GHC -fno-warn-orphans #-}
{-# LANGUAGE InstanceSigs #-} {-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Gargantext.Database.Action.Flow.Extract module Gargantext.Database.Action.Flow.Extract
...@@ -25,11 +26,12 @@ import Gargantext.Core.Text (HasText(..)) ...@@ -25,11 +26,12 @@ import Gargantext.Core.Text (HasText(..))
import Gargantext.Core.Text.Corpus.Parsers (splitOn) import Gargantext.Core.Text.Corpus.Parsers (splitOn)
import Gargantext.Core.Text.Ngrams (NgramsType(..)) import Gargantext.Core.Text.Ngrams (NgramsType(..))
import Gargantext.Core.Text.Terms (ExtractNgramsT, ExtractedNgrams(..), TermType, cleanExtractedNgrams, enrichedTerms, extractNgramsT, extractTerms, tt_lang) import Gargantext.Core.Text.Terms (ExtractNgramsT, ExtractedNgrams(..), TermType, cleanExtractedNgrams, enrichedTerms, extractNgramsT, extractTerms, tt_lang)
import Gargantext.Core.Types (POS(NP), TermsCount) import Gargantext.Core.Types (POS(NP), TermsCount, TermsWeight)
import Gargantext.Database.Admin.Types.Hyperdata.Contact ( HyperdataContact, cw_lastName, hc_who ) import Gargantext.Database.Admin.Types.Hyperdata.Contact ( HyperdataContact, cw_lastName, hc_who )
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument, hd_authors, hd_bdd, hd_institutes, hd_source ) import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument, hd_authors, hd_bdd, hd_institutes, hd_source )
import Gargantext.Database.Admin.Types.Node ( Node ) import Gargantext.Database.Admin.Types.Node ( Node )
import Gargantext.Database.Prelude (DBCmd) import Gargantext.Database.Prelude (DBCmd)
import Gargantext.Database.Query.Table.NgramsPostag (NgramsPostag)
import Gargantext.Database.Schema.Ngrams ( text2ngrams ) import Gargantext.Database.Schema.Ngrams ( text2ngrams )
import Gargantext.Database.Schema.Node (NodePoly(..)) import Gargantext.Database.Schema.Node (NodePoly(..))
import Gargantext.Prelude import Gargantext.Prelude
...@@ -42,7 +44,7 @@ instance ExtractNgramsT HyperdataContact ...@@ -42,7 +44,7 @@ instance ExtractNgramsT HyperdataContact
extractNgramsT _ncs l hc = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extract l hc extractNgramsT _ncs l hc = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extract l hc
where where
extract :: TermType Lang -> HyperdataContact extract :: TermType Lang -> HyperdataContact
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount)) -> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extract _l hc' = do extract _l hc' = do
let authors = map text2ngrams let authors = map text2ngrams
$ maybe ["Nothing"] (\a -> [a]) $ maybe ["Nothing"] (\a -> [a])
...@@ -59,11 +61,11 @@ instance ExtractNgramsT HyperdataDocument ...@@ -59,11 +61,11 @@ instance ExtractNgramsT HyperdataDocument
extractNgramsT :: NLPServerConfig extractNgramsT :: NLPServerConfig
-> TermType Lang -> TermType Lang
-> HyperdataDocument -> HyperdataDocument
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount)) -> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extractNgramsT ncs lang hd = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extractNgramsT' hd extractNgramsT ncs lang hd = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extractNgramsT' hd
where where
extractNgramsT' :: HyperdataDocument extractNgramsT' :: HyperdataDocument
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount)) -> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extractNgramsT' doc = do extractNgramsT' doc = do
let source = text2ngrams let source = text2ngrams
$ maybe "Nothing" identity $ maybe "Nothing" identity
...@@ -77,14 +79,15 @@ instance ExtractNgramsT HyperdataDocument ...@@ -77,14 +79,15 @@ instance ExtractNgramsT HyperdataDocument
$ maybe ["Nothing"] (splitOn Authors (doc^. hd_bdd)) $ maybe ["Nothing"] (splitOn Authors (doc^. hd_bdd))
$ doc ^. hd_authors $ doc ^. hd_authors
termsWithCounts' <- map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$> termsWithCounts' :: [(NgramsPostag, TermsCount)] <-
map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$>
liftBase (extractTerms ncs lang $ hasText doc) liftBase (extractTerms ncs lang $ hasText doc)
pure $ HashMap.fromList pure $ HashMap.fromList
$ [(SimpleNgrams source, (DM.singleton Sources 1, 1)) ] $ [(SimpleNgrams source, (DM.singleton Sources 1, 1)) ]
<> [(SimpleNgrams i', (DM.singleton Institutes 1, 1)) | i' <- institutes ] <> [(SimpleNgrams i', (DM.singleton Institutes 1, 1)) | i' <- institutes ]
<> [(SimpleNgrams a', (DM.singleton Authors 1, 1)) | a' <- authors ] <> [(SimpleNgrams a', (DM.singleton Authors 1, 1)) | a' <- authors ]
<> [(EnrichedNgrams t', (DM.singleton NgramsTerms 1, cnt')) | (t', cnt') <- termsWithCounts' ] <> [(EnrichedNgrams t', (DM.singleton NgramsTerms 1, cnt')) | (t', cnt') <- termsWithCounts' ]
instance (ExtractNgramsT a, HasText a) => ExtractNgramsT (Node a) instance (ExtractNgramsT a, HasText a) => ExtractNgramsT (Node a)
where where
......
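The `extractNgramsT'` aggregation above concatenates four list comprehensions into one map: sources, institutes and authors each contribute weight 1 and count 1, while enriched terms keep their extraction counts. A simplified, self-contained model of that shape (`Ngram`, `aggregate` and the `String` keys are stand-ins for the real `ExtractedNgrams`/`NgramsType` machinery):

```haskell
import qualified Data.Map.Strict as M

-- Stand-in for SimpleNgrams / EnrichedNgrams keys.
data Ngram = Simple String | Enriched String
  deriving (Eq, Ord, Show)

-- Build the per-document map: metadata ngrams at (weight 1, count 1),
-- enriched terms at (weight 1, their extracted count).
aggregate :: String -> [String] -> [(String, Int)] -> M.Map Ngram (Int, Int)
aggregate source authors termsWithCounts = M.fromList $
     [ (Simple source, (1, 1)) ]
  <> [ (Simple a,   (1, 1))   | a <- authors ]
  <> [ (Enriched t, (1, cnt)) | (t, cnt) <- termsWithCounts ]
```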
...@@ -26,7 +26,7 @@ import Gargantext.Core.Text ( HasText ) ...@@ -26,7 +26,7 @@ import Gargantext.Core.Text ( HasText )
import Gargantext.API.Admin.Orchestrator.Types qualified as API import Gargantext.API.Admin.Orchestrator.Types qualified as API
import Gargantext.Core.Text.Ngrams (NgramsType(..)) import Gargantext.Core.Text.Ngrams (NgramsType(..))
import Gargantext.Core.Text.Terms ( ExtractNgramsT ) import Gargantext.Core.Text.Terms ( ExtractNgramsT )
import Gargantext.Core.Types (HasValidationError, TermsCount) import Gargantext.Core.Types (HasValidationError, TermsCount, TermsWeight)
import Gargantext.Core.Utils.Prefix (unPrefix, unPrefixSwagger) import Gargantext.Core.Utils.Prefix (unPrefix, unPrefixSwagger)
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument ) import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument )
import Gargantext.Database.Admin.Types.Node (NodeId) import Gargantext.Database.Admin.Types.Node (NodeId)
...@@ -67,7 +67,7 @@ type FlowInsertDB a = ( AddUniqId a ...@@ -67,7 +67,7 @@ type FlowInsertDB a = ( AddUniqId a
data DocumentIdWithNgrams a b = data DocumentIdWithNgrams a b =
DocumentIdWithNgrams DocumentIdWithNgrams
{ documentWithId :: Indexed NodeId a { documentWithId :: Indexed NodeId a
, documentNgrams :: HashMap b (Map NgramsType Int, TermsCount) , documentNgrams :: HashMap b (Map NgramsType TermsWeight, TermsCount)
} deriving (Show) } deriving (Show)
......
...@@ -14,20 +14,21 @@ module Gargantext.Database.Action.Flow.Utils ...@@ -14,20 +14,21 @@ module Gargantext.Database.Action.Flow.Utils
, documentIdWithNgrams , documentIdWithNgrams
, insertDocNgrams , insertDocNgrams
, insertDocs , insertDocs
, mapNodeIdNgrams ) , mapNodeIdNgrams
, ngramsByDoc )
where where
import Data.HashMap.Strict (HashMap) import Data.HashMap.Strict (HashMap)
import Data.HashMap.Strict qualified as HashMap import Data.HashMap.Strict qualified as HashMap
import Data.List qualified as List
import Data.Map.Strict qualified as DM import Data.Map.Strict qualified as DM
import Data.Text qualified as T import Data.Text qualified as T
import Gargantext.API.Ngrams.Types qualified as NT import Gargantext.API.Ngrams.Types qualified as NT
import Gargantext.Core (Lang, toDBid) import Gargantext.Core (Lang, toDBid)
import Gargantext.Core.Flow.Types (UniqId, uniqId) import Gargantext.Core.Flow.Types (UniqId, uniqId)
import Gargantext.Core.Text.Ngrams ( Ngrams, NgramsType ) import Gargantext.Core.Text.Ngrams ( Ngrams, NgramsType )
import Gargantext.Core.Text.Terms (ExtractedNgrams(..))
import Gargantext.Core.Text.Terms.WithList (MatchedText, buildPatternsWith, termsInText) import Gargantext.Core.Text.Terms.WithList (MatchedText, buildPatternsWith, termsInText)
import Gargantext.Core.Types (TermsCount) import Gargantext.Core.Types (TermsCount, TermsWeight(..))
import Gargantext.Core.Utils (addTuples) import Gargantext.Core.Utils (addTuples)
import Gargantext.Data.HashMap.Strict.Utils qualified as HashMap import Gargantext.Data.HashMap.Strict.Utils qualified as HashMap
import Gargantext.Database.Action.Flow.Types (DocumentIdWithNgrams(..), FlowInsertDB) import Gargantext.Database.Action.Flow.Types (DocumentIdWithNgrams(..), FlowInsertDB)
...@@ -38,15 +39,15 @@ import Gargantext.Database.Query.Table.ContextNodeNgrams ( ContextNodeNgramsPoly ...@@ -38,15 +39,15 @@ import Gargantext.Database.Query.Table.ContextNodeNgrams ( ContextNodeNgramsPoly
import Gargantext.Database.Query.Table.Node.Document.Add qualified as Doc (add) import Gargantext.Database.Query.Table.Node.Document.Add qualified as Doc (add)
import Gargantext.Database.Query.Table.Node.Document.Insert (ReturnId, addUniqId, insertDb, reId, reInserted, reUniqId) import Gargantext.Database.Query.Table.Node.Document.Insert (ReturnId, addUniqId, insertDb, reId, reInserted, reUniqId)
import Gargantext.Database.Query.Table.Node.Error (HasNodeError(..)) import Gargantext.Database.Query.Table.Node.Error (HasNodeError(..))
import Gargantext.Database.Schema.Context (context_hyperdata, context_id) import Gargantext.Database.Schema.Context (context_oid_hyperdata, context_oid_id)
import Gargantext.Database.Schema.Ngrams (NgramsId, NgramsTypeId(..)) import Gargantext.Database.Schema.Ngrams (NgramsId, NgramsTypeId(..), text2ngrams)
import Gargantext.Database.Types ( Indexed(..), index ) import Gargantext.Database.Types ( Indexed(..), index )
import Gargantext.Prelude import Gargantext.Prelude
import Gargantext.Prelude.Crypto.Hash (Hash) import Gargantext.Prelude.Crypto.Hash (Hash)
insertDocNgrams :: ListId insertDocNgrams :: ListId
-> HashMap (Indexed NgramsId Ngrams) (Map NgramsType (Map DocId (Int, TermsCount))) -> HashMap (Indexed NgramsId Ngrams) (Map NgramsType (Map DocId (TermsWeight, TermsCount)))
-> DBCmd err Int -> DBCmd err Int
insertDocNgrams lId m = do insertDocNgrams lId m = do
-- printDebug "[insertDocNgrams] ns" ns -- printDebug "[insertDocNgrams] ns" ns
...@@ -55,11 +56,11 @@ insertDocNgrams lId m = do ...@@ -55,11 +56,11 @@ insertDocNgrams lId m = do
ns = [ ContextNodeNgrams (nodeId2ContextId docId) ns = [ ContextNodeNgrams (nodeId2ContextId docId)
lId (ng^.index) lId (ng^.index)
(NgramsTypeId $ toDBid t) (NgramsTypeId $ toDBid t)
(fromIntegral i) (fromIntegral $ unTermsWeight w)
cnt cnt
| (ng, t2n2i) <- HashMap.toList m | (ng, t2n2i) <- HashMap.toList m
, (t, n2i) <- DM.toList t2n2i , (t, n2i) <- DM.toList t2n2i
, (docId, (i, cnt)) <- DM.toList n2i , (docId, (w, cnt)) <- DM.toList n2i
] ]
-- [(NodeId, {Ngrams: ({NgramsType: Int}, TermsCount)})] -- [(NodeId, {Ngrams: ({NgramsType: Int}, TermsCount)})]
...@@ -67,28 +68,26 @@ insertDocNgrams lId m = do ...@@ -67,28 +68,26 @@ insertDocNgrams lId m = do
-- Given language, ngrams type, a list of terms and a
-- HyperdataDocument, return ngrams that are in this text, with counts.
-- This is a pure function (doesn't use corenlp nor PostgreSQL FTS).
docNgrams :: Lang docNgrams :: Lang
-> NgramsType
-> [NT.NgramsTerm] -> [NT.NgramsTerm]
-> Gargantext.Database.Admin.Types.Node.Context HyperdataDocument -> ContextOnlyId HyperdataDocument
-> [((MatchedText, TermsCount), -> [(MatchedText, TermsCount)]
Map NgramsType (Map NodeId Int))] docNgrams lang ts doc =
docNgrams lang nt ts doc = (
List.zip termsInText lang (buildPatternsWith lang ts)
(termsInText lang (buildPatternsWith lang ts) $ T.unlines $ catMaybes
$ T.unlines $ catMaybes [ doc ^. context_oid_hyperdata . hd_title
[ doc ^. context_hyperdata . hd_title , doc ^. context_oid_hyperdata . hd_abstract
, doc ^. context_hyperdata . hd_abstract ]
]
) )
(List.cycle [DM.fromList $ [(nt, DM.singleton (doc ^. context_id) 1 )]])
documentIdWithNgrams :: HasNodeError err documentIdWithNgrams :: HasNodeError err
=> (a => ( a
-> DBCmd err (HashMap.HashMap b (Map NgramsType Int, TermsCount))) -> DBCmd err (HashMap.HashMap b (Map NgramsType TermsWeight, TermsCount)) )
-> [Indexed NodeId a] -> [Indexed NodeId a]
-> DBCmd err [DocumentIdWithNgrams a b] -> DBCmd err [DocumentIdWithNgrams a b]
documentIdWithNgrams f = traverse toDocumentIdWithNgrams documentIdWithNgrams f = traverse toDocumentIdWithNgrams
...@@ -103,7 +102,7 @@ mapNodeIdNgrams :: (Ord b, Hashable b) ...@@ -103,7 +102,7 @@ mapNodeIdNgrams :: (Ord b, Hashable b)
=> [DocumentIdWithNgrams a b] => [DocumentIdWithNgrams a b]
-> HashMap.HashMap b -> HashMap.HashMap b
(Map NgramsType (Map NgramsType
(Map NodeId (Int, TermsCount)) (Map NodeId (TermsWeight, TermsCount))
) )
mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . fmap f mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . fmap f
where where
...@@ -112,8 +111,8 @@ mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . f ...@@ -112,8 +111,8 @@ mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . f
-- for it (which is the number of times the terms appears in a -- for it (which is the number of times the terms appears in a
-- document) is copied over to all its types. -- document) is copied over to all its types.
f :: DocumentIdWithNgrams a b f :: DocumentIdWithNgrams a b
-> HashMap.HashMap b (Map NgramsType (Map NodeId (Int, TermsCount))) -> HashMap.HashMap b (Map NgramsType (Map NodeId (TermsWeight, TermsCount)))
f d = fmap (\(ngramsTypeMap, cnt) -> fmap (\i -> DM.singleton nId (i, cnt)) ngramsTypeMap) $ documentNgrams d f d = fmap (\(ngramsTypeMap, cnt) -> fmap (\w -> DM.singleton nId (w, cnt)) ngramsTypeMap) $ documentNgrams d
where where
nId = _index $ documentWithId d nId = _index $ documentWithId d
...@@ -183,3 +182,27 @@ toInserted = ...@@ -183,3 +182,27 @@ toInserted =
-- docs <- List.concat <$> mapM getDocumentsWithParentId corpusIds -- docs <- List.concat <$> mapM getDocumentsWithParentId corpusIds
-- _ <- mapM extractInsert (splitEvery 1000 docs) -- _ <- mapM extractInsert (splitEvery 1000 docs)
-- pure () -- pure ()
-- | Given list of terms and a document, produce a map for this doc's
-- terms count and weights. Notice that the weight is always 1 here.
ngramsByDoc :: Lang
-> NgramsType
-> [NT.NgramsTerm]
-> ContextOnlyId HyperdataDocument
-> HashMap.HashMap ExtractedNgrams (DM.Map NgramsType (Map NodeId (TermsWeight, TermsCount)))
ngramsByDoc l nt ts doc =
HashMap.map (\cnt -> DM.singleton nt $ DM.singleton nId (1, cnt)) extractedMap
where
matched :: [(MatchedText, TermsCount)]
matched = docNgrams l ts doc
nId :: NodeId
nId = doc ^. context_oid_id
withExtractedNgrams :: [(ExtractedNgrams, TermsCount)]
withExtractedNgrams = first (SimpleNgrams . text2ngrams) <$> matched
extractedMap :: HashMap.HashMap ExtractedNgrams TermsCount
extractedMap = HashMap.fromListWith (+) withExtractedNgrams
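The heart of `ngramsByDoc` is `fromListWith (+)`: occurrences of the same matched term are summed into a single count, and only then is the constant weight 1 attached per document. A self-contained sketch of that counting step, using `Data.Map.Strict` in place of the unordered `HashMap`:

```haskell
import qualified Data.Map.Strict as M

-- Fold (term, count) matches into per-term totals, then pair each
-- total with the constant weight 1, as ngramsByDoc does per document.
countMatches :: [(String, Int)] -> M.Map String (Int, Int)  -- (weight, count)
countMatches = M.map (\cnt -> (1, cnt)) . M.fromListWith (+)
```

With `docNgrams` emitting one `(text, 1)` pair per match, repeated matches of the same term collapse into one entry whose count is the number of occurrences.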
...@@ -50,15 +50,18 @@ instance HasText HyperdataDocument ...@@ -50,15 +50,18 @@ instance HasText HyperdataDocument
, _hd_abstract h , _hd_abstract h
] ]
defaultHyperdataDocument :: HyperdataDocument emptyHyperdataDocument :: HyperdataDocument
defaultHyperdataDocument = case decode docExample of emptyHyperdataDocument = HyperdataDocument Nothing Nothing
Just hp -> hp
Nothing -> HyperdataDocument Nothing Nothing
Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing Nothing Nothing Nothing Nothing
defaultHyperdataDocument :: HyperdataDocument
defaultHyperdataDocument = case decode docExample of
Just hp -> hp
Nothing -> emptyHyperdataDocument
where where
docExample :: ByteString docExample :: ByteString
docExample = "{\"doi\":\"sdfds\",\"publication_day\":6,\"language_iso2\":\"en\",\"publication_minute\":0,\"publication_month\":7,\"language_iso3\":\"eng\",\"publication_second\":0,\"authors\":\"Nils Hovdenak, Kjell Haram\",\"publication_year\":2012,\"publication_date\":\"2012-07-06 00:00:00+00:00\",\"language_name\":\"English\",\"realdate_full_\":\"2012 01 12\",\"source\":\"European journal of obstetrics, gynecology, and reproductive biology\",\"abstract\":\"The literature was searched for publications on minerals and vitamins during pregnancy and the possible influence of supplements on pregnancy outcome.\",\"title\":\"Influence of mineral and vitamin supplements on pregnancy outcome.\",\"publication_hour\":0}" docExample = "{\"doi\":\"sdfds\",\"publication_day\":6,\"language_iso2\":\"en\",\"publication_minute\":0,\"publication_month\":7,\"language_iso3\":\"eng\",\"publication_second\":0,\"authors\":\"Nils Hovdenak, Kjell Haram\",\"publication_year\":2012,\"publication_date\":\"2012-07-06 00:00:00+00:00\",\"language_name\":\"English\",\"realdate_full_\":\"2012 01 12\",\"source\":\"European journal of obstetrics, gynecology, and reproductive biology\",\"abstract\":\"The literature was searched for publications on minerals and vitamins during pregnancy and the possible influence of supplements on pregnancy outcome.\",\"title\":\"Influence of mineral and vitamin supplements on pregnancy outcome.\",\"publication_hour\":0}"
......
...@@ -106,6 +106,7 @@ type ContextTitle = Text ...@@ -106,6 +106,7 @@ type ContextTitle = Text
-- | NodePoly indicates that Node has a Polymorphism Type -- | NodePoly indicates that Node has a Polymorphism Type
type Node json = NodePoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) NodeName UTCTime json type Node json = NodePoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) NodeName UTCTime json
type Context json = ContextPoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) ContextTitle UTCTime json type Context json = ContextPoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) ContextTitle UTCTime json
type ContextOnlyId json = ContextPolyOnlyId NodeId json
-- | NodeSearch (queries)
-- type NodeSearch json = NodePolySearch NodeId NodeTypeId UserId (Maybe ParentId) NodeName UTCTime json (Maybe TSVector)
......
@@ -60,9 +60,9 @@ update loggedInUserId (Move sourceId targetId) = do
  mbParentId <- getParentId sourceId
  -- if the source and the target are the same, this is identity.
  if sourceId == targetId
    then pure [ _NodeId sourceId ]
    else do
      isSourceRO <- isNodeReadOnly sourceId
      isTargetRO <- isNodeReadOnly targetId
@@ -70,24 +70,29 @@ update loggedInUserId (Move sourceId targetId) = do
      -- act accordingly.
      ids <- case (isSourceRO, isTargetRO) of
        (False, False)
          -> do
            -- both are not read-only, normal move
            move_db_update sourceId targetId
        (False, True)
          -> do
            void $ publish_node (SourceId sourceId) (TargetId targetId) NPP_publish_no_edits_allowed
            move_db_update sourceId targetId
        (True, False)
          -> do
            -- the source is read only. If we are the owner we allow unpublishing.
            -- FIXME(adn) is this check enough?
            sourceNode <- getNode sourceId
            if _node_user_id sourceNode == loggedInUserId
              then do
                userPublicFolderNode <- getUserRootPublicNode loggedInUserId
                unpublishNode (SourceId $ sourceId) (TargetId $ _node_id userPublicFolderNode)
                move_db_update sourceId targetId
              else
                nodeError (NodeIsReadOnly targetId "logged user is not allowed to move/unpublish a read-only node")
        (True, True)
          -> do
            -- this case is not allowed.
            nodeError (NodeIsReadOnly targetId "Both the source and the target are read-only.")
      for_ mbParentId $ CE.ce_notify . CE.UpdateTreeFirstLevel
      CE.ce_notify $ CE.UpdateTreeFirstLevel targetId
......
@@ -22,6 +22,7 @@ module Gargantext.Database.Query.Table.NodeContext
  , queryNodeContextTable
  , selectDocsDates
  , selectDocNodes
  , selectDocNodesOnlyId
  , selectDocs
  , nodeContextsCategory
  , nodeContextsScore
@@ -413,6 +414,15 @@ queryDocNodes cId = proc () -> do
  restrict -< (c ^. context_typename) .== sqlInt4 (toDBid NodeDocument)
  returnA -< c
selectDocNodesOnlyId :: HasDBid NodeType => CorpusId -> DBCmd err [ContextOnlyId HyperdataDocument]
selectDocNodesOnlyId cId = runOpaQuery (queryDocNodesOnlyId cId)
queryDocNodesOnlyId :: HasDBid NodeType => CorpusId -> O.Select ContextOnlyIdRead
queryDocNodesOnlyId cId = proc () -> do
c <- queryDocNodes cId -< ()
returnA -< ContextOnlyId (c ^. context_id) (c ^. context_hyperdata)
joinInCorpus :: O.Select (ContextRead, MaybeFields NodeContextRead)
joinInCorpus = proc () -> do
  c <- queryContextTable -< ()
......
@@ -52,6 +52,25 @@ $(makeLenses ''ContextPoly)
$(makeAdaptorAndInstance "pContext" ''ContextPoly)
$(makeLensesWith abbreviatedFields ''ContextPoly)
------------------------------------------------------------------------
-- | This datatype describes queries in the `contexts` table, where
-- only `id` and `hyperdata` are fetched.
data ContextPolyOnlyId id hyperdata =
ContextOnlyId { _context_oid_id :: !id
, _context_oid_hyperdata :: !hyperdata }
deriving (Show, Generic)
------------------------------------------------------------------------
-- Automatic instances derivation
$(deriveJSON (unPrefix "_context_oid_") ''ContextPolyOnlyId)
$(makeLenses ''ContextPolyOnlyId)
$(makeAdaptorAndInstance "pContextOnlyId" ''ContextPolyOnlyId)
$(makeLensesWith abbreviatedFields ''ContextPolyOnlyId)
------------------------------------------------------------------------
contextTable :: Table ContextWrite ContextRead
contextTable = Table "contexts" (pContext Context { _context_id      = optionalTableField "id"
                                                  , _context_hash_id = optionalTableField "hash_id"
@@ -87,6 +106,10 @@ type ContextRead = ContextPoly (Field SqlInt4 )
                               (Field SqlText       )
                               (Field SqlTimestamptz)
                               (Field SqlJsonb      )
type ContextOnlyIdRead = ContextPolyOnlyId (Field SqlInt4 )
(Field SqlJsonb )
------------------------------------------------------------------------
-- | Context(Read|Write)Search is slower than Context(Write|Read) use it
-- for full text search only
......
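As a side note on the `ContextPolyOnlyId` JSON derivation above: a minimal standalone sketch of the assumed prefix-stripping behaviour (`unPrefix "_context_oid_"` is taken to drop the record-field prefix, so the frontend sees the keys `id` and `hyperdata`). `CtxOnlyId` and `jsonOptions` below are hypothetical names, not part of the codebase:

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson
import GHC.Generics (Generic)

-- Hypothetical standalone analogue of ContextPolyOnlyId, specialised to
-- Int/String for illustration.
data CtxOnlyId = CtxOnlyId
  { _context_oid_id        :: Int
  , _context_oid_hyperdata :: String
  } deriving (Show, Eq, Generic)

-- Assumed behaviour of unPrefix: strip the record-field prefix from the
-- JSON keys, leaving "id" and "hyperdata".
jsonOptions :: Options
jsonOptions = defaultOptions { fieldLabelModifier = drop (length "_context_oid_") }

instance ToJSON CtxOnlyId   where toJSON    = genericToJSON jsonOptions
instance FromJSON CtxOnlyId where parseJSON = genericParseJSON jsonOptions

main :: IO ()
main = do
  let ctx = CtxOnlyId 1 "doc"
  -- the value survives an encode/decode round-trip
  print (decode (encode ctx) == Just ctx)
```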
...@@ -257,15 +257,15 @@ ...@@ -257,15 +257,15 @@
git: "https://gitlab.iscpif.fr/gargantext/gargantext-graph.git" git: "https://gitlab.iscpif.fr/gargantext/gargantext-graph.git"
subdirs: subdirs:
- . - .
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478 - commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee" git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs: subdirs:
- "haskell-bee-pgmq/" - "haskell-bee-pgmq/"
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478 - commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee" git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs: subdirs:
- "haskell-bee-tests/" - "haskell-bee-tests/"
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478 - commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee" git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs: subdirs:
- "haskell-bee/" - "haskell-bee/"
......
@@ -22,8 +22,6 @@ data_filepath = "~/.garg"
#repo_filepath = "~/.garg"

[apis]
[apis.pubmed]
api_key = "no_key"
[apis.epo]
api_url = ""
......
@@ -11,6 +11,7 @@ Portability : POSIX
{-# OPTIONS_GHC -Wno-orphans #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
@@ -20,28 +21,40 @@ module Test.API.Notifications (
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM.TChan
import Control.Concurrent.STM.TSem (newTSem, signalTSem, TSem)
import Control.Lens ((^.))
import Control.Monad (void)
import Control.Monad.STM (atomically)
import Data.Aeson qualified as Aeson
import Data.ByteString qualified as BS
import Data.Text qualified as T
import Data.Text.Encoding qualified as TE
import Fmt ((+|), (|+))
import Gargantext.API.Admin.Auth.Types (AuthResponse, authRes_token, authRes_tree_id)
import Gargantext.Core.Config (gc_notifications_config)
import Gargantext.Core.Notifications.CentralExchange qualified as CE
import Gargantext.Core.Notifications.CentralExchange.Types qualified as CET
import Gargantext.Core.Notifications.Dispatcher.Types qualified as DT
import Gargantext.Core.Types.Individu (GargPassword(..))
import Gargantext.System.Logging (withLogger)
import Network.WebSockets qualified as WS
import Prelude
import System.Timeout qualified as Timeout
import Test.API.Prelude (newCorpusForUser)
import Test.API.Routes (mkUrl)
import Test.API.Setup (SpecContext(..), dbEnvSetup, withTestDBAndPort)
import Test.Database.Types (test_config)
import Test.Hspec
import Test.Hspec.Wai.Internal (withApplication)
import Test.Instances ()
import Test.Utils (protected, waitForTChanValue, waitForTSem, withValidLoginA)
import Test.Utils.Notifications (withAsyncWSConnection)
import Text.RawString.QQ (r)
tests :: Spec
tests = sequential $ around withTestDBAndPort $ do
  describe "Notifications" $ do
    it "ping WS notification works" $ \(SpecContext testEnv port _app _) -> do
      let nc = (test_config testEnv) ^. gc_notifications_config
@@ -54,20 +67,8 @@ tests = sequential $ aroundAll withTestDBAndPort $ do
      -- locking mechanisms than blindly call 'threadDelay'.
      wsTSem <- atomically $ newTSem 0
      tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
      withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
        -- wait for ws process to inform us about topic subscription
        waitForTSem wsTSem 500
@@ -133,31 +134,99 @@ tests = sequential $ aroundAll withTestDBAndPort $ do
        -- wait for the value
        waitForTChanValue tchan Nothing 1_000
    describe "Update tree notifications" $ do
      it "simple WS notification works" $ \(SpecContext testEnv port _app _) -> do
        let nc = (test_config testEnv) ^. gc_notifications_config
        let topic = DT.UpdateTree 0
        wsTSem <- atomically $ newTSem 0  -- initially locked
        tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
        withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
          waitForTSem wsTSem 500
          let nodeId = 0
          CE.notify nc $ CET.UpdateTreeFirstLevel nodeId
          waitForTChanValue tchan (Just $ DT.NUpdateTree nodeId) 1_000

      it "WS notification on node creation works" $ \ctx@(SpecContext _testEnv port app _) -> do
        checkNotification ctx $ \authRes -> do
          let token = authRes ^. authRes_token
          let treeId = authRes ^. authRes_tree_id
          let query = [r| {"pn_name": "test", "pn_typename": "NodeCorpus"} |]
          void $ withApplication app $ do
            protected token "POST" (mkUrl port $ "/node/" +| treeId |+ "") query

      it "WS notification on node deletion works" $ \ctx@(SpecContext testEnv port app _) -> do
        checkNotification ctx $ \authRes -> do
          let token = authRes ^. authRes_token
          cId <- newCorpusForUser testEnv "alice"
          void $ withApplication app $ do
            protected token "DELETE" (mkUrl port $ "/node/" +| cId |+ "") ""

      it "WS notification on node rename works" $ \ctx@(SpecContext testEnv port app _) -> do
        checkNotification ctx $ \authRes -> do
          let token = authRes ^. authRes_token
          cId <- newCorpusForUser testEnv "alice"
          void $ withApplication app $ do
            let query = [r| {"name": "newName"} |]
            protected token "PUT" (mkUrl port $ "/node/" +| cId |+ "/rename") query

      it "WS notification on node move works" $ \ctx@(SpecContext testEnv port app _) -> do
        checkNotification ctx $ \authRes -> do
          let token = authRes ^. authRes_token
          cId <- newCorpusForUser testEnv "alice"
          cId2 <- newCorpusForUser testEnv "alice"
          void $ withApplication app $ do
            let query = BS.fromStrict $ TE.encodeUtf8 $ "[" <> (T.pack $ show cId2) <> "]"
            protected token "PUT" (mkUrl port $ "/node/" +| cId |+ "/move/" +| cId2 |+ "") query
-- | Given spec context and an action, call that action to perform
-- some node tree update, and check that there was a notification
-- about this tree update.
checkNotification :: SpecContext a
-> (AuthResponse -> IO ())
-> IO ()
checkNotification ctx@(SpecContext _testEnv port _app _) act = do
_ <- dbEnvSetup ctx
withValidLoginA port "alice" (GargPassword "alice") $ \_clientEnv authRes -> do
-- Subscribe to user tree notifications
let treeId = authRes ^. authRes_tree_id
let topic = DT.UpdateTree treeId
wsTSem <- atomically $ newTSem 0 -- initially locked
tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
waitForTSem wsTSem 500
act authRes
waitForTChanValue tchan (Just $ DT.NUpdateTree treeId) 1_000
wsConnection :: DT.Topic
-> TSem
-> TChan (Maybe DT.Notification)
-> WS.Connection
-> IO ()
wsConnection topic wsTSem tchan conn = withLogger () $ \_ioL -> do
-- logMsg ioL DEBUG $ "[wsConnect] subscribing topic: " <> show topic
WS.sendTextData conn $ Aeson.encode (DT.WSSubscribe topic)
-- inform the test process that we sent the subscription request
atomically $ signalTSem wsTSem
-- logMsg ioL DEBUG $ "[wsConnect] waiting for notification"
d <- WS.receiveData conn
-- logMsg ioL DEBUG $ "[wsConnect] received " <> show d
let dec = Aeson.decode d :: Maybe DT.Notification
atomically $ writeTChan tchan dec
@@ -16,8 +16,7 @@ import Data.Aeson qualified as JSON
import Data.Text qualified as T
import Gargantext.API.Errors
import Gargantext.Core.Types.Individu
import Gargantext.Core.Types (NodeId, NodeType(..))
import Gargantext.Core.Worker.Env () -- instance HasNodeError
import Gargantext.Database.Action.User
import Gargantext.Database.Admin.Types.Hyperdata.Corpus
......
@@ -186,7 +186,7 @@ dbEnvSetup ctx = do
  _ <- createAliceAndBob testEnv
  pure ctx

-- show the full exceptions during testing, rather than swallowing them under a generic
-- "Something went wrong".
showDebugExceptions :: SomeException -> Wai.Response
......
{-|
Module : Core.Orchestrator
Description :
Copyright : (c) CNRS, 2017-Present
License : AGPL + CECILL v3
Maintainer : team@gargantext.org
Stability : experimental
Portability : POSIX
-}
module Test.Core.Orchestrator
( qcTests )
where
import Data.Aeson qualified as A
import Gargantext.API.Admin.Orchestrator.Types
import Gargantext.Prelude
import Test.Instances ()
import Test.Tasty
import Test.Tasty.QuickCheck qualified as QC
qcTests :: TestTree
qcTests =
testGroup "Orchestrator QuickCheck tests" $ do
[ QC.testProperty "ExternalAPIs aeson encoding" $ \m -> A.decode (A.encode (m :: ExternalAPIs)) == Just m ]
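The round-trip property above can be shown in isolation; a minimal sketch of the same pattern, instantiated at a stock type instead of `ExternalAPIs` (which needs the project's `Arbitrary` instance):

```haskell
import Data.Aeson (decode, encode)
import Test.QuickCheck

-- Same encode-then-decode round-trip property as in the Orchestrator test,
-- at a type with stock Aeson and Arbitrary instances.
prop_jsonRoundtrip :: [(Int, String)] -> Bool
prop_jsonRoundtrip m = decode (encode m) == Just m

main :: IO ()
main = quickCheck prop_jsonRoundtrip
```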
@@ -23,13 +23,15 @@ import Data.Text qualified as T
import Data.Validity (Validation(..), ValidationChain (..), prettyValidation)
import EPO.API.Client.Types qualified as EPO
import Gargantext.API.Admin.Auth.Types (AuthenticatedUser, ForgotPasswordAsyncParams)
import Gargantext.API.Admin.Orchestrator.Types qualified as Orch
import Gargantext.API.Errors.Types qualified as Errors
import Gargantext.API.Ngrams.Types qualified as Ngrams
import Gargantext.API.Node.Contact.Types (AddContactParams)
import Gargantext.API.Node.Corpus.Annuaire (AnnuaireWithForm)
import Gargantext.API.Node.Corpus.New (ApiInfo)
import Gargantext.API.Node.Corpus.New.Types (FileFormat, FileType)
import Gargantext.API.Node.Corpus.Types (Datafield)
import Gargantext.API.Node.Corpus.Types qualified as CT
import Gargantext.API.Node.DocumentsFromWriteNodes.Types qualified as DFWN
import Gargantext.API.Node.DocumentUpload.Types (DocumentUpload)
import Gargantext.API.Node.FrameCalcUpload.Types qualified as FCU
@@ -127,7 +129,9 @@ instance Arbitrary ApiInfo where arbitrary = genericArbitrary
instance Arbitrary FileFormat where arbitrary = genericArbitrary
instance Arbitrary FileType where arbitrary = genericArbitrary
instance Arbitrary CT.Database where
  arbitrary = oneof [ pure CT.Empty
                    , CT.DB <$> arbitrary ]
instance Arbitrary Datafield where arbitrary = genericArbitrary
instance Arbitrary WithQuery where arbitrary = genericArbitrary
@@ -291,11 +295,34 @@ instance Arbitrary Hyperdata.HyperdataPublic where
  arbitrary = pure Hyperdata.defaultHyperdataPublic
instance Arbitrary Orch.ExternalAPIs where
arbitrary = oneof [ pure Orch.OpenAlex
, Orch.PubMed <$> arbitrary
, pure Orch.Arxiv
, pure Orch.HAL
, pure Orch.IsTex
, pure Orch.Isidore
, Orch.EPO <$> arbitrary <*> arbitrary ]
-- instance Arbitrary NewWithFile where
--   arbitrary = NewWithFile <$> arbitrary -- _wfi_b64_data
--                           <*> arbitrary -- _wf_lang
--                           <*> arbitrary -- _wf_name
instance Arbitrary Orch.ScraperEvent where
arbitrary = Orch.ScraperEvent <$> elements [Nothing, Just "test message"]
<*> elements [Nothing, Just "INFO", Just "WARN"]
<*> elements [Nothing, Just "2018-04-18"]
instance Arbitrary Orch.JobLog where
arbitrary = Orch.JobLog
<$> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary
instance Arbitrary NewWithForm where arbitrary = genericArbitrary

instance Arbitrary RenameNode where
......
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
{-# LANGUAGE TypeApplications #-}
module Test.Ngrams.Terms (tests) where
import Data.HashMap.Strict qualified as HashMap
import Data.Map.Strict qualified as Map
import Gargantext.API.Ngrams
import Gargantext.Core (Lang(..))
import Gargantext.Core.Text.Context (TermList)
import Gargantext.Core.Text.Ngrams (NgramsType(NgramsTerms), Ngrams(..))
import Gargantext.Core.Text.Terms (ExtractedNgrams(..))
import Gargantext.Core.Text.Terms.WithList (buildPatterns, buildPatternsWith, extractTermsWithList', termsInText, Pattern(..))
import Gargantext.Database.Action.Flow.Utils (docNgrams, ngramsByDoc)
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument(..), emptyHyperdataDocument )
import Gargantext.Database.Admin.Types.Node (NodeId(..))
import Gargantext.Database.Schema.Context ( ContextPolyOnlyId(..) )
import Gargantext.Prelude
import Test.Tasty
import Test.Tasty.HUnit
tests :: TestTree
tests = testGroup "Ngrams" [unitTests]
unitTests :: TestTree
unitTests = testGroup "Terms tests"
[ -- Sorting
testCase "Build patterns works 01" testBuildPatterns01
, testCase "Build patterns works 02" testBuildPatterns02
, testCase "termsInText works 01" testTermsInText01
, testCase "termsInText works 02" testTermsInText02
, testCase "termsInText works 03" testTermsInText03
, testCase "termsInText works 04 (related to issue #221)" testTermsInText04
, testCase "extractTermsWithList' works 01" testExtractTermsWithList'01
, testCase "docNgrams works 01" testDocNgrams01
, testCase "docNgrams works 02" testDocNgrams02
, testCase "ngramsByDoc works 01" testNgramsByDoc01
]
-- | Let's document how the `buildPatternsWith` function works.
testBuildPatterns01 :: Assertion
testBuildPatterns01 = do
let terms = ["hello world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
length enPatterns @?= 1
let Just pat = head enPatterns
_pat_length pat @?= 2
_pat_terms pat @?= ["hello", "world"]
-- | Let's document how the `buildPatternsWith` function works.
testBuildPatterns02 :: Assertion
testBuildPatterns02 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
length enPatterns @?= 2
let [pat1, pat2] = enPatterns
_pat_length pat1 @?= 1
_pat_terms pat1 @?= ["hello"]
_pat_length pat2 @?= 1
_pat_terms pat2 @?= ["world"]
-- | Let's document how the `termsInText` function works.
testTermsInText01 :: Assertion
testTermsInText01 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 1)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
testTermsInText02 :: Assertion
testTermsInText02 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world, hello!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 2)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
testTermsInText03 :: Assertion
testTermsInText03 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world, again!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 1)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
-- https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/221
testTermsInText04 :: Assertion
testTermsInText04 = do
let terms = ["feuilles de basilic"] :: [NgramsTerm]
let frPatterns = buildPatternsWith FR terms
let tit = termsInText FR frPatterns "Infos pratiques Nombre de personnes 1 personne Quantité 1 verre Temps de préparation 5 minutes Degré de difficulté Très facile Coût Abordable Les ingrédients de la recette 4 feuilles de basilic 1 branche de romarin 15 ml de citron jaune 60 ml d'eau gazeuse au mastiqua 90 ml de Bulles de Muscat Jaillance La préparation de la recette Verser dans un verre type long drink le citron jaune, les feuilles de basilic et l'eau gazeuse."
length tit @?= 1
let [tit1] = tit
tit1 @?= ("feuilles de basilic", 2)
testExtractTermsWithList'01 :: Assertion
testExtractTermsWithList'01 = do
let termList = [(["chat blanc"], [["chat","blanc"]])] :: TermList
extractTermsWithList' (buildPatterns termList) "Le chat blanc" @?= ["chat blanc"]
testDocNgrams01 :: Assertion
testDocNgrams01 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let hd = emptyHyperdataDocument { _hd_title = Just "hello world"
, _hd_abstract = Nothing }
let ctx = ContextOnlyId 1 hd
let dNgrams = docNgrams EN terms ctx
length dNgrams @?= 2
testDocNgrams02 :: Assertion
testDocNgrams02 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let hd = emptyHyperdataDocument { _hd_title = Just "hello world, kaboom"
, _hd_abstract = Nothing }
let ctx = ContextOnlyId 1 hd
let dNgrams = docNgrams EN terms ctx
length dNgrams @?= 2
testNgramsByDoc01 :: Assertion
testNgramsByDoc01 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let hd1 = emptyHyperdataDocument { _hd_title = Just "hello world, kaboom"
, _hd_abstract = Nothing }
let ctx1 = ContextOnlyId 1 hd1
let hd2 = emptyHyperdataDocument { _hd_title = Just "world, boom world"
, _hd_abstract = Nothing }
let ctx2 = ContextOnlyId 2 hd2
ngramsByDoc EN NgramsTerms terms ctx1 @?=
HashMap.fromList
[ ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "hello", _ngramsSize = 1 }
, Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 1) (1, 1) )
, ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "world", _ngramsSize = 1 }
, Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 1) (1, 1) )
]
ngramsByDoc EN NgramsTerms terms ctx2 @?=
HashMap.fromList
[ ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "world", _ngramsSize = 1 }
, Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 2) (1, 2) )
]
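As a rough mental model for the `termsInText` assertions above, a toy standalone counter is sketched below. It only matches single lower-cased words, unlike the real pattern matcher, which handles multi-word terms; `termCounts` is a hypothetical helper, not project code:

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as M

-- Toy model of the behaviour the tests document: for each queried term,
-- count its (case-insensitive) occurrences in the text, keeping only
-- terms that occur at least once.
termCounts :: [String] -> String -> M.Map String Int
termCounts terms txt =
  let ws      = words (map (\c -> if isAlpha c then toLower c else ' ') txt)
      count t = length (filter (== t) ws)
  in M.fromList [ (t, count t) | t <- terms, count t > 0 ]

main :: IO ()
main = print (termCounts ["hello", "world"] "Hello, world, hello!")
```

Run against the input of `testTermsInText02`, this yields "hello" twice and "world" once, matching the assertions.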
@@ -63,6 +63,8 @@ tests = testGroup "JSON" [
  , testProperty "NodeType roundtrips" (jsonEnumRoundtrip (Dict @_ @NodeType))
  , testProperty "NodePublishPolicy roundtrips" (jsonEnumRoundtrip (Dict @_ @NodePublishPolicy))
  , testCase "WithQuery frontend compliance" testWithQueryFrontend
  , testCase "WithQuery frontend compliance (PubMed)" testWithQueryFrontendPubMed
  , testCase "WithQuery frontend compliance (EPO)" testWithQueryFrontendEPO
  , testGroup "Phylo" [
      testProperty "PeriodToNode" (jsonRoundtrip @PeriodToNodeData)
    , testProperty "GraphData" (jsonRoundtrip @GraphData)
@@ -83,11 +85,37 @@ testWithQueryFrontend = do
    Left err -> fail $ "JSON instance will break frontend!: JSON decoding returned: " <> err
    Right _ -> pure ()
testWithQueryFrontendPubMed :: Assertion
testWithQueryFrontendPubMed = do
case eitherDecode @WithQuery (C8.pack cannedWithQueryPayloadPubMed) of
Left err -> fail $ "JSON instance will break frontend (PubMed)!: JSON decoding returned: " <> err
Right _ -> pure ()
testWithQueryFrontendEPO :: Assertion
testWithQueryFrontendEPO = do
case eitherDecode @WithQuery (C8.pack cannedWithQueryPayloadEPO) of
Left err -> fail $ "JSON instance will break frontend (EPO)!: JSON decoding returned: " <> err
Right _ -> pure ()
-- The aim of this type is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail, and we will be notified.
cannedWithQueryPayload :: String
cannedWithQueryPayload = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "Arxiv"}}} |]
-- The aim of this type is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail, and we will be notified.
cannedWithQueryPayloadPubMed :: String
cannedWithQueryPayloadPubMed = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "PubMed", "api_key": "x"}}} |]
-- The aim of this value is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail, and we would be notified.
cannedWithQueryPayloadEPO :: String
cannedWithQueryPayloadEPO = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "EPO", "api_user": "user", "api_token": "token"}}} |]
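
-- The three frontend tests above share one pattern. A sketch of a shared
-- helper (hypothetical name, not part of this MR) that also guards the
-- encode direction, assuming aeson's 'encode' is in scope:
--
-- assertFrontendRoundtrip :: String -> Assertion
-- assertFrontendRoundtrip payload =
--   case eitherDecode @WithQuery (C8.pack payload) of
--     Left err -> fail $ "JSON decoding returned: " <> err
--     Right wq -> case eitherDecode @WithQuery (encode wq) of
--       Left err -> fail $ "re-decoding the encoded value failed: " <> err
--       Right _  -> pure ()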
testParseBpaPhylo :: Assertion
testParseBpaPhylo = do
...@@ -12,6 +12,9 @@ module Main where

import Gargantext.Prelude
import qualified Test.Core.Notifications as Notifications
import qualified Test.Core.Orchestrator as Orchestrator
import qualified Test.Core.Similarity as Similarity
import qualified Test.Core.Text.Corpus.Query as CorpusQuery
import qualified Test.Core.Text.Corpus.TSV as TSVParser
import qualified Test.Core.Utils as Utils
...@@ -20,19 +23,27 @@ import qualified Test.Graph.Clustering as Graph
import qualified Test.Ngrams.Lang.Occurrences as Occurrences
import qualified Test.Ngrams.NLP as NLP
import qualified Test.Ngrams.Query as NgramsQuery
import qualified Test.Ngrams.Terms as NgramsTerms
import qualified Test.Offline.Errors as Errors
import qualified Test.Offline.JSON as JSON
import qualified Test.Offline.Phylo as Phylo
import qualified Test.Offline.Stemming.Lancaster as Lancaster
import qualified Test.Parsers.Date as PD
import qualified Test.Utils.Crypto as Crypto
import qualified Test.Utils.Jobs as Jobs
import System.IO (hGetBuffering, hSetBuffering)
import Test.Tasty
import Test.Tasty.Hspec
-- | Save stdout's buffering mode and restore it after the action runs.
-- See https://mercurytechnologies.github.io/ghciwatch/integration/tasty.html
protectStdoutBuffering :: IO a -> IO a
protectStdoutBuffering action =
  bracket
    (hGetBuffering stdout)
    (\bufferMode -> hSetBuffering stdout bufferMode)
    (const action)
main :: IO ()
main = do
  utilSpec <- testSpec "Utils" Utils.test
...@@ -45,7 +56,7 @@ main = do
  asyncUpdatesSpec <- testSpec "Notifications" Notifications.test
  occurrencesSpec <- testSpec "Occurrences" Occurrences.test
  protectStdoutBuffering $ defaultMain $ testGroup "Gargantext"
    [ utilSpec
    , clusteringSpec
    , dateSplitSpec
...@@ -65,4 +76,6 @@ main = do
    , Worker.tests
    , asyncUpdatesSpec
    , Notifications.qcTests
    , Orchestrator.qcTests
    , NgramsTerms.tests
    ]