Squashed commit of the following:

commit be879b1e
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Thu Jan 30 18:22:44 2025 +0100

    [ngrams] code fixes according to review

    Related MR:
    !378

commit bf89561b
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 21:11:47 2025 +0100

    [test] notification on node move

    Also, some small refactorings.

commit 3d5d74ab
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 20:13:44 2025 +0100

    [tests] add notifications func comment, fix core/notifications indent

commit b8ea3af2
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 19:13:35 2025 +0100

    [update-project-dependencies]

commit 1217baf4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Wed Jan 22 19:09:17 2025 +0100

    [tests] notifications: test async notifications for update tree

    Related to
    #418

commit 874785e9
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 21 06:06:27 2025 +0100

    [refactor] unify Database & ExternalIDs

    These types are the same, except for Database.Empty

    I managed to keep backwards compatibility with the frontend format,
    so the frontend doesn't need any changes.
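
    For context, a sketch of the wire format that stays compatible
    (field names taken from the ToJSON instances later in this diff;
    the payloads below are illustrative, not taken from real traffic):

    ```json
    { "db": "Empty" }
    { "db": "PubMed", "api_key": "some-key" }
    { "db": "EPO", "api_user": "some-user", "api_token": "some-token" }
    ```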

commit e7b16520
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 21 06:05:57 2025 +0100

    [cabal] upgrade haskell-bee to fix TSRetry and ESRepeat issues

commit ad045ae0
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 20 06:32:49 2025 +0100

    [cabal] upgrade haskell-bee tag

commit b3910bb4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 10:56:12 2025 +0100

    [test] move some Arbitrary instances to Test/Instances.hs

commit bb282d02
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 09:17:23 2025 +0100

    [test] WithQuery offline test (with EPO constructor)

commit c0fe2e51
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 14 06:59:45 2025 +0100

    [query] move EPO user/token into the datafield

    This simplifies the WithQuery structure even more

commit 93586adc
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 13 17:45:42 2025 +0100

    [tests] fix WithQuery frontend serialization test

    Also, add WithQuery pubmed test (with api_key)

commit bc29319c
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Jan 13 10:13:15 2025 +0100

    [ngrams] simplify WithQuery json structure

    There is now only a 'datafield' field; no need for the duplicated
    'database' one.

    Related to #441
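
    A sketch of the simplified WithQuery payload (keys inferred from the
    `_wq_` record fields changed in this diff, assuming the usual prefix
    stripping; the values and the exact 'datafield' encoding are
    illustrative):

    ```json
    {
      "query": "some query",
      "datafield": { "db": "PubMed", "api_key": "optional-key" },
      "lang": "EN",
      "node_id": 123
    }
    ```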

commit e6fdbee4
Merge: 95dc32b3 13457ca8
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Fri Jan 10 12:03:59 2025 +0100

    Merge branch 'dev' into 224-dev-understanding-ngrams

commit 95dc32b3
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 20:01:11 2025 +0100

    [ngrams] refactor PubMed DB type (to include Maybe APIKey)

commit baa2491f
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 18:09:04 2025 +0100

    [refactor] searx search refactoring

commit fcf83bf7
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Tue Jan 7 11:14:03 2025 +0100

    [ngrams] more type annotations

commit 0d8a77c4
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 16:15:07 2024 +0100

    [ngrams, test] refactor: Count -> Terms

commit 85f1dffe
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 14:35:05 2024 +0100

    [ngrams] refactor opaque Int into TermsWeight newtype

commit a81ea049
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 14:34:39 2024 +0100

    [CLI] fix limit removal

    It wasn't used anyway.

commit d1dfbf79
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:35:41 2024 +0100

    [ngrams] one more simplification in ngramsByDoc'

commit fcb48b8f
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:33:33 2024 +0100

    [ngrams] some more simplification of ngramsByDoc'

commit ab7c1766
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 30 11:00:19 2024 +0100

    [ngrams, tests] understanding ngramsByDoc better

commit 35c2d0b0
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 23 21:20:29 2024 +0100

    [ngrams] small simplification to docNgrams function

commit 161ac077
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Mon Dec 23 18:35:59 2024 +0100

    [ngrams] annotate types of ngrams algorithms

commit 08c7c91c
Author: Przemysław Kaminski <pk@intrepidus.pl>
Date:   Sat Dec 21 09:45:00 2024 +0100

    [ngrams] improve function documentation, add types, add unit tests

    I want to understand ngrams algorithms better.
parent 942e663f
......@@ -371,7 +371,7 @@ https://haskell-language-server.readthedocs.io/en/latest/installation.html
Running the tests can be done via the following command:
```hs
```shell
cabal v2-test --test-show-details=streaming --flags 'test-crypto no-phylo-debug-logs'
```
......@@ -383,10 +383,24 @@ The flags have the following meaning:
In order for some tests to run (like the phylo ones) it is **required** to install the `gargantext-cli` via:
```hs
```shell
cabal v2-install gargantext:exe:gargantext
```
For tasty, if you want to run a specific test (via patterns), use:
```shell
cabal v2-run garg-test-tasty -- -p '/Ngrams/'
```
For integration tests, do:
```shell
cabal v2-test garg-test-hspec --test-show-details=streaming --test-option=--match='/some pattern/'
```
You can also use [ghciwatch](https://mercurytechnologies.github.io/ghciwatch/) for testing:
```shell
ghciwatch --command "cabal v2-repl garg-test-tasty" --after-startup-ghci ':set args "--pattern" "/Ngrams/"' --after-startup-ghci "Main.main" --after-reload-ghci "Main.main" --watch src --watch test
```
### Modifying a golden test to accept a new (expected) output
Some tests, like the Phylo one, use golden testing to ensure that the JSON Phylo we generate is
......@@ -394,7 +408,7 @@ the same as an expected one. This allows us to catch regressions in the serialis
Sometimes, however, we genuinely want to modify the output so that it's the new reference (i.e. the new
golden reference). To do so, it's enough to run the testsuite passing the `--accept` flag, for example:
```hs
```shell
cabal v2-test garg-test-tasty --test-show-details=streaming --flags 'test-crypto no-phylo-debug-logs' --test-option=--pattern='/Phylo/' --test-option=--accept
```
......
......@@ -26,7 +26,6 @@ import Gargantext.API.Node () -- instances
import Gargantext.Core (Lang(..))
import Gargantext.Core.Text.Corpus.Parsers (FileFormat(..), FileType(..))
import Gargantext.Core.Types.Individu (User(..))
import Gargantext.Core.Types.Query
import Gargantext.Database.Action.Flow (flowCorpusFile, flowAnnuaire, TermType(..))
import Gargantext.Database.Action.Flow.Types (FlowCmdM)
import Gargantext.Database.Admin.Types.Node (CorpusId)
......@@ -39,16 +38,16 @@ import qualified Data.Text as T
importCLI :: ImportArgs -> IO ()
importCLI (ImportArgs fun user name settingsPath limit corpusPath) = do
importCLI (ImportArgs fun user name settingsPath corpusPath) = do
let
tt = Multi EN
format = TsvGargV3
corpus :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
mkCorpusUser = MkCorpusUserNormalCorpusName (UserName $ cs user) (cs name :: Text)
corpus = flowCorpusFile mkCorpusUser limit tt format Plain corpusPath Nothing DevJobHandle
corpus = flowCorpusFile mkCorpusUser tt format Plain corpusPath Nothing DevJobHandle
corpusTsvHal :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
corpusTsvHal = flowCorpusFile mkCorpusUser limit tt TsvHal Plain corpusPath Nothing DevJobHandle
corpusTsvHal = flowCorpusFile mkCorpusUser tt TsvHal Plain corpusPath Nothing DevJobHandle
annuaire :: forall m. (FlowCmdM DevEnv BackendInternalError m, MonadJobStatus m, JobHandle m ~ DevJobHandle) => m CorpusId
annuaire = flowAnnuaire (MkCorpusUserNormalCorpusName (UserName $ cs user) "Annuaire") (Multi EN) corpusPath DevJobHandle
......@@ -76,7 +75,6 @@ import_p = fmap CCMD_import $ ImportArgs
<*> ( option str ( long "user") )
<*> ( option str ( long "name") )
<*> settings_p
<*> (fmap Limit ( option auto ( long "limit" <> metavar "INT" <> help "The limit for the query") ))
<*> ( option str ( long "corpus-path" <> help "Path to corpus file") )
function_p :: String -> Either String ImportFunction
......
......@@ -19,7 +19,6 @@ import Data.Text (Text)
import Gargantext.API.Admin.EnvTypes (Mode)
import Gargantext.Core.Config.Types (SettingsFile(..))
import Gargantext.Core.Types (NodeId)
import Gargantext.Core.Types.Query
import Prelude
newtype CorpusFile = CorpusFile { _CorpusFile :: FilePath }
......@@ -55,7 +54,6 @@ data ImportArgs = ImportArgs
, imp_user :: !Text
, imp_name :: !Text
, imp_settings :: !SettingsFile
, imp_limit :: !Limit
, imp_corpus_path :: !FilePath
} deriving (Show, Eq)
......
......@@ -18,7 +18,7 @@ fi
# with the `sha256sum` result calculated on the `cabal.project` and
# `cabal.project.freeze`. This ensures the files stay deterministic so that CI
# cache can kick in.
expected_cabal_project_hash="1abcdd99d5d50660e640be8a340c90331a84ef266d174c7ca6099c1c04ef65ea"
expected_cabal_project_hash="ac293a4c66092996bc85fbf14ef34b7cce3ed5b0612ceb9e1a5f395059631e0b"
expected_cabal_project_freeze_hash="32310c4d4e7b4679dcb90dcfcd0d6d1b175dbf885a77ffddca16d422998a521c"
......
......@@ -180,7 +180,7 @@ source-repository-package
source-repository-package
type: git
location: https://gitlab.iscpif.fr/gargantext/haskell-bee
tag: 69b7388a62f2afb5cb5609beac96e8cb35e94478
tag: 4a9c709613554eed0189b486de2126c18797088c
subdir: haskell-bee/
haskell-bee-pgmq/
haskell-bee-tests/
......
......@@ -266,6 +266,7 @@ library
Gargantext.Core.Worker.Types
Gargantext.Database.Action.Flow
Gargantext.Database.Action.Flow.Types
Gargantext.Database.Action.Flow.Utils
Gargantext.Database.Action.Metrics.TFICF
Gargantext.Database.Action.Search
Gargantext.Database.Action.User
......@@ -288,6 +289,7 @@ library
Gargantext.Database.Query.Table.Node.User
Gargantext.Database.Query.Table.User
Gargantext.Database.Query.Tree.Root
Gargantext.Database.Schema.Context
Gargantext.Database.Schema.Ngrams
Gargantext.Database.Schema.Node
Gargantext.Database.Schema.User
......@@ -416,7 +418,6 @@ library
Gargantext.Database.Action.Flow.Extract
Gargantext.Database.Action.Flow.List
Gargantext.Database.Action.Flow.Pairing
Gargantext.Database.Action.Flow.Utils
Gargantext.Database.Action.Index
Gargantext.Database.Action.Learn
Gargantext.Database.Action.Mail
......@@ -461,7 +462,6 @@ library
Gargantext.Database.Query.Table.NodeNgrams
Gargantext.Database.Query.Tree
Gargantext.Database.Query.Tree.Error
Gargantext.Database.Schema.Context
Gargantext.Database.Schema.ContextNodeNgrams
Gargantext.Database.Schema.ContextNodeNgrams2
Gargantext.Database.Schema.NodeContext
......@@ -796,6 +796,7 @@ test-suite garg-test-tasty
Test.API.Prelude
Test.API.UpdateList
Test.Core.Notifications
Test.Core.Orchestrator
Test.Core.Similarity
Test.Core.Text
Test.Core.Text.Corpus.Query
......@@ -821,6 +822,7 @@ test-suite garg-test-tasty
Test.Ngrams.NLP
Test.Ngrams.Query
Test.Ngrams.Query.PaginationCorpus
Test.Ngrams.Terms
Test.Offline.Errors
Test.Offline.JSON
Test.Offline.Phylo
......
......@@ -17,39 +17,68 @@ Portability : POSIX
module Gargantext.API.Admin.Orchestrator.Types
where
import Data.Aeson (genericParseJSON, genericToJSON)
import Data.Aeson (genericParseJSON, genericToJSON, object, withObject, (.=), (.:), (.:?), Value(String))
import Data.Aeson.Types (unexpected)
import Data.Morpheus.Types ( GQLType(..), DropNamespace(..), typeDirective )
import Data.Swagger (ToSchema, URL, declareNamedSchema, defaultSchemaOptions, genericDeclareNamedSchemaUnrestricted)
-- import Gargantext.API.GraphQL.UnPrefix qualified as GQLU
import Gargantext.Core.Types (TODO(..))
import Gargantext.Core.Utils.Aeson (jsonOptions)
import Gargantext.Prelude
import Test.QuickCheck (elements)
import Test.QuickCheck.Arbitrary
import PUBMED.Types qualified as PUBMED
type EPOAPIToken = Text
type EPOAPIUser = Text
------------------------------------------------------------------------
-- | Main Types
-- TODO IsidoreAuth
data ExternalAPIs = OpenAlex
| PubMed
| PubMed (Maybe PUBMED.APIKey)
| Arxiv
| HAL
| IsTex
| Isidore
| EPO
deriving (Show, Eq, Generic, Enum, Bounded)
| EPO (Maybe EPOAPIUser) (Maybe EPOAPIToken)
deriving (Show, Eq, Generic)
-- | Main Instances
instance FromJSON ExternalAPIs
instance ToJSON ExternalAPIs
instance FromJSON ExternalAPIs where
parseJSON = withObject "ExternalAPIs" $ \o -> do
db <- o .: "db"
case db of
"OpenAlex" -> pure OpenAlex
"PubMed" -> do
mAPIKey <- o .:? "api_key"
pure $ PubMed mAPIKey
"Arxiv" -> pure Arxiv
"HAL" -> pure HAL
"IsTex" -> pure IsTex
"Isidore" -> pure Isidore
"EPO" -> do
mAPIUser <- o .:? "api_user"
mAPIToken <- o .:? "api_token"
pure $ EPO mAPIUser mAPIToken
s -> unexpected (String s)
instance ToJSON ExternalAPIs where
toJSON (PubMed mAPIKey) = object [ "db" .= toJSON ("PubMed" :: Text)
, "api_key" .= toJSON mAPIKey ]
toJSON (EPO mAPIUser mAPIToken) = object [ "db" .= toJSON ("EPO" :: Text)
, "api_user" .= toJSON mAPIUser
, "api_token" .= toJSON mAPIToken ]
toJSON t = object [ "db" .= toJSON (show t :: Text) ]
externalAPIs :: [ExternalAPIs]
externalAPIs = [minBound .. maxBound]
instance Arbitrary ExternalAPIs
where
arbitrary = arbitraryBoundedEnum
externalAPIs =
[ OpenAlex
, PubMed Nothing
, Arxiv
, HAL
, IsTex
, Isidore
, EPO Nothing Nothing ]
instance ToSchema ExternalAPIs where
declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions
......@@ -64,11 +93,6 @@ data ScraperEvent = ScraperEvent
, _scev_date :: !(Maybe Text)
}
deriving (Show, Generic, Eq)
instance Arbitrary ScraperEvent where
arbitrary = ScraperEvent <$> elements [Nothing, Just "test message"]
<*> elements [Nothing, Just "INFO", Just "WARN"]
<*> elements [Nothing, Just "2018-04-18"]
instance ToJSON ScraperEvent where
toJSON = genericToJSON $ jsonOptions "_scev_"
instance FromJSON ScraperEvent where
......@@ -91,12 +115,6 @@ makeLenses ''JobLog
noJobLog :: JobLog
noJobLog = JobLog Nothing Nothing Nothing Nothing
instance Arbitrary JobLog where
arbitrary = JobLog
<$> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary
instance ToJSON JobLog where
toJSON = genericToJSON $ jsonOptions "_scst_"
instance FromJSON JobLog where
......
......@@ -9,6 +9,7 @@ Portability : POSIX
-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# OPTIONS_GHC -Wno-unrecognised-pragmas #-}
......@@ -108,7 +109,9 @@ getListNgrams nodeIds ngramsType = listNgramsFromRepo nodeIds ngramsType
<$> getRepo nodeIds
getTermsWith :: (HasNodeStory env err m, Eq a, Hashable a)
-- | Fetch terms from repo, gathering terms under the same root (parent).
getTermsWith :: forall a env err m.
(HasNodeStory env err m, Eq a, Hashable a)
=> (NgramsTerm -> a) -> [ListId]
-> NgramsType -> Set ListType
-> m (HashMap a [a])
......@@ -119,6 +122,7 @@ getTermsWith f ls ngt lts = HM.fromListWith (<>)
<$> mapTermListRoot ls ngt
<$> getRepo ls
where
toTreeWith :: (NgramsTerm, (b, Maybe NgramsTerm)) -> (a, [a])
toTreeWith (t, (_lt, maybeRoot)) = case maybeRoot of
Nothing -> (f t, [])
Just r -> (f r, [f t])
......
......@@ -27,12 +27,11 @@ import Data.Conduit.Internal (zipSources)
import Data.Swagger ( ToSchema(..) )
import Data.Text.Encoding qualified as TE
import Data.Text qualified as T
import EPO.API.Client.Types qualified as EPO
import Gargantext.API.Admin.Orchestrator.Types qualified as API
import Gargantext.API.Ngrams (commitStatePatch, Versioned(..))
import Gargantext.API.Node.Corpus.New.Types ( FileFormat(..), FileType(..) )
import Gargantext.API.Node.Corpus.Searx ( triggerSearxSearch )
import Gargantext.API.Node.Corpus.Types ( Datafield(Web), database2origin )
import Gargantext.API.Node.Corpus.Types ( Datafield(Web), datafield2origin )
import Gargantext.API.Node.Corpus.Update (addLanguageToCorpus)
import Gargantext.API.Node.Types
import Gargantext.Core.Config (gc_jobs, hasConfig)
......@@ -157,26 +156,20 @@ addToCorpusWithQuery :: ( FlowCmdM env err m
-> JobHandle m
-> m ()
addToCorpusWithQuery user cid (WithQuery { _wq_query = q
, _wq_databases = dbs
, _wq_datafield = datafield
, _wq_lang = l
, _wq_flowListWith = flw
, _wq_pubmedAPIKey = mPubmedAPIKey
, .. }) maybeLimit jobHandle = do
, _wq_flowListWith = flw }) maybeLimit jobHandle = do
-- TODO ...
$(logLocM) DEBUG $ "[addToCorpusWithQuery] (cid, dbs) " <> show (cid, dbs)
$(logLocM) DEBUG $ "[addToCorpusWithQuery] cid " <> show cid
$(logLocM) DEBUG $ "[addToCorpusWithQuery] datafield " <> show datafield
$(logLocM) DEBUG $ "[addToCorpusWithQuery] flowListWith " <> show flw
let mEPOAuthKey = EPO.AuthKey <$> (EPO.User <$> _wq_epoAPIUser)
<*> (EPO.Token <$> _wq_epoAPIToken)
$(logLocM) DEBUG $ "[addToCorpusWithQuery] addLanguageToCorpus " <> show cid <> ", " <> show l
addLanguageToCorpus cid l
$(logLocM) DEBUG "[addToCorpusWithQuery] after addLanguageToCorpus"
case datafield of
Just Web -> do
Web -> do
$(logLocM) DEBUG $ "[addToCorpusWithQuery] processing web request " <> show datafield
markStarted 1 jobHandle
......@@ -193,10 +186,10 @@ addToCorpusWithQuery user cid (WithQuery { _wq_query = q
-- if cid is corpus -> add to corpus
-- if cid is root -> create corpus in Private
$(logLocM) DEBUG $ "[addToCorpusWithQuery] getDataText with query: " <> show q
let db = database2origin dbs
let db = datafield2origin datafield
-- mPubmedAPIKey <- getUserPubmedAPIKey user
-- printDebug "[addToCorpusWithQuery] mPubmedAPIKey" mPubmedAPIKey
eTxt <- getDataText db (Multi l) q mPubmedAPIKey mEPOAuthKey maybeLimit
eTxt <- getDataText db (Multi l) q maybeLimit
-- printDebug "[G.A.N.C.New] lTxts" lTxts
case eTxt of
......
......@@ -15,7 +15,6 @@ module Gargantext.API.Node.Corpus.Searx where
import Control.Lens (view)
import Data.Aeson qualified as Aeson
import Data.HashMap.Strict qualified as HashMap
import Data.Text qualified as T
import Data.Text qualified as Text
import Data.Time.Calendar (Day, toGregorian)
......@@ -27,14 +26,11 @@ import Gargantext.Core.Config.Types (FramesConfig(..))
import Gargantext.Core.NLP (HasNLPServer, nlpServerGet)
import Gargantext.Core.NodeStory.Types ( HasNodeStory )
import Gargantext.Core.Text.Corpus.Query qualified as Query
import Gargantext.Core.Text.List (buildNgramsLists)
import Gargantext.Core.Text.List.Group.WithStem ({-StopSize(..),-} GroupParams(..))
import Gargantext.Core.Text.Terms (TermType(..))
import Gargantext.Core.Types (HasValidationError)
import Gargantext.Core.Types.Individu (User(..))
import Gargantext.Core.Utils.Prefix (unPrefix)
import Gargantext.Database.Action.Flow (addDocumentsToHyperCorpus) --, DataText(..))
import Gargantext.Database.Action.Flow.List (flowList_DbRepo)
import Gargantext.Database.Action.Flow (addDocumentsToHyperCorpus, buildSocialList) --, DataText(..))
import Gargantext.Database.Action.User (getUserId)
import Gargantext.Database.Admin.Types.Hyperdata.Corpus (HyperdataCorpus)
import Gargantext.Database.Admin.Types.Hyperdata.Document (HyperdataDocument(..))
......@@ -42,7 +38,6 @@ import Gargantext.Database.Admin.Types.Node (CorpusId, ListId, NodeType(NodeText
import Gargantext.Database.Query.Table.Node (getOrMkList, insertDefaultNodeIfNotExists)
import Gargantext.Database.Query.Table.Node.Error (HasNodeError)
import Gargantext.Database.Query.Tree.Error (HasTreeError)
import Gargantext.Database.Query.Tree.Root (getOrMkRootWithCorpus, MkCorpusUser (MkCorpusUserMaster))
import Gargantext.Prelude
import Gargantext.Utils.Jobs.Monad (JobHandle, MonadJobStatus(..))
import Network.HTTP.Client
......@@ -147,17 +142,12 @@ insertSearxResponse user cId listId l (Right (SearxResponse { _srs_results })) =
--_ <- flowDataText user (DataNew [docs']) (Multi l) cId Nothing logStatus
let mCorpus = Nothing :: Maybe HyperdataCorpus
void $ addDocumentsToHyperCorpus server mCorpus (Multi l) cId docs'
(_masterUserId, _masterRootId, masterCorpusId)
<- getOrMkRootWithCorpus MkCorpusUserMaster mCorpus
let gp = GroupWithPosTag l server HashMap.empty
-- gp = case l of
-- FR -> GroupWithPosTag l Spacy HashMap.empty
-- _ -> GroupWithPosTag l CoreNLP HashMap.empty
ngs <- buildNgramsLists user cId masterCorpusId Nothing gp
_userListId <- flowList_DbRepo listId ngs
_ <- buildSocialList l user cId listId mCorpus Nothing
pure ()
-- TODO Make an async task out of this?
triggerSearxSearch :: ( MonadBase IO m
, HasNodeStory env err m
......
......@@ -9,43 +9,45 @@ Portability : POSIX
-}
{-# LANGUAGE TemplateHaskell #-}
module Gargantext.API.Node.Corpus.Types where
import Control.Lens ( (?~) )
import Control.Monad.Fail (fail)
import Data.Aeson ( Value(..), (.:), withText, object )
import Data.Aeson ( Value(..), (.:), (.=), withText, object, withObject )
import Data.Aeson.Types ( Parser )
import Data.Swagger
import Data.Text qualified as T
import Gargantext.API.Admin.Orchestrator.Types qualified as Types
import Gargantext.Core.Utils.Prefix (unPrefix)
import Gargantext.Database.Action.Flow.Types (DataOrigin(..))
import Gargantext.Prelude
data Database = Empty
| OpenAlex
| PubMed
| Arxiv
| HAL
| IsTex
| Isidore
| EPO
deriving (Eq, Show, Generic, Enum, Bounded)
type EPOAPIToken = Text
type EPOAPIUser = Text
deriveJSON (unPrefix "") ''Database
data Database =
Empty
| DB Types.ExternalAPIs
deriving (Eq, Show, Generic)
instance FromJSON Database where
parseJSON = withObject "Database" $ \o -> do
db <- o .: "db" :: Parser Text
case db of
"Empty" -> pure Empty
_ -> do
eapi <- parseJSON (Object o) :: Parser Types.ExternalAPIs
pure $ DB eapi
instance ToJSON Database where
toJSON Empty = object [ "db" .= (show Empty :: Text)]
toJSON (DB db) = toJSON db
instance ToSchema Database where
declareNamedSchema = genericDeclareNamedSchemaUnrestricted defaultSchemaOptions
database2origin :: Database -> DataOrigin
database2origin Empty = InternalOrigin Types.IsTex
database2origin OpenAlex = ExternalOrigin Types.OpenAlex
database2origin PubMed = ExternalOrigin Types.PubMed
database2origin Arxiv = ExternalOrigin Types.Arxiv
database2origin HAL = ExternalOrigin Types.HAL
database2origin IsTex = ExternalOrigin Types.IsTex
database2origin Isidore = ExternalOrigin Types.Isidore
database2origin EPO = ExternalOrigin Types.EPO
datafield2origin :: Datafield -> DataOrigin
datafield2origin (External Empty) = InternalOrigin Types.IsTex
datafield2origin (External (DB db)) = ExternalOrigin db
-- -- | This isn't really used
datafield2origin _ = InternalOrigin Types.IsTex
------------------------------------------------------------------------
data Datafield = Gargantext
......
......@@ -9,8 +9,6 @@ Portability : POSIX
-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeOperators #-}
module Gargantext.API.Node.Share
where
......@@ -57,25 +55,21 @@ api userInviting nId (ShareTeamParams user') = do
pure u
Left _err -> do
username' <- getUsername userInviting
_ <- case username' `List.elem` arbitraryUsername of
True -> do
-- printDebug "[G.A.N.Share.api]" ("Demo users are not allowed to invite" :: Text)
pure ()
False -> do
-- TODO better analysis of the composition of what is shared
children <- findNodesWithType nId [NodeList] [ NodeFolderShared
, NodeTeam
, NodeFolder
, NodeCorpus
]
_ <- case List.null children of
True -> do
-- printDebug "[G.A.N.Share.api]" ("Invitation is enabled if you share a corpus at least" :: Text)
pure $ UnsafeMkUserId 0
False -> do
-- printDebug "[G.A.N.Share.api]" ("Your invitation is sent to: " <> user'')
newUser user''
pure ()
unless (username' `List.elem` arbitraryUsername) $ do
-- TODO better analysis of the composition of what is shared
children <- findNodesWithType nId [NodeList] [ NodeFolderShared
, NodeTeam
, NodeFolder
, NodeCorpus
]
_ <- if List.null children
then do
-- printDebug "[G.A.N.Share.api]" ("Invitation is enabled if you share a corpus at least" :: Text)
pure $ UnsafeMkUserId 0
else do
-- printDebug "[G.A.N.Share.api]" ("Your invitation is sent to: " <> user'')
newUser user''
pure ()
pure u
fromIntegral <$> DB.shareNodeWith (ShareNodeWith_User NodeFolderShared (UserName user)) nId
......
......@@ -83,14 +83,10 @@ instance GargDB.SaveFile NewWithFile where
data WithQuery = WithQuery
{ _wq_query :: !API.RawQuery
, _wq_databases :: !Database
, _wq_datafield :: !(Maybe Datafield)
, _wq_datafield :: !Datafield
, _wq_lang :: !Lang
, _wq_node_id :: !Int
, _wq_flowListWith :: !FlowSocialListWith
, _wq_pubmedAPIKey :: !(Maybe Text)
, _wq_epoAPIUser :: !(Maybe Text)
, _wq_epoAPIToken :: !(Maybe Text)
}
deriving (Show, Eq, Generic)
......
......@@ -36,7 +36,6 @@ import Gargantext.Core.Text.Corpus.Query qualified as Corpus
import Gargantext.Database.Admin.Types.Hyperdata.Document (HyperdataDocument(..))
import Gargantext.Prelude hiding (get)
import Gargantext.Utils.Jobs.Error
import PUBMED.Types qualified as PUBMED
import Servant.Client (ClientError)
data GetCorpusError
......@@ -60,17 +59,15 @@ get :: ExternalAPIs
-- If the provider doesn't support the search filtered by language, or if the language
-- is not important, the frontend will simply send 'EN' to the backend.
-> Corpus.RawQuery
-> Maybe PUBMED.APIKey
-> Maybe EPO.AuthKey
-> Text
-> Maybe Corpus.Limit
-- -> IO [HyperdataDocument]
-> IO (Either GetCorpusError (Maybe Integer, ConduitT () HyperdataDocument IO ()))
get externalAPI lang q mPubmedAPIKey epoAuthKey epoAPIUrl limit = do
get externalAPI lang q epoAPIUrl limit = do
-- For PUBMED, HAL, IsTex, Isidore and OpenAlex, we want to send the query as-it.
-- For Arxiv we parse the query into a structured boolean query we submit over.
case externalAPI of
PubMed ->
PubMed mPubmedAPIKey ->
first (ExternalAPIError externalAPI) <$> PUBMED.get (fromMaybe "" mPubmedAPIKey) q limit
OpenAlex ->
first (ExternalAPIError externalAPI) <$> OpenAlex.get (fromMaybe "" Nothing {- email -}) q (Just $ toISO639 lang) limit
......@@ -85,8 +82,10 @@ get externalAPI lang q mPubmedAPIKey epoAuthKey epoAPIUrl limit = do
Isidore -> do
docs <- ISIDORE.get lang (Corpus.getLimit <$> limit) (Just $ Corpus.getRawQuery q) Nothing
pure $ Right (Just $ fromIntegral $ length docs, yieldMany docs)
EPO -> do
first (ExternalAPIError externalAPI) <$> EPO.get epoAuthKey epoAPIUrl q (toISO639 lang) limit
EPO mAPIUser mAPIToken -> do
let mEPOAuthKey = EPO.AuthKey <$> (EPO.User <$> mAPIUser)
<*> (EPO.Token <$> mAPIToken)
first (ExternalAPIError externalAPI) <$> EPO.get mEPOAuthKey epoAPIUrl q (toISO639 lang) limit
where
parse_query = first (InvalidInputQuery q . T.pack) $ Corpus.parseQuery q
......
......@@ -52,7 +52,7 @@ import Gargantext.Core.Text.Terms.Mono (monoTerms)
import Gargantext.Core.Text.Terms.Mono.Stem (stem, StemmingAlgorithm(..))
import Gargantext.Core.Text.Terms.Mono.Token.En (tokenize)
import Gargantext.Core.Text.Terms.Multi (multiterms)
import Gargantext.Core.Types ( TermsCount, POS, Terms(..), TermsWithCount )
import Gargantext.Core.Types ( TermsCount, TermsWeight, POS, Terms(..), TermsWithCount )
import Gargantext.Core.Utils (groupWithCounts)
import Gargantext.Database.Prelude (DBCmd)
import Gargantext.Database.Query.Table.Ngrams (insertNgrams)
......@@ -122,7 +122,7 @@ class ExtractNgramsT h
=> NLPServerConfig
-> TermType Lang
-> h
-> DBCmd err (HashMap ExtractedNgrams (Map NgramsType Int, TermsCount))
-> DBCmd err (HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
------------------------------------------------------------------------
enrichedTerms :: Lang -> PosTagAlgo -> POS -> Terms -> NgramsPostag
enrichedTerms l pa po (Terms { .. }) =
......@@ -176,7 +176,7 @@ terms :: NLPServerConfig -> TermType Lang -> Text -> IO [TermsWithCount]
terms _ (Mono lang) txt = pure $ monoTerms lang txt
terms ncs (Multi lang) txt = multiterms ncs lang txt
terms ncs (MonoMulti lang) txt = terms ncs (Multi lang) txt
terms _ (Unsupervised { .. }) txt = pure $ termsUnsupervised (Unsupervised { _tt_model = Just m', .. }) txt
terms _ (Unsupervised { .. }) txt = pure $ termsUnsupervised _tt_lang m' _tt_windowSize _tt_ngramsSize txt
where
m' = maybe (newTries _tt_ngramsSize txt) identity _tt_model
-- terms (WithList list) txt = pure . concat $ extractTermsWithList list txt
......@@ -189,17 +189,15 @@ type MinNgramSize = Int
-- | Unsupervised ngrams extraction
-- language agnostic extraction
-- TODO: newtype BlockText
termsUnsupervised :: TermType Lang -> Text -> [TermsWithCount]
termsUnsupervised (Unsupervised { _tt_model = Nothing }) = panicTrace "[termsUnsupervised] no model"
termsUnsupervised (Unsupervised { _tt_model = Just _tt_model, .. }) =
map (first (text2term _tt_lang))
termsUnsupervised :: Lang -> Tries Token () -> Int -> Int -> Text -> [TermsWithCount]
termsUnsupervised lang model windowSize ngramsSize =
map (first (text2term lang))
. groupWithCounts
-- . List.nub
. List.filter (\l' -> List.length l' >= _tt_windowSize)
. List.filter (\l' -> List.length l' >= windowSize)
. List.concat
. mainEleveWith _tt_model _tt_ngramsSize
. mainEleveWith model ngramsSize
. uniText
termsUnsupervised _ = undefined
newTries :: Int -> Text -> Tries Token ()
......
......@@ -96,18 +96,14 @@ termsInText lang pats (manipulateText lang -> txt) =
-- | Manipulates the input 'Text' before passing it to 'termsInText'.
-- In particular, if the language is Chinese (ZH), we add spaces.
manipulateText :: Lang -> Text -> Text
manipulateText lang txt = case lang of
ZH -> addSpaces txt
_ -> txt
manipulateText ZH txt = addSpaces txt
manipulateText _ txt = txt
--------------------------------------------------------------------------
extractTermsWithList :: Patterns -> Text -> Corpus [Text]
extractTermsWithList pats = map (replaceTerms KeepAll pats) . monoTextsBySentence
-- | Extract terms
-- >>> let termList = [(["chat blanc"], [["chat","blanc"]])] :: TermList
-- extractTermsWithList' (buildPatterns termList) "Le chat blanc"["chat blanc"]
-- ["chat blanc"]
extractTermsWithList' :: Patterns -> Text -> [Text]
extractTermsWithList' pats = map (concat . map concat . replaceTerms KeepAll pats)
. monoTextsBySentence
......
......@@ -20,7 +20,7 @@ commentary with @some markup@.
module Gargantext.Core.Types ( module Gargantext.Core.Types.Main
, module Gargantext.Database.Admin.Types.Node
, DebugMode(..), withDebugMode
, Term(..), Terms(..), TermsCount, TermsWithCount
, Term(..), Terms(..), TermsCount, TermsWeight(..), TermsWithCount
, TokenTag(..), POS(..), NER(..)
, Label, Stems
, HasValidationError(..), assertValid
......@@ -74,6 +74,9 @@ type TermsCount = Int
type TermsWithCount = (Terms, TermsCount)
newtype TermsWeight = TermsWeight { unTermsWeight :: Int }
deriving newtype (Eq, Ord, Num, Show)
------------------------------------------------------------------------
data Tag = POS | NER
deriving (Show, Eq)
......
......@@ -157,11 +157,6 @@ notifyJobKilled env (W.State { name }) (Just bm) = do
-- | Spawn a worker with PGMQ broker
-- TODO:
-- - reduce size of DB pool
-- - progress report via notifications
-- - I think there is no point to save job result, as usually there is none (we have side-effects only)
-- - replace Servant.Job to use workers instead of garg API threads
withPGMQWorker :: HasWorkerBroker
=> WorkerEnv
-> WorkerDefinition
......
......@@ -12,6 +12,7 @@ Portability : POSIX
{-# OPTIONS_GHC -fno-warn-orphans #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Gargantext.Database.Action.Flow.Extract
......@@ -25,11 +26,12 @@ import Gargantext.Core.Text (HasText(..))
import Gargantext.Core.Text.Corpus.Parsers (splitOn)
import Gargantext.Core.Text.Ngrams (NgramsType(..))
import Gargantext.Core.Text.Terms (ExtractNgramsT, ExtractedNgrams(..), TermType, cleanExtractedNgrams, enrichedTerms, extractNgramsT, extractTerms, tt_lang)
import Gargantext.Core.Types (POS(NP), TermsCount)
import Gargantext.Core.Types (POS(NP), TermsCount, TermsWeight)
import Gargantext.Database.Admin.Types.Hyperdata.Contact ( HyperdataContact, cw_lastName, hc_who )
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument, hd_authors, hd_bdd, hd_institutes, hd_source )
import Gargantext.Database.Admin.Types.Node ( Node )
import Gargantext.Database.Prelude (DBCmd)
import Gargantext.Database.Query.Table.NgramsPostag (NgramsPostag)
import Gargantext.Database.Schema.Ngrams ( text2ngrams )
import Gargantext.Database.Schema.Node (NodePoly(..))
import Gargantext.Prelude
......@@ -42,7 +44,7 @@ instance ExtractNgramsT HyperdataContact
extractNgramsT _ncs l hc = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extract l hc
where
extract :: TermType Lang -> HyperdataContact
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount))
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extract _l hc' = do
let authors = map text2ngrams
$ maybe ["Nothing"] (\a -> [a])
......@@ -59,11 +61,11 @@ instance ExtractNgramsT HyperdataDocument
extractNgramsT :: NLPServerConfig
-> TermType Lang
-> HyperdataDocument
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount))
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extractNgramsT ncs lang hd = HashMap.mapKeys (cleanExtractedNgrams 255) <$> extractNgramsT' hd
where
extractNgramsT' :: HyperdataDocument
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType Int, TermsCount))
-> DBCmd err (HashMap.HashMap ExtractedNgrams (Map NgramsType TermsWeight, TermsCount))
extractNgramsT' doc = do
let source = text2ngrams
$ maybe "Nothing" identity
......@@ -77,14 +79,15 @@ instance ExtractNgramsT HyperdataDocument
$ maybe ["Nothing"] (splitOn Authors (doc^. hd_bdd))
$ doc ^. hd_authors
termsWithCounts' <- map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$>
termsWithCounts' :: [(NgramsPostag, TermsCount)] <-
map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$>
liftBase (extractTerms ncs lang $ hasText doc)
pure $ HashMap.fromList
$ [(SimpleNgrams source, (DM.singleton Sources 1, 1)) ]
<> [(SimpleNgrams i', (DM.singleton Institutes 1, 1)) | i' <- institutes ]
<> [(SimpleNgrams a', (DM.singleton Authors 1, 1)) | a' <- authors ]
<> [(EnrichedNgrams t', (DM.singleton NgramsTerms 1, cnt')) | (t', cnt') <- termsWithCounts' ]
<> [(EnrichedNgrams t', (DM.singleton NgramsTerms 1, cnt')) | (t', cnt') <- termsWithCounts' ]
instance (ExtractNgramsT a, HasText a) => ExtractNgramsT (Node a)
where
......
......@@ -26,7 +26,7 @@ import Gargantext.Core.Text ( HasText )
import Gargantext.API.Admin.Orchestrator.Types qualified as API
import Gargantext.Core.Text.Ngrams (NgramsType(..))
import Gargantext.Core.Text.Terms ( ExtractNgramsT )
import Gargantext.Core.Types (HasValidationError, TermsCount)
import Gargantext.Core.Types (HasValidationError, TermsCount, TermsWeight)
import Gargantext.Core.Utils.Prefix (unPrefix, unPrefixSwagger)
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument )
import Gargantext.Database.Admin.Types.Node (NodeId)
......@@ -67,7 +67,7 @@ type FlowInsertDB a = ( AddUniqId a
data DocumentIdWithNgrams a b =
DocumentIdWithNgrams
{ documentWithId :: Indexed NodeId a
, documentNgrams :: HashMap b (Map NgramsType Int, TermsCount)
, documentNgrams :: HashMap b (Map NgramsType TermsWeight, TermsCount)
} deriving (Show)
......
......@@ -14,20 +14,21 @@ module Gargantext.Database.Action.Flow.Utils
, documentIdWithNgrams
, insertDocNgrams
, insertDocs
, mapNodeIdNgrams )
, mapNodeIdNgrams
, ngramsByDoc )
where
import Data.HashMap.Strict (HashMap)
import Data.HashMap.Strict qualified as HashMap
import Data.List qualified as List
import Data.Map.Strict qualified as DM
import Data.Text qualified as T
import Gargantext.API.Ngrams.Types qualified as NT
import Gargantext.Core (Lang, toDBid)
import Gargantext.Core.Flow.Types (UniqId, uniqId)
import Gargantext.Core.Text.Ngrams ( Ngrams, NgramsType )
import Gargantext.Core.Text.Terms (ExtractedNgrams(..))
import Gargantext.Core.Text.Terms.WithList (MatchedText, buildPatternsWith, termsInText)
import Gargantext.Core.Types (TermsCount)
import Gargantext.Core.Types (TermsCount, TermsWeight(..))
import Gargantext.Core.Utils (addTuples)
import Gargantext.Data.HashMap.Strict.Utils qualified as HashMap
import Gargantext.Database.Action.Flow.Types (DocumentIdWithNgrams(..), FlowInsertDB)
......@@ -38,15 +39,15 @@ import Gargantext.Database.Query.Table.ContextNodeNgrams ( ContextNodeNgramsPoly
import Gargantext.Database.Query.Table.Node.Document.Add qualified as Doc (add)
import Gargantext.Database.Query.Table.Node.Document.Insert (ReturnId, addUniqId, insertDb, reId, reInserted, reUniqId)
import Gargantext.Database.Query.Table.Node.Error (HasNodeError(..))
import Gargantext.Database.Schema.Context (context_hyperdata, context_id)
import Gargantext.Database.Schema.Ngrams (NgramsId, NgramsTypeId(..))
import Gargantext.Database.Schema.Context (context_oid_hyperdata, context_oid_id)
import Gargantext.Database.Schema.Ngrams (NgramsId, NgramsTypeId(..), text2ngrams)
import Gargantext.Database.Types ( Indexed(..), index )
import Gargantext.Prelude
import Gargantext.Prelude.Crypto.Hash (Hash)
insertDocNgrams :: ListId
-> HashMap (Indexed NgramsId Ngrams) (Map NgramsType (Map DocId (Int, TermsCount)))
-> HashMap (Indexed NgramsId Ngrams) (Map NgramsType (Map DocId (TermsWeight, TermsCount)))
-> DBCmd err Int
insertDocNgrams lId m = do
-- printDebug "[insertDocNgrams] ns" ns
......@@ -55,11 +56,11 @@ insertDocNgrams lId m = do
ns = [ ContextNodeNgrams (nodeId2ContextId docId)
lId (ng^.index)
(NgramsTypeId $ toDBid t)
(fromIntegral i)
(fromIntegral $ unTermsWeight w)
cnt
| (ng, t2n2i) <- HashMap.toList m
, (t, n2i) <- DM.toList t2n2i
, (docId, (i, cnt)) <- DM.toList n2i
, (docId, (w, cnt)) <- DM.toList n2i
]
-- [(NodeId, {Ngrams: ({NgramsType: TermsWeight}, TermsCount)})]
......@@ -67,28 +68,26 @@ insertDocNgrams lId m = do
-- Given a language, a list of terms and a HyperdataDocument, return
-- the ngrams that occur in this text, with counts.
-- This is a pure function (it uses neither corenlp nor PostgreSQL FTS).
docNgrams :: Lang
-> NgramsType
-> [NT.NgramsTerm]
-> Gargantext.Database.Admin.Types.Node.Context HyperdataDocument
-> [((MatchedText, TermsCount),
Map NgramsType (Map NodeId Int))]
docNgrams lang nt ts doc =
List.zip
(termsInText lang (buildPatternsWith lang ts)
$ T.unlines $ catMaybes
[ doc ^. context_hyperdata . hd_title
, doc ^. context_hyperdata . hd_abstract
]
-> ContextOnlyId HyperdataDocument
-> [(MatchedText, TermsCount)]
docNgrams lang ts doc =
(
termsInText lang (buildPatternsWith lang ts)
$ T.unlines $ catMaybes
[ doc ^. context_oid_hyperdata . hd_title
, doc ^. context_oid_hyperdata . hd_abstract
]
)
(List.cycle [DM.fromList $ [(nt, DM.singleton (doc ^. context_id) 1 )]])
documentIdWithNgrams :: HasNodeError err
=> (a
-> DBCmd err (HashMap.HashMap b (Map NgramsType Int, TermsCount)))
=> ( a
-> DBCmd err (HashMap.HashMap b (Map NgramsType TermsWeight, TermsCount)) )
-> [Indexed NodeId a]
-> DBCmd err [DocumentIdWithNgrams a b]
documentIdWithNgrams f = traverse toDocumentIdWithNgrams
......@@ -103,7 +102,7 @@ mapNodeIdNgrams :: (Ord b, Hashable b)
=> [DocumentIdWithNgrams a b]
-> HashMap.HashMap b
(Map NgramsType
(Map NodeId (Int, TermsCount))
(Map NodeId (TermsWeight, TermsCount))
)
mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . fmap f
where
......@@ -112,8 +111,8 @@ mapNodeIdNgrams = HashMap.unionsWith (DM.unionWith (DM.unionWith addTuples)) . f
-- for it (which is the number of times the term appears in a
-- document) is copied over to all its types.
f :: DocumentIdWithNgrams a b
-> HashMap.HashMap b (Map NgramsType (Map NodeId (Int, TermsCount)))
f d = fmap (\(ngramsTypeMap, cnt) -> fmap (\i -> DM.singleton nId (i, cnt)) ngramsTypeMap) $ documentNgrams d
-> HashMap.HashMap b (Map NgramsType (Map NodeId (TermsWeight, TermsCount)))
f d = fmap (\(ngramsTypeMap, cnt) -> fmap (\w -> DM.singleton nId (w, cnt)) ngramsTypeMap) $ documentNgrams d
where
nId = _index $ documentWithId d
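The merge in `mapNodeIdNgrams` relies on `addTuples` from `Gargantext.Core.Utils`, which is not shown in this diff. A plausible sketch of its semantics (an assumption, not taken from this MR) is component-wise addition, which is what lets weights and counts accumulate independently across documents:

```haskell
-- Assumed semantics of Gargantext.Core.Utils.addTuples: sum both
-- components pairwise, so (TermsWeight, TermsCount) pairs accumulate
-- under DM.unionWith.
addTuples :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
addTuples (a1, b1) (a2, b2) = (a1 + a2, b1 + b2)

-- e.g. merging the same ngram seen twice for one node:
-- addTuples (1, 3) (1, 2) == (2, 5)
```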
......@@ -183,3 +182,27 @@ toInserted =
-- docs <- List.concat <$> mapM getDocumentsWithParentId corpusIds
-- _ <- mapM extractInsert (splitEvery 1000 docs)
-- pure ()
-- | Given a list of terms and a document, produce a map of this
-- doc's term counts and weights. Note that the weight is always 1 here.
ngramsByDoc :: Lang
-> NgramsType
-> [NT.NgramsTerm]
-> ContextOnlyId HyperdataDocument
-> HashMap.HashMap ExtractedNgrams (DM.Map NgramsType (Map NodeId (TermsWeight, TermsCount)))
ngramsByDoc l nt ts doc =
HashMap.map (\cnt -> DM.singleton nt $ DM.singleton nId (1, cnt)) extractedMap
where
matched :: [(MatchedText, TermsCount)]
matched = docNgrams l ts doc
nId :: NodeId
nId = doc ^. context_oid_id
withExtractedNgrams :: [(ExtractedNgrams, TermsCount)]
withExtractedNgrams = first (SimpleNgrams . text2ngrams) <$> matched
extractedMap :: HashMap.HashMap ExtractedNgrams TermsCount
extractedMap = HashMap.fromListWith (+) withExtractedNgrams
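A usage sketch of the new `ngramsByDoc` (imports abbreviated; `emptyHyperdataDocument` and `ContextOnlyId` come from hunks later in this diff, and the result shape mirrors the signature above):

```haskell
-- Each matched term maps to weight 1 (as noted in the haddock above)
-- paired with its in-document occurrence count, keyed by the node id.
example :: HashMap.HashMap ExtractedNgrams
                           (DM.Map NgramsType (Map NodeId (TermsWeight, TermsCount)))
example =
  let terms = ["hello", "world"] :: [NT.NgramsTerm]
      doc   = ContextOnlyId 1 (emptyHyperdataDocument { _hd_title = Just "hello world" })
  in  ngramsByDoc EN NgramsTerms terms doc
```

This is the same contract the `testNgramsByDoc01` test below checks against concrete expected maps.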
......@@ -50,15 +50,18 @@ instance HasText HyperdataDocument
, _hd_abstract h
]
defaultHyperdataDocument :: HyperdataDocument
defaultHyperdataDocument = case decode docExample of
Just hp -> hp
Nothing -> HyperdataDocument Nothing Nothing
emptyHyperdataDocument :: HyperdataDocument
emptyHyperdataDocument = HyperdataDocument Nothing Nothing
Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing
Nothing Nothing Nothing Nothing
defaultHyperdataDocument :: HyperdataDocument
defaultHyperdataDocument = case decode docExample of
Just hp -> hp
Nothing -> emptyHyperdataDocument
where
docExample :: ByteString
docExample = "{\"doi\":\"sdfds\",\"publication_day\":6,\"language_iso2\":\"en\",\"publication_minute\":0,\"publication_month\":7,\"language_iso3\":\"eng\",\"publication_second\":0,\"authors\":\"Nils Hovdenak, Kjell Haram\",\"publication_year\":2012,\"publication_date\":\"2012-07-06 00:00:00+00:00\",\"language_name\":\"English\",\"realdate_full_\":\"2012 01 12\",\"source\":\"European journal of obstetrics, gynecology, and reproductive biology\",\"abstract\":\"The literature was searched for publications on minerals and vitamins during pregnancy and the possible influence of supplements on pregnancy outcome.\",\"title\":\"Influence of mineral and vitamin supplements on pregnancy outcome.\",\"publication_hour\":0}"
......
......@@ -106,6 +106,7 @@ type ContextTitle = Text
-- | NodePoly indicates that Node has a Polymorphism Type
type Node json = NodePoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) NodeName UTCTime json
type Context json = ContextPoly NodeId (Maybe Hash) NodeTypeId UserId (Maybe ParentId) ContextTitle UTCTime json
type ContextOnlyId json = ContextPolyOnlyId NodeId json
-- | NodeSearch (queries)
-- type NodeSearch json = NodePolySearch NodeId NodeTypeId UserId (Maybe ParentId) NodeName UTCTime json (Maybe TSVector)
......
......@@ -60,9 +60,9 @@ update loggedInUserId (Move sourceId targetId) = do
mbParentId <- getParentId sourceId
-- if the source and the target are the same, this is identity.
case sourceId == targetId of
True -> pure [ _NodeId sourceId ]
False -> do
if sourceId == targetId
then pure [ _NodeId sourceId ]
else do
isSourceRO <- isNodeReadOnly sourceId
isTargetRO <- isNodeReadOnly targetId
......@@ -70,24 +70,29 @@ update loggedInUserId (Move sourceId targetId) = do
-- act accordingly.
ids <- case (isSourceRO, isTargetRO) of
(False, False)
-> -- both are not read-only, normal move
move_db_update sourceId targetId
-> do
-- both are not read-only, normal move
move_db_update sourceId targetId
(False, True)
-> do void $ publish_node (SourceId sourceId) (TargetId targetId) NPP_publish_no_edits_allowed
move_db_update sourceId targetId
-> do
void $ publish_node (SourceId sourceId) (TargetId targetId) NPP_publish_no_edits_allowed
move_db_update sourceId targetId
(True, False)
-> -- the source is read only. If we are the owner we allow unpublishing.
-- FIXME(adn) is this check enough?
do sourceNode <- getNode sourceId
case _node_user_id sourceNode == loggedInUserId of
True -> do
userPublicFolderNode <- getUserRootPublicNode loggedInUserId
unpublishNode (SourceId $ sourceId) (TargetId $ _node_id userPublicFolderNode)
move_db_update sourceId targetId
False -> nodeError (NodeIsReadOnly targetId "logged user is not allowed to move/unpublish a read-only node")
-> do
-- the source is read only. If we are the owner we allow unpublishing.
-- FIXME(adn) is this check enough?
sourceNode <- getNode sourceId
if _node_user_id sourceNode == loggedInUserId
then do
userPublicFolderNode <- getUserRootPublicNode loggedInUserId
unpublishNode (SourceId $ sourceId) (TargetId $ _node_id userPublicFolderNode)
move_db_update sourceId targetId
else
nodeError (NodeIsReadOnly targetId "logged user is not allowed to move/unpublish a read-only node")
(True, True)
-> -- this case is not allowed.
nodeError (NodeIsReadOnly targetId "Both the source and the target are read-only.")
-> do
-- this case is not allowed.
nodeError (NodeIsReadOnly targetId "Both the source and the target are read-only.")
for_ mbParentId $ CE.ce_notify . CE.UpdateTreeFirstLevel
CE.ce_notify $ CE.UpdateTreeFirstLevel targetId
......
......@@ -22,6 +22,7 @@ module Gargantext.Database.Query.Table.NodeContext
, queryNodeContextTable
, selectDocsDates
, selectDocNodes
, selectDocNodesOnlyId
, selectDocs
, nodeContextsCategory
, nodeContextsScore
......@@ -413,6 +414,15 @@ queryDocNodes cId = proc () -> do
restrict -< (c ^. context_typename) .== sqlInt4 (toDBid NodeDocument)
returnA -< c
selectDocNodesOnlyId :: HasDBid NodeType => CorpusId -> DBCmd err [ContextOnlyId HyperdataDocument]
selectDocNodesOnlyId cId = runOpaQuery (queryDocNodesOnlyId cId)
queryDocNodesOnlyId :: HasDBid NodeType => CorpusId -> O.Select ContextOnlyIdRead
queryDocNodesOnlyId cId = proc () -> do
c <- queryDocNodes cId -< ()
returnA -< ContextOnlyId (c ^. context_id) (c ^. context_hyperdata)
joinInCorpus :: O.Select (ContextRead, MaybeFields NodeContextRead)
joinInCorpus = proc () -> do
c <- queryContextTable -< ()
......
......@@ -52,6 +52,25 @@ $(makeLenses ''ContextPoly)
$(makeAdaptorAndInstance "pContext" ''ContextPoly)
$(makeLensesWith abbreviatedFields ''ContextPoly)
------------------------------------------------------------------------
-- | This datatype describes queries on the `contexts` table where
-- only `id` and `hyperdata` are fetched.
data ContextPolyOnlyId id hyperdata =
ContextOnlyId { _context_oid_id :: !id
, _context_oid_hyperdata :: !hyperdata }
deriving (Show, Generic)
------------------------------------------------------------------------
-- Automatic instance derivation
$(deriveJSON (unPrefix "_context_oid_") ''ContextPolyOnlyId)
$(makeLenses ''ContextPolyOnlyId)
$(makeAdaptorAndInstance "pContextOnlyId" ''ContextPolyOnlyId)
$(makeLensesWith abbreviatedFields ''ContextPolyOnlyId)
------------------------------------------------------------------------
contextTable :: Table ContextWrite ContextRead
contextTable = Table "contexts" (pContext Context { _context_id = optionalTableField "id"
, _context_hash_id = optionalTableField "hash_id"
......@@ -87,6 +106,10 @@ type ContextRead = ContextPoly (Field SqlInt4 )
(Field SqlText )
(Field SqlTimestamptz )
(Field SqlJsonb )
type ContextOnlyIdRead = ContextPolyOnlyId (Field SqlInt4 )
(Field SqlJsonb )
------------------------------------------------------------------------
-- | Context(Read|Write)Search is slower than Context(Write|Read); use it
-- for full-text search only
......
......@@ -257,15 +257,15 @@
git: "https://gitlab.iscpif.fr/gargantext/gargantext-graph.git"
subdirs:
- .
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478
- commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs:
- "haskell-bee-pgmq/"
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478
- commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs:
- "haskell-bee-tests/"
- commit: 69b7388a62f2afb5cb5609beac96e8cb35e94478
- commit: 4a9c709613554eed0189b486de2126c18797088c
git: "https://gitlab.iscpif.fr/gargantext/haskell-bee"
subdirs:
- "haskell-bee/"
......
......@@ -22,8 +22,6 @@ data_filepath = "~/.garg"
#repo_filepath = "~/.garg"
[apis]
[apis.pubmed]
api_key = "no_key"
[apis.epo]
api_url = ""
......
......@@ -11,6 +11,7 @@ Portability : POSIX
{-# OPTIONS_GHC -Wno-orphans #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
......@@ -20,28 +21,40 @@ module Test.API.Notifications (
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM.TChan
import Control.Concurrent.STM.TSem (newTSem, signalTSem)
import Control.Concurrent.STM.TSem (newTSem, signalTSem, TSem)
import Control.Lens ((^.))
import Control.Monad (void)
import Control.Monad.STM (atomically)
import Data.Aeson qualified as Aeson
import Data.ByteString qualified as BS
import Data.Text qualified as T
import Data.Text.Encoding qualified as TE
import Fmt ((+|), (|+))
import Gargantext.API.Admin.Auth.Types (AuthResponse, authRes_token, authRes_tree_id)
import Gargantext.Core.Config (gc_notifications_config)
import Gargantext.Core.Notifications.CentralExchange qualified as CE
import Gargantext.Core.Notifications.CentralExchange.Types qualified as CET
import Gargantext.Core.Notifications.Dispatcher.Types qualified as DT
import Gargantext.Core.Types.Individu (GargPassword(..))
import Gargantext.System.Logging (withLogger)
import Network.WebSockets qualified as WS
import Prelude
import System.Timeout qualified as Timeout
import Test.API.Setup (SpecContext(..), withTestDBAndPort)
import Test.API.Prelude (newCorpusForUser)
import Test.API.Routes (mkUrl)
import Test.API.Setup (SpecContext(..), dbEnvSetup, withTestDBAndPort)
import Test.Database.Types (test_config)
import Test.Hspec
import Test.Hspec.Wai.Internal (withApplication)
import Test.Instances ()
import Test.Utils (waitForTChanValue, waitForTSem)
import Text.RawString.QQ (r)
import Test.Utils (protected, waitForTChanValue, waitForTSem, withValidLoginA)
import Test.Utils.Notifications (withAsyncWSConnection)
tests :: Spec
tests = sequential $ aroundAll withTestDBAndPort $ do
tests = sequential $ around withTestDBAndPort $ do
describe "Notifications" $ do
it "ping WS notification works" $ \(SpecContext testEnv port _app _) -> do
let nc = (test_config testEnv) ^. gc_notifications_config
......@@ -54,20 +67,8 @@ tests = sequential $ aroundAll withTestDBAndPort $ do
-- locking mechanisms than blindly call 'threadDelay'.
wsTSem <- atomically $ newTSem 0
tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
-- setup a websocket connection
let wsConnect conn = withLogger () $ \_ioL -> do
-- logMsg ioL DEBUG $ "[wsConnect] subscribing topic: " <> show topic
WS.sendTextData conn $ Aeson.encode (DT.WSSubscribe topic)
-- inform the test process that we sent the subscription request
atomically $ signalTSem wsTSem
-- logMsg ioL DEBUG $ "[wsConnect] waiting for notification"
d <- WS.receiveData conn
-- logMsg ioL DEBUG $ "[wsConnect] received " <> show d
let dec = Aeson.decode d :: Maybe DT.Notification
atomically $ writeTChan tchan dec
withAsyncWSConnection ("127.0.0.1", port) wsConnect $ \_a -> do
withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
-- wait for ws process to inform us about topic subscription
waitForTSem wsTSem 500
......@@ -133,31 +134,99 @@ tests = sequential $ aroundAll withTestDBAndPort $ do
-- wait for the value
waitForTChanValue tchan Nothing 1_000
it "simple update tree WS notification works" $ \(SpecContext testEnv port _app _) -> do
let nc = (test_config testEnv) ^. gc_notifications_config
describe "Update tree notifications" $ do
it "simple WS notification works" $ \(SpecContext testEnv port _app _) -> do
let nc = (test_config testEnv) ^. gc_notifications_config
let topic = DT.UpdateTree 0
wsTSem <- atomically $ newTSem 0 -- initially locked
tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
waitForTSem wsTSem 500
let nodeId = 0
CE.notify nc $ CET.UpdateTreeFirstLevel nodeId
waitForTChanValue tchan (Just $ DT.NUpdateTree nodeId) 1_000
let topic = DT.UpdateTree 0
wsTSem <- atomically $ newTSem 0 -- initially locked
tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
-- setup a websocket connection
let wsConnect conn = withLogger () $ \_ioL -> do
-- logMsg ioL DEBUG $ "[wsConnect] subscribing topic: " <> show topic
WS.sendTextData conn $ Aeson.encode (DT.WSSubscribe topic)
-- inform the test process that we sent the subscription request
atomically $ signalTSem wsTSem
it "WS notification on node creation works" $ \ctx@(SpecContext _testEnv port app _) -> do
checkNotification ctx $ \authRes -> do
let token = authRes ^. authRes_token
let treeId = authRes ^. authRes_tree_id
let query = [r| {"pn_name": "test", "pn_typename": "NodeCorpus"} |]
void $ withApplication app $ do
protected token "POST" (mkUrl port $ "/node/" +| treeId |+ "") query
it "WS notification on node deletion works" $ \ctx@(SpecContext testEnv port app _) -> do
checkNotification ctx $ \authRes -> do
let token = authRes ^. authRes_token
cId <- newCorpusForUser testEnv "alice"
-- logMsg ioL DEBUG $ "[wsConnect] waiting for notification"
d <- WS.receiveData conn
-- logMsg ioL DEBUG $ "[wsConnect] received " <> show d
let dec = Aeson.decode d :: Maybe DT.Notification
atomically $ writeTChan tchan dec
withAsyncWSConnection ("127.0.0.1", port) wsConnect $ \_a -> do
waitForTSem wsTSem 500
let nodeId = 0
CE.notify nc $ CET.UpdateTreeFirstLevel nodeId
waitForTChanValue tchan (Just $ DT.NUpdateTree nodeId) 1_000
void $ withApplication app $ do
protected token "DELETE" (mkUrl port $ "/node/" +| cId |+ "") ""
it "WS notification on node rename works" $ \ctx@(SpecContext testEnv port app _) -> do
checkNotification ctx $ \authRes -> do
let token = authRes ^. authRes_token
cId <- newCorpusForUser testEnv "alice"
void $ withApplication app $ do
let query = [r| {"name": "newName"} |]
protected token "PUT" (mkUrl port $ "/node/" +| cId |+ "/rename") query
it "WS notification on node move works" $ \ctx@(SpecContext testEnv port app _) -> do
checkNotification ctx $ \authRes -> do
let token = authRes ^. authRes_token
cId <- newCorpusForUser testEnv "alice"
cId2 <- newCorpusForUser testEnv "alice"
void $ withApplication app $ do
let query = BS.fromStrict $ TE.encodeUtf8 $ "[" <> (T.pack $ show cId2) <> "]"
protected token "PUT" (mkUrl port $ "/node/" +| cId |+ "/move/" +| cId2 |+ "" ) query
-- | Given a spec context and an action, run the action to perform
-- some node tree update, then check that a notification about this
-- tree update was received.
checkNotification :: SpecContext a
-> (AuthResponse -> IO ())
-> IO ()
checkNotification ctx@(SpecContext _testEnv port _app _) act = do
_ <- dbEnvSetup ctx
withValidLoginA port "alice" (GargPassword "alice") $ \_clientEnv authRes -> do
-- Subscribe to user tree notifications
let treeId = authRes ^. authRes_tree_id
let topic = DT.UpdateTree treeId
wsTSem <- atomically $ newTSem 0 -- initially locked
tchan <- newTChanIO :: IO (TChan (Maybe DT.Notification))
withAsyncWSConnection ("127.0.0.1", port) (wsConnection topic wsTSem tchan) $ \_a -> do
waitForTSem wsTSem 500
act authRes
waitForTChanValue tchan (Just $ DT.NUpdateTree treeId) 1_000
wsConnection :: DT.Topic
-> TSem
-> TChan (Maybe DT.Notification)
-> WS.Connection
-> IO ()
wsConnection topic wsTSem tchan conn = withLogger () $ \_ioL -> do
-- logMsg ioL DEBUG $ "[wsConnect] subscribing topic: " <> show topic
WS.sendTextData conn $ Aeson.encode (DT.WSSubscribe topic)
-- inform the test process that we sent the subscription request
atomically $ signalTSem wsTSem
-- logMsg ioL DEBUG $ "[wsConnect] waiting for notification"
d <- WS.receiveData conn
-- logMsg ioL DEBUG $ "[wsConnect] received " <> show d
let dec = Aeson.decode d :: Maybe DT.Notification
atomically $ writeTChan tchan dec
......@@ -16,8 +16,7 @@ import Data.Aeson qualified as JSON
import Data.Text qualified as T
import Gargantext.API.Errors
import Gargantext.Core.Types.Individu
import Gargantext.Core.Types (NodeId)
import Gargantext.Core.Types (NodeType(..))
import Gargantext.Core.Types (NodeId, NodeType(..))
import Gargantext.Core.Worker.Env () -- instance HasNodeError
import Gargantext.Database.Action.User
import Gargantext.Database.Admin.Types.Hyperdata.Corpus
......
......@@ -186,7 +186,7 @@ dbEnvSetup ctx = do
_ <- createAliceAndBob testEnv
pure ctx
-- show the full exceptions during testing, rather than swallowing them under a generic
-- "Something went wrong".
showDebugExceptions :: SomeException -> Wai.Response
......
{-|
Module : Core.Orchestrator
Description :
Copyright : (c) CNRS, 2017-Present
License : AGPL + CECILL v3
Maintainer : team@gargantext.org
Stability : experimental
Portability : POSIX
-}
module Test.Core.Orchestrator
( qcTests )
where
import Data.Aeson qualified as A
import Gargantext.API.Admin.Orchestrator.Types
import Gargantext.Prelude
import Test.Instances ()
import Test.Tasty
import Test.Tasty.QuickCheck qualified as QC
qcTests :: TestTree
qcTests =
testGroup "Orchestrator QuickCheck tests" $ do
[ QC.testProperty "ExternalAPIs aeson encoding" $ \m -> A.decode (A.encode (m :: ExternalAPIs)) == Just m ]
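The property above instantiates a generic aeson roundtrip check. A minimal standalone form (assuming only `aeson` and a QuickCheck `Arbitrary` instance for the type under test):

```haskell
import Data.Aeson qualified as A

-- Decoding an encoded value must reproduce it exactly; any custom
-- ToJSON/FromJSON pair (like the one for ExternalAPIs) must satisfy this.
jsonRoundtrip :: (A.ToJSON a, A.FromJSON a, Eq a) => a -> Bool
jsonRoundtrip x = A.decode (A.encode x) == Just x
```

Extracting the helper lets every serializable type in `Test.Instances` reuse one property instead of inlining the `decode . encode` expression per test.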
......@@ -23,13 +23,15 @@ import Data.Text qualified as T
import Data.Validity (Validation(..), ValidationChain (..), prettyValidation)
import EPO.API.Client.Types qualified as EPO
import Gargantext.API.Admin.Auth.Types (AuthenticatedUser, ForgotPasswordAsyncParams)
import Gargantext.API.Admin.Orchestrator.Types qualified as Orch
import Gargantext.API.Errors.Types qualified as Errors
import Gargantext.API.Ngrams.Types qualified as Ngrams
import Gargantext.API.Node.Contact.Types (AddContactParams)
import Gargantext.API.Node.Corpus.Annuaire (AnnuaireWithForm)
import Gargantext.API.Node.Corpus.New (ApiInfo)
import Gargantext.API.Node.Corpus.New.Types (FileFormat, FileType)
import Gargantext.API.Node.Corpus.Types (Datafield, Database)
import Gargantext.API.Node.Corpus.Types (Datafield)
import Gargantext.API.Node.Corpus.Types qualified as CT
import Gargantext.API.Node.DocumentsFromWriteNodes.Types qualified as DFWN
import Gargantext.API.Node.DocumentUpload.Types (DocumentUpload)
import Gargantext.API.Node.FrameCalcUpload.Types qualified as FCU
......@@ -127,7 +129,9 @@ instance Arbitrary ApiInfo where arbitrary = genericArbitrary
instance Arbitrary FileFormat where arbitrary = genericArbitrary
instance Arbitrary FileType where arbitrary = genericArbitrary
instance Arbitrary Database where arbitrary = arbitraryBoundedEnum
instance Arbitrary CT.Database where
arbitrary = oneof [ pure CT.Empty
, CT.DB <$> arbitrary ]
instance Arbitrary Datafield where arbitrary = genericArbitrary
instance Arbitrary WithQuery where arbitrary = genericArbitrary
......@@ -291,11 +295,34 @@ instance Arbitrary Hyperdata.HyperdataPublic where
arbitrary = pure Hyperdata.defaultHyperdataPublic
instance Arbitrary Orch.ExternalAPIs where
arbitrary = oneof [ pure Orch.OpenAlex
, Orch.PubMed <$> arbitrary
, pure Orch.Arxiv
, pure Orch.HAL
, pure Orch.IsTex
, pure Orch.Isidore
, Orch.EPO <$> arbitrary <*> arbitrary ]
-- instance Arbitrary NewWithFile where
-- arbitrary = NewWithFile <$> arbitrary -- _wfi_b64_data
-- <*> arbitrary -- _wf_lang
-- <*> arbitrary -- _wf_name
instance Arbitrary Orch.ScraperEvent where
arbitrary = Orch.ScraperEvent <$> elements [Nothing, Just "test message"]
<*> elements [Nothing, Just "INFO", Just "WARN"]
<*> elements [Nothing, Just "2018-04-18"]
instance Arbitrary Orch.JobLog where
arbitrary = Orch.JobLog
<$> arbitrary
<*> arbitrary
<*> arbitrary
<*> arbitrary
instance Arbitrary NewWithForm where arbitrary = genericArbitrary
instance Arbitrary RenameNode where
......
{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
{-# LANGUAGE TypeApplications #-}
module Test.Ngrams.Terms (tests) where
import Data.HashMap.Strict qualified as HashMap
import Data.Map.Strict qualified as Map
import Gargantext.API.Ngrams
import Gargantext.Core (Lang(..))
import Gargantext.Core.Text.Context (TermList)
import Gargantext.Core.Text.Ngrams (NgramsType(NgramsTerms), Ngrams(..))
import Gargantext.Core.Text.Terms (ExtractedNgrams(..))
import Gargantext.Core.Text.Terms.WithList (buildPatterns, buildPatternsWith, extractTermsWithList', termsInText, Pattern(..))
import Gargantext.Database.Action.Flow.Utils (docNgrams, ngramsByDoc)
import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument(..), emptyHyperdataDocument )
import Gargantext.Database.Admin.Types.Node (NodeId(..))
import Gargantext.Database.Schema.Context ( ContextPolyOnlyId(..) )
import Gargantext.Prelude
import Test.Tasty
import Test.Tasty.HUnit
tests :: TestTree
tests = testGroup "Ngrams" [unitTests]
unitTests :: TestTree
unitTests = testGroup "Terms tests"
[ -- Sorting
testCase "Build patterns works 01" testBuildPatterns01
, testCase "Build patterns works 02" testBuildPatterns02
, testCase "termsInText works 01" testTermsInText01
, testCase "termsInText works 02" testTermsInText02
, testCase "termsInText works 03" testTermsInText03
, testCase "termsInText works 04 (related to issue #221)" testTermsInText04
, testCase "extractTermsWithList' works 01" testExtractTermsWithList'01
, testCase "docNgrams works 01" testDocNgrams01
, testCase "docNgrams works 02" testDocNgrams02
, testCase "ngramsByDoc works 01" testNgramsByDoc01
]
-- | Let's document how the `buildPatternsWith` function works.
testBuildPatterns01 :: Assertion
testBuildPatterns01 = do
let terms = ["hello world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
length enPatterns @?= 1
let Just pat = head enPatterns
_pat_length pat @?= 2
_pat_terms pat @?= ["hello", "world"]
-- | Let's document how the `buildPatternsWith` function works.
testBuildPatterns02 :: Assertion
testBuildPatterns02 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
length enPatterns @?= 2
let [pat1, pat2] = enPatterns
_pat_length pat1 @?= 1
_pat_terms pat1 @?= ["hello"]
_pat_length pat2 @?= 1
_pat_terms pat2 @?= ["world"]
-- | Let's document how the `termsInText` function works.
testTermsInText01 :: Assertion
testTermsInText01 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 1)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
testTermsInText02 :: Assertion
testTermsInText02 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world, hello!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 2)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
testTermsInText03 :: Assertion
testTermsInText03 = do
let terms = ["hello", "world"] :: [NgramsTerm]
let enPatterns = buildPatternsWith EN terms
let tit = termsInText EN enPatterns "Hello, world, again!"
length tit @?= 2
let [tit1, tit2] = tit
tit1 @?= ("hello", 1)
tit2 @?= ("world", 1)
-- | Let's document how the `termsInText` function works.
-- https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/221
testTermsInText04 :: Assertion
testTermsInText04 = do
let terms = ["feuilles de basilic"] :: [NgramsTerm]
let frPatterns = buildPatternsWith FR terms
let tit = termsInText FR frPatterns "Infos pratiques Nombre de personnes 1 personne Quantité 1 verre Temps de préparation 5 minutes Degré de difficulté Très facile Coût Abordable Les ingrédients de la recette 4 feuilles de basilic 1 branche de romarin 15 ml de citron jaune 60 ml d'eau gazeuse au mastiqua 90 ml de Bulles de Muscat Jaillance La préparation de la recette Verser dans un verre type long drink le citron jaune, les feuilles de basilic et l'eau gazeuse."
length tit @?= 1
let [tit1] = tit
tit1 @?= ("feuilles de basilic", 2)
testExtractTermsWithList'01 :: Assertion
testExtractTermsWithList'01 = do
  let termList = [(["chat blanc"], [["chat","blanc"]])] :: TermList
  extractTermsWithList' (buildPatterns termList) "Le chat blanc" @?= ["chat blanc"]
testDocNgrams01 :: Assertion
testDocNgrams01 = do
  let terms = ["hello", "world"] :: [NgramsTerm]
  let hd = emptyHyperdataDocument { _hd_title = Just "hello world"
                                  , _hd_abstract = Nothing }
  let ctx = ContextOnlyId 1 hd
  let dNgrams = docNgrams EN terms ctx
  length dNgrams @?= 2
testDocNgrams02 :: Assertion
testDocNgrams02 = do
  let terms = ["hello", "world"] :: [NgramsTerm]
  let hd = emptyHyperdataDocument { _hd_title = Just "hello world, kaboom"
                                  , _hd_abstract = Nothing }
  let ctx = ContextOnlyId 1 hd
  let dNgrams = docNgrams EN terms ctx
  length dNgrams @?= 2
testNgramsByDoc01 :: Assertion
testNgramsByDoc01 = do
  let terms = ["hello", "world"] :: [NgramsTerm]
  let hd1 = emptyHyperdataDocument { _hd_title = Just "hello world, kaboom"
                                   , _hd_abstract = Nothing }
  let ctx1 = ContextOnlyId 1 hd1
  let hd2 = emptyHyperdataDocument { _hd_title = Just "world, boom world"
                                   , _hd_abstract = Nothing }
  let ctx2 = ContextOnlyId 2 hd2
  ngramsByDoc EN NgramsTerms terms ctx1 @?=
    HashMap.fromList
      [ ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "hello", _ngramsSize = 1 }
        , Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 1) (1, 1) )
      , ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "world", _ngramsSize = 1 }
        , Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 1) (1, 1) )
      ]
  ngramsByDoc EN NgramsTerms terms ctx2 @?=
    HashMap.fromList
      [ ( SimpleNgrams $ UnsafeNgrams { _ngramsTerms = "world", _ngramsSize = 1 }
        , Map.singleton NgramsTerms $ Map.singleton (UnsafeMkNodeId 2) (1, 2) )
      ]
@@ -63,6 +63,8 @@ tests = testGroup "JSON" [
, testProperty "NodeType roundtrips" (jsonEnumRoundtrip (Dict @_ @NodeType))
, testProperty "NodePublishPolicy roundtrips" (jsonEnumRoundtrip (Dict @_ @NodePublishPolicy))
, testCase "WithQuery frontend compliance" testWithQueryFrontend
, testCase "WithQuery frontend compliance (PubMed)" testWithQueryFrontendPubMed
, testCase "WithQuery frontend compliance (EPO)" testWithQueryFrontendEPO
, testGroup "Phylo" [
testProperty "PeriodToNode" (jsonRoundtrip @PeriodToNodeData)
, testProperty "GraphData" (jsonRoundtrip @GraphData)
@@ -83,11 +85,37 @@ testWithQueryFrontend = do
    Left err -> fail $ "JSON instance will break frontend!: JSON decoding returned: " <> err
    Right _ -> pure ()
testWithQueryFrontendPubMed :: Assertion
testWithQueryFrontendPubMed = do
  case eitherDecode @WithQuery (C8.pack cannedWithQueryPayloadPubMed) of
    Left err -> fail $ "JSON instance will break frontend (PubMed)!: JSON decoding returned: " <> err
    Right _ -> pure ()
testWithQueryFrontendEPO :: Assertion
testWithQueryFrontendEPO = do
  case eitherDecode @WithQuery (C8.pack cannedWithQueryPayloadEPO) of
    Left err -> fail $ "JSON instance will break frontend (EPO)!: JSON decoding returned: " <> err
    Right _ -> pure ()
-- The aim of this value is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail and we would be notified.
cannedWithQueryPayload :: String
cannedWithQueryPayload = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "Arxiv"}}} |]
-- The aim of this value is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail and we would be notified.
cannedWithQueryPayloadPubMed :: String
cannedWithQueryPayloadPubMed = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "PubMed", "api_key": "x"}}} |]
-- The aim of this value is to catch regressions in the frontend serialisation; this
-- is what the frontend currently expects, and therefore if we were to change the JSON
-- instances, this test would fail and we would be notified.
cannedWithQueryPayloadEPO :: String
cannedWithQueryPayloadEPO = [r| {"query":"Haskell","node_id":138,"lang":"EN","flowListWith":{"type":"MyListsFirst"},"datafield": { "External": {"db": "EPO", "api_user": "user", "api_token": "token"}}} |]
testParseBpaPhylo :: Assertion
testParseBpaPhylo = do
@@ -12,6 +12,9 @@ module Main where
import Gargantext.Prelude
import qualified Test.Core.Notifications as Notifications
import qualified Test.Core.Orchestrator as Orchestrator
import qualified Test.Core.Similarity as Similarity
import qualified Test.Core.Text.Corpus.Query as CorpusQuery
import qualified Test.Core.Text.Corpus.TSV as TSVParser
import qualified Test.Core.Utils as Utils
@@ -20,19 +23,27 @@ import qualified Test.Graph.Clustering as Graph
import qualified Test.Ngrams.Lang.Occurrences as Occurrences
import qualified Test.Ngrams.NLP as NLP
import qualified Test.Ngrams.Query as NgramsQuery
import qualified Test.Ngrams.Terms as NgramsTerms
import qualified Test.Offline.Errors as Errors
import qualified Test.Offline.JSON as JSON
import qualified Test.Offline.Phylo as Phylo
import qualified Test.Offline.Stemming.Lancaster as Lancaster
import qualified Test.Parsers.Date as PD
import qualified Test.Utils.Crypto as Crypto
import qualified Test.Utils.Jobs as Jobs
import System.IO (hGetBuffering, hSetBuffering)
import Test.Tasty
import Test.Tasty.Hspec
-- | https://mercurytechnologies.github.io/ghciwatch/integration/tasty.html
protectStdoutBuffering :: IO a -> IO a
protectStdoutBuffering action =
  bracket
    (hGetBuffering stdout)
    (\bufferMode -> hSetBuffering stdout bufferMode)
    (const action)
main :: IO ()
main = do
utilSpec <- testSpec "Utils" Utils.test
@@ -45,7 +56,7 @@ main = do
asyncUpdatesSpec <- testSpec "Notifications" Notifications.test
occurrencesSpec <- testSpec "Occurrences" Occurrences.test
protectStdoutBuffering $ defaultMain $ testGroup "Gargantext"
[ utilSpec
, clusteringSpec
, dateSplitSpec
@@ -65,4 +76,6 @@ main = do
, Worker.tests
, asyncUpdatesSpec
, Notifications.qcTests
, Orchestrator.qcTests
, NgramsTerms.tests
]