gargantext / haskell-gargantext
Commit fcf83bf7 (Verified)
Authored Jan 07, 2025 by Przemyslaw Kaminski

[ngrams] more types annotations

Parent: 0d8a77c4
Pipeline #7187 passed with stages in 53 minutes and 43 seconds

Showing 2 changed files with 10 additions and 9 deletions:
src/Gargantext/Core/Text/Terms.hs (+6 -8)
src/Gargantext/Database/Action/Flow/Extract.hs (+4 -1)

src/Gargantext/Core/Text/Terms.hs
@@ -176,7 +176,7 @@ terms :: NLPServerConfig -> TermType Lang -> Text -> IO [TermsWithCount]
 terms _   (Mono lang) txt = pure $ monoTerms lang txt
 terms ncs (Multi lang) txt = multiterms ncs lang txt
 terms ncs (MonoMulti lang) txt = terms ncs (Multi lang) txt
-terms _   (Unsupervised { .. }) txt = pure $ termsUnsupervised (Unsupervised { _tt_model = Just m', .. }) txt
+terms _   (Unsupervised { .. }) txt = pure $ termsUnsupervised _tt_lang m' _tt_windowSize _tt_ngramsSize txt
   where
     m' = maybe (newTries _tt_ngramsSize txt) identity _tt_model
 -- terms (WithList list) txt = pure . concat $ extractTermsWithList list txt
@@ -189,17 +189,15 @@ type MinNgramSize = Int
 -- | Unsupervised ngrams extraction
 -- language agnostic extraction
 -- TODO: newtype BlockText
-termsUnsupervised :: TermType Lang -> Text -> [TermsWithCount]
-termsUnsupervised (Unsupervised { _tt_model = Nothing }) = panicTrace "[termsUnsupervised] no model"
-termsUnsupervised (Unsupervised { _tt_model = Just _tt_model, .. }) =
-  map (first (text2term _tt_lang))
+termsUnsupervised :: Lang -> Tries Token () -> Int -> Int -> Text -> [TermsWithCount]
+termsUnsupervised lang model windowSize ngramsSize =
+  map (first (text2term lang))
   . groupWithCounts
   -- . List.nub
-  . List.filter (\l' -> List.length l' >= _tt_windowSize)
+  . List.filter (\l' -> List.length l' >= windowSize)
   . List.concat
-  . mainEleveWith _tt_model _tt_ngramsSize
+  . mainEleveWith model ngramsSize
   . uniText
-termsUnsupervised _ = undefined
 
 newTries :: Int -> Text -> Tries Token ()
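
Note on the Terms.hs change: the old `termsUnsupervised` took the whole `TermType Lang` and therefore needed a `panicTrace` clause and an `undefined` fallback for inputs it could not handle. The new signature takes only the fields it uses, so the function becomes total and the caller resolves the optional model. A minimal sketch of the same pattern follows. The types and names here (`Model`, `TermType`, `extractTotal`, `extract`, the `tt*` fields) are hypothetical stand-ins and not the Gargantext definitions, and the filtering logic is only a placeholder for the real extraction pipeline.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the trained model (the commit uses `Tries Token ()`).
type Model = [String]

-- Hypothetical stand-in for `TermType Lang`, reduced to two constructors.
data TermType
  = Mono
  | Unsupervised
      { ttWindowSize :: Int
      , ttModel      :: Maybe Model
      }

-- Before: accepts the whole sum type, so it needs clauses for states
-- that make no sense here and fails at runtime when it meets them.
extractPartial :: TermType -> String -> [String]
extractPartial (Unsupervised { ttModel = Nothing }) _ = error "no model"
extractPartial (Unsupervised { ttModel = Just model, .. }) txt =
  filter (\w -> length w >= ttWindowSize && w `elem` model) (words txt)
extractPartial _ _ = undefined

-- After: only the values actually needed are passed in; every
-- well-typed call has a result, so no error clauses remain.
extractTotal :: Model -> Int -> String -> [String]
extractTotal model windowSize =
  filter (\w -> length w >= windowSize && w `elem` model) . words

-- The caller unpacks the record once and supplies a default model
-- when none is stored, mirroring the `where m' = maybe ...` binding.
extract :: TermType -> String -> [String]
extract Mono txt = words txt
extract (Unsupervised { .. }) txt = extractTotal model ttWindowSize txt
  where
    model = fromMaybe (words txt) ttModel
```

With this shape, a missing model is handled in exactly one place (the caller), and the type of the worker function documents its real inputs.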
src/Gargantext/Database/Action/Flow/Extract.hs
@@ -12,6 +12,7 @@ Portability : POSIX
 {-# OPTIONS_GHC -fno-warn-orphans #-}
 {-# LANGUAGE InstanceSigs #-}
+{-# LANGUAGE ScopedTypeVariables #-}
 module Gargantext.Database.Action.Flow.Extract
@@ -30,6 +31,7 @@ import Gargantext.Database.Admin.Types.Hyperdata.Contact ( HyperdataContact, cw_
 import Gargantext.Database.Admin.Types.Hyperdata.Document ( HyperdataDocument, hd_authors, hd_bdd, hd_institutes, hd_source )
 import Gargantext.Database.Admin.Types.Node ( Node )
 import Gargantext.Database.Prelude ( DBCmd )
+import Gargantext.Database.Query.Table.NgramsPostag ( NgramsPostag )
 import Gargantext.Database.Schema.Ngrams ( text2ngrams )
 import Gargantext.Database.Schema.Node ( NodePoly(..) )
 import Gargantext.Prelude
@@ -77,7 +79,8 @@ instance ExtractNgramsT HyperdataDocument
               $ maybe ["Nothing"] (splitOn Authors (doc ^. hd_bdd))
               $ doc ^. hd_authors
-      termsWithCounts' <- map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$>
+      termsWithCounts' :: [(NgramsPostag, TermsCount)] <-
+        map (first (enrichedTerms (lang ^. tt_lang) (server ncs) NP)) . concat <$>
           liftBase (extractTerms ncs lang $ hasText doc)
       pure $ HashMap.fromList
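
Note on the Extract.hs change: annotating the binder of a `<-` statement, as in `termsWithCounts' :: [(NgramsPostag, TermsCount)] <- ...`, is a pattern type signature, which GHC accepts only with `ScopedTypeVariables` enabled. That is presumably why the pragma appears in the same commit. A minimal sketch of the syntax, assuming a hypothetical `extractBatches` in place of `extractTerms` and plain `String`/`Int` pairs in place of the Gargantext types:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Bifunctor (first)
import Data.Char (toUpper)

-- Hypothetical effectful extractor returning batches of (term, count) pairs.
extractBatches :: String -> IO [[(String, Int)]]
extractBatches txt = pure [[(w, length w)] | w <- words txt]

countTerms :: String -> IO [(String, Int)]
countTerms txt = do
  -- The annotation sits on the binder itself; without ScopedTypeVariables
  -- GHC rejects a type signature in this position.
  termsWithCounts :: [(String, Int)] <-
    map (first (map toUpper)) . concat <$> extractBatches txt
  pure termsWithCounts
```

The annotation does not change behaviour; it pins the intermediate type so that a later change to the pipeline on the right-hand side fails at this line instead of further down.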
Przemyslaw Kaminski (@cgenie) mentioned in commit 03b33383dd67c1821a4edb4628923cf7bd039d90 · Jan 30, 2025