gargantext / haskell-gargantext, commit 14246fa5
Authored Mar 22, 2022 by Alexandre Delanoë

[FIX] NLP API + group revert

parent 996fd394

Showing 4 changed files with 56 additions and 40 deletions
src/Gargantext/Core/Text/Terms/Multi.hs          +17  -8
src/Gargantext/Core/Text/Terms/Multi/Group.hs     +2  -1
src/Gargantext/Core/Types.hs                     +31 -25
src/Gargantext/Utils/JohnSnowNLP.hs               +6  -6
src/Gargantext/Core/Text/Terms/Multi.hs

@@ -12,7 +12,7 @@ Multi-terms are ngrams where n > 1.
 -}
-module Gargantext.Core.Text.Terms.Multi (multiterms, multiterms_rake)
+module Gargantext.Core.Text.Terms.Multi (multiterms, multiterms_rake, tokenTagsWith)
   where
 import Data.Text hiding (map, group, filter, concat)
@@ -28,6 +28,11 @@ import qualified Gargantext.Core.Text.Terms.Multi.Lang.En as En
 import qualified Gargantext.Core.Text.Terms.Multi.Lang.Fr as Fr
 import Gargantext.Core.Text.Terms.Multi.RAKE (multiterms_rake)
+import qualified Gargantext.Utils.JohnSnowNLP as JohnSnow
+-------------------------------------------------------------------
+type NLP_API = Lang -> Text -> IO PosSentences
 -------------------------------------------------------------------
 -- To be removed
@@ -39,19 +44,23 @@ multiterms' f lang txt = concat
                     <$> map (map f)
                     <$> map (filter (\t -> _my_token_pos t == Just NP))
                     <$> tokenTags lang txt
 -------------------------------------------------------------------
 tokenTag2terms :: TokenTag -> Terms
 tokenTag2terms (TokenTag ws t _ _) = Terms ws t
 tokenTags :: Lang -> Text -> IO [[TokenTag]]
-tokenTags lang s = map (groupTokens lang) <$> tokenTags' lang s
+tokenTags EN txt = tokenTagsWith EN txt corenlp
+tokenTags FR txt = tokenTagsWith FR txt JohnSnow.nlp
+tokenTags _  _   = panic "[G.C.T.T.Multi] NLP API not implemented yet"
-tokenTags' :: Lang -> Text -> IO [[TokenTag]]
-tokenTags' lang t = map tokens2tokensTags
+tokenTagsWith :: Lang -> Text -> NLP_API -> IO [[TokenTag]]
+tokenTagsWith lang txt nlp = map (groupTokens lang)
+                         <$> map tokens2tokensTags
                          <$> map _sentenceTokens
                          <$> _sentences
-                         <$> corenlp lang t
+                         <$> nlp lang txt
 ---- | This function analyses and groups (or not) ngrams according to
 ---- specific grammars of each language.
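The hunks above turn the hard-wired `corenlp` call into an `NLP_API` parameter and choose the backend per language in `tokenTags`. A minimal sketch of that dispatch pattern, assuming hypothetical toy types (`Lang`, `PosSentences` over `String`) and toy backends that only split text into words; unlike the commit, which calls `panic` for other languages, this sketch returns a `Left` value.

```haskell
import Data.Char (toUpper)

-- Hypothetical stand-ins for Gargantext's Lang and PosSentences types.
data Lang = EN | FR | DE
  deriving (Show, Eq)

newtype PosSentences = PosSentences [[String]]
  deriving (Show, Eq)

-- Same shape as NLP_API in the diff (String instead of Text here).
type NLP_API = Lang -> String -> IO PosSentences

-- Toy backends standing in for corenlp and JohnSnow.nlp:
-- both split on whitespace; the second also upper-cases each word.
toyCoreNLP :: NLP_API
toyCoreNLP _ txt = pure (PosSentences [words txt])

toyJohnSnow :: NLP_API
toyJohnSnow _ txt = pure (PosSentences [map (map toUpper) (words txt)])

-- Backend-agnostic pipeline: post-process whatever the chosen backend returns.
tokenTagsWith :: Lang -> String -> NLP_API -> IO [[String]]
tokenTagsWith lang txt nlp = unwrap <$> nlp lang txt
  where
    unwrap (PosSentences ss) = ss

-- Per-language dispatch; unsupported languages become a Left value
-- (the commit itself calls panic in this case).
tokenTags :: Lang -> String -> IO (Either String [[String]])
tokenTags EN txt = Right <$> tokenTagsWith EN txt toyCoreNLP
tokenTags FR txt = Right <$> tokenTagsWith FR txt toyJohnSnow
tokenTags l  _   = pure (Left ("NLP API not implemented yet for " <> show l))
```

Because `tokenTagsWith` receives the backend as an argument, supporting another language amounts to one more `tokenTags` equation pointing at a function of type `NLP_API`.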
src/Gargantext/Core/Text/Terms/Multi/Group.hs

@@ -23,7 +23,8 @@ import Gargantext.Prelude
 group2 :: POS -> POS -> [TokenTag] -> [TokenTag]
 group2 p1 p2 (x@(TokenTag _ _ (Just p1') _):y@(TokenTag _ _ (Just p2') _):z) =
   if (p1 == p1') && (p2 == p2')
-     then (x : y : group2 p1 p2 (x<>y : z))
+     then group2 p1 p2 (x<>y : z)
+     -- then (x : y : group2 p1 p2 (x<>y : z))
      else (x : group2 p1 p2 (y:z))
 group2 p1 p2 (x@(TokenTag _ _ Nothing _):y) = (x:group2 p1 p2 y)
 group2 _ _ [x@(TokenTag _ _ (Just _) _)] = [x]
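The `group2` change replaces a matching pair by its merge instead of keeping `x`, `y` and `x<>y` side by side. A minimal sketch contrasting the two behaviors, assuming a simplified `TokenTag` holding only words and a POS, and a hypothetical `Semigroup` instance (concatenate the words, keep the right-hand POS); Gargantext's real `TokenTag` and its `<>` are not shown in this diff.

```haskell
-- Hypothetical, simplified stand-ins for Gargantext's POS and TokenTag.
data POS = NP | JJ | VB
  deriving (Show, Eq)

data TokenTag = TokenTag
  { ttWords :: [String]
  , ttPos   :: Maybe POS
  } deriving (Show, Eq)

-- Assumption for this sketch: merging concatenates the words and keeps the
-- POS of the right-hand token (adjective + noun gives a noun phrase).
instance Semigroup TokenTag where
  (TokenTag w1 _) <> (TokenTag w2 p2) = TokenTag (w1 <> w2) p2

-- New behavior: replace x and y by their merge, then keep scanning from the
-- merged token so that longer runs collapse as well.
group2 :: POS -> POS -> [TokenTag] -> [TokenTag]
group2 p1 p2 (x@(TokenTag _ (Just p1')) : y@(TokenTag _ (Just p2')) : z)
  | p1 == p1' && p2 == p2' = group2 p1 p2 (x <> y : z)
  | otherwise              = x : group2 p1 p2 (y : z)
group2 p1 p2 (x : rest)    = x : group2 p1 p2 rest
group2 _  _  []            = []

-- Old behavior: emit x and y unchanged, then continue with their merge.
group2Old :: POS -> POS -> [TokenTag] -> [TokenTag]
group2Old p1 p2 (x@(TokenTag _ (Just p1')) : y@(TokenTag _ (Just p2')) : z)
  | p1 == p1' && p2 == p2' = x : y : group2Old p1 p2 (x <> y : z)
  | otherwise              = x : group2Old p1 p2 (y : z)
group2Old p1 p2 (x : rest) = x : group2Old p1 p2 rest
group2Old _  _  []         = []
```

With the old clause, `group2 JJ NP` on "big data" yields three tokens (`big`, `data`, `big data`); with the new one it yields only the merged `big data`, and a run of three nouns under `group2 NP NP` collapses into a single token.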
src/Gargantext/Core/Types.hs

@@ -72,7 +72,7 @@ data POS = NP
          | JJ  | VB
          | CC  | IN | DT
          | ADV
-         | NoPos
+         | NotFound { not_found :: [Char] }
   deriving (Show, Generic, Eq, Ord)
 ------------------------------------------------------------------------
 -- https://pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/
@@ -82,30 +82,36 @@ instance FromJSON POS where
       pos :: [Char] -> POS
       pos "ADJ"   = JJ
       pos "CC"    = CC
       pos "CCONJ" = CC
       pos "DT"    = DT
       pos "DET"   = DT
       pos "IN"    = IN
       pos "JJ"    = JJ
       pos "JJR"   = JJ
       pos "JJS"   = JJ
       pos "NC"    = NP
       pos "NN"    = NP
       pos "NOUN"  = NP
       pos "NNS"   = NP
       pos "NNP"   = NP
       pos "NNPS"  = NP
       pos "NP"    = NP
       pos "VB"    = VB
       pos "VERB"  = VB
       pos "VBD"   = VB
       pos "VBG"   = VB
       pos "VBN"   = VB
       pos "VBP"   = VB
       pos "VBZ"   = VB
       pos "RB"    = ADV
       pos "ADV"   = ADV
       pos "RBR"   = ADV
       pos "RBS"   = ADV
       pos "WRB"   = ADV
       -- French specific
       pos "P"     = IN
-      pos _       = NoPos
+      pos "PUNCT" = IN
+      pos x       = NotFound x
 instance ToJSON POS
 instance Hashable POS
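This hunk keeps unknown tags as `NotFound x` instead of collapsing them to `NoPos`, and drops the `pos _ = NoPos` wildcard. Haskell tries equations top to bottom, so a wildcard must be the last clause; one placed before `pos "PUNCT"` would make every later clause unreachable (GHC reports such clauses as redundant under `-Woverlapping-patterns`). A minimal sketch of the resulting mapping, assuming a trimmed `POS` type with `String` in place of `[Char]` and only a subset of the tags listed in the hunk:

```haskell
-- Trimmed copy of the POS type after this commit (constructors from the diff).
data POS = NP | JJ | VB | CC | IN | DT | ADV
         | NotFound { not_found :: String }  -- partial selector: only valid on NotFound
  deriving (Show, Eq, Ord)

-- Penn Treebank tags (NN, JJ, RB, ...) and Universal Dependencies tags
-- (NOUN, ADJ, ADV, ...) map onto the same coarse POS.
pos :: String -> POS
pos "ADJ"   = JJ
pos "JJ"    = JJ
pos "CCONJ" = CC
pos "DET"   = DT
pos "NN"    = NP
pos "NOUN"  = NP
pos "VB"    = VB
pos "VERB"  = VB
pos "RB"    = ADV
pos "ADV"   = ADV
pos "P"     = IN   -- French specific
pos "PUNCT" = IN
-- Catch-all last: it keeps the raw tag so unmapped tags stay visible.
pos x       = NotFound x
```

In GHCi, `pos "SYM"` evaluates to `NotFound {not_found = "SYM"}`, whereas the previous code would have returned a bare `NoPos`.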
src/Gargantext/Utils/JohnSnowNLP.hs

 {-|
-Module      : Gargantext.Utils.JohnSnowNLP
-Description : PosTagging module using Stanford java REST API
+Module      : Gargantext.Utils.JohnSnow
+Description : John Snow NLP API connexion
 Copyright   : (c) CNRS, 2017
 License     : AGPL + CECILL v3
 Maintainer  : team@gargantext.org
@@ -188,7 +188,7 @@ getPosTagAndLems l t = do
   jsPos <- waitForJsTask jsPosTask
   jsLemma <- waitForJsTask jsLemmaTask
   printDebug "[getPosTagAndLems] sentences" $ jsAsyncTaskResponseToSentences jsPos jsLemma
-  pure $ PosSentences []
+  pure $ jsAsyncTaskResponseToSentences jsPos jsLemma
 nlp :: Lang -> Text -> IO PosSentences
 nlp = getPosTagAndLems
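The fix in `getPosTagAndLems` returns the converted sentences instead of an empty `PosSentences []` (before, the result was only printed). `waitForJsTask` and the John Snow task API are not part of this diff, so this sketch uses hypothetical `startTask`/`waitForTask` helpers built on `forkIO` and `MVar`, plus toy taggers, to show the shape: start both jobs, wait for both, return the combined result. It has no error handling; a job that throws never fills its `MVar`, so waiting on it cannot succeed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.List (isSuffixOf)

-- Hypothetical task handle: a job runs in its own thread and delivers its
-- result through an MVar (stands in for the remote asynchronous tasks).
newtype Task a = Task (MVar a)

startTask :: IO a -> IO (Task a)
startTask job = do
  box <- newEmptyMVar
  _ <- forkIO (job >>= putMVar box)
  pure (Task box)

-- Blocks until the job has produced its result (cf. waitForJsTask).
waitForTask :: Task a -> IO a
waitForTask (Task box) = takeMVar box

-- Toy taggers standing in for the remote POS and lemma services.
toyPos, toyLemma :: String -> String
toyPos w   = if "s" `isSuffixOf` w then "NNS" else "NN"
toyLemma w = if "s" `isSuffixOf` w then init w else w

-- Shape of getPosTagAndLems after the fix: start both jobs, wait for both,
-- and return the combined result rather than an empty placeholder.
posTagAndLemmas :: [String] -> IO [(String, String, String)]
posTagAndLemmas ws = do
  posTask   <- startTask (pure (map toyPos ws))
  lemmaTask <- startTask (pure (map toyLemma ws))
  tags      <- waitForTask posTask
  lemmas    <- waitForTask lemmaTask
  pure (zip3 ws tags lemmas)
```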