gargantext / haskell-gargantext · Commits

Commit 7c074fc8
Authored Sep 04, 2024 by Alexandre Delanoë
Merge remote-tracking branch 'origin/fix/386' into dev
Parents: 294ed193 e53d4b86
Showing 2 changed files with 5 additions and 7 deletions
src/Gargantext/Core/Text/Terms/Multi.hs  +3 -5
src/Gargantext/Core/Text/Terms/Multi/PosTagging.hs  +2 -2
src/Gargantext/Core/Text/Terms/Multi.hs
@@ -11,10 +11,10 @@ Multi-terms are ngrams where n > 1.
 -}
-module Gargantext.Core.Text.Terms.Multi (multiterms, multiterms_rake, tokenTagsWith, tokenTags, cleanTextForNLP)
+module Gargantext.Core.Text.Terms.Multi (multiterms, Terms(..), tokenTag2terms, multiterms_rake, tokenTagsWith, tokenTags, cleanTextForNLP)
   where
-import Data.Attoparsec.Text as DAT (digit, space, notChar, string)
+import Data.Attoparsec.Text as DAT (space, notChar, string)
 import Gargantext.Core (Lang(..), NLPServerConfig(..), PosTagAlgo(..))
 import Gargantext.Core.Text.Terms.Multi.Lang.En qualified as En
 import Gargantext.Core.Text.Terms.Multi.Lang.Fr qualified as Fr
@@ -82,12 +82,10 @@ groupTokens _ = Fr.groupTokens
 -- TODO: make tests here
 cleanTextForNLP :: Text -> Text
-cleanTextForNLP = unifySpaces . removeDigitsWith "-" . removeUrls
+cleanTextForNLP = unifySpaces . removeUrls
   where
     remove x = RAT.streamEdit x (const "")
     unifySpaces = RAT.streamEdit (many DAT.space) (const " ")
-    removeDigitsWith x = remove (many DAT.digit *> DAT.string x <* many DAT.digit)
     removeUrls = removeUrlsWith "http" . removeUrlsWith "www"
     removeUrlsWith w = remove (DAT.string w *> many (DAT.notChar ' ') <* many DAT.space)
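For readers unfamiliar with replace-attoparsec, here is a minimal standalone sketch of what the patched cleanTextForNLP now does, assuming RAT and DAT are the project's aliases for Replace.Attoparsec.Text and Data.Attoparsec.Text (cleanSketch is an illustrative name, not code from the repository). URLs are deleted and whitespace runs are collapsed to a single space; the dropped removeDigitsWith "-" step, which used to delete "-" characters together with any digits around them (so spans like 2024-09-04 disappeared), is no longer applied, which is also why the digit import above could go.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many)
import Data.Text (Text)
import qualified Data.Attoparsec.Text as DAT
import qualified Replace.Attoparsec.Text as RAT

-- Sketch of the patched behaviour: strip URLs, then squeeze whitespace.
-- Hyphenated digit spans are now left untouched.
cleanSketch :: Text -> Text
cleanSketch = unifySpaces . removeUrls
  where
    remove p         = RAT.streamEdit p (const "")                 -- delete every match of p
    unifySpaces      = RAT.streamEdit (many DAT.space) (const " ") -- collapse whitespace runs
    removeUrls       = removeUrlsWith "http" . removeUrlsWith "www"
    removeUrlsWith w = remove (DAT.string w *> many (DAT.notChar ' ') <* many DAT.space)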
src/Gargantext/Core/Text/Terms/Multi/PosTagging.hs
@@ -82,7 +82,7 @@ corenlp' :: ( FromJSON a
              => URI -> Lang -> p -> IO (Response a)
 corenlp' uri lang txt = do
-  req <- parseRequest $ "POST " <> show (uri { uriQuery = "?properties=" <> (BSL.unpack $ encode $ toJSON $ Map.fromList properties) })
+  req <- parseRequest $ "POST " <> show (uri { uriQuery = "?properties=" <> BSL.unpack (encode $ toJSON $ Map.fromList properties) })
   -- curl -XPOST 'http://localhost:9000/?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,ner%22,%20%22outputFormat%22:%20%22json%22%7D' -d 'hello world, hello' | jq .
   -- printDebug "[corenlp] sending body" $ (cs txt :: ByteString)
   catch (httpJSON $ setRequestBodyLBS (cs txt) req) $ \e ->
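To make the query construction concrete, here is a small standalone sketch of the encoding step this line performs, using aeson, containers and bytestring; buildPropertiesQuery is an illustrative name, not a function from the repository. The properties list is turned into a JSON object and spliced into the URI query, which corresponds to the (percent-encoded) query shown in the curl example above.

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (encode, toJSON)
import qualified Data.ByteString.Lazy.Char8 as BSL
import qualified Data.Map as Map
import Data.Text (Text)

-- Encode the CoreNLP properties as a JSON object and place it in the
-- "?properties=" query string, mirroring the patched line above.
buildPropertiesQuery :: [(Text, Text)] -> String
buildPropertiesQuery props =
  "?properties=" <> BSL.unpack (encode $ toJSON $ Map.fromList props)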
@@ -97,7 +97,7 @@ corenlp' uri lang txt = do
     properties_ :: [(Text, Text)]
     properties_ = case lang of
       -- TODO: Add: Aeson.encode $ Aeson.toJSON $ Map.fromList [()] instead of these hardcoded JSON strings
-      EN -> [ ("annotators", "tokenize,ssplit,pos,ner") ]
+      EN -> [ ("annotators", "tokenize,ssplit,pos,ner"), ("tokenize.options", "splitHyphenated=false") ]
       FR -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
             -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
             , ("pos.model", "edu/stanford/nlp/models/pos-tagger/models/french.tagger")
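Fed through the sketch above, the new EN list would yield ?properties={"annotators":"tokenize,ssplit,pos,ner","tokenize.options":"splitHyphenated=false"}. Going by the CoreNLP option name, splitHyphenated=false asks the tokenizer to keep hyphenated words as single tokens, which, together with the Multi.hs change that stops stripping hyphens during cleanup, appears to be the substance of the fix/386 branch merged here.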
Przemyslaw Kaminski (@cgenie) mentioned this in commit 5660aec07ec5a0a0a5468f440092c1a8f57a864e · Oct 08, 2024