Christian Merten / haskell-gargantext
Commit c0149016 (Verified), authored Apr 14, 2023 by Przemyslaw Kaminski
[nlp] add sample support for languages to corenlp
parent 29aee119
Showing 3 changed files with 45 additions and 14 deletions
gargantext.cabal (+1 -1)
src/Gargantext/Core.hs (+5 -5)
src/Gargantext/Core/Text/Terms/Multi/PosTagging.hs (+39 -8)
gargantext.cabal
@@ -5,7 +5,7 @@ cabal-version: 1.12
 -- see: https://github.com/sol/hpack
 name: gargantext
-version: 0.0.6.9.8.6.2
+version: 0.0.6.9.8.6.2
 synopsis: Search, map, share
 description: Please see README.md
 category: Data
src/Gargantext/Core.hs
@@ -28,13 +28,13 @@ import Servant.API
 -- | Language of a Text
 -- For simplicity, we suppose text has an homogenous language
 --
--- Next steps: | DE | IT | SP
---
 -- - EN == english
 -- - FR == french
--- - DE == deutch (not implemented yet)
--- - IT == italian (not implemented yet)
--- - SP == spanish (not implemented yet)
+-- - DE == deutch
+-- - IT == italian
+-- - ES == spanish
+-- - PL == polish
+-- - CN == chinese
 --
 -- ... add your language and help us to implement it (:
src/Gargantext/Core/Text/Terms/Multi/PosTagging.hs
@@ -27,6 +27,8 @@ module Gargantext.Core.Text.Terms.Multi.PosTagging
 import Data.Aeson
 import Data.ByteString.Lazy.Internal (ByteString)
+import qualified Data.ByteString.Lazy.Char8 as BSL
+import qualified Data.Map as Map
 import Data.Set (fromList)
 import Data.Text (Text, splitOn, pack, toLower)
 import Gargantext.Core (Lang(..))
@@ -79,14 +81,43 @@ corenlp' :: ( FromJSON a
             )
           => URI -> Lang -> p -> IO (Response a)
 corenlp' uri lang txt = do
-  let properties = case lang of
-        EN -> "{\"annotators\": \"tokenize,ssplit,pos,ner\", \"outputFormat\": \"json\"}"
-        FR -> "{\"annotators\": \"tokenize,ssplit,pos,lemma,ner\", \"parse.model\": \"edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz\", \"pos.model\": \"edu/stanford/nlp/models/pos-tagger/french/french.tagger\", \"tokenize.language\": \"fr\", \"outputFormat\": \"json\"}"
-        _ -> panic $ pack "not implemented yet"
-  req <- parseRequest $ "POST " <> show (uri { uriQuery = "?properties=" <> properties })
-  -- curl -XPOST 'http://localhost:9000/?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,ner%22,%20%22outputFormat%22:%20%22json%22%7D' -d 'hello world, hello' | jq .
-  let request = setRequestBodyLBS (cs txt) req
-  httpJSON request
+  req <- parseRequest $ "POST " <> show (uri { uriQuery = "?properties=" <> (BSL.unpack $ encode $ toJSON $ Map.fromList properties) })
+  -- curl -XPOST 'http://localhost:9000/?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,ner%22,%20%22outputFormat%22:%20%22json%22%7D' -d 'hello world, hello' | jq .
+  httpJSON $ setRequestBodyLBS (cs txt) req
+  where
+    properties_ :: [(Text, Text)]
+    properties_ = case lang of
+      -- TODO: Add: Aeson.encode $ Aeson.toJSON $ Map.fromList [()] instead of these hardcoded JSON strings
+      EN -> [ ("annotators", "tokenize,ssplit,pos,ner") ]
+      FR -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/french.tagger")
+            , ("tokenize.language", "fr") ]
+      DE -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/german-hgc.tagger")
+            , ("tokenize.language", "de") ]
+      ES -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/spanish.tagger")
+            , ("tokenize.language", "es") ]
+      IT -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            -- , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/french.tagger")
+            , ("tokenize.language", "it") ]
+      PL -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            -- , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/french.tagger")
+            , ("tokenize.language", "pl") ]
+      CN -> [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
+            -- , ("parse.model", "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz")
+            , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/chinese-distsim.tagger")
+            , ("tokenize.language", "zh") ]
+      l -> panic $ pack $ "corenlp for language " <> show l <> " is not implemented yet"
+    properties = properties_ <> [ ("outputFormat", "json") ]

 corenlp :: URI -> Lang -> Text -> IO PosSentences
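For readers who want to see what the new `properties` list turns into, here is a minimal standalone sketch of the same idea: a per-language list of key/value pairs, with `outputFormat` appended, rendered as a JSON object and placed in a `?properties=` query string. It is not the commit's code. The commit uses Aeson's `encode` on a `Map`, whereas this sketch uses a hand-written encoder so that it depends only on `base`. The trimmed `Lang` type and the helpers `toJsonObject` and `percentEncode` are hypothetical names introduced for illustration, and the percent-encoding step is an assumption modelled on the curl comment in the diff.

```haskell
module Main where

import Data.Char (isAlphaNum, isAscii, toUpper)
import Data.List (intercalate)
import Numeric (showHex)

-- Hypothetical stand-in for Gargantext.Core.Lang, cut down to three constructors.
data Lang = EN | FR | IT
  deriving (Show, Eq)

-- Per-language CoreNLP properties, copied from the diff, plus the shared
-- "outputFormat" entry that the commit appends in one place.
properties :: Lang -> [(String, String)]
properties lang = perLang lang <> [("outputFormat", "json")]
  where
    perLang EN = [ ("annotators", "tokenize,ssplit,pos,ner") ]
    perLang FR = [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
                 , ("pos.model", "edu/stanford/nlp/models/pos-tagger/french/french.tagger")
                 , ("tokenize.language", "fr") ]
    perLang IT = [ ("annotators", "tokenize,ssplit,pos,lemma,ner")
                 , ("tokenize.language", "it") ]

-- Render a flat list of string pairs as a JSON object. Only double quotes and
-- backslashes are escaped, so this is not a general-purpose JSON encoder.
toJsonObject :: [(String, String)] -> String
toJsonObject kvs = "{" <> intercalate "," (map pair kvs) <> "}"
  where
    pair (k, v) = quote k <> ":" <> quote v
    quote s = "\"" <> concatMap esc s <> "\""
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc c    = [c]

-- Percent-encode every character outside the RFC 3986 unreserved set.
-- Assumes ASCII input, which holds for the property values above.
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | (isAscii c && isAlphaNum c) || c `elem` "-._~" = [c]
      | otherwise = '%' : hex2 (fromEnum c)
    hex2 n =
      let h = map toUpper (showHex n "")
      in if length h < 2 then '0' : h else h

main :: IO ()
main = putStrLn ("?properties=" <> percentEncode (toJsonObject (properties FR)))
```

Keeping the properties as data rather than as hand-escaped JSON strings means a new language only needs a new list of pairs, and the shared `outputFormat` entry is added once.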