Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
145
Issues
145
List
Board
Labels
Milestones
Merge Requests
6
Merge Requests
6
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
7d6d74c4
Commit
7d6d74c4
authored
Jun 21, 2019
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[REFACTO] ngrams unsupervised.
parent
0b09d71e
Pipeline
#487
failed with stage
Changes
3
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
15 additions
and
9 deletions
+15
-9
Main.hs
bin/gargantext-import/Main.hs
+1
-1
Flow.hs
src/Gargantext/Database/Flow.hs
+1
-1
Terms.hs
src/Gargantext/Text/Terms.hs
+13
-7
No files found.
bin/gargantext-import/Main.hs
View file @
7d6d74c4
...
...
@@ -46,7 +46,7 @@ main = do
createUsers
=
insertUsersDemo
let
cmd
::
forall
m
.
FlowCmdM
DevEnv
ServantErr
m
=>
m
CorpusId
cmd
=
flowCorpusFile
(
cs
user
)
(
cs
name
)
(
read
limit
::
Int
)
(
Unsupervised
EN
5
Nothing
)
CsvHalFormat
corpusPath
cmd
=
flowCorpusFile
(
cs
user
)
(
cs
name
)
(
read
limit
::
Int
)
(
Unsupervised
EN
5
1
Nothing
)
CsvHalFormat
corpusPath
{-
let debatCorpus :: forall m. FlowCmdM DevEnv ServantErr m => m CorpusId
debatCorpus = do
...
...
src/Gargantext/Database/Flow.hs
View file @
7d6d74c4
...
...
@@ -190,7 +190,7 @@ insertMasterDocs c lang hs = do
let
documentsWithId
=
mergeData
(
toInserted
ids
)
(
Map
.
fromList
$
map
viewUniqId'
hs'
)
let
fixLang
(
Unsupervised
l
n
m
)
=
Unsupervised
l
n
m'
fixLang
(
Unsupervised
l
n
s
m
)
=
Unsupervised
l
n
s
m'
where
m'
=
case
m
of
Nothing
->
Just
$
buildTries
n
(
fmap
toToken
$
uniText
$
Text
.
intercalate
" "
$
List
.
concat
$
map
hasText
documentsWithId
)
...
...
src/Gargantext/Text/Terms.hs
View file @
7d6d74c4
...
...
@@ -59,7 +59,8 @@ data TermType lang
|
Multi
{
_tt_lang
::
lang
}
|
MonoMulti
{
_tt_lang
::
lang
}
|
Unsupervised
{
_tt_lang
::
lang
,
_tt_size
::
Int
,
_tt_windoSize
::
Int
,
_tt_ngramsSize
::
Int
,
_tt_model
::
Maybe
(
Tries
Token
()
)
}
makeLenses
''
T
ermType
...
...
@@ -74,7 +75,7 @@ makeLenses ''TermType
--extractTerms :: Traversable t => TermType Lang -> t Text -> IO (t [Terms])
extractTerms
::
TermType
Lang
->
[
Text
]
->
IO
[[
Terms
]]
extractTerms
(
Unsupervised
l
n
m
)
xs
=
mapM
(
terms
(
Unsupervised
l
n
(
Just
m'
)))
xs
extractTerms
(
Unsupervised
l
n
s
m
)
xs
=
mapM
(
terms
(
Unsupervised
l
n
s
(
Just
m'
)))
xs
where
m'
=
case
m
of
Just
m''
->
m''
...
...
@@ -94,7 +95,7 @@ terms :: TermType Lang -> Text -> IO [Terms]
terms
(
Mono
lang
)
txt
=
pure
$
monoTerms
lang
txt
terms
(
Multi
lang
)
txt
=
multiterms
lang
txt
terms
(
MonoMulti
lang
)
txt
=
terms
(
Multi
lang
)
txt
terms
(
Unsupervised
lang
n
m
)
txt
=
termsUnsupervised
m'
n
lang
txt
terms
(
Unsupervised
lang
n
s
m
)
txt
=
termsUnsupervised
(
Unsupervised
lang
n
s
(
Just
m'
))
txt
where
m'
=
maybe
(
newTries
n
txt
)
identity
m
-- terms (WithList list) txt = pure . concat $ extractTermsWithList list txt
...
...
@@ -112,15 +113,20 @@ isPunctuation x = List.elem x $ (Text.pack . pure)
-- language agnostic extraction
-- TODO: remove IO
-- TODO: newtype BlockText
termsUnsupervised
::
Tries
Token
()
->
Int
->
Lang
->
Text
->
IO
[
Terms
]
termsUnsupervised
m
n
l
=
type
WindowSize
=
Int
type
MinNgramSize
=
Int
termsUnsupervised
::
TermType
Lang
->
Text
->
IO
[
Terms
]
termsUnsupervised
(
Unsupervised
l
n
s
m
)
=
pure
.
map
(
text2term
l
)
.
List
.
nub
.
(
List
.
filter
(
\
l'
->
List
.
length
l'
>
1
))
.
(
List
.
filter
(
\
l'
->
List
.
length
l'
>
s
))
.
List
.
concat
.
mainEleveWith
m
n
.
mainEleveWith
(
maybe
(
panic
"no model"
)
identity
m
)
n
.
uniText
termsUnsupervised
_
=
undefined
newTries
::
Int
->
Text
->
Tries
Token
()
newTries
n
t
=
buildTries
n
(
fmap
toToken
$
uniText
t
)
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment