Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
199
Issues
199
List
Board
Labels
Milestones
Merge Requests
12
Merge Requests
12
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
c6b1adf0
Commit
c6b1adf0
authored
Mar 04, 2019
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[DEV] Metrics before list creation.
parent
57cd486c
Pipeline
#248
failed with stage
Changes
5
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
5 changed files
with
60 additions
and
22 deletions
+60
-22
Flow.hs
src/Gargantext/Database/Flow.hs
+33
-2
NgramsByNode.hs
src/Gargantext/Database/Metrics/NgramsByNode.hs
+2
-2
Prelude.hs
src/Gargantext/Prelude.hs
+21
-15
Text.hs
src/Gargantext/Text.hs
+3
-3
TFICF.hs
src/Gargantext/Text/Metrics/TFICF.hs
+1
-0
No files found.
src/Gargantext/Database/Flow.hs
View file @
c6b1adf0
...
@@ -85,6 +85,14 @@ flowCorpus userName ff fp corpusName = do
...
@@ -85,6 +85,14 @@ flowCorpus userName ff fp corpusName = do
-- Master Flow
-- Master Flow
docs
<-
map
addUniqIdsDoc
<$>
liftIO
(
parseDocs
ff
fp
)
docs
<-
map
addUniqIdsDoc
<$>
liftIO
(
parseDocs
ff
fp
)
-- ChunkAlong needed for big corpora
-- ChunkAlong needed for big corpora
-- TODO add LANG as parameter
-- TODO uniformize language of corpus
-- TODO ChunkAlong is not the right function here
-- chunkAlong 10 10 [1..15] == [1..10]
-- BUG: what about the rest (divMod 15 10)?
-- but if temporary enables big corpora insert for tests
-- TODO: chunkAlongNoRest or chunkAlongWithRest
-- default: NoRest
ids
<-
mapM
insertMasterDocs
$
chunkAlong
10000
10000
docs
ids
<-
mapM
insertMasterDocs
$
chunkAlong
10000
10000
docs
-- User Flow
-- User Flow
...
@@ -95,8 +103,10 @@ flowCorpus userName ff fp corpusName = do
...
@@ -95,8 +103,10 @@ flowCorpus userName ff fp corpusName = do
-- User List Flow
-- User List Flow
(
_masterUserId
,
_masterRootId
,
masterCorpusId
)
<-
getOrMkRootWithCorpus
userMaster
""
(
_masterUserId
,
_masterRootId
,
masterCorpusId
)
<-
getOrMkRootWithCorpus
userMaster
""
-- /!\ this extract NgramsTerms Only
-- /!\ this extract NgramsTerms Only
_ngs
<-
sortTficf
<$>
getTficf'
userCorpusId
masterCorpusId
(
ngramsGroup
EN
2
)
_ngs
<-
toTermList
(
isStopTerm
.
fst
)
<$>
sortTficf
printDebug
"tficf size ngs"
(
length
_ngs
)
<$>
getTficf'
userCorpusId
masterCorpusId
(
ngramsGroup
EN
2
)
--printDebug "tficf size ngs" (take 100 $ ngs)
-- TODO getNgramsElement of NgramsType...
-- TODO getNgramsElement of NgramsType...
ngs
<-
getNgramsElementsWithParentNodeId
masterCorpusId
ngs
<-
getNgramsElementsWithParentNodeId
masterCorpusId
...
@@ -347,7 +357,28 @@ ngrams2list' m = fromListWith (<>)
...
@@ -347,7 +357,28 @@ ngrams2list' m = fromListWith (<>)
]
]
------------------------------------------------------------------------
toTermList
::
(
a
->
Bool
)
->
[
a
]
->
[(
ListType
,
a
)]
toTermList
stop
ns
=
map
(
toTermList'
stop
CandidateTerm
)
xs
<>
map
(
toTermList'
stop
GraphTerm
)
ys
<>
map
(
toTermList'
stop
CandidateTerm
)
zs
where
toTermList'
stop'
l
n
=
case
stop'
n
of
True
->
(
StopTerm
,
n
)
False
->
(
l
,
n
)
-- TODO use % of size of list
-- TODO user ML
xs
=
take
a
ns
ys
=
take
b
$
drop
a
xs
zs
=
drop
b
ys
a
=
100
b
=
1000
isStopTerm
::
Text
->
Bool
isStopTerm
x
=
Text
.
length
x
<
3
------------------------------------------------------------------------
------------------------------------------------------------------------
...
...
src/Gargantext/Database/Metrics/NgramsByNode.hs
View file @
c6b1adf0
...
@@ -53,8 +53,8 @@ ngramsGroup l n = Text.intercalate " "
...
@@ -53,8 +53,8 @@ ngramsGroup l n = Text.intercalate " "
sortTficf
::
(
Map
Text
(
Double
,
Set
Text
))
sortTficf
::
(
Map
Text
(
Double
,
Set
Text
))
->
[(
Double
,
Set
Text
)]
->
[(
Text
,
(
Double
,
Set
Text
)
)]
sortTficf
=
List
.
sortOn
fst
.
elems
sortTficf
=
List
.
sortOn
(
fst
.
snd
)
.
toList
getTficf'
::
UserCorpusId
->
MasterCorpusId
->
(
Text
->
Text
)
getTficf'
::
UserCorpusId
->
MasterCorpusId
->
(
Text
->
Text
)
...
...
src/Gargantext/Prelude.hs
View file @
c6b1adf0
...
@@ -130,25 +130,31 @@ type Step = Int
...
@@ -130,25 +130,31 @@ type Step = Int
-- if step == grain then linearity
-- if step == grain then linearity
-- elif step < grain then overlapping
-- elif step < grain then overlapping
-- else dotted with holes
-- else dotted with holes
-- TODO FIX BUG if Steps*Grain /= length l
chunkAlong
::
Eq
a
=>
Grain
->
Step
->
[
a
]
->
[[
a
]]
chunkAlong
::
Eq
a
=>
Grain
->
Step
->
[
a
]
->
[[
a
]]
chunkAlong
a
b
l
=
case
a
>
0
&&
b
>
0
of
chunkAlong
a
b
l
=
case
a
>
=
length
l
of
True
->
chunkAlong_
a
b
l
True
->
[
l
]
False
->
panic
"ChunkAlong: Parameters should be > 0 and Grain > Step"
False
->
chunkAlong'
a
b
l
chunkAlong_
::
Eq
a
=>
Int
->
Int
->
[
a
]
->
[[
a
]]
chunkAlong'
::
Eq
a
=>
Grain
->
Step
->
[
a
]
->
[[
a
]]
chunkAlong_
a
b
l
=
filter
(
/=
[]
)
$
only
(
while
dropAlong
)
chunkAlong'
a
b
l
=
case
a
>
0
&&
b
>
0
of
where
True
->
chunkAlong''
a
b
l
only
=
map
(
take
a
)
False
->
panic
"ChunkAlong: Parameters should be > 0 and Grain > Step"
while
=
takeWhile
(
\
x
->
length
x
>=
a
)
dropAlong
=
L
.
scanl
(
\
x
_y
->
drop
b
x
)
l
([
1
..
]
::
[
Integer
])
chunkAlong''
::
Eq
a
=>
Int
->
Int
->
[
a
]
->
[[
a
]]
chunkAlong''
a
b
l
=
filter
(
/=
[]
)
$
only
(
while
dropAlong
)
where
only
=
map
(
take
a
)
while
=
takeWhile
(
\
x
->
length
x
>=
a
)
dropAlong
=
L
.
scanl
(
\
x
_y
->
drop
b
x
)
l
([
1
..
]
::
[
Integer
])
-- | Optimized version (Vector)
-- | Optimized version (Vector)
chunkAlong
'
::
Int
->
Int
->
V
.
Vector
a
->
V
.
Vector
(
V
.
Vector
a
)
chunkAlong
V
::
Int
->
Int
->
V
.
Vector
a
->
V
.
Vector
(
V
.
Vector
a
)
chunkAlong
'
a
b
l
=
only
(
while
dropAlong
)
chunkAlong
V
a
b
l
=
only
(
while
dropAlong
)
where
where
only
=
V
.
map
(
V
.
take
a
)
only
=
V
.
map
(
V
.
take
a
)
while
=
V
.
takeWhile
(
\
x
->
V
.
length
x
>=
a
)
while
=
V
.
takeWhile
(
\
x
->
V
.
length
x
>=
a
)
dropAlong
=
V
.
scanl
(
\
x
_y
->
V
.
drop
b
x
)
l
(
V
.
fromList
[
1
..
])
dropAlong
=
V
.
scanl
(
\
x
_y
->
V
.
drop
b
x
)
l
(
V
.
fromList
[
1
..
])
-- | TODO Inverse of chunk ? unchunkAlong ?
-- | TODO Inverse of chunk ? unchunkAlong ?
-- unchunkAlong :: Int -> Int -> [[a]] -> [a]
-- unchunkAlong :: Int -> Int -> [[a]] -> [a]
...
...
src/Gargantext/Text.hs
View file @
c6b1adf0
...
@@ -37,10 +37,10 @@ sentences :: Text -> [Text]
...
@@ -37,10 +37,10 @@ sentences :: Text -> [Text]
sentences
txt
=
map
DT
.
pack
$
segment
$
DT
.
unpack
txt
sentences
txt
=
map
DT
.
pack
$
segment
$
DT
.
unpack
txt
sentences'
::
Text
->
[
Text
]
sentences'
::
Text
->
[
Text
]
sentences'
txt
=
split
isStop
txt
sentences'
txt
=
split
is
Char
Stop
txt
isStop
::
Char
->
Bool
is
Char
Stop
::
Char
->
Bool
isStop
c
=
c
`
elem
`
[
'.'
,
'?'
,
'!'
]
is
Char
Stop
c
=
c
`
elem
`
[
'.'
,
'?'
,
'!'
]
unsentences
::
[
Text
]
->
Text
unsentences
::
[
Text
]
->
Text
unsentences
txts
=
DT
.
intercalate
" "
txts
unsentences
txts
=
DT
.
intercalate
" "
txts
...
...
src/Gargantext/Text/Metrics/TFICF.hs
View file @
c6b1adf0
...
@@ -14,6 +14,7 @@ TFICF is a generalization of [TFIDF](https://en.wikipedia.org/wiki/Tf%E2%80%93id
...
@@ -14,6 +14,7 @@ TFICF is a generalization of [TFIDF](https://en.wikipedia.org/wiki/Tf%E2%80%93id
-}
-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
module
Gargantext.Text.Metrics.TFICF
(
TFICF
module
Gargantext.Text.Metrics.TFICF
(
TFICF
,
TficfContext
(
..
)
,
TficfContext
(
..
)
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment