Przemyslaw Kaminski / haskell-gargantext

Commit 07305554, authored Mar 10, 2019 by Alexandre Delanoë

[FIX] Split Corpus size at import.

Parent: b074d137

3 changed files with 16 additions and 12 deletions:

- src/Gargantext/Database/Flow.hs (+3 -8)
- src/Gargantext/Prelude.hs (+6 -1)
- src/Gargantext/Text/Metrics.hs (+7 -3)
src/Gargantext/Database/Flow.hs

```diff
@@ -87,8 +87,8 @@ flowCorpusSearchInDatabase :: FlowCmdM env ServantErr m
                            => Username -> Text -> m CorpusId
 flowCorpusSearchInDatabase u q = do
   (_masterUserId, _masterRootId, cId) <- getOrMkRootWithCorpus userMaster ""
-  ids <- chunkAlong 10000 10000 <$> map fst <$> searchInDatabase cId (stemIt q)
-  flowCorpusUser u q ids
+  ids <- map fst <$> searchInDatabase cId (stemIt q)
+  flowCorpusUser u q [ids]

 flowCorpusMaster :: FlowCmdM env ServantErr m => FileFormat -> FilePath -> m [[NodeId]]
@@ -99,12 +99,7 @@ flowCorpusMaster ff fp = do
   -- ChunkAlong needed for big corpora
   -- TODO add LANG as parameter
   -- TODO uniformize language of corpus
-  -- TODO ChunkAlong is not the right function here
-  -- chunkAlong 10 10 [1..15] == [1..10]
-  -- BUG: what about the rest of (divMod 15 10)?
-  -- TODO: chunkAlongNoRest or chunkAlongWithRest
-  -- default behavior: NoRest
-  ids <- mapM insertMasterDocs $ chunkAlong 10000 10000 docs
+  ids <- mapM insertMasterDocs $ splitEvery 10000 docs
   pure ids
```
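The new `flowCorpusMaster` body hands `insertMasterDocs` one batch of at most 10000 documents at a time and collects one `[NodeId]` per batch, which is why the result type is `m [[NodeId]]`. The sketch below models that shape purely, assuming `splitEvery n` cuts a list into consecutive chunks of `n` elements with a shorter final chunk. `splitEvery` is defined locally rather than imported, `fakeInsertBatch` is a hypothetical stand-in for `insertMasterDocs` that only hands out consecutive ids, and `mapAccumL` threads the next free id the way the real monadic `mapM` threads database state.

```haskell
import Data.List (mapAccumL)

-- Local definition (assumption): consecutive chunks of size n,
-- the last chunk may be shorter; nothing is dropped.
splitEvery :: Int -> [a] -> [[a]]
splitEvery n xs
  | n <= 0    = error "splitEvery: chunk size must be positive"
  | otherwise = go xs
  where
    go [] = []
    go ys = let (chunk, rest) = splitAt n ys in chunk : go rest

-- Hypothetical stand-in for insertMasterDocs: "inserts" a batch by giving
-- each document a consecutive id starting at `next`, and returns the next free id.
fakeInsertBatch :: Int -> [doc] -> (Int, [Int])
fakeInsertBatch next batch = (next + k, [next .. next + k - 1])
  where
    k = length batch

-- Insert all documents in batches of `size`, producing one id list per batch,
-- mirroring the m [[NodeId]] result of flowCorpusMaster.
insertInBatches :: Int -> [doc] -> [[Int]]
insertInBatches size docs = snd (mapAccumL fakeInsertBatch 1 (splitEvery size docs))
```

For example, `insertInBatches 10 (replicate 15 'x')` evaluates to `[[1,2,3,4,5,6,7,8,9,10],[11,12,13,14,15]]`: all 15 documents get an id, whereas the removed `chunkAlong 10000 10000` call kept only complete windows according to its own comment.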
src/Gargantext/Prelude.hs

```diff
@@ -128,10 +128,15 @@ type Grain = Int
 type Step  = Int

 -- | Function to split a range into chunks
--- if step == grain then linearity
+-- if step == grain then linearity (splitEvery)
 -- elif step < grain then overlapping
 -- else dotted with holes
 -- TODO FIX BUG if Steps*Grain /= length l
+-- chunkAlong 10 10 [1..15] == [1..10]
+-- BUG: what about the rest of (divMod 15 10)?
+-- TODO: chunkAlongNoRest or chunkAlongWithRest
+-- default behavior: NoRest
 chunkAlong :: Eq a => Grain -> Step -> [a] -> [[a]]
 chunkAlong a b l = case a >= length l of
   True -> [l]
```
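The `chunkAlong` comment describes windows of `grain` elements starting every `step` elements, and flags that the trailing `15 - 10` elements of `[1..15]` are dropped. Only the first case of the real implementation is visible in this diff, so the sketch below is not Gargantext's code: it is a hypothetical reimplementation of the two variants the TODO names, `chunkAlongNoRest` and `chunkAlongWithRest`, assuming a window starts at every multiple of `step` and that `step` is positive.

```haskell
-- Hypothetical sketch of the "NoRest" behaviour described in the comment:
-- only complete windows of `grain` elements, one starting every `step` elements.
chunkAlongNoRest :: Int -> Int -> [a] -> [[a]]
chunkAlongNoRest grain step l
  | step <= 0         = error "chunkAlongNoRest: step must be positive"
  | grain >= length l = [l]  -- mirrors the `True -> [l]` case in the diff
  | otherwise         = [ take grain (drop i l) | i <- [0, step .. length l - grain] ]

-- Hypothetical "WithRest" variant: same windows, but the leftover tail
-- is kept as a final, shorter chunk instead of being discarded.
chunkAlongWithRest :: Int -> Int -> [a] -> [[a]]
chunkAlongWithRest grain step l
  | step <= 0 = error "chunkAlongWithRest: step must be positive"
  | otherwise = go l
  where
    go [] = []
    go xs
      | length xs <= grain = [xs]
      | otherwise          = take grain xs : go (drop step xs)
```

Here `chunkAlongNoRest 10 10 [1..15]` gives `[[1,2,3,4,5,6,7,8,9,10]]`, reproducing the documented loss, while `chunkAlongWithRest 10 10 [1..15]` also returns `[11,12,13,14,15]`. When `grain == step`, the "WithRest" variant behaves like the `splitEvery` the updated comment mentions.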
src/Gargantext/Text/Metrics.hs

```diff
@@ -46,13 +46,16 @@ data FilterConfig = FilterConfig
                                  , fc_defaultValue :: DefaultValue
                                  }

 filterCooc :: (Show t, Ord t) => FilterConfig -> Map (t, t) Int -> Map (t, t) Int
 filterCooc fc cc = (filterCooc' fc) ts cc
   where
     ts = map _scored_terms $ takeSome fc $ coocScored cc

 filterCooc' :: (Show t, Ord t) => FilterConfig -> [t] -> Map (t, t) Int -> Map (t, t) Int
 filterCooc' (FilterConfig _ _ _ _ (DefaultValue dv)) ts m =
   -- trace ("coocScored " <> show ts) $
   foldl' (\m' k -> M.insert k (maybe dv identity $ M.lookup k m) m')
```
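In `filterCooc'`, the fold looks each key up in the original co-occurrence map `m` and falls back to the configured default `dv` when the pair is absent, so every key it visits ends up in the result. The diff cuts off the fold's starting map and key list; the sketch below assumes an empty starting map and an explicit list of keys, and the name `fillWithDefault` is hypothetical. It uses `Data.Map.Strict.findWithDefault`, which computes the same value as `maybe dv identity $ M.lookup k m`.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical helper: keep only the given keys, taking each value from `m`
-- and using the default `dv` for keys that `m` does not contain.
-- Assumption: the fold starts from an empty map.
fillWithDefault :: Ord k => v -> [k] -> M.Map k v -> M.Map k v
fillWithDefault dv keys m =
  foldl' (\acc k -> M.insert k (M.findWithDefault dv k m) acc) M.empty keys
```

With `m = M.fromList [(("a","b"),3), (("c","d"),7)]`, `fillWithDefault 0 [("a","b"), ("a","c")] m` keeps the known pair with its count, adds the missing pair with the default `0`, and drops `("c","d")` because it is not in the key list.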
```diff
@@ -64,7 +67,8 @@ filterCooc' (FilterConfig _ _ _ _ (DefaultValue dv)) ts m =
 -- Sample the main cluster ordered by specificity/genericity in (SampleBins::Double) parts
 -- each parts is then ordered by Inclusion/Exclusion
 -- take n scored terms in each parts where n * SampleBins = MapListSize.
 takeSome :: Ord t => FilterConfig -> [Scored t] -> [Scored t]
 takeSome (FilterConfig (MapListSize l) (InclusionSize l') (SampleBins s) (Clusters _) _) scores = L.take l
     $ takeSample n m
     $ L.take l'
```
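The comment above `takeSome` describes a stratified selection: order the candidates by specificity/genericity, cut them into `SampleBins` parts, rank each part by inclusion/exclusion, and take `n` terms per part so that `n * SampleBins = MapListSize`. The sketch below illustrates that description only and is not the `takeSample` called in the diff. `ScoredTerm`, its fields, `bins`, and `sampleByBins` are hypothetical; the bin count is an `Int` rather than the `Double` the comment mentions, and higher scores are assumed to rank first.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical stand-in for Gargantext's Scored type (field names invented here).
data ScoredTerm = ScoredTerm
  { stTerm        :: String
  , stSpecificity :: Double
  , stInclusion   :: Double
  } deriving (Show, Eq)

-- Split a list into at most k contiguous bins of (nearly) equal size.
bins :: Int -> [a] -> [[a]]
bins k xs
  | k <= 0    = error "bins: number of bins must be positive"
  | otherwise = go xs
  where
    size = max 1 (ceiling (fromIntegral (length xs) / fromIntegral k :: Double))
    go [] = []
    go ys = let (b, rest) = splitAt size ys in b : go rest

-- Order by specificity, cut into bins, rank each bin by inclusion,
-- take listSize `div` sampleBins terms per bin, and cap the total at listSize.
sampleByBins :: Int -> Int -> [ScoredTerm] -> [ScoredTerm]
sampleByBins listSize sampleBins scored
  | sampleBins <= 0 = error "sampleByBins: number of bins must be positive"
  | otherwise =
      take listSize
        . concatMap (take perBin . sortOn (Down . stInclusion))
        . bins sampleBins
        . sortOn (Down . stSpecificity)
        $ scored
  where
    perBin = listSize `div` sampleBins
```

Ranking inside each bin rather than over the whole list is what lets every specificity band contribute terms: with six terms and `sampleByBins 3 3`, each bin of two supplies its single highest-inclusion term.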