Julien Moutinho / haskell-gargantext
Commit a630946f, authored Jun 09, 2018 by Alexandre Delanoë
Merge branch 'pipeline'
Parents: 1ddff49f, 05848890
Showing 3 changed files with 17 additions and 8 deletions

package.yaml                      +1  -0
src/Gargantext/Text/Metrics.hs    +14 -8
stack.yaml                        +2  -0

package.yaml
@@ -68,6 +68,7 @@ library:
  - hlcm
  - ini
  - jose-jwt
  - kmeans-vector
  - lens
  - logging-effect
  - matrix

src/Gargantext/Text/Metrics.hs
@@ -24,15 +24,17 @@ module Gargantext.Text.Metrics
import Data.Text (Text, pack)
import Data.Map (Map)
import qualified Data.List as L
import qualified Data.Map as M
import qualified Data.Set as S
import qualified Data.Text as T
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as VU
import Data.Tuple.Extra (both)
--import GHC.Real (Ratio)
--import qualified Data.Text.Metrics as DTM
import Data.Array.Accelerate (toList)
import Math.KMeans (kmeans, euclidSq, elements)
import Gargantext.Prelude
@@ -61,17 +63,21 @@ import GHC.Real (round)
type ListSize = Int
type BinSize = Double
-- Map list creation
-- Kmean split into 2 main clusters with Inclusion/Exclusion (relevance score)
-- Sample the main cluster ordered by specificity/genericity in s parts
-- each parts is then ordered by Inclusion/Exclusion
-- take n scored terms in each parts where n * s = l
takeSome :: Ord t => ListSize -> BinSize -> [Scored t] -> [Scored t]
takeSome l s scores = L.take l
                    $ takeSample n m
                    $ takeKmeans l'
                    $ L.reverse $ L.sortOn _scored_incExc scores
                    $ splitKmeans 2 scores
  where
    -- TODO : KMEAN split into 2 main clusters
    -- (advice: use accelerate-example kmeans version
    -- and maybe benchmark it to be sure)
    takeKmeans = L.take
    l' = 4000
    -- (TODO: benchmark with accelerate-example kmeans version)
    splitKmeans x xs = elements
                     $ V.head
                     $ kmeans (\i -> VU.fromList ([(_scored_incExc i :: Double)])) euclidSq x xs
    n = round ((fromIntegral l)/s)
    m = round $ (fromIntegral $ length scores) / (s)
    takeSample n m xs = L.concat $ map (L.take n)
...
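
The sampling step described in the comment above `takeSome` (order by specificity/genericity, cut into `s` parts, order each part by inclusion/exclusion, keep `n` terms per part) can be sketched with `base` alone. This is only an illustration under stated assumptions, not the code of this commit: the body of `takeSample` is cut off in the hunk, so `sampleBins`, `binParams`, `chunksOf'`, the simplified `Scored` record and the field name `_scored_speGen` are all hypothetical. Only `_scored_incExc` and the formulas for `n` and `m` come from the diff.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-in for the project's Scored type. Only _scored_incExc
-- appears in the diff; the other field names are assumptions.
data Scored t = Scored
  { _scored_terms  :: t
  , _scored_incExc :: Double  -- inclusion/exclusion (relevance) score
  , _scored_speGen :: Double  -- specificity/genericity score (assumed name)
  } deriving (Show, Eq)

-- Split a list into consecutive chunks of at most k elements.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' k xs
  | k <= 0 || null xs = []
  | otherwise         = let (h, t) = splitAt k xs in h : chunksOf' k t

-- n and m as computed in the diff: l is the target list size, s the number
-- of parts, total the number of scored terms.
binParams :: Int -> Double -> Int -> (Int, Int)
binParams l s total =
  ( round (fromIntegral l / s)      -- n: terms kept per part
  , round (fromIntegral total / s)  -- m: terms per part before sampling
  )

-- Assumed reading of the comment: order by specificity/genericity, cut into
-- parts of m terms, order each part by inclusion/exclusion (highest first),
-- and keep the first n terms of each part.
sampleBins :: Int -> Int -> [Scored t] -> [Scored t]
sampleBins n m =
    concatMap (take n . sortOn (negate . _scored_incExc))
  . chunksOf' m
  . sortOn _scored_speGen
```

With `l = 6`, `s = 3` and 12 scored terms, `binParams 6 3 12` evaluates to `(2, 4)`: three parts of four terms each, two terms kept per part, so that `n * s = l` as the comment requires.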

stack.yaml
@@ -23,6 +23,8 @@ extra-deps:
  - fullstop-0.1.4
  - haskell-src-exts-1.18.2
  - http-types-0.12.1
  - kmeans-vector-0.3.2
  - probable-0.1.3
  - protolude-0.2
  - servant-0.13
  - servant-auth-0.3.0.1