Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
199
Issues
199
List
Board
Labels
Milestones
Merge Requests
12
Merge Requests
12
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
304c9011
Unverified
Commit
304c9011
authored
May 23, 2019
by
Nicolas Pouillard
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Eleve refactor
parent
a6cc86a9
Pipeline
#405
canceled with stage
Changes
1
Pipelines
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
77 additions
and
63 deletions
+77
-63
Eleve.hs
src/Gargantext/Text/Eleve.hs
+77
-63
No files found.
src/Gargantext/Text/Eleve.hs
View file @
304c9011
...
...
@@ -10,89 +10,103 @@ module Gargantext.Text.Eleve where
import
Data.Ord
(
Ord
)
import
qualified
Data.List
as
L
ist
import
qualified
Data.List
as
L
import
Data.Monoid
import
Data.Text
hiding
(
map
)
import
Data.Text
(
Text
)
import
qualified
Data.Text
as
T
import
Data.Map
(
Map
)
import
Data.Maybe
(
fromMaybe
)
import
qualified
Data.Map
as
Map
import
Gargantext.Prelude
-- prop (Noeud c _e f) = c == Map.size f
-- TODO remove Feuille
-- prop (Node c _e f) = c == Map.size f
-- TODO remove Leaf
-- NP: I think Leaf is an optimisation (less data, a tiny bit more code and time)
example
::
[[
T
erminal
]]
example
=
map
t
erminal
example
::
[[
T
oken
]]
example
=
map
t
oken
$
chunkAlong
3
1
$
words
"New York and New York is a big apple"
$
T
.
words
"New York and New York is a big apple"
data
T
erminal
=
Terminal
Text
|
Fin
data
T
oken
=
Non
Terminal
Text
|
Fin
deriving
(
Ord
,
Eq
,
Show
)
isFin
::
T
erminal
->
Bool
isFin
::
T
oken
->
Bool
isFin
x
=
case
x
of
Fin
->
True
_
->
False
terminal
::
[
Text
]
->
[
Terminal
]
terminal
xs
=
(
map
Terminal
xs
)
<>
[
Fin
]
data
Arbre
k
e
=
Noeud
{
_noeud_count
::
Double
,
_noeud_entropy
::
e
,
_noeud_fils
::
Map
k
(
Arbre
k
e
)
}
|
Feuille
{
_noeud_count
::
Double
}
deriving
(
Show
)
arbreVide
::
Arbre
k
e
arbreVide
=
Feuille
0
mkArbre
::
Monoid
e
=>
Double
->
Map
Terminal
(
Arbre
Terminal
e
)
->
Arbre
Terminal
e
mkArbre
c
fils
|
Map
.
null
fils
=
Feuille
c
|
otherwise
=
Noeud
c
mempty
fils
insertArbre
::
[
Terminal
]
->
Arbre
Terminal
()
->
Arbre
Terminal
()
insertArbre
[]
n
=
n
insertArbre
(
x
:
xs
)
(
Feuille
c
)
=
mkArbre
(
c
+
1
)
(
Map
.
singleton
x
$
insertArbre
xs
arbreVide
)
insertArbre
(
x
:
xs
)
(
Noeud
c
_e
f
)
=
mkArbre
(
c
+
1
)
(
case
Map
.
lookup
x
f
of
Nothing
->
Map
.
insert
x
(
insertArbre
xs
arbreVide
)
f
Just
arbre
->
Map
.
insert
x
(
insertArbre
xs
arbre
)
f
)
Fin
->
True
_
->
False
token
::
[
Text
]
->
[
Token
]
token
xs
=
(
NonTerminal
<$>
xs
)
<>
[
Fin
]
data
Trie
k
e
=
Node
{
_node_count
::
Int
,
_node_entropy
::
e
,
_node_children
::
Map
k
(
Trie
k
e
)
}
-- | Leaf { _node_count :: Int }
deriving
(
Show
)
-- emptyTrie :: Trie k e
-- emptyTrie = Leaf 0
emptyTrie
::
(
Ord
k
,
Monoid
e
)
=>
Trie
k
e
emptyTrie
=
Node
0
mempty
mempty
mkTrie
::
Monoid
e
=>
Int
->
Map
k
(
Trie
k
e
)
->
Trie
k
e
mkTrie
c
children
{-| Map.null children = Leaf c
| otherwise -}
=
Node
c
mempty
children
insertTrie
::
Ord
k
=>
[
k
]
->
Trie
k
()
->
Trie
k
()
insertTrie
[]
n
=
n
-- insertTrie (x:xs) (Leaf c) = mkTrie (c+1) (Map.singleton x $ insertTrie xs emptyTrie)
insertTrie
(
x
:
xs
)
(
Node
c
_e
children
)
=
mkTrie
(
c
+
1
)
$
Map
.
alter
f
x
children
where
f
=
Just
.
insertTrie
xs
.
fromMaybe
emptyTrie
insert
Arbres
::
[[
Terminal
]]
->
Arbre
Terminal
()
insert
Arbres
=
List
.
foldr
insertArbre
arbreVid
e
insert
Tries
::
Ord
k
=>
[[
k
]]
->
Trie
k
()
insert
Tries
=
L
.
foldr
insertTrie
emptyTri
e
entropy
Arbre
::
Arbre
Terminal
()
->
Arbre
Terminal
Double
entropyArbre
(
Feuille
c
)
=
Feuille
c
entropy
Arbre
(
Noeud
c
_e
fils
)
=
(
Noeud
c
e
(
map
entropyArbre
fils
)
)
entropy
Trie
::
(
k
->
Bool
)
->
Trie
k
()
->
Trie
k
Double
-- entropyTrie _ (Leaf c) = Leaf
c
entropy
Trie
pred
(
Node
c
_e
children
)
=
Node
c
e
(
entropyTrie
pred
<$>
children
)
where
e
=
sum
$
map
(
\
(
k
,
f
)
->
case
isFin
k
of
True
->
(
_noeud_count
f
)
/
c
*
log
c
False
->
-
c'
*
log
c'
where
c'
=
(
_noeud_count
f
)
/
c
)
$
Map
.
toList
fils
normalizeArbre
::
Arbre
Terminal
Double
->
Arbre
Terminal
Double
normalizeArbre
(
Feuille
c
)
=
Feuille
c
normalizeArbre
(
Noeud
c
e
f
)
=
Noeud
c
e
(
Map
.
map
(
\
a
->
normalizeLevel
a
$
Map
.
elems
f
)
f
)
normalizeLevel
::
Arbre
Terminal
Double
->
[
Arbre
Terminal
Double
]
->
Arbre
Terminal
Double
normalizeLevel
(
Feuille
c
)
_
=
Feuille
c
normalizeLevel
(
Noeud
c
e
f
)
ns
=
Noeud
c
(
(
e
-
m
)
/
v
)
f
e
=
sum
$
f
<$>
Map
.
toList
children
f
(
k
,
child
)
=
if
pred
k
then
cfc
*
log
(
fromIntegral
c
)
else
-
cfc
*
log
cfc
where
cfc
=
fromIntegral
(
_node_count
child
)
/
fromIntegral
c
normalizeEntropy
::
Trie
k
Double
->
Trie
k
Double
-- normalizeEntropy (Leaf c) = Leaf c
normalizeEntropy
(
Node
c
e
children
)
=
Node
c
e
$
normalizeLevel
m
v
.
normalizeEntropy
<$>
children
where
es
=
map
_noeud_entropy
ns
es
=
_node_entropy
<$>
Map
.
elems
children
m
=
mean
es
v
=
variance
es
buildArbre
::
[[
Terminal
]]
->
Arbre
Terminal
Double
buildArbre
=
normalizeArbre
.
entropyArbre
.
insertArbres
normalizeLevel
::
Double
->
Double
->
Trie
k
Double
->
Trie
k
Double
-- normalizeLevel _ _ (Leaf c) = Leaf c
normalizeLevel
m
v
(
Node
c
e
children
)
=
Node
c
((
e
-
m
)
/
v
)
children
buildTrie
::
[[
Token
]]
->
Trie
Token
Double
buildTrie
=
normalizeEntropy
.
entropyTrie
isFin
.
insertTries
subForest
::
Trie
k
e
->
[
Trie
k
e
]
-- subForest (Leaf _) = []
subForest
(
Node
_
_
children
)
=
Map
.
elems
children
levels
::
Trie
k
e
->
[[
Trie
k
e
]]
levels
=
L
.
takeWhile
(
not
.
L
.
null
)
.
L
.
iterate
(
L
.
concatMap
subForest
)
.
pure
entropyLevels
::
Trie
k
e
->
[[
e
]]
entropyLevels
=
fmap
(
fmap
_node_entropy
)
.
levels
normalizeEntropy'
::
[[
Double
]]
->
Trie
k
Double
->
Trie
k
Double
normalizeEntropy'
[]
_
=
panic
"normalizeEntropy' empty levels"
-- normalizeEntropy' _ (Leaf c) = Leaf c
normalizeEntropy'
(
es
:
ess
)
(
Node
c
e
children
)
=
Node
c
e
(
normalizeLevel
m
v
.
normalizeEntropy'
ess
<$>
children
)
where
m
=
mean
es
v
=
variance
es
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment