Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
147
Issues
147
List
Board
Labels
Milestones
Merge Requests
6
Merge Requests
6
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
7fe6bf9e
Commit
7fe6bf9e
authored
Jul 06, 2018
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[FIX] types for cooc function before refactoring.
parent
affb8b80
Changes
3
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
56 additions
and
16 deletions
+56
-16
Main.hs
bin/gargantext-cli/Main.hs
+24
-1
Count.hs
src/Gargantext/Text/Metrics/Count.hs
+28
-15
Terms.hs
src/Gargantext/Text/Terms.hs
+4
-0
No files found.
bin/gargantext-cli/Main.hs
View file @
7fe6bf9e
...
...
@@ -22,6 +22,8 @@ Main specifications to index a corpus with a term list
module
Main
where
import
Data.Maybe
(
catMaybes
)
import
Data.Text
(
pack
)
import
qualified
Data.Vector
as
DV
import
qualified
Data.Maybe
as
DMaybe
...
...
@@ -44,6 +46,7 @@ import Gargantext.Prelude
import
Gargantext.Core
import
Gargantext.Core.Types
import
Gargantext.Text.Terms
import
Gargantext.Text.Context
import
Gargantext.Text.Terms.WithList
import
Gargantext.Text.Parsers.CSV
(
readCsv
,
csv_title
,
csv_abstract
,
csv_publication_year
)
import
Gargantext.Text.List.CSV
(
csvGraphTermList
)
...
...
@@ -78,7 +81,7 @@ filterTermsAndCooc patterns (year, ts) = do
pure
r
where
log
m
=
do
tid
<-
myThreadId
tid
<-
myThreadId
(
p
,
_
)
<-
threadCapability
tid
putStrLn
.
unwords
$
[
"filterTermsAndCooc:"
,
m
,
show
year
,
"on proc"
,
show
p
]
...
...
@@ -105,3 +108,23 @@ main = do
r
<-
mapConcurrently
(
filterTermsAndCooc
patterns
)
(
DM
.
toList
corpus
)
putStrLn
$
show
r
--writeFile outputFile cooc
testCooc
=
do
let
patterns
=
WithList
$
buildPatterns
testTermList
mapM
(
\
x
->
{-log "work" >>-}
terms
patterns
x
)
$
catMaybes
$
map
(
head
.
snd
)
testCorpus
--mapConcurrently (filterTermsAndCooc patterns) testCorpus
testCorpus
::
[(
Int
,
[
Text
])]
testCorpus
=
[
(
1998
,
[
pack
"The beees"
])
,
(
1999
,
[
pack
"The bees and the flowers"
--, pack "The bees and the flowers"
])
]
testTermList
::
TermList
testTermList
=
[
([
pack
"bee"
],
[[
pack
"bees"
]])
,
([
pack
"flower"
],
[[
pack
"flowers"
]])
]
src/Gargantext/Text/Metrics/Count.hs
View file @
7fe6bf9e
...
...
@@ -28,13 +28,13 @@ Source : https://en.wikipedia.org/wiki/Type%E2%80%93token_distinction#Occurrence
module
Gargantext.Text.Metrics.Count
where
import
Data.Text
(
Text
)
import
Control.Arrow
(
Arrow
(
..
),
(
***
))
import
qualified
Data.List
as
List
import
qualified
Data.Map.Strict
as
DMS
import
Data.Map.Strict
(
Map
,
empty
,
singleton
,
insertWith
,
unionWith
,
insertWith
,
unionWith
,
unionsWith
,
mapKeys
)
import
Data.Set
(
Set
)
...
...
@@ -69,8 +69,8 @@ type Grouped = Stems
----
-}
type
Occs
=
Int
type
Coocs
=
Int
type
Occs
=
Int
type
Coocs
=
Int
type
Threshold
=
Int
removeApax
::
Threshold
->
Map
(
Label
,
Label
)
Int
->
Map
(
Label
,
Label
)
Int
...
...
@@ -108,17 +108,30 @@ labelPolicy m g = case _terms_label <$> fst <$> maximumWith snd <$> DMS.toList
-}
coocOn
::
Ord
b
=>
(
a
->
b
)
->
[[
a
]]
->
Map
(
b
,
b
)
Coocs
coocOn
f
as
=
foldl'
(
\
a
b
->
DMS
.
unionWith
(
+
)
a
b
)
empty
$
map
(
coocOn'
f
)
as
coocOn
f
as
=
DMS
.
unionsWith
(
+
)
$
map
(
coocOn'
f
)
as
coocOn'
::
Ord
b
=>
(
a
->
b
)
->
[
a
]
->
Map
(
b
,
b
)
Coocs
coocOn'
fun
ts
=
DMS
.
fromListWith
(
+
)
xs
where
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
,
x
>=
y
]
coocOnContexts
::
(
a
->
[
Text
])
->
[[
a
]]
->
Map
([
Text
],
[
Text
])
Int
coocOnContexts
fun
=
DMS
.
fromListWith
(
+
)
.
List
.
concat
.
map
(
coocOnSingleContext
fun
)
coocOnSingleContext
::
(
a
->
[
Text
])
->
[
a
]
->
[(([
Text
],
[
Text
]),
Int
)]
coocOnSingleContext
fun
ts
=
xs
where
coocOn'
::
Ord
b
=>
(
a
->
b
)
->
[
a
]
->
Map
(
b
,
b
)
Coocs
coocOn'
fun
ts
=
foldl'
(
\
m
(
xy
,
c
)
->
insertWith
((
+
))
xy
c
m
)
empty
xs
where
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
-- , x /= y
]
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
,
x
>=
y
]
-- | Compute the grouped occurrences (occ)
...
...
@@ -131,6 +144,6 @@ occurrencesOn f = foldl' (\m a -> insertWith (unionWith (+)) (f a) (singleton a
-- TODO add groups and filter stops
sumOcc
::
Ord
a
=>
[
Occ
a
]
->
Occ
a
sumOcc
xs
=
foldl'
(
unionWith
(
+
))
empty
xs
sumOcc
xs
=
unionsWith
(
+
)
xs
src/Gargantext/Text/Terms.hs
View file @
7fe6bf9e
...
...
@@ -47,6 +47,10 @@ import Gargantext.Text.Terms.WithList (Patterns, extractTermsWithList)
data
TermType
lang
=
Mono
lang
|
Multi
lang
|
MonoMulti
lang
|
WithList
Patterns
group
::
[
Text
]
->
[
Text
]
group
=
undefined
-- remove Stop Words
-- map (filter (\t -> not . elem t)) $
------------------------------------------------------------------------
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment