Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
158
Issues
158
List
Board
Labels
Milestones
Merge Requests
11
Merge Requests
11
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
haskell-gargantext
Commits
7fe6bf9e
Commit
7fe6bf9e
authored
Jul 06, 2018
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
[FIX] types for cooc function before refactoring.
parent
affb8b80
Changes
3
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
56 additions
and
16 deletions
+56
-16
Main.hs
bin/gargantext-cli/Main.hs
+24
-1
Count.hs
src/Gargantext/Text/Metrics/Count.hs
+28
-15
Terms.hs
src/Gargantext/Text/Terms.hs
+4
-0
No files found.
bin/gargantext-cli/Main.hs
View file @
7fe6bf9e
...
...
@@ -22,6 +22,8 @@ Main specifications to index a corpus with a term list
module
Main
where
import
Data.Maybe
(
catMaybes
)
import
Data.Text
(
pack
)
import
qualified
Data.Vector
as
DV
import
qualified
Data.Maybe
as
DMaybe
...
...
@@ -44,6 +46,7 @@ import Gargantext.Prelude
import
Gargantext.Core
import
Gargantext.Core.Types
import
Gargantext.Text.Terms
import
Gargantext.Text.Context
import
Gargantext.Text.Terms.WithList
import
Gargantext.Text.Parsers.CSV
(
readCsv
,
csv_title
,
csv_abstract
,
csv_publication_year
)
import
Gargantext.Text.List.CSV
(
csvGraphTermList
)
...
...
@@ -78,7 +81,7 @@ filterTermsAndCooc patterns (year, ts) = do
pure
r
where
log
m
=
do
tid
<-
myThreadId
tid
<-
myThreadId
(
p
,
_
)
<-
threadCapability
tid
putStrLn
.
unwords
$
[
"filterTermsAndCooc:"
,
m
,
show
year
,
"on proc"
,
show
p
]
...
...
@@ -105,3 +108,23 @@ main = do
r
<-
mapConcurrently
(
filterTermsAndCooc
patterns
)
(
DM
.
toList
corpus
)
putStrLn
$
show
r
--writeFile outputFile cooc
testCooc
=
do
let
patterns
=
WithList
$
buildPatterns
testTermList
mapM
(
\
x
->
{-log "work" >>-}
terms
patterns
x
)
$
catMaybes
$
map
(
head
.
snd
)
testCorpus
--mapConcurrently (filterTermsAndCooc patterns) testCorpus
testCorpus
::
[(
Int
,
[
Text
])]
testCorpus
=
[
(
1998
,
[
pack
"The beees"
])
,
(
1999
,
[
pack
"The bees and the flowers"
--, pack "The bees and the flowers"
])
]
testTermList
::
TermList
testTermList
=
[
([
pack
"bee"
],
[[
pack
"bees"
]])
,
([
pack
"flower"
],
[[
pack
"flowers"
]])
]
src/Gargantext/Text/Metrics/Count.hs
View file @
7fe6bf9e
...
...
@@ -28,13 +28,13 @@ Source : https://en.wikipedia.org/wiki/Type%E2%80%93token_distinction#Occurrence
module
Gargantext.Text.Metrics.Count
where
import
Data.Text
(
Text
)
import
Control.Arrow
(
Arrow
(
..
),
(
***
))
import
qualified
Data.List
as
List
import
qualified
Data.Map.Strict
as
DMS
import
Data.Map.Strict
(
Map
,
empty
,
singleton
,
insertWith
,
unionWith
,
insertWith
,
unionWith
,
unionsWith
,
mapKeys
)
import
Data.Set
(
Set
)
...
...
@@ -69,8 +69,8 @@ type Grouped = Stems
----
-}
type
Occs
=
Int
type
Coocs
=
Int
type
Occs
=
Int
type
Coocs
=
Int
type
Threshold
=
Int
removeApax
::
Threshold
->
Map
(
Label
,
Label
)
Int
->
Map
(
Label
,
Label
)
Int
...
...
@@ -108,17 +108,30 @@ labelPolicy m g = case _terms_label <$> fst <$> maximumWith snd <$> DMS.toList
-}
coocOn
::
Ord
b
=>
(
a
->
b
)
->
[[
a
]]
->
Map
(
b
,
b
)
Coocs
coocOn
f
as
=
foldl'
(
\
a
b
->
DMS
.
unionWith
(
+
)
a
b
)
empty
$
map
(
coocOn'
f
)
as
coocOn
f
as
=
DMS
.
unionsWith
(
+
)
$
map
(
coocOn'
f
)
as
coocOn'
::
Ord
b
=>
(
a
->
b
)
->
[
a
]
->
Map
(
b
,
b
)
Coocs
coocOn'
fun
ts
=
DMS
.
fromListWith
(
+
)
xs
where
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
,
x
>=
y
]
coocOnContexts
::
(
a
->
[
Text
])
->
[[
a
]]
->
Map
([
Text
],
[
Text
])
Int
coocOnContexts
fun
=
DMS
.
fromListWith
(
+
)
.
List
.
concat
.
map
(
coocOnSingleContext
fun
)
coocOnSingleContext
::
(
a
->
[
Text
])
->
[
a
]
->
[(([
Text
],
[
Text
]),
Int
)]
coocOnSingleContext
fun
ts
=
xs
where
coocOn'
::
Ord
b
=>
(
a
->
b
)
->
[
a
]
->
Map
(
b
,
b
)
Coocs
coocOn'
fun
ts
=
foldl'
(
\
m
(
xy
,
c
)
->
insertWith
((
+
))
xy
c
m
)
empty
xs
where
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
-- , x /= y
]
ts'
=
List
.
nub
$
map
fun
ts
xs
=
[
((
x
,
y
),
1
)
|
x
<-
ts'
,
y
<-
ts'
,
x
>=
y
]
-- | Compute the grouped occurrences (occ)
...
...
@@ -131,6 +144,6 @@ occurrencesOn f = foldl' (\m a -> insertWith (unionWith (+)) (f a) (singleton a
-- TODO add groups and filter stops
sumOcc
::
Ord
a
=>
[
Occ
a
]
->
Occ
a
sumOcc
xs
=
foldl'
(
unionWith
(
+
))
empty
xs
sumOcc
xs
=
unionsWith
(
+
)
xs
src/Gargantext/Text/Terms.hs
View file @
7fe6bf9e
...
...
@@ -47,6 +47,10 @@ import Gargantext.Text.Terms.WithList (Patterns, extractTermsWithList)
data
TermType
lang
=
Mono
lang
|
Multi
lang
|
MonoMulti
lang
|
WithList
Patterns
group
::
[
Text
]
->
[
Text
]
group
=
undefined
-- remove Stop Words
-- map (filter (\t -> not . elem t)) $
------------------------------------------------------------------------
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment