Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
haskell-gargantext
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Christian Merten
haskell-gargantext
Commits
0405e007
Verified
Commit
0405e007
authored
Feb 27, 2024
by
Przemyslaw Kaminski
Browse files
Options
Browse Files
Download
Plain Diff
Merge branch 'dev' into 224-dev-uniform-ngrams-creation
parents
dc21080f
fe201115
Changes
19
Hide whitespace changes
Inline
Side-by-side
Showing
19 changed files
with
1334 additions
and
271 deletions
+1334
-271
.gitignore
.gitignore
+6
-1
.gitlab-ci.yml
.gitlab-ci.yml
+2
-2
CHANGELOG.md
CHANGELOG.md
+6
-0
README.md
README.md
+1
-1
GarganText_DocsList-nodeId-185487.csv
bench-data/phylo/GarganText_DocsList-nodeId-185487.csv
+202
-0
GarganText_NgramsList-185488.csv
bench-data/phylo/GarganText_NgramsList-185488.csv
+583
-0
bpa-config.json
bench-data/phylo/bpa-config.json
+68
-0
Main.hs
bin/gargantext-phylo-profile/Main.hs
+112
-0
Main.hs
bin/gargantext-phylo/Main.hs
+2
-205
Common.hs
bin/gargantext-phylo/Phylo/Common.hs
+229
-0
update-project-dependencies
bin/update-project-dependencies
+2
-2
gargantext.cabal
gargantext.cabal
+55
-3
CSV.hs
src/Gargantext/Core/Text/List/Formats/CSV.hs
+5
-4
Tools.hs
src/Gargantext/Core/Viz/Phylo/API/Tools.hs
+1
-1
PhyloExport.hs
src/Gargantext/Core/Viz/Phylo/PhyloExport.hs
+4
-4
PhyloMaker.hs
src/Gargantext/Core/Viz/Phylo/PhyloMaker.hs
+33
-35
PhyloTools.hs
src/Gargantext/Core/Viz/Phylo/PhyloTools.hs
+21
-12
TemporalMatching.hs
src/Gargantext/Core/Viz/Phylo/TemporalMatching.hs
+1
-1
stack.yaml
stack.yaml
+1
-0
No files found.
.gitignore
View file @
0405e007
...
@@ -4,6 +4,10 @@
...
@@ -4,6 +4,10 @@
# Profiling
# Profiling
*.prof
*.prof
*.prof.html
*.hp
*.eventlog
*.eventlog.html
profiling
profiling
# Stack
# Stack
...
@@ -39,4 +43,5 @@ data
...
@@ -39,4 +43,5 @@ data
devops/docker/js-cache
devops/docker/js-cache
cabal.project.local
cabal.project.local
\ No newline at end of file
gargantext_profile_out.dot
.gitlab-ci.yml
View file @
0405e007
...
@@ -50,7 +50,7 @@ bench:
...
@@ -50,7 +50,7 @@ bench:
-
.cabal/
-
.cabal/
policy
:
pull-push
policy
:
pull-push
script
:
script
:
-
nix-shell --run "./bin/update-project-dependencies $CABAL_STORE_DIR && cabal --store-dir=$CABAL_STORE_DIR v2-bench --ghc-options='-O2 -fclear-plugins'"
-
nix-shell --run "./bin/update-project-dependencies $CABAL_STORE_DIR && cabal --store-dir=$CABAL_STORE_DIR v2-bench --
flags +no-phylo-debug-logs --
ghc-options='-O2 -fclear-plugins'"
allow_failure
:
true
allow_failure
:
true
test
:
test
:
...
@@ -84,7 +84,7 @@ test:
...
@@ -84,7 +84,7 @@ test:
mkdir -p /builds/gargantext/haskell-gargantext/devops/coreNLP/stanford-corenlp-current
mkdir -p /builds/gargantext/haskell-gargantext/devops/coreNLP/stanford-corenlp-current
cp -R /root/devops/coreNLP/stanford-corenlp-${CORENLP}/* /builds/gargantext/haskell-gargantext/devops/coreNLP/stanford-corenlp-current/
cp -R /root/devops/coreNLP/stanford-corenlp-${CORENLP}/* /builds/gargantext/haskell-gargantext/devops/coreNLP/stanford-corenlp-current/
nix-shell --run "chown -R test:test /root/.config/ && su -m test -c \"export PATH=$PATH:$TEST_NIX_PATH && cd /builds/gargantext/haskell-gargantext; $CABAL --store-dir=$CABAL_STORE_DIR v2-test --test-show-details=streaming --flags
test-crypto
--ghc-options='-O0 -fclear-plugins'\""
nix-shell --run "chown -R test:test /root/.config/ && su -m test -c \"export PATH=$PATH:$TEST_NIX_PATH && cd /builds/gargantext/haskell-gargantext; $CABAL --store-dir=$CABAL_STORE_DIR v2-test --test-show-details=streaming --flags
'test-crypto no-phylo-debug-logs'
--ghc-options='-O0 -fclear-plugins'\""
chown -R root:root dist-newstyle/
chown -R root:root dist-newstyle/
chown -R root:root /root/
chown -R root:root /root/
chown -R root:root $CABAL_STORE_DIR
chown -R root:root $CABAL_STORE_DIR
...
...
CHANGELOG.md
View file @
0405e007
## Version 0.0.6.9.9.9.6.2 [Release Candidate for 007]
*
[
BACK
][
FIX
][
Node stories insertion error (SqlError violates foreign key constraint) (#303)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/303
)
*
[
BACK
][
DOC
][
Welcome: Door To enter the project (#177)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/177
)
*
[
BACK
][
OPTIM
][
Improve Phylo robustness and performance (#292)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/292
)
## Version 0.0.6.9.9.9.6.1
## Version 0.0.6.9.9.9.6.1
*
[
BACK
][
FEAT
][
Removing Order2_A and Order2_B and use Order2 only instead (#308)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/308
)
*
[
BACK
][
FEAT
][
Removing Order2_A and Order2_B and use Order2 only instead (#308)
](
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/308
)
...
...
README.md
View file @
0405e007
...
@@ -199,7 +199,7 @@ See https://www.haskell.org/ghcup/guide/#hls for more details.
...
@@ -199,7 +199,7 @@ See https://www.haskell.org/ghcup/guide/#hls for more details.
```
sh
```
sh
# If docker is not installed:
# If docker is not installed:
# curl -sSL https://gitlab.iscpif.fr/gargantext/haskell-gargantext/raw/dev/devops/docker/
docker-install
| sh
# curl -sSL https://gitlab.iscpif.fr/gargantext/haskell-gargantext/raw/dev/devops/docker/
install_docker
| sh
cd devops/docker
cd devops/docker
docker compose up
docker compose up
```
```
...
...
bench-data/phylo/GarganText_DocsList-nodeId-185487.csv
0 → 100644
View file @
0405e007
This source diff could not be displayed because it is too large. You can
view the blob
instead.
bench-data/phylo/GarganText_NgramsList-185488.csv
0 → 100644
View file @
0405e007
status label forms
map %
map % ci
candidate 17α-ethinylestradiol
map 17β-estradiol
candidate 2-ethylhexyl
candidate 4-tert-octylphenol
candidate absence
candidate action
candidate activation
candidate activities activity
candidate activity
map addition
map additional studies
map adsorption
map adult zebrafish
candidate adults
map adverse effect
map adverse effects adverse effect
candidate adverse health effects
map aim
map aim was
candidate algae
map alteration
map alterations alteration
candidate alternatives
map analogs
candidate analogues
map analysis
map androgen receptor
map animal animals
map animal models
map animals
map apoptosis
candidate application applications
candidate applications
candidate aquatic ecosystems
map aquatic environment
candidate aquatic plants
candidate ar
candidate area
candidate areas area
map article is protected
candidate assay
candidate assays assay
candidate association
candidate associations association
map balloon pulmonary angioplasty
candidate basis
map better understanding
candidate beverages
candidate binding affinity
map bioaccumulation potential
candidate biological effects
map birth outcomes
map birth weight
map bisphenol bisphenols
map bisphenol a
map bisphenol a action
map bisphenol a analogues
candidate bisphenol a exposure
candidate bisphenol a.
map bisphenol af
map bisphenol analogues
map bisphenol b
candidate bisphenol compounds
map bisphenol f
map bisphenol s bisphenol-s
candidate bisphenol-a
map bisphenol-s
map bisphenols
candidate bnct
map body weight
map boron neutron capture therapy
map bp bps
map bpa
candidate bpa analogues
map bpa caused
map bpa degradation
candidate bpa did
map bpa disrupts
map bpa exposure
map bpa exposure was
candidate bpa group
candidate bpa had
map bpa increased
map bpa is
candidate bpa leads
candidate bpa replacements
candidate bpa stimulated
map bpa treatment
map bpa was bpa is
candidate bpa was determined
candidate bpa were found
candidate bpa-bnct
candidate bpa-exposed rats
map bpaf
map bpaf was
map bpb
map bpf
map bps
map breast cancer
map c-bisphenol a
candidate caffeine
candidate carbon nanotubes
map case
map cash register receipts
candidate cat
candidate catalase
candidate cck-8
candidate cd
map cell counting kit-8
map cell proliferation
map cell viability
candidate changes
map chemical chemicals
candidate chemical analysis
map chemical exposure
map chemical oxygen demand
map chemicals
candidate chemicals were
candidate children
candidate china
candidate chromosomes
map chronic exposure
map chronic thromboembolic pulmonary hypertension
candidate cm
candidate co
candidate cod
map coexistence
map color developer
candidate combination
candidate common sources
candidate comparison
candidate complications
map compounds
map concentration
map concentration-dependent manner
map concentrations concentration
map concern
map concerns concern
map conclusion
candidate consumer products
candidate contamination
candidate contrast
map control
map control group control groups
map control groups
map controls control
map copyright
candidate cr
map critical period
map critical windows
candidate cross-sectional study
candidate crucial role
candidate cteph
candidate cu
candidate current knowledge
map cyp11a1
map danio rerio
map data
map day
map days day
map dbp
candidate decrease decreases
candidate decreases
candidate degradation
map degradation rate
candidate dehp
candidate deleterious effects
map delivery
map dermal absorption
map dermal exposure
candidate desorption
map desorption time
map detection
candidate detection limit
map determination
map development
map developmental periods
candidate diabetes
map dibutyl phthalate
candidate dietary exposure
candidate different effects
map different products
candidate different regions
map distribution distributions
map distributions
candidate dna damage
map dna methylation
candidate dose
map dose-limiting tissue
candidate doses dose
map drinking water
map e1
map e2
map ecological and human health risk
map edc
map edcs edc
candidate ee2
map effect
map effects effect
map effects are effects were
map effects were
candidate efsa
candidate embryos
map emt
map emt process
candidate endocrine disrupter
candidate endocrine disruption
map endocrine disruptor
map endocrine disruptors endocrine disruptor
map endocrine system
candidate endocrine-disrupting chemical
candidate environment
candidate environmental behaviors
candidate environmental contaminant
candidate environmental endocrine disruptors
map environmental exposure
map environmental remediation
map environmental toxicants
candidate epidemiological studies
map epoxy resins
map er
map er-dependent pathway
candidate erk1/2
map ers er
map estriol
candidate estrogen
map estrogen receptor
map estrogen receptors estrogen receptor
map estrogenic activity
candidate estrogenic properties
candidate estrogens estrogen
map estrone
candidate europe
map european food safety authority
candidate evaluation
candidate evidence
candidate evidence indicates
candidate experimental data
map exposure
map exposure was
map exposures exposure
map expression expressions
map expressions
candidate extent
map extraction
candidate fact
map females
map fetal development
map findings indicate
candidate findings suggest
map first time
candidate fish
map flow rate
map food foods
candidate food packaging
map foods
candidate formation
map fourier transform infrared spectroscopy
candidate freshwater
map freundlich model
candidate ftir
candidate function
map g creatinine
candidate g protein-coupled estrogen receptor
map gene expression
map general population
candidate gestation
candidate glucose homeostasis
map glutathione peroxidase
candidate good linear relationship
map group
map groups group
candidate growth
candidate h
map head circumference
map health effects
candidate high expression
candidate high risk
candidate high sensitivity
candidate high-resolution transmission electron microscopy
map higher bpa levels
candidate higher concentrations
map higher rates
map hippocampus
candidate hong kong
map hormonal regulation
candidate hours
candidate hplc-uv
candidate hrtem
map human exposure
map human health
candidate human plasma
candidate humans
map humans are
map humic acid
map humic acids humic acid
map hydroxyl radicals
candidate hypothalamus
candidate ici
candidate il-6
candidate immune system
candidate impact
candidate impacts impact
candidate impairment
map implantation failure
candidate important role
candidate increase
map induction
candidate influence
candidate inhibition
map initial concentration
map insulin resistance
candidate interaction interactions
candidate interactions
candidate involvement
map john wiley
map juvenile sprague-dawley rats
candidate key role
candidate kg
candidate kinetics
map l
candidate lack
candidate learning
candidate length
map level
map levels level
map limit
candidate limited number
map limits limit
candidate liver
map lod
map lods lod
candidate loq
map low concentrations
candidate low-dose bisphenol a
map low-dose bpa
map low-dose exposure
map ltd.
map major source
map male
map male mice
map male offspring
map males male
candidate manufacture
candidate mapk
candidate mass spectrometry
candidate maternal diet
map matrix effect
map mcf-7 breast cancer cells
map mcf-7 cells
candidate mechanism
candidate mechanisms mechanism
candidate method
map method was
map method was applied
candidate methods method
map mg g
candidate mg kg
candidate mg l
candidate mg/kg
candidate mg/l
candidate mice
map migration
candidate min
map mitogen-activated protein kinase
candidate ml
map mobile phase
candidate modulation
map molecular docking
candidate molecular interactions
candidate motility
map mrna expression
map myelin basic protein
map n
candidate naphthalene
candidate natural waters
map negative effect
candidate neuroendocrine disruption
map new mechanism
candidate ng/l
candidate ni
map nonylphenol
candidate np
candidate number
candidate obesity
candidate objective
candidate objective was
candidate occupational exposure
map occurrence
candidate onset
candidate order
map organic contaminants
map organic pollutants
candidate organisms
map ovarian reserve
map oxidative stress
candidate p
candidate papillary thyroid carcinoma
candidate parabens
candidate part
candidate participants
candidate pathophysiology
candidate pb
candidate pergafast
map perinatal exposure
candidate pfoa
candidate pfos
candidate ph
candidate phenol phenols
candidate phenolic compounds
candidate phenols
candidate phosphate
candidate phosphorylation
map photocatalytic mechanism
map photocatalytic properties
candidate photoluminescence
map phthalate
map phthalates phthalate
candidate physical activity
map physicochemical properties
candidate pl
map plastic industry
map plastic production
candidate plastics
candidate pnd
candidate pnd90
candidate pollutants
map polycarbonate plastic
map polycarbonate plastics polycarbonate plastic
map positive effect
candidate possible effects
map postnatal day
candidate potential
candidate potential applications
candidate potential effects
candidate potential mechanism
candidate potential risk
map potential sources
map pregnancy
candidate preliminary results
candidate prenatal bisphenol a
map prenatal exposure
map presence
map present study
map present study is present study was
map present study was
candidate previous study
map pro-inflammatory cytokines
candidate production
candidate products
candidate proliferation
map protein expression
map protein levels
candidate pseudo-second-order kinetic model
map pubertal development
candidate puberty
candidate quantification
map r
map range
map ranges range
map rat offspring
map reactive oxygen species
candidate regulation
map relation
map relative standard deviations
candidate release
candidate relevant levels
map removal
map removal efficiencies
map removal efficiency removal efficiencies
candidate reproductive development
map reproductive function
map reproductive hormones
candidate response
map result
map results result
candidate results demonstrated
map results indicate
map results indicated results indicate
candidate results revealed
map results show
map results showed results show
map results suggest
candidate review
map rights reserved
map risk
map risk assessment
map risks risk
map role
map ros
candidate s/n
map sample samples
map samples
candidate samples were
map sampling sites showed
candidate scanning electron microscope
map scanning electron microscopy
map sediment
map sediments sediment
candidate sem
map sensitivities
map sensitivity sensitivities
candidate series
candidate serum
map signal-to-noise ratio
map significant decrease
candidate significant differences
map significant increase
map simultaneous determination
map simultaneous removal
candidate soil
map sons
map sorbent
map source
map sources source
map spatial distribution
map spatial memory
candidate species
map stem cells
map structural analogues
candidate structure
map studies study
candidate studies showed
map study
map study aims
map study evaluated
map study investigated
candidate study suggests
map study was
map superoxide dismutase
candidate surface
map surface water surface waters
map surface waters
candidate susceptibility
map synaptic plasticity
map synthesis
candidate t
map taihu lake
candidate tcs
candidate testis
map tetrabromobisphenol a
map thermal paper
map thermal paper contains
candidate thermal paper receipts
map toxicity
map transcriptional activity
candidate treatment
map triclosan
candidate type types
candidate types
candidate ubiquitous endocrine-disrupting chemical
map urinary levels
map urine samples
map urine samples collected
candidate use
map utero development
map utero windows
candidate value
candidate values value
map vi
map visible light irradiation
map wastewater
map water
candidate water sources
candidate water treatment
candidate week weeks
candidate weeks
map wide range
map women
candidate work
map work was
candidate wt %
map x-ray diffraction
candidate xrd
candidate young children
candidate zebrafish
map zebrafish embryos
map μg
map μg kg
candidate μgl
bench-data/phylo/bpa-config.json
0 → 100644
View file @
0405e007
{
"corpusPath"
:
"Gargantext_DocsList-nodeId-185487.csv"
,
"listPath"
:
"Gargantext_NgramsList-185488.csv"
,
"outputPath"
:
"data"
,
"corpusParser"
:
{
"tag"
:
"Csv"
,
"_csv_limit"
:
1500000
},
"listParser"
:
"V3"
,
"phyloName"
:
"bpa"
,
"phyloScale"
:
2
,
"similarity"
:
{
"tag"
:
"WeightedLogJaccard"
,
"_wlj_sensibility"
:
0.5
,
"_wlj_minSharedNgrams"
:
1
},
"seaElevation"
:
{
"tag"
:
"Evolving"
,
"_evol_neighborhood"
:
true
},
"defaultMode"
:
false
,
"findAncestors"
:
true
,
"phyloSynchrony"
:
{
"tag"
:
"ByProximityThreshold"
,
"_bpt_threshold"
:
0.6
,
"_bpt_sensibility"
:
0
,
"_bpt_scope"
:
"AllBranches"
,
"_bpt_strategy"
:
"MergeAllGroups"
},
"phyloQuality"
:
{
"tag"
:
"Quality"
,
"_qua_granularity"
:
0.1
,
"_qua_minBranch"
:
2
},
"timeUnit"
:
{
"tag"
:
"Week"
,
"_week_period"
:
4
,
"_week_step"
:
2
,
"_week_matchingFrame"
:
5
},
"clique"
:
{
"tag"
:
"Fis"
,
"_fis_support"
:
3
,
"_fis_size"
:
1
},
"exportLabel"
:
[
{
"tag"
:
"BranchLabel"
,
"_branch_labelTagger"
:
"MostEmergentTfIdf"
,
"_branch_labelSize"
:
2
},
{
"tag"
:
"GroupLabel"
,
"_group_labelTagger"
:
"MostEmergentInclusive"
,
"_group_labelSize"
:
2
}
],
"exportSort"
:
{
"tag"
:
"ByHierarchy"
,
"_sort_order"
:
"Desc"
},
"exportFilter"
:
[
{
"tag"
:
"ByBranchSize"
,
"_branch_size"
:
2
}
]
}
bin/gargantext-phylo-profile/Main.hs
0 → 100644
View file @
0405e007
{-# LANGUAGE OverloadedStrings #-}
module
Main
where
import
Common
import
Data.Aeson
import
Data.List
(
nub
)
import
Gargantext.Core.Viz.Phylo
import
Gargantext.Core.Viz.Phylo.API.Tools
import
Gargantext.Core.Viz.Phylo.PhyloExport
(
toPhyloExport
,
dotToFile
)
import
Gargantext.Core.Viz.Phylo.PhyloMaker
(
toPhylo
,
toPhyloWithoutLink
)
import
Gargantext.Core.Viz.Phylo.PhyloTools
(
printIOMsg
,
printIOComment
,
setConfig
,
toPeriods
,
getTimePeriod
,
getTimeStep
)
import
GHC.IO.Encoding
import
GHC.Stack
import
Paths_gargantext
import
Prelude
import
qualified
Data.Text
as
T
import
Shelly
import
System.Directory
--------------
-- | Main | --
--------------
phyloConfig
::
FilePath
->
PhyloConfig
phyloConfig
outdir
=
PhyloConfig
{
corpusPath
=
"corpus.csv"
,
listPath
=
"list.csv"
,
outputPath
=
outdir
,
corpusParser
=
Csv
{
_csv_limit
=
150000
}
,
listParser
=
V4
,
phyloName
=
"phylo_profile_test"
,
phyloScale
=
2
,
similarity
=
WeightedLogJaccard
{
_wlj_sensibility
=
0.5
,
_wlj_minSharedNgrams
=
2
}
,
seaElevation
=
Constante
{
_cons_start
=
0.1
,
_cons_gap
=
0.1
}
,
defaultMode
=
True
,
findAncestors
=
True
,
phyloSynchrony
=
ByProximityThreshold
{
_bpt_threshold
=
0.5
,
_bpt_sensibility
=
0.0
,
_bpt_scope
=
AllBranches
,
_bpt_strategy
=
MergeAllGroups
}
,
phyloQuality
=
Quality
{
_qua_granularity
=
0.8
,
_qua_minBranch
=
3
}
,
timeUnit
=
Year
{
_year_period
=
3
,
_year_step
=
1
,
_year_matchingFrame
=
5
}
,
clique
=
MaxClique
{
_mcl_size
=
5
,
_mcl_threshold
=
1.0e-4
,
_mcl_filter
=
ByThreshold
}
,
exportLabel
=
[
BranchLabel
{
_branch_labelTagger
=
MostEmergentTfIdf
,
_branch_labelSize
=
2
}
,
GroupLabel
{
_group_labelTagger
=
MostEmergentInclusive
,
_group_labelSize
=
2
}
]
,
exportSort
=
ByHierarchy
{
_sort_order
=
Desc
}
,
exportFilter
=
[
ByBranchSize
{
_branch_size
=
3.0
}]
}
main
::
HasCallStack
=>
IO
()
main
=
do
shelly
$
escaping
False
$
withTmpDir
$
\
tdir
->
do
curDir
<-
pwd
let
output
=
curDir
<>
"/"
<>
"gargantext_profile_out.dot"
chdir
tdir
$
do
liftIO
$
setLocaleEncoding
utf8
bpaConfig
<-
liftIO
$
getDataFileName
"bench-data/phylo/bpa-config.json"
corpusPath'
<-
liftIO
$
getDataFileName
"bench-data/phylo/GarganText_DocsList-nodeId-185487.csv"
listPath'
<-
liftIO
$
getDataFileName
"bench-data/phylo/GarganText_NgramsList-185488.csv"
(
Right
config
)
<-
fmap
(
\
pcfg
->
pcfg
{
outputPath
=
tdir
,
corpusPath
=
corpusPath'
,
listPath
=
listPath'
})
<$>
liftIO
(
eitherDecodeFileStrict'
bpaConfig
)
mapList
<-
liftIO
$
fileToList
(
listParser
config
)
(
listPath
config
)
corpus
<-
liftIO
$
if
(
defaultMode
config
)
then
fileToDocsDefault
(
corpusParser
config
)
(
corpusPath
config
)
[
Year
3
1
5
,
Month
3
1
5
,
Week
4
2
5
]
mapList
else
fileToDocsAdvanced
(
corpusParser
config
)
(
corpusPath
config
)
(
timeUnit
config
)
mapList
liftIO
$
do
printIOComment
(
show
(
length
corpus
)
<>
" parsed docs from the corpus"
)
printIOComment
(
show
(
length
$
nub
$
concat
$
map
text
corpus
)
<>
" Size ngs_coterms"
)
printIOComment
(
show
(
length
mapList
)
<>
" Size ngs_terms List Map Ngrams"
)
printIOMsg
"Reconstruct the phylo"
-- check the existing backup files
let
backupPhyloWithoutLink
=
(
outputPath
config
)
<>
"backupPhyloWithoutLink_"
<>
(
configToSha
BackupPhyloWithoutLink
config
)
<>
".json"
let
backupPhylo
=
(
outputPath
config
)
<>
"backupPhylo_"
<>
(
configToSha
BackupPhylo
config
)
<>
".json"
phyloWithoutLinkExists
<-
doesFileExist
backupPhyloWithoutLink
phyloExists
<-
doesFileExist
backupPhylo
-- reconstruct the phylo
phylo
<-
if
phyloExists
then
do
printIOMsg
"Reconstruct the phylo from an existing file"
readPhylo
backupPhylo
else
do
if
phyloWithoutLinkExists
then
do
printIOMsg
"Reconstruct the phylo from an existing file without links"
phyloWithoutLink
<-
readPhylo
backupPhyloWithoutLink
writePhylo
backupPhyloWithoutLink
phyloWithoutLink
pure
$
toPhylo
(
setConfig
config
phyloWithoutLink
)
else
do
printIOMsg
"Reconstruct the phylo from scratch"
phyloWithoutLink
<-
pure
$
toPhyloWithoutLink
corpus
config
writePhylo
backupPhyloWithoutLink
phyloWithoutLink
pure
$
toPhylo
(
setConfig
config
phyloWithoutLink
)
writePhylo
backupPhylo
phylo
printIOMsg
"End of reconstruction, start the export"
let
dot
=
toPhyloExport
(
setConfig
config
phylo
)
dotToFile
output
dot
echo
"Done."
bin/gargantext-phylo/Main.hs
View file @
0405e007
...
@@ -47,210 +47,7 @@ import Gargantext.Database.Schema.Ngrams (NgramsType(..))
...
@@ -47,210 +47,7 @@ import Gargantext.Database.Schema.Ngrams (NgramsType(..))
import
Gargantext.Prelude
hiding
(
hash
,
replace
)
import
Gargantext.Prelude
hiding
(
hash
,
replace
)
import
Prelude
qualified
import
Prelude
qualified
import
System.Directory
(
listDirectory
,
doesFileExist
)
import
System.Directory
(
listDirectory
,
doesFileExist
)
import
Common
data
Backup
=
BackupPhyloWithoutLink
|
BackupPhylo
deriving
(
Show
)
---------------
-- | Tools | --
---------------
-- | To get all the files in a directory or just a file
getFilesFromPath
::
FilePath
->
IO
[
FilePath
]
getFilesFromPath
path
=
do
if
(
isSuffixOf
"/"
path
)
then
(
listDirectory
path
)
else
return
[
path
]
----------------
-- | Parser | --
----------------
-- | To filter the Ngrams of a document based on the termList
termsInText
::
Patterns
->
Text
->
[
Text
]
termsInText
pats
txt
=
nub
$
concat
$
map
(
map
unwords
)
$
extractTermsWithList
pats
txt
-- | To transform a Wos file (or [file]) into a list of Docs
wosToDocs
::
Int
->
Patterns
->
TimeUnit
->
FilePath
->
IO
[
Document
]
wosToDocs
limit
patterns
time
path
=
do
files
<-
getFilesFromPath
path
take
limit
<$>
map
(
\
d
->
let
title
=
fromJust
$
_hd_title
d
abstr
=
if
(
isJust
$
_hd_abstract
d
)
then
fromJust
$
_hd_abstract
d
else
""
in
Document
(
toPhyloDate
(
fromIntegral
$
fromJust
$
_hd_publication_year
d
)
(
fromJust
$
_hd_publication_month
d
)
(
fromJust
$
_hd_publication_day
d
)
time
)
(
toPhyloDate'
(
fromIntegral
$
fromJust
$
_hd_publication_year
d
)
(
fromJust
$
_hd_publication_month
d
)
(
fromJust
$
_hd_publication_day
d
)
time
)
(
termsInText
patterns
$
title
<>
" "
<>
abstr
)
Nothing
[]
time
)
<$>
concat
<$>
mapConcurrently
(
\
file
->
filter
(
\
d
->
(
isJust
$
_hd_publication_year
d
)
&&
(
isJust
$
_hd_title
d
))
<$>
fromRight
[]
<$>
parseFile
WOS
Plain
(
path
<>
file
)
)
files
-- To transform a Csv file into a list of Document
csvToDocs
::
CorpusParser
->
Patterns
->
TimeUnit
->
FilePath
->
IO
[
Document
]
csvToDocs
parser
patterns
time
path
=
case
parser
of
Wos
_
->
Prelude
.
error
"csvToDocs: unimplemented"
Csv
limit
->
Vector
.
toList
<$>
Vector
.
take
limit
<$>
Vector
.
map
(
\
row
->
Document
(
toPhyloDate
(
Csv
.
fromMIntOrDec
Csv
.
defaultYear
$
csv_publication_year
row
)
(
fromMaybe
Csv
.
defaultMonth
$
csv_publication_month
row
)
(
fromMaybe
Csv
.
defaultDay
$
csv_publication_day
row
)
time
)
(
toPhyloDate'
(
Csv
.
fromMIntOrDec
Csv
.
defaultYear
$
csv_publication_year
row
)
(
fromMaybe
Csv
.
defaultMonth
$
csv_publication_month
row
)
(
fromMaybe
Csv
.
defaultDay
$
csv_publication_day
row
)
time
)
(
termsInText
patterns
$
(
csv_title
row
)
<>
" "
<>
(
csv_abstract
row
))
Nothing
[]
time
)
<$>
snd
<$>
either
(
\
err
->
panicTrace
$
"CSV error"
<>
(
show
err
))
identity
<$>
Csv
.
readCSVFile
path
Csv'
limit
->
Vector
.
toList
<$>
Vector
.
take
limit
<$>
Vector
.
map
(
\
row
->
Document
(
toPhyloDate
(
csv'_publication_year
row
)
(
csv'_publication_month
row
)
(
csv'_publication_day
row
)
time
)
(
toPhyloDate'
(
csv'_publication_year
row
)
(
csv'_publication_month
row
)
(
csv'_publication_day
row
)
time
)
(
termsInText
patterns
$
(
csv'_title
row
)
<>
" "
<>
(
csv'_abstract
row
))
(
Just
$
csv'_weight
row
)
(
map
(
T
.
strip
.
pack
)
$
splitOn
";"
(
unpack
$
(
csv'_source
row
)))
time
)
<$>
snd
<$>
Csv
.
readWeightedCsv
path
-- To parse a file into a list of Document
fileToDocsAdvanced
::
CorpusParser
->
FilePath
->
TimeUnit
->
TermList
->
IO
[
Document
]
fileToDocsAdvanced
parser
path
time
lst
=
do
let
patterns
=
buildPatterns
lst
case
parser
of
Wos
limit
->
wosToDocs
limit
patterns
time
path
Csv
_
->
csvToDocs
parser
patterns
time
path
Csv'
_
->
csvToDocs
parser
patterns
time
path
fileToDocsDefault
::
CorpusParser
->
FilePath
->
[
TimeUnit
]
->
TermList
->
IO
[
Document
]
fileToDocsDefault
parser
path
timeUnits
lst
=
if
length
timeUnits
>
0
then
do
let
timeUnit
=
(
head'
"fileToDocsDefault"
timeUnits
)
docs
<-
fileToDocsAdvanced
parser
path
timeUnit
lst
let
periods
=
toPeriods
(
sort
$
nub
$
map
date
docs
)
(
getTimePeriod
timeUnit
)
(
getTimeStep
timeUnit
)
if
(
length
periods
<
3
)
then
fileToDocsDefault
parser
path
(
tail
timeUnits
)
lst
else
pure
docs
else
panicTrace
"this corpus is incompatible with the phylomemy reconstruction"
-- on passe à passer la time unit dans la conf envoyé au phyloMaker
-- dans le phyloMaker si default est true alors dans le setDefault ou pense à utiliser la TimeUnit de la conf
---------------
-- | Label | --
---------------
-- Config time parameters to label
timeToLabel
::
PhyloConfig
->
[
Char
]
timeToLabel
config
=
case
(
timeUnit
config
)
of
Epoch
p
s
f
->
(
"time_epochs"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Year
p
s
f
->
(
"time_years"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Month
p
s
f
->
(
"time_months"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Week
p
s
f
->
(
"time_weeks"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Day
p
s
f
->
(
"time_days"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
seaToLabel
::
PhyloConfig
->
[
Char
]
seaToLabel
config
=
case
(
seaElevation
config
)
of
Constante
start
step
->
(
"sea_cst_"
<>
(
show
start
)
<>
"_"
<>
(
show
step
))
Adaptative
granularity
->
(
"sea_adapt"
<>
(
show
granularity
))
Evolving
_
->
(
"sea_evolv"
)
sensToLabel
::
PhyloConfig
->
[
Char
]
sensToLabel
config
=
case
(
similarity
config
)
of
Hamming
_
_
->
Prelude
.
error
"sensToLabel: unimplemented"
WeightedLogJaccard
s
_
->
(
"WeightedLogJaccard_"
<>
show
s
)
WeightedLogSim
s
_
->
(
"WeightedLogSim-sens_"
<>
show
s
)
cliqueToLabel
::
PhyloConfig
->
[
Char
]
cliqueToLabel
config
=
case
(
clique
config
)
of
Fis
s
s'
->
"fis_"
<>
(
show
s
)
<>
"_"
<>
(
show
s'
)
MaxClique
s
t
f
->
"clique_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
)
<>
"_"
<>
(
show
t
)
syncToLabel
::
PhyloConfig
->
[
Char
]
syncToLabel
config
=
case
(
phyloSynchrony
config
)
of
ByProximityThreshold
scl
sync_sens
scope
_
->
(
"scale_"
<>
(
show
scope
)
<>
"_"
<>
(
show
sync_sens
)
<>
"_"
<>
(
show
scl
))
ByProximityDistribution
_
_
->
"syncToLabel: unimplemented"
qualToConfig
::
PhyloConfig
->
[
Char
]
qualToConfig
config
=
case
(
phyloQuality
config
)
of
Quality
g
m
->
"quality_"
<>
(
show
g
)
<>
"_"
<>
(
show
m
)
-- To set up the export file's label from the configuration
configToLabel
::
PhyloConfig
->
[
Char
]
configToLabel
config
=
outputPath
config
<>
(
unpack
$
phyloName
config
)
<>
"-"
<>
(
timeToLabel
config
)
<>
"-scale_"
<>
(
show
(
phyloScale
config
))
<>
"-"
<>
(
seaToLabel
config
)
<>
"-"
<>
(
sensToLabel
config
)
<>
"-"
<>
(
cliqueToLabel
config
)
<>
"-level_"
<>
(
show
(
_qua_granularity
$
phyloQuality
config
))
<>
"-"
<>
(
syncToLabel
config
)
<>
".dot"
-- To write a sha256 from a set of config's parameters
configToSha
::
Backup
->
PhyloConfig
->
[
Char
]
configToSha
stage
config
=
unpack
$
replace
"/"
"-"
$
T
.
pack
(
show
(
hash
$
C8
.
pack
label
))
where
label
::
[
Char
]
label
=
case
stage
of
BackupPhyloWithoutLink
->
(
corpusPath
config
)
<>
(
listPath
config
)
<>
(
timeToLabel
config
)
<>
(
cliqueToLabel
config
)
BackupPhylo
->
(
corpusPath
config
)
<>
(
listPath
config
)
<>
(
timeToLabel
config
)
<>
(
cliqueToLabel
config
)
<>
(
sensToLabel
config
)
<>
(
seaToLabel
config
)
<>
(
syncToLabel
config
)
<>
(
qualToConfig
config
)
<>
(
show
(
phyloScale
config
))
readListV4
::
[
Char
]
->
IO
NgramsList
readListV4
path
=
do
listJson
<-
(
eitherDecode
<$>
readJson
path
)
::
IO
(
Either
Prelude
.
String
NgramsList
)
case
listJson
of
Left
err
->
do
putStrLn
err
Prelude
.
error
"readListV4 unimplemented"
Right
listV4
->
pure
listV4
fileToList
::
ListParser
->
FilePath
->
IO
TermList
fileToList
parser
path
=
case
parser
of
V3
->
csvMapTermList
path
V4
->
fromJust
<$>
toTermList
MapTerm
NgramsTerms
<$>
readListV4
path
--------------
-- | Main | --
--------------
main
::
IO
()
main
::
IO
()
main
=
do
main
=
do
...
@@ -274,7 +71,7 @@ main = do
...
@@ -274,7 +71,7 @@ main = do
corpus
<-
if
(
defaultMode
config
)
corpus
<-
if
(
defaultMode
config
)
then
fileToDocsDefault
(
corpusParser
config
)
(
corpusPath
config
)
[
Year
3
1
5
,
Month
3
1
5
,
Week
4
2
5
]
mapList
then
fileToDocsDefault
(
corpusParser
config
)
(
corpusPath
config
)
[
Year
3
1
5
,
Month
3
1
5
,
Week
4
2
5
]
mapList
else
fileToDocsAdvanced
(
corpusParser
config
)
(
corpusPath
config
)
(
timeUnit
config
)
mapList
else
fileToDocsAdvanced
(
corpusParser
config
)
(
corpusPath
config
)
(
timeUnit
config
)
mapList
printIOComment
(
show
(
length
corpus
)
<>
" parsed docs from the corpus"
)
printIOComment
(
show
(
length
corpus
)
<>
" parsed docs from the corpus"
)
printIOComment
(
show
(
length
$
nub
$
concat
$
map
text
corpus
)
<>
" Size ngs_coterms"
)
printIOComment
(
show
(
length
$
nub
$
concat
$
map
text
corpus
)
<>
" Size ngs_coterms"
)
...
...
bin/gargantext-phylo/Phylo/Common.hs
0 → 100644
View file @
0405e007
{-# LANGUAGE OverloadedStrings #-}
module
Common
where
import
Control.Concurrent.Async
(
mapConcurrently
)
import
Crypto.Hash.SHA256
(
hash
)
import
Data.Aeson
import
Data.ByteString.Char8
qualified
as
C8
import
Data.List
(
nub
,
tail
)
import
Data.List.Split
import
Data.Maybe
(
fromJust
)
import
Data.Text
(
unpack
,
replace
,
pack
)
import
Data.Text
qualified
as
T
import
Data.Vector
qualified
as
Vector
import
Gargantext.API.Ngrams.Prelude
(
toTermList
)
import
Gargantext.API.Ngrams.Types
import
Gargantext.Core.Text.Context
(
TermList
)
import
Gargantext.Core.Text.Corpus.Parsers
(
FileFormat
(
..
),
FileType
(
..
),
parseFile
)
import
Gargantext.Core.Text.Corpus.Parsers.CSV
(
csv_title
,
csv_abstract
,
csv_publication_year
,
csv_publication_month
,
csv_publication_day
,
csv'_source
,
csv'_title
,
csv'_abstract
,
csv'_publication_year
,
csv'_publication_month
,
csv'_publication_day
,
csv'_weight
)
import
Gargantext.Core.Text.Corpus.Parsers.CSV
qualified
as
Csv
import
Gargantext.Core.Text.List.Formats.CSV
(
csvMapTermList
)
import
Gargantext.Core.Text.Terms.WithList
(
Patterns
,
buildPatterns
,
extractTermsWithList
)
import
Gargantext.Core.Types.Main
(
ListType
(
..
))
import
Gargantext.Core.Viz.Phylo
import
Gargantext.Core.Viz.Phylo.API.Tools
import
Gargantext.Core.Viz.Phylo.PhyloTools
(
toPeriods
,
getTimePeriod
,
getTimeStep
)
import
Gargantext.Database.Admin.Types.Hyperdata
(
HyperdataDocument
(
..
))
import
Gargantext.Database.Schema.Ngrams
(
NgramsType
(
..
))
import
Gargantext.Prelude
hiding
(
hash
,
replace
)
import
Prelude
qualified
import
System.Directory
(
listDirectory
)
data
Backup
=
BackupPhyloWithoutLink
|
BackupPhylo
deriving
(
Show
)
---------------
-- | Tools | --
---------------
-- | To get all the files in a directory or just a file
getFilesFromPath
::
FilePath
->
IO
[
FilePath
]
getFilesFromPath
path
=
do
if
(
isSuffixOf
"/"
path
)
then
(
listDirectory
path
)
else
return
[
path
]
----------------
-- | Parser | --
----------------
-- | To filter the Ngrams of a document based on the termList
termsInText
::
Patterns
->
Text
->
[
Text
]
termsInText
pats
txt
=
nub
$
concat
$
map
(
map
unwords
)
$
extractTermsWithList
pats
txt
-- | To transform a Wos file (or [file]) into a list of Docs
wosToDocs
::
Int
->
Patterns
->
TimeUnit
->
FilePath
->
IO
[
Document
]
wosToDocs
limit
patterns
time
path
=
do
files
<-
getFilesFromPath
path
take
limit
<$>
map
(
\
d
->
let
title
=
fromJust
$
_hd_title
d
abstr
=
if
(
isJust
$
_hd_abstract
d
)
then
fromJust
$
_hd_abstract
d
else
""
in
Document
(
toPhyloDate
(
fromIntegral
$
fromJust
$
_hd_publication_year
d
)
(
fromJust
$
_hd_publication_month
d
)
(
fromJust
$
_hd_publication_day
d
)
time
)
(
toPhyloDate'
(
fromIntegral
$
fromJust
$
_hd_publication_year
d
)
(
fromJust
$
_hd_publication_month
d
)
(
fromJust
$
_hd_publication_day
d
)
time
)
(
termsInText
patterns
$
title
<>
" "
<>
abstr
)
Nothing
[]
time
)
<$>
concat
<$>
mapConcurrently
(
\
file
->
filter
(
\
d
->
(
isJust
$
_hd_publication_year
d
)
&&
(
isJust
$
_hd_title
d
))
<$>
fromRight
[]
<$>
parseFile
WOS
Plain
(
path
<>
file
)
)
files
-- To transform a Csv file into a list of Document
csvToDocs
::
CorpusParser
->
Patterns
->
TimeUnit
->
FilePath
->
IO
[
Document
]
csvToDocs
parser
patterns
time
path
=
case
parser
of
Wos
_
->
Prelude
.
error
"csvToDocs: unimplemented"
Csv
limit
->
Vector
.
toList
<$>
Vector
.
take
limit
<$>
Vector
.
map
(
\
row
->
Document
(
toPhyloDate
(
Csv
.
fromMIntOrDec
Csv
.
defaultYear
$
csv_publication_year
row
)
(
fromMaybe
Csv
.
defaultMonth
$
csv_publication_month
row
)
(
fromMaybe
Csv
.
defaultDay
$
csv_publication_day
row
)
time
)
(
toPhyloDate'
(
Csv
.
fromMIntOrDec
Csv
.
defaultYear
$
csv_publication_year
row
)
(
fromMaybe
Csv
.
defaultMonth
$
csv_publication_month
row
)
(
fromMaybe
Csv
.
defaultDay
$
csv_publication_day
row
)
time
)
(
termsInText
patterns
$
(
csv_title
row
)
<>
" "
<>
(
csv_abstract
row
))
Nothing
[]
time
)
<$>
snd
<$>
either
(
\
err
->
panicTrace
$
"CSV error"
<>
(
show
err
))
identity
<$>
Csv
.
readCSVFile
path
Csv'
limit
->
Vector
.
toList
<$>
Vector
.
take
limit
<$>
Vector
.
map
(
\
row
->
Document
(
toPhyloDate
(
csv'_publication_year
row
)
(
csv'_publication_month
row
)
(
csv'_publication_day
row
)
time
)
(
toPhyloDate'
(
csv'_publication_year
row
)
(
csv'_publication_month
row
)
(
csv'_publication_day
row
)
time
)
(
termsInText
patterns
$
(
csv'_title
row
)
<>
" "
<>
(
csv'_abstract
row
))
(
Just
$
csv'_weight
row
)
(
map
(
T
.
strip
.
pack
)
$
splitOn
";"
(
unpack
$
(
csv'_source
row
)))
time
)
<$>
snd
<$>
Csv
.
readWeightedCsv
path
-- To parse a file into a list of Document
fileToDocsAdvanced
::
CorpusParser
->
FilePath
->
TimeUnit
->
TermList
->
IO
[
Document
]
fileToDocsAdvanced
parser
path
time
lst
=
do
let
patterns
=
buildPatterns
lst
case
parser
of
Wos
limit
->
wosToDocs
limit
patterns
time
path
Csv
_
->
csvToDocs
parser
patterns
time
path
Csv'
_
->
csvToDocs
parser
patterns
time
path
fileToDocsDefault
::
CorpusParser
->
FilePath
->
[
TimeUnit
]
->
TermList
->
IO
[
Document
]
fileToDocsDefault
parser
path
timeUnits
lst
=
if
length
timeUnits
>
0
then
do
let
timeUnit
=
(
head'
"fileToDocsDefault"
timeUnits
)
docs
<-
fileToDocsAdvanced
parser
path
timeUnit
lst
let
periods
=
toPeriods
(
sort
$
nub
$
map
date
docs
)
(
getTimePeriod
timeUnit
)
(
getTimeStep
timeUnit
)
if
(
length
periods
<
3
)
then
fileToDocsDefault
parser
path
(
tail
timeUnits
)
lst
else
pure
docs
else
panicTrace
"this corpus is incompatible with the phylomemy reconstruction"
-- on passe à passer la time unit dans la conf envoyé au phyloMaker
-- dans le phyloMaker si default est true alors dans le setDefault ou pense à utiliser la TimeUnit de la conf
readListV4
::
[
Char
]
->
IO
NgramsList
readListV4
path
=
do
listJson
<-
(
eitherDecode
<$>
readJson
path
)
::
IO
(
Either
Prelude
.
String
NgramsList
)
case
listJson
of
Left
err
->
do
putStrLn
err
Prelude
.
error
"readListV4 unimplemented"
Right
listV4
->
pure
listV4
fileToList
::
ListParser
->
FilePath
->
IO
TermList
fileToList
parser
path
=
case
parser
of
V3
->
csvMapTermList
path
V4
->
fromJust
<$>
toTermList
MapTerm
NgramsTerms
<$>
readListV4
path
---------------
-- | Label | --
---------------
-- Config time parameters to label
timeToLabel
::
PhyloConfig
->
[
Char
]
timeToLabel
config
=
case
(
timeUnit
config
)
of
Epoch
p
s
f
->
(
"time_epochs"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Year
p
s
f
->
(
"time_years"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Month
p
s
f
->
(
"time_months"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Week
p
s
f
->
(
"time_weeks"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
Day
p
s
f
->
(
"time_days"
<>
"_"
<>
(
show
p
)
<>
"_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
))
seaToLabel
::
PhyloConfig
->
[
Char
]
seaToLabel
config
=
case
(
seaElevation
config
)
of
Constante
start
step
->
(
"sea_cst_"
<>
(
show
start
)
<>
"_"
<>
(
show
step
))
Adaptative
granularity
->
(
"sea_adapt"
<>
(
show
granularity
))
Evolving
_
->
(
"sea_evolv"
)
sensToLabel
::
PhyloConfig
->
[
Char
]
sensToLabel
config
=
case
(
similarity
config
)
of
Hamming
_
_
->
Prelude
.
error
"sensToLabel: unimplemented"
WeightedLogJaccard
s
_
->
(
"WeightedLogJaccard_"
<>
show
s
)
WeightedLogSim
s
_
->
(
"WeightedLogSim-sens_"
<>
show
s
)
cliqueToLabel
::
PhyloConfig
->
[
Char
]
cliqueToLabel
config
=
case
(
clique
config
)
of
Fis
s
s'
->
"fis_"
<>
(
show
s
)
<>
"_"
<>
(
show
s'
)
MaxClique
s
t
f
->
"clique_"
<>
(
show
s
)
<>
"_"
<>
(
show
f
)
<>
"_"
<>
(
show
t
)
syncToLabel
::
PhyloConfig
->
[
Char
]
syncToLabel
config
=
case
(
phyloSynchrony
config
)
of
ByProximityThreshold
scl
sync_sens
scope
_
->
(
"scale_"
<>
(
show
scope
)
<>
"_"
<>
(
show
sync_sens
)
<>
"_"
<>
(
show
scl
))
ByProximityDistribution
_
_
->
"syncToLabel: unimplemented"
qualToConfig
::
PhyloConfig
->
[
Char
]
qualToConfig
config
=
case
(
phyloQuality
config
)
of
Quality
g
m
->
"quality_"
<>
(
show
g
)
<>
"_"
<>
(
show
m
)
-- To set up the export file's label from the configuration
configToLabel
::
PhyloConfig
->
[
Char
]
configToLabel
config
=
outputPath
config
<>
(
unpack
$
phyloName
config
)
<>
"-"
<>
(
timeToLabel
config
)
<>
"-scale_"
<>
(
show
(
phyloScale
config
))
<>
"-"
<>
(
seaToLabel
config
)
<>
"-"
<>
(
sensToLabel
config
)
<>
"-"
<>
(
cliqueToLabel
config
)
<>
"-level_"
<>
(
show
(
_qua_granularity
$
phyloQuality
config
))
<>
"-"
<>
(
syncToLabel
config
)
<>
".dot"
-- To write a sha256 from a set of config's parameters
configToSha
::
Backup
->
PhyloConfig
->
[
Char
]
configToSha
stage
config
=
unpack
$
replace
"/"
"-"
$
T
.
pack
(
show
(
hash
$
C8
.
pack
label
))
where
label
::
[
Char
]
label
=
case
stage
of
BackupPhyloWithoutLink
->
(
corpusPath
config
)
<>
(
listPath
config
)
<>
(
timeToLabel
config
)
<>
(
cliqueToLabel
config
)
BackupPhylo
->
(
corpusPath
config
)
<>
(
listPath
config
)
<>
(
timeToLabel
config
)
<>
(
cliqueToLabel
config
)
<>
(
sensToLabel
config
)
<>
(
seaToLabel
config
)
<>
(
syncToLabel
config
)
<>
(
qualToConfig
config
)
<>
(
show
(
phyloScale
config
))
bin/update-project-dependencies
View file @
0405e007
...
@@ -18,8 +18,8 @@ fi
...
@@ -18,8 +18,8 @@ fi
# with the `sha256sum` result calculated on the `cabal.project` and
# with the `sha256sum` result calculated on the `cabal.project` and
# `cabal.project.freeze`. This ensures the files stay deterministic so that CI
# `cabal.project.freeze`. This ensures the files stay deterministic so that CI
# cache can kick in.
# cache can kick in.
expected_cabal_project_hash
=
"
20253fb02ed59b6e8e72f974aad2aef3409ca6f9d005f7e84bb660812f1a70db
"
expected_cabal_project_hash
=
"
c9fe39301e8b60bfd183e60e7e25a14cd1c9c66d8739bf9041ca3f4db89db7c6
"
expected_cabal_project_freeze_hash
=
"
745c65c246998cfda4d2a7a22df44a9f1f7fb0927e2afc2f16712861bf552c76
"
expected_cabal_project_freeze_hash
=
"
db24c7d3006167102532e3101e2b49bae13d478003459c7d3f1d66590e57740a
"
cabal
--store-dir
=
$STORE_DIR
v2-build
--dry-run
cabal
--store-dir
=
$STORE_DIR
v2-build
--dry-run
cabal2stack
--system-ghc
--allow-newer
--resolver
lts-21.17
--resolver-file
devops/stack/lts-21.17.yaml
-o
stack.yaml
cabal2stack
--system-ghc
--allow-newer
--resolver
lts-21.17
--resolver-file
devops/stack/lts-21.17.yaml
-o
stack.yaml
...
...
gargantext.cabal
View file @
0405e007
...
@@ -5,7 +5,7 @@ cabal-version: 3.4
...
@@ -5,7 +5,7 @@ cabal-version: 3.4
-- see: https://github.com/sol/hpack
-- see: https://github.com/sol/hpack
name: gargantext
name: gargantext
version: 0.0.6.9.9.9.6.
1
version: 0.0.6.9.9.9.6.
2
synopsis: Search, map, share
synopsis: Search, map, share
description: Please see README.md
description: Please see README.md
category: Data
category: Data
...
@@ -17,6 +17,9 @@ license: AGPL-3.0-or-later
...
@@ -17,6 +17,9 @@ license: AGPL-3.0-or-later
license-file: LICENSE
license-file: LICENSE
build-type: Simple
build-type: Simple
data-files:
data-files:
bench-data/phylo/bpa-config.json
bench-data/phylo/GarganText_DocsList-nodeId-185487.csv
bench-data/phylo/GarganText_NgramsList-185488.csv
bench-data/phylo/issue-290.json
bench-data/phylo/issue-290.json
bench-data/phylo/issue-290-small.json
bench-data/phylo/issue-290-small.json
devops/postgres/extensions.sql
devops/postgres/extensions.sql
...
@@ -82,6 +85,13 @@ flag disable-db-obfuscation-executable
...
@@ -82,6 +85,13 @@ flag disable-db-obfuscation-executable
default: False
default: False
manual: True
manual: True
-- When enabled, it suppresses at compile time the
-- debug output for the phylo code, so that it doesn't
-- hinder its performance.
flag no-phylo-debug-logs
default: False
manual: True
library
library
import:
import:
defaults
defaults
...
@@ -431,6 +441,8 @@ library
...
@@ -431,6 +441,8 @@ library
src
src
if flag(test-crypto)
if flag(test-crypto)
cpp-options: -DTEST_CRYPTO
cpp-options: -DTEST_CRYPTO
if flag(no-phylo-debug-logs)
cpp-options: -DNO_PHYLO_DEBUG_LOGS
build-depends:
build-depends:
HSvm ^>= 0.1.1.3.22
HSvm ^>= 0.1.1.3.22
, KMP ^>= 0.2.0.0
, KMP ^>= 0.2.0.0
...
@@ -476,6 +488,7 @@ library
...
@@ -476,6 +488,7 @@ library
, data-time-segment ^>= 0.1.0.0
, data-time-segment ^>= 0.1.0.0
, deepseq ^>= 1.4.4.0
, deepseq ^>= 1.4.4.0
, directory ^>= 1.3.6.0
, directory ^>= 1.3.6.0
, discrimination >= 0.5
, duckling ^>= 0.2.0.0
, duckling ^>= 0.2.0.0
, ekg-core ^>= 0.1.1.7
, ekg-core ^>= 0.1.1.7
, ekg-json ^>= 0.1.0.7
, ekg-json ^>= 0.1.0.7
...
@@ -745,8 +758,25 @@ executable gargantext-phylo
...
@@ -745,8 +758,25 @@ executable gargantext-phylo
main-is: Main.hs
main-is: Main.hs
other-modules:
other-modules:
Paths_gargantext
Paths_gargantext
Common
hs-source-dirs:
hs-source-dirs:
bin/gargantext-phylo
bin/gargantext-phylo bin/gargantext-phylo/Phylo
default-extensions:
DataKinds
DeriveGeneric
FlexibleContexts
FlexibleInstances
GADTs
GeneralizedNewtypeDeriving
ImportQualifiedPost
MultiParamTypeClasses
NamedFieldPuns
NoImplicitPrelude
OverloadedStrings
RankNTypes
RecordWildCards
StrictData
ghc-options: -Wall -threaded -rtsopts -with-rtsopts=-N -O2 -Wmissing-signatures
build-depends:
build-depends:
async ^>= 2.2.4
async ^>= 2.2.4
, bytestring ^>= 0.10.12.0
, bytestring ^>= 0.10.12.0
...
@@ -1009,4 +1039,26 @@ benchmark garg-bench
...
@@ -1009,4 +1039,26 @@ benchmark garg-bench
if impl(ghc >= 8.6)
if impl(ghc >= 8.6)
ghc-options: "-with-rtsopts=--nonmoving-gc"
ghc-options: "-with-rtsopts=--nonmoving-gc"
executable gargantext-phylo-profile
main-is: Main.hs
other-modules:
Paths_gargantext
Common
hs-source-dirs:
bin/gargantext-phylo-profile bin/gargantext-phylo/Phylo
default-extensions: GHC2021
ghc-options: -Wall -threaded -rtsopts -with-rtsopts=-N -O2 -Wmissing-signatures
build-depends:
base
, bytestring
, gargantext
, gargantext-prelude
, shelly
, text
, async
, cryptohash
, aeson
, split
, vector
, directory
default-language: Haskell2010
src/Gargantext/Core/Text/List/Formats/CSV.hs
View file @
0405e007
...
@@ -64,10 +64,11 @@ instance ToNamedRecord CsvList where
...
@@ -64,10 +64,11 @@ instance ToNamedRecord CsvList where
]
]
------------------------------------------------------------------------
------------------------------------------------------------------------
instance
FromField
CsvListType
where
instance
FromField
CsvListType
where
parseField
"map"
=
pure
CsvMap
parseField
"map"
=
pure
CsvMap
parseField
"main"
=
pure
CsvCandidate
parseField
"main"
=
pure
CsvCandidate
parseField
"stop"
=
pure
CsvStop
parseField
"candidate"
=
pure
CsvCandidate
-- backward compat
parseField
_
=
mzero
parseField
"stop"
=
pure
CsvStop
parseField
_
=
mzero
instance
ToField
CsvListType
where
instance
ToField
CsvListType
where
toField
CsvMap
=
"map"
toField
CsvMap
=
"map"
...
...
src/Gargantext/Core/Viz/Phylo/API/Tools.hs
View file @
0405e007
...
@@ -195,7 +195,7 @@ toPhyloDate' y m d _ = pack $ showGregorian $ fromGregorian (toIntege
...
@@ -195,7 +195,7 @@ toPhyloDate' y m d _ = pack $ showGregorian $ fromGregorian (toIntege
-- Utils
-- Utils
writePhylo
::
[
Char
]
->
Phylo
->
IO
()
writePhylo
::
HasCallStack
=>
[
Char
]
->
Phylo
->
IO
()
writePhylo
path
phylo
=
Lazy
.
writeFile
path
$
encode
phylo
writePhylo
path
phylo
=
Lazy
.
writeFile
path
$
encode
phylo
...
...
src/Gargantext/Core/Viz/Phylo/PhyloExport.hs
View file @
0405e007
...
@@ -702,17 +702,17 @@ toPhyloExport phylo = exportToDot phylo
...
@@ -702,17 +702,17 @@ toPhyloExport phylo = exportToDot phylo
traceExportBranches
::
[
PhyloBranch
]
->
[
PhyloBranch
]
traceExportBranches
::
[
PhyloBranch
]
->
[
PhyloBranch
]
traceExportBranches
branches
=
traceExportBranches
branches
=
trace
(
"
\n
"
trace
Phylo
(
"
\n
"
<>
"-- | Export "
<>
show
(
length
branches
)
<>
" branches"
::
Text
)
branches
<>
"-- | Export "
<>
show
(
length
branches
)
<>
" branches"
::
Text
)
branches
tracePhyloAncestors
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
tracePhyloAncestors
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
tracePhyloAncestors
groups
=
tracePhyloAncestors
groups
=
trace
(
"-- | Found "
<>
show
(
length
$
concat
$
map
_phylo_groupAncestors
$
concat
groups
)
trace
Phylo
(
"-- | Found "
<>
show
(
length
$
concat
$
map
_phylo_groupAncestors
$
concat
groups
)
<>
" ancestors"
::
Text
)
groups
<>
" ancestors"
::
Text
)
groups
tracePhyloInfo
::
Phylo
->
Phylo
tracePhyloInfo
::
Phylo
->
Phylo
tracePhyloInfo
phylo
=
tracePhyloInfo
phylo
=
trace
(
"
\n
"
<>
"##########################"
<>
"
\n\n
"
<>
"-- | Phylo with level = "
trace
Phylo
(
"
\n
"
<>
"##########################"
<>
"
\n\n
"
<>
"-- | Phylo with level = "
<>
show
(
getLevel
phylo
)
<>
" applied to "
<>
show
(
getLevel
phylo
)
<>
" applied to "
<>
show
(
length
$
Vector
.
toList
$
getRoots
phylo
)
<>
" foundations"
::
Text
<>
show
(
length
$
Vector
.
toList
$
getRoots
phylo
)
<>
" foundations"
::
Text
)
phylo
)
phylo
...
@@ -720,7 +720,7 @@ tracePhyloInfo phylo =
...
@@ -720,7 +720,7 @@ tracePhyloInfo phylo =
traceExportGroups
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceExportGroups
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceExportGroups
groups
=
traceExportGroups
groups
=
trace
(
"
\n
"
<>
"-- | Export "
trace
Phylo
(
"
\n
"
<>
"-- | Export "
<>
show
(
length
$
nub
$
map
(
\
g
->
g
^.
phylo_groupBranchId
)
groups
)
<>
" branches, "
<>
show
(
length
$
nub
$
map
(
\
g
->
g
^.
phylo_groupBranchId
)
groups
)
<>
" branches, "
<>
show
(
length
groups
)
<>
" groups and "
<>
show
(
length
groups
)
<>
" groups and "
<>
show
(
length
$
nub
$
concat
$
map
(
\
g
->
g
^.
phylo_groupNgrams
)
groups
)
<>
" terms"
::
Text
<>
show
(
length
$
nub
$
concat
$
map
(
\
g
->
g
^.
phylo_groupNgrams
)
groups
)
<>
" terms"
::
Text
...
...
src/Gargantext/Core/Viz/Phylo/PhyloMaker.hs
View file @
0405e007
...
@@ -10,19 +10,23 @@ Portability : POSIX
...
@@ -10,19 +10,23 @@ Portability : POSIX
{-# OPTIONS_GHC -fno-warn-deprecations #-}
{-# OPTIONS_GHC -fno-warn-deprecations #-}
{-# LANGUAGE BangPatterns #-}
module
Gargantext.Core.Viz.Phylo.PhyloMaker
where
module
Gargantext.Core.Viz.Phylo.PhyloMaker
where
import
Control.Lens
hiding
(
Level
)
import
Control.Lens
hiding
(
Level
)
import
Control.Parallel.Strategies
(
parList
,
rdeepseq
,
using
)
import
Control.Parallel.Strategies
(
parMap
,
rpar
)
import
Data.List
(
nub
,
partition
,
intersect
,
tail
)
import
Data.Containers.ListUtils
(
nubOrd
)
import
Data.Discrimination
qualified
as
D
import
Data.List
(
partition
,
intersect
,
tail
)
import
Data.List
qualified
as
List
import
Data.List
qualified
as
List
import
Data.Map
(
fromListWith
,
keys
,
unionWith
,
fromList
,
empty
,
toList
,
elems
,
(
!
),
restrictKeys
,
insert
)
import
Data.Map
(
fromListWith
,
keys
,
unionWith
,
fromList
,
empty
,
toList
,
elems
,
(
!
),
restrictKeys
,
insert
)
import
Data.Map
qualified
as
Map
import
Data.Map
qualified
as
Map
import
Data.Set
qualified
as
Set
import
Data.Set
qualified
as
Set
import
Data.
Vector
(
Vector
)
import
Data.
Text
qualified
as
T
import
Data.Vector
qualified
as
Vector
import
Data.Vector
qualified
as
Vector
import
Data.Vector
(
Vector
)
import
Gargantext.Core.Methods.Graph.MaxClique
(
getMaxCliques
)
import
Gargantext.Core.Methods.Graph.MaxClique
(
getMaxCliques
)
import
Gargantext.Core.Methods.Similarities
(
Similarity
(
Conditional
))
import
Gargantext.Core.Methods.Similarities
(
Similarity
(
Conditional
))
import
Gargantext.Core.Text.Metrics.FrequentItemSet
(
fisWithSizePolyMap
,
fisWithSizePolyMap'
,
Size
(
..
))
import
Gargantext.Core.Text.Metrics.FrequentItemSet
(
fisWithSizePolyMap
,
fisWithSizePolyMap'
,
Size
(
..
))
...
@@ -131,9 +135,9 @@ evolvSeaLadder nbFdt lambda freq similarities graph = map snd
...
@@ -131,9 +135,9 @@ evolvSeaLadder nbFdt lambda freq similarities graph = map snd
--------
--------
-- 1.1) for each measure of similarity, prune the flat phylo, compute the branches and estimate the quality
-- 1.1) for each measure of similarity, prune the flat phylo, compute the branches and estimate the quality
qua
::
[
Double
]
qua
::
[
Double
]
qua
=
map
(
\
thr
->
qua
=
parMap
rpar
(
\
thr
->
let
edges
=
filter
(
\
edge
->
snd
edge
>=
thr
)
graph
let
edges
=
filter
(
\
edge
->
snd
edge
>=
thr
)
graph
nodes
=
nub
$
concat
$
map
(
\
((
n
,
n'
),
_
)
->
[
n
,
n'
])
edges
nodes
=
nub
Ord
$
concat
$
map
(
\
((
n
,
n'
),
_
)
->
[
n
,
n'
])
edges
branches
=
toRelatedComponents
nodes
edges
branches
=
toRelatedComponents
nodes
edges
in
toPhyloQuality
nbFdt
lambda
freq
branches
in
toPhyloQuality
nbFdt
lambda
freq
branches
)
$
(
Set
.
toList
similarities
)
)
$
(
Set
.
toList
similarities
)
...
@@ -146,7 +150,7 @@ findSeaLadder :: Phylo -> Phylo
...
@@ -146,7 +150,7 @@ findSeaLadder :: Phylo -> Phylo
findSeaLadder
phylo
=
case
getSeaElevation
phylo
of
findSeaLadder
phylo
=
case
getSeaElevation
phylo
of
Constante
start
gap
->
phylo
&
phylo_seaLadder
.~
(
constSeaLadder
start
gap
Set
.
empty
)
Constante
start
gap
->
phylo
&
phylo_seaLadder
.~
(
constSeaLadder
start
gap
Set
.
empty
)
Adaptative
steps
->
phylo
&
phylo_seaLadder
.~
(
squareLadder
$
adaptSeaLadder
steps
similarities
Set
.
empty
)
Adaptative
steps
->
phylo
&
phylo_seaLadder
.~
(
squareLadder
$
adaptSeaLadder
steps
similarities
Set
.
empty
)
Evolving
_
->
let
ladder
=
evolvSeaLadder
Evolving
_
->
let
!
ladder
=
evolvSeaLadder
(
fromIntegral
$
Vector
.
length
$
getRoots
phylo
)
(
fromIntegral
$
Vector
.
length
$
getRoots
phylo
)
(
getLevel
phylo
)
(
getLevel
phylo
)
(
getRootsFreq
phylo
)
(
getRootsFreq
phylo
)
...
@@ -172,7 +176,7 @@ findSeaLadder phylo = case getSeaElevation phylo of
...
@@ -172,7 +176,7 @@ findSeaLadder phylo = case getSeaElevation phylo of
docs
=
filterDocs
(
getDocsByDate
phylo
)
([
period
]
++
next
)
docs
=
filterDocs
(
getDocsByDate
phylo
)
([
period
]
++
next
)
diagos
=
filterDiago
(
getCoocByDate
phylo
)
([
period
]
++
next
)
diagos
=
filterDiago
(
getCoocByDate
phylo
)
([
period
]
++
next
)
-- 1.2) compute the kinship similarities between pairs of source & target in parallel
-- 1.2) compute the kinship similarities between pairs of source & target in parallel
pairs
=
map
(
\
source
->
pairs
=
parMap
rpar
(
\
source
->
let
candidates
=
filter
(
\
target
->
(
>
2
)
$
length
let
candidates
=
filter
(
\
target
->
(
>
2
)
$
length
$
intersect
(
getGroupNgrams
source
)
(
getGroupNgrams
target
))
targets
$
intersect
(
getGroupNgrams
source
)
(
getGroupNgrams
target
))
targets
in
map
(
\
target
->
in
map
(
\
target
->
...
@@ -183,8 +187,7 @@ findSeaLadder phylo = case getSeaElevation phylo of
...
@@ -183,8 +187,7 @@ findSeaLadder phylo = case getSeaElevation phylo of
in
((
source
,
target
),
toSimilarity
nbDocs
diago
(
getSimilarity
phylo
)
(
getGroupNgrams
source
)
(
getGroupNgrams
target
)
(
getGroupNgrams
target
))
in
((
source
,
target
),
toSimilarity
nbDocs
diago
(
getSimilarity
phylo
)
(
getGroupNgrams
source
)
(
getGroupNgrams
target
)
(
getGroupNgrams
target
))
)
candidates
)
candidates
)
sources
)
sources
pairs'
=
pairs
`
using
`
parList
rdeepseq
in
acc
++
(
concat
pairs
)
in
acc
++
(
concat
pairs'
)
)
[]
$
keys
$
phylo
^.
phylo_periods
)
[]
$
keys
$
phylo
^.
phylo_periods
appendGroups
::
(
a
->
Period
->
(
Text
,
Text
)
->
Scale
->
Int
->
[
Cooc
]
->
Map
Int
Double
->
PhyloGroup
)
->
Scale
->
Map
(
Date
,
Date
)
[
a
]
->
Phylo
->
Phylo
appendGroups
::
(
a
->
Period
->
(
Text
,
Text
)
->
Scale
->
Int
->
[
Cooc
]
->
Map
Int
Double
->
PhyloGroup
)
->
Scale
->
Map
(
Date
,
Date
)
[
a
]
->
Phylo
->
Phylo
...
@@ -311,15 +314,14 @@ filterCliqueBySize thr l = filter (\clq -> (length $ clq ^. clustering_roots) >=
...
@@ -311,15 +314,14 @@ filterCliqueBySize thr l = filter (\clq -> (length $ clq ^. clustering_roots) >=
-- To filter nested Fis
-- To filter nested Fis
filterCliqueByNested
::
Map
(
Date
,
Date
)
[
Clustering
]
->
Map
(
Date
,
Date
)
[
Clustering
]
filterCliqueByNested
::
Map
(
Date
,
Date
)
[
Clustering
]
->
Map
(
Date
,
Date
)
[
Clustering
]
filterCliqueByNested
m
=
filterCliqueByNested
m
=
let
clq
=
map
(
\
l
->
let
clq
=
parMap
rpar
(
\
l
->
foldl
'
(
\
mem
f
->
if
(
any
(
\
f'
->
isNested
(
f'
^.
clustering_roots
)
(
f
^.
clustering_roots
))
mem
)
foldl
(
\
mem
f
->
if
(
any
(
\
f'
->
isNested
(
f'
^.
clustering_roots
)
(
f
^.
clustering_roots
))
mem
)
then
mem
then
mem
else
else
let
fMax
=
filter
(
\
f'
->
not
$
isNested
(
f
^.
clustering_roots
)
(
f'
^.
clustering_roots
))
mem
let
fMax
=
filter
(
\
f'
->
not
$
isNested
(
f
^.
clustering_roots
)
(
f'
^.
clustering_roots
))
mem
in
fMax
++
[
f
]
)
[]
l
)
in
fMax
++
[
f
]
)
[]
l
)
$
elems
m
$
elems
m
clq'
=
clq
`
using
`
parList
rdeepseq
in
fromList
$
zip
(
keys
m
)
clq
in
fromList
$
zip
(
keys
m
)
clq'
-- | To transform a time map of docs into a time map of Fis with some filters
-- | To transform a time map of docs into a time map of Fis with some filters
...
@@ -340,7 +342,7 @@ toSeriesOfClustering phylo phyloDocs = case (clique $ getConfig phylo) of
...
@@ -340,7 +342,7 @@ toSeriesOfClustering phylo phyloDocs = case (clique $ getConfig phylo) of
seriesOfClustering
::
Map
(
Date
,
Date
)
[
Clustering
]
seriesOfClustering
::
Map
(
Date
,
Date
)
[
Clustering
]
seriesOfClustering
=
case
(
clique
$
getConfig
phylo
)
of
seriesOfClustering
=
case
(
clique
$
getConfig
phylo
)
of
Fis
_
_
->
Fis
_
_
->
let
fis
=
map
(
\
(
prd
,
docs
)
->
let
fis
=
parMap
rpar
(
\
(
prd
,
docs
)
->
case
(
corpusParser
$
getConfig
phylo
)
of
case
(
corpusParser
$
getConfig
phylo
)
of
Csv'
_
->
let
lst
=
toList
Csv'
_
->
let
lst
=
toList
$
fisWithSizePolyMap'
(
Segment
1
20
)
1
(
map
(
\
d
->
(
ngramsToIdx
(
text
d
)
(
getRoots
phylo
),
(
weight
d
,
(
sourcesToIdx
(
sources
d
)
(
getSources
phylo
)))))
docs
)
$
fisWithSizePolyMap'
(
Segment
1
20
)
1
(
map
(
\
d
->
(
ngramsToIdx
(
text
d
)
(
getRoots
phylo
),
(
weight
d
,
(
sourcesToIdx
(
sources
d
)
(
getSources
phylo
)))))
docs
)
...
@@ -350,18 +352,16 @@ toSeriesOfClustering phylo phyloDocs = case (clique $ getConfig phylo) of
...
@@ -350,18 +352,16 @@ toSeriesOfClustering phylo phyloDocs = case (clique $ getConfig phylo) of
in
(
prd
,
map
(
\
f
->
Clustering
(
Set
.
toList
$
fst
f
)
(
snd
f
)
prd
(
Just
$
fromIntegral
$
snd
f
)
[]
)
lst
)
in
(
prd
,
map
(
\
f
->
Clustering
(
Set
.
toList
$
fst
f
)
(
snd
f
)
prd
(
Just
$
fromIntegral
$
snd
f
)
[]
)
lst
)
)
)
$
toList
phyloDocs
$
toList
phyloDocs
fis'
=
fis
`
using
`
parList
rdeepseq
in
fromList
fis
in
fromList
fis'
MaxClique
_
thr
filterType
->
MaxClique
_
thr
filterType
->
let
mcl
=
map
(
\
(
prd
,
docs
)
->
let
mcl
=
parMap
rpar
(
\
(
prd
,
docs
)
->
let
cooc
=
map
round
let
cooc
=
map
round
$
foldl'
sumCooc
empty
$
foldl'
sumCooc
empty
$
map
listToMatrix
$
map
listToMatrix
$
map
(
\
d
->
ngramsToIdx
(
text
d
)
(
getRoots
phylo
))
docs
$
map
(
\
d
->
ngramsToIdx
(
text
d
)
(
getRoots
phylo
))
docs
in
(
prd
,
map
(
\
cl
->
Clustering
cl
0
prd
Nothing
[]
)
$
getMaxCliques
filterType
Conditional
thr
cooc
))
in
(
prd
,
map
(
\
cl
->
Clustering
cl
0
prd
Nothing
[]
)
$
getMaxCliques
filterType
Conditional
thr
cooc
))
$
toList
phyloDocs
$
toList
phyloDocs
mcl'
=
mcl
`
using
`
parList
rdeepseq
in
fromList
mcl
in
fromList
mcl'
--------------------------------------
--------------------------------------
-- dev viz graph maxClique getMaxClique
-- dev viz graph maxClique getMaxClique
...
@@ -377,7 +377,7 @@ docsToTimeScaleCooc :: [Document] -> Vector Ngrams -> Map Date Cooc
...
@@ -377,7 +377,7 @@ docsToTimeScaleCooc :: [Document] -> Vector Ngrams -> Map Date Cooc
docsToTimeScaleCooc
docs
fdt
=
docsToTimeScaleCooc
docs
fdt
=
let
mCooc
=
fromListWith
sumCooc
let
mCooc
=
fromListWith
sumCooc
$
map
(
\
(
_d
,
l
)
->
(
_d
,
listToMatrix
l
))
$
map
(
\
(
_d
,
l
)
->
(
_d
,
listToMatrix
l
))
$
map
(
\
doc
->
(
date
doc
,
sort
$
ngramsToIdx
(
text
doc
)
fdt
))
docs
$
map
(
\
doc
->
(
date
doc
,
D
.
sort
$
ngramsToIdx
(
text
doc
)
fdt
))
docs
mCooc'
=
fromList
mCooc'
=
fromList
$
map
(
\
t
->
(
t
,
empty
))
$
map
(
\
t
->
(
t
,
empty
))
$
toTimeScale
(
map
date
docs
)
1
$
toTimeScale
(
map
date
docs
)
1
...
@@ -406,13 +406,12 @@ groupDocsByPeriodRec f prds docs acc =
...
@@ -406,13 +406,12 @@ groupDocsByPeriodRec f prds docs acc =
groupDocsByPeriod'
::
(
NFData
doc
,
Ord
date
,
Enum
date
)
=>
(
doc
->
date
)
->
[(
date
,
date
)]
->
[
doc
]
->
Map
(
date
,
date
)
[
doc
]
groupDocsByPeriod'
::
(
NFData
doc
,
Ord
date
,
Enum
date
)
=>
(
doc
->
date
)
->
[(
date
,
date
)]
->
[
doc
]
->
Map
(
date
,
date
)
[
doc
]
groupDocsByPeriod'
f
pds
docs
=
groupDocsByPeriod'
f
pds
docs
=
let
docs'
=
groupBy
(
\
d
d'
->
f
d
==
f
d'
)
$
sortOn
f
docs
let
docs'
=
groupBy
(
\
d
d'
->
f
d
==
f
d'
)
$
sortOn
f
docs
periods
=
map
(
inPeriode
f
docs'
)
pds
periods
=
parMap
rpar
(
inPeriode
f
docs'
)
pds
periods'
=
periods
`
using
`
parList
rdeepseq
in
trace
(
"
\n
"
<>
"-- | Group "
in
trace
(
"
\n
"
<>
"-- | Group "
<>
show
(
length
docs
)
<>
show
(
length
docs
)
<>
" docs by "
<>
" docs by "
<>
show
(
length
pds
)
<>
" periods"
<>
"
\n
"
::
Text
)
<>
show
(
length
pds
)
<>
" periods"
<>
"
\n
"
::
Text
)
$
fromList
$
zip
pds
periods
'
$
fromList
$
zip
pds
periods
where
where
--------------------------------------
--------------------------------------
inPeriode
::
Ord
b
=>
(
t
->
b
)
->
[[
t
]]
->
(
b
,
b
)
->
[
t
]
inPeriode
::
Ord
b
=>
(
t
->
b
)
->
[[
t
]]
->
(
b
,
b
)
->
[
t
]
...
@@ -425,13 +424,12 @@ groupDocsByPeriod' f pds docs =
...
@@ -425,13 +424,12 @@ groupDocsByPeriod' f pds docs =
groupDocsByPeriod
::
(
NFData
doc
,
Ord
date
,
Enum
date
)
=>
(
doc
->
date
)
->
[(
date
,
date
)]
->
[
doc
]
->
Map
(
date
,
date
)
[
doc
]
groupDocsByPeriod
::
(
NFData
doc
,
Ord
date
,
Enum
date
)
=>
(
doc
->
date
)
->
[(
date
,
date
)]
->
[
doc
]
->
Map
(
date
,
date
)
[
doc
]
groupDocsByPeriod
_
_
[]
=
panic
"[ERR][Viz.Phylo.PhyloMaker] Empty [Documents] can not have any periods"
groupDocsByPeriod
_
_
[]
=
panic
"[ERR][Viz.Phylo.PhyloMaker] Empty [Documents] can not have any periods"
groupDocsByPeriod
f
pds
es
=
groupDocsByPeriod
f
pds
es
=
let
periods
=
map
(
inPeriode
f
es
)
pds
let
periods
=
parMap
rpar
(
inPeriode
f
es
)
pds
periods'
=
periods
`
using
`
parList
rdeepseq
in
trace
(
"
\n
"
<>
"-- | Group "
in
trace
(
"
\n
"
<>
"-- | Group "
<>
show
(
length
es
)
<>
" docs by "
<>
show
(
length
es
)
<>
" docs by "
<>
show
(
length
pds
)
<>
" periods"
<>
"
\n
"
::
Text
)
<>
show
(
length
pds
)
<>
" periods"
<>
"
\n
"
::
Text
)
$
fromList
$
zip
pds
periods
'
$
fromList
$
zip
pds
periods
where
where
--------------------------------------
--------------------------------------
inPeriode
::
Ord
b
=>
(
t
->
b
)
->
[
t
]
->
(
b
,
b
)
->
[
t
]
inPeriode
::
Ord
b
=>
(
t
->
b
)
->
[
t
]
->
(
b
,
b
)
->
[
t
]
...
@@ -446,7 +444,7 @@ docsToTermFreq docs fdt =
...
@@ -446,7 +444,7 @@ docsToTermFreq docs fdt =
freqs
=
map
(
/
(
nbDocs
))
freqs
=
map
(
/
(
nbDocs
))
$
fromList
$
fromList
$
map
(
\
lst
->
(
head'
"docsToTermFreq"
lst
,
fromIntegral
$
length
lst
))
$
map
(
\
lst
->
(
head'
"docsToTermFreq"
lst
,
fromIntegral
$
length
lst
))
$
group
$
sort
$
concat
$
map
(
\
d
->
nub
$
ngramsToIdx
(
text
d
)
fdt
)
docs
$
group
$
D
.
sort
$
concat
$
map
(
\
d
->
D
.
nub
$
ngramsToIdx
(
text
d
)
fdt
)
docs
sumFreqs
=
sum
$
elems
freqs
sumFreqs
=
sum
$
elems
freqs
in
map
(
/
sumFreqs
)
freqs
in
map
(
/
sumFreqs
)
freqs
...
@@ -454,28 +452,28 @@ docsToTermFreq docs fdt =
...
@@ -454,28 +452,28 @@ docsToTermFreq docs fdt =
docsToTermCount
::
[
Document
]
->
Vector
Ngrams
->
Map
Int
Double
docsToTermCount
::
[
Document
]
->
Vector
Ngrams
->
Map
Int
Double
docsToTermCount
docs
roots
=
fromList
docsToTermCount
docs
roots
=
fromList
$
map
(
\
lst
->
(
head'
"docsToTermCount"
lst
,
fromIntegral
$
length
lst
))
$
map
(
\
lst
->
(
head'
"docsToTermCount"
lst
,
fromIntegral
$
length
lst
))
$
group
$
sort
$
concat
$
map
(
\
d
->
nub
$
ngramsToIdx
(
text
d
)
roots
)
docs
$
group
$
D
.
sort
$
concat
$
map
(
\
d
->
D
.
nub
$
ngramsToIdx
(
text
d
)
roots
)
docs
docsToTimeTermCount
::
[
Document
]
->
Vector
Ngrams
->
(
Map
Date
(
Map
Int
Double
))
docsToTimeTermCount
::
[
Document
]
->
Vector
Ngrams
->
(
Map
Date
(
Map
Int
Double
))
docsToTimeTermCount
docs
roots
=
docsToTimeTermCount
docs
roots
=
let
docs'
=
Map
.
map
(
\
l
->
fromList
$
map
(
\
lst
->
(
head'
"docsToTimeTermCount"
lst
,
fromIntegral
$
length
lst
))
let
docs'
=
Map
.
map
(
\
l
->
fromList
$
map
(
\
lst
->
(
head'
"docsToTimeTermCount"
lst
,
fromIntegral
$
length
lst
))
$
group
$
sort
l
)
$
group
$
D
.
sort
l
)
$
fromListWith
(
++
)
$
fromListWith
(
++
)
$
map
(
\
d
->
(
date
d
,
nub
$
ngramsToIdx
(
text
d
)
roots
))
docs
$
map
(
\
d
->
(
date
d
,
D
.
nub
$
ngramsToIdx
(
text
d
)
roots
))
docs
time
=
fromList
$
map
(
\
t
->
(
t
,
Map
.
empty
))
$
toTimeScale
(
keys
docs'
)
1
time
=
fromList
$
map
(
\
t
->
(
t
,
Map
.
empty
))
$
toTimeScale
(
keys
docs'
)
1
in
unionWith
(
Map
.
union
)
time
docs'
in
unionWith
(
Map
.
union
)
time
docs'
docsToLastTermFreq
::
Int
->
[
Document
]
->
Vector
Ngrams
->
Map
Int
Double
docsToLastTermFreq
::
Int
->
[
Document
]
->
Vector
Ngrams
->
Map
Int
Double
docsToLastTermFreq
n
docs
fdt
=
docsToLastTermFreq
n
docs
fdt
=
let
last
=
take
n
$
reverse
$
sort
$
map
date
docs
let
last
=
take
n
$
reverse
$
D
.
sort
$
map
date
docs
nbDocs
=
fromIntegral
$
length
$
filter
(
\
d
->
elem
(
date
d
)
last
)
docs
nbDocs
=
fromIntegral
$
length
$
filter
(
\
d
->
elem
(
date
d
)
last
)
docs
freqs
=
map
(
/
(
nbDocs
))
freqs
=
map
(
/
(
nbDocs
))
$
fromList
$
fromList
$
map
(
\
lst
->
(
head'
"docsToLastTermFreq"
lst
,
fromIntegral
$
length
lst
))
$
map
(
\
lst
->
(
head'
"docsToLastTermFreq"
lst
,
fromIntegral
$
length
lst
))
$
group
$
sort
$
concat
$
map
(
\
d
->
nub
$
ngramsToIdx
(
text
d
)
fdt
)
$
filter
(
\
d
->
elem
(
date
d
)
last
)
docs
$
group
$
D
.
sort
$
concat
$
map
(
\
d
->
D
.
nub
$
ngramsToIdx
(
text
d
)
fdt
)
$
filter
(
\
d
->
elem
(
date
d
)
last
)
docs
sumFreqs
=
sum
$
elems
freqs
sumFreqs
=
sum
$
elems
freqs
in
map
(
/
sumFreqs
)
freqs
in
map
(
/
sumFreqs
)
freqs
...
@@ -527,10 +525,10 @@ setDefault conf timeScale nbDocs = defaultConfig
...
@@ -527,10 +525,10 @@ setDefault conf timeScale nbDocs = defaultConfig
--
--
initPhylo
::
[
Document
]
->
PhyloConfig
->
Phylo
initPhylo
::
[
Document
]
->
PhyloConfig
->
Phylo
initPhylo
docs
conf
=
initPhylo
docs
conf
=
let
roots
=
Vector
.
fromList
$
nub
$
concat
$
map
text
docs
let
roots
=
Vector
.
fromList
$
D
.
nubWith
T
.
unpack
$
concat
$
map
text
docs
timeScale
=
head'
"initPhylo"
$
map
docTime
docs
timeScale
=
head'
"initPhylo"
$
map
docTime
docs
foundations
=
PhyloFoundations
roots
empty
foundations
=
PhyloFoundations
roots
empty
docsSources
=
PhyloSources
(
Vector
.
fromList
$
nub
$
concat
$
map
sources
docs
)
docsSources
=
PhyloSources
(
Vector
.
fromList
$
nub
Ord
$
concat
$
map
sources
docs
)
docsCounts
=
PhyloCounts
(
docsToTimeScaleCooc
docs
(
foundations
^.
foundations_roots
))
docsCounts
=
PhyloCounts
(
docsToTimeScaleCooc
docs
(
foundations
^.
foundations_roots
))
(
docsToTimeScaleNb
docs
)
(
docsToTimeScaleNb
docs
)
(
docsToTimeTermCount
docs
(
foundations
^.
foundations_roots
))
(
docsToTimeTermCount
docs
(
foundations
^.
foundations_roots
))
...
@@ -540,7 +538,7 @@ initPhylo docs conf =
...
@@ -540,7 +538,7 @@ initPhylo docs conf =
params
=
if
(
defaultMode
conf
)
params
=
if
(
defaultMode
conf
)
then
defaultPhyloParam
{
_phyloParam_config
=
setDefault
conf
timeScale
(
length
docs
)
}
then
defaultPhyloParam
{
_phyloParam_config
=
setDefault
conf
timeScale
(
length
docs
)
}
else
defaultPhyloParam
{
_phyloParam_config
=
conf
}
else
defaultPhyloParam
{
_phyloParam_config
=
conf
}
periods
=
toPeriods
(
sort
$
nub
$
map
date
docs
)
(
getTimePeriod
timeScale
)
(
getTimeStep
timeScale
)
periods
=
toPeriods
(
D
.
sort
$
D
.
nub
$
map
date
docs
)
(
getTimePeriod
timeScale
)
(
getTimeStep
timeScale
)
in
trace
(
"
\n
"
<>
"-- | Init a phylo out of "
in
trace
(
"
\n
"
<>
"-- | Init a phylo out of "
<>
show
(
length
docs
)
<>
" docs
\n
"
::
Text
)
<>
show
(
length
docs
)
<>
" docs
\n
"
::
Text
)
$
trace
(
"
\n
"
<>
"-- | lambda "
$
trace
(
"
\n
"
<>
"-- | lambda "
...
...
src/Gargantext/Core/Viz/Phylo/PhyloTools.hs
View file @
0405e007
...
@@ -10,7 +10,8 @@ Portability : POSIX
...
@@ -10,7 +10,8 @@ Portability : POSIX
{-# OPTIONS_GHC -fno-warn-deprecations #-}
{-# OPTIONS_GHC -fno-warn-deprecations #-}
{-# LANGUAGE ViewPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE ViewPatterns #-}
module
Gargantext.Core.Viz.Phylo.PhyloTools
where
module
Gargantext.Core.Viz.Phylo.PhyloTools
where
...
@@ -232,6 +233,14 @@ keepFilled f thr l = if (null $ f thr l) && (not $ null l)
...
@@ -232,6 +233,14 @@ keepFilled f thr l = if (null $ f thr l) && (not $ null l)
then
keepFilled
f
(
thr
-
1
)
l
then
keepFilled
f
(
thr
-
1
)
l
else
f
thr
l
else
f
thr
l
-- | General workhorse to use in lieu of /trace/. It decides at compile
-- time whether or not debug logs are enabled.
tracePhylo
::
(
Print
s
,
IsString
s
)
=>
s
->
a
->
a
#
if
NO_PHYLO_DEBUG_LOGS
tracePhylo
_
p
=
p
#
else
tracePhylo
msg
p
=
trace
msg
p
#
endif
traceClique
::
Map
(
Date
,
Date
)
[
Clustering
]
->
String
traceClique
::
Map
(
Date
,
Date
)
[
Clustering
]
->
String
traceClique
mFis
=
foldl'
(
\
msg
cpt
->
msg
<>
show
(
countSup
cpt
cliques
)
<>
" (>"
<>
show
(
cpt
)
<>
") "
)
""
[
1
..
6
]
traceClique
mFis
=
foldl'
(
\
msg
cpt
->
msg
<>
show
(
countSup
cpt
cliques
)
<>
" (>"
<>
show
(
cpt
)
<>
") "
)
""
[
1
..
6
]
...
@@ -252,7 +261,7 @@ traceSupport mFis = foldl' (\msg cpt -> msg <> show (countSup cpt supports) <> "
...
@@ -252,7 +261,7 @@ traceSupport mFis = foldl' (\msg cpt -> msg <> show (countSup cpt supports) <> "
traceFis
::
[
Char
]
->
Map
(
Date
,
Date
)
[
Clustering
]
->
Map
(
Date
,
Date
)
[
Clustering
]
traceFis
::
[
Char
]
->
Map
(
Date
,
Date
)
[
Clustering
]
->
Map
(
Date
,
Date
)
[
Clustering
]
traceFis
msg
mFis
=
trace
(
"
\n
"
<>
"-- | "
<>
msg
<>
" : "
<>
show
(
sum
$
map
length
$
elems
mFis
)
<>
"
\n
"
traceFis
msg
mFis
=
trace
Phylo
(
"
\n
"
<>
"-- | "
<>
msg
<>
" : "
<>
show
(
sum
$
map
length
$
elems
mFis
)
<>
"
\n
"
<>
"Support : "
<>
traceSupport
mFis
<>
"
\n
"
<>
"Support : "
<>
traceSupport
mFis
<>
"
\n
"
<>
"Nb Ngrams : "
<>
traceClique
mFis
<>
"
\n
"
<>
"Nb Ngrams : "
<>
traceClique
mFis
<>
"
\n
"
)
mFis
)
mFis
...
@@ -636,7 +645,7 @@ updateLevel level phylo = phylo { _phylo_level = level }
...
@@ -636,7 +645,7 @@ updateLevel level phylo = phylo { _phylo_level = level }
traceToPhylo
::
Scale
->
Phylo
->
Phylo
traceToPhylo
::
Scale
->
Phylo
->
Phylo
traceToPhylo
lvl
phylo
=
traceToPhylo
lvl
phylo
=
trace
(
"
\n
"
<>
"-- | End of phylo making at scale "
<>
show
(
lvl
)
<>
" with "
trace
Phylo
(
"
\n
"
<>
"-- | End of phylo making at scale "
<>
show
(
lvl
)
<>
" with "
<>
show
(
length
$
getGroupsFromScale
lvl
phylo
)
<>
" groups and "
<>
show
(
length
$
getGroupsFromScale
lvl
phylo
)
<>
" groups and "
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
lvl
phylo
)
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
lvl
phylo
)
<>
" branches"
<>
"
\n
"
::
Text
<>
" branches"
<>
"
\n
"
::
Text
...
@@ -702,7 +711,7 @@ toRelatedComponents nodes edges =
...
@@ -702,7 +711,7 @@ toRelatedComponents nodes edges =
traceSynchronyEnd
::
Phylo
->
Phylo
traceSynchronyEnd
::
Phylo
->
Phylo
traceSynchronyEnd
phylo
=
traceSynchronyEnd
phylo
=
trace
(
"-- | End synchronic clustering at scale "
<>
show
(
getLastLevel
phylo
)
trace
Phylo
(
"-- | End synchronic clustering at scale "
<>
show
(
getLastLevel
phylo
)
<>
" with "
<>
show
(
length
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" groups"
<>
" with "
<>
show
(
length
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" groups"
<>
" and "
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" and "
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" branches"
<>
"
\n
"
::
Text
<>
" branches"
<>
"
\n
"
::
Text
...
@@ -710,7 +719,7 @@ traceSynchronyEnd phylo =
...
@@ -710,7 +719,7 @@ traceSynchronyEnd phylo =
traceSynchronyStart
::
Phylo
->
Phylo
traceSynchronyStart
::
Phylo
->
Phylo
traceSynchronyStart
phylo
=
traceSynchronyStart
phylo
=
trace
(
"
\n
"
<>
"-- | Start synchronic clustering at scale "
<>
show
(
getLastLevel
phylo
)
trace
Phylo
(
"
\n
"
<>
"-- | Start synchronic clustering at scale "
<>
show
(
getLastLevel
phylo
)
<>
" with "
<>
show
(
length
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" groups"
<>
" with "
<>
show
(
length
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" groups"
<>
" and "
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" and "
<>
show
(
length
$
nub
$
map
_phylo_groupBranchId
$
getGroupsFromScale
(
getLastLevel
phylo
)
phylo
)
<>
" branches"
<>
"
\n
"
::
Text
<>
" branches"
<>
"
\n
"
::
Text
...
@@ -754,7 +763,7 @@ ngramsInBranches branches = nub $ foldl' (\acc g -> acc ++ (g ^. phylo_groupNgra
...
@@ -754,7 +763,7 @@ ngramsInBranches branches = nub $ foldl' (\acc g -> acc ++ (g ^. phylo_groupNgra
traceMatchSuccess
::
Double
->
Double
->
Double
->
[[[
PhyloGroup
]]]
->
[[[
PhyloGroup
]]]
traceMatchSuccess
::
Double
->
Double
->
Double
->
[[[
PhyloGroup
]]]
->
[[[
PhyloGroup
]]]
traceMatchSuccess
thr
qua
qua'
nextBranches
=
traceMatchSuccess
thr
qua
qua'
nextBranches
=
trace
(
"
\n
"
<>
"-- local branches : "
trace
Phylo
(
"
\n
"
<>
"-- local branches : "
<>
(
Text
.
pack
$
init
$
show
((
init
.
init
.
snd
)
<>
(
Text
.
pack
$
init
$
show
((
init
.
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
$
head'
"trace"
nextBranches
)
^.
phylo_groupBranchId
))
$
(
head'
"trace"
$
head'
"trace"
$
head'
"trace"
nextBranches
)
^.
phylo_groupBranchId
))
<>
",(1.."
<>
show
(
length
nextBranches
)
<>
")]"
<>
",(1.."
<>
show
(
length
nextBranches
)
<>
")]"
...
@@ -767,7 +776,7 @@ traceMatchSuccess thr qua qua' nextBranches =
...
@@ -767,7 +776,7 @@ traceMatchSuccess thr qua qua' nextBranches =
traceMatchFailure
::
Double
->
Double
->
Double
->
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchFailure
::
Double
->
Double
->
Double
->
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchFailure
thr
qua
qua'
branches
=
traceMatchFailure
thr
qua
qua'
branches
=
trace
(
"
\n
"
<>
"-- local branches : "
trace
Phylo
(
"
\n
"
<>
"-- local branches : "
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
...
@@ -778,7 +787,7 @@ traceMatchFailure thr qua qua' branches =
...
@@ -778,7 +787,7 @@ traceMatchFailure thr qua qua' branches =
traceMatchNoSplit
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchNoSplit
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchNoSplit
branches
=
traceMatchNoSplit
branches
=
trace
(
"
\n
"
<>
"-- local branches : "
trace
Phylo
(
"
\n
"
<>
"-- local branches : "
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
...
@@ -788,7 +797,7 @@ traceMatchNoSplit branches =
...
@@ -788,7 +797,7 @@ traceMatchNoSplit branches =
traceMatchLimit
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchLimit
::
[[
PhyloGroup
]]
->
[[
PhyloGroup
]]
traceMatchLimit
branches
=
traceMatchLimit
branches
=
trace
(
"
\n
"
<>
"-- local branches : "
trace
Phylo
(
"
\n
"
<>
"-- local branches : "
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
(
Text
.
pack
$
init
$
show
((
init
.
snd
)
$
(
head'
"trace"
$
head'
"trace"
branches
)
^.
phylo_groupBranchId
))
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
",(1.."
<>
show
(
length
branches
)
<>
")]"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
<>
" | "
<>
show
(
length
$
concat
branches
)
<>
" groups"
<>
"
\n
"
...
@@ -798,15 +807,15 @@ traceMatchLimit branches =
...
@@ -798,15 +807,15 @@ traceMatchLimit branches =
traceMatchEnd
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceMatchEnd
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceMatchEnd
groups
=
traceMatchEnd
groups
=
trace
(
"
\n
"
<>
"-- | End temporal matching with "
<>
show
(
length
$
nub
$
map
(
\
g
->
g
^.
phylo_groupBranchId
)
groups
)
trace
Phylo
(
"
\n
"
<>
"-- | End temporal matching with "
<>
show
(
length
$
nub
$
map
(
\
g
->
g
^.
phylo_groupBranchId
)
groups
)
<>
" branches and "
<>
show
(
length
groups
)
<>
" groups"
<>
"
\n
"
::
Text
)
groups
<>
" branches and "
<>
show
(
length
groups
)
<>
" groups"
<>
"
\n
"
::
Text
)
groups
traceTemporalMatching
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceTemporalMatching
::
[
PhyloGroup
]
->
[
PhyloGroup
]
traceTemporalMatching
groups
=
traceTemporalMatching
groups
=
trace
(
"
\n
"
<>
"-- | Start temporal matching for "
<>
show
(
length
groups
)
<>
" groups"
<>
"
\n
"
::
Text
)
groups
trace
Phylo
(
"
\n
"
<>
"-- | Start temporal matching for "
<>
show
(
length
groups
)
<>
" groups"
<>
"
\n
"
::
Text
)
groups
traceGroupsProxi
::
[
Double
]
->
[
Double
]
traceGroupsProxi
::
[
Double
]
->
[
Double
]
traceGroupsProxi
l
=
traceGroupsProxi
l
=
trace
(
"
\n
"
<>
"-- | "
<>
show
(
List
.
length
l
)
<>
" computed pairs of groups Similarity"
<>
"
\n
"
::
Text
)
l
trace
Phylo
(
"
\n
"
<>
"-- | "
<>
show
(
List
.
length
l
)
<>
" computed pairs of groups Similarity"
<>
"
\n
"
::
Text
)
l
src/Gargantext/Core/Viz/Phylo/TemporalMatching.hs
View file @
0405e007
...
@@ -662,7 +662,7 @@ seaLevelRise fdt similarity lambda minBranch frequency ladder rise frame periods
...
@@ -662,7 +662,7 @@ seaLevelRise fdt similarity lambda minBranch frequency ladder rise frame periods
else
else
-- start breaking up all the possible branches for the current similarity threshold
-- start breaking up all the possible branches for the current similarity threshold
let
thr
=
List
.
head
ladder
let
thr
=
List
.
head
ladder
branches'
=
trace
(
"threshold = "
<>
(
T
.
pack
$
printf
"%.3f"
thr
)
branches'
=
trace
Phylo
(
"threshold = "
<>
(
T
.
pack
$
printf
"%.3f"
thr
)
<>
" F(λ) = "
<>
(
T
.
pack
$
printf
"%.5f"
(
toPhyloQuality
fdt
lambda
frequency
(
map
fst
branches
)))
<>
" F(λ) = "
<>
(
T
.
pack
$
printf
"%.5f"
(
toPhyloQuality
fdt
lambda
frequency
(
map
fst
branches
)))
<>
" ξ = "
<>
(
T
.
pack
$
printf
"%.5f"
(
globalAccuracy
frequency
(
map
fst
branches
)))
<>
" ξ = "
<>
(
T
.
pack
$
printf
"%.5f"
(
globalAccuracy
frequency
(
map
fst
branches
)))
<>
" ρ = "
<>
(
T
.
pack
$
printf
"%.5f"
(
globalRecall
frequency
(
map
fst
branches
)))
<>
" ρ = "
<>
(
T
.
pack
$
printf
"%.5f"
(
globalRecall
frequency
(
map
fst
branches
)))
...
...
stack.yaml
View file @
0405e007
...
@@ -324,6 +324,7 @@ flags:
...
@@ -324,6 +324,7 @@ flags:
"
build-search-demo"
:
false
"
build-search-demo"
:
false
gargantext
:
gargantext
:
"
disable-db-obfuscation-executable"
:
false
"
disable-db-obfuscation-executable"
:
false
"
no-phylo-debug-logs"
:
false
"
test-crypto"
:
false
"
test-crypto"
:
false
"
generic-deriving"
:
"
generic-deriving"
:
"
base-4-9"
:
true
"
base-4-9"
:
true
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment