Przemyslaw Kaminski / haskell-gargantext
Commit 8b7506c0, authored May 17, 2019 by Alexandre Delanoë
[PARSERS] refactoring.
parent fa4332db
Showing 6 changed files with 26 additions and 26 deletions
bin/gargantext-cli/CleanCsvCorpus.hs  +1 -1
bin/gargantext-cli/Main.hs  +2 -2
package.yaml  +1 -0
src/Gargantext/Text/Parsers/CSV.hs  +17 -20
src/Gargantext/Text/Search.hs  +3 -3
stack.yaml  +2 -0
bin/gargantext-cli/CleanCsvCorpus.hs
@@ -29,7 +29,7 @@ import qualified Gargantext.Text.Parsers.CSV as CSV
 type Query = [S.Term]
-filterDocs :: [DocId] -> Vector CSV.Doc -> Vector CSV.Doc
+filterDocs :: [DocId] -> Vector CSV.CsvGargV3 -> Vector CSV.CsvGargV3
 filterDocs docIds = V.filter (\doc -> S.member (CSV.d_docId doc) $ S.fromList docIds)
bin/gargantext-cli/Main.hs
@@ -57,7 +57,7 @@ import Gargantext.Core.Types
 import Gargantext.Text.Terms
 import Gargantext.Text.Context
 import Gargantext.Text.Terms.WithList
-import Gargantext.Text.Parsers.CSV (readCsv, csv_title, csv_abstract, csv_publication_year)
+import Gargantext.Text.Parsers.CSV (readFile, csv_title, csv_abstract, csv_publication_year)
 import Gargantext.Text.List.CSV (csvGraphTermList)
 import Gargantext.Text.Terms (terms)
 import Gargantext.Text.Metrics.Count (coocOnContexts, Coocs)
@@ -105,7 +105,7 @@ main = do
     . DV.toList
     . DV.map (\n -> (csv_publication_year n, [(csv_title n) <> " " <> (csv_abstract n)]))
     . snd
-    <$> readCsv corpusFile
+    <$> readFile corpusFile
   -- termListMap :: [Text]
   termList <- csvGraphTermList termListFile
package.yaml
@@ -96,6 +96,7 @@ library:
   - conduit-extra
   - containers
   - contravariant
+  - crawlerPubMed
   - data-time-segment
   - directory
   - duckling
src/Gargantext/Text/Parsers/CSV.hs
@@ -46,7 +46,7 @@ headerCsvGargV3 = header [ "title"
                          , "authors"
                          ]
 ---------------------------------------------------------------
-data Doc = Doc
+data CsvGargV3 = CsvGargV3
     { d_docId :: !Int
     , d_title :: !Text
     , d_source :: !Text
@@ -59,9 +59,8 @@ data Doc = Doc
     deriving (Show)
 ---------------------------------------------------------------
 -- | Doc 2 HyperdataDocument
-doc2hyperdataDocument :: Doc -> HyperdataDocument
---doc2hyperdataDocument (Doc did dt ds dpy dpm dpd dab dau) =
-doc2hyperdataDocument (Doc did dt _ dpy dpm dpd dab dau) =
+toDoc :: CsvGargV3 -> HyperdataDocument
+toDoc (CsvGargV3 did dt _ dpy dpm dpd dab dau) =
   HyperdataDocument (Just "CSV")
                     (Just . pack . show $ did)
                     Nothing
@@ -82,25 +81,22 @@ doc2hyperdataDocument (Doc did dt _ dpy dpm dpd dab dau) =
                     Nothing
                     Nothing
 ---------------------------------------------------------------
 -- | Types Conversions
-toDocs :: Vector CsvDoc -> [Doc]
+toDocs :: Vector CsvDoc -> [CsvGargV3]
 toDocs v = V.toList
          $ V.zipWith (\nId (CsvDoc t s py pm pd abst auth)
-                       -> Doc nId t s py pm pd abst auth )
+                       -> CsvGargV3 nId t s py pm pd abst auth )
                        (V.enumFromN 1 (V.length v'')) v''
   where
     v'' = V.foldl (\v' sep -> V.concatMap (splitDoc (docsSize v') sep) v') v seps
     seps = (V.fromList [Paragraphs 1, Sentences 3, Chars 3])
 ---------------------------------------------------------------
-fromDocs :: Vector Doc -> Vector CsvDoc
+fromDocs :: Vector CsvGargV3 -> Vector CsvDoc
 fromDocs docs = V.map fromDocs' docs
   where
-    fromDocs' (Doc _ t s py pm pd abst auth) = (CsvDoc t s py pm pd abst auth)
+    fromDocs' (CsvGargV3 _ t s py pm pd abst auth) = (CsvDoc t s py pm pd abst auth)
 ---------------------------------------------------------------
 -- | Split a document in its context
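The toDocs / fromDocs pair in the hunk above numbers documents from 1 on the way in and drops the id on the way out. A minimal sketch of that round trip, under loud assumptions: the two records here are hypothetical cut-down versions with only a title and an abstract, plain lists stand in for Data.Vector, and the document-splitting step (splitDoc over seps) is left out entirely.

```haskell
-- Simplified, hypothetical versions of the records: the real CsvDoc and
-- CsvGargV3 carry more fields (source, publication date, authors).
data CsvDoc = CsvDoc
  { csv_title    :: String
  , csv_abstract :: String
  } deriving (Eq, Show)

data CsvGargV3 = CsvGargV3
  { d_docId    :: Int
  , d_title    :: String
  , d_abstract :: String
  } deriving (Eq, Show)

-- Attach consecutive ids starting at 1, as V.enumFromN 1 does in the diff.
toDocs :: [CsvDoc] -> [CsvGargV3]
toDocs = zipWith (\nId (CsvDoc t a) -> CsvGargV3 nId t a) [1 ..]

-- Drop the id again; the remaining fields are copied unchanged.
fromDocs :: [CsvGargV3] -> [CsvDoc]
fromDocs = map (\(CsvGargV3 _ t a) -> CsvDoc t a)

main :: IO ()
main = do
  let docs = [CsvDoc "First" "One.", CsvDoc "Second" "Two."]
  print (map d_docId (toDocs docs))      -- [1,2]
  print (fromDocs (toDocs docs) == docs) -- True
```

In this simplified form fromDocs . toDocs is the identity. That need not hold for the real toDocs, since splitting can turn one CsvDoc into several CsvGargV3 rows.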
@@ -201,25 +197,25 @@ delimiter = fromIntegral $ ord '\t'
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 readCsvOn :: [CsvDoc -> Text] -> FilePath -> IO [Text]
-readCsvOn fields fp = V.toList <$> V.map (\l -> intercalate (pack " ") $ map (\field -> field l) fields) <$> snd <$> readFile fp
+readCsvOn fields fp = V.toList <$> V.map (\l -> intercalate (pack " ") $ map (\field -> field l) fields) <$> snd <$> readFile fp
 ------------------------------------------------------------------------
-readFileLazy :: (FromNamedRecord a) => a -> FilePath -> IO (Header, Vector a)
+readFileLazy :: (FromNamedRecord a) => proxy a -> FilePath -> IO (Header, Vector a)
 readFileLazy f = fmap (readByteStringLazy f) . BL.readFile
-readFileStrict :: (FromNamedRecord a) => a -> FilePath -> IO (Header, Vector a)
+readFileStrict :: (FromNamedRecord a) => proxy a -> FilePath -> IO (Header, Vector a)
 readFileStrict f = fmap (readByteStringStrict f) . BS.readFile
-readByteStringLazy :: (FromNamedRecord a) => a -> BL.ByteString -> (Header, Vector a)
-readByteStringLazy f bs = case decodeByNameWith csvDecodeOptions bs of
+readByteStringLazy :: (FromNamedRecord a) => proxy a -> BL.ByteString -> (Header, Vector a)
+readByteStringLazy _f bs = case decodeByNameWith csvDecodeOptions bs of
     Left e -> panic (pack e)
     Right csvDocs -> csvDocs
-readByteStringStrict :: (FromNamedRecord a) => a -> BS.ByteString -> (Header, Vector a)
+readByteStringStrict :: (FromNamedRecord a) => proxy a -> BS.ByteString -> (Header, Vector a)
 readByteStringStrict ff = (readByteStringLazy ff) . BL.fromStrict
 ------------------------------------------------------------------------
@@ -227,6 +223,7 @@ readByteStringStrict ff = (readByteStringLazy ff) . BL.fromStrict
 readFile :: FilePath -> IO (Header, Vector CsvDoc)
 readFile = fmap readCsvLazyBS . BL.readFile
+-- | TODO use readByteStringLazy
 readCsvLazyBS :: BL.ByteString -> (Header, Vector CsvDoc)
 readCsvLazyBS bs = case decodeByNameWith csvDecodeOptions bs of
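The signatures in CSV.hs change their first argument from `a` to `proxy a`, so a caller no longer needs a real record value just to pick the result type. A minimal sketch of that pattern using only base: `decodeAs` is a hypothetical stand-in for readByteStringLazy, and `Read` with `readMaybe` replaces cassava's `FromNamedRecord` decoding so the example runs without extra packages.

```haskell
import Data.Proxy (Proxy (..))
import Text.Read (readMaybe)

-- Hypothetical stand-in for readByteStringLazy: the first argument only
-- fixes the result type `a`; its value is never inspected.
decodeAs :: Read a => proxy a -> String -> Maybe a
decodeAs _ = readMaybe

main :: IO ()
main = do
  -- The usual way to pick the type: a Proxy value.
  print (decodeAs (Proxy :: Proxy Int) "42")        -- Just 42
  print (decodeAs (Proxy :: Proxy Int) "forty-two") -- Nothing
  -- `proxy` is a type variable, so any type constructor applied to `a`
  -- works as the argument, for example Maybe.
  print (decodeAs (Just True) "False")              -- Just False
```

With this shape, the TODO above could presumably be met by passing something like `Proxy :: Proxy CsvDoc` to readByteStringLazy, though the commit itself does not show that call.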
src/Gargantext/Text/Search.hs
@@ -35,7 +35,7 @@ import Gargantext.Text.Parsers.CSV
 type DocId = Int
 type DocSearchEngine = SearchEngine
-                         Doc
+                         CsvGargV3
                          DocId
                          DocField
                          NoFeatures
@@ -48,7 +48,7 @@ initialDocSearchEngine :: DocSearchEngine
 initialDocSearchEngine =
     initSearchEngine docSearchConfig defaultSearchRankParameters
-docSearchConfig :: SearchConfig Doc DocId DocField NoFeatures
+docSearchConfig :: SearchConfig CsvGargV3 DocId DocField NoFeatures
 docSearchConfig =
     SearchConfig {
       documentKey = d_docId,
@@ -57,7 +57,7 @@ docSearchConfig =
       documentFeatureValue = const noFeatures
     }
   where
-    extractTerms :: Doc -> DocField -> [Text]
+    extractTerms :: CsvGargV3 -> DocField -> [Text]
     extractTerms doc TitleField = monoTexts (d_title doc)
     extractTerms doc AbstractField = monoTexts (d_abstract doc)
stack.yaml
@@ -25,6 +25,8 @@ extra-deps:
     commit: 3fe28b683aba5ddf05e3b5f8eced0bd05c5a29f9
   - git: https://github.com/robstewart57/rdf4h.git
     commit: 4fd2edf30c141600ffad6d730cc4c1c08a6dbce4
+  - git: https://gitlab.iscpif.fr/gargantext/crawlers/pubmed
+    commit: dcaa0f5dd53f20648f4f5a615d29163582a4219c
   #- opaleye-0.6.7002.0
   - KMP-0.1.0.2
   - accelerate-1.2.0.0