gargantext / haskell-gargantext

Commit 2cdbaa72, authored May 28, 2018 by Alexandre Delanoë
[CONTEXT]

Parent: 24171124
Showing 2 changed files with 43 additions and 17 deletions (+43 -17):

src/Gargantext/Text/Context.hs      (+39 -0)
src/Gargantext/Text/Parsers/CSV.hs  (+4 -17)
src/Gargantext/Text/Context.hs (new file, mode 100644)

{-|
Module      : Gargantext.Text.Context
Description :
Copyright   : (c) CNRS, 2017-Present
License     : AGPL + CECILL v3
Maintainer  : team@gargantext.org
Stability   : experimental
Portability : POSIX

Context of text management tool
-}

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}

module Gargantext.Text.Context where

import Data.Text (Text, pack, unpack, length)
import Data.String (IsString)

import Text.HTML.TagSoup

import Gargantext.Text
import Gargantext.Prelude hiding (length)

data SplitBy = Paragraph | Sentences | Chars

splitBy :: SplitBy -> Int -> Text -> [Text]
splitBy Chars     n = map pack        . chunkAlong n n . unpack
splitBy Sentences n = map unsentences . chunkAlong n n . sentences
splitBy Paragraph _ = map removeTag   . filter isTagText . parseTags
  where
    removeTag :: IsString p => Tag p -> p
    removeTag (TagText    x) = x
    removeTag (TagComment x) = x
    removeTag _              = ""
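Both the `Chars` and `Sentences` cases above go through `chunkAlong n n`, whereas the code this commit removes from CSV.hs used `chunkAlong 1000 1` and `chunkAlong 20 1`. `chunkAlong` comes from one of the imported Gargantext modules and its definition is not part of this diff. Assuming its two arguments are a window length and a step, the change replaces overlapping sliding windows with disjoint chunks. The standalone sketch below illustrates that reading with a hypothetical `windows` function; dropping a trailing window shorter than `size` is also an assumption, not a documented property of `chunkAlong`.

```haskell
-- | Windows of @size@ elements, a new window starting every @step@ elements.
-- Hypothetical stand-in for chunkAlong, NOT its real definition: it ASSUMES
-- (size, step) semantics and drops a trailing window shorter than @size@.
windows :: Int -> Int -> [a] -> [[a]]
windows size step xs
  | size <= 0 || step <= 0 = []            -- degenerate parameters: no windows
  | otherwise =
      takeWhile ((== size) . length)       -- stop at the first short window
        (map (take size) (iterate (drop step) xs))
```

Under this reading, 40 sentences give 21 overlapping windows with `windows 20 1` but two disjoint chunks with `windows 20 20`. CSV.hs now calls `splitBy splt 20` whatever the strategy, so with `Chars` each chunk would be 20 characters long, where the removed code used 1000.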
src/Gargantext/Text/Parsers/CSV.hs

...
@@ -13,7 +13,7 @@ CSV parser for Gargantext corpus files.
 {-# LANGUAGE NoImplicitPrelude #-}
 {-# LANGUAGE OverloadedStrings #-}
-{-# LANGUAGE DeriveGeneric     #-}
+{-# LANGUAGE DeriveGeneric #-}
 module Gargantext.Text.Parsers.CSV where
...
@@ -25,16 +25,15 @@ import Control.Applicative
 import Data.Char (ord)
 import Data.Csv
 import Data.Either (Either(Left, Right))
-import Data.String (IsString)
-import Data.Text (Text, pack, unpack, length)
+import Data.Text (Text, pack, length)
 import qualified Data.ByteString.Lazy as BL
 import Data.Vector (Vector)
 import qualified Data.Vector as V
 import Safe (tailMay)
-import Text.HTML.TagSoup
 import Gargantext.Text
+import Gargantext.Text.Context
 import Gargantext.Prelude hiding (length)
 ---------------------------------------------------------------
...
@@ -69,7 +68,6 @@ fromDocs docs = V.map fromDocs' docs
 -- | Split a document in its context
 -- TODO adapt the size of the paragraph according to the corpus average
-data SplitBy = Paragraph | Sentences | Chars
 splitDoc :: Mean -> SplitBy -> CsvDoc -> Vector CsvDoc
 splitDoc m splt doc = let docSize = (length $ c_abstract doc) in
...
@@ -92,21 +90,10 @@ splitDoc' splt (CsvDoc t s py pm pd abst auth) = V.fromList $ [firstDoc] <> next
     nextDocs = map (\txt -> CsvDoc (head' $ sentences txt) s py pm pd (unsentences $ tail' $ sentences txt) auth) (tail' abstracts)
-    abstracts = (splitBy splt) abst
+    abstracts = (splitBy splt 20) abst
     head' x = maybe "" identity (head x)
     tail' x = maybe [""] identity (tailMay x)
-splitBy :: SplitBy -> Text -> [Text]
-splitBy Chars     = map pack        . chunkAlong 1000 1 . unpack
-splitBy Sentences = map unsentences . chunkAlong 20 1   . sentences
-splitBy Paragraph = map removeTag   . filter isTagText  . parseTags
-  where
-    removeTag :: IsString p => Tag p -> p
-    removeTag (TagText    x) = x
-    removeTag (TagComment x) = x
-    removeTag _              = ""
 ---------------------------------------------------------------
 ---------------------------------------------------------------
 type Mean = Double
...
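In `splitDoc'`, `nextDocs` turns every chunk of the abstract after the first into a new `CsvDoc`: its title is the chunk's first sentence, its abstract is the remaining sentences, and `s`, `py`, `pm`, `pd` and `auth` are copied from the original document. The self-contained sketch below mimics only that shape. `Doc`, `headOr`, `tailOr` and the naive `sentences`/`unsentences` splitters are hypothetical stand-ins for `CsvDoc` and the Gargantext text functions, and `firstDoc`, which is elided in this diff, is not modelled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Maybe (fromMaybe, listToMaybe)
import           Data.Text  (Text)
import qualified Data.Text  as T

-- | Hypothetical two-field document; the real CsvDoc has more fields,
-- which nextDocs copies unchanged.
data Doc = Doc { docTitle :: Text, docAbstract :: Text }
  deriving (Eq, Show)

-- | Naive stand-in for `sentences`: split on '.', trim, drop empty pieces.
sentences :: Text -> [Text]
sentences = filter (not . T.null) . map T.strip . T.split (== '.')

-- | Naive stand-in for `unsentences`: join sentences with ". ".
unsentences :: [Text] -> Text
unsentences = T.intercalate ". "

-- | Same defaults as head' and tail' in the diff.
headOr :: a -> [a] -> a
headOr d = fromMaybe d . listToMaybe

tailOr :: [a] -> [a] -> [a]
tailOr d []       = d
tailOr _ (_:rest) = rest

-- | Shape of nextDocs: every chunk but the first becomes a document titled
-- by its first sentence, with the remaining sentences as its abstract.
nextDocs :: [Text] -> [Doc]
nextDocs chunks =
  [ Doc (headOr "" ss) (unsentences (tailOr [""] ss))
  | chunk <- tailOr [""] chunks
  , let ss = sentences chunk
  ]
```

Because `tail' []` is `[""]`, an empty `abstracts` list still yields one extra document built from the empty string; the sketch keeps that behaviour, so `nextDocs []` returns a single empty `Doc`.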