Grégoire Locqueville / haskell-gargantext
Commit 916be24b, authored Sep 01, 2022 by Alexandre Delanoë
[WIP] how to clean text data coming from a Gutenberg book
parent 91b97fbd
Showing 1 changed file with 49 additions and 0 deletions
src/Gargantext/Core/Text/Clean.hs (0 → 100644)
{-|
Module      : Gargantext.Core.Text.Clean
Description : Tools to clean text
Copyright   : (c) CNRS, 2017 - present
License     : AGPL + CECILL v3
Maintainer  : team@gargantext.org
Stability   : experimental
Portability : POSIX

Clean a text before importing it.

For a given language, choose a great masterpiece of literature and analyze
it with GarganText. Here is an example with a famous French writer
who could be the incarnation of the mythic Gargantua.
-}

{-# LANGUAGE OverloadedStrings #-}

module Gargantext.Core.Text.Clean
  where

import Gargantext.Prelude
import Data.Text (Text)
import qualified Data.Text as Text
import qualified Data.List as List

groupLines :: [Text] -> [Text]
groupLines (a:x:xs) = undefined

cleanText :: Text -> [Text]
cleanText txt = List.filter (/= "")
              $ toParagraphs
              $ Text.lines
              $ Text.replace "--" ""       -- remove the dashes that introduce dialogue lines
              $ Text.replace "\xd" "" txt  -- remove carriage returns

toParagraphs :: [Text] -> [Text]
toParagraphs (a:x:xs) =
  if a == ""
     then [a] <> toParagraphs (x:xs)
     else if x == ""
             then [a] <> toParagraphs (x:xs)
             else toParagraphs $ [a <> " " <> x] <> xs
toParagraphs [a] = [a]
toParagraphs []  = []
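
To make the behaviour of the pipeline above easier to check, here is a minimal standalone sketch. It assumes only the `text` package rather than `Gargantext.Prelude`, rewrites `toParagraphs` with guards (intended to behave the same as the nested `if` version in the commit), and uses a hypothetical `sample` input that is not part of the commit.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text as Text

-- Merge consecutive non-empty lines into one paragraph.
-- Empty lines are kept as separators at this stage.
-- Assumption: same logic as the commit's toParagraphs, written with guards.
toParagraphs :: [Text] -> [Text]
toParagraphs (a : x : xs)
  | a == "" || x == "" = a : toParagraphs (x : xs)
  | otherwise          = toParagraphs ((a <> " " <> x) : xs)
toParagraphs xs = xs  -- zero or one line left: nothing to merge

-- Same steps as the commit's cleanText, in point-free style.
cleanText :: Text -> [Text]
cleanText = filter (/= "")          -- drop the empty separator lines
          . toParagraphs
          . Text.lines
          . Text.replace "--" ""    -- strip dialogue dashes
          . Text.replace "\r" ""    -- strip carriage returns ("\xd")

-- Hypothetical sample with CRLF line endings and a dialogue dash.
sample :: Text
sample = Text.unlines
  [ "--Bonjour, dit-il.\r"
  , "\r"
  , "Il faisait beau\r"
  , "ce matin.\r"
  ]
```

Loaded in GHCi, `cleanText sample` should give two paragraphs: the dialogue line without its leading dashes, and the last two lines joined by a space. Note that `Text.replace "--" ""` removes every double dash, not only those at the start of a line, which may or may not be what is wanted for a given book.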