gargantext / haskell-gargantext
Commit 8f2332b3, authored Oct 01, 2018 by Mael NICOLAS
The title parser works; the text parser still needs fixing, and Pandoc still needs to be applied to the text.
Parent: 73bccfaf
Showing 2 changed files with 45 additions and 0 deletions
package.yaml (+3 -0)
src/Gargantext/Text/Parsers/Wikimedia.hs (+42 -0)
package.yaml
@@ -50,6 +50,7 @@ library:
   - Gargantext.Text.Metrics.Count
   - Gargantext.Text.Parsers.CSV
   - Gargantext.Text.Parsers.Date
+  - Gargantext.Text.Parsers.Wikimedia
   - Gargantext.Text.Parsers.WOS
   - Gargantext.Text.Search
   - Gargantext.Text.Terms
@@ -150,6 +151,8 @@ library:
   - wai-cors
   - wai-extra
   - warp
+  - xml-conduit
+  - xml-types
   - yaml
   - zip
   - zlib
src/Gargantext/Text/Parsers/Wikimedia.hs (new file, mode 100644)
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NoImplicitPrelude #-}

module Gargantext.Text.Parsers.Wikimedia where

import Prelude (print)
import Gargantext.Prelude
import Text.XML.Stream.Parse
import Control.Monad.Catch
import Data.ByteString.Lazy
import Data.Conduit
import Data.XML.Types (Event)
import Data.Text as T

data Page = Page
  { _title :: T.Text
  , _text :: Maybe T.Text
  }
  deriving (Show)

runParser :: IO ()
runParser = do
  file <- readFile "text.xml"
  page <- runConduit $ parseLBS def file .| force "page required" parsePage
  print page

parseRevision :: MonadThrow m => ConduitT Event o m (Maybe T.Text)
parseRevision = tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}revision" $ do
  text <- force "text is missing" $ tagIgnoreAttrs "{http://www.mediawiki.org/xml/export-0.10/}text" content
  many_ $ ignoreAnyTreeContent
  return text

parsePage :: MonadThrow m => ConduitT Event o m (Maybe Page)
parsePage = tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}page" $ do
  title <- force "title is missing" $ tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}title" content
  revision <- parseRevision
  many_ $ ignoreAnyTreeContent
  return $ Page title revision

parseMediawiki :: MonadThrow m => ConduitT Event Page m (Maybe ())
parseMediawiki = tagIgnoreAttrs "{http://www.mediawiki.org/xml/export-0.10/}mediawiki" $ manyYield' parsePage
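Note on the text parser: the commit message says the text parser still needs fixing. One plausible cause, assuming the input follows the usual MediaWiki export-0.10 layout, is that parsePage expects <revision> directly after <title>, and parseRevision expects <text> as the first child of <revision>, whereas exports normally place elements such as <ns>, <id> and <timestamp> in between. The following is a minimal sketch of one way to skip those siblings; it is not the author's eventual fix. skipUntil, parsePages and sample are hypothetical names introduced here for illustration, and the standard Prelude is used instead of Gargantext.Prelude so that the example is self-contained. Applying Pandoc to the extracted text is not covered.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (join, void)
import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as BL
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Page = Page
  { _title :: Text
  , _text  :: Maybe Text
  } deriving (Show, Eq)

-- Hypothetical helper (not part of xml-conduit): try parser p; if it does
-- not match the next element, skip one sibling and retry. Returns Nothing
-- once the enclosing element has no siblings left.
skipUntil :: MonadThrow m
          => ConduitT Event o m (Maybe a) -> ConduitT Event o m (Maybe a)
skipUntil p = do
  result <- p
  case result of
    Just x  -> return (Just x)
    Nothing -> do
      skipped <- ignoreAnyTreeContent
      case skipped of
        Just () -> skipUntil p
        Nothing -> return Nothing

-- Inner Maybe: the revision may lack a <text> element.
parseRevision :: MonadThrow m => ConduitT Event o m (Maybe (Maybe Text))
parseRevision =
  tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}revision" $ do
    -- Skip <id>, <timestamp>, ... until <text> is found.
    text <- skipUntil $
      tagIgnoreAttrs "{http://www.mediawiki.org/xml/export-0.10/}text" content
    many_ ignoreAnyTreeContent
    return text

parsePage :: MonadThrow m => ConduitT Event o m (Maybe Page)
parsePage =
  tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}page" $ do
    title <- force "title is missing" $
      tagNoAttr "{http://www.mediawiki.org/xml/export-0.10/}title" content
    -- Skip <ns>, <id>, ... until <revision> is found.
    revision <- skipUntil parseRevision
    many_ ignoreAnyTreeContent
    return $ Page title (join revision)

parseMediawiki :: MonadThrow m => ConduitT Event Page m (Maybe ())
parseMediawiki =
  tagIgnoreAttrs "{http://www.mediawiki.org/xml/export-0.10/}mediawiki" $
    manyYield' parsePage

-- Stream every page of a dump into a list (illustrative entry point).
parsePages :: BL.ByteString -> IO [Page]
parsePages input =
  runConduit $ parseLBS def input .| void parseMediawiki .| CL.consume

-- Hand-written sample input, not taken from a real dump.
sample :: BL.ByteString
sample = BL.concat
  [ "<mediawiki xmlns=\"http://www.mediawiki.org/xml/export-0.10/\">"
  , "<page><title>Haskell</title><ns>0</ns><id>1</id>"
  , "<revision><id>2</id><text bytes=\"5\">hello</text></revision>"
  , "</page></mediawiki>"
  ]

main :: IO ()
main = parsePages sample >>= mapM_ print
```

Running main on the sample should print a single Page whose title is "Haskell" and whose text is present. Because parseMediawiki yields pages as they are parsed, CL.consume could be replaced by a streaming sink for large dumps.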