Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
P
pubmed
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
2
Issues
2
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
crawlers
pubmed
Commits
c2e3e77c
Commit
c2e3e77c
authored
Jul 05, 2019
by
Alexandre Delanoë
Browse files
Options
Browse Files
Download
Plain Diff
[MERGE]
parents
dcaa0f5d
8504d4c5
Changes
4
Show whitespace changes
Inline
Side-by-side
Showing
4 changed files
with
65 additions
and
5 deletions
+65
-5
Main.hs
app/Main.hs
+58
-2
package.yaml
package.yaml
+2
-0
Client.hs
src/PUBMED/Client.hs
+1
-0
Parser.hs
src/PUBMED/Parser.hs
+4
-3
No files found.
app/Main.hs
View file @
c2e3e77c
...
...
@@ -2,7 +2,63 @@
module
Main
where
import
qualified
PUBMED
as
PubMed
import
PUBMED.Client
import
PUBMED.Parser
import
Network.HTTP.Client
(
newManager
)
import
Network.HTTP.Client.TLS
(
tlsManagerSettings
)
import
Servant.Client
(
runClientM
,
mkClientEnv
,
BaseUrl
(
..
),
Scheme
(
..
))
import
Text.XML
(
parseLBS_
,
def
)
import
Text.XML.Cursor
(
fromDocument
,
Cursor
)
import
Data.Conduit
(
ConduitT
)
import
Data.ByteString.Lazy
(
ByteString
)
import
Data.ByteString.Char8
(
pack
)
import
Control.Monad.Catch
(
MonadThrow
)
import
Control.Applicative
import
Data.Attoparsec.ByteString
import
Data.Attoparsec.ByteString.Char8
(
anyChar
)
import
qualified
Data.ByteString.Lazy
as
LBS
import
qualified
Data.ByteString
as
DB
import
qualified
Data.Text
as
T
runParser
::
Show
res
=>
(
Cursor
->
res
)
->
LBS
.
ByteString
->
res
runParser
parser
=
parser
.
fromDocument
.
parseLBS_
def
-- | TODO this parser need at least one subs at the end
-- (use endOfInput)
removeSub
::
Parser
ByteString
removeSub
=
do
dt
<-
many
textWithBalise
pure
$
LBS
.
fromStrict
$
pack
$
concat
dt
where
textWithBalise
=
manyTill
anyChar
(
try
subs
)
subs
=
sub
<|>
asub
-- <|> isEndOfInput
sub
=
string
"<sub>"
asub
=
string
"</sub>"
runSimpleFindPubmedAbstractRequest
::
T
.
Text
->
IO
[
PubMed
]
runSimpleFindPubmedAbstractRequest
rq
=
do
manager'
<-
newManager
tlsManagerSettings
res
<-
runClientM
(
search
(
Just
rq
))
(
mkClientEnv
manager'
$
BaseUrl
Https
"eutils.ncbi.nlm.nih.gov"
443
"entrez/eutils"
)
case
res
of
(
Left
err
)
->
return
[]
(
Right
(
BsXml
docs
))
->
do
let
docIds
=
runParser
parseDocId
docs
res'
<-
runClientM
(
fetch
(
Just
"pubmed"
)
(
Just
"abstract"
)
docIds
)
(
mkClientEnv
manager'
$
BaseUrl
Https
"eutils.ncbi.nlm.nih.gov"
443
"entrez/eutils"
)
case
res'
of
(
Left
err
)
->
return
[]
(
Right
(
BsXml
abstracts
))
->
do
-- TODO remove "</sub>" maybe there is a cleaner way with isEndOfInput
case
(
parseOnly
removeSub
$
LBS
.
toStrict
abstracts
<>
"</sub>"
)
of
(
Left
_
)
->
return
[]
(
Right
v
)
->
pubMedParser
v
main
::
IO
()
main
=
PubMed
.
crawler
"organ"
>>=
print
main
=
do
pubmeds
<-
runSimpleFindPubmedAbstractRequest
"bisphenol"
print
pubmeds
package.yaml
100644 → 100755
View file @
c2e3e77c
...
...
@@ -35,6 +35,8 @@ dependencies:
-
time
-
data-time-segment
-
protolude
-
attoparsec
-
either
library
:
source-dirs
:
src
...
...
src/PUBMED/Client.hs
View file @
c2e3e77c
...
...
@@ -43,6 +43,7 @@ pubmedApi = Proxy
search
::
Maybe
T
.
Text
->
ClientM
BsXml
fetch
::
Maybe
T
.
Text
->
Maybe
T
.
Text
->
[
Integer
]
...
...
src/PUBMED/Parser.hs
View file @
c2e3e77c
...
...
@@ -38,7 +38,8 @@ parseDocId cursor = fst <$>
identity
::
a
->
a
identity
x
=
x
manyTagsUntil
::
MonadThrow
m
=>
Name
manyTagsUntil
::
MonadThrow
m
=>
Name
->
ConduitT
Event
o
m
b
->
ConduitT
Event
o
m
(
Maybe
b
)
manyTagsUntil
name
f
=
do
...
...
@@ -97,11 +98,11 @@ pubMedParser bstring = runConduit $ parseLBS def bstring
parseArticleSet
::
MonadThrow
m
=>
ConduitT
Event
PubMed
m
()
parseArticleSet
=
force
"
force"
$
tagIgnoreAttrs
"PubmedArticleSet"
$
manyYield
parsePubMedArticle
force
"
PubmedArticleSet required"
$
manyTagsUntil
"PubmedArticleSet"
$
manyYield
parsePubMedArticle
parsePubMedArticle
::
MonadThrow
m
=>
ConduitT
Event
o
m
(
Maybe
PubMed
)
parsePubMedArticle
=
tagIgnoreAttrs
"PubmedArticle"
parsePubMedArticle'
manyTagsUntil
"PubmedArticle"
parsePubMedArticle'
parsePubMedArticle'
::
MonadThrow
m
=>
ConduitT
Event
o
m
PubMed
parsePubMedArticle'
=
do
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment