Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
A
arxiv-api
Project
Project
Details
Activity
Releases
Cycle Analytics
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Charts
Issues
0
Issues
0
List
Board
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Charts
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Charts
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
gargantext
crawlers
arxiv-api
Commits
f3e517cc
Commit
f3e517cc
authored
Apr 09, 2022
by
Przemyslaw Kaminski
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
Fix for checking if page is empty to break out of conduit
parent
3a61fb84
Changes
2
Hide whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
25 additions
and
14 deletions
+25
-14
Main.hs
app/Main.hs
+4
-1
Arxiv.hs
src/Arxiv.hs
+21
-13
No files found.
app/Main.hs
View file @
f3e517cc
...
...
@@ -5,9 +5,12 @@ import Conduit
main
::
IO
()
main
=
do
(
cnt
,
resC
)
<-
searchAxv'
$
simpleQuery
[
"b
anach
space"
]
(
cnt
,
resC
)
<-
searchAxv'
$
simpleQuery
[
"b
esov
space"
]
putStrLn
$
"Total count: "
<>
show
cnt
runConduitRes
$
resC
.|
outSnk
--let resC = searchAxv $ simpleQuery ["besov fluid"]
--runConduitRes $ resC .| outSnk
src/Arxiv.hs
View file @
f3e517cc
...
...
@@ -27,7 +27,7 @@ type Limit = Int
type
Soup
=
Tag
String
batchSize
::
Int
batchSize
=
50
batchSize
=
50
0
------------------------------------------------------------
...
...
@@ -41,7 +41,7 @@ batchSize = 50
apiSimple
::
Maybe
Limit
->
[
String
]
->
IO
[
Result
]
apiSimple
limit
query
=
execQuery
limit
$
simpleQuery
query
apiSimpleC
::
MonadResource
m
=>
Maybe
Limit
->
[
String
]
->
IO
(
Int
,
C
.
ConduitT
()
Result
m
()
)
apiSimpleC
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
Maybe
Limit
->
[
String
]
->
IO
(
Int
,
C
.
ConduitT
()
Result
m
()
)
apiSimpleC
Nothing
query
=
searchAxv'
$
simpleQuery
query
apiSimpleC
(
Just
limit
)
query
=
do
(
cnt
,
resC
)
<-
searchAxv'
$
simpleQuery
query
...
...
@@ -87,7 +87,7 @@ searchAxvBody q =
(
Status
200
_
)
->
pure
$
getResponseBody
rsp
st
->
error
$
"Error:"
++
show
st
searchAxv'
::
MonadResource
m
=>
Ax
.
Query
->
IO
(
Int
,
C
.
ConduitT
()
Result
m
()
)
searchAxv'
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
Ax
.
Query
->
IO
(
Int
,
C
.
ConduitT
()
Result
m
()
)
searchAxv'
q
=
let
s
=
Ax
.
mkQuery
q
in
do
rsp
<-
HT
.
httpBS
=<<
liftIO
(
parseRequest
s
)
...
...
@@ -97,12 +97,18 @@ searchAxv' q =
soups
@
(
s
:
_
)
->
pure
(
Ax
.
totalResults
soups
,
results
q
soups
)
st
->
error
$
"Error:"
++
show
st
searchAxv
::
MonadResource
m
=>
Ax
.
Query
->
C
.
ConduitT
()
Result
m
()
searchAxv
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
Ax
.
Query
->
C
.
ConduitT
()
Result
m
()
searchAxv
q
=
let
s
=
Ax
.
mkQuery
q
in
do
rsp
<-
HT
.
httpBS
=<<
liftIO
(
parseRequest
s
)
case
getResponseStatus
rsp
of
(
Status
200
_
)
->
getSoupC
(
getResponseBody
rsp
)
>>=
results
q
(
Status
200
_
)
->
let
body
=
getResponseBody
rsp
in
-- If no mo results, break, otherwise this keeps looping for next page forever
case
Ax
.
getEntry
(
getSoup
body
)
of
(
[]
,
_
)
->
C
.
yieldMany
[]
_
->
getSoupC
body
>>=
results
q
st
->
error
$
"Error:"
++
show
st
----------------------------------------------------------------------
...
...
@@ -114,8 +120,8 @@ getSoup b = parseTags $ B.unpack b
--getSoupC :: MonadResource m =>
-- B.ByteString -> C.ConduitT () a m [Soup]
getSoupC
::
Monad
m
=>
B
.
ByteString
->
C
.
ConduitT
()
a
m
[
Soup
]
getSoupC
b
=
concat
<$>
(
C
.
yield
b
.|
toSoup
.|
CL
.
consume
)
getSoupC
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
B
.
ByteString
->
C
.
ConduitT
()
a
m
[
Soup
]
getSoupC
b
=
concat
<$>
(
C
.
yield
b
.|
C
.
mapM_C
(
liftIO
.
putStrLn
.
show
)
.|
toSoup
.|
CL
.
consume
)
----------------------------------------------------------------------
...
...
@@ -128,7 +134,7 @@ toSoup = C.awaitForever (C.yield . parseTags . B.unpack)
----------------------------------------------------------------------
-- Yield all entries and fetch next page
----------------------------------------------------------------------
results
::
MonadResource
m
=>
results
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
Ax
.
Query
->
[
Soup
]
->
C
.
ConduitT
()
Result
m
()
results
q
sp
=
Ax
.
forEachEntryM
sp
(
C
.
yield
.
mkResult
)
>>
searchAxv
(
Ax
.
nextPage
q
)
...
...
@@ -140,6 +146,7 @@ data Result = Result { abstract :: String
,
authors
::
[
Ax
.
Author
]
,
categories
::
[
Ax
.
Category
]
,
doi
::
String
,
id
::
String
,
journal
::
String
,
primaryCategory
::
Maybe
Ax
.
Category
,
publication_date
::
String
...
...
@@ -153,14 +160,15 @@ mkResult :: [Soup] -> Result
mkResult
sp
=
let
abstract
=
Ax
.
getSummary
sp
&
clean'
authors
=
Ax
.
getAuthors
sp
categories
=
Ax
.
getCategories
sp
doi
=
Ax
.
getDoi
sp
doi
=
Ax
.
getDoi
sp
id
=
Ax
.
getId
sp
journal
=
Ax
.
getJournal
sp
primaryCategory
=
Ax
.
getPrimaryCategory
sp
publication_date
=
Ax
.
getPublished
sp
title
=
Ax
.
getTitle
sp
&
clean'
total
=
Ax
.
totalResults
sp
url
=
Ax
.
getPdf
sp
year
=
readMaybe
$
Ax
.
getYear
sp
total
=
Ax
.
totalResults
sp
url
=
Ax
.
getPdf
sp
year
=
readMaybe
$
Ax
.
getYear
sp
in
(
Result
{
..
}
)
where
clean'
x
=
let
x'
=
clean
[
'
\n
'
,
'
\r
'
,
'
\t
'
]
x
in
if
null
x'
then
"Not found"
else
x'
...
...
@@ -172,6 +180,6 @@ mkResult sp = let abstract = Ax.getSummary sp & clean'
----------------------------------------------------------------------
-- Sink results To print
----------------------------------------------------------------------
outSnk
::
MonadResource
m
=>
C
.
ConduitT
Result
C
.
Void
m
()
outSnk
::
(
Monad
m
,
C
.
MonadIO
m
)
=>
C
.
ConduitT
Result
C
.
Void
m
()
outSnk
=
C
.
awaitForever
(
liftIO
.
putStrLn
.
show
)
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment