gargantext / crawlers / hal

Commit 000e2322, authored Sep 16, 2019 by Mael NICOLAS

Merge branch 'patch-2' into 'dev'

Patch 2

See merge request !6

Parents: 39f9a367, 7ecef6a1

Showing 3 changed files with 59 additions and 7 deletions:

README.md    +45 -1
app/Main.hs  +3 -4
src/HAL.hs   +11 -2

README.md
# halCrawler
# HAL API Crawler
## API documentation
## Base website
https://api.archives-ouvertes.fr/docs
## Usage
### Entry function
The basic entry point of this crawler is the function `HAL.getMetadataWith`.
This function takes a `Text` representing the query you want to run on HAL and a `Maybe Int` representing the maximum number of results you want to get.
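
To make the two arguments concrete, here is a minimal sketch of how a query and an optional limit could end up in a request URL. It is not the crawler's code (the crawler goes through `servant-client`); `searchUrl` and `percentEncode` are hypothetical helpers, and the `/search/` path and the Solr-style `q` and `rows` parameter names are assumptions about the HAL API, not something this README states.

```hs
module Main where

import Data.Char (isAlphaNum, isAscii, ord)
import Data.List (intercalate)
import Data.Maybe (catMaybes)
import Text.Printf (printf)

-- | Percent-encode a string for a URL query.
-- Sketch only: correct for ASCII input, non-ASCII would need UTF-8 encoding first.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = printf "%%%02X" (ord c)

-- | Hypothetical helper: the query is always sent, the limit only when present.
-- The endpoint path and parameter names are assumptions.
searchUrl :: String -> Maybe Int -> String
searchUrl query limit =
  "https://api.archives-ouvertes.fr/search/?" ++ intercalate "&" params
  where
    params =
      catMaybes
        [ Just ("q=" ++ percentEncode query)
        , fmap (\n -> "rows=" ++ show n) limit
        ]

main :: IO ()
main = do
  putStrLn (searchUrl "artificial intelligence" (Just 10))
  putStrLn (searchUrl "ia" Nothing)
```

Passing `Nothing` as the limit simply leaves the parameter out, so the server-side default applies.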
### Return Type
The return type is a bit trickier: it is **either** a `ClientError` or a `HAL.Client.Response Corpus`.
In use, it works exactly like the `Documents` type of the other crawlers, but it makes the implementation simpler.
`Response x` represents a collection of `x` and the number of `x` returned.
`HAL.Doc.Corpus.Corpus` is a simple type that contains all the information we need (id, title, abstract, publicationDate, sources).
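
The shape of the result can be sketched with local stand-in types. Everything below is an assumption made for illustration: the real `Response` and `Corpus` live in `HAL.Client` and `HAL.Doc.Corpus` and may differ, `_numFound`, `corpusId`, and `corpusTitle` are hypothetical field names, and `String` stands in for Servant's `ClientError`. Only the `_docs` accessor is taken from the example in this README.

```hs
module Main where

-- Local stand-in, NOT the real HAL.Client.Response.
data Response a = Response
  { _numFound :: Int -- hypothetical name for the number of results
  , _docs :: [a]     -- accessor name used in the README example
  }
  deriving (Show)

-- Local stand-in with only two of the fields the README lists.
data Corpus = Corpus
  { corpusId :: Int
  , corpusTitle :: String
  }
  deriving (Show)

-- | Turn either branch of the result into printable lines.
-- String replaces ClientError so the sketch needs nothing beyond base.
summarize :: Either String (Response Corpus) -> [String]
summarize (Left err) = ["request failed: " ++ err]
summarize (Right resp) =
  ("found " ++ show (_numFound resp) ++ " document(s)")
    : map corpusTitle (take 5 (_docs resp))

main :: IO ()
main = do
  let ok = Right (Response 2 [Corpus 1 "First title", Corpus 2 "Second title"])
  mapM_ putStrLn (summarize ok)
  mapM_ putStrLn (summarize (Left "connection refused"))
```

Handling both constructors of `Either` in one function keeps the error path and the success path next to each other, which is the same pattern the `case res of` block in the example follows.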
### Example
Here is a basic `main` that uses the entry point of the crawler and prints the first 5 documents.
```hs
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Servant.Client

import HAL (getMetadataWith)
import HAL.Client
import HAL.Doc
import Tree

main :: IO ()
main = do
  res <- getMetadataWith "artificial intelligence" (Just 10)
  case res of
    (Left err) -> print err
    (Right val) -> print $ take 5 $ _docs val
```

app/Main.hs

@@ -6,15 +6,14 @@ import Network.HTTP.Client (newManager)
 import Network.HTTP.Client.TLS (tlsManagerSettings)
 import Servant.Client
 
-import HAL (runSearchRequest)
+import HAL (getMetadataWith)
 import HAL.Client
 import HAL.Doc
 import Tree
 
 main :: IO ()
 main = do
-  manager' <- newManager tlsManagerSettings
-  res <- runSearchRequest $ ["ia"]
+  res <- getMetadataWith "artificial intelligence" (Just 10)
   case res of
-    (Left err) -> print err
+    (Left err) -> print err
     (Right val) -> print $ _docs val

src/HAL.hs

@@ -13,6 +13,15 @@ import Servant.Client (BaseUrl(..), Scheme(..), ClientM, ClientError, runClientM
 import HAL.Client
 import HAL.Doc.Corpus
 
+getMetadataWith :: Text -> Maybe Int -> IO (Either ClientError (Response Corpus))
+getMetadataWith q l = do
+  manager' <- newManager tlsManagerSettings
+  runHalAPIClient $ search (Just requestedFields) [q] Nothing l Nothing
+
+requestedFields :: Text
+requestedFields = "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s"
+
 runHalAPIClient :: ClientM (Response Corpus) -> IO (Either ClientError (Response Corpus))
 runHalAPIClient cmd = do
   manager' <- newManager tlsManagerSettings
@@ -20,8 +29,8 @@ runHalAPIClient cmd = do
 runStructureRequest :: Maybe Text -> IO (Either ClientError (Response Corpus))
 runStructureRequest rq =
-  runHalAPIClient $ structure (Just "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s") rq (Just 10000)
+  runHalAPIClient $ structure (Just requestedFields) rq (Just 10000)
 
 runSearchRequest :: [Text] -> IO (Either ClientError (Response Corpus))
 runSearchRequest rq =
-  runHalAPIClient $ search (Just "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s") rq Nothing Nothing Nothing
+  runHalAPIClient $ search (Just requestedFields) rq Nothing Nothing Nothing