Commit 000e2322 authored by Mael NICOLAS

Merge branch 'patch-2' into 'dev'

Patch 2

See merge request !6
parents 39f9a367 7ecef6a1
# halCrawler
# HAL API Crawler
## API documentation
### Base website
https://api.archives-ouvertes.fr/docs
## Usage
### Entry function
The basic entry point of this crawler is the function `HAL.getMetadataWith`.
This function takes a `Text` representing the query you want to run on HAL
and a `Maybe Int` representing the maximum number of results you want to get.
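To make the two parameters concrete, here is a hypothetical sketch of the kind of request URL they could correspond to on the HAL search API. It assumes, without checking the generated Servant client, that the query becomes HAL's `q` parameter, the requested fields become `fl`, and the `Maybe Int` limit becomes `rows` (omitted for `Nothing`). `searchUrl`, `percentEncode` and `utf8Bytes` are illustrative names, not part of this crawler.

```hs
-- Hypothetical sketch, not the crawler's code: shows how a query and an
-- optional result limit could map onto a HAL search URL.
-- ASSUMPTION: query -> `q`, field list -> `fl`, limit -> `rows`.
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)
import Data.List (intercalate)

-- Build the search URL; `Nothing` omits the `rows` parameter entirely.
searchUrl :: String -> [String] -> Maybe Int -> String
searchUrl query fields limit =
  "https://api.archives-ouvertes.fr/search/?" ++ intercalate "&" (map param params)
  where
    params =
      [("q", query), ("fl", intercalate "," fields)]
        ++ maybe [] (\n -> [("rows", show n)]) limit
    param (key, value) = key ++ "=" ++ percentEncode value

-- Percent-encode a value as UTF-8, keeping unreserved characters and commas
-- (commas are legal in a query string and keep field lists readable).
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isSafe c = [c]
      | otherwise = concatMap hexByte (utf8Bytes c)
    isSafe c = isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-_.~,"
    hexByte b = ['%', hexDigits !! (b `div` 16), hexDigits !! (b `mod` 16)]
    hexDigits = "0123456789ABCDEF"

-- Encode one code point as its UTF-8 byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80 = [n]
  | n < 0x800 = [0xC0 + n `div` 0x40, 0x80 + n `mod` 0x40]
  | n < 0x10000 =
      [0xE0 + n `div` 0x1000, 0x80 + (n `div` 0x40) `mod` 0x40, 0x80 + n `mod` 0x40]
  | otherwise =
      [ 0xF0 + n `div` 0x40000
      , 0x80 + (n `div` 0x1000) `mod` 0x40
      , 0x80 + (n `div` 0x40) `mod` 0x40
      , 0x80 + n `mod` 0x40
      ]
  where
    n = ord c
```

Loaded in GHCi, `searchUrl "artificial intelligence" ["docid", "title_s"] (Just 10)` gives `https://api.archives-ouvertes.fr/search/?q=artificial%20intelligence&fl=docid,title_s&rows=10`.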
### Return Type
The return type is a bit trickier. Its full form is `IO (Either ClientError (Response Corpus))`: the result is **either** a `ClientError` or a `HAL.Client.Response Corpus`.
From the caller's point of view it works exactly like the `Documents` type of the other crawlers, but it simplifies the implementation.
`Response x` represents a collection of `x` together with the number of `x` returned.
`HAL.Doc.Corpus.Corpus` is a simple type that contains all the information we need (id, title, abstract, publicationDate, sources).
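A minimal, self-contained sketch of how a caller might consume this return type. `Response` and `Corpus` below are hypothetical stand-ins: only the `_docs` field appears in this README, while `_count` and the `Corpus` field names are assumptions. The error type is left generic so the sketch runs without Servant; in the crawler it would be `ClientError`.

```hs
-- Hypothetical stand-ins for HAL.Client.Response and HAL.Doc.Corpus.Corpus.
-- Only `_docs` comes from the README; all other field names are assumed.
data Response a = Response
  { _count :: Int -- assumed name for the number of results returned
  , _docs :: [a] -- the returned collection, as used in the README example
  }
  deriving (Show)

data Corpus = Corpus
  { _id :: Int
  , _title :: String
  , _abstract :: Maybe String
  , _publicationDate :: Maybe String
  , _sources :: Maybe String
  }
  deriving (Show)

-- Turn either outcome into printable lines, like the README's `case res of`.
-- The error type is generic here; in the crawler it would be ClientError.
summarize :: Show e => Either e (Response Corpus) -> [String]
summarize (Left err) = ["request failed: " ++ show err]
summarize (Right resp) =
  ("results returned: " ++ show (_count resp))
    : map describe (_docs resp)
  where
    describe doc = show (_id doc) ++ ": " ++ _title doc
```

Pattern matching on `Left`/`Right` keeps error handling explicit: a failed HTTP request never reaches the code that reads `_docs`.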
### Example
Here is a basic `main` that uses the crawler's entry point and prints the first 5 documents.
```hs
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Servant.Client
import HAL (getMetadataWith)
import HAL.Client
import HAL.Doc
import Tree

main :: IO ()
main = do
  -- Ask HAL for at most 10 results, then print only the first 5 documents.
  res <- getMetadataWith "artificial intelligence" (Just 10)
  case res of
    Left err -> print err
    Right val -> print $ take 5 $ _docs val
```
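One way to keep the example easy to test is to move the "first 5 documents" logic out of `main` into a pure function, so it can be checked without calling HAL. This is a sketch: `firstDocs` is a hypothetical helper, and `Response` is redefined here with only the `_docs` field so the block compiles on its own.

```hs
-- Hypothetical minimal stand-in for HAL.Client.Response (only `_docs`).
newtype Response a = Response {_docs :: [a]}

-- Keep at most n documents on success and pass errors through unchanged;
-- this is the pure part of the README's `take 5 $ _docs val`.
firstDocs :: Int -> Either e (Response a) -> Either e [a]
firstDocs n = fmap (take n . _docs)
```

With the real types, `main` could then end with `either print print (firstDocs 5 res)`. Assuming the limit is passed through to HAL, requesting `Just 5` instead of `Just 10` would also avoid fetching documents that are discarded.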
@@ -6,15 +6,14 @@ import Network.HTTP.Client (newManager)
 import Network.HTTP.Client.TLS (tlsManagerSettings)
 import Servant.Client
-import HAL (runSearchRequest)
+import HAL (getMetadataWith)
 import HAL.Client
 import HAL.Doc
 import Tree
 main :: IO ()
 main = do
-  manager' <- newManager tlsManagerSettings
-  res <- runSearchRequest $ ["ia"]
+  res <- getMetadataWith "artificial intelligence" (Just 10)
   case res of
-    (Left err) -> print err
+    (Left err) -> print err
     (Right val) -> print $ _docs val
@@ -13,6 +13,15 @@ import Servant.Client (BaseUrl(..), Scheme(..), ClientM, ClientError, runClientM
 import HAL.Client
 import HAL.Doc.Corpus
+getMetadataWith :: Text -> Maybe Int -> IO (Either ClientError (Response Corpus))
+getMetadataWith q l = do
+  manager' <- newManager tlsManagerSettings
+  runHalAPIClient $ search (Just requestedFields) [q] Nothing l Nothing
+requestedFields :: Text
+requestedFields = "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s"
 runHalAPIClient :: ClientM (Response Corpus) -> IO (Either ClientError (Response Corpus))
 runHalAPIClient cmd = do
   manager' <- newManager tlsManagerSettings
@@ -20,8 +29,8 @@ runHalAPIClient cmd = do
 runStructureRequest :: Maybe Text -> IO (Either ClientError (Response Corpus))
 runStructureRequest rq =
-  runHalAPIClient $ structure (Just "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s") rq (Just 10000)
+  runHalAPIClient $ structure (Just requestedFields) rq (Just 10000)
 runSearchRequest :: [Text] -> IO (Either ClientError (Response Corpus))
 runSearchRequest rq =
-  runHalAPIClient $ search (Just "docid,title_s,abstract_s,submittedDate_s,source_s,authFullName_s,authOrganism_s") rq Nothing Nothing Nothing
+  runHalAPIClient $ search (Just requestedFields) rq Nothing Nothing Nothing
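This commit replaces the repeated field string with a single `requestedFields` constant. A possible further step, sketched below as an illustration rather than what the commit does, is to keep the HAL field names in a list and join them, so adding or removing a field becomes a one-line change. `fieldNames` is a hypothetical name, and the sketch uses `String` so it compiles without the `text` package.

```hs
import Data.List (intercalate)

-- HAL field names requested by the crawler (copied from requestedFields).
fieldNames :: [String]
fieldNames =
  [ "docid"
  , "title_s"
  , "abstract_s"
  , "submittedDate_s"
  , "source_s"
  , "authFullName_s"
  , "authOrganism_s"
  ]

-- Comma-separated form, identical to the literal used in the commit.
requestedFields :: String
requestedFields = intercalate "," fieldNames
```

In the crawler itself, where `requestedFields :: Text`, the same idea works with `Data.Text.intercalate` over a `[Text]` list under `OverloadedStrings`.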