Commit 3ba747fa authored by c24b

[DOC] How to add a parser step by step in docs/overview

parent d4ae320d
@@ -11,18 +11,20 @@ in gargantext.moissonneurs
in templates and views
## Reference parser into gargantext website
The gargantext website is stored in gargantext/gargantext.
### Reference your new parser in constants.py
* import your parser l.125
```
from gargantext.util.parsers import \
    EuropressParser, RISParser, PubmedParser, ISIParser, CSVParser, ISTexParser, CernParser
```
The parser name corresponds to the name of the parser referenced in gargantext/util/parser;
here the name is CernParser.
* index your RESOURCETYPE
in RESOURCETYPES (l.145) **at the end of the list**
```
# type 10
{ "name": 'SCOAP (XML MARC21 Format)',
@@ -31,18 +33,21 @@ RESOURCETYPES (l.145) **at the end of the list**
  'accepted_formats':["zip","xml"],
},
```
Note that the name here is composed of the API_name (SCOAP) + (GENERICFILETYPE FORMAT_XML Format).
The complexity of the naming reflects three things:
* the name of the API (which differs from the producing organization)
* the type of format: XML
* the XML standard of this format: MARC21 (cf. CernParser in gargantext/util/parser/Cern.py)
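The naming convention above can be sketched in a few lines. The helper `resource_name` is hypothetical and not part of gargantext. Only the `name` and `accepted_formats` keys come from the entry shown in this document, and the real entry has further fields that the diff elides.

```python
def resource_name(api_name, file_type, standard):
    """Compose a display name such as 'SCOAP (XML MARC21 Format)'."""
    return "{} ({} {} Format)".format(api_name, file_type, standard)


# Hypothetical entry following the convention: API name, file type, XML standard.
resource = {
    "name": resource_name("SCOAP", "XML", "MARC21"),
    "accepted_formats": ["zip", "xml"],
}
```

Building the name from its three parts keeps the API name, the file type and the standard visible as separate pieces of information, which is the point of the convention.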
The default_langage corresponds to the default accepted language, which **should load** the corresponding default tagger
```
from gargantext.util.taggers import NltkTagger
```
TO DO: load tagger types on demand, depending on the languages and the install
TO DO: offer a module to download additional parsers
TO DO: provide install tagger module scripts inside lib
The formats correspond to the file types accepted when the file is uploaded through the parsing form
available in `gargantext/view/pages/projects.py` and
@@ -63,24 +68,25 @@ but nothing will occur
Three main and only requirements:
* your parser class should inherit from the base class _Parser()
`gargantext/gargantext/util/parser/_Parser`
* your parser class must have a parse method that takes a **file buffer** as input
* your parser must structure and store data in a variable named **hyperdata_list**
to be properly indexed by the toolchain
! Be careful with the date format: provide publication_date as a string formatted YYYY-mm-dd HH:MM:SS
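The three requirements and the date format can be illustrated with a minimal sketch. Several things here are assumptions: `_Parser` is a stand-in for the real base class so the example runs on its own, `ExampleParser` and its `title;DD/MM/YYYY` input format are invented for illustration, and `hyperdata_list` is assumed to be an attribute of the parser instance.

```python
import io
from datetime import datetime


class _Parser:
    """Stand-in for the real base class in gargantext/gargantext/util/parser/_Parser."""

    def __init__(self):
        self.hyperdata_list = []


class ExampleParser(_Parser):
    """Hypothetical parser for lines shaped like 'title;DD/MM/YYYY'."""

    def parse(self, file_buffer):
        for raw_line in file_buffer:
            # Accept both binary and text buffers.
            if isinstance(raw_line, bytes):
                raw_line = raw_line.decode("utf-8")
            line = raw_line.strip()
            if not line:
                continue
            title, raw_date = line.split(";", 1)
            date = datetime.strptime(raw_date.strip(), "%d/%m/%Y")
            self.hyperdata_list.append({
                "title": title.strip(),
                # The toolchain expects YYYY-mm-dd HH:MM:SS as a string.
                "publication_date": date.strftime("%Y-%m-%d %H:%M:%S"),
            })
        return self.hyperdata_list


parser = ExampleParser()
parser.parse(io.StringIO("A first article;03/02/2016\n"))
```

After the call, `parser.hyperdata_list` holds one dictionary per input line, with `publication_date` already normalized to the expected string format.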
# Adding a scraper API to offer search option:
In progress
* Add pop-up question "Do you have a corpus?"
option search in /templates/pages/projects/project.html line 181
## Reference a scraper (moissonneur) into gargantext
# Some changes
* adding accepted_formats in constants
* adding check_file routine in Form check ==> but should inherit from utils/files.py
that also has implemented the size upload limit check
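A check of this kind could look roughly like the sketch below. It is not the gargantext implementation: the `check_file` signature, the `UPLOAD_LIMIT` value and the error messages are all assumptions. Only the `accepted_formats` list mirrors the resource entry shown earlier.

```python
import os

ACCEPTED_FORMATS = ["zip", "xml"]   # mirrors the 'accepted_formats' entry
UPLOAD_LIMIT = 1024 * 1024 * 1024   # hypothetical limit, in bytes


def check_file(filename, size, accepted_formats=ACCEPTED_FORMATS, limit=UPLOAD_LIMIT):
    """Validate extension and size; return the normalized extension."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in accepted_formats:
        raise ValueError(
            "Unsupported format '{}', expected one of {}".format(
                extension, accepted_formats
            )
        )
    if size > limit:
        raise ValueError("File too large: {} bytes (limit {})".format(size, limit))
    return extension
```

In a form, such a function would typically be called with the uploaded file's name and size before the file is handed to the parser.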
# Suggestions for next steps:
* XML parser MARC21 UNIMARC ...
* A project type is qualified by the first element added, i.e.
the first element determines the type of corpus of all the corpora within the project