Commit 3ba747fa authored May 12, 2016 by c24b
[DOC] How to add a parser step by step in docs/overview

Parent: d4ae320d
1 changed file (docs/overview/parser.md) with 21 additions and 15 deletions
...
@@ -11,18 +11,20 @@ in gargantext.moissonneurs
in templates and views
## Reference the parser in the gargantext website
The gargantext website is stored in gargantext/gargantext.
### Reference your new parser in constants.py
* Import your parser (l.125):
```
from gargantext.util.parsers import \
    EuropressParser, RISParser, PubmedParser, ISIParser, CSVParser, ISTexParser, CernParser
```
The parser name corresponds to the name of the parser referenced in `gargantext/util/parser`; here it is called `CernParser`.
* Index your RESOURCETYPE in RESOURCETYPES (l.145), **at the end of the list**:
```
# type 10
{ "name": 'SCOAP (XML MARC21 Format)',
  ...
  'accepted_formats': ["zip", "xml"],
},
```
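The `# type 10` comment suggests that a resource type is identified by its position in RESOURCETYPES, which would explain why a new entry goes **at the end of the list**. A minimal sketch under that assumption (the placeholder entries and both helper functions are hypothetical) shows that appending keeps the existing type numbers stable while prepending shifts them:

```python
NEW_ENTRY = {"name": "SCOAP (XML MARC21 Format)", "accepted_formats": ["zip", "xml"]}

# Hypothetical, trimmed-down list: placeholder entries, only the fields used here.
RESOURCETYPES = [
    {"name": "Existing type A", "accepted_formats": ["txt"]},
    {"name": "Existing type B", "accepted_formats": ["xml"]},
]


def type_numbers(resourcetypes):
    """Map each entry name to its list index, assumed here to be its type number."""
    return {entry["name"]: index for index, entry in enumerate(resourcetypes)}


def unchanged_numbers(old, new):
    """Return True if every entry of `old` keeps the same number in `new`."""
    old_numbers, new_numbers = type_numbers(old), type_numbers(new)
    return all(new_numbers[name] == number for name, number in old_numbers.items())


appended = RESOURCETYPES + [NEW_ENTRY]   # new entry at the end of the list
prepended = [NEW_ENTRY] + RESOURCETYPES  # new entry at the front

if __name__ == "__main__":
    print(unchanged_numbers(RESOURCETYPES, appended))   # True
    print(unchanged_numbers(RESOURCETYPES, prepended))  # False
```

If the type numbers are indeed stored alongside existing corpora, prepending or inserting in the middle would presumably make those stored numbers point at the wrong entry.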
Note that the name here is composed of the API_name (SCOAP) + (GENERICFILETYPE FORMAT_XML Format).

The naming complexity reflects three things:

* the name of the API (which differs from the producing organization)
* the format type: XML
* the XML standard of this format: MARC21 (cf. CernParser in gargantext/util/parser/Cern.py)
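The naming convention can be sketched as a small helper that assembles the three parts in the order used by the SCOAP entry; the function name is hypothetical and only reproduces that pattern:

```python
def resource_type_name(api_name, file_format, standard):
    """Build a display name as: API name (FORMAT STANDARD Format).

    Hypothetical helper mirroring the 'SCOAP (XML MARC21 Format)' entry.
    """
    return "{} ({} {} Format)".format(api_name, file_format, standard)


if __name__ == "__main__":
    print(resource_type_name("SCOAP", "XML", "MARC21"))  # SCOAP (XML MARC21 Format)
```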
The `default_language` corresponds to the default accepted language, which **should load** the corresponding default tagger:
```
from gargantext.util.taggers import NltkTagger
```
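A minimal sketch of how a resource type's default language could select its tagger. It assumes a hypothetical mapping from language codes to tagger names (only `NltkTagger` is named in this document) and assumes the entry stores its language under a `default_language` key; the function name is illustrative:

```python
# Hypothetical mapping from language code to tagger class name.
# Only NltkTagger is named in the source; other entries would depend on the install.
TAGGERS_BY_LANGUAGE = {"en": "NltkTagger"}


def tagger_name_for(resource_type):
    """Pick the tagger matching a resource type's default language.

    Assumes the entry stores its language under "default_language".
    """
    language = resource_type["default_language"]
    try:
        return TAGGERS_BY_LANGUAGE[language]
    except KeyError:
        raise LookupError("no tagger installed for language {!r}".format(language)) from None


if __name__ == "__main__":
    print(tagger_name_for({"name": "SCOAP (XML MARC21 Format)", "default_language": "en"}))
```

Returning a name rather than importing every tagger up front leaves room for loading taggers on demand, depending on what is installed.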
TO DO: load the tagger types on demand, depending on the languages and the installation

TO DO: provide a module to download additional parsers

TO DO: provide install tagger module scripts inside lib
The formats correspond to the file types accepted when a file is uploaded through the parsing form available in `gargantext/view/pages/projects.py` and
...
@@ -63,24 +68,25 @@ but nothing will occur
Three requirements, and only three:

* your parser class should inherit from the base class `_Parser()` (`gargantext/gargantext/util/parser/_Parser`)
* your parser class must have a `parse` method that takes a **file buffer** as input
* your parser must structure and store the data in a variable named **hyperdata_list** to be properly indexed by the toolchain
! Be careful with the date format: provide `publication_date` as a string in the format `YYYY-mm-dd HH:MM:SS`
# Adding a scraper API to offer a search option:

In progress
* Add a pop-up question "Do you have a corpus?" with a search option in /templates/pages/projects/project.html, line 181
## Reference a scraper (moissonneur) in gargantext
# Some changes
* Added `accepted_formats` in constants
* Added a `check_file` routine in the form check ==> but it should inherit from utils/files.py, which also implements the upload size limit check
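A minimal sketch of what a combined check could look like: the extension is compared against a resource type's `accepted_formats`, then the size against an upload limit. The function signature and the 100 MiB limit are hypothetical; the real size check is the one in utils/files.py:

```python
import os

# Hypothetical upload limit; the real value would come from utils/files.py.
MAX_UPLOAD_SIZE = 100 * 1024 * 1024  # 100 MiB


def check_file(filename, size, accepted_formats, max_size=MAX_UPLOAD_SIZE):
    """Return an error message, or None if the upload is acceptable."""
    # Compare the lowercase extension without its leading dot
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in accepted_formats:
        return "format {!r} not accepted (expected one of {})".format(
            extension, ", ".join(accepted_formats))
    if size > max_size:
        return "file too large: {} bytes (limit {} bytes)".format(size, max_size)
    return None


if __name__ == "__main__":
    print(check_file("records.xml", 2048, ["zip", "xml"]))  # None
    print(check_file("records.pdf", 2048, ["zip", "xml"]))
```

Returning a message instead of raising makes it easy to show the error back in the upload form.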
# Suggestions for next steps:
* XML parsers: MARC21, UNIMARC, ...
* A project type is determined by the first element added, i.e. the first element determines the corpus type of all the corpora within the project
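A minimal sketch of this suggestion, assuming each corpus is tagged with a resource type number and that a project accepts a new corpus only if its type matches the first one added; the exception class and function are hypothetical:

```python
class ProjectTypeError(ValueError):
    """Raised when a corpus does not match the project's type."""


def check_corpus_type(existing_types, new_type):
    """Accept `new_type` only if it matches the type of the first corpus added.

    `existing_types` lists the resource types of the project's corpora in the
    order they were added; an empty list means the project has no corpus yet.
    Returns the updated list of types.
    """
    if existing_types and new_type != existing_types[0]:
        raise ProjectTypeError(
            "project type is {!r}, cannot add a {!r} corpus".format(existing_types[0], new_type))
    return existing_types + [new_type]


if __name__ == "__main__":
    types = check_corpus_type([], 10)     # first corpus fixes the project type
    types = check_corpus_type(types, 10)  # same type: accepted
    print(types)                          # [10, 10]
```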
...