# resources

Adding a new source to Gargantext requires first declaring
the source in constants.py:

```python
RESOURCETYPES = [
    {
        "type": 9,  # unique integer identifying this resource type
        "name": "SCOAP [XML]",  # resource name as shown in the "add corpus" form [generic format]
        "parser": "CernParser",  # name of the parser class inside a CERN.py file (None if not implemented)
        "format": "MARC21",  # specific format
        "file_formats": ["zip", "xml"],  # accepted file formats
        "crawler": "CernCrawler",  # name of the crawler class inside a CERN.py file (None if no crawler is implemented)
        "default_languages": ["en", "fr"],  # default languages supported by the source
    },
    ...
]
```
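Since each entry needs a unique `type` integer, a small check can catch collisions before a new source is added. The following is only a sketch: the trimmed-down list, the `Mail [EML]` entry with type 10, and the helper name `check_resourcetypes` are hypothetical and are not part of Gargantext.

```python
# Hypothetical, trimmed-down stand-in for the list declared in constants.py.
RESOURCETYPES = [
    {"type": 9, "name": "SCOAP [XML]", "parser": "CernParser", "crawler": "CernCrawler"},
    # Illustrative entry only: type 10 and this name are assumptions.
    {"type": 10, "name": "Mail [EML]", "parser": "MailParser", "crawler": None},
]


def check_resourcetypes(resourcetypes):
    """Return a {type: name} mapping, raising ValueError on a duplicate type."""
    seen = {}
    for resource in resourcetypes:
        type_id = resource["type"]
        if type_id in seen:
            raise ValueError(
                "type %r is used by both %r and %r"
                % (type_id, seen[type_id], resource["name"])
            )
        seen[type_id] = resource["name"]
    return seen


names_by_type = check_resourcetypes(RESOURCETYPES)
```

Running such a check after editing constants.py would flag a reused `type` value early instead of at import time in the application.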
## adding a new parser

Once you have declared your new parser in constants.py,
add your new parser file to /srv/gargantext/util/parsers/
following this naming convention:

* The filename must be in uppercase, without the Parser suffix,
  e.g. MailParser => MAIL.py
* Inside this file, the parser class name must exactly match the name declared as parser in constants.py
* Your new parser must inherit from the base class Parser and provide a parse(filebuffer) method

```python
#!/usr/bin/env python3
# filename: /srv/gargantext/util/parsers/MAIL.py
from ._Parser import Parser


class MailParser(Parser):
    def parse(self, filebuffer):
        ...
```
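To make the `parse(filebuffer)` contract more concrete, here is a minimal, self-contained sketch. It makes several assumptions: the `Parser` class below is only a stand-in for Gargantext's real base class, the choice to yield one dict per document is a guess, and the field names (`title`, `authors`, `abstract`) are hypothetical. Only the standard-library `email` module is used.

```python
import io
from email import message_from_bytes


class Parser:
    """Stand-in for Gargantext's base Parser class (assumption, not the real one)."""


class MailParser(Parser):
    def parse(self, filebuffer):
        """Read one raw e-mail from a binary file buffer and yield it as a dict."""
        message = message_from_bytes(filebuffer.read())
        # Field names below are hypothetical; adapt them to what the
        # real base class expects.
        yield {
            "title": message["Subject"],
            "authors": message["From"],
            "abstract": message.get_payload(),
        }


raw_mail = b"From: alice@example.org\nSubject: Hello\n\nBody text\n"
documents = list(MailParser().parse(io.BytesIO(raw_mail)))
```

In the real file, the base class would instead come from `from ._Parser import Parser`, as in the skeleton above.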
## adding a new crawler

Once you have declared your new crawler in constants.py,
add your new crawler file to /srv/gargantext/util/crawlers/
following this naming convention:

* The filename must be in uppercase, without the Crawler suffix,
  e.g. MailCrawler => MAIL.py
* Inside this file, the crawler class name must exactly match the name declared as crawler in constants.py
* Your new crawler must inherit from the base class Crawler and provide three methods:
  * scan_results => ids
  * sample => yes/no
  * fetch

```python
#!/usr/bin/env python3
# filename: /srv/gargantext/util/crawlers/MAIL.py
from ._Crawler import Crawler


class MailCrawler(Crawler):
    def scan_results(self, query):
        ...
        self.ids = set()

    def sample(self, results_nb):
        ...

    def fetch(self, ids):
        ...
```
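The three methods can be illustrated with a toy crawler working on an in-memory mailbox. This is a sketch under stated assumptions: the `Crawler` class is a stand-in for Gargantext's real base class, `FAKE_MAILBOX` is invented data, and the exact semantics of `sample` (here: a yes/no answer on whether at least `results_nb` ids were found) are a guess, not the documented behaviour.

```python
class Crawler:
    """Stand-in for Gargantext's base Crawler class (assumption, not the real one)."""


# Invented data standing in for a remote source: id -> subject line.
FAKE_MAILBOX = {1: "budget meeting", 2: "lunch menu", 3: "budget review"}


class MailCrawler(Crawler):
    def scan_results(self, query):
        """Store and return the ids of all messages whose subject contains the query."""
        self.ids = {
            mail_id for mail_id, subject in FAKE_MAILBOX.items() if query in subject
        }
        return self.ids

    def sample(self, results_nb):
        """Assumed semantics: True if at least results_nb ids were found."""
        return len(self.ids) >= results_nb

    def fetch(self, ids):
        """Return the subjects of the requested messages, ordered by id."""
        return [FAKE_MAILBOX[mail_id] for mail_id in sorted(ids)]


crawler = MailCrawler()
found_ids = crawler.scan_results("budget")
enough = crawler.sample(2)
subjects = crawler.fetch(found_ids)
```

A real crawler would query the remote source in `scan_results` and download the documents in `fetch`; the in-memory dict only keeps the example runnable.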