Commit 5866a043 authored by c24b

Prepare a method for checking content file = is it the right parser?

parent 3374d428
......@@ -50,35 +50,38 @@ exposed in `/templates/pages/projects/project.html`
## reference your parser script
## add your script into gargantext/util/
here filename is
## add your parser script into folder gargantext/util/parser/
here my filename was
## declare it in gargantext/util/parser/
from .Cern import CernParser
At this step, you will be able to see your parser in the form and add a file with it,
but nothing will happen yet
## add your parser script into gargantext/util/parser/
## the right way to write the scraper script
At this step, you will be able to see your parser in the form and add a file with it;
the form will send the job to the toolchain
Three requirements, and only three:
* your parser class should inherit from the base class _Parser()
* your parser class must have a parse method that takes a **filename** as input
* your parser must structure the data and store it in a variable named **hyperdata_list**
so that the toolchain can index it properly
# Adding a scraper API to offer a search option:
* Add a pop-up question: "Do you have a corpus?"
search option in /templates/pages/projects/project.html line 181
# Some changes
* adding accepted_formats in constants
* adding check_file routine in Form check
* adding check_file routine in Form check ==> but it should inherit from utils/,
which already implements the upload size limit check
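
A file check of this kind could look roughly like the sketch below. It is only an assumption about the shape of such a routine: the signature of `check_file`, the contents of `ACCEPTED_FORMATS`, and the use of file extensions to detect the format are all hypothetical. The limit value and the error message mirror the settings and upload hunks of this commit, with the message arguments filled in as an assumption.

```python
import os

# Hypothetical list of accepted extensions; the real constant may differ.
ACCEPTED_FORMATS = {".csv", ".xml", ".zip"}

# Same value as UPLOAD_LIMIT in the settings hunk (1 GiB).
UPLOAD_LIMIT = 1024 * 1024 * 1024


def check_file(name, size, accepted_formats=ACCEPTED_FORMATS, limit=UPLOAD_LIMIT):
    """Return True if the file passes both checks, raise otherwise."""
    # Format check based on the file extension (an assumption).
    extension = os.path.splitext(name)[1].lower()
    if extension not in accepted_formats:
        raise ValueError("Unsupported file format: %r" % extension)
    # Size check, same comparison as in upload().
    if size > limit:
        raise IOError(
            "Uploaded file is bigger than allowed: %d > %d" % (size, limit)
        )
    return True
```

Keeping both checks in one helper would let the form and the upload function share the same limit instead of each implementing it separately.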
# Suggested next steps:
* XML parser MARC21 UNIMARC ...
* A project type is set by the first element added, i.e.
the first element determines the corpus type of all the corpora within the project
......@@ -246,7 +246,8 @@ from .settings import BASE_DIR
# uploads/.gitignore prevents corpora indexing
# corpora can be either a folder or a symlink to a specific partition
UPLOAD_DIRECTORY = os.path.join(BASE_DIR, 'uploads/corpora')
UPLOAD_LIMIT = 1024 * 1024 * 1024
#* 1024 * 1024
......@@ -25,11 +25,13 @@ def download(url, name=''):
def upload(uploaded):
if uploaded.size > UPLOAD_LIMIT:
raise IOError('Uploaded file is bigger than allowed: %d > %d' % (
return save(
contents =,
name =,
......@@ -23,6 +23,9 @@ class Parser:
def __del__(self):
def detect_format(self, accepted_format):
def detect_encoding(self, string):
"""Useful method to detect the encoding of a document.