humanities / gargantext
Commit 5866a043, authored May 11, 2016 by c24b
Prepare a method for checking content file = is it the right parser?
Parent: 3374d428
Showing 4 changed files with 21 additions and 12 deletions:
docs/overview/parser.md (+14 -11)
gargantext/constants.py (+2 -1)
gargantext/util/files.py (+2 -0)
gargantext/util/parsers/_Parser.py (+3 -0)
docs/overview/parser.md
@@ -50,35 +50,38 @@ exposé dans `/templates/pages/projects/project.html`
 ## reference your parser script
-## add your script into gargantext/util/
+## add your parser script into folder gargantext/util/parser/
-here filename is Cern.py
+here my filename was Cern.py
 ##declare it into gargantext/util/parser/__init__.py
 from .Cern import CernParser
-At this step, you will be able to see your parser and add a file with the form
-but nothing will occur
-## add your parser script into gargantext/util/parser/
-At this step, you will be able to see your parser and add a file with the form
-it will send the job to toolchain
-##
-parse_extract_indexhyperdata(corpus)
+## the good way to write the scrapper script
+Three main and only requirements:
+* your parser class should inherit from the base class _Parser()
+* your parser class must have a parse method that take a **filename** as input
+* you parser must structure and store data into **hyperdata_list** variable name
+to be properly indexed by toolchain
+# Adding a scrapper API to offer search option:
 * Add pop up question Do you have a corpus
 option search in /templates/pages/projects/project.html line 181
-adding
 # Some changes
 * adding accepted_formats in constants
-* adding check_file routine in Form check
+* adding check_file routine in Form check ==> but should inherit from utils/files.py
+that also have implmented the size upload limit check
 # Suggestion next step:
 * XML parser MARC21 UNIMARC ...
 * A project type is qualified by the first element add i.e:
 the first element determine the type of corpus of all the corpora within the project
gargantext/constants.py
@@ -246,7 +246,8 @@ from .settings import BASE_DIR
 # uploads/.gitignore prevents corpora indexing
 # copora can be either a folder or symlink towards specific partition
 UPLOAD_DIRECTORY = os.path.join(BASE_DIR, 'uploads/corpora')
-UPLOAD_LIMIT = 1024 * 1024 * 1024
+UPLOAD_LIMIT = 1024 #* 1024 * 1024
 DOWNLOAD_DIRECTORY = UPLOAD_DIRECTORY
gargantext/util/files.py
@@ -25,11 +25,13 @@ def download(url, name=''):
 def upload(uploaded):
+    print(repr(uploaded))
     if uploaded.size > UPLOAD_LIMIT:
         raise IOError('Uploaded file is bigger than allowed: %d > %d' % (
             uploaded.size,
             UPLOAD_LIMIT,
         ))
     return save(
         contents = uploaded.file.read(),
         name = uploaded.name,
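To see what the `UPLOAD_LIMIT` change means for the size check in `upload`, here is a minimal standalone sketch. It assumes `uploaded.size` is a byte count, in which case the limit drops from 1 GiB to 1 KiB. `check_size` and `fake_upload` are hypothetical helpers that only mimic the attributes the diff relies on (`size`, `name`, `file`); they are not part of gargantext.

```python
from io import BytesIO
from types import SimpleNamespace

UPLOAD_LIMIT_BEFORE = 1024 * 1024 * 1024  # 1 GiB, assuming bytes
UPLOAD_LIMIT_AFTER = 1024                 # 1 KiB, assuming bytes


def check_size(uploaded, limit):
    """Same comparison as in upload(), with the limit passed explicitly."""
    if uploaded.size > limit:
        raise IOError('Uploaded file is bigger than allowed: %d > %d' % (
            uploaded.size,
            limit,
        ))
    return uploaded.file.read()


def fake_upload(payload, name="corpus.txt"):
    """Hypothetical stand-in for the uploaded-file object."""
    return SimpleNamespace(name=name, size=len(payload), file=BytesIO(payload))
```

Under these assumptions a 2 KiB file passes the old limit and is rejected by the new one, which is convenient for exercising the error path with small test files.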
gargantext/util/parsers/_Parser.py
@@ -23,6 +23,9 @@ class Parser:
     def __del__(self):
         self._file.close()
+    def detect_format(self, accepted_format):
+        print(self._file[:1000])
     def detect_encoding(self, string):
         """Useful method to detect the encoding of a document.
         """
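The new `detect_format` method only prints the start of the file for now. One possible direction for the check named in the commit message, sketched under assumptions: if `self._file` is an open file object (as the `close()` call in `__del__` suggests), its first bytes have to be read rather than sliced, and can then be compared with known signatures. The `ACCEPTED_FORMATS` mapping, the standalone function signature and the `sample_size` parameter are all hypothetical.

```python
from io import BytesIO

# Hypothetical mapping: format name -> byte prefixes that identify it.
ACCEPTED_FORMATS = {
    "xml": (b"<?xml",),
    "zip": (b"PK\x03\x04",),  # ZIP local file header signature
}


def detect_format(file_obj, accepted_formats, sample_size=1000):
    """Return the first format whose signature starts the file, else None."""
    position = file_obj.tell()
    head = file_obj.read(sample_size)  # read the sample instead of slicing
    file_obj.seek(position)            # leave the file where the parser expects it
    head = head.lstrip()               # tolerate leading whitespace
    for name, signatures in accepted_formats.items():
        if head.startswith(signatures):  # startswith accepts a tuple of prefixes
            return name
    return None
```

For example, `detect_format(BytesIO(b'<?xml version="1.0"?><record/>'), ACCEPTED_FORMATS)` identifies the sample as `"xml"`, while plain text matches no signature and yields `None`.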