Anne-Laure Thomas Derepas / GarganTexternal tools / Commits / 0f423675

Commit 0f423675, authored Jul 19, 2023 by Loïc Chapron

JsonCorpusToTSV

parent 9aecec67

Showing 4 changed files with 82 additions and 0 deletions (+82 -0)

JsonCorpusToCsv.py  Conversion/ToTSV/JsonCorpusToTSV/JsonCorpusToCsv.py  +62 -0
README.md  Conversion/ToTSV/JsonCorpusToTSV/README.md  +19 -0
GarganTextPierreZip.zip  .../ToTSV/JsonCorpusToTSV/sample-zip/GarganTextPierreZip.zip  +0 -0
GarganTextPierreJson.json  ...on/ToTSV/JsonCorpusToTSV/sample/GarganTextPierreJson.json  +1 -0
Conversion/ToTSV/JsonCorpusToTSV/JsonCorpusToCsv.py  0 → 100644  View file @ 0f423675
#######
# jsonCorpusToCsv.py
# description : change a json GarganText corpus into a csv legacy corpus
# licence : AGPL + CECILL v3
# author : quentin lobbé - qlobbe@iscpif.fr
#######
# python3 jsonCorpusToCsv.py corpus.json
import sys
import csv
import json
from zipfile import ZipFile

try:
    pathCorpus = sys.argv[1]
except:
    print("! args error\n")
    sys.exit(0)

def readZipFile(path):
    with ZipFile(path, 'r') as f:
        file = f.open(f.namelist()[0])
        return json.load(file)

def readJson(path):
    file = open(path)
    return json.load(file)

if pathCorpus.split('.')[1] == 'zip':
    corpusJson = readZipFile(pathCorpus)
else:
    corpusJson = readJson(pathCorpus)

output = open(str(pathCorpus.split('.')[0]) + ".csv", "w")
header = "title\tsource\tpublication_year\tpublication_month\tpublication_day\tabstract\tauthors\tweight\n"
output.write(header)

for row in corpusJson['corpus']:
    doc = row['document']['hyperdata']
    abstract = "empty"
    authors = "empty"
    title = "empty"
    source = "empty"
    if 'title' in doc.keys():
        title = doc['title'].replace('"', '').replace('\t', '')
    if 'source' in doc.keys():
        source = doc['source'].replace('"', '').replace('\t', '')
    if 'abstract' in doc.keys():
        abstract = doc['abstract'].replace('"', '').replace('\t', '')
    if 'authors' in doc.keys():
        authors = doc['authors']
    output_row = title + "\t" + source + "\t" + str(doc['publication_year']) + "\t" + str(doc['publication_month']) + "\t" + str(doc['publication_day']) + "\t" + abstract + "\t" + authors + "\t" + str(1) + "\n"
    output.write(output_row)
\ No newline at end of file
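The row-building loop in this file concatenates strings by hand and reads the three date keys with `doc[...]`, so a document without `publication_year`, `publication_month`, or `publication_day` would raise a `KeyError`. A minimal sketch of the same conversion using `csv.writer` could look like the following. The names `clean`, `write_rows`, and `sample` are hypothetical and not part of the commit. Cleaning `authors` like the other text fields and writing an empty cell for a missing date part are assumptions that differ from the committed behaviour.

```python
import csv
import io

TEXT_FIELDS = ("title", "source", "abstract", "authors")
DATE_FIELDS = ("publication_year", "publication_month", "publication_day")
HEADER = ["title", "source", *DATE_FIELDS, "abstract", "authors", "weight"]


def clean(value):
    """Remove double quotes and tabs, as the script does for its text fields."""
    return str(value).replace('"', "").replace("\t", "")


def write_rows(corpus_json, out):
    """Write the header and one row per document to the file-like object `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in corpus_json["corpus"]:
        doc = row["document"]["hyperdata"]
        # Missing text fields fall back to "empty", like the committed script.
        text = {key: clean(doc.get(key, "empty")) for key in TEXT_FIELDS}
        # Assumption: a missing date part becomes an empty cell, not a KeyError.
        dates = [str(doc.get(key, "")) for key in DATE_FIELDS]
        writer.writerow([text["title"], text["source"], *dates,
                         text["abstract"], text["authors"], 1])


# Hypothetical one-document corpus, shaped like the keys the script reads.
sample = {"corpus": [{"document": {"hyperdata": {
    "title": 'A "quoted"\ttitle',
    "publication_year": 2023,
    "publication_month": 7,
    "publication_day": 19,
}}}]}

buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

With the default quoting of `csv.writer`, a field that still contains a newline is wrapped in double quotes rather than splitting the row. Whether the legacy TSV importer accepts quoted fields is not stated in the commit and would need checking.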
Conversion/ToTSV/JsonCorpusToTSV/README.md  0 → 100644  View file @ 0f423675
# JsonCorpusToTSV
## About the project
JsonCorpusToTSV transforms a JSON corpus from GarganText into a TSV corpus.
## Usage
```shell
python3 JsonCorpusToCsv.py corpus.json
```
corpus.json -> GarganText corpus in JSON format
Output: a TSV legacy corpus, corpus.csv
You can also pass a zip file containing a JSON corpus:
```shell
python3 JsonCorpusToCsv.py corpus.zip
```
\ No newline at end of file
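The usage above maps `corpus.json` or `corpus.zip` to `corpus.csv`. The committed script derives both the zip check and the output name from `pathCorpus.split('.')`, which matches this description only when the path contains a single dot. A path such as `./exports/corpus.v2.zip` would not be detected as a zip and would get an unexpected output name. One possible way to handle such paths, sketched here with the hypothetical helpers `output_path` and `is_zip` (neither is in the commit), is to rely on `os.path.splitext`:

```python
import os


def output_path(corpus_path):
    """Replace only the final extension of corpus_path with .csv."""
    root, _ext = os.path.splitext(corpus_path)
    return root + ".csv"


def is_zip(corpus_path):
    """Return True when the final extension is .zip (case-insensitive)."""
    return os.path.splitext(corpus_path)[1].lower() == ".zip"


print(output_path("corpus.json"))
print(output_path("./exports/corpus.v2.zip"))
print(is_zip("./exports/corpus.v2.zip"))
```

This keeps the behaviour documented in the README for simple names while tolerating dots in directory names or in the file stem.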
Conversion/ToTSV/JsonCorpusToTSV/sample-zip/GarganTextPierreZip.zip  0 → 100644  View file @ 0f423675
File added
Conversion/ToTSV/JsonCorpusToTSV/sample/GarganTextPierreJson.json  0 → 100644  View file @ 0f423675
This source diff could not be displayed because it is too large. You can view the blob instead.