Julien Moutinho
GarganTexternal tools
Commit cef0870c authored Jul 25, 2023 by Loïc Chapron
pubmedToGarganText
parent 485bc840
Showing 4 changed files with 5240 additions and 0 deletions
Conversion/ToTSV/pubmedCorpusToTSV/README.md +24 -0
Conversion/ToTSV/pubmedCorpusToTSV/pubmedCorpusToTsv.py +87 -0
...pubmedCorpusToTSV/sample/pubmed-Biologie-set+abstract.txt +4586 -0
...on/ToTSV/pubmedCorpusToTSV/sample/pubmed-Biologie-set.txt +543 -0
Conversion/ToTSV/pubmedCorpusToTSV/README.md
0 → 100644
# pubmedCorpusToTSV
## About the project
pubmedCorpusToTSV converts a text file exported from PubMed into a TSV file usable in GarganText.
## Usage
```shell
python3 pubmedCorpusToTsv.py corpus.txt
```
`corpus.txt`: text file exported from PubMed.
The script writes a legacy-format TSV file next to the input file: `corpus.csv`.
## Date
This script was last updated on 2023/07/25.
It may become outdated in the future.
## Note
Any nbib file also works with this script.
\ No newline at end of file
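As a rough illustration of the output described in the README, the sketch below reads a tab-separated corpus file and checks it against the eight column names that the script writes as its header line. The file name `corpus.csv` follows the README. The helper name `read_corpus` and the header check are assumptions made for illustration and are not part of the commit.

```python
import csv

# Column names as written in the script's header line.
COLUMNS = [
    "title", "source", "publication_year", "publication_month",
    "publication_day", "abstract", "authors", "weight",
]


def read_corpus(path):
    """Read a tab-separated corpus file into a list of dicts (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: the script removes double quotes and never quotes fields.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if reader.fieldnames != COLUMNS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return list(reader)
```

Calling `read_corpus("corpus.csv")` on a file produced by the script should return one dict per document, with every value as a string.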
Conversion/ToTSV/pubmedCorpusToTSV/pubmedCorpusToTsv.py
0 → 100644
#######
# pubmedCorpusToCsv.py
# description : turn a pubmed file (nbib) into a gargantext csv corpus
# licence : AGPL + CECILL v3
# author : quentin lobbé - qlobbe@iscpif.fr
#######

import sys
import csv
import nbib
import re
import calendar

# python3 pubmedCorpusToCsv.py corpus.txt

path = ""
try:
    path = sys.argv[1]
except:
    print("! args error\n")
    sys.exit(0)


def normalizePath(path):
    splited = path.split('/')
    name = (splited[-1]).split('.')[0]
    root = '/'.join(splited[:-1])
    return (root, name)


root, name = normalizePath(path)
if root != '':
    root += '/'

output = open(root + name + ".csv", "w")
header = "title\tsource\tpublication_year\tpublication_month\tpublication_day\tabstract\tauthors\tweight\n"
output.write(header)

docs = nbib.read_file(path)

for doc in docs:
    keys = doc.keys()
    if len(list(set(['title', 'publication_date', 'authors']) & set(keys))) < 3:
        continue
    if 'journal' in keys:
        source = doc['journal']
    else:
        source = ""
    if 'abstract' in keys:
        abstract = doc['abstract']
    else:
        abstract = ""
    title = doc['title']
    date = doc['publication_date'].split(' ')
    year = date[0]
    if len(date) > 1:
        try:
            month = list(calendar.month_abbr).index(date[1])
        except Exception as e:
            month = '1'
    else:
        month = '1'
    if len(date) > 2:
        day = date[2]
    else:
        day = '1'
    abstract = re.sub('\"', "", abstract).replace("\t", "")
    title = re.sub('\"', "", title).replace("\t", "")
    authors_lst = []
    for author in doc['authors']:
        authors_lst.append((author['author']).replace(',', ''))
    authors = ','.join(authors_lst)
    row = str(title) + "\t" + "scopus" + "\t" + year + "\t" + str(month) + "\t" + str(day) + "\t" + abstract + "\t" + authors + "\t" + str(1) + "\n"
    output.write(row)
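
The date handling and row assembly in the script above can be isolated into small functions that are easier to test. The following is a minimal sketch of one possible variant, not the committed code. The function names `parse_publication_date`, `clean` and `format_row` are hypothetical. It assumes author names arrive as plain strings (the script reads them from `author['author']`), takes the source as a parameter (the script writes the literal `"scopus"` and leaves its `source` variable unused), and additionally replaces newlines with spaces, which the script does not do.

```python
import calendar


def parse_publication_date(text):
    """Split a 'YYYY Mon DD' string into (year, month, day); month and day default to 1."""
    parts = text.split(" ")
    year = parts[0]
    month, day = 1, 1
    if len(parts) > 1:
        abbreviations = list(calendar.month_abbr)  # ['', 'Jan', ..., 'Dec'] in the default locale
        # Skip the empty first entry so an empty token cannot map to month 0.
        if parts[1] in abbreviations[1:]:
            month = abbreviations.index(parts[1])
    if len(parts) > 2 and parts[2].isdigit():
        day = int(parts[2])
    return year, month, day


def clean(text):
    """Remove characters that would break a tab-separated row."""
    return text.replace('"', "").replace("\t", "").replace("\n", " ")


def format_row(title, source, date_text, abstract, authors):
    """Build one output line with the same eight columns as the script's header."""
    year, month, day = parse_publication_date(date_text)
    # Commas are stripped from each name because the comma separates authors.
    author_field = ",".join(name.replace(",", "") for name in authors)
    fields = [clean(title), source, year, str(month), str(day),
              clean(abstract), author_field, "1"]
    return "\t".join(fields) + "\n"
```

For example, `parse_publication_date("2023 Jul 25")` returns `("2023", 7, 25)`, and a date with an unrecognized second token such as `"2023 Spring"` falls back to month 1, as in the script.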
Conversion/ToTSV/pubmedCorpusToTSV/sample/pubmed-Biologie-set+abstract.txt
0 → 100644
This source diff could not be displayed because it is too large.
Conversion/ToTSV/pubmedCorpusToTSV/sample/pubmed-Biologie-set.txt
0 → 100644
This diff is collapsed.