Description

This R project contains scripts that format the database of clinical trials on COVID-19 into Gargantext-readable files (see http://Gargarntext.org).

The database should be in TSV format (fields separated by tabs, no text delimiters) and encoded in UTF-8.
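The format requirements above can be checked before loading. The following is a minimal sketch in base R, assuming a hypothetical helper `check_tsv()` that is not part of `coronalib.R`; the demonstration file and its trial identifiers are made up.

```r
# Hypothetical helper (not part of coronalib.R): basic sanity checks on an input file.
check_tsv <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Number of tab-separated fields per line; quoting and comment characters are disabled
  n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  list(
    utf8 = all(validUTF8(lines)),                        # TRUE if every line is valid UTF-8
    consistent_columns = length(unique(n_fields)) == 1   # TRUE if all lines have the same field count
  )
}

# Demonstration on a small temporary file with made-up content
tmp <- tempfile(fileext = ".csv")
writeLines(c("Trial.registration.number\tStudy.aim",
             "TRIAL-1\tPrevention",
             "TRIAL-2\tTreatment"), tmp)
res <- check_tsv(tmp)
print(res)
```

If either element of the result is `FALSE`, the file probably needs to be re-exported before it is passed to the scripts.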

Loading the data

First, define the name of the file to be processed. This file should be in the data/ folder.

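library(lubridate)
library(dplyr) # provides filter(), used below
source("coronalib.R") # project helper functions
name <- "database_d_chavalarias_2020-05-15" # base name (without extension) of the file to be loaded

# The file has a .csv extension but is tab-separated
AllData <- read.csv(paste0("data/", name, ".csv"), header = TRUE, sep = "\t")
AllData$Inclusion.criteria <- NULL # drop the inclusion criteria column
AllData$Exclusion.criteria <- NULL # drop the exclusion criteria column
AllData <- filter(AllData, !is.na(Registration.date)) # keep only trials with a registration date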
nrow(AllData)
[1] 1902
x <- unique(AllData$Trial.registration.number)
head(x)[[1]]
[1] NCT04254874
796 Levels:  2020-001113-21 2020-001200-42 2020-001209-22 2020-001246-18 2020-001327-13 2020-001408-41 ... TCTR20200409006

Data segmentation

Several data frames are generated, one for each category of clinical trials (CTs) under study.

library(dplyr)
library(stringr)
Prevention <- filter(AllData, grepl("Prevention", Study.aim)) # CTs tagged Prevention
Treatments <- filter(AllData, grepl("Treatment", Study.aim)) # CTs tagged Treatment
Posttreatment <- filter(AllData, grepl("Post treatment", Study.aim)) # CTs tagged Post treatment
print(paste(nrow(Prevention), " Prevention CTs,", nrow(Treatments), " Treatments CTs and ", nrow(Posttreatment), " Post-treatment CTs."))
[1] "282  Prevention CTs, 1604  Treatments CTs and  15  Post-treatment CTs."
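The tag-based segmentation above can be illustrated on a toy data frame. This is only a sketch: the identifiers and `Study.aim` values are made up, and it assumes that a trial may carry more than one tag (here joined with "; "), which the source does not state. Base R subsetting is used instead of `dplyr::filter()` so that the example runs without extra packages.

```r
# Toy data: identifiers and Study.aim values are made up for illustration
toy <- data.frame(
  Trial.registration.number = c("T1", "T2", "T3", "T4"),
  Study.aim = c("Prevention", "Treatment", "Prevention; Treatment", "Post treatment"),
  stringsAsFactors = FALSE
)

# grepl() is case-sensitive by default, so "Treatment" does not match "Post treatment"
prevention_toy    <- toy[grepl("Prevention", toy$Study.aim), ]
treatments_toy    <- toy[grepl("Treatment", toy$Study.aim), ]
posttreatment_toy <- toy[grepl("Post treatment", toy$Study.aim), ]

# T3 carries two tags and therefore appears in two subsets
print(c(nrow(prevention_toy), nrow(treatments_toy), nrow(posttreatment_toy)))
# prints: [1] 2 2 1
```

Because the match is a substring test, the subsets are not necessarily disjoint, so their sizes need not add up to the number of rows in the full table.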

Data export and visualization

Data are exported in several formats. The list of all treatments is also exported, assuming that treatments are separated by a "+" sign in the treatment column of the original database.

source("coronalib.R")
library(reshape)
library(wordcloud)
library(ggplot2)

# HTML format
# export of a corpus with treatments and outcomes
garg_export_with_html(Treatments,"Treatment") # exports treatments and outcomes of the clinical trials of type Treatment
garg_export_with_html(AllData,"AllData") # exports treatments and outcomes of the clinical trials of all types
garg_export_treatments_with_html(AllData,"AllData") # exports treatments of the clinical trials of all types


# export of the list of all treatments, whatever the phase, in the Gargantext map list format (Gargantext V3 & V4)
gargV4_export_treaments_list(AllData,"AllDb")
gargV3_export_treaments_list(AllData,"AllDb")


# Conversion of the tsv file into a Gargantext-readable tsv file
# Selection of the kind of CT to export: All / Prevention / Treatment / Post-treatment
# Selection of the kind of information to include in the main text to be processed by Gargantext (abstract column): treatments and/or outcomes


# simple txt export
garg_export_all_plain(Treatments,"Treatment") # export main information in plain text
garg_export_OnlyTreatments(Treatments,"Treatment") # export only treatments in plain text
garg_export_OnlyOutcomes(Treatments,"Treatment") # export only outcomes in plain text
garg_export_all_plain(AllData,"All") # export main information in plain text

# raw export (just to have specific maps)
garg_export_raw_treatments(AllData,"AllData") # export only the information relative to treatments, without any formatting

# Some simple visualizations: tag cloud of the treatments per category of CT
TreatmentsCloud(Treatments)
TreatmentsCloud(Prevention)
TreatmentsCloud(Posttreatment)
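
The "+" separator convention mentioned above can be sketched as follows. This is a minimal base R illustration, not the code used in `coronalib.R`: the treatment strings are made up, and the variable names are hypothetical.

```r
# Made-up treatment strings; the actual column content in the database may differ
treatment_col <- c("Hydroxychloroquine + Azithromycin",
                   "Hydroxychloroquine",
                   "Lopinavir + Ritonavir + Hydroxychloroquine")

# Split on a literal "+" (fixed = TRUE avoids treating it as a regex quantifier)
parts <- unlist(strsplit(treatment_col, "+", fixed = TRUE))
parts <- trimws(parts)        # remove the spaces around each name
parts <- parts[parts != ""]   # drop empty pieces, if any

# Frequency of each treatment, most frequent first
freq <- sort(table(parts), decreasing = TRUE)
print(freq)
```

A frequency table of this kind is the usual input for a tag cloud, for example `wordcloud::wordcloud(names(freq), as.vector(freq))`; whether `TreatmentsCloud()` proceeds this way is not stated in the source.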