add qmd data description

96b55c0c · Robin Quillivic · 3d7b36f1 · 96b55c0c · 96b55c0c
Commit 96b55c0c authored Jun 01, 2023 by Robin Quillivic

Showing with 206 additions and 0 deletions

dataset_description.pdf doc/dataset_description.pdf +0 -0

dataset_description.qmd doc/dataset_description.qmd +206 -0

--- a/doc/dataset_description.pdf
+++ b/doc/dataset_description.pdf
--- a/doc/dataset_description.qmd
+++ b/doc/dataset_description.qmd
+---
+title: "Dataset Description for emotional and psychological response paper"
+subtitle: 'Data from "Etude 1000", programme 13-Novembre'
+author : "Robin Quillivic, PhD student"
+jupyter : emo
+date: "2023-06-01"
+execute:
+  echo: false
+format:
+    pdf: default
+    
+    html: 
+      self-contained: true
+      grid: 
+        margin-width: 100px
+
+---
+
+
+# Introduction
+
+This document describes precisely how the raw data were transformed into the final dataset used to analyse the emotional and psychological responses of the participants in Etude 1000.
+
+# Data
+
+## Interview related features
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| code_enqueteur | str | unique id associated with interviewer  |   |
+| interview_date | date | date of the interview of p1  |    |
+| participation_p2 | int | participation in p2  |    |
+| interview_location | str | where interview took place  |    |
+
+{{< pagebreak >}}
+
+## Sociodemographic features
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| age | int | age of participant  | ex: 22  |
+| age_norm | float | age of participant normalized by the maximum age  | ex: 0.43   |
+| birthdate | date | birthdate of participant  |    |
+| sexe | str | M or F for male or female |    |
+| sexe_enc | int | 1 for M |    |
+| code_insee | str | profession categories associated with participant |    |
+| code_insee_fr | str | profession categories associated with participant, in French |    |
+| code_insee_enc | str | ordinal encoding of code_insee |    |
+| education_level | str | number of years after or before Bac |  ex: Bac+2  |
+| education_level_enc | int | number of years after or before Bac |  ex: 2  |
+| education_degree | str | degree name in the Anglo-Saxon system |  ex: Bachelor  |
+| marital_status | str | Marital status of participant |  ex: Single  |
+| single | int | single or not (1 or 0) |  ex: 1  |
+| child_number | int | number of children |  ex: 1  |
+| residence_mode | str | mode of residence |  ex: In FLATSHARE  |
+| living_alone | int | living alone or not |  ex: 1  |
+
+### Distribution of some features
+```{python}
+import pandas as pd
+from IPython.display import display,Markdown
+import warnings
+warnings.filterwarnings('ignore')
+
+data = pd.read_csv('/home/robin/Data/Etude_1000/20230601_socio_and_emotional_data.csv', sep=';', encoding='utf-8')
+
+for col in ["code_insee", "education_degree", "marital_status", "residence_mode"]:
+    display(Markdown(f"### {col}"))  # a bare Markdown(...) inside a loop is never rendered
+    display(data[col].value_counts())
+
+```
+
+{{< pagebreak >}}
+
+## Exposure related features
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| exp_cercle | int | circle of exposure  |   |
+| exp_critereA | str | based on DSM-V and category of testimony  | ex: A4   |
+| exp_exposition | int | exposed or not  |  ex: 1 |
+| exp_testimony_category | str | raw Etude 1000 data  P1_0_9_1 |   |
+| history_personal | int | presence or absence of personal history of trauma or attack |  ex: 1 |
+| history_family | int | presence or absence of family history of trauma or attack |  ex: 0 |
+
+
+```{python}
+
+for col in ["exp_critereA","exp_cercle"] :
+    display(Markdown(f"### {col}"))
+    display(data[col].value_counts()) 
+
+```
+
+{{< pagebreak >}}
+
+## Media and communication related features
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| media_last_month | str | response to III_5_1  | ex: more than once a week   |
+| media_last_month_enc | int | ordinal encoding of media_last_month  | ex: 1  |
+| media_nb_hour_13_14 | int | number of hours of media consumption during the night of 13-14 November  |  ex: 10 |
+| media_cat_hour_13_14 | str | categorical encoding of media_nb_hour_13_14 |  ex: between 5h and 10h |
+| communication_last_month | str | number of communications related to the event in the last month |  ex: "more than once a week" |
+| communication_last_month_enc | int | ordinal encoding of communication_last_month |  ex: 1 |
+| communication_13_14 | int | number of communications during the night of 13-14 November |  ex: 15 |
+
+
+
+### Some distributions related to media and communication
+```{python}
+
+for col in ["media_last_month"] :
+    display(Markdown(f"### {col}"))
+    display(data[col].value_counts()) 
+
+```
+
+{{< pagebreak >}}
+
+## Memory accuracy related features
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| estimation_nb_death | int | participant's estimate of the number of deaths  | ex: 150   |
+| estimation_nb_death_correct | int | is the estimate correct?  | ex: 1   |
+| estimation_nb_death_correct_cat | str | is the number over- or underestimated?  | ex: "strong_over_estimation"   |
+| estimation_nb_attackers | int | participant's estimate of the number of attackers  | ex: 3   |
+| estimation_nb_attackers_correct | int | is the estimate correct?  | ex: 1   |
+| memory_before_event | str | accuracy of memory before the event  | ex: "I remember_precisly"   |
+| memory_before_event_enc | int | ordinal encoding of memory_before_event  | ex: 1   |
+| memory_after_event | str | accuracy of memory after the event  | ex: "I remember_precisly"   |
+| memory_after_event_enc | int | ordinal encoding of memory_after_event  | ex: 1   |
+| memory_event | str | accuracy of memory of the event  | ex: "I remember_precisly"   |
+| memory_event_enc | int | ordinal encoding of memory_event  | ex: 1   |
+| memory_hour_event | str | accuracy of memory of the hour of the event  | ex: "I remember_precisly"   |
+| memory_hour_event_enc | int | ordinal encoding of memory_hour_event  | ex: 1   |
+| memory_crash_tgv_after_event | str | accuracy of memory of the TGV crash 2 days after the event  | ex: "I remember_precisly"   |
+| memory_crash_tgv_after_event_enc | int | ordinal encoding of memory_crash_tgv_after_event  | ex: 1   |
+| memory_crash_plane_before_event | str | accuracy of memory of the plane crash in March 2015 | ex: "I remember_precisly"   |
+| memory_crash_plane_before_event_enc | int | ordinal encoding of memory_crash_plane_before_event  | ex: 1   |
+
+
+
+
+### Some distributions related to memory
+```{python}
+
+for col in ["estimation_nb_death_correct_cat","memory_hour_event"] :
+    display(Markdown(f"### {col}"))
+    display(data[col].value_counts()) 
+
+```
+
+{{< pagebreak >}}
+
+## Psychological features
+
+These features were computed using psychiatrist expertise and 14 responses from the emotional questionnaires.
+
+| Features name | Unit | Description | Remarks  |
+|---------------|------|-------------|----------|
+| diagnosis_confidence_score | float | confidence of the diagnosis |  ex: 0.8  |
+| PTSD_probable | int | whether the participant has probable PTSD |  ex: 1  |
+| partial_PTSD_probable | int | whether the participant has probable partial PTSD |  ex: 1  |
+| full_or_partial_PTSD_probable | int | whether the participant has probable full or partial PTSD |  ex: 1  |
+| CB_probable | int | whether the participant is probable for CB (intrusion symptoms) |  ex: 1  |
+| CC_probable | int | whether the participant is probable for CC (avoidance symptoms) |  ex: 1  |
+| CD_probable | int | whether the participant is probable for CD |  ex: 1  |
+| CD_probable_depression | int | whether the participant is probable for CD with depression |  ex: 1  |
+| CD_probable_dissociation | int | whether the participant is probable for CD with dissociation |  ex: 1  |
+| CE_probable | int | whether the participant is probable for CE (hyperarousal symptoms) |  ex: 1  |
+| CG_probable | int | whether the participant is probable for CG |  ex: 1  |
+
+{{< pagebreak >}}
+
+## Emotional features
+
+List of emotions selected by the participants in response to this question:
+
+> II_5_1_P1_II_5_1_QUELLES_SONT_LES_EMOTIONS_QUE_VOUS_ RESSENTEZ_QUAND_VOUS_PENSEZ_AUX_EVENEMENTS_DU_13_NOVEMBRE_ AU_COURS_DU_MOIS_QUI_VIENT_DE_S_ECOULER
+
+> II_5_1_P1_II_5_1_WHAT_ARE_THE_EMOTIONS_YOU_FEEL_WHEN_YOU_ THINK_ABOUT_THE_EVENTS_OF_13_NOVEMBER_ IN_THE_MONTH_THAT _HAS_JUST_PASSED
+
+### Simple Emotion distribution
+```{python}
+
+for col in ['Surprise',
+ 'Joy',
+ 'Anger',
+ 'Stunned',
+ 'Sadness',
+ 'Satisfaction',
+ 'Emphathy',
+ 'Interest',
+ 'Fear',
+ 'Incomprehension',
+ 'Disgust'] :
+    display(Markdown(f"### {col}"))
+    display(data[col.lower()].value_counts(normalize=True)) 
+
+```
+
+All combinations of two emotions are also computed, for example *surprise_joy*, *surprise_anger*, *surprise_stunned*, *surprise_sadness*, etc.
+
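
The added document says that every combination of two emotions is also computed, but it does not say how. The sketch below shows one plausible way to derive them, assuming a pair such as `surprise_joy` equals 1 only when the participant selected both emotions. The helper `pair_features` and the example row are hypothetical and are not taken from the commit. The emotion names follow the lower-cased list used in the document's last code chunk, including its `emphathy` spelling.

```python
from itertools import combinations

# Emotion columns, lower-cased as in the document's final code chunk.
EMOTIONS = ["surprise", "joy", "anger", "stunned", "sadness", "satisfaction",
            "emphathy", "interest", "fear", "incomprehension", "disgust"]


def pair_features(row, emotions=EMOTIONS):
    """Build {"<a>_<b>": 0 or 1} for every unordered pair of emotions in one row.

    ASSUMPTION: a pair is 1 only if both emotions were selected (logical AND).
    Emotions missing from the row are treated as not selected.
    """
    return {
        f"{a}_{b}": int(bool(row.get(a, 0)) and bool(row.get(b, 0)))
        for a, b in combinations(emotions, 2)
    }


# Made-up participant who selected surprise and anger only.
example_row = {"surprise": 1, "anger": 1}
features = pair_features(example_row)

print(len(features))               # 11 emotions give 55 unordered pairs
print(features["surprise_anger"])  # 1: both emotions selected
print(features["surprise_joy"])    # 0: joy not selected
```

Because `combinations` preserves the input order, the generated names start with `surprise_joy`, `surprise_anger`, `surprise_stunned`, `surprise_sadness`, which matches the examples quoted in the document. If the real pipeline uses a different rule (for instance a sum or a categorical label), only the expression inside the dictionary comprehension would need to change.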