Commit a26bb0f6 authored by delanoe

Merge branch 'romain-reintegration-graphExplorer' into anoe-graph

parents b2067a0c f5314bfa
# Install Instructions for Gargantext (CNRS):
## Get the source code
Clone gargantext into /srv/gargantext:
``` bash
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
&& cd /srv/gargantext \
&& git fetch origin stable \
&& git checkout stable
```
The folder will be /srv/gargantext:
* docs contains all the documentation on gargantext:
/srv/gargantext/docs/
* install contains all the installation files:
/srv/gargantext/install/

Prepare your environment and make the initial installation.
Once you have set up and installed the Gargantext box, you can use the ./install/run.sh utility
to load the gargantext web platform and access it through your web browser, as sketched below.
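For example, once setup and install are done, a typical session looks like this (a sketch; the paths assume the default /srv/gargantext layout):
``` bash
# load the gargantext web platform...
cd /srv/gargantext
./install/run.sh
# ...then access it through your browser
chromium http://127.0.0.1:8000/
```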
Help needed?
See [http://gargantext.org/about](http://gargantext.org/about) and [tools](./contribution_guide.md) for the community.
______________________________
Two installation procedures are provided:
1. Semi-automatic installation [EASY]
2. Step-by-step installation [ADVANCED]

Only the semi-automatic installation is covered here; checkout [manual_install](manual_install.md)
to follow the step-by-step procedure.

It goes through four stages:
1. [Prerequisites](#prerequisites)
2. [Setup](#setup)
3. [Install](#install)
4. [Run](#run)
______________________________
# Semi-automatic installation
All the procedure files are located in /srv/gargantext/install/
``` bash
user@computer:$ cd /srv/gargantext/install/
```
## Prerequisites
* A Debian-based OS >= [FIXME]
* At least 35GB free in /srv/ [FIXME]
  * todo: reduce the size of gargantext lib
  * todo: remove lib once docker is configured
  * tip: if /srv/ is too small but you have enough space for the full package elsewhere, you can:
    * resize your partition
    * make a symlink to gargantext_lib (see the sketch after this list)
* A [docker engine installation](https://docs.docker.com/engine/installation/linux/)
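A minimal sketch of the symlink workaround (the /data path is hypothetical; adapt it to wherever you have enough space):
``` bash
# keep the heavy gargantext_lib on a roomy partition, linked from /srv
mv /srv/gargantext_lib /data/gargantext_lib
ln -s /data/gargantext_lib /srv/gargantext_lib
```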
## Setup
Prepare your environment and make the initial setup.
Setup can be done in 2 ways:
* [automatic setup](setup.sh): run the provided setup script (see the example below)
* [manual setup](manual_setup.md): follow this procedure if you want to change some parameters
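The automatic route boils down to running the script from the install folder (a sketch, assuming setup.sh sits in /srv/gargantext/install/):
``` bash
cd /srv/gargantext/install/
./setup.sh
```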
## Install
Two installation procedures are currently proposed:
* the docker way [EASY]
* the debian way [ADVANCED]

### Docker way [EASY]
#### Init setup
This initial step creates a user for the gargantext platform and downloads the additional libs and files.
* Install docker
See the [installation instructions for your distribution](https://docs.docker.com/engine/installation/)
* Build your docker image
This builds the docker image and the gargantext box:
``` bash
cd /srv/gargantext/install/docker/dev
./build
# or, equivalently:
# ID=$(docker build .) && docker run -i -t $ID
```
You should see
```
Successfully built <container_id>
```
#### Install
Once the init step is done:
* Enter the docker environment
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
* Go to the installation folder
``` bash
root@dockerimage8989809:$ cd /srv/gargantext/install/
```
[HERE] Check whether the postgresql and python configurations are already done upstream when the docker image is created
* Install Python environment
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/python/configure
```
* Configure PostgreSQL
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/postgres/configure
```
* Exit the docker
```
exit (or Ctrl+D)
```
[If OK] remove these lines
Install the Gargantext server
* Enter docker container
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
* Configure the database
Inside the docker container:
``` bash
service postgresql start
#su gargantua
#activate the virtualenv
source /srv/env_3-5/bin/activate
```
You have now entered the virtualenv, as shown by the (env_3-5) prompt:
``` bash
(env_3-5) $ python /srv/gargantext/dbmigrate.py
(env_3-5) $ /srv/gargantext/manage.py makemigrations
(env_3-5) $ /srv/gargantext/manage.py migrate
(env_3-5) $ python /srv/gargantext/dbmigrate.py
#will create the tables, but not hyperdata_nodes
(env_3-5) $ python /srv/gargantext/dbmigrate.py
#will create table hyperdata_nodes
#launch the server a first time to create the first user
(env_3-5) $ /srv/gargantext/manage.py runserver 0.0.0.0:8000
(env_3-5) $ /srv/gargantext/init_accounts.py /srv/gargantext/install/init/account.csv
```
FIXME: dbmigrate needs to be launched several times since tables are
created in alphabetical order (and not in dependency order)
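Until that is fixed, the workaround is simply to rerun the script: each pass creates the tables whose dependencies already exist.
``` bash
(env_3-5) $ python /srv/gargantext/dbmigrate.py
(env_3-5) $ python /srv/gargantext/dbmigrate.py  # repeat until no table is missing
```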
### Debian way [ADVANCED]
See the step-by-step procedure in [manual_install](manual_install.md).

* Exit the docker
```
exit (or Ctrl+D)
```
## Run Gargantext
Enter the docker container:
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
Inside the docker container:
``` bash
#start Database (postgresql)
service postgresql start
#change to user
su gargantua
#activate the virtualenv
source /srv/env_3-5/bin/activate
#go to gargantext srv
(env_3-5) $ cd /srv/gargantext/
#run the server
(env_3-5) $ ./manage.py runserver 0.0.0.0:8000
```
* Launch a browser
Keep the server running and, outside the docker, launch your browser:
``` bash
chromium http://127.0.0.1:8000/
```
* Click on Test Gargantext
```
Login : gargantua
Password : autnagrag
```
Enjoy :)
See the [User Guide](/demo/tuto.md) for a quick usage example.
......@@ -12,14 +12,16 @@ LISTTYPES = {
'STOPLIST' : UnweightedList,
'MAINLIST' : UnweightedList,
'MAPLIST' : UnweightedList,
'SPECCLUSION' : WeightedList,
'GENCLUSION' : WeightedList,
'OCCURRENCES' : WeightedIndex, # could be WeightedList
'COOCCURRENCES': WeightedMatrix,
'TFIDF-CORPUS' : WeightedIndex,
'TFIDF-GLOBAL' : WeightedIndex,
'TIRANK-LOCAL' : WeightedIndex, # could be WeightedList
'TIRANK-GLOBAL' : WeightedIndex, # could be WeightedList
}
# 'OWNLIST' : UnweightedList, # £TODO use this for any term-level tags
NODETYPES = [
# TODO separate id not array index, read by models.node
......@@ -37,7 +39,7 @@ NODETYPES = [
'COOCCURRENCES', # 9
# scores
'OCCURRENCES', # 10
'SPECCLUSION', # 11
'CVALUE', # 12
'TFIDF-CORPUS', # 13
'TFIDF-GLOBAL', # 14
......@@ -47,6 +49,7 @@ NODETYPES = [
# more scores (sorry!)
'TIRANK-LOCAL', # 16
'TIRANK-GLOBAL', # 17
'GENCLUSION', # 18
]
INDEXED_HYPERDATA = {
......@@ -222,12 +225,16 @@ DEFAULT_RANK_CUTOFF_RATIO = .75 # MAINLIST maximum terms in %
DEFAULT_RANK_HARD_LIMIT = 5000 # MAINLIST maximum terms abs
# (makes COOCS larger ~ O(N²) /!\)
DEFAULT_COOC_THRESHOLD = 3 # inclusive minimum for COOCS coefs
# (makes COOCS more sparse)
DEFAULT_MAPLIST_MAX = 350 # MAPLIST maximum terms
DEFAULT_MAPLIST_MONOGRAMS_RATIO = .2 # quota of monograms in MAPLIST
# (vs multigrams = 1-mono)
DEFAULT_MAPLIST_GENCLUSION_RATIO = .6 # quota of top genclusion in MAPLIST
# (vs top specclusion = 1-gen)
DEFAULT_MAX_NGRAM_LEN = 7 # limit used after POStagging rule
# (initial ngrams number is a power law of this /!\)
......@@ -272,7 +279,7 @@ DOWNLOAD_DIRECTORY = UPLOAD_DIRECTORY
# about batch processing...
BATCH_PARSING_SIZE = 256
BATCH_NGRAMSEXTRACTION_SIZE = 3000 # how many distinct ngrams before INTEGRATE
# Scrapers config
......@@ -282,7 +289,7 @@ QUERY_SIZE_N_DEFAULT = 1000
# Grammar rules for chunking
RULE_JJNN = "{<JJ.*>*<NN.*|>+<JJ.*>*}"
RULE_NPN = "{<JJ.*>*<NN.*>+((<P|IN> <DT>? <JJ.*>* <NN.*>+ <JJ.*>*)|(<JJ.*>))*}"
RULE_TINA = "^((VBD,|VBG,|VBN,|CD.?,|JJ.?,|\?,){0,2}?(N.?.?,|\?,)+?(CD.,)??)\
+?((PREP.?|DET.?,|IN.?,|CC.?,|\?,)((VBD,|VBG,|VBN,|CD.?,|JJ.?,|\?\
,){0,2}?(N.?.?,|\?,)+?)+?)*?$"
......@@ -42,6 +42,9 @@ CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_IMPORTS = ("gargantext.util.toolchain", "graph.cooccurrences")
# garg's custom unittests runner (adapted to our db models)
TEST_RUNNER = 'unittests.framework.GargTestRunner'
# Application definition
INSTALLED_APPS = [
......@@ -123,6 +126,9 @@ DATABASES = {
'PASSWORD': 'C8kdcUrAQy66U',
'HOST': '127.0.0.1',
'PORT': '5432',
'TEST': {
'NAME': 'test_gargandb',
},
}
}
......
......@@ -19,7 +19,7 @@ from gargantext.constants import DEFAULT_CSV_DELIM, DEFAULT_CSV_DELIM_GRO
# import will implement the same text cleaning procedures as toolchain
from gargantext.util.toolchain.parsing import normalize_chars
from gargantext.util.toolchain.ngrams_extraction import normalize_forms
from sqlalchemy.sql import exists
from os import path
......
from gargantext.util.languages import languages
from gargantext.constants import LANGUAGES, DEFAULT_MAX_NGRAM_LEN, RULE_JJNN, RULE_NPN
import nltk
import re
......
......@@ -39,11 +39,11 @@ def do_mainlist(corpus,
# retrieve helper nodes if not provided
if not ranking_scores_id:
ranking_scores_id = session.query(Node.id).filter(
Node.typename == "TFIDF-GLOBAL",
Node.typename == "TIRANK-GLOBAL",
Node.parent_id == corpus.id
).first()
if not ranking_scores_id:
raise ValueError("MAINLIST: TFIDF node needed for mainlist creation")
raise ValueError("MAINLIST: TIRANK node needed for mainlist creation")
if not stoplist_id:
stoplist_id = session.query(Node.id).filter(
......
......@@ -9,37 +9,49 @@ from gargantext.util.db_cache import cache
from gargantext.util.lists import UnweightedList
from sqlalchemy import desc, asc
from gargantext.constants import DEFAULT_MAPLIST_MAX,\
DEFAULT_MAPLIST_GENCLUSION_RATIO,\
DEFAULT_MAPLIST_MONOGRAMS_RATIO
def do_maplist(corpus,
overwrite_id = None,
mainlist_id = None,
specclusion_id = None,
genclusion_id = None,
grouplist_id = None,
limit=DEFAULT_MAPLIST_MAX,
genclusion_part=DEFAULT_MAPLIST_GENCLUSION_RATIO,
monograms_part=DEFAULT_MAPLIST_MONOGRAMS_RATIO
):
'''
According to Genericity/Specificity and mainlist
Parameters:
- mainlist_id (starting point, already cleaned of stoplist terms)
- specclusion_id (ngram inclusion by cooc specificity -- ranking factor)
- genclusion_id (ngram inclusion by cooc genericity -- ranking factor)
- grouplist_id (filtering grouped ones)
- overwrite_id: optional if preexisting MAPLIST node to overwrite
+ 3 params to modulate the terms choice
- limit for the amount of picked terms
- monograms_part: a ratio of terms with only one lexical unit to keep
(multigrams quota = limit * (1-monograms_part))
- genclusion_part: the share of terms taken from the top of the genericity ranking
(speclusion quota = limit * (1-genclusion_part))
'''
if not (mainlist_id and specclusion_id and genclusion_id and grouplist_id):
raise ValueError("Please provide mainlist_id, specclusion_id, genclusion_id and grouplist_id")
quotas = {'topgen':{}, 'topspec':{}}
genclusion_limit = round(limit * genclusion_part)
speclusion_limit = limit - genclusion_limit
quotas['topgen']['monograms'] = round(genclusion_limit * monograms_part)
quotas['topgen']['multigrams'] = genclusion_limit - quotas['topgen']['monograms']
quotas['topspec']['monograms'] = round(speclusion_limit * monograms_part)
quotas['topspec']['multigrams'] = speclusion_limit - quotas['topspec']['monograms']
print("MAPLIST quotas:", quotas)
#dbg = DebugTime('Corpus #%d - computing Miam' % corpus.id)
......@@ -54,11 +66,19 @@ def do_maplist(corpus,
)
ScoreSpec=aliased(NodeNgram)
ScoreGen=aliased(NodeNgram)
# ngram with both ranking factors spec and gen
query = (session.query(
ScoreSpec.ngram_id,
ScoreSpec.weight,
ScoreGen.weight,
Ngram.n
)
.join(Ngram, Ngram.id == ScoreSpec.ngram_id)
.join(ScoreGen, ScoreGen.ngram_id == ScoreSpec.ngram_id)
.filter(ScoreSpec.node_id == specclusion_id)
.filter(ScoreGen.node_id == genclusion_id)
# we want only terms within mainlist
.join(MainlistTable, Ngram.id == MainlistTable.ngram_id)
......@@ -68,36 +88,99 @@ def do_maplist(corpus,
.outerjoin(IsSubform,
IsSubform.c.ngram2_id == ScoreSpec.ngram_id)
.filter(IsSubform.c.ngram2_id == None)
)
# format in scored_ngrams array:
# -------------------------------
# [(37723, 8.428, 14.239, 3 ), etc]
# ngramid wspec wgen nwords
scored_ngrams = query.all()
n_ngrams = len(scored_ngrams)
if n_ngrams == 0:
raise ValueError("No ngrams in cooc table ?")
# results, with same structure as quotas
chosen_ngrams = {
'topgen':{'monograms':[], 'multigrams':[]},
'topspec':{'monograms':[], 'multigrams':[]}
}
# specificity and genericity are rather reverse-correlated
# but occasionally they can have common ngrams (same ngram well ranked in both)
# => we'll use a lookup table to check if we didn't already get it
already_gotten_ngramids = {}
# 2 loops to fill spec-clusion then gen-clusion quotas
# (1st loop uses order from DB, 2nd loop uses our own sort at end of 1st)
for rkr in ['topspec', 'topgen']:
got_enough_mono = False
got_enough_multi = False
all_done = False
i = -1
while((not all_done) and (not (got_enough_mono and got_enough_multi))):
# retrieve sorted ngram n° i
i += 1
(ng_id, wspec, wgen, nwords) = scored_ngrams[i]
# before any continue case, we check the next i for max reached
all_done = (i+1 >= n_ngrams)
if ng_id in already_gotten_ngramids:
continue
# NB: nwords could be replaced by a simple search on r' '
if nwords == 1:
if got_enough_mono:
continue
else:
# add ngram to results and lookup
chosen_ngrams[rkr]['monograms'].append(ng_id)
already_gotten_ngramids[ng_id] = True
# multi
else:
if got_enough_multi:
continue
else:
# add ngram to results and lookup
chosen_ngrams[rkr]['multigrams'].append(ng_id)
already_gotten_ngramids[ng_id] = True
got_enough_mono = (len(chosen_ngrams[rkr]['monograms']) >= quotas[rkr]['monograms'])
got_enough_multi = (len(chosen_ngrams[rkr]['multigrams']) >= quotas[rkr]['multigrams'])
# at the end of the first loop we just need to sort all by the second ranker (gen)
scored_ngrams = sorted(scored_ngrams, key=lambda ng_infos: ng_infos[2], reverse=True)
obtained_spec_mono = len(chosen_ngrams['topspec']['monograms'])
obtained_spec_multi = len(chosen_ngrams['topspec']['multigrams'])
obtained_gen_mono = len(chosen_ngrams['topgen']['monograms'])
obtained_gen_multi = len(chosen_ngrams['topgen']['multigrams'])
obtained_total = obtained_spec_mono \
+ obtained_spec_multi \
+ obtained_gen_mono \
+ obtained_gen_multi
print("MAPLIST: top_spec_monograms =", obtained_spec_mono)
print("MAPLIST: top_spec_multigrams =", obtained_spec_multi)
print("MAPLIST: top_gen_monograms =", obtained_gen_mono)
print("MAPLIST: top_gen_multigrams =", obtained_gen_multi)
print("MAPLIST: kept %i ngrams in total " % obtained_total)
obtained_data = chosen_ngrams['topspec']['monograms'] \
+ chosen_ngrams['topspec']['multigrams'] \
+ chosen_ngrams['topgen']['monograms'] \
+ chosen_ngrams['topgen']['multigrams']
# NEW MAPLIST NODE
# -----------------
# saving the parameters of the analysis in the Node JSON
new_hyperdata = { 'corpus': corpus.id,
'limit' : limit,
'monograms_part' : monograms_part,
'genclusion_part' : genclusion_part,
}
if overwrite_id:
# overwrite pre-existing node
......@@ -118,9 +201,7 @@ def do_maplist(corpus,
the_id = the_maplist.id
# create UnweightedList object and save (=> new NodeNgram rows)
datalist = UnweightedList(obtained_data)
# save
datalist.save(the_id)
......
......@@ -10,8 +10,8 @@ from .ngram_groups import compute_groups
from .metric_tfidf import compute_occs, compute_tfidf_local, compute_ti_ranking
from .list_main import do_mainlist
from .ngram_coocs import compute_coocs
from .metric_specgen import compute_specgen
from .list_map import do_maplist
from .mail_notification import notify_owner
from gargantext.util.db import session
from gargantext.models import Node
......@@ -136,22 +136,26 @@ def parse_extract_indexhyperdata(corpus):
# => used for doc <=> ngram association
# ------------
# -> cooccurrences on mainlist: compute + write (=> Node and NodeNgramNgram)*
coocs = compute_coocs(corpus,
on_list_id = mainlist_id,
groupings_id = group_id,
just_pass_result = True,
diagonal_filter = False) # preserving the diagonal
# (useful for spec/gen)
print('CORPUS #%d: [%s] computed mainlist coocs for specif rank' % (corpus.id, t()))
# -> specclusion/genclusion: compute + write (2 Nodes + 2 lists in NodeNgram)
(spec_id, gen_id) = compute_specgen(corpus,cooc_matrix = coocs)
# no need here for subforms because cooc already counted them in mainform
print('CORPUS #%d: [%s] new spec-clusion node #%i' % (corpus.id, t(), spec_id))
print('CORPUS #%d: [%s] new gen-clusion node #%i' % (corpus.id, t(), gen_id))
# maplist: compute + write (to Node and NodeNgram)
map_id = do_maplist(corpus,
mainlist_id = mainlist_id,
specclusion_id=spec_id,
genclusion_id=gen_id,
grouplist_id=group_id
)
print('CORPUS #%d: [%s] new maplist node #%i' % (corpus.id, t(), map_id))
......@@ -187,7 +191,7 @@ def recount(corpus):
- ndocs
- ti_rank
- coocs
- specclusion/genclusion
- tfidf
NB: no new extraction, no list change, just the metrics
......@@ -208,10 +212,15 @@ def recount(corpus):
old_tirank_id = None
try:
old_spec_id = corpus.children("SPECIFICITY").first().id
old_spec_id = corpus.children("SPECCLUSION").first().id
except:
old_spec_id = None
try:
old_gen_id = corpus.children("GENCLUSION").first().id
except:
old_gen_id = None
try:
old_ltfidf_id = corpus.children("TFIDF-CORPUS").first().id
except:
......@@ -254,11 +263,13 @@ def recount(corpus):
just_pass_result = True)
print('RECOUNT #%d: [%s] updated mainlist coocs for specif rank' % (corpus.id, t()))
# -> specclusion/genclusion: compute + write (=> NodeNodeNgram)
(spec_id, gen_id) = compute_specgen(corpus, cooc_matrix = coocs,
spec_overwrite_id = old_spec_id, gen_overwrite_id = old_gen_id)
print('RECOUNT #%d: [%s] updated spec-clusion node #%i' % (corpus.id, t(), spec_id))
print('RECOUNT #%d: [%s] updated gen-clusion node #%i' % (corpus.id, t(), gen_id))
print('RECOUNT #%d: [%s] FINISHED metric recounts' % (corpus.id, t()))
......
"""
Computes genericity/specificity ("gen/spec-clusion") metrics from the ngram cooccurrence matrix.
+ SAVE => WeightedList => NodeNgram
"""
from gargantext.models import Node, Ngram, NodeNgram, NodeNgramNgram
from gargantext.util.db import session, aliased, func, bulk_insert
from gargantext.util.lists import WeightedList
from collections import defaultdict
from pandas import DataFrame
from numpy import diag
def round3(floating_number):
"""
Rounds a floating number to 3 decimals
Good when we don't need so much detail in the data written to the DB
"""
return float("%.3f" % floating_number)
def compute_specgen(corpus, cooc_id=None, cooc_matrix=None,
spec_overwrite_id = None, gen_overwrite_id = None):
'''
Compute genericity/specificity:
P(j|i) = N(ij) / N(ii)
P(i|j) = N(ij) / N(jj)
Gen(i) = Sum{j} P(j_k|i)
Spec(i) = Sum{j} P(i|j_k)
Gen-clusion(i) = (Spec(i) + Gen(i)) / 2
Spec-clusion(i) = (Spec(i) - Gen(i)) / 2
(NB: the code below drops the constant 1/2 factor, which does not change the ranking)
Parameters:
- cooc_id: mandatory id of a cooccurrences node to use as base
(alternatively cooc_matrix: an in-memory WeightedMatrix)
- spec_overwrite_id: optional preexisting specificity node to overwrite
- gen_overwrite_id: optional preexisting genericity node to overwrite
'''
matrix = defaultdict(lambda : defaultdict(float))
if cooc_id == None and cooc_matrix == None:
raise TypeError("compute_specificity: needs a cooc_id or cooc_matrix param")
elif cooc_id:
cooccurrences = (session.query(NodeNgramNgram)
.filter(NodeNgramNgram.node_id==cooc_id)
)
# no filtering: cooc already filtered on mainlist_id at creation
for cooccurrence in cooccurrences:
matrix[cooccurrence.ngram1_id][cooccurrence.ngram2_id] = cooccurrence.weight
# matrix[cooccurrence.ngram2_id][cooccurrence.ngram1_id] = cooccurrence.weight
elif cooc_matrix:
# copy WeightedMatrix into local matrix structure
for (ngram1_id, ngram2_id) in cooc_matrix.items:
w = cooc_matrix.items[(ngram1_id, ngram2_id)]
# ------- 8< --------------------------------------------
# tempo hack to ignore lines/columns where diagonal == 0
# £TODO find why they exist and then remove this snippet
if (((ngram1_id,ngram1_id) not in cooc_matrix.items) or
((ngram2_id,ngram2_id) not in cooc_matrix.items)):
continue
# ------- 8< --------------------------------------------
matrix[ngram1_id][ngram2_id] = w
nb_ngrams = len(matrix)
print("SPECIFICITY: computing on %i ngrams" % nb_ngrams)
# example corpus (7 docs, 8 nouns)
# --------------------------------
# "The report says that humans are animals."
# "The report says that rivers are full of water."
# "The report says that humans like to make war."
# "The report says that animals must eat food."
# "The report says that animals drink water."
# "The report says that humans like food and water."
# "The report says that grass is food for some animals."
#===========================================================================
cooc_counts = DataFrame(matrix).fillna(0)
# cooc_counts matrix
# ------------------
# animals food grass humans report rivers war water
# animals 4 2 1 1 4 0 0 1
# food 2 3 1 1 3 0 0 1
# grass 1 1 1 0 1 0 0 0
# humans 1 1 0 3 3 0 1 1
# report 4 3 1 3 7 1 1 3
# rivers 0 0 0 0 1 1 0 1
# war 0 0 0 1 1 0 1 0
# water 1 1 0 1 3 1 0 3
#===========================================================================
# conditional p(col|line)
diagonal = list(diag(cooc_counts))
# debug
# print("WARN diag: ", diagonal)
# print("WARN diag: =================== 0 in diagonal ?\n",
# 0 in diagonal ? "what ??? zeros in the diagonal :/" : "ok no zeros",
# "\n===================")
p_col_given_line = cooc_counts / list(diag(cooc_counts))
# p_col_given_line
# ----------------
# animals food grass humans report rivers war water
# animals 1.0 0.7 1.0 0.3 0.6 0.0 0.0 0.3
# food 0.5 1.0 1.0 0.3 0.4 0.0 0.0 0.3
# grass 0.2 0.3 1.0 0.0 0.1 0.0 0.0 0.0
# humans 0.2 0.3 0.0 1.0 0.4 0.0 1.0 0.3
# report 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
# rivers 0.0 0.0 0.0 0.0 0.1 1.0 0.0 0.3
# war 0.0 0.0 0.0 0.3 0.1 0.0 1.0 0.0
# water 0.2 0.3 0.0 0.3 0.4 1.0 0.0 1.0
#===========================================================================
# total per lines (<=> genericity)
Gen = p_col_given_line.sum(axis=1)
# Gen.sort_values(ascending=False)
# ---
# report 8.0
# animals 3.9
# food 3.6
# water 3.3
# humans 3.3
# grass 1.7
# war 1.5
# rivers 1.5
#===========================================================================
# total columnwise (<=> specificity)
Spec = p_col_given_line.sum(axis=0)
# Spec.sort_values(ascending=False)
# ----
# grass 4.0
# food 3.7
# water 3.3
# humans 3.3
# report 3.3
# animals 3.2
# war 3.0
# rivers 3.0
#===========================================================================
# our "inclusion by specificity" metric
Specclusion = Spec-Gen
# Specclusion.sort_values(ascending=False)
# -----------
# grass 1.1
# war 0.8
# rivers 0.8
# food 0.0
# humans -0.0
# water -0.0
# animals -0.3
# report -2.4
#===========================================================================
# our "inclusion by genericity" metric
Genclusion = Spec+Gen
# Genclusion.sort_values(ascending=False)
# -----------
# report 11.3
# food 7.3
# animals 7.2
# water 6.7
# humans 6.7
# grass 5.7
# war 4.5
# rivers 4.5
#===========================================================================
# specificity node
if spec_overwrite_id:
# overwrite pre-existing id
the_spec_id = spec_overwrite_id
session.query(NodeNgram).filter(NodeNgram.node_id==the_spec_id).delete()
session.commit()
else:
specnode = corpus.add_child(
typename = "SPECCLUSION",
name = "Specclusion (in:%s)" % corpus.id
)
session.add(specnode)
session.commit()
the_spec_id = specnode.id
if not Specclusion.empty:
data = WeightedList(
zip( Specclusion.index.tolist()
, [v for v in map(round3, Specclusion.values.tolist())]
)
)
data.save(the_spec_id)
else:
print("WARNING: had no terms in COOCS => empty SPECCLUSION node")
#===========================================================================
# genclusion node
if gen_overwrite_id:
the_gen_id = gen_overwrite_id
session.query(NodeNgram).filter(NodeNgram.node_id==the_gen_id).delete()
session.commit()
else:
gennode = corpus.add_child(
typename = "GENCLUSION",
name = "Genclusion (in:%s)" % corpus.id
)
session.add(gennode)
session.commit()
the_gen_id = gennode.id
if not Genclusion.empty:
data = WeightedList(
zip( Genclusion.index.tolist()
, [v for v in map(round3, Genclusion.values.tolist())]
)
)
data.save(the_gen_id)
else:
print("WARNING: had no terms in COOCS => empty GENCLUSION node")
#===========================================================================
return(the_spec_id, the_gen_id)
"""
Computes a specificity metric from the ngram cooccurrence matrix.
+ SAVE => WeightedList => NodeNgram
"""
from gargantext.models import Node, Ngram, NodeNgram, NodeNgramNgram
from gargantext.util.db import session, aliased, func, bulk_insert
from gargantext.util.lists import WeightedList
from collections import defaultdict
from pandas import DataFrame
import pandas as pd
def compute_specificity(corpus, cooc_id=None, cooc_matrix=None, overwrite_id = None):
'''
Compute the specificity, simple calculus.
Parameters:
- cooc_id: mandatory id of a cooccurrences node to use as base
- overwrite_id: optional preexisting specificity node to overwrite
'''
matrix = defaultdict(lambda : defaultdict(float))
if cooc_id == None and cooc_matrix == None:
raise TypeError("compute_specificity: needs a cooc_id or cooc_matrix param")
elif cooc_id:
cooccurrences = (session.query(NodeNgramNgram)
.filter(NodeNgramNgram.node_id==cooc_id)
)
# no filtering: cooc already filtered on mainlist_id at creation
for cooccurrence in cooccurrences:
matrix[cooccurrence.ngram1_id][cooccurrence.ngram2_id] = cooccurrence.weight
matrix[cooccurrence.ngram2_id][cooccurrence.ngram1_id] = cooccurrence.weight
elif cooc_matrix:
# copy WeightedMatrix into local matrix structure
for (ngram1_id, ngram2_id) in cooc_matrix.items:
w = cooc_matrix.items[(ngram1_id, ngram2_id)]
matrix[ngram1_id][ngram2_id] = w
nb_ngrams = len(matrix)
print("SPECIFICITY: computing on %i ngrams" % nb_ngrams)
x = DataFrame(matrix).fillna(0)
# proba (x/y) ( <= each row divided by its total)
x = x / x.sum(axis=1)
# vectorisation
# d:Matrix => v: Vector (len = nb_ngrams)
# v = d.sum(axis=1) (minus itself)
xs = x.sum(axis=1) - x
ys = x.sum(axis=0) - x
# top inclus ou exclus
#n = ( xs + ys) / (2 * (x.shape[0] - 1))
# top generic or specific (asc is spec, desc is generic)
v = ( xs - ys) / ( 2 * (x.shape[0] - 1))
## d ##
#######
# Grenelle biodiversité kilomètres site élus île
# Grenelle 0 0 4 0 0 0
# biodiversité 0 0 0 0 4 0
# kilomètres 4 0 0 0 4 0
# site 0 0 0 0 4 6
# élus 0 4 4 4 0 0
# île 0 0 0 6 0 0
## d.sum(axis=1) ##
###################
# Grenelle 4
# biodiversité 4
# kilomètres 8
# site 10
# élus 12
# île 6
# temporary result
# ----------------
# for now we use the row sums as the specificity ranking
# (**same** order as with the pre-refactoring formula, but simpler to compute)
# TODO check the mathematical AND semantic coherence of this indicator
#v.sort_values(inplace=True)
# [ ('biodiversité' , 0.333 ),
# ('Grenelle' , 0.5 ),
# ('île' , 0.599 ),
# ('kilomètres' , 1.333 ),
# ('site' , 1.333 ),
# ('élus' , 1.899 ) ]
# ----------------
# specificity node
if overwrite_id:
# overwrite pre-existing id
the_id = overwrite_id
session.query(NodeNgram).filter(NodeNgram.node_id==the_id).delete()
session.commit()
else:
specnode = corpus.add_child(
typename = "SPECIFICITY",
name = "Specif (in:%s)" % corpus.id
)
session.add(specnode)
session.commit()
the_id = specnode.id
# print(v)
pd.options.display.float_format = '${:,.2f}'.format
if not v.empty:
data = WeightedList(
zip( v.index.tolist()
, v.values.tolist()[0]
)
)
data.save(the_id)
else:
print("WARNING: had no terms in COOCS => empty SPECIFICITY node")
return(the_id)
......@@ -18,7 +18,8 @@ def compute_coocs( corpus,
stoplist_id = None,
start = None,
end = None,
symmetry_filter = False,
diagonal_filter = True):
"""
Count how often some extracted terms appear
together in a small context (document)
......@@ -55,6 +56,9 @@ def compute_coocs( corpus,
NB the expected type of parameter value is datetime.datetime
(string is also possible but format must follow
this convention: "2001-01-01" aka "%Y-%m-%d")
- symmetry_filter: prevent calculating where ngram1_id > ngram2_id
- diagonal_filter: prevent calculating where ngram1_id == ngram2_id
(deprecated parameters)
- field1,2: allowed to count other things than ngrams (eg tags) but no use case at present
......@@ -69,7 +73,7 @@ def compute_coocs( corpus,
JOIN nodes_ngrams AS idxb
ON idxa.node_id = idxb.node_id <== that's cooc
---------------------------------
AND idxa.ngram_id <> idxb.ngram_id (diagonal_filter)
AND idxa.node_id = MY_DOC ;
on entire corpus
......@@ -152,16 +156,14 @@ def compute_coocs( corpus,
ucooc
# for debug (2/4)
# , Xngram.terms.label("w_x")
# , Yngram.terms.label("w_y")
)
.join(Yindex, Xindex.node_id == Yindex.node_id ) # <- by definition of cooc
.join(Node, Node.id == Xindex.node_id) # <- b/c within corpus
.filter(Node.parent_id == corpus.id) # <- b/c within corpus
.filter(Node.typename == "DOCUMENT") # <- b/c within corpus
)
# outerjoin the synonyms if needed
......@@ -179,12 +181,12 @@ def compute_coocs( corpus,
.group_by(
Xindex_ngform_id, Yindex_ngform_id # <- what we're counting
# for debug (3/4)
#,"w_x", "w_y"
# ,"w_x", "w_y"
)
# for debug (4/4)
# .join(Xngram, Xngram.id == Xindex_ngform_id)
# .join(Yngram, Yngram.id == Yindex_ngform_id)
.order_by(ucooc)
)
......@@ -192,6 +194,9 @@ def compute_coocs( corpus,
# 4) INPUT FILTERS (reduce N before O(N²))
if on_list_id:
# £TODO use different lists, or one list for x and all ngrams for y
# as that would allow expanding the list to its nearest neighbors (MacLachlan)
# (with a rectangular matrix)
m1 = aliased(NodeNgram)
m2 = aliased(NodeNgram)
......@@ -226,6 +231,10 @@ def compute_coocs( corpus,
)
if diagonal_filter:
# don't compute ngram with itself
coocs_query = coocs_query.filter(Xindex_ngform_id != Yindex_ngform_id)
if start or end:
Time = aliased(NodeHyperdata)
......@@ -268,6 +277,7 @@ def compute_coocs( corpus,
# threshold
# £TODO adjust COOC_THRESHOLD a posteriori:
# ex: sometimes 2 sometimes 4 depending on sparsity
print("COOCS: filtering pairs under threshold:", threshold)
coocs_query = coocs_query.having(ucooc >= threshold)
......
......@@ -77,7 +77,7 @@ def extract_ngrams(corpus, keys=('title', 'abstract', ), do_subngrams = DEFAULT_
continue
# get ngrams
for ngram in ngramsextractor.extract(value):
tokens = tuple(normalize_forms(token[0]) for token in ngram)
if do_subngrams:
# ex tokens = ["very", "cool", "exemple"]
......@@ -90,7 +90,7 @@ def extract_ngrams(corpus, keys=('title', 'abstract', ), do_subngrams = DEFAULT_
subterms = [tokens]
for seqterm in subterms:
ngram = ' '.join(seqterm)
if len(ngram) > 1:
# doc <=> ngram index
nodes_ngrams_count[(document.id, ngram)] += 1
......@@ -118,7 +118,7 @@ def extract_ngrams(corpus, keys=('title', 'abstract', ), do_subngrams = DEFAULT_
raise error
def normalize_forms(term_str, do_lowercase=DEFAULT_ALL_LOWERCASE_FLAG):
"""
Removes unwanted trailing punctuation
AND optionally puts everything to lowercase
......@@ -127,14 +127,14 @@ def normalize_terms(term_str, do_lowercase=DEFAULT_ALL_LOWERCASE_FLAG):
(benefits from normalize_chars upstream so there's less cases to consider)
"""
# print('normalize_forms IN: "%s"' % term_str)
term_str = sub(r'^[-\'",;/%(){}\\\[\]\. ©]+', '', term_str)
term_str = sub(r'[-\'",;/%(){}\\\[\]\. ©]+$', '', term_str)
if do_lowercase:
term_str = term_str.lower()
# print('normalize_forms OUT: "%s"' % term_str)
return term_str
......
......@@ -57,7 +57,7 @@ class CSVLists(APIView):
params in request.GET:
onto_corpus: the corpus whose lists are getting patched
params in request.data:
csvfile: the csv file
/!\ We assume we checked the file size client-side before upload
......
# Gargantext Installation
You will find here a Dockerfile and docker-compose script
that build a development container for Gargantext
along with a PostgreSQL 9.5.X server.
* Install Docker
On your host machine, you need Docker.
[Installation guide details](https://docs.docker.com/engine/installation/#installation)
* Clone the gargantext repository and get the refactoring branch
```
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext
cd /srv/gargantext
git fetch origin refactoring
git checkout refactoring
```
* Install additional dependencies into gargantext_lib
```
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& sudo tar xvjf gargantext_lib.tar.bz2 -o /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib
```
* Developers: create your own branch based on refactoring
see [CHANGELOG](CHANGELOG.md) for migrations and branch names
```
git checkout -b username-refactoring refactoring
```
Build the docker images:
- a database container
- a gargantext container
```
cd /srv/gargantext/install/
docker-compose build
```
Finally, set up the PostgreSQL database with the following commands.
```
docker-compose run web python dbmigrate.py
docker-compose run web ./manage.py makemigrations
docker-compose run web ./manage.py migrate
```
## OS
### Debian Stretch
See install/debian
If you do not have a Debian environment, you need a docker image: install docker and
execute /srv/gargantext/install/docker/dev/install.sh, as sketched below.
All the steps are explained in [docker/dev/install.sh](docker/dev/install.sh) (not automatic yet).
Bug reports are welcome.
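A sketch of that manual run (the script path is the one given above; the script uses sudo internally):
```
cd /srv/gargantext/install/docker/dev
./install.sh
```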
......@@ -26,6 +26,7 @@ ENV PYTHON_ENV /srv/env_3-5
RUN apt-get update && \
apt-get install -y \
apt-utils ca-certificates locales \
python3-dev \
sudo aptitude gcc g++ wget git postgresql-9.5 vim \
build-essential make
......@@ -44,7 +45,7 @@ RUN apt-get update && apt-get install -y \
postgresql-server-dev-9.5 libpq-dev libxml2 \
libxml2-dev xml-core libgfortran-5-dev \
virtualenv python3-virtualenv \
python3.5 \
python3-six python3-numpy python3-setuptools \
# ^for numpy, pandas
python3-numexpr \
......
#!/bin/bash
#######################################################################
# ____ _
# | _ \ ___ ___| | _____ _ __
# | | | |/ _ \ / __| |/ / _ \ '__|
# | |_| | (_) | (__| < __/ |
# |____/ \___/ \___|_|\_\___|_|
#
######################################################################
sudo docker build -t gargantext .
# OR Get the ID of your container
#ID=$(docker build .) && docker run -i -t $ID
# OR
# cd /tmp
# wget http://dl.gargantext.org/gargantext_docker_image.tar \
# && sudo docker import - gargantext:latest < gargantext_docker_image.tar
#!/bin/bash
echo "Adding user gargantua";
sudo adduser --disabled-password --gecos "" gargantua;
echo "Creating the environnement into /srv/";
for dir in "/srv/gargantext" "/srv/gargantext_lib" "/srv/gargantext_static" "/srv/gargantext_media""/srv/env_3-5"; do
sudo mkdir -p $dir ;
sudo chown gargantua:gargantua $dir ;
done;
echo "Downloading the libs. Please be patient!";
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& tar xvjf gargantext_lib.tar.bz2 -o /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib \
&& echo "Libs installed";
echo 'Install docker'
sudo apt-get install -y docker-engine
echo 'Build gargantext image'
cd /srv/gargantext/install/
./docker/config/build
#Next steps
#install and configure git
#sudo apt-get install -y git
#clone your SSH key
#cp ~/.ssh/id_rsa.pub id_rsa.pub
#clone the repo
#~ git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
#~ && cd /srv/gargantext \
# get on branch
#~ && git fetch origin unstable \
#~ && git checkout unstable \
#~ echo "Currently on /srv/gargantext unstable branch";
#create your own branch
# git checkout -b my-unstable
......@@ -256,7 +256,7 @@
</div>
<!-- Sidebar -->
<div id="leftcolumn">
<div id="sidecolumn">
<div style="text-align: center;">
<a href="http://www.cnrs.fr" target="_blank"><img width="40%" src="https://www.ipmc.cnrs.fr/~duprat/comm/images/logo_cnrs_transparent.gif"></a>
</div>
......
......@@ -149,11 +149,11 @@ function CRUD( list_id , ngram_ids , http_method , callback) {
var div_info = "";
if( $( ".colorgraph_div" ).length>0 )
div_info += '<ul id="colorGraph" class="nav navbar-nav navbar-right">'
div_info += '<ul id="colorGraph" class="nav navbar-nav">'
div_info += ' <li class="dropdown">'
div_info += '<a href="#" class="dropdown-toggle" data-toggle="dropdown">'
div_info += ' <img title="Set Colors" src="/static/img/colors.png" width="20px"><b class="caret"></b></img>'
div_info += ' <img title="Set Colors" src="/static/img/colors.png" width="22px"><b class="caret"></b></img>'
div_info += '</a>'
div_info += ' <ul class="dropdown-menu">'
......@@ -186,11 +186,11 @@ function CRUD( list_id , ngram_ids , http_method , callback) {
div_info = "";
if( $( ".sizegraph_div" ).length>0 )
div_info += '<ul id="sizeGraph" class="nav navbar-nav navbar-right">'
div_info += '<ul id="sizeGraph" class="nav navbar-nav">'
div_info += ' <li class="dropdown">'
div_info += '<a href="#" class="dropdown-toggle" data-toggle="dropdown">'
div_info += ' <img title="Set Sizes" src="/static/img/NodeSize.png" width="20px"><b class="caret"></b></img>'
div_info += ' <img title="Set Sizes" src="/static/img/NodeSize.png" width="18px"><b class="caret"></b></img>'
div_info += '</a>'
div_info += ' <ul class="dropdown-menu">'
......
......@@ -18,14 +18,53 @@
}
.navbar {
margin-bottom:1px;
}
#defaultop{
min-height: 5%;
/*max-height: 10%;*/
text-align: center;
}
#defaultop li.basicitem{
/*font-family: "Helvetica Neue", Helvetica, Arial, sans-serif ;*/
padding-left: .4em;
padding-right: .4em;
padding-bottom: 0;
font-size: 90% ;
}
#defaultop > div {
float: none;
display: inline-block;
text-align: left;
}
#defaultop .nav > li > a {
text-align: center;
padding-top: .4em;
padding-bottom: .2em;
margin-left: auto ;
margin-right: auto ;
}
/*searchnav should get same padding as our .navbar-nav > li > a or bootstrap's*/
#defaultop div#searchnav {
padding-top: 13px;
padding-bottom: 9px;
}
#defaultop .settingslider {
max-width: 80px;
display: inline-block ;
}
#sigma-example {
......@@ -165,9 +204,7 @@
display:inline-block;
border:solid 1px;
/*box-shadow: 0px 0px 0px 1px rgba(0,0,0,0.3); */
border-radius: 6px;
border-color:#BDBDBD;
padding:0px 2px 0px 2px;
margin:1px 0px 1px 0px;
......@@ -367,6 +404,12 @@
padding-left:5%;
}
/* small messages */
p.micromessage{
font-size: 85%;
color: #707070 ;
}
.btn-sm:hover {
font-weight: bold;
}
......@@ -376,7 +419,7 @@
.tab { display: inline-block; zoom:1; *display:inline; background: #eee; border: solid 1px #999; border-bottom: none; -moz-border-radius: 4px 4px 0 0; -webkit-border-radius: 4px 4px 0 0; }
.tab a { font-size: 12px; line-height: 2em; display: block; padding: 0 10px; outline: none; }
.tab a:hover { text-decoration: underline; }
.tab.active { background: #fff; padding-top: 6px; position: relative; top: 3px; border-color: #666; }
.tab a.active { font-weight: bold; }
.tab-container .panel-container { background: #fff; border: solid #666 1px; padding: 10px; -moz-border-radius: 0 4px 4px 4px; -webkit-border-radius: 0 4px 4px 4px; }
.panel-container { margin-bottom: 10px; }
.fsslider {
position: relative;
min-width: 80px;
height: 8px;
display: inline-block;
width: 100%;
......
......@@ -21,14 +21,15 @@ box-shadow: 0px 0px 3px 0px #888888;
}*/
#leftcolumn {
#sidecolumn {
overflow-y: scroll;
margin-right: -300px;
margin-left: 0px;
padding-bottom: 10px;
padding-left: 5px;
right: 0px;
/* this width one is just a first guess...
   (it will be changed in main.js to sidecolumnSize param)
*/
width: 25em;
position: fixed;
height: 100%;
border: 1px #888888 solid;
......
......@@ -30,6 +30,11 @@ var mainfile = ["db.json"];
// getUrlParam.file = window.location.origin+"/"+$("#graphid").html(); // garg exclusive
// var corpusesList = {} // garg exclusive -> corpus comparison
var tagcloud_limit = 50;
// for the css of sidecolumn and canvasLimits size
var sidecolumnSize = "20%"
var current_url = window.location.origin+window.location.pathname+window.location.search
getUrlParam.file = current_url.replace(/projects/g, "api/projects")
......
......@@ -22,45 +22,27 @@
/[$\w]+/g
);
// on window resize
// @param canvasdiv: id of the div (without '#')
function sigmaLimits( canvasdiv ) {
console.log('FUN t.TinawebJS:sigmaLimits') ;
var canvas = document.getElementById(canvasdiv) ;
var sidecolumn = document.getElementById('sidecolumn') ;
var ancho_total = window.innerWidth - sidecolumn.offsetWidth ;
var alto_total = window.innerHeight - sidecolumn.offsetTop ;
// setting new size
canvas.style.width = (ancho_total - 5) + "px" ;
canvas.style.height = (alto_total - 5) + "px" ;
// fyi result
var pw=canvas.offsetWidth;
var ph=canvas.offsetHeight;
console.log("new canvas! w:"+pw+" , h:"+ph) ;
}
SelectionEngine = function() {
console.log('FUN t.TinawebJS:SelectionEngine:new')
// Selection Engine!! finally...
......@@ -381,6 +363,8 @@ SelectionEngine = function() {
TinaWebJS = function ( sigmacanvas ) {
console.log('FUN t.TinawebJS:TinaWebJS:new')
// '#canvasid'
this.sigmacanvas = sigmacanvas;
this.init = function () {
......@@ -392,11 +376,11 @@ TinaWebJS = function ( sigmacanvas ) {
return this.sigmacanvas;
}
this.AdjustSigmaCanvas = function ( canvasdiv ) {
console.log('FUN t.TinawebJS:AdjustSigmaCanvas')
var canvasdiv = "";
if( sigmacanvas ) canvasdiv = sigmacanvas;
else canvasdiv = this.sigmacanvas;
if (! canvasdiv)
// '#canvasid' => 'canvasid'
canvasdiv = sigmacanvas.substring(1);
return sigmaLimits( canvasdiv );
}
......@@ -565,8 +549,8 @@ TinaWebJS = function ( sigmacanvas ) {
// === un/hide leftpanel === //
$("#aUnfold").click(function(e) {
//SHOW sidecolumn
sidebar = $("#sidecolumn");
fullwidth=$('#fixedtop').width();
e.preventDefault();
// $("#wrapper").toggleClass("active");
......@@ -590,7 +574,7 @@ TinaWebJS = function ( sigmacanvas ) {
}, 400);
}
else {
//HIDE sidecolumn
$("#aUnfold").attr("class","leftarrow");
sidebar.animate({
"right" : "-" + sidebar.width() + "px"
......
......@@ -178,7 +178,7 @@ function MainFunction( RES ) {
// [ Initiating Sigma-Canvas ]
var twjs_ = new TinaWebJS('#sigma-example');
print( twjs_.AdjustSigmaCanvas() );
window.onresize = function(){twjs_.AdjustSigmaCanvas()} // TODO: debounce?
// [ / Initiating Sigma-Canvas ]
print("categories: "+categories)
......@@ -357,6 +357,9 @@ function MainFunction( RES ) {
partialGraph.stopForceAtlas2();
}, fa2seconds*1000);
// apply width from settings on left column
document.getElementById('sidecolumn').style.width = sidecolumnSize ;
}
......
......@@ -119,13 +119,14 @@
<a tabindex="-1"
data-url="/projects/{{project.id}}/corpora/{{ corpus.id }}/explorer?field1=ngrams&amp;field2=ngrams&amp;distance=distributional&amp;bridgeness=5" onclick='gotoexplorer(this)' >With distributional distance</a>
</li>
<!--
<li>
<a tabindex="-1"
onclick="javascript:location.href='/projects/{{project.id}}/corpora/{{ corpus.id }}/myGraphs'"
data-target='#' href='#'>My Graphs
</a>
</li>
-->
</ul>
......@@ -213,16 +214,11 @@
</div>
</div>
</div>
{% endif %}
{% endif %}
{% endblock %}
{% block content %}
{% endblock %}
......@@ -235,7 +231,7 @@
<p>
Gargantext
<span class="glyphicon glyphicon-registration-mark" aria-hidden="true"></span>
, version 3.0.3.3,
<a href="http://www.cnrs.fr" target="blank" title="Institution that enables this project.">
Copyrights
<span class="glyphicon glyphicon-copyright-mark" aria-hidden="true"></span>
......
UNIT TESTS
==========
Prerequisite
------------
Running unit tests will involve creating a **temporary test DB** !
+ it implies **CREATEDB permissions** for settings.DATABASES.user
(this has security consequences)
+ for instance in gargantext you would need to run this in psql as postgres:
`# ALTER USER gargantua CREATEDB;`
A "principe de précaution" could be to allow gargantua the CREATEDB rights on the **dev** machines (to be able to run tests) and not give it on the **prod** machines (no testing but more protection just in case).
Usage
------
```
./manage.py test unittests/ -v 2 # in django root container directory
# or for a single module
./manage.py test unittests.tests_010_basic -v 2
```
( `-v 2` is the verbosity level )
Tests
------
1. **tests_010_basic**
2. ** tests ??? **
3. ** tests ??? **
4. ** tests ??? **
5. ** tests ??? **
6. ** tests ??? **
7. **tests_070_routes**
Checks the response types from the app url routes:
- "/"
- "/api/nodes"
- "/api/nodes/<ID>"
GargTestRunner
---------------
Most of the tests will interact with a DB but we don't want to touch the real one so we provide a customized test_runner class in `unittests/framework.py` that creates a test database.
It must be referenced in django's `settings.py` like this:
```
TEST_RUNNER = 'unittests.framework.GargTestRunner'
```
(This way the `./manage.py test` command will be using GargTestRunner.)
Using a DB session
------------------
To emulate a session the way we usually do it in gargantext, our `unittests.framework` also
provides a session object to the test database via `GargTestRunner.testdb_session`
To work correctly, it needs to be read *inside the test setup.*
**Example**
```
from unittests.framework import GargTestRunner
class MyTestRecipes(TestCase):
def setUp(self):
# -------------------------------------
session = GargTestRunner.testdb_session
# -------------------------------------
new_project = Node(
typename = 'PROJECT',
name = "hello i'm a project",
)
session.add(new_project)
session.commit()
```
Accessing the URLS
------------------
Django tests provide a client to browse the urls
**Example**
```
from django.test import Client
class MyTestRecipes(TestCase):
def setUp(self):
self.client = Client()
def test_001_get_front_page(self):
''' get the about page localhost/about '''
# --------------------------------------
the_response = self.client.get('/about')
# --------------------------------------
self.assertEqual(the_response.status_code, 200)
```
Logging in
-----------
Most of our functionalities are only available on login so we provide a fake user at the initialization of the test DB.
His login is 'pcorser' and his password is 'peter'
**Example**
```
from django.test import Client
class MyTestRecipes(TestCase):
def setUp(self):
self.client = Client()
# login ---------------------------------------------------
response = self.client.post(
'/auth/login/',
{'username': 'pcorser', 'password': 'peter'}
)
# ---------------------------------------------------------
def test_002_get_to_a_restricted_page(self):
''' get the projects page /projects '''
the_response = self.client.get('/projects')
self.assertEqual(the_response.status_code, 200)
```
*If you like the adventures of Peter Corser, read the previous album ["Doors"](https://gogs.iscpif.fr/leclaire/doors)* (script M. Leclaire, drawings R. Loth) (available in all good bookshops)
FIXME
-----
url client get will still give read access to original DB ?
cf. http://stackoverflow.com/questions/19714521
cf. http://stackoverflow.com/questions/11046039
cf. test_073_get_api_one_node
"""
A test runner derived from default (DiscoverRunner) but adapted to our custom DB
cf. docs.djangoproject.com/en/1.9/topics/testing/advanced/#using-different-testing-frameworks
cf. gargantext/settings.py => TEST_RUNNER
cf. dbmigrate.py
FIXME url get will still give read access to original DB ?
cf. http://stackoverflow.com/questions/19714521
cf. http://stackoverflow.com/questions/11046039
cf. test_073_get_api_one_node
"""
# basic elements
from django.test.runner import DiscoverRunner, get_unique_databases_and_mirrors
from sqlalchemy import create_engine
from gargantext.settings import DATABASES
# things needed to create a user
from django.contrib.auth.models import User
# here we setup a minimal django so as to load SQLAlchemy models ---------------
# and then be able to import models and Base.metadata.tables
from os import environ
from django import setup
environ.setdefault("DJANGO_SETTINGS_MODULE", "gargantext.settings")
setup() # models can now be imported
from gargantext import models # Base is now filled
from gargantext.util.db import Base # contains metadata.tables
# ------------------------------------------------------------------------------
# things needed to provide a session
from sqlalchemy.orm import sessionmaker, scoped_session
class GargTestRunner(DiscoverRunner):
"""
We use the default test runner but we just add
our own dbmigrate elements at db creation
=> we let django.test.runner do the test db creation + auto migrations
=> we retrieve the test db name from django.test.runner
=> we create a test engine like in gargantext.db.create_engine but with the test db name
=> we create tables for our models like in dbmigrate with the test engine
TODO: list of tables to be created are hard coded in self.models
"""
# we'll also expose a session as GargTestRunner.testdb_session
testdb_session = None
def __init__(self, *args, **kwargs):
# our custom tables to be created (in correct order)
self.models = ['ngrams', 'nodes', 'contacts', 'nodes_nodes', 'nodes_ngrams', 'nodes_nodes_ngrams', 'nodes_ngrams_ngrams', 'nodes_hyperdata']
self.testdb_engine = None
# and execute default django init
super(GargTestRunner, self).__init__(*args, **kwargs)
def setup_databases(self, *args, **kwargs):
"""
Complement the database creation
by our own "models to tables" migration
"""
# default django setup performs base creation + auto migrations
old_config = super(GargTestRunner, self).setup_databases(*args, **kwargs)
# retrieve the testdb_name set by DiscoverRunner
testdb_names = []
for db_infos in get_unique_databases_and_mirrors():
# a key has the form: (IP, port, backend, dbname)
for key in db_infos:
# db_infos[key] has the form (dbname, {'default'})
testdb_names.append(db_infos[key][0])
# /!\ assumes a single database /!\
testdb_name = testdb_names[0]
# now we use a copy of our normal db config...
db_params = DATABASES['default']
# ...just changing the name
db_params['NAME'] = testdb_name
# connect to this test db
testdb_url = 'postgresql+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{NAME}'.format_map(db_params)
self.testdb_engine = create_engine( testdb_url )
print("TESTDB INIT: opened connection to database **%s**" % db_params['NAME'])
# we retrieve real tables declarations from our loaded Base
sqla_models = (Base.metadata.tables[model_name] for model_name in self.models)
# example: Base.metadata.tables['ngrams']
# ---------------------------------------
# Table('ngrams', Column('id', Integer(), table=<ngrams>, primary_key=True),
# Column('terms', String(length=255), table=<ngrams>),
# Column('n', Integer(), table=<ngrams>),
# schema=None)
# and now creation of each table in our test db (like dbmigrate)
for model in sqla_models:
try:
model.create(self.testdb_engine)
print('TESTDB INIT: created model: `%s`' % model)
except Exception as e:
print('TESTDB INIT ERROR: could not create model: `%s`, %s' % (model, e))
# we also create a session to provide it the way we usually do in garg
# (it's a class based static var to be able to share it with our tests)
GargTestRunner.testdb_session = scoped_session(sessionmaker(bind=self.testdb_engine))
# and let's create a user too otherwise we'll never be able to login
user = User.objects.create_user(username='pcorser', password='peter')
# old_config will be used by DiscoverRunner
# (to remove everything at the end)
return old_config
def teardown_databases(self, old_config, *args, **kwargs):
"""
After all tests
"""
# close the session
GargTestRunner.testdb_session.close()
# free the connection
self.testdb_engine.dispose()
# default django teardown performs destruction of the test base
super(GargTestRunner, self).teardown_databases(old_config, *args, **kwargs)
# snippets if we choose direct model building instead of setup() and Base.metadata.tables[model_name]
# from sqlalchemy.types import Integer, String, DateTime, Text, Boolean, Float
# from gargantext.models.nodes import NodeType
# from gargantext.models.hyperdata import HyperdataKey
# from sqlalchemy.schema import Table, Column, ForeignKey, UniqueConstraint, MetaData
# from sqlalchemy.dialects.postgresql import JSONB, DOUBLE_PRECISION
# from sqlalchemy.ext.mutable import MutableDict, MutableList
# Double = DOUBLE_PRECISION
# sqla_models = [i for i in sqla_models]
# print (sqla_models)
# sqla_models = [Table('ngrams', MetaData(bind=None), Column('id', Integer(), primary_key=True, nullable=False), Column('terms', String(length=255)), Column('n', Integer()), schema=None), Table('nodes', MetaData(bind=None), Column('id', Integer(), primary_key=True, nullable=False), Column('typename', NodeType()), Column('user_id', Integer(), ForeignKey('auth_user.id')), Column('parent_id', Integer(), ForeignKey('nodes.id')), Column('name', String(length=255)), Column('date', DateTime()), Column('hyperdata', JSONB(astext_type=Text())), schema=None), Table('contacts', MetaData(bind=None), Column('id', Integer(), primary_key=True, nullable=False), Column('user1_id', Integer(), primary_key=True, nullable=False), Column('user2_id', Integer(), primary_key=True, nullable=False), Column('is_blocked', Boolean()), Column('date_creation', DateTime()), schema=None), Table('nodes_nodes', MetaData(bind=None), Column('node1_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('node2_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('score', Float(precision=24)), schema=None), Table('nodes_ngrams', MetaData(bind=None), Column('node_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('ngram_id', Integer(), ForeignKey('ngrams.id'), primary_key=True, nullable=False), Column('weight', Float()), schema=None), Table('nodes_nodes_ngrams', MetaData(bind=None), Column('node1_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('node2_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('ngram_id', Integer(), ForeignKey('ngrams.id'), primary_key=True, nullable=False), Column('score', Float(precision=24)), schema=None), Table('nodes_ngrams_ngrams', MetaData(bind=None), Column('node_id', Integer(), ForeignKey('nodes.id'), primary_key=True, nullable=False), Column('ngram1_id', Integer(), ForeignKey('ngrams.id'), primary_key=True, nullable=False), Column('ngram2_id', Integer(), ForeignKey('ngrams.id'), primary_key=True, nullable=False), Column('weight', Float(precision=24)), schema=None), Table('nodes_hyperdata', MetaData(bind=None), Column('id', Integer(), primary_key=True, nullable=False), Column('node_id', Integer(), ForeignKey('nodes.id')), Column('key', HyperdataKey()), Column('value_int', Integer()), Column('value_flt', DOUBLE_PRECISION()), Column('value_utc', DateTime(timezone=True)), Column('value_str', String(length=255)), Column('value_txt', Text()), schema=None)]
"""
BASIC UNIT TESTS FOR GARGANTEXT IN DJANGO
=========================================
"""
from django.test import TestCase
class NodeTestCase(TestCase):
def setUp(self):
from gargantext.models import nodes
self.node_1000 = nodes.Node(id=1000)
self.new_node = nodes.Node()
def test_010_node_has_id(self):
'''new_node.id'''
self.assertEqual(self.node_1000.id, 1000)
def test_011_node_write(self):
'''write new_node to DB and commit'''
from gargantext.util.db import session
self.assertFalse(self.new_node._sa_instance_state._attached)
session.add(self.new_node)
session.commit()
self.assertTrue(self.new_node._sa_instance_state._attached)
"""
ROUTE UNIT TESTS
================
"""
from django.test import TestCase
from django.test import Client
# to be able to create Nodes
from gargantext.models import Node
# to be able to compare in test_073_get_api_one_node()
from gargantext.constants import NODETYPES
# provides GargTestRunner.testdb_session
from unittests.framework import GargTestRunner
class RoutesChecker(TestCase):
def setUp(self):
"""
Will be run before each test
"""
self.client = Client()
# login with our fake user
response = self.client.post(
'/auth/login/',
{'username': 'pcorser', 'password': 'peter'}
)
print(response.status_code)
session = GargTestRunner.testdb_session
new_project = Node(
typename = 'PROJECT',
name = "hello i'm a project",
)
session.add(new_project)
session.commit()
self.a_node_id = new_project.id
print("created a project with id: %i" % new_project.id)
def test_071_get_front_page(self):
''' get the front page / '''
front_response = self.client.get('/')
self.assertEqual(front_response.status_code, 200)
self.assertIn('text/html', front_response.get('Content-Type'))
# we assume the page will always contain this title
self.assertIn(b'<h1>Gargantext</h1>', front_response.content)
def test_072_get_api_nodes(self):
''' get "/api/nodes" '''
api_response = self.client.get('/api/nodes')
self.assertEqual(api_response.status_code, 200)
# 1) check the type is json
self.assertTrue(api_response.has_header('Content-Type'))
self.assertIn('application/json', api_response.get('Content-Type'))
# 2) let's try to get things in the json
json_content = api_response.json()
json_count = json_content['count']
json_nodes = json_content['records']
self.assertEqual(type(json_count), int)
self.assertEqual(type(json_nodes), list)
print("\ntesting nodecount: %i " % json_count)
def test_073_get_api_one_node(self):
''' get "api/nodes/<node_id>" '''
# we first get one node id by re-running this bit from test_072
a_node_id = self.client.get('/api/nodes').json()['records'][0]['id']
one_node_route = '/api/nodes/%i' % a_node_id
# print("\ntesting node route: %s" % one_node_route)
api_response = self.client.get(one_node_route)
self.assertTrue(api_response.has_header('Content-Type'))
self.assertIn('application/json', api_response.get('Content-Type'))
json_content = api_response.json()
nodetype = json_content['typename']
nodename = json_content['name']
print("\ntesting nodename:", nodename)
print("\ntesting nodetype:", nodetype)
self.assertIn(nodetype, NODETYPES)
# TODO http://localhost:8000/api/nodes?types[]=CORPUS
# £TODO test request.*
# print ("request")
# print ("user.id", request.user.id)
# print ("user.name", request.user.username)
# print ("path", request.path)
# print ("path_info", request.path_info)