Commit 4c00539c authored by delanoe

Merge branch 'testing' into testing-share

parents 3d3724d1 9a6805bf
......@@ -2,6 +2,12 @@
* Guided Tour
* Sources form: highlighting of crawlers
## Version 3.0.6.8
* REPEC Crawler (connection to https://multivac.iscpif.fr)
* HAL Crawler (connection to https://hal.archives-ouvertes.fr/)
* New Graph Feature: color nodes by growth
## Version 3.0.6.4
* COOC SQL improved
......
# Definitions and notation for the documentation (!= python notation)
## Node
The table (nodes) is a list of nodes: [Node]
Each Node has:
- a typename
- a parent_id
- a name
### Each Node has a parent_id
Node A
├── Node B
└── Node C
If Node A is the parent of Node B and Node C,
then NodeA.id == NodeB.parent_id == NodeC.parent_id.
### Each Node has a typename
Notation: Node[foo](bar) is a Node with typename "foo" and name "bar".
Then:
- Node[project] is a project.
- Node[corpus] is a corpus.
- Node[document] is a document.
### Each Node has a typename and a parent
Node[user](name)
├── Node[project](myProject1)
│   ├── Node[corpus](myCorpus1)
│   ├── Node[corpus](myCorpus2)
│   └── Node[corpus](myCorpus3)
└── Node[project](myProject2)
/!\ There are 3 ways to manage the rights of a Node (a traversal sketch follows below):
1) Node[user] is a folder containing all of the user's projects, corpora and
documents (i.e. Node[user] is the parent_id of its children).
2) Each Node has a user_id (mainly used today).
3) Rights management for groups (already implemented but not
used, since it is not connected to the frontend).
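A minimal traversal sketch, assuming the session and Node model used elsewhere in this codebase (gargantext.util.db, gargantext.models.nodes) and that parent_id and typename are queryable columns:
``` python
from gargantext.util.db import session
from gargantext.models.nodes import Node

def children_of(node_id, typename=None):
    '''Return the child Nodes of node_id, optionally restricted to a typename.'''
    q = session.query(Node).filter(Node.parent_id == node_id)
    if typename is not None:
        q = q.filter(Node.typename == typename)
    return q.all()

# e.g. all corpora of a project:
# corpora = children_of(project.id, typename='CORPUS')
```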
## Global Parameters
The global user is Gargantua (a Node with typename user).
This node is the parent of the other Nodes that store global parameters.
Node[user](gargantua) (gargantua.id == Node[user].user_id)
├── Node[TFIDF-Global](global) : without group
│   ├── Node[tfidf](database1)
│   ├── Node[tfidf](database2)
│   └── Node[tfidf](database3)
└── Node[anotherMetric](global)
## NodeNgram
NodeNgram is a relation between a Node and an ngram:
- document and ngrams
- metrics and ngrams (the position of the metrics node indicates the
context)
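A heavily hedged sketch of reading this relation; the import path and the field names node_id, ngram_id and weight are assumptions, since the NodeNgram model is not shown in this diff:
``` python
from gargantext.util.db import session
from gargantext.models import NodeNgram   # assumed import path

# ngrams attached to one document node (field names are assumptions):
for row in session.query(NodeNgram).filter(NodeNgram.node_id == document.id):
    print(row.ngram_id, row.weight)
```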
# Community Parameters
# User Parameters
......@@ -8,6 +8,9 @@ Gargantext is a web platform to explore your corpora using text-mining[...](about.md)
* [Take a tour](demo.md) of the different features offered by Gargantext
## Architecture
* [Architecture](architecture.md) Architecture of Gargantext
## Need some help?
Ask the community at:
......
* Create the user gargantua
The main user of Gargantext is Gargantua (a role for Pantagruel is coming soon)!
``` bash
sudo adduser --disabled-password --gecos "" gargantua
```
* Create the directories you need
In this example, the gargantext package will be installed in /srv/
``` bash
for dir in "/srv/gargantext"
"/srv/gargantext_lib"
"/srv/gargantext_static"
"/srv/gargantext_media"
"/srv/env_3-5"; do
sudo mkdir -p $dir ;
sudo chown gargantua:gargantua $dir ;
done
```
You should see:
```bash
$ tree /srv
/srv
├── env_3-5
├── gargantext
├── gargantext_lib
├── gargantext_media
└── gargantext_static
```
* Get the main libraries
Download and uncompress the archive, then give the main user access to it.
Please be patient: due to the size of the library packages (27 GB),
this step can take a while.
``` bash
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& tar xvjf gargantext_lib.tar.bz2 -C /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib \
&& echo "Libs installed"
```
* Get the source code of Gargantext
by cloning the gargantext repository
``` bash
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
&& cd /srv/gargantext \
&& git fetch origin refactoring \
&& git checkout refactoring
```
TODO(soon): git clone https://gogs.iscpif.fr/gargantext.git
See the [next steps of installation procedure](install.md#Install)
tools/manual_install.md
\ No newline at end of file
......@@ -181,8 +181,6 @@ def get_tagger(lang):
return tagger()
RESOURCETYPES = [
{ "type": 1,
'name': 'Europresse',
......@@ -242,19 +240,44 @@ RESOURCETYPES = [
'crawler': None,
},
{ "type": 9,
"name": 'SCOAP [XML]',
"name": 'SCOAP [API/XML]',
"parser": "CernParser",
"format": 'MARC21',
'file_formats':["zip","xml"],
"crawler": "CernCrawler",
},
# { "type": 10,
# "name": 'REPEC [RIS]',
# "parser": "RISParser",
# "format": 'RIS',
# 'file_formats':["zip","ris", "txt"],
# "crawler": None,
# },
#
{ "type": 10,
"name": 'REPEC [RIS]',
"parser": "RISParser",
"format": 'RIS',
'file_formats':["zip","ris", "txt"],
"crawler": None,
"name": 'REPEC [MULTIVAC API]',
"parser": "MultivacParser",
"format": 'JSON',
'file_formats':["zip","json"],
"crawler": "MultivacCrawler",
},
{ "type": 11,
"name": 'HAL [API]',
"parser": "HalParser",
"format": 'JSON',
'file_formats':["zip","json"],
"crawler": "HalCrawler",
},
{ "type": 12,
"name": 'ISIDORE [SPARQLE API /!\ BETA]',
"parser": "IsidoreParser",
"format": 'JSON',
'file_formats':["zip","json"],
"crawler": "IsidoreCrawler",
},
]
#shortcut for resources declaration in template
PARSERS = [(n["type"],n["name"]) for n in RESOURCETYPES if n["parser"] is not None]
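A small usage sketch for these declarations: get_resource and load_crawler are imported by the moissonneurs views later in this diff, but the lookup below is illustrative, not code from this commit.
``` python
from gargantext.constants import get_resource, load_crawler

# Look up the HAL resource declared above (type 11) and
# instantiate its crawler class from the "crawler" field:
source  = get_resource(11)          # -> the {"type": 11, "name": 'HAL [API]', ...} dict
crawler = load_crawler(source)()    # -> a HalCrawler instance
print(source["name"], source["format"])
```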
......
......@@ -28,19 +28,20 @@ import graph.urls
import moissonneurs.urls
urlpatterns = [ url(r'^admin/' , admin.site.urls )
, url(r'^api/' , include( gargantext.views.api.urls ) )
, url(r'^' , include( gargantext.views.pages.urls ) )
urlpatterns = [ url(r'^admin/' , admin.site.urls )
, url(r'^api/' , include( gargantext.views.api.urls ) )
, url(r'^' , include( gargantext.views.pages.urls ) )
, url(r'^favicon.ico$', Redirect.as_view( url=static.url('favicon.ico')
, permanent=False), name="favicon")
, permanent=False), name="favicon" )
# Module Graph
, url(r'^' , include( graph.urls ) )
, url(r'^' , include( graph.urls ) )
# Module Annotation
# tempo: unchanged doc-annotations routes --
, url(r'^annotations/', include( annotations_urls ) )
, url(r'^projects/(\d+)/corpora/(\d+)/documents/(\d+)/(focus=[0-9,]+)?$', annotations_main_view)
, url(r'^annotations/', include( annotations_urls ) )
, url(r'^projects/(\d+)/corpora/(\d+)/documents/(\d+)/(focus=[0-9,]+)?$'
, annotations_main_view)
# Module Scrapers (Moissonneurs in French)
, url(r'^moissonneurs/' , include( moissonneurs.urls ) )
......
......@@ -4,7 +4,7 @@
# ***** CERN Scraper *****
# ****************************
# Author:c24b
# Date: 27/05/2015
# Date: 27/05/2016
import hmac, hashlib
import requests
import os
......@@ -96,10 +96,12 @@ class CernCrawler(Crawler):
print(self.results_nb, "res")
#self.generate_urls()
return(self.ids)
def generate_urls(self):
''' generate raw urls of ONE record'''
self.urls = ["http://repo.scoap3.org/record/%i/export/xm?ln=en" %rid for rid in self.ids]
return self.urls
def fetch_records(self, ids):
''' for NEXT time'''
raise NotImplementedError
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** HAL Scraper ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Crawler import *
import json
from gargantext.constants import UPLOAD_DIRECTORY
from math import trunc
from gargantext.util.files import save
class HalCrawler(Crawler):
''' HAL API CLIENT'''
def __init__(self):
# Main EndPoints
self.BASE_URL = "https://api.archives-ouvertes.fr"
self.API_URL = "search"
# Final EndPoints
# TODO : Change the endpoint according to the type of database
self.URL = self.BASE_URL + "/" + self.API_URL
self.status = []
def __format_query__(self, query=None):
'''format the query'''
#search_field="title_t"
search_field="abstract_t"
return (search_field + ":" + "(" + query + ")")
def _get(self, query, fromPage=1, count=10, lang=None):
# Parameters
fl = """ title_s
, abstract_s
, submittedDate_s
, journalDate_s
, authFullName_s
, uri_s
, isbn_s
, issue_s
, journalPublisher_s
"""
#, authUrl_s
#, type_s
wt = "json"
querystring = { "q" : query
, "rows" : count
, "start" : fromPage
, "fl" : fl
, "wt" : wt
}
# Specify Headers
headers = { "cache-control" : "no-cache" }
# Do Request and get response
response = requests.request( "GET"
, self.URL
, headers = headers
, params = querystring
)
#print(querystring)
# Validation: 200 if OK, else raise ValueError
if response.status_code == 200:
charset = ( response.headers["Content-Type"]
.split("; ")[1]
.split("=" )[1]
)
return (json.loads(response.content.decode(charset)))
else:
raise ValueError(response.status_code, response.reason)
def scan_results(self, query):
'''
scan_results : Returns the number of results
Query String -> Int
'''
self.results_nb = 0
total = ( self._get(query)
.get("response", {})
.get("numFound" , 0)
)
self.results_nb = total
return self.results_nb
def download(self, query):
downloaded = False
self.status.append("fetching results")
corpus = []
paging = 100
self.query_max = self.scan_results(query)
#print("self.query_max : %s" % self.query_max)
if self.query_max > QUERY_SIZE_N_MAX:
msg = "Invalid sample size N = %i (max = %i)" % ( self.query_max
, QUERY_SIZE_N_MAX
)
print("ERROR (scrap: Multivac d/l ): " , msg)
self.query_max = QUERY_SIZE_N_MAX
#for page in range(1, trunc(self.query_max / 100) + 2):
for page in range(0, self.query_max, paging):
print("Downloading page %s to %s results" % (page, paging))
docs = (self._get(query, fromPage=page, count=paging)
.get("response", {})
.get("docs" , [])
)
for doc in docs:
corpus.append(doc)
self.path = save( json.dumps(corpus).encode("utf-8")
, name='HAL.json'
, basedir=UPLOAD_DIRECTORY
)
downloaded = True
return downloaded
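A hypothetical interactive session with this crawler, assuming requests and QUERY_SIZE_N_MAX are available through the starred _Crawler import as the code above expects:
``` python
crawler = HalCrawler()
n = crawler.scan_results("complex systems")   # number of matching HAL records
print(n)
if crawler.download("complex systems"):
    print("corpus saved to", crawler.path)    # JSON file under UPLOAD_DIRECTORY
```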
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** ISIDORE Scraper ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Crawler import *
import json
from gargantext.constants import UPLOAD_DIRECTORY
from math import trunc
from gargantext.util.files import save
from gargantext.util.crawlers.sparql.bool2sparql import bool2sparql, isidore
class IsidoreCrawler(Crawler):
''' ISIDORE SPARQL API CLIENT'''
def __init__(self):
# Main EndPoints
self.BASE_URL = "https://www.rechercheisidore.fr"
self.API_URL = "sparql"
# Final EndPoints
# TODO : Change the endpoint according to the type of database
self.URL = self.BASE_URL + "/" + self.API_URL
self.status = []
def __format_query__(self, query=None, count=False, offset=None, limit=None):
'''format the query'''
return (bool2sparql(query, count=count, offset=offset, limit=limit))
def _get(self, query, offset=0, limit=None, lang=None):
'''Get the data from the SPARQL endpoint'''
return isidore(query, count=False, offset=offset, limit=limit)
def scan_results(self, query):
'''
scan_results : Returns the number of results
Query String -> Int
'''
self.results_nb = [n for n in isidore(query, count=True)][0]
return self.results_nb
def download(self, query):
downloaded = False
self.status.append("fetching results")
corpus = []
limit = 1000
self.query_max = self.scan_results(query)
print("self.query_max : %s" % self.query_max)
if self.query_max > QUERY_SIZE_N_MAX:
msg = "Invalid sample size N = %i (max = %i)" % ( self.query_max
, QUERY_SIZE_N_MAX
)
print("WARNING (scrap: ISIDORE d/l ): " , msg)
self.query_max = QUERY_SIZE_N_MAX
for offset in range(0, self.query_max, limit):
print("Downloading result %s to %s" % (offset, self.query_max))
for doc in isidore(query, offset=offset, limit=limit) :
corpus.append(doc)
self.path = save( json.dumps(corpus).encode("utf-8")
, name='ISIDORE.json'
, basedir=UPLOAD_DIRECTORY
)
downloaded = True
return downloaded
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** MULTIVAC Scraper ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Crawler import *
import json
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** MULTIVAC Scraper ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Crawler import *
import json
from gargantext.settings import API_TOKENS
from gargantext.constants import UPLOAD_DIRECTORY
from math import trunc
from gargantext.util.files import save
class MultivacCrawler(Crawler):
''' Multivac API CLIENT'''
def __init__(self):
self.apikey = API_TOKENS["MULTIVAC"]
# Main EndPoints
self.BASE_URL = "https://api.iscpif.fr/v2"
self.API_URL = "pvt/economy/repec/search"
# Final EndPoints
# TODO : Change the endpoint according to the type of database
self.URL = self.BASE_URL + "/" + self.API_URL
self.status = []
def __format_query__(self, query=None):
'''format the query'''
return None
def _get(self, query, fromPage=1, count=10, lang=None):
# Parameters
querystring = { "q" : query
, "count" : count
, "from" : fromPage
, "api_key" : API_TOKENS["MULTIVAC"]["APIKEY"]
}
if lang is not None:
querystring["lang"] = lang
# Specify Headers
headers = { "cache-control" : "no-cache" }
# Do Request and get response
response = requests.request( "GET"
, self.URL
, headers = headers
, params = querystring
)
#print(querystring)
# Validation: 200 if OK, else raise ValueError
if response.status_code == 200:
charset = ( response.headers["Content-Type"]
.split("; ")[1]
.split("=" )[1]
)
return (json.loads(response.content.decode(charset)))
else:
raise ValueError(response.status_code, response.reason)
def scan_results(self, query):
'''
scan_results : Returns the number of results
Query String -> Int
'''
self.results_nb = 0
total = ( self._get(query)
.get("results", {})
.get("total" , 0)
)
self.results_nb = total
return self.results_nb
def download(self, query):
downloaded = False
self.status.append("fetching results")
corpus = []
paging = 100
self.query_max = self.scan_results(query)
#print("self.query_max : %s" % self.query_max)
if self.query_max > QUERY_SIZE_N_MAX:
msg = "Invalid sample size N = %i (max = %i)" % ( self.query_max
, QUERY_SIZE_N_MAX
)
print("ERROR (scrap: Multivac d/l ): " , msg)
self.query_max = QUERY_SIZE_N_MAX
for page in range(1, trunc(self.query_max / 100) + 2):
print("Downloading page %s to %s results" % (page, paging))
docs = (self._get(query, fromPage=page, count=paging)
.get("results", {})
.get("hits" , [])
)
for doc in docs:
corpus.append(doc)
self.path = save( json.dumps(corpus).encode("utf-8")
, name='Multivac.json'
, basedir=UPLOAD_DIRECTORY
)
downloaded = True
return downloaded
# Scrapers config
QUERY_SIZE_N_MAX = 1000
from gargantext.constants import get_resource
from gargantext.constants import get_resource, QUERY_SIZE_N_MAX
from gargantext.util.scheduling import scheduled
from gargantext.util.db import session
from requests_futures.sessions import FuturesSession
......@@ -18,31 +18,34 @@ class Crawler:
# the name of the corpus
# that will be built in case of internal file parsing
self.record = record
self.name = record["corpus_name"]
self.project_id = record["project_id"]
self.user_id = record["user_id"]
self.resource = record["source"]
self.type = get_resource(self.resource)
self.query = record["query"]
self.record = record
self.name = record["corpus_name"]
self.project_id = record["project_id"]
self.user_id = record["user_id"]
self.resource = record["source"]
self.type = get_resource(self.resource)
self.query = record["query"]
#format the sampling
self.n_last_years = 5
self.YEAR = date.today().year
self.YEAR = date.today().year
# not pretty
# but the easy version
self.MONTH = str(date.today().month)
self.MONTH = str(date.today().month)
if len(self.MONTH) == 1:
self.MONTH = "0"+self.MONTH
self.MAX_RESULTS = 1000
self.MAX_RESULTS = QUERY_SIZE_N_MAX
try:
self.results_nb = int(record["count"])
except KeyError:
# does not exist yet
self.results_nb = 0
try:
self.webEnv = record["webEnv"]
self.webEnv = record["webEnv"]
self.queryKey = record["queryKey"]
self.retMax = record["retMax"]
self.retMax = record["retMax"]
except KeyError:
# does not exist yet
self.queryKey = None
......@@ -67,6 +70,7 @@ class Crawler:
if self.download():
self.create_corpus()
return self.corpus_id
def get_sampling_dates(self):
'''Create a sample list of min and max dates based on Y and M
for N_LAST_YEARS results'''
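For reference, a hypothetical record dict carrying the keys that __init__ above actually reads; the values are invented for illustration:
``` python
record = { "corpus_name" : "My HAL corpus"    # name of the corpus Node to create
         , "project_id"  : 42                 # parent project Node id
         , "user_id"     : 1
         , "source"      : 11                 # resource type, resolved via get_resource()
         , "query"       : "complex systems"
         # optional keys read in the try/except blocks:
         # "count", "webEnv", "queryKey", "retMax"
         }
```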
......
import subprocess
import re
from .sparql import Service
#from sparql import Service
def bool2sparql(rawQuery, count=False, offset=None, limit=None):
"""
bool2sparql :: String -> Bool -> Int -> String
Translate a boolean query into a SPARQL request.
You need to build the bool2sparql binary first.
See: https://github.com/delanoe/bool2sparql
"""
query = re.sub("\"", "\'", rawQuery)
bashCommand = ["/srv/gargantext/gargantext/util/crawlers/sparql/bool2sparql-exe","-q",query]
if count is True :
bashCommand.append("-c")
else :
if offset is not None :
for command in ["--offset", str(offset)] :
bashCommand.append(command)
if limit is not None :
for command in ["--limit", str(limit)] :
bashCommand.append(command)
process = subprocess.Popen(bashCommand, stdout=subprocess.PIPE)
output, error = process.communicate()
if error is not None :
raise(error)
else :
print(output)
return(output.decode("utf-8"))
def isidore(query, count=False, offset=None, limit=None):
"""
isidore :: String -> Bool -> Int -> Either (Dict String) Int
Use the sparql client either to search (yield documents) or to scan (yield the count).
"""
query = bool2sparql(query, count=count, offset=offset, limit=limit)
go = Service("https://www.rechercheisidore.fr/sparql/", "utf-8", "GET")
results = go.query(query)
if count is False:
for r in results:
doc = dict()
doc_values = dict()
doc["url"], doc["title"], doc["date"], doc["abstract"], doc["source"] = r
for k in doc.keys():
doc_values[k] = doc[k].value
yield(doc_values)
else :
count = []
for r in results:
n, = r
count.append(int(n.value))
yield count[0]
def test():
query = "delanoe"
limit = 100
offset = 10
for d in isidore(query, offset=offset, limit=limit):
print(d["date"])
#print([n for n in isidore(query, count=True)])
if __name__ == '__main__':
test()
This diff is collapsed.
......@@ -8,29 +8,12 @@ import random
_members = [
{ 'first_name' : 'Constance', 'last_name' : 'de Quatrebarbes',
'mail' : '4barbesATgmail.com',
'website' : 'http://c24b.github.io/',
'picture' : 'constance.jpg',
'role' : 'developer'},
{ 'first_name' : 'David', 'last_name' : 'Chavalarias',
'mail' : 'david.chavalariasATiscpif.fr',
'website' : 'http://chavalarias.com',
'picture' : 'david.jpg',
'role':'principal investigator'},
# { 'first_name' : 'Elias', 'last_name' : 'Showk',
# 'mail' : '',
# 'website' : 'https://github.com/elishowk',
# 'picture' : '', 'role' : 'developer'},
{ 'first_name' : 'Mathieu', 'last_name' : 'Rodic',
'mail' : '',
'website' : 'http://rodic.fr',
'picture' : 'mathieu.jpg',
'role' : 'developer'},
{ 'first_name' : 'Samuel', 'last_name' : 'Castillo J.',
'mail' : 'kaisleanATgmail.com',
'website' : 'http://www.pksm3.droppages.com',
......@@ -43,12 +26,6 @@ _members = [
'picture' : 'maziyar.jpg',
'role' : 'developer'},
{ 'first_name' : 'Romain', 'last_name' : 'Loth',
'mail' : '',
'website' : 'http://iscpif.fr',
'picture' : 'romain.jpg',
'role' : 'developer'},
{ 'first_name' : 'Alexandre', 'last_name' : 'Delanoë',
'mail' : 'alexandre+gargantextATdelanoe.org',
'website' : 'http://alexandre.delanoe.org',
......@@ -59,9 +36,34 @@ _members = [
# copy-paste the line above and write your informations please
]
_membersPast = [
{ 'first_name' : 'Constance', 'last_name' : 'de Quatrebarbes',
'mail' : '4barbesATgmail.com',
'website' : 'http://c24b.github.io/',
'picture' : 'constance.jpg',
'role' : 'developer'},
{ 'first_name' : 'Mathieu', 'last_name' : 'Rodic',
'mail' : '',
'website' : 'http://rodic.fr',
'picture' : 'mathieu.jpg',
'role' : 'developer'},
{ 'first_name' : 'Romain', 'last_name' : 'Loth',
'mail' : '',
'website' : 'http://iscpif.fr',
'picture' : 'romain.jpg',
'role' : 'developer'},
{ 'first_name' : 'Elias', 'last_name' : 'Showk',
'mail' : '',
'website' : 'https://github.com/elishowk',
'picture' : '', 'role' : 'developer'},
]
_institutions = [
{ 'name' : 'Mines ParisTech', 'website' : 'http://mines-paristech.fr', 'picture' : 'mines.png', 'funds':''},
{ 'name' : 'Institut Pasteur', 'website' : 'http://www.pasteur.fr', 'picture' : 'pasteur.png', 'funds':''},
#{ 'name' : 'Institut Pasteur', 'website' : 'http://www.pasteur.fr', 'picture' : 'pasteur.png', 'funds':''},
{ 'name' : 'EHESS', 'website' : 'http://www.ehess.fr', 'picture' : 'ehess.png', 'funds':''},
#{ 'name' : '', 'website' : '', 'picture' : '', 'funds':''},
# copy paste the line above and write your informations please
......@@ -74,9 +76,10 @@ _labs = [
]
_grants = [
{ 'name' : 'Institut Mines Telecom', 'website' : 'https://www.imt.fr', 'picture' : 'IMT.jpg', 'funds':''},
{ 'name' : 'Forccast', 'website' : 'http://forccast.hypotheses.org/', 'picture' : 'forccast.png', 'funds':''},
{ 'name' : 'Mastodons', 'website' : 'http://www.cnrs.fr/mi/spip.php?article53&lang=fr', 'picture' : 'mastodons.png', 'funds':''},
{ 'name' : 'ADEME', 'website' : 'http://www.ademe.fr', 'picture' : 'ademe.png', 'funds':''},
#{ 'name' : 'ADEME', 'website' : 'http://www.ademe.fr', 'picture' : 'ademe.png', 'funds':''},
#{ 'name' : '', 'website' : '', 'picture' : '', 'funds':''},
# copy paste the line above and write your informations please
]
......@@ -86,6 +89,10 @@ def members():
random.shuffle(_members)
return _members
def membersPast():
random.shuffle(_membersPast)
return _membersPast
def institutions():
random.shuffle(_institutions)
return _institutions
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** HAL Parser ***
# ****************************
# CNRS COPYRIGHTS 2017
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Parser import Parser
from datetime import datetime
import json
class HalParser(Parser):
def parse(self, filebuf):
'''
parse :: FileBuff -> [Hyperdata]
'''
contents = filebuf.read().decode("UTF-8")
data = json.loads(contents)
filebuf.close()
json_docs = data
hyperdata_list = []
hyperdata_path = { "id" : "isbn_s"
, "title" : "title_s"
, "abstract" : "abstract_s"
, "source" : "journalPublisher_s"
, "url" : "uri_s"
, "authors" : "authFullName_s"
}
uris = set()
for doc in json_docs:
hyperdata = {}
for key, path in hyperdata_path.items():
field = doc.get(path, "NOT FOUND")
if isinstance(field, list):
hyperdata[key] = ", ".join(field)
else:
hyperdata[key] = field
if hyperdata["url"] in uris:
print("Document already parsed")
else:
uris.add(hyperdata["url"])
# hyperdata["authors"] = ", ".join(
# [ p.get("person", {})
# .get("name" , "")
#
# for p in doc.get("hasauthor", [])
# ]
# )
#
maybeDate = doc.get("submittedDate_s", None)
if maybeDate is not None:
date = datetime.strptime(maybeDate, "%Y-%m-%d %H:%M:%S")
else:
date = datetime.now()
hyperdata["publication_date"] = date
hyperdata["publication_year"] = str(date.year)
hyperdata["publication_month"] = str(date.month)
hyperdata["publication_day"] = str(date.day)
hyperdata_list.append(hyperdata)
return hyperdata_list
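A hypothetical smoke test for this parser; the sample record uses the HAL field names from hyperdata_path above, and it assumes HalParser can be instantiated without arguments (the base Parser signature is not shown in this diff):
``` python
import io, json

sample = [ { "title_s"            : ["A sample title"]
           , "abstract_s"         : ["A sample abstract"]
           , "uri_s"              : "https://hal.archives-ouvertes.fr/hal-0000001"
           , "authFullName_s"     : ["Ada Lovelace", "Alan Turing"]
           , "journalPublisher_s" : "Some Publisher"
           , "submittedDate_s"    : "2017-01-01 00:00:00"
           } ]
buf  = io.BytesIO(json.dumps(sample).encode("UTF-8"))
docs = HalParser().parse(buf)
print(docs[0]["title"], "|", docs[0]["publication_year"])
```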
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** ISIDORE Parser ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Parser import Parser
from datetime import datetime
import json
class IsidoreParser(Parser):
def parse(self, filebuf):
'''
parse :: FileBuff -> [Hyperdata]
'''
contents = filebuf.read().decode("UTF-8")
data = json.loads(contents)
filebuf.close()
json_docs = data
hyperdata_list = []
hyperdata_path = { "title" : "title"
, "abstract" : "abstract"
, "authors" : "authors"
, "url" : "url"
, "source" : "source"
}
uniq_id = set()
for doc in json_docs:
hyperdata = {}
for key, path in hyperdata_path.items():
hyperdata[key] = doc.get(path, "")
if hyperdata["url"] not in uniq_id:
# Removing the duplicates implicitly
uniq_id.add(hyperdata["url"])
# Source is the Journal Name
hyperdata["source"] = doc.get("source", "ISIDORE Database")
# Working on the date
maybeDate = doc.get("date" , None)
if maybeDate is None:
date = datetime.now()
else:
try :
# Model of date: 1958-01-01T00:00:00
date = datetime.strptime(maybeDate, '%Y-%m-%dT%H:%M:%S')
except :
print("FIX DATE ISIDORE please >%s<" % maybeDate)
date = datetime.now()
hyperdata["publication_date"] = date
hyperdata["publication_year"] = str(date.year)
hyperdata["publication_month"] = str(date.month)
hyperdata["publication_day"] = str(date.day)
hyperdata_list.append(hyperdata)
return hyperdata_list
......@@ -13,20 +13,21 @@ class ISTexParser(Parser):
hyperdata_list = []
hyperdata_path = {
"id" : "id",
"source" : 'corpusName',
"title" : 'title',
"source" : "corpusName",
"title" : "title",
"genre" : "genre",
"language_iso3" : 'language',
"doi" : 'doi',
"host" : 'host',
"publication_date" : 'publicationDate',
"abstract" : 'abstract',
"language_iso3" : "language",
"doi" : "doi",
"host" : "host",
"publication_date" : "publicationDate",
"abstract" : "abstract",
# "authors" : 'author',
"authorsRAW" : 'author',
"authorsRAW" : "author",
#"keywords" : "keywords"
}
suma = 0
for json_doc in json_docs:
hyperdata = {}
......@@ -103,7 +104,7 @@ class ISTexParser(Parser):
RealDate = RealDate[0]
# print( RealDate ," | length:",len(RealDate))
Decision=""
Decision = True
if len(RealDate)>4:
if len(RealDate)>8:
try: Decision = datetime.strptime(RealDate, '%Y-%b-%d').date()
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# **** MULTIVAC Parser ***
# ****************************
# CNRS COPYRIGHTS
# SEE LEGAL LICENCE OF GARGANTEXT.ORG
from ._Parser import Parser
from datetime import datetime
import json
class MultivacParser(Parser):
def parse(self, filebuf):
'''
parse :: FileBuff -> [Hyperdata]
'''
contents = filebuf.read().decode("UTF-8")
data = json.loads(contents)
filebuf.close()
json_docs = data
hyperdata_list = []
hyperdata_path = { "id" : "id"
, "title" : "title"
, "abstract" : "abstract"
, "type" : "type"
}
for json_doc in json_docs:
hyperdata = {}
doc = json_doc["_source"]
for key, path in hyperdata_path.items():
hyperdata[key] = doc.get(path, "")
hyperdata["source"] = doc.get("serial" , {})\
.get("journaltitle", "REPEC Database")
try:
hyperdata["url"] = doc.get("file", {})\
.get("url" , "")
except:
pass
hyperdata["authors"] = ", ".join(
[ p.get("person", {})
.get("name" , "")
for p in doc.get("hasauthor", [])
]
)
year = doc.get("serial" , {})\
.get("issuedate", None)
if year == "Invalide date":
year = doc.get("issuedate" , None)
if year is None:
date = datetime.now()
else:
try:
date = datetime.strptime(year, '%Y')
except:
print("FIX DATE MULTIVAC REPEC %s" % year)
date = datetime.now()
hyperdata["publication_date"] = date
hyperdata["publication_year"] = str(date.year)
hyperdata["publication_month"] = str(date.month)
hyperdata["publication_day"] = str(date.day)
hyperdata_list.append(hyperdata)
return hyperdata_list
......@@ -78,7 +78,7 @@ class PubmedParser(Parser):
if "publication_month" in hyperdata: PubmedDate+=" "+hyperdata["publication_month"]
if "publication_day" in hyperdata: PubmedDate+=" "+hyperdata["publication_day"]
Decision=""
Decision=True
if len(RealDate)>4:
if len(RealDate)>8:
try: Decision = datetime.strptime(RealDate, '%Y %b %d').date()
......
......@@ -175,7 +175,6 @@ def parse(corpus):
hyperdata = hyperdata,
)
session.add(document)
session.commit()
documents_count += 1
if pending_add_error_stats:
......@@ -190,6 +189,9 @@ def parse(corpus):
session.add(corpus)
session.commit()
# Commit any pending document
session.commit()
# update info about the resource
resource['extracted'] = True
#print( "resource n°",i, ":", d, "docs inside this file")
......
......@@ -47,7 +47,8 @@ def about(request):
context = {
'user': request.user,
'date': datetime.datetime.now(),
'team': credits.members(),
'team' : credits.members(),
'teamPast': credits.membersPast(),
'institutions': credits.institutions(),
'labos': credits.labs(),
'grants': credits.grants(),
......
#!/bin/bash
### Update and install base dependencies
echo "############ DEBIAN LIBS ###############"
apt-get update && \
......@@ -32,26 +34,26 @@ update-locale LC_ALL=fr_FR.UTF-8
libxml2-dev xml-core libgfortran-6-dev \
libpq-dev \
python3.5 \
python3-dev \
python3.5-dev \
python3-six python3-numpy python3-setuptools \
python3-numexpr \
python3-pip \
libxml2-dev libxslt-dev zlib1g-dev
libxml2-dev libxslt-dev zlib1g-dev libigraph0-dev
#libxslt1-dev
UPDATE AND CLEAN
# UPDATE AND CLEAN
apt-get update && apt-get autoclean
# NB: removing /var/lib avoids significantly filling up the /var/ folder on your native system
########################################################################
### PYTHON ENVIRONMENT (as ROOT)
########################################################################
#adduser --disabled-password --gecos "" gargantua
cd /srv/
pip3 install virtualenv
virtualenv /srv/env_3-5
virtualenv /srv/env_3-5 -p /usr/bin/python3.5
echo 'alias venv="source /srv/env_3-5/bin/activate"' >> ~/.bashrc
# CONFIG FILES
......@@ -60,9 +62,9 @@ update-locale LC_ALL=fr_FR.UTF-8
source /srv/env_3-5/bin/activate && pip3 install -r /srv/gargantext/install/gargamelle/requirements.txt && \
pip3 install git+https://github.com/zzzeek/sqlalchemy.git@rel_1_1 && \
python3 -m nltk.downloader averaged_perceptron_tagger -d /usr/local/share/nltk_data
chown gargantua:gargantua -R /srv/env_3-5
#######################################################################
## POSTGRESQL DATA (as ROOT)
#######################################################################
......
......@@ -14,7 +14,7 @@ echo "::::: DJANGO :::::"
/bin/su gargantua -c 'source /env_3-5/bin/activate &&\
su gargantua -c 'source /srv/env_3-5/bin/activate &&\
echo "Activated env" &&\
/srv/gargantext/manage.py makemigrations &&\
/srv/gargantext/manage.py migrate && \
......@@ -24,4 +24,4 @@ echo "::::: DJANGO :::::"
/srv/gargantext/dbmigrate.py && \
/srv/gargantext/manage.py createsuperuser'
/usr/sbin/service postgresql stop
service postgresql stop
##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# http://wiki.nginx.org/Pitfalls
# http://wiki.nginx.org/QuickStart
# http://wiki.nginx.org/Configuration
#
# Generally, you will want to move this file somewhere, and start with a clean
# file but keep this around for reference. Or just disable in sites-enabled.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##
# the upstream component nginx needs to connect to
upstream gargantext {
server unix:///tmp/gargantext.sock; # for a file socket
#server 127.0.0.1:8001; # for a web port socket (we'll use this first)
}
# Default server configuration
#
server {
listen 80 default_server;
listen [::]:80 default_server;
# SSL configuration
#
# listen 443 ssl default_server;
# listen [::]:443 ssl default_server;
#
# Note: You should disable gzip for SSL traffic.
# See: https://bugs.debian.org/773332
#
# Read up on ssl_ciphers to ensure a secure configuration.
# See: https://bugs.debian.org/765782
#
# Self signed certs generated by the ssl-cert package
# Don't use them in a production server!
#
# include snippets/snakeoil.conf;
client_max_body_size 800M;
client_body_timeout 12;
client_header_timeout 12;
keepalive_timeout 15;
send_timeout 10;
root /var/www/html;
# Add index.php to the list if you are using PHP
#index index.html index.htm index.nginx-debian.html;
server_name _ stable.gargantext.org gargantext.org ;
# Django media
location /media {
alias /var/www/gargantext/media; # your Django project's media files - amend as required
}
location /static {
alias /srv/gargantext_static; # your Django project's static files - amend as required
}
# Finally, send all non-media requests to the Django server.
location / {
uwsgi_pass gargantext;
include uwsgi_params;
}
#access_log off;
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
}
server {
listen 80 ;
listen [::]:80;
server_name dl.gargantext.org ;
error_page 404 /index.html;
location / {
root /var/www/dl ;
proxy_set_header Host $host;
proxy_buffering off;
}
access_log /var/log/nginx/dl.gargantext.org-access.log;
error_log /var/log/nginx/dl.gargantext.org-error.log;
}
# try bottleneck
eventlet==0.20.1
amqp==1.4.9
anyjson==0.3.3
billiard==3.3.0.23
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# ***** HAL Crawler *****
# ****************************
# LICENCE: GARGANTEXT.org Licence
RESOURCE_TYPE_HAL = 11
from django.shortcuts import redirect, render
from django.http import Http404, HttpResponseRedirect \
, HttpResponseForbidden
from gargantext.constants import get_resource, load_crawler, QUERY_SIZE_N_MAX
from gargantext.models.nodes import Node
from gargantext.util.db import session
from gargantext.util.db_cache import cache
from gargantext.util.http import JsonHttpResponse
from gargantext.util.scheduling import scheduled
from gargantext.util.toolchain import parse_extract_indexhyperdata
from traceback import print_tb
def query( request):
'''get GlobalResults()'''
if request.method == "POST":
query = request.POST["query"]
source = get_resource(RESOURCE_TYPE_HAL)
if source["crawler"] is not None:
crawlerbot = load_crawler(source)()
#old raw way to get results_nb
results = crawlerbot.scan_results(query)
#ids = crawlerbot.get_ids(query)
print(results)
return JsonHttpResponse({"results_nb":crawlerbot.results_nb})
def save(request, project_id):
'''save'''
if request.method == "POST":
query = request.POST.get("query")
try:
N = int(request.POST.get("N"))
except:
N = 0
print(query, N)
#for next time
#ids = request.POST["ids"]
source = get_resource(RESOURCE_TYPE_HAL)
if N == 0:
raise Http404()
if N > QUERY_SIZE_N_MAX:
N = QUERY_SIZE_N_MAX
try:
project_id = int(project_id)
except ValueError:
raise Http404()
# do we have a valid project?
project = session.query( Node ).filter(Node.id == project_id).first()
if project is None:
raise Http404()
user = cache.User[request.user.id]
if not user.owns(project):
return HttpResponseForbidden()
# corpus node instantiation as a Django model
corpus = Node(
name = query,
user_id = request.user.id,
parent_id = project_id,
typename = 'CORPUS',
hyperdata = { "action" : "Scrapping data"
}
)
#download_file
crawler_bot = load_crawler(source)()
#for now no way to force downloading X records
#the long running command
filename = crawler_bot.download(query)
corpus.add_resource(
type = source["type"]
#, name = source["name"]
, path = crawler_bot.path
)
session.add(corpus)
session.commit()
#corpus_id = corpus.id
try:
scheduled(parse_extract_indexhyperdata)(corpus.id)
except Exception as error:
print('WORKFLOW ERROR')
print(error)
try:
print_tb(error.__traceback__)
except:
pass
# IMPORTANT ---------------------------------
# sanitize session after interrupted transact
session.rollback()
# --------------------------------------------
return render(
template_name = 'pages/projects/wait.html',
request = request,
context = {
'user' : request.user,
'project': project,
},
)
# non-POST requests are not supported
raise Http404()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# ***** ISIDORE Crawler *****
# ****************************
RESOURCE_TYPE_ISIDORE = 12
from django.shortcuts import redirect, render
from django.http import Http404, HttpResponseRedirect, HttpResponseForbidden
from gargantext.constants import get_resource, load_crawler, QUERY_SIZE_N_MAX
from gargantext.models.nodes import Node
from gargantext.util.db import session
from gargantext.util.db_cache import cache
from gargantext.util.http import JsonHttpResponse
from gargantext.util.scheduling import scheduled
from gargantext.util.toolchain import parse_extract_indexhyperdata
from traceback import print_tb
def query( request):
'''get GlobalResults()'''
if request.method == "POST":
query = request.POST["query"]
source = get_resource(RESOURCE_TYPE_ISIDORE)
if source["crawler"] is not None:
crawlerbot = load_crawler(source)()
#old raw way to get results_nb
results = crawlerbot.scan_results(query)
#ids = crawlerbot.get_ids(query)
return JsonHttpResponse({"results_nb":crawlerbot.results_nb})
def save(request, project_id):
'''save'''
if request.method == "POST":
query = request.POST.get("query")
try:
N = int(request.POST.get("N"))
except:
N = 0
print(query, N)
#for next time
#ids = request.POST["ids"]
source = get_resource(RESOURCE_TYPE_ISIDORE)
if N == 0:
raise Http404()
if N > QUERY_SIZE_N_MAX:
N = QUERY_SIZE_N_MAX
try:
project_id = int(project_id)
except ValueError:
raise Http404()
# do we have a valid project?
project = session.query( Node ).filter(Node.id == project_id).first()
if project is None:
raise Http404()
user = cache.User[request.user.id]
if not user.owns(project):
return HttpResponseForbidden()
# corpus node instantiation as a Django model
corpus = Node(
name = query,
user_id = request.user.id,
parent_id = project_id,
typename = 'CORPUS',
hyperdata = { "action" : "Scrapping data"
, "language_id" : "fr"
}
)
#download_file
crawler_bot = load_crawler(source)()
#for now no way to force downloading X records
#the long running command
filename = crawler_bot.download(query)
corpus.add_resource(
type = source["type"]
#, name = source["name"]
, path = crawler_bot.path
)
session.add(corpus)
session.commit()
#corpus_id = corpus.id
try:
scheduled(parse_extract_indexhyperdata)(corpus.id)
except Exception as error:
print('WORKFLOW ERROR')
print(error)
try:
print_tb(error.__traceback__)
except:
pass
# IMPORTANT ---------------------------------
# sanitize session after interrupted transact
session.rollback()
# --------------------------------------------
return render(
template_name = 'pages/projects/wait.html',
request = request,
context = {
'user' : request.user,
'project': project,
},
)
# non-POST requests are not supported
raise Http404()
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ****************************
# ***** MULTIVAC Crawler *****
# ****************************
# LICENCE: GARGANTEXT.org Licence
RESOURCE_TYPE_MULTIVAC = 10
from django.shortcuts import redirect, render
from django.http import Http404, HttpResponseRedirect, HttpResponseForbidden
from gargantext.constants import get_resource, load_crawler, QUERY_SIZE_N_MAX
from gargantext.models.nodes import Node
from gargantext.util.db import session
from gargantext.util.db_cache import cache
from gargantext.util.http import JsonHttpResponse
from gargantext.util.scheduling import scheduled
from gargantext.util.toolchain import parse_extract_indexhyperdata
from traceback import print_tb
def query( request):
'''get GlobalResults()'''
if request.method == "POST":
query = request.POST["query"]
source = get_resource(RESOURCE_TYPE_MULTIVAC)
if source["crawler"] is not None:
crawlerbot = load_crawler(source)()
#old raw way to get results_nb
results = crawlerbot.scan_results(query)
#ids = crawlerbot.get_ids(query)
print(results)
return JsonHttpResponse({"results_nb":crawlerbot.results_nb})
def save(request, project_id):
'''save'''
if request.method == "POST":
query = request.POST.get("query")
try:
N = int(request.POST.get("N"))
except:
N = 0
print(query, N)
#for next time
#ids = request.POST["ids"]
source = get_resource(RESOURCE_TYPE_MULTIVAC)
if N == 0:
raise Http404()
if N > QUERY_SIZE_N_MAX:
N = QUERY_SIZE_N_MAX
try:
project_id = int(project_id)
except ValueError:
raise Http404()
# do we have a valid project?
project = session.query( Node ).filter(Node.id == project_id).first()
if project is None:
raise Http404()
user = cache.User[request.user.id]
if not user.owns(project):
return HttpResponseForbidden()
# corpus node instantiation as a Django model
corpus = Node(
name = query,
user_id = request.user.id,
parent_id = project_id,
typename = 'CORPUS',
hyperdata = { "action" : "Scrapping data"
, "language_id" : "en"
}
)
#download_file
crawler_bot = load_crawler(source)()
#for now no way to force downloading X records
#the long running command
filename = crawler_bot.download(query)
corpus.add_resource(
type = source["type"]
#, name = source["name"]
, path = crawler_bot.path
)
session.add(corpus)
session.commit()
#corpus_id = corpus.id
try:
scheduled(parse_extract_indexhyperdata)(corpus.id)
except Exception as error:
print('WORKFLOW ERROR')
print(error)
try:
print_tb(error.__traceback__)
except:
pass
# IMPORTANT ---------------------------------
# sanitize session after interrupted transact
session.rollback()
# --------------------------------------------
return render(
template_name = 'pages/projects/wait.html',
request = request,
context = {
'user' : request.user,
'project': project,
},
)
# non-POST requests are not supported
raise Http404()
......@@ -10,32 +10,35 @@
# moissonneurs == getting data from external databases
# Available databases :
## Pubmed
## IsTex,
## CERN
from django.conf.urls import url
import moissonneurs.pubmed as pubmed
import moissonneurs.istex as istex
import moissonneurs.cern as cern
# TODO
#import moissonneurs.hal as hal
#import moissonneurs.revuesOrg as revuesOrg
# Available databases :
import moissonneurs.pubmed as pubmed
import moissonneurs.istex as istex
import moissonneurs.cern as cern
import moissonneurs.multivac as multivac
import moissonneurs.hal as hal
import moissonneurs.isidore as isidore
# TODO ?
# REST API for the moissonneurs
# TODO : ISIDORE
# /!\ urls patterns here are *without* the trailing slash
urlpatterns = [ url(r'^pubmed/query$' , pubmed.query )
, url(r'^pubmed/save/(\d+)' , pubmed.save )
, url(r'^istex/query$' , istex.query )
, url(r'^istex/save/(\d+)' , istex.save )
, url(r'^cern/query$' , cern.query )
, url(r'^cern/save/(\d+)' , cern.save )
urlpatterns = [ url(r'^pubmed/query$' , pubmed.query )
, url(r'^pubmed/save/(\d+)' , pubmed.save )
, url(r'^istex/query$' , istex.query )
, url(r'^istex/save/(\d+)' , istex.save )
, url(r'^cern/query$' , cern.query )
, url(r'^cern/save/(\d+)' , cern.save )
, url(r'^multivac/query$' , multivac.query )
, url(r'^multivac/save/(\d+)' , multivac.save )
, url(r'^hal/query$' , hal.query )
, url(r'^hal/save/(\d+)' , hal.save )
, url(r'^isidore/query$' , isidore.query )
, url(r'^isidore/save/(\d+)' , isidore.save )
]
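A hypothetical client-side check of one of these routes; it assumes a local development server and ignores the authentication and CSRF handling that the real views require:
``` python
import requests

# Ask the HAL moissonneur how many results a query would return:
r = requests.post("http://localhost:8000/moissonneurs/hal/query",
                  data={"query": "complex systems"})
print(r.json())   # e.g. {"results_nb": 1234}
```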
......@@ -183,9 +183,55 @@
</div>
</div>
</div>
{% endif %}
{% if teamPast %}
<div class="panel panel-default">
<div class="panel-heading">
<h2 class="panel-title">
<a data-toggle="collapse" data-parent="#accordion" href="#collapseTeamPast">
<center>
<h2>
<span class="glyphicon glyphicon-question-sign" aria-hidden="true"></span>
Former Developers
<span class="glyphicon glyphicon-question-sign" aria-hidden="true"></span>
</h2>
</center>
</a>
</h2>
</div>
<div id="collapseTeamPast" class="panel-collapse collapse" role="tabpanel">
<div class="panel-body">
<div class="container">
<div class="row">
<div class="thumbnails">
{% for member in teamPast %}
<div class="col-md-5 ">
<div class="thumbnail">
<div class="caption">
<center>
<h3>{{ member.first_name }} {{member.last_name }}</h3>
{% if member.role %}
<p class="description">{{ member.role }}</p>
{% endif %}
</center>
</div>
</div>
</div>
{% endfor %}
</div>
</div>
</div>
</div>
</div>
</div>
{% endif %}
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">
......
......@@ -367,7 +367,7 @@
<p>
Gargantext
<span class="glyphicon glyphicon-registration-mark" aria-hidden="true"></span>
, version 3.0.6.7,
, version 3.0.6.9.4,
<a href="http://www.cnrs.fr" target="blank" title="Institution that enables this project.">
Copyrights
<span class="glyphicon glyphicon-copyright-mark" aria-hidden="true"></span>
......
......@@ -86,12 +86,12 @@
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">&times;</span>
</button>
<h2 class="modal-title"><h2><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Uploading corpus...</h2>
<h2 class="modal-title"><h2><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>Building corpus...</h2>
</div>
<div class="modal-body">
<h5>
Your file has been uploaded !
Gargantext need some time to eat it.
Gargantext is gathering your texts
and needs some time to eat them.
Duration depends on the size of the dish.
</h5>
</div>
......
......@@ -209,9 +209,11 @@
function CustomForSelect( selected ) {
// show Radio-Inputs and trigger FileOrNotFile>@upload-file events
selected = selected.toLowerCase()
var is_pubmed = (selected.indexOf('pubmed') != -1);
var is_istex = (selected.indexOf('istex') != -1);
if (is_pubmed || is_istex) {
var is_pubmed = (selected.indexOf('pubmed') != -1);
var is_istex = (selected.indexOf('istex' ) != -1);
var is_repec = (selected.indexOf('repec' ) != -1);
if (is_pubmed || is_istex || is_repec) {
// if(selected=="pubmed") {
console.log("show the button for: " + selected)
$("#pubmedcrawl").css("visibility", "visible");
......
......@@ -41,39 +41,42 @@
<div class="container theme-showcase" role="main">
<div class="jumbotron">
<div class="row">
<div class="col-md-4">
<h1>
<span class="glyphicon glyphicon-home" aria-hidden="true"></span>
Projects
</h1>
</div>
<div class="col-md-3"></div>
<div class="col-md-5">
<p id="project" class="help">
<br>
<button id="add" type="button" class="btn btn-primary btn-lg help" data-container="body" data-toggle="popover" data-placement="bottom">
<span class="glyphicon glyphicon-plus" aria-hidden="true"></span>
Add a new project
</button>
<div id="popover-content" class="hide">
<div id="createForm" class="form-group">
{% csrf_token %}
<div id="status-form" class="collapse">
</div>
<div class="row inline">
<label class="col-lg-3" for="inputName" ><span class="pull-right">Name:</span></label>
<input class="col-lg-8" type="text" id="inputName" class="form-control">
</div>
<div class="row inline">
<div class="col-lg-3"></div>
<button id="createProject" class="btn btn-primary btn-sm col-lg-8 push-left">Add Project</button>
<div class="col-lg-2"></div>
<div class="col-md-4">
<h1>
<span class="glyphicon glyphicon-home" aria-hidden="true"></span>
Projects
</h1>
</div>
<div class="col-md-3"></div>
<div class="col-md-5">
<p id="project" class="help">
<br>
<button id="add" type="button" class="btn btn-primary btn-lg help" data-container="body" data-toggle="popover" data-placement="bottom">
<span class="glyphicon glyphicon-plus" aria-hidden="true"></span>
Add a new project
</button>
<div id="popover-content" class="hide">
<form>
<div id="createForm" class="form-group">
{% csrf_token %}
<div id="status-form" class="collapse"></div>
<div class="row inline">
<label class="col-lg-3" for="inputName" ><span class="pull-right">Name:</span></label>
<input class="col-lg-8" type="text" id="inputName" class="form-control">
</div>
<div class="row inline">
<div class="col-lg-3"></div>
<button id="createProject" class="btn btn-primary btn-sm col-lg-8 push-left">Add Project</button>
<div class="col-lg-2"></div>
</div>
</div>
</form>
</div>
</div>
</div>
</p>
</p>
</div>
</div>
</div>
</div>
......@@ -87,7 +90,7 @@
</div>
<!-- CHECKBOX EDITION -->
<!--
<!--
<div class="row collapse" id="editor">
<button title="delete selected project" type="button" class="btn btn-danger" id="delete">
<span class="glyphicon glyphicon-trash " aria-hidden="true" ></span>
......@@ -98,9 +101,8 @@
<!-- <button type="button" class="btn btn-info" id="recalculate">
<span class="glyphicon glyphicon-refresh " aria-hidden="true" onclick="recalculateProjects()"></span>
</button>
-->
</div>
-->
<br />
......
This diff is collapsed.
......@@ -199,12 +199,12 @@
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">&times;</span>
</button>
<h2 class="modal-title"><h2><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Uploading corpus...</h2>
<h2 class="modal-title"><h2><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>Building the corpus...</h2>
</div>
<div class="modal-body">
<p>
Your file has been uploaded !
Gargantext need some time to eat it.
Gargantext is gathering your texts
and needs some time to eat them.
Duration depends on the size of the dish.
</p>
</div>
......