Commit 0b2cc7bf authored by delanoe

Merge branch 'refactoring' into unstable

parents f9ab3d20 f6e87beb
*.pyc
*/__pycache__/*
VENV/*
install/docker/gargantext_lib.tar.bz2
# API
Be more careful about authorizations.
cf. "ng-resource".
# Taggers
Path for data used by taggers should be defined in `gargantext.constants`.
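For instance, such a constant could look like this (the name and exact path are illustrative, not the actual ones):
```python
# gargantext/constants.py -- illustrative name and location for tagger data
import os
TAGGER_DATA_PATH = os.path.join('/srv/gargantext_lib', 'taggers')
```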
# Database
# Sharing
Here follows a brief description of how sharing could be implemented.
## Database representation
The database representation of sharing can be distributed among 4 tables:
- `persons`, of which items represent either a user or a group
- `relationships` describes the relationships between persons (affiliation
of a user to a group, contact between two users, etc.)
- `nodes` contains the projects, corpora, documents, etc. to share (they shall
inherit the sharing properties from their parents)
- `permissions` stores the relations between persons and nodes: it consists of
  two foreign keys, an integer encoding the level of sharing (see the permission
  levels below), the start date (when the sharing was set) and the end date
  (when necessary, the time at which sharing was removed, `NULL` otherwise)
## Python code
The permission levels should be set in `gargantext.constants`, and defined as:
```python
PERMISSION_NONE = 0 # 0b0000
PERMISSION_READ = 1 # 0b0001
PERMISSION_WRITE = 3 # 0b0011
PERMISSION_OWNER = 7 # 0b0111
```
The requests to check for permissions (or add new ones) should not be rewritten
every time. They should be "hidden" within the models:
- `Person.owns(node)` returns a boolean
- `Person.can_read(node)` returns a boolean
- `Person.can_write(node)` returns a boolean
- `Person.give_right(node, permission)` gives a right to a given user
- `Person.remove_right(node, permission)` removes a right from a given user
- `Person.get_nodes(permission[, type])` returns an iterator on the list of
nodes on which the person has at least the given permission (optional
argument: type of requested node)
- `Node.get_persons(permission[, type])` returns an iterator on the list of
users who have at least the given permission on the node (optional argument:
type of requested persons, such as `USER` or `GROUP`)
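A minimal sketch of how these helpers could implement the bitmask logic (an in-memory dict stands in for the `permissions` table; the real methods would query the database through the models):
```python
PERMISSION_NONE  = 0  # 0b0000
PERMISSION_READ  = 1  # 0b0001
PERMISSION_WRITE = 3  # 0b0011
PERMISSION_OWNER = 7  # 0b0111

class Person:
    def __init__(self):
        # stand-in for the `permissions` table: {node_id: permission_bits}
        self._permissions = {}

    def _level(self, node):
        return self._permissions.get(node.id, PERMISSION_NONE)

    def can_read(self, node):
        # each level includes the bits of the weaker ones,
        # so a single bitwise test is enough
        return self._level(node) & PERMISSION_READ == PERMISSION_READ

    def can_write(self, node):
        return self._level(node) & PERMISSION_WRITE == PERMISSION_WRITE

    def owns(self, node):
        return self._level(node) & PERMISSION_OWNER == PERMISSION_OWNER

    def give_right(self, node, permission):
        self._permissions[node.id] = self._level(node) | permission

    def remove_right(self, node, permission):
        # clear the corresponding bits
        self._permissions[node.id] = self._level(node) & ~permission
```
Because each level is a superset of the weaker levels' bits, every check reduces to one bitwise AND.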
## Example
Let's imagine the `persons` table contains the following data:
| id | type | username |
|----|-------|-----------|
| 1 | USER | David |
| 2 | GROUP | C.N.R.S. |
| 3 | USER | Alexandre |
| 4 | USER | Untel |
| 5 | GROUP | I.S.C. |
| 6 | USER | Bidule |
Assume "David" owns the groups "C.N.R.S." and "I.S.C.", "Alexandre" belongs to
the group "I.S.C.", with "Untel" and "Bidule" belonging to the group "C.N.R.S.".
"Alexandre" and "David" are in contact.
The `relationships` table then contains:
| person1_id | person2_id | type |
|------------|------------|---------|
| 1 | 2 | OWNER |
| 1 | 5 | OWNER |
| 3 | 5 | MEMBER |
| 4 | 2 | MEMBER |
| 6 | 2 | MEMBER |
| 1 | 3 | CONTACT |
The `nodes` table is populated as follows:
| id | type | name |
|----|----------|----------------------|
| 12 | PROJECT | My super project |
| 13 | CORPUS | The corpus |
| 14 | DOCUMENT | Some document |
| 15 | DOCUMENT | Another document |
| 16 | DOCUMENT | Yet another document |
| 17 | DOCUMENT | Last document |
| 18 | PROJECT | Another project |
| 19 | PROJECT | That project |
If we want to express that "David" created "My super project" (and its children)
and wants everyone in "C.N.R.S." to be able to view it, but not modify it,
`permissions` should contain:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
If "David" also wanted "Alexandre" (and no one else) to view and modify "The
corpus" (and its children), we would have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
| 3 | 13 | WRITE |
If "Alexandre" created "That project" and wants "Bidule" (and no one else) to be
able to view and modify it (and its children), the table should then have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 3 | 19 | OWNER |
| 6 | 19 | WRITE |
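To make the mechanics concrete, here is a hedged sketch of an effective-permission lookup over the example data above, with plain Python structures standing in for the tables and node inheritance reduced to a parent pointer (all simplifications, not the actual schema):
```python
PERMISSION_NONE, PERMISSION_READ, PERMISSION_WRITE, PERMISSION_OWNER = 0, 1, 3, 7

# data from the example tables above
relationships = [(1, 2, "OWNER"), (1, 5, "OWNER"), (3, 5, "MEMBER"),
                 (4, 2, "MEMBER"), (6, 2, "MEMBER"), (1, 3, "CONTACT")]
permissions = {(1, 12): PERMISSION_OWNER,   # David owns "My super project"
               (2, 12): PERMISSION_READ,    # C.N.R.S. can read it
               (3, 13): PERMISSION_WRITE}   # Alexandre can write "The corpus"
node_parent = {13: 12}  # "The corpus" inherits from "My super project"

def effective_level(person_id, node_id):
    """Highest permission held directly or through a group, walking up parents."""
    groups = [g for (p, g, t) in relationships if p == person_id and t == "MEMBER"]
    level = PERMISSION_NONE
    node = node_id
    while node is not None:
        for pid in [person_id] + groups:
            level |= permissions.get((pid, node), PERMISSION_NONE)
        node = node_parent.get(node)  # sharing is inherited from parent nodes
    return level

# "Untel" (id 4) is in "C.N.R.S." (id 2), which can read project 12,
# so he can also read its child corpus 13, but not write to it:
assert effective_level(4, 13) & PERMISSION_READ == PERMISSION_READ
assert effective_level(4, 13) & PERMISSION_WRITE != PERMISSION_WRITE
```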
{
"directory": "static/bower_components"
}
static/bower_components/
{
"globalstrict": true,
"globals": {
"angular": false,
"describe": false,
"it": false,
"expect": false,
"beforeEach": false,
"afterEach": false,
"module": false,
"inject": false
}
}
set tabstop=4
set shiftwidth=4
set expandtab
set softtabstop=4
# Gargantext Annotations web application
We also use a number of node.js tools to initialize and test the application. You must have node.js and
its package manager (npm) installed. You can get them from [http://nodejs.org/](http://nodejs.org/).
## Preview only
Activate your virtualenv and run a simple http server
```
workon gargantext
python3 -m http.server
```
or:
```
npm start
```
Now browse to the app at `http://localhost:8000/app/index.html`.
## Install development tools and dependencies
We have two kinds of dependencies in this project: tools and angular framework code. The tools help
us manage and test the application.
* We get the tools we depend upon via `npm`, the [node package manager][npm].
* We get the angular code via `bower`, a [client-side code package manager][bower].
We have preconfigured `npm` to automatically run `bower` so we can simply do:
```
npm install
```
Behind the scenes this will also call `bower install`. You should find that you have two new
folders in your project.
* `node_modules` - contains the npm packages for the tools we need
* `app/bower_components` - contains the angular framework files
*Note that the `bower_components` folder would normally be installed in the root folder but
angular-seed changes this location through the `.bowerrc` file. Putting it in the app folder makes
it easier to serve the files by a webserver.*
## Directory Layout
This will be adapted to fit the django API code as well. For now, the generic layout is:
```
app/ --> all of the source files for the application
app.css --> default stylesheet
components/ --> all app specific modules
view1/ --> the view1 view template and logic
view1.html --> the partial template
view1.js --> the controller logic
view1_test.js --> tests of the controller
app.js --> main application module
index.html --> app layout file (the main html template file of the app)
```
# Conception and workflow documentation
This API makes it possible to edit the miamlisted or stoplisted keywords associated with a document displayed within a web page used to browse a set of documents of a corpus.
### Architecture
- Templates: Django and Angular.js?
- Communication between modules: Angular events ($emit and $broadcast)
- No routing between different URLs, since there is a single main view based on the django template corpus.html
- Data abstraction model: client side (Angular Scopes) and server side (Django Model and SQLAlchemy)
- Components: TODO list and describe the client and server components
- Application structure: organization of the client and the server
- Style: Bootstrap and a specific theme chosen for Gargantext
- Dependency management:
    - bower, npm for web development and client-side tests
    - pip requirements for the server side
## What actions does the API perform?
- display the title, authors, abstract, publication date and body of a document.
- read the miamlisted keywords associated with a document (inside and outside the text).
- read the stoplisted keywords associated with a document (inside and outside the text).
- read the documents with the most identical miamlisted keywords, to display a list of links to new documents
- read the group of keywords to which a keyword belongs (synonyms, different forms)
- modify the group of keywords to which a given keyword belongs
"Keyword" here designates an NGram.
## API schema
List of endpoints
### Reading data
- POST '^api/nodes/(\d+)/children/queries$': list of a document's NGrams, with the option to filter by NGrams
- GET '^api/nodes$': list of keyword identifiers filtered by type (NGram or other) for a parent identifier (Document or other)
- GET '^api/nodes/(\d+)/ngrams$': list of the terms of the keywords associated with a parent Document, filtered by terms
- GET ^api/nodes/(\d+)/children/metadata$: list of a Node's metadata, that is:
    - for a document: title, author, etc.
    - for an NGram: stoplisted or miamlisted?
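As an illustration, a client could consume one of the read endpoints above like this (the document id, base URL and filter parameter name are assumptions):
```python
import requests

BASE = "http://localhost:8000"  # assumed local development server

# terms of the keywords associated with document 556, filtered on "star"
# (the name of the filter parameter is an assumption)
resp = requests.get(BASE + "/api/nodes/556/ngrams", params={"terms": "star"})
resp.raise_for_status()
print(resp.json())
```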
### Writing data
TODO
## Workflow
We settle on this documentation and specification of the API.
- in parallel: development of the API and prototyping of the interface
- prototyping of the interface may modify the API if needed
### Specifications of the interface foundations
- screen resolutions
- browsers
- language: English only
- SEO: none?
- collaborative: yes, modifications made by another user are notified to all users working on the same document corpus
- does it work offline?
### Working process
- the follow-up board is updated regularly [on Trello](https://trello.com/b/96ItkDBS/gargantext-miamlists-and-stoplists)
- provisional calendar: TODO
- interactions between the actors: emails
- git, branches: branch "elias", regular `git pull --rebase origin master`
- planned code and interface reviews: TODO
### Plateforme
- Python 3.4
- Django 1.6
- Postgresql 9.3 + HSTORE
- SQLAlchemy
- Bootstrap CSS
- Angular.js
### Code quality tools
- pylint
- jshint (see .jshintrc)
- indentation: 4 spaces (see .lvimrc)
- automatic removal of trailing whitespace
## Tests
There are two possible kinds of tests: unit tests and end-to-end tests.
- client side: look into karma.js and protractor
- define the test strategy: TODO
## Deployment
- define the deployment process
- plan a monitoring system for server errors once online
- Sentry?
## Updating the web application
Previously we recommended that you merge changes from angular-seed into your own fork of the project.
Now that the angular framework library code and tools are acquired through package managers (npm and
bower) you can use these tools instead to update the dependencies.
You can update the tool dependencies by running:
```
npm update
```
This will find the latest versions that match the version ranges specified in the `package.json` file.
You can update the Angular dependencies by running:
```
bower update
```
This will find the latest versions that match the version ranges specified in the `bower.json` file.
### Running the App in Production
This really depends on how complex your app is and the overall infrastructure of your system, but
the general rule is that all you need in production are all the files under the `app/` directory.
Everything else should be omitted.
Angular apps are really just a bunch of static html, css and js files that just need to be hosted
somewhere they can be accessed by browsers.
If your Angular app is talking to the backend server via xhr or other means, you need to figure
out the best way to host the static files so as to comply with the same-origin policy, if
applicable. Usually this is done by hosting the files on the backend server or by
reverse-proxying the backend server(s) and webserver(s).
[AngularJS]: http://angularjs.org/
[git]: http://git-scm.com/
[bower]: http://bower.io
[npm]: https://www.npmjs.org/
[node]: http://nodejs.org
[protractor]: https://github.com/angular/protractor
[jasmine]: http://jasmine.github.io
[karma]: http://karma-runner.github.io
[http-server]: https://github.com/nodeapps/http-server
{
"name": "annotations",
"description": "Annotations for Gargantext",
"version": "0.0.1",
"license": "GPLv3",
"private": true,
"dependencies": {
"angular": "~1.2.x",
"angular-loader": "~1.2.x",
"angular-resource": "~1.2.x",
"bootstrap": "~3.x",
"angular-cookies": "1.2",
"bootstrap-select": "silviomoreto/bootstrap-select#~1.7.3"
},
"resolutions": {
"angular": "~1.2.x"
}
}
{
"name": "gargantext-annotations",
"private": true,
"version": "0.0.1",
"description": "Annotations for gargantext",
"license": "GPLv3",
"devDependencies": {
"karma": "~0.10",
"protractor": "^1.1.1",
"http-server": "^0.6.1",
"bower": "^1.3.1",
"shelljs": "^0.2.6",
"karma-junit-reporter": "^0.2.2"
},
"scripts": {
"postinstall": "bower install",
"prestart": "npm install",
"start": "http-server -a localhost -p 8000 -c-1",
"pretest": "npm install",
"test": "karma start karma.conf.js",
"test-single-run": "karma start karma.conf.js --single-run",
"preupdate-webdriver": "npm install",
"update-webdriver": "webdriver-manager update",
"preprotractor": "npm run update-webdriver",
"protractor": "protractor e2e-tests/protractor.conf.js",
"update-index-async": "node -e \"require('shelljs/global'); sed('-i', /\\/\\/@@NG_LOADER_START@@[\\s\\S]*\\/\\/@@NG_LOADER_END@@/, '//@@NG_LOADER_START@@\\n' + sed(/sourceMappingURL=angular-loader.min.js.map/,'sourceMappingURL=bower_components/angular-loader/angular-loader.min.js.map','static/bower_components/angular-loader/angular-loader.min.js') + '\\n//@@NG_LOADER_END@@', 'templates/annotations/demo.html');\""
}
}
ANNOTATIONS
===========
2016-01
## Routines for manipulating ngrams in the lists
#### Overall path of the chosen actions
1. angular: ngramlist.js (user input) or highlight.js (user menu controller)
2. angular: http.js configuration object
{ 'action': 'post', 'listId': miamlist_id, ..}
3. AJAX POST/DELETE
4. "local API" (=> annotations.views)
5. DB insert/delete
Remark:
In Elias's annotations code, a "local API" forwards the client actions to the server.
=> the interconnection is configured for angular in annotations/static/annotations/app.js, which launches its own main on the window, taking its parameters from the url and isolating itself from django
=> the upstream routes are defined for django in annotations.urls and taken up for angular in http.js
#### For example: the AJAX step for a deletion
`curl -XDELETE http://localhost:8000/annotations/lists/7129/ngrams/4497`
via annotations.views.NgramEdit.as_view()
#### Adding an ngram
```
curl -XPOST http://localhost:8000/annotations/lists/1866/ngrams/create \
-H "Content-Type: application/json" \
-d '{"text":"yooooooooo"}' > response_to_ngrams_create.html
```
## Client-side interaction points (GUI)
Add Ngram via dialog box:
- controller:
ngramlist.annotationsAppNgramList.controller('NgramInputController')
- effect:
1. NgramHttpService.post()
Add Ngram via select new + menu
- controller:
highlight.annotationsAppHighlight.controller('TextSelectionMenuController')
1. toggleMenu (sets action = X)
2. onMenuClick
- effect:
1. NgramHttpService[action]
(function () {
'use strict';
var annotationsAppActiveLists = angular.module('annotationsAppActiveLists', []);
annotationsAppActiveLists.controller('ActiveListsController',
['$scope', '$rootScope', '$timeout',
function ($scope, $rootScope, $timeout) {
$scope.activeListsChange = function() {
var selected = $('.selectpicker option:selected').val();
var newActive = {};
$('.selectpicker option:selected').each(function(item, opt) {
// ex opt:
// <option id="list---748" value="MAINLIST">MAINLIST</option>
var id = opt.id.split("---", 2)[1];
newActive[id] = opt.value;
});
// ex: {745: "MAINLIST", 748: "MAPLIST"}
$rootScope.activeLists = newActive;
};
$rootScope.$watchCollection('activeLists', function (newValue, oldValue) {
if (newValue === undefined) return;
$timeout(function() {
$('.selectpicker').selectpicker('refresh');
});
});
// FIXME: couldn't we directly use lists
// instead of copying it into allListsSelect?
$rootScope.$watchCollection('lists', function (newValue, oldValue) {
if (newValue === undefined) return;
// reformat lists to allListsSelect
var allListsSelect = [];
// console.log($rootScope.lists)
angular.forEach($rootScope.lists, function(value, key) {
this.push({
'id': key,
'label': value
});
// initialize activeLists with the MAPLIST by default
if (value == 'MAPLIST') {
$rootScope.activeLists = {};
$rootScope.activeLists[key] = value;
}
}, allListsSelect);
$rootScope.allListsSelect = allListsSelect;
$timeout(function() {
$('.selectpicker').selectpicker();
$('.selectpicker').selectpicker('val', ['MAPLIST']);
});
});
}]);
})(window);
/* app css stylesheet */
/*
* Class names corresponding to server-side list names
* To display another list name, add a new class under this
*/
.MAPLIST {
color: black;
/* green */
background-color: rgba(23, 255, 189, .7);
/* background-color: rgba(60, 118, 61, .7); */
cursor: pointer;
}
.MAINLIST {
color: black;
/* background-color: rgba(60, 118, 61, 0.5); */
background-color: orange;
cursor: pointer;
}
.STOPLIST {
color: black;
/* grey */
background-color: rgba(169, 68, 66, 0.2);
cursor: pointer;
}
.delete-keyword, .occurrences {
vertical-align: super;
font-size: 70%;
}
.delete-keyword {
cursor: pointer;
}
.center-block {
display: block;
margin-left: auto;
margin-right: auto;
}
.keyword-inline {
display: inline;
}
.keyword-inline:hover {
text-decoration: none;
}
.nav-tabs {
border-bottom: none;
}
.main-panel, .text-panel, .words-panel {
margin: 10px 0;
}
#annotationsApp {
min-width: 780px;
}
.words-panel {
min-width: 220px;
}
.text-panel {
overflow-y: auto;
min-width: 400px;
}
.words-list {
margin-bottom: 5px;
}
.keyword-text {
word-break: break-all;
}
.keyword-group-item {
display: inline-block;
float: left;
padding: 5px;
margin: .25em;
box-shadow: .2em .2em .1em rgba(0, 0, 0, .125);
}
.words-pagination {
margin: 5px 0;
}
.text-panel p, .text-panel h3 {
-webkit-transition: all 0.25s linear;
-moz-transition: all 0.25s linear;
-ms-transition: all 0.25s linear;
-o-transition: all 0.25s linear;
transition: all 0.25s linear;
}
.selection {
color: #aaa;
}
::selection {
color: black;
background-color: rgba(0, 0, 0, 0.4);
}
.noselection {
-webkit-touch-callout: none;
-webkit-user-select: none;
-khtml-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
user-select: none;
}
.selection-menu {
display: none;
position: absolute;
color: #394141;
background: white;
font-size: 0.8em;
font-weight: 600;
-webkit-box-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5);
-moz-box-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5);
box-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5);
}
/*.selection-menu:before {
content: '';
position: absolute;
left: -10px;
top: 0px;
border-right: solid white 10px;
border-top: solid transparent 8px;
border-bottom: solid transparent 8px;
}*/
.selection-menu ul {
list-style: none;
margin: 0;
padding: 0;
}
.selection-menu li {
border-bottom: solid thin #CCC;
padding: 10px;
white-space: nowrap;
}
.selection-menu [class*="glyphicon"] {
min-width: 25px;
display: inline-block;
text-align: center;
border-right: solid thin #CCC;
margin-right: 5px;
}
.float-right {
float: right;
}
.favactive {
/* yellow */
color: #FFF50D;
text-shadow: -1px 0 #777777, 0 1px #777777, 1px 0 #777777, 0 -1px #777777;
}
(function () {
'use strict';
/*
* Django STATIC_URL given to angular to load async resources
*/
var S = window.STATIC_URL;
window.annotationsApp = angular.module('annotationsApp', ['annotationsAppHttp',
'annotationsAppNgramList', 'annotationsAppHighlight', 'annotationsAppDocument',
'annotationsAppActiveLists', 'annotationsAppUtils']);
/*
* Angular Templates must not conflict with Django's
*/
window.annotationsApp.config(function($interpolateProvider) {
$interpolateProvider.startSymbol('{[{');
$interpolateProvider.endSymbol('}]}');
});
/*
* Main function
* GET the document node and all its ngrams
*/
window.annotationsApp.run(function ($rootScope) {
var path = window.location.pathname.match(/\/projects\/(.*)\/corpora\/(.*)\/documents\/(.*)\//);
$rootScope.projectId = path[1];
$rootScope.corpusId = path[2];
$rootScope.docId = path[3];
});
})(window);
(function () {
'use strict';
var annotationsAppDocument = angular.module('annotationsAppDocument', ['annotationsAppHttp']);
annotationsAppDocument.controller('DocController',
['$scope', '$rootScope', '$timeout', 'NgramListHttpService', 'DocumentHttpService',
function ($scope, $rootScope, $timeout, NgramListHttpService, DocumentHttpService) {
// dataLoading = flag to show a wait indicator
$scope.dataLoading = true ;
console.log("annotations.document.DocController.DocumentHttpService.get():before")
$rootScope.documentResource = DocumentHttpService.get(
{'docId': $rootScope.docId},
function(data, responseHeaders) {
$scope.authors = data.authors;
$scope.journal = data.journal;
$scope.publication_date = data.publication_date;
//$scope.current_page_number = data.current_page_number;
//$scope.last_page_number = data.last_page_number;
$rootScope.title = data.title;
$rootScope.docId = data.id;
$rootScope.full_text = data.full_text;
$rootScope.abstract_text = data.abstract_text;
console.log("annotations.document.DocController.getannotations")
// GET the annotations
NgramListHttpService.get(
{
'corpusId': $rootScope.corpusId,
'docId': $rootScope.docId
},
function(data) {
$rootScope.annotations = data[$rootScope.corpusId.toString()][$rootScope.docId.toString()];
$rootScope.lists = data[$rootScope.corpusId.toString()].lists;
$scope.dataLoading = false ;
},
function(data) {
console.error("unable to get the list of ngrams");
}
);
});
// TODO setup article pagination
$scope.onPreviousClick = function () {
DocumentHttpService.get($scope.docId - 1);
};
$scope.onNextClick = function () {
DocumentHttpService.get($scope.docId + 1);
};
}]);
annotationsAppDocument.controller('DocFavoriteController',
['$scope', '$rootScope', 'MainApiFavoritesHttpService',
function ($scope, $rootScope, MainApiFavoritesHttpService) {
$scope.isFavorite = false;
MainApiFavoritesHttpService.get(
{
'corpusId': $rootScope.corpusId,
'docId': $rootScope.docId
},
function(data) {
if (data['favdocs'].length > 0
&& data['favdocs'][0] == $scope.docId) {
$scope.isFavorite = true ;
}
else {
$scope.isFavorite = false ;
}
},
function(data) {
console.error("unable to check if document belongs to favorites");
$scope.isFavorite = false ;
}
) ;
$scope.onStarClick = function($event) {
console.log($scope.isFavorite)
// console.log($scope)
console.log("TODO");
var myAction ;
if (! $scope.isFavorite) {
// PUT api/nodes/574/favorites?docs=576
myAction = MainApiFavoritesHttpService.put
}
else {
// DELETE api/nodes/574/favorites?docs=576
myAction = MainApiFavoritesHttpService.delete
}
// (1) do the action
myAction(
{
'corpusId': $rootScope.corpusId,
'docId': $rootScope.docId
},
// success
function(data) {
// (2) toggle status and refresh
$scope.isFavorite = ! $scope.isFavorite
$rootScope.refreshDisplay();
},
// failure
function(data) {
console.error("unable to change favorite status");
}
);
};
}]);
})(window);
(function () {
'use strict';
var http = angular.module('annotationsAppHttp', ['ngResource', 'ngCookies']);
http.config(['$httpProvider', function($httpProvider){
$httpProvider.defaults.xsrfHeaderName = 'X-CSRFToken';
$httpProvider.defaults.xsrfCookieName = 'csrftoken';
}]);
/*
* DocumentHttpService: Read Document
* ===================
*
* route: annotations/documents/@d_id
* ------
*
 * example:
* --------
* {
* "id": 556,
* "publication_date": "01/01/66",
* "title": "Megalithic astronomy: Indications in standing stones",
* "abstract_text": "An account is given of a number of surveys of
* stone circles, alignments, etc., found in Britain.
* The geometry of the rings is discussed in so far
* as it affects the determination of the azimuths
* to outliers and other circles.",
* "full_text": null,
* "journal": "Vistas in Astronomy",
* "authors": "A. Thom"
* }
*
*/
http.factory('DocumentHttpService', function($resource) {
return $resource(
window.ANNOTATION_API_URL + "documents/:docId/",
{
docId: '@docId'
},
{
get: {
method: 'GET',
params: {docId: '@docId'}
}
}
);
});
/*
* NgramListHttpService: Read all Ngrams
* =====================
*
* route: annotations/corpora/@c_id/documents/@d_id
* ------
*
* json return format:
* -------------------
* corpus_id : {
* lists: {(list_id:name)+}
* doc_id : [ngrams_objects]+,
* }
*
 * example:
* --------
* "554": {
* "lists": { "558": "StopList", "564": "MiamList", "565": "MapList" }
* "556": [{ "uuid": 2368, "occurrences": 1.0, "text": "idea", "list_id": 564 },
* { "uuid": 5031, "occurrences": 1.0, "text": "indications", "list_id": 564},
* { "uuid": 5015, "occurrences": 3.0, "text": "star", "list_id": 565 },
* ... ],
* }
*/
http.factory('NgramListHttpService', function ($resource) {
return $resource(
window.ANNOTATION_API_URL + 'corpora/:corpusId/documents/:docId',
{
corpusId: '@corpusId',
docId: '@docId'
},
{
get: {
method: 'GET',
params: {}
}
}
);
});
/*
* NgramHttpService: Create, modify or delete 1 Ngram
* =================
*
* TODO REACTIVATE IN urls.py
*
* if new ngram:
* -> ngram_id will be "create"
* -> route: annotations/lists/@node_id/ngrams/create
* -> will land on views.NgramCreate
*
* else:
* -> ngram_id is a real ngram id
* -> route: annotations/lists/@node_id/ngrams/@ngram_id
 * -> will land on views.NgramEdit
*
*/
http.factory('NgramHttpService', function ($resource) {
return $resource(
window.ANNOTATION_API_URL + 'lists/:listId/ngrams/:ngramId',
{
listId: '@listId',
ngramId: '@id'
},
{
post: {
method: 'POST',
params: {'listId': '@listId', 'ngramId': '@ngramId'}
},
delete: {
method: 'DELETE',
params: {'listId': '@listId', 'ngramId': '@ngramId'}
}
}
);
});
/*
* MainApiFavoritesHttpService: Check/Add/Del Document in favorites
* ============================
* route: api/nodes/574/favorites?docs=576
* /!\ for this route we reach out of this annotation module
* and send directly to the gargantext api route for favs
* (cross origin request with http protocol scheme)
* ------
*
 * example:
* --------
* {
* "favdocs": [576] // <== if doc is among favs
* "missing": [] // <== if doc is not in favs
* }
*
*/
http.factory('MainApiFavoritesHttpService', function($resource) {
return $resource(
// adding explicit "http://" b/c this a cross origin request
'http://' + window.GARG_ROOT_URL + "/api/nodes/:corpusId/favorites?docs=:docId",
{
corpusId: '@corpusId',
docId: '@docId'
},
{
get: {
method: 'GET',
params: {corpusId: '@corpusId', docId: '@docId'}
},
put: {
method: 'PUT',
params: {corpusId: '@corpusId', docId: '@docId'}
},
delete: {
method: 'DELETE',
params: {corpusId: '@corpusId', docId: '@docId'}
}
}
);
});
})(window);
<span ng-click='onDeleteClick()' class="delete-keyword">×</span>
<span data-toggle="tooltip" class="keyword-text {[{keyword.listName}]}">{[{keyword.text}]}</span>
<span class="occurrences" data-keyword-id="{[{keyword.uuid}]}">{[{keyword.occurrences}]}</span>
// include angular loader, which allows the files to load in any order
//@@NG_LOADER_START@@
// You need to run `npm run update-index-async` to inject the angular async code here
//@@NG_LOADER_END@@
// include a third-party async loader library
/*!
* $script.js v1.3
* https://github.com/ded/script.js
* Copyright: @ded & @fat - Dustin Diaz, Jacob Thornton 2011
* Follow our software http://twitter.com/dedfat
* License: MIT
*/
!function(a,b,c){function t(a,c){var e=b.createElement("script"),f=j;e.onload=e.onerror=e[o]=function(){e[m]&&!/^c|loade/.test(e[m])||f||(e.onload=e[o]=null,f=1,c())},e.async=1,e.src=a,d.insertBefore(e,d.firstChild)}function q(a,b){p(a,function(a){return!b(a)})}var d=b.getElementsByTagName("head")[0],e={},f={},g={},h={},i="string",j=!1,k="push",l="DOMContentLoaded",m="readyState",n="addEventListener",o="onreadystatechange",p=function(a,b){for(var c=0,d=a.length;c<d;++c)if(!b(a[c]))return j;return 1};!b[m]&&b[n]&&(b[n](l,function r(){b.removeEventListener(l,r,j),b[m]="complete"},j),b[m]="loading");var s=function(a,b,d){function o(){if(!--m){e[l]=1,j&&j();for(var a in g)p(a.split("|"),n)&&!q(g[a],n)&&(g[a]=[])}}function n(a){return a.call?a():e[a]}a=a[k]?a:[a];var i=b&&b.call,j=i?b:d,l=i?a.join(""):b,m=a.length;c(function(){q(a,function(a){h[a]?(l&&(f[l]=1),o()):(h[a]=1,l&&(f[l]=1),t(s.path?s.path+a+".js":a,o))})},0);return s};s.get=t,s.ready=function(a,b,c){a=a[k]?a:[a];var d=[];!q(a,function(a){e[a]||d[k](a)})&&p(a,function(a){return e[a]})?b():!function(a){g[a]=g[a]||[],g[a][k](b),c&&c(d)}(a.join("|"));return s};var u=a.$script;s.noConflict=function(){a.$script=u;return this},typeof module!="undefined"&&module.exports?module.exports=s:a.$script=s}(this,document,setTimeout)
// load all of the dependencies asynchronously.
var S = window.STATIC_URL;
$script([
S + 'bower_components/angular/angular.min.js',
S + 'bower_components/bootstrap/dist/js/bootstrap.min.js',
S + 'bower_components/bootstrap-select/dist/js/bootstrap-select.min.js',
S + 'bower_components/angular-loader/angular-loader.min.js',
S + 'bower_components/underscore/underscore-1.5.2.js',
//'bower_components/angular-route/angular-route.js',
], function() {
$script([
S + 'bower_components/angular-cookies/angular-cookies.min.js',
S + 'bower_components/angular-resource/angular-resource.min.js'], function() {
$script([S + 'annotations/http.js', S + 'annotations/highlight.js',
S + 'annotations/document.js', S + 'annotations/ngramlist.js',
S + 'annotations/activelists.js',
S + 'annotations/utils.js', S + 'annotations/app.js'], function() {
// when all is done, execute bootstrap angular application (replace ng-app directive)
angular.bootstrap(document.getElementById("annotationsApp"), ['annotationsApp']);
});
});
});
(function () {
'use strict';
var annotationsAppNgramList = angular.module('annotationsAppNgramList', ['annotationsAppHttp']);
/*
* Controls one Ngram displayed in the flat lists (called "extra-text")
*/
annotationsAppNgramList.controller('NgramController',
['$scope', '$rootScope', 'NgramHttpService', 'NgramListHttpService',
function ($scope, $rootScope, NgramHttpService, NgramListHttpService) {
/*
* Click on the 'delete' cross button
*/
$scope.onDeleteClick = function () {
NgramHttpService.delete({
'listId': $scope.keyword.list_id,
'ngramId': $scope.keyword.uuid
}, function(data) {
// Refresh the annotations
NgramListHttpService.get(
{
'corpusId': $rootScope.corpusId,
'docId': $rootScope.docId
},
function(data) {
// $rootScope.annotations
// ----------------------
// is the union of all lists, one being later "active"
// (then used for left-side flatlist AND inline annots)
$rootScope.annotations = data[$rootScope.corpusId.toString()][$rootScope.docId.toString()];
// TODO £NEW : lookup obj[list_id][term_text] = {terminfo}
// $rootScope.lookup =
$rootScope.refreshDisplay();
},
function(data) {
console.error("unable to refresh the list of ngrams");
}
);
}, function(data) {
console.error("unable to remove the Ngram " + $scope.keyword.text);
});
};
}]);
/*
* Controller for the list panel displaying extra-text ngram
*/
annotationsAppNgramList.controller('NgramListPaginationController',
['$scope', '$rootScope', function ($scope, $rootScope) {
$rootScope.$watchCollection('ngramsInPanel', function (newValue, oldValue) {
$scope.currentListPage = 0;
$scope.pageSize = 15;
$scope.nextListPage = function() {
$scope.currentListPage = $scope.currentListPage + 1;
};
$scope.previousListPage = function() {
$scope.currentListPage = $scope.currentListPage - 1;
};
$scope.totalListPages = function(listId) {
if ($rootScope.ngramsInPanel[listId] === undefined) return 0;
return Math.ceil($rootScope.ngramsInPanel[listId].length / $scope.pageSize);
};
});
}]);
/*
* Template of the ngram element displayed in the flat lists
*/
annotationsAppNgramList.directive('keywordTemplate', function () {
return {
templateUrl: function ($element, $attributes) {
return S + 'annotations/keyword_tpl.html';
}
};
});
/*
* new NGram from the user input
*/
annotationsAppNgramList.controller('NgramInputController',
['$scope', '$rootScope', '$element', 'NgramHttpService', 'NgramListHttpService',
function ($scope, $rootScope, $element, NgramHttpService, NgramListHttpService) {
/*
* Add a new NGram from the user input in the extra-text list
*/
$scope.onListSubmit = function ($event, listId) {
var inputEltId = "#"+ listId +"-input";
if ($event.keyCode !== undefined && $event.keyCode != 13) return;
var value = angular.element(inputEltId).val().trim();
if (value === "") return;
// £TEST locally check if already in annotations NodeNgrams ------
// $rootScope.annotations = array of ngram objects like:
// {"list_id":805,"occurrences":2,"uuid":9386,"text":"petit échantillon"}
console.log('looking for "' + value + '" in list:' + listId)
var already_in_list = false ;
angular.forEach($rootScope.annotations, function(annot,i) {
// console.log(i + ' => ' + annot.text + ',' + annot.list_id) ;
if (value == annot.text && listId == annot.list_id) {
console.log('the term "' + value + '" was already present in list')
// no creation
already_in_list = true ;
}
}
);
if (already_in_list) { return ; }
// ---------------------------------------------------------------
// will check if there's a preexisting ngramId for this value
// TODO: if maplist => also add to miam
NgramHttpService.post(
{
'listId': listId,
'ngramId': 'create'
},
{
'text': value
},
function(data) {
console.warn("refresh attempt");
// on success
if (data) {
angular.element(inputEltId).val("");
// Refresh the annotations
NgramListHttpService.get(
{
'corpusId': $rootScope.corpusId,
'docId': $rootScope.docId
},
function(data) {
$rootScope.annotations = data[$rootScope.corpusId.toString()][$rootScope.docId.toString()];
// TODO £NEW : lookup obj[list_id][term_text] = {terminfo}
// $rootScope.lookup =
$rootScope.refreshDisplay();
},
function(data) {
console.error("unable to get the list of ngrams");
}
);
}
}, function(data) {
// on error
angular.element(inputEltId).parent().addClass("has-error");
console.error("error adding Ngram "+ value);
}
);
};
}]);
})(window);
(function () {
'use strict';
var annotationsAppUtils = angular.module('annotationsAppUtils', []);
/*
* Filter used in lists pagination (extra-text panel)
*/
annotationsAppUtils.filter('startFrom', function () {
return function (input, start) {
if (input === undefined) return;
start = +start; //parse to int
return input.slice(start);
};
});
})(window);
{% load staticfiles %}
<!DOCTYPE html>
<!--[if lt IE 7]> <html lang="en" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html lang="en" class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html lang="en" class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!-->
<html lang="en" class="no-js">
<!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Gargantext article editor</title>
<meta name="description" content="Gargantext">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="{% static 'bower_components/bootstrap/dist/css/bootstrap.min.css' %}">
<link rel="stylesheet" href="{% static 'bower_components/bootstrap-select/dist/css/bootstrap-select.min.css' %}">
<link rel="stylesheet" href="{% static 'bower_components/angular/angular-csp.css' %}">
<link rel="stylesheet" href="{% static 'annotations/app.css' %}">
<script src="{% static 'bower_components/jquery/dist/jquery.min.js' %}"></script>
</head>
<body>
<!-- TODO integrate this later into any other django template -->
<div id="annotationsApp" ng-cloak>
<div class="container-fluid">
<div class="row-fluid main-panel" ng-controller="NGramHighlightController">
<div class="col-md-4 col-xs-4 tabbable words-panel">
<ul class="nav nav-pills nav-justified">
<li ng-repeat="(listId, listName) in activeLists" ng-class="{active: $first == true}">
<a href="#tab-{[{listId}]}" data-toggle="tab">{[{listName}]}</a>
</li>
</ul>
<div class="tab-content">
<div ng-controller="NgramListPaginationController" ng-repeat="(listId, listName) in activeLists" ng-class="{active: $first == true}" class="tab-pane" id="tab-{[{listId}]}">
<div ng-if="ngramsInPanel[listId].length == 0" class="alert alert-info" role="alert">
Input any keyword you want to link to this article and the list named '{[{listName}]}'
</div>
<ul class="list-group words-list clearfix">
<li ng-repeat="keyword in ngramsInPanel[listId] | startFrom:currentListPage * pageSize | limitTo:pageSize" class="keyword-group-item">
<div ng-controller="NgramController" keyword-template class="keyword-container"></div>
</li>
</ul>
<nav ng-class="{invisible: totalListPages(listId) - 1 == 0}" class="clearfix">
<ul class="pagination pagination-s pull-right words-pagination">
<li ng-class="{'disabled': currentListPage == 0}"><a ng-click="previousListPage()" class="glyphicon glyphicon-backward"></a></li>
<li ng-class="{'disabled': currentListPage >= totalListPages(listId) - 1}"><a ng-click="nextListPage()" class="glyphicon glyphicon-forward"></a></li>
</ul>
</nav>
<div class="form-group" ng-controller="NgramInputController">
<input autosave="search" maxlength="240" placeholder="Add any text" type="text" class="form-control" id="{[{listId}]}-input" ng-keypress="onListSubmit($event, listId)">
<button type="submit" class="form-control btn btn-default btn-primary" ng-click="onListSubmit($event, listId)">Add to {[{listName}]}</button>
</div>
</div>
</div>
<div class="list-selector">
<h5>Select lists</h5>
<select class="selectpicker" multiple ng-change="activeListsChange()" ng-model="lists" ng-controller="ActiveListsController">
<option ng-repeat="item in allListsSelect" id="list---{[{item.id}]}">{[{item.label}]}</option>
<!-- to disallow unchecking MapList add this into <option> element: ng-disabled="{[{ item.label == 'MapList' }]}" -->
</select>
</div>
</div>
<div class="col-md-8 col-xs-8 text-panel" ng-controller="DocController" id="document">
<div class="row-fluid clearfix">
<div class="col-md-10 col-xs-10">
<h3 class="text-container" id="title">{[{title}]}</h3>
</div>
<div class="col-md-2 col-xs-2 clearfix">
<button ng-controller="DocFavoriteController" type="button" class="btn btn-default float-right" ng-click="onStarClick($event)">
<span class="glyphicon" ng-class="{'glyphicon-star-empty': isFavorite == false, 'glyphicon-star favactive': isFavorite == true}"></span>
</button>
<!--<nav>
<ul class="pager">
<li ng-if="current_page_number > 1"><a ng-click="onPreviousClick()" href="#">Previous</a></li>
<li ng-if="current_page_number < last_page_number"><a ng-click="onNextClick()" href="#">Next</a></li>
</ul>
</nav>-->
</div>
</div>
<div class="row-fluid">
<ul class="list-group clearfix">
<li class="list-group-item small"><span class="badge">journal</span>{[{journal}]}</li>
<li class="list-group-item small"><span class="badge">authors</span>{[{authors}]}</li>
<li class="list-group-item small"><span class="badge">date</span>{[{publication_date}]}</li>
</ul>
</div>
<div ng-if="dataLoading">
Loading text...
<br>
<center>
<img width="10%" src="{% static 'img/ajax-loader.gif'%}">
</center>
<br>
</div>
<div ng-if="abstract_text != null">
<span class="badge">abstract</span>
</div>
<p id="abstract-text" class="text-container">
<div ng-if="abstract_text == null" class="alert alert-info small" role="alert">Empty abstract text</div>
</p>
<div ng-if="full_text != null">
<span class="badge">full article</span>
</div>
<p id="full-text" class="text-container">
<div ng-if="full_text == null" class="alert alert-info small" role="alert">Empty full text</div>
</p>
</div>
</div> <!-- end of the main row -->
</div>
<!-- this menu is over the text on mouse selection -->
<div ng-controller="TextSelectionMenuController" id="selection" class="selection-menu">
<ul class="noselection">
<li ng-repeat="item in menuItems" class="{[{item.listName}]}" ng-click="onMenuClick($event, item.action, item.listId)">{[{item.verb}]} {[{item.listName}]}</li>
</ul>
</div>
</div>
<!--[if lt IE 7]>
<p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
<![endif]-->
<script type="application/javascript">
/* Constants required for annotations app JS to work */
window.STATIC_URL = "{% static '' %}";
window.ANNOTATION_API_URL = "{{ api_url }}";
window.GARG_ROOT_URL = "{{ garg_url }}";
window.NODES_API_URL = "{{ nodes_api_url }}";
</script>
<script src="{% static 'annotations/main.js' %}"></script>
</body>
</html>
from django.conf.urls import url
from annotations import views
# /!\ urls patterns here are *without* the trailing slash
urlpatterns = [
# GET [DocumentHttpService]
# json:title,id,authors,journal,
# publication_date
# abstract_text,full_text
url(r'^documents/(?P<doc_id>[0-9]+)$', views.Document.as_view()), # document view
# GET [NgramListHttpService]
# was : lists ∩ document (ngram_ids intersection if connected to list node_id and doc node_id)
# fixed 2016-01: just lists (because document doesn't get updated by POST create cf. ngram.lists.DocNgram filter commented)
url(r'^corpora/(?P<corpus_id>[0-9]+)/documents/(?P<doc_id>[0-9]+)$', views.NgramList.as_view()), # the list associated with an ngram
# 2016-03-24: refactoring, deactivated NgramEdit and NgramCreate
#
# url(r'^lists/(?P<list_id>[0-9]+)/ngrams/(?P<ngram_ids>[0-9,\+]+)+$', views.NgramEdit.as_view()),
# POST (fixed 2015-12-16)
# url(r'^lists/(?P<list_id>[0-9]+)/ngrams/create$', views.NgramCreate.as_view()), #
]
from urllib.parse import urljoin
import json
import datetime
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.contrib.auth.decorators import login_required
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework.renderers import JSONRenderer
from rest_framework.exceptions import APIException
from rest_framework.authentication import SessionAuthentication, BasicAuthentication
# 2016-03-24: refactoring, new paths
from gargantext.models.ngrams import Node, NodeNgram, Ngram
from gargantext.util.db import session, aliased
from gargantext.util.db_cache import cache
from gargantext.util.http import requires_auth
# from ngram.lists import listIds, listNgramIds
# from gargantext_web.db import get_or_create_node
@requires_auth
def main(request, project_id, corpus_id, document_id):
"""
Full page view
"""
return render_to_response('annotations/main.html', {
# TODO use reverse()
'api_url': urljoin(request.get_host(), '/annotations/'),
'garg_url': request.get_host(),
'nodes_api_url': urljoin(request.get_host(), '/api/'),
}, context_instance=RequestContext(request))
class NgramList(APIView):
"""Read the lists of ngrams (terms) that will become annotations"""
renderer_classes = (JSONRenderer,)
def get(self, request, corpus_id, doc_id):
"""Get All for a doc id"""
corpus_id = int(corpus_id)
doc_id = int(doc_id)
# our results: ngrams for the corpus_id (ignoring doc_id for the moment)
doc_ngram_list = []
lists = {}
corpus_nod = cache.Node[corpus_id]
doc_nod = cache.Node[doc_id]
scores_nod = corpus_nod.children(typename="OCCURRENCES").first()
for list_type in ['MAINLIST', 'MAPLIST', 'STOPLIST']:
list_nod = corpus_nod.children(typename=list_type).first()
list_id = list_nod.id
lists["%s" % list_id] = list_type
ListsTable = aliased(NodeNgram)
# doc_nod.ngrams iff we just need the occurrences in the doc (otherwise do manually)
q = doc_nod.ngrams.join(ListsTable).filter(ListsTable.node_id == list_id)
# add to results
doc_ngram_list += [(obj.id, obj.terms, w, list_id) for (w,obj) in q.all()]
# debug
# print("annotations.views.NgramList.doc_ngram_list: ", doc_ngram_list)
data = { '%s' % corpus_id : {
'%s' % doc_id :
[
{'uuid': ngram_id,
'text': ngram_text,
'occurrences': ngram_occurrences,
'list_id': list_id,}
for (ngram_id,ngram_text,ngram_occurrences,list_id) in doc_ngram_list
],
'lists': lists
}}
# alternative format for transmitting the "annotations", sorted by list then by ngram_id
# { 'corpus_id' : {
# list_id_stop: {term_stop1: {term_data}, term_stop2: {term_data}..},
# list_id_miam: {term_miam1: {term_data}, term_miam2: {term_data}..},
# list_id_map: {term_map1: {term_data}, term_map2: {term_data}..},
# }
# 'lists' : {"list_id" : "list_type" ... }
# }
# NB 3rd possibility: unicity of ngram_text could also allow us to use it
# as key and could enhance lookup later (frequent checks if term exists)
return Response(data)
# 2016-03-24: refactoring, deactivated NgramEdit and NgramCreate
# ------------------------------------
# class NgramEdit(APIView):
# """
# Actions on one existing Ngram in one list
# """
# renderer_classes = (JSONRenderer,)
# authentication_classes = (SessionAuthentication, BasicAuthentication)
#
# def post(self, request, list_id, ngram_ids):
# """
# Edit an existing NGram in a given list
# """
# # implicit global session
# list_id = int(list_id)
# list_node = session.query(Node).filter(Node.id==list_id).first()
# # TODO add 1 for MapList social score ?
# if list_node.type_id == cache.NodeType['MiamList']:
# weight=1.0
# elif list_node.type_id == cache.NodeType['StopList']:
# weight=-1.0
#
# # TODO remove the node_ngram from another conflicting list
# for ngram_id in ngram_ids.split('+'):
# ngram_id = int(ngram_id)
# node_ngram = NodeNgram(node_id=list_id, ngram_id=ngram_id, weight=weight)
# session.add(node_ngram)
#
# session.commit()
#
# # return the response
# return Response({
# 'uuid': ngram_id,
# 'list_id': list_id,
# } for ngram_id in ngram_ids)
#
# def put(self, request, list_id, ngram_ids):
# return Response(None, 204)
#
# def delete(self, request, list_id, ngram_ids):
# """
# Delete a ngram from a list
# """
# # implicit global session
# print("to del",ngram_ids)
# for ngram_id in ngram_ids.split('+'):
# print('ngram_id', ngram_id)
# ngram_id = int(ngram_id)
# (session.query(NodeNgram)
# .filter(NodeNgram.node_id==list_id)
# .filter(NodeNgram.ngram_id==ngram_id).delete()
# )
#
# session.commit()
#
# # [ = = = = del from map-list = = = = ]
# list_id = session.query(Node).filter(Node.id==list_id).first()
# corpus = session.query(Node).filter(Node.id==list_id.parent_id , Node.type_id==cache.NodeType['Corpus'].id).first()
# node_mapList = get_or_create_node(nodetype='MapList', corpus=corpus )
# results = session.query(NodeNgram).filter(NodeNgram.node_id==node_mapList.id ).all()
# ngram_2del = [int(i) for i in ngram_ids.split('+')]
# ngram_2del_ = session.query(NodeNgram).filter(NodeNgram.node_id==node_mapList.id , NodeNgram.ngram_id.in_(ngram_2del) ).all()
# for map_node in ngram_2del_:
# session.delete(map_node)
# session.commit()
#
# node_stopList = get_or_create_node(nodetype='StopList', corpus=corpus )
# for ngram_id in ngram_2del:
# stop_node = NodeNgram( weight=1.0, ngram_id=ngram_id , node_id=node_stopList.id)
# session.add(stop_node)
# session.commit()
# # [ = = = = / del from map-list = = = = ]
#
# return Response(None, 204)
#
# class NgramCreate(APIView):
# """
# Create a new Ngram in one list
# """
# renderer_classes = (JSONRenderer,)
# authentication_classes = (SessionAuthentication, BasicAuthentication)
#
# def post(self, request, list_id):
# """
# create NGram in a given list
#
# example: request.data = {'text': 'phylogeny'}
# """
# # implicit global session
# list_id = int(list_id)
# # format the ngram's text
# ngram_text = request.data.get('text', None)
# if ngram_text is None:
# raise APIException("Could not create a new Ngram without one \
# text key in the json body")
#
# ngram_text = ngram_text.strip().lower()
# ngram_text = ' '.join(ngram_text.split())
# # check if the ngram exists with the same terms
# ngram = session.query(Ngram).filter(Ngram.terms == ngram_text).first()
# if ngram is None:
# ngram = Ngram(n=len(ngram_text.split()), terms=ngram_text)
# else:
# # make sure the n value is correct
# ngram.n = len(ngram_text.split())
#
# session.add(ngram)
# session.commit()
# ngram_id = ngram.id
# # create the new node_ngram relation
# # TODO check existing Node_Ngram ?
# # £TODO ici indexation
# node_ngram = NodeNgram(node_id=list_id, ngram_id=ngram_id, weight=1.0)
# session.add(node_ngram)
# session.commit()
#
# # return the response
# return Response({
# 'uuid': ngram_id,
# 'text': ngram_text,
# 'list_id': list_id,
# })
class Document(APIView):
"""
Read-only Document view, similar to /api/nodes/
"""
renderer_classes = (JSONRenderer,)
def get(self, request, doc_id):
"""Document by ID"""
# implicit global session
node = session.query(Node).filter(Node.id == doc_id).first()
if node is None:
raise APIException('This node does not exist', 404)
try:
pub_date = datetime.datetime.strptime(node.hyperdata.get('publication_date'),
"%Y-%m-%d %H:%M:%S")
pub_date = pub_date.strftime("%x")
except (ValueError, TypeError):
pub_date = node.hyperdata.get('publication_date')
data = {
'title': node.hyperdata.get('title'),
'authors': node.hyperdata.get('authors'),
'journal': node.hyperdata.get('journal'),
'publication_date': pub_date,
'full_text': node.hyperdata.get('full_text'),
'abstract_text': node.hyperdata.get('abstract'),
'id': node.id
}
return Response(data)
mkdocs build --clean
mkdocs serve
#!/usr/bin/env python
import sys
import os
if __name__ == "__main__":
# Django settings
dirname = os.path.dirname(os.path.realpath(__file__))
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "gargantext.settings")
# initialize Django application
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
# retrieve Django models
import django.apps
django_models = django.apps.apps.get_models()
django_models_names = set(model._meta.db_table for model in django_models)
# migrate SQLAlchemy models
from gargantext import models
from gargantext.util.db import Base, engine
sqla_models_names = (
model for model in Base.metadata.tables.keys()
if model not in django_models_names
)
sqla_models = (
Base.metadata.tables[model_name]
for model_name in sqla_models_names
)
print()
for model in sqla_models:
try:
model.create(engine)
print('created model: `%s`' % model)
except Exception as e:
print('could not create model: `%s`, %s' % (model, e))
print()
# INSTALL
## Clone the repository
(For now: git clone ssh://gitolite@delanoe.org:1979/gargantext)
Fetch the working branch:
git fetch origin refactoring
Create your own branch if you want to contribute:
git checkout -b username-refact refactoring
## Installation instructions
They are detailed in gargantext/install.
Create a default user for gargantext: gargantua
``` bash
sudo adduser --disabled-password --gecos "" gargantua
```
Create the directories needed by Gargantext
``` bash
for dir in "/srv/gargantext"
"/srv/gargantext_lib"
"/srv/gargantext_static"
"/srv/gargantext_media"
"/srv/env_3-5"; do
sudo mkdir -p $dir ;
sudo chown gargantua:gargantua $dir ;
done
```
You should have:
```bash
tree /srv
/srv
├── gargantext
├── gargantext_lib
├── gargantext_media
│   └── srv
│   └── env_3-5
├── gargantext_static
└── lost+found [error opening dir]
```
Gargantext
==========
Install Instructions for Gargantext (CNRS):
1. [SETUP](##SETUP)
2. [INSTALL](##INSTALL)
3. [RUN](##RUN)
## Help needed?
See http://gargantext.org/about and tools for the community
## SETUP
Prepare your environment
Create user gargantua
The main user of Gargantext is Gargantua (role of Pantagruel soon)!
``` bash
sudo adduser --disabled-password --gecos "" gargantua
```
Create the directories you need
``` bash
for dir in "/srv/gargantext"
"/srv/gargantext_lib"
"/srv/gargantext_static"
"/srv/gargantext_media"
"/srv/env_3-5"; do
sudo mkdir -p $dir ;
sudo chown gargantua:gargantua $dir ;
done
```
You should see:
```bash
$tree /srv
/srv
├── gargantext
├── gargantext_lib
├── gargantext_media
│   └── srv
│   └── env_3-5
├── gargantext_static
└── lost+found [error opening dir]
```
## Get the source code of Gargantext
Clone the repository of gargantext
``` bash
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
&& cd /srv/gargantext \
&& git fetch origin refactoring \
&& git checkout refactoring \
```
**Optional**: if you want to contribute, create your own branch
``` bash
git checkout -b username-refactoring refactoring
```
! TODO (soon) : git clone https://gogs.iscpif.fr/gargantext.git
## SETUP
Build your OS dependencies
There are two ways; for each you need to install Debian GNU/Linux dependencies.
1. [EASY] [Docker way](#DOCKER)
2. [EXPERT] [Debian way](#DEBIAN)
### DOCKER
* Install docker
See [installation instruction for your distribution](https://docs.docker.com/engine/installation/)
#### Build your docker image
``` bash
cd /srv/gargantext/install/docker/dev
./build
```
You should see
```
Successfully built <container_id>
```
#### Enter the docker environment
``` bash
./srv/gargantext/install/docker/enterGargantextImage
```
#### Install Python environment
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/python/configure
```
#### Configure PostgreSql
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/postgres/configure
```
#### Exit the docker
``` bash
exit
```
#### Get the main libraries
This can take a while, so be patient :)
``` bash
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& tar xvjf gargantext_lib.tar.bz2 -C /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib \
&& echo "Libs installed"
```
### DEBIAN
[EXPERTS] Debian way (directory install/debian)
## INSTALL Gargantext
### Enter docker container
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
### Inside docker container configure the database
``` bash
service postgresql start
su gargantua
source /srv/env_3-5/bin/activate
python /srv/gargantext/dbmigrate.py
/srv/gargantext/manage.py migrate
python /srv/gargantext/dbmigrate.py
python /srv/gargantext/dbmigrate.py
echo "TODO: Init first user"
```
FIXME: dbmigrate needs to be launched several times, since tables are
created in alphabetical order (not in dependency order)
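A possible fix for this FIXME, as a sketch: SQLAlchemy already knows the foreign-key graph, so iterating over `Base.metadata.sorted_tables` (topologically sorted) creates the tables in dependency order in a single pass. This assumes the same `Base` and `engine` objects from `gargantext.util.db` that dbmigrate.py uses:
``` python
# sketch: create the missing tables in dependency order, in one pass
from gargantext.util.db import Base, engine

for table in Base.metadata.sorted_tables:   # sorted by foreign-key dependencies
    table.create(engine, checkfirst=True)   # skip tables that already exist
```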
### Inside docker container launch Gargantext
``` bash
service postgresql start
su gargantua
source /srv/env_3-5/bin/activate
/srv/gargantext/manage.py runserver 0.0.0.0:8000
python /srv/gargantext/init_accounts.py /srv/gargantext/install/init/account.csv
```
## RUN
### Outside docker container launch browser
``` bash
chromium http://127.0.0.1:8000/
```
Click on Test Gargantext
Login: gargantua
Password: autnagrag
Enjoy :)
Gargantext
==========
Install Instructions for Gargantext (CNRS):
1. [SETUP](##SETUP)
2. [INSTALL](##INSTALL)
3. [RUN](##RUN)
## Help needed?
See http://gargantext.org/about and tools for the community
## SETUP
Prepare your environment
Create user gargantua
The main user of Gargantext is Gargantua (role of Pantagruel soon)!
``` bash
sudo adduser --disabled-password --gecos "" gargantua
```
Create the directories you need
``` bash
for dir in "/srv/gargantext"
"/srv/gargantext_lib"
"/srv/gargantext_static"
"/srv/gargantext_media"
"/srv/env_3-5"; do
sudo mkdir -p $dir ;
sudo chown gargantua:gargantua $dir ;
done
```
You should see:
```bash
$tree /srv
/srv
├── gargantext
├── gargantext_lib
├── gargantext_media
│   └── srv
│   └── env_3-5
├── gargantext_static
└── lost+found [error opening dir]
```
## Get the source code of Gargantext
Clone the repository of gargantext
``` bash
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
&& cd /srv/gargantext \
&& git fetch origin refactoring \
&& git checkout refactoring \
```
**Optional**: if you want to contribute, create your own branch
``` bash
git checkout -b username-refactoring refactoring
```
! TODO (soon) : git clone https://gogs.iscpif.fr/gargantext.git
## SETUP
Build your OS dependencies
There are two ways; for each you need to install Debian GNU/Linux dependencies.
1. [EASY] [Docker way](#DOCKER)
2. [EXPERT] [Debian way](#DEBIAN)
### DOCKER
* Install docker
See [installation instruction for your distribution](https://docs.docker.com/engine/installation/)
#### Build your docker image
``` bash
cd /srv/gargantext/install/docker/dev
./build
```
You should see
```
Successfully built <container_id>
```
#### Enter the docker environment
``` bash
./srv/gargantext/install/docker/enterGargantextImage
```
#### Install Python environment
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/python/configure
```
#### Configure PostgreSql
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/postgres/configure
```
#### Exit the docker
``` bash
exit
```
#### Get the main libraries
This can take a while, so be patient :)
``` bash
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& tar xvjf gargantext_lib.tar.bz2 -C /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib \
&& echo "Libs installed"
```
### DEBIAN
[EXPERTS] Debian way (directory install/debian)
## INSTALL Gargantext
### Enter docker container
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
### Inside docker container configure the database
``` bash
service postgresql start
#su gargantua
source /srv/env_3-5/bin/activate
python /srv/gargantext/dbmigrate.py
/srv/gargantext/manage.py makemigrations
/srv/gargantext/manage.py migrate
python /srv/gargantext/dbmigrate.py
#create models:
python /srv/gargantext/dbmigrate.py
#created model: `nodes_hyperdata`
echo "TODO: Init first user"
```
FIXME: dbmigrate needs to be launched several times, since tables are
created in alphabetical order (not in dependency order)
### Inside the docker container, launch Gargantext
``` bash
service postgresql start
su gargantua
source /srv/env_3-5/bin/activate
/srv/gargantext/manage.py runserver 0.0.0.0:8000
python /srv/gargantext/init_accounts.py /srv/gargantext/install/init/account.csv
```
## RUN
### Outside the docker container, launch a browser
``` bash
chromium http://127.0.0.1:8000/
```
Click on Test Gargantext
Login : gargantua
Password : autnagrag
Enjoy :)
// dot ngram_parsing_flow.dot -Tpng -o ngram_parsing_flow.png
digraph ngramflow {
edge [fontsize=10] ;
label=<<B><U>gargantext.util.toolchain</U></B><BR/>(ngram extraction flow)>;
labelloc="t" ;
"extracted_ngrams" -> "grouplist" ;
"extracted_ngrams" -> "occs+tfidfs" ;
"main_user_stoplist" -> "stoplist" ;
"stoplist" -> "mainlist" ;
"occs+tfidfs" -> "mainlist" [label=" TFIDF_LIMIT"];
"mainlist" -> "coocs" [label=" COOCS_THRESHOLD"] ;
"coocs" -> "specificity" ;
"specificity" -> "maplist" [label="MAPLIST_LIMIT\nMONOGRAM_PART"];
"maplist" -> "explore" ;
"grouplist" -> "maplist" ;
}
## Welcome to Gargantext documentation
# Contribution guide
## Community
* [http://gargantext.org/about](http://gargantext.org/about)
* IRC Chat: (OFTC/FreeNode) #gargantex
## Tools
* gogs
* server access
* gargantext box
## Gargantext
* Gargantext box install
(S.I.R. = Setup, Install & Run procedures)
* Architecture Overview
* Database Schema Overview
* Interface design Overview
## To do:
* Docs
* Interface design
* Parsers/scrapers
* Computing
## How to contribute:
1. Clone the repo
2. Create a new branch <username>-refactoring
3. Run the gargantext-box
4. Code
5. Test
6. Commit
### Example 1: Adding a parser
* create your new file cern.py in gargantext/scrapers/
* reference it in gargantext/scrapers/urls.py by adding this line:
import scrapers.cern as cern
* reference it in gargantext/constants:
```
# type 9
{ 'name': 'Cern',
'parser': CernParser,
'default_language': 'en',
},
```
* add an APIKEY in gargantext/settings (a minimal parser sketch follows below)
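For reference, here is a minimal, hypothetical sketch of what gargantext/scrapers/cern.py could look like, assuming the new parser reuses the same `Parser` base class as the existing parsers and returns a list of hyperdata dicts (the names below follow the steps above; this is not an actual implementation):
```python
# hypothetical gargantext/scrapers/cern.py
from ._Parser import Parser

class CernParser(Parser):
    def parse(self, filebuf):
        hyperdata_list = []
        # read `filebuf` and append one dict per document, e.g.:
        # hyperdata_list.append({'title': ..., 'authors': ..., 'abstract': ...})
        return hyperdata_list
```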
### Example 2: User Interface Design
* Reference your pages in mkdocs.yml
* Write each file in Markdown (GitHub flavor)
* Generate the doc with create-dic.sh: it generates a `site` folder
* [RTFM](http://www.mkdocs.org/)!
# Gargantext
Welcome to Gargantext documentation!
Install Instructions for Gargantext (CNRS):
=> Help needed ?
See [http://gargantext.org/about](http://gargantext.org/about) and [tools]() for the community
1. [SETUP](##SETUP)
2. [INSTALL](##INSTALL)
  1. with [docker](####DOCKER) [EASY]
  2. with [debian](####DEBIAN) [EXPERT]
3. [RUN](##RUN)
## SETUP
Prepare your environment
* Create the user gargantua
The main user of Gargantext is Gargantua (in the role of Pantagruel soon)!
``` bash
sudo adduser --disabled-password --gecos "" gargantua
```
* Create the directories you need
``` bash
for dir in "/srv/gargantext" \
           "/srv/gargantext_lib" \
           "/srv/gargantext_static" \
           "/srv/gargantext_media" \
           "/srv/env_3-5"; do
    sudo mkdir -p $dir
    sudo chown gargantua:gargantua $dir
done
```
You should see:
```bash
$tree /srv
/srv
├── env_3-5
├── gargantext
├── gargantext_lib
├── gargantext_media
├── gargantext_static
└── lost+found [error opening dir]
```
* Get the main libraries
``` bash
wget http://dl.gargantext.org/gargantext_lib.tar.bz2 \
&& tar xvjf gargantext_lib.tar.bz2 -C /srv/gargantext_lib \
&& sudo chown -R gargantua:gargantua /srv/gargantext_lib \
&& echo "Libs installed"
```
* Get the source code of Gargantext
by cloning the repository of gargantext
``` bash
git clone ssh://gitolite@delanoe.org:1979/gargantext /srv/gargantext \
&& cd /srv/gargantext \
&& git fetch origin refactoring \
&& git checkout refactoring
```
TODO(soon): git clone https://gogs.iscpif.fr/gargantext.git
TODO(soon): install/setup.sh
* **Optional**: if you want to contribute, create your own branch from `refactoring`
``` bash
git checkout -b username-refactoring refactoring
```
## INSTALL
Build your OS dependencies
There are two ways; both install Debian GNU/Linux dependencies.
#### DOCKER WAY [EASY]
* Install docker
See [installation instruction for your distribution](https://docs.docker.com/engine/installation/)
* Build your docker image
``` bash
cd /srv/gargantext/install/docker/dev
./build
```
You should see
```
Successfully built <container_id>
```
* Enter the docker environment
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
* Install Python environment
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/python/configure
```
* Configure PostgreSQL
Inside the docker image, execute as root:
``` bash
/srv/gargantext/install/postgres/configure
```
* Exit the docker
```
exit (or Ctrl+D)
```
#### DEBIAN WAY [EXPERT]
See the `install/debian` directory.
Install the Gargantext server
* Enter docker container
``` bash
/srv/gargantext/install/docker/enterGargantextImage
```
* Configure the database
Inside the docker container:
``` bash
service postgresql start
#su gargantua
source /srv/env_3-5/bin/activate
python /srv/gargantext/dbmigrate.py
/srv/gargantext/manage.py makemigrations
/srv/gargantext/manage.py migrate
python /srv/gargantext/dbmigrate.py
# will create the tables, except nodes_hyperdata
python /srv/gargantext/dbmigrate.py
# will create the nodes_hyperdata table
# launch the server a first time to create the first user
/srv/gargantext/manage.py runserver 0.0.0.0:8000
/srv/gargantext/init_accounts.py /srv/gargantext/install/init/account.csv
```
FIXME: dbmigrate needs to be launched several times, since tables are
created in alphabetical order (not in dependency order)
## RUN
* Launch Gargantext
Inside the docker container:
``` bash
#start postgresql
service postgresql start
#change to user
su gargantua
#activate the virtualenv
source /srv/env_3-5/bin/activate
#go to gargantext srv
cd /srv/gargantext/
#run the server
./manage.py runserver 0.0.0.0:8000
```
* Launch a browser outside the docker
``` bash
chromium http://127.0.0.1:8000/
```
* Click on Test Gargantext
```
Login : gargantua
Password : autnagrag
```
Enjoy :)
See [User Guide](./tuto.md) for quick usage example
# API
Be more careful about authorizations.
cf. "ng-resource".
# Projects
## Overview of all projects
- re-implement deletion
## Single project view
- re-implement deletion
# Taggers
Path for data used by taggers should be defined in `gargantext.constants`.
# Database
# Sharing
Here follows a brief description of how sharing could be implemented.
## Database representation
The database representation of sharing can be distributed among 4 tables:
- `persons`, of which items represent either a user or a group
- `relationships` describes the relationships between persons (affiliation
of a user to a group, contact between two users, etc.)
- `nodes` contains the projects, corpora, documents, etc. to share (they shall
inherit the sharing properties from their parents)
- `permissions` stores the relations between the three tables described above:
  it consists of 2 foreign keys, an integer representing the level of sharing,
  the start date (when the sharing was set) and the end date (when necessary,
  the time at which sharing was removed, `NULL` otherwise)
## Python code
The permission levels should be set in `gargantext.constants`, and defined as:
```python
PERMISSION_NONE = 0 # 0b0000
PERMISSION_READ = 1 # 0b0001
PERMISSION_WRITE = 3 # 0b0011
PERMISSION_OWNER = 7 # 0b0111
```
The requests to check for permissions (or to add new ones) should not be
rewritten every time. They should be "hidden" within the models (a sketch
follows the list):
- `Person.owns(node)` returns a boolean
- `Person.can_read(node)` returns a boolean
- `Person.can_write(node)` returns a boolean
- `Person.give_right(node, permission)` gives a right to a given user
- `Person.remove_right(node, permission)` removes a right from a given user
- `Person.get_nodes(permission[, type])` returns an iterator on the list of
nodes on which the person has at least the given permission (optional
argument: type of requested node)
- `Node.get_persons(permission[, type])` returns an iterator on the list of
users who have at least the given permission on the node (optional argument:
type of requested persons, such as `USER` or `GROUP`)
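As a minimal sketch, assuming an SQLAlchemy `session` and a hypothetical `Permission` model mapping the `permissions` table (the `level` column name is an assumption), such helpers could look like:
```python
class Person(Base):
    # ...
    def _permission_level(self, node):
        """Return the currently active permission level on `node`."""
        perm = (session.query(Permission)
                .filter(Permission.person_id == self.id,
                        Permission.node_id == node.id,
                        Permission.end_date == None)  # still active
                .first())
        return perm.level if perm else PERMISSION_NONE

    def can_read(self, node):
        # bitmask check: READ is included in WRITE and OWNER
        return (self._permission_level(node) & PERMISSION_READ) == PERMISSION_READ

    def can_write(self, node):
        return (self._permission_level(node) & PERMISSION_WRITE) == PERMISSION_WRITE
```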
## Example
Let's imagine the `persons` table contains the following data:
| id | type | username |
|----|-------|-----------|
| 1 | USER | David |
| 2 | GROUP | C.N.R.S. |
| 3 | USER | Alexandre |
| 4 | USER | Untel |
| 5 | GROUP | I.S.C. |
| 6 | USER | Bidule |
Assume "David" owns the groups "C.N.R.S." and "I.S.C.", "Alexandre" belongs to
the group "I.S.C.", with "Untel" and "Bidule" belonging to the group "C.N.R.S.".
"Alexandre" and "David" are in contact.
The `relationships` table then contains:
| person1_id | person2_id | type |
|------------|------------|---------|
| 1 | 2 | OWNER |
| 1 | 5 | OWNER |
| 3 | 2 | MEMBER |
| 4 | 5 | MEMBER |
| 6 | 5 | MEMBER |
| 1 | 3 | CONTACT |
The `nodes` table is populated as such:
| id | type | name |
|----|----------|----------------------|
| 12 | PROJECT | My super project |
| 13 | CORPUS | The corpus |
| 14 | DOCUMENT | Some document |
| 15 | DOCUMENT | Another document |
| 16 | DOCUMENT | Yet another document |
| 17 | DOCUMENT | Last document |
| 18 | PROJECT | Another project |
| 19 | PROJECT | That project |
If we want to express that "David" created "My super project" (and its children)
and wants everyone in "C.N.R.S." to be able to view it, but not access it,
`permissions` should contain:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
If "David" also wanted "Alexandre" (and no one else) to view and modify "The
corpus" (and its children), we would have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
| 3 | 13 | WRITE |
If "Alexandre" created "That project" and wants "Bidule" (and no one else) to be
able to view and modify it (and its children), the table should then have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 3 | 19 | OWNER |
| 6 | 19 | WRITE |
# User guide
1. Login
run the Gargantext box following the install procedure
open a web browser at http://127.0.0.1:8000/
click on Test Gargantext
log in with:
```
Login : gargantua
Password : autnagrag
```
2. Create a project
3. Import an existing corpus
4. Create corpus from search
5. Explore stats
6. Explore graphs
7. Query
8. Refine
* Time periods
* Nodes
9. Export
# django.ini file
[uwsgi]
# uwsgi --vacuum --socket monsite/mysite.sock --wsgi-file monsite/wsgi.py --chmod-socket=666 --home=/srv/alexandre.delanoe/env --chdir=/var/www/www/alexandre/monsite --env
env = DJANGO_SETTINGS_MODULE=gargantext.settings
#module = django.core.handlers.wsgi:WSGIHandler()
plugins = python35
# the base directory
chdir = /srv/gargantext
# Django's wsgi file
#module = wsgi
wsgi-file = /srv/gargantext/gargantext/wsgi.py
# the virtualenv
home = /srv/env_3-5
lazy-apps = True
# master
master = true
# maximum number of processes
processes = 10
# the socket (use the full path to be safe)
socket = /tmp/gargantext.sock
threads = 4
# with appropriate permissions - *may* be needed
chmod-socket = 666
# clear environment on exit
vacuum = true
pidfile = /tmp/gargantext.pid
# touch /tmp/gargantext.reload to reload configuration (after git pull for instance)
touch-reload = /tmp/gargantext.reload
# respawn processes taking more than 120 seconds
harakiri = 120
# limit the project to 128 MB
#limit-as = 128
# respawn processes after serving 5000 requests
max-requests = 5000
# background the process & log
#daemonize = /var/log/uwsgi/gargantext.log
uid = 1000
gid = 1000
# WARNING: to ensure consistency and retrocompatibility, lists should keep the
# initial order (ie., new elements should be appended at the end of the lists)
from gargantext.util.lists import *
from gargantext.util.tools import datetime, convert_to_date
import re
LISTTYPES = {
'DOCUMENT' : WeightedList,
'GROUPLIST' : Translations,
'STOPLIST' : UnweightedList,
'MAINLIST' : UnweightedList,
'MAPLIST' : UnweightedList,
'SPECIFICITY' : WeightedList,
'OCCURRENCES' : WeightedContextIndex,
'COOCCURRENCES': WeightedMatrix,
'TFIDF-CORPUS' : WeightedContextIndex,
'TFIDF-GLOBAL' : WeightedContextIndex,
}
NODETYPES = [
None,
# documents hierarchy
'USER', # 1
'PROJECT', # 2
'CORPUS', # 3
'DOCUMENT', # 4
# lists
'STOPLIST', # 5
'GROUPLIST', # 6
'MAINLIST', # 7
'MAPLIST', # 8
'COOCCURRENCES', # 9
# scores
'OCCURRENCES', # 10
'SPECIFICITY', # 11
'CVALUE', # 12
'TFIDF-CORPUS', # 13
'TFIDF-GLOBAL', # 14
# docs subset
'FAVORITES' # 15
]
INDEXED_HYPERDATA = {
# TODO use properties during toolchain.hyperdata_indexing
# (type, convert_to_db, convert_from_db)
'count':
{ 'id' : 1
, 'type' : int
, 'convert_to_db' : int
, 'convert_from_db': int
},
'publication_date':
{ 'id' : 2
, 'type' : datetime.datetime
, 'convert_to_db' : convert_to_date
, 'convert_from_db': datetime.datetime.fromtimestamp
},
'title':
{ 'id' : 3
, 'type' : str
, 'convert_to_db' : str
, 'convert_from_db': str
},
'authors':
{ 'id' : 4
, 'type' : str
, 'convert_to_db' : str
, 'convert_from_db': str
},
'journal':
{ 'id' : 5
, 'type' : str
, 'convert_to_db' : str
, 'convert_from_db': str
},
'abstract':
{ 'id' : 6
, 'type' : str
, 'convert_to_db' : str
, 'convert_from_db': str
},
# 'text':
# { 'id' : 7
# , 'type' : str
# , 'convert_to_db' : str
# , 'convert_from_db': str
# },
#
# 'page':
# { 'id' : 8
# , 'type' : int
# , 'convert_to_db' : int
# , 'convert_from_db': int
# },
}
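# e.g., given the mapping above:
#   INDEXED_HYPERDATA['title']['id'] == 3
#   INDEXED_HYPERDATA['publication_date']['convert_from_db'] turns the stored
#   timestamp back into a datetime (datetime.datetime.fromtimestamp)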
from gargantext.util.taggers import FrenchMeltTagger, TurboTagger
LANGUAGES = {
'en': {
#'tagger': EnglishMeltTagger,
'tagger': TurboTagger,
#'tagger': NltkTagger,
},
'fr': {
'tagger': FrenchMeltTagger,
# 'tagger': TreeTagger,
},
}
from gargantext.util.parsers import \
EuropressParser, RISParser, PubmedParser, ISIParser, CSVParser, ISTexParser
def resourcetype(name):
'''
resourcetype :: String -> Int
Usage : resourcetype("Europress (English)") == 1
Examples in scrapers scripts (Pubmed or ISTex for instance).
'''
return [i for i, r in enumerate(RESOURCETYPES) if r['name'] == name][0]
def resourcename(corpus):
'''
resourcename :: Corpus -> String
Usage : resourcename(corpus) == "ISTex"
'''
resource = corpus.resources()[0]
resourcename = RESOURCETYPES[resource['type']]['name']
return re.sub(r'\(.*', '', resourcename)
RESOURCETYPES = [
# type 0
{ 'name': 'Select database below',
'parser': None,
'default_language': None,
},
# type 1
{ 'name': 'Europress (English)',
'parser': EuropressParser,
'default_language': 'en',
},
# type 2
{ 'name': 'Europress (French)',
'parser': EuropressParser,
'default_language': 'fr',
},
# type 3
{ 'name': 'Jstor (RIS format)',
'parser': RISParser,
'default_language': 'en',
},
# type 4
{ 'name': 'Pubmed (XML format)',
'parser': PubmedParser,
'default_language': 'en',
},
# type 5
{ 'name': 'Scopus (RIS format)',
'parser': RISParser,
'default_language': 'en',
},
# type 6
{ 'name': 'Web of Science (ISI format)',
'parser': ISIParser,
'default_language': 'en',
},
# type 7
{ 'name': 'Zotero (RIS format)',
'parser': RISParser,
'default_language': 'en',
},
# type 8
{ 'name': 'CSV',
'parser': CSVParser,
'default_language': 'en',
},
# type 9
{ 'name': 'ISTex',
'parser': ISTexParser,
'default_language': 'en',
},
]
# linguistic extraction parameters ---------------------------------------------
DEFAULT_TFIDF_CUTOFF_RATIO = .75 # MAINLIST maximum terms in %
DEFAULT_TFIDF_HARD_LIMIT = 3000 # MAINLIST maximum terms abs
# (makes COOCS larger ~ O(N²) /!\)
DEFAULT_COOC_THRESHOLD = 2 # inclusive minimum for COOCS coefs
# (makes COOCS more sparse)
DEFAULT_MAPLIST_MAX = 350 # MAPLIST maximum terms
DEFAULT_MAPLIST_MONOGRAMS_RATIO = .05 # part of monograms in MAPLIST
DEFAULT_MAX_NGRAM_LEN = 7 # limit used after POStagging rule
# (initial ngrams number is a power law of this /!\)
# (and most longer ngrams have tiny freq anyway)
DEFAULT_ALL_LOWERCASE_FLAG = True # lowercase ngrams before recording
# them to their DB table
# (potentially bad for acronyms but
# good for variants like same term
#  occurring at sentence beginning)
# ------------------------------------------------------------------------------
# other parameters
# default number of docs POSTed to scrappers.views.py
# (at page project > add a corpus > scan/process sample)
QUERY_SIZE_N_DEFAULT = 1000
import os
from .settings import BASE_DIR
# uploads/.gitignore prevents corpora indexing
# corpora can be either a folder or a symlink towards a specific partition
UPLOAD_DIRECTORY = os.path.join(BASE_DIR, 'uploads/corpora')
UPLOAD_LIMIT = 1024 * 1024 * 1024
DOWNLOAD_DIRECTORY = UPLOAD_DIRECTORY
# about batch processing...
BATCH_PARSING_SIZE = 256
BATCH_NGRAMSEXTRACTION_SIZE = 1024
# Scrapers config
QUERY_SIZE_N_MAX = 1000
QUERY_SIZE_N_DEFAULT = 1000
# Grammar rules for chunking
RULE_JJNN = "{<JJ.*>*<NN.*|>+<JJ.*>*}"
RULE_JJDTNN = "{<JJ.*>*<NN.*>+((<P|IN> <DT>? <JJ.*>* <NN.*>+ <JJ.*>*)|(<JJ.*>))*}"
RULE_TINA = "^((VBD,|VBG,|VBN,|CD.?,|JJ.?,|\?,){0,2}?(N.?.?,|\?,)+?(CD.,)??)\
+?((PREP.?|DET.?,|IN.?,|CC.?,|\?,)((VBD,|VBG,|VBN,|CD.?,|JJ.?,|\?\
,){0,2}?(N.?.?,|\?,)+?)+?)*?$"
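# e.g., RULE_JJNN captures adjective/noun chunks such as
#   [('wild', 'JJ'), ('pollinators', 'NNS')]
# from POS-tagged tokens (these rules are fed to nltk.RegexpParser
# by the ngram extractors)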
from .nodes import *
from .hyperdata import *
from .users import *
from .ngrams import *
from gargantext.util.db import *
from gargantext.constants import INDEXED_HYPERDATA
from .nodes import Node
import datetime
__all__ = ['NodeHyperdata', 'HyperdataKey']
class classproperty(object):
"""See: http://stackoverflow.com/a/3203659/734335
"""
def __init__(self, getter):
self.getter = getter
def __get__(self, instance, owner):
return self.getter(owner)
class HyperdataValueComparer(object):
"""This class is there to allow hyperdata comparison.
Its attributes are overridden at the end of the present module to fit those
of the `value_flt` and `value_str` attributes of the `NodeHyperdata` class.
"""
class HyperdataKey(TypeDecorator):
"""Define a new type of column to describe a Hyperdata field's type.
Internally, this column type is implemented as an SQL integer.
Values are detailed in `gargantext.constants.INDEXED_HYPERDATA`.
"""
impl = Integer
def process_bind_param(self, keyname, dialect):
if keyname in INDEXED_HYPERDATA:
return INDEXED_HYPERDATA[keyname]['id']
raise ValueError('Hyperdata key "%s" was not found in `gargantext.constants.INDEXED_HYPERDATA`' % keyname)
def process_result_value(self, keyindex, dialect):
for keyname, keysubhash in INDEXED_HYPERDATA.items():
if keysubhash['id'] == keyindex:
return keyname
raise ValueError('Hyperdata key with id=%d was not found in `gargantext.constants.INDEXED_HYPERDATA`' % keyindex)
class NodeHyperdata(Base):
"""This model's primary role is to allow better indexation of hyperdata.
It stores values contained in the `nodes.hyperdata` column (only those
listed in `gargantext.constants.INDEXED_HYPERDATA`), associated with the
corresponding key's index, and hyperdata value.
Example:
query = (session
.query(Node)
.join(NodeHyperdata)
.filter(NodeHyperdata.key == 'title')
.filter(NodeHyperdata.value.startswith('Bees'))
)
Example:
query = (session
.query(Node)
.join(NodeHyperdata)
.filter(NodeHyperdata.key == 'publication_date')
.filter(NodeHyperdata.value > datetime.datetime.now())
)
"""
__tablename__ = 'nodes_hyperdata'
id = Column( Integer, primary_key=True )
node_id = Column( Integer, ForeignKey(Node.id, ondelete='CASCADE'))
key = Column( HyperdataKey )
value_int = Column( Integer , index=True )
value_flt = Column( Double() , index=True )
value_utc = Column( DateTime(timezone=True) , index=True )
value_str = Column( String(255) , index=True )
value_txt = Column( Text , index=False )
def __init__(self, node=None, key=None, value=None):
"""Custom constructor
"""
# node reference
if node is not None:
if hasattr(node, 'id'):
self.node_id = node.id
else:
self.node_id = node
# key
if key is not None:
self.key = key
# value
self.value = value
# FIXME
@property
def value(self):
"""Pseudo-attribute used to extract the value in the right format.
"""
key = INDEXED_HYPERDATA[self.key]
return key['convert_from_db'](
self.value_flt if (self.value_str is None) else self.value_str
)
@value.setter
def value(self, value):
"""Pseudo-attribute used to insert the value in the right format.
"""
key = INDEXED_HYPERDATA[self.key]
value = key['convert_to_db'](value)
if isinstance(value, str):
self.value_str = value
else:
self.value_flt = value
@classproperty
def value(cls):
"""Pseudo-attribute used for hyperdata comparison inside a query.
"""
return HyperdataValueComparer()
def HyperdataValueComparer_overrider(key):
def comparator(self, *args):
if len(args) == 0:
return
if isinstance(args[0], datetime.datetime):
args = tuple(map(datetime.datetime.timestamp, args))
if isinstance(args[0], (int, float)):
return getattr(NodeHyperdata.value_flt, key)(*args)
if isinstance(args[0], str):
return getattr(NodeHyperdata.value_str, key)(*args)
return comparator
# copy the comparison/operator methods of `value_flt` and `value_str` onto
# HyperdataValueComparer, skipping construction and attribute plumbing
for key in set(dir(NodeHyperdata.value_flt) + dir(NodeHyperdata.value_str)):
if key in ( '__dict__'
, '__weakref__'
, '__repr__'
, '__str__') \
or 'attr' in key \
or 'class' in key \
or 'init' in key \
or 'new' in key :
continue
setattr(HyperdataValueComparer, key, HyperdataValueComparer_overrider(key))
from gargantext.util.db import *
from .nodes import Node
__all__ = ['Ngram', 'NodeNgram', 'NodeNodeNgram', 'NodeNgramNgram']
class Ngram(Base):
__tablename__ = 'ngrams'
id = Column(Integer, primary_key=True)
terms = Column(String(255), unique=True)
n = Column(Integer)
class NodeNgram(Base):
__tablename__ = 'nodes_ngrams'
node_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
ngram_id = Column(Integer, ForeignKey(Ngram.id, ondelete='CASCADE'), primary_key=True)
weight = Column(Float)
class NodeNodeNgram(Base):
""" for instance for TFIDF
(
doc ::Node ,
corpus ::Node ,
word ::Ngram ,
tfidf of ngram in doc in corpus ::Float (real)
)
"""
__tablename__ = 'nodes_nodes_ngrams'
node1_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
node2_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
ngram_id = Column(Integer, ForeignKey(Ngram.id, ondelete='CASCADE'), primary_key=True)
score = Column(Float(precision=24))
# max precision of 24 bits for an SQL "real" type (i.e. 7 digits after the decimal point)
# otherwise we would get an SQL "double_precision" type by default (15 digits)
# (cf. www.postgresql.org/docs/9.4/static/datatype-numeric.html#DATATYPE-FLOAT)
class NodeNgramNgram(Base):
""" for instance for COOCCURRENCES and GROUPLIST
(
cooc_node/group_node ::Node ,
term_A ::Ngram ,
term_B ::Ngram ,
weight ::Float (real)
)
"""
__tablename__ = 'nodes_ngrams_ngrams'
node_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
ngram1_id = Column(Integer, ForeignKey(Ngram.id, ondelete='CASCADE'), primary_key=True)
ngram2_id = Column(Integer, ForeignKey(Ngram.id, ondelete='CASCADE'), primary_key=True)
weight = Column(Float(precision=24)) # see comment for NodeNodeNgram.score
from gargantext.util.db import *
from gargantext.util.files import upload
from gargantext.constants import *
from datetime import datetime
from .users import User
__all__ = ['Node', 'NodeNode']
class NodeType(TypeDecorator):
"""Define a new type of column to describe a Node's type.
Internally, this column type is implemented as an SQL integer.
Values are detailed in `gargantext.constants.NODETYPES`.
"""
impl = Integer
def process_bind_param(self, typename, dialect):
return NODETYPES.index(typename)
def process_result_value(self, typeindex, dialect):
return NODETYPES[typeindex]
class Node(Base):
"""This model can fit many purposes.
It intends to provide a generic model, allowing hierarchical structure
and NoSQL-like data structuring.
The possible types are defined in `gargantext.constants.NODETYPES`.
"""
__tablename__ = 'nodes'
id = Column(Integer, primary_key=True)
typename = Column(NodeType, index=True)
# foreign keys
user_id = Column(Integer, ForeignKey(User.id, ondelete='CASCADE'))
parent_id = Column(Integer, ForeignKey('nodes.id', ondelete='CASCADE'))
# main data
name = Column(String(255))
date = Column(DateTime(), default=datetime.now)
# metadata (see https://bashelton.com/2014/03/updating-postgresql-json-fields-via-sqlalchemy/)
hyperdata = Column(JSONB, default=dict)
def __init__(self, **kwargs):
"""Node's constructor.
Initialize the `hyperdata` as a dictionary if no value was given.
"""
if 'hyperdata' not in kwargs:
kwargs['hyperdata'] = MutableDict()
Base.__init__(self, **kwargs)
def __getitem__(self, key):
"""Allow direct access to hyperdata via the bracket operator.
"""
return self.hyperdata[key]
def __setitem__(self, key, value):
"""Allow direct access to hyperdata via the bracket operator.
"""
self.hyperdata[key] = value
@property
def ngrams(self):
"""Pseudo-attribute allowing to retrieve a node's ngrams.
Returns a query (which can be further filtered), of which returned rows
are the ngram's weight for this node and the ngram.
"""
from . import NodeNgram, Ngram
query = (session
.query(NodeNgram.weight, Ngram)
.select_from(NodeNgram)
.join(Ngram)
.filter(NodeNgram.node_id == self.id)
)
return query
def as_list(self):
"""Retrieve the current node as a list/matrix of ngrams identifiers.
See `gargantext.util.lists` and `gargantext.constants.LISTTYPES`
for more info.
"""
try:
return LISTTYPES[self.typename](self.id)
except KeyError:
raise ValueError('This node\'s typename is not convertible to a list: %s (accepted values: %s)' % (
self.typename,
', '.join(LISTTYPES.keys())
))
def save_hyperdata(self):
"""This is a necessary, yet ugly trick.
Indeed, PostgreSQL does not yet manage incremental updates (see
https://bashelton.com/2014/03/updating-postgresql-json-fields-via-sqlalchemy/)
"""
from sqlalchemy.orm.attributes import flag_modified
flag_modified(self, 'hyperdata')
# # previous trick (even super-uglier)
# hyperdata = self.hyperdata
# self.hyperdata = None
# session.add(self)
# session.commit()
# self.hyperdata = hyperdata
# session.add(self)
# session.commit()
def children(self, typename=None, order=None):
"""Return a query to all the direct children of the current node.
Allows filtering by typename (see `constants.py`)
"""
query = session.query(Node).filter(Node.parent_id == self.id)
if typename is not None:
query = query.filter(Node.typename == typename)
if order is not None:
query = query.order_by(Node.name)
return query
def add_child(self, **kwargs):
"""Create and return a new direct child of the current node.
"""
return Node(
user_id = self.user_id,
parent_id = self.id,
**kwargs
)
def resources(self):
"""Return all the resources attached to a given node.
Mainly used for corpora.
example:
[{'extracted': True,
'path': '/home/me/gargantext/uploads/corpora/0c/0c5b/0c5b50/0c5b50ad8ebdeb2ae33d8e54141a52ee_Corpus_Europresse-Français-2015-12-11.zip',
'type': 1,
'url': None}]
"""
if 'resources' not in self.hyperdata:
self['resources'] = MutableList()
return self['resources']
def add_resource(self, type, path=None, url=None):
"""Attach a resource to a given node.
Mainly used for corpora.
this just adds metadata to the CORPUS node (NOT for adding documents)
example:
{'extracted': True,
'path': '/home/me/gargantext/uploads/corpora/0c/0c5b/0c5b50/0c5b50ad8ebdeb2ae33d8e54141a52ee_Corpus_Europresse-Français-2015-12-11.zip',
'type': 1,
'url': None}
"""
self.resources().append(MutableDict(
{'type': type, 'path':path, 'url':url, 'extracted': False}
))
def status(self, action=None, progress=0, complete=False, error=None):
"""Get or update the status of the given action.
If no action is given, the first incomplete status (or, failing that,
the last one) is returned.
The `complete` parameter should be a boolean.
The `error` parameter should be an exception.
"""
date = datetime.now()
# if the hyperdata do not have data about status
if 'statuses' not in self.hyperdata:
self['statuses'] = MutableList()
# if no action name is given, return the last appended status
if action is None:
for status in self['statuses']:
if not status['complete']:
return status
if len(self['statuses']):
return self['statuses'][-1]
return None
# retrieve the status for the given action name
for status in self['statuses']:
if status['action'] == action:
if error:
status['error'] = error
if progress:
status['progress'] = progress
if complete:
status['complete'] = complete
if error or progress or complete:
status['date'] = date
return status
# if no status has been found for the action, append a new one
self['statuses'].append(MutableDict(
{'action':action, 'progress':progress, 'complete':complete, 'error':error, 'date':date}
))
return self['statuses'][-1]
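# Hypothetical usage of Node.status():
#   corpus.status('parsing', progress=50)    # update the progress of an action
#   corpus.status('parsing', complete=True)  # mark the action as complete
#   corpus.status()                          # first incomplete (or last) status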
class NodeNode(Base):
__tablename__ = 'nodes_nodes'
node1_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
node2_id = Column(Integer, ForeignKey(Node.id, ondelete='CASCADE'), primary_key=True)
score = Column(Float(precision=24))
from django.contrib.auth import models
from gargantext.util.db import *
from datetime import datetime
__all__ = ['User']
class User(Base):
# The properties below are a reflection of Django's auth module's models.
__tablename__ = models.User._meta.db_table
id = Column(Integer, primary_key=True)
password = Column(String(128))
is_superuser = Column(Boolean(), default=False)
is_staff = Column(Boolean(), default=False)
username = Column(String(30))
first_name = Column(String(30), default="")
last_name = Column(String(30), default="")
email = Column(String(75))
is_active = Column(Boolean(), default=True)
last_login = Column(DateTime(timezone=False))
date_joined = Column(DateTime(timezone=False), default=datetime.now)
def contacts(self):
"""get all contacts in relation with the user"""
Friend = aliased(User)
query = (session
.query(Friend)
.join(Contact, Contact.user2_id == Friend.id)
.filter(Contact.user1_id == self.id)
)
return query.all()
def nodes(self, typename=None, order=None):
"""get all nodes belonging to the user"""
from .nodes import Node
query = (session
.query(Node)
.filter(Node.user_id == self.id)
)
if typename is not None:
query = query.filter(Node.typename == typename)
if order is not None:
query = query.order_by(Node.name)
return query
def contacts_nodes(self, typename=None):
for contact in self.contacts():
contact_nodes = (session
.query(Node)
.filter(Node.user_id == contact.id)
.filter(Node.typename == typename)
.order_by(Node.date)
).all()
yield contact, contact_nodes
def owns(self, node):
"""check if a given node is owned by the user"""
return (node.user_id == self.id) or \
node.id in (contact.id for contact in self.contacts())
class Contact(Base):
__tablename__ = 'contacts'
id = Column(Integer, primary_key=True)
user1_id = Column(Integer, primary_key=True)
user2_id = Column(Integer, primary_key=True)
is_blocked = Column(Boolean(), default=False)
date_creation = Column(DateTime(timezone=False))
__table_args__ = (UniqueConstraint('user1_id', 'user2_id'), )
"""URL Configuration of GarganText
Views are shared between these modules:
- `api`, for JSON and CSV interaction with data
- `pages`, to present HTML views to the user
- `contents`, for Python-generated contents
- `annotations`, to annotate local context of a corpus (as global context)
- `graph explorer`, to explore graphs
"""
from django.conf.urls import include, url
from django.contrib import admin
from django.views.generic.base import RedirectView as Redirect
from django.contrib.staticfiles.storage import staticfiles_storage as static
import gargantext.views.api.urls
import gargantext.views.pages.urls
# Module Annotation
## tempo: unchanged doc-annotations --
from annotations import urls as annotations_urls
from annotations.views import main as annotations_main_view
# Module "Graph Explorer"
#from graphExplorer import urls as graphExplorer_urls
import graphExplorer.urls
# Module Scrapers
import moissonneurs.urls
urlpatterns = [ url(r'^admin/' , admin.site.urls )
, url(r'^api/' , include( gargantext.views.api.urls ) )
, url(r'^' , include( gargantext.views.pages.urls ) )
, url(r'^favicon.ico$', Redirect.as_view( url=static.url('favicon.ico')
, permanent=False), name="favicon")
# Module "Graph Explorer"
, url(r'^' , include( graphExplorer.urls ) )
# Module Annotation
# tempo: unchanged doc-annotations routes --
, url(r'^annotations/', include( annotations_urls ) )
, url(r'^projects/(\d+)/corpora/(\d+)/documents/(\d+)/$', annotations_main_view)
# Module Scrapers (Moissonneurs in French)
, url(r'^moissonneurs/' , include( moissonneurs.urls ) )
]
from gargantext import settings
from gargantext.util.json import json_dumps
# get engine, session, etc.
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import delete
def get_engine():
from sqlalchemy import create_engine
url = 'postgresql+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{NAME}'.format(
**settings.DATABASES['default']
)
return create_engine( url
, use_native_hstore = True
, json_serializer = json_dumps
, pool_size=20, max_overflow=0
)
engine = get_engine()
Base = declarative_base()
session = scoped_session(sessionmaker(bind=engine))
# tools to build models
from sqlalchemy.types import *
from sqlalchemy.schema import Column, ForeignKey, UniqueConstraint
from sqlalchemy.dialects.postgresql import JSONB, DOUBLE_PRECISION
from sqlalchemy.ext.mutable import MutableDict, MutableList
Double = DOUBLE_PRECISION
# useful for queries
from sqlalchemy.orm import aliased
from sqlalchemy import func, desc
# bulk insertions
import psycopg2
def get_cursor():
db_settings = settings.DATABASES['default']
db = psycopg2.connect(**{
'database': db_settings['NAME'],
'user': db_settings['USER'],
'password': db_settings['PASSWORD'],
'host': db_settings['HOST'],
'port': db_settings['PORT']
})
return db, db.cursor()
class bulk_insert:
def __init__(self, table, fields, data, cursor=None):
# prepare the iterator
self.iter = iter(data)
# prepare the cursor
if cursor is None:
db, cursor = get_cursor()
mustcommit = True
else:
mustcommit = False
# insert data
if not isinstance(table, str):
table = table.__tablename__
cursor.copy_from(self, table, columns=fields)
# commit if necessary
if mustcommit:
db.commit()
def read(self, size=None):
# see http://www.postgresql.org/docs/9.4/static/sql-copy.html#AEN72054
try:
return '\t'.join(
value.replace('\\', '\\\\').replace('\n', '\\\n').replace('\r', '\\\r').replace('\t', '\\\t')
if isinstance(value, str) else str(value) if value is not None else '\\N'
for value in next(self.iter)
) + '\n'
except StopIteration:
return ''
readline = read
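# Hypothetical usage: stream rows into the `ngrams` table through COPY FROM
#   bulk_insert(Ngram, ('id', 'terms', 'n'),
#               ((1, 'bee', 1), (2, 'wild pollinators', 2)))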
def bulk_insert_ifnotexists(model, uniquekey, fields, data, cursor=None):
if cursor is None:
db, cursor = get_cursor()
mustcommit = True
else:
mustcommit = False
# create temporary table with given data
sql_columns = 'id INTEGER'
for field in fields:
column = getattr(model, field)
sql_columns += ', %s %s' % (field, column.type, )
cursor.execute('CREATE TEMPORARY TABLE __tmp__ (%s)' % (sql_columns, ))
bulk_insert('__tmp__', fields, data, cursor=cursor)
# update ids of the temporary table
cursor.execute('''
UPDATE __tmp__
SET id = source.id
FROM {sourcetable} AS source
WHERE __tmp__.{uniquecolumn} = source.{uniquecolumn}
'''.format(
sourcetable = model.__tablename__,
uniquecolumn = uniquekey,
))
# insert what has not been found to the real table
cursor.execute('''
INSERT INTO {sourcetable} ({columns})
SELECT {columns}
FROM __tmp__
WHERE id IS NULL
'''.format(
sourcetable = model.__tablename__,
columns = ', '.join(fields),
))
# retrieve dict associating unique key to id
cursor.execute('''
SELECT source.id, source.{uniquecolumn}
FROM {sourcetable} AS source
INNER JOIN __tmp__ ON __tmp__.{uniquecolumn} = source.{uniquecolumn}
'''.format(
sourcetable = model.__tablename__,
uniquecolumn = uniquekey,
columns = ', '.join(fields),
))
result = {
row[1]: row[0] for row in cursor.fetchall()
}
# this is the end!
cursor.execute('DROP TABLE __tmp__')
if mustcommit:
db.commit()
return result
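# Hypothetical usage: insert only the missing ngrams, and get back a dict
# mapping each unique key to its id:
#   terms2id = bulk_insert_ifnotexists(Ngram, 'terms', ('terms', 'n'),
#                                      (('bee', 1), ('wild pollinators', 2)))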
"""Cache management
Allows retrieval of an instance from the value of one of its primary or unique
keys, without querying the database.
"""
from sqlalchemy import or_
from gargantext.util.db import *
from gargantext import models
class ModelCache(dict):
def __init__(self, model, preload=False):
self._model = model
self._columns = [column for column in model.__table__.columns if column.unique or column.primary_key]
self._columns_names = [column.name for column in self._columns]
if preload:
self.preload()
def __missing__(self, key):
formatted_key = None
conditions = []
for column in self._columns:
try:
formatted_key = column.type.python_type(key)
conditions.append(column == key)
except ValueError as e:
continue
if formatted_key in self:
self[key] = self[formatted_key]
else:
element = session.query(self._model).filter(or_(*conditions)).first()
if element is None:
raise KeyError
self[key] = element
return element
def preload(self):
self.clear()
for element in session.query(self._model).all():
for column_name in self._columns_names:
key = getattr(element, column_name)
self[key] = element
class Cache:
def __getattr__(self, key):
try:
model = getattr(models, key)
except AttributeError:
raise AttributeError('No such model: `%s`' % key)
modelcache = ModelCache(model)
setattr(self, key, modelcache)
return modelcache
cache = Cache()
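# e.g., retrieve an instance by any of its unique or primary keys, hitting
# the database at most once per key:
#   ngram = cache.Ngram['wild pollinators']  # lookup by unique `terms`
#   node = cache.Node[42]                    # lookup by primary key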
import hashlib
import binascii
def digest(value, algorithm='md5'):
m = hashlib.new(algorithm)
m.update(value)
return m.digest()
def str_digest(value, algorithm='md5'):
return binascii.hexlify(digest(value, algorithm)).decode()
from gargantext.constants import *
from gargantext.util.digest import str_digest
from gargantext.util import http
def save(contents, name='', basedir=''):
digest = str_digest(contents[:4096] + contents[-4096:])
path = basedir
for i in range(2, 8, 2):
path += '/' + digest[:i]
if not os.path.exists(path):
os.makedirs(path)
# save file and return its path
path = '%s/%s_%s' % (path, digest, name, )
open(path, 'wb').write(contents)
return path
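# e.g., a file whose digest is '0c5b50ad8ebdeb2ae33d8e54141a52ee' is saved as
#   <basedir>/0c/0c5b/0c5b50/0c5b50ad8ebdeb2ae33d8e54141a52ee_<name>
# (compare the resource path example in the Node model)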
def download(url, name=''):
return save(
contents = http.get(url),
name = name,
basedir = DOWNLOAD_DIRECTORY,
)
def upload(uploaded):
if uploaded.size > UPLOAD_LIMIT:
raise IOError('Uploaded file is bigger than allowed: %d > %d' % (
uploaded.size,
UPLOAD_LIMIT,
))
return save(
contents = uploaded.file.read(),
name = uploaded.name,
basedir = UPLOAD_DIRECTORY,
)
# -*- coding: utf-8 -*-
# Order in script: Alphabetical order (first_name, name, mail, website)
# Order in public: Shuffled order
import random
_members = [
{ 'first_name' : 'David', 'last_name' : 'Chavalarias',
'mail' : 'david.chavalariasATiscpif.fr',
'website' : 'http://chavalarias.com',
'picture' : 'david.jpg',
'role':'principal investigator'},
# { 'first_name' : 'Elias', 'last_name' : 'Showk',
# 'mail' : '',
# 'website' : 'https://github.com/elishowk',
# 'picture' : '', 'role' : 'developer'},
{ 'first_name' : 'Mathieu', 'last_name' : 'Rodic',
'mail' : '',
'website' : 'http://rodic.fr',
'picture' : 'mathieu.jpg',
'role' : 'developer'},
{ 'first_name' : 'Samuel', 'last_name' : 'Castillo J.',
'mail' : 'kaisleanATgmail.com',
'website' : 'http://www.pksm3.droppages.com',
'picture' : 'samuel.jpg',
'role' : 'developer'},
{ 'first_name' : 'Maziyar', 'last_name' : 'Panahi',
'mail' : '',
'website' : 'http://iscpif.fr',
'picture' : 'maziyar.jpg',
'role' : 'developer'},
{ 'first_name' : 'Romain', 'last_name' : 'Loth',
'mail' : '',
'website' : 'http://iscpif.fr',
'picture' : 'romain.jpg',
'role' : 'developer'},
{ 'first_name' : 'Alexandre', 'last_name' : 'Delanoë',
'mail' : 'alexandre+gargantextATdelanoe.org',
'website' : 'http://alexandre.delanoe.org',
'picture' : 'alexandre.jpg',
'role' : 'project manager'},
#{ 'first_name' : '', 'name' : '', 'mail' : '', 'website' : '', 'picture' : ''},
# copy-paste the line above and write your informations please
]
_institutions = [
{ 'name' : 'Mines ParisTech', 'website' : 'http://mines-paristech.fr', 'picture' : 'mines.png', 'funds':''},
{ 'name' : 'Institut Pasteur', 'website' : 'http://www.pasteur.fr', 'picture' : 'pasteur.png', 'funds':''},
{ 'name' : 'ADEME', 'website' : 'http://www.ademe.fr', 'picture' : 'ademe.png', 'funds':''},
{ 'name' : 'EHESS', 'website' : 'http://www.ehess.fr', 'picture' : 'ehess.png', 'funds':''},
#{ 'name' : '', 'website' : '', 'picture' : '', 'funds':''},
# copy paste the line above and write your informations please
]
_labs = [
{ 'name' : 'Centre de Sociologie de l\'innovation', 'website' : 'http://www.csi.mines-paristech.fr/en/', 'picture' : 'csi.png', 'funds':''},
#{ 'name' : '', 'website' : '', 'picture' : '', 'funds':''},
# copy paste the line above and write your informations please
]
_grants = [
{ 'name' : 'Forccast', 'website' : 'http://forccast.hypotheses.org/', 'picture' : 'forccast.png', 'funds':''},
{ 'name' : 'Mastodons', 'website' : 'http://www.cnrs.fr/mi/spip.php?article53&lang=fr', 'picture' : 'mastodons.png', 'funds':''},
#{ 'name' : '', 'website' : '', 'picture' : '', 'funds':''},
# copy paste the line above and write your informations please
]
def members():
random.shuffle(_members)
return _members
def institutions():
random.shuffle(_institutions)
return _institutions
def partners():
# NB: `_partners` was not defined anywhere in this module; we assume
# (hypothetically) that partners aggregate institutions and labs
_partners = _institutions + _labs
random.shuffle(_partners)
return _partners
def labs():
random.shuffle(_labs)
return _labs
def grants():
random.shuffle(_grants)
return _grants
import random
import random_words
from math import pi
def lorem(size_target=450):
'''
Function that returns a paragraph of pseudo-Latin (lorem ipsum) text.
size_target is the minimum length of the paragraph, in characters.
'''
lorem = random_words.LoremIpsum()
sentences_list = lorem.get_sentences_list(sentences=5)
paragraph_size = 0
while paragraph_size < size_target :
sentences_list.append(lorem.get_sentence())
paragraph = ' '.join(sentences_list)
paragraph_size = len(paragraph)
return(paragraph)
def gargantua(size_target=500):
'''
Function that returns a paragraph made of chapter titles of Gargantua.
size_target is the minimum length of the paragraph, in characters.
'''
paragraph = list()
paragraph_size = 0
chapter_number = 1
while paragraph_size < size_target and chapter_number < 6:
chapitre = open('/srv/gargantext/static/docs/gargantua_book/gargantua_chapter_%d.txt' % chapter_number, 'r')
paragraph.append(random.choice(chapitre.readlines()).strip())
chapitre.close()
paragraph_size = len(' '.join(paragraph))
chapter_number += 1
return(' '.join(paragraph))
def random_letter(mot, size_min=5):
'''
Function that randomizes the order of the letters of a word
whose length is greater than size_min.
'''
if len(mot) > size_min:
size = round(len(mot) / pi)
first_letters = mot[:size]
last_letters = mot[-size:]
others_letters = list(mot[size:-size])
random.shuffle(others_letters)
mot_list = list()
mot_list.append(first_letters)
for letter in others_letters:
mot_list.append(letter)
mot_list.append(last_letters)
return(''.join(mot_list))
else:
return(mot)
tutoriel = """Il paraît que l'ordre des lettres dans un mot n'a pas d'importance. La première et la dernière lettre doivent être à la bonne place. Le reste peut être dans un désordre total et on peut toujours lire sans problème. On ne lit donc pas chaque lettre en elle-même, mais le mot comme un tout. Un changement de référentiel et nous transposons ce résultat au texte lui-même: l'ordre des mots est faiblement important comparé au contexte du texte qui, lui, est compté"""
def tutoreil(tutoriel=tutoriel):
'''
Function that returns the tutorial paragraph with the letters
of its words randomized.
'''
paragraph = ' '.join([ random_letter(mot) for mot in tutoriel.split(" ")]) \
+ ": comptexter avec Gargantext."
return(paragraph)
from django.template.loader import get_template
from django.http import Http404, HttpResponse, HttpResponseRedirect, HttpResponseForbidden
from django.shortcuts import render, redirect
from django import forms
from urllib.parse import quote_plus as urlencode
from gargantext import settings
# authentication
def requires_auth(func):
"""Provides a decorator to force authentication on a given view.
Also passes the URL to redirect towards as a GET parameter.
"""
def _requires_auth(request, *args, **kwargs):
if not request.user.is_authenticated():
url = '/auth/login/?next=%s' % urlencode(request.path)
return redirect(url)
return func(request, *args, **kwargs)
return _requires_auth
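# e.g.:
#   @requires_auth
#   def my_view(request):
#       ...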
# download from a given URL
import urllib.request
def get(url):
response = urllib.request.urlopen(url)
return response.read()
# retrieve GET parameters from a request
def get_parameters(request):
parameters = {}
for key, value in request.GET._iterlists():
if key.endswith('[]'):
parameters[key[:-2]] = value
else:
parameters[key] = value[0]
return parameters
# REST
from rest_framework.views import APIView
# provide a JSON response
from gargantext.util.json import json_encoder
def JsonHttpResponse(data, status=200):
return HttpResponse(
content = json_encoder.encode(data),
content_type = 'application/json; charset=utf-8',
status = status
)
# provide exceptions for JSON APIs
from rest_framework.exceptions import APIException
from rest_framework.exceptions import ValidationError as ValidationException
import json
import types
import datetime
import traceback
import inspect
__all__ = ['json_encoder', 'json_dumps']
class JSONEncoder(json.JSONEncoder):
def default(self, obj):
from gargantext.util.db import Base
if isinstance(obj, Base):
return {
key: value
for key, value in obj.__dict__.items()
if not key.startswith('_')
}
elif isinstance(obj, datetime.datetime):
return obj.isoformat()[:19] + 'Z'
elif isinstance(obj, Exception):
tbe = traceback.TracebackException.from_exception(obj)
return list(line.strip() for line in tbe.format())
elif hasattr(obj, '__iter__') and not isinstance(obj, dict):
return list(obj)
else:
return super(self.__class__, self).default(obj)
# json_encoder = JSONEncoder(indent=4)
json_encoder = JSONEncoder() # compact json
def json_dumps(obj):
return json.dumps(obj, cls=JSONEncoder)
from gargantext.constants import *
class Language:
def __init__(self, iso2=None, iso3=None, name=None):
self.iso2 = iso2
self.iso3 = iso3
self.name = name
self.implemented = iso2 in LANGUAGES
def __str__(self):
result = '<Language'
for key, value in self.__dict__.items():
result += ' %s="%s"' % (key, value, )
result += '>'
return result
__repr__ = __str__
class Languages(dict):
def __missing__(self, key):
key = key.lower()
if key in self:
return self[key]
raise KeyError
languages = Languages()
import pycountry
pycountry_keys = (
('iso639_3_code', 'iso3', ),
('iso639_1_code', 'iso2', ),
('name', 'name', ),
('reference_name', None, ),
('inverted_name', None, ),
)
for pycountry_language in pycountry.languages:
language_properties = {}
for pycountry_key, key in pycountry_keys:
if key is not None and hasattr(pycountry_language, pycountry_key):
language_properties[key] = getattr(pycountry_language, pycountry_key)
language = Language(**language_properties)
for pycountry_key, key in pycountry_keys:
if hasattr(pycountry_language, pycountry_key):
languages[getattr(pycountry_language, pycountry_key).lower()] = language
# because PubMed has weird language codes:
languages['fre'] = languages['fr']
languages['ger'] = languages['de']
languages['Français'] = languages['fr']
languages['en_US'] = languages['en']
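# e.g., all of the following resolve to the same Language object:
#   languages['fr'], languages['FR'], languages['fre'], languages['Français']
# and `languages['fr'].implemented` tells whether a tagger is configured for it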
from gargantext.util.languages import languages
from gargantext.constants import LANGUAGES, DEFAULT_MAX_NGRAM_LEN, RULE_JJNN, RULE_JJDTNN
import nltk
import re
class NgramsExtractor:
def __init__(self, tagger):
self._tagger = tagger()
@staticmethod
def clean_text(text):
"""Clean the text for better POS tagging.
For now, only removes (short) XML tags.
"""
return re.sub(r'<[^>]{0,45}>', '', text)
def extract(self, text, rule=RULE_JJNN, label='NP', max_n_words=DEFAULT_MAX_NGRAM_LEN):
text = self.clean_text(text)
grammar = nltk.RegexpParser(label + ': ' + rule)
tagged_tokens = list(self._tagger.tag_text(text))
if len(tagged_tokens):
grammar_parsed = grammar.parse(tagged_tokens)
for subtree in grammar_parsed.subtrees():
if subtree.label() == label:
if len(subtree) < max_n_words:
yield subtree.leaves()
# ex: [('wild', 'JJ'), ('pollinators', 'NNS')]
class NgramsExtractors(dict):
def __missing__(self, key):
if not isinstance(key, str):
raise KeyError
if len(key) == 2 and key == key.lower():
tagger = LANGUAGES[key]['tagger']
self[key] = NgramsExtractor(tagger)
else:
self[key] = self[LANGUAGES[key].iso3]
return self[key]
# this below will be shared within the current thread
ngramsextractors = NgramsExtractors()
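# Hypothetical usage (assuming the 'en' tagger and its data are installed):
#   for leaves in ngramsextractors['en'].extract('Wild pollinators visit crops.'):
#       print(leaves)  # e.g. [('Wild', 'JJ'), ('pollinators', 'NNS')]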
from ._Parser import Parser
# from ..NgramsExtractors import *
import sys
import csv
csv.field_size_limit(sys.maxsize)
import numpy as np
import os
class CSVParser(Parser):
def CSVsample( self, small_contents , delim) :
reader = csv.reader(small_contents, delimiter=delim)
Freqs = []
for row in reader:
Freqs.append(len(row))
return Freqs
def parse(self, filebuf):
print("CSV: parsing (assuming UTF-8 and LF line endings)")
contents = filebuf.read().decode("UTF-8").split("\n")
sample_size = 10
sample_contents = contents[0:sample_size]
hyperdata_list = []
# # = = = = [ Getting delimiters frequency ] = = = = #
PossibleDelimiters = [ ',',' ','\t', ';', '|', ':' ]
AllDelimiters = {}
for delim in PossibleDelimiters:
AllDelimiters[delim] = self.CSVsample( sample_contents , delim )
# # = = = = [ / Getting delimiters frequency ] = = = = #
# # OUTPUT example:
# # AllDelimiters = {
# # '\t': [1, 1, 1, 1, 1],
# # ' ': [1, 13, 261, 348, 330],
# # ',': [15, 15, 15, 15, 15],
# # ';': [1, 1, 1, 1, 1],
# # '|': [1, 1, 1, 1, 1]
# # }
# # = = = = [ Stand.Dev=0 & Sum of delimiters ] = = = = #
Delimiters = []
for d in AllDelimiters:
freqs = AllDelimiters[d]
suma = np.sum( freqs )
if suma >0:
std = np.std( freqs )
# print [ d , suma , len(freqs) , std]
if std == 0:
Delimiters.append ( [ d , suma , len(freqs) , std] )
# # = = = = [ / Stand.Dev=0 & Sum of delimiters ] = = = = #
# # OUTPUT example:
# # Delimiters = [
# # ['\t', 5, 5, 0.0],
# # [',', 75, 5, 0.0],
# # ['|', 5, 5, 0.0]
# # ]
# # = = = = [ Delimiter selection ] = = = = #
Sorted_Delims = sorted(Delimiters, key=lambda x: x[1], reverse=True)
HighestDelim = Sorted_Delims[0][0]
# HighestDelim = ","
print("CSV selected delimiter:",[HighestDelim])
# # = = = = [ / Delimiter selection ] = = = = #
# # = = = = [ First data coordinate ] = = = = #
Coords = {
"row": -1,
"column": -1
}
reader = csv.reader(contents, delimiter=HighestDelim)
for rownum, tokens in enumerate(reader):
if rownum % 250 == 0:
print("CSV row: ", rownum)
joined_tokens = "".join (tokens)
if Coords["row"]<0 and len( joined_tokens )>0 :
Coords["row"] = rownum
for columnum in range(len(tokens)):
t = tokens[columnum]
if len(t)>0:
Coords["column"] = columnum
break
# # = = = = [ / First data coordinate ] = = = = #
# # = = = = [ Setting Headers ] = = = = #
Headers_Int2Str = {}
reader = csv.reader(contents, delimiter=HighestDelim)
for rownum, tokens in enumerate(reader):
if rownum>=Coords["row"]:
for columnum in range( Coords["column"],len(tokens) ):
t = tokens[columnum]
Headers_Int2Str[columnum] = t
break
# print("Headers_Int2Str")
# print(Headers_Int2Str)
# # = = = = [ / Setting Headers ] = = = = #
# # OUTPUT example:
# # Headers_Int2Str = {
# # 0: 'publication_date',
# # 1: 'publication_month',
# # 2: 'publication_second',
# # 3: 'abstract'
# # }
# # = = = = [ Reading the whole CSV and saving ] = = = = #
hyperdata_list = []
reader = csv.reader(contents, delimiter=HighestDelim)
for rownum, tokens in enumerate(reader):
if rownum>Coords["row"]:
RecordDict = {}
for columnum in range( Coords["column"],len(tokens) ):
data = tokens[columnum]
RecordDict[ Headers_Int2Str[columnum] ] = data
if len(RecordDict.keys())>0:
hyperdata_list.append( RecordDict )
# # = = = = [ / Reading the whole CSV and saving ] = = = = #
return hyperdata_list
from ._Parser import Parser
from datetime import datetime
from io import BytesIO
import json
class ISTexParser(Parser):
def parse(self, filebuf):
contents = filebuf.read().decode("UTF-8")
data = json.loads(contents)
filebuf.close()
json_docs = data["hits"]
hyperdata_list = []
hyperdata_path = {
"id" : "id",
"source" : 'corpusName',
"title" : 'title',
"genre" : "genre",
"language_iso3" : 'language',
"doi" : 'doi',
"host" : 'host',
"publication_date" : 'publicationDate',
"abstract" : 'abstract',
# "authors" : 'author',
"authorsRAW" : 'author',
"keywords" : "keywords"
}
suma = 0
for json_doc in json_docs:
hyperdata = {}
for key, path in hyperdata_path.items():
try:
# print(path," ==> ",len(json_doc[path]))
hyperdata[key] = json_doc[path]
except:
pass
# print("|",hyperdata["language_iso3"])
if "doi" in hyperdata:
hyperdata["doi"] = hyperdata["doi"][0]
keywords = []
if "keywords" in hyperdata:
for keyw in hyperdata["keywords"]:
keywords.append(keyw["value"] )
hyperdata["keywords"] = ", ".join( keywords )
moredate=False
moresource=False
if "host" in hyperdata:
if "genre" in hyperdata["host"] and len(hyperdata["host"]["genre"])>0:
if "genre" in hyperdata and len(hyperdata["genre"])==0:
hyperdata["genre"] = hyperdata["host"]["genre"]
# print(hyperdata["host"])
if "pubdate" in hyperdata["host"]:
onebuffer = hyperdata["publication_date"]
hyperdata["publication_date"] = []
hyperdata["publication_date"].append(onebuffer)
hyperdata["publication_date"].append( hyperdata["host"]["pubdate"] )
if "title" in hyperdata["host"]:
hyperdata["journal"] = hyperdata["host"]["title"]
authors=False
if "authorsRAW" in hyperdata:
names = []
for author in hyperdata["authorsRAW"]:
names.append(author["name"])
hyperdata["authors"] = ", ".join(names)
if "host" in hyperdata: hyperdata.pop("host")
if "genre" in hyperdata:
if len(hyperdata["genre"])==0:
hyperdata.pop("genre")
if "language_iso3" in hyperdata:
# retrieve lang if lang != [] and lang != ["unknown"]
# ---------------------------------------------------
if len(hyperdata["language_iso3"])>0 and hyperdata["language_iso3"][0] != "unknown" :
hyperdata["language_iso3"] = hyperdata["language_iso3"][0]
# default value = eng
# possible even better: langid.classify(abstract)
else:
# NB: 97% of ISTex docs are in English, hence the default
# ----------------------------------------------
hyperdata["language_iso3"] = "eng"
# (cf. api.istex.fr/document/?q=*&facet=language
# and langid tests on the language=["unknown"] docs)
if "publication_date" in hyperdata:
RealDate = hyperdata["publication_date"]
if "publication_date" in hyperdata:
hyperdata.pop("publication_date")
if isinstance(RealDate, list):
RealDate = RealDate[0]
# print( RealDate ," | length:",len(RealDate))
Decision=""
if len(RealDate)>4:
if len(RealDate)>8:
try: Decision = datetime.strptime(RealDate, '%Y-%b-%d').date()
except:
try: Decision = datetime.strptime(RealDate, '%Y-%m-%d').date()
except: Decision=False
else:
try: Decision = datetime.strptime(RealDate, '%Y-%b').date()
except:
try: Decision = datetime.strptime(RealDate, '%Y-%m').date()
except: Decision=False
else:
try: Decision = datetime.strptime(RealDate, '%Y').date()
except: Decision=False
if Decision!=False:
hyperdata["publication_year"] = str(Decision.year)
hyperdata["publication_month"] = str(Decision.month)
hyperdata["publication_day"] = str(Decision.day)
hyperdata_list.append(hyperdata)
# print("\t||",hyperdata["title"])
# print("\t\t",Decision)
# print("=============================")
# else:
# suma+=1
# if "pubdate" in json_doc:
# print ("\tfail pubdate:",json_doc["pubdate"])
# print ("nb_hits:",len(json_docs))
# print("\t - nb_fails:",suma)
# print(" -- - - - - - -- - -")
return hyperdata_list
from .Ris import RISParser
class ISIParser(RISParser):
_begin = 3
_parameters = {
b"ER": {"type": "delimiter"},
b"TI": {"type": "hyperdata", "key": "title", "separator": " "},
b"AU": {"type": "hyperdata", "key": "authors", "separator": ", "},
b"DI": {"type": "hyperdata", "key": "doi"},
b"SO": {"type": "hyperdata", "key": "journal"},
b"PY": {"type": "hyperdata", "key": "publication_year"},
b"PD": {"type": "hyperdata", "key": "publication_month"},
b"LA": {"type": "hyperdata", "key": "language_fullname"},
b"AB": {"type": "hyperdata", "key": "abstract", "separator": " "},
b"WC": {"type": "hyperdata", "key": "fields"},
}
from ._Parser import Parser
from gargantext.util.languages import languages
#from admin.utils import PrintException


class RISParser(Parser):

    # def __init__(self, language_cache=None):
    #     #super(Parser, self).__init__()
    #     self._languages_cache = LanguagesCache() if language_cache is None else language_cache

    _begin = 6

    _parameters = {
        b"ER": {"type": "delimiter"},
        b"TI": {"type": "hyperdata", "key": "title", "separator": " "},
        b"ST": {"type": "hyperdata", "key": "subtitle", "separator": " "},
        b"AU": {"type": "hyperdata", "key": "authors", "separator": ", "},
        b"T2": {"type": "hyperdata", "key": "journal"},
        b"UR": {"type": "hyperdata", "key": "doi"},
        b"PY": {"type": "hyperdata", "key": "publication_year"},
        b"PD": {"type": "hyperdata", "key": "publication_month"},
        b"LA": {"type": "hyperdata", "key": "language_iso2"},
        b"AB": {"type": "hyperdata", "key": "abstract", "separator": " "},
        b"WC": {"type": "hyperdata", "key": "fields"},
    }

    def parse(self, file):
        hyperdata = {}
        last_key = None
        last_values = []
        # browse every line of the file
        for line in file:
            if len(line) > 2:
                # extract the two-byte parameter key
                parameter_key = line[:2]
                # a new key (neither a continuation line nor a repeat) closes
                # the previous field
                if parameter_key != b'  ' and parameter_key != last_key:
                    if last_key in self._parameters:
                        # translate the parameter key
                        parameter = self._parameters[last_key]
                        if parameter["type"] == "hyperdata":
                            separator = parameter["separator"] if "separator" in parameter else ""
                            hyperdata[parameter["key"]] = separator.join(last_values)
                        elif parameter["type"] == "delimiter":
                            # default the language to English when none was found
                            if 'language_fullname' not in hyperdata:
                                if 'language_iso3' not in hyperdata:
                                    if 'language_iso2' not in hyperdata:
                                        hyperdata['language_iso2'] = 'en'
                            yield hyperdata
                            hyperdata = {}
                    last_key = parameter_key
                    last_values = []
                try:
                    last_values.append(line[self._begin:-1].decode())
                except Exception as error:
                    print(error)
        # if a hyperdata object is left in memory, yield it as well
        if hyperdata:
            yield hyperdata
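The same state machine handles RIS framing, where `_begin = 6` skips the six-byte `"XX  - "` prefix on every line. A quick check with a made-up two-record stream:

```python
from io import BytesIO

# Made-up RIS stream; each "XX  - " prefix is 6 bytes wide.
ris = BytesIO(
    b"TI  - First title\n"
    b"PY  - 2014\n"
    b"ER  - \n"
    b"TI  - Second title\n"
    b"ER  - \n"
)
for doc in RISParser().parse(ris):
    print(doc)
# {'title': 'First title', 'publication_year': '2014', 'language_iso2': 'en'}
# {'title': 'Second title'}
```

Note that the last record is emitted by the trailing `if hyperdata` check rather than by the `ER` delimiter branch, so it does not receive the `language_iso2` default.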
from .Ris import RISParser
from .Isi import ISIParser
# from .Jstor import JstorParser
# from .Zotero import ZoteroParser
from .Pubmed import PubmedParser
# 2015-12-08: two-in-one parser
from .Europress import EuropressParser
from .ISTex import ISTexParser
from .CSV import CSVParser
#from .CERN import CernParser
from ._Tagger import Tagger

import nltk


class NltkTagger(Tagger):
    def tag_tokens(self, tokens, single=True):
        return nltk.pos_tag(tokens)
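`nltk.pos_tag` expects an already-tokenized list and returns `(token, Penn Treebank tag)` pairs. A minimal usage sketch, assuming the `Tagger` base class needs no constructor arguments and NLTK's default tagger model is available:

```python
import nltk

# One-time setup: pos_tag relies on this model being downloaded.
nltk.download('averaged_perceptron_tagger')

tagger = NltkTagger()  # assumes Tagger() takes no required arguments
print(tagger.tag_tokens(['The', 'corpus', 'is', 'ready']))
# e.g. [('The', 'DT'), ('corpus', 'NN'), ('is', 'VBZ'), ('ready', 'JJ')]
```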
from .TurboTagger import TurboTagger
from .NltkTagger import NltkTagger
from .TreeTagger import TreeTagger
from .MeltTagger import EnglishMeltTagger, FrenchMeltTagger
/srv/gargantext_lib/taggers/melttagger