Commit 29942ecd authored by Alexandre Delanoë

[FIX]

parents 32de2d3f 742b8194
......@@ -2,6 +2,11 @@
* Guided Tour
* Sources form highlighting crawlers
## Version 3.0.8.1
* WOS parser date FIX
* EUROPRESS parser author/text article FIX
* Backend: each project has user node as parent
## Version 3.0.7
* Alembic implemented to manage database migrations
......
......@@ -6,6 +6,21 @@ Keep in mind that Alembic only handles SQLAlchemy models: tables created from
Django ORM must be put out of Alembic sight. See [alembic:exclude] section in
alembic.ini.
To bootstrap Alembic where a gargantext database already exists, see
below: TELL ALEMBIC TO NOT START FROM SCRATCH.
USUAL WORKFLOW WITH ALEMBIC
1. Make change to models in gargantext/models
2. Autogenerate revision (see below GENERATE A REVISION)
3. Manually check and edit revision file in alembic/versions
4. Commit alembic revision (it should never be reverted)
5. Commit changes in models (it can be reverted if needed)
To create, drop or modify views, schemas, roles, stored procedures, triggers or
policies see below: REPLACEABLE OBJECTS.
TELL ALEMBIC TO NOT START FROM SCRATCH
......@@ -29,25 +44,76 @@ DOWNGRADE TO INITIAL DATABASE STATE
alembic downgrade base
GENERATE A NEW REVISION
GENERATE A REVISION
alembic revision -m "Message for this migration"
alembic revision --autogenerate -m "Message for this migration"
# A migration script is then created in alembic/versions directory. For
# example alembic/versions/3adcc9a56557_message_for_this_migration.py
# where 3adcc9a56557 is the revision id generated by Alembic.
#
# Alembic should generate a script reflecting changes already made in
# models or database. However it is always a good idea to check it and edit
# it manually: Alembic is not always accurate and can't see all alterations.
# It should work with basic changes such as model or column creation. See
# http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect
GENERATE AN EMPTY REVISION
alembic revision -m "Message for this migration"
# This script must be edited to write the migration itself, mainly
# in `upgrade` and `downgrade` functions. See Alembic documentation for
# further details.
GENERATE A REVISION FROM CURRENT STATE
REPLACEABLE OBJECTS
alembic revision --autogenerate -m "Message for this migration"
There is no specific way to handle views, schemas, roles, stored procedures,
triggers or policies with Alembic. To ease revisions of such objects and avoid
boilerplate code and excessive op.execute calls, we use an enhanced version of
the ReplaceableObject recipe (see Alembic documentation).
# Alembic should generate a script reflecting changes already made in
# database. However it is always a good idea to check it and edit it
# manually, Alembic is not always accurate and can't see all alterations.
# It should work with basic changes such as model or column creation. See
# http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect
To create, drop or modify such an object you need to make a ReplaceableObject
instance, and then use the create_*, drop_* or replace_* methods of alembic.op.
Conversion between ReplaceableObject and SQL is implemented in
gargantext/util/alembic.py.
* Views: create_view(ReplaceableObject(<name>, <query>))
* Roles: create_role(ReplaceableObject(<name>, <options>))
* Schemas: create_schema(ReplaceableObject(<name>))
* Stored procedures: create_sp(ReplaceableObject(<name(arguments)>, <body>))
* Triggers: create_trigger(ReplaceableObject(<name>, <when>, <table>, <body>))
* Policies: create_policy(ReplaceableObject(<name>, <table>, <body>))
Here is an example with a stored procedure:
...
from gargantext.util.alembic import ReplaceableObject
revision = '08230100f512'
...
my_function_sp = ReplaceableObject(
"my_function()", "RETURNS integer AS $$ SELECT 42 $$ LANGUAGE sql")
def upgrade():
op.create_sp(my_function_sp)
def downgrade():
op.drop_sp(my_function_sp)
To modify this stored procedure in a later revision:
...
from gargantext.util.alembic import ReplaceableObject
my_function_sp = ReplaceableObject(
"my_function()", "RETURNS integer AS $$ SELECT 43 $$ LANGUAGE sql")
def upgrade():
op.replace_sp(my_function_sp, replaces="08230100f512.my_function_sp")
def downgrade():
op.replace_sp(my_function_sp, replace_with="08230100f512.my_function_sp")
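The stored procedure example above can be pictured with a minimal, self-contained sketch. It is not the code in gargantext/util/alembic.py: the class layout and the helper names create_sp_sql and drop_sp_sql are assumptions made for illustration. It only shows how a (name, body) pair could map onto PostgreSQL statements.

```python
# Hypothetical sketch: NOT the implementation in gargantext/util/alembic.py.
class ReplaceableObject(object):
    """Hold the name of a database object and the SQL fragments defining it."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args


def create_sp_sql(obj):
    # Assumed mapping: args[0] is everything that follows the function name.
    return "CREATE FUNCTION %s %s" % (obj.name, obj.args[0])


def drop_sp_sql(obj):
    # Dropping only needs the name and its argument list.
    return "DROP FUNCTION %s" % obj.name


my_function_sp = ReplaceableObject(
    "my_function()", "RETURNS integer AS $$ SELECT 42 $$ LANGUAGE sql")

print(create_sp_sql(my_function_sp))
print(drop_sp_sql(my_function_sp))
```

Keeping the object as plain data is what allows a later revision to refer back to an earlier definition, as the replaces= and replace_with= arguments do.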
......@@ -18,7 +18,8 @@ from gargantext import settings, models
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
config.set_main_option("sqlalchemy.url", settings.DATABASES['default']['URL'])
config.set_main_option("sqlalchemy.url",
settings.DATABASES['default']['SECRET_URL'])
# Interpret the config file for Python logging.
# This line sets up loggers basically.
......@@ -52,6 +53,14 @@ def include_object(obj, name, typ, reflected, compare_to):
return True
context_opts = dict(
target_metadata=target_metadata,
include_object=include_object,
compare_server_default=True,
compare_type=True,
)
def run_migrations_offline():
"""Run migrations in 'offline' mode.
......@@ -65,9 +74,7 @@ def run_migrations_offline():
"""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url, target_metadata=target_metadata, literal_binds=True,
include_object=include_object)
context.configure(url=url, literal_binds=True, **context_opts)
with context.begin_transaction():
context.run_migrations()
......@@ -86,11 +93,7 @@ def run_migrations_online():
poolclass=pool.NullPool)
with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
include_object=include_object
)
context.configure(connection=connection, **context_opts)
with context.begin_transaction():
context.run_migrations()
......
"""Fix bug in title_abstract indexation
Revision ID: 159a5154362b
Revises: 73112a361617
Create Date: 2017-09-18 18:00:26.055335
"""
from alembic import op
import sqlalchemy as sa
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '159a5154362b'
down_revision = '73112a361617'
branch_labels = None
depends_on = None
title_abstract_insert = ReplaceableObject(
'title_abstract_insert',
'BEFORE INSERT',
'nodes',
"""FOR EACH ROW
WHEN (NEW.hyperdata::text <> '{}'::text)
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
title_abstract_update = ReplaceableObject(
'title_abstract_update',
'BEFORE UPDATE OF hyperdata',
'nodes',
"""FOR EACH ROW
WHEN ((OLD.hyperdata ->> 'title', OLD.hyperdata ->> 'abstract')
IS DISTINCT FROM
(NEW.hyperdata ->> 'title', NEW.hyperdata ->> 'abstract'))
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
def upgrade():
op.replace_trigger(title_abstract_insert, replaces="73112a361617.title_abstract_insert")
op.replace_trigger(title_abstract_update, replaces="73112a361617.title_abstract_update")
# Manually re-build index
op.execute("UPDATE nodes SET title_abstract = to_tsvector('english', (hyperdata ->> 'title') || ' ' || (hyperdata ->> 'abstract')) WHERE typename=4")
def downgrade():
# Won't unfix the bug !
pass
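The WHEN clause of the title_abstract_update trigger compares the (title, abstract) pair before and after the update, so the index is rebuilt only when one of the two fields changes. A rough Python analogue, assuming hyperdata is handled as a dict (the function name is hypothetical and nothing here runs inside PostgreSQL):

```python
def title_or_abstract_changed(old_hyperdata, new_hyperdata):
    """Mimic the trigger's WHEN clause on two hyperdata dicts.

    dict.get returns None for a missing key, much like ->> returns NULL,
    and None == None is True, which matches IS DISTINCT FROM semantics
    (two NULLs are not considered distinct).
    """
    old = (old_hyperdata.get("title"), old_hyperdata.get("abstract"))
    new = (new_hyperdata.get("title"), new_hyperdata.get("abstract"))
    return old != new


# A change to an unrelated key does not trigger a re-index.
print(title_or_abstract_changed({"title": "A"}, {"title": "A", "source": "WOS"}))
# A change to the title does.
print(title_or_abstract_changed({"title": "A"}, {"title": "B"}))
```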
"""Add english fulltext index on Nodes.hyperdata for abstract and title
Revision ID: 1fb4405b59e1
Revises: bedce47c9e34
Create Date: 2017-09-13 16:31:36.926692
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy_utils.types import TSVectorType
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '1fb4405b59e1'
down_revision = 'bedce47c9e34'
branch_labels = None
depends_on = None
title_abstract_update_trigger = ReplaceableObject(
'title_abstract_update_trigger()',
"""
RETURNS trigger AS $$
begin
new.title_abstract := to_tsvector('english', (new.hyperdata ->> 'title') || ' ' || (new.hyperdata ->> 'abstract'));
return new;
end
$$ LANGUAGE plpgsql;
"""
)
title_abstract_update = ReplaceableObject(
'title_abstract_update',
'BEFORE INSERT OR UPDATE',
'nodes',
'FOR EACH ROW EXECUTE PROCEDURE title_abstract_update_trigger()'
)
def upgrade():
op.add_column('nodes', sa.Column('title_abstract', TSVectorType))
op.create_sp(title_abstract_update_trigger)
op.create_trigger(title_abstract_update)
# Initialize index with already existing data
op.execute('UPDATE nodes SET hyperdata = hyperdata')
def downgrade():
op.drop_trigger(title_abstract_update)
op.drop_sp(title_abstract_update_trigger)
op.drop_column('nodes', 'title_abstract')
"""Optimize title_abstract indexation
Revision ID: 73112a361617
Revises: 1fb4405b59e1
Create Date: 2017-09-15 14:14:51.737963
"""
from alembic import op
import sqlalchemy as sa
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '73112a361617'
down_revision = '1fb4405b59e1'
branch_labels = None
depends_on = None
title_abstract_insert = ReplaceableObject(
'title_abstract_insert',
'AFTER INSERT',
'nodes',
"""FOR EACH ROW
WHEN (NEW.hyperdata::text <> '{}'::text)
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
title_abstract_update = ReplaceableObject(
'title_abstract_update',
'AFTER UPDATE OF hyperdata',
'nodes',
"""FOR EACH ROW
WHEN ((OLD.hyperdata ->> 'title', OLD.hyperdata ->> 'abstract')
IS DISTINCT FROM
(NEW.hyperdata ->> 'title', NEW.hyperdata ->> 'abstract'))
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
def upgrade():
op.replace_trigger(title_abstract_update, replaces="1fb4405b59e1.title_abstract_update")
op.create_trigger(title_abstract_insert)
def downgrade():
op.drop_trigger(title_abstract_insert)
op.replace_trigger(title_abstract_update, replace_with="1fb4405b59e1.title_abstract_update")
"""Add server side sensible defaults for nodes
Revision ID: 73304ae9f1fb
Revises: 159a5154362b
Create Date: 2017-10-05 14:17:58.326646
"""
from alembic import op
import sqlalchemy as sa
import gargantext
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = '73304ae9f1fb'
down_revision = '159a5154362b'
branch_labels = None
depends_on = None
def upgrade():
op.alter_column('nodes', 'date',
existing_type=postgresql.TIMESTAMP(timezone=True),
server_default=sa.text('CURRENT_TIMESTAMP'),
nullable=False)
op.alter_column('nodes', 'hyperdata',
existing_type=postgresql.JSONB(astext_type=sa.Text()),
server_default=sa.text("'{}'::jsonb"),
nullable=False)
op.alter_column('nodes', 'name',
existing_type=sa.VARCHAR(length=255),
server_default='',
nullable=False)
op.alter_column('nodes', 'typename',
existing_type=sa.INTEGER(),
nullable=False)
op.alter_column('nodes', 'user_id',
existing_type=sa.INTEGER(),
nullable=False)
def downgrade():
op.alter_column('nodes', 'user_id',
existing_type=sa.INTEGER(),
nullable=True)
op.alter_column('nodes', 'typename',
existing_type=sa.INTEGER(),
nullable=True)
op.alter_column('nodes', 'name',
existing_type=sa.VARCHAR(length=255),
server_default=None,
nullable=True)
op.alter_column('nodes', 'hyperdata',
existing_type=postgresql.JSONB(astext_type=sa.Text()),
server_default=None,
nullable=True)
op.alter_column('nodes', 'date',
existing_type=postgresql.TIMESTAMP(timezone=True),
server_default=None,
nullable=True)
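The four migration files in this commit are not listed in the order Alembic applies them. Alembic resolves the order itself from each file's down_revision. The sketch below only reconstructs that order for the reader from the headers shown above, assuming a linear history with no branches (the helper name upgrade_order is hypothetical).

```python
# revision -> down_revision, copied from the migration headers in this commit.
REVISIONS = {
    "1fb4405b59e1": "bedce47c9e34",
    "73112a361617": "1fb4405b59e1",
    "159a5154362b": "73112a361617",
    "73304ae9f1fb": "159a5154362b",
}


def upgrade_order(revisions):
    """Return revision ids from oldest to newest by following down_revision."""
    children = {down: rev for rev, down in revisions.items()}
    # The starting point is the one parent that is not itself listed here.
    (current,) = [down for down in revisions.values() if down not in revisions]
    order = []
    while current in children:
        current = children[current]
        order.append(current)
    return order


print(upgrade_order(REVISIONS))
```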
......@@ -7,6 +7,12 @@
$httpProvider.defaults.xsrfHeaderName = 'X-CSRFToken';
$httpProvider.defaults.xsrfCookieName = 'csrftoken';
}]);
function url(path) {
// adding explicit "http[s]://" -- for cross origin requests
return location.protocol + '//' + window.GARG_ROOT_URL + path;
}
/*
* DocumentHttpService: Read Document
* ===================
......@@ -98,9 +104,7 @@
*/
http.factory('MainApiAddNgramHttpService', function($resource) {
return $resource(
// adding explicit "http://" b/c this a cross origin request
'http://' + window.GARG_ROOT_URL
+ "/api/ngrams?text=:ngramStr&corpus=:corpusId&testgroup",
url("/api/ngrams?text=:ngramStr&corpus=:corpusId&testgroup"),
{
ngramStr: '@ngramStr',
corpusId: '@corpusId',
......@@ -131,9 +135,7 @@
http.factory('MainApiChangeNgramHttpService', function($resource) {
return $resource(
// adding explicit "http://" b/c this a cross origin request
'http://' + window.GARG_ROOT_URL
+ "/api/ngramlists/change?list=:listId&ngrams=:ngramIdList",
url("/api/ngramlists/change?list=:listId&ngrams=:ngramIdList"),
{
listId: '@listId',
ngramIdList: '@ngramIdList' // list in str form (sep=","): "12,25,30"
......@@ -171,8 +173,7 @@
*/
http.factory('MainApiFavoritesHttpService', function($resource) {
return $resource(
// adding explicit "http://" b/c this a cross origin request
'http://' + window.GARG_ROOT_URL + "/api/nodes/:corpusId/favorites?docs=:docId",
url("/api/nodes/:corpusId/favorites?docs=:docId"),
{
corpusId: '@corpusId',
docId: '@docId'
......
......@@ -89,9 +89,9 @@
</div>
<div class="row-fluid">
<ul class="list-group clearfix">
<li class="list-group-item small"><span class="badge">source</span>{[{source}]}</li>
<li class="list-group-item small"><span class="badge">authors</span>{[{authors}]}</li>
<li class="list-group-item small"><span class="badge">date</span>{[{publication_date}]}</li>
<li class="list-group-item small"><span class="badge">source</span>{[{source || '&nbsp;'}]}</li>
<li class="list-group-item small"><span class="badge">authors</span>{[{authors || '&nbsp;'}]}</li>
<li class="list-group-item small"><span class="badge">date</span>{[{publication_date || '&nbsp;'}]}</li>
</ul>
</div>
......
......@@ -2,8 +2,6 @@
[uwsgi]
# uwsgi --vacuum --socket monsite/mysite.sock --wsgi-file monsite/wsgi.py --chmod-socket=666 --home=/srv/alexandre.delanoe/env --chdir=/var/www/www/alexandre/monsite --env
env = DJANGO_SETTINGS_MODULE=gargantext.settings
#module = django.core.handlers.wsgi:WSGIHandler()
......@@ -44,7 +42,7 @@ touch-reload = /tmp/gargantext.reload
# respawn processes taking more than 20 seconds
harakiri = 120
harakiri = 1200
post-buffering=8192
# limit the project to 128 MB
......@@ -55,7 +53,18 @@ max-requests = 5000
# background the process & log
#daemonize = /var/log/uwsgi/gargantext.log
uid = 1000
gid = 1000
daemonize = /var/log/gargantext/uwsgi/@(exec://date +%%Y-%%m-%%d_%%H%%M).log
log-reopen = true
#uid = 1000
#gid = 1000
#
show-config=true
disable-logging=false
logfile-chmod=644
#logfile-chown=false
log-maxsize=500000000
##logto=%(chdir)logs/uwsgi_access.log
#logger = longquery file:%(chdir)logs/uwsgi_long.log
#log-route = longquery msec
#
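The new daemonize line names the log file after the start time. Assuming that "%%" in a uWSGI ini file stands for a literal "%" and that @(exec://...) expands to the output of the command, the resulting path can be previewed with the sketch below (log_path is a hypothetical helper, not part of the project).

```python
from datetime import datetime

# Assumed effective shell command: date +%Y-%m-%d_%H%M
DATE_FORMAT = "%Y-%m-%d_%H%M"


def log_path(started_at, directory="/var/log/gargantext/uwsgi"):
    """Build the log path uWSGI would use for a given start time."""
    return "%s/%s.log" % (directory, started_at.strftime(DATE_FORMAT))


print(log_path(datetime(2017, 10, 5, 14, 17)))
```

With minute resolution in the name, each restart that falls in a different minute writes to a new file.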
from sqlalchemy.schema import Column, ForeignKey, UniqueConstraint, Index
from sqlalchemy.orm import relationship, validates
from sqlalchemy.types import TypeDecorator, \
Integer, Float, Boolean, DateTime, String, Text
Integer, REAL, Boolean, DateTime, String, Text
from sqlalchemy_utils.types import TSVectorType
from sqlalchemy.dialects.postgresql import JSONB, DOUBLE_PRECISION as Double
from sqlalchemy.ext.mutable import MutableDict, MutableList
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import text
__all__ = ["Column", "ForeignKey", "UniqueConstraint", "relationship",
__all__ = ["Column", "ForeignKey", "UniqueConstraint", "Index", "relationship",
"text",
"validates", "ValidatorMixin",
"Integer", "Float", "Boolean", "DateTime", "String", "Text",
"TSVectorType",
"TypeDecorator",
"JSONB", "Double",
"MutableDict", "MutableList",
......@@ -25,6 +29,16 @@ Base = declarative_base()
DjangoBase = declarative_base()
class Float(REAL):
"""Reflect exact REAL type for PostgreSQL in order to avoid confusion
within Alembic type comparison"""
def __init__(self, *args, **kwargs):
if kwargs.get('precision') == 24:
kwargs.pop('precision')
super(Float, self).__init__(*args, **kwargs)
class ValidatorMixin(object):
def enforce_length(self, key, value):
"""Truncate a string according to its column length
......
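The Float wrapper above drops only a precision of exactly 24, which is the precision PostgreSQL associates with its real type. The sketch below replays that keyword handling with a stand-in REAL class, so it runs without SQLAlchemy. The stand-in's signature is an assumption and is much simpler than the real type.

```python
class REAL(object):
    """Stand-in for sqlalchemy.types.REAL (assumed, simplified signature)."""

    def __init__(self, precision=None, asdecimal=False):
        self.precision = precision
        self.asdecimal = asdecimal


class Float(REAL):
    """Same keyword handling as the Float class in this commit."""

    def __init__(self, *args, **kwargs):
        # precision=24 is dropped; any other precision is passed through.
        if kwargs.get('precision') == 24:
            kwargs.pop('precision')
        super(Float, self).__init__(*args, **kwargs)


print(Float(precision=24).precision)
print(Float(precision=53).precision)
```

Because a column declared as Float and one reported with precision=24 end up as the same type object, a type comparison between them has nothing to flag.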
......@@ -2,14 +2,11 @@ from gargantext.util.db import session
from gargantext.util.files import upload
from gargantext.constants import *
# Uncomment to make column full text searchable
#from sqlalchemy_utils.types import TSVectorType
from datetime import datetime
from .base import Base, Column, ForeignKey, relationship, TypeDecorator, Index, \
Integer, Float, String, DateTime, JSONB, \
MutableList, MutableDict, validates, ValidatorMixin
Integer, Float, String, DateTime, JSONB, TSVectorType, \
MutableList, MutableDict, validates, ValidatorMixin, text
from .users import User
__all__ = ['Node', 'NodeNode', 'CorpusNode']
......@@ -47,7 +44,7 @@ class Node(ValidatorMixin, Base):
>>> session.query(Node).filter_by(typename='USER').first() # doctest: +ELLIPSIS
<UserNode(...)>
But beware, there are some caveats with bulk queries. In this case typename
But beware, there are some pitfalls with bulk queries. In this case typename
MUST be specified manually.
>>> session.query(UserNode).delete() # doctest: +SKIP
......@@ -60,28 +57,33 @@ class Node(ValidatorMixin, Base):
Index('nodes_user_id_typename_parent_id_idx', 'user_id', 'typename', 'parent_id'),
Index('nodes_hyperdata_idx', 'hyperdata', postgresql_using='gin'))
# TODO
# create INDEX full_text_idx on nodes using gin(to_tsvector('english', hyperdata ->> 'abstract' || 'title'));
id = Column(Integer, primary_key=True)
typename = Column(NodeType, index=True)
typename = Column(NodeType, index=True, nullable=False)
__mapper_args__ = { 'polymorphic_on': typename }
# foreign keys
user_id = Column(Integer, ForeignKey(User.id, ondelete='CASCADE'))
user_id = Column(Integer, ForeignKey(User.id, ondelete='CASCADE'),
nullable=False)
user = relationship(User)
parent_id = Column(Integer, ForeignKey('nodes.id', ondelete='CASCADE'))
parent = relationship('Node', remote_side=[id])
name = Column(String(255))
date = Column(DateTime(timezone=True), default=datetime.now)
name = Column(String(255), nullable=False, server_default='')
date = Column(DateTime(timezone=True), nullable=False,
server_default=text('CURRENT_TIMESTAMP'))
hyperdata = Column(JSONB, default=dict, nullable=False,
server_default=text("'{}'::jsonb"))
hyperdata = Column(JSONB, default=dict)
# metadata (see https://bashelton.com/2014/03/updating-postgresql-json-fields-via-sqlalchemy/)
# To make search possible uncomment the line below
#search_vector = Column(TSVectorType('hyperdata'))
# Create a TSVECTOR column to use fulltext search feature of PostgreSQL.
# We need to create a trigger to update this column on update and insert,
# it's created in alembic/versions/1fb4405b59e1_add_english_fulltext_index_on_nodes_.py
#
# To use this column: session.query(DocumentNode) \
# .filter(Node.title_abstract.match('keyword'))
title_abstract = Column(TSVectorType(regconfig='english'))
def __new__(cls, *args, **kwargs):
if cls is Node and kwargs.get('typename'):
......
"""
Django settings for gargantext project.
Generated by 'django-admin startproject' using Django 1.9.2.
For more information on this file, see
https://docs.djangoproject.com/en/1.9/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/1.9/ref/settings/
"""
import os
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.9/howto/deployment/checklist/
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = '!%ktkh981)piil1%t5r0g4$^0=uvdafk!=f2x8djxy7_gq(n5%'
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
MAINTENANCE = False
BASE_URL = "testing.gargantext.org"
ALLOWED_HOSTS = ["localhost", ".gargantext.org", ".iscpif.fr",]
# Asynchronous tasks
import djcelery
djcelery.setup_loader()
BROKER_URL = 'amqp://guest:guest@localhost:5672/'
CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
CELERY_TIMEZONE = 'Europe/Paris'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_IMPORTS = (
"gargantext.util.toolchain",
"gargantext.util.crawlers",
"graph.graph",
"moissonneurs.pubmed",
"moissonneurs.istex",
"gargantext.util.ngramlists_tools",
)
# garg's custom unittests runner (adapted to our db models)
TEST_RUNNER = 'unittests.framework.GargTestRunner'
# Application definition
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'djcelery',
'annotations',
'graph',
'moissonneurs',
'gargantext',
]
MIDDLEWARE_CLASSES = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'gargantext.urls'
TEMPLATES = [
{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'DIRS': [
os.path.join(BASE_DIR, 'templates'),
#'./templates'
],
'APP_DIRS': True,
'OPTIONS': {
'context_processors': [
'django.template.context_processors.debug',
'django.template.context_processors.request',
'django.contrib.auth.context_processors.auth',
'django.contrib.messages.context_processors.messages',
],
},
},
]
WSGI_APPLICATION = 'gargantext.wsgi.application'
# http://getblimp.github.io/django-rest-framework-jwt/#additional-settings
REST_FRAMEWORK = {
'DEFAULT_PERMISSION_CLASSES': (
'rest_framework.permissions.IsAuthenticated',
),
'DEFAULT_AUTHENTICATION_CLASSES': (
'rest_framework_jwt.authentication.JSONWebTokenAuthentication',
'rest_framework.authentication.SessionAuthentication',
'rest_framework.authentication.BasicAuthentication',
),
}
JWT_AUTH = {
'JWT_VERIFY_EXPIRATION': False,
'JWT_SECRET_KEY': SECRET_KEY,
'JWT_AUTH_HEADER_PREFIX': 'Bearer',
}
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.9/howto/static-files/
STATIC_ROOT = '/srv/gargantext_static/'
STATIC_URL = '/static/'