*.pyc
*/__pycache__/*
VENV/*
install/docker/gargantext_lib.tar.bz2
## TODO
* Guided Tour
* Sources form highlighting crawlers
## Version 3.0.7
* Alembic implemented to manage database migrations
## Version 3.0.6.8
* REPEC Crawler (connection with https://multivac.iscpif.fr)
* HAL Crawler (connection to https://hal.archives-ouvertes.fr/)
* New Graph Feature: color nodes by growth
## Version 3.0.6.4
* COOC SQL improved
## Version 3.0.6.3
* New buttons
* Big graphs with conditional distance
## Version 3.0.6.2
* SQL Table change. Update:
* psql gargandb
* drop table contacts
* ./manage.py dbmigrate.py
* Init accounts with ISC-PIF partner group
* Link to Licence
## Version 3.0.6
* New Menu
* Links to Documentation
* Contextual Help English or French (French by default)
* User parameters stored in Node.hyperdata with typename 'USER'
Each user has only one node with typename 'USER'
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at [sos AT gargantext DOT org]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
# About Gargantext
Gargantext is a collaborative web platform for the exploration of sets
of unstructured documents. It combines tools from natural language
processing, text-mining, complex networks analysis and interactive data
visualization to pave the way toward new kinds of interactions with your
digital corpora. In a few minutes, you will be able to build knowledge maps,
collaborative state-of-the-art reviews, portfolio analyses and many other crazy
things. Say goodbye to headaches in front of thousands of documents to
analyze and launch yourself into the Gargantext adventure.
This software is free software, developed by the CNRS Complex Systems
Institute of Paris Île-de-France (ISC-PIF) and its partners.
Developers willing to join the Gargantext community are welcome.
[Gargantext platform](http://gargantext.org)
[Official code repository](http://code.gargantext.org)
# History of Gargantext
The Gargantext project is a continuation of the TINA project (Chavalarias D.
2009-2011, EU FP7 FET Open 245412). The development of Gargantext has
benefited from synergies with the whole ecosystem of open source
software for text and network analytics, including Cortext, NLTK and
SigmaJS.
# Code of Conduct for the contributors
In order to be allowed to contribute to the project, each
contributor has to sign this [Code Of
Conduct](CODE_OF_CONDUCT.md), manually or digitally, and send it to (team AT gargantext DOT
org). In the case of a digital signature, the digital identity has to be
signed by at least 2 other members of the project. Digital identity
means certified with the contributor's GPG key, with a strong level
of security.
# Team, partners and supports
The core team at the origin of Gargantext is:
* David Chavalarias, principal investigator
* Alexandre Delanoë, project manager
* Samuel Castillo J., developer
* Romain Loth, developer
* Mathieu Rodic, developer
* Constance de Quatrebarbes, developer
# Host institutions of this project
* CNRS labs ISC-PIF (Institut des Systèmes Complexes de Paris Île de France)
* CAMS (Centre d'Analyse et de Mathématiques Sociales)
# This project has received support from the following institutional partners
* Ecole des Hautes Etudes en Science Sociales (EHESS)
* Ecole des Mines ParisTech, CSI
* Institut Pasteur
# This project has received grants from the following programs
* IDEFI FORCCAST
* Programme CNRS MI Mastodons
* ADEME
# API
Be more careful about authorizations.
cf. "ng-resource".
# Taggers
Path for data used by taggers should be defined in `gargantext.constants`.
# Database
# Sharing
Here follows a brief description of how sharing could be implemented.
## Database representation
The database representation of sharing can be distributed among 4 tables:
- `persons`, of which items represent either a user or a group
- `relationships` describes the relationships between persons (affiliation
of a user to a group, contact between two users, etc.)
- `nodes` contains the projects, corpora, documents, etc. to share (they shall
inherit the sharing properties from their parents)
- `permissions` stores the sharing relations between persons and nodes: it
only consists of 2 foreign keys, plus an integer between 1 and 3
representing the level of sharing, the start date (when the sharing was
set) and the end date (the time at which sharing was removed, `NULL` while
the sharing is still active)
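The four tables described above can be sketched with the standard `sqlite3` module. This is only an illustration of the layout, not the project's actual schema: the column names `parent_id`, `start_date` and `end_date`, as well as the helpers `create_sharing_schema` and `active_permissions`, are assumptions made for this example.

```python
import sqlite3

# Hypothetical schema: only the table names come from the description above,
# most column names are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE persons (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('USER', 'GROUP')),
    username TEXT NOT NULL
);
CREATE TABLE relationships (
    person1_id INTEGER NOT NULL REFERENCES persons(id),
    person2_id INTEGER NOT NULL REFERENCES persons(id),
    type TEXT NOT NULL
);
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),
    type TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE permissions (
    person_id INTEGER NOT NULL REFERENCES persons(id),
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    permission INTEGER NOT NULL,
    start_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    end_date TEXT
);
"""


def create_sharing_schema(connection):
    """Create the four sharing tables on the given connection."""
    connection.executescript(SCHEMA)


def active_permissions(connection, node_id):
    """Return (person_id, permission) pairs whose sharing has not ended."""
    rows = connection.execute(
        "SELECT person_id, permission FROM permissions "
        "WHERE node_id = ? AND end_date IS NULL ORDER BY person_id",
        (node_id,),
    )
    return rows.fetchall()
```

Keeping ended sharings in the table with a non-`NULL` end date, rather than deleting them, preserves the history of who had access and when.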
## Python code
The permission levels should be set in `gargantext.constants`, and defined as:
```python
PERMISSION_NONE = 0 # 0b0000
PERMISSION_READ = 1 # 0b0001
PERMISSION_WRITE = 3 # 0b0011
PERMISSION_OWNER = 7 # 0b0111
```
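The bit patterns in the comments suggest that each level includes the bits of the levels below it, so a permission check can be a simple mask comparison. The helper `has_permission` below is a hypothetical name used for illustration; it is not part of the existing code.

```python
PERMISSION_NONE = 0   # 0b0000
PERMISSION_READ = 1   # 0b0001
PERMISSION_WRITE = 3  # 0b0011
PERMISSION_OWNER = 7  # 0b0111


def has_permission(granted, required):
    """Return True when every bit of `required` is also set in `granted`."""
    return (granted & required) == required
```

With these values, an owner automatically passes the write and read checks, and a writer passes the read check, without any extra comparison logic.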
The requests to check for permissions (or add new ones) should not be rewritten
every time. They should be "hidden" within the models:
- `Person.owns(node)` returns a boolean
- `Person.can_read(node)` returns a boolean
- `Person.can_write(node)` returns a boolean
- `Person.give_right(node, permission)` gives a right to a given user
- `Person.remove_right(node, permission)` removes a right from a given user
- `Person.get_nodes(permission[, type])` returns an iterator on the list of
nodes on which the person has at least the given permission (optional
argument: type of requested node)
- `Node.get_persons(permission[, type])` returns an iterator on the list of
users who have at least the given permission on the node (optional argument:
type of requested persons, such as `USER` or `GROUP`)
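A minimal in-memory sketch of part of this interface is given below. It assumes that rights are inherited from parent nodes and that `remove_right` only drops a grant of exactly the given level; the `Node` and `Person` classes here are simplified stand-ins, not the real models, which would query the `permissions` table instead.

```python
PERMISSION_NONE = 0
PERMISSION_READ = 1
PERMISSION_WRITE = 3
PERMISSION_OWNER = 7


class Node:
    """Hypothetical stand-in for the real model: a name and an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class Person:
    """Hypothetical in-memory person holding its own grants."""

    def __init__(self, username):
        self.username = username
        self._rights = {}  # maps a Node to the level granted directly on it

    def give_right(self, node, permission):
        self._rights[node] = permission

    def remove_right(self, node, permission):
        # Assumption: a grant is dropped only when it matches the given level.
        if self._rights.get(node) == permission:
            del self._rights[node]

    def _level(self, node):
        # Combine the grants found on the node and on each of its ancestors.
        level = PERMISSION_NONE
        while node is not None:
            level |= self._rights.get(node, PERMISSION_NONE)
            node = node.parent
        return level

    def _has(self, node, required):
        return (self._level(node) & required) == required

    def can_read(self, node):
        return self._has(node, PERMISSION_READ)

    def can_write(self, node):
        return self._has(node, PERMISSION_WRITE)

    def owns(self, node):
        return self._has(node, PERMISSION_OWNER)
```

Callers only ever ask `person.can_write(node)`; how the answer is computed (inheritance, bit masks, database queries) stays hidden inside the model.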
## Example
Let's imagine the `persons` table contains the following data:
| id | type | username |
|----|-------|-----------|
| 1 | USER | David |
| 2 | GROUP | C.N.R.S. |
| 3 | USER | Alexandre |
| 4 | USER | Untel |
| 5 | GROUP | I.S.C. |
| 6 | USER | Bidule |
Assume "David" owns the groups "C.N.R.S." and "I.S.C.", "Alexandre" belongs to
the group "I.S.C.", with "Untel" and "Bidule" belonging to the group "C.N.R.S.".
"Alexandre" and "David" are in contact.
The `relationships` table then contains:
| person1_id | person2_id | type |
|------------|------------|---------|
| 1 | 2 | OWNER |
| 1 | 5 | OWNER |
| 3 | 5 | MEMBER |
| 4 | 2 | MEMBER |
| 6 | 2 | MEMBER |
| 1 | 3 | CONTACT |
The `nodes` table is populated as such:
| id | type | name |
|----|----------|----------------------|
| 12 | PROJECT | My super project |
| 13 | CORPUS | A given corpus |
| 13 | CORPUS | The corpus |
| 14 | DOCUMENT | Some document |
| 15 | DOCUMENT | Another document |
| 16 | DOCUMENT | Yet another document |
| 17 | DOCUMENT | Last document |
| 18 | PROJECT | Another project |
| 19 | PROJECT | That project |
If we want to express that "David" created "My super project" (and its children)
and wants everyone in "C.N.R.S." to be able to view it, but not modify it,
`permissions` should contain:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
If "David" also wanted "Alexandre" (and no one else) to view and modify "The
corpus" (and its children), we would have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 1 | 12 | OWNER |
| 2 | 12 | READ |
| 3 | 13 | WRITE |
If "Alexandre" created "That project" and wants "Bidule" (and no one else) to be
able to view and modify it (and its children), the table should then have:
| person_id | node_id | permission |
|-----------|---------|------------|
| 3 | 19 | OWNER |
| 6 | 19 | WRITE |
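The example can be traced in a few lines of Python. The sketch below resolves the effective permission of a person on a node by combining direct grants, grants made to the person's groups, and grants inherited from parent nodes. Group membership follows the prose description (Alexandre in "I.S.C.", Untel and Bidule in "C.N.R.S."), the grants come from the second `permissions` table, and the parent links in `PARENTS` are an assumption, since the `nodes` table shown here has no parent column.

```python
# Permission levels, as defined for gargantext.constants.
READ, WRITE, OWNER = 1, 3, 7

# (person_id, group_id) pairs, following the prose description of the example.
MEMBERSHIPS = {(3, 5), (4, 2), (6, 2)}

# node_id -> parent_id. Hypothetical hierarchy, not given in the tables above.
PARENTS = {12: None, 13: 12, 14: 13}

# (person_id, node_id) -> level: David owns project 12, C.N.R.S. can read it,
# Alexandre can write to corpus 13.
GRANTS = {(1, 12): OWNER, (2, 12): READ, (3, 13): WRITE}


def effective_permission(person_id, node_id):
    """Combine direct, group and inherited grants into one level."""
    holders = {person_id} | {g for p, g in MEMBERSHIPS if p == person_id}
    level = 0
    while node_id is not None:
        for holder in holders:
            level |= GRANTS.get((holder, node_id), 0)
        node_id = PARENTS.get(node_id)
    return level


def can(person_id, node_id, required):
    """Check that every bit of the required level is granted."""
    return (effective_permission(person_id, node_id) & required) == required
```

Under these assumptions, `can(4, 13, READ)` holds because "Untel" belongs to "C.N.R.S.", which can read the parent project, while `can(4, 13, WRITE)` does not.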
# A generic, single database configuration.
[alembic]
# path to migration scripts
script_location = alembic
# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s
# timezone to use when rendering the date
# within the migration file as well as the filename.
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =
# max length of characters to apply to the
# "slug" field
#truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; this defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path
# version_locations = %(here)s/bar %(here)s/bat alembic/versions
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# XXX For database access configuration, see alembic/env.py
#sqlalchemy.url = driver://user:pass@localhost/dbname
[alembic:exclude]
tables = django_* celery_* djcelery_* auth_*
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
Alembic must be installed in the virtualenv in order to use the right Python
paths, so it is installed with pip. The commands described in this short
documentation must be executed from the gargantext root directory, i.e.
/srv/gargantext.
Keep in mind that Alembic only handles SQLAlchemy models: tables created by the
Django ORM must be kept out of Alembic's sight. See the [alembic:exclude]
section in alembic.ini.
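As an illustration of how such a list of table names with `*` wildcards can be turned into a filter, here is a minimal sketch. The helper names `compile_exclude_pattern` and `is_excluded` are hypothetical, and handling an empty list by returning `None` is a choice made for this example.

```python
import re

# Same value as the `tables` option of the [alembic:exclude] section.
EXCLUDED_TABLES = "django_* celery_* djcelery_* auth_*"


def compile_exclude_pattern(tables):
    """Turn space-separated names with `*` wildcards into one regex."""
    globs = tables.split()
    if not globs:
        # An empty alternation would match every table name, so return None.
        return None
    return re.compile("|".join(g.replace("*", ".*") for g in globs))


def is_excluded(table_name, pattern):
    """Return True when the table name starts with an excluded prefix."""
    return pattern is not None and pattern.match(table_name) is not None
```

Because `match` anchors at the beginning of the name, `django_session` and `auth_user` are excluded while `nodes` is left for Alembic to manage.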
To bootstrap Alembic where a gargantext database already exists, see
below: TELL ALEMBIC TO NOT START FROM SCRATCH.
USUAL WORKFLOW WITH ALEMBIC
1. Make changes to models in gargantext/models
2. Autogenerate revision (see below GENERATE A REVISION)
3. Manually check and edit revision file in alembic/versions
4. Commit alembic revision (it should never be reverted)
5. Commit changes in models (it can be reverted if needed)
TELL ALEMBIC TO NOT START FROM SCRATCH
# To upgrade a database populated before Alembic was used in Gargantext,
# don't forget to tell Alembic your current version before running the
# "upgrade head" command. If you don't want to do this, you can of course
# drop your database and really start from scratch.
alembic stamp bedce47c9e34
UPGRADE TO LATEST DATABASE VERSION
alembic upgrade head
DOWNGRADE TO INITIAL DATABASE STATE
# /!\ RUNNING THIS COMMAND WILL CAUSE ALL DATA TO BE LOST WITHOUT ASKING !!
alembic downgrade base
GENERATE A REVISION
alembic revision --autogenerate -m "Message for this migration"
# A migration script is then created in alembic/versions directory. For
# example alembic/versions/3adcc9a56557_message_for_this_migration.py
# where 3adcc9a56557 is the revision id generated by Alembic.
#
# Alembic should generate a script reflecting changes already made in
# models or database. However, it is always a good idea to check it and edit
# it manually: Alembic is not always accurate and can't see all alterations.
# It should work with basic changes such as model or column creation. See
# http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect
GENERATE AN EMPTY REVISION
alembic revision -m "Message for this migration"
# This script must be edited to write the migration itself, mainly
# in `upgrade` and `downgrade` functions. See Alembic documentation for
# further details.
from __future__ import with_statement
from alembic import context
from sqlalchemy import engine_from_config, pool
from logging.config import fileConfig
import re
# Add project root directory in path and setup Django...
import os
import django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'gargantext.settings')
django.setup()
# ...to be able to import gargantext.
from gargantext import settings, models
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
config.set_main_option("sqlalchemy.url", settings.DATABASES['default']['URL'])
# Interpret the config file for Python logging.
# This line sets up loggers basically.
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = models.Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
# Inspired from https://gist.github.com/utek/6163250
def exclude_tables_from_config(config):
    tables = config.get("tables", '').replace('*', '.*').split(' ')
    pattern = '|'.join(tables)
    return re.compile(pattern)
exclude_tables = exclude_tables_from_config(config.get_section('alembic:exclude'))
def include_object(obj, name, typ, reflected, compare_to):
    if typ == "table" and exclude_tables.match(name):
        return False
    else:
        return True
context_opts = dict(
target_metadata=target_metadata,
include_object=include_object,
compare_server_default=True,
compare_type=True,
)
def run_migrations_offline():
    """Run migrations in 'offline' mode.
    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.
    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(url=url, literal_binds=True, **context_opts)
    with context.begin_transaction():
        context.run_migrations()
def run_migrations_online():
    """Run migrations in 'online' mode.
    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix='sqlalchemy.',
        poolclass=pool.NullPool)
    with connectable.connect() as connection:
        context.configure(connection=connection, **context_opts)
        with context.begin_transaction():
            context.run_migrations()
if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from alembic import op
import sqlalchemy as sa
import gargantext
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision = ${repr(up_revision)}
down_revision = ${repr(down_revision)}
branch_labels = ${repr(branch_labels)}
depends_on = ${repr(depends_on)}
def upgrade():
    ${upgrades if upgrades else "pass"}
def downgrade():
    ${downgrades if downgrades else "pass"}
"""Put a timezone on Node.date
Revision ID: 08230100f262
Revises: 601e9d9baa4c
Create Date: 2017-07-06 13:47:10.788569
"""
from alembic import op
import sqlalchemy as sa
import gargantext
# revision identifiers, used by Alembic.
revision = '08230100f262'
down_revision = '601e9d9baa4c'
branch_labels = None
depends_on = None
def upgrade():
    op.alter_column('nodes', 'date', type_=sa.DateTime(timezone=True))
def downgrade():
    op.alter_column('nodes', 'date', type_=sa.DateTime(timezone=False))
"""Fix bug in title_abstract indexation
Revision ID: 159a5154362b
Revises: 73112a361617
Create Date: 2017-09-18 18:00:26.055335
"""
from alembic import op
import sqlalchemy as sa
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '159a5154362b'
down_revision = '73112a361617'
branch_labels = None
depends_on = None
title_abstract_insert = ReplaceableObject(
'title_abstract_insert',
'BEFORE INSERT',
'nodes',
"""FOR EACH ROW
WHEN (NEW.hyperdata::text <> '{}'::text)
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
title_abstract_update = ReplaceableObject(
'title_abstract_update',
'BEFORE UPDATE OF hyperdata',
'nodes',
"""FOR EACH ROW
WHEN ((OLD.hyperdata ->> 'title', OLD.hyperdata ->> 'abstract')
IS DISTINCT FROM
(NEW.hyperdata ->> 'title', NEW.hyperdata ->> 'abstract'))
EXECUTE PROCEDURE title_abstract_update_trigger()"""
)
def upgrade():
    op.replace_trigger(title_abstract_insert, replaces="73112a361617.title_abstract_insert")
    op.replace_trigger(title_abstract_update, replaces="73112a361617.title_abstract_update")
    # Manually re-build index
    op.execute("UPDATE nodes SET title_abstract = to_tsvector('english', (hyperdata ->> 'title') || ' ' || (hyperdata ->> 'abstract')) WHERE typename=4")
def downgrade():
    # Won't unfix the bug !
    pass
"""Add english fulltext index on Nodes.hyperdata for abstract and title
Revision ID: 1fb4405b59e1
Revises: bedce47c9e34
Create Date: 2017-09-13 16:31:36.926692
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy_utils.types import TSVectorType
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '1fb4405b59e1'
down_revision = 'bedce47c9e34'
branch_labels = None
depends_on = None
title_abstract_update_trigger = ReplaceableObject(
'title_abstract_update_trigger()',
"""
RETURNS trigger AS $$
begin
new.title_abstract := to_tsvector('english', (new.hyperdata ->> 'title') || ' ' || (new.hyperdata ->> 'abstract'));
return new;
end
$$ LANGUAGE plpgsql;
"""
)
title_abstract_update = ReplaceableObject(
'title_abstract_update',
'BEFORE INSERT OR UPDATE',
'nodes',
'FOR EACH ROW EXECUTE PROCEDURE title_abstract_update_trigger()'
)
def upgrade():
    op.add_column('nodes', sa.Column('title_abstract', TSVectorType))
    op.create_sp(title_abstract_update_trigger)
    op.create_trigger(title_abstract_update)
    # Initialize index with already existing data
    op.execute('UPDATE nodes SET hyperdata = hyperdata')
def downgrade():
    op.drop_trigger(title_abstract_update)
    op.drop_sp(title_abstract_update_trigger)
    op.drop_column('nodes', 'title_abstract')
"""Use current_user_id() as default nodes.user_id
Revision ID: 291289b47bad
Revises: 4db5dcbe4bc7
Create Date: 2017-10-09 14:58:12.992106
"""
from alembic import op
import sqlalchemy as sa
import gargantext
# revision identifiers, used by Alembic.
revision = '291289b47bad'
down_revision = '4db5dcbe4bc7'
branch_labels = None
depends_on = None
def upgrade():
    op.alter_column('nodes', 'user_id',
                    existing_type=sa.INTEGER(),
                    server_default=sa.text('current_user_id()'),
                    existing_nullable=False)
def downgrade():
    op.alter_column('nodes', 'user_id',
                    existing_type=sa.INTEGER(),
                    server_default=None,
                    existing_nullable=False)
"""Bootstrap access control system
Revision ID: 4db5dcbe4bc7
Revises: 73304ae9f1fb
Create Date: 2017-10-06 17:23:27.765318
"""
from alembic import op
import sqlalchemy as sa
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '4db5dcbe4bc7'
down_revision = '73304ae9f1fb'
branch_labels = None
depends_on = None
# Publicly exposed schema through PostgREST
api_schema = ReplaceableObject("api")
api_nodes_view = ReplaceableObject(
"api.nodes",
"SELECT id, typename AS type, user_id, parent_id, name, date AS created, hyperdata AS data, title_abstract FROM nodes")
# Mere mortals have 'gargantext' role, admin is 'gargantua'
gargantext_role = ReplaceableObject("gargantext", "NOLOGIN")
# PostgREST authentication system; could be used without PostgREST
authenticator_role = ReplaceableObject(
"authenticator",
"LOGIN NOINHERIT PASSWORD 'CHANGEME'")
anon_role = ReplaceableObject("anon", "NOLOGIN")
roles = [gargantext_role, authenticator_role, anon_role]
grants = [
('gargantext', 'gargantua'),
# Enable login through PostgREST auth system for gargantua, anon and
# gargantext
('gargantua, anon, gargantext', 'authenticator'),
# Basic privileges for gargantext role
('CREATE, USAGE ON SCHEMA api', 'gargantext'),
('SELECT ON nodes', 'gargantext'),
('UPDATE (parent_id, name, date, hyperdata) ON nodes', 'gargantext'),
('INSERT ON nodes', 'gargantext'),
('USAGE, SELECT ON SEQUENCE nodes_id_seq', 'gargantext'),
('DELETE ON nodes', 'gargantext'),
]
current_user_id_sp = ReplaceableObject(
"current_user_id()",
"""
-- Assuming JWT and claim.user_id is set to user.id at login
-- https://stackoverflow.com/questions/2082686/how-do-i-cast-a-string-to-integer-and-have-0-in-case-of-error-in-the-cast-with-p
RETURNS integer AS $$
DECLARE
user_id INTEGER NOT NULL DEFAULT 0;
BEGIN
BEGIN
user_id := current_setting('request.jwt.claim.user_id')::int;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Invalid user_id: %. Check JWT generation.',
current_setting('request.jwt.claim.user_id', TRUE);
RETURN -1;
END;
RETURN user_id;
END;
$$ LANGUAGE plpgsql""")
stored_procedures = [current_user_id_sp]
is_owner = "COALESCE(current_user_id() = user_id, FALSE)"
is_parent_owner = "COALESCE(current_user_id() = (SELECT user_id FROM nodes n WHERE id = nodes.parent_id), FALSE)"
owner_select_policy = ReplaceableObject("owner_select", "nodes", "FOR SELECT USING (%s)" % is_owner)
owner_update_policy = ReplaceableObject("owner_update", "nodes", "FOR UPDATE USING (%s)" % is_owner)
owner_insert_policy = ReplaceableObject("owner_insert", "nodes", "FOR INSERT WITH CHECK (%s)" % is_parent_owner)
owner_delete_policy = ReplaceableObject("owner_delete", "nodes", "FOR DELETE USING (%s)" % is_parent_owner)
policies = [owner_select_policy, owner_update_policy, owner_insert_policy,
owner_delete_policy]
def upgrade():
    op.create_schema(api_schema)
    for role in roles:
        op.create_role(role)
    op.create_view(api_nodes_view)
    for grant in grants:
        op.execute('GRANT {} TO {}'.format(*grant))
    op.execute("ALTER VIEW api.nodes OWNER TO gargantext")
    op.execute("ALTER TABLE nodes ENABLE ROW LEVEL SECURITY")
    for sp in stored_procedures:
        op.create_sp(sp)
    for policy in policies:
        op.create_policy(policy)
def downgrade():
    for policy in policies:
        op.drop_policy(policy)
    for sp in stored_procedures:
        op.drop_sp(sp)
    op.execute("ALTER TABLE nodes DISABLE ROW LEVEL SECURITY")
    for grant in grants:
        op.execute('REVOKE {} FROM {}'.format(*grant))
    op.drop_view(api_nodes_view)
    for role in roles:
        op.drop_role(role)
    op.drop_schema(api_schema)
"""Add OCC_HIST & OCC_HIST_PART functions
Revision ID: 601e9d9baa4c
Revises: 932dbf3e8c43
Create Date: 2017-07-06 10:52:16.161118
"""
from alembic import op
import sqlalchemy as sa
from gargantext.util.alembic import ReplaceableObject
# revision identifiers, used by Alembic.
revision = '601e9d9baa4c'
down_revision = '932dbf3e8c43'
branch_labels = None
depends_on = None
# -- OCC_HIST_PART :: Corpus.id -> GroupList.id -> Start -> End
occ_hist_part = ReplaceableObject(
"OCC_HIST_PART(int, int, timestamp, timestamp)",
"""
RETURNS TABLE (ng_id int, score float8)
AS $$
-- EXPLAIN ANALYZE
SELECT
COALESCE(gr.ngram1_id, ng1.ngram_id) as ng_id,
SUM(ng1.weight) as score
from nodes n
-- BEFORE
INNER JOIN nodes as n1 ON n1.id = n.id
INNER JOIN nodes_ngrams ng1 ON ng1.node_id = n1.id
-- Limit with timestamps: ]start, end]
INNER JOIN nodes_hyperdata nh1 ON nh1.node_id = n1.id
AND nh1.value_utc > $3
AND nh1.value_utc <= $4
-- Group List
LEFT JOIN nodes_ngrams_ngrams gr ON ng1.ngram_id = gr.ngram2_id
AND gr.node_id = $2
WHERE
n.typename = 4
AND n.parent_id = $1
GROUP BY 1
$$
LANGUAGE SQL;
"""
)
# -- OCC_HIST :: Corpus.id -> GroupList.id -> MapList.id -> Start -> EndFirst -> EndLast
# -- EXAMPLE USAGE
# -- SELECT * FROM OCC_HIST(182856, 183859, 183866, '1800-03-15 17:00:00+01', '2000-03-15 17:00:00+01', '2017-03-15 17:00:00+01')
occ_hist = ReplaceableObject(
"OCC_HIST(int, int, int, timestamp, timestamp, timestamp)",
"""
RETURNS TABLE (ng_id int, score numeric)
AS $$
WITH OCC1 as (SELECT * from OCC_HIST_PART($1, $2, $4, $5))