Gargantext with Haskell (Backend instance)
About the project
GarganText is a collaborative web-decentralized-based macro-service platform for the exploration of unstructured texts. It combines tools from natural language processing, text-data-mining bricks, complex networks analysis algorithms and interactive data visualization tools to pave the way toward new kinds of interactions with your textual and digital corpora.
This is free software ("libre" in the French sense), developed by the CNRS Complex Systems Institute of Paris Île-de-France (ISC-PIF) and its partners.
GarganText Project: this repository builds the backend for the frontend, which is built from purescript-gargantext.
Installation and development
Disclaimer: since this project is still in development, this document remains a work in progress. Please report any issues you encounter and help improve this documentation.
Prerequisites
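You must have the following installed:
- Nix (provides the nix-shell command used below)
- Docker with Docker Compose (used below to start the database and NLP containers)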
Building
Clone the projects
Clone both the backend (haskell-gargantext), and the frontend (purescript-gargantext) at the root of the backend.
$ git clone https://gitlab.iscpif.fr/gargantext/haskell-gargantext.git
$ cd haskell-gargantext
$ git clone https://gitlab.iscpif.fr/gargantext/purescript-gargantext.git
$ # stay in haskell-gargantext: the commands below are run from the backend root
The Nix shell
In what follows, many commands need to be executed from within the Nix shell. To make that clear, those will be prefixed with n$, but you must not actually type n$ before the commands.
To enter a Nix shell, run the following (this will take a moment the first time you run it, be patient):
$ nix-shell
Once you are in a Nix shell, you can run commands like you would in any other shell.
At any point, you can exit a Nix shell and go back to your regular shell by running exit.
If for some reason you do not want to enter a Nix shell, you can still run a command from outside: running the following in a non-Nix shell
$ nix-shell --run "my command"
is equivalent to running my command from within a Nix shell.
(Optional) Disable optimization flags
If you are developing Gargantext, you might be interested in disabling compiler optimizations. This speeds up compilation, but the compiled program itself will be less efficient.
To disable compiler optimizations, copy the file cabal.project.local_toCopy (which contains the flags that disable optimizations) into cabal.project.local (which will be read by Cabal):
$ cp cabal.project.local_toCopy cabal.project.local
Build the frontend
$ cd purescript-gargantext/
$ ./bin/install
$ cd ..
Build the backend
Note: This project can be built with either stack or cabal. We keep cabal.project up to date, which lets us build with cabal by default, but we also support stack thanks to cabal2stack, which generates a valid stack.yaml from a cabal.project. Because gargantext requires a particular set of system dependencies (C++ libraries, toolchains, etc.), we use nix to set up a sandboxed, isolated environment with all of them.
This documentation shows how to build with cabal. For information related to stack, see docs/using_stack.md.
Depending on your situation, there are several ways to build the project:
- Simple build
This will build the project and install the executables gargantext-cli and gargantext-server somewhere on your system.
Depending on your Cabal configuration, this is probably ~/.local/bin/ or ~/.cabal/bin/.
From within the Nix shell, run:
n$ cabal update
n$ cabal install
- Full build
Same as "simple build" above, but also runs tests and builds documentation.
Just run the install script:
$ ./bin/install
- Build and run
Builds and runs the Gargantext server. This has the advantage of letting you run Gargantext without having to know where on your machine the executable is.
Since you will be running Gargantext, you need to have gone through initialization first; see "Initializing and running" below.
From inside a Nix shell:
n$ cabal run gargantext-server -- --ini gargantext.ini --run Prod
Initializing and running
Start containers for database and NLP software bricks
$ cd devops/docker
$ docker compose up
The initialization schema should be loaded automatically from devops/postgres/schema.sql.
Create configuration file
$ cp gargantext.ini_toModify gargantext.ini
.gitignore excludes this file, so you don't need to worry about committing it by mistake, and you can change the passwords in gargantext.ini safely.
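To confirm on your own checkout that the file really is ignored, git check-ignore can be used. The snippet below is only a small sketch; the helper name is_ignored is hypothetical and not part of the project.

```shell
# Hypothetical helper: succeeds (exit status 0) when git ignores the given path.
# Must be run from inside the git repository.
is_ignored() {
  git check-ignore -q "$1"
}
```

For example, `is_ignored gargantext.ini && echo "gargantext.ini is ignored"` prints the message only when the ignore rule is in effect.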
Create master user
From within the Nix shell:
n$ gargantext-cli init --ini-path gargantext.ini
The master user's name is automatically set to gargantua, but you will be prompted for their password and email address.
Running
Make sure you know where gargantext-server is (probably in ~/.local/bin/ or ~/.cabal/bin/). If the location is in your $PATH, just run:
$ gargantext-server --ini gargantext.ini --run Prod
(If the location is not in your $PATH, just prefix gargantext-server with the path to it.)
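The lookup described above can be sketched as a small shell function. This is only an illustration: the name find_garg_server is hypothetical, and the two fallback directories are simply the ones mentioned above, not an exhaustive list.

```shell
# Hypothetical helper: print the path of gargantext-server, or fail.
find_garg_server() {
  # First, check whether the executable is already reachable through $PATH.
  if command -v gargantext-server >/dev/null 2>&1; then
    command -v gargantext-server
    return 0
  fi
  # Otherwise, try the two usual Cabal install locations.
  for dir in "$HOME/.local/bin" "$HOME/.cabal/bin"; do
    if [ -x "$dir/gargantext-server" ]; then
      printf '%s\n' "$dir/gargantext-server"
      return 0
    fi
  done
  echo "gargantext-server not found" >&2
  return 1
}
```

The printed path can then be used in place of the bare gargantext-server command.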
You might want to use the ./start script: it rebuilds the backend, starts the docker containers, and launches the Gargantext server at once.
Running tests
From within the Nix shell:
n$ cabal v2-test --test-show-details=streaming
Or, from "outside":
$ nix-shell --run "cabal v2-test --test-show-details=streaming"
Working on libraries
When development is needed on a library (for instance, the HAL crawler in https://gitlab.iscpif.fr/gargantext/crawlers):
- Ongoing development (on local repo):
  - In cabal.project:
    - add ../hal to packages:
    - turn off (temporarily) the hal entry in source-repository-package
  - When changes work and tests are OK, commit in repo hal
- When changes are committed / merged:
  - Get the hash id, and edit cabal.project with the new commit id
  - run ./bin/update-project-dependencies
    - get an error that the sha256 doesn't match, so update ./bin/update-project-dependencies with the new sha256 hash
    - run ./bin/update-project-dependencies again (to make sure it's a fixed point now)
Note: without stack.yaml we would only have to fix the commit id in cabal.project -> source-repository-package. The sha256 is there to make sure CI reruns the tests.
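For the "get the hash id" step above, one possible way to read the commit id is git rev-parse, run against the local checkout of the library. The helper name library_commit_id is hypothetical; the ../hal path in the usage note matches the example above and would differ for another library.

```shell
# Hypothetical helper: print the full commit id of HEAD in a local library checkout.
# $1 is the path to the checkout (for instance ../hal).
library_commit_id() {
  git -C "$1" rev-parse HEAD
}
```

Running `library_commit_id ../hal` after the changes are merged and pulled prints the id to paste into cabal.project.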
Tooling info
Once you get Gargantext to compile and run on your machine, you will likely want the following:
- Language support (intellisense) in your editor; see docs/editor_setup.md
- Being able to send commands to the Gargantext server from GHCI; see docs/running_commands.md
Use Cases
Multi-User with Graphical User Interface (Server Mode)
$ ~/.local/bin/stack --docker exec gargantext-server -- --ini "gargantext.ini" --run Prod
Then you can log in with user1 / 1resu
Command Line Mode tools
Simple cooccurrences computation and indexation from a list of Ngrams
$ stack --docker exec gargantext-cli -- CorpusFromGarg.csv ListFromGarg.csv Output.json
Analyzing the ngrams table repo
We store the repository in the repos directory, in the CBOR file format. To decode it to JSON and analyze it (for instance with jq), use the following command:
$ cat repos/repo.cbor.v5 | stack exec gargantext-cbor2json | jq .
Documentation
To build documentation, run:
$ stack build --haddock --no-haddock-deps --fast
The generated HTML is then found under .stack-work/dist/x86_64-linux-nix/Cabal-3.2.1.0/doc/html/gargantext (the exact path depends on your platform and Cabal version).
GraphQL
Some introspection information.
Playground is located at http://localhost:8008/gql
List all GraphQL types in the Playground
{
  __schema {
    types {
      name
    }
  }
}
List details about a type in GraphQL
{
  __type(name:"User") {
    fields {
      name
      description
      type {
        name
      }
    }
  }
}
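Introspection queries like these can also be sent from the command line. The sketch below assumes that the /gql endpoint accepts standard GraphQL-over-HTTP POST requests with a JSON body and that no authentication is needed for introspection; neither is confirmed by this document. The function names gql_body and gql_query are hypothetical.

```shell
# Assumption: the endpoint accepts GraphQL-over-HTTP POST requests.
GQL_URL="${GQL_URL:-http://localhost:8008/gql}"

# Wrap a GraphQL query in a JSON body.
# Limitation: the query must not contain double quotes or backslashes,
# since no JSON escaping is done here.
gql_body() {
  printf '{"query":"%s"}' "$1"
}

# Send the query with curl; requires a running Gargantext server.
gql_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
       --data "$(gql_body "$1")" "$GQL_URL"
}
```

For example, `gql_query '{ __schema { types { name } } }' | jq .` would list the type names, provided the assumptions above hold.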
PostgreSQL
Upgrading using Docker
https://www.cloudytuts.com/tutorials/docker/how-to-upgrade-postgresql-in-docker-and-kubernetes/
To upgrade PostgreSQL in Docker containers, for example from 11.x to 14.x, simply run:
$ docker exec -it <container-id> pg_dumpall -U gargantua > 11-db.dump
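Since the following steps may remove the old volume, it can be worth checking the dump before going further. A minimal sketch, assuming the dump was written to 11-db.dump as above; the helper name check_dump is hypothetical, and the check is only a sanity test, not a guarantee that the dump is complete.

```shell
# Hypothetical helper: succeed only if the file looks like a pg_dumpall dump.
check_dump() {
  # -s: the file exists and is not empty.
  [ -s "$1" ] || { echo "$1 is missing or empty" >&2; return 1; }
  # pg_dumpall output normally starts with a comment header naming the dump.
  head -n 5 "$1" | grep -q 'PostgreSQL database cluster dump' \
    || { echo "$1 does not look like a pg_dumpall dump" >&2; return 1; }
}
```

Usage: `check_dump 11-db.dump && echo "dump looks plausible"`.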
Then, shut down the container and set the image in devops/docker/docker-compose.yaml to postgres:14. It is also good practice to create a new volume, say garg-pgdata14, and bind the new container to it. If you want to keep the same volume name, remember to remove the old volume first, like so:
$ docker-compose rm postgres
$ docker volume rm docker_garg-pgdata
Now, start the container and execute:
$ # need to drop the empty DB first, since schema will be created when restoring the dump
$ docker exec -i <new-container-id> dropdb -U gargantua gargandbV5
$ # recreate the db, but empty with no schema
$ docker exec -i <new-container-id> createdb -U gargantua gargandbV5
$ # now we can restore the dump
$ docker exec -i <new-container-id> psql -U gargantua -d gargandbV5 < 11-db.dump
Upgrading using
There is a solution using pg_upgradecluster, but it requires managing both the version 13 and version 14 clusters. Here is a simpler way to upgrade.
First save your data:
$ sudo su postgres
$ pg_dumpall > gargandb.dump
Upgrade postgresql:
$ sudo apt install postgresql-14 postgresql-client-14
$ sudo apt remove --purge postgresql-13
Restore your data:
$ sudo su postgres
$ psql < gargandb.dump
You may need to reset the gargantua password (still as the postgres user):
$ psql -c "ALTER ROLE gargantua PASSWORD 'yourPasswordIn_gargantext.ini'"
You may also need to change the database connection port to 5433 in your gargantext.ini file.