Gargantext Purescript

About this project

Gargantext is a collaborative web platform for the exploration of sets of unstructured documents. It combines tools from natural language processing, text-mining, complex networks analysis and interactive data visualization to pave the way toward new kinds of interactions with your digital corpora.

This repo contains the frontend (client). It needs either a backend server running or access to an existing backend.

This software is free software, developed by the CNRS Complex Systems Institute of Paris Île-de-France (ISC-PIF) and its partners.

Getting set up

There are two approaches to working with the build:

  1. Use our docker setup
  2. Install our dependencies yourself

The javascript ecosystem kind of assumes that if you're on linux, you're running debian or ubuntu. I haven't yet managed to get garg to build on alpine linux, for example. If you're on an oddball system, I strongly recommend you just use the docker setup.

Docker setup

You will need docker and docker-compose installed.

First, source our environment file:

source ./env.sh

WARNING: you must source ./env.sh before using the docker container. If you don't do that, the container will write files as root and you'll need root powers to get ownership back!

Now build the docker image:

docker-compose build frontend

That's it, skip ahead to "Development".

Manual setup

The build requires the following system dependencies preinstalled:

  • NodeJS (11+)
  • Yarn (Recent)

NodeJS

On debian testing, debian unstable or ubuntu:

sudo apt update && sudo apt install nodejs yarn

On debian stable:

curl -sL https://deb.nodesource.com/setup_11.x | sudo bash -
sudo apt update && sudo apt install nodejs

On Mac OS X with homebrew:

brew install node

For other platforms, please refer to the nodejs website.

Yarn (javascript package manager)

On debian or ubuntu:

curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn

On Mac OS X with homebrew:

brew install yarn

For other platforms, please refer to the yarn website.

Development

Docker environment

Are you using the docker setup? Run this:

source ./env.sh

This enables the docker container to run as the current user so any files it writes will be readable by you. It also creates a darn shell alias (short for docker yarn) for running yarn commands inside the docker container.

Basic tasks

Note: if you're using the manual setup, you might also need to install psc-package yourself.

Now we must install our javascript and purescript dependencies:

darn install -D && darn install-ps # for docker setup
yarn install -D && yarn install-ps # for manual setup

You will likely want to check your work in a browser. We provide a local development webserver that serves on port 5000 for this purpose:

darn server # for docker setup
yarn server # for manual setup

To generate a new browser bundle to test:

darn build # for docker setup
yarn build # for manual setup

If you are rapidly iterating and just want to type check your code:

darn compile # for docker setup
yarn compile # for manual setup

You may access a purescript repl if you want to explore:

darn repl # for docker setup
yarn repl # for manual setup

If you need to reinstall dependencies, such as after a git pull or branch switch:

darn install -D && darn install-ps # for docker setup
yarn install -D && yarn install-ps # for manual setup

If something goes wrong building after a deps update, you may clean build artifacts and try again:

# for docker setup
darn clean-js # clean javascript, very useful
darn clean-ps # clean purescript, should never be required, possible purescript bug
darn clean # clean both purescript and javascript
# for manual setup
yarn clean-js
yarn clean-ps
yarn clean

If you edit the SASS, you'll need to rebuild the CSS:

darn css # for docker setup
yarn css # for manual setup

A guide to getting set up with the IDE integration is coming soon.

Testing

To run unit tests, just run:

darn test-ps # for docker setup
yarn test-ps # for manual setup

Note to contributors

Please follow CONTRIBUTING.md

How do I?

Add a javascript dependency?

Add it to package.json, under dependencies if it is needed at runtime or devDependencies if it is not.

Add a purescript dependency?

Add it to psc-package.json without the purescript- prefix.

If it is not in the package set, you will need to read the next section.

Add a custom or override package to the local package set?

You need to add an entry to the relevant map in packages.dhall. There are comments in the file explaining how it works. It's written in dhall, so you can use comments and such.

You will then need to rebuild the package set:

yarn rebuild-set # or darn rebuild-set

Upgrade the base package set that the local set is based on to the latest?

yarn rebase-set && yarn rebuild-set # or darn rebase-set && darn rebuild-set

Theory Introduction

Making sense out of text isn't actually that hard, but it does require a little background knowledge to understand.

N-grams

N-grams in contexts (of texts) are at the heart of how Gargantext makes sense out of text.

There are two common meanings in the literature for n-gram:

  • a sequence of n characters
  • a sequence of n words

Gargantext is focused on words. Here are some example word n-grams usually extracted by our Natural Language Processing toolkit:

  • coffee (unigram or 1-gram)
  • black coffee (bigram or 2-gram)
  • hot black coffee (trigram or 3-gram)
  • arabica hot black coffee (4-gram)
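The list above can be reproduced with a minimal sketch. It assumes a very simple tokenizer (lowercase, split on runs of word characters) and enumerates every contiguous window of n words. The function name word_ngrams is hypothetical and is not part of Gargantext, whose toolkit selects meaningful n-grams rather than all windows.

```python
import re


def word_ngrams(text, n):
    """Return every contiguous word n-gram of length n found in text.

    Assumption: words are runs of \\w characters, compared in lowercase.
    """
    words = re.findall(r"\w+", text.lower())
    # Slide a window of n words over the token list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, word_ngrams("arabica hot black coffee", n))
```

If the text has fewer than n words, the result is an empty list.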

N-grams are matched case-insensitively and across whole words, removing the linked syntax if it exists. Examples:

Text         N-gram       Matches
Coffee cup   coffee       YES
Coffee cup   off          NO, not a whole word
Coffee cup   coffee cup   YES
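The matching rule in these examples can be sketched with a regular expression. This is an illustration under stated assumptions, not Gargantext's implementation: the function name ngram_matches is hypothetical, "whole word" is approximated with regex word boundaries, and words of an n-gram may be separated by any whitespace.

```python
import re


def ngram_matches(text, ngram):
    """Return True if ngram occurs in text, case-insensitively, on whole words."""
    words = ngram.split()
    if not words:
        return False
    # \b anchors the match on word boundaries, so "off" cannot match inside "Coffee".
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    for ngram in ("coffee", "off", "coffee cup"):
        print(repr(ngram), ngram_matches("Coffee cup", ngram))
```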

You may read more about n-grams on wikipedia.

Gargantext allows you to define and refine n-grams interactively in your browser and explore the relationships they uncover across a corpus of text.

Various metrics can be applied to n-grams, the most common of which is the number of times an n-gram appears in a document (occurrences). GarganText makes extensive use of cooccurrences: the number of times two n-grams appear in the same context of text.
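Both metrics can be illustrated with a short sketch. The assumptions are loud ones: a "context" is taken to be a plain string (for example a title or an abstract), a cooccurrence is counted once per context in which both n-grams appear, and the names count_occurrences and cooccurrences are hypothetical rather than Gargantext's API.

```python
import re
from collections import Counter
from itertools import combinations


def count_occurrences(text, ngram):
    """Count case-insensitive, whole-word occurrences of ngram in text."""
    words = ngram.split()
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def cooccurrences(contexts, ngrams):
    """Count, for each pair of n-grams, the contexts in which both appear."""
    counts = Counter()
    for context in contexts:
        # Sort so each pair always has the same key order.
        present = sorted(ng for ng in set(ngrams) if count_occurrences(context, ng) > 0)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    contexts = [
        "Hot black coffee in a coffee cup",
        "A coffee cup on the table",
        "Black coffee, no sugar",
    ]
    print(count_occurrences(contexts[0], "coffee"))
    print(cooccurrences(contexts, ["black coffee", "coffee cup", "sugar"]))
```

Counting once per context keeps the measure independent of how often each n-gram repeats inside a single context. Other weightings are possible.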

Glossary

document : One or more texts comprising a single logical document
field : A portion of a document or metadata, e.g. title, abstract, body
corpus : A collection of documents as a set (with no repetition)
n-gram/ngram : A word or words to be indexed, consisting of n words. This technically includes skip-grams, but in the general case the words will be contiguous.
unigram/1-gram : A one-word n-gram, e.g. cow, coffee
bigram/2-gram : A two-word n-gram, e.g. coffee cup
trigram/3-gram : A three-word n-gram, e.g. coffee cup holder
skip-gram : An n-gram where the words are not all adjacent. Group two different n-grams to enable this feature.
k-skip-n-gram : An n-gram where the words are at most distance k from each other. This feature is used for advanced research in text (not yet supported in GarganText)