# Gargantext with Purescript (FrontEnd instance)

## About the project

GarganText is a collaborative, decentralized, web-based macro-service
platform for exploring unstructured texts. It combines natural language
processing tools, text-mining techniques, complex network analysis
algorithms and interactive data visualization to pave the way toward
new kinds of interactions with your digital corpora.

This software is free software, developed and offered by the CNRS
Complex Systems Institute of Paris Île-de-France (ISC-PIF) and its
partners.

GarganText Project: this repository builds the frontend for the server
provided by the [backend](https://gitlab.iscpif.fr/gargantext/haskell-gargantext) repository.

## Getting set up

There are two approaches to working with the build:
1. Use our Nix or Docker setup
2. Install our dependencies yourself

### With Nix setup

First install [nix](https://nixos.org/guides/install-nix.html): 

```shell
sh <(curl -L https://nixos.org/nix/install) --daemon
```

Verify that the installation is complete:

```shell
$ nix-env --version
nix-env (Nix) 2.3.12
```

To build the frontend, run:

```shell
nix-shell --run build
```

Then serve `dist/index.html` with any web server, and you are ready to
connect to any backend.

### With Docker setup

You will need docker and docker-compose installed.

First, source our environment file:

```shell
source ./env.sh
```

WARNING: you must `source ./env.sh` before using the docker
container. If you don't do that, the container will write files as
root and you'll need root powers to get ownership back!

Now build the docker image:

```shell
docker-compose build frontend
```

That's it, skip ahead to "Development".

### Manual setup

The build requires the following system dependencies preinstalled:

* NodeJS (11+)
* Yarn (Recent)

#### NodeJS

On Debian testing, Debian unstable or Ubuntu:

```shell
sudo apt update && sudo apt install nodejs yarn
```

On Debian stable:

```shell
curl -sL https://deb.nodesource.com/setup_11.x | sudo bash -
sudo apt update && sudo apt install nodejs
```

<!-- TODO: wtf is all this sudo? -->
<!-- To upgrade to latest version (and not current stable) version, you can -->
<!-- use the `n` module from npm to upgrade node: -->

<!-- ```shell -->
<!-- sudo npm cache clean -f -->
<!-- sudo npm install -g n -->
<!-- sudo n stable -->
<!-- sudo n latest -->
<!-- ``` -->

On macOS with Homebrew:

```shell
brew install node
```

For other platforms, please refer to [the nodejs website](https://nodejs.org/en/download/).

#### Yarn (javascript package manager)

On Debian or Ubuntu:

```shell
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
```

On macOS with Homebrew:

```shell
brew install yarn
```

For other platforms, please refer to [the yarn website](https://www.yarnpkg.com/).

#### Purescript build tools

Once you have yarn installed you can install the necessary purescript build tools:

```shell
yarn global add purescript spago pulp
```

To use these tools, you may need to add Yarn's global package install location to your `PATH`. On Linux, you can do this by adding the following line at the end of your `.bashrc` file:

```shell
export PATH="$(yarn global bin):$PATH"
```
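After reloading your shell, a quick check like the one below can confirm that the tools are reachable. This is only a sketch: `missing_tools` is a helper name made up for this example, and it assumes the packages above install the executables `purs`, `spago` and `pulp`.

```shell
# missing_tools: hypothetical helper (not part of this repo).
# Prints one "missing: NAME" line per command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to ask whether a command resolves
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}

# Assumed executable names; prints nothing when all three are found
missing_tools purs spago pulp
```

If a tool is reported as missing, check that the `export PATH=...` line is in the startup file your shell actually reads.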

## Development

### Docker environment

Are you using the docker setup? Run this:

```shell
source ./env.sh
```

This enables the docker container to run as the current user so any
files it writes will be readable by you. It also creates a `darn`
shell alias (short for `docker yarn`) for running yarn commands inside
the docker container.

### Basic tasks

Now we must install our JavaScript and PureScript dependencies:

```shell
darn install -D # for docker setup
yarn install -D # for manual setup
```

You will likely want to check your work in a browser. We provide a
local development webserver that serves on port 5000 for this purpose:

```shell
darn server # for docker setup
yarn server # for manual setup
```

To generate a new browser bundle to test:

```shell
darn build # for docker setup
yarn build # for manual setup
```

You can open a PureScript REPL if you want to explore:

```shell
darn repl # for docker setup
yarn repl # for manual setup
```

If you need to reinstall dependencies, for example after a git pull or branch switch:

```shell
darn install -D && darn install-ps # for docker setup
yarn install -D && yarn install-ps # for manual setup
```

If the build fails after a dependency update, you can clean the
build artifacts and try again:

```shell
# for docker setup
darn clean-js # clean javascript, very useful
darn clean-ps # clean purescript, should never be required, possible purescript bug
darn clean # clean both purescript and javascript
# for manual setup
yarn clean-js
yarn clean-ps
yarn clean
```

If you edit the SASS, you'll need to rebuild the CSS:

```shell
darn css # for docker setup
yarn css # for manual setup
```

<!-- A `purs ide` connection will be available on port 9002 while the -->
<!-- development server is running. -->

A guide to getting set up with the IDE integration is coming soon.

### Testing

To run unit tests, just run:

```shell
test-ps
```

### Note to contributors

Please follow the guidelines in `CONTRIBUTING.md`.

### How do I?

#### Add a javascript dependency?

Add it to `package.json`, under `dependencies` if it is needed at
runtime or `devDependencies` if it is not.

#### Add a purescript dependency?

Add it to `spago.dhall` (or run `spago install ...`).

If it is not in the package set, you will need to read the next section.

#### Add a custom or override package to the local package set?

You need to add an entry to the relevant map in
`packages.dhall`. There are comments in the file explaining how it
works. It's written in dhall, so you can use comments and such.

## Theory Introduction

Making sense out of text isn't actually that hard, but it does require
a little background knowledge.

### N-grams

N-grams in contexts (of texts) are at the heart of how Gargantext makes
sense out of text.

There are two common meanings in the literature for n-gram:
- a sequence of `n` characters
- a sequence of `n` words

GarganText focuses on words. Here are some examples of word n-grams
typically extracted by our natural language processing toolkit:

- `coffee` (unigram or 1-gram)
- `black coffee` (bigram or 2-gram)
- `hot black coffee` (trigram or 3-gram)
- `arabica hot black coffee` (4-gram)
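To make the idea concrete, here is a minimal shell sketch that lists every contiguous word n-gram of a given size. It is not GarganText's extraction code (the real toolkit is more selective about which n-grams it keeps), and `word_ngrams` is a name invented for this example.

```shell
# word_ngrams N TEXT: hypothetical helper, for illustration only.
# Prints every contiguous N-word sequence of TEXT, lowercased, one per line.
word_ngrams() {
  printf '%s\n' "$2" |
    tr '[:upper:]' '[:lower:]' |
    awk -v n="$1" '{
      # slide a window of n words over the line
      for (i = 1; i + n - 1 <= NF; i++) {
        gram = $i
        for (j = 1; j < n; j++) gram = gram " " $(i + j)
        print gram
      }
    }'
}

word_ngrams 2 "Arabica hot black coffee"
# arabica hot
# hot black
# black coffee
```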

N-grams are matched case-insensitively and across whole words, removing
the linked syntax if it exists. Examples:

| Text         | N-gram       | Matches              |
|--------------|--------------|----------------------|
| `Coffee cup` | `coffee`     | YES                  |
| `Coffee cup` | `off`        | NO, not a whole word |
| `Coffee cup` | `coffee cup` | YES                  |

You may read more about n-grams [on wikipedia](https://en.wikipedia.org/wiki/N-gram).
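The matching rule in the table above can be approximated with `grep`. This is a rough sketch rather than the matcher GarganText uses: `ngram_matches` is a made-up helper, it assumes a `grep` that supports `-w` (GNU and BSD grep both do), and it does not try to model the removal of linked syntax.

```shell
# ngram_matches NGRAM TEXT: hypothetical helper, for illustration only.
# Succeeds (exit status 0) when NGRAM occurs in TEXT, ignoring case,
# and only on whole-word boundaries.
ngram_matches() {
  # -q: quiet, -i: ignore case, -w: whole words only, -F: fixed string
  printf '%s\n' "$2" | grep -qiwF -- "$1"
}

ngram_matches "coffee"     "Coffee cup" && echo YES || echo NO   # YES
ngram_matches "off"        "Coffee cup" && echo YES || echo NO   # NO, not a whole word
ngram_matches "coffee cup" "Coffee cup" && echo YES || echo NO   # YES
```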

<!-- TODO: Discuss punctuation -->

Gargantext allows you to define and refine n-grams interactively in your
browser and explore the relationships they uncover across a corpus of
text.

Various metrics can be applied to n-grams, the most common of which
is the number of times an n-gram appears in a document (occurrences).
GarganText makes extensive use of cooccurrences: the number of times
two n-grams appear in the same context of text.
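As a rough illustration of these two metrics, the sketch below counts them over a plain text file. It makes several assumptions that are not GarganText's: each line of the file stands for one context, matching is whole-word and case-insensitive via `grep` (GNU or BSD, for `-o` and `-w`), and the helper names `occurrences` and `cooccurrences` are invented for this example.

```shell
# Both helpers are hypothetical and treat each line of FILE as one context.

# occurrences NGRAM FILE: total number of whole-word, case-insensitive hits.
occurrences() {
  # -o prints each match on its own line, so counting lines counts matches
  grep -oiwF -- "$1" "$2" | wc -l | tr -d ' '
}

# cooccurrences NGRAM_A NGRAM_B FILE: number of lines containing both n-grams.
cooccurrences() {
  # the first grep keeps lines with NGRAM_A, the second counts those that
  # also contain NGRAM_B; `|| true` because grep -c exits non-zero on a 0 count
  grep -iwF -- "$1" "$3" | grep -ciwF -- "$2" || true
}

corpus=$(mktemp)
printf '%s\n' \
  'Hot black coffee in a coffee cup' \
  'A cup of green tea' \
  'Black coffee, no cup' > "$corpus"

occurrences "coffee" "$corpus"           # 3
cooccurrences "coffee" "cup" "$corpus"   # 2
rm -f "$corpus"
```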

## Glossary

document
: One or more texts comprising a single logical document
field
: A portion of a document or metadata, e.g. `title`, `abstract`, `body`
corpus
: A collection of documents as a set (with no repetition)
n-gram/ngram
: A word or words to be indexed, consisting of `n` words.
  This technically includes skip-grams, but in the general case
  the words will be contiguous.
unigram/1-gram
: A one-word n-gram, e.g. `cow`, `coffee`
bigram/2-gram
: A two-word n-gram, e.g. `coffee cup`
trigram/3-gram
: A three-word n-gram, e.g. `coffee cup holder`
skip-gram
: An n-gram where the words are not all adjacent. Group two different
  n-grams to enable this feature.
k-skip-n-gram
: An n-gram where the words are at most distance `k` from each other. This
  feature is used for advanced research in text (not yet supported in
  GarganText).
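
For readers new to the term, here is a small sketch of k-skip bigrams (the `n` = 2 case). It is purely illustrative, since GarganText does not support this yet. It assumes "distance" means the number of words skipped between the two members of the pair, and `skip_bigrams` is a name invented for this example.

```shell
# skip_bigrams K TEXT: hypothetical helper, for illustration only.
# Prints every ordered word pair of TEXT separated by at most K
# intervening words (K = 0 yields ordinary bigrams).
skip_bigrams() {
  printf '%s\n' "$2" |
    awk -v k="$1" '{
      for (i = 1; i < NF; i++)
        # j - i - 1 is the number of words skipped between $i and $j
        for (j = i + 1; j <= NF && j - i - 1 <= k; j++)
          print $i " " $j
    }'
}

skip_bigrams 1 "hot black coffee"
# hot black
# hot coffee
# black coffee
```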