# Gargantext Purescript

## About this project

Gargantext is a collaborative web platform for the exploration of sets
of unstructured documents. It combines tools from natural language
processing, text-mining, complex networks analysis and interactive data
visualization to pave the way toward new kinds of interactions with your
digital corpora.

You will not find this software very useful without also running or being
granted access to a [backend](https://gitlab.iscpif.fr/gargantext/haskell-gargantext).

This software is free software, developed by the CNRS Complex Systems
Institute of Paris Île-de-France (ISC-PIF) and its partners.

## Getting set up

There are two approaches to working with the build:
1. Use our docker setup
2. Install our dependencies yourself

The javascript ecosystem kind of assumes if you're on linux, you're
running on debian or ubuntu. I haven't yet managed to get garg to
build on alpine linux, for example. If you're on an oddball system, I
*strongly* recommend you just use the docker setup.

### Docker setup

You will need docker and docker-compose installed.

First, source our environment file:

```shell
source ./env.sh
```

WARNING: you must `source ./env.sh` before using the docker
container. If you don't do that, the container will write files as
root and you'll need root powers to get ownership back!

Now build the docker image:

```shell
docker-compose build frontend
```

That's it, skip ahead to "Development".

### Manual setup

The build requires the following system dependencies preinstalled:

* NodeJS (11+)
* Yarn (Recent)

#### NodeJS

On debian testing, debian unstable or ubuntu:

```shell
sudo apt update && sudo apt install nodejs yarn
```

On debian stable:

```shell
curl -sL https://deb.nodesource.com/setup_11.x | sudo bash -
sudo apt update && sudo apt install nodejs
```

<!-- TODO: wtf is all this sudo? -->
<!-- To upgrade to latest version (and not current stable) version, you can -->
<!-- use the `n` module from npm to upgrade node: -->

<!-- ```shell -->
<!-- sudo npm cache clean -f -->
<!-- sudo npm install -g n -->
<!-- sudo n stable -->
<!-- sudo n latest -->
<!-- ``` -->

On Mac OS X with homebrew:

```shell
brew install node
```

For other platforms, please refer to [the nodejs website](https://nodejs.org/en/download/).

#### Yarn (javascript package manager)

On debian or ubuntu:

```shell
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
```

On Mac OS X with homebrew:

```shell
brew install yarn
```

For other platforms, please refer to [the yarn website](https://www.yarnpkg.com/).

## Development

### Docker environment

Are you using the docker setup? Run this:

```shell
source ./env.sh
```

This enables the docker container to run as the current user so any
files it writes will be readable by you. It also creates a `darn`
shell alias (short for `docker yarn`) for running yarn commands inside
the docker container.

### Basic tasks

Now we must install our javascript and purescript dependencies:  
*Note: if you're installing manually you might also need to manually install [psc-package](https://github.com/purescript/psc-package)*

```shell
darn install -D && darn install-ps # for docker setup
yarn install -D && yarn install-ps # for manual setup
```

You will likely want to check your work in a browser. We provide a
local development webserver that serves on port 5000 for this purpose:

```shell
darn server # for docker setup
yarn server # for manual setup
```

To generate a new browser bundle to test:

```shell
darn build # for docker setup
yarn build # for manual setup
```

If you are rapidly iterating and just want to type check your code:

```shell
darn compile # for docker setup
yarn compile # for manual setup
```

You may access a purescript repl if you want to explore:

```shell
darn repl # for docker setup
yarn repl # for manual setup
```

If you need to reinstall dependencies, such as after a git pull or branch switch:

```shell
darn install -D && darn install-ps # for docker setup
yarn install -D && yarn install-ps # for manual setup
```

If something goes wrong building after a deps update, you may clean
build artifacts and try again:

```shell
# for docker setup
darn clean-js # clean javascript, very useful
darn clean-ps # clean purescript, should never be required, possible purescript bug
darn clean # clean both purescript and javascript
# for manual setup
yarn clean-js
yarn clean-ps
yarn clean
```

If you edit the SASS, you'll need to rebuild the CSS:

```shell
darn css # for docker setup
yarn css # for manual setup
```

<!-- A `purs ide` connection will be available on port 9002 while the -->
<!-- development server is running. -->

A guide to getting set up with the IDE integration is coming soon, I hope.

### Testing

To run unit tests, just run:

```shell
darn test-ps # for docker setup
yarn test-ps # for manual setup
```

### Note to contributors

Please follow the guidelines in `CONTRIBUTING.md`.

### How do I?

#### Add a javascript dependency?

Add it to `package.json`, under `dependencies` if it is needed at
runtime or `devDependencies` if it is not.

#### Add a purescript dependency?

Add it to `psc-package.json` without the `purescript-` prefix.

If it is not in the package set, you will need to read the next section.

#### Add a custom or override package to the local package set?

You need to add an entry to the relevant map in
`packages.dhall`. There are comments in the file explaining how it
works. It's written in dhall, so you can use comments and such.

You will then need to rebuild the package set:

```shell
yarn rebuild-set # or darn rebuild-set
```

#### Upgrade the base package set that the local set is based on to the latest version?

```shell
yarn rebase-set && yarn rebuild-set # or darn rebase-set && darn rebuild-set
```

This will occasionally result in swearing when you go on to build.

## Theory Introduction

Making sense out of text isn't actually that hard, but it does require
a little background knowledge to understand.

### N-grams

N-grams are at the heart of how Gargantext makes sense out of text.

There are two common meanings in the literature for n-gram:
- a sequence of `n` characters
- a sequence of `n` words

Gargantext is focused on words. Here are some example word n-grams:

- `coffee` (unigram or 1-gram)
- `need coffee` (bigram or 2-gram)
- `one coffee please` (trigram or 3-gram)
- `here is your coffee` (4-gram)
- `i need some more coffee` (5-gram)
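
As a rough illustration of the definition (not of how Gargantext itself tokenizes text), the hypothetical shell function `ngrams` below lowercases its input and prints every contiguous run of `n` words, one per line. It splits on whitespace only, so punctuation stays attached to words.

```shell
# Hypothetical helper: print each contiguous word n-gram of size $1 found
# in the text on stdin, lowercased, one per line. Words are split on
# whitespace only; this is an illustration, not Gargantext's tokenizer.
ngrams() {
  local n="$1"
  tr '[:upper:]' '[:lower:]' | awk -v n="$n" '
    {
      # Each input line is treated as its own sequence of words.
      for (i = 1; i + n - 1 <= NF; i++) {
        gram = $i
        for (j = i + 1; j < i + n; j++) gram = gram " " $j
        print gram
      }
    }'
}

echo "I need some more coffee" | ngrams 2
# i need
# need some
# some more
# more coffee
```

A line with fewer than `n` words produces no n-grams at all.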

N-grams are matched case-insensitively and only against whole words. Examples:

| Text         | N-gram       | Matches              |
|--------------|--------------|----------------------|
| `Coffee cup` | `coffee`     | YES                  |
| `Coffee cup` | `off`        | NO, not a whole word |
| `Coffee cup` | `coffee cup` | YES                  |
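
The matching rules in the table can be approximated with `grep`: `-i` ignores case, `-w` only accepts matches bounded by non-word characters, and `-F` treats the n-gram as a literal string rather than a regular expression. The `matches` helper is hypothetical, and grep's notion of a word character (letters, digits and underscore) is an assumption that may not agree with Gargantext's exactly, for example around punctuation.

```shell
# Hypothetical helper: print YES if the n-gram $2 occurs in the text $1,
# ignoring case (-i), only as whole words (-w), and treating the n-gram
# as a literal string rather than a regular expression (-F).
matches() {
  if printf '%s\n' "$1" | grep -qiwF -- "$2"; then
    echo YES
  else
    echo NO
  fi
}

matches "Coffee cup" "coffee"      # YES
matches "Coffee cup" "off"         # NO, not a whole word
matches "Coffee cup" "coffee cup"  # YES
```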

You may read more about n-grams [on Wikipedia](https://en.wikipedia.org/wiki/N-gram).

<!-- TODO: Discuss punctuation -->

Gargantext allows you to define n-grams interactively in your browser
and explore the relationships they uncover across a corpus of text.

Various metrics can be applied to n-grams, the most common of which is
the number of times an n-gram appears in a document.
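
A minimal sketch of that metric, assuming a plain-text document and the same case-insensitive, whole-word matching rules: `grep -o` prints each match on its own line and `wc -l` counts them. The `count_ngram` helper and the example document are made up for illustration; this is not how Gargantext computes its metrics.

```shell
# Hypothetical helper: count case-insensitive, whole-word occurrences of
# the n-gram $1 in the document file $2. grep -o prints one line per match
# and wc -l counts those lines.
count_ngram() {
  grep -oiwF -- "$1" "$2" | wc -l
}

# A throwaway example document.
doc="$(mktemp)"
printf 'Coffee first.\nThen more coffee, and a coffee cup.\n' > "$doc"

count_ngram "coffee" "$doc"      # 3
count_ngram "coffee cup" "$doc"  # 1

rm -f "$doc"
```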

## Glossary

document
: One or more texts comprising a single logical document
field
: A portion of a document, e.g. `title`, `abstract`, `body`
corpus
: A collection of documents
n-gram/ngram
: A word or words to be indexed, consisting of `n` words.
  This technically includes skip-grams, but in the general case
  the words will be contiguous.
unigram/1-gram
: A one-word n-gram, e.g. `cow`, `coffee`
bigram/2-gram
: A two-word n-gram, e.g. `coffee cup`
trigram/3-gram
: A three-word n-gram, e.g. `coffee cup holder`
skip-gram
: An n-gram where the words are not all adjacent. Not yet supported.
k-skip-n-gram
: An n-gram where the words are at most distance k from each other.
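
Skip-grams are not yet supported, but the idea is easy to sketch for bigrams. The hypothetical `skip_bigrams` function below reads "at most distance k" as "at most `k` words skipped between the pair", which is an assumption about the intended definition; with `k` set to 0 it produces ordinary bigrams.

```shell
# Hypothetical helper: print the k-skip-bigrams of each input line, i.e.
# every ordered pair of words with at most $1 other words skipped between
# them. With k = 0 this gives ordinary bigrams.
skip_bigrams() {
  local k="$1"
  tr '[:upper:]' '[:lower:]' | awk -v k="$k" '
    {
      for (i = 1; i < NF; i++)
        for (j = i + 1; j <= NF && j - i - 1 <= k; j++)
          print $i " " $j
    }'
}

echo "coffee cup holder" | skip_bigrams 1
# coffee cup
# coffee holder
# cup holder
```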