# Gargantext Purescript

## About this project

Gargantext is a collaborative web platform for the exploration of sets
of unstructured documents. It combines tools from natural language
processing, text-mining, complex networks analysis and interactive data
visualization to pave the way toward new kinds of interactions with your
digital corpora.

You will not find this software very useful without also running or being
granted access to a [backend](https://gitlab.iscpif.fr/gargantext/haskell-gargantext).

This software is free software, developed by the CNRS Complex Systems
Institute of Paris Île-de-France (ISC-PIF) and its partners.

## Development

### System Dependencies

* NodeJS (11+)
* Yarn (recent)
* A webserver (anything that can serve a static directory will do)

#### NodeJS Installation

On Debian testing, Debian unstable or Ubuntu:

```shell
sudo apt update && sudo apt install nodejs yarn
```

On Debian stable:

```shell
curl -sL https://deb.nodesource.com/setup_11.x | sudo bash -
sudo apt update && sudo apt install nodejs
```

(For Ubuntu)
```shell
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
```

To upgrade Node to the latest version (rather than the current stable
version), you can use the `n` module from npm:

```shell
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
sudo n latest
```


### OSX
```shell
brew install node
```

#### Yarn installation

On Ubuntu:

```shell
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
```

On Mac OS (with Homebrew):

```shell
brew install yarn
```

#### Webservers

Some options:

* The `python3` builtin webserver 
* Caddy

### Purescript and Javascript dependencies

Once you have node and yarn installed, you may install deps with:

```shell
yarn install && yarn add psc-package && yarn install-ps && yarn build
```
You then need to copy `index.html` into the `dist` directory:
```shell
cp src/index.html dist/
```

(Be careful: when updating or upgrading your install, you may need to
remove old files in `node_modules`.)

### Development

You can compile the purescript code with:

```shell
yarn compile
```

Or run a repl:

```shell
yarn repl
```

```shell
yarn install && yarn ps-deps
```

### Running a dev server

```shell
yarn dev
```

This will launch a hot-reloading development server with
webpack-dev-server.  Visit [localhost:9000](http://localhost:9000/) to
see the result when the output shows a line like this:

```
ℹ 「wdm」: Compiled successfully.
```

#### Purescript IDE integration

A `purs ide` connection will be available on port 9002 while the
development server is running.

A guide to getting set up with the IDE integration is beyond the scope
of this document.

#### Source maps

Currently broken. Someone please fix them.

### Getting a purescript repl

```shell
yarn repl
```

### Compiling styles

We use the `sass` compiler for some of the style files. To convert them to CSS, run:

```shell
yarn sass
```

### Building for production

```shell
yarn build
```

It is *not* necessary to `yarn compile` before running `yarn build`.

You can then serve the `dist` directory with your favourite webserver.

Examples:

* `python3 -m http.server --directory dist` (requires Python 3.7+)

<!-- To get a live-reloading development server -->

<!-- ```shell -->
<!-- yarn live -->
<!-- ``` -->

Note that a production build takes a little while.

### How do I?

#### Change which backend to connect to?

Edit `Config.purs`. Find the function `endConfig'` just after the
imports and edit `back`. The definitions are not far below, just after
the definitions of the various `front` options.

Example (using `demo.gargantext.org` as backend):

```
endConfig' :: ApiVersion -> EndConfig
endConfig' v = { front : frontRelative
               , back  : backDemo v  }
```

## Note to the contributors

Please follow CONTRIBUTING.md

### How do I?

#### Add a javascript dependency?

Add it to `package.json`, under `dependencies` if it is needed at
runtime or `devDependencies` if it is not.

#### Add a purescript dependency?

Add it to `psc-package.json` without the `purescript-` prefix.

If it is not in the package set, you will need to read the next section.

#### Add a custom or override package to the local package set?

You need to add an entry to the relevant map in
`packages.dhall`. There are comments in the file explaining how it
works. It's written in dhall, so you can use comments and such.

You will then need to rebuild the package set:

```shell
yarn rebuild-set
```

#### Upgrade the base package set that the local set is based on to the latest?

```shell
yarn rebase-set && yarn rebuild-set
```

This will occasionally result in swearing when you go on to build.

## Theory Introduction

Making sense out of text isn't actually that hard, but it does require
a little background knowledge to understand.

### N-grams

N-grams are at the heart of how Gargantext makes sense out of text.

There are two common meanings in the literature for n-gram:
- a sequence of `n` characters
- a sequence of `n` words

Gargantext is focused on words. Here are some example word n-grams:

- `coffee` (unigram or 1-gram)
- `need coffee` (bigram or 2-gram)
- `one coffee please` (trigram or 3-gram)
- `here is your coffee` (4-gram)
- `i need some more coffee` (5-gram)

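As a rough illustration of "a sequence of `n` words", the sketch below
lists every word n-gram of a given size in a sentence. It is plain
JavaScript for illustration only, not Gargantext's own code: the
function name `wordNgrams` is hypothetical, and splitting on whitespace
(with no punctuation handling) is an assumption.

```javascript
// Hypothetical helper, not part of Gargantext: list the word n-grams of a text.
// Assumption: words are separated by whitespace; punctuation is left untouched.
function wordNgrams(text, n) {
  if (n < 1) return [];
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const ngrams = [];
  // Slide a window of n words across the text, one word at a time.
  for (let i = 0; i + n <= words.length; i++) {
    ngrams.push(words.slice(i, i + n).join(" "));
  }
  return ngrams;
}

// The bigrams of the 5-gram example above:
// "i need", "need some", "some more", "more coffee"
console.log(wordNgrams("I need some more coffee", 2));
```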
N-grams are matched case-insensitively and across whole words. Examples:

| Text         | N-gram       | Matches              |
|--------------|--------------|----------------------|
| `Coffee cup` | `coffee`     | YES                  |
| `Coffee cup` | `off`        | NO, not a whole word |
| `Coffee cup` | `coffee cup` | YES                  |

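The matching rule in the table can be sketched as follows. This is a
minimal JavaScript illustration, not the way Gargantext implements
matching: `matchesNgram` is a hypothetical name, and splitting words on
whitespace is an assumption.

```javascript
// Hypothetical helper, not part of Gargantext: does `ngram` occur in `text`
// as a run of whole words, ignoring case?
// Assumption: words are separated by whitespace; punctuation is not handled.
function matchesNgram(text, ngram) {
  const tokenize = (s) => s.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const words = tokenize(text);
  const target = tokenize(ngram);
  if (target.length === 0) return false;
  // Compare whole words, so "off" never matches inside "coffee".
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, j) => words[i + j] === w)) return true;
  }
  return false;
}

console.log(matchesNgram("Coffee cup", "coffee"));     // true
console.log(matchesNgram("Coffee cup", "off"));        // false
console.log(matchesNgram("Coffee cup", "coffee cup")); // true
```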
You may read more about n-grams [on Wikipedia](https://en.wikipedia.org/wiki/N-gram).

<!-- TODO: Discuss punctuation -->

Gargantext allows you to define n-grams interactively in your browser
and explore the relationships they uncover across a corpus of text.

Various metrics can be applied to n-grams, the most common of which is
the number of times an n-gram appears in a document.

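The occurrence count mentioned above can be sketched like this. It is
an illustrative JavaScript snippet under stated assumptions, not
Gargantext's actual metric code: `countNgram` is a hypothetical name,
words are split on whitespace, and overlapping occurrences are each
counted.

```javascript
// Hypothetical helper, not part of Gargantext: count how many times `ngram`
// appears in `text` as a run of whole words, ignoring case.
// Assumptions: whitespace-separated words; overlapping occurrences all count.
function countNgram(text, ngram) {
  const tokenize = (s) => s.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const words = tokenize(text);
  const target = tokenize(ngram);
  if (target.length === 0) return 0;
  let count = 0;
  // Check every starting position for a whole-word match.
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, j) => words[i + j] === w)) count++;
  }
  return count;
}

console.log(countNgram("Coffee cup next to another coffee cup", "coffee cup")); // 2
console.log(countNgram("Coffee cup", "off")); // 0
```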
## Glossary

document
: One or more texts comprising a single logical document
field
: A portion of a document, e.g. `title`, `abstract`, `body`
corpus
: A collection of documents
n-gram/ngram
: A word or words to be indexed, consisting of `n` words.
  This technically includes skip-grams, but in the general case
  the words will be contiguous.
unigram/1-gram
: A one-word n-gram, e.g. `cow`, `coffee`
bigram/2-gram
: A two-word n-gram, e.g. `coffee cup`
trigram/3-gram
: A three-word n-gram, e.g. `coffee cup holder`
<!-- skip-grams are not yet supported -->
<!-- skip-gram -->
<!-- : An n-gram where the words are not all adjacent -->
<!-- k-skip-n-gram -->
<!-- : An n-gram where the words are at most distance k from each other -->