# Gargantext Purescript

## About this project

Gargantext is a collaborative web platform for the exploration of sets
of unstructured documents. It combines tools from natural language
processing, text-mining, complex networks analysis and interactive data
visualization to pave the way toward new kinds of interactions with your
digital corpora.

You will not find this software very useful without also running or being
granted access to a [backend](https://gitlab.iscpif.fr/gargantext/haskell-gargantext).

This software is free software, developed by the CNRS Complex Systems
Institute of Paris Île-de-France (ISC-PIF) and its partners.

## Dependencies

The build requires the following system dependencies preinstalled:

* NodeJS (11+)
* Yarn (Recent)
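
To check whether an existing install already meets these requirements, something like the following may help. This is only a sketch: `node_major_ok` is a hypothetical helper name, not part of this repository, and it assumes `node --version` prints a string of the form `v11.15.0`.

```shell
# Hypothetical helper: succeed if a `node --version` style string
# (assumed to look like "v11.15.0") has a major version of 11 or more.
node_major_ok() {
  major="${1#v}"        # drop the leading "v"
  major="${major%%.*}"  # keep only the text before the first dot
  [ "$major" -ge 11 ]
}

if command -v node >/dev/null 2>&1 && node_major_ok "$(node --version)"; then
  echo "node: OK"
else
  echo "node: 11+ not found"
fi

# Yarn only needs to be recent, so just print its version if present.
command -v yarn >/dev/null 2>&1 && yarn --version || echo "yarn: not found"
```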

### NodeJS

On Debian testing, Debian unstable or Ubuntu:

```shell
sudo apt update && sudo apt install nodejs yarn
```

On Debian stable:

```shell
curl -sL https://deb.nodesource.com/setup_11.x | sudo bash -
sudo apt update && sudo apt install nodejs
```

<!-- TODO: wtf is all this sudo? -->
<!-- To upgrade to latest version (and not current stable) version, you can -->
<!-- use the `n` module from npm to upgrade node: -->

<!-- ```shell -->
<!-- sudo npm cache clean -f -->
<!-- sudo npm install -g n -->
<!-- sudo n stable -->
<!-- sudo n latest -->
<!-- ``` -->

On Mac OS X with homebrew:

```shell
brew install node
```

For other platforms, please refer to [the nodejs website](https://nodejs.org/en/download/).

### Yarn (javascript package manager)

On Debian or Ubuntu:

```shell
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
```

On Mac OS X with homebrew:

```shell
brew install yarn
```

For other platforms, please refer to [the yarn website](https://www.yarnpkg.com/).

## Development

Once you have node and yarn installed, you may install deps with:

```shell
yarn install -D && yarn install-ps
```

You will likely want to check your work in a browser. We provide a
local development webserver that serves on port 5000 for this purpose:

```shell
yarn server
```

To generate a new browser bundle to test:

```shell
yarn build
```

If you are rapidly iterating and just want to type check your code:

```shell
yarn compile
```

You may access a purescript repl if you want to explore:

```shell
yarn repl
```

If you need to reinstall dependencies, such as after a git pull or branch switch:

```shell
yarn install -D && yarn install-ps # both javascript and purescript
```

If something goes wrong building after a deps update, you may clean
build artifacts and try again:

```shell
yarn clean-js # clean javascript, very useful
yarn clean-ps # clean purescript, should never be required, possible purescript bug
yarn clean # clean both purescript and javascript
```

If you edit the SASS, you'll need to rebuild the CSS:

```shell
yarn sass
```

<!-- A `purs ide` connection will be available on port 9002 while the -->
<!-- development server is running. -->

A guide to getting set up with the IDE integration is coming soon, I hope.

### Note to contributors

Please follow `CONTRIBUTING.md`.

### How do I?

#### Change which backend to connect to?

Edit `Config.purs`. Find the function `endConfig'` just after the
imports and change its `back` field. The available backend definitions
are a little further down the file, just after the definitions of the
various `front` options.

Example (using `demo.gargantext.org` as backend):

```
endConfig' :: ApiVersion -> EndConfig
endConfig' v = { front : frontRelative
               , back  : backDemo v  }
```

#### Add a javascript dependency?

Add it to `package.json`, under `dependencies` if it is needed at
runtime or `devDependencies` if it is not.

#### Add a purescript dependency?

Add it to `psc-package.json` without the `purescript-` prefix.

If it is not in the package set, you will need to read the next section.

#### Add a custom or override package to the local package set?

You need to add an entry to the relevant map in
`packages.dhall`. There are comments in the file explaining how it
works. It's written in dhall, so you can use comments and such.

You will then need to rebuild the package set:

```shell
yarn rebuild-set
```

#### Upgrade the base package set that the local set is based on to the latest?

```shell
yarn rebase-set && yarn rebuild-set
```

This will occasionally result in swearing when you go on to build.

## Theory Introduction

Making sense out of text isn't actually that hard, but it does require
a little background knowledge to understand.

### N-grams

N-grams are at the heart of how Gargantext makes sense out of text.

There are two common meanings in the literature for n-gram:
- a sequence of `n` characters
- a sequence of `n` words

Gargantext is focused on words. Here are some example word n-grams:

- `coffee` (unigram or 1-gram)
- `need coffee` (bigram or 2-gram)
- `one coffee please` (trigram or 3-gram)
- `here is your coffee` (4-gram)
- `i need some more coffee` (5-gram)
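
The examples above can be reproduced mechanically by sliding a window of `n` words over a text. The sketch below is purely illustrative: `word_ngrams` is a hypothetical helper, it assumes words are separated by whitespace, and it says nothing about how Gargantext itself tokenises text.

```shell
# Hypothetical helper: print every word n-gram of size $1 found in the text $2.
# Assumes words are separated by whitespace (awk's default field splitting).
word_ngrams() {
  printf '%s\n' "$2" | awk -v n="$1" '{
    # Slide a window of n fields across the line.
    for (i = 1; i + n - 1 <= NF; i++) {
      gram = $i
      for (j = 1; j < n; j++) gram = gram " " $(i + j)
      print gram
    }
  }'
}

word_ngrams 2 "i need some more coffee"
# i need
# need some
# some more
# more coffee
```

A text of `w` words yields `w - n + 1` n-grams of size `n`, so the five-word text above contains four bigrams and exactly one 5-gram.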

N-grams are matched case-insensitively and against whole words. Examples:

| Text         | N-gram       | Matches              |
|--------------|--------------|----------------------|
| `Coffee cup` | `coffee`     | YES                  |
| `Coffee cup` | `off`        | NO, not a whole word |
| `Coffee cup` | `coffee cup` | YES                  |

You may read more about n-grams [on Wikipedia](https://en.wikipedia.org/wiki/N-gram).
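
The matching rule in the table above can be approximated on the command line with `grep`. This is a rough sketch, not Gargantext's actual matcher: `ngram_matches` is a hypothetical helper, and `grep`'s idea of a word boundary (letters, digits and underscores count as word characters) may differ from Gargantext's, especially around punctuation.

```shell
# Hypothetical helper: succeed if the n-gram $2 occurs in the text $1.
#   -i  ignore case
#   -w  only match whole words
#   -F  treat the n-gram as a literal string, not a regular expression
#   -q  print nothing, just set the exit status
ngram_matches() {
  printf '%s\n' "$1" | grep -qiwF -- "$2"
}

ngram_matches "Coffee cup" "coffee"     && echo YES || echo NO   # YES
ngram_matches "Coffee cup" "off"        && echo YES || echo NO   # NO, not a whole word
ngram_matches "Coffee cup" "coffee cup" && echo YES || echo NO   # YES
```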

<!-- TODO: Discuss punctuation -->

Gargantext allows you to define n-grams interactively in your browser
and explore the relationships they uncover across a corpus of text.

Various metrics can be applied to n-grams, the most common of which is
the number of times an n-gram appears in a document.
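
As an illustration of that metric, the number of occurrences of an n-gram in a short text can be counted with `grep`. Again this is only a sketch under assumptions: `count_ngram` is a hypothetical helper, it uses `grep`'s whole-word rule rather than Gargantext's, and it will not find an n-gram that is split across a line break.

```shell
# Hypothetical helper: print how many times the n-gram $2 occurs in the text $1.
#   -o  print each match on its own line, so that wc -l counts matches
#   -i  ignore case, -w whole words only, -F literal string
count_ngram() {
  printf '%s\n' "$1" | grep -oiwF -- "$2" | wc -l | tr -d ' '
}

count_ngram "Coffee cup, coffee pot, more coffee" "coffee"   # 3
count_ngram "Coffee cup, coffee pot, more coffee" "off"      # 0
```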

## Glossary

document
: One or more texts comprising a single logical document
field
: A portion of a document, e.g. `title`, `abstract`, `body`
corpus
: A collection of documents
n-gram/ngram
: A word or words to be indexed, consisting of `n` words.
  This technically includes skip-grams, but in the general case
  the words will be contiguous.
unigram/1-gram
: A one-word n-gram, e.g. `cow`, `coffee`
bigram/2-gram
: A two-word n-gram, e.g. `coffee cup`
trigram/3-gram
: A three-word n-gram, e.g. `coffee cup holder`
skip-gram
: An n-gram where the words are not all adjacent. Not yet supported.
k-skip-n-gram
: An n-gram where the words are at most distance k from each other.
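
To make the last two entries more concrete, here is a small sketch that lists the k-skip bigrams of a text. It rests on an assumption: "distance" is read here as the number of words skipped between the two words of the pair, so `k = 0` gives ordinary bigrams. `skip_bigrams` is a hypothetical helper and does not reflect anything in Gargantext, where skip-grams are not yet supported.

```shell
# Hypothetical helper: print every pair of words in the text $2 that has
# at most $1 other words between them (k-skip bigrams, under the
# assumption stated above). Assumes whitespace-separated words.
skip_bigrams() {
  printf '%s\n' "$2" | awk -v k="$1" '{
    for (i = 1; i < NF; i++)
      # j - i - 1 is the number of words skipped between word i and word j.
      for (j = i + 1; j <= NF && j - i - 1 <= k; j++)
        print $i " " $j
  }'
}

skip_bigrams 1 "one coffee please"
# one coffee
# one please
# coffee please
```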