Tokenize ngrams on the backend, don't do it on the frontend
We currently perform ngram analysis on the backend (via CoreNLP, plus Postgres FTS). However, there is also custom code on the frontend that is supposed to find ngrams in a given text and highlight them.
This code is quite complex and error-prone (and, judging by #551, it may not be fast enough either).
We can take advantage of the fact that our documents are immutable: ngram positions never change once a document is stored.
My suggestion is to compute the ngram positions once, store them in the DB, and just serve the frontend a list like this:
[
{ "from": 10
, "to": 30
, "text": "Michael Jackson"
, "type": "MapTerm"
...
}
]
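To make the idea concrete, here is a minimal sketch of how the backend could produce such a list once, at ingestion time. It is only an illustration: `find_spans` and the `terms` mapping are hypothetical names, the naive regex match stands in for whatever CoreNLP / Postgres actually yield, and the offset convention (0-based character offsets, `to` exclusive) is an assumption that the example above does not pin down.

```python
import re
from typing import TypedDict

# Field names follow the JSON example above. "from" is a Python keyword,
# so the functional TypedDict syntax is used.
Span = TypedDict("Span", {"from": int, "to": int, "text": str, "type": str})


def find_spans(document: str, terms: dict[str, str]) -> list[Span]:
    """Return every occurrence of each known term, sorted by start offset.

    `terms` maps a term to its list type (e.g. "MapTerm"). Matching is a
    naive case-insensitive whole-word search, used here only as a stand-in
    for the real backend analysis.
    """
    spans: list[Span] = []
    for term, term_type in terms.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(document):
            spans.append(
                {
                    "from": match.start(),  # assumed: 0-based, inclusive
                    "to": match.end(),      # assumed: exclusive
                    "text": match.group(0),
                    "type": term_type,
                }
            )
    spans.sort(key=lambda s: (s["from"], s["to"]))
    return spans
```

Because documents are immutable, the result could be computed once per document and stored next to it, and only needs recomputing when the term lists themselves change.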
This way the frontend doesn't care how the ngrams were generated; it just does what a frontend should do: be dumb and show data rather than compute it. I guess this would also give a cleaner separation of responsibilities: if highlighting doesn't work, it's the frontend's fault; if terms aren't shown, it's the backend's fault.
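For illustration, the "dumb" consumer side could shrink to something like the sketch below: slice the text at the given offsets and wrap the highlighted parts. It is written in Python only to keep the example short; the real frontend would do the equivalent in its own language. `segment` is a hypothetical name, and the handling of overlapping spans (skip them) and the exclusive `to` offset are assumptions, not decisions taken in this issue.

```python
from typing import Optional


def segment(text: str, spans: list[dict]) -> list[tuple[str, Optional[str]]]:
    """Split `text` into (chunk, type) pairs; type is None for plain text."""
    segments: list[tuple[str, Optional[str]]] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["from"]):
        start, end = span["from"], span["to"]
        # Assumption: overlapping or malformed spans are skipped, not merged.
        if start < cursor or end > len(text) or start >= end:
            continue
        if start > cursor:
            segments.append((text[cursor:start], None))
        segments.append((text[start:end], span["type"]))
        cursor = end
    if cursor < len(text):
        segments.append((text[cursor:], None))
    return segments
```

Each pair then maps directly to either a plain text node or a highlighted element, with no term matching on the client at all.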