Investigate `getOccByNgramsOnlyFast`
This is a follow-up to #505.
In !445 I introduced some fixes to the query.
However, the functions `getOccByNgramsOnlyFast_withSample` and `getOccByNgramsOnlyFast` should be analyzed more closely.
The second one returns a `HashMap NgramsTerm [ContextId]`. That context list is generated tediously by Postgres, which collects `DISTINCT context_id` into an array. The function is used, via `setNgramsTableScores`, to return ngrams with a list of their occurrences (all context ids).
However, when I look at the PureScript code and search for `occurrences`, I see mostly things like `sumOccurrences` or `Set.size occurrences`, which would suggest that we only need `occurrences_count`.
If only the count is needed, this could help simplify the query further.
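To make the idea concrete, here is a minimal sketch of the two result shapes. It is not the actual Gargantext code: `NgramsTerm` and `ContextId` are simplified stand-ins, `Data.Map.Strict` replaces `HashMap` so that the snippet only needs the `containers` package, and `OccurrencesById`, `OccurrencesCount`, `toCounts` and `sample` are hypothetical names used for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Simplified stand-ins for the real types (assumptions for this sketch).
newtype NgramsTerm = NgramsTerm String deriving (Eq, Ord, Show)
newtype ContextId  = ContextId Int     deriving (Eq, Ord, Show)

-- Shape of the current result: every term carries its full list of context ids.
type OccurrencesById = Map.Map NgramsTerm [ContextId]

-- Shape that would suffice if the frontend only consumes the count.
type OccurrencesCount = Map.Map NgramsTerm Int

-- Count the distinct contexts per term, which is what a
-- COUNT(DISTINCT context_id) in the query would return directly.
toCounts :: OccurrencesById -> OccurrencesCount
toCounts = Map.map (Set.size . Set.fromList)

-- Hypothetical sample data; the duplicate context id is counted once.
sample :: OccurrencesById
sample = Map.fromList
  [ (NgramsTerm "graph", [ContextId 1, ContextId 2, ContextId 2])
  , (NgramsTerm "node",  [ContextId 3])
  ]
```

If the frontend really only reads the size, the query could return the count itself, and Postgres would no longer need to build an array per ngram.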