Fix NGrams pagination (purescript-gargantext#531)
This merge request fixes the ngrams pagination problem(s). I say "problems" (plural) because when I was working on this issue, I've also noticed another bug: not only was the number of entries returned per page incorrect, but in case of NGramElement
s with children, we were not sorting correctly by the number of occurrences. To give a concrete example, in my development environment previously I had this when I requested the second page for the Haskell corpus:
This is incorrect, because the terms
Ngram had an occurrences count of 128, so it should have been listed as the third entry of the first page, not the first of the second!
After my patch, both bugs have been fixed:
@anoe I had to make some changes to the NGrams types, in particular now the occurrences are stored as a Set ContextId
rather than a [ContextId]
, because my assumption here is that we cannot have duplicate context ids in the occurrences, so let me know if you think this change might be causing problems elsewhere.
IMPORTANT: While reviewing this code, it occurred to me that most of its implementation is pure, so it will be really helpful to spend some time adding more tests which will shelter us in the future from possible regressions in the way we paginate. @anoe maybe something for me to work on next week while you are away?
I haven't measured the performance impact after my changes, but I think it's important to have a correct implementation before even considering its performance, which should be comparable to the previous one anyway.