Opened Sep 12, 2025 by Przemyslaw Kaminski (@cgenie)
Investigate `getOccByNgramsOnlyFast`

This is a follow-up to #505.

In !445 I introduced some fixes to the query.

However, the functions `getOccByNgramsOnlyFast_withSample` and `getOccByNgramsOnlyFast` should be analyzed more closely.

The second one returns a `HashMap NgramsTerm [ContextId]`. Postgres builds that context list laboriously, collecting `DISTINCT context_id` into an array. Via `setNgramsTableScores`, the function is used to return ngrams together with a list of their occurrences (all context ids).

However, when I search the PureScript code for `occurrences`, I mostly see things like `sumOccurrences` or `Set.size occurrences`, which suggests that we only need `occurrences_count`.

If only the count is needed, this query could be simplified further.
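
To make the idea concrete, here is a minimal sketch of the two result shapes, not the actual Gargantext code. `NgramsTerm` and `ContextId` are hypothetical stand-ins (plain `String` and `Int`), `Data.Map` replaces `HashMap` to keep the example dependent only on `containers`, and the names `occByContexts` and `occCounts` are invented for illustration.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for the real NgramsTerm and ContextId types.
type NgramsTerm = String
type ContextId  = Int

-- Shape of the current result: every distinct context id per term,
-- mirroring what the query collects with DISTINCT context_id.
occByContexts :: [(NgramsTerm, ContextId)] -> Map.Map NgramsTerm (Set.Set ContextId)
occByContexts rows =
  Map.fromListWith Set.union [ (t, Set.singleton c) | (t, c) <- rows ]

-- Count-only variant: same grouping, but only the number of distinct
-- contexts per term is kept, which is what Set.size would compute client-side.
occCounts :: [(NgramsTerm, ContextId)] -> Map.Map NgramsTerm Int
occCounts = Map.map Set.size . occByContexts

main :: IO ()
main = do
  -- Made-up rows of (term, context id); the duplicate is collapsed.
  let rows = [("graph", 1), ("graph", 2), ("graph", 2), ("tree", 7)]
  print (Map.toList (occCounts rows))
  -- prints [("graph",2),("tree",1)]
```

If the frontend really only consumes the size of each list, the SQL side could presumably return a `COUNT(DISTINCT context_id)` per ngram instead of an array, but that would need to be checked against every caller of `setNgramsTableScores`.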

Reference: gargantext/haskell-gargantext#509