haskell-gargantext

Open
Opened Jun 05, 2025 by Alfredo Di Napoli (@AlfredoDiNapoli)

Separate ngram extraction from document insertion

Fixes #473.

This MR refactors the code around insertMasterDocs to separate ngrams generation from document creation. Previously, ngrams extraction happened in the middle of insertMasterDocs, so we had to contact the NLP server partway through what could otherwise have been a fully atomic DB transaction. That risked leaving the database in an inconsistent state, i.e. documents inserted without any ngrams (effectively leaving the flow incomplete).

This MR fixes that by splitting the process into two parts. First we generate the ngrams and store them in a Map indexed by DocumentHashId. Later we match every Node created with the previously generated ngrams. This last step is pure (it's just a map lookup), so it can be embedded safely inside insertMasterDocs, which is now a single DBUpdate.
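
To illustrate the two-phase shape described above, here is a minimal sketch. It is not the code from this MR: the types Document, NodeId and Ngrams, the newtype wrapper for DocumentHashId, and the functions extractAll and matchNgrams are hypothetical stand-ins. The fallback to an empty ngrams list for an unknown hash is also an assumption.

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

-- Hypothetical stand-ins; the real gargantext types are richer.
newtype DocumentHashId = DocumentHashId String deriving (Eq, Ord, Show)
newtype NodeId = NodeId Int deriving (Eq, Show)
type Ngrams = [String]

data Document = Document
  { docHash :: DocumentHashId
  , docText :: String
  }

-- Phase 1 (effectful): run the extractor, e.g. a call to an NLP server,
-- once per document, before any DB transaction is opened.
extractAll
  :: Applicative m
  => (String -> m Ngrams)
  -> [Document]
  -> m (Map DocumentHashId Ngrams)
extractAll extract docs =
  Map.fromList <$> traverse (\d -> (,) (docHash d) <$> extract (docText d)) docs

-- Phase 2 (pure): pair each inserted node with its precomputed ngrams.
-- Being a plain map lookup, this can run inside the transaction.
-- Assumption: a hash missing from the table yields no ngrams.
matchNgrams
  :: Map DocumentHashId Ngrams
  -> [(NodeId, DocumentHashId)]
  -> [(NodeId, Ngrams)]
matchNgrams table inserted =
  [ (nid, Map.findWithDefault [] h table) | (nid, h) <- inserted ]
```

The point of the split is that only extractAll needs effects outside the database, so a failure there happens before any document is written.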

@cgenie I don't think my work necessarily conflicts with yours, but I did some refactoring around typeclasses like UniqParameters & friends, as they were a bit iffy. Please have a look in case something stands out that might create problems on your side (this is still work in progress, btw).

Edited Jun 05, 2025 by Alfredo Di Napoli

Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch origin
git checkout -b adinapoli/issue-473 origin/adinapoli/issue-473

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git fetch origin
git checkout origin/dev
git merge --no-ff adinapoli/issue-473

Step 4. Push the result of the merge to GitLab

git push origin HEAD:dev

Note that pushing to GitLab requires write access to this repository.

Tip: You can also check out merge requests locally by following these guidelines.

  • Discussion 8
  • Commits 8
  • Pipelines 3
  • Changes 16
Assignee: None
Milestone: None
Labels: To Review (review requested)
Reference: gargantext/haskell-gargantext!415