Break loops in Ngrams graphs
Fixes #513. (Apologies for the messy commit history: I had to work on multiple machines yesterday.)
Tests pass but I couldn't test it thoroughly.
This MR reworks the internals of buildForest so that we build a graph first (more precisely, a list of strongly connected components) and then look for loops. If we find any, we apply an OnLoopDetectedStrategy. For now I have implemented only the "just do it" strategy, which removes the loop by computing the spanning forest and otherwise removes the loop trivially.
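To make the approach concrete, here is a minimal sketch of the idea using Data.Graph from the containers package. It is not the code in this MR: the integer keys, the TermNode type, the single BreakLoop constructor and the helper names findLoops, toGraph and breakLoops are all assumptions made for illustration, and the real buildForest works on ngrams terms rather than bare vertices.

```haskell
import Data.Array (assocs)
import Data.Graph (Graph, SCC (..), Vertex, buildG, dfs, indegree, stronglyConnComp, vertices)
import Data.Tree (Tree (..))

-- Hypothetical stand-in for the MR's strategy type; the real one may differ.
data OnLoopDetectedStrategy = BreakLoop
  deriving (Eq, Show)

-- A term and its children, both identified by small integer keys
-- (an assumption for this sketch).
type TermNode = (Vertex, [Vertex])

-- | Strongly connected components that contain a loop.
-- CyclicSCC also covers a term listed as its own child.
findLoops :: [TermNode] -> [[Vertex]]
findLoops nodes =
  [vs | CyclicSCC vs <- stronglyConnComp [(v, v, cs) | (v, cs) <- nodes]]

-- | Build the parent -> child graph over the keys 0..maxKey.
toGraph :: Vertex -> [TermNode] -> Graph
toGraph maxKey nodes = buildG (0, maxKey) [(p, c) | (p, cs) <- nodes, c <- cs]

-- | Break loops by taking a depth-first spanning forest.
-- Vertices with no parent are visited first so that genuine roots stay roots;
-- the remaining vertices are then offered as start points, which only matters
-- for components that are entirely cyclic.
breakLoops :: OnLoopDetectedStrategy -> Graph -> [Tree Vertex]
breakLoops BreakLoop g = dfs g (roots ++ vertices g)
  where
    roots = [v | (v, 0) <- assocs (indegree g)]

-- Sample data: 0 -> 1 -> 2 -> 1 is a loop between 1 and 2; 3 is isolated.
sample :: [TermNode]
sample = [(0, [1]), (1, [2]), (2, [1]), (3, [])]
```

With this sample, findLoops sample reports a single component made of vertices 1 and 2, and breakLoops BreakLoop (toGraph 3 sample) keeps the edges 0 -> 1 and 1 -> 2 while dropping the back edge 2 -> 1. Every vertex appears exactly once in the resulting forest, which is why a spanning forest cannot contain a loop.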
The tests pass and, although I didn't have time to add as many tests as I wanted, I have tried a number of scenarios: I can now run the query that @davidchavalarias mentioned was looping (the one from HAL about the "numeric technique"), and I can import @fmaniere's docslist shared in #513, which was also looping before.
I think/hope this MR is an improvement on what we have now, so I think it's worth merging and deploying to dev.sub, provided CI is green.