Keep only the roots in searchTableNgrams
Work in progress for #504.
@davidchavalarias I have some questions. The code for managing ngrams is a bit subtle, but in essence we build a forest of ngram trees, where each tree is composed of a root (e.g. `soils`) and one or more children, which show up bundled together in the UI.
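To make the structure concrete, here is a minimal Python sketch of such a forest. It is not Gargantext's actual Haskell code: the `Ngram` record, its `list_type` and `children` fields, and the `roots` and `all_terms` helpers are hypothetical names chosen for illustration, and the child `soil` under `soils` is an assumed example.

```python
from dataclasses import dataclass, field


@dataclass
class Ngram:
    """One node of an ngram tree (hypothetical shape, for illustration only)."""
    term: str
    list_type: str  # e.g. "map" or "candidate"
    children: list["Ngram"] = field(default_factory=list)


def roots(forest: list[Ngram]) -> list[str]:
    """Keep only the root terms; children stay bundled under their root."""
    return [tree.term for tree in forest]


def all_terms(forest: list[Ngram]) -> list[str]:
    """Flatten the forest: every root followed by its direct children."""
    out = []
    for tree in forest:
        out.append(tree.term)
        out.extend(child.term for child in tree.children)
    return out


forest = [
    # "soil" as the child of "soils" is an assumption made for this sketch.
    Ngram("soils", "map", [Ngram("soil", "map")]),
    Ngram("risks", "map", [Ngram("risk", "candidate")]),
]

print(roots(forest))      # ['soils', 'risks']
print(all_terms(forest))  # ['soils', 'soil', 'risks', 'risk']
```

Counting roots and counting all terms give different totals, which is one reason the numbers shown in the UI can be smaller than the overall number of terms.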
I have done some tweaks as part of this MR which, on the surface, make it look like things are fixed.
However, I think there are some dark corners that I would like to clarify:
- In issue #504, you were surprised that the number of candidate terms was too small. I have looked at the numbers this patch generates: we get `350` terms if we include all terms, but fewer than `350` if we consider only candidate or map terms, which seems reasonable to me. Does that match your intuition? In other words, `350` is the fixed total across all terms, and the number of candidate terms will be smaller.
- I am not sure if this is a bug in Gargantext or something which wasn't accounted for, but at the moment the UI will (correctly so) not show trees which mix list types. What do I mean? Consider the following two terms in that "water" example: `risks` and `risk`. It turns out that one is a candidate term (i.e. `risk`), but because its parent is a map term, at the moment it won't be included in the final list of terms. What should the correct behaviour be, between:
  - a. The parent gets "promoted" to a candidate term, so that we can show both `risk` and `risks` bundled together;
  - b. Only `risk` gets shown as a candidate term alone, i.e. detached from the parent. This means that `risks` will now also be shown as a standalone item in the Map terms view, which seems incorrect.
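The two options can be compared with a small Python sketch. This is a simplified model, not the code in this MR: `Ngram`, `view`, `promote_parent` and `detach_mismatched` are hypothetical names, and `view` assumes that a tree is shown in a list only when its root and all its children share that list type.

```python
from dataclasses import dataclass, field


@dataclass
class Ngram:
    """One node of an ngram tree (hypothetical shape, for illustration only)."""
    term: str
    list_type: str  # "map" or "candidate"
    children: list["Ngram"] = field(default_factory=list)


def view(forest: list[Ngram], list_type: str) -> list[tuple[str, list[str]]]:
    """Assumed display rule: show a tree only if it does not mix list types."""
    return [
        (t.term, [c.term for c in t.children])
        for t in forest
        if t.list_type == list_type
        and all(c.list_type == list_type for c in t.children)
    ]


def promote_parent(tree: Ngram, list_type: str) -> Ngram:
    """Option (a): move the root to the given list so the tree stays whole."""
    return Ngram(tree.term, list_type, tree.children)


def detach_mismatched(tree: Ngram) -> list[Ngram]:
    """Option (b): children whose list differs from the root become roots."""
    same = [c for c in tree.children if c.list_type == tree.list_type]
    other = [c for c in tree.children if c.list_type != tree.list_type]
    return [Ngram(tree.term, tree.list_type, same)] + [
        Ngram(c.term, c.list_type, c.children) for c in other
    ]


mixed = Ngram("risks", "map", [Ngram("risk", "candidate")])

print(view([mixed], "candidate"))   # [] (the mixed tree is hidden)

option_a = [promote_parent(mixed, "candidate")]
print(view(option_a, "candidate"))  # [('risks', ['risk'])]
print(view(option_a, "map"))        # []

option_b = detach_mismatched(mixed)
print(view(option_b, "candidate"))  # [('risk', [])]
print(view(option_b, "map"))        # [('risks', [])]
```

Under this model, option (a) keeps the bundle together but removes `risks` from the Map terms view, while option (b) keeps `risks` in the Map terms view but breaks the bundle.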