Doc annotation
There are bugs with:
- multiple consecutive ngrams whose intersection is not Null
- changing the type of ngrams that have punctuation next to them
- double spaces in ngrams
FIX:
- change the return type from Maybe to Array
- remove the punctuation ".,;(){}" when editing ngrams
changed due date to June 27, 2019
changed milestone to %V4 Release
- Owner
Hello, nice push. Adding a new multi-word term next to punctuation does not seem to highlight the term itself. Ask me if you need more details to reproduce the bug.
- Maintainer
You mean selecting something like "i.e."? I see in the network that it is sent as "i.e.". I think it has something to do with how we mangle the strings internally. Anyway, after refresh I see these terms are not stored in the API. Is this a bug?
- Maintainer
I mean even single-word terms which are highlighted are not preserved after refresh.
- Developer
@anoe was referring to this commit a42e4aaa which makes it work much better. However I realize that the selected term should be normalized further.
Currently it is lowercased. Additionally I think it should:
- have punctuation replaced by spaces
- then be trimmed
Indeed, to be potentially highlighted, ngrams should follow the same rules as the highlighter.
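To make the proposed rules concrete, here is a minimal sketch in Python (not the project's actual code). The name normalize_term is hypothetical, the punctuation set is the one quoted in the FIX list above, and collapsing repeated spaces is an assumption added to cover the double-space issue mentioned at the top of the thread.

```python
# Punctuation set quoted in this thread; hard-coding it is a known limitation.
PUNCTUATION = ".,;(){}"


def normalize_term(term: str) -> str:
    """Lowercase, replace punctuation by spaces, then trim (hypothetical helper)."""
    lowered = term.lower()
    # Replace each punctuation character by a single space.
    spaced = "".join(" " if ch in PUNCTUATION else ch for ch in lowered)
    # split() drops leading/trailing whitespace and collapses inner runs,
    # so this both trims and removes double spaces (the latter is an assumption).
    return " ".join(spaced.split())
```

With these rules a selection such as "Antiviral," would be stored as "antiviral", and the highlighter would have to apply the same function to the document text for the two to agree.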
- Owner
@np yes, removing punctuation will improve it. But it will not be definitive if we hard-code the punctuation. For instance, in "i.e." the dot is part of the term (we will need to add some machine learning to decide what counts as punctuation). With that in mind, I agree with the punctuation fix for now.
- Maintainer
This seems to fix things for the "i.e." string and "target-independent". Punctuation is replaced by spaces in the same manner as in highlightNgrams.
Edited by Przemyslaw Kaminski
- Maintainer
BTW, it might be profitable at this stage to write some tests for this function. This is purely algorithmic, it would be "type-safe" to store such special cases as unit tests, just to be sure things aren't broken in the future.
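As a rough illustration of what storing such special cases as tests could look like, here is a table-driven sketch in Python. It is not tied to the real function: normalize_term is a hypothetical stand-in for the rules discussed above, and the expected values are assumptions about the intended behaviour (for example, that the hyphen in "target-independent" is kept because it is not in the punctuation list).

```python
PUNCTUATION = ".,;(){}"  # set quoted earlier in this thread


def normalize_term(term):
    """Hypothetical stand-in: lowercase, punctuation to spaces, trim."""
    spaced = "".join(" " if ch in PUNCTUATION else ch for ch in term.lower())
    return " ".join(spaced.split())


# (raw selection, expected stored form); expected values are assumptions.
REGRESSION_CASES = [
    ("i.e.", "i e"),
    ("target-independent", "target-independent"),
    ("Some  Term", "some term"),
    ("(term);", "term"),
]


def failing_cases(normalize=normalize_term):
    """Return (raw, expected, actual) for every case the function gets wrong."""
    return [
        (raw, expected, normalize(raw))
        for raw, expected in REGRESSION_CASES
        if normalize(raw) != expected
    ]
```

Each new bug report from this thread would then become one more row in the table, and an empty result from failing_cases() means nothing regressed.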
- Maintainer
BTW2 Maybe it's better to move this functionality to the backend? I mean the frontend would just send the unprocessed selected string. The text could also be sent with ngrams already highlighted by the backend (we would need to think of some format to send this data) so that the frontend only renders whatever the backend provided. This would abstract away special cases like "i.e." etc. Think of supporting other languages as well. I think you have to do that normalization somewhere in the backend anyway when you analyze the text.
- Owner
I just tested on V4 and in several examples the underlining of a single term was not taken into account. I wanted to highlight "Antiviral" in "TREATMENT: PHARMACOLOGICAL TREATMENT , TYPE: Antiviral, antiretrovirals , TREATMENT NAME: lopinavir + ritonavir". It appears in the term table as a mapterm but with 0 count and it is not highlighted in the doc. Maybe because of the comma?
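For reproducing this case outside the app, here is a small Python sketch of how matching could behave if the document and the term went through the same normalization. It is only an illustration under assumptions: find_term_spans and fold_char are hypothetical names, the punctuation set is the one quoted in this thread, and the real highlightNgrams may work differently.

```python
import re

PUNCTUATION = ".,;(){}"  # set quoted earlier in this thread


def fold_char(ch):
    """Map one character to its normalized form, keeping the length at 1."""
    if ch in PUNCTUATION:
        return " "
    low = ch.lower()
    # A few Unicode characters lowercase to more than one character;
    # keep those unchanged so offsets still line up with the original text.
    return low if len(low) == 1 else ch


def find_term_spans(document, term):
    """Return (start, end) offsets in `document` where `term` would match."""
    doc = "".join(fold_char(ch) for ch in document)
    needle = "".join(fold_char(ch) for ch in term).strip()
    if not needle:
        return []
    # Only match whole words, so "antiviral" is not found inside a longer word.
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return [m.span() for m in re.finditer(pattern, doc)]
```

Because the comma after "Antiviral" is turned into a space on both sides before matching, selecting either "Antiviral" or "Antiviral," in the sentence above gives one span that covers the word itself.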
closed