Closed
Opened 5 years ago by delanoe @anoe

Doc annotation

There are bugs with:

  • multiple consecutive ngrams whose intersection is not empty (overlapping ngrams)
  • changing the type of an ngram that has punctuation next to it
  • double spaces in ngrams

FIX:

  • change the return type from Maybe to Array
  • remove the punctuation ".,;(){}" when editing ngrams
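The first fix item can be illustrated with a small hypothetical Python sketch (the project itself is PureScript; here a list plays the role of Array). Instead of returning at most one match, as a Maybe-returning lookup would, the function returns every span, so overlapping consecutive ngrams are all kept. The name find_matches and the whitespace tokenization are assumptions, not the project's code.

```python
from typing import List, Tuple

Match = Tuple[int, int, str]  # (start token index, end token index, ngram)


def find_matches(tokens: List[str], ngrams: List[str]) -> List[Match]:
    """Return every span of `tokens` equal to one of `ngrams`.

    Hypothetical helper: unlike a Maybe-style "first match only" lookup,
    overlapping matches such as "new york" / "york city" are all returned.
    """
    matches: List[Match] = []
    for ngram in ngrams:
        words = ngram.split()
        n = len(words)
        if n == 0:
            continue  # ignore empty ngrams
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == words:
                matches.append((start, start + n, ngram))
    # Sort by position so a renderer can walk the text left to right.
    return sorted(matches)


tokens = "new york city hall".split()
print(find_matches(tokens, ["new york", "york city", "city hall"]))
# [(0, 2, 'new york'), (1, 3, 'york city'), (2, 4, 'city hall')]
```

How the renderer then resolves overlaps (nesting, splitting, or keeping the longest span) is a separate design choice that this sketch leaves open.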


  • delanoe @anoe changed due date to June 27, 2019 5 years ago

  • delanoe @anoe changed milestone to %V4 Release 5 years ago

  • delanoe @anoe · 5 years ago
    Owner

    Hello, nice push. When adding a new multi-word term next to punctuation, the term itself does not seem to be highlighted. Ask me if you need more details to reproduce the bug.

  • Przemyslaw Kaminski @cgenie · 5 years ago
    Maintainer

    You mean selecting something like i.e.? I see in the network tab that it is sent as i.e.. I think it has something to do with how we mangle the strings internally. Anyway, after a refresh I see that these terms are not stored in the API. Is this a bug?

  • Przemyslaw Kaminski @cgenie · 5 years ago
    Maintainer

    I mean that even highlighted single-word terms are not preserved after a refresh.

  • Nicolas Pouillard @np · 5 years ago
    Developer

    @anoe was referring to commit a42e4aaa, which makes it work much better. However, I realize that the selected term should be normalized further.

    Currently it is lowercased. Additionally, I think it should:

    • replace punctuation with spaces
    • then be trimmed

    Indeed, for an ngram to be highlighted at all, it must follow the same normalization rules as the highlighter.
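A minimal Python sketch of that normalization, assuming the punctuation set ".,;(){}" from the issue description (the real highlighter may use a different set) and a hypothetical function name normalize_term. The point is that the selected term and the highlighted text should both go through the same function.

```python
import re

# Assumption: the punctuation set is the one listed in the issue description.
PUNCTUATION_RE = re.compile(r"[.,;(){}]")


def normalize_term(term: str) -> str:
    """Lowercase, replace punctuation with spaces, then trim.

    Splitting and re-joining also collapses runs of spaces, which covers the
    double-space issue mentioned in the issue description.
    """
    lowered = term.lower()
    spaced = PUNCTUATION_RE.sub(" ", lowered)
    return " ".join(spaced.split())


print(normalize_term("  Antiviral, "))           # antiviral
print(normalize_term("(Lopinavir  Ritonavir)"))  # lopinavir ritonavir
```

Applied to "i.e.", the same function yields "i e", which is exactly the limitation of hard-coded punctuation raised about i.e. in this thread.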

  • delanoe @anoe · 5 years ago
    Owner

    @cgenie yes, we need to reindex the database with the new ngrams, since occurrences are used to retrieve them after a refresh.

  • delanoe @anoe · 5 years ago
    Owner

    @np yes, removing punctuation will improve it, but hard-coding the punctuation will not be a definitive solution. For instance, in i.e. the dot is part of the term (we will need some machine learning to decide what counts as punctuation). With that in mind, I agree with the punctuation fix for now.
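As a stopgap until something learned exists, one could protect a small list of known abbreviations before stripping punctuation. This Python sketch is hypothetical (the KEEP_AS_IS list and the function name are assumptions), and its second example shows why a fixed list stays fragile: a single trailing comma after "i.e." is enough to defeat it.

```python
import re

# Hypothetical, hand-maintained list of tokens whose dots are part of the term.
# This is the kind of hard-coding described above as non-definitive.
KEEP_AS_IS = {"i.e.", "e.g.", "etc."}
PUNCTUATION_RE = re.compile(r"[.,;(){}]")


def normalize_with_exceptions(text: str) -> str:
    """Lowercase and turn punctuation into spaces, except for listed tokens."""
    out = []
    for token in text.lower().split():
        if token in KEEP_AS_IS:
            out.append(token)
        else:
            out.extend(PUNCTUATION_RE.sub(" ", token).split())
    return " ".join(out)


print(normalize_with_exceptions("Antiviral, i.e. treatment."))   # antiviral i.e. treatment
print(normalize_with_exceptions("Antiviral, i.e., treatment."))  # antiviral i e treatment
```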

  • Przemyslaw Kaminski @cgenie · 5 years ago
    Maintainer

    !16 (merged)

    This seems to fix things for the i.e. string and is target-independent. Punctuation is replaced by spaces in the same way as in highlightNgrams.

    Edited by Przemyslaw Kaminski 5 years ago
  • Przemyslaw Kaminski @cgenie · 5 years ago
    Maintainer

    BTW, it might be worthwhile at this stage to write some tests for this function. It is purely algorithmic, so it would be "type-safe" to record such special cases as unit tests, just to make sure nothing breaks in the future.
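Such tests could be table-driven, as in the Python sketch below (runnable directly or with pytest). In the project itself they would be written in PureScript against the real function; normalize_term here is a hypothetical stand-in that lowercases, turns ".,;(){}" into spaces, and trims. Pinning the "i.e." case records the current behaviour, so a deliberate change to it shows up as a failing test rather than a silent regression.

```python
import re


def normalize_term(term: str) -> str:
    # Hypothetical stand-in for the function under test: lowercase,
    # punctuation ".,;(){}" -> space, collapse whitespace, trim.
    return " ".join(re.sub(r"[.,;(){}]", " ", term.lower()).split())


# Special cases collected from this issue, stored as regression tests.
CASES = [
    ("Antiviral,", "antiviral"),                # comma right after the term
    ("multi  word   term", "multi word term"),  # double spaces
    ("(lopinavir)", "lopinavir"),               # brackets around the term
    ("i.e.", "i e"),                            # current, lossy behaviour
]


def test_normalize_special_cases() -> None:
    for raw, expected in CASES:
        actual = normalize_term(raw)
        assert actual == expected, f"{raw!r}: expected {expected!r}, got {actual!r}"


if __name__ == "__main__":
    test_normalize_special_cases()
    print("all cases pass")
```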

  • Przemyslaw Kaminski @cgenie · 5 years ago
    Maintainer

    BTW2: maybe it's better to move this functionality to the backend? I mean the frontend would just send the unprocessed selected string. The text could also be sent with ngrams already highlighted by the backend (we would need to think of a format for this data), so that the frontend only renders whatever the backend provides. This would abstract away special cases like i.e., etc. Think of supporting other languages as well. I think you have to do that normalization somewhere in the backend anyway when you analyze the text.
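One possible shape for that data is a list of segments, each carrying the original text and, for highlighted spans, the normalized ngram. The Python sketch below is hypothetical throughout (the Segment format, segment, and render are assumptions, not an existing Gargantext API): the backend does all matching with greedy longest-match on punctuation-stripped words, and the frontend only wraps marked segments.

```python
import html
import json
import re
from typing import List, Optional, TypedDict

PUNCTUATION_RE = re.compile(r"[.,;(){}]")


class Segment(TypedDict):
    text: str             # original surface text, as shown to the user
    ngram: Optional[str]  # normalized ngram if this span is a term, else None


def norm_word(word: str) -> str:
    # Comparison key for one word: lowercase, punctuation removed.
    return PUNCTUATION_RE.sub("", word.lower())


def segment(text: str, ngrams: List[str]) -> List[Segment]:
    """Backend side: split text into plain and highlighted segments."""
    words = text.split()
    keys = [norm_word(w) for w in words]
    patterns = {tuple(filter(None, (norm_word(w) for w in ng.split()))) for ng in ngrams}
    # Drop empty patterns; try longest first so "new york city" beats "new york".
    ordered = sorted((p for p in patterns if p), key=len, reverse=True)
    segments: List[Segment] = []
    plain: List[str] = []
    i = 0
    while i < len(words):
        match = next((p for p in ordered if tuple(keys[i:i + len(p)]) == p), None)
        if match is None:
            plain.append(words[i])
            i += 1
            continue
        if plain:
            segments.append({"text": " ".join(plain), "ngram": None})
            plain = []
        segments.append({"text": " ".join(words[i:i + len(match)]),
                         "ngram": " ".join(match)})
        i += len(match)
    if plain:
        segments.append({"text": " ".join(plain), "ngram": None})
    return segments


def render(segments: List[Segment]) -> str:
    """Frontend side: no text processing, just wrap terms in <mark>."""
    parts = []
    for s in segments:
        text = html.escape(s["text"])
        parts.append(f"<mark>{text}</mark>" if s["ngram"] else text)
    return " ".join(parts)


payload = segment("TYPE: Antiviral, antiretrovirals", ["antiviral"])
print(json.dumps(payload))
print(render(payload))  # TYPE: <mark>Antiviral,</mark> antiretrovirals
```

Joining words with single spaces loses the original spacing; a real format would more likely carry character offsets into the untouched text. Either way, the normalization rules and language-specific cases live in one place.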

  • david Chavalarias @davidchavalarias · 5 years ago
    Owner

    I just tested on V4, and in several examples the underlining of a single term was not taken into account. I wanted to highlight "Antiviral" in "TREATMENT: PHARMACOLOGICAL TREATMENT , TYPE: Antiviral, antiretrovirals , TREATMENT NAME: lopinavir + ritonavir". It appears in the term table as a mapterm, but with a count of 0, and it is not highlighted in the document. Maybe because of the comma?
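The comma hypothesis can be checked in isolation. The Python sketch below handles single-word terms only, uses hypothetical function names, and is not the actual Gargantext indexing code. A token-equality count gives 0 for "Antiviral" in the quoted text, because the token is "antiviral,", while the same count after replacing ".,;(){}" with spaces gives 1. That is consistent with the comma being the cause, but does not prove it.

```python
import re

# The example text quoted in the comment above.
TEXT = ("TREATMENT: PHARMACOLOGICAL TREATMENT , TYPE: Antiviral, antiretrovirals , "
        "TREATMENT NAME: lopinavir + ritonavir")


def count_exact(term: str, text: str) -> int:
    """Count whitespace-separated tokens equal to term (case-folded only)."""
    return text.lower().split().count(term.lower())


def count_normalized(term: str, text: str) -> int:
    """Same count after replacing the punctuation .,;(){} with spaces."""
    tokens = re.sub(r"[.,;(){}]", " ", text.lower()).split()
    return tokens.count(term.lower())


print(count_exact("Antiviral", TEXT))       # 0: the token is "antiviral,"
print(count_normalized("Antiviral", TEXT))  # 1
```

The colon is not in the listed set, so count_normalized("treatment", TEXT) still misses "TREATMENT:" and returns 2 rather than 3, another small example of how brittle a hard-coded punctuation list is.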

  • delanoe @anoe closed 4 years ago

Assignee: Nicolas Pouillard @np
Milestone: V4 Release
Time tracking: No estimate or time spent
Due date: Jun 27, 2019
Labels: None
Confidentiality: Not confidential
Lock issue: Unlocked
4 participants: Nicolas Pouillard, delanoe, david Chavalarias, Przemyslaw Kaminski
Reference: gargantext/purescript-gargantext#86