
Closed
Open
Opened Jul 26, 2023 by delanoe@anoe

Clean the Text before sending to NLP

In the GarganText text flow, we need to send the text (i.e. title <> abstract) to an NLP service that does n-gram POS tagging.

If the abstract is not cleaned, it crashes the NLP microservice, which is why I have not merged openalex yet.

It can be reproduced with the query "b12 AND children".

I have started, but not finished, cleaning the text on the branch clean-text-please.
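
The issue does not say which characters in the abstract trigger the crash, so the following is only a minimal sketch of what such a cleaning step could look like. It assumes the usual suspects: HTML remnants, control characters and irregular whitespace. The function names `clean_text` and `build_nlp_input` are hypothetical and are not taken from the clean-text-please branch.

```python
import html
import re
import unicodedata

# Assumption: tags look like <i>, </p>, <sub class="x">; a bare "<" in prose is kept.
_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
_WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Return text with HTML remnants, control characters and extra whitespace removed."""
    # Decode entities first so that escaped tags (&lt;i&gt;) are caught as well.
    text = html.unescape(text)
    text = _TAG_RE.sub(" ", text)
    text = unicodedata.normalize("NFC", text)
    # Replace every character in a Unicode "C*" category (control, format,
    # surrogate, private use, unassigned) with a space.
    text = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in text
    )
    # Collapse whitespace runs (including non-breaking spaces) into one space.
    return _WS_RE.sub(" ", text).strip()


def build_nlp_input(title, abstract):
    """Join the cleaned title and abstract, skipping missing or empty parts."""
    parts = [clean_text(p) for p in (title, abstract) if p]
    return " ".join(p for p in parts if p)
```

Running the documents returned for "b12 AND children" through a step like this, and checking which of them still crash the service, would help narrow down the actual cause.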

Reference: gargantext/crawlers/openalex#3