Opened Dec 14, 2023 by Alfredo Di Napoli (@AlfredoDiNapoli)

Improve Phylo robustness and performance

While working on #290 (closed), I took a deeper look at the Phylo code, and in !232 (merged) I paved the way for more systematic testing and benchmarking of Phylo.

While looking at the code, I have realised a few things:

  1. The Phylo code is fairly slow (the UI warns about this as well), but I think there are a couple of places where we could try to parallelise it (we need benchmarks and performance tests first, though, to make sure we know what we are improving);

  2. While looking at the generated .dot file, I noticed that we seem to emit the wrong data in some places. Take a look at this excerpt, for example:

        group1984198620 [fontname=Arial
                        ,shape=square
                        ,penwidth=4
                        ,nodeType=group
                        ,gid=group1984198620
                        ,from=1984
                        ,to=1986
                        ,strFrom="\"1986-01-01\""
                        ,strTo="\"1986-01-01\""
                        ,branchId="0 2 1 1 1 1 1 1 1 1 1"
                        ,bId=0
                        ,support=2
                        ,weight="Just 2.0"
                        ,source="[]"
                        ,sourceFull="[]"
                        ,density=0.0
                        ,cooc="fromList [((2,2),3.0)]"
                        ,lbl="\"competitive intelligence\""
                        ,foundation="\"2\""
                        ,role="\"3.0\""
                        ,frequence="\"6.359649122807036e-2\""
                        ,seaLvl="[0.0,0.1,0.2,0.30000000000000004,0.4,0.5,0.6,0.7,0.7999999999999999,0.8999999999999999,0.9999999999999999]"];

This looks a bit iffy to me (but maybe that's intended):

  • The weight field is rendered as the string Just 2.0, which looks like a mistake: shouldn't this be just 2.0, treated as a double?

  • The cooc field includes fromList, which is the direct show of the underlying Map. This seems suspect; I would have expected just a list of tuples here.

  • Fields like strFrom and strTo include a quoted date, whereas I would have expected them not to include the inner quotes (i.e. render this as just 1986-01-01, for example).

  • Numeric fields like foundation, role, frequence, etc. are all rendered as strings; could they be numbers instead?

  3. As mentioned in #290 (closed), we have an issue where the cooc field becomes too long. For now I have fixed this by manually patching graphviz, but we should come up with a more succinct representation, if possible: unbounded strings (or strings whose length is linear in the number of documents) are going to be a problem.
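On the performance point (1): before parallelising anything we need numbers. Below is a minimal sketch, using only base and deepseq, of two ingredients: a timing helper that fully forces its result, and a parallel map over independent inputs. The names timed, tryAll and parMapIO are hypothetical, not existing Phylo functions, and the sketch assumes the expensive work can be phrased as a pure function applied to independent inputs, which still has to be checked against the actual Phylo code.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (SomeException, evaluate, throwIO, try)
import GHC.Clock (getMonotonicTime)

-- | Hypothetical helper: run an action, fully force its result, and return it
-- together with the elapsed wall-clock time in seconds. Forcing with 'force'
-- keeps lazy evaluation from pushing the real work outside the timed region.
timed :: NFData a => IO a -> IO (a, Double)
timed act = do
  start <- getMonotonicTime
  x <- act >>= evaluate . force
  end <- getMonotonicTime
  pure (x, end - start)

-- | Catch any exception, fixing the exception type to 'SomeException'.
tryAll :: IO a -> IO (Either SomeException a)
tryAll = try

-- | Hypothetical parallel map: evaluates @f x@ to normal form in its own
-- thread for every @x@, then collects the results in input order.
-- Exceptions raised by @f@ are re-thrown in the calling thread.
parMapIO :: NFData b => (a -> b) -> [a] -> IO [b]
parMapIO f xs = do
  vars <- mapM spawn xs
  results <- mapM takeMVar vars
  mapM (either throwIO pure) results
  where
    spawn x = do
      var <- newEmptyMVar
      _ <- forkIO (tryAll (evaluate (force (f x))) >>= putMVar var)
      pure var
```

To get actual parallelism this has to be compiled with -threaded and run with +RTS -N. In practice, parMap rdeepseq from the parallel package or mapConcurrently from the async package would be the more idiomatic choice, and criterion or tasty-bench would be better suited than a hand-rolled timer for proper benchmarks.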
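On the rendering point (2): the excerpt is consistent with show being applied to values that are already strings, or that are wrapped in a Maybe or a Map. Here is a minimal sketch of rendering that avoids this, assuming a hypothetical GroupAttrs record holding a subset of the group data; the record, its field names and types, and renderAttrs are illustrative and not the actual Phylo types.

```haskell
import qualified Data.Map.Strict as Map

-- For reference, what 'show' produces on the kinds of values in the excerpt:
--   show (Just (2.0 :: Double))        == "Just 2.0"
--   show "1986-01-01"                  == "\"1986-01-01\""   (extra quotes)
--   show (Map.fromList [((2,2),3.0)])  == "fromList [((2,2),3.0)]"

-- | Hypothetical subset of the per-group data behind the .dot attributes;
-- field names and types are illustrative, not the real Phylo types.
data GroupAttrs = GroupAttrs
  { gaWeight     :: Maybe Double
  , gaStrFrom    :: String
  , gaStrTo      :: String
  , gaCooc       :: Map.Map (Int, Int) Double
  , gaFoundation :: Int
  , gaRole       :: Double
  , gaFrequence  :: Double
  }

-- | Render attributes as name/value pairs without leaking 'Show' syntax.
renderAttrs :: GroupAttrs -> [(String, String)]
renderAttrs ga =
  [ ("weight",     maybe "" show (gaWeight ga))   -- unwrap the Maybe; missing -> ""
  , ("strFrom",    gaStrFrom ga)                  -- already a string: no 'show'
  , ("strTo",      gaStrTo ga)
  , ("cooc",       show (Map.toList (gaCooc ga))) -- plain list of ((i, j), w)
  , ("foundation", show (gaFoundation ga))        -- 'show' on the number itself
  , ("role",       show (gaRole ga))
  , ("frequence",  show (gaFrequence ga))
  ]
```

Note that frequence would still come out in exponent form (e.g. 6.359649122807036e-2), since that is how show renders small Doubles; most number parsers accept that syntax, but it is worth keeping in mind on the frontend side.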
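On the cooc length point (3): one option is to emit only the k heaviest entries plus a count of what was left out, so the number of rendered entries depends on k rather than on the size of the map. A minimal sketch, assuming a truncated view is acceptable to whatever consumes the .dot file (which needs checking); summariseCooc and the choice of k are hypothetical.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)
import qualified Data.Map.Strict as Map

-- | Hypothetical bounded rendering of a co-occurrence map: keep only the
-- @k@ heaviest entries (ties keep key order, since 'sortBy' is stable) and
-- append how many entries were left out. At most @k@ entries are rendered,
-- however large the map grows.
summariseCooc :: Int -> Map.Map (Int, Int) Double -> String
summariseCooc k m = show kept ++ suffix
  where
    byWeight = sortBy (comparing (Down . snd)) (Map.toList m)
    kept     = take k byWeight
    dropped  = Map.size m - length kept
    suffix
      | dropped > 0 = " (+" ++ show dropped ++ " more)"
      | otherwise   = ""
```

For example, with three entries and k = 2, the lightest entry is dropped and the output ends with (+1 more).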

It would be nice to spend a bit of time investigating the performance of Phylo, as well as increasing its test coverage, because if the above rendering wasn't intentional, a test would have caught it.
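For instance, a check along these lines would have flagged the excerpt above. It is a hypothetical helper (suspiciousAttrs is not an existing function) that scans rendered attribute name/value pairs for leaked Show syntax; it could be wrapped in whatever test framework the suite set up in !232 uses.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical lint for rendered .dot attributes: returns the names of
-- attributes whose values look like leaked Haskell 'Show' output.
suspiciousAttrs :: [(String, String)] -> [String]
suspiciousAttrs attrs = [name | (name, value) <- attrs, looksShown value]
  where
    looksShown v =
      "Just " `isPrefixOf` v          -- show of a Maybe
        || v == "Nothing"
        || "fromList " `isPrefixOf` v -- show of a Data.Map / Data.Set
        || "\"" `isPrefixOf` v        -- show of a String: value starts with a quote
```

Run against the values from the excerpt, it reports weight, cooc and strFrom, while plain values such as bId=0 pass.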

@anoe I'm more than happy to take a look at this in the new year.

Reference: gargantext/haskell-gargantext#292