Merged
Opened 1 year ago by Alfredo Di Napoli@AlfredoDiNapoli

Improve PhyloMaker performance up to 50%

@anoe This is the first of many patches I plan to land to improve the performance and readability of the Phylo code.

I spent a few weeks getting familiar with the code, which is quite complex and intricate. After a few false starts, I went back to the drawing board and started with the low-hanging fruit: small wins where the performance gains are immediately visible. At a glance, this is what I have done:

  1. I have replaced the nub and sort functions in PhyloMaker with nubOrd (O(n*logn)) or, where possible, with nub and sort from the discrimination package, which run in O(n). By contrast, Data.List.nub is notoriously slow and quadratic (O(n^2)), and Data.List.sort is a merge sort that runs in O(n*logn).
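To illustrate the kind of replacement described above, here is a minimal sketch, not the code from the patch. It uses nubOrd from Data.Containers.ListUtils (containers >= 0.6.0.1); the names dedupe and dedupeSlow are hypothetical helpers introduced only for this example, and the discrimination-based nub and sort are left out so the sketch depends on nothing beyond base and containers.

```haskell
module Main where

import Data.Containers.ListUtils (nubOrd) -- containers >= 0.6.0.1
import Data.List (nub, sort)

-- Hypothetical helper (not from the patch): remove duplicates while keeping
-- the first occurrence of each element. nubOrd requires Ord and runs in
-- O(n log n).
dedupe :: Ord a => [a] -> [a]
dedupe = nubOrd

-- Baseline for comparison: Data.List.nub only requires Eq, but is O(n^2).
dedupeSlow :: Eq a => [a] -> [a]
dedupeSlow = nub

main :: IO ()
main = do
  let xs = [3, 1, 3, 2, 1, 2] :: [Int]
  print (dedupe xs)                  -- prints [3,1,2]
  print (dedupe xs == dedupeSlow xs) -- prints True: same result, cheaper
  print (sort (dedupe xs))           -- prints [1,2,3]
```

The swap is only a drop-in when the element type has an Ord instance; for types with just Eq, Data.List.nub remains the fallback.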

Furthermore, most of PhyloMaker's time is spent calculating relatedComponents, and since we do this over the whole similarity Set, the work can be parallelised, so I have added a strategic parMap.
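As a rough sketch of what adding a parMap looks like, assuming the parallel package (Control.Parallel.Strategies): the function expensive below is a hypothetical stand-in for a costly per-element computation, not the real relatedComponents, and sequentialAll / parallelAll are names made up for this example.

```haskell
module Main where

import Control.Parallel.Strategies (parMap, rdeepseq) -- from the parallel package

-- Hypothetical stand-in for an expensive per-element computation.
-- This is NOT relatedComponents; it only gives each element some work to do.
expensive :: Int -> Int
expensive n = sum [(x * x) `mod` 7 | x <- [1 .. n]]

-- Sequential baseline.
sequentialAll :: [Int] -> [Int]
sequentialAll = map expensive

-- Parallel version: parMap sparks one evaluation per list element, and
-- rdeepseq forces each result to normal form inside its spark.
parallelAll :: [Int] -> [Int]
parallelAll = parMap rdeepseq expensive

main :: IO ()
main = do
  let inputs = [200000, 300000, 400000, 500000]
  -- Both versions compute the same values; only the evaluation strategy differs.
  print (parallelAll inputs == sequentialAll inputs)
```

For the sparks to actually run on several cores, the program has to be built with -threaded and run with +RTS -N; otherwise it still gives the same result, just sequentially.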

This patch also adds a new (test) executable called garg-phylo-profile, which shares code with garg-phylo but is meant to be used for profiling, as it loads some canned data.

Results

Before my patch, running garg-phylo-profile would take:

real	0m17,210s
user	0m25,466s
sys	0m0,747s

After my patch:

real	0m9,230s
user	0m34,577s
sys	0m0,465s
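Reading the two timings together: wall-clock time drops by roughly 46% (about 1.86x faster), while user time grows by roughly 36%, which is what one would expect when the same work is spread over more cores. The small sketch below just redoes that arithmetic; the numbers are copied from the timings above, and the function names are made up for this example.

```haskell
module Main where

-- Wall-clock ("real") and CPU ("user") seconds, copied from the timings above.
realBefore, realAfter, userBefore, userAfter :: Double
realBefore = 17.210
realAfter  = 9.230
userBefore = 25.466
userAfter  = 34.577

-- Fraction of time saved going from 'before' to 'after'.
reduction :: Double -> Double -> Double
reduction before after = (before - after) / before

-- How many times faster 'after' is than 'before'.
speedup :: Double -> Double -> Double
speedup before after = before / after

-- Average number of cores kept busy: CPU time divided by wall-clock time.
coresBusy :: Double -> Double -> Double
coresBusy user real = user / real

main :: IO ()
main = do
  putStrLn ("wall-clock reduction: " ++ show (100 * reduction realBefore realAfter) ++ " %")
  putStrLn ("speedup:              " ++ show (speedup realBefore realAfter) ++ "x")
  putStrLn ("cores busy (before):  " ++ show (coresBusy userBefore realBefore))
  putStrLn ("cores busy (after):   " ++ show (coresBusy userAfter realAfter))
```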

In terms of the eventlog, we are doing a bit better. I think some parts are highly parallel and can still be improved, but that needs more investigation and will happen in a separate round:

[Screenshot: Screenshot_2024-02-22_at_09.32.23]

As you can see, this is now much more parallel (green means "doing work" here): productivity went up a bit and GC went down a bit. However, there is a huge gap where we are (likely) waiting for work to be computed; my hunch is that we are waiting for the similarities to be computed. This can be improved, but will happen in due course.

Edited 1 year ago by Alfredo Di Napoli


Request to merge adinapoli/phylo-profile-2 into dev
Pipeline #5644 passed for 484f3aea on adinapoli/phylo-profile-2

Merged by delanoe 1 year ago

The changes were merged into dev with 484f3aea

Reference: gargantext/haskell-gargantext!249
