Specify behaviour of Phylo's `relatedComponents` function
@anoe when investigating the performance of the Phylo code, it emerged that the majority of the time is spent in the toRelatedComponents
and relatedComponents
functions. Both functions works on "list of lists" and they are probably quadratic in nature (in terms of exponential complexity).
Before diving into any kind of optimisation, I thought it was a good idea to write tests for relatedComponents
, trying to build a solid baseline for regression testing but also for trying to get a handle of what this function is supposed to do.
Unfortunately the documentation is scarce, but as far as I know the idea is to build some sort of notion of "connected components" for graph's nodes, by taking as input the list of edges. I say "connected components" in quotes because it's not entirely clear what this function is doing. Let's consider this test:
(relatedComponents @Int) [[1,2], [2,4]] @?= [[1,2,4]]
When given the input list [[1,2],[2,4]]
identifying two edges, it correctly discovers that they are connected and that there is a path between them. In pictures:
This makes perfect sense to me. However, considering the following test (which passes on dev):
(relatedComponents @Int) [[1,2], [3,5], [2,4],[9,5],[5,4]] @?= [[1,2,4,3,5,9]]
This doesn't make sense to me. In pictures:
Here I have highlighted common nodes for enhanced clarity. As you can see, the function is returning a single list which gives the impression that there is a path between all of them and/or that they are all connected, but as you can see in reality this is more a tree that should look like this:
or, if we don't take into account edge's direction, something like this:
I think, but I'm not sure, that the last interpretation is the closest to the returned output, where the last 3 nodes are sorted by edge's id and in reality [3,5,9]
means "the branch which has 5 as pivot and 3 and 9 as leaves".
Or, rather, it seems the idea is to cluster together nodes which are all part of the same subgraph.
@anoe Can you shed some light on what is this function supposed to do? It feels like I'm missing something and it's hard to say from just looking at the implementation and the expected output.