Phylo refactoring ideas (#500) · Issues · gargantext / haskell-gargantext

While working on #494, I noticed several things that could be fixed in Phylo. Since I don't want my commits in #494 to grow too large, I am writing up refactoring ideas in this issue.

Fix the `TimeUnit` constructors. Currently each of them has this form:

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blob/84bbdd5e762377c3eea535545389475a2d76932d/src/Gargantext/Core/Viz/Phylo.hs#L117-138

      Epoch
      { _epoch_period :: Int
      , _epoch_step   :: Int
      , _epoch_matchingFrame :: Int }

This results in a lot of repetitive code, e.g. to get the scale: https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blob/84bbdd5e762377c3eea535545389475a2d76932d/src/Gargantext/Core/Viz/Phylo/PhyloTools.hs#L172

I propose:

      data TimeUnitCriteria = TimeUnitCriteria { period :: Int, step :: Int, matchingFrame :: Int }

and

      data TimeUnit = Epoch TimeUnitCriteria | Year TimeUnitCriteria ...

This way the backend will mimic what is on the frontend: https://gitlab.iscpif.fr/gargantext/purescript-gargantext/blob/7477bfd1c997f711e0e0ecee263f0182b8486f3a/src/Gargantext/Components/PhyloExplorer/API.purs#L90-95
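To illustrate why the shared record removes the repetition, here is a minimal sketch built on the proposed types. Only the two constructors named in the proposal are included, and `timeUnitCriteria`, `timeUnitPeriod`, `timeUnitStep` and `timeUnitMatchingFrame` are hypothetical helper names, not existing functions in the codebase.

```haskell
-- Shared criteria record, as proposed.
data TimeUnitCriteria = TimeUnitCriteria
  { period        :: Int
  , step          :: Int
  , matchingFrame :: Int
  } deriving (Show, Eq)

-- Only the constructors named in the proposal; the remaining ones are omitted.
data TimeUnit
  = Epoch TimeUnitCriteria
  | Year  TimeUnitCriteria
  deriving (Show, Eq)

-- The single place that has to pattern-match on every constructor.
timeUnitCriteria :: TimeUnit -> TimeUnitCriteria
timeUnitCriteria (Epoch c) = c
timeUnitCriteria (Year  c) = c

-- Hypothetical accessors: each one is a one-liner instead of a
-- case expression with one branch per constructor.
timeUnitPeriod :: TimeUnit -> Int
timeUnitPeriod = period . timeUnitCriteria

timeUnitStep :: TimeUnit -> Int
timeUnitStep = step . timeUnitCriteria

timeUnitMatchingFrame :: TimeUnit -> Int
timeUnitMatchingFrame = matchingFrame . timeUnitCriteria
```

With this shape, adding a constructor means adding one clause to `timeUnitCriteria` rather than one clause to every accessor.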

On the frontend, there is a second `TimeUnit`:

https://gitlab.iscpif.fr/gargantext/purescript-gargantext/blob/7477bfd1c997f711e0e0ecee263f0182b8486f3a/src/Gargantext/Components/PhyloExplorer/JSON.purs#L338-363

Could we just have one such data structure?

As part of #494, I added functionality to `G.U.DateUtils`, in particular `parseFlexibleTime`. The usual API workflow is this: fetch the date of the document, insert it into the hyperdata field `_hd_publication_date`, parse yyyy/mm/dd manually, insert the result into `_hd_publication_year` etc., and then, in https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blob/84bbdd5e762377c3eea535545389475a2d76932d/src/Gargantext/Database/Query/Table/Node/Document/Insert.hs#L280, convert `_hd_publication_year` etc. into the `date` column of the `nodes` table. This is a bit complex, though I understand that the `date` column is needed for fast document sorting, lookup etc. I propose to use `parseFlexibleTime` and friends to uniformly parse `_hd_publication_date` into `_hd_publication_year` etc.

This might be a bit controversial: just look at all the modules under `G.C.T.C.API`, where every API parses dates in its own custom way.
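A rough sketch of what "uniformly parse" could look like is below. The signature of `parseFlexibleTime` is not shown in this issue, so `parseDateLenient` is a hypothetical stand-in built on the `time` package, and `PublicationDate` is an illustrative record standing in for the `_hd_publication_year` etc. fields. The list of accepted formats is an assumption as well.

```haskell
import Control.Applicative (asum)
import Data.Time (Day, defaultTimeLocale, parseTimeM, toGregorian)

-- Hypothetical stand-in for parseFlexibleTime: try each format in order
-- and return the first successful parse.
parseDateLenient :: String -> Maybe Day
parseDateLenient s =
  asum [ parseTimeM True defaultTimeLocale fmt s | fmt <- formats ]
  where
    -- Assumed formats, for illustration only.
    formats = ["%Y/%m/%d", "%Y-%m-%d"]

-- Illustrative record for the derived year/month/day fields.
data PublicationDate = PublicationDate
  { pubYear  :: Integer
  , pubMonth :: Int
  , pubDay   :: Int
  } deriving (Show, Eq)

-- One shared function that every API could call instead of
-- splitting the date string by hand.
toPublicationDate :: String -> Maybe PublicationDate
toPublicationDate s = do
  day <- parseDateLenient s
  let (y, m, d) = toGregorian day
  pure (PublicationDate y m d)
```

The point of the sketch is that the derived fields come from one parsed value, so the per-API modules would no longer need their own string handling.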

The `getPhyloDataJson` function in `G.C.V.P.API` calls `dot2json` every time a `GET` is requested. It is tempting to move that `dot2json` call to right after the worker finishes preparing the phylo. This also has the benefit of surfacing JSON errors quickly: sometimes the phylo is generated so that e.g. `edges` are missing in the JSON. That would be caught early and thrown as a worker error. Otherwise you get a seemingly correctly generated phylo and a 500 error when doing a `GET` request to view it.

**NOTE** It might be that the current phylo sometimes depends on e.g. the current time; in that case calling `dot` in `GET` makes sense.
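As a sketch of the "fail in the worker" idea, the check below would run once, right after conversion, and turn a missing key into a worker error. Everything here is hypothetical: the JSON is modelled as a plain list of top-level keys rather than the real `dot2json` output, `WorkerError` and `validatePhyloJson` are illustrative names, and only `edges` comes from the issue text (`nodes` is an assumed second key).

```haskell
import Data.Maybe (isNothing)

-- Hypothetical, simplified model of the converted JSON: top-level keys only.
type PhyloJson = [(String, String)]

-- Illustrative error type for the worker.
newtype WorkerError = MissingJsonKeys [String]
  deriving (Show, Eq)

-- Keys the viewer is assumed to need. "edges" is mentioned in the issue;
-- "nodes" is an assumption made for this sketch.
requiredKeys :: [String]
requiredKeys = ["nodes", "edges"]

-- Intended to run in the worker right after conversion: a phylo with
-- missing keys fails the job instead of failing later GET requests.
validatePhyloJson :: PhyloJson -> Either WorkerError PhyloJson
validatePhyloJson json =
  case filter (\k -> isNothing (lookup k json)) requiredKeys of
    []      -> Right json
    missing -> Left (MissingJsonKeys missing)
```

The `GET` handler would then only return the already validated value, unless the output really does depend on request-time data as noted above.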

Edited Aug 25, 2025 by Przemyslaw Kaminski