Phylo refactoring ideas
While working on #494, I noticed various things that could be fixed in Phylo. Since I don't want my commits in #494 to grow too large, I am writing up refactoring ideas in this issue.
- Fix the TimeUnit constructors. Currently they are of this form:
Epoch
{ _epoch_period :: Int
, _epoch_step :: Int
, _epoch_matchingFrame :: Int }
This results in lots of repetitive code, e.g. to get scale: https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blob/84bbdd5e762377c3eea535545389475a2d76932d/src/Gargantext/Core/Viz/Phylo/PhyloTools.hs#L172
I propose:
data TimeUnitCriteria = TimeUnitCriteria { period :: Int, step :: Int, matchingFrame :: Int }
and
data TimeUnit = Epoch TimeUnitCriteria | Year TimeUnitCriteria ...
This way the backend will mimic what is on the frontend: https://gitlab.iscpif.fr/gargantext/purescript-gargantext/blob/7477bfd1c997f711e0e0ecee263f0182b8486f3a/src/Gargantext/Components/PhyloExplorer/API.purs#L90-95
- When it comes to the frontend, there is a second TimeUnit. Could we just have one such data structure?
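To illustrate why the proposed TimeUnit shape removes the repetition, here is a minimal sketch. Only the two constructors named above are shown, and the helper names (criteria, timePeriod, timeStep, timeFrame) are made up for this example, not the existing PhyloTools functions.

```haskell
-- Parameters shared by every time unit (field names as proposed above).
data TimeUnitCriteria = TimeUnitCriteria
  { period        :: Int
  , step          :: Int
  , matchingFrame :: Int
  } deriving (Eq, Show)

-- Only the two constructors named in the proposal are listed here;
-- the remaining ones would follow the same pattern.
data TimeUnit
  = Epoch TimeUnitCriteria
  | Year  TimeUnitCriteria
  deriving (Eq, Show)

-- The single place that has to pattern-match on the constructor
-- (hypothetical helper).
criteria :: TimeUnit -> TimeUnitCriteria
criteria (Epoch c) = c
criteria (Year  c) = c

-- Every accessor becomes a one-liner instead of one case per constructor
-- (hypothetical names).
timePeriod, timeStep, timeFrame :: TimeUnit -> Int
timePeriod = period . criteria
timeStep   = step . criteria
timeFrame  = matchingFrame . criteria
```

With this shape, adding a new time unit means adding one constructor and one line in criteria. The accessors stay untouched.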
- As part of #494, I added functionality to G.U.DateUtils. In particular, there is parseFlexibleTime. The usual API workflow is this: fetch the date of the document, insert it into hyperdata _hd_publication_date, then parse yyyy/mm/dd manually, insert the parts into _hd_publication_year etc., and then in https://gitlab.iscpif.fr/gargantext/haskell-gargantext/blob/84bbdd5e762377c3eea535545389475a2d76932d/src/Gargantext/Database/Query/Table/Node/Document/Insert.hs#L280, convert _hd_publication_year etc. into the date column in the nodes table. It is a bit complex, though I understand that the date column is needed for fast docs sorting, lookup etc. I propose to use parseFlexibleTime and friends to uniformly parse _hd_publication_date into _hd_publication_year etc.
This might be a bit controversial: just look at all the modules under G.C.T.C.API, where every API parses dates in its own custom way.
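As a rough sketch of what "uniformly parse" could look like, the example below derives year, month and day from a single date string in one place. It does not use the real parseFlexibleTime (its signature is not shown in this issue). Instead it assumes a small, hypothetical parser built on the time package, and the PublicationDate type and accepted formats are illustrative only.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Time (Day, defaultTimeLocale, parseTimeM, toGregorian)

-- Hypothetical stand-in for the _hd_publication_year/month/day fields.
data PublicationDate = PublicationDate
  { pubYear  :: Int
  , pubMonth :: Int
  , pubDay   :: Int
  } deriving (Eq, Show)

-- Formats tried in order; this list is an assumption for the example.
acceptedFormats :: [String]
acceptedFormats = ["%Y-%m-%d", "%Y/%m/%d"]

-- Return the first successful parse, or Nothing if no format matches.
parseDay :: String -> Maybe Day
parseDay s =
  listToMaybe (mapMaybe (\fmt -> parseTimeM True defaultTimeLocale fmt s) acceptedFormats)

-- Single entry point every API module could share instead of
-- splitting the string by hand.
parsePublicationDate :: String -> Maybe PublicationDate
parsePublicationDate s = do
  d <- parseDay s
  let (y, m, dd) = toGregorian d
  pure (PublicationDate (fromInteger y) m dd)
```

Each API module would then only need to produce the raw date string, and the year/month/day fields would be filled by the shared function.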
- The getPhyloDataJson function in G.C.V.P.API calls dot2json every time a GET is requested. It is tempting to move that dot2json call to right after the worker finishes preparing the phylo. This also has the benefit of quickly finding any JSON errors: sometimes the phylo is generated so that e.g. edges are missing in the JSON. This would be caught fast and thrown as a worker error. Otherwise you get a correctly generated phylo and a 500 error when doing a GET request to view it.
NOTE: It might be that the current phylo sometimes depends on e.g. the current time. In that case calling dot in GET makes sense...
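The idea of converting and validating at worker time can be sketched as follows. All types and names here (Dot, PhyloJson, WorkerError, finishPhylo) are hypothetical stand-ins, the converter is passed in as a pure function for simplicity (the real dot2json may well run in IO), and the only required field checked is edges, since that is the failure mentioned above.

```haskell
-- Hypothetical stand-ins for the real dot and JSON representations.
newtype Dot = Dot String
newtype PhyloJson = PhyloJson [(String, String)]  -- top-level key/value pairs
  deriving (Eq, Show)

data WorkerError
  = ConversionFailed String  -- the dot-to-JSON step itself failed
  | MissingField String      -- conversion succeeded but a field is absent
  deriving (Eq, Show)

-- Top-level fields the viewer needs; only "edges" comes from this issue.
requiredFields :: [String]
requiredFields = ["edges"]

-- Reject JSON that lacks any required field.
validate :: PhyloJson -> Either WorkerError PhyloJson
validate j@(PhyloJson kvs) =
  case filter (`notElem` map fst kvs) requiredFields of
    []      -> Right j
    (f : _) -> Left (MissingField f)

-- Last step of the worker: convert once, validate, and hand back the
-- JSON to be stored. A GET would then only read the stored result.
finishPhylo :: (Dot -> Either String PhyloJson) -> Dot -> Either WorkerError PhyloJson
finishPhylo convert dot =
  either (Left . ConversionFailed) validate (convert dot)
```

With this split, a phylo with missing edges fails as a worker error at generation time rather than as a 500 on the first GET. If the output really depends on the current time, as the note above suggests, the conversion would have to stay in the GET path.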