haskell-gargantext issues
https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues
2024-03-25T10:35:58+01:00

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/335
[API search] When an external service is down (HAL or other), display a message with a more explicit text
2024-03-25T10:35:58+01:00, Fabien Maniere

This morning, the HAL services are down (including the API).
In this case, Gargantext displays this message as an error:
`ExternalAPIError (FailureResponse (Request {requestPath = (BaseUrl {baseUrlScheme = Https, baseUrlHost = "api.archives-ouvertes.fr", baseUrlPort = 443, baseUrlPath = ""},"/search"), requestQueryString = fromList [("q",Just "%28fr_title_t%3A%28grossesse%29%20OR%20fr_abstract_t%3A%28grossesse%29%29"),("fl",Just "docid%2Ctitle_s%2Cen_abstract_s%2CsubmittedDate_s%2Csource_s%2CauthFullName_s%2CauthOrganism_s"),("fq",Just "language_s%3Afr"),("start",Just "0"),("rows",Just "0")], requestBody = Nothing, requestAccept = fromList [application/json;charset=utf-8,application/json], requestHeaders = fromList [], requestHttpVersion = HTTP/1.1, requestMethod = "GET"}) (Response {responseStatusCode = Status {statusCode = 503, statusMessage = "Service Unavailable"}, responseHeaders = fromList [("content-length","107"),("cache-control","no-cache"),("content-type","text/html")], responseHttpVersion = HTTP/1.1, responseBody = "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"}))`
Is it possible to display a more explicit message for non-technical users?

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/334
Similarity Measure of Order 2: specification and implementations
2024-03-25T15:39:58+01:00, david Chavalarias

Several distributional measures have been tested in the literature (Weeds & Weir, 2003; 2005), one of the objectives being to find the terms that are the most structurally equivalent to a target term $`i`$ or that have a similar term-term co-occurrence distribution.
The performance of these measures also depends on the frequency of the target term.
## First implementation
Based on Weeds & Weir (2005), GarganText implements the Additive MI-based CRM metric, which has the best performance in the WordNet Prediction Task while performing fairly well at minimizing the α-skew Divergence Measure between the two distributions.
Here is the description of the Additive MI-based CRM metric, which is called Order2 in GarganText.
__Notations:__
* $`N`$ total number of documents
* $`n_{ij}`$ the number of co-occurrences of $`i`$ and $`j`$
* $`n_{i}`$ the number of documents containing $`i`$.
* $`I_{ik} = \log(\frac{\frac{n_{ik}}{N}}{\frac{n_{i}}{N}\times\frac{n_{k}}{N}}) = \log(\frac{N\times n_{ik}}{n_{i}\times n_{k}})`$ the mutual information between $`i`$ and $`k`$.
* $`sim_{mi}(i,j)=\frac{\Sigma_{k \neq i,j ; (I_{ik} >0 \wedge I_{jk} >0)} I_{jk}}{\Sigma_{k \neq i,j ; I_{jk} >0} I_{jk}}`$
__Warning:__
* For the numerator, beware of the condition $`I_{ik} >0 \wedge I_{jk}>0`$: we only sum over the terms $`k`$ that co-occur with both $`i`$ and $`j`$, i.e. $`n_{ik}>0`$ and $`n_{jk}>0`$
* The condition $`k \neq i,j`$ is important
## Alternative implementation
It would also be very interesting to implement the Difference-Weighted metric $`sim_{mi}^{dw}`$:
$`sim_{mi}^{dw}(i,j)=\frac{\Sigma_{k \neq i,j ; (I_{ik} >0 \wedge I_{jk} >0)} min(I_{jk},I_{ik})}{\Sigma_{k \neq i,j ; I_{jk} >0} I_{jk}}`$
Although the first one is state of the art for the retrieval of structurally equivalent terms, the task of the graph explorer is a little different, since we also try to provide a hierarchical organization of the terms. This second metric might be more suitable in this respect. Since the code should be quite similar to that of the metric above, I would like to make it available in the code for further tests in the dev version, and see whether it could be an interesting addition to the graph toolbox after the 0.0.7 release.
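
To make the two definitions concrete, here is a minimal Haskell sketch of both metrics. It is not the GarganText implementation: the `Counts` record, `pairKey`, `positiveMI`, `simWith` and `exampleCounts` are hypothetical names introduced for illustration, and the denominator is assumed to be the sum of $`I_{jk}`$ over all $`k \neq i,j`$ with $`I_{jk} > 0`$.

```haskell
import qualified Data.Map.Strict as M

type Term = String

-- | Hypothetical container for the counts used in the notations above.
data Counts = Counts
  { totalDocs :: Double                    -- ^ N
  , occ       :: M.Map Term Double         -- ^ n_i
  , cooc      :: M.Map (Term, Term) Double -- ^ n_ij, keyed by 'pairKey'
  }

-- | Order-independent key, since n_ij = n_ji.
pairKey :: Term -> Term -> (Term, Term)
pairKey a b = (min a b, max a b)

-- | Positive part of I_ik = log (N * n_ik / (n_i * n_k)).
-- Returns 0 when the terms never co-occur or when I_ik <= 0.
positiveMI :: Counts -> Term -> Term -> Double
positiveMI c i k =
  case (M.lookup (pairKey i k) (cooc c), M.lookup i (occ c), M.lookup k (occ c)) of
    (Just nik, Just ni, Just nk)
      | nik > 0 && ni > 0 && nk > 0 -> max 0 (log (totalDocs c * nik / (ni * nk)))
    _ -> 0

-- | Shared skeleton: the numerator weights each k for which both I_ik and
-- I_jk are positive; the denominator sums the positive I_jk (assumption).
simWith :: (Double -> Double -> Double) -> Counts -> Term -> Term -> Double
simWith weight c i j
  | den == 0  = 0
  | otherwise = num / den
  where
    ks  = [ k | k <- M.keys (occ c), k /= i, k /= j ]   -- condition k /= i,j
    num = sum [ weight iik ijk
              | k <- ks
              , let iik = positiveMI c i k
              , let ijk = positiveMI c j k
              , iik > 0, ijk > 0 ]
    den = sum [ positiveMI c j k | k <- ks ]

-- | Additive MI-based metric (Order2) and its difference-weighted variant.
simMI, simMIdw :: Counts -> Term -> Term -> Double
simMI   = simWith (\_ ijk -> ijk)
simMIdw = simWith min

-- | Made-up toy data, for illustration only.
exampleCounts :: Counts
exampleCounts = Counts
  { totalDocs = 100
  , occ  = M.fromList [("a",10),("b",10),("x",20),("y",20),("z",50)]
  , cooc = M.fromList [ (pairKey "a" "x", 5), (pairKey "b" "x", 4)
                      , (pairKey "b" "y", 4), (pairKey "a" "z", 5)
                      , (pairKey "b" "z", 5) ]
  }
```

With this toy data, `simMI exampleCounts "a" "b"` evaluates to 0.5: only `x` has positive mutual information with both terms, while `b` also has positive mutual information with `y`. Both metrics share `simWith`, so the difference-weighted variant only swaps the weighting function.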
__References :__
* Weeds, Julie, and David Weir. 2003. "A general framework for distributional similarity". In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 81-88.
* Weeds, Julie, and David Weir. 2005. "Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity". Computational Linguistics 31 (4): 439-75. https://doi.org/10.1162/089120105775299122.

Epic 0.0.7, delanoe

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/333
[Node Corpus] Creating a corpus from an empty Notes node makes a big document with HTML code instead of simple text
2024-03-20T14:15:48+01:00, Fabien Maniere

### To reproduce:
- create a corpus
- in this corpus, create a Notes node, and leave it empty
- click on create "fromWriteDocuments"
- we get a single Doc with HTML code (see the screenshot below)
### Example of the single document created:
![image](/uploads/530f8fdeafe22983473952abefb5c448/image.png)

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/332
[Node Corpus] Creating a corpus from Notes node with "NoList" option generates an error
2024-03-20T14:09:34+01:00, Fabien Maniere

### To reproduce:
- create a corpus node
- inside it, create a Notes node and paste any text content from any source
- in the corpus settings, choose the button ![image](/uploads/26b92f8e55b76d97e5ff71e30a60c148/image.png) "WriteNodesDocuments" and select:
  - "**FR**" file lang
  - "**NoList**" list selection (<-- this option seems to be the cause of the error, because when we choose "My list first", there's no error)
![image](/uploads/0f67f5fe2ba4e43d9a6be00362f27871/image.png)
### We get this error:
`SendResponseError (ResponseBodyError (ForeignError "Unexpected token 'E', \"Error in $\"... is not valid JSON") (rf)`
![image](/uploads/f7e7d50471bf352e99b3a6e8728699a1/image.png)

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/331
Sort by terms is not language-aware
2024-03-21T08:14:34+01:00, Przemyslaw Kaminski

I have French terms, and when sorting by terms I get `période`, `vue`, `âge`, `étude`, etc., whereas `âge` and `étude` should come before `période` and `vue`.

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/330
[Node terms] institutes missing with HAL request
2024-03-26T15:42:21+01:00, mzheng

When working with the HAL API and a specific institute like this one:
![image](/uploads/19e2c020e16b0306913a7e030c02195b/image.png)
We get something like this in the institute tab:
![image](/uploads/4a08e6fabe0273a149191c33b519b297/image.png)
It should at least display the same number of documents coming from Mines Alès as the number of documents

mzheng

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/329
Output of `toPhylo` & co non-deterministic?
2024-03-25T15:49:32+01:00, Alfredo Di Napoli

This ticket requires more investigation, but while I was working on https://gitlab.iscpif.fr/gargantext/purescript-gargantext/issues/632, I wanted to add a regression test checking that the output of `PhyloExport` stays the same between the rounds of refactoring I was doing.
To do that, I added the following test, which checks the output against a golden file:
```hs
testPhyloExportExpectedOutput :: Assertion
testPhyloExportExpectedOutput = do
  -- Acquire the config from the golden file.
  expected_e <- JSON.eitherDecodeFileStrict' =<< getDataFileName "test-data/phylo/112828.json"
  case expected_e of
    Left err -> fail err
    Right (pd :: PhyloData) -> do
      let goldenCfg = pd_config pd
      corpusPath' <- getDataFileName "test-data/phylo/GarganText_DocsList-nodeId-112828.csv"
      listPath'   <- getDataFileName "test-data/phylo/GarganText_NgramsList-112829.csv"
      let config  = goldenCfg { corpusPath = corpusPath'
                              , listPath   = listPath'
                              , listParser = V3
                              }
      mapList <- fileToList (listParser config) (listPath config)
      corpus  <- fileToDocsDefault (corpusParser config)
                                   (corpusPath config)
                                   [Year 3 1 5,Month 3 1 5,Week 4 2 5]
                                   mapList
      -- Recompute the phylo from the CSV inputs and decode the exported JSON.
      actual_e <- JSON.parseEither JSON.parseJSON <$> phylo2dot2json (toPhylo $ toPhyloWithoutLink corpus config)
      case actual_e of
        Left err -> fail err
        -- NB: 'expected' is bound to the freshly computed output, while
        -- 'actualJSON' is rendered from the golden file ('pd_data pd').
        Right (expected :: GraphData) -> do
          -- Sort object keys so that the comparison does not depend on key order.
          let prettyConfig = JSON.defConfig { JSON.confCompare = compare }
          let actualJSON   = TE.decodeUtf8 (BL.toStrict $ JSON.encodePretty' prettyConfig $ pd_data pd)
          let expectedJSON = TE.decodeUtf8 (BL.toStrict $ JSON.encodePretty' prettyConfig $ expected)
          assertBool (show $ ansiWlEditExpr $ ediff' expectedJSON actualJSON) (expectedJSON == actualJSON)
```
To my surprise, this test **randomly fails** sometimes. At first I thought it was due to the fact that `JSON.encode` doesn't produce _sorted objects_ (especially since aeson 2.x switched from `containers` to `unordered-containers`) but I have mitigated that by using the `encodePretty'` function from `aeson-pretty` that can produce JSON objects which have sorted keys.
Despite that, I still get some random failures. Observe this excerpt, that is using the `tree-diff` library to show a diff-like counterexample:
```
-" \"name\": \"Period20062008\",\n",
+" \"name\": \"Branches peaks\",\n",
" \"nodes\": [\n",
-" 19\n",
+" 18\n",
" ],\n",
" \"nodesep\": \"1\",\n",
" \"overlap\": \"scale\",\n",
" \"phyloBranches\": \"1\",\n",
" \"phyloDocs\": \"72.0\",\n",
" \"phyloFoundations\": \"221\",\n",
" \"phyloGroups\": \"9\",\n",
" \"phyloPeriods\": \"17\",\n",
" \"phyloSources\": \"[]\",\n",
" \"phyloTerms\": \"95\",\n",
" \"phyloTimeScale\": \"year\",\n",
" \"rank\": \"same\",\n",
" \"ranksep\": \"1\",\n",
" \"ratio\": \"fill\",\n",
" \"splines\": \"spline\",\n",
" \"style\": \"filled\"\n",
" },\n",
" {\n",
" \"_gvid\": 1,\n",
" \"bb\": \"0,0,1224.6,2787\",\n",
" \"color\": \"white\",\n",
" \"fontsize\": \"30\",\n",
" \"label\": \"Phylo Name\",\n",
" \"labelloc\": \"t\",\n",
" \"lheight\": \"0.47\",\n",
" \"lp\": \"612.32,2766.2\",\n",
" \"lwidth\": \"2.07\",\n",
-" \"name\": \"Period20072009\",\n",
+" \"name\": \"Period20062008\",\n",
" \"nodes\": [\n",
-" 20\n",
+" 19\n",
" ],\n",
```
It's a bit hard to read, so let's try with a screenshot:
![Screenshot_2024-03-19_at_08.29.31](/uploads/c42d3e27666a184a9e7b892273fa00ea/Screenshot_2024-03-19_at_08.29.31.png)
I'm not sure exactly how the algorithm is meant to work, but I would have expected it to generate a predictable list of nodes and edges, especially since it's meant to be mostly (completely?) pure.
Now, there are a few aspects to consider here:
* As said, this requires further investigation, but it might explain why we have seen those [intermittent failures](https://gitlab.iscpif.fr/gargantext/haskell-gargantext/merge_requests/254) on the tests I added a while ago;
* **Important**: the two runs seem to contain the same output, it's just that it's somehow wrongly "correlated" (for example, nodes and labels for a given node are swapped between runs, etc.), so it seems there is some sort of effectful computation that is generating a list or something similar in a potentially unpredictable order.
* It would be nice to reduce the test area further so that we can replicate this for smaller phylos and for code that doesn't use the export code (i.e. too many variables at play at once).
@anoe I think it would be nice to do some extra digging on this in due course, as having a strong test suite for Phylo is very important, considering that it's one of the primary sources of issues (related to both the frontend and the backend).

Alfredo Di Napoli

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/328
Fix `cabal build` compilation warnings
2024-03-18T09:27:12+01:00, Alfredo Di Napoli

We have a few compilation warnings when building the project with `cabal build`:
```
In order, the following will be built (use -v for more details):
- gargantext-0.0.6.9.9.9.6.7 (exe:gargantext-server) (first run)
Warning: gargantext.cabal:819:3: Unknown field: "optimization"
Warning: gargantext.cabal:68:3: Unknown field: "optimization"
Configuring executable 'gargantext-server' for gargantext-0.0.6.9.9.9.6.7..
Warning: Languages listed as extensions: GHC2021. Languages must be specified
in either the 'default-language' or the 'other-languages' field.
Preprocessing executable 'gargantext-server' for gargantext-0.0.6.9.9.9.6.7..
Building executable 'gargantext-server' for gargantext-0.0.6.9.9.9.6.7..
```
In particular:
* The `optimization` field is misplaced; it needs to be placed inside the [cabal.project](https://cabal.readthedocs.io/en/3.4/cabal-project.html?highlight=optimization#cfg-field-optimization), not the cabal file!
* `GHC2021` is listed as an extension (which it technically is), but it's more than that; it needs to be placed in the `default-language` field, as instructed by `cabal`.

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/327
[API search HAL] On the HAL api, launching the exact same request several times can give different results
2024-03-19T07:09:19+01:00, Fabien Maniere

Here is the observed behaviour:
- the request is the keyword "grossesse" in all_orgs, in FR language
![image](/uploads/10b795d30b146c1154ee2d88c6d2bb4d/image.png)
- the 1st request from Europa returned 0 results
- the 2nd request from DEV returned 2501 results
- the 3rd request from Europa (new corpus) returned 2683 results
![image](/uploads/8fd90def1d19d81537ddaab690c959e0/image.png)

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/326
[Email notification] Make changes in the text of the email notification sent when a job is done
2024-03-11T10:55:52+01:00, Fabien Maniere

The new text is to be determined.
Here is the current text (from the cnrs instance on this screenshot):
![image](/uploads/638f8d9560d1b166dde398b88ec2d175/image.png)

Fabien Maniere

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/325
Route to Export Cooc Matrix
2024-03-06T11:06:13+01:00, delanoe

- To debug and work.
- On the Node List, download either the list itself or the cooc matrix as zipped JSON

Epic 0.0.7, Przemyslaw Kaminski

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/324
Coherent Stemming interface
2024-03-14T10:33:46+01:00, Alfredo Di Napoli

Stepping stone towards fixing https://gitlab.iscpif.fr/gargantext/purescript-gargantext/issues/633
To summarise the context, we would like to have better control over our queries; at the moment we get 0 results for searches like "postpartum" on corpus documents, either because Postgres' built-in full-text-search stemming isn't enough or because we stem with the "wrong" algorithm for the query at hand (i.e. Porter vs. Lancaster). In the case of "postpartum", while the Porter algorithm cannot stem it further, Lancaster would stem it into `postpart`, which means that if we use the `:*` prefix syntax in Postgres and search for `to_tsquery('postpart:*')`, we would get results.
This ticket outlines a direction of travel. At the moment our stemming interface is a bit all over the place:
1. We have a `Gargantext.Core.Text.Terms.Mono.Stem` module which exposes a `stem` function, but this function uses the `stem` function from the [stemmer](https://hackage.haskell.org/package/stemmer) package, which is deprecated in favour of [snowball](https://hackage.haskell.org/package/snowball), which uses a C library (and implements the porter algorithm), but when I tried to use it, it segfaulted;
2. We have a porter implementation sitting at `Gargantext.Core.Text.Terms.Mono.Stem.En`, but this is used randomly, for example as part of `Gargantext.Database.Action.Search.searchInCorpus` instead of the "main" interface;
3. We have poor support for languages, also because our `Lang` type includes an `All` data constructor, which makes it annoying to have a total mapping between a `Lang` and a stemming algorithm;
4. We might want to pick a different algorithm for different contexts, for example we might want to have an "expert view" in our corpus search and run searches with different stemming strategies, and compare the results.
## Proposal
My proposal is as follows:
* Let's refactor `Gargantext.Core.Text.Terms.Mono.Stem` so that it exposes a single, nicely encapsulated abstract function:
```hs
stem :: Lang -> StemmingAlgorithm -> T.Text -> T.Text
...
data StemmingAlgorithm
  = -- | Use the 'porter' implementation for gargantext.
    Porter
    -- | Use the 'stemmer' implementation from the 'stemmer' package.
  | Stemmer
    -- | Use Lancaster stemming.
  | Lancaster
```
This means that every request for stemming a word needs to spell out, concretely:
a. The language;
b. The algorithm.
If we want, we could have a helper function which would default to our built-in `porter` algorithm if the language is English, or switch to one of the `stemmer` algos for other languages.
I would suggest refactoring the `Lang` type to get rid of the `All` constructor. Since it already has `Enum` and `Bounded` instances, we can recover the `All` semantics by doing:
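
As a rough illustration of such a helper, here is a minimal sketch. The three-constructor `Lang`, the `Stem` type synonym, `defaultAlgorithm` and `stemDefault` are hypothetical names introduced for the example, and the real `stem` function is passed in as an argument rather than reimplemented.

```haskell
import qualified Data.Text as T

-- | Hypothetical subset of languages, without the 'All' constructor.
data Lang = EN | FR | DE
  deriving (Show, Eq, Ord, Enum, Bounded)

data StemmingAlgorithm = Porter | Stemmer | Lancaster
  deriving (Show, Eq)

-- | Type of the proposed 'stem' function.
type Stem = Lang -> StemmingAlgorithm -> T.Text -> T.Text

-- | Total mapping from a language to its default algorithm:
-- the built-in Porter for English, 'stemmer' for everything else.
defaultAlgorithm :: Lang -> StemmingAlgorithm
defaultAlgorithm EN = Porter
defaultAlgorithm _  = Stemmer

-- | Helper for callers that do not want to choose an algorithm explicitly.
stemDefault :: Stem -> Lang -> T.Text -> T.Text
stemDefault stem lang = stem lang (defaultAlgorithm lang)
```

Callers that need a specific strategy (for example an "expert view" comparing Porter and Lancaster) would keep calling `stem` directly, while everything else goes through `stemDefault`.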
```hs
allLangs :: [Lang]
allLangs = [minBound .. maxBound]
```
If `All` is needed as input for a query that we perform from the backend, then I would suggest that we create newtype wrappers so that what we use in the frontend is decoupled from the concrete backend type we end up working with. Getting rid of `All` means that we can precisely map each language to a particular stemming algorithm, which would also solve the problem that, at the moment, we assume that most of the document corpus is in English and therefore do not use the correct stemming algorithm where we should.
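
One possible shape for that decoupling is sketched below. It uses a small frontend-facing sum type rather than a newtype, purely as an illustration; `LangFilter`, `AnyLang`, `OnlyLang` and `resolveLangs` are hypothetical names, and `Lang` is again a reduced stand-in for the real type.

```haskell
-- | Hypothetical backend type, without the 'All' constructor.
data Lang = EN | FR | DE
  deriving (Show, Eq, Ord, Enum, Bounded)

-- | Hypothetical frontend-facing query parameter: "all languages" is a
-- property of the request, not a language in its own right.
data LangFilter
  = AnyLang
  | OnlyLang Lang
  deriving (Show, Eq)

-- | Resolve the request parameter into concrete backend languages.
resolveLangs :: LangFilter -> [Lang]
resolveLangs AnyLang      = [minBound .. maxBound]
resolveLangs (OnlyLang l) = [l]
```

Each resolved `Lang` can then be mapped to a stemming algorithm without any partial pattern match.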
@anoe What do you think?

Alfredo Di Napoli

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/323
[Node Team] When we invite a user, the user email address should be checked without unwanted characters before being sent
2024-03-04T12:08:10+01:00, Fabien Maniere

Here is an example (" ' ") of unwanted characters in a username, which prevent the email from being sent:
![image](/uploads/6e126a9d001421a1b79ff0da684813de/image.png)
We need a validation rule applied to the email string.

Fabien Maniere

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/322
Evaluating build systems: `nix`+`cabal.project.freeze` vs. `nixpkgs.haskellPackages` vs. `nix-build-cabal-project` vs. `haskell.nix`
2024-03-12T06:33:21+01:00, Julien Moutinho

This issue springs from my three attempts at writing a `flake.nix` for gargantext.
> Nix flakes is an experimental feature of the Nix package manager. Flakes was introduced with Nix 2.4 (see release notes).
>
> Flakes is a feature of managing Nix packages to simplify usability and improve reproducibility of Nix installations. Flakes manages dependencies between Nix expressions, which are the primary protocols for specifying packages. Flakes implements these protocols in a consistent schema with a common set of policies for managing packages.
https://nixos.wiki/wiki/Flakes
# Motivation
Lately, you've likely read me being noticeably wary of some dependencies
pulled by gargantext, especially ad-hoc ones or patched ones.
This wariness sprang from my attempts to govern all dependencies
with the best tool I currently know
for clarifying and managing dependencies: `flake.nix`.
Yet I'd like to stress that the most important problem I'd like to face
here is not advocating for one flavor of the `nix` build system over another
(both still have pros and cons), but advocating for managing all Haskell dependencies
with `nix` instead of using `cabal-install`,
since, as I shall explain below, using `nix` is better for the concern
I personally often cherish the most: software correctness.
It just turns out that managing all Haskell dependencies
with `nix` is by far the most difficult obstacle to overcome
to be able to unlock the pros of a `flake.nix`.
Therefore, since I've just rebased my previous iteration of `flake.nix`
onto the latest `dev` and `nixpkgs-23.11`,
I thought it would be a good time to spell out more thoroughly
the pros and cons of a [`nixpkgs.haskellPackages`](https://haskell4nix.readthedocs.io/search.html)-based `flake.nix`
with respect to any other legitimate software quality concerns.
To proceed in a methodical and exhaustive way, I thought I would first let myself be guided by the [generic concerns](https://blog.codacy.com/iso-25010-software-quality-model) listed in the [ISO/IEC 25010 Software Quality Model](https://www.evs.ee/en/search?query=25010%3A2011&Search=Search), staying narrowly on the topic of each of them,
but that should not keep you from raising (in the discussion below or out-of-band)
any other concern that you may have, be it related to software quality or out-of-code concerns.
Again let me stress that except for the narrow concern of "correctness",
I am not pretending that using `flake.nix`,
and more precisely *the* `flake.nix` based upon [`nixpkgs.haskellPackages`](https://haskell4nix.readthedocs.io/search.html)
I'm proposing in https://gitlab.iscpif.fr/gargantext/haskell-gargantext/merge_requests/258 is the best way to go,
nor that I've assessed correctly the impacts it could have on those other concerns.
In the end, I am happy to leave that to you ;)
#### Meta remark
Sorry, I am aware this is too verbose, but this is the best I can do to convince others,
and to force myself to recognize and maybe address concerns I'd rather not.
I will modify it in place and post a brief summary if need be,
but if that proves to be too hard or confusing,
I may switch to a standalone Markdown document,
trackable as easily as any other piece of code.
## Functional Suitability
### Functional Completeness
<!-- Evaluates the inclusivity of functions covering all designated tasks and user objectives without omission. -->
As an improved `nix`+`cabal.project.freeze` build system could, a `nixpkgs.haskellPackages`-based build system
can manage all dependencies (Haskell and non-Haskell),
but the latter enforces this when used to build the resulting gargantext package
(instead of just the development shells).
For the record, the current `nix`+`cabal.project.freeze` build system only manages the executable ones with `nix`: `ghc947`, `cabal_install_3_10_1_0`, `haskellPackages.alex`, `haskellPackages.happy` and `haskellPackages.pretty-show`.
### Functional Correctness
<!-- Evaluates the product's accuracy in delivering precise outcomes, aligning precisely with the required level of precision. -->
Both a `nix`+`cabal.project.freeze` build system and a `nixpkgs.haskellPackages`-based build system
enforce the precise pinning of **all** inputs, which improves the delivery of precise outputs, which is essential to correctness.
Side note: `cabal-install` is written in Haskell and `nix` in C++, but the former is likely less battle-tested than the latter.
However, contrary to the `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
forces one to specify some expectations about the outputs of all the dependencies,
and most notably what to expect from running their tests (either `doCheck` or `dontCheck`, or which precise tests must not be run).
With the current `nix`+`cabal.project.freeze` build system,
the Haskell dependencies are managed with `cabal-install`
which does not run their `test-suite`s systematically:
[`cabal-install`'s `run-tests` defaults to `False`](https://cabal.readthedocs.io/en/latest/cabal-project-description-file.html#cfg-field-run-tests),
so to run the `test-suite`s of the whole closure of Haskell dependencies
I guess one could use a custom `--config-file=` containing `run-tests: True`,
yet that:
- would not be retroactive on already installed Haskell packages,
- would either force every installing user to run the `test-suite`s
  or make it optional (hence forgettable),
- and would not give a means to specify which `test-suite`s, let alone which specific tests,
  are expected to fail at that point.
IMHO both systematic testing of dependencies and specifying failing tests
*must* be done, either with `nix`+`cabal.project.freeze`+`run-tests` or `nixpkgs.haskellPackages`.
But doing so means writing Nix code to exhaustively build, test and fix Haskell packages,
which was the most difficult part of writing a `nixpkgs.haskellPackages`-based `flake.nix` for gargantext anyway.
As a motivating example demonstrating that the problem is not only hypothetical
but actually occurring, doing such systematic checking while writing the demo `nixpkgs.haskellPackages`-based `flake.nix`
is what uncovered a few [failing tests](https://github.com/adinapoli/text16-compat/issues/1) in our own `text16-compat`,
which may or may not cause another failing test (`/Unicode and Regex Test/`)
in `duckling`, which was not failing 3 months ago, when `text-2` and `text16-compat` were not yet used.
### Functional Appropriateness
<!-- Examines the effectiveness of functions in accomplishing designated tasks and objectives within the intended context. -->
As an improved `nix`+`cabal.project.freeze` build system could, a `flake.nix` build system
gathers in a single tool (`nix`) all the tasks
and objectives of the current `nix`+`cabal.project.freeze` build system and more:
- providing development shell**s**:
  - `nix -L develop`
  - `nix -L develop .#prof-trace` (alternative development shell)
  - `nix -L develop .#haskellPackages.boolexpr` (development shell for a specific dependency)
  - …
- package**s**:
  - `nix -L build .#gargantext` (building the gargantext package)
  - `nix -L build .#haskellPackages.boolexpr` (building only a specific dependency)
  - …
- source code formatters (e.g. with the help of [pre-commit-hooks](https://github.com/cachix/pre-commit-hooks.nix)).
- applications (build or test scripts, or actual end-user programs).
- Nixpkgs overlays (propagated modification of a dependency).
- NixOS configurations.
- checks.
- …
`cabal-install` remains responsible for building Haskell packages
(mostly for invoking the necessary `ghc` calls with the right parameters) under the hood,
but it no longer selects or fetches Haskell dependencies.
## Reliability
<!-- Focuses on the dependability of a system, product, or component in executing predefined functions under stipulated conditions. -->
### Maturity
<!-- Evaluates the readiness of a system, product, or component to meet reliability needs satisfactorily. -->
Compared to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
is less mature: it is actually still guarded behind a feature flag,
usually set in `~/.config/nix/nix.conf` or `/etc/nix/nix.conf`:
```
experimental-features = nix-command flakes
```
See this June 2023 discussion: [Why are flakes still experimental?](https://discourse.nixos.org/t/why-are-flakes-still-experimental/29317):
> Note that there is also a bit of controversy around flakes. The feature has been implemented without going through the RFC process, lacking input from community. As a result part of the community rejected flakes.
> Many also feel that it is a too big monolithic change to the Nix language, parts of which could just as well be implemented outside of Nix (see e.g. niv for managing and pinning sources) and promoting one true way will stifle innovation. Having less controversial features like nix-command and evaluation caching tied to flakes is also considered unfortunate.
> However, there is currently a new RFC to try to resolve this situation:
> https://github.com/NixOS/rfcs/pull/136 (merged)
### Availability
<!-- Assesses the operational state and accessibility of a system, product, or component. -->
Contrary to the `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
has only been available since [nix-2.4 (2021-11-01)](https://nixos.org/manual/nix/unstable/release-notes/rl-2.4.html).
### Fault Tolerance
<!-- gauges the system's operational continuity despite potential hardware or software faults. -->
I currently see no visible difference on that concern.
### Recoverability
<!-- evaluates the system's capability to retrieve data following interruptions or failures. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
handles Haskell dependencies, and like all nix dependencies they are built in a temporary sandbox by default,
meaning that when a dependency fails to build or to pass its test-suite(s),
it has to be fixed or worked around, which in both cases means rebuilding it from the beginning.
The same applies when patching a dependency: although there is [work to support incremental builds](https://github.com/NixOS/nixpkgs/pull/204020), it is not [yet morally-supported by upstream's nix](https://github.com/NixOS/nix/pull/7362#issuecomment-1564251964).
## Performance Efficiency
<!-- Performance Efficiency involves the optimization of resource utilization concerning the performance output of a system or product. -->
### Time Behavior
<!-- focuses on the system's response, processing times, and throughput rates during operational phases. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
has an evaluation cache in `~/.cache/nix/` for flake outputs,
[speeding up significantly](https://nixos.wiki/wiki/Flakes#Super_fast_nix-shell) the spawning of development shells, for instance. This also means that if a build fails, re-running it without modifying anything it depends on will fail without even trying to rebuild; hence adding `--show-trace` to get a nix stack trace will not work without modifying the evaluated nix expression. It's not a problem, just something to know.
In both cases, using `nix` to manage both Haskell and non-Haskell dependencies makes it possible to fetch the compiled outputs from https://cache.nixos.org (or from a [custom cache](http://cachix.org/) specific to gargantext, possibly populated by the CI) instead of having every integrator-user recompile Haskell dependencies when installing.
### Resource Utilization
<!-- concerns the effective utilization of resources, such as CPU, memory, and network bandwidth, during system operation. -->
Contrary to the `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
uses more disk space, as it always copies every changed input to the Nix store.
By making it easier to update inputs (especially `nixpkgs`),
it will also:
- download more versions of them,
  which can be mitigated by running `nix flake lock --override-input nixpkgs github:NixOS/nixpkgs/<a commit-revision-already-downloaded-previously>` at the expense of reproducibility,
- keep more versions of them in the Nix store,
  which can be mitigated by (manually) running `nix-collect-garbage`.
Moreover, when the outputs are registered as a nix garbage-collector root
(eg. as a `result` symlink when using `nix build`, or when using the `direnv` integration),
those roots (listable with `nix-store --gc --print-roots`) may be a pain to remove by hand.
So when using `direnv`, I recommend putting the following
in `~/.config/direnv/direnvrc` to gather all the GC roots in a single directory
where it's easier to remove them when we no longer need them:
```bash
: ${XDG_CACHE_HOME:=$HOME/.cache}
declare -A direnv_layout_dirs
direnv_layout_dir() {
  echo "${direnv_layout_dirs[$PWD]:=$(
    echo -n "$XDG_CACHE_HOME"/direnv/layouts/${PWD##*/}-
    echo -n "$PWD" | shasum | cut -d ' ' -f 1
  )}"
}
```
The impact on the CI should also be assessed:
for instance, if it's using Docker containers, these could likely be replaced by NixOS VMs,
which could use much less disk space because they would share dependencies at the package level,
not the container level.
Besides, since flakes compose, their inputs may proliferate in cascade, causing many different versions of the same input (e.g. `nixpkgs`) to be pulled. This must be prevented by using the `follows` attribute of inputs. See the "Interoperability" section below.
### Capacity
<!-- evaluates the system's maximum limits concerning parameters and its ability to meet them adequately. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
restricts the inputs: when building an output package, no unspecified input can be accessed.
## Usability
<!-- Usability assesses the ease and effectiveness users can achieve predefined goals using a product or system. -->
### Appropriateness Recognizability
<!-- examines the user's ability to discern the product's suitability for their requirements. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
can hardly be considered an ad-hoc architecture:
it follows an [established schema](https://nixos.wiki/wiki/Flakes#Flake_schema),
making it easier to organize and document goals
in a way familiar to other Nix flake users.
There's also `nix flake show` to print what's implemented on what system,
but in simple cases it's enough and easier to just read `flake.nix`.
### Learnability
<!-- evaluates the ease of learning to use the product or system effectively, particularly in emergencies. -->
Coming from the current `nix`+`cabal.project.freeze` build system, there are mainly two things to learn:
- The [`flake.nix` schema](https://nixos.wiki/wiki/Flakes#Flake_schema) used to organize inputs and outputs in a consistent and coherent Nix expression. That is quite simple and should not cause pain. Besides, learning it is an amortizable cost, as more and more projects are using it to manage package inputs and outputs. However, the `flake.nix` way is more recent, hence may not be the way presented in online documentation and other resources.
- The [Nixpkgs Haskell infrastructure](https://haskell4nix.readthedocs.io/search.html), used to replace `cabal-install`'s dependency management. Though it's already used by the current `nix`+`cabal.project.freeze` build system for specific packages (the executable ones), using a `nixpkgs.haskellPackages`-based build system would require a more thorough understanding of how to do things that are usually disabled (tests) or done with `Git` (patching) or `cabal.project` (pinning). However, I think that the proposed `flake.nix` already has examples to learn from to solve most of the problems that may arise.
### Operability
<!-- measures the ease of operation and control of the product or system. -->
### User Error Protection
<!-- gauges the system's safeguards against user errors to minimize their occurrence and impact. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
gives the ability to update all inputs in one command (`nix flake update`),
which helps a lot to avoid forgetting to update a pinned forked repository,
and warns when an external upstream has been updated
(except when the input is pointing to an ad-hoc and unmaintained branch instead of `main`/`master`;
that's why I prefer to put yet-unmerged changes in `patches/`
and apply them on top of upstream's `main` with `mkDerivation`'s `patches` or nixpkgs's `applyPatches`,
which will raise an error when upstream has merged the change or has broken it).
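
As a minimal sketch of that patching approach, the expression below applies a local patch on top of an upstream source and builds the result. The input name `some-dep` (assumed to be declared with `flake = false` and to track upstream's `main`) and the patch path are hypothetical, chosen only for illustration:

```nix
# Hypothetical helper file, called with the evaluated `pkgs` and the flake's `inputs`.
{ pkgs, inputs }:
let
  # Apply the local patch on top of upstream's sources;
  # the build fails if the patch no longer applies.
  patchedSrc = pkgs.applyPatches {
    name = "some-dep-patched";
    src = inputs.some-dep; # hypothetical input name
    patches = [ ./patches/some-dep/fix-build.patch ]; # hypothetical patch
  };
in
# Generate the Nix expression from the patched Cabal file and build it.
pkgs.haskellPackages.callCabal2nix "some-dep" patchedSrc { }
```

With this layout, running `nix flake update` moves `some-dep` to upstream's latest commit, and the next build reports whether the patch still applies.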
### User Interface Aesthetics
<!-- evaluates the aesthetic appeal of the user interface and its impact on user engagement. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
does not need ad-hoc scripts, or manually fetching and modifying hardcoded commits in `cabal.project`'s `source-repository-package`,
to update dependencies: the `nix flake` subcommand automates all that,
yet still lets us pin them manually if need be.
It is a unified way of managing dependencies, removing the responsibility to select, pin and update dependencies
from underlying build tools like `cabal-install`
(which only keeps the task of building the Haskell code).
### Accessibility
<!-- evaluates the product's usability across various user characteristics and capabilities. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
is not commonly used by other developers, and it's not clear to me whether or not
most Nix users are already used to flakes, which are only 3 years old
and still require enabling an experimental feature flag (`experimental-features = nix-command flakes`) in `~/.config/nix/nix.conf` or `/etc/nix/nix.conf`.
This means that any basic usage of `flake.nix` that comes to be required
must be carefully documented in gargantext's own documentation.
## Security
<!-- Security refers to protecting information and data from potential security vulnerabilities. -->
### Confidentiality
<!-- focuses on ensuring that data remains accessible only to authorized individuals. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
forcibly copies every input into the Nix store.
Because the Nix store is readable by all Unix users,
this can be problematic on shared computers if secrets are used in inputs.
### Integrity
<!-- evaluates the system's capability to prevent unauthorized access or modification to data and programs. -->
I do not see any difference for that concern.
### Non-repudiation
<!-- ensures that actions or events can be irrefutably proven to have occurred. -->
I do not see any difference for that concern.
### Accountability
<!-- refers to the traceability of unauthorized actions back to their originator. -->
I do not see any difference for that concern.
### Authenticity
<!-- concerns the verification of a subject or resource's identity. -->
I do not see any difference for that concern.
## Compatibility
<!-- Compatibility assesses a product, system, or component's ability to exchange information and perform its functions seamlessly within a shared hardware or software environment. -->
### Co-existence
<!-- evaluates a product's ability to operate efficiently alongside other products without adverse effects. -->
A `nix`+`cabal.project.freeze` build system can co-exist with a `nixpkgs.haskellPackages`-based `flake.nix` build system,
but supporting both of them beyond a demo would be a duplication of effort.
### Interoperability
<!-- examines the seamless exchange of information and its utilization across multiple systems and software components. -->
Contrary to the `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
enables users to change inputs without modifying gargantext's `flake.nix`
using the `follows` attribute.
For instance this is how we can use a single version of `nixpkgs`
instead of two (gargantext's and pre-commit-hooks'):
```
inputs = {
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
  pre-commit-hooks.url = "github:cachix/pre-commit-hooks.nix";
  pre-commit-hooks.inputs.nixpkgs.follows = "nixpkgs";
};
```
The same could be done by someone importing gargantext's `flake.nix`
to run their own instance while using their own version of `nixpkgs`
instead of the one pinned by gargantext's `flake.nix`:
```
inputs = {
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
  gargantext.url = "git+https://gitlab.iscpif.fr/gargantext/haskell-gargantext.git";
  gargantext.inputs.nixpkgs.follows = "nixpkgs";
};
```
## Maintainability
<!-- Maintainability evaluates a product or system's ease of modification to enhance, correct, or adapt to environmental or requirement changes. -->
### Modularity
<!-- assesses the extent to which system components can be altered with minimal impact on others. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
can more easily pin, update and fetch other flakes, which can themselves provide
not only packages, but all sorts of integrations like overlays or shells.
See for example `pre-commit-hooks`'s `flake.nix`,
which provides facilities to integrate itself into the development shell.
But the most important use of that may be by gargantext's own users
who, instead of publishing their data and mentioning the gargantext version used to analyse them,
could just publish their own `flake.nix` pinning gargantext's `flake.nix`
for better and easier reproducibility of scientific results.
The same could be done with a `nix`+`cabal.project.freeze` build system,
but `flake.nix` makes it much easier and more manageable.
### Reusability
<!-- concerns the potential for assets to be utilized across multiple systems. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
is designed to compose with other flakes: a `flake.nix` can load other flakes as inputs
and easily reuse any code they make available through the flake schema.
This could help the next step of nixifying gargantext:
writing a NixOS module into `flake.nix` alongside the packaging.
### Analysability
<!-- evaluates the effectiveness of impact assessments on planned changes and the system's diagnosability for deficiencies. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `nixpkgs.haskellPackages`-based build system
also manages all Haskell dependencies, which, in the context of benchmarking, profiling, or testing,
makes it easier to apply custom compile flags
to the whole package closure, and to switch between packages and shells
enabling different sets of flags.
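
As an illustration, here is a minimal sketch of an overlay that derives a second Haskell package set with profiling enabled for every package. The attribute name `profiledHaskellPackages` is hypothetical, and the sketch assumes no other `overrides` are already set on `haskellPackages` (otherwise they would have to be composed):

```nix
# Hypothetical overlay: a Haskell package set built with profiling enabled.
final: prev: {
  profiledHaskellPackages = prev.haskellPackages.override {
    overrides = hfinal: hprev: {
      # Every package of the set goes through `mkDerivation`,
      # so overriding it applies the flags to the whole closure.
      mkDerivation = args: hprev.mkDerivation (args // {
        enableLibraryProfiling = true;
        enableExecutableProfiling = true;
      });
    };
  };
}
```

Packages and shells can then be defined twice, once from `haskellPackages` and once from `profiledHaskellPackages`, and selected by name on the command line.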
### Modifiability
<!-- examines the ease of system modification without compromising quality. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
may render the dependencies' code harder to modify
since (unless `--impure` is passed to `nix` by the developer),
all files necessary to build an output must be put inside Git's index (i.e. `git add`).
E.g. if you add a patch in `patches/` and add it to a package's `patches=` attribute in `flake.nix`,
the build will fail to find the patch unless you've previously run `git add patches/path/to/the.patch`,
as Nix builds packages in a sandbox only able to reach what is in the Nix store,
and when copying the `self` input (your working tree), only what has been put
into your Git repository's index will be copied there.
Side note: once `git add patches/path/to/the.patch` has been done for one version of the patch,
it's not necessary to do it again while iterating.
Moreover, as when using Git submodules, modifying dependencies requires updating them (in `flake.lock`, with `nix flake lock --update-input $input`), which can be problematic if the dependency modification has not yet been published.
That can be mitigated with a temporary override:
```
$ nix flake lock --override-input some-input 'git+file:///local/path/to/some-input?ref=main'
```
### Testability
<!-- concerns the effectiveness of establishing test criteria and conducting tests to ascertain compliance. -->
Besides the correctness concern of Haskell dependencies studied above,
there is also the possibility to provide checks in the `flake.nix`:
```
$ nix flake check
```
Those checks could spawn multiple NixOS-based VMs to run them in a more controlled environment.
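
A minimal sketch of what such checks could look like is given below. It assumes that the flake exposes the application as `packages.<system>.gargantext` and that the file is imported from `flake.nix`; the file name `checks.nix` and the check names are hypothetical:

```nix
# Hypothetical `checks.nix`, imported from `flake.nix` as:
#   checks.${system} = import ./checks.nix { inherit self pkgs system; };
{ self, pkgs, system }:
{
  # Building the package is itself a check
  # (its test-suites run during the build when they are enabled).
  build = self.packages.${system}.gargantext;

  # A NixOS VM test booting a machine with the package installed.
  vm = pkgs.testers.runNixOSTest {
    name = "gargantext-vm";
    nodes.machine = { ... }: {
      environment.systemPackages = [ self.packages.${system}.gargantext ];
    };
    testScript = ''
      machine.wait_for_unit("multi-user.target")
    '';
  };
}
```

The test script here only waits for the machine to boot; real checks would start the services and query them.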
## Portability
<!-- Portability evaluates a system, product, or component's ease of transfer between different environments. -->
### Adaptability
<!-- examines the system's ability to adapt to diverse or evolving hardware, software, and usage environments. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
is not familiar to any gargantext developer other than me so far, and I'm no `darwin` user.
Yet, `flake.nix` provides a principled way to adapt things to different systems,
because both packages and shells are only reachable by first specifying
the system they should be built for.
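
The per-system structure can be sketched as follows. This is only an illustration: the list of systems is an assumption, and `pkgs.hello` is a placeholder standing in for the gargantext package:

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";

  outputs = { self, nixpkgs }:
    let
      # Assumed list of supported systems; extend it as needed.
      systems = [ "x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin" ];
      # Build an attribute set `{ <system> = f pkgs; }` for each supported system.
      forAllSystems = f:
        nixpkgs.lib.genAttrs systems (system: f nixpkgs.legacyPackages.${system});
    in {
      packages = forAllSystems (pkgs: {
        # Placeholder package standing in for gargantext.
        default = pkgs.hello;
      });
      devShells = forAllSystems (pkgs: {
        default = pkgs.mkShell { packages = [ pkgs.cabal-install ]; };
      });
    };
}
```

System-specific adaptations then live inside the function passed to `forAllSystems`, where `pkgs` is already specialized to the target system.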
### Installability
<!-- evaluates the system's success in installation and uninstallation processes. -->
Contrary to a `nix`+`cabal.project.freeze` build system, a `flake.nix` build system
is easier to install, as it takes care of installing all inputs (except for the kernel);
one could just do:
```
$ nix run 'git+https://gitlab.iscpif.fr/gargantext/haskell-gargantext.git?ref=dev#gargantext'
```
and that would download the whole closure of the latest `dev` branch of gargantext,
build what's not in the cache and execute the application named "gargantext" in `flake.nix`.
Whichever build system is chosen,
I'd like to stress the importance of using development shells
to provide development tools which have a strong compatibility constraint.
It would prevent having to debug a mismatch between GHC and HLS,
like what happened to @cgenie recently.
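
As a minimal sketch of such a development shell, the expression below takes GHC, `cabal-install` and HLS from one and the same Haskell package set, so that they cannot drift apart. It assumes that the project's Cabal package lives at the repository root; the package name passed to `callCabal2nix` is illustrative:

```nix
# Hypothetical shell expression, given an evaluated `pkgs`.
{ pkgs }:
let
  # Pick a single package set, so that every tool below matches the same GHC.
  hpkgs = pkgs.haskellPackages;
in
hpkgs.shellFor {
  # Assumption: the project's Cabal package lives at the repository root.
  packages = _: [ (hpkgs.callCabal2nix "gargantext" ./. { }) ];
  nativeBuildInputs = [
    hpkgs.cabal-install
    hpkgs.haskell-language-server
  ];
}
```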
### Replaceability
<!-- gauges a product's potential to substitute another comparable product effectively. -->
The actionable alternatives are:
1. The current `nix`+`cabal.project.freeze`.
2. A `flake.nix` based upon `nixpkgs.haskellPackages` like the current !258
3. A `flake.nix` based upon fgaz's [nix-build-cabal-project](https://git.sr.ht/\~fgaz/nix-build-cabal-project), which boils down to just a [thin wrapper](https://git.sr.ht/\~fgaz/nix-build-cabal-project/tree/877a08434ce1db4c677c41c56519f85f6e12ebe5/item/default.nix#L91) around `cabal-install`. This means it neither packages nor caches each individual Haskell dependency in Nix; it bundles them all in a single package, which is helpful for deployment, but not so helpful for development. If `cabal-install` is set up to `run-tests` and we find a way to override `run-tests` per package, it can also check `test-suite`s.
4. A `flake.nix` based upon IOHK's [haskell.nix](https://input-output-hk.github.io/haskell.nix), which produces Nix expressions for all of Hackage. However it does not put their outputs on the default Nix cache at https://cache.nixos.org but on [IOHK's Nix cache at hydra.iohk.io](https://input-output-hk.github.io/haskell.nix/tutorials/getting-started.html#setting-up-the-binary-cache).
5. "1. + 2." but this 2. would not use the exact same package versions as 1.
6. "1. + 3."
7. "1. + 4."
8. `guix`'s [`haskell-build-system`](https://guix.gnu.org/manual/en/html_node/Build-Systems.html), but I doubt it provides the same level of Haskell integration as 2. or 4.

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/321
Error at graph O2 generation
2024-03-26T13:51:50+01:00
david Chavalarias

@anoe On https://academia.sub.gargantext.org V0.0.6.9.9.9.6.3, in CNS Center > Maps > Co-word analysis
Generating a graph with distance order 2 produces the following error:
`cc: callProcess: posix_spawnp: does not exist (No such file or directory)`

Alfredo Di Napoli

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/320
wikiparsec fork
2024-03-01T13:36:15+01:00
Julien Moutinho

This issue is a reminder that gargantext currently uses a fork of [wikiparsec (branch `adinapoli/support-ghc-947`)](https://github.com/adinapoli/wikiparsec/tree/adinapoli/support-ghc-947).
As of February 2024:
> This branch is 2 commits ahead of rspeer/wikiparsec:master.
**Warning:** currently pinned commit (b3519a0351ae9515497680571f76200c24dedb53) is [not the latest on the `adinapoli/support-ghc-947` branch](https://github.com/adinapoli/wikiparsec/commits/adinapoli/support-ghc-947/).
Related MR: !231
> [wikiparsec](https://github.com/adinapoli/wikiparsec/tree/adinapoli/support-ghc-947) (temporary fork);
Upstream PR: TODO?

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/319
duckling fork
2024-02-29T08:37:21+01:00
Julien Moutinho

This issue is a reminder that gargantext currently uses a fork of [duckling (branch `adinapoli/ghc947-compat`)](https://github.com/adinapoli/duckling/tree/adinapoli/ghc947-compat).
As of February 2024:
> This branch is 6 commits ahead of facebook/duckling:main.
Related MR: !231
> [Duckling](https://github.com/facebook/duckling#readme) doesn't work with aeson;
This fork [also depends on `text16-compat`](https://gitlab.iscpif.fr/gargantext/haskell-gargantext/merge_requests/231#note_9130):
> Update: I have fixed `duckling` by relying on a package I've developed for the occasion called [text16-compat](https://github.com/adinapoli/text16-compat) which exposes some compat shim that allows `duckling` to keep working even though the underlying `text` library is now using a `ByteArray` of UTF8 code points.

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/318
llvm-hs fork
2024-02-29T08:39:47+01:00
Julien Moutinho

This issue is a reminder that gargantext currently uses a fork of [llvm-hs (branch `adinapoli/llvm-12-ghc-947-compat`)](https://github.com/adinapoli/llvm-hs/tree/adinapoli/llvm-12-ghc-947-compat).
As of February 2024:
> This branch is 1 commit ahead of llvm-hs/llvm-hs:llvm-12.
Related MR: !231
> [llvm-hs] doesn't work until we downgrade the toolchain of the whole nix shell, or we upgrade directly to llvm-15 hoping it will work;
Related analysis: https://gitlab.iscpif.fr/gargantext/haskell-gargantext/merge_requests/231#note_9193

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/317
haskell-opaleye fork
2024-02-29T01:36:22+01:00
Julien Moutinho

This issue is a reminder that gargantext currently uses a fork of [haskell-opaleye (branch `tsquery-fixes`)](https://github.com/garganscript/haskell-opaleye/tree/tsquery-fixes).
As of February 2024:
> This branch is 9 commits ahead of, 133 commits behind tomjaguarpaw/haskell-opaleye:master.
This fork was introduced in https://gitlab.iscpif.fr/gargantext/haskell-gargantext/merge_requests/201
and moved to https://github.com/garganscript/haskell-opaleye.git in !251

https://gitlab.iscpif.fr/gargantext/haskell-gargantext/issues/316
Failed to replicate simple phylomemy
2024-02-29T00:40:47+01:00
david Chavalarias

In 0.0.6.9.9.9.6.3
@anoe, @AlfredoDiNapoli: I tried to replicate the very simple phylomemy of
Lobbé, Quentin, David Chavalarias, Alexandre Delanoë, Gabriel Ferrand, Sarah Cohen-Boulakia, Philippe Ravaud, and Isabelle Boutron. 2022. "Toward an Observatory of the Evolution of Clinical Trials through Phylomemy Reconstruction: The COVID-19 Vaccines Example". Journal of Clinical Epidemiology, May. https://doi.org/10.1016/j.jclinepi.2022.05.004.
![image](/uploads/0cf7fe737f0a26ab7310dc8447f334bc/image.png)
and found the following issues:
* The one generated on FIS (with threshold 1 on size and support) is very different from [the one generated by the script](http://maps.gargantext.org/unpublished_maps_phylo/vaccines_fundings_10_2021/).
* The one generated on the MaxClique makes the page crash.

All the data for replication are in a team on Dev, "clinical trials DEBUG".
I think this is a good case study. The major problem here is that the dates are given as 1, 2, ..., 15 (which are the weeks) instead of proper dates. So maybe it is just a date formatting problem, but still, it would be nice if the phylo accepted such an ordered format.