# Streamlined deployments of GGTX
@fmaniere @cgenie I'm opening this ticket to get the ball rolling on the deployment story, as well as to give us visibility and high-level coordination on the topic.
I know that both of you have probably given this some individual thought, and Fabien has been doing some excellent work in trying to get things moving, but I think we need a holistic plan to bring us from A to B.
NOTE: This is going to be a long ticket, so grab your favourite beverage before diving in.
## Problem description
Gargantext is an open source project under active development, but its deployments have long been centralised, meaning that in that area the bus factor is exactly 1. Recently, as Murphy's law would have it, the bus factor manifested itself, and deployments stopped/slowed down, leaving a lot of valuable contributions stagnating on the `dev` branch without any way of being pushed out for users to test.
### Current (guessed) status quo
The current deployment scenario is, as far as I understand it, the following:

1. A main GGTX engineer hops onto an instance he would like to upgrade (let's pick `dev.sub.gargantext.org` as an example);
2. A "feature branch" is cut, typically from a precise `dev` commit, and that branch is used as a baseline for the changes;
3. The code is compiled;
4. The version is bumped, with a versioning scheme which at the moment is not publicly disclosed and is, in general, not obvious to guess (i.e. it doesn't adhere to the classic PVP policy);
5. A changelog is (manually) forged that highlights all the changes between the two versions; as far as I understand, it is curated by a human;
6. The code is run (probably guarded by some `systemd` service) and the DB migrations are executed;
7. The release branch is merged back into `dev`.
I'm not exactly sure whether the code gets run before the version bump and changelog have happened, but I have included the "run" step last because it's not obvious what should happen in terms of rollbacks (especially database migrations) in case step 6 fails.
## Requirements
Going forward, we have a bunch of requirements for deployments:
- Inspectable -- the deployment process should be "auditable", i.e. it should be documented or, even better, self-contained in a bunch of scripts (Ansible, for example), so that there are no "obscure steps" left to the imagination;
- Automatic -- it should ideally be possible to run a single command to tag a release (example: `gargantext release -o <version_number>`), to generate a changelog (example: `gargantext generate-changelog -i <version_number>`) and to deploy (example: `gargantext deploy -i dev.sub.gargantext.org`);
- Decentralised -- it should be possible for a pool of trusted GGTX engineers to all deploy, provided their SSH keys are present on the server(s). This way, the bus factor increases;
- Nix-based -- we are already using `nix` to provide system dependencies, so it's conceivable that `nix` should play a role. We should research whether there are tools that integrate with Nix (I know that Ansible should, at least) or whether Nix has a bespoke way of deploying things (I know about `NixOps`, but that is NixOS-specific);
- CI-powered -- see also ticket #485 that @cgenie opened; given that CI has already cached the dependencies and has to build GGTX anyway (even though not in a shape that is currently viable for production), it feels like a waste not to use it for some sort of continuous deployment. We could in theory also explore Hydra if that helps with deployments, but I don't think it does, and I think that setting up a Hydra instance would be a bit of a hassle.
## Plan of action
I have a lot of ideas, but there are a few preliminary steps we can conceivably take, for example researching the solution space (I have used my fair share of tools for other clients, but I'm always interested in hearing about new / more fitting ones). I will open other umbrella tickets to get us going before coming back here with a more thorough implementation plan, as this ticket is already long enough.
## Straw-man proposal
NOTE: I'm discussing only the backend for now; I need to think about how the frontend fits into this. To me, that's a separate project, and as such the backend release and deploy code shouldn't be intertwined with it, so I'm omitting it for now. I do understand that the release and deploy of the two happen in lockstep, as they do for many projects out there.
Given the discussion and the feedback from the initial writeup, here is my initial proposal, which we can revise and discuss as we see fit:
Let's introduce a set of CLI commands, `gargantext release` and `gargantext deploy`, whose semantics will be explained later. In a nutshell, `gargantext release` can be used to tag a new release, generate a new changelog, push things to `dev` and that's about it, whereas `gargantext deploy` can be used to actually deploy the server to a target machine.
### `gargantext release`
The `release` command should be responsible for:
1. Discovering the current release of `gargantext` via the programmatic API that GHC exposes through the `Paths_gargantext` module, in the `version` CAF;
2. Bumping the release by tweaking the `Version` type, following the PVP loosely -- I propose we do something very simple (see the sketch after this list):
   a. By default, if the `release` command is run without any strong preferences, a minor release is performed, meaning that in the example above we would go from `0.0.7.4.7` to `0.0.7.4.8`;
   b. In the initial description I stressed the term "loosely": if we were to stick to the classic PVP, in case of breaking changes we would have to bump the second digit by one, so instead of `0.0.7.4.7` we would release `0.1.0.0`, which is a bit counterintuitive given that our GGTX versioning scheme is not PVP-compliant. I propose we simplify our lives and use a very simple schema:
      I. For minor changes we bump the last digit, so we have `0.0.7.4.7` -> `0.0.7.4.8`;
      II. For breaking changes we bump the penultimate digit, so we have `0.0.7.4.7` -> `0.0.7.5.0`;
      III. For new versions of the platform we bump the 3rd digit, so we have `0.0.7.4.7` -> `0.0.8.0.0`;
3. Generating the changelog (see the later section);
4. Modifying the current version in `gargantext.cabal` via `sed`, in place. We could use something more sophisticated like cabal-install-parsers, but that would be overkill; a simple `sed` substitution is enough;
5. Ensuring that the project still builds;
6. Pushing the newly released version to `dev`: that will trigger a CI rule to build the executables which will be used for the deploy of that particular version.
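To make the bumping rules concrete, here is a minimal Haskell sketch of what the version logic could look like. `Paths_gargantext` and its `version` CAF are the real cabal-generated API; `BumpKind`, `bumpVersion` and the `main` wrapper are hypothetical names I'm using purely for illustration:

```haskell
-- Minimal sketch of the proposed (non-PVP) bumping scheme. `Paths_gargantext`
-- is the module cabal generates for the package; everything else here is a
-- hypothetical illustration, not existing code.
module Release where

import Data.Version (Version, makeVersion, showVersion, versionBranch)
import Paths_gargantext (version)

data BumpKind = Minor | Breaking | Platform

-- Minor    bumps the last digit:       0.0.7.4.7 -> 0.0.7.4.8
-- Breaking bumps the penultimate one:  0.0.7.4.7 -> 0.0.7.5.0
-- Platform bumps the third digit:      0.0.7.4.7 -> 0.0.8.0.0
bumpVersion :: BumpKind -> Version -> Version
bumpVersion kind v = makeVersion (go kind (versionBranch v))
  where
    go Minor    [a, b, c, d, e] = [a, b, c, d, e + 1]
    go Breaking [a, b, c, d, _] = [a, b, c, d + 1, 0]
    go Platform [a, b, c, _, _] = [a, b, c + 1, 0, 0]
    go _        branch          = branch -- unexpected shape: leave it alone

main :: IO ()
main = putStrLn (showVersion (bumpVersion Minor version))
```

The in-place edit of step 4 could then be as dumb as `sed -i 's/^version:.*/version: 0.0.7.4.8/' gargantext.cabal`, with the new version string computed as above.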
### `gargantext deploy`
The `deploy` command should take as input at least a version and a target host, and (a rough sketch follows the list):
1. Acquire the binaries produced by CI after the push to `dev`;
2. Push the binaries to the target host;
3. Apply the DB migrations on the host, if any;
4. Restart the server.
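As a strawman, here is what `deploy` could shell out to under the hood. The artifact URL layout, remote paths, migration entry point and `systemd` unit name are all placeholder assumptions on my part, not a description of our real setup:

```haskell
-- Rough sketch of the deploy steps, assuming passwordless SSH access and a
-- `gargantext` systemd unit on the target; all paths/URLs are hypothetical.
module Deploy where

import System.Process (callProcess)

deploy :: String -- ^ version, e.g. "0.0.7.4.8"
       -> String -- ^ target host, e.g. "dev.sub.gargantext.org"
       -> IO ()
deploy ver host = do
  -- 1. Acquire the CI-built binary for this version (hypothetical URL layout).
  callProcess "curl" ["-fLo", "gargantext-server",
                      "https://ci.example.org/artifacts/" <> ver <> "/gargantext-server"]
  -- 2. Push the binary to the target host.
  callProcess "scp" ["gargantext-server", host <> ":/opt/gargantext/bin/"]
  -- 3. Apply the DB migrations on the host (hypothetical migration command).
  callProcess "ssh" [host, "/opt/gargantext/bin/gargantext-server migrate"]
  -- 4. Restart the server.
  callProcess "ssh" [host, "sudo systemctl restart gargantext"]
```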
### Deployment UML sequence diagrams
In the following section I'm including some visual cues to showcase how things would work, in the form of UML sequence diagrams:

*(figure: `gargantext release` sequence diagram)*

*(figure: `gargantext deploy` sequence diagram)*
### Generating the changelog
In order to generate the changelog, we will need to acquire the set of commits pushed to `dev` between the last "release commit" (i.e. the one pushed during a `gargantext release`) and `HEAD`, filter only merge requests, and have a way for merge requests to be "annotated" so that we can point back to the relevant issue. There are two ways to do this, one simple and naive, the other more complicated:
- Simple: every last commit that we push before opening an MR should look something like `Fixes #xxx (did feature x y z)` -- then we can scan the `git log` history, look for such text fragments and simply put the issue number and the brief description of the fix in the changelog. We don't need a very long description, because the changelog will already include a hyperlink to the full issue; no need to be verbose;
- Complicated: we still need to mention the related ticket an MR is closing, but then we could use the GitLab API to get the issue title from the MR number. The workflow would be that from the `git log` we identify the MR commits, we grab the text description of the MR via the GitLab API, we find the referenced issue, and we include that and its title.
I think both approaches require discipline from developers, and both carry the risk of missing changelog entries; even the complicated approach still relies on developers mentioning the initial issue in the MR description.
My slight inclination would be to go with the "complicated" approach if getting access to a GitLab API key is easy enough, but to have a "preflight" check during `release`: before actually committing anything to `gitlab`, the command should spit out the rendered changelog for auditing. A human can then inspect that the changelog is sensible and, if it isn't, amend the MR description to include the backlink.
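For the "simple" flavour, the scanning step could be as naive as the following sketch; the tag used for the range and the output format are assumptions on my part:

```haskell
-- Naive sketch of the "simple" approach: scan `git log` subjects for
-- "Fixes #123 (short description)" markers and render one changelog
-- bullet per match.
module Changelog where

import Data.Char (isDigit)
import Data.List (isPrefixOf, tails)
import System.Process (readProcess)

-- Extract (issue number, rest of the subject) if a commit subject
-- contains a "Fixes #" marker anywhere.
fixesEntry :: String -> Maybe (String, String)
fixesEntry subject =
  case [t | t <- tails subject, "Fixes #" `isPrefixOf` t] of
    (t:_) -> let (num, rest) = span isDigit (drop (length "Fixes #") t)
             in if null num then Nothing else Just (num, dropWhile (== ' ') rest)
    []    -> Nothing

main :: IO ()
main = do
  -- Commit subjects since the last release commit (the tag name is hypothetical).
  subjects <- lines <$> readProcess "git" ["log", "--format=%s", "v0.0.7.4.7..HEAD"] ""
  let entries = [e | Just e <- map fixesEntry subjects]
  mapM_ (\(num, descr) -> putStrLn ("- #" <> num <> " " <> descr)) entries
```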
## Technical decisions
In the past I have used Ansible, and I find it a bit clunky to set up and fragile to write, but the glaring advantage is that code gets "shipped" via SSH to the remote machine and executed there, meaning that after we finish #487 any of the trusted developers can perform deploys and releases from their local machines, while preventing malicious users from just checking out the code and doing the same.
I don't really mind whether we end up using Ansible or not, but the key feature is that it should be a system where we have a permission/auth mechanism to prevent unauthorised deploys. I'm a dinosaur in this regard, so I'd argue we should deploy to bare metal and not into a Docker container. I'm in favour of not reinventing the wheel here and using something like Ansible or similar, as it's really battle-tested and used by a gazillion companies out there.