Failure to import large corpora (#511) · Issues · gargantext / haskell-gargantext

Failure to import large corpora

Summary

This may be specific to either the IMT instance or the HAL query, but GTX seems to fail to import the full corpus of all IMT publications: https://imt.sub.gargantext.org/#/share/NodeCorpus/132585

There is also an error in the document chart, which suggests that some process was interrupted partway through: the chart shows mostly documents from 2014, which does not reflect the state of the system.

Steps to reproduce

The query is API -> in database : HAL -> filter with organization: IMT : all_IMT

What is the current bug behavior?

The import is stuck at 49257 documents. Relaunching the query does not update the corpus. The estimated final corpus size is about 100k documents.

What is the expected correct behavior?

- When you relaunch the API search on the node https://imt.sub.gargantext.org/#/share/NodeCorpus/132585, the import should continue and complete at about 100k documents.
- On the first launch, it should have gone through all ~100k documents.
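
One way to read the expected behavior is as an idempotent, resumable import: a relaunch walks the full result set again and adds only the documents that are still missing. The sketch below illustrates that idea in Python. It is not Gargantext's actual (Haskell) implementation, and `resume_import`, `fetch_page`, and the page size are hypothetical names and values chosen for illustration only.

```python
from typing import Callable, List, Set


def resume_import(
    fetch_page: Callable[[int, int], List[str]],
    total: int,
    already_imported: Set[str],
    page_size: int = 1000,
) -> int:
    """Walk every page of the result set and add only the missing document ids.

    `fetch_page(offset, rows)` is a hypothetical callback standing in for the
    remote search API. Returns the number of newly imported documents.
    """
    added = 0
    for offset in range(0, total, page_size):
        for doc_id in fetch_page(offset, page_size):
            # Skip documents that a previous, interrupted run already stored.
            if doc_id not in already_imported:
                already_imported.add(doc_id)
                added += 1
    return added


if __name__ == "__main__":
    # Toy data: 100 documents, of which a first run stored only 49.
    source = [f"doc-{i}" for i in range(100)]
    imported = set(source[:49])

    def fetch(offset: int, rows: int) -> List[str]:
        return source[offset:offset + rows]

    print(resume_import(fetch, len(source), imported, page_size=10))
    print(len(imported))
```

With this shape, relaunching after an interruption completes the corpus, and relaunching a complete corpus adds nothing.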

## Summary

![image](/uploads/902a90a9029c9154ea0a55b49070c8dd/image.png)

Edited Sep 15, 2025 by david Chavalarias