Failure to import large corpora
Summary
This might be specific to either the IMT instance or the HAL query, but it seems that GTX fails to import the full corpus of all IMT publications: https://imt.sub.gargantext.org/#/share/NodeCorpus/132585
There is also an error in the document chart, which suggests that some process was interrupted partway through (the chart shows mostly documents from 2014, which does not reflect the state of the system).
Steps to reproduce
The query is: API -> in database: HAL -> filter with organization: IMT: all_IMT
What is the current bug behavior?
The import is stuck at 49257 documents. Relaunching the query does not update the corpus. The estimated final corpus size is about 100k documents.
What is the expected correct behavior?
- When the API search is relaunched on the node https://imt.sub.gargantext.org/#/share/NodeCorpus/132585, the import should continue and complete at about 100k documents.
- On the first launch, the import should have gone through all ~100k documents.
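
To make the expected "continue on relaunch" behavior concrete, here is a minimal sketch in Python. It is not GTX's actual import code: `resume_import`, `fetch_page`, `make_source` and the in-memory `store` are hypothetical names used only for illustration, and the sketch assumes the source returns documents in a stable order so that the number of stored documents can serve as a checkpoint.

```python
from typing import Callable, Dict, List


def resume_import(
    fetch_page: Callable[[int, int], List[dict]],
    store: Dict[str, dict],
    total: int,
    page_size: int = 1000,
) -> int:
    """Import documents until `total` are stored, resuming from what `store` already holds."""
    offset = len(store)  # checkpoint: documents already imported (assumes stable ordering)
    while offset < total:
        page = fetch_page(offset, page_size)
        if not page:
            break  # the source has nothing more; stop instead of looping forever
        for doc in page:
            store[doc["id"]] = doc  # keyed by id, so a relaunch never duplicates a document
        offset += len(page)
    return len(store)


def make_source(n: int, fail_at=None):
    """Hypothetical stand-in for the remote API: n documents, optionally failing at an offset."""
    def fetch_page(offset: int, size: int) -> List[dict]:
        if fail_at is not None and offset >= fail_at:
            raise ConnectionError("simulated interruption")
        return [{"id": f"doc-{i}"} for i in range(offset, min(offset + size, n))]
    return fetch_page


store: Dict[str, dict] = {}
try:
    # First launch: interrupted partway through, like the import stuck at 49257.
    resume_import(make_source(100, fail_at=50), store, total=100, page_size=10)
except ConnectionError:
    pass

# Relaunch: picks up from the documents already stored and completes the corpus.
resume_import(make_source(100), store, total=100, page_size=10)
```

In this sketch the first launch leaves a partial corpus behind, and the relaunch fetches only the missing pages. The reported behavior corresponds to the second call leaving `store` unchanged.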