LLM summarizer, other ideas

Closed · Opened May 15, 2025 by Przemyslaw Kaminski (@cgenie)

LLMs are all the rage these days. I would like to open a thread about the possibilities of using them in Gargantext (I wouldn't care much otherwise, but since this is specifically a text-processing platform, we should at least have some stance on the topic).

My two main concerns with LLMs are high hardware requirements (you really want a GPU with lots of VRAM) and privacy (if you use hosted solutions).

I have recently spent some time playing around with this tech and I have some random thoughts.

  1. One can easily self-host an LLM with Ollama (see the sketch after this list).
  2. For commodity hardware, use 7b models (more parameters doesn't directly translate to better quality anyway).
  3. I have played around with Kagi Assistant (which is free for Premium users) and I am most satisfied with DeepSeek Chat v3.
  4. I am dissatisfied with what GPT-4o mini returned.
  5. Kagi updates its models regularly, so DeepSeek from Kagi is better than the (outdated) DeepSeek on Ollama.
  6. Updating a model's knowledge via retraining is very expensive; a recent improvement in that area is RAG (retrieval-augmented generation), which instead retrieves relevant documents and feeds them into the prompt.
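
To make point 1 concrete, here is a minimal sketch of talking to a locally running Ollama instance over its HTTP API (`POST /api/generate`) from Haskell. The model name is a placeholder, and the choice of `http-conduit`/`aeson` is my assumption, not anything decided:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson          (Value, object, (.=))
import           Network.HTTP.Simple

-- Ask a locally running Ollama server (default port 11434) for a completion.
-- "deepseek-r1:7b" is just a placeholder for whatever model was pulled.
askOllama :: String -> IO Value
askOllama prompt = do
  req <- parseRequest "POST http://localhost:11434/api/generate"
  let body = object
        [ "model"  .= ("deepseek-r1:7b" :: String)
        , "prompt" .= prompt
        , "stream" .= False  -- one JSON object instead of a token stream
        ]
  resp <- httpJSON (setRequestBodyJSON body req)
  -- The generated text sits in the "response" field of the returned object.
  pure (getResponseBody resp)
```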

It gets interesting when one starts to construct LLM workflows (e.g. with langchain). Out of the box, the LLM doesn't have access to anything apart from what it was trained on. However, one can augment it with various user-programmed "tools": e.g. render a website, execute `rm -rf`, etc. :)
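
A hypothetical sketch of the "tools" idea: the model itself only ever produces text, and it is the host application that decides which tool names it is willing to execute, so nothing like `rm -rf` runs unless we wire it up ourselves:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- A tool call as the host sees it: the name the model asked for, plus an argument.
data ToolCall = ToolCall { toolName :: Text, toolArg :: Text }

-- The host application whitelists the tools; unknown names are refused.
runTool :: ToolCall -> IO Text
runTool (ToolCall "render_website" url) =
  pure ("fetched contents of " <> url)          -- stub for an actual HTTP fetch
runTool (ToolCall name _) =
  pure ("refusing to run unknown tool: " <> name)
```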

Some tasks are blocked by the LLM itself, like fetching pages from the internet (this is because people complain about LLM crawlers being trained on their websites and spiking their traffic very high while ignoring robots.txt files).

Anyway, we could send a user's documents and terms to the LLM as context and ask it for some kind of summary or ideas. The process itself could be asynchronous, so the low performance of LLM models on commodity hardware wouldn't be that uncomfortable.
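
A rough sketch of that, assuming some LLM client function like the `askOllama` above (injected as a parameter here, since nothing is decided): build one prompt out of the documents and terms, and run the slow call in a background thread, e.g. with the `async` package:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Concurrent.Async (Async, async)
import           Data.Text                (Text)
import qualified Data.Text                as T

-- Fire off a summarization job without blocking; the caller can 'wait' on
-- the returned handle whenever the result is actually needed.
summarizeCorpus :: (Text -> IO Text)  -- LLM client (e.g. an Ollama wrapper)
                -> [Text]             -- user's documents
                -> [Text]             -- selected terms, sent as extra context
                -> IO (Async Text)
summarizeCorpus askLlm docs terms = async (askLlm prompt)
  where
    prompt = T.unlines $
      [ "Summarize the following documents."
      , "Pay attention to these terms: " <> T.intercalate ", " terms
      ] ++ docs
```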

Another idea would be to use word2vec for computing similarity between documents (e.g. "car" and "automobile" are close in word2vec space, but far apart as ngrams).
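
The comparison itself is just vector arithmetic; assuming we can load pretrained word2vec vectors from somewhere (that part is the real work), cosine similarity is enough:

```haskell
-- Cosine similarity between two embedding vectors: close to 1.0 means
-- nearly the same direction. With word2vec vectors, cosine between the
-- vectors for "car" and "automobile" would be high, even though the two
-- strings share no ngrams.
cosine :: [Double] -> [Double] -> Double
cosine xs ys = dot xs ys / (norm xs * norm ys)
  where
    dot a b = sum (zipWith (*) a b)
    norm a  = sqrt (dot a a)
```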

Edited Jun 02, 2025 by Przemyslaw Kaminski
Reference: gargantext/haskell-gargantext#470