Fixing tree API calls
Problem
As part of the API analysis
(#467), I
noticed that we call /api/v1.0/tree/<id>/first-level
multiple times
from the frontend.
The response is in this format:
{
"node": {
"name": "Private",
"type": "NodeFolderPrivate",
"id": 9
},
"children": [
{
"node": {
"name": "LLM",
"type": "NodeCorpus",
"id": 104
},
"children": [
{
"node": {
"name": "Graph",
"type": "NodeGraph",
"id": 107
},
"children": []
},
{
"node": {
"name": "Terms",
"type": "NodeList",
"id": 106
},
"children": []
},
{
"node": {
"name": "Docs",
"type": "NodeTexts",
"id": 105
},
"children": []
}
]
}
]
}
As you can see, we request node id 9 and we get children along with
their children (whose children is an empty array, on purpose). But
then when a child is rendered, a similar request is made e.g. for node
id 104, resulting in fetching the Graph
etc nodes again. For large
trees, this can result in making a lot of requests to the API.
Moreover, I made a test where I created a corpus, then moved that
corpus under another corpus (apparently, you can't create a corpus
directly under a corpus, but moving works). When I fetch the root
node, the /first-level
API returns all the nested children. From
what I see, in G.D.Q.Tree.hs
we use a foldM
without any stop
condition, which just causes looping until the whole subtree structure
is fetched. Thus the API name is misleading: /first-level
in fact
returns all levels of the subtree.
I think the root cause of all this is that the frontend uses a
recursive NTree
data structure (see G.C.F.T.N.T.FTree
). I.e. it
expects a fully loaded recursive tree.
Ideally, we want to load as minimal of the tree as possible. However, having such a recursive data structure prevents us and requires using some hackery to further load the deeper, unfolded nodes.
Proposed solution
I propose to change this definition:
data NTree a = NTree a (Array (NTree a))
type FTree = NTree LNode
to something like this:
data LazyLoadedTree = LoadedNode LNode (Array LazyLoadedTree)
| PartialNode LNode (Array ID)
This way we clearly represent which children are fully loaded and which ones are not.
Please note that we do need the child list so that we can properly render the arrows for expanding/collapsing the tree.
If we also switched from the REST API into GraphQL endpoint and work a bit on the GraphQL tree endpoint we could issue requests as follows:
- Get the user's (expanded) Private folder with full nodes:
(key part is{ get_tree(root_id: 9) { root: { name node_type id parent_id } children: { name node_type id parent_id children: { id } } } }
children / children / id
query). - We have the information about tree id 9, it is a
LoadedNode
. It's children are of typePartialNode
since only theid
is loaded for them. - If a child of this tree is expanded, fetch it again as above.
- If it is not expanded, do nothing, just render the arrow depending on whether the children list contains elements or not.
- The GraphQL endpoint gives us the possibility to query only the
node and the children ids:
i.e. to return a{ get_tree(root_id: 9) { root: { name node_type id parent_id } children: { id } } }
PartialNode
. - The GraphQL arguments could mimick the returned tree structure,
i.e. we would probably have
FullNode
andPartialNode
depending on what is requested from children. This way we could make a similarfoldM
as in/first-tree
but with a proper stop condition.
The rendering with the above LazyLoadedTree
is done as follows:
- If it's a
LoadedNode
, just render the node and it's children - If it's a
PartialNode
, then render the node and, if it's expanded, perform the above GraphQL query, then render the children. If it's not expanded, no query is needed and the arrow is rendered according to thechildren
array length.
Remarks
This task relates both to frontend and backend.
The end-user shouldn't see any changes.
There would be less tree requests being made and they would be shorter, giving overall better performance, especially for large trees.