[query] one more performance improvement for getOccByNgramsOnlyFast

This time we remove one CTE by folding the join with ngrams into the
aggregation step. The query gets shorter without losing readability,
and it may be easier on the PostgreSQL planner.
parent 0b3d274e
Pipeline #7881 passed with stages
in 47 minutes
@@ -132,7 +132,8 @@ getOccByNgramsOnlyFast_withSample cId int nt ngs =
 HM.fromListWith (+) <$> selectNgramsOccurrencesOnlyByContextUser_withSample cId int nt ngs
--- Returns occurrences of ngrams in given corpus/list (for each ngram, a list of contexts is returned)
+-- | Returns occurrences of ngrams in given corpus/list (for each
+-- ngram, a list of contexts is returned)
 getOccByNgramsOnlyFast :: CorpusId
 -> ListId
 -> NgramsType
@@ -166,16 +167,14 @@ getOccByNgramsOnlyFast cId lId nt = do
 FROM context_node_ngrams
 WHERE context_id IN (SELECT context_id FROM nc)
 ),
-node_context_ids AS
-( SELECT context_id, ngrams_id, terms
+ncids_agg AS
+( SELECT array_agg(DISTINCT context_id) AS agg,
+ngrams_id,
+terms
 FROM cnnv
 JOIN ngrams
 ON cnnv.ngrams_id = ngrams.id
-),
-ncids_agg AS
-( SELECT ngrams_id, terms, array_agg(DISTINCT context_id) AS agg
-FROM node_context_ids
-GROUP BY (ngrams_id, terms)
+GROUP BY (ngrams_id, terms)
 ),
 ns AS
 (SELECT ngrams_id, terms
......
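
The change above merges the join CTE (`node_context_ids`) into the aggregating CTE (`ncids_agg`). As a rough sanity check of why the two shapes should return the same groups, here is a minimal in-memory sketch in Haskell. It is not the project's code: the type aliases, `oldQuery`, `newQuery`, and the sample rows are all hypothetical, `Set` stands in for `array_agg(DISTINCT ...)`, and `ngrams.id` is assumed to be unique.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for the real row types.
type ContextId = Int
type NgramsId  = Int
type Terms     = String

-- Rows of the `cnnv` CTE: (context_id, ngrams_id).
type Cnnv = [(ContextId, NgramsId)]

-- The `ngrams` table, keyed by id (assumed unique).
type Ngrams = Map.Map NgramsId Terms

-- Result of `ncids_agg`: one set of distinct contexts per (ngrams_id, terms).
type Agg = Map.Map (NgramsId, Terms) (Set.Set ContextId)

-- Old shape, step 1: materialise the inner join (node_context_ids).
joinStep :: Ngrams -> Cnnv -> [(ContextId, NgramsId, Terms)]
joinStep ngrams cnnv =
  [ (c, n, t) | (c, n) <- cnnv, Just t <- [Map.lookup n ngrams] ]

-- Old shape, step 2: group the joined rows (ncids_agg).
aggStep :: [(ContextId, NgramsId, Terms)] -> Agg
aggStep rows =
  Map.fromListWith Set.union [ ((n, t), Set.singleton c) | (c, n, t) <- rows ]

oldQuery :: Ngrams -> Cnnv -> Agg
oldQuery ngrams = aggStep . joinStep ngrams

-- New shape: join and group in a single pass.
newQuery :: Ngrams -> Cnnv -> Agg
newQuery ngrams cnnv =
  Map.fromListWith Set.union
    [ ((n, t), Set.singleton c) | (c, n) <- cnnv, Just t <- [Map.lookup n ngrams] ]

main :: IO ()
main = do
  let ngrams = Map.fromList [(1, "graph"), (2, "tree")]
      -- (10, 1) is duplicated; ngrams_id 3 has no match and is dropped by the join.
      cnnv   = [(10, 1), (11, 1), (10, 1), (12, 2), (13, 3)]
  print (oldQuery ngrams cnnv == newQuery ngrams cnnv)
  -- prints: True
  print (Map.toList (Set.toList <$> newQuery ngrams cnnv))
  -- prints: [((1,"graph"),[10,11]),((2,"tree"),[12])]
```

This only models the result set. Whether the merged form is actually faster depends on the PostgreSQL planner and should be checked with `EXPLAIN ANALYZE` on real data.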