Skip to content

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
    • Help
    • Submit feedback
    • Contribute to GitLab
  • Sign in
haskell-gargantext
haskell-gargantext
  • Project
    • Project
    • Details
    • Activity
    • Releases
    • Cycle Analytics
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Charts
  • Issues 175
    • Issues 175
    • List
    • Board
    • Labels
    • Milestones
  • Merge Requests 9
    • Merge Requests 9
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Charts
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Charts
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • gargantext
  • haskell-gargantexthaskell-gargantext
  • Issues
  • #434

Closed
Open
Opened Dec 13, 2024 by Przemyslaw Kaminski@cgenie
  • Report abuse
  • New issue
Report abuse New issue

For PostgreSQL queries, don't use `offset`

This is inspired by the articles:

  • https://cedardb.com/blog/pagination/
  • https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/

Basically, using offset in PostgreSQL queries results in these issues:

  1. When there are 'ties' at the page break, you can get unpredictable results on the current and next page (because the second time you make a query, sorting can be done differently). Although our 'terms' have unique names and, e.g. for docs, it rarely happens that two documents have the same title, however, we allow also sorting by score, document source, document date, document tag, and these offer real threats of ties.
  2. offset requires the DB to sort all results and then throw away offset number of these

Proposition is to replace offset by something else.

One solution is a PostgreSQL cursor: https://www.postgresql.org/docs/current/plpgsql-cursors.html. However, this requires keeping an open connection to the client.

Another proposition is to use something like last-id. If we sort by id, then for the next page, we just require results with id > last_id. If we sort by name, we could additionally sort by id and return results with (name == last_name AND id > last_id) OR (name > last_name). The additional id part in ordering and next query allows splitting the 'tie' when two name's are the same. Also, by requesting name >= last_name we already allow PostgreSQL to optimize the query by first filtering by name and then sorting (with offset it has to sort the whole result-set first).

From a technical point of view in our code, we currently have: Maybe Offset, Maybe Limit, Maybe OrderBy. One would change that to Maybe Id, Maybe Limit, Maybe OrderBy. The (Offset, OrderBy) pair should be replaced by (Id, OrderBy).

This issue is just a proposition, I'm open to any comments, including ones that just tell us to stick with Offset :)

Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
None
Due date
None
0
Labels
None
Assign labels
  • View project labels
Reference: gargantext/haskell-gargantext#434