Skip to content

Inconsistent Paging in PostgreSQL with Non-Unique SortBy #1155

@nirolevy

Description

@nirolevy

When a CSW GetRecords request is run against a PostgreSQL backend and uses SortBy on a field that isn't guaranteed to be unique (such as dc:date or dc:modified), the sort order becomes non-deterministic for tied records. This lack of a unique ordering, specifically in PostgreSQL, may cause a record to be included in more than one page.


Problem Description

If multiple records share the exact same value for the field specified in the SortBy clause, the database determines the tie-breaker order arbitrarily. This issue was not observed when using SQLite.

When pycsw translates the maxRecords (LIMIT) and startPosition (OFFSET) parameters for paging:

  • The non-deterministic order means the records returned near the boundary between two pages are not consistently ordered between successive queries.
  • For example, when retrieving Page 1 (startPosition=1, maxRecords=50) followed by Page 2 (startPosition=51, maxRecords=50), records that were the last entries in Page 1 may reappear as the first entries in Page 2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions