
Extremely Slow STAC Collections Endpoint /stac/collections #802

@omad


Issue

I'm having trouble with extremely slow responses from /stac/collections on a Datacube Explorer instance we're running. It's taking about 45s to return a response, and is no faster for subsequent responses. We were also having trouble with it running out of memory until we allowed it to use more.

I think this issue existed before #730, although #730 did touch the relevant code.

Diagnosis

The problem appears to be caused by a couple of things.

  • Some of our products have extremely complex geometries, created by Explorer as part of their all-of-time summary.
    For example, one of them looks like this:

    Image
  • The code that turns Products+Summaries into STAC Collections extracts the full summary geometry for every Product from the database, reprojects it, stores it, and then throws it all away and returns just the bounding box, since that is all the STAC Collections specification says to include.

I've also found another inefficiency that occurs when STAC Collections results are paginated: the full set of Product/Summary geometries is downloaded into RAM a second time, just to get the count.

Proposed fixes

The collections search query is only used to return STAC Collections, which means it never needs to return full geometries, only bounding boxes. Bounding boxes can be computed efficiently within PostGIS. (Ugh, which may not be available... so we might need something else.)
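Where PostGIS is available, functions such as `ST_Envelope` or the `ST_Extent` aggregate can produce the box inside the database. For the case where it is not, here is a minimal sketch of one possible fallback: computing the bounding box in plain Python from GeoJSON-style coordinates. The function names `iter_positions` and `bounding_box` are hypothetical and are not part of Explorer, and the sketch assumes a non-empty geometry whose coordinates are already in the target CRS.

```python
from typing import Iterator, Tuple


def iter_positions(coords) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON-style coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z].
        yield coords[0], coords[1]
    else:
        # A list of positions, rings, or polygons: recurse into each part.
        for part in coords:
            yield from iter_positions(part)


def bounding_box(geometry: dict) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a non-empty GeoJSON-like geometry."""
    xs, ys = zip(*iter_positions(geometry["coordinates"]))
    return min(xs), min(ys), max(xs), max(ys)


# Illustrative footprint only: two separate squares in one MultiPolygon.
footprint = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[110.0, -45.0], [115.0, -45.0], [115.0, -40.0], [110.0, -40.0], [110.0, -45.0]]],
        [[[150.0, -12.0], [155.0, -12.0], [155.0, -10.0], [150.0, -10.0], [150.0, -12.0]]],
    ],
}

print(bounding_box(footprint))  # (110.0, -45.0, 155.0, -10.0)
```

This still has to read the geometry once, so it would only help if the result were computed ahead of time (for example when the summary is generated) rather than on every request.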

The lines at https://github.com/opendatacube/datacube-explorer/blob/develop/cubedash/_stac.py#L965-L967 should be replaced with a call to an efficient SQL count(*) query.
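To illustrate the idea, here is a small sketch of counting in the database instead of fetching every row and measuring the result in Python. It uses the standard-library `sqlite3` module as a stand-in for the real database, and the `product` table, its columns, and `count_matching_products` are hypothetical names chosen for the example, not Explorer's actual schema or code.

```python
import sqlite3


def count_matching_products(conn: sqlite3.Connection, name_prefix: str) -> int:
    """Count matching rows in the database; no geometry is transferred."""
    (total,) = conn.execute(
        "SELECT count(*) FROM product WHERE name LIKE ? || '%'",
        (name_prefix,),
    ).fetchone()
    return total


# Stand-in data: the footprint column represents the large summary geometries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, footprint TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [
        ("ls8_ard", "<large geometry>"),
        ("ls9_ard", "<large geometry>"),
        ("s2_ard", "<large geometry>"),
    ],
)

print(count_matching_products(conn, "ls"))  # 2
```

The same filter used for the paginated page would need to be applied to the count query so the two stay consistent.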

@whatnick
