Issue
I'm having trouble with extremely slow responses from /stac/collections on a Datacube Explorer instance we're running. It takes about 45s to return a response, and subsequent requests are no faster. It was also running out of memory until we allowed it to use more.
I think this issue existed before #730, although #730 did touch the relevant code.
Diagnosis
The problem appears to be caused by a couple of things.
- Some of our products have extremely complex geometries, created by Explorer as part of their all-of-time summary. (The original post showed a screenshot of an example geometry here.)
- The code that turns Products + Summaries into STAC Collections extracts the full summary geometry for every Product from the database, reprojects it, stores it, and then throws it all away and returns just the bounding box, since that's all the STAC Collections specification says to include.
I've also found another inefficiency that occurs when STAC Collections results are paginated: the full set of Product/Summary geometries is downloaded into RAM a second time, just to get the count.
Proposed fixes
The collections search query is only used to return STAC Collections, which means it never needs to return full geometries, only bounding boxes. Bounding Boxes can be efficiently computed within PostGIS. (Ugh, which may not be available... so might need something else).
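To make the first fix concrete, here is a rough sketch of what the query could look like when PostGIS is available. The table and column names (`time_overview`, `product_ref`, `period_type`, `footprint_geometry`) are assumptions for illustration and may not match the actual Explorer schema, and `row_to_stac_bbox` is a hypothetical helper. It also assumes the stored geometries have an SRID set so that `ST_Transform` can reproject them.

```python
from typing import List, Sequence

# Assumed table/column names, for illustration only.
# The reprojection and envelope are computed inside the database, so only
# four numbers per product cross the wire instead of the full geometry.
BBOX_SQL = """
SELECT product_ref,
       ST_XMin(env) AS west,
       ST_YMin(env) AS south,
       ST_XMax(env) AS east,
       ST_YMax(env) AS north
FROM (
    SELECT product_ref,
           ST_Envelope(ST_Transform(footprint_geometry, 4326)) AS env
    FROM time_overview
    WHERE period_type = 'all'
) AS envelopes
"""


def row_to_stac_bbox(row: Sequence) -> List[float]:
    """Convert one result row into a STAC bbox: [west, south, east, north]."""
    _product_ref, west, south, east, north = row
    return [float(west), float(south), float(east), float(north)]


# Example with a hand-written row standing in for a database result.
print(row_to_stac_bbox((1, 110.0, -45.0, 155.0, -10.0)))
```

Taking the envelope after reprojecting still makes the database transform the full geometry, but nothing large is transferred or held in Python. Reprojecting only the envelope would be cheaper, at the cost of a less accurate box.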
The lines at https://github.com/opendatacube/datacube-explorer/blob/develop/cubedash/_stac.py#L965-L967 should be replaced with a call to an efficient SQL count(*) query.
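As a minimal illustration of the count fix, the sketch below uses an in-memory SQLite database as a stand-in for the real one. The `product` table, its columns, and `count_products` are hypothetical and not taken from the Explorer code. The point is only that the database returns a single integer rather than every row with its geometry.

```python
import sqlite3


def count_products(conn: sqlite3.Connection) -> int:
    """Ask the database for the row count instead of fetching all rows."""
    (total,) = conn.execute("SELECT count(*) FROM product").fetchone()
    return total


# Hypothetical stand-in table; the footprint column mimics a large geometry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, footprint BLOB)"
)
conn.executemany(
    "INSERT INTO product (name, footprint) VALUES (?, ?)",
    [
        ("product_a", b"\x00" * 1000),
        ("product_b", b"\x00" * 1000),
        ("product_c", b"\x00" * 1000),
    ],
)

# Only one integer is returned; no footprint data is loaded into Python.
print(count_products(conn))
```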