Skip to content

Conversation

omad
Copy link
Member

@omad omad commented Sep 17, 2025

As reported in #802, we were experiencing timeouts or extremely slow responses to the /stac/collections endpoint. The implementation was doing a lot of extra data transfer and work, by transferring full geometries from PostGIS into Python before simplifying them down to a bbox.

This moves the processing up into PostGIS, and in my testing gives a speedup between 10x and 121x, and potentially more in different environments. I only tested on my laptop.

Running with local PostGIS server, M4 MacBook Pro.

pytest-benchmark

Loading /stac/collections with the data loaded in test_stac.py

image

Small dataset, but about 10x faster.

Running datacube-explorer locally

Using a copy of our database with the horrendously complex geometries

Run the server:

uv run gunicorn -w 3 --threads 2 -k gthread --config python:cubedash.gunicorn_config 'cubedash:create_app()'

Run benchmark with hey

Before this change:

# 2 workers, 4 requests
$ hey -c 2 -n 4 -t 30 http://127.0.0.1:8000/stac/collections

Summary:
  Total:        23.5620 secs
  Slowest:      11.9025 secs
  Fastest:      11.6424 secs
  Average:      11.7767 secs
  Requests/sec: 0.1698

After this change:

# For fast, defaults to 50 workers and 200 requests
$ hey -t 30 http://127.0.0.1:8000/stac/collections

Summary:
  Total:        9.8161 secs
  Slowest:      2.6334 secs
  Fastest:      0.2301 secs
  Average:      1.8218 secs
  Requests/sec: 20.3746

121x faster.

I haven't added any request caching discussed in the issue, that can be done separately.

Fixes #802


📚 Documentation preview 📚: https://datacube-explorer--871.org.readthedocs.build/en/871/

@omad omad requested review from pjonsson and whatnick September 17, 2025 05:45
Copy link

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.33%. Comparing base (662c6a1) to head (950a435).
⚠️ Report is 6 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #871      +/-   ##
===========================================
+ Coverage    84.32%   84.33%   +0.01%     
===========================================
  Files           35       35              
  Lines         4230     4214      -16     
  Branches       537      535       -2     
===========================================
- Hits          3567     3554      -13     
+ Misses         467      465       -2     
+ Partials       196      195       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

else [-180.0, -90.0, 180.0, 90.0]
]
# TODO: Find a nicer way to make the typechecker happier
# pystac is too specific in wanting a list[float | int]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not going to help right now, but it's possible pystac would accept a PR that changes the type annotation from list to Sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was thinking that's the best solution too.

@pjonsson
Copy link
Contributor

I started looking at the code but there's a lot of code being changed, and there are still test failures. I will take a look again when the tests are not failing.

@omad
Copy link
Member Author

omad commented Sep 17, 2025

I started looking at the code but there's a lot of code being changed, and there are still test failures. I will take a look again when the tests are not failing.

It's not too much code...

But there is a massive amount of duplication between the postgis and postgres index modules unfortunately, but using different sqlalchemy table definition styles which makes all the code that's otherwise identical, slightly different. It's unfortunately been added to my mental tidy up list.

@omad omad force-pushed the speedup-stac-collections branch 2 times, most recently from d3ebb8e to a9e3d97 Compare September 17, 2025 12:14
indexed=False,
offset=("metadata", "title"),
)

description = SimpleDocField(
name="description",
description="product description",
alchemy_column=collection.c.definition,
alchemy_column=ODC_PRODUCT.definition,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding postgres change also fixes the type of the offset= parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that SimpleDocField turns a plain string into a length 1 tuple... but I'm not sure.

This is ugly though, I'll make them match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed type annotations around that (and a bunch of usages) in datacube-core, I'm quite sure you need the tuple. A bit surprised the type checker wasn't complaining about these lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

pyright was complaining, but mypy wasn't for some reason.

It's fixed now regardless.

) -> list[Row]:
# STAC Collections only hold a bounding box in EPSG:4326, no geometry
# Calculate the bounding box on the server, it's far more efficient
collection_bbox = func.Box2D(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RHS takes its data from TimeOverview, shouldn't this be called t_o_bbox (or overview_bbox, but that is longer)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I might add some comments.

The Cubedash Product (which maps to a STAC Collection) doesn't have any bounding box or geometry attached, all the geometries are in the TimeOverview table, grouped by different period_types. In this case we use the period_type=="all" to get the one that covers all of time.


if bbox:
query = query.where(
collection.c.footprint_geometry.intersects(func.ST_MakeEnvelope(*bbox))
TimeOverview.footprint_geometry.intersects(func.ST_MakeEnvelope(*bbox))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test coverage for most of these if-branches, but I'll trust you know what you are doing with these changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's a least one bug that I found in these if-branches. The 'time' filter was crashing by trying to apply a python datetime function to a SQLAlchemy Column.

I only added one test in this PR. There's room for more...

In this case, collection was a subquery that held the required columns. I've removed the sub-query since it was unnecessary obfuscation, and apply the WHERE clause against the table.

# supports the full 'Features' specification.


@pytest.mark.parametrize("env_name", ("default",), indirect=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a performance issue, shouldn't this also be enabled for the postgis backend?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not using the PostGIS backend yet, and, there's a LOT of integration tests here that aren't testing against the PostGIS index either. I was afraid I'd fall down another rabbit hole if I started expanding the test coverage.

The table structure and query structure are identical between the two, with the exception of the naming of the String name field that is used to connect the cubedash product table with the odc product/agdc dataset_type table.

- Explain at a high level why we're using the TimeOverview able
- Fix a couple more typing errors,
  - Can't apply datetime_utc to SQLAlchemy Columns
  - SimpleDocField expects `offset` to be a tuple
- Inline a confusingly name variable in the main query
@omad omad force-pushed the speedup-stac-collections branch from 8cb8757 to 950a435 Compare September 18, 2025 19:41
@pjonsson pjonsson merged commit 6740b1c into develop Sep 18, 2025
17 checks passed
@pjonsson pjonsson deleted the speedup-stac-collections branch September 18, 2025 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extremely Slow STAC Collections Endpoint /stac/collections
2 participants