Conversation

@adk-swisstopo

We were holding releases due to PB-1619. It looks increasingly likely that this is an issue specific to how the DEV environment is set up. Let's cut a new release and push it to INT to verify. This is a major release because at least #572 qualifies.

adk-swisstopo and others added 30 commits April 16, 2025 15:42
In #558 we added the ability to
log stack traces upon exit. However we did not distinguish between a normal
graceful exit and an abnormal exit of a deadlocked service. This means that on
every exit/restart, we clutter the logs with stack traces.

With this change, upon exit we only attempt to dump the stack if the worker is
still running after a delay. This means that graceful exits that complete within
that delay do not clutter the logs with stack traces.

This change removes the `GUNICORN_DUMP_STACKS_ON_EXIT` variable and adds
`GUNICORN_STACK_DUMP_DELAY` and `GUNICORN_GRACEFUL_TIMEOUT`. The former defaults
to one second less than the latter. We may need to tweak
`GUNICORN_GRACEFUL_TIMEOUT` to keep it coherent with the Kubernetes
`terminationGracePeriodSeconds`.

Another way to resolve this might be implemented upstream later in benoitc/gunicorn#3385
PB-1531: only dump stacks when exit is taking too long.
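As a rough sketch, the delayed dump could be wired into a `gunicorn.conf.py` hook along these lines; the hook body and defaults below are an illustration of the description above, not the actual implementation:

```python
# gunicorn.conf.py -- illustrative sketch only, not the project's real config.
import faulthandler
import os
import threading

graceful_timeout = int(os.environ.get("GUNICORN_GRACEFUL_TIMEOUT", "30"))
# By default, dump one second before the graceful timeout expires.
STACK_DUMP_DELAY = float(
    os.environ.get("GUNICORN_STACK_DUMP_DELAY", graceful_timeout - 1)
)


def worker_int(worker):
    # Called when a worker receives SIGINT/SIGQUIT. Schedule a stack dump on a
    # daemon timer: if the worker exits gracefully before the delay elapses,
    # the timer dies with the process and nothing clutters the logs.
    timer = threading.Timer(
        STACK_DUMP_DELAY, faulthandler.dump_traceback, kwargs={"all_threads": True}
    )
    timer.daemon = True
    timer.start()
```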
PB-1575: make remove_expired_items more memory-efficient, probably.
Previously: Response 200
Now: Response 200 GET /checker?
PB-1619: make response logging more useful.
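For illustration, logging the method and path next to the status code could be done with a middleware along these lines (the class name and logger setup are assumptions, not the project's actual code):

```python
import logging

logger = logging.getLogger(__name__)


class ResponseLoggingMiddleware:
    """Log the status code together with the request method and path."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Previously only "Response 200" was logged; adding the method and the
        # full path gives lines like "Response 200 GET /checker?".
        logger.info(
            "Response %s %s %s",
            response.status_code,
            request.method,
            request.get_full_path(),
        )
        return response
```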
This was fixed in gevent 25.4.1.
Upgrade gevent and remove our custom bug workaround.
This refactors the `RemoveExpiredItems` test case to make it easier to
modify the code being tested and to add more tests in the future.

Specifically:
* move the fixtures into `setUp` and `tearDown` as that's where they belong
* split `test_remove_item` into two as it's really two tests
This will be used to add more test cases in a later change.
Specifically:
* verify non-expiring items are not removed
* allow defining arbitrary number of expiring and non-expiring items
* test the default min_age_hours
* print the correct class name on object existence assertion failure
* deduplicate expected output
* add test case for many items (currently 20)
That was only useful when testing for an exact match on stdout. We now only look
for patterns, so it's less cumbersome to write new tests or to modify the output.
Specifically:
* check that non-numerical values trigger an error
* check that passing a value lower than the default works
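For illustration, the first of these checks could look roughly like this (the `--min-age-hours` option is the one mentioned elsewhere in these commits; the test class name is an assumption):

```python
from django.core.management import CommandError, call_command
from django.test import TestCase


class RemoveExpiredItemsArgumentTest(TestCase):

    def test_non_numerical_min_age_hours_fails(self):
        # When options are passed as strings, call_command runs them through
        # the command's argparse parser, which raises CommandError in tests.
        with self.assertRaises(CommandError):
            call_command('remove_expired_items', '--min-age-hours=not-a-number')
```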
We track memory usage and wallclock.
Also increase the number of Items from 20 to 110.
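A sketch of the kind of standard-library profiling harness this could use (the helper name is an assumption; the output format mirrors the numbers quoted in the commits below):

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def profiling(item_count):
    # Track peak memory via tracemalloc and wallclock via a monotonic clock.
    tracemalloc.start()
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f'memory consumed: {peak} ({peak / item_count} per item)')
        print(f'time consumed: {elapsed}s ({elapsed / item_count} per item)')
```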
The data_factory creates Django objects individually, which is really slow when
many objects are involved as that incurs one database round-trip per object.

With this change we use the `bulk_create` method instead. This dramatically
speeds up the test setup and allows much more practical performance testing
of larger datasets.

Also bump the number of Items in RemoveExpiredItemsManyWithProfiling to 1010.

Profiling output on my laptop:

	memory consumed: 6857881 (6857.881 per item)
	time consumed: 15.571250406s (0.015571250406000001 per item)
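In rough terms, the setup change looks like this (the import path and the `Item` constructor arguments are placeholders; the real test builds its objects through the data_factory):

```python
from stac_api.models.item import Item  # assumption: actual import path may differ

# Before: one INSERT per object, because each instance is created and saved
# individually. After: build unsaved instances in memory and insert them in bulk.
items = [Item(name=f'item-{i}') for i in range(1010)]
Item.objects.bulk_create(items, batch_size=500)
```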
We now only perform a fixed number of queries instead of multiple queries for
each Item.

Performance increase for RemoveExpiredItemsManyWithProfiling:

	memory improved by ~57%	memory consumed: 2890362 (2890.362 per item)
	wallclock improved by ~69%	time consumed: 4.712163328s (0.004712163328 per item)

However this shifts the load to the database. A later commit will add safety
features to reduce risks of overloading it. That said, I am not too concerned
about [conflicting locks or transactions](https://pglocks.org/?pgcommand=DELETE).
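Schematically, the difference is between deleting row by row and letting one queryset-level call do the work (the expiry field name and cutoff computation are assumptions):

```python
from datetime import timedelta

from django.utils import timezone

min_age_hours = 24  # illustrative value
cutoff = timezone.now() - timedelta(hours=min_age_hours)

# Before: one or more queries per expired Item.
# for item in Item.objects.filter(properties_expires__lte=cutoff):
#     item.delete()

# After: a fixed number of queries regardless of how many Items match;
# cascades to related objects are still handled by Django.
Item.objects.filter(properties_expires__lte=cutoff).delete()
```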
This adds two flags: --max-deletions (an absolute value) and
--max-deletions-percentage (a percentage). If we detect an attempt to remove
more Items than either limit allows, we fail with an error.

This is useful to guard against bugs that would accidentally wipe out the
whole database and to reduce risks of overloading the database with a request
that might consume a lot of resources.

Default values are based on the daily usage we observe in PROD. Given the
backlog of expired Items currently in the database, the safety check will
trigger until we clean things up manually.

This also forbids negative values for --min-age-hours.
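A sketch of how these guard rails can be wired into the management command (`Item` and the `expired_items` helper are assumed to exist; flag defaults and messages are illustrative, not the PROD-based values):

```python
from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = 'Remove expired items, with limits on how much may be deleted.'

    def add_arguments(self, parser):
        parser.add_argument('--min-age-hours', type=int, default=24)
        parser.add_argument('--max-deletions', type=int, default=1000)
        parser.add_argument('--max-deletions-percentage', type=int, default=10)

    def handle(self, *args, **options):
        if options['min_age_hours'] < 0:
            raise CommandError('--min-age-hours must not be negative')

        total = Item.objects.count()
        to_delete = expired_items(options['min_age_hours']).count()

        # Refuse to run if the deletion looks suspiciously large, either in
        # absolute terms or as a share of the whole table.
        if to_delete > options['max_deletions']:
            raise CommandError(
                f'would delete {to_delete} items, '
                f'more than --max-deletions={options["max_deletions"]}'
            )
        if total and to_delete * 100 / total > options['max_deletions_percentage']:
            raise CommandError(
                f'would delete {to_delete} of {total} items, more than '
                f'{options["max_deletions_percentage"]}% of the table'
            )
```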
While remove_expired_items only changes the status of AssetUploads from IN_PROGRESS to ABORTED,
there is an [ON_DELETE=CASCADE](https://github.com/geoadmin/service-stac/blob/1219a97f35dc296a44328e710a23c858a99aa3aa/app/stac_api/models/item.py#L290)
that removes them implicitly. There is also a
[Django pre_delete constraint](https://github.com/geoadmin/service-stac/blob/1219a97f35dc296a44328e710a23c858a99aa3aa/app/stac_api/signals.py#L35)
preventing us from deleting AssetUploads that are IN_PROGRESS. So we do need to
update their status for them to be deleted afterwards.

This change ensures that the correct AssetUploads are deleted.
PB-1575: many improvements to remove_expired_items and its tests.
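Conceptually, the AssetUpload handling described above boils down to something like this (the model relations, field and status names are assumptions based on the linked code):

```python
# Mark in-progress uploads of the expired items as aborted first, so the
# pre_delete signal no longer blocks them and the ON_DELETE=CASCADE removes
# the right rows once the items themselves are deleted.
AssetUpload.objects.filter(
    asset__item__in=expired_items,
    status=AssetUpload.Status.IN_PROGRESS,
).update(status=AssetUpload.Status.ABORTED)

expired_items.delete()
```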
PB-1615: Fix change form template variable lookup error
Every time we delete an `Asset` or `CollectionAsset`, we make a call to the
corresponding S3 `DeleteObjects` method. For objects that are hosted externally
(i.e. not in an S3 bucket we control), this call always fails silently and we
can just as well not do it.

With this change, we stop attempting to call `DeleteObjects` for externally
hosted `Assets` and `CollectionAssets`, thus speeding up deletion.
PB-1575: don't attempt to delete externally hosted S3 objects.
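In rough terms, the change amounts to guarding the S3 call (the attribute and helper names are assumptions; only the boto3 call itself is real API):

```python
def delete_asset_object(s3_client, bucket_name, asset):
    # Externally hosted assets point at a URL outside our buckets; calling
    # DeleteObjects for them always fails silently, so skip the call entirely.
    if asset.is_external:
        return
    s3_client.delete_objects(
        Bucket=bucket_name,
        Delete={'Objects': [{'Key': asset.file.name}]},
    )
```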
asteiner-swisstopo and others added 24 commits May 8, 2025 14:09
This is the same as in the other endpoints that hide expired items:
- GET /collections/{collectionId}/items: Fetch features
- GET /collections/{collectionId}/items/{featureId}: Fetch a single feature

There is no difference in the result but it reduces confusion.
To reduce duplication. Note that we cannot extract the whole filtering operation  (`Item.objects.filter(...)`) as the different usages add additional fields.
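The extracted piece might look roughly like this (the helper name and the expiry field are assumptions), with each endpoint still adding its own fields on top:

```python
from django.db.models import Q
from django.utils import timezone


def non_expired_filter():
    # Shared by the search and item endpoints: keep items whose expiry is
    # unset or still in the future.
    return Q(properties_expires__isnull=True) | Q(properties_expires__gte=timezone.now())


# Example usage (illustrative):
# Item.objects.filter(non_expired_filter(), collection__name=collection_id)
```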
One `delete()` per object is very slow. One `delete()` for all the objects
consumes a lot of memory and fails in an all-or-nothing mode (i.e. when it
fails mid-way, nothing gets deleted).

With this change, we split the objects into batches of configurable size and
run `delete()` over each batch. It consumes a ~constant amount of memory, is
faster than having a batch size of 1, and is more robust: when it fails
mid-way, it will most likely have deleted at least some objects, and the next
run has that much less work to do.

There will be cases where expired items are partially deleted (i.e. some of
their assets are deleted but not all). I don't expect that to be a problem for
our use case.
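As a sketch, the batching can be as simple as chunking the ids and deleting one chunk at a time (names are illustrative; the next commit simplifies the id selection further):

```python
def delete_in_batches(queryset, batch_size=100):
    # Delete the matching rows in chunks of batch_size so each DELETE stays
    # small: memory use stays roughly constant, and a mid-way failure still
    # leaves the already-processed batches deleted.
    ids = list(queryset.values_list('id', flat=True))
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        queryset.model.objects.filter(id__in=batch).delete()
```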
Instead of extracting the ids of the objects to DELETE with SELECT/LIMIT/OFFSET,
we can just use SELECT/LIMIT. This is supposed to be
[faster](https://www.postgresql.org/docs/current/queries-limit.html)
but mostly, it's simpler as we don't need to paginate.

Idea from https://forum.djangoproject.com/t/incremental-bulk-deletion/40833/2
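Revising the sketch above accordingly: just keep re-selecting the first `batch_size` ids until nothing is left (again illustrative; the real command also applies the safety limits described earlier):

```python
def delete_in_batches(queryset, batch_size=100):
    while True:
        # SELECT ... LIMIT batch_size, no OFFSET needed: each deleted batch
        # removes itself from the next SELECT's result.
        ids = list(queryset.values_list('id', flat=True)[:batch_size])
        if not ids:
            break
        queryset.model.objects.filter(id__in=ids).delete()
```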
Update develop with master hot fixes
PB-1669:  Render asset hrefs as links in the admin UI
@adk-swisstopo adk-swisstopo requested a review from ltshb June 10, 2025 14:24
@github-actions github-actions bot changed the title #major release New Release v2.0.0 - #major Jun 10, 2025
@adk-swisstopo adk-swisstopo requested a review from boecklic June 10, 2025 14:24
@adk-swisstopo adk-swisstopo merged commit c240b8d into master Jun 11, 2025
5 checks passed