Commit fbf64e7

PB-1575: simplify batching a bit.
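Instead of extracting the ids of the objects to DELETE with SELECT/LIMIT/OFFSET, we can just use SELECT/LIMIT: each batch is deleted before the next SELECT runs, so the first `batch_size` remaining rows are always the next ones to delete. This should be [faster](https://www.postgresql.org/docs/current/queries-limit.html), because rows skipped by an OFFSET still have to be computed by the server, but mostly it's simpler because we don't need to paginate. Idea from https://forum.djangoproject.com/t/incremental-bulk-deletion/40833/2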
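A minimal sketch of the SELECT/LIMIT batching idea in plain SQL. It uses Python's built-in `sqlite3` rather than Django and PostgreSQL so that it runs standalone; the `item` table, its `expired` column and the function name are hypothetical and do not come from the commit.

```python
import sqlite3


def delete_expired_in_batches(conn: sqlite3.Connection, batch_size: int) -> int:
    """Delete all rows of the hypothetical `item` table whose `expired` flag is set.

    Each round selects at most `batch_size` matching ids with a plain LIMIT
    (no OFFSET) and deletes exactly those ids. The deleted rows no longer
    match the WHERE clause, so the next SELECT returns the next batch.
    """
    total_deleted = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM item WHERE expired = 1 ORDER BY id LIMIT ?",
            (batch_size,),
        )]
        if not ids:
            break  # nothing left to delete
        # PostgreSQL has no DELETE ... LIMIT, so delete by an explicit id list.
        placeholders = ",".join("?" * len(ids))
        cur = conn.execute(f"DELETE FROM item WHERE id IN ({placeholders})", ids)
        conn.commit()  # commit each batch so finished batches survive a failure
        total_deleted += cur.rowcount
    return total_deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, expired INTEGER)")
    conn.executemany("INSERT INTO item (expired) VALUES (?)", [(1,)] * 25 + [(0,)] * 5)
    conn.commit()
    print(delete_expired_in_batches(conn, batch_size=10))  # 25
```

No OFFSET is ever computed, and the loop stops on its own as soon as the SELECT comes back empty.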
1 parent d90aa66 commit fbf64e7

1 file changed (+7, -12 lines)

app/stac_api/management/commands/remove_expired_items.py

Lines changed: 7 additions & 12 deletions
```diff
@@ -3,7 +3,6 @@
 
 from django.conf import settings
 from django.core.management.base import CommandParser
-from django.core.paginator import Paginator
 from django.utils import timezone
 
 from stac_api.models.general import BaseAssetUpload
```
```diff
@@ -42,20 +41,16 @@ def delete_by_batch(self, queryset, object_type, batch_size):
         # than waiting forever, to fail and to have to start from scratch next
         # time.
         type_name = f'stac_api.{object_type.__name__}'
+        total = queryset.count()
         deleted_count = 0
-        # We delete rows as we iterate over them. This only works if we iterate
-        # from the end to the beginning. But we also want to delete the objects
-        # in the order of the QuerySet we received. Hence, we first reverse the
-        # the QuerySet then we reverse the iterator.
-        queryset = queryset.reverse()
-        paginator = Paginator(queryset, batch_size)
-        for page_number in reversed(paginator.page_range):
-            page = paginator.page(page_number)
-            # We cannot just call page.object_list.delete() because DELETE
+        while deleted_count < total:
+            # We cannot just call queryset[:batch_size].delete() because DELETE
             # does not support LIMIT/OFFSET. So instead we extract the ids
             # then we'll build a new QuerySet to DELETE them.
-            ids = page.object_list.values('id')
+            ids = queryset.values('id')[:batch_size]
             expected_deletions = len(ids)
+            if expected_deletions == 0:
+                break
             dry_run_prefix = ''
             if self.options['dry_run']:
                 dry_run_prefix = '[dry run]: '
```
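The removed comment explains why the old code walked the pages backwards: once a page is deleted, every later row shifts toward the front, so moving forward with OFFSET skips rows. The pure-Python simulation below illustrates this, with a list standing in for the table and slicing standing in for SELECT ... LIMIT/OFFSET. It is only a sketch; the three function names are hypothetical, and it assumes the page count is fixed when the loop starts, as it is when iterating over `reversed(paginator.page_range)`.

```python
def delete_paginating_forward(rows: list[int], batch_size: int) -> list[int]:
    """Walk pages 0, 1, 2, ... with OFFSET, deleting each page as we go."""
    rows = list(rows)
    page_count = -(-len(rows) // batch_size)  # ceil division, fixed before the loop
    for page in range(page_count):
        offset = page * batch_size
        batch = rows[offset:offset + batch_size]  # SELECT ... LIMIT/OFFSET
        rows = [r for r in rows if r not in batch]  # DELETE ... WHERE id IN (...)
    return rows  # rows that were never deleted


def delete_paginating_backward(rows: list[int], batch_size: int) -> list[int]:
    """Same, but from the last page to the first (mirrors the old approach)."""
    rows = list(rows)
    page_count = -(-len(rows) // batch_size)
    for page in reversed(range(page_count)):
        offset = page * batch_size
        batch = rows[offset:offset + batch_size]
        rows = [r for r in rows if r not in batch]
    return rows


def delete_first_batch_repeatedly(rows: list[int], batch_size: int) -> list[int]:
    """Always take the first batch, no OFFSET (mirrors the new approach)."""
    rows = list(rows)
    while True:
        batch = rows[:batch_size]  # SELECT ... LIMIT
        if not batch:
            break
        rows = [r for r in rows if r not in batch]
    return rows


if __name__ == "__main__":
    ids = list(range(1, 11))
    print(delete_paginating_forward(ids, 3))      # [4, 5, 6, 10]
    print(delete_paginating_backward(ids, 3))     # []
    print(delete_first_batch_repeatedly(ids, 3))  # []
```

Walking backwards deletes the last page first, which is why the old code also had to reverse the QuerySet to keep the original deletion order. Always taking the first batch needs neither trick: it deletes in QuerySet order and never skips a row.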
```diff
@@ -66,7 +61,7 @@ def delete_by_batch(self, queryset, object_type, batch_size):
             actual_deletions = deleted_objs.get(type_name, 0)
             deleted_count += actual_deletions
             self.print_success(
-                f'{dry_run_prefix}Deleted {deleted_count}/{paginator.count} {type_name}.'
+                f'{dry_run_prefix}Deleted {deleted_count}/{total} {type_name}.'
                 f' In this batch: {actual_deletions}/{expected_deletions}.'
                 f' All objects in this batch: {deleted_objs}.'
             )
```
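`total` comes from a single `queryset.count()` taken before the loop, so it is a snapshot. If fewer matching rows exist by the time a batch is selected, `while deleted_count < total` alone would never become false; the `expected_deletions == 0` check ends the loop instead. The list-based sketch below shows this. The "rows removed elsewhere after the count" scenario is a hypothetical illustration, not a reason stated in the commit, and `delete_loop`, `guard` and `max_iterations` are made-up names.

```python
def delete_loop(rows, total: int, batch_size: int, guard: bool = True,
                max_iterations: int = 100) -> tuple[int, int]:
    """Mirror of the new while-loop over a plain list.

    `total` plays the role of `queryset.count()` taken before the loop;
    `rows` is what the queryset actually contains when the loop runs
    (hypothetically fewer rows, e.g. if some were removed elsewhere).
    Returns (deleted_count, iterations).
    """
    rows = list(rows)
    deleted_count = 0
    iterations = 0
    while deleted_count < total:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError("no progress: the loop would spin forever")
        ids = rows[:batch_size]  # like queryset.values('id')[:batch_size]
        expected_deletions = len(ids)
        if guard and expected_deletions == 0:
            break  # the early exit added by this commit
        rows = rows[expected_deletions:]  # "delete" the selected batch
        deleted_count += expected_deletions
    return deleted_count, iterations


if __name__ == "__main__":
    print(delete_loop(range(9), total=9, batch_size=3))   # (9, 3)
    print(delete_loop(range(7), total=10, batch_size=3))  # (7, 4)
```

With `guard=False`, the second call keeps selecting an empty batch until `max_iterations` is exceeded and raises `RuntimeError`.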