-
Notifications
You must be signed in to change notification settings - Fork 33
Description
While managing the DEA ODC Databases, we've had to do some manual deletions and updates to the agdc
database tables, and I suspect this has made Explorer unhappy.
We've used a 'QA' SQL query in the past, which compares the number of datasets explorer knows about vs the number of datasets ODC knows about.
SELECT *, (OUTPUT_QUERY.agdc_ds_count - OUTPUT_QUERY.explorer_ds_count) as diff
FROM (
select count(ds.id) as agdc_ds_count, p.dataset_count as explorer_ds_count, dt.id, p.name
from cubedash.product p, agdc.dataset_type dt, agdc.dataset ds
where
dt.name = p.name
and dt.id = ds.dataset_type_ref
and ds.archived is NULL
GROUP BY p.name, dt.id, p.dataset_count) AS OUTPUT_QUERY
WHERE NOT OUTPUT_QUERY.agdc_ds_count = OUTPUT_QUERY.explorer_ds_count;
Which shows a bunch of mismatched Products:
agdc_ds_count | explorer_ds_count | id | name | diff
---------------+-------------------+------+----------------------------+---------
419777 | 514922 | 479 | ga_ls_fc_3 | -95145
415373 | 417815 | 478 | ga_ls_wo_3 | -2442
39227 | 88696 | 1271 | ga_ls_wo_fq_cyear_3 | -49469
503454 | 503453 | 1798 | ga_s2am_ard_3 | 1
65063 | 184167 | 1006 | ga_s2_ba_provisional_3 | -119104
8639 | 10050 | 2953 | ga_s2ls_intertidal_cyear_3 | -1411
51381 | 204223 | 1600 | ga_s2_wo_provisional_3 | -152842
In an attempt to update this, I thought i could run cubedash-gen --force-refresh
on each of the out of sync products, however when I try this, i get a crash:
cubedash-gen --force-refresh ga_s2ls_intertidal_cyear_3
Updating 1 products for LocalConfig<loaded_from=defaults, environment='default', config={'db_hostname': 'dea-dev-db', 'db_database': 'odc', 'db_username': 'explorer_writer', 'db_password': '***', 'index_driver': 'default', 'db_connection_timeout': '60'}>
Generating product summaries...
ga_s2ls_intertidal_cyear_3 refresh
2024-04-05T04:27:02.288888Z [warning ] spatial_deletion_full_scan after_date=None product_name=ga_s2ls_intertidal_cyear_3
2024-04-05T04:27:02.326885Z [warning ] spatial_update.recreating_everything after_date=None product_name=ga_s2ls_intertidal_cyear_3
2024-04-05T04:33:35.828477Z [warning ] forcing_refresh product_name=ga_s2ls_intertidal_cyear_3
ga_s2ls_intertidal_cyear_3 1900
2024-04-05T04:33:35.996976Z [error ] product.error product=ga_s2ls_intertidal_cyear_3
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/cubedash/generate.py", line 127, in generate_report
result, updated_summary = store.refresh(
File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_stores.py", line 1528, in refresh
self._recalculate_period(
File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_stores.py", line 1374, in _recalculate_period
summary = self._summariser.calculate_summary(
File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_summarise.py", line 170, in calculate_summary
dict(
TypeError: 'LegacyCursorResult' object is not subscriptable
ga_s2ls_intertidal_cyear_3 error
finished. 1 error
This did seem to succeed in getting the dataset counts in sync, but I'm not sure what state it has left the database in.
Questions
- Is this crash happening after the Product is entirely updated, or is it only partially done?
- Does this SQL query make any sense?
- Should a 'QA' feature like this be included in Explorer?
- What can we do to make Explorer resilient to Database modifications? Ideally we need to fix ODC to support mutating the database, but there's going to be a few assumptions about datasets not being allowed to be deleted that will have flow on impacts.
Versions
Docker Image: docker.io/opendatacube/explorer@sha256:026ac9f06ad41abdf62dd5c8dc3206f6595bc25564f029bbcec574e64806d317
Explorer version: 2.10.2.dev20+ge0aa69f3
Core version: 1.8.17
Python: 3.10.12
SQLAlchemy Version: 1.4.52