Skip to content

Out of Sync Explorer Tables + Crashes #581

@omad

Description

@omad

While managing the DEA ODC Databases, we've had to do some manual deletions and updates to the agdc database tables, and I suspect this has made Explorer unhappy.

We've used a 'QA' SQL query in the past, which compares the number of datasets explorer knows about vs the number of datasets ODC knows about.

SELECT *, (OUTPUT_QUERY.agdc_ds_count - OUTPUT_QUERY.explorer_ds_count) as diff
FROM (
        select count(ds.id) as agdc_ds_count, p.dataset_count as explorer_ds_count, dt.id, p.name 
        from cubedash.product p, agdc.dataset_type dt, agdc.dataset ds 
        where 
            dt.name = p.name 
        and dt.id = ds.dataset_type_ref 
        and ds.archived is NULL 
        GROUP BY p.name, dt.id, p.dataset_count) AS OUTPUT_QUERY
WHERE NOT OUTPUT_QUERY.agdc_ds_count = OUTPUT_QUERY.explorer_ds_count;

Which shows a bunch of mismatched Products:

 agdc_ds_count | explorer_ds_count |  id  |            name            |  diff   
---------------+-------------------+------+----------------------------+---------
        419777 |            514922 |  479 | ga_ls_fc_3                 |  -95145
        415373 |            417815 |  478 | ga_ls_wo_3                 |   -2442
         39227 |             88696 | 1271 | ga_ls_wo_fq_cyear_3        |  -49469
        503454 |            503453 | 1798 | ga_s2am_ard_3              |       1
         65063 |            184167 | 1006 | ga_s2_ba_provisional_3     | -119104
          8639 |             10050 | 2953 | ga_s2ls_intertidal_cyear_3 |   -1411
         51381 |            204223 | 1600 | ga_s2_wo_provisional_3     | -152842

In an attempt to update this, I thought i could run cubedash-gen --force-refresh on each of the out of sync products, however when I try this, i get a crash:

cubedash-gen --force-refresh ga_s2ls_intertidal_cyear_3
Updating 1 products for LocalConfig<loaded_from=defaults, environment='default', config={'db_hostname': 'dea-dev-db', 'db_database': 'odc', 'db_username': 'explorer_writer', 'db_password': '***', 'index_driver': 'default', 'db_connection_timeout': '60'}>
Generating product summaries...
ga_s2ls_intertidal_cyear_3 refresh
2024-04-05T04:27:02.288888Z [warning  ] spatial_deletion_full_scan     after_date=None product_name=ga_s2ls_intertidal_cyear_3
2024-04-05T04:27:02.326885Z [warning  ] spatial_update.recreating_everything after_date=None product_name=ga_s2ls_intertidal_cyear_3
2024-04-05T04:33:35.828477Z [warning  ] forcing_refresh                product_name=ga_s2ls_intertidal_cyear_3
	 ga_s2ls_intertidal_cyear_3 1900
2024-04-05T04:33:35.996976Z [error    ] product.error                  product=ga_s2ls_intertidal_cyear_3
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cubedash/generate.py", line 127, in generate_report
    result, updated_summary = store.refresh(
  File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_stores.py", line 1528, in refresh
    self._recalculate_period(
  File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_stores.py", line 1374, in _recalculate_period
    summary = self._summariser.calculate_summary(
  File "/usr/local/lib/python3.10/dist-packages/cubedash/summary/_summarise.py", line 170, in calculate_summary
    dict(
TypeError: 'LegacyCursorResult' object is not subscriptable
ga_s2ls_intertidal_cyear_3 error
finished. 1 error

This did seem to succeed in getting the dataset counts in sync, but I'm not sure what state it has left the database in.

Questions

  • Is this crash happening after the Product is entirely updated, or is it only partially done?
  • Does this SQL query make any sense?
  • Should a 'QA' feature like this be included in Explorer?
  • What can we do to make Explorer resilient to Database modifications? Ideally we need to fix ODC to support mutating the database, but there's going to be a few assumptions about datasets not being allowed to be deleted that will have flow on impacts.

Versions

Docker Image: docker.io/opendatacube/explorer@sha256:026ac9f06ad41abdf62dd5c8dc3206f6595bc25564f029bbcec574e64806d317
Explorer version: 2.10.2.dev20+ge0aa69f3
Core version: 1.8.17
Python: 3.10.12
SQLAlchemy Version: 1.4.52

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions