[v21.0.0] 2025-09-09
CUMULUS-4058 Epic: Handle Granules with Identical producerGranuleId in Different Collections
Migration Notes
CUMULUS-4069 Update granules table to include producer_granule_id column
Please follow the instructions before upgrading Cumulus
- This version requires Cumulus Dashboard v14 or greater
- The updates in CUMULUS-4069 require a manual update to the PostgreSQL database
in the production environment. Please follow the instructions in
Update granules to include producer_granule_id
Breaking Changes
- CUMULUS-4078
  - On a file collision, the Move Granules task now checks whether the existing
    file is registered in Core's database to another collection. If it is, the
    granule (and the task execution) will fail, regardless of the duplicate
    behavior configuration. If this behavior is undesirable for performance or
    logic reasons, checkCrossCollectionCollisions may be set to false to
    disable it on a per-workflow, per-collection, or other config-driven basis.
- CUMULUS-4072
  - Updated the parse-pdr task component to throw an error if multiple
    granules within the same PDR have the same granuleId after applying the
    granuleIdFilter, unless the uniquifyGranuleId configuration parameter is
    explicitly set to true.
- CUMULUS-4074
  - Updates updateGranulesCmrMetadataFileLinks to always ensure the
    producerGranuleId identifier is set in updated CMR metadata.
- CUMULUS-4121
  - Updates example deployment cnm_response_task to use the newest version,
    v3.2.0, which supports producerGranuleId.
  - Users must ensure that cumulus-tf includes cnm_response_version = "3.2.0"
    or greater.
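To illustrate the CUMULUS-4078 change above, the cross-collection check can be pictured roughly as follows. This is a minimal sketch, not the move-granules task's actual code: the FileOwner and CollisionConfig types and the assertNoCrossCollectionCollision function are hypothetical names introduced here, and only the checkCrossCollectionCollisions option comes from the notes above.

```typescript
// Hypothetical shapes for illustration; not taken from the Cumulus codebase.
interface FileOwner {
  granuleId: string;
  collectionId: string;
}

interface CollisionConfig {
  // Mirrors the checkCrossCollectionCollisions option; the check runs
  // unless the option is explicitly set to false.
  checkCrossCollectionCollisions?: boolean;
}

// Throws when the colliding file is already registered to a granule in a
// different collection, regardless of any duplicate-handling setting.
function assertNoCrossCollectionCollision(
  existingOwner: FileOwner | undefined,
  incomingCollectionId: string,
  config: CollisionConfig = {}
): void {
  if (config.checkCrossCollectionCollisions === false) return; // check disabled
  if (existingOwner === undefined) return; // file not registered in the database
  if (existingOwner.collectionId !== incomingCollectionId) {
    throw new Error(
      `File is already registered to granule ${existingOwner.granuleId} ` +
        `in collection ${existingOwner.collectionId}`
    );
  }
}
```

In this sketch, a workflow that sets checkCrossCollectionCollisions to false skips the lookup result entirely, which is the opt-out described above.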
Added
- CUMULUS-4059
  - Added new non-null column producer_granule_id to Postgres granules table.
  - Added producerGranuleId property to granule record schema.
  - Updated @cumulus api/db/message packages to handle producer_granule_id
    and producerGranuleId.
  - Updated @cumulus/api/lib/writeGranulesFromMessage to set
    producerGranuleId = granuleId if not set.
  - Updated queue-granules task to set producerGranuleId = granuleId if not
    set.
- CUMULUS-4061
  - Added GenerateUniqueGranuleId to @cumulus/ingest for use in generating a
    hashed/'uniquified' granuleId.
- CUMULUS-4062
  - Added producerGranuleId to LzardsBackup task component and lambda
    input/output schema.
  - Updated LzardsBackup task component to submit producerGranuleId for
    storage in the lzards record as a key in the metadata object.
- CUMULUS-4069
  - Added migration script and instructions to add the producer_granule_id
    column to the granules table and populate it in the production environment.
- CUMULUS-4072
  - Updated parse-pdr task component to have the following behaviors:
    - Always populate producerGranuleId from the incoming parsed granuleId.
    - If the uniquifyGranuleId configuration value is set to true, parse-pdr
      will update the granuleId for all found granules to have a unique
      granule hash appended to the existing ID.
  - Updated parse-pdr such that if the uniquifyGranuleId configuration
    parameter is not set to true, and a duplicate granuleId is created as
    part of the output after passing the granuleIdFilter, the task will
    throw an error.
  - Added ingestFromPdrWithUniqueGranuleIdsSpec.js to the spec tests to
    demonstrate the ingest workflow works as expected with unique granuleIds
    and producerGranuleIds set.
- CUMULUS-4073
  - Adds AddUniqueGranuleId task to ingest terraform module for deployment
    with Core. This task will update a payload of existing granules to have
    'uniquified' IDs and preserve the original identifier in the
    producerGranuleId field.
- CUMULUS-4074
  - Updated IngestGranuleSuccessSpec/IngestUMMGSuccessSpec to validate
    producerGranuleId is populated in CMR post ingest.
  - Updated IngestGranuleSuccessSpec to include a producerGranuleId in the
    default test case.
  - Added ticket-relevant typing doc/ts-check updates to
    @cumulus/cmrjs/cmr-utils.
  - Updated updateCMRMetadata to take updateGranuleIdentifiers configuration
    flag/producerGranuleId such that the routine now sets the correct
    GranuleUR/ProducerGranuleId values in the CMR metadata object.
  - Added unit test/refactored mocks to use direct injection for cmr-utils.
  - Added getCmrMetadata helper to @cumulus/integration-tests to allow
    access to the full CMR metadata object for verification of record
    metadata fields.
  - Added ApiFileGranuleIdOptional to @cumulus/types/api for cases where an
    ApiFile is being generated and refactored existing code to use this type
    instead of custom relaxed typing.
  - Updates update-granules-cmr-metadata-file-links to use the updated cmrjs
    logic to set producerGranuleId identifiers in the CMR metadata, either
    equal to granuleId or the producerGranuleId set on the granule.
  - Updates @cumulus/tasks/sync-granule/GranuleFetcher to allow and pass
    through an incoming granule.producerGranuleId.
- CUMULUS-4077
  - Updated @cumulus/api/lib/ingest.reingestGranule to only update the
    original granule to 'queued' if the original payload contains the
    granule. This avoids a situation where the original granule is updated
    to 'queued', but the reingest workflow creates a new granule, leaving the
    original granule stuck in 'queued'.
- CUMULUS-4078
  - Added getGranuleIdAndCollectionIdFromFile query method to @cumulus/db to
    retrieve granule and collection metadata from a file's S3 location.
  - Added new API route
    GET /granules/files/get_collection_and_granule_id/:bucket/:key in
    @cumulus/api to return the granule ID and collection ID associated with a
    file.
  - Added getFileGranuleAndCollectionByBucketAndKey method to
    @cumulus/api-client/granules to allow use of new endpoint.
  - Added integration and unit tests for the new DB query, API endpoint, and
    client method.
  - Updated move-granules task to validate cross-collection file collisions
    using the new lookup logic when checkCrossCollectionCollisions is enabled.
  - Update @cumulus/db to add getGranuleIdAndCollectionIdFromFile query
    method.
- CUMULUS-4079
  - Added duplicate granule handling and related feature documentation, and
    updated related documentation to match.
  - Added update-granules-cmr-metadata-file-links task README.
- CUMULUS-4080
  - Add documentation for duplicate granule handling and, specifically,
    Collection configuration for duplicates.
  - Update urlPathTemplate to allow falling back from one null/undefined
    interpolated value to a second argument.
- CUMULUS-4082
  - Updated example deployment to deploy cnmResponse lambda version
    3.1.0-alpha.2-SNAPSHOT, which utilizes producerGranuleId.
  - Updated example deployment to deploy cnmToGranule lambda version 2.1.0.
  - Added FakeProcessing task configuration matchFilesWithProducerGranuleId
    to determine if the generated CMR file names should match granuleId or
    producerGranuleId.
  - Updated AddUniqueGranuleId task configuration hashLength to accept
    additional types and removed the use of hashDepth.
  - Updated FilesToGranules task configuration
    matchFilesWithProducerGranuleId to accept additional types.
  - Updated ParsePdr task configuration hashLength to accept additional
    types.
  - Fixed tf-modules/cumulus AddUniqueGranuleId task output.
  - Updated example deployment workflow CNMExampleWorkflow to uniquify
    granuleIds based on collection configuration.
  - Added KinesisTestTriggerWithUniqueGranuleIdsSpec.js to the spec test to
    demonstrate that the CNM ingest workflow ingests granules with unique
    granuleIds and producerGranuleIds set, and that CnmResponse sends
    responses using producerGranuleIds.
- CUMULUS-4085
  - Added config option for files-to-granules task to use producerGranuleId
    when mapping files to their granules.
- CUMULUS-4089
  - Add integration testing for duplicate granule workflows. This includes
    new specs and workflows in the ingestGranule, discoverGranules,
    lzardsBackup, cnmWorkflow, and orca specs.
- CUMULUS-4110
  - Added the workflow_configurations variable to the tf-modules/ingest and
    tf-modules/cumulus modules.
    The property sf_event_sqs_to_db_records_types has been added to
    workflow_template.json under the cumulus_meta field to control which
    record types should be written to the database during different workflow
    execution statuses.
    Currently, both "execution" and "pdr" must be written to the database, so
    the record type list must include both.
  - Updated the SfSqsReport task to set meta.reportMessageSource in the
    Cumulus message.
  - Updated the @cumulus/api/sfEventSqsToDbRecords lambda to determine which
    record types ("execution", "granule", "pdr") should be written to the
    database based on the cumulus_meta.sf_event_sqs_to_db_records_types and
    meta.reportMessageSource fields.
    By default, all record types will be written to the database.
  - Added
    @cumulus/api/lib.writeRecords.writeGranuleExecutionAssociationsFromMessage
    to write granule-execution associations from message.
  - Updated the @cumulus/integration-tests cmr.generateAndStoreCmrXml to
    apply matchFilesWithProducerGranuleId when generating OnlineAccessURL.
- CUMULUS-4119
  - Added assertions in KinesisTestTriggerWithUniqueGranuleIdsSpec to cover
    "duplicate" Granules in separate Collections.
- CUMULUS-4162
  - Added an optional includeTimestampHashKey parameter to the
    generateUniqueGranuleId function in @cumulus/ingest/granule, with a
    default value of false.
  - Added an optional includeTimestampHashKey configuration to the
    add-unique-granuleId and parse-pdr tasks, also with a default value of
    false.
  - Added a documentation page titled "Generate Unique GranuleId" to explain
    the algorithm for generating unique granuleIds.
- CUMULUS-4028
  - Update AddUniqueGranuleId task to output the input payload in addition to
    the modified granules.
  - Added 'unique' version of ingest_and_publish granule workflow for
    'uniquify' feature ingest tests.
- CUMULUS-4209
  - Updated the producer_granule_id migration script to disable autovacuum
    before the migration and re-enable it afterward to improve performance.
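The parse-pdr behaviors listed under CUMULUS-4072 (always record producerGranuleId, optionally append a hash, otherwise reject duplicates) can be sketched as below. Everything in this block is an assumption made for illustration: the ParsedGranule shape, the sketchUniqueGranuleId and prepareGranules functions, the underscore separator, and the djb2 stand-in hash are hypothetical, and the real generateUniqueGranuleId in @cumulus/ingest may hash different inputs in a different way.

```typescript
// Hypothetical granule shape; not the actual parse-pdr output schema.
interface ParsedGranule {
  granuleId: string;
  collectionId: string;
  producerGranuleId?: string;
}

// Small non-cryptographic string hash (djb2), used here only as a stand-in.
function djb2Hex(input: string): string {
  let hash = 5381;
  for (let i = 0; i < input.length; i += 1) {
    // hash * 33 + charCode, kept within unsigned 32 bits
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

// Appends a truncated hash of the collection and ID to the original ID.
function sketchUniqueGranuleId(
  granuleId: string,
  collectionId: string,
  hashLength: number = 8
): string {
  const suffix = djb2Hex(collectionId + ':' + granuleId).slice(0, hashLength);
  return granuleId + '_' + suffix;
}

function prepareGranules(
  granules: ParsedGranule[],
  uniquifyGranuleId: boolean = false
): ParsedGranule[] {
  const prepared = granules.map((g) => ({
    collectionId: g.collectionId,
    // The incoming ID is always preserved as producerGranuleId.
    producerGranuleId: g.granuleId,
    granuleId: uniquifyGranuleId
      ? sketchUniqueGranuleId(g.granuleId, g.collectionId)
      : g.granuleId,
  }));

  if (!uniquifyGranuleId) {
    // Without uniquification, repeated granuleIds are an error.
    const seen: { [id: string]: boolean } = {};
    for (const g of prepared) {
      if (seen[g.granuleId]) {
        throw new Error('Duplicate granuleId in output: ' + g.granuleId);
      }
      seen[g.granuleId] = true;
    }
  }
  return prepared;
}
```

With uniquifyGranuleId enabled, two granules that share an ID but belong to different collections come out with distinct granuleId values and the same producerGranuleId; with it disabled, the same input throws.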
Changed
- CUMULUS-4165
  - Update Async Operation container to new version 54,
    cumuluss/async-operation:54. Users should update their references to
    async-operation with the new version.
- CUMULUS-4205
  - Add S3 Replicator lambda ARN to s3-replicator outputs