feat: get issueDts from AWS in parallel #113
Merged
Linear Issue
IDSSE-1077
IDSSE-1301
Changes
- `Publisher` handles the pika `StreamLostError` exception as well
- Gets `issueDt`s available in AWS for any dataset in parallel
- `ProtocolUtils.get_issues()` accepts a custom value with the `max_workers` argument

Explanation
Previously, `ProtocolUtils` would sequentially crawl a given AWS bucket (such as NBM) to discover the most recent issuance datetimes available. Each `ls` call would wait to complete before attempting to `ls` the next folder. This meant that `get_issues()` response time increased linearly as the `num_issues` argument increased.

Now the function uses simple Python threading to look for recent `issueDt`s in AWS in parallel, sending all the S3 `ls` requests at once and then sorting through what files each one found. By default it does this with up to 24 parallel threads, a value that experimentally proved to be fast.
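The fan-out described above can be sketched with Python's standard `ThreadPoolExecutor`. This is a minimal illustration, not the PR's actual code: the `list_folder` helper and the folder names are hypothetical stand-ins for the real S3 `ls` calls made by `ProtocolUtils`.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 24  # the PR's experimentally-chosen default


def list_folder(prefix: str) -> list[str]:
    """Hypothetical stand-in for an S3 'ls' of one issueDt folder."""
    return [f"{prefix}/file_{i}" for i in range(2)]


def get_issues_parallel(
    prefixes: list[str], max_workers: int = MAX_WORKERS
) -> dict[str, list[str]]:
    """Send all 'ls' requests at once, then gather each folder's listing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map dispatches every prefix to the thread pool immediately
        # and yields results in the same order as the input prefixes
        results = pool.map(list_folder, prefixes)
    return dict(zip(prefixes, results))
```

Because each thread spends most of its time waiting on network I/O, the wall-clock cost of `get_issues()` stays roughly flat as `num_issues` grows, instead of scaling linearly as it did with the sequential crawl.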