@vanessavmac vanessavmac commented Aug 4, 2025

Summary

The current batch image processing system runs MLJobs as a single Celery task. This causes issues when processing large numbers of images (e.g. 100+), since the long-running task can be interrupted or lost.

This PR uses Celery as the task queue and RabbitMQ as the message broker to send batches of images as individual process_pipeline_request tasks into queues dedicated to a specific ML pipeline. Processing services pick up tasks based on the pipelines they host. A periodic Celery beat task listens for completed process_pipeline_request tasks and enqueues save_results tasks.
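The per-pipeline fan-out described above can be sketched with stdlib queues. This is a simplified, hypothetical model of the design, not the actual implementation: the real system routes Celery tasks through RabbitMQ queues, and the PipelineRequest fields and return shape here are placeholders.

```python
# Sketch of the per-pipeline queue fan-out, modeled with stdlib queues.
# The real implementation uses Celery tasks routed to RabbitMQ queues;
# names follow the PR, but this standalone version is hypothetical.
from dataclasses import dataclass
from queue import Queue

@dataclass
class PipelineRequest:
    pipeline: str      # which ML pipeline should handle this image
    image_url: str

# One queue per pipeline; a processing service subscribes only to the
# queues for the pipelines it hosts.
queues: dict[str, Queue] = {}

def enqueue_batch(pipeline: str, image_urls: list[str]) -> int:
    """Fan a batch out as one process_pipeline_request task per image."""
    q = queues.setdefault(pipeline, Queue())
    for url in image_urls:
        q.put(PipelineRequest(pipeline, url))
    return q.qsize()

def process_pipeline_request(req: PipelineRequest) -> dict:
    """Worker-side task: run the pipeline's model on one image."""
    return {"pipeline": req.pipeline, "image": req.image_url, "detections": []}

# A worker hosting the (hypothetical) "moth-detector" pipeline drains
# only that pipeline's queue, one small task at a time.
enqueue_batch("moth-detector", ["img1.jpg", "img2.jpg"])
results = []
while not queues["moth-detector"].empty():
    results.append(process_pipeline_request(queues["moth-detector"].get()))
print(len(results))  # → 2
```

Because each image is its own task, a crashed or interrupted worker loses at most one image's work rather than the whole batch.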

The planning of this feature was discussed in #515. See the comments beginning at #515 (comment)

List of Changes

  • Add a process_pipeline_request task, defined on the processing service, which takes a PipelineRequest and returns the model's results.
  • Update the Job model to include subtasks and inprogress_subtasks fields to track the queued Celery tasks (these can be either process_pipeline_request or save_results tasks).
  • Add a periodic check_ml_job_status task which checks the subtasks of an MLJob, updates the job status, and schedules save_results tasks.
  • Update processing services to include Celery workers that subscribe to their pipelines' queues.
  • Introduce a new MLTaskRecord model which stores the results and stats of each Celery task.
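The bookkeeping in the periodic status check can be sketched as follows. This is a minimal stand-in, not the Django model or the actual beat task: the Job class, the finished-set argument, and the returned save_results task names are hypothetical, though the subtasks/inprogress_subtasks field names follow the PR.

```python
# Sketch of the periodic check_ml_job_status logic: reconcile a Job's
# subtask bookkeeping and schedule save_results for finished work.
# Simplified stand-in for the real Django model and Celery beat task.
from dataclasses import dataclass, field

@dataclass
class Job:
    subtasks: set = field(default_factory=set)            # all queued task ids
    inprogress_subtasks: set = field(default_factory=set) # not yet finished
    status: str = "STARTED"

def check_ml_job_status(job: Job, finished: set) -> list:
    """Beat-task sketch: move finished subtasks out of in-progress,
    update the job status, and return save_results tasks to enqueue."""
    done_now = job.inprogress_subtasks & finished
    job.inprogress_subtasks -= done_now
    if not job.inprogress_subtasks:
        job.status = "SUCCESS"
    # one save_results task per completed process_pipeline_request
    return [f"save_results:{task_id}" for task_id in sorted(done_now)]

job = Job(subtasks={"t1", "t2"}, inprogress_subtasks={"t1", "t2"})
to_enqueue = check_ml_job_status(job, finished={"t1"})
print(job.status, to_enqueue)  # → STARTED ['save_results:t1']
check_ml_job_status(job, finished={"t2"})
print(job.status)  # → SUCCESS
```

Keeping the job status derived from the remaining in-progress subtasks means the check is idempotent: re-running it after a missed beat reaches the same final state.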

Related Issues

Addresses (in part?) #515

Detailed Description

Potential side effects or risks associated with the changes...

(image attached)

How to Test the Changes

Instructions on how to test the changes. Include references to automated and/or manual tests that were created or used to test the changes.

Screenshots

If applicable, add screenshots to help explain this PR (ex. Before and after for UI changes).

Deployment Notes

Include instructions if this PR requires specific steps for its deployment (database migrations, config changes, etc.)

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

vanessavmac and others added 30 commits March 23, 2025 11:17

@vanessavmac vanessavmac changed the base branch from main to 706-support-for-reprocessing-detections-and-skipping-detector August 4, 2025 21:28
@vanessavmac vanessavmac requested a review from mihow August 4, 2025 21:37
Base automatically changed from 706-support-for-reprocessing-detections-and-skipping-detector to main August 16, 2025 01:36
@f-PLT f-PLT changed the title Async distributed ML Backend Enable async and distributed processing for the ML backend Aug 28, 2025
@vanessavmac vanessavmac marked this pull request as ready for review September 5, 2025 04:18
@mihow mihow added the VISS-SSEC label Sep 9, 2025