
Conversation

Collaborator

@vanessavmac vanessavmac commented Apr 27, 2025

Summary

When a user processes a source image that has no existing detections with a pipeline, this produces detections (potentially with classifications). On any subsequent run, the detector should never be re-run; only the classifier runs on the existing detections.

The user can choose to ignore existing detections and process an image from scratch by setting `reprocess_existing_detections` to `False` in their `ProjectPipelineConfig`, either for a single run or for all runs.

By default, this feature is turned off on live via a feature flag (`reprocess_existing_detections` in `Project.feature_flags`).

This will need to be implemented in the default processing service before any effect is seen:
RolnickLab/ami-data-companion#85
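As a rough sketch of the flag-and-config check described above (the `feature_flags` field and config key follow the summary; the helper function itself is hypothetical, not the PR's code):

```python
from types import SimpleNamespace

def should_reprocess_existing_detections(project, pipeline_config: dict) -> bool:
    """Reuse existing detections only when the project's feature flag is on
    and the per-run config has not set reprocess_existing_detections=False."""
    flag_on = bool(project.feature_flags.get("reprocess_existing_detections", False))
    opted_out = pipeline_config.get("reprocess_existing_detections") is False
    return flag_on and not opted_out

# Example: flag enabled on the project, with and without a per-run opt-out
project = SimpleNamespace(feature_flags={"reprocess_existing_detections": True})
assert should_reprocess_existing_detections(project, {})
assert not should_reprocess_existing_detections(project, {"reprocess_existing_detections": False})
```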

Related Issues

Closes #706
Related to #747 and #798

Screenshots

Reprocessing images means the same detections are reused; classifications are made on the cropped images.

(screenshot)
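The crop-then-classify step the screenshot illustrates can be sketched with Pillow (already a dependency of the example service); the function name is hypothetical and the classifier call itself is omitted:

```python
from PIL import Image

def crop_for_classification(image: Image.Image, bbox: tuple[int, int, int, int]) -> Image.Image:
    """Crop a detection's bbox (x1, y1, x2, y2) out of the source image;
    the classifier then runs on this crop instead of re-running the detector."""
    return image.crop(bbox)

img = Image.new("RGB", (100, 100))
crop = crop_for_classification(img, (10, 10, 50, 30))
assert crop.size == (40, 20)  # width = 50 - 10, height = 30 - 10
```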

Example reprocessing job logs:

```
[2025-06-28 02:59:32] INFO Finished job #116 "Capture #1" (SUCCESS)
[2025-06-28 02:59:32] INFO Changing status of job 116 from STARTED to SUCCESS
[2025-06-28 02:59:32] INFO All tasks completed for job 116
[2025-06-28 02:59:32] INFO Waiting for batch 1 to finish saving results (sub-task d440d41d-f547-4e03-92a3-c577c1bedc63)
[2025-06-28 02:59:32] INFO Checking the status of 1 remaining sub-tasks that are still saving results.
[2025-06-28 02:59:32] INFO Processed 100% of images successfully.
[2025-06-28 02:59:32] INFO Saving results for batch 1 in sub-task d440d41d-f547-4e03-92a3-c577c1bedc63
[2025-06-28 02:59:32] INFO Processed image batch 1 in 0.20s
[2025-06-28 02:59:32] INFO Pipeline results returned 1 images, 1 detections, 2 classifications
[2025-06-28 02:59:32] INFO Found 1 existing detections.
[2025-06-28 02:59:32] INFO Using pipeline config: {}
[2025-06-28 02:59:31] INFO Sending 1 images to Pipeline #29 "Random Detection Random Species Pipeline" (random-detection-random-species) v1
[2025-06-28 02:59:31] INFO Using processing service with latency 0.0147: #13 "minimal" at http://ml_backend:2000
[2025-06-28 02:59:31] INFO Searching processing services:['minimal']
[2025-06-28 02:59:31] INFO Processing image batch 1 of 1
[2025-06-28 02:59:31] INFO Found 1 out of 1 images to process
[2025-06-28 02:59:31] INFO Filtering images that have already been processed by pipeline #29 "Random Detection Random Species Pipeline" (random-detection-random-species) v1
[2025-06-28 02:59:31] INFO Changing status of job 116 from PENDING to STARTED
[2025-06-28 02:59:31] INFO Running job #116 "Capture #1" (PENDING)
```

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.


netlify bot commented Apr 27, 2025

Deploy Preview for antenna-preview canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | da35e16 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/689fde6b8666390008628c1a |

ami/ml/tests.py Outdated

```python
# Reprocess the same images
pipeline = self.processing_service_instance.pipelines.all().get(slug="constant")
pipeline_response = pipeline.process_images(self.test_images, project_id=self.project.pk)
```
Collaborator

@mihow mihow Jul 7, 2025


Perhaps we should make a way to check the request data that will be sent to the pipeline. In this case, it should have detections in the request. Whereas the other requests only have the source images. This can be a later optimization (prepare request data, then manually pass it to process_images)
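The "prepare request data, then pass it to process_images" idea might look like this sketch (all names here are hypothetical; the real request schemas live in `ami/ml/schemas.py`):

```python
def prepare_pipeline_request(source_images: list[dict], detections: list[dict]) -> dict:
    """Build the payload that would be sent to the pipeline, so a test can
    inspect it before it is posted."""
    return {
        "source_images": source_images,
        "detections": detections,  # empty when processing from scratch
    }

payload = prepare_pipeline_request(
    source_images=[{"id": "1", "url": "http://example.com/a.jpg"}],
    detections=[{"source_image_id": "1", "bbox": [0, 0, 10, 10]}],
)
# A reprocessing request carries detections; a fresh request would not.
assert payload["detections"]
```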

```
pydantic
Pillow
requests
scipy
```
Collaborator

@mihow mihow Jul 7, 2025


Is scipy used anywhere?

Collaborator Author


Ah I meant to put scipy in the example processing service requirements.txt to resolve this error with the HF image classifier:

```
Owlv2ImageProcessor requires the scipy library but it was not found in your environment. You can install it with pip:
`pip install scipy`. Please note that you may need to restart your runtime after installation.
```

I thought the transformers package would install scipy automatically, but that doesn't seem to be the case.

Collaborator


These screenshots are great, thank you! But we don't need to include them in the repo if you are only referencing them in the PR. GitHub will host the images for you if you paste or drag them in using the online text editor. Or are you using them somewhere else?

Collaborator Author


I'm using the images in processing_services/README.md:

## Demo
## `minimal` Pipelines and Output Images
- `ConstantPipeline` and `RandomDetectionRandomSpeciesPipeline`
![MinimalReprocessing](images/MinimalReprocessing.png)
## `example` Pipelines and Output Images
- `ZeroShotHFClassifierPipeline`
![ZeroShotHFClassifierPipeline](images/ZeroShotHFClassifierPipeline.png)
- `ZeroShotObjectDetectorWithRandomSpeciesClassifierPipeline` and `ZeroShotObjectDetectorWithConstantClassifierPipeline` (using reprocessing, skips the Zero Shot Object Detector if there are existing detections)
![ZeroShotReprocessing](images/ZeroShotReprocessing.png)

```python
if bbox and detection.detection_algorithm:
    detection_requests.append(
        DetectionRequest(
            source_image=source_images[-1],
```
Collaborator

@mihow mihow Jul 7, 2025


Make sure this is the right source image! Can you use just source_image=source_image instead?

Collaborator Author


Looks like `DetectionRequest` expects a `SourceImageRequest`, so I was assigning it the last created request (which corresponds to the `source_image`). I tried to make my code a bit clearer like this:

```python
source_image_request = SourceImageRequest(
    id=str(source_image.pk),
    url=url,
)
source_image_requests.append(source_image_request)
# Re-process all existing detections if they exist
for detection in source_image.detections.all():
    bbox = detection.get_bbox()
    if bbox and detection.detection_algorithm:
        detection_requests.append(
            DetectionRequest(
                source_image=source_image_request,
```

```python
# Crop the image to set the _pil attribute
logger.info(f"Received detection without crop_image_url: {detection}")
logger.info("Falling back to cropping the source image...")
source_image = SourceImage(
```
Collaborator


Some methods to help us only open each source image once per request.

  • Group detections by source image either in create_detections or before it's called
  • Keep the source images in memory for each group, rather than per detection
  • Consider making a temporary path that is accessible across multiple requests - at least for the downloaded image (it will still have to open it in memory each request)
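The first suggestion (group detections by source image so each image is only opened once) could be sketched like this, using plain dicts as stand-ins for the real models:

```python
from collections import defaultdict

def group_detections_by_source_image(detections: list[dict]) -> dict[str, list[dict]]:
    """Group detections so each source image is downloaded/opened once,
    then all of its crops are taken from the same in-memory image."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for det in detections:
        groups[det["source_image_id"]].append(det)
    return dict(groups)

dets = [
    {"source_image_id": "a", "bbox": [0, 0, 5, 5]},
    {"source_image_id": "a", "bbox": [1, 1, 6, 6]},
    {"source_image_id": "b", "bbox": [2, 2, 7, 7]},
]
groups = group_detections_by_source_image(dets)
assert len(groups["a"]) == 2 and len(groups["b"]) == 1
```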

Collaborator Author


Now the source image should only be opened once, here:

```python
for img in source_images:
    img.open(raise_exception=True)
```

Or, if a cropped image is provided, then that is also opened only once:

```python
cropped_image.open(raise_exception=True)
```

@mihow mihow requested a review from Copilot August 15, 2025 20:51
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for re-processing detections by allowing pipelines to skip the localizer stage when existing detections are provided, while only running the classifier on previously detected regions.

  • Added ability to pass existing detections to pipeline requests to enable re-processing
  • Updated pipeline logic to conditionally skip detection generation when existing detections are available
  • Added new pipeline variants that demonstrate detection re-use with different classifiers

Reviewed Changes

Copilot reviewed 16 out of 20 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| processing_services/minimal/requirements.txt | Pin package versions to specific releases |
| processing_services/example/requirements.txt | Pin package versions and add scipy dependency |
| processing_services/minimal/api/utils.py | Add utility functions for URL and base64 validation |
| processing_services/example/api/utils.py | Add utility functions for URL and base64 validation |
| processing_services/minimal/api/schemas.py | Add DetectionRequest schema and reorganize class definitions |
| processing_services/example/api/schemas.py | Add DetectionRequest schema and new pipeline choices |
| processing_services/minimal/api/pipelines.py | Refactor pipelines to support existing detections and add new pipeline variant |
| processing_services/example/api/pipelines.py | Add new pipeline variants and update existing ones to support detection re-processing |
| processing_services/minimal/api/api.py | Add detection creation logic and update process endpoint |
| processing_services/example/api/api.py | Add detection creation logic and update process endpoint |
| processing_services/example/api/algorithms.py | Add new classifier algorithms and remove duplicate image opening |
| processing_services/README.md | Update documentation with new pipeline examples |
| ami/ml/tests.py | Add test for pipeline reprocessing functionality |
| ami/ml/schemas.py | Add DetectionRequest schema and reorganize definitions |
| ami/ml/models/pipeline.py | Update pipeline processing to include existing detections |
| ami/main/models.py | Add get_bbox method to Detection model |


```python
logger.info("[1/2] No existing detections, generating detections...")
detections: list[Detection] = make_constant_detection(self.source_images)

logger.info("[2/2] Running the classifier...")
```

Copilot AI Aug 15, 2025


[nitpick] The step numbering [1/2] is hardcoded and could become inconsistent if pipeline steps change. Consider using dynamic step counting or removing the step numbers.

Suggested change:

```diff
-logger.info("[2/2] Running the classifier...")
+    logger.info("Skipping the localizer, use existing detections...")
+    detections: list[Detection] = self.existing_detections
+else:
+    logger.info("No existing detections, generating detections...")
+    detections: list[Detection] = make_constant_detection(self.source_images)
+logger.info("Running the classifier...")
```

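The nitpick could be addressed with dynamic step counting, roughly like this sketch (not the PR's code; the helper name is hypothetical):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def log_numbered_steps(step_messages: list[str]) -> list[str]:
    """Prefix each step with [i/n] computed from the list length,
    so the numbering stays correct if stages are added or removed."""
    numbered = [f"[{i}/{len(step_messages)}] {msg}" for i, msg in enumerate(step_messages, start=1)]
    for line in numbered:
        logger.info(line)
    return numbered

msgs = log_numbered_steps(["Generating detections...", "Running the classifier..."])
assert msgs[1] == "[2/2] Running the classifier..."
```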

```python
logger.info("[1/2] No existing detections, generating detections...")
detections: list[Detection] = make_constant_detection(self.source_images)

logger.info("[2/2] Running the classifier...")
```

Copilot AI Aug 15, 2025


[nitpick] The step numbering [2/2] is hardcoded and could become inconsistent if pipeline steps change. Consider using dynamic step counting or removing the step numbers.

Suggested change:

```diff
-logger.info("[2/2] Running the classifier...")
+    logger.info("Skipping the localizer, use existing detections...")
+    detections: list[Detection] = self.existing_detections
+else:
+    logger.info("No existing detections, generating detections...")
+    detections: list[Detection] = make_constant_detection(self.source_images)
+logger.info("Running the classifier...")
```


```python
    self.stages[0], self.source_images, self.batch_sizes[0], intermediate=True
)
detections_with_candidate_labels: list[Detection] = []
if self.existing_detections:
```

Copilot AI Aug 15, 2025


The conditional logic for checking existing detections is duplicated across multiple pipeline classes. Consider extracting this into a shared base method to reduce code duplication.

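One way to extract the shared branch, as suggested: a base-class helper along these lines (class and method names are hypothetical, and detections are stand-in dicts rather than the real schema objects):

```python
class BasePipeline:
    """Shared 'reuse existing detections or run the localizer' branch."""

    existing_detections: list[dict] = []

    def run_localizer(self, source_images: list) -> list[dict]:
        raise NotImplementedError

    def get_detections(self, source_images: list) -> list[dict]:
        # The conditional each pipeline currently duplicates, hoisted here.
        if self.existing_detections:
            return self.existing_detections
        return self.run_localizer(source_images)


class ConstantPipeline(BasePipeline):
    def run_localizer(self, source_images: list) -> list[dict]:
        return [{"bbox": [0, 0, 10, 10]} for _ in source_images]


p = ConstantPipeline()
assert len(p.get_detections(["img1", "img2"])) == 2
p.existing_detections = [{"bbox": [1, 1, 2, 2]}]
assert p.get_detections(["img1", "img2"]) == [{"bbox": [1, 1, 2, 2]}]
```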

```python
existing_detection = Detection.objects.filter(
    source_image=source_image,
    detection_algorithm=detection_algo,
    bbox=serialized_bbox,
```

Copilot AI Aug 15, 2025


The query for existing detections no longer filters by detection_algorithm, which could cause incorrect detection matching when the same bbox exists from different algorithms.


Collaborator


This is intentional; we want to classify detections from any previous detection algorithm, unless the user disables it for the pipeline run.
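In other words, the match key is (source image, bbox), not the algorithm. A stand-in sketch of that rule on plain dicts (the real code is a Django queryset filter):

```python
def find_existing_detection(detections: list[dict], source_image_id: str, bbox: list[int]):
    """Match on source image + bbox only, so a detection produced by any
    previous detection algorithm can be reused for classification."""
    for det in detections:
        if det["source_image_id"] == source_image_id and det["bbox"] == bbox:
            return det
    return None

existing = [{"source_image_id": "1", "bbox": [0, 0, 4, 4], "algorithm": "zero-shot-detector"}]
match = find_existing_detection(existing, "1", [0, 0, 4, 4])
assert match is not None and match["algorithm"] == "zero-shot-detector"
```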

@mihow mihow merged commit 5be7396 into main Aug 16, 2025
6 checks passed
@mihow mihow deleted the 706-support-for-reprocessing-detections-and-skipping-detector branch August 16, 2025 01:36

Successfully merging this pull request may close these issues.

Support for reprocessing detections and skipping detector