Support re-processing detections and skipping localizer #815
Conversation
ami/ml/tests.py (outdated)

```python
# Reprocess the same images
pipeline = self.processing_service_instance.pipelines.all().get(slug="constant")
pipeline_response = pipeline.process_images(self.test_images, project_id=self.project.pk)
```
Perhaps we should add a way to check the request data that will be sent to the pipeline. In this case, the request should include the detections, whereas the other requests only have the source images. This can be a later optimization (prepare the request data, then manually pass it to `process_images`).
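One way to make the payload inspectable, sketched below, is to split request preparation from sending. This is a hypothetical refactor, not the project's current API: `PipelineRequest` and `prepare_request` are illustrative names standing in for whatever `process_images` builds internally.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: build the request payload separately so a test can
# assert on it before anything is sent. Names are illustrative.
@dataclass
class PipelineRequest:
    source_images: list = field(default_factory=list)
    detections: list = field(default_factory=list)


def prepare_request(source_images, existing_detections=None) -> PipelineRequest:
    """Build the payload without sending it over the network."""
    return PipelineRequest(
        source_images=list(source_images),
        detections=list(existing_detections or []),
    )


# First pass: the request carries only the source images
first = prepare_request(["img-1", "img-2"])
# Reprocessing pass: the same images, plus their existing detections
second = prepare_request(["img-1", "img-2"], existing_detections=["det-1"])
```

A test could then assert that `first.detections` is empty while `second.detections` carries the existing detections, without ever hitting a processing service.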
```
pydantic
Pillow
requests
scipy
```
Is scipy used anywhere?
Ah, I meant to put scipy in the example processing service requirements.txt to resolve this error from the HF image classifier:

> Owlv2ImageProcessor requires the scipy library but it was not found in your environment. You can install it with pip: `pip install scipy`. Please note that you may need to restart your runtime after installation.

I thought the `transformers` package would install scipy automatically, but that doesn't seem to be the case.
These screenshots are great, thank you! But we don't need to include them in the repo if you are only referencing them in the PR. GitHub will host the images for you if you paste or drag them into the online text editor. Or are you using them somewhere else?
Using the images in processing_services/README.md:

antenna/processing_services/README.md, lines 79 to 93 in 3d3b820:

```markdown
## Demo

## `minimal` Pipelines and Output Images

- `ConstantPipeline` and `RandomDetectionRandomSpeciesPipeline`



## `example` Pipelines and Output Images

- `ZeroShotHFClassifierPipeline`



- `ZeroShotObjectDetectorWithRandomSpeciesClassifierPipeline` and `ZeroShotObjectDetectorWithConstantClassifierPipeline` (using reprocessing; skips the Zero Shot Object Detector if there are existing detections)


```
ami/ml/models/pipeline.py (outdated)

```python
if bbox and detection.detection_algorithm:
    detection_requests.append(
        DetectionRequest(
            source_image=source_images[-1],
```
Make sure this is the right source image! Can you use just `source_image=source_image` instead?
Looks like `DetectionRequest` expects a `SourceImageRequest`, so I was assigning it the last created request (which corresponds to the `source_image`). I tried to make my code a bit clearer like this:
antenna/ami/ml/models/pipeline.py, lines 202 to 213 in 5c7af56:

```python
source_image_request = SourceImageRequest(
    id=str(source_image.pk),
    url=url,
)
source_image_requests.append(source_image_request)
# Re-process all existing detections if they exist
for detection in source_image.detections.all():
    bbox = detection.get_bbox()
    if bbox and detection.detection_algorithm:
        detection_requests.append(
            DetectionRequest(
                source_image=source_image_request,
```
```python
# Crop the image to set the _pil attribute
logger.info(f"Received detection without crop_image_url: {detection}")
logger.info("Falling back to cropping the source image...")
source_image = SourceImage(
```
Some methods to help us only open each source image once per request:

- Group detections by source image, either in `create_detections` or before it's called
- Keep the source images in memory for each group, rather than per detection
- Consider making a temporary path that is accessible across multiple requests, at least for the downloaded image (it will still have to be opened in memory each request)
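The first suggestion above (group detections by source image) could be sketched roughly as follows. The dict keys and the commented-out `open_image` call are assumed names for illustration, not the project's actual API.

```python
from collections import defaultdict

# Hypothetical sketch: bucket detections by their source image so each image
# is opened once per request, not once per detection.
def group_by_source_image(detections):
    groups = defaultdict(list)
    for det in detections:
        groups[det["source_image_id"]].append(det)
    return groups


detections = [
    {"source_image_id": "a", "bbox": (0, 0, 10, 10)},
    {"source_image_id": "a", "bbox": (5, 5, 20, 20)},
    {"source_image_id": "b", "bbox": (1, 1, 2, 2)},
]
groups = group_by_source_image(detections)

for image_id, dets in groups.items():
    # image = open_image(image_id)  # one open per image, not per detection
    crops = [det["bbox"] for det in dets]
```

With three detections across two images, the loop opens two images instead of three.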
Now the source image should only be opened once here:

antenna/processing_services/example/api/api.py, lines 100 to 101 in e7e579e:

```python
for img in source_images:
    img.open(raise_exception=True)
```

Or if a cropped image is provided, then that's also opened once:

```python
cropped_image.open(raise_exception=True)
```
Pull Request Overview
This PR adds support for re-processing detections by allowing pipelines to skip the localizer stage when existing detections are provided, while only running the classifier on previously detected regions.
- Added ability to pass existing detections to pipeline requests to enable re-processing
- Updated pipeline logic to conditionally skip detection generation when existing detections are available
- Added new pipeline variants that demonstrate detection re-use with different classifiers
Reviewed Changes
Copilot reviewed 16 out of 20 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| processing_services/minimal/requirements.txt | Pin package versions to specific releases |
| processing_services/example/requirements.txt | Pin package versions and add scipy dependency |
| processing_services/minimal/api/utils.py | Add utility functions for URL and base64 validation |
| processing_services/example/api/utils.py | Add utility functions for URL and base64 validation |
| processing_services/minimal/api/schemas.py | Add DetectionRequest schema and reorganize class definitions |
| processing_services/example/api/schemas.py | Add DetectionRequest schema and new pipeline choices |
| processing_services/minimal/api/pipelines.py | Refactor pipelines to support existing detections and add new pipeline variant |
| processing_services/example/api/pipelines.py | Add new pipeline variants and update existing ones to support detection re-processing |
| processing_services/minimal/api/api.py | Add detection creation logic and update process endpoint |
| processing_services/example/api/api.py | Add detection creation logic and update process endpoint |
| processing_services/example/api/algorithms.py | Add new classifier algorithms and remove duplicate image opening |
| processing_services/README.md | Update documentation with new pipeline examples |
| ami/ml/tests.py | Add test for pipeline reprocessing functionality |
| ami/ml/schemas.py | Add DetectionRequest schema and reorganize definitions |
| ami/ml/models/pipeline.py | Update pipeline processing to include existing detections |
| ami/main/models.py | Add get_bbox method to Detection model |
```python
logger.info("[1/2] No existing detections, generating detections...")
detections: list[Detection] = make_constant_detection(self.source_images)

logger.info("[2/2] Running the classifier...")
```
[nitpick] The step numbering [1/2] is hardcoded and could become inconsistent if pipeline steps change. Consider using dynamic step counting or removing the step numbers.

Suggested change (replacing `logger.info("[2/2] Running the classifier...")`):

```python
logger.info("Skipping the localizer, use existing detections...")
detections: list[Detection] = self.existing_detections
else:
    logger.info("No existing detections, generating detections...")
    detections: list[Detection] = make_constant_detection(self.source_images)
logger.info("Running the classifier...")
```
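The dynamic step counting mentioned in the nitpick could look roughly like the sketch below: build the step list first and derive the `[i/n]` prefixes from it, so adding or skipping a stage can never leave a stale hardcoded count. The `run_steps` helper and step names are illustrative, not the project's actual code.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_steps(steps):
    """Run each (name, fn) step, logging a derived [i/n] prefix."""
    total = len(steps)
    results = []
    for i, (name, fn) in enumerate(steps, start=1):
        logger.info(f"[{i}/{total}] {name}...")
        results.append(fn())
    return results


# Hypothetical two-stage pipeline; the count adapts if a stage is skipped
steps = [
    ("Generating detections", lambda: "detections"),
    ("Running the classifier", lambda: "classifications"),
]
results = run_steps(steps)
```

If the localizer is skipped, the step list simply has one entry and the log reads `[1/1]` with no manual renumbering.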
```python
logger.info("[1/2] No existing detections, generating detections...")
detections: list[Detection] = make_constant_detection(self.source_images)

logger.info("[2/2] Running the classifier...")
```
[nitpick] The step numbering [2/2] is hardcoded and could become inconsistent if pipeline steps change. Consider using dynamic step counting or removing the step numbers.

Suggested change (replacing `logger.info("[2/2] Running the classifier...")`):

```python
logger.info("Skipping the localizer, use existing detections...")
detections: list[Detection] = self.existing_detections
else:
    logger.info("No existing detections, generating detections...")
    detections: list[Detection] = make_constant_detection(self.source_images)
logger.info("Running the classifier...")
```
```python
    self.stages[0], self.source_images, self.batch_sizes[0], intermediate=True
)
detections_with_candidate_labels: list[Detection] = []
if self.existing_detections:
```
The conditional logic for checking existing detections is duplicated across multiple pipeline classes. Consider extracting this into a shared base method to reduce code duplication.
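A minimal sketch of the shared base method Copilot suggests, assuming the class and method names below (they are illustrative, not the project's actual code): the "reuse or generate" branch lives once in the base class, and subclasses only supply the generation step.

```python
# Hypothetical sketch: hoist the "use existing detections or generate new
# ones" branch into a shared base class. Names are illustrative.
class BasePipeline:
    def __init__(self, source_images, existing_detections=None):
        self.source_images = source_images
        self.existing_detections = existing_detections or []

    def get_detections(self):
        """Shared branch: reuse existing detections, else generate them."""
        if self.existing_detections:
            return self.existing_detections
        return self.generate_detections()

    def generate_detections(self):
        raise NotImplementedError


class ConstantPipeline(BasePipeline):
    def generate_detections(self):
        return [f"detection-for-{img}" for img in self.source_images]


fresh = ConstantPipeline(["img-1"]).get_detections()
reused = ConstantPipeline(["img-1"], existing_detections=["existing"]).get_detections()
```

Each pipeline variant then overrides only `generate_detections`, and the skip-the-localizer behavior stays consistent across all of them.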
```python
existing_detection = Detection.objects.filter(
    source_image=source_image,
    detection_algorithm=detection_algo,
    bbox=serialized_bbox,
```
The query for existing detections no longer filters by detection_algorithm, which could cause incorrect detection matching when the same bbox exists from different algorithms.
This is intentional: we want to classify detections from any previous detection algorithm, unless the user disables it for the pipeline run.
Summary
When a user processes an empty source image with a pipeline, it will produce detections (potentially with classifications). Any time after that, the detector should never be re-run; only the classifier runs.
The user can choose to ignore existing detections and process an image from scratch by setting `reprocess_existing_detections` to `False` in their `ProjectPipelineConfig` for that run or all runs.

By default, this feature is turned off on live using a feature flag (`reprocess_existing_detections` in `Project.feature_flags`).

This will need to be implemented in the default processing service before any effect is seen: RolnickLab/ami-data-companion#85
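How the project-level feature flag and the per-run config setting described above combine could be sketched as below. The key names match the description; the `should_reprocess` function itself is illustrative, not the actual implementation.

```python
# Hypothetical sketch of combining the feature flag with the per-run setting.
def should_reprocess(project_feature_flags: dict, run_config: dict) -> bool:
    # The feature must be enabled project-wide via the feature flag...
    if not project_feature_flags.get("reprocess_existing_detections", False):
        return False
    # ...and not disabled for this particular run (defaults to on)
    return run_config.get("reprocess_existing_detections", True)


# Flag off on live (the default): never reprocess
flag_off = should_reprocess({}, {})
# Flag on, but the user opted out for this run
opted_out = should_reprocess(
    {"reprocess_existing_detections": True},
    {"reprocess_existing_detections": False},
)
# Flag on, default run config: reprocess existing detections
default_on = should_reprocess({"reprocess_existing_detections": True}, {})
```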
Related Issues
Closes #706
Related to #747 and #798
Screenshots
Reprocessing images means the same detections are used; classifications are made on the cropped image.
Example reprocessing job logs:
Checklist