
Conversation

@carlosgjs (Collaborator) commented Aug 30, 2025

Summary

Avoid adding JobLogHandlers multiple times when the Job.logger property is accessed

List of Changes

  • Added a check in ami/jobs/models.py so a JobLogHandler is only added once
  • Updated the README with notes on running antenna for the first time
  • Added configuration for attaching the VS Code debugger to the celeryworker

Related Issues

Fixes #782

Detailed Description

I was watching the queue worker (celery) logs and noticed what appeared to be many repeated entries, e.g.:

celeryworker-1  | [2025-08-26 18:00:09,068: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,075: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,082: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,089: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,095: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,102: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,109: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,115: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,122: INFO/ForkPoolWorker-7] Processing image batch 76 of 367
celeryworker-1  | [2025-08-26 18:00:09,129: INFO/ForkPoolWorker-7] Processing image batch 76 of 367

Furthermore, the number of entries increased by 7 on each iteration:

(base) ➜  antenna git:(main) ✗ cat worker.log| grep "Processing image batch 76 of 367" | sort | uniq | wc
     530    5830   53000
(base) ➜  antenna git:(main) ✗ cat worker.log| grep "Processing image batch 77 of 367" | sort | uniq | wc
     537    5907   53700
(base) ➜  antenna git:(main) ✗ cat worker.log| grep "Processing image batch 78 of 367" | sort | uniq | wc
     544    5984   54400
(base) ➜  antenna git:(main) ✗ cat worker.log| grep "Processing image batch 79 of 367" | sort | uniq | wc
     551    6061   55100
(base) ➜  antenna git:(main) ✗ cat worker.log| grep "Processing image batch 80 of 367" | sort | uniq | wc
     558    6138   55800

The problem is that the .logger property on the Job adds a new JobLogHandler every time it is read. Since this property is accessed ~7 times per iteration of the job logic, 7 handlers are added per iteration. Every subsequent log call then fires all of the accumulated handlers, so by iteration n each message is written roughly 7n times and the total number of writes grows quadratically, on the order of 7n². Since these JobLogHandler objects write to the DB, the DB writes grow at the same quadratic rate; at some point the job slows to a crawl, and the bigger the job, the slower it gets.
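
To illustrate the mechanism, here is a minimal, self-contained sketch (Job and ListHandler below are stand-ins chosen for illustration, not the project's actual classes) showing how attaching a new handler on every property access makes each log message fan out to an ever-growing set of handlers:

import logging

class ListHandler(logging.Handler):
    # Stand-in for JobLogHandler: records every message it receives.
    # The real handler writes each record to the database.
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))

class Job:
    pk = 1

    @property
    def logger(self):
        _logger = logging.getLogger(f"sketch.job.{self.pk}")
        _logger.setLevel(logging.INFO)
        _logger.addHandler(ListHandler())  # a NEW handler on every access
        return _logger

job = Job()
for batch in range(1, 4):
    # The real job logic touches job.logger several times per iteration;
    # one access per iteration is enough to show the effect.
    job.logger.info("Processing image batch %s", batch)

handlers = logging.getLogger("sketch.job.1").handlers
total_writes = sum(len(h.records) for h in handlers)
print(len(handlers), total_writes)  # 3 handlers and 6 writes for only 3 log calls

With ~7 accesses per iteration and hundreds of batches, the same pattern produces the duplicated log lines and the quadratic growth in DB writes described above.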

How to Test the Changes

Run a processing job with 300+ images and verify that it completes without stalling.

Screenshots

A successful job processed 369 images in ~6 minutes. Prior to the change, the same job ran for over an hour and was not even halfway done.

[Screenshot of the completed job]

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

@carlosgjs requested a review from mihow on August 30, 2025 02:13

netlify bot commented Aug 30, 2025

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: 43aad7c
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/68b6008e8ce87c00082fd860

@carlosgjs changed the title from "fix: JobLogHandlers added multiple times" to "fix: Big processing jobs get stalled" on Aug 30, 2025
@carlosgjs changed the title from "fix: Big processing jobs get stalled" to "fix: ML processing that is causing jobs to fail for medium-large jobs (issue #782)" on Aug 30, 2025
@mihow requested a review from Copilot on September 1, 2025 18:22
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR fixes a performance issue where ML processing jobs were failing or running extremely slowly for medium-large jobs due to duplicate log handlers being added repeatedly. The fix prevents JobLogHandlers from being added multiple times to the same logger.

  • Added a check to prevent duplicate JobLogHandler instances from being added to job loggers
  • Updated README with Docker setup notes and troubleshooting tips
  • Added VS Code debugging configuration for the celeryworker service

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • ami/jobs/models.py: Added duplicate handler check in the logger property to prevent performance degradation
  • README.md: Enhanced setup documentation with Docker version requirements and troubleshooting notes
  • requirements/local.txt: Added a commented-out debugpy dependency for development debugging
  • docker-compose.yml: Added commented-out debugging configuration for the celeryworker service
  • .vscode/launch.json: Added a VS Code remote debugging configuration

Review thread on ami/jobs/models.py:

_logger = logging.getLogger(f"ami.jobs.{self.pk}")

# Only add JobLogHandler if not already present
if not any(isinstance(h, JobLogHandler) for h in _logger.handlers):
Collaborator comment:

I see the culprit! what a catch.
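
For context, here is a hedged, self-contained sketch of how the guarded property behaves once the check above is in place; the JobLogHandler stub and its constructor signature are stand-ins for illustration, not the project's actual implementation:

import logging

class JobLogHandler(logging.Handler):
    # Stand-in for the project's DB-backed handler; the real one
    # persists each record for the job.
    def __init__(self, job):
        super().__init__()
        self.job = job

    def emit(self, record):
        pass  # the real handler writes the record to the database

class Job:
    pk = 1

    @property
    def logger(self) -> logging.Logger:
        _logger = logging.getLogger(f"ami.jobs.{self.pk}")
        # Only attach a JobLogHandler if one is not already present, so
        # repeated property accesses no longer stack duplicate handlers
        # (and duplicate DB writes).
        if not any(isinstance(h, JobLogHandler) for h in _logger.handlers):
            _logger.addHandler(JobLogHandler(self))
        return _logger

job = Job()
for _ in range(10):
    job.logger  # repeated accesses, as in the job loop

assert len(logging.getLogger("ami.jobs.1").handlers) == 1  # only one handler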

@mihow (Collaborator) commented Sep 2, 2025

I have this branch deployed on staging now, running this job
https://app.preview.insectai.org/projects/1/jobs/1233

If all goes well I will push to live!

docker compose started for me on the first try with the new debugger dependency, using the following commands:

docker compose build celeryworker
docker compose up -d

I haven't tried setting a breakpoint in VS Code yet.

@mihow merged commit 47fed9a into RolnickLab:main on Sep 2, 2025. 6 checks passed.