Feature/celery beat watchdog #4534
PR Summary
This PR implements a watchdog mechanism for monitoring and auto-restarting the Celery beat scheduler when it becomes unresponsive, using Redis-based heartbeat tracking.
- Critical bug: monitor-celery-beat task in beat_schedule.py incorrectly uses MONITOR_PROCESS_MEMORY instead of a dedicated monitoring task type
- Unreachable log message in supervisord_watchdog.py due to infinite loop in main function
- Missing error handling for subprocess.call in supervisord_watchdog.py when restarting processes
- Typo in log message ("succeded") in supervisord_watchdog.py
- Celery version upgrade from beta (5.5.0b4) to stable (5.5.1) improves reliability
8 file(s) reviewed, 6 comment(s)
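For readers who want the shape of the mechanism the summary describes, here is a minimal sketch of a heartbeat watchdog loop. It is not the PR's code. The names `heartbeat_is_stale`, `watch`, `read_heartbeat`, and `restart` are hypothetical, and the `MAX_AGE_SECONDS` value is an assumed placeholder. In a real deployment `read_heartbeat` would presumably fetch a timestamp from Redis; here it is an injected callable so the loop can run without a Redis server. The loop is bounded by an optional `max_iterations` and catches `KeyboardInterrupt`, so code after the loop stays reachable.

```python
import time
from typing import Callable, Optional

# Assumed threshold for illustration; the PR's actual value is not shown.
MAX_AGE_SECONDS = 600.0


def heartbeat_is_stale(
    last_beat: Optional[float], now: float, max_age: float = MAX_AGE_SECONDS
) -> bool:
    """Return True if there is no heartbeat or it is older than max_age seconds."""
    if last_beat is None:
        return True
    return (now - last_beat) > max_age


def watch(
    read_heartbeat: Callable[[], Optional[float]],
    restart: Callable[[], None],
    poll_seconds: float = 60.0,
    max_iterations: Optional[int] = None,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the heartbeat and call restart() each time it is stale.

    Returns the number of restarts triggered. With max_iterations=None the
    loop runs until interrupted.
    """
    restarts = 0
    iterations = 0
    try:
        while max_iterations is None or iterations < max_iterations:
            if heartbeat_is_stale(read_heartbeat(), clock()):
                restart()
                restarts += 1
            iterations += 1
            sleep(poll_seconds)
    except KeyboardInterrupt:
        # Fall through so the caller can log a shutdown message.
        pass
    return restarts
```

Injecting `clock` and `sleep` keeps the staleness logic testable without waiting in real time. A production loop would likely also give the restarted process time to write a fresh heartbeat before checking again.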
    f"elapsed_threshold={MAX_AGE_SECONDS}"
)

subprocess.call(["supervisorctl", "-c", conf, "restart", program])
logic: subprocess.call return value should be checked for errors
- subprocess.call(["supervisorctl", "-c", conf, "restart", program])
+ result = subprocess.call(["supervisorctl", "-c", conf, "restart", program])
+ if result != 0:
+     logger.error(f"Failed to restart {program} (exit code {result})")
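A slightly fuller sketch of the same idea, offered as one possible shape rather than the PR's implementation: wrap the restart in a helper that checks the exit code and also handles the case where `supervisorctl` cannot be launched at all (`subprocess.call` raises an `OSError` subclass such as `FileNotFoundError` in that case). The helper name `restart_program` and the injectable `run` parameter are hypothetical and exist only to make the example testable.

```python
import logging
import subprocess
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def restart_program(
    conf: str,
    program: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Restart a supervisord-managed program; return True on success."""
    cmd = ["supervisorctl", "-c", conf, "restart", program]
    try:
        result = run(cmd)
    except OSError as exc:
        # Raised when the executable is missing or cannot be started.
        logger.error("Could not run supervisorctl: %s", exc)
        return False
    if result != 0:
        logger.error("Failed to restart %s (exit code %s)", program, result)
        return False
    logger.info("Restarted %s", program)
    return True
```

Returning a boolean lets the caller decide whether to retry or escalate instead of silently continuing after a failed restart.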
* upgrade celery to release version
* make the watchdog script more reusable
* use constant
* code review
* catch interrupt
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Description
Fixes https://linear.app/danswer/issue/DAN-1820/watchdog-on-celery-beat-in-supervisord
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.