
fix: clean up worker threads in the event of an error #2783


Open: wants to merge 1 commit into base: 07-18-fix_make_query_as_macros_retry


MasterPtato (Contributor)


vercel bot commented Jul 21, 2025

The latest updates on your projects.

Name        Status    Updated (UTC)
rivet-site  ✅ Ready  Jul 21, 2025 5:48pm


claude bot commented Jul 21, 2025

Claude finished @MasterPtato's task.


PR Review: Clean up worker threads in the event of an error

Review Checklist

  • Read and analyze the changed file
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Consider test coverage needs
  • Provide comprehensive feedback

Changes Summary

This PR addresses proper cleanup of background tasks when the worker encounters errors. The main changes are:

  1. Added cleanup of the background tasks (gc_handle.abort(), metrics_handle.abort()) when the wake subscription ends (wake_sub.next() returns None)
  2. Added error handling around the self.tick() call, with the same cleanup when it returns an error
  3. Reordered tokio::select! arms for clearer flow
  4. Added clarifying comment about background task cleanup

✅ Positive Aspects

Excellent Bug Fix: This addresses a real resource leak: the background GC and metrics tasks kept running after the main worker loop exited due to an error, because dropping a tokio JoinHandle detaches its task rather than cancelling it.

Consistent Error Handling: The fix applies the same cleanup pattern across all error exit paths (wake_sub.next() == None, self.tick() errors), ensuring consistent behavior.

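Clear Code Organization: Reordering the tokio::select! arms puts the subscription handling before background task monitoring, which reads more naturally. Note that tokio::select! checks ready branches in random order unless biased; is specified, so the new order improves readability rather than changing which branch wins.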

Good Documentation: The added comment at packages/common/chirp-workflow/core/src/worker.rs:195 clearly explains why rivet_runtime::shutdown() stops background tasks.

🔍 Code Quality Analysis

Rust Best Practices: ✅

  • Proper use of JoinHandle::abort() for task cancellation
  • Consistent error propagation with ? operator and explicit Err() returns
  • Appropriate use of tokio::select! for concurrent operations

Error Handling: ✅

  • All error paths now properly clean up resources
  • Error messages are preserved and bubbled up correctly
  • No silent failures or resource leaks

🐛 Potential Issues

Minor (duplication): The cleanup code (gc_handle.abort(); metrics_handle.abort()) now appears in two places. Consider extracting it into a helper:

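use tokio::task::JoinHandle;

impl Worker {
    // Aborts both background tasks; has no effect on a task that already finished.
    // No &self needed: call as Self::abort_background_tasks(&gc_handle, &metrics_handle).
    fn abort_background_tasks(gc_handle: &JoinHandle<()>, metrics_handle: &JoinHandle<()>) {
        gc_handle.abort();
        metrics_handle.abort();
    }
}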

Edge Case: If gc_handle.abort() panicked, metrics_handle.abort() would never run. In practice this is not a concern: tokio's JoinHandle::abort() is infallible (it returns ()), only requests cancellation, and has no effect on a task that has already completed.
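A drop guard is an alternative to the helper method (a different technique from the one the review suggests): cleanup placed in a guard's Drop impl runs on every exit path, including early returns via ?, so it cannot be forgotten on a newly added path. The sketch below is std-only; OnDrop, tick, and run_worker are hypothetical names, and a Cell counter stands in for the abort() calls so the effect is observable.

```rust
use std::cell::Cell;

/// Drop guard (hypothetical helper): runs `cleanup` when it goes out of scope,
/// so it fires on every exit path, including early returns via `?`.
struct OnDrop<F: FnMut()> {
    cleanup: F,
}

impl<F: FnMut()> Drop for OnDrop<F> {
    fn drop(&mut self) {
        (self.cleanup)();
    }
}

/// Hypothetical stand-in for `self.tick()`.
fn tick(fail: bool) -> Result<(), String> {
    if fail {
        Err("tick failed".to_string())
    } else {
        Ok(())
    }
}

/// Hypothetical worker body; `aborts` counts how many times cleanup ran.
fn run_worker(aborts: &Cell<u32>, fail_on: u32) -> Result<(), String> {
    // Bind to `_guard`, not `_`: `let _ = ...` would drop the guard immediately.
    // In the real worker this closure would abort the GC and metrics tasks.
    let _guard = OnDrop { cleanup: || aborts.set(aborts.get() + 1) };
    for i in 0..5 {
        tick(i == fail_on)?;
    }
    Ok(())
}
```

In the real worker the guard would need to own the cancellation handles rather than borrow the JoinHandles that the select! arms poll mutably; tokio's JoinHandle::abort_handle(), which returns an owned AbortHandle, is one option to check against the tokio version in use. The guard also fires on a successful return, which only matters if the loop has a non-error exit.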

🚀 Performance Considerations

Resource Cleanup: ✅ Improves performance by preventing background tasks from continuing unnecessarily after errors.

No Performance Regression: The changes add minimal overhead (two function calls) only in error paths.

🔒 Security Assessment

No Security Issues: This is defensive cleanup code that improves resource management. No security implications.

🧪 Test Coverage Recommendations

Consider adding tests for:

  1. Subscription failure cleanup: Verify that when wake_sub.next() returns None, both background tasks are aborted
  2. Tick error cleanup: Verify that when self.tick() returns an error, background tasks are aborted
  3. Task handle state: Confirm that after cleanup, awaiting each handle returns a JoinError for which is_cancelled() is true

Example test structure:

#[tokio::test]
async fn test_background_task_cleanup_on_tick_error() {
    // Setup worker with mocked dependencies that will cause tick() to fail
    // Await gc_handle and metrics_handle and assert each returns a JoinError with
    // is_cancelled() == true (is_finished() may still be false right after abort())
}

📋 Overall Assessment

Strong Approve

This is a well-implemented bug fix that addresses a real resource management issue. The changes are minimal, focused, and follow Rust best practices. The only suggestion is the minor code deduplication opportunity mentioned above.

Files Changed: packages/common/chirp-workflow/core/src/worker.rs:86-116


Contributor Author

MasterPtato commented Jul 21, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite.


Deploying rivet with Cloudflare Pages

Latest commit: d29d739
Status: 🚫 Build failed.
