
[Bug][tasks] Duplicate subtasks running #8379

Open
2 of 3 tasks
kostas-petrakis opened this issue Apr 10, 2025 · 4 comments
Labels
severity/p1 This bug affects functionality or significantly affect ux type/bug This issue is a bug

Comments

@kostas-petrakis
Contributor

kostas-petrakis commented Apr 10, 2025

Search before asking

  • I have searched the issues and found no similar issues.

What happened

I cancelled a transformation pipeline, but the pipeline remained hanging. I eventually killed the pod. After the restart, Apache DevLake resumed the pipeline (even though it had been cancelled), and I now see two tasks running.

What do you expect to happen

Pipeline is cancelled

How to reproduce

Not sure, as I have never experienced this issue before. I was running a pipeline and cancelled it because I needed to change the scope config.
Cancelling the pipeline left it in a stuck state, and I had to kill the pod because I cannot access the DB directly.
After the pod restarted, the pipeline resumed, but I now see two task IDs with exactly the same finished_records column updates.

Anything else

No response

Version

v1.0.2-beta4

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@kostas-petrakis kostas-petrakis added the type/bug This issue is a bug label Apr 10, 2025
dosubot bot commented Apr 10, 2025

The issue you're experiencing with duplicate tasks running after a pod restart might be related to how Apache DevLake handles task resumption and cancellation. When a pod is killed and restarted, the pipeline status is marked as interrupted. If the RESUME_PIPELINES configuration is set to true, the pipeline is set to resume, which might lead to the observed behavior of duplicate tasks if the cancellation process wasn't fully completed before the pod was killed [1][2].

To address this, you might want to check the RESUME_PIPELINES setting in your configuration. If it's set to true, consider setting it to false to prevent automatic resumption of tasks after a pod restart. Additionally, ensure that the task cancellation process is completed before killing the pod to avoid leaving tasks in an inconsistent state.

If this issue persists, it might be beneficial to investigate further into the task management and cancellation logic to identify any potential edge cases or race conditions that could lead to duplicate task entries.

@dosubot dosubot bot added the severity/p1 This bug affects functionality or significantly affect ux label Apr 10, 2025
@kostas-petrakis
Contributor Author

@dosu the resume is expected; the issue is that there are two entries in the _devlake_subtasks table for the same subtask.
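
For anyone trying to confirm the same symptom, a grouped query is one way to spot repeated subtask rows. The following is only a minimal sketch against an in-memory SQLite table: the column layout (id, task_id, name, finished_records) and the subtask names are assumptions made for illustration, not the actual DevLake schema, and the function name is hypothetical.

```python
import sqlite3


def find_duplicate_subtasks(conn):
    """Return (name, finished_records, copies) for subtask rows that repeat."""
    query = """
        SELECT name, finished_records, COUNT(*) AS copies
        FROM _devlake_subtasks
        GROUP BY name, finished_records
        HAVING COUNT(*) > 1
        ORDER BY name
    """
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the real table; the columns are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _devlake_subtasks ("
    "id INTEGER PRIMARY KEY, task_id INTEGER, name TEXT, finished_records INTEGER)"
)
conn.executemany(
    "INSERT INTO _devlake_subtasks (task_id, name, finished_records) VALUES (?, ?, ?)",
    [
        (101, "hypotheticalExtract", 500),  # original task
        (102, "hypotheticalExtract", 500),  # same subtask under a second task ID
        (101, "hypotheticalConvert", 0),    # appears only once
    ],
)

duplicates = find_duplicate_subtasks(conn)
print(duplicates)
```

Grouping on both the subtask name and finished_records mirrors the symptom described here (two task IDs with identical progress); against a real database the column names would need to be checked first.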

dosubot bot commented Apr 10, 2025

Apache DevLake ensures the uniqueness of subtask entries in the _devlake_subtasks table by using a composite primary key consisting of the Plugin, Subtask, and Params fields. This setup should prevent duplicate entries for the same subtask [1].

If you're seeing duplicate entries, it might indicate a race condition or an issue in the task management logic that wasn't fully addressed. Unfortunately, I couldn't find specific information on known issues or race conditions that could lead to this situation. It might be helpful to review the task management and resumption logic to ensure that the uniqueness constraints are being properly enforced during pipeline resumption.
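
To illustrate what a composite primary key would and would not prevent, here is a small sketch using an in-memory SQLite table. It takes the (Plugin, Subtask, Params) key from the comment above at face value as an assumption; the table name, column names, and values are hypothetical and this is not DevLake's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the key columns follow the claim in the comment above.
conn.execute(
    """
    CREATE TABLE subtask_demo (
        plugin TEXT NOT NULL,
        subtask TEXT NOT NULL,
        params TEXT NOT NULL,
        finished_records INTEGER,
        PRIMARY KEY (plugin, subtask, params)
    )
    """
)


def try_insert(conn, row):
    """Insert a row; return False if the composite key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO subtask_demo VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(conn, ("demo_plugin", "demoSubtask", '{"scope": 1}', 500))
repeat = try_insert(conn, ("demo_plugin", "demoSubtask", '{"scope": 1}', 500))
other_params = try_insert(conn, ("demo_plugin", "demoSubtask", '{"scope": 2}', 500))
row_count = conn.execute("SELECT COUNT(*) FROM subtask_demo").fetchone()[0]

print(first, repeat, other_params, row_count)
```

Under these assumptions the exact repeat is rejected, while a row that differs in any key column is accepted. So if two rows for the same subtask do coexist, either the key columns differ between them or the table is not keyed this way.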

@kostas-petrakis kostas-petrakis changed the title [Bug][tasks] Duplicate task running [Bug][tasks] Duplicate subtasks running Apr 10, 2025
@kostas-petrakis
Contributor Author

Adding some extra context: the issue has resolved itself, the pipelines finished, and Apache DevLake resumed as normal. Feel free to keep this open if you would like to test this corner case.
