Rerun workflows

## What

Many workflow engines have a rerun feature that allows the user to restart the workflow at a safe stage after it was stopped ungracefully.

## Why

Workflows may terminally end in many situations, including unrecoverable error in the infrastructure (e.g. batch processing system, storage). Some of these conditions may be covered by automatic restarts, like some infrastructure problems, others, however, are best be handled by cancellation or automatic termination of workflows, such as workflow bugs, inability to deal with unexpected inputs, etc. One valid situation that we repeatedly observe is a workflow failing because some odd input data requires 10 times the memory or takes 10 times longer than a "normal" sample (e.g. normal cancer sample vs. chromothrypsis; some optimization algorithms with occasionally bad convergence characteristics)

In principle, workflow runs may be started from the beginning, and indeed some workflow engines do even not provide a restart feature. (Restarting from the beginning, i.e. with a new RunRequest, in any case is the safer solution if one does not trust an engine's rerun feature.)

However, restarting workflows at a later stage is still interesting, because it may safe resources (CPU, IO, energy/CO2), and time -- which is important in some application areas, like the timely processing of patient data.

## How

The request would just start exactly the same workflow again, i.e., use the same parameters. There may also be use-cases where a parameter may differ for a rerun. For instance, it may be necessary to increase the memory and time resources for an odd sample.

The rerun may be applied to a workflow in any terminal-state, such as CANCELED, EXECUTOR_ERROR, SYSTEM_ERROR, but obviously not in a running state, like RUNNING, QUEUED, etc.

The correct rerunning itself should be left to the workflow engine. We should not care whether the engine or even the workflow allow for this rerunning (that's outside the responsibility of a WES, I think).

This could be implemented as a separate RerunRequest route (e.g. with the run-id in the route). The whole feature should be optional, because it does not make sense to require the implementation of a RerunRequest API endpoint for a workflow engine that cannot do reruns.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rerun workflows #203

What

Why

How

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Rerun workflows #203

Description

What

Why

How

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions