Description
Current behaviour
When a workspace contains many output files, the reana-workflow-controller pod starts to use too much CPU and memory, and the whole system may "freeze". This was observed for a workflow that produced over 250k files, when the researcher clicked on the "Workspace" tab of the REANA web interface.
(When this happened, the REANA web interface was slow to serve other users as well. The solution was to delete the workflow controller pod that was "stuck" in calculating the workspace listing response.)
Expected behaviour
We should improve the file serving behaviour for workspaces containing a large number of files so that the workflow controller does not "freeze". This could be achieved in several ways:
- Simply refuse to serve the full file listing if there are more than N files in the workspace, and serve only N files. (N should be configurable by the REANA cluster admins, because some deployments have worker nodes with a lot of memory and others with less. For example, N could be set to 10k.)
- Allow filtering, both from the CLI and from the Web UI, and serve at most N files after the filter is applied. For example:
  reana-client ls -w myanalysis --filter name=mydirectory
  If we do this, then in theory the researcher could still retrieve all files by issuing many requests, each with the desired filter.
- Improve pagination so that only part of the workspace listing is returned per request, i.e. process at most N files using some pre-defined sorting order (e.g. creation timestamps), and then move on to the next N, so that all the files do not have to be processed in advance. This could have a disadvantage if the workspace is modified manually afterwards, since the pagination could be "lost".
Any of these solutions would prevent "freezing" of the workflow controller component.
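To make the options more concrete, here is a rough sketch of how a capped, filtered and paged listing could look. It is not taken from the reana-workflow-controller code base: the function names `iter_workspace_files` and `list_workspace`, the parameters `max_files`, `name_filter` and `offset`, and the glob-style filter semantics are all assumptions made for illustration. The sketch walks the workspace lazily with `os.scandir`, so memory use is bounded by N rather than by the total number of files.

```python
import fnmatch
import itertools
import os
from typing import Iterator, List, Optional, Tuple


def iter_workspace_files(root: str) -> Iterator[str]:
    """Lazily yield file paths relative to root (hypothetical helper)."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield os.path.relpath(entry.path, root)


def list_workspace(
    root: str,
    max_files: int = 10_000,  # the admin-configurable "N" (assumed default)
    name_filter: Optional[str] = None,  # glob pattern on the relative path (assumed semantics)
    offset: int = 0,  # number of matching files to skip, for paging
) -> Tuple[List[str], bool]:
    """Return at most max_files paths and a flag telling whether more exist."""
    paths = iter_workspace_files(root)
    if name_filter is not None:
        paths = (p for p in paths if fnmatch.fnmatchcase(p, name_filter))
    # Read one entry beyond the limit only to detect truncation.
    window = list(itertools.islice(paths, offset, offset + max_files + 1))
    truncated = len(window) > max_files
    return window[:max_files], truncated
```

For example, `list_workspace("/path/to/workspace", max_files=100, name_filter="mydirectory/*")` would return at most 100 paths under `mydirectory`, plus a flag that the client could use to show a "listing truncated" hint. Note the limitations: entries come back in filesystem order rather than sorted by creation timestamp (sorting would require reading every entry first), an offset still has to iterate over the skipped entries, and pages can shift if the workspace is modified between requests.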
Notes
This issue is related to #641