workspace: ensure smooth performance of file listings for workspaces containing 250k files #644

@tiborsimko

Current behaviour

When a workspace contains many output files, the reana-workflow-controller pod starts to use too much CPU and memory, and the whole system can "freeze". This was observed when a researcher opened the "Workspace" tab of the REANA web interface for a workflow that had produced over 250k files.

(When this happened, the REANA web interface was slow to serve other users as well. The solution was to delete the workflow controller pod that was "stuck" in calculating the workspace listing response.)

Expected behaviour

We should improve the file serving behaviour for workspaces containing a large number of files so that the workflow controller does not "freeze". This could be achieved in several ways:

  1. Simply refuse to serve the full file listing if there are more than N files in the workspace, and serve only N files. (N should be configurable by the REANA cluster admins, because some deployments have worker nodes with a lot of memory and others have less. For example, N could be set to 10k.)
  2. Allow specifying filtering, such as reana-client ls -w myanalysis --filter name=mydirectory, both from CLI and from Web UI, and serve at most N files after the filtering. If we do this, then in theory the researcher could still retrieve all files by issuing several requests, each with the desired filtering.
  3. Improve pagination so that only parts of the workspace files are returned, i.e. process at most N files using some pre-defined sorting order (e.g. creation timestamps), and then move on to the next N, so that all the files do not have to be processed in advance. This could have a drawback in case the workspace is modified manually afterwards, since the pagination position could be "lost".
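
As a rough illustration of options 1 and 2, a capped and optionally filtered listing could be built on a lazy directory walk, so that at most N + 1 entries are ever collected. This is only a sketch under assumptions: `MAX_FILES`, `iter_workspace_files` and `list_workspace` are hypothetical names, not existing reana-workflow-controller code, and the glob-style `pattern` stands in for whatever filter syntax is eventually chosen.

```python
import fnmatch
import os
from itertools import islice

MAX_FILES = 10_000  # hypothetical admin-configurable cap (the "N" above)


def iter_workspace_files(workspace, pattern=None):
    """Lazily yield workspace-relative file paths, optionally filtered by a glob."""
    stack = [workspace]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Visit subdirectories later instead of recursing.
                    stack.append(entry.path)
                    continue
                rel = os.path.relpath(entry.path, workspace)
                if pattern is None or fnmatch.fnmatch(rel, pattern):
                    yield rel


def list_workspace(workspace, pattern=None, limit=MAX_FILES):
    """Return at most `limit` matching paths and a flag saying whether more exist."""
    # Pull one extra entry so truncation can be detected without a full walk.
    batch = list(islice(iter_workspace_files(workspace, pattern), limit + 1))
    return batch[:limit], len(batch) > limit
```

The returned flag would let the CLI or Web UI tell the researcher that the listing was cut off and that a narrower filter is needed.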

Any of these solutions would prevent "freezing" of the workflow controller component.
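
For the pagination option, one possible shape is a cursor-based listing: the client passes the last path it received, and the server returns the next page in a stable name-sorted order. The sketch below is an assumption-laden illustration, not the project's implementation: `iter_sorted` and `list_page` are hypothetical names, and it sorts by name rather than by creation timestamp, because a name order can be produced one directory at a time.

```python
import os
from itertools import islice


def iter_sorted(root, rel=()):
    """Yield file paths as tuples of components, in a stable name-sorted order."""
    with os.scandir(os.path.join(root, *rel)) as it:
        # Only one directory's entries are held in memory at a time.
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        parts = rel + (entry.name,)
        if entry.is_dir(follow_symlinks=False):
            yield from iter_sorted(root, parts)
        else:
            yield parts


def list_page(root, after=None, size=100):
    """Return up to `size` paths sorted after cursor `after`, plus the next cursor."""
    cursor = tuple(after.split("/")) if after else ()
    remaining = (p for p in iter_sorted(root) if p > cursor)
    page = ["/".join(p) for p in islice(remaining, size)]
    # A full page may have more entries behind it; a short page is the last one.
    next_cursor = page[-1] if len(page) == size else None
    return page, next_cursor
```

Because the cursor is a path rather than a numeric offset, files added or removed between requests shift the results less than they would with offset-based paging. The walk still visits directory entries that precede the cursor, so this limits memory and response size rather than total I/O.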

Notes

This issue is related to #641
