
Argument list too long #245

@tjgalvin


Hi all,

A slightly strange issue popped up that has left me scratching my head.

I was processing a collection of measurement sets in a pipeline. An early stage iterates over the rows of the data table of a single measurement set, applies a rotation correction to the visibilities, and writes them back out. This is done in chunks. The code is available here: https://github.com/AlecThomson/FixMS/blob/main/fixms/fix_ms_corrs.py#L264
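For readers unfamiliar with the pattern, the chunked read, correct, write loop can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the FixMS implementation: it assumes a table object exposing python-casacore style `nrows()`, `getcol()` and `putcol()` methods, and `correct` is a hypothetical placeholder for the rotation correction. `InMemoryTable` is likewise hypothetical, included only so the loop can run without a real measurement set.

```python
def apply_in_chunks(tab, correct, column="DATA", chunk_size=10_000):
    """Read `column` in row chunks, apply `correct`, and write each chunk back.

    Assumes `tab` has python-casacore style nrows(), getcol() and putcol().
    `correct` is a hypothetical stand-in for the rotation correction.
    Returns the number of rows processed.
    """
    nrows = tab.nrows()
    start_row = 0
    while start_row < nrows:
        nrow = min(chunk_size, nrows - start_row)
        data_chunk = tab.getcol(column, startrow=start_row, nrow=nrow)
        data_chunk_cor = correct(data_chunk)
        tab.putcol(column, data_chunk_cor, startrow=start_row, nrow=nrow)
        start_row += len(data_chunk_cor)
    return start_row


class InMemoryTable:
    """Hypothetical list-backed stand-in for a casacore table (testing only)."""

    def __init__(self, rows):
        self._cols = {"DATA": list(rows)}

    def nrows(self):
        return len(self._cols["DATA"])

    def getcol(self, name, startrow=0, nrow=-1):
        col = self._cols[name]
        stop = len(col) if nrow < 0 else startrow + nrow
        return col[startrow:stop]

    def putcol(self, name, value, startrow=0, nrow=-1):
        n = len(value) if nrow < 0 else nrow
        self._cols[name][startrow:startrow + n] = value
```

Against a real measurement set, `tab` would instead be a writable python-casacore table opened in a `with` block; on exit that context manager calls `close()`, which is the call that raised the second error in the traceback below.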

Recently I was running a hefty series of jobs and stumbled on this error:

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 330, in fix_ms_corrs
    tab.flush()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 557, in flush
    self._flush(recursive)
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/engine.py", line 1719, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 292, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 316, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint/flint/ms.py", line 473, in preprocess_askap_ms
    fix_ms_corrs(
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 331, in fix_ms_corrs
    start_row += len(data_chunk_cor)
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 406, in __exit__
    self.close()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 574, in close
    self._close()
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long
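The two tracebacks appear to be one underlying failure reported twice. The first `RuntimeError` comes from the explicit `tab.flush()`; because the table was opened in a `with` block, Python then runs `table.__exit__`, which calls `close()`, and `close()` apparently retries the pending write and fails in the same way. Python attaches the first exception as the `__context__` of the second, which produces the "During handling of the above exception" line. The toy `FailingTable` class below is hypothetical and only mimics that control flow; it does not touch casacore.

```python
class FailingTable:
    """Hypothetical stand-in whose flush() and close() both fail to write."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Mirrors the traceback: __exit__ always calls close().
        self.close()
        return False  # do not suppress the original exception

    def flush(self):
        raise RuntimeError("write error during flush")

    def close(self):
        raise RuntimeError("write error during close")


def run():
    """Trigger the flush failure inside a with block and return what escapes."""
    try:
        with FailingTable() as tab:
            tab.flush()
    except RuntimeError as err:
        return err
    return None
```

The exception that escapes is the one from `close()`, with the `flush()` error preserved as its `__context__`.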

I am unsure what to make of this. I have rerun my pipeline on a smaller dataset that included this measurement set and found no issue. The specific error, Argument list too long, reads as though there was some interaction with a shell when the buffers were flushed to disk, as if a large cp or rm command were being executed.
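One hedged way to read the message: "Argument list too long" is the C library's text for errno `E2BIG`, which is why it evokes a shell command with too many arguments (`execve` fails with `E2BIG` in that case). Assuming casacore builds its `FiledesIO::write` message from `strerror(errno)`, the text only says which errno value was set when the write was reported as failing. `E2BIG` is not among the errors documented for `write(2)` on Linux, so one possibility worth checking is that errno was left over from an earlier call rather than set by the write itself. The helper below, `errno_names_for_message`, is a hypothetical name; it simply maps a message back to its errno symbol using Python's standard library and makes no claim about why the value was set.

```python
import errno
import os


def errno_names_for_message(message):
    """Return errno symbols whose os.strerror() text equals `message`."""
    return [
        name
        for code, name in sorted(errno.errorcode.items())
        if os.strerror(code) == message
    ]


print(errno_names_for_message("Argument list too long"))
```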

Would you happen to have any insight into this, and into the underlying behavior of close and flush on a casacore table? Are there temporary files stored somewhere, say in /dev/shm or the current working directory, that get examined? I am at a total loss as to where else to look, and it is not clear to me whether this is a python-casacore issue, a casacore issue, or something else entirely.
