Hi all,
A slightly strange issue has popped up that has left me scratching my head.
I was processing a collection of measurement sets in a pipeline. There is a stage early on that iterates over rows in the data table of a single measurement set, applies a rotation correction to the visibilities, and writes them back out. This is done in chunks. The code is available here: https://github.com/AlecThomson/FixMS/blob/main/fixms/fix_ms_corrs.py#L264
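For readers who do not want to open the link, the pattern is roughly the following. This is a minimal sketch and not the actual FixMS code: `chunk_ranges`, `update_column_in_chunks` and `correct` are hypothetical names used for illustration, and `tab` is assumed to behave like a `casacore.tables.table` opened with `readonly=False`.

```python
def chunk_ranges(nrows, chunk_size):
    """Yield (start_row, n_rows) pairs that together cover rows 0..nrows-1."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start_row = 0
    while start_row < nrows:
        n_rows = min(chunk_size, nrows - start_row)
        yield start_row, n_rows
        start_row += n_rows


def update_column_in_chunks(tab, column, correct, chunk_size):
    """Read, correct and write back one column, one chunk of rows at a time.

    `correct` is a hypothetical stand-in for the rotation correction.
    """
    for start_row, n_rows in chunk_ranges(tab.nrows(), chunk_size):
        data_chunk = tab.getcol(column, startrow=start_row, nrow=n_rows)
        data_chunk_cor = correct(data_chunk)
        tab.putcol(column, data_chunk_cor, startrow=start_row, nrow=n_rows)
        # Flush after every chunk; this is the call that raised in my case.
        tab.flush()
```

In the real code the table is opened in a `with` block, which is why the traceback below also shows the same error being raised again from `__exit__` and `close()`.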
Recently I was running a hefty series of jobs and stumbled on this error:
Encountered exception during execution:
Traceback (most recent call last):
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 330, in fix_ms_corrs
tab.flush()
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 557, in flush
self._flush(recursive)
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/engine.py", line 1719, in orchestrate_task_run
result = await call.aresult()
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 292, in aresult
return await asyncio.wrap_future(self.future)
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 316, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/scratch3/gal16b/packages/flint/flint/ms.py", line 473, in preprocess_askap_ms
fix_ms_corrs(
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 331, in fix_ms_corrs
start_row += len(data_chunk_cor)
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 406, in __exit__
self.close()
File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 574, in close
self._close()
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long
I am unsure what to make of this. I have rerun my pipeline on a smaller dataset that included this measurement set and found no issue. The specific error, "Argument list too long", reads as though there was some interaction with a shell when trying to flush the buffers to disk, as if a large cp or rm command were being executed.
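That said, the wording matches the standard message for the errno value E2BIG, so it may simply be how a failed low-level write call was reported rather than evidence that a shell was involved. This is an assumption on my part, not something I have confirmed in the casacore source. A quick way to see the mapping on the local system, using only the Python standard library:

```python
import errno
import os

# Look up the symbolic name and the system's message text for E2BIG.
code = errno.E2BIG
name = errno.errorcode[code]
message = os.strerror(code)
print(code, name, message)
```

On the Linux machines I have access to, the message printed is the same "Argument list too long" string that appears in the traceback.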
Would you happen to have any insight into this, and into the underlying behavior of close and flush on a casacore table? Is there a set of temporary files, stored in /dev/shm or in the current working directory for example, that is examined during these calls? I am at a total loss as to where else to look, and it is not clear to me whether this is a python-casacore issue, a casacore issue, or something else entirely.