Model Execution Error When Submitting the Case #6462
sugerWu-2822
started this conversation in
E3SM model help
Dear all,
I am running an E3SM-FATES single-point simulation on Perlmutter. When I submitted the case recently, I got the following error. Strangely, none of my previous simulations reported errors; they all ran and produced output.
ERROR: RUN FAIL: Command 'srun --label -n 1 -N 1 -c 2 --cpu_bind=cores -m plane=128 /pscratch/sd/x/myuser/e3sm_scratch/pm-cpu/Spin_up_1x1_mysite.IELMFATES.ELM_USRDAT.001.2024-07-16/bld/e3sm.exe >> e3sm.log.$LID 2>&1 ' failed
See log file for details: /pscratch/sd/x/myuser/e3sm_scratch/pm-cpu/Spin_up_1x1_mysite.IELMFATES.ELM_USRDAT.001.2024-07-16/run/e3sm.log.28453129.240722-232021
create_run1_1x1tanguroMTBR_fates_spinup.txt
e3sm.log.28453129.txt
Searching for the keyword ERROR in the log file above gives the following main errors:
PE 0: MPICH_ABORT_ON_ERROR = 0
PE 0: MPICH_MPIIO_ABORT_ON_RW_ERROR= disable
ERROR: Unknown error submitted to shr_abort_abort
MPICH ERROR [Rank 0] [job id 28453133.0] [Mon Jul 22 23:20:42 2024] [nid004682] - Abort(1001) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 0
srun: error: nid004682: task 0: Exited with exit code 233
srun: Terminating StepId=28453133.0
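For anyone who wants to repeat the keyword search on their own log, here is a minimal sketch in Python. The function name `find_error_lines` is my own (hypothetical, not part of E3SM or CIME), and the sample lines are copied from the excerpt above. The match is case-sensitive, so the lowercase `srun: error:` line is not reported.

```python
from typing import Iterable, List, Tuple


def find_error_lines(lines: Iterable[str], keyword: str = "ERROR") -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every line containing keyword.

    Hypothetical helper for illustration only; the match is case-sensitive.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Sample lines copied from the log excerpt in this post.
sample_log = [
    "PE 0: MPICH_ABORT_ON_ERROR = 0",
    "PE 0: MPICH_MPIIO_ABORT_ON_RW_ERROR= disable",
    "ERROR: Unknown error submitted to shr_abort_abort",
    "srun: error: nid004682: task 0: Exited with exit code 233",
]

hits = find_error_lines(sample_log)
for number, text in hits:
    print(f"{number}: {text}")
```

To scan a real log, pass an open file instead of the list, for example `with open(log_path, errors="replace") as f: find_error_lines(f)`, where `log_path` is a placeholder for the path to the e3sm.log file. Note that the first two hits appear to be MPICH environment settings that merely contain the word ERROR, so the line starting with `ERROR:` is probably the one to follow up on.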
Attached are my error log file and the shell script that created the case.
Has anyone encountered a similar issue when submitting a case? Any suggestions or comments would be greatly appreciated!