e3sm-2.1 mvapich2-2.3.4 mpirun error #6520
gwkwak
started this conversation in
E3SM model help
Replies: 2 comments
-
run command is mpirun --machinefile /var/spool/torque/aux//57.e3sm00 -np 96 /home/build/e3sm.exe
-
Looks like it failed immediately with the first MPI command. Have you verified that you can run any MPI program on that machine?
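One way to act on that suggestion is to compile a tiny MPI program with the same MVAPICH2 install and launch it with the same machinefile. The sketch below is only an illustration: `write_mpi_check` and the file name `mpi_check.c` are hypothetical names, not part of E3SM or CIME. The embedded C program uses `MPI_Allreduce` so that ranks actually exchange data, which a plain `hostname` launch would not exercise.

```shell
#!/usr/bin/env bash
# Sketch only: write_mpi_check is a hypothetical helper, not part of E3SM or CIME.
# It writes a minimal MPI test program (C source) to the given path.
write_mpi_check() {
    local src="${1:-mpi_check.c}"
    cat > "$src" <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Force real inter-rank traffic: every rank contributes its rank number. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("ranks=%d sum=%d expected=%d\n", size, sum, size * (size - 1) / 2);
    }
    MPI_Finalize();
    return 0;
}
EOF
}

# Typical use inside a Torque job (mpicc and mpirun taken from the same MVAPICH2 install):
#   write_mpi_check mpi_check.c
#   mpicc -o mpi_check mpi_check.c
#   mpirun --machinefile "$PBS_NODEFILE" -np 96 ./mpi_check
```

If this small program also aborts across nodes, the problem is more likely in the MPI or InfiniBand setup than in the E3SM case configuration.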
-
info
OS = CentOS 7.9
compiler = Intel 19, 20, 22
anaconda3 = 2023.09
mvapich = mvapich2-2.3.4
InfiniBand EDR, OFED = 4.9-7.1.0.0
lib = HDF5-1.10.5 (parallel install), PnetCDF-1.11.2, NetCDF-C-4.6.1 (parallel install), NetCDF-Fortran-4.4.5
cores per node = 32, nodes = 3, total cores = 96
--->cime/CIME/XML/env_mach_pes.py file "value = -3 * value * max_mpitasks_per_node"
--->cime_config/machines/config_machines.xml file "<MAX_TASKS_PER_NODE>32</MAX_TASKS_PER_NODE>
<MAX_TASKS_PER_NODE>32</MAX_TASKS_PER_NODE>"
PBS Torque=2.5.12
added to .bashrc = ulimit -S unlimited, ulimit -s unlimited
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1029554
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1029554
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
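The task arithmetic in the info above (32 cores per node, 3 nodes, 96 total) can be checked with a short sketch. It assumes that the expression quoted from env_mach_pes.py is applied only when the configured task count is negative, in which case the value is read as a node count. `resolve_ntasks` and its `factor` argument are hypothetical names for illustration and are not CIME code.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_ntasks is a hypothetical helper, not CIME code.
# Assumption: a negative task count is scaled as
#   value = -factor * value * max_mpitasks_per_node
# and a non-negative task count is used unchanged.
resolve_ntasks() {
    local value="$1" factor="$2" max_per_node="$3"
    if [ "$value" -lt 0 ]; then
        echo $(( -factor * value * max_per_node ))
    else
        echo "$value"
    fi
}

# With the factor of 3 quoted above and 32 tasks per node:
resolve_ntasks -1 3 32   # one "unit" becomes 96 tasks
resolve_ntasks -3 3 32   # three "units" become 288 tasks
# With a factor of 1, a value of -3 is needed to reach 96 tasks:
resolve_ntasks -3 1 32
```

Under that assumption, a factor of 3 only yields 96 tasks when the configured value is -1, so it is worth confirming which task counts the case actually requests.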
run file run_e3sm.template.sh
# TO DO:
# - custom pelayout
main() {
# For debugging, uncomment line below
#set -x
set -x
# --- Configuration flags ----
# Machine and project
readonly MACHINE=xxxx
readonly PROJECT="e3sm"
# Simulation
readonly COMPSET="WCYCL1850"
readonly RESOLUTION="ne30pg2_EC30to60E2r2"
# BEFORE RUNNING : CHANGE the following CASE_NAME to desired value
readonly CASE_NAME="${PROJECT}.${COMPSET}.${RESOLUTION}"
# If this is part of a simulation campaign, ask your group lead about using a case_group label
readonly CASE_GROUP=""
# Code and compilation
readonly CHECKOUT="20210702"
readonly BRANCH="master"
readonly CHERRY=( )
readonly DEBUG_COMPILE=false
#readonly DEBUG_COMPILE=true
# Run options
readonly MODEL_START_TYPE="initial" # 'initial', 'continue', 'branch', 'hybrid'
readonly START_DATE="0001-01-01"
# Additional options for 'branch' and 'hybrid'
readonly GET_REFCASE=FALSE
# Set paths
readonly CODE_ROOT="${HOME}/model/E3SM-2.1.0"
readonly CASE_ROOT="${HOME}/Model/${CASE_NAME}/${CHECKOUT}"
# Sub-directories
readonly CASE_BUILD_DIR=${CASE_ROOT}/build
readonly CASE_ARCHIVE_DIR=${CASE_ROOT}/archive
# Define type of run
# short tests: 'XS_2x5_ndays', 'XS_1x10_ndays', 'S_1x10_ndays',
# 'M_1x10_ndays', 'M2_1x10_ndays', 'M80_1x10_ndays', 'L_1x10_ndays'
# or 'production' for full simulation
readonly run='XS_2x5_ndays'
if [ "${run}" != "production" ]; then
error
conf/eamconf/chem_mech.in -> /home/tests/XS_2x5_ndays/case_scripts/CaseDocs
2024-07-22 13:03:14 NAMELIST CREATION HAS FINISHED
2024-07-22 13:03:14 PRE_RUN_CHECK HAS FINISHED
run command is mpirun --machinefile /var/spool/torque/aux//57.e3sm00 -np 36 /home/build/e3sm.exe
2024-07-22 13:03:14 SAVE_PRERUN_PROVENANCE BEGINS HERE
Deprecated "arg" node detected in /home/tests/XS_2x5_ndays/case_scripts/env_batch.xml, check files /home/cime_config/machines/config_batch.xml
copying /home/tests/XS_2x5_ndays/run/preview_run.log -> /home/tests/XS_2x5_ndays/run/preview_run.log.57.e3sm00.240722-130314
2024-07-22 13:03:14 SAVE_PRERUN_PROVENANCE HAS FINISHED
2024-07-22 13:03:14 MODEL EXECUTION BEGINS HERE
2024-07-22 13:03:15 MODEL EXECUTION HAS FINISHED
ERROR: RUN FAIL: Command 'mpirun --machinefile /var/spool/torque/aux//57.e3sm00 -np 36 /home/build/e3sm.exe
See log file for details: /home/tests/XS_2x5_ndays/run/e3sm.log.57.e3sm00.240722-13031
error in file e3sm.log.57.e3sm00.240722-13031
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 20
(t_initf) profile_detail_limit= 12
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
[cli_0]: aborting job:
Fatal error in PMPI_Waitall:
Other MPI error, error stack:
PMPI_Waitall(419).................: MPI_Waitall(count=35, req_array=0x7ffc43ce90a0, status_array=0x1) failed
MPIR_Waitall_impl(248)............:
MPIDI_CH3I_Progress(284)..........:
handle_read(1349).................:
handle_read_individual(1407)......:
MPIDI_CH3I_MRAIL_Parse_header(404): Control shouldn't reach here in prototype, header %d
(errno 101)
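Since the abort happens inside MPI communication, it can help to see how the machinefile passed to mpirun spreads ranks over the nodes. The sketch below is only an illustration: `ranks_per_host` is a hypothetical helper name, and it assumes the machinefile lists one host name per line, repeated once per allocated slot.

```shell
#!/usr/bin/env bash
# Sketch only: ranks_per_host is a hypothetical helper, not part of E3SM, CIME, or Torque.
# Assumption: the machinefile has one host name per line, repeated once per slot.
# Prints "<count> <host>" for each distinct host, skipping blank lines.
ranks_per_host() {
    local machinefile="$1"
    grep -v '^[[:space:]]*$' "$machinefile" | sort | uniq -c | awk '{print $1, $2}'
}

# Example, using the machinefile path from the run command above:
#   ranks_per_host /var/spool/torque/aux//57.e3sm00
```

Comparing the per-host counts with the -np value in the run command shows whether the job was placed on one node or spread over several, which indicates whether the failing traffic crosses the InfiniBand fabric.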