AaronDonahue
Contributor

This commit adds a new sub-parameter list for the atmosphere driver (AD) to control debug options that would not typically be used in a model simulation.

The first application of the new DEBUG parameters is an option, force_crash_nsteps, that forces the AD to abort at the driver level once a specific time step is reached.

github-actions bot commented Oct 4, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/scream/pr-preview/pr-3031/
on branch gh-pages at 2024-10-07 18:35 UTC

@AaronDonahue
Contributor Author

@bartgol, how does the TS.get_num_steps() function work for a restart? My intuition is that it counts the number of steps from the beginning of the case, so on restart you resume counting from where the last run stopped. But I want to make sure.

I also need to add a unit test for this new capability. I'm posting now so I can solicit feedback sooner rather than later.

@E3SM-Bot
Collaborator

E3SM-Bot commented Oct 4, 2024

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6115
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f1b3a4b
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5881
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f1b3a4b
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/ad_debug_forced_fail_opt
  • SHA: f1b3a4b
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

mahf708 previously approved these changes Oct 4, 2024
Contributor

@mahf708 mahf708 left a comment

thanks! 🚀

👼 😈 🤹‍♂️

Two minor comments that shouldn't impact whether or not to merge, so pls feel free to ignore them

</file>
<file name="data/scream_input.yaml" format="yaml">
-<sections>driver_options,iop_options,atmosphere_processes,grids_manager,initial_conditions,Scorpio,e3sm_parameters</sections>
+<sections>driver_debug_options,driver_options,iop_options,atmosphere_processes,grids_manager,initial_conditions,Scorpio,e3sm_parameters</sections>
Contributor

minor: I would prefer we list these vertically... is something like this acceptable in xml?

      <sections>
        driver_debug_options,
        driver_options,
        iop_options,
        atmosphere_processes,
        grids_manager,
        initial_conditions,
        Scorpio,
        e3sm_parameters
      </sections>

Contributor Author

@mahf708 I don't think it is trivial. I tried with just the above and hit an XML error. I tried a Google search and it looks like it is up to the interpreter if it catches the line breaks. So we may be stuck with a single long line for now. Although I agree it would be easier to read if we could break it down into multiple lines.

Contributor

@bartgol bartgol Oct 7, 2024

I think the issue may be in how we parse the XML node content to generate the yaml/nml files during buildnml. We may be able to modify that code, so that a,\n b is treated the same as a,b. But probably not in this PR.
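
A minimal sketch of the normalization described above, assuming the fix amounts to trimming whitespace (including newlines) around each comma-separated entry. The helper name `split_sections` is hypothetical and this is not the buildnml code, which is not shown in this thread; C++ is used only to match the driver snippet discussed in this PR.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated list and trim whitespace
// (spaces, tabs, newlines) around each entry, so that "a,\n b" and "a,b"
// produce the same result. Empty entries are dropped.
std::vector<std::string> split_sections(const std::string& text) {
  std::vector<std::string> out;
  std::string item;

  // Trim the pending entry and store it if anything is left.
  auto flush = [&]() {
    std::size_t b = 0, e = item.size();
    while (b < e && std::isspace(static_cast<unsigned char>(item[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(item[e - 1]))) --e;
    if (e > b) out.push_back(item.substr(b, e - b));
    item.clear();
  };

  for (char c : text) {
    if (c == ',') flush();
    else item.push_back(c);
  }
  flush();  // last entry has no trailing comma
  return out;
}
```

With a helper like this, the vertically listed `<sections>` content proposed earlier and the single-line form would yield the same list of section names.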

@E3SM-Bot
Collaborator

E3SM-Bot commented Oct 4, 2024

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6115
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f1b3a4b
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5881
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f1b3a4b
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Weaver # 6115 PASSED (click to see last 100 lines of console output)

        Start 143: model_restart
143/157 Test #143: model_restart .........................................................   Passed    7.14 sec
        Start 144: restarted_vs_monolithic_check_np1
144/157 Test #144: restarted_vs_monolithic_check_np1 .....................................   Passed    0.11 sec
        Start 145: homme_shoc_cld_spa_p3_rrtmgp_np1
145/157 Test #145: homme_shoc_cld_spa_p3_rrtmgp_np1 ......................................   Passed   12.89 sec
        Start 146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp
146/157 Test #146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp .............................   Passed    0.13 sec
        Start 147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1
147/157 Test #147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1 ............................   Passed    8.92 sec
        Start 148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1
148/157 Test #148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1 .................   Passed    1.47 sec
        Start 149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp
149/157 Test #149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp ...................   Passed    0.61 sec
        Start 150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1
150/157 Test #150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1 ...............................   Passed   12.92 sec
        Start 151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp
151/157 Test #151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp ......................   Passed    0.11 sec
        Start 152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1
152/157 Test #152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1 ...............................   Passed   19.22 sec
        Start 153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp
153/157 Test #153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp ......................   Passed    0.19 sec
        Start 154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1
154/157 Test #154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1 ............   Passed   19.93 sec
        Start 155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp
155/157 Test #155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp ...   Passed    0.16 sec
        Start 156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1
156/157 Test #156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1 .........................   Passed   40.47 sec
        Start 157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp
157/157 Test #157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp ................   Passed    0.21 sec

100% tests passed, 0 tests failed out of 157

Label Time Summary:
baseline_cmp = 138.67 sec*proc (23 tests)
baseline_gen = 347.03 sec*proc (25 tests)
bfbhash = 0.94 sec*proc (1 test)
check = 0.91 sec*proc (1 test)
cld = 57.37 sec*proc (7 tests)
cld_fraction = 5.11 sec*proc (1 test)
cxx baseline_cmp = 6.00 sec*proc (2 tests)
diagnostics = 55.14 sec*proc (23 tests)
driver = 116.37 sec*proc (16 tests)
dynamics = 9.81 sec*proc (3 tests)
fail = 28.08 sec*proc (5 tests)
io = 53.57 sec*proc (14 tests)
mam4_aci = 32.20 sec*proc (4 tests)
mam4_constituent_fluxes = 9.45 sec*proc (1 test)
mam4_drydep = 3.79 sec*proc (1 test)
mam4_optics = 10.42 sec*proc (1 test)
mam4_srf_online_emiss = 9.45 sec*proc (1 test)
mam4_wetscav = 27.95 sec*proc (2 tests)
nudging = 15.68 sec*proc (2 tests)
p3 = 119.09 sec*proc (12 tests)
p3_sk = 44.74 sec*proc (2 tests)
physics = 216.03 sec*proc (27 tests)
remap = 3.33 sec*proc (1 test)
rrtmgp = 53.82 sec*proc (11 tests)
shoc = 69.65 sec*proc (13 tests)
spa = 13.67 sec*proc (4 tests)
surface_coupling = 6.00 sec*proc (1 test)

Total Test time (real) = 856.71 sec

Testing '''9e71ecf5f2de9825ebaf71164474ad8fba7aa9f5''' for test '''full_sp_debug'''

RUN: taskset -c 52-103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''9e71ecf5f2de9825ebaf71164474ad8fba7aa9f5''' for test '''release'''

RUN: taskset -c 104-155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/release

Testing '''9e71ecf5f2de9825ebaf71164474ad8fba7aa9f5''' for test '''full_debug'''

RUN: taskset -c 0-51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx/ctest-build/full_debug
OVERALL STATUS: PASS
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6115/scream/components/eamxx
Completed analysis on weaver'

+ [[ 0 != 0 ]]
+ [[ 1 == 0 ]]
+ [[ weaver == \m\a\p\p\y ]]
+ set +x
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins8716700501625141601.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: lbertag@sandia.gov
Finished: SUCCESS

SCREAM_PullRequest_Autotester_Mappy # 5881 FAILED (click to see last 100 lines of console output)

	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2915)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3410)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:954)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:392)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused: java.io.IOException: Backing channel 'mappy' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
	at jdk.proxy2/jdk.proxy2.$Proxy101.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1212)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1204)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:446)
FATAL: Unable to delete script file /tmp/jenkins5895233058746487332.sh
java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2915)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3410)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:954)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:392)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@49331bf3:mappy": Remote call on mappy failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:1035)
	at hudson.FilePath.act(FilePath.java:1229)
	at hudson.FilePath.act(FilePath.java:1218)
	at hudson.FilePath.delete(FilePath.java:1765)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:164)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:446)
Build step 'Execute shell' marked build as failure
ERROR: Unable to tear down: Channel "hudson.remoting.Channel@49331bf3:mappy": Remote call on mappy failed. The channel is closing down or has closed down
java.io.EOFException
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2915)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3410)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:954)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:392)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@49331bf3:mappy": Remote call on mappy failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:1035)
	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1121)
	at hudson.Launcher$ProcStarter.start(Launcher.java:506)
	at PluginClassLoader for ssh-agent//com.cloudbees.jenkins.plugins.sshagent.exec.ExecRemoteAgent.stop(ExecRemoteAgent.java:116)
	at PluginClassLoader for ssh-agent//com.cloudbees.jenkins.plugins.sshagent.SSHAgentBuildWrapper$SSHAgentEnvironment.tearDown(SSHAgentBuildWrapper.java:343)
	at hudson.model.AbstractBuild$AbstractBuildExecution.tearDownBuildEnvironments(AbstractBuild.java:566)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:530)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:446)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh

# We're having issues with some test-launcher job hanging forever. So let's make sure we clean all penting test-launcher jobs

squeue -o"%.7i %u %40j" | grep e3sm-jenkins | grep test-launcher | awk '{ print $1 }' | xargs -r scancel

Exception when executing the batch command : no workspace from node hudson.slaves.DumbSlave[mappy] which is computer hudson.slaves.SlaveComputer@58cc4d5f and has channel null
Build step 'Post build task' marked build as failure
Sending e-mails to: lbertag@sandia.gov
Finished: FAILURE

start_timer("EAMxx::run");

// DEBUG option: Check if user has set the run to fail at a specific timestep.
if (m_atm_params.isSublist("driver_debug_options")) {
Contributor

You don't need to check whether a sublist exists, imho. If you grab one that does not exist, it gets created (empty). But adding the debug sublist is not really an issue, and arguably the code is easier to parse.

And I agree with Naser that the line below is probably too long. You could do this:

auto& debug = m_atm_params.sublist("driver_debug_options");
auto fail_step = debug.get<int>("force_fail_at_step",-1);
if (fail_step==m_current_ts.get_num_steps()) {
  std::abort();
}
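
The check suggested above can be exercised in isolation with a small stand-in for the parameter list. This is only a sketch under stated assumptions: `DebugOptions`, `get_or`, and `should_force_fail` are hypothetical names, a plain `std::map` replaces the real sublist type, and step counts are assumed to be non-negative so that the default of -1 never matches. How the step count behaves across a restart is not settled in this thread, so the sketch takes the current step as an input.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the debug sublist: option name -> integer value.
using DebugOptions = std::map<std::string, int>;

// Return the stored value for `name`, or `fallback` when the option is absent.
// This mirrors the get<int>(name, default) call in the suggestion above.
int get_or(const DebugOptions& opts, const std::string& name, int fallback) {
  auto it = opts.find(name);
  return it == opts.end() ? fallback : it->second;
}

// True when the run should abort at `current_step`.
// Assumption: step counts are non-negative, so the default of -1 never fires.
bool should_force_fail(const DebugOptions& opts, int current_step) {
  const int fail_step = get_or(opts, "force_fail_at_step", -1);
  return fail_step == current_step;
}
```

Separating the decision from the call to `std::abort()` keeps the condition testable without terminating the test process, which may help with the unit test mentioned earlier in the thread.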

@E3SM-Bot
Collaborator

E3SM-Bot commented Oct 7, 2024

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6120
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f3ef2e6
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5886
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f3ef2e6
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/ad_debug_forced_fail_opt
  • SHA: f3ef2e6
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Bot
Collaborator

E3SM-Bot commented Oct 7, 2024

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6120
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f3ef2e6
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5886
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS enhancement;Atmosphere Driver
PULLREQUESTNUM 3031
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA f3ef2e6
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

@E3SM-Bot E3SM-Bot merged commit 2bf1852 into master Oct 7, 2024
7 checks passed
@E3SM-Bot E3SM-Bot deleted the aarondonahue/ad_debug_forced_fail_opt branch October 7, 2024 21:31
Labels
Atmosphere Driver enhancement New feature or request
4 participants