Open
Labels: bug (Something isn't working)
Description
In current master, two tests fail when run with two MPI processes (mpiexec -n 2):
69/70 Testing: xshseqr
69/70 Test: xshseqr
Command: "/sw/env/gcc-10.3.0/openmpi/4.1.1/bin/mpiexec" "-n" "2" "./xshseqr"
Directory: /home/rrztest/src/scalapack/TESTING
"xshseqr" start time: Jul 25 20:04 CEST
Output:
----------------------------------------------------------
ScaLAPACK Test for PSHSEQR
epsilon = 5.96046448E-08
threshold = 30.0000000
Residual and Orthogonality Residual computed by:
Residual = || T - Q^T*A*Q ||_F / ( ||A||_F * eps * sqrt(N) )
Orthogonality = MAX( || I - Q^T*Q ||_F, || I - Q*Q^T ||_F ) / (eps * N)
Test passes if both residuals are less then threshold
N NB P Q QR Time CHECK
----- --- ---- ---- -------- ------
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x151fa27c93ff in ???
#1 0x151fa455124f in pstrord_
at /home/rrztest/src/scalapack/SRC/pstrord.f:1087
#2 0x151fa457a300 in pslaqr3_
at /home/rrztest/src/scalapack/SRC/pslaqr3.f:880
#3 0x151fa4565178 in pslaqr0_
at /home/rrztest/src/scalapack/SRC/pslaqr0.f:598
#4 0x151fa456209d in pshseqr_
at /home/rrztest/src/scalapack/SRC/pshseqr.f:441
#5 0x4036cf in pshseqrdriver
at /home/rrztest/src/scalapack/TESTING/EIG/pshseqrdriver.f:413
#6 0x404427 in main
at /home/rrztest/src/scalapack/TESTING/EIG/pshseqrdriver.f:565
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node node002 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
<end of output>
Test time = 2.91 sec
----------------------------------------------------------
Test Failed.
"xshseqr" end time: Jul 25 20:04 CEST
"xshseqr" time elapsed: 00:00:02
----------------------------------------------------------
70/70 Testing: xdhseqr
70/70 Test: xdhseqr
Command: "/sw/env/gcc-10.3.0/openmpi/4.1.1/bin/mpiexec" "-n" "2" "./xdhseqr"
Directory: /home/rrztest/src/scalapack/TESTING
"xdhseqr" start time: Jul 25 20:04 CEST
Output:
----------------------------------------------------------
ScaLAPACK Test for PDHSEQR
epsilon = 1.1102230246251565E-016
threshold = 30.000000000000000
Residual and Orthogonality Residual computed by:
Residual = || T - Q^T*A*Q ||_F / ( ||A||_F * eps * sqrt(N) )
Orthogonality = MAX( || I - Q^T*Q ||_F, || I - Q*Q^T ||_F ) / (eps * N)
Test passes if both residuals are less then threshold
N NB P Q QR Time CHECK
----- --- ---- ---- -------- ------
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x1488be0113ff in ???
#1 0x1488bff4ebae in pdtrord_
at /home/rrztest/src/scalapack/SRC/pdtrord.f:1087
#2 0x1488bff77f2f in pdlaqr3_
at /home/rrztest/src/scalapack/SRC/pdlaqr3.f:878
#3 0x1488bff62d2b in pdlaqr0_
at /home/rrztest/src/scalapack/SRC/pdlaqr0.f:598
#4 0x1488bff5fc1d in pdhseqr_
at /home/rrztest/src/scalapack/SRC/pdhseqr.f:441
#5 0x4036e2 in pdhseqrdriver
at /home/rrztest/src/scalapack/TESTING/EIG/pdhseqrdriver.f:412
#6 0x404445 in main
at /home/rrztest/src/scalapack/TESTING/EIG/pdhseqrdriver.f:564
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node node002 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
<end of output>
Test time = 2.70 sec
----------------------------------------------------------
Test Failed.
"xdhseqr" end time: Jul 25 20:04 CEST
"xdhseqr" time elapsed: 00:00:02
----------------------------------------------------------
End testing: Jul 25 20:04 CEST
Both tests pass fine with -n 1. I tested on two machines with differing compilers and MPI versions (4.1.1 and 1.10.7).
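For anyone who wants to repeat the -n 1 versus -n 2 comparison outside of CTest, here is a minimal Python sketch. The helper names (run_command, describe_exit, compare_rank_counts) are made up for illustration and are not part of ScaLAPACK or its test suite. It assumes that mpiexec is on the PATH and that the xshseqr and xdhseqr binaries sit in the TESTING directory of the build.

```python
import signal
import subprocess


def run_command(cmd, cwd=None, timeout=600):
    """Run cmd and return (returncode, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by a signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    # When a rank dies, the launcher itself normally just exits nonzero.
    return f"exit code {returncode}"


def compare_rank_counts(programs, launcher="mpiexec", counts=(1, 2), cwd=None):
    """Run each program under the launcher with each rank count.

    Returns a dict mapping (program, nprocs) to a status string.
    The launcher name and the -n flag are assumptions taken from the
    commands shown in the log above.
    """
    results = {}
    for program in programs:
        for nprocs in counts:
            cmd = [launcher, "-n", str(nprocs), program]
            returncode, _output = run_command(cmd, cwd=cwd)
            results[(program, nprocs)] = describe_exit(returncode)
    return results
```

A call such as compare_rank_counts(["./xshseqr", "./xdhseqr"], cwd="TESTING") would then be expected to report "ok" for one rank and a nonzero status for two ranks on an affected build. The exact exit code depends on the MPI launcher.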
I also observe unusually long runtimes (hundreds of seconds) for some 2.2.0 tests when run inside the pkgsrc build framework, but those do succeed eventually. The FPEs reported here are a more definite failure.