tmad test crashes in rotxxx (SIGFPE erroneous arithmetic operation) #855

valassi · 2024-06-02T07:27:34Z

"tmad test crashes for some iconfig (channel/iconfig mapping issues and SIGFPE erroneous arithmetic operation)"

Hi @oliviermattelaer, this is a follow-up to the discussions in #826 and PR #853.

I prefer to open this as a clean issue and investigate this independently of SUSY, or in any case of zero cross section #826.

In these discussions about your patch #853 I realised that we risk having a MAJOR problem not only for BSM but also for SM, namely: all of my 'tmad' tests test only iconfig=1. These were OK so far (in some cases perhaps by luck), but they may fail for a different iconfig (i.e. if we put a number different from 1 in the input_app.txt piped to madevent).

Indeed I found a crash on the first test I executed, ggttgg with iconfig=104.

 ./tmad/madX.sh -ggttgg -iconfig 104
...
On itscrd90.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: 1x Tesla V100S-PCIE-32GB]:
Working directory (run): /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg

*** (1) EXECUTE MADEVENT_FORTRAN (create results.dat) ***
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 64/64
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 104
 [XSECTION] ChannelId = 112
 [XSECTION] Cross section = 0.4632 [0.46320556621222242] fbridge_mode=0
 [UNWEIGHT] Wrote 11 events (found 187 events)
 [COUNTERS] PROGRAM TOTAL          :    4.4430s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2478s
 [COUNTERS] Fortran MEs      ( 1 ) :    4.1953s for     8192 events => throughput is 1.95E+03 events/s

*** (1) EXECUTE MADEVENT_FORTRAN x1 (create events.lhe) ***
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 64/64
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 104
 [XSECTION] ChannelId = 112
 [XSECTION] Cross section = 0.4632 [0.46320556621222242] fbridge_mode=0
 [UNWEIGHT] Wrote 11 events (found 168 events)
 [COUNTERS] PROGRAM TOTAL          :    4.4488s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2487s
 [COUNTERS] Fortran MEs      ( 1 ) :    4.2002s for     8192 events => throughput is 1.95E+03 events/s

*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7effbd423860 in ???
#1  0x7effbd422a05 in ???
#2  0x7effbd054def in ???
#3  0x44b5ff in ???
#4  0x4087df in ???
#5  0x409848 in ???
#6  0x40bb83 in ???
#7  0x40d1a9 in ???
#8  0x45c804 in ???
#9  0x434269 in ???
#10  0x40371e in ???
#11  0x7effbd03feaf in ???
#12  0x7effbd03ff5f in ???
#13  0x403844 in ???
#14  0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 780951 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttgg_x1_cudacpp > /tmp/avalassi/output_ggttgg_x1_cudacpp' failed

This uses a slightly modified script, which I will put in a PR.

I guess that the solution goes through what you proposed in #852 and the additional modifications you and I discussed there.

(Note: the 'tlau' tests that I proposed in July last year just before my absence were supposed to test exactly this (see #711), i.e. test all possible iconfig at the same time in a user-like environment, for all processes, but within a short, manageable time. I continue to think that allowing the possibility to run shorter generate_events tests is necessary to allow better testing. There was disagreement last year, I hope we can come back and agree on this).
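A scan over several iconfig values with the tmad script could be scripted along the following lines. This is only a sketch in Python: it assumes that madX.sh accepts `-iconfig N` exactly as in the command shown earlier in this comment, the helper name `tmad_commands` is hypothetical, and the list of iconfig values is an arbitrary example (the number of valid channels per process is not given in this thread).

```python
import shlex

def tmad_commands(process_flag, iconfigs, script="./tmad/madX.sh"):
    """Build one madX.sh command line (as an argument list) per iconfig value."""
    return [[script, process_flag, "-iconfig", str(iconfig)] for iconfig in iconfigs]

# Dry run: print the commands instead of executing them.
for command in tmad_commands("-ggttgg", [1, 104]):
    print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run` from the `epochX/cudacpp` directory; a non-zero return code would flag an iconfig that fails.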

valassi · 2024-06-02T15:47:46Z

In #852 (comment) Olivier suggested "you/we should compile with the C equivalent of -fbounds-check which is super usefull to spot segfault who by definition are hardware specific". I had a look but I am not sure there is an equivalent.

Instead I ran valgrind, and the output is interesting. This is a reproducer which mimics the tmad test above, but without using the tmad scripts:

cd gg_ttgg.mad/SubProcesses/P1_gg_ttxgg
make cleanall
make -j BACKEND=cppnone -f cudacpp.mk debug
make -j BACKEND=cppnone
cat > input_cudacpp_104 << EOF
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
104 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
EOF
./madevent_cpp < input_cudacpp_104
valgrind ./madevent_cpp < input_cudacpp_104

The valgrind output includes things like

...
==794089== Conditional jump or move depends on uninitialised value(s)
==794089==    at 0x426F03: setclscales_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x429569: update_scale_coupling_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x438857: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== 
==794089== Warning: client switching stacks?  SP change: 0x1ffeffeeb8 --> 0x1ffec3eb80
==794089==          to suppress, use: --max-stackframe=3932984 or greater
==794089== Invalid write of size 8
==794089==    at 0x4366D4: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==  Address 0x1ffec3eba8 is on thread 1's stack
==794089==  in frame #0, created by dsig1_vec_ (???:)
==794089== 
==794089== Invalid write of size 8
==794089==    at 0x4366D9: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==  Address 0x1ffec3ebb0 is on thread 1's stack
==794089==  in frame #0, created by dsig1_vec_ (???:)
...
==794089== Invalid read of size 4
==794089==    at 0x436AE5: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==  Address 0x1ffec3ebcc is on thread 1's stack
==794089==  in frame #0, created by dsig1_vec_ (???:)
...
==794089== Invalid read of size 8
==794089==    at 0x6E032EF: memmove (vg_replace_strmem.c:1385)
==794089==    by 0x6E6D811: mg5amcCpu::Bridge<double>::cpu_sequence(double const*, double const*, double const*, double const*, unsigned int, double*, int*, int*, bool) (Bridge.h:376)
==794089==    by 0x6E6F37B: fbridgesequence_ (fbridge.cc:106)
==794089==    by 0x6E6F3F2: fbridgesequence_nomultichannel_ (fbridge.cc:132)
==794089==    by 0x4358D9: smatrix1_multi_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x436C74: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==    by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==  Address 0x1ffec7eec8 is on thread 1's stack
==794089==  in frame #5, created by dsig1_vec_ (???:)
...
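The "client switching stacks?" warning above can be checked with simple arithmetic: the stack pointer jump it reports is exactly the value valgrind suggests for `--max-stackframe`. The Python sketch below does that subtraction and builds the corresponding command line. The interpretation is an assumption, not something verified in this thread: valgrind documents `--max-stackframe` as the threshold above which it assumes a stack switch, and recommends raising it for programs with large stack-allocated arrays, so some of the "Invalid write ... on thread 1's stack" reports may be artifacts of a large frame in dsig1_vec_ rather than real errors.

```python
def stack_pointer_jump(sp_before, sp_after):
    """Size in bytes of the stack pointer change reported by valgrind."""
    return abs(sp_before - sp_after)

# Values copied from the "SP change" line of the valgrind output above.
jump = stack_pointer_jump(0x1ffeffeeb8, 0x1ffec3eb80)
print(jump)  # 3932984, the value valgrind suggests

# Command line that raises the threshold accordingly
# (stdin redirection from input_cudacpp_104 is left to the caller).
valgrind_command = ["valgrind", f"--max-stackframe={jump}", "./madevent_cpp"]
print(" ".join(valgrind_command))
```

Rerunning with this option would show which of the reports survive once the large frame is no longer treated as a stack switch.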

Also I have rebuilt with -O3 -g in make_opts:

epochX/cudacpp/gg_ttgg.mad/Source/make_opts /tmp/git-blob-ieuRtt/make_opts e4b87ee6ad40ecb97ecbb40ae1811714ce5f1b46 100644 epochX/cudacpp/gg_ttgg.mad/Source/make_opts 0000000000000000000000000000000000000000 100644
4c4,5
< GLOBAL_FLAG=-O3 -ffast-math -fbounds-check
---
> ###GLOBAL_FLAG=-O3 -ffast-math -fbounds-check
> GLOBAL_FLAG=-O3 -g -ffast-math -fbounds-check

The crash now prints out where it happens: it is in rotxxx.

Setting grid   1    0.17709E-03   1
Setting grid   2    0.17709E-03   1
Setting grid   3    0.22041E-03   1
 Transforming s_hat 1/s            9   8.8163313609467475E-004   119716.00000000000        168999999.99999997     
 Error opening symfact.dat. No permutations used.
Using random seed offsets   104 :      1
  with seed                   21
 Ranmar initialization seeds       27505        9395

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f6471c23860 in ???
#1  0x7f6471c22a05 in ???
#2  0x7f6471854def in ???
#3  0x44b5ff in rotxxx_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f:1247
#4  0x4087df in gentcms_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1480
#5  0x409848 in one_tree_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1167
#6  0x40bb83 in gen_mom_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:68
#7  0x40d1a9 in x_to_f_arg_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:60
#8  0x45c804 in sample_full_
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/dsample.f:172
#9  0x434269 in driver
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:256
#10  0x40371e in main
        at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:301
Floating point exception (core dumped)

Note: rotxxx is where I had already found the crash in the SUSY tests, see #826 (comment).
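For backtraces where the frames are printed as `???`, the raw addresses can be passed to `addr2line` from binutils. The Python sketch below extracts the unresolved addresses from a gfortran backtrace and builds the command line. The helper names are hypothetical. Two caveats: file and line information is only available if the executable was built with `-g`, and addresses that belong to shared libraries (the `0x7f...` frames) will not resolve against the main executable.

```python
import re
import shlex

# Matches gfortran backtrace frames such as "#3  0x44b5ff in ???"
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in (\S+)")

def unresolved_addresses(backtrace_text):
    """Return the addresses of frames whose symbol is printed as '???'."""
    addresses = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(3) == "???":
            addresses.append(match.group(2))
    return addresses

def addr2line_command(executable, addresses):
    """Build an addr2line call: -f function names, -C demangle, -p one line per address."""
    return ["addr2line", "-f", "-C", "-p", "-e", executable] + list(addresses)

sample = """#2  0x7effbd054def in ???
#3  0x44b5ff in ???
#4  0x4087df in ???"""
print(shlex.join(addr2line_command("./madevent_cpp", unresolved_addresses(sample))))
```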

valassi · 2024-06-02T16:00:34Z

As discussed in #826 this is again a weird optimization issue: gdb gives

Program received signal SIGFPE, Arithmetic exception.
rotxxx (p=..., q=..., prot=...) at aloha_functions.f:1247
1247              prot(1) = q(1)*q(3)/qq/qt*p1 -q(2)/qt*p(2) +q(1)/qq*p(3)
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libgfortran-11.3.1-4.3.el9.alma.x86_64 libgomp-11.3.1-4.3.el9.alma.x86_64 libquadmath-11.3.1-4.3.el9.alma.x86_64 libstdc++-11.3.1-4.3.el9.alma.x86_64
(gdb) p qq qt p1
A syntax error in expression, near `qt p1'.
(gdb) p qq
$1 = <optimized out>
(gdb) p qt
$2 = <optimized out>
(gdb) p p1
$3 = <optimized out>

This was with -O3 -g. If I use lower optimization levels, the issue disappears.

As I have done with many SIGFPEs in cudacpp, I tried adding volatile

--- a/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f
+++ b/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f
@@ -1201,7 +1201,7 @@ c       real    prot(0:3)      : four-momentum p in the rotated frame
 c
       implicit none
       double precision p(0:3),q(0:3),prot(0:3),qt2,qt,psgn,qq,p1
-
+      volatile qt, p1, qq
       double precision rZero, rOne
       parameter( rZero = 0.0d0, rOne = 1.0d0 )

Strangely enough, this prevents the SIGFPE. But now the code seems stuck in an infinite loop?
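The gdb session stops on line 1247 of aloha_functions.f, which divides by both qq and qt. The Python sketch below evaluates that one expression and makes the possible zero divisors explicit. The definitions of qt and qq used here (transverse and total 3-momentum magnitude of q) are an assumption, since the thread only shows their declaration, and the function name is hypothetical. The sketch does not explain why the trap appears only at -O3.

```python
import math

def rot_component_1(p, q):
    """Evaluate the expression on aloha_functions.f:1247 for 4-vectors p and q.

    Index 0 is the energy, indices 1-3 are the spatial components.
    ASSUMPTION: qt and qq are the transverse and total 3-momentum
    magnitudes of q; their definitions are not shown in this thread.
    """
    qt = math.sqrt(q[1] ** 2 + q[2] ** 2)
    qq = math.sqrt(qt ** 2 + q[3] ** 2)
    if qt == 0.0 or qq == 0.0:
        raise ZeroDivisionError("qt or qq is zero: the expression would divide by zero")
    p1 = p[1]
    return q[1] * q[3] / qq / qt * p1 - q[2] / qt * p[2] + q[1] / qq * p[3]

# Example with qt = 3 and qq = 5
print(rot_component_1((0.0, 1.0, 2.0, 3.0), (0.0, 3.0, 0.0, 4.0)))
```

With q along the z axis, qt vanishes and the expression is undefined; whether the Fortran routine guards this case before reaching line 1247 is not shown in this thread.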

valassi · 2024-06-02T16:06:01Z

I tried cuda to make it faster.

Again something strange, the code crashes without valgrind but does not crash with valgrind... (NB this is WITHOUT volatile)

cd gg_ttgg.mad/SubProcesses/P1_gg_ttxgg
make cleanall
make -j BACKEND=cuda -f cudacpp.mk debug
make -j BACKEND=cuda
cat > input_cudacpp_104 << EOF
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
104 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
EOF
./madevent_cuda < input_cudacpp_104
valgrind ./madevent_cuda < input_cudacpp_104
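Both reproducers above use the same six-line input card and differ only in the executable. To repeat the test for other iconfig values, the card can be generated programmatically. This is a minimal Python sketch, assuming the six lines shown above are all that madevent reads; the function name `make_madevent_input` is hypothetical.

```python
def make_madevent_input(iconfig, nevents=8192, max_iter=1, min_iter=1):
    """Return the text of a madevent input card for one channel (iconfig)."""
    if iconfig < 1:
        raise ValueError("iconfig must be >= 1")
    lines = [
        f"{nevents} {max_iter} {min_iter} ! Number of events and max and min iterations",
        "0.000001 ! Accuracy (ignored because max iterations = min iterations)",
        "0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)",
        "1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)",
        "0 ! Helicity Sum/event 0=exact",
        f"{iconfig} ! Channel number (1-N) for single-diagram enhancement multi-channel "
        "(NB used even if suppress amplitude is 0!)",
    ]
    return "\n".join(lines) + "\n"

# Print the card used in this issue; redirect to a file to feed madevent.
print(make_madevent_input(104), end="")
```

The output can be written to a file such as input_cudacpp_104 and piped to `./madevent_cpp` or `./madevent_cuda` as in the commands above.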

valassi · 2024-06-02T16:15:08Z

Ok. In the cuda version, adding volatile in the Fortran removes SIGFPE and allows the program to reach the end.

So IS THIS A POSSIBLE FIX?

With cpp maybe I just needed to wait? Or is this going slower? I will try to rerun more tests and leave them running.

(In the meantime I will also try the susy_gg_t1t1 channel which in the past seemed problematic with SIGFPE).

… test a different iconfig
In particular: the following triggers a SIGFPE reported in madgraph5#855 (crash in rotxxx that can be fixed adding volatile?)
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
This also triggers a similar SIGFPE (initially reported in madgraph5#826)
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean

…SIGFPE madgraph5#855, and add volatile in aloha_functions.f to try to fix it
The SIGFPE crash madgraph5#855 does seem to disappear in
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
However, there is now a DIFFERENT issue, an lhe file mismatch between fortran and cpp (madgraph5#856)
This is probably due to the iconfig/channel mapping issue reported by Olivier in madgraph5#852

…ebug SIGFPE madgraph5#855, and add volatile in aloha_functions.f to try to fix it
The SIGFPE crash madgraph5#855 does seem to disappear in
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
Then no cross section is printed also for this iconfig (same as madgraph5#826 for iconfig 1), but this is a DIFFERENT issue

…: note that SIGFPE madgraph5#855 is still fixed because volatile has been added

…adgraph5#855 and prepare codegen backport

…dgraph5#855 in rotxxx
The issue was observed and fixed in gg_ttgg (iconfig 104) and susy_gg_t1t1 (iconfig 2), the backport as usual is from gg_tt
Note that aloha_functions.f is now added to the list of files to include when preparing patch.common
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/dsample.f gg_tt.mad/Source/DHELAS/aloha_functions.f gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/bin/internal/banner.py gg_tt.mad/bin/internal/gen_ximprove.py gg_tt.mad/bin/internal/madevent_interface.py >> CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

…syggt1t1 to test madgraph5#855 fix while still exposing madgraph5#826 and madgraph5#856

…o fix SIGFPE madgraph5#855 in rotxxx

valassi · 2024-06-02T17:26:15Z

This is fixed by #857 by adding volatile, as I had done for similar SIGFPE in cudacpp

…de with no volatile, to rerun tmad and expose SIGFPE madgraph5#855
git checkout upstream/master susy_gg_t1t1.mad gg_ttgg.mad

…se SIGFPE madgraph5#855 - will revert
./tmad/teeMadX.sh -mix -makeclean +10x -ggttgg -susyggt1t1

…h confirmed that SIGFPE madgraph5#855 was present and is now fixed
Revert "[tmad] temporarely rerun tmad tests for ggttgg and susyggt1t1 to expose SIGFPE madgraph5#855 - will revert"
This reverts commit 4fa1790.
Revert "[tmad] in gg_ttgg.mad and susy_gg_t1t1.mad, temporarely go back to code with no volatile, to rerun tmad and expose SIGFPE madgraph5#855"
This reverts commit 2f32ffd.

valassi · 2024-06-03T06:42:11Z

I completed my tests in PR #857 and I confirm that it fixes this issue, closing

valassi · 2024-06-24T15:46:16Z

Reopening until PR #857 is merged - or until this is otherwise clarified

…g AS-IS Olivier's patches from the latest fix_826 branch for PR madgraph5#850
The gg_ttgg test still crashes (rotxxx madgraph5#855?)
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fce5ec23860 in ???
1 0x7fce5ec22a05 in ???
2 0x7fce5e854def in ???
3 0x44b5ff in ???
4 0x4087df in ???
5 0x409848 in ???
6 0x40bb83 in ???
7 0x40d1a9 in ???
8 0x45c804 in ???
9 0x434269 in ???
10 0x40371e in ???
11 0x7fce5e83feaf in ???
12 0x7fce5e83ff5f in ???
13 0x403844 in ???
14 0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 3913008 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The susy_gg_t1t1 test also still crashes (see madgraph5#826?), this looks like the same crash as ggttgg above
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7f9f03423860 in ???
1 0x7f9f03422a05 in ???
2 0x7f9f03054def in ???
3 0x43809f in ???
4 0x40581f in ???
5 0x4067b1 in ???
6 0x408c71 in ???
7 0x40a0a9 in ???
8 0x444fdf in ???
9 0x42bb38 in ???
10 0x40371e in ???
11 0x7f9f0303feaf in ???
12 0x7f9f0303ff5f in ???
13 0x403844 in ???
14 0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 3907179 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The gqttq test also still crashes intermittently, i.e. only on the second execution (madgraph5#845?)
./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
./tmad/teeMadX.sh -gqttq +10x -fltonly
Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fbafa623860 in ???
1 0x7fbafa622a05 in ???
2 0x7fbafa254def in ???
3 0x7fbafad24034 in ???
4 0x7fbafa9a1575 in ???
5 0x7fbafad20c89 in ???
6 0x7fbafad2abfd in ???
7 0x7fbafad30491 in ???
8 0x43008b in ???
9 0x431c10 in ???
10 0x432d47 in ???
11 0x433b1e in ???
12 0x44a921 in ???
13 0x42ebbf in ???
14 0x40371e in ???
15 0x7fbafa23feaf in ???
16 0x7fbafa23ff5f in ???
17 0x403844 in ???
18 0xffffffffffffffff in ???
./madX.sh: line 387: 3922797 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp' failed

…nd cudacpp.mk to improve the crash dumps
The susyggt1t1 test clearly crashes in rotxxx (madgraph5#855):
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fb7e1223860 in ???
1 0x7fb7e1222a05 in ???
2 0x7fb7e0e54def in ???
3 0x43809f in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/DHELAS/aloha_functions.f:1247
4 0x40581f in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1480
5 0x4067b1 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1167
6 0x408c71 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:68
7 0x40a0a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:60
8 0x444fdf in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/dsample.f:172
9 0x42bb38 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:256
10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:301
./tmad/madX.sh: line 387: 3928626 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_susyggt1t1_x1_cudacpp > /tmp/avalassi/output_susyggt1t1_x1_cudacpp' failed
The ggttgg test also clearly crashes in rotxxx (madgraph5#855):
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean^C
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fb141c23860 in ???
1 0x7fb141c22a05 in ???
2 0x7fb141854def in ???
3 0x44b5ff in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f:1247
4 0x4087df in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1480
5 0x409848 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1167
6 0x40bb83 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:68
7 0x40d1a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:60
8 0x45c804 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/dsample.f:172
9 0x434269 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:256
10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:301
./tmad/madX.sh: line 387: 3933302 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttgg_x1_cudacpp > /tmp/avalassi/output_ggttgg_x1_cudacpp' failed
The gqttq test instead clearly crashes in sigmaKin (madgraph5#845):
./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
./tmad/teeMadX.sh -gqttq +10x -fltonly
Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp'
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7f607ee23860 in ???
1 0x7f607ee22a05 in ???
2 0x7f607ea54def in ???
3 0x7f607f607008 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i._omp_fn.0 at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1190
4 0x7f607f4ab575 in ???
5 0x7f607f603c89 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1093
6 0x7f607f60dbfd in _ZN9mg5amcCpu23MatrixElementKernelHost21computeMatrixElementsEj at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/MatrixElementKernels.cc:115
7 0x7f607f613491 in _ZN9mg5amcCpu6BridgeIdE12cpu_sequenceEPKdS3_S3_S3_jPdPiS5_b at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/Bridge.h:390
8 0x7f607f613491 in fbridgesequence_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/fbridge.cc:106
9 0x43008b in smatrix1_multi_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:618
10 0x431c10 in dsig1_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:445
11 0x432d47 in dsigproc_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:1034
12 0x433b1e in dsig_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:327
13 0x44a921 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/Source/dsample.f:208
14 0x42ebbf in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:256
15 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:301
./madX.sh: line 387: 3941122 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp' failed

…g AS-IS Olivier's patches from the latest fix_826 branch for PR madgraph5#852
The gg_ttgg test still crashes (rotxxx madgraph5#855?)
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fce5ec23860 in ???
1 0x7fce5ec22a05 in ???
2 0x7fce5e854def in ???
3 0x44b5ff in ???
4 0x4087df in ???
5 0x409848 in ???
6 0x40bb83 in ???
7 0x40d1a9 in ???
8 0x45c804 in ???
9 0x434269 in ???
10 0x40371e in ???
11 0x7fce5e83feaf in ???
12 0x7fce5e83ff5f in ???
13 0x403844 in ???
14 0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 3913008 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The susy_gg_t1t1 test also still crashes (see madgraph5#826?), this looks like the same crash as ggttgg above
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7f9f03423860 in ???
1 0x7f9f03422a05 in ???
2 0x7f9f03054def in ???
3 0x43809f in ???
4 0x40581f in ???
5 0x4067b1 in ???
6 0x408c71 in ???
7 0x40a0a9 in ???
8 0x444fdf in ???
9 0x42bb38 in ???
10 0x40371e in ???
11 0x7f9f0303feaf in ???
12 0x7f9f0303ff5f in ???
13 0x403844 in ???
14 0xffffffffffffffff in ???
./tmad/madX.sh: line 387: 3907179 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
The gqttq test also still crashes intermittently, i.e. only on the second execution (madgraph5#845?)
./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
./tmad/teeMadX.sh -gqttq +10x -fltonly
Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fbafa623860 in ???
1 0x7fbafa622a05 in ???
2 0x7fbafa254def in ???
3 0x7fbafad24034 in ???
4 0x7fbafa9a1575 in ???
5 0x7fbafad20c89 in ???
6 0x7fbafad2abfd in ???
7 0x7fbafad30491 in ???
8 0x43008b in ???
9 0x431c10 in ???
10 0x432d47 in ???
11 0x433b1e in ???
12 0x44a921 in ???
13 0x42ebbf in ???
14 0x40371e in ???
15 0x7fbafa23feaf in ???
16 0x7fbafa23ff5f in ???
17 0x403844 in ???
18 0xffffffffffffffff in ???
./madX.sh: line 387: 3922797 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp' failed

…nd cudacpp.mk to improve the crash dumps

The susyggt1t1 test clearly crashes in rotxxx (madgraph5#855):
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fb7e1223860 in ???
1 0x7fb7e1222a05 in ???
2 0x7fb7e0e54def in ???
3 0x43809f in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/DHELAS/aloha_functions.f:1247
4 0x40581f in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1480
5 0x4067b1 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:1167
6 0x408c71 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:68
7 0x40a0a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/genps.f:60
8 0x444fdf in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/Source/dsample.f:172
9 0x42bb38 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:256
10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/driver.f:301
./tmad/madX.sh: line 387: 3928626 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_susyggt1t1_x1_cudacpp > /tmp/avalassi/output_susyggt1t1_x1_cudacpp' failed

The ggttgg test also clearly crashes in rotxxx (madgraph5#855):
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean^C
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7fb141c23860 in ???
1 0x7fb141c22a05 in ???
2 0x7fb141854def in ???
3 0x44b5ff in rotxxx_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f:1247
4 0x4087df in gentcms_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1480
5 0x409848 in one_tree_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1167
6 0x40bb83 in gen_mom_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:68
7 0x40d1a9 in x_to_f_arg_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:60
8 0x45c804 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/dsample.f:172
9 0x434269 in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:256
10 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:301
./tmad/madX.sh: line 387: 3933302 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttgg_x1_cudacpp > /tmp/avalassi/output_ggttgg_x1_cudacpp' failed

The gqttq test instead clearly crashes in sigmaKin (madgraph5#845):
./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
./tmad/teeMadX.sh -gqttq +10x -fltonly
Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp'
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
0 0x7f607ee23860 in ???
1 0x7f607ee22a05 in ???
2 0x7f607ea54def in ???
3 0x7f607f607008 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i._omp_fn.0 at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1190
4 0x7f607f4ab575 in ???
5 0x7f607f603c89 in _ZN9mg5amcCpu8sigmaKinEPKfS1_S1_S1_PfjS2_S2_PiS3_i at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/CPPProcess.cc:1093
6 0x7f607f60dbfd in _ZN9mg5amcCpu23MatrixElementKernelHost21computeMatrixElementsEj at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/MatrixElementKernels.cc:115
7 0x7f607f613491 in _ZN9mg5amcCpu6BridgeIdE12cpu_sequenceEPKdS3_S3_S3_jPdPiS5_b at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/Bridge.h:390
8 0x7f607f613491 in fbridgesequence_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/fbridge.cc:106
9 0x43008b in smatrix1_multi_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:618
10 0x431c10 in dsig1_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig1.f:445
11 0x432d47 in dsigproc_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:1034
12 0x433b1e in dsig_vec_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/auto_dsig.f:327
13 0x44a921 in sample_full_ at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/Source/dsample.f:208
14 0x42ebbf in driver at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:256
15 0x40371e in main at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttq.mad/SubProcesses/P1_gu_ttxu/driver.f:301
./madX.sh: line 387: 3941122 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp' failed

Conclusion: I would not merge 852 as it does not fix issues yet. Instead I would merge 857 to fix the rotxxx crash 855 using volatile, and reassess from there...

…sm to bypass known issues in tmad tests

Currently the following 12 (4 processes x 3 fptypes) issues are bypassed:
- "No cross section in ${proc%.mad} for FPTYPE=d,f,m (madgraph5#826)" for susy_gg_t1t1
- "SIGFPE crash in rotxxx in ${proc%.mad} for FPTYPE=d,f,m (madgraph5#855)" for gq_ttq, pp_tt012j, nobm_pp_ttW


valassi · 2024-06-27T15:20:01Z

I changed the name of this issue to indicate that it is ONLY about rotxxx crashes. These can be fixed using 'volatile' in PR #857 and mg5amcnlo/mg5amcnlo#113.

Conversely, I removed "channel/iconfig mapping issues" from the name of this issue. Those mapping issues are behind the LHE mismatch #856 and possibly the intermittent sigmaKin crash #845.
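For readers who have not looked at the 'volatile' workaround for rotxxx: the following is only a minimal C++ sketch of the idea, not the actual Fortran code in aloha_functions.f. The names rotate_sketch and rotate_sketch_check are hypothetical, and the assumption that the SIGFPE comes from the optimizer evaluating a division before the zero check is mine; the thread only says that 'volatile' is added "to prevent optimizations".

```cpp
#include <cassert>
#include <cmath>

// Hypothetical C++ analogue of a rotxxx-style rotation (NOT the real aloha_functions.f code).
// p and q are four-vectors stored as (E, px, py, pz). prot is p rotated from the frame
// where q points along +z into the frame where q has its actual direction.
inline void rotate_sketch(const double p[4], const double q[4], double prot[4])
{
  prot[0] = p[0];
  // Assumption: without 'volatile', an optimizing compiler may evaluate the divisions
  // in the 'else' branch speculatively, even when qt2 is zero. Making qt2 volatile
  // forces a real memory read, which in practice discourages that reordering.
  volatile double qt2 = q[1] * q[1] + q[2] * q[2];
  if (qt2 == 0.0) {
    // q is along the z axis (or at rest): no division is needed.
    const double sign = (q[3] < 0.0) ? -1.0 : 1.0;
    prot[1] = sign * p[1];
    prot[2] = sign * p[2];
    prot[3] = sign * p[3];
  } else {
    const double qt2v = qt2;                           // single read of the volatile
    const double qt = std::sqrt(qt2v);                 // transverse momentum of q
    const double qq = std::sqrt(qt2v + q[3] * q[3]);   // total momentum of q
    prot[1] = q[1] * q[3] / qq / qt * p[1] - q[2] / qt * p[2] + q[1] / qq * p[3];
    prot[2] = q[2] * q[3] / qq / qt * p[1] + q[1] / qt * p[2] + q[2] / qq * p[3];
    prot[3] = -qt / qq * p[1] + q[3] / qq * p[3];
  }
}

// Small usage check covering both branches.
inline void rotate_sketch_check()
{
  const double p[4] = {1.0, 2.0, 3.0, 4.0};
  double out[4];
  const double qz[4] = {1.0, 0.0, 0.0, 1.0};  // q along +z: prot must equal p
  rotate_sketch(p, qz, out);
  assert(out[1] == 2.0 && out[2] == 3.0 && out[3] == 4.0);
  const double qx[4] = {1.0, 1.0, 0.0, 0.0};  // q along +x: (px,py,pz) -> (pz,py,-px)
  rotate_sketch(p, qx, out);
  assert(out[1] == 4.0 && out[2] == 3.0 && out[3] == -2.0);
}
```

The zero-check branch is the one that matters here: it is the case where a speculatively executed division would raise the floating-point exception if traps are enabled.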


… test a different iconfig

In particular: the following triggers a SIGFPE reported in madgraph5#855 (crash in rotxxx that can be fixed adding volatile?)
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean

This also triggers a similar SIGFPE (initially reported in madgraph5#826)
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean


…p and gpucpp_826, to allow cherry-picking Olivier's fix_826 changes (later on, will include Olivier's gpucpp_826 change into gpucpp directly)

Revert "[tmad] update mg5amcnlo to f274cab55, adding volatile to prevent madgraph5#855 crashes in rotxxx (move this upstream as suggested by Olivier)"
This reverts commit 720ae02.

Revert "[valgrind] upgrade MG5AMC to include the merge of PR madgraph5#110 and PR madgraph5#112 into the gpucpp branch"
This reverts commit 7d3dc34.

Revert "[valgrind] upgrade MG5AMC to include the workaround for uninitialised values mg5amcnlo/mg5amcnlo#111"
This reverts commit f355965.

Revert "[valgrind] upgrade MG5AMC to include the fix for memory leak mg5amcnlo/mg5amcnlo#109"
This reverts commit 7bb4142.

…t on Olivier's latest fix_826 commit d23e773

1) Note about Olivier's latest fix_826 commit d23e773

Olivier's 75c05c5 includes his initial 6 commits in fix_826:
git log upstream/master --oneline -n1
0992927 (upstream/master, origin/color2, origin/actions) Merge pull request madgraph5#857 from valassi/tmad
git log --oneline 0992927..75c05c5
75c05c5 Merge branch 'master_june24' into fix_826
92a8284 better comment in coloramps
2bcea76 trying to fix git issue
63494ef change to Andrea convention of naming (but removing step variable)
5b6d065 increase readibility and move from map to array
41ddc38 fix a issue for omp compilation
bed2e12 try to fix the segfault on issue 826

Olivier's d23e773 is then a merge of the latest upstream/master in 75c05c5, fixing the MG5AMC conflict by setting it to 74fd166c1
git show d23e773
Merge: 75c05c5 0992927
update this branch with andrea fix in master
diff --cc MG5aMC/mg5amcnlo
- Subproject commit 10378b3c0971e1a241fd9dc365e592c92d1f13ba
-Subproject commit f274cab55d5d983c5612ca7ab3417ee796aa1a8c
++Subproject commit 74fd166c1e22bde2dfe01b2e001ac3b177628165

2) Note that, in MG5AMC, 74fd166c1 (obsolete branch gpucpp_826) is the same as 09c96dd17 (branch gpucpp):
git diff 74fd166c1 09c96dd17
[NO DIFF]
git log --oneline e428e38c6..09c96dd17
09c96dd17 (origin/gpucpp) allow for second exporter to have access to all variable used in the fortran exporter
9abf6a3ad Merge pull request madgraph5#113 from valassi/valassi_volatile
f274cab55 (ghav/valassi_volatile, valassi_volatile) Workaround for SIGFPE crashes in function rotxxx (madgraph5#855): add 'volatile' to prevent optimizations
0b8678984 Merge pull request madgraph5#112 from valassi/valassi_uninitialised111
18696c1cf Merge pull request madgraph5#110 from valassi/valassi_leak109
4f8fbb7f3 (ghav/valassi_uninitialised111) Workaround for issue madgraph5#111 reported by valgrind (initialise goodjet array in function setclscales in reweight.f)
f6d90fa58 (ghav/valassi_leak109, valassi_leak109) Fix memory leak madgraph5#109 in madevent_driver.f (close file dname.mg)
f9f957918 (valgrind) Fix validity time check for UFO pickle (madgraph5#97)
619f5db45 avoid that some parameter switch type when loading model
git log --oneline e428e38c6..74fd166c
74fd166c1 (HEAD, origin/gpucpp_826, gpucpp_826) Merge remote-tracking branch 'origin/gpucpp' (PR madgraph5#113 for madgraph5#855 crash in rotxxx) into gpucpp_826
9abf6a3ad Merge pull request madgraph5#113 from valassi/valassi_volatile
f274cab55 (ghav/valassi_volatile, valassi_volatile) Workaround for SIGFPE crashes in function rotxxx (madgraph5#855): add 'volatile' to prevent optimizations
e4d9df4ab Merge remote-tracking branch 'origin/gpucpp' (PRs madgraph5#110 and madgraph5#112 for issues madgraph5#109 and madgraph5#111) into gpucpp_826
0b8678984 Merge pull request madgraph5#112 from valassi/valassi_uninitialised111
18696c1cf Merge pull request madgraph5#110 from valassi/valassi_leak109
4f8fbb7f3 (ghav/valassi_uninitialised111) Workaround for issue madgraph5#111 reported by valgrind (initialise goodjet array in function setclscales in reweight.f)
f6d90fa58 (ghav/valassi_leak109, valassi_leak109) Fix memory leak madgraph5#109 in madevent_driver.f (close file dname.mg)
10378b3c0 allow for second exporter to have access to all variable used in the fortran exporter
f9f957918 (valgrind) Fix validity time check for UFO pickle (madgraph5#97)
619f5db45 avoid that some parameter switch type when loading model

3) Note that color includes the following submodule updates, passing through 09c96dd17 to ba54a4153
git show --oneline upstream/master..color ../../MG5aMC/
4b29496 [color] update MG5AMC to ba54a4153 in th egpuccp branch, with a minor fix in a comment for my icolamp patch
Submodule MG5aMC/mg5amcnlo 99e064157..ba54a4153:
> minor fix in a printout in my previous patch in export_cpp.py
1c2a02d [color] update MG5AMC to 99e064157, fixing bug madgraph5#856 (and related ones) about the icolamp array in coloramps.h
Submodule MG5aMC/mg5amcnlo 09c96dd17..99e064157:
> In export_cpp.py fix bug madgraph5#114 in get_icolamp_lines, resulting in different icolamp arrays for F77 and CPP (see madgraph5#873)
0a60262 [color] update MG5AMC to 09c96dd17: this is the latest gpucpp branch, now including Olivier's extra commit previously in gpucpp_826
Submodule MG5aMC/mg5amcnlo 10378b3c0...09c96dd17:
> allow for second exporter to have access to all variable used in the fortran exporter
> Merge pull request madgraph5#113 from valassi/valassi_volatile
> Merge pull request madgraph5#112 from valassi/valassi_uninitialised111
> Merge pull request madgraph5#110 from valassi/valassi_leak109
< allow for second exporter to have access to all variable used in the fortran exporter
16ff942 try to fix the segfault on issue 826
Submodule MG5aMC/mg5amcnlo f9f957918..10378b3c0:
> allow for second exporter to have access to all variable used in the fortran exporter
4b12e79 [color] temporarely downgrade back MG5AMC to the common base of gpucpp and gpucpp_826, to allow cherry-picking Olivier's fix_826 changes
> Submodule MG5aMC/mg5amcnlo f274cab55..f9f957918 (rewind):
< Workaround for SIGFPE crashes in function rotxxx (madgraph5#855): add 'volatile' to prevent optimizations
< Merge pull request madgraph5#112 from valassi/valassi_uninitialised111
< Merge pull request madgraph5#110 from valassi/valassi_leak109

=> Therefore I can simply merge origin/color into color2 and fix the MG5AMC conflict by setting it to ba54a4153 (valassi_icolamp114, before more recent changes)

valassi · 2024-07-04T17:27:54Z

Note: there is a crash #885 in master_june40 that I thought was related to this, but it is most likely unrelated (and is instead specific to master_june40).

valassi self-assigned this Jun 2, 2024

valassi mentioned this issue Jun 2, 2024

"Fix 826" (actually: fix iconfig-channel mapping) #852

Merged

valassi changed the title ~~tmad test crashes for some iconfig (channel/iconfig mapping issues and SIGFPE erroneous arithmetic operation)~~ tmad test crashes for some iconfig (SIGFPE erroneous arithmetic operation: crash in rotxxx and/or channel/iconfig mapping issues?) Jun 2, 2024

valassi mentioned this issue Jun 2, 2024

No cross section in SUSY gg_t1t1 log file #826

Closed

valassi mentioned this issue Jun 2, 2024

MAJOR ISSUE: color mismatch fortran/cpp in LHE file for iconfig 104 in SM gg_ttgg (channel/iconfig mapping AND icolamp issues) #856

Closed

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 2, 2024

[tmad] in gg_ttgg.mad and susy_gg_t1t1.mad make_opts, remove -g again…

8dace2f

…: note that SIGFPE madgraph5#855 is still fixed because volatile has been added

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 2, 2024

[tmad] in gg_tt.mad, add volatile in aloha_functions.f to fix SIGFPE m…

d98f939

…adgraph5#855 and prepare codegen backport

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 2, 2024

[tmad] in tmad/madX.sh, use iconfig=104 in ggttgg and iconfig=2 in su…

2883c56

…syggt1t1 to test madgraph5#855 fix while still exposing madgraph5#826 and madgraph5#856

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 2, 2024

[tmad] regenerate all processes - add volatile in aloha_functions.f t…

3bb76c8

…o fix SIGFPE madgraph5#855 in rotxxx

valassi mentioned this issue Jun 2, 2024

Fix SIGFPE crash (855) in rotxxx by adding volatile in aloha_functions.f #857

Merged

valassi linked a pull request Jun 2, 2024 that will close this issue

Fix SIGFPE crash (855) in rotxxx by adding volatile in aloha_functions.f #857

Merged

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 3, 2024

[tmad] temporarely rerun tmad tests for ggttgg and susyggt1t1 to expo…

4fa1790

…se SIGFPE madgraph5#855 - will revert ./tmad/teeMadX.sh -mix -makeclean +10x -ggttgg -susyggt1t1

valassi closed this as completed Jun 3, 2024

valassi reopened this Jun 24, 2024

valassi mentioned this issue Jun 25, 2024

valgrind issues #868

Closed

valassi mentioned this issue Jun 27, 2024

extend testsuite CI (split codegen from build/test, execute tests for many fptypes, add tmad tests) #794

Merged

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 27, 2024

[actions] ** COMPLETE ACTIONS ** reenable the 12 known issues: the CI…

7466525

… will now fail on rotxx crashes madgraph5#855 and on zero cross section madgraph5#826

valassi added a commit to valassi/mg5amcnlo that referenced this issue Jun 27, 2024

Workaround for SIGFPE crashes in function rotxxx (madgraph5/madgraph4…

f274cab

…gpu#855): add 'volatile' to prevent optimizations

valassi mentioned this issue Jun 27, 2024

Add 'volatile' in function rotxxx to prevent crashes in optimized code mg5amcnlo/mg5amcnlo#113

Merged

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 27, 2024

[tmad] in gg_tt.mad aloha_functions.f, improve the comment about madg…

1626858

…raph5#855 (prepare to move upstream to mg5amcnlo gpucpp)

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 27, 2024

[tmad] update mg5amcnlo to f274cab55, adding volatile to prevent madg…

720ae02

…raph5#855 crashes in rotxxx (move this upstream as suggested by Olivier)

valassi changed the title ~~tmad test crashes for some iconfig (SIGFPE erroneous arithmetic operation: crash in rotxxx and/or channel/iconfig mapping issues?)~~ tmad test crashes in rotxxx (SIGFPE erroneous arithmetic operation) Jun 27, 2024

valassi closed this as completed in #857 Jun 27, 2024

valassi added a commit to mg5amcnlo/mg5amcnlo that referenced this issue Jun 27, 2024

Merge remote-tracking branch 'origin/gpucpp' (PR #113 for madgraph5/m…

74fd166

…adgraph4gpu#855 crash in rotxxx) into gpucpp_826

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 27, 2024

[tmad] in tmad/madX.sh, use iconfig=104 in ggttgg and iconfig=2 in su…

21c6945

…syggt1t1 to test madgraph5#855 fix while still exposing madgraph5#826 and madgraph5#856

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 27, 2024

[tmad] in tmad/madX.sh, use iconfig=104 in ggttgg and iconfig=2 in su…

0920709

…syggt1t1 to test madgraph5#855 fix while still exposing madgraph5#826 and madgraph5#856

valassi mentioned this issue Jul 4, 2024

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

valassi mentioned this issue Jul 31, 2024

DIVBYZERO and INVALID FPEs in CMS tests (pp_dy012j.mad in P2_uc_epemuc/G2) #942

Open
