(4 in pipeline) Faster RDTSC-based timers and new timer/counter APIs #1018
Draft
valassi wants to merge 123 commits into madgraph5:valassi_3_grid from valassi:prof0
Conversation
…a counters namespace
…toring of counters using maps and explicit register methods
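For orientation, a map-based registry with explicit register calls, as described in this commit, could look roughly like the following C++ sketch. All function and member names here (counters_register, counters_start, counters_stop, counters_report) are illustrative assumptions, not the actual counters.cc API.

```cpp
// Minimal sketch of a map-based counter registry with explicit registration.
// The names below are illustrative assumptions, not the actual counters.cc API.
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

namespace counters
{
  struct Counter
  {
    std::string name;
    double totalSeconds = 0;
    unsigned long long nEvents = 0;
    std::chrono::steady_clock::time_point startTime;
  };

  static std::map<int, Counter> registry;

  // Explicitly register a counter ID with a printable name
  void counters_register( int id, const std::string& name ) { registry[id] = Counter{ name }; }

  void counters_start( int id ) { registry[id].startTime = std::chrono::steady_clock::now(); }

  void counters_stop( int id, unsigned long long nevt )
  {
    Counter& c = registry[id];
    c.totalSeconds += std::chrono::duration<double>( std::chrono::steady_clock::now() - c.startTime ).count();
    c.nEvents += nevt;
  }

  // Dump all counters in a format similar to the [COUNTERS] lines in the logs
  void counters_report()
  {
    for( const auto& [id, c] : registry )
      std::printf( "[COUNTERS] %s ( %d ) : %9.4fs for %llu events\n", c.name.c_str(), id, c.totalSeconds, c.nEvents );
  }
}
```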
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.4510s
[COUNTERS] Fortran Overhead ( 0 ) : 1.3466s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0871s for 16384 events => throughput is 5.32E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0164s for 16399 events => throughput is 1.00E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
INFO: No Floating Point Exceptions have been reported
[COUNTERS] PROGRAM TOTAL : 1.9073s
[COUNTERS] Fortran Overhead ( 0 ) : 1.2890s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5218s for 98304 events => throughput is 5.31E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0958s for 98371 events => throughput is 9.74E-07 events/s
…ke cleanall and rebuild)
Note: the counter itself has a huge overhead...
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7742s
[COUNTERS] Fortran Overhead ( 0 ) : 0.5162s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0906s for 16384 events => throughput is 5.53E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0174s for 16399 events => throughput is 1.06E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1493s for 98304 events => throughput is 1.52E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.1335s
[COUNTERS] Fortran Overhead ( 0 ) : 2.6717s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5176s for 98304 events => throughput is 5.27E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0961s for 98371 events => throughput is 9.77E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.8474s for 589824 events => throughput is 1.44E-06 events/s
…ain, to reduce performance overhead from counters themselves
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.4700s
[COUNTERS] Fortran Overhead ( 0 ) : 1.2236s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0867s for 16384 events => throughput is 5.29E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0162s for 16399 events => throughput is 9.88E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1428s for 98304 events => throughput is 1.45E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9569s
[COUNTERS] Fortran Overhead ( 0 ) : 0.4895s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5181s for 98304 events => throughput is 5.27E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0958s for 98371 events => throughput is 9.74E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.8528s for 589824 events => throughput is 1.45E-06 events/s
…points
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7442s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2437s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0871s for 16384 events => throughput is 5.32E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0162s for 16399 events => throughput is 9.86E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1335s for 98304 events => throughput is 1.36E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2629s for 16399 events => throughput is 1.60E-05 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9099s
[COUNTERS] Fortran Overhead ( 0 ) : 0.3233s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5203s for 98304 events => throughput is 5.29E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0956s for 98371 events => throughput is 9.71E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.7980s for 589824 events => throughput is 1.35E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.1719s for 98371 events => throughput is 1.75E-06 events/s
NB: there is some hysteresis: the timing results depend on what was executed before. For instance, x1 results may be 0.7s or 1.5s, and x10 results may be 1.5s or 4.1s; this does NOT depend on the software version!
Start with x1, several times; eventually it gives 0.7s:
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7417s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2435s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0861s for 16384 events => throughput is 5.26E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1345s for 98304 events => throughput is 1.37E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2603s for 16399 events => throughput is 1.59E-05 events/s
Then the FIRST execution of x10 gives 1.9s:
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9285s
[COUNTERS] Fortran Overhead ( 0 ) : 0.3277s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5237s for 98304 events => throughput is 5.33E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0964s for 98371 events => throughput is 9.80E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.8057s for 589824 events => throughput is 1.37E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.1741s for 98371 events => throughput is 1.77E-06 events/s
But the SECOND execution gives 4.1s, with the big increase coming from the I/O part (and any subsequent execution also gives the same):
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.1048s
[COUNTERS] Fortran Overhead ( 0 ) : 1.1119s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5161s for 98304 events => throughput is 5.25E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0946s for 98371 events => throughput is 9.62E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.7954s for 589824 events => throughput is 1.35E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 1.5861s for 98371 events => throughput is 1.61E-05 events/s
Now the FIRST execution of x1 gives 1.4s!
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.4677s
[COUNTERS] Fortran Overhead ( 0 ) : 0.5601s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0861s for 16384 events => throughput is 5.26E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0167s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1338s for 98304 events => throughput is 1.36E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.6702s for 16399 events => throughput is 4.09E-05 events/s
But the SECOND execution gives again 0.7s, and all subsequent executions too (so we are back at the beginning of the loop above):
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7480s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2472s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0870s for 16384 events => throughput is 5.31E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1337s for 98304 events => throughput is 1.36E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2628s for 16399 events => throughput is 1.60E-05 events/s
In the following, I will quote results for the second x1 and the first x10 only...
…een defined
I had done this to try and decrease the 4.1s... but in the meantime I understood that the problem is elsewhere. In particular, this is not faster than string comparison - will revert!
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7451s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2426s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0875s for 16384 events => throughput is 5.34E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0170s for 16399 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1342s for 98304 events => throughput is 1.37E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2631s for 16399 events => throughput is 1.60E-05 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.8970s
[COUNTERS] Fortran Overhead ( 0 ) : 0.3151s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5182s for 98304 events => throughput is 5.27E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0952s for 98371 events => throughput is 9.67E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.7950s for 589824 events => throughput is 1.35E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.1729s for 98371 events => throughput is 1.76E-06 events/s
…g if a counter has been defined: use string comparison to "", it is not slower
Revert "[prof] in gg_tt.mad counters.cc add a flag showing if a counter has been defined"
This reverts commit ee6f9f5.
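As a side note on the reverted flag, the two "is this counter defined" checks compared in these commits might look like the following sketch; the array-based storage and function names here are illustrative assumptions. Comparing a std::string against "" essentially only inspects the stored length, which is consistent with the observation that it is not slower than an explicit flag.

```cpp
// Sketch of the two ways of testing whether a counter slot is in use
// (storage layout and names are illustrative assumptions, not counters.cc).
#include <string>

struct Counter
{
  std::string name; // empty string means "not registered"
  bool defined = false; // the (reverted) explicit flag
  double totalSeconds = 0;
};

static Counter counters[32];

// Variant 1 (reverted): test the explicit boolean flag
inline bool isDefinedFlag( int id ) { return counters[id].defined; }

// Variant 2 (kept): compare the stored name against "" - for an empty
// comparand this reduces to a length check, so it is essentially free
inline bool isDefinedName( int id ) { return counters[id].name != ""; }
```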
…BLECOUNTERS to disable individual counters
I initially wanted to use this to check whether it is the individual counters that cause the 4.1s in the x10 tests. But in the meantime I understood that the problem is elsewhere, and that the timings depend on execution order! Will probably revert.
Note: the second x1 execution takes 0.7s, with or without CUDACPP_RUNTIME_DISABLECOUNTERS:
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7485s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2472s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0872s for 16384 events => throughput is 5.32E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1346s for 98304 events => throughput is 1.37E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2621s for 16399 events => throughput is 1.60E-05 events/s
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7349s
And then the first x10 execution takes 1.9s, with or without CUDACPP_RUNTIME_DISABLECOUNTERS:
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9127s
[COUNTERS] Fortran Overhead ( 0 ) : 0.3268s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5172s for 98304 events => throughput is 5.26E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0964s for 98371 events => throughput is 9.80E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.7992s for 589824 events => throughput is 1.36E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.1723s for 98371 events => throughput is 1.75E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.8511s
While the SECOND x10 execution takes 4.1s, with or without CUDACPP_RUNTIME_DISABLECOUNTERS:
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.1152s
[COUNTERS] Fortran Overhead ( 0 ) : 1.1174s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5173s for 98304 events => throughput is 5.26E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0008s
[COUNTERS] Fortran X2F ( 4 ) : 0.0950s for 98371 events => throughput is 9.65E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.8117s for 589824 events => throughput is 1.38E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 1.5731s for 98371 events => throughput is 1.60E-05 events/s
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.0680s
Will therefore revert this.
…CUDACPP_RUNTIME_DISABLECOUNTERS to disable individual counters
Revert "[prof] in gg_tt.mad counters add an env variable CUDACPP_RUNTIME_DISABLECOUNTERS to disable individual counters"
This reverts commit 0681a76.
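The (since reverted) runtime kill switch can be illustrated with a short sketch. The environment variable name CUDACPP_RUNTIME_DISABLECOUNTERS is taken from the commits above; the surrounding function names are illustrative assumptions.

```cpp
// Sketch of a runtime kill switch for the counters, read once from the
// environment (the variable name is from the commits above; the surrounding
// code is an illustrative assumption, not the actual counters.cc).
#include <cstdlib>

namespace counters
{
  bool countersDisabled()
  {
    // getenv is evaluated only once; the result is cached in a static
    static const bool disabled = ( std::getenv( "CUDACPP_RUNTIME_DISABLECOUNTERS" ) != nullptr );
    return disabled;
  }

  void counters_start( int /*id*/ )
  {
    if( countersDisabled() ) return; // skip all timer work when disabled
    // ... otherwise start the timer for this counter as usual ...
  }
}
```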
…ther and make it counter[0]
No change in the timings.
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7531s
[COUNTERS] Fortran Other ( 0 ) : 0.2447s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0862s for 16384 events => throughput is 5.26E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.1395s for 98304 events => throughput is 1.42E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.2653s for 16399 events => throughput is 1.62E-05 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9572s
[COUNTERS] Fortran Other ( 0 ) : 0.3215s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.5202s for 98304 events => throughput is 5.29E-06 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0007s
[COUNTERS] Fortran X2F ( 4 ) : 0.0941s for 98371 events => throughput is 9.57E-07 events/s
[COUNTERS] Fortran PDF ( 5 ) : 0.8486s for 589824 events => throughput is 1.44E-06 events/s
[COUNTERS] Fortran I/O ( 6 ) : 0.1720s for 98371 events => throughput is 1.75E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7543s
[COUNTERS] Fortran Other ( 0 ) : 0.2451s
[COUNTERS] Fortran X2F ( 1 ) : 0.0163s for 16399 events => throughput is 9.95E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1419s for 98304 events => throughput is 1.44E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.2617s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0885s for 16384 events => throughput is 5.40E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9649s
[COUNTERS] Fortran Other ( 0 ) : 0.3239s
[COUNTERS] Fortran X2F ( 1 ) : 0.0951s for 98371 events => throughput is 9.67E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.8467s for 589824 events => throughput is 1.44E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.1783s for 98371 events => throughput is 1.81E-06 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.5202s for 98304 events => throughput is 5.29E-06 events/s
…xcluded from fortran other calculation)
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7510s
[COUNTERS] Fortran Other ( 0 ) : 0.2485s
[COUNTERS] Fortran X2F ( 1 ) : 0.0163s for 16399 events => throughput is 9.94E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1359s for 98304 events => throughput is 1.38E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.2628s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0868s for 16384 events => throughput is 5.30E-06 events/s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6822s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9135s
[COUNTERS] Fortran Other ( 0 ) : 0.3225s
[COUNTERS] Fortran X2F ( 1 ) : 0.0938s for 98371 events => throughput is 9.54E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.7961s for 589824 events => throughput is 1.35E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.1819s for 98371 events => throughput is 1.85E-06 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.5184s for 98304 events => throughput is 5.27E-06 events/s
[COUNTERS] PROGRAM sample_full ( 11 ) : 1.8445s
… that what is left is something inside sample_full
Rephrasing: PROGRAM TOTAL = sample_full + initial_I/O, and Fortran Other is inside sample_full.
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7697s
[COUNTERS] Fortran Other ( 0 ) : 0.1810s
[COUNTERS] Fortran X2F ( 1 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1355s for 98304 events => throughput is 1.38E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.2672s for 16399 events => throughput is 1.63E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0877s for 16384 events => throughput is 5.35E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0808s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6860s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 2.0621s
[COUNTERS] Fortran Other ( 0 ) : 0.2829s
[COUNTERS] Fortran X2F ( 1 ) : 0.1024s for 98371 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.8580s for 589824 events => throughput is 1.45E-06 events/s
[COUNTERS] Fortran I/O ( 3 ) : 0.1838s for 98371 events => throughput is 1.87E-06 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.5532s for 98304 events => throughput is 5.63E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0811s
[COUNTERS] PROGRAM sample_full ( 11 ) : 1.9780s
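As a rough check on the x1 numbers above: sample_full (0.6860s) plus initial_I/O (0.0808s) gives 0.7668s, against a PROGRAM TOTAL of 0.7697s; the remaining ~0.003s is presumably untimed startup/teardown outside both counters.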
…side the function to the calling sequence in sample_full
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7679s
[COUNTERS] Fortran Other ( 0 ) : 0.1849s
[COUNTERS] Fortran X2F ( 1 ) : 0.0169s for 16399 events => throughput is 1.03E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1380s for 98304 events => throughput is 1.40E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2611s for 16399 events => throughput is 1.59E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0877s for 16384 events => throughput is 5.35E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0785s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6862s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x10_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.9454s
[COUNTERS] Fortran Other ( 0 ) : 0.2618s
[COUNTERS] Fortran X2F ( 1 ) : 0.0961s for 98371 events => throughput is 9.77E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.8161s for 589824 events => throughput is 1.38E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.1695s for 98371 events => throughput is 1.72E-06 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.5216s for 98304 events => throughput is 5.31E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0794s
[COUNTERS] PROGRAM sample_full ( 11 ) : 1.8627s
…ing (as "test12" for the moment, wip)
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7447s
[COUNTERS] Fortran Other ( 0 ) : 0.1308s
[COUNTERS] Fortran X2F ( 1 ) : 0.0163s for 16399 events => throughput is 9.93E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1328s for 98304 events => throughput is 1.35E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2614s for 16399 events => throughput is 1.59E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0878s for 16384 events => throughput is 5.36E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0649s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6768s
[COUNTERS] Fortran TEST ( 12 ) : 0.0499s for 16384 events => throughput is 3.05E-06 events/s
…or the moment, wip)
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7526s
[COUNTERS] Fortran Other ( 0 ) : 0.1163s
[COUNTERS] Fortran X2F ( 1 ) : 0.0165s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1428s for 98304 events => throughput is 1.45E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2589s for 16399 events => throughput is 1.58E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0870s for 16384 events => throughput is 5.31E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0659s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6829s
[COUNTERS] Fortran TEST ( 12 ) : 0.0537s for 16384 events => throughput is 3.28E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0108s for 16384 events => throughput is 6.58E-07 events/s
This essentially completes the identification of all the bottlenecks. Must now clean up the timers (and remove the double counting: "Fortran Other" is now negative?).
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7581s
[COUNTERS] Fortran Other ( 0 ) : -0.0298s
[COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1441s for 98304 events => throughput is 1.47E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2627s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0882s for 16384 events => throughput is 5.38E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0656s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6896s
[COUNTERS] Fortran TEST ( 12 ) : 0.0533s for 16384 events => throughput is 3.25E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0105s for 16384 events => throughput is 6.41E-07 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1461s for 16384 events => throughput is 8.91E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7519s
[COUNTERS] Fortran Other ( 0 ) : -0.0299s
[COUNTERS] Fortran X2F ( 1 ) : 0.0165s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1421s for 98304 events => throughput is 1.45E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2589s for 16399 events => throughput is 1.58E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0873s for 16384 events => throughput is 5.33E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0651s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6838s
[COUNTERS] Fortran TEST ( 12 ) : 0.0542s for 16384 events => throughput is 3.31E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0102s for 16384 events => throughput is 6.26E-07 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1467s for 16384 events => throughput is 8.95E-06 events/s
…er.f
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7533s
[COUNTERS] Fortran Other ( 0 ) : -0.0253s
[COUNTERS] Fortran X2F ( 1 ) : 0.0165s for 16399 events => throughput is 1.00E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1355s for 98304 events => throughput is 1.38E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2633s for 16399 events => throughput is 1.61E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0897s for 16384 events => throughput is 5.48E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0649s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6855s
[COUNTERS] Fortran TEST ( 12 ) : 0.0490s for 16384 events => throughput is 2.99E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0102s for 16384 events => throughput is 6.20E-07 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1488s for 16384 events => throughput is 9.08E-06 events/s
…g1.f
This changes the overall balance: now Fortran Other is positive again. This is because pdg2pdf is also called elsewhere (e.g. in unwgt?), where it was already being profiled.
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7551s
[COUNTERS] Fortran Other ( 0 ) : 0.0111s
[COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0986s for 32768 events => throughput is 3.01E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2633s for 16399 events => throughput is 1.61E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0879s for 16384 events => throughput is 5.36E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0662s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6862s
[COUNTERS] Fortran TEST ( 12 ) : 0.0515s for 16384 events => throughput is 3.14E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0099s for 16384 events => throughput is 6.07E-07 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1492s for 16384 events => throughput is 9.11E-06 events/s
Now "Fortran Other" becomes negative again, there is again some double counting ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp [COUNTERS] PROGRAM TOTAL : 0.7511s [COUNTERS] Fortran Other ( 0 ) : -0.0373s [COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s [COUNTERS] Fortran PDF ( 2 ) : 0.0965s for 32768 events => throughput is 2.94E-06 events/s [COUNTERS] Fortran final_I/O ( 3 ) : 0.2598s for 16399 events => throughput is 1.58E-05 events/s [COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s [COUNTERS] CudaCpp MEs ( 6 ) : 0.0868s for 16384 events => throughput is 5.30E-06 events/s [COUNTERS] Fortran initial_I/O ( 7 ) : 0.0670s [COUNTERS] PROGRAM sample_full ( 11 ) : 0.6811s [COUNTERS] Fortran TEST ( 12 ) : 0.0506s for 16384 events => throughput is 3.09E-06 events/s [COUNTERS] Fortran TEST2 ( 13 ) : 0.0099s for 16384 events => throughput is 6.01E-07 events/s [COUNTERS] Fortran TEST3 ( 14 ) : 0.0541s for 16384 events => throughput is 3.30E-06 events/s [COUNTERS] Fortran TEST5 ( 16 ) : 0.1462s for 16384 events => throughput is 8.93E-06 events/s
This makes it clearer that PROGRAM TOTAL = sample_full + initial_I/O.
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7554s
[COUNTERS] Fortran Other ( 0 ) : -0.0393s
[COUNTERS] Fortran X2F ( 1 ) : 0.0171s for 16399 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0984s for 32768 events => throughput is 3.00E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2621s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0872s for 16384 events => throughput is 5.32E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0688s
[COUNTERS] Fortran TEST ( 12 ) : 0.0521s for 16384 events => throughput is 3.18E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0100s for 16384 events => throughput is 6.08E-07 events/s
[COUNTERS] Fortran TEST3 ( 14 ) : 0.0507s for 16384 events => throughput is 3.09E-06 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1478s for 16384 events => throughput is 9.02E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0688s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6838s
…grouping
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7428s
[COUNTERS] Fortran Other ( 0 ) : -0.0409s
[COUNTERS] Fortran X2F ( 1 ) : 0.0169s for 16399 events => throughput is 1.03E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0982s for 32768 events => throughput is 3.00E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2585s for 16399 events => throughput is 1.58E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0865s for 16384 events => throughput is 5.28E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0670s
[COUNTERS] Fortran grouping ( 12 ) : 0.0520s for 16384 events => throughput is 3.17E-06 events/s
[COUNTERS] Fortran scale ( 13 ) : 0.0098s for 16384 events => throughput is 5.98E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0497s for 16384 events => throughput is 3.03E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1445s for 16384 events => throughput is 8.82E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0670s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6728s
…s, which was causing double counting and a negative Fortran Other
The problem is that select_grouping_choice calls dsigproc, which eventually calls dsig1, which includes the pdf profiling.
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7643s
[COUNTERS] Fortran Other ( 0 ) : 0.0111s
[COUNTERS] Fortran X2F ( 1 ) : 0.0164s for 16399 events => throughput is 9.98E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1013s for 32768 events => throughput is 3.09E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2712s for 16399 events => throughput is 1.65E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0874s for 16384 events => throughput is 5.34E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0663s
[COUNTERS] Fortran scale ( 13 ) : 0.0103s for 16384 events => throughput is 6.26E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0511s for 16384 events => throughput is 3.12E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1484s for 16384 events => throughput is 9.06E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0663s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6950s
…sig1 (not only dsig1_vec), but it does not show up! - will revert
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7479s
[COUNTERS] Fortran Other ( 0 ) : 0.0122s
[COUNTERS] Fortran X2F ( 1 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0974s for 32768 events => throughput is 2.97E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2625s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0873s for 16384 events => throughput is 5.33E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0657s
[COUNTERS] Fortran scale ( 13 ) : 0.0102s for 16384 events => throughput is 6.21E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0494s for 16384 events => throughput is 3.01E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1459s for 16384 events => throughput is 8.90E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0657s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6793s
Revert "[prof] in gg_tt.mad auto_dsig1.f, add profiling for matrix1 also in dsig1 (not only dsig1_vec), but it does not show up! - will revert" This reverts commit d3165cb.
…ging
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…ated code except gg_tt.mad for easier merging
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/Source/dsample.f | grep -v ^gg_tt.mad)
…also amd and v1.00.01 fixes) into prof
Fix conflicts (use upstream/master version): epochX/cudacpp/gg_tt.mad/Source/dsample.f
Will then regenerate patches from this gg_tt.mad.
…/master including v1.00.00 and also amd and v1.00.01 fixes
The only files that still need to be patched are:
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f).
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
(Later checked that regenerating gg_tt.mad gives no change.)
…and also amd and v1.00.01 fixes)
git checkout grid $(git ls-tree --name-only grid */CODEGEN*txt)
Fix conflicts: epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (will regenerate anyway)
…ncard/bldall/tlau) into prof - essentially no change to the version created by fixing conflicts
(NB: THIS IS THE LAST "PROF" CHANGE FOR THE MOMENT - WILL "TEMPORARILY" MOVE TO A SIMPLER "PROF0")
("PROF0" HAS THE NEW TIMERS/COUNTERS OF PROF WITH NEW APIS, BUT NOT THE ADDITIONAL PROFILING OF FORTRAN COMPONENTS)
The only files that still need to be patched are:
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f).
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
(Later checked that regenerating gg_tt.mad gives no change.)
…nal profiling of Fortran components: keep only the new timers and counters
…port the "temporary" changes in auto_dsig1.f (so that they do not need to go to patch.P1)
…after "temporarely" removing additional Fortran profiling (and modifying other CODEGEN fragments accordingly) The only files that still need to be patched are - 1 in patch.common: Source/genps.inc, SubProcesses/makefile - 2 in patch.P1: driver.f, matrix1.f (Note in particular that the 'prof0' changes over 'grid' in auto_dsig1.f are in smatrix_multi.f and output.py) ./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 git checkout gg_tt.mad (Later regenerated gg_tt.mad and checked that all is ok)
…y" simplification of profiling
…mall-g 72h) - all ok
STARTED AT Mon 07 Oct 2024 01:56:32 AM EEST
./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean -nocuda
ENDED(1) AT Mon 07 Oct 2024 02:26:23 AM EEST [Status=0]
./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean -nocuda
ENDED(2) AT Mon 07 Oct 2024 02:36:48 AM EEST [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean -nocuda
ENDED(3) AT Mon 07 Oct 2024 02:44:54 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst -nocuda
ENDED(4) AT Mon 07 Oct 2024 02:46:40 AM EEST [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -nocuda'
ENDED(5) AT Mon 07 Oct 2024 02:46:40 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -nocuda
ENDED(6) AT Mon 07 Oct 2024 02:48:26 AM EEST [Status=0]
./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean -nocuda
ENDED(7) AT Mon 07 Oct 2024 03:09:29 AM EEST [Status=0]
No errors found in logs
No FPEs or '{ }' found in logs
…cted
STARTED AT Mon 07 Oct 2024 03:09:30 AM EEST
(SM tests) ENDED(1) AT Mon 07 Oct 2024 05:27:02 AM EEST [Status=0]
(BSM tests) ENDED(1) AT Mon 07 Oct 2024 05:36:08 AM EEST [Status=0]
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
12 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
16 /users/valassia/GPU2024/madgraph4gpu/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
…l ok
STARTED AT Mon Oct 7 12:53:50 AM CEST 2024
./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean -cpponly
ENDED(1) AT Mon Oct 7 01:12:41 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean -cpponly
ENDED(2) AT Mon Oct 7 01:19:47 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean -cpponly
ENDED(3) AT Mon Oct 7 01:24:43 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst -cpponly
ENDED(4) AT Mon Oct 7 01:26:11 AM CEST 2024 [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -cpponly'
ENDED(5) AT Mon Oct 7 01:26:11 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -cpponly
ENDED(6) AT Mon Oct 7 01:27:39 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean -cpponly
ENDED(7) AT Mon Oct 7 01:36:12 AM CEST 2024 [Status=0]
No errors found in logs
No FPEs or '{ }' found in logs
valassi changed the title from "Faster RDTSC-based timers and new timer/counter APIs" to "(4 in pipeline) Faster RDTSC-based timers and new timer/counter APIs" on Oct 7, 2024.
Hi @oliviermattelaer, as discussed via email, this is N=4, the last PR in the pipeline that I would like to merge. Again, I changed this to target N=3 for easier review, but I would then merge it into master once approved. Let me know what you think, please. Thanks! Andrea
STARTED AT Mon Oct 7 01:36:12 AM CEST 2024
(SM tests) ENDED(1) AT Mon Oct 7 04:33:19 AM CEST 2024 [Status=0]
(BSM tests) ENDED(1) AT Mon Oct 7 04:38:35 AM CEST 2024 [Status=0]
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2024/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
… all ok
STARTED AT Mon Oct 7 12:57:24 AM CEST 2024
./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Mon Oct 7 01:27:08 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Mon Oct 7 01:36:53 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean
ENDED(3) AT Mon Oct 7 01:45:56 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst
ENDED(4) AT Mon Oct 7 01:48:41 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -curhst
ENDED(5) AT Mon Oct 7 01:51:24 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common
ENDED(6) AT Mon Oct 7 01:54:14 AM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Mon Oct 7 02:06:51 AM CEST 2024 [Status=0]
No errors found in logs
No FPEs or '{ }' found in logs
…cted (heft fail madgraph5#833)
STARTED AT Mon Oct 7 02:06:51 AM CEST 2024
(SM tests) ENDED(1) AT Mon Oct 7 05:53:08 AM CEST 2024 [Status=0]
(BSM tests) ENDED(1) AT Mon Oct 7 06:03:23 AM CEST 2024 [Status=0]
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
… branch prof0 (new timers madgraph5#972) and add copyright
For the moment I have moved this back to draft while working on the base PRs.
This PR includes faster RDTSC-based timers and new timer/counter APIs. It completes #972.
This PR ("prof0") was derived from the pre-existing PR #962 ("prof"), by stripping off the second part (additional profiling of non-ME fortran components) and keeping only the first part (new RDTSC based timers and new APIs).
The idea is that the additional profiling of non-ME fortran components will be done at a later time in #962, but it will be modified to include patches in upstream mg5amcnlo as suggested by @oliviermattelaer , rather than relying on patchMad.sh with much larger patches, as is done presently.
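For reference, the core idea of an RDTSC-based timer can be sketched as follows: read the x86 time-stamp counter with the __rdtsc intrinsic at start and stop, and convert tick deltas to seconds via a one-time calibration. This is only an illustration of the technique; the class name, calibration method and API below are assumptions, not the actual code in this PR.

```cpp
// Minimal sketch of an RDTSC-based timer (illustrative only; the timer in
// this PR may be structured differently). Reading the TSC is much cheaper
// than a syscall-backed clock, but ticks must be calibrated against a wall
// clock, and a constant/invariant TSC is assumed.
#include <chrono>
#include <cstdio>
#include <x86intrin.h> // __rdtsc (GCC/Clang on x86-64)

class RdtscTimer
{
public:
  void start() { m_start = __rdtsc(); }
  double stopAndGetSeconds() { return ( __rdtsc() - m_start ) / ticksPerSecond(); }
private:
  static double ticksPerSecond()
  {
    // One-time calibration against std::chrono (an assumption: real code
    // might instead obtain the TSC frequency from the OS or CPUID)
    static const double tps = []() {
      const unsigned long long t0 = __rdtsc();
      const auto c0 = std::chrono::steady_clock::now();
      while( std::chrono::steady_clock::now() - c0 < std::chrono::milliseconds( 100 ) ) {}
      const unsigned long long t1 = __rdtsc();
      return ( t1 - t0 ) / 0.1; // ticks counted over a 100 ms window
    }();
    return tps;
  }
  unsigned long long m_start = 0;
};

int main()
{
  RdtscTimer timer;
  timer.start();
  double x = 0;
  for( int i = 0; i < 10000000; i++ ) x += 1e-7; // some work to time
  std::printf( "elapsed: %.6fs (x=%f)\n", timer.stopAndGetSeconds(), x );
  return 0;
}
```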