While doing some follow-up on #563 and #564 (printing out how many good helicities are used in cudacpp and fortran, and understanding the differences between how good helicities are calculated in fortran and in cudacpp), I realised something I do not like:
- in cudacpp, the calculation of good helicities was designed for the initial standalone application, where we run cycles over very large grids of events
- as a consequence, the calculation of good helicities in cudacpp is always done ONLY on one grid of events through the Bridge (see the sketch below)
- currently this is not a problem, because we always use at least 16 or 32 events even in C++, but there may be cases where we use much smaller grids?
- in any case, there is always the alternative of doing it like in fortran: there is no separate computegoodhelicity call, you just keep computing matrix elements and then at some point you stop?
Probably not... we should probably just hardcode that at least 16 events must be computed, and this must be documented everywhere.
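For reference, here is a minimal C++ sketch of the filtering idea described in the list above (this is not the actual cudacpp code: `ncomb`, `calculateME` and `computeGoodHelicities` are illustrative placeholders, not the real kernel names or signatures). A helicity is marked "good" if it gives a non-zero ME contribution for at least one event of a single grid.

```cpp
#include <array>
#include <vector>

constexpr int ncomb = 16; // number of helicity combinations (illustrative value)

// hypothetical stand-in for the per-event, per-helicity ME computation
double calculateME( const std::vector<double>& momenta, int ihel )
{
  double amp = 0.;
  for( double p : momenta ) amp += p * ( ihel + 1 ); // dummy formula, placeholder only
  return amp * amp;
}

// mark a helicity as "good" if it contributes for at least one event of the grid
std::array<bool, ncomb> computeGoodHelicities( const std::vector<std::vector<double>>& grid )
{
  std::array<bool, ncomb> isGoodHel{}; // all false initially
  for( const auto& momenta : grid )
    for( int ihel = 0; ihel < ncomb; ++ihel )
      if( calculateME( momenta, ihel ) != 0. ) isGoodHel[ihel] = true;
  return isGoodHel;
}
```

The point is that the list of good helicities is derived entirely from whichever single grid is passed in, which is why a very small grid could in principle miss some helicities.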
Opening this issue anyway for info...
In any case, it may be useful to remove sigmaKin_getGoodHel. Keeping it separate makes it seem that there is some magically different operation, while actually it is as simple as this: you compute the first grid of events, and from those results you get the good helicities.
The only reason why this has been kept separate is to be able to compute throughputs in MEs/s that are measured only over the correct number of helicities...
Or maybe actually this is a reasonable requirement and we had better keep sigmaKin_getGoodHel...
Yes, actually this was added explicitly, see #461. With CUDA, and only one cycle of one grid, your apparent throughputs are a factor of 2 slower if you do not precompute helicities.
**Essentially: in a cudacpp bridge, the first grid goes through the ME calculation twice, the first time for helicity filtering, the second time for the MEs**
Functionally, the above is nonsense. But since we are focusing on throughputs, and we often run only one grid in CUDA, it is important to keep these two steps separate.
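To make the two-pass flow explicit, here is a rough sketch assuming a simplified Bridge-like interface (the class and method names below are assumptions, not the real cudacpp API): on the first grid the MEs are evaluated once over all helicities just to filter them, and then evaluated again over the good helicities only; all subsequent grids only go through the second step.

```cpp
#include <vector>

// simplified stand-in for the cudacpp Bridge (names and signatures are assumptions)
class BridgeSketch
{
public:
  void sequence( const std::vector<double>& grid, std::vector<double>& mes )
  {
    if( !m_goodHelComputed )
    {
      filterHelicities( grid ); // extra pass over the FIRST grid only (cf sigmaKin_getGoodHel)
      m_goodHelComputed = true;
    }
    computeMEs( grid, mes ); // every grid, including the first, also goes through this pass
  }

private:
  void filterHelicities( const std::vector<double>& /*grid*/ )
  {
    // evaluate all helicity combinations once and record the good ones (placeholder)
  }
  void computeMEs( const std::vector<double>& grid, std::vector<double>& mes )
  {
    // evaluate MEs summing only over the good helicities (placeholder)
    mes.assign( grid.size(), 0. );
  }
  bool m_goodHelComputed = false;
};
```

This also makes it clear why, when only one grid is run (as in the CUDA throughput tests), folding the filtering into the ME pass would make the measured MEs/s look roughly a factor of 2 slower.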