Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP add pp_tt to repo (plus obsolete fixes for bug 872 via reset_cumulative_variable, now fixed by Olivier via a single helicity filter) #935

Draft
wants to merge 123 commits into
base: master
Choose a base branch
from

Conversation

valassi
Copy link
Member

@valassi valassi commented Jul 23, 2024

This is a WIP PR to investigate the pp_tt012j bug 872

valassi added 10 commits July 23, 2024 12:04
NB: there are big differences between these two directories
diff pp_tt012j.mad/SubProcesses/P2_gu_ttxgu/ gu_ttgu.mad/SubProcesses/P1_gu_ttxgu/
…s not show up here?...)

./tmad/teeMadX.sh -guttgu -makeclean +10x
NB1: there are big differences between these two directories
diff pp_tt012j.mad/SubProcesses/P2_gu_ttxgu/ gq_ttgq.mad/SubProcesses/P1_gu_ttxgu/

NB2: there are some differemces between these two directories, but not that many
(the main difference is MAXPROC=4 vs MAXPROC=1)
diff gu_ttgu.mad/SubProcesses/P1_gu_ttxgu/ gq_ttgq.mad/SubProcesses/P1_gu_ttxgu/
…does not show up here?...)

./tmad/teeMadX.sh -gqttgq -makeclean +10x
…rks - P1_gux_ttxgux disappears, but P1_gu_ttxgu is identical
valassi added 8 commits July 23, 2024 13:48
NB: the two following directories are essentially identical
diff pp_tt012j.mad/SubProcesses/P2_gu_ttxgu/ pp_ttjj.mad/SubProcesses/P1_gu_ttxgu/
…l (bug madgraph5#872)

./tmad/teeMadX.sh -pptt012j -makeclean +10x
./tmad/teeMadX.sh -ppttjj -makeclean +10x

HOWEVER
- strangely enough, the comparisons with 8192 events give the same cross sections (?the CI fails with 32?!)
- the cross sections are different in the two cases pp_tt012j and pp_ttjj for the same subprocess
- in particular the numbers of events processed seem different in fortran/cpp and in 012j/jj

Look at this in particular
diff tmad/logs_pptt012j_mad/log_pptt012j_mad_d_inl0_hrd0.txt tmad/logs_ppttjj_mad/log_ppttjj_mad_d_inl0_hrd0.txt
…rorprocs.inc, try to change MIRRORPROCS from true to false to investigate madgraph5#872

(See also Roiser's comments madgraph5#872 (comment))
…MIRRORPROCS=FALSE: they now both succeed (work around madgraph5#872)

./tmad/teeMadX.sh -pptt012j -makeclean +10x
./tmad/teeMadX.sh -ppttjj -makeclean +10x

Note in particular that the numbers of events processed are now the same in Fortran and C++
…(undo PR madgraph5#754) to fix madgraph5#872

This resets MIRROPROCS to FALSE (instead of TRUE) in mirrorprocs.inc files, e.g. in pp_ttjj
This fixes the fortran/cudacpp xsec mismatch madgraph5#872 in pp_ttjj
Note in particular that the numbers of events processed are now the same in Fortran and C++,
while a higher number of events was processed in Fortran without this fix

Revert "avoid that mirroring is reset by the plugin"
This reverts commit 2556cdd (from PR madgraph5#754)
Fix conflicts: epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/model_handling.py
…py to disable mirror processing again (undo PR madgraph5#754) to fix madgraph5#872

This does set MIRRORPROCS to false.
But it ALSO sets nprocesses=1 instead of nprocesses=1.
@valassi valassi mentioned this pull request Jul 24, 2024
@valassi valassi changed the title WIP investigate pp_tt012j bug 872 disable mirror processes again and add back an assert nprocesses=1 (fix pp_tt012j bug 872, undo 754 and partly 764) Jul 24, 2024
valassi added 6 commits July 24, 2024 15:01
…RORPROCS=FALSE and nprocesses=1: it succeeds again (work around madgraph5#872)

The question is, is nprocesses relevant at all in the cudacpp calculations?
(PS the answer is, no it is not: it only used for sanity checks... and I will add back the assert expecting nprocesses=1)

./tmad/teeMadX.sh -pptt012j -makeclean +10x
./tmad/teeMadX.sh -ppttjj -makeclean +10x
…ting nprocesses == 1 (part of the fix for madgraph5#872 in PR madgraph5#935; undo changes from PR madgraph5#754 and madgraph5#764)

NB: nprocesses is unused in cudacpp, because cudacpp is unable to handle mirror processes!
This means that the result of madevent with cudacpp MEs are the same whether nprocesses is 1 or 2.
However, nprocesses=2 signals that Fortran is using mirror processes.
If mirror processes are used in Fortran MEs but not in cudacpp MEs, this leads to a xsec mismatch madgraph5#872.

Implementing mirror processes in cudacpp would be extremely complicated.
This is an assumption that was made years ago.
To be cross checked with Olivier, but I think that the code generated with cudacpp without mirror processes
also ensure that the mirror processes are _separately_ generated as separate subprocesses also in Fortran,
therefore the whole calculation is internally consistent if mirrorprocesses is false in both Fortran and cudacpp.
…rror processied and to add a static_assert for nprocesses=1 (madgraph5#872)
…RORPROCS=FALSE and assert nprocesses=1: it still succeeds (fix madgraph5#872)

./tmad/teeMadX.sh -pptt012j -makeclean +10x
./tmad/teeMadX.sh -ppttjj -makeclean +10x
valassi added 2 commits August 2, 2024 09:16
STARTED  AT Wed Jul 31 11:46:06 AM CEST 2024
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Jul 31 12:07:50 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Jul 31 12:15:50 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Jul 31 12:24:10 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Jul 31 12:26:53 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Jul 31 12:29:34 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common
ENDED(6) AT Wed Jul 31 12:32:20 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Wed Jul 31 12:41:32 PM CEST 2024 [Status=0]
… helicities madgraph5#872) - all as expected (failures only in heft madgraph5#833)

STARTED  AT Wed Jul 31 12:41:32 PM CEST 2024
(SM tests)
ENDED(1) AT Wed Jul 31 05:09:52 PM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Wed Jul 31 05:21:16 PM CEST 2024 [Status=0]

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_pptt_mad/log_pptt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_pptt_mad/log_pptt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_pptt_mad/log_pptt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
@valassi
Copy link
Member Author

valassi commented Aug 2, 2024

I have rerun all my manual tests too, I confirm this as good to go, unless we decide on a different strategy

@oliviermattelaer
Copy link
Member

Ok, so #941 concluded
that

  • Yes, this can be removed from the fortran side
  • That if using limhel (which is important to do) then the list can be different due to the different random number used to fill both list.
  • that for the moment limhel is set to zero (this need to be clarified when this is happening)
  • that the list of helicity was common for all matrix#.f (irrelevant for the gpucpp case since we ensure to have only one)

So I'm working on a PR for the MG5aMC: mg5amcnlo/mg5amcnlo#124
which, in particular: remove the need of imirror but add the support to have different list for different matrix_element.

Let discuss next week.

valassi added 3 commits August 2, 2024 18:40
…5#949 fixing madgraph5#947) into pptt

Fix conflicts in epochX/cudacpp/pp_tt.mad/CODEGEN_mad_pp_tt_log.txt (git checkout pptt pp_tt.mad/CODEGEN_mad_pp_tt_log.txt)
Note: there is no change here, which is strange because pp_tt.mad/bin/internal/madevent_interface.py should have changed?
Strangely, git blame shows that this is considered as a move of gg_tt01g.mad...

git blame pp_tt.mad/bin/internal/madevent_interface.py  | grep 947
8a8ee1a epochX/cudacpp/gg_tt01g.mad/bin/internal/madevent_interface.py (Andrea Valassi 2024-08-02 17:28:18 +0200 3970)                 # avoid case where some file starts with G (madgraph5#947)
@valassi
Copy link
Member Author

valassi commented Aug 2, 2024

Thanks Olivier. On my side I updated this PR with #949 upstream/master.

My proposal is the following:

  • now/soon we merge this PR with at least a temporary patch (the 2nd call to reset cumulative variable) that makes fortran and cudacpp tests identical
  • then/later you work on your Goodhel mg5amcnlo/mg5amcnlo#124 and when this is done I remove the second cumulative_variable call and adapt the code more generally
  • also then/later at some point I work on Move LIMHEL=0 to madgraph4gpu only and add a runcard allowing LIMHEL=0 #950 and I move back to LIMHEL>0 in production but add a runcard to allow LIMHEL=0 and use that as default in madgraph4gpu

But I would not wait to have all of the above done before fixing these pptt012j tests... I want to see the cross sections identical, and be able to debug CMS without having that issue, too

Anyway lets discuss next week! Have a good weekend :-)

Andrea

@oliviermattelaer
Copy link
Member

So my counter proposal is to move directly on this PR: #955
where all CI are now green (and which also removes the first reset_cumulative call since now fortran has a smarter trigger to check if such call is needed and therefore does not need it for the cases tested by your test (this is only needed for MC over helicity).

And then move on #950
(which is now in the run_card).

…ier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master tmad/logs* tput/logs*)
…r merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…g_tt.mad, to ease merging and conflict resolution

(From the cudacpp directory)
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad *.sa | grep -v ^gg_tt.mad)
…ase merging (prevent git from thinking that pp_tt is its renaming...)
…mad to the repo

./CODEGEN/generateAndCompare.sh -q gg_tt01g --mad
…ad) from upstream/master, except gg_tt.mad, to ease merging and conflict resolution

(From the cudacpp directory)
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad *.sa | grep -v ^gg_tt.mad)
… mac madgraph5#974, nvcc madgraph5#966) into pptt

Fix conflicts:
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (take HEAD version, must recompute)
	epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (fix manually)
@valassi
Copy link
Member Author

valassi commented Aug 21, 2024

I will look at 955 from Olivier, but in the meantime I resynced this to the latest upstream/master

…/master

The only files that still need to be patched are
- 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
- 3 in patch.P1: auto_dsig1.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
@valassi valassi changed the title fix bug 872 (call reset_cumulative_variable once per mirror), add sanity check that two helicity lists in fortran mirrors are identical, add pp_tt to repo WIP add pp_tt to repo (plus obsolete fixes for bug 872 via reset_cumulative_variable, now fixed by Olivier via a single helicity filter) Sep 15, 2024
@valassi valassi marked this pull request as draft September 15, 2024 16:08
@valassi
Copy link
Member Author

valassi commented Sep 15, 2024

I renamed this and marked this as WIP.

The initial focus of this PR (fixing 872 via reset_cumulative_variable) is obsolete. The issue has been fixed by Olivier in his #955, which will be included in the upcoming #992.

I keep this open just in case I want to add pp_tt (temporarely?) to the repo. Low priority, will have a look after #992.

@valassi valassi mentioned this pull request Sep 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants