Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration PR followups: make_workdiv, uniform_elements, concrete kernel dimensions #141

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ariostas
Copy link
Member

@ariostas ariostas commented Dec 17, 2024

I started to address some follow-up tasks in #75. In particular:

  • Switch to cms::alpakatools::makeworkdir instead of our custom createWorkDiv.
  • Switch to cms::alpakatools::uniform_elements for kernel loops.
  • Don't set a custom work division for CPU.
  • Started moving towards the removal of kVerticalModuleSlope.
  • Use concrete kernel dimensions instead of templated ones.

@ariostas
Copy link
Member Author

ariostas commented Dec 17, 2024

I think the plots might look a bit different now that the work division is different, but hopefully they are still run-to-run reproducible (I'll check).

For now, I'm just making sure I didn't break something.

/run standalone
/run checks

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.8    400.0    187.9    153.9    166.6    515.0    123.0    232.9    146.5      3.9    1976.4    1414.6+/- 393.7     525.2   explicit[s=4] (target branch)
   avg     46.5    386.1    190.4    157.7    165.2    694.5      8.8     11.8    160.6      3.4    1824.9    1084.0+/- 241.8     482.7   explicit[s=4] (this PR)

@ariostas
Copy link
Member Author

Welp, I did break something. I'll look into it

@ariostas ariostas force-pushed the ariostas/integration_pr_followups branch from 17d4223 to aa82696 Compare December 17, 2024 20:39
@ariostas
Copy link
Member Author

Hopefully it's fine now.

/run standalone

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.8    394.5    194.2    150.0    164.7    514.4    121.9    236.0    145.2      3.7    1970.5    1410.2+/- 392.1     519.0   explicit[s=4] (target branch)
   avg     44.4    385.7    188.4    155.9    149.7    702.5    127.3    256.8    177.9      3.7    2192.4    1445.4+/- 400.5     577.8   explicit[s=4] (this PR)

@ariostas
Copy link
Member Author

The plots match perfectly, which is nice. There was a significant increase in timing, especially for pLS, but it seems like this is only for CPU. This is how the timing compares in cgpu-1.

This PR (aa82696)
Total Timing Summary
Average time for map loading = 571.866 ms
Average time for input loading = 7550.76 ms
Average time for lst::Event creation = 0.00314486 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     11.7      0.5      0.3      1.4      1.2      0.4      0.8      0.6      1.1      0.0      18.1       6.0+/-  1.2      20.1   explicit[s=1]
   avg      2.5      0.6      0.5      1.8      1.4      0.4      1.2      0.7      1.6      0.0      10.8       7.8+/-  1.6       6.5   explicit[s=2]
   avg      4.6      1.0      0.8      2.8      2.1      0.6      2.4      1.5      2.7      0.0      18.5      13.3+/-  2.8       5.3   explicit[s=4]
   avg      6.5      1.6      1.1      3.8      3.2      0.7      3.2      1.8      3.9      0.0      25.9      18.6+/-  4.1       4.8   explicit[s=6]
   avg      8.1      2.2      1.5      4.9      4.2      0.9      3.8      2.1      5.2      0.0      32.9      23.9+/-  4.6       4.6   explicit[s=8]

master (9628e8f)
Total Timing Summary
Average time for map loading = 566.258 ms
Average time for input loading = 7584.02 ms
Average time for lst::Event creation = 0.0034968 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     13.1      0.5      0.3      1.5      1.2      0.4      1.3      1.0      1.3      0.0      20.7       7.2+/-  1.3      22.8   explicit[s=1]
   avg      2.5      0.6      0.5      1.8      1.4      0.5      1.1      0.7      1.8      0.0      10.8       7.9+/-  1.6       6.5   explicit[s=2]
   avg      3.4      1.0      0.8      2.8      2.2      0.6      2.0      1.1      3.0      0.0      16.9      13.0+/-  2.8       4.9   explicit[s=4]
   avg      5.7      1.6      1.2      4.0      3.3      0.7      3.0      1.6      4.4      0.0      25.4      18.9+/-  3.4       4.7   explicit[s=6]
   avg     10.0      1.9      1.3      4.8      4.7      0.9      4.3      2.4      5.5      0.0      35.8      24.9+/-  5.1       5.0   explicit[s=8]

@ariostas ariostas force-pushed the ariostas/integration_pr_followups branch 3 times, most recently from 7278857 to 9444e18 Compare December 18, 2024 21:47
@ariostas
Copy link
Member Author

Sorry for all the force-pushing.

I made the code compatible with both kVerticalModuleSlope and infinity, so that we can change the data files without breaking anything. Once that is done we can fully remove it.

I think the last low hanging fruit that I'll include here is to set concrete dimensions instead of templated types for kernels and alpaka functions.

@ariostas ariostas force-pushed the ariostas/integration_pr_followups branch from 2a034e8 to 0da8474 Compare December 20, 2024 16:23
@ariostas ariostas changed the title Integration PR followups: make_workdiv, uniform_elements, ... Integration PR followups: make_workdiv, uniform_elements, concrete kernel dimensions Dec 20, 2024
@ariostas ariostas marked this pull request as ready for review December 20, 2024 16:27
@slava77
Copy link

slava77 commented Dec 20, 2024

/run all

Copy link

There was a problem while building and running in standalone mode. The logs can be found here.

Copy link

There was a problem while building and running with CMSSW. The logs can be found here.

@slava77
Copy link

slava77 commented Dec 20, 2024

There was a problem while building and running in standalone mode. The logs can be found here.

I couldn't parse the error to the point of understanding where a fix is needed

RecoTracker/LSTCore/src/alpaka/Hit.h:87:63:   required from here
/cvmfs/cms.cern.ch/el8_amd64_gcc12/external/alpaka/1.1.0-aba90e6b0efd37975ff68417e953fa78/include/alpaka/workdiv/Traits.hpp:36:81: 
error: incomplete type 
'alpaka::trait::GetWorkDiv<alpaka::WorkDivUniformCudaHipBuiltIn<std::integral_constant<long unsigned int, 1>, unsigned int>, alpaka::origin::Thread, alpaka::unit::Elems, void>'
 used in nested name specifier
   36 |         return trait::GetWorkDiv<ImplementationBase, TOrigin, TUnit>::getWorkDiv(workDiv);
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~

@ariostas
Copy link
Member Author

ariostas commented Dec 23, 2024

We had some issues with our #includes since GPU code was showing up in CPU-only parts (without all the Alpaka flags being set appropriately). I didn't notice it because it compiles fine for the CPU backend, which is what I was using for testing. Should be all good now.

/run all

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.5    393.0    188.5    157.4    164.4    508.9    122.1    230.1    146.0      3.7    1960.7    1405.3+/- 388.9     519.0   explicit[s=4] (target branch)
   avg     45.1    388.7    189.8    156.2    153.5    701.3    126.9    248.2    176.3      3.4    2189.2    1442.9+/- 393.2     575.6   explicit[s=4] (this PR)

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77
Copy link

slava77 commented Dec 23, 2024

I updated the master branch after the merge of #140 .
There is a conflict now in RecoTracker/LSTCore/interface/alpaka/Common.h

@ariostas ariostas force-pushed the ariostas/integration_pr_followups branch from f6a499c to 4f9d7f9 Compare December 30, 2024 14:44
@ariostas
Copy link
Member Author

There is a conflict now in RecoTracker/LSTCore/interface/alpaka/Common.h

Thank you, Slava. It's fixed now.

/run all

Copy link

There was a problem while building and running in standalone mode. The logs can be found here.

@ariostas ariostas force-pushed the ariostas/integration_pr_followups branch from 4f9d7f9 to c8b78fd Compare December 30, 2024 15:07
Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.3    391.9    188.6    151.9    146.0    543.9    121.6    234.5    151.9      3.3    1979.0    1389.8+/- 385.8     523.7   explicit[s=4] (target branch)
   avg     47.1    388.3    194.2    156.3    152.6    701.7    127.6    245.6    177.0      4.1    2194.4    1445.6+/- 397.1     580.8   explicit[s=4] (this PR)

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.


#include "write_lst_ntuple.h"

using namespace ALPAKA_ACCELERATOR_NAMESPACE::lst;

// copied from Tripelet.h
float computeRadiusFromThreeAnchorHits(float x1, float y1, float x2, float y2, float x3, float y3, float& g, float& f) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we do not proliferate copy-pasted code, especially complex cases. Was the reason to avoid include of kernels code?
Perhaps move this method to e.g. RecoTracker/LSTCore/interface/Circle.h and use in the Triplet.h as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants