Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

hitindex for pLS is a bit confusing #267

Open
sgnoohc opened this issue Apr 8, 2023 · 2 comments
Open

hitindex for pLS is a bit confusing #267

sgnoohc opened this issue Apr 8, 2023 · 2 comments

Comments

@sgnoohc
Copy link
Contributor

sgnoohc commented Apr 8, 2023

Because hit index is already referenced against hitsInGPU.idxs here:

TrackLooper/SDL/Event.cu

Lines 678 to 681 in 398270a

hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]];
hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]];
hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]];
hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];

Downstream it gets confusing.

TrackLooper/SDL/Event.cu

Lines 678 to 681 in 398270a

hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]];
hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]];
hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]];
hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];

There are two hit indexs (1) within GPU memory, (2) within CMSSW / tracking Ntuple.
(1) idxs is internally used, and then we obtain (1) -> (2) mapping via hitsInGPU.idxs[].
But for pLSs, where the internal indexs should be saved, it already saved the dereferenced version (1) -> (2) into the internal memory.

So far this has not caused a problem, because in the special case of # of outer tracker hits being > N inner tracker hits used in the pLSs, we are fine.
However, in the case tha N outer tracker hits < inner tracker hits from pLSs, we may be in trouble.

@VourMa
Copy link
Contributor

VourMa commented Apr 28, 2023

I have a couple of questions on this issue and I would like to also bring @tresreid to the discussion, as he may understand a bit better the situation:

  1. I can see that segmentsInGPU.pLSHitsIdxs.w/x/y/z are only used in the checkHitspLS kernel, where the order of pLS hits doesn't seem to matter. If that's true, are we safe to change the order to the proper one (0 1 2 3), instead of (0 2 1 3), as mentioned here?
  2. I am not sure what you mean by "dereferencing". Can you elaborate on what this means and where it is done and how it differs for the IT and OT hits? Will the situation be resolved if we didn't do the "dereferencing" for the pLS hits?
  3. I don't understand the last part of your text. Where/How does the number of the OT hits interfere with the "dereferencing" of the IT hits?

Maybe my confusion comes from the fact that you quoted twice the same piece of code, while you wanted to quote something else in the second case?

@VourMa
Copy link
Contributor

VourMa commented Apr 29, 2023

After studying a bit more point 2, I think I reached the following understanding:

Situation for OT hits

  • The hitIdxs going into the TC collection for the OT objects come from the hitIdxs of lower level objects. Following the path for T5s:
    • addTrackCandidateToMemory(trackCandidatesInGPU, 4/*track candidate type T5=4*/, quintupletIndex, quintupletIndex, &quintupletsInGPU.logicalLayers[5 * quintupletIndex], &quintupletsInGPU.lowerModuleIndices[5 * quintupletIndex], &quintupletsInGPU.hitIndices[10 * quintupletIndex], quintupletsInGPU.regressionG[quintupletIndex], quintupletsInGPU.regressionF[quintupletIndex], quintupletsInGPU.regressionRadius[quintupletIndex], trackCandidateIdx, quintupletIndex);
    • quintupletsInGPU.hitIndices[10 * quintupletIndex] = tripletsInGPU.hitIndices[6 * innerTripletIndex];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 1] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 1];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 2] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 2];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 3] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 3];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 4] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 4];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 5] = tripletsInGPU.hitIndices[6 * innerTripletIndex + 5];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 6] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 2];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 7] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 3];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 8] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 4];
      quintupletsInGPU.hitIndices[10 * quintupletIndex + 9] = tripletsInGPU.hitIndices[6 * outerTripletIndex + 5];
    • TrackLooper/SDL/Triplet.cu

      Lines 187 to 192 in ce2f36b

      tripletsInGPU.hitIndices[tripletIndex * 6] = mdsInGPU.anchorHitIndices[firstMDIndex];
      tripletsInGPU.hitIndices[tripletIndex * 6 + 1] = mdsInGPU.outerHitIndices[firstMDIndex];
      tripletsInGPU.hitIndices[tripletIndex * 6 + 2] = mdsInGPU.anchorHitIndices[secondMDIndex];
      tripletsInGPU.hitIndices[tripletIndex * 6 + 3] = mdsInGPU.outerHitIndices[secondMDIndex];
      tripletsInGPU.hitIndices[tripletIndex * 6 + 4] = mdsInGPU.anchorHitIndices[thirdMDIndex];
      tripletsInGPU.hitIndices[tripletIndex * 6 + 5] = mdsInGPU.outerHitIndices[thirdMDIndex];
  • The path above leads the hitIdxs from MDs. For OT objects, these come from:
    • unsigned int anchorHitIndex, outerHitIndex;
      if(modulesInGPU.moduleType[lowerModuleIdx] == PS and modulesInGPU.moduleLayerType[lowerModuleIdx] == Strip)
      {
      mdsInGPU.anchorHitIndices[idx] = upperHitIdx;
      mdsInGPU.outerHitIndices[idx] = lowerHitIdx;
      anchorHitIndex = upperHitIdx;
      outerHitIndex = lowerHitIdx;
      }
      else
      {
      mdsInGPU.anchorHitIndices[idx] = lowerHitIdx;
      mdsInGPU.outerHitIndices[idx] = upperHitIdx;
      anchorHitIndex = lowerHitIdx;
      outerHitIndex = upperHitIdx;
      }
    • addMDToMemory(mdsInGPU,hitsInGPU, modulesInGPU, lowerHitArrayIndex, upperHitArrayIndex, lowerModuleIndex, dz, dphi, dphichange, shiftedX, shiftedY, shiftedZ, noShiftedDz, noShiftedDphi, noShiftedDphiChange, mdIndex);
    • unsigned int lowerHitArrayIndex = loHitArrayIndex + lowerHitIndex;
      float xLower = hitsInGPU.xs[lowerHitArrayIndex];
      float yLower = hitsInGPU.ys[lowerHitArrayIndex];
      float zLower = hitsInGPU.zs[lowerHitArrayIndex];
      float rtLower = hitsInGPU.rts[lowerHitArrayIndex];
      unsigned int upperHitArrayIndex = upHitArrayIndex+upperHitIndex;
      float xUpper = hitsInGPU.xs[upperHitArrayIndex];
      float yUpper = hitsInGPU.ys[upperHitArrayIndex];
      float zUpper = hitsInGPU.zs[upperHitArrayIndex];
      float rtUpper = hitsInGPU.rts[upperHitArrayIndex];
      i.e. the hitIdxs correspond to the "internal" hitIdxs, that can be used for derefenrecing the "CMSSW" hitIdxs from here:
      cudaMemcpyAsync(hitsInGPU->idxs, &idxInNtuple[0], nHits*sizeof(unsigned int), cudaMemcpyHostToDevice, stream);

Situation for IT hits

  • The pLSs are added to memory in a different way compared to the OT objects. More specifically, the path followed is:
    • addPixelSegmentToMemory(segmentsInGPU, mdsInGPU, modulesInGPU, innerMDIndex, outerMDIndex, pixelModuleIndex, hits1, hitIndices0[tid], hitIndices2[tid], dPhiChange[tid], ptIn[tid], ptErr[tid], px[tid], py[tid], pz[tid], etaErr[tid], eta[tid], phi[tid], charge[tid], seedIdx[tid], pixelSegmentIndex, tid, superbin[tid], pixelType[tid],isQuad[tid],score_lsq);
      where the hitIdxs are taken already dereferenced to the "CMSSW" version:

      TrackLooper/SDL/Event.cu

      Lines 669 to 672 in ce2f36b

      hits1[0] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[innerMDIndex]];
      hits1[1] = hitsInGPU.idxs[mdsInGPU.anchorHitIndices[outerMDIndex]];
      hits1[2] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[innerMDIndex]];
      hits1[3] = hitsInGPU.idxs[mdsInGPU.outerHitIndices[outerMDIndex]];
  • At the same time, the MDs for pLSs are handled in the same way as those of the OT:

    TrackLooper/SDL/Event.cu

    Lines 659 to 660 in ce2f36b

    addMDToMemory(mdsInGPU, hitsInGPU, modulesInGPU, hitIndices0[tid], hitIndices1[tid], pixelModuleIndex, 0,0,0,0,0,0,0,0,0,innerMDIndex);
    addMDToMemory(mdsInGPU, hitsInGPU, modulesInGPU, hitIndices2[tid], hitIndices3[tid], pixelModuleIndex, 0,0,0,0,0,0,0,0,0,outerMDIndex);
    where the "internal" hitIdxs are added.
  • When the segmentsInGPU.pLSHitsIdxs are used to get the pLS hitIdxs (as done in PR Added one function compute pt eta phi for TC #269 here), the "CMSSW" version is taken, instead of the "internal" one, as done for the rest of the objects. This can be fixed downstream by doing this trick:
    unsigned int hitidx_in_GPU = trackCandidatesInGPU.hitIndices[14 * idx + i];
    unsigned int hitidx_original = hitsInGPU.idxs[hitidx_in_GPU];
    unsigned int hitidx = type == pLS ? hitidx_in_GPU : hitidx_original; // trackCandidatesInGPU.hitIndices for pLS's are behaving weirdly (see: issue #267)

Some parting thoughts

  • Given the above, it seems like the situation for pLSs is actually simpler than the one for OT hits, where the dereferencing is required. But I am still missing the point of the last part of the initial issue text (point 3 above), so maybe that shifts the tide towards the OT hitIdx handling to be better.
  • One easy fix to homogenize (but probably for the worse otherwise) would be to pass the "internal" hitIdxs here:
    addPixelSegmentToMemory(segmentsInGPU, mdsInGPU, modulesInGPU, innerMDIndex, outerMDIndex, pixelModuleIndex, hits1, hitIndices0[tid], hitIndices2[tid], dPhiChange[tid], ptIn[tid], ptErr[tid], px[tid], py[tid], pz[tid], etaErr[tid], eta[tid], phi[tid], charge[tid], seedIdx[tid], pixelSegmentIndex, tid, superbin[tid], pixelType[tid],isQuad[tid],score_lsq);
    Still, I am not sure what we would gain from that...
  • Note that, the way we would have to use for getting the hitIdxs for OT objects now:
    unsigned int hitidx_in_GPU = trackCandidatesInGPU.hitIndices[14 * idx + i];
    unsigned int hitidx_original = hitsInGPU.idxs[hitidx_in_GPU];
    unsigned int hitidx = type == pLS ? hitidx_in_GPU : hitidx_original; // trackCandidatesInGPU.hitIndices for pLS's are behaving weirdly (see: issue #267)
    also requires the hit arrays to be copied from the GPU to the CPU for getting the output in CMSSW, apart from the TC arrays. My last timing measurements showed that the TC arrays copies take ~0.02 ms, while the hit arrays copies take ~0.55 ms. The latter could be reduced down to ~0.1 ms, if we only copy the hitsInCPU->idxs array, which is actually the only array needed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants