Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove simID and recID fields from the *Association types #51

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

veprbl
Copy link
Member

@veprbl veprbl commented Sep 15, 2023

simID was originally introduced in eicd as there was no way to refer to a
EDM4hep collection at the time
8305134#diff-1d37563b8df596e24b79c561e07c4e0c63e1b501670cf25f22e1ecc21eec273fR532
Later, recID was introduced in 2940813, and possibly had to be used to
hack associations into an early versions of EICrecon. We don't need neither now: this information is already available via sim and rec relations.

@wdconinc
Copy link
Contributor

The reason these were introduced was also in part because in python it was not possible to load the relations branches with # in their name. Is that possible now?

@veprbl
Copy link
Member Author

veprbl commented Sep 15, 2023

The reason these were introduced was also in part because in python it was not possible to load the relations branches with # in their name. Is that possible now?

I haven't ran into any issues. If you have pointers to any reproducers, I can try.

@wdconinc
Copy link
Contributor

I think the main issue was the encoding of the associations, where the index had the format 10000*collectionID+index, which was the not possible to use as a proper array index. That seems fine now:

In [7]: print(file['events']['EcalEndcapNHits#0.index'].array())
[[0, 1], [0, 1, 2, 3, 4, 5, 6, ..., 118, 119, 120, 121, 122, 123], ..., [0, 1]]

@wdconinc
Copy link
Contributor

And also applied to collections:

In [17]: print(file['events']['EcalEndcapPInsertTruthClusterAssociations#0.index'].array())
[[0, 1, 2], [0, 1], [0, 1, 2, 3, ..., 27, 28, 29, 30], ..., [0, 1, 2, 3], [0]]

In [18]: print(file['events']['EcalEndcapPInsertTruthClusterAssociations#1.index'].array())
[[32, 31, 32], [139, 435], [2185, 893, ..., 72], ..., [25, 25, 25, 318], [77]]

@veprbl
Copy link
Member Author

veprbl commented Sep 15, 2023

Yes, collection ids and indices are separate branches (in the new frames podio format?)

@wdconinc
Copy link
Contributor

I agree this looks good. We can probably proceed as follows:

  • push this through in EICrecon already so analyzer codes can start to get updated in advance of data model change,
  • follow the EDM4eic change procedure by presenting at the next compsw meeting (likely two weeks).

@veprbl
Copy link
Member Author

veprbl commented Sep 15, 2023

We don't have to rush EICrecon changes ahead of this, I don't mind updating after a delayed review.

@veprbl
Copy link
Member Author

veprbl commented Sep 27, 2023

@bspage912 brought up that it's TTreeArray that might not be able to handle pound symbols.

The suggestion from @sly2j was to postpone this until a user-friendly approach is found for working with relations (it is appreciated that other relations in EDM4eic do not receive any new "*ID" fields of their own). The users are to be retrained at that later stage. @mdiefent reminds us about principle of user-centric approach.

@bspage912
Copy link

I performed a quick check and TTreeReader (used within compiled c++) can handle branch names with pound symbols, so at least that shouldn't be an issue. I also confirm I see the same values in the *#0 and *#1.index fields as are in recID and simID.

veprbl added 2 commits July 26, 2024 11:35
simID was originally introduced in eicd as there was no way to refer to a
EDM4hep collection at the time
https://github.com/eic/EDM4eic/blame/7299aba54097a3f2a17c0242731c9dabe68fb450/eic_data.yaml#L546-L547
Later, recID was introduced for convenience, and possibly had to be used to
hack associations into an early versions of EICrecon.  We don't need neither.
This information is already available via sim and rec relations.
@wdconinc wdconinc force-pushed the pr/rm_simID_recID branch from 2b39cb1 to d3b75aa Compare July 26, 2024 15:35
@wdconinc
Copy link
Contributor

Conflicts resolved, bumped next version to 7.0.0 since API breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants