Make the I/O functionality use the podio provided functionality #69

tmadlener · 2021-11-18T15:53:38Z

For historical reasons the I/O implementations of standalone podio and the one present here have diverged a bit, so they now offer in principle the same, but still slightly different, functionality. I think the framework should use the podio facilities as far as possible, and I originally thought that this would be a somewhat mechanical but ultimately straightforward thing to do. However, I have realized that there is a bit more work involved, so I am recording my observations here. In the end, I think that changes to podio are also necessary and that it might be best to first stabilize the interfaces podio offers before we actually start to work on this here.

High level functionality differences

The following table gives an overview of where podio and k4FWCore differ in high level (i.e. user perceivable) functionality:

| | podio | k4FWCore |
|---|---|---|
| `vector` user data | `UserDataCollection<T>` (since `v00-14`, compile time limited `T`) | `DataHandle<std::vector<T>>` (dictionary limited `T`, may fail silently(?) on I/O). There is #25 that might impact the usefulness of this feature(?) |
| other user data | N/A | `DataHandle<T>` (dedicated handling of ints and floats, but in principle again ROOT dictionary limited) |
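To illustrate what "compile time limited `T`" means in the first row, here is a minimal sketch using only the standard library. `ToyUserData` and `isSupported` are hypothetical stand-ins, not the podio `UserDataCollection` implementation, and the list of allowed element types is an assumption chosen purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical whitelist of element types (assumption; the list that podio
// actually supports may differ).
template <typename T>
constexpr bool isSupported =
    std::is_same_v<T, float> || std::is_same_v<T, double> ||
    std::is_same_v<T, std::int32_t> || std::is_same_v<T, std::uint64_t>;

// Toy container that rejects unsupported element types at compile time,
// instead of relying on a runtime dictionary lookup during I/O.
template <typename T>
class ToyUserData {
  static_assert(isSupported<T>, "element type is not in the supported list");

public:
  void push_back(T value) { m_data.push_back(value); }
  std::size_t size() const { return m_data.size(); }
  const std::vector<T>& vec() const { return m_data; }
  void clear() { m_data.clear(); }

private:
  std::vector<T> m_data;
};
```

With this sketch `ToyUserData<float>` compiles, while `ToyUserData<bool>` is rejected by the compiler rather than failing (possibly silently) at I/O time.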

I am not entirely sure how widespread the usage of these features is throughout the Key4hep components. Hence, it is also hard to gauge whether some of the functionality could be easily removed from k4FWCore (e.g. the possibility to store single int/float values per event). This is something that probably needs discussing.

Technicalities

In k4FWCore the PodioDataSvc handles the actual reading of the collections. It holds a podio::ROOTReader and a podio::EventStore as members that do the heavy lifting in this regard. The k4DataSvc is in essence a very thin wrapper around the PodioDataSvc that exposes the filename(s) as a property to be configured from the options file. The PodioInput algorithm is responsible for actually triggering the reading of the collections (which are specified as a property) in its execute method. For this it just loops over the list of collections to read and calls PodioDataSvc::readCollection. For writing collections there is the PodioOutput algorithm, which basically re-implements the functionality of the podio::ROOTWriter. It holds a KeepDropSwitch to control which collections to actually write to file. In all of this there are a few subtle differences between podio and k4FWCore that make a "trivial" switch to podio facilities impossible. The following table provides a (probably incomplete) overview of them:

| | podio | k4FWCore |
|---|---|---|
| output collection handling | collections are created once via `EventStore::create` and then simply cleared in the event loop after writing. | collections are re-created every event and the `EventStore` in `PodioDataSvc` never gets to know about them |
| event data tree | owned by `ROOTWriter`, no outside access | owned by `PodioDataSvc`. Possible to get access to it |
| output collection writing | `ROOTWriter::registerForWrite` collects a list of collection names to write. Checks in `EventStore` if collections are actually available before adding them to the list. In event loop simply take this list and write (i.e. set branch addresses) and fill the event data tree. | In every call to `PodioOutput::execute` get the complete list of collections from `PodioDataSvc` and check via the `KeepDropSwitch` which collections to write, before setting branch addresses and filling the event data tree. |
| branch creation for user data | `UserDataCollections` are handled the same as other collections | `DataHandle` creates necessary branches as it also has access to the `PodioDataSvc` (and the event data tree therein). The `DataHandle` also makes sure to do the proper branch address re-setting. |
| file level meta data | N/A | `PodioOutput` writes the options file config into a separate branch of the meta data tree |
| I/O file formats | ROOT and SIO. (probably incomplete) abstract `IReader` interface for reading. Separate writer implementations (with equal interfaces) | Only ROOT, but at least for reading a switch to the `IReader` interface should enable reading SIO out of the box |
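The per-event selection described in the "output collection writing" row for k4FWCore can be sketched roughly as follows. This is a standalone toy, not the actual `KeepDropSwitch` code: `ToyKeepDrop`, `globMatch` and `selectForWrite` are hypothetical names, and the command syntax (`keep <pattern>` / `drop <pattern>`, applied in order, keep by default, `*` as the only wildcard) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Simple glob match supporting only '*' (matches any run of characters).
inline bool globMatch(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0, star = std::string::npos, mark = 0;
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // remember the wildcard and tentatively match nothing
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last '*' absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}

// Toy keep/drop switch: commands are applied in order, the last match wins.
class ToyKeepDrop {
public:
  explicit ToyKeepDrop(std::vector<std::string> commands) : m_commands(std::move(commands)) {}

  bool isOn(const std::string& name) const {
    bool keep = true;  // assumption: collections are kept unless dropped
    for (const auto& cmd : m_commands) {
      std::istringstream in(cmd);
      std::string action, pattern;
      in >> action >> pattern;
      if (!globMatch(pattern, name)) continue;
      if (action == "keep") keep = true;
      else if (action == "drop") keep = false;
    }
    return keep;
  }

private:
  std::vector<std::string> m_commands;
};

// What happens (conceptually) in every event: filter the complete list of
// available collections through the switch before writing.
inline std::vector<std::string> selectForWrite(const ToyKeepDrop& sw,
                                               const std::vector<std::string>& available) {
  std::vector<std::string> selected;
  for (const auto& name : available) {
    if (sw.isOn(name)) selected.push_back(name);
  }
  return selected;
}
```

In this toy the filtering cost is paid in every event, whereas the podio approach in the same row builds the list once up front.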

In the end, to get everything working the same and using the same facilities, some discussion is required to decide which functionality needs to be supported by podio, which functionality can be built on top of podio here, and most importantly what the interfaces have to look like to enable all this functionality.

vvolkl · 2021-11-19T08:34:58Z

Hi Thomas, thanks for this comprehensive issue. I think in the end it's simpler than the tables make it seem: the UserData functionality can completely replace what is used now for writing out vector etc. The only thing that I see missing on the podio side is something to allow the reader to ignore certain collections in the end store when writing as was done here with the KeepDropSwitch, but any implementation/interface for that is fine.

tmadlener · 2021-11-19T08:49:02Z

Hi Valentin,
Yes, I agree it probably looks worse than it actually is. I think the major problem for a straightforward migration is the difference in the ownership of the event data tree. It is not yet entirely clear to me how I can make the EventStore in the PodioDataSvc aware of the collections that are created during an event. My attempts so far have not succeeded, because collections are recreated every event, whereas podio currently foresees the creation/registration with the EventStore only once per collection.
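The mismatch between the two collection lifetimes can be illustrated with a small standalone toy. Nothing here is podio or ROOT API: `ToyWriter`, `setAddress` and `fill` are hypothetical names that only mimic the idea of a tree branch pointing at a buffer address.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

using ToyCollection = std::vector<int>;

// Toy writer that remembers the address of each registered buffer, loosely
// analogous to a branch of the event data tree pointing at a collection.
class ToyWriter {
public:
  void setAddress(const std::string& name, const ToyCollection* buffer) { m_buffers[name] = buffer; }

  // Copy the current content of every registered buffer ("fill the tree").
  void fill() {
    for (const auto& [name, buffer] : m_buffers) {
      m_written[name].push_back(*buffer);
    }
  }

  const std::vector<ToyCollection>& written(const std::string& name) const { return m_written.at(name); }

private:
  std::map<std::string, const ToyCollection*> m_buffers;
  std::map<std::string, std::vector<ToyCollection>> m_written;
};

// Pattern 1: create and register once, clear after each event.
// The registered address stays valid for the whole event loop.
inline ToyWriter runCreateOnce(int nEvents) {
  ToyWriter writer;
  ToyCollection hits;
  writer.setAddress("hits", &hits);
  for (int i = 0; i < nEvents; ++i) {
    hits.push_back(i);
    writer.fill();
    hits.clear();
  }
  return writer;
}

// Pattern 2: a new collection object every event. The address changes each
// time, so it has to be set again before every fill; a one-time registration
// would leave the writer with a dangling pointer.
inline ToyWriter runRecreate(int nEvents) {
  ToyWriter writer;
  for (int i = 0; i < nEvents; ++i) {
    auto hits = std::make_unique<ToyCollection>();
    hits->push_back(i);
    writer.setAddress("hits", hits.get());
    writer.fill();
  }
  return writer;  // the stored address is stale here and must not be used again
}
```

Both patterns write the same data, but only the second one needs the per-event address re-setting, which is the bookkeeping that a register-once interface does not foresee.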

> The only thing that I see missing on the podio side is something to allow the reader to ignore certain collections in the end store when writing as was done here with the KeepDropSwitch, but any implementation/interface for that is fine.

That could be achieved by only registering the collections that should be kept with the writer (registerForWrite has to be called once, before the first event is written, for every collection that should be written). In my first approach I simply did this when writing the first event. That seemed to work, except that I wasn't able to connect the branches to the collections properly.
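A rough sketch of that "register on the first event" idea, again as a standalone toy: `ToyLazyWriter` and its `KeepFn` predicate are hypothetical, and only the name `registerForWrite` is borrowed from the discussion above. It deliberately leaves out the part that did not work, namely connecting branches to the collections.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a writer; not the podio::ROOTWriter interface.
class ToyLazyWriter {
public:
  using KeepFn = std::function<bool(const std::string&)>;

  explicit ToyLazyWriter(KeepFn keep) : m_keep(std::move(keep)) {}

  // Add a collection name to the list that is written in every event.
  void registerForWrite(const std::string& name) { m_toWrite.push_back(name); }

  // Write one event. On the first call the kept names are registered;
  // all later calls reuse that list without consulting the predicate again.
  const std::vector<std::string>& writeEvent(const std::vector<std::string>& available) {
    if (!m_registered) {
      for (const auto& name : available) {
        if (m_keep(name)) registerForWrite(name);
      }
      m_registered = true;
    }
    ++m_nEvents;
    return m_toWrite;
  }

  int nEvents() const { return m_nEvents; }

private:
  KeepFn m_keep;
  std::vector<std::string> m_toWrite;
  bool m_registered = false;
  int m_nEvents = 0;
};
```

One consequence visible in this toy is that a collection that first shows up in a later event is never written, because the list is frozen after the first event.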

tmadlener · 2023-10-06T15:18:14Z

This should be done with #100

tmadlener added the enhancement New feature or request label Nov 18, 2021

vvolkl mentioned this issue Nov 20, 2021

Compile error due to getReadCollections() removed in k4fwcore key4hep/k4MarlinWrapper#53

Closed

hegner mentioned this issue Apr 21, 2023

podio::Frame based I/O #94

Closed

tmadlener closed this as completed Oct 6, 2023
