Integration of workflow systems into physics analysis and the scientific python ecosystem #31

Open
pfackeldey opened this issue Jul 25, 2024 · 7 comments
Labels
2024 PyHEP.dev 2024 workflows see #4

Comments

@pfackeldey
Collaborator

pfackeldey commented Jul 25, 2024

#31 (comment):

A typical HEP analysis (e.g. at the LHC) comprises a large number of steps with non-trivial dependencies between them. Here, one can use workflow tools, e.g. https://github.com/spotify/luigi, to describe and execute these steps and their dependencies. This is not directly related to the heavy batch processing that is typically done using e.g. Dask / HTCondor / Slurm, as that processing represents only a subset of the steps of a whole analysis.
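
To make the idea of "steps with dependencies" concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It does not use luigi's API. The step names and the `run_workflow` helper are hypothetical and only illustrate what a workflow tool does conceptually: order the steps by their dependencies and skip those whose output already exists.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis steps, each mapped to the steps it depends on.
DEPENDENCIES = {
    "select_events": set(),
    "fill_histograms": {"select_events"},
    "compute_systematics": {"select_events"},
    "fit": {"fill_histograms", "compute_systematics"},
    "plot": {"fit"},
}


def run_workflow(dependencies, done=frozenset()):
    """Return the steps to execute, in dependency order, skipping finished ones."""
    executed = []
    for step in TopologicalSorter(dependencies).static_order():
        if step in done:
            continue  # output already exists, nothing to do
        executed.append(step)  # a real workflow tool would run the step here
    return executed


order = run_workflow(DEPENDENCIES)
print(order)
```

Only some of these steps (for example the event selection) would typically be sent to a batch system; the workflow layer just decides what has to run and in which order.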

@pfackeldey pfackeldey added workflows see #4 2024 PyHEP.dev 2024 labels Jul 25, 2024
@eduardo-rodrigues
Member

Interested 👍.

@JonasEppelt

We (@AlexanderHeidelbach and I) recently took over the maintenance and development of b2luigi for the Belle II collaboration.
Therefore we would be very interested in exchanging ideas and experiences on this topic and are looking for overlap or maybe opportunities for collaboration.

@bfis

bfis commented Jul 31, 2024

I'm interested in this topic, but with a particular focus on enabling greater flexibility & reusability (in the context of physics analyses) by addressing specific shortcomings in the underlying structures, in particular luigi's handling of parameters and dependencies.
That is, I'm less focused on the integration aspect itself and more on the idea that whatever can or will be integrated should be iterated and improved upon first, before it is too late, to avoid unnecessary baggage.

@ynikitenko

Dear topic starter,

it would be great if you could expand on what you mean by this topic.

I am giving a talk on an architectural framework for data analysis in Python, so I'm generally interested in this theme.

@ynikitenko

It would be good to see some other examples of workflows and comparisons between them (e.g. there are proposed discussions about Dask, but there are also workload managers like Slurm).

@pfackeldey
Collaborator Author

Dear @ynikitenko,
A typical HEP analysis (e.g. at the LHC) comprises a large number of steps with non-trivial dependencies between them. Here, one can use workflow tools, e.g. https://github.com/spotify/luigi, to describe and execute these steps and their dependencies. This is not directly related to the heavy batch processing that is typically done using e.g. Dask / HTCondor / Slurm, as that processing represents only a subset of the steps of a whole analysis.

@ynikitenko

Dear @pfackeldey, thank you for the nice example. Would you be so kind as to add it to the opening message for easier navigation?
