Very nice, I'm currently experimenting with snakemake to see if it might be good to switch to a pipeline tool with a large user base. Would be interesting to see if there is an integration that can check the format.
Independent of that we could even think of having some code somewhere that generates the schemas, like schemas.create_persons(additional = "income").validate(df_persons), with some standard attributes that need to be there plus optional ones if needed
O_o snakemake looks quite interesting indeed! Joining a broader "pipeline" community would make a lot of sense.
Regarding the 2nd point, I think I would prefer defining everything inside the script, but I see how that might lead to a certain amount of code duplication (if the df_persons structure doesn't change much across many scripts, for example...).
FYI, I'm using pandera right now in another pipeline, and I find it very verbose if you want to validate the whole dataframe at every stage... I'll have a better opinion in a few weeks
I think it would be a good idea to use Pandera to describe and check the input dataframes of a given stage at runtime.
It has the benefit of :
I don't think it can or should be imposed in every existing stage but it can be strongly encouraged by the community.
For example: