I would like to propose a mutate function, similar to pandas' assign function but more flexible. It would also serve as a replacement for the transform functions.
Example API
df.mutate(y='sum', n=lambda df: df.nth(1))
df.mutate(y='sum', n=lambda df: df.nth(1), by='x')
# replicate dplyr's across
# https://stackoverflow.com/q/63200530/7175713
# select_columns syntax can fit in nicely here
mtcars.mutate(
    ("*t", "mean"),
    ("*p", "sum"),
    {"cyl": lambda df: df + 1, "new_col": lambda df: df.select_columns("*t").sum(axis=1)},
)
For multiple columns, we use a tuple of three arguments (cols, func, names), where cols selects the columns, func is a function or a list/tuple of functions, and names controls how the resulting columns are renamed: either flattened for a MultiIndex, or with a prefix/suffix added. For names, we'll use something like an f-string format. The idea is borrowed from the across function in R's dplyr.
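To make the (cols, func, names) idea concrete, here is a minimal sketch in plain pandas. The helper name `across`, the `"{col}_{fn}"` template, and the use of `fnmatch` for glob matching are all assumptions for illustration. They are not the proposed mutate implementation, and the sketch handles only a single function rather than a list of functions.

```python
from fnmatch import fnmatchcase

import pandas as pd


def across(df, cols, func, names="{col}_{fn}"):
    """Hypothetical helper: apply `func` to each column whose name matches the glob `cols`.

    `names` is a str.format template with the fields {col} and {fn}.
    """
    matched = [c for c in df.columns if fnmatchcase(str(c), cols)]
    # Use the string itself for named functions like "mean", else the callable's __name__.
    fn_name = func if isinstance(func, str) else getattr(func, "__name__", "fn")
    new_cols = {}
    for col in matched:
        result = df[col].agg(func) if isinstance(func, str) else func(df[col])
        new_cols[names.format(col=col, fn=fn_name)] = result
    # assign broadcasts scalar results to every row, so the frame keeps its length.
    return df.assign(**new_cols)


df = pd.DataFrame({"wt": [1.0, 2.0, 3.0], "hp": [10, 20, 30], "cyl": [4, 6, 8]})
out = across(df, "*t", "mean")
```

Here `"*t"` matches only `wt`, so `out` gains a single `wt_mean` column holding the column mean on every row.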
For single columns, we can pass a dictionary, or use pandas named aggregation for more control over the output names.
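For reference, this is what named aggregation looks like in pandas today. The frame and column names are made up for illustration. Note that `agg` returns one row per group, whereas the proposed mutate would presumably keep the original row count.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b"], "v": [1, 2, 3]})

# Each keyword names an output column; the value says which input column
# to aggregate and how. A plain (column, aggfunc) tuple is equivalent
# to pd.NamedAgg.
summary = df.groupby("x").agg(
    v_sum=pd.NamedAgg(column="v", aggfunc="sum"),
    v_max=("v", "max"),
)
```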
So far, the tests I've conducted show that building a dictionary and passing it to pandas transform/apply/agg/assign is faster than what I came up with for mutate. Maybe someone else can come up with a cleaner API that is fast as well.
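One possible reading of the dictionary approach, sketched with a grouped transform and assign. The benchmark code is not shown in this thread, so the frame, the column names, and the `_sum` suffix below are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b"], "v": [1, 2, 3], "w": [10, 20, 30]})

grouped = df.groupby("x")

# Build the new columns up front as a dictionary, then hand it to assign in one call.
# transform returns a Series aligned to the original index, so row count is preserved.
new_cols = {f"{col}_sum": grouped[col].transform("sum") for col in ["v", "w"]}
out = df.assign(**new_cols)
```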