add proprow and rownumber #2556
@nalimilan - as usual, please comment on the naming style 😄.
I came to the conclusion that instead of adding @pdeffebach - would this meet your use cases? (it is consistent with SQL
Also the good thing (I hope you agree) is that we have
As suggested by @nalimilan instead of Then we could allow For this to work in DataAPI.jl we probably should introduce The only problem is that FreqTables.jl defines @nalimilan - can you please comment here on what you think before I post it on #data in Slack?
EDIT: we could move the whole definition of
EDIT2: actually in DataFrames.jl we will not use the default implementation of
Yes, the definition of
I don't think this proposal has a very clear mental model. The reason to special-case with
That is why
This is a good point as in general
and it requires two steps (not the end of the world, but still). Actually, a crazy idea would be to allow:
then one would write:
Ah, right, I guess the choice between these should depend on whether we expect users to want to apply a variety of post-processing operations, or whether
I would say that I have it implemented already (I would need to test the corner cases, but in general I have worked out how to do it safely and efficiently). The major question is whether post-processing should be applied before column expansion (this is what I do now) or after it. I think it should be before. This is how I implemented
However, my hesitation is that if
Here @pdeffebach is probably the best one to answer.
My reading of this PR is that in the long term, I would like singleton structs that have dispatch. Something like
I worry that adding w.r.t
Same with
Given that everyone has slept on this issue a bit, what is our current thinking? There are four things that are generally requested:
My thinking is that I would narrow down this PR to point 1. (adding Please comment on what you think. Thank you!
Agree,
What about
It could actually be
Yes, I'm not sure which is best. We currently have
OK
I would find
Thank you for commenting. Actually, after thinking about it, I would not add So passing as
@nalimilan - what do you think of this?
Also for
we could use
To summarize, given the feedback, I would add the following in this PR:
The only reservation I have is the following: do we need to expand the API, or is it enough to add tutorials showing how to do these things? In sequence.
and now you have a column in your The generality of So in summary - do we feel that it is worth expanding the API with the new options?
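The "in sequence" alternative can be illustrated with a minimal sketch that uses only functionality DataFrames.jl already has: the `nrow` special case in `combine`, followed by ordinary broadcasting. The toy data and the column names `:g`, `:x`, `:n` and `:prop` are made up for illustration; this is not the `proprow` implementation discussed in this PR.

```julia
using DataFrames

# Toy data (hypothetical): group "a" has 2 rows, group "b" has 3 rows
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = 1:5)
gdf = groupby(df, :g; sort = true)  # sort=true makes the group order deterministic

# Step 1: count the rows in each group with the existing `nrow` special case
res = combine(gdf, nrow => :n)

# Step 2: divide by the total number of rows to get each group's share
res.prop = res.n ./ sum(res.n)

println(res)
```

A dedicated `proprow` would fold these two steps into a single `combine` call; the open question is whether that convenience justifies growing the mini-language.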
Maybe it's worth figuring out how a user might add functions that operate like this on the fly? Having an API for
We might extend the syntax to allow
Using I'd say it's also OK to special-case Regarding
I was also thinking about this. Realistically, I think it is highly unlikely we will start allowing such columns in the short term.
we could use
Do you see any of these as important to add before the 1.0 release?
But is
I'd say 1.0 is only about API stability. No features are really essential to have in that release. Some are nice to have for communication purposes (e.g. multithreading is shiny), but features discussed here can wait IMO.
OK. I am moving the milestone on this.
Sorry if I'm injecting noise into the discussion, but I was wondering: could it be more general to add a `combine(groupby(df, cols), OnIndices(f) => dest)` (same for `transform` and `select`) applies Or is the idea that there are so few relevant functions one may wish to compute with group indices that it's possible to special-case all of them?
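To make the `OnIndices(f)` suggestion concrete, the following sketch emulates it with the existing `transform` API. A wrapper receives some column of each group only to learn the group's length, then calls `f` on the within-group positions `1:n`. The helper name `on_indices` and the toy data are hypothetical and not part of DataFrames.jl. Having to pick an arbitrary source column (`:x` here) is exactly the awkwardness a built-in wrapper would remove.

```julia
using DataFrames

# Hypothetical helper, NOT a DataFrames.jl API: turn `f`, which expects the
# within-group positions 1:n, into a function of a column vector
on_indices(f) = v -> f(1:length(v))

df = DataFrame(g = ["a", "a", "b", "b", "b"], x = 1:5)
gdf = groupby(df, :g)

# Row number within each group; `:x` is used only to obtain the group length
out = transform(gdf, :x => on_indices(collect) => :rownumber)

# Any other function of the positions works the same way, e.g. position / group size
out2 = transform(gdf, :x => on_indices(i -> i ./ length(i)) => :relpos)

println(out)
```

Because `transform` on a `GroupedDataFrame` returns rows in the parent's original order, `:rownumber` lines up with `df` row by row.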
Thank you for commenting.
This was the idea. The objective is to make the most common functions easy to access for newcomers. I have this issue on my radar and eventually it will be resolved.
I would need to rewrite this PR from scratch, so I would close it, but before I do, let us discuss which extra features we need in the mini-language. In particular, do we need
The difference vs. dplyr is that our In general - defining
Starting with a restricted version would be OK, but I'm not sure that's really needed. I'd say the definition of I'm not sure about skipping missing values in the per-group results. The only case where that can happen is if you do
This is not that easy unfortunately. Consider the following:
would you expect
Having thought about it I propose to add the following syntaxes:
The rationale for this is that they all follow the same rule as I would not add In the future we can reconsider adding general @nalimilan + @pdeffebach + @jkrumbiegel -> can you please comment on whether you accept it? I will implement this PR next week.
In #3001 I propose the implementation, so I am closing this PR.
FWIW, I just discovered that ggplot2 has added an
This should be merged after #2554 is merged and then NEWS.md should be updated.
Other than that, the PR should be ready.