A small comment about new variables in data frames #42

Open
justinmcgrath opened this issue Oct 7, 2022 · 9 comments

@justinmcgrath

justinmcgrath commented Oct 7, 2022

Some examples give this code for creating new variables in a data frame:
mtcars$hwratio <- mtcars$hp / mtcars$wt

The corresponding Tidy version is this:
mtcars %>% mutate(hwratio=hp/wt) -> mtcars

Maybe it's not a "beginner" topic, but I think the typical base R way is this:
mtcars <- within(mtcars, hwratio <- hp/wt)

There's really no difference between that and the Tidy version, since as far as I can tell, mutate and within do the same thing. Tidy insists on the %>% operator, though. Without that, it's nearly identical (mtcars <- mutate(mtcars, hwratio=hp/wt)), and one wonders why you would add a slew of dependencies simply to rename "within" to "mutate".
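
For reference, the two base R forms mentioned above can be compared directly. This is a minimal sketch using only base R (the dplyr version is left out so the snippet runs without extra packages); the names `a` and `b` are illustrative, not from the examples being discussed.

```r
# Plain assignment with `$<-` on a copy of the built-in data set
a <- mtcars
a$hwratio <- a$hp / a$wt

# within(): evaluates the expression inside the data frame
# and returns a modified copy
b <- within(mtcars, hwratio <- hp / wt)

# Both approaches compute the same new column
identical(a$hwratio, b$hwratio)

# The original data set is untouched in both cases
"hwratio" %in% names(mtcars)
```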

@dusadrian

dusadrian commented Oct 7, 2022

I've always wondered why within() returns a copy of the object instead of modifying the object.
My natural expectation, when I specify that something should happen "within", is for it to actually happen, without having to bother with overwriting the object.

This should / could have been enough, IMO:

within(mtcars, hwratio <- hp/wt)

EDIT: I've actually put together a quick function in the development version of package admisc, and it seems to work:

mt <- mtcars
inside(mt, hwratio <- hp/wt)

dim(mtcars) # 32 11

dim(mt) # 32 12
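
The thread does not show how inside() is implemented in admisc, so the following is only a rough sketch of how such a function could work in principle. The name `inside_sketch` is hypothetical, and the approach (capturing the argument name with substitute() and reassigning it in the caller's frame) is an assumption, not the package's actual code.

```r
# Hypothetical helper: runs within() and writes the result back
# to the variable that was passed in, in the caller's environment.
inside_sketch <- function(data, expr) {
  name <- substitute(data)          # the symbol the caller passed, e.g. `mt`
  stopifnot(is.name(name))          # only plain variable names are supported
  env <- parent.frame()             # the caller's environment
  # Build the call within(<name>, <expr>) and evaluate it in the caller
  result <- eval(substitute(within(data, expr)), env)
  # Overwrite the caller's variable with the modified copy
  assign(as.character(name), result, envir = env)
  invisible(NULL)
}

mt <- mtcars
inside_sketch(mt, hwratio <- hp / wt)

dim(mtcars)  # 32 11
dim(mt)      # 32 12
```

Note that the object is still copied and reassigned behind the scenes; the assignment is hidden from the caller rather than avoided.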

@botanybay

Within the tidyverse, I would go for the shorthand:

mtcars %<>% mutate(hwratio=hp/wt)

However, there is no reason to do so, because a very simple assignment in base R does the trick neatly.

That said, a pipe over multiple lines is where I would go for a "tidy" version: I can add a single line or comment one out and still have linearly legible code. 🤷
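
To illustrate the multi-line case, here is a small sketch. It swaps in the native `|>` pipe and base functions (an assumption on my part, requiring R 4.1 or later) so that it runs without the tidyverse; the same shape applies to `%>%` with mutate() and filter().

```r
# Each step sits on its own line, so a step can be added
# or commented out without touching the others.
result <- mtcars |>
  transform(hwratio = hp / wt) |>   # add the new variable
  subset(cyl == 4) |>               # keep four-cylinder cars
  head(3)                           # first three rows

dim(result)
```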

@BroVic

BroVic commented Oct 7, 2022

I think, strictly speaking, tidy proponents don't even favour the %<>% pipe assignment. Is this operator even exported by the tidyverse packages? I don't recall it being so...

@justinmcgrath
Author

> I've always wondered why within() returns a copy of the object instead of modifying the object. My natural expectation, when I specify that something should happen "within", is for it to actually happen, without having to bother with overwriting the object.
>
> This should / could have been enough, IMO:
>
> within(mtcars, hwratio <- hp/wt)
>
> EDIT: I've actually put together a quick function in the development version of package admisc, and it seems to work:
>
> mt <- mtcars
> inside(mt, hwratio <- hp/wt)
>
> dim(mtcars) # 32 11
>
> dim(mt) # 32 12

One of the easiest ways to make a program difficult to understand is to modify an object without using an assignment operator. That is deliberately difficult in R.
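
A small base R illustration of that point: a function that modifies its argument only changes a local copy, so the caller's object stays the same unless the result is assigned. The function name `add_ratio` and the variable names are made up for this example.

```r
# Modifies its argument locally and returns the modified copy
add_ratio <- function(df) {
  df$hwratio <- df$hp / df$wt
  df
}

cars <- mtcars
out <- add_ratio(cars)

ncol(cars)  # 11: the caller's object is unchanged
ncol(out)   # 12: the change is only visible in the returned value
```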

@dusadrian

I agree, but really, what is the purpose of creating a new variable without overwriting the object?
If I am not grossly mistaken, overwriting the object always happens and hence it should be superfluous.

These are all equivalent, in my mind:

mtcars$hwratio <- mtcars$hp / mtcars$wt

# two assignment operators are definitely more difficult to understand for beginners
mtcars <- within(mtcars, hwratio <- hp/wt)

# perhaps this is more comprehensible
mtcars$hwratio <- with(mtcars, hp/wt)

# or even better
inside(mtcars, hwratio <- hp/wt)

Note that the latter does have an assignment operator that signals (or should signal) the creation of a new variable.

Anyway, if such a function is clearly documented, users should be aware of it and decide accordingly.
I for one will definitely use it from now on: it saves unnecessary typing and it's the simplest of all the above.
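
When comparing these variants, it helps to keep with() and within() apart, since they return different things. A minimal base R sketch (the names `v`, `w` and `d` are illustrative):

```r
# with() returns the value of the expression, here a numeric vector
v <- with(mtcars, hp / wt)
is.data.frame(v)  # FALSE
length(v)         # 32

# An assignment inside with() does not change that: the value of
# the assignment is still just the vector
w <- with(mtcars, hwratio <- hp / wt)
identical(v, w)

# within() returns a modified copy of the whole data frame
d <- within(mtcars, hwratio <- hp / wt)
is.data.frame(d)
```

So assigning the result of with() back to the data frame's name would replace the data frame with a vector, whereas within() keeps all columns and adds the new one.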

@BroVic

BroVic commented Oct 8, 2022

transform(mtcars, hwratio = hp/wt)?

@dusadrian

dusadrian commented Oct 8, 2022

Oh my...
Thank you @BroVic, good to see there actually is something in base R that confirms my intuition.
EDIT: actually, this doesn't do anything different from within(). It does not modify the object the way inside() does, but returns a copy of the (modified) object.
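
A quick way to check this behaviour, as a sketch in base R (the names `res` and `cars` are illustrative):

```r
# transform() returns a modified copy; the input is left alone
res <- transform(mtcars, hwratio = hp / wt)

"hwratio" %in% names(mtcars)  # FALSE
"hwratio" %in% names(res)     # TRUE

# To keep the new column under the same name, reassign explicitly
cars <- mtcars
cars <- transform(cars, hwratio = hp / wt)
dim(cars)
```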

@BroVic

BroVic commented Oct 9, 2022

No, it doesn't modify in place. I was just referring to the tidyverse discourse. 👍

I think R's semantics do not give such modifications any advantage since, internally, every data frame is copied once it's modified.

@matloff
Owner

matloff commented Mar 11, 2023

I really don't think use of within() is standard base R. I've never seen an R book or tutorial use it. And I certainly would not recommend teaching it, for exactly the same reason. Sorry for the long delay in replying.
