remapping categorical variables in base and tidy #38

Open
markvanderloo opened this issue Aug 26, 2022 · 7 comments
Comments

@markvanderloo

Hi Norm,

I just read your base-vs-tidy document. Very insightful, thank you! I'm sharing it with our internal R users.

Regarding the last section on relabeling classes: have you considered this option?

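map <- character()                      # empty lookup vector
map[3:5] <- c("three", "four", "five")  # positions 3 to 5 hold the labels; 1 and 2 stay NA
mtcars$gear <- map[mtcars$gear]         # index the lookup by the gear count itself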

Using vectors as lookup tables, indexed by position as above or by name, has been an extremely useful trick for me in many cases, although I admit that it may not be very intuitive for beginning users.
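
A minimal sketch of the same trick keyed by name rather than by position, which may be easier to extend when the codes are not small positive integers. It works on a copy of the data; the names `cars` and `gear_map` are illustrative and not from the original post.

```r
# Work on a copy so the built-in mtcars dataset is left untouched.
cars <- mtcars

# Lookup table keyed by the character form of each gear count.
gear_map <- c("3" = "three", "4" = "four", "5" = "five")

# Index by name; unname() drops the names carried over from the lookup.
cars$gear <- unname(gear_map[as.character(cars$gear)])

table(cars$gear)
```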

A more common use case, it seems to me, is mapping character to character, in which case one simply does something like

map <- c("a"="alpha", "b"="beta")
map[letters]

You even get NA for unmapped values, which seems to me to be the desired behavior in many cases.
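
A small sketch of the NA behavior, plus one possible way to keep the original value when no mapping exists. The fallback step and the names `x`, `mapped`, and `kept` are assumptions added for illustration, not part of the original suggestion.

```r
map <- c("a" = "alpha", "b" = "beta")
x <- c("a", "b", "c")

# "c" has no entry in the lookup, so its result is NA.
mapped <- unname(map[x])

# Optional fallback: keep the original value wherever the lookup returned NA.
kept <- ifelse(is.na(mapped), x, mapped)

print(mapped)
print(kept)
```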

Thanks again for writing all this up!
Best,
Mark

@danielreispereira

I still think that nested ifelse() calls are preferable to the case_when() construct. After all, that is what an average person would do in Excel: recode the levels manually with IF()s.

Teaching case_when() suggests that, instead of sitting down and thinking for 10 minutes, one should go to the documentation of yet another function.

The point is that the R-base documentation is not as sexy, and does not appeal to newcomers.
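
For reference, a sketch of what the nested ifelse() recoding might look like for the gear example from this thread. The variable names and the explicit NA for any other value are assumptions made for illustration.

```r
gear <- mtcars$gear

# Each ifelse() handles one level and passes the rest to the next call;
# anything that is not 3, 4, or 5 falls through to NA.
gear_label <- ifelse(gear == 3, "three",
              ifelse(gear == 4, "four",
              ifelse(gear == 5, "five", NA_character_)))

table(gear_label)
```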

@BroVic

BroVic commented Aug 30, 2022

@markvanderloo that's a very useful trick! I think we need a function in base R that does this efficiently.

@dusadrian

dusadrian commented Aug 30, 2022

Another solution in base R:

library(admisc)
recode(mtcars$gear, "3 = three; 4 = four; 5 = five")
#  [1] "four"  "four"  "four"  "three" "three" "three" "three" "four"  "four"  "four"  "four" 
# [12] "three" "three" "three" "three" "three" "three" "four"  "four"  "four"  "three" "three"
# [23] "three" "three" "three" "four"  "five"  "five"  "five"  "five"  "five"  "four"

Or, since this is a categorical variable, why not properly declare it as categorical:

library(declared)
gear <- declared(mtcars$gear, label = "Number of gears", labels = c("Three" = 3, "Four" = 4, "Five" = 5))
gear
# <declared<integer>[32]> Number of gears
#  [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
# 
# Labels:
#  value label
#      3 Three
#      4  Four
#      5  Five

w_table(gear)

#       fre    rel   per   cpd
#       ----------------------
# Three  15  0.469  46.9  46.9 
#  Four  12  0.375  37.5  84.4 
#  Five   5  0.156  15.6 100.0 
#       ----------------------
#        32  1.000 100.0

@BroVic

BroVic commented Aug 30, 2022

This is another nice approach, but I think this is so basic that we ought to have it natively in the language. Or does anyone know of such a solution? I remember trying to get this done with the replace() function.
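
Two candidates that ship with R itself are factor() with its levels and labels arguments, and match(). The following is a sketch only, not something proposed in the thread; the names `codes`, `labels`, `via_factor`, and `via_match` are illustrative.

```r
codes  <- c(3, 4, 5)
labels <- c("three", "four", "five")

# factor() pairs each level with its label; as.character() drops the factor class.
via_factor <- as.character(factor(mtcars$gear, levels = codes, labels = labels))

# match() returns each value's position in `codes`; unmatched values give NA.
via_match <- labels[match(mtcars$gear, codes)]

# Both approaches should agree element by element.
identical(via_factor, via_match)
```

Keeping the result as a factor (that is, skipping as.character()) also preserves the level order, which can matter for tables and plots.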

@dusadrian

@BroVic: that raises the question of what "base" R means: is it anything that does not use tidy, or is it the R package base?

I believe this is about the difference between the "normal" R language (including contributed R packages that use the normal R language), and the "tidy" R language / dialect.

It would be pointless to expect to solve anything using just the base package: this is why contributed packages exist in the first place.

@BroVic

BroVic commented Aug 30, 2022

@dusadrian Yes, I agree with you. But if you ask me, an operation as basic as this should be available by default.

@matloff
Owner

matloff commented Mar 11, 2023

Once again, keep it simple. I think the use of named vectors falls into that realm, yes.
