Function for returning the original data frame with fold assignments appended? #468

Open
mikemahoney218 opened this issue Mar 6, 2024 · 1 comment

Comments

@mikemahoney218
Member

mikemahoney218 commented Mar 6, 2024

Feature

Over on spatialsample, there have been a few requests (tidymodels/spatialsample#158, tidymodels/spatialsample#157) for a function that basically works like this:

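library(rsample)
library(magrittr)  # for %>%
library(generics)  # for the augment() generic

# S3 method for augment(): stack the assessment (held-out) rows of every
# split and record which split each row came from.
augment.rset <- function(rset, ..., fold_column = "fold") {
  purrr::list_rbind(
    purrr::map(
      seq_len(nrow(rset)),
      function(fold) {
        # Rows held out in this split
        fold_members <- get_rsplit(rset, fold) %>%
          assessment()
        # Label them with the split's row index in the rset
        fold_members[[fold_column]] <- fold
        fold_members
      }
    )
  )
}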

vfold_cv(Orange) %>%
  augment()
#>    Tree  age circumference fold
#> 1     1 1004           115    1
#> 2     3 1231           115    1
#> 3     5 1004           125    1
#> 4     5 1231           142    1
#> 5     2  118            33    2
#> 6     2 1231           172    2
#> 7     4  664           112    2
#> 8     5  484            49    2
#> 9     1  118            30    3
#> 10    2 1582           203    3
#> 11    3  118            30    3
#> 12    4 1231           179    3
#> 13    1  484            58    4
#> 14    1 1582           145    4
#> 15    4 1004           167    4
#> 16    5  118            30    4
#> 17    1  664            87    5
#> 18    2 1004           156    5
#> 19    3  484            51    5
#> 20    5 1372           174    5
#> 21    1 1372           142    6
#> 22    2  664           111    6
#> 23    4 1372           209    6
#> 24    1 1231           120    7
#> 25    3 1582           140    7
#> 26    4  118            32    7
#> 27    2 1372           203    8
#> 28    3 1372           139    8
#> 29    5  664            81    8
#> 30    2  484            69    9
#> 31    3 1004           108    9
#> 32    5 1582           177    9
#> 33    3  664            75   10
#> 34    4  484            62   10
#> 35    4 1582           214   10

Created on 2024-03-06 with reprex v2.0.2

I think this is wanted both as an "escape hatch" from spatialsample, to go and use these CV objects with models that aren't (yet?) built into the tidymodels framework, and to make it easier to visualize fold assignments. The above is basically how autoplot.spatial_rset gets fold assignments for its own visualizations.
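As a rough illustration of the "escape hatch" use case, here is a minimal sketch of cross-validating a plain `lm()` fit once the fold assignments sit in an ordinary column. The object names (`assigned`, `rmse_by_fold`) and the choice of model are assumptions for the example, not anything proposed for rsample itself; the stacking step mirrors `augment.rset` above.

```r
library(rsample)

set.seed(468)
folds <- vfold_cv(Orange, v = 5)

# Stack the assessment sets with a fold index, as in augment.rset above
assigned <- purrr::list_rbind(
  purrr::map(
    seq_len(nrow(folds)),
    function(fold) {
      fold_members <- assessment(get_rsplit(folds, fold))
      fold_members[["fold"]] <- fold
      fold_members
    }
  )
)

# Hypothetical downstream use: manual CV of a model fitted outside tidymodels
rmse_by_fold <- vapply(
  sort(unique(assigned$fold)),
  function(k) {
    train <- assigned[assigned$fold != k, ]
    held_out <- assigned[assigned$fold == k, ]
    fit <- lm(circumference ~ age, data = train)
    sqrt(mean((held_out$circumference - predict(fit, newdata = held_out))^2))
  },
  numeric(1)
)
```

Because every row of `Orange` is held out exactly once in non-repeated V-fold CV, `assigned` has the same rows as the original data, so filtering on `fold` recovers each analysis/assessment pair.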

Would it make sense to add a function like this to rsample?

@mikemahoney218
Member Author

Thinking about this for a second longer: the implementation above wouldn't work with repeated CV (or nested, I think).
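One possible way around the repeated-CV problem, sketched under assumptions: instead of a single integer index, carry the rset's own id columns (`id`, and `id2` when present) onto each assessment row. The helper name `assessment_with_ids` is hypothetical and not part of rsample, and this sketch does not attempt to handle nested resampling.

```r
library(rsample)

# Hypothetical helper (not an rsample function): stack the assessment sets
# and copy every id column of the rset onto the held-out rows.
assessment_with_ids <- function(rset) {
  id_cols <- grep("^id", names(rset), value = TRUE)
  purrr::list_rbind(
    purrr::map(
      seq_len(nrow(rset)),
      function(i) {
        held_out <- assessment(get_rsplit(rset, i))
        for (col in id_cols) {
          held_out[[col]] <- rset[[col]][[i]]
        }
        held_out
      }
    )
  )
}

set.seed(468)
repeated <- vfold_cv(Orange, v = 5, repeats = 2)
res <- assessment_with_ids(repeated)
```

With repeats, each original row shows up once per repeat in the result, so the output is in long format rather than "the original data frame with one column appended"; the `id`/`id2` pair is what keeps the rows distinguishable.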
