Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when getting reference model with k-fold cross validation for cv_varsel #485

Closed
adietzel opened this issue Dec 12, 2023 · 9 comments
Closed

Comments

@adietzel
Copy link

I am revisiting a snippet of code in my analysis that used to work with previous versions of brms and projpred (Projpred Issue #328, and on a different machine after making switch from Windows to Mac.

My work flow with reprex below is:

  1. Create custom stratiefied cross-validation folds with loo::kfold_split_stratified(),
  2. Refit the model with CV using brms::kfold()
  3. Create a custom reference model with the cvfits
  4. Select relevant variables with cv_varsel()

The syntax used here for specifying a custom k-fold cv ref model is also shown in #1286

library(brms) # version 2.20.4
library(projpred) # version 2.7.0

myK <- 2

data <- data.frame(
  Pres = sample(c(0, 1), 100, replace = TRUE),
  X1 = rnorm(100, 0, 1),
  X2 = rnorm(100, 0.2, 1))

RefMod <- brm(
  formula = Pres ~ X1 + X2,
  family = bernoulli(),
  data = data)

ids <- loo::kfold_split_stratified(K = myK, x = data$Pres)
myfits_K <- brms::kfold(RefMod, folds = ids, K = myK, save_fits = TRUE)

RefMod_CV <- brms:::get_refmodel.brmsfit(
  object = RefMod,
  cvfits = structure(list("fits" = myfits_K$fits[, "fit"]),
                     "K" = myK, "folds" = ids))

mycvvs <- cv_varsel(
  RefMod_CV,
  method = "forward",
  cv_method = "kfold",
  K = myK,
  cvfits = RefMod_CV$cvfits,
  seed = 86132035)

When running cv_varsel() I get the following error message:

"in family$linkinv(pred_sub) :
Argument eta must be a nonempty numeric vector
In addition: Warning message:
In parse_args_cv_varsel(refmodel = refmodel, cv_method = cv_method, :
The content of cvfits's sub-list called fits should be moved one level up (and element fits removed). The old structure will continue to work for a while, but is deprecated."

I would appreciate any help with understanding the main error message and also with how to set the ref model correctly now.

Thanks for your help!

Andreas

Also posted as issue on brms github page #1572

@fweber144
Copy link
Collaborator

Yes, this is a known issue with the deprecated structure of cvfits that was fixed by commit b5bdc31. I plan to release a new CRAN version this week and this fix will be included there. Despite this, as mentioned in the warning message, you are using the deprecated structure of cvfits. See the docs for argument cvfits at https://mc-stan.org/projpred/reference/refmodel-init-get.html for the new structure.

@fweber144
Copy link
Collaborator

Closing now since this was fixed by commit b5bdc31.

@adietzel
Copy link
Author

Thanks a lot!

Changing the code to

RefMod_CV <- brms:::get_refmodel.brmsfit(
  object = RefMod,
  cvfits = structure(myfits_K$fits,
                     folds = attr(myfits_K, "folds")))

as suggested by the commit fixed the issue. A
ll terms are selected but I do get another error message that I can't make sense of:

Running the search using the full dataset ...
50% of terms selected
100% of terms selected

Error in UseMethod("family") :
no applicable method for 'family' applied to an object of class "c('integer', 'numeric')"

Is there another issue with my code or will this issue be resolved by the next CRAN release?
Would appreciate your thoughts.

Thanks a lot for your help

@fweber144
Copy link
Collaborator

I would need a reprex for this

@adietzel
Copy link
Author

Here is the one from above with the changes to the cvfits structure:

library(brms) # version 2.20.4
library(projpred) # version 2.7.0

myK <- 2

data <- data.frame(
  Pres = sample(c(0, 1), 100, replace = TRUE),
  X1 = rnorm(100, 0, 1),
  X2 = rnorm(100, 0.2, 1))

RefMod <- brm(
  formula = Pres ~ X1 + X2,
  family = bernoulli(),
  data = data)

ids <- loo::kfold_split_stratified(K = myK, x = data$Pres)
myfits_K <- brms::kfold(RefMod, folds = ids, K = myK, save_fits = TRUE)

RefMod_CV <- brms:::get_refmodel.brmsfit(
  object = RefMod,
  cvfits = structure(myfits_K$fits,
                     folds = attr(myfits_K, "folds")))

mycvvs <- cv_varsel(
  RefMod_CV,
  method = "forward",
  cv_method = "kfold",
  K = myK,
  cvfits = RefMod_CV$cvfits,
  seed = 86132035)

@fweber144
Copy link
Collaborator

You do not specify cvfits as required. Either use the new structure (see #485 (comment)) or use the deprecated structure (which will not be guaranteed to work forever), but then you need the fixed projpred version.

@fweber144
Copy link
Collaborator

fweber144 commented Dec 12, 2023

In any case, you may also use run_cvfun() added in projpred 2.7.0 (see the corresponding NEWS entry).

@adietzel
Copy link
Author

Thanks for pointing me in the right direction!

Code below now works for those interested. Following Frank's advice I created the cvfits object with the function run_cvfun().

To provide an alternative approach based on my original code, I also had another look at the structure required for the cvfits object. Since I am working with custom stratified cross validation folds, the second approach is more flexible than using run_cvfun(). Though I am not sure whether custom CV folds can also be created with run_cvfun() or alternative functions.

library(brms) # version 2.20.4
library(projpred) # version 2.7.0

myK <- 2

data <- data.frame(
  Pres = sample(c(0, 1), 100, replace = TRUE),
  X1 = rnorm(100, 0, 1),
  X2 = rnorm(100, 0.2, 1))

RefMod <- brm(
  formula = Pres ~ X1 + X2,
  family = bernoulli(),
  data = data)

# Either or but first won't work with custom folds to my knowledge
cvfits <- run_cvfun(RefMod, K = myK, save_fits = TRUE)

# Custom stratified folds
ids <- loo::kfold_split_stratified(K = myK, x = data$Pres)
myfits_K <- brms::kfold(RefMod, folds = ids, K = myK, save_fits = TRUE)
cvfits = structure(myfits_K$fits[, "fit"],
                   "folds" = ids)

# Variable selection
mycvvs <- cv_varsel(
  RefMod,
  method = "forward",
  cv_method = "kfold",
  K = myK,
  cvfits = cvfits,
  seed = 86132035)

@fweber144
Copy link
Collaborator

Though I am not sure whether custom CV folds can also be created with run_cvfun() or alternative functions.

This was implemented in #480 and hence will also be included in the upcoming CRAN release (if you need this feature until that CRAN release is completed, you can use the GitHub version):

* `run_cvfun()` has gained a new argument `folds`, accepting a vector of fold indices (the default is `NULL`, meaning that the folds are constructed internally, as before). This new argument is helpful, for example, to perform a stratified K-fold CV in a convenient manner (an example of this has been added to the `?run_cvfun` help). (GitHub: #480)

Btw, save_fits = TRUE can be omitted from the run_cvfun() call (gets ignored anyway).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants