
<vsel_object>$summaries$<sub[[k]]_or_ref>$w and weights in get_stat() #330

Closed · fweber144 opened this issue Jul 6, 2022 · 5 comments

fweber144 (Collaborator) commented Jul 6, 2022

I'm sorry, but I still don't understand what element <vsel_object>$summaries$<sub[[k]]_or_ref>$w is supposed to contain: The original observation weights (e.g., the observation-specific numbers of trials in a binomial family with > 1 trials for at least one observation)? Or the weights of the CV folds the observations are in? Or some combination of the two? Or something completely different (although I guess it must have to do with the observations because it has length $N$ with $N$ denoting the number of observations)?

@AlejandroCatalina, could you clarify this? Sorry for bothering you, but this is an important issue because it affects all users with nontrivial observation weights (i.e., where at least one observation weight is not 1). Using Git blame, I tried to find the original commit introducing this element w, but I ended up at commit b05a8f4, a large commit with many changes that is hard to disentangle.

Related questions concerning this element w already came up in #94, #173, and #188 (note the later comments in #94), but have never really been clarified.

The reason why I stumbled across this again is related to #329: Putting #329 into a "real application" context shows that the weights in projpred:::auc() are not specified correctly:

set.seed(6834)
nobs <- 100L
mu <- binomial()$linkinv(-0.42 + rnorm(nobs))
# Observation-specific numbers of trials, i.e., nontrivial observation weights:
trials <- sample.int(30L, size = nobs, replace = TRUE)
y <- rbinom(nobs, size = trials, prob = mu)
dat <- data.frame(y = y, trials = trials)

library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
# Intercept-only reference model for the aggregated binomial response:
r_fit <- stan_glm(
  cbind(y, trials - y) ~ 1,
  family = binomial(),
  data = dat,
  seed = 101
)

library(projpred)
vs <- varsel(r_fit,
             method = "forward",
             seed = 12354)
# Step into projpred:::auc() to inspect the weights it receives:
debug(projpred:::auc)
summary(vs, stats = "auc", seed = 43465)

Debugging the last line shows that inside projpred:::auc(), the weights are all 1, even though the original ones in dat$trials are not. While tracking down this issue, I realized that argument weights of get_stat() is probably misspecified in general (i.e., all stats computed by get_stat() are affected). And since all calls of get_stat() are within .tabulate_stats(), it was easy to see that <vsel_object>$summaries$<sub[[k]]_or_ref>$w is the problem here. (Note that despite this issue here, #329 is important on its own because it shows that even if the weights were specified correctly, the returned AUC value would be incorrect.)
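
To make concrete what "taking the observation weights into account" would mean for the AUC here, a minimal sketch (purely illustrative, not projpred's implementation; the helper auc_expanded() is hypothetical): for a binomial family with > 1 trials, weighting by the numbers of trials amounts to expanding each aggregated observation into its underlying binary observations and computing the AUC on the expanded data:

auc_expanded <- function(y, trials, mu) {
  # Expand: y_i successes and (trials_i - y_i) failures per observation i.
  y_bin <- unlist(mapply(function(y_i, n_i) c(rep(1L, y_i), rep(0L, n_i - y_i)),
                         y, trials, SIMPLIFY = FALSE))
  mu_bin <- rep(mu, times = trials)
  # Mann-Whitney form of the AUC (ties counted with weight 1/2 via average ranks):
  r <- rank(mu_bin)
  n1 <- sum(y_bin == 1L)
  n0 <- sum(y_bin == 0L)
  (sum(r[y_bin == 1L]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
# E.g., with the data from above and the reference model's fitted probabilities:
# auc_expanded(dat$y, dat$trials, fitted(r_fit))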

fweber144 added a commit to fweber144/projpred that referenced this issue Jul 6, 2022
@fweber144 fweber144 changed the title <vsel_object>$summaries$<sub_or_ref>$w and weights in get_stat() <vsel_object>$summaries$<sub[[k]]_or_ref>$w and weights in get_stat() Jul 6, 2022
@fweber144 (Collaborator, Author)

In contrast to cv_varsel(), it seems like varsel() never returns this element <vsel_object>$summaries$<sub[[k]]_or_ref>$w, so is w supposed to contain the weights of the CV folds the observations are in? But then, what about the original observation weights? The example above shows that at least in case of the AUC, argument weights of get_stat() needs the observation weights.

I also noticed that <vsel_object>$summaries$ref$w (so w for the reference model) is always NULL which is strange because it looks like lines

summ <- summ_ref
res <- get_stat(summ$mu, summ$lppd, varsel$d_test, stat, mu.bs = mu.bs,
                lppd.bs = lppd.bs, weights = summ$w, alpha = alpha, ...)
expect a non-NULL element w also for the reference model (which would also make sense if w is supposed to contain the weights of the CV folds the observations are in because the reference model's performance is also cross-validated if the submodels' performances are).
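
For reference, this can be checked interactively with the vs object from the reprex above (a quick sketch; the list structure is assumed from the notation <vsel_object>$summaries$<sub[[k]]_or_ref>$w used in this issue and may differ across projpred versions):

str(vs$summaries$sub[[1]]$w)  # not returned by varsel()
str(vs$summaries$ref$w)       # NULL for the reference model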

@fweber144 (Collaborator, Author)

So it seems like element <vsel_object>$summaries$<sub[[k]]_or_ref>$w is supposed to contain the weights of the CV folds the observations are in. The actual observation weights (those supplied by the user) are not taken into account appropriately (in get_stat()), so this needs to be fixed.

@fweber144 fweber144 mentioned this issue Jul 27, 2022
fweber144 added a commit that referenced this issue Jul 27, 2022
AlejandroCatalina (Collaborator) commented Oct 11, 2022 via email

@fweber144 (Collaborator, Author)

Thanks for the confirmation :)

@fweber144 (Collaborator, Author)

Just for the sake of completeness:

The actual observation weights (those supplied by the user) are not taken into account appropriately (in get_stat()), so this needs to be fixed.

from #330 (comment) has been addressed by 35d41ce (within PR #344) and 0e68163 (which fixes #342).
