
Add R2 #205

Closed
wants to merge 1 commit into from


sims1253

@sims1253 sims1253 commented Oct 28, 2022

This is based on the code I wrote for bayesim. Related issue #201

@jgabry
Member

jgabry commented Nov 15, 2022

Thanks for the PR, will try to review soon!

@avehtari what do you think about adding this?

@jgabry
Member

jgabry commented Nov 15, 2022

@sims1253 Can you pull the latest changes from the master branch into your PR (I updated the GitHub Actions workflow files on the master branch) so that the tests run again? Thanks

@sims1253
Author

sims1253 commented Nov 15, 2022

Forgot to push the reference values for the tests >.> I lost them among all the styler changes I didn't want to add to this PR.
I'm not too familiar with R generics or with what the plans for loo are, so that part might need some touching up, but I figured the R2 code was the interesting part and the rest is probably just cleanup.

@codecov-commenter

Codecov Report

Merging #205 (24df877) into master (9b9a8a8) will decrease coverage by 0.04%.
The diff coverage is 88.46%.

@@            Coverage Diff             @@
##           master     #205      +/-   ##
==========================================
- Coverage   93.48%   93.43%   -0.05%     
==========================================
  Files          29       30       +1     
  Lines        2715     2741      +26     
==========================================
+ Hits         2538     2561      +23     
- Misses        177      180       +3     
| Impacted Files | Coverage Δ |
| --- | --- |
| R/loo_r2.R | 88.46% <88.46%> (ø) |


@avehtari
Collaborator

> @avehtari what do you think about adding this?

I'm in favor of adding R2. We need to be careful that all these additional utility/loss functions (see #202 and #203) have the same behavior for the user.

Based on a quick look, the SE is not computed correctly, as it ignores the uncertainty in ss_y. If I remember correctly, we went through the correct derivation with @LeeviLindgren in the summer?

@LeeviLindgren
Contributor

@avehtari We derived the first-order Taylor approximation of the variance. What we came up with is:

[Image: first-order Taylor approximation of $\mathrm{Var}(R^2_{\text{loo}})$ in terms of $MSE_y$ and $MSE_{\hat{e}}$ (not reproduced)]

where $MSE_y$ and $MSE_{\hat{e}}$ are the mean squared errors of predicting the data with its mean and with the LOO predictions, respectively (note that the equality sign in the image should be $\approx$).

@sims1253
Author

Thanks for the feedback. If my memory serves me right, we defined R2 in a slightly different way than in your case study, in order to get pointwise entries for PSIS-LOO and the SE. But this was a year ago, so I need to check my notes to see what I thought back then, and I won't have time this week.

@avehtari
Collaborator

@sims1253, I'm just checking whether you have had time to check your notes?

@avehtari
Collaborator

@sims1253 any update on this?

@sims1253
Author

sims1253 commented Nov 29, 2023

I assumed that I could use the same formula as in `table_of_estimates` to get the SE from the pointwise R2. IIRC, this was one of the reasons we used the RMSE-inspired $R^2 = 1 - SS_e/SS_y$ approach.
I would have submitted a PR for our RMSE version, but I was too late (if whatever we tried is even correct :D).
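For reference, a minimal Python sketch of the pointwise SE rule mentioned here, assuming that `table_of_estimates` sums the pointwise values for the estimate and uses $\sqrt{N \cdot \widehat{\mathrm{Var}}}$ of them for the SE (check the loo source to confirm). The function name `sum_and_se` is hypothetical:

```python
import math
from statistics import variance


def sum_and_se(pointwise):
    """Return (estimate, SE) from pointwise contributions.

    Estimate: the sum of the N pointwise values.
    SE: sqrt(N * sample variance of the pointwise values), i.e. the usual
    standard error of a sum of N independent contributions.
    """
    n = len(pointwise)
    estimate = math.fsum(pointwise)
    # statistics.variance uses the n - 1 denominator
    se = math.sqrt(n * variance(pointwise))
    return estimate, se


# Made-up pointwise values, only for illustration
est, se = sum_and_se([-1.2, -0.8, -1.5, -0.9])
print(est, se)
```

This rule treats the pointwise terms as independent contributions; for $R^2$ that is exactly what the review questions, because every pointwise term shares the same $SS_y$.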

This is from my notes:
Starting with the RMSE as
$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \hat{y}_n)^2}$$

We integrate/mean over the posterior samples (and ignore the sum and 1/N until later)
$$\int...\int\sqrt{\frac{1}{N}\sum_{n=1}^N(y_n - \hat{y}_n)^2} p(\hat{y}_1) d\hat{y}_1...p(\hat{y}_S)d\hat{y}_S$$

Which we can expand into
$$\int...\int \left[(y_1 - \hat{y}_1)^2 + ... + (y_N - \hat{y}_N)^2\right] p(\hat{y}_1) d\hat{y}_1...p(\hat{y}_S)d\hat{y}_S$$

and rearrange into
$$\int...\int (y_1 - \hat{y}_1)^2 p(\hat{y}_1) d\hat{y}_1...p(\hat{y}_S)d\hat{y}_S + ... + \int...\int (y_N - \hat{y}_N)^2 p(\hat{y}_1) d\hat{y}_1...p(\hat{y}_S)d\hat{y}_S$$

If the indices in $(y_s - \hat{y}_s)^2$ and $p(\hat{y}_s)d\hat{y}_s$ are not the same, the squared term is constant with respect to that integral, so we can pull it out.
Pulling the $(y_s - \hat{y}_s)^2$ terms out of the integrals leaves $S$ integrals $\int p(\hat{y}_s)d\hat{y}_s$ that all equal 1, as they are integrals over a density.

I am missing a step here in my notes for handling all those 1's :( Probably obvious if one can handle integrals better than I can.

$$\sum_{n=1}^N \int (y_n - \hat{y}_n)^2p(\hat{y}_n)d\hat{y}_n$$

Adding back the sum and 1/N gives us
$$\frac{1}{N}\sum_{n=1}^N \sqrt{\int (y_n - \hat{y}_n)^2p(\hat{y}_n)d\hat{y}_n}$$

which is defined pointwise and integrates over posterior samples. So we can apply PSIS weights and get the benefit of a pointwise result for SE estimation.
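The pointwise quantity $\int (y_n - \hat{y}_n)^2 p(\hat{y}_n) d\hat{y}_n$ would in practice be estimated from $S$ draws as a weighted average. A minimal sketch, assuming the weights for each observation (for example PSIS weights) are already available; computing them is out of scope here, and `expected_sq_error` is a hypothetical helper:

```python
import math


def expected_sq_error(y, draws, weights):
    """Weighted Monte Carlo estimate of E[(y_n - yhat_n)^2] for each observation n.

    y:       N observed values
    draws:   N lists of S predictive draws yhat_{n,s}
    weights: N lists of S nonnegative weights (for example PSIS weights);
             each row is normalized here so that it sums to 1
    """
    out = []
    for y_n, draws_n, w_n in zip(y, draws, weights):
        total = math.fsum(w_n)
        weighted = math.fsum(w * (y_n - d) ** 2 for w, d in zip(w_n, draws_n))
        out.append(weighted / total)
    return out


# Two observations, two draws each; the weights are made up for illustration
print(expected_sq_error([1.0, 2.0], [[0.0, 2.0], [2.0, 4.0]], [[1.0, 1.0], [1.0, 3.0]]))
```

With equal weights this is the plain posterior mean over the $S$ draws; taking the square root of each entry gives the per-observation term inside the sum in the last formula.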

(I am sorry but I can't find the reason for these not to render properly)

Moving on to R2, we use the following definition of R2
$$R^2 = 1 - \frac{SS_e}{SS_y} = 1 - \frac{\sum_{n = 1}^N (y_n - \hat{y}_n)^2}{\sum_{n = 1}^N (y_n - \bar{y})^2}$$

As there are no posterior samples in the denominator, the integral form looks like this:

$$\int R^2 dp(\hat{y}) = 1 - \frac{\sum_{n=1}^N \int (y_n - \hat{y}_n)^2 dp(\hat{y}_n)}{\sum_{n=1}^N (y_n - \bar{y})^2}$$

$$= \sum_{n = 1}^N \left( \frac{1}{N} - \frac{\int (y_n - \hat{y}_n)^2 dp(\hat{y}_n)}{\sum_{k = 1}^N (y_k - \bar{y})^2} \right)$$

For PSIS we only have to weight the numerator, and again we have a pointwise result.
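The decomposition above can be checked numerically: with $SS_y$ treated as a fixed constant, the pointwise terms $\frac{1}{N} - \frac{e_n}{SS_y}$ (where $e_n$ is the possibly PSIS-weighted expected squared error of observation $n$) sum to $1 - SS_e/SS_y$. A minimal sketch with made-up numbers; `pointwise_r2` is a hypothetical name:

```python
import math


def pointwise_r2(y, e):
    """Split R^2 = 1 - SS_e / SS_y into N additive pointwise terms.

    y: observed values
    e: expected squared errors e_n, one per observation
    SS_y is computed once from y and treated as a fixed constant.
    """
    n = len(y)
    y_bar = math.fsum(y) / n
    ss_y = math.fsum((y_k - y_bar) ** 2 for y_k in y)
    return [1.0 / n - e_n / ss_y for e_n in e]


y = [1.0, 2.0, 3.0, 6.0]
e = [0.5, 0.25, 1.0, 0.25]  # made-up expected squared errors
terms = pointwise_r2(y, e)
print(terms, math.fsum(terms))
```

The sum is correct, but treating $SS_y$ as a constant is precisely the assumption that the review points out when computing the SE from these terms.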

@avehtari
Collaborator

It seems GitHub can't handle a sum inside a fraction. I was able to preview them elsewhere.

It seems like you are confusing things: the PSIS part is not the issue; the issue is that we don't know the distribution of future data, which makes the numerator and denominator dependent. @LeeviLindgren's comment above gives the correct formula for Var(R2_loo).
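One way to account for that dependence is a first-order (delta-method) approximation that treats both $MSE_{\hat{e}}$ and $MSE_y$ as random and includes their covariance. A minimal sketch, not the loo implementation, assuming the pointwise squared LOO errors and pointwise squared deviations $(y_n - \bar{y})^2$ are given, and ignoring the extra uncertainty from estimating $\bar{y}$; `r2_delta_se` is a hypothetical name:

```python
import math


def r2_delta_se(sq_err, sq_dev):
    """R^2 = 1 - MSE_e / MSE_y with a first-order (delta-method) SE.

    sq_err: pointwise squared LOO errors (y_n - yhat_loo_n)^2
    sq_dev: pointwise squared deviations (y_n - y_bar)^2, with y_bar treated as known
    Both MSEs are sample means of these lists, so their (co)variances are
    estimated by the sample (co)variances divided by N.
    """
    n = len(sq_err)
    if n < 2 or len(sq_dev) != n:
        raise ValueError("need two equal-length lists with at least 2 entries")
    a = math.fsum(sq_err) / n  # MSE_e
    b = math.fsum(sq_dev) / n  # MSE_y
    var_a = math.fsum((x - a) ** 2 for x in sq_err) / (n - 1) / n
    var_b = math.fsum((z - b) ** 2 for z in sq_dev) / (n - 1) / n
    cov_ab = math.fsum((x - a) * (z - b) for x, z in zip(sq_err, sq_dev)) / (n - 1) / n
    # Gradient of g(a, b) = 1 - a / b: dg/da = -1 / b, dg/db = a / b^2
    var_r2 = var_a / b**2 + a**2 * var_b / b**4 - 2 * a * cov_ab / b**3
    # Clamp tiny negative values caused by floating-point rounding
    return 1 - a / b, math.sqrt(max(var_r2, 0.0))


# Made-up pointwise inputs for illustration
r2, se = r2_delta_se([1.0, 2.0, 3.0, 2.0], [2.0, 2.0, 2.0, 2.0])
print(r2, se)
```

If `sq_dev` is constant, the $MSE_y$ variance and covariance terms vanish and the SE reduces to the fixed-denominator value; if the squared errors are exactly proportional to the squared deviations, the ratio is fixed and the SE is zero.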

@sims1253
Author

Ah I think I understand the problem now, thanks for sticking with me. Will try to update this PR over the holidays.

@sims1253
Author

Closing this for now so it doesn't clutter your PR list.

@sims1253 sims1253 closed this Dec 15, 2023
@fweber144
Contributor

@LeeviLindgren sorry for digging this out; it's because of stan-dev/projpred#496. Do I understand

> We derived the first-order Taylor approximation of the variance.

in #205 (comment) correctly that you applied a bivariate delta method to the function $R^2(MSE_{\hat{e}}, MSE_{y}) = 1 - \frac{MSE_{\hat{e}}}{MSE_{y}}$ where both $MSE_{\hat{e}}$ and $MSE_{y}$ are random variables?

@fweber144
Contributor


I now did the math myself and I obtain the same result when proceeding as I described. So this result comes indeed from a bivariate delta method applied to that $R^2(\cdot, \cdot)$ function.
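For reference, here is the standard bivariate delta-method computation for that function, written out as a sketch (it is not a transcription of the image in the earlier comment, and it treats $\bar{y}$ as known). With $g(a, b) = 1 - a/b$, $a = MSE_{\hat{e}}$ and $b = MSE_y$:

$$\frac{\partial g}{\partial a} = -\frac{1}{b}, \qquad \frac{\partial g}{\partial b} = \frac{a}{b^2}$$

$$\mathrm{Var}(R^2) \approx \frac{\mathrm{Var}(MSE_{\hat{e}})}{MSE_y^2} + \frac{MSE_{\hat{e}}^2 \, \mathrm{Var}(MSE_y)}{MSE_y^4} - \frac{2 \, MSE_{\hat{e}} \, \mathrm{Cov}(MSE_{\hat{e}}, MSE_y)}{MSE_y^3}$$

Treating $MSE_y$ as a constant drops the last two terms, which is the uncertainty the original SE ignored.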

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants