Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue in mean_diff(p) - wrong degrees of freedom & the CI #106

Open
Generalized opened this issue May 25, 2021 · 5 comments
Open

Issue in mean_diff(p) - wrong degrees of freedom & the CI #106

Generalized opened this issue May 25, 2021 · 5 comments

Comments

@Generalized
Copy link

Generalized commented May 25, 2021

It seems the mean_diff takes incorrect degrees of freedom: n rather than n-1 to calculate the quantile of t.

Data:

d <- structure(list(change = c(-3, -3, 3, 3, -3, -3, 25, 25, -8, -8, 
2, 2, 24, 24, 2, 2, 10, 10, 24, 24, 0, 0, 37, 37), Timepoint = c("T1", 
"T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", 
"T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", 
"T2"), Result = c(35, 32, 81, 84, 69, 66, 82, 107, 84, 76, 108, 
110, 75, 99, 57, 59, 102, 112, 68, 92, 76, 76, 75, 112), Id = c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 
9L, 10L, 10L, 11L, 11L, 12L, 12L)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

d_wide <- structure(list(Id = 1:12, T1 = c(35, 81, 69, 82, 84, 108, 75, 
57, 102, 68, 76, 75), T2 = c(32, 84, 66, 107, 76, 110, 99, 59, 
112, 92, 76, 112), change = c(-3, 3, -3, 25, -8, 2, 24, 2, 10, 
24, 0, 37)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", 
"data.frame"))

There are 12 rows.

Paired t: 11 rows (n-1)

> t.test(Result ~ Timepoint, d, paired=TRUE)

	Paired t-test

data:  Result by Timepoint
t = -2.2653, df = 11, p-value = 0.04467
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.5659033  -0.2674301
sample estimates:
mean of the differences 
              -9.416667 

t on change (equivalent to the above)

> t.test(d_wide$change)

	One Sample t-test

data:  d_wide$change
t = 2.2653, df = 11, p-value = 0.04467
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  0.2674301 18.5659033
sample estimates:
mean of x 
 9.416667 

obraz

Now, dabestr: n=12 (that's fine for the mean itself, but the qt needs 11), the confidence interval does not match.

....
Dataset    :  d
X Variable :  Timepoint
Y Variable :  Result

Paired mean difference of T2 (n = 12) minus T1 (n = 12)
 9.42 [95CI  -8.25; 25.2]
.....

And the plot is wrong:
obraz

Also I checked the bootstrapped version, because you use bootstrap, and the CI differs and definitely does not cover zero. I used the same seed: 12345.
It's understandable, that results of those calculation may fluctuate a bit, but the current one goes too far and it will be rather difficult to justify it against statistical reviewers, when publishing it.

> set.seed(12345)
> boot.ci(boot(d_wide %>% data.frame, 
                              statistic = function(data, indices) 
                                                return(mean(data[indices, "T2"]) - mean(data[indices, "T1"])), R=10000))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot(d_wide %>% data.frame, statistic = function(data, 
    indices) return(mean(data[indices, "T2"]) - mean(data[indices, 
    "T1"])), R = 10000))

Intervals : 
Level      Normal              Basic         
95%   ( 1.646, 17.262 )   ( 1.335, 16.831 )  

Level     Percentile            BCa          
95%   ( 2.002, 17.498 )   ( 2.583, 18.417 )  
Calculations and Intervals on Original Scale
Warning message:

I use version 0.3.0.

@josesho
Copy link
Member

josesho commented May 25, 2021

Paired computations bug is reported previously in #94 , thanks for bumping this up.

At first pass, I don't think this is a degree of freedom problem, but how the paired bootstrap is computed.

@Generalized
Copy link
Author

Generalized commented May 25, 2021

Yes. Initially I thought it's calculated traditionally. Then I found it's done via bootstrap, so it's not the DF fault. Today morning my work was rejected by a statistical reviewer at my client due to this issue, as they used SAS to validate the outcome.

@josesho
Copy link
Member

josesho commented May 25, 2021

That's a bummer! I apologise. We do have an intern who will be working on this.

@Generalized
Copy link
Author

@josesho
Hello, is the corrected version published on the CRAN? I'd like to propose dabestr to my friend, a researcher, but I'm not sure if the error in computing correct CI and df was finally solved and released?

@josesho
Copy link
Member

josesho commented Sep 17, 2021

Hi @Generalized ,

It's not on CRAN yet but you can install the dev branch from Github with

devtools::install_github("ACCLAB/[email protected]")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants